dsv4-fp4-b300-sglang: update image to nightly by yhyang201 · Pull Request #1506 · SemiAnalysisAI/InferenceX

yhyang201 · 2026-05-18T18:17:28Z

Summary

Update image from deepseek-v4-b300@sha256:2fec8d... to nightly-dev-cu13-20260518-c67b2870
Refactor benchmark script to dispatch by CONC instead of nested DP_ATTENTION/CONC/EP_SIZE
Switch high-concurrency profiles (CONC 2048/4096/8192) from --moe-a2a-backend deepep to megamoe
Remove env vars deleted from sglang main or redundant with defaults
Remove --deepep-config (not needed by megamoe)
Fix CONC=512 yaml ep: 4 → ep: 1 (flashinfer_mxfp4 doesn't set ep=tp)

Note

Medium Risk
Changes benchmark launch parameters and cluster image import behavior; affects reproducibility of B300 sglang results but not production serving paths.

Overview
Updates the DeepSeek-V4-Pro FP4 B300 sglang benchmark to a newer nightly container and aligns launch recipes with megamoe for high concurrency.

The nvidia-master config switches from the pinned deepseek-v4-b300 image to lmsysorg/sglang:nightly-dev-cu13-20260529-a8cfae0b, documents CONC-based recipe selection, and corrects the CONC=512 search point from ep: 4 to ep: 1 so YAML matches flashinfer_mxfp4 (no implicit ep=tp).

dsv4_fp4_b300_sglang.sh is refactored to pick profiles by CONC (1/32 TP-only; 512 DP-attn + flashinfer; 2048–8192 DP-attn + --moe-a2a-backend megamoe instead of deepep). Stale or redundant SGLANG_OPT_* env vars, --deepep-config, and related DeepEP settings are dropped; the benchmark step upgrades transformers before serving.
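The CONC-based dispatch described above can be sketched roughly as follows. This is an illustration in Python, not the actual shell script: the `Profile` type, the `select_profile` function, and the field names are hypothetical. Only the concurrency values, the recipe names, and the `--moe-a2a-backend megamoe` argument are taken from the PR description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical container for one launch recipe (not from the PR)."""
    dp_attention: bool       # whether DP attention is enabled
    recipe: str              # descriptive label only, not a CLI flag
    extra_args: tuple = ()   # extra server arguments for this profile


def select_profile(conc: int) -> Profile:
    """Pick a launch profile from the concurrency level alone."""
    if conc in (1, 32):
        # Low concurrency: TP-only, no DP attention.
        return Profile(dp_attention=False, recipe="tp-only")
    if conc == 512:
        # Mid concurrency: DP attention with the flashinfer path.
        return Profile(dp_attention=True, recipe="flashinfer_mxfp4")
    if conc in (2048, 4096, 8192):
        # High concurrency: DP attention with megamoe instead of deepep.
        return Profile(True, "megamoe", ("--moe-a2a-backend", "megamoe"))
    # Unknown search points fail loudly instead of picking a default.
    raise ValueError(f"no profile defined for CONC={conc}")
```

Keying on a single variable keeps each search point tied to exactly one recipe, which is the stated motivation for dropping the nested DP_ATTENTION/CONC/EP_SIZE branching.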

launch_b300-nv.sh runs enroot import with temp/cache paths on local /tmp so squash import does not fail on NFS whiteout removal during image pull.

perf-changelog.yaml records the above for dsv4-fp4-b300-sglang.

Reviewed by Cursor Bugbot for commit 9f6043d. Bugbot is set up for automated code reviews on this repo.

github-actions · 2026-05-18T18:17:47Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-19T16:07:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26109529858
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26109529858

github-actions · 2026-05-19T16:54:56Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26109534591
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26109534591

github-actions · 2026-05-21T12:32:07Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26221509538
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26221509538

github-actions · 2026-05-29T19:46:14Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26658560606
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26658560606

…e0b, refactor script, switch to megamoe

github-actions · 2026-05-29T19:50:20Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26658745339
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26658745339

The head node's /scratch is an NFS mount that can return a stale file handle. enroot's runtime/cache/data/temp dirs are pinned under /scratch by /etc/enroot/enroot.conf{,.d}, so on a stale mount `enroot import` cannot create its working dirs and produces no .sqsh. That surfaces downstream as a cryptic pyxis "No such file or directory: ...sqsh" on the compute node and fails the single-node canary (e.g. actions run 26658745339).

Probe /scratch and, when it is unusable, redirect enroot's paths to the healthy /data share for the import only. The exports stay inside the import subshell, so the salloc/srun below (and the compute node's own /scratch) are unaffected; on a healthy head node the probe passes and behavior is identical. Also fail fast if the import still can't produce a squash instead of proceeding to a doomed srun.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
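The probe-and-fail-fast behavior this commit message describes might look roughly like the sketch below. It is written in Python for illustration, whereas the real change lives in a shell launch script; the function names `dir_is_usable`, `pick_enroot_base`, and `require_squash` are hypothetical, and the probe method (creating a temporary file) is an assumption, since the message does not say how /scratch is probed.

```python
import os
import tempfile


def dir_is_usable(path: str) -> bool:
    """Return True if a file can be created and removed under `path`.

    A missing directory or a stale NFS handle raises OSError, which is
    treated here as "unusable". (Assumed probe method.)
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def pick_enroot_base(primary: str, fallback: str) -> str:
    """Use `primary` (e.g. /scratch) when healthy, else `fallback` (e.g. /data)."""
    return primary if dir_is_usable(primary) else fallback


def require_squash(sqsh_path: str) -> None:
    """Fail fast when the import did not leave a non-empty squash file."""
    if not os.path.isfile(sqsh_path) or os.path.getsize(sqsh_path) == 0:
        raise RuntimeError(f"enroot import produced no squash file: {sqsh_path}")
```

Checking for the squash file right after the import turns the later pyxis "No such file or directory" error on the compute node into an immediate, clearly attributed failure on the head node.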

…ort"

This reverts commit f584272.

/scratch has been remounted on the head node and the stale NFS handle is cleared, so the enroot temp/cache/data redirect workaround is no longer needed. Restores the original import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T18:54:46Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26691343935
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26691343935

…PERM

The single-node import extracts under ENROOT_TEMP_PATH, which /etc/enroot/enroot.conf pins to NFS /scratch. enroot-aufs2ovlfs unpacks the image's root-owned AUFS whiteout markers into a sticky /tmp and then can't unlink them over NFS (root-squash strips the CAP_FOWNER it needs), failing with 'failed to remove aufs whiteout: Operation not permitted' and producing no .sqsh, which then surfaces as a pyxis 'No such file or directory' on the compute node.

Run the import on local disk, where the extracted files are owned by the runner user and removable. Scoped to the import subshell and cleaned up on exit, so salloc/srun and the compute node's own /scratch are unaffected.

Proper fix is to point ENROOT_TEMP_PATH at local disk in enroot.conf cluster-wide; this is the no-root workaround.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
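A scoped, self-cleaning local working directory for the import, as this commit message describes, could be sketched as below. The example is in Python for illustration only; the actual workaround is a subshell in launch_b300-nv.sh. The helper name `local_enroot_env` and the directory layout are hypothetical. `ENROOT_TEMP_PATH` is named in the message; redirecting `ENROOT_CACHE_PATH` as well follows the PR overview's mention of temp/cache paths.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def local_enroot_env(base_env, local_root="/tmp"):
    """Yield a copy of `base_env` with enroot temp/cache paths on local disk.

    The working directory is removed on exit, and `base_env` itself is
    never modified, so later commands see the original settings.
    """
    workdir = tempfile.mkdtemp(prefix="enroot-import-", dir=local_root)
    env = dict(base_env)
    env["ENROOT_TEMP_PATH"] = os.path.join(workdir, "tmp")
    env["ENROOT_CACHE_PATH"] = os.path.join(workdir, "cache")
    for key in ("ENROOT_TEMP_PATH", "ENROOT_CACHE_PATH"):
        os.makedirs(env[key])
    try:
        yield env
    finally:
        # Mirrors the "cleaned up on exit" behavior from the message.
        shutil.rmtree(workdir, ignore_errors=True)


# Intended use (not executed here; paths and image URI are placeholders):
#   with local_enroot_env(os.environ) as env:
#       subprocess.run(["enroot", "import", "-o", sqsh_path, image_uri],
#                      env=env, check=True)
```

Passing the modified environment only to the import process plays the same role as keeping the exports inside a subshell: the later salloc/srun steps never see the redirected paths.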

github-actions · 2026-05-31T02:51:09Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26701471985
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26701471985

github-actions · 2026-05-31T02:52:36Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26701489318
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26701489318

Oseltamivir · 2026-06-01T22:04:41Z

/reuse-sweep-run

Oseltamivir

lgtm

github-actions · 2026-06-01T22:06:19Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26784885157
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=26784885157

…, switch to megamoe

Migrate the 7 STP disagg recipes to the megamoe MoE backend (deepep -> megamoe, drop deepep-config) and strip obsolete SGLANG_OPT_*/SGLANG_DEEPEP env vars now defaulted upstream, mirroring the b300 migration (#1506).

Clean the 5 dynamo recipes: fix container to dsv4-grace-blackwell, remove personal extra_mount and hardcoded nodelist pins so they run on CI.

yhyang201 requested a review from a team May 18, 2026 18:17

yhyang201 requested review from jgangani and kedarpotdar-nv as code owners May 18, 2026 18:17

github-project-automation Bot added this to InferenceMAX Board May 18, 2026

yhyang201 changed the title ~~dsv4-fp4-b300-sglang: update image to nightly, switch to megamoe~~ dsv4-fp4-b300-sglang: update image to nightly May 18, 2026

yhyang201 force-pushed the yyh/update-dsv4-b300-sglang-image branch from f25519e to cf36b0c Compare May 19, 2026 15:32

yhyang201 added the full-sweep-enabled label May 19, 2026

yhyang201 force-pushed the yyh/update-dsv4-b300-sglang-image branch from d8ca8a8 to 09875d7 Compare May 21, 2026 10:52

yhyang201 added a commit that referenced this pull request May 29, 2026

Append perf-changelog entry for PR #1506

cfa7211

yhyang201 force-pushed the yyh/update-dsv4-b300-sglang-image branch from 09875d7 to cfa7211 Compare May 29, 2026 19:45

yhyang201 added 2 commits May 30, 2026 03:49

dsv4-fp4-b300-sglang: update image to nightly-dev-cu13-20260529-a8cfa…

f593147

…e0b, refactor script, switch to megamoe

Append perf-changelog entry for PR #1506

0ba92fd

yhyang201 force-pushed the yyh/update-dsv4-b300-sglang-image branch from cfa7211 to 0ba92fd Compare May 29, 2026 19:49

Oseltamivir added full-sweep-enabled and removed full-sweep-enabled labels May 30, 2026

Oseltamivir and others added 2 commits May 30, 2026 10:21

Merge branch 'main' into yyh/update-dsv4-b300-sglang-image

fdcfe59

SemiAnalysisAI deleted a comment from github-actions Bot May 30, 2026

Merge branch 'main' into yyh/update-dsv4-b300-sglang-image

7dac7cc

Merge branch 'main' into yyh/update-dsv4-b300-sglang-image

9f6043d

Oseltamivir approved these changes Jun 1, 2026

Oseltamivir merged commit ef75a2a into main Jun 1, 2026
18 of 19 checks passed

Oseltamivir deleted the yyh/update-dsv4-b300-sglang-image branch June 1, 2026 22:05

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 1, 2026

claude Bot mentioned this pull request Jun 5, 2026

Add DSv4-Pro FP4 GB200 SGLang disagg + MTP config #1676

Open
