4 changes: 2 additions & 2 deletions .github/configs/nvidia-master.yaml
@@ -1986,7 +1986,7 @@ dsr1-fp8-b300-sglang:
# until a B300-specific recipe ships. Prefix caching is disabled.
# Parallelisms and concurrency ranges mirror dsv4-fp4-b200-vllm.
dsv4-fp4-b300-sglang:
- image: lmsysorg/sglang:deepseek-v4-b300@sha256:2fec8d7958bb0d53b50d7bf04d6ae6a7de8a35503775826e0550a45dd8c3ee15
+ image: lmsysorg/sglang:v0.5.12-cu130

🔴 Bumping dsv4-fp4-b300-sglang (line 1986) and dsv4-fp4-b300-sglang-mtp (line 2027) from the SHA-pinned lmsysorg/sglang:deepseek-v4-b300@sha256:... custom image to the generic lmsysorg/sglang:v0.5.12-cu130 strips the patched transformers that registers model_type: "deepseek_v4", so AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro") will crash with KeyError: 'deepseek_v4' before the server is even probed. Either hold this change until upstream sglang ships transformers with deepseek_v4 support, or have the recipe pip install a patched transformers inside the container before invoking the bench client. The PR author has already acknowledged this in the timeline.

Extended reasoning...

What the bug is

Both modified entries (dsv4-fp4-b300-sglang at line 1986 and dsv4-fp4-b300-sglang-mtp at line 2027) swap out a SHA-pinned custom image for the generic lmsysorg/sglang:v0.5.12-cu130 image. The custom deepseek-v4-b300@sha256:... builds bundle a patched transformers that registers a model type for deepseek_v4 (the config.json of deepseek-ai/DeepSeek-V4-Pro declares model_type: "deepseek_v4"). The generic v0.5.12-cu130 image bundles the upstream transformers release, which has no deepseek_v4 entry in its model-type registry.

Code path that triggers the failure

  1. Sweep dispatcher launches a container with image: lmsysorg/sglang:v0.5.12-cu130.
  2. Bench client runs and calls AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro").
  3. transformers downloads the model repo's config.json, reads model_type: "deepseek_v4", then attempts to look it up in CONFIG_MAPPING.
  4. Upstream transformers in v0.5.12-cu130 does not have deepseek_v4 registered → KeyError: 'deepseek_v4' is raised before the SGLang server is ever probed.
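
The lookup failure in steps 3 and 4 can be sketched without the real library. This is a minimal illustration, assuming the registry behaves like a plain mapping from model_type to a config class; `REGISTRY` and `resolve_config_class` are hypothetical names and the two registry entries are illustrative stand-ins, not the actual contents of CONFIG_MAPPING.

```python
# Hypothetical stand-in for the model-type registry. The real CONFIG_MAPPING
# is much larger; these entries are illustrative only.
REGISTRY = {"llama": "LlamaConfig", "deepseek_v3": "DeepseekV3Config"}


def resolve_config_class(model_config: dict, registry: dict) -> str:
    """Mimic step 3: read model_type from config.json and look it up."""
    model_type = model_config["model_type"]
    # A plain mapping lookup raises KeyError for an unregistered model type.
    return registry[model_type]


# What the config.json of the model repo declares, per the review above.
config_json = {"model_type": "deepseek_v4"}

try:
    resolve_config_class(config_json, REGISTRY)
except KeyError as exc:
    print(f"KeyError: {exc}")
```

Because the lookup happens while the tokenizer is being constructed, the error surfaces in the bench client and not in the SGLang server.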

Why existing code doesn't prevent it

The recipe scripts for these two configs only change the image tag; nothing in the recipe runs a pip install transformers ... upgrade to bring in deepseek-v4 support. The sister entry dsv4-fp4-b200-sglang at line 1699 is still pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc... for exactly this reason, and other DSV4 entries (e.g. trtllm variants at lines 1781/1802/3016/3039) all use specifically-tagged trtllm-deepseek-v4:feat-deepseek_v4-9aa3715 images. Every DSV4 config in this file requires a special image with deepseek_v4 support, and the b300 sglang variants are no exception.
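
A config lint could catch this class of regression before a sweep is dispatched. The sketch below is a rough heuristic, not a rule from the repo: it assumes the parsed YAML is a dict of entries with an `image` field, and that a DSV4-capable image reference contains the substring `deepseek-v4`. The function name `generic_image_dsv4_entries` is hypothetical.

```python
def generic_image_dsv4_entries(configs: dict, prefix: str = "dsv4-") -> list:
    """Return DSV4 config keys whose image does not look DSV4-specific.

    Assumption: a DSV4-capable image reference contains "deepseek-v4".
    This is a naming heuristic, not a guarantee about image contents.
    """
    flagged = []
    for key, entry in configs.items():
        image = entry.get("image", "")
        if key.startswith(prefix) and "deepseek-v4" not in image:
            flagged.append(key)
    return flagged


# Entries as they would look after this PR. The b200 digest is truncated
# exactly as quoted in the review.
configs = {
    "dsv4-fp4-b300-sglang": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "dsv4-fp4-b300-sglang-mtp": {"image": "lmsysorg/sglang:v0.5.12-cu130"},
    "dsv4-fp4-b200-sglang": {
        "image": "lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc..."
    },
}

print(generic_image_dsv4_entries(configs))
```

With the entries above, only the two b300 keys touched by this PR are flagged; the b200 entry passes because its image name carries the marker.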

Author confirmation

The PR author (functionstackx) acknowledged this directly in this PR's timeline on 2026-05-18T07:45:18Z: "the generic v0.5.12-cu130 image bundles a transformers that doesn't recognise model_type: "deepseek_v4", so the bench client crashes in AutoTokenizer.from_pretrained with KeyError: 'deepseek_v4'. ... the generic-image bump is NOT viable until sglang ships transformers with deepseek_v4 support." They closed the PR as not viable, then reopened it with sweep labels intentionally disabled to avoid auto-triggering failing runs while they debug.

Step-by-step proof

  1. Open .github/configs/nvidia-master.yaml at line 1986 — image is now lmsysorg/sglang:v0.5.12-cu130, model is deepseek-ai/DeepSeek-V4-Pro.
  2. Pull the v0.5.12-cu130 image: docker pull lmsysorg/sglang:v0.5.12-cu130.
  3. Inside the container: python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('deepseek-ai/DeepSeek-V4-Pro')".
  4. Observe: KeyError: 'deepseek_v4' raised from CONFIG_MAPPING.__getitem__ because upstream transformers in this image has no entry for deepseek_v4.
  5. Repeat for line 2027 (dsv4-fp4-b300-sglang-mtp) — same image, same model, identical failure.

Impact

Both dsv4-fp4-b300-sglang and dsv4-fp4-b300-sglang-mtp sweep runs will fail at tokenizer load 100% of the time. No benchmarks will be produced. The PR description itself acknowledges this risk: "⚠️ Note: the deepseek-v4-b300 tag is a custom DSV4 build; the generic v0.5.12-cu130 may or may not retain DSV4-specific features."

How to fix

Either (a) revert the image to the SHA-pinned custom deepseek-v4-b300 builds and wait for upstream sglang to ship a transformers release with deepseek_v4 registered, or (b) keep the generic image bump but have the recipe pip install a transformers build containing deepseek_v4 support inside the container before invoking the bench client. Option (a) is the safer choice and matches what is already done for the b200 sister entry at line 1699.
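
Whichever option is taken, a preflight check run inside the container before the bench client would turn the KeyError into an explicit failure message. This is a sketch under assumptions: `load_registry`, `preflight`, and `main` are hypothetical helper names, and the check only confirms that the model type is registered, not that the tokenizer loads end to end.

```python
def load_registry():
    """Return the transformers model-type registry, or None if unavailable."""
    try:
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING
    except ImportError:
        return None
    return CONFIG_MAPPING


def preflight(model_type: str, registry) -> bool:
    """True when model_type is registered in the given mapping."""
    return registry is not None and model_type in registry


def main() -> int:
    """Exit code for a recipe step: 0 if deepseek_v4 is registered, else 1."""
    if preflight("deepseek_v4", load_registry()):
        return 0
    print("transformers in this image does not register 'deepseek_v4'")
    return 1
```

A recipe could call `sys.exit(main())` as its first step, so an image without deepseek_v4 support fails with a readable message instead of a traceback from the bench client.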

model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: b300
@@ -2027,7 +2027,7 @@ dsv4-fp4-b300-sglang:
# dp-attn: true -> DP-attn + flashinfer_mxfp4 + chunked-prefill 32768
# + EAGLE (1,1,2) + mem-fraction 0.92 + max-running 256
dsv4-fp4-b300-sglang-mtp:
- image: lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd211e300dbb76924d56c5cbe6cc3ee5ee2fe314859cb8774f5bc070f3
+ image: lmsysorg/sglang:v0.5.12-cu130
model: deepseek-ai/DeepSeek-V4-Pro
model-prefix: dsv4
runner: b300
7 changes: 7 additions & 0 deletions perf-changelog.yaml
@@ -3022,3 +3022,10 @@
description:
- "Update SGLang image from nightly-dev-cu13-20260518-c67b2870 to nightly-dev-cu13-20260519-dbac4647"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1492

- config-keys:
- dsv4-fp4-b300-sglang
- dsv4-fp4-b300-sglang-mtp
description:
- "Update SGLang image from SHA-pinned deepseek-v4-b300 custom build (20/18d old) to v0.5.12-cu130"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1455