
Conversation

@luccafong (Collaborator) commented Sep 30, 2025

Purpose

  1. CUDA graph + MTP support for DeepSeek V3.2 is still in progress, so enable eager mode by default for the MTP part for now. Acceptance rate and eval results have been verified, and piecewise CUDA graph is still used for the main model when MTP is enabled.
  2. Fix the Eagle tests as well as the MTP tests (@njhill).

Test Plan

VLLM_SKIP_DEEP_GEMM_WARMUP=1 vllm serve "deepseek-ai/DeepSeek-V3.2-Exp" --max_model_len=20000 --gpu_memory_utilization=0.9 --tensor_parallel_size 8 --max_num_seqs=256 --speculative_config '{"num_speculative_tokens":1, "method":"mtp"}'  --no-enable-prefix-caching --compilation_config
lm_eval --model local-completions --tasks gsm8k     --model_args model=deepseek-ai/DeepSeek-V3.2-Exp,base_url=http://127.0.0.1:8000/v1/completions,num_concurrent=200,max_retries=3,tokenized_requests=False --batch_size 32 --num_fewshot 20 

Test Result

Tasks  Version  Filter            n-shot  Metric       Value   Stderr
gsm8k  3        flexible-extract  20      exact_match  0.9507  ± 0.006
                strict-match      20      exact_match  0.9500  ± 0.006

Note: the gsm8k eval takes 9:58 with MTP vs. 12:08 without MTP (12:08 / 9:58 ≈ 1.22×, roughly a 22% speedup).

(APIServer pid=1007953) INFO 09-30 12:31:58 [metrics.py:96] SpecDecoding metrics: Mean acceptance length: 1.92, Accepted throughput: 80.90 tokens/s, Drafted throughput: 88.00 tokens/s, Accepted: 809 tokens, Drafted: 880 tokens, Per-position acceptance rate: 0.919, Avg Draft acceptance rate: 91.9%
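As a sanity check, the metrics in the log line above are internally consistent. With num_speculative_tokens=1 each step drafts one token, so the mean acceptance length is one verified token plus the per-position acceptance rate. A minimal sketch of that arithmetic (values taken from the log):

```python
# Reproduce the SpecDecoding metrics arithmetic from the server log above.
accepted = 809  # tokens accepted (from the log)
drafted = 880   # tokens drafted (from the log)

# Per-position acceptance rate: fraction of drafted tokens accepted.
acceptance_rate = accepted / drafted

# With num_speculative_tokens=1, each step emits 1 verified token plus
# the accepted draft token, so the mean acceptance length is 1 + rate.
mean_acceptance_length = 1 + acceptance_rate

print(f"Per-position acceptance rate: {acceptance_rate:.3f}")   # 0.919
print(f"Mean acceptance length: {mean_acceptance_length:.2f}")  # 1.92
```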

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Lu Fang <[email protected]>
@luccafong force-pushed the v3.2_mtp_enforce_eager branch from 69b3255 to f3536c6 on September 30, 2025 19:52
@gemini-code-assist bot (Contributor) left a comment:

Code Review

This pull request introduces a fallback to eager execution mode for DeepSeek V3.2 models when using Multi-Token Prediction (MTP) for speculative decoding. This is a temporary measure to address an issue where CUDA graph is not yet supported for this specific combination.

The changes are well-contained and logical:

  1. A new enforce_eager flag is added to SpeculativeConfig to allow overriding the default execution mode.
  2. This flag is automatically enabled for DeepSeek V3.2 models when MTP is used.
  3. The EagleProposer, which handles MTP, now checks this flag to disable CUDA graph when necessary.

The implementation is correct and effectively resolves the issue described. The changes are specific to the problematic configuration and should not affect other models or execution paths. Overall, this is a good, targeted fix.
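The three changes the review lists can be sketched roughly as follows. This is an illustrative sketch only, not the actual vLLM code: the class names SpeculativeConfig, EagleProposer, and the enforce_eager flag come from the review above, but the finalize() method and field layout here are hypothetical simplifications.

```python
# Hypothetical sketch of the change described in the review; the real vLLM
# implementation differs in structure and naming.
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    method: str = "mtp"
    num_speculative_tokens: int = 1
    enforce_eager: Optional[bool] = None  # None -> decide automatically

    def finalize(self, model_name: str) -> None:
        # CUDA graph + MTP is not yet supported for DeepSeek V3.2, so fall
        # back to eager execution for the draft model in that combination.
        if self.enforce_eager is None:
            self.enforce_eager = (
                self.method == "mtp" and "DeepSeek-V3.2" in model_name
            )


class EagleProposer:
    """Handles MTP drafting; skips CUDA graph capture when eager is enforced."""

    def __init__(self, spec_config: SpeculativeConfig):
        self.use_cuda_graph = not spec_config.enforce_eager


cfg = SpeculativeConfig()
cfg.finalize("deepseek-ai/DeepSeek-V3.2-Exp")
proposer = EagleProposer(cfg)
print(cfg.enforce_eager, proposer.use_cuda_graph)  # True False
```

Because the flag is only auto-set for this one model/method combination, other models and execution paths keep their existing CUDA graph behavior, which matches the review's observation that the fix is well-contained.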

@LucasWilkinson (Collaborator) left a comment:

LGTM; thanks!

@luccafong added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) Sep 30, 2025
Signed-off-by: Lu Fang <[email protected]>
@heheda12345 heheda12345 added this to the v0.11.0 Cherry Picks milestone Sep 30, 2025
Signed-off-by: Lu Fang <[email protected]>
@luccafong force-pushed the v3.2_mtp_enforce_eager branch from 276e573 to ef7e8d4 on September 30, 2025 22:29
@luccafong luccafong enabled auto-merge (squash) September 30, 2025 22:30
Signed-off-by: Nick Hill <[email protected]>
@njhill (Member) commented Oct 1, 2025

@luccafong fyi I pushed a similar fix that was needed to test_mtp.py.

@luccafong (Collaborator, Author) replied:

> @luccafong fyi I pushed a similar fix that was needed to test_mtp.py.

thx!

@luccafong luccafong merged commit 001e50c into vllm-project:main Oct 1, 2025
48 checks passed
simon-mo pushed a commit that referenced this pull request Oct 1, 2025
Labels: deepseek (Related to DeepSeek models), ready (ONLY add when PR is ready to merge/full CI is needed), speculative-decoding, v1