[vLLM-ATOM] Support Eagle 3.1 spec decoding by kliuae · Pull Request #1201 · ROCm/ATOM

kliuae · 2026-06-12T17:58:58Z

Motivation

This PR adds EAGLE 3/3.1 speculative decoding support to vLLM-ATOM, including support for Kimi-K2.6.
Supported combinations are an MLA target with either an MLA or an MHA draft model. For an MLA target with an MHA draft, separate KV cache pools are built so that vLLM doesn't attempt to unify the page sizes.
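To illustrate why an MLA target and an MHA draft end up with different KV cache page sizes, here is a rough sketch. The formulas follow the usual cache layouts (MLA keeps one compressed latent plus a RoPE key per token; MHA keeps full K and V per KV head). The dimensions, block size, layer names, and helper functions are illustrative assumptions, not values or code taken from this PR or from vLLM internals.

```python
from collections import defaultdict


def mla_page_bytes(block_size, kv_lora_rank, qk_rope_head_dim, dtype_bytes):
    # MLA caches one compressed latent plus a RoPE key per token.
    return block_size * (kv_lora_rank + qk_rope_head_dim) * dtype_bytes


def mha_page_bytes(block_size, num_kv_heads, head_dim, dtype_bytes):
    # MHA caches separate K and V tensors for every KV head.
    return 2 * block_size * num_kv_heads * head_dim * dtype_bytes


def group_into_pools(layer_page_sizes):
    # Hypothetical grouping: layers that share a page size share a pool.
    pools = defaultdict(list)
    for layer_name, page_size in layer_page_sizes.items():
        pools[page_size].append(layer_name)
    return dict(pools)


# Illustrative dimensions only (1 byte per element, as with an fp8 KV cache).
mla_page = mla_page_bytes(block_size=16, kv_lora_rank=512,
                          qk_rope_head_dim=64, dtype_bytes=1)
mha_page = mha_page_bytes(block_size=16, num_kv_heads=8,
                          head_dim=128, dtype_bytes=1)

pools = group_into_pools({
    "target.layers.0.attn": mla_page,
    "target.layers.1.attn": mla_page,
    "draft.layers.0.attn": mha_page,
})

for page_size, layer_names in pools.items():
    print(f"page size {page_size} bytes: {layer_names}")
```

With these made-up numbers the two attention types need pages of different sizes, so they land in two pools rather than one unified pool.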

Technical Details

Test Plan

Target model: amd/Kimi-K2.6-MXFP4
Draft models: lightseekorg/kimi-k2.6-eagle3.1-mla (MLA), lightseekorg/kimi-k2.6-eagle3 (MHA)

Server command

# Set to one of the draft models listed above.
DRAFT_MODEL=lightseekorg/kimi-k2.6-eagle3.1-mla

vllm serve amd/Kimi-K2.6-MXFP4 \
    --host localhost \
    --port 8000 \
    --async-scheduling \
    --trust-remote-code \
    --gpu-memory-utilization 0.8 \
    --kv-cache-dtype fp8 \
    --no-enable-prefix-caching \
    --enable-auto-tool-choice \
    --tool-call-parser kimi_k2 \
    --reasoning-parser kimi_k2 \
    --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
    --tensor-parallel-size 8 \
    --speculative-config.method eagle3 \
    --speculative-config.model "$DRAFT_MODEL" \
    --speculative-config.num_speculative_tokens 3
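For readers who prefer passing the speculative settings as one JSON string, the dotted `--speculative-config.*` flags above should correspond to a single `--speculative-config '{...}'` value. The sketch below is not vLLM's argument parser; `dotted_flags_to_config` is a hypothetical helper that only shows the mapping, using one of the draft models from the test plan.

```python
import json


def dotted_flags_to_config(argv, prefix="--speculative-config."):
    """Collect `<prefix>key value` pairs into a flat dict (hypothetical helper)."""
    config = {}
    args = iter(argv)
    for flag in args:
        if not flag.startswith(prefix):
            continue
        key = flag[len(prefix):]
        raw_value = next(args)
        try:
            # Numbers and booleans parse as JSON; plain strings do not.
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            value = raw_value
        config[key] = value
    return config


argv = [
    "--tensor-parallel-size", "8",
    "--speculative-config.method", "eagle3",
    "--speculative-config.model", "lightseekorg/kimi-k2.6-eagle3.1-mla",
    "--speculative-config.num_speculative_tokens", "3",
]

spec_config = dotted_flags_to_config(argv)
print(f"--speculative-config '{json.dumps(spec_config)}'")
```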

lm_eval

lm_eval --model local-completions --model_args model=amd/Kimi-K2.6-MXFP4,base_url=http://localhost:8000/v1/completions,num_concurrent=64,max_retries=3,tokenized_requests=False,trust_remote_code=True --tasks gsm8k --num_fewshot 5

Test Result

lm_eval with gsm8k

lightseekorg/kimi-k2.6-eagle3.1-mla

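| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9416 | ± 0.0065 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.9409 | ± 0.0065 |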

Per-position acceptance rate: 0.800, 0.602, 0.429. Average draft acceptance rate: 61.0%.

lightseekorg/kimi-k2.6-eagle3

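| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9348 | ± 0.0068 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.9356 | ± 0.0068 |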

Per-position acceptance rate: 0.819, 0.605, 0.388. Average draft acceptance rate: 60.4%.
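As a quick sanity check on the numbers above, the average draft acceptance rate is the mean of the three per-position rates. The sketch below recomputes it for both draft models. The estimated tokens per step is an extra, assumed figure: it presumes each verification step emits one target token plus the accepted draft tokens, and that the per-position rates are measured per draft step rather than conditionally.

```python
def summarize_acceptance(per_position_rates):
    """Return (average acceptance rate, estimated tokens emitted per step)."""
    average_rate = sum(per_position_rates) / len(per_position_rates)
    # Assumption: one token from the target model plus the accepted drafts.
    tokens_per_step = 1.0 + sum(per_position_rates)
    return average_rate, tokens_per_step


# Per-position rates reported in the test results.
reported = {
    "lightseekorg/kimi-k2.6-eagle3.1-mla": [0.800, 0.602, 0.429],
    "lightseekorg/kimi-k2.6-eagle3": [0.819, 0.605, 0.388],
}

summary = {name: summarize_acceptance(rates) for name, rates in reported.items()}

for name, (average_rate, tokens_per_step) in summary.items():
    print(f"{name}: average acceptance {average_rate:.1%}, "
          f"estimated tokens per step {tokens_per_step:.2f}")
```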

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

zejunchen-zejun · 2026-06-15T06:26:01Z

+        from atom.plugin.vllm.model_wrapper import ATOMModelBase
+        import vllm.v1.spec_decode.llm_base_proposer as llm_base_proposer
+    except Exception:
+        return


Maybe we should emit a warning here if the import path no longer exists after a vLLM upgrade, to avoid the silent return.

kliuae · 2026-06-17

Good point, added a warning message.
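For context, a warn-on-import-failure pattern along the lines discussed here might look roughly like the following. This is a minimal sketch, not the code added in the PR: `import_or_warn`, the logger name, and the message text are assumptions, and the demo call uses a deliberately missing module so that it runs without vLLM installed. The path guarded in the diff is `vllm.v1.spec_decode.llm_base_proposer`.

```python
import importlib
import logging

logger = logging.getLogger("atom.plugin.example")  # hypothetical logger name


def import_or_warn(module_path):
    """Import a module by path, logging a warning instead of failing silently."""
    try:
        return importlib.import_module(module_path)
    except ImportError as exc:
        logger.warning(
            "Skipping EAGLE patch: could not import %s (%s). "
            "The vLLM module layout may have changed.",
            module_path,
            exc,
        )
        return None


# Demo with a module path that does not exist, to exercise the warning branch.
missing_module = import_or_warn("vllm_missing_example.llm_base_proposer")
if missing_module is None:
    print("patch skipped")
```

Catching `ImportError` rather than a bare `Exception` keeps unrelated failures visible while still tolerating a moved or renamed module.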

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

Signed-off-by: miloice <kuanfu.liu@embeddedllm.com>

kliuae added 5 commits June 13, 2026 00:18

add eagle 3.1 support for vllm plugin

fc4d96c

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

format

af0ad9a

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

format

3383617

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

format

3636951

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

format

b9bb066

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

zejunchen-zejun requested review from PerryZhang01, whx-sjtu and zejunchen-zejun and removed request for whx-sjtu June 14, 2026 13:49

zejunchen-zejun reviewed Jun 15, 2026


kliuae added 6 commits June 18, 2026 02:14

Merge branch 'main'

f9f5929

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

fix norm quant fusion consumer dtype inconsistency

f66cbc1

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

add import fail warning

e68e53a

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

put mla spec decode in cudagraph

87382d9

Signed-off-by: miloice <kuanfu.liu@embeddedllm.com>

enable mla target mha draft

52c6ffe

Signed-off-by: miloice <kuanfu.liu@embeddedllm.com>

fix

836ddb1

Signed-off-by: miloice <kuanfu.liu@embeddedllm.com>

kliuae marked this pull request as ready for review June 25, 2026 07:26

kliuae wants to merge 11 commits into ROCm:main from kliuae:kliuae/plugin_enable_eagle31_merge

kliuae commented Jun 12, 2026 •

edited

Loading

Uh oh!

zejunchen-zejun Jun 15, 2026

Uh oh!

kliuae Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

kliuae commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Uh oh!

zejunchen-zejun Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

kliuae Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kliuae commented Jun 12, 2026 •

edited

Loading