[vLLM-ATOM] Support Eagle 3.1 spec decoding #1201
Open
kliuae wants to merge 11 commits into
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
        from atom.plugin.vllm.model_wrapper import ATOMModelBase
        import vllm.v1.spec_decode.llm_base_proposer as llm_base_proposer
    except Exception:
        return
Collaborator
Maybe we should log a warning here if the import path no longer exists after a vLLM upgrade, to avoid the silent return.
Contributor
Author
Good point, added a warning message.
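For illustration, a guarded import that warns instead of returning silently could look like the sketch below. This is not the code from the PR: the helper name `try_import` and the logger name are hypothetical, and the broad `except Exception` simply mirrors the excerpt under review.

```python
import importlib
import logging

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("atom.plugin.vllm")


def try_import(module_path: str):
    """Return the imported module, or None after logging a warning."""
    try:
        return importlib.import_module(module_path)
    except Exception as exc:  # broad on purpose, as in the reviewed excerpt
        logger.warning(
            "Could not import %s (%s); skipping the patch. "
            "The module path may have moved in this vLLM version.",
            module_path,
            exc,
        )
        return None


# Usage: the caller bails out early, but the log now records why.
proposer = try_import("vllm.v1.spec_decode.llm_base_proposer")
if proposer is None:
    pass  # nothing to patch in this environment
```

With this shape, a vLLM upgrade that relocates the module produces a visible log line rather than a quietly disabled feature.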
Signed-off-by: miloice <kuanfu.liu@embeddedllm.com>
Motivation
This PR adds EAGLE 3/3.1 speculative decoding support to vLLM-ATOM, including support for Kimi-K2.6.
Supported combinations are an MLA target with either an MLA or an MHA draft model. For an MLA target with an MHA draft, separate KV cache pools are built so that vLLM doesn't attempt to unify the page sizes.
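To show why the page sizes differ and what keeping separate pools means, here is a minimal sketch. It is not the PR's implementation: `LayerSpec`, `build_pools`, the layer names, and every dimension below are hypothetical values chosen for illustration. It assumes an MHA layer stores separate K and V tensors per token, while an MLA layer stores a single compressed latent per token.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerSpec:
    """Hypothetical description of one attention layer's KV cache needs."""
    name: str
    kind: str          # "mla" or "mha"
    block_size: int    # tokens per KV cache block
    num_kv_heads: int
    head_size: int
    dtype_bytes: int

    @property
    def page_bytes(self) -> int:
        # MHA keeps K and V (factor 2); MLA keeps one latent vector (factor 1).
        factor = 2 if self.kind == "mha" else 1
        return (factor * self.block_size * self.num_kv_heads
                * self.head_size * self.dtype_bytes)


def build_pools(layers):
    """Group layers by attention kind so each pool has one page size."""
    pools = defaultdict(list)
    for layer in layers:
        pools[layer.kind].append(layer.name)
    return dict(pools)


# Illustrative dimensions only, not taken from the models in this PR.
layers = [
    LayerSpec("target.layers.0", "mla", 16, 1, 576, 2),
    LayerSpec("target.layers.1", "mla", 16, 1, 576, 2),
    LayerSpec("draft.layers.0", "mha", 16, 8, 128, 2),
]
pools = build_pools(layers)
page_sizes = {layer.kind: layer.page_bytes for layer in layers}
print(pools)
print(page_sizes)
```

Under these assumed dimensions the two kinds need different page sizes, so giving each kind its own pool avoids having to pad one layout to match the other.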
Technical Details
Test Plan
Target model: amd/Kimi-K2.6-MXFP4
Draft model: lightseekorg/kimi-k2.6-eagle3.1-mla (MLA), lightseekorg/kimi-k2.6-eagle3 (MHA)
Server command
lm_eval
Test Result
lm_eval with gsm8k
lightseekorg/kimi-k2.6-eagle3.1-mla: Per-position acceptance rate: 0.800, 0.602, 0.429; Avg Draft acceptance rate: 61.0%
lightseekorg/kimi-k2.6-eagle3: Per-position acceptance rate: 0.819, 0.605, 0.388; Avg Draft acceptance rate: 60.4%
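As a quick consistency check on the numbers above, the snippet below recomputes the average draft acceptance rate from the per-position rates. It assumes the average is the arithmetic mean of the three per-position values, which matches the reported figures. The derived tokens-per-step number additionally assumes each per-position rate is measured per draft step, so that one target token plus the sum of the rates gives the expected tokens emitted per step; treat that figure as an estimate, not a reported result.

```python
# Per-position acceptance rates copied from the results above.
results = {
    "lightseekorg/kimi-k2.6-eagle3.1-mla": [0.800, 0.602, 0.429],
    "lightseekorg/kimi-k2.6-eagle3": [0.819, 0.605, 0.388],
}

summary = {}
for model, rates in results.items():
    avg_pct = 100.0 * sum(rates) / len(rates)   # arithmetic mean, in percent
    est_tokens_per_step = 1.0 + sum(rates)      # assumption: rates are per draft step
    summary[model] = (avg_pct, est_tokens_per_step)
    print(f"{model}: avg {avg_pct:.1f}%, est. tokens/step {est_tokens_per_step:.3f}")
```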
Submission Checklist