@tlrmchlsmth (Member) commented Sep 30, 2025

When using Eagle with DeepSeek, we build the draft model's decoder layers from the verifier model's config instead of the draft model's config.

This was introduced in #24134. A similar issue for Llama 3 was fixed in #25883.

Signed-off-by: Tyler Michael Smith <[email protected]>
@mergify bot added the deepseek (Related to DeepSeek models) and speculative-decoding labels Sep 30, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request correctly fixes a bug in the DeepseekV2 Eagle speculative decoding implementation. Previously, the draft model's decoder layers were incorrectly built with the verifier model's configuration. The changes add an optional config parameter to the DeepseekV2DecoderLayer initializer so that the correct draft model configuration can be passed in. Backward compatibility is preserved by falling back to the verifier model's configuration when the new parameter is not provided. The fix is well implemented and mirrors the similar Llama 3 fix in #25883.
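
A minimal sketch of the pattern described above, assuming a constructor shaped like `(vllm_config, prefix, config=None)`: an optional `config` argument that takes precedence when given and otherwise falls back to the verifier model's config. All class and field names here (`HFConfigStub`, `VllmConfigStub`, `DecoderLayerSketch`, `verifier_config`) are hypothetical stand-ins, not vLLM's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HFConfigStub:
    # Hypothetical stand-in for a model's Hugging Face config.
    name: str


@dataclass
class VllmConfigStub:
    # Hypothetical: carries only the verifier (target) model's config.
    verifier_config: HFConfigStub


class DecoderLayerSketch:
    """Hypothetical decoder layer; not vLLM's DeepseekV2DecoderLayer."""

    def __init__(self, vllm_config: VllmConfigStub, prefix: str,
                 config: Optional[HFConfigStub] = None) -> None:
        # Prefer an explicitly passed (e.g. draft) config; otherwise fall
        # back to the verifier's config so existing callers are unaffected.
        self.config = (config if config is not None
                       else vllm_config.verifier_config)
        self.prefix = prefix


verifier = HFConfigStub("verifier")
draft = HFConfigStub("eagle-draft")
vcfg = VllmConfigStub(verifier_config=verifier)

# Eagle draft path: pass the draft config explicitly.
eagle_layer = DecoderLayerSketch(vcfg, prefix="layers.0", config=draft)
# Callers that omit `config` still get the verifier config.
legacy_layer = DecoderLayerSketch(vcfg, "layers.0")
```

Using `None` as the default, rather than binding the verifier config in the signature, keeps the fallback inside the constructor where `vllm_config` is available, so callers that never pass `config` behave exactly as before.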

```python
DeepseekV2DecoderLayer(
    vllm_config,
    prefix=maybe_prefix(prefix, f"layers.{i + start_layer_id}"),
    config=self.config,
```
Collaborator
Should this also be applied in deepseek_mtp.py?

Member Author

I didn't touch deepseek_mtp.py because, even before #24134, it passed in the verifier model's config rather than the draft model's config... so I guess it was always broken?

@benchislett (Collaborator) left a comment

`config` needs to be added as the last parameter of `DeepseekV2DecoderLayer`'s constructor, because `deepseek_mtp.py` passes the arguments positionally:

```python
self.mtp_block = DeepseekV2DecoderLayer(vllm_config, prefix,
```
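
A toy illustration of the positional-argument hazard, using hypothetical functions rather than vLLM's real signatures. The `extra` parameter stands in for whatever `deepseek_mtp.py` passes after `prefix` (the excerpt above is truncated, so the real arguments are not shown here). Inserting `config` before `extra` makes an old positional call bind its third argument to `config`, while appending `config` at the end leaves such calls unchanged.

```python
from typing import Any, Optional


def layer_config_in_middle(vllm_config: Any, prefix: str,
                           config: Optional[Any] = None,
                           extra: Optional[Any] = None) -> dict:
    # Hypothetical signature: new `config` inserted before an existing param.
    return {"prefix": prefix, "config": config, "extra": extra}


def layer_config_at_end(vllm_config: Any, prefix: str,
                        extra: Optional[Any] = None,
                        config: Optional[Any] = None) -> dict:
    # Hypothetical signature: new `config` appended after existing params.
    return {"prefix": prefix, "config": config, "extra": extra}


# A caller written against the old (vllm_config, prefix, extra) signature,
# passing every argument positionally.
old_args = ("vllm_cfg", "model.layers.0", "extra_value")

broken = layer_config_in_middle(*old_args)  # "extra_value" lands in `config`
safe = layer_config_at_end(*old_args)       # "extra_value" stays in `extra`
```

Appending the new parameter with a default is the usual way to extend a Python signature without touching positional callers; passing it by keyword (as the Eagle path does with `config=self.config`) works with either ordering.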

@tlrmchlsmth (Member Author)

Closing in favor of #25987.
