
Conversation

@yeyu-nvidia
Contributor

What does this PR do?

New feature.

Overview:
This PR implements parallel draft for HF (Hugging Face) models by combining EAGLE and Medusa. During training, multiple Medusa-style heads are added and trained jointly with the EAGLE module. During inference, these heads generate additional draft tokens in parallel after all EAGLE tokens.
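
For readers new to the two techniques, here is a toy sketch of that drafting schedule (illustrative stand-ins only, not the PR's implementation; all names and dimensions below are invented):

import torch

# Toy stand-ins: one "EAGLE step" that updates the hidden state and emits
# logits, plus two Medusa-style heads that each emit one extra token.
hidden_dim, vocab, eagle_steps = 16, 32, 3
eagle_step = torch.nn.Linear(hidden_dim, hidden_dim + vocab)
parallel_draft_heads = [torch.nn.Linear(hidden_dim, vocab) for _ in range(2)]

hidden = torch.randn(1, hidden_dim)  # stand-in for base-model hidden state
draft_tokens = []
for _ in range(eagle_steps):                      # sequential EAGLE drafting
    out = eagle_step(hidden)
    hidden, logits = out[:, :hidden_dim], out[:, hidden_dim:]
    draft_tokens.append(logits.argmax(-1))
for head in parallel_draft_heads:                 # parallel heads fire once,
    draft_tokens.append(head(hidden).argmax(-1))  # after all EAGLE tokens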

Usage

Set parallel_draft_step>1 in eagle_config to enable parallel draft.

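For example (a minimal sketch; the mtsp.convert entry point, the config nesting, and the model name are assumptions based on ModelOpt's speculative-decoding API, not verified against this PR):

from transformers import AutoModelForCausalLM
import modelopt.torch.speculative as mtsp

# Sketch only: the convert call and config nesting are assumptions; the one
# fact from this PR is that parallel_draft_step > 1 enables parallel draft.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # any HF causal LM
eagle_config = {
    "parallel_draft_step": 4,  # 1 = standard EAGLE; >1 adds parallel draft heads
    # ... other EAGLE settings unchanged ...
}
model = mtsp.convert(model, [("eagle", {"config": eagle_config})])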

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@yeyu-nvidia yeyu-nvidia requested a review from a team as a code owner December 8, 2025 19:07
@yeyu-nvidia yeyu-nvidia requested a review from h-guo18 December 8, 2025 19:07
@yeyu-nvidia yeyu-nvidia self-assigned this Dec 9, 2025
@yeyu-nvidia
Contributor Author

/ok to test cdea9ed

@yeyu-nvidia
Contributor Author

/ok to test f8a1088

@codecov

codecov bot commented Dec 10, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.78%. Comparing base (53a2dde) to head (49954df).
⚠️ Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
+ Coverage   74.50%   74.78%   +0.27%     
==========================================
  Files         183      192       +9     
  Lines       18400    18814     +414     
==========================================
+ Hits        13709    14070     +361     
- Misses       4691     4744      +53     

draft_logits_list = [eagle_logits]
if self.eagle_config.parallel_draft_step > 1:
    # Get additional draft logits from parallel draft heads
    for draft_head in self.eagle_module.parallel_draft_heads:
Contributor

I think we can optimize this for-loop and run the heads in parallel. This can perhaps be done in a follow-up PR.
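
For what it's worth, one possible shape for that optimization, assuming each draft head is a plain nn.Linear without bias (the real heads may be deeper modules, in which case torch.vmap over the head parameters would be the analogue):

import torch
import torch.nn as nn

# Toy sizes; in the PR these would come from the model config.
hidden_dim, vocab_size, num_heads = 16, 32, 4
parallel_draft_heads = [nn.Linear(hidden_dim, vocab_size, bias=False) for _ in range(num_heads)]
hidden_states = torch.randn(2, 5, hidden_dim)                         # (B, T, D)

# One batched matmul over all heads instead of a Python loop.
stacked_w = torch.stack([h.weight for h in parallel_draft_heads])     # (H, V, D)
all_logits = torch.einsum("btd,hvd->hbtv", hidden_states, stacked_w)  # (H, B, T, V)
draft_logits_list = list(all_logits.unbind(0))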

(
    torch.zeros(b, ttt_step, dtype=loss_mask.dtype, device=loss_mask.device),
    loss_mask[:, 1 + ttt_step :],
for i in range(self.eagle_config.parallel_draft_step):
Contributor

@h-guo18 h-guo18 Dec 11, 2025

Similar to the above, this for-loop seems parallelizable.
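
A toy illustration of that vectorization (the offsets here are invented; the real ones depend on the ttt_step bookkeeping above): all shifted masks can be produced with one unfold instead of a loop.

import torch
import torch.nn.functional as F

# Toy shapes standing in for (b, seq_len, parallel_draft_step).
B, T, steps = 2, 10, 4
loss_mask = torch.ones(B, T)

# Pad once, then take every shift as a window: shifted[:, i, :] equals
# loss_mask shifted left by i positions with zero padding on the right.
padded = F.pad(loss_mask, (0, steps - 1))             # (B, T + steps - 1)
shifted = padded.unfold(dimension=1, size=T, step=1)  # (B, steps, T)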

eagle_input_hidden_states = base_model_hidden_states

draft_tokens = []
for _ in range(steps):
Contributor

The semantics of this steps argument seem ambiguous to me. Shall we rename it to eagle_steps? Then it would mean we do eagle_steps rounds of sequential drafting plus num_medusa_heads tokens of parallel drafting.
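
For illustration, the proposed naming might read like this (a hypothetical signature, not code from the PR):

# Hypothetical: "eagle_steps" makes the sequential depth explicit, with the
# parallel-token count implied by the number of draft heads.
def generate_drafts(base_model_hidden_states, eagle_steps: int, parallel_draft_heads: list):
    """Draft in two phases: eagle_steps sequential EAGLE iterations, then
    one extra token per parallel (Medusa-style) head."""
    ...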

Contributor

@h-guo18 h-guo18 left a comment

Left some comments and questions. The other HF changes LGTM.
I think it would be great to run the PTQ and AR regression tests before merging. Thanks!

@yeyu-nvidia yeyu-nvidia requested a review from a team as a code owner December 11, 2025 21:31
@@ -1,11 +1,3 @@
{
"rope_scaling": {
Contributor

Why do we need to delete this configuration?

