Yeyu/hf eagle medusa #664
base: main
Yeyu/hf eagle medusa #664
Conversation
Signed-off-by: Ye Yu <[email protected]>
/ok to test cdea9ed
Signed-off-by: Ye Yu <[email protected]>
/ok to test f8a1088
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
+ Coverage   74.50%   74.78%   +0.27%
==========================================
  Files         183      192       +9
  Lines       18400    18814     +414
==========================================
+ Hits        13709    14070     +361
- Misses       4691     4744      +53
Signed-off-by: Ye Yu <[email protected]>
draft_logits_list = [eagle_logits]
if self.eagle_config.parallel_draft_step > 1:
    # Get additional draft logits from parallel draft heads
    for draft_head in self.eagle_module.parallel_draft_heads:
I think we can optimize this for-loop and run it in parallel. This could perhaps be done in a follow-up PR.
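For illustration, here is a minimal sketch of how the per-head loop could be batched into a single matmul, assuming each parallel draft head is a Medusa-style Linear(hidden, vocab) projection with a bias; the names (parallel_draft_heads, eagle_hidden, batched_draft_logits) are illustrative, not the PR's actual code:

import torch

def batched_draft_logits(eagle_hidden: torch.Tensor, parallel_draft_heads) -> torch.Tensor:
    """Compute logits for all parallel draft heads in one batched matmul."""
    # Stack all head weights/biases once: (H, V, D) and (H, V).
    # Assumes every head has a bias; drop the bias term otherwise.
    w = torch.stack([h.weight for h in parallel_draft_heads])
    b = torch.stack([h.bias for h in parallel_draft_heads])
    # One einsum replaces the Python for-loop:
    # (B, S, D) x (H, V, D) -> (H, B, S, V), one logits tensor per head.
    return torch.einsum("bsd,hvd->hbsv", eagle_hidden, w) + b[:, None, None, :]

Since the heads are independent, the stacked weights could also be materialized once at module init rather than per forward pass.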
(
    torch.zeros(b, ttt_step, dtype=loss_mask.dtype, device=loss_mask.device),
    loss_mask[:, 1 + ttt_step :],

for i in range(self.eagle_config.parallel_draft_step):
Similar to the above, this for-loop also seems parallelizable.
eagle_input_hidden_states = base_model_hidden_states

draft_tokens = []
for _ in range(steps):
The semantics of this steps argument seem ambiguous to me. Shall we rename it to eagle_steps? Then it would mean we do eagle_steps sequential drafting steps plus num_medusa_heads parallel drafting steps.
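To make the suggestion concrete, here is a hedged sketch of the proposed semantics, with eagle_steps sequential Eagle steps followed by the Medusa-style heads drafting in parallel; all names and return shapes below are illustrative, not the PR's actual signature:

def draft(self, hidden_states, eagle_steps: int):
    """Sequential Eagle drafting followed by parallel Medusa-style drafting."""
    draft_tokens = []
    for _ in range(eagle_steps):
        # Sequential: each Eagle step consumes the previous hidden states.
        logits, hidden_states = self.eagle_module(hidden_states)
        draft_tokens.append(logits.argmax(dim=-1))
    if self.eagle_config.parallel_draft_step > 1:
        # Parallel: every Medusa head drafts from the same final hidden states.
        for head in self.eagle_module.parallel_draft_heads:
            draft_tokens.append(head(hidden_states).argmax(dim=-1))
    return draft_tokens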
h-guo18 left a comment
Left some comments and questions. The other changes in HF LGTM.
I think it would be great to run the PTQ and AR regression tests before merging. Thanks!
Signed-off-by: Ye Yu <[email protected]>
@@ -1,11 +1,3 @@
{
    "rope_scaling": {
Why do we need to delete this configuration?
It's duplicated; we already have it in the default config here: https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/eagle/default_config.py
What does this PR do?
New feature.
Overview:
This PR implements HF parallel drafting by combining Eagle and Medusa. During training, multiple Medusa heads are added and trained together with the Eagle module. During inference, the Medusa heads generate draft tokens in parallel after all Eagle tokens have been drafted.
Usage
Set parallel_draft_step > 1 in eagle_config to enable parallel drafting, as sketched below.
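A minimal usage sketch, assuming ModelOpt's speculative-decoding convert API; the exact structure of the mode/config argument and the model checkpoint name are assumptions, and only parallel_draft_step > 1 is the switch described in this PR:

import modelopt.torch.speculative as mtsp
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

# > 1 enables the Medusa-style parallel draft heads on top of Eagle;
# other eagle_config keys fall back to the default config linked above.
eagle_config = {"parallel_draft_step": 4}
model = mtsp.convert(model, [("eagle", eagle_config)])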
Testing
Before your PR is "Ready for review"
Additional Information