
feat: support post-norm architecture #97

Merged

zhyncs merged 2 commits into lightseekorg:main from Dogacel:attention-drift on May 13, 2026

Conversation

@Dogacel (Contributor) commented on May 12, 2026

The post-norm architecture outperforms pre-norm for speculative decoding models. Two changes are required:

  1. Feed the model its own hidden states after the norm.
  2. Normalize each target model hidden state before feeding it into the FC layer.
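The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual torchspec implementation: the class and attribute names (`PostNormDraftStep`, `fc`, `target_norm`, `final_norm`) are hypothetical, and a small MLP stands in for the real draft transformer layer.

```python
import torch
import torch.nn as nn


class PostNormDraftStep(nn.Module):
    """Hypothetical sketch of a post-norm draft step for speculative decoding."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Change 2: normalize the target model's hidden state before the FC.
        self.target_norm = nn.LayerNorm(hidden_size)
        # FC fuses the token embedding with a (normalized) hidden state.
        self.fc = nn.Linear(2 * hidden_size, hidden_size)
        # Stand-in for the real draft transformer layer(s).
        self.layer = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.SiLU())
        # Final norm applied to the draft hidden state before the LM head.
        self.final_norm = nn.LayerNorm(hidden_size)

    def forward(self, embed: torch.Tensor, target_hidden: torch.Tensor,
                steps: int = 3) -> list[torch.Tensor]:
        # First step: consume the target model's hidden state, normed first.
        hidden = self.layer(
            self.fc(torch.cat([embed, self.target_norm(target_hidden)], dim=-1))
        )
        outputs = []
        for _ in range(steps):
            normed = self.final_norm(hidden)  # post-norm output for the LM head
            outputs.append(normed)
            # Change 1: feed back the *post-norm* hidden state, not the raw one.
            hidden = self.layer(self.fc(torch.cat([embed, normed], dim=-1)))
        return outputs
```

The key difference from a pre-norm drafter is in the feedback loop: each subsequent draft step consumes the normalized hidden state rather than the raw residual-stream output.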

Short blog: https://x.com/dogacel0/status/2054200111043949012?s=20

Paper: https://arxiv.org/abs/2605.09992

Comparison of acceptance lengths and throughput. This method shines especially on long-context inputs.

(image: acceptance length and throughput comparison)

I am interested to see whether this new architecture will benefit Kimi drafters.

Disclaimer: I only had one GPU and couldn't test the training pipeline end-to-end; only the unit tests were run. Let me know if there is a way to launch training on a single GPU.


@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 226188362f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread on torchspec/models/draft/deepseek_eagle.py (outdated)
@zhyncs merged commit 068f253 into lightseekorg:main on May 13, 2026
1 of 2 checks passed
@yubofredwang (Collaborator) commented:

Nice work! I will kick off training runs to verify.



3 participants