[MiniMax M3] Enable and Optimize the MiniMax M3 Eagle by zejunchen-zejun · Pull Request #1333 · ROCm/ATOM

zejunchen-zejun · 2026-06-24T01:01:34Z

Enable MiniMax M3 EAGLE functionality
Enable PD disaggregation for M3 EAGLE
Cut the sequence at EOS before returning it to the frontend user
Fused kernel for prepare_mtp_decode in EAGLE
Fused kernel for token-local argmax in EAGLE
Replicate the vocab embedding to reduce communication in EAGLE
Fuse the triple RMSNorm for aux hidden states in EAGLE
Fuse all-reduce and RMSNorm in Llama EAGLE

Model	Mode	flexible_extract	strict_match	Acceptance rate	Status
MiniMax-M3-MXFP4	EAGLE	0.9462	0.9469	73.56% (90483/123000, avg_toks_fwd=3.21)	PASS
MiniMax-M3-MXFP4	non-EAGLE	0.9462	0.9469	N/A	PASS
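
The acceptance figures in the table above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the acceptance rate from the raw counts and shows that the reported avg_toks_fwd is consistent with 3 draft tokens per verification step. That draft length is an assumption: the PR does not state it, it simply reproduces the reported number.

```python
accepted = 90483   # accepted draft tokens, from the table
drafted = 123000   # proposed draft tokens, from the table

acceptance_rate = accepted / drafted
print(f"acceptance rate: {acceptance_rate:.2%}")

# ASSUMPTION (not stated in the PR): 3 draft tokens are proposed per step.
num_spec = 3
steps = drafted / num_spec

# Each verification step emits the accepted drafts plus one token
# produced by the target model itself.
avg_toks_fwd = 1 + accepted / steps
print(f"avg tokens per forward: {avg_toks_fwd:.2f}")
```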

FP4 M3, 8k/1k, eagle model https://huggingface.co/Inferact/MiniMax-M3-EAGLE3

Concurrency	Non-Eagle Total tok/s	Eagle Total tok/s	Eagle Uplift
4	4,688.40	7,653.56	+63.24%
8	7,795.90	11,146.00	+42.97%
16	11,890.78	16,928.43	+42.37%
32	17,195.82	21,132.49	+22.89%
64	23,466.78	26,857.80	+14.45%
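
The uplift column can be reproduced from the two throughput columns. A minimal sketch, using only the numbers from the table above (the helper name `uplift_pct` is made up for illustration):

```python
# (concurrency, non-EAGLE total tok/s, EAGLE total tok/s), copied from the table
rows = [
    (4, 4688.40, 7653.56),
    (8, 7795.90, 11146.00),
    (16, 11890.78, 16928.43),
    (32, 17195.82, 21132.49),
    (64, 23466.78, 26857.80),
]


def uplift_pct(baseline: float, eagle: float) -> float:
    """Relative throughput gain of EAGLE over the baseline, in percent."""
    return (eagle / baseline - 1.0) * 100.0


uplifts = {conc: uplift_pct(base, eagle) for conc, base, eagle in rows}
for conc, pct in uplifts.items():
    print(f"concurrency {conc:>2}: +{pct:.2f}%")
```

The gain shrinks as concurrency grows, from about 63% at concurrency 4 to about 14% at concurrency 64.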

wuhuikx · 2026-06-24T05:34:27Z

Can you put the accuracy and performance test results into the comment? Are they the same as in the recipe?

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

+model_path=amd/MiniMax-M3-MXFP4
+model_path=MiniMaxAI/MiniMax-M3-MXFP8
+BS=65


+        if aux_hidden_states:
+            return hidden_states, aux_hidden_states
        return hidden_states


+        logits = tgemm.mm(x, self.weight, self.bias)  # [N, vocab/tp]
+        if self.tp_size <= 1:
+            return logits.argmax(dim=-1)
+        # Pack (val, idx) as fp32 — idx < 2^24 is exact — and all-gather only the
+        # per-rank reductions ([N, 2]) instead of the full logits.
+        packed = lm_head_argmax_pack(logits, self.vocab_start_idx)
+        gathered = get_tp_group().all_gather(packed, dim=0).view(self.tp_size, -1, 2)
+        winner = gathered[:, :, 0].argmax(dim=0)  # [N] winning rank (ties -> lowest)
+        token = gathered[:, :, 1].gather(0, winner.unsqueeze(0)).squeeze(0)  # [N] fp32
+        return token.to(torch.long)
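
The idea in the hunk above can be simulated without GPUs or a process group. The sketch below is a pure-Python stand-in, not the PR's kernel: `to_fp32`, `local_pack`, and `distributed_argmax` are hypothetical names, one token row is handled at a time, and a plain list comprehension plays the role of the all-gather. It also checks the claim in the code comment that indices below 2^24 survive a round trip through fp32.

```python
import struct


def to_fp32(x: float) -> float:
    """Round-trip a Python float through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


def local_pack(shard, vocab_start_idx):
    """Per-rank reduction: (max value, global index of that max), both as fp32."""
    local_idx = max(range(len(shard)), key=lambda i: shard[i])  # first max wins
    return (to_fp32(shard[local_idx]), to_fp32(float(vocab_start_idx + local_idx)))


def distributed_argmax(logits, tp_size):
    """Simulate the tensor-parallel argmax for one row of logits."""
    shard_len = len(logits) // tp_size
    # Stand-in for the all-gather: each rank contributes only 2 numbers.
    gathered = [
        local_pack(logits[r * shard_len:(r + 1) * shard_len], r * shard_len)
        for r in range(tp_size)
    ]
    winner = max(range(tp_size), key=lambda r: gathered[r][0])  # ties -> lowest rank
    return int(gathered[winner][1])


# Values chosen to be exactly representable in fp32; 3.0 appears on ranks 0 and 2.
row = [0.5, 3.0, -1.0, 2.0, 3.0, 1.5, 0.25, 2.75]
token = distributed_argmax(row, tp_size=4)
print(token, row.index(max(row)))

# Integers are exact in fp32 up to 2**24; the next integer is not.
print(to_fp32(float(2**24 - 1)) == 2**24 - 1)
print(to_fp32(float(2**24 + 1)) == 2**24 + 1)
```

Because both the per-rank reduction and the cross-rank reduction resolve ties toward the lowest index, the result matches a plain argmax over the full row. Each rank sends 2 values per token instead of its whole vocab shard.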


Bring the model-agnostic / draft-side MiniMax-M3 EAGLE3 work from wuhuikx/atom-m3-bf16-to-main (2f1c385). These files' pre-eagle base is byte-identical to current main, so they port as-is:
- eagle3_llama.py / eagle3_deepseek_mla.py: draft fusions (fused dual-RMSNorm + concat, fused group-RMSNorm aux, AR+RMSNorm fusion), compute_draft_token, replicated-embed option.
- fused_aux_rmsnorm.py (new): the fused RMSNorm kernels for the draft.
- lm_head_argmax.py (new) + embed_head.py: distributed greedy argmax (all-gather [N,2] per-rank maxima instead of full [N,vocab] logits).
- spec_decode/eagle.py: draft loop with distributed-argmax fast path, no-pre-concat aux, and Eagle3 MHA draft KV-cache transfer for PD disaggregation (from #1331).
- envs.py: ATOM_EAGLE_REPLICATE_EMBED.
- tests/test_lm_head_argmax.py (new, importorskip(aiter) for the no-aiter CI).

Target-side enablement (aux-hidden capture in minimax_m3, q>1 spec-verify metadata, prepare_mtp_decode) follows in Phase 2; note that eagle.py now references attn_metadata_builder.prepare_mtp_decode, which Phase 2 adds.

Mocked suite: 437 passed / 38 pre-existing failures / +1 new skip (no regression).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Enable MiniMax-M3 EAGLE3 on current main's (Triton-sparse) M3 base, adapting the target side to main's API instead of wuhuikx's asm/gluon infra (absent on main).

aiter_attention.py:
- Add the generic block-paged MHA Eagle3 draft metadata: _mtp_prepare_decode_metadata_kernel + prepare_mtp_decode + fuse_mtp_decode_position_update (used by the migrated eagle.py for both Kimi and M3 drafts; not M3-sparse coupled).
- Replace the two "speculative decode not supported" NotImplementedError sites: route q>1 spec-verify through the sparse PREFILL path (make_sparse_prefill_metadata; per-query causal via cu_seqlens_q, which is now filled uniformly for q>1). prefix_lens is bound to a new persistent sparse_prefix_lens buffer so the CUDAGraph-captured sparse indexer reads live causal lengths on each replay.

minimax_m3.py: Eagle3 aux-hidden-state capture (Dynamo-safe, mirrors deepseek_v2): aux_hidden_state_layers, in-layer residual.clone() after the fused-allreduce norm, model forward returns (hidden, aux) tuple, set/get_eagle3_aux_hidden_state_layers on the ForCausalLM + VL-wrapper delegation.

model_runner.py: extend KV transfer regions with the Eagle3 draft pool for PD disaggregation (#1331).

scheduler.py: trim emitted spec tokens past the stop position (rejection sampler emits past EOS) so flexible-extract doesn't pick up leaked trailing tokens.

recipes/MiniMax-M3.md: full EAGLE3 section (with a note that the ASM-PA/fp8/MXFP8 specifics reflect the fully-optimized variant, not this Triton-sparse base).

Drop tests/test_lm_head_argmax.py (per request).

Note: the q>1 sparse-verify path is new on main and CUDAGraph-sensitive; it needs GPU validation (GSM8K + accept on TP4/TP8; confirm Kimi eagle unaffected).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
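
The scheduler.py change described in this commit message (trimming speculative tokens emitted past the stop position) might look roughly like the sketch below. This is an illustration, not the PR's code: the function name `trim_past_stop` and its signature are hypothetical, and keeping the stop token itself in the output is an assumption.

```python
def trim_past_stop(token_ids, stop_token_ids):
    """Drop every token emitted after the first stop token.

    ASSUMPTION: the stop token itself is kept; the real scheduler may
    strip it as well.
    """
    for pos, tok in enumerate(token_ids):
        if tok in stop_token_ids:
            return token_ids[:pos + 1]
    return token_ids


# A verification step accepted 5 tokens, but EOS (id 2 here, made up) is the third.
emitted = [11, 12, 2, 13, 14]
trimmed = trim_past_stop(emitted, stop_token_ids={2})
print(trimmed)
```

Without such a trim, the tokens after EOS would reach the frontend, and an answer extractor that scans the full output could pick them up.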

make lint happy Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

main's MiniMaxM3Attention (dense layers) does not set force_triton_attn in code and attention_mha has no block-128 guard, so on this base the dense attention is routed to Triton only via ATOM_FORCE_ATTN_TRITON=1 (the MXFP4 base section already sets it). The EAGLE section migrated from wuhuikx omitted it (wuhuikx set force_triton_attn=True in code instead), so the spec-verify dense attention (q=num_spec+1) fell into paged_attention_asm and aborted in get_heuristic_kernel (no bf16 block-128 ASM-PA kernel). Add the env to the EAGLE launch and drop the stale MXFP8 model_path line. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

+model_path=amd/MiniMax-M3-MXFP4
+model_path=MiniMaxAI/MiniMax-M3-MXFP8


@@ -672,9 +684,11 @@ def forward(
            hidden_states = intermediate_tensors["hidden_states"]


Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

wuhuikx marked this pull request as ready for review June 24, 2026 05:33

wuhuikx requested review from ZhangLirong-amd, Copilot, valarLip, whx-sjtu and yhl-amd June 24, 2026 05:34

Copilot AI reviewed Jun 24, 2026

zejunchen-zejun marked this pull request as draft June 24, 2026 09:26

zejunchen-zejun force-pushed the zejun/enable_and_opt_minimax_m3_eagle_0623 branch 2 times, most recently from f7d612b to 9fc4833 Compare June 25, 2026 03:25

zejunchen-zejun marked this pull request as ready for review June 25, 2026 03:46

Copilot AI review requested due to automatic review settings June 25, 2026 03:46

Copilot started reviewing on behalf of zejunchen-zejun June 25, 2026 03:46

Copilot AI reviewed Jun 25, 2026

zejunchen-zejun and others added 9 commits June 25, 2026 21:46

update recipe

aec38df

make lint happy Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

remove fp8 attn related command

2d8f581

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

remove ATOM_FORCE_ATTN_TRITON

1eef549

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

update recipe

f8d5579

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

update recipe

9e81237

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

update the recipe with the perf

3ee0b21

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

zejunchen-zejun force-pushed the zejun/enable_and_opt_minimax_m3_eagle_0623 branch from b481f3d to 3ee0b21 Compare June 25, 2026 13:50

Copilot AI review requested due to automatic review settings June 25, 2026 13:50

Copilot started reviewing on behalf of zejunchen-zejun June 25, 2026 13:50

Copilot AI reviewed Jun 25, 2026

Comment thread recipes/MiniMax-M3.md

Comment on lines +197 to +198

model_path=amd/MiniMax-M3-MXFP4

model_path=MiniMaxAI/MiniMax-M3-MXFP8

Comment thread atom/models/minimax_m3.py

@@ -672,9 +684,11 @@ def forward(

hidden_states = intermediate_tensors["hidden_states"]

refine the comment

a1f38cd

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

valarLip approved these changes Jun 25, 2026

yhl-amd approved these changes Jun 25, 2026

valarLip merged commit 6e565c5 into main Jun 25, 2026
20 of 31 checks passed

valarLip deleted the zejun/enable_and_opt_minimax_m3_eagle_0623 branch June 25, 2026 14:52

valarLip merged 10 commits into main from zejun/enable_and_opt_minimax_m3_eagle_0623
