Add MiniMax-M3 (MXFP4/AttnFP8) model support by thpereir · Pull Request #1317 · ROCm/ATOM

thpereir · 2026-06-22T22:35:29Z

The in-tree MiniMax-M3 model already covers the BF16 checkpoint. This
adds the small pieces the quantized amd/MiniMax-M3-MXFP4-AttnFP8 build
needs, without disturbing the BF16 path.

config.py: register the minimax_m3_vl multimodal wrapper and parse its
text sub-config (which declares no model_type) with the base
PretrainedConfig so every field is retained and no deepseek/MLA
defaults leak in; stamp model_type=minimax_m3 from the top-level type.
The quark quantization_config (already propagated from the root) and
the original architectures are preserved, so loading resolves to the
existing MiniMaxM3Sparse model. The BF16 checkpoint keeps its direct
minimax_m3 model_type and is unaffected.
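
The config.py handling above can be sketched with plain dictionaries. This is only an illustration under stated assumptions, not the PR's code: the `text_config` key, the `WRAPPER_TO_TEXT_TYPE` table, the `resolve_text_config` helper, and all example values are hypothetical, and the real change builds a PretrainedConfig object rather than a dict.

```python
import copy

# Hypothetical mapping from a multimodal wrapper type to its text model type.
WRAPPER_TO_TEXT_TYPE = {"minimax_m3_vl": "minimax_m3"}


def resolve_text_config(root: dict) -> dict:
    """Return the config the text model should be built from (sketch only)."""
    model_type = root.get("model_type")
    if model_type not in WRAPPER_TO_TEXT_TYPE:
        # Direct checkpoints (e.g. BF16 with model_type=minimax_m3) pass through.
        return root

    # Copy every field of the sub-config verbatim, adding no model-specific
    # defaults. The "text_config" key name is an assumption.
    text = copy.deepcopy(root["text_config"])
    # The sub-config declares no model_type, so stamp it from the wrapper type.
    text["model_type"] = WRAPPER_TO_TEXT_TYPE[model_type]
    # Carry over root-level fields the loader still needs.
    for key in ("quantization_config", "architectures"):
        if key in root and key not in text:
            text[key] = root[key]
    return text


# Illustrative input; every value here is a placeholder, not the real config.
example_root = {
    "model_type": "minimax_m3_vl",
    "architectures": ["PlaceholderArchitecture"],
    "quantization_config": {"quant_method": "quark"},
    "text_config": {"hidden_size": 1024},
}
resolved = resolve_text_config(example_root)
```

With this sketch, `resolved` carries the sub-config's own fields plus the stamped `model_type` and the root's quantization and architecture entries, while a config whose `model_type` is already `minimax_m3` is returned unchanged.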

linear.py: pad the MXFP4 Linear contraction dim (K) to a multiple of 256. The
a4w4 asm GEMM reads K in 256-wide tiles, so an unaligned K (M3's shared-expert
down_proj at TP=8, K=384) faults on GPU. LinearBase._pad_mxfp4_input_dim()
zero-pads the fp4x2 weight, its e8m0 scale, and the activation up to the next
256-aligned K; it is a no-op when K is already aligned.
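
The padding arithmetic for the K=384 case can be sketched as follows. This is a minimal illustration, not the body of `LinearBase._pad_mxfp4_input_dim()`: the helper names are hypothetical, the sizes are logical per-row counts (the kernel's actual memory layout may differ), and it assumes the standard MXFP4 block of 32 elements per e8m0 scale with two fp4 values packed per byte.

```python
ALIGN = 256       # K tile width read by the GEMM, per the description above
FP4_PER_BYTE = 2  # fp4x2 packs two 4-bit values into one byte
MX_BLOCK = 32     # assumed: one e8m0 scale per 32 elements (MXFP4 block size)


def round_up(k: int, align: int = ALIGN) -> int:
    """Smallest multiple of `align` that is >= k."""
    return (k + align - 1) // align * align


def padded_layout(k: int) -> dict:
    """Logical per-row sizes after padding K up to the next multiple of ALIGN."""
    k_pad = round_up(k)
    return {
        "k_padded": k_pad,
        "pad_elems": k_pad - k,
        "weight_bytes_per_row": k_pad // FP4_PER_BYTE,
        "scales_per_row": k_pad // MX_BLOCK,
    }


def zero_pad(row, k_pad):
    """Append zeros so the row has k_pad entries."""
    return list(row) + [0] * (k_pad - len(row))


def dot(a, b):
    """Plain dot product over the contraction dimension."""
    return sum(x * y for x, y in zip(a, b))


layout = padded_layout(384)  # K=384 pads to 512; an aligned K is left unchanged
```

Because both the weight and the activation are padded with zeros, each extra position contributes 0 to the dot product, so the padded GEMM computes the same values as the unpadded one.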

Validated: GSM8K 93.9% at TP=8 (full 1319-question test set, 5-shot), matching
the TP=1 baseline; lossless vs the aligned baseline (cosine 0.9934).

Motivation

Run MiniMax-M3 MXFP4 correctly on ATOM at TP=8.

Test Result

Tested on ATOM at TP=8. Server launch:

python -m atom.entrypoints.openai_server \
  --model amd/MiniMax-M3-MXFP4-AttnFP8 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --block-size 128 \
  --server-port 8000

lm-eval

lm_eval \
  --model local-chat-completions \
  --model_args "model=amd/MiniMax-M3-MXFP4-AttnFP8,base_url=http://127.0.0.1:8000/v1/chat/completions,num_concurrent=32,max_gen_toks=16384" \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size 1 \
  --apply_chat_template \
  --fewshot_as_multiturn

Results (for reference, GSM8K at TP=1 gives ~0.9424):

Build	flexible-extract	strict-match
Baseline — `origin/main` (no fix)	0.7665 ± 0.0117	0.7657 ± 0.0117
Fixed — this branch	0.9378 ± 0.0065	0.9386 ± 0.0065

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


thpereir force-pushed the thpereir/m3 branch 3 times, most recently from b93487e to cb66f66, June 23, 2026 18:47

thpereir marked this pull request as ready for review, June 23, 2026 18:51

thpereir force-pushed the thpereir/m3 branch from cb66f66 to 849ee04, June 23, 2026 18:55

thpereir force-pushed the thpereir/m3 branch from 849ee04 to 9d0b328, June 24, 2026 16:07


thpereir wants to merge 1 commit into ROCm:main from thpereir:thpereir/m3
