[AMD] Add AMD MI350X/MI355X (gfx950) blockwise FP8 support for run_qwen3_30b_a3b #1465

Draft
JessicaJiang-123 wants to merge 7 commits into radixark:main from JessicaJiang-123:amd-qwen3-30b-a3b-fp8

@JessicaJiang-123 commented Jun 22, 2026

Co-authored-with: @XinyuJiangCMU

Summary

Add a ROCm gfx950 (MI350X / MI355X) blockwise FP8 training path for the Qwen3-30B-A3B RL recipe, and make the RL weight reload run the fp8 weight post-processing that the inference engine requires.

Changes

  • scripts/run_qwen3_30b_a3b.py: add MI350X / MI355X as hardware options. On these, enable the TransformerEngine blockwise FP8 recipe (--fp8-recipe blockwise, --fp8-format e4m3, NVTE_ROCM_ENABLE_FP8_BLOCK_SCALING=1) and disable gradient-accumulation fusion, since ROCm has no wgrad fusion yet. Add a single-node parallel config (TP1 + sequence-parallel, PP2, CP2, EP4, --max-tokens-per-gpu 16384) with the CPU-offload optimizer, plus the matching rollout settings. Prevent Ray from clearing HIP/CUDA device visibility for the job entrypoint.
  • update_weight_from_tensor.py: run post_process_weights after the RL weight update for fp8 as well as compressed-tensors, so the inference engine re-applies its fp8 weight post-processing (the ROCm aiter pre-shuffle) on the freshly loaded weights each step.
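As a rough sketch of how the MI350X / MI355X branch might assemble its overrides: the FP8 flags, the NVTE_ROCM_ENABLE_FP8_BLOCK_SCALING variable, and --max-tokens-per-gpu come from the description above, while the helper names (gfx950_fp8_overrides, ParallelLayout), the Megatron-style spellings of the parallelism and --no-gradient-accumulation-fusion flags, and the data-parallel arithmetic (which assumes a Megatron-Core-style split with expert tensor parallel equal to TP) are assumptions for illustration. The CPU-offload optimizer and rollout settings are left out.

```python
from dataclasses import dataclass

# Hypothetical hardware option names; the real script's spelling may differ.
GFX950_HARDWARE = {"MI350X", "MI355X"}


@dataclass(frozen=True)
class ParallelLayout:
    """Single-node layout from the PR description (helper name is hypothetical)."""
    tp: int = 1   # tensor parallel (run with sequence parallel)
    pp: int = 2   # pipeline parallel
    cp: int = 2   # context parallel
    ep: int = 4   # expert parallel
    etp: int = 1  # expert tensor parallel, assumed equal to tp

    def data_parallel(self, world_size: int) -> int:
        # Dense layers: world = TP * PP * CP * DP (assumed Megatron-Core-style split).
        denom = self.tp * self.pp * self.cp
        if world_size % denom:
            raise ValueError(f"world size {world_size} not divisible by TP*PP*CP={denom}")
        return world_size // denom

    def expert_data_parallel(self, world_size: int) -> int:
        # MoE layers: world = ETP * EP * PP * expert-DP (assumption, see lead-in).
        denom = self.etp * self.ep * self.pp
        if world_size % denom:
            raise ValueError(f"world size {world_size} not divisible by ETP*EP*PP={denom}")
        return world_size // denom


def gfx950_fp8_overrides(hardware: str, layout: ParallelLayout = ParallelLayout()):
    """Return (extra CLI args, extra env vars) for the gfx950 blockwise FP8 path."""
    if hardware not in GFX950_HARDWARE:
        return [], {}
    args = [
        "--fp8-recipe", "blockwise",
        "--fp8-format", "e4m3",
        # ROCm has no wgrad fusion yet, so turn gradient-accumulation fusion off.
        "--no-gradient-accumulation-fusion",
        "--tensor-model-parallel-size", str(layout.tp),
        "--sequence-parallel",
        "--pipeline-model-parallel-size", str(layout.pp),
        "--context-parallel-size", str(layout.cp),
        "--expert-model-parallel-size", str(layout.ep),
        "--max-tokens-per-gpu", "16384",
    ]
    env = {"NVTE_ROCM_ENABLE_FP8_BLOCK_SCALING": "1"}
    return args, env
```

On a single 8-GPU node this layout leaves a dense data-parallel size of 8 / (1 × 2 × 2) = 2 and, under the assumed expert decomposition, an expert data-parallel size of 8 / (1 × 4 × 2) = 1.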

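Conceptually, the weight-update change widens one condition. The sketch below uses hypothetical names (needs_post_process, update_weights_from_tensor, and a FakeEngine with load_weights / post_process_weights methods) rather than the actual code in update_weight_from_tensor.py; it only shows the control flow: after each RL weight load, an engine quantized with fp8 or compressed-tensors receives a post-processing call, so backend-specific transforms such as the ROCm aiter pre-shuffle are redone on the fresh weights.

```python
from typing import List, Optional, Tuple

# Hypothetical: quantization methods whose freshly loaded weights need
# engine-side post-processing. Before the change only compressed-tensors was covered.
POST_PROCESS_QUANT_METHODS = frozenset({"fp8", "compressed-tensors"})


def needs_post_process(quant_method: Optional[str]) -> bool:
    return quant_method in POST_PROCESS_QUANT_METHODS


class FakeEngine:
    """Stand-in for the inference engine that records the calls it receives."""

    def __init__(self, quant_method: Optional[str]):
        self.quant_method = quant_method
        self.calls: List[tuple] = []

    def load_weights(self, named_tensors: List[Tuple[str, object]]) -> None:
        self.calls.append(("load_weights", len(named_tensors)))

    def post_process_weights(self) -> None:
        # A real fp8 engine on ROCm would redo its aiter pre-shuffle here.
        self.calls.append(("post_process_weights",))


def update_weights_from_tensor(engine: FakeEngine, named_tensors) -> None:
    engine.load_weights(named_tensors)
    # Re-apply quantization post-processing every step, not only at startup.
    if needs_post_process(engine.quant_method):
        engine.post_process_weights()
```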
Validated end-to-end on Qwen3-30B-A3B (8x MI350X): fp8 blockwise training + rollout matches the bf16 reference with a relative error of ~0.04 and stays on-policy (per-step absolute difference between train and rollout log-probs of ~0.04), including under TP2 + sequence parallel (fwd/dgrad/wgrad).
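The description does not define its two validation metrics. One plausible reading, assumed here, is an L2-norm relative error against the bf16 reference and a mean absolute difference between train-time and rollout log-probs over the same tokens; both function names are hypothetical.

```python
import math
from typing import Sequence


def relative_error(x: Sequence[float], ref: Sequence[float]) -> float:
    """||x - ref||_2 / ||ref||_2, one common definition (assumed, not from the PR)."""
    if len(x) != len(ref):
        raise ValueError("inputs must have the same length")
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, ref)))
    den = math.sqrt(sum(b * b for b in ref))
    if den == 0.0:
        raise ValueError("reference has zero norm")
    return num / den


def mean_abs_logprob_diff(train: Sequence[float], rollout: Sequence[float]) -> float:
    """Mean |train - rollout| over tokens scored by both the trainer and the engine."""
    if len(train) != len(rollout) or not train:
        raise ValueError("need equal-length, non-empty log-prob sequences")
    return sum(abs(a - b) for a, b in zip(train, rollout)) / len(train)
```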

Related

Part of the AMD Qwen3-30B-A3B blockwise-FP8 bring-up: