Add MXFP8 attention unit test with linear and rope layers #3033
Greptile Summary
This PR adds a DSv3 671B-shaped MXFP8 end-to-end attention unit test, covering
Confidence Score: 5/5
Safe to merge; only test files are added with no modifications to production TE code, and tests are correctly gated on MXFP8 availability.
All changes are test infrastructure. The MLA RoPE Triton kernel math is correct, the Triton masking concern from a prior round was fixed, and the single fp8_autocast scope now covers the full forward path. The two remaining concerns are a hard speedup assertion in CI and a private-symbol import, both quality-of-life issues rather than correctness bugs.
tests/pytorch/attention/test_linear_mxfp8_attention.py: the performance assertion and private-symbol import are worth revisiting before this test runs on new CI pools.
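One way to address the hard speedup assertion is to fail only when strict mode is requested and to warn otherwise. This is a minimal sketch, not code from the PR: the function name `check_speedup`, its parameters, and the `PERF_STRICT` environment variable are all hypothetical.

```python
import os
import warnings


def check_speedup(baseline_ms, candidate_ms, min_speedup=1.0, strict=None):
    """Compare two timings; raise only in strict mode, otherwise warn.

    Hypothetical helper for illustration; none of these names come from the PR.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")
    if strict is None:
        # Hypothetical opt-in switch: set PERF_STRICT=1 on pools with stable timings.
        strict = os.environ.get("PERF_STRICT", "0") == "1"
    speedup = baseline_ms / candidate_ms
    if speedup < min_speedup:
        msg = f"speedup {speedup:.2f}x is below the {min_speedup:.2f}x threshold"
        if strict:
            raise AssertionError(msg)
        # Non-strict mode: surface the regression without failing the test run.
        warnings.warn(msg, RuntimeWarning, stacklevel=2)
    return speedup
```

With this shape, a noisy or newly provisioned CI pool reports a slow run as a warning, while a dedicated perf job can opt in to a hard failure.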
Reviews (5): Last reviewed commit: "Merge branch 'main' into add_linear_mxfp..."
Thanks for the contribution! Could you please:
Signed-off-by: Layali Rashid <lrashid@nvidia.com>
c2a41f1 to 46c6a44
/te-ci pytorch L0
Add a DSv3-shaped MXFP8 attention unit test covering the training path:
Linear(QKV) -> MLA RoPE -> DotProductAttention -> Linear(out).
DotProductAttention wrapper.
Validation
Local checks:
python -m py_compile tests/pytorch/attention/test_linear_mxfp8_attention.py tests/pytorch/attention/mla_rope_utils.py
git diff --check
GB300 dlcluster validation:
1062811
(10, 3)
(9, 21, 1)
(True, '')
python -m pytest tests/pytorch/attention/test_linear_mxfp8_attention.py -v -s
3 passed
Perf output: