Problem
Issue:
FP8-MX underperforms BF16 on the Qwen3 235B and 30B models in the 26.02 release.
Affected Chips/Configurations:
Qwen3 235B: B300, B200, H100
Qwen3 30B: GB300, GB200, B200, H100
Minimal repro
Launch the Qwen3 30B and 235B models via `setup_experiment.py`, once with BF16 and once with FP8-MX, and compare performance.
Expected behavior
FP8 should be faster than BF16
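One way to state the expected behavior as a check is sketched below. The helper names `fp8_speedup` and `is_regression` are hypothetical and not part of the repo; the step times are assumed to be per-step wall-clock seconds read from the two runs' logs. No measured values are implied here.

```python
def fp8_speedup(bf16_step_time_s: float, fp8_step_time_s: float) -> float:
    """Return BF16 step time divided by FP8 step time.

    A value above 1.0 means FP8 is faster than BF16.
    """
    if bf16_step_time_s <= 0 or fp8_step_time_s <= 0:
        raise ValueError("step times must be positive")
    return bf16_step_time_s / fp8_step_time_s


def is_regression(bf16_step_time_s: float, fp8_step_time_s: float) -> bool:
    """True when FP8 is not faster than BF16, i.e. the behavior reported here."""
    return fp8_speedup(bf16_step_time_s, fp8_step_time_s) <= 1.0
```

Passing the average step time from the BF16 run and from the FP8-MX run of the same model and chip to `is_regression` would flag the configurations listed above.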
Affected area
area:perf
Regression?
Yes
Environment
No response
Logs