[bug] Qwen3 FP8 worse than BF16 #3245

@rhmukundan

Description

Problem

Issue:
On the 26.02 release, FP8-MX underperforms BF16 for the Qwen3 235B and Qwen3 30B models.

Affected Chips/Configurations:
Qwen3 235B: B300, B200, H100
Qwen3 30B: GB300, GB200, B200, H100

Minimal repro

Launch the Qwen3 30B and Qwen3 235B models via `setup_experiment.py`, once with BF16 and once with FP8-MX, and compare performance.

Expected behavior

FP8 should be faster than BF16
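
To make the comparison concrete, here is a minimal sketch of how the gap could be quantified from per-step times collected in each run. The helper names `fp8_speedup` and `is_regression` are hypothetical and not part of `setup_experiment.py`; how the step times are obtained from the logs is left to the reader.

```python
from statistics import mean


def fp8_speedup(bf16_step_times_s, fp8_step_times_s):
    """Return mean BF16 step time divided by mean FP8 step time.

    A value above 1.0 means FP8 is faster; below 1.0 means FP8 is slower.
    Both arguments are sequences of per-step wall-clock times in seconds.
    """
    bf16 = mean(bf16_step_times_s)
    fp8 = mean(fp8_step_times_s)
    if bf16 <= 0 or fp8 <= 0:
        raise ValueError("step times must be positive")
    return bf16 / fp8


def is_regression(bf16_step_times_s, fp8_step_times_s):
    """True when FP8 is slower than BF16, i.e. the behavior reported here."""
    return fp8_speedup(bf16_step_times_s, fp8_step_times_s) < 1.0
```

Reporting this ratio per model and per chip would show where the gap is largest.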

Affected area

area:perf

Regression?

Yes

Environment

No response

Logs

Metadata

Labels

bug: Something isn't working
needs-triage: New item needs classification and ownership
