[bug] Qwen3 FP8 worse than BF16

### Problem

Issue:
On the 26.02 release, FP8-MX underperforms BF16 on both the Qwen3 235B and Qwen3 30B models.

Affected Chips/Configurations:
Qwen3 235B: B300, B200, H100
Qwen3 30B: GB300, GB200, B200, H100

### Minimal repro

Launch the Qwen3 30B and Qwen3 235B runs via `setup_experiment.py`, once with BF16 and once with FP8-MX, and compare the performance of the two runs.

### Expected behavior

FP8-MX should be faster than BF16 for the same model and hardware configuration.
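One way to make "faster" concrete when comparing the two runs is a ratio of median step times. The sketch below is only an illustration: `fp8_speedup` and its `warmup` parameter are hypothetical names, not part of `setup_experiment.py`, and it assumes per-step wall-clock times have already been extracted from each run's logs.

```python
from statistics import median


def fp8_speedup(bf16_step_times, fp8_step_times, warmup=0):
    """Return median BF16 step time divided by median FP8 step time.

    Hypothetical helper for this issue: a result above 1.0 means the FP8 run
    is faster than BF16, a result below 1.0 means it is slower.
    """
    # Drop the first `warmup` steps, which often include compilation or autotuning.
    bf16 = list(bf16_step_times)[warmup:]
    fp8 = list(fp8_step_times)[warmup:]
    if not bf16 or not fp8:
        raise ValueError("need at least one post-warmup step time per run")
    # The median is less sensitive to occasional slow steps than the mean.
    return median(bf16) / median(fp8)
```

Under this definition the expected behavior is a ratio above 1.0 on every affected configuration, and a ratio below 1.0 would reproduce the regression reported here.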

### Affected area

area:perf

### Regression?

Yes

### Environment

_No response_

### Logs

```shell

```
