[Perf Regression] 26 config(s) regressed @ e79fe6f5 #1251

## Performance Regression Detected

**Commit:** `e79fe6f5`
**Run:** https://github.com/ROCm/ATOM/actions/runs/27639353976
**Date:** 2026-06-17T07:25:14.061422+00:00

### Regressed Configurations

| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|-----------|------------|---------|-----------|------------|---------|
| DeepSeek-R1-0528 | 1024/1024 | 4 | 260.0 | 337.8 | -23.0% | 14.96 | 11.40 | 31.2% |
| DeepSeek-R1-0528 | 1024/1024 | 256 | 6316.2 | 6342.6 | -0.4% | 38.90 | 38.80 | 0.3% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 535.8 | 585.6 | -8.5% | 13.96 | 13.02 | 7.2% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 457.8 | 585.0 | -21.7% | 8.06 | 6.32 | 27.5% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 16 | 1218.5 | 1284.4 | -5.1% | 12.29 | 11.62 | 5.8% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 32 | 1791.7 | 1770.2 | 1.2% | 17.23 | 17.49 | -1.5% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 128 | 4078.9 | 4042.4 | 0.9% | 30.12 | 30.50 | -1.3% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 64 | 1952.3 | 1924.9 | 1.4% | 30.35 | 31.15 | -2.6% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 1024 | 11607.8 | 12515.8 | -7.3% | 83.37 | 68.98 | 20.9% |
| DeepSeek-V4-Pro DPA | 8192/1024 | 1024 | 5016.6 | 4499.5 | 11.5% | 129.46 | 210.23 | -38.4% |
| DeepSeek-V4-Pro DPA TBO | 8192/1024 | 1024 | 5377.1 | 4538.3 | 18.5% | 131.64 | 211.77 | -37.8% |
| DeepSeek-V4-Pro MTP3 | 8192/1024 | 4 | 437.0 | 435.1 | 0.5% | 8.18 | 8.22 | -0.5% |
| GLM-5.1-MXFP4 | 1024/1024 | 64 | 1854.6 | 1858.3 | -0.2% | 33.08 | 33.13 | -0.2% |
| GLM-5.1-MXFP4 | 1024/1024 | 256 | 4123.8 | 4126.4 | -0.1% | 59.62 | 59.75 | -0.2% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 32 | 1698.8 | 1708.0 | -0.5% | 18.19 | 18.15 | 0.3% |
| Kimi-K2.5-MXFP4 | 8192/1024 | 256 | 2198.1 | 2186.7 | 0.5% | 109.50 | 111.40 | -1.7% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 8 | 527.0 | 529.9 | -0.5% | 14.69 | 14.63 | 0.5% |
| Qwen3.5-397B-A17B-FP8 | 1024/1024 | 16 | 1216.3 | 1235.0 | -1.5% | 12.74 | 12.56 | 1.4% |
| Qwen3.5-397B-A17B-FP8 MTP3 | 8192/1024 | 8 | 1255.8 | 1242.5 | 1.1% | 5.77 | 5.84 | -1.0% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 16 | 1301.0 | 1308.6 | -0.6% | 11.84 | 11.82 | 0.2% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 32 | 2035.9 | 2078.4 | -2.0% | 15.12 | 14.84 | 1.9% |
| gpt-oss-120b | 1024/1024 | 16 | 2764.1 | 2811.9 | -1.7% | 5.57 | 5.51 | 0.9% |
| gpt-oss-120b | 1024/1024 | 32 | 4278.9 | 4328.0 | -1.1% | 7.18 | 7.10 | 1.0% |
| gpt-oss-120b | 1024/1024 | 256 | 12047.0 | 12777.3 | -5.7% | 20.27 | 18.17 | 11.6% |
| gpt-oss-120b | 8192/1024 | 64 | 4735.0 | 4721.0 | 0.3% | 12.80 | 12.90 | -0.8% |
| gpt-oss-120b | 8192/1024 | 128 | 5881.9 | 5714.5 | 2.9% | 20.46 | 21.36 | -4.2% |
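
The Δ% columns are consistent with a plain relative change of the current value against the baseline. The sketch below reproduces them for a few rows; `pct_change` is a hypothetical helper, not the workflow's own code. It also assumes that higher throughput and lower TPOT are better, which is why a negative Tput Δ% or a positive TPOT Δ% reads as a slowdown.

```python
def pct_change(cur: float, base: float) -> float:
    """Relative change of `cur` versus `base`, in percent."""
    return (cur - base) / base * 100.0


# (label, Tput cur, Tput base, TPOT cur, TPOT base), copied from the table above.
rows = [
    ("DeepSeek-R1-0528 1024/1024 conc=4", 260.0, 337.8, 14.96, 11.40),
    ("DeepSeek-V4-Pro DPA 8192/1024 conc=1024", 5016.6, 4499.5, 129.46, 210.23),
    ("gpt-oss-120b 1024/1024 conc=256", 12047.0, 12777.3, 20.27, 18.17),
]

for label, tput_cur, tput_base, tpot_cur, tpot_base in rows:
    tput_delta = pct_change(tput_cur, tput_base)
    tpot_delta = pct_change(tpot_cur, tpot_base)
    # Assumed sign convention: throughput down or TPOT up means slower.
    slower = tput_delta < 0 or tpot_delta > 0
    print(f"{label}: Tput {tput_delta:+.1f}%, TPOT {tpot_delta:+.1f}%, slower={slower}")
```

Read this way, the two DeepSeek-V4-Pro DPA rows at 8192/1024 and concurrency 1024 show higher throughput and lower TPOT than the baseline, so they may be worth triaging separately from the rows that slowed down.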

### Performance Summary

```
# Trace Performance Summary

**File:** `DeepSeek-R1-0528_ts_20260617_073624_594.pt.trace.json.gz`

## Prefill

| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=866 ctx=866]` | 95.72 ms |
| 1 | `prefill[bs=3 tok=1962 ctx=[991, 1011, 936]]` | 89.99 ms |
| 2 | `prefill[bs=1 tok=886 ctx=886]` | 90.09 ms |
| 3 | `prefill[bs=1 tok=1014 ctx=1014]` | 83.89 ms |
| 4 | `prefill[bs=1 tok=922 ctx=922]` | 86.88 ms |
| 5 | `prefill[bs=1 tok=828 ctx=828]` | 86.26 ms |

**Total prefill:** 532.83 ms

## Decode

- **Iterations:** 1919
- **Mean:** 1.08 ms
- **Min:** 905.3 us
- **Max:** 3.61 ms
- **Total:** 2077.46 ms

```
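
As a quick sanity check on the summary above, the reported totals can be recomputed from the per-step numbers. This is a minimal sketch using only values copied from the summary; it assumes the prefill and decode totals come from the same capture window, so their ratio is meaningful.

```python
# Prefill step durations in ms, copied from the Prefill table.
prefill_ms = [95.72, 89.99, 90.09, 83.89, 86.88, 86.26]

# Decode figures copied from the Decode list.
decode_iterations = 1919
decode_total_ms = 2077.46

prefill_total_ms = sum(prefill_ms)
decode_mean_ms = decode_total_ms / decode_iterations
prefill_share = prefill_total_ms / (prefill_total_ms + decode_total_ms) * 100.0

print(f"Total prefill: {prefill_total_ms:.2f} ms")
print(f"Decode mean:   {decode_mean_ms:.2f} ms")
print(f"Prefill share of traced time: {prefill_share:.1f}%")
```

The recomputed prefill total and decode mean match the reported 532.83 ms and 1.08 ms, and decode accounts for roughly four fifths of the traced time in this file.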

### Profiler Traces

Download from [workflow artifacts](https://github.com/ROCm/ATOM/actions/runs/27639353976).
Open the traces in [Perfetto UI](https://ui.perfetto.dev/) or Chrome's `chrome://tracing` viewer for analysis.

### Next Steps

1. Download the `profiler-analysis-27639353976` artifact from the workflow run.
2. Open the trace files in Perfetto UI.
3. Compare kernel durations against previous traces.
4. Identify which kernels or phases account for the change.
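
The comparison in steps 3 and 4 can also be scripted instead of done by eye. The sketch below assumes the `.pt.trace.json.gz` files use the Chrome trace event format (complete events with `"ph": "X"` and a `dur` field in microseconds) and that GPU kernels carry the category `"kernel"`; both are assumptions to verify against an actual trace. The function names are hypothetical.

```python
import gzip
import json
from collections import defaultdict


def load_events(path):
    """Load trace events from a Chrome-trace JSON file, gzipped or plain."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # A trace is either an object with a "traceEvents" list or a bare list.
    return data["traceEvents"] if isinstance(data, dict) else data


def total_by_name(events, category="kernel"):
    """Sum durations (microseconds) of complete events per name.

    The "kernel" category is an assumption about how the profiler
    labels GPU kernels; adjust it after inspecting a real trace.
    """
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") == "X" and ev.get("cat") == category:
            totals[ev["name"]] += ev.get("dur", 0.0)
    return dict(totals)


def biggest_changes(cur, base, top=10):
    """Return (name, cur - base) pairs, largest absolute change first."""
    names = set(cur) | set(base)
    deltas = [(n, cur.get(n, 0.0) - base.get(n, 0.0)) for n in names]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)[:top]
```

A typical use would be `biggest_changes(total_by_name(load_events(cur_path)), total_by_name(load_events(base_path)))`, where `cur_path` and `base_path` are placeholders for the current and previous trace files. Kernels that appear in only one trace show up with their full duration as the delta, which helps spot added or removed kernels.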


