Performance Regression Detected
Commit: e79fe6f5
Run: https://github.com/ROCm/ATOM/actions/runs/27639353976
Date: 2026-06-17T07:25:14.061422+00:00
Regressed Configurations
| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-R1-0528 | 1024/1024 | 4 | 260.0 | 337.8 | -23.0% | 14.96 | 11.40 | 31.2% |
| DeepSeek-R1-0528 | 1024/1024 | 256 | 6316.2 | 6342.6 | -0.4% | 38.90 | 38.80 | 0.3% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 535.8 | 585.6 | -8.5% | 13.96 | 13.02 | 7.2% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 457.8 | 585.0 | -21.7% | 8.06 | 6.32 | 27.5% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 16 | 1218.5 | 1284.4 | -5.1% | 12.29 | 11.62 | 5.8% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 32 | 1791.7 | 1770.2 | 1.2% | 17.23 | 17.49 | -1.5% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 128 | 4078.9 | 4042.4 | 0.9% | 30.12 | 30.50 | -1.3% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 64 | 1952.3 | 1924.9 | 1.4% | 30.35 | 31.15 | -2.6% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 1024 | 11607.8 | 12515.8 | -7.3% | 83.37 | 68.98 | 20.9% |
| DeepSeek-V4-Pro DPA | 8192/1024 | 1024 | 5016.6 | 4499.5 | 11.5% | 129.46 | 210.23 | -38.4% |
| DeepSeek-V4-Pro DPA TBO | 8192/1024 | 1024 | 5377.1 | 4538.3 | 18.5% | 131.64 | 211.77 | -37.8% |
| DeepSeek-V4-Pro MTP3 | 8192/1024 | 4 | 437.0 | 435.1 | 0.5% | 8.18 | 8.22 | -0.5% |
| GLM-5.1-MXFP4 | 1024/1024 | 64 | 1854.6 | 1858.3 | -0.2% | 33.08 | 33.13 | -0.2% |
| GLM-5.1-MXFP4 | 1024/1024 | 256 | 4123.8 | 4126.4 | -0.1% | 59.62 | 59.75 | -0.2% |
| Kimi-K2.5-MXFP4 | 1024/1024 | 32 | 1698.8 | 1708.0 | -0.5% | 18.19 | 18.15 | 0.3% |
| Kimi-K2.5-MXFP4 | 8192/1024 | 256 | 2198.1 | 2186.7 | 0.5% | 109.50 | 111.40 | -1.7% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 8 | 527.0 | 529.9 | -0.5% | 14.69 | 14.63 | 0.5% |
| Qwen3.5-397B-A17B-FP8 | 1024/1024 | 16 | 1216.3 | 1235.0 | -1.5% | 12.74 | 12.56 | 1.4% |
| Qwen3.5-397B-A17B-FP8 MTP3 | 8192/1024 | 8 | 1255.8 | 1242.5 | 1.1% | 5.77 | 5.84 | -1.0% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 16 | 1301.0 | 1308.6 | -0.6% | 11.84 | 11.82 | 0.2% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 32 | 2035.9 | 2078.4 | -2.0% | 15.12 | 14.84 | 1.9% |
| gpt-oss-120b | 1024/1024 | 16 | 2764.1 | 2811.9 | -1.7% | 5.57 | 5.51 | 0.9% |
| gpt-oss-120b | 1024/1024 | 32 | 4278.9 | 4328.0 | -1.1% | 7.18 | 7.10 | 1.0% |
| gpt-oss-120b | 1024/1024 | 256 | 12047.0 | 12777.3 | -5.7% | 20.27 | 18.17 | 11.6% |
| gpt-oss-120b | 8192/1024 | 64 | 4735.0 | 4721.0 | 0.3% | 12.80 | 12.90 | -0.8% |
| gpt-oss-120b | 8192/1024 | 128 | 5881.9 | 5714.5 | 2.9% | 20.46 | 21.36 | -4.2% |
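
The Δ% values in the table are consistent with a relative change against the baseline, `(cur - base) / base * 100`. The sketch below reproduces the first row under that assumption. The function names and the 5% threshold are hypothetical; the report does not state the rule the CI actually uses to flag a configuration.

```python
def pct_delta(cur, base):
    """Relative change of cur against base, in percent."""
    return (cur - base) / base * 100.0


def is_regressed(tput_cur, tput_base, tpot_cur, tpot_base, threshold_pct=5.0):
    """Flag a row when throughput drops or TPOT rises past the threshold.

    threshold_pct is a hypothetical value, not the CI's actual setting.
    """
    tput_drop = -pct_delta(tput_cur, tput_base)  # lower throughput is worse
    tpot_rise = pct_delta(tpot_cur, tpot_base)   # higher TPOT is worse
    return tput_drop > threshold_pct or tpot_rise > threshold_pct


# First row of the table: DeepSeek-R1-0528, ISL/OSL 1024/1024, concurrency 4.
tput_delta = round(pct_delta(260.0, 337.8), 1)
tpot_delta = round(pct_delta(14.96, 11.40), 1)
print(tput_delta, tpot_delta)
```

Throughput and TPOT move in opposite directions when performance degrades, so the check negates the throughput delta before comparing it to the threshold.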
Performance Summary
# Trace Performance Summary
**File:** `DeepSeek-R1-0528_ts_20260617_073624_594.pt.trace.json.gz`
## Prefill
| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=866 ctx=866]` | 95.72 ms |
| 1 | `prefill[bs=3 tok=1962 ctx=[991, 1011, 936]]` | 89.99 ms |
| 2 | `prefill[bs=1 tok=886 ctx=886]` | 90.09 ms |
| 3 | `prefill[bs=1 tok=1014 ctx=1014]` | 83.89 ms |
| 4 | `prefill[bs=1 tok=922 ctx=922]` | 86.88 ms |
| 5 | `prefill[bs=1 tok=828 ctx=828]` | 86.26 ms |
**Total prefill:** 532.83 ms
## Decode
- **Iterations:** 1919
- **Mean:** 1.08 ms
- **Min:** 905.3 us
- **Max:** 3.61 ms
- **Total:** 2077.46 ms
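
As a quick consistency check on the summary above, the following sketch recomputes the prefill total and the decode mean from the reported figures. All numbers are copied from the tables; the variable names are only illustrative.

```python
# Prefill durations (ms) for steps 0-5, as listed in the Prefill table.
prefill_ms = [95.72, 89.99, 90.09, 83.89, 86.88, 86.26]
prefill_total_ms = sum(prefill_ms)

# Decode figures as reported in the Decode list.
decode_iterations = 1919
decode_total_ms = 2077.46
decode_mean_ms = decode_total_ms / decode_iterations

print(f"prefill total: {prefill_total_ms:.2f} ms")
print(f"decode mean:   {decode_mean_ms:.2f} ms")
```

Both values agree with the reported 532.83 ms prefill total and 1.08 ms decode mean.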
Profiler Traces
Download the traces from the workflow artifacts.
Open them in Perfetto UI or Chrome's `chrome://tracing` for analysis.
Next Steps
- Download the `profiler-analysis-27639353976` artifact
- Open trace files in Perfetto UI
- Compare kernel durations against previous traces
- Identify bottleneck changes
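
The kernel comparison in the steps above can also be scripted. This is a minimal sketch, assuming the `.pt.trace.json.gz` files follow the Chrome trace event format (a `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds) and that GPU kernels carry `cat == "kernel"`. The function names are hypothetical and not part of any ATOM tooling.

```python
import gzip
import json
from collections import defaultdict


def kernel_totals(trace_path):
    """Sum the duration (in microseconds) of each GPU kernel in one trace."""
    with gzip.open(trace_path, "rt", encoding="utf-8") as fh:
        trace = json.load(fh)
    # Chrome trace JSON is either {"traceEvents": [...]} or a bare event list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        # "X" marks a complete event; its "dur" field is in microseconds.
        # Assumption: kernels are tagged with cat == "kernel".
        if ev.get("ph") == "X" and ev.get("cat") == "kernel":
            totals[ev["name"]] += ev.get("dur", 0.0)
    return dict(totals)


def compare_kernels(base_path, cur_path, top=10):
    """Return (name, base_us, cur_us) for the kernels that grew the most."""
    base, cur = kernel_totals(base_path), kernel_totals(cur_path)
    rows = [
        (name, base.get(name, 0.0), cur.get(name, 0.0))
        for name in set(base) | set(cur)
    ]
    # Largest absolute increase in total kernel time first.
    rows.sort(key=lambda row: row[2] - row[1], reverse=True)
    return rows[:top]
```

Calling `compare_kernels(base_trace, cur_trace)` with a trace from a previous run and the one from this run gives a short list of kernels to inspect first in Perfetto UI. Sorting by absolute growth rather than percentage keeps short, noisy kernels from dominating the list.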