Performance Regression Detected
Commit: d7964d50
Run: https://github.com/ROCm/ATOM/actions/runs/27840294756
Date: 2026-06-20T05:32:27.026062+00:00
Regressed Configurations
| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|------------|-------------|---------|------------|-------------|---------|
| DeepSeek-R1-0528 | 8192/1024 | 4 | 240.7 | 293.4 | -18.0% | 15.91 | 12.99 | 22.5% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 593.0 | 583.5 | 1.6% | 12.79 | 13.06 | -2.1% |
| DeepSeek-R1-0528 | 8192/1024 | 32 | 1267.9 | 1271.8 | -0.3% | 23.76 | 23.81 | -0.2% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 4 | 599.4 | 665.4 | -9.9% | 6.25 | 5.70 | 9.8% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 479.3 | 568.8 | -15.7% | 7.65 | 6.46 | 18.3% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 32 | 1597.1 | 1762.5 | -9.4% | 18.86 | 17.05 | 10.6% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 4 | 352.8 | 440.9 | -20.0% | 10.81 | 8.67 | 24.7% |
| DeepSeek-R1-0528-MXFP4 | 8192/1024 | 16 | 857.5 | 872.8 | -1.8% | 17.39 | 17.27 | 0.7% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 8192/1024 | 4 | 514.7 | 571.0 | -9.9% | 6.96 | 6.34 | 9.8% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 8192/1024 | 32 | 1360.1 | 1472.1 | -7.6% | 20.20 | 20.36 | -0.8% |
| DeepSeek-V4-Pro | 1024/1024 | 8 | 513.9 | 529.5 | -2.9% | 15.00 | 14.59 | 2.8% |
| DeepSeek-V4-Pro | 1024/1024 | 64 | 2109.9 | 2142.2 | -1.5% | 29.12 | 28.71 | 1.4% |
| DeepSeek-V4-Pro | 1024/1024 | 256 | 4951.8 | 4923.6 | 0.6% | 49.68 | 50.09 | -0.8% |
| DeepSeek-V4-Pro | 8192/1024 | 8 | 463.7 | 475.7 | -2.5% | 16.32 | 16.04 | 1.7% |
| DeepSeek-V4-Pro | 8192/1024 | 16 | 740.3 | 744.0 | -0.5% | 20.24 | 20.19 | 0.2% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 128 | 3397.8 | 3377.4 | 0.6% | 34.87 | 35.61 | -2.1% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 256 | 5876.1 | 5832.5 | 0.8% | 40.63 | 41.27 | -1.5% |
| DeepSeek-V4-Pro DPA MTP3 | 1024/1024 | 1024 | 11081.4 | 10750.5 | 3.1% | 88.08 | 91.12 | -3.3% |
| DeepSeek-V4-Pro DPA TBO | 1024/1024 | 1024 | 11602.0 | 9920.2 | 16.9% | 77.60 | 98.52 | -21.2% |
| DeepSeek-V4-Pro MTP3 | 1024/1024 | 4 | 447.0 | 434.4 | 2.9% | 8.19 | 8.47 | -3.3% |
| DeepSeek-V4-Pro MTP3 | 1024/1024 | 8 | 828.4 | 814.2 | 1.7% | 9.02 | 9.31 | -3.1% |
| DeepSeek-V4-Pro MTP3 | 8192/1024 | 8 | 556.8 | 659.6 | -15.6% | 11.37 | 11.02 | 3.2% |
| GLM-5.2-FP8 | 1024/1024 | 8 | 491.5 | 490.9 | 0.1% | 15.67 | 15.84 | -1.1% |
| Kimi-K2.5-MXFP4 | 8192/1024 | 8 | 639.5 | 643.5 | -0.6% | 11.81 | 11.79 | 0.1% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 4 | 263.7 | 265.6 | -0.7% | 14.50 | 14.46 | 0.3% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 16 | 1002.1 | 1004.7 | -0.3% | 15.36 | 15.39 | -0.2% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 32 | 1805.5 | 1796.0 | 0.5% | 16.91 | 17.08 | -1.0% |
| MiniMax-M2.7 | 1024/1024 | 256 | 5620.0 | 5589.7 | 0.5% | 43.93 | 44.19 | -0.6% |
| Qwen3.5-397B-A17B-FP8 MTP3 | 8192/1024 | 32 | 2230.2 | 2176.2 | 2.5% | 13.16 | 13.67 | -3.8% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 256 | 6239.9 | 6158.6 | 1.3% | 39.67 | 40.16 | -1.2% |
| gpt-oss-120b | 1024/1024 | 64 | 6460.6 | 6539.9 | -1.2% | 9.49 | 9.37 | 1.2% |
| gpt-oss-120b | 1024/1024 | 256 | 12783.5 | 12284.4 | 4.1% | 18.12 | 19.99 | -9.3% |
| gpt-oss-120b | 8192/1024 | 4 | 863.6 | 874.0 | -1.2% | 4.38 | 4.34 | 0.9% |
| gpt-oss-120b | 8192/1024 | 16 | 2387.1 | 2430.6 | -1.8% | 6.25 | 6.20 | 0.8% |
| gpt-oss-120b | 8192/1024 | 128 | 5726.1 | 5777.5 | -0.9% | 21.20 | 21.11 | 0.4% |
| gpt-oss-120b | 8192/1024 | 256 | 6561.6 | 6534.3 | 0.4% | 36.68 | 37.18 | -1.3% |
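
The Δ% columns in the table above appear to be the relative change of the current value against the baseline. This is an inference from the numbers, not something the report states. A minimal sketch of that calculation follows; the function name `pct_delta` is hypothetical and not part of the CI tooling:

```python
def pct_delta(cur: float, base: float) -> float:
    """Relative change of cur against base, in percent, rounded to one decimal."""
    return round((cur - base) / base * 100.0, 1)


# First row of the table: DeepSeek-R1-0528, 8192/1024, concurrency 4.
tput_delta = pct_delta(240.7, 293.4)   # throughput dropped
tpot_delta = pct_delta(15.91, 12.99)   # time per output token rose
print(tput_delta, tpot_delta)
```

For throughput a negative Δ% is a slowdown, while for TPOT a positive Δ% is a slowdown. The table shows rounded inputs, so recomputing a Δ% from the displayed values can differ from the reported figure in the last digit.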
Performance Summary
# Trace Performance Summary
**File:** `DeepSeek-R1-0528_ts_20260620_054304_455.pt.trace.json.gz`
## Prefill
| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=5 ctx=7237]` | 89.85 ms |
| 1 | `prefill[bs=3 tok=16384 ctx=[7112, 7769, 1503]]` | 95.80 ms |
| 2 | `prefill[bs=1 tok=5885 ctx=7388]` | 91.64 ms |
| 3 | `prefill[bs=1 tok=7316 ctx=7316]` | 90.99 ms |
| 4 | `prefill[bs=1 tok=7936 ctx=7936]` | 88.51 ms |
| 5 | `prefill[bs=1 tok=7586 ctx=7586]` | 88.76 ms |
| 6 | `prefill[bs=1 tok=6830 ctx=6830]` | 86.04 ms |
**Total prefill:** 631.59 ms
## Decode
- **Iterations:** 1920
- **Mean:** 1.07 ms
- **Min:** 720.2 us
- **Max:** 4.54 ms
- **Total:** 2055.02 ms
Profiler Traces
Download from workflow artifacts.
Open in Perfetto UI or Chrome's `chrome://tracing` for analysis.
Next Steps
- Download the `profiler-analysis-27840294756` artifact
- Open trace files in Perfetto UI
- Compare kernel durations against previous traces
- Identify bottleneck changes
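
The kernel-duration comparison can also be scripted instead of done by eye in the UI. The sketch below assumes the `.pt.trace.json.gz` files use the Chrome trace event format (a `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds) and that GPU kernels carry `cat == "kernel"`. That category name is an assumption to verify against the actual trace. The helper names `kernel_totals` and `largest_slowdowns` are hypothetical.

```python
import gzip
import json
from collections import defaultdict


def kernel_totals(trace_path, category="kernel"):
    """Sum the duration (microseconds) of complete events per event name."""
    with gzip.open(trace_path, "rt", encoding="utf-8") as fh:
        trace = json.load(fh)
    # Chrome trace JSON is either an object with "traceEvents" or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        # "X" marks a complete event that carries its own "dur" field.
        if ev.get("ph") == "X" and ev.get("cat") == category:
            totals[ev["name"]] += ev.get("dur", 0.0)
    return dict(totals)


def largest_slowdowns(cur, base, top=10):
    """Return (name, base_us, cur_us, delta_us) rows, largest increase first."""
    rows = []
    for name in set(cur) | set(base):
        b, c = base.get(name, 0.0), cur.get(name, 0.0)
        rows.append((name, b, c, c - b))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:top]
```

Calling `largest_slowdowns(kernel_totals(current_path), kernel_totals(baseline_path))` lists the kernels whose total time grew the most. Kernels present in only one trace show up with a zero on the other side, which helps spot kernels that were added or removed between the two runs.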