
[Perf Regression] 36 config(s) regressed @ d7964d50 #1296

Description

@github-actions

Performance Regression Detected

Commit: d7964d50
Run: https://github.com/ROCm/ATOM/actions/runs/27840294756
Date: 2026-06-20T05:32:27.026062+00:00

Regressed Configurations

| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|------------|-------------|---------|------------|-------------|---------|
| DeepSeek-R1-0528 | 8192/1024 | 4 | 240.7 | 293.4 | -18.0% | 15.91 | 12.99 | 22.5% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 593.0 | 583.5 | 1.6% | 12.79 | 13.06 | -2.1% |
| DeepSeek-R1-0528 | 8192/1024 | 32 | 1267.9 | 1271.8 | -0.3% | 23.76 | 23.81 | -0.2% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 4 | 599.4 | 665.4 | -9.9% | 6.25 | 5.70 | 9.8% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 479.3 | 568.8 | -15.7% | 7.65 | 6.46 | 18.3% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 32 | 1597.1 | 1762.5 | -9.4% | 18.86 | 17.05 | 10.6% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 4 | 352.8 | 440.9 | -20.0% | 10.81 | 8.67 | 24.7% |
| DeepSeek-R1-0528-MXFP4 | 8192/1024 | 16 | 857.5 | 872.8 | -1.8% | 17.39 | 17.27 | 0.7% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 8192/1024 | 4 | 514.7 | 571.0 | -9.9% | 6.96 | 6.34 | 9.8% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 8192/1024 | 32 | 1360.1 | 1472.1 | -7.6% | 20.20 | 20.36 | -0.8% |
| DeepSeek-V4-Pro | 1024/1024 | 8 | 513.9 | 529.5 | -2.9% | 15.00 | 14.59 | 2.8% |
| DeepSeek-V4-Pro | 1024/1024 | 64 | 2109.9 | 2142.2 | -1.5% | 29.12 | 28.71 | 1.4% |
| DeepSeek-V4-Pro | 1024/1024 | 256 | 4951.8 | 4923.6 | 0.6% | 49.68 | 50.09 | -0.8% |
| DeepSeek-V4-Pro | 8192/1024 | 8 | 463.7 | 475.7 | -2.5% | 16.32 | 16.04 | 1.7% |
| DeepSeek-V4-Pro | 8192/1024 | 16 | 740.3 | 744.0 | -0.5% | 20.24 | 20.19 | 0.2% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 128 | 3397.8 | 3377.4 | 0.6% | 34.87 | 35.61 | -2.1% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 256 | 5876.1 | 5832.5 | 0.8% | 40.63 | 41.27 | -1.5% |
| DeepSeek-V4-Pro DPA MTP3 | 1024/1024 | 1024 | 11081.4 | 10750.5 | 3.1% | 88.08 | 91.12 | -3.3% |
| DeepSeek-V4-Pro DPA TBO | 1024/1024 | 1024 | 11602.0 | 9920.2 | 16.9% | 77.60 | 98.52 | -21.2% |
| DeepSeek-V4-Pro MTP3 | 1024/1024 | 4 | 447.0 | 434.4 | 2.9% | 8.19 | 8.47 | -3.3% |
| DeepSeek-V4-Pro MTP3 | 1024/1024 | 8 | 828.4 | 814.2 | 1.7% | 9.02 | 9.31 | -3.1% |
| DeepSeek-V4-Pro MTP3 | 8192/1024 | 8 | 556.8 | 659.6 | -15.6% | 11.37 | 11.02 | 3.2% |
| GLM-5.2-FP8 | 1024/1024 | 8 | 491.5 | 490.9 | 0.1% | 15.67 | 15.84 | -1.1% |
| Kimi-K2.5-MXFP4 | 8192/1024 | 8 | 639.5 | 643.5 | -0.6% | 11.81 | 11.79 | 0.1% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 4 | 263.7 | 265.6 | -0.7% | 14.50 | 14.46 | 0.3% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 16 | 1002.1 | 1004.7 | -0.3% | 15.36 | 15.39 | -0.2% |
| Llama-3.3-70B-Instruct-MXFP4 | 1024/1024 | 32 | 1805.5 | 1796.0 | 0.5% | 16.91 | 17.08 | -1.0% |
| MiniMax-M2.7 | 1024/1024 | 256 | 5620.0 | 5589.7 | 0.5% | 43.93 | 44.19 | -0.6% |
| Qwen3.5-397B-A17B-FP8 MTP3 | 8192/1024 | 32 | 2230.2 | 2176.2 | 2.5% | 13.16 | 13.67 | -3.8% |
| Qwen3.5-397B-A17B-MXFP4 | 1024/1024 | 256 | 6239.9 | 6158.6 | 1.3% | 39.67 | 40.16 | -1.2% |
| gpt-oss-120b | 1024/1024 | 64 | 6460.6 | 6539.9 | -1.2% | 9.49 | 9.37 | 1.2% |
| gpt-oss-120b | 1024/1024 | 256 | 12783.5 | 12284.4 | 4.1% | 18.12 | 19.99 | -9.3% |
| gpt-oss-120b | 8192/1024 | 4 | 863.6 | 874.0 | -1.2% | 4.38 | 4.34 | 0.9% |
| gpt-oss-120b | 8192/1024 | 16 | 2387.1 | 2430.6 | -1.8% | 6.25 | 6.20 | 0.8% |
| gpt-oss-120b | 8192/1024 | 128 | 5726.1 | 5777.5 | -0.9% | 21.20 | 21.11 | 0.4% |
| gpt-oss-120b | 8192/1024 | 256 | 6561.6 | 6534.3 | 0.4% | 36.68 | 37.18 | -1.3% |
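
The Δ% columns above appear to be the relative change of the current value against the baseline. A minimal sketch of that arithmetic follows; the function names are hypothetical and not part of the workflow that generated this report. Note the opposite directions: lower throughput is worse, while higher TPOT is worse. Values recomputed from the rounded table entries can differ from the reported Δ% in the last digit.

```python
def pct_change(cur: float, base: float) -> float:
    """Relative change of cur against base, in percent."""
    if base == 0:
        raise ValueError("baseline must be non-zero")
    return (cur - base) / base * 100.0


def got_worse(tput_delta_pct: float, tpot_delta_pct: float) -> bool:
    """True if either metric moved in its bad direction.

    Throughput regresses when it drops; TPOT regresses when it rises.
    No threshold is applied here, since the report's actual flagging
    rule is not stated.
    """
    return tput_delta_pct < 0 or tpot_delta_pct > 0


# First row of the table: DeepSeek-R1-0528, 8192/1024, concurrency 4.
tput_delta = pct_change(240.7, 293.4)
tpot_delta = pct_change(15.91, 12.99)
print(f"Tput {tput_delta:+.1f}%  TPOT {tpot_delta:+.1f}%")
```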

Performance Summary

# Trace Performance Summary

**File:** `DeepSeek-R1-0528_ts_20260620_054304_455.pt.trace.json.gz`

## Prefill

| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=5 ctx=7237]` | 89.85 ms |
| 1 | `prefill[bs=3 tok=16384 ctx=[7112, 7769, 1503]]` | 95.80 ms |
| 2 | `prefill[bs=1 tok=5885 ctx=7388]` | 91.64 ms |
| 3 | `prefill[bs=1 tok=7316 ctx=7316]` | 90.99 ms |
| 4 | `prefill[bs=1 tok=7936 ctx=7936]` | 88.51 ms |
| 5 | `prefill[bs=1 tok=7586 ctx=7586]` | 88.76 ms |
| 6 | `prefill[bs=1 tok=6830 ctx=6830]` | 86.04 ms |

**Total prefill:** 631.59 ms

## Decode

- **Iterations:** 1920
- **Mean:** 1.07 ms
- **Min:** 720.2 us
- **Max:** 4.54 ms
- **Total:** 2055.02 ms
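
As a rough illustration of how the decode figures relate to each other, the sketch below derives the same five statistics from a list of per-iteration durations in microseconds. The helper names and the sample input are hypothetical; the tool that produced this summary may compute them differently.

```python
from statistics import mean


def summarize_decode(durations_us):
    """Return iteration count, mean, min, max and total (all in microseconds)."""
    if not durations_us:
        raise ValueError("no decode iterations")
    return {
        "iterations": len(durations_us),
        "mean_us": mean(durations_us),
        "min_us": min(durations_us),
        "max_us": max(durations_us),
        "total_us": sum(durations_us),
    }


def fmt(us: float) -> str:
    """Format like the summary: microseconds below 1 ms, milliseconds otherwise."""
    return f"{us:.1f} us" if us < 1000 else f"{us / 1000:.2f} ms"


# Made-up sample, not taken from the trace.
stats = summarize_decode([720.2, 1000.0, 4540.0])
print(stats["iterations"], fmt(stats["min_us"]), fmt(stats["max_us"]))
```

As a consistency check on the reported numbers, 2055.02 ms over 1920 iterations gives about 1.07 ms per iteration, which matches the stated mean.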

Profiler Traces

Download the trace files from the workflow run's artifacts.
Open them in Perfetto UI or in Chrome at chrome://tracing for analysis.

Next Steps

  1. Download profiler-analysis-27840294756 artifact
  2. Open trace files in Perfetto UI
  3. Compare kernel durations against previous traces
  4. Identify bottleneck changes
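
Steps 3 and 4 can also be scripted instead of done by eye. The sketch below is one possible approach, assuming the traces use the Chrome Trace Event JSON format (complete events with `"ph": "X"` and `dur` in microseconds) and that GPU kernels carry `"cat": "kernel"`, as PyTorch profiler exports usually do. The function names and file paths are hypothetical.

```python
import gzip
import json
from collections import defaultdict


def kernel_totals(trace_path, category="kernel"):
    """Sum duration (us) per event name for one category of a trace file."""
    opener = gzip.open if str(trace_path).endswith(".gz") else open
    with opener(trace_path, "rt", encoding="utf-8") as fh:
        data = json.load(fh)
    # The format allows either {"traceEvents": [...]} or a bare event list.
    events = data["traceEvents"] if isinstance(data, dict) else data
    totals = defaultdict(float)
    for ev in events:
        # Only complete events ("X") carry a duration.
        if ev.get("ph") == "X" and ev.get("cat") == category:
            totals[ev.get("name", "<unnamed>")] += float(ev.get("dur", 0))
    return dict(totals)


def compare(base, cur, top=10):
    """Return (name, delta_us) pairs, largest slowdown first."""
    names = set(base) | set(cur)
    deltas = [(n, cur.get(n, 0.0) - base.get(n, 0.0)) for n in names]
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:top]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical paths): python compare_traces.py base.json.gz cur.json.gz
    if len(sys.argv) == 3:
        for name, delta in compare(kernel_totals(sys.argv[1]), kernel_totals(sys.argv[2])):
            print(f"{delta:+12.1f} us  {name}")
```

Summing by kernel name hides per-call variation, so a kernel that is called more often and one that became slower per call both show up as a positive delta; the trace viewer is still needed to tell them apart.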
