[Perf Regression] 16 config(s) regressed @ ef446036 #1338

Description

@github-actions

Performance Regression Detected

Commit: ef446036
Run: https://github.com/ROCm/ATOM/actions/runs/28044968211
Date: 2026-06-24T07:03:05.409832+00:00

Regressed Configurations

| Model | ISL/OSL | Conc | Tput (cur) | Tput (base) | Tput Δ% | TPOT (cur) | TPOT (base) | TPOT Δ% |
|-------|---------|------|------------|-------------|---------|------------|-------------|---------|
| DeepSeek-R1-0528 | 1024/1024 | 64 | 2873.2 | 2845.4 | 1.0% | 21.31 | 21.65 | -1.6% |
| DeepSeek-R1-0528 | 8192/1024 | 4 | 299.2 | 335.4 | -10.8% | 12.72 | 11.29 | 12.6% |
| DeepSeek-R1-0528 | 8192/1024 | 8 | 482.2 | 597.8 | -19.3% | 15.91 | 12.74 | 24.9% |
| DeepSeek-R1-0528 MTP3 | 1024/1024 | 16 | 1552.5 | 1455.5 | 6.7% | 9.73 | 10.49 | -7.3% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 4 | 599.6 | 637.2 | -5.9% | 6.00 | 5.62 | 6.7% |
| DeepSeek-R1-0528 MTP3 | 8192/1024 | 32 | 1546.7 | 1771.9 | -12.7% | 19.54 | 16.82 | 16.1% |
| DeepSeek-R1-0528-MXFP4 | 1024/1024 | 128 | 4076.0 | 4075.2 | 0.0% | 30.13 | 30.24 | -0.4% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 16 | 1542.8 | 1586.7 | -2.8% | 9.92 | 9.55 | 3.9% |
| DeepSeek-R1-0528-MXFP4 MTP3 | 1024/1024 | 32 | 2340.6 | 2297.6 | 1.9% | 12.92 | 13.25 | -2.5% |
| DeepSeek-V4-Pro | 1024/1024 | 128 | 3114.5 | 3100.5 | 0.5% | 39.12 | 39.79 | -1.7% |
| DeepSeek-V4-Pro | 8192/1024 | 8 | 442.7 | 467.4 | -5.3% | 17.13 | 16.25 | 5.4% |
| DeepSeek-V4-Pro DPA | 1024/1024 | 512 | 9015.3 | 8776.9 | 2.7% | 51.44 | 54.02 | -4.8% |
| DeepSeek-V4-Pro DPA TBO | 8192/1024 | 512 | 3984.1 | 4278.3 | -6.9% | 117.95 | 103.17 | 14.3% |
| DeepSeek-V4-Pro DPA TBO | 8192/1024 | 1024 | 4585.8 | 5407.7 | -15.2% | 209.79 | 127.69 | 64.3% |
| DeepSeek-V4-Pro MTP3 | 8192/1024 | 8 | 694.4 | 738.5 | -6.0% | 10.60 | 10.06 | 5.4% |
| GLM-5.1-MXFP4 | 1024/1024 | 4 | 262.2 | 266.2 | -1.5% | 14.24 | 14.24 | 0.0% |
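
To sanity-check the Δ% columns, the sketch below recomputes them for three rows copied from the table. It assumes Δ% is the relative change `(cur - base) / base`; the report does not state the formula or the threshold the workflow uses to flag a row, and the published values may have been computed from unrounded measurements, so the last digit can differ. The `pct_change` and `summarize` helpers are hypothetical names, not part of the CI workflow.

```python
def pct_change(cur: float, base: float) -> float:
    """Relative change of cur versus base, in percent (assumed formula)."""
    return (cur - base) / base * 100.0


# (label, tput_cur, tput_base, tpot_cur, tpot_base), copied from the table
rows = [
    ("DeepSeek-R1-0528 1024/1024 conc=64", 2873.2, 2845.4, 21.31, 21.65),
    ("DeepSeek-R1-0528 8192/1024 conc=8", 482.2, 597.8, 15.91, 12.74),
    ("DeepSeek-V4-Pro DPA TBO 8192/1024 conc=1024", 4585.8, 5407.7, 209.79, 127.69),
]


def summarize(table):
    """Return (label, throughput delta %, TPOT delta %) for each row."""
    return [
        (label, pct_change(t_cur, t_base), pct_change(p_cur, p_base))
        for label, t_cur, t_base, p_cur, p_base in table
    ]


for label, d_tput, d_tpot in summarize(rows):
    # A negative throughput delta or a positive TPOT delta means the run got slower.
    print(f"{label}: Tput {d_tput:+.1f}%, TPOT {d_tpot:+.1f}%")
```

Sorting the full table by the TPOT delta is a quick way to decide which configuration to profile first.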

Performance Summary

# Trace Performance Summary

**File:** `DeepSeek-R1-0528_ts_20260624_080334_016.pt.trace.json.gz`

## Prefill

| # | Label | Duration |
|---|-------|----------|
| 0 | `prefill[bs=1 tok=5 ctx=7237]` | 91.08 ms |
| 1 | `prefill[bs=3 tok=16384 ctx=[7388, 7316, 1680]]` | 95.91 ms |
| 2 | `prefill[bs=3 tok=16384 ctx=[7936, 7586, 2542]]` | 148.56 ms |
| 3 | `prefill[bs=3 tok=16384 ctx=[6830, 7112, 4984]]` | 97.20 ms |
| 4 | `prefill[bs=1 tok=2785 ctx=7769]` | 84.92 ms |
| 5 | `prefill[bs=1 tok=7152 ctx=7152]` | 91.31 ms |
| 6 | `prefill[bs=1 tok=7647 ctx=7647]` | 89.03 ms |
| 7 | `prefill[bs=1 tok=8049 ctx=8049]` | 90.51 ms |
| 8 | `prefill[bs=1 tok=7153 ctx=7153]` | 89.28 ms |
| 9 | `prefill[bs=1 tok=7973 ctx=7973]` | 88.93 ms |
| 10 | `prefill[bs=1 tok=6867 ctx=6867]` | 88.97 ms |
| 11 | `prefill[bs=1 tok=7258 ctx=7258]` | 88.28 ms |
| 12 | `prefill[bs=1 tok=8063 ctx=8063]` | 85.53 ms |

**Total prefill:** 1229.51 ms
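
The prefill labels are regular enough to parse mechanically. The following is a minimal sketch that splits a label into its `bs`, `tok`, and `ctx` fields and re-adds the durations from the table. The `parse_prefill` helper and the regular expression are assumptions based only on the labels shown here, not on the tool that produced the summary.

```python
import re

# Matches labels such as `prefill[bs=1 tok=5 ctx=7237]` and
# `prefill[bs=3 tok=16384 ctx=[7388, 7316, 1680]]`.
LABEL_RE = re.compile(r"prefill\[bs=(\d+) tok=(\d+) ctx=(\[[^\]]*\]|\d+)\]")


def parse_prefill(label: str) -> dict:
    """Split a prefill label into integer fields; ctx is always a list."""
    m = LABEL_RE.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    bs, tok, ctx = m.groups()
    ctx_list = [int(x) for x in ctx.strip("[]").split(",")]
    return {"bs": int(bs), "tok": int(tok), "ctx": ctx_list}


# Durations in ms, copied from rows 0..12 of the table.
durations_ms = [91.08, 95.91, 148.56, 97.20, 84.92, 91.31, 89.03,
                90.51, 89.28, 88.93, 88.97, 88.28, 85.53]

total_ms = sum(durations_ms)
slowest = max(range(len(durations_ms)), key=durations_ms.__getitem__)

print(parse_prefill("prefill[bs=3 tok=16384 ctx=[7936, 7586, 2542]]"))
print(f"total = {total_ms:.2f} ms, slowest row = {slowest}")
```

Row 2 stands out at 148.56 ms against a typical 85 to 97 ms, so it is a reasonable first entry to inspect in the trace.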

## Decode

- **Iterations:** 2009
- **Mean:** 874.5 us
- **Min:** 660.4 us
- **Max:** 7.40 ms
- **Total:** 1756.90 ms
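
As a quick consistency check on these figures, the mean should equal the total divided by the iteration count. The short sketch below also reports how far the slowest iteration sits from the mean; the variable names are illustrative only.

```python
iterations = 2009
total_ms = 1756.90
max_ms = 7.40

# Mean decode step in microseconds, derived from the total.
mean_us = total_ms / iterations * 1000.0

# How many times slower the worst iteration is than the average one.
max_over_mean = (max_ms * 1000.0) / mean_us

print(f"mean = {mean_us:.1f} us, max/mean = {max_over_mean:.1f}x")
```

A max several times larger than the mean suggests a small number of outlier iterations, which are easier to find in the trace timeline than in the aggregate statistics.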

Profiler Traces

Download from workflow artifacts.
Open them in Perfetto UI or Chrome's chrome://tracing for analysis.

Next Steps

  1. Download profiler-analysis-28044968211 artifact
  2. Open trace files in Perfetto UI
  3. Compare kernel durations against previous traces
  4. Identify bottleneck changes
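
Steps 3 and 4 can be partly scripted. The sketch below assumes the traces are gzip-compressed JSON in the Chrome Trace Event format, where complete events carry `"ph": "X"`, a `name`, and a `dur` in microseconds. The `kernel_totals` and `compare` helpers, the optional `cat` filter, and the baseline file name in the usage comment are hypothetical and not part of the workflow.

```python
import gzip
import json
from collections import defaultdict


def kernel_totals(path, cat=None):
    """Sum the duration (us) of complete events per event name.

    If cat is given, only events whose "cat" field equals it are counted.
    """
    with gzip.open(path, "rt", encoding="utf-8") as f:
        trace = json.load(f)
    # The format allows either {"traceEvents": [...]} or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for ev in events:
        if ev.get("ph") != "X" or "dur" not in ev:
            continue
        if cat is not None and ev.get("cat") != cat:
            continue
        totals[ev.get("name", "<unnamed>")] += ev["dur"]
    return dict(totals)


def compare(cur, base, top=10):
    """Return the names whose total duration grew the most, largest first."""
    names = set(cur) | set(base)
    deltas = [(n, cur.get(n, 0.0) - base.get(n, 0.0)) for n in names]
    return sorted(deltas, key=lambda item: item[1], reverse=True)[:top]


# Usage (the baseline path is a placeholder for a trace from an earlier run):
# cur = kernel_totals("DeepSeek-R1-0528_ts_20260624_080334_016.pt.trace.json.gz")
# base = kernel_totals("baseline.pt.trace.json.gz")
# for name, delta_us in compare(cur, base):
#     print(f"{delta_us:+12.1f} us  {name}")
```

Comparing totals by name only works when both traces cover a similar workload, so it is best used to shortlist kernels for a closer look in Perfetto.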
