test: adds tensilelite sia0 pgr2 characterization testing #8788

Open: davidd-amd wants to merge 3 commits into develop from
users/davidd-amd/tensilelite-sia0-pgr2-characterization


davidd-amd (Contributor) commented Jun 24, 2026

Summary

Add a CPU-only characterization that exercises the legacy SIA0 (ScheduleIterAlg=0)
PrefetchGlobalRead=2, non-TDM global-read / tail-reset emission path in TensileLite.
That path previously had no content-sensitive test: every designed config in the
characterization matrix pins ScheduleIterAlg: [3], and the subtile configs use the
LogicalScheduler, so a SIA0-only codegen change produced no CPU-PR-CI signal. PR #8417
("Fix SIA0 PGR2 global-read placement") changed exactly this path and reached develop
with no characterization diff. This PR closes that coverage gap.
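
The gap described above can be illustrated with a minimal sketch. It assumes each designed config can be viewed as a mapping with a `ScheduleIterAlg` list; the `matrix` entries and the helper name `covered_schedule_algs` are hypothetical and are not part of the TensileLite harness.

```python
# Hypothetical audit: which ScheduleIterAlg values does a config matrix pin?
# The dicts below are illustrative stand-ins, not the real designed configs.

def covered_schedule_algs(configs):
    """Return the set of ScheduleIterAlg values pinned by any config."""
    covered = set()
    for cfg in configs:
        covered.update(cfg.get("ScheduleIterAlg", []))
    return covered


# Before this PR: every designed config pins ScheduleIterAlg: [3].
matrix = [
    {"name": "designed_a", "ScheduleIterAlg": [3]},
    {"name": "designed_b", "ScheduleIterAlg": [3]},
]
missing_before = {0} - covered_schedule_algs(matrix)

# After this PR: one designed config pins ScheduleIterAlg: [0].
matrix_after = matrix + [{"name": "sia0_pgr2", "ScheduleIterAlg": [0]}]
missing_after = {0} - covered_schedule_algs(matrix_after)

print(missing_before, missing_after)
```

With only SIA3 configs in the matrix, `missing_before` contains `0`, which is why a SIA0-only change produced no snapshot diff.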

Risk Assessment

Risk 1. Test-only, non-shipping: adds one characterization test, one designed input
config, and its snapshot. No product/library code changes; the legacy SIA0 emission
path is exercised but not modified.

Related

Device / Architecture Coverage

CPU-only. The test drives the config-emit harness to generate gfx1250 AMDGCN assembly
text on the host (no GPU, no compile, no hardware). The pinned signal is a
Tensile-emitted comment, independent of the amdclang/hipcc version. Passing the
TensileLite unit lane in PR CI is sufficient; no specific-arch or sweep run is required.

Testing Summary

  • TensileLite characterization unit suite — the new test emits the SIA0 kernel and
    pins the SIA0 tail-reset markers.
  • Differential proof (fails-before / passes-after #8417, "[tensile] Fix SIA0 PGR2
    global-read placement") captured locally to show the guard catches the SIA0
    placement change class.

Testing Checklist

Adjacent Tests Considered

A gfx950 MX subtile SIA0 variant was built and rejected: emitting it pre- vs
post-#8417 produced byte-identical assembly, because UseSubtileImpl uses the
LogicalScheduler and bypasses the legacy SIA path #8417 modifies. The discriminating
config is therefore a non-subtile SIA0 kernel (gfx1250 F32X TN), derived from the
xfp32_gfx1250.yaml problem the #8417 author lists among the configs that changed.
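
A minimal sketch of the byte-identical check used to reject the subtile variant, assuming the pre- and post-#8417 emissions are available as strings. The helper names and the sample assembly text are hypothetical; the real emit harness is not shown here.

```python
import hashlib


def emission_digest(asm_text: str) -> str:
    """Hash the emitted assembly text so two emissions can be compared cheaply."""
    return hashlib.sha256(asm_text.encode("utf-8")).hexdigest()


def discriminates(pre_asm: str, post_asm: str) -> bool:
    """A candidate config is useful only if the change alters its emission."""
    return emission_digest(pre_asm) != emission_digest(post_asm)


# Illustrative stand-ins for emitted text (not real TensileLite output).
subtile_pre = "s_nop 0\n"
subtile_post = "s_nop 0\n"          # byte-identical: the config is rejected
legacy_pre = "s_nop 0\n"
legacy_post = "s_nop 0\ns_nop 1\n"  # differs: the config is kept

print(discriminates(subtile_pre, subtile_post))
print(discriminates(legacy_pre, legacy_post))
```

A config whose digest does not move across the change cannot guard that change, whatever its snapshot contains.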

Risk Acceptance / Waivers

W-NOTICKET — proactive coverage hardening; no defect ticket. The related change (#8417)
is linked and resolves.

Technical Changes

  • New designed config …/_designed/gfx1250/sia0_pgr2_xf32_tn.yaml: F32X TN, SIA0,
    PGR2, TDMInst=0 (non-TDM), StreamK=0, 1LDSBuffer=0 (SIA0 rejects 1LDSBuffer!=0),
    reduced to one MI shape / size for a cheap emit.
  • New test …/_codegen/test_r7_sia0_pgr2_placement_char.py: an emit smoke test
    (err==0, real gfx1250 asm) plus a golden snapshot of the Tensile-emitted
    Tail: local read reset offsets a/b markers per kernel.
  • Snapshot …/__snapshots__/test_r7_sia0_pgr2_placement_char.ambr.
  • Projection rationale: the tail-reset markers are emitted by the Python codegen (not
    the assembler), so they are toolchain-independent and flip on a change to the SIA0
    non-TDM tail-reset / placement logic, the precise class of change #8417
    ("[tensile] Fix SIA0 PGR2 global-read placement") made.
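
The projection can be sketched roughly as follows. This is an illustration under assumptions, not the test's actual code: the exact marker text, the `/* ... */` comment syntax in the sample, and the function name `project_tail_reset` are hypothetical. The keys `tail_lr_reset_a` and `tail_lr_reset_b` follow the names used in the follow-up commit message.

```python
import re

# Assumed marker shape: a Tensile-emitted comment ending in "a" or "b".
MARKER = re.compile(r"Tail: local read reset offsets (a|b)\b")


def project_tail_reset(asm_text: str) -> dict:
    """Reduce emitted assembly text to the two booleans the snapshot pins."""
    found = {match.group(1) for match in MARKER.finditer(asm_text)}
    return {"tail_lr_reset_a": "a" in found, "tail_lr_reset_b": "b" in found}


# Illustrative emissions (not real TensileLite output).
without_markers = "/* Tail: global read */\ns_nop 0\n"
with_markers = (
    "/* Tail: local read reset offsets a */\n"
    "s_nop 0\n"
    "/* Tail: local read reset offsets b */\n"
)

print(project_tail_reset(without_markers))
print(project_tail_reset(with_markers))
```

Pinning only these booleans keeps the snapshot small and insensitive to unrelated instruction changes, at the cost of ignoring everything outside the tail-reset markers.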

ROCM-27082

davidd-amd and others added 2 commits June 24, 2026 19:36
Add a CPU-only characterization that exercises the legacy SIA0
(ScheduleIterAlg=0) PrefetchGlobalRead=2 emission path. Every other
designed config in the gfx matrix pins ScheduleIterAlg: [3], and the
subtile configs use the LogicalScheduler, so the SIA0 path had no
content-sensitive test -- a SIA0-only codegen change produced no snapshot
diff and passed CPU PR CI unguarded.

The new designed config (gfx1250 F32X TN, SIA0, PGR2, TDMInst=0/non-TDM,
StreamK=0; derived from Tests/common/gemm/gfx12/xfp32_gfx1250.yaml) drives
both arms of the SIA0 placement logic. The golden snapshot pins the
Tensile-emitted "Tail: local read reset offsets a/b" markers, which are
emitted by the Python codegen (independent of the amdclang/hipcc version)
and flip on a change to the SIA0 non-TDM tail-reset behavior.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
davidd-amd requested a review from a team as a code owner June 24, 2026 20:23
davidd-amd changed the title from "Users/davidd amd/tensilelite sia0 pgr2 characterization" to "test: adds tensilelite sia0 pgr2 characterization testing" Jun 24, 2026
…avior

Update the SIA0 tail-reset golden to develop's current emission
(tail_lr_reset_a/b = True), so the characterization is green on develop and
guards against future regressions of the SIA0 PGR2 non-TDM placement.

The snapshot was first captured against #8417's parent (markers False) to
prove the guard fails-before / passes-after the fix; this commit pins the
post-fix behavior that should hold going forward.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
codecov-commenter commented Jun 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.89%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8788      +/-   ##
===========================================
+ Coverage    71.45%   71.47%   +0.03%     
===========================================
  Files         2612     2612              
  Lines       407014   407795     +781     
  Branches     60772    60983     +211     
===========================================
+ Hits        290794   291466     +672     
- Misses       94937    94995      +58     
- Partials     21283    21334      +51     
Flag         Coverage Δ           *Carryforward flag
TensileLite  76.95% <ø> (+0.14%)  ⬆️
hipBLAS      90.81% <ø> (ø)       Carriedforward from 9c32748
hipBLASLt    41.36% <ø> (-0.03%)  ⬇️
hipCUB       82.68% <ø> (ø)       Carriedforward from 9c32748
hipDNN       86.74% <ø> (ø)       Carriedforward from 9c32748
hipFFT       50.17% <ø> (ø)       Carriedforward from 9c32748
hipRAND      76.12% <ø> (ø)       Carriedforward from 9c32748
hipSOLVER    69.18% <ø> (ø)       Carriedforward from 9c32748
hipSPARSE    86.55% <ø> (ø)       Carriedforward from 9c32748
rocBLAS      48.08% <ø> (ø)       Carriedforward from 9c32748
rocFFT       47.40% <ø> (ø)       Carriedforward from 9c32748
rocRAND      57.07% <ø> (ø)       Carriedforward from 9c32748
rocSOLVER    77.89% <ø> (ø)       Carriedforward from 9c32748
rocSPARSE    72.37% <ø> (ø)       Carriedforward from 9c32748
rocThrust    91.34% <ø> (ø)       Carriedforward from 9c32748

*This pull request uses carry forward flags.
see 15 files with indirect coverage changes
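
The headline figures in the coverage diff above can be re-derived from the reported counts. This sketch assumes coverage is computed as hits divided by lines, with lines being the sum of hits, misses, and partials, which is consistent with the numbers in the report. It also shows why the delta prints as +0.03% even though the rounded percentages differ by 0.02.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


# Counts copied from the report: (hits, misses, partials).
develop = (290794, 94937, 21283)
head = (291466, 94995, 21334)

develop_pct = coverage_pct(develop[0], sum(develop))
head_pct = coverage_pct(head[0], sum(head))
delta = head_pct - develop_pct  # taken on unrounded values

print(f"{develop_pct:.2f}% -> {head_pct:.2f}% ({delta:+.2f}%)")
```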

