test: adds tensilelite sia0 pgr2 characterization testing#8788
Open
davidd-amd wants to merge 3 commits into
Open
test: adds tensilelite sia0 pgr2 characterization testing#8788davidd-amd wants to merge 3 commits into
davidd-amd wants to merge 3 commits into
Conversation
Add a CPU-only characterization that exercises the legacy SIA0 (ScheduleIterAlg=0) PrefetchGlobalRead=2 emission path. Every other designed config in the gfx matrix pins ScheduleIterAlg: [3], and the subtile configs use the LogicalScheduler, so the SIA0 path had no content-sensitive test -- a SIA0-only codegen change produced no snapshot diff and passed CPU PR CI unguarded. The new designed config (gfx1250 F32X TN, SIA0, PGR2, TDMInst=0/non-TDM, StreamK=0; derived from Tests/common/gemm/gfx12/xfp32_gfx1250.yaml) drives both arms of the SIA0 placement logic. The golden snapshot pins the Tensile-emitted "Tail: local read reset offsets a/b" markers, which are emitted by the Python codegen (independent of the amdclang/hipcc version) and flip on a change to the SIA0 non-TDM tail-reset behavior. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…avior Update the SIA0 tail-reset golden to develop's current emission (tail_lr_reset_a/b = True), so the characterization is green on develop and guards against future regressions of the SIA0 PGR2 non-TDM placement. The snapshot was first captured against #8417's parent (markers False) to prove the guard fails-before / passes-after the fix; this commit pins the post-fix behavior that should hold going forward. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (77.89%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #8788 +/- ##
===========================================
+ Coverage 71.45% 71.47% +0.03%
===========================================
Files 2612 2612
Lines 407014 407795 +781
Branches 60772 60983 +211
===========================================
+ Hits 290794 291466 +672
- Misses 94937 94995 +58
- Partials 21283 21334 +51
*This pull request uses carry forward flags. Click here to find out more. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a CPU-only characterization that exercises the legacy SIA0 (
ScheduleIterAlg=0)PrefetchGlobalRead=2, non-TDM global-read / tail-reset emission path in TensileLite.That path previously had no content-sensitive test: every designed config in the
characterization matrix pins
ScheduleIterAlg: [3], and the subtile configs use theLogicalScheduler, so a SIA0-only codegen change produced no CPU-PR-CI signal. PR #8417
("Fix SIA0 PGR2 global-read placement") changed exactly this path and reached develop
with no characterization diff. This PR closes that coverage gap.
Risk Assessment
Risk 1. Test-only, non-shipping: adds one characterization test, one designed input
config, and its snapshot. No product/library code changes; the legacy SIA0 emission
path is exercised but not modified.
Related
used as the fails-before/passes-after reference point)
proactive test-coverage hardening with no separate defect ticket.
Device / Architecture Coverage
CPU-only. The test drives the config-emit harness to generate gfx1250 AMDGCN assembly
text on the host (no GPU, no compile, no hardware). The pinned signal is a
Tensile-emitted comment, independent of the amdclang/hipcc version. Passing the
TensileLite unit lane in PR CI is sufficient; no specific-arch or sweep run is required.
Testing Summary
pins the SIA0 tail-reset markers.
guard catches the SIA0 placement change class.
Testing Checklist
pytest Tensile/Tests/unit/characterization/_codegen/test_r7_sia0_pgr2_placement_char.py- Status: Passed (local tl container, rocisa rebuilt against merged tree)False→pinned), confirming the guardAdjacent Tests Considered
A gfx950 MX subtile SIA0 variant was built and rejected: emitting it pre- vs
post-#8417 produced byte-identical assembly, because
UseSubtileImpluses theLogicalScheduler and bypasses the legacy SIA path #8417 modifies. The discriminating
config is therefore a non-subtile SIA0 kernel (gfx1250 F32X TN), derived from the
xfp32_gfx1250.yamlproblem the #8417 author lists among the configs that changed.Risk Acceptance / Waivers
W-NOTICKET — proactive coverage hardening; no defect ticket. The related change (#8417)
is linked and resolves.
Technical Changes
…/_designed/gfx1250/sia0_pgr2_xf32_tn.yaml: F32X TN, SIA0,PGR2,
TDMInst=0(non-TDM),StreamK=0,1LDSBuffer=0(SIA0 rejects1LDSBuffer!=0),reduced to one MI shape / size for a cheap emit.
…/_codegen/test_r7_sia0_pgr2_placement_char.py: an emit smoke test(
err==0, real gfx1250 asm) plus a golden snapshot of the Tensile-emittedTail: local read reset offsets a/bmarkers per kernel.…/__snapshots__/test_r7_sia0_pgr2_placement_char.ambr.the assembler), so they are toolchain-independent and flip on a change to the SIA0
non-TDM tail-reset / placement logic — the precise class of change [tensile] Fix SIA0 PGR2 global-read placement #8417 made.
ROCM-27082