Skip to content

feat(tensilelite): Add gfx1250 support for AdaptiveGemmNTAB#8421

Open
AndySu12 wants to merge 3 commits into
developfrom
users/andysu12/gfx1250_ntab
Open

feat(tensilelite): Add gfx1250 support for AdaptiveGemmNTAB#8421
AndySu12 wants to merge 3 commits into
developfrom
users/andysu12/gfx1250_ntab

Conversation

@AndySu12

@AndySu12 AndySu12 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

AIHPBLAS-879
JIRA ID : https://amd-hub.atlassian.net/browse/AIHPBLAS-879

Motivation

gfx1250 adds TemporalHint and NonVolatile cache modifiers that AdaptiveGemmNTAB uses for per-body cache behavior. AGNTAB emits three main-loop bodies differing only in their NonTemporal/TemporalHint state, but the hint was set by mutating immutable solution parameters and the loop labels collided across bodies. More importantly, every gfx1250 kernel is lowered through StinkyTofu, whose MUBUF model dropped th/nv, silently disabling the hint on all gfx1250 kernels. This PR makes AGNTAB cache hints generate correctly and survive the full rocisa → StinkyTofu → assembly pipeline.

Technical Details

In Tensilelite, the per-body TemporalHint now uses the transient tensorParameters (tP) working copy instead of mutating kernel object, and unrolled-loop labels are made NTAB-specific. StinkyTofu's MUBUFModifiers is extended to carry th/nv, plumbed through the rocisa conversion, asm emitter, and stir round-trip with isStore-aware naming. StinkyTofu emits purely from the values; capability gating (HasTHModifier/HasNVModifier) stays upstream in rocisa. The stir deserialization recovers isStore from the th token prefix so store hints aren't downgraded on round-trip.

Test Plan

test_mubuf.py, agntab_coverage_gfx1250.yaml, tox, hipblaslt-test

Test Result

PASSED

Submission Checklist

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends TensileLite/Tensile + StinkyTofu/rocisa plumbing to support gfx1250-specific cache-hint semantics needed by AdaptiveGemm / AdaptiveGemmNTAB, including temporal-hint (th:TH_*) and non-volatile (nv) modifiers, and adds a gfx1250 coverage YAML for AGNTAB.

Changes:

  • Add NonVolatile support to StinkyTofu MUBUF modifiers and ensure it round-trips through asm emission + (de)serialization.
  • Update Tensile KernelWriter to avoid label collisions under AdaptiveGemmNTAB and to map NTAB selections to gfx1250 temporal hints when supported.
  • Add gfx1250 AGNTAB coverage test YAML and extend rocisa tests for th and nv emission.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
shared/stinkytofu/src/serialization/asm/StinkyAsmEmitter.cpp Emit nv token for MUBUF modifiers when present.
shared/stinkytofu/src/serialization/asm/ModifierSerializer.cpp Serialize/deserialize nv and use store/load-specific temporal-hint strings.
shared/stinkytofu/src/conversion/rocisa/ToStinkyTofuUtils.cpp Convert rocisa nv into StinkyTofu and minor loop-body detection tweak.
shared/stinkytofu/include/stinkytofu/ir/asm/StinkyModifiers.hpp Introduce NonVolatile enum + parse/print helpers; extend MUBUFModifiers.
projects/hipblaslt/tensilelite/Tensile/Tests/common/gemm/gfx12/agntab_coverage_gfx1250.yaml New gfx1250 AGNTAB/AG coverage configuration.
projects/hipblaslt/tensilelite/Tensile/KernelWriter.py Make unrolled-loop labels NTAB-specific; adjust TH(A/B) under NTAB when supported.
projects/hipblaslt/tensilelite/rocisa/test/test_mubuf.py Add StinkyTofu emission tests for MUBUF temporal-hint and non-volatile modifiers.
projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Remove TH/NV asmCaps gating for FLAT/MUBUF/SMEM modifier emission (needs fixes).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
Comment thread projects/hipblaslt/tensilelite/rocisa/rocisa/include/container.hpp Outdated
@codecov-commenter

codecov-commenter commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...ects/hipblaslt/tensilelite/Tensile/KernelWriter.py 9.09% 10 Missing ⚠️

❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8421      +/-   ##
===========================================
- Coverage    71.33%   71.33%   -0.00%     
===========================================
  Files         2628     2628              
  Lines       413045   413056      +11     
  Branches     61875    61878       +3     
===========================================
+ Hits        294615   294616       +1     
- Misses       96656    96666      +10     
  Partials     21774    21774              
Flag Coverage Δ *Carryforward flag
TensileLite 76.64% <9.09%> (-0.01%) ⬇️
hipBLAS 90.81% <ø> (ø) Carriedforward from 0fd8b2c
hipBLASLt 41.35% <ø> (ø)
hipCUB 82.68% <ø> (ø) Carriedforward from 0fd8b2c
hipDNN 85.91% <ø> (ø) Carriedforward from 0fd8b2c
hipFFT 50.17% <ø> (ø) Carriedforward from 0fd8b2c
hipRAND 76.12% <ø> (ø) Carriedforward from 0fd8b2c
hipSOLVER 69.18% <ø> (ø) Carriedforward from 0fd8b2c
hipSPARSE 86.55% <ø> (ø) Carriedforward from 0fd8b2c
rocBLAS 48.06% <ø> (ø) Carriedforward from 0fd8b2c
rocFFT 46.30% <ø> (ø) Carriedforward from 0fd8b2c
rocRAND 57.07% <ø> (ø) Carriedforward from 0fd8b2c
rocSOLVER 76.92% <ø> (ø) Carriedforward from 0fd8b2c
rocSPARSE 72.37% <ø> (ø) Carriedforward from 0fd8b2c
rocThrust 91.36% <ø> (ø) Carriedforward from 0fd8b2c

*This pull request uses carry forward flags. Click here to find out more.

Files with missing lines Coverage Δ
...blaslt/tensilelite/Tensile/KernelWriterAssembly.py 69.04% <ø> (ø)
...ects/hipblaslt/tensilelite/Tensile/KernelWriter.py 70.73% <9.09%> (-0.09%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread projects/hipblaslt/tensilelite/Tensile/KernelWriter.py Outdated
Comment thread shared/stinkytofu/src/conversion/rocisa/ToStinkyTofuUtils.cpp
@AndySu12 AndySu12 force-pushed the users/andysu12/gfx1250_ntab branch 7 times, most recently from 94ac370 to 74d72c9 Compare June 17, 2026 15:47
@AndySu12 AndySu12 requested review from KKyang, Copilot and nakajee June 17, 2026 15:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread shared/stinkytofu/src/serialization/asm/ModifierSerializer.cpp Outdated

@KKyang KKyang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look the comment above

@AndySu12 AndySu12 changed the title [tensilelite] Add gfx1250 support for AdaptiveGemm and AdaptiveGemmNTAB [tensilelite] Add gfx1250 support for AdaptiveGemmNTAB Jun 23, 2026
@AndySu12 AndySu12 force-pushed the users/andysu12/gfx1250_ntab branch 2 times, most recently from f6ba794 to d4d0c00 Compare June 23, 2026 09:12
@AndySu12 AndySu12 requested a review from Copilot June 23, 2026 09:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

@AndySu12 AndySu12 force-pushed the users/andysu12/gfx1250_ntab branch from d4d0c00 to a5f8a4b Compare June 24, 2026 00:35
@AndySu12 AndySu12 requested a review from talumbau June 25, 2026 04:20
@AndySu12 AndySu12 requested review from a team and jichangjichang June 25, 2026 06:42
@AndySu12 AndySu12 force-pushed the users/andysu12/gfx1250_ntab branch from a5f8a4b to f73ca4f Compare June 30, 2026 07:10
@therock-pr-bot

therock-pr-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

✅ All Checks Passed — Ready for Review

Check Status Details
🌿 Branch Name ✅ Pass
📝 PR Title/Description ✅ Pass
Forbidden Files ✅ Pass
🧪 Unit Test ✅ Pass
🔎 pre-commit ✅ Pass
🚫 Draft PR 🔜 To Be Enabled
🚩 Feature Flag 🔜 To Be Enabled
📊 Code Coverage 🔜 To Be Enabled
🤖 therock-pr-bot ✅ Pass

🎉 All checks passed! This PR is ready for review.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

@therock-pr-bot

therock-pr-bot Bot commented Jun 30, 2026

Copy link
Copy Markdown

🎉 All checks passed! This PR is ready for review.

@AndySu12 AndySu12 changed the title [tensilelite] Add gfx1250 support for AdaptiveGemmNTAB feat(tensilelite) Add gfx1250 support for AdaptiveGemmNTAB Jun 30, 2026
@AndySu12 AndySu12 changed the title feat(tensilelite) Add gfx1250 support for AdaptiveGemmNTAB feat(tensilelite): add gfx1250 support for AdaptiveGemmNTAB Jun 30, 2026
@AndySu12 AndySu12 changed the title feat(tensilelite): add gfx1250 support for AdaptiveGemmNTAB feat(tensilelite): Add gfx1250 support for AdaptiveGemmNTAB Jun 30, 2026
AndySu12 added 3 commits July 1, 2026 16:59
StinkyTofu's MUBUF modifier model did not carry the gfx1250 TemporalHint
(th:) or NonVolatile (nv) fields, so converting a rocisa module via
toStinkyTofuModule and rendering it with emitAssembly dropped them while
keeping scope:. Because every gfx1250 kernel is emitted through StinkyTofu
for any ScheduleIterAlg, this silently disabled the AdaptiveGemmNTAB
per-body non-temporal hint (th:TH_LOAD_NT) on all gfx1250 kernels.

Add TemporalHint/NonVolatile to StinkyTofu's MUBUFModifiers across the
model, the rocisa->StinkyTofu conversion, the asm emitter, and the .stir
round-trip, mirroring rocisa's encoding and isStore-aware naming. StinkyTofu
emits th:/nv purely from the modifier values; capability gating lives in
rocisa (container.hpp) upstream. Add regression tests for the th: and nv
paths.
…working copy

AdaptiveGemmNTAB emits multiple loop bodies whose A/B global loads differ
only by their non-temporal/temporal-hint state. NonTemporal already switches
this per body via the transient tensorParameters["NonTemporal"] working copy;
TemporalHint, however, was toggled by mutating the kernel solution parameters
kernel["TemporalHintA"/"TemporalHintB"] directly, which is an architectural
violation (solution parameters are immutable inputs, not per-body scratch).

Mirror the NonTemporal pattern: seed tP["TemporalHint"] alongside
tP["NonTemporal"] at tensorParameters init, read it on the A/B global-read
paths (TemporalHint(tP["TemporalHint"])), and snapshot/set/restore it at the
tensorParameters level inside the NTAB body loop. The kernel-level
TemporalHint solution parameters are no longer mutated.

Behavior is unchanged: the same per-body int values flow through the same
TemporalHint enum into decodeNonTemporal; only the storage location moves
from the kernel dict to the per-tensor working copy. The kernel-direct load
path (non-tP) keeps reading the solution parameter as before.
@AndySu12 AndySu12 force-pushed the users/andysu12/gfx1250_ntab branch from f73ca4f to 039fe23 Compare July 1, 2026 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants