Fix naming bug that would incorrectly discard kernels as duplicates #8745

Open
AlexBrownAMD wants to merge 3 commits into develop from users/alexbrownamd/WGMHang

@AlexBrownAMD

Contributor

Motivation

Fixes a bug that could cause kernels to hang or produce invalid results.

Technical Details

The WGMXCC value was ignored in the kernel name because it was flagged as an internal arg. However, WGMXCC=-1 generates different assembly code than WGMXCC set to a fixed value: one version generates code to handle chunking, while the other uses regular XCC mapping.

If WGMXCC is left out of the name, kernels with different values are discarded as duplicates, even though they contain different assembly code. This can lead to errors like kernel hangs or validation errors if a kernel with regular xcc mapping is called with arguments meant for the chunking algorithm.
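The collision described above can be sketched as follows. This is a minimal illustration, not the Tensile implementation: `kernel_name`, `dedupe_by_name`, `INTERNAL_ARGS`, and the `Tile` parameter are hypothetical names chosen for the example.

```python
# Illustrative sketch only; every name here is hypothetical, not a Tensile API.

INTERNAL_ARGS = {"WGMXCC"}  # assumption: parameters left out of the kernel name


def kernel_name(params, internal_args=INTERNAL_ARGS):
    """Build a name from every parameter that is not flagged as internal."""
    parts = [
        f"{key}{value}"
        for key, value in sorted(params.items())
        if key not in internal_args
    ]
    return "_".join(parts)


def dedupe_by_name(kernels, internal_args=INTERNAL_ARGS):
    """Keep the first kernel seen for each name and drop the rest."""
    unique = {}
    for params in kernels:
        unique.setdefault(kernel_name(params, internal_args), params)
    return list(unique.values())


chunking = {"Tile": "128x128", "WGMXCC": -1}  # chunking code path
regular = {"Tile": "128x128", "WGMXCC": 8}    # regular xcc mapping

# Both kernels get the same name, so only the first one survives.
survivors = dedupe_by_name([chunking, regular])
```

In this sketch the regular-mapping kernel is silently dropped, so a later lookup by name would return the chunking kernel regardless of which one the caller expected.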

This change includes WGMXCC in the kernel name as either WGMXCCn1 for -1 (chunking) or WGMXCC1 for any other fixed value, since all fixed values produce the same kernel code.
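A minimal sketch of this naming rule, assuming a hypothetical helper `wgmxcc_name_token` and a placeholder base name; the actual change lives in the Tensile naming code and may be structured differently.

```python
# Illustrative sketch only; the helper names are hypothetical, not Tensile APIs.


def wgmxcc_name_token(wgmxcc: int) -> str:
    """Map a WGMXCC value to the token used in the kernel name."""
    # -1 selects the chunking code path and gets its own token.
    if wgmxcc == -1:
        return "WGMXCCn1"
    # Every other fixed value yields the same kernel code, so they share a token.
    return "WGMXCC1"


def kernel_name(base_name: str, wgmxcc: int) -> str:
    """Append the WGMXCC token to a placeholder base name."""
    return f"{base_name}_{wgmxcc_name_token(wgmxcc)}"


# Names for the chunking value and a few fixed values.
names = {value: kernel_name("Kernel", value) for value in (-1, 1, 4, 8)}
```

Collapsing all fixed values onto one token keeps kernels with identical code deduplicated, while the chunking variant keeps a distinct name.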

Test Plan

Tested locally and it fixes the kernel hang that was discovered. Running CI to verify other cases.

@nakajee

nakajee commented Jun 24, 2026

Contributor

Is this the right approach?
Shouldn't we make asm code independent from WorkGroupMappingXCC?

@codecov-commenter

codecov-commenter commented Jun 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.89%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8745   +/-   ##
========================================
  Coverage    71.50%   71.50%           
========================================
  Files         2612     2612           
  Lines       407895   407895           
  Branches     60982    60982           
========================================
  Hits        291632   291632           
  Misses       94879    94879           
  Partials     21384    21384           
| Flag | Coverage Δ | *Carryforward flag |
| --- | --- | --- |
| TensileLite | 76.92% <ø> (ø) | Carriedforward from 3bda378 |
| hipBLAS | 90.81% <ø> (ø) | Carriedforward from 3bda378 |
| hipBLASLt | 41.36% <ø> (ø) | |
| hipCUB | 82.68% <ø> (ø) | Carriedforward from 3bda378 |
| hipDNN | 86.74% <ø> (ø) | Carriedforward from 3bda378 |
| hipFFT | 50.17% <ø> (ø) | Carriedforward from 3bda378 |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from 3bda378 |
| hipSOLVER | 69.18% <ø> (ø) | Carriedforward from 3bda378 |
| hipSPARSE | 86.55% <ø> (ø) | Carriedforward from 3bda378 |
| rocBLAS | 48.49% <ø> (ø) | Carriedforward from 3bda378 |
| rocFFT | 47.16% <ø> (ø) | Carriedforward from 3bda378 |
| rocRAND | 57.07% <ø> (ø) | Carriedforward from 3bda378 |
| rocSOLVER | 77.89% <ø> (ø) | Carriedforward from 3bda378 |
| rocSPARSE | 72.37% <ø> (ø) | Carriedforward from 3bda378 |
| rocThrust | 91.34% <ø> (ø) | Carriedforward from 3bda378 |

*This pull request uses carry forward flags.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...t/tensilelite/Tensile/Common/RequiredParameters.py | 100.00% <ø> (ø) |
| ...aslt/tensilelite/Tensile/SolutionStructs/Naming.py | 98.33% <ø> (ø) |

@AlexBrownAMD

Contributor Author

> Is this the right approach? Shouldn't we make asm code independent from WorkGroupMappingXCC?

There are two XCC mapping algorithms: regular XCC mapping and the new chunking algorithm. For now, I think both are needed, since we have many kernels tuned with each of them.


4 participants