
fix: [hipblaslt] mi350P performance gap #8973

Open
yenong-amd wants to merge 3 commits into develop from users/yenong-amd/mi350P_lib_changes

yenong-amd (Contributor) commented Jun 30, 2026

JIRA ID: AIHPBLAS-1868
JIRA ID: AIHPBLAS-3634
JIRA ID: AIHPBLAS-3635

Motivation

Library logic changes to improve performance for mi350P.

Technical Details

Replaced the small MT 16x16x256, 32x16x256, and 32x32x256 solutions in BBS with variants that use higher PGR values.
Added MT 384x160x64, 384x128x64, and 384x64x64 solutions in BBS.
Removed MT 128x128x256.
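The tile changes listed above can be cross-checked against the logic YAMLs mechanically. Below is a minimal Python sketch, assuming the MT<MT0>x<MT1>x<DepthU> naming seen in the KernelNameMin strings (reading the third field as DepthU is an assumption). parse_mt and tile_diff are hypothetical helpers, and the name lists are built only from the tiles named in this description, not read from the actual files.

```python
import re

# Assumed naming convention: MT<MT0>x<MT1>x<DepthU>, e.g. "MT384x160x64".
MT_RE = re.compile(r"MT(\d+)x(\d+)x(\d+)")


def parse_mt(text):
    """Return (MT0, MT1, DepthU) from the first MT token found in text."""
    match = MT_RE.search(text)
    if match is None:
        raise ValueError(f"no MT token in {text!r}")
    return tuple(int(g) for g in match.groups())


def tile_diff(old_names, new_names):
    """Return (added, removed) macro tiles between two collections of names."""
    old_tiles = {parse_mt(name) for name in old_names}
    new_tiles = {parse_mt(name) for name in new_names}
    return sorted(new_tiles - old_tiles), sorted(old_tiles - new_tiles)


# Hypothetical name lists built only from the tiles listed in the description.
old = ["MT16x16x256", "MT32x16x256", "MT32x32x256", "MT128x128x256"]
new = ["MT16x16x256", "MT32x16x256", "MT32x32x256",
       "MT384x160x64", "MT384x128x64", "MT384x64x64"]

added, removed = tile_diff(old, new)
print("added:", added)
print("removed:", removed)
```

The three small tiles sit on both sides, so they drop out of an MT-only diff even though their solutions were replaced; catching PGR or kernel-family changes needs a per-kernel comparison rather than a tile-set comparison.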

Test Plan

Tested in combination with model changes for mi350P.

Submission Checklist

yenong-amd requested a review from a team as a code owner on June 30, 2026 17:55
therock-pr-bot (Bot) commented Jun 30, 2026
✅ All Checks Passed — Ready for Review

| Check | Status | Details |
|---|---|---|
| 🌿 Branch Name | ✅ Pass | |
| 📝 PR Title/Description | ✅ Pass | |
| Forbidden Files | ✅ Pass | |
| 🧪 Unit Test | ✅ Pass | PR does not contain code files: Unit Test auto-passed |
| 🔎 pre-commit | ✅ Pass | |
| 🚫 Draft PR | 🔜 To Be Enabled | |
| 🚩 Feature Flag | 🔜 To Be Enabled | |
| 📊 Code Coverage | 🔜 To Be Enabled | |
| 🤖 therock-pr-bot | ✅ Pass | |

🎉 All checks passed! This PR is ready for review.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

codecov-commenter commented Jun 30, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8973      +/-   ##
===========================================
+ Coverage    71.33%   71.37%   +0.04%     
===========================================
  Files         2628     2628              
  Lines       413045   413209     +164     
  Branches     61875    61882       +7     
===========================================
+ Hits        294615   294917     +302     
+ Misses       96656    96484     -172     
- Partials     21774    21808      +34     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| TensileLite | 76.65% <ø> (-<0.01%) ⬇️ | Carriedforward from 4cf4119 |
| hipBLAS | 90.81% <ø> (ø) | Carriedforward from 4cf4119 |
| hipBLASLt | 41.35% <ø> (ø) | |
| hipCUB | 82.68% <ø> (ø) | Carriedforward from 4cf4119 |
| hipDNN | 85.91% <ø> (ø) | Carriedforward from 4cf4119 |
| hipFFT | 50.17% <ø> (ø) | Carriedforward from 4cf4119 |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from 4cf4119 |
| hipSOLVER | 69.18% <ø> (ø) | Carriedforward from 4cf4119 |
| hipSPARSE | 86.55% <ø> (ø) | Carriedforward from 4cf4119 |
| rocBLAS | 48.49% <ø> (+0.43%) ⬆️ | Carriedforward from 4cf4119 |
| rocFFT | 46.30% <ø> (ø) | Carriedforward from 4cf4119 |
| rocRAND | 57.07% <ø> (ø) | Carriedforward from 4cf4119 |
| rocSOLVER | 76.92% <ø> (ø) | Carriedforward from 4cf4119 |
| rocSPARSE | 72.37% <ø> (ø) | Carriedforward from 4cf4119 |
| rocThrust | 91.36% <ø> (ø) | Carriedforward from 4cf4119 |

*This pull request uses carry forward flags. Click here to find out more.
see 38 files with indirect coverage changes
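The summary percentages in the coverage diff follow directly from the raw counts. A quick check in Python, assuming Codecov's documented ratio of hits / (hits + misses + partials); that assumption is consistent with the Lines column, since hits + misses + partials equals lines on both sides. coverage_pct is a hypothetical helper.

```python
def coverage_pct(hits, misses, partials):
    """Line coverage in percent: hits / (hits + misses + partials)."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


# Counts copied from the coverage diff above.
base = coverage_pct(hits=294615, misses=96656, partials=21774)  # develop
head = coverage_pct(hits=294917, misses=96484, partials=21808)  # #8973

print(f"{base:.2f}% -> {head:.2f}% ({head - base:+.2f}%)")
```

This prints 71.33% -> 71.37% (+0.04%), matching the report.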


yenong-amd changed the title from "[hipblaslt] mi350P library logic changes" to "fix: [hipblaslt] mi350P performance gap" on Jul 1, 2026
@@ -459,7 +459,7 @@
ScheduleGlobalRead: 1
ScheduleIterAlg: 3
ScheduleLocalWrite: 1
-SolutionIndex: 1
+SolutionIndex: 0

Contributor

This lands at index 0, but MT256x256x32 is already at index 0 earlier in the file (develop had these at 0 and 1). The same duplicate 0 appears in the other five logic YAMLs: 315 solutions, but the maximum index is 313. This probably needs another renumbering pass so that the indices stay contiguous and unique.
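The uniqueness and contiguity check described above is easy to script. A minimal Python sketch, assuming the solutions have already been loaded from a logic YAML into a list of dicts with a SolutionIndex key (loading and the file layout are not shown). check_solution_indices and renumber are hypothetical helpers, and renumbering in file order is one possible policy, not necessarily what the library tooling does.

```python
from collections import Counter


def check_solution_indices(solutions):
    """Return (duplicates, missing) SolutionIndex values for a list of solutions.

    Indices are expected to be unique and to cover 0..len(solutions)-1.
    """
    indices = [sol["SolutionIndex"] for sol in solutions]
    counts = Counter(indices)
    duplicates = sorted(idx for idx, n in counts.items() if n > 1)
    missing = sorted(set(range(len(solutions))) - set(indices))
    return duplicates, missing


def renumber(solutions):
    """Reassign SolutionIndex as 0..n-1 in file order (one possible policy)."""
    for new_index, sol in enumerate(solutions):
        sol["SolutionIndex"] = new_index
    return solutions


# Toy example: five solutions where index 0 was assigned twice.
example = [{"SolutionIndex": i} for i in (0, 0, 1, 2, 3)]
print(check_solution_indices(example))  # duplicate and missing indices
renumber(example)
print(check_solution_indices(example))  # both lists empty after renumbering
```

With 315 solutions, a single duplicate leaves the maximum index at 313 and index 314 unused, which is the pattern described in the comment. If the logic file also refers to solutions by index elsewhere, those references would need the same remapping; this sketch does not handle that.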

@@ -58276,34 +58567,34 @@
InternalSupportParams: {KernArgsVersion: 2, SupportCustomStaggerU: true, SupportCustomWGM: true, SupportUserGSU: false, UseUniversalArgs: true}
Kernel: true
KernelLanguage: Assembly
-KernelNameMin: Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x128x256_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTLA0_DTLB0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA8_GRVWB8_GSU0_GSUAMB_GLS0_ISA950_IU1_K1_LBSPPA2048_LBSPPB2048_LBSPPM0_LPA16_LPB16_LPM0_LRVW8_LWPMn1_MIAV0_MIWT4_4_MO40_NTn1_NTA0_NTB0_NTC1_NTD1_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR1_PKA1_SIA3_SS1_SPO1_SRVW0_SSO4_SVW4_SK5_SKXCCM0_TLDS1_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA4_VWB4_WSGRA0_WSGRB0_WS64_WG32_8_1_WGM0_WGMXCCn1
+KernelNameMin: Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x192x128_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTLA0_DTLB0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA8_GRVWB8_GSU0_GSUAMB_GLS0_ISA950_IU1_K1_LBSPPA2048_LBSPPB256_LBSPPM0_LPA16_LPB16_LPM0_LRVW8_LWPMn1_MIAV0_MIWT8_3_MO40_NTn1_NTA0_NTB0_NTC1_NTD4_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR1_PKA1_SIA3_SS1_SPO0_SRVW0_SSO0_SVW8_SK5_SKXCCM0_TLDS2_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA8_VWB1_WSGRA0_WSGRB0_WS64_WG16_16_1_WGM0_WGMXCCn1

Contributor

Nit on the PR description: 128x128x256 wasn't just dropped here; it's replaced with 128x192x128. The small MT updates are also kernel-family swaps (BH_UserArgs → BiasSB_HAS_SAV) with higher PGR, not only a PGR tweak. HHS also adds 48x64x256 and 80x128x128, which the description doesn't call out.
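The description nit can be checked by pulling a few parameters out of each KernelNameMin. A minimal Python sketch, assuming the prefix before the first _MT token identifies the kernel family and that MT, PGR, and WG appear as underscore-delimited tokens, as in the two names in the hunk above. kernel_params and changed_params are hypothetical helpers, and OLD/NEW are excerpts of those two names with most tokens dropped for brevity.

```python
import re

# Assumed token shapes, based on the KernelNameMin strings in this PR.
PARAM_PATTERNS = {
    "MT": re.compile(r"_MT(\d+x\d+x\d+)_"),  # macro tile
    "PGR": re.compile(r"_PGR(\d+)_"),        # PGR setting
    "WG": re.compile(r"_WG(\d+_\d+_\d+)_"),  # workgroup shape
}


def kernel_params(name):
    """Extract the family prefix and a few parameters from a kernel name."""
    # Everything before the first "_MT" token is treated as the kernel family.
    params = {"family": name.split("_MT", 1)[0]}
    for key, pattern in PARAM_PATTERNS.items():
        match = pattern.search(name)
        params[key] = match.group(1) if match else None
    return params


def changed_params(old_name, new_name):
    """Return {param: (old, new)} for every extracted parameter that differs."""
    old, new = kernel_params(old_name), kernel_params(new_name)
    return {key: (old[key], new[key]) for key in old if old[key] != new[key]}


# Excerpts of the two KernelNameMin values from the review hunk (tokens elided).
OLD = ("Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x128x256_"
       "MI16x16x1_PGR2_PLR1_WS64_WG32_8_1_WGM0")
NEW = ("Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x192x128_"
       "MI16x16x1_PGR2_PLR1_WS64_WG16_16_1_WGM0")

print(changed_params(OLD, NEW))
```

For this pair, only MT and WG are reported as changed: PGR stays at 2 and the family prefix is identical, so this particular replacement is a tile and workgroup reshape rather than a PGR bump. Running the same comparison over the small MT kernels would show whether their family prefix and PGR changed.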

3 participants