fix: [hipblaslt] mi350P performance gap #8973
Codecov Report
✅ All modified and coverable lines are covered by tests.
❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #8973 +/- ##
===========================================
+ Coverage 71.33% 71.37% +0.04%
===========================================
Files 2628 2628
Lines 413045 413209 +164
Branches 61875 61882 +7
===========================================
+ Hits 294615 294917 +302
+ Misses 96656 96484 -172
- Partials 21774 21808 +34
@@ -459,7 +459,7 @@
  ScheduleGlobalRead: 1
  ScheduleIterAlg: 3
  ScheduleLocalWrite: 1
- SolutionIndex: 1
+ SolutionIndex: 0
This lands at index 0, but MT256x256x32 is already at index 0 earlier in the file (develop had these at 0 and 1). Same duplicate-0 in the other five logic YAMLs — 315 solutions but max index is 313. Probably needs another renumber pass so indices stay contiguous and unique.
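A quick way to catch this before the next renumber pass is a small script that scans a logic YAML for duplicate or missing indices. The sketch below is only an illustration, not part of the hipBLASLt tooling: the function name `check_solution_indices` is hypothetical, and it assumes each solution entry carries exactly one `SolutionIndex: <int>` line and that indices are meant to run from 0 to N-1.

```python
import re
from collections import Counter

# Assumption: every solution entry has exactly one "SolutionIndex: <int>" line.
_INDEX_RE = re.compile(r"^\s*SolutionIndex:\s*(\d+)\s*$", re.MULTILINE)


def check_solution_indices(yaml_text):
    """Return (duplicates, missing) for the SolutionIndex values in yaml_text.

    duplicates: indices that appear more than once.
    missing:    indices in 0..N-1 that never appear, where N is the
                number of SolutionIndex lines found.
    """
    indices = [int(value) for value in _INDEX_RE.findall(yaml_text)]
    counts = Counter(indices)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(range(len(indices))) - set(indices))
    return duplicates, missing


# Tiny hand-written sample that mimics the duplicate-0 case in this review.
sample = """\
  SolutionIndex: 0
  ScheduleIterAlg: 3
  SolutionIndex: 0
  SolutionIndex: 1
"""

duplicates, missing = check_solution_indices(sample)
print("duplicates:", duplicates)
print("missing:", missing)
```

Run against each of the six logic YAMLs, a duplicate 0 together with a missing top index would match the "315 solutions but max index is 313" symptom.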
@@ -58276,34 +58567,34 @@
  InternalSupportParams: {KernArgsVersion: 2, SupportCustomStaggerU: true, SupportCustomWGM: true, SupportUserGSU: false, UseUniversalArgs: true}
  Kernel: true
  KernelLanguage: Assembly
- KernelNameMin: Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x128x256_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTLA0_DTLB0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA8_GRVWB8_GSU0_GSUAMB_GLS0_ISA950_IU1_K1_LBSPPA2048_LBSPPB2048_LBSPPM0_LPA16_LPB16_LPM0_LRVW8_LWPMn1_MIAV0_MIWT4_4_MO40_NTn1_NTA0_NTB0_NTC1_NTD1_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR1_PKA1_SIA3_SS1_SPO1_SRVW0_SSO4_SVW4_SK5_SKXCCM0_TLDS1_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA4_VWB4_WSGRA0_WSGRB0_WS64_WG32_8_1_WGM0_WGMXCCn1
+ KernelNameMin: Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x192x128_MI16x16x1_SN_LDSB1_AFC0_AFEM1_AFEM1_ASEM1_CLR0_CADS0_DTLA0_DTLB0_DTVA0_DTVB0_EPS0_FDSI0_GRPM1_GRVWA8_GRVWB8_GSU0_GSUAMB_GLS0_ISA950_IU1_K1_LBSPPA2048_LBSPPB256_LBSPPM0_LPA16_LPB16_LPM0_LRVW8_LWPMn1_MIAV0_MIWT8_3_MO40_NTn1_NTA0_NTB0_NTC1_NTD4_NTM0_NEPBS0_NLCA1_NLCB1_ONLL1_PGR2_PLR1_PKA1_SIA3_SS1_SPO0_SRVW0_SSO0_SVW8_SK5_SKXCCM0_TLDS2_ULSGRO0_USL1_UIOFGRO0_USFGRO0_VSn1_VWA8_VWB1_WSGRA0_WSGRB0_WS64_WG16_16_1_WGM0_WGMXCCn1
Nit on the PR description: 128x128x256 wasn't just dropped here — it's replaced with 128x192x128. The small MT updates are also kernel-family swaps (BH_UserArgs → BiasSB_HAS_SAV) with higher PGR, not only a PGR tweak. HHS adds 48x64x256 and 80x128x128 too, which isn't called out in the description.
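To make swaps like this easier to spot when writing the description, the macro tile and PGR can be pulled out of each `KernelNameMin` and compared. This is a rough sketch: `kernel_summary` is a hypothetical helper, it assumes the `_MT<M>x<N>x<K>_` and `_PGR<n>_` tokens appear as they do in the names in the hunk above, and the two sample strings are shortened for illustration rather than being full kernel names.

```python
import re


def kernel_summary(kernel_name):
    """Extract the macro tile (M, N, K) and PGR value from a kernel name.

    Assumes the name contains "_MT<M>x<N>x<K>_" and "_PGR<n>_" tokens;
    either field is None when its token is absent.
    """
    mt = re.search(r"_MT(\d+)x(\d+)x(\d+)_", kernel_name)
    pgr = re.search(r"_PGR(\d+)_", kernel_name)
    return {
        "macro_tile": tuple(int(x) for x in mt.groups()) if mt else None,
        "pgr": int(pgr.group(1)) if pgr else None,
    }


# Shortened, illustrative versions of the old and new names in the hunk.
old_name = "Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x128x256_MI16x16x1_SN_PGR2_PLR1"
new_name = "Cijk_Alik_Bljk_BBS_BH_BiasSB_HAS_SAV_UserArgs_MT128x192x128_MI16x16x1_SN_PGR2_PLR1"

old, new = kernel_summary(old_name), kernel_summary(new_name)
for field in ("macro_tile", "pgr"):
    status = "changed" if old[field] != new[field] else "same"
    print(f"{field}: {old[field]} -> {new[field]} ({status})")
```

For this pair the macro tile changes while PGR stays at 2, which is why "removed" undersells the change.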
JIRA ID: AIHPBLAS-1868
JIRA ID: AIHPBLAS-3634
JIRA ID: AIHPBLAS-3635
Motivation
Library logic changes to improve performance for mi350P.
Technical Details
In BBS, replaced the small MT 16x16x256, 32x16x256, and 32x32x256 solutions with variants that use higher PGR values.
Added MT 384x160x64, 384x128x64, and 384x64x64 in BBS.
Removed MT 128x128x256.
Test Plan
Tested in combination with model changes for mi350P.