Conversation

@amd-bartgips amd-bartgips commented Jul 31, 2025

Proposed changes

With this PR MIOpen should be able to use heuristics for the three conv_hip_implicit_gemm_3d_grouped_*_xdlops solvers.
I have made a bunch of changes to how the metadata is structured; we are now able to robustly map kernel config strings that MIOpen generates from CK to vectors that can be used as input to the fdeep model.

I have added gtest functions for all the new stuff.

I continued with the structure set up before:

  • ai_heuristics.cpp contains all the code that was already there beforehand, plus the fdeep code, since fdeep can only be imported in a single file
  • ai_candidate_selection contains the actual "model" and "metadata" classes that do the computation using floating point vectors. This has been somewhat updated to make the translation from "input" (=fdb_key) and "output" (=kernel configs) more robust
  • ai_conv_3d_kernel_tuning_utils contains the machinery one level higher. That is, how to convert kernel configs and fdb_key input to float vectors, and how to fetch and call the relevant CandidateSelectionModel. This is shared by all three solvers, so it made sense to centralise it.
  • kernels/gfx942.... model and metadata files. Perhaps these should go in Git LFS, but I suggest we worry about that when we make the final PR in the monorepo
  • solver/conv/...cpp solver-specific files; they ultimately contain the solver-specific machinery. In this case our 3 solvers rely heavily on ai_conv_3d_kernel_tuning_utils and through that on ai_candidate_selection
  • gtest files: should speak for themselves, please have a look
  • solvers.hpp is a huge header file that contains declarations for all solvers (i.e. for the solver/conv/...cpp files), so this needed to be altered as well

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added automated tests relevant to the introduced functionality
  • I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR
  • I have run the tests, and they are all passing locally
  • I have added relevant documentation for the changes
  • I have removed the stale documentation which is no longer relevant after this pull request
  • I have run make format & make check_format to ensure the changes have been formatted

* Added first attempt at fdeep versions of 3d kernel tuning models (two towers).
TODO:
* check if metadata is correct.
* Implement relevant cpp functions to load models and run them

* Added Copilot-generated new model class for selecting the best candidate kernel config

* performed dot product directly in cpp code, removed candidate_selector submodel

* Added the machinery to use the new candidate selection heuristics for the conv_hip_implicit_gemm_3d_grouped_wrw_xdlops solver

* added split_k functionality

* refactored by moving some helper functions outside the main function

* improved loading of metadata for candidate selection model

* split off own metadata class to keep it distinct from legacy version

* Added new versions of CS models (+new metadata)

* Added methods for preprocessing input and kernel_configs for CS model (encoding + dropping constants)

* fixed naming of CandidateSelectionMetadata and Ptrs variables

* Added new model and metadata class to header file

* removed unused function

* removed duplicate definition of CandidateSelectionMetadata

* altered SelectBestCandidate to avoid errors.
* Forward declaration of EncodeKernelParams.

* removed unused function

* Build now works.
Removed some redundant code and unused vars to avoid most warnings during build (left unused var "arch" in for future use)

* moved new candidateSelection code to its own files (and sub namespace)

* refactored general 3D conv kernel tuning functions into their own file for easier reuse across other 3D conv solvers

* Cleaned up unused include

* removed superfluous includes
All fdeep references should stay in ai_heuristics.cpp because fdeep stuff is not declared inline.
…default. Can be altered to single thread using MIOPEN_AI_FDEEP_USE_SINGLE_PREDICT env var
@amd-bartgips amd-bartgips changed the title Use CandidateSelectionModel heuristic for conv_hip_implicit_gemm_3d_grouped_wrw_xdlops Use CandidateSelectionModel heuristic for conv_hip_implicit_gemm_3d_grouped_*_xdlops solvers Aug 7, 2025
@amd-bartgips amd-bartgips marked this pull request as ready for review August 7, 2025 13:59
}
if(use_split_k)
{
std::vector<int> split_ks = GenerateSplitK(128); // TODO: make configurable
@amd-bartgips amd-bartgips commented Aug 8, 2025
This we could perhaps put in the metadata? Our model has only been trained using powers of two up to 128, so from the training data we know exactly which values should be in "split_ks"; we could save those when generating the metadata file and fill this line directly from that file.
However, as Ville said, there is no inherent reason why we should stick to these values. The model has been trained using the real values as input (i.e. 1, 2, 4, ... 128, not some kind of categorical coding like the original KTN used), so the model should be able to interpolate to other values.

Author follow-up:
Let's do that: metadata.

value = static_cast<float>(kv.second);
found_ws = true;
// TODO consider decreasing this verbosity
MIOPEN_LOG_I2("Found whitespace-stripped match for output "
Author comment (@amd-bartgips):
This is a lot of logging at the moment, since it seems the labels have changed (i.e. some spaces were removed). Should we remove this, or log it at a higher debug level?

Reply:
I'm not sure if there's any higher debug level available.

!env::disabled(MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_BWD_XDLOPS_AI_HEUR))
{
bool ai_success = false;
using DataType = float; // This is a stupid hack. The same holds for the other three solvers.
Author comment (@amd-bartgips):
I don't know exactly what this "DataType" is supposed to be. Is it related to the datatype of the convolution operation? Or is this some kind of internal datatype that MIOpen uses to pass around values?

I suppose we could copy e.g. src/solver/conv/conv_hip_implicit_gemm_grouped_bwd_xdlops.cpp


        if(problem.GetInDataType() == miopenFloat)
        {
            if(RunParameterPredictionModel<float>(ctx, problem))
                return;
        }
        else if(problem.GetInDataType() == miopenBFloat16)
        {
            if(RunParameterPredictionModel<ck::bhalf_t>(ctx, problem))
                return;
        }
        else
        {
            if(RunParameterPredictionModel<ck::half_t>(ctx, problem))
                return;
        }

But I do not know why those three datatypes are hard-coded there either... (nor do I understand when to use MIOpen or CK datatypes)

@amd-bartgips amd-bartgips commented Aug 8, 2025
I decided to move the fill_valid_kernels function inside the RunParameterPredictionModel function/template. So it can just reuse the datatype there
(working on it, not yet committed)

@amd-bartgips amd-bartgips commented Aug 8, 2025
On second thought this turned out to be too difficult, since there are too many dependencies on things declared in the respective solvers' .cpp files. So I kept the current structure, where a fill_valid_kernels lambda is still declared/defined in the solvers' .cpp files, but now separately for the different datatypes.

amd-bartgips and others added 3 commits August 8, 2025 08:14
* Add 3D convolution AI heuristics implementation

  - Add Gfx942Model_3D class for 3D convolution predictions
  - Implement metadata_3d.cpp for 3D feature extraction
  - Add trained 3D TunaNet models for gfx942
  - Update CMakeLists.txt for 3D components

* remove redundant hpp include on fstream

* updated unused ExecutionContext parameter.

* refactor: update metadata_3d exception pattern to use std::optional

* Add comprehensive gtest suite for 3D convolution AI heuristics

  - Add 13 unit tests covering 3D AI heuristics functionality
  - Test metadata loading, model creation, and solver predictions
  - Add MIOpenDriver equivalent test (BartTest) for real-world validation
  - Enhance logging to show solver predictions with scores
  - Verify correct solver selection for 3D convolution problems

* Added newline to end of file + Clang formatting

* Removed const_cast calls for private member variables in the constructor of Metadata3D, since the compiler did not like this.

* Added [[maybe_unused]] to silence warnings of unused arch arguments

* automated formatting using Clang

* More Clang formatting

---------

Co-authored-by: amd-bartgips <[email protected]>
@amd-bartgips amd-bartgips changed the base branch from miopenff/3d-heuristic to develop_deprecated August 12, 2025 09:04
@vpietila-amd vpietila-amd left a comment

Looks good, only some minor comments.

{
const char* env = std::getenv("MIOPEN_AI_FDEEP_USE_SINGLE_PREDICT");
return env != nullptr && std::string(env) == "1";
}

Reviewer comment (@vpietila-amd):

I would add "thread" here to make it clearer that it refers to using a single thread for running the inference.

@amd-bartgips amd-bartgips commented Aug 12, 2025
That's maybe also good. I have now added some mutex code based on Copilot's suggestions for dealing with the "thread-safe" warnings; see the latest changes.

I could not really tell whether this was something we could safely ignore. I understood that the env var could be changed and would therefore have different effects across threads? In your opinion:
a) is this something we should worry about, or is it perfectly normal to allow this slight "vulnerability"?
b) does the mutex approach suggested by Copilot actually solve anything for us (other than silencing warnings)?

[=](const miopen::conv::ProblemDescription& problem) -> std::vector<std::string> {
return miopen::solver::FillValidKernelsIDs<DeviceOpGBwdDefaultPtrs<ck::half_t>,
CKArgs<ck::half_t>>(problem);
};

Reviewer comment (@vpietila-amd):

Consider defining a templated function

template<typename T>
bool RunPrediction(const ProblemDescription& problem, ...) 
{
   auto fill_valid_kernels = [=](const miopen::conv::ProblemDescription& problem) -> std::vector<std::string> {
        return miopen::solver::FillValidKernelsIDs<DeviceOpGBwdDefaultPtrs<T>, CKArgs<T>>(problem);
    };
    
    return miopen::solver::conv::RunParameterPredictionModel<T>(
        ctx, problem, valid_kernels, index, split_k, kernel_id, fill_valid_kernels, solver_name);
}

to decrease the repetition. This could be used in all branches of the switch statement. It is then also easier to add new data types.

Author comment (@amd-bartgips):
I tried doing that, but it quickly spiralled out of control, so I eventually reverted back to this ugly solution and moved on. I will see if I can give it another final try.

DeviceOpGBwdWeightDefaultPtrs<ck::half_t>,
CKArgs<ck::half_t>>(problem);
};
ai_success =

Reviewer comment (@vpietila-amd):

Could again extract the repeated method lines from switch branches into a single templated function.

amd-bartgips added a commit to ROCm/rocm-libraries that referenced this pull request Sep 22, 2025
… tower model) (#1154)

Copied over from branch in old repo:
ROCm/MIOpen#3923

## Motivation

With this PR MIOpen should be able to use heuristics for 3D convolutions
on gfx942:
* the parameter selection/kernel tuning of the three
`conv_hip_implicit_gemm_3d_grouped_*_xdlops` solvers.
We have added some `ai_*` files housing the new models and helper
functions, and made some changes to existing files where required.
* ~~the solver selection (i.e. a "3D tunanet")~~ At the moment the 3D
solver selection model is inadequate and the already existing WTI
fallback is preferred. Improving the 3D solver selection heuristics will
be the focus of a future PR

## Technical Details

* `ai_heuristics.cpp` contains all the code already there beforehand,
plus new code that relies on fdeep includes, since fdeep can only be
imported in a single file
* `ai_candidate_selection` contains the actual two-towers (aka
CandidateSelection) "model" and "metadata" classes that do the
computation using floating point vectors.
* `ai_conv_3d_kernel_tuning_utils` contains the machinery one level
higher. That is, how to convert kernel configs and fdb_key input to
float vectors and how to fetch and call the relevant
CandidateSelectionModel. This is shared for all three solvers, so it
made sense to centralise it in this file.
* `kernels/gfx942....` model and metadata files. Perhaps these should be
committed using Git LFS, but I have not seen this done for other model
files (e.g. TunaNet or KTN), so I have not done so here.
* `solver/conv/...cpp` solver_specific files, they ultimately contain
the solver-specific machinery. In this case our 3 solvers rely heavily
on `ai_conv_3d_kernel_tuning_utils` and through that on
`ai_candidate_selection`
* gtest files: should speak for themselves, please have a look. They
test all the new machinery
* solvers.hpp a huge header file that contains declarations for all
solvers (i.e. for the `solver/conv/...cpp` files), so this needed to be
altered as well

## Test Plan

3 new gtest .cpp files are added:
* ~~`test/gtest/conv_ai_3d_heuristics.cpp`
This aims to test all new functionality related to the 3D tunanet (model
+ metadata).~~ While the tests are still there for future use, they will
be skipped since 3D Tunanet model data is no longer included.
* `test/gtest/conv_ai_3d_kernel_tuning_utils.cpp`
Aims to test all new machinery in `ai_conv_3d_kernel_tuning_utils.cpp`
(preprocessing and handling of inputs to the CandidateSelectionModel for
the 3D solvers).
* `test/gtest/conv_ai_candidate_selection_model.cpp`
Tests internal code related to the CandidateSelectionModel and its
metadata.

## Test Result
The `./bin/test_conv_ai_*` tests all succeed without errors when
building and running them on a conductor MI300 node.
Besides manually running all other `./bin/test_*` functions, is there a
better way to perform a full test?

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
assistant-librarian bot pushed a commit that referenced this pull request Sep 22, 2025
[MIOpen] Implement kernel tuning heuristic model for 3D conv
 ops (two tower model) (#1154)
