Conversation

@amd-bartgips amd-bartgips commented Jul 31, 2025

Proposed changes

With this PR MIOpen should be able to use heuristics for the three conv_hip_implicit_gemm_3d_grouped_*_xdlops solvers.
I have made a bunch of changes to how the metadata is structured; we are now able to robustly map kernel config strings that MIOpen generates from CK to vectors that can be used as input to the fdeep model.

I have added gtest functions for all the new stuff.

I continued with the structure set up before:

  • ai_heuristics.cpp contains all the code that was already there beforehand, plus the fdeep code, since fdeep can only be imported in a single file
  • ai_candidate_selection contains the actual "model" and "metadata" classes that do the computation using floating point vectors. This has been somewhat updated to make the translation from "input" (=fdb_key) and "output" (=kernel configs) more robust
  • ai_conv_3d_kernel_tuning_utils contains the machinery one level higher. That is, how to convert kernel configs and fdb_key input to float vectors, and how to fetch and call the relevant CandidateSelectionModel. This is shared by all three solvers, so it made sense to centralise it.
  • kernels/gfx942.... model and metadata files. Perhaps these should go in Git LFS, but I suggest we worry about that when we make the final PR in the monorepo
  • solver/conv/...cpp solver-specific files; they ultimately contain the solver-specific machinery. In this case our 3 solvers rely heavily on ai_conv_3d_kernel_tuning_utils and through that on ai_candidate_selection
  • gtest files: should speak for themselves, please have a look
  • solvers.hpp is a huge header file that contains declarations for all solvers (i.e. for the solver/conv/...cpp files), so this needed to be altered as well

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added automated tests relevant to the introduced functionality
  • I have sufficient test coverage for the changes, and code coverage hasn't decreased as a result of my PR
  • I have run the tests, and they are all passing locally
  • I have added relevant documentation for the changes
  • I have removed the stale documentation which is no longer relevant after this pull request
  • I have run make format & make check_format to ensure the changes have been formatted

* Added first attempt at fdeep versions of 3d kernel tuning models (two towers).
TODO:
* check if metadata is correct.
* Implement relevant cpp functions to load models and run them

* Added Copilot-generated new model class for selecting the best candidate kernel config

* performed dot product directly in cpp code, removed candidate_selector submodel

* Added the machinery to use the new candidate selection heuristics for the conv_hip_implicit_gemm_3d_grouped_wrw_xdlops solver

* added split_k functionality

* refactored by moving some helper functions outside the main function

* improved loading of metadata for candidate selection model

* split off own metadata class to keep it distinct from legacy version

* Added new versions of CS models (+new metadata)

* Added methods for preprocessing input and kernel_configs for CS model (encoding + dropping constants)

* fixed naming of CandidateSelectionMetadata and Ptrs variables

* Added new model and metadata class to header file

* removed unused function

* removed duplicate definition of CandidateSelectionMetadata

* altered SelectBestCandidate to avoid errors.
* Forward declaration of EncodeKernelParams.

* removed unused function

* Build now works.
Removed some redundant code and unused vars to avoid most warnings during build (left unused var "arch" in for future use)

* moved new candidateSelection code to its own files (and sub namespace)

* refactored general 3D conv kernel tuning functions into their own file for easier reuse across other 3D conv solvers

* Cleaned up unused include

* removed superfluous includes
All fdeep references should stay in ai_heuristics.cpp because fdeep stuff is not declared inline.
…default. Can be altered to single thread using MIOPEN_AI_FDEEP_USE_SINGLE_PREDICT env var
@amd-bartgips amd-bartgips changed the title Use CandidateSelectionModel heuristic for conv_hip_implicit_gemm_3d_grouped_wrw_xdlops Use CandidateSelectionModel heuristic for conv_hip_implicit_gemm_3d_grouped_*_xdlops solvers Aug 7, 2025
@amd-bartgips amd-bartgips marked this pull request as ready for review August 7, 2025 13:59
}
if(use_split_k)
{
std::vector<int> split_ks = GenerateSplitK(128); // TODO: make configurable
@amd-bartgips amd-bartgips commented Aug 8, 2025
This we could perhaps put in the metadata? Our model has only been trained using powers of two up to 128, so from the training data we know exactly which values should be in "split_ks"; we could save those when generating the metadata file and fill this line directly from that file.
However, as Ville said, there is no inherent reason why we should stick to these values. The model has been trained using the real values as input (i.e. 1, 2, 4, ... 128, not some kind of categorical coding like the original KTN used), so the model should be able to interpolate to other values.

Author follow-up:
Let's do that: metadata.

value = static_cast<float>(kv.second);
found_ws = true;
// TODO consider decreasing this verbosity
MIOPEN_LOG_I2("Found whitespace-stripped match for output "
Author comment (@amd-bartgips):
This is a lot of logging at the moment, since it seems the labels have changed (i.e. some spaces were removed). Should we remove this, or log it at a higher debug level?

Reply:
I'm not sure if there's any higher debug level available.

!env::disabled(MIOPEN_DEBUG_3D_CONV_IMPLICIT_GEMM_HIP_BWD_XDLOPS_AI_HEUR))
{
bool ai_success = false;
using DataType = float; // This is a stupid hack. The same holds for the other three solvers.
Author comment (@amd-bartgips):
I don't know exactly what this "DataType" is supposed to be. Is it related to the datatype of the convolution operation? Or is this some kind of internal datatype that MIOpen uses to pass around values?

I suppose we could copy e.g. src/solver/conv/conv_hip_implicit_gemm_grouped_bwd_xdlops.cpp


        if(problem.GetInDataType() == miopenFloat)
        {
            if(RunParameterPredictionModel<float>(ctx, problem))
                return;
        }
        else if(problem.GetInDataType() == miopenBFloat16)
        {
            if(RunParameterPredictionModel<ck::bhalf_t>(ctx, problem))
                return;
        }
        else
        {
            if(RunParameterPredictionModel<ck::half_t>(ctx, problem))
                return;
        }

But I do not know why those three datatypes are hard-coded there either... (nor do I understand when to use MIOpen or CK datatypes)

@amd-bartgips amd-bartgips commented Aug 8, 2025
I decided to move the fill_valid_kernels function inside the RunParameterPredictionModel function/template. So it can just reuse the datatype there
(working on it, not yet committed)

@amd-bartgips amd-bartgips commented Aug 8, 2025
On second thought this turned out to be too difficult, since there are too many dependencies on things declared in the respective solvers' .cpp files. So I kept the current structure, where a fill_valid_kernels lambda is still declared/defined in the solvers' .cpp files, but now separately for the different datatypes.

amd-bartgips and others added 3 commits August 8, 2025 08:14
* Add 3D convolution AI heuristics implementation

  - Add Gfx942Model_3D class for 3D convolution predictions
  - Implement metadata_3d.cpp for 3D feature extraction
  - Add trained 3D TunaNet models for gfx942
  - Update CMakeLists.txt for 3D components

* remove redundant hpp include on fstream

* updated unused ExecutionContext parameter.

* refactor: update metadata_3d exception pattern to use std::optional

* Add comprehensive gtest suite for 3D convolution AI heuristics

  - Add 13 unit tests covering 3D AI heuristics functionality
  - Test metadata loading, model creation, and solver predictions
  - Add MIOpenDriver equivalent test (BartTest) for real-world validation
  - Enhance logging to show solver predictions with scores
  - Verify correct solver selection for 3D convolution problems

* Added newline to end of file + Clang formatting

* Removed const_cast calls for private member variables in the constructor of Metadata3D, since the compiler did not like this.

* Added [[maybe_unused]] to silence warnings of unused arch arguments

* automated formatting using Clang

* More Clang formatting

---------

Co-authored-by: amd-bartgips <[email protected]>
@amd-bartgips amd-bartgips changed the base branch from miopenff/3d-heuristic to develop_deprecated August 12, 2025 09:04
@vpietila-amd vpietila-amd left a comment

Looks good, only some minor comments.

{
const char* env = std::getenv("MIOPEN_AI_FDEEP_USE_SINGLE_PREDICT");
return env != nullptr && std::string(env) == "1";
}

Reviewer comment (@vpietila-amd):

I would add "thread" here to make it clearer that it refers to using a single thread for running the inference.

@amd-bartgips amd-bartgips commented Aug 12, 2025
That's maybe also good. I have now added some mutex code based on Copilot's suggestions for dealing with the "thread-safe" warnings; see the latest changes.

I could not really tell whether this was something we could safely ignore. I understood that the env var could be changed and would therefore have different effects across threads? In your opinion:
a) is this something we should worry about, or is it perfectly normal to allow this slight "vulnerability"?
b) does the mutex approach suggested by Copilot actually solve anything for us (other than silencing warnings)?

[=](const miopen::conv::ProblemDescription& problem) -> std::vector<std::string> {
return miopen::solver::FillValidKernelsIDs<DeviceOpGBwdDefaultPtrs<ck::half_t>,
CKArgs<ck::half_t>>(problem);
};

Reviewer comment (@vpietila-amd):

Consider defining a templated function

template<typename T>
bool RunPrediction(const ProblemDescription& problem, ...) 
{
   auto fill_valid_kernels = [=](const miopen::conv::ProblemDescription& problem) -> std::vector<std::string> {
        return miopen::solver::FillValidKernelsIDs<DeviceOpGBwdDefaultPtrs<T>, CKArgs<T>>(problem);
    };
    
    return miopen::solver::conv::RunParameterPredictionModel<T>(
        ctx, problem, valid_kernels, index, split_k, kernel_id, fill_valid_kernels, solver_name);
}

to decrease the repetition. This could be used in all branches of the switch statement. It is then also easier to add new data types.

Author comment (@amd-bartgips):
I tried doing that, but it quickly spiralled out of control, so I eventually reverted back to this ugly solution and moved on. I will see if I can give it another final try.

DeviceOpGBwdWeightDefaultPtrs<ck::half_t>,
CKArgs<ck::half_t>>(problem);
};
ai_success =

Reviewer comment (@vpietila-amd):

Could again extract the repeated method lines from switch branches into a single templated function.

amd-bartgips added a commit to ROCm/rocm-libraries that referenced this pull request Sep 22, 2025
… tower model) (#1154)

Copied over from branch in old repo:
ROCm/MIOpen#3923

## Motivation

With this PR MIOpen should be able to use heuristics for 3D convolutions
on gfx942:
* the parameter selection/kernel tuning of the three
`conv_hip_implicit_gemm_3d_grouped_*_xdlops` solvers.
We have added some `ai_*` files housing the new models and helper
functions, and made some changes to existing files where required.
* ~~the solver selection (i.e. a "3D tunanet")~~ At the moment the 3D
solver selection model is inadequate and the already existing WTI
fallback is preferred. Improving the 3D solver selection heuristics will
be the focus of a future PR

## Technical Details

* `ai_heuristics.cpp` contains all the code already there beforehand,
plus new code that relies on fdeep includes, since fdeep can only be
imported in a single file
* `ai_candidate_selection` contains the actual two-towers (aka
CandidateSelection) "model" and "metadata" classes that do the
computation using floating point vectors.
* `ai_conv_3d_kernel_tuning_utils` contains the machinery one level
higher. That is, how to convert kernel configs and fdb_key input to
float vectors and how to fetch and call the relevant
CandidateSelectionModel. This is shared for all three solvers, so it
made sense to centralise it in this file.
* `kernels/gfx942....` model and metadata files. Perhaps these should be
committed using Git LFS, but I have not seen this done for other model
files (e.g. TunaNet or KTN), so I have not done so here.
* `solver/conv/...cpp` solver_specific files, they ultimately contain
the solver-specific machinery. In this case our 3 solvers rely heavily
on `ai_conv_3d_kernel_tuning_utils` and through that on
`ai_candidate_selection`
* gtest files: should speak for themselves, please have a look. They
test all the new machinery
* solvers.hpp a huge header file that contains declarations for all
solvers (i.e. for the `solver/conv/...cpp` files), so this needed to be
altered as well

## Test Plan

3 new gtest .cpp files are added:
* ~~`test/gtest/conv_ai_3d_heuristics.cpp`
This aims to test all new functionality related to the 3D tunanet (model
+ metadata).~~ While the tests are still there for future use, they will
be skipped since 3D Tunanet model data is no longer included.
* `test/gtest/conv_ai_3d_kernel_tuning_utils.cpp`
Aims to test all new machinery in `ai_conv_3d_kernel_tuning_utils.cpp`
(preprocessing and handling of inputs to the CandidateSelectionModel for
the 3D solvers).
* `test/gtest/conv_ai_candidate_selection_model.cpp`
Tests internal code related to the CandidateSelectionModel and its
metadata.

## Test Result
The `./bin/test_conv_ai_*` tests all succeed without errors when
building and running them on a conductor MI300 node.
Besides manually running all other `./bin/test_*` functions, is there a
better way to perform a full test?

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
assistant-librarian bot pushed a commit that referenced this pull request Sep 22, 2025
[MIOpen] Implement kernel tuning heuristic model for 3D conv
 ops (two tower model) (#1154)
