feat(hip-kernel-provider): add rocKE conv engine with ML heuristic by cderb · Pull Request #8982 · ROCm/rocm-libraries

cderb · 2026-06-30T20:19:32Z

Summary

Adds the rocKE conv forward engine to hip-kernel-provider, providing JIT-compiled implicit-GEMM convolution kernels selected by a LightGBM ML heuristic. The engine compiles IR to HSACO at runtime via the hipRTC linker API and selects the best kernel candidate per shape using a trained TFLOPS prediction model. Ships trained models for gfx90a, gfx942, and gfx950. Also includes C++ sweep tooling with fork+exec per-candidate GPU isolation for robust training data collection.
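As a rough illustration of per-shape selection, here is a sketch that ranks candidates by predicted TFLOPS and keeps the top K. `ScoredCandidate` and `topK` are hypothetical names; the actual ConvMLHeuristic interface is not shown in this description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical candidate record: a tile-config name plus the model's
// predicted throughput for the current conv shape.
struct ScoredCandidate {
    std::string tile_config;
    double predicted_tflops;
};

// Return the K candidates with the highest predicted TFLOPS, best first.
// std::partial_sort only fully orders the first K elements, which is
// cheaper than sorting the whole candidate list.
std::vector<ScoredCandidate> topK(std::vector<ScoredCandidate> candidates, std::size_t k) {
    k = std::min(k, candidates.size());
    std::partial_sort(candidates.begin(), candidates.begin() + k, candidates.end(),
                      [](const ScoredCandidate& a, const ScoredCandidate& b) {
                          return a.predicted_tflops > b.predicted_tflops;
                      });
    candidates.resize(k);
    return candidates;
}
```

Keeping several candidates rather than only the argmax leaves room to fall back to the next-best kernel if the first one fails to compile.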

JIRA ID: AICK-1533

Conv model stats (5-fold grouped CV, 2000 LightGBM estimators)

Arch	Shapes	Rows	Features	CV Mean Eff	CV P10 Eff	CV R²
gfx90a	1,654	104,504	101	0.990	0.970	0.998
gfx942	2,621	26,531	101	0.991	0.972	0.998
gfx950	8,957	621,002	72	0.950	0.903	0.966
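The table does not define its efficiency columns. One common reading, assumed here and not confirmed by the PR, is that per-shape efficiency is the measured TFLOPS of the candidate the model ranks first divided by the best measured TFLOPS among all swept candidates (the oracle), with "Mean" and "P10" taken across shapes. A sketch of that computation, using hypothetical names (`CandidateResult`, `shapeEfficiency`, `percentile`, `mean`) and a nearest-rank percentile:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed row layout: one entry per (shape, candidate) with predicted and
// measured TFLOPS. This is an illustration, not the PR's data format.
struct CandidateResult {
    double predicted_tflops;
    double measured_tflops;
};

// Assumed definition: measured TFLOPS of the model's top-ranked candidate
// divided by the best measured TFLOPS for the same shape (the oracle).
double shapeEfficiency(const std::vector<CandidateResult>& rows) {
    assert(!rows.empty());
    auto picked = std::max_element(rows.begin(), rows.end(),
        [](const CandidateResult& a, const CandidateResult& b) {
            return a.predicted_tflops < b.predicted_tflops;
        });
    auto oracle = std::max_element(rows.begin(), rows.end(),
        [](const CandidateResult& a, const CandidateResult& b) {
            return a.measured_tflops < b.measured_tflops;
        });
    return picked->measured_tflops / oracle->measured_tflops;
}

// Nearest-rank percentile, p in (0, 100], over per-shape efficiencies.
double percentile(std::vector<double> values, double p) {
    assert(!values.empty() && p > 0.0 && p <= 100.0);
    std::sort(values.begin(), values.end());
    auto rank = static_cast<std::size_t>(std::ceil(p / 100.0 * values.size()));
    return values[rank - 1];
}

// Arithmetic mean of per-shape efficiencies.
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double v : values) sum += v;
    return sum / values.size();
}
```

Under this reading, a P10 of 0.903 would mean that 90% of held-out shapes get at least roughly 90% of oracle throughput.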

Risk Assessment

Medium risk. This adds a new opt-in engine (ENABLE_ROCKE_CONV_ENGINE=ON, off by default) with JIT compilation, ML model loading, and a new hipRTC linker code path. The engine is behind a build flag and does not affect existing engines or default behavior. Integration tests cover correctness on gfx90a and gfx942; gfx950 coverage is pending.

ASIC Coverage

Specific-ASIC runs required on gfx90a, gfx942, and gfx950. The engine ships arch-specific ML models and generates arch-specific IR; each target architecture must pass integration tests independently. The engine is gated by ENABLE_ROCKE_CONV_ENGINE and does not affect other engines, so no full multi-arch sweep of unrelated components is needed.

Testing Summary

C++ integration tests (IntegrationGpuRockeConvFwdFp16) validate end-to-end: model load → feature extraction → LightGBM inference → kernel selection → IR compile → dispatch → numerical correctness vs CPU reference.
Python dispatcher unit tests (test_conv.py) validate candidate selection, arch gating, and support surface checks (CPU-only).

Testing Checklist

C++ integration tests - --gtest_filter="*RockeConv*" - ASICs: gfx90a - Status: Passed
C++ integration tests - --gtest_filter="*RockeConv*" - ASICs: gfx942 - Status: Passed
C++ integration tests - --gtest_filter="*RockeConv*" - ASICs: gfx950 - Status: Pending
Python dispatcher unit tests - test_conv.py - Status: Passed
PR CI - GitHub PR checks - Status: Pending

Technical Changes

Adds RockeConvEngine hipdnn engine plugin: receives conv op-graph, extracts NHWC problem params, delegates to ConvFwdPlanBuilder for JIT kernel compilation and dispatch.
Adds ConvFwdPlanBuilder: queries ConvMLHeuristic for top-K candidate kernels, compiles IR → HSACO via hiprtcLinkCreate/hiprtcLinkAddData/hiprtcLinkComplete, returns executable ConvFwdPlan.
Adds ConvMLHeuristic (C++): loads .lgbm model and feature_spec.json, extracts features from conv problem + hardware profile (hipDeviceProp_t), predicts tflops per candidate.
Registers ROCKE_CONV_ENGINE in EngineNames.hpp alongside existing ROCKE_ENGINE.
Adds C++ sweep tooling (conv_candidate_sweep.cpp, rocke_kern_time.cpp) with fork+exec isolation per candidate (5s timeout, SIGKILL on hang) for training data collection.
Replaces old Python-only conv sweep with gen_conv_sweep_data.py wrapping the C++ sweep binary; supports --shapes CSV input for targeted coverage augmentation.
Ships compressed LightGBM models (.lgbm.gz) and feature_spec.json for gfx90a (101 features), gfx942 (101 features), and gfx950 (72 features).
Adds IntegrationGpuRockeConvFwdFp16 integration test with 4 smoke shapes (3×3, 1×1, strided, rectangular).
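For reference, "NCHW logical order with NHWC-contiguous strides" means the dims stay listed as [N, C, H, W] while the strides describe a channels-last buffer. `nhwcStridesForNchwDims` is a hypothetical helper, not a hipDNN function; presumably the same pattern applies to weights listed as [K, C, Y, X] and stored as KYXC.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Dims are given in NCHW *logical* order: {N, C, H, W}.
// Returns strides (in elements) for an NHWC-contiguous buffer, still listed in
// NCHW logical order, so element (n, c, h, w) lives at
// n*s[0] + c*s[1] + h*s[2] + w*s[3].
std::array<int64_t, 4> nhwcStridesForNchwDims(const std::array<int64_t, 4>& dims) {
    const int64_t C = dims[1], H = dims[2], W = dims[3];
    assert(C > 0 && H > 0 && W > 0);
    return {H * W * C,  // N stride: one full image
            1,          // C stride: channels are innermost
            W * C,      // H stride: one row of W pixels, C channels each
            C};         // W stride: one pixel
}
```

Keeping the logical order fixed lets the frontend describe any memory layout purely through strides.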

… compilation

Adds a new hipDNN engine plugin for grouped convolution forward pass using the rocKE implicit GEMM framework. The engine selects tile configurations via a LightGBM ML heuristic and compiles kernels at plan-build time by lowering rocKE IR to HSACO.

## C++ engine (src/engines/rocke_conv_engine/)

- RockeConvEngine: registers the engine with the plugin, handles isApplicable checks (fp16, rank-4, gfx942/gfx950/gfx90a, model file present).
- ConvFwdPlanBuilder: extracts the conv problem from the hipDNN op-graph (NCHW logical dim order), enumerates tile candidates, scores them with the ML heuristic, lowers the winning spec to LLVM IR via rocKE, patches the IR for LLVM 23 / ROCm 7.14 compatibility, and compiles to HSACO via direct `clang -x ir` invocation.
- ConvFwdPlan: stores the compiled HIP module/function and kernel launch params; executes by binding tensor pointers from the workspace map and dispatching hipModuleLaunchKernel.

Key implementation details:

- gfx942 uses warp_tile_k=8 (32x32x8 MFMA atom for f16); gfx950/gfx90a use 16.
- LightGBM symbols declared weak so the plugin loads without liblgbm.so at link time; falls back to first valid tile config if the model is not loaded.
- IR patching (patchMakeBufferRsrc): normalises the llvm.amdgcn.make.buffer.rsrc intrinsic across rocKE LLVM20/22 output flavors to the form accepted by the ROCm 7.14 container clang 23 build (.p8.p1, i64 num_records, no parameter attributes). Injects zext instructions in the kernel entry block to widen i32 byte-count params to i64 at call sites without breaking the kernel ABI.
- hipRTC/comgr bypassed entirely: comgr's internal IR auto-upgrade pass mangles ptr addrspace(N) intrinsic arguments, causing verifier failures. Direct clang invocation avoids this.
- Tensor dims read in NCHW logical order ([N,C,Hi,Wi] / [K,C,Y,X]) as required by the hipDNN frontend, with NHWC-contiguous strides set separately.
## ML heuristic (rocKE/Cpp/include/rocke/conv_ml_heuristic.h)

New C++ header wrapping the LGBM C API for conv tile-config scoring. Declares LGBM symbols as weak externals so the engine plugin loads in environments where liblgbm.so is absent.

## Python heuristics (rocKE/Python/rocke/heuristics/)

- feature_engine_grouped_conv.py: extended feature set (101 → 107 features) including log-space geometric features, CU occupancy ratios, and L2/memory pressure proxies.
- gen_conv_sweep_data.py: sweep data generator for grouped conv; runs inside Enroot containers on Slurm GPU nodes.
- augment_coverage_conv.py: targeted OOF-driven shape generator to fill coverage gaps in the training distribution.
- generate_coverage_conv.py: coverage analysis and shape generation utilities.
- sample_shapes_conv.py: random shape sampler with architecture-aware filtering.
- validate_ml_vs_oracle_conv.py: ML vs oracle comparison for conv predictions.
- train.py: updated to write feature_spec.json alongside trained models.

## Trained models (rocKE/Python/rocke/heuristics/models/)

Initial model checkpoints for gfx942, gfx950, gfx90a (grouped conv fwd fp16):

- model_tflops.lgbm.gz: compressed LightGBM booster (gunzip before use)
- feature_spec.json: ordered feature name list consumed by the C++ heuristic
- train_manifest.json: training provenance metadata

## CMake wiring

- ENABLE_ROCKE_CONV_ENGINE option (default OFF) guards the new engine.
- rocke_core (the rocKE C library) built as a subproject; linked into hip_kernel_provider_impl via TARGET_OBJECTS propagation.
- Integration test target hip_kernel_provider_integration_tests extended with four conv forward smoke tests (3x3_small, 1x1_pointwise, 3x3_stride2, 3x3_rect_spatial) covering correctness on gfx942.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
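The weak-symbol trick can be sketched as follows, assuming GCC or Clang on an ELF (Linux) target. The two prototypes mirror LightGBM's public C API (`LGBM_BoosterCreateFromModelfile`, `LGBM_BoosterFree`); `lightgbmAvailable` and `chooseTileIndex` are hypothetical helpers illustrating the "fall back to the first valid tile config" behavior described above.

```cpp
#include <cassert>
#include <cstddef>

// Mirror of LightGBM's opaque handle type from c_api.h (typedef void* BoosterHandle).
typedef void* BoosterHandle;

// Declare the LightGBM C API entry points as weak. If no library providing
// them is linked or loaded, their addresses resolve to null instead of
// causing an unresolved-symbol failure when the plugin loads.
extern "C" {
int LGBM_BoosterCreateFromModelfile(const char* filename, int* out_num_iterations,
                                    BoosterHandle* out) __attribute__((weak));
int LGBM_BoosterFree(BoosterHandle handle) __attribute__((weak));
}

// True only if the LightGBM runtime is actually present.
bool lightgbmAvailable() {
    return LGBM_BoosterCreateFromModelfile != nullptr && LGBM_BoosterFree != nullptr;
}

// Hypothetical selection helper: use the model's choice when LightGBM is
// available, otherwise fall back to the first valid tile config (index 0).
std::size_t chooseTileIndex(std::size_t num_valid_configs, std::size_t model_choice) {
    assert(num_valid_configs > 0);
    if (!lightgbmAvailable() || model_choice >= num_valid_configs) {
        return 0;
    }
    return model_choice;
}
```

Every call into the LightGBM API must be guarded by such a null check, since calling an unresolved weak function jumps to address zero.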

Add per-candidate process isolation to the conv candidate sweep via fork()+exec() of a helper binary (rocke_kern_time), eliminating GPU context poisoning from hung kernels. Pre-launch validation via hipFuncGetAttribute catches resource-limit failures before launching.

Replace gen_conv_sweep_data.py (slow Python-only sweep) with a C++ sweep wrapper that maintains the same CLI interface (--shapes, --shape-set, --arch, --max-shapes) and generate() API for gen_sweep_data.py dispatch.

Conv engine integration: hipRTC linker API replaces popen(clang), ConvHwProfile uses live hipDeviceProp_t, int64 byte-size computation.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
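Per-candidate isolation with a timeout (5 s and SIGKILL on hang, per the Technical Changes list; blocking waitpid+alarm, per a later commit) can be sketched with plain POSIX calls. `runIsolated` and `onAlarm` are hypothetical names, and the command-line protocol of rocke_kern_time is not shown in the PR, so the usage below runs ordinary shell utilities instead.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// SIGALRM handler that does nothing: its only job is to interrupt waitpid().
static void onAlarm(int) {}

// Run argv[0] in a child process and wait up to timeout_s seconds.
// Returns the child's exit status, or -1 if it hung and was killed, died from
// a signal, or could not be waited on. A hung kernel only takes down the child.
int runIsolated(char* const argv[], unsigned timeout_s) {
    assert(argv != nullptr && argv[0] != nullptr && timeout_s > 0);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }

    // Install the handler without SA_RESTART so waitpid() fails with EINTR
    // when the alarm fires, instead of being silently restarted.
    struct sigaction sa {};
    sa.sa_handler = onAlarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    struct sigaction old {};
    sigaction(SIGALRM, &sa, &old);
    alarm(timeout_s);

    int status = 0;
    bool timed_out = false;
    pid_t r;
    while ((r = waitpid(pid, &status, 0)) < 0 && errno == EINTR) {
        // Alarm fired: treat the candidate as hung, kill it, and loop to reap it.
        timed_out = true;
        kill(pid, SIGKILL);
    }
    alarm(0);                          // cancel any pending alarm
    sigaction(SIGALRM, &old, nullptr); // restore the previous handler

    if (r < 0 || timed_out || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

For example, running `sleep 10` with a 1-second timeout returns -1, while `true` returns 0. Blocking in waitpid avoids the CPU cost of a polling loop.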

…conv-models

# Conflicts:
# projects/hipdnn/data_sdk/include/hipdnn_data_sdk/utilities/EngineNames.hpp

therock-pr-bot · 2026-06-30T20:24:39Z

❌ PR Check — Action Required

Check	Status	Details
🌿 Branch Name	✅ Pass	—
📝 PR Title/Description	✅ Pass	—
⛔ Forbidden Files	✅ Pass	—
🧪 Unit Test	✅ Pass	—
🔎 pre-commit	❌ Fail	Error: Check concluded with `failure`.
🚫 Draft PR	🔜 To Be Enabled	—
🚩 Feature Flag	🔜 To Be Enabled	—
📊 Code Coverage	🔜 To Be Enabled	—

⚠️ 1 policy check(s) failed. Please address the issues above before this PR can be Reviewed.

🚫 Please fix the failed policies

❌ pre-commit

The Not ready to Review label was added to this PR. Once all policies pass, the label is removed automatically.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

therock-pr-bot · 2026-06-30T20:24:40Z

🎉 All checks passed! This PR is ready for review.

…spatcher tests Run black 25.12.0 and clang-format 18 on all changed files to satisfy pre-commit. Add gfx90a unit tests to test_conv.py for dispatcher coverage. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

Add trailing newlines to model JSON files and remove extra blank line in rocke_conv_engine/CMakeLists.txt. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

…istics

- Add missing stride_w/pad_w to validation results (validate_ml_vs_oracle_conv)
- Convert sweep latency_us to latency_ms to match training pipeline (gen_conv_sweep_data)
- Cache ConvMLHeuristic across buildPlan calls to avoid reloading model from disk
- Remove dead depthwise coverage section (C/G=1 fails MFMA alignment constraint)

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
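For context on the latency-unit fix, here is a sketch of how a TFLOPS value is commonly derived from a latency in milliseconds. It assumes the standard direct-convolution FLOP count, 2 × N × K × Ho × Wo × (C/G) × Y × X (one multiply-add counted as 2 FLOPs), and a dilation of 1 unless given; the PR does not state its exact formula or mention dilation. `ConvShape`, `outDim`, `convFwdFlops`, `tflopsFromMs`, and `usToMs` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical grouped-conv shape (NCHW logical dims, G groups).
struct ConvShape {
    int64_t N, C, Hi, Wi;  // input
    int64_t K, Y, X;       // output channels, filter height/width
    int64_t G;             // groups
    int64_t stride_h, stride_w, pad_h, pad_w;
    int64_t dil_h = 1, dil_w = 1;  // assumption: dilation defaults to 1
};

// Standard conv output size: floor((in + 2*pad - dil*(k-1) - 1) / stride) + 1.
int64_t outDim(int64_t in, int64_t k, int64_t stride, int64_t pad, int64_t dil) {
    return (in + 2 * pad - dil * (k - 1) - 1) / stride + 1;
}

// Forward FLOPs, counting each multiply-add as 2 operations.
// The leading 2.0 makes the whole product double, avoiding int64 overflow.
double convFwdFlops(const ConvShape& s) {
    assert(s.G > 0 && s.C % s.G == 0 && s.K % s.G == 0);
    const int64_t Ho = outDim(s.Hi, s.Y, s.stride_h, s.pad_h, s.dil_h);
    const int64_t Wo = outDim(s.Wi, s.X, s.stride_w, s.pad_w, s.dil_w);
    return 2.0 * s.N * s.K * Ho * Wo * (s.C / s.G) * s.Y * s.X;
}

// TFLOPS from milliseconds: flops / (ms * 1e-3 s) / 1e12 = flops / (ms * 1e9).
double tflopsFromMs(double flops, double latency_ms) {
    assert(latency_ms > 0.0);
    return flops / (latency_ms * 1e9);
}

// Unit conversion applied before computing TFLOPS when a sweep reports microseconds.
double usToMs(double latency_us) { return latency_us / 1000.0; }
```

Mixing units here silently scales every training target by 1000, which is why the sweep and the training pipeline must agree.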

therock-pr-bot · 2026-06-30T22:10:37Z

Pre-commit check failed

⛔ pre-commit failed

Please run locally:

python -m pip install pre-commit
pre-commit install
pre-commit run --all-files --show-diff-on-failure

This repo uses .pre-commit-config.yaml.

BradPepersAMD · 2026-06-30T22:51:00Z

This work is very cool and usefully shows a bunch of the pieces that we can build on and learn from. The base library is still in a lot of flux, and things will likely move around in ways that break this PR. We are also trying to focus on SDPA, so we may delay landing convs until we are sure about how we organize the SDPA work. But all the pieces of the conv work are coming together here nicely, and we will land some version of this in the next few weeks!

…ic tooling

Simplify ConvFwdParams to only fields needed at execute time (UIDs, byte sizes, grid/block, kernel name), pre-compute byte sizes at build time. Eliminate two heap allocations per predict_tflops call via pre-allocated member buffers. Replace mkstemp with memfd_create and busy-wait with blocking waitpid+alarm in sweep tool. Consolidate duplicated Python helpers (HEADER, SHAPE_COLS, bucket functions, write_csv) into canonical locations.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
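A minimal sketch of the memfd_create approach, assuming a Linux target with glibc 2.27 or newer. The PR does not show how the sweep tool uses the descriptor; handing it to the exec'd child to write results into is an assumption here, and `makeResultBuffer` and `readAll` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>
#include <sys/mman.h>  // memfd_create (Linux, glibc >= 2.27)
#include <unistd.h>

// Create an anonymous, memory-backed file. Unlike mkstemp(), nothing is
// created on disk, so there is no path to unlink afterwards. Without
// MFD_CLOEXEC the descriptor is inherited across fork() and exec().
int makeResultBuffer() {
    return memfd_create("sweep_result", 0);
}

// Read the whole contents back from offset 0 (e.g. after a child exits).
std::string readAll(int fd) {
    assert(fd >= 0);
    std::string out;
    if (lseek(fd, 0, SEEK_SET) < 0) return out;
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<size_t>(n));
    }
    return out;
}
```

The file disappears automatically when the last descriptor is closed, so a crashed or killed child cannot leave temporary files behind.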

…n changes Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>

codecov-commenter · 2026-07-01T00:30:44Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8982   +/-   ##
========================================
  Coverage    71.33%   71.33%           
========================================
  Files         2628     2628           
  Lines       413043   413043           
  Branches     61875    61875           
========================================
+ Hits        294613   294617    +4     
+ Misses       96656    96653    -3     
+ Partials     21774    21773    -1

Flag	Coverage Δ		*Carryforward flag
TensileLite	`76.65% <ø> (ø)`		Carriedforward from 80f384f
hipBLAS	`90.81% <ø> (ø)`		Carriedforward from 80f384f
hipBLASLt	`41.35% <ø> (ø)`		Carriedforward from 80f384f
hipCUB	`82.68% <ø> (ø)`		Carriedforward from 80f384f
hipDNN	`85.92% <ø> (+0.01%)`	⬆️
hipFFT	`50.17% <ø> (ø)`		Carriedforward from 80f384f
hipRAND	`76.12% <ø> (ø)`		Carriedforward from 80f384f
hipSOLVER	`69.18% <ø> (ø)`		Carriedforward from 80f384f
hipSPARSE	`86.55% <ø> (ø)`		Carriedforward from 80f384f
rocBLAS	`48.06% <ø> (ø)`		Carriedforward from 80f384f
rocFFT	`46.30% <ø> (ø)`		Carriedforward from 80f384f
rocRAND	`57.07% <ø> (ø)`		Carriedforward from 80f384f
rocSOLVER	`76.92% <ø> (ø)`		Carriedforward from 80f384f
rocSPARSE	`72.37% <ø> (ø)`		Carriedforward from 80f384f
rocThrust	`91.36% <ø> (ø)`		Carriedforward from 80f384f

*This pull request uses carry forward flags.

Files with missing lines	Coverage Δ
.../include/hipdnn_data_sdk/utilities/EngineNames.hpp	`96.23% <ø> (ø)`

... and 3 files with indirect coverage changes


…conv-models

# Conflicts:
# dnn-providers/hip-kernel-provider/rocke/platform/Cpp/include/rocke/conv_ml_heuristic.h
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/augment_coverage_conv.py
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/gen_conv_sweep_data.py
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/gen_sweep_data.py
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/generate_coverage_conv.py
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx90a/feature_spec.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx90a/model_tflops.lgbm.gz
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx90a/train_manifest.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx942/feature_spec.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx942/model_tflops.lgbm.gz
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx942/train_manifest.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx950/feature_spec.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx950/model_tflops.lgbm.gz
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/models/grouped_conv_forward_fp16_gfx950/train_manifest.json
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/sample_shapes_conv.py
# dnn-providers/hip-kernel-provider/rocke/platform/Python/rocke/heuristics/validate_ml_vs_oracle_conv.py

…alid refs

all_buf_{kConvFeatureCount} used brace-init, which selects the initializer_list<double> ctor, creating a 1-element vector (value 109.0) instead of a 109-element buffer: a heap overflow on every predict_tflops call. Use explicit construction instead. Also fix two _valid() call sites in generate_coverage_conv.py missed during the _valid → conv_shape_valid rename.

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
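This is the classic `std::vector` brace-initialization pitfall: with braces, a constructor taking `std::initializer_list` wins overload resolution whenever the argument converts to the element type without narrowing. A minimal reproduction using the count of 109 stated in the commit; the two functions are hypothetical stand-ins for the `all_buf_` member initializer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kConvFeatureCount = 109;  // count stated in the commit message

// Buggy: braces prefer the std::initializer_list<double> constructor, so this
// builds a 1-element vector holding 109.0, not 109 zeros. Writing 109 features
// into it overflows the heap allocation.
std::vector<double> makeBufferBuggy() {
    return std::vector<double>{kConvFeatureCount};
}

// Fixed: parentheses select the size constructor, giving 109 value-initialized
// (zero) elements.
std::vector<double> makeBufferFixed() {
    return std::vector<double>(kConvFeatureCount);
}
```

The buggy form compiles only because `kConvFeatureCount` is a constant expression that converts to `double` exactly; with a non-constant `size_t` argument, the braced form would be rejected as a narrowing conversion.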

cderb and others added 3 commits June 27, 2026 01:58

Merge remote-tracking branch 'origin/develop' into users/cderb/rocke-…

5a810ee

github-actions added project: hipdnn project: hip-kernel-provider labels Jun 30, 2026

therock-pr-bot added the Not ready to Review label Jun 30, 2026

assistant-librarian added the organization: ROCm label Jun 30, 2026

chore(hip-kernel-provider): fix pre-commit end-of-file and cmake-lint

d6eee6a

cderb requested review from bartekxk and yraparti June 30, 2026 21:14

therock-pr-bot removed the Not ready to Review label Jun 30, 2026

cderb marked this pull request as ready for review June 30, 2026 21:24

cderb requested review from a team as code owners June 30, 2026 21:24

cderb and others added 2 commits June 30, 2026 18:55

style(hip-kernel-provider): apply black + clang-format to optimizatio…

46a90ac

cderb and others added 2 commits July 1, 2026 13:07

cderb wants to merge 10 commits into develop from users/cderb/rocke-conv-models

3 participants

Uh oh!

Conversation

cderb commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Conv model stats (5-fold grouped CV, 2000 LightGBM estimators)

Risk Assessment

ASIC Coverage

Testing Summary

Testing Checklist

Technical Changes

Uh oh!

therock-pr-bot Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

❌ PR Check — Action Required

Uh oh!

therock-pr-bot Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

therock-pr-bot Bot commented Jun 30, 2026

Pre-commit check failed

Uh oh!

BradPepersAMD commented Jun 30, 2026

Uh oh!

codecov-commenter commented Jul 1, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cderb commented Jun 30, 2026 •

edited

Loading

therock-pr-bot Bot commented Jun 30, 2026 •

edited

Loading

therock-pr-bot Bot commented Jun 30, 2026 •

edited

Loading