[hipDNN] ALMIOPEN-2075 Add real-workload benchmark graph collection (DVC) #8795

Open
SamuelReeder wants to merge 4 commits into develop from users/sareeder/sdpa-benchmark-suites

@SamuelReeder SamuelReeder commented Jun 24, 2026

Summary

Adds a curated collection of ~9,000 real-workload hipDNN graphs (SDPA, GEMM, Conv, Normalization, and the other supported hipDNN node types) for the Perf & benchmarking app, tracked via DVC so the JSON blobs live on the remote and only pointers are committed. Graphs come from real OSS model configs and an open model-crawler op dump; no customer data is included.

Risk Assessment

Low risk. In-repo footprint is data/metadata only — DVC pointers, per-bucket .gitignores, and two sample graph fixtures. No product/library code, build configuration, or public APIs change. The collection blobs are served from the DVC remote, so blast radius is limited to consumers of the benchmark Workloads.

Testing Summary

  • Graph validation: every collected graph passes opgraph-level deserialization against the real hipdnn_frontend.
  • DVC integrity: all bundle blobs verified present and in sync on the remote.
  • Data-safety: confirmed no sensitive/MAD-internal graphs are present in any pushed bundle.

Testing Checklist

  • Graph deserialization (opgraph level) - python check_deserialize.py --level opgraph "<collection>/**/*.json" - Status: Passed
  • DVC remote sync - dvc status -c - Status: Passed (0 out-of-sync)
  • PR CI - GitHub PR checks - Status: Pending
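
As a rough illustration of what a JSON-only validation pass over a collection might look like, here is a minimal sketch. It is not the contents of check_deserialize.py: the function name `check_json_level` is hypothetical, and it only confirms that each file parses as JSON. The opgraph level needs hipdnn_frontend and is not reproduced here.

```python
import glob
import json


def check_json_level(pattern):
    """Split the files matching `pattern` into those that parse as JSON and those that do not.

    Hypothetical helper for illustration only; it never touches hipdnn_frontend.
    Returns (passed, failed), where failed holds (path, error message) pairs.
    """
    passed, failed = [], []
    # recursive=True lets "**" in the pattern match nested op folders.
    for path in sorted(glob.glob(pattern, recursive=True)):
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)
        except (OSError, ValueError) as error:
            # json.JSONDecodeError is a subclass of ValueError.
            failed.append((path, str(error)))
        else:
            passed.append(path)
    return passed, failed
```

Calling `check_json_level("<collection>/**/*.json")` with the real collection path would return the two lists; a non-empty `failed` list is the cue to inspect those files before attempting the heavier opgraph-level check.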

Technical Changes

  • Track the benchmark graph collection as DVC tarballs under Workloads/, split into per-model bundles (Workloads/models/) and per-library shape-sweep bundles (Workloads/microbench/); only .tar.gz.dvc pointers are committed, blobs are gitignored and pushed to the DVC remote.
  • Source every graph from a real workload: OSS model configs (Llama, Qwen3, GLM, etc.) and an open model-crawler op dump. Quantized/fp8/fp4 tensors are mapped to the nearest supported dtype; ops without a hipDNN mapping (MoE/grouped-gemm, rotary, transposed/1D/3D conv, etc.) are documented and skipped rather than fabricated.
  • Keep tarball internal paths at the op-folder level (e.g. {gemm,conv,sdpa}/...) so each bundle extracts cleanly without a redundant source/model prefix.
  • Add MANIFEST.md/INDEX.md provenance bundled alongside the collection, two sample graph fixtures (sample_add, sample_relu), and per-bucket .gitignore entries for the tarball blobs.
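
The op-folder-level path layout can be sketched with the standard tarfile module. This is an assumption about how such a bundle could be produced, not the script used for this PR; `bundle_at_op_level` and `top_level_folders` are hypothetical names.

```python
import os
import tarfile


def bundle_at_op_level(source_root, tar_path):
    """Pack every file under source_root, storing paths relative to source_root.

    With source_root pointing at one model's folder, members are stored as
    gemm/..., conv/..., sdpa/... with no model or source prefix.
    """
    with tarfile.open(tar_path, "w:gz") as archive:
        for dirpath, _dirnames, filenames in os.walk(source_root):
            for name in sorted(filenames):
                full_path = os.path.join(dirpath, name)
                # Strip the source_root prefix and normalise separators for the archive.
                arcname = os.path.relpath(full_path, source_root).replace(os.sep, "/")
                archive.add(full_path, arcname=arcname)


def top_level_folders(tar_path):
    """Return the sorted set of first path components stored in the tarball."""
    with tarfile.open(tar_path, "r:gz") as archive:
        return sorted({member.name.split("/")[0] for member in archive.getmembers()})
```

Checking `top_level_folders` on a freshly built bundle is a cheap way to confirm that only op folders appear at the top level before the blob is pushed to the DVC remote. Write the tarball outside `source_root` so it is not swept into its own archive.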

… (DVC)

Curate ~9,000 real-workload hipDNN graphs (SDPA, GEMM, Conv, Normalization, etc.)
for the Perf & benchmarking app, tracked via DVC: only .tar.gz.dvc pointers are
committed, blobs live on the DVC remote.

- Split bundles into per-model (Workloads/models/) and per-library shape-sweep
  (Workloads/microbench/) tarballs; tarball internal paths kept at op-folder level.
- Every graph sourced from a real workload (OSS model configs + open model-crawler
  op dump); quantized dtypes mapped to nearest supported; unsupported ops documented
  and skipped, never fabricated. No customer/MAD-internal data included.
- Add MANIFEST.md/INDEX.md provenance, sample graph fixtures, and per-bucket
  .gitignore for tarball blobs.
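
The "map to nearest supported dtype, skip what has no mapping" policy could look roughly like the sketch below. Everything in it is a placeholder: the PR does not list the actual dtype table or the exact op names, so `SUPPORTED_DTYPES`, `NEAREST_SUPPORTED`, `UNMAPPED_OPS`, and `convert_op` are hypothetical and only illustrate the shape of the logic.

```python
# Placeholder tables: the real supported-dtype set and fallbacks are not given in this PR.
SUPPORTED_DTYPES = {"float32", "float16", "bfloat16"}
NEAREST_SUPPORTED = {"fp8": "float16", "fp4": "float16"}  # assumed fallbacks
UNMAPPED_OPS = {"grouped_gemm", "rotary", "conv_transpose"}  # assumed op names


def convert_op(op, skipped):
    """Return a copy of `op` with a supported dtype, or None if the op is skipped.

    Skipped ops are appended to `skipped` with a reason so they are documented
    rather than silently dropped or replaced by an invented graph.
    """
    if op["type"] in UNMAPPED_OPS:
        skipped.append((op["type"], "no hipDNN mapping"))
        return None
    dtype = op["dtype"]
    mapped = dtype if dtype in SUPPORTED_DTYPES else NEAREST_SUPPORTED.get(dtype)
    if mapped is None:
        skipped.append((op["type"], f"no fallback for dtype {dtype}"))
        return None
    return {**op, "dtype": mapped}
```

Collecting the reasons in a list, instead of raising, keeps one unsupported op from aborting the whole conversion and gives a ready-made record of what was left out.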
@SamuelReeder SamuelReeder force-pushed the users/sareeder/sdpa-benchmark-suites branch from 079fd4b to 633eaac Compare June 24, 2026 22:58
Standalone helper (tools/check_deserialize.py) that verifies benchmark graph JSON
deserializes/validates at three levels (json / graph / opgraph) without building a
plan or running a kernel. Document it under the Workloads section of the README.
…s no GPU

build_operation_graph only assembles/finalizes the operation-graph descriptor;
the sole device-properties query is in the engine-heuristics path the script never
reaches. Drop the inaccurate GPU/ranked-engine claims from the docstring and README.

I kept this script separate from dnn-benchmarking to narrow scope. It's a low-effort utility that may occasionally be useful. Could move elsewhere or drop if desired.

I have some other scripts for generating JSON graphs for each op from their direct params, and another for scraping modelcrawler to generate graphs. I didn't commit them because the use-cases would be scarce, but I could throw them in another tools directory or mlse-tools-internal.

opgraph is a strict superset of graph with identical prerequisites and per-stage
error messages, so graph added nothing. Keep json (no build) and opgraph (full);
default to opgraph.

modelcrawler has a convenient way to sort shapes by the number of models that use them and the number of cumulative calls across models, so I only included the 256 most frequent shapes per op out of ~173,500 total unique shapes.

This subset may be more valuable than the complete set: it is feasible to run, and the selected shapes are the ones that occur most often.

Perhaps the collection script should be added to mlse-tools-internal at the very least to recollect the popular shapes on-demand.
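
For reference, the per-op selection described above can be sketched as follows. This is not the uncommitted collection script; the record layout, the name `most_frequent_shapes`, and the tie-breaking order (model count first, then cumulative calls) are assumptions made for illustration.

```python
from collections import defaultdict


def most_frequent_shapes(records, per_op=256):
    """Pick the most frequent shapes for each op.

    records: iterable of (op, shape, model, calls) tuples (assumed layout).
    Shapes are ranked by how many distinct models use them, then by
    cumulative calls across models. Returns {op: [shape, ...]}.
    """
    models = defaultdict(set)
    calls = defaultdict(int)
    for op, shape, model, call_count in records:
        key = (op, tuple(shape))
        models[key].add(model)
        calls[key] += call_count

    by_op = defaultdict(list)
    for op, shape in models:
        by_op[op].append(shape)

    return {
        op: sorted(
            shapes,
            # Negate the counts for descending order; the shape breaks remaining ties.
            key=lambda s: (-len(models[(op, s)]), -calls[(op, s)], s),
        )[:per_op]
        for op, shapes in by_op.items()
    }
```

Because the ranking is deterministic, re-running the selection on the same dump yields the same subset, which matters if the popular shapes are recollected on demand.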

@SamuelReeder SamuelReeder marked this pull request as ready for review June 25, 2026 01:29
@SamuelReeder SamuelReeder requested a review from a team as a code owner June 25, 2026 01:29
@SamuelReeder SamuelReeder self-assigned this Jun 25, 2026
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.89%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8795   +/-   ##
========================================
  Coverage    71.43%   71.44%           
========================================
  Files         2612     2612           
  Lines       406992   407023   +31     
  Branches     60771    60771           
========================================
+ Hits        290730   290770   +40     
+ Misses       94960    94953    -7     
+ Partials     21302    21300    -2     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| TensileLite | 76.83% <ø> (ø) | Carriedforward from 633eaac |
| hipBLAS | 90.81% <ø> (ø) | Carriedforward from 633eaac |
| hipBLASLt | 41.36% <ø> (ø) | Carriedforward from 633eaac |
| hipCUB | 82.68% <ø> (ø) | Carriedforward from 633eaac |
| hipDNN | 86.74% <ø> (+0.03%) ⬆️ | |
| hipFFT | 50.79% <ø> (ø) | Carriedforward from 633eaac |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from 633eaac |
| hipSOLVER | 69.18% <ø> (ø) | Carriedforward from 633eaac |
| hipSPARSE | 86.55% <ø> (ø) | Carriedforward from 633eaac |
| rocBLAS | 48.08% <ø> (ø) | Carriedforward from 633eaac |
| rocFFT | 47.22% <ø> (ø) | Carriedforward from 633eaac |
| rocRAND | 57.07% <ø> (ø) | Carriedforward from 633eaac |
| rocSOLVER | 77.89% <ø> (ø) | Carriedforward from 633eaac |
| rocSPARSE | 72.37% <ø> (ø) | Carriedforward from 633eaac |
| rocThrust | 91.34% <ø> (ø) | Carriedforward from 633eaac |

*This pull request uses carry forward flags.
see 8 files with indirect coverage changes
