
[origami] Consume Origami as a findable package; stop duplicate origami builds#7996

Merged
davidd-amd merged 9 commits into develop from users/davidd-amd/tensilelite-shared-p1-origami on Jun 16, 2026

@davidd-amd

Contributor

Motivation

hipBLASLt and hipSPARSELt each built their own liborigami.so via add_subdirectory,
duplicating the library and its exports. This re-enables Origami as an installed,
findable package so it is built once and shared.

What this does

  • hipBLASLt: replaces the if(FALSE) / add_subdirectory(.../shared/origami) fallback
    (disabled by [origami] Enable build in TheRock #3687 "until origami is available in TheRock") with
    find_package(origami REQUIRED). Origami is now built/installed in both TheRock and the
    monorepo superbuild, so the disable was a stale merge omission.
  • hipSPARSELt: drops the now-dead ORIGAMI_ENABLE_INSTALL OFF shim (it only mattered
    while hipBLASLt sub-built origami).
  • Origami CMake hardening (additive, reversible):
    • generate_export_header (ORIGAMI_EXPORT) + CXX_VISIBILITY_PRESET hidden, so
      the exported ABI is the public API, not "everything".
    • FILE_SET HEADERS public-header contract + VERIFY_INTERFACE_HEADER_SETS (every public
      header compiles standalone).
    • hip::host on the headers interface (public headers use hip types); hip::host demoted
      to PRIVATE on the library itself.
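
A minimal C++ sketch of the export pattern in the bullets above, assuming GCC/Clang on an ELF platform. The macro body is a hand-written simplification of what generate_export_header emits (the generated header also handles Windows dllexport/dllimport and extra macros such as a NO_EXPORT variant), and origami_sketch, public_tile_count, and detail::ceil_div are hypothetical names, not Origami's API.

```cpp
// Hand-written simplification of an export header in the style of CMake's
// generate_export_header. Assumes GCC/Clang; ORIGAMI_STATIC mirrors the
// static-build define mentioned in this PR.
#if defined(ORIGAMI_STATIC)
#  define ORIGAMI_EXPORT                     // static library: nothing to export
#elif defined(__GNUC__)
#  define ORIGAMI_EXPORT __attribute__((visibility("default")))
#else
#  define ORIGAMI_EXPORT                     // other toolchains: not covered here
#endif

namespace origami_sketch {  // hypothetical namespace, not Origami's API

namespace detail {
// No ORIGAMI_EXPORT: when the library is compiled with -fvisibility=hidden
// (what CXX_VISIBILITY_PRESET hidden passes to GCC/Clang), this helper is
// left out of the shared library's dynamic symbol table.
int ceil_div(int a, int b) { return (a + b - 1) / b; }
}  // namespace detail

// Public entry point: explicitly given default visibility, so it stays
// exported even under -fvisibility=hidden.
ORIGAMI_EXPORT int public_tile_count(int m, int n, int tile) {
    return detail::ceil_div(m, tile) * detail::ceil_div(n, tile);
}

}  // namespace origami_sketch
```

Compiled into a shared library with -fvisibility=hidden, only public_tile_count should show up among the exported dynamic symbols (for example in nm -DC output); ceil_div stays internal.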

Validation (Docker, amdclang++ 23)

  • Origami builds SHARED; internal helpers now hidden (dynamic-defined exports 378 → 325).
  • All 11 public headers pass all_verify_interface_header_sets.
  • A standalone find_package(origami) consumer compiles, links, and runs; ldd resolves the
  • hipBLASLt build tree contains zero origami .o objects / no origami build target —

Notes

  • Bottom of a 3-PR stack (Origami → TensileLite → device-codegen).
  • Reversible: re-comment find_package(origami) and restore the add_subdirectory fallback.
  • TheRock delta: re-enables the integration [origami] Enable build in TheRock #3687 disabled; new
    hipBLASLt → origami / hipSPARSELt → origami package dependency edges.

🤖 Generated with Claude Code

@codecov-commenter

codecov-commenter commented Jun 5, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7996   +/-   ##
========================================
  Coverage    65.05%   65.05%           
========================================
  Files         2597     2597           
  Lines       403887   403887           
  Branches     60168    60168           
========================================
  Hits        262717   262717           
  Misses      121929   121929           
  Partials     19241    19241           
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| TensileLite | 29.71% <ø> (ø) | Carriedforward from f6f7a00 |
| hipBLAS | 90.65% <ø> (ø) | Carriedforward from f6f7a00 |
| hipBLASLt | 41.20% <ø> (ø) | |
| hipCUB | 82.68% <ø> (ø) | Carriedforward from f6f7a00 |
| hipDNN | 86.68% <ø> (ø) | Carriedforward from f6f7a00 |
| hipFFT | 50.97% <ø> (ø) | Carriedforward from f6f7a00 |
| hipRAND | 76.12% <ø> (ø) | Carriedforward from f6f7a00 |
| hipSOLVER | 69.18% <ø> (ø) | Carriedforward from f6f7a00 |
| hipSPARSE | 86.55% <ø> (ø) | Carriedforward from f6f7a00 |
| rocBLAS | 48.10% <ø> (ø) | Carriedforward from f6f7a00 |
| rocFFT | 49.48% <ø> (ø) | Carriedforward from f6f7a00 |
| rocRAND | 57.02% <ø> (ø) | Carriedforward from f6f7a00 |
| rocSOLVER | 77.83% <ø> (ø) | Carriedforward from f6f7a00 |
| rocSPARSE | 72.64% <ø> (ø) | Carriedforward from f6f7a00 |
| rocThrust | 91.34% <ø> (ø) | Carriedforward from f6f7a00 |

*This pull request uses carry-forward flags.


@talumbau talumbau self-requested a review June 5, 2026 22:23
@davidd-amd davidd-amd marked this pull request as ready for review June 10, 2026 22:59
@davidd-amd davidd-amd requested review from a team as code owners June 10, 2026 23:00
@davidd-amd davidd-amd force-pushed the users/davidd-amd/tensilelite-shared-p1-origami branch 2 times, most recently from 0bddd4f to 468c316 Compare June 11, 2026 23:14

@talumbau talumbau left a comment

Collaborator


Looks mostly good. I don't know the idiomatic usage of the docs/ directory, but I was surprised to see content like that in this PR. Is that standard?

Other issue is that I would generally prefer the build to fail if we can't find the origami from the build from TheRock (in the case that we are building in TheRock).

Comment thread projects/hipblaslt/CMakeLists.txt
Comment thread docs/therock-minimal-tpl-provisioning-without-full-sources.md Outdated
Comment thread docs/therock-publish-build-only-tpls-as-artifacts.md Outdated
@davidd-amd davidd-amd force-pushed the users/davidd-amd/tensilelite-shared-p1-origami branch from ccc8527 to 81bc34f Compare June 13, 2026 18:11
@davidd-amd

Contributor Author

@talumbau

spmm_quick_suite (Timeout): CI bundle contention, not a regression in this PR

The failing hipsparselt-test_spmm_quick_suite is a CI-environment timeout, not a code regression. The suite passes; it only exceeds its 300s cap when hipSPARSELt is tested inside the combined multi-project bundle this PR triggers.

Same suite, same ctest --parallel 8, same 300s cap:

| Run | Bundle | spmm_quick (cap 300s) | spmm_standard (cap 1800s) |
|---|---|---|---|
| #8404 (hipSPARSELt only) | isolated | 70.33s PASS | 145.36s PASS |
| this PR #7996 | rocblas, hipblaslt, tensilelite, hipblas, hipsparselt, rocroller | 300.85s TIMEOUT | 376.45s PASS |
| #8133 (p2) | rocblas, rocroller, hipblas, hipsparselt, hipblaslt | 300.71s TIMEOUT | 935.65s PASS |

The identical hipSPARSELt suite runs 70s in isolation but 300s+ in the bundle (and spmm_standard ran 935s in p2's bundle, passing only because its cap is 1800s). The variable is cross-project GPU contention from co-scheduled test jobs, not hipSPARSELt code.

Why this PR surfaces it: it edits projects/hipsparselt/ and shared/origami together, so CI groups all dependent projects into one bundle. Ordinary single-project hipSPARSELt PRs test in isolation and pass (#8404). It reproduces in p2 (#8133), so it is not specific to this PR's commits.

Not a perf regression: same-hardware A/B of develop vs this PR's origami change (built -O3 -DNDEBUG, full spmm filter) is at parity (develop 653s vs this PR 634s over 50971 cases).
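
As a quick check of the numbers above, here is a small C++ sketch with two hypothetical helpers, slowdown and relative_change, applied to the quoted timings. Because the bundled spmm_quick run was killed at its 300s cap, 300.85s is only a lower bound, so the slowdown computed from it is a lower bound as well.

```cpp
// Hypothetical helpers for comparing CI wall-clock timings (in seconds).

// How many times longer the bundled run took than the isolated run.
double slowdown(double isolated_s, double bundled_s) {
    return bundled_s / isolated_s;
}

// Fractional change of a candidate run relative to a baseline run;
// a negative value means the candidate was faster.
double relative_change(double baseline_s, double candidate_s) {
    return (candidate_s - baseline_s) / baseline_s;
}
```

With the inputs from the table and the A/B above, spmm_quick comes out at about 4.28x (a lower bound), spmm_standard in the p2 bundle at about 6.44x, and develop vs this PR at about -2.9% (the PR run slightly faster).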

Suggested fix is CI-side, not origami:

  • Raise the spmm_quick/quick category_timeouts in projects/hipsparselt/clients/test/test_categories.yaml (e.g. 300 -> 900). Precedent: [rocBLAS] Update test_categories.yaml #8441 raises rocBLAS quick: 900 -> 1800 ("increment timeout until we move to different test set"); rocBLAS quick is already 900-1800s while hipSPARSELt spmm_quick is still 300s.
  • Or avoid co-scheduling multiple projects' GPU test jobs on one GPU in the combined bundle.

Citations:

@davidd-amd

Contributor Author

> Looks mostly good. I don't know the idiomatic usage of the docs/ directory, but I was surprised to see content like that in this PR. Is that standard?
>
> Other issue is that I would generally prefer the build to fail if we can't find the origami from the build from TheRock (in the case that we are building in TheRock).

It will, once that path is available; that requires the corresponding PR in TheRock to merge first. Until then, the build would fail unnecessarily.

davidd-amd and others added 8 commits June 15, 2026 19:29
Replace origami's "export everything" behaviour (default visibility +
WINDOWS_EXPORT_ALL_SYMBOLS) with an explicitly controlled export surface:

- CMakeLists: CXX_VISIBILITY_PRESET hidden + VISIBILITY_INLINES_HIDDEN; add
  generate_export_header producing origami/origami_export.h (ORIGAMI_EXPORT
  macro, ORIGAMI_STATIC for static builds); wire the generated build-tree
  include dir into roc::origami-headers and install the header.
- Headers: annotate the public API (7 classes/structs + ~61 free functions
  across 8 headers) with ORIGAMI_EXPORT; include origami/origami_export.h.

Verified in hipblaslt-tpls:local (amdclang++ 23, SHARED build): origami
builds, links, installs; internal helpers are now hidden (308 local-text
symbols) and total dynamic-defined exports drop 378 -> 325. Residual
std::filesystem exports (~142) are a libstdc++ default-visibility artifact
pulled in via std::ofstream, pre-existing in baseline; stripping them fully
would need a linker version-script (optional follow-up).

Phase 1 G1 of TensileLite shared-library effort. Additive/reversible;
no namespace, install/export-primitive, or ORIGAMI_BUILD_SHARED_LIBS change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ation

Replace the enumerated INTERFACE target_sources + install(DIRECTORY include/)
with a CMake HEADERS file set on roc::origami-headers:

- target_sources(origami-headers INTERFACE FILE_SET HEADERS BASE_DIRS include +
  build-include FILES <10 public headers + generated origami_export.h>): the
  file set is now the single source of truth for the public surface; anything
  not listed is private by construction. BASE_DIRS carries the include roots
  into the build/install interfaces.
- Install the file set via install(TARGETS origami-headers EXPORT origami-targets
  FILE_SET HEADERS ...) (rocm_install's TARGETS wrapper cannot forward FILE_SET,
  so origami-headers is installed explicitly and origami alone goes through
  rocm_install). The exported origami-targets.cmake now carries the HEADERS set.
- set_target_properties(origami-headers VERIFY_INTERFACE_HEADER_SETS ON): every
  public header is now compiled standalone by all_verify_interface_header_sets.
- Public headers include <hip/hip_runtime.h> and use hip types, so the header
  interface requires hip: link hip::host INTERFACE on origami-headers (never
  hip::device, per the config guard) and demote origami's own hip::host link to
  PRIVATE. This both models the real requirement and lets the verify compile and
  header-only consumers (hipblaslt-rocroller) see hip's include dirs.

Verified in hipblaslt-tpls:local: all 11 public headers pass
all_verify_interface_header_sets; install tree contains exactly the public
headers in the correct origami/ layout; a standalone find_package(origami)
consumer configures, compiles, links (exported API resolves), and ldd resolves
to the single installed liborigami.so.1.

Phase 1 G2+G3 of TensileLite shared-library effort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…p double-building

Re-enable Origami as a findable package in both consumers so liborigami.so is
built once and shared, instead of being recompiled inside each consumer's tree.

hipBLASLt (G5): replace the `if(FALSE)` / add_subdirectory("../../shared/origami")
fallback (disabled by #3687 "until origami is available in TheRock") with a plain
`find_package(origami REQUIRED)` in the non-superbuild path. D1 review confirmed
the disable was a merge omission -- origami is now built/installed in both TheRock
and the monorepo superbuild (where this branch is skipped and roc::origami comes
from the superbuild's earlier add_subdirectory(shared/origami)). rocisa's own
`if(NOT TARGET roc::origami)` acquisition is left untouched (TensileLite scope).

hipSPARSELt (G6): drop the now-dead `set(ORIGAMI_ENABLE_INSTALL OFF)` shim (it
only mattered while hipBLASLt add_subdirectory'd origami) and declare the
dependency explicitly with `find_package(origami REQUIRED)`. Per Phase 1 scope,
`HIPBLASLT_ENABLE_HOST OFF`, `add_subdirectory(hipblaslt)`, and the
`TENSILE_STATIC_ONLY` reach-in are kept (Phase 2 removes them).

Validated in hipblaslt-tpls:local against an installed origami: hipBLASLt
configures via find_package(origami) (exit 0) and its build tree contains zero
origami .o objects, no origami.dir target, and no liborigami build rule -- the
only liborigami reference is the installed /tmp/oi/lib/liborigami.so.1.0 it links
against. Full hipBLASLt+hipSPARSELt build (36-export collapse + ldd gates) is the
G7 validation, pending.

Phase 1 G5+G6 of TensileLite shared-library effort. Reversible (§6.1).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… block

hipsparselt does not directly depend on origami — it is hipBLASLt's (and
tensilelite-host's) dependency. hipBLASLt's own find_package(origami REQUIRED)
at line 262 fires before hipsparselt uses any origami target, so the explicit
call in the hipsparselt block was redundant. Remove it and keep only the
comment documenting why the old ORIGAMI_ENABLE_INSTALL shim is gone.

Feedback from /cmake:grade finding SC-55 (dependency ownership clarity).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…oller)

find_package(origami REQUIRED) requires a pre-installed/known origami, which
breaks a standalone hipBLASLt configure (no origami in CMAKE_PREFIX_PATH).

Mirror the rocroller pattern: find_package when HIPBLASLT_ENABLE_THEROCK, else
fetch/build origami as a subdirectory. The TheRock/superbuild path still uses the
single installed/super-built copy (the dedup that matters in a shared prefix); the
add_subdirectory is the dev-build fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…MI_ENABLE_INSTALL

So the origami dedup can merge into rocm-libraries ahead of the TheRock side
without breaking TheRock's build:

- hipBLASLt consumes origami via a QUIET find_package probe, falling back to
  add_subdirectory(... EXCLUDE_FROM_ALL) when origami is not yet a findable
  package. EXCLUDE_FROM_ALL orphans origami's install rules (lib, headers,
  license, export) in the embedding build -- no duplicate install, no
  unclaimed-license artifact-accounting failure. Probe (not REQUIRED) keeps
  TheRock building today and auto-upgrades to the single installed origami
  once it ships.
- origami's nanobind dependency is found first and only FetchContent-fetched as a
  fallback, so a hermetic build that already provides nanobind does not fetch.
- Drop the ORIGAMI_ENABLE_INSTALL option and its install gate: every embedding
  consumer (hipBLASLt, rocisa, origami's own python) uses EXCLUDE_FROM_ALL to
  suppress the install, so the toggle is redundant. origami's install is now
  unconditional; standalone/superbuild/TheRock install it, EXCLUDE_FROM_ALL
  embeds do not.

Validated in hipblaslt-tpls:local: origami standalone still configures+installs
(lib/license/config/export); hipBLASLt standalone configure orphans origami's
install (top-level cmake_install does not descend into the origami subdir).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove the comments added on this branch (origami/hipBLASLt CMake), keeping only
the SPDX/copyright headers and develop's pre-existing comments. No code changes --
comment/blank lines only; C++ headers (and their #include directives) untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…red-build link

The G1 visibility refactor set CXX_VISIBILITY_PRESET hidden + per-symbol
ORIGAMI_EXPORT annotations but missed logger.hpp. Under the standalone
SHARED build, Logger::{instance,log,flush,update_from_env} (defined in
logger.cpp) were hidden, breaking origami-tests link with undefined
symbols. Annotate the four public out-of-line methods to match the
per-method export style used across the other origami headers.

The spmm clients verify every case against a CPU OpenBLAS reference
(cblas_gemm). The test env pinned OPENBLAS_NUM_THREADS=OMP_NUM_THREADS=48
regardless of ctest --parallel 8, so up to 8 concurrent test processes
spawned ~384 reference-BLAS threads, oversubscribing the CPU and inflating
suite runtime. This pushed spmm_quick_suite (300s cap) into Timeout when
hipsparselt is exercised in the combined multi-project test bundle, while
the same suite passes in isolation. Drop the per-process thread count to 6
so 8 parallel processes use ~48 threads total.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
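
The thread arithmetic in the last commit message can be checked at compile time. A minimal C++ sketch: total_threads and threads_per_proc_for_budget are hypothetical helpers, and the 48-thread budget is an assumption taken from the old per-process setting rather than a measured core count.

```cpp
// Hypothetical helpers for sizing per-process BLAS/OpenMP threads.

// Total reference-BLAS threads when `procs` ctest processes each spawn
// `threads_per_proc` threads.
constexpr int total_threads(int procs, int threads_per_proc) {
    return procs * threads_per_proc;
}

// Largest per-process thread count that keeps the total within a budget
// (never below one thread per process).
constexpr int threads_per_proc_for_budget(int procs, int budget) {
    return budget / procs > 0 ? budget / procs : 1;
}

static_assert(total_threads(8, 48) == 384, "before: 8 procs x 48 threads");
static_assert(total_threads(8, 6) == 48, "after: 8 procs x 6 threads");
static_assert(threads_per_proc_for_budget(8, 48) == 6,
              "48-thread budget split across ctest --parallel 8");
```

The static_asserts fail the build if the numbers drift; splitting the assumed 48-thread budget eight ways gives 6, which matches the value the commit chose.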
@davidd-amd davidd-amd force-pushed the users/davidd-amd/tensilelite-shared-p1-origami branch from 81bc34f to 1791b67 Compare June 15, 2026 19:29

@aliry95amd aliry95amd left a comment

Contributor


From the Origami perspective, it looks good to me. My only concern is the new ORIGAMI_EXPORT macro. I am not really familiar with the best practices here, but is it possible to apply the macro as a block to all functions instead of touching every single one?

@davidd-amd davidd-amd self-assigned this Jun 15, 2026
@davidd-amd

davidd-amd commented Jun 15, 2026

Contributor Author

> From the Origami perspective, it looks good to me. My only concern is the new ORIGAMI_EXPORT macro. I am not really familiar with the best practices here, but is it possible to apply the macro as a block to all functions instead of touching every single one?

Good suggestion, and I did look at consolidating these (class-scope on the types, or a #pragma GCC visibility push/pop region around the free functions) to cut the per-function annotations. I'm deliberately keeping the per-symbol labels, for one main reason: failure-mode safety under -fvisibility=hidden.

With per-function labels, anything without a macro is hidden, so a forgotten/copy-pasted declaration fails as a missing export -> link error: loud and immediate. With class-scope or a visibility region, the export comes from context rather than the declaration, so a new private method on an exported class, or a line pasted into a push(default) block, gets silently exported - no error, just a widened surface and a re-opened interposition hazard. For a shared/ library that a lot of people edit, biasing every mistake toward the loud/safe direction is worth the extra annotations, and it keeps the public surface grep-able (grep ORIGAMI_EXPORT == the API). This also matches how libc++ does it (explicit macros at each declaration; the bare region pragma reserved for things like template instantiations and extern "C").

That said, I used class-scope labels in a few places, and I may open a follow-on PR to dial those details in, to be certain we aren't doing something unintended such as exporting a vtable when someone copy-pastes code. Per-function labels are only equivalent to class scope for non-polymorphic types whose RTTI never crosses the library boundary.
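
A minimal C++ sketch of the two styles discussed above, assuming GCC/Clang and a library built with -fvisibility=hidden; ClassScoped and PerMember are hypothetical types. It shows why a class-scope label exports more than the declarations it sits next to, and why per-member labels alone are only safe for non-polymorphic types.

```cpp
#if defined(__GNUC__)
#  define ORIGAMI_EXPORT __attribute__((visibility("default")))
#else
#  define ORIGAMI_EXPORT
#endif

namespace origami_sketch {  // hypothetical types, not Origami's API

// Class-scope export: every member, including private helpers added later,
// plus the vtable and typeinfo of this polymorphic class, gets default
// visibility.
class ORIGAMI_EXPORT ClassScoped {
public:
    virtual ~ClassScoped() = default;
    virtual int cost(int m) const;
private:
    int helper(int m) const;  // exported too, even though it is private
};

// Per-member export: only cost() is exported; helper() stays hidden under
// -fvisibility=hidden. If cost() lacked ORIGAMI_EXPORT, a consumer calling it
// would get an undefined-symbol link error instead of a silent export.
// Deliberately non-polymorphic: with per-member labels only, a class with
// virtual functions would get a hidden vtable and typeinfo, which can break
// dynamic_cast, typeid, or exception matching across the library boundary.
class PerMember {
public:
    ORIGAMI_EXPORT int cost(int m) const;
private:
    int helper(int m) const;
};

int ClassScoped::cost(int m) const { return helper(m) + 1; }
int ClassScoped::helper(int m) const { return 2 * m; }
int PerMember::cost(int m) const { return helper(m) + 1; }
int PerMember::helper(int m) const { return 2 * m; }

}  // namespace origami_sketch
```

Both classes behave identically when called; the difference only shows up in the shared library's dynamic symbol table, where ClassScoped also exports helper, its vtable, and its typeinfo.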

@davidd-amd

Contributor Author

Overriding infra approval requirement.

@davidd-amd davidd-amd merged commit b94027a into develop Jun 16, 2026
61 checks passed
@davidd-amd davidd-amd deleted the users/davidd-amd/tensilelite-shared-p1-origami branch June 16, 2026 02:25
@awhittle3 awhittle3 mentioned this pull request Jun 16, 2026
ryanswann-amd added a commit that referenced this pull request Jun 18, 2026
## What
Fixes the failing `precheckin(origami)` on #6334.

The attention model was added without the symbol-export boilerplate
#7996 introduced (hidden default visibility + `generate_export_header`).
None of the `origami::attention` functions were marked `ORIGAMI_EXPORT`,
so they ended up as local symbols in `liborigami.so` — breaking the link
of both `origami-tests` and the `_pyorigami` bindings (which take the
address of these functions) with undefined references.

- `attention.hpp`: include `origami/origami_export.h` and mark the 18
public functions `ORIGAMI_EXPORT` (matches `gemm.hpp`; these are the
public API exposed via `bindings.cpp`).
- `CMakeLists.txt`: add `attention.hpp` to the `FILE_SET HEADERS` list
so it's installed and checked by `VERIFY_INTERFACE_HEADER_SETS`.

2 files, +20/−18. No logic changes.

## Validation (MI300X / gfx942)
- Build + link: OK (both `origami-tests` and `_pyorigami`)
- C++ `ctest`: 110/110 pass
- Python `ctest`: 6/6 pass (2 torch-gated selector tests skip without
torch)