
feat(tensilelite): stop treating tensilelite host as internal library#8133

Open
davidd-amd wants to merge 44 commits into develop from users/davidd-amd/tensilelite-shared-p2-tensilelite

davidd-amd (Contributor) commented Jun 5, 2026

Summary

hipBLASLt and hipSPARSELt each compiled their own copy of the TensileLite host code, so
both libhipblaslt.so and libhipsparselt.so exported the same ~117 TensileLite symbols.
Under ELF flat-namespace interposition the loader binds all references to whichever copy
loads first — the cause of a March 2026 segfault (#6917, originally worked around by hiding
all TensileLite symbols at the source level with bespoke TENSILE_HIDDEN_BEGIN/END
wrappers).

This PR makes tensilelite-host a first-class, exportable CMake target with a curated
public API, and removes the source-level symbol-hiding workaround in favor of the standard
generated export header. It also folds in the device-library codegen export and the
hipSPARSELt consumption path, so the whole "TensileLite host is a real library, not an
internal blob" change lands as one unit.

Default build mode is static embed. tensilelite-host builds STATIC by default and is
absorbed into each consumer; under the hidden-visibility preset its internals (and, in
static mode, the public API itself) stay hidden, so neither consumer re-exports the
TensileLite surface and the interposition hazard is avoided — the same end state as the old
workaround, achieved through the normal export-header mechanism rather than hand-maintained
hidden regions. A single-owner shared libtensilelite-host.so (true structural dedup) is
available behind TENSILELITE_BUILD_SHARED_LIBS=ON but is not the default yet: it needs
a single producer across the two consumers plus a TheRock packaging edge that is still being
worked out, so it is deferred to avoid shipping a half-wired shared mode.
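Here is a toy Python model of the interposition hazard described above. It is not the real dynamic loader, and it ignores `-Bsymbolic` and protected visibility. Each dynamic symbol name binds to the first loaded object that defines it, so a second copy is silently shadowed. The load order and symbol names are illustrative, and real symbols are mangled.

```python
from typing import Dict, List, Tuple

# (soname, names of the dynamic symbols it defines), listed in load order.
LoadedObject = Tuple[str, List[str]]


def bind_flat_namespace(load_order: List[LoadedObject]) -> Dict[str, str]:
    """Toy model of global symbol lookup: each name binds to the FIRST
    loaded object that defines it; later definitions are never consulted."""
    bindings: Dict[str, str] = {}
    for soname, symbols in load_order:
        for name in symbols:
            bindings.setdefault(name, soname)  # first definition wins
    return bindings


def shadowed_symbols(load_order: List[LoadedObject]) -> Dict[str, List[str]]:
    """For each object, list the symbols it defines but that bind elsewhere."""
    winners = bind_flat_namespace(load_order)
    return {
        soname: sorted(name for name in symbols if winners[name] != soname)
        for soname, symbols in load_order
    }


# Hypothetical process in which hipSPARSELt happens to load first.
order = [
    ("libhipsparselt.so", ["TensileLite::fileToMsgObject"]),
    ("libhipblaslt.so", ["TensileLite::fileToMsgObject", "hipblasLtCreate"]),
]
print(shadowed_symbols(order))
# {'libhipsparselt.so': [], 'libhipblaslt.so': ['TensileLite::fileToMsgObject']}
```

When only one definition of each TensileLite symbol is exported per process, the shadowed lists are empty by construction. The static-embed default achieves this by hiding the symbols in both consumers, and the shared opt-in achieves it by having a single owner.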

What changed

Library shape & exports

  • tensilelite-host is an exportable target (was an OBJECT library absorbed into each
    consumer). Build type is driven by TENSILELITE_BUILD_SHARED_LIBS (default OFF → static
    embed; ON → co-exported roc::tensilelite-host shared library). The dedicated option is
    used instead of the not-yet-synced global BUILD_SHARED_LIBS so the fork can't silently
    flip the mode.
  • Export inversion, not a rename. TENSILE_API flips from a hidden marker to the
    generated TENSILELITEHOST_EXPORT (generate_export_header). Under the hidden visibility
    preset only the curated public API is annotated for export; internals stay hidden. In a
    static build the macro resolves empty (TENSILELITEHOST_STATIC), so the public API stays
    hidden inside the consuming .so too. The 100 dead TENSILE_HIDDEN_BEGIN/END no-op
    wrappers are removed.
  • Stop re-exporting static LLVM. With YAML serialization on, tensilelite-host links the
    static LLVM archives; as a shared library it re-exported ~3700 LLVM symbols at default
    visibility (the preset governs the target's own code, not prebuilt archives). In a client
    process those interpose on the HIP runtime's in-process comgr LLVM and corrupt its AMDGPU
    cl::opt registry — comgr then rejects its own backend options at code-object load
    (Unknown command line argument '-amdgpu-prelink'). Linking with --exclude-libs,ALL
    (scoped to SHARED AND NOT WIN32) localizes the static-archive symbols; LLVM exports drop
    ~3700 → ~10 with the public API unchanged.
  • Windows dllexport fix. Annotate the friend operator<<(TensorDescriptor) first
    declaration with the export macro so clang-cl doesn't error on a redeclaration.
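The ~3700 → ~10 LLVM-export figure can be spot-checked from the dynamic symbol table. The following Python sketch assumes `nm -D --defined-only <lib>` output produced without `-C`, because demangled names contain spaces. The prefix heuristic for recognizing LLVM symbols is an assumption made for this sketch and is not taken from the PR.

```python
import re
from typing import Iterable, List, Tuple

# Heuristic for LLVM symbols: Itanium-mangled names in namespace `llvm`
# (functions, const member functions, vtables, typeinfo) plus the C API prefix.
LLVM_SYMBOL = re.compile(r"^(?:_Z(?:TV|TI|TS)?NK?4llvm|LLVM)")


def parse_nm(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse `nm -D` output (mangled names) into (type letter, name) pairs."""
    entries = []
    for line in lines:
        parts = line.split()
        # Defined: "<addr> <type> <name>"; undefined: "<type> <name>".
        if len(parts) >= 2 and len(parts[-2]) == 1:
            entries.append((parts[-2], parts[-1]))
    return entries


def exported_names(lines: Iterable[str]) -> List[str]:
    """Names of defined global symbols (uppercase type letter other than U)."""
    return [name for kind, name in parse_nm(lines) if kind.isupper() and kind != "U"]


def count_llvm_exports(lines: Iterable[str]) -> int:
    return sum(1 for name in exported_names(lines) if LLVM_SYMBOL.match(name))


sample = [
    "0000000000001000 T _ZN4llvm2cl6OptionD2Ev",  # llvm::cl::Option::~Option()
    "0000000000002000 V _ZTVN4llvm2cl6OptionE",   # vtable for llvm::cl::Option
    "0000000000003000 T hipblasLtCreate",
    "                 U malloc",
]
print(count_llvm_exports(sample))  # 2
```

To run it against a real library, pass it `subprocess.run(["nm", "-D", "--defined-only", path], capture_output=True, text=True).stdout.splitlines()`. Comparing the count before and after adding `--exclude-libs,ALL` shows the effect of the linker option.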

Device-library codegen (formerly "Phase 3")

  • Codegen is an exported CMake function. New HipBLASLtCodegen.cmake /
    HipBLASLtCodegenInstall.cmake modules wire the Tensile device-library generation as a
    reusable, exported step so a downstream consumer can produce TensileLibrary_lazy_<arch>
    without re-deriving the codegen invocation.
  • hipSPARSELt consumes hipBLASLt via find_package on TheRock. When hipBLASLt is a
    provided package (HIPSPARSELT_ENABLE_THEROCK AND "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES) hipSPARSELt does find_package(hipblaslt CONFIG REQUIRED);
    otherwise it builds hipBLASLt in-tree via add_subdirectory.

Install-surface correctness

  • Ship the Tensile device library; keep embedded deps out of the package. EXCLUDE_FROM_ALL
    on the in-tree hipBLASLt subdirectory was suppressing the add_custom_target(... ALL) that
    generates the device library, so TensileLibrary_lazy_<arch>.dat never installed and the
    hipSPARSELt suite failed every case with "Could not initialize Tensile library". Root fix:
    gate origami's and stinkytofu's install/export/package rules on <PROJ>_STANDALONE
    (top-level project ⇒ find_package'd ⇒ installs its surface; add_subdirectory ⇒ embedded
    ⇒ installs nothing), matching what rocisa already does. hipSPARSELt then drops
    EXCLUDE_FROM_ALL: hipBLASLt's device-library target builds and installs to
    lib/hipsparselt/library with no origami/stinkytofu pollution in the install tree.
  • Headers stay in-tree for the library export. tensilelite-host ships binary-only; the
    Tensile host header file sets are attached for build-tree/IDE organization and consumed via
    BUILD_INTERFACE, so the installed Tensile/*.hpp collision problem cannot occur.
  • Removed the dead TENSILE_USE_HIP non-HIP DataTypes paths, un-gated
    TENSILE_DEFAULT_SERIALIZATION, deleted the dead Tensile-fork glue and the old hipSPARSELt
    per-symbol workaround, and kept origami out of the installed package surface.

Diagnostics

  • Structured diagnostics facility. A [tensilelite:diag] logfmt+banner facility in the
    client (Diagnostic.hpp) mirrored in the python test harness, with auto config/gpu/phase
    context, to make client/codegen failures legible in CI.

CI fixes folded in

  • gfx950 multi-DU YAMLs: drop dead GlobalParameters keys that fail validation on the
    current loader.
  • joblib Set changed size during iteration: the logic-file load used
    ParallelMap2(return_as="generator_unordered"). joblib ≥1.5.0's unordered _retrieve
    iterates self._jobs_set unlocked (next(iter(...))) while the dispatcher thread mutates
    it, racing into RuntimeError: Set changed size during iteration (seen on the gfx950 /
    ubuntu-24 precheckin loading 648 logic files across 32 threads). Switched to ordered
    return_as="generator", which waits on self._jobs[0] and never iterates _jobs_set.
    Result is unchanged — the loop merges into masterLibraries and sorts explicitly for
    determinism, so consumption order is irrelevant. (Latent since the call adopted unordered
    mode in Jan 2025 in #1514, "[hipBLASLt] Remove debug parameter"; armed once CI picked up
    joblib ≥1.5.0.)
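The joblib change rests on two observations, sketched here in Python. In this sketch `concurrent.futures` stands in for joblib, which is not used, and the loader function is hypothetical. First, a set that grows while an iterator over it is live raises exactly this `RuntimeError`. Second, an ordered parallel map whose results are merged and then sorted produces the same answer no matter which task finishes first.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def reproduce_set_mutation_error() -> str:
    """Single-threaded reproduction of the error class: a set grows while an
    iterator over it is live, so the next step of the iteration raises."""
    jobs = {1, 2, 3}
    try:
        for job in jobs:
            jobs.add(job + 100)  # stands in for the dispatcher adding work
    except RuntimeError as err:
        return str(err)
    return "no error"


def load_logic_file(path: str) -> List[str]:
    """Hypothetical stand-in for parsing one logic file; random delays make
    completion order differ from submission order."""
    time.sleep(random.uniform(0, 0.005))
    return [f"{path}:solution{i}" for i in range(2)]


def load_all(paths: Iterable[str], workers: int = 8) -> List[str]:
    master: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in submission order (the analogue of
        # return_as="generator"), regardless of which task finishes first.
        for solutions in pool.map(load_logic_file, paths):
            master.extend(solutions)
    master.sort()  # explicit sort: the result does not depend on consumption order
    return master


print(reproduce_set_mutation_error())  # Set changed size during iteration
```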

Validation (gfx90a / gfx942)

  • Static default: tensilelite-host builds libtensilelite-host.a; the export carries it
    STATIC IMPORTED with INTERFACE_COMPILE_DEFINITIONS TENSILELITEHOST_STATIC. The public
    API stays hidden inside the consuming .so, so neither consumer re-exports the TensileLite
    surface.
  • Shared opt-in (TENSILELITE_BUILD_SHARED_LIBS=ON): libhipblaslt.so DT_NEEDEDs the
    single libtensilelite-host.so.1; 0 strong-symbol duplicates between them. Remaining
    overlaps are benign weak template/inline vague-linkage, loader-deduped. hipSPARSELt
    resolves the same single lib.
  • Symbol surface: static LLVM is localized — relink shows ~3700 → ~10 LLVM symbols
    exported, 0 cl::opt/target-registry symbols remain, TensileLite API exports unchanged.
  • Device library installs through the add_subdirectory path (gfx942):
    TensileLibrary_lazy_gfx942.dat.zlib installs to lib/hipsparselt/library; the install
    tree contains no origami/stinkytofu headers, cmake exports, or libraries.
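The "0 strong-symbol duplicates" check can be approximated by comparing the defined dynamic symbols of two libraries and setting aside weak or vague-linkage entries. This sketch assumes `nm -D --defined-only` output. Its classification of nm type letters (W, V and u treated as weak or vague linkage) is a simplification, and the symbol names are placeholders.

```python
from typing import Iterable, List, Set, Tuple

# nm type letters treated as weak / vague linkage (a simplification):
# W = weak symbol, V = weak object, u = GNU unique global (used for some
# template static data and inline variables).
WEAK_OR_VAGUE = {"W", "V", "u"}


def defined_symbols(nm_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Split `nm -D --defined-only` output into (strong, weak_or_vague) names."""
    strong: Set[str] = set()
    weak: Set[str] = set()
    for line in nm_lines:
        parts = line.split()
        if len(parts) < 2 or len(parts[-2]) != 1:
            continue
        kind, name = parts[-2], parts[-1]
        if kind in WEAK_OR_VAGUE:
            weak.add(name)
        elif kind.isupper() and kind != "U":  # global, defined, not weak
            strong.add(name)
    return strong, weak


def overlap_report(nm_a: Iterable[str], nm_b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (strong duplicates, other overlaps where at least one side is weak)."""
    strong_a, weak_a = defined_symbols(nm_a)
    strong_b, weak_b = defined_symbols(nm_b)
    strong_dupes = strong_a & strong_b
    weak_overlaps = ((strong_a | weak_a) & (strong_b | weak_b)) - strong_dupes
    return sorted(strong_dupes), sorted(weak_overlaps)


lib_a = ["0000000000001000 T api_fn", "0000000000002000 W template_instance"]
lib_b = ["0000000000001000 T other_api", "0000000000002000 W template_instance"]
print(overlap_report(lib_a, lib_b))  # ([], ['template_instance'])
```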

TheRock-visible delta

  • No new top-level package. With the shared mode enabled, roc::tensilelite-host is
    co-exported inside hipBLASLt's existing package; in the default static build there is no new
    installed library at all.
  • No new installed headers — Tensile host headers stay in-tree.
  • New hipSPARSELt → hipBLASLt edge: hipSPARSELt either find_packages hipBLASLt (TheRock,
    provided) or builds it in-tree, and the device library now installs to lib/hipsparselt.
  • Origami/stinkytofu install their package surface only when built standalone; embedded into a
    consumer they install nothing.

AIHPBLAS-3522

codecov-commenter commented Jun 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #8133      +/-   ##
===========================================
- Coverage    71.33%   71.30%   -0.03%     
===========================================
  Files         2628     2614      -14     
  Lines       413115   409843    -3272     
  Branches     61878    61235     -643     
===========================================
- Hits        294661   292202    -2459     
+ Misses       96680    96190     -490     
+ Partials     21774    21451     -323     
Flag Coverage Δ *Carryforward flag
TensileLite 76.65% <ø> (ø)
hipBLAS 90.81% <ø> (ø) Carriedforward from e310076
hipBLASLt 41.35% <ø> (ø)
hipCUB 82.68% <ø> (ø)
hipDNN 86.50% <ø> (+0.59%) ⬆️ Carriedforward from e310076
hipFFT 50.17% <ø> (ø) Carriedforward from e310076
hipRAND 76.12% <ø> (ø)
hipSOLVER 69.18% <ø> (ø) Carriedforward from e310076
hipSPARSE 86.55% <ø> (ø) Carriedforward from e310076
rocBLAS 48.08% <ø> (ø) Carriedforward from e310076
rocFFT 46.30% <ø> (ø) Carriedforward from e310076
rocRAND 57.08% <ø> (+0.01%) ⬆️
rocSOLVER 76.92% <ø> (ø) Carriedforward from e310076
rocSPARSE 72.37% <ø> (ø) Carriedforward from e310076
rocThrust 91.45% <ø> (+0.10%) ⬆️

*This pull request uses carry forward flags.

Files with missing lines Coverage Δ
...lt/tensilelite/Tensile/TensileCreateLibrary/Run.py 86.73% <ø> (ø)

... and 38 files with indirect coverage changes


davidd-amd force-pushed the users/davidd-amd/tensilelite-shared-p2-tensilelite branch from 48d4727 to 2259295 June 16, 2026 02:48
davidd-amd changed the title from "Users/davidd amd/tensilelite shared p2 tensilelite" to "[tensilelite] Co-export TensileLite host as a shared library; stop duplicate TensileLite builds" Jun 16, 2026
Outdated review threads:
  • projects/hipblaslt/cmake/hipblaslt-config.cmake.in
  • projects/hipblaslt/tensilelite/include/Tensile/Macros.hpp (2 threads)
  • projects/hipblaslt/tensilelite/include/CMakeLists.txt (7 threads)
  • projects/hipblaslt/tensilelite/src/ContractionSolution.cpp
  • projects/hipblaslt/tensilelite/src/Tensile.cpp
davidd-amd and others added 5 commits June 25, 2026 17:01
P2 turned tensilelite-host into a shared library, but a39bf8f/9c6312c118f
dropped Loading.hpp's TENSILE_HIDDEN wrapper without adding the export macro,
so fileToMsgObject stayed hidden under the hidden-by-default visibility preset.
libhipblaslt.so links fine (shared libs tolerate undefined symbols at link time),
but the ext-op softmax path fails at runtime: undefined symbol TensileLite::fileToMsgObject.

Annotate it with TENSILELITEHOST_EXPORT, matching objectToMap in the sibling
MessagePack.hpp. It is the only free function in this header consumed across
the .so boundary (hipblaslt-ext-op-internal.hpp); the templated loaders are
internal to tensilelite-host.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
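The failure mode in this commit is a clean link followed by an `undefined symbol` error at runtime. It can be caught before running anything by comparing the consumer's undefined imports with the provider's dynamic exports, because a symbol left hidden never appears in the provider's dynamic symbol table. The Python sketch below works on `nm -D` output. The mangled names are illustrative, since the real signatures are not reproduced here. The namespace-prefix filter also misses const member functions (`_ZNK...`).

```python
from typing import Iterable, List, Tuple


def parse_nm(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse `nm -D` output (mangled names) into (type letter, name) pairs."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and len(parts[-2]) == 1:
            entries.append((parts[-2], parts[-1]))
    return entries


def unresolved_imports(consumer_nm: Iterable[str], provider_nm: Iterable[str], prefix: str) -> List[str]:
    """Undefined symbols in the consumer, starting with `prefix`, that the provider
    does not export. Hidden symbols are absent from `nm -D`, so they show up here."""
    undefined = {name for kind, name in parse_nm(consumer_nm) if kind == "U"}
    exported = {name for kind, name in parse_nm(provider_nm) if kind.isupper() and kind != "U"}
    return sorted(n for n in undefined if n.startswith(prefix) and n not in exported)


# Illustrative mangled names for parameterless TensileLite::fileToMsgObject()
# and TensileLite::objectToMap(); the real signatures are not reproduced here.
consumer = [
    "                 U _ZN11TensileLite15fileToMsgObjectEv",
    "                 U _ZN11TensileLite11objectToMapEv",
    "                 U malloc",
]
provider = ["0000000000001000 T _ZN11TensileLite11objectToMapEv"]
print(unresolved_imports(consumer, provider, "_ZN11TensileLite"))
# ['_ZN11TensileLite15fileToMsgObjectEv']
```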
…rface (re-fix)

b6e10da (P3 header co-export) reverted the e69e0f8 fix six seconds
after it was re-applied: it re-added find_dependency(origami) + DEPENDS PACKAGE
origami and un-wrapped roc::origami from BUILD_INTERFACE. origami builds static
and is absorbed into libtensilelite-host.so, so leaking it into the export
interface forced an unsatisfiable find_dependency(origami) on downstream
find_package(hipblaslt) and broke rocBLAS configure against installed hipBLASLt.

Re-apply the three-part fix:
- tensilelite/CMakeLists.txt: link roc::origami via $<BUILD_INTERFACE:> so it
  stays build-only and out of the install interface
- hipblaslt/CMakeLists.txt: drop DEPENDS PACKAGE origami from rocm_export_targets
- hipblaslt-config.cmake.in: drop hand-written find_dependency(origami)

hipsparselt already inherits the fix through the shared tensilelite-host target.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The multi-DU YAMLs (#7781) carry MergeFiles, DeviceLDS and MaxLDS under
GlobalParameters, none of which tensilelite reads there: MergeFiles has no
consumer, DeviceLDS is a hardware archCap, and MaxLDS is a solution parameter
(state["MaxLDS"], validParameters). They tripped the strict corpus gate (#8328)
once both PRs landed in develop. The strict gate only reported MergeFiles
because it raises on the first offender.

Remove all three. Output-equivalent for gfx950: archCaps["DeviceLDS"] is
160 KiB = 163840 (the hardcoded value) and solution MaxLDS defaults to -1 which
resolves to that same archCap. Verified 0 offenders across all 389 corpus YAMLs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
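A gate that raises on the first offender hides the remaining ones, which is why DeviceLDS and MaxLDS went unreported. Below is a Python sketch of a collecting variant that runs over already-parsed YAML documents. The allow-list and file name are illustrative and do not reflect the loader's real key set.

```python
from typing import Dict, Iterable, List, Mapping


def check_corpus(
    corpus: Mapping[str, Mapping[str, object]], known: Iterable[str]
) -> Dict[str, List[str]]:
    """Collect every unknown GlobalParameters key in every file, instead of
    raising on the first offender, so one run reports the whole cleanup."""
    allowed = set(known)
    report: Dict[str, List[str]] = {}
    for path, doc in corpus.items():
        section = doc.get("GlobalParameters") or {}  # tolerate an empty section
        offenders = sorted(key for key in section if key not in allowed)
        if offenders:
            report[path] = offenders
    return report


# Illustrative allow-list and document; not the loader's real key set.
KNOWN_GLOBAL_PARAMETERS = {"NumElementsToValidate"}
corpus = {
    "gfx950_multi_du.yaml": {
        "GlobalParameters": {
            "NumElementsToValidate": 0,
            "MergeFiles": True,
            "DeviceLDS": 160 * 1024,  # 160 KiB = 163840
            "MaxLDS": -1,
        }
    }
}
print(check_corpus(corpus, KNOWN_GLOBAL_PARAMETERS))
# {'gfx950_multi_du.yaml': ['DeviceLDS', 'MaxLDS', 'MergeFiles']}
```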
…UILD_SHARED_LIBS (default OFF, static embed)

Stop shipping tensilelite-host as a standalone shared .so that both hipBLASLt
and hipSPARSELt produce and overlay into one ROCm dist (collision). Default to
static-embed in both consumers via one native switch:

  option(TENSILELITE_BUILD_SHARED_LIBS ... OFF)
  set(BUILD_SHARED_LIBS ${TENSILELITE_BUILD_SHARED_LIBS})
  add_library(tensilelite-host)   # type derived natively from BUILD_SHARED_LIBS

--exclude-libs,ALL and the install block gate on BUILD_SHARED_LIBS; the
hipBLASLt co-export of roc::tensilelite-host and hipSPARSELt's find_package
gate on the cache var. hipSPARSELt forces the switch OFF and only calls
find_package(hipblaslt) when "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES,
so that path auto-enables when a future TheRock edge declares the dependency.

No LLVM symbol re-leak: the static-embed consumer .so carries the same LLVM
cl::opt exposure as develop's OBJECT embed (no consumer --exclude-libs in
either; develop is green with YAML=ON). The P2 failure was the standalone
overlaid .so, which this default does not produce. Shared (.so) deferred
behind the switch until a single-producer TheRock dependency edge exists.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
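The build-mode switch can be summarized as a small decision table. The Python model below restates the behavior described in these commits. When add_library() is given no explicit type, CMake produces SHARED if BUILD_SHARED_LIBS is on and STATIC otherwise. The model is not generated from the CMake sources, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HostBuildPlan:
    library_type: str  # what add_library() produces
    link_options: List[str] = field(default_factory=list)
    interface_definitions: List[str] = field(default_factory=list)


def plan_tensilelite_host(build_shared: bool, win32: bool = False) -> HostBuildPlan:
    """Model of TENSILELITE_BUILD_SHARED_LIBS -> BUILD_SHARED_LIBS -> add_library()."""
    if not build_shared:
        # Static embed (default): TENSILELITEHOST_STATIC makes the generated
        # export macro expand to nothing, so the API stays hidden in the consumer.
        return HostBuildPlan("STATIC", interface_definitions=["TENSILELITEHOST_STATIC"])
    plan = HostBuildPlan("SHARED")
    if not win32:
        # Localize symbols pulled in from static archives (e.g. LLVM).
        plan.link_options.append("--exclude-libs,ALL")
    return plan


for shared in (False, True):
    print(shared, plan_tensilelite_host(shared))
```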

Alex-Vasile (Contributor) left a comment

Looks good

…alls on STANDALONE

The hipSPARSELt test suite fails every case with "Could not initialize Tensile
library" because TensileLibrary_lazy_<arch>.dat is absent from the install.

Root cause: 51f9d51 added EXCLUDE_FROM_ALL to add_subdirectory(hipblaslt) to keep
hipBLASLt's package surface out of hipSPARSELt. On TheRock the add_subdirectory
path is the active one (hipBLASLt is not in hipSPARSELt's declared deps, so it is
not in THEROCK_PROVIDED_PACKAGES), and the device libraries are produced by
add_custom_target(... ALL) that nothing links. EXCLUDE_FROM_ALL drops that target
from the default build AND suppresses the subdirectory's install rules, so the
device library is never generated or installed.

EXCLUDE_FROM_ALL was masking a real defect: origami and stinkytofu installed
their package surface unconditionally, unlike rocisa and tensilelite-host which
only install when built standalone/shared. Gate origami's and stinkytofu's
install/export/package rules on <PROJ>_STANDALONE (top-level project = will be
find_package'd = installs its surface; add_subdirectory = embedded/static =
installs nothing), matching what rocisa already does. With that in place
hipSPARSELt can simply drop EXCLUDE_FROM_ALL: hipBLASLt's own device-library
target builds and its rocm_install ships the library to lib/hipsparselt with no
embedded-dep pollution.

Verified by building the device library through the add_subdirectory path for
gfx942: TensileLibrary_lazy_gfx942.dat.zlib installs to lib/hipsparselt/library
and the install tree contains no origami/stinkytofu headers, cmake exports, or
libraries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
davidd-amd requested a review from a team as a code owner June 26, 2026 22:28
davidd-amd and others added 3 commits June 27, 2026 00:19
The previous commit gated origami's and stinkytofu's install/export/package
rules on <PROJ>_STANDALONE to keep them out of the hipSPARSELt artifact. That
gating is unnecessary: origami and stinkytofu are already add_subdirectory'd with
EXCLUDE_FROM_ALL at every embedded site (hipblaslt and rocisa), so the installer
never descends into their directories and their package surface is never part of
the install walk -- whether or not the rules are wrapped in a STANDALONE guard.
Verified on cmake 3.27.9: an EXCLUDE_FROM_ALL subdirectory installs nothing even
when its targets are built as link dependencies, and a non-excluded parent still
installs its own rules.

Revert both dependency edits. The hipSPARSELt device-library fix is the single
add_subdirectory(hipblaslt) EXCLUDE_FROM_ALL removal in the prior commit; no
changes to shared/origami or shared/stinkytofu are needed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… loading

Switch the logic-file load from ParallelMap2 return_as="generator_unordered"
to "generator". joblib >=1.5.0's unordered _retrieve iterates self._jobs_set
unlocked (next(iter(...))) while the dispatcher thread mutates it, racing into
RuntimeError: Set changed size during iteration. The ordered generator path
waits on self._jobs[0] and never iterates _jobs_set, so it is immune. Result
is unchanged: the loop merges into masterLibraries and sorts explicitly for
determinism, so consumption order is irrelevant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
therock-pr-bot (bot) commented Jun 28, 2026

❌ PR Check — Action Required

Check Status Details
🌿 Branch Name ✅ Pass
📝 PR Title/Description ✅ Pass
Forbidden Files ✅ Pass
🧪 Unit Test ❌ Fail Error: Source/code files changed without an accompanying unit test.
Expected: add at least one test file named like test_<name>.py / test_<name>.cpp (or <name>_test.*).
Current: code file(s) changed: projects/hipblaslt/clients/tests/src/caching_library_gtest.cpp, projects/hipblaslt/tensilelite/Tensile/Source/tensile_float8_bfloat8.h, projects/hipblaslt/tensilelite/Tensile/TensileCreateLibrary/Run.py, projects/hipblaslt/tensilelite/include/Tensile/AMDGPU.hpp, projects/hipblaslt/tensilelite/include/Tensile/AMDGPUPredicates.hpp (+102 more); no test file found
🔎 pre-commit ✅ Pass
🚫 Draft PR 🔜 To Be Enabled
🚩 Feature Flag 🔜 To Be Enabled
📊 Code Coverage 🔜 To Be Enabled

⚠️ 1 policy check(s) failed. Please address the issues above before this PR can be Reviewed.

🚫 Please fix the failed policies

  • ❌ Unit Test

The Not ready to Review label was added to this PR. Once all policies pass, the label is removed automatically.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

therock-pr-bot (bot) commented Jun 28, 2026

🚫 Please fix the failed policies before requesting reviews.

The following policy checks failed:

  • ❌ Unit Test

The Not ready to Review label has been added to this PR.
Once all policies pass, the label will be removed automatically.

davidd-amd changed the title from "[tensilelite] stop treating tensilelite host as internal library" to "feat(tensilelite): stop treating tensilelite host as internal library" Jun 29, 2026
davidd-amd and others added 3 commits June 29, 2026 21:14
…t, drop the install copy

The codegen generator (Tensile/rocisa) is a build requirement provisioned a
priori via `uv/pip install tensile`, not materialized by CMake install rules.

- Drop the hand-enumerated Tensile python subset copy (HipBLASLtCodegenInstall.cmake)
  and the redundant codegen-requirements.txt; tensilelite/requirements.txt and
  pyproject.toml are the authoritative dep lists (install.sh already installs them).
- Keep exporting the consumer cmake surface: install HipBLASLtCodegen.cmake and
  include it from hipblaslt-config.cmake so find_package(hipblaslt) provides
  hipblaslt_create_device_library() (hipSPARSELt-on-TheRock relies on this).
- Installed config points HIPBLASLT_PYTHON_COMMAND at a bare python3 (installed
  tensile) instead of PYTHONPATH=share/hipblaslt/codegen, and resolves
  HIPBLASLT_CODEGEN_ROOT from the a-priori-installed tensile package. Python is
  looked up QUIET/optional so pure C++ consumers are unaffected.
- Centralize HIPBLASLT_CODEGEN_ROOT into one overridable top-level cache var;
  the bundled-python PYTHONPATH and device-library codegen both derive from it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ensilelite-shared-p2-tensilelite

# Conflicts:
#	projects/hipblaslt/tensilelite/client/include/TimingInstrumentation.hpp
…is PR

The diagnostics facility (Diagnostic.hpp, DIAGNOSTICS.md, the client
phase-tracking, and the Python harness mirror) is functional scope
orthogonal to this shared-library/visibility change. Extracted to its
own branch (tensilelite-client-diagnostics); the facility-touched files
are restored to develop, taking develop's reworked ScopedTimer (#6043).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
endif()
endblock()

if(EXISTS "${CMAKE_CURRENT_LIST_DIR}/HipBLASLtCodegen.cmake")

Contributor Author:

This needs to go. We want determinism: if we are in the config file, we are consuming an install, so we should use the Codegen cmake file.

$<BUILD_INTERFACE:rocisa::rocisa-cpp>
roc::origami
$<BUILD_INTERFACE:roc::origami>
hip::host

Contributor Author:

I am concerned about not wrapping this in a $<BUILD_INTERFACE:> generator expression.

Comment thread projects/hipblaslt/CMakeLists.txt Outdated
Comment on lines +214 to +217
set(HIPBLASLT_CODEGEN_ROOT "" CACHE STRING "Root of the codegen (tensilelite) Python sources. Defaults to the in-tree 'tensilelite' directory; override to use an installed/alternate tensile.")
if(NOT HIPBLASLT_CODEGEN_ROOT)
set(HIPBLASLT_CODEGEN_ROOT "${hipblaslt_SOURCE_DIR}/tensilelite")
endif()

Contributor Author:

I am not certain that this is necessary. We know this path at all times, whether we are in a build or an install context.

Comment on lines +427 to +432
set(_hipblaslt_export_targets roc::hipblaslt)
if(TENSILELITE_BUILD_SHARED_LIBS)
list(APPEND _hipblaslt_export_targets roc::tensilelite-host)
endif()
rocm_export_targets(
TARGETS roc::hipblaslt
TARGETS ${_hipblaslt_export_targets}

Contributor Author:

This is a smell, and the logic is actually different from the other file. IMO we really should not add installation calls in nested subdirs, for this very reason: we want to collect the relevant logic in one place so it doesn't get out of sync and we don't have to go to multiple files to learn what is installed.

Comment thread projects/hipblaslt/CMakeLists.txt Outdated
Comment on lines +444 to +450
if(NOT WIN32)
install(
FILES "${CMAKE_CURRENT_SOURCE_DIR}/cmake/HipBLASLtCodegen.cmake"
DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/hipblaslt"
COMPONENT devel
)
endif()

Contributor Author:

This doesn't make sense. We didn't have any problems doing codegen on Windows before.

Comment thread projects/hipsparselt/CMakeLists.txt Outdated
Comment on lines +150 to +151
if(HIPSPARSELT_ENABLE_THEROCK AND "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES)
find_package(hipblaslt CONFIG REQUIRED)

Contributor Author:

This is an anti-pattern: a dependency should not be aware of its parent's options/details. Further, I can't think of a scenario where these conditions wouldn't be true.

davidd-amd and others added 3 commits June 30, 2026 00:07
- hipblaslt-config.cmake.in: drop the runtime Python3/import-Tensile codegen
  discovery; deterministically include HipBLASLtCodegen.cmake via
  CMAKE_CURRENT_LIST_DIR (co-located in the install).
- Eliminate HIPBLASLT_CODEGEN_ROOT: HipBLASLtCodegen.cmake self-locates its
  Tensile companion files (known_bugs.yaml, TensileLogic) via
  CMAKE_CURRENT_LIST_DIR; the build-time PYTHONPATH uses the known source path
  ${hipblaslt_SOURCE_DIR}/tensilelite.
- Ungate the codegen cmake install (codegen on Windows worked before).
- tensilelite-host: wrap hip::host in $<BUILD_INTERFACE:> so it does not leak
  into the exported interface, matching rocisa/origami.
- hipSPARSELt: drop the THEROCK_PROVIDED_PACKAGES find_package(hipblaslt)
  branch so the dependency is not aware of its parent's options; collapse to
  the add_subdirectory path (find_package switch deferred to a follow-on).

Verified: device-library build+install ships TensileLibrary_lazy_<arch>.dat
for hipBLASLt (gfx90a) and hipSPARSELt embedded (gfx942).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Projects

None yet

5 participants