feat(tensilelite): stop treating tensilelite host as internal library#8133
feat(tensilelite): stop treating tensilelite host as internal library#8133davidd-amd wants to merge 44 commits into
Conversation
e1c74f5 to
48d4727
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## develop #8133 +/- ##
===========================================
- Coverage 71.33% 71.30% -0.03%
===========================================
Files 2628 2614 -14
Lines 413115 409843 -3272
Branches 61878 61235 -643
===========================================
- Hits 294661 292202 -2459
+ Misses 96680 96190 -490
+ Partials 21774 21451 -323
*This pull request uses carry forward flags. Click here to find out more.
🚀 New features to boost your workflow:
|
48d4727 to
2259295
Compare
…ensilelite-shared-p2-tensilelite
P2 turned tensilelite-host into a shared library, but a39bf8f/9c6312c118f dropped Loading.hpp's TENSILE_HIDDEN wrapper without adding the export macro, so fileToMsgObject stayed hidden by default visibility. libhipblaslt.so links fine (shared libs tolerate undefined symbols at link time) but the ext-op softmax path fails at runtime: undefined symbol TensileLite::fileToMsgObject. Annotate it with TENSILELITEHOST_EXPORT, matching objectToMap in the sibling MessagePack.hpp. It is the only free function in this header consumed across the .so boundary (hipblaslt-ext-op-internal.hpp); the templated loaders are internal to tensilelite-host. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rface (re-fix) b6e10da (P3 header co-export) reverted the e69e0f8 fix six seconds after it was re-applied: it re-added find_dependency(origami) + DEPENDS PACKAGE origami and un-wrapped roc::origami from BUILD_INTERFACE. origami builds static and is absorbed into libtensilelite-host.so, so leaking it into the export interface forced an unsatisfiable find_dependency(origami) on downstream find_package(hipblaslt) and broke rocBLAS configure against installed hipBLASLt. Re-apply the three-part fix: - tensilelite/CMakeLists.txt: link roc::origami via $<BUILD_INTERFACE:> so it stays build-only and out of the install interface - hipblaslt/CMakeLists.txt: drop DEPENDS PACKAGE origami from rocm_export_targets - hipblaslt-config.cmake.in: drop hand-written find_dependency(origami) hipsparselt already inherits the fix through the shared tensilelite-host target. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The multi-DU YAMLs (#7781) carry MergeFiles, DeviceLDS and MaxLDS under GlobalParameters, none of which tensilelite reads there: MergeFiles has no consumer, DeviceLDS is a hardware archCap, and MaxLDS is a solution parameter (state["MaxLDS"], validParameters). They tripped the strict corpus gate (#8328) once both PRs landed in develop. The strict gate only reported MergeFiles because it raises on the first offender. Remove all three. Output-equivalent for gfx950: archCaps["DeviceLDS"] is 160 KiB = 163840 (the hardcoded value) and solution MaxLDS defaults to -1 which resolves to that same archCap. Verified 0 offenders across all 389 corpus YAMLs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…UILD_SHARED_LIBS (default OFF, static embed)
Stop shipping tensilelite-host as a standalone shared .so that both hipBLASLt
and hipSPARSELt produce and overlay into one ROCm dist (collision). Default to
static-embed in both consumers via one native switch:
option(TENSILELITE_BUILD_SHARED_LIBS ... OFF)
set(BUILD_SHARED_LIBS ${TENSILELITE_BUILD_SHARED_LIBS})
add_library(tensilelite-host) # type derived natively from BUILD_SHARED_LIBS
--exclude-libs,ALL and the install block gate on BUILD_SHARED_LIBS; the
hipBLASLt co-export of roc::tensilelite-host and hipSPARSELt's find_package
gate on the cache var. hipSPARSELt forces the switch OFF and only
find_package(hipblaslt) when "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES,
so it auto-enables when a future TheRock edge declares the dependency.
No LLVM symbol re-leak: the static-embed consumer .so carries the same LLVM
cl::opt exposure as develop's OBJECT embed (no consumer --exclude-libs in
either; develop is green with YAML=ON). The P2 failure was the standalone
overlaid .so, which this default does not produce. Shared (.so) deferred
behind the switch until a single-producer TheRock dependency edge exists.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…alls on STANDALONE The hipSPARSELt test suite fails every case with "Could not initialize Tensile library" because TensileLibrary_lazy_<arch>.dat is absent from the install. Root cause: 51f9d51 added EXCLUDE_FROM_ALL to add_subdirectory(hipblaslt) to keep hipBLASLt's package surface out of hipSPARSELt. On TheRock the add_subdirectory path is the active one (hipBLASLt is not in hipSPARSELt's declared deps, so it is not in THEROCK_PROVIDED_PACKAGES), and the device libraries are produced by add_custom_target(... ALL) that nothing links. EXCLUDE_FROM_ALL drops that target from the default build AND suppresses the subdirectory's install rules, so the device library is never generated or installed. EXCLUDE_FROM_ALL was masking a real defect: origami and stinkytofu installed their package surface unconditionally, unlike rocisa and tensilelite-host which only install when built standalone/shared. Gate origami's and stinkytofu's install/export/package rules on <PROJ>_STANDALONE (top-level project = will be find_package'd = installs its surface; add_subdirectory = embedded/static = installs nothing), matching what rocisa already does. With that in place hipSPARSELt can simply drop EXCLUDE_FROM_ALL: hipBLASLt's own device-library target builds and its rocm_install ships the library to lib/hipsparselt with no embedded-dep pollution. Verified by building the device library through the add_subdirectory path for gfx942: TensileLibrary_lazy_gfx942.dat.zlib installs to lib/hipsparselt/library and the install tree contains no origami/stinkytofu headers, cmake exports, or libraries. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The previous commit gated origami's and stinkytofu's install/export/package rules on <PROJ>_STANDALONE to keep them out of the hipSPARSELt artifact. That gating is unnecessary: origami and stinkytofu are already add_subdirectory'd with EXCLUDE_FROM_ALL at every embedded site (hipblaslt and rocisa), so the installer never descends into their directories and their package surface is never part of the install walk -- whether or not the rules are wrapped in a STANDALONE guard. Verified on cmake 3.27.9: an EXCLUDE_FROM_ALL subdirectory installs nothing even when its targets are built as link dependencies, and a non-excluded parent still installs its own rules. Revert both dependency edits. The hipSPARSELt device-library fix is the single add_subdirectory(hipblaslt) EXCLUDE_FROM_ALL removal in the prior commit; no changes to shared/origami or shared/stinkytofu are needed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… loading Switch the logic-file load from ParallelMap2 return_as="generator_unordered" to "generator". joblib >=1.5.0's unordered _retrieve iterates self._jobs_set unlocked (next(iter(...))) while the dispatcher thread mutates it, racing into RuntimeError: Set changed size during iteration. The ordered generator path waits on self._jobs[0] and never iterates _jobs_set, so it is immune. Result is unchanged: the loop merges into masterLibraries and sorts explicitly for determinism, so consumption order is irrelevant. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
❌ PR Check — Action Required
📖 Need help? See the Policy FAQ for details on every check and how to fix failures. |
|
🚫 Please fix the failed policies before requesting reviews. The following policy checks failed:
The |
…t, drop the install copy The codegen generator (Tensile/rocisa) is a build requirement provisioned a priori via `uv/pip install tensile`, not materialized by CMake install rules. - Drop the hand-enumerated Tensile python subset copy (HipBLASLtCodegenInstall.cmake) and the redundant codegen-requirements.txt; tensilelite/requirements.txt and pyproject.toml are the authoritative dep lists (install.sh already installs them). - Keep exporting the consumer cmake surface: install HipBLASLtCodegen.cmake and include it from hipblaslt-config.cmake so find_package(hipblaslt) provides hipblaslt_create_device_library() (hipSPARSELt-on-TheRock relies on this). - Installed config points HIPBLASLT_PYTHON_COMMAND at a bare python3 (installed tensile) instead of PYTHONPATH=share/hipblaslt/codegen, and resolves HIPBLASLT_CODEGEN_ROOT from the a-priori-installed tensile package. Python is looked up QUIET/optional so pure C++ consumers are unaffected. - Centralize HIPBLASLT_CODEGEN_ROOT into one overridable top-level cache var; the bundled-python PYTHONPATH and device-library codegen both derive from it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ensilelite-shared-p2-tensilelite # Conflicts: # projects/hipblaslt/tensilelite/client/include/TimingInstrumentation.hpp
…is PR The diagnostics facility (Diagnostic.hpp, DIAGNOSTICS.md, the client phase-tracking, and the Python harness mirror) is functional scope orthogonal to this shared-library/visibility change. Extracted to its own branch (tensilelite-client-diagnostics); the facility-touched files are restored to develop, taking develop's reworked ScopedTimer (#6043). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| endif() | ||
| endblock() | ||
|
|
||
| if(EXISTS "${CMAKE_CURRENT_LIST_DIR}/HipBLASLtCodegen.cmake") |
There was a problem hiding this comment.
This needs to go we want determinism and if we are in the config file that means we are consuming an install and we should use the Codegen cmake file.
| $<BUILD_INTERFACE:rocisa::rocisa-cpp> | ||
| roc::origami | ||
| $<BUILD_INTERFACE:roc::origami> | ||
| hip::host |
There was a problem hiding this comment.
I am concerned about not wrapping this in a build_interface gen expression
| set(HIPBLASLT_CODEGEN_ROOT "" CACHE STRING "Root of the codegen (tensilelite) Python sources. Defaults to the in-tree 'tensilelite' directory; override to use an installed/alternate tensile.") | ||
| if(NOT HIPBLASLT_CODEGEN_ROOT) | ||
| set(HIPBLASLT_CODEGEN_ROOT "${hipblaslt_SOURCE_DIR}/tensilelite") | ||
| endif() |
There was a problem hiding this comment.
I am not certain that this is necessary. We know this path at all times whether we are in a build or install context
| set(_hipblaslt_export_targets roc::hipblaslt) | ||
| if(TENSILELITE_BUILD_SHARED_LIBS) | ||
| list(APPEND _hipblaslt_export_targets roc::tensilelite-host) | ||
| endif() | ||
| rocm_export_targets( | ||
| TARGETS roc::hipblaslt | ||
| TARGETS ${_hipblaslt_export_targets} |
There was a problem hiding this comment.
This is a smell and the logic is actually different than in the other file. we should realy not add installation calls in nested subdirs IMO for this very reason. we want to collect relevant logic so they don't get out of sync and we don't have to go to multiple files to learn what is installed.
| if(NOT WIN32) | ||
| install( | ||
| FILES "${CMAKE_CURRENT_SOURCE_DIR}/cmake/HipBLASLtCodegen.cmake" | ||
| DESTINATION "${CMAKE_INSTALL_LIBDIR}/cmake/hipblaslt" | ||
| COMPONENT devel | ||
| ) | ||
| endif() |
There was a problem hiding this comment.
this doesn't make sense. we didn't have any problems doing codegen on windows before.
| if(HIPSPARSELT_ENABLE_THEROCK AND "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES) | ||
| find_package(hipblaslt CONFIG REQUIRED) |
There was a problem hiding this comment.
This is an anti pattern having a dependency aware of their parents optoins/details. Further, I can't think of a scenario where these conditions wouldn't be true.
- hipblaslt-config.cmake.in: drop the runtime Python3/import-Tensile codegen
discovery; deterministically include HipBLASLtCodegen.cmake via
CMAKE_CURRENT_LIST_DIR (co-located in the install).
- Eliminate HIPBLASLT_CODEGEN_ROOT: HipBLASLtCodegen.cmake self-locates its
Tensile companion files (known_bugs.yaml, TensileLogic) via
CMAKE_CURRENT_LIST_DIR; the build-time PYTHONPATH uses the known source path
${hipblaslt_SOURCE_DIR}/tensilelite.
- Ungate the codegen cmake install (codegen on Windows worked before).
- tensilelite-host: wrap hip::host in $<BUILD_INTERFACE:> so it does not leak
into the exported interface, matching rocisa/origami.
- hipSPARSELt: drop the THEROCK_PROVIDED_PACKAGES find_package(hipblaslt)
branch so the dependency is not aware of its parent's options; collapse to
the add_subdirectory path (find_package switch deferred to a follow-on).
Verified: device-library build+install ships TensileLibrary_lazy_<arch>.dat
for hipBLASLt (gfx90a) and hipSPARSELt embedded (gfx942).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
hipBLASLt and hipSPARSELt each compiled their own copy of the TensileLite host code, so
both
libhipblaslt.soandlibhipsparselt.soexported the same ~117 TensileLite symbols.Under ELF flat-namespace interposition the loader binds all references to whichever copy
loads first — the cause of a March 2026 segfault (#6917, originally worked around by hiding
all TensileLite symbols at the source level with bespoke
TENSILE_HIDDEN_BEGIN/ENDwrappers).
This PR makes
tensilelite-hosta first-class, exportable CMake target with a curatedpublic API, and removes the source-level symbol-hiding workaround in favor of the standard
generated export header. It also folds in the device-library codegen export and the
hipSPARSELt consumption path, so the whole "TensileLite host is a real library, not an
internal blob" change lands as one unit.
Default build mode is static embed.
tensilelite-hostbuildsSTATICby default and isabsorbed into each consumer; under the hidden-visibility preset its internals (and, in
static mode, the public API itself) stay hidden, so neither consumer re-exports the
TensileLite surface and the interposition hazard is avoided — the same end state as the old
workaround, achieved through the normal export-header mechanism rather than hand-maintained
hidden regions. A single-owner shared
libtensilelite-host.so(true structural dedup) isavailable behind
TENSILELITE_BUILD_SHARED_LIBS=ONbut is not the default yet: it needsa single producer across the two consumers plus a TheRock packaging edge that is still being
worked out, so it is deferred to avoid shipping a half-wired shared mode.
What changed
Library shape & exports
tensilelite-hostis an exportable target (was anOBJECTlibrary absorbed into eachconsumer). Build type is driven by
TENSILELITE_BUILD_SHARED_LIBS(defaultOFF→ staticembed;
ON→ co-exportedroc::tensilelite-hostshared library). The dedicated option isused instead of the not-yet-synced global
BUILD_SHARED_LIBSso the fork can't silentlyflip the mode.
TENSILE_APIflips from a hidden marker to thegenerated
TENSILELITEHOST_EXPORT(generate_export_header). Under the hidden visibilitypreset only the curated public API is annotated for export; internals stay hidden. In a
static build the macro resolves empty (
TENSILELITEHOST_STATIC), so the public API stayshidden inside the consuming
.sotoo. The 100 deadTENSILE_HIDDEN_BEGIN/ENDno-opwrappers are removed.
tensilelite-hostlinks thestatic LLVM archives; as a shared library it re-exported ~3700 LLVM symbols at default
visibility (the preset governs the target's own code, not prebuilt archives). In a client
process those interpose on the HIP runtime's in-process comgr LLVM and corrupt its AMDGPU
cl::optregistry — comgr then rejects its own backend options at code-object load(
Unknown command line argument '-amdgpu-prelink'). Linking with--exclude-libs,ALL(scoped to
SHARED AND NOT WIN32) localizes the static-archive symbols; LLVM exports drop~3700 → ~10 with the public API unchanged.
dllexportfix. Annotate thefriend operator<<(TensorDescriptor)firstdeclaration with the export macro so clang-cl doesn't error on a redeclaration.
Device-library codegen (formerly "Phase 3")
HipBLASLtCodegen.cmake/HipBLASLtCodegenInstall.cmakemodules wire the Tensile device-library generation as areusable, exported step so a downstream consumer can produce
TensileLibrary_lazy_<arch>without re-deriving the codegen invocation.
find_packageon TheRock. When hipBLASLt is aprovided package (
HIPSPARSELT_ENABLE_THEROCK AND "hipblaslt" IN_LIST THEROCK_PROVIDED_PACKAGES) hipSPARSELt doesfind_package(hipblaslt CONFIG REQUIRED);otherwise it builds hipBLASLt in-tree via
add_subdirectory.Install-surface correctness
on the in-tree hipBLASLt subdirectory was suppressing the
add_custom_target(... ALL)thatgenerates the device library, so
TensileLibrary_lazy_<arch>.datnever installed and thehipSPARSELt suite failed every case with "Could not initialize Tensile library". Root fix:
gate origami's and stinkytofu's install/export/package rules on
<PROJ>_STANDALONE(top-level project ⇒ find_package'd ⇒ installs its surface;
add_subdirectory⇒ embedded⇒ installs nothing), matching what rocisa already does. hipSPARSELt then drops
EXCLUDE_FROM_ALL: hipBLASLt's device-library target builds and installs to
lib/hipsparselt/librarywith no origami/stinkytofu pollution in the install tree.tensilelite-hostships binary-only; theTensile host header file sets are attached for build-tree/IDE organization and consumed via
BUILD_INTERFACE, so the installedTensile/*.hppcollision problem cannot occur.TENSILE_USE_HIPnon-HIPDataTypespaths, un-gatedTENSILE_DEFAULT_SERIALIZATION, deleted the dead Tensile-fork glue and the old hipSPARSELtper-symbol workaround, and kept origami out of the installed package surface.
Diagnostics
[tensilelite:diag]logfmt+banner facility in theclient (
Diagnostic.hpp) mirrored in the python test harness, with auto config/gpu/phasecontext, to make client/codegen failures legible in CI.
CI fixes folded in
GlobalParameterskeys that fail validation on thecurrent loader.
Set changed size during iteration: the logic-file load usedParallelMap2(return_as="generator_unordered"). joblib ≥1.5.0's unordered_retrieveiterates
self._jobs_setunlocked (next(iter(...))) while the dispatcher thread mutatesit, racing into
RuntimeError: Set changed size during iteration(seen on the gfx950 /ubuntu-24 precheckin loading 648 logic files across 32 threads). Switched to ordered
return_as="generator", which waits onself._jobs[0]and never iterates_jobs_set.Result is unchanged — the loop merges into
masterLibrariesand sorts explicitly fordeterminism, so consumption order is irrelevant. (Latent since the call adopted unordered
mode in Jan 2025 [hipBLASLt] Remove debug parameter #1514; armed once CI picked up joblib ≥1.5.0.)
Validation (gfx90a / gfx942)
tensilelite-hostbuildslibtensilelite-host.a; the export carries itSTATIC IMPORTEDwithINTERFACE_COMPILE_DEFINITIONS TENSILELITEHOST_STATIC. The publicAPI stays hidden inside the consuming
.so, so neither consumer re-exports the TensileLitesurface.
TENSILELITE_BUILD_SHARED_LIBS=ON):libhipblaslt.soDT_NEEDEDs thesingle
libtensilelite-host.so.1; 0 strong-symbol duplicates between them. Remainingoverlaps are benign weak template/inline vague-linkage, loader-deduped. hipSPARSELt
resolves the same single lib.
exported, 0
cl::opt/target-registry symbols remain, TensileLite API exports unchanged.add_subdirectorypath (gfx942):TensileLibrary_lazy_gfx942.dat.zlibinstalls tolib/hipsparselt/library; the installtree contains no origami/stinkytofu headers, cmake exports, or libraries.
TheRock-visible delta
roc::tensilelite-hostisco-exported inside hipBLASLt's existing package; in the default static build there is no new
installed library at all.
find_packages hipBLASLt (TheRock,provided) or builds it in-tree, and the device library now installs to
lib/hipsparselt.consumer they install nothing.
AIHPBLAS-3522