fix(build): probe CUDA toolkit layouts in a shared openinfer-build crate by FeathBow · Pull Request #343 · openinfer-project/openinfer

FeathBow · 2026-06-10T18:36:18Z

Description

Fixes #342

CUDA toolkit discovery was duplicated across build scripts, and each copy assumed the classic /usr/local/cuda layout (lib64/ for libraries, include/ for headers). Two real layouts break that assumption: conda/micromamba (libraries in lib/, headers in targets/<arch>-linux/include/) and the NVIDIA HPC SDK (cuBLAS in a sibling math_libs/<ver> tree). This PR concentrates discovery in a shared openinfer-build crate: find_package probes several check files per root, and cuda_libs probes lib64, lib, and targets/<arch>/lib plus the math_libs sibling, emitting only directories that exist. The sites with evidence of breakage (openinfer-kernels, cuda-sys, cudart-sys) migrate to it; gdrapi-sys and libibverbs-sys keep their current behavior but go through the same helper.
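For readers who have not opened the diff, the library-directory probing described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `probe_cuda_lib_dirs` and `link_search_lines` are hypothetical names, the `target` argument is assumed to be the directory name under `targets/`, and the math_libs sibling is assumed to sit at `<sdk>/math_libs/<ver>` when the CUDA root is `<sdk>/cuda/<ver>`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list candidate library directories under a CUDA
/// root and keep only those that exist on disk.
pub fn probe_cuda_lib_dirs(root: &Path, target: &str) -> Vec<PathBuf> {
    let mut candidates = vec![
        root.join("lib64"),                           // classic layout
        root.join("lib"),                             // conda/micromamba
        root.join("targets").join(target).join("lib"), // per-target tree
    ];
    // Assumption: for a root shaped like <sdk>/cuda/<ver>, the HPC SDK
    // keeps cuBLAS under the sibling tree <sdk>/math_libs/<ver>.
    if let (Some(ver), Some(sdk)) = (root.file_name(), root.parent().and_then(Path::parent)) {
        let math = sdk.join("math_libs").join(ver);
        candidates.push(math.join("lib64"));
        candidates.push(math.join("lib"));
    }
    // Emit only directories that exist, so the linker never sees dead paths.
    candidates.into_iter().filter(|d| d.is_dir()).collect()
}

/// Format the surviving directories as Cargo build-script directives.
pub fn link_search_lines(dirs: &[PathBuf]) -> Vec<String> {
    dirs.iter()
        .map(|d| format!("cargo:rustc-link-search=native={}", d.display()))
        .collect()
}
```

A build script would print each line returned by `link_search_lines`. The sketch does not deduplicate directories that are symlinks to one another (for example lib64 pointing at lib).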

Before

Dual-GH200 (aarch64, sm_90), NVIDIA HPC SDK toolkit: linking openinfer-kernels fails; the only workaround was a manual LIBRARY_PATH export.
Single GPU (x86_64, sm_89), conda toolkit: the cuda-sys/cudart-sys build scripts panic on the header probe, taking cargo test --workspace down with them.

Error logs

# Dual-GH200 (aarch64, sm_90), HPC SDK toolkit — openinfer-kernels link stage
ld: cannot find -lcublas: No such file or directory
ld: cannot find -lcublasLt: No such file or directory

# Single GPU (x86_64, sm_89), conda toolkit — openinfer-comm-cuda-sys build script
cuda-sys build error: required header `include/cuda.h` not found.
Looked at `$CUDA_HOME` (set to ".../envs/<conda-env>") and default paths ["/usr/local/cuda"]
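The header panic above comes from checking a single relative path. A minimal sketch of probing several check files per root, under assumptions, might look like this; `candidate_roots`, `header_candidates`, and `find_cuda_include` are hypothetical names rather than the crate's API, and the root order (`$CUDA_HOME` first, then /usr/local/cuda) is taken from the error message.

```rust
use std::path::PathBuf;

/// Roots to search: the CUDA_HOME value (if any) first, then the default.
/// The env value is passed in so the function stays pure and testable.
pub fn candidate_roots(cuda_home: Option<&str>) -> Vec<PathBuf> {
    let mut roots: Vec<PathBuf> = cuda_home.map(PathBuf::from).into_iter().collect();
    roots.push(PathBuf::from("/usr/local/cuda"));
    roots
}

/// Relative locations of `cuda.h`: classic layout, then the conda-style
/// targets/<arch>-linux/include/ layout.
fn header_candidates(arch: &str) -> Vec<PathBuf> {
    vec![
        PathBuf::from("include/cuda.h"),
        PathBuf::from(format!("targets/{arch}-linux/include/cuda.h")),
    ]
}

/// Return (root, include_dir) for the first root holding any check file,
/// or None so the caller can report every path that was tried.
pub fn find_cuda_include(roots: &[PathBuf], arch: &str) -> Option<(PathBuf, PathBuf)> {
    for root in roots {
        for rel in header_candidates(arch) {
            let full = root.join(&rel);
            if full.is_file() {
                let include_dir = full.parent()?.to_path_buf();
                return Some((root.clone(), include_dir));
            }
        }
    }
    None
}
```

Returning `Option` instead of panicking inside the probe leaves the decision about a hard failure to each build script.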

After

Dual-GH200 (aarch64, sm_90), NVIDIA HPC SDK toolkit: openinfer-kernels relinks with LIBRARY_PATH unset; the workaround is deleted.
Single GPU (x86_64, sm_89), conda toolkit: cuda-sys/cudart-sys build, openinfer-kernels relinks, and the Qwen3-4B golden gate runs green on the fixed tree.
Layout unit tests pass on both machines and on a CUDA-less host (the crate has no CUDA dependency).
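The layout tests can run without CUDA because the probing is plain filesystem logic. One way such a test could be set up, shown here as a sketch with hypothetical helper names (`make_fake_toolkit`, `dirs_containing`) rather than the tests in this PR, is to build an empty directory tree that mimics a toolkit and probe it:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical fixture: create empty files under `base` to mimic a
/// toolkit layout. No real CUDA libraries or headers are needed.
pub fn make_fake_toolkit(base: &Path, files: &[&str]) -> io::Result<()> {
    for rel in files {
        let path = base.join(rel);
        if let Some(parent) = path.parent() {
            fs::create_dir_all(parent)?;
        }
        fs::write(&path, b"")?;
    }
    Ok(())
}

/// Stand-in for the probe under test: which candidate directories
/// (relative to `base`) actually contain `file`.
pub fn dirs_containing(base: &Path, candidates: &[&str], file: &str) -> Vec<PathBuf> {
    candidates
        .iter()
        .map(|d| base.join(d))
        .filter(|d| d.join(file).is_file())
        .collect()
}
```

A conda-style test would create `lib/libcublas.so` with the fixture and assert that probing `lib64` and `lib` returns only the `lib` directory.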

Verification logs

== buildfix verify, Dual-GH200 (aarch64, sm_90), HPC SDK toolkit ==
LIBRARY_PATH=unset
=A= openinfer-build unit tests
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s
=B= kernels relink (gate --no-run)
    Finished `release` profile [optimized] target(s) in 45.60s

== buildfix verify, Single GPU (x86_64, sm_89), conda toolkit ==
=A= openinfer-build unit tests
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
=B= cuda-sys
    Finished `release` profile [optimized] target(s) in 1.89s
=C= cudart-sys
    Finished `release` profile [optimized] target(s) in 1.23s
=D= kernels relink (gate --no-run)
    Finished `release` profile [optimized] target(s) in 38.85s
=E= golden gate full run
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 38.05s

== buildfix verify, CUDA-less dev host (macOS arm64) ==
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Type of Change

Bug fix (non-breaking change which fixes an issue)

Checklist

My code follows the style guidelines of this project (see docs/conventions/coding-style.md).
I have performed a self-review of my own code.
I have formatted my commits according to Commitizen conventions.
I have run the local test suite and all tests pass (see CLAUDE.md).

fix(build): probe CUDA toolkit layouts in a shared openinfer-build crate

a71b566

FeathBow wants to merge 1 commit into openinfer-project:main from FeathBow:fix/build-cuda-discovery