publish manylinux wheel to PyPI (Linux x86_64, CUDA 13) #81

@nclack

Goal

Make damacy installable via pip install damacy on a host with only the NVIDIA driver — no CUDA toolkit, no nvcomp, no flake.nix, no Docker. Today the only paths are source build (full toolchain required) or the dev image; both are heavy lifts for a coworker who just wants to try it.

Scope: one platform

  • Linux x86_64, manylinux_2_28_x86_64
  • CUDA 13 (single major; CUDA 12 only if asked)
  • Python 3.11 / 3.12 / 3.13 (cp wheels — pyproject.toml already sets wheel.py-api = "")

Skip aarch64, Windows, conda-forge for v1. conda-forge is a separate recipe and isn't gated on the PyPI wheel.

Bundling strategy

Bundle directly into the wheel (no Requires-Dist: nvidia-*-cu13):

  • libcudart.so.13
  • libnvcomp.so + any nvCOMP plugins that get dlopen'd at runtime
  • libcufile.so.0 + the matching cufile.json (compat-mode enabled, same as the devShell ships at ${cudaPkgs.libcufile}/etc/cufile.json)

Wheel is large (~150-200 MB) but self-contained. Rationale: avoids version-pin friction with PyTorch's nvidia-* wheels, and PyTorch doesn't ship nvcomp / cufile anyway so there's nothing to share. libcuda.so.1 stays a host dep (driver lib, not bundleable).

libmount.so.1 / libudev.so.1 (transitive dlopens from cufile init) stay host deps too — both are present on any modern Linux.
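
A quick way to confirm the host-dep assumption on a target machine is to ask the dynamic loader for each library by soname. The sketch below is only an illustration: the function name `missing_host_libs` and its `loader` parameter are hypothetical, and the list of sonames is taken from the paragraphs above.

```python
import ctypes

# Host-provided libraries named in this issue; none of them are bundled.
HOST_LIBS = ("libcuda.so.1", "libmount.so.1", "libudev.so.1")


def missing_host_libs(names=HOST_LIBS, loader=ctypes.CDLL):
    """Return the sonames that the dynamic loader cannot resolve on this host.

    `loader` is injectable so the check can be exercised without a GPU host.
    """
    missing = []
    for name in names:
        try:
            loader(name)  # ctypes.CDLL raises OSError when dlopen fails
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gone = missing_host_libs()
    print("missing host libs:", ", ".join(gone) if gone else "none")
```

Running this on the fresh smoke-test host before installing the wheel separates "driver not installed" failures from genuine bundling bugs.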

Python shim

damacy/__init__.py patches the loader path before from damacy import _native:

  • Prepend the bundled lib dirs to the loader search path so nested dlopens resolve (libnvcomp finding its plugins, libcufile finding libcudart, etc).
  • Set CUFILE_ENV_PATH_JSON to the bundled cufile.json unless the caller has already set it.

Crib the pattern from the lib-path injection in PyTorch's torch/__init__.py.
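
A minimal sketch of what the shim could look like, under several assumptions: the `_libs` directory name, the helper name `configure_bundled_libs`, and the preload order are all hypothetical. One caveat shapes the sketch: glibc reads LD_LIBRARY_PATH at process start, so changing it from inside a running interpreter generally does not affect later dlopen calls. The sketch therefore preloads the bundled libraries by absolute path with RTLD_GLOBAL, so that later dlopens by soname find the already-loaded copies. Whether that also covers nvCOMP's plugin lookup is untested here.

```python
import ctypes
import os
from pathlib import Path

# Assumed load order: libcudart first, since the other two may depend on it.
PRELOAD = ("libcudart.so.13", "libcufile.so.0", "libnvcomp.so")


def configure_bundled_libs(lib_dir, env=os.environ, loader=ctypes.CDLL):
    """Point cuFile at the bundled config and preload bundled libraries.

    Returns the names that were actually found and loaded.
    """
    lib_dir = Path(lib_dir)

    # Respect a caller-provided CUFILE_ENV_PATH_JSON; only fill in a default.
    cufile_json = lib_dir / "cufile.json"
    if cufile_json.is_file():
        env.setdefault("CUFILE_ENV_PATH_JSON", str(cufile_json))

    loaded = []
    for name in PRELOAD:
        path = lib_dir / name
        if path.is_file():
            # RTLD_GLOBAL makes the symbols visible to libraries loaded later.
            loader(str(path), mode=ctypes.RTLD_GLOBAL)
            loaded.append(name)
    return loaded
```

In damacy/__init__.py this would be called as `configure_bundled_libs(Path(__file__).resolve().parent / "_libs")` before `from damacy import _native`. The `env` and `loader` parameters exist only so the logic can be tested without real CUDA libraries.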

Build plumbing

  • New CI workflow building inside quay.io/pypa/manylinux_2_28_x86_64 with CUDA toolkit + nvcomp redist tarball installed on top of it. The existing Dockerfile is nvidia/cuda:13.2.1-devel-ubuntu24.04 — useful for reference but not manylinux-tagged.
  • scikit-build-core already drives the build via pyproject.toml; no source changes needed in CMakeLists.
  • auditwheel repair won't catch dlopen'd libs on its own. Either extend the audit step manually or ship a small post-build script that copies libcufile / libnvcomp into the wheel's data dir and patches RUNPATH.
  • Publish to TestPyPI first; promote to PyPI after a fresh-host smoke install confirms import damacy works.
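
Because auditwheel will not notice dlopen'd libraries, a cheap guard before uploading to TestPyPI is to open the built wheel and confirm the files are actually inside it. The following is a sketch, not a proposed CI step: `missing_from_wheel` is a hypothetical helper, and it matches on exact basenames, which suits libraries copied in manually (libraries grafted by auditwheel repair get a hash added to their file name and would need a looser match).

```python
import zipfile

# Files this issue proposes to bundle; their in-wheel directory is not decided here.
REQUIRED = ("libcudart.so.13", "libnvcomp.so", "libcufile.so.0", "cufile.json")


def missing_from_wheel(wheel_path, required=REQUIRED):
    """Return the required file names that do not appear anywhere in the wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Compare basenames only, so the check is independent of the lib directory.
        basenames = {name.rsplit("/", 1)[-1] for name in wheel.namelist()}
    return [name for name in required if name not in basenames]
```

A CI job could fail the build when the returned list is non-empty, which catches a forgotten copy step before the slower fresh-host install does.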

Where the real work lives

Compile is fast (single-digit minutes). The time goes into the dlopen / RPATH iteration loop: build wheel → install in toolchain-less venv → import damacy → debug the next missing transitive dlopen → repeat. Plan accordingly.
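
One way to shorten that loop is to inspect, after a successful import, where each CUDA-related library was actually resolved from. On Linux this is visible in /proc/self/maps. The sketch below parses that text and flags watched libraries that came from outside the installed package, which would mean the host (not the wheel) supplied them. Both function names and the `watched` prefixes are assumptions for illustration.

```python
def loaded_shared_objects(maps_text):
    """Return sorted unique shared-object paths from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith("/") and ".so" in path.rsplit("/", 1)[-1]:
                paths.add(path)
    return sorted(paths)


def outside_package(paths, package_dir,
                    watched=("libcudart", "libnvcomp", "libcufile")):
    """Return watched libraries that were loaded from outside package_dir."""
    prefix = package_dir.rstrip("/") + "/"
    return [
        p for p in paths
        if p.rsplit("/", 1)[-1].startswith(watched) and not p.startswith(prefix)
    ]


# Intended use in the toolchain-less venv, after `import damacy` succeeds:
#   with open("/proc/self/maps") as f:
#       libs = loaded_shared_objects(f.read())
#   print(outside_package(libs, os.path.dirname(damacy.__file__)))
```

An empty result on the build machine is weak evidence; the check is most informative on a host where the toolkit is installed, since that is where a bundled library can silently lose to a system copy.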

Key files

  • pyproject.toml — scikit-build-core config; already wheel-ready
  • Dockerfile — current CUDA-on-Ubuntu build; the manylinux variant cribs from this
  • flake.nix:106-135 — the canonical env-var wiring (CUFILE_ENV_PATH_JSON, Nvcomp_ROOT, LD_LIBRARY_PATH ordering) the Python shim has to replicate at runtime
  • python/damacy/__init__.py — where the lib-path shim goes
