Add CUDA forward compatibility hook #948

Merged Feb 28, 2025 (6 commits)

@elezar (Member) commented Feb 27, 2025

With #877, the default behaviour of the NVIDIA Container Runtime / NVIDIA Container Runtime Hook was changed to not mount compat libraries from the container into the container. This removed "automatic" support for CUDA Forward Compatibility.

This change attempts to address this by adding a createContainer hook that creates a file in /etc/ld.so.conf.d/ in the container, ensuring that the /usr/local/cuda/compat libraries are added to the ldcache ahead of the libraries mounted from the host. The provided host driver version is compared to the version of the compat libraries in the container, and the config update is only performed if the compat libraries are newer than the host driver.

Note that the hook only creates a file in the container's file system and does not perform any mount operations. This means that this mechanism does not present the same vulnerabilities that caused CVE-2024-0132 and CVE-2025-23359.
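
For illustration only, the check-and-write logic described above could be sketched roughly as follows. This is not the code from this PR (the toolkit itself is written in Go); the function names, the `00-compat-sketch.conf` file name, and the way the compat version is read from the `libcuda.so.*` file name are all assumptions made for the sketch.

```python
import re
from pathlib import Path

COMPAT_DIR = "/usr/local/cuda/compat"  # path named in the PR description
CONF_NAME = "00-compat-sketch.conf"    # hypothetical name, chosen to sort early


def major(version: str) -> int:
    """Return the major component of a driver version such as '570.86.10'."""
    return int(version.split(".")[0])


def find_compat_version(container_root: Path):
    """Read the compat driver version from a libcuda.so.<version> file name.

    Assumption: the compat directory holds a file like libcuda.so.570.86.10.
    Returns None if no such file is found.
    """
    compat = container_root / COMPAT_DIR.lstrip("/")
    if not compat.is_dir():
        return None
    versions = []
    for lib in compat.glob("libcuda.so.*.*"):
        match = re.fullmatch(r"libcuda\.so\.(\d+(?:\.\d+)+)", lib.name)
        if match:
            versions.append(match.group(1))
    return max(versions, key=major, default=None)


def enable_cuda_compat(container_root: Path, host_driver_version: str) -> bool:
    """Write an ld.so.conf.d entry only if the compat libs are newer than the host."""
    compat_version = find_compat_version(container_root)
    if compat_version is None:
        return False
    if major(compat_version) <= major(host_driver_version):
        return False  # host driver is already new enough; leave the container alone
    conf_dir = container_root / "etc/ld.so.conf.d"
    conf_dir.mkdir(parents=True, exist_ok=True)
    # Only a file is created here; nothing is mounted.
    (conf_dir / CONF_NAME).write_text(COMPAT_DIR + "\n")
    return True
```

The sketch compares major versions only, which matches the "driver major version" wording in the commit message; the real hook may apply further checks.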

In the case of the legacy runtime, this behaviour is only triggered if the allow-cuda-compat-libs-from-container feature flag is not enabled. The CDI spec generation has also been extended to include this hook.

This backports #906

This change adds an nvidia-cdi-hook enable-cuda-compat hook that checks the
container for cuda compat libs and updates /etc/ld.so.conf.d to include their
parent folder if their driver major version is sufficient.

This allows CUDA Forward Compatibility to be used when this is not available
through the libnvidia-container.

Signed-off-by: Evan Lezar <[email protected]>
This change adds the enable-cuda-compat hook to the incoming OCI runtime spec
if the allow-cuda-compat-libs-from-container feature flag is not enabled.

An update-ldcache hook is also injected to ensure that the required folders
are processed.

Signed-off-by: Evan Lezar <[email protected]>
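
As a rough sketch of what the commit above describes, the following shows how the two hooks might be appended to the `hooks.createContainer` list of an OCI runtime spec when the feature flag is not enabled. The hook binary path, the function name, and the ordering of the two hooks are assumptions for illustration; any arguments the real hooks take (for example, how the host driver version is passed) are deliberately left out.

```python
HOOK_PATH = "/usr/bin/nvidia-cdi-hook"  # assumed install location


def inject_compat_hooks(spec: dict, allow_compat_libs_from_container: bool) -> dict:
    """Append the compat and ldcache hooks unless the feature flag is enabled."""
    if allow_compat_libs_from_container:
        # The hook is skipped when the flag is enabled.
        return spec
    hooks = spec.setdefault("hooks", {}).setdefault("createContainer", [])
    # Assumed ordering: write the ld.so.conf.d entry first, then refresh the ldcache.
    for subcommand in ("enable-cuda-compat", "update-ldcache"):
        hooks.append({"path": HOOK_PATH, "args": ["nvidia-cdi-hook", subcommand]})
    return spec
```

Calling `inject_compat_hooks({}, False)` yields a spec with two `createContainer` entries, while passing `True` returns the spec unchanged.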
@elezar added this to the v1.17.5 milestone on Feb 27, 2025
@elezar self-assigned this on Feb 27, 2025
@elezar force-pushed the add-compat-lib-hook branch from 3307cb1 to c1bac28 on February 27, 2025 15:35
@elezar merged commit f5680dd into NVIDIA:release-1.17 on Feb 28, 2025
10 checks passed
@elezar deleted the add-compat-lib-hook branch on February 28, 2025 15:10