Does CDI mode support CUDA Forward Compatibility? #942

Open
wqlparallel opened this issue Feb 26, 2025 · 3 comments

Comments

@wqlparallel

Hi, I was reviewing the CVE-2025-23359 security bulletin and noticed that the vulnerability does not affect CDI mode. While this is reassuring, I’d like to ask how CUDA Forward Compatibility is handled in CDI mode, particularly for containers built with newer CUDA Toolkits running on nodes with older NVIDIA Linux GPU drivers.

After inspecting /etc/cdi/nvidia.yaml, I see that nvidia-cdi-hook adds the directory where the host’s libcuda is mounted (e.g., /usr/lib64) to the container’s /etc/ld.so.conf.d/00-nvcr-<RANDOM_STRING>.conf. However, I’m unsure how this covers applications that rely on CUDA Forward Compatibility (e.g., the libraries under /usr/local/cuda/compat). For example, if a container built with CUDA 12.2 (requiring driver ≥535) runs on a host with driver 525, I don’t see any mechanism in the CDI spec that automatically includes the compatibility libraries.
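To make the scenario concrete, here is a minimal Python sketch of the condition I have in mind. It assumes the compat directory contains a file named like libcuda.so.<driver version> and that forward compatibility only matters when that version's major number is higher than the host driver's. The function names are hypothetical, and this is not a description of how nvidia-cdi-hook actually decides anything.

```python
import re
from typing import List, Optional

# Matches fully versioned names such as "libcuda.so.535.104.05".
# The soname symlink "libcuda.so.1" is deliberately not matched.
_LIBCUDA_RE = re.compile(r"^libcuda\.so\.(\d+)\.\d+(?:\.\d+)?$")


def libcuda_major(filename: str) -> Optional[int]:
    """Return the driver major version encoded in a libcuda file name, or None."""
    match = _LIBCUDA_RE.match(filename)
    return int(match.group(1)) if match else None


def needs_forward_compat(host_driver_version: str, compat_filenames: List[str]) -> bool:
    """Assumed rule: the compat libcuda is only useful if it is newer than the host driver."""
    host_major = int(host_driver_version.split(".")[0])
    majors = [m for m in map(libcuda_major, compat_filenames) if m is not None]
    return any(m > host_major for m in majors)
```

With the example above (host driver 525, container shipping a 535 compat libcuda), `needs_forward_compat("525.85.12", ["libcuda.so.535.104.05"])` is true, which is exactly the case where I would expect the compat directory to take precedence over the host's libcuda.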

I also came across PR #906, which introduced nvidia-cdi-hook compat-libs --driver-version 999.88.77 to address Forward Compatibility. This makes me wonder:

- Before #906: Was CDI mode inherently unable to support CUDA Forward Compatibility due to missing library bindings?
- After #906: Does enabling compatibility now require manual configuration (e.g., specifying --driver-version), or is this handled automatically in CDI spec generation?

@elezar
Member

elezar commented Feb 27, 2025

Without the changes in #906, CDI mode does not support forward compatibility. Once #906 has been merged and is generally available, no user input should be required: CDI spec generation should take care of injecting the correct --driver-version for the hook.

@wqlparallel
Author

> Without the changes in #906, CDI mode does not support forward compatibility. Once #906 has been merged and is generally available, no user input should be required: CDI spec generation should take care of injecting the correct --driver-version for the hook.

Thanks for clarifying! We're seeing CUDA Forward Compatibility problems in v1.17.4 and plan to adopt the release that contains the #906 changes. Could you share the expected timeline for the next release? This will help us schedule the upgrade.

wqlparallel reopened this Feb 28, 2025

@elezar
Member

elezar commented Mar 18, 2025

@wqlparallel the v1.17.5 release should address this. Could you let us know if you're still having problems after updating?
