Does CDI mode support CUDA Forward Compatibility? #942

wqlparallel · 2025-02-26T08:30:29Z

Hi, I was reviewing the CVE-2025-23359 security bulletin and noticed that the vulnerability does not affect CDI mode. While this is reassuring, I’d like to ask for clarification on how CUDA Forward Compatibility is handled in CDI mode, particularly for containers built with newer CUDA Toolkits running on nodes with older NVIDIA Linux GPU drivers.

After inspecting /etc/cdi/nvidia.yaml, I see that nvidia-cdi-hook takes the host’s libcuda directory (e.g., /usr/lib64), which is mounted into the container, and adds that path to the container’s /etc/ld.so.conf.d/00-nvcr-<RANDOM_STRING>.conf. However, I’m uncertain how this ensures compatibility for applications that require CUDA Forward Compatibility (e.g., by binding the /usr/local/cuda/compat libraries). For example, if a container built with CUDA 12.2 (requiring driver ≥535) runs on a host with driver 525, I don’t see any mechanism in the CDI spec to automatically include compatibility stubs.
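To make the scenario concrete, here is a minimal sketch of the version check implied by the example above. It only compares driver major versions and is not how the toolkit itself decides anything. The function names and the full version strings (such as 525.85.12) are hypothetical; only the 535 and 525 majors come from the example.

```python
def driver_major(version: str) -> int:
    """Return the major component of a driver version such as '525.85.12'."""
    return int(version.split(".")[0])


def needs_forward_compat(host_driver: str, min_required_major: int) -> bool:
    """True when the host driver is older than the container's toolkit expects."""
    return driver_major(host_driver) < min_required_major


# Majors from the example: CUDA 12.2 expects driver >= 535, the host runs 525.
# The minor/patch parts of these version strings are made up for illustration.
print(needs_forward_compat("525.85.12", 535))   # True
print(needs_forward_compat("535.104.05", 535))  # False
```

When the check returns True, the container would have to rely on its own compat libraries rather than the host's libcuda, which is the case this issue asks about.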

I also came across PR #906, which introduced nvidia-cdi-hook compat-libs --driver-version 999.88.77 to address Forward Compatibility. This makes me wonder:

Before #906: Was CDI mode inherently unable to support CUDA Forward Compatibility due to missing library bindings?
After #906: Does enabling compatibility now require manual configuration (e.g., specifying --driver-version), or is this handled automatically in CDI spec generation?


elezar · 2025-02-27T08:49:18Z

Without the changes in #906, CDI mode does not support forward compatibility. After #906 has been merged and is generally available, no user input should be required: the CDI spec generation should take care of injecting the correct --driver-version for the hook.
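One way to confirm that a generated spec carries the flag is to scan its hooks for --driver-version. The sketch below is illustrative only: it assumes the spec has already been loaded into a dict (for example from /etc/cdi/nvidia.yaml) and that hooks sit under containerEdits at the top level and per device. The sample spec, the hook path, and the hookName value are hypothetical; the compat-libs arguments are the ones quoted earlier in this thread, and the released hook may use different names.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_hooks(spec: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield hooks from the top-level containerEdits and from each device."""
    yield from (spec.get("containerEdits") or {}).get("hooks") or []
    for device in spec.get("devices") or []:
        yield from (device.get("containerEdits") or {}).get("hooks") or []


def find_flag(args: List[str], flag: str) -> Optional[str]:
    """Return the value of `flag`, given as '--flag value' or '--flag=value'."""
    for i, arg in enumerate(args):
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith(flag + "="):
            return arg.split("=", 1)[1]
    return None


def compat_driver_versions(spec: Dict[str, Any]) -> List[str]:
    """Collect every --driver-version value found in the spec's hooks."""
    versions = []
    for hook in iter_hooks(spec):
        value = find_flag(hook.get("args") or [], "--driver-version")
        if value is not None:
            versions.append(value)
    return versions


# Hypothetical spec excerpt; real generated specs contain many more entries.
sample_spec = {
    "kind": "nvidia.com/gpu",
    "containerEdits": {
        "hooks": [
            {
                "hookName": "createContainer",       # assumed hook stage
                "path": "/usr/bin/nvidia-cdi-hook",  # assumed install path
                "args": ["nvidia-cdi-hook", "compat-libs",
                         "--driver-version", "999.88.77"],
            }
        ]
    },
    "devices": [],
}

print(compat_driver_versions(sample_spec))  # ['999.88.77']
```

An empty result on a spec generated by a release that includes #906 would suggest the spec needs to be regenerated.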

wqlparallel · 2025-02-28T08:31:02Z

> Without the changes in #906 CDI mode does not support forward compatibility. After #906 has been merged and is generally available, there should be no user input and the CDI spec generation should take care of injecting the correct --driver-version for the hook.

Thanks for clarifying! We're observing CUDA Forward Compatibility hiccups in v1.17.4 and plan to adopt the release containing #906 changes. Could you share the expected timeline for the next version rollout? This will help us schedule the upgrade smoothly.

elezar · 2025-03-18T11:28:24Z

@wqlparallel the v1.17.5 release should address this. Could you let us know if you're still having problems after updating?

wqlparallel closed this as completed Feb 28, 2025

wqlparallel reopened this Feb 28, 2025
