Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow enable-cuda-compat hook to be disabled in CDI spec generation #968

Merged
merged 2 commits into from
Mar 7, 2025

Conversation

elezar
Copy link
Member

@elezar elezar commented Mar 7, 2025

This change adds support to the nvcdi package to opt out of specific hooks.

Currently only the enable-cuda-compat hook is supported. This allows clients to generate a CDI spec that is compatible with older nvidia-cdi-hook CLIs.

This is required in cases where a tool such as the k8s-device-plugin or k8s-dra-driver-gpu needs to generate a CDI specification for a version of the NVIDIA Container Toolkit that does not support this hook.

@elezar elezar requested a review from jgehrcke March 7, 2025 12:42
@elezar elezar self-assigned this Mar 7, 2025
@elezar elezar force-pushed the allow-hooks-disable branch from 3bf1a8b to 7dc7e69 Compare March 7, 2025 12:47
@elezar elezar added the must-backport The changes in PR need to be backported to at least one stable release branch. label Mar 7, 2025
@elezar elezar force-pushed the allow-hooks-disable branch from 7dc7e69 to 5785120 Compare March 7, 2025 12:49
@ArangoGutierrez ArangoGutierrez requested a review from Copilot March 7, 2025 14:02
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ArangoGutierrez ArangoGutierrez requested a review from Copilot March 7, 2025 14:05
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ArangoGutierrez ArangoGutierrez requested a review from Copilot March 7, 2025 14:15
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

const (
// HookEnableCudaCompat refers to the hook used to enable CUDA Forward Compatibility.
// This was added with v1.17.5 of the NVIDIA Container Toolkit.
HookEnableCudaCompat = HookName("enable-cuda-compat")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but we should have as TODO to have a central place to define this internal names to make it easier to edit/change a hook name in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should refactor the hook handling to have them defined more consistently. Some of this falls in the scope of pulling the CDI spec generation into go-nvlib instead of implementing it here in the NVIDIA Container Toolkit. This would also mean that the device plugin and the dra driver do not need to depend on the nvidia-container-toolkit.

elezar added 2 commits March 7, 2025 16:54
This change adds support to the nvcdi package to opt out of specific hooks.

Currently only the `enable-cuda-compat` hook is supported. This allows clients to
generate a CDI spec that is compatible with older nvidia-cdi-hook CLIs.

Signed-off-by: Evan Lezar <[email protected]>
Management containers don't generally need forward compatibility.
We disable the enable-cuda-compat hook to not include this in the
generated CDI specifications.

Signed-off-by: Evan Lezar <[email protected]>
@elezar elezar force-pushed the allow-hooks-disable branch from 5785120 to 0f299c3 Compare March 7, 2025 14:54
Copy link

copy-pr-bot bot commented Mar 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@elezar elezar merged commit e436533 into NVIDIA:main Mar 7, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
must-backport The changes in PR need to be backported to at least one stable release branch.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants