Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update logic to set environment for calls out to nvidia-smi #110

Merged
merged 1 commit into from
May 7, 2024

Conversation

klueska
Copy link
Collaborator

@klueska klueska commented May 7, 2024

Previously, we were setting the envvars needed for nvidia-smi in the plugin's environment itself. This caused errors, however, when shelling out to other binaries that didn't need these envvars set.

This change pushes the setting of the envvars into the actual exec call for nvidia-smi instead of setting them in the plugin's environment itself.

Closes #106

@klueska klueska requested a review from elezar May 7, 2024 08:53
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions / comments.

cmd/nvidia-dra-plugin/nvlib.go Outdated Show resolved Hide resolved
@@ -495,11 +511,19 @@ func (l deviceLib) setTimeSlice(uuids []string, timeSlice int) error {
}

func (l deviceLib) setComputeMode(uuids []string, mode string) error {
// In order for nvidia-smi to run, we need to set the PATH to the parent of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought about it, but purposefully didn't since there are only 2 instances of this call and we want to drop the second one soon.

cmd/nvidia-dra-plugin/nvlib.go Outdated Show resolved Hide resolved
@klueska klueska force-pushed the fix-mps-mounting branch from 1e70595 to ff86e5c Compare May 7, 2024 11:10
Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@klueska
Copy link
Collaborator Author

klueska commented May 7, 2024

Just tested this before merging and it doesn't seem to work:

error starting MPS control daemon: error setting compute mode: error running nvidia-smi: exec: "nvidia-smi": executable file not found in $PATH

@klueska
Copy link
Collaborator Author

klueska commented May 7, 2024

I think wrapping the call in a "bash -c" would likely work.

@klueska klueska force-pushed the fix-mps-mounting branch 2 times, most recently from f44f0ba to b338e30 Compare May 7, 2024 12:51
Previously, we were setting the envvars needed for nvidia-smi in the plugin's
environment itself. This caused errors, however, when shelling out to other
binaries that didn't need these envvars set.

This change pushes the setting of the envvars into the actual exec call for
nvidia-smi instead of setting them in the plugin's environment itself.

Closes NVIDIA#106

Signed-off-by: Kevin Klues <[email protected]>
@klueska klueska force-pushed the fix-mps-mounting branch from b338e30 to 5bfc7f8 Compare May 7, 2024 12:51
@klueska klueska merged commit 3dda7bf into NVIDIA:main May 7, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problems with MPS quickstart
2 participants