GPU Driver Container Won't Start #54
Comments
@BHSDuncan I would recommend trying CNS 10.4 or CNS 11.1 with [...]. If you want the driver as part of the GPU Operator, then I would recommend waiting to hear from the GPU Operator team.
But that will install a driver on the host itself, right? I'd prefer to avoid installing anything on the machine and keep the driver in the cluster. For that, you're saying I'll need to wait for the GPU Operator team? If so, they've made it known they're working on a fix. Once the fix is in place, will the CNS playbooks need updating?
Yeah, if you look at the comment NVIDIA/gpu-operator#564 (comment), the current Operator is fixed for the latest kernel. We will validate it with CNS, and if any changes are required we will make them in CNS as well and let you know.
@BHSDuncan CNS has been updated with the new Operator version; please check.
Essentially, I'm seeing what's described in this ticket: NVIDIA/gpu-operator#564. It happens when I start up my machine, which runs a cluster with a version of CNS installed (currently an old one, like 9.x).
Because I'm using one of the playbooks from this repo, I'm not sure how to resolve this issue.
I'm also unsure why the issue is happening now. I've been running this on a machine since last fall, but the issue linked above pre-dates that.
Will updating to the latest CNS version solve this issue? Or will it still be a problem, given that the install.sh and Dockerfile(s) look pretty much the same? (I'll probably try doing this anyway on a test box, but I wanted to ask here as well.)
Thank you.