1. Quick Debug Information
2. Issue or feature description
I am using the Amazon EKS optimized accelerated Amazon Linux AMIs to build a managed node group in an EKS cluster to support GPUs.
(https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html#gpu-ami)
According to the AWS docs, all of the optimized accelerated Amazon Linux AMIs have the NVIDIA driver pre-installed, so I disabled driver installation when I installed the GPU operator. (Even when I enabled driver installation in the GPU operator, it did not overwrite the existing driver on the host.)
This leads to a problem: the NVIDIA driver on the Amazon host is not upgraded when I upgrade the GPU operator.
For example, the NVIDIA driver on my existing GPU host is 470.182.03, but the newer NVIDIA driver that comes with the GPU operator is 535.129.03.
After upgrading the GPU operator to the latest version, v23.9.1, the NVIDIA driver on the host is still 470.182.03.
I expected the GPU operator to manage the NVIDIA driver on the host in this situation, but it did not.
Is it possible to upgrade the NVIDIA driver by upgrading the GPU operator? If so, how?
3. Steps to reproduce the issue
1: Build an EKS cluster and a managed node group with the optimized accelerated Amazon Linux AMIs.
2: Install the GPU operator with the Helm chart and disable driver installation:
driver:
  enabled: false
3: Upgrade the GPU operator and check the NVIDIA driver version.
The driver version does not change after the GPU operator upgrade.
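The steps above can be sketched roughly as the shell helpers below. This is only an illustration, not the exact commands that were run: the release name `gpu-operator`, the namespace `gpu-operator`, the `nvidia` Helm repo alias, and the function names are all assumptions. `driver_version` reads `nvidia-smi -q` output on stdin and prints the reported driver version.

```shell
#!/usr/bin/env bash
# Sketch only: release name, namespace, repo alias and function names are
# assumptions for illustration, not taken from the original setup.

# Step 2: install the GPU operator with its driver container disabled.
# Assumes the chart repo was added under the alias "nvidia".
install_operator() {
  helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set driver.enabled=false
}

# Step 3: upgrade the operator to the chart version given as $1 (e.g. v23.9.1),
# still leaving the driver to the host AMI.
upgrade_operator() {
  helm upgrade gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator \
    --version "$1" \
    --set driver.enabled=false
}

# Print the driver version found in `nvidia-smi -q` output read from stdin.
driver_version() {
  awk -F': *' '/^Driver Version/ { print $2; exit }'
}
```

On the GPU host, `nvidia-smi -q | driver_version` can be run before and after `upgrade_operator v23.9.1` to compare the two values.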
4. Information to attach (optional if deemed irrelevant)
[root@ip-10-10-20-111 /]# nvidia-smi -q | head
==============NVSMI LOG==============
Timestamp : Fri Jan 26 23:11:04 2024
Driver Version : 470.182.03
CUDA Version : 11.4
Attached GPUs : 1
GPU 00000000:00:1E.0
Product Name : Tesla T4