
Commit 0328dbf

Upgrade Nvidia Drivers, CUDA and fabric manager to 470.82
The latest Tesla driver version is 470.82, which is not aligned with the latest available Fabric Manager version (495.29).

We're upgrading the three packages to stay aligned with the latest available driver version:
* NVIDIA driver and Fabric Manager from 460.73 to 470.82
* CUDA from 11.3 to 11.4 (we're avoiding the upgrade to 11.5 to keep the driver version aligned)

See:
* https://docs.nvidia.com/datacenter/tesla/index.html
* https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/

Note: the name of the Fabric Manager package in the repo has changed.

Signed-off-by: Enrico Usai <[email protected]>
1 parent 3f538f6 commit 0328dbf

2 files changed: 14 additions, 7 deletions

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,9 @@ This file is used to list changes made in each version of the AWS ParallelCluste
 - Upgrade Slurm to version 20.11.8.
 - Upgrade Cinc Client to version 17.2.29.
 - Upgrade NICE DCV to version 2021.2-11190.
+- Upgrade NVIDIA driver to version 470.82.01.
+- Upgrade CUDA library to version 11.4.3.
+- Upgrade NVIDIA Fabric manager to `nvidia-fabricmanager-470`.
 - Disable unattended upgrades for Ubuntu.
 
 2.11.3

attributes/default.rb

Lines changed: 11 additions & 7 deletions
@@ -123,13 +123,17 @@
 
 # NVIDIA
 default['cfncluster']['nvidia']['enabled'] = 'no'
-default['cfncluster']['nvidia']['driver_version'] = '460.73.01'
-default['cfncluster']['nvidia']['driver_url'] = 'https://us.download.nvidia.com/tesla/460.73.01/NVIDIA-Linux-x86_64-460.73.01.run'
-default['cfncluster']['nvidia']['cuda_version'] = '11.3'
-default['cfncluster']['nvidia']['cuda_url'] = 'https://developer.download.nvidia.com/compute/cuda/11.3.0/local_installers/cuda_11.3.0_465.19.01_linux.run'
-
-# NVIDIA fabric-manager
-default['cfncluster']['nvidia']['fabricmanager']['package'] = "nvidia-fabricmanager-460"
+default['cfncluster']['nvidia']['driver_version'] = '470.82.01'
+default['cfncluster']['nvidia']['driver_url'] = 'https://us.download.nvidia.com/tesla/470.82.01/NVIDIA-Linux-x86_64-470.82.01.run'
+default['cfncluster']['nvidia']['cuda_version'] = '11.4'
+default['cfncluster']['nvidia']['cuda_url'] = 'https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run'
+
+# The package name of Fabric Manager for alinux2 and centos7 is nvidia-fabric-manager-<version>
+# For ubuntu, it is nvidia-fabricmanager-<major-version>_<version>
+default['cfncluster']['nvidia']['fabricmanager']['package'] = value_for_platform(
+  'default' => "nvidia-fabric-manager",
+  'ubuntu' => { 'default' => "nvidia-fabricmanager-470" }
+)
 default['cfncluster']['nvidia']['fabricmanager']['repository_key'] = "7fa2af80.pub"
 default['cfncluster']['nvidia']['fabricmanager']['version'] = value_for_platform(
   'default' => node['cfncluster']['nvidia']['driver_version'],
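
The per-platform package lookup introduced in this hunk can be sketched in plain Ruby. This is a simplified illustration, not Chef's implementation: `resolve_for_platform` and `FABRIC_MANAGER_PACKAGES` are hypothetical names, and the sketch assumes only exact-version keys and `'default'` fallbacks, ignoring the version-constraint matching that Chef's `value_for_platform` also supports.

```ruby
# Hypothetical stand-in for Chef's value_for_platform (assumption: exact
# version keys and 'default' fallbacks only, no version constraints).
def resolve_for_platform(table, platform, version)
  per_platform = table[platform]
  if per_platform.is_a?(Hash)
    # Prefer an exact platform-version match, then the platform's default.
    return per_platform[version] if per_platform.key?(version)
    return per_platform['default'] if per_platform.key?('default')
  end
  # Fall back to the global default for any platform not listed.
  table['default']
end

# Same mapping as the new 'package' attribute in this commit.
FABRIC_MANAGER_PACKAGES = {
  'default' => 'nvidia-fabric-manager',
  'ubuntu' => { 'default' => 'nvidia-fabricmanager-470' }
}.freeze

puts resolve_for_platform(FABRIC_MANAGER_PACKAGES, 'ubuntu', '20.04') # nvidia-fabricmanager-470
puts resolve_for_platform(FABRIC_MANAGER_PACKAGES, 'centos', '7')     # nvidia-fabric-manager
```

Under these assumptions, every Ubuntu release resolves to the major-version-suffixed package name, while any other platform falls through to the unsuffixed name.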
