[pull] master from NVIDIA:master #80

Open
pull[bot] wants to merge 617 commits into jolorunyomi:master from NVIDIA:master
pull[bot] commented Dec 4, 2021

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

pull[bot] added the ⤵️ pull label Dec 4, 2021
ajdecon and others added 29 commits April 20, 2022 12:05
Fix Ansible Galaxy dependencies to address Molecule test failures
Clarify hybrid cluster documentation
Enhanced NCCL tests Slurm validation playbook.
In tasks that interact with SELinux, add a check so we skip the task if
SELinux is fully disabled (rather than just in permissive mode)
Update release update instructions
GitHub is deprecating git.io on April 29, 2022:
https://github.blog/changelog/2022-04-25-git-io-deprecation/

Change our shortlinks not to use this URL shortener
git.io is being deprecated, fix shortlinks
Check for SELinux disabled in Ansible tasks
Update Ansible to match Kubespray supported versions
See
https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/
for details on the key changes.

This commit:

- Removes the nvidia-ml repo, which is deprecated and will not be updated
- Updates the nvidia_cuda and nvidia_dcgm roles to use the new key and install workflow
- Updates roles/requirements.txt to point to an updated version of nvidia.nvidia_driver
Update default Slurm version to 21.08.8
This specifies the env path for the sudo user.
Add a basic playbook for installing Mellanox OFED
Share a single slurm.conf so that every node uses the same copy when nodes and/or partitions are configured.
Place slurm.conf at /sw/.slurm/slurm.conf on the controller, where /sw is the NFS mount point shared with all nodes.
Make /etc/slurm/slurm.conf a symlink to it on both the nodes and the controller.

Signed-off-by: Seyong Um <seyong.um@hyundai.com>
The flag slurm_conf_symlink will allow nodes to share slurm.conf via NFS.

Signed-off-by: Seyong Um <seyong.um@hyundai.com>
dholt and others added 30 commits May 14, 2026 17:01
chore(release): bump component versions for 26.05
feat(dgx): update DGX software stack role
Add Ubuntu 24.04 NVIDIA Container Toolkit path
docs: clarify legacy CI and virtual lab status
Support Red Hat NVIDIA Container Toolkit path
docs: clarify current OS validation targets