[pull] master from NVIDIA:master#80
Open
pull[bot] wants to merge 617 commits into
Fix Ansible Galaxy dependencies to address Molecule test failures
Clarify hybrid cluster documentation
Enhanced NCCL tests Slurm validation playbook
In tasks that interact with SELinux, add a check so we skip the task if SELinux is fully disabled (rather than just in permissive mode)
Update release update instructions
GitHub is deprecating git.io on April 29, 2022 (https://github.blog/changelog/2022-04-25-git-io-deprecation/), so change our shortlinks to stop using this URL shortener.
git.io is being deprecated; fix shortlinks
Check for SELinux disabled in Ansible tasks
DeepOps Release 22.04
Update Ansible to match Kubespray supported versions
See https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/ for details on the key changes. This commit:
- Removes the nvidia-ml repo, which is deprecated and will not be updated
- Updates the nvidia_cuda and nvidia_dcgm roles to use the new key and install workflow
- Updates roles/requirements.txt to point to an updated version of nvidia.nvidia_driver
Update NVIDIA signing key
Update default Slurm version to 21.08.8
This specifies the env path for the sudo user.
Add a basic playbook for installing Mellanox OFED
Trident fix alt
…CL container image name and location
Share slurm.conf so that every node uses the same copy when nodes and/or partitions are configured. Place slurm.conf at /sw/.slurm/slurm.conf on the controller, where /sw is an NFS mount point shared by all nodes. Make /etc/slurm/slurm.conf a symbolic link to it on both the nodes and the controller. Signed-off-by: Seyong Um <seyong.um@hyundai.com>
The slurm_conf_symlink flag allows nodes to share slurm.conf via NFS. Signed-off-by: Seyong Um <seyong.um@hyundai.com>
chore(release): bump component versions for 26.05
release: 26.05 notes and README tag
feat(dgx): update DGX software stack role
Refresh CUDA example images
Refresh DCGM exporter
Add Ubuntu 24.04 NVIDIA Container Toolkit path
Retire legacy PXE provisioning paths
docs: clarify legacy CI and virtual lab status
Support Red Hat NVIDIA Container Toolkit path
Refresh Container Toolkit airgap docs
docs: clarify current OS validation targets
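The two slurm.conf commits in this list describe keeping a single copy of slurm.conf on the NFS-mounted /sw filesystem and pointing /etc/slurm/slurm.conf at it, gated by the slurm_conf_symlink flag. The per-node step can be sketched in Python as follows, assuming the paths from the commit message. The helper name `link_shared_slurm_conf`, its `enabled` parameter (a stand-in for the flag), and the `.bak` backup behavior are hypothetical illustrations, not the actual DeepOps role, which is implemented in Ansible.

```python
import os
from pathlib import Path

# Paths taken from the commit message; /sw is assumed to be an NFS mount shared by all nodes.
SHARED_CONF = Path("/sw/.slurm/slurm.conf")
LOCAL_CONF = Path("/etc/slurm/slurm.conf")


def link_shared_slurm_conf(shared: Path = SHARED_CONF,
                           local: Path = LOCAL_CONF,
                           enabled: bool = True) -> bool:
    """Hypothetical helper: make `local` a symlink to the shared `shared` file.

    `enabled` stands in for the slurm_conf_symlink flag. Returns True if the
    link was created or replaced, False if nothing changed.
    """
    if not enabled:
        return False
    if not shared.is_file():
        raise FileNotFoundError(f"shared config not found: {shared}")
    # Already pointing at the shared copy: do nothing (idempotent, like an Ansible task).
    if local.is_symlink() and Path(os.readlink(local)) == shared:
        return False
    local.parent.mkdir(parents=True, exist_ok=True)
    if local.exists() and not local.is_symlink():
        # Keep an existing regular file as a backup instead of deleting it.
        local.rename(local.with_name(local.name + ".bak"))
    elif local.is_symlink():
        # Stale or dangling symlink pointing somewhere else: replace it.
        local.unlink()
    local.symlink_to(shared)
    return True
```

Because the link target lives on shared storage, editing /sw/.slurm/slurm.conf once updates what every node sees, which is the point of the change; the idempotency check means rerunning the step on an already-configured node is a no-op.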
See Commits and Changes for more details.
Created by pull[bot]