@himani2411 himani2411 commented Aug 7, 2025

Description of changes

  • Add a Chef attribute to configure the IMEX install on any instance
  • By default, Chef keeps the value of this attribute as nil

Tests

DevSettings:
  Cookbook:
    ChefCookbook: https://github.com/himani2411/aws-parallelcluster-cookbook/nvidia-imex-install/tarball
    ExtraChefAttributes: |
      {"cluster":{"p6egb200_block_sizes":"1","nvidia":{"imex":{"force_configuration":"true"}}}}
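A quick way to sanity-check the ExtraChefAttributes payload before creating a cluster is to parse it and read back the nested key. This is a minimal sketch, not code from the cookbook: the helper name `imex_force_configuration` is hypothetical, and it only confirms that the JSON is well formed and that `cluster.nvidia.imex.force_configuration` is set. Note that the test config passes the value as the string "true" rather than a JSON boolean; how the cookbook interprets that value is not shown in this PR description.

```python
import json

# Value of ExtraChefAttributes copied from the test configuration in this PR.
extra_chef_attributes = (
    '{"cluster":{"p6egb200_block_sizes":"1",'
    '"nvidia":{"imex":{"force_configuration":"true"}}}}'
)


def imex_force_configuration(raw: str):
    """Return cluster.nvidia.imex.force_configuration, or None when it is unset.

    Hypothetical helper for illustration only; it is not part of the cookbook.
    """
    attrs = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    return (
        attrs.get("cluster", {})
        .get("nvidia", {})
        .get("imex", {})
        .get("force_configuration")
    )


if __name__ == "__main__":
    print(repr(imex_force_configuration(extra_chef_attributes)))
    print(repr(imex_force_configuration("{}")))
```

An empty attribute set yields `None`, which mirrors the description's statement that the attribute is nil unless it is set explicitly.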
  • The shared nvidia-imex directory contains:
ls -al /opt/parallelcluster/shared/nvidia-imex/
total 20
drwxr-xr-x. 2 root root   110 Aug  8 19:32 .
drwxr-xr-x. 5 root root  4096 Aug  8 19:28 ..
-rwxr-xr-x  1 root root 10662 Aug  8 19:32 config_LaunchTemplateA7211c84b953696f.cfg
-rwxr-xr-x  1 root root   129 Aug  8 19:32 nodes_config_LaunchTemplateA7211c84b953696f.cfg

cat /opt/parallelcluster/shared/nvidia-imex/nodes_config_LaunchTemplateA7211c84b953696f.cfg
## Please replace below fake IP's with correct IP address of launched instances in Gb200 Capacity Block
172.31.51.93
172.31.48.43

cat /opt/parallelcluster/shared/nvidia-imex/config_LaunchTemplateA7211c84b953696f.cfg  | grep IMEX_NODE_CONFIG_FILE
IMEX_NODE_CONFIG_FILE=/opt/parallelcluster/shared/nvidia-imex/nodes_config_LaunchTemplateA7211c84b953696f.cfg
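
The nodes config shown above is a plain list of addresses with a leading comment line, so it can be checked with a few lines of Python before the placeholder IPs are replaced. This is a sketch under assumptions drawn only from the sample output: one IP address per line, and lines starting with `#` treated as comments. The function name `parse_nodes_config` is hypothetical, and the real IMEX file format may allow more than this sketch accepts.

```python
import ipaddress

# Sample content copied from the `cat` output in this PR description.
SAMPLE_NODES_CONFIG = """\
## Please replace below fake IP's with correct IP address of launched instances in Gb200 Capacity Block
172.31.51.93
172.31.48.43
"""


def parse_nodes_config(text: str) -> list[str]:
    """Return the node addresses listed in a nodes config file.

    Assumes one IP address per line and '#' comment lines (an assumption
    based on the sample only). Raises ValueError on an invalid address.
    """
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ipaddress.ip_address(line)  # validation only; raises ValueError
        nodes.append(line)
    return nodes


if __name__ == "__main__":
    for node in parse_nodes_config(SAMPLE_NODES_CONFIG):
        print(node)
```

To check a file on a running node, the same function could be fed the contents of the path named by `IMEX_NODE_CONFIG_FILE`.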

systemctl status nvidia-imex
○ nvidia-imex.service - NVIDIA IMEX service
     Loaded: loaded (/etc/systemd/system/nvidia-imex.service; enabled; preset: disabled)
     Active: inactive (dead) since Fri 2025-08-08 19:32:55 UTC; 25min ago
        CPU: 587ms

Aug 08 19:32:55 queue1-st-cr1-1 nvidia-imex[2403]: [Aug 08 2025 19:32:55] [INFO] [tid 2403] Logging file name/path = /var/log/nvidia-imex.log
Aug 08 19:32:55 queue1-st-cr1-1 nvidia-imex[2403]: [Aug 08 2025 19:32:55] [INFO] [tid 2403] Append to log file = 1
Aug 08 19:32:55 queue1-st-c

References

#2996

Checklist

  • Make sure you are pointing to the right branch.
  • If you're creating a patch for a branch other than develop add the branch name as prefix in the PR title (e.g. [release-3.6]).
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@himani2411 himani2411 enabled auto-merge (rebase) August 8, 2025 20:14
@himani2411 himani2411 merged commit 433b130 into aws:develop Aug 11, 2025
28 of 30 checks passed