
Use only a single GPU/numa node for distributed tests #1121

Merged: msimberg merged 3 commits into C2SM:main from msimberg:distributed-single-gpu on Mar 24, 2026

@msimberg (Contributor)

Since the tests are currently serialized anyway, I don't think we benefit from running across the full node. This PR uses CUDA MPS (Multi-Process Service) to run all four ranks on a single GPU. It is based on #819.
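CUDA MPS lets several processes (here, the MPI ranks) submit work to the same GPU concurrently through a daemon-managed MPS server. The contents of `ci/scripts/start-cuda-mps.sh` are not shown on this page, so the following is only a hypothetical sketch of what such a startup script could look like. It assumes the script is sourced (as `ci/distributed.yml` does), that GPU 0 is the device to share, and that the pipe and log directories may live under `$TMPDIR`; the directory names are made up. `CUDA_MPS_PIPE_DIRECTORY`, `CUDA_MPS_LOG_DIRECTORY` and `nvidia-cuda-mps-control -d` are the standard MPS settings and command.

```shell
# Hypothetical sketch, NOT the actual ci/scripts/start-cuda-mps.sh.
# Meant to be sourced so the exported variables reach the test processes.

# Assumption: all ranks share GPU 0.
export CUDA_VISIBLE_DEVICES=0

# Where the MPS control daemon puts its pipes and logs. Client processes
# must see the same CUDA_MPS_PIPE_DIRECTORY to connect to the daemon,
# which is why these are exported rather than set locally.
# The directory names below are illustrative assumptions.
export CUDA_MPS_PIPE_DIRECTORY="${TMPDIR:-/tmp}/nvidia-mps-pipe"
export CUDA_MPS_LOG_DIRECTORY="${TMPDIR:-/tmp}/nvidia-mps-log"
mkdir -p "${CUDA_MPS_PIPE_DIRECTORY}" "${CUDA_MPS_LOG_DIRECTORY}"

# Start the control daemon in background mode (-d) if the tool exists.
# A failed start only prints a warning, so the ranks still run without MPS.
if command -v nvidia-cuda-mps-control >/dev/null 2>&1; then
    nvidia-cuda-mps-control -d || echo "warning: failed to start MPS daemon" >&2
else
    echo "nvidia-cuda-mps-control not found; skipping MPS startup" >&2
fi
```

Once the tests are done, the daemon can be stopped with `echo quit | nvidia-cuda-mps-control`.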

@msimberg force-pushed the distributed-single-gpu branch from 1fa1486 to 3e39f63 on March 23, 2026 19:59
@msimberg (Contributor, Author)

cscs-ci run distributed

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatests for static-field computations), you can use:

  • cscs-ci run extra

For more detailed information, please look at CI in the EXCLAIM universe.

Comment thread ci/distributed.yml
- echo "using virtual environment at ${UV_PROJECT_ENVIRONMENT}"
- source ${UV_PROJECT_ENVIRONMENT}/bin/activate
- echo "running with $(python --version)"
- source ci/scripts/start-cuda-mps.sh
Contributor Author


This is not strictly needed yet, but it will be needed for #1012.

@msimberg requested a review from edopao on March 23, 2026 21:30
@msimberg marked this pull request as ready for review on March 23, 2026 21:30
@edopao (Contributor) left a comment


LGTM

@edopao (Contributor) commented on Mar 24, 2026

cscs-ci run default

@edopao (Contributor) commented on Mar 24, 2026

cscs-ci run distributed

@msimberg merged commit 4a3ecaa into C2SM:main on Mar 24, 2026
54 checks passed
@msimberg deleted the distributed-single-gpu branch on March 24, 2026 07:43

2 participants