@technic960183 technic960183 commented Apr 29, 2025

Summary

This pull request adds support for running GAMER on the NERSC Perlmutter supercomputer: a machine configuration file tailored to Perlmutter's environment and two SLURM job submission scripts, one for CPU nodes and one for GPU nodes.

Changes

Configuration file:

  • configs/perlmutter.config: Added a new configuration file tailored for the NERSC Perlmutter system, including paths for CUDA, FFTW3, MPI, and HDF5 libraries, compiler settings, and GPU-specific flags for NVIDIA A100 GPUs.

Job Submission Scripts:

  • example/queue/submit_perlmutter_cpu.job: Added a new SLURM job submission script for running jobs on Perlmutter's CPU nodes.
  • example/queue/submit_perlmutter_gpu.job: Added a new SLURM job submission script for running jobs on Perlmutter's GPU nodes.

Usage

Before doing anything else on Perlmutter, load the GCC module:

module load gcc-native/12.3

Tests

  • Run two Quick Start demos.
  • Submit a GPU node job with 4 nodes for the 3D blast wave.
  • Submit a CPU node job with 4 nodes for the 3D blast wave.
  • Run the regression tests.

For the regression test:
python regression_test.py --machine=perlmutter -e level2

> Test name           : Error code      Reason
> MHD_ABC_MHD         : SUCCESS         
> BlastWave_MHD       : COMPARISON      Fail data comparison.
> Riemann_MHD         : COMPARISON      Fail data comparison.
> BlastWave_Hydro     : COMPARISON      Fail data comparison.
> Riemann_Hydro       : COMPARISON      Fail data comparison.
> AcousticWave_Hydro  : SUCCESS         
> Riemann_SRHD        : SUCCESS         
> Plummer_Gravity     : SUCCESS 

However, the regression test itself has problems running on other machines, so these results are for reference only.
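
The summary above can be tallied with a short script. This is a minimal sketch that assumes the `> name : code reason` layout shown in this comment; `parse_summary` is a hypothetical helper written for illustration and is not part of `regression_test.py`.

```python
# Summary lines copied from the regression test output above.
SUMMARY = """\
> Test name           : Error code      Reason
> MHD_ABC_MHD         : SUCCESS
> BlastWave_MHD       : COMPARISON      Fail data comparison.
> Riemann_MHD         : COMPARISON      Fail data comparison.
> BlastWave_Hydro     : COMPARISON      Fail data comparison.
> Riemann_Hydro       : COMPARISON      Fail data comparison.
> AcousticWave_Hydro  : SUCCESS
> Riemann_SRHD        : SUCCESS
> Plummer_Gravity     : SUCCESS
"""


def parse_summary(text):
    """Return {test_name: error_code} (hypothetical helper, assumed layout)."""
    results = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # drop the leading quote marker
        if not line or line.startswith("Test name"):
            continue  # skip blank lines and the header row
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            results[name.strip()] = fields[0]  # first field is the error code
    return results


results = parse_summary(SUMMARY)
failed = sorted(name for name, code in results.items() if code != "SUCCESS")
print(f"{len(results) - len(failed)} passed, {len(failed)} failed: {', '.join(failed)}")
```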

Performance

3D blast wave

256x256x256+MAX_LEVEL=3:

4 GPU nodes: 682.3 s.
2 GPU nodes: 700.6 s.
1 GPU node: 1029.6 s.
4 CPU nodes: 1551.0 s.
2 CPU nodes: 2585.1 s.
1 CPU node: 4444.3 s.

128x128x128+MAX_LEVEL=3:

4 GPU nodes: 354.8 s.
2 GPU nodes: 265.6 s.
1 GPU node: 329.3 s.
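
To make the scaling easier to read, the timings above can be converted into speedup and parallel efficiency relative to the 1-node run. The following is a rough sketch: the wall-clock times are copied from the lists above, while the `TIMINGS` layout and the `scaling` helper are illustrative names, not part of GAMER.

```python
# Wall-clock times in seconds, copied from the lists above:
# {case: {node type: {number of nodes: time}}}
TIMINGS = {
    "256x256x256": {
        "GPU": {1: 1029.6, 2: 700.6, 4: 682.3},
        "CPU": {1: 4444.3, 2: 2585.1, 4: 1551.0},
    },
    "128x128x128": {
        "GPU": {1: 329.3, 2: 265.6, 4: 354.8},
    },
}


def scaling(times):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run."""
    base = times[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(times.items())}


for case, node_types in TIMINGS.items():
    for node_type, times in node_types.items():
        for n, (speedup, eff) in scaling(times).items():
            print(f"{case} {node_type} x{n}: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

With these numbers, the 4-node 128x128x128 GPU run has a speedup below 1, since it took longer than the 1-node run. This suggests that this problem size may be too small to benefit from additional GPU nodes.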

@hyschive hyschive requested a review from ChunYen-Chen May 2, 2025 03:18
@hyschive hyschive added the enhancement and general labels May 2, 2025
@ChunYen-Chen ChunYen-Chen left a comment

@technic960183 Thanks for the contribution. I have left some comments.

@hyschive hyschive merged commit 8e073b3 into gamer-project:main May 3, 2025
@technic960183 technic960183 deleted the machine_perlmutter branch May 7, 2025 06:21