Update Forerunner I job script #503

hsinhaoHHuang · 2026-01-03T05:56:35Z

Issue

A user reported that the old job script no longer works correctly on Forerunner-I.
We previously assumed that every node on Forerunner-I has the same NUMA layout, so the script would launch 16 MPI ranks per node with 7 OpenMP threads per rank (112 CPUs per node in total).

This still holds for some nodes. On other nodes, however, the script requests only 4 MPI ranks with 28 OpenMP threads per rank. Worse, the CPU core IDs recorded in Record__Note for the same rank are repeated 4 times, so only 28 of the 112 CPUs per node are actually used. Performance suffers because most of each node's resources sit idle and the run is not fully parallelized.

Change

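As @koarakawaii suggested to me a while ago, we can use -map-by ppr:16:node:pe=7 instead to avoid this problem. This option places 16 ranks on each node and binds 7 processing elements to each rank, so the layout no longer depends on how many NUMA domains a node reports.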
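
As a rough sketch of how the node-based mapping fits together, the snippet below derives the mapping string from the two numbers in the description and prints the resulting launch line instead of running it. The variable names and the executable name ./gamer are assumptions for illustration only; the actual job script may set these differently.

```shell
#!/bin/sh
# Values from the PR description: 16 MPI ranks per node, 7 OpenMP threads per rank.
# Variable names are hypothetical, not taken from the actual job script.
RANKS_PER_NODE=16
THREADS_PER_RANK=7

# Total CPUs that should be occupied on each node.
CPUS_PER_NODE=$((RANKS_PER_NODE * THREADS_PER_RANK))

# One OpenMP thread per bound processing element.
export OMP_NUM_THREADS=${THREADS_PER_RANK}

# Map by node rather than by NUMA domain, so the rank count per node
# does not depend on how many NUMA domains a node exposes.
MAP_BY="ppr:${RANKS_PER_NODE}:node:pe=${THREADS_PER_RANK}"

echo "CPUs per node: ${CPUS_PER_NODE}"
# Print the launch line only; ./gamer is an assumed executable name.
echo "mpirun -map-by ${MAP_BY} ./gamer"
```

After switching, checking that each rank in Record__Note lists 7 distinct core IDs is a quick way to confirm that all 112 CPUs per node are in use.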

fix: use node instead of numa in ForerunnerI job script

d74556e
