Update Forerunner I job script #503
Open
Issue
As reported by a user, the old job script no longer works on Forerunner-I.
We previously assumed that the NUMA layout is the same on all Forerunner-I nodes, so that the job launches 16 MPI ranks per node with 7 OpenMP threads per rank (112 CPUs per node in total).
This still holds for some nodes. On other nodes, however, the job requests only 4 MPI ranks with 28 OpenMP threads per rank. Worse, the CPU core IDs listed in `Record__Note` for the same rank are repeated 4 times, so only 28 CPUs per node are actually used. This hurts performance because not all of the node's resources are used and the run is not fully parallelized.
Change
As @koarakawaii suggested to me a while ago, we can use `-map-by ppr:16:node:pe=7` instead to avoid this problem.
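
For illustration only, here is a minimal sketch of how a launch line using this mapping could be assembled. It assumes an Open MPI style `mpirun`; the node count, the executable name `./my_app`, and all variable names are hypothetical placeholders, not taken from the actual job script. The script only builds and prints the command, so it can be inspected before use.

```shell
#!/bin/bash
# Sketch under stated assumptions: the values below mirror the layout described
# in this PR (16 ranks x 7 threads = 112 CPUs per node).
RANKS_PER_NODE=16      # MPI ranks per node
THREADS_PER_RANK=7     # OpenMP threads (and cores) per rank
CPUS_PER_NODE=112      # CPUs per node as stated above
NODES=2                # hypothetical node count

# Sanity check: the requested layout must fill the node exactly.
if [ $((RANKS_PER_NODE * THREADS_PER_RANK)) -ne "$CPUS_PER_NODE" ]; then
    echo "layout does not match ${CPUS_PER_NODE} CPUs per node" >&2
    exit 1
fi

# One OpenMP thread per core bound to each rank.
export OMP_NUM_THREADS=$THREADS_PER_RANK

# ppr:16:node places 16 ranks on every node; pe=7 binds 7 cores to each rank.
MAP_BY="ppr:${RANKS_PER_NODE}:node:pe=${THREADS_PER_RANK}"

# ./my_app is a hypothetical executable name.
LAUNCH_CMD="mpirun -np $((NODES * RANKS_PER_NODE)) -map-by ${MAP_BY} ./my_app"

# Print the command instead of running it, so it can be reviewed first.
echo "$LAUNCH_CMD"
```

Requesting the per-node rank count and the cores per rank explicitly means the layout no longer depends on how many NUMA nodes each machine reports.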