
Conversation


edoyango commented on Oct 10, 2025

Description
This adds the code needed to make mpp_do_group_update work with arrays managed by NVIDIA's OpenMP offload runtime. The change aims to be minimally disruptive: the relevant OpenMP directives are wrapped in preprocessor macros, so non-NVIDIA compilers see the same behaviour as before.
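As a rough illustration of the guarding pattern (the macro name MPP_USE_OMP_OFFLOAD and the mapped variable below are hypothetical, not taken from the actual diff), a directive wrapped this way compiles away entirely when the guard is undefined, leaving non-NVIDIA builds untouched:

    #ifdef MPP_USE_OMP_OFFLOAD
    !$omp target enter data map(to: field)   ! make field resident on the device
    #endif
    ! ... existing update code runs unchanged ...
    #ifdef MPP_USE_OMP_OFFLOAD
    !$omp target exit data map(from: field)  ! copy results back to the host
    #endif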

Fixes #1771

How Has This Been Tested?
The OpenMP offload capability is currently tested on the "double gyre" case in MOM6-examples, using the nvfortran compiler and a CUDA-aware Open MPI. We have some notes on how to run the GPU-enabled MOM6, but they are outdated.
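For reference, OpenMP offload with nvfortran is enabled via the -mp=gpu flag (the target architecture and source file below are placeholders, not the exact build used here):

    # compile with OpenMP offload enabled (cc80 = A100-class GPU)
    nvfortran -mp=gpu -gpu=cc80 -O2 -c mpp_update_kernels.F90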

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

* add multi gpu support

* address review comments, add helpful comment for the acc/mp runtime call
The original sequential counter:

    do i = is, ie
       pos = pos + 1
       field(i,j,k) = buffer(pos)

and the new closed-form index:

    idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1
Contributor:


How are these two implementations equivalent? Is the new idx always equal to the old pos?

Author:


Yes, they're equivalent. For any iteration, idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1 produces the same value that pos would have had at that point. Here pos holds the buffer offset from before the loop nest, and the remaining terms count all the iterations of the nested loops that precede position (i,j,k).

The reason for the change is that each nested iteration is now independent and can be performed in parallel.

To enable this, the sequential pos updates had to be removed; otherwise segfaults happen on the GPU.
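A sketch of what the parallelised nest looks like under this scheme (the loop bounds, collapse clause, and variable names are illustrative, not the PR's exact code; assume ni = ie-is+1, nj = je-js+1, and that pos holds the buffer offset at entry to the nest):

    !$omp target teams distribute parallel do collapse(3) private(idx)
    do k = 1, nk
       do j = js, je
          do i = is, ie
             ! closed-form index: no iteration depends on any other
             idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1
             field(i,j,k) = buffer(idx)
          end do
       end do
    end do

Because each iteration computes its own index, the loop-carried dependence on pos disappears and the three loops can be collapsed and distributed across GPU threads.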


Development

Successfully merging this pull request may close these issues.

Enable OpenMP GPU-to-GPU MPI blocking transfers
