
Conversation


edoyango commented on Oct 10, 2025

Description
This adds the code needed to make mpp_do_group_update work with arrays managed by NVIDIA's OpenMP offload runtime. The change aims to be minimally disruptive: the relevant OpenMP directives are wrapped in preprocessor macros, so non-NVIDIA compilers see the same behaviour as before.
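As a rough illustration of the guarding pattern (the macro name MPP_USE_OMP_OFFLOAD and the mapped variable below are hypothetical, not taken from the actual diff), a directive wrapped this way compiles away entirely when the guard is undefined, leaving non-NVIDIA builds untouched:

    #ifdef MPP_USE_OMP_OFFLOAD
    !$omp target enter data map(to: field)   ! make field resident on the device
    #endif
    ! ... existing update code runs unchanged ...
    #ifdef MPP_USE_OMP_OFFLOAD
    !$omp target exit data map(from: field)  ! copy results back to the host
    #endif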

Fixes #1771

How Has This Been Tested?
The OpenMP offload capability is currently tested on the "double gyre" case in MOM6-examples, using the nvfortran compiler and a CUDA-aware Open MPI. We have some notes on how to run the GPU-enabled MOM6, but they are outdated.
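For reference, OpenMP offload with nvfortran is enabled via the -mp=gpu flag (the target architecture and source file below are placeholders, not the exact build used here):

    # compile with OpenMP offload enabled (cc80 = A100-class GPU)
    nvfortran -mp=gpu -gpu=cc80 -O2 -c mpp_update_kernels.F90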

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • New check tests, if applicable, are included
  • make distcheck passes

* add multi gpu support

* address review comments, add helpful comment for the acc/mp runtime call
The original sequential counter:

    do i = is, ie
       pos = pos + 1
       field(i,j,k) = buffer(pos)

and the new closed-form index:

    idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1
Contributor:


How are these two implementations equivalent? Is the new idx always equal to the old pos?

Author:


Yes, they're equivalent. For any iteration, idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1 produces the same value that pos would have had at that point. Here pos holds the buffer offset from before the loop nest, and the remaining terms count all the iterations of the nested loops that precede position (i,j,k).

The reason for the change is that each nested iteration is now independent and can be performed in parallel.

To enable this, the sequential pos updates had to be removed; otherwise segfaults happen on the GPU.
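A sketch of what the parallelised nest looks like under this scheme (the loop bounds, collapse clause, and variable names are illustrative, not the PR's exact code; assume ni = ie-is+1, nj = je-js+1, and that pos holds the buffer offset at entry to the nest):

    !$omp target teams distribute parallel do collapse(3) private(idx)
    do k = 1, nk
       do j = js, je
          do i = is, ie
             ! closed-form index: no iteration depends on any other
             idx = pos + (k-1)*nj*ni + (j-js)*ni + (i-is) + 1
             field(i,j,k) = buffer(idx)
          end do
       end do
    end do

Because each iteration computes its own index, the loop-carried dependence on pos disappears and the three loops can be collapsed and distributed across GPU threads.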


Development

Successfully merging this pull request may close these issues.

Enable OpenMP GPU-to-GPU MPI blocking transfers
