
Sync NCAR/main branch of CCPP physics with ufs/dev branch #3190

Open
grantfirl wants to merge 11 commits into ufs-community:develop from grantfirl:NCAR-main-sync-20260401

Conversation

@grantfirl
Collaborator

Commit Queue Requirements:

  • This PR addresses a relevant WM issue (if not, create an issue).
  • All subcomponent pull requests (if any) have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines), preferably on Ursa (Derecho or Hercules are acceptable alternatives). Exceptions: documentation-only PRs, CI-only PRs, etc.
    • Commit log file w/full results from RT suite run (if applicable).
    • Verify that test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
  • Fill out all sections of this template.

Description:

This is primarily just a CCPP physics update, although it contains some host-side metadata changes to work with the ccpp-physics changes. See ufs-community/ccpp-physics#369 for a description of the CCPP-physics changes.

Commit Message:

* UFSWM - Sync NCAR/main branch of CCPP physics with ufs/dev branch 
  * UFSATM - Sync NCAR/main branch of CCPP physics with ufs/dev branch
    * ccpp-physics - Sync NCAR/main branch of CCPP physics with ufs/dev branch

Priority:

  • Critical Bugfix: Reason
  • High: Reason
  • Normal

Git Tracking

UFSWM:

  • N/A

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • Blocked by #
  • None

Documentation:

  • Documentation update required.
    • Relevant updates are included with this PR.
    • A WM issue has been opened to track the need for a documentation update; a person responsible for submitting the update has been assigned to the issue (link issue).
  • Documentation update NOT required.
    • Explanation: This just updates CCPP physics and doesn't change any results. It's possible that documentation changes should be done for the CCPP technical documentation related to the new ccpp_bcast capability.

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.
  • PR adds input data.
  • PR changes existing input data.

Library Changes/Upgrades:

  • Required
    • Library names w/versions:
    • Git Stack Issue (JCSDA/spack-stack#)
  • No Updates
    This does allow CCPP to use the IP library rather than the SP library, which is a prerequisite for using spack-stack 2.0+.

Testing Log:

  • RDHPCS
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@grantfirl
Collaborator Author

There are several time-outs in my Ursa RT run. I think that this may be related to our resources being overused and fairshare reflecting that, throttling our runs.

@dpsarmie
Collaborator

I'll rerun the tests on Ursa as a secondary check.

Also, a new GNU warning was introduced in this PR:

/scratch4/NCEPDEV/stmp/Daniel.Sarmiento/grantfirl/UFSATM/ccpp/physics/physics/tools/mpiutil.F90:239:0:

  239 | #else if defined(__GFORTRAN__)
      |
Warning: extra tokens at end of #else directive

Can we get that fixed as part of this PR? It should just be a change from `#else if` to `#elif`.

@grantfirl
Collaborator Author

> I'll rerun the tests on Ursa as a secondary check.
>
> Also, a new GNU warning was introduced in this PR:
>
> /scratch4/NCEPDEV/stmp/Daniel.Sarmiento/grantfirl/UFSATM/ccpp/physics/physics/tools/mpiutil.F90:239:0:
>
>   239 | #else if defined(__GFORTRAN__)
>       |
> Warning: extra tokens at end of #else directive
>
> Can we get that fixed as part of this PR? Should just be an else if --> elif

Sure, I'll make that change and update the PR.

@dpsarmie
Collaborator

It looks like the timeouts on the HAFS tests are real, unfortunately. I repeated one of the HAFS tests to make sure and it does seem to hang early on in the run. Every other test passed.

My first guess is that something in the CCPP updates is conflicting with the nests. This will need to be resolved before bringing in these changes.

@grantfirl
Collaborator Author

> It looks like the timeouts on the HAFS tests are real, unfortunately. I repeated one of the HAFS tests to make sure and it does seem to hang early on in the run. Every other test passed.
>
> My first guess is that something in the CCPP updates is conflicting with the nests. This will need to be resolved before bringing in these changes.

@dpsarmie Thanks very much for rerunning those. I'll see if I can debug the HAFS tests that are timing out. I'm guessing that it's another issue with the new ccpp_bcast functionality, perhaps triggered by the nests.

gspetro-NOAA added the No Baseline Change, UFSATM (There are changes to the UFSATM repository.), and CCPP (There are changes to a CCPP repository.) labels on Apr 14, 2026
gspetro-NOAA moved this from Evaluating to Waiting for Reviews (subcomponent) in PRs to Process on Apr 14, 2026
@gspetro-NOAA
Collaborator

@grantfirl Just wanted to confirm that this PR is ready for review? If so, I'll request UFSATM reviews.

@grantfirl
Collaborator Author

> @grantfirl Just wanted to confirm that this PR is ready for review? If so, I'll request UFSATM reviews.

No, there are still problems with the nested tests. I'll probably need help debugging these. I'm going to ask Dom, who made most of the changes related to mpi_bcast, to help.

@grantfirl
Collaborator Author

@jkbk2004 @gspetro-NOAA Do you know of anyone who could help debug this nesting problem? I described what I'm seeing for @climbfuji here: ufs-community/ccpp-physics#369 (comment)

@gspetro-NOAA
Collaborator

> @jkbk2004 @gspetro-NOAA Do you know of anyone who could help debug this nesting problem? I described what I'm seeing for @climbfuji here: ufs-community/ccpp-physics#369 (comment)

Perhaps @DusanJovic-NOAA would have some idea since it's happening in UFSATM? That said, @NickSzapiro-NOAA suggested that you could potentially revert some of the CCPP changes in your PR (like NCAR/ccpp-physics#1187) to isolate where the failures might be coming from.

@grantfirl
Collaborator Author

@DusanJovic-NOAA Do you know/remember what happens with the MPI root PE used in a nested situation? It seems as though there is a conflict with the MPI root PE when nesting is turned on. The MPI_bcast calls as part of this PR are trying to use the mpi_root value given to physics (which is set from FV3's mpp_root_pe() function) rather than a hard-coded 0 value, which is used in various places throughout the UFSATM code. I'm wondering if this is causing the conflict/hanging behavior noticed in this PR?

Any insights are appreciated!

@DusanJovic-NOAA
Collaborator

> @DusanJovic-NOAA Do you know/remember what happens with the MPI root PE used in a nested situation? It seems as though there is a conflict with the MPI root PE when nesting is turned on. The MPI_bcast calls as part of this PR are trying to use the mpi_root value given to physics (which is set from FV3's mpp_root_pe() function) rather than a hard-coded 0 value, which is used in various places throughout the UFSATM code. I'm wondering if this is causing the conflict/hanging behavior noticed in this PR?
>
> Any insights are appreciated!

Root PE on the nested domain is the rank of the first mpi task that runs the nest. For example, if a parent domain runs on 80 tasks, and nest runs on 60 tasks:

first 80 tasks, ranks 0,..,79 will have root PE = 0, and
next 60 tasks, ranks 80,...,139 will have root PE = 80.
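The per-domain root-PE rule above can be sketched as a small helper (illustrative only, not UFS code; `root_pe` and `domain_sizes` are names invented for this demo):

```python
# Each domain's root PE is the first global rank assigned to that
# domain, following the 80-task parent / 60-task nest example above.

def root_pe(rank, domain_sizes):
    """Return the root PE for `rank`, where domains occupy consecutive
    blocks of global ranks and each domain's root is its first rank."""
    start = 0
    for size in domain_sizes:
        if rank < start + size:
            return start
        start += size
    raise ValueError("rank outside all domains")

# Parent runs on ranks 0..79, nest on ranks 80..139.
assert root_pe(0, [80, 60]) == 0
assert root_pe(79, [80, 60]) == 0
assert root_pe(80, [80, 60]) == 80
assert root_pe(139, [80, 60]) == 80
```

This is exactly why a single hard-coded root of 0 and the per-domain mpp_root_pe() value disagree on the nest's tasks.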

@DusanJovic-NOAA
Collaborator

I think I see where the problem is. In atmos_model.F90 we currently have:

!--- setup Init_parm
   Init_parm%me              =  mpp_pe()
   Init_parm%master          =  mpp_root_pe()
   Init_parm%fcst_mpi_comm   =  fcst_mpi_comm
   Init_parm%fcst_ntasks     =  fcst_ntasks
   ....

fcst_mpi_comm is the communicator which includes all tasks for all domains, the entire 'fcst grid component'. It is initialized from fcst's ESMF VM. If this communicator is actually used in ccpp-physics for collective calls, then all tasks need the same value for the 'root' task. But in this case, mpp_root_pe() is different for each domain. I think this is because it's an FMS routine used by fv3, and in fv3 they have different ways of identifying which domain (grid) a given task belongs to.

I think the solution is either:

  1. Set Init_parm%master to 0 so that all tasks in fcst_mpi_comm have the same 'root'
    or:
  2. Split the communicator into two (or more) communicators, one for each domain (grid), and adjust me and master to be relative to those new communicators.

I'm not sure about the rules for how ccpp treats multiple domains, or if it lacks a concept of multiple domains and treats all point columns uniformly, just as a list/array of columns.

This hasn't been an issue so far, probably because ccpp-physics hardcodes '0' for the root of MPI collective calls. Basically solution 1) reverts to that behavior.
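Option 2 above can be sketched without an MPI runtime (a hypothetical simulation of MPI_Comm_split semantics; with real MPI this would be `comm.Split(color=domain, key=rank)` in mpi4py, and `split_by_domain`/`domain_of` are names invented for this demo):

```python
# Group global ranks into per-domain "communicators"; within each,
# `me` becomes the rank relative to that communicator and `master`
# is 0 for every task, so collective roots agree within a domain.

def split_by_domain(global_ranks, domain_of):
    comms = {}
    for r in sorted(global_ranks):
        comms.setdefault(domain_of(r), []).append(r)
    return {r: {"domain": d, "me": members.index(r), "master": 0}
            for d, members in comms.items() for r in members}

# 80-task parent (domain 0) + 60-task nest (domain 1), as in the example.
info = split_by_domain(range(140), lambda r: 0 if r < 80 else 1)
assert info[79] == {"domain": 0, "me": 79, "master": 0}
assert info[80] == {"domain": 1, "me": 0, "master": 0}
```

Note how, unlike option 1, every task sees master = 0 *relative to its own communicator*, so per-domain collectives (averages, minima, maxima) would also work.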

@climbfuji
Collaborator

@DusanJovic-NOAA I didn't follow this closely, but regarding the CCPP question: This depends on how the multiple domains are implemented. Is there a separate "copy" (instance) of CCPP for the global domain (the six tiles) and the nested domain (one tile)? Or are the grid points of the nested domain simply appended to those of the global domain?

@DusanJovic-NOAA
Collaborator

@climbfuji I'm not sure I fully understand what you mean by 'a separate "copy" (instance) of CCPP for the global domain (the six tiles) and the nested domain (one tile)'. If I understand correctly how this works, CCPP runs over a list of columns on each MPI task, and each task runs its own copy/instance of CCPP. Does CCPP 'know' which (geographical) domain that list of points belongs to? I don't think it does; if it does 'know,' how does it know? The notion of domains, and the potential need for CCPP to 'know' this, is relevant if, for example, collective calls should only communicate with other (MPI) tasks belonging to the same domain, e.g. to find a per-domain average, minimum, or maximum. I don't know if this is or ever will be needed in CCPP, but in that case domain communicators would be required, I think.

@grantfirl
Collaborator Author

@DusanJovic-NOAA @climbfuji I also had the idea of setting Init_parm%master to 0 as a test, and I can confirm that it does indeed work (model no longer hangs). I'm just not sure that it is the preferred solution. Dom, if a different host model handles MPI communicators and roots differently, as long as all ccpp_bcast calls (and any remaining straight mpi_bcast calls) are set up to use whatever communicator/root that they're given, instead of hard-coded, is that OK from your perspective?

@climbfuji
Collaborator

> @DusanJovic-NOAA @climbfuji I also had the idea of setting Init_parm%master to 0 as a test, and I can confirm that it does indeed work (model no longer hangs). I'm just not sure that it is the preferred solution. Dom, if a different host model handles MPI communicators and roots differently, as long as all ccpp_bcast calls (and any remaining straight mpi_bcast calls) are set up to use whatever communicator/root that they're given, instead of hard-coded, is that OK from your perspective?

Yes, absolutely. I am pretty sure that's how it works in NEPTUNE. We have a specific MPI communicator with a specific MPI root and CCPP should (and hopefully does) use those exclusively. Never MPI_COMM_WORLD or 0 by default.

@grantfirl
Collaborator Author

@gspetro-NOAA I think that this should be ready to re-enter the UFS merge queue again. I think that the nested hanging problem has been fixed. I've re-run Ursa RTs and everything is passing with the exception of control_csawmg_gnu with a time-out and control_p8_ugwpv1_tempo_aerosol_hail_intel with memory issues, both of which I'm hoping are transitory and not related to this PR. I'll upload the new log from Ursa when it finishes up. It's stuck on the last test in the queue since my project's fairshare is so low right now.

That said, I'd like to reintroduce 2 commits from @dustinswales related to the RRTMGP scheme that were reverted back in ufs-community/ccpp-physics#332 and related to ufs-community/ccpp-physics#352. I think that these were reverted due to the same issue that we ran into in this PR with the model hanging when using nests. These 2 commits should now work with HAFS with RRTMGP, although I haven't tested it yet (also due to the low fairshare). I'm wondering if @BinLiu-NOAA might be willing to test RRTMGP with HAFS if I cherry-pick those 2 commits and put them into ufs-community/ccpp-physics#369?

@climbfuji
Collaborator

> @gspetro-NOAA I think that this should be ready to re-enter the UFS merge queue again. I think that the nested hanging problem has been fixed. I've re-run Ursa RTs and everything is passing with the exception of control_csawmg_gnu with a time-out and control_p8_ugwpv1_tempo_aerosol_hail_intel with memory issues, both of which I'm hoping are transitory and not related to this PR. I'll upload the new log from Ursa when it finishes up. It's stuck on the last test in the queue since my project's fairshare is so low right now.
>
> That said, I'd like to reintroduce 2 commits from @dustinswales related to the RRTMGP scheme that were reverted back in ufs-community/ccpp-physics#332 and related to ufs-community/ccpp-physics#352. I think that these were reverted due to the same issue that we ran into in this PR with the model hanging when using nests. These 2 commits should now work with HAFS with RRTMGP, although I haven't tested it yet (also due to the low fairshare). I'm wondering if @BinLiu-NOAA might be willing to test RRTMGP with HAFS if I cherry-pick those 2 commits and put them into ufs-community/ccpp-physics#369?

Out of curiosity, how did you fix the MPI communicator problem?

@grantfirl
Collaborator Author

> Out of curiosity, how did you fix the MPI communicator problem?

Setting Init_parm%master to 0 (i.e., the "mpi_root" known by the CCPP).

@climbfuji
Collaborator

> > Out of curiosity, how did you fix the MPI communicator problem?
>
> setting Init_parm%master to 0 (AKA, "mpi_root" known by the CCPP).

Ok, thanks.

@gspetro-NOAA
Collaborator

gspetro-NOAA commented Apr 30, 2026

@grantfirl Could we add NCAR/MYNN-SFC#27 as a dependency for your CCPP PR? I just filed ufs-community/ccpp-physics#372, but basically, ufs/dev is still pointing at your feature/tendency_cleanup MYNN branch because we didn't see the PR listed at the WM level in #2810 to have you revert gitmodules, update the hash, and merge that PR.

@grantfirl
Collaborator Author

ufs-community/ccpp-physics#332

Yes, I was planning on doing that since this was already pointed out to me by someone else.

@grantfirl
Collaborator Author

> @grantfirl Could we add NCAR/MYNN-SFC#27 as a dependency for your CCPP PR? I just filed ufs-community/ccpp-physics#372, but basically, ufs/dev is still pointing at your feature/tendency_cleanup MYNN branch because we didn't see the PR listed at the WM level in #2810 to have you revert gitmodules, update the hash, and merge that PR.

@gspetro-NOAA The developers of MYNN-SFC went ahead and merged in that PR, since it was forgotten during the last round. I made sure that ccpp-physics is now pointing to the correct commit hash of the correct fork/branch of MYNN-SFC, so I'm not adding the PR as a dependency. I made a note that I did this in the ccpp-physics PR and referenced the issue that you created.

@grantfirl
Collaborator Author

grantfirl commented May 1, 2026

Note: The Ursa pre-test log that was uploaded had compilation failures for MPAS. I forgot to update the MPAS-Model submodule before testing, so I just did this and am re-running the MPAS-specific tests only. I'll upload that log to comments when finished.

@grantfirl
Collaborator Author

@BinLiu-NOAA This PR most likely fixes ufs-community/ccpp-physics#352 too due to the mpi_root being passed to RRTMGP being consistent with the rest of UFS now, even for nesting configurations. There isn't a HAFS nested test with RRTMGP to test this, but if you wanted to try, it would be appreciated, and we could perhaps close out the issue.

@grantfirl
Collaborator Author

MPAS RT logs:
RegressionTests_ursa.log

@gspetro-NOAA
Collaborator

gspetro-NOAA commented May 5, 2026

@grantfirl @NickSzapiro-NOAA There is an increase of one warning (warning #5194: Source line truncated) for four tests:

  • s2s_32bit_sfs_debug_intel
  • s2sw_32bit_pdlib_debug_intel
  • rrfs_dyn32_phy32_debug_intel
  • rrfs_dyn64_phy32_debug_intel

There is also an increase of 1 remark for several tests, but it appears to be the "ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated" that will go away w/the switch to LLVM.

UPDATE: You can check my develop and 3190 run_dirs here:

  • develop: /work2/noaa/epic/gpetro/hercules/RTs/ufs-wm/stmp/gpetro/FV3_RT/rt_1456523
  • 3190: /work2/noaa/epic/gpetro/hercules/RTs/ufs-wm/stmp/gpetro/FV3_RT/rt_1458202

@NickSzapiro-NOAA
Collaborator

Thanks @gspetro-NOAA! Do you know which line is truncated, to make sure it's noted in the comment? It would be nice to fix all of these `warning #5194: Source line truncated` warnings together; until then, we live with them.

@gspetro-NOAA
Collaborator

gspetro-NOAA commented May 6, 2026

> Thanks @gspetro-NOAA! Do you know the line to make sure it's truncated in the comment? It would be nice to fix all of these warning #5194: Source line truncated together ... until then we live with them

It looks like 2 warnings were added in physics/Radiation/radiation_gases.f (possibly lines 286 & 535?) and 1 warning was removed from physics/Radiation/radiation_surface.f, for a net 1 added. I'm not positive which lines because I was just grepping counts of warnings (and then counts of warnings in specific files), but from the vimdiff of compile_rrfs_dyn32_phy32_debug_intel/err, it looks like the culprits are:

/work2/noaa/epic/gpetro/hercules/RTs/ufs-wm/pretests/3190/UFSATM/ccpp/physics/physics/Radiation/radiation_gases.f(286): warning #5194: Source line truncated.
/work2/noaa/epic/gpetro/hercules/RTs/ufs-wm/pretests/3190/UFSATM/ccpp/physics/physics/Radiation/radiation_gases.f(535): warning #5194: Source line truncated.

Those lines don't look like they're changed in ufs-community/ccpp-physics#369 though... 🤔

@grantfirl
Collaborator Author

> Those lines don't look like they're changed in ufs-community/ccpp-physics#369 though... 🤔

@gspetro-NOAA @NickSzapiro-NOAA I went ahead and cleaned up all lines longer than 72 columns in the radiation_gases.f file, so we should be good to go on the warnings/remarks front.
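The cleanup above targets fixed-form Fortran's 72-column limit, past which code is silently truncated (the cause of ifort warning #5194). A quick check for offending lines might look like this (illustrative sketch; `overlong_lines` and the sample source are invented for the demo, not the actual radiation_gases.f contents):

```python
# Report (1-based line number, length) for fixed-form source lines
# that extend past the given column limit.

def overlong_lines(lines, limit=72):
    return [(i, len(line.rstrip("\n")))
            for i, line in enumerate(lines, start=1)
            if len(line.rstrip("\n")) > limit]

src = [
    "      subroutine gas_update(...)\n",
    "      x = 1.0   " + "!" * 70 + "\n",   # pushes past column 72
]
assert overlong_lines(src) == [(2, len(src[1]) - 1)]
```

Running something along these lines over the `.f` files would confirm that no truncation warnings remain after the cleanup.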

@gspetro-NOAA
Collaborator

> > Those lines don't look like they're changed in ufs-community/ccpp-physics#369 though... 🤔
>
> @gspetro-NOAA @NickSzapiro-NOAA I went ahead and cleaned up all lines longer than 72 columns in the radiation_gases.f file, so we should be good to go on the warnings/remarks front.

Great! I ran a compile, and it has reduced the warnings for those tests. Thank you! We should be good to go on this PR as soon as 3207 is merged.

gspetro-NOAA moved this from Waiting for Reviews (subcomponent) to Review in PRs to Process on May 6, 2026
@gspetro-NOAA
Collaborator

@grantfirl Please sync w/develop and address the minor comment in the UFSATM sub-PR, and we can begin testing for this PR. 🙂
