
Sync from NCAR/main (many combined updates) #369

Open
grantfirl wants to merge 59 commits into ufs-community:ufs/dev from grantfirl:NCAR-main-sync-20260401

Conversation

@grantfirl (Collaborator) commented Apr 2, 2026

Description of Changes:

This PR brings development from NCAR/main (several separate PRs) to the ufs/dev branch. It primarily contains development from NRL and is not expected to change RT results.

NCAR/main #1183:
Improve file handling in two places: close the namelist file in lnd_iau_mod_set_control if the land IAU section is not found and INTERNAL_FILE_NML is not used, and check that the file open call in cires_ugwpv0_mod_init is successful; if it is not, the scheme returns with an appropriate error message and error code, following CCPP requirements.
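A minimal sketch of the CCPP-compliant open check described here (only errmsg, errflg, and the error convention come from the text above; the subroutine and variable names are illustrative):

```fortran
! Sketch of a CCPP-style file-open check; names other than errmsg/errflg
! are illustrative, not the actual cires_ugwpv0_mod_init code.
subroutine example_init(input_file, errmsg, errflg)
   character(len=*), intent(in)  :: input_file
   character(len=*), intent(out) :: errmsg
   integer,          intent(out) :: errflg

   integer :: funit, ios

   errmsg = ''
   errflg = 0

   open(newunit=funit, file=trim(input_file), status='old', iostat=ios)
   if (ios /= 0) then
      ! report the failure to the host model instead of stopping
      write(errmsg,'(3a,i0)') 'example_init: could not open ', &
                              trim(input_file), ', iostat = ', ios
      errflg = 1
      return
   end if
   ! ... read the data here ...
   close(funit)
end subroutine example_init
```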

NCAR/main #1188:
Prevent division by zero when applying the free convection adjustment in NSSTM by requiring the heat content in the diurnal thermocline, xt, to be positive. This is consistent with the earlier calculation of the convective adjustment depth d_conv in the convdepth subroutine.
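In sketch form (only xt comes from the text above; the other names are illustrative placeholders, not the actual NSST code):

```fortran
! Apply the free convection adjustment only for positive heat content xt,
! mirroring the guard described above; q_conv and tconv are illustrative.
subroutine apply_fca(xt, q_conv, tconv)
   real, intent(in)  :: xt      ! heat content in diurnal thermocline
   real, intent(in)  :: q_conv  ! illustrative forcing term
   real, intent(out) :: tconv   ! illustrative adjustment

   tconv = 0.0
   if (xt > 0.0) then           ! xt > 0 makes the division below safe
      tconv = q_conv / xt
   end if
end subroutine apply_fca
```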

NCAR/main #1189:

  • Remove the unused variable latr, as well as the unused local variables exists, ios, dxsg, and k, from the GWD schemes.
  • Remove white space, empty lines, and tabs (all done in a separate commit).
  • Remove duplicated intent = out lines from two meta files.

NCAR/main #1199:
Update radlw_main.F90: remove comments at the end of the file for compatibility with LLVM 22.

NCAR/main #1201:
Remove unused variables and improve the scheme description in cnvc90.f (removing a duplicated description).

NCAR/main #1202:

  • Improve readability, inspired by updates in GWD/cires_ugwpv1_oro.F90 (making comparison between v0 and v1 easier).
  • Convert GWD/ugwp_driver_v0.F to GWD/ugwp_driver_v0.F90 and update the related meta files.
  • Remove the constants con_g and con_omega, which are passed in but not used, and remove unused old code.
  • Add errflg and errmsg.

NCAR/main #1187:
This PR modifies how data is read in the CCPP init and timestep_init phases. Instead of reading the data serially on every single MPI task, the data is read by the MPI root rank and then broadcast. This is implemented for all code except the GOCART aerosols (NEPTUNE doesn't use these, so we have no way to test them; the new o3 and h2o code also still needs to be checked).

The implementation takes the path described in NCAR#1106: an MPI broadcast wrapper is added in a new module, mpiutils, which wraps around the (now type-dependent) MPI interfaces in mpi_f08.

The CCPP MPI broadcast routines in this PR use a ccpp_abort function to stop the model in the event of an MPI error. This does not follow CCPP requirements; it was done to avoid having to pass errmsg and errflg all the way down and then back out to the host model to abort. Compliance with the current CCPP rules can be implemented, but it is worth discussing whether alternative methods are preferable and/or simplify the code. To note: the authoritative code in NCAR ccpp-physics in many places simply calls stop to abort the model, which is much worse than using MPI_ABORT and of course also not CCPP compliant. In NEPTUNE, we've used a function equivalent to ccpp_abort in these places.
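A minimal sketch of such a wrapper (the names mpiutils, ccpp_bcast, and ccpp_abort come from the text above; the supported types and argument order shown are assumptions):

```fortran
! Sketch of a type-dispatching broadcast wrapper around the mpi_f08
! interfaces; only two specific procedures are shown for brevity.
module mpiutils
   use mpi_f08, only: MPI_Comm, MPI_Bcast, MPI_Abort, MPI_COMM_WORLD, &
                      MPI_INTEGER, MPI_REAL, MPI_SUCCESS
   implicit none
   private
   public :: ccpp_bcast

   ! generic interface dispatching on the type of the buffer argument
   interface ccpp_bcast
      module procedure bcast_int_0d
      module procedure bcast_real_1d
   end interface ccpp_bcast

contains

   subroutine bcast_int_0d(val, root, comm)
      integer,        intent(inout) :: val
      integer,        intent(in)    :: root
      type(MPI_Comm), intent(in)    :: comm
      integer :: ierr
      call MPI_Bcast(val, 1, MPI_INTEGER, root, comm, ierr)
      if (ierr /= MPI_SUCCESS) call ccpp_abort('ccpp_bcast: MPI_Bcast failed')
   end subroutine bcast_int_0d

   subroutine bcast_real_1d(arr, root, comm)
      real,           intent(inout) :: arr(:)
      integer,        intent(in)    :: root
      type(MPI_Comm), intent(in)    :: comm
      integer :: ierr
      call MPI_Bcast(arr, size(arr), MPI_REAL, root, comm, ierr)
      if (ierr /= MPI_SUCCESS) call ccpp_abort('ccpp_bcast: MPI_Bcast failed')
   end subroutine bcast_real_1d

   subroutine ccpp_abort(msg)
      ! aborts on MPI errors; as noted above, this bypasses the usual
      ! errmsg/errflg plumbing and is therefore not CCPP compliant
      character(len=*), intent(in) :: msg
      integer :: ierr
      write(*,'(a)') trim(msg)
      call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
   end subroutine ccpp_abort

end module mpiutils
```

A scheme then makes the same call form, e.g. call ccpp_bcast(nlev, mpiroot, mpicomm), for any supported buffer type after the root rank has read the data.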

NCAR/main #1197:
Add the ability to build with the ip library if it is found. The sp library is being replaced by ip, so this is required. Note that in spack-stack the ip package builds with the OpenMP flag, so OpenMP gets turned on in the CMake build. This means that CMAKE_Fortran_FLAGS_OPENMP_OFF needs to be set by the host model, since the RRTMGP files currently break if compiled with OpenMP flags; this variable makes sure OpenMP isn't used for the RRTMGP files.

NCAR/main #1205:
Similar changes to NCAR/main #1187, but targeted at NOAA model-specific interstitials; also removes some problematic ccpp_bcast calls in aerinterp.F90.

Also point the MYNN-SFC submodule to its ccpp/dev branch.

Tests Conducted:

SCM RTs, UFS RTs, and NEPTUNE testing for the individual PRs along the way using NCAR/main

Dependencies:

None

Documentation:

N/A

Issue (optional):

Fixes #372

Contributors (optional):

@matusmartini @climbfuji @scrasmussen

hertneky and others added 30 commits November 24, 2025 18:42
This PR changes the option relative_path in the CCPP metadata to dependencies_path as discussed in NCAR/ccpp-framework#685.
…, return with a meaningful error message and flag
@AnningCheng-NOAA (Collaborator) left a comment


fine with me

@grantfirl (Collaborator, Author) commented

@climbfuji It looks like most/all of the nested UFS RTs are still hanging when using these code changes. Can you think of any reason why only the nested tests would hang?

I traced the point where the model is stopping back to UFSATM/fv3/module_fcst_grid_comp.F90/fcst_initialize here: https://github.com/NOAA-EMC/ufsatm/blob/0f4fe59702e81be34c5b6bce5ab31b2e6004d804/fv3/module_fcst_grid_comp.F90#L937

This happens when n=2 in the loop. It seems like ESMF is basically setting an error and returning, but the model doesn't actually stop. The run directory on Ursa is: /scratch3/BMC/gmtb/Grant.Firl/stmp2/Grant.Firl/FV3_RT/rt_884137/hafs_regional_1nest_atm_intel

I further looked into the ESMF log file for this process (000) and found an error stack, but this is as far as I went, since this is well outside of the realm of physics.

20260427 171000.089 ERROR PET000 ESMCI_DistGrid.C:1441 ESMCI::DistGrid::create() Invalid argument - deCount must match between provided DELayout and provided regDecomp
20260427 171000.089 ERROR PET000 ESMCI_DistGrid_F.C:147 c_esmc_distgridcreaterd() Invalid argument - Internal subroutine call returned Error
20260427 171000.090 ERROR PET000 ESMF_DistGrid.F90:1242 ESMF_DistGridCreateRD() Invalid argument - Internal subroutine call returned Error
20260427 171000.090 ERROR PET000 ESMF_Grid.F90:29822 ESMF_GridCreateDistgridReg Invalid argument - Internal subroutine call returned Error
20260427 171000.091 ERROR PET000 ESMF_Grid.F90:10892 ESMF_GridCreateNoPeriDimR Invalid argument - Internal subroutine call returned Error
20260427 171000.091 ERROR PET000 module_fcst_grid_comp.F90:261 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 module_fcst_grid_comp.F90:944 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 ufsatm_cap.F90:675 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 ATM:src/addon/NUOPC/src/NUOPC_ModelBase.F90:692 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2918 Invalid argument - Phase 'IPDvXp01' Initialize for modelComp 1: ATM did not return ESMF_SUCCESS
20260427 171000.091 ERROR PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1365 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:486 Invalid argument - Passing error in return code
20260427 171000.091 ERROR PET000 UFS.F90:397 Invalid argument - Aborting UFS

@grantfirl (Collaborator, Author) commented Apr 27, 2026

@climbfuji There does appear to be some MPI broadcasting happening immediately above where the model stops. Would there be some kind of broadcasting conflict between what is going on in the physics vs what is happening in module_fcst_grid_comp.F90?

@climbfuji commented

@climbfuji There does appear to be some MPI broadcasting happening immediately above where the model stops. Would there be some kind of broadcasting conflict between what is going on in the physics vs what is happening in module_fcst_grid_comp.F90?

This rings a bell. @dustinswales had to add an ugly workaround to the RRTMGP (?) code to pass in another MPI communicator (long before the mpi_bcast changes were made). I never had the chance to look at this, but my guess is that the MPI communicator passed in is not the correct one. In a situation with nests, my guess is that you have multiple MPI communicators (similar to a coupled model).

@grantfirl (Collaborator, Author) commented

@climbfuji There does appear to be some MPI broadcasting happening immediately above where the model stops. Would there be some kind of broadcasting conflict between what is going on in the physics vs what is happening in module_fcst_grid_comp.F90?

This rings a bell. @dustinswales had to add an ugly workaround to the RRTMGP (?) code to pass in another MPI communicator (long before the mpi_bcast changes were made). I never had the chance to look at this, but my guess is that the MPI communicator passed in is not the correct one. In a situation with nests, my guess is that you have multiple MPI communicators (similar to a coupled model).

Ya, this came to mind for me as well (with respect to the RRTMGP/nested thing). I too was wondering if they had the same root problem.

@dustinswales (Collaborator) commented

For GP to work with nested configs, I needed to add initialization flags to any module that was reading data and broadcasting. Otherwise, each nested domain would try to read the data, which was a no-no.
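In sketch form (all names here are assumptions, not the actual RRTMGP code), the pattern looks like:

```fortran
! Illustrative module-level initialization guard of the kind described
! above; module and routine names are hypothetical.
module gp_lut_load
   implicit none
   private
   public :: load_luts_once
   logical, save :: initialized = .false.
contains
   subroutine load_luts_once(mpirank, mpiroot)
      integer, intent(in) :: mpirank, mpiroot
      if (initialized) return          ! later (nested) domains skip the read
      if (mpirank == mpiroot) then
         ! ... root rank reads the lookup tables from file ...
      end if
      ! ... broadcast the tables to all ranks ...
      initialized = .true.
   end subroutine load_luts_once
end module gp_lut_load
```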

@grantfirl (Collaborator, Author) commented

@dustinswales The initialization flags for reading/broadcasting were reverted, though, and the workaround of setting the local RRTMGP mpiroot = 0 was reinstated upon merge. Issue #352 was added to fix this in the future. Apparently, the future is now! It definitely seems like a clue that setting mpiroot = 0 before broadcasting in the physics (at least in RRTMGP) "works", albeit perhaps for the wrong reason.

It seems like there is a conflict in the MPI root PE used for broadcasting somewhere in the UFS/FV3 code vs. the physics, and only when nesting is active. I'm out of my depth here, but a quick search brings up the following:

Mismatched Root Arguments: Every process in the communicator must pass the same integer value for the root argument. If Rank 0 calls MPI_Bcast(..., root=0, ...) while Rank 1 calls MPI_Bcast(..., root=1, ...), the MPI environment may hang because Rank 1 is waiting to send data while Rank 0 is expecting to send data, with no one acting as the receiver for either.

In Fortran, an MPI_Bcast root conflict occurs when the processes involved in the collective call do not agree on which process is the source (root) of the data. This is a common logical error that leads to undefined behavior, such as hanging (deadlocks), garbage data, or program crashes.
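A minimal self-contained illustration of that failure mode (not UFS code):

```fortran
program bcast_root_mismatch
   ! Illustration only: every rank must pass the same root to MPI_Bcast.
   ! Here rank 0 uses root=0 while the others use root=1, so two ranks
   ! believe they are the sender and the collective never completes.
   use mpi_f08
   implicit none
   integer :: rank, val, root, ierr

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   val  = rank
   root = merge(0, 1, rank == 0)   ! BUG: rank-dependent root
   call MPI_Bcast(val, 1, MPI_INTEGER, root, MPI_COMM_WORLD, ierr)
   call MPI_Finalize(ierr)
end program bcast_root_mismatch
```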

So, it seems like the correct way to fix this is to find all mpi_bcast calls in the UFS and make sure that they're all using the correct MPI communicator and MPI root PE?

I'm doing a quick test where I'm running one of the HAFS nested RTs after switching the immediately preceding MPI_Bcast call in module_fcst_grid_comp to use mpp_root_pe(), which is what is passed in to the physics as "mpi_root". Of course, if this works, RRTMGP would need to go back to using mpi_root instead of 0.

@grantfirl (Collaborator, Author) commented

I'm doing a quick test where I'm running one of the HAFS nested RTs after switching the immediately preceding MPI_Bcast call in module_fcst_grid_comp to use mpp_root_pe(), which is what is passed in to the physics as "mpi_root". Of course, if this works, RRTMGP would need to go back to using mpi_root instead of 0.

No joy from this. Another test that I could try is to pass in 0 as the mpi_root for all ccpp_bcast calls and see if that "hack" works. Not that I want to keep that in; it's just a data point to help debug this.

@dustinswales (Collaborator) commented

@dustinswales The initialization flags for reading/broadcasting were reverted, though, and the workaround of setting the local RRTMGP mpiroot = 0 was reinstated upon merge. Issue #352 was added to fix this in the future. Apparently, the future is now! It definitely seems like a clue that setting mpiroot = 0 before broadcasting in the physics (at least in RRTMGP) "works", albeit perhaps for the wrong reason.

It seems like there is a conflict in the MPI root PE used for broadcasting somewhere in the UFS/FV3 code vs physics, only when nesting is active? I'm out of my depth here, but a quick search brings up the following:

Mismatched Root Arguments: Every process in the communicator must pass the same integer value for the root argument. If Rank 0 calls MPI_Bcast(..., root=0, ...) while Rank 1 calls MPI_Bcast(..., root=1, ...), the MPI environment may hang because Rank 1 is waiting to send data while Rank 0 is expecting to send data, with no one acting as the receiver for either.
In Fortran, an MPI_Bcast root conflict occurs when the processes involved in the collective call do not agree on which process is the source (root) of the data. This is a common logical error that leads to undefined behavior, such as hanging (deadlocks), garbage data, or program crashes.

So, it seems like the correct way to fix this is to find all mpi_bcast calls in the UFS and make sure that they're all using the correct MPI communicator and MPI root PE?

I'm doing a quick test where I'm running one of the HAFS nested RTs after switching the immediately preceding MPI_Bcast call in module_fcst_grid_comp to use mpp_root_pe(), which is what is passed in to the physics as "mpi_root". Of course, if this works, RRTMGP would need to go back to using mpi_root instead of 0.

@grantfirl My apologies. I forgot they reverted the init flag solution for GP. Snap.
I feel like you are on the right track by tracking down inconsistencies with the MPI communicator. It sure seems like a host configuration problem and not a physics issue.

@grantfirl (Collaborator, Author) commented

@grantfirl My apologies. I forgot they reverted the init flag solution for GP. Snap. I feel like you are on the right track by tracking down inconsistencies with the MPI communicator. It sure seems like a host configuration problem and not a physics issue.

Ya, it sure seems like some kind of inconsistency in the MPI root used for broadcasting. I'm asking Dusan for help to see if he remembers what might be different about the nested situation that could cause this, pointing toward a fix. It sure seems like the physics code is OK: it's trying to use the MPI communicator and root that it is given, but that conflicts with the host.



Development

Successfully merging this pull request may close these issues.

MYNN submodule pointing to Grant's branch
