Update gsibec codes to fix issue with NaNs during the second outer loop #533

Merged
ShunLiu-NOAA merged 4 commits into NOAA-EMC:develop from SamuelDegelia-NOAA:feature/gsibec_fixes
Feb 6, 2026

Conversation

@SamuelDegelia-NOAA
Contributor

Description

This PR updates a few of the gsibec workaround codes to prevent NaNs from appearing in the second outer loop. The issue was related to background temperature values becoming unreasonable in certain grids.
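
For illustration only, here is a minimal sketch of how unreasonable background temperatures could be flagged before they propagate into NaNs. It is not the code changed in this PR: the function names, the kelvin bounds, and the fill strategy are all assumptions made for the example.

```python
import math

# Hypothetical plausibility bounds in kelvin (assumed for this sketch,
# not taken from the gsibec sources).
T_MIN_K = 150.0
T_MAX_K = 350.0


def find_bad_temperatures(values, t_min=T_MIN_K, t_max=T_MAX_K):
    """Return indices whose value is NaN, infinite, or outside [t_min, t_max]."""
    return [
        i for i, t in enumerate(values)
        if not math.isfinite(t) or t < t_min or t > t_max
    ]


def fill_bad_temperatures(values, fill_value, **bounds):
    """Return a copy of values with every flagged entry replaced by fill_value."""
    bad = set(find_bad_temperatures(values, **bounds))
    return [fill_value if i in bad else t for i, t in enumerate(values)]


# Made-up sample column: zero, NaN, and a huge fill-like value are flagged.
background = [287.4, 0.0, float("nan"), 1.0e20, 251.9]
print(find_bad_temperatures(background))  # -> [1, 2, 3]
```

A check of this kind only detects the symptom; where the bad values come from would still need to be traced in the background fields themselves.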

Thanks to @Masanori-NOAA for quickly fixing this problem.

This PR also includes a small fix for the srun version on Hera to get the ctests working again.

Issue(s) addressed

None

Dependencies (if applicable)

None

Checklist

  • I have performed a self-review of my own code.
  • I have run rrfs tests before creating the PR (if applicable).
  • Unit tests added/updated (if applicable).

Contributor

@TingLei-NOAA TingLei-NOAA left a comment

@SamuelDegelia-NOAA Thanks for running the tests confirming that these changes work as a workaround. I hope we will have a more systematic solution soon with the fillmissingvalue function.

@rrfsbot
Collaborator

rrfsbot commented Feb 5, 2026

FAILED on hera

started build_and_test on hera at UTC time: Thu Feb 5 19:52:48 UTC 2026
finished at UTC time: Thu Feb 5 20:29:30 UTC 2026

Test project /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533/build/rrfs-test
      Start  6: rrfs_fv3jedi_2024052700_getkf_observer
      Start 15: rrfs_mpasjedi_2024052700_getkf_observer
      Start  1: rrfs_fv3jedi_2024052700_3dvar
      Start  2: rrfs_fv3jedi_2024052700_3denvar
      Start  3: rrfs_fv3jedi_2024052700_3denvar_mgbf
      Start  4: rrfs_fv3jedi_2024052700_hybrid3denvar
      Start  5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf
      Start  8: rrfs_fv3jedi_2024052700_3dvar_conv_surface
 1/18 Test #15: rrfs_mpasjedi_2024052700_getkf_observer .......***Failed   33.41 sec
      Start  9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair
 2/18 Test  #6: rrfs_fv3jedi_2024052700_getkf_observer ........***Failed   36.08 sec
      Start  7: rrfs_fv3jedi_2024052700_getkf_solver
 3/18 Test  #4: rrfs_fv3jedi_2024052700_hybrid3denvar .........***Failed   40.11 sec
      Start 10: rrfs_fv3jedi_2024052700_3dvar_remote
 4/18 Test  #8: rrfs_fv3jedi_2024052700_3dvar_conv_surface ....***Failed   43.03 sec
      Start 11: rrfs_fv3jedi_2024052700_3dvar_satrad
 5/18 Test  #1: rrfs_fv3jedi_2024052700_3dvar .................***Failed   46.00 sec
      Start 12: rrfs_fv3jedi_2024052700_3denvar_refl
 6/18 Test  #9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair ...***Failed   15.58 sec
      Start 13: rrfs_mpasjedi_2024052700_bumploc
 7/18 Test #10: rrfs_fv3jedi_2024052700_3dvar_remote ..........***Failed   12.96 sec
      Start 14: rrfs_mpasjedi_2024052700_3denvar
 8/18 Test #13: rrfs_mpasjedi_2024052700_bumploc ..............***Failed    8.03 sec
      Start 16: rrfs_mpasjedi_2024052700_getkf_solver
 9/18 Test #11: rrfs_fv3jedi_2024052700_3dvar_satrad ..........   Passed   76.45 sec
      Start 17: rrfs_mpasjedi_2024052700_3dvar
10/18 Test #17: rrfs_mpasjedi_2024052700_3dvar ................   Passed   60.58 sec
      Start 18: rrfs_bufr2ioda_msonet
11/18 Test #18: rrfs_bufr2ioda_msonet .........................   Passed   62.00 sec
12/18 Test  #3: rrfs_fv3jedi_2024052700_3denvar_mgbf ..........   Passed  427.09 sec
13/18 Test  #7: rrfs_fv3jedi_2024052700_getkf_solver ..........***Failed  426.02 sec
14/18 Test  #2: rrfs_fv3jedi_2024052700_3denvar ...............   Passed  542.42 sec
15/18 Test  #5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf ....   Passed  580.96 sec
16/18 Test #16: rrfs_mpasjedi_2024052700_getkf_solver .........***Failed  535.64 sec
17/18 Test #14: rrfs_mpasjedi_2024052700_3denvar ..............   Passed  721.08 sec
18/18 Test #12: rrfs_fv3jedi_2024052700_3denvar_refl ..........   Passed  771.32 sec

44% tests passed, 10 tests failed out of 18

Label Time Summary:
mpi            = 4438.77 sec*proc (18 tests)
rdas-bundle    = 4438.77 sec*proc (18 tests)
script         = 4438.77 sec*proc (18 tests)

Total Test time (real) = 817.34 sec

The following tests FAILED:
	  1 - rrfs_fv3jedi_2024052700_3dvar (Failed)
	  4 - rrfs_fv3jedi_2024052700_hybrid3denvar (Failed)
	  6 - rrfs_fv3jedi_2024052700_getkf_observer (Failed)
	  7 - rrfs_fv3jedi_2024052700_getkf_solver (Failed)
	  8 - rrfs_fv3jedi_2024052700_3dvar_conv_surface (Failed)
	  9 - rrfs_fv3jedi_2024052700_3dvar_conv_upperair (Failed)
	 10 - rrfs_fv3jedi_2024052700_3dvar_remote (Failed)
	 13 - rrfs_mpasjedi_2024052700_bumploc (Failed)
	 15 - rrfs_mpasjedi_2024052700_getkf_observer (Failed)
	 16 - rrfs_mpasjedi_2024052700_getkf_solver (Failed)
Errors while running CTest
Output from these tests are in: /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533/build/rrfs-test/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

workdir: /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533
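
When comparing runs across machines, the per-test result lines in a log like the one above can be summarized with a short script. The following is a sketch, assuming the line layout shown in this log; the function name `summarize_ctest` and the regular expression are made up for this example and are not part of rrfsbot or CTest.

```python
import re

# Matches ctest result lines such as:
#  1/18 Test #15: rrfs_mpasjedi_2024052700_getkf_observer .......***Failed   33.41 sec
RESULT_RE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\s*(\d+):\s+(\S+)\s+\.+\s*"
    r"(\*\*\*Failed|Passed)\s+([\d.]+)\s+sec"
)


def summarize_ctest(log_text):
    """Split ctest result lines into (passed, failed) lists of (number, name, seconds)."""
    passed, failed = [], []
    for line in log_text.splitlines():
        match = RESULT_RE.match(line)
        if match is None:
            continue  # "Start N: ..." lines and summary lines are skipped
        number, name, status, seconds = match.groups()
        bucket = passed if status == "Passed" else failed
        bucket.append((int(number), name, float(seconds)))
    return passed, failed


# Three lines copied from the Hera log above.
sample = """\
      Start  6: rrfs_fv3jedi_2024052700_getkf_observer
 1/18 Test #15: rrfs_mpasjedi_2024052700_getkf_observer .......***Failed   33.41 sec
 9/18 Test #11: rrfs_fv3jedi_2024052700_3dvar_satrad ..........   Passed   76.45 sec
"""
passed, failed = summarize_ctest(sample)
print([name for _, name, _ in failed])
```

Failures that finish within seconds across otherwise unrelated tests, as in this run, tend to point at an environment problem rather than at the code under test.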

@SamuelDegelia-NOAA
Contributor Author

PASSED on wcoss2

started build_and_test on wcoss2 at UTC time: Thu Feb 5 19:51:32 UTC 2026
finished at UTC time: Thu Feb 5 20:46:31 UTC 2026

Test project /lfs/h2/emc/da/noscrub/samuel.degelia/rrfsbot/PRs_RDASApp/533/build/rrfs-test
      Start  6: rrfs_fv3jedi_2024052700_getkf_observer
      Start 15: rrfs_mpasjedi_2024052700_getkf_observer
      Start  1: rrfs_fv3jedi_2024052700_3dvar
      Start  2: rrfs_fv3jedi_2024052700_3denvar
      Start  3: rrfs_fv3jedi_2024052700_3denvar_mgbf
      Start  4: rrfs_fv3jedi_2024052700_hybrid3denvar
      Start  5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf
      Start  8: rrfs_fv3jedi_2024052700_3dvar_conv_surface
      Start  9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair
      Start 10: rrfs_fv3jedi_2024052700_3dvar_remote
 1/18 Test #10: rrfs_fv3jedi_2024052700_3dvar_remote ..........   Passed   78.05 sec
      Start 11: rrfs_fv3jedi_2024052700_3dvar_satrad
 2/18 Test  #1: rrfs_fv3jedi_2024052700_3dvar .................   Passed   85.17 sec
      Start 12: rrfs_fv3jedi_2024052700_3denvar_refl
 3/18 Test  #9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair ...   Passed  100.05 sec
      Start 13: rrfs_mpasjedi_2024052700_bumploc
 4/18 Test  #8: rrfs_fv3jedi_2024052700_3dvar_conv_surface ....   Passed  102.09 sec
      Start 14: rrfs_mpasjedi_2024052700_3denvar
 5/18 Test  #6: rrfs_fv3jedi_2024052700_getkf_observer ........   Passed  136.08 sec
      Start  7: rrfs_fv3jedi_2024052700_getkf_solver
 6/18 Test #11: rrfs_fv3jedi_2024052700_3dvar_satrad ..........   Passed  128.02 sec
      Start 17: rrfs_mpasjedi_2024052700_3dvar
 7/18 Test  #2: rrfs_fv3jedi_2024052700_3denvar ...............   Passed  249.23 sec
      Start 18: rrfs_bufr2ioda_msonet
 8/18 Test  #4: rrfs_fv3jedi_2024052700_hybrid3denvar .........   Passed  258.13 sec
 9/18 Test #18: rrfs_bufr2ioda_msonet .........................   Passed   41.09 sec
10/18 Test  #3: rrfs_fv3jedi_2024052700_3denvar_mgbf ..........   Passed  294.20 sec
11/18 Test  #5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf ....   Passed  305.05 sec
12/18 Test  #7: rrfs_fv3jedi_2024052700_getkf_solver ..........   Passed  174.00 sec
13/18 Test #17: rrfs_mpasjedi_2024052700_3dvar ................   Passed  119.99 sec
14/18 Test #13: rrfs_mpasjedi_2024052700_bumploc ..............   Passed  349.11 sec
15/18 Test #15: rrfs_mpasjedi_2024052700_getkf_observer .......   Passed  498.10 sec
      Start 16: rrfs_mpasjedi_2024052700_getkf_solver
16/18 Test #14: rrfs_mpasjedi_2024052700_3denvar ..............   Passed  485.03 sec
17/18 Test #12: rrfs_fv3jedi_2024052700_3denvar_refl ..........   Passed  671.26 sec
18/18 Test #16: rrfs_mpasjedi_2024052700_getkf_solver .........   Passed  318.98 sec

100% tests passed, 0 tests failed out of 18

Label Time Summary:
rdas-bundle    = 4393.62 sec*proc (18 tests)
script         = 4393.62 sec*proc (18 tests)

Total Test time (real) = 817.13 sec

workdir: /lfs/h2/emc/da/noscrub/samuel.degelia/rrfsbot/PRs_RDASApp/533

@SamuelDegelia-NOAA
Contributor Author

New Hera ctest failure for many tests:

/scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533/build/bin/fv3jedi_var.x: error while loading shared libraries: /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533/bundle/../build/lib64/libgsibec.so: cannot read file data: Input/output error

This error is unrelated to the changes in this PR. I am guessing that this is due to ongoing /scratch3 errors on Hera. I can run the ctests on my own in /scratch4 and they all pass.

@rrfsbot
Collaborator

rrfsbot commented Feb 6, 2026

PASSED on hera

started build_and_test on hera at UTC time: Fri Feb 6 01:49:47 UTC 2026
finished at UTC time: Fri Feb 6 02:17:09 UTC 2026

Test project /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533/build/rrfs-test
      Start  6: rrfs_fv3jedi_2024052700_getkf_observer
      Start 15: rrfs_mpasjedi_2024052700_getkf_observer
      Start  1: rrfs_fv3jedi_2024052700_3dvar
      Start  2: rrfs_fv3jedi_2024052700_3denvar
      Start  3: rrfs_fv3jedi_2024052700_3denvar_mgbf
      Start  4: rrfs_fv3jedi_2024052700_hybrid3denvar
      Start  5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf
      Start  8: rrfs_fv3jedi_2024052700_3dvar_conv_surface
 1/18 Test  #1: rrfs_fv3jedi_2024052700_3dvar .................   Passed   31.47 sec
      Start  9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair
 2/18 Test  #8: rrfs_fv3jedi_2024052700_3dvar_conv_surface ....   Passed   49.57 sec
      Start 10: rrfs_fv3jedi_2024052700_3dvar_remote
 3/18 Test  #6: rrfs_fv3jedi_2024052700_getkf_observer ........   Passed   53.82 sec
      Start  7: rrfs_fv3jedi_2024052700_getkf_solver
 4/18 Test #10: rrfs_fv3jedi_2024052700_3dvar_remote ..........   Passed   17.47 sec
      Start 11: rrfs_fv3jedi_2024052700_3dvar_satrad
 5/18 Test  #9: rrfs_fv3jedi_2024052700_3dvar_conv_upperair ...   Passed   44.50 sec
      Start 12: rrfs_fv3jedi_2024052700_3denvar_refl
 6/18 Test  #7: rrfs_fv3jedi_2024052700_getkf_solver ..........   Passed   52.37 sec
      Start 13: rrfs_mpasjedi_2024052700_bumploc
 7/18 Test  #2: rrfs_fv3jedi_2024052700_3denvar ...............   Passed  120.84 sec
      Start 14: rrfs_mpasjedi_2024052700_3denvar
 8/18 Test #11: rrfs_fv3jedi_2024052700_3dvar_satrad ..........   Passed   58.82 sec
      Start 17: rrfs_mpasjedi_2024052700_3dvar
 9/18 Test  #5: rrfs_fv3jedi_2024052700_hybrid3denvar_mgbf ....   Passed  173.97 sec
      Start 18: rrfs_bufr2ioda_msonet
10/18 Test  #4: rrfs_fv3jedi_2024052700_hybrid3denvar .........   Passed  176.88 sec
11/18 Test  #3: rrfs_fv3jedi_2024052700_3denvar_mgbf ..........   Passed  180.21 sec
12/18 Test #17: rrfs_mpasjedi_2024052700_3dvar ................   Passed   57.84 sec
13/18 Test #18: rrfs_bufr2ioda_msonet .........................   Passed   24.11 sec
14/18 Test #15: rrfs_mpasjedi_2024052700_getkf_observer .......   Passed  210.49 sec
      Start 16: rrfs_mpasjedi_2024052700_getkf_solver
15/18 Test #16: rrfs_mpasjedi_2024052700_getkf_solver .........   Passed  168.94 sec
16/18 Test #14: rrfs_mpasjedi_2024052700_3denvar ..............   Passed  259.68 sec
17/18 Test #13: rrfs_mpasjedi_2024052700_bumploc ..............   Passed  299.34 sec
18/18 Test #12: rrfs_fv3jedi_2024052700_3denvar_refl ..........   Passed  334.17 sec

100% tests passed, 0 tests failed out of 18

Label Time Summary:
mpi            = 2314.49 sec*proc (18 tests)
rdas-bundle    = 2314.49 sec*proc (18 tests)
script         = 2314.49 sec*proc (18 tests)

Total Test time (real) = 410.18 sec

workdir: /scratch3/NCEPDEV/fv3-cam/rrfsbot/PRs_RDASApp/533

@ShunLiu-NOAA ShunLiu-NOAA merged commit fc99d78 into NOAA-EMC:develop Feb 6, 2026
1 check passed
@SamuelDegelia-NOAA SamuelDegelia-NOAA deleted the feature/gsibec_fixes branch February 9, 2026 17:51