
Revamp obs staging and analysis stats job #4306

Merged
DavidHuber-NOAA merged 73 commits into NOAA-EMC:develop from DavidNew-NOAA:feature/stage-obs
Dec 19, 2025



@DavidNew-NOAA DavidNew-NOAA commented Dec 9, 2025

Description

This PR makes changes in two areas of GW code for JEDI jobs.

First, it makes the following changes to observation handling in ush/python/pygfs/jedi/jedi.py:

  1. It creates a stage_observations() method for the Jedi class that stages observations for analysis jobs, rather than relying on the task config YAML in GDASApp for staging. This change is justified by the fact that obs staging is essentially the same across all analysis tasks.
  2. Methods for staging, extracting, and tarring bias corrections, and for saving obs diags and radiative bias corrections, are moved from the Analysis class into the Jedi class, for the reason given in item 3.
  3. The paths, file prefixes, and file suffixes used in these methods for obs, obs diags, and bias corrections are taken from the JCB config dictionary in the Jedi class. This ensures that the file structure and naming for obs and their statistics are consistent between how they are staged in GW and how they are staged and saved by JEDI applications, which avoids naming conflicts.
  4. The JCB config dictionary for a Jedi object is created by the class constructor rather than the initialize method. This way, task_config doesn't need to be passed to both the constructor and the initialize method. One benefit is that it cuts down on the number of times task_config is dumped by the logger.
  5. Other minor changes are made to the Jedi class code such as hardening and more descriptive method/variable naming.
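
The idea behind items 1 to 3 can be sketched roughly as follows. This is only an illustration, not the actual `Jedi.stage_observations()` implementation: the function name `plan_obs_staging`, the dictionary keys (`obs_dir`, `obs_prefix`, `obs_suffix`, `observations`), and all paths are hypothetical. It shows how taking the prefix, suffix, and run directory from one config dictionary means the staged file names match what the application later looks for.

```python
from pathlib import PurePosixPath


def plan_obs_staging(jcb_config: dict, com_obs: str) -> list:
    """Build [source, destination] pairs for obs staging from one config dict.

    Every key read from jcb_config is a hypothetical stand-in for whatever
    the real JCB config dictionary provides.
    """
    run_obs_dir = PurePosixPath(jcb_config["obs_dir"])
    prefix = jcb_config["obs_prefix"]
    suffix = jcb_config["obs_suffix"]

    pairs = []
    for ob in jcb_config["observations"]:
        # The same file name is used on both sides, so staging cannot drift
        # away from the naming the application is configured with.
        filename = f"{prefix}{ob}{suffix}"
        pairs.append([str(PurePosixPath(com_obs) / filename),
                      str(run_obs_dir / filename)])
    return pairs


# Made-up example values; the COM path is passed in as an argument.
example_config = {
    "obs_dir": "/run/obs",
    "obs_prefix": "gdas.t06z.",
    "obs_suffix": ".nc",
    "observations": ["sondes", "amsua_n19"],
}
staging_pairs = plan_obs_staging(example_config, com_obs="/com/obs")
```

A list of pairs like `staging_pairs` could then be handed to a file handler for copying, so the staging logic itself stays identical across analysis tasks.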

Second, ush/python/pygfs/task/analysis_stats.py is refactored in the following ways:

  1. AnalysisStats class now inherits from Analysis rather than Task. This allows it to inherit parameters like APREFIX, GPREFIX, etc.
  2. Changes are made so that task_config is never modified after the class constructor is run, consistent now with all other tasks.
  3. The "base config" and "JEDI config" YAMLs are consolidated into a single master config YAML (like all other tasks now), and any staging/saving that was carried out in the original Python code is now invoked by the FileHandler with that master YAML (data_in and data_out keys).
  4. The run directory is reorganized by analysis type with subdirectories for the inputs and outputs.
  5. Input and output paths for staging/saving are taken from the JCB config dictionary of the relevant Jedi object, to ensure that file paths and naming are consistent between the GW code and JEDI application configuration YAMLs.
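
As a rough sketch of item 3, a single master config with `data_in` and `data_out` sections could drive both staging and saving. The `sync_files` function below is a simplified stand-in written for this example, not the wxflow FileHandler itself, and the `mkdir` key, directory layout, and file names are assumptions made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def sync_files(spec: dict) -> None:
    """Simplified stand-in for a file handler: make directories, then copy required files."""
    for directory in spec.get("mkdir", []):
        Path(directory).mkdir(parents=True, exist_ok=True)
    for src, dst in spec.get("copy_req", []):
        if not Path(src).is_file():
            raise FileNotFoundError(f"required file is missing: {src}")
        shutil.copy2(src, dst)


# Demonstration in a throwaway directory; all paths and file names are made up.
root = Path(tempfile.mkdtemp())
com = root / "com"
run = root / "run" / "atmos"
com.mkdir(parents=True)
(com / "diags.tar").write_text("placeholder")

master_config = {
    "data_in": {
        "mkdir": [str(run / "input"), str(run / "output")],
        "copy_req": [[str(com / "diags.tar"), str(run / "input" / "diags.tar")]],
    },
    "data_out": {
        "copy_req": [[str(run / "output" / "stats.nc"), str(com / "stats.nc")]],
    },
}

sync_files(master_config["data_in"])                     # stage inputs
(run / "output" / "stats.nc").write_text("placeholder")  # pretend the application ran
sync_files(master_config["data_out"])                    # save outputs
```

Keeping both sections in one config means the task code only decides when to stage and when to save, while the config decides what is moved.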

Resolves #4224
Resolves #4228

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this change expected to change outputs (e.g. value changes to existing outputs, new files stored in COM, files removed from COM, filename changes, additions/subtractions to archives)? YES/NO (If YES, please indicate to which system(s))
    • GFS
    • GEFS
    • SFS
    • GCAFS
  • Is this a breaking change (a change in existing functionality)? YES
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? YES
    • EMC verif-global
    • GDAS
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

Clone, build, and full CI suite on Hera

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

DavidNew-NOAA added a commit to NOAA-EMC/DA-utils that referenced this pull request Dec 19, 2025
This PR, a companion to
NOAA-EMC/global-workflow#4306 and
NOAA-EMC/GDASApp#1999, makes a small change to
the obs statistics application, so that the key `obs spaces` becomes
`observers`. This way, the YAML structure for obs spaces is identical to
all other JEDI configuration YAMLs and the `clean_empty_obsspaces()`
method in the GW `Jedi` class can be applied to the configuration YAML
for this application as well.
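
A minimal sketch of why the shared key name matters: once the list lives under `observers`, one pruning routine can serve every configuration. The code below is not the GW `clean_empty_obsspaces()` method. Both helper names are hypothetical, and the pruning rule (drop an observer whose input file is missing) and the nested `obs space` / `obsdatain` / `engine` / `obsfile` layout are assumptions made for illustration.

```python
import os
import tempfile


def rename_obs_spaces_key(config: dict) -> dict:
    """Return a copy of config with a top-level 'obs spaces' list stored under 'observers'."""
    renamed = dict(config)
    if "obs spaces" in renamed:
        renamed["observers"] = renamed.pop("obs spaces")
    return renamed


def drop_missing_observers(config: dict) -> dict:
    """Keep only observers whose input obs file exists (assumed pruning rule)."""
    kept = [
        observer
        for observer in config.get("observers", [])
        if os.path.isfile(observer["obs space"]["obsdatain"]["engine"]["obsfile"])
    ]
    return {**config, "observers": kept}


# Made-up example: one obs file exists, the other does not.
workdir = tempfile.mkdtemp()
present = os.path.join(workdir, "sondes.nc")
open(present, "w").close()

old_style = {
    "obs spaces": [
        {"obs space": {"name": "sondes",
                       "obsdatain": {"engine": {"obsfile": present}}}},
        {"obs space": {"name": "amsua_n19",
                       "obsdatain": {"engine": {"obsfile": os.path.join(workdir, "amsua_n19.nc")}}}},
    ]
}
cleaned = drop_missing_observers(rename_obs_spaces_key(old_style))
remaining = [o["obs space"]["name"] for o in cleaned["observers"]]
```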

---------

Co-authored-by: Cory Martin <[email protected]>
DavidNew-NOAA added a commit to NOAA-EMC/GDASApp that referenced this pull request Dec 19, 2025
…1999)

# Description

This companion PR supports the changes made in
NOAA-EMC/global-workflow#4306 in the following
ways:
1. It consolidates the base YAML and JEDI config YAMLs for the analysis
stats job into a master config YAML and adds keys for the wxflow
FileHandler for staging and saving files for that job.
2. Any obs staging YAMLs are removed since that functionality is now
handled by the `Jedi` class in GW.
3. COM paths are removed from all JCB base YAMLs. Now, these paths are
passed as method arguments to the obs staging and diags saving methods
that now exist in the `Jedi` class. This way, the JCB base YAML is
strictly concerned with paths and file names for the run directories.
4. Changes are made to the JCB base YAMLs for the analysis stats job in
GW to be consistent with its refactoring in the companion PR.
5. An obs list is created for the analysis stats job, determining which
obs to create statistics for.

Additionally, `copy` and `link` keys in various YAMLs which go into the
FileHandler for several jobs become `copy_req` and `link_req`, since the
former are now deprecated.
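
For configs that still use the old key names, the rename could be applied mechanically. The helper below is a hypothetical sketch, not part of wxflow or GDASApp; it walks a nested config that has already been loaded into Python dicts and lists and renames `copy` to `copy_req` and `link` to `link_req`.

```python
RENAMES = {"copy": "copy_req", "link": "link_req"}


def migrate_keys(node):
    """Recursively rename deprecated 'copy'/'link' keys in nested dicts and lists.

    Returns a new structure and leaves the input untouched. If a dict holds
    both an old key and its replacement, one silently overwrites the other,
    so such configs should be checked by hand.
    """
    if isinstance(node, dict):
        return {RENAMES.get(key, key): migrate_keys(value) for key, value in node.items()}
    if isinstance(node, list):
        return [migrate_keys(item) for item in node]
    return node


# Made-up example spec using the deprecated key names.
old_spec = {
    "data_in": {"mkdir": ["/run/input"], "copy": [["/com/a.nc", "/run/input/a.nc"]]},
    "data_out": {"link": [["/run/output/b.nc", "/com/b.nc"]]},
}
new_spec = migrate_keys(old_spec)
```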

# Companion PRs

NOAA-EMC/global-workflow#4306
NOAA-EMC/jcb-gdas#214
NOAA-EMC/jcb-algorithms#17
NOAA-EMC/DA-utils#49

# Issues

Refs NOAA-EMC/global-workflow#4224
Refs NOAA-EMC/global-workflow#4228

# Automated CI tests to run in Global Workflow

CI testing will be performed as part of review for the GW companion PR.
Preliminary testing has already been performed.
@DavidNew-NOAA
Contributor Author

All companion PRs have been merged and GDAS hash has been updated to develop

@DavidHuber-NOAA
Contributor

Great, thank you @DavidNew-NOAA. I'll merge in GW develop now and start testing on WCOSS2.

@DavidHuber-NOAA DavidHuber-NOAA added the CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress label Dec 19, 2025
@DavidHuber-NOAA
Contributor

@DavidNew-NOAA The gfs_sfcanl job for cycle 202112210600 failed for the C96_atm3DVar_extended test on WCOSS2 in global_cycle with the following error:

nid002045.dogwood.wcoss2.ncep.noaa.gov 0:   first grib record.
  kpds( 1-10)=           7         120         255         192          91
         102           0          20           1           1
  kpds(11-20)=           0           0           1           0           0
          10          31           1           2           0
  kpds(21-  )=          21           2
nid002045.dogwood.wcoss2.ncep.noaa.gov 5:  FATAL ERROR: ice concentration
 analysis read error.
----- contents of errfile -----
nid002045.dogwood.wcoss2.ncep.noaa.gov 5: abort: 
nid002045.dogwood.wcoss2.ncep.noaa.gov: rank 5 exited with code 134
nid002045.dogwood.wcoss2.ncep.noaa.gov 0: forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source             
libpnetcdf.so.4.0  000014A99A27B2EC  for__signal_handl     Unknown  Unknown
libpthread-2.31.s  000014A9A3B95910  Unknown               Unknown  Unknown
global_cycle       0000000000495788  fi635_                   3155  w3fi63.f
global_cycle       0000000000490F20  w3fi63_                   325  w3fi63.f
global_cycle       000000000048F7B0  getgb1r_                   45  getgb1r.f
global_cycle       000000000048C5BC  getgbm_                   236  getgbm.f
global_cycle       000000000048C19C  getgb_                    186  getgb.f
global_cycle       00000000004478B9  fixrdc_                  8502  sfcsub.F
global_cycle       000000000043C4D7  clima_                   7728  sfcsub.F
global_cycle       000000000041FC00  sfccycle_                1137  sfcsub.F
global_cycle       0000000000412C71  sfcdrv_                   658  cycle.F90
global_cycle       00000000004096B7  MAIN__                    184  cycle.F90
global_cycle       0000000000408B12  Unknown               Unknown  Unknown
libc-2.31.so       000014A99843E24D  __libc_start_main     Unknown  Unknown
global_cycle       0000000000408A2A  Unknown               Unknown  Unknown

Could you please take a look? The log file can be found here: /lfs/h2/emc/ptmp/david.huber/rt_4306/COMROOT/C96_atm3DVar_extended_4306/logs/2021122106/gfs_sfcanl.log.

@DavidHuber-NOAA DavidHuber-NOAA added CI-Wcoss2-Failed CI testing on WCOSS for this PR has failed and removed CI-Wcoss2-Running CI testing on WCOSS for this PR is in-progress labels Dec 19, 2025
@DavidHuber-NOAA
Contributor

@DavidNew-NOAA It's possible that this is being caused by the grib-util module not being loaded on WCOSS2. This causes a silent error in the sfcanl jobs. If that is the case, #4328 will provide a fix. See https://github.com/NOAA-EMC/global-workflow/pull/4328/files#diff-6ff39005740b76465a1ddd10c4d16dbdb903d3f41b188118149aa4523f95238a. I will try making this change locally on WCOSS2 and let you know if that fixes the error.

@DavidHuber-NOAA
Contributor

Unfortunately, I received the same error with that change added in.

@DavidNew-NOAA
Contributor Author

@DavidHuber-NOAA I don't have access to WCOSS to look at the logs. Is it possible to transfer that log file to Hera/Ursa? I'm not sure what could be breaking gfs_sfcanl.

@DavidHuber-NOAA
Contributor

@DavidNew-NOAA Apologies for the late reply. Here is the log file on Ursa: /scratch3/NCEPDEV/global/David.Huber/for_daveN/gfs_sfcanl.log. I'm running some additional tests on WCOSS2 myself to see if this is an issue in develop. Interestingly, both the 00Z and 12Z cycles ran fine for the extended test. This only happened on the 06Z cycle.

If this doesn't reproduce in develop, then I will work on creating a reproducer that can be run on Ursa.

@DavidHuber-NOAA
Contributor

It may be worth running a C96_atm3DVar test (non-extended) on Ursa where INTERVAL_GFS=6 to see if it replicates.

@DavidNew-NOAA
Contributor Author

Thanks @DavidHuber-NOAA, I'm already ahead of you and running CI now. I'll take a look on Sunday.

@DavidHuber-NOAA
Contributor

The develop branch also produces this error, so it appears we have a bug elsewhere that needs squashing.

Merging based on otherwise successful testing.

@DavidHuber-NOAA DavidHuber-NOAA merged commit 8c7d6e6 into NOAA-EMC:develop Dec 19, 2025
5 checks passed
@DavidNew-NOAA
Contributor Author

Thanks @DavidHuber-NOAA !

@DavidHuber-NOAA
Contributor

@DavidNew-NOAA @RussTreadon-NOAA @CoryMartin-NOAA It appears that this update to the GDASApp broke the capability to build the GDASApp on compute nodes. The build fails on WCOSS2, Ursa, and Hera while attempting to run pip. The error is identical to NOAA-EMC/GDASApp#1851 (comment). Is this a known issue? Any chance of a workaround fix besides disabling compute node builds for the GDASApp?

@RussTreadon-NOAA
Contributor

Pasting the same comment I added to GDASApp issue #2016

Yes, we previously resolved this issue. The current compute node GDASApp failure caught me by surprise. I stumbled across the failure while running CI for the weekly JEDI hash update this weekend. The compute node failure slipped through both GDASApp PR NOAA-EMC/GDASApp#1999 and g-w PR #4306.

I do not have a short term fix. I see that hotfix #4368 has already been merged into g-w develop. I'm on leave until 1/5/2026. I'll do what I can in terms of troubleshooting between now and then as time permits.

DavidNew-NOAA added a commit to NOAA-EMC/GDASApp that referenced this pull request Jan 6, 2026
This PR makes changes which support changes made in
NOAA-EMC/global-workflow#4306 and
#1999 in the following ways:
1. The obs statistics templates are moved from `algorithms/obsstats` to
`model/obsstats`, since these are indeed model templates and not
standalone algorithms.
2. The algorithm YAML for the JEDI obs stats application is moved from
the jcb-algorithms repo to this repo in `algorithms/obsstats`. The
justification is that the algorithm templates in jcb-algorithms should
be for general model-agnostic JEDI applications, usually existing in
the OOPS repo. This application, on the other hand, is specific to
GDASApp.
3. Minor variable name changes are made in the obs statistics
templates.
DavidNew-NOAA added a commit to NOAA-EMC/GDASApp that referenced this pull request Jan 16, 2026
…1999)


Labels

  • CI-Wcoss2-Failed: CI testing on WCOSS for this PR has failed
  • GFS Change: This PR, if merged, will change results for the GFS.


Development

Successfully merging this pull request may close these issues.

  • Create stage_obs method for Jedi class
  • anlstat job hardening and refactoring

6 participants