
Enable EnKF-only for atmosphere #2010

Merged
CoryMartin-NOAA merged 24 commits into NOAA-EMC:develop from bhuang95:feature/enkf_only_dev
Jan 23, 2026

Conversation

@bhuang95
Contributor

@bhuang95 bhuang95 commented Dec 16, 2025

This PR works along with the following three dependent PRs to enable the EnKF-only configuration for the atmosphere within the global workflow (see detailed description NOAA-EMC/global-workflow#4345)

Dependencies:
-NOAA-EMC/global-workflow#4345
-NOAA-EMC/jcb-gdas#219
-NOAA-EMC/jcb#32

Resolve

@bhuang95 bhuang95 changed the title from "Feature/enkf only dev" to "Enable EnKF-only for atmosphere" Dec 16, 2025
CoryMartin-NOAA added a commit to NOAA-EMC/jcb that referenced this pull request Jan 8, 2026
This PR works along with the following three dependent PRs to enable the
EnKF-only configuration for the atmosphere within the global workflow
(see detailed description
NOAA-EMC/global-workflow#4345)

Dependencies:
-NOAA-EMC/global-workflow#4345
-NOAA-EMC/GDASApp#2010
-NOAA-EMC/jcb-gdas#219

Resolve
- NOAA-EMC/global-workflow#4339

---------

Co-authored-by: Cory Martin <[email protected]>
@bhuang95
Contributor Author

bhuang95 commented Jan 9, 2026

@CoryMartin-NOAA I updated this branch with develop. It now fails to read the obs distribution and localization blocks in atm_ens_obs_dist_localizations.yaml.j2 when configuring the LETKF YAML, with the errors below. This yaml is used on Line 23 of this LETKF YAML template. Do you know what recent change may have caused this? It looks like something is wrong with the format of this yaml.

2026-01-09 22:53:31,307 - DEBUG    - jedi        : Writing JEDI YAML config to: /scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/TestGWSetupRealCase/tmp/RUNDIRS/TestGW-EnKFOnly-JEDI-MEM3-T11/enkfgdas.2022010318/enkfgdasatmensanl_18/atmensanlobs.yaml
Traceback (most recent call last):
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/scripts/exglobal_atmens_analysis_initialize.py", line 26, in <module>
    AtmEnsAnl.initialize()
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/sorc/wxflow/src/wxflow/logger.py", line 252, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/ush/python/pygfs/task/atmens_analysis.py", line 90, in initialize
    self.jedi_dict['atmensanlobs'].initialize(clean_empty_obsspaces=True)
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/sorc/wxflow/src/wxflow/logger.py", line 252, in wrapper
    retval = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/ush/python/pygfs/jedi/jedi.py", line 189, in initialize
    save_as_yaml(self.exe_config, self.jedi_config.exe_config_yaml)
  File "/scratch4/BMC/gsienkf/Bo.Huang/expCodes/Workflow/EnKFOnly-20251031/global-workflow/sorc/wxflow/src/wxflow/yaml_file.py", line 54, in save_as_yaml
    yaml.safe_dump(vanilla_yaml(data), fh,
...

  File "/contrib/spack-stack/spack-stack-1.9.2/envs/ue-oneapi-2024.2.1/install/oneapi/2024.2.1/py-pyyaml-6.0.2-55i3htn/lib/python3.11/site-packages/yaml/representer.py", line 58, in represent_data
    node = self.yaml_representers[None](self, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/contrib/spack-stack/spack-stack-1.9.2/envs/ue-oneapi-2024.2.1/install/oneapi/2024.2.1/py-pyyaml-6.0.2-55i3htn/lib/python3.11/site-packages/yaml/representer.py", line 231, in represent_undefined
    raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', {'name': 'RoundRobin', 'halo size': '1250e3'})
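
The traceback ends in PyYAML's `represent_undefined`, which `safe_dump` reaches when a value's exact type has no registered representer. One cause consistent with this (an assumption, not confirmed in this thread) is that the block arrives as a `dict` subclass rather than a plain `dict`. The sketch below uses a hypothetical `AttrDict` stand-in, a hypothetical `to_plain` helper, and an illustrative `distribution` key to show the kind of recursive conversion that would make such data dumpable. It is not the wxflow implementation.

```python
from collections.abc import Mapping


class AttrDict(dict):
    """Hypothetical stand-in for a dict subclass produced by a config loader."""


def to_plain(obj):
    """Recursively convert mappings and sequences to plain dict/list objects."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(value) for value in obj]
    return obj  # scalars (str, int, float, bool, None) pass through unchanged


# Values taken from the error message; the enclosing key is illustrative only.
block = AttrDict({"distribution": AttrDict({"name": "RoundRobin", "halo size": "1250e3"})})
plain = to_plain(block)

print(type(block["distribution"]).__name__)  # AttrDict
print(type(plain["distribution"]).__name__)  # dict
```

After conversion every nested mapping has the exact type `dict`, which is what a safe YAML dumper looks up when choosing a representer.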

@CoryMartin-NOAA
Contributor

@bhuang95 I ran into this same problem on Thursday. I don't immediately know what new change broke this, but I think the fix is this: NOAA-EMC/wxflow#65

DavidNew-NOAA pushed a commit to NOAA-EMC/jcb-gdas that referenced this pull request Jan 16, 2026
This PR works along with the following three dependent PRs to enable the
EnKF-only configuration for the atmosphere within the global workflow
(see detailed description
NOAA-EMC/global-workflow#4345)

Dependencies:
-NOAA-EMC/global-workflow#4345
-NOAA-EMC/GDASApp#2010
-NOAA-EMC/jcb#32

Resolve
- NOAA-EMC/global-workflow#4339

---------

Co-authored-by: Cory Martin <[email protected]>
Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Looks good, thanks @bhuang95 I'm going to test this and then approve if it works well

Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Approved but we need to not merge until the workflow PR is ready or else things will be broken

Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

Tested manually on WCOSS2, and confirmed that it creates a YAML for the observer that looks good. Thanks, this will really help with YAML maintenance going forward.

@CoryMartin-NOAA CoryMartin-NOAA merged commit 9fe6905 into NOAA-EMC:develop Jan 23, 2026
3 checks passed
@RussTreadon-NOAA
Contributor

FYI: @bhuang95 and @CoryMartin-NOAA

Updated feature/stable-nightly with develop at 9fe6905 (this PR) and started GDASApp CI, with select g-w CI included, on Hera. CI is still running, but the following jobs have failed so far:

 73/144 Test #312: test_gdasapp_atm_jjob_ens_letkf ..........................................***Failed  458.42 sec
 90/144 Test #314: test_gdasapp_atm_jjob_ens_obs ............................................***Failed  234.37 sec
 93/144 Test #315: test_gdasapp_atm_jjob_ens_sol ............................................***Failed   74.32 sec
 95/144 Test #316: test_gdasapp_atm_jjob_ens_inc ............................................***Failed   42.30 sec
 97/144 Test #317: test_gdasapp_atm_jjob_ens_final ..........................................***Failed   42.28 sec
 98/144 Test #214: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_atmensanlobs_202402240000 ......***Failed  369.66 sec

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA well that is odd, I tested with the develop branch of the workflow. Clearly I screwed something up. Do you have logs I can look at?

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA : GDASApp CI ran on Hera in

HOMEgfs=/scratch3/NCEPDEV/da/Russ.Treadon/CI/hera/GDASApp/stable/20260123/global-workflow

This is a copy of g-w develop at 8004c37. $HOMEgfs/sorc/gdas.cd is GDASApp branch feature/stable-nightly at 2707092.

The GDASApp ctest log file is $HOMEgfs/sorc/gdas.cd/build/log.ctest. CI is complete and the following tests failed

 73/144 Test #312: test_gdasapp_atm_jjob_ens_letkf ..........................................***Failed  458.42 sec
 90/144 Test #314: test_gdasapp_atm_jjob_ens_obs ............................................***Failed  234.37 sec
 93/144 Test #315: test_gdasapp_atm_jjob_ens_sol ............................................***Failed   74.32 sec
 95/144 Test #316: test_gdasapp_atm_jjob_ens_inc ............................................***Failed   42.30 sec
 97/144 Test #317: test_gdasapp_atm_jjob_ens_final ..........................................***Failed   42.28 sec
 98/144 Test #214: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_atmensanlobs_202402240000 ......***Failed  369.66 sec
100/144 Test #215: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_atmensanlsol_202402240000 ......***Failed   76.49 sec
103/144 Test #216: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_atmensanlfv3inc_202402240000 ...***Failed   75.64 sec
107/144 Test #217: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_atmensanlfinal_202402240000 ....***Failed   75.23 sec
121/144 Test #218: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_ecen_fv3jedi_202402240000 ......***Failed   71.50 sec
125/144 Test #220: test_gdasapp_C96C48_ufs_hybatmDA_enkfgdas_fcst_202402240000 ..............***Failed   84.28 sec
92% tests passed, 11 tests failed out of 144
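
A list like the one above can be pulled out of log.ctest programmatically. The following is a minimal sketch, assuming the log lines follow the layout shown in this comment; the `failed_tests` helper and the regular expression are illustrative and are not part of GDASApp.

```python
import re

# Matches ctest result lines such as:
#  73/144 Test #312: test_name ..........***Failed  458.42 sec
CTEST_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#(?P<num>\d+):\s+(?P<name>\S+)\s+\.+\s*"
    r"(?P<status>\*\*\*Failed|Passed)\s+(?P<secs>[\d.]+)\s+sec"
)


def failed_tests(log_text):
    """Return (test number, test name, seconds) for every failed test."""
    failures = []
    for line in log_text.splitlines():
        match = CTEST_LINE.match(line)
        if match and match.group("status") == "***Failed":
            failures.append(
                (int(match.group("num")), match.group("name"), float(match.group("secs")))
            )
    return failures


sample = (
    " 73/144 Test #312: test_gdasapp_atm_jjob_ens_letkf ..........***Failed  458.42 sec\n"
    "1/7 Test #311: test_gdasapp_atm_jjob_ens_init .........   Passed  333.92 sec\n"
)
for num, name, secs in failed_tests(sample):
    print(f"#{num} {name} ({secs} s)")
```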

Log files for atm_jjob_ens are in $HOMEgfs/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun.

Log files for C96C48_ufs_hybatmDA are in $HOMEgfs/sorc/gdas.cd/build/gdas/test/gw-ci/C96C48_ufs_hybatmDA/COMROOT/C96C48_ufs_hybatmDA/logs

@RussTreadon-NOAA
Contributor

/scratch3/NCEPDEV/da/Russ.Treadon/CI/hera/GDASApp/stable/20260123/global-workflow/sorc/gdas.cd/build/gdas/test/atm/global-workflow/testrun/jjob_ens_obs.o21385024 contains the message

0: QC sondes windNorthward: 1795 passed out of 2827 observations.
0: Exception: ConfigurationNotFound: [obs localizations]
1: Exception: ConfigurationNotFound: [obs localizations]
2: Exception: ConfigurationNotFound: [obs localizations]
3: Exception: ConfigurationNotFound: [obs localizations]
4: Exception: ConfigurationNotFound: [obs localizations]
5: Exception: ConfigurationNotFound: [obs localizations]
2: ConfigurationNotFound: [obs localizations] caught in  (/scratch3/NCEPDEV/da/Russ.Treadon/CI/hera/GDASApp/stable/20260123/global-workflow/sorc/gdas.cd/bundle/oops/src/oops/runs/Run.cc:170 execute)
2: Exception: oops::LocalEnsembleDA<FV3JEDI, UFO and IODA observations> terminating...
2: Exception stack:
2: ConfigurationNotFound: [obs localizations]
2: backtrace [1] stack has 12 addresses
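
The exception says a required `obs localizations` entry is missing from the configuration. A quick pre-flight check on the rendered YAML, once loaded into Python, might look like the sketch below. It assumes the observers sit under `observations` / `observers` and that each carries an `obs space` with a `name`; that layout, the `observers_missing_key` helper, and the `amsua_n19` entry are assumptions for illustration, not taken from this thread.

```python
def observers_missing_key(config, key="obs localizations"):
    """Return the names of observers that lack the given key."""
    observers = config.get("observations", {}).get("observers", [])
    missing = []
    for index, observer in enumerate(observers):
        if key not in observer:
            # Fall back to a positional label when the observer has no name.
            name = observer.get("obs space", {}).get("name", f"observer[{index}]")
            missing.append(name)
    return missing


# Hypothetical, already-parsed configuration (plain dicts and lists).
config = {
    "observations": {
        "observers": [
            {"obs space": {"name": "sondes"}, "obs localizations": [{}]},
            {"obs space": {"name": "amsua_n19"}},
        ]
    }
}
print(observers_missing_key(config))  # ['amsua_n19']
```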

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA ah okay I think I already know what the problem is. I was using develop of JCB by accident. We need JCB updated into the global workflow before we can update the GDAS hash.
Can you retry after updating jcb past this hash: NOAA-EMC/jcb@d7bd27b

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA . I updated $HOMEgfs/sorc/gdas.cd/sorc/jcb to d7bd27b. I reran atm_jjob_ens. The ens_init job failed with

    raise RepresenterError("cannot represent an object", data)
yaml.representer.RepresenterError: ('cannot represent an object', {'name': 'RoundRobin', 'halo size': '1250e3'})

I updated $HOMEgfs/sorc/gdas.cd/sorc/jcb to 72ccf27, the current head of jcb develop. The ens_init job failed with the same error message.

Did I correctly make the change you suggested?

@CoryMartin-NOAA
Contributor

Yikes, sorry @RussTreadon-NOAA my workflow checkout was not as fresh as I thought. You also need develop of wxflow. I'm going to open two PRs for that.

@RussTreadon-NOAA
Contributor

Updated $HOMEgfs/sorc/wxflow to 88c576d and reran test_gdasapp_atm_jjob_ens.

Test project /scratch3/NCEPDEV/da/Russ.Treadon/CI/hera/GDASApp/stable/20260123/global-workflow/sorc/gdas.cd/build
    Start 311: test_gdasapp_atm_jjob_ens_init
1/7 Test #311: test_gdasapp_atm_jjob_ens_init .........   Passed  333.92 sec
    Start 312: test_gdasapp_atm_jjob_ens_letkf
2/7 Test #312: test_gdasapp_atm_jjob_ens_letkf ........   Passed  1834.79 sec
    Start 313: test_gdasapp_atm_jjob_ens_init_split
3/7 Test #313: test_gdasapp_atm_jjob_ens_init_split ...   Passed  301.77 sec
    Start 314: test_gdasapp_atm_jjob_ens_obs
4/7 Test #314: test_gdasapp_atm_jjob_ens_obs ..........***Failed  330.46 sec
    Start 315: test_gdasapp_atm_jjob_ens_sol
5/7 Test #315: test_gdasapp_atm_jjob_ens_sol ..........***Failed  108.22 sec
    Start 316: test_gdasapp_atm_jjob_ens_inc
6/7 Test #316: test_gdasapp_atm_jjob_ens_inc ..........***Failed   43.07 sec
    Start 317: test_gdasapp_atm_jjob_ens_final
7/7 Test #317: test_gdasapp_atm_jjob_ens_final ........***Failed   74.92 sec

43% tests passed, 4 tests failed out of 7

Note that test_gdasapp_atm_jjob_ens_letkf did not actually pass, even though ctest reports it as Passed above: it was killed by the system when it reached the specified 30-minute wall clock limit.

Comparing the atmensanlobs.yaml and atmensanlsol.yaml from this rerun with those from yesterday's stable-nightly run shows numerous differences. This makes sense: PR #2010 replaced the atmosphere-lgetkf observation yamls with those from atmosphere. The atmosphere observation yamls have a lot more QC. By comparison, the atmosphere-lgetkf observation yamls are very sparse; they simply test functionality, not science.
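
A comparison of this kind can be reproduced with a plain text diff of the two rendered files. The sketch below uses Python's standard `difflib`; the `yaml_text_diff` helper and the two inline YAML snippets are hypothetical placeholders, not the actual file contents.

```python
import difflib


def yaml_text_diff(old_text, new_text, old_name="old.yaml", new_name="new.yaml"):
    """Return a unified diff of two YAML documents as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


# Placeholder contents standing in for the sparse and the QC-heavy observer yamls.
sparse = "obs space:\n  name: sondes\n"
full = "obs space:\n  name: sondes\nobs filters:\n- filter: Bounds Check\n"

for line in yaml_text_diff(sparse, full, "stable-nightly.yaml", "rerun.yaml"):
    print(line)
```

In practice the two strings would be read from the atmensanlobs.yaml files of the two runs.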

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA and @bhuang95: Several GDASApp tests are broken. The stable-nightly run will fail tonight and thereafter until we decide on a path forward.

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA what is your preference? Do we want to revert this PR? Do we need to be more selective on which observations are assimilated in the CI test? Do we accept the stable nightly will fail over the weekend?

@CoryMartin-NOAA
Contributor

@RussTreadon-NOAA I don't know why this is just now showing up, but at least some of the errors are related to this file:
https://github.com/NOAA-EMC/jcb-algorithms/blob/develop/observer_components.yaml

I think we need linear obs operator added to more of the entries in that file. Question is, why does JEDI even use the linear obs operator for LocalEnsembleDA observer?

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA and @bhuang95 : Ideally we create a g-w branch that works with the current head of GDASApp develop. This won't happen before COB today.

Reverting this PR restores things to where they were for last night's stable-nightly run. This leaves us with a GDASApp develop that works with g-w develop. A revert gives us time to test the changes in this PR with whatever g-w changes we need in order to see all GDASApp and select g-w CI tests pass.

CoryMartin-NOAA added a commit that referenced this pull request Jan 23, 2026
CoryMartin-NOAA added a commit that referenced this pull request Jan 23, 2026
This reverts commit 9fe6905.

# Description

Will investigate issues further on Monday to un-revert this and include
updates to jcb-algorithms and jcb

# Companion PRs

<!-- Enter links to any companion PRs here. -->

# Issues

<!-- Enter any issues referenced or resolved by this PR here. Use
keywords "Resolves" or "Refs".
Resolves #1234
Refs #4321
Refs NOAA-EMC/repo#5678
-->

# Automated CI tests to run in Global Workflow
<!-- Which Global Workflow CI tests are required to adequately test this
PR? -->
- [ ] atm_jjob <!-- JEDI atm single cycle DA !-->
- [ ] C96C48_ufs_hybatmDA <!-- JEDI atm cycled DA !-->
- [ ] C96C48_hybatmsnowDA <!-- JEDI snow cycled DA !-->
- [ ] C96_gcafs_cycled <!-- JEDI aerosol cycled DA !-->
- [ ] C48mx500_3DVarAOWCDA <!-- JEDI low-res marine 3DVar cycled DA !-->
- [ ] C48mx500_hybAOWCDA <!-- JEDI marine hybrid envar cycled DA !-->
- [ ] C96C48_ufsgsi_hybatmDA <!-- JEDI atm Var with GSI EnKF cycled DA
!-->
- [ ] C96C48_hybatmDA <!-- GSI atm cycled DA !-->
@bhuang95
Contributor Author

> @RussTreadon-NOAA I don't know why this is just now showing up, but at least some of the errors are related to this file: https://github.com/NOAA-EMC/jcb-algorithms/blob/develop/observer_components.yaml
>
> I think we need linear obs operator added to more of the entries in that file. Question is, why does JEDI even use the linear obs operator for LocalEnsembleDA observer?

@CoryMartin-NOAA - linear obs operator needs to be added to the local_ensemble_da: block in observer_components.yaml. Probably linear obs operator is needed for the linear observer calculation in JEDI LETKF? Let me ask around.

@CoryMartin-NOAA
Contributor

Thanks @bhuang95 I'm surprised we didn't notice this until now that the section of YAML gets scrubbed

@RussTreadon-NOAA
Contributor

@CoryMartin-NOAA and @bhuang95 : Can you accumulate either in this PR or a new issue/PR all the submodules@hash we expect to use in order to enable enkf-only for atmosphere? I'll create a working copy of g-w with these components for testing purposes.
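
One way to collect such a list is to turn the output of `git submodule status` into `path@hash` pairs. The sketch below parses that output from a string; the `submodule_pins` helper, the submodule paths, and the all-letter hashes are placeholders and do not correspond to real commits.

```python
import re

# `git submodule status` prints one line per submodule: a one-character state
# flag (' ', '-', '+' or 'U'), the 40-character SHA-1, the path, and an
# optional description in parentheses.
STATUS_LINE = re.compile(
    r"^(?P<flag>[ +\-U])(?P<sha>[0-9a-f]{40}) (?P<path>\S+)(?: \((?P<desc>.*)\))?$"
)


def submodule_pins(status_text, short=7):
    """Map each submodule path to its abbreviated commit hash."""
    pins = {}
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            pins[match.group("path")] = match.group("sha")[:short]
    return pins


# Placeholder output; real hashes would come from running the git command.
sample_status = (
    " " + "a" * 40 + " sorc/wxflow (heads/develop)\n"
    "+" + "b" * 40 + " sorc/gdas.cd (heads/develop)\n"
)
for path, sha in submodule_pins(sample_status).items():
    print(f"{path}@{sha}")
```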

RussTreadon-NOAA pushed a commit that referenced this pull request Jan 26, 2026
This PR works along with the following three dependent PRs to enable the
EnKF-only configuration for the atmosphere within the global workflow
(see detailed description
NOAA-EMC/global-workflow#4345)

Dependencies:
-NOAA-EMC/global-workflow#4345
-NOAA-EMC/jcb-gdas#219
-NOAA-EMC/jcb#32

Resolve
- NOAA-EMC/global-workflow#4339

---------

Co-authored-by: Bo Huang <[email protected]>
4 participants