
Link gsi diag.* directories instead of moving them and enable Gaea emails in v17#4574

Merged
DavidHuber-NOAA merged 6 commits into NOAA-EMC:dev/gfs.v17 from DavidHuber-NOAA:feature/gfsv17_link_diags on Feb 26, 2026

Conversation

@DavidHuber-NOAA
Contributor

@DavidHuber-NOAA DavidHuber-NOAA commented Feb 20, 2026

Description

This PR cherry-picks PRs #4545 and #4458 into dev/gfs.v17. It:

  • fixes a bug that prevents rerunning atmospheric analyses (gdas_anal, gfs_anal, or enkfgdas_eobs)
  • moves to an improved, NCO-approved method of linking the GSI diag.* directories instead of moving them at the end of the jobs
  • enables scrontab emailing on Gaea C6
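As a rough illustration of why linking is friendlier to reruns than moving, here is a minimal Python sketch. It is not the code from #4545; the helper name `link_diag_dir`, the directory names `rundir` and `com`, and the example name `diag.example` are all hypothetical. The idea shown is that the real directory lives at the destination, the working directory only holds a symlink, and the link step is idempotent so a second attempt does not trip over leftovers from the first.

```python
import tempfile
from pathlib import Path


def link_diag_dir(work_dir: Path, dest_dir: Path, name: str) -> Path:
    """Make work_dir/name a symlink to dest_dir/name, safely on reruns.

    Hypothetical helper: the names and layout are assumptions for illustration.
    """
    target = dest_dir / name
    # The real directory lives at the destination and survives reruns.
    target.mkdir(parents=True, exist_ok=True)

    link = work_dir / name
    if link.is_symlink():
        # Stale link from a previous attempt: drop only the link, not the data.
        link.unlink()
    elif link.exists():
        # A real directory here would be silently shadowed, so refuse instead.
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link


def demo() -> dict:
    """Link, write through the link, then link again to simulate a rerun."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "rundir"
        dest = Path(tmp) / "com"
        work.mkdir()
        name = "diag.example"
        link_diag_dir(work, dest, name)
        (work / name / "obs.txt").write_text("written through the link\n")
        link = link_diag_dir(work, dest, name)  # simulated rerun
        return {
            "is_symlink": link.is_symlink(),
            "same_target": link.resolve() == (dest / name).resolve(),
            "kept_file": (dest / name / "obs.txt").is_file(),
        }


result = demo()
```

With a move-at-the-end approach, a rerun has to cope with the destination already being populated; with a link, files are written in place once and the rerun only needs to refresh the link.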

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this change expected to change outputs? NO
  • Is this a breaking change (a change in existing functionality)? NO
  • Does this change require a documentation update? NO
  • Does this change require an update to any of the following submodules? NO

How has this been tested?

Testing is in progress on C6.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added

AntonMFernando-NOAA and others added 2 commits February 20, 2026 12:23
…OAA-EMC#4458)

Implements automated email notifications when jobs fail in
scrontab-launched Rocoto experiments
DavidHuber-NOAA changed the title from "Link gsi diag.* directories instead of moving them in v17" to "Link gsi diag.* directories instead of moving them and enable Gaea emails in v17" on Feb 20, 2026
@CatherineThomas-NOAA
Contributor

@DavidHuber-NOAA - I built your branch on WCOSS2 and set up a test, but something went wrong with the crontab creation. The crontab pointed to a cron.sh script, but no script was created. I'm guessing it's related to the wxflow changes bundled here, as if the Gaea and WCOSS2 methods were mixed.

@DavidHuber-NOAA
Contributor Author

It looks like I missed hotfix PR #4497. This is why it worked on C6 and not WCOSS2. I'll get it added and verify the cron runs correctly on WCOSS2.

- On systems using regular crontab, the workflow generation creates a
crontab entry that references a `.cron.sh` script that doesn't exist.
This causes cron jobs to fail with "script not found" errors.
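The failure mode described above (a crontab entry pointing at a script that was never written) can be caught before the entry is installed. The following is a minimal sketch of such a check, not the fix from #4497. The helper `missing_cron_scripts` is hypothetical, and it assumes classic five-field schedules and commands invoked by an absolute path, so entries such as `@reboot ...` or `bash script.sh` are outside its scope.

```python
import shlex
import tempfile
from pathlib import Path


def missing_cron_scripts(crontab_text: str) -> list:
    """Return the commands named in crontab entries that are not existing files.

    Hypothetical helper: assumes five schedule fields followed by a command
    whose first word is a path to the script.
    """
    missing = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # e.g. environment settings such as MAILTO=...
        command = shlex.split(fields[5])
        if command and not Path(command[0]).is_file():
            missing.append(command[0])
    return missing


def demo() -> list:
    """Build a crontab with one script that exists and one that does not."""
    with tempfile.TemporaryDirectory() as tmp:
        present = Path(tmp) / "present.sh"
        present.write_text("#!/bin/bash\n")
        absent = Path(tmp) / ".cron.sh"  # referenced below but never created
        crontab = (
            "MAILTO=user@example.com\n"
            f"*/5 * * * * {present}\n"
            f"*/5 * * * * {absent} >> /dev/null 2>&1\n"
        )
        return [Path(p).name for p in missing_cron_scripts(crontab)]


missing = demo()
```

Running a check like this at workflow-generation time would turn a silent "script not found" at cron time into an immediate, visible error.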
@DavidHuber-NOAA
Contributor Author

@CatherineThomas-NOAA after cherry-picking #4497, the crontab populated correctly and the jobs are running on WCOSS2. I'll update when they complete.

DavidHuber-NOAA added the CI-Wcoss2-Running label ("CI testing on WCOSS for this PR is in-progress") on Feb 23, 2026
@DavidHuber-NOAA
Contributor Author

All tests passed on WCOSS2 except the C96_atm3DVar_extended case, which failed on the last cycle for job gfs_atmos_prod_f363-f384.log with the following message from wgrib2:

+ exglobal_atmos_products.sh[3]cd /lfs/h2/emc/stmp/david.huber/RUNDIRS/C96_atm3DVar_extended_email/gfs.2021122118/atmos_products_f363.2965106
.....
+ exglobal_atmos_products.sh[108] /apps/ops/para/libs/intel/19.1.3.304/wgrib2/2.0.8/bin/wgrib2 tmpfilea_f363 -for 595:625 -grib tmpfilea_f363_20

*** FATAL ERROR: Could not open tmpfilea_f363_20 ***

+ exglobal_atmos_products.sh[109]export err=8
+ exglobal_atmos_products.sh[109]err=8
+ exglobal_atmos_products.sh[110][[ 8 -ne 0 ]] 
+ exglobal_atmos_products.sh[111]err_exit 'wgrib2 failed to geneate an intermediate grib2 file from tmpfilea_f363 records 595 to 625'

I reran this command on the still-available /lfs/h2/emc/stmp/david.huber/RUNDIRS/C96_atm3DVar_extended_email/gfs.2021122118/atmos_products_f363.2965106/tmpfilea_f363 file and received no error message. I think this may have been a transient I/O issue. Unfortunately, with the production switch happening today, I cannot relaunch the job.

@CatherineThomas-NOAA when Dogwood comes up for development, would you like me to rerun this test?
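The manual check described in this comment (rerun the failed command against the same input and see whether the error reproduces) can be scripted. This is a minimal sketch and not part of the PR: the helpers `rerun_exit_codes` and `looks_transient` are hypothetical, and the demo uses the Python interpreter as a stand-in command so it runs anywhere. In practice the command list would be the wgrib2 invocation from the log, run in the job's working directory.

```python
import subprocess
import sys
from typing import List, Optional


def rerun_exit_codes(cmd: List[str], attempts: int = 3,
                     cwd: Optional[str] = None) -> List[int]:
    """Run cmd several times and collect the exit code of each attempt."""
    codes = []
    for _ in range(attempts):
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        codes.append(proc.returncode)
    return codes


def looks_transient(original_code: int, rerun_codes: List[int]) -> bool:
    """A failure that never reproduces on rerun is a candidate transient."""
    return original_code != 0 and all(code == 0 for code in rerun_codes)


# Stand-in commands (assumptions for the demo): one always succeeds,
# one always exits with status 8, the value of err in the log above.
ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
bad_cmd = [sys.executable, "-c", "raise SystemExit(8)"]

ok_codes = rerun_exit_codes(ok_cmd)
bad_codes = rerun_exit_codes(bad_cmd)
```

A clean set of reruns does not prove the original failure was an I/O hiccup; it only shows that the inputs and the command are sound, which is the conclusion drawn in the comment.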

@CatherineThomas-NOAA
Contributor

@DavidHuber-NOAA - I had a similar error for CI tests of my own last night, but I also had a job with a more explicit "stmp disk quota exceeded" error. If the other cycles completed cleanly, this is good enough for me.

I also ran my v17 lowres test to compare against a control and poke around myself. Results are identical and everything looks good.

DavidHuber-NOAA added the CI-Wcoss2-Passed label ("CI testing on WCOSS for this PR has completed successfully") and removed the CI-Wcoss2-Running label ("CI testing on WCOSS for this PR is in-progress") on Feb 24, 2026
@DavidHuber-NOAA
Contributor Author

@JessicaMeixner-NOAA @CatherineThomas-NOAA @RuiyuSun Please let me know when you are ready for this PR to be merged.

DavidHuber-NOAA merged commit 1e3f9c8 into NOAA-EMC:dev/gfs.v17 on Feb 26, 2026
7 checks passed
