Add Email Notification System for scrontab-Launched Rocoto Workflows #4458
Merged: DavidHuber-NOAA merged 99 commits into NOAA-EMC:develop from AntonMFernando-NOAA:feature/scrontab on Jan 30, 2026.
Commits: 99 (diff view: changes from 84 commits)
| Commit | Message | Author |
|---|---|---|
| b81e3ea | Add email notification system for scrontab-launched Rocoto workflows | AntonMFernando-NOAA |
| bb35450 | update rocoto scripts | AntonMFernando-NOAA |
| 227ece7 | Merge changes into feature/scrontab branch | AntonMFernando-NOAA |
| 0275f6c | Revert "Add email notification system for scrontab-launched Rocoto wo… | AntonMFernando-NOAA |
| 573c451 | update rocoto | AntonMFernando-NOAA |
| 51f6744 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 6a46e53 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 79dac8a | update dev/workflow/rocoto/rocoto_cron_template.sh | AntonMFernando-NOAA |
| 7dcfe25 | update dev/workflow/rocoto/rocoto_cron_template.sh | AntonMFernando-NOAA |
| 7f16b5b | updated for a test | AntonMFernando-NOAA |
| c3cf594 | update dev/workflow/rocoto/rocoto_cron_template.sh | AntonMFernando-NOAA |
| 54f373f | update name | AntonMFernando-NOAA |
| 26bafaa | update dev/workflow/rocoto/rocoto_scron_template.sh | AntonMFernando-NOAA |
| d21db79 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 1cee746 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 07a9f08 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| e9a8772 | remove typo | AntonMFernando-NOAA |
| 31d8ae5 | update ufs submodule | AntonMFernando-NOAA |
| 26b5946 | update ufs_utils submodule | AntonMFernando-NOAA |
| 2d1bf18 | update dev/workflow/rocoto/rocoto_xml.py | AntonMFernando-NOAA |
| 22c3725 | update dev/workflow/rocoto/rocoto_scron_template.sh | AntonMFernando-NOAA |
| 262178c | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| f53c449 | update dev/workflow/rocoto/rocoto_scron_template.sh | AntonMFernando-NOAA |
| c9f2745 | Merge branch 'feature/scrontab' of https://github.com/AntonMFernando-… | AntonMFernando-NOAA |
| 0907055 | shelcheck error fix | AntonMFernando-NOAA |
| e0a5db9 | update dev/workflow/rocoto/rocoto_scron_template.sh | AntonMFernando-NOAA |
| d34feaa | update dev/workflow/rocoto/rocoto_scron_template.sh | AntonMFernando-NOAA |
| 966f0a2 | update dev/workflow/rocoto/rocoto_xml.py | AntonMFernando-NOAA |
| 1556bc5 | Merge branch 'develop' into feature/scrontab | AntonMFernando-NOAA |
| a4d2024 | shellcheck errors | AntonMFernando-NOAA |
| abf88c8 | update to fix shellcheck errors | AntonMFernando-NOAA |
| 4ab3636 | update bash | AntonMFernando-NOAA |
| bb65399 | update dev/workflow/rocoto/rocoto_xml.py | AntonMFernando-NOAA |
| 68af080 | submodule update | AntonMFernando-NOAA |
| 6f010af | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| ad48031 | added template | AntonMFernando-NOAA |
| e1aab1f | Merge branch 'develop' into feature/scrontab | DavidHuber-NOAA |
| 325e44c | Update dev/workflow/rocoto/rocoto_scron.sh.j2 | AntonMFernando-NOAA |
| b12b1f4 | added email settings | AntonMFernando-NOAA |
| bd5eabb | removed typo | AntonMFernando-NOAA |
| 3a5c7e3 | Update submodules to develop branch hashes | AntonMFernando-NOAA |
| 8fb44f9 | Align submodules with upstream develop branch hashes | AntonMFernando-NOAA |
| 90b082c | cleaning | AntonMFernando-NOAA |
| f11e95b | get REPLYTO from env variables | AntonMFernando-NOAA |
| 9d7a644 | update dev/workflow/rocoto/rocoto_xml.py | AntonMFernando-NOAA |
| 219ba1f | typos | AntonMFernando-NOAA |
| e7badd0 | bug | AntonMFernando-NOAA |
| 2d9aec5 | change comment | AntonMFernando-NOAA |
| 447002b | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 8a626f4 | added a message | AntonMFernando-NOAA |
| ba292d9 | add only to scrontab | AntonMFernando-NOAA |
| 49df8de | add scrontab read option | AntonMFernando-NOAA |
| 2ecdaac | update generate_workflow.sh | AntonMFernando-NOAA |
| 55d8149 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| f304a12 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| d13bcfe | update scripts | AntonMFernando-NOAA |
| db30ff2 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 468f9e7 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 5b87973 | updated replyto>mailto | AntonMFernando-NOAA |
| 0e7d161 | Fix crontab MAILTO handling and conditional message display | AntonMFernando-NOAA |
| 158d7df | Add MAILTO to tests.cron regardless of -c flag usage | AntonMFernando-NOAA |
| becde3e | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| a518d80 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 5577920 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 9dddf3a | Suppress MAILTO message when email already exists in crontab | AntonMFernando-NOAA |
| 0ed8c56 | Fix MAILTO detection to match any MAILTO format | AntonMFernando-NOAA |
| 3b2ab13 | update generate_workflows | AntonMFernando-NOAA |
| 9466f15 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 11992ab | typo | AntonMFernando-NOAA |
| efcc25d | Format MAILTO line with consistent 65-character width | AntonMFernando-NOAA |
| e7eb283 | typo | AntonMFernando-NOAA |
| 5851901 | update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 7f8948a | typo | AntonMFernando-NOAA |
| 016b3c4 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| ecf4324 | typo | AntonMFernando-NOAA |
| e2dfd99 | typos | AntonMFernando-NOAA |
| a262543 | shellcheck | AntonMFernando-NOAA |
| ce44ca4 | dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| dbf9d88 | typo | AntonMFernando-NOAA |
| 84f56fe | update dev/workflow/rocoto/rocoto_xml.py | AntonMFernando-NOAA |
| cd59295 | typo | AntonMFernando-NOAA |
| caa4b4e | Merge branch 'develop' into feature/scrontab | AntonMFernando-NOAA |
| 17d038e | delete extra file | AntonMFernando-NOAA |
| 2835d6b | Merge branch 'develop' into feature/scrontab | AntonMFernando-NOAA |
| ad32fd2 | update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| 85440a9 | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
| 9c08d00 | remove comment | AntonMFernando-NOAA |
| 7b5d085 | remove unncessary functions | AntonMFernando-NOAA |
| 2bb573c | update with cron check | AntonMFernando-NOAA |
| d84ae51 | warning conditions changes | AntonMFernando-NOAA |
| bf49fb5 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| ecb5bd6 | Update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| d6aba95 | update warning | AntonMFernando-NOAA |
| 777ae9c | remove all the additional stuff. back to normal. | AntonMFernando-NOAA |
| 8975476 | update dev/workflow/generate_workflows.sh | AntonMFernando-NOAA |
| daf15b4 | update generate_workflow | AntonMFernando-NOAA |
| d82b122 | update parm/globus/init_xfer.sh.j2 | AntonMFernando-NOAA |
| 3fdafff | Merge branch 'develop' into feature/scrontab | AntonMFernando-NOAA |
| d10615d | Merge branch 'NOAA-EMC:develop' into feature/scrontab | AntonMFernando-NOAA |
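Several commits above (for example 9dddf3a and 0ed8c56, "Fix MAILTO detection to match any MAILTO format") deal with detecting whether a crontab already defines MAILTO. The generate_workflows.sh changes themselves are not part of the diff captured here, so the following is only a minimal sketch of such a check under stated assumptions: the helper name has_mailto is hypothetical, and it treats a line as a MAILTO assignment when, after optional leading whitespace, it starts with MAILTO followed by optional whitespace and `=`. Commented-out assignments are deliberately ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the code from generate_workflows.sh): succeeds when
# the crontab text on stdin already assigns MAILTO in a common form, e.g.
#   MAILTO=user@example.com
#     MAILTO = "user@example.com"
# Lines beginning with '#' (commented-out assignments) do not count.
has_mailto() {
    grep -qE '^[[:space:]]*MAILTO[[:space:]]*='
}

# Usage against an in-memory crontab; for the live one you could pipe
# "crontab -l" into has_mailto instead.
sample_crontab=$(printf '%s\n' '# experiment cron' 'MAILTO="user@example.com"' '*/5 * * * * /path/to/job.sh')
if has_mailto <<< "${sample_crontab}"; then
    echo "MAILTO already set; no message needed"
fi
```

Anchoring the pattern at the start of the line is what keeps a MAILTO string that merely appears inside a scheduled command from being mistaken for an assignment.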
New file (`@@ -0,0 +1,73 @@`):

```shell
#! /usr/bin/env bash
source "{{ HOMEgfs }}/dev/ush/gw_setup.sh"

# Run rocotorun
bash -c "{{ rocotorunstr }}"

# Monitor for failed jobs using rocotostat
LOCKFILE="{{ expdir }}/.failed_jobs.lock"
ROCOTOSTAT=$(command -v rocotostat)

if [[ -n "${ROCOTOSTAT}" ]]; then
    FAILED_JOBS=$(${ROCOTOSTAT} -d "{{ expdir }}/{{ pslot }}.db" -w "{{ expdir }}/{{ pslot }}.xml" -c all 2> /dev/null | grep -E 'DEAD')

    if [[ -n "${FAILED_JOBS}" ]]; then
        # Read previously reported failures
        PREV_FAILED=""
        if [[ -f "${LOCKFILE}" ]]; then
            PREV_FAILED=$(cat "${LOCKFILE}")
        fi

        # Check for NEW failures only (not just changes)
        NEW_FAILURES=""
        while IFS= read -r job; do
            if [[ -n "${job}" ]] && ! echo "${PREV_FAILED}" | grep -qF "${job}"; then
                NEW_FAILURES="${NEW_FAILURES}${job}"$'\n'
            fi
        done <<< "${FAILED_JOBS}"

        # Send email only if there are NEW failures
        if [[ -n "${NEW_FAILURES}" ]]; then
            MSGFILE="/tmp/rocoto_fail_msg_$$.txt"
            {
                echo "The following jobs have failed in experiment {{ pslot }} on ${MACHINE_ID}:"
                echo ""

                # Format each failed job with detailed information
                while IFS= read -r line; do
                    if [[ -n "${line}" ]]; then
                        # Parse rocotostat output: Cycle Task JobID State Try MaxTries Duration
                        read -r cycle task jobid state try maxtries duration <<< "${line}"
                        # Extract YYYYMMDDHH from cycle (first 10 characters)
                        cycle_short=${cycle:0:10}
                        # Get current timestamp
                        timestamp=$(date -u '+%m/%d/%y %H:%M:%S UTC')

                        # Format similar to user's example
                        echo "${timestamp} :: {{ pslot }}.xml :: Cycle ${cycle}, Task ${task}, jobid=${jobid}, in state ${state}, ran for ${duration} seconds, try=${try} (of ${maxtries})"
                        echo "Check log: {{ comroot }}/{{ pslot }}/logs/${cycle_short}/${task}.log"
                        echo ""
                    fi
                done <<< "${NEW_FAILURES}"
            } > "${MSGFILE}"

            # Try to send email
            EMAIL="{{ mailto }}"
            hostname_domain=$(hostname -d)
            FROM_EMAIL="no-reply@${hostname_domain}"
            if [[ "${EMAIL}" != "None" ]] && command -v mail &> /dev/null; then
                # On Gaea, the mail utility requires the -v (verbose) flag to ensure delivery.
                # To avoid receiving verbose output as an actual email, a spoofed 'from' address is used for notifications.
                mail -r "${FROM_EMAIL}" -v -s "[{{ pslot }}] Workflow Job Failures Detected" "${EMAIL}" < "${MSGFILE}" 2>&1
            fi

            rm -f "${MSGFILE}"
        fi

        # Always update lockfile to reflect current failures
        echo "${FAILED_JOBS}" > "${LOCKFILE}"
    else
        # No failures, remove lockfile if it exists
        [[ -f "${LOCKFILE}" ]] && rm -f "${LOCKFILE}"
    fi
fi
```
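The new-failure filter is the part of this template that benefits most from being testable without Rocoto. Below is a minimal sketch that lifts that loop into a function; the name new_failures and the sample rows are made up for illustration. The matching mirrors the template's fixed-string `grep -qF`, which is a substring test: a row counts as already reported if it occurs anywhere in the previous list.

```shell
#!/usr/bin/env bash
# Sketch of the template's "report only NEW failures" loop as a function.
# new_failures is a hypothetical name; the template inlines this logic.
#   $1: DEAD rows from the current rocotostat call, one per line
#   $2: rows saved in the lockfile by the previous run (may be empty)
# Prints every non-empty row of $1 that does not occur in $2.
new_failures() {
    local current="$1" previous="$2" job
    while IFS= read -r job; do
        # Quiet (-q) fixed-string (-F) search, as in the template; "--" guards
        # against a row that happens to start with "-".
        if [[ -n "${job}" ]] && ! grep -qF -- "${job}" <<< "${previous}"; then
            printf '%s\n' "${job}"
        fi
    done <<< "${current}"
}

# Made-up rows: gfsfcst was already reported, gfsanal is new.
previous='202401010000  gfsfcst  111  DEAD'
current=$'202401010000  gfsfcst  111  DEAD\n202401010600  gfsanal  222  DEAD'
new_failures "${current}" "${previous}"   # prints only the gfsanal row
```

Because the template overwrites the lockfile with the full current list on every run that sees failures, and deletes it when nothing is DEAD, a job that is repaired and observed in a non-DEAD state drops out of the lockfile, so a later failure of the same task is reported again.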
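The message-formatting step of the template splits each DEAD row into fields with `read`. The sketch below isolates that step under stated assumptions: format_failure and the sample row are hypothetical, and it assumes rocotostat's default column order CYCLE, TASK, JOBID, STATE, EXIT STATUS, TRIES, DURATION, plus the `<comroot>/<pslot>/logs/<YYYYMMDDHH>/<task>.log` layout that the template prints. If that column order holds on your Rocoto version, the fifth field is the exit status rather than a try counter, so the template's `try` and `maxtries` variables would receive the exit status and the number of tries. The sketch names the fields accordingly; check real rocotostat output before relying on either version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one DEAD row of rocotostat output into a summary
# line and a log-path line, similar to the template's email body.
# ASSUMPTION: columns are CYCLE TASK JOBID STATE EXIT_STATUS TRIES DURATION.
# ASSUMPTION: logs live under <exproot>/logs/<YYYYMMDDHH>/<task>.log.
format_failure() {
    local line="$1" exproot="$2"
    local cycle task jobid state exit_status tries duration
    read -r cycle task jobid state exit_status tries duration <<< "${line}"
    printf 'Cycle %s, Task %s, jobid=%s, in state %s, exit status %s, tries=%s, ran for %s seconds\n' \
        "${cycle}" "${task}" "${jobid}" "${state}" "${exit_status}" "${tries}" "${duration}"
    # Log directories use YYYYMMDDHH, i.e. the first 10 characters of the cycle
    printf 'Check log: %s/logs/%s/%s.log\n' "${exproot}" "${cycle:0:10}" "${task}"
}

format_failure "202401010000  gfsfcst  1234567  DEAD  1  2  360.0" "/path/to/comroot/mypslot"
```

Keeping the field names tied to the column headers, rather than to what the message should say, makes a mismatch between the assumed layout and the real output easier to spot.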