Skip to content
Merged
Show file tree
Hide file tree
Changes from 46 commits
Commits
Show all changes
101 commits
Select commit Hold shift + click to select a range
aa7db2e
start of an ecflow def
CoryMartin-NOAA Nov 14, 2025
d370bba
touch a lot of empty ecflow scripts
CoryMartin-NOAA Nov 14, 2025
dbb6ceb
load modules in ecflow scripts
CoryMartin-NOAA Nov 14, 2025
e84b9f3
add prepare obs placeholder
CoryMartin-NOAA Nov 14, 2025
564fc2e
Add gcdas post and other cycles to suite definition file
CoryMartin-NOAA Nov 14, 2025
279030f
End of day commit
CoryMartin-NOAA Nov 14, 2025
48b8553
add gcdas post
CoryMartin-NOAA Nov 17, 2025
4c3ed57
Add in gcafs cycles
CoryMartin-NOAA Nov 17, 2025
c9d5fd8
fix bugs
CoryMartin-NOAA Nov 17, 2025
82ec40c
Merge remote-tracking branch 'origin/develop' into feature/gcafs-ecflow
CoryMartin-NOAA Nov 17, 2025
4171f5c
checkout release branch of gdas
CoryMartin-NOAA Nov 17, 2025
560774d
add obs processing stuff to workflow; still needs work
CoryMartin-NOAA Nov 17, 2025
8dba0c2
update hash
CoryMartin-NOAA Nov 18, 2025
187b63b
update to try on cactus
CoryMartin-NOAA Nov 18, 2025
913581b
fix link_workflow
CoryMartin-NOAA Nov 18, 2025
5383ed6
Add expdir
CoryMartin-NOAA Nov 18, 2025
ab5f856
Merge branch 'NOAA-EMC:develop' into release/gcafs.v0.0.1
CoryMartin-NOAA Nov 19, 2025
b1a9845
Merge branch 'NOAA-EMC:develop' into feature/gcafs-ecflow
CoryMartin-NOAA Nov 19, 2025
87e333f
add some things
CoryMartin-NOAA Nov 19, 2025
20d1501
Merge branch 'feature/gcafs-ecflow' into release/gcafs.v0.0.1
CoryMartin-NOAA Nov 19, 2025
27455f5
Add EOL
CoryMartin-NOAA Nov 19, 2025
40f8bd0
More ecflow changes
CoryMartin-NOAA Nov 21, 2025
f724aaa
more ecflow
CoryMartin-NOAA Nov 21, 2025
03fa9ed
Merge branch 'NOAA-EMC:develop' into feature/gcafs-ecflow
CoryMartin-NOAA Nov 21, 2025
0b89f40
More ecflow changes
CoryMartin-NOAA Nov 21, 2025
fc1624e
more ecflow
CoryMartin-NOAA Nov 21, 2025
c3a9fa4
Merge branch 'feature/gcafs-ecflow' of https://github.com/corymartin-…
CoryMartin-NOAA Nov 21, 2025
45014cb
Merge branch 'develop' into release/gcafs.v0.0.1
CoryMartin-NOAA Nov 21, 2025
e440de0
remove script with annoying code norms
CoryMartin-NOAA Nov 21, 2025
4b8013d
fix issues in suite def file
Nov 21, 2025
eb0da6c
fix issues in suite def file
Nov 21, 2025
2632a3f
update gdas hash
CoryMartin-NOAA Nov 24, 2025
72f403e
add start to setup script
CoryMartin-NOAA Nov 24, 2025
6c1bfa1
save before meetings
CoryMartin-NOAA Nov 24, 2025
955f4ed
commit progress
CoryMartin-NOAA Nov 25, 2025
a97531a
Merge branch 'develop' into feature/gcafs-ecflow
CoryMartin-NOAA Dec 1, 2025
639e7f1
Merge branch 'develop' into release/gcafs.v0.0.1
CoryMartin-NOAA Dec 1, 2025
c8c6e40
Merge branch 'feature/gcafs-ecflow' into release/gcafs.v0.0.1
CoryMartin-NOAA Dec 1, 2025
db4aa13
Merge branch 'develop' into feature/gcafs-ecflow
CoryMartin-NOAA Dec 9, 2025
6416270
Script copies jobs and scripts
CoryMartin-NOAA Dec 9, 2025
537d6ea
Save script
CoryMartin-NOAA Dec 9, 2025
95602fb
Merge branch 'feature/gcafs-ecflow' into release/gcafs.v0.0.1
CoryMartin-NOAA Dec 10, 2025
8c6a176
Merge branch 'dev/gcafs.v1' into release/gcafs.v0.0.1
CoryMartin-NOAA Jan 26, 2026
27bb94d
Update dev/jobs/JGCDAS_PREPARE_OBS
CoryMartin-NOAA Jan 26, 2026
8083a05
Update dev/jobs/JGCDAS_PREPARE_OBS
CoryMartin-NOAA Jan 26, 2026
5e892c7
Update dev/jobs/JGCDAS_PREPARE_OBS
CoryMartin-NOAA Jan 26, 2026
dcb20fb
remove module loads
CoryMartin-NOAA Jan 26, 2026
3e1c671
Merge branch 'feature/more-ecflow' of https://github.com/corymartin-n…
CoryMartin-NOAA Jan 26, 2026
c1bbb8b
remove modules from prepare obs
Jan 26, 2026
7b8ef54
modify ush directory
CoryMartin-NOAA Jan 26, 2026
e0d2bdf
smarter script
CoryMartin-NOAA Jan 26, 2026
862aff7
add support for new job
CoryMartin-NOAA Jan 26, 2026
52bb5f4
change parm config too
CoryMartin-NOAA Jan 26, 2026
5e80bd3
change dev scripts
CoryMartin-NOAA Jan 26, 2026
f201195
more changes
CoryMartin-NOAA Jan 26, 2026
e308073
add parm/config/gcafs
Jan 26, 2026
ba266af
update sdf
Jan 26, 2026
f4c117a
changes for ecf
CoryMartin-NOAA Jan 27, 2026
de1feb6
para not prod
Jan 27, 2026
90adcdc
commit some things
Jan 27, 2026
a95080a
copy ex script
CoryMartin-NOAA Jan 27, 2026
2aaa2fd
load python modules
CoryMartin-NOAA Jan 27, 2026
1dcf75a
more changes
CoryMartin-NOAA Jan 27, 2026
231c04e
more changes again
CoryMartin-NOAA Jan 27, 2026
e543ab7
modify ecf scripts
CoryMartin-NOAA Jan 27, 2026
ff140ee
more modulefile changes
Jan 27, 2026
79d8fc9
more replacements
CoryMartin-NOAA Jan 27, 2026
b1f7975
more fault tolerant on replacements
CoryMartin-NOAA Jan 27, 2026
86a516a
fix more modules
Jan 27, 2026
fcb76be
end of day commit
CoryMartin-NOAA Jan 27, 2026
07f2536
Add more functionality to the script
CoryMartin-NOAA Jan 28, 2026
d45bbfa
add EXPDIR
CoryMartin-NOAA Jan 28, 2026
2d3b8b2
sh to py
CoryMartin-NOAA Jan 28, 2026
cc5ae0c
update resources
CoryMartin-NOAA Jan 28, 2026
a281f63
add some scripts
Jan 28, 2026
c91a6ca
Merge branch 'feature/more-ecflow' of https://github.com/corymartin-n…
Jan 28, 2026
8e9e320
Merge branch 'feature/more-ecflow' of https://github.com/corymartin-n…
CoryMartin-NOAA Jan 28, 2026
73c5f18
more ecf module changes
CoryMartin-NOAA Jan 28, 2026
bdb8923
more hacks
CoryMartin-NOAA Jan 28, 2026
f07a7b2
rename j-job
CoryMartin-NOAA Jan 28, 2026
464dc14
rename back to gdas.cd
CoryMartin-NOAA Jan 29, 2026
54ff3d2
gcdas back to gdas
Jan 29, 2026
676c44c
update hash
CoryMartin-NOAA Jan 29, 2026
80478e1
Merge branch 'feature/more-ecflow' of https://github.com/corymartin-n…
CoryMartin-NOAA Jan 29, 2026
c7863e5
update setup script
CoryMartin-NOAA Jan 30, 2026
b5bc5b7
more small changes
CoryMartin-NOAA Jan 30, 2026
231d08c
some aod prep obs changes
CoryMartin-NOAA Jan 30, 2026
b7c96de
fix regex
CoryMartin-NOAA Jan 30, 2026
4046113
changes for more jobs to run
CoryMartin-NOAA Jan 30, 2026
5cc3b53
fix style
CoryMartin-NOAA Feb 2, 2026
a488fe1
actually fix style
CoryMartin-NOAA Feb 2, 2026
0f20d5c
Apply suggestions from code review
CoryMartin-NOAA Feb 2, 2026
ed62a68
change to using wxflow executable
CoryMartin-NOAA Feb 2, 2026
0e88c37
Merge branch 'feature/more-ecflow' of https://github.com/corymartin-n…
CoryMartin-NOAA Feb 2, 2026
27e4f4f
Addressed comments, STILL NEEDS TESTED, USER BEWARE
CoryMartin-NOAA Feb 3, 2026
9b94a19
Update dev/jobs/JGLOBAL_OFFLINE_ATMOS_ANALYSIS
CoryMartin-NOAA Feb 3, 2026
bbc0b75
fixed a typo
CoryMartin-NOAA Feb 3, 2026
bad8284
fixes for prep_emissions NRT (#12)
bbakernoaa Feb 3, 2026
076bc71
Fix typo in logger usage
CoryMartin-NOAA Feb 4, 2026
a9d52b5
Update GFS template path in config.com
CoryMartin-NOAA Feb 4, 2026
21cb379
Update environment variable declarations in JGLOBAL_OFFLINE_ATMOS_ANA…
CoryMartin-NOAA Feb 4, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 43 additions & 0 deletions dev/jobs/JGCDAS_PREPARE_OBS
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
#! /usr/bin/env bash

source "${HOMEgcafs}/ush/preamble.sh"
source "${HOMEgcafs}/ush/jjob_header.sh"

##############################################
# Set variables used in the script
##############################################

##############################################
# Begin JOB SPECIFIC work
##############################################

###############################################################
# Run relevant script

EXSCRIPT=${DUMPAODPY:-${HOMEgcafs}/scripts/exgcdas_prepare_obs.py}
${EXSCRIPT}
status=$?
if [[ ${status} -ne 0 ]]; then
exit "${status}"
fi

##############################################
# End JOB SPECIFIC work
##############################################

##############################################
# Final processing
##############################################
if [[ -e "${pgmout}" ]]; then
cat "${pgmout}"
fi

##########################################
# Remove the Temporary working directory
##########################################
cd "${DATAROOT}" || exit
if [[ "${KEEPDATA}" == "NO" ]]; then
rm -rf "${DATA}"
fi

exit 0
27 changes: 27 additions & 0 deletions dev/scripts/exgcdas_prepare_obs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
#!/usr/bin/env python3
# exgcdas_prepare_obs.py
# This script will collect and preprocess
# aerosol optical depth observations for
# global aerosol assimilation
import os

from wxflow import AttrDict, Logger, cast_strdict_as_dtypedict, parse_j2yaml
from pyobsforge.task.aero_prepobs import AerosolObsPrep

# Initialize root logger
logger = Logger(level='DEBUG', colored_log=True)


if __name__ == '__main__':

# Take configuration from environment and cast it as python dictionary
config_env = cast_strdict_as_dtypedict(os.environ)
# Take configuration from YAML file to augment/append config dict
config_yaml = parse_j2yaml(os.path.join(config_env['HOMEgcafs'], 'parm', 'chem', 'prepare_obs.yaml'), config_env)
# Combine configs together
config = AttrDict(**config_env, **config_yaml['aoddump'])

aeroObs = AerosolObsPrep(config)
aeroObs.initialize()
aeroObs.execute()
aeroObs.finalize()
4 changes: 2 additions & 2 deletions dev/ush/setup_gcafs_for_nco.py
Original file line number Diff line number Diff line change
Expand Up @@ -278,7 +278,7 @@ def setup_gcafs_for_nco():
for file_path in all_copied_files:
num_replacements = replace_gfs_with_gcafs(file_path)
print(f"Modified {file_path}: {num_replacements} replacements made.")


if __name__ == "__main__":
setup_gcafs_for_nco()
9 changes: 9 additions & 0 deletions parm/chem/prepare_obs.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
aoddump:
provider: VIIRSAOD
platforms: ['npp', 'j01', 'n21'] # note j01==n20
binning_stride: 33
binning_min_number_of_obs: 32
binning_cressman_radius: 25
thinning_threshold: 0
channel: 4
preqc: 0
5 changes: 5 additions & 0 deletions ush/python/pyobsforge/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
import os

__docformat__ = "restructuredtext"
__version__ = "0.1.0"
pyobsforge_directory = os.path.dirname(__file__)
1 change: 1 addition & 0 deletions ush/python/pyobsforge/obsdb/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
from .obsdb import BaseDatabase # noqa
93 changes: 93 additions & 0 deletions ush/python/pyobsforge/obsdb/jrr_aod_db.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
import os
import glob
from datetime import datetime, timedelta
from pyobsforge.obsdb import BaseDatabase


class JrrAodDatabase(BaseDatabase):
"""Class to manage an observation file database for JRR-AOD data."""

def __init__(self, db_name="jrr_aod_obs.db",
dcom_dir="/lfs/h1/ops/prod/dcom/",
obs_dir="jrr_aod"):
base_dir = os.path.join(dcom_dir, '*', obs_dir)
super().__init__(db_name, base_dir)

def create_database(self):
"""
Create the SQLite database and observation files table.

The table `obs_files` contains the following columns:
- `id`: A unique identifier for each record (auto-incremented primary key).
- `filename`: The full path to the observation file (must be unique).
- `obs_time`: The timestamp of the observation, extracted from the filename.
- `receipt_time`: The timestamp when the file was added to the `dcom` directory.
- `satellite`: The satellite from which the observation was collected (e.g., NPP, NOAA-20).
"""
query = """
CREATE TABLE IF NOT EXISTS obs_files (
id INTEGER PRIMARY KEY AUTOINCREMENT,
filename TEXT UNIQUE,
obs_time TIMESTAMP,
receipt_time TIMESTAMP,
satellite TEXT
)
"""
self.execute_query(query)

def parse_filename(self, filename):
"""Extract metadata from filenames matching the JRR-AOD pattern."""
# Make sure the filename matches the expected pattern
# Pattern: JRR-AOD_v3r2_n21_sYYYYMMDDHHMMSS_eYYYYMMDDHHMMSS_cYYYYMMDDHHMMSS.nc
basename = os.path.basename(filename)
parts = basename.split('_')
try:
if len(parts) >= 4 and parts[0] == "JRR-AOD":
obs_time = datetime.strptime(parts[3][1:13], "%Y%m%d%H%M")
receipt_time = datetime.fromtimestamp(os.path.getctime(filename))
satellite = parts[2]
return filename, obs_time, receipt_time, satellite
except ValueError:
return None

return None

def ingest_files(self):
"""Scan the directory for new JRR-AOD observation files and insert them into the database."""
obs_files = glob.glob(os.path.join(self.base_dir, "*.nc"))
print(f"Found {len(obs_files)} new files to ingest")

records_to_insert = []
for file in obs_files:
parsed_data = self.parse_filename(file)
if parsed_data:
records_to_insert.append(parsed_data)

if records_to_insert:
query = """
INSERT INTO obs_files (filename, obs_time, receipt_time, satellite)
VALUES (?, ?, ?, ?)
"""
self.insert_records(query, records_to_insert)


if __name__ == "__main__":
db = JrrAodDatabase(dcom_dir="/home/gvernier/Volumes/hera-s1/runs/realtimeobs/lfs/h1/ops/prod/dcom/")

# Check for new files
db.ingest_files()

# Query files for a given DA cycle
da_cycle = "20250316120000"
window_begin = datetime.strptime(da_cycle, "%Y%m%d%H%M%S") - timedelta(hours=3)
window_end = datetime.strptime(da_cycle, "%Y%m%d%H%M%S") + timedelta(hours=3)

valid_files = db.get_valid_files(window_begin=window_begin,
window_end=window_end)

print(f"Found {len(valid_files)} valid files for DA cycle {da_cycle}")
for valid_file in valid_files:
if os.path.exists(valid_file):
print(f"Valid file: {valid_file}")
else:
print(f"File does not exist: {valid_file}")
144 changes: 144 additions & 0 deletions ush/python/pyobsforge/obsdb/obsdb.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
from logging import getLogger
import sqlite3
from datetime import datetime, timedelta
from wxflow.sqlitedb import SQLiteDB
from wxflow import FileHandler
from os.path import basename, join

logger = getLogger(__name__.split('.')[-1])


class BaseDatabase(SQLiteDB):
"""Base class for managing different types of file-based databases."""

def __init__(self, db_name: str, base_dir: str) -> None:
"""
Initialize the database.

:param db_name: Name of the SQLite database.
:param base_dir: Directory containing observation files.
"""
super().__init__(db_name)
self.base_dir = base_dir
self.create_database()

def create_database(self):
"""Create the SQLite database. Must be implemented by subclasses."""
raise NotImplementedError("Subclasses must implement create_database method")

def get_connection(self):
"""Return the database connection."""
return self.connection

def parse_filename(self):
"""Parse a filename and extract relevant metadata. Must be implemented by subclasses."""
raise NotImplementedError("Subclasses must implement parse_filename method")

def ingest_files(self):
"""Scan the directory for new observation files and insert them into the database."""
raise NotImplementedError("Subclasses must implement ingest_files method")

def insert_record(self, query: str, params: tuple) -> None:
"""Insert a record into the database."""
self.connect()
cursor = self.connection.cursor()
try:
cursor.execute(query, params)
self.connection.commit()
except sqlite3.IntegrityError:
pass # Skip duplicates
finally:
self.disconnect()

def insert_records(self, query: str, params_list: list[tuple]) -> None:
"""
Insert multiple records into the database.

:param query: SQL query for inserting records.
:param params_list: List of tuples containing the parameters for each record.
"""
self.connect()
cursor = self.connection.cursor()
try:
cursor.executemany(query, params_list)
self.connection.commit()
except sqlite3.IntegrityError:
pass # Skip duplicates
finally:
self.disconnect()

def execute_query(self, query: str, params: tuple = None) -> list:
"""Execute a query and return the results."""
self.connect()
cursor = self.connection.cursor()
cursor.execute(query, params or [])
results = cursor.fetchall()
self.disconnect()
return results

def get_valid_files(self,
window_begin: datetime,
window_end: datetime,
dst_dir: str,
instrument: str = None,
satellite: str = None,
obs_type: str = None,
check_receipt: str = "none") -> list:
"""
Retrieve and copy to dst_dir a list of observation files within a specified time window, possibly filtered by instrument,
satellite, and observation type. The check_receipt parameter can be 'gdas', 'gfs', or 'none'. If 'gdas' or
'gfs' is specified, files are further filtered based on their receipt time to ensure they meet the
required delay criteria.

:param window_begin: Start of the time window (datetime object).
:param window_end: End of the time window (datetime object).
:param dst_dir: Destination directory where valid files will be copied.
:param instrument: (Optional) Filter by instrument name.
:param satellite: (Optional) Filter by satellite name.
:param obs_type: (Optional) Filter by observation type.
:param check_receipt: (Optional) Specify receipt time check ('gdas', 'gfs', or 'none').
:return: List of valid observation file paths in the destination directory.
"""

query = """
SELECT filename FROM obs_files
WHERE obs_time BETWEEN ? AND ?
"""
minutes_behind_realtime = {'gdas': 160, 'gfs': 20}
params = [window_begin, window_end]

if instrument:
query += " AND instrument = ?"
params.append(instrument)
if satellite:
query += " AND satellite = ?"
params.append(satellite)
if obs_type:
query += " AND obs_type = ?"
params.append(obs_type)

results = self.execute_query(query, tuple(params))
valid_files = []
for row in results:
filename = row[0]
if check_receipt in ["gdas", "gfs"]:
query = "SELECT receipt_time FROM obs_files WHERE filename = ?"
receipt_time = self.execute_query(query, (filename,))[0][0]
receipt_time = datetime.strptime(receipt_time, "%Y-%m-%d %H:%M:%S.%f")
if receipt_time <= window_end - timedelta(minutes=minutes_behind_realtime[check_receipt]):
continue

valid_files.append(filename)

# Copy files to the destination directory
dst_files = []
if len(valid_files) > 0:
src_dst_obs_list = [] # list of [src_file, dst_file]
for src_file in valid_files:
dst_file = join(dst_dir, f"{basename(src_file)}")
dst_files.append(dst_file)
src_dst_obs_list.append([src_file, dst_file])
FileHandler({'mkdir': [dst_dir]}).sync()
FileHandler({'copy': src_dst_obs_list}).sync()

return dst_files
Empty file.
Loading
Loading