-
Notifications
You must be signed in to change notification settings - Fork 8
Introduce Restriction Screening Scripts. Add RSRD/EXPRSRD Extraction and update CMake #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
9a3de71
b3fcd4a
de87926
64fe01c
dc7646a
dae4623
73ee0fc
249e956
ceb7f18
90e3d54
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,364 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from netCDF4 import Dataset | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import numpy as np | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import argparse | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import glob | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import shutil | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from datetime import datetime, timedelta | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OBS_DIM = "Location" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # ---------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Compress consecutive index ranges (used only in exprsrd mode) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # ---------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def compress_ranges(idx_list): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not idx_list: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return [] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges = [] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| start = idx_list[0] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| prev = idx_list[0] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| for x in idx_list[1:]: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if x == prev + 1: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| prev = x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if start == prev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| start = x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| prev = x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if start == prev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | |
| start = x | |
| prev = x | |
| if start == prev: | |
| ranges.append(f"{start}") | |
| else: | |
| ranges.append(f"{start}–{prev}") | |
| ranges.append(f"{start}-{prev}") | |
| start = x | |
| prev = x | |
| if start == prev: | |
| ranges.append(f"{start}") | |
| else: | |
| ranges.append(f"{start}-{prev}") |
Copilot
AI
Mar 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy_group reads every variable fully into memory (data = var_in[:]) even when only a subset of locations is needed. For large IODA files this can cause very high memory use and slow runtimes. Prefer slicing directly from the netCDF variable when OBS_DIM is present (read only the kept indices) instead of loading the full array first.
| data = var_in[:] | |
| if mask is None: | |
| var_out[:] = data | |
| elif OBS_DIM in var_in.dimensions: | |
| if data.ndim == 1: | |
| var_out[:] = data[mask] | |
| else: | |
| var_out[:] = data[mask, ...] | |
| else: | |
| var_out[:] = data | |
| if mask is None: | |
| # No restriction: copy entire variable | |
| var_out[:] = var_in[:] | |
| elif OBS_DIM in var_in.dimensions: | |
| # Restricted copy along OBS_DIM; assume OBS_DIM is the leading dimension | |
| if var_in.ndim == 1: | |
| var_out[:] = var_in[mask] | |
| else: | |
| var_out[:] = var_in[mask, ...] | |
| else: | |
| # Variable does not depend on OBS_DIM: copy entire variable | |
| var_out[:] = var_in[:] |
Copilot
AI
Mar 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When restriction variables are missing, the script copies the input file unchanged into the supposedly filtered output directory. If this tool is used as a restriction screen, passing data through unscreened can defeat the purpose and may leak restricted observations. Consider failing fast (non-zero exit), or at least skipping output / writing an empty filtered file when the screening fields are unavailable.
| print(" Missing restriction variables — copying unchanged.") | |
| copy_entire_file(infile, outfile) | |
| continue | |
| flag = md["restrictionFlag"][:] | |
| exp = md["restrictionExpiration"][:] | |
| if flag.size == 0 or exp.size == 0: | |
| print(" Restriction arrays zero length — copying unchanged.") | |
| copy_entire_file(infile, outfile) | |
| continue | |
| flag_mask = np.ma.getmaskarray(flag) | |
| exp_mask = np.ma.getmaskarray(exp) | |
| mask = flag_mask & exp_mask | |
| print(" Missing restriction variables — writing empty restricted file.") | |
| mask = np.zeros(len(loc_dim), dtype=bool) | |
| else: | |
| flag = md["restrictionFlag"][:] | |
| exp = md["restrictionExpiration"][:] | |
| if flag.size == 0 or exp.size == 0: | |
| print(" Restriction arrays zero length — writing empty restricted file.") | |
| mask = np.zeros(len(loc_dim), dtype=bool) | |
| else: | |
| flag_mask = np.ma.getmaskarray(flag) | |
| exp_mask = np.ma.getmaskarray(exp) | |
| mask = flag_mask & exp_mask |
Copilot
AI
Mar 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above for EXPRSRD mode: if restriction variables are missing the script copies the file unchanged into atmos.us. For a screening/filtering script this is unsafe; prefer skipping output or failing so restricted data can't silently pass through.
| print(" Missing restriction variables — copying unchanged.") | |
| copy_entire_file(infile, outfile) | |
| print(" Missing restriction variables — no output will be written for this file.") |
Copilot
AI
Mar 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restricted_idx is computed but never used, which adds noise and can confuse future maintenance. Remove it or use it for the intended logging/reporting.
| restricted_mask = ~non_restricted_mask | |
| restricted_idx = np.where(restricted_mask)[0] |
Copilot
AI
Mar 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main() unconditionally reads /lfs/h1/ops/prod/config/prodmachinefile and /etc/cluster_name without handling missing/unreadable files. Since this script is copied into the general build bin/, this will crash on non-OPS systems. Consider making this an optional CLI flag/env override, and catch FileNotFoundError/OSError to default to skipping EXPRSRD mode with a clear message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copilot solution is acceptable, i.e. if the machine can't be identified, don't run the filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
copy_restrictiontarget copies into${CMAKE_BINARY_DIR}/bin, but it doesn't ensure that directory exists. Ifoops_FOUNDis false (no executables built) or the bin dir hasn't been created yet, the build can fail at this step. Add a precedingcmake -E make_directory ${CMAKE_BINARY_DIR}/bin(and considercopy_if_differentto avoid unnecessary rebuilds).