-
Notifications
You must be signed in to change notification settings - Fork 8
Introduce Restriction Screening Scripts. Add RSRD/EXPRSRD Extraction and update CMake #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
9a3de71
b3fcd4a
de87926
64fe01c
dc7646a
dae4623
73ee0fc
249e956
ceb7f18
90e3d54
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,341 @@ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| #!/usr/bin/env python3 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from netCDF4 import Dataset | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import numpy as np | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import argparse | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import yaml | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import glob | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from datetime import datetime, timedelta | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| OBS_DIM = "Location" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # ---------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Compress consecutive index ranges (used only in exprsrd mode) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # ---------------------------------------------------------------------- | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def compress_ranges(idx_list): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if not idx_list: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return [] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges = [] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| start = prev = idx_list[0] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think python is a bit finicky with this sort of thing, and it might change prev when you change start, unless you do a deep copy |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| for x in idx_list[1:]: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if x == prev + 1: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| prev = x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if start == prev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| start = prev = x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| if start == prev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| else: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ranges.append(f"{start}–{prev}") | |
| start = prev = x | |
| if start == prev: | |
| ranges.append(f"{start}") | |
| else: | |
| ranges.append(f"{start}–{prev}") | |
| ranges.append(f"{start}-{prev}") | |
| start = prev = x | |
| if start == prev: | |
| ranges.append(f"{start}") | |
| else: | |
| ranges.append(f"{start}-{prev}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this suggestion
Copilot
AI
Mar 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy_group reads every variable fully into memory (data = var_in[:]) even when only a subset of locations is needed. For large IODA files this can cause very high memory use and slow runtimes. Prefer slicing directly from the netCDF variable when OBS_DIM is present (read only the kept indices) instead of loading the full array first.
| data = var_in[:] | |
| if mask is None: | |
| var_out[:] = data | |
| elif OBS_DIM in var_in.dimensions: | |
| if data.ndim == 1: | |
| var_out[:] = data[mask] | |
| else: | |
| var_out[:] = data[mask, ...] | |
| else: | |
| var_out[:] = data | |
| if mask is None: | |
| # No restriction: copy entire variable | |
| var_out[:] = var_in[:] | |
| elif OBS_DIM in var_in.dimensions: | |
| # Restricted copy along OBS_DIM; assume OBS_DIM is the leading dimension | |
| if var_in.ndim == 1: | |
| var_out[:] = var_in[mask] | |
| else: | |
| var_out[:] = var_in[mask, ...] | |
| else: | |
| # Variable does not depend on OBS_DIM: copy entire variable | |
| var_out[:] = var_in[:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be overkill. If you just are copying an entire file, use shutil.copy and just copy it, no need to use the netCDF library
Copilot
AI
Mar 27, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When restriction variables are missing, the script copies the input file unchanged into the supposedly filtered output directory. If this tool is used as a restriction screen, passing data through unscreened can defeat the purpose and may leak restricted observations. Consider failing fast (non-zero exit), or at least skipping output / writing an empty filtered file when the screening fields are unavailable.
| print(" Missing restriction variables — copying unchanged.") | |
| copy_entire_file(infile, outfile) | |
| continue | |
| flag = md["restrictionFlag"][:] | |
| exp = md["restrictionExpiration"][:] | |
| if flag.size == 0 or exp.size == 0: | |
| print(" Restriction arrays zero length — copying unchanged.") | |
| copy_entire_file(infile, outfile) | |
| continue | |
| flag_mask = np.ma.getmaskarray(flag) | |
| exp_mask = np.ma.getmaskarray(exp) | |
| mask = flag_mask & exp_mask | |
| print(" Missing restriction variables — writing empty restricted file.") | |
| mask = np.zeros(len(loc_dim), dtype=bool) | |
| else: | |
| flag = md["restrictionFlag"][:] | |
| exp = md["restrictionExpiration"][:] | |
| if flag.size == 0 or exp.size == 0: | |
| print(" Restriction arrays zero length — writing empty restricted file.") | |
| mask = np.zeros(len(loc_dim), dtype=bool) | |
| else: | |
| flag_mask = np.ma.getmaskarray(flag) | |
| exp_mask = np.ma.getmaskarray(exp) | |
| mask = flag_mask & exp_mask |
Copilot
AI
Mar 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mask = flag.mask & exp.mask has the same scalar-mask problem as elsewhere: if either variable has no masked values, .mask can be a scalar bool and len(mask)/np.sum(mask) will not behave as intended. Normalize to boolean arrays (e.g., via np.ma.getmaskarray) before computing mask and counting kept/dropped.
| mask = flag.mask & exp.mask | |
| total = len(mask) | |
| kept = np.sum(mask) | |
| flag_mask = np.ma.getmaskarray(flag) | |
| exp_mask = np.ma.getmaskarray(exp) | |
| mask = flag_mask & exp_mask | |
| total = mask.size | |
| kept = int(mask.sum()) |
Copilot
AI
Mar 23, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restricted_idx is computed but never used, which adds noise and can confuse future maintenance. Remove it or use it for the intended logging/reporting.
| restricted_mask = ~non_restricted_mask | |
| restricted_idx = np.where(restricted_mask)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
copy_restrictiontarget copies into${CMAKE_BINARY_DIR}/bin, but it doesn't ensure that directory exists. Ifoops_FOUNDis false (no executables built) or the bin dir hasn't been created yet, the build can fail at this step. Add a precedingcmake -E make_directory ${CMAKE_BINARY_DIR}/bin(and considercopy_if_differentto avoid unnecessary rebuilds).