Introduce Restriction Screening Scripts. Add RSRD/EXPRSRD Extraction and update CMake #73

Merged
CoryMartin-NOAA merged 10 commits into develop from feature/restriction
Apr 3, 2026

Conversation

@HyundeokChoi-NOAA
Contributor

This PR adds full support for extracting and screening observation‑level restriction metadata.
It adds two operational scripts for extracting and filtering restricted data.

Contributor

Copilot AI left a comment

Pull request overview

Adds two Python utilities under src/ioda-restrict/ to post-process IODA NetCDF observation files based on restriction metadata, and wires them into the build so they are copied into the build bin/ directory.

Changes:

  • Add check_rsrd.py to filter/copy NetCDF files based on restrictionFlag / restrictionExpiration metadata.
  • Add check_exprsrd.py to extract non-restricted observations from a “previous 48h” directory and write filtered outputs.
  • Update src/CMakeLists.txt to copy both scripts into ${CMAKE_BINARY_DIR}/bin.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File | Description
src/ioda-restrict/check_rsrd.py | New NetCDF filtering/copy tool based on restriction metadata.
src/ioda-restrict/check_exprsrd.py | New tool to compute non-restricted mask (incl. EXPRSRD logic) and write filtered NetCDF outputs.
src/CMakeLists.txt | Adds a custom target to copy the new scripts into the build bin/ directory.

Comment on lines +213 to +215
for i in range(len(flag)):
    fval = None if flag.mask[i] else int(flag[i])
    eval = None if exp.mask[i] else int(exp[i])
Comment thread src/ioda-restrict/check_rsrd.py Outdated
Comment on lines +57 to +61
# Recursively copy subgroups
for grp_name, grp_in in in_group.groups.items():
    grp_out = out_group.createGroup(grp_name)
    copy_group(grp_in, grp_out, mask)

Comment thread src/ioda-restrict/ioda_restriction_filter.py
Comment on lines +209 to +213
print(" Unique RSRD / EXPRSRD patterns:")

unique_groups = {}

for i in range(len(flag)):
Comment thread src/ioda-restrict/check_rsrd.py Outdated
Comment on lines +156 to +160
mask = flag.mask & exp.mask

total = len(mask)
kept = np.sum(mask)
dropped = total - kept
Comment thread src/ioda-restrict/check_rsrd.py Outdated
Comment on lines +27 to +35
if fill_value is not None:
    var_out = out_group.createVariable(
        var_name, var_in.dtype, var_in.dimensions, fill_value=fill_value
    )
else:
    var_out = out_group.createVariable(
        var_name, var_in.dtype, var_in.dimensions
    )

Comment thread src/CMakeLists.txt Outdated
Comment on lines +17 to +23
add_custom_target(copy_restriction ALL
    COMMAND ${CMAKE_COMMAND} -E copy
        ${CMAKE_CURRENT_SOURCE_DIR}/ioda-restrict/check_rsrd.py
        ${CMAKE_BINARY_DIR}/bin/check_rsrd.py
    COMMAND ${CMAKE_COMMAND} -E copy
        ${CMAKE_CURRENT_SOURCE_DIR}/ioda-restrict/check_exprsrd.py
        ${CMAKE_BINARY_DIR}/bin/check_exprsrd.py
Comment thread src/ioda-restrict/ioda_restriction_filter.py
Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

The two check scripts seem to be similar. Could they be combined into one?

if not idx_list:
    return []
ranges = []
start = prev = idx_list[0]
Contributor

I think Python is a bit finicky with this sort of thing, and it might change prev when you change start, unless you do a deep copy.
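
For reference, a quick check of the concern above. This is only a sketch with made-up index values, and it assumes idx_list holds plain integers. Integers are immutable, so rebinding start should leave prev untouched; the aliasing worry applies to mutable objects that are modified in place.

```python
# Hypothetical index values, used only for illustration.
idx_list = [3, 4, 7]

# Chained assignment binds both names to the same immutable int.
start = prev = idx_list[0]
start = idx_list[2]  # rebinds start only; prev still refers to 3
print(start, prev)

# Aliasing only shows up when a shared mutable object is changed in place.
a = b = [1, 2]
a.append(3)  # mutates the one list that both names refer to
print(a, b)
```

Under that assumption, no deep copy should be needed for start and prev.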

@HyundeokChoi-NOAA
Contributor Author

The two check scripts seem to be similar. Could they be combined into one?

Good point. I will dig into both scripts and see how they can be merged into one.

# ----------------------------------------------------------------------
# Copy entire file unchanged
# ----------------------------------------------------------------------
def copy_entire_file(nc_in, outfile):
Contributor

I think this might be overkill. If you are just copying an entire file, use shutil.copy; there is no need to use the netCDF library.
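
A minimal sketch of what that could look like, assuming copy_entire_file is given input and output paths rather than an open dataset (the thread shows both call styles). The temporary file below is a stand-in, not a real NetCDF file.

```python
import shutil
import tempfile
from pathlib import Path


def copy_entire_file(infile, outfile):
    """Copy infile to outfile byte for byte, creating the parent directory if needed."""
    outfile = Path(outfile)
    outfile.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(infile, outfile)


# Demonstration with a throwaway file (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "obs.nc"
    src.write_bytes(b"stand-in for NetCDF bytes")
    dst = Path(tmp) / "out" / "obs.nc"
    copy_entire_file(src, dst)
    identical = dst.read_bytes() == src.read_bytes()

print(identical)
```

A plain file copy also keeps the file exactly as written, without re-encoding variables or attributes.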

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new operational workflow for extracting and screening observation-level restriction metadata from IODA NetCDF files, and makes the script available from the build tree.

Changes:

  • Add ioda_restriction_filter.py to run both RSRD (current cycle) and EXPRSRD (previous 48h cycle) filtering on directories of .nc files.
  • Add a CMake custom target to copy the restriction filtering script into build/bin/.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File | Description
src/ioda-restrict/ioda_restriction_filter.py | New Python script implementing directory-level filtering and NetCDF group/variable copying for restriction screening.
src/CMakeLists.txt | Adds an always-built custom target to copy the restriction filtering script into the build’s bin/ directory.

Comment thread src/ioda-restrict/ioda_restriction_filter.py
Comment on lines +189 to +192
mask = flag.mask & exp.mask

total = len(mask)
kept = np.sum(mask)
Copilot AI Mar 23, 2026

mask = flag.mask & exp.mask has the same scalar-mask problem as elsewhere: if either variable has no masked values, .mask can be a scalar bool and len(mask)/np.sum(mask) will not behave as intended. Normalize to boolean arrays (e.g., via np.ma.getmaskarray) before computing mask and counting kept/dropped.

Suggested change
- mask = flag.mask & exp.mask
- total = len(mask)
- kept = np.sum(mask)
+ flag_mask = np.ma.getmaskarray(flag)
+ exp_mask = np.ma.getmaskarray(exp)
+ mask = flag_mask & exp_mask
+ total = mask.size
+ kept = int(mask.sum())
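
To illustrate the scalar-mask case, here is a small sketch using hand-built masked arrays (the values are invented, not read from an IODA file). When nothing is masked, .mask is the scalar np.ma.nomask, so combining two such masks yields a zero-dimensional value, while np.ma.getmaskarray always returns a full boolean array.

```python
import numpy as np

# Two masked arrays with no masked entries (illustrative values only).
flag = np.ma.masked_array([1, 0, 2])
exp = np.ma.masked_array([5, 6, 7])

# Both .mask attributes are the scalar nomask, so the result is 0-d
# and len(raw) would raise TypeError.
raw = flag.mask & exp.mask
print(np.ndim(raw))

# getmaskarray expands to one boolean per element, so counting works.
mask = np.ma.getmaskarray(flag) & np.ma.getmaskarray(exp)
total = mask.size
kept = int(mask.sum())
print(mask.shape, total, kept)
```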

Comment on lines +54 to +64
data = var_in[:]

if mask is None:
    var_out[:] = data
elif OBS_DIM in var_in.dimensions:
    if data.ndim == 1:
        var_out[:] = data[mask]
    else:
        var_out[:] = data[mask, ...]
else:
    var_out[:] = data
Copilot AI Mar 23, 2026

copy_group reads every variable fully into memory (data = var_in[:]) even when only a subset of locations is needed. For large IODA files this can cause very high memory use and slow runtimes. Prefer slicing directly from the netCDF variable when OBS_DIM is present (read only the kept indices) instead of loading the full array first.

Suggested change
- data = var_in[:]
- if mask is None:
-     var_out[:] = data
- elif OBS_DIM in var_in.dimensions:
-     if data.ndim == 1:
-         var_out[:] = data[mask]
-     else:
-         var_out[:] = data[mask, ...]
- else:
-     var_out[:] = data
+ if mask is None:
+     # No restriction: copy entire variable
+     var_out[:] = var_in[:]
+ elif OBS_DIM in var_in.dimensions:
+     # Restricted copy along OBS_DIM; assume OBS_DIM is the leading dimension
+     if var_in.ndim == 1:
+         var_out[:] = var_in[mask]
+     else:
+         var_out[:] = var_in[mask, ...]
+ else:
+     # Variable does not depend on OBS_DIM: copy entire variable
+     var_out[:] = var_in[:]

Comment thread src/ioda-restrict/ioda_restriction_filter.py
Comment on lines +254 to +255
restricted_mask = ~non_restricted_mask
restricted_idx = np.where(restricted_mask)[0]
Copilot AI Mar 23, 2026

restricted_idx is computed but never used, which adds noise and can confuse future maintenance. Remove it or use it for the intended logging/reporting.

Suggested change
restricted_mask = ~non_restricted_mask
restricted_idx = np.where(restricted_mask)[0]

Comment on lines +29 to +34
ranges.append(f"{start}–{prev}")
start = prev = x
if start == prev:
ranges.append(f"{start}")
else:
ranges.append(f"{start}–{prev}")
Copilot AI Mar 23, 2026

compress_ranges formats ranges using a Unicode en-dash ("–"). This can cause encoding/display issues in some logs/terminals and makes grepping harder. Prefer an ASCII hyphen ("-") for portability.

Suggested change
- ranges.append(f"{start}–{prev}")
- start = prev = x
- if start == prev:
- ranges.append(f"{start}")
- else:
- ranges.append(f"{start}–{prev}")
+ ranges.append(f"{start}-{prev}")
+ start = prev = x
+ if start == prev:
+ ranges.append(f"{start}")
+ else:
+ ranges.append(f"{start}-{prev}")

Contributor

I agree with this suggestion
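
For context, a sketch of how compress_ranges might look with the ASCII hyphen. Only the first and last few lines of the function appear in this thread, so the loop body here is a reconstruction and may differ from the merged script. It assumes idx_list is a sorted list of integers.

```python
def compress_ranges(idx_list):
    """Collapse sorted integer indices into strings such as '0-2' and '5'."""
    if not idx_list:
        return []
    ranges = []
    start = prev = idx_list[0]
    for x in idx_list[1:]:
        if x == prev + 1:
            # Still inside a consecutive run: extend it.
            prev = x
            continue
        # Run ended: record it with an ASCII hyphen, then start a new run.
        ranges.append(f"{start}" if start == prev else f"{start}-{prev}")
        start = prev = x
    # Record the final run.
    ranges.append(f"{start}" if start == prev else f"{start}-{prev}")
    return ranges


print(compress_ranges([0, 1, 2, 5, 7, 8]))  # ['0-2', '5', '7-8']
```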

Comment thread src/CMakeLists.txt Outdated
# copy unified restriction script into build/bin/
# ----------------------------------------------------------------------
add_custom_target(copy_restriction ALL
COMMAND ${CMAKE_COMMAND} -E copy
Copilot AI Mar 23, 2026

The copy_restriction target copies into ${CMAKE_BINARY_DIR}/bin, but it doesn't ensure that directory exists. If oops_FOUND is false (no executables built) or the bin dir hasn't been created yet, the build can fail at this step. Add a preceding cmake -E make_directory ${CMAKE_BINARY_DIR}/bin (and consider copy_if_different to avoid unnecessary rebuilds).

Suggested change
- COMMAND ${CMAKE_COMMAND} -E copy
+ COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_BINARY_DIR}/bin
+ COMMAND ${CMAKE_COMMAND} -E copy_if_different

@HyundeokChoi-NOAA
Contributor Author

To enable ioda_restriction_filter.py from da-utils/src/ioda-restriction within obsForge, the task scripts
ush/python/pyobsforge/task/aero_prepobs.py and
ush/python/pyobsforge/task/atmos_bufr_prepobs.py
also need to be updated to call the new restriction‑filter step.

Would it be appropriate to include these updates in the obsForge pull request?

@CoryMartin-NOAA
Contributor

@HyundeokChoi-NOAA yes absolutely. But one thing to think about. @ilianagenkova mentioned that in operations, we don't run the unrestriction code, so it may need to be a separate job rather than part of the existing jobs. Does that make sense? Happy to discuss further if needed

@ilianagenkova

The two scripts (one to create *nr and one to open up the files after EXPRSRD) could stay in one job, but check which machine you are on (dev or prod) to decide whether to trigger the second script. That's how we have it in legacy obsproc, and NCO has not complained (not yet).

@HyundeokChoi-NOAA
Contributor Author

@CoryMartin-NOAA @ilianagenkova I added this block ahead of the EXPRSRD filter step and confirmed that it behaves as expected. This filter does not run on the production machine.

# --- Determine whether to run EXPRSRD filter ---
# dev_m = current dev machine
dev_m = None
with open("/lfs/h1/ops/prod/config/prodmachinefile") as f:
    for line in f:
        if "backup" in line:
            parts = line.strip().split(":")
            if len(parts) >= 2:
                dev_m = parts[1]
            break

# this_m = machine this script is running on
with open("/etc/cluster_name") as f:
    this_m = f.read().strip()

print(f"\nCluster check: dev_m={dev_m}, this_m={this_m}")

run_exprsrd = (dev_m == this_m)

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.


Comment on lines +32 to +38
ranges.append(f"{start}–{prev}")
start = x
prev = x
if start == prev:
ranges.append(f"{start}")
else:
ranges.append(f"{start}–{prev}")
Copilot AI Mar 27, 2026

compress_ranges uses an en-dash character ("–") in the range strings. This can render oddly or break downstream parsing/log collection that assumes ASCII. Prefer a plain hyphen ("-") unless Unicode output is explicitly required.

Suggested change
- ranges.append(f"{start}–{prev}")
- start = x
- prev = x
- if start == prev:
- ranges.append(f"{start}")
- else:
- ranges.append(f"{start}–{prev}")
+ ranges.append(f"{start}-{prev}")
+ start = x
+ prev = x
+ if start == prev:
+ ranges.append(f"{start}")
+ else:
+ ranges.append(f"{start}-{prev}")

Comment on lines +174 to +189
print(" Missing restriction variables — copying unchanged.")
copy_entire_file(infile, outfile)
continue

flag = md["restrictionFlag"][:]
exp = md["restrictionExpiration"][:]

if flag.size == 0 or exp.size == 0:
print(" Restriction arrays zero length — copying unchanged.")
copy_entire_file(infile, outfile)
continue

flag_mask = np.ma.getmaskarray(flag)
exp_mask = np.ma.getmaskarray(exp)
mask = flag_mask & exp_mask

Copilot AI Mar 27, 2026

When restriction variables are missing, the script copies the input file unchanged into the supposedly filtered output directory. If this tool is used as a restriction screen, passing data through unscreened can defeat the purpose and may leak restricted observations. Consider failing fast (non-zero exit), or at least skipping output / writing an empty filtered file when the screening fields are unavailable.

Suggested change
-     print(" Missing restriction variables — copying unchanged.")
-     copy_entire_file(infile, outfile)
-     continue
- flag = md["restrictionFlag"][:]
- exp = md["restrictionExpiration"][:]
- if flag.size == 0 or exp.size == 0:
-     print(" Restriction arrays zero length — copying unchanged.")
-     copy_entire_file(infile, outfile)
-     continue
- flag_mask = np.ma.getmaskarray(flag)
- exp_mask = np.ma.getmaskarray(exp)
- mask = flag_mask & exp_mask
+     print(" Missing restriction variables — writing empty restricted file.")
+     mask = np.zeros(len(loc_dim), dtype=bool)
+ else:
+     flag = md["restrictionFlag"][:]
+     exp = md["restrictionExpiration"][:]
+     if flag.size == 0 or exp.size == 0:
+         print(" Restriction arrays zero length — writing empty restricted file.")
+         mask = np.zeros(len(loc_dim), dtype=bool)
+     else:
+         flag_mask = np.ma.getmaskarray(flag)
+         exp_mask = np.ma.getmaskarray(exp)
+         mask = flag_mask & exp_mask
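
The fail-closed behaviour suggested above can be sketched in isolation as follows. The function name keep_mask, the plain dict standing in for the metadata group, and the sample values are all hypothetical; the rule that a location is kept only when both fields are masked is taken from the snippet in this thread.

```python
import numpy as np


def keep_mask(metadata, nlocs):
    """Return a boolean keep-mask; keep nothing if the screening fields are unusable."""
    flag = metadata.get("restrictionFlag")
    exp = metadata.get("restrictionExpiration")
    if flag is None or exp is None:
        # Screening fields missing: do not pass any observation through.
        return np.zeros(nlocs, dtype=bool)
    if np.size(flag) == 0 or np.size(exp) == 0:
        # Zero-length screening fields: same fail-closed result.
        return np.zeros(nlocs, dtype=bool)
    # Keep locations where both fields are masked (no restriction recorded).
    return np.ma.getmaskarray(flag) & np.ma.getmaskarray(exp)


# Illustrative metadata: the first location carries restriction values.
md = {
    "restrictionFlag": np.ma.masked_array([1, 0, 0], mask=[False, True, True]),
    "restrictionExpiration": np.ma.masked_array([24, 0, 0], mask=[False, True, True]),
}
print(keep_mask(md, 3))
print(keep_mask({}, 3))
```

Whether to write an empty file or exit with a non-zero status in the missing-field case is a separate policy choice; the sketch only shows the mask side of it.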

Comment thread src/ioda-restrict/ioda_restriction_filter.py
Comment on lines +327 to +337
with open("/lfs/h1/ops/prod/config/prodmachinefile") as f:
    for line in f:
        if "backup" in line:
            parts = line.strip().split(":")
            if len(parts) >= 2:
                dev_m = parts[1]
            break

# this_m = machine this script is running on
with open("/etc/cluster_name") as f:
    this_m = f.read().strip()
Copilot AI Mar 27, 2026

main() unconditionally reads /lfs/h1/ops/prod/config/prodmachinefile and /etc/cluster_name without handling missing/unreadable files. Since this script is copied into the general build bin/, this will crash on non-OPS systems. Consider making this an optional CLI flag/env override, and catch FileNotFoundError/OSError to default to skipping EXPRSRD mode with a clear message.

Contributor

This is a really good point

The copilot solution is acceptable, i.e. if the machine can't be identified, don't run the filter.
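
One way this could look, as a sketch only: the function name should_run_exprsrd and the file contents in the demonstration are hypothetical, and the parsing mirrors the block posted earlier in this thread. Any OSError (which includes FileNotFoundError) is treated as "machine not identified", so the EXPRSRD filter is skipped.

```python
import tempfile
from pathlib import Path


def should_run_exprsrd(prod_file="/lfs/h1/ops/prod/config/prodmachinefile",
                       cluster_file="/etc/cluster_name"):
    """Return True only when this host is positively identified as the dev machine."""
    try:
        dev_m = None
        with open(prod_file) as f:
            for line in f:
                if "backup" in line:
                    parts = line.strip().split(":")
                    if len(parts) >= 2:
                        dev_m = parts[1]
                    break
        with open(cluster_file) as f:
            this_m = f.read().strip()
    except OSError as err:
        # Missing or unreadable files: cannot identify the machine.
        print(f"Cannot identify machine ({err}); skipping EXPRSRD filter.")
        return False
    print(f"Cluster check: dev_m={dev_m}, this_m={this_m}")
    return dev_m is not None and dev_m == this_m


# Demonstration with throwaway files (made-up machine names and layout).
with tempfile.TemporaryDirectory() as tmp:
    prod = Path(tmp) / "prodmachinefile"
    prod.write_text("primary:alpha\nbackup:beta\n")
    here = Path(tmp) / "cluster_name"
    here.write_text("beta\n")
    on_dev = should_run_exprsrd(prod, here)
    missing = should_run_exprsrd(Path(tmp) / "absent", here)

print(on_dev, missing)
```

Taking the two paths as parameters also leaves room for the CLI flag or environment override mentioned in the review.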

Comment thread src/ioda-restrict/ioda_restriction_filter.py Outdated
Comment thread src/CMakeLists.txt
Comment on lines +245 to +246
print(" Missing restriction variables — copying unchanged.")
copy_entire_file(infile, outfile)
Copilot AI Mar 27, 2026

Same issue as above for EXPRSRD mode: if restriction variables are missing the script copies the file unchanged into atmos.us. For a screening/filtering script this is unsafe; prefer skipping output or failing so restricted data can't silently pass through.

Suggested change
- print(" Missing restriction variables — copying unchanged.")
- copy_entire_file(infile, outfile)
+ print(" Missing restriction variables — no output will be written for this file.")

Comment thread src/ioda-restrict/ioda_restriction_filter.py Outdated
@HyundeokChoi-NOAA
Contributor Author

This PR is required for obsForge PR #205, and I believe it is ready for merge. Please review when you have a moment. Thank you.

Comment thread src/ioda-restrict/ioda_restriction_filter.py Outdated
Contributor

@CoryMartin-NOAA CoryMartin-NOAA left a comment

@CoryMartin-NOAA CoryMartin-NOAA merged commit a63aac0 into develop Apr 3, 2026
1 check passed
@CoryMartin-NOAA CoryMartin-NOAA deleted the feature/restriction branch April 3, 2026 18:34
@HyundeokChoi-NOAA
Contributor Author

@CoryMartin-NOAA @ilianagenkova Thank you all!!!!

4 participants