Introduce Restriction Screening Scripts: Add RSRD/EXPRSRD Extraction and Update CMake (#73)
Conversation
Pull request overview
Adds two Python utilities under src/ioda-restrict/ to post-process IODA NetCDF observation files based on restriction metadata, and wires them into the build so they are copied into the build bin/ directory.
Changes:
- Add `check_rsrd.py` to filter/copy NetCDF files based on `restrictionFlag`/`restrictionExpiration` metadata.
- Add `check_exprsrd.py` to extract non-restricted observations from a "previous 48h" directory and write filtered outputs.
- Update `src/CMakeLists.txt` to copy both scripts into `${CMAKE_BINARY_DIR}/bin`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/ioda-restrict/check_rsrd.py` | New NetCDF filtering/copy tool based on restriction metadata. |
| `src/ioda-restrict/check_exprsrd.py` | New tool to compute the non-restricted mask (incl. EXPRSRD logic) and write filtered NetCDF outputs. |
| `src/CMakeLists.txt` | Adds a custom target to copy the new scripts into the build `bin/` directory. |
Code excerpts referenced by the review comments:

```python
for i in range(len(flag)):
    fval = None if flag.mask[i] else int(flag[i])
    eval = None if exp.mask[i] else int(exp[i])
```

```python
# Recursively copy subgroups
for grp_name, grp_in in in_group.groups.items():
    grp_out = out_group.createGroup(grp_name)
    copy_group(grp_in, grp_out, mask)
```

```python
print(" Unique RSRD / EXPRSRD patterns:")

unique_groups = {}

for i in range(len(flag)):
```

```python
mask = flag.mask & exp.mask

total = len(mask)
kept = np.sum(mask)
dropped = total - kept
```

```python
if fill_value is not None:
    var_out = out_group.createVariable(
        var_name, var_in.dtype, var_in.dimensions, fill_value=fill_value
    )
else:
    var_out = out_group.createVariable(
        var_name, var_in.dtype, var_in.dimensions
    )
```

```cmake
add_custom_target(copy_restriction ALL
  COMMAND ${CMAKE_COMMAND} -E copy
    ${CMAKE_CURRENT_SOURCE_DIR}/ioda-restrict/check_rsrd.py
    ${CMAKE_BINARY_DIR}/bin/check_rsrd.py
  COMMAND ${CMAKE_COMMAND} -E copy
    ${CMAKE_CURRENT_SOURCE_DIR}/ioda-restrict/check_exprsrd.py
    ${CMAKE_BINARY_DIR}/bin/check_exprsrd.py
```
CoryMartin-NOAA
left a comment
The two check scripts seem to be similar. Could they be combined into one?
```python
if not idx_list:
    return []
ranges = []
start = prev = idx_list[0]
```

I think python is a bit finicky with this sort of thing, and it might change `prev` when you change `start`, unless you do a deep copy.

Good point. I will dig into both scripts and see how they can be merged into one.
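For what it's worth, a quick check suggests the chained assignment is safe here: `start = prev = idx_list[0]` binds both names to the same immutable int, and rebinding one name afterwards does not affect the other. A small illustrative snippet (not code from the PR):

```python
# Chained assignment binds two names to the same immutable int.
idx_list = [3, 4, 5, 9]
start = prev = idx_list[0]

# Rebinding `start` only changes what the name `start` points to;
# `prev` still refers to the original value. No deep copy is needed.
start = 99
print(prev)  # -> 3
```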
```python
# ----------------------------------------------------------------------
# Copy entire file unchanged
# ----------------------------------------------------------------------
def copy_entire_file(nc_in, outfile):
```

I think this might be overkill. If you are just copying an entire file, use `shutil.copy` and just copy it; no need to use the netCDF library.
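A minimal sketch of the `shutil`-based alternative suggested above (the function name follows the PR; the body is an assumption, not the PR's implementation):

```python
import shutil

def copy_entire_file(nc_in, outfile):
    # Byte-for-byte copy (preserving timestamps/permissions) instead of
    # re-reading and re-writing the file through the netCDF library.
    shutil.copy2(nc_in, outfile)
```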
Pull request overview
This PR introduces a new operational workflow for extracting and screening observation-level restriction metadata from IODA NetCDF files, and makes the script available from the build tree.
Changes:
- Add `ioda_restriction_filter.py` to run both RSRD (current cycle) and EXPRSRD (previous 48h cycle) filtering on directories of `.nc` files.
- Add a CMake custom target to copy the restriction filtering script into `build/bin/`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `src/ioda-restrict/ioda_restriction_filter.py` | New Python script implementing directory-level filtering and NetCDF group/variable copying for restriction screening. |
| `src/CMakeLists.txt` | Adds an always-built custom target to copy the restriction filtering script into the build's `bin/` directory. |
```python
mask = flag.mask & exp.mask

total = len(mask)
kept = np.sum(mask)
```

`mask = flag.mask & exp.mask` has the same scalar-mask problem as elsewhere: if either variable has no masked values, `.mask` can be a scalar bool and `len(mask)`/`np.sum(mask)` will not behave as intended. Normalize to boolean arrays (e.g., via `np.ma.getmaskarray`) before computing `mask` and counting kept/dropped.

Suggested change:

```diff
-mask = flag.mask & exp.mask
-total = len(mask)
-kept = np.sum(mask)
+flag_mask = np.ma.getmaskarray(flag)
+exp_mask = np.ma.getmaskarray(exp)
+mask = flag_mask & exp_mask
+total = mask.size
+kept = int(mask.sum())
```
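The scalar-mask pitfall can be demonstrated in isolation (an illustrative snippet, not PR code):

```python
import numpy as np

# A masked array with no masked values exposes a *scalar* mask,
# so len(a.mask) would raise TypeError.
a = np.ma.masked_array([1, 2, 3])
print(a.mask)  # False (a 0-d scalar, not an array)

# np.ma.getmaskarray always returns a boolean array of the full shape,
# which makes len()/sum()/& behave consistently.
m = np.ma.getmaskarray(a)
print(m)  # [False False False]
```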
```python
data = var_in[:]

if mask is None:
    var_out[:] = data
elif OBS_DIM in var_in.dimensions:
    if data.ndim == 1:
        var_out[:] = data[mask]
    else:
        var_out[:] = data[mask, ...]
else:
    var_out[:] = data
```

`copy_group` reads every variable fully into memory (`data = var_in[:]`) even when only a subset of locations is needed. For large IODA files this can cause very high memory use and slow runtimes. Prefer slicing directly from the netCDF variable when `OBS_DIM` is present (read only the kept indices) instead of loading the full array first.

Suggested change:

```diff
-data = var_in[:]
 if mask is None:
-    var_out[:] = data
+    # No restriction: copy entire variable
+    var_out[:] = var_in[:]
 elif OBS_DIM in var_in.dimensions:
-    if data.ndim == 1:
-        var_out[:] = data[mask]
+    # Restricted copy along OBS_DIM; assume OBS_DIM is the leading dimension
+    if var_in.ndim == 1:
+        var_out[:] = var_in[mask]
     else:
-        var_out[:] = data[mask, ...]
+        var_out[:] = var_in[mask, ...]
 else:
-    var_out[:] = data
+    # Variable does not depend on OBS_DIM: copy entire variable
+    var_out[:] = var_in[:]
```
```python
restricted_mask = ~non_restricted_mask
restricted_idx = np.where(restricted_mask)[0]
```

`restricted_idx` is computed but never used, which adds noise and can confuse future maintenance. Remove it or use it for the intended logging/reporting.

Suggested change:

```diff
 restricted_mask = ~non_restricted_mask
-restricted_idx = np.where(restricted_mask)[0]
```
```python
ranges.append(f"{start}–{prev}")
start = prev = x
if start == prev:
    ranges.append(f"{start}")
else:
    ranges.append(f"{start}–{prev}")
```

`compress_ranges` formats ranges using a Unicode en-dash (`–`). This can cause encoding/display issues in some logs/terminals and makes grepping harder. Prefer an ASCII hyphen (`-`) for portability.

Suggested change:

```diff
-ranges.append(f"{start}–{prev}")
+ranges.append(f"{start}-{prev}")
 start = prev = x
 if start == prev:
     ranges.append(f"{start}")
 else:
-    ranges.append(f"{start}–{prev}")
+    ranges.append(f"{start}-{prev}")
```

I agree with this suggestion.
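Putting the pieces together, a self-contained sketch of the range compressor might look like this. The excerpts only show fragments of the function, so the loop body here (consecutive-run detection over a sorted index list) is an assumption:

```python
def compress_ranges(idx_list):
    """Collapse a sorted list of indices into range strings, e.g.
    [0, 1, 2, 5, 7, 8] -> ['0-2', '5', '7-8'] (ASCII hyphen for portability)."""
    if not idx_list:
        return []
    ranges = []
    start = prev = idx_list[0]
    for x in idx_list[1:]:
        if x == prev + 1:
            # Still inside a consecutive run; extend it.
            prev = x
            continue
        # Run ended: emit either a single index or a start-end range.
        ranges.append(f"{start}" if start == prev else f"{start}-{prev}")
        start = prev = x
    ranges.append(f"{start}" if start == prev else f"{start}-{prev}")
    return ranges

print(compress_ranges([0, 1, 2, 5, 7, 8]))  # -> ['0-2', '5', '7-8']
```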
```cmake
# copy unified restriction script into build/bin/
# ----------------------------------------------------------------------
add_custom_target(copy_restriction ALL
  COMMAND ${CMAKE_COMMAND} -E copy
```

The `copy_restriction` target copies into `${CMAKE_BINARY_DIR}/bin`, but it doesn't ensure that directory exists. If `oops_FOUND` is false (no executables built) or the bin dir hasn't been created yet, the build can fail at this step. Add a preceding `cmake -E make_directory ${CMAKE_BINARY_DIR}/bin` (and consider `copy_if_different` to avoid unnecessary rebuilds).

Suggested change:

```diff
-  COMMAND ${CMAKE_COMMAND} -E copy
+  COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_BINARY_DIR}/bin
+  COMMAND ${CMAKE_COMMAND} -E copy_if_different
```
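Combining the review suggestions, the target could be written as follows (a sketch; the script name matches the PR, the exact layout is an assumption):

```cmake
# Ensure bin/ exists even when no executables are built, and avoid
# re-copying when the script is unchanged.
add_custom_target(copy_restriction ALL
  COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_BINARY_DIR}/bin
  COMMAND ${CMAKE_COMMAND} -E copy_if_different
    ${CMAKE_CURRENT_SOURCE_DIR}/ioda-restrict/ioda_restriction_filter.py
    ${CMAKE_BINARY_DIR}/bin/ioda_restriction_filter.py
)
```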
To enable `ioda_restriction_filter.py` from `da-utils/src/ioda-restriction` within obsForge, the task scripts would need updating. Would it be appropriate to include these updates in an obsForge pull request?
@HyundeokChoi-NOAA yes, absolutely. But one thing to think about: @ilianagenkova mentioned that in operations, we don't run the unrestriction code, so it may need to be a separate job rather than part of the existing jobs. Does that make sense? Happy to discuss further if needed.
The two scripts (one to create `*nr` files and one to open up the files after EXPRSRD) could stay in one job, but check which machine you are on (dev or prod) to trigger the second script. That's how we have it in legacy obsproc; NCO has not complained (not yet).
… filter either in prod or dev machines
@CoryMartin-NOAA @ilianagenkova I added this block ahead of the EXPRSRD filter step and confirmed that it behaves as expected. This filter does not run on the production machine.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
```python
ranges.append(f"{start}–{prev}")
start = x
prev = x
if start == prev:
    ranges.append(f"{start}")
else:
    ranges.append(f"{start}–{prev}")
```

`compress_ranges` uses an en-dash character (`–`) in the range strings. This can render oddly or break downstream parsing/log collection that assumes ASCII. Prefer a plain hyphen (`-`) unless Unicode output is explicitly required.

Suggested change:

```diff
-ranges.append(f"{start}–{prev}")
+ranges.append(f"{start}-{prev}")
 start = x
 prev = x
 if start == prev:
     ranges.append(f"{start}")
 else:
-    ranges.append(f"{start}–{prev}")
+    ranges.append(f"{start}-{prev}")
```
```python
    print(" Missing restriction variables — copying unchanged.")
    copy_entire_file(infile, outfile)
    continue

flag = md["restrictionFlag"][:]
exp = md["restrictionExpiration"][:]

if flag.size == 0 or exp.size == 0:
    print(" Restriction arrays zero length — copying unchanged.")
    copy_entire_file(infile, outfile)
    continue

flag_mask = np.ma.getmaskarray(flag)
exp_mask = np.ma.getmaskarray(exp)
mask = flag_mask & exp_mask
```

When restriction variables are missing, the script copies the input file unchanged into the supposedly filtered output directory. If this tool is used as a restriction screen, passing data through unscreened can defeat the purpose and may leak restricted observations. Consider failing fast (non-zero exit), or at least skipping output / writing an empty filtered file when the screening fields are unavailable.

Suggested change:

```diff
-    print(" Missing restriction variables — copying unchanged.")
-    copy_entire_file(infile, outfile)
-    continue
-flag = md["restrictionFlag"][:]
-exp = md["restrictionExpiration"][:]
-if flag.size == 0 or exp.size == 0:
-    print(" Restriction arrays zero length — copying unchanged.")
-    copy_entire_file(infile, outfile)
-    continue
-flag_mask = np.ma.getmaskarray(flag)
-exp_mask = np.ma.getmaskarray(exp)
-mask = flag_mask & exp_mask
+    print(" Missing restriction variables — writing empty restricted file.")
+    mask = np.zeros(len(loc_dim), dtype=bool)
+else:
+    flag = md["restrictionFlag"][:]
+    exp = md["restrictionExpiration"][:]
+    if flag.size == 0 or exp.size == 0:
+        print(" Restriction arrays zero length — writing empty restricted file.")
+        mask = np.zeros(len(loc_dim), dtype=bool)
+    else:
+        flag_mask = np.ma.getmaskarray(flag)
+        exp_mask = np.ma.getmaskarray(exp)
+        mask = flag_mask & exp_mask
```
```python
with open("/lfs/h1/ops/prod/config/prodmachinefile") as f:
    for line in f:
        if "backup" in line:
            parts = line.strip().split(":")
            if len(parts) >= 2:
                dev_m = parts[1]
            break

# this_m = dev machine
with open("/etc/cluster_name") as f:
    this_m = f.read().strip()
```

`main()` unconditionally reads `/lfs/h1/ops/prod/config/prodmachinefile` and `/etc/cluster_name` without handling missing/unreadable files. Since this script is copied into the general build `bin/`, this will crash on non-OPS systems. Consider making this an optional CLI flag/env override, and catch `FileNotFoundError`/`OSError` to default to skipping EXPRSRD mode with a clear message.

This is a really good point.

The copilot solution is acceptable, i.e. if the machine can't be identified, don't run the filter.
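A hedged sketch of the guarded lookup suggested above, using the paths from the PR (the function name, return convention, and fallback behavior are assumptions):

```python
def detect_machines(prodfile="/lfs/h1/ops/prod/config/prodmachinefile",
                    clusterfile="/etc/cluster_name"):
    """Return (this_machine, dev_machine), or (None, None) if the OPS
    config files are unavailable, so callers can skip EXPRSRD mode."""
    try:
        dev_m = None
        with open(prodfile) as f:
            for line in f:
                if "backup" in line:
                    parts = line.strip().split(":")
                    if len(parts) >= 2:
                        dev_m = parts[1]
                    break
        with open(clusterfile) as f:
            this_m = f.read().strip()
        return this_m, dev_m
    except OSError:
        # Non-OPS system (or unreadable files): identify nothing and
        # let the caller skip the EXPRSRD filter with a clear message.
        print("Cannot identify machine; skipping EXPRSRD filter.")
        return None, None
```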
```python
print(" Missing restriction variables — copying unchanged.")
copy_entire_file(infile, outfile)
```

Same issue as above for EXPRSRD mode: if restriction variables are missing, the script copies the file unchanged into `atmos.us`. For a screening/filtering script this is unsafe; prefer skipping output or failing so restricted data can't silently pass through.

Suggested change:

```diff
-print(" Missing restriction variables — copying unchanged.")
-copy_entire_file(infile, outfile)
+print(" Missing restriction variables — no output will be written for this file.")
```
This PR is required for obsForge PR #205, and I believe it is ready for merge. Please review when you have a moment. Thank you.
CoryMartin-NOAA
left a comment
Thanks @HyundeokChoi-NOAA
@CoryMartin-NOAA @ilianagenkova Thank you all!!!!
This PR adds full support for extracting and screening observation-level restriction metadata, providing operational scripts for extracting and filtering restricted data.