
Add save_bboxes.to_netcdf() and load_bboxes.from_netcdf() for complete netCDF4 I/O support#153

Open
manashk29 wants to merge 5 commits into neuroinformatics-unit:main from manashk29:feat/netcdf-io-wrappers


manashk29 commented Apr 9, 2026

Hi @sfmig
Closes #152

What is this PR

  • Bug fix
  • Addition of a new feature
  • Other

Adds two new public functions to complete the netCDF4 I/O round-trip
for ethology bounding box annotation datasets.

## save_bboxes.to_netcdf()

What it does:
Wraps xarray.Dataset.to_netcdf() with automatic handling of all
attribute types that netCDF4 cannot store natively.

The problem it solves:
The ethology Dataset has three dict attributes
(map_category_to_str, map_image_id_to_filename,
map_image_id_to_original_coco_id), a list attribute
(annotation_files), and a None attribute (images_directories).
netCDF4 cannot store any of these types natively, so calling
ds.to_netcdf() directly would either silently corrupt them or raise a TypeError.

How it works:

  1. Makes a deep copy of the Dataset so the caller's attrs are never mutated
  2. Iterates over all attrs and converts:
    • dict → JSON string via json.dumps()
    • list → JSON string (each element converted to str first)
    • Path → plain string
    • None → dropped entirely
    • str, int, float → kept as-is (already netCDF4-safe)
  3. Calls xarray.Dataset.to_netcdf() on the cleaned copy
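The conversion rules in step 2 can be sketched as a pure function over the attrs dict. This is a minimal sketch, not the PR's code: `serialise_attrs` is a hypothetical name, and instead of deep-copying the whole Dataset it builds a new dict, which likewise leaves the caller's attrs untouched.

```python
import json
from pathlib import Path
from typing import Any


def serialise_attrs(attrs: dict[str, Any]) -> dict[str, Any]:
    """Return a netCDF4-safe version of a Dataset's attrs (hypothetical helper).

    A new dict is built, so the input mapping is never mutated.
    """
    safe: dict[str, Any] = {}
    for key, value in attrs.items():
        if value is None:
            continue  # None cannot be stored: drop the attribute entirely
        if isinstance(value, dict):
            safe[key] = json.dumps(value)  # dict -> JSON string (keys become str)
        elif isinstance(value, list):
            # list -> JSON string, converting each element (e.g. Path) to str
            safe[key] = json.dumps([str(v) for v in value])
        elif isinstance(value, Path):
            safe[key] = str(value)  # Path -> plain string
        else:
            safe[key] = value  # str, int, float are already netCDF4-safe
    return safe
```

Inside a wrapper, the helper could be applied to a copy of the Dataset, for example `ds_out = ds.copy(deep=True)`, then `ds_out.attrs = serialise_attrs(ds.attrs)` and finally `ds_out.to_netcdf(file_path)`.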

## load_bboxes.from_netcdf()

What it does:
Wraps xr.open_dataset() with automatic deserialisation of all
attributes back to their original Python types, then validates
the loaded Dataset.

The problem it solves:
After loading a netCDF4 file with raw xr.open_dataset(),
map_category_to_str comes back as the string
'{"1": "Mallard", "3": "Goose"}' instead of the dict
{1: "Mallard", 3: "Goose"}. The keys are also strings ("1", "3"),
not integers (1, 3), because json.dumps() always serialises dict
keys as strings. Downstream code calling
ds.attrs["map_category_to_str"][1] would therefore misbehave: on the raw
string it silently returns a single character, and even after a plain
json.loads() it raises a KeyError.
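Both failure modes can be reproduced with the standard library alone. This snippet is illustrative and not part of the PR; it round-trips the example mapping through `json`:

```python
import json

# Integer dict keys do not survive a JSON round trip: json.dumps() writes
# every key as a string, and json.loads() leaves them as strings.
original = {1: "Mallard", 3: "Goose"}
serialised = json.dumps(original)
restored = json.loads(serialised)

print(serialised)     # {"1": "Mallard", "3": "Goose"}
print(restored)       # {'1': 'Mallard', '3': 'Goose'}
print(1 in restored)  # False, so restored[1] raises KeyError
print(serialised[1])  # " (indexing the raw string returns one character)
```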

How it works:

  1. Raises FileNotFoundError with a clear message if the file does not exist
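  2. Calls xr.open_dataset(file_path).load(); the .load() call
    reads all data eagerly into memory. Note that .load() alone does not
    close the file handle; releasing it immediately (preventing
    PermissionError on Windows) needs xr.load_dataset(file_path), or
    .load() inside a with xr.open_dataset(file_path) as ds: block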
  3. Iterates over the three dict attrs and for each:
    • Calls json.loads() to parse the JSON string back to a dict
    • Converts all numeric string keys ("1", "3") back to integers
      (1, 3) using int(k) if k.lstrip("-").isdigit() else k
  4. Restores annotation_files from JSON string back to a list of
    Path objects
  5. Uses the @_check_output(ValidBboxAnnotationsDataset) decorator
    (the same one used by from_files()), so the loaded Dataset is
    validated against ValidBboxAnnotationsDataset before being returned
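Steps 3 and 4 can be sketched as a pure function over the loaded attrs. This is a minimal sketch under assumptions: `deserialise_attrs` and `_restore_key` are hypothetical names, and the key check is a slightly stricter variant of the PR's `int(k) if k.lstrip("-").isdigit() else k`. It strips at most one leading "-" and uses `isdecimal()`, so inputs such as "--1" or "²" stay strings instead of reaching `int()` and raising `ValueError`.

```python
import json
from pathlib import Path
from typing import Any, Union

# The three dict attributes named in the PR description.
DICT_ATTRS = (
    "map_category_to_str",
    "map_image_id_to_filename",
    "map_image_id_to_original_coco_id",
)


def _restore_key(key: str) -> Union[int, str]:
    """Turn numeric string keys ("1", "-2") back into ints; leave others as-is."""
    return int(key) if key.removeprefix("-").isdecimal() else key


def deserialise_attrs(attrs: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of netCDF-loaded attrs with Python types restored."""
    restored = dict(attrs)  # new dict, so the input mapping is not modified
    for name in DICT_ATTRS:
        if isinstance(restored.get(name), str):
            parsed = json.loads(restored[name])  # JSON string -> dict (str keys)
            restored[name] = {_restore_key(k): v for k, v in parsed.items()}
    if isinstance(restored.get("annotation_files"), str):
        # JSON string -> list of Path objects
        restored["annotation_files"] = [
            Path(p) for p in json.loads(restored["annotation_files"])
        ]
    return restored
```

In the loader, this would run after the existence check and the eager load (for example `ds.attrs = deserialise_attrs(ds.attrs)`) and before the decorator validates the result. Attributes that were dropped on save, such as a None images_directories, are simply absent here.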

Tests

Added tests/test_unit/test_io_annotations/test_io_netcdf.py with 3 test classes

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

manashk29 marked this pull request as ready for review April 9, 2026 20:00

