Add save_bboxes.to_netcdf() and load_bboxes.from_netcdf() for complete netCDF4 I/O support#153
Open
manashk29 wants to merge 5 commits into
Hi @sfmig
Closes #152
What is this PR
Adds two new public functions to complete the netCDF4 I/O round-trip
for ethology bounding box annotation datasets.
## `save_bboxes.to_netcdf()`
What it does:
Wraps xarray.Dataset.to_netcdf() with automatic handling of all
attribute types that netCDF4 cannot store natively.
The problem it solves:
The ethology Dataset has three dict attributes (`map_category_to_str`, `map_image_id_to_filename`, `map_image_id_to_original_coco_id`), a list attribute (`annotation_files`), and a None attribute (`images_directories`).
netCDF4 cannot store any of these types: calling `ds.to_netcdf()` directly would either silently corrupt them or raise a `TypeError`.
How it works:
- `dict` → JSON string via `json.dumps()`
- `list` → JSON string (each element converted to `str` first)
- `Path` → plain string
- `None` → dropped entirely
- `str`, `int`, `float` → kept as-is (already netCDF4-safe)
- Calls `xarray.Dataset.to_netcdf()` on the cleaned copy

## `load_bboxes.from_netcdf()`
What it does:
Wraps `xr.open_dataset()` with automatic deserialisation of all attributes back to their original Python types, then validates the loaded Dataset.
The problem it solves:
After loading a netCDF4 file with raw `xr.open_dataset()`, `map_category_to_str` comes back as the string `'{"1": "Mallard", "3": "Goose"}'` instead of the dict `{1: "Mallard", 3: "Goose"}`. The keys are also strings ("1", "3"), not integers (1, 3), because `json.dumps()` always serialises dict keys as strings. Any downstream code calling `ds.attrs["map_category_to_str"][1]` would fail with a `KeyError`.

How it works:
- Raises `FileNotFoundError` with a clear message if the file does not exist
- Calls `xr.open_dataset(file_path).load()`: the `.load()` call reads all data eagerly into memory and closes the file handle immediately, preventing `PermissionError` on Windows
- Uses `json.loads()` to parse the JSON string back to a dict
- Restores dict keys to integers (1, 3) using `int(k) if k.lstrip("-").isdigit() else k`
- Converts `annotation_files` from JSON string back to a list of `Path` objects
- Applies the `@_check_output(ValidBboxAnnotationsDataset)` decorator, same as `from_files()`, so the returned Dataset is guaranteed to be valid
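
For reviewers, the attribute conversions described in the two "How it works" lists can be sketched roughly as below. This is a minimal illustration, not the code in this PR. The helper names `serialise_attrs`, `deserialise_attrs` and `restore_key` are hypothetical, and so is the way dict attributes are recognised (by the `map_` name prefix). The file name `train.json` is made up for the example. It works on a plain attrs dict, so it does not touch xarray or netCDF4.

```python
import json
from pathlib import Path


def serialise_attrs(attrs: dict) -> dict:
    """Return a copy of attrs with only netCDF4-safe values (hypothetical helper)."""
    safe = {}
    for name, value in attrs.items():
        if value is None:
            continue  # None is dropped entirely
        if isinstance(value, dict):
            safe[name] = json.dumps(value)  # integer keys become strings here
        elif isinstance(value, list):
            safe[name] = json.dumps([str(item) for item in value])
        elif isinstance(value, Path):
            safe[name] = str(value)
        else:
            safe[name] = value  # str, int, float are kept as-is
    return safe


def restore_key(key: str):
    """Turn a JSON key back into an int when it looks like one."""
    return int(key) if key.lstrip("-").isdigit() else key


def deserialise_attrs(attrs: dict) -> dict:
    """Undo serialise_attrs (hypothetical helper).

    Assumption: dict attributes are identified by the "map_" prefix and the
    list of paths by the name "annotation_files".
    """
    restored = dict(attrs)
    for name, value in attrs.items():
        if not isinstance(value, str):
            continue
        if name.startswith("map_"):
            parsed = json.loads(value)
            restored[name] = {restore_key(k): v for k, v in parsed.items()}
        elif name == "annotation_files":
            restored[name] = [Path(p) for p in json.loads(value)]
    return restored


# Round trip on a plain attrs dict
attrs = {
    "map_category_to_str": {1: "Mallard", 3: "Goose"},
    "annotation_files": [Path("train.json")],
    "images_directories": None,
}
stored = serialise_attrs(attrs)
loaded = deserialise_attrs(stored)
```

In this sketch `stored["map_category_to_str"]` is a JSON string with string keys, while `loaded["map_category_to_str"][1]` works again because the keys are converted back to integers. `images_directories` is absent after the round trip, since `None` values are dropped on save.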
Tests
Added `tests/test_unit/test_io_annotations/test_io_netcdf.py` with 3 test classes

Checklist: