
feat(io): add optional retain_image_id flag to preserve input image IDs (COCO/VIA) --closes #97 #138

Open
GauravSRC wants to merge 9 commits into neuroinformatics-unit:main from GauravSRC:feat/retain-image-id

@GauravSRC

Summary

This PR adds an optional retain_image_id: bool = False parameter to the from_files(...) API in ethology.io.annotations.load_bboxes.
When retain_image_id=True, the loader preserves the original image IDs from the input annotation file instead of reassigning 0-based indices based on sorted filenames.

Supported behaviours:

  • COCO: uses the original images[].id values as image_id in the output dataset when retain_image_id=True.
  • VIA: attempts to preserve the VIA metadata key as the image_id if that key can be coerced to an integer; otherwise it falls back to the existing ethology indexing (safe default).

The default behaviour is unchanged (retain_image_id=False) so existing downstream code will continue to see ethology's conventional 0-based image indexing.
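
To make the two modes concrete, here is a minimal, self-contained sketch of the ID assignment described above for COCO input. It does not use ethology itself: the helper name assign_image_ids and the sample data are hypothetical, and the actual code in load_bboxes.py may be structured differently.

```python
def assign_image_ids(images, retain_image_id=False):
    """Map each filename to an image ID (illustrative helper, not ethology API).

    images: list of COCO-style dicts with "id" and "file_name" keys.
    """
    if retain_image_id:
        # Opt-in: keep the IDs exactly as they appear in images[].id
        return {img["file_name"]: img["id"] for img in images}
    # Default: 0-based indices following the sorted filenames
    sorted_names = sorted(img["file_name"] for img in images)
    return {name: idx for idx, name in enumerate(sorted_names)}


# Hypothetical COCO "images" entries with non-contiguous IDs
coco_images = [
    {"id": 99, "file_name": "b.png"},
    {"id": 42, "file_name": "a.png"},
]

default_ids = assign_image_ids(coco_images)
retained_ids = assign_image_ids(coco_images, retain_image_id=True)
```

With the sample data, the default mode maps a.png to 0 and b.png to 1, while the opt-in mode keeps 42 and 99.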

Files changed (high level)

  • ethology/io/annotations/load_bboxes.py
    • Added retain_image_id parameter to from_files, _df_from_single_file, _df_from_multiple_files.
    • Modified _df_rows_from_valid_COCO_file and _df_rows_from_valid_VIA_file to accept retain_image_id and preserve input IDs when requested.
  • tests/test_retain_image_id.py
    • New unit test that asserts COCO image IDs (e.g. 42, 99) are preserved in ds.coords["image_id"] when retain_image_id=True.

Why this approach

  • Minimal, backwards-compatible change: keeps current indexing by default and exposes an opt-in flag for users who require provenance-preserving IDs.
  • Keeps internal data types stable (we only preserve numeric VIA keys for safety).
  • Includes a unit test that exercises the public API.
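
The "numeric VIA keys only" rule can be pictured with the sketch below. It is an illustration under assumptions, not the code in this PR: the helper names are hypothetical, and falling back for the whole file as soon as one key is non-numeric is an assumption made here to avoid mixing preserved and reassigned IDs.

```python
def coerce_via_key(key):
    """Return the key as an int if possible, else None (illustrative helper)."""
    try:
        return int(key)
    except (TypeError, ValueError):
        return None


def via_image_ids(filename_to_key):
    """Map filenames to image IDs from VIA metadata keys (illustrative helper).

    filename_to_key: dict mapping each filename to its VIA metadata key.
    """
    coerced = {name: coerce_via_key(key) for name, key in filename_to_key.items()}
    if all(value is not None for value in coerced.values()):
        # Every key is an integer string: preserve them as image IDs
        return coerced
    # Assumption: any non-numeric key triggers the 0-based fallback
    # (indices following sorted filenames) for the whole file
    return {name: idx for idx, name in enumerate(sorted(filename_to_key))}


numeric = via_image_ids({"b.png": "7", "a.png": "3"})
mixed = via_image_ids({"b.png": "b.png1024", "a.png": "3"})
```

Keeping the output integer-typed in both branches is what lets the image_id coordinate keep a stable dtype regardless of the input file.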

Tests / CI

  • Local test run with pytest -q: all tests passed (170 passed).
  • Pre-commit was run locally; some hooks auto-fixed files (end-of-file, formatting). The check-manifest failure was due to an accidental .venv directory tracked in the repo; this PR removes those .venv files from version control (they were deleted in the commit) so the packaging manifest check should be satisfied on the next CI run.

How to reproduce / quick usage

from ethology.io.annotations import load_bboxes

# COCO example — preserve original COCO IDs in ds.coords['image_id']
ds = load_bboxes.from_files("path/to/annotations.json", format="COCO", retain_image_id=True)

# VIA example — numeric VIA keys will be preserved when possible
ds_via = load_bboxes.from_files("path/to/via.json", format="VIA", retain_image_id=True)

@codecov

codecov Bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.51%. Comparing base (33adaa5) to head (f2e0221).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #138      +/-   ##
==========================================
+ Coverage   99.45%   99.51%   +0.05%     
==========================================
  Files           8        8              
  Lines         554      614      +60     
==========================================
+ Hits          551      611      +60     
  Misses          3        3              

GauravSRC and others added 2 commits March 13, 2026 20:40
Cover the 26 lines flagged by Codecov:
- VIA format with numeric and non-numeric metadata keys
- COCO/VIA non-existent file paths (continue branches)
- COCO/VIA invalid JSON (except branches)
- _compute_filename_to_original_id fallback return {}
- retain_image_id=False default behaviour
@GauravSRC
Author

Hi @sfmig - I've addressed the Codecov patch coverage failure. The 26 uncovered lines were all in the error-handling and VIA branches of the new _compute_filename_to_original_id* helpers. I've expanded tests/test_retain_image_id.py to cover:

  • VIA format with numeric and non-numeric metadata keys
  • The "file not found" continue branches in both COCO and VIA helpers
  • The invalid-JSON except branches in both helpers
  • The dispatcher's return {} fallback
  • The retain_image_id=False default path

All 170+ tests pass locally. Would you take a look when you get a chance? Happy to adjust anything based on your feedback.
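
For reviewers who want a picture of the branches listed above, a rough sketch of the pattern follows. It is a simplified stand-in, not the helpers in this PR: the function name, its signature, and the COCO-only dispatch are assumptions made for illustration.

```python
import json
from pathlib import Path


def filename_to_original_id(paths, fmt):
    """Collect a filename -> original ID mapping (illustrative stand-in)."""
    if fmt != "COCO":
        # Fallback for formats this sketch does not handle
        return {}
    mapping = {}
    for path in map(Path, paths):
        if not path.is_file():
            # "File not found" branch: skip and move on
            continue
        try:
            data = json.loads(path.read_text())
        except (ValueError, OSError):
            # Invalid JSON (or unreadable file): skip and move on
            continue
        if not isinstance(data, dict):
            continue
        for img in data.get("images", []):
            mapping[img["file_name"]] = img["id"]
    return mapping
```

Each commented branch corresponds to one of the cases in the list, which is why a test needs a missing path, a malformed file, and an unsupported format to reach them all.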
