Draft
76 commits
5a11d01
Notebook to visualise detections only
sfmig Apr 28, 2025
56c1752
Add boxmot dependency
sfmig May 2, 2025
6985f74
Rename and add new notebook
sfmig Jun 26, 2025
fb52d3c
Add dependencies for metrics computation
sfmig Jun 26, 2025
0e3194c
Notebook to run and evaluate detector on a dataset (WIP)
sfmig Jun 26, 2025
1c3fbe9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jun 26, 2025
0e6fe5c
exploring pycoco tools
sfmig Jun 30, 2025
493eb27
Merge branch 'smg/detections-flicker' of github.com:neuroinformatics-…
sfmig Jun 30, 2025
6f27465
Explore formatting detections as xarray and exporting as COCO-annotat…
sfmig Jul 1, 2025
ca7b4ac
Trying cocoeval from pycocotools (unsuccessfully)
sfmig Jul 1, 2025
9cf0372
Explore using mean_average_precision
sfmig Jul 1, 2025
c52918d
Move notebooks
sfmig Jul 1, 2025
7765bdc
Evaluate detections and plot histograms
sfmig Jul 2, 2025
831583c
Update dependencies
sfmig Jul 3, 2025
9a05c95
Add proto utilities
sfmig Jul 3, 2025
2314548
Refactor notebook using new utils. Add precision and recall to plots
sfmig Jul 3, 2025
af631b1
Change notebook to binning by diagonal
sfmig Jul 3, 2025
8a46f65
Add calibration plot
sfmig Jul 3, 2025
c2a7282
Rename
sfmig Jul 14, 2025
b9b5a99
Return detections from pytorch dataset as xarray ds
sfmig Jul 14, 2025
c9b8eff
Save detections as xarray datasets
sfmig Jul 14, 2025
6af309d
Key detections dict by image_id, rather than index in input dataset
sfmig Jul 17, 2025
66dd2cf
Update notebooks
sfmig Jul 17, 2025
60e79ec
Explore ensemble
sfmig Jul 17, 2025
0a9b880
Small edits to other notebooks
sfmig Jul 17, 2025
81549f5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 17, 2025
229fa9f
Save frames, compute precision and recall per frame
sfmig Jul 18, 2025
7566adf
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 18, 2025
78de297
cast as pytorch tensor in evaluate
sfmig Jul 18, 2025
93d2e08
Small changes and rename
sfmig Jul 23, 2025
7aa9747
Add IOU assigned to each true positive as output to Hungarian algorithm
sfmig Jul 23, 2025
ce04965
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 23, 2025
ed42850
Add image width and height as attributes to detections dataset
sfmig Jul 24, 2025
c6e14d6
Load models to cpu
sfmig Jul 24, 2025
94f3ab4
Add utils to transform detection datasets
sfmig Jul 24, 2025
97f0136
Notebook to run an ensemble on a dataset
sfmig Jul 24, 2025
29733f0
Fix import
sfmig Jul 25, 2025
fd5014a
Concatenate detection datasets per image
sfmig Jul 28, 2025
f4f8705
Exploring how to vectorise datasets/data arrays WIP
sfmig Jul 28, 2025
b79d2d0
Split detections ds formtting utils
sfmig Jul 29, 2025
57a0148
Add run detector on dataloader
sfmig Jul 29, 2025
16ef898
polish apply_ufunc approach
sfmig Jul 29, 2025
e3fb6b5
add naive approach and compare
sfmig Jul 29, 2025
7ee6960
Exploring vectorising nms (not working)
sfmig Jul 29, 2025
441a69f
Load annotations as ds
sfmig Jul 29, 2025
4f62cff
Add evaluate functions for ds
sfmig Jul 29, 2025
71a07a9
Add evaluation to notebook
sfmig Jul 29, 2025
f525f34
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 29, 2025
3fd024e
Clean up ensemble notebook
sfmig Jul 29, 2025
6405a06
Update binned notebook
sfmig Jul 29, 2025
9b54ef0
Merge branch 'smg/detections-flicker' of github.com:neuroinformatics-…
sfmig Jul 29, 2025
95e553b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 29, 2025
9aecac8
Convert torch dataset to detections dataset
sfmig Jul 30, 2025
b9e4a3c
Clean up and add adhoc test
sfmig Jul 30, 2025
f21b2a5
Change variable names
sfmig Jul 30, 2025
18ce9a7
Run on OOD data
sfmig Jul 30, 2025
2161a59
Attempt to generalise ensemble WIP
sfmig Jul 31, 2025
31978a3
Remove fused detections with confidence below th
sfmig Jul 31, 2025
3eb7a47
Return image_ids when splitting. Apply confidence th after ensembling…
sfmig Jul 31, 2025
ea6431c
Add notes to fix
sfmig Aug 1, 2025
2b18620
Combine detections_dict_as_ds and detections_dict_as_ds_batch
sfmig Aug 1, 2025
116f542
Add transform from detection ds to movement-like ds. Add transform fr…
sfmig Aug 1, 2025
1d11c95
Fix wrong kwarg to cocnat
sfmig Aug 1, 2025
b409308
Run indomain add some comments
sfmig Aug 1, 2025
d8bee23
Add notebook to run ensemble on video (WIP)
sfmig Aug 1, 2025
1113db3
Run ensemble on video first draft
sfmig Aug 28, 2025
ac93567
Define intervals in calibration curve
sfmig Sep 1, 2025
8df1375
Add description to model
sfmig Sep 1, 2025
9fe6826
Working on ensemble on eval dataset
sfmig Sep 1, 2025
3fd9479
Use botsort for ensemble detections
sfmig Sep 1, 2025
777cc3d
add boxsort as dep
sfmig Sep 1, 2025
e229f89
Merge branch 'smg/detections-flicker' of github.com:neuroinformatics-…
sfmig Sep 1, 2025
96daf68
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 1, 2025
a6abeae
Add detect only notebooks
sfmig Sep 1, 2025
9e50524
Merge branch 'smg/detections-flicker' of github.com:neuroinformatics-…
sfmig Sep 1, 2025
5d60959
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 1, 2025
4 changes: 4 additions & 0 deletions MANIFEST.in
Original file line number Diff line number Diff line change
@@ -13,3 +13,7 @@ recursive-include docs *.md *.rst *.py
# Include json schemas
recursive-include ethology/annotations/json_schemas/schemas *.json
recursive-include ethology/annotations/json_schemas/schemas *.md


# Temporarily include notebooks
recursive-include notebooks *.py
6 changes: 6 additions & 0 deletions ethology/__init__.py
@@ -1,5 +1,11 @@
from importlib.metadata import PackageNotFoundError, version

import xarray as xr

# Set xarray options
# show collapsed attributes by default
xr.set_options(display_expand_attrs=False)

try:
__version__ = version("ethology")
except PackageNotFoundError:
173 changes: 157 additions & 16 deletions ethology/annotations/io/load_bboxes.py
@@ -4,7 +4,9 @@
from pathlib import Path
from typing import Literal

import numpy as np
import pandas as pd
import xarray as xr

from ethology.annotations.validators import ValidCOCO, ValidVIA

@@ -30,7 +32,7 @@ def from_files(
format: Literal["VIA", "COCO"],
images_dirs: Path | str | list[Path | str] | None = None,
) -> xr.Dataset:
"""Read input annotation files as a bboxes dataframe.
"""Read input annotation files as a bboxes xarray dataset.

Parameters
----------
@@ -44,17 +46,29 @@ def from_files(

Returns
-------
pd.DataFrame
Bounding boxes annotations dataframe. The dataframe is indexed
by "annotation_id" and has the following columns: "image_filename",
"image_id", "image_width", "image_height", "x_min", "y_min",
"width", "height", "supercategory", "category". It also has the
following attributes: "annotation_files", "annotation_format",
"images_directories". The "image_id" is assigned based
on the alphabetically sorted list of unique image filenames across all
input files. The "category_id" column is always a 0-based integer,
except for VIA files where the values specified in the input file
are retained.
xr.Dataset
Bounding boxes annotations xarray dataset. The dataset has the
following dimensions: "image_id", "space", "id".
The "image_id" is assigned based on the alphabetically sorted list
of unique image filenames across all input files. The "space"
dimension holds the "x" or "y" coordinates. The "id" dimension
corresponds to the annotation ID per image, assigned from 0 to one less
than the max number of annotations per image in the full dataset.

The dataset consists of three arrays:
- "position": (image_id, space, id), the bbox centre
- "shape": (image_id, space, id), the bbox width and height
- "category": (image_id, id)
The "category" array holds 0-based integers, except for VIA
files where the values specified in the input file are retained.

The dataset attributes include:
- "map_category_id_to_category": map from category_id to category name
- "map_image_id_to_filename": map from image_id to image filename
- "images_directories": list of paths to the directories containing
the images the annotations refer to (optional)
- "annotation_files": list of paths to the input annotation files
- "annotation_format": format of the input annotation files

Notes
-----
@@ -66,7 +80,18 @@ def from_files(
image IDs to images that have the same name but appear in different input
annotation files, you can either make the image filenames distinct before
loading the data, or you can load the data from each file
as a separate dataframe, and then concatenate them as desired.
as a separate xarray dataset, and then concatenate them as desired.

Examples
--------
>>> ds = from_files(
... file_paths=[
... "path/to/annotation_file_1.json",
... "path/to/annotation_file_2.json",
... ],
... format="VIA",
... images_dirs=["path/to/images_dir_1", "path/to/images_dir_2"],
... )

See Also
--------
@@ -82,14 +107,33 @@ def from_files(
else:
df_all = _from_single_file(file_paths, format=format)

# Add metadata
df_all.attrs = {
# Get map from image_id to image_filename
mapping_df = df_all[["image_filename", "image_id"]].drop_duplicates()
map_image_id_to_filename = mapping_df.set_index("image_id").to_dict()[
"image_filename"
]

# Get map from category_id to category
map_category_id_to_category = (
df_all[["category_id", "category"]]
.drop_duplicates()
.set_index("category_id")
.to_dict()["category"]
)

# Convert to xarray dataset
ds = _df_to_xarray_ds(df_all)

# Add metadata to the dataset
ds.attrs = {
"annotation_files": file_paths,
"annotation_format": format,
"map_category_id_to_category": map_category_id_to_category,
"map_image_id_to_filename": map_image_id_to_filename,
"images_directories": images_dirs,
}

return df_all
return ds
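
The two ID maps built in `from_files` can be illustrated on a toy dataframe. This is a minimal sketch, assuming a dataframe with the same column names as `df_all`; the rows and category names are made up for illustration, and the maps are built by selecting a single column after `set_index`, which should be equivalent to the `to_dict()[...]` form used in the diff.

```python
import pandas as pd

# Toy annotations table (hypothetical values) with the columns used above
df_all = pd.DataFrame(
    {
        "image_filename": ["a.png", "a.png", "b.png"],
        "image_id": [0, 0, 1],
        "category_id": [0, 1, 0],
        "category": ["crab", "burrow", "crab"],
    }
)

# Keep one row per unique (filename, id) pair, then index by id
map_image_id_to_filename = (
    df_all[["image_filename", "image_id"]]
    .drop_duplicates()
    .set_index("image_id")["image_filename"]
    .to_dict()
)

# Same pattern for the category names
map_category_id_to_category = (
    df_all[["category_id", "category"]]
    .drop_duplicates()
    .set_index("category_id")["category"]
    .to_dict()
)

print(map_image_id_to_filename)
print(map_category_id_to_category)
```

Note that if one ID were paired with two different names, `to_dict()` would silently keep only the last pair, so a uniqueness check on the ID column may be worth adding.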


def _from_multiple_files(
@@ -397,3 +441,100 @@ def _VIA_category_id_as_int(df: pd.DataFrame) -> pd.DataFrame:
except ValueError:
df["category_id"] = df["category"].factorize(sort=True)[0]
return df


def _df_to_xarray_ds(df: pd.DataFrame) -> xr.Dataset:
    """Convert bounding boxes annotations dataframe to an xarray dataset.

    Parameters
    ----------
    df : pd.DataFrame
        Bounding boxes annotations dataframe.

    Returns
    -------
    xr.Dataset
        an xarray dataset with the following dimensions:
        - "image_id": holds the 0-based index of the image in the "images"
          list of the COCO JSON file;
        - "space": "x" or "y";
        - "id": annotation ID per image, assigned from 0 to one less than
          the max number of annotations per image in the full dataset.

        The dataset is made up of the following arrays:
        - position: (image_id, space, id), the bbox centre
        - shape: (image_id, space, id), the bbox width and height
        - category: (image_id, id)

        Images with fewer annotations than the maximum are padded with NaN
        ("position", "shape") or -1 ("category").

    """
    # Compute max number of annotations per image
    max_annotations_per_image = df["image_id"].value_counts().max()

    # Sort the dataframe by image_id
    # Note: the input annotation ID is unique across the dataframe
    df = df.sort_values(by=["image_id"])

    # Compute indices of the rows where the image ID switches
    # (flatnonzero always returns a 1D array, also for a single image)
    bool_id_diff_from_prev = df["image_id"].ne(df["image_id"].shift())
    indices_id_switch = np.flatnonzero(bool_id_diff_from_prev.to_numpy())[1:]

    # Stack position, shape and category arrays along ID axis
    map_key_to_columns = {
        "position_array": ["x_min", "y_min"],
        "shape_array": ["width", "height"],
        "category_array": ["category_id"],
    }
    map_key_to_padding = {
        "position_array": (np.float64, np.nan),
        "shape_array": (np.float64, np.nan),
        "category_array": (int, -1),
    }
    array_dict = {}
    for key in map_key_to_columns:
        # extract annotations per image
        list_arrays = np.split(
            df[map_key_to_columns[key]].to_numpy(
                dtype=map_key_to_padding[key][0]  # type: ignore
            ),
            indices_id_switch,  # indices along axis=0
        )

        # pad arrays along the annotation ID axis with the fill value
        # for this key (NaN for floats, -1 for category)
        list_arrays_padded = [
            np.pad(
                arr,
                ((0, max_annotations_per_image - arr.shape[0]), (0, 0)),
                constant_values=map_key_to_padding[key][1],  # type: ignore
            )
            for arr in list_arrays
        ]

        # stack along the first axis (image_id)
        array_dict[key] = np.stack(list_arrays_padded, axis=0)

        # drop the trailing singleton axis for category; otherwise
        # reorder axes to (image_id, space, id). Squeezing only the last
        # axis keeps the image_id and id axes when they have length 1.
        if "category" in key:
            array_dict[key] = array_dict[key].squeeze(axis=-1)
        else:
            array_dict[key] = np.moveaxis(array_dict[key], -1, 1)

    # ----
    # Modify x_min and y_min to represent the bbox centre
    array_dict["position_array"] += array_dict["shape_array"] / 2

    # Create xarray dataset
    return xr.Dataset(
        data_vars=dict(
            position=(
                ["image_id", "space", "id"],
                array_dict["position_array"],
            ),
            shape=(["image_id", "space", "id"], array_dict["shape_array"]),
            category=(["image_id", "id"], array_dict["category_array"]),
        ),
        coords=dict(
            image_id=df["image_id"].unique(),
            space=["x", "y"],
            id=range(max_annotations_per_image),
            # annotation ID per frame; could be consistent across frames
            # or not
        ),
    )
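
The split, pad and stack steps in `_df_to_xarray_ds` can be sketched with NumPy alone. This is an illustrative example under stated assumptions, not the PR's implementation: the image IDs and coordinates are made up, and the switch indices are computed with `np.diff` rather than with the pandas `shift` comparison.

```python
import numpy as np

# Hypothetical data: sorted image IDs and one (x, y) row per annotation
image_ids = np.array([0, 0, 0, 1, 2, 2])
xy = np.arange(12, dtype=float).reshape(6, 2)

# Row indices at which a new image starts (the first row is excluded)
split_at = np.flatnonzero(np.diff(image_ids)) + 1

# Largest number of annotations found in any single image
_, counts = np.unique(image_ids, return_counts=True)
n_max = int(counts.max())

# Split rows per image and pad each chunk with NaN along the annotation axis
chunks = np.split(xy, split_at)
padded = [
    np.pad(c, ((0, n_max - c.shape[0]), (0, 0)), constant_values=np.nan)
    for c in chunks
]

# Stack to (image_id, id, space), then reorder to (image_id, space, id)
stacked = np.moveaxis(np.stack(padded, axis=0), -1, 1)

print(stacked.shape)
```

Here image 1 has a single annotation, so its remaining two slots along the last axis are NaN. The padding trades memory for a rectangular array that xarray can label with dimensions.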
103 changes: 103 additions & 0 deletions ethology/datasets/convert.py
@@ -0,0 +1,103 @@
"""Convert between dataset formats."""

from pathlib import Path

import numpy as np
import pandas as pd
import torch
import xarray as xr


def torch_dataset_to_xr_dataset(
    torch_dataset: torch.utils.data.Dataset,
) -> xr.Dataset:
    """Convert a torch dataset to an xarray detections dataset."""
    # Read list of annotations as a dataframe
    list_annot = [annot for _img, annot in torch_dataset]
    df_annot = pd.DataFrame(list_annot)

    # Compute centroid, shape and labels
    # (boxes are assumed to be in x_min, y_min, x_max, y_max format)
    df_annot["centroid"] = df_annot["boxes"].apply(
        lambda x: (0.5 * (x[:, 0:2] + x[:, 2:4])).numpy().astype(float)
    )
    df_annot["shape"] = df_annot["boxes"].apply(
        lambda x: (x[:, 2:4] - x[:, 0:2]).numpy().astype(float)
    )
    df_annot["labels"] = df_annot["labels"].apply(
        lambda x: x.numpy().reshape(-1, 1).astype(int)
    )

    # Compute maximum number of annotations per image
    df_annot["n_annotations"] = df_annot["boxes"].apply(lambda x: x.shape[0])
    n_max_annotations = df_annot["n_annotations"].max()

    # Pad arrays to n_max_annotations
    # Each padded array is transposed so that the stacked result has
    # dimensions (image_id, space, id), or (image_id, 1, id) for labels
    array_dict = {}
    map_name_to_padding = {
        "centroid": np.nan,
        "shape": np.nan,
        "labels": -1,
    }
    for array_name in map_name_to_padding:
        array_dict[array_name] = np.stack(
            [
                np.pad(
                    arr,
                    ((0, n_max_annotations - arr.shape[0]), (0, 0)),
                    mode="constant",
                    constant_values=map_name_to_padding[array_name],
                ).T
                for arr in df_annot[array_name].to_list()
            ]
        )

    # Return xarray dataset
    # Only the singleton axis of the labels array is squeezed, so that
    # the image_id and id axes are kept when they have length 1
    xr_dataset = xr.Dataset(
        data_vars={
            "position": (["image_id", "space", "id"], array_dict["centroid"]),
            "shape": (["image_id", "space", "id"], array_dict["shape"]),
            "category": (
                ["image_id", "id"],
                array_dict["labels"].squeeze(axis=1),
            ),
        },
        coords={
            "image_id": df_annot["image_id"].values,
            "space": ["x", "y"],
            "id": range(n_max_annotations),
        },
    )

    # Add metadata
    root = find_nested_root(torch_dataset)
    if root:
        xr_dataset.attrs["images_directories"] = root

    return xr_dataset
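
The centroid and shape computation in `torch_dataset_to_xr_dataset` can be checked on a couple of boxes. This is a minimal sketch, assuming the boxes follow the `(x_min, y_min, x_max, y_max)` convention that the slicing implies; NumPy arrays stand in for the torch tensors, and the box values are made up.

```python
import numpy as np

# Hypothetical boxes, one row per box: x_min, y_min, x_max, y_max
boxes = np.array(
    [
        [10.0, 20.0, 30.0, 60.0],
        [0.0, 0.0, 4.0, 2.0],
    ]
)

# Centre of each box: midpoint between the two corners
centroid = 0.5 * (boxes[:, 0:2] + boxes[:, 2:4])

# Width and height of each box: difference between the two corners
shape = boxes[:, 2:4] - boxes[:, 0:2]

# Inverse transform, back to corner format
recovered = np.hstack([centroid - shape / 2, centroid + shape / 2])

print(centroid)
print(shape)
```

Storing centre and shape rather than corners matches the "position" and "shape" arrays produced by `_df_to_xarray_ds`, so datasets from both sources share one layout.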


def find_nested_root(dataset: torch.utils.data.Dataset) -> str | Path | None:
    """Find root of a possibly nested dataset.

    Parameters
    ----------
    dataset : torch.utils.data.Dataset
        The dataset to check. It may be the result of multiple
        splits, and therefore be nested.

    Returns
    -------
    str or Path or None
        The nested root value for the dataset, or None if not found

    """
    current = dataset

    # Check current level
    if hasattr(current, "root"):
        return current.root

    # Check through dataset levels
    while hasattr(current, "dataset"):
        current = current.dataset
        if hasattr(current, "root"):
            return current.root

    return None
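
The traversal in `find_nested_root` can be exercised without torch. The sketch below uses two hypothetical stand-in classes: `BaseDataset` for a dataset that stores a `root`, and `Wrapper` for a wrapper that exposes the wrapped dataset as `.dataset`, as `torch.utils.data.Subset` does. The function `find_root` is an illustrative rewrite of the same logic, not the PR's code.

```python
from pathlib import Path


class BaseDataset:
    """Hypothetical stand-in for a dataset that stores its image root."""

    def __init__(self, root):
        self.root = root


class Wrapper:
    """Hypothetical stand-in for a wrapper such as a dataset subset."""

    def __init__(self, dataset):
        self.dataset = dataset


def find_root(dataset):
    """Follow `.dataset` links until an object with `.root` is found."""
    current = dataset
    while True:
        if hasattr(current, "root"):
            return current.root  # the root value, not the dataset object
        if not hasattr(current, "dataset"):
            return None  # reached the innermost level without a root
        current = current.dataset


# A dataset wrapped twice, as after two consecutive splits
nested = Wrapper(Wrapper(BaseDataset(Path("path/to/images"))))

print(find_root(nested))
print(find_root(object()))
```

The unwrapped case matters as much as the nested one: a dataset that has `root` at the top level should yield the root value itself.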