Draft: conversion from napari layers to movement datasets by anna-teruel · Pull Request #1011 · neuroinformatics-unit/movement

anna-teruel · 2026-05-29T14:40:01Z

Description

This PR is part of #1008 and #997. It is an early draft towards saving napari layers back to movement xarray datasets and adding a save widget to the napari GUI.

Bug fix
Addition of a new feature
Other

Why is this PR needed?

Currently, movement supports converting movement datasets into napari layers, but the reverse conversion is also needed for workflows where users manually edit tracking data in napari and then want to save the corrected data (#993).

What does this PR do?

This draft PR adds an initial implementation of a napari_layers_to_ds() function to convert napari tracking layer data back into a movement xarray.Dataset.

At this stage, the PR is opened as a draft to get early feedback on the implementation and API design before extending it further.

References

Part of:

Related to:

Interactive napari widget for manual pose-track refinement #993

How has this PR been tested?

The code has been tested manually in a local Jupyter notebook using example tracking data.

Formal pytest tests have not been added yet because this PR is currently opened as a draft for early feedback on the implementation.

Is this a breaking change?

No. This PR adds new functionality and should not change existing behavior.

Does this PR require an update to the documentation?

Documentation and usage examples will be added once the implementation is finalized.
Because this is currently a draft PR, documentation has not yet been added.

Remaining work

This PR is opened as a draft to gather early feedback on the implementation. The following items are still planned:

Add support for bboxes tracking data. The current implementation has only been developed and manually tested with pose-estimation data (poses).
Add unit tests for napari_layers_to_ds() and validate behavior across different datasets.
Implement an initial save_widget providing basic save functionality, including output directory browsing.
Add tests for the save widget.
Preserve fps information during napari ↔ movement conversion.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

codecov · 2026-06-01T16:45:32Z

Codecov Report

❌ Patch coverage is 58.82353% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.75%. Comparing base (2749fe5) to head (3b93502).
⚠️ Report is 18 commits behind head on main.

Files with missing lines	Patch %	Lines
movement/napari/convert.py	58.82%	7 Missing ⚠️

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##              main    #1011      +/-   ##
===========================================
- Coverage   100.00%   99.75%   -0.25%
===========================================
  Files           41       41
  Lines         2846     2838       -8
===========================================
- Hits          2846     2831      -15
- Misses           0        7       +7
```


sonarqubecloud · 2026-06-02T10:01:44Z

❌ The last analysis has failed.

See analysis details on SonarQube Cloud


sonarqubecloud · 2026-06-02T13:36:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

anna-teruel · 2026-06-02T13:54:53Z

While writing tests for napari_layers_to_ds(), I noticed an edge case with valid_bboxes_dataset_with_nan.

The fixture introduces NaNs only in the position array, not in shape. As a result, the dataset used for testing contains frames where:

position = NaN
shape = [60, 40]  # example

Then, testing napari_layers_to_ds() fails because shape is reconstructed from the bbox vertices. When position is NaN, the bbox vertices are NaN, and consequently, shape also becomes NaN.
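To make that failure mode concrete, here is a minimal NumPy sketch. It assumes an axis-aligned box whose `position` is the centroid (x, y) and whose `shape` is (width, height); `box_vertices` and `shape_from_vertices` are hypothetical helpers for illustration, not the code in this PR. A NaN centroid turns every corner into NaN, so the recovered shape is NaN too.

```python
import numpy as np


def box_vertices(centroid, shape):
    """Return the four corners of an axis-aligned box.

    Hypothetical helper: ``centroid`` is (x, y) and ``shape`` is
    (width, height), matching movement's ``position`` and ``shape``.
    """
    x, y = centroid
    w, h = shape
    return np.array(
        [
            [x - w / 2, y - h / 2],
            [x + w / 2, y - h / 2],
            [x + w / 2, y + h / 2],
            [x - w / 2, y + h / 2],
        ]
    )


def shape_from_vertices(vertices):
    """Recover (width, height) as the extent of the corners."""
    return vertices.max(axis=0) - vertices.min(axis=0)


observed = box_vertices(np.array([100.0, 50.0]), np.array([60.0, 40.0]))
print(shape_from_vertices(observed))  # [60. 40.]

# A NaN centroid makes every corner NaN, so the shape cannot be recovered
missing = box_vertices(np.array([np.nan, np.nan]), np.array([60.0, 40.0]))
print(shape_from_vertices(missing))  # [nan nan]
```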

My questions are:

Should shape be preserved even when position is NaN?
Is there a realistic use case where a bbox has a missing position but a valid shape?
Or is it expected that shape becomes NaN whenever position is NaN, since the bbox geometry cannot be reconstructed from NaN vertices?

I'm not very familiar with bbox datasets, so I wanted to check before adjusting the test expectations. I hope I explained myself 😅

niksirbi · 2026-06-03T16:36:39Z

            additional line segment to the first, creating a closed loop.
            (See Notes).
        name
-            Name of the LoI that is to be created. A default name will be


Nice catch on these typos and thanks for keeping these unrelated fixes in their own commit.

When little fixes like this aren't related to what the PR is about, we prefer to open them as separate quick PRs (or even a single 'docs: fix typos' PR collecting several). It keeps each PR's history clean, telling one story, which makes reviews and future git blame easier. For context, we 'squash-merge' PRs, meaning that individual commits here won't be visible on the main branch's history; instead the PR title becomes the commit message on main after merging (and hence good PR titles are important).

For tiny stuff like this it's a soft preference, and I've violated it myself in the past.

But it's a habit worth building as changes get bigger, so I'd recommend doing it here.

The unrelated changes being cleanly isolated in their own commit will make things easier. I would handle this by using git cherry-pick to move that commit onto a fresh branch off main and open it as its own PR, then drop it from this branch (you can use git revert for that.)

anna-teruel · 2026-06-05 (reply in review thread)

Thanks for pointing this out! I think I fixed those typos while I was reading through the code, probably before I created this branch, and I didn't realize they had ended up mixed into this PR.

I agree that they should live in a separate PR. Sorry for the mess! 😅

> The unrelated changes being cleanly isolated in their own commit will make things easier. I would handle this by using git cherry-pick to move that commit onto a fresh branch off main and open it as its own PR, then drop it from this branch (you can use git revert for that.)

Thanks for this info, I'll do that ☺️

niksirbi · 2026-06-03T16:57:36Z

> While writing tests for napari_layers_to_ds(), I noticed an edge case with valid_bboxes_dataset_with_nan.

Great find @anna-teruel and thanks for digging into this rather than just tweaking the test to make it pass!

I think that fixture is unrealistic and needs updating. In a bboxes dataset, a frame where an individual is missing (not detected) is encoded as NaN in all three arrays at once—position, shape, and confidence. You can see this in the VIA-tracks loader, which initialises every array full of NaN and only writes values for frames that actually have a detection:

```python
# movement/io/load_bboxes.py
position_array   = np.full((n_frames, 2, n_individuals), np.nan, ...)
shape_array      = np.full((n_frames, 2, n_individuals), np.nan, ...)
confidence_array = np.full((n_frames, n_individuals), np.nan, ...)
# ...only observed (frame, id) entries get written
```
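A toy illustration of that "start all-NaN, write only what was observed" pattern, as a minimal sketch rather than the actual loader code: the sizes and the `detections` tuples below are made-up assumptions. The point is that every (frame, individual) cell ends up either fully observed or fully missing across all three arrays.

```python
import numpy as np

# Toy sizes (hypothetical): 5 frames, 2 individuals
n_frames, n_individuals = 5, 2

# Start from "everything missing"
position = np.full((n_frames, 2, n_individuals), np.nan)
shape = np.full((n_frames, 2, n_individuals), np.nan)
confidence = np.full((n_frames, n_individuals), np.nan)

# Hypothetical detections: (frame, individual, x, y, width, height, conf)
detections = [
    (0, 0, 10.0, 20.0, 6.0, 4.0, 0.9),
    (1, 0, 11.0, 21.0, 6.0, 4.0, 0.8),
    (0, 1, 50.0, 60.0, 8.0, 5.0, 0.7),
]
# Only observed (frame, individual) cells are written
for frame, ind, x, y, w, h, conf in detections:
    position[frame, :, ind] = (x, y)
    shape[frame, :, ind] = (w, h)
    confidence[frame, ind] = conf

# Missingness masks agree across the three arrays
missing_pos = np.isnan(position).any(axis=1)
missing_shape = np.isnan(shape).any(axis=1)
missing_conf = np.isnan(confidence)
assert (missing_pos == missing_shape).all()
assert (missing_pos == missing_conf).all()
print(missing_pos.sum())  # 7
```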

So to answer your three questions directly:

Should shape be preserved when position is NaN?

No.

Realistic use case for missing position + valid shape?

No, position + shape are a representation of the same detection, so their absence/presence should always go together.

Is shape→NaN whenever position→NaN expected?

Yes, exactly. That would be the consistent behaviour.

As you've correctly identified, the valid_bboxes_dataset_with_nan fixture currently injects NaN into position only, leaving shape and confidence as valid (an unrealistic state our own loaders never actually produce). This was not a problem before, as existing tests using this fixture only cared about position. It is one now.

My suggestion would be to update the shared fixture so a missing detection is NaN across all three arrays (and update its docstring to match). Something like:

```python
import numpy as np
import pytest


@pytest.fixture
def valid_bboxes_dataset_with_nan(valid_bboxes_dataset):
    """Return a valid bboxes dataset with NaN values for some detections.

    A missing detection is represented by NaN in the ``position``, ``shape``
    and ``confidence`` arrays simultaneously, mirroring how real bounding box
    data (e.g. from VIA-tracks) encodes frames where an individual is absent.
    Here, individual ``id_0`` is missing at frames 3, 7 and 8.
    """
    nan_selection = {"individual": "id_0", "time": [3, 7, 8]}
    valid_bboxes_dataset.position.loc[nan_selection] = np.nan
    valid_bboxes_dataset.shape.loc[nan_selection] = np.nan
    valid_bboxes_dataset.confidence.loc[nan_selection] = np.nan
    return valid_bboxes_dataset
```

I checked locally and this is safe (all the other tests that use this fixture still pass).
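If it helps when adjusting test expectations, the invariant above can also be checked directly. This is a minimal sketch on a toy xarray dataset built to resemble a bboxes dataset (dims `time`, `space`, `individual` are assumed from the fixture above); `nan_mask_consistent` is a hypothetical helper, not an existing movement function.

```python
import numpy as np
import xarray as xr


def nan_mask_consistent(ds):
    """Return True if position, shape and confidence are NaN together.

    Hypothetical helper; assumes ``space`` is the coordinate dimension
    of both ``position`` and ``shape``.
    """
    pos_missing = ds["position"].isnull().any("space")
    shape_missing = ds["shape"].isnull().any("space")
    conf_missing = ds["confidence"].isnull()
    return bool((pos_missing == shape_missing).all()) and bool(
        (pos_missing == conf_missing).all()
    )


# Toy bboxes-like dataset (synthetic values, no NaNs yet)
rng = np.random.default_rng(0)
dims3 = ("time", "space", "individual")
ds = xr.Dataset(
    {
        "position": (dims3, rng.random((10, 2, 2))),
        "shape": (dims3, rng.random((10, 2, 2))),
        "confidence": (("time", "individual"), rng.random((10, 2))),
    },
    coords={"time": np.arange(10), "space": ["x", "y"],
            "individual": ["id_0", "id_1"]},
)
nan_selection = {"individual": "id_0", "time": [3, 7, 8]}

# Old fixture behaviour: NaN in position only
old = ds.copy(deep=True)
old["position"].loc[nan_selection] = np.nan
print(nan_mask_consistent(old))  # False

# Proposed fixture behaviour: NaN in all three arrays
new = ds.copy(deep=True)
for var in ("position", "shape", "confidence"):
    new[var].loc[nan_selection] = np.nan
print(nan_mask_consistent(new))  # True
```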

sfmig · 2026-06-04T12:22:56Z

Good catch re the fixture @anna-teruel !

If you want, you can also open that as a very small PR with just that change and we can merge that before this one.

anna-teruel · 2026-06-05T07:28:00Z

Thank you both @niksirbi and @sfmig for the feedback! I will open a new PR for this 👍

niksirbi · 2026-06-05T09:27:14Z

Hi @anna-teruel, your napari_layers_to_ds() is a great starting point.

Reading it made me realise that there's one central question about the core design we have yet to address, because everything else will depend on the answer to that. None of this is "the current code is wrong"; rather, your draft has revealed some big questions, which is precisely why we love draft PRs!

The issue with the current implementation

napari_layers_to_ds() is written as the mathematical inverse of ds_to_napari_layers(): it consumes the exact arrays that the first function produces and inverts the arithmetic.

That on its own won't work for reading edited layer data, which is what we ultimately need for the feature (#993).

Let's carefully think through the full journey some pose estimation data may go through:

Data gets loaded from a source file (via movement.io.load.load_dataset) into a movement xarray dataset.
ds_to_napari_layers() will convert it to a napari Tracks array and a properties dataframe.
NaN rows are filtered out inside loader_widget.py before going into napari layers (search for self.data_non_nan to see what I mean)
The user may edit the points (in the most minimal case moving a single point).
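A toy simulation of that forward journey shows why a pure arithmetic inverse breaks. This is a minimal sketch, assuming a poses array with dims (time, space, keypoint) and napari-style rows of (frame, y, x); it does not call the real `ds_to_napari_layers()` or loader widget. Once NaN rows are dropped, the row count no longer matches the original grid, so reshaping in an assumed order fails.

```python
import numpy as np

# Toy poses data (hypothetical): 3 frames, 2 keypoints, 1 individual,
# stored as (time, space, keypoint) with space = (x, y)
position = np.arange(12, dtype=float).reshape(3, 2, 2)
position[1, :, 0] = np.nan  # keypoint 0 missing at frame 1

# Flatten into napari-style rows (frame, y, x), one per (frame, keypoint)
rows = []
for frame in range(3):
    for kpt in range(2):
        x, y = position[frame, :, kpt]
        rows.append((frame, y, x))
data = np.array(rows)

# Drop rows containing NaN (as the loader widget does before display)
data_non_nan = data[~np.isnan(data).any(axis=1)]

# The user drags one point
data_non_nan[0, 1:] += 5.0

print(data.shape, data_non_nan.shape)  # (6, 3) (5, 3)

# A naive inverse that assumes the original row count and order fails
try:
    data_non_nan[:, 1:].reshape(3, 2, 2)
    naive_inverse_ok = True
except ValueError:
    naive_inverse_ok = False
print(naive_inverse_ok)  # False
```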

Now, if we want to reverse that journey, we'd have to:

Decide which napari layer serves as the source of truth, as pose datasets get loaded as both Points and Tracks layers. Let's say we pick the Points layer for that (see next section for a full explanation)
Take that source-of-truth layer and the properties dataframe, and reconstruct an xarray dataset in a way that both preserves the NaN values of the original data in their original positions AND keeps the edits the user has made.
That reconstructed xarray dataset will eventually get written to a file via one of our saving functions. Initially we can focus our efforts on saving it via xarray's native to_netcdf() method: `ds.to_netcdf("my_data_processed.nc")`.

So the round trip we are actually interested in facilitating (and must verify in tests), is not simply ds_to_napari_layers() followed immediately by napari_layers_to_ds(). In-between we have to drop NaNs and edit a point before we start the return journey. That kind of round-trip verification would qualify as an integration test.

The central design question

As hinted above, the main question is: which napari layer is the "source of truth" when we read data back?

I think our intended UX should be for users to edit the Points layer, because that's much more feature-complete in napari than the Tracks layer, and has many convenient edit tools.

We could decide on a clean principle:

Position always comes from the Points layer, for both poses and bboxes. The Tracks layer (both ds types) and the Shapes/boxes layer (bboxes only) are derived views, kept in sync via callbacks. We never reconstruct position from them.

In this model 'Points' become the primary data layer, and the other napari layer types 'follow' it.

This means there are really two separate jobs, and we should keep them separate in the code:

Live sync (callbacks): a Points edit updates the Tracks layer (both types) and the Shapes layer (bboxes), to keep the viewer consistent while editing.
Read-back / save (napari_layers_to_ds()): turn the layers into a movement dataset at save time.
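For the live-sync job, the core of a callback could be a pure function that rebuilds Tracks data from the current Points data. This is a minimal sketch, assuming Points rows are (frame, y, x) and that one integer track ID per point can be derived from the layer's features; `tracks_from_points` is hypothetical. In napari it might be wired to the Points layer's data event (e.g. `points_layer.events.data.connect(...)`), but that wiring is not shown here.

```python
import numpy as np


def tracks_from_points(points_data, track_ids):
    """Rebuild napari Tracks data (track_id, frame, y, x) from Points data.

    Hypothetical helper. ``points_data`` rows are (frame, y, x) and
    ``track_ids`` holds one integer track ID per point, e.g. derived
    from the individual/keypoint columns of the layer's features.
    """
    points_data = np.asarray(points_data, dtype=float)
    ids = np.asarray(track_ids, dtype=float).reshape(-1, 1)
    tracks = np.hstack([ids, points_data])
    # Sort by track ID, then by frame, to keep the array tidy
    order = np.lexsort((tracks[:, 1], tracks[:, 0]))
    return tracks[order]


# Usage: three edited points belonging to two tracks
points = np.array([[1, 20.0, 10.0], [0, 21.0, 11.0], [0, 5.0, 5.0]])
tracks = tracks_from_points(points, [0, 0, 1])
print(tracks)
```

Keeping this logic free of napari objects makes it easy to unit-test separately from the GUI wiring, in line with keeping live sync and read-back as separate jobs.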

There is a second related question hiding in there, namely, what's the bboxes editing scope. Do users edit only the points (centroids), with boxes translating to follow, or can they also resize (and rotate?) the boxes?

For the purposes of the GSoC project, I propose limiting the edit scope to just the Points layer, i.e. the one representing the centroid of each box. This means users will be able to translate boxes (by dragging their centroid) and to swap their identities, but NOT resize them.

This keeps the Points-layer-first principle consistent across poses and bboxes datasets.
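Under that scope, boxes can be regenerated from the (possibly edited) centroids plus the original, unedited width/height. A minimal sketch, assuming Points rows are (frame, y, x) and stored shapes are (height, width) in napari's row/column order; `rectangles_from_centroids` is a hypothetical helper that returns one (4, 3) corner array per box.

```python
import numpy as np


def rectangles_from_centroids(centroids, shapes):
    """Return rectangle corners (frame, y, x) centred on each centroid.

    Hypothetical helper. ``centroids`` rows are (frame, y, x) from the
    Points layer; ``shapes`` rows are (height, width) and are never
    edited, so boxes can only be translated, not resized.
    """
    rects = []
    for (frame, y, x), (h, w) in zip(centroids, shapes):
        rects.append(
            np.array(
                [
                    [frame, y - h / 2, x - w / 2],
                    [frame, y - h / 2, x + w / 2],
                    [frame, y + h / 2, x + w / 2],
                    [frame, y + h / 2, x - w / 2],
                ]
            )
        )
    return rects


centroids = np.array([[0, 50.0, 100.0]])
shapes = np.array([[40.0, 60.0]])
before = rectangles_from_centroids(centroids, shapes)[0]

centroids[0, 1:] += (3.0, -2.0)  # the user drags the centroid
after = rectangles_from_centroids(centroids, shapes)[0]

# The box moved with its centroid but kept its size
extent = after[:, 1:].max(axis=0) - after[:, 1:].min(axis=0)
print(extent)  # [40. 60.]
```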

What that implies for the function's inputs and implementation

Poses: Points layer + its features (properties) is everything we need (position, confidence, individual and keypoint names, time coords). Tracks + properties would also work, but I prefer Points for the reason mentioned above.
Bboxes: Points layer (centroid → position) + Shapes layer (shape) + features (properties). The Points layer alone can't give you width/height, so bboxes conversions also need the Shapes layer to populate the shape array.

In practice, we could make the Points layer and properties dataframe required inputs to napari_layers_to_ds, with the Shapes layer (containing boxes) as an optional argument. If we pass just Points + properties, we should return a poses dataset with position and confidence arrays. If we pass Points layer + properties + Shapes layer, we should return a bboxes dataset with position + shape + confidence arrays. Not sure if this idea will work in practice, but that's where I would start.

A small but important implementation note for whichever path we take: we should reconstruct the dataset by indexing each point/box into its (individual, keypoint, frame) cell using the properties dataframe, NOT by reshaping a flat array in assumed order. The frame index should come from the layer's frame column, not properties["time"] (which is in seconds when fps is set). Missing cells should stay NaN, which is exactly what we want.
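A minimal sketch of that indexing approach for poses, under stated assumptions: Points rows are (frame, y, x) in napari's order, the properties DataFrame has `individual` and `keypoint` columns (one row per point), and the output uses dims (time, space, keypoint, individual) with space ordered (x, y). `points_to_position` is a hypothetical helper; confidence (and, for bboxes, shape) would be filled the same way before assembling the dataset.

```python
import numpy as np
import pandas as pd
import xarray as xr


def points_to_position(points_data, properties, n_frames,
                       keypoints, individuals):
    """Scatter Points rows into a NaN-filled position array by label."""
    position = np.full(
        (n_frames, 2, len(keypoints), len(individuals)), np.nan
    )
    # Look up each point's cell from its labels, not from row order
    kpt_idx = pd.Index(keypoints).get_indexer(properties["keypoint"])
    ind_idx = pd.Index(individuals).get_indexer(properties["individual"])
    # Frame comes from the layer's frame column, not properties["time"]
    frame_idx = points_data[:, 0].astype(int)
    # napari stores (y, x); the output stores (x, y)
    position[frame_idx, 0, kpt_idx, ind_idx] = points_data[:, 2]
    position[frame_idx, 1, kpt_idx, ind_idx] = points_data[:, 1]
    return xr.DataArray(
        position,
        dims=("time", "space", "keypoint", "individual"),
        coords={"time": np.arange(n_frames), "space": ["x", "y"],
                "keypoint": keypoints, "individual": individuals},
        name="position",
    )


# Rows left after NaN-dropping: "nose" at frame 1 is absent
points_data = np.array([[0, 1.0, 0.0], [0, 3.0, 2.0], [1, 7.0, 6.0],
                        [2, 9.0, 8.0], [2, 11.0, 10.0]])
properties = pd.DataFrame({
    "individual": ["id_0"] * 5,
    "keypoint": ["nose", "tail", "tail", "nose", "tail"],
})
points_data[0, 1:] += 5.0  # user edit: drag the first point

pos = points_to_position(points_data, properties, 3,
                         ["nose", "tail"], ["id_0"])
cell = dict(keypoint="nose", individual="id_0")
print(np.isnan(pos.sel(time=1, **cell).values).all())  # True
print(pos.sel(time=0, **cell).values)  # [5. 6.]
```

Because each row is placed by its labels, cells that never received a point simply stay NaN, which is the behaviour the round-trip integration test needs to verify.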

Suggested way forward for this PR

This whole discussion is relevant for the whole #993 project; we don't have to tackle all of that in this PR.

I think what we need here is just:

An integration test that takes data through a realistic round-trip journey (as described above)
An implementation of napari_layers_to_ds that passes the above tests.

A possible simplification would be to also forget about bboxes for now (gets rid of the shape complication), and just focus on making this work for poses (but we can discuss this in our next meeting).

sfmig · 2026-06-05T11:46:54Z

I agree the Points layer should be the source of truth (Tracks would be a bad choice imo, it would make our lives very difficult).

So the PRs could be:

this one: converting from napari data layer to dataset (with the caveats Niko raises above)
next one: set up the callback so that edits in Points layer propagate to the Tracks layer. I think this could be done without UI, right?

We could consider disabling the Tracks layer in the first version for clarity (this was my point here)

anna-teruel changed the title ~~Save widget~~ Draft: conversion from napari layers to movement datasets May 29, 2026

anna-teruel added 2 commits June 1, 2026 18:21

adding my custom testing notebook

f683d28

corrected some typos in docstrings

15dbf35

anna-teruel force-pushed the save-widget branch from 1a3e966 to 3b93502 Compare June 1, 2026 16:37

anna-teruel added 3 commits June 2, 2026 09:38

napari_layers_to_ds function

5bce103

supporting bboxes in napari_layers_to_ds

2422a44

adding more docstrings and fixing a small bug

90ba280

anna-teruel force-pushed the save-widget branch from 3b93502 to 90ba280 Compare June 2, 2026 09:07

pre-commit-ci Bot and others added 6 commits June 2, 2026 09:07

[pre-commit.ci] auto fixes from pre-commit.com hooks

bbeea38

for more information, see https://pre-commit.ci

docstrings

a9e4326

[pre-commit.ci] auto fixes from pre-commit.com hooks

93ec4f5

for more information, see https://pre-commit.ci

adding fps param

1c21f7d

small fix

8451085

[pre-commit.ci] auto fixes from pre-commit.com hooks

f06f28e

for more information, see https://pre-commit.ci

anna-teruel and others added 5 commits June 2, 2026 12:14

fixing conflicts

1a40f14

solve conflicts

45484c3

apply pre-commit fixes

2f72ab7

tests for napari_layer_to_ds

6bf6d86

[pre-commit.ci] auto fixes from pre-commit.com hooks

fd11ffc

for more information, see https://pre-commit.ci

niksirbi reviewed Jun 3, 2026


anna-teruel mentioned this pull request Jun 5, 2026

Update bbox nan fixture #1020

Open

7 tasks

anna-teruel wants to merge 16 commits into neuroinformatics-unit:main from anna-teruel:save-widget


3 participants
