Draft: conversion from napari layers to movement datasets#1011
Draft: conversion from napari layers to movement datasets#1011anna-teruel wants to merge 16 commits into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1011 +/- ##
===========================================
- Coverage 100.00% 99.75% -0.25%
===========================================
Files 41 41
Lines 2846 2838 -8
===========================================
- Hits 2846 2831 -15
- Misses 0 7 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
|
❌ The last analysis has failed. |
|
|
While writing tests for The fixture introduces position = NaN
shape = [60,40] #example Then, testing My questions are:
I'm not very familiar with bbox datasets, so I wanted to check before adjusting the test expectations. I hope I explained myself 😅 |
| additional line segment to the first, creating a closed loop. | ||
| (See Notes). | ||
| name | ||
| Name of the LoI that is to be created. A default name will be |
There was a problem hiding this comment.
Nice catch on these typos and thanks for keeping these unrelated fixes in their own commit.
When little fixes like this aren't related to what the PR is about, we prefer to open them as separate quick PRs (or even a single 'docs: fix typos' PR collecting several). It keeps each PR's history clean, telling one story, which makes reviews and future git blame easier. For context, we 'squash-merge' PRs, meaning that individual commits here won't be visible on the main branch's history; instead the PR title becomes the commit message on main after merging (and hence good PR titles are important).
For tiny stuff like this it's a soft preference, and I've violated it myself in the past.
But it's a habit worth building as changes get bigger, so I'd recommed doing it here.
The unrelated changes being cleanly isolated in their own commit will make things easier. I would handle this by using git cherry-pick to move that commit onto a fresh branch off main and open it as its own PR, then drop it from this branch (you can use git revert for that.)
There was a problem hiding this comment.
Thanks for pointing this out! I think I fixed those typos while I was reading through the code, probably before I created this branch, and I didn't realize they had ended up mixed into this PR.
I agree that they should live in a separate PR. Sorry for the mess! 😅
The unrelated changes being cleanly isolated in their own commit will make things easier. I would handle this by using git cherry-pick to move that commit onto a fresh branch off main and open it as its own PR, then drop it from this branch (you can use git revert for that.)
Thanks for this info, I'll do that
Great find @anna-teruel and thanks for digging into this rather than just tweaking the test to make it pass! I think that fixture is unrealistic and needs updating. In a bboxes dataset, a frame where an individual is missing (not detected) is encoded as NaN in all three arrays at once— # movement/io/load_bboxes.py
position_array = np.full((n_frames, 2, n_individuals), np.nan, ...)
shape_array = np.full((n_frames, 2, n_individuals), np.nan, ...)
confidence_array = np.full((n_frames, n_individuals), np.nan, ...)
# ...only observed (frame, id) entries get writtenSo to answer your three questions directly:
No.
No, position + shape are a representation of the same detection, so their absence/presence should always go together.
Yes, exactly. That would be the consistent behaviour. As you've correctly identified, the My suggestion would be to update the shared fixture so a missing detection is NaN across all three arrays (and update its docstring to match). Something like: @pytest.fixture
def valid_bboxes_dataset_with_nan(valid_bboxes_dataset):
"""Return a valid bboxes dataset with NaN values for some detections.
A missing detection is represented by NaN in the ``position``, ``shape``
and ``confidence`` arrays simultaneously, mirroring how real bounding box
data (e.g. from VIA-tracks) encodes frames where an individual is absent.
Here, individual ``id_0`` is missing at frames 3, 7 and 8.
"""
nan_selection = {"individual": "id_0", "time": [3, 7, 8]}
valid_bboxes_dataset.position.loc[nan_selection] = np.nan
valid_bboxes_dataset.shape.loc[nan_selection] = np.nan
valid_bboxes_dataset.confidence.loc[nan_selection] = np.nan
return valid_bboxes_datasetI checked locally and this is safe (all the other tests that use this fixture still pass). |
|
Good catch re the fixture @anna-teruel ! If you want, you can also open that as a very small PR with just that change and we can merge that before this one. |
|
Hi @anna-teruel, your Reading it made me realise that there's one central question about the core design we have yet to address, because everything else will depend on the answer to that. None of this is "the current code is wrong", it's more that your draft has revealed some big questions, which is precisely why we love draft PRs! The issue with the current implementation
That on its own won't work for reading edited layer data, which is what we ultimately need for the feature (#993). Let's carefully think through the full journey some pose estimation data may go through:
Now, if we want to reverse that journey, we'd have to:
So the round trip we are actually interested in facilitating (and must verify in tests), is not simply The central design questionAs hinted above, the main question is which napari layer is the "source of truth" when we read data back? I think our intended UX shoud be for users to edit the Points layer, because that's much more feature-complete in We could decide on a clean principle:
In this model 'Points' become the primary data layer, and the other napari layer types 'follow' it. This means there are really two separate jobs, and we should keep them separate in the code:
There is a second related question hiding in there, namely, what's the bboxes editing scope. Do users edit only the points (centroids), with boxes translating to follow, or can they also resize (and rotate?) the boxes? For the purposes of te GSoC project, I propose limiting the edit scope to just the Points layer, i.e. the one representing the centroid of each box. This means users will be able to translate boxes (by dragging their centroid) and to swap their identities, but NOT resize them. This keeps the Points-layer-first principle consistent across poses and bboxes datasets. What that implies for the function's inputs and implementation
In practice, we may make the Points layer and properties dataframe as required inputs to A small but important implementation note for whichever path we take: we should reconstruct the dataset by indexing each point/box into its Suggested way forward for this PRThis whole discussion is relevant for the whole #993 project; we don't have to tackle all of that in this PR. I think what we need here is just:
A possible simplification would be to also forget about bboxes for now (gets rid of the shape complication), and just focus on making this work for poses (but we can discuss this in our next meeting). |
|
I agree the Points layer should be the source of truth (Tracks would be a bad choice imo, it would make our lives very difficult). So the PRs could be:
We could consider disabling the Tracks layer in the first version for clarity (this was my point here) |



Description
This PR is part of #1008 and #997, and is an early draft supporting save napari layers back to movement xarrays and a save widget in the napari GUI.
Why is this PR needed?
Currently, movement supports converting movement datasets into napari layers, but the reverse conversion is also needed for workflows where users manually edit tracking data in napari and then want to save the corrected data (#993)
What does this PR do?
This draft PR adds an initial implementation of a
napari_layers_to_ds()function to convert napari tracking layer data back into amovementxarray.Dataset.At this stage, the PR is opened as a draft to get early feedback on the implementation and API design before extending it further.
References
Part of:
Related to:
How has this PR been tested?
The code has been tested manually in a local Jupyter notebook using example tracking data.
Formal
pytesttests have not been added yet because this PR is currently opened as a draft for early feedback on the implementation.Is this a breaking change?
No. This PR adds new functionality and should not change existing behavior.
Does this PR require an update to the documentation?
Documentation and usage examples will be added once the implementation is finalized.
Because this is currently a draft PR, documentation has not yet been added.
Remaining work
This PR is opened as a draft to gather early feedback on the implementation. The following items are still planned:
bboxestracking data. The current implementation has only been developed and manually tested with pose-estimation data (poses).napari_layers_to_ds()and validate behavior across different datasets.save_widgetproviding basic save functionality, including output directory browsing.fpsinformation during napari ↔ movement conversionChecklist: