-
-
Notifications
You must be signed in to change notification settings - Fork 370
Adding copy_store convenience method for Group
#3612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3612 +/- ##
==========================================
+ Coverage 59.14% 60.96% +1.82%
==========================================
Files 86 86
Lines 10172 10197 +25
==========================================
+ Hits 6016 6217 +201
+ Misses 4156 3980 -176
🚀 New features to boost your workflow:
|
|
@d-v-b I know right now this won't work when zarr v2 is used. I can add that functionality as well or we say the convenience function is not supported for zarr v2. |
|
added zarr v2 support. If in general the code is approved, I will add documentation. |
src/zarr/core/group.py
Outdated
| store: StoreLike, | ||
| *, | ||
| overwrite: bool = False, | ||
| consolidate_metadata: bool | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this here? IMO it's simpler if copying is just copying. If people want to add consolidated metadata, they should do that after copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adjusted it. It now consolidates if the source store was consolidated, but this does not require an argument. One more thing, I only take into account when consolidation happened at the root level. It could be (although the user should not do this) that consolidation happened at the subgroup level. This could be accounted for by checking if member. metadata.consolidated_metadata when looping through members. WDYT should I add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would copy the original group and its children exactly, including their consolidated metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the purpose of copying over stale consolidated metadata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it keeps the scope of this function simple -- it just copies the original group exactly. If this tool produces a different group, then it's not really copying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regarding the from_store method, I would open a separate PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consolidated metadata is already in the group metadata document for zarr v3 if "kind" is set to "inline".
👍 but doing something other (more or less) than just copying would require interpreting that which is too far IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @joshmoore the method under discussion here is a method on the Group class, not on stores. maybe this is a point of confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the only consolidated metadata-related parameter we need is whether iterating over the child groups uses their consolidated metadata (if present). See the relevant kwarg in the signature of members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added both, only thing remaining is probably adding consolidated_metadata in the async group call.
|
@d-v-b @joshmoore I think I have now addressed all comments and I also added documentation, so I think this would be ready for a final round of review. @d-v-b thanks for your patience. |
|
@d-v-b @joshmoore , just bumping in case you have some time to see whether you approve the current state:) |
| self, | ||
| store: StoreLike, | ||
| *, | ||
| overwrite: bool = False, | ||
| use_consolidated_for_children: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unlike create_group, this function doesn't take a path parameter, which means it's not possible to easily specify a location within a store for the group. until we make it easy to include this information in the store itself (#2272 ) we should probably take a path parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not certain yet, first the group is opened by calling self.open(self.store, zarr_format=target_zarr_format). If we allow for path to be passed I interpret it as subgroup within existing zarr store to which to copy whatever store you pass. Am I following you correctly?
In other words are you saying we should allow to write a store to a subgroup destination?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.open(self.store, zarr_format=target_zarr_format)
I am still confused why this line is necessary. It will open an array or group at the root of the store, which is not what we want:
>>> import zarr
>>> zg = zarr.create_group(store={}, path='group')
>>> zg
<Group memory://133419005213760/group>
>>> zg.open(zg.store)
<Group memory://133419005213760>In other words are you saying we should allow to write a store to a subgroup destination?
this method does not write a store, it writes a group into a store. In normal group creation, you declare the store, and also the path within the store. See the signature of create_group. The question is whether copy_to should follow the same pattern, and take a path argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will implement it also for the sake of symmetric API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second look I think it is still a bit confusing, the signature for create_group in the Group class itself does not have path. However, create_group in the async and sync API does include this. I take it the sole reason for create_group in the classes is to create a subgroup so I still think implementing path would be good in that case to have it model more the module level functions rather than the instance methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, create_group in the async and sync API does include this.
IMO we should treat the async / sync api create_group as the reference here. We are long overdue a cleanup of these methods / functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d-v-b implemented, let me know what you think when you have time!
| self, | ||
| store: StoreLike, | ||
| *, | ||
| overwrite: bool = False, | ||
| use_consolidated_for_children: bool = True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should add a test that checks the signature of this function against the signature of the async version. otherwise, they have a tendency to drift. Here's an example of such a test:
zarr-python/tests/test_api/test_synchronous.py
Lines 26 to 41 in ee0e69a
| def test_docstrings_match(callable_name: str) -> None: | |
| """ | |
| Tests that the docstrings for the sync and async define identical parameters. | |
| """ | |
| callable_a = getattr(synchronous, callable_name) | |
| callable_b = getattr(asynchronous, callable_name) | |
| if callable_a.__doc__ is None: | |
| assert callable_b.__doc__ is None | |
| else: | |
| params_a = NumpyDocString(callable_a.__doc__)["Parameters"] | |
| params_b = NumpyDocString(callable_b.__doc__)["Parameters"] | |
| mismatch = [] | |
| for idx, (a, b) in enumerate(zip(params_a, params_b, strict=False)): | |
| if a != b: | |
| mismatch.append((idx, (a, b))) | |
| assert mismatch == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I did not do the actual docstrings itself, but I did write a test for all methods for classes and their async counterparts. I think this is generalizable to other classes as well, though currently only Group and AsyncGroup are tested. With this, already some mismatches are detected which for now are skipped but could be addressed in a follow up PR. Doing so would require deprecating some parameters. If you agree I can create an issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see test_class_method_parameters_match
[Description of PR]
This PR adds a convenience method for copying the store of a
Groupto a destination store.TODO:
docs/user-guide/*.mdchanges/@d-v-b