Xarray Cannot Read ConsolidateMetadata Reference Output #675
Comments
I'm also not sure this is helpful, but I'm a bit confused about how `zarr.consolidate_metadata` is intended to work, or whether it ever did. I was also looking at `convenience.py` and noticed that …
This stuck out to me because, I think, this method is intended to add a metadata group. Overall, I'm not seeing where the …
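For comparison, in zarr-python 2.x the V2 branch of `zarr.consolidate_metadata` roughly collects every key ending in `.zarray`, `.zgroup` or `.zattrs`, parses each value as JSON, and stores the result under `.zmetadata` as `{"zarr_consolidated_format": 1, "metadata": {...}}`. A minimal sketch of the same idea applied directly to a kerchunk-style reference mapping follows. `consolidate_refs` is a hypothetical helper, not part of zarr or kerchunk, and it assumes metadata entries are inlined JSON text (as kerchunk writes them) while chunk entries may be `[url, offset, length]` lists.

```python
import json

# Key suffixes that hold Zarr V2 metadata documents.
ZARR_V2_META_SUFFIXES = (".zarray", ".zgroup", ".zattrs")


def consolidate_refs(refs: dict, metadata_key: str = ".zmetadata") -> dict:
    """Return a copy of a kerchunk-style reference mapping with a
    consolidated metadata entry added.

    Hypothetical helper for illustration only. Assumes every metadata
    value is inline JSON text (str or bytes); chunk references such as
    [url, offset, length] are copied through unchanged.
    """
    metadata = {
        key: json.loads(value)
        for key, value in refs.items()
        if key.endswith(ZARR_V2_META_SUFFIXES)
    }
    consolidated = {"zarr_consolidated_format": 1, "metadata": metadata}
    out = dict(refs)  # leave the caller's mapping untouched
    out[metadata_key] = json.dumps(consolidated)
    return out
```

Because the reference mapping is just keys and values, this works without touching any chunk data; it only duplicates metadata that is already present in the references.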
If I understand correctly, @martindurant suggested in fsspec/kerchunk#240 (comment) adding consolidation to `MultiZarrToZarr`. In general, I haven't seen examples of Zarr Python being used to modify kerchunk reference files, so I'm wondering whether that functionality would need to be provided by kerchunk.
You can have a look at how Xarray does it. Calling … For some thoughts on what belongs in Zarr vs. Kerchunk, I shared my perspective here: fsspec/kerchunk#377 (comment)
It was helpful to read your perspective on where features belong; thanks for sharing. I interpreted "So when it comes time to refactor those prototypes into more stable interfaces" as referring to when Zarr V3 is finalized and the base implementation is available in Zarr Python. Could the concept of a chunk manifest also apply to the V2 implementation in Zarr Python, independently of kerchunk? That would guide where references and metadata should be modified if this feature were needed in the very near future.
V2 is not extensible. The lack of extension points was one of the main reasons we needed to create V3. The lack of extensibility of V2 has driven people to implement all kinds of creative hacks at the Store level; the Kerchunk reference filesystem is a perfect example. But the limitation of the Store API is that it's just a key/value interface: Stores don't "know" anything about Arrays, Groups, etc. This has led Kerchunk to essentially re-implement a bunch of logic that exists elsewhere in the stack, e.g. for concatenating arrays. It should certainly be possible to generate V3 metadata for existing V2 datasets without rewriting any chunks. That's the best path for migrating existing data. I don't think V2 will ever be able to retroactively gain something like a chunk manifest.
Before going too far, why would you want to consolidate metadata? In the case of kerchunk, all of the metadata is already separated, and "consolidating" would just duplicate information to no benefit. Having said that, I suppose the only thing missing from the pipeline is writing the references back. As @rabernat said, they have been changed in memory but not saved. Here is an example of amending kerchunk references; it works fine for chunks or metadata!
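Saving the in-memory changes amounts to serialising the amended reference mapping back to JSON. A minimal sketch, assuming the kerchunk version-1 layout (`{"version": 1, "refs": {...}}`) in which inline values are either plain strings or `base64:`-prefixed binary; `dump_references` is a hypothetical helper, not kerchunk's or fsspec's own writer.

```python
import base64
import json


def dump_references(refs: dict) -> str:
    """Serialise an amended kerchunk-style reference mapping to
    version-1 JSON text.

    Hypothetical helper for illustration only. str values and
    [url, offset, length] lists are written as-is; bytes values
    (e.g. metadata written into the mapping in memory) are inlined.
    """
    out = {}
    for key, value in refs.items():
        if isinstance(value, bytes):
            try:
                # ASCII payloads such as JSON metadata stay readable text.
                value = value.decode("ascii")
            except UnicodeDecodeError:
                # Other binary data uses the "base64:" inline prefix.
                value = "base64:" + base64.b64encode(value).decode("ascii")
        out[key] = value
    return json.dumps({"version": 1, "refs": out})
```

Writing the returned string to a new `reference.json` would persist both amended chunk references and amended metadata.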
A lot of development has been waiting on this.
Well, persisting xarray's internal representation would have been better, but not tractable (by me!). That also assumes xarray-compatible datasets, which do not cover all zarrable datasets.
Agreed. The slow and uneven rollout of V3 has really been a drag on our whole community. 😞 There are lots of lessons to be learned from this. Fortunately, there is light at the end of the tunnel, thanks to the work @jhamman has been doing to finally get V3 implemented in Zarr Python.
Versions:
zarr: '2.16.1'
xarray: '2023.12.0'
pangeo-forge-recipes: @main
Problem:
Given a `WriteCombinedReference` Beam pipeline like this:

we can only read the reference output with `zarr.open_consolidated`, but not with `xr.open_dataset(..., consolidated=True)`.
Offending line on deserialization: https://github.com/zarr-developers/zarr-python/blob/main/zarr/storage.py#L2944
Indeed, looking at the raw `reference.json`, it doesn't contain a `.zmetadata` key, even though `zarr.consolidate_metadata` should be writing one. Total guess, but it might have to do with the version branching here: https://github.com/zarr-developers/zarr-python/blob/main/zarr/convenience.py#L1250-L1263