Skip to content

Commit f0809e4

Browse files
aladinorpre-commit-ci[bot]jhammandcherianTomNicholas
authored
Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees (#10020)
* fixing compatibility with relative paths in open_store function within open_dtree for zarr * fixing/refactoring test to be compatible with Zarr-python v3 * adding @requires_zarr_v3 decorator to TestZarrDatatreeIO * replacing 0 with 1 in _create_test_datatree wich will write a chunk * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing issues with groups * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixing issue with dict creation * fixing issues with Mypy * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactoring open_store in ZarrStore class to use Zarr.core.group.Group members * refactoring datree test for zarr ensuring compatibility with zarr-python v3 * importing zarr.core.group only inside open_store function * documenting changes in what's-nwe.rst file * Update xarray/backends/zarr.py Co-authored-by: Joe Hamman <[email protected]> * keeping grroup creation compatible with zarr v2 * fixing issue with mypy * adding root_path equal to '/' when opening group in zarr v3 to avoid ValueError: Consolidated metadata requested with 'use_consolidated=True' but not found in 'path'. * fixing tests accordingly * removing print statement * reverting changes made in unaligned test in zarr * adding requires_zarr_v3 decorator * changing max_depth=None in Group.members to get all nested groups * fixing unaligned test in datrees using zarr * Update xarray/backends/zarr.py Co-authored-by: Joe Hamman <[email protected]> * updating whats-new.rst entry * remove funny-looking line and refactor to ensure reading consolidated metadata * parametrize over whether or not we write consolidated metadata * fix consolidated metadata * ian hcanges * open_datatree_specific_group consolidated true works tom changes * refactoring * test: add consolidated parametrize to zarr datatree test * fix: group finding behavior consolidated * remove more debugging print statements * revert changes to test fixture * formatting * add decorator to parametrize over zarr formats * ensure both versions of zarr-python and both versions of zarr-python get tested * change datatree fixture to not produce values that would be fill_values by default in zarr * refactor test to make expected behaviour clearer * fix wrongly expected behaviour - should not expect inherited variable to be written to zarr * make arrays no longer scalars to dodge #10147 * fix bad merge * parametrize almost every test over zarr_format * parametrize encoding test over zarr_formats * use xfail in encoding test * updated expected behaviour of zarr on-disk in light of #10147 * fully revert change to simple_datatree test fixture by considered zarr's fill_value in assertion * parametrize unaligned_zarr test fixture over zarr_format * move parametrize_over_zarr_format decorator to apply to entire test class * for now explicitly consolidate metadata in test fixture * correct bug in writing of consolidated metadata * delete commented-out lines * merges from main * Revert "merges from main" This reverts commit 22ac9b4. * fix encodings test for zarr_format=3 * tidy up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * account for different default value of write_empty_chunks between zarr-python 2 vs 3 * fix expected encoding key for compressor in zarr-python v2 * account for exception type changing * various typing fixes * remove outdated comment * bool type Co-authored-by: Deepak Cherian <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Joe Hamman <[email protected]> Co-authored-by: Deepak Cherian <[email protected]> Co-authored-by: Tom Nicholas <[email protected]> Co-authored-by: Ian Hunt-Isaak <[email protected]>
1 parent 45cc2ee commit f0809e4

File tree

5 files changed

+298
-130
lines changed

5 files changed

+298
-130
lines changed

doc/whats-new.rst

+5
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,10 @@ Deprecations
6666

6767
Bug fixes
6868
~~~~~~~~~
69+
70+
- Fix ``open_datatree`` incompatibilities with Zarr-Python V3 and refactor
71+
``TestZarrDatatreeIO`` accordingly (:issue:`9960`, :pull:`10020`).
72+
By `Alfonso Ladino-Rincon <https://github.com/aladinor>`_.
6973
- Default to resolution-dependent optimal integer encoding units when saving
7074
chunked non-nanosecond :py:class:`numpy.datetime64` or
7175
:py:class:`numpy.timedelta64` arrays to disk. Previously units of
@@ -97,6 +101,7 @@ Bug fixes
97101
datetimes and timedeltas (:issue:`8957`, :pull:`10050`).
98102
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
99103

104+
100105
Documentation
101106
~~~~~~~~~~~~~
102107
- Better expose the :py:class:`Coordinates` class in API reference (:pull:`10000`)

xarray/backends/zarr.py

+56-36
Original file line numberDiff line numberDiff line change
@@ -666,10 +666,21 @@ def open_store(
666666
use_zarr_fill_value_as_mask=use_zarr_fill_value_as_mask,
667667
zarr_format=zarr_format,
668668
)
669+
670+
from zarr import Group
671+
672+
group_members: dict[str, Group] = {}
669673
group_paths = list(_iter_zarr_groups(zarr_group, parent=group))
670-
return {
674+
for path in group_paths:
675+
if path == group:
676+
group_members[path] = zarr_group
677+
else:
678+
rel_path = path.removeprefix(f"{group}/")
679+
group_members[path] = zarr_group[rel_path.removeprefix("/")]
680+
681+
out = {
671682
group: cls(
672-
zarr_group.get(group),
683+
group_store,
673684
mode,
674685
consolidate_on_close,
675686
append_dim,
@@ -680,8 +691,9 @@ def open_store(
680691
use_zarr_fill_value_as_mask,
681692
cache_members=cache_members,
682693
)
683-
for group in group_paths
694+
for group, group_store in group_members.items()
684695
}
696+
return out
685697

686698
@classmethod
687699
def open_group(
@@ -1034,8 +1046,6 @@ def store(
10341046
if self._consolidate_on_close:
10351047
kwargs = {}
10361048
if _zarr_v3():
1037-
# https://github.com/zarr-developers/zarr-python/pull/2113#issuecomment-2386718323
1038-
kwargs["path"] = self.zarr_group.name.lstrip("/")
10391049
kwargs["zarr_format"] = self.zarr_group.metadata.zarr_format
10401050
zarr.consolidate_metadata(self.zarr_group.store, **kwargs)
10411051

@@ -1662,8 +1672,6 @@ def open_groups_as_dict(
16621672
zarr_version=None,
16631673
zarr_format=None,
16641674
) -> dict[str, Dataset]:
1665-
from xarray.core.treenode import NodePath
1666-
16671675
filename_or_obj = _normalize_path(filename_or_obj)
16681676

16691677
# Check for a group and make it a parent if it exists
@@ -1686,7 +1694,6 @@ def open_groups_as_dict(
16861694
)
16871695

16881696
groups_dict = {}
1689-
16901697
for path_group, store in stores.items():
16911698
store_entrypoint = StoreBackendEntrypoint()
16921699

@@ -1762,44 +1769,57 @@ def _get_open_params(
17621769
consolidated = False
17631770

17641771
if _zarr_v3():
1765-
missing_exc = ValueError
1772+
# TODO: replace AssertionError after https://github.com/zarr-developers/zarr-python/issues/2821 is resolved
1773+
missing_exc = AssertionError
17661774
else:
17671775
missing_exc = zarr.errors.GroupNotFoundError
17681776

1769-
if consolidated is None:
1770-
try:
1771-
zarr_group = zarr.open_consolidated(store, **open_kwargs)
1772-
except (ValueError, KeyError):
1773-
# ValueError in zarr-python 3.x, KeyError in 2.x.
1777+
if consolidated in [None, True]:
1778+
# open the root of the store, in case there is metadata consolidated there
1779+
group = open_kwargs.pop("path")
1780+
1781+
if consolidated:
1782+
# TODO: an option to pass the metadata_key keyword
1783+
zarr_root_group = zarr.open_consolidated(store, **open_kwargs)
1784+
elif consolidated is None:
1785+
# same but with more error handling in case no consolidated metadata found
17741786
try:
1775-
zarr_group = zarr.open_group(store, **open_kwargs)
1776-
emit_user_level_warning(
1777-
"Failed to open Zarr store with consolidated metadata, "
1778-
"but successfully read with non-consolidated metadata. "
1779-
"This is typically much slower for opening a dataset. "
1780-
"To silence this warning, consider:\n"
1781-
"1. Consolidating metadata in this existing store with "
1782-
"zarr.consolidate_metadata().\n"
1783-
"2. Explicitly setting consolidated=False, to avoid trying "
1784-
"to read consolidate metadata, or\n"
1785-
"3. Explicitly setting consolidated=True, to raise an "
1786-
"error in this case instead of falling back to try "
1787-
"reading non-consolidated metadata.",
1788-
RuntimeWarning,
1789-
)
1790-
except missing_exc as err:
1791-
raise FileNotFoundError(
1792-
f"No such file or directory: '{store}'"
1793-
) from err
1794-
elif consolidated:
1795-
# TODO: an option to pass the metadata_key keyword
1796-
zarr_group = zarr.open_consolidated(store, **open_kwargs)
1787+
zarr_root_group = zarr.open_consolidated(store, **open_kwargs)
1788+
except (ValueError, KeyError):
1789+
# ValueError in zarr-python 3.x, KeyError in 2.x.
1790+
try:
1791+
zarr_root_group = zarr.open_group(store, **open_kwargs)
1792+
emit_user_level_warning(
1793+
"Failed to open Zarr store with consolidated metadata, "
1794+
"but successfully read with non-consolidated metadata. "
1795+
"This is typically much slower for opening a dataset. "
1796+
"To silence this warning, consider:\n"
1797+
"1. Consolidating metadata in this existing store with "
1798+
"zarr.consolidate_metadata().\n"
1799+
"2. Explicitly setting consolidated=False, to avoid trying "
1800+
"to read consolidate metadata, or\n"
1801+
"3. Explicitly setting consolidated=True, to raise an "
1802+
"error in this case instead of falling back to try "
1803+
"reading non-consolidated metadata.",
1804+
RuntimeWarning,
1805+
)
1806+
except missing_exc as err:
1807+
raise FileNotFoundError(
1808+
f"No such file or directory: '{store}'"
1809+
) from err
1810+
1811+
# but the user should still receive a DataTree whose root is the group they asked for
1812+
if group and group != "/":
1813+
zarr_group = zarr_root_group[group.removeprefix("/")]
1814+
else:
1815+
zarr_group = zarr_root_group
17971816
else:
17981817
if _zarr_v3():
17991818
# we have determined that we don't want to use consolidated metadata
18001819
# so we set that to False to avoid trying to read it
18011820
open_kwargs["use_consolidated"] = False
18021821
zarr_group = zarr.open_group(store, **open_kwargs)
1822+
18031823
close_store_on_close = zarr_group.store is not store
18041824

18051825
# we use this to determine how to handle fill_value

xarray/core/datatree.py

+1-2
Original file line numberDiff line numberDiff line change
@@ -16,7 +16,6 @@
1616
TYPE_CHECKING,
1717
Any,
1818
Concatenate,
19-
Literal,
2019
NoReturn,
2120
ParamSpec,
2221
TypeVar,
@@ -1741,7 +1740,7 @@ def to_zarr(
17411740
consolidated: bool = True,
17421741
group: str | None = None,
17431742
write_inherited_coords: bool = False,
1744-
compute: Literal[True] = True,
1743+
compute: bool = True,
17451744
**kwargs,
17461745
):
17471746
"""

xarray/tests/__init__.py

+15
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,21 @@ def _importorskip(
165165

166166
has_array_api_strict, requires_array_api_strict = _importorskip("array_api_strict")
167167

168+
parametrize_zarr_format = pytest.mark.parametrize(
169+
"zarr_format",
170+
[
171+
pytest.param(2, id="zarr_format=2"),
172+
pytest.param(
173+
3,
174+
marks=pytest.mark.skipif(
175+
not has_zarr_v3,
176+
reason="zarr-python v2 cannot understand the zarr v3 format",
177+
),
178+
id="zarr_format=3",
179+
),
180+
],
181+
)
182+
168183

169184
def _importorskip_h5netcdf_ros3(has_h5netcdf: bool):
170185
if not has_h5netcdf:

0 commit comments

Comments
 (0)