Commit 42326c3

Merge branch 'main' into fix-duplicate-dimensions

* main:
  new whats-new section (pydata#9115)
  release v2024.06.0 (pydata#9113)
  release notes for 2024.06.0 (pydata#9092)
  [skip-ci] Try fixing hypothesis CI trigger (pydata#9112)
  Undo custom padding-top. (pydata#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (pydata#9110)
  Add user survey announcement to docs (pydata#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (pydata#9104)
  Adds Matt Savoie to CITATION.cff (pydata#9103)
  [skip-ci] Fix skip-ci for hypothesis (pydata#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (pydata#9014)

2 parents af380cf + 9237f90

File tree

11 files changed: +345 -155 lines changed


.github/workflows/hypothesis.yaml
+3 -3

@@ -39,9 +39,9 @@ jobs:
     if: |
       always()
       && (
-        (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
-        || needs.detect-ci-trigger.outputs.triggered == 'true'
-        || contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis')
+        needs.detect-ci-trigger.outputs.triggered == 'false'
+        && ( (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
+        || contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis'))
       )
     defaults:
       run:

CITATION.cff
+5

@@ -84,6 +84,11 @@ authors:
 - family-names: "Scheick"
   given-names: "Jessica"
   orcid: "https://orcid.org/0000-0002-3421-4459"
+- family-names: "Savoie"
+  given-names: "Matthew"
+  orcid: "https://orcid.org/0000-0002-8881-2550"
+- family-names: "Littlejohns"
+  given-names: "Owen"
 title: "xarray"
 abstract: "N-D labeled arrays and datasets in Python."
 license: Apache-2.0

doc/_static/style.css
+2 -5

@@ -7,9 +7,8 @@ table.docutils td {
   word-wrap: break-word;
 }

-div.bd-header-announcement {
-  background-color: unset;
-  color: #000;
+.bd-header-announcement {
+  background-color: var(--pst-color-info-bg);
 }

 /* Reduce left and right margins */
@@ -222,8 +221,6 @@ main *:target::before {
 }

 body {
-  /* Add padding to body to avoid overlap with navbar. */
-  padding-top: var(--navbar-height);
   width: 100%;
 }

doc/conf.py
+1 -1

@@ -242,7 +242,7 @@
         Theme by the <a href="https://ebp.jupyterbook.org">Executable Book Project</a></p>""",
     twitter_url="https://twitter.com/xarray_dev",
     icon_links=[],  # workaround for pydata/pydata-sphinx-theme#1220
-    announcement="🍾 <a href='https://github.com/pydata/xarray/discussions/8462'>Xarray is now 10 years old!</a> 🎉",
+    announcement="<a href='https://forms.gle/KEq7WviCdz9xTaJX6'>Xarray's 2024 User Survey is live now. Please take ~5 minutes to fill it out and help us improve Xarray.</a>",
 )


doc/whats-new.rst
+43 -17

@@ -15,22 +15,14 @@ What's New
     np.random.seed(123456)


-.. _whats-new.2024.05.1:
+.. _whats-new.2024.06.1:

-v2024.06 (unreleased)
+v2024.06.1 (unreleased)
 -----------------------

 New Features
 ~~~~~~~~~~~~

-Performance
-~~~~~~~~~~~
-
-- Small optimization to the netCDF4 and h5netcdf backends (:issue:`9058`, :pull:`9067`).
-  By `Deepak Cherian <https://github.com/dcherian>`_.
-- Small optimizations to help reduce indexing speed of datasets (:pull:`9002`).
-  By `Mark Harfouche <https://github.com/hmaarrfk>`_.
-

 Breaking changes
 ~~~~~~~~~~~~~~~~
@@ -40,14 +32,45 @@ Deprecations
 ~~~~~~~~~~~~


+Bug fixes
+~~~~~~~~~
+
+
+Documentation
+~~~~~~~~~~~~~
+
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+
+.. _whats-new.2024.06.0:
+
+v2024.06.0 (Jun 13, 2024)
+-------------------------
+This release brings various performance optimizations and compatibility with the upcoming numpy 2.0 release.
+
+Thanks to the 22 contributors to this release:
+Alfonso Ladino, David Hoese, Deepak Cherian, Eni Awowale, Ilan Gold, Jessica Scheick, Joe Hamman, Justus Magin, Kai Mühlbauer, Mark Harfouche, Mathias Hauser, Matt Savoie, Maximilian Roos, Mike Thramann, Nicolas Karasiak, Owen Littlejohns, Paul Ockenfuß, Philippe THOMY, Scott Henderson, Spencer Clark, Stephan Hoyer and Tom Nicholas
+
+Performance
+~~~~~~~~~~~
+
+- Small optimization to the netCDF4 and h5netcdf backends (:issue:`9058`, :pull:`9067`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
+- Small optimizations to help reduce indexing speed of datasets (:pull:`9002`).
+  By `Mark Harfouche <https://github.com/hmaarrfk>`_.
+- Performance improvement in `open_datatree` method for Zarr, netCDF4 and h5netcdf backends (:issue:`8994`, :pull:`9014`).
+  By `Alfonso Ladino <https://github.com/aladinor>`_.
+
+
 Bug fixes
 ~~~~~~~~~
 - Preserve conversion of timezone-aware pandas Datetime arrays to numpy object arrays
   (:issue:`9026`, :pull:`9042`).
   By `Ilan Gold <https://github.com/ilan-gold>`_.
-
 - :py:meth:`DataArrayResample.interpolate` and :py:meth:`DatasetResample.interpolate` method now
-  support aribtrary kwargs such as ``order`` for polynomial interpolation. (:issue:`8762`).
+  support arbitrary kwargs such as ``order`` for polynomial interpolation (:issue:`8762`).
   By `Nicolas Karasiak <https://github.com/nkarasiak>`_.

 - Allow chunking for arrays with duplicated dimension names (:issue:`8759`, :pull:`9099`).
@@ -56,16 +79,18 @@ Bug fixes

 Documentation
 ~~~~~~~~~~~~~
-- Add link to CF Conventions on packed data and sentence on type determination in doc/user-guide/io.rst (:issue:`9041`, :pull:`9045`).
+- Add link to CF Conventions on packed data and sentence on type determination in the I/O user guide (:issue:`9041`, :pull:`9045`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.


 Internal Changes
 ~~~~~~~~~~~~~~~~
 - Migrates remainder of ``io.py`` to ``xarray/core/datatree_io.py`` and
-  ``TreeAttrAccessMixin`` into ``xarray/core/common.py`` (:pull: `9011`)
+  ``TreeAttrAccessMixin`` into ``xarray/core/common.py`` (:pull:`9011`).
   By `Owen Littlejohns <https://github.com/owenlittlejohns>`_ and
   `Tom Nicholas <https://github.com/TomNicholas>`_.
+- Compatibility with numpy 2 (:issue:`8844`, :pull:`8854`, :pull:`8946`).
+  By `Justus Magin <https://github.com/keewis>`_ and `Stephan Hoyer <https://github.com/shoyer>`_.


 .. _whats-new.2024.05.0:
@@ -124,8 +149,8 @@ Bug fixes
   <https://github.com/pandas-dev/pandas/issues/56147>`_ to
   :py:func:`pandas.date_range`, date ranges produced by
   :py:func:`xarray.cftime_range` with negative frequencies will now fall fully
-  within the bounds of the provided start and end dates (:pull:`8999`). By
-  `Spencer Clark <https://github.com/spencerkclark>`_.
+  within the bounds of the provided start and end dates (:pull:`8999`).
+  By `Spencer Clark <https://github.com/spencerkclark>`_.

 Internal Changes
 ~~~~~~~~~~~~~~~~
@@ -150,7 +175,8 @@ Internal Changes
 - ``transpose``, ``set_dims``, ``stack`` & ``unstack`` now use a ``dim`` kwarg
   rather than ``dims`` or ``dimensions``. This is the final change to make xarray methods
   consistent with their use of ``dim``. Using the existing kwarg will raise a
-  warning. By `Maximilian Roos <https://github.com/max-sixty>`_
+  warning.
+  By `Maximilian Roos <https://github.com/max-sixty>`_

 .. _whats-new.2024.03.0:
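
Note (not part of the diff): the resample-interpolation entry above is easier to read with a concrete call. A minimal sketch, using made-up data and assuming scipy is installed; extra keyword arguments such as ``order`` are forwarded to the interpolator:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Hypothetical daily series to upsample.
    da = xr.DataArray(
        np.random.default_rng(0).random(10),
        dims="time",
        coords={"time": pd.date_range("2024-01-01", periods=10, freq="D")},
    )

    # With the fix for issue 8762 above, ``order`` reaches the polynomial interpolator.
    upsampled = da.resample(time="6h").interpolate("polynomial", order=2)
    print(upsampled.sizes)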

properties/test_pandas_roundtrip.py
+5

@@ -9,6 +9,7 @@
 import pytest

 import xarray as xr
+from xarray.tests import has_pandas_3

 pytest.importorskip("hypothesis")
 import hypothesis.extra.numpy as npst  # isort:skip
@@ -110,6 +111,10 @@ def test_roundtrip_pandas_dataframe(df) -> None:
     xr.testing.assert_identical(arr, roundtripped.to_xarray())


+@pytest.mark.skipif(
+    has_pandas_3,
+    reason="fails to roundtrip on pandas 3 (see https://github.com/pydata/xarray/issues/9098)",
+)
 @given(df=dataframe_strategy)
 def test_roundtrip_pandas_dataframe_datetime(df) -> None:
     # Need to name the indexes, otherwise Xarray names them 'dim_0', 'dim_1'.
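
Note (not part of the diff): ``has_pandas_3`` is imported from ``xarray.tests`` above. A hedged sketch of the usual pattern behind such a version flag (an illustration, not necessarily xarray's exact code):

    # Typical version-gate flag consumed by @pytest.mark.skipif.
    import pandas as pd
    from packaging.version import Version

    has_pandas_3 = Version(pd.__version__).major >= 3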

xarray/backends/common.py
-30

@@ -19,9 +19,6 @@
 if TYPE_CHECKING:
     from io import BufferedIOBase

-    from h5netcdf.legacyapi import Dataset as ncDatasetLegacyH5
-    from netCDF4 import Dataset as ncDataset
-
     from xarray.core.dataset import Dataset
     from xarray.core.datatree import DataTree
     from xarray.core.types import NestedSequence
@@ -131,33 +128,6 @@ def _decode_variable_name(name):
     return name


-def _open_datatree_netcdf(
-    ncDataset: ncDataset | ncDatasetLegacyH5,
-    filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
-    **kwargs,
-) -> DataTree:
-    from xarray.backends.api import open_dataset
-    from xarray.core.datatree import DataTree
-    from xarray.core.treenode import NodePath
-
-    ds = open_dataset(filename_or_obj, **kwargs)
-    tree_root = DataTree.from_dict({"/": ds})
-    with ncDataset(filename_or_obj, mode="r") as ncds:
-        for path in _iter_nc_groups(ncds):
-            subgroup_ds = open_dataset(filename_or_obj, group=path, **kwargs)
-
-            # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again
-            node_name = NodePath(path).name
-            new_node: DataTree = DataTree(name=node_name, data=subgroup_ds)
-            tree_root._set_item(
-                path,
-                new_node,
-                allow_overwrite=False,
-                new_nodes_along_path=True,
-            )
-    return tree_root
-
-
 def _iter_nc_groups(root, parent="/"):
     from xarray.core.treenode import NodePath
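
Note (not part of the diff): both backend ``open_datatree`` implementations below lean on the ``_iter_nc_groups`` helper that stays in ``common.py``. A simplified sketch of the recursive group walk such a helper performs (an illustration, not the exact source):

    from xarray.core.treenode import NodePath

    def _iter_nc_groups_sketch(root, parent="/"):
        # ``root`` is a netCDF4/h5netcdf Dataset or Group; ``groups`` maps
        # child-group names to group objects.
        parent = NodePath(parent)
        for name, group in root.groups.items():
            path = parent / name
            yield str(path)
            # Recurse so nested groups are yielded as full paths, e.g. "/a/b".
            yield from _iter_nc_groups_sketch(group, parent=path)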

xarray/backends/h5netcdf_.py
+50 -4

@@ -3,15 +3,14 @@
 import functools
 import io
 import os
-from collections.abc import Iterable
+from collections.abc import Callable, Iterable
 from typing import TYPE_CHECKING, Any

 from xarray.backends.common import (
     BACKEND_ENTRYPOINTS,
     BackendEntrypoint,
     WritableCFDataStore,
     _normalize_path,
-    _open_datatree_netcdf,
     find_root_and_group,
 )
 from xarray.backends.file_manager import CachingFileManager, DummyFileManager
@@ -431,11 +430,58 @@ def open_dataset( # type: ignore[override] # allow LSP violation, not supporti
     def open_datatree(
         self,
         filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
+        *,
+        mask_and_scale=True,
+        decode_times=True,
+        concat_characters=True,
+        decode_coords=True,
+        drop_variables: str | Iterable[str] | None = None,
+        use_cftime=None,
+        decode_timedelta=None,
+        group: str | Iterable[str] | Callable | None = None,
         **kwargs,
     ) -> DataTree:
-        from h5netcdf.legacyapi import Dataset as ncDataset
+        from xarray.backends.api import open_dataset
+        from xarray.backends.common import _iter_nc_groups
+        from xarray.core.datatree import DataTree
+        from xarray.core.treenode import NodePath
+        from xarray.core.utils import close_on_error

-        return _open_datatree_netcdf(ncDataset, filename_or_obj, **kwargs)
+        filename_or_obj = _normalize_path(filename_or_obj)
+        store = H5NetCDFStore.open(
+            filename_or_obj,
+            group=group,
+        )
+        if group:
+            parent = NodePath("/") / NodePath(group)
+        else:
+            parent = NodePath("/")
+
+        manager = store._manager
+        ds = open_dataset(store, **kwargs)
+        tree_root = DataTree.from_dict({str(parent): ds})
+        for path_group in _iter_nc_groups(store.ds, parent=parent):
+            group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
+            store_entrypoint = StoreBackendEntrypoint()
+            with close_on_error(group_store):
+                ds = store_entrypoint.open_dataset(
+                    group_store,
+                    mask_and_scale=mask_and_scale,
+                    decode_times=decode_times,
+                    concat_characters=concat_characters,
+                    decode_coords=decode_coords,
+                    drop_variables=drop_variables,
+                    use_cftime=use_cftime,
+                    decode_timedelta=decode_timedelta,
+                )
+            new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
+            tree_root._set_item(
+                path_group,
+                new_node,
+                allow_overwrite=False,
+                new_nodes_along_path=True,
+            )
+        return tree_root


 BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)
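
Note (not part of the diff): a hedged usage sketch of the entrypoint method added above. It writes a small file with one subgroup via ``h5netcdf.legacyapi`` (the file name is made up) and opens it back as a DataTree; h5netcdf must be installed:

    import h5netcdf.legacyapi as legacy
    from xarray.backends.h5netcdf_ import H5netcdfBackendEntrypoint

    # Build a tiny netCDF file with a root variable and one child group.
    with legacy.Dataset("example_tree.nc", "w") as f:
        f.createDimension("x", 3)
        f.createVariable("root_var", "f8", ("x",))[:] = [1.0, 2.0, 3.0]
        child = f.createGroup("child")
        child.createDimension("y", 2)
        child.createVariable("child_var", "f8", ("y",))[:] = [4.0, 5.0]

    # The file is opened once; the "child" group becomes a node of the returned tree.
    tree = H5netcdfBackendEntrypoint().open_datatree("example_tree.nc")
    print(tree)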

xarray/backends/netCDF4_.py
+49 -4

@@ -3,7 +3,7 @@
 import functools
 import operator
 import os
-from collections.abc import Iterable
+from collections.abc import Callable, Iterable
 from contextlib import suppress
 from typing import TYPE_CHECKING, Any

@@ -16,7 +16,6 @@
     BackendEntrypoint,
     WritableCFDataStore,
     _normalize_path,
-    _open_datatree_netcdf,
     find_root_and_group,
     robust_getitem,
 )
@@ -672,11 +671,57 @@ def open_dataset( # type: ignore[override] # allow LSP violation, not supporti
     def open_datatree(
         self,
         filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
+        *,
+        mask_and_scale=True,
+        decode_times=True,
+        concat_characters=True,
+        decode_coords=True,
+        drop_variables: str | Iterable[str] | None = None,
+        use_cftime=None,
+        decode_timedelta=None,
+        group: str | Iterable[str] | Callable | None = None,
         **kwargs,
     ) -> DataTree:
-        from netCDF4 import Dataset as ncDataset
+        from xarray.backends.api import open_dataset
+        from xarray.backends.common import _iter_nc_groups
+        from xarray.core.datatree import DataTree
+        from xarray.core.treenode import NodePath

-        return _open_datatree_netcdf(ncDataset, filename_or_obj, **kwargs)
+        filename_or_obj = _normalize_path(filename_or_obj)
+        store = NetCDF4DataStore.open(
+            filename_or_obj,
+            group=group,
+        )
+        if group:
+            parent = NodePath("/") / NodePath(group)
+        else:
+            parent = NodePath("/")
+
+        manager = store._manager
+        ds = open_dataset(store, **kwargs)
+        tree_root = DataTree.from_dict({str(parent): ds})
+        for path_group in _iter_nc_groups(store.ds, parent=parent):
+            group_store = NetCDF4DataStore(manager, group=path_group, **kwargs)
+            store_entrypoint = StoreBackendEntrypoint()
+            with close_on_error(group_store):
+                ds = store_entrypoint.open_dataset(
+                    group_store,
+                    mask_and_scale=mask_and_scale,
+                    decode_times=decode_times,
+                    concat_characters=concat_characters,
+                    decode_coords=decode_coords,
+                    drop_variables=drop_variables,
+                    use_cftime=use_cftime,
+                    decode_timedelta=decode_timedelta,
+                )
+            new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
+            tree_root._set_item(
+                path_group,
+                new_node,
+                allow_overwrite=False,
+                new_nodes_along_path=True,
+            )
+        return tree_root


 BACKEND_ENTRYPOINTS["netcdf4"] = ("netCDF4", NetCDF4BackendEntrypoint)
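
Note (not part of the diff): a hedged sketch of the new ``group`` keyword on the netCDF4 entrypoint (file and group names are made up, netCDF4 must be installed). Per the code above, only the requested group and its subgroups are read, and they are mounted at their full paths in the returned tree:

    import netCDF4
    from xarray.backends.netCDF4_ import NetCDF4BackendEntrypoint

    # Write a file whose data lives in a "measurements" group.
    with netCDF4.Dataset("example_groups.nc", "w") as f:
        grp = f.createGroup("measurements")
        grp.createDimension("t", 4)
        grp.createVariable("temperature", "f8", ("t",))[:] = [10.0, 11.0, 12.5, 13.0]

    # Read just that group; its dataset appears at "/measurements" in the tree.
    tree = NetCDF4BackendEntrypoint().open_datatree("example_groups.nc", group="measurements")
    print(tree["measurements"])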
