forked from pydata/xarray

Commit 566fd37

Merge remote-tracking branch 'upstream/main' into chunk-by-frequency
* upstream/main:
  [skip-ci] Try fixing hypothesis CI trigger (pydata#9112)
  Undo custom padding-top. (pydata#9107)
  add remaining core-dev citations [skip-ci][skip-rtd] (pydata#9110)
  Add user survey announcement to docs (pydata#9101)
  skip the `pandas` datetime roundtrip test with `pandas=3.0` (pydata#9104)
  Adds Matt Savoie to CITATION.cff (pydata#9103)
  [skip-ci] Fix skip-ci for hypothesis (pydata#9102)
  open_datatree performance improvement on NetCDF, H5, and Zarr files (pydata#9014)
  Migrate datatree io.py and common.py into xarray/core (pydata#9011)
  Micro optimizations to improve indexing (pydata#9002)
  (fix): don't handle time-dtypes as extension arrays in `from_dataframe` (pydata#9042)
2 parents 8a980ef + 6554855 commit 566fd37

20 files changed (+556, -308 lines)

.github/workflows/hypothesis.yaml (+3, -3)

@@ -39,9 +39,9 @@ jobs:
     if: |
       always()
       && (
-        (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
-        || needs.detect-ci-trigger.outputs.triggered == 'true'
-        || contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis')
+        needs.detect-ci-trigger.outputs.triggered == 'false'
+        && ( (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
+        || contains( github.event.pull_request.labels.*.name, 'run-slow-hypothesis'))
       )
     defaults:
       run:
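The reworked `if:` condition inverts how the detect-ci-trigger job is used: instead of being one more way to *enable* the run, a fired trigger (e.g. a `[skip-ci]` commit) now *suppresses* it. A minimal Python sketch of the new boolean logic (the function name and argument shapes are illustrative, not part of the workflow):

```python
def should_run_hypothesis(event_name: str, triggered: str, labels: list[str]) -> bool:
    """Mirror the reworked `if:` condition in hypothesis.yaml.

    The job now runs only when detect-ci-trigger did NOT fire
    (triggered == 'false'), AND the event is a scheduled run, a manual
    dispatch, or a PR carrying the 'run-slow-hypothesis' label.
    """
    return triggered == "false" and (
        event_name in ("schedule", "workflow_dispatch")
        or "run-slow-hypothesis" in labels
    )

# Scheduled runs still fire:
assert should_run_hypothesis("schedule", "false", [])
# A commit that fired the skip trigger now suppresses the job,
# even on a labeled PR:
assert not should_run_hypothesis("pull_request", "true", ["run-slow-hypothesis"])
```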

CITATION.cff (+5)

@@ -84,6 +84,11 @@ authors:
 - family-names: "Scheick"
   given-names: "Jessica"
   orcid: "https://orcid.org/0000-0002-3421-4459"
+- family-names: "Savoie"
+  given-names: "Matthew"
+  orcid: "https://orcid.org/0000-0002-8881-2550"
+- family-names: "Littlejohns"
+  given-names: "Owen"
 title: "xarray"
 abstract: "N-D labeled arrays and datasets in Python."
 license: Apache-2.0

doc/_static/style.css (+2, -5)

@@ -7,9 +7,8 @@ table.docutils td {
   word-wrap: break-word;
 }

-div.bd-header-announcement {
-  background-color: unset;
-  color: #000;
+.bd-header-announcement {
+  background-color: var(--pst-color-info-bg);
 }

 /* Reduce left and right margins */
@@ -222,8 +221,6 @@ main *:target::before {
 }

 body {
-  /* Add padding to body to avoid overlap with navbar. */
-  padding-top: var(--navbar-height);
   width: 100%;
 }

doc/conf.py (+1, -1)

@@ -242,7 +242,7 @@
     Theme by the <a href="https://ebp.jupyterbook.org">Executable Book Project</a></p>""",
     twitter_url="https://twitter.com/xarray_dev",
     icon_links=[],  # workaround for pydata/pydata-sphinx-theme#1220
-    announcement="🍾 <a href='https://github.com/pydata/xarray/discussions/8462'>Xarray is now 10 years old!</a> 🎉",
+    announcement="<a href='https://forms.gle/KEq7WviCdz9xTaJX6'>Xarray's 2024 User Survey is live now. Please take ~5 minutes to fill it out and help us improve Xarray.</a>",
 )

doc/whats-new.rst (+16, -6)

@@ -17,7 +17,7 @@ What's New

 .. _whats-new.2024.05.1:

-v2024.05.1 (unreleased)
+v2024.06 (unreleased)
 -----------------------

 New Features
@@ -28,6 +28,10 @@ Performance

 - Small optimization to the netCDF4 and h5netcdf backends (:issue:`9058`, :pull:`9067`).
   By `Deepak Cherian <https://github.com/dcherian>`_.
+- Small optimizations to help reduce indexing speed of datasets (:pull:`9002`).
+  By `Mark Harfouche <https://github.com/hmaarrfk>`_.
+- Performance improvement in `open_datatree` method for Zarr, netCDF4 and h5netcdf backends (:issue:`8994`, :pull:`9014`).
+  By `Alfonso Ladino <https://github.com/aladinor>`_.


 Breaking changes
@@ -40,6 +44,9 @@ Deprecations

 Bug fixes
 ~~~~~~~~~
+- Preserve conversion of timezone-aware pandas Datetime arrays to numpy object arrays
+  (:issue:`9026`, :pull:`9042`).
+  By `Ilan Gold <https://github.com/ilan-gold>`_.

 - :py:meth:`DataArrayResample.interpolate` and :py:meth:`DatasetResample.interpolate` method now
   support aribtrary kwargs such as ``order`` for polynomial interpolation. (:issue:`8762`).
@@ -54,6 +61,10 @@ Documentation

 Internal Changes
 ~~~~~~~~~~~~~~~~
+- Migrates remainder of ``io.py`` to ``xarray/core/datatree_io.py`` and
+  ``TreeAttrAccessMixin`` into ``xarray/core/common.py`` (:pull: `9011`)
+  By `Owen Littlejohns <https://github.com/owenlittlejohns>`_ and
+  `Tom Nicholas <https://github.com/TomNicholas>`_.


 .. _whats-new.2024.05.0:
@@ -136,10 +147,9 @@ Internal Changes
   By `Owen Littlejohns <https://github.com/owenlittlejohns>`_, `Matt Savoie
   <https://github.com/flamingbear>`_ and `Tom Nicholas <https://github.com/TomNicholas>`_.
 - ``transpose``, ``set_dims``, ``stack`` & ``unstack`` now use a ``dim`` kwarg
-  rather than ``dims`` or ``dimensions``. This is the final change to unify
-  xarray functions to use ``dim``. Using the existing kwarg will raise a
-  warning.
-  By `Maximilian Roos <https://github.com/max-sixty>`_
+  rather than ``dims`` or ``dimensions``. This is the final change to make xarray methods
+  consistent with their use of ``dim``. Using the existing kwarg will raise a
+  warning. By `Maximilian Roos <https://github.com/max-sixty>`_

 .. _whats-new.2024.03.0:

@@ -2903,7 +2913,7 @@ Bug fixes
   process (:issue:`4045`, :pull:`4684`). It also enables encoding and decoding standard
   calendar dates with time units of nanoseconds (:pull:`4400`).
   By `Spencer Clark <https://github.com/spencerkclark>`_ and `Mark Harfouche
-  <http://github.com/hmaarrfk>`_.
+  <https://github.com/hmaarrfk>`_.
 - :py:meth:`DataArray.astype`, :py:meth:`Dataset.astype` and :py:meth:`Variable.astype` support
   the ``order`` and ``subok`` parameters again. This fixes a regression introduced in version 0.16.1
   (:issue:`4644`, :pull:`4683`).

properties/test_pandas_roundtrip.py (+27)

@@ -9,6 +9,7 @@
 import pytest

 import xarray as xr
+from xarray.tests import has_pandas_3

 pytest.importorskip("hypothesis")
 import hypothesis.extra.numpy as npst  # isort:skip
@@ -30,6 +31,16 @@
 )


+datetime_with_tz_strategy = st.datetimes(timezones=st.timezones())
+dataframe_strategy = pdst.data_frames(
+    [
+        pdst.column("datetime_col", elements=datetime_with_tz_strategy),
+        pdst.column("other_col", elements=st.integers()),
+    ],
+    index=pdst.range_indexes(min_size=1, max_size=10),
+)
+
+
 @st.composite
 def datasets_1d_vars(draw) -> xr.Dataset:
     """Generate datasets with only 1D variables
@@ -98,3 +109,19 @@ def test_roundtrip_pandas_dataframe(df) -> None:
     roundtripped = arr.to_pandas()
     pd.testing.assert_frame_equal(df, roundtripped)
     xr.testing.assert_identical(arr, roundtripped.to_xarray())
+
+
+@pytest.mark.skipif(
+    has_pandas_3,
+    reason="fails to roundtrip on pandas 3 (see https://github.com/pydata/xarray/issues/9098)",
+)
+@given(df=dataframe_strategy)
+def test_roundtrip_pandas_dataframe_datetime(df) -> None:
+    # Need to name the indexes, otherwise Xarray names them 'dim_0', 'dim_1'.
+    df.index.name = "rows"
+    df.columns.name = "cols"
+    dataset = xr.Dataset.from_dataframe(df)
+    roundtripped = dataset.to_dataframe()
+    roundtripped.columns.name = "cols"  # why?
+    pd.testing.assert_frame_equal(df, roundtripped)
+    xr.testing.assert_identical(dataset, roundtripped.to_xarray())
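The new test feeds timezone-aware datetimes through `from_dataframe`, which is exactly the case touched by the `from_dataframe` fix in this merge: tz-aware datetimes have no native numpy dtype. A minimal non-hypothesis sketch of that underlying conversion, using only pandas (column names are illustrative):

```python
import pandas as pd

# A timezone-aware datetime column is stored as a pandas extension
# dtype (datetime64[ns, UTC]); converting it to numpy yields an
# object array of Timestamp objects, preserving the timezone.
df = pd.DataFrame(
    {
        "datetime_col": pd.to_datetime(["2024-06-01T12:00"]).tz_localize("UTC"),
        "other_col": [1],
    }
)
arr = df["datetime_col"].to_numpy()
assert arr.dtype == object          # no native numpy datetime-with-tz dtype
assert arr[0].tzinfo is not None    # timezone information survives
```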

xarray/backends/api.py (+9, -9)

@@ -36,7 +36,7 @@
 from xarray.core.dataarray import DataArray
 from xarray.core.dataset import Dataset, _get_chunk, _maybe_chunk
 from xarray.core.indexes import Index
-from xarray.core.types import ZarrWriteModes
+from xarray.core.types import NetcdfWriteModes, ZarrWriteModes
 from xarray.core.utils import is_remote_uri
 from xarray.namedarray.daskmanager import DaskManager
 from xarray.namedarray.parallelcompat import guess_chunkmanager
@@ -1120,7 +1120,7 @@ def open_mfdataset(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike | None = None,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1138,7 +1138,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: None = None,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1155,7 +1155,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1173,7 +1173,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1191,7 +1191,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1209,7 +1209,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1226,7 +1226,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike | None,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
@@ -1241,7 +1241,7 @@ def to_netcdf(
 def to_netcdf(
     dataset: Dataset,
     path_or_file: str | os.PathLike | None = None,
-    mode: Literal["w", "a"] = "w",
+    mode: NetcdfWriteModes = "w",
     format: T_NetcdfTypes | None = None,
     group: str | None = None,
     engine: T_NetcdfEngine | None = None,
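The change above swaps the same inline `Literal["w", "a"]` annotation for a shared alias across every `to_netcdf` overload. A hypothetical, simplified sketch of that pattern (the real `NetcdfWriteModes` lives in `xarray.core.types` and may permit more modes than shown here):

```python
from typing import Literal, get_args

# Centralize the allowed mode strings in one alias so every overload
# stays in sync; adding a new mode then means editing one definition
# rather than nine annotations.
NetcdfWriteModes = Literal["w", "a"]

def to_netcdf(path: str, mode: NetcdfWriteModes = "w") -> str:
    # Toy stand-in for the real writer: just report what it would do.
    return f"write {path} (mode={mode})"

assert get_args(NetcdfWriteModes) == ("w", "a")
assert to_netcdf("out.nc", mode="a") == "write out.nc (mode=a)"
```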

xarray/backends/common.py (-30)

@@ -19,9 +19,6 @@
 if TYPE_CHECKING:
     from io import BufferedIOBase

-    from h5netcdf.legacyapi import Dataset as ncDatasetLegacyH5
-    from netCDF4 import Dataset as ncDataset
-
     from xarray.core.dataset import Dataset
     from xarray.core.datatree import DataTree
     from xarray.core.types import NestedSequence
@@ -131,33 +128,6 @@ def _decode_variable_name(name):
     return name


-def _open_datatree_netcdf(
-    ncDataset: ncDataset | ncDatasetLegacyH5,
-    filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
-    **kwargs,
-) -> DataTree:
-    from xarray.backends.api import open_dataset
-    from xarray.core.datatree import DataTree
-    from xarray.core.treenode import NodePath
-
-    ds = open_dataset(filename_or_obj, **kwargs)
-    tree_root = DataTree.from_dict({"/": ds})
-    with ncDataset(filename_or_obj, mode="r") as ncds:
-        for path in _iter_nc_groups(ncds):
-            subgroup_ds = open_dataset(filename_or_obj, group=path, **kwargs)
-
-            # TODO refactor to use __setitem__ once creation of new nodes by assigning Dataset works again
-            node_name = NodePath(path).name
-            new_node: DataTree = DataTree(name=node_name, data=subgroup_ds)
-            tree_root._set_item(
-                path,
-                new_node,
-                allow_overwrite=False,
-                new_nodes_along_path=True,
-            )
-    return tree_root
-
-
 def _iter_nc_groups(root, parent="/"):
     from xarray.core.treenode import NodePath

xarray/backends/h5netcdf_.py (+50, -4)

@@ -3,15 +3,14 @@
 import functools
 import io
 import os
-from collections.abc import Iterable
+from collections.abc import Callable, Iterable
 from typing import TYPE_CHECKING, Any

 from xarray.backends.common import (
     BACKEND_ENTRYPOINTS,
     BackendEntrypoint,
     WritableCFDataStore,
     _normalize_path,
-    _open_datatree_netcdf,
     find_root_and_group,
 )
 from xarray.backends.file_manager import CachingFileManager, DummyFileManager
@@ -431,11 +430,58 @@ def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporti
     def open_datatree(
         self,
         filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
+        *,
+        mask_and_scale=True,
+        decode_times=True,
+        concat_characters=True,
+        decode_coords=True,
+        drop_variables: str | Iterable[str] | None = None,
+        use_cftime=None,
+        decode_timedelta=None,
+        group: str | Iterable[str] | Callable | None = None,
         **kwargs,
     ) -> DataTree:
-        from h5netcdf.legacyapi import Dataset as ncDataset
+        from xarray.backends.api import open_dataset
+        from xarray.backends.common import _iter_nc_groups
+        from xarray.core.datatree import DataTree
+        from xarray.core.treenode import NodePath
+        from xarray.core.utils import close_on_error

-        return _open_datatree_netcdf(ncDataset, filename_or_obj, **kwargs)
+        filename_or_obj = _normalize_path(filename_or_obj)
+        store = H5NetCDFStore.open(
+            filename_or_obj,
+            group=group,
+        )
+        if group:
+            parent = NodePath("/") / NodePath(group)
+        else:
+            parent = NodePath("/")
+
+        manager = store._manager
+        ds = open_dataset(store, **kwargs)
+        tree_root = DataTree.from_dict({str(parent): ds})
+        for path_group in _iter_nc_groups(store.ds, parent=parent):
+            group_store = H5NetCDFStore(manager, group=path_group, **kwargs)
+            store_entrypoint = StoreBackendEntrypoint()
+            with close_on_error(group_store):
+                ds = store_entrypoint.open_dataset(
+                    group_store,
+                    mask_and_scale=mask_and_scale,
+                    decode_times=decode_times,
+                    concat_characters=concat_characters,
+                    decode_coords=decode_coords,
+                    drop_variables=drop_variables,
+                    use_cftime=use_cftime,
+                    decode_timedelta=decode_timedelta,
+                )
+            new_node: DataTree = DataTree(name=NodePath(path_group).name, data=ds)
+            tree_root._set_item(
+                path_group,
+                new_node,
+                allow_overwrite=False,
+                new_nodes_along_path=True,
+            )
+        return tree_root


 BACKEND_ENTRYPOINTS["h5netcdf"] = ("h5netcdf", H5netcdfBackendEntrypoint)
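The performance win here comes from opening the file once and reusing the cached file manager while walking subgroups with `_iter_nc_groups`, instead of reopening the file per group. A stdlib-only sketch of that group walk over a nested mapping (the real `_iter_nc_groups` walks netCDF/HDF5 group objects and uses xarray's `NodePath`; the dict shape below is purely illustrative):

```python
from pathlib import PurePosixPath

def iter_groups(root: dict, parent: str = "/"):
    """Recursively yield slash-separated subgroup paths, depth-first,
    mirroring the traversal order of xarray's `_iter_nc_groups`."""
    for name, child in root.get("groups", {}).items():
        path = str(PurePosixPath(parent) / name)
        yield path
        yield from iter_groups(child, parent=path)

# A file with groups /a, /a/b, and /c yields each subgroup path once:
tree = {"groups": {"a": {"groups": {"b": {}}}, "c": {}}}
assert list(iter_groups(tree)) == ["/a", "/a/b", "/c"]
```

Each yielded path is then opened as its own store against the already-open file handle and attached to the tree root, which is why only one low-level open is needed per file.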
