Concatenating zarr groups with xr.open_datatree results in bad output #9912

Open
5 tasks done
lukegre opened this issue Dec 20, 2024 · 0 comments
Labels
bug, topic-combine (combine/concat/merge), topic-DataTree (Related to the implementation of a DataTree class)

lukegre commented Dec 20, 2024

What happened?

I opened a zarr store with xr.open_datatree and then concatenated its groups along time. Each group holds one year of an ERA5 time series. The concatenated result looks more like a mean seasonal cycle than the actual time series.

image

What did you expect to happen?

When I instead loop through the groups with xr.open_zarr and concatenate them, the data is as expected and shows interannual variability.

image

Minimal Complete Verifiable Example

# %pip install s3fs

import xarray as xr
from matplotlib import pyplot as plt

# data set up for bug report
s3_uri = 's3://spi-greenfjord-public/era5_t2m-test_data_for_bug_report.zarr'

## specifications for zarr + s3 bucket ####################################
kwargs = dict(
    consolidated=True, 
    chunks={},
    storage_options=dict(
        anon=True, 
        endpoint_url='https://os.zhdk.cloud.switch.ch'))

## xr.open_datatree #######################################################
datatree = xr.open_datatree(s3_uri, engine='zarr', **kwargs)
ds_treecat = xr.combine_nested([datatree[year].ds for year in datatree], concat_dim="time")  # same behaviour with xr.concat
ds_treecat = ds_treecat.compute()

## xr.open_zarr #######################################################
# group names are strings, so convert the integer year explicitly
ds_zarrlist = [xr.open_zarr(s3_uri, group=str(year), **kwargs) for year in range(1980, 2023)]
ds_zarrcat = xr.combine_nested(ds_zarrlist, concat_dim='time')
ds_zarrcat = ds_zarrcat.compute()

## Plotting #######################################################
def plot_t2m_time_series(da_hourly, label='', **kwargs):
    # reuse a caller-supplied axis, otherwise create a new figure
    ax = kwargs.get('ax')
    if ax is None:
        fig, ax = plt.subplots(figsize=(12, 3), dpi=140)
        kwargs['ax'] = ax
    else:
        fig = ax.figure

    da_daily = da_hourly.resample(time='1D').mean()
    da_yearly = da_hourly.resample(time='1YS').mean()

    # thin line: daily means
    props = dict(lw=0.2) | kwargs
    da_daily.plot(**props)

    # thick line: yearly means, drawn in the same colour as the daily line
    props = props | dict(lw=5, label=label, c=ax.get_lines()[-1].get_color())
    da_yearly.plot(**props)
    return fig, ax


_, ax0 = plot_t2m_time_series(ds_treecat.t2m, label="xr.open_datatree(s3_uri, engine='zarr', consolidated=True, chunks={})")
_, ax1 = plot_t2m_time_series(ds_zarrcat.t2m, label="xr.open_zarr(s3_uri, group=year, consolidated=True) ...", c='C1')

for ax in [ax0, ax1]:
    ax.set_title('ERA5 2m temperature for area in Greenland (1980-2022)', loc='left')
    ax.legend(ncol=2, frameon=True, edgecolor='none')
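
The difference between the two results can also be checked numerically instead of visually. The following is a rough sketch, with `summarize_time_series` as a hypothetical helper name, that reports whether the time coordinate is ordered and unique and how much the yearly means vary. It is demonstrated on synthetic data, not on the ERA5 store.

```python
import numpy as np
import pandas as pd
import xarray as xr


def summarize_time_series(da):
    """Return simple diagnostics for a DataArray that has a 'time' dimension."""
    # Average over any non-time dimensions (e.g. latitude/longitude) first
    other_dims = [dim for dim in da.dims if dim != "time"]
    if other_dims:
        da = da.mean(other_dims)

    time_index = da.indexes["time"]
    yearly = da.resample(time="YS").mean()
    return {
        "n_times": da.sizes["time"],
        "time_is_monotonic": bool(time_index.is_monotonic_increasing),
        "time_is_unique": bool(time_index.is_unique),
        # a spread near zero would suggest no interannual variability
        "yearly_mean_std": float(yearly.std()),
    }


# Synthetic daily series: mean 1.0 in 2000 and 3.0 in 2001
times = pd.date_range("2000-01-01", "2001-12-31", freq="D")
values = np.where(times.year == 2000, 1.0, 3.0)
da = xr.DataArray(values, coords={"time": times}, dims="time", name="t2m")
summary = summarize_time_series(da)
```

Applied to `ds_treecat.t2m` and `ds_zarrcat.t2m` from the example above, a much smaller `yearly_mean_std` for the datatree result would be consistent with the "mean seasonal cycle" appearance, and a non-monotonic or non-unique time index would hint at a problem in how the groups were ordered or combined.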

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.10 | packaged by conda-forge | (main, Sep 10 2024, 10:57:35) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.2.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2

xarray: 2024.11.0
pandas: 2.2.3
numpy: 2.2.0
scipy: 1.14.1
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 2.18.4
cftime: 1.6.4.post1
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.12.1
distributed: None
matplotlib: 3.10.0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.12.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.6.0
pip: None
conda: None
pytest: None
mypy: None
IPython: 8.30.0
sphinx: None
/Users/luke/SDSC/CryoGrid/era5-downloader/.venv/lib/python3.11/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
warnings.warn(

lukegre added the bug and needs triage labels on Dec 20, 2024
TomNicholas added the topic-DataTree and topic-combine labels and removed the needs triage label on Dec 20, 2024