shoyer (Member) commented Sep 16, 2025

The default `engine` when reading/writing netCDF files is now h5netcdf or scipy, which are typically faster than the prior default of netCDF4-python. You can control this default behavior explicitly via the new `netcdf_engine_order` parameter in `set_options()`, e.g., `xr.set_options(netcdf_engine_order=['netcdf4', 'scipy', 'h5netcdf'])` to restore the prior defaults.
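To make the idea of an engine order concrete, here is a minimal pure-Python sketch of first-available selection. It is not Xarray's implementation: `pick_engine`, `installed_engines`, and `ENGINE_MODULES` are hypothetical names, and `assumed_new_order` is only an illustrative guess, since the description says just that h5netcdf or scipy are now preferred.

```python
from importlib.util import find_spec

# Import names for the three netCDF engines mentioned in this PR.
ENGINE_MODULES = {"netcdf4": "netCDF4", "h5netcdf": "h5netcdf", "scipy": "scipy"}


def installed_engines():
    """Return the set of engine names whose Python module can be located."""
    return {
        name
        for name, module in ENGINE_MODULES.items()
        if find_spec(module) is not None
    }


def pick_engine(order, available):
    """Hypothetical helper: return the first engine in `order` that is available."""
    for engine in order:
        if engine in available:
            return engine
    raise ValueError(f"none of the engines {list(order)} are available")


# Prior default order, as quoted in the PR description.
prior_order = ["netcdf4", "scipy", "h5netcdf"]
# Assumed order for illustration only; the exact new default is not stated here.
assumed_new_order = ["h5netcdf", "scipy", "netcdf4"]

if __name__ == "__main__":
    available = installed_engines()
    print("installed:", sorted(available))
    if available:
        print("picked under assumed new order:", pick_engine(assumed_new_order, available))
```

With only scipy and netcdf4 installed, the assumed new order would pick scipy, while the prior order would pick netcdf4. That difference is the behavior change `netcdf_engine_order` lets users pin.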

I've also updated the documentation page that misled @lesserwhirls into thinking Xarray supports writing invalid netCDF files without `invalid_netcdf=True`.

Fixes pydata#10657
github-actions bot added the io, topic-backends, and topic-DataTree labels Sep 16, 2025
shoyer changed the title from "Add option for netcdf_engine_order" to "Change default netCDF engine to use h5netcdf and add netcdf_engine_order" Sep 16, 2025
shoyer (Member, Author) commented Sep 16, 2025

Looking at the test failures, it looks like we previously supported writing NCZarr with `ds.to_netcdf(f"file://{filename}#mode=nczarr")`. Now we also require passing `engine='netcdf4'` explicitly.

Should we try to auto-detect URLs like this and use netcdf4 as the backend? Or is it better to encourage users to make an explicit choice?

dcherian (Contributor) commented

In general I'm pro "explicit choice", but this would be a breaking change.

@malmans2 how common is nczarr use? I haven't really seen it.

shoyer (Member, Author) commented Sep 17, 2025

I went ahead and added automatic support for writing NCZarr. The URL pattern wasn't hard to check for.
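A check of this kind could be sketched as follows. This is an illustration under assumptions, not the code added in the PR: `looks_like_nczarr` and `engine_for_write` are hypothetical helpers, and the sketch assumes the target is marked by a `mode=` URL fragment that lists `nczarr`, as in the `#mode=nczarr` example above.

```python
from urllib.parse import urlsplit


def looks_like_nczarr(path):
    """Hypothetical check: does `path` carry a `mode=` URL fragment naming nczarr?"""
    fragment = urlsplit(str(path)).fragment
    # A fragment may hold several `key=value` items separated by "&";
    # the `mode` value may itself be a comma-separated list.
    for item in fragment.split("&"):
        key, _, value = item.partition("=")
        if key == "mode" and "nczarr" in value.split(","):
            return True
    return False


def engine_for_write(path, default_engine):
    """Route NCZarr-style targets to netcdf4; otherwise keep the default engine."""
    return "netcdf4" if looks_like_nczarr(path) else default_engine


if __name__ == "__main__":
    for target in ["file:///tmp/out.zarr#mode=nczarr", "/tmp/out.nc"]:
        print(target, "->", engine_for_write(target, "h5netcdf"))
```

Keeping the detection in one small predicate means ordinary file paths are untouched and only explicitly marked URLs are redirected to netcdf4.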

malmans2 (Contributor) commented

> in general I'm pro "explicit choice", but this would be a breaking change.
>
> @malmans2 how common is nczarr use? I haven't really seen it.

I've never seen it actually used in Python applications either. From a quick search on GitHub, it looks like the few packages that write to NCZarr directly use netcdf4-python rather than xarray.

shoyer (Member, Author) commented Sep 17, 2025

I added `supports_groups` to `BackendEntrypoint`. Otherwise, we have no way to check whether a backend supports `open_datatree()` short of calling the `open_datatree()` method.

This turned up because scipy is now used in preference to netcdf4 when opening netCDF3 files, but scipy doesn't support opening groups.

In principle we could add support for reading groups to the SciPy backend (netcdf3 files arguably contain a single group, at the root node), but in any case this will also come up for custom backends.
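The role of such a flag can be sketched with stand-in classes. Nothing below is Xarray's real code: the classes are toy substitutes, `first_backend_with_groups` is a hypothetical helper, and the sketch assumes `supports_groups` is a boolean class attribute that defaults to `False`.

```python
class BackendEntrypoint:
    """Toy stand-in for a backend base class (not Xarray's real class)."""

    # Assumption: a boolean class attribute, False unless a backend opts in.
    supports_groups = False


class SciPyLikeBackend(BackendEntrypoint):
    """Reads netCDF3 only, so it leaves group support disabled."""


class NetCDF4LikeBackend(BackendEntrypoint):
    """Declares group support, so it may serve datatree-style opens."""

    supports_groups = True


def first_backend_with_groups(order, backends):
    """Hypothetical helper: first engine in `order` whose backend supports groups."""
    for name in order:
        backend = backends.get(name)
        if backend is not None and backend.supports_groups:
            return name
    raise ValueError("no available backend supports groups")


if __name__ == "__main__":
    backends = {"scipy": SciPyLikeBackend(), "netcdf4": NetCDF4LikeBackend()}
    # scipy comes first in this order but is skipped because it lacks group support.
    print(first_backend_with_groups(["scipy", "netcdf4"], backends))
```

A declarative attribute lets the selection logic skip unsuitable backends without calling into them, and it works equally for third-party backends that subclass the entrypoint.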

shoyer (Member, Author) commented Sep 23, 2025

I would love to get this in before the next release, to avoid needing repeated breaking changes.

kmuehlbauer (Contributor) left a comment

LGTM, Stephan. Nice to be able to parametrize this.

Successfully merging this pull request may close these issues.

Should Xarray prefer h5netcdf and scipy to netCDF4?