Relax nanosecond datetime restriction in CF time decoding (#9618)
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix scalar handling for timedelta based indexer

* remove stale error message and "ignore:Converting non-default" in testsuite

* add per review suggestions

* add/remove todo

* rename timeunit -> format

* return "ns" resolution per default for timedeltas, if not specified

* Be specific on types/dtypes

* add comment

* add suggestions from code review

* fix docs

* fix test which isn't run for numpy2 atm

* add notes on to_datetime section, update examples showing usage of 'as_unit'

* use np.timedelta64 for to_timedelta example, update as_unit example, update note

* remove note

* Apply suggestions from code review

Co-authored-by: Deepak Cherian <[email protected]>

* refactor timedelta decoding to _numbers_to_timedelta and re-use it within decode_cf_timedelta

* fix conventions test, add todo

* run times through pd.Timestamp to catch possible overflows

* fix tests for cftime_to_nptime

* fix cftime_to_nptime in cftimeindex

* introduce pd.Timestamp instance check

* warn if out-of-bound datetimes are encoded with standard calendar, fall back to cftime encoding, add fix for cftime issue where python datetimes are not encoded correctly with date2num.

* fix time-coding.rst, add reference to time-series.rst.

* try to fix typing, ignore one

* try to fix docs

* revert doc-changes

* Add a non-ns test for polyval, polyfit

* more doc cosmetics

* add whats-new.rst entry

* add/fix coder docstring

* add xr.date_range example as suggested per review

* Apply suggestions from code review

Co-authored-by: Spencer Clark <[email protected]>

* Implement `time_unit` option for `decode_cf_timedelta` (#3)

* Fix timedelta encoding overflow issue; always decode to ns resolution

* Implement time_unit for decode_cf_timedelta

* Reduce diff

* fix typing

* use nanmin/nanmax, catch numpy RuntimeWarnings

* Apply suggestions from code review

Co-authored-by: Kai Mühlbauer <[email protected]>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Stephan Hoyer <[email protected]>
Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Spencer Clark <[email protected]>
6 people authored Jan 15, 2025
1 parent 2c8b6e6 commit 6bea715
Showing 29 changed files with 1,310 additions and 490 deletions.
1 change: 1 addition & 0 deletions doc/internals/index.rst
@@ -26,3 +26,4 @@ The pages in this section are intended for:
how-to-add-new-backend
how-to-create-custom-index
zarr-encoding-spec
time-coding
475 changes: 475 additions & 0 deletions doc/internals/time-coding.rst

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions doc/user-guide/io.rst
@@ -540,8 +540,8 @@ The ``units`` and ``calendar`` attributes control how xarray serializes ``dateti
``timedelta64`` arrays to datasets on disk as numeric values. The ``units`` encoding
should be a string like ``'days since 1900-01-01'`` for ``datetime64`` data or a string
like ``'days'`` for ``timedelta64`` data. ``calendar`` should be one of the calendar types
supported by netCDF4-python: 'standard', 'gregorian', 'proleptic_gregorian' 'noleap',
'365_day', '360_day', 'julian', 'all_leap', '366_day'.
supported by netCDF4-python: ``'standard'``, ``'gregorian'``, ``'proleptic_gregorian'``, ``'noleap'``,
``'365_day'``, ``'360_day'``, ``'julian'``, ``'all_leap'``, ``'366_day'``.

By default, xarray uses the ``'proleptic_gregorian'`` calendar and units of the smallest time
difference between values, with a reference time of the first time value.
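
To make the ``units`` encoding concrete, here is a minimal stand-alone sketch that uses only the standard library. It is not xarray's encoder, and the helper name ``encode_cf_days`` is hypothetical; it only illustrates how datetimes map to numbers relative to the reference date named in a units string such as ``'days since 1900-01-01'``:

```python
from datetime import datetime


def encode_cf_days(times, reference):
    """Hypothetical helper: encode datetimes as fractional days since `reference`."""
    seconds_per_day = 86400
    return [(t - reference).total_seconds() / seconds_per_day for t in times]


# Reference date taken from the units string 'days since 1900-01-01'.
reference = datetime(1900, 1, 1)
times = [datetime(1900, 1, 1), datetime(1900, 1, 2, 12), datetime(1901, 1, 1)]

encoded = encode_cf_days(times, reference)
print(encoded)
```

Python's ``datetime`` follows the proleptic Gregorian calendar, so this sketch corresponds to the ``'proleptic_gregorian'`` calendar only; other calendars need ``cftime``.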
47 changes: 33 additions & 14 deletions doc/user-guide/time-series.rst
@@ -21,20 +21,40 @@ core functionality.
Creating datetime64 data
------------------------

Xarray uses the numpy dtypes ``datetime64[ns]`` and ``timedelta64[ns]`` to
represent datetime data, which offer vectorized (if sometimes buggy) operations
with numpy and smooth integration with pandas.
Xarray uses the numpy dtypes ``datetime64[unit]`` and ``timedelta64[unit]``
(where unit is one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime
data, which offer vectorized operations with numpy and smooth integration with pandas.

To convert to or create regular arrays of ``datetime64`` data, we recommend
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:

.. ipython:: python

    pd.to_datetime(["2000-01-01", "2000-02-02"])
    pd.DatetimeIndex(
        ["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
    )
    pd.date_range("2000-01-01", periods=365)
    pd.date_range("2000-01-01", periods=365, unit="s")

It is also possible to use the corresponding :py:func:`xarray.date_range`:

.. ipython:: python

    xr.date_range("2000-01-01", periods=365)
    xr.date_range("2000-01-01", periods=365, unit="s")

.. note::
    Take care to create the output with the intended resolution.
    :py:func:`pandas.date_range` needs the ``unit`` kwarg to be specified,
    while :py:func:`pandas.to_datetime` offers no way to select the
    resolution at all; use :py:class:`pandas.DatetimeIndex` directly in
    that case. See :ref:`internals.timecoding` for more in-depth information.


Alternatively, you can supply arrays of Python ``datetime`` objects. These get
converted automatically when used as arguments in xarray objects:
converted automatically when used as arguments in xarray objects (with us-resolution):

.. ipython:: python
@@ -51,12 +71,13 @@ attribute like ``'days since 2000-01-01'``).
.. note::

When decoding/encoding datetimes for non-standard calendars or for dates
before year 1678 or after year 2262, xarray uses the `cftime`_ library.
before `1582-10-15`_, xarray uses the `cftime`_ library by default.
It was previously packaged with the ``netcdf4-python`` package under the
name ``netcdftime`` but is now distributed separately. ``cftime`` is an
:ref:`optional dependency<installing>` of xarray.

.. _cftime: https://unidata.github.io/cftime
.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar


You can manually decode arrays in this form by passing a dataset to
@@ -66,17 +87,15 @@ You can manually decode arrays in this form by passing a dataset to
attrs = {"units": "hours since 2000-01-01"}
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
# Default decoding to 'ns'-resolution
xr.decode_cf(ds)
# Decoding to 's'-resolution
coder = xr.coders.CFDatetimeCoder(time_unit="s")
xr.decode_cf(ds, decode_times=coder)
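
As a rough illustration of what the ``time_unit`` choice means for the decoded dtype, the following numpy-only sketch decodes the same ``hours since 2000-01-01`` values at two resolutions. It is an assumption-laden analogy, not the code path used by ``CFDatetimeCoder``:

```python
import numpy as np

# Encoded values and reference date taken from the example above.
offsets = np.array([0, 1, 2, 3], dtype="timedelta64[h]")

# numpy promotes the result to the finer of the two units involved,
# which here is the unit of the reference date.
decoded_s = np.datetime64("2000-01-01", "s") + offsets
decoded_ns = np.datetime64("2000-01-01", "ns") + offsets

print(decoded_s.dtype)
print(decoded_ns.dtype)
```

Both arrays hold the same instants; only the storage resolution, and with it the representable date range, differs.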
One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
fully compatible with the standalone version of ``cftime`` (not the version
packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
information.
From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262; this range widens considerably at coarser resolutions. When a store contains dates outside of these bounds (or dates before `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
:py:class:`~xarray.CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`.
See :ref:`CFTimeIndex` for more information.

Datetime indexing
-----------------
29 changes: 13 additions & 16 deletions doc/user-guide/weather-climate.rst
@@ -10,7 +10,7 @@ Weather and climate data
import xarray as xr
Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module(Explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.
Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module (explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.

.. _Climate and Forecast (CF) conventions: https://cfconventions.org

@@ -57,15 +57,14 @@ CF-compliant coordinate variables

.. _CFTimeIndex:

Non-standard calendars and dates outside the nanosecond-precision range
-----------------------------------------------------------------------
Non-standard calendars and dates outside the precision range
------------------------------------------------------------

Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `nanosecond-precision range`_
(approximately between years 1678 and 2262).
using a standard calendar, but outside the `precision range`_ and dates prior to `1582-10-15`_.

.. note::

@@ -75,18 +74,14 @@ using a standard calendar, but outside the `nanosecond-precision range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the nanosecond-precision range.
- Any dates are outside the nanosecond-precision range (prior to xarray version 2025.01.2)
- Any dates are outside the time span limited by the resolution (from xarray version 2025.01.2)

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.
represented with the ``np.datetime64[unit]`` data type (where unit can be one of ``"s"``, ``"ms"``, ``"us"``, ``"ns"``), enabling the use of a :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[unit]`` and their full set of associated features.

As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
values. For the time being, xarray still automatically casts datetime values
to nanosecond-precision for backwards compatibility with older pandas
versions; however, this is something we would like to relax going forward.
See :issue:`7493` for more discussion.
values. From xarray version 2025.01.2 on, non-nanosecond precision datetime values are also supported in xarray (this can be parameterized via :py:class:`~xarray.coders.CFDatetimeCoder` and the ``decode_times`` kwarg). See also :ref:`internals.timecoding`.

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
@@ -115,7 +110,7 @@ instance, we can create the same dates and DataArray we created above using:
Mirroring pandas' method with the same name, :py:meth:`~xarray.infer_freq` allows one to
infer the sampling frequency of a :py:class:`~xarray.CFTimeIndex` or a 1-D
:py:class:`~xarray.DataArray` containing cftime objects. It also works transparently with
``np.datetime64[ns]`` and ``np.timedelta64[ns]`` data.
``np.datetime64`` and ``np.timedelta64`` data (with "s", "ms", "us" or "ns" resolution).

.. ipython:: python
@@ -137,7 +132,9 @@ Conversion between non-standard calendar and to/from pandas DatetimeIndexes is
facilitated with the :py:meth:`xarray.Dataset.convert_calendar` method (also available as
:py:meth:`xarray.DataArray.convert_calendar`). Here, like elsewhere in xarray, the ``use_cftime``
argument controls which datetime backend is used in the output. The default (``None``) is to
use ``pandas`` when possible, i.e. when the calendar is standard and dates are within 1678 and 2262.
use ``pandas`` when possible, i.e. when the calendar is ``standard``/``gregorian`` and the dates fall on or after `1582-10-15`_. There is no such restriction when converting to a ``proleptic_gregorian`` calendar.

.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar
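
The 1582-10-15 cutoff exists because ``np.datetime64``, like Python's ``datetime``, counts days in the proleptic Gregorian calendar, whereas the ``standard`` calendar switches from Julian to Gregorian on that date. The standard-library sketch below (an illustration only, not xarray code) shows the mismatch around the switch:

```python
from datetime import date

# In the standard (mixed Julian/Gregorian) calendar, 1582-10-04 was
# followed directly by 1582-10-15, so these dates are one day apart there.
last_julian_day = date(1582, 10, 4)
first_gregorian_day = date(1582, 10, 15)

# Python's date is proleptic Gregorian, so it counts the skipped days too.
proleptic_gap = (first_gregorian_day - last_julian_day).days
print(proleptic_gap)
```

Because the two calendars disagree on day counts before the switch, earlier dates in a ``standard`` calendar cannot be mapped one-to-one onto ``np.datetime64`` values.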

.. ipython:: python
@@ -241,6 +238,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
da.resample(time="81min", closed="right", label="right", offset="3min").mean()
.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
35 changes: 33 additions & 2 deletions doc/whats-new.rst
@@ -19,9 +19,39 @@ What's New
v2025.01.2 (unreleased)
-----------------------

This release brings non-nanosecond datetime resolution to xarray. In the
last couple of releases xarray has been prepared for that change. The code had
to be changed and adapted in numerous places, affecting especially the test suite.
The documentation has been updated accordingly and a new internal chapter
on :ref:`internals.timecoding` has been added.

To make the transition as smooth as possible this is designed to be fully backwards
compatible, keeping the current default of ``'ns'`` resolution on decoding.
To opt in to decoding into other resolutions (``'us'``, ``'ms'`` or ``'s'``), pass
the new :py:class:`coders.CFDatetimeCoder` to the ``decode_times`` kwarg
(see also :ref:`internals.default_timeunit`):

.. code-block:: python

    coder = xr.coders.CFDatetimeCoder(time_unit="s")
    ds = xr.open_dataset(filename, decode_times=coder)

There might be slight changes when encoding/decoding times as some warning and
error messages have been removed or rewritten. Xarray will now also allow
non-nanosecond datetimes (with ``'us'``, ``'ms'`` or ``'s'`` resolution) when
creating DataArrays from scratch, picking the lowest possible resolution:


.. ipython:: python

    xr.DataArray(data=[np.datetime64("2000-01-01", "D")], dims=("time",))

In a future release the current default of ``'ns'`` resolution on decoding will
eventually be deprecated.
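
Day resolution (``"D"``) is not among the four supported units, so the ``DataArray`` example above has to end up in a supported one. Assuming "lowest possible resolution" means the coarsest supported unit, ``'s'``, the equivalent numpy cast would look like this (a sketch of the idea, not xarray's internal conversion):

```python
import numpy as np

# Day-resolution input, as in the DataArray example above.
days = np.array(["2000-01-01"], dtype="datetime64[D]")

# Cast to the coarsest of the supported units ("s", "ms", "us", "ns").
as_seconds = days.astype("datetime64[s]")
print(as_seconds.dtype, as_seconds[0])
```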

New Features
~~~~~~~~~~~~

- Relax nanosecond datetime restriction in CF time decoding (:issue:`7493`, :pull:`9618`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_ and `Spencer Clark <https://github.com/spencerkclark>`_.

Breaking changes
~~~~~~~~~~~~~~~~
@@ -37,7 +67,8 @@ Bug fixes

Documentation
~~~~~~~~~~~~~

- A chapter on :ref:`internals.timecoding` is added to the internal section (:pull:`9618`).
By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.

Internal Changes
~~~~~~~~~~~~~~~~
9 changes: 6 additions & 3 deletions xarray/backends/api.py
@@ -775,7 +775,8 @@ def open_dataarray(
be replaced by NA. This keyword may not be supported by all the backends.
decode_times : bool, CFDatetimeCoder or dict-like, optional
If True, decode times encoded in the standard NetCDF datetime format
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
leave them encoded as numbers.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
@@ -984,7 +985,8 @@ def open_datatree(
This keyword may not be supported by all the backends.
decode_times : bool, CFDatetimeCoder or dict-like, optional
If True, decode times encoded in the standard NetCDF datetime format
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
leave them encoded as numbers.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
@@ -1210,7 +1212,8 @@ def open_groups(
This keyword may not be supported by all the backends.
decode_times : bool, CFDatetimeCoder or dict-like, optional
If True, decode times encoded in the standard NetCDF datetime format
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or leave them encoded as numbers.
into datetime objects. Otherwise, use :py:class:`coders.CFDatetimeCoder` or
leave them encoded as numbers.
Pass a mapping, e.g. ``{"my_variable": False}``,
to toggle this feature per-variable individually.
This keyword may not be supported by all the backends.
18 changes: 4 additions & 14 deletions xarray/coding/cftime_offsets.py
@@ -64,7 +64,7 @@
from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like
from xarray.core.pdcompat import (
count_not_none,
nanosecond_precision_timestamp,
default_precision_timestamp,
)
from xarray.core.utils import attempt_import, emit_user_level_warning

@@ -81,14 +81,6 @@
T_FreqStr = TypeVar("T_FreqStr", str, None)


def _nanosecond_precision_timestamp(*args, **kwargs):
# As of pandas version 3.0, pd.to_datetime(Timestamp(...)) will try to
# infer the appropriate datetime precision. Until xarray supports
# non-nanosecond precision times, we will use this constructor wrapper to
# explicitly create nanosecond-precision Timestamp objects.
return pd.Timestamp(*args, **kwargs).as_unit("ns")


def get_date_type(calendar, use_cftime=True):
"""Return the cftime date type for a given calendar name."""
if TYPE_CHECKING:
@@ -97,7 +89,7 @@ def get_date_type(calendar, use_cftime=True):
cftime = attempt_import("cftime")

if _is_standard_calendar(calendar) and not use_cftime:
return _nanosecond_precision_timestamp
return default_precision_timestamp

calendars = {
"noleap": cftime.DatetimeNoLeap,
@@ -1427,10 +1419,8 @@ def date_range_like(source, calendar, use_cftime=None):
if is_np_datetime_like(source.dtype):
# We want to use datetime fields (datetime64 object don't have them)
source_calendar = "standard"
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
source_start = nanosecond_precision_timestamp(source_start)
source_end = nanosecond_precision_timestamp(source_end)
source_start = default_precision_timestamp(source_start)
source_end = default_precision_timestamp(source_end)
else:
if isinstance(source, CFTimeIndex):
source_calendar = source.calendar
5 changes: 3 additions & 2 deletions xarray/coding/cftimeindex.py
@@ -581,13 +581,14 @@ def to_datetimeindex(self, unsafe=False):
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
dtype='object', length=2, calendar='standard', freq=None)
>>> times.to_datetimeindex()
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[us]', freq=None)
"""

if not self._data.size:
return pd.DatetimeIndex([])

nptimes = cftime_to_nptime(self)
# transform to us-resolution is needed for DatetimeIndex
nptimes = cftime_to_nptime(self, time_unit="us")
calendar = infer_calendar_name(self)
if calendar not in _STANDARD_CALENDARS and not unsafe:
warnings.warn(