##############
Quick overview
##############

DataTrees
---------

:py:class:`DataTree` is a tree-like container of ``DataArray`` objects, organised into multiple groups; the variables within each group are mutually alignable, just as in a ``Dataset``.
You can think of it like a (recursive) ``dict`` of ``Dataset`` objects.

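As a rough mental model only (this is not how ``DataTree`` is implemented), the sketch below
represents a tree as nested plain-Python dicts. Each node in this hypothetical layout keeps its
own ``Dataset`` under the key ``"ds"`` and its named sub-groups under ``"children"``, and the
hypothetical ``walk`` function visits every group recursively, building a filesystem-like path as it goes.

.. code-block:: python

    import numpy as np
    import xarray as xr

    # Hypothetical nested-dict model of a tree: every node stores its own
    # Dataset under "ds" and its named sub-groups under "children".
    tree = {
        "ds": xr.Dataset({"heights": ("people", [1.57, 1.82])}),
        "children": {
            "simulation": {
                "ds": xr.Dataset(),  # a group may hold no data of its own
                "children": {
                    "coarse": {"ds": xr.Dataset({"foo": ("x", np.zeros(2))}), "children": {}},
                    "fine": {"ds": xr.Dataset({"foo": ("x", np.zeros(6))}), "children": {}},
                },
            },
        },
    }


    def walk(node, path="/"):
        """Recursively yield (path, Dataset) pairs, each parent before its children."""
        yield path, node["ds"]
        for name, child in node["children"].items():
            yield from walk(child, path.rstrip("/") + "/" + name)


    for group_path, group_ds in walk(tree):
        print(group_path, dict(group_ds.sizes))

Each group keeps its own dimension sizes, so ``"x"`` can have length 2 in one group and 6 in another.
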
Let's first make some example xarray datasets (following on from xarray's
`quick overview <https://docs.xarray.dev/en/stable/getting-started-guide/quick-overview.html>`_ page):

.. ipython:: python

    import numpy as np
    import xarray as xr

    data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
    ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
    ds

    ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]})
    ds2

    ds3 = xr.Dataset(
        dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])),
        coords={"species": "human"},
    )
    ds3

Now we'll put this data into a multi-group tree:

.. ipython:: python

    from datatree import DataTree

    dt = DataTree.from_dict(
        {"root/simulation/coarse": ds, "root/simulation/fine": ds2, "root": ds3}
    )
    print(dt)

This creates a datatree with four groups. The root group (named ``root``) contains information about individual people.
It has one subgroup, ``simulation``, which contains no data itself but does contain two further subgroups,
named ``fine`` and ``coarse``.

The sub-sub-groups ``fine`` and ``coarse`` contain two very similar datasets.
Both have an ``"x"`` dimension, but that dimension has a different length in each group, so the data in the two groups cannot be aligned with one another.
In the root group we placed some completely unrelated information, showing how a tree can store heterogeneous data.

Each group is therefore subject to the same constraint as the variables within a single ``Dataset``: the variables in one group must be mutually alignable, but variables in different groups need not be.

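To see why ``fine`` and ``coarse`` need separate groups, the sketch below (plain xarray, with hypothetical
variable names) tries to put an ``"x"`` of length 2 and an ``"x"`` of length 6 into one ``Dataset``, which
xarray rejects with a ``ValueError``. The same two variables sit happily in separate ``Dataset`` objects,
just as they do in separate tree groups.

.. code-block:: python

    import numpy as np
    import xarray as xr

    coarse_foo = ("x", np.zeros(2))  # "x" has length 2 here...
    fine_foo = ("x", np.zeros(6))    # ...and length 6 here

    # Within one Dataset, every variable sharing a dimension name must agree on its length.
    try:
        xr.Dataset({"coarse_foo": coarse_foo, "fine_foo": fine_foo})
        combined_ok = True
    except ValueError as err:
        combined_ok = False
        print("rejected:", err)

    # Kept in separate Datasets (or separate tree groups), each "x" is independent.
    coarse = xr.Dataset({"foo": coarse_foo})
    fine = xr.Dataset({"foo": fine_foo})
    print(coarse.sizes["x"], fine.sizes["x"])
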
We created the sub-groups using a filesystem-like path syntax, and groups are accessed the same way.
We can access individual ``DataArray`` objects in a similar fashion

.. ipython:: python

    dt["simulation/coarse/foo"]

and we can also pull out the data in a particular group as a ``Dataset`` object using ``.ds``:

.. ipython:: python

    dt["simulation/coarse"].ds

Operations map over subtrees, so we can take a mean over the ``x`` dimension of both the ``fine`` and ``coarse`` groups with a single call:

.. ipython:: python

    avg = dt["simulation"].mean(dim="x")
    print(avg)

In each sub-group the ``"x"`` dimension used is the one local to that sub-group, so the mean is taken over 2 points in ``coarse`` and over 6 points in ``fine``.

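To make the "local dimension" point concrete, the sketch below imitates the mapping with a plain ``dict``
from (hypothetical) group paths to ``Dataset`` objects, calling ``Dataset.mean(dim="x")`` on each group
separately. Each call only sees its own group's ``"x"``, so the two lengths never need to agree.

.. code-block:: python

    import numpy as np
    import xarray as xr

    # Hypothetical flat mapping of group paths to Datasets whose "x" lengths differ.
    groups = {
        "simulation/coarse": xr.Dataset(
            {"foo": ("x", np.array([1.0, 3.0]))}, coords={"x": [10, 20]}
        ),
        "simulation/fine": xr.Dataset(
            {"foo": ("x", np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))},
            coords={"x": [10, 12, 14, 16, 18, 20]},
        ),
    }

    # Reduce each group over its own local "x" dimension.
    avg = {path: ds.mean(dim="x") for path, ds in groups.items()}

    for path, ds in avg.items():
        print(path, float(ds["foo"]))
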
Almost everything you can do with ``Dataset`` objects you can also do with ``DataTree`` objects
(including indexing and arithmetic), because operations are mapped over every sub-group in the tree.
This lets you work with multiple groups of non-alignable variables at once.

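A hypothetical ``map_over_groups`` helper shows the same idea in general form: it applies any
``Dataset -> Dataset`` function, such as arithmetic or positional indexing, to every group of a plain
path-to-``Dataset`` dict. It only illustrates the mapping behaviour described above and is not how
``DataTree`` implements it.

.. code-block:: python

    from typing import Callable, Dict

    import numpy as np
    import xarray as xr


    def map_over_groups(
        groups: Dict[str, xr.Dataset], func: Callable[[xr.Dataset], xr.Dataset]
    ) -> Dict[str, xr.Dataset]:
        """Apply ``func`` to every group independently, keeping the same paths."""
        return {path: func(ds) for path, ds in groups.items()}


    groups = {
        "coarse": xr.Dataset({"foo": ("x", np.arange(2.0))}),
        "fine": xr.Dataset({"foo": ("x", np.arange(6.0))}),
    }

    doubled = map_over_groups(groups, lambda ds: ds * 2)                    # arithmetic
    first_two = map_over_groups(groups, lambda ds: ds.isel(x=slice(0, 2)))  # indexing
    print({path: ds.sizes["x"] for path, ds in first_two.items()})
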
.. note::

    If all of your variables are mutually alignable
    (i.e. they live on the same grid, so that every shared dimension has the same length and, where indexed, the same coordinate labels),
    then you probably don't need :py:class:`DataTree`, and should consider just sticking with ``xarray.Dataset``.
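
One way to check this condition, sketched with plain xarray (the helper name ``all_alignable`` is
hypothetical), is to request an exact alignment with ``xr.align(..., join="exact")``, which raises a
``ValueError`` when indexes or dimension sizes disagree. If it succeeds, the datasets can simply be
merged into a single ``Dataset``.

.. code-block:: python

    import xarray as xr


    def all_alignable(*datasets: xr.Dataset) -> bool:
        """True if the datasets align exactly (same indexes and sizes on shared dimensions)."""
        try:
            xr.align(*datasets, join="exact")
        except ValueError:
            return False
        return True


    a = xr.Dataset({"foo": ("x", [1.0, 2.0])}, coords={"x": [10, 20]})
    b = xr.Dataset({"bar": ("x", [3.0, 4.0])}, coords={"x": [10, 20]})
    c = xr.Dataset({"foo": ("x", [1.0, 2.0, 3.0])}, coords={"x": [10, 15, 20]})

    print(all_alignable(a, b))  # same grid
    print(all_alignable(a, c))  # different "x" labels and length

    if all_alignable(a, b):
        combined = xr.merge([a, b])  # a single Dataset is enough here
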