This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Commit 89e5a14

Docs on manipulating trees (#180)
* why hierarchical data
* add hierarchical data page to index
* Simpsons family tree
* evolutionary tree
* WIP rearrangement of creating trees
* fixed examples in data structures page
* dict-like navigation
* filesystem-like paths explained
* split PR into parts
* plan
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* fix ipython bug
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* filter simpsons family tree by age
* use new filter method
* test about filter
* simple example of mapping over a subtree
* ideas for docs on iterating over trees
* add section on iterating over subtree
* text to accompany Simpsons family aging example
* add voltage dataset
* RMS as example of mapping custom computation
* isomorphism
* P=IV example of binary multiplication
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* remove unfinished sections
* fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* whatsnew

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent dffd32c commit 89e5a14

2 files changed

+263
-1
lines changed


docs/source/hierarchical-data.rst

+260-1
Original file line numberDiff line numberDiff line change
@@ -175,7 +175,7 @@ Let's use a different example of a tree to discuss more complex relationships be
175175
]
176176
177177
We have used the :py:meth:`~DataTree.from_dict` constructor method as an alternate way to quickly create a whole tree,
178-
and :ref:`filesystem-like syntax <filesystem paths>`_ (to be explained shortly) to select two nodes of interest.
178+
and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest.
179179

180180
.. ipython:: python
181181
@@ -339,3 +339,262 @@ we can construct a complex tree quickly using the alternative constructor :py:me
339339
Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path
340340
(i.e. the node labelled `"c"` in this case.)
341341
This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`DataTree.from_dict`.
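As a rough illustration of that behaviour, the sketch below builds a nested-dict tree from a mapping of paths and creates any missing intermediate nodes along the way. It is a plain-Python analogue rather than the ``DataTree`` implementation; ``build_tree`` and the ``"data"``/``"children"`` keys are hypothetical names chosen for this example.

```python
def build_tree(path_to_data):
    """Build a nested-dict tree from {path: data}, creating missing parents.

    Hypothetical helper for illustration only; not part of datatree.
    """
    root = {"data": None, "children": {}}
    for path, data in path_to_data.items():
        node = root
        # Walk down the path, creating empty nodes where none exist yet.
        for part in path.strip("/").split("/"):
            if not part:
                continue  # the path "/" refers to the root itself
            node = node["children"].setdefault(
                part, {"data": None, "children": {}}
            )
        node["data"] = data
    return root


# Only two paths are given, yet "a" and "a/c" are created as empty nodes.
tree = build_tree({"/a/b": 1, "/a/c/d": 2})
```

Only the two leaf paths carry data here; the nodes ``a`` and ``a/c`` exist purely because they are needed to reach the leaves.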
342+
343+
.. _iterating over trees:
344+
345+
Iterating over trees
346+
~~~~~~~~~~~~~~~~~~~~
347+
348+
You can iterate over every node in a tree using the :py:class:`~DataTree.subtree` property.
349+
This returns an iterable of nodes, which yields them in depth-first order.
350+
351+
.. ipython:: python
352+
353+
for node in vertebrates.subtree:
354+
    print(node.path)
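To make "depth-first order" concrete, here is a minimal plain-Python sketch of such a traversal over a nested-dict tree. It assumes each parent is visited before its children (pre-order); ``iter_subtree`` and the ``"children"`` key are hypothetical names for this example, not datatree API.

```python
def iter_subtree(node, path="/"):
    """Yield (path, node) pairs depth-first, each parent before its children."""
    yield path, node
    for name, child in node["children"].items():
        child_path = path.rstrip("/") + "/" + name
        yield from iter_subtree(child, child_path)


# A small hypothetical tree: the root has children "a" and "d",
# and "a" has children "b" and "c".
tree = {
    "children": {
        "a": {"children": {"b": {"children": {}}, "c": {"children": {}}}},
        "d": {"children": {}},
    }
}

paths = [path for path, _ in iter_subtree(tree)]
print(paths)
```

The whole of the ``a`` branch is exhausted before the traversal moves on to ``d``, which is what distinguishes depth-first from breadth-first order.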
355+
356+
A very useful pattern is to use :py:class:`~DataTree.subtree` in conjunction with the :py:class:`~DataTree.path` property to manipulate the nodes however you wish,
357+
then rebuild a new tree using :py:meth:`DataTree.from_dict()`.
358+
359+
For example, we could keep only the nodes containing data by looping over all nodes,
360+
checking if they contain any data using :py:class:`~DataTree.has_data`,
361+
then rebuilding a new tree using only the paths of those nodes:
362+
363+
.. ipython:: python
364+
365+
non_empty_nodes = {node.path: node.ds for node in dt.subtree if node.has_data}
366+
DataTree.from_dict(non_empty_nodes)
367+
368+
You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.
369+
370+
(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:meth:`~DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.)
371+
372+
.. _manipulating trees:
373+
374+
Manipulating Trees
375+
------------------
376+
377+
Subsetting Tree Nodes
378+
~~~~~~~~~~~~~~~~~~~~~
379+
380+
We can subset our tree to select only nodes of interest in various ways.
381+
382+
The :py:meth:`DataTree.filter` method can be used to retain only the nodes of a tree that meet a certain condition.
383+
For example, we could recreate the Simpsons family tree with the ages of each individual, then filter for only the adults.
384+
First let's recreate the tree, but with an `age` data variable in every node:
385+
386+
.. ipython:: python
387+
388+
simpsons = DataTree.from_dict(
389+
d={
390+
"/": xr.Dataset({"age": 83}),
391+
"/Herbert": xr.Dataset({"age": 40}),
392+
"/Homer": xr.Dataset({"age": 39}),
393+
"/Homer/Bart": xr.Dataset({"age": 10}),
394+
"/Homer/Lisa": xr.Dataset({"age": 8}),
395+
"/Homer/Maggie": xr.Dataset({"age": 1}),
396+
},
397+
name="Abe",
398+
)
399+
simpsons
400+
401+
Now let's filter out the minors:
402+
403+
.. ipython:: python
404+
405+
simpsons.filter(lambda node: node["age"] > 18)
406+
407+
The result is a new tree, containing only the nodes matching the condition.
408+
409+
(Yes, under the hood :py:meth:`~DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees`!)
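The same filter-then-rebuild idea can be sketched without any tree library, using a flat mapping from paths to values. This is only an analogue of the pattern: ``filter_nodes`` is a hypothetical helper, and the ages are copied from the example above.

```python
def filter_nodes(path_to_data, predicate):
    """Keep only the entries whose data satisfies the predicate."""
    return {
        path: data for path, data in path_to_data.items() if predicate(data)
    }


# Ages keyed by path, mirroring the Simpsons tree above.
ages = {
    "/": 83,
    "/Herbert": 40,
    "/Homer": 39,
    "/Homer/Bart": 10,
    "/Homer/Lisa": 8,
    "/Homer/Maggie": 1,
}

adults = filter_nodes(ages, lambda age: age > 18)
print(adults)
```

The resulting mapping of paths could then be handed to a constructor that rebuilds a tree from paths, which is the role :py:meth:`DataTree.from_dict` plays in the pattern described earlier.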
410+
411+
.. _tree computation:
412+
413+
Computation
414+
-----------
415+
416+
`DataTree` objects are also useful for performing computations, not just for organizing data.
417+
418+
Operations and Methods on Trees
419+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
420+
421+
To show how applying operations across a whole tree at once can be useful,
422+
let's first create an example scientific dataset.
423+
424+
.. ipython:: python
425+
426+
def time_stamps(n_samples, T):
427+
    """Create an array of evenly-spaced time stamps"""
428+
    return xr.DataArray(
429+
        data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"]
430+
    )
431+
432+
433+
def signal_generator(t, f, A, phase):
434+
    """Generate an example electrical-like waveform"""
435+
    return A * np.sin(f * t.data + phase)
436+
437+
438+
time_stamps1 = time_stamps(n_samples=15, T=1.5)
439+
time_stamps2 = time_stamps(n_samples=10, T=1.0)
440+
441+
voltages = DataTree.from_dict(
442+
{
443+
"/oscilloscope1": xr.Dataset(
444+
{
445+
"potential": (
446+
"time",
447+
signal_generator(time_stamps1, f=2, A=1.2, phase=0.5),
448+
),
449+
"current": (
450+
"time",
451+
signal_generator(time_stamps1, f=2, A=1.2, phase=1),
452+
),
453+
},
454+
coords={"time": time_stamps1},
455+
),
456+
"/oscilloscope2": xr.Dataset(
457+
{
458+
"potential": (
459+
"time",
460+
signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2),
461+
),
462+
"current": (
463+
"time",
464+
signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
465+
),
466+
},
467+
coords={"time": time_stamps2},
468+
),
469+
}
470+
)
471+
voltages
472+
473+
Most xarray computation methods also exist as methods on datatree objects,
474+
so you can for example take the mean value of these two timeseries at once:
475+
476+
.. ipython:: python
477+
478+
voltages.mean(dim="time")
479+
480+
This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the
481+
tree one-by-one.
482+
483+
The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another:
484+
485+
.. ipython:: python
486+
:okexcept:
487+
488+
voltages.isel(time=12)
489+
490+
Notice that the error raised helpfully indicates which node of the tree the operation failed on.
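The mechanism can be sketched in plain Python: apply one function, with one set of arguments, to the data at every node, and attach the node's path to any failure. This is an illustrative analogue under stated assumptions, not how datatree is implemented; ``map_over_nodes`` and ``select`` are hypothetical names, and the lists stand in for the 15-sample and 10-sample time series above.

```python
def map_over_nodes(func, path_to_data, *args, **kwargs):
    """Apply func(data, *args, **kwargs) to every node's data."""
    results = {}
    for path, data in path_to_data.items():
        try:
            results[path] = func(data, *args, **kwargs)
        except Exception as err:
            # Report which node failed, then re-raise with that context.
            raise RuntimeError(f"failed at node {path!r}: {err}") from err
    return results


def select(samples, index):
    """Pick a single sample by position."""
    return samples[index]


samples = {
    "/oscilloscope1": list(range(15)),
    "/oscilloscope2": list(range(10)),
}

# Index 5 is valid for both nodes.
picked = map_over_nodes(select, samples, 5)

# Index 12 is valid for the first node only, so the second one fails.
try:
    map_over_nodes(select, samples, 12)
except RuntimeError as err:
    message = str(err)
    print(message)
```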
491+
492+
Arithmetic Methods on Trees
493+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
494+
495+
Arithmetic methods are also implemented, so you can e.g. add a scalar to every dataset in the tree at once.
496+
For example, we can advance the timeline of the Simpsons by a decade just by adding 10 to the whole tree:
497+
498+
.. ipython:: python
499+
500+
simpsons + 10
501+
502+
See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node.
503+
504+
Mapping Custom Functions Over Trees
505+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
506+
507+
You can map custom computation over each node in a tree using :py:func:`map_over_subtree`.
508+
You can map any function, so long as it takes `xarray.Dataset` objects as one (or more) of the input arguments,
509+
and returns one (or more) xarray datasets.
510+
511+
.. note::
512+
513+
Functions passed to :py:func:`map_over_subtree` cannot alter nodes in-place.
514+
Instead they must return new `xarray.Dataset` objects.
515+
516+
For example, we can define a function to calculate the Root Mean Square of a timeseries:
517+
518+
.. ipython:: python
519+
520+
def rms(signal):
521+
    return np.sqrt(np.mean(signal**2))
522+
523+
Then calculate the RMS value of these signals:
524+
525+
.. ipython:: python
526+
527+
voltages.map_over_subtree(rms)
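For reference, the quantity being computed is the square root of the mean of the squared samples. The sketch below does the same arithmetic with only the standard library and applies it to each node of a flat path mapping; ``rms_plain`` and the sample values are hypothetical and chosen so the result is easy to check by hand.

```python
import math


def rms_plain(samples):
    """Root mean square: sqrt of the mean of the squared samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


# A square wave of amplitude 3 has an RMS of exactly 3.
square_wave = [3.0, -3.0, 3.0, -3.0]

# Apply the same computation to every node of a path-to-samples mapping.
signals = {"/scope1": square_wave, "/scope2": [1.0, -1.0]}
rms_per_node = {path: rms_plain(values) for path, values in signals.items()}
print(rms_per_node)
```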
528+
529+
.. _multiple trees:
530+
531+
Operating on Multiple Trees
532+
---------------------------
533+
534+
The examples so far have involved mapping functions or methods over the nodes of a single tree,
535+
but we can generalize this to mapping functions over multiple trees at once.
536+
537+
Comparing Trees for Isomorphism
538+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
539+
540+
For it to make sense to map a single non-unary function over the nodes of multiple trees at once,
541+
each tree needs to have the same structure. Specifically, two trees can only be considered similar, or "isomorphic",
542+
if they have the same number of nodes, and each corresponding node has the same number of children.
543+
We can check if any two trees are isomorphic using the :py:meth:`DataTree.isomorphic` method.
544+
545+
.. ipython:: python
546+
:okexcept:
547+
548+
dt1 = DataTree.from_dict({"a": None, "a/b": None})
549+
dt2 = DataTree.from_dict({"a": None})
550+
dt1.isomorphic(dt2)
551+
552+
dt3 = DataTree.from_dict({"a": None, "b": None})
553+
dt1.isomorphic(dt3)
554+
555+
dt4 = DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})})
556+
dt1.isomorphic(dt4)
557+
558+
If the trees are not isomorphic a :py:class:`~TreeIsomorphismError` will be raised.
559+
Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic.
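The definition above translates into a short recursive check. The sketch below works on nested dicts mapping child names to subtrees, and assumes corresponding children are matched by their order rather than by name; ``is_isomorphic`` is a hypothetical function for illustration, not the datatree implementation.

```python
def is_isomorphic(a, b):
    """Return True if two nested-dict trees have the same shape.

    Each tree maps child name -> subtree. Names and data are ignored;
    children are paired up in insertion order (an assumption of this sketch).
    """
    if len(a) != len(b):
        return False
    return all(is_isomorphic(x, y) for x, y in zip(a.values(), b.values()))


# Plain-dict analogues of dt1..dt4 above.
t1 = {"a": {"b": {}}}
t2 = {"a": {}}
t3 = {"a": {}, "b": {}}
t4 = {"A": {"B": {}}}

print(is_isomorphic(t1, t2))  # "a" has one child in t1 but none in t2
print(is_isomorphic(t1, t3))  # the roots have different numbers of children
print(is_isomorphic(t1, t4))  # same shape, different names
```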
560+
561+
Arithmetic Between Multiple Trees
562+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
563+
564+
Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees,
565+
we can do arithmetic between them.
566+
567+
.. ipython:: python
568+
569+
currents = DataTree.from_dict(
570+
{
571+
"/oscilloscope1": xr.Dataset(
572+
{
573+
"current": (
574+
"time",
575+
signal_generator(time_stamps1, f=2, A=1.2, phase=1),
576+
),
577+
},
578+
coords={"time": time_stamps1},
579+
),
580+
"/oscilloscope2": xr.Dataset(
581+
{
582+
"current": (
583+
"time",
584+
signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7),
585+
),
586+
},
587+
coords={"time": time_stamps2},
588+
),
589+
}
590+
)
591+
currents
592+
593+
currents.isomorphic(voltages)
594+
595+
We could use this feature to quickly calculate the electrical power in our signal, P=IV.
596+
597+
.. ipython:: python
598+
599+
power = currents * voltages
600+
power
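Conceptually, a binary operation between two trees pairs up corresponding nodes and applies the operation to each pair. A minimal plain-Python sketch follows, assuming for simplicity that both trees use identical node paths and equal-length sample lists; ``multiply_trees`` and the numbers are hypothetical.

```python
def multiply_trees(left, right):
    """Multiply two path-to-samples mappings node by node, element by element."""
    if left.keys() != right.keys():
        raise ValueError("trees must have matching node paths")
    result = {}
    for path in left:
        if len(left[path]) != len(right[path]):
            raise ValueError(f"sample counts differ at node {path!r}")
        result[path] = [i * v for i, v in zip(left[path], right[path])]
    return result


# Hypothetical current (I) and voltage (V) readings for two instruments.
current_samples = {"/oscilloscope1": [1.0, 2.0], "/oscilloscope2": [0.5]}
voltage_samples = {"/oscilloscope1": [3.0, 4.0], "/oscilloscope2": [2.0]}

# P = I * V at every node.
power_samples = multiply_trees(current_samples, voltage_samples)
print(power_samples)
```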

docs/source/whats-new.rst

+3
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,9 @@ Bug fixes
4545
Documentation
4646
~~~~~~~~~~~~~
4747

48+
- Added new sections to page on ``Working with Hierarchical Data`` (:pull:`180`)
49+
By `Tom Nicholas <https://github.com/TomNicholas>`_.
50+
4851
Internal Changes
4952
~~~~~~~~~~~~~~~~
5053
