diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 70c19e9f3..1ec9bd09f 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -26,9 +26,10 @@ repos: args: [hvplot] files: hvplot/ - repo: https://github.com/hoxbro/clean_notebook - rev: v0.1.10 + rev: v0.1.14 hooks: - id: clean-notebook + args: [-i, tags] - repo: https://github.com/codespell-project/codespell rev: v2.2.6 hooks: diff --git a/doc/user_guide/index.rst b/doc/user_guide/index.rst index 2d4e41f0d..31932ccd8 100644 --- a/doc/user_guide/index.rst +++ b/doc/user_guide/index.rst @@ -89,6 +89,9 @@ rather than Matplotlib. * `Timeseries Data `_ Using hvPlot when working with timeseries data. +* `Large Timeseries Data `_ + Using hvPlot when working with large timeseries data. + * `Statistical Plots `_ A number of statistical plot types modeled on the pandas.plotting module. @@ -117,5 +120,6 @@ rather than Matplotlib. Network Graphs Geographic Data Timeseries Data + Large Timeseries Statistical Plots Pandas API diff --git a/examples/getting_started/interactive.ipynb b/examples/getting_started/interactive.ipynb index 4286364b5..ee55745f7 100644 --- a/examples/getting_started/interactive.ipynb +++ b/examples/getting_started/interactive.ipynb @@ -2,6 +2,7 @@ "cells": [ { "cell_type": "markdown", + "id": "bcbd5216-59d4-4de9-b576-9727ce9b6435", "metadata": {}, "source": [ "hvPlot isn't only a plotting library, it is dedicated to make data exploration easier. In this guide you will see how it can help you to get better control over your data pipelines. We define a *data pipeline* as a series of commands that *transform* some data, such as aggregating, filtering, reshaping, renaming, etc. A data pipeline may include a *load* step that will provide the input data to the pipeline, e.g. reading the data from a data base. 
\n", @@ -14,6 +15,7 @@ { "cell_type": "code", "execution_count": null, + "id": "4ff4e131-e046-49a1-85ed-01151299adf8", "metadata": {}, "outputs": [], "source": [ @@ -23,6 +25,7 @@ }, { "cell_type": "markdown", + "id": "6ece0608-9102-40a6-b649-ee6e46e263b9", "metadata": {}, "source": [ "We load a dataset and get a handle on its unique *air* variable." @@ -31,6 +34,7 @@ { "cell_type": "code", "execution_count": null, + "id": "f34daeaa-bb75-4642-bfbe-c95973e78c5f", "metadata": {}, "outputs": [], "source": [ @@ -41,6 +45,7 @@ }, { "cell_type": "markdown", + "id": "4c105059-2f7d-44ec-ad73-9f65eef4cbc7", "metadata": {}, "source": [ "We want to better understand the temporal evolution of the air temperature over different latitudes compared to a baseline. The data pipeline we build includes:\n", @@ -57,6 +62,7 @@ { "cell_type": "code", "execution_count": null, + "id": "899aa9be-054c-4238-80f3-8706678d7346", "metadata": {}, "outputs": [], "source": [ @@ -78,6 +84,7 @@ }, { "cell_type": "markdown", + "id": "66063fd1-32b0-40dd-b2e5-bdb1ed595347", "metadata": {}, "source": [ "Without `.interactive()` we would manually change the values of `LATITUDE` and `ROLLING_WINDOW` to see how they affect the pipeline output. Instead we create two widgets with the values we expect them to take, we are basically declaring beforehand our parameter space. To create widgets we import [Panel](https://panel.holoviz.org) and pick two appropriate widgets from its [Reference Gallery](https://panel.holoviz.org/reference/index.html#widgets)." @@ -86,6 +93,7 @@ { "cell_type": "code", "execution_count": null, + "id": "03ff2aed-65d2-470a-a401-1c6f65fd5d05", "metadata": {}, "outputs": [], "source": [ @@ -97,6 +105,7 @@ }, { "cell_type": "markdown", + "id": "9a575b14-9973-4308-8b1c-c17b8d8b415e", "metadata": {}, "source": [ "Now we instantiate an *Interactive* object by calling `.interactive()` on our data. This object mirrors the underlying object API, it accepts all of its natural operations. 
We replace the data by the interactive object in the pipeline, and replace the constant parameters by the widgets we have just created." @@ -105,6 +114,7 @@ { "cell_type": "code", "execution_count": null, + "id": "dddaf71e-1f62-46c8-a35f-48dd04d3006a", "metadata": {}, "outputs": [], "source": [ @@ -114,6 +124,7 @@ { "cell_type": "code", "execution_count": null, + "id": "d660ad41-24bd-4236-8d57-268187bc7ac7", "metadata": {}, "outputs": [], "source": [ @@ -132,6 +143,7 @@ }, { "cell_type": "markdown", + "id": "41a8c760-d2d2-4de4-91dd-7cb0bfc7bd86", "metadata": {}, "source": [ "You can see that now the pipeline when rendered doesn't only consist of its output, it also includes the widgets that control it. Change the widgets' values and observe how the output dynamically updates.\n", @@ -144,6 +156,7 @@ { "cell_type": "code", "execution_count": null, + "id": "b3ac92e6-4041-44e1-9997-6d36c10457b1", "metadata": {}, "outputs": [], "source": [ @@ -154,6 +167,7 @@ }, { "cell_type": "markdown", + "id": "0606ad0e-2917-4bfa-b8ab-152ce2b8b837", "metadata": {}, "source": [ "For information on using `.interactive()` take a look at the [User Guide](../user_guide/Interactive.ipynb)." 
diff --git a/examples/reference/pandas/andrewscurves.ipynb b/examples/reference/pandas/andrewscurves.ipynb index eba77bb1a..99c17bca3 100644 --- a/examples/reference/pandas/andrewscurves.ipynb +++ b/examples/reference/pandas/andrewscurves.ipynb @@ -3,6 +3,7 @@ { "cell_type": "code", "execution_count": null, + "id": "c3152ae5-a7e8-4a87-946c-0622095a1a1a", "metadata": {}, "outputs": [], "source": [ @@ -11,6 +12,7 @@ }, { "cell_type": "markdown", + "id": "3ff5961b-3576-4b7b-9b97-fcabc717aafa", "metadata": {}, "source": [ "Andrews curves provides a mechanism for visualising clusters of multivariate data.\n", @@ -27,6 +29,7 @@ { "cell_type": "code", "execution_count": null, + "id": "36e7d95b-2810-4efa-ae16-291f63be1e0a", "metadata": {}, "outputs": [], "source": [ @@ -38,6 +41,7 @@ { "cell_type": "code", "execution_count": null, + "id": "96e78c6a-92ee-4820-ae8a-13220e2f9870", "metadata": {}, "outputs": [], "source": [ @@ -47,6 +51,7 @@ { "cell_type": "code", "execution_count": null, + "id": "27a21f83-4d3e-43a5-82e2-74f99cb7cfae", "metadata": {}, "outputs": [], "source": [ diff --git a/examples/reference/pandas/lagplot.ipynb b/examples/reference/pandas/lagplot.ipynb index 1a7b6d735..3a60c65cc 100644 --- a/examples/reference/pandas/lagplot.ipynb +++ b/examples/reference/pandas/lagplot.ipynb @@ -3,6 +3,7 @@ { "cell_type": "code", "execution_count": null, + "id": "0a497b8d-a35b-48b1-9e1a-e2acc907ff14", "metadata": {}, "outputs": [], "source": [ @@ -13,6 +14,7 @@ }, { "cell_type": "markdown", + "id": "992bc493-914a-4dbb-80cd-6d7ed12a3076", "metadata": {}, "source": [ "Lag plots are most commonly used to look for patterns in time series data." 
@@ -20,6 +22,7 @@ }, { "cell_type": "markdown", + "id": "f6d461a5-9c12-4042-8971-ad528af9977d", "metadata": {}, "source": [ "Given the following time series:" @@ -28,6 +31,7 @@ { "cell_type": "code", "execution_count": null, + "id": "16743185-fa7f-4854-90fc-f284907ad809", "metadata": {}, "outputs": [], "source": [ @@ -40,6 +44,7 @@ }, { "cell_type": "markdown", + "id": "8ed87848-6a86-4025-bd7e-7900a69706c9", "metadata": {}, "source": [ "A lag plot with `lag=1` returns:" @@ -48,6 +53,7 @@ { "cell_type": "code", "execution_count": null, + "id": "d14609df-a88a-44fc-85b0-426e139d9b4b", "metadata": {}, "outputs": [], "source": [ diff --git a/examples/reference/pandas/parallelcoordinates.ipynb b/examples/reference/pandas/parallelcoordinates.ipynb index 37c1995c9..91f851953 100644 --- a/examples/reference/pandas/parallelcoordinates.ipynb +++ b/examples/reference/pandas/parallelcoordinates.ipynb @@ -3,6 +3,7 @@ { "cell_type": "code", "execution_count": null, + "id": "1289b546-07a6-49f9-a21f-b2bd891b10b1", "metadata": {}, "outputs": [], "source": [ @@ -11,6 +12,7 @@ }, { "cell_type": "markdown", + "id": "7c867b2b-d891-483d-8b80-12103ad72b91", "metadata": {}, "source": [ "Parallel coordinates are a common way of visualizing and analyzing high-dimensional datasets.\n", @@ -21,6 +23,7 @@ { "cell_type": "code", "execution_count": null, + "id": "538eafff-24af-41ef-941d-c81499c88a39", "metadata": {}, "outputs": [], "source": [ @@ -32,6 +35,7 @@ { "cell_type": "code", "execution_count": null, + "id": "7e25f890-5e01-4358-b4a9-f0183647dd58", "metadata": {}, "outputs": [], "source": [ @@ -41,6 +45,7 @@ { "cell_type": "code", "execution_count": null, + "id": "eb4a6aec-36e0-43f6-b776-8cad6d926428", "metadata": {}, "outputs": [], "source": [ diff --git a/examples/reference/pandas/scattermatrix.ipynb b/examples/reference/pandas/scattermatrix.ipynb index d6d17d5e1..1cc747d42 100644 --- a/examples/reference/pandas/scattermatrix.ipynb +++ b/examples/reference/pandas/scattermatrix.ipynb @@ 
-3,6 +3,7 @@ { "cell_type": "code", "execution_count": null, + "id": "13612aff-dc6e-4cff-95a0-2f882720cbb5", "metadata": {}, "outputs": [], "source": [ @@ -13,6 +14,7 @@ }, { "cell_type": "markdown", + "id": "7a8636f3-de1d-419b-b2d9-37a6635b8ed4", "metadata": {}, "source": [ "`scatter_matrix` shows all the pairwise relationships between the columns of your data. Each non-diagonal entry plots the corresponding columns against another, while the diagonal plot shows the distribution of the data within each individual column.\n", @@ -23,6 +25,7 @@ { "cell_type": "code", "execution_count": null, + "id": "2f0122f2-5b40-4331-9818-243a04d6f8e6", "metadata": {}, "outputs": [], "source": [ @@ -34,6 +37,7 @@ { "cell_type": "code", "execution_count": null, + "id": "5a7b281a-5ea1-4dc5-886b-ec1ebcd49b3d", "metadata": {}, "outputs": [], "source": [ @@ -42,6 +46,7 @@ }, { "cell_type": "markdown", + "id": "047091e1-d2ad-4cd9-9016-6f057bbad2e2", "metadata": {}, "source": [ "The `chart` parameter allows to change the type of the *off-diagonal* plots." @@ -50,6 +55,7 @@ { "cell_type": "code", "execution_count": null, + "id": "ae047de0-33a8-44f5-9c4d-8a65c1482562", "metadata": {}, "outputs": [], "source": [ @@ -58,6 +64,7 @@ }, { "cell_type": "markdown", + "id": "66deeead-6af8-4652-b94b-6f53773f3b56", "metadata": {}, "source": [ "The `diagonal` parameter allows to change the type of the *diagonal* plots." @@ -66,6 +73,7 @@ { "cell_type": "code", "execution_count": null, + "id": "e83f3436-d84f-4298-8cec-e6b046aa0c6b", "metadata": {}, "outputs": [], "source": [ @@ -74,6 +82,7 @@ }, { "cell_type": "markdown", + "id": "0d80a5a8-ea2c-4bff-8785-b48909ac8ba6", "metadata": {}, "source": [ "Setting `tools` to include a selection tool like `box_select` and an inspection tool like `hover` permits further analysis." 
@@ -82,6 +91,7 @@ { "cell_type": "code", "execution_count": null, + "id": "21110261-3ea1-4a3d-9bed-f5971aac8316", "metadata": {}, "outputs": [], "source": [ @@ -91,6 +101,7 @@ { "cell_type": "code", "execution_count": null, + "id": "6b2c02e9-3bfd-4826-8358-3e9dc0461519", "metadata": {}, "outputs": [], "source": [ @@ -99,6 +110,7 @@ }, { "cell_type": "markdown", + "id": "85d57857-abc4-470c-9938-e8438f1efd2d", "metadata": {}, "source": [ "The `c` parameter allows to colorize the data by a given column, here by `'CAT'`. Note also that the `diagonal_kwds` parameter (equivalent to `hist_kwds` in this case or `density_kwds` for *kde* plots) allow to customize the diagonal plots." @@ -107,6 +119,7 @@ { "cell_type": "code", "execution_count": null, + "id": "cb363d1d-aa43-469c-be95-0d21dd140522", "metadata": {}, "outputs": [], "source": [ @@ -116,6 +129,7 @@ { "cell_type": "code", "execution_count": null, + "id": "3aefaf96-467b-48e4-a41a-a6a7a9e083c6", "metadata": {}, "outputs": [], "source": [ @@ -124,6 +138,7 @@ }, { "cell_type": "markdown", + "id": "69bdc5ba-9d2a-4a99-a844-6cc347f6f585", "metadata": {}, "source": [ "Scatter matrix plots may end up with a large number of points having to be rendered which can be challenging for the browser or even just crash it. In that case you should consider setting to `True` the `rasterize` (or `datashade`) parameter that uses [Datashader](https://datashader.org/) to render the off-diagonal plots on the backend and then send more efficient image-based representations to the browser.\n", @@ -134,6 +149,7 @@ { "cell_type": "code", "execution_count": null, + "id": "829e3e0f-1330-4af5-a7af-9c1e927659b1", "metadata": {}, "outputs": [], "source": [ @@ -142,6 +158,7 @@ }, { "cell_type": "markdown", + "id": "76fc17d6-aa09-4968-9dd5-b36645831e9f", "metadata": {}, "source": [ "When `rasterize` (or `datashade`) is toggled it's possible to make individual points more visible by setting `dynspread=True` or `spread=True`. 
Head over to the [Working with large data using datashader](https://holoviews.org/user_guide/Large_Data.html) guide of [HoloViews](https://holoviews.org/index.html) to learn more about these operations and what parameters they accept (which can be passed as `kwds` to `scatter_matrix`)." @@ -150,6 +167,7 @@ { "cell_type": "code", "execution_count": null, + "id": "a819c10e-6ef2-4f3d-a746-28e216c0dd4d", "metadata": {}, "outputs": [], "source": [ diff --git a/examples/user_guide/Large_Timeseries.ipynb b/examples/user_guide/Large_Timeseries.ipynb new file mode 100644 index 000000000..7a0232df0 --- /dev/null +++ b/examples/user_guide/Large_Timeseries.ipynb @@ -0,0 +1,441 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "artificial-english", + "metadata": { + "tags": [] + }, + "source": [ + "Effectively representing temporal dynamics in large datasets requires selecting appropriate visualization techniques that ensure responsiveness while providing both a macroscopic view of overall trends and a microscopic view of fine details. This guide will explore various methods, such as **WebGL Rendering**, **LTTB Downsampling**, **Datashader Rasterizing**, and **Minimap Contextualizing**, each suited for different aspects of large timeseries data visualization. We predominantly demonstrate the use of hvPlot syntax, leveraging HoloViews for more complex requirements. Although hvPlot supports multiple backends, including Matplotlib and Plotly, our focus will be on Bokeh due to its advanced capabilities in handling large timeseries data.\n", + "\n", + "\n", + "## Getting the data \n", + "\n", + "Here we have a DataFrame with 1.2 million rows containing standardized data from 5 different sensors." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "arabic-container", + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "df = pd.read_parquet(\"https://datasets.holoviz.org/sensor/v1/data.parq\")\n", + "df.sample(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "banned-richards", + "metadata": {}, + "outputs": [], + "source": [ + "df0 = df[df.sensor=='0']" + ] + }, + { + "cell_type": "markdown", + "id": "fourth-sentence", + "metadata": {}, + "source": [ + "Let's go ahead and plot this data using various approaches.\n", + "\n", + "## WebGL Rendering\n", + "\n", + "### Canvas Rendering - Prior Default\n", + "Rendering Bokeh plots in hvPlot or HoloViews has evolved significantly. Prior to 2023, Bokeh's custom HTML **Canvas** rendering was the default. This approach works well for datasets up to a few tens of thousands of points but struggles above 100K points, particularly in terms of zooming and panning speed. These days, if you want to utilize Bokeh's Canvas rendering, use `import holoviews as hv; hv.renderer(\"bokeh\").webgl = False` prior to creating your hvPlot or HoloViews object.\n", + "\n", + "### WebGL Rendering - Current Default\n", + "Around mid-2023, the adoption of improved **WebGL** as the default for hvPlot and HoloViews allowed for smoother interactions with larger datasets by utilizing GPU-acceleration. It's important to note that WebGL performance can vary based on your machine's specifications. For example, some Mac models may not exhibit a marked improvement in WebGL performance over Canvas due to GPU hardware configuration." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "constitutional-metabolism", + "metadata": {}, + "outputs": [], + "source": [ + "import holoviews as hv; hv.extension('bokeh')\n", + "import hvplot.pandas # noqa: F401\n", + "# Set notebook hvPlot/HoloViews default options\n", + "hv.opts.defaults(hv.opts.Curve(responsive=True))\n", + "\n", + "df0.hvplot(x=\"time\", y=\"value\", autorange='y', title=\"WebGL\", min_height=300)" + ] + }, + { + "cell_type": "markdown", + "id": "428042ef", + "metadata": {}, + "source": [ + "
\n", + "\n", + "Note: `autorange='y'` is demonstrated here for automatic y-axis scaling, a feature from HoloViews 1.17 and hvPlot 0.9.0. You can omit that option if you prefer to set the y scaling manually using the zoom tool.\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "directed-proof", + "metadata": {}, + "source": [ + "Alone, both Canvas and WebGL rendering have a common limitation: they transfer the entire dataset from the server to the browser. This can be a significant bottleneck, especially for remote server setups or datasets larger than a million points. To address this, we'll explore other techniques like LTTB Downsampling, which focus on delivering only the necessary data for the current view. These methods offer more scalable solutions for interacting with large timeseries data, as we'll see in the following sections.\n", + "\n", + "## LTTB Downsampling\n", + "\n", + "### The Challenge with Simple Downsampling\n", + "\n", + "A straightforward approach to handling large datasets might involve plotting every _n_th datapoint using a method like df.sample:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "conservative-maldives", + "metadata": {}, + "outputs": [], + "source": [ + "df0.hvplot(x=\"time\", y=\"value\", color= '#003366', label = \"All the data\") *\\\n", + "df0.sample(500).hvplot(x=\"time\", y=\"value\", alpha=0.8, color='#FF6600', min_height=300,\n", + " label=\"Decimation\", title=\"Decimation: Don't do this!\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "c38a3dea", + "metadata": {}, + "source": [ + "However, this method, known as decimation or arbitrarily strided sampling, can lead to [aliasing](https://en.wikipedia.org/wiki/Downsampling_(signal_processing)), where the resulting plot misrepresents the actual data by missing crucial peaks, troughs, or slopes. 
For instance, significant variations visible in the WebGL plot of the previous section might be entirely absent in a decimated plot, making this approach generally inadvisable for accurate data representation.\n", + "\n", + "### The LTTB Solution\n", + "\n", + "To address this, a more sophisticated method like the [Largest Triangle Three Buckets (LTTB)](https://skemman.is/handle/1946/15343) algorithm can be employed. LTTB allows data points not contributing significantly to the visible shape to be dropped, reducing the amount of data to send to the browser but preserving the appearance (and particularly the envelope, i.e. highest and lowest values in a region).\n", + "\n", + "In hvPlot, adding `downsample=True` will enable the LTTB algorithm, which will automatically choose an appropriate number of samples for the current plot:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47e52cc0", + "metadata": {}, + "outputs": [], + "source": [ + "df0.hvplot(x=\"time\", y=\"value\", color='#003366', label=\"All the data\") *\\\n", + "df0.hvplot(x=\"time\", y=\"value\", color='#00B3B3', label=\"LTTB\", title=\"LTTB\",\n", + " min_height=300, alpha=0.8, downsample=True)"
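The bucket-and-triangle idea behind LTTB can be sketched in a few lines of NumPy. This is a simplified illustration of the algorithm described above, not the implementation HoloViews actually uses; all names here are hypothetical:

```python
import numpy as np

def lttb(x, y, n_out):
    """Largest Triangle Three Buckets, simplified: split the interior
    points into n_out - 2 buckets and keep, from each bucket, the point
    forming the largest triangle with the previously kept point and the
    mean of the next bucket."""
    x, y, n = np.asarray(x, float), np.asarray(y, float), len(x)
    if n_out >= n or n_out < 3:
        return x, y
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)  # bucket boundaries
    keep = [0]                                            # always keep the first point
    for i in range(n_out - 2):
        lo, hi = edges[i], edges[i + 1]                   # current bucket
        nlo = edges[i + 1]
        nhi = edges[i + 2] if i + 2 < len(edges) else n   # next bucket (or last point)
        ax, ay = x[keep[-1]], y[keep[-1]]                 # last kept point
        bx, by = x[nlo:nhi].mean(), y[nlo:nhi].mean()     # next bucket's centroid
        # Twice the triangle area for every candidate point in the bucket
        areas = np.abs((ax - bx) * (y[lo:hi] - ay) - (ax - x[lo:hi]) * (by - ay))
        keep.append(lo + int(areas.argmax()))
    keep.append(n - 1)                                    # always keep the last point
    return x[keep], y[keep]

x = np.arange(10_000)
y = np.sin(x / 300)
y[5_000] = 10.0  # a narrow spike that decimation would likely miss
xs, ys = lttb(x, y, 200)
```

Because the spike is the extreme point of its bucket, it forms the largest triangle there and survives the downsampling, which is exactly the envelope-preserving property that makes LTTB safer than decimation.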
\n", + "\n", + "Note: LTTB dynamically depends on Python and therefore won't update as you zoom in on our website. If you are locally running this notebook with a live Python process, the plot will automatically update with additional detail as you zoom in.\n", + "
\n", + "\n", + "\n", + "With LTTB, it is now practical to include all of the different sensors in a single plot without slowdown: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "208fada7", + "metadata": {}, + "outputs": [], + "source": [ + "df.hvplot(x=\"time\", y=\"value\", downsample=True, by='sensor', min_height=300, title=\"LTTB By Sensor\")" + ] + }, + { + "cell_type": "markdown", + "id": "conscious-collector", + "metadata": { + "tags": [ + "hide-cell" + ] + }, + "source": [ + "This makes LTTB an ideal default method for exploring timeseries datasets, particularly when the dataset size is unknown or too large for standard WebGL rendering.\n", + "\n", + "## Datashader Rasterizing" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "39ff4ae1", + "metadata": { + "tags": [ + "hide-cell" + ] + }, + "outputs": [], + "source": [ + "# Cell hidden on the website (hide-cell in tags)\n", + "from holoviews.operation.resample import ResampleOperation2D\n", + "ResampleOperation2D.width=1200\n", + "ResampleOperation2D.height=500" + ] + }, + { + "cell_type": "markdown", + "id": "28acbb0b-21b7-401e-af1e-29dee1f41287", + "metadata": {}, + "source": [ + "### Principles of Datashader\n", + "\n", + "While WebGL and LTTB both send individual data points to the web browser, [Datashader](https://datashader.org) rasterizing offers a fundamentally different approach to visualizing large datasets. Datashader operates by generating a fixed-size 2D binned array tailored to your screen's resolution during each zoom or pan event. In this array, each bin aggregates data points from its corresponding location, effectively creating a 2D histogram. So, instead of transmitting the entire dataset, only this optimized array is sent to the web browser, thereby displaying all relevant data at the current zoom level and facilitating the visualization of the largest datasets.\n", + "\n", + "❗ A couple important details: ❗\n", + "1. 
As with LTTB downsampling, Datashader rasterization dynamically depends on Python and, therefore, won't update as you zoom in on our website. If you are locally running this notebook with a live Python process, the plot will automatically update with additional detail as you zoom in.\n", + "2. Setting `line_width` to be greater than `0` activates [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing), smoothing the visual representation of lines that might otherwise look too pixelated.\n", + "\n", + "### Single Line Example\n", + "Activating Datashader rasterization for a single large timeseries curve in hvPlot is as simple as setting `rasterize=True`!" + ] + }, + { + "cell_type": "markdown", + "id": "8bbad008-adb6-4e00-ac4e-828445c3dd57", + "metadata": {}, + "source": [ + "
\n", + "\n", + "Note: When plotting a single curve, the default behavior is to flatten the count in each pixel to better match the appearance of plotting a line without Datashader rasterization (see the [relevant PR](https://github.com/holoviz/holoviews/pull/6030) for details). If you want to restore these pixel count aggregations, just import Datashader (`import datashader as ds`) and pass hvPlot a count aggregator with self-intersection enabled (`aggregator=ds.count(self_intersect=True)`).\n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a623a983-5a91-40e1-ab98-0967d1f0afef", + "metadata": {}, + "outputs": [], + "source": [ + "df0.hvplot(x=\"time\", y=\"value\", rasterize=True, cnorm='eq_hist', padding=(0, 0.1),\n", + " min_height=300, autorange='y', title=\"Datashade\", colorbar=False, line_width=2)" + ] + }, + { + "cell_type": "markdown", + "id": "naughty-adventure", + "metadata": {}, + "source": [ + "### Multiple Categories Example\n", + "\n", + "For data with a line for each of several \"categories\" (sensors, in this case), Datashader can assign a different color to each of the sensor categories. The resulting image then blends these colors where data overlaps, providing visual cues for areas with high category intersection. This is particularly useful for datasets with multiple data series:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "expired-gallery", + "metadata": {}, + "outputs": [], + "source": [ + "df.hvplot(x=\"time\", y=\"value\", datashade=True, hover=True, padding=(0, 0.1), min_height=300,\n", + " by='sensor', title=\"Datashade Categories\", line_width=2)" + ] + }, + { + "cell_type": "markdown", + "id": "5fae9346-45e2-48bb-84a3-3ba95329345d", + "metadata": {}, + "source": [ + "When you're zoomed out, Datashader's effectiveness is apparent. The image it creates reveals the overall data distribution and patterns, with color and intensity showing areas of higher data concentration - where lines cross through the same pixel. Datashader rendering can therefore provide a good overview of the full shape of a long timeseries, helping you understand how the signal varies even when the variations involved are smaller than the pixels on the screen." 
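The core idea of Datashader's approach described above, reducing millions of points to one screen-sized aggregate array, can be illustrated with a plain NumPy 2D histogram. This is only a conceptual sketch with made-up data; Datashader's real aggregation rasterizes the connected line (with optional anti-aliasing), not just the sample points:

```python
import numpy as np

# ~1.2 million synthetic samples standing in for one sensor's readings
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1_200_000)
v = np.sin(2 * np.pi * 5 * t) + rng.normal(0, 0.2, t.size)

# One bin per screen pixel: every raw point lands in exactly one bin,
# and only this small counts array needs to reach the browser.
width, height = 800, 300
counts, _, _ = np.histogram2d(t, v, bins=[width, height])

print(counts.shape)       # fixed at (800, 300) no matter how many points we bin
print(int(counts.sum()))  # all 1,200,000 points are accounted for, just re-aggregated
```

Zooming or panning simply recomputes this aggregate over the new viewport, which is why the cost stays bounded by the screen resolution rather than the dataset size.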
+ ] + }, + { + "cell_type": "markdown", + "id": "177ef209-13c8-43e4-848d-0ee40ed1b89f", + "metadata": {}, + "source": [ + "### Multiple Lines Per Category Example\n" + ] + }, + { + "cell_type": "markdown", + "id": "467fa694-23ba-437e-8b8f-d7485bcbb514", + "metadata": {}, + "source": [ + "Plotting hundreds or thousands of overlapping timeseries snippets relative to a set of events is important in domains like finance, sensor monitoring, and neuroscience. In neuroscience, for example, this approach is used to reveal distinct patterns across action potential waveforms from different neurons. Let's load a dataset of neural waveforms:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6c4d782b-5417-48ec-a773-817714929d91", + "metadata": {}, + "outputs": [], + "source": [ + "waves = pd.read_parquet(\"https://datasets.holoviz.org/waveform/v1/waveforms.parq\")\n", + "waves.head(2)" + ] + }, + { + "cell_type": "markdown", + "id": "463ae980-bd8e-4439-8fde-3804d66191ed", + "metadata": {}, + "source": [ + "This dataset contains numerous neural waveform snippets. To grasp its structure, we examine the length of each waveform and count of waveforms per neuron:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cabd9ea-1c75-4976-9c31-c4b745c54c00", + "metadata": {}, + "outputs": [], + "source": [ + "first_waveform = waves[(waves['Neuron'] == waves['Neuron'].unique()[0]) & (waves['Waveform'] == 0)]\n", + "print(f'Number of samples per waveform: {len(first_waveform)}')\n", + "waves.groupby('Neuron')['Waveform'].nunique().reset_index().rename(columns={'Waveform': '# Waveforms'})" + ] + }, + { + "cell_type": "markdown", + "id": "e51c2274-f963-4fa6-8b4c-fb56ba4d445a", + "metadata": {}, + "source": [ + "With a substantial number of waveforms and multiple categories (neurons), the density of data can make it difficult to accurately visualize patterns in the data. 
We can utilize hvPlot and Datashader, but there is currently one caveat: to color by neuron with Datashader, each waveform in the dataframe must be followed by a row containing `NaN`, separating it from the next waveform. This ensures each waveform is treated as an individual entity, avoiding misleading connections between the end of one waveform and the start of the next. Below, we can see one of these `NaN` rows at the end of the first waveform."
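If your own data lacks these separators, they can be added with a small groupby step. The helper below is a hypothetical sketch (not part of hvPlot), shown on a tiny made-up frame that mirrors this dataset's column names:

```python
import numpy as np
import pandas as pd

def add_nan_separators(df, group_cols):
    """Append one row per group with the group keys kept and every other
    column set to NaN, so Datashader breaks the line between groups."""
    parts = []
    for keys, g in df.groupby(group_cols, sort=False):
        keys = keys if isinstance(keys, tuple) else (keys,)
        sep = {c: np.nan for c in df.columns}     # NaN everywhere...
        sep.update(dict(zip(group_cols, keys)))   # ...except the group keys
        parts.append(pd.concat([g, pd.DataFrame([sep])], ignore_index=True))
    return pd.concat(parts, ignore_index=True)

# Two tiny 3-sample "waveforms" from two neurons (made-up values)
raw = pd.DataFrame({
    'Neuron': ['n1'] * 3 + ['n2'] * 3,
    'Waveform': [0] * 6,
    'Time': [0, 1, 2, 0, 1, 2],
    'Amplitude': [0.1, 0.5, 0.2, -0.3, -0.6, -0.2],
})
waves_sep = add_nan_separators(raw, ['Neuron', 'Waveform'])
```

After this step each group ends in a `NaN` row that still carries its `Neuron` label, so `by='Neuron'` coloring keeps working while the line breaks between waveforms.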
\n", + "\n", + "Note: [Work is planned](https://github.com/holoviz/holoviews/issues/5976) to avoid having to prepare your dataset with `NaN`-separators. Stay tuned!\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "8be05959-aae1-4325-842a-71b7ee8b2e61", + "metadata": {}, + "source": [ + "With the `NaN` separators already in place, all we need to do is specify that hvPlot should color by neuron and apply datashader rasterization:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "28cadcc8-9f09-4a1c-ae35-06ffa9e02bfa", + "metadata": {}, + "outputs": [], + "source": [ + "waves.hvplot.line('Time', 'Amplitude', by='Neuron', hover=True, datashade=True,\n", + " xlabel='Time (ms)', ylabel='Amplitude (µV)', min_height=300,\n", + " title=\"Datashade Multiple Lines Per Category\", line_width=1)" + ] + }, + { + "cell_type": "markdown", + "id": "b4fe1f6f", + "metadata": {}, + "source": [ + "Datashader's approach, while comprehensive for large timeseries data, focuses on the entire dataset's view at a specific resolution. To explore data across different timescales, particularly when dealing with years of data but focusing on shorter intervals like a day or an hour, the next \"minimap\" approach offers an effective solution." + ] + }, + { + "cell_type": "markdown", + "id": "82346bfc", + "metadata": {}, + "source": [ + "## Minimap Contextualizing\n", + "\n", + "### Minimap Overview\n", + "Minimap introduces a way to visualize and navigate through extensive time ranges in your dataset. It allows you to maintain awareness of the larger context while focusing on a specific, smaller time range. This technique is particularly useful when dealing with timeseries data that span long durations but require detailed study of shorter intervals.\n", + "\n", + "### Implementing Minimap\n", + "To create a minimap, we use the HoloViz RangeToolLink, which links a main plot to a smaller overview plot. The smaller minimap plot provides a fixed, broad view of the data, and the main plot can be used for detailed examination. 
Note that we also make use of **Datashader rasterization** on both the main plot and the minimap to limit the data sent to the browser." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "smoking-findings", + "metadata": {}, + "outputs": [], + "source": [ + "from holoviews.plotting.links import RangeToolLink\n", + "\n", + "plot = df0.hvplot(x=\"time\", y=\"value\", rasterize=True, color='darkblue', line_width=2,\n", + " min_height=300, colorbar=False, ylim=(-9, 3), # optional: set initial y-range\n", + " xlim=(pd.Timestamp(\"2023-03-10\"), pd.Timestamp(\"2023-04-10\")), # optional: set initial x-range\n", + " ).opts(\n", + " backend_opts={\n", + " \"x_range.bounds\": (df0.time.min(), df0.time.max()), # optional: limit max viewable x-extent to data\n", + " \"y_range.bounds\": (df0.value.min()-1, df0.value.max()+1), # optional: limit max viewable y-extent to data\n", + " }\n", + ")\n", + "\n", + "minimap = df0.hvplot(x=\"time\", y=\"value\", height=150, padding=(0, 0.1), rasterize=True,\n", + " color='darkblue', colorbar=False, line_width=2).opts(toolbar='disable')\n", + "\n", + "link = RangeToolLink(minimap, plot, axes=[\"x\", \"y\"])\n", + "\n", + "(plot + minimap).opts(shared_axes=False).cols(1)" + ] + }, + { + "cell_type": "markdown", + "id": "lesser-magazine", + "metadata": {}, + "source": [ + "In this setup, you can interact with the minimap by dragging the grey selection box. The main plot above will update to reflect the selected range, allowing you to explore extensive datasets while focusing on specific segments.\n", + "\n", + "Here, we also demonstrate the use of `backend_opts` to configure properties of the Bokeh plotting library that are not yet exposed as HoloViews/hvPlot options. By setting hard outer limits on the plot's panning/zooming, we ensure that the view remains within the data's range, enhancing the user experience."
+ ] + }, + { + "cell_type": "markdown", + "id": "fd0ffe6b", + "metadata": { + "tags": [] + }, + "source": [ + "## Future Improvements\n", + "As we look to the future, our roadmap includes several exciting enhancements. A significant focus is to enrich Datashader inspections by incorporating rich hover tooltips for Datashader images. This addition will greatly enhance the data exploration experience, allowing users to access detailed information more intuitively.\n", + "\n", + "Additionally, we are working towards a more streamlined process for plotting multiple overlapping lines. Our goal is to evolve the current approach, eliminating the need for inserting `NaN` rows as separators in the data structure. This improvement will simplify data preparation, making the visualization of complex timeseries more accessible and user-friendly." + ] + } + ], + "metadata": { + "language_info": { + "name": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/examples/user_guide/Timeseries_Data.ipynb b/examples/user_guide/Timeseries_Data.ipynb index 81a1e34b2..825e4397d 100644 --- a/examples/user_guide/Timeseries_Data.ipynb +++ b/examples/user_guide/Timeseries_Data.ipynb @@ -192,20 +192,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### Downsample time series\n", + "### Working with Large Timeseries\n", "\n", - "*(Available with HoloViews >= 1.16)*\n", - "\n", - "An option when working with large time series is to downsample the data before plotting it. This can be done with `downsample=True`, which applies the `lttb` (Largest Triangle Three Buckets) algorithm to the data." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "sst.hvplot(label=\"original\") * sst.hvplot(downsample=True, label=\"downsampled\")" + "Working with large timeseries presents new visualization challenges. 
Consult our [Large Timeseries User Guide](Large_Timeseries.ipynb) to learn about \n", + "various approaches." ] } ],