Update datashader and minimap sections
droumis committed Dec 13, 2023
1 parent 439d077 commit 70fc75a
Showing 1 changed file with 56 additions and 35 deletions.
91 changes: 56 additions & 35 deletions examples/user_guide/Large_Timeseries.ipynb
@@ -5,7 +5,7 @@
"id": "artificial-english",
"metadata": {},
"source": [
"Effectively representing temporal dynamics in large datasets requires selecting appropriate visualization techniques that ensure responsiveness while providing both a macroscopic view of overall trends and a microscopic view of fine details. This guide will explore various methods, such as **WebGL Rendering**, **LTTB Downsampling**, **Datashader Rasterizing**, and **Minimap**, each suited for different aspects of large timeseries data visualization. We predominantly demonstrate the use of hvPlot syntax, leveraging HoloViews for more complex requirements. Although hvPlot supports multiple backends, including Matplotlib and Plotly, our focus will be on Bokeh due to its advanced capabilities in handling large timeseries data.\n",
"Effectively representing temporal dynamics in large datasets requires selecting appropriate visualization techniques that ensure responsiveness while providing both a macroscopic view of overall trends and a microscopic view of fine details. This guide will explore various methods, such as **WebGL Rendering**, **LTTB Downsampling**, **Datashader Rasterizing**, and **Minimap Contextualizing**, each suited for different aspects of large timeseries data visualization. We predominantly demonstrate the use of hvPlot syntax, leveraging HoloViews for more complex requirements. Although hvPlot supports multiple backends, including Matplotlib and Plotly, our focus will be on Bokeh due to its advanced capabilities in handling large timeseries data.\n",
"\n",
"\n",
"## Getting the data \n",
@@ -153,38 +153,45 @@
"## Datashader Rasterizing"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from holoviews.operation.resample import ResampleOperation2D\n",
"ResampleOperation2D.width=1200\n",
"ResampleOperation2D.height=500"
]
},
{
"cell_type": "markdown",
"id": "helpful-content",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"<b>Note:</b> This code above sets the default image size for Datashader renderings. It ensures images appear at high resolution when the notebook is displayed on our website, addressing the absence of dynamic resizing in this context. For interactive or local use, this adjustment is not critical.\n",
"<b>Note:</b> The code below sets the default image size for Datashader renderings to ensure high-resolution images on our website. This adjustment is mainly for web display and is less critical for local notebook use.\n",
"\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from holoviews.operation.resample import ResampleOperation2D\n",
"ResampleOperation2D.width=1200\n",
"ResampleOperation2D.height=500"
]
},
{
"cell_type": "markdown",
"id": "conscious-collector",
"metadata": {},
"source": [
"### Principles of Datashader\n",
"\n",
"While WebGL and LTTB both send individual data points to the web browser, [Datashader](https://datashader.org) rasterizing offers a fundamentally different approach to visualizing large datasets. It operates by creating an image of the data that fits within the resolution of your computer screen. In essence, Datashader constructs a 2D histogram of your data, rendering an image where each pixel represents an aggregation of data points in that area. Only the screen-friendly, fixed-resolution image is sent to the web browser for display, potentially greatly speeding up plots of the largest datasets. \n",
"\n",
"\n",
"Bokeh WebGL and LTTB both send data to the web browser and ask the web browser to \"connect the dots\" between them by drawing a line in the browser page, with LTTB simply sending fewer points. [Datashader](https://datashader.org) works in a different way, rendering the data into a frame buffer on the server, and then sending that buffer to the web browser rather than the individual data points. Thus Datashader will send only a fixed amount of data (the rendered plot), potentially greatly speeding up plots of the largest datasets. As for LTTB, plots will only be updated after a zoom or pan if Python is still running, because Python is what renders and supplies the updated image. Setting the argument `line_width` to a value above 0 will enable [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing) of the line. "
"### Single Line Example\n",
"When you're zoomed out, Datashader's effectiveness is apparent. The \"aggregated\" image it creates reveals the overall data distribution and patterns, with color intensity showing areas of high data concentration - where a line crosses through the same pixel multiple times. Datashader rendering can therefore provide a good overview of the full shape of a long timeseries, helping you understand how the signal varies even when the variations involved are smaller than the pixels on the screen.\n",
"\n",
"❗ A couple of important details:\n",
"1. As for LTTB, plots will only be updated after a zoom or pan if Python is still running, because Python is what renders and supplies the updated Datashader image.\n",
"2. Setting `line_width` to be greater than `0` activates [anti-aliasing](https://en.wikipedia.org/wiki/Anti-aliasing), smoothing the visual representation of lines that might otherwise look too pixelated."
]
},
{
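The "2D histogram" idea described in the Principles of Datashader cell can be sketched in plain Python. This is an illustrative stand-in under simplifying assumptions, not Datashader's implementation: it counts individual samples per pixel, whereas Datashader rasterizes the line segments between them. The function name `aggregate_counts` and the sawtooth signal are hypothetical.

```python
# Illustrative only: a pure-Python stand-in for per-pixel aggregation.
# Datashader itself rasterizes line segments; this sketch only counts samples.

def aggregate_counts(xs, ys, width, height, x_range, y_range):
    """Count how many (x, y) samples fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # samples outside the viewport are not drawn
        # Map data coordinates to pixel indices; the upper edge goes in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# 10,000 samples of a hypothetical sawtooth signal, reduced to a 4 x 2 grid.
xs = list(range(10_000))
ys = [x % 100 for x in xs]
image = aggregate_counts(xs, ys, width=4, height=2,
                         x_range=(0, 9_999), y_range=(0, 99))
```

However many samples go in, the result has a fixed size (`width * height` counts), which is why only a screen-sized image needs to reach the browser.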
@@ -203,9 +210,9 @@
"id": "naughty-adventure",
"metadata": {},
"source": [
"If you zoom in enough, you'll see a normal line, but for a long timeseries in a zoomed out plot like this one, what you will see is Datashader's \"aggregation\" of *all* the line segments between the points, with darker colors indicating areas where the data trace goes back and forth multiple times in a single pixel (with the number of \"switchbacks\" indicated in the color key). This representation conveys a lot more about the behavior of this data, with the previous plots showing a single solid color regardless of how many line segments crossed that pixel. Datashader rendering can be used to get a good overview of the full shape of a long timeseries, helping you understand how the signal varies even when the steps involved are smaller than the pixels on the screen.\n",
"### Categorical, Multi-Line Example\n",
"\n",
"For data with different \"categories\" (sensors, in this case), Datashader can assign a different color to each of the sensor categories and then aggregating all of them into the final display by mixing their colors:"
"For data with different \"categories\" (sensors, in this case), Datashader can assign a different color to each of the sensor categories. The resulting image then blends these colors where data overlaps, providing visual cues for areas with high category intersection. This is particularly useful for datasets with multiple data series:"
]
},
{
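The color blending described in the categorical example can be pictured as a count-weighted average of category colors within each pixel. The sketch below assumes that simplified model; Datashader's actual colorizing also handles transparency and color normalization, which are left out here. The sensor names, colors, and `blend_pixel` helper are hypothetical.

```python
# Illustrative only: count-weighted RGB blending for a single pixel.
# Sensor names and colors are hypothetical, not taken from the notebook.

def blend_pixel(counts, colors):
    """Blend category colors for one pixel, weighted by each category's count."""
    total = sum(counts.values())
    if total == 0:
        return None  # nothing was drawn in this pixel
    return tuple(
        round(sum(counts[cat] * colors[cat][channel] for cat in counts) / total)
        for channel in range(3)
    )


colors = {"sensor_a": (255, 0, 0), "sensor_b": (0, 0, 255)}

# A pixel crossed only by sensor_a keeps that sensor's pure color.
only_a = blend_pixel({"sensor_a": 5, "sensor_b": 0}, colors)

# A pixel crossed three times by sensor_a and once by sensor_b leans red.
mixed = blend_pixel({"sensor_a": 3, "sensor_b": 1}, colors)
```

A pixel whose color sits between the category colors is therefore a hint that several traces overlap there and that zooming in may be worthwhile.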
@@ -216,35 +223,29 @@
"outputs": [],
"source": [
"df.hvplot(x=\"time\", y=\"value\", datashade=True, hover=True, padding=(0, 0.1), responsive=True,\n",
" min_height=300, autorange='y', line_width=1, by='sensor', title=\"Rasterize categories\")"
" min_height=300, line_width=1, by='sensor', title=\"Rasterize categories\")"
]
},
{
"cell_type": "markdown",
"id": "honey-globe",
"metadata": {},
"source": [
"This categorical color mixing can help indicate when traces overlap each other, to give you a clue when to zoom in, and becomes particularly important the more categories there are."
]
},
{
"cell_type": "markdown",
"id": "lesser-magazine",
"metadata": {},
"source": [
"[The example above needs `rasterize`, plus instant inspection. Also needs to illustrate what happens when very large numbers of traces overlap.] "
"Datashader renders the full dataset at a fixed screen resolution, which works well for an overview of a large timeseries. When you have years of data but typically study shorter intervals such as a day or an hour, the \"minimap\" feature offers an effective way to explore the data across timescales."
]
},
{
"cell_type": "markdown",
"id": "expired-gallery",
"metadata": {},
"source": [
"## Minimap\n",
"## Minimap Contextualizing\n",
"\n",
"The LTTB and Datashader options are about rendering or omitting datapoints when showing a large time range that would include many data points. What if you have years of data, but the timescale involved is such that you typically study a single day or a single hour? In that case the new \"minimap\" approach can help you ensure that you see the larger context while actually plotting only the smaller time range.\n",
"### Minimap Overview\n",
"Minimap introduces a way to visualize and navigate through extensive time ranges in your dataset. It allows you to maintain awareness of the larger context while focusing on a specific, smaller time range. This technique is particularly useful when dealing with timeseries data that span long durations but require detailed study of shorter intervals.\n",
"\n",
"A minimap is added using the HoloViews RangeToolLink:"
"### Implementing Minimap\n",
"To create a minimap, we use the HoloViews `RangeToolLink`, which links a main plot to a smaller overview plot. The smaller minimap plot provides a fixed, broad view of the data, and the main plot can be used for detailed examination. Note that we use Datashader rasterization for the minimap plot. Here's how you can implement it:"
]
},
{
@@ -256,13 +257,23 @@
"source": [
"from holoviews.plotting.links import RangeToolLink\n",
"\n",
"# Does not yet work with downsample1d. For now, to make it easier on the browser, let's just take a subset of the data\n",
"downsampled_df = df.iloc[::10]\n",
"plot = df0.hvplot(x=\"time\", y=\"value\", height=500, downsample=True).opts(\n",
" backend_opts={\n",
" \"x_range.bounds\": (df0.time.min(), df0.time.max()), # x-extent\n",
" \"y_range.bounds\": (df0.value.min(), df0.value.max()), # y-extent\n",
" }\n",
")\n",
"\n",
"plot = df0.hvplot(x=\"time\", y=\"value\", height=500)\n",
"minimap = df0.hvplot(x=\"time\", y=\"value\", height=150).opts(ylabel='', xlabel='')\n",
"minimap = df0.hvplot(x=\"time\", y=\"value\", height=150, rasterize=True).opts(\n",
" ylabel='',\n",
" xlabel='',\n",
" toolbar='disable', # disable zoom and pan on minimap\n",
")\n",
"\n",
"link = RangeToolLink(minimap, plot, axes=[\"x\", \"y\"], boundsx=(None, pd.Timestamp(\"2022-02-01\")), boundsy=(-5, 5))\n",
"link = RangeToolLink(minimap, plot, axes=[\"x\", \"y\"],\n",
" boundsx=(None, pd.Timestamp(\"2022-02-01\")), # initial x-bounded box\n",
" boundsy=(-5, 5) # initial y-bounded box\n",
" )\n",
"\n",
"(plot + minimap).opts(shared_axes=False).cols(1)"
]
@@ -272,7 +283,17 @@
"id": "lesser-magazine",
"metadata": {},
"source": [
"Here, you can drag the grey box on the bottom plot and the top plot will update to show that range of the data, letting you explore a large dataset while plotting only a short stretch at a time."
"In this setup, you can interact with the minimap by dragging the grey selection box. The main plot above will update to reflect the selected range, allowing you to explore extensive datasets while focusing on specific segments.\n",
"\n",
"We also demonstrate the use of `backend_opts` to configure properties of the Bokeh plotting library that are not exposed in HoloViews/hvPlot. Setting hard outer limits on the plot's panning and zooming keeps the view within the data's range, so you cannot pan or zoom into empty space."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Future Improvements\n",
"Looking ahead, there are open issues to address. One key area is Datashader inspection: adding rich hover tooltips to Datashader images would make detailed information available at a glance. We continue to develop these features to offer more intuitive ways of interacting with large timeseries datasets."
]
}
],
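The effect of the hard outer limits set through `x_range.bounds` and `y_range.bounds` in the minimap cell can be pictured as clamping any requested view to the data's extent. The sketch below is a conceptual model under that assumption, not Bokeh's actual range-handling code; `clamp_view` is a hypothetical helper.

```python
# Illustrative only: a conceptual model of hard pan/zoom limits.
# Bokeh applies its own logic in the browser; this is not that code.

def clamp_view(start, end, bounds):
    """Shift a requested (start, end) view so it stays inside bounds, shrinking if needed."""
    low, high = bounds
    # A view wider than the bounds is shrunk to the full extent.
    span = min(end - start, high - low)
    # Otherwise the view keeps its width and slides back inside the bounds.
    start = max(low, min(start, high - span))
    return start, start + span


panned_left = clamp_view(-10, 20, bounds=(0, 100))    # pan past the left edge
panned_right = clamp_view(90, 130, bounds=(0, 100))   # pan past the right edge
zoomed_out = clamp_view(-50, 200, bounds=(0, 100))    # zoom out beyond the data
```

In each case the resulting view stays within the bounds, which is the behavior the bounded main plot is meant to give when panning or zooming.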