Moved documentation around to have examples. Added blog info.
Showing 24 changed files with 320 additions and 154 deletions.
54 changes: 54 additions & 0 deletions
docs-website/docs/examples/composite-vega-altair-charts.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
--- | ||
--- | ||
import ReactPlayer from 'react-player' | ||
|
||
|
||
# Composite Vega-Altair Charts | ||
|
||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/visdesignlab/persist/HEAD?labpath=examples%2Fgetting_started_composite_vega_altair_charts.ipynb) | ||
|
||
Persist also supports composite Vega-Altair charts. | ||
|
||
```python | ||
from vega_datasets import data # Load vega_datasets | ||
import altair as alt | ||
import persist_ext as PR # Load Persist Extension | ||
|
||
movies_df = data.movies() # Get the movies dataset as Pandas dataframe | ||
|
||
pts = alt.selection_point(name="selection", fields=["Major_Genre"]) | ||
|
||
rect = alt.Chart().mark_rect().encode( | ||
alt.X('IMDB_Rating:Q').bin(), | ||
alt.Y('Rotten_Tomatoes_Rating:Q').bin(), | ||
alt.Color('count()').scale(scheme='greenblue').title('Total Records') | ||
) | ||
|
||
circ = rect.mark_point().encode( | ||
alt.ColorValue('grey'), | ||
alt.Size('count()').title('Records in Selection') | ||
).transform_filter( | ||
pts | ||
) | ||
|
||
bar = alt.Chart(width=550, height=200).mark_bar().encode( | ||
x='Major_Genre:N', | ||
y='count()', | ||
color=alt.condition(pts, alt.ColorValue("steelblue"), alt.ColorValue("grey")) | ||
).add_params(pts) | ||
|
||
chart = alt.vconcat( | ||
rect + circ, | ||
bar | ||
).resolve_legend( | ||
color="independent", | ||
size="independent", | ||
) | ||
|
||
PR.PersistChart(chart, data=movies_df) | ||
``` | ||
|
||
## Video Tutorial | ||
|
||
<ReactPlayer playing controls url='https://github.com/visdesignlab/persist/assets/14944083/2808e722-f908-4cf9-8f66-5f2d90c5460d | ||
' /> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
--- | ||
--- | ||
|
||
import ReactPlayer from 'react-player' | ||
|
||
# Visualizing dataframe with `plot` module | ||
|
||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/visdesignlab/persist/HEAD?labpath=examples%2Fgetting_started_plots_module.ipynb) | ||
|
||
Persist has a plotting module to create an interactive scatterplot or bar chart quickly. This module is a thin wrapper around Vega-Altair. | ||
|
||
To create a scatterplot: | ||
|
||
```python | ||
from vega_datasets import data # Load vega_datasets | ||
import persist_ext as PR # Load Persist Extension | ||
|
||
cars_df = data.cars() # Get the cars dataset as Pandas dataframe | ||
|
||
PR.plot.scatterplot(data=cars_df, x="Miles_per_Gallon:Q", y="Weight_in_lbs:Q", color="Origin:N") | ||
``` | ||
|
||
## Video Tutorial - Scatterplot | ||
|
||
|
||
<ReactPlayer playing controls url='https://github.com/visdesignlab/persist/assets/14944083/fd75be32-ab2a-425e-8bce-f60c99baebbc | ||
' /> | ||
|
||
To create a barchart: | ||
|
||
```python | ||
from vega_datasets import data # Load vega_datasets | ||
import persist_ext as PR # Load Persist Extension | ||
|
||
cars_df = data.cars() # Get the cars dataset as Pandas dataframe | ||
|
||
PR.plot.barchart(data=cars_df, x="Cylinders:N", y="count()") | ||
``` | ||
|
||
## Video Tutorial - BarChart | ||
|
||
|
||
<ReactPlayer playing controls url='https://github.com/visdesignlab/persist/assets/14944083/16d3be4c-9511-42ed-84ae-d4e65097a5b9' /> |
35 changes: 35 additions & 0 deletions
docs-website/docs/examples/interactive-vega-altair-charts.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
--- | ||
--- | ||
import ReactPlayer from 'react-player' | ||
|
||
|
||
# Interactive Vega-Altair charts | ||
|
||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/visdesignlab/persist/HEAD?labpath=examples%2Fgetting_started_vega_altair.ipynb) | ||
|
||
You can also use Vega-Altair charts directly by passing the chart object to the `PersistChart` function. | ||
|
||
```python | ||
from vega_datasets import data # Load vega_datasets | ||
import altair as alt | ||
import persist_ext as PR # Load Persist Extension | ||
|
||
cars_df = data.cars() # Get the cars dataset as Pandas dataframe | ||
|
||
brush = alt.selection_interval(name="selection") | ||
|
||
chart = alt.Chart().mark_point().encode( | ||
x="Weight_in_lbs:Q", | ||
y="Miles_per_Gallon:Q", | ||
color=alt.condition(brush, "Origin:N", alt.value("lightgray")) | ||
).add_params( | ||
brush | ||
) | ||
|
||
PR.PersistChart(chart, data=cars_df) | ||
``` | ||
|
||
## Video Tutorial | ||
|
||
<ReactPlayer playing controls url='https://github.com/visdesignlab/persist/assets/14944083/fadd5e6a-d6b6-4513-a94c-43b54ad4d047 | ||
' /> |
22 changes: 22 additions & 0 deletions
docs-website/docs/examples/visualize-dataframe-in-table.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
--- | ||
--- | ||
import ReactPlayer from 'react-player' | ||
|
||
# Visualize dataframe in an interactive data table | ||
|
||
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/visdesignlab/persist/HEAD?labpath=examples%2Fgetting_started_interactive_data_table.ipynb) | ||
|
||
You can use the following code snippet to create a Persist-enabled interactive data table. | ||
|
||
```python | ||
from vega_datasets import data # Load vega_datasets | ||
import persist_ext as PR # Load Persist Extension | ||
|
||
cars_df = data.cars() # Get the cars dataset as Pandas dataframe | ||
|
||
PR.PersistTable(cars_df) # Display cars dataset with interactive table | ||
``` | ||
|
||
## Video Tutorial | ||
|
||
<ReactPlayer playing controls url='https://github.com/visdesignlab/persist/assets/14944083/eb174d57-55f3-4ee9-8b5d-189ad8746c26' /> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
--- | ||
|
||
To get started with Persist, proceed to the [installation](installation) section. If you'd rather try Persist without installing it locally, the [quickstart tutorial](simple-tutorial) has Binder links you can use to follow along. | ||
|
||
|
||
import DocCardList from '@theme/DocCardList'; | ||
|
||
<DocCardList /> |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,106 @@ | ||
--- | ||
sidebar_position: 2 | ||
--- | ||
import ReactPlayer from 'react-player' | ||
import ContextSensitiveImage from '../../src/components/ContextSensitiveImage.tsx' | ||
|
||
# Quickstart Tutorial | ||
|
||
|
||
## Analysis with Persist | ||
|
||
It’s easiest to see how Persist works by following an analysis. We’ll look at [avalanches in the Utah mountains](https://utahavalanchecenter.org/). You can follow along using this [Binder instance](https://mybinder.org/v2/gh/visdesignlab/persist/HEAD?labpath=examples%2Fblog.ipynb). _The Binder instance might take a few minutes to start the first time._ You can also [download the notebook](https://raw.githubusercontent.com/visdesignlab/persist/main/examples/blog.ipynb) and run it in a local JupyterLab instance. Follow the instructions [here](https://vdl.sci.utah.edu/persist/docs/installation) to set up a local JupyterLab with the Persist extension. | ||
|
||
_The notebook uses Vega-Altair to create interactive visualizations and assumes some familiarity with Vega-Altair, Vega-Lite, and the declarative approach to creating visualizations. You can refer to their [getting started](https://altair-viz.github.io/getting_started/overview.html) guide for a quick introduction. You won’t have to write any Vega-Altair code to follow the tutorial._ | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_2.png' className='docs-image'/> | ||
|
||
|
||
After loading the data, we examine the data in an interactive data table using the following code. | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_3.png' className='docs-image'/> | ||
|
||
### Working with Columns | ||
|
||
We notice the dataset contains artifacts like leading semicolons in some column names. We can double-click the column header to edit the name and delete the semicolons from all four columns. | ||
<ContextSensitiveImage src='img/tutorial_images/step_4.png' className='docs-image'/> | ||
|
||
All our operations are tracked in a provenance graph on the right side. If we make a mistake, we can click on the previous step and fix it. | ||
|
||
Next, we will delete the coordinates and comments columns since we will not perform any location or text-based analysis. | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_5.png' className='docs-image'/> | ||
|
||
|
||
### Changing a Column’s Data Type | ||
We can hover over the column headers to see the data type. The `Depth_inches` column has the data type `string` instead of `float`. We want the `Depth_inches` to be a `float` column so we can plot it later. We also see that row 7 has a trailing inches symbol, `”`, which is the cause of the incorrect data type. | ||
|
||
We can use the search box on the top left of the table to find all instances of the trailing symbol. We can double-click the cell to edit it and remove the symbol. Using the menu in the column header, we will change the column's data type to float. | ||
|
||
### Extracting a Dataframe | ||
We can click the “insert dataframe” button in the dataframe manager at the bottom of the table to insert a cell with a pandas dataframe called `av_ut1`. This dataframe has the changes we made in the table applied: the column names are corrected, two columns are removed, and the datatype of `Depth_inches` is numerical. | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_6.png' className='docs-image'/> | ||
|
||
#### Equivalent pandas code | ||
For reference, here is the equivalent pandas code for making these changes to the dataframe: | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_7.png' className='docs-image'/> | ||
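
As a text version of the three cleaning steps above, here is a minimal sketch on a tiny made-up dataframe. Only `Depth_inches` is a column name taken from the tutorial; the other column names, the values, and the `av_clean` variable are assumptions for illustration, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical miniature of the avalanche data (made-up columns and values).
raw = pd.DataFrame({
    ";Region": ["Salt Lake", "Ogden", "Provo"],
    ";Trigger": ["Skier", "Natural", "Snowboarder"],
    "Depth_inches": ["12", "18”", "24"],
    "Coordinates": ["40.6, -111.6", "41.2, -111.9", "40.3, -111.6"],
    "Comments": ["", "wind slab", ""],
})

# 1. Remove the leading semicolons from the column names.
av_clean = raw.rename(columns=lambda name: name.lstrip(";"))

# 2. Drop the columns we will not analyze.
av_clean = av_clean.drop(columns=["Coordinates", "Comments"])

# 3. Strip the trailing inches symbol, then convert the column to float.
av_clean["Depth_inches"] = av_clean["Depth_inches"].str.rstrip('"”').astype(float)
```

Each step returns a new dataframe, so `raw` stays untouched, similar to how Persist leaves the source data alone and exposes the result as a new variable.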
|
||
|
||
|
||
|
||
### Filtering Data in a Visualization | ||
Next, we take a look at how to interactively manipulate data in visualizations. | ||
|
||
Using the following code, we will create an interactive scatterplot of `Elevation_feet` vs. `Depth_inches` using the plot module (a shorthand for common Vega-Altair plots) and our new dataframe. | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_8.png' className='docs-image'/> | ||
|
||
If we look at this plot carefully, we can see that it shows avalanches occurring at elevations outside the possible range for Utah (Utah’s lowest point is at about 2,200 feet; its highest is at 13,528 feet), indicating that these entries are unreliable. We can select these points using a brush and remove them from the dataset. | ||
|
||
We can again access the resulting dataframe from the dataframe manager. | ||
|
||
#### Equivalent pandas code | ||
Again, here’s the equivalent pandas code: | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_9.png' className='docs-image'/> | ||
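
A brush selection followed by a filter corresponds roughly to a boolean mask in pandas. The sketch below uses made-up rows and assumes the elevation bounds quoted above as cut-offs; in the tutorial the points are picked interactively, so the exact thresholds are an assumption.

```python
import pandas as pd

# Made-up rows; only the column names come from the tutorial.
av = pd.DataFrame({
    "Elevation_feet": [9800.0, 150.0, 11200.0, 25000.0],
    "Depth_inches": [12.0, 8.0, 30.0, 18.0],
})

# Utah's approximate elevation range as quoted in the text (assumed cut-offs).
UTAH_MIN_FT, UTAH_MAX_FT = 2200, 13528

# Keep only the rows whose elevation is plausible for Utah.
in_range = av["Elevation_feet"].between(UTAH_MIN_FT, UTAH_MAX_FT)
av_filtered = av[in_range].reset_index(drop=True)
```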
|
||
### Removing old data | ||
We will now look at the count of records aggregated by year. We will use the following code to create the plot. | ||
|
||
We see that before 2010, we don’t have many records. We will remove the records for those years from our analysis. We can again interactively select the data we want to remove and filter it out. | ||
|
||
#### Equivalent pandas code | ||
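
Removing the older records can be sketched in pandas as a filter on the year. The rows below are made up, and the `Date` column name is an assumption.

```python
import pandas as pd

# Made-up rows; the `Date` column name is an assumption.
av = pd.DataFrame({
    "Date": pd.to_datetime(["2004-01-15", "2009-12-30", "2010-02-01", "2018-03-12"]),
    "Depth_inches": [10.0, 14.0, 20.0, 16.0],
})

# Keep only records from 2010 onward.
av_recent = av[av["Date"].dt.year >= 2010].reset_index(drop=True)
```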
|
||
### Creating a New Category in a Custom Vega-Altair Chart | ||
Next, we'll add a new categorical classification to our dataset: types of avalanche activity vary over the snow season, so we classify the season into three phases: Start, Middle, End. Using the following code, we will create a Vega-Altair bar chart with data aggregated by month and make it persistent: | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_11.png' className='docs-image'/> | ||
|
||
We will first create a new category, `Av_Season`, and add the three options using the new category popup. Next, we interactively select the months and assign them the appropriate phase. | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_12.png' className='docs-image'/> | ||
Notice that the season is now part of our dataset, and we could facet our dataset based on the season for further analysis. | ||
|
||
|
||
#### Equivalent pandas code | ||
|
||
<ContextSensitiveImage src='img/tutorial_images/step_13.png' className='docs-image'/> | ||
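
In text form, assigning a category to selected months amounts to mapping each month to a label. The sketch below uses made-up dates, and the month-to-phase assignment is hypothetical; in the tutorial this choice is made interactively.

```python
import pandas as pd

# Made-up dates; the `Date` column name is an assumption.
av = pd.DataFrame({
    "Date": pd.to_datetime(["2019-11-20", "2020-01-05", "2020-02-14", "2020-04-02"]),
})

# Hypothetical assignment of calendar months to phases of the snow season.
season_by_month = {
    10: "Start", 11: "Start", 12: "Start",
    1: "Middle", 2: "Middle",
    3: "End", 4: "End", 5: "End",
}

# Add the new categorical column based on each record's month.
av["Av_Season"] = av["Date"].dt.month.map(season_by_month)
```

Months missing from the mapping would get a missing value, which mirrors records that were never assigned a phase.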
|
||
## The Persist Technique | ||
|
||
Persist leverages the concept of interaction provenance as a shared abstraction between code and interactions within a notebook. Interaction provenance records all interactions leading to a particular point in the analysis. Each interaction is captured in the output of a code cell and documented in a provenance graph. This graph tracks the interactive analysis in real-time, supports navigation through the history, and allows branching off to explore alternative analysis paths. | ||
|
||
Interactions recorded in the provenance graph are translated into data operations, updating the underlying dataframe. This updated dataframe is then used to refresh the output and is available as a new variable for further analysis. | ||
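
To make the idea concrete, here is a toy sketch of a provenance graph whose nodes carry data operations that are replayed onto a source dataframe. It is only an illustration of the concept; the `Node` class and `replay` function are hypothetical and are not Persist's implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

import pandas as pd


@dataclass(eq=False)
class Node:
    """One recorded interaction and the data operation it maps to."""
    label: str
    operation: Callable[[pd.DataFrame], pd.DataFrame]
    parent: Optional["Node"] = None
    children: list = field(default_factory=list)

    def add(self, label, operation):
        """Record a new interaction as a child of this node."""
        child = Node(label, operation, parent=self)
        self.children.append(child)
        return child


def replay(node, source):
    """Apply every operation on the path from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    df = source
    for step in reversed(path):
        df = step.operation(df)
    return df


source = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})
root = Node("root", lambda df: df)
filtered = root.add("filter x > 1", lambda df: df[df["x"] > 1])
renamed = filtered.add("rename y", lambda df: df.rename(columns={"y": "value"}))
branch = filtered.add("drop y", lambda df: df.drop(columns=["y"]))  # alternative path
```

Calling `replay(renamed, source)` and `replay(branch, source)` yields two different dataframes from the same history prefix, which is the kind of branching the provenance graph supports; `source` itself is never modified.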
|
||
|
||
## Caveats on using Vega-Altair and Persist | ||
|
||
Persist works with Vega-Altair charts directly for the most part. Vega-Altair and Vega-Lite offer multiple ways to write a specification. However, Persist has certain requirements that need to be fulfilled. | ||
|
||
- The selection parameters in the chart should be named. By default, Vega-Altair generates a selection parameter name with an auto-incremented numeric suffix, and that suffix keeps incrementing each time the cell is re-executed. Persist relies on consistent names to replay interactions, so passing the `name` parameter explicitly allows Persist to work reliably. | ||
|
||
- Point selections should specify at least the `fields` attribute. Vega-Altair supports selections without fields by using auto-generated indices to define them. The indices are generated from the default row order of the source dataset, so using them directly for selection can cause Persist to operate on incorrect rows if the order of the source dataset changes. | ||
|
||
- Dealing with datetime in Pandas is challenging. To standardize datetime conversion between Vega-Lite and Pandas when using Vega-Altair, TimeUnit transforms and encodings must be specified in UTC, e.g., `month(Date)` should be `utcmonth(Date)`. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Introduction | ||
|
||
Computational notebooks are a modern realization of Donald Knuth’s vision of literate programming. These notebooks allow us to seamlessly mix code, visualizations, figures, and text to analyze data and narrate the analysis. The most popular notebooks are Jupyter notebooks. | ||
|
||
Jupyter supports interactive outputs like Vega-Altair charts and Jupyter Widgets in addition to text, static plots, and tables. Code-based analysis in notebooks can be re-run, and the results of one cell can be used in another, making the analysis reproducible and reusable. In contrast, interactive analysis in notebooks presents significant challenges concerning reproducibility and reusability. | ||
|
||
## Visualizations in Notebooks are a Dead End! | ||
|
||
Until now, there has been a significant disconnect between code and the interactive outputs of notebooks. While code can generate interactive visualizations (such as those created with Vega-Altair), **the results of these interactions cannot be accessed in code**. For instance, if a filter is applied in a visualization, analysts must write additional code to replicate the filter if they want to use it later in their notebook. This limitation vastly reduces the usefulness of interactions within visualizations. | ||
|
||
Furthermore, there is a disparity between code and interactions in terms of persistence. Changes to the code are saved and persist across restarts and re-executions. Interactions, however, are transient and are lost when the notebook is restarted or the cell is re-executed. This lack of persistence makes visual analysis difficult to reproduce without the added effort of documenting each visual analysis step. | ||
|
||
|
||
|
||
|
||
Notice how a selection is lost when a cell is re-executed using standard interactive visualization tools in a notebook. | ||
|
||
|
||
## Persist makes Interactive Visualizations Useful in Notebooks. | ||
|
||
To address these challenges, we have developed Persist, a JupyterLab extension that captures interaction provenance, making interactions persistent and reusable. Persist bridges the gap between code and interactive visualizations, ensuring that all interactions are tracked, recorded, and can be reapplied automatically. |