build: updated version to 1.6.1rc1
kirangadhave committed May 22, 2024
1 parent 669b366 commit a50b32a
Showing 5 changed files with 118 additions and 114 deletions.
98 changes: 63 additions & 35 deletions README.md
@@ -1,4 +1,5 @@
# Persist

## Persistent and Reusable Interactions in Computational Notebooks

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/visdesignlab/persist/main?urlpath=lab)
@@ -11,33 +12,10 @@ https://github.com/visdesignlab/persist/assets/14944083/c6a9347b-7c93-4d0d-9e60-

[Watch on Youtube with CC](https://www.youtube.com/watch?v=DXHXPvRHN9I)

### Publication
Persist is developed as part of a [publication](https://osf.io/preprints/osf/9x8eq) and will appear in EuroVis 2024.

![Teaser image from the pre-print. The figure describes the workflow showing high level working of Persist technique.](public/imgs/teaser.png)

### Supplementary Material
Supplementary material including example notebooks, walkthrough notebooks, notebooks used in the study (including participant notebooks) and the analysis notebooks can be accessed [here](https://github.com/visdesignlab/persist_examples).

#### Abstract
> Computational notebooks, such as Jupyter, support rich data visualization. However, even when visualizations in notebooks are interactive, they still are a dead end: Interactive data manipulations, such as selections, applying labels, filters, categorizations, or fixes to column or cell values, could be efficiently applied in interactive visual components, but interactive components typically cannot manipulate Python data structures. Furthermore, actions performed in interactive plots are volatile, i.e., they are lost as soon as the cell is re-run, prohibiting reusability and reproducibility. To remedy this, we introduce Persist, a family of techniques to capture and apply interaction provenance to enable persistence of interactions. When interactions manipulate data, we make the transformed data available in dataframes that can be accessed in downstream code cells. We implement our approach as a JupyterLab extension that supports tracking interactions in Vega-Altair plots and in a data table view. Persist can re-execute the interaction provenance when a notebook or a cell is re-executed enabling reproducibility and re-use.
>
> We evaluated Persist in a user study targeting data manipulations with 11 participants skilled in Python and Pandas, comparing it to traditional code-based approaches. Participants were consistently faster with Persist, were able to correctly complete more tasks, and expressed a strong preference for Persist.

### Persist and Vega-Altair charts

Persist works with Vega-Altair charts directly for the most part. Vega-Altair and Vega-Lite offer multiple ways to write a specification. However, Persist has certain requirements that need to be fulfilled.

- The selection parameters in the chart should be named. By default, Vega-Altair generates a name for each selection parameter with an auto-incremented numeric suffix, and that suffix keeps incrementing on subsequent re-executions of the cell. Persist relies on consistent names to replay the interactions, so passing the `name` parameter explicitly allows Persist to work reliably.

- The point selections should have at least the `fields` attribute specified. Vega-Altair supports selections without fields by using auto-generated indices to define selections. The indices are generated from the default order of rows in the source dataset. Using the indices directly for selection can cause Persist to operate on incorrect rows if the source dataset order changes.

- Dealing with datetime in Pandas is challenging. To standardize the way datetime conversion takes place within Vega-Lite and within Pandas when using Vega-Altair, the TimeUnit transforms and encodings must be specified in UTC, e.g., `month(Date)` should be `utcmonth(Date)`.

## Getting Started

### Requirements

## Requirements
```markdown
- JupyterLab >= 4.0.0 or Jupyter Notebook >= 7.0.0
- pandas >= 0.25
@@ -46,50 +24,88 @@ Persist works with Vega-Altair charts directly for the most part. Vega-Altair an
- anywidget
```

## Install
### Install

To install the extension, execute:

```bash
pip install persist_ext
```
If the Jupyter server is running, you might have to reload the browser page and restart the kernel.

## Getting Started
If the Jupyter server was already running, you might have to reload the browser page and restart the kernel.

TODO:
* describe a simple example to use persist.
* link to a notebook that introduces persist and altair
* link to the documentation

## Uninstall
### Uninstall

To remove the extension, execute:

```bash
pip uninstall persist_ext
```

### Example

After installing the extension, you can use the following code snippet to create a Persist-enabled interactive data table.

```bash

```

TODO:

- describe a simple example to use persist.
- link to a notebook that introduces persist and altair
- link to the documentation

### Persist and Vega-Altair charts

Persist works with Vega-Altair charts directly for the most part. Vega-Altair and Vega-Lite offer multiple ways to write a specification. However, Persist has certain requirements that need to be fulfilled.

- The selection parameters in the chart should be named. By default, Vega-Altair generates a name for each selection parameter with an auto-incremented numeric suffix, and that suffix keeps incrementing on subsequent re-executions of the cell. Persist relies on consistent names to replay the interactions, so passing the `name` parameter explicitly allows Persist to work reliably.

- The point selections should have at least the `fields` attribute specified. Vega-Altair supports selections without fields by using auto-generated indices to define selections. The indices are generated from the default order of rows in the source dataset. Using the indices directly for selection can cause Persist to operate on incorrect rows if the source dataset order changes.

- Dealing with datetime in Pandas is challenging. To standardize the way datetime conversion takes place within Vega-Lite and within Pandas when using Vega-Altair, the TimeUnit transforms and encodings must be specified in UTC, e.g., `month(Date)` should be `utcmonth(Date)`.
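
The reasoning behind the three requirements above can be illustrated with a small standard-library sketch. It does not use Vega-Altair or Persist: `selection_name`, the `param_N` naming scheme, and the three-row dataset are hypothetical stand-ins chosen only to show why stable names, field-based selections, and UTC time units make a recorded interaction replayable.

```python
from datetime import datetime, timedelta, timezone
from itertools import count

# 1. Auto-numbered names drift between runs; explicit names stay stable.
_auto_id = count(1)  # hypothetical stand-in for a library-wide name counter


def selection_name(name=None):
    """Return the given name, or an auto-numbered one when none is given."""
    return name if name is not None else f"param_{next(_auto_id)}"


first_run = selection_name()    # auto-generated on the first execution
second_run = selection_name()   # differs on re-execution, so a replay keyed on first_run misses
named_runs = [selection_name("brush"), selection_name("brush")]  # identical every time

# 2. A selection stored as a row index breaks when rows are reordered;
#    a selection stored as field values does not.
rows = [
    {"Name": "a", "Origin": "USA"},
    {"Name": "b", "Origin": "Japan"},
    {"Name": "c", "Origin": "Europe"},
]
selected_index = 0                    # "the first row", i.e. the USA car
selected_fields = {"Origin": "USA"}   # the same selection, stored by value

reordered = sorted(rows, key=lambda row: row["Origin"])  # Europe, Japan, USA
by_index = reordered[selected_index]
by_fields = [
    row for row in reordered
    if all(row[key] == value for key, value in selected_fields.items())
]

# 3. The month of a timestamp depends on the time zone used to read it.
local = datetime(2024, 1, 31, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
local_month = local.month
utc_month = local.astimezone(timezone.utc).month

print(first_run, second_run, named_runs)
print(by_index["Name"], [row["Name"] for row in by_fields])
print(local_month, utc_month)
```

After reordering, the index-based lookup returns a different car than the one originally selected, while the field-based lookup still finds it. In the same way, the timestamp falls in January when read in its local offset and in February when read in UTC, which is why a single convention is needed on both the Vega-Lite and the Pandas side.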

### Publication

Persist is developed as part of a [publication](https://osf.io/preprints/osf/9x8eq) and will appear in EuroVis 2024.

![Teaser image from the pre-print. The figure describes the workflow showing high level working of Persist technique.](public/imgs/teaser.png)

### Supplementary Material

Supplementary material including example notebooks, walkthrough notebooks, notebooks used in the study (including participant notebooks) and the analysis notebooks can be accessed [here](https://github.com/visdesignlab/persist_examples).

#### Abstract

> Computational notebooks, such as Jupyter, support rich data visualization. However, even when visualizations in notebooks are interactive, they still are a dead end: Interactive data manipulations, such as selections, applying labels, filters, categorizations, or fixes to column or cell values, could be efficiently applied in interactive visual components, but interactive components typically cannot manipulate Python data structures. Furthermore, actions performed in interactive plots are volatile, i.e., they are lost as soon as the cell is re-run, prohibiting reusability and reproducibility. To remedy this, we introduce Persist, a family of techniques to capture and apply interaction provenance to enable persistence of interactions. When interactions manipulate data, we make the transformed data available in dataframes that can be accessed in downstream code cells. We implement our approach as a JupyterLab extension that supports tracking interactions in Vega-Altair plots and in a data table view. Persist can re-execute the interaction provenance when a notebook or a cell is re-executed enabling reproducibility and re-use.
>
> We evaluated Persist in a user study targeting data manipulations with 11 participants skilled in Python and Pandas, comparing it to traditional code-based approaches. Participants were consistently faster with Persist, were able to correctly complete more tasks, and expressed a strong preference for Persist.
## Contributing

Persist uses [hatch](https://hatch.pypa.io/latest/) to manage the development, build, and publish workflows. You can install `hatch` using `pipx`, `pip`, or Homebrew (on macOS or Linux).

##### **pipx**

Install `hatch` globally in an isolated environment. This is the recommended method.

```bash
pipx install hatch
```

##### **pip**

Install hatch in the current Python environment.

_**WARNING**_: This may change the system Python installation.

```bash
pip install hatch
```

##### **Homebrew**

```bash
brew install hatch
```
@@ -100,38 +116,47 @@ After installing `hatch` with your preferred method follow instructions below fo
### Development

Run the `setup` script from `package.json`:

```bash
hatch run jlpm setup
```

When setup is complete, open three terminal windows and run one of the following commands in each.

#### Widgets

Set up the Vite dev server to build the widgets:

```bash
hatch run watch_widgets
```

#### Extension

Start the dev server to watch and build the extension:

```bash
hatch run watch_extension
```

#### Lab

Run the JupyterLab server with the `minimize` flag set to `false`, which gives better stack traces and a better debugging experience.

```bash
hatch run run_lab
```

### Build

To build the extension as a standalone Python package, run:

```bash
hatch run build_extension
```


### Publish

To publish the extension, first set the version. Run any of the following:

```bash
@@ -141,18 +166,21 @@ hatch version major # 1.x.x
```

You can also append a release candidate label:

```bash
hatch version rc
```

Finally, you can directly specify the exact version:

```bash
hatch version "1.3.0"
```

Once the proper version is set, build the extension using the `build` workflow.

When the build is successful, you can publish the extension if you have proper authorization:

```bash
hatch publish
```
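
The version commands in the Publish section can be modeled with a short sketch. This is a simplified illustration and not hatch's implementation: the `bump` helper is hypothetical, it only understands `major.minor.patch` with an optional `rcN` suffix, and it assumes that a bump resets the lower segments and that the first release candidate is numbered `rc0`.

```python
import re

# Accepts versions such as "1.6.0" or "1.6.1rc1" (PEP 440 style rc suffix).
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?")


def bump(version: str, segment: str) -> str:
    """Return `version` with the requested segment bumped (hypothetical helper)."""
    match = _VERSION.fullmatch(version)
    if match is None:
        raise ValueError(f"unsupported version: {version!r}")
    major, minor, patch = (int(part) for part in match.group(1, 2, 3))
    rc = match.group(4)

    if segment == "major":
        return f"{major + 1}.0.0"          # lower segments reset to zero
    if segment == "minor":
        return f"{major}.{minor + 1}.0"
    if segment == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if segment == "rc":
        # Assumption: start at rc0, then count upwards on repeated bumps.
        next_rc = 0 if rc is None else int(rc) + 1
        return f"{major}.{minor}.{patch}rc{next_rc}"
    raise ValueError(f"unknown segment: {segment!r}")


print(bump("1.6.0", "patch"))
print(bump("1.6.1rc0", "rc"))
```

Under these assumptions, going from `1.6.0` to the `1.6.1rc1` of this commit would take a patch bump followed by two release candidate bumps. Specifying the exact version directly, as shown above, avoids depending on how the label is numbered.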
46 changes: 46 additions & 0 deletions examples/Getting Started.ipynb
@@ -0,0 +1,46 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 2,
"id": "4d39d0b5-5fe4-4bb9-8e55-83f5deb7f471",
"metadata": {},
"outputs": [],
"source": [
"from vega_datasets import data # Load vega_datasets\n",
"import persist_ext as PR # Load Persist Extension\n",
"\n",
"cars_df = data.cars() # Get the cars dataset as Pandas dataframe\n",
"\n",
"PR.PersistTable(cars_df) # Display cars dataset as a "
]
}
],
"metadata": {
"__persist_keys_record": [
"__GENERATED_DATAFRAMES__",
"__persist_nb_uuid__",
"trrack_graph"
],
"__persist_nb_uuid__": "71bbd9f0-dd59-46c9-9e0f-7194b6f50588",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
74 changes: 0 additions & 74 deletions examples/Tutorial.ipynb

This file was deleted.

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "persist_ext",
"version": "1.6.0",
"version": "1.6.1-rc1",
"description": "PersIst is a JupyterLab extension to enable persistent interactive visualizations in JupyterLab notebooks.",
"keywords": [
"jupyter",
12 changes: 8 additions & 4 deletions pyproject.toml
@@ -23,17 +23,15 @@ classifiers = [
]
dependencies = [
"altair>=5",
"jupyterlab==4.0.4",
"jupyterlab>=4",
"pandas>=0.25",
"ipywidgets",
"anywidget",
"ipywidgets",
"lzstring",
"traittypes==0.2.1",
"pyarrow",
"fastparquet",
"scikit-learn",
"paretoset",
"vega-datasets"
]
dynamic = ["version", "description", "authors", "urls", "keywords"]

@@ -106,3 +104,9 @@ run_lab = ["hatch run jlpm dev:lab"]
watch_extension = ["hatch run jlpm dev:ext"]
build_widgets = ["hatch run node build_all.mjs"]
build_extension = ["hatch run build_widgets && hatch run python -m build"]

[tool.hatch.envs.published]
dependencies = ["persist_ext==1.6.1rc1"]

[tool.hatch.envs.published.scripts]
run_lab = ["jupyter lab"]
