JEP Feedback #1

captainsafia · 2020-07-12T22:38:51Z

Some general feedback about the JEP:

- The JEP is long and repetitive. I would make some edits to:
  - Use headings and bold statements to draw the reader's attention to the relevant points.
  - Leverage an appendix/endnotes to avoid repeating the same points throughout the document.
  - Include more references to prior art in the community and existing community projects.
- The JEP lacks a rigorous problem statement. The "Motivation for Investigating a New Format" section makes the case that interop with common Unix tools is a key part of the proposal but doesn't address why.
  - What about non-Unix desktop users or Jupyter notebook users who are low-code or no-code personas?
  - we would unlock level diffing, visualization, inline commenting and other common workflows more readily.

    It's not clear how visualization and inline commenting are relevant to Unix tooling here.
  - What other motivations are there for creating a new format? Considering the scope of the proposal, identifying other motivations is key.
- The JEP lacks a lot of technical detail about the implementation. After reading the document, I'm still not clear what the problems to be solved are and what the solution is. The "Not Yet Implemented" sections should be filled in and there should be answers (collective or individual) to the questions in the unresolved questions sections.
  - What components in the Jupyter ecosystem need to be changed to successfully execute this change?
  - What are the performance and security ramifications of the change?
  - What is the adoption story for the proposed changes?
- While the user scenario is helpful, I'd trim it down a bit and try to think of a simple end-to-end scenario that drives home the key points.
  - The details can be moved to an appendix/endnote to keep the JEP easier to understand.
- The table at the end with prior art is helpful but I'd simplify the headings to Project/Description/Pros/Cons. This makes it easier to identify the strengths and weaknesses of existing community projects from an engineering perspective.
  - What aspects of each project make it easy to use for users and easy to maintain for the open source community?
  - What are design improvements/challenges that each open source community is undertaking?
  - At which point in the Jupyter ecosystem does the project interface?

Some more specific feedback:

Open Standards for Interactive Computing

I'd remove this section altogether or trim it down to only the parts that are relevant to the JEP (e.g. the standardization and adoption of the JSON-based notebook format over the past decade plus).

Important Attributes of the Jupyter Format

I would make a distinction between notebook-level metadata and cell-level metadata under the "Metadata" section just so it's clear.

Another nit: The metadata property is extendable so notebook apps, extensions, and end-users can define their own metadata.
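
To illustrate the distinction, here is a minimal sketch of a v4 notebook written as a plain Python dict. It is only an illustration, not something the JEP specifies: `kernelspec` and `tags` are keys from the existing notebook format, while `my_extension` is a made-up example of metadata that an extension or end-user might define.

```python
import json

notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    # Notebook-level metadata: applies to the whole document.
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3"},
        "my_extension": {"theme": "dark"},  # hypothetical user-defined key
    },
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 1,
            "source": "print('hi')",
            # Cell-level metadata: applies only to this cell.
            "metadata": {"tags": ["demo"]},
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": "hi\n"}
            ],
        }
    ],
}

# The two kinds of metadata live at different levels of the document.
notebook_meta = notebook["metadata"]
cell_meta = [cell["metadata"] for cell in notebook["cells"]]

# The whole structure round-trips through JSON, which is what .ipynb stores.
serialized = json.dumps(notebook, indent=1)
```

Spelling out both levels like this in the "Metadata" section would make it clear which one each statement in the JEP refers to.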

Usually associated with a point-in time execution to capture the state of things during a notebook resolution.

I don't think this is totally true. A big part of notebooks is outputs as presentational or interactive elements. Usually a certain output is the "goal" of a notebook. For example, you might run a series of cells that in the end generate a meaningful visualization or a trained ML model. You might also have a notebook that contains interactive widgets.

In general, outputs are key artifacts of the notebook document.

In order to meet the needs of these scenarios, .ipynb files contains a fair bit of complexity for capturing the inputs, outputs, and metadata from a user.

What complexity is being referred to here? It would be helpful to list out some examples for those unfamiliar with this.

There are a significant number of data scientists that use text-based workflows though we do not have data on exactly how many.

This is the first time the phrase "text-based workflows" is used outside of the title I believe. It might help to add a little explainer like "Text-based workflows are interactions (editing, sending, using in a DevOps pipeline) with files that blah blah blah".

User Groups / Communities

To make this section more useful, I'd probably rephrase it in the context of the JEP. As in, here are the different types of notebook users and how they interact with the document format. As opposed to a general list of Jupyter-using communities.

While nbdime does provide an excellent solution for some, it unfortunately uses a non-standard mechanism for diffing that makes it difficult to integrate with most other common tools (e.g. diff and patch).

Can you clarify what the non-standard mechanism is here?

Shipping and visualizing patches

What does it mean to "ship" a patch in the context of a notebook? An example would be helpful here.

Commenting inline

How does this tie in to diffing/patching generally?

Manually inspecting raw notebook contents

What does "raw notebook contents" mean here? The JSON?

Amal goes to the first cell and enters a command saying c.save_options = 'git-friendly'. From this point forward, the file will be saved as a version friendly to line level diffing.

How does this work? I'm writing this feedback as I read the doc so maybe this is outlined later but having an explainer of what is going on under the hood here is very helpful.

the outputs are incomprehensible

What does this mean? When serialized or rendered? Some specific examples would be helpful.

This is compounded by the fact that the notebook outputs and metadata are _interwoven_ with the content (which is most likely what most users care about when they’re looking at a diff).

Outputs are content, and I've had experiences where I've needed to view the diff of the output (e.g. changes in hyperparameters from a code cell that searches the hyperparameter space for the best set).
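
A rough sketch of that kind of output diff, using only the standard library. The notebooks here are hypothetical dicts built by a helper (`make_notebook` is made up for the example), and I'm assuming stream output `text` is stored as a single string rather than a list of lines.

```python
import difflib


def stream_text(notebook):
    """Collect the lines of every stream output in the notebook."""
    lines = []
    for cell in notebook["cells"]:
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                lines.extend(output["text"].splitlines())
    return lines


def make_notebook(best_lr):
    # Hypothetical notebook: one code cell whose output reports the result
    # of a hyperparameter search. The source is identical in both versions.
    return {
        "cells": [
            {
                "cell_type": "code",
                "source": "search()",
                "outputs": [
                    {
                        "output_type": "stream",
                        "name": "stdout",
                        "text": f"best learning_rate: {best_lr}\n",
                    }
                ],
            }
        ]
    }


old = make_notebook(0.01)
new = make_notebook(0.001)

# Diff only the outputs; the code cells did not change between runs.
output_diff = list(
    difflib.unified_diff(
        stream_text(old), stream_text(new), "old", "new", lineterm=""
    )
)
print("\n".join(output_diff))
```

In this case the code is unchanged between the two versions, so the output is the only place the difference shows up.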

That way, the incomprehensible things (the outputs) would be at the bottom of any diff, and could either be filtered out or simply ignored more easily than they currently are, allowing the user to focus on the content sections of the file.

How do you associate inputs with outputs here? Do we assume that the outputs and inputs are always serialized in the same order? Is there some sort of ID reference?
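
To make the question concrete, here is one way an ID reference could work. This layout is purely hypothetical (the `id`, `cell_id`, and top-level `outputs` names are my own, not from the JEP): inputs come first, outputs sit at the end, and each output points back at the cell that produced it.

```python
# Hypothetical split layout: inputs first, outputs at the end of the file,
# linked by an explicit identifier instead of by position.
document = {
    "cells": [
        {"id": "a1", "source": "x = 1 + 1"},
        {"id": "b2", "source": "print(x)"},
    ],
    "outputs": [
        {"cell_id": "b2", "output_type": "stream", "text": "2\n"},
    ],
}


def outputs_by_cell(doc):
    """Group outputs by the id of the cell that produced them."""
    grouped = {cell["id"]: [] for cell in doc["cells"]}
    for output in doc["outputs"]:
        grouped[output["cell_id"]].append(output)
    return grouped


grouped = outputs_by_cell(document)

# Reordering the input cells does not break the association,
# which an order-based scheme could not guarantee.
document["cells"].reverse()
regrouped = outputs_by_cell(document)
```

If the proposal instead relies on inputs and outputs being serialized in the same order, it would help to say so explicitly, since edits made outside a Jupyter client could reorder or delete cells.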

A table of notebook formats and their features

I'd probably make the headings of this table "Project/Pros/Cons" so it's easier to read and the key aspects of each design aren't missed.

While rendering rich diffs visually is 'easy', most git workflows require things like comments, resolving conflicts, etc. This column is, ultimately, just opinions, but when described as 'git friendly' we would expect it to be reasonably possible to comment inline (in a persistent way) and resolve git conflicts logically.

This is a good clarification. I would move it to before the table.

IMO, "inline commenting" is hard to lump as a 'git feature' since that's a UI experience that some git clients provide and not a core primitive of the git version control system itself.

However, you could imagine many people editing the text file without a jupyter server (e.g. via a comment in github). That’s why the text file should always be the source of truth.

The same applies to the nteract desktop app, which serializes to ipynb from its in-memory format directly via filesystem IO.

Use Jupyter UI to activate this "pairing" so that it will automatically save an ipynb file to

Does this mean that Jupyter clients like Lab, classic Notebook, and nteract will have to implement this support on the front-end?

Perhaps there are ways that this tool could improve its functionality in order to more easily integrate into git-based workflows,

What are these possible improvements? And/or what are the problems that need to be addressed in existing tools?

cc: @aronchick
