Safia's notes:

  • The JEP is long and repetitive. I would make some edits to:

    • Use headings and bold statements to draw the reader's attention to the relevant points.
    • Leverage an appendix/endnotes to avoid repeating the same points throughout the document.
    • Include more references to prior art in the community and existing community projects.
  • The JEP lacks a rigorous problem statement. The "Motivation for Investigating a New Format" section makes the case that interop with common Unix tools is a key part of the proposal but doesn't address why.

    • What about non-Unix desktop users or Jupyter notebook users who are low-code or no-code personas?
    • "We would unlock line-level diffing, visualization, inline commenting and other common workflows more readily."
    • It's not clear how visualization and inline commenting are relevant to Unix tooling here.
    • What other motivations are there for creating a new format? Considering the scope of the proposal, identifying other motivations is key.
  • The JEP lacks a lot of technical detail about the implementation. After reading the document, I'm still not clear what the problems to be solved are and what the solution is. The "Not Yet Implemented" sections should be filled in and there should be answers (collective answers or individual) to the questions in the unresolved questions sections.

    • What components in the Jupyter ecosystem need to be changed to successfully execute this change?
    • What are the performance and security ramifications of the change?
    • What is the adoption story for the proposed changes?
  • While the user scenario is helpful, I'd trim it down a bit and try to think of a simple end-to-end scenario that drives the key points.

    • The details can be moved to an appendix/endnote to keep the JEP easier to understand.
  • The table at the end with prior art is helpful but I'd simplify the headings to Project/Description/Pros/Cons. This makes it easier to identify the strengths and weaknesses from an engineering perspective of existing community projects.

    • What aspects of each project make it easy to use for users and easy to maintain for the open source community?
    • What are design improvements/challenges that each open source community is undertaking?
    • At which point in the Jupyter ecosystem does the project interface?

Discussion - Improving the Notebook Experience for Text-based Workflows

Contents

  1. Summary
  2. Proposal
  3. Motivation
  4. Use Cases
  5. Features/Requirements of an Optional New Format
  6. Compatibility with Jupyter Format Standard
  7. Options Under Consideration
  8. Unresolved Questions
  9. Prior Art and Additional Options found Insufficient for this Proposal
  10. Guide-level Explanation
  11. Reference-level Explanation
  12. Rationale and Alternatives

Summary

To Be Done

Proposal

To Be Done

Motivation

Project Jupyter

The goal of Project Jupyter (Project Jupyter | Home) is to provide:

Open Standards for Interactive Computing

The Jupyter Notebook is based on a set of open standards for interactive computing. These open standards can be leveraged by third party developers to build customized applications with embedded interactive computing.

Through the work of the Jupyter team, since 2006, the community has created a set of tools that enable technologists across the technological skill spectrum to simply engage with data, data science and machine learning. A subset of these tools include (below is a non-exhaustive list - for more info read (here)):

Additionally a large set of third party tools have been created to extend the usage for specific scenarios. These include:

The Jupyter Notebook Format is an open standard which has existed for a number of years. The 2017 ACM Software System Award recognized Jupyter for: https://awards.acm.org/award_winners/perez_9039634#2017-acm-software-system-award

For Project Jupyter, a broad collaboration that develops open-source tools for interactive computing, with a language-agnostic design. These tools, that include IPython, the Jupyter Notebook and JupyterHub, have become a de facto standard for data analysis in research, education, journalism, and industry.

The Jupyter Notebook Format

This discussion is primarily scoped to the way Jupyter stores the notebook on disk.

The Jupyter Notebook Format (nbformat 5.0 documentation) defines the open standard format description for Jupyter notebook files, also referred to as ipynb files. The JSON schema for the notebook is documented in the jupyter/nbformat · GitHub repo.

The nbformat package offers a Python API for working with notebook files (see the nbformat 5.0 documentation). This Python API enables reading and writing of notebooks along with a way to programmatically create notebooks.

Important Attributes of the Jupyter Format

  • Notebook-Level Metadata
  • Cells
  • Source
    • Contains source code that a user is editing to produce outcomes. Usually what people care the most about reviewing.
    • Can be markdown, kernel-language code, magics, or raw cells
  • Metadata
    • Notebook-level Metadata

      • Stores execution information, parameter indicators, format rendering hints, and domain/organization specific fields for jupyter extensions to reuse
      • Stores kernel and language information
    • Cell-level Metadata

      • Stores widget output data that’s shared across the various cells and between headless and headed execution patterns.
      • Encodes name and authorship information
      • Stores domain/organization specific fields for jupyter extensions to reuse
      • Ignoring or removing metadata entirely can break workflows that use extensions, though non-runtime metadata attributes rarely change once initially set
      • Exception information when errors stop execution
    • The metadata property is extendable so notebook apps, extensions, and end-users can define their own metadata.

  • Output
    • Results of a run / execution, oriented by the source cell that triggered them.
    • Includes logs, visuals, and data outcomes for human and machine parsing.
    • Usually associated with a point-in-time execution to capture the state of things during a notebook resolution. However, in presentational or interactive notebooks, the outputs would be the "goal" of a notebook. For example, you might run a series of cells that in the end generate a meaningful visualization or a trained ML model. You might also have a notebook that contains interactive widgets.
    • Typically, but not always, stripped before being included in version control.
    • Almost always preserved when sharing outside of version control as a form of reporting.
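
The attributes above map onto the on-disk JSON roughly as follows. This is a minimal, hand-written example in the spirit of the nbformat 4 schema (cell ids appear in 4.5+); consult the nbformat documentation for the authoritative definition:

```json
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": { "name": "python3", "display_name": "Python 3" },
    "language_info": { "name": "python" }
  },
  "cells": [
    {
      "cell_type": "code",
      "id": "a1b2c3d4",
      "metadata": {},
      "execution_count": 1,
      "source": ["print('hi')"],
      "outputs": [
        { "output_type": "stream", "name": "stdout", "text": ["hi\n"] }
      ]
    }
  ]
}
```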

Motivation for Investigating a New Format

Notebooks have a broad set of uses that continue to grow every day. In order to meet the needs of these scenarios, .ipynb files capture the inputs, outputs, and metadata from a user. Unfortunately, the current structure of the .ipynb notebook makes it challenging to use common Unix tools and workflows. However, any changes need to be carefully considered; the ecosystem needs the .ipynb format to be stable over time so that it does not cause massive disruption.

The core of this investigation is the rise of a large number of new users for Jupyter. Specifically, there are a significant number of data scientists that use text-based workflows with Jupyter (though we do not have data on exactly how many). By text-based workflows we are referring to interactions (editing, sending, using in a DevOps pipeline) with files that have no higher-level structure. All logic, comparison, etc. must be done on a line-by-line basis. The canonical examples of this in the GNU Linux/Unix ecosystem are diff and patch, which are at the center of most version control and comparison tools. These two tools are in extremely broad usage and often represent the core of both human-readable and automated tooling. If we are able to solve for using these tools, and offer an alternative, optional file format alongside the ipynb file format, we would unlock line-level diffing, visualization, inline commenting and other common workflows more readily.

User Groups / Communities

The IPython notebook initiative, which evolved into Jupyter notebooks, originally provided an interactive notebook style user interface to the IPython environment to support interactive computation research. Since then, the user community around Jupyter notebooks has grown considerably and now includes, but is not limited to, several distinct practice areas:

  • Data Engineering
    • Defining reproducible tasks for parameterized extract, transform, or load operations
    • Recording data movement operations in a way that’s easily modified and rerun for particular parameterizations when an error occurs
    • Localized logging associated with specific tasks
    • Visual indicators for data trends that can indicate data quality issues
    • Data auditing
  • Data Analytics
    • Low-code ability to collect data (e.g. magics)
    • Easy to share common starting places for problems
    • Access to programmatic concepts without needing full development tool chains
    • Easy to share results with peers and organization
    • Productionization path has lower friction compared to writing scripts or one-off queries
  • Systems Operations
    • System monitoring / reaction made easy to implement and visualize (not unique to Jupyter tooling)
    • Disaster recovery playbooks can be written in one document that’s testable and reproducible with documented instructions
    • System probes can be captured and shared easily without screen captures
  • Teaching and learning
  • Scholarly publishing workflows
  • Communicating data-intensive ideas

File Comparison

At the core of our effort is the mechanism by which most users compare files. The issues with diffing notebooks, many of which have already been identified in JEP 08, include diffing "input" content in the context of the document format (JSON) and diffing output or embedded content (cell outputs, media content embedded in markdown cells). While nbdime does provide an excellent solution for some, it unfortunately uses a non-standard mechanism for diffing that makes it difficult to integrate with most other common tools (e.g. diff and patch).

Additionally, as diff and patch are included in many applications (as an embedded tool) or hosted workflows (e.g. GitLab, GitHub, Mercurial), it will be challenging to augment those solutions with an additional tool in order to unlock a better experience.

If we were able to offer an optional file format that supports line-based comparison, allowing users to use diff and patch, we would unlock a number of new scenarios. diff and patch are available in every default server installation, meaning no further installation would be required. Line-level comparison is the standard in the GNU/Linux ecosystem, and supporting a file format that can be diff’d in a line-level way will unlock thousands of tools and workflows. That is to say, even if someone does not use diff or patch, if they need to do any sort of comparison of files, it is likely they understand the standard patch format. Some common scenarios that would be unlocked include:

  • Shipping and visualizing patches (NOTE Needs example - grabbing a .patch file and sending it to another application (e.g. via pipe))
  • Commenting inline (NOTE Needs example of why commenting inline is challenging without line based patches)
  • Manually inspecting raw notebook (e.g. the JSON directly) contents
  • Any service that expects file format support of this kind
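
To make the first scenario concrete, a unified patch for a hypothetical line-oriented notebook file could be produced with nothing but Python's standard library (the ".nff" name and the cell-marker syntax below are placeholders, not a decided format):

```python
import difflib

# Two versions of a hypothetical line-oriented notebook file.
original = """\
# %% [markdown]
# Train a model
# %% [code]
lr = 0.1
model.fit(X, y)
""".splitlines(keepends=True)

updated = """\
# %% [markdown]
# Train a model
# %% [code]
lr = 0.01
model.fit(X, y)
""".splitlines(keepends=True)

# A standard unified diff: the same format `diff -u` produces, and what
# `patch`, GitHub, GitLab, and code-review tools already understand.
patch = "".join(difflib.unified_diff(
    original, updated, fromfile="a/notebook.nff", tofile="b/notebook.nff"))
print(patch)
```

The resulting text can be redirected to a .patch file, piped to another application, or attached to a review comment, exactly like a patch for any other source file.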

Though nbdime supports a subset of these experiences for Jupyter notebooks, there are few other tools that support the nbdime patch-format for the same use-case.

A further challenge for comparing notebook files line by line or in a text based medium is that notebooks contain rich media contents like images, videos, animations or even small GUI applications. This content is an essential part of a notebook but showing a meaningful diff in a terminal is challenging. Solving this for the new file format is necessary before it could be accepted by the community.
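
One commonly discussed mitigation for the rich-media problem, sketched here purely for illustration (the `media_placeholder` helper and its output format are invented, not part of this proposal), is to substitute opaque media data with a short content digest so a textual diff can at least show that an image changed:

```python
import base64
import hashlib

# An image output as it is stored in an .ipynb file today: one enormous
# base64 line that is meaningless in a line-based, terminal diff.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"fake image payload " * 200
blob = base64.b64encode(png_bytes).decode()

def media_placeholder(mime: str, data: str) -> str:
    # Replace opaque media with a short, stable content digest; the same
    # data always yields the same placeholder, so unchanged images
    # produce no diff lines at all.
    digest = hashlib.sha256(data.encode()).hexdigest()[:12]
    return f"<{mime} sha256:{digest} ({len(data)} base64 chars)>"

print(media_placeholder("image/png", blob))
```

A terminal diff over placeholders of this kind shows *which* outputs changed without drowning the reader in base64, though it cannot show *how* they changed.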

Use Cases

An example of an individual user

TODO: add a use-case of an individual person using jupyter notebooks locally, along with diffing / merging / etc.

An Example of Collaboration

Illustrating the challenge

  • Bao, the site reliability engineer (SRE), works at a small startup building machine learning models. She is on-boarding a small data science team who wants to begin collaborating.
    • To date, this team has been sharing their notebooks via network shares (e.g. SMB, DropBox) but they want to move to something better.
    • She'd like to use commenting and patches which is how the software engineers in her organization collaborate.
    • Bao cannot integrate notebooks into her company's existing diff and merge based infrastructure used by other software engineers and organizations because, though JSON is non-binary (and therefore can be committed), the comparisons generated are often quite complicated and non-human-readable.
    • Further, automated tools in her corporate workflow (e.g. linting, complexity detection, etc) use patch files to analyze changes, and struggle with the existing files generated.
    • Her IT department would prefer not to install new tools, as adding new binaries to existing blessed images requires a significant security and IT analysis for each incremental version.
    • Her organization uses inline commenting, views diffs of her applications and notebooks together when they appear in a single commit, and applies patches generated by her security infrastructure to notebooks, all of which cause merge conflicts when she interacts with notebooks
      • (e.g. her security infrastructure evaluates and generates patch formatted updates to python imports, and she needs to manually apply these changes where it is automatically applied in all other applications)
    • As a result, she feels isolated. This separation has made it difficult to integrate the data science team into the rest of the software engineering toolchain making it harder to move her models, when ready, to production.
    • If she had access to optionally saving (or converting automatically, such as via a githook) to a new format that supported these workflows, she would be much more integrated with her colleagues and able to reuse existing tools in her organization.

How a potential solution would look

  • Amal, the data scientist, opens a Jupyter notebook using Jupyter. She is able to see inputs and outputs generated by her team the last time they were saved.
  • Amal sets a configuration option in the notebook that causes Jupyter to save the file as .nff (new file format) instead of .ipynb.
  • Adding this option does not change Amal’s experience with the notebook interface. Everything else about the file works the same - running cells, displaying rich outputs, sharing with her colleagues are all the same.
  • Her colleague, Madhuri, wants to see the changes that Amal has made.
    • Amal saves the file to their shared SMB share.
    • Madhuri opens the file from the share using Jupyter > Open. Everything appears exactly as it did with the .ipynb format.
    • Madhuri has some tooling built around the existing notebook format (.ipynb), so she removes the configuration setting at the top of the file, and it continues to work properly.
  • Amal is ready to contribute the file to the repo. She goes into the command line and, using standard git commands, adds and commits the notebook to her repo.
  • Amal decides to change a hyperparameter.
    • She creates a branch locally, and opens the notebook in that branch.
    • She thought one variable change would be enough but it ends up being a number of different changes before she gets her model to converge.
    • She also needs to make a change to a python file that is included with her overall project.
    • She finally reruns all the cells, generating outputs inline, and sees that everything looks correct.
  • She goes back to the command line and decides to commit this change to the repo.
    • When she adds and commits the file, she sees only the lines that impact inputs that are being checked-in. This is despite several large output blobs that have changed in the file.
    • She's also able to see python changes as well - the changes feel like a unified change, rather than siloed changes - one in a notebook and one in a python file.
    • She pushes the commit to GitHub and the diff is pushed up to site.
  • Amal goes to GitHub and executes a pull request against the core repository. She can see the line diffs in a straightforward side-by-side comparison - both python and notebook seen side by side.
  • Amal tells Madhuri via slack that she's made a change and wants feedback on her PR and wants a review.
    • Madhuri logs in and sees the changes. She’s curious about why Amal changed the file signature to the python function and how that impacts its use in the notebook.
    • She’s able to make an inline comment which immediately triggers Amal to come and discuss it.
    • The two go back and forth in the flow, and agree on the final decision to add another parameter to the function.
    • Amal goes back to her original commit, makes the changes and files a new PR.
    • Madhuri LGTMs the PR and its merged into the main repo.
  • At this point, the automated CI/CD takes over.
    • The workflow goes through a standard flow - stripping comments, linting, running unit tests, packaging for distributed training, and then running the distributed training job.
    • Because the file only triggers this when a significant change has been made, the fact that the outputs have been removed from the core notebook file, and diffs only show line level changes, the tools are not mistakenly triggered on irrelevant content.
  • The CI/CD works great, and kicks off distributed training. Soon the project will be rolled out to production!

Basic Use Cases

Using diff or patch comparison tools, a user or tool should be able to accomplish the following. (.nff == ".new_file_format_extension")

diff originalfile.nff updatedfile.nff > patchfile.patch
patch originalfile.nff -i patchfile.patch -o updatedfile.nff

Tools to consider compatibility with as we move forward:

  • nbdime
  • nbconvert
  • jupyter-text
  • jupyter-format
  • reviewnb
  • jupytext
  • wrattler
  • nbviewer

Features/Requirements of an Optional Format

  • Using the new notebook format is optional. A user that chooses the new format will have functionality identical to the old format. When interacting with any of the core Jupyter tools, they will not experience any difference.
  • The format is 100% round-trippable to .ipynb. That’s not to say that all functionality in the new format will work in .ipynb, but 100% of .ipynb functions will work in the new format. Non-functional items will be preserved intact.
  • Supporting diff and patch so that tools that embed these tools will function
  • Users with the existing format will continue to have a first class experience and will never be forced to upgrade to the new format without their explicit consent
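
The round-trip requirement above can be stated as a checkable invariant. The toy converters below (`to_nff`, `from_nff`, and the cell-marker syntax) are invented purely to show the invariant being demanded; the real optional format is undecided:

```python
import json

# A trivial, line-oriented encoding of a heavily simplified notebook
# dict, used only to demonstrate the lossless round-trip requirement.
CELL_MARK = "# %% cell"

def to_nff(nb: dict) -> str:
    lines = ["# meta " + json.dumps(nb["metadata"], sort_keys=True)]
    for cell in nb["cells"]:
        lines.append(f"{CELL_MARK} {cell['cell_type']}")
        lines.extend(cell["source"].splitlines())
    return "\n".join(lines) + "\n"

def from_nff(text: str) -> dict:
    lines = text.splitlines()
    nb = {"metadata": json.loads(lines[0][len("# meta "):]), "cells": []}
    for line in lines[1:]:
        if line.startswith(CELL_MARK):
            nb["cells"].append({"cell_type": line[len(CELL_MARK) + 1:],
                                "source": ""})
        else:  # escaping of marker-like source lines is glossed over here
            cell = nb["cells"][-1]
            cell["source"] += ("\n" if cell["source"] else "") + line
    return nb

nb = {"metadata": {"kernelspec": {"name": "python3"}},
      "cells": [{"cell_type": "code", "source": "x = 1\nprint(x)"},
                {"cell_type": "markdown", "source": "A note about x."}]}

# The invariant the proposal requires: converting out and back loses nothing.
assert from_nff(to_nff(nb)) == nb
```

A real implementation would additionally have to round-trip outputs, attachments, and arbitrary extension metadata, which is where the hard design work lies.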

Compatibility with Jupyter Format Standard

To be done

Options Under Consideration

Improve this by creating a new storage format

TODO: insert proposed path forward here

Improve this with minor modifications to the ipynb storage format

Several of the issues raised with diff and patch simply boil down to JSON itself, as opposed to the underlying data structure. Another approach would be to try swapping out JSON for some other, more diffable structure such as YAML. This would be quite elegant, as YAML is explicitly a superset of JSON.
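
The diffability difference is easy to see with multi-line cell source. JSON cannot hold raw newlines in strings, so the source collapses into one escaped line (or a list of quoted fragments), while a YAML block scalar keeps each source line on its own line. The YAML below is written out by hand; emitting it programmatically would need a YAML library, which is not assumed here:

```python
import json

# Multi-line cell source as .ipynb must store it.
source = "for i in range(3):\n    print(i)\n"
as_json = json.dumps(source)
print(as_json)  # newlines become literal \n escapes, all on a single line

# The same content as a hand-written YAML block scalar.
as_yaml = """source: |
  for i in range(3):
      print(i)
"""
print(as_yaml)  # each source line stays its own line, so diff/patch work
```

A one-character change inside the loop body would touch the entire string in the JSON form, but only a single line in the YAML form.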

Improve this with minor modifications to the ipynb storage structure

Another option is to solve this purely at the level of the structure of the IPYNB JSON that is saved to disk (not the in-memory object that is loaded with nbformat). I can think of three big issues with diffing the current IPYNB files:

  • the outputs are incomprehensible - e.g. images would be rendered as opaque blobs (as opposed to storing the images externally with a pointer) ** NOTE: Need additional examples here **
  • the metadata often changes in a way that isn’t relevant to the user’s diff
  • the JSON formatting requirements (e.g. special-casing characters) are cumbersome (more of a problem w/ editing than diffing per se)

This is compounded by the fact that the notebook outputs and metadata are interwoven with the content (which is most likely what most users care about when they’re looking at a diff). This is not universally the case - for example, a diff could contain output that is relevant to the process (e.g. changes in hyperparameters from a code cell that searches the hyperparameter space for the best set).

So, one option could be to re-work how the ipynb files are structured on disk. They remain JSON, but the structure looks something like:

<for cell in cells>
    <cell input>
    <reference to cell output>
<notebook metadata>
<for output in outputs>
    <cell output>

That way, the incomprehensible things (the outputs) would be at the bottom of any diff, and could either be filtered out or simply ignored more easily than they currently are, allowing the user to focus on the content sections of the file. This would require some form of association system (e.g. ID references that connect otherwise non-connected elements).
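
The re-ordering and ID-reference idea can be sketched in a few lines. The field names used here (`detached_outputs`, `$ref`) are invented for illustration and are not part of any accepted schema:

```python
import copy

def restructure(nb: dict) -> dict:
    # Cell inputs stay at the top of the document; bulky outputs are
    # collected at the bottom and linked back to their cells by id.
    nb = copy.deepcopy(nb)
    detached = []
    for cell in nb["cells"]:
        if cell.get("outputs"):
            detached.append({"cell_id": cell["id"],
                             "outputs": cell["outputs"]})
            cell["outputs"] = {"$ref": cell["id"]}  # pointer, not the blob
    nb["detached_outputs"] = detached  # appended last, below all inputs
    return nb

nb = {
    "metadata": {},
    "cells": [
        {"id": "c1", "cell_type": "code", "source": "plot()",
         "outputs": [{"output_type": "display_data",
                      "data": {"image/png": "iVBORw0KGgoAAAANS..."}}]},
    ],
}
restructured = restructure(nb)
print(list(restructured))  # metadata, cells, detached_outputs
```

Because dict insertion order is preserved when the JSON is serialized, the opaque blobs would always sort to the end of a textual diff, where they are easy to skip or strip.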

Improve this without changing the ipynb format or creating a new one

Changing the core ipynb format, or adding a new one, is a potentially disruptive move. These issues around diffing/merging/commenting could also be improved with better tooling, bridges, etc. See

Rationale and alternatives

Unresolved Questions

  • What parts of the design do you expect to resolve through the JEP process before this gets merged?
  • What related issues do you consider out of scope for this JEP that could be addressed in the future independently of the solution that comes out of this JEP?

Below are a list of concerns that must be addressed:

  • Lossless round-tripping between .ipynb and .nff
  • 100% compatibility with any tools that engage with jupyter
    • QUESTION: Possible?
  • Format must not be commercially restricted in some way
  • Format should be interactable - not a read-only and/or intermediate format
  • Format must include outputs
    • QUESTION: Necessary? What use cases need outputs included?
    • QUESTION: Would a separate file with pointers be acceptable?
  • Format should be compatible with being included in the default install as an option (though will not be the default for a significant amount of time)
  • Should the format be email-able?

Answered questions

  • More performant viewing on a web page (how do we measure?) - What are the performance bottlenecks in rendering? Can we help here?
    • A: performance (e.g. rendering, viewing, etc) is likely not a result of the underlying format, and there are several rendering tools that are highly performant for ipynb files (e.g., GitLab, nbviewer, and all of the web-based jupyter interfaces, such as jupyterlab/notebook, nteract, vscode ipynb extension, pycharm, etc)

Prior Art

Discuss prior art, both the good and the bad, in relation to this proposal. A few examples of what this can include are:

  • Does this feature exist in other tools or ecosystems, and what experience have their community had?
  • For community proposals: Is this done by some other community and what were their experiences with it?
  • For other teams: What lessons can we learn from what other communities have done here?
  • Papers: Are there any published papers or great posts that discuss this? If you have some relevant papers to refer to, this can serve as a more detailed theoretical background.

This section is intended to encourage you as an author to think about the lessons from other languages, provide readers of your JEP with a fuller picture. If there is no prior art, that is fine - your ideas are interesting to us whether they are brand new or if it is an adaptation from other languages.

A table of notebook formats and their features

  • While rendering rich diffs visually is 'easy', most git workflows require things like comments, resolving conflicts, etc. This column is, ultimately, just opinions, but when described as 'git friendly' we would expect it to be reasonably possible to comment inline (in a persistent way) and resolve git conflicts logically.
| Project | OSS & >50% of contrib from community | Diff technique | Git ‘friendly’*? | Supports outputs in the same file? | Additional features | Rejected/Reason? |
| --- | --- | --- | --- | --- | --- | --- |
| Jupyter Notebook | Yes | Use nbdime | No | Yes | | |
| MyST Notebook | Yes | | Yes | No | Works well with Sphinx & Jupyter Book (references, bibliography) | |
| Jupytext Markdown | Yes | | Yes | No | Well rendered by GitHub / VS Code. See e.g. https://github.com/plotly/plotly.py/tree/doc-prod/doc/python | |
| Percent scripts | Yes | | Yes | No | Notebooks as scripts. Work well in VS Code, PyCharm Pro, Spyder, Hydrogen, and also with tools like black, etc. | |
| jupyter-format | Yes | | | | | |
| MatLab | No | | | | | |
| R Markdown | No | | Yes | No. But the .nb.html file does. | | |
| Pandoc Markdown | Yes? | | Yes | Yes | | |
| CoLab | No | | | | | |
| MLFlow | No | | | | | |
| Zeppelin | Yes | | | | | |

Previous discussions, JEPs, etc about text-friendly format

Guide-level explanation

Explain the proposal as if it was already implemented and you were explaining it to another community member. That generally means:

  • Adding examples for how this proposal affects people’s experience.
  • Explaining how others should think about the feature, and how it should impact the experience using Jupyter tools. It should explain the impact as concretely as possible.
  • If applicable, provide sample error messages, deprecation warnings, or migration guidance.
  • If applicable, describe the differences between teaching this to existing Jupyter members and new Jupyter members.

For implementation-oriented JEPs, this section should focus on how other Jupyter developers should think about the change, and give examples of its concrete impact. For policy JEPs, this section should provide an example-driven introduction to the policy, and explain its impact in concrete terms.

Not Yet Implemented

Reference-level explanation

This is the technical portion of the JEP. Explain the design in sufficient detail that:

  • Its interaction with other features is clear.
  • It is reasonably clear how the feature would be implemented.
  • Corner cases are dissected by example.

The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.

Not Yet Implemented

Rationale and alternatives

  • Why is this choice the best in the space of possible designs?
  • What other designs have been considered and what is the rationale for not choosing them?
  • What is the impact of not doing this?

Not Yet Implemented

Below are a few alternatives that could be explored

Alternative approaches to changing the ipynb format

Improve this with Jupytext + documentation

Recommend that users use Jupytext (https://jupytext.readthedocs.io/) to automatically keep two versions of their notebooks: one that is human-and-diff-friendly, one that is machine-friendly and messier with more information. Outputs are in the ipynb format, not the text format. The text file is generally treated as the source of truth in merging conflicts.

Note: one could assume that the only time someone edits an ipynb file is with a jupyter server, and jupytext will automatically synchronize the ipynb and text file as long as the jupyter server is running. However, you could imagine many people editing the text file without a jupyter server (e.g. via a comment in github). That’s why the text file should always be the source of truth. (This is also the case with the nteract desktop app)

Providers that build UIs on top of git could add support in the following way

  • E.g. Two-way synchronization between a text-based notebook and an ipynb file with Jupytext.
  • Use Jupyter UI to activate this "pairing" so that it will automatically save an ipynb file to [.py/myst-markdown/pandoc markdown/etc]. (Note, this means that other clients like Lab, classic Notebook & nteract will have to implement the pairing functionality as well)
  • Develop a mechanism to either move outputs to a specific section of the file (making it easier to diff/exclude) or pointers to an external file
  • Upstream recommendations to other tools (e.g. GitHub) - GitHub presents a warning that says "if you want text-based diffing for notebooks, we recommend you use jupytext to pair your ipynb files with a text-based version of them. in your PRs, make comments, edits, etc to the text-based versions."
  • GitHub further treats a paired ipynb and text-based file in a special way. E.g.: "if in a PR, two files are detected with jupytext metadata that links them, then in the diff only show the text file, and in the "enriched view" only show the notebook file.

Potential challenge here

  • Scenario
    • 2 data scientists work on same notebook using git.
    • Data scientist A uses notebook with ipynb, data scientist B uses jupytext. B changes cell and pushes both files
    • Potential problem: A can't resolve conflict easily, has to pull B's change, resolve conflict in jupytext, export to ipynb, push.
  • One potential solution
    • In this case, any changes to the ipynb file via a jupyter server will be automatically reflected in the text file. If we assume that the text file is always the source of truth, then DS A will merge changes into their text file, jupytext will automatically update the ipynb file, and then proceed.
    • You could also imagine an extreme case (maybe a setting in jupytext or something), where jupytext stores ipynb files with no content in them, only cells with outputs. Then you rely on ipynb for all the messiness of outputs, on the text file for the content and structure of the document, and use jupytext to sync them

Improve the tooling around ipynb diffing

There are tools out there that facilitate diffing and merging with the notebook format (most notably, https://nbdime.readthedocs.io/). Perhaps there are ways that this tool could improve its functionality in order to more easily integrate into git-based workflows, or into products that build on top of git-based workflows (like GitHub).

For reference, here is the output from nbdime and git when diffing a notebook with a single line changed:

Git

$ git diff Untitled.ipynb
diff --git a/Untitled.ipynb b/Untitled.ipynb
index e2f4c76..199ae3e 100644
--- a/Untitled.ipynb
+++ b/Untitled.ipynb
@@ -6,7 +6,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "print('hi')"
+    "print('there')"
    ]
   }
  ],

nbdime

$ nbdiff Untitled.ipynb

nbdiff Untitled.ipynb (HEAD) Untitled.ipynb
--- Untitled.ipynb (HEAD)  (no timestamp)
+++ Untitled.ipynb  2020-07-03 16:56:33.438469
## modified /cells/0/source:
-  print('hi')
+  print('there')

Improve online products for diffing/merging ipynb files

As we’ve discussed in this document, many people do their diffing/merging/editing/commenting via web services and interfaces. For example, GitHub and GitLab.

As these services have control over the interfaces that are exposed to users, and there is already some support for more “rich” interactions with certain formats (e.g., GitHub's fancy support for images), the story around git-based notebook workflows could be improved at the level of these interfaces.

Some issues to track this:

Sustainability issues

The Library of Congress Sustainability of Digital Formats has a schema for cataloguing digital document formats as well as a set of criteria against which the sustainability of digital documents formats can be tracked.

Sustainability factors include:

There are also fields associated with Quality and functionality factors which for text documents include: normal rendering, integrity of document structure, integrity of layout and display, support for mathematics/formulae etc., functionality beyond normal rendering.

The .ipynb format is not currently on the list of mentioned formats. Records for geojson and Rdata provide a steer for the sorts of things that such a record might initially contain.

Downsides to creating a new format, or extending the current one

Changing the ipynb standard, or creating a different format, may have negative consequences. We should answer questions such as the following:

WIP / Outline below:

  • What are the downsides of creating a new notebook-based format?
    • Fracturing ecosystem (e.g. “it worked in .nff why doesn’t it work in .ipynb or vice versa”)
    • Core jupyter engineering/testing cost
    • Confusion for users - (e.g. which one should I use)?
  • What are the downsides of changing the current ipynb format?
    • Millions and millions of existing users
    • This is not being considered at this time
  • Why does the current tooling ecosystem not work in a way that cannot be resolved by iterative improvements to this ecosystem?
    • diff & patch do not work elegantly with current format
    • Difficult for humans to interact with
    • Difficult to comment on in standard git flows (e.g. via Reviewable, GitHub, GitLab, etc)
    • Produces noisy commits

Future possibilities

Think about what the natural extension and evolution of your proposal would be and how it would affect the Jupyter community at-large. Try to use this section as a tool to more fully consider all possible interactions with the project and language in your proposal. Also consider how this all fits into the roadmap for the project and of the relevant sub-team.

This is also a good place to 'dump ideas'. if they are out of scope for the JEP you are writing but otherwise related.

If you have tried and cannot think of any future possibilities, you may simply state that you cannot think of anything.

Note that having something written down in the future-possibilities section is not a reason to accept the current or a future JEP; such notes should be in the section on motivation or rationale in this or subsequent JEPs. The section merely provides additional information.

Not Yet Implemented