Respect autosave setting in RTC backend #479

Open: wants to merge 12 commits into base: main

Conversation

Darshan808
Member

@Darshan808 Darshan808 commented Apr 25, 2025

Fixes jupyterlab/jupyterlab#14619

Previously, documents were always written to disk on changes, even if autosave was disabled. This PR fixes that by:

  • Sending the autosave setting from each client via document awareness.
  • Skipping disk writes on doc changes if all connected clients have autosave disabled.
  • Writing to disk only when autosave is enabled by at least one client.
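The rule in the bullets above can be sketched as a small predicate over the awareness states. This is a minimal illustration, not the code from this PR: the function name `should_write_on_change`, the shape of `awareness_states` (client ID mapped to a plain dict), and the choice to treat a missing `autosave` field as enabled are all assumptions.

```python
from typing import Any, Mapping


def should_write_on_change(
    awareness_states: Mapping[int, Mapping[str, Any]],
    default: bool = True,
) -> bool:
    """Return True if at least one connected client has autosave enabled.

    `awareness_states` maps a client ID to that client's awareness state
    (hypothetical shape). A client that has not published an `autosave`
    field is treated as `default`, which is an assumption of this sketch.
    With no clients connected, `any()` over an empty iterable is False.
    """
    return any(
        bool(state.get("autosave", default))
        for state in awareness_states.values()
    )
```

With this rule, a single client that keeps autosave on is enough to trigger disk writes for everyone connected to the document.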

The related PR in jupyterlab re-enables manual save in RTC mode. This allows users to save explicitly when autosave is off. On manual save, the frontend sends a save_to_disc message via the document's WebSocket provider, triggering a backend save.

Note: I didn't find a way to determine which client made the change. So if any connected client has autosave enabled, the document will be saved.

Feedback on the approach is welcome!

Contributor

Binder 👈 Launch a Binder on branch Darshan808/jupyter-collaboration/fix-autosave

Comment on lines 83 to 86
// Force autosave to be true by default initially
if (docmanagerSettings) {
  void docmanagerSettings.set('autosave', true);
}
Member

Wouldn't this change the user's autosave setting? I think we could just take the value as-is, because autosave is enabled by default in Lab and Notebook.

Member Author

Oh, if autosave is set to true by default, then we don't need this.
Let me go ahead and remove it.

@krassowski
Member

Note: I didn't find a way to determine which client made the change. So if any connected client has autosave enabled, the document will be saved.

I think this makes sense. If others agree, I wonder if we should make it clear in the UI or at the very least document it somehow.

@davidbrochart
Collaborator

Having autosave enabled when at least one client wants it, and disabled only when all clients want it disabled, doesn't make sense to me; that's why there was no choice but to have autosave enabled in the first place. I have never seen this kind of behavior anywhere. Correct me if I'm wrong, but all collaborative applications have autosave enabled (Google Docs...)?

@Darshan808
Member Author

I think we can decide how to handle autosave when multiple clients are working on a document. However, what makes the most sense to me is being able to disable autosave when the extension is installed but I am working alone on the document.

@krassowski
Member

Having autosave enabled when at least one client wants it, and disabled only when all clients want it disabled, doesn't make sense to me; that's why there was no choice but to have autosave enabled in the first place. I have never seen this kind of behavior anywhere.

The difference is you can install docprovider without using RTC, this is just to have:

  • server-side execution (and offline execution notifications which follows)
  • history timeline
  • better completion with jupyter-ai

Forcing non-RTC users to use autosave in that scenario is not good, because they may run into IO limitations (and in fact have, which is why we opened this PR), with large enough notebooks stalling high-performance workloads.

So I think what we want to achieve is to:

  • respect autosave preferences (on/off, interval) when only one user is connected (regardless of the number of windows open?)
  • possibly force autosave (if need be) when multiple users (not clients) connect; this could be enabled by a plugin in the collaboration-ui package, which could be disabled if users really don't like it and which could add a flair in the UI (toolbar? status bar?) to indicate that autosave is active, with an explanation on hover that it was auto-enabled because of multi-user RTC.
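The two goals above could be expressed as one policy function. This is only a sketch of the proposal as worded in this comment: the name `effective_autosave`, the `force_when_multi_user` switch, and the `username` key used to tell users apart from clients are hypothetical, and the fallback for a single user whose windows disagree (any window with autosave on wins) is an assumption.

```python
from typing import Any, Mapping


def effective_autosave(
    awareness_states: Mapping[int, Mapping[str, Any]],
    force_when_multi_user: bool = True,
) -> bool:
    """Decide whether autosave should be active for a document.

    `awareness_states` maps a client ID to that client's awareness state.
    The `username` key is hypothetical; a client without it is counted
    as its own anonymous user (None).
    """
    # Count distinct users, not clients: one user may have several windows.
    usernames = {state.get("username") for state in awareness_states.values()}
    if force_when_multi_user and len(usernames) > 1:
        # Several users are collaborating: autosave is forced on.
        return True
    # Single user (or forcing disabled): respect the published preference.
    return any(
        bool(state.get("autosave", True))
        for state in awareness_states.values()
    )
```

Setting `force_when_multi_user=False` corresponds to disabling the suggested plugin, in which case only the published preferences decide.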

Correct me if I'm wrong but all collaborative applications have autosave enabled (Google Docs...)?

Quoting my earlier comment from a month ago jupyterlab/jupyterlab#14619 (comment):

I think the pattern is to enforce autosave in cloud-synced documents, not in collaborative documents; yes most collaborative documents are cloud-synced nowadays, but in cases where they are not, the user still has to deal with their file system limitations and auto-save can lead to inadvertent side-effects, for example if users have file system watch scripts, such as performing expensive anti-virus scans on each modification (which is sometimes enforced and user has no way to disable it).

Databricks also ran into issues with autosaving large notebooks and automatically disables autosave for notebooks larger than 8 MB; see https://kb.databricks.com/notebooks/notebook-autosave

Here is an interesting pattern from OnlyOffice:

How saving works

You can decide when you want your changes sent to Document Server. Find the Autosaving option in the File tab -> Advanced settings:

If autosaving is on, your changes are sent to Document Server (the editors cache) automatically.
If it’s off you need to click the Save button to save your changes in the editors’ cache.

Saving during co-editing

The editors have two co-editing modes – Fast and Strict and they do have influence on autosaving.

In Strict mode, you lock the paragraph you are working on. Others can’t see your changes until you click the Save button, and you can’t see theirs. In this mode, when you click Save your changes are sent to Document Server as usual.

In Fast mode, you can see everything your co-authors are typing in real-time. In this mode, you don’t need to click Save at all – all the changes are saved automatically the second you stop typing. The Save button remains inactive.

https://www.onlyoffice.com/blog/2020/04/save-and-force-save-in-onlyoffice-never-lose-a-document

Another one is Collabora Office, where autosave is enabled by default but can be disabled (and the interval can be configured).

@krassowski
Member

@davidbrochart did you have time to think about it a bit more? Any other thoughts/suggestions?

@@ -291,6 +292,16 @@ async def on_message(self, message):
    """
    On message receive.
    """
    if message == "save_to_disc":
Collaborator

I think that we should use a custom message type (see here). Maybe 2 followed by save?

@@ -123,7 +126,7 @@ export class RtcContentProvider
const provider = this._providers.get(key);

if (provider) {
// Save is done from the backend
provider.wsProvider?.ws?.send('save_to_disc');
Collaborator

I think that we should use a custom message type (see here). Maybe 2 followed by save?
Also, should we wait for a reply indicating that the file has indeed been saved? Otherwise the following get will probably not return the state of the saved file.
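One way to read the "2 followed by save" suggestion is a binary message whose first byte is the message type and whose body is a length-prefixed string. The sketch below is an assumption about the framing, not the protocol referenced in the comment: the constant `SAVE_MESSAGE_TYPE`, the helper names, and the variable-length integer prefix are all hypothetical.

```python
from typing import Tuple

SAVE_MESSAGE_TYPE = 2  # hypothetical value, taken from the suggestion above


def write_var_uint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low bits first."""
    out = bytearray()
    while n > 0x7F:
        out.append(0x80 | (n & 0x7F))  # set the continuation bit
        n >>= 7
    out.append(n)
    return bytes(out)


def encode_custom_message(msg_type: int, payload: str) -> bytes:
    """Build: one type byte, then the payload length, then UTF-8 payload."""
    data = payload.encode("utf-8")
    return bytes([msg_type]) + write_var_uint(len(data)) + data


def decode_custom_message(message: bytes) -> Tuple[int, str]:
    """Inverse of encode_custom_message."""
    msg_type = message[0]
    length, shift, i = 0, 0, 1
    while True:
        byte = message[i]
        i += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:  # no continuation bit: length is complete
            break
    return msg_type, message[i : i + length].decode("utf-8")
```

A typed message like this lets the backend dispatch on the first byte instead of comparing the whole message against a string such as `"save_to_disc"`.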

Member Author

Otherwise the following get will probably not return the state of the saved file.

True, but since the signal below is fired after each save from the server (due to the hash change) and the contents model is automatically updated with the new values, it may not be necessary to wait for the reply here to update the contents model.

this._ydriveFileChanged.emit({
  type: 'save',
  newValue: { ...model, hash: hashChange.newValue },

@Darshan808 Darshan808 requested a review from davidbrochart May 6, 2025 15:51
@mlucool
mlucool commented May 8, 2025

FWIW I strongly agree with the idea that you don't always want auto-save with RTC. It seems to me that many things are benefiting from RTC's design of moving state to the backend that are not collaborative environments. In those cases, users would like Jupyter to work as it did before, but also allow for many of the benefits of moving state to the server side.

@davidbrochart
Collaborator

Technically, no state is moved to the server when using CRDTs; the state is just distributed among all peers, the server being just one of them. But yes, in a future where jupyter-collaboration makes it into Jupyter core, one will use them even when "collaborating with oneself", and I can see an interest in deciding when to save to disk.

const autosave =
  (this._docmanagerSettings?.composite?.['autosave'] as boolean) ?? true;

sharedModel.awareness.setLocalStateField('autosave', autosave);
Member

Should this also include autosaveInterval?

Member Author

@Darshan808 Darshan808 May 13, 2025

Do you think we should also send the autosaveInterval to the backend?

The reason I ask is that currently, JupyterLab (frontend) automatically calls save after every autosaveInterval, which in turn triggers save_to_disc on the backend.
However, the current implementation is that when autosave is enabled, the backend saves the document to disk on every document change, regardless of the configured autosave interval.

If we want to respect the autosave interval properly, I think we could consider removing the _on_document_change function on the backend.

def _on_document_change(self, target: str, event: Any) -> None:
"""
Called when the shared document changes.

Here's why:

  • When autosave is off, only manual saves are possible, so observing document changes for saving might not be necessary.
  • When autosave is on, the client will already call save at the configured interval, which will trigger save_to_disc. Thus, observing document changes on the backend may also be redundant in this case.

One potential caveat is that if multiple clients with different autosaveInterval values are connected to the same document, save_to_disc will still be called for each of them when their individual autosave timers trigger.
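The caveat can be made concrete with a small simulation of client-side timers. This is illustrative only: `save_requests` is a hypothetical helper, and it assumes every client starts its timer at time zero and fires exactly on each multiple of its interval.

```python
from typing import Iterable, List, Tuple


def save_requests(intervals: Iterable[int], horizon: int) -> List[Tuple[int, int]]:
    """List (time, interval) pairs for every save request up to `horizon`.

    Each entry in `intervals` is one client's autosave interval in seconds
    and must be positive. Timers are assumed to start together at t = 0.
    """
    requests = [
        (t, interval)
        for interval in intervals
        for t in range(interval, horizon + 1, interval)
    ]
    return sorted(requests)
```

For two clients with intervals of 30 and 120 seconds, `save_requests([30, 120], 120)` yields five requests in two minutes, two of them at the same instant, so the backend would receive more saves than either client's own interval implies.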

Would love to hear your thoughts on this approach and whether you see any concerns or alternatives.

CC: @davidbrochart

Member

Maybe we should move the discussion to a new issue and address this in a separate PR?

Member Author

@Darshan808 Darshan808 May 13, 2025

I think that is reasonable. We could create a new issue after this PR gets merged.

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to disable autosave in collaborative mode
4 participants