-
-
Notifications
You must be signed in to change notification settings - Fork 36
Respect autosave setting in RTC backend #479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// Force autosave to be true by default initially | ||
if (docmanagerSettings) { | ||
void docmanagerSettings.set('autosave', true); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this change user's autosave setting? I think we could just take the value as-is because autosave is the default in lab and notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if autosave is set to true by default, then we don't need this.
Let me go ahead and remove it.
I think this makes sense. If others agree, I wonder if we should make it clear in the UI or at very least document it somehow. |
Having autosave enabled when at least one client wants it and disabled when all clients wants it doesn't make sense to me, that's why there was no choice but having autosave enabled in the first place. I have never seen this kind of behavior anywhere. Correct me if I'm wrong but all collaborative applications have autosave enabled (Google Docs...)? |
I think we can decide how to handle autosave when multiple clients are working on a document. However, what makes the most sense to me is to be able to disable autosave when the extension is installed, but I am working alone on the document. |
The difference is you can install docprovider without using RTC, this is just to have:
Forcing non-RTC users to use autosave in that scenario is not good because they may (and in fact have, which is why we opened this PR) ran into IO limitations with large enough notebooks stalling high-performance workloads. So I think what we want to achieve is to:
Quoting my earlier comment from a month ago jupyterlab/jupyterlab#14619 (comment):
Databricks also run in issues with autosaving of large notebooks and automatically disables autosave for notebooks larger than 8 MB, see https://kb.databricks.com/notebooks/notebook-autosave Here is an interesting pattern from OnlyOffice:
https://www.onlyoffice.com/blog/2020/04/save-and-force-save-in-onlyoffice-never-lose-a-document Another, one is Collabora Office where auto-save is enabled by default but can be disabled (and interval can be configured). |
@davidbrochart did you have time to think about it a bit more? Any other thoughts/suggestions? |
@@ -291,6 +292,16 @@ async def on_message(self, message): | |||
""" | |||
On message receive. | |||
""" | |||
if message == "save_to_disc": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should use a custom message type (see here). Maybe 2
followed by save
?
packages/docprovider/src/ydrive.ts
Outdated
@@ -123,7 +126,7 @@ export class RtcContentProvider | |||
const provider = this._providers.get(key); | |||
|
|||
if (provider) { | |||
// Save is done from the backend | |||
provider.wsProvider?.ws?.send('save_to_disc'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should use a custom message type (see here). Maybe 2
followed by save
?
Also, should we wait for a reply indicating that the file has indeed been saved? Otherwise the following get
will probably not return the state of the saved file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise the following get will probably not return the state of the saved file.
True, But since the signal below is fired after each save from server (due to hash change) and the contents model is automatically updated with the new values, it may not be necessary to wait for the reply here to update the contents model.
jupyter-collaboration/packages/docprovider/src/ydrive.ts
Lines 198 to 200 in 9059310
this._ydriveFileChanged.emit({ | |
type: 'save', | |
newValue: { ...model, hash: hashChange.newValue }, |
FWIW I strongly agree with the idea that you don't always want auto-save with RTC. It seems to me that many things are benefiting from RTC's design of moving state to the backend that are not collaborative environments. In those cases, users would like Jupyter to work as it did before, but also allow for many of the benefits of moving state to the server side. |
Technically no state is moved to the server when using CRDTs, the state is just distributed among all peers, the server being just one of them. But yes in a future where jupyter-collaboration makes it in Jupyter core, one will use them even when "collaborating with oneself", and I can see an interest in deciding when to save to disk. |
const autosave = | ||
(this._docmanagerSettings?.composite?.['autosave'] as boolean) ?? true; | ||
|
||
sharedModel.awareness.setLocalStateField('autosave', autosave); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also include autosaveInterval
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should also send the autosaveInterval
to the backend?
The reason I ask is that currently, JupyterLab (frontend) automatically calls save
after every autosaveInterval
, which in turn triggers save_to_disc
on the backend.
However, the current implementation is that when autosave is enabled, the backend saves the document to disk on every document change, regardless of the configured autosave interval.
If we want to respect the autosave interval properly, I think we could consider removing the _on_document_change
function on the backend.
jupyter-collaboration/projects/jupyter-server-ydoc/jupyter_server_ydoc/rooms.py
Lines 235 to 237 in 9059310
def _on_document_change(self, target: str, event: Any) -> None: | |
""" | |
Called when the shared document changes. |
Here's why:
- When autosave is off, only manual saves are possible, so observing document changes for saving might not be necessary.
- When autosave is on, the client will already call
save
at the configured interval, which will triggersave_to_disc
. Thus, observing document changes on the backend may also be redundant in this case.
One potential caveat is that if multiple clients with different autosaveInterval
values are connected to the same document, save_to_disc
will still be called for each of them when their individual autosave timers trigger.
Would love to hear your thoughts on this approach and whether you see any concerns or alternatives.
CC: @davidbrochart
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move the discussion to a new issue and address this in a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is reasonable. We could create a new issue after this PR gets merged.
Fixes jupyterlab/jupyterlab#14619
Previously, documents were always written to disk on changes, even if autosave was disabled. This PR fixes that by:
The related PR in
jupyterlab
re-enables manual save in RTC mode. This allows users to save explicitly when autosave is off. On manual save, the frontend sends asave_to_disc
message via the document's WebSocket provider, triggering a backend save.Note: I didn't find a way to determine which client made the change. So if any connected client has autosave enabled, the document will be saved.
Feedback on the approach is welcome!