Skip to content

Conversation

@imjuzcy
Copy link

@imjuzcy imjuzcy commented May 22, 2025

Some course admins update files with the same name at the same path. With this implementation of this feature, the script will dump a cache of the node tree root_node after every run. This cache is compared with the new node tree on the next run to identify which file is modified.

The cache file is just a normal binary file, does not need any special maintenance.

Caveat:

  1. On the first run once the option is enabled, all files will be redownloaded once, because the lack of the cache file is treated as if all files on Moodle are modified.
  2. If a course or module is excluded in the configuration (for example by selected_courses, skip_courses, only_sync_semester or used_modules), and re-included afterwards, every file from that course or module will be treated as modified and redownloaded.

Alternative solution to #77 instead of having to modify local file mtime

@D-VR
Copy link
Collaborator

D-VR commented Jun 4, 2025

On first look this seems like a good solution which avoids having to do more state management in syncmymoodle than necessary, thumbs up from my side

@D-VR
Copy link
Collaborator

D-VR commented Jun 5, 2025

One concern I have is, what if someone modified a file locally? For example added notes to a slide deck?

They update syncmymoodle and suddenly experience data loss if a prof modifies the slide deck.

Maybe we do need to do more state tracking on our side :(

One solution would be that we use the old root to check if a file was modified locally, before re-downloading it.. and move the locally modified file to "file name.syncconflict.hash" or something similar then, we can avoid that issue.

We might also consider adding a setting to ignore these updates for specific moodle rooms or similar. (Since there might be some weird ones which do weird stuff with course files?)

@D-VR
Copy link
Collaborator

D-VR commented Jun 5, 2025

We can also combine the solution mentioned by Septatrix with this approach, to avoid re-downloading files if they are synced for the first time with the root enabled.

Maybe just comparing the exact byte size would be enough (and if bytes are off, handle it as a sync conflict since we can't know if a user modified it)

@imjuzcy
Copy link
Author

imjuzcy commented Jun 5, 2025

One solution would be that we use the old root to check if a file was modified locally, before re-downloading it

If we really need to check local modification, then I think one way to do it is, as @septatrix mentioned, to make the mtime matches timemodified from Moodle, then compare it the next time.

We can also combine the solution mentioned by Septatrix with this approach, to avoid re-downloading files if they are synced for the first time with the root enabled.

That assumes that before the first run, all the files are the updated versions, no? If the file is already updated on Moodle before the first run, and the local file is outdated, the file will never be updated if there are no future modification any more, e.g. previous semester courses, etc.

Maybe just comparing the exact byte size would be enough (and if bytes are off, handle it as a sync conflict since we can't know if a user modified it)

I'm not sure about this, because if it's just a simple text file, the chance of having the same size to the byte is very high, e.g. correction of spelling error.

I think it might be hard to avoid redownload on first run.

@imjuzcy
Copy link
Author

imjuzcy commented Jun 5, 2025

To do/ to solve:

  • How to avoid redownload on first run
  • Check local/remote modification conflicts to avoid overwriting local modification
  • Add configuration option to exclude courses from update_files
  • Conflict resolution configuration option (global and specific courses): none, overwrite, rename
  • Update README

@imjuzcy
Copy link
Author

imjuzcy commented Jun 5, 2025

Just to write it down here, proposed solution to the local modification overwrite avoidance (integrating septatrix's proposal from #77):

  1. Make mtime of local file match timemodified from Moodle on download
  2. On the next run, if timemodified in current root is not the same as old root, compare mtime from local file and timemodified from old root for local modification since last run before redownloading

This might be the best of both worlds, since without the old root, we wouldn't know if differing mtime and current root's timemodified means if there are only modification on Moodle or there are also modification locally.

@D-VR
Copy link
Collaborator

D-VR commented Jun 5, 2025

Just to write it down here, proposed solution to the local modification overwrite avoidance (integrating septatrix's proposal from #77):

1. Make `mtime` of local file match `timemodified` from Moodle on download

2. On the next run, if `timemodified` in current root is not the same as old root, compare `mtime` from local file and `timemodified` from old root for local modification since last run before redownloading

This might be the best of both worlds, since without the old root, we wouldn't know if differing mtime and current root's timemodified means if there are only modification on Moodle or there are also modification locally.

Sounds good, and also to have the syncconflict resolution (i.e. move the old files)
maybe for the resolution we could also offer 3 options:

  1. Default: move the changed (or likely changed) files to a new filename <old_filename>.syncconflict.<file_hash> or something

  2. Keep the old files, i.e. just don't update.

  3. Replace the old files with new ones ignoring any local changes.

Maybe this option can then be configured for each course seperately

@imjuzcy imjuzcy force-pushed the feat-update-files branch 2 times, most recently from de0f91d to e82c5c1 Compare June 5, 2025 11:06
@septatrix
Copy link
Collaborator

Just to write it down here, proposed solution to the local modification overwrite avoidance (integrating septatrix's proposal from #77):

  1. Make mtime of local file match timemodified from Moodle on download
  2. On the next run, if timemodified in current root is not the same as old root, compare mtime from local file and timemodified from old root for local modification since last run before redownloading

This might be the best of both worlds, since without the old root, we wouldn't know if differing mtime and current root's timemodified means if there are only modification on Moodle or there are also modification locally.

What do you mean with current and old root?

Sounds good, and also to have the syncconflict resolution (i.e. move the old files) maybe for the resolution we could also offer 3 options:

  1. Default: move the changed (or likely changed) files to a new filename <old_filename>.syncconflict.<file_hash> or something
  2. Keep the old files, i.e. just don't update.
  3. Replace the old files with new ones ignoring any local changes.

Maybe this option can then be configured for each course seperately

Yeah I like this a lot. Maybe even an option whether to move the old file to a new place or keep it and simply place the incoming file into a different place. I think both have their merit, think e.g. the "Recently opened" tab of programs. conflict_resolution=preserve|overwrite|move-old|move-new

@D-VR
Copy link
Collaborator

D-VR commented Jun 6, 2025

Yeah I like this a lot. Maybe even an option whether to move the old file to a new place or keep it and simply place the incoming file into a different place. I think both have their merit, think e.g. the "Recently opened" tab of programs. conflict_resolution=preserve|overwrite|move-old|move-new

I think having an option for moving the incoming file would cause some headaches for us when doing state management, since now we have to also manage any number of new files which may get written to disk, and keep track of their states as well.

I think having just the 3 options I proposed would be more than fine, maybe that fourth option can be added in a future PR if someone really wants/needs it.

@D-VR
Copy link
Collaborator

D-VR commented Jun 6, 2025

I have also fixed the main branch so that the linting checks now run through (flake8 can only be sensible addressed with a major future refactor)

@imjuzcy imjuzcy force-pushed the feat-update-files branch from e82c5c1 to f8596a1 Compare June 6, 2025 11:38
@D-VR D-VR self-assigned this Jun 6, 2025
@D-VR D-VR added the enhancement New feature or request label Jun 6, 2025
@D-VR D-VR self-requested a review June 6, 2025 13:39
@D-VR D-VR removed their assignment Jun 6, 2025
@D-VR
Copy link
Collaborator

D-VR commented Jun 6, 2025

Just to write it down here, proposed solution to the local modification overwrite avoidance (integrating septatrix's proposal from #77):

1. Make `mtime` of local file match `timemodified` from Moodle on download

2. On the next run, if `timemodified` in current root is not the same as old root, compare `mtime` from local file and `timemodified` from old root for local modification since last run before redownloading

This might be the best of both worlds, since without the old root, we wouldn't know if differing mtime and current root's timemodified means if there are only modification on Moodle or there are also modification locally.

Have you considered also storing the ETag when getting a file?
As far as I can tell, it should be the SHA1 hash of the file content:
https://github.com/moodle/moodle/blob/9223e346c3e7ffc4b11199c0bb3836c09063f1b0/admin/tool/brickfield/classes/manager.php#L207

If we store that in the root, we can accurately detect local file changes (without relying on timestaps/bytes). Also, if we see that the timestamp has changed on the server, we can request it with HEAD and check the etag against local sha1 hash to see if it has actually changed, and only then download.

And if a client is doing this for the first time, i.e. has no cached root, but does have files downloaded, we can use the HEAD etag to compare local files vs moodle without redownloading everything.

@imjuzcy
Copy link
Author

imjuzcy commented Jun 6, 2025

What do you mean with current and old root?

Old root is the root_node from a previous run, in this PR it is dumped into a file and compared to on the next run to determine which file has changed on the server.

Have you considered also storing the ETag when getting a file?
Yes, previously septatrix suggested ETag too, but we will need to do a network request for every file, which means a lot of overhead. For the first run when we don't have a cached root_node it's fine, but for every run it will be cumbersome. That's why I opted for a local comparison with root_nodes.

I'm implementing using ETag to compare when there is no cached root_node, and before download when timemodified from server has changed. I don't see any point of storing the ETag, we can just calculate the SHA1 for local files, without having to manage a database.

@D-VR
Copy link
Collaborator

D-VR commented Jun 6, 2025

Yes, previously septatrix suggested ETag too, but we will need to do a network request for every file, which means a lot of overhead. For the first run when we don't have a cached root_node it's fine, but for every run it will be cumbersome. That's why I opted for a local comparison with root_nodes.

I'm implementing using ETag to compare when there is no cached root_node, and before download when timemodified from server has changed. I don't see any point of storing the ETag, we can just calculate the SHA1 for local files, without having to manage a database.

But shouldn't we store the file hash/ ETag from the last completed file download/update in the root cache to be able to detect if a file was locally modified so we don't delete userdata when doing a file update?

@imjuzcy
Copy link
Author

imjuzcy commented Jun 6, 2025

But shouldn't we store the file hash/ ETag from the last completed file download/update in the root cache to be able to detect if a file was locally modified so we don't delete userdata when doing a file update?

I believe when getting the content of a course, the ETags of the files are not sent. So we need to do a HEAD request for every file on every run, regardless of whether the file has changed or not. Therefore I think it's better if we use mtime to check for local modification.

These are the metadata that we have, without doing additional network requests than what is currently implemented:

{'author': 'Authur, Author',
 'filename': 'ThisIsPDF.pdf',
 'filepath': '/',
 'filesize': 23526457,
 'fileurl': 'https://moodle.rwth-aachen.de/webservice/pluginfile.php/412441/mod_folder/content/77/ThisIsPDF.pdf?forcedownload=1',
 'isexternalfile': False,
 'license': 'allrightsreserved',
 'mimetype': 'application/pdf',
 'sortorder': 0,
 'timecreated': 1748168741,
 'timemodified': 1748255118,
 'type': 'file',
 'userid': 412424}

@imjuzcy
Copy link
Author

imjuzcy commented Jun 6, 2025

Oh wait, I think I misunderstood. You mean when downloading a file, we save the ETag and use it to detect local changes on next run. We could do that, too I think. That would be safer than just using mtime.

@septatrix
Copy link
Collaborator

Oh wait, I think I misunderstood. You mean when downloading a file, we save the ETag and use it to detect local changes on next run. We could do that, too I think. That would be safer than just using mtime.

Nothing guarantees that it will remain the SHA1 hash, they could change that at any point, have done so in the past (IIRC it was MD5 before), and there have been talks in their bug tracker about changing it in the future (IIRC to SHA256)

@septatrix
Copy link
Collaborator

Let's just start with mtime, if we notice that more complex cache busting is needed we can still iterate on that later

@D-VR
Copy link
Collaborator

D-VR commented Jun 9, 2025

Oh wait, I think I misunderstood. You mean when downloading a file, we save the ETag and use it to detect local changes on next run. We could do that, too I think. That would be safer than just using mtime.

Nothing guarantees that it will remain the SHA1 hash, they could change that at any point, have done so in the past (IIRC it was MD5 before), and there have been talks in their bug tracker about changing it in the future (IIRC to SHA256)

That's true, but even if they change the hash algorithm, it's not a big issue for us, if we maintain a local cache along with the corresponding local files, so we're not solely dependent on the ETag.

That said, we should implement a check to detect if the hash algorithm has changed and raise an explicit error if it does.

For syncmymoodle, this wouldn't significantly impact updates. We can rely on the local cached state to determine whether files were modified locally since the last sync. We can then issue HEAD requests to fetch the current ETags (to get the new file hashes), compare them to hashes computed from the local files, and detect any remote changes.

After then updating the cached local state with the new hashes, we'll remain in sync with the updated Moodle version.

Keeping track of hashes is mostly good for noticing local changes and not destroying data 😅

Maybe we also talking past each other, I agree that mtime is a good signal to check if maybe we have a remote update we should pull.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants