Mirror SCORM zips to course contentstore for OLX export/import #115
Open edX's OLX export bundles contentstore assets into the export tarball, but SCORM zips are unpacked into Django's default_storage and never make it into the bundle. As a result, a course exported and re-imported into a new course keeps the SCORM block's metadata but not its content.

PR overhangio#71 made this more visible by keying the extraction directory on sha1(usage_key) instead of block_id; that fix prevented cross-course corruption but left imported blocks pointing at an empty directory.

Add an opt-in sync that closes the gap without fighting PR overhangio#71:

- studio_submit: after extract_package, mirror the original zip into the current course's contentstore as scorm_packages/<sha1>.zip (locked).
- index_page_url and assets_proxy: before serving content, if package_meta is set but the unpacked tree is missing from default_storage, fetch the zip from the current course's contentstore and re-extract it.

Gated behind XBLOCK_SETTINGS["ScormXBlock"]["CONTENTSTORE_SYNC_ENABLED"], defaulting to False; legacy behavior is unchanged unless an operator opts in.

Imports of xmodule.contentstore are lazy and every contentstore call is wrapped in try/except so uploads still succeed when the contentstore is unavailable (tests, non-platform environments).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
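The mirror-and-rehydrate flow described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: plain dicts stand in for default_storage and the contentstore, the helper names (`sync_enabled`, `mirror_zip`, `ensure_extracted`) are hypothetical, and it assumes the package sha1 is the SHA-1 of the zip bytes.

```python
import hashlib
import io
import zipfile


def sync_enabled(xblock_settings):
    """Read the opt-in flag; anything missing means the feature is off."""
    scorm_settings = xblock_settings.get("ScormXBlock", {})
    return bool(scorm_settings.get("CONTENTSTORE_SYNC_ENABLED", False))


def mirror_zip(contentstore, zip_bytes):
    """Store the original zip under scorm_packages/<sha1>.zip.

    `contentstore` is a plain dict here (hypothetical stand-in).
    Assumes the sha1 is computed over the zip bytes.
    """
    sha1 = hashlib.sha1(zip_bytes).hexdigest()
    contentstore[f"scorm_packages/{sha1}.zip"] = zip_bytes
    return sha1


def ensure_extracted(storage, contentstore, extract_dir, sha1):
    """Re-extract the package if its unpacked tree is missing.

    Returns True when files were rehydrated, False otherwise.
    """
    prefix = extract_dir.rstrip("/") + "/"
    if any(name.startswith(prefix) for name in storage):
        return False  # tree already present, nothing to do
    zip_bytes = contentstore.get(f"scorm_packages/{sha1}.zip")
    if zip_bytes is None:
        return False  # no mirrored zip to recover from
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            storage[prefix + info.filename] = archive.read(info)
    return True
```

In this sketch a second call to `ensure_extracted` is a no-op, which mirrors the intent that the read-time check only does real work on first access after an import.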
The contentstore mirror added in 3ff38b2 keeps every uploaded zip at scorm_packages/<package_sha1>.zip. Re-uploading produces a new sha1, so repeated edits leave the previous zips behind. Older zips are intentionally retained until they are no longer referenced (a published block whose draft has moved ahead still needs its old zip), but nothing reaped them once they were truly unreferenced.

Hook the cleanup into studio_submit itself: after saving the new zip, walk the course's SCORM blocks across both draft and published branches, union their package_meta sha1s, and delete any scorm_packages/<sha1>.zip the union doesn't cover. The current upload's sha1 is always pinned into the reference set so the just-saved zip is never touched, even before the new package_meta is persisted by the modulestore. Pattern is strict 40-hex sha1 under scorm_packages/, so unrelated contentstore assets are never inspected.

Keeps the package a pure XBlock: no Django app registration, no new management command, no operator action required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
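The selection rule in this commit (union of referenced sha1s, current upload pinned, strict 40-hex pattern) can be illustrated with a small pure-Python sketch. The function name and the list-of-names input are hypothetical; the real code works against the contentstore and modulestore rather than plain collections.

```python
import re

# Only names of the form scorm_packages/<40 lowercase hex chars>.zip qualify.
SCORM_ZIP_RE = re.compile(r"scorm_packages/([0-9a-f]{40})\.zip")


def find_unreferenced_zips(asset_names, published_sha1s, draft_sha1s, current_sha1):
    """Return the mirrored zips that no block references any more.

    The sha1 of the upload in progress is always treated as referenced,
    so the zip that was just saved can never be selected for deletion.
    """
    referenced = set(published_sha1s) | set(draft_sha1s) | {current_sha1}
    stale = []
    for name in asset_names:
        match = SCORM_ZIP_RE.fullmatch(name)
        if match is None:
            continue  # unrelated asset: never inspected further
        if match.group(1) not in referenced:
            stale.append(name)
    return stale
```

Using `fullmatch` keeps assets such as `scorm_packages/notes.zip` or anything outside `scorm_packages/` out of scope entirely, which is the property the commit message relies on.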
_gc_unreferenced_contentstore_zips runs inside studio_submit before the XBlock framework persists the new package_meta to the modulestore, so store.get_items(..., revision=draft_only) returned the block being edited with its previous sha1. That stale sha1 stayed in the referenced set alongside the new sha1, and repeated draft uploads accumulated zips in the contentstore.

Substitute the in-memory draft sha1 for the current block when collecting referenced sha1s so the stale value drops out and the older draft zip gets reclaimed. Per block, the contentstore now retains at most one published + one draft scorm_packages/<sha1>.zip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
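The substitution described here can be sketched as below. It is an illustration under assumptions: `draft_blocks` is a hypothetical mapping from usage key to the sha1 the modulestore still holds, standing in for what the draft query returns.

```python
def collect_draft_sha1s(draft_blocks, current_usage_key, current_sha1):
    """Collect referenced draft sha1s, preferring the in-memory value.

    For the block being edited, the stored sha1 is stale because the new
    package_meta has not been persisted yet, so the in-memory sha1 is
    used in its place.
    """
    sha1s = set()
    for usage_key, stored_sha1 in draft_blocks.items():
        sha1 = current_sha1 if usage_key == current_usage_key else stored_sha1
        if sha1:
            sha1s.add(sha1)
    return sha1s
```

With the stale value gone from the set, the previous draft zip is no longer referenced and becomes eligible for deletion, unless the published branch still points at it.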
Course export/import (contentstore sync)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, SCORM zips are unpacked into Django's ``default_storage``, which
@ahmed-arb Thoughts on keeping this turned off by default?
asset_key = StaticContent.compute_location(
    course_key, f"scorm_packages/{sha1}.zip"
)
try:
nit: let's try to avoid nested try/excepts
)
for revision_name, revision in (
    ("published", ModuleStoreEnum.RevisionOption.published_only),
    ("draft", ModuleStoreEnum.RevisionOption.draft_only),
This is a really cool approach. Let's also document this for future reference.
Danyal-Faheem
left a comment
Exceptional work! I just have a few minor comments.
Thanks for tackling this @Syed-Ali-Abbas-568. The implementation here is very clever. That said, I want to raise an architectural concern before this merges.

The fix asks every site operator to opt into a permanent, ongoing cost (two writes per upload, a read-time check on every page load, a GC sweep on every save, plus the risk of

I think we're patching the wrong layer. Open edX's OLX exporter is the component that knows it's bundling a course and walks its blocks, so that's the natural place to be SCORM-aware. If the exporter pulled the package out of

I realize that's a heavier lift because it requires changes in
Summary
Open edX's OLX export only bundles assets stored in the course contentstore, but SCORM zips are unpacked into Django's default_storage. As a result, an exported course re-imported into a new course keeps the SCORM block's metadata but the package fails to play, because the unpacked files are absent from the new course's storage.

This PR adds an opt-in path that mirrors each uploaded zip into the course contentstore (which IS bundled in OLX exports) and rehydrates the unpacked tree from the contentstore on first access after an import. It composes cleanly with the per-block extraction path from PR #71 and is disabled by default; legacy behavior is unchanged unless an operator opts in.
Fixes #108.
What changed
- studio_submit: mirror the original zip into the current course's contentstore at scorm_packages/<sha1>.zip (locked).
- index_page_url, assets_proxy: if package_meta is set but the extracted tree is missing from default_storage, fetch the zip from the contentstore and re-extract.
- Garbage collection: find scorm_packages/<sha1>.zip assets that no SCORM block on either the draft or published branch references, and delete them. Caps the contentstore at ≤1 published + ≤1 draft zip per block.
- Every contentstore call is wrapped in try/except and xmodule imports are lazy, so uploads still succeed in environments where the contentstore is unavailable (tests, non-platform installs).

How to enable