Skip to content

Mirror SCORM zips to course contentstore for OLX export/import#115

Closed
Syed-Ali-Abbas-568 wants to merge 7 commits into
overhangio:masterfrom
Syed-Ali-Abbas-568:syed-ali/contentstore-sync-branch-updated
Closed

Mirror SCORM zips to course contentstore for OLX export/import#115
Syed-Ali-Abbas-568 wants to merge 7 commits into
overhangio:masterfrom
Syed-Ali-Abbas-568:syed-ali/contentstore-sync-branch-updated

Conversation

@Syed-Ali-Abbas-568
Copy link
Copy Markdown
Collaborator

Summary

Open edX's OLX export only bundles assets stored in the course contentstore, but SCORM zips are unpacked into Django's default_storage. As a result, an exported course re-imported into a new course keeps the SCORM block's metadata but the package fails to play — the unpacked files are absent from the new course's storage.

This PR adds an opt-in path that mirrors each uploaded zip into the course contentstore (which IS bundled in OLX exports) and rehydrates the unpacked tree from the contentstore on first access after an import. It composes cleanly with the per-block extraction path from PR #71 and is disabled by default — legacy behavior is unchanged unless an operator opts in.

Fixes #108.

What changed

  • On upload (studio_submit): mirror the original zip into the current course's contentstore at scorm_packages/<sha1>.zip (locked).
  • On read (index_page_url, assets_proxy): if package_meta is set but the extracted tree is missing from default_storage, fetch the zip from the contentstore and re-extract.
  • GC on upload: sweep scorm_packages/<sha1>.zip assets that no SCORM block on either the draft or published branch references, and delete them. Caps the contentstore at ≤1 published + ≤1 draft zip per block.
  • All contentstore calls are wrapped in try/except and xmodule imports are lazy, so uploads still succeed in environments where the contentstore is unavailable (tests, non-platform installs).

How to enable

XBLOCK_SETTINGS["ScormXBlock"] = {
    "CONTENTSTORE_SYNC_ENABLED": True,
}

Syed-Ali-Abbas-568 and others added 6 commits May 4, 2026 11:33
Open edX's OLX export bundles contentstore assets into the export tarball,
but SCORM zips are unpacked into Django's default_storage and never make it
into the bundle. As a result, a course exported and re-imported into a new
course keeps the SCORM block's metadata but not its content. PR overhangio#71 made
this more visible by keying the extraction directory on sha1(usage_key)
instead of block_id; that fix prevented cross-course corruption but left
imported blocks pointing at an empty directory.

Add an opt-in sync that closes the gap without fighting PR overhangio#71:

- studio_submit: after extract_package, mirror the original zip into the
  current course's contentstore as scorm_packages/<sha1>.zip (locked).
- index_page_url and assets_proxy: before serving content, if package_meta
  is set but the unpacked tree is missing from default_storage, fetch the
  zip from the current course's contentstore and re-extract it.

Gated behind XBLOCK_SETTINGS["ScormXBlock"]["CONTENTSTORE_SYNC_ENABLED"],
defaulting to False; legacy behavior is unchanged unless an operator opts
in. Imports of xmodule.contentstore are lazy and every contentstore call
is wrapped in try/except so uploads still succeed when the contentstore
is unavailable (tests, non-platform environments).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The contentstore mirror added in 3ff38b2 keeps every uploaded zip
at scorm_packages/<package_sha1>.zip. Re-uploading produces a new
sha1, so repeated edits leave the previous zips behind. Older zips
are intentionally retained until they are no longer referenced (a
published block whose draft has moved ahead still needs its old
zip), but nothing reaped them once they were truly unreferenced.

Hook the cleanup into studio_submit itself: after saving the new
zip, walk the course's SCORM blocks across both draft and published
branches, union their package_meta sha1s, and delete any
scorm_packages/<sha1>.zip the union doesn't cover. The current
upload's sha1 is always pinned into the reference set so the
just-saved zip is never touched, even before the new package_meta
is persisted by the modulestore. Pattern is strict 40-hex sha1
under scorm_packages/, so unrelated contentstore assets are never
inspected.

Keeps the package a pure XBlock — no Django app registration, no
new management command, no operator action required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_gc_unreferenced_contentstore_zips runs inside studio_submit before
the XBlock framework persists the new package_meta to the modulestore,
so store.get_items(..., revision=draft_only) returned the block being
edited with its previous sha1. That stale sha1 stayed in the referenced
set alongside the new sha1, and repeated draft uploads accumulated
zips in the contentstore.

Substitute the in-memory draft sha1 for the current block when
collecting referenced sha1s so the stale value drops out and the
older draft zip gets reclaimed. Per block, the contentstore now
retains at most one published + one draft scorm_packages/<sha1>.zip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread README.rst
Course export/import (contentstore sync)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, SCORM zips are unpacked into Django's ``default_storage``, which
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahmed-arb Thoughts on keeping this turned off by default?

Comment thread openedxscorm/scormxblock.py Outdated
Comment thread openedxscorm/scormxblock.py Outdated
asset_key = StaticContent.compute_location(
course_key, f"scorm_packages/{sha1}.zip"
)
try:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: lets try to avoid nested try excepts

)
for revision_name, revision in (
("published", ModuleStoreEnum.RevisionOption.published_only),
("draft", ModuleStoreEnum.RevisionOption.draft_only),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really cool approach. Lets also document this for future reference.

Copy link
Copy Markdown
Contributor

@Danyal-Faheem Danyal-Faheem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exceptional work! I just have a few minor comments.

@ahmed-arb
Copy link
Copy Markdown
Contributor

Thanks for tackling this @Syed-Ali-Abbas-568. The implementation here is very clever.

That said, I want to raise an architectural concern before this merges. The fix asks every site operator to opt into a permanent, ongoing cost (two writes per upload, a read-time check on every page load, a GC sweep on every save, plus the risk of default_storage and the contentstore drifting out of sync) in order to make a single, infrequent event (OLX export/import) work correctly. And operators have to know to flip the flag before they ever need to export. In practice many won't realize they needed it until an export silently produces a broken course on re-import, which is the same failure mode #108 describes today, just with an extra step in the postmortem.

I think we're patching the wrong layer. Open edX's OLX exporter is the component that knows it's bundling a course and walks its blocks, so that's the natural place to be SCORM-aware. If the exporter pulled the package out of default_storage and bundled it at export time (and restored it to default_storage on import), operators wouldn't need to configure anything, the cost would be paid only at export where it belongs, and there'd be no continuous mirror to keep consistent. The xblock would stay a pure xblock and not have to reach into modulestore/contentstore internals.

I realize that's a heavier lift because it requires changes in edx-platform rather than a self-contained xblock change, and this PR is something that can ship today. But before we commit to a continuous-sync code path here as the long-term answer, has anyone raised this with Open edX core? Framed as "the platform's export pipeline doesn't fully support an xblock that ships in the standard distribution," it seems like the kind of thing that would get traction upstream. At minimum I'd want that discussion to have happened before we lock this in. Even if we end up merging this as a stopgap, knowing whether it's a stopgap or the permanent answer matters for what we're willing to live with.

@Syed-Ali-Abbas-568 Syed-Ali-Abbas-568 moved this from Pending Triage to Won't fix in Tutor project management May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Won't fix

Development

Successfully merging this pull request may close these issues.

Export/Import and the SCORM XBlock

4 participants