Feature/39 box integration#85
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a Box.com integration intended to upload files from MEDIA_ROOT/uploads and introduces environment-based configuration for Box authentication.
Changes:
- Adds a background worker (
archive/media_upload.py) to scanMEDIA_ROOT/uploadsand upload media files to Box, mirroring folder structure. - Adds Box-related environment configuration and a custom logging formatter in Django settings.
- Updates dependency/configuration files (
requirements.txt,flake.nix,.env.example) to support the integration.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
flake.nix |
Adds a custom Box SDK build and includes Box/python-dotenv in the dev shell. |
bfd9000_web/requirements.txt |
Adds Box + dotenv dependencies (and type stubs). |
bfd9000_web/archive/media_upload.py |
Implements the Box upload worker, folder creation, caching, and cleanup logic. |
bfd9000_web/archive/apps.py |
Starts the background upload thread from AppConfig.ready() (runserver-only). |
bfd9000_web/BFD9000/settings.py |
Loads .env, adds Box env vars, adds a pretty/colored logging configuration, and adds a Box env check. |
bfd9000_web/.env.example |
Provides a new example environment file including Box configuration. |
bfd9000_web/dot-env.example |
Removes the previous example env file. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def upload_file(file_path: Path) -> bool: | ||
| """ | ||
| Upload a file to Box.com, creating folder structure as needed, and deleting the file if it already exists. | ||
|
|
||
| The folder structure in Box will mirror the structure relative to MEDIA_ROOT. |
There was a problem hiding this comment.
New Box upload logic (folder creation, conflict handling, deletion/pruning) is introduced here but no tests are added. Since archive/tests/ already exists with pytest, please add unit tests that mock the Box client and cover at least the 409-conflict delete-and-retry path and successful upload cleanup.
There was a problem hiding this comment.
[claude-sonnet-4-6] This was not addressed — no tests were added in the diff and there was no response to this comment. Still required.
| django-stubs>=4.2,<5.0 | ||
| djangorestframework-stubs>=3.14,<4.0 |
There was a problem hiding this comment.
django-stubs and djangorestframework-stubs are type-checking/dev-time packages; adding them to requirements.txt makes them production/runtime dependencies and duplicates the purpose of requirements-dev.txt. Move these to requirements-dev.txt (or another dev-only requirements file) and keep requirements.txt limited to runtime deps.
| django-stubs>=4.2,<5.0 | |
| djangorestframework-stubs>=3.14,<4.0 |
There was a problem hiding this comment.
i removed them but this one is kinda weird. Do we want a requirements-dev.txt?
There was a problem hiding this comment.
@aspiringLich Of course, whiy not ? requiremetns-dev are tools required for developmetn, like linting tools and stubs. By the way, we use Pylance, which does not usee *stubs, it uses the other ones, so we shouldn't need *stubs here. Look into this.
|
[claude-sonnet-4-6] [REQUIRED] Issue #43 ("Develop Initial Storage Layer") is not fulfilled by this implementation This PR is tagged as closing both #39 and #43. The Box upload worker satisfies the spirit of #39, but #43 specifies a different design:
Required abilities per #43:
What this PR delivers instead:
Two acceptable paths forward:
|
| from pathlib import Path | ||
| from typing import IO, List | ||
|
|
||
| from box_sdk_gen import BoxAPIError, BoxClient, BoxJWTAuth, CreateFolderParent, FileBaseTypeField, FolderBaseTypeField, JWTConfig, WebLinkBaseTypeField |
There was a problem hiding this comment.
[claude-sonnet-4-6] [MEDIUM] No tests for the new upload/retry/cleanup logic
archive/tests/ already exists with a pytest setup, but this entire new module has no test coverage. Copilot flagged this with no response from the author.
Minimum cases needed:
- Happy-path upload →
source_fileupdated tobox://<id>in the DB - 409-conflict → delete-and-retry path succeeds on second attempt
- Exhausted retries (3× 409) → returns
False, local file not deleted prune_empty_directory→ removes empty parents, stops atMEDIA_ROOT_get_item_by_name→ positive and negative cache hits avoid extra API callsmedia_upload_worker→ exits cleanly when env vars are missing
|
[claude-sonnet-4-6] Review summary — please check all threads before merging Several comments in this PR may be hard to spot because they are anchored to outdated diff positions and hidden behind collapsed "Show outdated" banners in the Files Changed tab. This comment summarizes all open findings for visibility. Blockers
Medium
Minor
Low
Architecture
|
|
Resolved all reviews; updated initial description |
|
|
||
| # ── Abstract interface ──────────────────────────────────────────────────────── | ||
|
|
||
| class StorageBackend(ABC): |
There was a problem hiding this comment.
Medium: this is closer to issue #43 than the earlier version, but it still does not match the contract that issue describes even though the PR body says it closes #43. The issue requires a record/endpoint abstraction (upload(record, destination), list(endpoint, filter) -> List[Record], download(record, from) -> Record), while this ABC is still path/link based and returns raw streams/strings. archive.media_upload also still hardcodes BoxStorageBackend() directly instead of depending on the abstraction for uploads. Unless the PR description is narrowed back to only #39, the scope mismatch is still real.
There was a problem hiding this comment.
The issue isn't that precise. E.g. you can figure out the backedn type from the URI prefix. So do that. And then use URI types instead of str for typing. That will clean up the module.
[nitpick] if you feel like it, you can refactor the module into a package, and have each ABC and implementation as a different module.
| def media_upload_worker(): | ||
| from BFD9000.settings import BOX_DEVELOPER_TOKEN, BOX_FOLDER_ID, BOX_JWT_CONFIG_FILE | ||
|
|
||
| if not BOX_DEVELOPER_TOKEN and not BOX_JWT_CONFIG_FILE: |
There was a problem hiding this comment.
High: this startup guard is stricter than the backend auth logic below. archive.storage._get_box_client() supports OAuth with BOX_OAUTH_CLIENT_ID + BOX_OAUTH_CLIENT_SECRET, and the new compose/example docs advertise that as a supported configuration, but the worker exits here unless a developer token or JWT config is present. In an OAuth-only deployment, downloads can authenticate but the background uploader never starts, so files remain on local disk indefinitely.
| with open(file_path, "rb") as f: | ||
| link = BoxStorageBackend().upload(f, str(relative_path)) | ||
|
|
||
| qs.update(source_file=link) |
There was a problem hiding this comment.
High: this rewrites DigitalRecord.source_file from a local media path to a box://... link, but the rest of the API still serializes that field as if it were a local FileField. DigitalRecordSerializer.get_image_url() still returns obj.source_file.url and get_file_size() still reads obj.source_file.size, so after archival Django advertises /media/box%3A/123 as the image URL and size lookup falls back to None/FileNotFoundError. I verified that behavior locally with source_file.name = "box://123". The new image() view handles Box-backed files, but list/detail responses will still expose broken metadata unless the serializer is updated too.
| if not media_root.exists(): | ||
| return 0 | ||
|
|
||
| image_extensions = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"} |
There was a problem hiding this comment.
Medium: this worker only archives a small image-extension allowlist, but the rest of the app still accepts other source_file types. In particular, scan_tiff_preview() accepts both .tif and .tiff, and DigitalRecord.source_file is still documented as supporting PNG/TIFF/STL. Records uploaded as .tif or .stl will never be archived or cleaned up by this job.
| # ── Factory functions ───────────────────────────────────────────────────────── | ||
|
|
||
| def get_backend(link: str) -> StorageBackend: | ||
| """Return the backend that owns *link* (for download by known link).""" |
There was a problem hiding this comment.
in the comment add that the proper backend is chosed from the URI prefix "ex: box://"
|
To authenticate: Make a Box OAuth 2.0 app, copy the client and secret over, and go to Changed the file saving so now it generates a uuid for each file. |
Upload files from MEDIA_ROOT/uploads to box in a background task. Adds Storage backends with the following methods (the implementation differs slightly from #43):
upload(file, path: str) -> uri: strlist(path: str) -> uris: list[str]download(uri: str) -> (file, filename: str)The Box backend uses caching to avoid repeated redundant api calls.
Note that I provide options to use OAuth, jwt, and developer token authentication. If the envar
BOX_DEVELOPER_TOKENexists it uses that, otherwise it tries to authenticate with jwt or OAuth.Also adds logging support to console with pretty print for nice colors.
closes #39
closes #43