Skip to content

Feature/39 box integration#85

Open
aspiringLich wants to merge 26 commits into
developfrom
feature/39-box-integration
Open

Feature/39 box integration#85
aspiringLich wants to merge 26 commits into
developfrom
feature/39-box-integration

Conversation

@aspiringLich
Copy link
Copy Markdown
Contributor

@aspiringLich aspiringLich commented Mar 9, 2026

Upload files from MEDIA_ROOT/uploads to box in a background task. Adds Storage backends with the following methods (the implementation differs slightly from #43):

  • upload(file, path: str) -> uri: str
  • list(path: str) -> uris: list[str]
  • download(uri: str) -> (file, filename: str)

The Box backend uses caching to avoid repeated redundant api calls.

Note that I provide options to use OAuth, jwt, and developer token authentication. If the envar BOX_DEVELOPER_TOKEN exists it uses that, otherwise it tries to authenticate with jwt or OAuth.

Also adds logging support to console with pretty print for nice colors.

closes #39
closes #43

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Box.com integration intended to upload files from MEDIA_ROOT/uploads and introduces environment-based configuration for Box authentication.

Changes:

  • Adds a background worker (archive/media_upload.py) to scan MEDIA_ROOT/uploads and upload media files to Box, mirroring folder structure.
  • Adds Box-related environment configuration and a custom logging formatter in Django settings.
  • Updates dependency/configuration files (requirements.txt, flake.nix, .env.example) to support the integration.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
flake.nix Adds a custom Box SDK build and includes Box/python-dotenv in the dev shell.
bfd9000_web/requirements.txt Adds Box + dotenv dependencies (and type stubs).
bfd9000_web/archive/media_upload.py Implements the Box upload worker, folder creation, caching, and cleanup logic.
bfd9000_web/archive/apps.py Starts the background upload thread from AppConfig.ready() (runserver-only).
bfd9000_web/BFD9000/settings.py Loads .env, adds Box env vars, adds a pretty/colored logging configuration, and adds a Box env check.
bfd9000_web/.env.example Provides a new example environment file including Box configuration.
bfd9000_web/dot-env.example Removes the previous example env file.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread bfd9000_web/BFD9000/settings.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment on lines +108 to +112
def upload_file(file_path: Path) -> bool:
"""
Upload a file to Box.com, creating folder structure as needed, and deleting the file if it already exists.

The folder structure in Box will mirror the structure relative to MEDIA_ROOT.
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New Box upload logic (folder creation, conflict handling, deletion/pruning) is introduced here but no tests are added. Since archive/tests/ already exists with pytest, please add unit tests that mock the Box client and cover at least the 409-conflict delete-and-retry path and successful upload cleanup.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-sonnet-4-6] This was not addressed — no tests were added in the diff and there was no response to this comment. Still required.

Comment thread bfd9000_web/requirements.txt Outdated
Comment on lines +12 to +13
django-stubs>=4.2,<5.0
djangorestframework-stubs>=3.14,<4.0
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

django-stubs and djangorestframework-stubs are type-checking/dev-time packages; adding them to requirements.txt makes them production/runtime dependencies and duplicates the purpose of requirements-dev.txt. Move these to requirements-dev.txt (or another dev-only requirements file) and keep requirements.txt limited to runtime deps.

Suggested change
django-stubs>=4.2,<5.0
djangorestframework-stubs>=3.14,<4.0

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed them but this one is kinda weird. Do we want a requirements-dev.txt?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aspiringLich Of course, whiy not ? requiremetns-dev are tools required for developmetn, like linting tools and stubs. By the way, we use Pylance, which does not usee *stubs, it uses the other ones, so we shouldn't need *stubs here. Look into this.

Comment thread flake.nix
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/apps.py Outdated
Comment thread bfd9000_web/BFD9000/settings.py
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/BFD9000/settings.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/apps.py Outdated
Comment thread bfd9000_web/requirements.txt Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/requirements.txt Outdated
@zgypa
Copy link
Copy Markdown
Member

zgypa commented Apr 4, 2026

[claude-sonnet-4-6] [REQUIRED] Issue #43 ("Develop Initial Storage Layer") is not fulfilled by this implementation

This PR is tagged as closing both #39 and #43. The Box upload worker satisfies the spirit of #39, but #43 specifies a different design:

With ABC interface, and ability for multiple implementations (DICOM node, box, drive, local storage/smb…). First implementation should have capabilities for box. So #39 should be using this.

Required abilities per #43:

  • upload(record: Record, destination: Endpoint)
  • list(endpoint: Endpoint, filter) -> List[Record]
  • download(record: Record, from: Endpoint) -> Record

What this PR delivers instead:

  • A hardcoded Box background worker with no ABC and no pathway to other implementations without rewriting the module.
  • upload is implicit (background file scanner), not a callable interface method.
  • list is not implemented at all.
  • download_file(box_file_id: str) -> (IO[bytes], str) does not match the required signature and returns a stream, not a Record.

Two acceptable paths forward:

  1. Refactor media_upload.py around an ABC storage interface before merge, so that Develop Initial Storage Layer #43 is genuinely closed.
  2. Explicitly scope this PR to only close Box Integration #39, remove Develop Initial Storage Layer #43 from the closing list, and open a follow-up issue/PR for the ABC layer.

Comment thread bfd9000_web/requirements.txt Outdated
Comment thread flake.nix Outdated
Comment thread bfd9000_web/archive/apps.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/media_upload.py Outdated
from pathlib import Path
from typing import IO, List

from box_sdk_gen import BoxAPIError, BoxClient, BoxJWTAuth, CreateFolderParent, FileBaseTypeField, FolderBaseTypeField, JWTConfig, WebLinkBaseTypeField
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[claude-sonnet-4-6] [MEDIUM] No tests for the new upload/retry/cleanup logic

archive/tests/ already exists with a pytest setup, but this entire new module has no test coverage. Copilot flagged this with no response from the author.

Minimum cases needed:

  • Happy-path upload → source_file updated to box://<id> in the DB
  • 409-conflict → delete-and-retry path succeeds on second attempt
  • Exhausted retries (3× 409) → returns False, local file not deleted
  • prune_empty_directory → removes empty parents, stops at MEDIA_ROOT
  • _get_item_by_name → positive and negative cache hits avoid extra API calls
  • media_upload_worker → exits cleanly when env vars are missing

Comment thread bfd9000_web/archive/media_upload.py Outdated
Comment thread bfd9000_web/archive/views.py Outdated
Comment thread bfd9000_web/BFD9000/settings.py
Comment thread bfd9000_web/.env.example
@zgypa
Copy link
Copy Markdown
Member

zgypa commented Apr 5, 2026

[claude-sonnet-4-6] Review summary — please check all threads before merging

Several comments in this PR may be hard to spot because they are anchored to outdated diff positions and hidden behind collapsed "Show outdated" banners in the Files Changed tab. This comment summarizes all open findings for visibility.


Blockers

  • requirements.txt line 13 + flake.nix lines 24–59 — Wrong Box SDK package. The code imports box_sdk_gen but both files install boxsdk (the old, incompatible SDK). The author replied "switched to box-sdk-gen" in the flake.nix thread but the change was never applied in the diff. A clean install will fail with ModuleNotFoundError.
  • apps.py line 32 — _is_reloader_process is misleadingly named. The double-negation at the call site is a maintenance hazard. Rename to _is_main_server_process and flip the return value.

Medium

  • media_upload.py line 41 — Auth order contradicts PR description (JWT wins over developer token, should be the other way around).
  • media_upload.py (whole file) — No tests added for any of the new upload/retry/cleanup logic. The Copilot thread on this received no response.

Minor

  • media_upload.py line 193 — File is opened before the NotImplementedError size check. zgypa flagged this; no response, not fixed in diff.
  • views.py line 349 — import mimetypes inside a function body; should be top-level.

Low

  • settings.py (PrettyFormatter) — ANSI codes emitted unconditionally; breaks log aggregation in Docker/production. Copilot thread received no response.
  • .env.exampledot-env.example was removed but developer docs/scripts referencing it were not updated. Copilot thread received no response.

Architecture

Copy link
Copy Markdown
Member

@zgypa zgypa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@aspiringLich
Copy link
Copy Markdown
Contributor Author

Resolved all reviews; updated initial description


# ── Abstract interface ────────────────────────────────────────────────────────

class StorageBackend(ABC):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: this is closer to issue #43 than the earlier version, but it still does not match the contract that issue describes even though the PR body says it closes #43. The issue requires a record/endpoint abstraction (upload(record, destination), list(endpoint, filter) -> List[Record], download(record, from) -> Record), while this ABC is still path/link based and returns raw streams/strings. archive.media_upload also still hardcodes BoxStorageBackend() directly instead of depending on the abstraction for uploads. Unless the PR description is narrowed back to only #39, the scope mismatch is still real.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue isn't that precise. E.g. you can figure out the backedn type from the URI prefix. So do that. And then use URI types instead of str for typing. That will clean up the module.

[nitpick] if you feel like it, you can refactor the module into a package, and have each ABC and implementation as a different module.

Comment thread bfd9000_web/archive/media_upload.py Outdated
def media_upload_worker():
from BFD9000.settings import BOX_DEVELOPER_TOKEN, BOX_FOLDER_ID, BOX_JWT_CONFIG_FILE

if not BOX_DEVELOPER_TOKEN and not BOX_JWT_CONFIG_FILE:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High: this startup guard is stricter than the backend auth logic below. archive.storage._get_box_client() supports OAuth with BOX_OAUTH_CLIENT_ID + BOX_OAUTH_CLIENT_SECRET, and the new compose/example docs advertise that as a supported configuration, but the worker exits here unless a developer token or JWT config is present. In an OAuth-only deployment, downloads can authenticate but the background uploader never starts, so files remain on local disk indefinitely.

with open(file_path, "rb") as f:
link = BoxStorageBackend().upload(f, str(relative_path))

qs.update(source_file=link)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High: this rewrites DigitalRecord.source_file from a local media path to a box://... link, but the rest of the API still serializes that field as if it were a local FileField. DigitalRecordSerializer.get_image_url() still returns obj.source_file.url and get_file_size() still reads obj.source_file.size, so after archival Django advertises /media/box%3A/123 as the image URL and size lookup falls back to None/FileNotFoundError. I verified that behavior locally with source_file.name = "box://123". The new image() view handles Box-backed files, but list/detail responses will still expose broken metadata unless the serializer is updated too.

if not media_root.exists():
return 0

image_extensions = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium: this worker only archives a small image-extension allowlist, but the rest of the app still accepts other source_file types. In particular, scan_tiff_preview() accepts both .tif and .tiff, and DigitalRecord.source_file is still documented as supporting PNG/TIFF/STL. Records uploaded as .tif or .stl will never be archived or cleaned up by this job.

Comment thread bfd9000_web/archive/storage.py Outdated
# ── Factory functions ─────────────────────────────────────────────────────────

def get_backend(link: str) -> StorageBackend:
"""Return the backend that owns *link* (for download by known link)."""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the comment add that the proper backend is chosed from the URI prefix "ex: box://"

@aspiringLich
Copy link
Copy Markdown
Contributor Author

To authenticate: Make a Box OAuth 2.0 app, copy the client and secret over, and go to /box/oauth/start. Sign in with box, you will be redirected and then its all good.

Changed the file saving so now it generates a uuid for each file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants