Open-source implementation slice for ArchIToken Office/PDF runtime boundaries.
This repository pins the current production method:
- Office native online editing uses Collabora Online as an isolated WOPI service.
- ArchIToken remains the WOPI host and owns source bytes, permissions, locks, versions, audit and PutFile save-back.
- PDF viewing stays source-bound and browser-native by default.
- PDF editing and processing use an independent Stirling-PDF Docker sidecar through worker/API calls.
- PaddleOCR, PaddleOCR-VL and PP-StructureV3 produce OCR/layout evidence only.
- MinerU is the document-intelligence parser for Markdown/JSON/OCR/table/formula/RAG artifacts, not an editor.
- OnlyOffice is kept as an explicit fallback or licensed deployment route, not the default dependency.
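
The boundaries above can be read as a routing table from capability to runtime. The following is a minimal sketch, not the repository's code: the `Route` type, the capability keys and the `route_for` helper are hypothetical names chosen for illustration. Only the runtime names and their roles come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    runtime: str          # which runtime handles the capability
    is_editor: bool       # whether the runtime is used for editing
    default: bool = True  # False for explicit fallback routes


# Hypothetical capability keys; runtimes and roles follow the list above.
ROUTES = {
    "office.edit": Route("collabora-online", is_editor=True),
    "office.edit.fallback": Route("onlyoffice", is_editor=True, default=False),
    "pdf.view": Route("browser-native", is_editor=False),
    "pdf.process": Route("stirling-pdf", is_editor=True),
    "ocr.evidence": Route("paddleocr", is_editor=False),
    "document.parse": Route("mineru", is_editor=False),
}


def route_for(capability: str) -> Route:
    """Return the route for a capability, refusing anything unmapped."""
    try:
        return ROUTES[capability]
    except KeyError:
        raise ValueError(f"no runtime is allowed to handle {capability!r}") from None
```

Rejecting unmapped capabilities rather than falling back to a default keeps a new document operation from silently reaching a runtime that was never meant to handle it.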
- 02-architecture/CONSTITUTION.md records the non-negotiable Office/PDF/OCR boundary.
- docs/ADAPTER_SOURCE_MAP.md records upstream projects, license boundaries and format routes.
- 06-workers/README.md records the local sidecar profiles and worker commands.
Browser PDF viewer
-> /api/local-files/:fileId/pdf-operation
-> architoken_workers.worker_cli --adapter stirling_pdf
-> STIRLING_PDF_URL sidecar
-> real PDF/ZIP/JSON artifact or blocked/failed
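
The PDF flow above ends in one of three outcomes: a real artifact, blocked, or failed. A minimal sketch of that contract follows. The `PdfOutcome` type, the `classify` helper and the rule that a missing `STIRLING_PDF_URL` counts as blocked are assumptions made for illustration, not the worker's documented behaviour.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

ARTIFACT_KINDS = {"pdf", "zip", "json"}  # artifact types named in the flow above


@dataclass(frozen=True)
class PdfOutcome:
    status: str                          # "artifact", "blocked" or "failed"
    artifact_kind: Optional[str] = None  # set only when status is "artifact"
    reason: Optional[str] = None         # set only for "blocked" and "failed"


def classify(env: Mapping[str, str], artifact_kind: Optional[str],
             error: Optional[str] = None) -> PdfOutcome:
    """Map a sidecar call result onto the three outcomes (hypothetical rules)."""
    # Assumption: without a configured sidecar URL the job is blocked, not failed.
    if not env.get("STIRLING_PDF_URL"):
        return PdfOutcome("blocked", reason="STIRLING_PDF_URL is not set")
    if error is not None:
        return PdfOutcome("failed", reason=error)
    if artifact_kind not in ARTIFACT_KINDS:
        return PdfOutcome("failed", reason=f"unexpected artifact: {artifact_kind!r}")
    return PdfOutcome("artifact", artifact_kind=artifact_kind)
```
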
Browser Office editor
-> /api/local-files/:fileId/office-session
-> ArchIToken WOPI host
-> Collabora Online sidecar
-> WOPI PutFile save-back to source object
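
Because ArchIToken is the WOPI host, save-back is the point where it enforces locks and versions. The sketch below shows the lock check that WOPI PutFile calls for (a mismatched lock is answered with 409 Conflict and the current lock in `X-WOPI-Lock`) against an in-memory store. `InMemoryWopiHost`, `StoredFile` and the integer version counter are hypothetical, and permissions and audit are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class StoredFile:
    data: bytes
    lock: Optional[str] = None  # current WOPI lock ID, if any
    version: int = 1            # hypothetical monotonically increasing version


class InMemoryWopiHost:
    """Toy stand-in for the host side of PutFile; not the real implementation."""

    def __init__(self) -> None:
        self.files: Dict[str, StoredFile] = {}

    def put_file(self, file_id: str, lock: Optional[str],
                 body: bytes) -> Tuple[int, Dict[str, str]]:
        """Return (HTTP status, response headers) for a PutFile request."""
        stored = self.files.get(file_id)
        if stored is None:
            return 404, {}
        if stored.lock is None:
            # Unlocked files may only be overwritten while they are still empty.
            if len(stored.data) != 0:
                return 409, {"X-WOPI-Lock": ""}
        elif stored.lock != lock:
            # Lock mismatch: report the lock that currently holds the file.
            return 409, {"X-WOPI-Lock": stored.lock}
        stored.data = body
        stored.version += 1
        return 200, {"X-WOPI-ItemVersion": str(stored.version)}
```

A rejected PutFile leaves both the bytes and the version untouched, so the source object only changes when the editor holds the current lock.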
docker compose -f 05-infra/docker/docker-compose.yml --profile pdf up -d stirling-pdf

cd 06-workers
STIRLING_PDF_URL=http://127.0.0.1:8083 \
uv run python -m architoken_workers.worker_cli \
--adapter stirling_pdf \
--job examples/stirling-pdf-merge-job.json

PADDLE_PDX_MODEL_SOURCE=bos \
PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=true \
uv run python -m architoken_workers.worker_cli \
--adapter paddleocr \
--job examples/paddleocr-pdf-ocr-job.json

This repository follows the ArchIToken dual-license baseline: Apache-2.0 or MIT, at your option.
Third-party runtimes such as Collabora Online, Stirling-PDF, PaddleOCR and MinerU keep their own licenses and must stay behind the documented service/worker boundaries.