ActiveInAI/Pan-office-and-pdf

Pan Office and PDF

Open-source implementation slice for ArchIToken Office/PDF runtime boundaries.

This repository pins down the current production method:

  • Office native online editing uses Collabora Online as an isolated WOPI service.
  • ArchIToken remains the WOPI host and owns source bytes, permissions, locks, versions, audit and PutFile save-back.
  • PDF viewing stays source-bound and browser-native by default.
  • PDF editing and processing use an independent Stirling-PDF Docker sidecar through worker/API calls.
  • PaddleOCR, PaddleOCR-VL and PP-StructureV3 produce OCR/layout evidence only.
  • MinerU is the document-intelligence parser for Markdown/JSON/OCR/table/formula/RAG artifacts, not an editor.
  • OnlyOffice is kept as an explicit fallback or licensed deployment route, not the default dependency.
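The routing rules in the list above can be read as a small decision table. The following is a minimal sketch of that reading, not the repository's actual code: the `Runtime` enum, the `route` function, the intent names and the extension set are all hypothetical, and the authoritative format routes are the ones recorded in docs/ADAPTER_SOURCE_MAP.md.

```python
from enum import Enum


class Runtime(Enum):
    """Hypothetical names for the default runtimes listed above."""
    COLLABORA_WOPI = "collabora_wopi"
    BROWSER_PDF_VIEWER = "browser_pdf_viewer"
    STIRLING_PDF = "stirling_pdf"
    PADDLEOCR = "paddleocr"
    MINERU = "mineru"


# Assumed extension set, for illustration only.
OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".odt", ".ods", ".odp"}


def route(extension: str, intent: str) -> Runtime:
    """Pick the default runtime for a file extension and a user intent."""
    ext = extension.lower()
    if intent == "edit" and ext in OFFICE_EXTENSIONS:
        return Runtime.COLLABORA_WOPI          # Office editing goes through WOPI
    if ext == ".pdf" and intent == "view":
        return Runtime.BROWSER_PDF_VIEWER      # viewing stays browser-native
    if ext == ".pdf" and intent in {"edit", "process"}:
        return Runtime.STIRLING_PDF            # editing/processing uses the sidecar
    if intent == "ocr":
        return Runtime.PADDLEOCR               # evidence only, never an editor
    if intent == "parse":
        return Runtime.MINERU                  # parser artifacts, never an editor
    # OnlyOffice is deliberately absent: it is a fallback, not a default route.
    raise ValueError(f"no default route for intent {intent!r} on {ext!r}")
```

Keeping OnlyOffice out of the table mirrors the last bullet: a fallback route would be selected by explicit deployment configuration rather than by default dispatch.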

Canonical Contracts

  • 02-architecture/CONSTITUTION.md records the non-negotiable Office/PDF/OCR boundary.
  • docs/ADAPTER_SOURCE_MAP.md records upstream projects, license boundaries and format routes.
  • 06-workers/README.md records the local sidecar profiles and worker commands.

Runtime Topology

Browser PDF viewer
  -> /api/local-files/:fileId/pdf-operation
  -> architoken_workers.worker_cli --adapter stirling_pdf
  -> STIRLING_PDF_URL sidecar
  -> real PDF/ZIP/JSON artifact or blocked/failed
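The last step of the PDF flow above ends in one of three outcomes: a real artifact, blocked, or failed. The sketch below shows one way such a classification could look. It is an illustration under assumptions: the `Outcome` type and `classify` function are hypothetical, and treating an unconfigured `STIRLING_PDF_URL` as "blocked" is a guess at what that state means, not something the repository states.

```python
from dataclasses import dataclass
from typing import Optional

# Media types accepted as real artifacts (PDF, ZIP or JSON, as in the flow above).
ARTIFACT_KINDS = {
    "application/pdf": "pdf",
    "application/zip": "zip",
    "application/json": "json",
}


@dataclass(frozen=True)
class Outcome:
    status: str                   # "artifact", "blocked" or "failed"
    kind: Optional[str] = None    # set only when status == "artifact"
    reason: Optional[str] = None  # set only when status != "artifact"


def classify(sidecar_url: Optional[str], http_status: Optional[int],
             content_type: str = "", body: bytes = b"") -> Outcome:
    """Map a sidecar response to an outcome (hypothetical helper)."""
    if not sidecar_url:
        # Assumption: "blocked" means the sidecar is not configured at all.
        return Outcome("blocked", reason="STIRLING_PDF_URL is not configured")
    if http_status is None:
        return Outcome("failed", reason="sidecar unreachable")
    if not 200 <= http_status < 300:
        return Outcome("failed", reason=f"sidecar returned HTTP {http_status}")
    media_type = content_type.split(";", 1)[0].strip().lower()
    kind = ARTIFACT_KINDS.get(media_type)
    if kind is None or not body:
        # Never report success without real bytes of a known type.
        return Outcome("failed", reason=f"no usable artifact ({media_type or 'no type'})")
    return Outcome("artifact", kind=kind)
```

The point of the design is that a successful status never appears without real bytes of a recognized type, so a placeholder result cannot be mistaken for an artifact.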

Browser Office editor
  -> /api/local-files/:fileId/office-session
  -> ArchIToken WOPI host
  -> Collabora Online sidecar
  -> WOPI PutFile save-back to source object
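The save-back step in the Office flow above can be illustrated with a toy, in-memory WOPI host. This is a simplified sketch, not ArchIToken's implementation: `SourceObject`, `WopiHost` and their methods are hypothetical names. The only protocol detail relied on is that a WOPI host answers a lock mismatch with HTTP 409. Real hosts handle more cases (access tokens, lock refresh and expiry, size limits), and whether locks are exercised at all depends on what the host advertises to the editor.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SourceObject:
    """Host-owned state for one file: bytes, lock, versions, audit trail."""
    data: bytes
    lock_id: Optional[str] = None
    versions: List[bytes] = field(default_factory=list)
    audit: List[Tuple[str, str]] = field(default_factory=list)


class WopiHost:
    """Toy host: the editor never touches storage, it only calls these methods."""

    def __init__(self) -> None:
        self.files: Dict[str, SourceObject] = {}

    def lock(self, file_id: str, lock_id: str, user: str) -> int:
        obj = self.files[file_id]
        if obj.lock_id not in (None, lock_id):
            return 409  # someone else holds the lock
        obj.lock_id = lock_id
        obj.audit.append((user, "LOCK"))
        return 200

    def put_file(self, file_id: str, body: bytes, lock_id: str,
                 user: str, can_write: bool) -> int:
        obj = self.files[file_id]
        if not can_write:
            # Permission is decided by the host, never by the editor.
            raise PermissionError(f"{user} may not write {file_id}")
        if obj.lock_id != lock_id:
            obj.audit.append((user, "PUT_REJECTED"))
            return 409  # lock mismatch: source bytes stay untouched
        obj.versions.append(obj.data)  # keep the previous version
        obj.data = body                # save back to the source object
        obj.audit.append((user, "PUT"))
        return 200
```

A short usage example: after `host.files["f1"] = SourceObject(b"v1")` and `host.lock("f1", "L1", "alice")`, a `put_file` call carrying a different lock id returns 409 and leaves the bytes unchanged, while one carrying `"L1"` replaces the bytes and pushes `b"v1"` onto `versions`.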

Local Smoke

docker compose -f 05-infra/docker/docker-compose.yml --profile pdf up -d stirling-pdf

cd 06-workers
STIRLING_PDF_URL=http://127.0.0.1:8083 \
uv run python -m architoken_workers.worker_cli \
  --adapter stirling_pdf \
  --job examples/stirling-pdf-merge-job.json

PADDLE_PDX_MODEL_SOURCE=bos \
PADDLE_PDX_DISABLE_MODEL_SOURCE_CHECK=true \
uv run python -m architoken_workers.worker_cli \
  --adapter paddleocr \
  --job examples/paddleocr-pdf-ocr-job.json
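When the smoke commands above are driven from a script instead of a shell, the same invocation can be assembled in Python. The helper below is a hypothetical convenience wrapper, not part of the repository. It only reproduces the command line and environment variables shown above and makes no assumption about the job file format.

```python
import os
from typing import Dict, List, Tuple


def build_worker_command(adapter: str, job: str,
                         extra_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one worker_cli smoke run (hypothetical helper)."""
    argv = [
        "uv", "run", "python", "-m", "architoken_workers.worker_cli",
        "--adapter", adapter,
        "--job", job,
    ]
    env = dict(os.environ)   # copy, so the caller's environment is not mutated
    env.update(extra_env)    # per-run overrides such as STIRLING_PDF_URL
    return argv, env


# Mirrors the Stirling-PDF smoke command shown above.
stirling_argv, stirling_env = build_worker_command(
    "stirling_pdf",
    "examples/stirling-pdf-merge-job.json",
    {"STIRLING_PDF_URL": "http://127.0.0.1:8083"},
)
```

To execute it, pass the pair to `subprocess.run(stirling_argv, env=stirling_env, cwd="06-workers", check=True)`, so that a non-zero exit from the worker raises instead of passing silently.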

License

This repository follows the ArchIToken dual-license baseline: Apache-2.0 or MIT, at your option.

Third-party runtimes such as Collabora Online, Stirling-PDF, PaddleOCR and MinerU keep their own licenses and must stay behind the documented service/worker boundaries.
