Agentic PDF interaction #244

@kien-ship-it

Adeu Engine Integration Analysis

Background

The goal is to integrate the Adeu modification engine (an automated DOCX redlining engine) into the LaunchStack application. LaunchStack is a modular microservice application built with Next.js, Node.js, and Inngest for agentic runtimes. The objective is to extract the DOCX editing and ingestion pipeline from Adeu and run it as a runtime feature in LaunchStack, ideally with zero additional OPEX (operational expenditure).

Core Challenge

Adeu is heavily dependent on the Python ecosystem, specifically:

  • python-docx: For parsing and writing OOXML Document ASTs.
  • lxml: For deep XML manipulation.
  • diff-match-patch: For generating text diffs.

LaunchStack runs on Node.js/TypeScript. The Node.js ecosystem lacks a robust 1:1 equivalent for deep, programmatic DOCX AST redlining. Existing libraries (like mammoth or docxtemplater) are designed primarily for reading, converting, or templating, rather than surgical AST modifications.
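
To make the diffing dependency concrete, here is a minimal sketch of the kind of word-level diff a redlining engine would turn into tracked changes. It uses Python's standard-library difflib as a stand-in for diff-match-patch (an assumption, made so the example runs without extra packages; Adeu's actual diff calls are not shown in this issue), and `word_diff` is a hypothetical helper name.

```python
import difflib


def word_diff(original: str, revised: str) -> list[tuple[str, str]]:
    """Return (operation, text) pairs describing how to turn original into revised.

    Hypothetical helper: difflib stands in for diff-match-patch here.
    """
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    ops: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("equal", " ".join(a[i1:i2])))
            continue
        # "replace" is emitted as a delete followed by an insert,
        # which maps naturally onto tracked deletions and insertions.
        if i2 > i1:
            ops.append(("delete", " ".join(a[i1:i2])))
        if j2 > j1:
            ops.append(("insert", " ".join(b[j1:j2])))
    return ops


if __name__ == "__main__":
    for op, text in word_diff("The fee is 100 dollars", "The fee is 250 dollars"):
        print(op, repr(text))
```

Each delete/insert pair would then need to be written into the document XML as a tracked change, which is the part that has no ready-made Node.js equivalent.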

Integration Pathways

1. Vercel Python Serverless Functions (Recommended for Zero Fixed OPEX)

If LaunchStack is deployed on Vercel, we can leverage Vercel's ability to run Python serverless functions natively within the Next.js monorepo.

  • Architecture: Move the core Adeu Python logic into the api/ (or api/python/) directory of the Next.js app, along with a requirements.txt.
  • Inngest Integration: Inngest agents (running in Node.js) make internal HTTP calls to the Vercel Python serverless route (/api/adeu-modify).
  • Pros:
    • Zero fixed OPEX (uses existing Vercel compute allocation).
    • No separate infrastructure to manage.
    • Keeps the codebase relatively unified.
  • Cons:
    • Cold starts (initial requests may take a few seconds).
    • Execution timeouts (Vercel limits execution time depending on the plan, e.g., 10s to 60s).
    • Deployment size limits (must keep Python dependencies under Vercel's 250MB limit).
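
As a rough sketch of Option 1, a Python file under api/ could expose the engine through a `handler` class built on the standard-library `BaseHTTPRequestHandler`, which is the shape Vercel's Python runtime accepts. The file name (api/adeu-modify.py), the JSON payload shape, and `apply_edits` are all assumptions for illustration; `apply_edits` is a placeholder where the real Adeu call would go.

```python
# Assumed location: api/adeu-modify.py (would serve /api/adeu-modify)
import json
from http.server import BaseHTTPRequestHandler


def apply_edits(payload: dict) -> dict:
    """Hypothetical placeholder for the Adeu engine call.

    A real implementation would load the DOCX and apply the edits;
    this stub only reports how many edits were received.
    """
    edits = payload.get("edits", [])
    return {"status": "ok", "applied": len(edits)}


class handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            if not isinstance(payload, dict):
                raise ValueError("payload must be a JSON object")
            status, body = 200, apply_edits(payload)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError
            status, body = 400, {"status": "error", "detail": "invalid payload"}

        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

Keeping the engine call in a plain function, separate from the HTTP handler, would also make it easier to move the same logic behind a dedicated service later if timeouts become a problem.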

Put in Docker

2. Full TypeScript Rewrite (Native "Pure Code")

Rewriting the Adeu engine into TypeScript to run natively within LaunchStack's existing Node.js runtime.

  • Architecture: Rebuild the AST parsing and diffing logic using TS libraries like jszip (to unzip .docx), fast-xml-parser or xmldom (for XML manipulation), and the JS port of diff-match-patch.
  • Inngest Integration: Runs directly as a standard TypeScript Inngest step.
  • Pros:
    • True native integration.
    • No cold starts or external HTTP boundaries.
    • Zero external infrastructure dependency.
  • Cons:
    • Extremely High Engineering Cost: Replicating python-docx's abstraction of Word's complex OOXML schema (styles, runs, paragraphs, namespaces) from scratch is a massive undertaking.

3. Dedicated Microservice or MCP Server (Traditional)

Hosting Adeu as a standalone Python service.

  • Architecture: Wrap Adeu in a FastAPI/Flask application or an MCP (Model Context Protocol) server.
  • Inngest Integration: LaunchStack agents communicate with the external service over HTTP or MCP.
  • Pros:
    • Clean separation of concerns.
    • No Vercel timeout limits (ideal for massive documents).
  • Cons:
    • Introduces OPEX (requires hosting on Render, AWS, Railway, etc.).
    • Additional infrastructure to monitor and maintain.

Conclusion

If the strict requirement is no OPEX overhead, the Vercel Python Serverless approach is the most pragmatic solution. It avoids the monumental engineering effort of a TypeScript rewrite while keeping hosting costs bundled within the existing Vercel plan. If document processing takes longer than Vercel's serverless timeout limits, a dedicated microservice (Option 3) becomes unavoidable.

Labels

enhancement: New feature or request
