Implement pipeline decomposition (Opus / Sonnet / Haiku stages) #11

@cpitzi

Summary

Decompose the monolithic auditor, which currently runs as a single Opus session, into a multi-stage pipeline that assigns each task to the model best suited to it.

See docs/architecture.md for the full design document.

Stages

  1. Ingestion + Parsing (Haiku) — Parse references into structured JSON
  2. Procedural Verification (Sonnet + web search) — DOI resolution, PubMed queries, retraction checks
  3. Forensic Interpretation (Opus) — Heuristic application, risk classification, adversarial reasoning
  4. Report Generation (Haiku) — HTML report assembly
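To make the stage boundaries concrete, here is a minimal sketch of the "simple Python script" orchestration option, assuming each stage is a plain function and each boundary is a typed record. All class and function names are hypothetical, the fields are illustrative rather than the agreed schema, and the stub stages stand in for the actual Haiku, Sonnet, and Opus calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ParsedReference:
    """Stage 1 (Haiku) output: one reference as structured fields (illustrative)."""
    ref_id: str
    raw: str
    doi: Optional[str] = None
    title: Optional[str] = None


@dataclass
class VerificationResult:
    """Stage 2 (Sonnet + web search) output for one reference (illustrative)."""
    ref_id: str
    complete: bool  # False if verification could not finish for this reference
    notes: list[str] = field(default_factory=list)


@dataclass
class ForensicFinding:
    """Stage 3 (Opus) output: risk classification for one reference (illustrative)."""
    ref_id: str
    risk: str
    rationale: str


def run_pipeline(
    raw_text: str,
    parse: Callable[[str], list[ParsedReference]],
    verify: Callable[[list[ParsedReference]], list[VerificationResult]],
    interpret: Callable[[list[ParsedReference], list[VerificationResult]], list[ForensicFinding]],
    render: Callable[[list[ParsedReference], list[VerificationResult], list[ForensicFinding]], str],
) -> str:
    """Chain the four stages; every boundary is a plain typed value that can be tested alone."""
    parsed = parse(raw_text)                   # Stage 1
    verified = verify(parsed)                  # Stage 2
    findings = interpret(parsed, verified)     # Stage 3 consumes Stage 1 + 2
    return render(parsed, verified, findings)  # Stage 4 consumes all prior stages


# Hypothetical stub stages standing in for the model calls (local testing only).
def stub_parse(text: str) -> list[ParsedReference]:
    lines = [line for line in text.splitlines() if line.strip()]
    return [ParsedReference(ref_id=f"r{i}", raw=line) for i, line in enumerate(lines, 1)]


def stub_verify(refs: list[ParsedReference]) -> list[VerificationResult]:
    return [VerificationResult(ref_id=r.ref_id, complete=True) for r in refs]


def stub_interpret(refs, results) -> list[ForensicFinding]:
    return [ForensicFinding(ref_id=v.ref_id, risk="low", rationale="stub") for v in results]


def stub_render(refs, results, findings) -> str:
    items = "".join(f"<li>{f.ref_id}: {f.risk}</li>" for f in findings)
    return f"<ul>{items}</ul>"


report = run_pipeline("Ref A\nRef B", stub_parse, stub_verify, stub_interpret, stub_render)
```

Because each stage is just a callable, any one of them can be swapped for a stub in tests, which is one way to meet the "independently testable" criterion.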

Key Design Decisions Needed

  • Orchestration layer: Simple Python script vs. API tool use vs. workflow engine?
  • Schema contracts: Define JSON schemas for each stage boundary
  • Parallelization: Per-reference in Stage 2, or batched?
  • Error handling: Partial Stage 2 failures — how does Stage 3 handle incomplete data?
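One possible answer to the parallelization and error-handling questions, sketched under the assumption that Stage 2 runs once per reference: fan the calls out over a thread pool and turn any per-reference failure into an explicit incomplete record instead of dropping it. Stage 3 then always receives one entry per reference and can see exactly which ones lack verification data. `verify_all` and `fake_verify` are hypothetical names; `fake_verify` is a placeholder for the real Sonnet + web search call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def verify_all(
    ref_ids: list[str],
    verify_one: Callable[[str], dict[str, Any]],
    max_workers: int = 8,
) -> list[dict[str, Any]]:
    """Run Stage 2 once per reference, concurrently.

    A failure for one reference becomes an explicit incomplete record instead of
    aborting the run or silently dropping the reference. Output order matches ref_ids.
    """
    def safe_verify(ref_id: str) -> dict[str, Any]:
        try:
            return {"ref_id": ref_id, "complete": True, "data": verify_one(ref_id)}
        except Exception as exc:  # e.g. timeouts, HTTP errors, unparseable model output
            return {"ref_id": ref_id, "complete": False, "error": repr(exc)}

    # Threads suit this workload because each call mostly waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe_verify, ref_ids))  # map() preserves input order


def fake_verify(ref_id: str) -> dict[str, Any]:
    """Stand-in for the real Stage 2 call; fails for one reference to show degradation."""
    if ref_id == "r2":
        raise TimeoutError("PubMed query timed out")
    return {"doi_resolves": True}


results = verify_all(["r1", "r2", "r3"], fake_verify)
incomplete = [r["ref_id"] for r in results if not r["complete"]]
```

Batching several references into one call would likely cut per-call overhead, but one failure would then mark the whole batch incomplete; the per-reference form keeps the failure boundary at a single reference.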

Implementation Phases

  1. Define and test JSON schemas for inter-stage data
  2. Implement Stage 1 (Haiku parsing) as standalone prompt
  3. Implement Stage 2 (Sonnet verification) as standalone prompt
  4. Implement Stage 3 (Opus forensics) consuming Stage 1+2 output
  5. Implement Stage 4 (Haiku report generation) consuming all prior stages
  6. Build orchestration script to chain stages
  7. Benchmark cost-per-run vs. monolithic v3
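For phase 1, a stage-boundary contract could start as a small schema plus a validator that each stage's tests run against its output. The sketch below is a hand-rolled, standard-library-only check for Stage 1 output; a JSON Schema validator could replace it later. The field names (`ref_id`, `raw`, `doi`, `title`, `year`) and `validate_stage1` are illustrative assumptions, not a settled schema.

```python
import json

# Hypothetical Stage 1 -> Stage 2 contract: field name -> (allowed types, required?)
STAGE1_REFERENCE_SCHEMA = {
    "ref_id": ((str,), True),
    "raw": ((str,), True),
    "doi": ((str, type(None)), False),
    "title": ((str, type(None)), False),
    "year": ((int, type(None)), False),
}


def validate_stage1(payload: str) -> list[str]:
    """Return contract violations for a Stage 1 JSON payload (empty list means valid)."""
    try:
        refs = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(refs, list):
        return ["top level must be a JSON array of references"]

    errors = []
    for i, ref in enumerate(refs):
        if not isinstance(ref, dict):
            errors.append(f"[{i}] must be an object")
            continue
        for name, (types, required) in STAGE1_REFERENCE_SCHEMA.items():
            if name not in ref:
                if required:
                    errors.append(f"[{i}] missing required field '{name}'")
            # bool is a subclass of int in Python, so reject it explicitly
            elif not isinstance(ref[name], types) or isinstance(ref[name], bool):
                errors.append(f"[{i}] field '{name}' has type {type(ref[name]).__name__}")
        for name in ref.keys() - STAGE1_REFERENCE_SCHEMA.keys():
            errors.append(f"[{i}] unexpected field '{name}'")
    return errors
```

Returning every violation, rather than stopping at the first, makes regression failures against the committed test sets easier to diagnose.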

Acceptance Criteria

  • Each stage independently testable with defined input/output schemas
  • Full pipeline matches the output quality of monolithic v3
  • Cost-per-run reduced by ≥40% vs. monolithic Opus
  • Processing time comparable or better (parallelization in Stage 2 should help)
  • Graceful degradation when Stage 2 verification is incomplete
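The ≥40% cost criterion could be checked mechanically once per-stage token counts are logged from benchmark runs. The sketch below computes run cost from token counts and per-million-token prices; `StageUsage`, `run_cost`, and `meets_cost_target` are hypothetical names, and the caller must supply current published prices, since none are hard-coded here.

```python
from dataclasses import dataclass


@dataclass
class StageUsage:
    """Token usage for one model call or stage, taken from benchmark logs."""
    model: str
    input_tokens: int
    output_tokens: int


def run_cost(usages: list[StageUsage], prices: dict[str, tuple[float, float]]) -> float:
    """Total cost of one run; prices maps model -> (input price/Mtok, output price/Mtok)."""
    total = 0.0
    for u in usages:
        price_in, price_out = prices[u.model]
        total += u.input_tokens / 1e6 * price_in + u.output_tokens / 1e6 * price_out
    return total


def meets_cost_target(pipeline: float, monolithic: float, min_reduction: float = 0.40) -> bool:
    """True if the pipeline run costs at least min_reduction less than the monolithic run."""
    return pipeline <= monolithic * (1 - min_reduction)
```

To avoid judging on a single lucky run, the comparison would probably be made on mean cost across the committed test sets rather than on one document.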

Dependencies

  • v3 prompt stable (baseline for comparison)
  • Test sets committed for regression testing

Metadata

    Labels

    architecture: Pipeline and system design
    v4: Planned for v4