diff --git a/docs/phases/PHASES.md b/docs/phases/PHASES.md index b925bef..2f480a8 100644 --- a/docs/phases/PHASES.md +++ b/docs/phases/PHASES.md @@ -2,220 +2,553 @@ This document defines the roadmap for **OpenTrust as the official OpenClaw memory layer standard**. -It intentionally replaces the split narrative between: -- “OpenTrust as a standalone product idea” -- “OpenTrust as a prototype to discard” - -The aligned plan is: -- OpenTrust is the **reference memory layer** for OpenClaw -- the current implementation is the **completed baseline** -- future phases deepen it into a world-class storage, retrieval, insight, and traceability system - -## Completed baseline checklist - -### Phase 1 — Foundation shell -- [x] private OpenKnots repo -- [x] Next.js dashboard shell -- [x] local-first product framing -- [x] initial SQLite schema blueprint -- [x] capability model for skills, plugins, souls, and bundles -- [x] architecture and UX docs - -### Phase 2 — Local runtime foundation -- [x] real SQLite runtime layer -- [x] schema bootstrap and migration path -- [x] capability sync from the local OpenClaw environment -- [x] homepage driven by local DB state -- [x] real-data-only mode with seeded evidence removed - -### Phase 3 — Real evidence ingestion -- [x] OpenClaw session ingestion from local JSONL transcripts -- [x] FTS-backed investigation search -- [x] imported session traces in UI -- [x] trace detail route -- [x] rendering/query split from ingestion -- [x] safe search highlighting -- [x] explicit ingestion commands - -### Phase 4 — Workflow clarity + ingestion state -- [x] cron/workflow ingestion from `~/.openclaw/cron/jobs.json` and `~/.openclaw/cron/runs/*.jsonl` -- [x] workflow source labeling in UI -- [x] ingestion state tracking -- [x] cursor and imported-count tracking for session + cron pipelines -- [x] secret-blocking pre-commit hook via Husky + Secretlint - -### Phase 5 — Artifact extraction -- [x] artifact extraction from imported traces and workflows -- [x] artifact registry population -- [x] trace-to-artifact linkage -- [x] workflow-to-artifact linkage -- [x] artifact display in overview and trace detail views - -### Phase 6 — Tool pairing and workflow surfaces -- [x] richer tool-call / tool-result pairing -- [x] workflow detail pages -- [x] artifact explorer page -- [x] saved investigations page -- [x] workflow and artifact navigation from the main dashboard - -### Phase 7 — Lineage and incremental ingestion -- [x] event lineage edges for parent/tool-result relationships -- [x] incremental-style ingestion behavior using ingestion state cursors -- [x] per-session and per-cron-job ingestion tracking - -### Phase 8 — Semantic indexing activation -- [x] semantic chunk staging for traces and artifacts -- [x] sqlite-vec activation via installed extension package -- [x] vector-ready semantic index status in UI -- [x] semantic indexing command -- [x] vector-backed fallback search path - -### Phase 8.5 — Desktop application scaffold -Goal: wrap OpenTrust in a native desktop application using Tauri v2. - -Primary design doc: -- `docs/DESKTOP-APPLICATION-PLAN.md` - -Includes: -- [ ] Tauri v2 project scaffold (`src-tauri/`) -- [ ] Next.js static export configuration -- [ ] Tauri window and platform configuration -- [ ] sidecar Node process for the memory runtime -- [ ] desktop-appropriate database path resolution (`$APPDATA` / `~/Library/Application Support`) -- [ ] basic macOS / Windows / Linux distributable builds -- [ ] system tray integration with health status -- [ ] native OS notifications for ingestion and health events - -This phase can proceed in parallel with Phases 9–14 since the Tauri scaffold is independent of memory-layer hardening. - -## Next aligned roadmap - -These are no longer “misc optional ideas.” -They are the next phases required to turn the baseline into the full OpenClaw memory standard. - -Desktop application packaging (Phase 8.5) can proceed in parallel with the phases below, since the Tauri scaffold is independent of memory-layer hardening. - -However, memory-layer expansion phases should be treated as **post-prerequisite execution work**. -The current upstream priority order is: -1. desktop application scaffold (Tauri — see Phase 8.5 and `docs/DESKTOP-APPLICATION-PLAN.md`) -2. persistence -3. reliable run completion - -Until those are dependable, OpenTrust should prioritize alignment, scope control, and baseline preservation over aggressive expansion. - -### Phase 9 — Memory writeback and curation -Goal: support durable promotion from raw event streams to curated memory. - -Primary design docs: -- `docs/MEMORY-API-CONTRACT.md` -- `docs/CURATED-MEMORY-DESIGN.md` -- `docs/PHASE-9-IMPLEMENTATION-PLAN.md` - -Includes: -- [ ] curated memory entities and retention labels -- [ ] promotion flows from traces/artifacts to long-term memory -- [ ] review metadata for promoted memory -- [ ] write contracts suitable for future OpenClaw memory APIs -- [ ] distinction between transient evidence and curated memory - -### Phase 10 — Retrieval quality and ranking -Goal: make storage useful by improving answer quality. - -Includes: -- [ ] stronger semantic embedding models and ranking quality -- [ ] retrieval-quality evaluation flows -- [ ] source-aware ranking and provenance-weighted retrieval -- [ ] confidence / uncertainty metadata in retrieval responses -- [ ] “why this result surfaced” explanations - -### Phase 11 — Insight and metrics layer -Goal: derive dynamic intelligence from the memory substrate. - -Includes: -- [ ] richer artifact normalization -- [ ] deeper workflow analytics -- [ ] event and workflow drift detection -- [ ] operator metrics for freshness, coverage, and reliability -- [ ] insight surfaces tied directly to evidence and provenance - -### Phase 12 — Memory health and operator trust -Goal: make memory reliability operationally visible. - -Includes: -- [ ] better operator health views for ingestion and semantic indexing -- [ ] stale-pipeline detection -- [ ] missing-index / degraded-search indicators -- [ ] backup, export, audit, and report tooling -- [ ] evidence preservation and reproducible report bundles - -### Phase 13 — Authoring and investigation power -Goal: let operators work directly with the memory layer. - -Includes: -- [ ] editing and saving custom investigations from the UI -- [ ] SQL / investigation studio surface -- [ ] richer capability explorer surface -- [ ] repeatable investigation presets and query workflows -- [ ] better comparative and timeline-driven analysis tools - -### Phase 14 — Official OpenClaw integration standard -Goal: expose OpenTrust cleanly as OpenClaw’s memory infrastructure. - -Primary design docs: -- `docs/MEMORY-API-CONTRACT.md` -- `docs/OPENCLAW-PLUGIN-PACKAGING.md` -- `docs/PLUGIN-READY-REFACTOR-PLAN.md` -- `docs/OPENCLAW-PLUGIN-DRAFT-ARTIFACTS.md` - -Includes: -- [ ] stable memory API contracts for OpenClaw retrieval -- [ ] agent-facing retrieval interface design -- [ ] memory writeback / promotion interface design -- [ ] integration with OpenClaw operator surfaces -- [ ] plugin-ready packaging plan under OpenClaw plugin conventions -- [ ] clear standards for source metadata, provenance, and retrieval response shape -- [ ] eventual first-class packaging as an OpenClaw plugin - -### Phase 15 — Desktop distribution and native experience -Goal: make the desktop application production-grade and fully distributable. - -Primary design doc: -- `docs/DESKTOP-APPLICATION-PLAN.md` - -Includes: -- [ ] auto-update via `tauri-plugin-updater` -- [ ] code signing for macOS (Developer ID) and Windows (Authenticode) -- [ ] macOS notarization -- [ ] CI pipeline for cross-platform builds (GitHub Actions) -- [ ] first-run onboarding flow (database setup, OpenClaw path detection) -- [ ] background ingestion service (tray-resident) -- [ ] scheduled ingestion (hourly/daily) -- [ ] native menu bar with keyboard shortcuts -- [ ] multi-window support (detach views) -- [ ] graceful migration path from localhost usage to desktop app - -## Evaluation - -### What is already strong -- The core local-first evidence platform is complete end-to-end: ingest, normalize, store, search, and inspect. -- The retrieval stack is strong for a local system: relational SQL, FTS5, and sqlite-vec are all present. -- The operational baseline is healthy: explicit ingestion commands, cursor-based incremental behavior, and secret scanning reduce risk. -- The current system is already a credible foundation for an official OpenClaw memory layer. - -### What now matters most -- Memory authoring and writeback are still missing. -- Retrieval quality needs to become measurable and tunable, not just enabled. -- Insight and metrics should be first-class derived products of the memory substrate. -- Memory health, export, and auditability must improve for operator trust. -- Integration contracts with OpenClaw should become explicit so this can serve as the official standard rather than only an internal prototype. +It is the single source of truth for what has been built, what remains, and what comes next. + +Last verified against the codebase: **2026-03-27** + +--- + +## How to read this document + +Each phase below reflects the actual state of the codebase — not aspirational plans. + +### Status labels + +- **COMPLETE** — fully implemented and verified in the codebase +- **SUBSTANTIALLY COMPLETE** — core functionality implemented; minor gaps noted inline +- **IN PROGRESS** — partial implementation exists +- **NOT STARTED** — no implementation exists + +### Item labels + +Each item within a phase is tagged: + +- `[required]` — must be done for the phase to be considered complete +- `[optional]` — valuable but not blocking; improves coverage or usability +- `[stretch]` — upgrades the application from operational to professional/industry-grade + +--- + +# COMPLETED PHASES + +## Phase 1 — Foundation shell `COMPLETE` + +Established the project skeleton, repo, and initial architecture. + +- [x] `[required]` Private OpenKnots repo +- [x] `[required]` Next.js dashboard shell +- [x] `[required]` Local-first product framing +- [x] `[required]` Initial SQLite schema blueprint +- [x] `[required]` Capability model for skills, plugins, souls, and bundles +- [x] `[required]` Architecture and UX docs + +--- + +## Phase 2 — Local runtime foundation `COMPLETE` + +Stood up the real database layer and removed all mock/seed data. + +- [x] `[required]` Real SQLite runtime via better-sqlite3 with WAL mode +- [x] `[required]` Schema bootstrap and idempotent migration path (`ensureMigrated()`) +- [x] `[required]` Capability sync from local OpenClaw environment (`~/.openclaw/`) +- [x] `[required]` Homepage driven by live DB state +- [x] `[required]` Real-data-only mode — seeded evidence removed + +--- + +## Phase 3 — Real evidence ingestion `COMPLETE` + +Connected OpenTrust to real OpenClaw session data on disk. + +- [x] `[required]` OpenClaw session ingestion from local JSONL transcripts (`~/.openclaw/agents/main/sessions/`) +- [x] `[required]` FTS5-backed investigation search with safe escaping +- [x] `[required]` Imported session traces in UI with detail routes +- [x] `[required]` Trace detail route with events, tool calls, artifacts +- [x] `[required]` Rendering/query split from ingestion +- [x] `[required]` Safe search highlighting +- [x] `[required]` Explicit ingestion commands (`pnpm ingest:openclaw`) + +--- + +## Phase 4 — Workflow clarity and ingestion state `COMPLETE` + +Added workflow/cron ingestion and durable ingestion tracking. + +- [x] `[required]` Cron/workflow ingestion from `~/.openclaw/cron/jobs.json` and runs JSONL +- [x] `[required]` Workflow source labeling in UI (scheduled vs on-demand) +- [x] `[required]` Ingestion state tracking with cursor-based deduplication +- [x] `[required]` Per-session and per-cron-job cursor and import-count tracking +- [x] `[required]` Secret-blocking pre-commit hook via Husky + Secretlint + +--- + +## Phase 5 — Artifact extraction `COMPLETE` + +Extracted structured artifacts from ingested traces and workflows. + +- [x] `[required]` Artifact extraction from imported traces and workflows (URLs, repos, file paths) +- [x] `[required]` Artifact registry with SHA1-based deterministic IDs +- [x] `[required]` Trace-to-artifact linkage via `trace_edges` +- [x] `[required]` Workflow-to-artifact linkage via `run_artifacts` +- [x] `[required]` Artifact display in overview and trace detail views +- [x] `[required]` Smart artifact grouping by kind and path (`artifact-groups.ts`) + +--- + +## Phase 6 — Tool pairing and workflow surfaces `COMPLETE` + +Built out the core navigation and explorer surfaces. + +- [x] `[required]` Richer tool-call / tool-result pairing with error detection +- [x] `[required]` Workflow detail pages with steps, artifacts, metadata +- [x] `[required]` Artifact explorer page with sort/filter (by kind, path, newest) +- [x] `[required]` Saved investigations page with live SQL preview +- [x] `[required]` Workflow and artifact navigation from the main dashboard + +--- + +## Phase 7 — Lineage and incremental ingestion `COMPLETE` + +Added provenance tracking and made ingestion incremental. + +- [x] `[required]` Event lineage edges for parent/tool-result relationships +- [x] `[required]` Incremental ingestion using ingestion state cursors +- [x] `[required]` Per-session and per-cron-job ingestion tracking + +--- + +## Phase 8 — Semantic indexing activation `COMPLETE` + +Activated vector search alongside existing FTS5. + +- [x] `[required]` Semantic chunk staging for traces and artifacts (1200-char chunks) +- [x] `[required]` sqlite-vec activation via installed extension package +- [x] `[required]` Vector-ready semantic index status tracking +- [x] `[required]` Semantic indexing command (`pnpm index:semantic`) +- [x] `[required]` Vector-backed search with graceful fallback to FTS5/LIKE +- [x] `[required]` Hybrid search pipeline: memory entries → FTS5 → semantic fallback + +--- + +## Phase 9 — Memory writeback and curation `SUBSTANTIALLY COMPLETE` + +Added the curated memory layer — the bridge between raw evidence and durable, reviewable knowledge. + +This was the largest single phase. It introduced first-class memory entities, promotion workflows, review states, retention policies, version tracking, and a full memory API contract. + +**Deviation from original plan:** The execution pipeline recommended deferring Phase 9 until upstream prerequisites (Tauri, persistence, reliable run completion) were stable. This guidance was bypassed — Phase 9 was implemented directly, and the result is solid. The gating advice in `EXECUTION-PIPELINE.md` should be considered superseded for this phase. + +### Schema and data model +- [x] `[required]` `memory_entries` table with 18 columns (kind, retention, review status, confidence, uncertainty, author, timestamps) +- [x] `[required]` `memory_entry_origins` table linking memory to source traces/artifacts/workflows +- [x] `[required]` `memory_entry_tags` table for classification +- [x] `[required]` `memory_entry_versions` table for full audit trail +- [x] `[required]` Runtime migration in `ensureMigrated()` (not file-based migration — see note below) + +### Runtime modules +- [x] `[required]` `memory-entries.ts` — full CRUD: create, list, get, update, archive, rollback +- [x] `[required]` `memory-api.ts` — contract-aligned functions: promote, inspect, search, health, export, working snapshot +- [x] `[required]` `memory-config.ts` — retention policies, archive policies, working snapshot token budget +- [x] `[required]` `api-contract.ts` — request validation, response envelopes, enum enforcement + +### Types +- [x] `[required]` `MemoryEntry`, `MemoryEntryOrigin`, `MemoryEntryTag` interfaces +- [x] `[required]` `MemoryEntryKind`: fact, summary, decision, preference, timeline, insight, note +- [x] `[required]` `MemoryRetentionClass`: ephemeral, working, longTerm, pinned +- [x] `[required]` `MemoryReviewStatus`: draft, reviewed, approved, rejected +- [x] `[required]` `MemorySourceType`: trace, event, workflowRun, workflowStep, artifact, savedInvestigation, memoryEntry, insight +- [x] `[required]` Search, promote, inspect, and health request/response types + +### API endpoints +- [x] `[required]` `POST /api/memory/promote` +- [x] `[required]` `GET/POST /api/memory/search` +- [x] `[required]` `GET/POST /api/memory/inspect` +- [x] `[required]` `GET/POST /api/memory/health` + +### UI surfaces +- [x] `[required]` `/memory` — curated memory explorer with filters (review status, retention class) +- [x] `[required]` `/memory/[id]` — memory detail with posture cards, body, origins, review controls +- [x] `[required]` `/memory/review` — review queue for draft entries with approve/reject/mark-reviewed actions +- [x] `[required]` `/memory/health` — health dashboard with signals (ingestion freshness, review debt, semantic coverage, weak provenance) +- [x] `[required]` Promote actions from traces (`/traces/[id]/promote`) +- [x] `[required]` Promote actions from workflows (`/workflows/[id]/promote`) +- [x] `[required]` Promote actions from artifacts (`/artifacts/promote`) +- [x] `[required]` Promote actions from investigations (`/investigations/promote`) + +### Memory lifecycle +- [x] `[required]` Version snapshots on every update (content and review state changes) +- [x] `[required]` Rollback to previous versions with audit trail +- [x] `[required]` Archive maintenance: age-out, stale archival, overflow purge +- [x] `[required]` Retention policy enforcement per class (max entries, age-out days, archive thresholds) +- [x] `[required]` Working memory snapshot with token budgeting for agent context + +### Remaining gaps +- [ ] `[optional]` Retrieval ranking heuristics (approved > draft, pinned > ephemeral, multi-origin boost) — currently search is substring-based, not ranked +- [ ] `[optional]` `whyMatched` explanation fields populated with real ranking signal data +- [ ] `[optional]` Freshness-weighted ranking in search results +- [ ] `[stretch]` Confidence scoring populated by real heuristics rather than manual entry + +### Implementation note +Memory tables are created via runtime migration in `db.ts` (`ensureMigrated()`), not via file-based migrations in `db/`. This diverges from the Phase 9 implementation plan which called for `db/` migration files. The runtime approach works correctly but should be considered for consolidation in a future cleanup pass. + +--- + +## Cross-cutting: Authentication and security `COMPLETE` + +Implemented alongside the core phases but not originally called out as a distinct phase. + +- [x] `[required]` Token and password auth modes with configurable bypass for localhost +- [x] `[required]` Timing-safe credential verification +- [x] `[required]` CSRF origin validation +- [x] `[required]` Rate limiting (5 attempts per 15-min window, in-memory) +- [x] `[required]` Auth audit logging (JSON-formatted event log) +- [x] `[required]` Auth middleware on API routes + +--- + +## Cross-cutting: Demo mode `COMPLETE` + +Full demo experience for showcasing the application without a live OpenClaw environment. + +- [x] `[required]` `isDemoMode()` detection (serverless environments, cookie-based toggle) +- [x] `[required]` Realistic demo data for all major surfaces (dashboard, traces, workflows, memory, investigations, calendar, health) +- [x] `[required]` Demo mode banner and toggle in UI + +--- + +## Cross-cutting: PWA and offline support `COMPLETE` + +- [x] `[required]` Service worker with cache-first strategy and offline fallback +- [x] `[required]` PWA manifest with icons and standalone display mode +- [x] `[required]` Offline fallback page +- [x] `[required]` API routes excluded from cache + +--- + +## Cross-cutting: Design system `SUBSTANTIALLY COMPLETE` + +A broad component library is in place. 42 UI primitives plus domain-specific components. + +### Primitives (in `components/ui/`) +Accordion, Avatar, Badge, Bento Grid, Breadcrumb, Button, Card, Card Grid, Chart, Checkbox, Collapsible, Command, Data Table, Dialog, Drawer, Dropdown Menu, Empty State, Glass Card, Hero Highlight, Input, Input Group, Label, Metric Card, Metric, Page Header, Panel, Pill, Progress, Select, Separator, Sheet, Sidebar, Skeleton, Sonner (toasts), Status Badge, Table, Tabs, Textarea, Toggle, Toggle Group, Tooltip + +### Domain components +App Shell, App Sidebar, Artifact Link, Auth Controls, Border Glow, Chart Area Interactive, Code Block, Code Demo, Command Palette, Confirm Dialog, Copy Button, Data Table, Demo Mode Banner/Toggle, JSON Editor, Markdown Preview, Onboarding steps, PII Safe, Promote Button, Quick Note Dialog, Review Actions, Runnable SQL, Section Cards, Shader Background, Site Header, Theme Provider/Toggle, Title Particles, URL Preview + +### Remaining gaps +- [ ] `[stretch]` Consistent animation system across all interactive elements +- [ ] `[stretch]` Component documentation / storybook equivalent +- [ ] `[stretch]` Responsive design audit for mobile/tablet +- [ ] `[stretch]` Accessibility audit (keyboard navigation, screen reader, ARIA labels) +- [ ] `[stretch]` Loading skeleton coverage for all data-fetching pages + +--- + +# CURRENT STATUS SUMMARY + +**What is built:** +- 34 runtime modules in `lib/opentrust/` +- 21 pages across 7 route groups +- 7 API endpoints +- 42 UI primitives + 27 domain components +- 15+ database tables with full schema +- Hybrid search (FTS5 + sqlite-vec + memory entries) +- Complete memory lifecycle (create → review → approve → archive → rollback) +- Auth, rate limiting, CSRF, audit logging +- PWA with offline support +- Comprehensive demo mode + +**What is notably absent:** +- Desktop application (Tauri) — zero implementation +- Test coverage — 1 test file with 23 assertions out of 34 modules +- Real embedding models — current vectors are deterministic char-code-based +- Structured logging / observability +- Performance benchmarking + +--- + +# NEXT PHASES + +## Phase 10 — Retrieval quality and ranking `NOT STARTED` + +**Goal:** Make search results defensible, explainable, and operationally useful — not just present. + +Current search is functional but unsophisticated: memory entries are found via case-insensitive substring matching, FTS5 provides lexical relevance, and semantic fallback uses basic char-code embeddings. This phase transforms retrieval from "it works" to "operators trust it." + +### Required +- [ ] `[required]` Provenance-weighted ranking: approved > reviewed > draft; pinned > longTerm > working > ephemeral +- [ ] `[required]` Multi-origin boost: entries backed by multiple evidence sources rank higher +- [ ] `[required]` Freshness decay: stale entries rank lower unless pinned +- [ ] `[required]` `whyMatched` explanations populated with real ranking signals (matched terms, ranking factors, explanation text) +- [ ] `[required]` Confidence band calculation from evidence breadth, provenance strength, and recency +- [ ] `[required]` Unified result shape across all search paths (FTS5, semantic, memory) with consistent metadata + +### Optional +- [ ] `[optional]` Retrieval quality evaluation harness — test queries with expected results to measure ranking changes +- [ ] `[optional]` Search result deduplication across evidence types (same artifact appearing as trace ref and memory entry) +- [ ] `[optional]` Scoped search: filter by time range, source type, tags, retention class from the UI +- [ ] `[optional]` Search result pagination (currently capped at 12) + +### Stretch — professional grade +- [ ] `[stretch]` Real embedding model integration (Ollama local, or configurable provider) replacing char-code vectors +- [ ] `[stretch]` Hybrid retrieval fusion: combine FTS5 BM25 scores with vector cosine similarity using reciprocal rank fusion +- [ ] `[stretch]` Query expansion: automatic synonym/related-term broadening for sparse queries +- [ ] `[stretch]` Retrieval latency tracking and performance dashboard +- [ ] `[stretch]` A/B ranking comparison tool for operators to evaluate ranking changes side by side +- [ ] `[stretch]` Saved search presets with notification on new matching evidence + +--- + +## Phase 11 — Testing and reliability `NOT STARTED` + +**Goal:** Establish the test coverage and reliability foundation that a professional-grade application requires. + +Current state: 1 test file (`agent-session.test.ts`) with 23 assertions. Zero tests for database operations, memory CRUD, API endpoints, search, or ingestion. This is the single largest gap between "it works on my machine" and "it is trustworthy infrastructure." + +### Required +- [ ] `[required]` Database integration tests: schema creation, migration idempotency, query helpers +- [ ] `[required]` Memory lifecycle tests: create → update → review → archive → rollback → version history +- [ ] `[required]` API endpoint tests: promote, search, inspect, health — happy path and error cases +- [ ] `[required]` Search pipeline tests: FTS5, semantic fallback, memory entry search, hybrid merge +- [ ] `[required]` Ingestion tests: session import, cron import, cursor deduplication, artifact extraction +- [ ] `[required]` Auth tests: token mode, password mode, localhost bypass, rate limiting, CSRF +- [ ] `[required]` CI pipeline: `pnpm test` in GitHub Actions on every PR + +### Optional +- [ ] `[optional]` Retention policy tests: age-out, overflow purge, archive thresholds +- [ ] `[optional]` API contract validation tests: request parsing, enum enforcement, error shapes +- [ ] `[optional]` Demo mode tests: all demo data generators return valid shapes +- [ ] `[optional]` Health signal tests: stale pipeline detection, review debt calculation + +### Stretch — professional grade +- [ ] `[stretch]` Coverage reporting with minimum threshold enforcement (80%+ for `lib/opentrust/`) +- [ ] `[stretch]` Performance regression tests: search latency, ingestion throughput, memory operation timing +- [ ] `[stretch]` Snapshot tests for API response shapes to catch accidental contract drift +- [ ] `[stretch]` Load testing: concurrent reads/writes, large database scenarios (10k+ traces, 5k+ memory entries) +- [ ] `[stretch]` Fuzz testing for search query escaping and SQL injection prevention +- [ ] `[stretch]` E2E browser tests for critical flows (promote → review → approve) + +--- + +## Phase 12 — Insight and analytics layer `NOT STARTED` + +**Goal:** Derive actionable intelligence from the evidence and memory substrate. + +### Required +- [ ] `[required]` Richer artifact normalization (deduplicate similar URLs, canonicalize repo references) +- [ ] `[required]` Workflow analytics: success/failure rates, duration trends, step-level bottleneck identification +- [ ] `[required]` Operator metrics: ingestion freshness trends, memory growth rate, review throughput +- [ ] `[required]` Evidence-backed insight surfaces (not generated summaries — aggregated patterns with links to source data) + +### Optional +- [ ] `[optional]` Event drift detection: alert when workflow patterns change significantly +- [ ] `[optional]` Coverage analysis: what percentage of traces have associated memory entries +- [ ] `[optional]` Tag co-occurrence analysis for memory entries +- [ ] `[optional]` Timeline view of evidence density by source and type + +### Stretch — professional grade +- [ ] `[stretch]` Trend detection with statistical significance (not just "this number went up") +- [ ] `[stretch]` Customizable dashboard widgets: operators choose which metrics to surface +- [ ] `[stretch]` Anomaly detection on ingestion patterns (unexpected gaps, volume spikes) +- [ ] `[stretch]` Comparative analysis: side-by-side view of two time periods or two workflow runs +- [ ] `[stretch]` Export analytics as structured reports (PDF/CSV) for stakeholder review + +--- + +## Phase 13 — Memory health and operator trust `NOT STARTED` + +**Goal:** Make the reliability of the memory system visible and actionable. + +Current state: basic health signals exist (stale pipelines, review debt, weak provenance count). This phase turns health monitoring from a status page into an operational trust surface. + +### Required +- [ ] `[required]` Stale pipeline detection with configurable thresholds (currently hardcoded to 24h) +- [ ] `[required]` Missing-index / degraded-search indicators visible on the dashboard +- [ ] `[required]` Review debt tracking: how many draft entries, how old, what's the review velocity +- [ ] `[required]` Backup and export tooling: full database export, memory bundle export (already partially implemented via `memoryExportBundle`) +- [ ] `[required]` Evidence preservation: ability to snapshot the current state for audit purposes + +### Optional +- [ ] `[optional]` Health trend tracking over time (not just current state) +- [ ] `[optional]` Configurable alert thresholds per health signal +- [ ] `[optional]` Audit log viewer in the UI (auth events, memory state changes) +- [ ] `[optional]` Database size and growth monitoring + +### Stretch — professional grade +- [ ] `[stretch]` Structured logging throughout the application (JSON-formatted, with correlation IDs) +- [ ] `[stretch]` Health check API endpoint for external monitoring integration +- [ ] `[stretch]` Scheduled health reports (daily/weekly digest) +- [ ] `[stretch]` Data integrity checks: orphaned origins, broken references, duplicate entries +- [ ] `[stretch]` Graceful degradation modes: read-only mode when disk is full, search-only when ingestion fails +- [ ] `[stretch]` Reproducible report bundles: package evidence + memory + health state for external review + +--- + +## Phase 14 — Authoring and investigation power `NOT STARTED` + +**Goal:** Give operators direct, powerful tools for working with the memory layer. + +### Required +- [ ] `[required]` Edit and save custom investigations from the UI (currently read-only SQL preview) +- [ ] `[required]` SQL / investigation studio with syntax highlighting and result visualization +- [ ] `[required]` Investigation history: track which queries were run and their results +- [ ] `[required]` Richer capability explorer surface with search and filtering + +### Optional +- [ ] `[optional]` Repeatable investigation presets (save a query template, run it with different parameters) +- [ ] `[optional]` Investigation sharing: export an investigation as a portable bundle +- [ ] `[optional]` Quick note creation from any evidence surface (partial: `quick-note-dialog.tsx` component exists) +- [ ] `[optional]` Bulk memory operations: approve/reject/archive multiple entries at once + +### Stretch — professional grade +- [ ] `[stretch]` Visual query builder for operators who don't write SQL +- [ ] `[stretch]` Timeline-driven analysis tools: scrub through evidence chronologically +- [ ] `[stretch]` Comparative investigation: diff two investigations side by side +- [ ] `[stretch]` Investigation collaboration: annotate results, share findings +- [ ] `[stretch]` Natural language to SQL: describe what you're looking for, get a query +- [ ] `[stretch]` Keyboard-driven workflow: power users can navigate entirely without mouse +- [ ] `[stretch]` Command palette integration for investigations (`command-palette.tsx` component exists) + +--- + +## Phase 15 — Official OpenClaw integration standard `NOT STARTED` + +**Goal:** Expose OpenTrust as OpenClaw's official memory infrastructure with stable contracts. + +### Required +- [ ] `[required]` Stable memory API contracts finalized and versioned (v1) +- [ ] `[required]` Agent-facing retrieval interface: agents can search, inspect, and consume memory +- [ ] `[required]` Memory writeback interface: agents can propose memory entries (defaults to draft) +- [ ] `[required]` Plugin entrypoint definition under OpenClaw plugin conventions +- [ ] `[required]` Config schema for OpenClaw integration (paths, auth, capabilities) + +### Optional +- [ ] `[optional]` Integration with OpenClaw operator surfaces (if they exist) +- [ ] `[optional]` Tool wrapper definitions for OpenClaw tool protocol +- [ ] `[optional]` Plugin packaging artifacts (manifest, entrypoint, capabilities declaration) +- [ ] `[optional]` Contract versioning and deprecation policy + +### Stretch — professional grade +- [ ] `[stretch]` Plugin discovery and self-registration in OpenClaw runtime +- [ ] `[stretch]` E2E integration tests: plugin loads, routes respond, tools execute, memory persists +- [ ] `[stretch]` OpenAPI / JSON Schema documentation auto-generated from TypeScript types +- [ ] `[stretch]` SDK package: `@openclaw/memory-client` for programmatic access +- [ ] `[stretch]` Rate limiting and quota enforcement for agent-initiated memory operations +- [ ] `[stretch]` Multi-tenant support: isolated memory spaces per agent/project + +--- + +## Phase 16 — Desktop application `NOT STARTED` + +**Goal:** Wrap OpenTrust in a native desktop application using Tauri v2. + +**Note:** This was originally Phase 8.5 (parallel with memory work). It has been renumbered to reflect that it has zero implementation and is no longer an upstream prerequisite — the memory layer was built without it. + +Primary design doc: `docs/DESKTOP-APPLICATION-PLAN.md` + +### Tier 1 — Minimal desktop app `[required]` +- [ ] `[required]` Tauri v2 project scaffold (`src-tauri/`) +- [ ] `[required]` Next.js static export configuration (`output: 'export'`) +- [ ] `[required]` Tauri window configuration (title, size, platform targets) +- [ ] `[required]` Sidecar Node process for the memory runtime +- [ ] `[required]` Desktop-appropriate database path resolution (`$APPDATA` / `~/Library/Application Support`) +- [ ] `[required]` Basic macOS `.dmg` build +- [ ] `[required]` Basic Windows `.msi` / `.exe` build +- [ ] `[required]` Basic Linux `.AppImage` / `.deb` build + +### Tier 2 — Native integrations `[optional]` +- [ ] `[optional]` System tray icon with health status indicator +- [ ] `[optional]` Tray menu: open dashboard, run ingestion, check health, quit +- [ ] `[optional]` Native file dialog for database selection / backup export +- [ ] `[optional]` OS notifications for ingestion completion and health alerts +- [ ] `[optional]` Global keyboard shortcut to open OpenTrust +- [ ] `[optional]` Deep link support (`opentrust://` protocol) + +### Tier 3 — Distribution `[stretch]` +- [ ] `[stretch]` Auto-update via `tauri-plugin-updater` +- [ ] `[stretch]` Code signing for macOS (Developer ID) and Windows (Authenticode) +- [ ] `[stretch]` macOS notarization +- [ ] `[stretch]` CI pipeline for cross-platform builds (GitHub Actions) +- [ ] `[stretch]` First-run onboarding flow (database setup, OpenClaw path detection) +- [ ] `[stretch]` Graceful migration from localhost usage to desktop app + +### Tier 4 — Advanced native `[stretch]` +- [ ] `[stretch]` Background ingestion service (runs via tray even when window is closed) +- [ ] `[stretch]` Scheduled ingestion (hourly/daily) +- [ ] `[stretch]` Native menu bar with keyboard shortcuts +- [ ] `[stretch]` Multi-window support (detach trace detail, investigation, or health views) +- [ ] `[stretch]` Spotlight / Alfred integration on macOS + +--- + +## Phase 17 — UX and design polish `NOT STARTED` + +**Goal:** Elevate the interface from functional to professional-grade — the kind of tool operators enjoy using. + +This phase is not about adding features. It is about making existing features feel refined, fast, and trustworthy through interaction quality. + +### Required +- [ ] `[required]` Page loading states: skeleton screens for all data-fetching pages (component exists, needs coverage) +- [ ] `[required]` Error boundaries: graceful error states instead of blank screens on failure +- [ ] `[required]` Empty states: meaningful guidance when there's no data (component exists, needs audit for coverage) +- [ ] `[required]` Responsive layout: sidebar collapse, mobile-friendly tables, touch-friendly controls +- [ ] `[required]` Consistent typography scale and spacing system + +### Optional +- [ ] `[optional]` Keyboard navigation audit: all interactive elements reachable via keyboard +- [ ] `[optional]` Focus management: logical focus flow on page transitions and dialog open/close +- [ ] `[optional]` Toast/notification consistency: all user actions produce visible feedback +- [ ] `[optional]` Dark/light mode polish (theme toggle exists, verify all surfaces) +- [ ] `[optional]` Page transition animations (smooth, not distracting) +- [ ] `[optional]` Breadcrumb navigation for deep routes (component exists) + +### Stretch — professional grade +- [ ] `[stretch]` Micro-interactions: hover states, press feedback, subtle transitions on every interactive element +- [ ] `[stretch]` Accessibility compliance: WCAG 2.1 AA — screen reader tested, contrast verified, ARIA complete +- [ ] `[stretch]` Performance perception: optimistic updates, instant navigation via prefetching +- [ ] `[stretch]` Data visualization polish: chart animations, interactive tooltips, contextual annotations +- [ ] `[stretch]` Onboarding improvements: contextual tips, feature discovery, progressive disclosure +- [ ] `[stretch]` Print-friendly views for reports and investigation results +- [ ] `[stretch]` Custom theme support: operator-configurable accent colors and density +- [ ] `[stretch]` Sound design: optional subtle audio feedback for key actions (promote, approve, alert) + +--- + +# EVALUATION + +## What is strong + +The core local-first evidence platform is complete end-to-end: ingest, normalize, store, search, inspect, promote, review, and archive. The implementation quality across 34 runtime modules is consistently production-ready — no stubs, no TODOs, no placeholder logic. The memory lifecycle (create → version → review → approve → archive → rollback) is sophisticated and fully operational. The type system is comprehensive and maps cleanly to the API contract documents. The design system is broad (42 primitives + 27 domain components) and the UI covers all primary workflows. Authentication, rate limiting, CSRF protection, and audit logging are all implemented with correct security patterns. + +## What now matters most + +1. **Testing** — The single largest gap. One test file out of 34 modules is not acceptable for infrastructure that operators are meant to trust. Phase 11 is the highest-leverage next investment. + +2. **Retrieval quality** — Search works but doesn't rank. Ranking is what separates a search box from a memory system. Phase 10 turns retrieval from "present" to "useful." + +3. **Observability** — No structured logging, no performance metrics, no request tracing. When something goes wrong in production, there is currently no way to diagnose it without reading code. + +4. **Real embeddings** — The char-code embedding function is a correct placeholder but produces vectors with no semantic meaning. Real embedding models are needed for vector search to deliver value beyond FTS5. + +## Deviations from original plan + +1. **Phase 9 was built before prerequisites were met.** The execution pipeline recommended deferring memory-layer expansion until Tauri, persistence, and reliable run completion were stable. Phase 9 was implemented anyway. The result is solid, and the gating advice should be considered superseded. + +2. **Desktop application (originally Phase 8.5) has zero implementation.** It was framed as a parallel workstream that "can proceed now" but no work was done. Renumbered to Phase 16 to reflect its actual priority position. + +3. **Memory table migrations are runtime-based, not file-based.** The Phase 9 implementation plan specified `db/` migration files. The actual implementation uses `ensureMigrated()` in `db.ts`. This works but diverges from the documented approach. + +4. **Cross-cutting concerns were not originally phased.** Auth, demo mode, PWA support, and the design system were built incrementally across phases. They are now documented as explicit cross-cutting sections for clarity. + +5. **Phase ordering shifted.** The original plan interleaved desktop work with memory work. The revised ordering reflects what was actually built and what the next highest-leverage investments are: testing (11) before analytics (12) before desktop (16). ## Delivery discipline -The roadmap should now be executed under two companion documents: -- `docs/EXECUTION-PIPELINE.md` — defines the recommended PR sequence, prerequisite ordering, verification discipline, and delivery posture -- `docs/SCOPE-GUARDRAILS.md` — defines what is in scope, what is out of scope for now, and how to recognize scope creep +The roadmap should be executed under two companion documents: +- `docs/EXECUTION-PIPELINE.md` — defines PR sequence, verification discipline, and delivery posture +- `docs/SCOPE-GUARDRAILS.md` — defines what is in scope, what is out of scope, and how to recognize scope creep -These documents exist to keep the roadmap professionally executable instead of conceptually expansive. +These remain the governing discipline documents. Note that the prerequisite gating advice in EXECUTION-PIPELINE.md has been partially superseded by the Phase 9 implementation, but the pipeline rules, anti-patterns, and quality standards remain fully applicable. ## Decision