diff --git a/CLAUDE.md b/CLAUDE.md
index 8bcdcef90..f5d6c64f4 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -177,6 +177,18 @@ pnpm --filter @roast/web run test:ci  # MUST actually run, not assume
 # TypeScript compiles ≠ tests pass
 ```
 
+### Development Workflow: Modify → Check Loop
+
+When making code changes, especially to internal packages (`@roast/ai`, `@roast/db`, `@roast/domain`, `@roast/jobs`), follow this verification loop before committing:
+
+1. **After modifying internal packages**: Run `pnpm turbo run typecheck` (not just `pnpm --filter @roast/web typecheck`). Turbo handles the dependency graph—it rebuilds packages first, then typechecks consumers with fresh `.d.ts` files. This mimics CI's clean-build behavior.
+
+2. **After modifying web app only**: Run `pnpm --filter @roast/web run typecheck && pnpm --filter @roast/web run lint`.
+
+3. **Before pushing**: Always run the full check: `pnpm turbo run typecheck lint --parallel`. This catches cross-package type errors that per-package checks miss due to stale `dist/` folders.
+
+4. **Why this matters**: TypeScript project references use compiled `dist/` for type resolution. Local dev accumulates stale builds; CI starts fresh. If you see typecheck errors and assume they're "pre-existing," verify by rebuilding the source package first (`pnpm --filter @roast/ai run build`).
+
 ## Commands Quick Reference
 
 ### Development
@@ -186,6 +198,30 @@ pnpm --filter @roast/db run gen       # Generate Prisma client
 pnpm --filter @roast/db run db:push   # Push schema changes
 ```
 
+### Dev Environment & Database Access (Primary)
+
+**🚨 MANDATORY: Use ONLY these scripts for database access - NO EXCEPTIONS 🚨**
+
+```bash
+dev/scripts/dev-env.sh start|stop|status|attach|restart  # Manage tmux dev session
+dev/scripts/dev-env.sh psql [args]                       # Connect to local DB via Docker
+```
+
+**FORBIDDEN database access methods** (DO NOT USE):
+```bash
+❌ psql -h localhost ...           # Direct psql - FORBIDDEN
+❌ PGPASSWORD=... psql ...         # Direct psql with password - FORBIDDEN
+❌ docker run ... postgres psql    # Docker-based psql - FORBIDDEN
+❌ Any other method                # If it's not dev-env.sh psql, DON'T USE IT
+```
+
+**Database utilities** (Docker-based, no local psql needed):
+- `dev/scripts/dev/db/lib/db_functions.sh` - Core DB functions (`psql_local`, `psql_prod`, `pg_dump_prod`, `copy_data`)
+- `dev/scripts/dev/db/lib/common_utils.sh` - Shared bash utilities
+- `dev/scripts/dev/db/setup_db.sh` - Example: sync prod schema to local
+
+**AI agents MUST use `dev/scripts/dev-env.sh psql` - no alternatives allowed.**
+
 ### Testing
 ```bash
 # Test categories by cost/dependencies:
@@ -203,6 +239,29 @@ pnpm --filter @roast/web run typecheck  # TypeScript
 # MUST run both - linter doesn't catch type errors!
 ```
 
+### 🚨 Type Definitions: NO INLINE TYPES 🚨
+
+**NEVER use inline types.** Always define named interfaces or type aliases.
+
+```typescript
+// ❌ WRONG - inline types
+function Foo({ data }: { data: string; count: number }) { }
+const [state, setState] = useState<{ loading: boolean; error?: string }>();
+let result: { success: boolean; value: number };
+
+// ✅ CORRECT - named types
+interface FooProps { data: string; count: number; }
+function Foo({ data }: FooProps) { }
+
+interface LoadingState { loading: boolean; error?: string; }
+const [state, setState] = useState<LoadingState>();
+
+interface Result { success: boolean; value: number; }
+let result: Result;
+```
+
+**Why:** Inline types hurt readability, reusability, and refactoring. Named types are self-documenting and can be exported/shared.
+
 ## MCP Server Quick Fix
 
 **Problem**: Claude Code caches MCP servers, changes don't take effect
@@ -278,6 +337,22 @@ Details here"
 /bin/rm, /bin/cat, /bin/echo  # Use full paths
 ```
 
+## Tmux Key Sending
+
+When sending multiple keystrokes to tmux sessions (e.g., navigating CLI menus), use a loop with delays between keystrokes instead of sending them all at once.
+
+**Bad** (keys may be dropped or processed incorrectly):
+```bash
+tmux send-keys -t session Down Down Down Down Down Enter
+```
+
+**Good** (reliable keystroke delivery):
+```bash
+for i in {1..5}; do tmux send-keys -t session Down; sleep 0.1; done; tmux send-keys -t session Enter
+```
+
+This ensures each keystroke is processed before the next is sent, preventing navigation issues in terminal UIs.
+
 ## Documentation Structure
 - `/dev/docs/README.md` - Documentation index
 - `/dev/docs/development/` - Development guides
diff --git a/apps/web/src/app/api/monitor/lab/baselines/[id]/route.ts b/apps/web/src/app/api/monitor/lab/baselines/[id]/route.ts
new file mode 100644
index 000000000..89ef3c167
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/baselines/[id]/route.ts
@@ -0,0 +1,28 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository } from "@roast/db";
+import type { RouteIdParams } from "../../types";
+
+export async function DELETE(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    await metaEvaluationRepository.deleteValidationBaseline(id);
+    return NextResponse.json({ success: true });
+  } catch (error) {
+    logger.error("Error deleting baseline:", error);
+    return commonErrors.serverError("Failed to delete baseline");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/baselines/route.ts b/apps/web/src/app/api/monitor/lab/baselines/route.ts
new file mode 100644
index 000000000..527ee1bc9
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/baselines/route.ts
@@ -0,0 +1,96 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository, prisma } from "@roast/db";
+
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const agentId = request.nextUrl.searchParams.get("agentId");
+  if (!agentId) {
+    return NextResponse.json({ error: "agentId is required" }, { status: 400 });
+  }
+
+  try {
+    const baselines = await metaEvaluationRepository.getValidationBaselines(agentId);
+    return NextResponse.json({ baselines });
+  } catch (error) {
+    logger.error("Error fetching baselines:", error);
+    return commonErrors.serverError("Failed to fetch baselines");
+  }
+}
+
+export async function POST(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  try {
+    const body = await request.json();
+    const { name, description, agentId, documentIds, evaluationVersionIds, beforeDate } = body;
+
+    if (!name || !agentId) {
+      return NextResponse.json(
+        { error: "name and agentId are required" },
+        { status: 400 }
+      );
+    }
+
+    // Get evaluation version IDs from document IDs if not provided directly
+    let evalVersionIds = evaluationVersionIds;
+    if (!evalVersionIds?.length && documentIds?.length) {
+      // Get the latest evaluation version for each document (optionally before cutoff date)
+      const evaluations = await prisma.evaluationVersion.findMany({
+        where: {
+          agentId,
+          evaluation: {
+            documentId: { in: documentIds },
+          },
+          ...(beforeDate ? { createdAt: { lt: new Date(beforeDate) } } : {}),
+        },
+        orderBy: { createdAt: "desc" },
+        select: {
+          id: true,
+          evaluation: { select: { documentId: true } },
+        },
+      });
+
+      // Keep only the latest version per document (before cutoff if specified)
+      const latestByDoc = new Map<string, string>();
+      for (const ev of evaluations) {
+        if (!latestByDoc.has(ev.evaluation.documentId)) {
+          latestByDoc.set(ev.evaluation.documentId, ev.id);
+        }
+      }
+      evalVersionIds = Array.from(latestByDoc.values());
+    }
+
+    if (!evalVersionIds?.length) {
+      return NextResponse.json(
+        { error: "No evaluation versions found for the selected documents" },
+        { status: 400 }
+      );
+    }
+
+    const baseline = await metaEvaluationRepository.createValidationBaseline({
+      name,
+      description,
+      agentId,
+      evaluationVersionIds: evalVersionIds,
+      createdById: userId,
+    });
+
+    return NextResponse.json({ baseline });
+  } catch (error) {
+    logger.error("Error creating baseline:", error);
+    return commonErrors.serverError("Failed to create baseline");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/corpus/route.ts b/apps/web/src/app/api/monitor/lab/corpus/route.ts
new file mode 100644
index 000000000..1c2336b2a
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/corpus/route.ts
@@ -0,0 +1,33 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository } from "@roast/db";
+
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const agentId = request.nextUrl.searchParams.get("agentId");
+  const filter = request.nextUrl.searchParams.get("filter") || undefined;
+  const limit = parseInt(request.nextUrl.searchParams.get("limit") || "500", 10);
+
+  if (!agentId) {
+    return NextResponse.json({ error: "agentId is required" }, { status: 400 });
+  }
+
+  try {
+    const documents = await metaEvaluationRepository.getValidationCorpusDocuments(agentId, {
+      filter,
+      limit,
+    });
+    return NextResponse.json({ documents });
+  } catch (error) {
+    logger.error("Error fetching corpus documents:", error);
+    return commonErrors.serverError("Failed to fetch corpus documents");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/corpus/snapshots/route.ts b/apps/web/src/app/api/monitor/lab/corpus/snapshots/route.ts
new file mode 100644
index 000000000..50a5fcae3
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/corpus/snapshots/route.ts
@@ -0,0 +1,38 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository } from "@roast/db";
+
+// Get evaluation snapshots for a set of documents (used when creating baselines)
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const agentId = request.nextUrl.searchParams.get("agentId");
+  const documentIdsParam = request.nextUrl.searchParams.get("documentIds");
+
+  if (!agentId || !documentIdsParam) {
+    return NextResponse.json(
+      { error: "agentId and documentIds are required" },
+      { status: 400 }
+    );
+  }
+
+  const documentIds = documentIdsParam.split(",").filter(Boolean);
+  if (documentIds.length === 0) {
+    return NextResponse.json({ error: "documentIds cannot be empty" }, { status: 400 });
+  }
+
+  try {
+    const snapshots = await metaEvaluationRepository.getEvaluationSnapshots(documentIds, agentId);
+    return NextResponse.json({ snapshots });
+  } catch (error) {
+    logger.error("Error fetching evaluation snapshots:", error);
+    return commonErrors.serverError("Failed to fetch evaluation snapshots");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/corpus/versions/route.ts b/apps/web/src/app/api/monitor/lab/corpus/versions/route.ts
new file mode 100644
index 000000000..b50f17daa
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/corpus/versions/route.ts
@@ -0,0 +1,56 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma } from "@roast/db";
+
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const agentId = request.nextUrl.searchParams.get("agentId");
+  const documentId = request.nextUrl.searchParams.get("documentId");
+  const beforeDate = request.nextUrl.searchParams.get("beforeDate");
+
+  if (!agentId || !documentId) {
+    return NextResponse.json(
+      { error: "agentId and documentId are required" },
+      { status: 400 }
+    );
+  }
+
+  try {
+    const versions = await prisma.evaluationVersion.findMany({
+      where: {
+        agentId,
+        evaluation: { documentId },
+        ...(beforeDate ? { createdAt: { lt: new Date(beforeDate) } } : {}),
+      },
+      orderBy: { createdAt: "desc" },
+      select: {
+        id: true,
+        createdAt: true,
+        grade: true,
+        summary: true,
+        version: true,
+      },
+    });
+
+    return NextResponse.json({
+      versions: versions.map((v) => ({
+        id: v.id,
+        createdAt: v.createdAt.toISOString(),
+        grade: v.grade,
+        summary: v.summary,
+        version: v.version,
+      })),
+    });
+  } catch (error) {
+    logger.error("Error fetching evaluation versions:", error);
+    return commonErrors.serverError("Failed to fetch evaluation versions");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/evaluations/route.ts b/apps/web/src/app/api/monitor/lab/evaluations/route.ts
new file mode 100644
index 000000000..98b53cf62
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/evaluations/route.ts
@@ -0,0 +1,135 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { prisma, Prisma } from "@roast/db";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+
+/**
+ * GET /api/monitor/lab/evaluations
+ *
+ * Fetches recent evaluations with their pipeline telemetry for the Lab UI.
+ * Admin-only endpoint.
+ *
+ * Query params:
+ * - limit: number of evaluations to fetch (default 20, max 100)
+ * - agentId: optional filter by agent ID
+ */
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) {
+    return commonErrors.unauthorized();
+  }
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) {
+    return commonErrors.forbidden();
+  }
+
+  const { searchParams } = new URL(request.url);
+  const limit = Math.min(parseInt(searchParams.get("limit") || "20"), 100);
+  const agentId = searchParams.get("agentId");
+
+  try {
+    // Build where clause
+    const where: Prisma.EvaluationVersionWhereInput = {
+      pipelineTelemetry: { not: Prisma.JsonNull },
+      ...(agentId && { evaluation: { agentId } }),
+    };
+
+    // Get recent evaluation versions with telemetry
+    const versions = await prisma.evaluationVersion.findMany({
+      where,
+      take: limit,
+      orderBy: { createdAt: "desc" },
+      select: {
+        id: true,
+        version: true,
+        grade: true,
+        summary: true,
+        createdAt: true,
+        pipelineTelemetry: true,
+        evaluation: {
+          select: {
+            id: true,
+            document: {
+              select: {
+                id: true,
+                versions: {
+                  select: { title: true },
+                  orderBy: { version: "desc" as const },
+                  take: 1,
+                },
+              },
+            },
+            agent: {
+              select: {
+                id: true,
+                versions: {
+                  select: { name: true },
+                  orderBy: { version: "desc" as const },
+                  take: 1,
+                },
+              },
+            },
+          },
+        },
+        comments: {
+          select: {
+            id: true,
+            header: true,
+            description: true,
+            importance: true,
+            highlight: {
+              select: {
+                quotedText: true,
+              },
+            },
+          },
+        },
+      },
+    });
+
+    // Transform to Lab UI format
+    const evaluations = versions.map((v) => {
+      const telemetry = v.pipelineTelemetry as Record<string, unknown> | null;
+
+      return {
+        id: v.id,
+        evaluationId: v.evaluation.id,
+        version: v.version,
+        grade: v.grade,
+        summary: v.summary,
+        createdAt: v.createdAt.toISOString(),
+        documentId: v.evaluation.document.id,
+        documentTitle: v.evaluation.document.versions[0]?.title || "Untitled",
+        agentId: v.evaluation.agent.id,
+        agentName: v.evaluation.agent.versions[0]?.name || "Unknown Agent",
+        comments: v.comments.map((c) => ({
+          id: c.id,
+          header: c.header,
+          description: c.description,
+          importance: c.importance,
+          quotedText: c.highlight.quotedText || "",
+        })),
+        // Telemetry data (matches ComparisonData structure)
+        telemetry: telemetry ? {
+          stages: telemetry.stages,
+          extractionPhase: telemetry.extractionPhase,
+          filteredItems: telemetry.filteredItems,
+          passedItems: telemetry.passedItems,
+          pipelineCounts: telemetry.finalCounts,
+          totalDurationMs: telemetry.totalDurationMs,
+          totalCostUsd: telemetry.totalCostUsd,
+          documentLength: telemetry.documentLength,
+          profileInfo: telemetry.profileInfo,
+        } : null,
+      };
+    });
+
+    return NextResponse.json({ evaluations });
+  } catch (error) {
+    logger.error("Error fetching evaluations with telemetry:", error);
+    return commonErrors.serverError("Failed to fetch evaluations");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/jobs/status/route.ts b/apps/web/src/app/api/monitor/lab/jobs/status/route.ts
new file mode 100644
index 000000000..142187e32
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/jobs/status/route.ts
@@ -0,0 +1,59 @@
+import { NextRequest, NextResponse } from "next/server";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma } from "@roast/db";
+
+/**
+ * Get status of multiple jobs by ID
+ */
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const jobIdsParam = request.nextUrl.searchParams.get("jobIds");
+  if (!jobIdsParam) {
+    return NextResponse.json({ error: "jobIds is required" }, { status: 400 });
+  }
+
+  const jobIds = jobIdsParam.split(",").filter(Boolean);
+  if (jobIds.length === 0) {
+    return NextResponse.json({ error: "jobIds cannot be empty" }, { status: 400 });
+  }
+
+  try {
+    const jobs = await prisma.job.findMany({
+      where: { id: { in: jobIds } },
+      select: {
+        id: true,
+        status: true,
+        evaluationVersionId: true,
+        error: true,
+      },
+    });
+
+    const completed = jobs.filter((j) => j.status === "COMPLETED").length;
+    const failed = jobs.filter((j) => j.status === "FAILED").length;
+    const pending = jobs.filter((j) => j.status === "PENDING").length;
+    const running = jobs.filter((j) => j.status === "RUNNING").length;
+
+    const allDone = completed + failed === jobs.length;
+
+    return NextResponse.json({
+      jobs,
+      summary: {
+        total: jobs.length,
+        completed,
+        failed,
+        pending,
+        running,
+        allDone,
+      },
+    });
+  } catch {
+    return commonErrors.serverError("Failed to get job status");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/model-endpoints/route.ts b/apps/web/src/app/api/monitor/lab/model-endpoints/route.ts
new file mode 100644
index 000000000..2c49d934c
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/model-endpoints/route.ts
@@ -0,0 +1,33 @@
+import { NextRequest, NextResponse } from "next/server";
+
+export async function GET(request: NextRequest) {
+  const modelId = request.nextUrl.searchParams.get("model");
+
+  if (!modelId) {
+    return NextResponse.json({ error: "model parameter required" }, { status: 400 });
+  }
+
+  try {
+    // Don't encode the full modelId - OpenRouter expects the / as part of the path
+    // e.g., /models/z-ai/glm-4.7/endpoints not /models/z-ai%2Fglm-4.7/endpoints
+    const response = await fetch(
+      `https://openrouter.ai/api/v1/models/${modelId}/endpoints`
+    );
+
+    if (!response.ok) {
+      return NextResponse.json(
+        { error: `OpenRouter API error: ${response.status}` },
+        { status: response.status }
+      );
+    }
+
+    const data = await response.json();
+    return NextResponse.json(data);
+  } catch (error) {
+    console.error("Failed to fetch model endpoints:", error);
+    return NextResponse.json(
+      { error: "Failed to fetch model endpoints" },
+      { status: 500 }
+    );
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/models/route.ts b/apps/web/src/app/api/monitor/lab/models/route.ts
new file mode 100644
index 000000000..58716f8ec
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/models/route.ts
@@ -0,0 +1,26 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { getAllModels } from "@roast/ai";
+
+/**
+ * GET /api/monitor/lab/models
+ * Fetch all available models from Anthropic + OpenRouter
+ */
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  try {
+    const models = await getAllModels();
+    return NextResponse.json({ models });
+  } catch (error) {
+    logger.error("Error fetching models:", error);
+    return commonErrors.serverError("Failed to fetch models");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/profiles/[id]/route.ts b/apps/web/src/app/api/monitor/lab/profiles/[id]/route.ts
new file mode 100644
index 000000000..bc9db5eaf
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/profiles/[id]/route.ts
@@ -0,0 +1,151 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma } from "@roast/db";
+import type { RouteIdParams } from "../../types";
+
+/**
+ * GET /api/monitor/lab/profiles/[id]
+ * Get a single profile by ID
+ */
+export async function GET(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    const profile = await prisma.fallacyCheckerProfile.findUnique({
+      where: { id },
+    });
+
+    if (!profile) {
+      return NextResponse.json({ error: "Profile not found" }, { status: 404 });
+    }
+
+    return NextResponse.json({ profile });
+  } catch (error) {
+    logger.error("Error fetching profile:", error);
+    return commonErrors.serverError("Failed to fetch profile");
+  }
+}
+
+/**
+ * PUT /api/monitor/lab/profiles/[id]
+ * Update a profile
+ */
+export async function PUT(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    const body = await request.json();
+    const { name, description, config, isDefault } = body;
+
+    // Check profile exists
+    const existing = await prisma.fallacyCheckerProfile.findUnique({
+      where: { id },
+    });
+
+    if (!existing) {
+      return NextResponse.json({ error: "Profile not found" }, { status: 404 });
+    }
+
+    // Check for duplicate name (excluding current profile)
+    if (name && name !== existing.name) {
+      const duplicate = await prisma.fallacyCheckerProfile.findFirst({
+        where: {
+          agentId: existing.agentId,
+          name,
+          id: { not: id },
+        },
+      });
+
+      if (duplicate) {
+        return NextResponse.json(
+          { error: "A profile with this name already exists" },
+          { status: 400 }
+        );
+      }
+    }
+
+    // If setting as default, unset other defaults first
+    if (isDefault && !existing.isDefault) {
+      await prisma.fallacyCheckerProfile.updateMany({
+        where: { agentId: existing.agentId, isDefault: true, id: { not: id } },
+        data: { isDefault: false },
+      });
+    }
+
+    const profile = await prisma.fallacyCheckerProfile.update({
+      where: { id },
+      data: {
+        ...(name !== undefined && { name }),
+        ...(description !== undefined && { description }),
+        ...(config !== undefined && { config }),
+        ...(isDefault !== undefined && { isDefault }),
+      },
+    });
+
+    logger.info("Profile updated", { profileId: id });
+
+    return NextResponse.json({ profile });
+  } catch (error) {
+    logger.error("Error updating profile:", error);
+    return commonErrors.serverError("Failed to update profile");
+  }
+}
+
+/**
+ * DELETE /api/monitor/lab/profiles/[id]
+ * Delete a profile
+ */
+export async function DELETE(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    const existing = await prisma.fallacyCheckerProfile.findUnique({
+      where: { id },
+    });
+
+    if (!existing) {
+      return NextResponse.json({ error: "Profile not found" }, { status: 404 });
+    }
+
+    await prisma.fallacyCheckerProfile.delete({
+      where: { id },
+    });
+
+    logger.info("Profile deleted", { profileId: id });
+
+    return NextResponse.json({ success: true });
+  } catch (error) {
+    logger.error("Error deleting profile:", error);
+    return commonErrors.serverError("Failed to delete profile");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/profiles/route.ts b/apps/web/src/app/api/monitor/lab/profiles/route.ts
new file mode 100644
index 000000000..da5e2308e
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/profiles/route.ts
@@ -0,0 +1,135 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma } from "@roast/db";
+
+/**
+ * GET /api/monitor/lab/profiles
+ * List all profiles for an agent
+ */
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const agentId = request.nextUrl.searchParams.get("agentId");
+  if (!agentId) {
+    return NextResponse.json({ error: "agentId is required" }, { status: 400 });
+  }
+
+  try {
+    const profiles = await prisma.fallacyCheckerProfile.findMany({
+      where: { agentId },
+      orderBy: [
+        { isDefault: "desc" },
+        { name: "asc" },
+      ],
+    });
+
+    return NextResponse.json({ profiles });
+  } catch (error) {
+    logger.error("Error fetching profiles:", error);
+    return commonErrors.serverError("Failed to fetch profiles");
+  }
+}
+
+/**
+ * POST /api/monitor/lab/profiles
+ * Create a new profile
+ */
+export async function POST(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  try {
+    const body = await request.json();
+    const { name, description, agentId, config, isDefault } = body;
+
+    if (!name || !agentId) {
+      return NextResponse.json(
+        { error: "name and agentId are required" },
+        { status: 400 }
+      );
+    }
+
+    // Check for duplicate name
+    const existing = await prisma.fallacyCheckerProfile.findFirst({
+      where: { agentId, name },
+    });
+
+    if (existing) {
+      return NextResponse.json(
+        { error: "A profile with this name already exists" },
+        { status: 400 }
+      );
+    }
+
+    // If setting as default, unset other defaults first
+    if (isDefault) {
+      await prisma.fallacyCheckerProfile.updateMany({
+        where: { agentId, isDefault: true },
+        data: { isDefault: false },
+      });
+    }
+
+    const profile = await prisma.fallacyCheckerProfile.create({
+      data: {
+        name,
+        description: description ?? null,
+        agentId,
+        config: config ?? getDefaultConfig(),
+        isDefault: isDefault ?? false,
+      },
+    });
+
+    logger.info("Profile created", { profileId: profile.id, name, agentId });
+
+    return NextResponse.json({ profile });
+  } catch (error) {
+    logger.error("Error creating profile:", error);
+    return commonErrors.serverError("Failed to create profile");
+  }
+}
+
+/**
+ * Default profile configuration - matches the real fallacy checker defaults
+ * Uses the NEW filterChain array format (not the old { filters: [...] } format)
+ */
+function getDefaultConfig() {
+  return {
+    version: 1,
+    models: {
+      extractors: [
+        { model: "claude-sonnet-4-5-20250929", temperature: 0, thinking: false },
+        { model: "google/gemini-3-flash-preview", temperature: "default", thinking: true },
+        { model: "google/gemini-2.5-flash", temperature: "default", thinking: true },
+      ],
+      judge: {
+        model: "claude-sonnet-4-5-20250929",
+        enabled: false,
+      },
+    },
+    thresholds: {
+      minSeverityThreshold: 60,
+      maxIssues: 15,
+      dedupThreshold: 0.7,
+      maxIssuesToProcess: 25,
+    },
+    filterChain: [
+      {
+        id: "default-supported-elsewhere",
+        type: "supported-elsewhere",
+        enabled: true,
+        model: "claude-sonnet-4-5-20250929",
+        temperature: 0.1,
+      },
+    ],
+  };
+}
diff --git a/apps/web/src/app/api/monitor/lab/prompts/route.ts b/apps/web/src/app/api/monitor/lab/prompts/route.ts
new file mode 100644
index 000000000..d0a56dc55
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/prompts/route.ts
@@ -0,0 +1,22 @@
+import { NextResponse } from "next/server";
+import {
+  DEFAULT_EXTRACTOR_SYSTEM_PROMPT,
+  DEFAULT_EXTRACTOR_USER_PROMPT,
+} from "@roast/ai/fallacy-extractor/prompts";
+import { DEFAULT_JUDGE_SYSTEM_PROMPT } from "@roast/ai/fallacy-judge/prompts";
+import { DEFAULT_SUPPORTED_ELSEWHERE_SYSTEM_PROMPT } from "@roast/ai/supported-elsewhere-filter/prompts";
+
+/**
+ * GET /api/monitor/lab/prompts
+ *
+ * Returns the default prompts for the fallacy extractor, judge, and filter.
+ * Used by the profile editor UI to show placeholders.
+ */
+export function GET() {
+  return NextResponse.json({
+    extractorSystemPrompt: DEFAULT_EXTRACTOR_SYSTEM_PROMPT,
+    extractorUserPrompt: DEFAULT_EXTRACTOR_USER_PROMPT,
+    judgeSystemPrompt: DEFAULT_JUDGE_SYSTEM_PROMPT,
+    filterSystemPrompt: DEFAULT_SUPPORTED_ELSEWHERE_SYSTEM_PROMPT,
+  });
+}
diff --git a/apps/web/src/app/api/monitor/lab/runs/[id]/finalize/route.ts b/apps/web/src/app/api/monitor/lab/runs/[id]/finalize/route.ts
new file mode 100644
index 000000000..c4ba853e0
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/runs/[id]/finalize/route.ts
@@ -0,0 +1,278 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma, metaEvaluationRepository } from "@roast/db";
+import type { RouteIdParams } from "../../../types";
+
+interface CommentData {
+  id: string;
+  quotedText: string;
+  header: string | null;
+  description: string;
+  importance: number | null;
+  startOffset: number;
+  endOffset: number;
+}
+
+interface EvaluationSnapshot {
+  evaluationVersionId: string;
+  documentId: string;
+  comments: CommentData[];
+  pipelineTelemetry?: {
+    filteredItems?: unknown[];
+    extractionPhase?: unknown;
+    finalCounts?: {
+      issuesExtracted?: number;
+      issuesAfterDedup?: number;
+      issuesAfterFiltering?: number;
+      commentsGenerated?: number;
+      commentsKept?: number;
+    };
+  };
+}
+
+/**
+ * Finalize a validation run:
+ * 1. Get the new evaluation versions from completed jobs
+ * 2. Compare with baseline
+ * 3. Save comparison results
+ * 4. Update run status
+ */
+export async function POST(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id: runId } = await params;
+
+  try {
+    // Get the run
+    const run = await prisma.validationRun.findUnique({
+      where: { id: runId },
+      include: {
+        baseline: {
+          select: { id: true, agentId: true },
+        },
+      },
+    });
+
+    if (!run) {
+      return NextResponse.json({ error: "Run not found" }, { status: 404 });
+    }
+
+    if (run.status === "completed") {
+      return NextResponse.json({ error: "Run already finalized" }, { status: 400 });
+    }
+
+    // Get baseline snapshots
+    const baselineSnapshots = await metaEvaluationRepository.getBaselineSnapshots(run.baselineId);
+
+    if (baselineSnapshots.length === 0) {
+      await metaEvaluationRepository.updateValidationRunStatus(runId, "failed", "Baseline has no snapshots");
+      return NextResponse.json({ error: "Baseline has no snapshots" }, { status: 400 });
+    }
+
+    // Get the document IDs
+    const documentIds = [...new Set(baselineSnapshots.map((s) => s.documentId))];
+
+    // Get the latest evaluation versions for these documents
+    const newSnapshots = await metaEvaluationRepository.getEvaluationSnapshots(
+      documentIds,
+      run.baseline.agentId
+    );
+
+    // Compare and save results
+    let unchangedCount = 0;
+    let changedCount = 0;
+
+    for (const baselineSnapshot of baselineSnapshots) {
+      const newSnapshot = newSnapshots.find(
+        (s) => s.documentId === baselineSnapshot.documentId
+      );
+
+      if (newSnapshot) {
+        // Compare comments
+        const comparison = compareSnapshots(
+          toEvaluationSnapshot(baselineSnapshot),
+          toEvaluationSnapshot(newSnapshot)
+        );
+
+        // Get baseline snapshot record ID
+        const baselineSnapshotRecord = await metaEvaluationRepository.getBaselineSnapshotByDocument(
+          run.baselineId,
+          baselineSnapshot.documentId
+        );
+
+        if (baselineSnapshotRecord) {
+          const status =
+            comparison.newComments.length === 0 && comparison.lostComments.length === 0
+              ? "unchanged"
+              : "changed";
+
+          if (status === "unchanged") unchangedCount++;
+          else changedCount++;
+
+          // Get pipeline telemetry from new snapshot
+          const telemetry = newSnapshot.pipelineTelemetry as EvaluationSnapshot["pipelineTelemetry"];
+          const finalCounts = telemetry?.finalCounts;
+
+          // Get full telemetry record for stages
+          const fullTelemetry = newSnapshot.pipelineTelemetry as {
+            stages?: Array<{
+              stageName: string;
+              durationMs: number;
+              inputCount: number;
+              outputCount: number;
+              model?: string;
+              costUsd?: number;
+            }>;
+            totalDurationMs?: number;
+          } & EvaluationSnapshot["pipelineTelemetry"];
+
+          await metaEvaluationRepository.addValidationRunSnapshot({
+            runId,
+            baselineSnapshotId: baselineSnapshotRecord.id,
+            newEvaluationId: newSnapshot.evaluationVersionId,
+            status,
+            keptCount: comparison.matchedComments.length,
+            newCount: comparison.newComments.length,
+            lostCount: comparison.lostComments.length,
+            comparisonData: {
+              matchedComments: comparison.matchedComments,
+              newComments: comparison.newComments,
+              lostComments: comparison.lostComments,
+              filteredItems: telemetry?.filteredItems,
+              extractionPhase: telemetry?.extractionPhase,
+              stages: fullTelemetry.stages,
+              totalDurationMs: fullTelemetry.totalDurationMs,
+              pipelineCounts: finalCounts
+                ? {
+                    issuesAfterDedup: finalCounts.issuesAfterDedup ?? 0,
+                    issuesAfterFiltering: finalCounts.issuesAfterFiltering ?? 0,
+                    commentsGenerated: finalCounts.commentsGenerated ?? 0,
+                    commentsKept: finalCounts.commentsKept ?? 0,
+                  }
+                : undefined,
+            },
+          });
+        }
+      }
+    }
+
+    // Update run status
+    const summary = `${unchangedCount} unchanged, ${changedCount} changed`;
+    await metaEvaluationRepository.updateValidationRunStatus(runId, "completed", summary);
+
+    logger.info("Validation run finalized", {
+      runId,
+      unchangedCount,
+      changedCount,
+    });
+
+    return NextResponse.json({
+      success: true,
+      summary,
+      unchangedCount,
+      changedCount,
+    });
+  } catch (error) {
+    logger.error("Error finalizing validation run:", error);
+
+    // Mark run as failed
+    try {
+      await metaEvaluationRepository.updateValidationRunStatus(
+        runId,
+        "failed",
+        error instanceof Error ? error.message : "Unknown error"
+      );
+    } catch {
+      // Ignore secondary error
+    }
+
+    return commonErrors.serverError("Failed to finalize validation run");
+  }
+}
+
+// Helper to convert snapshot format
+function toEvaluationSnapshot(snapshot: {
+  evaluationVersionId: string;
+  documentId: string;
+  comments: CommentData[];
+  pipelineTelemetry?: unknown;
+}): EvaluationSnapshot {
+  return {
+    evaluationVersionId: snapshot.evaluationVersionId,
+    documentId: snapshot.documentId,
+    comments: snapshot.comments,
+    pipelineTelemetry: snapshot.pipelineTelemetry as EvaluationSnapshot["pipelineTelemetry"],
+  };
+}
+
+// Simple comment comparison
+function compareSnapshots(baseline: EvaluationSnapshot, current: EvaluationSnapshot) {
+  const matchedComments: Array<{
+    baselineComment: CommentData;
+    currentComment: CommentData;
+    matchConfidence: number;
+    status: string;
+  }> = [];
+  const newComments: CommentData[] = [];
+  const lostComments: CommentData[] = [];
+
+  const usedCurrentIndices = new Set<number>();
+
+  // Find matches based on quoted text similarity
+  for (const baselineComment of baseline.comments) {
+    let bestMatch: { index: number; score: number } | null = null;
+
+    for (let i = 0; i < current.comments.length; i++) {
+      if (usedCurrentIndices.has(i)) continue;
+
+      const currentComment = current.comments[i];
+      const score = calculateSimilarity(baselineComment.quotedText, currentComment.quotedText);
+
+      if (score > 0.6 && (!bestMatch || score > bestMatch.score)) {
+        bestMatch = { index: i, score };
+      }
+    }
+
+    if (bestMatch) {
+      usedCurrentIndices.add(bestMatch.index);
+      matchedComments.push({
+        baselineComment,
+        currentComment: current.comments[bestMatch.index],
+        matchConfidence: bestMatch.score,
+        status: "matched",
+      });
+    } else {
+      lostComments.push(baselineComment);
+    }
+  }
+
+  // Find new comments (not matched to any baseline)
+  for (let i = 0; i < current.comments.length; i++) {
+    if (!usedCurrentIndices.has(i)) {
+      newComments.push(current.comments[i]);
+    }
+  }
+
+  return { matchedComments, newComments, lostComments };
+}
+
+// Simple text similarity (Jaccard on words)
+function calculateSimilarity(a: string, b: string): number {
+  const wordsA = new Set(a.toLowerCase().split(/\s+/));
+  const wordsB = new Set(b.toLowerCase().split(/\s+/));
+
+  const intersection = new Set([...wordsA].filter((x) => wordsB.has(x)));
+  const union = new Set([...wordsA, ...wordsB]);
+
+  return intersection.size / union.size;
+}
diff --git a/apps/web/src/app/api/monitor/lab/runs/[id]/route.ts b/apps/web/src/app/api/monitor/lab/runs/[id]/route.ts
new file mode 100644
index 000000000..f539e8474
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/runs/[id]/route.ts
@@ -0,0 +1,52 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository } from "@roast/db";
+import type { RouteIdParams } from "../../types";
+
+export async function GET(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    const run = await metaEvaluationRepository.getValidationRunDetail(id);
+    if (!run) {
+      return NextResponse.json({ error: "Run not found" }, { status: 404 });
+    }
+    return NextResponse.json({ run });
+  } catch (error) {
+    logger.error("Error fetching run detail:", error);
+    return commonErrors.serverError("Failed to fetch run detail");
+  }
+}
+
+export async function DELETE(
+  request: NextRequest,
+  { params }: { params: Promise<{ id: string }> }
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    await metaEvaluationRepository.deleteValidationRun(id);
+    return NextResponse.json({ success: true });
+  } catch (error) {
+    logger.error("Error deleting run:", error);
+    return commonErrors.serverError("Failed to delete run");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/runs/route.ts b/apps/web/src/app/api/monitor/lab/runs/route.ts
new file mode 100644
index 000000000..e16bcaf24
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/runs/route.ts
@@ -0,0 +1,59 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { metaEvaluationRepository } from "@roast/db";
+
+export async function GET(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const baselineId = request.nextUrl.searchParams.get("baselineId");
+  if (!baselineId) {
+    return NextResponse.json({ error: "baselineId is required" }, { status: 400 });
+  }
+
+  try {
+    const runs = await metaEvaluationRepository.getValidationRuns(baselineId);
+    return NextResponse.json({ runs });
+  } catch (error) {
+    logger.error("Error fetching runs:", error);
+    return commonErrors.serverError("Failed to fetch runs");
+  }
+}
+
+export async function POST(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  try {
+    const body = await request.json();
+    const { baselineId, name } = body;
+
+    if (!baselineId) {
+      return NextResponse.json({ error: "baselineId is required" }, { status: 400 });
+    }
+
+    // Create the run record
+    const run = await metaEvaluationRepository.createValidationRun({
+      baselineId,
+      name,
+    });
+
+    // Note: The actual evaluation execution would be triggered separately
+    // (e.g., via a job queue). For now, we just create the run record.
+    // The CLI handles the actual pipeline execution.
+
+    return NextResponse.json({ run });
+  } catch (error) {
+    logger.error("Error creating run:", error);
+    return commonErrors.serverError("Failed to create run");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/runs/start/route.ts b/apps/web/src/app/api/monitor/lab/runs/start/route.ts
new file mode 100644
index 000000000..deb0e6b6c
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/runs/start/route.ts
@@ -0,0 +1,116 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma, metaEvaluationRepository } from "@roast/db";
+import { getServices } from "@/application/services/ServiceFactory";
+
+/**
+ * Start a validation run:
+ * 1. Create ValidationRun record
+ * 2. Get documents from baseline
+ * 3. Create batch jobs to re-evaluate each document
+ * 4. Return run ID and job IDs for polling
+ */
+export async function POST(request: NextRequest) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  try {
+    const body = await request.json();
+    const { baselineId, name, profileId } = body;
+
+    if (!baselineId) {
+      return NextResponse.json({ error: "baselineId is required" }, { status: 400 });
+    }
+
+    // Get baseline info
+    const baseline = await prisma.validationBaseline.findUnique({
+      where: { id: baselineId },
+      select: { id: true, name: true, agentId: true },
+    });
+
+    if (!baseline) {
+      return NextResponse.json({ error: "Baseline not found" }, { status: 404 });
+    }
+
+    // Get document IDs from baseline
+    const documentIds = await metaEvaluationRepository.getBaselineDocumentIds(baselineId);
+
+    if (documentIds.length === 0) {
+      return NextResponse.json({ error: "Baseline has no documents" }, { status: 400 });
+    }
+
+    // Create the validation run
+    const run = await metaEvaluationRepository.createValidationRun({
+      baselineId,
+      name: name || `Run ${new Date().toLocaleString()}`,
+      profileId: profileId || undefined,
+    });
+
+    // Create batch for the jobs
+    const batch = await prisma.agentEvalBatch.create({
+      data: {
+        name: `Validation run ${run.id.slice(0, 8)}`,
+        agentId: baseline.agentId,
+        requestedDocumentIds: documentIds,
+        userId,
+      },
+    });
+
+    // Create evaluations and jobs for each document
+    const jobIds: string[] = [];
+    const { jobService } = getServices();
+
+    for (const documentId of documentIds) {
+      // Check if evaluation exists
+      let evaluation = await prisma.evaluation.findFirst({
+        where: {
+          documentId,
+          agentId: baseline.agentId,
+        },
+      });
+
+      // Create evaluation if it doesn't exist
+      if (!evaluation) {
+        evaluation = await prisma.evaluation.create({
+          data: {
+            documentId,
+            agentId: baseline.agentId,
+          },
+        });
+      }
+
+      // Create job with profile ID for plugin configuration
+      const job = await jobService.createJob(evaluation.id, batch.id, profileId);
+      jobIds.push(job.id);
+    }
+
+    logger.info("Validation run started", {
+      runId: run.id,
+      baselineId,
+      profileId: profileId || null,
+      documentCount: documentIds.length,
+      jobCount: jobIds.length,
+    });
+
+    return NextResponse.json({
+      run: {
+        id: run.id,
+        status: "running",
+      },
+      batch: {
+        id: batch.id,
+      },
+      jobIds,
+      documentCount: documentIds.length,
+    });
+  } catch (error) {
+    logger.error("Error starting validation run:", error);
+    return commonErrors.serverError("Failed to start validation run");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/snapshots/[id]/route.ts b/apps/web/src/app/api/monitor/lab/snapshots/[id]/route.ts
new file mode 100644
index 000000000..ed5775923
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/snapshots/[id]/route.ts
@@ -0,0 +1,101 @@
+import { NextRequest, NextResponse } from "next/server";
+import { logger } from "@/infrastructure/logging/logger";
+import { authenticateRequest } from "@/infrastructure/auth/auth-helpers";
+import { commonErrors } from "@/infrastructure/http/api-response-helpers";
+import { isAdmin } from "@/infrastructure/auth/auth";
+import { prisma } from "@roast/db";
+import type { RouteIdParams } from "../../types";
+
+export async function GET(
+  request: NextRequest,
+  { params }: RouteIdParams
+) {
+  const userId = await authenticateRequest(request);
+  if (!userId) return commonErrors.unauthorized();
+
+  const adminCheck = await isAdmin();
+  if (!adminCheck) return commonErrors.forbidden();
+
+  const { id } = await params;
+
+  try {
+    // Get the run snapshot with full comparison data
+    const snapshot = await prisma.validationRunSnapshot.findUnique({
+      where: { id },
+      include: {
+        baselineSnapshot: {
+          include: {
+            evaluationVersion: {
+              include: {
+                evaluation: {
+                  include: {
+                    document: {
+                      include: {
+                        versions: {
+                          orderBy: { version: "desc" },
+                          take: 1,
+                          select: { title: true },
+                        },
+                      },
+                    },
+                  },
+                },
+                comments: {
+                  include: { highlight: true },
+                },
+              },
+            },
+          },
+        },
+        newEvaluation: {
+          include: {
+            comments: {
+              include: { highlight: true },
+            },
+          },
+        },
+      },
+    });
+
+    if (!snapshot) {
+      return NextResponse.json({ error: "Snapshot not found" }, { status: 404 });
+    }
+
+    // Format baseline comments
+    const baselineComments = snapshot.baselineSnapshot.evaluationVersion.comments.map((c) => ({
+      id: c.id,
+      quotedText: c.highlight.quotedText,
+      header: c.header,
+      description: c.description,
+      importance: c.importance,
+    }));
+
+    // Format current comments
+    const currentComments = snapshot.newEvaluation.comments.map((c) => ({
+      id: c.id,
+      quotedText: c.highlight.quotedText,
+      header: c.header,
+      description: c.description,
+      importance: c.importance,
+    }));
+
+    return NextResponse.json({
+      snapshot: {
+        id: snapshot.id,
+        status: snapshot.status,
+        keptCount: snapshot.keptCount,
+        newCount: snapshot.newCount,
+        lostCount: snapshot.lostCount,
+        documentTitle:
+          snapshot.baselineSnapshot.evaluationVersion.evaluation.document.versions[0]?.title ||
+          "Unknown",
+        comparisonData: snapshot.comparisonData,
+        baselineComments,
+        currentComments,
+      },
+    });
+  } catch (error) {
+    logger.error("Error fetching snapshot:", error);
+    return commonErrors.serverError("Failed to fetch snapshot");
+  }
+}
diff --git a/apps/web/src/app/api/monitor/lab/types.ts b/apps/web/src/app/api/monitor/lab/types.ts
new file mode 100644
index 000000000..f2904604a
--- /dev/null
+++ b/apps/web/src/app/api/monitor/lab/types.ts
@@ -0,0 +1,10 @@
+/**
+ * Shared types for Lab API routes
+ */
+
+/**
+ * Next.js 15 dynamic route params for routes with [id] segment
+ */
+export interface RouteIdParams {
+  params: Promise<{ id: string }>;
+}
diff --git a/apps/web/src/app/monitor/client-layout.tsx b/apps/web/src/app/monitor/client-layout.tsx
index be6fd5bd5..162019273 100644
--- a/apps/web/src/app/monitor/client-layout.tsx
+++ b/apps/web/src/app/monitor/client-layout.tsx
@@ -55,6 +55,12 @@ export default function MonitorLayout({ children }: MonitorLayoutProps) {
                 >
                   Docs
                 </Link>
+                <Link
+                  href="/monitor/lab"
+                  className={getLinkClass("/monitor/lab")}
+                >
+                  Lab
+                </Link>
               </nav>
             </div>
             <Link
diff --git a/apps/web/src/app/monitor/lab/components/baselines/BaselineCard.tsx b/apps/web/src/app/monitor/lab/components/baselines/BaselineCard.tsx
new file mode 100644
index 000000000..587df168c
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/baselines/BaselineCard.tsx
@@ -0,0 +1,49 @@
+"use client";
+
+import { TrashIcon } from "@heroicons/react/24/outline";
+import { formatDate } from "../../utils/formatters";
+import type { Baseline } from "../../types";
+
+interface BaselineCardProps {
+  baseline: Baseline;
+  isSelected: boolean;
+  onSelect: () => void;
+  onDelete: () => void;
+}
+
+export function BaselineCard({ baseline, isSelected, onSelect, onDelete }: BaselineCardProps) {
+  return (
+    <div
+      onClick={onSelect}
+      className={`bg-white rounded-lg shadow p-4 cursor-pointer transition-colors ${
+        isSelected ? "ring-2 ring-blue-500 bg-blue-50" : "hover:bg-gray-50"
+      }`}
+    >
+      <div className="flex items-center justify-between">
+        <div className="flex-1 min-w-0">
+          <h3 className="font-medium text-gray-900 truncate">{baseline.name}</h3>
+          {baseline.description && (
+            <p className="text-sm text-gray-500 truncate">{baseline.description}</p>
+          )}
+          <div className="mt-1 flex items-center space-x-4 text-xs text-gray-400">
+            <span>{baseline.snapshotCount} documents</span>
+            <span>{formatDate(baseline.createdAt)}</span>
+            {baseline.commitHash && (
+              <span className="font-mono">{baseline.commitHash.slice(0, 7)}</span>
+            )}
+          </div>
+        </div>
+        <button
+          onClick={(e) => {
+            e.stopPropagation();
+            onDelete();
+          }}
+          className="p-2 text-gray-400 hover:text-red-600 transition-colors"
+          title="Delete baseline"
+        >
+          <TrashIcon className="h-4 w-4" />
+        </button>
+      </div>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/baselines/BaselineList.tsx b/apps/web/src/app/monitor/lab/components/baselines/BaselineList.tsx
new file mode 100644
index 000000000..cfd437782
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/baselines/BaselineList.tsx
@@ -0,0 +1,27 @@
+"use client";
+
+import { BaselineCard } from "./BaselineCard";
+import type { Baseline } from "../../types";
+
+interface BaselineListProps {
+  baselines: Baseline[];
+  selectedId: string | null;
+  onSelect: (baseline: Baseline) => void;
+  onDelete: (id: string) => void;
+}
+
+export function BaselineList({ baselines, selectedId, onSelect, onDelete }: BaselineListProps) {
+  return (
+    <div className="space-y-2">
+      {baselines.map((baseline) => (
+        <BaselineCard
+          key={baseline.id}
+          baseline={baseline}
+          isSelected={baseline.id === selectedId}
+          onSelect={() => onSelect(baseline)}
+          onDelete={() => onDelete(baseline.id)}
+        />
+      ))}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/baselines/CreateBaselineModal.tsx b/apps/web/src/app/monitor/lab/components/baselines/CreateBaselineModal.tsx
new file mode 100644
index 000000000..ea27c66c7
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/baselines/CreateBaselineModal.tsx
@@ -0,0 +1,359 @@
+"use client";
+
+import { useState, useEffect, useCallback } from "react";
+import { XMarkIcon, MagnifyingGlassIcon, ChevronDownIcon, ChevronRightIcon } from "@heroicons/react/24/outline";
+import type { CorpusDocument, EvaluationVersionSummary } from "../../types";
+import { truncate } from "../../utils/formatters";
+
+interface CreateBaselineModalProps {
+  agentId: string;
+  onClose: () => void;
+  onCreated: () => void;
+}
+
+function getDefaultName(): string {
+  const now = new Date();
+  const date = now.toISOString().split("T")[0];
+  return `Baseline ${date}`;
+}
+
+function formatDate(dateStr: string): string {
+  return new Date(dateStr).toLocaleString();
+}
+
+export function CreateBaselineModal({ agentId, onClose, onCreated }: CreateBaselineModalProps) {
+  const [name, setName] = useState(getDefaultName);
+  const [description, setDescription] = useState("");
+  const [searchQuery, setSearchQuery] = useState("");
+  const [documents, setDocuments] = useState<CorpusDocument[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [creating, setCreating] = useState(false);
+
+  // Selected version per document: Map<documentId, versionId>
+  const [selectedVersions, setSelectedVersions] = useState<Map<string, string>>(new Map());
+
+  // Expanded documents for viewing versions
+  const [expandedDocId, setExpandedDocId] = useState<string | null>(null);
+  const [versions, setVersions] = useState<EvaluationVersionSummary[]>([]);
+  const [loadingVersions, setLoadingVersions] = useState(false);
+
+  // Cache versions per document to avoid refetching
+  const [versionsCache, setVersionsCache] = useState<Map<string, EvaluationVersionSummary[]>>(new Map());
+
+  const fetchDocuments = useCallback(async (filter?: string) => {
+    setLoading(true);
+    try {
+      const params = new URLSearchParams({ agentId });
+      if (filter) params.set("filter", filter);
+      const res = await fetch(`/api/monitor/lab/corpus?${params}`);
+      if (res.ok) {
+        const data = await res.json();
+        setDocuments(data.documents);
+      }
+    } finally {
+      setLoading(false);
+    }
+  }, [agentId]);
+
+  const fetchVersions = useCallback(async (documentId: string): Promise<EvaluationVersionSummary[]> => {
+    // Check cache first
+    const cached = versionsCache.get(documentId);
+    if (cached) return cached;
+
+    setLoadingVersions(true);
+    try {
+      const params = new URLSearchParams({ agentId, documentId });
+      const res = await fetch(`/api/monitor/lab/corpus/versions?${params}`);
+      if (res.ok) {
+        const data = await res.json();
+        const fetchedVersions = data.versions as EvaluationVersionSummary[];
+        setVersionsCache(prev => new Map(prev).set(documentId, fetchedVersions));
+        return fetchedVersions;
+      }
+    } finally {
+      setLoadingVersions(false);
+    }
+    return [];
+  }, [agentId, versionsCache]);
+
+  useEffect(() => {
+    void fetchDocuments();
+  }, [fetchDocuments]);
+
+  const handleSearch = () => {
+    void fetchDocuments(searchQuery || undefined);
+  };
+
+  const toggleDocument = async (docId: string) => {
+    const newSelected = new Map(selectedVersions);
+    if (newSelected.has(docId)) {
+      // Deselect document
+      newSelected.delete(docId);
+    } else {
+      // Select document - auto-select latest version
+      const docVersions = versionsCache.get(docId) || await fetchVersions(docId);
+      if (docVersions.length > 0) {
+        newSelected.set(docId, docVersions[0].id); // Latest version (ordered desc)
+      }
+    }
+    setSelectedVersions(newSelected);
+  };
+
+  const selectVersion = (docId: string, versionId: string) => {
+    const newSelected = new Map(selectedVersions);
+    newSelected.set(docId, versionId);
+    setSelectedVersions(newSelected);
+  };
+
+  const toggleExpand = async (docId: string) => {
+    if (expandedDocId === docId) {
+      setExpandedDocId(null);
+      setVersions([]);
+    } else {
+      setExpandedDocId(docId);
+      const docVersions = await fetchVersions(docId);
+      setVersions(docVersions);
+    }
+  };
+
+  const handleSelectAll = async () => {
+    const newSelected = new Map<string, string>();
+    for (const doc of documents) {
+      const docVersions = versionsCache.get(doc.documentId) || await fetchVersions(doc.documentId);
+      if (docVersions.length > 0) {
+        newSelected.set(doc.documentId, docVersions[0].id);
+      }
+    }
+    setSelectedVersions(newSelected);
+  };
+
+  const handleSelectNone = () => {
+    setSelectedVersions(new Map());
+  };
+
+  const handleCreate = async () => {
+    if (!name.trim() || selectedVersions.size === 0) return;
+    setCreating(true);
+    try {
+      const res = await fetch("/api/monitor/lab/baselines", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          agentId,
+          name: name.trim(),
+          description: description.trim() || undefined,
+          evaluationVersionIds: Array.from(selectedVersions.values()),
+        }),
+      });
+      if (res.ok) {
+        onCreated();
+      }
+    } finally {
+      setCreating(false);
+    }
+  };
+
+  return (
+    <div className="fixed inset-0 bg-black/50 flex items-center justify-center z-50">
+      <div className="bg-white rounded-lg shadow-xl w-full max-w-2xl max-h-[80vh] flex flex-col">
+        {/* Header */}
+        <div className="flex items-center justify-between px-6 py-4 border-b">
+          <h2 className="text-lg font-semibold text-gray-900">Create Validation Baseline</h2>
+          <button onClick={onClose} className="text-gray-400 hover:text-gray-600">
+            <XMarkIcon className="h-5 w-5" />
+          </button>
+        </div>
+
+        {/* Content */}
+        <div className="flex-1 overflow-y-auto p-6 space-y-4">
+          {/* Name & Description */}
+          <div className="space-y-3">
+            <div>
+              <label className="block text-sm font-medium text-gray-700 mb-1">Name</label>
+              <input
+                type="text"
+                value={name}
+                onChange={(e) => setName(e.target.value)}
+                placeholder="e.g., Pre-refactor baseline"
+                className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
+              />
+            </div>
+            <div>
+              <label className="block text-sm font-medium text-gray-700 mb-1">
+                Description (optional)
+              </label>
+              <input
+                type="text"
+                value={description}
+                onChange={(e) => setDescription(e.target.value)}
+                placeholder="e.g., Baseline before filter changes"
+                className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
+              />
+            </div>
+          </div>
+
+          {/* Document Selection */}
+          <div>
+            <div className="flex items-center justify-between mb-2">
+              <label className="block text-sm font-medium text-gray-700">
+                Select Documents ({selectedVersions.size} selected)
+              </label>
+              <div className="space-x-2">
+                <button
+                  onClick={() => void handleSelectAll()}
+                  className="text-xs text-blue-600 hover:underline"
+                >
+                  Select All
+                </button>
+                <button
+                  onClick={handleSelectNone}
+                  className="text-xs text-blue-600 hover:underline"
+                >
+                  Select None
+                </button>
+              </div>
+            </div>
+
+            {/* Search */}
+            <div className="flex items-center space-x-2 mb-3">
+              <div className="relative flex-1">
+                <MagnifyingGlassIcon className="absolute left-3 top-1/2 -translate-y-1/2 h-4 w-4 text-gray-400" />
+                <input
+                  type="text"
+                  value={searchQuery}
+                  onChange={(e) => setSearchQuery(e.target.value)}
+                  onKeyDown={(e) => e.key === "Enter" && handleSearch()}
+                  placeholder="Search documents..."
+                  className="w-full pl-9 pr-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500 text-sm"
+                />
+              </div>
+              <button
+                onClick={handleSearch}
+                className="px-3 py-2 bg-gray-100 text-gray-700 rounded-md hover:bg-gray-200 text-sm"
+              >
+                Search
+              </button>
+            </div>
+
+            <p className="text-xs text-gray-500 mb-2">
+              Expand a document to select a specific version. By default, the latest version is used.
+            </p>
+
+            {/* Document List */}
+            <div className="border rounded-md max-h-72 overflow-y-auto">
+              {loading ? (
+                <div className="p-4 text-center text-gray-500">Loading documents...</div>
+              ) : documents.length === 0 ? (
+                <div className="p-4 text-center text-gray-500">No documents found</div>
+              ) : (
+                <div className="divide-y">
+                  {documents.map((doc) => {
+                    const isSelected = selectedVersions.has(doc.documentId);
+                    const selectedVersionId = selectedVersions.get(doc.documentId);
+
+                    return (
+                      <div key={doc.documentId}>
+                        <div className="flex items-center p-3 hover:bg-gray-50">
+                          {/* Expand button */}
+                          <button
+                            onClick={() => void toggleExpand(doc.documentId)}
+                            className="mr-2 text-gray-400 hover:text-gray-600"
+                          >
+                            {expandedDocId === doc.documentId ? (
+                              <ChevronDownIcon className="h-4 w-4" />
+                            ) : (
+                              <ChevronRightIcon className="h-4 w-4" />
+                            )}
+                          </button>
+
+                          {/* Checkbox */}
+                          <input
+                            type="checkbox"
+                            checked={isSelected}
+                            onChange={() => void toggleDocument(doc.documentId)}
+                            className="h-4 w-4 text-blue-600 rounded border-gray-300"
+                          />
+
+                          {/* Document info */}
+                          <div className="ml-3 flex-1 min-w-0">
+                            <p className="text-sm font-medium text-gray-900 truncate">
+                              {truncate(doc.title, 60)}
+                            </p>
+                            <p className="text-xs text-gray-500">
+                              {doc.evaluationCount} evaluations
+                              {isSelected && selectedVersionId && (
+                                <span className="ml-2 text-blue-600">
+                                  (version selected)
+                                </span>
+                              )}
+                            </p>
+                          </div>
+                        </div>
+
+                        {/* Expanded versions */}
+                        {expandedDocId === doc.documentId && (
+                          <div className="bg-gray-50 px-10 py-2 border-t border-gray-100">
+                            {loadingVersions ? (
+                              <p className="text-xs text-gray-500">Loading versions...</p>
+                            ) : versions.length === 0 ? (
+                              <p className="text-xs text-gray-500">No versions found</p>
+                            ) : (
+                              <div className="space-y-1">
+                                <p className="text-xs font-medium text-gray-600 mb-2">
+                                  Select version ({versions.length} available):
+                                </p>
+                                {versions.map((v, idx) => (
+                                  <label
+                                    key={v.id}
+                                    className="flex items-center gap-2 text-xs text-gray-600 cursor-pointer hover:bg-gray-100 p-1 rounded"
+                                  >
+                                    <input
+                                      type="radio"
+                                      name={`version-${doc.documentId}`}
+                                      checked={selectedVersionId === v.id}
+                                      onChange={() => selectVersion(doc.documentId, v.id)}
+                                      className="h-3 w-3 text-blue-600 border-gray-300"
+                                    />
+                                    <span className="font-mono text-gray-400">v{v.version}</span>
+                                    <span>{formatDate(v.createdAt)}</span>
+                                    {v.grade !== null && (
+                                      <span className="text-gray-500">Grade: {v.grade}</span>
+                                    )}
+                                    {idx === 0 && (
+                                      <span className="text-blue-500 text-[10px]">(latest)</span>
+                                    )}
+                                  </label>
+                                ))}
+                              </div>
+                            )}
+                          </div>
+                        )}
+                      </div>
+                    );
+                  })}
+                </div>
+              )}
+            </div>
+          </div>
+        </div>
+
+        {/* Footer */}
+        <div className="flex items-center justify-end space-x-3 px-6 py-4 border-t bg-gray-50">
+          <button
+            onClick={onClose}
+            className="px-4 py-2 text-gray-700 hover:text-gray-900"
+          >
+            Cancel
+          </button>
+          <button
+            onClick={() => void handleCreate()}
+            disabled={!name.trim() || selectedVersions.size === 0 || creating}
+            className="px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700 disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium"
+          >
+            {creating ? "Creating..." : `Create Baseline (${selectedVersions.size} docs)`}
+          </button>
+        </div>
+      </div>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/evaluations/AllEvaluationsList.tsx b/apps/web/src/app/monitor/lab/components/evaluations/AllEvaluationsList.tsx
new file mode 100644
index 000000000..8bd39a159
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/evaluations/AllEvaluationsList.tsx
@@ -0,0 +1,200 @@
+"use client";
+
+import { useState } from "react";
+import {
+  ChevronDownIcon,
+  ChevronRightIcon,
+  ArrowPathIcon,
+  DocumentTextIcon,
+  ClockIcon,
+  CurrencyDollarIcon,
+} from "@heroicons/react/24/outline";
+import type { EvaluationWithTelemetry } from "../../hooks/useAllEvaluations";
+import { PipelineView } from "../snapshots/PipelineView";
+import { formatDuration, formatCost } from "../snapshots/pipelineUtils";
+
+interface AllEvaluationsListProps {
+  evaluations: EvaluationWithTelemetry[];
+  loading: boolean;
+  error: string | null;
+  onRefresh: () => void;
+}
+
+export function AllEvaluationsList({
+  evaluations,
+  loading,
+  error,
+  onRefresh,
+}: AllEvaluationsListProps) {
+  const [expandedId, setExpandedId] = useState<string | null>(null);
+
+  const toggleExpanded = (id: string) => {
+    setExpandedId(expandedId === id ? null : id);
+  };
+
+  if (loading && evaluations.length === 0) {
+    return (
+      <div className="flex items-center justify-center p-8 text-gray-500">
+        <ArrowPathIcon className="w-5 h-5 animate-spin mr-2" />
+        Loading evaluations...
+      </div>
+    );
+  }
+
+  if (error) {
+    return (
+      <div className="p-4 bg-red-50 border border-red-200 rounded-lg">
+        <p className="text-red-700">Error: {error}</p>
+        <button
+          onClick={onRefresh}
+          className="mt-2 text-sm text-red-600 hover:text-red-800 underline"
+        >
+          Try again
+        </button>
+      </div>
+    );
+  }
+
+  if (evaluations.length === 0) {
+    return (
+      <div className="p-8 text-center text-gray-500">
+        <DocumentTextIcon className="w-12 h-12 mx-auto mb-3 text-gray-300" />
+        <p>No evaluations with telemetry found.</p>
+        <p className="text-sm mt-1">Run some evaluations in the main UI to see them here.</p>
+      </div>
+    );
+  }
+
+  return (
+    <div className="space-y-2">
+      {/* Header with refresh button */}
+      <div className="flex items-center justify-between mb-4">
+        <span className="text-sm text-gray-500">
+          {evaluations.length} recent evaluation{evaluations.length !== 1 ? "s" : ""}
+        </span>
+        <button
+          onClick={onRefresh}
+          disabled={loading}
+          className="flex items-center gap-1 px-2 py-1 text-sm text-gray-600 hover:text-gray-900 disabled:opacity-50"
+        >
+          <ArrowPathIcon className={`w-4 h-4 ${loading ? "animate-spin" : ""}`} />
+          Refresh
+        </button>
+      </div>
+
+      {/* Evaluation list */}
+      <div className="border rounded-lg divide-y">
+        {evaluations.map((evaluation) => (
+          <EvaluationRow
+            key={evaluation.id}
+            evaluation={evaluation}
+            isExpanded={expandedId === evaluation.id}
+            onToggle={() => toggleExpanded(evaluation.id)}
+          />
+        ))}
+      </div>
+    </div>
+  );
+}
+
+interface EvaluationRowProps {
+  evaluation: EvaluationWithTelemetry;
+  isExpanded: boolean;
+  onToggle: () => void;
+}
+
+function EvaluationRow({ evaluation, isExpanded, onToggle }: EvaluationRowProps) {
+  const telemetry = evaluation.telemetry;
+  const hasTelemetry = telemetry && (telemetry.stages || telemetry.extractionPhase);
+
+  // Format date
+  const date = new Date(evaluation.createdAt);
+  const formattedDate = date.toLocaleDateString("en-US", {
+    month: "short",
+    day: "numeric",
+    hour: "2-digit",
+    minute: "2-digit",
+  });
+
+  return (
+    <div className="bg-white">
+      {/* Row header - always visible */}
+      <button
+        onClick={onToggle}
+        disabled={!hasTelemetry}
+        className={`w-full px-4 py-3 flex items-center gap-3 text-left transition-colors ${
+          hasTelemetry
+            ? "hover:bg-gray-50 cursor-pointer"
+            : "cursor-default opacity-60"
+        }`}
+      >
+        {/* Expand icon */}
+        <div className="flex-shrink-0 w-5 h-5 text-gray-400">
+          {hasTelemetry ? (
+            isExpanded ? (
+              <ChevronDownIcon className="w-5 h-5" />
+            ) : (
+              <ChevronRightIcon className="w-5 h-5" />
+            )
+          ) : (
+            <div className="w-5 h-5" />
+          )}
+        </div>
+
+        {/* Document title */}
+        <div className="flex-1 min-w-0">
+          <p className="font-medium text-gray-900 truncate">
+            {evaluation.documentTitle}
+          </p>
+          <p className="text-xs text-gray-500 mt-0.5">
+            {evaluation.agentName} &middot; v{evaluation.version}
+          </p>
+        </div>
+
+        {/* Quick stats */}
+        <div className="flex items-center gap-4 text-xs text-gray-500">
+          {/* Comments count */}
+          <span className="flex items-center gap-1">
+            <DocumentTextIcon className="w-4 h-4" />
+            {evaluation.comments.length} comment{evaluation.comments.length !== 1 ? "s" : ""}
+          </span>
+
+          {/* Duration */}
+          {telemetry?.totalDurationMs && (
+            <span className="flex items-center gap-1">
+              <ClockIcon className="w-4 h-4" />
+              {formatDuration(telemetry.totalDurationMs)}
+            </span>
+          )}
+
+          {/* Cost */}
+          {telemetry?.totalCostUsd && (
+            <span className="flex items-center gap-1">
+              <CurrencyDollarIcon className="w-4 h-4" />
+              {formatCost(telemetry.totalCostUsd)}
+            </span>
+          )}
+
+          {/* Date */}
+          <span className="w-24 text-right">{formattedDate}</span>
+        </div>
+      </button>
+
+      {/* Expanded content - PipelineView */}
+      {isExpanded && hasTelemetry && (
+        <div className="px-4 pb-4 pt-2 border-t bg-gray-50">
+          <PipelineView
+            extraction={telemetry.extractionPhase}
+            counts={telemetry.pipelineCounts}
+            filteredItems={telemetry.filteredItems ?? []}
+            passedItems={telemetry.passedItems}
+            stages={telemetry.stages}
+            totalDurationMs={telemetry.totalDurationMs}
+            finalComments={evaluation.comments}
+            lostComments={[]}
+          />
+        </div>
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/history/RunDetail.tsx b/apps/web/src/app/monitor/lab/components/history/RunDetail.tsx
new file mode 100644
index 000000000..807c23fa5
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/history/RunDetail.tsx
@@ -0,0 +1,131 @@
+"use client";
+
+import { useState, useEffect } from "react";
+import { ChevronRightIcon, ChevronLeftIcon } from "@heroicons/react/24/outline";
+import { formatDate } from "../../utils/formatters";
+import type { ValidationRunDetail, RunSnapshot } from "../../types";
+import { SnapshotComparison } from "../snapshots/SnapshotComparison";
+
+interface RunDetailProps {
+  runId: string;
+}
+
+interface SnapshotRowProps {
+  snapshot: RunSnapshot;
+  onClick: () => void;
+}
+
+export function RunDetail({ runId }: RunDetailProps) {
+  const [run, setRun] = useState<ValidationRunDetail | null>(null);
+  const [loading, setLoading] = useState(true);
+  const [selectedSnapshot, setSelectedSnapshot] = useState<RunSnapshot | null>(null);
+
+  useEffect(() => {
+    const fetchRun = async () => {
+      setLoading(true);
+      try {
+        const res = await fetch(`/api/monitor/lab/runs/${runId}`);
+        if (res.ok) {
+          const data = await res.json();
+          setRun(data.run);
+        }
+      } finally {
+        setLoading(false);
+      }
+    };
+    void fetchRun();
+  }, [runId]);
+
+  if (loading) {
+    return <div className="p-4 text-gray-500">Loading run details...</div>;
+  }
+
+  if (!run) {
+    return <div className="p-4 text-gray-500">Run not found</div>;
+  }
+
+  if (selectedSnapshot) {
+    return (
+      <div className="p-4">
+        <button
+          onClick={() => setSelectedSnapshot(null)}
+          className="flex items-center gap-1 text-sm text-blue-600 hover:text-blue-700 mb-4"
+        >
+          <ChevronLeftIcon className="h-4 w-4" />
+          Back to snapshots
+        </button>
+        <SnapshotComparison snapshot={selectedSnapshot} onBack={() => setSelectedSnapshot(null)} />
+      </div>
+    );
+  }
+
+  const changedSnapshots = run.snapshots.filter((s) => s.status === "changed");
+  const unchangedSnapshots = run.snapshots.filter((s) => s.status === "unchanged");
+
+  return (
+    <div className="p-4 space-y-4">
+      {/* Summary */}
+      <div className="flex items-center gap-4 text-sm">
+        <span className="text-gray-500">{formatDate(run.createdAt)}</span>
+        {run.summary && <span className="text-gray-700">{run.summary}</span>}
+      </div>
+
+      {/* Changed First */}
+      {changedSnapshots.length > 0 && (
+        <div>
+          <h4 className="text-sm font-medium text-orange-600 mb-2">
+            Changed ({changedSnapshots.length})
+          </h4>
+          <div className="space-y-2">
+            {changedSnapshots.map((snapshot) => (
+              <SnapshotRow
+                key={snapshot.id}
+                snapshot={snapshot}
+                onClick={() => setSelectedSnapshot(snapshot)}
+              />
+            ))}
+          </div>
+        </div>
+      )}
+
+      {/* Unchanged */}
+      {unchangedSnapshots.length > 0 && (
+        <div>
+          <h4 className="text-sm font-medium text-green-600 mb-2">
+            Unchanged ({unchangedSnapshots.length})
+          </h4>
+          <div className="space-y-2">
+            {unchangedSnapshots.map((snapshot) => (
+              <SnapshotRow
+                key={snapshot.id}
+                snapshot={snapshot}
+                onClick={() => setSelectedSnapshot(snapshot)}
+              />
+            ))}
+          </div>
+        </div>
+      )}
+
+      {run.snapshots.length === 0 && (
+        <div className="text-gray-500 text-sm">No snapshots in this run</div>
+      )}
+    </div>
+  );
+}
+
+function SnapshotRow({ snapshot, onClick }: SnapshotRowProps) {
+  return (
+    <div
+      onClick={onClick}
+      className="flex items-center justify-between p-3 bg-gray-50 rounded-md hover:bg-gray-100 cursor-pointer"
+    >
+      <div className="flex-1 min-w-0">
+        <p className="text-sm font-medium text-gray-900 truncate">{snapshot.documentTitle}</p>
+        <p className="text-xs text-gray-500">
+          {snapshot.keptCount} kept · {snapshot.newCount} new · {snapshot.lostCount} lost
+        </p>
+      </div>
+      <ChevronRightIcon className="h-4 w-4 text-gray-400" />
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ExtractorEditor.tsx b/apps/web/src/app/monitor/lab/components/profiles/ExtractorEditor.tsx
new file mode 100644
index 000000000..a8fefffff
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ExtractorEditor.tsx
@@ -0,0 +1,80 @@
+"use client";
+
+import { useState } from "react";
+import { PlusIcon } from "@heroicons/react/24/outline";
+import type { ExtractorConfig } from "../../types";
+import { useModels, type ModelInfo } from "../../hooks/useModels";
+import { ModelConfigurator } from "./ModelConfigurator";
+import { ModelSelector } from "./ModelSelector";
+
+interface ExtractorEditorProps {
+  extractors: ExtractorConfig[];
+  onChange: (extractors: ExtractorConfig[]) => void;
+  disabled?: boolean;
+}
+
+export function ExtractorEditor({ extractors, onChange, disabled }: ExtractorEditorProps) {
+  const { models, loading: modelsLoading } = useModels();
+  const [addingExtractor, setAddingExtractor] = useState(false);
+
+  const updateExtractor = (index: number, updates: Partial<ExtractorConfig>) => {
+    const newExtractors = [...extractors];
+    newExtractors[index] = { ...newExtractors[index], ...updates };
+    onChange(newExtractors);
+  };
+
+  const removeExtractor = (index: number) => {
+    if (extractors.length <= 1) return;
+    onChange(extractors.filter((_, i) => i !== index));
+  };
+
+  const addExtractor = (model: ModelInfo) => {
+    onChange([
+      ...extractors,
+      { model: model.id, temperature: "default", thinking: false },
+    ]);
+    setAddingExtractor(false);
+  };
+
+  return (
+    <div className="space-y-3">
+      {extractors.map((ext, index) => (
+        <ModelConfigurator
+          key={index}
+          config={ext}
+          onChange={(updates) => updateExtractor(index, updates)}
+          disabled={disabled}
+          label={index + 1}
+          colorTheme="blue"
+          showProvider={true}
+          showDelete={true}
+          onDelete={() => removeExtractor(index)}
+          deleteDisabled={extractors.length <= 1}
+          deleteDisabledReason="Cannot remove last extractor"
+        />
+      ))}
+
+      {/* Add Extractor Button / Model Selector */}
+      {!disabled && (
+        addingExtractor ? (
+          <div className="relative">
+            <ModelSelector
+              models={models}
+              loading={modelsLoading}
+              onSelect={addExtractor}
+              onCancel={() => setAddingExtractor(false)}
+            />
+          </div>
+        ) : (
+          <button
+            onClick={() => setAddingExtractor(true)}
+            className="flex items-center gap-2 px-3 py-2 text-blue-600 hover:bg-blue-50 rounded-md text-sm w-full border border-dashed border-blue-200"
+          >
+            <PlusIcon className="h-4 w-4" />
+            Add Extractor
+          </button>
+        )
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/FilterChainEditor.tsx b/apps/web/src/app/monitor/lab/components/profiles/FilterChainEditor.tsx
new file mode 100644
index 000000000..b1691c060
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/FilterChainEditor.tsx
@@ -0,0 +1,491 @@
+"use client";
+
+import { useState } from "react";
+import {
+  ChevronUpIcon,
+  ChevronDownIcon,
+  TrashIcon,
+  PlusIcon,
+  ChevronRightIcon,
+} from "@heroicons/react/24/outline";
+import type {
+  FilterChainItem,
+  SupportedElsewhereFilterConfig,
+  PrincipleOfCharityFilterConfig,
+  SeverityFilterConfig,
+  ConfidenceFilterConfig,
+} from "../../types";
+import { AVAILABLE_FILTER_TYPES } from "../../types";
+import { ModelConfigurator } from "./ModelConfigurator";
+import { getModelDisplayName } from "./ModelSelector";
+
+interface FilterChainEditorProps {
+  filters: FilterChainItem[];
+  onChange: (filters: FilterChainItem[]) => void;
+  disabled?: boolean;
+  defaultFilterPrompt?: string;
+}
+
+export function FilterChainEditor({
+  filters,
+  onChange,
+  disabled,
+  defaultFilterPrompt,
+}: FilterChainEditorProps) {
+  const [showAddMenu, setShowAddMenu] = useState(false);
+
+  const moveFilter = (index: number, direction: "up" | "down") => {
+    if (disabled) return;
+    const newFilters = [...filters];
+    const newIndex = direction === "up" ? index - 1 : index + 1;
+    if (newIndex < 0 || newIndex >= filters.length) return;
+    [newFilters[index], newFilters[newIndex]] = [newFilters[newIndex], newFilters[index]];
+    onChange(newFilters);
+  };
+
+  const removeFilter = (index: number) => {
+    if (disabled) return;
+    onChange(filters.filter((_, i) => i !== index));
+  };
+
+  const updateFilter = (index: number, updates: Partial<FilterChainItem>) => {
+    if (disabled) return;
+    const newFilters = [...filters];
+    newFilters[index] = { ...newFilters[index], ...updates } as FilterChainItem;
+    onChange(newFilters);
+  };
+
+  const toggleFilter = (index: number) => {
+    updateFilter(index, { enabled: !filters[index].enabled });
+  };
+
+  const addFilter = (type: FilterChainItem["type"]) => {
+    if (disabled) return;
+    const id = `filter-${Date.now()}`;
+    let newFilter: FilterChainItem;
+
+    switch (type) {
+      case "principle-of-charity":
+        newFilter = {
+          id,
+          type: "principle-of-charity",
+          enabled: true,
+          model: "claude-sonnet-4-5-20250929",
+          temperature: 0.2,
+        };
+        break;
+      case "supported-elsewhere":
+        newFilter = {
+          id,
+          type: "supported-elsewhere",
+          enabled: true,
+          model: "claude-sonnet-4-5-20250929",
+          temperature: 0.1,
+        };
+        break;
+      case "severity":
+        newFilter = {
+          id,
+          type: "severity",
+          enabled: true,
+          minSeverity: 50,
+        };
+        break;
+      case "confidence":
+        newFilter = {
+          id,
+          type: "confidence",
+          enabled: true,
+          minConfidence: 50,
+        };
+        break;
+      default: {
+        const _exhaustiveCheck: never = type;
+        throw new Error(`Unknown filter type: ${_exhaustiveCheck}`);
+      }
+    }
+
+    onChange([...filters, newFilter]);
+    setShowAddMenu(false);
+  };
+
+  return (
+    <div className="space-y-3">
+      <div className="flex items-center justify-between">
+        <div>
+          <label className="text-sm font-medium text-gray-700">Filter Chain</label>
+          <p className="text-xs text-gray-500">
+            Filters run in sequence. Each filter can remove issues from the pipeline.
+          </p>
+        </div>
+      </div>
+
+      {/* Filter List */}
+      <div className="space-y-2">
+        {filters.length === 0 ? (
+          <div className="p-4 text-center text-sm text-gray-500 bg-gray-50 rounded-lg border border-dashed border-gray-300">
+            No filters configured. Add a filter to remove false positives.
+          </div>
+        ) : (
+          filters.map((filter, index) => (
+            <FilterItemEditor
+              key={filter.id}
+              filter={filter}
+              index={index}
+              totalFilters={filters.length}
+              disabled={disabled}
+              defaultFilterPrompt={defaultFilterPrompt}
+              onMove={(dir) => moveFilter(index, dir)}
+              onRemove={() => removeFilter(index)}
+              onUpdate={(updates) => updateFilter(index, updates)}
+              onToggle={() => toggleFilter(index)}
+            />
+          ))
+        )}
+      </div>
+
+      {/* Add Filter Button */}
+      {!disabled && (
+        <div className="relative">
+          <button
+            type="button"
+            onClick={() => setShowAddMenu(!showAddMenu)}
+            className="flex items-center gap-1 px-3 py-1.5 text-sm text-orange-600 hover:bg-orange-50 rounded-md border border-orange-200"
+          >
+            <PlusIcon className="h-4 w-4" />
+            Add Filter
+          </button>
+
+          {showAddMenu && (
+            <div className="absolute z-10 left-0 top-full mt-1 w-72 bg-white border rounded-lg shadow-lg overflow-hidden">
+              <div className="p-2 border-b bg-gray-50">
+                <span className="text-xs font-medium text-gray-500">Available Filters</span>
+              </div>
+              {AVAILABLE_FILTER_TYPES.map((filterType) => (
+                <button
+                  key={filterType.type}
+                  onClick={() => addFilter(filterType.type)}
+                  className="w-full px-3 py-2 text-left hover:bg-orange-50"
+                >
+                  <div className="text-sm font-medium text-gray-900">{filterType.label}</div>
+                  <div className="text-xs text-gray-500">{filterType.description}</div>
+                </button>
+              ))}
+              <div className="p-2 border-t">
+                <button
+                  onClick={() => setShowAddMenu(false)}
+                  className="text-xs text-gray-500 hover:text-gray-700"
+                >
+                  Cancel
+                </button>
+              </div>
+            </div>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
+interface FilterItemEditorProps {
+  filter: FilterChainItem;
+  index: number;
+  totalFilters: number;
+  disabled?: boolean;
+  defaultFilterPrompt?: string;
+  onMove: (direction: "up" | "down") => void;
+  onRemove: () => void;
+  onUpdate: (updates: Partial<FilterChainItem>) => void;
+  onToggle: () => void;
+}
+
+function FilterItemEditor({
+  filter,
+  index,
+  totalFilters,
+  disabled,
+  defaultFilterPrompt,
+  onMove,
+  onRemove,
+  onUpdate,
+  onToggle,
+}: FilterItemEditorProps) {
+  const [isExpanded, setIsExpanded] = useState(false);
+
+  const filterLabel = AVAILABLE_FILTER_TYPES.find((f) => f.type === filter.type)?.label || filter.type;
+
+  return (
+    <div
+      className={`border rounded-lg ${
+        filter.enabled ? "border-orange-200 bg-orange-50/50" : "border-gray-200 bg-gray-50"
+      }`}
+    >
+      {/* Header Row */}
+      <div className="flex items-center gap-2 px-3 py-2">
+        {/* Order controls */}
+        {!disabled && (
+          <div className="flex flex-col">
+            <button
+              type="button"
+              onClick={() => onMove("up")}
+              disabled={index === 0}
+              className="p-0.5 text-gray-400 hover:text-gray-600 disabled:opacity-30 disabled:cursor-not-allowed"
+            >
+              <ChevronUpIcon className="h-3 w-3" />
+            </button>
+            <button
+              type="button"
+              onClick={() => onMove("down")}
+              disabled={index === totalFilters - 1}
+              className="p-0.5 text-gray-400 hover:text-gray-600 disabled:opacity-30 disabled:cursor-not-allowed"
+            >
+              <ChevronDownIcon className="h-3 w-3" />
+            </button>
+          </div>
+        )}
+
+        {/* Index badge */}
+        <span className="w-5 h-5 flex items-center justify-center text-xs font-medium bg-orange-200 text-orange-800 rounded">
+          {index + 1}
+        </span>
+
+        {/* Expand/collapse button */}
+        <button
+          type="button"
+          onClick={() => setIsExpanded(!isExpanded)}
+          className="flex items-center gap-2 flex-1 text-left"
+        >
+          <ChevronRightIcon
+            className={`h-4 w-4 text-gray-400 transition-transform ${isExpanded ? "rotate-90" : ""}`}
+          />
+          <span className="text-sm font-medium text-gray-900">{filterLabel}</span>
+          {filter.type === "principle-of-charity" && (
+            <span className="text-xs text-gray-500 font-mono">
+              {getModelDisplayName(filter.model)}
+            </span>
+          )}
+          {filter.type === "supported-elsewhere" && (
+            <span className="text-xs text-gray-500 font-mono">
+              {getModelDisplayName(filter.model)}
+            </span>
+          )}
+          {filter.type === "severity" && (
+            <span className="text-xs text-gray-500">
+              ≥ {filter.minSeverity}
+            </span>
+          )}
+          {filter.type === "confidence" && (
+            <span className="text-xs text-gray-500">
+              ≥ {filter.minConfidence}
+            </span>
+          )}
+        </button>
+
+        {/* Enable/Disable toggle */}
+        <button
+          type="button"
+          onClick={() => !disabled && onToggle()}
+          disabled={disabled}
+          className={`relative inline-flex h-5 w-9 flex-shrink-0 cursor-pointer rounded-full border-2 border-transparent transition-colors duration-200 ease-in-out focus:outline-none ${
+            filter.enabled ? "bg-orange-500" : "bg-gray-300"
+          } ${disabled ? "opacity-50 cursor-not-allowed" : ""}`}
+        >
+          <span
+            className={`pointer-events-none inline-block h-4 w-4 transform rounded-full bg-white shadow transition duration-200 ease-in-out ${
+              filter.enabled ? "translate-x-4" : "translate-x-0"
+            }`}
+          />
+        </button>
+
+        {/* Delete button */}
+        {!disabled && (
+          <button
+            type="button"
+            onClick={onRemove}
+            className="p-1 text-gray-400 hover:text-red-500"
+          >
+            <TrashIcon className="h-4 w-4" />
+          </button>
+        )}
+      </div>
+
+      {/* Expanded Settings */}
+      {isExpanded && (
+        <div className="px-3 pb-3 pt-1 border-t border-orange-100 overflow-visible">
+          {filter.type === "principle-of-charity" && (
+            <LLMFilterSettings
+              filter={filter}
+              disabled={disabled}
+              onUpdate={onUpdate}
+              description='Applies the "Principle of Charity" - interprets arguments in their strongest, most reasonable form before critiquing. Issues that dissolve under charitable interpretation are filtered out.'
+            />
+          )}
+          {filter.type === "supported-elsewhere" && (
+            <LLMFilterSettings
+              filter={filter}
+              disabled={disabled}
+              defaultPrompt={defaultFilterPrompt}
+              onUpdate={onUpdate}
+              description="Uses an LLM to check if each flagged issue is actually supported, explained, or qualified elsewhere in the document. Issues that are well-supported are filtered out."
+              showCustomPrompt
+            />
+          )}
+          {filter.type === "severity" && (
+            <SeveritySettings
+              filter={filter}
+              disabled={disabled}
+              onUpdate={onUpdate}
+            />
+          )}
+          {filter.type === "confidence" && (
+            <ConfidenceSettings
+              filter={filter}
+              disabled={disabled}
+              onUpdate={onUpdate}
+            />
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
+// ============================================================================
+// LLM Filter Settings (uses ModelConfigurator)
+// ============================================================================
+
+interface LLMFilterSettingsProps {
+  filter: SupportedElsewhereFilterConfig | PrincipleOfCharityFilterConfig;
+  disabled?: boolean;
+  defaultPrompt?: string;
+  onUpdate: (updates: Partial<SupportedElsewhereFilterConfig | PrincipleOfCharityFilterConfig>) => void;
+  description: string;
+  showCustomPrompt?: boolean;
+}
+
+function LLMFilterSettings({
+  filter,
+  disabled,
+  defaultPrompt,
+  onUpdate,
+  description,
+  showCustomPrompt,
+}: LLMFilterSettingsProps) {
+  const customPrompt = "customPrompt" in filter ? filter.customPrompt : undefined;
+
+  return (
+    <div className="space-y-3 text-sm">
+      <p className="text-xs text-gray-600">{description}</p>
+
+      {/* Model Configuration using ModelConfigurator */}
+      <ModelConfigurator
+        config={filter}
+        onChange={onUpdate}
+        disabled={disabled}
+        colorTheme="orange"
+        showProvider={true}
+        showDelete={false}
+      />
+
+      {/* Custom Prompt (only for supported-elsewhere filter) */}
+      {showCustomPrompt && (
+        <div className="space-y-1">
+          <div className="flex items-center justify-between">
+            <span className="text-xs text-gray-500">Custom Prompt</span>
+            {customPrompt && (
+              <button
+                onClick={() => !disabled && onUpdate({ customPrompt: undefined })}
+                disabled={disabled}
+                className="text-xs text-orange-600 hover:text-orange-700 disabled:opacity-50"
+              >
+                Reset to default
+              </button>
+            )}
+          </div>
+          <textarea
+            value={customPrompt || ""}
+            onChange={(e) => onUpdate({ customPrompt: e.target.value || undefined })}
+            disabled={disabled}
+            placeholder={
+              defaultPrompt ? "Using default prompt (click to customize)" : "Enter custom system prompt..."
+            }
+            className="w-full px-2 py-1.5 text-xs font-mono border rounded resize-y min-h-[60px] max-h-[200px] disabled:bg-gray-50 placeholder:text-gray-400"
+            rows={3}
+          />
+          {defaultPrompt && !customPrompt && (
+            <details className="text-xs">
+              <summary className="text-gray-500 cursor-pointer hover:text-gray-700">
+                View default prompt
+              </summary>
+              <pre className="mt-1 p-2 bg-gray-50 border rounded max-h-32 overflow-auto whitespace-pre-wrap text-gray-600">
+                {defaultPrompt}
+              </pre>
+            </details>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
+// ============================================================================
+// Simple Filter Settings
+// ============================================================================
+
+interface SeveritySettingsProps {
+  filter: SeverityFilterConfig;
+  disabled?: boolean;
+  onUpdate: (updates: Partial<SeverityFilterConfig>) => void;
+}
+
+function SeveritySettings({ filter, disabled, onUpdate }: SeveritySettingsProps) {
+  return (
+    <div className="space-y-3 text-sm">
+      <p className="text-xs text-gray-600">
+        Removes issues with a severity score below the threshold.
+      </p>
+      <div className="flex items-center gap-3">
+        <span className="text-xs text-gray-500 w-24">Min Severity</span>
+        <input
+          type="number"
+          min={0}
+          max={100}
+          value={filter.minSeverity}
+          onChange={(e) => onUpdate({ minSeverity: Number(e.target.value) })}
+          disabled={disabled}
+          className="w-20 px-2 py-1 text-sm border rounded disabled:bg-gray-50"
+        />
+        <span className="text-xs text-gray-400">(0-100)</span>
+      </div>
+    </div>
+  );
+}
+
+interface ConfidenceSettingsProps {
+  filter: ConfidenceFilterConfig;
+  disabled?: boolean;
+  onUpdate: (updates: Partial<ConfidenceFilterConfig>) => void;
+}
+
+function ConfidenceSettings({ filter, disabled, onUpdate }: ConfidenceSettingsProps) {
+  return (
+    <div className="space-y-3 text-sm">
+      <p className="text-xs text-gray-600">
+        Removes issues with a confidence score below the threshold.
+      </p>
+      <div className="flex items-center gap-3">
+        <span className="text-xs text-gray-500 w-24">Min Confidence</span>
+        <input
+          type="number"
+          min={0}
+          max={100}
+          value={filter.minConfidence}
+          onChange={(e) => onUpdate({ minConfidence: Number(e.target.value) })}
+          disabled={disabled}
+          className="w-20 px-2 py-1 text-sm border rounded disabled:bg-gray-50"
+        />
+        <span className="text-xs text-gray-400">(0-100)</span>
+      </div>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/JudgeEditor.tsx b/apps/web/src/app/monitor/lab/components/profiles/JudgeEditor.tsx
new file mode 100644
index 000000000..18cb09aa3
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/JudgeEditor.tsx
@@ -0,0 +1,56 @@
+"use client";
+
+import type { JudgeConfig } from "../../types";
+import { ModelConfigurator } from "./ModelConfigurator";
+
+interface JudgeEditorProps {
+  judge: JudgeConfig;
+  onChange: (judge: JudgeConfig) => void;
+  disabled?: boolean;
+}
+
+export function JudgeEditor({ judge, onChange, disabled }: JudgeEditorProps) {
+  const updateJudge = (updates: Partial<JudgeConfig>) => {
+    onChange({ ...judge, ...updates });
+  };
+
+  return (
+    <div className="space-y-4">
+      {/* Enable/Disable Toggle */}
+      <div className="flex items-center justify-between">
+        <div>
+          <label className="text-sm font-medium text-gray-700">Enable Judge</label>
+          <p className="text-xs text-gray-500">
+            When using multiple extractors, the judge aggregates and deduplicates results.
+          </p>
+        </div>
+        <button
+          type="button"
+          onClick={() => !disabled && updateJudge({ enabled: !judge.enabled })}
+          disabled={disabled}
+          className={`relative inline-flex h-6 w-11 flex-shrink-0 cursor-pointer rounded-full border-2 border-transparent transition-colors duration-200 ease-in-out focus:outline-none focus:ring-2 focus:ring-purple-600 focus:ring-offset-2 ${
+            judge.enabled ? "bg-purple-600" : "bg-gray-200"
+          } ${disabled ? "opacity-50 cursor-not-allowed" : ""}`}
+        >
+          <span
+            className={`pointer-events-none inline-block h-5 w-5 transform rounded-full bg-white shadow ring-0 transition duration-200 ease-in-out ${
+              judge.enabled ? "translate-x-5" : "translate-x-0"
+            }`}
+          />
+        </button>
+      </div>
+
+      {/* Model Configuration - only shown when enabled */}
+      {judge.enabled && (
+        <ModelConfigurator
+          config={judge}
+          onChange={updateJudge}
+          disabled={disabled}
+          colorTheme="purple"
+          showProvider={true}
+          showDelete={false}
+        />
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ModelConfigurator.tsx b/apps/web/src/app/monitor/lab/components/profiles/ModelConfigurator.tsx
new file mode 100644
index 000000000..df629f2c3
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ModelConfigurator.tsx
@@ -0,0 +1,403 @@
+"use client";
+
+import { useState, useMemo } from "react";
+import { ChevronDownIcon, TrashIcon } from "@heroicons/react/24/outline";
+import type { ReasoningConfig, ReasoningEffort, ProviderPreferences } from "../../types";
+import { useModels } from "../../hooks/useModels";
+import { useModelEndpoints } from "../../hooks/useModelEndpoints";
+import { resolveReasoningBudgetSync, type ModelEndpointData } from "@roast/ai";
+import { ModelSelector, getModelDisplayName } from "./ModelSelector";
+import { ProviderSelector } from "./ProviderSelector";
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export interface ModelConfig {
+  model: string;
+  temperature?: number | "default";
+  reasoning?: ReasoningConfig;
+  thinking?: boolean; // Legacy field, mapped to reasoning
+  provider?: ProviderPreferences;
+}
+
+export interface ModelConfiguratorProps {
+  /** Current configuration */
+  config: ModelConfig;
+  /** Callback when config changes */
+  onChange: (updates: Partial<ModelConfig>) => void;
+  /** Whether the configurator is disabled */
+  disabled?: boolean;
+  /** Optional label/index to show (e.g., "1" for extractor list) */
+  label?: string | number;
+  /** Color theme for the component */
+  colorTheme?: "blue" | "purple" | "orange";
+  /** Whether to show provider selection (for OpenRouter models) */
+  showProvider?: boolean;
+  /** Whether to show delete button */
+  showDelete?: boolean;
+  /** Callback when delete is clicked */
+  onDelete?: () => void;
+  /** Whether delete is disabled (e.g., can't delete last item) */
+  deleteDisabled?: boolean;
+  /** Tooltip for delete button when disabled */
+  deleteDisabledReason?: string;
+  /** Layout mode */
+  layout?: "compact" | "expanded";
+}
+
+// ============================================================================
+// Color Theme Helpers
+// ============================================================================
+
+const themeColors = {
+  blue: {
+    bg: "bg-blue-50",
+    border: "border-blue-100",
+    hoverBg: "hover:bg-blue-100",
+    text: "text-blue-900",
+    textMuted: "text-blue-400",
+    badge: "bg-blue-100 text-blue-700",
+  },
+  purple: {
+    bg: "bg-purple-50",
+    border: "border-purple-100",
+    hoverBg: "hover:bg-purple-100",
+    text: "text-purple-900",
+    textMuted: "text-purple-400",
+    badge: "bg-purple-100 text-purple-700",
+  },
+  orange: {
+    bg: "bg-orange-50",
+    border: "border-orange-100",
+    hoverBg: "hover:bg-orange-100",
+    text: "text-orange-900",
+    textMuted: "text-orange-400",
+    badge: "bg-orange-100 text-orange-700",
+  },
+};
+
+// ============================================================================
+// Main Component
+// ============================================================================
+
+export function ModelConfigurator({
+  config,
+  onChange,
+  disabled,
+  label,
+  colorTheme = "blue",
+  showProvider = true,
+  showDelete = false,
+  onDelete,
+  deleteDisabled = false,
+  deleteDisabledReason,
+  layout: _layout = "compact",
+}: ModelConfiguratorProps) {
+  const { models, loading: modelsLoading } = useModels();
+  const [showModelDropdown, setShowModelDropdown] = useState(false);
+  const [showCustomTemp, setShowCustomTemp] = useState(false);
+  const [customTempValue, setCustomTempValue] = useState(
+    typeof config.temperature === "number" ? config.temperature : 0.5
+  );
+
+  const colors = themeColors[colorTheme];
+
+  // Find model info for the selected model
+  const modelInfo = models.find((m) => m.id === config.model);
+  const supportsTemperature = modelInfo?.supportsTemperature ?? true;
+  const supportsReasoning = modelInfo?.supportsReasoning ?? true;
+  const defaultTemperature = modelInfo?.defaultTemperature;
+  const maxTemperature = modelInfo?.maxTemperature ?? 1;
+
+  // Fetch endpoints for OpenRouter models to calculate reasoning budget
+  const isOpenRouter = config.model.includes("/");
+  const { endpoints: modelEndpoints } = useModelEndpoints(isOpenRouter ? config.model : null);
+
+  // Check if this is an Anthropic model (direct API, not via OpenRouter)
+  const isAnthropicDirect = config.model.startsWith("claude-") && !config.model.includes("/");
+
+  // Calculate reasoning budget options based on model endpoints
+  const reasoningOptions = useMemo(() => {
+    const efforts: Array<{ value: string; label: string; tokens: string }> = [
+      { value: "off", label: "Off", tokens: "" },
+    ];
+
+    const effortLevels: Array<{ value: ReasoningEffort; label: string }> = [
+      { value: "minimal", label: "Minimal" },
+      { value: "low", label: "Low" },
+      { value: "medium", label: "Medium" },
+      { value: "high", label: "High" },
+      { value: "xhigh", label: "Very High" },
+    ];
+
+    // Effort level percentages (same as ReasoningBudgetResolver)
+    const effortPercentages: Record<ReasoningEffort, number> = {
+      minimal: 0.1,
+      low: 0.3,
+      medium: 0.5,
+      high: 0.7,
+      xhigh: 0.9,
+    };
+
+    for (const { value: effort, label } of effortLevels) {
+      if (isAnthropicDirect) {
+        // Anthropic models support up to 128K thinking tokens
+        // Calculate budget based on percentage of max (128K)
+        const maxThinkingTokens = 128000;
+        const budget = Math.floor(maxThinkingTokens * effortPercentages[effort]);
+        const displayBudget = budget >= 1000 ? `${Math.round(budget / 1000)}K` : String(budget);
+        efforts.push({
+          value: effort,
+          label,
+          tokens: displayBudget,
+        });
+      } else if (isOpenRouter && modelEndpoints.length > 0) {
+        // OpenRouter models - use endpoint data to calculate
+        const endpointsData: ModelEndpointData[] = modelEndpoints.map((ep) => ({
+          tag: ep.tag,
+          providerName: ep.providerName,
+          maxCompletionTokens: ep.maxCompletionTokens,
+        }));
+        const budget = resolveReasoningBudgetSync({
+          effort,
+          modelId: config.model,
+          selectedProviders: config.provider?.order,
+          endpointsData,
+        });
+        efforts.push({
+          value: effort,
+          label,
+          tokens: budget.displayBudget,
+        });
+      } else {
+        // Fallback to static estimates (for unknown models or while loading)
+        const staticTokens: Record<ReasoningEffort, string> = {
+          minimal: "~1K",
+          low: "~2K",
+          medium: "~4K",
+          high: "~8K",
+          xhigh: "~16K",
+        };
+        efforts.push({
+          value: effort,
+          label,
+          tokens: staticTokens[effort],
+        });
+      }
+    }
+
+    return efforts;
+  }, [config.model, config.provider?.order, modelEndpoints, isOpenRouter, isAnthropicDirect]);
+
+  // Check if current temp is a preset or custom
+  const DROPDOWN_TEMPS = [0, 0.3, 0.7, 1, 1.5, 2];
+  const isCustomTemp =
+    typeof config.temperature === "number" && !DROPDOWN_TEMPS.includes(config.temperature);
+
+  // Build auto label with default temp if known
+  const autoLabel = defaultTemperature !== undefined ? `Auto (${defaultTemperature})` : "Auto";
+
+  const handleReasoningChange = (value: string) => {
+    if (disabled) return;
+    if (value === "off") {
+      onChange({ reasoning: false, thinking: false });
+    } else {
+      onChange({ reasoning: { effort: value as ReasoningEffort }, thinking: true });
+    }
+  };
+
+  const handleTemperatureChange = (value: string) => {
+    if (disabled) return;
+    if (value === "custom") {
+      setShowCustomTemp(true);
+      onChange({ temperature: customTempValue });
+    } else if (value === "default") {
+      setShowCustomTemp(false);
+      onChange({ temperature: "default" });
+    } else {
+      setShowCustomTemp(false);
+      onChange({ temperature: parseFloat(value) });
+    }
+  };
+
+  return (
+    <div className={`p-3 ${colors.bg} rounded-lg border ${colors.border} space-y-3`}>
+      {/* Top row: label, model, reasoning, delete */}
+      <div className="flex items-center gap-2">
+        {/* Optional label/index */}
+        {label !== undefined && (
+          <span className={`text-xs ${colors.textMuted} font-medium w-5`}>{label}</span>
+        )}
+
+        {/* Model Selector */}
+        <div className="flex-1 min-w-0 relative">
+          <button
+            onClick={() => !disabled && setShowModelDropdown(!showModelDropdown)}
+            disabled={disabled}
+            className={`flex items-center gap-2 w-full text-left px-2 py-1 rounded ${colors.hoverBg} disabled:hover:bg-transparent disabled:cursor-default`}
+          >
+            <span className={`font-mono text-sm ${colors.text} truncate`}>
+              {getModelDisplayName(config.model)}
+            </span>
+            {!disabled && <ChevronDownIcon className={`h-3 w-3 ${colors.textMuted} flex-shrink-0`} />}
+          </button>
+          {showModelDropdown && (
+            <ModelSelector
+              models={models}
+              loading={modelsLoading}
+              onSelect={(model) => {
+                onChange({ model: model.id });
+                setShowModelDropdown(false);
+              }}
+              onCancel={() => setShowModelDropdown(false)}
+            />
+          )}
+        </div>
+
+        {/* Reasoning Dropdown */}
+        {supportsReasoning ? (
+          <select
+            value={getReasoningValue(config.reasoning, config.thinking)}
+            onChange={(e) => handleReasoningChange(e.target.value)}
+            disabled={disabled}
+            className={`px-2 py-1 text-xs rounded border transition-colors ${
+              getReasoningValue(config.reasoning, config.thinking) !== "off"
+                ? "bg-green-100 text-green-700 border-green-200"
+                : "bg-gray-100 text-gray-500 border-gray-200"
+            } disabled:bg-gray-50 disabled:text-gray-500`}
+            title="Extended reasoning effort level - shows calculated token budget"
+          >
+            {reasoningOptions.map((opt) => (
+              <option key={opt.value} value={opt.value}>
+                {opt.label}
+                {opt.tokens ? ` (${opt.tokens})` : ""}
+              </option>
+            ))}
+          </select>
+        ) : (
+          <span
+            className="flex items-center gap-1.5 px-2 py-1 text-xs rounded border bg-gray-50 text-gray-400 border-gray-200 cursor-not-allowed"
+            title="This model does not support extended reasoning"
+          >
+            Reasoning N/A
+          </span>
+        )}
+
+        {/* Delete Button */}
+        {showDelete && (
+          <button
+            onClick={onDelete}
+            disabled={disabled || deleteDisabled}
+            className={`p-1 rounded ${
+              !disabled && !deleteDisabled
+                ? "text-gray-400 hover:text-red-600 hover:bg-red-50"
+                : "text-gray-300 cursor-not-allowed"
+            }`}
+            title={deleteDisabled ? deleteDisabledReason : "Remove"}
+          >
+            <TrashIcon className="h-4 w-4" />
+          </button>
+        )}
+      </div>
+
+      {/* Temperature row */}
+      {supportsTemperature ? (
+        <div className={`${label !== undefined ? "pl-5" : ""} flex items-start gap-3`}>
+          <div className="flex items-center gap-2">
+            <span className="text-xs text-gray-500">Temperature</span>
+            <select
+              value={isCustomTemp ? "custom" : String(config.temperature ?? "default")}
+              onChange={(e) => handleTemperatureChange(e.target.value)}
+              disabled={disabled}
+              className="px-2 py-1 text-sm border border-gray-200 rounded bg-white disabled:bg-gray-50 disabled:text-gray-500"
+            >
+              <option value="default">{autoLabel}</option>
+              <option value="0">0 - Precise</option>
+              <option value="0.3">0.3</option>
+              <option value="0.7">0.7</option>
+              <option value="1">1.0{maxTemperature <= 1 ? " - Creative" : ""}</option>
+              {maxTemperature > 1 && <option value="1.5">1.5</option>}
+              {maxTemperature >= 2 && <option value="2">2.0 - Creative</option>}
+              <option value="custom">Custom...</option>
+            </select>
+          </div>
+
+          {/* Custom temperature slider */}
+          {(showCustomTemp || isCustomTemp) && !disabled && (
+            <div className="flex items-center gap-2 flex-1">
+              <input
+                type="range"
+                min={0}
+                max={maxTemperature}
+                step={0.05}
+                value={typeof config.temperature === "number" ? config.temperature : customTempValue}
+                onChange={(e) => {
+                  const val = parseFloat(e.target.value);
+                  setCustomTempValue(val);
+                  onChange({ temperature: val });
+                }}
+                className="flex-1 h-2 bg-gray-200 rounded-lg appearance-none cursor-pointer accent-blue-600"
+              />
+              <input
+                type="number"
+                min={0}
+                max={maxTemperature}
+                step={0.05}
+                value={typeof config.temperature === "number" ? config.temperature : customTempValue}
+                onChange={(e) => {
+                  const val = parseFloat(e.target.value);
+                  if (!isNaN(val) && val >= 0 && val <= maxTemperature) {
+                    setCustomTempValue(val);
+                    onChange({ temperature: val });
+                  }
+                }}
+                className="w-16 px-2 py-1 text-center text-sm border rounded"
+              />
+            </div>
+          )}
+        </div>
+      ) : (
+        <div className={`${label !== undefined ? "pl-5" : ""} text-xs text-gray-400`}>
+          Temperature not supported by this model
+        </div>
+      )}
+
+      {/* Provider preference row - only for OpenRouter models */}
+      {showProvider && isOpenRouter && (
+        <ProviderSelector
+          provider={config.provider}
+          onChange={(provider) => onChange({ provider })}
+          disabled={disabled}
+          modelId={config.model}
+          indent={label !== undefined}
+        />
+      )}
+    </div>
+  );
+}
+
+// ============================================================================
+// Helpers
+// ============================================================================
+
+/**
+ * Convert ReasoningConfig to dropdown value string
+ * Handles both new reasoning config and legacy thinking boolean
+ */
+function getReasoningValue(reasoning: ReasoningConfig | undefined, thinking?: boolean): string {
+  if (reasoning !== undefined) {
+    if (reasoning === false) return "off";
+    if (typeof reasoning === "object" && "effort" in reasoning) {
+      return reasoning.effort;
+    }
+    if (typeof reasoning === "object" && "budget_tokens" in reasoning) {
+      return "high"; // Default for custom budget
+    }
+  }
+  if (thinking === true) return "medium"; // Legacy
+  return "off";
+}
+
+// Re-export for convenience
+export { getModelDisplayName } from "./ModelSelector";
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ModelSelector.tsx b/apps/web/src/app/monitor/lab/components/profiles/ModelSelector.tsx
new file mode 100644
index 000000000..aeb748633
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ModelSelector.tsx
@@ -0,0 +1,160 @@
+"use client";
+
+import { useState, useMemo, useRef, useEffect } from "react";
+import { MagnifyingGlassIcon } from "@heroicons/react/24/outline";
+import type { ModelInfo } from "../../hooks/useModels";
+
+interface ModelSelectorProps {
+  models: ModelInfo[];
+  loading: boolean;
+  onSelect: (model: ModelInfo) => void;
+  onCancel: () => void;
+}
+
+export function ModelSelector({ models, loading, onSelect, onCancel }: ModelSelectorProps) {
+  const [search, setSearch] = useState("");
+  const [highlightedIndex, setHighlightedIndex] = useState(0);
+  const inputRef = useRef<HTMLInputElement>(null);
+  const listRef = useRef<HTMLDivElement>(null);
+
+  useEffect(() => {
+    inputRef.current?.focus();
+  }, []);
+
+  const filteredModels = useMemo(() => {
+    if (!search.trim()) return models;
+    const lowerSearch = search.toLowerCase();
+    return models.filter(
+      (m) =>
+        m.id.toLowerCase().includes(lowerSearch) ||
+        m.name.toLowerCase().includes(lowerSearch)
+    );
+  }, [models, search]);
+
+  const anthropicModels = filteredModels.filter((m) => m.provider === "anthropic");
+  const openRouterModels = filteredModels.filter((m) => m.provider === "openrouter");
+
+  const handleKeyDown = (e: React.KeyboardEvent) => {
+    if (e.key === "Escape") {
+      onCancel();
+    } else if (e.key === "ArrowDown") {
+      e.preventDefault();
+      setHighlightedIndex((prev) => Math.min(prev + 1, filteredModels.length - 1));
+    } else if (e.key === "ArrowUp") {
+      e.preventDefault();
+      setHighlightedIndex((prev) => Math.max(prev - 1, 0));
+    } else if (e.key === "Enter" && filteredModels.length > 0) {
+      e.preventDefault();
+      onSelect(filteredModels[highlightedIndex]);
+    }
+  };
+
+  useEffect(() => {
+    const list = listRef.current;
+    if (!list) return;
+    const highlighted = list.querySelector(`[data-index="${highlightedIndex}"]`);
+    highlighted?.scrollIntoView({ block: "nearest" });
+  }, [highlightedIndex]);
+
+  if (loading) {
+    return (
+      <div className="absolute z-10 left-0 top-full mt-1 min-w-[300px] bg-white border rounded-lg shadow-lg p-4">
+        <span className="text-sm text-gray-500">Loading models...</span>
+      </div>
+    );
+  }
+
+  return (
+    <div
+      className="absolute z-10 left-0 top-full mt-1 min-w-[300px] bg-white border rounded-lg shadow-lg overflow-hidden"
+      onKeyDown={handleKeyDown}
+    >
+      <div className="p-2 border-b">
+        <div className="flex items-center gap-2 px-2 py-1 bg-gray-50 rounded border">
+          <MagnifyingGlassIcon className="h-4 w-4 text-gray-400" />
+          <input
+            ref={inputRef}
+            type="text"
+            value={search}
+            onChange={(e) => {
+              setSearch(e.target.value);
+              setHighlightedIndex(0);
+            }}
+            placeholder="Search models..."
+            className="flex-1 bg-transparent text-sm outline-none"
+          />
+        </div>
+        <div className="flex items-center justify-between mt-2 text-xs text-gray-500 px-1">
+          <span>{filteredModels.length} models</span>
+          <button onClick={onCancel} className="text-blue-600 hover:underline">
+            Cancel
+          </button>
+        </div>
+      </div>
+
+      <div ref={listRef} className="max-h-[300px] overflow-y-auto">
+        {anthropicModels.length > 0 && (
+          <>
+            <div className="px-3 py-1.5 bg-gray-50 text-xs font-medium text-gray-500 sticky top-0">
+              Anthropic ({anthropicModels.length})
+            </div>
+            {anthropicModels.map((model) => {
+              const globalIndex = filteredModels.indexOf(model);
+              return (
+                <button
+                  key={model.id}
+                  data-index={globalIndex}
+                  onClick={() => onSelect(model)}
+                  onMouseEnter={() => setHighlightedIndex(globalIndex)}
+                  className={`w-full px-3 py-2 text-left text-sm hover:bg-purple-50 ${
+                    globalIndex === highlightedIndex ? "bg-purple-50" : ""
+                  }`}
+                >
+                  <span className="font-mono text-gray-900">{getModelDisplayName(model.id)}</span>
+                </button>
+              );
+            })}
+          </>
+        )}
+
+        {openRouterModels.length > 0 && (
+          <>
+            <div className="px-3 py-1.5 bg-gray-50 text-xs font-medium text-gray-500 sticky top-0">
+              OpenRouter ({openRouterModels.length})
+            </div>
+            {openRouterModels.map((model) => {
+              const globalIndex = filteredModels.indexOf(model);
+              return (
+                <button
+                  key={model.id}
+                  data-index={globalIndex}
+                  onClick={() => onSelect(model)}
+                  onMouseEnter={() => setHighlightedIndex(globalIndex)}
+                  className={`w-full px-3 py-2 text-left text-sm hover:bg-purple-50 ${
+                    globalIndex === highlightedIndex ? "bg-purple-50" : ""
+                  }`}
+                >
+                  <span className="font-mono text-gray-900">{getModelDisplayName(model.id)}</span>
+                </button>
+              );
+            })}
+          </>
+        )}
+
+        {filteredModels.length === 0 && (
+          <div className="p-4 text-center text-sm text-gray-500">
+            No models found matching "{search}"
+          </div>
+        )}
+      </div>
+    </div>
+  );
+}
+
+export function getModelDisplayName(modelId: string): string {
+  let name = modelId.replace(/-\d{8}$/, "");
+  if (name.includes("/")) {
+    name = name.split("/").pop() || name;
+  }
+  return name;
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ProfileDetailView.tsx b/apps/web/src/app/monitor/lab/components/profiles/ProfileDetailView.tsx
new file mode 100644
index 000000000..073927c9e
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ProfileDetailView.tsx
@@ -0,0 +1,606 @@
+"use client";
+
+import { useState, useEffect } from "react";
+import { PencilIcon, CheckIcon, XMarkIcon, ChevronDownIcon, ChevronRightIcon } from "@heroicons/react/24/outline";
+import type { Profile, ProfileConfig, ExtractorConfig, JudgeConfig, FilterChainItem } from "../../types";
+import { formatDate } from "../../utils/formatters";
+import { ExtractorEditor } from "./ExtractorEditor";
+import { useDefaultPrompts } from "../../hooks/useDefaultPrompts";
+import { JudgeEditor } from "./JudgeEditor";
+import { FilterChainEditor } from "./FilterChainEditor";
+
+interface ProfileDetailViewProps {
+  profile: Profile;
+  onSave: (updates: { name?: string; description?: string; config?: ProfileConfig }) => Promise<void>;
+}
+
+const DEFAULT_CONFIG: ProfileConfig = {
+  version: 1,
+  models: {
+    extractors: [
+      { model: "claude-sonnet-4-5-20250929", temperature: 0, thinking: false },
+      { model: "google/gemini-3-flash-preview", temperature: "default", thinking: true },
+      { model: "google/gemini-2.5-flash", temperature: "default", thinking: true },
+    ],
+    judge: { model: "claude-sonnet-4-5-20250929", enabled: false },
+  },
+  thresholds: {
+    minSeverityThreshold: 60,
+    maxIssues: 15,
+    dedupThreshold: 0.7,
+    maxIssuesToProcess: 25,
+  },
+  filterChain: [
+    {
+      id: "default-supported-elsewhere",
+      type: "supported-elsewhere",
+      enabled: true,
+      model: "claude-sonnet-4-5-20250929",
+      temperature: 0.1,
+    },
+  ],
+};
+
+// Migrate old filterChain format { filters: [...] } to new format FilterChainItem[]
+function migrateFilterChain(config: Profile["config"] | undefined): FilterChainItem[] {
+  if (!config?.filterChain) return DEFAULT_CONFIG.filterChain;
+
+  // New format: array of FilterChainItem
+  if (Array.isArray(config.filterChain)) {
+    return config.filterChain;
+  }
+
+  // Old format: { filters: Array<{ type, enabled }> }
+  const oldFormat = config.filterChain as unknown as { filters?: Array<{ type: string; enabled: boolean }> };
+  if (oldFormat.filters && Array.isArray(oldFormat.filters)) {
+    // Convert old format to new format - only migrate supported-elsewhere
+    const supportedElsewhere = oldFormat.filters.find(f => f.type === "supported-elsewhere");
+    if (supportedElsewhere) {
+      return [{
+        id: "migrated-supported-elsewhere",
+        type: "supported-elsewhere" as const,
+        enabled: supportedElsewhere.enabled,
+        model: "claude-sonnet-4-5-20250929",
+        temperature: 0.1,
+      }];
+    }
+  }
+
+  return DEFAULT_CONFIG.filterChain;
+}
+
+export function ProfileDetailView({ profile, onSave }: ProfileDetailViewProps) {
+  const [isEditing, setIsEditing] = useState(false);
+  const [saving, setSaving] = useState(false);
+  const [editedName, setEditedName] = useState(profile.name);
+  const [editedDescription, setEditedDescription] = useState(profile.description || "");
+  const { prompts: defaultPrompts } = useDefaultPrompts();
+
+  // Merge profile config with defaults to ensure all values are present
+  const mergedConfig: ProfileConfig = {
+    ...DEFAULT_CONFIG,
+    ...profile.config,
+    thresholds: { ...DEFAULT_CONFIG.thresholds, ...profile.config.thresholds },
+    models: {
+      extractors: profile.config.models.extractors.length > 0 ? profile.config.models.extractors : DEFAULT_CONFIG.models.extractors,
+      judge: { ...DEFAULT_CONFIG.models.judge, ...profile.config.models.judge },
+    },
+    filterChain: migrateFilterChain(profile.config),
+    prompts: profile.config.prompts,
+  };
+
+  const [editedConfig, setEditedConfig] = useState<ProfileConfig>(mergedConfig);
+  const [expandedSections, setExpandedSections] = useState<Set<string>>(new Set(["extraction"]));
+
+  // Reset state when profile changes
+  useEffect(() => {
+    const newMergedConfig: ProfileConfig = {
+      ...DEFAULT_CONFIG,
+      ...profile.config,
+      thresholds: { ...DEFAULT_CONFIG.thresholds, ...profile.config.thresholds },
+      models: {
+        extractors: profile.config.models.extractors.length > 0 ? profile.config.models.extractors : DEFAULT_CONFIG.models.extractors,
+        judge: { ...DEFAULT_CONFIG.models.judge, ...profile.config.models.judge },
+      },
+      filterChain: migrateFilterChain(profile.config),
+      prompts: profile.config.prompts,
+    };
+    setEditedName(profile.name);
+    setEditedDescription(profile.description || "");
+    setEditedConfig(newMergedConfig);
+    setIsEditing(false);
+  }, [profile.id, profile.name, profile.description, profile.config]);
+
+  const toggleSection = (section: string) => {
+    const newSet = new Set(expandedSections);
+    if (newSet.has(section)) {
+      newSet.delete(section);
+    } else {
+      newSet.add(section);
+    }
+    setExpandedSections(newSet);
+  };
+
+  const handleSave = async () => {
+    setSaving(true);
+    try {
+      await onSave({
+        name: editedName,
+        description: editedDescription || undefined,
+        config: editedConfig,
+      });
+      setIsEditing(false);
+    } catch (error) {
+      console.error("Failed to save profile:", error);
+      alert("Failed to save profile");
+    } finally {
+      setSaving(false);
+    }
+  };
+
+  const handleCancel = () => {
+    const newMergedConfig: ProfileConfig = {
+      ...DEFAULT_CONFIG,
+      ...profile.config,
+      thresholds: { ...DEFAULT_CONFIG.thresholds, ...profile.config.thresholds },
+      models: {
+        extractors: profile.config.models.extractors.length > 0 ? profile.config.models.extractors : DEFAULT_CONFIG.models.extractors,
+        judge: { ...DEFAULT_CONFIG.models.judge, ...profile.config.models.judge },
+      },
+      filterChain: migrateFilterChain(profile.config),
+      prompts: profile.config.prompts,
+    };
+    setEditedName(profile.name);
+    setEditedDescription(profile.description || "");
+    setEditedConfig(newMergedConfig);
+    setIsEditing(false);
+  };
+
+  const updateThreshold = (key: keyof ProfileConfig["thresholds"], value: number) => {
+    setEditedConfig((prev) => ({
+      ...prev,
+      thresholds: { ...prev.thresholds, [key]: value },
+    }));
+  };
+
+  const updatePrompt = (key: keyof NonNullable<ProfileConfig["prompts"]>, value: string) => {
+    setEditedConfig((prev) => ({
+      ...prev,
+      prompts: { ...(prev.prompts || {}), [key]: value || undefined },
+    }));
+  };
+
+  const updateExtractors = (extractors: ExtractorConfig[]) => {
+    setEditedConfig((prev) => ({
+      ...prev,
+      models: { ...prev.models, extractors },
+    }));
+  };
+
+  const updateJudge = (judge: JudgeConfig) => {
+    setEditedConfig((prev) => ({
+      ...prev,
+      models: { ...prev.models, judge },
+    }));
+  };
+
+  const updateFilterChain = (filterChain: FilterChainItem[]) => {
+    setEditedConfig((prev) => ({
+      ...prev,
+      filterChain,
+    }));
+  };
+
+  return (
+    <div className="h-full flex flex-col">
+      {/* Header */}
+      <div className="p-4 border-b bg-white">
+        <div className="flex items-center justify-between">
+          <div className="flex-1">
+            {isEditing ? (
+              <input
+                type="text"
+                value={editedName}
+                onChange={(e) => setEditedName(e.target.value)}
+                className="text-xl font-semibold text-gray-900 border-b border-blue-500 focus:outline-none w-full"
+                placeholder="Profile name"
+              />
+            ) : (
+              <div className="flex items-center gap-2">
+                <h1 className="text-xl font-semibold text-gray-900">{profile.name}</h1>
+                {profile.isDefault && (
+                  <span className="px-2 py-0.5 text-xs bg-green-100 text-green-700 rounded">
+                    Default
+                  </span>
+                )}
+              </div>
+            )}
+            {isEditing ? (
+              <textarea
+                value={editedDescription}
+                onChange={(e) => setEditedDescription(e.target.value)}
+                className="text-sm text-gray-500 mt-1 border border-gray-300 rounded px-2 py-1 focus:outline-none focus:border-blue-500 w-full resize-y min-h-[60px]"
+                placeholder="Description (optional)"
+                rows={2}
+              />
+            ) : (
+              <p className="text-sm text-gray-500 mt-1">
+                {profile.description || "No description"}
+              </p>
+            )}
+            <p className="text-xs text-gray-400 mt-2">
+              Updated {formatDate(profile.updatedAt)}
+            </p>
+          </div>
+          <div className="flex items-center gap-2">
+            {isEditing ? (
+              <>
+                <button
+                  onClick={handleCancel}
+                  disabled={saving}
+                  className="flex items-center gap-1 px-3 py-2 text-gray-600 hover:bg-gray-100 rounded-md disabled:opacity-50"
+                >
+                  <XMarkIcon className="h-4 w-4" />
+                  Cancel
+                </button>
+                <button
+                  onClick={() => void handleSave()}
+                  disabled={saving || !editedName.trim()}
+                  className="flex items-center gap-1 px-3 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700 disabled:opacity-50"
+                >
+                  <CheckIcon className="h-4 w-4" />
+                  {saving ? "Saving..." : "Save"}
+                </button>
+              </>
+            ) : (
+              <button
+                onClick={() => setIsEditing(true)}
+                className="flex items-center gap-1 px-3 py-2 text-blue-600 hover:bg-blue-50 rounded-md"
+              >
+                <PencilIcon className="h-4 w-4" />
+                Edit
+              </button>
+            )}
+          </div>
+        </div>
+      </div>
+
+      {/* Pipeline Configuration */}
+      <div className="flex-1 overflow-y-auto">
+        <div className="border rounded-lg m-4 bg-white">
+          <div className="px-4 py-3 bg-gray-50 border-b">
+            <h4 className="font-medium text-gray-900">Pipeline Configuration</h4>
+            <p className="text-xs text-gray-500 mt-0.5">Configure each phase of the fallacy detection pipeline</p>
+          </div>
+
+          <div className="divide-y">
+            {/* Phase 1: Extraction */}
+            <PipelineSection
+              title="1. Extraction"
+              subtitle="Find potential issues in the document"
+              color="blue"
+              isExpanded={expandedSections.has("extraction")}
+              onToggle={() => toggleSection("extraction")}
+            >
+              <div className="space-y-4">
+                {/* Models */}
+                <div>
+                  <label className="block text-sm font-medium text-gray-700 mb-1">
+                    Extractor Models
+                  </label>
+                  <p className="text-xs text-gray-500 mb-2">
+                    AI models that scan the document for issues. Multiple models improve coverage.
+                  </p>
+                  <ExtractorEditor
+                    extractors={editedConfig.models.extractors}
+                    onChange={updateExtractors}
+                    disabled={!isEditing}
+                  />
+                </div>
+
+                {/* Thresholds */}
+                <div className="grid grid-cols-2 gap-4">
+                  <div>
+                    <label className="block text-sm font-medium text-gray-700">
+                      Min Severity
+                    </label>
+                    <p className="text-xs text-gray-500 mb-1">
+                      Only flag issues with severity ≥ this value (0-100)
+                    </p>
+                    <input
+                      type="number"
+                      min={0}
+                      max={100}
+                      value={editedConfig.thresholds.minSeverityThreshold}
+                      onChange={(e) => updateThreshold("minSeverityThreshold", Number(e.target.value))}
+                      disabled={!isEditing}
+                      className="w-24 px-3 py-2 border rounded-md text-sm disabled:bg-gray-50 disabled:text-gray-500"
+                    />
+                  </div>
+                  <div>
+                    <label className="block text-sm font-medium text-gray-700">
+                      Max Issues
+                    </label>
+                    <p className="text-xs text-gray-500 mb-1">
+                      Maximum issues to return per document
+                    </p>
+                    <input
+                      type="number"
+                      min={1}
+                      max={100}
+                      value={editedConfig.thresholds.maxIssues}
+                      onChange={(e) => updateThreshold("maxIssues", Number(e.target.value))}
+                      disabled={!isEditing}
+                      className="w-24 px-3 py-2 border rounded-md text-sm disabled:bg-gray-50 disabled:text-gray-500"
+                    />
+                  </div>
+                </div>
+
+                {/* Custom Prompts */}
+                <PromptEditor
+                  label="System Prompt"
+                  description="Instructions for the extraction model"
+                  value={editedConfig.prompts?.extractorSystemPrompt || ""}
+                  defaultValue={defaultPrompts?.extractorSystemPrompt}
+                  onChange={(v) => updatePrompt("extractorSystemPrompt", v)}
+                  disabled={!isEditing}
+                  placeholder="Leave empty to use default extraction prompt"
+                />
+                <PromptEditor
+                  label="User Prompt"
+                  description="Task instructions (document text appended automatically)"
+                  value={editedConfig.prompts?.extractorUserPrompt || ""}
+                  defaultValue={defaultPrompts?.extractorUserPrompt}
+                  onChange={(v) => updatePrompt("extractorUserPrompt", v)}
+                  disabled={!isEditing}
+                  placeholder="Leave empty to use default user prompt"
+                  rows={2}
+                />
+              </div>
+            </PipelineSection>
+
+            {/* Phase 2: Deduplication */}
+            <PipelineSection
+              title="2. Deduplication"
+              subtitle="Remove duplicate issues found by multiple extractors"
+              color="purple"
+              isExpanded={expandedSections.has("dedup")}
+              onToggle={() => toggleSection("dedup")}
+            >
+              <div className="space-y-4">
+                <div>
+                  <label className="block text-sm font-medium text-gray-700">
+                    Similarity Threshold
+                  </label>
+                  <p className="text-xs text-gray-500 mb-1">
+                    Jaccard similarity threshold (0-1). Issues above this are considered duplicates.
+                  </p>
+                  <input
+                    type="number"
+                    min={0}
+                    max={1}
+                    step={0.05}
+                    value={editedConfig.thresholds.dedupThreshold}
+                    onChange={(e) => updateThreshold("dedupThreshold", Number(e.target.value))}
+                    disabled={!isEditing}
+                    className="w-24 px-3 py-2 border rounded-md text-sm disabled:bg-gray-50 disabled:text-gray-500"
+                  />
+                </div>
+
+                {/* Judge configuration (for multi-extractor mode) */}
+                <JudgeEditor
+                  judge={editedConfig.models.judge}
+                  onChange={updateJudge}
+                  disabled={!isEditing}
+                />
+
+                <PromptEditor
+                  label="Judge System Prompt"
+                  description="Instructions for the judge aggregation model"
+                  value={editedConfig.prompts?.judgeSystemPrompt || ""}
+                  defaultValue={defaultPrompts?.judgeSystemPrompt}
+                  onChange={(v) => updatePrompt("judgeSystemPrompt", v)}
+                  disabled={!isEditing}
+                  placeholder="Leave empty to use default judge prompt"
+                />
+              </div>
+            </PipelineSection>
+
+            {/* Phase 3: Filtering */}
+            <PipelineSection
+              title="3. Filtering"
+              subtitle="Remove false positives using configurable filters"
+              color="orange"
+              isExpanded={expandedSections.has("filtering")}
+              onToggle={() => toggleSection("filtering")}
+            >
+              <div className="space-y-4">
+                {/* Max Issues to Process */}
+                <div>
+                  <label className="block text-sm font-medium text-gray-700">
+                    Max Issues to Process
+                  </label>
+                  <p className="text-xs text-gray-500 mb-1">
+                    Limit how many issues enter the filtering pipeline (after dedup)
+                  </p>
+                  <input
+                    type="number"
+                    min={1}
+                    max={100}
+                    value={editedConfig.thresholds.maxIssuesToProcess}
+                    onChange={(e) => updateThreshold("maxIssuesToProcess", Number(e.target.value))}
+                    disabled={!isEditing}
+                    className="w-24 px-3 py-2 border rounded-md text-sm disabled:bg-gray-50 disabled:text-gray-500"
+                  />
+                </div>
+
+                {/* Filter Chain */}
+                <FilterChainEditor
+                  filters={editedConfig.filterChain}
+                  onChange={updateFilterChain}
+                  disabled={!isEditing}
+                  defaultFilterPrompt={defaultPrompts?.filterSystemPrompt}
+                />
+              </div>
+            </PipelineSection>
+
+            {/* Phase 4: Review */}
+            <PipelineSection
+              title="4. Review"
+              subtitle="Final quality check and summary generation"
+              color="green"
+              isExpanded={expandedSections.has("review")}
+              onToggle={() => toggleSection("review")}
+            >
+              <div className="space-y-4">
+                <p className="text-sm text-gray-600">
+                  The review phase filters out redundant or low-value comments and generates a summary of the document's epistemic quality.
+                </p>
+
+                <PromptEditor
+                  label="Review System Prompt"
+                  description="Instructions for the final review model"
+                  value={editedConfig.prompts?.reviewSystemPrompt || ""}
+                  onChange={(v) => updatePrompt("reviewSystemPrompt", v)}
+                  disabled={!isEditing}
+                  placeholder="Leave empty to use default review prompt"
+                />
+              </div>
+            </PipelineSection>
+          </div>
+
+          {/* Summary Bar */}
+          <div className="px-4 py-3 bg-gray-100 border-t text-sm text-gray-600">
+            <span className="font-medium">Flow:</span>{" "}
+            Extract ({editedConfig.models.extractors.length} model{editedConfig.models.extractors.length !== 1 ? 's' : ''}, severity ≥{editedConfig.thresholds.minSeverityThreshold}, max {editedConfig.thresholds.maxIssues}/model) →{" "}
+            Dedup (similarity ≥{editedConfig.thresholds.dedupThreshold}) →{" "}
+            Filter ({editedConfig.filterChain.filter(f => f.enabled).length} active, intake max {editedConfig.thresholds.maxIssuesToProcess}) →{" "}
+            Review
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+}
+
+interface PipelineSectionProps {
+  title: string;
+  subtitle: string;
+  color: "blue" | "purple" | "orange" | "green";
+  isExpanded: boolean;
+  onToggle: () => void;
+  children: React.ReactNode;
+}
+
+function PipelineSection({ title, subtitle, color, isExpanded, onToggle, children }: PipelineSectionProps) {
+  const colorClasses = {
+    blue: "bg-blue-100 text-blue-800 border-blue-200",
+    purple: "bg-purple-100 text-purple-800 border-purple-200",
+    orange: "bg-orange-100 text-orange-800 border-orange-200",
+    green: "bg-green-100 text-green-800 border-green-200",
+  };
+
+  return (
+    <div>
+      <button
+        onClick={onToggle}
+        className="w-full px-4 py-3 flex items-center justify-between hover:bg-gray-50 transition-colors text-left"
+      >
+        <div className="flex items-center space-x-3">
+          {isExpanded ? (
+            <ChevronDownIcon className="h-4 w-4 text-gray-500" />
+          ) : (
+            <ChevronRightIcon className="h-4 w-4 text-gray-500" />
+          )}
+          <span className={`px-2 py-0.5 rounded text-xs font-medium ${colorClasses[color]}`}>
+            {title}
+          </span>
+        </div>
+        <span className="text-sm text-gray-500">{subtitle}</span>
+      </button>
+      {isExpanded && <div className="px-4 pb-4 pt-2">{children}</div>}
+    </div>
+  );
+}
+
+interface PromptEditorProps {
+  label: string;
+  description: string;
+  value: string;
+  defaultValue?: string;
+  onChange: (value: string) => void;
+  disabled: boolean;
+  placeholder: string;
+  rows?: number;
+}
+
+function PromptEditor({ label, description, value, defaultValue, onChange, disabled, placeholder, rows = 3 }: PromptEditorProps) {
+  const [isOpen, setIsOpen] = useState(false);
+  const [showDefault, setShowDefault] = useState(false);
+  const hasValue = value.trim().length > 0;
+
+  const handleUseDefault = () => {
+    if (defaultValue) {
+      onChange(defaultValue);
+    }
+  };
+
+  return (
+    <div>
+      <button
+        type="button"
+        onClick={() => setIsOpen(!isOpen)}
+        className="flex items-center gap-2 text-sm text-gray-600 hover:text-gray-900"
+      >
+        {isOpen ? <ChevronDownIcon className="h-3 w-3" /> : <ChevronRightIcon className="h-3 w-3" />}
+        <span className="font-medium">{label}</span>
+        {hasValue ? (
+          <span className="text-xs text-blue-600">(customized)</span>
+        ) : (
+          <span className="text-xs text-gray-400">(using default)</span>
+        )}
+      </button>
+      {isOpen && (
+        <div className="mt-2">
+          <p className="text-xs text-gray-500 mb-1">{description}</p>
+          <textarea
+            value={value}
+            onChange={(e) => onChange(e.target.value)}
+            disabled={disabled}
+            rows={hasValue ? rows : 2}
+            className="w-full px-3 py-2 border rounded-md text-sm font-mono disabled:bg-gray-50 disabled:text-gray-500"
+            placeholder={placeholder}
+          />
+          {!hasValue && defaultValue && (
+            <div className="mt-2">
+              <div className="flex items-center gap-2 mb-1">
+                <button
+                  type="button"
+                  onClick={() => setShowDefault(!showDefault)}
+                  className="text-xs text-blue-600 hover:text-blue-800"
+                >
+                  {showDefault ? "Hide default" : "View default prompt"}
+                </button>
+                {!disabled && (
+                  <button
+                    type="button"
+                    onClick={handleUseDefault}
+                    className="text-xs text-gray-500 hover:text-gray-700"
+                  >
+                    Copy to customize
+                  </button>
+                )}
+              </div>
+              {showDefault && (
+                <pre className="p-2 bg-gray-50 border rounded text-xs font-mono text-gray-600 max-h-64 overflow-auto whitespace-pre-wrap">
+                  {defaultValue}
+                </pre>
+              )}
+            </div>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ProfilesList.tsx b/apps/web/src/app/monitor/lab/components/profiles/ProfilesList.tsx
new file mode 100644
index 000000000..62492cf78
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ProfilesList.tsx
@@ -0,0 +1,129 @@
+"use client";
+
+import { PlusIcon, TrashIcon, CheckCircleIcon } from "@heroicons/react/24/outline";
+import type { Profile } from "../../types";
+import { formatDate } from "../../utils/formatters";
+
+interface ProfilesListProps {
+  profiles: Profile[];
+  loading: boolean;
+  selectedProfile: Profile | null;
+  onSelectProfile: (profile: Profile) => void;
+  onCreateProfile: () => void;
+  onDeleteProfile: (id: string) => void;
+  onSetDefault: (id: string) => void;
+}
+
+export function ProfilesList({
+  profiles,
+  loading,
+  selectedProfile,
+  onSelectProfile,
+  onCreateProfile,
+  onDeleteProfile,
+  onSetDefault,
+}: ProfilesListProps) {
+  const handleDelete = (e: React.MouseEvent, profile: Profile) => {
+    e.stopPropagation();
+    if (profile.isDefault) {
+      alert("Cannot delete the default profile. Set another profile as default first.");
+      return;
+    }
+    if (!confirm(`Delete profile "${profile.name}"?`)) return;
+    onDeleteProfile(profile.id);
+  };
+
+  const handleSetDefault = (e: React.MouseEvent, profile: Profile) => {
+    e.stopPropagation();
+    if (profile.isDefault) return;
+    onSetDefault(profile.id);
+  };
+
+  return (
+    <div className="flex flex-col h-full">
+      <div className="p-4 border-b bg-white">
+        <div className="flex items-center justify-between mb-2">
+          <h2 className="font-semibold text-gray-900">Profiles</h2>
+          <button
+            onClick={onCreateProfile}
+            className="p-1 text-blue-600 hover:bg-blue-50 rounded"
+            title="Create profile"
+          >
+            <PlusIcon className="h-5 w-5" />
+          </button>
+        </div>
+        <p className="text-xs text-gray-500">Configure fallacy checker settings for different use cases</p>
+      </div>
+
+      <div className="flex-1 overflow-y-auto">
+        {loading ? (
+          <div className="p-4 text-gray-500 text-sm">Loading...</div>
+        ) : profiles.length === 0 ? (
+          <div className="p-4 text-gray-500 text-sm">
+            <p>No profiles yet</p>
+            <button
+              onClick={onCreateProfile}
+              className="mt-2 text-blue-600 hover:underline text-sm"
+            >
+              Create your first profile
+            </button>
+          </div>
+        ) : (
+          <div className="divide-y">
+            {profiles.map((profile) => (
+              <div
+                key={profile.id}
+                onClick={() => onSelectProfile(profile)}
+                className={`p-3 cursor-pointer hover:bg-gray-100 ${
+                  selectedProfile?.id === profile.id ? "bg-blue-50 border-l-4 border-blue-500" : ""
+                }`}
+              >
+                <div className="flex items-start justify-between">
+                  <div className="flex-1 min-w-0">
+                    <div className="flex items-center gap-2">
+                      <span className="font-medium text-gray-900 truncate">{profile.name}</span>
+                      {profile.isDefault && (
+                        <span className="px-1.5 py-0.5 text-xs bg-green-100 text-green-700 rounded">
+                          Default
+                        </span>
+                      )}
+                    </div>
+                    {profile.description && (
+                      <div className="text-xs text-gray-500 mt-1 truncate">{profile.description}</div>
+                    )}
+                    <div className="text-xs text-gray-400 mt-1">
+                      {formatDate(profile.updatedAt)}
+                    </div>
+                  </div>
+                  <div className="flex items-center gap-1 ml-2">
+                    {!profile.isDefault && (
+                      <button
+                        onClick={(e) => handleSetDefault(e, profile)}
+                        className="p-1 text-gray-400 hover:text-green-600 hover:bg-green-50 rounded"
+                        title="Set as default"
+                      >
+                        <CheckCircleIcon className="h-4 w-4" />
+                      </button>
+                    )}
+                    <button
+                      onClick={(e) => handleDelete(e, profile)}
+                      className={`p-1 rounded ${
+                        profile.isDefault
+                          ? "text-gray-300 cursor-not-allowed"
+                          : "text-gray-400 hover:text-red-600 hover:bg-red-50"
+                      }`}
+                      disabled={profile.isDefault}
+                      title={profile.isDefault ? "Cannot delete default profile" : "Delete profile"}
+                    >
+                      <TrashIcon className="h-4 w-4" />
+                    </button>
+                  </div>
+                </div>
+              </div>
+            ))}
+          </div>
+        )}
+      </div>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/profiles/ProviderSelector.tsx b/apps/web/src/app/monitor/lab/components/profiles/ProviderSelector.tsx
new file mode 100644
index 000000000..8c4b2326f
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/profiles/ProviderSelector.tsx
@@ -0,0 +1,214 @@
+"use client";
+
+import { useMemo } from "react";
+import type { ProviderPreferences } from "../../types";
+import { useModelEndpoints } from "../../hooks/useModelEndpoints";
+
+interface ProviderSelectorProps {
+  provider: ProviderPreferences | undefined;
+  onChange: (provider: ProviderPreferences | undefined) => void;
+  disabled?: boolean;
+  modelId: string;
+  /** Whether to indent the content (for use inside ModelConfigurator with label) */
+  indent?: boolean;
+}
+
+export function ProviderSelector({
+  provider,
+  onChange,
+  disabled,
+  modelId,
+  indent = false,
+}: ProviderSelectorProps) {
+  const selectedProviders = provider?.order || [];
+
+  // Fetch available providers/endpoints for this specific model
+  const { endpoints, loading: endpointsLoading } = useModelEndpoints(modelId);
+
+  // Convert endpoints to provider format for the UI
+  const availableProviders = useMemo(
+    () =>
+      endpoints.map((ep) => ({
+        id: ep.tag,
+        name: ep.providerName,
+        uptime: ep.uptimeLast30m,
+      })),
+    [endpoints]
+  );
+
+  const toggleProvider = (providerId: string) => {
+    if (disabled) return;
+    const current = selectedProviders;
+    const newOrder = current.includes(providerId)
+      ? current.filter((p) => p !== providerId)
+      : [...current, providerId];
+
+    if (newOrder.length === 0) {
+      onChange(undefined);
+    } else {
+      onChange({ order: newOrder, allow_fallbacks: provider?.allow_fallbacks ?? true });
+    }
+  };
+
+  const moveProvider = (providerId: string, direction: "up" | "down") => {
+    if (disabled) return;
+    const current = [...selectedProviders];
+    const index = current.indexOf(providerId);
+    if (index === -1) return;
+
+    const newIndex = direction === "up" ? index - 1 : index + 1;
+    if (newIndex < 0 || newIndex >= current.length) return;
+
+    [current[index], current[newIndex]] = [current[newIndex], current[index]];
+    onChange({ order: current, allow_fallbacks: provider?.allow_fallbacks ?? true });
+  };
+
+  const containerClass = indent ? "pl-5 space-y-2" : "space-y-2";
+
+  // Show loading state
+  if (endpointsLoading) {
+    return (
+      <div className={containerClass}>
+        <div className="flex items-center gap-2">
+          <span className="text-xs text-gray-500">Provider Preference</span>
+          <span className="text-xs text-gray-400 animate-pulse">Loading providers...</span>
+        </div>
+      </div>
+    );
+  }
+
+  // No providers available for this model
+  if (availableProviders.length === 0) {
+    return (
+      <div className={containerClass}>
+        <div className="flex items-center gap-2">
+          <span className="text-xs text-gray-500">Provider Preference</span>
+          <span className="text-xs text-gray-400">(no routing options)</span>
+        </div>
+      </div>
+    );
+  }
+
+  // Only one provider - no need to show selection
+  if (availableProviders.length === 1) {
+    return (
+      <div className={containerClass}>
+        <div className="flex items-center gap-2">
+          <span className="text-xs text-gray-500">Provider</span>
+          <span className="text-xs text-gray-600">{availableProviders[0].name}</span>
+          <span className="text-xs text-green-600">
+            ({Math.round(availableProviders[0].uptime)}% uptime)
+          </span>
+        </div>
+      </div>
+    );
+  }
+
+  return (
+    <div className={containerClass}>
+      <div className="flex items-center gap-2">
+        <span className="text-xs text-gray-500">Provider Preference</span>
+        <span className="text-xs text-gray-400">({availableProviders.length} available)</span>
+      </div>
+
+      {/* Selected providers in order */}
+      {selectedProviders.length > 0 && (
+        <div className="flex flex-wrap items-center gap-1.5">
+          {selectedProviders.map((pid, idx) => {
+            const providerInfo = availableProviders.find((p) => p.id === pid);
+            return (
+              <div
+                key={pid}
+                className="flex items-center gap-1 px-2 py-0.5 bg-blue-100 text-blue-700 rounded text-xs"
+              >
+                <span className="font-medium">{idx + 1}.</span>
+                <span>{providerInfo?.name || pid}</span>
+                {providerInfo?.uptime !== undefined && (
+                  <span className="text-blue-500 text-[10px]">
+                    ({Math.round(providerInfo.uptime)}%)
+                  </span>
+                )}
+                {!disabled && (
+                  <>
+                    {idx > 0 && (
+                      <button
+                        onClick={() => moveProvider(pid, "up")}
+                        className="text-blue-500 hover:text-blue-700 px-0.5"
+                        title="Move up in priority"
+                      >
+                        ↑
+                      </button>
+                    )}
+                    {idx < selectedProviders.length - 1 && (
+                      <button
+                        onClick={() => moveProvider(pid, "down")}
+                        className="text-blue-500 hover:text-blue-700 px-0.5"
+                        title="Move down in priority"
+                      >
+                        ↓
+                      </button>
+                    )}
+                    <button
+                      onClick={() => toggleProvider(pid)}
+                      className="text-blue-500 hover:text-red-600 ml-1"
+                      title="Remove"
+                    >
+                      ×
+                    </button>
+                  </>
+                )}
+              </div>
+            );
+          })}
+        </div>
+      )}
+
+      {/* Provider selection dropdown */}
+      {!disabled && (
+        <select
+          value=""
+          onChange={(e) => {
+            if (e.target.value) {
+              toggleProvider(e.target.value);
+            }
+          }}
+          className="px-2 py-1 text-xs border border-gray-200 rounded bg-white"
+        >
+          <option value="">
+            {selectedProviders.length === 0 ? "Select preferred provider..." : "+ Add provider"}
+          </option>
+          {availableProviders
+            .filter((p) => !selectedProviders.includes(p.id))
+            .map((p) => (
+              <option key={p.id} value={p.id}>
+                {p.name} ({Math.round(p.uptime)}% uptime)
+              </option>
+            ))}
+        </select>
+      )}
+
+      {/* Fallback toggle */}
+      {selectedProviders.length > 0 && (
+        <label className="flex items-center gap-2 text-xs text-gray-500">
+          <input
+            type="checkbox"
+            checked={provider?.allow_fallbacks ?? true}
+            onChange={(e) => {
+              if (disabled) return;
+              onChange({ order: selectedProviders, allow_fallbacks: e.target.checked });
+            }}
+            disabled={disabled}
+            className="rounded border-gray-300"
+          />
+          Allow fallback to other providers
+        </label>
+      )}
+
+      {selectedProviders.length === 0 && (
+        <p className="text-xs text-gray-400 italic">
+          No preference set - OpenRouter will choose automatically
+        </p>
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/snapshots/ExtractorCards.tsx b/apps/web/src/app/monitor/lab/components/snapshots/ExtractorCards.tsx
new file mode 100644
index 000000000..ca2bb68af
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/snapshots/ExtractorCards.tsx
@@ -0,0 +1,363 @@
+"use client";
+
+import { useState } from "react";
+import {
+  ChevronDownIcon,
+  ChevronRightIcon,
+  ExclamationTriangleIcon,
+  CheckCircleIcon,
+} from "@heroicons/react/24/outline";
+import type { ExtractorInfo, ExtractionPhase } from "../../types";
+import {
+  formatDuration,
+  formatCost,
+  formatTokens,
+  getModelDisplayName,
+  formatTemperature,
+  formatReasoning,
+} from "./pipelineUtils";
+
+interface ExtractorCardProps {
+  ext: ExtractorInfo;
+}
+
+/**
+ * Individual extractor card with collapsible details
+ */
+export function ExtractorCard({ ext }: ExtractorCardProps) {
+  const [showDetails, setShowDetails] = useState(false);
+  const hasError = !!ext.error;
+  const modelName = getModelDisplayName(ext.model);
+  const tempDisplay = formatTemperature(ext);
+  const reasoningDisplay = formatReasoning(ext);
+  const inputTokens = ext.responseMetrics?.inputTokens;
+  const outputTokens = ext.responseMetrics?.outputTokens;
+  const cacheReadTokens = ext.responseMetrics?.cacheReadTokens;
+  const cacheWriteTokens = ext.responseMetrics?.cacheWriteTokens;
+
+  return (
+    <div
+      className={`p-3 rounded-md border ${
+        hasError ? "bg-red-50 border-red-200" : "bg-blue-50 border-blue-100"
+      }`}
+    >
+      {/* Header row: Model name + status */}
+      <div className="flex items-center justify-between">
+        <div className="flex items-center gap-2">
+          {hasError ? (
+            <ExclamationTriangleIcon className="h-4 w-4 text-red-500" />
+          ) : (
+            <CheckCircleIcon className="h-4 w-4 text-green-500" />
+          )}
+          <span className={`font-medium ${hasError ? "text-red-900" : "text-blue-900"}`}>
+            {modelName}
+          </span>
+        </div>
+        <div className="flex items-center gap-3">
+          {ext.durationMs !== undefined && (
+            <span className="text-xs text-gray-500 font-mono">
+              {formatDuration(ext.durationMs)}
+            </span>
+          )}
+          {ext.costUsd !== undefined && (
+            <span className="text-xs text-gray-400">{formatCost(ext.costUsd)}</span>
+          )}
+          <span className={`font-mono ${hasError ? "text-red-700" : "text-blue-700"}`}>
+            {ext.issuesFound} issue{ext.issuesFound !== 1 ? "s" : ""}
+          </span>
+        </div>
+      </div>
+
+      {/* Config row: temperature, reasoning */}
+      <div className="flex flex-wrap gap-2 mt-2">
+        {tempDisplay && (
+          <span className="px-2 py-0.5 bg-gray-100 text-gray-600 text-xs rounded">
+            {tempDisplay}
+          </span>
+        )}
+        {reasoningDisplay && (
+          <span className="px-2 py-0.5 bg-purple-100 text-purple-700 text-xs rounded">
+            {reasoningDisplay}
+          </span>
+        )}
+      </div>
+
+      {/* Error display */}
+      {hasError && (
+        <div className="mt-2 p-2 bg-red-100 rounded text-xs text-red-700">
+          <span className="font-medium">Error:</span> {ext.error}
+        </div>
+      )}
+
+      {/* Issue type breakdown if available */}
+      {ext.issuesByType && Object.keys(ext.issuesByType).length > 0 && (
+        <div className="mt-2 flex flex-wrap gap-1">
+          {Object.entries(ext.issuesByType).map(([type, count]) => (
+            <span key={type} className="px-1.5 py-0.5 bg-blue-100 text-blue-700 text-xs rounded">
+              {type}: {count}
+            </span>
+          ))}
+        </div>
+      )}
+
+      {/* Collapsible details toggle */}
+      <button
+        onClick={() => setShowDetails(!showDetails)}
+        className="mt-2 text-xs text-gray-400 hover:text-gray-600 flex items-center gap-1"
+      >
+        {showDetails ? (
+          <ChevronDownIcon className="h-3 w-3" />
+        ) : (
+          <ChevronRightIcon className="h-3 w-3" />
+        )}
+        {showDetails ? "Hide details" : "Show details"}
+      </button>
+
+      {/* Collapsible details section */}
+      {showDetails && (
+        <div className="mt-2 pt-2 border-t border-gray-200 space-y-1.5">
+          {/* Token usage */}
+          {(inputTokens || outputTokens) && (
+            <div className="text-xs text-gray-500">
+              <span className="font-medium text-gray-600">Tokens:</span>{" "}
+              {inputTokens && <span>in: {formatTokens(inputTokens)}</span>}
+              {inputTokens && outputTokens && " · "}
+              {outputTokens && <span>out: {formatTokens(outputTokens)}</span>}
+              {cacheReadTokens ? (
+                <span className="ml-2 text-green-600">
+                  cache read: {formatTokens(cacheReadTokens)}
+                </span>
+              ) : null}
+              {cacheWriteTokens ? (
+                <span className="ml-2 text-yellow-600">
+                  cache write: {formatTokens(cacheWriteTokens)}
+                </span>
+              ) : null}
+            </div>
+          )}
+
+          {/* API params details */}
+          {ext.actualApiParams && (
+            <div className="text-xs text-gray-500">
+              <span className="font-medium text-gray-600">API params:</span> temp=
+              {ext.actualApiParams.temperature}, maxTokens={ext.actualApiParams.maxTokens}
+              {ext.actualApiParams.thinking && (
+                <span className="ml-1">
+                  , thinking budget: {formatTokens(ext.actualApiParams.thinking.budget_tokens)}
+                </span>
+              )}
+              {ext.actualApiParams.reasoning?.max_tokens && (
+                <span className="ml-1">
+                  , reasoning budget: {formatTokens(ext.actualApiParams.reasoning.max_tokens)}
+                </span>
+              )}
+              {ext.actualApiParams.reasoning?.effort &&
+                !ext.actualApiParams.reasoning.max_tokens && (
+                  <span className="ml-1">, reasoning: {ext.actualApiParams.reasoning.effort}</span>
+                )}
+            </div>
+          )}
+
+          {/* Response metrics */}
+          {ext.responseMetrics && (
+            <div className="text-xs text-gray-500">
+              <span className="font-medium text-gray-600">Response:</span>{" "}
+              {ext.responseMetrics.success ? "success" : "failed"}, latency:{" "}
+              {ext.responseMetrics.latencyMs}ms
+              {ext.responseMetrics.stopReason && (
+                <span className="ml-1">, stop: {ext.responseMetrics.stopReason}</span>
+              )}
+            </div>
+          )}
+
+          {/* Full model ID */}
+          <div className="text-xs text-gray-400">
+            <span className="font-medium text-gray-600">Model ID:</span>{" "}
+            <code className="bg-gray-100 px-1 rounded">{ext.model}</code>
+          </div>
+        </div>
+      )}
+    </div>
+  );
+}
+
+/**
+ * Deduplication & Aggregation card showing both Jaccard dedup and Judge stages
+ */
+export function DeduplicationCard({
+  extraction,
+  extractorCount: _extractorCount,
+}: {
+  extraction: ExtractionPhase;
+  extractorCount: number;
+}) {
+  const [showDetails, setShowDetails] = useState(false);
+
+  const totalFromExtractors = extraction.totalIssuesBeforeJudge;
+  const afterJaccardDedup = extraction.totalIssuesAfterDedup ?? totalFromExtractors;
+  const afterJudge = extraction.totalIssuesAfterJudge;
+
+  const jaccardRemoved = totalFromExtractors - afterJaccardDedup;
+  const judgeRemoved = afterJaccardDedup - afterJudge;
+  const totalRemoved = totalFromExtractors - afterJudge;
+
+  const hasJudge = extraction.judgeDurationMs !== undefined;
+  const judgeCost = extraction.judgeCostUsd;
+  const overallRate =
+    totalFromExtractors > 0 ? Math.round((totalRemoved / totalFromExtractors) * 100) : 0;
+
+  return (
+    <div className="p-3 bg-purple-50 border border-purple-100 rounded-md">
+      <div className="flex items-center justify-between">
+        <span className="text-sm font-medium text-purple-900">Deduplication & Aggregation</span>
+        {hasJudge && (
+          <div className="flex items-center gap-2">
+            <span className="text-xs text-purple-600 font-mono">
+              judge: {formatDuration(extraction.judgeDurationMs)}
+            </span>
+            {judgeCost !== undefined && judgeCost > 0 && (
+              <span className="text-xs text-purple-400 font-mono">{formatCost(judgeCost)}</span>
+            )}
+          </div>
+        )}
+      </div>
+
+      {/* Flow visualization */}
+      <div className="mt-3 flex items-center gap-2 text-xs">
+        <div className="bg-purple-100 rounded px-2 py-1 text-center">
+          <div className="text-purple-600 text-[10px]">Raw</div>
+          <div className="font-mono text-purple-900 font-bold">{totalFromExtractors}</div>
+        </div>
+        <span className="text-purple-400">→</span>
+        <div className="bg-purple-100 rounded px-2 py-1 text-center">
+          <div className="text-purple-600 text-[10px]">Jaccard</div>
+          <div className="font-mono text-purple-900">{afterJaccardDedup}</div>
+          {jaccardRemoved > 0 && <div className="text-[10px] text-red-500">-{jaccardRemoved}</div>}
+        </div>
+        {hasJudge && (
+          <>
+            <span className="text-purple-400">→</span>
+            <div className="bg-purple-100 rounded px-2 py-1 text-center">
+              <div className="text-purple-600 text-[10px]">Judge</div>
+              <div className="font-mono text-purple-900">{afterJudge}</div>
+              {judgeRemoved > 0 && <div className="text-[10px] text-red-500">-{judgeRemoved}</div>}
+            </div>
+          </>
+        )}
+        <span className="text-purple-400">=</span>
+        <div className="bg-green-100 rounded px-2 py-1 text-center">
+          <div className="text-green-600 text-[10px]">Final</div>
+          <div className="font-mono text-green-900 font-bold">{afterJudge}</div>
+        </div>
+        <span className="ml-2 text-purple-500 text-[10px]">({overallRate}% reduced)</span>
+      </div>
+
+      {/* Collapsible details toggle */}
+      <button
+        onClick={() => setShowDetails(!showDetails)}
+        className="mt-2 text-xs text-purple-400 hover:text-purple-600 flex items-center gap-1"
+      >
+        {showDetails ? (
+          <ChevronDownIcon className="h-3 w-3" />
+        ) : (
+          <ChevronRightIcon className="h-3 w-3" />
+        )}
+        {showDetails ? "Hide details" : "Show details"}
+      </button>
+
+      {/* Collapsible details */}
+      {showDetails && (
+        <div className="mt-2 pt-2 border-t border-purple-200 text-xs text-purple-700 space-y-2">
+          <div>
+            <span className="font-medium">Jaccard Dedup:</span> Merges issues with 70%+ word
+            overlap, keeping higher quality version.
+            {jaccardRemoved > 0 ? (
+              <span className="ml-1 text-red-600">Removed {jaccardRemoved} duplicates.</span>
+            ) : (
+              <span className="ml-1 text-green-600">No duplicates found.</span>
+            )}
+          </div>
+          {hasJudge && (
+            <>
+              <div>
+                <span className="font-medium">LLM Judge:</span> Evaluates and merges semantically
+                similar issues.
+                {judgeRemoved > 0 ? (
+                  <span className="ml-1 text-red-600">Removed {judgeRemoved} issues.</span>
+                ) : (
+                  <span className="ml-1 text-green-600">Kept all issues.</span>
+                )}
+              </div>
+              {/* Judge telemetry details */}
+              {extraction.judgeModel && (
+                <div className="text-gray-500">
+                  <span className="font-medium text-gray-600">Model:</span>{" "}
+                  <code className="bg-gray-100 px-1 rounded">{extraction.judgeModel}</code>
+                </div>
+              )}
+              {extraction.judgeActualApiParams && (
+                <div className="text-gray-500">
+                  <span className="font-medium text-gray-600">API params:</span> temp=
+                  {extraction.judgeActualApiParams.temperature}, maxTokens=
+                  {extraction.judgeActualApiParams.maxTokens}
+                  {extraction.judgeActualApiParams.thinking && (
+                    <span className="ml-1">
+                      , thinking:{" "}
+                      {formatTokens(extraction.judgeActualApiParams.thinking.budget_tokens)}
+                    </span>
+                  )}
+                  {extraction.judgeActualApiParams.reasoning?.max_tokens && (
+                    <span className="ml-1">
+                      , reasoning:{" "}
+                      {formatTokens(extraction.judgeActualApiParams.reasoning.max_tokens)}
+                    </span>
+                  )}
+                  {extraction.judgeActualApiParams.reasoning?.effort &&
+                    !extraction.judgeActualApiParams.reasoning.max_tokens && (
+                      <span className="ml-1">
+                        , reasoning: {extraction.judgeActualApiParams.reasoning.effort}
+                      </span>
+                    )}
+                </div>
+              )}
+              {extraction.judgeResponseMetrics && (
+                <div className="text-gray-500">
+                  <span className="font-medium text-gray-600">Response:</span>{" "}
+                  {extraction.judgeResponseMetrics.success ? "success" : "failed"}
+                  {extraction.judgeResponseMetrics.latencyMs && (
+                    <span>, latency: {extraction.judgeResponseMetrics.latencyMs}ms</span>
+                  )}
+                  {extraction.judgeResponseMetrics.inputTokens && (
+                    <span>
+                      , in: {formatTokens(extraction.judgeResponseMetrics.inputTokens)}
+                    </span>
+                  )}
+                  {extraction.judgeResponseMetrics.outputTokens && (
+                    <span>
+                      , out: {formatTokens(extraction.judgeResponseMetrics.outputTokens)}
+                    </span>
+                  )}
+                  {extraction.judgeResponseMetrics.stopReason && (
+                    <span>, stop: {extraction.judgeResponseMetrics.stopReason}</span>
+                  )}
+                </div>
+              )}
+            </>
+          )}
+          {extraction.extractors && extraction.extractors.length > 0 && (
+            <div>
+              <span className="font-medium">Issues per extractor:</span>
+              <ul className="ml-3 list-disc">
+                {extraction.extractors.map((ext, i) => (
+                  <li key={i}>
+                    {ext.extractorId || getModelDisplayName(ext.model)}: {ext.issuesFound}
+                  </li>
+                ))}
+              </ul>
+            </div>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/snapshots/ItemCards.tsx b/apps/web/src/app/monitor/lab/components/snapshots/ItemCards.tsx
new file mode 100644
index 000000000..75ea3b96b
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/snapshots/ItemCards.tsx
@@ -0,0 +1,136 @@
+"use client";
+
+import { useState } from "react";
+import { ChevronRightIcon } from "@heroicons/react/24/outline";
+import type { FilteredItem, PassedItem, Comment } from "../../types";
+import { truncate } from "../../utils/formatters";
+import { getFilterStageBadgeText } from "./pipelineUtils";
+
+interface FilteredItemCardProps {
+  item: FilteredItem;
+}
+
+interface CommentCardProps {
+  comment: Comment;
+  variant: "kept" | "lost";
+}
+
+interface PassedItemCardProps {
+  item: PassedItem;
+}
+
+/**
+ * Card component for displaying a filtered item (removed by a filter stage)
+ */
+export function FilteredItemCard({ item }: FilteredItemCardProps) {
+  const [expanded, setExpanded] = useState(false);
+
+  return (
+    <div className="p-3 bg-orange-50 rounded-md border border-orange-100">
+      <div
+        className="flex items-start justify-between cursor-pointer"
+        onClick={() => setExpanded(!expanded)}
+      >
+        <div className="flex-1 min-w-0">
+          <div className="flex items-center space-x-2">
+            <span className="px-1.5 py-0.5 bg-orange-200 text-orange-800 rounded text-xs">
+              {getFilterStageBadgeText(item.stage)}
+            </span>
+            {item.header && (
+              <span className="text-xs text-orange-700">[{item.header}]</span>
+            )}
+          </div>
+          <p className="text-sm text-gray-700 mt-1">{truncate(item.quotedText, 80)}</p>
+        </div>
+        <ChevronRightIcon
+          className={`h-4 w-4 text-gray-400 transition-transform ${expanded ? "rotate-90" : ""}`}
+        />
+      </div>
+      {expanded && (
+        <div className="mt-3 pt-3 border-t border-orange-200">
+          <p className="text-xs text-gray-600">
+            <span className="font-medium">Reason:</span> {item.filterReason}
+          </p>
+          {item.supportLocation && (
+            <p className="text-xs text-gray-500 mt-1">
+              <span className="font-medium">Support found at:</span> {item.supportLocation}
+            </p>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
+/**
+ * Card component for displaying a comment (kept or lost)
+ */
+export function CommentCard({ comment, variant }: CommentCardProps) {
+  const [expanded, setExpanded] = useState(false);
+  const bgColor = variant === "kept" ? "bg-green-50 border-green-100" : "bg-red-50 border-red-100";
+
+  return (
+    <div className={`p-3 rounded-md border ${bgColor}`}>
+      <div
+        className="flex items-start justify-between cursor-pointer"
+        onClick={() => setExpanded(!expanded)}
+      >
+        <div className="flex-1 min-w-0">
+          <span className="text-sm font-medium text-gray-900">{comment.header || "Comment"}</span>
+          <p className="text-sm text-gray-600 mt-1">{truncate(comment.quotedText, 80)}</p>
+        </div>
+        <ChevronRightIcon
+          className={`h-4 w-4 text-gray-400 transition-transform ${expanded ? "rotate-90" : ""}`}
+        />
+      </div>
+      {expanded && (
+        <div className="mt-3 pt-3 border-t border-gray-200">
+          <p className="text-xs text-gray-600">{comment.description}</p>
+          {comment.importance && (
+            <p className="text-xs text-gray-500 mt-1">
+              <span className="font-medium">Importance:</span> {comment.importance}
+            </p>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+
+/**
+ * Card component for displaying a passed item (kept by a filter stage)
+ */
+export function PassedItemCard({ item }: PassedItemCardProps) {
+  const [expanded, setExpanded] = useState(false);
+
+  return (
+    <div className="p-3 bg-green-50 rounded-md border border-green-100">
+      <div
+        className="flex items-start justify-between cursor-pointer"
+        onClick={() => setExpanded(!expanded)}
+      >
+        <div className="flex-1 min-w-0">
+          <div className="flex items-center space-x-2">
+            <span className="px-1.5 py-0.5 bg-green-200 text-green-800 rounded text-xs">
+              Kept
+            </span>
+            {item.header && (
+              <span className="text-xs text-green-700">[{item.header}]</span>
+            )}
+          </div>
+          <p className="text-sm text-gray-700 mt-1">{truncate(item.quotedText, 80)}</p>
+        </div>
+        <ChevronRightIcon
+          className={`h-4 w-4 text-gray-400 transition-transform ${expanded ? "rotate-90" : ""}`}
+        />
+      </div>
+      {expanded && (
+        <div className="mt-3 pt-3 border-t border-green-200">
+          <p className="text-xs text-gray-600">
+            <span className="font-medium">Why it passed:</span> {item.passReason}
+          </p>
+        </div>
+      )}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/snapshots/PipelineView.tsx b/apps/web/src/app/monitor/lab/components/snapshots/PipelineView.tsx
new file mode 100644
index 000000000..6c6281009
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/snapshots/PipelineView.tsx
@@ -0,0 +1,434 @@
+"use client";
+
+import { useState } from "react";
+import { ChevronDownIcon, ChevronRightIcon } from "@heroicons/react/24/outline";
+import type {
+  ExtractionPhase,
+  PipelineCounts,
+  FilteredItem,
+  PassedItem,
+  Comment,
+  StageMetrics,
+} from "../../types";
+import {
+  formatDuration,
+  formatCost,
+  formatTokens,
+  getFilterStageTitle,
+} from "./pipelineUtils";
+import { FilteredItemCard, PassedItemCard, CommentCard } from "./ItemCards";
+import { ExtractorCard, DeduplicationCard } from "./ExtractorCards";
+
+interface PipelineViewProps {
+  extraction?: ExtractionPhase;
+  counts?: PipelineCounts;
+  filteredItems: FilteredItem[];
+  passedItems?: PassedItem[];
+  stages?: StageMetrics[];
+  totalDurationMs?: number;
+  finalComments: Comment[];
+  lostComments: Comment[];
+}
+
+export function PipelineView({
+  extraction,
+  counts,
+  filteredItems,
+  passedItems,
+  stages,
+  totalDurationMs,
+  finalComments,
+  lostComments: _lostComments,
+}: PipelineViewProps) {
+  // Helper to get stage timing
+  const getStageTiming = (stageName: string): StageMetrics | undefined => {
+    return stages?.find((s) => s.stageName === stageName);
+  };
+  const [expandedSteps, setExpandedSteps] = useState<Set<string>>(new Set());
+  const [expandedPassedSections, setExpandedPassedSections] = useState<Set<string>>(new Set());
+
+  const togglePassedSection = (stageName: string) => {
+    const newSet = new Set(expandedPassedSections);
+    if (newSet.has(stageName)) {
+      newSet.delete(stageName);
+    } else {
+      newSet.add(stageName);
+    }
+    setExpandedPassedSections(newSet);
+  };
+
+  const toggleStep = (step: string) => {
+    const newSet = new Set(expandedSteps);
+    if (newSet.has(step)) {
+      newSet.delete(step);
+    } else {
+      newSet.add(step);
+    }
+    setExpandedSteps(newSet);
+  };
+
+  const extractors = extraction?.extractors ?? [];
+  const totalExtracted = extraction?.totalIssuesBeforeJudge ?? 0;
+  const afterDedup = counts?.issuesAfterDedup ?? extraction?.totalIssuesAfterJudge ?? 0;
+  const afterFilter = counts?.issuesAfterFiltering ?? 0;
+  const commentsGenerated = counts?.commentsGenerated ?? 0;
+  const commentsKept = counts?.commentsKept ?? 0;
+
+  const _filterRemoved = afterDedup - afterFilter;
+  const reviewRemoved = commentsGenerated - commentsKept;
+
+  // Calculate total cost from all extractors + judge + stages
+  const extractorsCost = extractors.reduce((sum, ext) => sum + (ext.costUsd ?? 0), 0);
+  const judgeCost = extraction?.judgeCostUsd ?? 0;
+  const stagesCost = stages?.reduce((sum, s) => sum + (s.costUsd ?? 0), 0) ?? 0;
+  const totalCostUsd = extractorsCost + judgeCost + stagesCost;
+
+  // Get filter stages from telemetry (exclude extraction, comment-generation, review which have their own sections)
+  const coreStages = new Set(["extraction", "comment-generation", "review"]);
+  const filterStages = (stages ?? [])
+    .filter((s) => !coreStages.has(s.stageName))
+    .map((s) => s.stageName);
+
+  // Separate filtered items by stage
+  const getFilteredItemsForStage = (stageName: string): FilteredItem[] =>
+    filteredItems.filter((item) => item.stage === stageName);
+  const getPassedItemsForStage = (stageName: string): PassedItem[] =>
+    (passedItems ?? []).filter((item) => item.stage === stageName);
+  const reviewStageItems = filteredItems.filter((item) => item.stage === "review");
+
+  return (
+    <div className="border rounded-lg bg-white">
+      <div className="px-4 py-3 bg-gray-50 border-b">
+        <h4 className="font-medium text-gray-900">Pipeline Flow</h4>
+      </div>
+
+      <div className="divide-y">
+        {/* Step 1: Extraction */}
+        <PipelineStep
+          step="extraction"
+          title="1. Extraction"
+          summary={`${totalExtracted} issues from ${extractors.length} model${extractors.length !== 1 ? "s" : ""}`}
+          timing={getStageTiming("extraction")?.durationMs}
+          cost={extractorsCost}
+          isExpanded={expandedSteps.has("extraction")}
+          onToggle={() => toggleStep("extraction")}
+          color="blue"
+        >
+          <div className="space-y-3">
+            {extractors.map((ext, i) => (
+              <ExtractorCard key={i} ext={ext} />
+            ))}
+            {extractors.length === 0 && (
+              <p className="text-sm text-gray-500 italic">No extractor data available</p>
+            )}
+            {extraction && (
+              <DeduplicationCard extraction={extraction} extractorCount={extractors.length} />
+            )}
+          </div>
+        </PipelineStep>
+
+        {/* Steps 2+: Dynamic Filter Stages */}
+        {filterStages.map((stageName, index) => {
+          const stageData = getStageTiming(stageName);
+          const stageFilteredItems = getFilteredItemsForStage(stageName);
+          const stagePassedItems = getPassedItemsForStage(stageName);
+          const stageInputCount = stageData?.inputCount ?? afterDedup;
+          const stageOutputCount = stageData?.outputCount ?? stageInputCount;
+          const stageRemovedCount = stageFilteredItems.length;
+
+          return (
+            <PipelineStep
+              key={stageName}
+              step={stageName}
+              title={getFilterStageTitle(stageName, index)}
+              summary={`${stageOutputCount} kept, ${stageRemovedCount} filtered out`}
+              timing={stageData?.durationMs}
+              cost={stageData?.costUsd}
+              isExpanded={expandedSteps.has(stageName)}
+              onToggle={() => toggleStep(stageName)}
+              color="orange"
+            >
+              <div className="space-y-3">
+                <div className="p-3 bg-orange-50 rounded-md">
+                  <div className="grid grid-cols-2 gap-4 text-sm">
+                    <div>
+                      <span className="text-orange-600">Input:</span>
+                      <span className="font-mono ml-2">{stageInputCount}</span>
+                    </div>
+                    <div>
+                      <span className="text-orange-600">Output:</span>
+                      <span className="font-mono ml-2">{stageOutputCount}</span>
+                    </div>
+                  </div>
+                  {/* Filter telemetry details */}
+                  {stageData?.model && (
+                    <div className="mt-2 pt-2 border-t border-orange-200 text-xs text-gray-500">
+                      <span className="font-medium text-gray-600">Model:</span>{" "}
+                      <code className="bg-gray-100 px-1 rounded">{stageData.model}</code>
+                    </div>
+                  )}
+                  {stageData?.actualApiParams && (
+                    <div className="text-xs text-gray-500 mt-1">
+                      <span className="font-medium text-gray-600">API params:</span> temp=
+                      {stageData.actualApiParams.temperature}, maxTokens=
+                      {stageData.actualApiParams.maxTokens}
+                      {stageData.actualApiParams.reasoning?.max_tokens && (
+                        <span>
+                          , reasoning: {formatTokens(stageData.actualApiParams.reasoning.max_tokens)}
+                        </span>
+                      )}
+                      {stageData.actualApiParams.reasoning?.effort &&
+                        !stageData.actualApiParams.reasoning.max_tokens && (
+                          <span>, reasoning: {stageData.actualApiParams.reasoning.effort}</span>
+                        )}
+                    </div>
+                  )}
+                  {stageData?.responseMetrics && (
+                    <div className="text-xs text-gray-500 mt-1">
+                      <span className="font-medium text-gray-600">Response:</span>{" "}
+                      {stageData.responseMetrics.inputTokens && (
+                        <span>in: {formatTokens(stageData.responseMetrics.inputTokens)}</span>
+                      )}
+                      {stageData.responseMetrics.inputTokens &&
+                        stageData.responseMetrics.outputTokens &&
+                        " · "}
+                      {stageData.responseMetrics.outputTokens && (
+                        <span>out: {formatTokens(stageData.responseMetrics.outputTokens)}</span>
+                      )}
+                      {stageData.responseMetrics.latencyMs && (
+                        <span>, latency: {stageData.responseMetrics.latencyMs}ms</span>
+                      )}
+                    </div>
+                  )}
+                </div>
+
+                {/* Passed Items (collapsed by default) */}
+                {stagePassedItems.length > 0 && (
+                  <div>
+                    <button
+                      onClick={() => togglePassedSection(stageName)}
+                      className="flex items-center gap-1 text-sm font-medium text-green-700 hover:text-green-800 mb-2"
+                    >
+                      {expandedPassedSections.has(stageName) ? (
+                        <ChevronDownIcon className="h-4 w-4" />
+                      ) : (
+                        <ChevronRightIcon className="h-4 w-4" />
+                      )}
+                      Passed Through ({stagePassedItems.length})
+                    </button>
+                    {expandedPassedSections.has(stageName) && (
+                      <div className="space-y-2 max-h-64 overflow-y-auto">
+                        {stagePassedItems.map((item, i) => (
+                          <PassedItemCard key={i} item={item} />
+                        ))}
+                      </div>
+                    )}
+                  </div>
+                )}
+
+                {stageFilteredItems.length > 0 && (
+                  <div>
+                    <h5 className="text-sm font-medium text-orange-800 mb-2">
+                      Filtered Out ({stageFilteredItems.length})
+                    </h5>
+                    <div className="space-y-2 max-h-64 overflow-y-auto">
+                      {stageFilteredItems.map((item, i) => (
+                        <FilteredItemCard key={i} item={item} />
+                      ))}
+                    </div>
+                  </div>
+                )}
+
+                {stageFilteredItems.length === 0 && stageRemovedCount > 0 && (
+                  <p className="text-sm text-gray-500 italic">
+                    {stageRemovedCount} items filtered (details not available)
+                  </p>
+                )}
+              </div>
+            </PipelineStep>
+          );
+        })}
+
+        {/* Comment Generation (step number = 2 + filterStages.length) */}
+        <PipelineStep
+          step="generation"
+          title={`${2 + filterStages.length}. Comment Generation`}
+          summary={`${commentsGenerated} comments generated`}
+          timing={getStageTiming("comment-generation")?.durationMs}
+          cost={getStageTiming("comment-generation")?.costUsd}
+          isExpanded={expandedSteps.has("generation")}
+          onToggle={() => toggleStep("generation")}
+          color="teal"
+        >
+          <div className="p-3 bg-teal-50 rounded-md">
+            <div className="grid grid-cols-2 gap-4 text-sm">
+              <div>
+                <span className="text-teal-600">Input (issues):</span>
+                <span className="font-mono ml-2">{afterFilter}</span>
+              </div>
+              <div>
+                <span className="text-teal-600">Output (comments):</span>
+                <span className="font-mono ml-2">{commentsGenerated}</span>
+              </div>
+            </div>
+            <p className="text-xs text-teal-600 mt-2">
+              Issues are converted to user-facing comments with proper formatting
+            </p>
+          </div>
+        </PipelineStep>
+
+        {/* Review (step number = 3 + filterStages.length) */}
+        <PipelineStep
+          step="review"
+          title={`${3 + filterStages.length}. Review Filter`}
+          summary={`${commentsKept} kept, ${reviewRemoved} removed`}
+          timing={getStageTiming("review")?.durationMs}
+          cost={getStageTiming("review")?.costUsd}
+          isExpanded={expandedSteps.has("review")}
+          onToggle={() => toggleStep("review")}
+          color="green"
+        >
+          <div className="space-y-3">
+            <div className="p-3 bg-green-50 rounded-md">
+              <div className="grid grid-cols-2 gap-4 text-sm">
+                <div>
+                  <span className="text-green-600">Input:</span>
+                  <span className="font-mono ml-2">{commentsGenerated}</span>
+                </div>
+                <div>
+                  <span className="text-green-600">Final Output:</span>
+                  <span className="font-mono ml-2 font-bold">{commentsKept}</span>
+                </div>
+              </div>
+            </div>
+
+            {/* Removed by review */}
+            {reviewStageItems.length > 0 && (
+              <div>
+                <h5 className="text-sm font-medium text-red-700 mb-2">
+                  Removed by Review ({reviewStageItems.length})
+                </h5>
+                <div className="space-y-2 max-h-48 overflow-y-auto">
+                  {reviewStageItems.map((item, i) => (
+                    <FilteredItemCard key={i} item={item} />
+                  ))}
+                </div>
+              </div>
+            )}
+
+            {reviewStageItems.length === 0 && reviewRemoved > 0 && (
+              <p className="text-sm text-gray-500 italic">
+                {reviewRemoved} comments removed (details not available)
+              </p>
+            )}
+
+            {/* Final kept comments */}
+            {finalComments.length > 0 && (
+              <div>
+                <h5 className="text-sm font-medium text-green-800 mb-2">
+                  Final Comments ({finalComments.length})
+                </h5>
+                <div className="space-y-2 max-h-64 overflow-y-auto">
+                  {finalComments.map((comment, i) => (
+                    <CommentCard key={i} comment={comment} variant="kept" />
+                  ))}
+                </div>
+              </div>
+            )}
+          </div>
+        </PipelineStep>
+      </div>
+
+      {/* Summary Bar */}
+      <div className="px-4 py-3 bg-gray-100 border-t text-sm">
+        <div className="flex items-center justify-between">
+          <span className="text-gray-600">
+            <span className="font-medium">{totalExtracted}</span> extracted →{" "}
+            <span className="font-medium">{afterDedup}</span> deduped →{" "}
+            <span className="font-medium">{afterFilter}</span> filtered →{" "}
+            <span className="font-medium">{commentsGenerated}</span> generated →{" "}
+            <span className="font-bold text-green-700">{commentsKept}</span> final
+          </span>
+          <div className="flex items-center gap-4 text-gray-500">
+            {totalDurationMs !== undefined && (
+              <span className="font-mono">{formatDuration(totalDurationMs)}</span>
+            )}
+            {totalCostUsd > 0 && (
+              <span className="font-mono text-emerald-600" title="Total pipeline cost">
+                {formatCost(totalCostUsd)}
+              </span>
+            )}
+            <span>
+              {totalExtracted > 0
+                ? `${Math.round((commentsKept / totalExtracted) * 100)}% yield`
+                : "-"}
+            </span>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+}
+
+// ============================================================================
+// PipelineStep Component
+// ============================================================================
+
+interface PipelineStepProps {
+  step: string;
+  title: string;
+  summary: string;
+  timing?: number;
+  cost?: number;
+  isExpanded: boolean;
+  onToggle: () => void;
+  color: "blue" | "purple" | "orange" | "teal" | "green";
+  children: React.ReactNode;
+}
+
+function PipelineStep({
+  title,
+  summary,
+  timing,
+  cost,
+  isExpanded,
+  onToggle,
+  color,
+  children,
+}: PipelineStepProps) {
+  const colorClasses = {
+    blue: "bg-blue-100 text-blue-800 border-blue-200",
+    purple: "bg-purple-100 text-purple-800 border-purple-200",
+    orange: "bg-orange-100 text-orange-800 border-orange-200",
+    teal: "bg-teal-100 text-teal-800 border-teal-200",
+    green: "bg-green-100 text-green-800 border-green-200",
+  };
+
+  return (
+    <div>
+      <button
+        onClick={onToggle}
+        className="w-full px-4 py-3 flex items-center justify-between hover:bg-gray-50 transition-colors"
+      >
+        <div className="flex items-center space-x-3">
+          {isExpanded ? (
+            <ChevronDownIcon className="h-4 w-4 text-gray-500" />
+          ) : (
+            <ChevronRightIcon className="h-4 w-4 text-gray-500" />
+          )}
+          <span className={`px-2 py-0.5 rounded text-xs font-medium ${colorClasses[color]}`}>
+            {title}
+          </span>
+          {timing !== undefined && (
+            <span className="text-xs text-gray-400 font-mono">{formatDuration(timing)}</span>
+          )}
+          {cost !== undefined && cost > 0 && (
+            <span className="text-xs text-emerald-500 font-mono">{formatCost(cost)}</span>
+          )}
+        </div>
+        <span className="text-sm text-gray-600">{summary}</span>
+      </button>
+      {isExpanded && <div className="px-4 pb-4 pt-2">{children}</div>}
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/snapshots/SnapshotComparison.tsx b/apps/web/src/app/monitor/lab/components/snapshots/SnapshotComparison.tsx
new file mode 100644
index 000000000..203b50a8b
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/snapshots/SnapshotComparison.tsx
@@ -0,0 +1,269 @@
+"use client";
+
+import { useState } from "react";
+import { ArrowLeftIcon } from "@heroicons/react/24/outline";
+import type { RunSnapshot, CommentMatch, Comment } from "../../types";
+import { truncate } from "../../utils/formatters";
+import { PipelineView } from "./PipelineView";
+
+interface SnapshotComparisonProps {
+  snapshot: RunSnapshot;
+  onBack: () => void;
+}
+
+interface StatusSummaryProps {
+  snapshot: RunSnapshot;
+}
+
+interface MatchedCommentItemProps {
+  match: CommentMatch;
+}
+
+interface CommentItemProps {
+  comment: Comment;
+}
+
+type ViewTab = "pipeline" | "comparison";
+
+export function SnapshotComparison({ snapshot, onBack }: SnapshotComparisonProps) {
+  const [activeTab, setActiveTab] = useState<ViewTab>("pipeline");
+
+  const comparison = snapshot.comparisonData;
+  const matched = comparison?.matchedComments ?? [];
+  const newComments = comparison?.newComments ?? [];
+  const lostComments = comparison?.lostComments ?? [];
+  const filteredItems = comparison?.filteredItems ?? [];
+  const pipelineCounts = comparison?.pipelineCounts;
+  const extractionPhase = comparison?.extractionPhase;
+  const stages = comparison?.stages;
+  const totalDurationMs = comparison?.totalDurationMs;
+
+  // Collect all final comments for the pipeline view
+  const allFinalComments: Comment[] = [
+    ...matched.map((m) => m.currentComment),
+    ...newComments,
+  ];
+
+  return (
+    <div className="bg-white shadow rounded-lg">
+      {/* Header */}
+      <div className="px-6 py-4 border-b border-gray-200">
+        <button
+          onClick={onBack}
+          className="flex items-center text-sm text-gray-600 hover:text-gray-900 mb-2"
+        >
+          <ArrowLeftIcon className="h-4 w-4 mr-1" />
+          Back to run
+        </button>
+        <h3 className="text-lg font-medium text-gray-900">{snapshot.documentTitle}</h3>
+        <StatusSummary snapshot={snapshot} />
+      </div>
+
+      {/* Tab Navigation */}
+      <div className="px-6 border-b border-gray-200">
+        <nav className="-mb-px flex space-x-6">
+          <TabButton
+            active={activeTab === "pipeline"}
+            onClick={() => setActiveTab("pipeline")}
+            label="Pipeline View"
+          />
+          <TabButton
+            active={activeTab === "comparison"}
+            onClick={() => setActiveTab("comparison")}
+            label="Baseline Comparison"
+          />
+        </nav>
+      </div>
+
+      {/* Tab Content */}
+      <div className="p-4 max-h-[calc(100vh-350px)] overflow-y-auto">
+        {activeTab === "pipeline" ? (
+          <PipelineView
+            extraction={extractionPhase}
+            counts={pipelineCounts}
+            filteredItems={filteredItems}
+            stages={stages}
+            totalDurationMs={totalDurationMs}
+            finalComments={allFinalComments}
+            lostComments={lostComments}
+          />
+        ) : (
+          <ComparisonView
+            matched={matched}
+            newComments={newComments}
+            lostComments={lostComments}
+          />
+        )}
+      </div>
+    </div>
+  );
+}
+
+function TabButton({
+  active,
+  onClick,
+  label,
+}: {
+  active: boolean;
+  onClick: () => void;
+  label: string;
+}) {
+  return (
+    <button
+      onClick={onClick}
+      className={`py-3 px-1 border-b-2 font-medium text-sm transition-colors ${
+        active
+          ? "border-blue-500 text-blue-600"
+          : "border-transparent text-gray-500 hover:text-gray-700 hover:border-gray-300"
+      }`}
+    >
+      {label}
+    </button>
+  );
+}
+
+function StatusSummary({ snapshot }: StatusSummaryProps) {
+  return (
+    <div className="flex items-center space-x-4 mt-2 text-sm">
+      <span className={snapshot.status === "unchanged" ? "text-green-600" : "text-orange-600"}>
+        {snapshot.status === "unchanged" ? "Unchanged" : "Changed"}
+      </span>
+      <span className="text-gray-500">|</span>
+      <span className="text-green-600">{snapshot.keptCount} matched</span>
+      <span className="text-cyan-600">{snapshot.newCount} new</span>
+      <span className="text-red-600">{snapshot.lostCount} gone</span>
+    </div>
+  );
+}
+
+interface ComparisonViewProps {
+  matched: CommentMatch[];
+  newComments: Comment[];
+  lostComments: Comment[];
+}
+
+function ComparisonView({ matched, newComments, lostComments }: ComparisonViewProps) {
+  return (
+    <div className="space-y-6">
+      {/* Matched Comments */}
+      <ComparisonSection
+        title={`Matched (${matched.length})`}
+        titleColor="text-green-600"
+        isEmpty={matched.length === 0}
+        emptyMessage="No matched comments"
+      >
+        {matched.map((match, i) => (
+          <MatchedCommentItem key={i} match={match} />
+        ))}
+      </ComparisonSection>
+
+      {/* New Comments */}
+      <ComparisonSection
+        title={`New in Current (${newComments.length})`}
+        titleColor="text-cyan-600"
+        isEmpty={newComments.length === 0}
+        emptyMessage="No new comments"
+      >
+        {newComments.map((comment, i) => (
+          <CommentItem key={i} comment={comment} />
+        ))}
+      </ComparisonSection>
+
+      {/* Lost Comments */}
+      <ComparisonSection
+        title={`Gone from Baseline (${lostComments.length})`}
+        titleColor="text-red-600"
+        isEmpty={lostComments.length === 0}
+        emptyMessage="No lost comments"
+      >
+        {lostComments.map((comment, i) => (
+          <CommentItem key={i} comment={comment} />
+        ))}
+      </ComparisonSection>
+    </div>
+  );
+}
+
+interface ComparisonSectionProps {
+  title: string;
+  titleColor: string;
+  isEmpty: boolean;
+  emptyMessage: string;
+  children: React.ReactNode;
+}
+
+function ComparisonSection({
+  title,
+  titleColor,
+  isEmpty,
+  emptyMessage,
+  children,
+}: ComparisonSectionProps) {
+  return (
+    <div>
+      <h4 className={`text-sm font-medium ${titleColor} mb-3`}>{title}</h4>
+      {isEmpty ? (
+        <p className="text-sm text-gray-400 italic">{emptyMessage}</p>
+      ) : (
+        <div className="space-y-2">{children}</div>
+      )}
+    </div>
+  );
+}
+
+function MatchedCommentItem({ match }: MatchedCommentItemProps) {
+  const [expanded, setExpanded] = useState(false);
+  const comment = match.baselineComment;
+
+  const needsExpand = comment.quotedText.length > 100 || comment.description.length > 150;
+
+  return (
+    <div
+      className={`p-3 bg-green-50 rounded-md border border-green-100 ${needsExpand ? "cursor-pointer hover:bg-green-100" : ""}`}
+      onClick={() => needsExpand && setExpanded(!expanded)}
+    >
+      <div className="text-sm flex items-center justify-between">
+        <div>
+          <span className="font-medium text-gray-900">{comment.header || "Comment"}</span>
+          <span className="text-gray-500 ml-2">
+            (confidence: {Math.round(match.matchConfidence * 100)}%)
+          </span>
+        </div>
+        {needsExpand && (
+          <span className="text-xs text-green-600">{expanded ? "collapse" : "expand"}</span>
+        )}
+      </div>
+      <p className="text-sm text-gray-600 mt-1 whitespace-pre-wrap">
+        {expanded ? comment.quotedText : truncate(comment.quotedText, 100)}
+      </p>
+      <p className="text-xs text-gray-500 mt-1 whitespace-pre-wrap">
+        {expanded ? comment.description : truncate(comment.description, 150)}
+      </p>
+    </div>
+  );
+}
+
+function CommentItem({ comment }: CommentItemProps) {
+  const [expanded, setExpanded] = useState(false);
+  const needsExpand = comment.quotedText.length > 100 || comment.description.length > 150;
+
+  return (
+    <div
+      className={`p-3 bg-gray-50 rounded-md ${needsExpand ? "cursor-pointer hover:bg-gray-100" : ""}`}
+      onClick={() => needsExpand && setExpanded(!expanded)}
+    >
+      <div className="text-sm font-medium text-gray-900 flex items-center justify-between">
+        <span>{comment.header || "Comment"}</span>
+        {needsExpand && (
+          <span className="text-xs text-gray-400">{expanded ? "collapse" : "expand"}</span>
+        )}
+      </div>
+      <p className="text-sm text-gray-600 mt-1 whitespace-pre-wrap">
+        {expanded ? comment.quotedText : truncate(comment.quotedText, 100)}
+      </p>
+      <p className="text-xs text-gray-500 mt-1 whitespace-pre-wrap">
+        {expanded ? comment.description : truncate(comment.description, 150)}
+      </p>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/snapshots/pipelineUtils.ts b/apps/web/src/app/monitor/lab/components/snapshots/pipelineUtils.ts
new file mode 100644
index 000000000..61d3390cd
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/snapshots/pipelineUtils.ts
@@ -0,0 +1,109 @@
+/**
+ * Utility functions for Pipeline visualization
+ */
+
+import type { ExtractorInfo } from "../../types";
+
+export function formatDuration(ms: number | undefined): string {
+  if (ms === undefined) return "-";
+  if (ms < 1000) return `${ms}ms`;
+  if (ms < 60000) return `${(ms / 1000).toFixed(1)}s`;
+  return `${(ms / 60000).toFixed(1)}m`;
+}
+
+export function formatCost(usd: number | undefined): string {
+  if (usd === undefined) return "";
+  return `$${usd.toFixed(4)}`;
+}
+
+export function formatTokens(tokens: number | undefined): string {
+  if (tokens === undefined) return "";
+  if (tokens >= 1000) return `${(tokens / 1000).toFixed(1)}k`;
+  return String(tokens);
+}
+
+/** Extract a friendly model name from the full model ID */
+export function getModelDisplayName(model: string): string {
+  // Remove provider prefix (e.g., "google/gemini-2.5-flash" -> "gemini-2.5-flash")
+  const withoutProvider = model.includes("/") ? model.split("/")[1] : model;
+
+  // Shorten common model names
+  const shortcuts: Record<string, string> = {
+    "claude-sonnet-4-5-20250929": "Claude Sonnet 4.5",
+    "claude-3-5-sonnet-20241022": "Claude 3.5 Sonnet",
+    "claude-3-haiku-20240307": "Claude 3 Haiku",
+    "gemini-3-flash-preview": "Gemini 3 Flash",
+    "gemini-2.5-flash": "Gemini 2.5 Flash",
+    "gemini-2.5-pro": "Gemini 2.5 Pro",
+    "gpt-4-turbo": "GPT-4 Turbo",
+    "gpt-4o": "GPT-4o",
+  };
+
+  return shortcuts[withoutProvider] || withoutProvider;
+}
+
+/** Format temperature for display */
+export function formatTemperature(ext: ExtractorInfo): string {
+  // Check actualApiParams first (source of truth)
+  if (ext.actualApiParams?.temperature !== undefined) {
+    return `temp ${ext.actualApiParams.temperature}`;
+  }
+  // Fall back to temperatureConfig
+  if (ext.temperatureConfig === "default") {
+    return "temp default";
+  }
+  if (typeof ext.temperatureConfig === "number") {
+    return `temp ${ext.temperatureConfig}`;
+  }
+  return "";
+}
+
+/** Format reasoning/thinking for display */
+export function formatReasoning(ext: ExtractorInfo): string {
+  // Check actualApiParams for Claude-style thinking
+  if (ext.actualApiParams?.thinking?.type === "enabled") {
+    const budget = ext.actualApiParams.thinking.budget_tokens;
+    return `thinking ${formatTokens(budget)} tokens`;
+  }
+  // Check for OpenRouter-style reasoning with explicit max_tokens (preferred)
+  if (ext.actualApiParams?.reasoning?.max_tokens) {
+    const budget = ext.actualApiParams.reasoning.max_tokens;
+    return `reasoning ${formatTokens(budget)} tokens`;
+  }
+  // Check for OpenRouter-style reasoning effort (fallback)
+  if (ext.actualApiParams?.reasoning?.effort) {
+    return `reasoning: ${ext.actualApiParams.reasoning.effort}`;
+  }
+  if (ext.thinkingEnabled === true) {
+    return "thinking enabled";
+  }
+  if (ext.thinkingEnabled === false) {
+    return "no thinking";
+  }
+  return "";
+}
+
+/** Get a human-readable title for a filter stage */
+export function getFilterStageTitle(stageName: string, index: number): string {
+  const titles: Record<string, string> = {
+    "principle-of-charity-filter": "Principle of Charity",
+    "supported-elsewhere-filter": "Supported-Elsewhere",
+    "severity-filter": "Severity",
+    "confidence-filter": "Confidence",
+    "dedup-filter": "Deduplication",
+  };
+  const base = titles[stageName] || stageName.replace(/-filter$/, "").replace(/-/g, " ");
+  return `${index + 2}. ${base.charAt(0).toUpperCase() + base.slice(1)} Filter`;
+}
+
+/** Get filter stage badge text */
+export function getFilterStageBadgeText(stage: string): string {
+  const labels: Record<string, string> = {
+    "principle-of-charity-filter": "Charity",
+    "supported-elsewhere-filter": "Elsewhere",
+    "severity-filter": "Severity",
+    "confidence-filter": "Confidence",
+    "review": "Review",
+  };
+  return labels[stage] || stage.replace(/-filter$/, "");
+}
diff --git a/apps/web/src/app/monitor/lab/components/tabs/BaselinesTab.tsx b/apps/web/src/app/monitor/lab/components/tabs/BaselinesTab.tsx
new file mode 100644
index 000000000..9d7398647
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/tabs/BaselinesTab.tsx
@@ -0,0 +1,105 @@
+"use client";
+
+import { useEffect, useState } from "react";
+import { useBaselines } from "../../hooks/useBaselines";
+import { BaselineList } from "../baselines/BaselineList";
+import { CreateBaselineModal } from "../baselines/CreateBaselineModal";
+import type { Baseline } from "../../types";
+
+interface BaselinesTabProps {
+  agentId: string;
+  selectedBaseline: Baseline | null;
+  onSelectBaseline: (baseline: Baseline | null) => void;
+}
+
+export function BaselinesTab({ agentId, selectedBaseline, onSelectBaseline }: BaselinesTabProps) {
+  const { baselines, loading, error, refresh, deleteBaseline } = useBaselines(agentId);
+  const [showCreateModal, setShowCreateModal] = useState(false);
+
+  useEffect(() => {
+    void refresh();
+  }, [refresh]);
+
+  const handleBaselineCreated = () => {
+    setShowCreateModal(false);
+    void refresh();
+  };
+
+  const handleDelete = async (id: string) => {
+    if (confirm("Delete this baseline? This cannot be undone.")) {
+      await deleteBaseline(id);
+      if (selectedBaseline?.id === id) {
+        onSelectBaseline(null);
+      }
+    }
+  };
+
+  if (loading) {
+    return <LoadingState message="Loading baselines..." />;
+  }
+
+  if (error) {
+    return <ErrorState message={error} />;
+  }
+
+  return (
+    <div className="space-y-4">
+      <div className="flex items-center justify-between">
+        <h2 className="text-lg font-semibold text-gray-900">Validation Baselines</h2>
+        <button
+          onClick={() => setShowCreateModal(true)}
+          className="px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700 text-sm font-medium"
+        >
+          Create Baseline
+        </button>
+      </div>
+
+      {baselines.length === 0 ? (
+        <EmptyState
+          message="No baselines yet"
+          action="Create a baseline to start validation testing"
+        />
+      ) : (
+        <BaselineList
+          baselines={baselines}
+          selectedId={selectedBaseline?.id ?? null}
+          onSelect={onSelectBaseline}
+          onDelete={(id) => void handleDelete(id)}
+        />
+      )}
+
+      {showCreateModal && (
+        <CreateBaselineModal
+          agentId={agentId}
+          onClose={() => setShowCreateModal(false)}
+          onCreated={handleBaselineCreated}
+        />
+      )}
+    </div>
+  );
+}
+
+function LoadingState({ message }: { message: string }) {
+  return (
+    <div className="flex items-center justify-center h-64">
+      <div className="text-gray-600">{message}</div>
+    </div>
+  );
+}
+
+function ErrorState({ message }: { message: string }) {
+  return (
+    <div className="flex items-center justify-center h-64">
+      <div className="text-red-600">Error: {message}</div>
+    </div>
+  );
+}
+
+function EmptyState({ message, action }: { message: string; action: string }) {
+  return (
+    <div className="bg-white rounded-lg shadow p-8 text-center">
+      <p className="text-gray-600 mb-2">{message}</p>
+      <p className="text-gray-400 text-sm">{action}</p>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/tabs/HistoryTab.tsx b/apps/web/src/app/monitor/lab/components/tabs/HistoryTab.tsx
new file mode 100644
index 000000000..fca5bff6f
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/tabs/HistoryTab.tsx
@@ -0,0 +1,303 @@
+"use client";
+
+import { useEffect, useState } from "react";
+import { useBaselines } from "../../hooks/useBaselines";
+import { useRuns } from "../../hooks/useRuns";
+import { formatDate } from "../../utils/formatters";
+import type { Baseline, ValidationRun, ValidationRunDetail, RunSnapshot } from "../../types";
+import { SnapshotComparison } from "../snapshots/SnapshotComparison";
+import { ChevronRightIcon, TrashIcon } from "@heroicons/react/24/outline";
+
+interface HistoryTabProps {
+  agentId: string;
+  selectedBaseline: Baseline | null;
+  onSelectBaseline: (baseline: Baseline | null) => void;
+}
+
+export function HistoryTab({ agentId, selectedBaseline, onSelectBaseline }: HistoryTabProps) {
+  const { baselines, loading: baselinesLoading, refresh: refreshBaselines } = useBaselines(agentId);
+  const { runs, loading: runsLoading, refresh: refreshRuns, getRunDetail, deleteRun } = useRuns(selectedBaseline?.id ?? null);
+  const [selectedRun, setSelectedRun] = useState<ValidationRunDetail | null>(null);
+  const [selectedSnapshot, setSelectedSnapshot] = useState<RunSnapshot | null>(null);
+  const [loadingDetail, setLoadingDetail] = useState(false);
+
+  useEffect(() => {
+    void refreshBaselines();
+  }, [refreshBaselines]);
+
+  useEffect(() => {
+    if (selectedBaseline) {
+      void refreshRuns();
+      setSelectedRun(null);
+      setSelectedSnapshot(null);
+    }
+  }, [selectedBaseline, refreshRuns]);
+
+  const handleSelectRun = async (run: ValidationRun) => {
+    setLoadingDetail(true);
+    setSelectedSnapshot(null);
+    try {
+      const detail = await getRunDetail(run.id);
+      setSelectedRun(detail);
+    } finally {
+      setLoadingDetail(false);
+    }
+  };
+
+  const handleDeleteRun = async (runId: string) => {
+    if (confirm("Delete this run? This cannot be undone.")) {
+      await deleteRun(runId);
+      if (selectedRun?.id === runId) {
+        setSelectedRun(null);
+        setSelectedSnapshot(null);
+      }
+    }
+  };
+
+  if (baselinesLoading) {
+    return <LoadingState message="Loading baselines..." />;
+  }
+
+  if (baselines.length === 0) {
+    return (
+      <EmptyState
+        message="No baselines available"
+        action="Create a baseline in the Baselines tab first"
+      />
+    );
+  }
+
+  return (
+    <div className="space-y-6">
+      {/* Baseline Selector */}
+      <div className="bg-white rounded-lg shadow p-4">
+        <select
+          value={selectedBaseline?.id ?? ""}
+          onChange={(e) => {
+            const baseline = baselines.find((b) => b.id === e.target.value) ?? null;
+            onSelectBaseline(baseline);
+          }}
+          className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500"
+        >
+          <option value="">Choose a baseline to view history...</option>
+          {baselines.map((b) => (
+            <option key={b.id} value={b.id}>
+              {b.name} ({b.snapshotCount} docs)
+            </option>
+          ))}
+        </select>
+      </div>
+
+      {selectedBaseline && (
+        <div className="grid grid-cols-12 gap-6">
+          {/* Run List */}
+          <div className="col-span-4 bg-white shadow rounded-lg">
+            <div className="px-4 py-3 border-b border-gray-200">
+              <h3 className="text-sm font-medium text-gray-900">Validation Runs</h3>
+            </div>
+            <div className="divide-y divide-gray-200 max-h-[calc(100vh-400px)] overflow-y-auto">
+              {runsLoading ? (
+                <div className="p-4 text-center text-gray-500">Loading...</div>
+              ) : runs.length === 0 ? (
+                <div className="p-4 text-center text-gray-500">No runs yet</div>
+              ) : (
+                runs.map((run) => (
+                  <RunListItem
+                    key={run.id}
+                    run={run}
+                    isSelected={selectedRun?.id === run.id}
+                    onSelect={() => void handleSelectRun(run)}
+                    onDelete={() => void handleDeleteRun(run.id)}
+                  />
+                ))
+              )}
+            </div>
+          </div>
+
+          {/* Run Detail / Snapshot List */}
+          <div className="col-span-8">
+            {loadingDetail ? (
+              <LoadingState message="Loading run details..." />
+            ) : selectedSnapshot ? (
+              <SnapshotComparison
+                snapshot={selectedSnapshot}
+                onBack={() => setSelectedSnapshot(null)}
+              />
+            ) : selectedRun ? (
+              <RunDetail
+                run={selectedRun}
+                onSelectSnapshot={setSelectedSnapshot}
+              />
+            ) : (
+              <div className="bg-white shadow rounded-lg p-6 text-center text-gray-500">
+                Select a run to view details
+              </div>
+            )}
+          </div>
+        </div>
+      )}
+    </div>
+  );
+}
+
+interface RunListItemProps {
+  run: ValidationRun;
+  isSelected: boolean;
+  onSelect: () => void;
+  onDelete: () => void;
+}
+
+function RunListItem({ run, isSelected, onSelect, onDelete }: RunListItemProps) {
+  return (
+    <div
+      onClick={onSelect}
+      className={`p-3 cursor-pointer transition-colors ${
+        isSelected ? "bg-blue-50 border-l-4 border-blue-500" : "hover:bg-gray-50"
+      }`}
+    >
+      <div className="flex items-center justify-between">
+        <div className="flex-1 min-w-0">
+          <div className="flex items-center space-x-2">
+            <span className="font-medium text-sm truncate">
+              {run.name || `Run ${run.id.slice(0, 8)}`}
+            </span>
+            <StatusBadge status={run.status} />
+          </div>
+          <div className="text-xs text-gray-500 mt-1">{formatDate(run.createdAt)}</div>
+          {run.status === "completed" && (
+            <div className="text-xs mt-1">
+              <span className="text-green-600">{run.unchangedCount} unchanged</span>
+              {run.changedCount > 0 && (
+                <span className="text-orange-600 ml-2">{run.changedCount} changed</span>
+              )}
+            </div>
+          )}
+        </div>
+        <button
+          onClick={(e) => {
+            e.stopPropagation();
+            onDelete();
+          }}
+          className="p-1 text-gray-400 hover:text-red-600"
+        >
+          <TrashIcon className="h-4 w-4" />
+        </button>
+      </div>
+    </div>
+  );
+}
+
+interface RunDetailProps {
+  run: ValidationRunDetail;
+  onSelectSnapshot: (snapshot: RunSnapshot) => void;
+}
+
+function RunDetail({ run, onSelectSnapshot }: RunDetailProps) {
+  const unchangedSnapshots = run.snapshots.filter((s) => s.status === "unchanged");
+  const changedSnapshots = run.snapshots.filter((s) => s.status === "changed");
+
+  return (
+    <div className="bg-white shadow rounded-lg">
+      <div className="px-6 py-4 border-b border-gray-200">
+        <h3 className="text-lg font-medium text-gray-900">
+          {run.name || `Run ${run.id.slice(0, 8)}`}
+        </h3>
+        <p className="text-sm text-gray-500 mt-1">
+          {formatDate(run.createdAt)} | Baseline: {run.baseline.name}
+        </p>
+        {run.summary && <p className="text-sm text-gray-600 mt-2">{run.summary}</p>}
+      </div>
+
+      <div className="divide-y divide-gray-200 max-h-[calc(100vh-450px)] overflow-y-auto">
+        {/* Changed Snapshots First */}
+        {changedSnapshots.length > 0 && (
+          <div className="p-4">
+            <h4 className="text-sm font-medium text-orange-600 mb-2">
+              Changed ({changedSnapshots.length})
+            </h4>
+            <div className="space-y-2">
+              {changedSnapshots.map((snapshot) => (
+                <SnapshotListItem
+                  key={snapshot.id}
+                  snapshot={snapshot}
+                  onSelect={() => onSelectSnapshot(snapshot)}
+                />
+              ))}
+            </div>
+          </div>
+        )}
+
+        {/* Unchanged Snapshots */}
+        {unchangedSnapshots.length > 0 && (
+          <div className="p-4">
+            <h4 className="text-sm font-medium text-green-600 mb-2">
+              Unchanged ({unchangedSnapshots.length})
+            </h4>
+            <div className="space-y-2">
+              {unchangedSnapshots.map((snapshot) => (
+                <SnapshotListItem
+                  key={snapshot.id}
+                  snapshot={snapshot}
+                  onSelect={() => onSelectSnapshot(snapshot)}
+                />
+              ))}
+            </div>
+          </div>
+        )}
+      </div>
+    </div>
+  );
+}
+
+function SnapshotListItem({
+  snapshot,
+  onSelect,
+}: {
+  snapshot: RunSnapshot;
+  onSelect: () => void;
+}) {
+  return (
+    <div
+      onClick={onSelect}
+      className="flex items-center justify-between p-3 bg-gray-50 rounded-md hover:bg-gray-100 cursor-pointer"
+    >
+      <div className="flex-1 min-w-0">
+        <p className="text-sm font-medium text-gray-900 truncate">{snapshot.documentTitle}</p>
+        <p className="text-xs text-gray-500">
+          {snapshot.keptCount} matched | {snapshot.newCount} new | {snapshot.lostCount} gone
+        </p>
+      </div>
+      <ChevronRightIcon className="h-4 w-4 text-gray-400" />
+    </div>
+  );
+}
+
+function StatusBadge({ status }: { status: string }) {
+  const colors = {
+    running: "bg-yellow-100 text-yellow-800",
+    completed: "bg-green-100 text-green-800",
+    failed: "bg-red-100 text-red-800",
+  };
+  return (
+    <span className={`px-2 py-0.5 text-xs rounded-full ${colors[status as keyof typeof colors] || "bg-gray-100 text-gray-800"}`}>
+      {status}
+    </span>
+  );
+}
+
+function LoadingState({ message }: { message: string }) {
+  return (
+    <div className="bg-white shadow rounded-lg p-6 text-center">
+      <div className="text-gray-600">{message}</div>
+    </div>
+  );
+}
+
+function EmptyState({ message, action }: { message: string; action: string }) {
+  return (
+    <div className="bg-white rounded-lg shadow p-8 text-center">
+      <p className="text-gray-600 mb-2">{message}</p>
+      <p className="text-gray-400 text-sm">{action}</p>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/components/tabs/RunTab.tsx b/apps/web/src/app/monitor/lab/components/tabs/RunTab.tsx
new file mode 100644
index 000000000..c6870ba7f
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/components/tabs/RunTab.tsx
@@ -0,0 +1,312 @@
+"use client";
+
+import { useEffect, useState, useCallback } from "react";
+import { useBaselines } from "../../hooks/useBaselines";
+import { formatDate } from "../../utils/formatters";
+import type { Baseline } from "../../types";
+import { PlayIcon, ArrowPathIcon, CheckCircleIcon, XCircleIcon } from "@heroicons/react/24/outline";
+
+interface RunTabProps {
+  agentId: string;
+  selectedBaseline: Baseline | null;
+  onSelectBaseline: (baseline: Baseline | null) => void;
+}
+
+interface RunProgress {
+  phase: "idle" | "starting" | "running" | "comparing" | "done" | "error";
+  message: string;
+  completed: number;
+  total: number;
+  runId?: string;
+  error?: string;
+}
+
+export function RunTab({ agentId, selectedBaseline, onSelectBaseline }: RunTabProps) {
+  const { baselines, loading: baselinesLoading, refresh: refreshBaselines } = useBaselines(agentId);
+  const [runName, setRunName] = useState("");
+  const [progress, setProgress] = useState<RunProgress>({
+    phase: "idle",
+    message: "",
+    completed: 0,
+    total: 0,
+  });
+
+  useEffect(() => {
+    void refreshBaselines();
+  }, [refreshBaselines]);
+
+  const pollJobStatus = useCallback(async (jobIds: string[]): Promise<boolean> => {
+    const res = await fetch(`/api/monitor/lab/jobs/status?jobIds=${jobIds.join(",")}`);
+    if (!res.ok) throw new Error("Failed to check job status");
+
+    const data = await res.json();
+    setProgress((p) => ({
+      ...p,
+      completed: data.summary.completed + data.summary.failed,
+      total: data.summary.total,
+      message: `${data.summary.completed} completed, ${data.summary.running} running, ${data.summary.pending} pending`,
+    }));
+
+    return data.summary.allDone;
+  }, []);
+
+  const startRun = async () => {
+    if (!selectedBaseline) return;
+
+    setProgress({
+      phase: "starting",
+      message: "Creating validation run...",
+      completed: 0,
+      total: 0,
+    });
+
+    try {
+      // Start the run
+      const startRes = await fetch("/api/monitor/lab/runs/start", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          baselineId: selectedBaseline.id,
+          name: runName || undefined,
+        }),
+      });
+
+      if (!startRes.ok) {
+        const err = await startRes.json();
+        throw new Error(err.error || "Failed to start run");
+      }
+
+      const startData = await startRes.json();
+      const runId = startData.run.id;
+      const { jobIds } = startData;
+
+      setProgress({
+        phase: "running",
+        message: `Evaluating ${jobIds.length} documents...`,
+        completed: 0,
+        total: jobIds.length,
+        runId,
+      });
+
+      // Poll for job completion
+      const maxWaitMs = 10 * 60 * 1000; // 10 minutes
+      const pollIntervalMs = 3000; // 3 seconds
+      const startTime = Date.now();
+
+      while (Date.now() - startTime < maxWaitMs) {
+        const allDone = await pollJobStatus(jobIds);
+        if (allDone) break;
+        await new Promise((r) => setTimeout(r, pollIntervalMs));
+      }
+
+      // Finalize the run (compare results)
+      setProgress((p) => ({
+        ...p,
+        phase: "comparing",
+        message: "Comparing results with baseline...",
+      }));
+
+      const finalizeRes = await fetch(`/api/monitor/lab/runs/${runId}/finalize`, {
+        method: "POST",
+      });
+
+      if (!finalizeRes.ok) {
+        const err = await finalizeRes.json();
+        throw new Error(err.error || "Failed to finalize run");
+      }
+
+      const finalizeData = await finalizeRes.json();
+
+      setProgress({
+        phase: "done",
+        message: finalizeData.summary,
+        completed: finalizeData.unchangedCount + finalizeData.changedCount,
+        total: finalizeData.unchangedCount + finalizeData.changedCount,
+        runId,
+      });
+
+      setRunName("");
+    } catch (error) {
+      setProgress((p) => ({
+        ...p,
+        phase: "error",
+        message: error instanceof Error ? error.message : "Unknown error",
+        error: error instanceof Error ? error.message : "Unknown error",
+      }));
+    }
+  };
+
+  if (baselinesLoading) {
+    return <LoadingState message="Loading baselines..." />;
+  }
+
+  if (baselines.length === 0) {
+    return (
+      <EmptyState
+        message="No baselines available"
+        action="Create a baseline in the Baselines tab first"
+      />
+    );
+  }
+
+  const isRunning = progress.phase === "starting" || progress.phase === "running" || progress.phase === "comparing";
+
+  return (
+    <div className="space-y-6">
+      {/* Baseline Selector */}
+      <div className="bg-white rounded-lg shadow p-6">
+        <h3 className="text-lg font-medium text-gray-900 mb-4">Select Baseline</h3>
+        <select
+          value={selectedBaseline?.id ?? ""}
+          onChange={(e) => {
+            const baseline = baselines.find((b) => b.id === e.target.value) ?? null;
+            onSelectBaseline(baseline);
+          }}
+          disabled={isRunning}
+          className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-gray-100"
+        >
+          <option value="">Choose a baseline...</option>
+          {baselines.map((b) => (
+            <option key={b.id} value={b.id}>
+              {b.name} ({b.snapshotCount} docs) - {formatDate(b.createdAt)}
+            </option>
+          ))}
+        </select>
+      </div>
+
+      {/* Run Configuration */}
+      {selectedBaseline && (
+        <div className="bg-white rounded-lg shadow p-6">
+          <h3 className="text-lg font-medium text-gray-900 mb-4">Run Validation</h3>
+          <div className="space-y-4">
+            <div>
+              <label className="block text-sm font-medium text-gray-700 mb-1">
+                Run Name (optional)
+              </label>
+              <input
+                type="text"
+                value={runName}
+                onChange={(e) => setRunName(e.target.value)}
+                placeholder="e.g., After filter changes"
+                disabled={isRunning}
+                className="w-full px-3 py-2 border border-gray-300 rounded-md focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:bg-gray-100"
+              />
+            </div>
+            <div className="flex items-center space-x-4">
+              <button
+                onClick={() => void startRun()}
+                disabled={isRunning}
+                className="flex items-center space-x-2 px-4 py-2 bg-green-600 text-white rounded-md hover:bg-green-700 disabled:opacity-50 disabled:cursor-not-allowed"
+              >
+                {isRunning ? (
+                  <ArrowPathIcon className="h-4 w-4 animate-spin" />
+                ) : (
+                  <PlayIcon className="h-4 w-4" />
+                )}
+                <span>{isRunning ? "Running..." : "Start Run"}</span>
+              </button>
+              <span className="text-sm text-gray-500">
+                Will re-evaluate {selectedBaseline.snapshotCount} documents
+              </span>
+            </div>
+          </div>
+        </div>
+      )}
+
+      {/* Progress */}
+      {progress.phase !== "idle" && (
+        <ProgressPanel progress={progress} />
+      )}
+    </div>
+  );
+}
+
+function ProgressPanel({ progress }: { progress: RunProgress }) {
+  const getStatusColor = () => {
+    switch (progress.phase) {
+      case "done":
+        return "bg-green-50 border-green-200";
+      case "error":
+        return "bg-red-50 border-red-200";
+      case "idle":
+      case "starting":
+      case "running":
+      case "comparing":
+        return "bg-blue-50 border-blue-200";
+    }
+  };
+
+  const getIcon = () => {
+    switch (progress.phase) {
+      case "done":
+        return <CheckCircleIcon className="h-5 w-5 text-green-600" />;
+      case "error":
+        return <XCircleIcon className="h-5 w-5 text-red-600" />;
+      case "idle":
+      case "starting":
+      case "running":
+      case "comparing":
+        return <ArrowPathIcon className="h-5 w-5 text-blue-600 animate-spin" />;
+    }
+  };
+
+  const progressPercent = progress.total > 0 ? Math.round((progress.completed / progress.total) * 100) : 0;
+
+  return (
+    <div className={`rounded-lg border p-4 ${getStatusColor()}`}>
+      <div className="flex items-start space-x-3">
+        {getIcon()}
+        <div className="flex-1">
+          <h4 className="font-medium text-gray-900">
+            {progress.phase === "starting" && "Starting..."}
+            {progress.phase === "running" && "Running Evaluations"}
+            {progress.phase === "comparing" && "Comparing Results"}
+            {progress.phase === "done" && "Completed"}
+            {progress.phase === "error" && "Error"}
+          </h4>
+          <p className="text-sm text-gray-600 mt-1">{progress.message}</p>
+
+          {/* Progress bar */}
+          {(progress.phase === "running" || progress.phase === "comparing") && progress.total > 0 && (
+            <div className="mt-3">
+              <div className="flex justify-between text-xs text-gray-500 mb-1">
+                <span>{progress.completed} / {progress.total}</span>
+                <span>{progressPercent}%</span>
+              </div>
+              <div className="h-2 bg-gray-200 rounded-full overflow-hidden">
+                <div
+                  className="h-full bg-blue-600 transition-all duration-300"
+                  style={{ width: `${progressPercent}%` }}
+                />
+              </div>
+            </div>
+          )}
+
+          {/* Link to history */}
+          {progress.phase === "done" && progress.runId && (
+            <p className="text-sm text-blue-600 mt-2">
+              View results in the History tab
+            </p>
+          )}
+        </div>
+      </div>
+    </div>
+  );
+}
+
+function LoadingState({ message }: { message: string }) {
+  return (
+    <div className="flex items-center justify-center h-64">
+      <div className="text-gray-600">{message}</div>
+    </div>
+  );
+}
+
+function EmptyState({ message, action }: { message: string; action: string }) {
+  return (
+    <div className="bg-white rounded-lg shadow p-8 text-center">
+      <p className="text-gray-600 mb-2">{message}</p>
+      <p className="text-gray-400 text-sm">{action}</p>
+    </div>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useAllEvaluations.ts b/apps/web/src/app/monitor/lab/hooks/useAllEvaluations.ts
new file mode 100644
index 000000000..9aa4610b9
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useAllEvaluations.ts
@@ -0,0 +1,68 @@
+import { useState, useCallback, useEffect } from "react";
+import type { Comment, StageMetrics, ExtractionPhase, FilteredItem, PassedItem, PipelineCounts } from "../types";
+
+export interface EvaluationWithTelemetry {
+  id: string;
+  evaluationId: string;
+  version: number;
+  grade: number | null;
+  summary: string | null;
+  createdAt: string;
+  documentId: string;
+  documentTitle: string;
+  agentId: string;
+  agentName: string;
+  comments: Comment[];
+  telemetry: {
+    stages?: StageMetrics[];
+    extractionPhase?: ExtractionPhase;
+    filteredItems?: FilteredItem[];
+    passedItems?: PassedItem[];
+    pipelineCounts?: PipelineCounts;
+    totalDurationMs?: number;
+    totalCostUsd?: number;
+    documentLength?: number;
+    profileInfo?: {
+      profileId: string | null;
+      profileName: string | null;
+    };
+  } | null;
+}
+
+interface UseAllEvaluationsReturn {
+  evaluations: EvaluationWithTelemetry[];
+  loading: boolean;
+  error: string | null;
+  refresh: () => Promise<void>;
+}
+
+export function useAllEvaluations(agentId?: string): UseAllEvaluationsReturn {
+  const [evaluations, setEvaluations] = useState<EvaluationWithTelemetry[]>([]);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  const refresh = useCallback(async () => {
+    setLoading(true);
+    setError(null);
+    try {
+      const params = new URLSearchParams();
+      params.set("limit", "50");
+      if (agentId) params.set("agentId", agentId);
+
+      const res = await fetch(`/api/monitor/lab/evaluations?${params.toString()}`);
+      if (!res.ok) throw new Error("Failed to fetch evaluations");
+      const data = await res.json();
+      setEvaluations(data.evaluations);
+    } catch (err) {
+      setError(err instanceof Error ? err.message : "Unknown error");
+    } finally {
+      setLoading(false);
+    }
+  }, [agentId]);
+
+  useEffect(() => {
+    void refresh();
+  }, [refresh]);
+
+  return { evaluations, loading, error, refresh };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useBaselines.ts b/apps/web/src/app/monitor/lab/hooks/useBaselines.ts
new file mode 100644
index 000000000..603e7251c
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useBaselines.ts
@@ -0,0 +1,63 @@
+import { useState, useCallback, useEffect } from "react";
+import type { Baseline } from "../types";
+
+interface UseBaselinesReturn {
+  baselines: Baseline[];
+  loading: boolean;
+  error: string | null;
+  refresh: () => Promise<void>;
+  createBaseline: (name: string, description: string, evaluationVersionIds: string[]) => Promise<Baseline>;
+  deleteBaseline: (id: string) => Promise<void>;
+}
+
+export function useBaselines(agentId: string): UseBaselinesReturn {
+  const [baselines, setBaselines] = useState<Baseline[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  const refresh = useCallback(async () => {
+    if (!agentId) return;
+    setLoading(true);
+    setError(null);
+    try {
+      const res = await fetch(`/api/monitor/lab/baselines?agentId=${agentId}`);
+      if (!res.ok) throw new Error("Failed to fetch baselines");
+      const data = await res.json();
+      setBaselines(data.baselines);
+    } catch (err) {
+      setError(err instanceof Error ? err.message : "Unknown error");
+    } finally {
+      setLoading(false);
+    }
+  }, [agentId]);
+
+  useEffect(() => {
+    void refresh();
+  }, [refresh]);
+
+  const createBaseline = useCallback(
+    async (name: string, description: string, evaluationVersionIds: string[]): Promise<Baseline> => {
+      const res = await fetch("/api/monitor/lab/baselines", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ name, description, agentId, evaluationVersionIds }),
+      });
+      if (!res.ok) throw new Error("Failed to create baseline");
+      const data = await res.json();
+      await refresh();
+      return data.baseline;
+    },
+    [agentId, refresh]
+  );
+
+  const deleteBaseline = useCallback(
+    async (id: string) => {
+      const res = await fetch(`/api/monitor/lab/baselines/${id}`, { method: "DELETE" });
+      if (!res.ok) throw new Error("Failed to delete baseline");
+      await refresh();
+    },
+    [refresh]
+  );
+
+  return { baselines, loading, error, refresh, createBaseline, deleteBaseline };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useCorpusDocs.ts b/apps/web/src/app/monitor/lab/hooks/useCorpusDocs.ts
new file mode 100644
index 000000000..8fe9f0d63
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useCorpusDocs.ts
@@ -0,0 +1,40 @@
+import { useState, useCallback } from "react";
+import type { CorpusDocument } from "../types";
+
+interface UseCorpusDocsReturn {
+  documents: CorpusDocument[];
+  loading: boolean;
+  error: string | null;
+  refresh: (filter?: string) => Promise<void>;
+}
+
+export function useCorpusDocs(agentId: string): UseCorpusDocsReturn {
+  const [documents, setDocuments] = useState<CorpusDocument[]>([]);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  const refresh = useCallback(
+    async (filter?: string) => {
+      if (!agentId) return;
+      setLoading(true);
+      setError(null);
+      try {
+        const url = new URL("/api/monitor/lab/corpus", window.location.origin);
+        url.searchParams.set("agentId", agentId);
+        if (filter) url.searchParams.set("filter", filter);
+
+        const res = await fetch(url.toString());
+        if (!res.ok) throw new Error("Failed to fetch corpus documents");
+        const data = await res.json();
+        setDocuments(data.documents);
+      } catch (err) {
+        setError(err instanceof Error ? err.message : "Unknown error");
+      } finally {
+        setLoading(false);
+      }
+    },
+    [agentId]
+  );
+
+  return { documents, loading, error, refresh };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useDefaultPrompts.ts b/apps/web/src/app/monitor/lab/hooks/useDefaultPrompts.ts
new file mode 100644
index 000000000..92b5748e2
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useDefaultPrompts.ts
@@ -0,0 +1,37 @@
+"use client";
+
+import { useState, useEffect } from "react";
+
+interface DefaultPrompts {
+  extractorSystemPrompt: string;
+  extractorUserPrompt: string;
+  judgeSystemPrompt: string;
+  filterSystemPrompt: string;
+}
+
+export function useDefaultPrompts() {
+  const [prompts, setPrompts] = useState<DefaultPrompts | null>(null);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  useEffect(() => {
+    async function fetchPrompts() {
+      try {
+        const response = await fetch("/api/monitor/lab/prompts");
+        if (!response.ok) {
+          throw new Error("Failed to fetch default prompts");
+        }
+        const data = await response.json();
+        setPrompts(data);
+      } catch (err) {
+        setError(err instanceof Error ? err.message : "Unknown error");
+      } finally {
+        setLoading(false);
+      }
+    }
+
+    void fetchPrompts();
+  }, []);
+
+  return { prompts, loading, error };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useModelEndpoints.ts b/apps/web/src/app/monitor/lab/hooks/useModelEndpoints.ts
new file mode 100644
index 000000000..e87d78a47
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useModelEndpoints.ts
@@ -0,0 +1,107 @@
+"use client";
+
+import { useState, useEffect } from "react";
+
+export interface ModelEndpoint {
+  name: string;
+  providerName: string;
+  tag: string; // Provider slug like "google-vertex", "together", etc.
+  contextLength: number;
+  maxCompletionTokens: number | null;
+  status: number;
+  uptimeLast30m: number;
+}
+
+interface UseModelEndpointsReturn {
+  endpoints: ModelEndpoint[];
+  loading: boolean;
+  error: string | null;
+}
+
+// Cache endpoints to avoid repeated fetches
+const endpointsCache = new Map<string, { endpoints: ModelEndpoint[]; timestamp: number }>();
+const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
+
+/**
+ * Fetch available endpoints/providers for a specific OpenRouter model.
+ * Only works for OpenRouter models (those with "/" in the ID).
+ */
+export function useModelEndpoints(modelId: string | null): UseModelEndpointsReturn {
+  const [endpoints, setEndpoints] = useState<ModelEndpoint[]>([]);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  useEffect(() => {
+    // Only fetch for OpenRouter models
+    if (!modelId || !modelId.includes("/")) {
+      setEndpoints([]);
+      setLoading(false);
+      return;
+    }
+
+    // Check cache
+    const cached = endpointsCache.get(modelId);
+    if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
+      setEndpoints(cached.endpoints);
+      setLoading(false);
+      return;
+    }
+
+    const fetchEndpoints = async () => {
+      setLoading(true);
+      setError(null);
+
+      try {
+        // Use our proxy endpoint to avoid CORS issues
+        const response = await fetch(
+          `/api/monitor/lab/model-endpoints?model=${encodeURIComponent(modelId)}`
+        );
+
+        if (!response.ok) {
+          throw new Error(`Failed to fetch endpoints: ${response.status}`);
+        }
+
+        const data = await response.json();
+        const rawEndpoints = data.data?.endpoints || [];
+
+        // Parse and deduplicate by tag (provider slug)
+        const seenTags = new Set<string>();
+        const parsed: ModelEndpoint[] = [];
+
+        for (const ep of rawEndpoints) {
+          // Extract base tag (remove region suffixes like "/global")
+          const baseTag = ep.tag?.split("/")[0];
+          if (!baseTag || seenTags.has(baseTag)) continue;
+          seenTags.add(baseTag);
+
+          parsed.push({
+            name: ep.name,
+            providerName: ep.provider_name,
+            tag: baseTag,
+            contextLength: ep.context_length || 0,
+            maxCompletionTokens: ep.max_completion_tokens,
+            status: ep.status,
+            uptimeLast30m: ep.uptime_last_30m || 0,
+          });
+        }
+
+        // Sort by uptime (best first)
+        parsed.sort((a, b) => b.uptimeLast30m - a.uptimeLast30m);
+
+        // Cache the result
+        endpointsCache.set(modelId, { endpoints: parsed, timestamp: Date.now() });
+
+        setEndpoints(parsed);
+      } catch (e) {
+        setError(e instanceof Error ? e.message : "Failed to fetch endpoints");
+        setEndpoints([]);
+      } finally {
+        setLoading(false);
+      }
+    };
+
+    void fetchEndpoints();
+  }, [modelId]);
+
+  return { endpoints, loading, error };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useModels.ts b/apps/web/src/app/monitor/lab/hooks/useModels.ts
new file mode 100644
index 000000000..e1b481097
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useModels.ts
@@ -0,0 +1,86 @@
+"use client";
+
+import { useState, useEffect, useCallback } from "react";
+
+export interface ModelInfo {
+  id: string;
+  name: string;
+  provider: "anthropic" | "openrouter";
+  contextLength?: number;
+  description?: string;
+  supportsTemperature?: boolean;
+  defaultTemperature?: number;
+  maxTemperature?: number;
+  supportsReasoning?: boolean;
+}
+
+interface UseModelsReturn {
+  models: ModelInfo[];
+  loading: boolean;
+  error: string | null;
+  refresh: () => Promise<void>;
+  filterModels: (query: string) => ModelInfo[];
+  groupByProvider: () => Map<string, ModelInfo[]>;
+}
+
+export function useModels(): UseModelsReturn {
+  const [models, setModels] = useState<ModelInfo[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  const fetchModels = useCallback(async () => {
+    setLoading(true);
+    setError(null);
+    try {
+      const response = await fetch("/api/monitor/lab/models");
+      if (!response.ok) {
+        throw new Error(`Failed to fetch models: ${response.status}`);
+      }
+      const data = await response.json();
+      setModels(data.models);
+    } catch (e) {
+      setError(e instanceof Error ? e.message : "Failed to fetch models");
+    } finally {
+      setLoading(false);
+    }
+  }, []);
+
+  useEffect(() => {
+    void fetchModels();
+  }, [fetchModels]);
+
+  const filterModels = useCallback(
+    (query: string): ModelInfo[] => {
+      if (!query.trim()) {
+        return models;
+      }
+      const lowerQuery = query.toLowerCase();
+      return models.filter(
+        (m) =>
+          m.id.toLowerCase().includes(lowerQuery) ||
+          m.name.toLowerCase().includes(lowerQuery) ||
+          m.provider.toLowerCase().includes(lowerQuery)
+      );
+    },
+    [models]
+  );
+
+  const groupByProvider = useCallback((): Map<string, ModelInfo[]> => {
+    const grouped = new Map<string, ModelInfo[]>();
+    for (const model of models) {
+      const existing = grouped.get(model.provider) || [];
+      existing.push(model);
+      grouped.set(model.provider, existing);
+    }
+    return grouped;
+  }, [models]);
+
+  return {
+    models,
+    loading,
+    error,
+    refresh: fetchModels,
+    filterModels,
+    groupByProvider,
+  };
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useProfiles.ts b/apps/web/src/app/monitor/lab/hooks/useProfiles.ts
new file mode 100644
index 000000000..9a8b5c169
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useProfiles.ts
@@ -0,0 +1,110 @@
+import { useState, useCallback, useEffect } from "react";
+import type { Profile, ProfileConfig } from "../types";
+
+interface UseProfilesReturn {
+  profiles: Profile[];
+  loading: boolean;
+  error: string | null;
+  refresh: () => Promise<void>;
+  createProfile: (name: string, description: string, config?: Partial<ProfileConfig>) => Promise<Profile>;
+  updateProfile: (id: string, updates: Partial<Profile>) => Promise<Profile>;
+  deleteProfile: (id: string) => Promise<void>;
+  setDefault: (id: string) => Promise<void>;
+}
+
+export function useProfiles(agentId: string): UseProfilesReturn {
+  const [profiles, setProfiles] = useState<Profile[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  const refresh = useCallback(async () => {
+    if (!agentId) return;
+    setLoading(true);
+    setError(null);
+    try {
+      const res = await fetch(`/api/monitor/lab/profiles?agentId=${agentId}`);
+      if (!res.ok) throw new Error("Failed to fetch profiles");
+      const data = await res.json();
+      setProfiles(data.profiles);
+    } catch (err) {
+      setError(err instanceof Error ? err.message : "Unknown error");
+    } finally {
+      setLoading(false);
+    }
+  }, [agentId]);
+
+  useEffect(() => {
+    void refresh();
+  }, [refresh]);
+
+  const createProfile = useCallback(
+    async (name: string, description: string, config?: Partial<ProfileConfig>): Promise<Profile> => {
+      const res = await fetch("/api/monitor/lab/profiles", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ name, description, agentId, config }),
+      });
+      if (!res.ok) {
+        const data = await res.json();
+        throw new Error(data.error || "Failed to create profile");
+      }
+      const data = await res.json();
+      await refresh();
+      return data.profile;
+    },
+    [agentId, refresh]
+  );
+
+  const updateProfile = useCallback(
+    async (id: string, updates: Partial<Profile>): Promise<Profile> => {
+      const res = await fetch(`/api/monitor/lab/profiles/${id}`, {
+        method: "PUT",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify(updates),
+      });
+      if (!res.ok) {
+        const data = await res.json();
+        throw new Error(data.error || "Failed to update profile");
+      }
+      const data = await res.json();
+      await refresh();
+      return data.profile;
+    },
+    [refresh]
+  );
+
+  const deleteProfile = useCallback(
+    async (id: string) => {
+      const res = await fetch(`/api/monitor/lab/profiles/${id}`, { method: "DELETE" });
+      if (!res.ok) throw new Error("Failed to delete profile");
+      await refresh();
+    },
+    [refresh]
+  );
+
+  const setDefault = useCallback(
+    async (id: string) => {
+      await updateProfile(id, { isDefault: true } as Partial<Profile>);
+    },
+    [updateProfile]
+  );
+
+  return { profiles, loading, error, refresh, createProfile, updateProfile, deleteProfile, setDefault };
+}
+
+/**
+ * Get the default profile or the first one available
+ */
+export function getActiveProfile(profiles: Profile[], selectedId: string | null): Profile | null {
+  if (selectedId) {
+    const selected = profiles.find(p => p.id === selectedId);
+    if (selected) return selected;
+  }
+
+  // Find default profile
+  const defaultProfile = profiles.find(p => p.isDefault);
+  if (defaultProfile) return defaultProfile;
+
+  // Fall back to first profile
+  return profiles[0] ?? null;
+}
diff --git a/apps/web/src/app/monitor/lab/hooks/useRuns.ts b/apps/web/src/app/monitor/lab/hooks/useRuns.ts
new file mode 100644
index 000000000..808b70e21
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/hooks/useRuns.ts
@@ -0,0 +1,73 @@
+import { useState, useCallback, useEffect } from "react";
+import type { ValidationRun, ValidationRunDetail } from "../types";
+
+interface UseRunsReturn {
+  runs: ValidationRun[];
+  loading: boolean;
+  error: string | null;
+  refresh: () => Promise<void>;
+  startRun: (baselineId: string, name?: string) => Promise<ValidationRun>;
+  getRunDetail: (runId: string) => Promise<ValidationRunDetail | null>;
+  deleteRun: (runId: string) => Promise<void>;
+}
+
+export function useRuns(baselineId: string | null): UseRunsReturn {
+  const [runs, setRuns] = useState<ValidationRun[]>([]);
+  const [loading, setLoading] = useState(false);
+  const [error, setError] = useState<string | null>(null);
+
+  const refresh = useCallback(async () => {
+    if (!baselineId) {
+      setRuns([]);
+      return;
+    }
+    setLoading(true);
+    setError(null);
+    try {
+      const res = await fetch(`/api/monitor/lab/runs?baselineId=${baselineId}`);
+      if (!res.ok) throw new Error("Failed to fetch runs");
+      const data = await res.json();
+      setRuns(data.runs);
+    } catch (err) {
+      setError(err instanceof Error ? err.message : "Unknown error");
+    } finally {
+      setLoading(false);
+    }
+  }, [baselineId]);
+
+  useEffect(() => {
+    void refresh();
+  }, [refresh]);
+
+  const startRun = useCallback(
+    async (baselineId: string, name?: string): Promise<ValidationRun> => {
+      const res = await fetch("/api/monitor/lab/runs", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ baselineId, name }),
+      });
+      if (!res.ok) throw new Error("Failed to start run");
+      const data = await res.json();
+      return data.run;
+    },
+    []
+  );
+
+  const getRunDetail = useCallback(async (runId: string): Promise<ValidationRunDetail | null> => {
+    const res = await fetch(`/api/monitor/lab/runs/${runId}`);
+    if (!res.ok) return null;
+    const data = await res.json();
+    return data.run;
+  }, []);
+
+  const deleteRun = useCallback(
+    async (runId: string) => {
+      const res = await fetch(`/api/monitor/lab/runs/${runId}`, { method: "DELETE" });
+      if (!res.ok) throw new Error("Failed to delete run");
+      await refresh();
+    },
+    [refresh]
+  );
+
+  return { runs, loading, error, refresh, startRun, getRunDetail, deleteRun };
+}
diff --git a/apps/web/src/app/monitor/lab/page.tsx b/apps/web/src/app/monitor/lab/page.tsx
new file mode 100644
index 000000000..3a020e0fc
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/page.tsx
@@ -0,0 +1,534 @@
+"use client";
+
+import { useState, useCallback } from "react";
+import { useBaselines } from "./hooks/useBaselines";
+import { useRuns } from "./hooks/useRuns";
+import { useProfiles, getActiveProfile } from "./hooks/useProfiles";
+import { useAllEvaluations } from "./hooks/useAllEvaluations";
+import type { Baseline, Profile, ProfileConfig } from "./types";
+import { formatDate } from "./utils/formatters";
+import { PlusIcon, PlayIcon, ArrowPathIcon, TrashIcon, BeakerIcon, CpuChipIcon, DocumentMagnifyingGlassIcon } from "@heroicons/react/24/outline";
+import { CreateBaselineModal } from "./components/baselines/CreateBaselineModal";
+import { RunDetail } from "./components/history/RunDetail";
+import { ProfilesList } from "./components/profiles/ProfilesList";
+import { ProfileDetailView } from "./components/profiles/ProfileDetailView";
+import { AllEvaluationsList } from "./components/evaluations/AllEvaluationsList";
+
+type SidebarTab = "baselines" | "profiles" | "evaluations";
+
+interface RunProgress {
+  phase: "idle" | "starting" | "running" | "comparing" | "done" | "error";
+  message: string;
+  completed: number;
+  total: number;
+}
+
+const AGENT_ID = "system-fallacy-check";
+
+function getDefaultRunName(): string {
+  const now = new Date();
+  return `Run ${now.toLocaleString("en-US", { month: "short", day: "numeric", hour: "numeric", minute: "2-digit" })}`;
+}
+
+export default function LabPage() {
+  const { baselines, loading: baselinesLoading, refresh: refreshBaselines, deleteBaseline } = useBaselines(AGENT_ID);
+  const { profiles, loading: profilesLoading, deleteProfile, setDefault: setDefaultProfile, updateProfile, createProfile } = useProfiles(AGENT_ID);
+  const { evaluations, loading: evaluationsLoading, error: evaluationsError, refresh: refreshEvaluations } = useAllEvaluations(AGENT_ID);
+
+  // Sidebar tab state
+  const [sidebarTab, setSidebarTab] = useState<SidebarTab>("baselines");
+
+  // Baselines state
+  const [selectedBaseline, setSelectedBaseline] = useState<Baseline | null>(null);
+  const [selectedProfileId, setSelectedProfileId] = useState<string | null>(null);
+  const [showCreateModal, setShowCreateModal] = useState(false);
+  const [expandedRun, setExpandedRun] = useState<string | null>(null);
+
+  // Profiles tab state
+  const [selectedProfileForEdit, setSelectedProfileForEdit] = useState<Profile | null>(null);
+
+  // Get active profile
+  const activeProfile = getActiveProfile(profiles, selectedProfileId);
+
+  // Run state
+  const [runName, setRunName] = useState(getDefaultRunName);
+  const [runProgress, setRunProgress] = useState<RunProgress>({
+    phase: "idle",
+    message: "",
+    completed: 0,
+    total: 0,
+  });
+
+  // Get runs for selected baseline
+  const { runs, loading: runsLoading, refresh: refreshRuns } = useRuns(selectedBaseline?.id ?? null);
+
+  const pollJobStatus = useCallback(async (jobIds: string[]): Promise<boolean> => {
+    const res = await fetch(`/api/monitor/lab/jobs/status?jobIds=${jobIds.join(",")}`);
+    if (!res.ok) throw new Error("Failed to check job status");
+    const data = await res.json();
+    setRunProgress((p) => ({
+      ...p,
+      completed: data.summary.completed + data.summary.failed,
+      total: data.summary.total,
+      message: `${data.summary.completed} completed, ${data.summary.running} running, ${data.summary.pending} pending`,
+    }));
+    return data.summary.allDone;
+  }, []);
+
+  const startRun = async () => {
+    if (!selectedBaseline) return;
+
+    setRunProgress({ phase: "starting", message: "Creating validation run...", completed: 0, total: 0 });
+
+    try {
+      const startRes = await fetch("/api/monitor/lab/runs/start", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          baselineId: selectedBaseline.id,
+          name: runName || undefined,
+          profileId: activeProfile?.id,
+        }),
+      });
+
+      if (!startRes.ok) {
+        const err = await startRes.json();
+        throw new Error(err.error || "Failed to start run");
+      }
+
+      const startData = await startRes.json();
+      const runId = startData.run.id;
+      const { jobIds } = startData;
+
+      setRunProgress({
+        phase: "running",
+        message: `Evaluating ${jobIds.length} documents...`,
+        completed: 0,
+        total: jobIds.length,
+      });
+
+      // Poll for completion
+      const maxWaitMs = 10 * 60 * 1000;
+      const pollIntervalMs = 3000;
+      const startTime = Date.now();
+
+      while (Date.now() - startTime < maxWaitMs) {
+        const allDone = await pollJobStatus(jobIds);
+        if (allDone) break;
+        await new Promise((r) => setTimeout(r, pollIntervalMs));
+      }
+
+      setRunProgress((p) => ({ ...p, phase: "comparing", message: "Comparing results..." }));
+
+      const finalizeRes = await fetch(`/api/monitor/lab/runs/${runId}/finalize`, { method: "POST" });
+      if (!finalizeRes.ok) {
+        const err = await finalizeRes.json();
+        throw new Error(err.error || "Failed to finalize run");
+      }
+
+      const finalizeData = await finalizeRes.json();
+      setRunProgress({
+        phase: "done",
+        message: finalizeData.summary,
+        completed: finalizeData.unchangedCount + finalizeData.changedCount,
+        total: finalizeData.unchangedCount + finalizeData.changedCount,
+      });
+
+      setRunName(getDefaultRunName());
+      void refreshRuns();
+    } catch (error) {
+      setRunProgress((p) => ({
+        ...p,
+        phase: "error",
+        message: error instanceof Error ? error.message : "Unknown error",
+      }));
+    }
+  };
+
+  const isRunning = runProgress.phase === "starting" || runProgress.phase === "running" || runProgress.phase === "comparing";
+  const progressPercent = runProgress.total > 0 ? Math.round((runProgress.completed / runProgress.total) * 100) : 0;
+
+  const handleBaselineCreated = () => {
+    setShowCreateModal(false);
+    void refreshBaselines();
+  };
+
+  const handleDeleteBaseline = async (id: string) => {
+    if (!confirm("Delete this baseline?")) return;
+    await deleteBaseline(id);
+    if (selectedBaseline?.id === id) {
+      setSelectedBaseline(null);
+    }
+  };
+
+  // Handle saving profile from ProfileDetailView
+  const handleSaveProfile = async (updates: { name?: string; description?: string; config?: ProfileConfig }) => {
+    if (!selectedProfileForEdit) return;
+    await updateProfile(selectedProfileForEdit.id, updates);
+    // Update local state with the new values
+    setSelectedProfileForEdit((prev) =>
+      prev
+        ? {
+            ...prev,
+            name: updates.name || prev.name,
+            description: updates.description ?? prev.description,
+            config: updates.config || prev.config,
+          }
+        : null
+    );
+  };
+
+  // Create a new profile and select it for editing
+  const handleCreateProfile = async () => {
+    const existingNames = profiles.map((p) => p.name);
+    // Generate a descriptive name with date
+    const now = new Date();
+    let name = `Profile ${now.toLocaleDateString("en-US", {
+      month: "short",
+      day: "numeric",
+      hour: "2-digit",
+      minute: "2-digit",
+    })}`;
+    let counter = 1;
+    while (existingNames.includes(name)) {
+      counter++;
+      name = `Profile ${now.toLocaleDateString("en-US", {
+        month: "short",
+        day: "numeric",
+        hour: "2-digit",
+        minute: "2-digit",
+      })} (${counter})`;
+    }
+
+    try {
+      const newProfile = await createProfile(name, "");
+      setSelectedProfileForEdit(newProfile);
+    } catch (error) {
+      console.error("Failed to create profile:", error);
+      alert("Failed to create profile");
+    }
+  };
+
+  return (
+    <div className="h-full flex">
+      {/* Left Sidebar */}
+      <div className="w-72 border-r bg-gray-50 flex flex-col">
+        {/* Sidebar Tabs */}
+        <div className="flex border-b bg-white">
+          <button
+            onClick={() => {
+              setSidebarTab("baselines");
+              setSelectedProfileForEdit(null);
+            }}
+            className={`flex-1 flex items-center justify-center gap-1.5 px-2 py-3 text-sm font-medium whitespace-nowrap transition-colors ${
+              sidebarTab === "baselines"
+                ? "text-blue-600 border-b-2 border-blue-600 bg-blue-50"
+                : "text-gray-600 hover:text-gray-900 hover:bg-gray-50"
+            }`}
+          >
+            <BeakerIcon className="h-4 w-4 flex-shrink-0" />
+            Baselines
+          </button>
+          <button
+            onClick={() => {
+              setSidebarTab("profiles");
+              setSelectedBaseline(null);
+              setExpandedRun(null);
+            }}
+            className={`flex-1 flex items-center justify-center gap-1.5 px-2 py-3 text-sm font-medium whitespace-nowrap transition-colors ${
+              sidebarTab === "profiles"
+                ? "text-blue-600 border-b-2 border-blue-600 bg-blue-50"
+                : "text-gray-600 hover:text-gray-900 hover:bg-gray-50"
+            }`}
+          >
+            <CpuChipIcon className="h-4 w-4 flex-shrink-0" />
+            Profiles
+          </button>
+          <button
+            onClick={() => {
+              setSidebarTab("evaluations");
+              setSelectedBaseline(null);
+              setExpandedRun(null);
+              setSelectedProfileForEdit(null);
+            }}
+            className={`flex-1 flex items-center justify-center gap-1.5 px-2 py-3 text-sm font-medium whitespace-nowrap transition-colors ${
+              sidebarTab === "evaluations"
+                ? "text-blue-600 border-b-2 border-blue-600 bg-blue-50"
+                : "text-gray-600 hover:text-gray-900 hover:bg-gray-50"
+            }`}
+          >
+            <DocumentMagnifyingGlassIcon className="h-4 w-4 flex-shrink-0" />
+            Evals
+          </button>
+        </div>
+
+        {/* Sidebar Content */}
+        {sidebarTab === "baselines" ? (
+          <>
+            <div className="p-4 border-b bg-white">
+              <div className="flex items-center justify-between mb-2">
+                <h2 className="font-semibold text-gray-900">Baselines</h2>
+                <button
+                  onClick={() => setShowCreateModal(true)}
+                  className="p-1 text-blue-600 hover:bg-blue-50 rounded"
+                  title="Create baseline"
+                >
+                  <PlusIcon className="h-5 w-5" />
+                </button>
+              </div>
+              <p className="text-xs text-gray-500">Select a baseline to run validation</p>
+            </div>
+
+            <div className="flex-1 overflow-y-auto">
+              {baselinesLoading ? (
+                <div className="p-4 text-gray-500 text-sm">Loading...</div>
+              ) : baselines.length === 0 ? (
+                <div className="p-4 text-gray-500 text-sm">No baselines yet</div>
+              ) : (
+                <div className="divide-y">
+                  {baselines.map((baseline) => (
+                    <div
+                      key={baseline.id}
+                      onClick={() => {
+                        setSelectedBaseline(baseline);
+                        setExpandedRun(null);
+                      }}
+                      className={`p-3 cursor-pointer hover:bg-gray-100 ${
+                        selectedBaseline?.id === baseline.id ? "bg-blue-50 border-l-4 border-blue-500" : ""
+                      }`}
+                    >
+                      <div className="flex items-start justify-between">
+                        <div className="flex-1 min-w-0">
+                          <div className="font-medium text-gray-900 truncate">{baseline.name}</div>
+                          <div className="text-xs text-gray-500 mt-1">
+                            {baseline.snapshotCount} docs · {formatDate(baseline.createdAt)}
+                          </div>
+                        </div>
+                        <button
+                          onClick={(e) => {
+                            e.stopPropagation();
+                            void handleDeleteBaseline(baseline.id);
+                          }}
+                          className="p-1 text-gray-400 hover:text-red-600 hover:bg-red-50 rounded"
+                        >
+                          <TrashIcon className="h-4 w-4" />
+                        </button>
+                      </div>
+                    </div>
+                  ))}
+                </div>
+              )}
+            </div>
+          </>
+        ) : sidebarTab === "profiles" ? (
+          <ProfilesList
+            profiles={profiles}
+            loading={profilesLoading}
+            selectedProfile={selectedProfileForEdit}
+            onSelectProfile={setSelectedProfileForEdit}
+            onCreateProfile={() => void handleCreateProfile()}
+            onDeleteProfile={(id) => void deleteProfile(id)}
+            onSetDefault={(id) => void setDefaultProfile(id)}
+          />
+        ) : (
+          /* Evaluations tab - sidebar info */
+          <div className="p-4">
+            <h2 className="font-semibold text-gray-900 mb-2">All Evaluations</h2>
+            <p className="text-xs text-gray-500">
+              View telemetry from all user-facing evaluations, not just validation runs.
+            </p>
+          </div>
+        )}
+      </div>
+
+      {/* Main Content */}
+      <div className="flex-1 flex flex-col overflow-hidden">
+        {/* Evaluations Tab Main Content */}
+        {sidebarTab === "evaluations" ? (
+          <div className="flex-1 overflow-y-auto p-4 bg-gray-50">
+            <AllEvaluationsList
+              evaluations={evaluations}
+              loading={evaluationsLoading}
+              error={evaluationsError}
+              onRefresh={() => void refreshEvaluations()}
+            />
+          </div>
+        ) : /* Profiles Tab Main Content */
+        sidebarTab === "profiles" ? (
+          selectedProfileForEdit ? (
+            <ProfileDetailView profile={selectedProfileForEdit} onSave={handleSaveProfile} />
+          ) : (
+            <div className="flex-1 flex items-center justify-center text-gray-500">
+              <div className="text-center">
+                <CpuChipIcon className="h-12 w-12 mx-auto mb-4 text-gray-300" />
+                <p className="text-lg">Select a profile to view or edit</p>
+                <p className="text-sm mt-1">or create a new one</p>
+                <button
+                  onClick={() => void handleCreateProfile()}
+                  className="mt-4 px-4 py-2 bg-blue-600 text-white rounded-md hover:bg-blue-700"
+                >
+                  Create Profile
+                </button>
+              </div>
+            </div>
+          )
+        ) : !selectedBaseline ? (
+          <div className="flex-1 flex items-center justify-center text-gray-500">
+            <div className="text-center">
+              <BeakerIcon className="h-12 w-12 mx-auto mb-4 text-gray-300" />
+              <p className="text-lg">Select a baseline to get started</p>
+              <p className="text-sm mt-1">or create a new one</p>
+            </div>
+          </div>
+        ) : (
+          <>
+            {/* Run Controls Header */}
+            <div className="p-4 border-b bg-white">
+              <div className="flex items-center justify-between">
+                <div>
+                  <h1 className="text-xl font-semibold text-gray-900">{selectedBaseline.name}</h1>
+                  <p className="text-sm text-gray-500">
+                    {selectedBaseline.snapshotCount} documents · Created {formatDate(selectedBaseline.createdAt)}
+                  </p>
+                </div>
+                <div className="flex items-center gap-3">
+                  {/* Profile Selector */}
+                  <div className="flex items-center gap-1">
+                    <select
+                      value={selectedProfileId || activeProfile?.id || ""}
+                      onChange={(e) => setSelectedProfileId(e.target.value || null)}
+                      disabled={isRunning || profilesLoading}
+                      className="px-3 py-2 border rounded-md text-sm disabled:bg-gray-100"
+                    >
+                      {profilesLoading ? (
+                        <option>Loading...</option>
+                      ) : profiles.length === 0 ? (
+                        <option value="">Default config</option>
+                      ) : (
+                        profiles.map((p) => (
+                          <option key={p.id} value={p.id}>
+                            {p.name}{p.isDefault ? " (default)" : ""}
+                          </option>
+                        ))
+                      )}
+                    </select>
+                  </div>
+                  <input
+                    type="text"
+                    value={runName}
+                    onChange={(e) => setRunName(e.target.value)}
+                    placeholder="Run name (optional)"
+                    disabled={isRunning}
+                    className="px-3 py-2 border rounded-md text-sm w-48 disabled:bg-gray-100"
+                  />
+                  <button
+                    onClick={() => void startRun()}
+                    disabled={isRunning}
+                    className="flex items-center gap-2 px-4 py-2 bg-green-600 text-white rounded-md hover:bg-green-700 disabled:opacity-50 disabled:cursor-not-allowed"
+                  >
+                    {isRunning ? (
+                      <ArrowPathIcon className="h-4 w-4 animate-spin" />
+                    ) : (
+                      <PlayIcon className="h-4 w-4" />
+                    )}
+                    <span>{isRunning ? "Running..." : "Run Validation"}</span>
+                  </button>
+                </div>
+              </div>
+
+              {/* Progress Bar */}
+              {runProgress.phase !== "idle" && (
+                <div className="mt-4">
+                  <div className="flex items-center justify-between text-sm mb-1">
+                    <span className={`font-medium ${
+                      runProgress.phase === "done" ? "text-green-600" :
+                      runProgress.phase === "error" ? "text-red-600" : "text-blue-600"
+                    }`}>
+                      {runProgress.phase === "starting" && "Starting..."}
+                      {runProgress.phase === "running" && "Running evaluations"}
+                      {runProgress.phase === "comparing" && "Comparing results"}
+                      {runProgress.phase === "done" && "Complete"}
+                      {runProgress.phase === "error" && "Error"}
+                    </span>
+                    <span className="text-gray-500">{runProgress.message}</span>
+                  </div>
+                  {(runProgress.phase === "running" || runProgress.phase === "comparing") && (
+                    <div className="h-2 bg-gray-200 rounded-full overflow-hidden">
+                      <div
+                        className="h-full bg-blue-600 transition-all duration-300"
+                        style={{ width: `${progressPercent}%` }}
+                      />
+                    </div>
+                  )}
+                </div>
+              )}
+            </div>
+
+            {/* Run History */}
+            <div className="flex-1 overflow-y-auto p-4 bg-gray-50">
+              <h2 className="font-semibold text-gray-900 mb-3">Run History</h2>
+              {runsLoading ? (
+                <div className="text-gray-500 text-sm">Loading runs...</div>
+              ) : runs.length === 0 ? (
+                <div className="bg-white rounded-lg border p-8 text-center text-gray-500">
+                  <p>No runs yet for this baseline</p>
+                  <p className="text-sm mt-1">Click "Run Validation" to start</p>
+                </div>
+              ) : (
+                <div className="space-y-3">
+                  {runs.map((run) => (
+                    <div key={run.id} className="bg-white rounded-lg border overflow-hidden">
+                      <div
+                        onClick={() => setExpandedRun(expandedRun === run.id ? null : run.id)}
+                        className="p-4 cursor-pointer hover:bg-gray-50 flex items-center justify-between"
+                      >
+                        <div>
+                          <div className="font-medium text-gray-900">{run.name}</div>
+                          <div className="text-sm text-gray-500 mt-1">
+                            {formatDate(run.createdAt)} · {run.summary || run.status}
+                          </div>
+                        </div>
+                        <div className="flex items-center gap-3">
+                          <RunStatusBadge status={run.status} />
+                          <span className="text-gray-400">{expandedRun === run.id ? "▼" : "▶"}</span>
+                        </div>
+                      </div>
+                      {expandedRun === run.id && (
+                        <div className="border-t">
+                          <RunDetail runId={run.id} />
+                        </div>
+                      )}
+                    </div>
+                  ))}
+                </div>
+              )}
+            </div>
+          </>
+        )}
+      </div>
+
+      {/* Create Baseline Modal */}
+      {showCreateModal && (
+        <CreateBaselineModal
+          agentId={AGENT_ID}
+          onClose={() => setShowCreateModal(false)}
+          onCreated={handleBaselineCreated}
+        />
+      )}
+    </div>
+  );
+}
+
+function RunStatusBadge({ status }: { status: string }) {
+  const styles = {
+    completed: "bg-green-100 text-green-800",
+    running: "bg-blue-100 text-blue-800",
+    failed: "bg-red-100 text-red-800",
+    pending: "bg-gray-100 text-gray-800",
+  };
+  return (
+    <span className={`px-2 py-1 rounded text-xs font-medium ${styles[status as keyof typeof styles] || styles.pending}`}>
+      {status}
+    </span>
+  );
+}
diff --git a/apps/web/src/app/monitor/lab/types.ts b/apps/web/src/app/monitor/lab/types.ts
new file mode 100644
index 000000000..132bc2798
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/types.ts
@@ -0,0 +1,340 @@
+// Types for the Lab (Validation) feature
+
+export interface Baseline {
+  id: string;
+  name: string;
+  description: string | null;
+  commitHash: string | null;
+  createdAt: string;
+  snapshotCount: number;
+}
+
+export interface CorpusDocument {
+  documentId: string;
+  title: string;
+  contentLength: number;
+  lastEvaluatedAt: string | null;
+  evaluationCount: number;
+}
+
+export interface EvaluationVersionSummary {
+  id: string;
+  createdAt: string;
+  grade: number | null;
+  summary: string | null;
+  version: number;
+}
+
+export interface ValidationRun {
+  id: string;
+  name: string | null;
+  commitHash: string | null;
+  status: "running" | "completed" | "failed";
+  summary: string | null;
+  createdAt: string;
+  completedAt: string | null;
+  snapshotCount: number;
+  unchangedCount: number;
+  changedCount: number;
+}
+
+export interface RunSnapshot {
+  id: string;
+  status: "unchanged" | "changed";
+  keptCount: number;
+  newCount: number;
+  lostCount: number;
+  documentId: string;
+  documentTitle: string;
+  comparisonData: ComparisonData | null;
+}
+
+export interface ComparisonData {
+  matchedComments: CommentMatch[];
+  newComments: Comment[];
+  lostComments: Comment[];
+  filteredItems?: FilteredItem[];
+  passedItems?: PassedItem[];
+  pipelineCounts?: PipelineCounts;
+  extractionPhase?: ExtractionPhase;
+  stages?: StageMetrics[];
+  totalDurationMs?: number;
+}
+
+export interface CommentMatch {
+  baselineComment: Comment;
+  currentComment: Comment;
+  matchConfidence: number;
+  status: string;
+}
+
+export interface Comment {
+  id: string;
+  quotedText: string;
+  header: string | null;
+  description: string;
+  importance: number | null;
+}
+
+export interface FilteredItem {
+  stage: string; // Filter stage name (e.g., "principle-of-charity-filter", "supported-elsewhere-filter", "review")
+  filterReason: string;
+  quotedText: string;
+  header?: string;
+  originalIndex?: number;
+  supportLocation?: string;
+}
+
+export interface PassedItem {
+  stage: string; // Filter stage name
+  quotedText: string;
+  header?: string;
+  passReason: string;
+  originalIndex?: number;
+}
+
+export interface PipelineCounts {
+  issuesAfterDedup: number;
+  issuesAfterFiltering: number;
+  commentsGenerated: number;
+  commentsKept: number;
+}
+
+/** Actual API parameters sent to the provider */
+export interface ActualApiParams {
+  model: string;
+  temperature: number;
+  maxTokens: number;
+  thinking?: {
+    type: "enabled";
+    budget_tokens: number;
+  };
+  reasoning?: {
+    effort?: "none" | "minimal" | "low" | "medium" | "high" | "xhigh";
+    max_tokens?: number;
+  };
+}
+
+/** Response metrics from API call */
+export interface ApiResponseMetrics {
+  success: boolean;
+  latencyMs: number;
+  inputTokens?: number;
+  outputTokens?: number;
+  cacheReadTokens?: number;
+  cacheWriteTokens?: number;
+  stopReason?: string;
+  errorType?: string;
+  errorMessage?: string;
+}
+
+export interface ExtractorInfo {
+  extractorId: string;
+  model: string;
+  issuesFound: number;
+  durationMs?: number;
+  costUsd?: number;
+  /** Temperature that was configured (number or "default") */
+  temperatureConfig?: number | "default";
+  /** Whether thinking/reasoning was enabled */
+  thinkingEnabled?: boolean;
+  /** Error message if extraction failed */
+  error?: string;
+  /** Actual parameters sent to the API */
+  actualApiParams?: ActualApiParams;
+  /** Response metrics from the API */
+  responseMetrics?: ApiResponseMetrics;
+  /** Issues breakdown by type */
+  issuesByType?: Record<string, number>;
+}
+
+export interface ExtractionPhase {
+  totalIssuesBeforeJudge: number;
+  totalIssuesAfterDedup?: number;
+  totalIssuesAfterJudge: number;
+  extractors?: ExtractorInfo[];
+  judgeDurationMs?: number;
+  judgeCostUsd?: number;
+  judgeModel?: string;
+  /** Actual API parameters sent for judge call */
+  judgeActualApiParams?: ActualApiParams;
+  /** Response metrics from judge API call */
+  judgeResponseMetrics?: ApiResponseMetrics;
+}
+
+export interface StageMetrics {
+  stageName: string;
+  durationMs: number;
+  inputCount: number;
+  outputCount: number;
+  model?: string;
+  costUsd?: number;
+  /** Actual API parameters sent to the provider (for LLM-based stages) */
+  actualApiParams?: ActualApiParams;
+  /** Response metrics from API call (for LLM-based stages) */
+  responseMetrics?: ApiResponseMetrics;
+}
+
+export interface ValidationRunDetail {
+  id: string;
+  name: string | null;
+  commitHash: string | null;
+  status: string;
+  summary: string | null;
+  createdAt: string;
+  completedAt: string | null;
+  baseline: { id: string; name: string };
+  snapshots: RunSnapshot[];
+}
+
+export type TabId = "baselines" | "run" | "history";
+
+// Profile types
+export type FilterType = "dedup" | "principle-of-charity" | "supported-elsewhere" | "severity" | "confidence" | "review";
+
+/** Reasoning effort levels (maps to OpenRouter's effort parameter) */
+export type ReasoningEffort = "minimal" | "low" | "medium" | "high" | "xhigh";
+
+/** Reasoning configuration - either off, effort level, or custom token budget */
+export type ReasoningConfig =
+  | false                           // Off
+  | { effort: ReasoningEffort }     // Effort level (minimal, low, medium, high, xhigh)
+  | { budget_tokens: number };      // Custom token budget (min 1024)
+
+/** Maps effort levels to Anthropic budget_tokens values */
+export const EFFORT_TO_BUDGET_TOKENS: Record<ReasoningEffort, number> = {
+  minimal: 1024,    // Minimum allowed
+  low: 2048,
+  medium: 8192,
+  high: 16384,
+  xhigh: 32768,
+};
+
+/** Provider routing preferences for OpenRouter */
+export interface ProviderPreferences {
+  /** Ordered list of preferred providers (e.g., ["anthropic", "google"]) */
+  order?: string[];
+  /** Allow fallback to other providers if preferred ones fail */
+  allow_fallbacks?: boolean;
+}
+
+export interface ExtractorConfig {
+  model: string;
+  temperature?: number | "default";
+  label?: string;
+  /** @deprecated Use reasoning instead */
+  thinking?: boolean;
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+}
+
+export interface JudgeConfig {
+  model: string;
+  temperature?: number | "default";
+  /** @deprecated Use reasoning instead */
+  thinking?: boolean;
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+  enabled: boolean;
+}
+
+/** Base filter configuration - all filters have these */
+interface BaseFilterConfig {
+  id: string;  // Unique ID for this filter instance
+  enabled: boolean;
+}
+
+/** Supported-elsewhere filter: LLM checks if issues are explained elsewhere in document */
+export interface SupportedElsewhereFilterConfig extends BaseFilterConfig {
+  type: "supported-elsewhere";
+  model: string;
+  temperature?: number | "default";
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+  customPrompt?: string;
+}
+
+/** Principle of Charity filter: LLM interprets arguments charitably before critiquing */
+export interface PrincipleOfCharityFilterConfig extends BaseFilterConfig {
+  type: "principle-of-charity";
+  model: string;
+  temperature?: number | "default";
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+  customPrompt?: string;
+}
+
+/** Severity threshold filter: removes issues below a severity score */
+export interface SeverityFilterConfig extends BaseFilterConfig {
+  type: "severity";
+  minSeverity: number;  // 0-100
+}
+
+/** Confidence threshold filter: removes issues below a confidence score */
+export interface ConfidenceFilterConfig extends BaseFilterConfig {
+  type: "confidence";
+  minConfidence: number;  // 0-100
+}
+
+/** Union of all filter configs */
+export type FilterChainItem =
+  | SupportedElsewhereFilterConfig
+  | PrincipleOfCharityFilterConfig
+  | SeverityFilterConfig
+  | ConfidenceFilterConfig;
+
+/** Available filter types for the "Add Filter" dropdown */
+export const AVAILABLE_FILTER_TYPES = [
+  {
+    type: "principle-of-charity" as const,
+    label: "Principle of Charity",
+    description: "Interprets arguments charitably before critiquing - filters issues that dissolve under generous interpretation"
+  },
+  {
+    type: "supported-elsewhere" as const,
+    label: "Supported Elsewhere",
+    description: "LLM checks if issues are explained/supported elsewhere in the document"
+  },
+  // Note: Severity filtering happens during extraction (minSeverityThreshold)
+  // Note: Confidence filtering is not yet implemented
+] as const;
+
+export interface PromptConfig {
+  extractorSystemPrompt?: string;
+  extractorUserPrompt?: string;
+  judgeSystemPrompt?: string;
+  filterSystemPrompt?: string;
+  reviewSystemPrompt?: string;
+}
+
+export interface ThresholdConfig {
+  minSeverityThreshold: number;
+  maxIssues: number;
+  dedupThreshold: number;
+  maxIssuesToProcess: number;
+}
+
+export interface ProfileConfig {
+  version: 1;
+  models: {
+    extractors: ExtractorConfig[];
+    judge: JudgeConfig;
+  };
+  thresholds: ThresholdConfig;
+  prompts?: PromptConfig;
+  /** Ordered list of filters to apply. Filters run in sequence. */
+  filterChain: FilterChainItem[];
+}
+
+export interface Profile {
+  id: string;
+  name: string;
+  description: string | null;
+  agentId: string;
+  config: ProfileConfig;
+  isDefault: boolean;
+  createdAt: string;
+  updatedAt: string;
+}
diff --git a/apps/web/src/app/monitor/lab/utils/formatters.ts b/apps/web/src/app/monitor/lab/utils/formatters.ts
new file mode 100644
index 000000000..d855ac6ea
--- /dev/null
+++ b/apps/web/src/app/monitor/lab/utils/formatters.ts
@@ -0,0 +1,54 @@
+// Display formatting utilities
+
+export function formatDate(dateString: string): string {
+  const date = new Date(dateString);
+  return date.toLocaleDateString("en-US", {
+    month: "short",
+    day: "numeric",
+    year: "numeric",
+    hour: "2-digit",
+    minute: "2-digit",
+  });
+}
+
+export function formatDateShort(dateString: string): string {
+  const date = new Date(dateString);
+  return date.toLocaleDateString("en-US", {
+    month: "short",
+    day: "numeric",
+  });
+}
+
+export function truncate(str: string, maxLen: number): string {
+  const clean = str.replace(/[\n\r\t]+/g, " ").replace(/\s+/g, " ").trim();
+  if (clean.length <= maxLen) return clean;
+  return clean.slice(0, maxLen - 3) + "...";
+}
+
+export function formatStatus(status: string): string {
+  switch (status) {
+    case "running":
+      return "Running";
+    case "completed":
+      return "Completed";
+    case "failed":
+      return "Failed";
+    case "unchanged":
+      return "Unchanged";
+    case "changed":
+      return "Changed";
+    default:
+      return status;
+  }
+}
+
+export function formatFilterStage(stage: string): string {
+  switch (stage) {
+    case "supported-elsewhere-filter":
+      return "Filter";
+    case "review":
+      return "Review";
+    default:
+      return stage;
+  }
+}
diff --git a/apps/web/src/application/services/tools/createToolAPIHandler.ts b/apps/web/src/application/services/tools/createToolAPIHandler.ts
index 7a90ec57d..2ae58d9e7 100644
--- a/apps/web/src/application/services/tools/createToolAPIHandler.ts
+++ b/apps/web/src/application/services/tools/createToolAPIHandler.ts
@@ -1,7 +1,6 @@
 import { NextRequest, NextResponse } from 'next/server';
-import { Tool } from '@roast/ai';
+import { Tool, logger as aiLogger } from '@roast/ai/server';
 import { logger } from '@/infrastructure/logging/logger';
-import { logger as aiLogger } from '@roast/ai/server';
 import { auth } from '@/infrastructure/auth/auth';
 import { config } from '@roast/domain';
 
diff --git a/dev/scripts/dev-env.sh b/dev/scripts/dev-env.sh
index 8378c5998..8d06a4051 100755
--- a/dev/scripts/dev-env.sh
+++ b/dev/scripts/dev-env.sh
@@ -1,29 +1,43 @@
 #!/bin/bash
 
-# Dev environment manager using zellij
-# Usage: ./dev-env.sh [start|stop|status|attach]
+# Dev environment manager using tmux
+# Usage: ./dev-env.sh [start|stop|status|attach|restart]
 
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
 SESSION_NAME="roast-dev"
-LAYOUT_FILE="$SCRIPT_DIR/dev-env.kdl"
 
 start_dev() {
-    if zellij list-sessions 2>/dev/null | grep -q "$SESSION_NAME"; then
+    if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
         echo "Session '$SESSION_NAME' already running"
         exit 0
     fi
 
     cd "$REPO_ROOT"
-    zellij --session "$SESSION_NAME" --new-session-with-layout "$LAYOUT_FILE" &
-    sleep 2
+
+    # Create new detached session with first window for web
+    tmux new-session -d -s "$SESSION_NAME" -n "dev" -c "$REPO_ROOT"
+
+    # Split vertically and run jobs in right pane
+    tmux split-window -h -t "$SESSION_NAME:dev" -c "$REPO_ROOT/internal-packages/jobs"
+
+    # Run web dev server in left pane
+    tmux send-keys -t "$SESSION_NAME:dev.0" "pnpm run dev -H 0.0.0.0" Enter
+
+    # Run jobs processor in right pane
+    tmux send-keys -t "$SESSION_NAME:dev.1" "NODE_ENV=development pnpm run process-pgboss" Enter
+
+    # Select left pane
+    tmux select-pane -t "$SESSION_NAME:dev.0"
+
     echo "Dev session '$SESSION_NAME' started"
+    echo "Use './dev-env.sh attach' or 'tmux attach -t $SESSION_NAME' to attach"
 }
 
 stop_dev() {
-    if zellij list-sessions 2>/dev/null | grep -q "$SESSION_NAME"; then
+    if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
         echo "Stopping dev environment..."
-        zellij kill-session "$SESSION_NAME"
+        tmux kill-session -t "$SESSION_NAME"
         echo "Session '$SESSION_NAME' stopped."
     else
         echo "Session '$SESSION_NAME' is not running."
@@ -31,22 +45,57 @@ stop_dev() {
 }
 
 status_dev() {
-    if zellij list-sessions 2>/dev/null | grep -q "$SESSION_NAME"; then
+    if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
         echo "Session '$SESSION_NAME' is running."
-        zellij list-sessions | grep "$SESSION_NAME"
+        tmux list-windows -t "$SESSION_NAME"
     else
         echo "Session '$SESSION_NAME' is not running."
     fi
 }
 
 attach_dev() {
-    if zellij list-sessions 2>/dev/null | grep -q "$SESSION_NAME"; then
-        zellij attach "$SESSION_NAME"
+    if tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
+        tmux attach -t "$SESSION_NAME"
     else
         echo "Session '$SESSION_NAME' is not running. Use 'start' first."
     fi
 }
 
+restart_dev() {
+    if ! tmux has-session -t "$SESSION_NAME" 2>/dev/null; then
+        echo "Session '$SESSION_NAME' is not running. Starting fresh..."
+        start_dev
+        return
+    fi
+
+    echo "Restarting dev environment..."
+
+    # Send Ctrl+C to both panes to kill running processes
+    tmux send-keys -t "$SESSION_NAME:dev.0" C-c
+    tmux send-keys -t "$SESSION_NAME:dev.1" C-c
+
+    # Wait a moment for processes to die
+    sleep 1
+
+    # Clear visible screen and scrollback buffer in both panes
+    tmux send-keys -t "$SESSION_NAME:dev.0" "clear" Enter
+    tmux send-keys -t "$SESSION_NAME:dev.1" "clear" Enter
+    sleep 0.2
+    tmux clear-history -t "$SESSION_NAME:dev.0"
+    tmux clear-history -t "$SESSION_NAME:dev.1"
+
+    # Re-run the commands
+    tmux send-keys -t "$SESSION_NAME:dev.0" "pnpm run dev -H 0.0.0.0" Enter
+    tmux send-keys -t "$SESSION_NAME:dev.1" "NODE_ENV=development pnpm run process-pgboss" Enter
+
+    echo "Dev environment restarted in existing session."
+}
+
+psql_dev() {
+    source "$SCRIPT_DIR/dev/db/lib/db_functions.sh"
+    psql_local "${LOCAL_DB_NAME:-roast_my_post}" "$@"
+}
+
 case "${1:-start}" in
     start)
         start_dev
@@ -60,8 +109,15 @@ case "${1:-start}" in
     attach)
         attach_dev
         ;;
+    restart)
+        restart_dev
+        ;;
+    psql)
+        shift
+        psql_dev "$@"
+        ;;
     *)
-        echo "Usage: $0 [start|stop|status|attach]"
+        echo "Usage: $0 [start|stop|status|attach|restart|psql]"
         exit 1
         ;;
 esac
diff --git a/dev/scripts/dev/db/setup_db.sh b/dev/scripts/dev/db/setup_db.sh
index 3365009ed..0166dcc76 100755
--- a/dev/scripts/dev/db/setup_db.sh
+++ b/dev/scripts/dev/db/setup_db.sh
@@ -70,5 +70,8 @@ sed -i "s/$PROD_DB_USER/$LOCAL_DB_USER/g; s/doadmin/$LOCAL_DB_USER/g" schema.sql
 echo "Importing production schema and data..."
 cat schema.sql | psql_local "$LOCAL_DB_NAME"
 
+# Clean up dump file
+echo "Cleaning up dump file..."
+rm -f schema.sql
 
 echo "Database setup completed successfully!"
diff --git a/dev/scripts/lint-pr-strict.sh b/dev/scripts/lint-pr-strict.sh
new file mode 100755
index 000000000..aae3fd95b
--- /dev/null
+++ b/dev/scripts/lint-pr-strict.sh
@@ -0,0 +1,242 @@
+#!/usr/bin/env bash
+
+# Strict ESLint check for PR changed files only
+#
+# Runs strict type-aware ESLint rules on files changed in the current branch
+# compared to main. Catches real bugs without pedantic style rules.
+#
+# Usage:
+#   ./dev/scripts/lint-pr-strict.sh [options] [path-filter]
+#
+# Options:
+#   -b, --base <branch>    Base branch to compare against (default: origin/main)
+#   -l, --list             List files that would be checked, don't run lint
+#   -h, --help             Show this help message
+#
+# Examples:
+#   ./dev/scripts/lint-pr-strict.sh                          # All changed files
+#   ./dev/scripts/lint-pr-strict.sh internal-packages/ai     # Only ai package
+#   ./dev/scripts/lint-pr-strict.sh apps/web                 # Only web app
+#   ./dev/scripts/lint-pr-strict.sh -l                       # List files only
+#   ./dev/scripts/lint-pr-strict.sh -b origin/develop        # Compare to develop
+
+set -euo pipefail
+IFS=$'\n\t'
+
+# ==============================================================================
+# Configuration
+# ==============================================================================
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
+DEFAULT_BASE_BRANCH="origin/main"
+EXCLUDED_PATTERNS=(
+    '\.test\.'
+    '\.vtest\.'
+    '^meta-evals/'
+)
+
+# Strict rules that catch real bugs (not style)
+STRICT_RULES=(
+    # Dead code detection
+    "no-unused-private-class-members:warn"
+    "@typescript-eslint/no-unnecessary-condition:warn"
+    "@typescript-eslint/no-unnecessary-type-assertion:warn"
+    "@typescript-eslint/no-redundant-type-constituents:warn"
+    "no-unreachable:error"
+    "no-unreachable-loop:warn"
+
+    # Promise/async bugs (critical for Node.js)
+    "@typescript-eslint/no-floating-promises:warn"
+    "@typescript-eslint/no-misused-promises:warn"
+    "@typescript-eslint/await-thenable:warn"
+    "@typescript-eslint/require-await:warn"
+
+    # Logic errors
+    "@typescript-eslint/switch-exhaustiveness-check:warn"
+    "array-callback-return:warn"
+    "no-constant-binary-expression:warn"
+    "no-self-compare:warn"
+    "use-isnan:error"
+    "valid-typeof:error"
+
+    # Type safety
+    "@typescript-eslint/no-for-in-array:warn"
+    "@typescript-eslint/no-array-delete:warn"
+    "@typescript-eslint/no-base-to-string:warn"
+    "@typescript-eslint/restrict-plus-operands:warn"
+    "@typescript-eslint/unbound-method:warn"
+
+    # Error handling
+    "@typescript-eslint/only-throw-error:warn"
+    "@typescript-eslint/prefer-promise-reject-errors:warn"
+
+    # Security
+    "@typescript-eslint/no-implied-eval:error"
+    "no-new-func:error"
+)
+
+# ==============================================================================
+# Output helpers
+# ==============================================================================
+
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+BOLD='\033[1m'
+NC='\033[0m'
+
+log_info()    { echo -e "${BLUE}$*${NC}"; }
+log_success() { echo -e "${GREEN}$*${NC}"; }
+log_warning() { echo -e "${YELLOW}$*${NC}"; }
+log_error()   { echo -e "${RED}$*${NC}" >&2; }
+log_header()  { echo -e "\n${BOLD}${BLUE}=== $* ===${NC}\n"; }
+
+# ==============================================================================
+# Core functions
+# ==============================================================================
+
+show_help() {
+    sed -n '3,18p' "$0" | sed 's/^# //' | sed 's/^#//'
+    exit 0
+}
+
+get_changed_files() {
+    local base_branch="$1"
+    local path_filter="${2:-}"
+
+    local files
+    files=$(git diff --name-only "$base_branch"...HEAD 2>/dev/null || git diff --name-only "$base_branch" HEAD)
+
+    # Filter to TypeScript files
+    files=$(echo "$files" | grep -E '\.tsx?$' || true)
+
+    # Apply exclusion patterns
+    for pattern in "${EXCLUDED_PATTERNS[@]}"; do
+        files=$(echo "$files" | grep -v "$pattern" || true)
+    done
+
+    # Apply path filter if provided
+    if [[ -n "$path_filter" ]]; then
+        files=$(echo "$files" | grep "^$path_filter" || true)
+    fi
+
+    echo "$files"
+}
+
+run_eslint() {
+    local files="$1"
+
+    if [[ -z "$files" ]]; then
+        log_success "No TypeScript files to check in this scope."
+        return 0
+    fi
+
+    local file_count
+    file_count=$(echo "$files" | wc -l | tr -d ' ')
+    log_info "Checking $file_count files..."
+    echo ""
+
+    # Build rule arguments array
+    local rule_args=()
+    for rule in "${STRICT_RULES[@]}"; do
+        rule_args+=("--rule=$rule")
+    done
+
+    # Convert newline-separated files to array
+    local file_array=()
+    while IFS= read -r file; do
+        [[ -n "$file" ]] && file_array+=("$file")
+    done <<< "$files"
+
+    # Run eslint from repo root
+    cd "$REPO_ROOT"
+
+    npx eslint \
+        --config apps/web/config/eslint/.eslintrc.json \
+        "${rule_args[@]}" \
+        "${file_array[@]}"
+}
+
+list_files() {
+    local files="$1"
+
+    if [[ -z "$files" ]]; then
+        log_warning "No TypeScript files changed in this scope."
+        return 0
+    fi
+
+    local file_count
+    file_count=$(echo "$files" | wc -l | tr -d ' ')
+    log_info "Files that would be checked ($file_count):"
+    echo ""
+    echo "$files" | sed 's/^/  /'
+}
+
+# ==============================================================================
+# Main
+# ==============================================================================
+
+main() {
+    local base_branch="$DEFAULT_BASE_BRANCH"
+    local path_filter=""
+    local list_only=false
+
+    # Parse arguments
+    while [[ $# -gt 0 ]]; do
+        case "$1" in
+            -h|--help)
+                show_help
+                ;;
+            -b|--base)
+                base_branch="$2"
+                shift 2
+                ;;
+            -l|--list)
+                list_only=true
+                shift
+                ;;
+            -*)
+                log_error "Unknown option: $1"
+                show_help
+                ;;
+            *)
+                path_filter="$1"
+                shift
+                ;;
+        esac
+    done
+
+    log_header "Strict PR Lint Check"
+
+    log_info "Base branch: $base_branch"
+    if [[ -n "$path_filter" ]]; then
+        log_info "Path filter: $path_filter"
+    else
+        log_info "Path filter: (all changed files, excluding meta-evals)"
+    fi
+    echo ""
+
+    # Get changed files
+    local files
+    files=$(get_changed_files "$base_branch" "$path_filter")
+
+    if $list_only; then
+        list_files "$files"
+        exit 0
+    fi
+
+    # Run lint
+    if run_eslint "$files"; then
+        echo ""
+        log_success "All strict checks passed!"
+    else
+        echo ""
+        log_error "Strict checks found issues."
+        exit 1
+    fi
+}
+
+main "$@"
diff --git a/internal-packages/ai/package.json b/internal-packages/ai/package.json
index a34dd25b7..69d06e041 100644
--- a/internal-packages/ai/package.json
+++ b/internal-packages/ai/package.json
@@ -48,6 +48,34 @@
     "./tools/generated-readmes": {
       "types": "./src/tools/generated-readmes.ts",
       "default": "./src/tools/generated-readmes.ts"
+    },
+    "./fallacy-extraction": {
+      "types": "./dist/analysis-plugins/plugins/fallacy-check/extraction/index.d.ts",
+      "default": "./dist/analysis-plugins/plugins/fallacy-check/extraction/index.js"
+    },
+    "./fallacy-extraction/lab": {
+      "types": "./dist/analysis-plugins/plugins/fallacy-check/extraction/lab-exports.d.ts",
+      "default": "./dist/analysis-plugins/plugins/fallacy-check/extraction/lab-exports.js"
+    },
+    "./fallacy-judge": {
+      "types": "./dist/tools/fallacy-judge/index.d.ts",
+      "default": "./dist/tools/fallacy-judge/index.js"
+    },
+    "./fallacy-judge/types": {
+      "types": "./dist/tools/fallacy-judge/types.d.ts",
+      "default": "./dist/tools/fallacy-judge/types.js"
+    },
+    "./fallacy-extractor/prompts": {
+      "types": "./src/tools/fallacy-extractor/prompts.ts",
+      "default": "./src/tools/fallacy-extractor/prompts.ts"
+    },
+    "./fallacy-judge/prompts": {
+      "types": "./src/tools/fallacy-judge/prompts.ts",
+      "default": "./src/tools/fallacy-judge/prompts.ts"
+    },
+    "./supported-elsewhere-filter/prompts": {
+      "types": "./src/tools/supported-elsewhere-filter/prompts.ts",
+      "default": "./src/tools/supported-elsewhere-filter/prompts.ts"
     }
   },
   "scripts": {
@@ -74,6 +102,7 @@
   "dependencies": {
     "@anthropic-ai/sdk": "^0.54.0",
     "@leeoniya/ufuzzy": "^1.0.18",
+    "fuse.js": "^7.1.0",
     "mathjs": "^14.0.1",
     "openai": "^4.77.0",
     "tiktoken": "^1.0.17",
diff --git a/internal-packages/ai/src/analysis-plugins/PluginManager.ts b/internal-packages/ai/src/analysis-plugins/PluginManager.ts
index 48a6ccb79..e3e8ad6d7 100644
--- a/internal-packages/ai/src/analysis-plugins/PluginManager.ts
+++ b/internal-packages/ai/src/analysis-plugins/PluginManager.ts
@@ -10,8 +10,6 @@ import {
   HeliconeSessionManager,
 } from "../helicone/simpleSessionManager";
 import { logger } from "../shared/logger";
-// Document and Comment types are passed as parameters to avoid circular dependencies
-// LLMInteraction type removed - was unused
 import type { Comment } from "../shared/types";
 import { ANALYSIS_MODEL } from "../types";
 // Import plugin ID constants
@@ -50,6 +48,10 @@ export interface PluginManagerConfig {
   jobId?: string; // For logging integration
   pluginSelection?: PluginSelection; // Optional plugin selection configuration
   useIsolation?: boolean; // Enable plugin state isolation
+  /** Profile ID for FallacyCheckPlugin configuration */
+  fallacyCheckProfileId?: string;
+  /** Agent ID for FallacyCheckPlugin default profile loading */
+  fallacyCheckAgentId?: string;
 }
 
 export interface SimpleDocumentAnalysisResult {
@@ -66,6 +68,7 @@ export interface SimpleDocumentAnalysisResult {
   };
   logSummary: JobLogSummary;
   jobLogString: string; // Formatted string for Job.logs field
+  pipelineTelemetry?: Record<string, unknown>; // Pipeline telemetry from plugins (e.g., FallacyCheckPlugin)
 }
 
 export interface FullDocumentAnalysisResult {
@@ -88,6 +91,7 @@ export interface FullDocumentAnalysisResult {
   }>;
   logSummary: JobLogSummary;
   jobLogString: string; // Formatted string for Job.logs field
+  pipelineTelemetry?: Record<string, unknown>; // Pipeline telemetry from plugins (e.g., FallacyCheckPlugin)
 }
 
 export class PluginManager {
@@ -104,6 +108,10 @@ export class PluginManager {
   private isolatedExecutor?: IsolatedPluginExecutor;
   private factory?: PluginFactory;
 
+  // Profile configuration for FallacyCheckPlugin
+  private fallacyCheckProfileId?: string;
+  private fallacyCheckAgentId?: string;
+
   constructor(config: PluginManagerConfig = {}) {
     // Use provided session manager, or fall back to global if available
     this.sessionManager = config.sessionManager || getGlobalSessionManager();
@@ -111,6 +119,10 @@ export class PluginManager {
     this.pluginSelection = config.pluginSelection;
     this.useIsolation = config.useIsolation || false;
 
+    // Profile configuration for FallacyCheckPlugin
+    this.fallacyCheckProfileId = config.fallacyCheckProfileId;
+    this.fallacyCheckAgentId = config.fallacyCheckAgentId;
+
     // Initialize refactored components
     this.registry = new PluginRegistry();
     this.router = new PluginRouter();
@@ -201,7 +213,6 @@ export class PluginManager {
         preserveContext: true,
       });
 
-      // Debug: Verify chunk positions
       logger.info("Verifying chunk positions after creation", {
         textLength: text.length,
         textStartsWith: text.slice(0, 100),
@@ -209,7 +220,6 @@ export class PluginManager {
       });
 
       for (const chunk of chunks.slice(0, 3)) {
-        // Check first 3 chunks
         if (chunk.metadata?.position) {
           const extractedText = text.substring(
             chunk.metadata.position.start,
@@ -287,7 +297,7 @@ export class PluginManager {
       const pluginPromises = plugins.map(async (plugin) => {
         const pluginName = plugin.name();
         const maxRetries = 2;
-        let lastError: Error | unknown = null;
+        let lastError: unknown = null;
 
         // Start plugin logging
         this.pluginLogger.pluginStarted(pluginName);
@@ -319,7 +329,6 @@ export class PluginManager {
             // Get the chunks assigned to this plugin
             const assignedChunks = chunksPerPlugin.get(pluginName) || [];
 
-            // Check if this is an always-run plugin
             const isrunOnAllChunks = plugin.runOnAllChunks === true;
 
             logger.info(
@@ -345,7 +354,6 @@ export class PluginManager {
               };
             }
 
-            // Add basic logging wrapper around plugin execution
             pluginLoggerInstance.startPhase(
               "initialization",
               `Starting ${pluginName} analysis with ${assignedChunks.length} chunks`
@@ -468,7 +476,7 @@ export class PluginManager {
         result,
         success,
         error,
-        recoveryAction,
+        recoveryAction: _recoveryAction,
       } of results) {
         if (success && result) {
           pluginResults.set(plugin, result);
@@ -521,6 +529,13 @@ export class PluginManager {
       const logSummary = this.pluginLogger.generateSummary();
       const jobLogString = this.pluginLogger.generateJobLogString();
 
+      // Collect pipeline telemetry from plugins that provide it (e.g., FALLACY_CHECK)
+      let pipelineTelemetry: Record<string, unknown> | undefined;
+      const fallacyResult = pluginResults.get('FALLACY_CHECK');
+      if (fallacyResult?.pipelineTelemetry) {
+        pipelineTelemetry = fallacyResult.pipelineTelemetry;
+      }
+
       return {
         summary,
         analysis,
@@ -535,6 +550,7 @@ export class PluginManager {
         },
         logSummary,
         jobLogString,
+        pipelineTelemetry,
       };
     } finally {
       // Cleanup if needed
@@ -565,7 +581,7 @@ export class PluginManager {
       const pluginStartTime = Date.now();
 
       // Get selected plugins
-      const plugins = await this.getSelectedPlugins();
+      const plugins = this.getSelectedPlugins();
 
       // Run analysis on document content using new API
       const pluginResults = await this.analyzeDocumentSimple(
@@ -624,6 +640,7 @@ export class PluginManager {
         errors: undefined, // TODO: Add better error tracking
         logSummary: pluginResults.logSummary,
         jobLogString: pluginResults.jobLogString,
+        pipelineTelemetry: pluginResults.pipelineTelemetry,
       };
     } catch (error) {
       logger.error(
@@ -660,6 +677,7 @@ export class PluginManager {
         ],
         logSummary: this.pluginLogger.generateSummary(),
         jobLogString: this.pluginLogger.generateJobLogString(),
+        pipelineTelemetry: undefined,
       };
     }
   }
@@ -675,17 +693,24 @@ export class PluginManager {
       [PluginType.FACT_CHECK, new FactCheckPlugin()],
       [PluginType.FORECAST, new ForecastPlugin()],
       [PluginType.LINK_ANALYSIS, new LinkPlugin()],
-      [PluginType.FALLACY_CHECK, new FallacyCheckPlugin()],
+      // Pass profile options to FallacyCheckPlugin
+      [PluginType.FALLACY_CHECK, new FallacyCheckPlugin({
+        profileId: this.fallacyCheckProfileId,
+        agentId: this.fallacyCheckAgentId,
+      })],
     ]);
 
-    logger.info(`Created fresh instances of ${plugins.size} plugins`);
+    logger.info(`Created fresh instances of ${plugins.size} plugins`, {
+      fallacyCheckProfileId: this.fallacyCheckProfileId,
+      fallacyCheckAgentId: this.fallacyCheckAgentId,
+    });
     return plugins;
   }
 
   /**
    * Get selected plugins based on configuration
    */
-  private async getSelectedPlugins(): Promise<SimpleAnalysisPlugin[]> {
+  private getSelectedPlugins(): SimpleAnalysisPlugin[] {
     const allPlugins = this.createPluginInstances();
 
     // Default plugins if no selection specified
@@ -739,26 +764,20 @@ export class PluginManager {
   private deduplicateHighlights(highlights: Comment[]): Comment[] {
     if (highlights.length <= 1) return highlights;
 
-    // Filter out comments without highlights first
-    const withHighlights = highlights.filter(
-      (h) => h.highlight && h.highlight.startOffset !== undefined
-    );
-    if (withHighlights.length <= 1) return withHighlights;
-
     // Sort by start offset
-    const sorted = [...withHighlights].sort(
-      (a, b) => a.highlight!.startOffset! - b.highlight!.startOffset!
+    const sorted = [...highlights].sort(
+      (a, b) => a.highlight.startOffset - b.highlight.startOffset
     );
 
     const unique: Comment[] = [];
 
-    for (const highlight of sorted) {
+    for (const comment of sorted) {
       // Check if this highlight overlaps with any existing unique highlight
       const overlaps = unique.some((existing) => {
-        const existingStart = existing.highlight!.startOffset!;
-        const existingEnd = existing.highlight!.endOffset!;
-        const currentStart = highlight.highlight!.startOffset!;
-        const currentEnd = highlight.highlight!.endOffset!;
+        const existingStart = existing.highlight.startOffset;
+        const existingEnd = existing.highlight.endOffset;
+        const currentStart = comment.highlight.startOffset;
+        const currentEnd = comment.highlight.endOffset;
 
         // Check for overlap (fixed off-by-one error)
         return (
@@ -769,7 +788,7 @@ export class PluginManager {
       });
 
       if (!overlaps) {
-        unique.push(highlight);
+        unique.push(comment);
       }
     }
 
@@ -783,7 +802,7 @@ export class PluginManager {
     if (!error) return false;
 
     // Check for common retryable error patterns
-    const errorMessage = (error as unknown as Error)?.message || String(error);
+    const errorMessage = error instanceof Error ? error.message : String(error);
 
     // Network/timeout errors are retryable
     if (
@@ -829,7 +848,7 @@ export class PluginManager {
    * Determine the appropriate recovery action for a failed plugin
    */
   private determineRecoveryAction(pluginName: string, error: unknown): string {
-    const errorMessage = (error as unknown as Error)?.message || String(error);
+    const errorMessage = error instanceof Error ? error.message : String(error);
 
     // Specific recovery actions based on error type
     if (errorMessage.includes("timeout")) {
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/comments/markdown.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/comments/markdown.ts
index 186b4884c..437da6f9c 100644
--- a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/comments/markdown.ts
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/comments/markdown.ts
@@ -5,22 +5,6 @@ import { ISSUE_TYPES } from "../constants";
  * Pure functions for generating markdown content for epistemic critic comments.
  */
 
-// Helper to escape markdown special characters
-function escapeMd(s: string): string {
-  return s.replace(/[[\]()*_`>]/g, (m) => `\\${m}`);
-}
-
-// Helper to sanitize URLs
-function sanitizeUrl(u: string): string {
-  try {
-    const url = new URL(u, "https://example.com");
-    const scheme = url.protocol.replace(":", "");
-    return scheme === "http" || scheme === "https" ? u : "#";
-  } catch {
-    return "#";
-  }
-}
-
 /**
  * Build the main description content for an epistemic issue comment
  */
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/dedup.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/dedup.ts
new file mode 100644
index 000000000..b4c0b820a
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/dedup.ts
@@ -0,0 +1,182 @@
+/**
+ * Deduplication and prioritization utilities for fallacy issues
+ *
+ * Uses Jaccard word-overlap similarity with quality-based selection:
+ * - When duplicates are found, keeps the higher-quality issue
+ * - Quality based on text length (more context) + scores (severity, confidence, importance)
+ */
+
+import { logger } from "../../../shared/logger";
+import type { FallacyIssue } from "./FallacyIssue";
+import { LIMITS } from "./constants";
+import { JACCARD_SIMILARITY_THRESHOLD } from "../../../types/common";
+
+interface DuplicateMatch<T> {
+  keptIdx: number;
+  kept: T;
+  similarity: number;
+}
+
+/**
+ * Calculate priority score for an issue.
+ * Higher score = more important to address.
+ */
+export function calculatePriorityScore(issue: FallacyIssue): number {
+  return issue.severityScore * 0.6 + issue.importanceScore * 0.4;
+}
+
+/**
+ * Normalize text for comparison.
+ * - Lowercase
+ * - Collapse whitespace
+ * - Trim
+ */
+export function normalizeTextForDedup(text: string): string {
+  return text.toLowerCase().replace(/\s+/g, " ").trim();
+}
+
+/**
+ * Calculate Jaccard similarity between two texts based on word overlap.
+ * Returns a value between 0 (no overlap) and 1 (identical).
+ */
+function calculateJaccardSimilarity(textA: string, textB: string): number {
+  const wordsA = new Set(normalizeTextForDedup(textA).split(/\s+/).filter(Boolean));
+  const wordsB = new Set(normalizeTextForDedup(textB).split(/\s+/).filter(Boolean));
+
+  if (wordsA.size === 0 && wordsB.size === 0) return 1;
+  if (wordsA.size === 0 || wordsB.size === 0) return 0;
+
+  let intersection = 0;
+  for (const word of wordsA) {
+    if (wordsB.has(word)) intersection++;
+  }
+
+  const union = wordsA.size + wordsB.size - intersection;
+  return union > 0 ? intersection / union : 0;
+}
+
+/**
+ * Compute a quality score for an issue.
+ * Higher = better quality (prefer to keep).
+ * Factors: text length (more context), severity, confidence, importance.
+ */
+function computeIssueQuality(issue: FallacyIssue): number {
+  // Normalize text length (log scale to prevent extremely long texts from dominating)
+  const lengthScore = Math.log10(issue.text.length + 1) / 4; // ~0.5-1.0 for typical lengths
+
+  // Combine severity, confidence, importance (each 0-100, normalize to 0-1)
+  const severityNorm = issue.severityScore / 100;
+  const confidenceNorm = issue.confidenceScore / 100;
+  const importanceNorm = issue.importanceScore / 100;
+
+  // Weighted combination: prefer longer text, then higher scores
+  // Length is most important (40%), then confidence (25%), severity (20%), importance (15%)
+  return (
+    lengthScore * 0.4 +
+    confidenceNorm * 0.25 +
+    severityNorm * 0.2 +
+    importanceNorm * 0.15
+  );
+}
+
+/**
+ * Deduplicate issues using Jaccard word-overlap similarity.
+ * When duplicates are found, keeps the higher-quality issue
+ * (longer text + higher severity/confidence/importance).
+ */
+export function deduplicateIssues(issues: FallacyIssue[]): FallacyIssue[] {
+  const unique: FallacyIssue[] = [];
+
+  for (const issue of issues) {
+    // Check if this issue is a duplicate of any already-kept issue
+    let bestMatch: DuplicateMatch<FallacyIssue> | null = null;
+
+    for (let i = 0; i < unique.length; i++) {
+      const kept = unique[i];
+      const similarity = calculateJaccardSimilarity(issue.text, kept.text);
+
+      if (similarity >= JACCARD_SIMILARITY_THRESHOLD) {
+        if (!bestMatch || similarity > bestMatch.similarity) {
+          bestMatch = { keptIdx: i, kept, similarity };
+        }
+      }
+    }
+
+    if (bestMatch) {
+      // Found a duplicate - decide which to keep based on quality score
+      const newQuality = computeIssueQuality(issue);
+      const keptQuality = computeIssueQuality(bestMatch.kept);
+
+      if (newQuality > keptQuality) {
+        // New issue is better - swap: replace kept with new
+        logger.debug(
+          `[Dedup] Replacing issue (quality ${keptQuality.toFixed(2)}) with better duplicate (quality ${newQuality.toFixed(2)})`
+        );
+        unique[bestMatch.keptIdx] = issue;
+      } else {
+        // Kept issue is better - discard new
+        logger.debug(
+          `[Dedup] Discarding duplicate (quality ${newQuality.toFixed(2)}), keeping (quality ${keptQuality.toFixed(2)})`
+        );
+      }
+    } else {
+      unique.push(issue);
+    }
+  }
+
+  if (unique.length < issues.length) {
+    logger.info(
+      `[Dedup] Reduced ${issues.length} issues to ${unique.length} unique (${issues.length - unique.length} duplicates removed)`
+    );
+  }
+
+  return unique;
+}
+
+/**
+ * Prioritize and limit issues based on severity and importance scores.
+ * - Sorts by priority score (highest first)
+ * - Limits to MAX_ISSUES_TO_PROCESS if too many
+ */
+export function prioritizeAndLimitIssues(issues: FallacyIssue[]): FallacyIssue[] {
+  // Sort by priority score (most important issues first)
+  const sortedIssues = [...issues].sort(
+    (a, b) => calculatePriorityScore(b) - calculatePriorityScore(a)
+  );
+
+  // Limit to maximum issues if we have too many
+  if (sortedIssues.length > LIMITS.MAX_ISSUES_TO_PROCESS) {
+    logger.info(
+      `Limiting issues from ${sortedIssues.length} to ${LIMITS.MAX_ISSUES_TO_PROCESS} based on priority scores`
+    );
+
+    const keptIssues = sortedIssues.slice(0, LIMITS.MAX_ISSUES_TO_PROCESS);
+    const discardedIssues = sortedIssues.slice(LIMITS.MAX_ISSUES_TO_PROCESS);
+
+    const avgKeptScore =
+      keptIssues.reduce((sum, i) => sum + calculatePriorityScore(i), 0) /
+      keptIssues.length;
+    const avgDiscardedScore =
+      discardedIssues.length > 0
+        ? discardedIssues.reduce((sum, i) => sum + calculatePriorityScore(i), 0) /
+          discardedIssues.length
+        : 0;
+
+    logger.debug(
+      `Priority scores - Kept issues avg: ${avgKeptScore.toFixed(1)}, ` +
+        `Discarded issues avg: ${avgDiscardedScore.toFixed(1)}`
+    );
+
+    return keptIssues;
+  }
+
+  return sortedIssues;
+}
+
+/**
+ * Full deduplication pipeline: deduplicate, then prioritize and limit.
+ */
+export function deduplicateAndPrioritize(issues: FallacyIssue[]): FallacyIssue[] {
+  const deduplicated = deduplicateIssues(issues);
+  return prioritizeAndLimitIssues(deduplicated);
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/config.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/config.ts
new file mode 100644
index 000000000..4101e1ed9
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/config.ts
@@ -0,0 +1,319 @@
+/**
+ * Multi-Extractor Configuration Parser
+ *
+ * Parses the FALLACY_EXTRACTORS environment variable and provides defaults.
+ * Also supports profile-based configuration from the database.
+ */
+
+import type { ExtractorConfig, MultiExtractorConfig, JudgeConfig } from './types';
+import type { FallacyCheckerProfileConfig } from '../profile-types';
+import { logger } from '../../../../shared/logger';
+
+/** Default model for extraction when not configured */
+const DEFAULT_EXTRACTOR_MODEL = 'claude-sonnet-4-5-20250929';
+
+/** Default model for judge aggregation */
+const DEFAULT_JUDGE_MODEL = 'claude-sonnet-4-5-20250929';
+
+/** Default temperature for Claude models */
+const DEFAULT_CLAUDE_TEMPERATURE = 0;
+
+/** Default temperature for OpenRouter models */
+const DEFAULT_OPENROUTER_TEMPERATURE = 0.1;
+
+/**
+ * Check if a model is an OpenRouter model (contains '/')
+ */
+function isOpenRouterModel(model: string): boolean {
+  return model.includes('/');
+}
+
+/**
+ * Get default temperature for a model
+ */
+export function getDefaultTemperature(model: string): number {
+  return isOpenRouterModel(model)
+    ? DEFAULT_OPENROUTER_TEMPERATURE
+    : DEFAULT_CLAUDE_TEMPERATURE;
+}
+
+/**
+ * Generate a unique label for an extractor config
+ */
+export function generateExtractorLabel(config: ExtractorConfig): string {
+  if (config.label) {
+    return config.label;
+  }
+
+  // Extract short model name
+  let shortName: string;
+  if (isOpenRouterModel(config.model)) {
+    // e.g., "google/gemini-3-flash-preview" -> "gemini-3-flash"
+    const parts = config.model.split('/');
+    shortName = parts[parts.length - 1].replace('-preview', '').replace('-latest', '');
+  } else {
+    // e.g., "claude-sonnet-4-5-20250929" -> "sonnet"
+    if (config.model.includes('opus')) {
+      shortName = 'opus';
+    } else if (config.model.includes('sonnet')) {
+      shortName = 'sonnet';
+    } else if (config.model.includes('haiku')) {
+      shortName = 'haiku';
+    } else {
+      shortName = config.model.slice(0, 10);
+    }
+  }
+
+  // Build suffix parts
+  const suffixParts: string[] = [];
+
+  // Add temperature suffix if non-default
+  if (config.temperature === 'default') {
+    suffixParts.push('tDef');
+  } else {
+    const defaultTemp = getDefaultTemperature(config.model);
+    const temp = config.temperature ?? defaultTemp;
+    if (temp !== defaultTemp) {
+      suffixParts.push(`t${temp}`);
+    }
+  }
+
+  // Add thinking suffix if disabled
+  if (config.thinking === false) {
+    suffixParts.push('noThink');
+  }
+
+  if (suffixParts.length > 0) {
+    return `${shortName}-${suffixParts.join('-')}`;
+  }
+
+  return shortName;
+}
+
+/**
+ * Generate a unique extractor ID (for telemetry correlation)
+ */
+export function generateExtractorId(
+  config: ExtractorConfig,
+  index: number,
+  allConfigs: ExtractorConfig[]
+): string {
+  const label = generateExtractorLabel(config);
+
+  // Check if this label would be duplicated
+  const sameLabels = allConfigs.filter(c => generateExtractorLabel(c) === label);
+
+  // Only append index if there are duplicates
+  if (sameLabels.length > 1) {
+    return `${label}-${index}`;
+  }
+  return label;
+}
+
+/**
+ * Parse and validate the FALLACY_EXTRACTORS environment variable
+ *
+ * Expected format:
+ * ```json
+ * [
+ *   {"model": "claude-sonnet-4-5-20250929"},
+ *   {"model": "claude-sonnet-4-5-20250929", "temperature": 0.5},
+ *   {"model": "google/gemini-3-flash-preview", "temperature": 0.1}
+ * ]
+ * ```
+ */
+function parseExtractorsEnvVar(envValue: string): ExtractorConfig[] {
+  try {
+    const parsed = JSON.parse(envValue);
+
+    if (!Array.isArray(parsed)) {
+      logger.warn('[MultiExtractor] FALLACY_EXTRACTORS must be a JSON array, using defaults');
+      return [];
+    }
+
+    const configs: ExtractorConfig[] = [];
+    for (const item of parsed) {
+      if (typeof item !== 'object' || item === null) {
+        logger.warn('[MultiExtractor] Invalid extractor config, skipping:', { item });
+        continue;
+      }
+
+      if (typeof item.model !== 'string' || !item.model) {
+        logger.warn('[MultiExtractor] Extractor config missing model, skipping:', { item });
+        continue;
+      }
+
+      const config: ExtractorConfig = {
+        model: item.model,
+      };
+
+      // Temperature can be a number or "default" string
+      if (typeof item.temperature === 'number') {
+        config.temperature = item.temperature;
+      } else if (item.temperature === 'default') {
+        config.temperature = 'default';
+      }
+
+      if (typeof item.label === 'string' && item.label) {
+        config.label = item.label;
+      }
+
+      // Thinking defaults to true (enabled), can be set to false
+      if (typeof item.thinking === 'boolean') {
+        config.thinking = item.thinking;
+      }
+
+      configs.push(config);
+    }
+
+    return configs;
+  } catch (error) {
+    logger.warn('[MultiExtractor] Failed to parse FALLACY_EXTRACTORS:', {
+      error: error instanceof Error ? error.message : error,
+    });
+    return [];
+  }
+}
+
+/**
+ * Parse FALLACY_JUDGE env var
+ *
+ * Example:
+ * FALLACY_JUDGE='{"model":"google/gemini-3-flash-preview","temperature":"default","thinking":false,"enabled":true}'
+ */
+function parseJudgeEnvVar(): JudgeConfig {
+  const judgeEnv = process.env.FALLACY_JUDGE;
+
+  if (judgeEnv) {
+    try {
+      const parsed = JSON.parse(judgeEnv);
+      if (typeof parsed === 'object' && parsed !== null && typeof parsed.model === 'string') {
+        return {
+          model: parsed.model,
+          temperature: typeof parsed.temperature === 'number' ? parsed.temperature :
+                       parsed.temperature === 'default' ? 'default' : undefined,
+          thinking: typeof parsed.thinking === 'boolean' ? parsed.thinking : undefined,
+          enabled: parsed.enabled !== false,
+        };
+      }
+    } catch {
+      // Invalid JSON - fall through to default
+    }
+  }
+
+  // Default: disabled
+  return {
+    model: DEFAULT_JUDGE_MODEL,
+    enabled: false,
+  };
+}
+
+/**
+ * Get the multi-extractor configuration from environment variables
+ *
+ * Environment variables:
+ * - FALLACY_EXTRACTORS: JSON array of extractor configs
+ * - FALLACY_JUDGE: JSON object with judge config (model, temperature, thinking, enabled)
+ *
+ * Defaults to single extractor with DEFAULT_EXTRACTOR_MODEL if not configured.
+ */
+export function getMultiExtractorConfig(): MultiExtractorConfig {
+  const extractorsEnv = process.env.FALLACY_EXTRACTORS;
+
+  let extractors: ExtractorConfig[];
+
+  if (extractorsEnv) {
+    extractors = parseExtractorsEnvVar(extractorsEnv);
+    if (extractors.length === 0) {
+      logger.warn('[MultiExtractor] No valid extractors in FALLACY_EXTRACTORS, using defaults');
+      extractors = [{ model: DEFAULT_EXTRACTOR_MODEL }];
+    }
+  } else {
+    extractors = [{ model: DEFAULT_EXTRACTOR_MODEL }];
+  }
+
+  return {
+    extractors,
+    judge: parseJudgeEnvVar(),
+  };
+}
+
+/**
+ * Check if LLM judge is enabled for aggregation
+ */
+export function isJudgeEnabled(): boolean {
+  const config = getMultiExtractorConfig();
+  return config.judge.enabled;
+}
+
+/**
+ * Check if multi-extractor mode is enabled (more than one extractor configured)
+ */
+export function isMultiExtractorEnabled(): boolean {
+  const config = getMultiExtractorConfig();
+  return config.extractors.length > 1;
+}
+
+/**
+ * Convert a profile config to MultiExtractorConfig
+ *
+ * This allows using database-stored profiles instead of environment variables.
+ */
+export function profileToMultiExtractorConfig(
+  profileConfig: FallacyCheckerProfileConfig
+): MultiExtractorConfig {
+  return {
+    extractors: profileConfig.models.extractors,
+    judge: profileConfig.models.judge,
+  };
+}
+
+/**
+ * Get multi-extractor configuration from a profile or fall back to environment variables
+ *
+ * @param profileConfig Optional profile config from database
+ */
+export function getMultiExtractorConfigFromProfile(
+  profileConfig?: FallacyCheckerProfileConfig
+): MultiExtractorConfig {
+  if (profileConfig) {
+    return profileToMultiExtractorConfig(profileConfig);
+  }
+  return getMultiExtractorConfig();
+}
+
+/**
+ * Get a human-readable summary of the current configuration
+ */
+export function getConfigSummary(): string {
+  const config = getMultiExtractorConfig();
+
+  const formatTemp = (ext: ExtractorConfig): string => {
+    if (ext.temperature === 'default') return 'default';
+    return String(ext.temperature ?? getDefaultTemperature(ext.model));
+  };
+
+  const formatThinking = (ext: ExtractorConfig): string => {
+    return ext.thinking === false ? ', think=off' : '';
+  };
+
+  if (config.extractors.length === 1) {
+    const ext = config.extractors[0];
+    return `Single extractor: ${ext.model} (t=${formatTemp(ext)}${formatThinking(ext)})`;
+  }
+
+  const extractorSummaries = config.extractors.map((ext, i) => {
+    const label = generateExtractorLabel(ext);
+    return `${i + 1}. ${label} (${ext.model}, t=${formatTemp(ext)}${formatThinking(ext)})`;
+  });
+
+  const judgeStatus = config.judge.enabled
+    ? `${config.judge.model} (t=${config.judge.temperature ?? 'default'}, think=${config.judge.thinking !== false})`
+    : 'disabled';
+
+  return [
+    `Multi-extractor mode: ${config.extractors.length} extractors`,
+    ...extractorSummaries,
+    `Judge: ${judgeStatus}`,
+  ].join('\n');
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/index.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/index.ts
new file mode 100644
index 000000000..1f083a260
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/index.ts
@@ -0,0 +1,9 @@
+/**
+ * Multi-Extractor Module
+ *
+ * Provides parallel extraction with multiple models and LLM judge aggregation.
+ */
+
+export * from './types';
+export * from './config';
+export { runMultiExtractor } from './multiExtractor';
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/lab-exports.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/lab-exports.ts
new file mode 100644
index 000000000..a7559d2ac
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/lab-exports.ts
@@ -0,0 +1,315 @@
+/**
+ * Lab-specific exports for Extractor Lab
+ *
+ * This file provides STANDALONE types and config parsing for the Extractor Lab
+ * without importing from files that have circular dependencies with the plugin system.
+ *
+ * The types here are intentionally duplicated to avoid the circular dependency chain:
+ * extraction → fallacy-extractor → constants → (back to plugin)
+ */
+
+// ============================================================================
+// Standalone Type Definitions (duplicated to avoid cycles)
+// ============================================================================
+
+/** Type of epistemic issue (duplicated from constants.ts ISSUE_TYPES) */
+export type IssueType =
+  | 'misinformation'
+  | 'missing-context'
+  | 'deceptive-wording'
+  | 'logical-fallacy'
+  | 'verified-accurate';
+
+/** Specific types of fallacies */
+export type FallacyType =
+  | 'ad-hominem'
+  | 'straw-man'
+  | 'false-dilemma'
+  | 'slippery-slope'
+  | 'appeal-to-authority'
+  | 'appeal-to-emotion'
+  | 'appeal-to-nature'
+  | 'hasty-generalization'
+  | 'survivorship-bias'
+  | 'selection-bias'
+  | 'cherry-picking'
+  | 'circular-reasoning'
+  | 'equivocation'
+  | 'non-sequitur'
+  | 'other';
+
+/** Raw epistemic issue extracted from text */
+export interface ExtractedFallacyIssue {
+  exactText: string;
+  issueType: IssueType;
+  fallacyType?: FallacyType;
+  severityScore: number;
+  confidenceScore: number;
+  reasoning: string;
+  importanceScore: number;
+  approximateLineNumber?: number;
+  location?: {
+    startOffset: number;
+    endOffset: number;
+    quotedText: string;
+    strategy?: string;
+    confidence?: number;
+  };
+  [key: string]: unknown;
+}
+
+// ============================================================================
+// Configuration Types
+// ============================================================================
+
+export interface ExtractorConfig {
+  model: string;
+  temperature?: number | 'default';
+  label?: string;
+  thinking?: boolean;
+}
+
+/**
+ * Judge configuration
+ */
+export interface JudgeConfig {
+  model: string;
+  temperature?: number | 'default';
+  thinking?: boolean;
+  enabled: boolean;
+}
+
+export interface MultiExtractorConfig {
+  extractors: ExtractorConfig[];
+  judge: JudgeConfig;
+}
+
+// ============================================================================
+// Result Types
+// ============================================================================
+
+export interface ExtractorResult {
+  extractorId: string;
+  config: ExtractorConfig;
+  issues: ExtractedFallacyIssue[];
+  durationMs: number;
+  costUsd?: number;
+  error?: string;
+}
+
+export interface MultiExtractorResult {
+  extractorResults: ExtractorResult[];
+  totalDurationMs: number;
+  totalIssuesFound: number;
+}
+
+// ============================================================================
+// Config Parsing (standalone implementation)
+// ============================================================================
+
+const DEFAULT_EXTRACTOR_MODEL = 'claude-sonnet-4-5-20250929';
+const DEFAULT_JUDGE_MODEL = 'claude-sonnet-4-5-20250929';
+const DEFAULT_CLAUDE_TEMPERATURE = 0;
+const DEFAULT_OPENROUTER_TEMPERATURE = 0.1;
+
+function isOpenRouterModel(model: string): boolean {
+  return model.includes('/');
+}
+
+export function getDefaultTemperature(model: string): number {
+  return isOpenRouterModel(model)
+    ? DEFAULT_OPENROUTER_TEMPERATURE
+    : DEFAULT_CLAUDE_TEMPERATURE;
+}
+
+export function generateExtractorLabel(config: ExtractorConfig): string {
+  if (config.label) {
+    return config.label;
+  }
+
+  let shortName: string;
+  if (isOpenRouterModel(config.model)) {
+    const parts = config.model.split('/');
+    shortName = parts[parts.length - 1].replace('-preview', '').replace('-latest', '');
+  } else {
+    if (config.model.includes('opus')) {
+      shortName = 'opus';
+    } else if (config.model.includes('sonnet')) {
+      shortName = 'sonnet';
+    } else if (config.model.includes('haiku')) {
+      shortName = 'haiku';
+    } else {
+      shortName = config.model.slice(0, 10);
+    }
+  }
+
+  const suffixParts: string[] = [];
+
+  if (config.temperature === 'default') {
+    suffixParts.push('tDef');
+  } else {
+    const defaultTemp = getDefaultTemperature(config.model);
+    const temp = config.temperature ?? defaultTemp;
+    if (temp !== defaultTemp) {
+      suffixParts.push(`t${temp}`);
+    }
+  }
+
+  if (config.thinking === false) {
+    suffixParts.push('noThink');
+  }
+
+  if (suffixParts.length > 0) {
+    return `${shortName}-${suffixParts.join('-')}`;
+  }
+
+  return shortName;
+}
+
+export function generateExtractorId(
+  config: ExtractorConfig,
+  index: number,
+  allConfigs: ExtractorConfig[]
+): string {
+  const label = generateExtractorLabel(config);
+  const sameLabels = allConfigs.filter(c => generateExtractorLabel(c) === label);
+  if (sameLabels.length > 1) {
+    return `${label}-${index}`;
+  }
+  return label;
+}
+
+function parseExtractorsEnvVar(envValue: string): ExtractorConfig[] {
+  try {
+    const parsed = JSON.parse(envValue);
+
+    if (!Array.isArray(parsed)) {
+      console.warn(
+        '[MultiExtractor] FALLACY_EXTRACTORS must be a JSON array, using defaults'
+      );
+      return [];
+    }
+
+    const configs: ExtractorConfig[] = [];
+    for (const item of parsed) {
+      if (typeof item !== 'object' || item === null) {
+        continue;
+      }
+
+      if (typeof item.model !== 'string' || !item.model) {
+        continue;
+      }
+
+      const config: ExtractorConfig = {
+        model: item.model,
+      };
+
+      if (typeof item.temperature === 'number') {
+        config.temperature = item.temperature;
+      } else if (item.temperature === 'default') {
+        config.temperature = 'default';
+      }
+
+      if (typeof item.label === 'string' && item.label) {
+        config.label = item.label;
+      }
+
+      if (typeof item.thinking === 'boolean') {
+        config.thinking = item.thinking;
+      }
+
+      configs.push(config);
+    }
+
+    return configs;
+  } catch (error) {
+    console.warn(
+      '[MultiExtractor] Failed to parse FALLACY_EXTRACTORS:',
+      error instanceof Error ? error.message : error
+    );
+    return [];
+  }
+}
+
+/**
+ * Parse FALLACY_JUDGE env var
+ */
+function parseJudgeEnvVar(): JudgeConfig {
+  const judgeEnv = process.env.FALLACY_JUDGE;
+
+  if (judgeEnv) {
+    try {
+      const parsed = JSON.parse(judgeEnv);
+      if (typeof parsed === 'object' && parsed !== null && typeof parsed.model === 'string') {
+        return {
+          model: parsed.model,
+          temperature: typeof parsed.temperature === 'number' ? parsed.temperature :
+                       parsed.temperature === 'default' ? 'default' : undefined,
+          thinking: typeof parsed.thinking === 'boolean' ? parsed.thinking : undefined,
+          enabled: parsed.enabled !== false,
+        };
+      }
+    } catch (e) {
+      console.warn('[Config] Failed to parse FALLACY_JUDGE:', e);
+    }
+  }
+
+  // Default: disabled
+  return {
+    model: DEFAULT_JUDGE_MODEL,
+    enabled: false,
+  };
+}
+
+export function getMultiExtractorConfig(): MultiExtractorConfig {
+  const extractorsEnv = process.env.FALLACY_EXTRACTORS;
+
+  let extractors: ExtractorConfig[];
+
+  if (extractorsEnv) {
+    extractors = parseExtractorsEnvVar(extractorsEnv);
+    if (extractors.length === 0) {
+      extractors = [{ model: DEFAULT_EXTRACTOR_MODEL }];
+    }
+  } else {
+    extractors = [{ model: DEFAULT_EXTRACTOR_MODEL }];
+  }
+
+  return {
+    extractors,
+    judge: parseJudgeEnvVar(),
+  };
+}
+
+export function getConfigSummary(): string {
+  const config = getMultiExtractorConfig();
+
+  const formatTemp = (ext: ExtractorConfig): string => {
+    if (ext.temperature === 'default') return 'default';
+    return String(ext.temperature ?? getDefaultTemperature(ext.model));
+  };
+
+  const formatThinking = (ext: ExtractorConfig): string => {
+    return ext.thinking === false ? ', think=off' : '';
+  };
+
+  if (config.extractors.length === 1) {
+    const ext = config.extractors[0];
+    return `Single extractor: ${ext.model} (t=${formatTemp(ext)}${formatThinking(ext)})`;
+  }
+
+  const extractorSummaries = config.extractors.map((ext, i) => {
+    const label = generateExtractorLabel(ext);
+    return `${i + 1}. ${label} (${ext.model}, t=${formatTemp(ext)}${formatThinking(ext)})`;
+  });
+
+  const judgeStatus = config.judge.enabled
+    ? `${config.judge.model} (t=${config.judge.temperature ?? 'default'}, think=${config.judge.thinking !== false})`
+    : 'disabled';
+
+  return [
+    `Multi-extractor mode: ${config.extractors.length} extractors`,
+    ...extractorSummaries,
+    `Judge: ${judgeStatus}`,
+  ].join('\n');
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/multiExtractor.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/multiExtractor.ts
new file mode 100644
index 000000000..9e4ec60e9
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/multiExtractor.ts
@@ -0,0 +1,359 @@
+/**
+ * Multi-Extractor Runner
+ *
+ * Runs multiple fallacy extractors in parallel and aggregates results.
+ * Supports different models and/or temperatures for diversity.
+ */
+
+import { logger } from '../../../../shared/logger';
+import fallacyExtractorTool from '../../../../tools/fallacy-extractor';
+import type { ExtractedFallacyIssue } from '../../../../tools/fallacy-extractor/types';
+import type {
+  ExtractorConfig,
+  MultiExtractorConfig,
+  ExtractorResult,
+  MultiExtractorResult,
+  ExtractionThresholds,
+  ReasoningConfig,
+} from './types';
+import { generateExtractorId, getDefaultTemperature } from './config';
+import { JACCARD_SIMILARITY_THRESHOLD, type ReasoningEffort } from '../../../../types/common';
+
+interface ResolvedReasoning {
+  thinkingEnabled: boolean;
+  reasoningEffort?: ReasoningEffort;
+}
+
+interface DeduplicationResult {
+  deduplicated: ExtractedFallacyIssue[];
+  removedCount: number;
+}
+
+interface DuplicateMatch<T> {
+  keptIdx: number;
+  kept: T;
+  similarity: number;
+}
+
+/**
+ * Resolve reasoning config to thinking boolean and reasoning effort level.
+ *
+ * @param reasoning - The reasoning config from profile
+ * @param thinking - The deprecated thinking boolean (fallback)
+ * @returns Object with thinkingEnabled and optional reasoningEffort
+ */
+function resolveReasoning(
+  reasoning: ReasoningConfig | undefined,
+  thinking?: boolean
+): ResolvedReasoning {
+  // New reasoning config takes precedence
+  if (reasoning !== undefined) {
+    // false = disabled
+    if (reasoning === false) {
+      return { thinkingEnabled: false, reasoningEffort: 'none' };
+    }
+    // Effort level specified
+    if ('effort' in reasoning) {
+      return { thinkingEnabled: true, reasoningEffort: reasoning.effort };
+    }
+    // Budget tokens specified - use xhigh (we can't pass custom budget to OpenRouter)
+    if ('budget_tokens' in reasoning) {
+      return { thinkingEnabled: true, reasoningEffort: 'xhigh' };
+    }
+  }
+
+  // Fall back to legacy thinking boolean (default true)
+  if (thinking === false) {
+    return { thinkingEnabled: false, reasoningEffort: 'none' };
+  }
+
+  // Default: enabled without explicit effort (let model decide)
+  return { thinkingEnabled: true };
+}
+
+/**
+ * Run a single extractor with the given configuration
+ */
+async function runSingleExtractor(
+  documentText: string,
+  config: ExtractorConfig,
+  extractorId: string,
+  thresholds?: ExtractionThresholds
+): Promise<ExtractorResult> {
+  const startTime = Date.now();
+
+  // Handle temperature: "default" means don't pass, undefined means use our default
+  const temperatureForLog = config.temperature === 'default'
+    ? 'default'
+    : (typeof config.temperature === 'number' ? config.temperature : getDefaultTemperature(config.model));
+
+  // Resolve thinking and reasoning effort from config
+  const { thinkingEnabled, reasoningEffort } = resolveReasoning(config.reasoning, config.thinking);
+
+  logger.info(`[MultiExtractor] Starting extractor: ${extractorId}`, {
+    model: config.model,
+    temperature: temperatureForLog,
+    thinking: thinkingEnabled,
+    reasoningEffort,
+    reasoning: config.reasoning,
+    documentLength: documentText.length,
+    minSeverityThreshold: thresholds?.minSeverityThreshold,
+    maxIssues: thresholds?.maxIssues,
+  });
+
+  try {
+    const result = await fallacyExtractorTool.execute(
+      {
+        documentText,
+        model: config.model,
+        // Pass temperature as-is (can be number, "default", or undefined)
+        temperature: config.temperature,
+        // Pass resolved thinking value (new reasoning takes precedence over legacy thinking)
+        thinking: thinkingEnabled,
+        // Pass reasoning effort for OpenRouter models
+        reasoningEffort,
+        // Pass thresholds from profile config
+        minSeverityThreshold: thresholds?.minSeverityThreshold,
+        maxIssues: thresholds?.maxIssues,
+        // Pass provider preferences for OpenRouter
+        ...(config.provider && { provider: config.provider }),
+      },
+      { logger }
+    );
+
+    const durationMs = Date.now() - startTime;
+
+    logger.info(`[MultiExtractor] Extractor ${extractorId} completed`, {
+      issuesFound: result.issues.length,
+      durationMs,
+      wasComplete: result.wasComplete,
+    });
+
+    return {
+      extractorId,
+      config,
+      issues: result.issues,
+      durationMs,
+      actualApiParams: result.actualApiParams,
+      responseMetrics: result.responseMetrics,
+      unifiedUsage: result.unifiedUsage,
+    };
+  } catch (error) {
+    const durationMs = Date.now() - startTime;
+    const errorMessage = error instanceof Error ? error.message : String(error);
+
+    logger.error(`[MultiExtractor] Extractor ${extractorId} failed`, {
+      error: errorMessage,
+      durationMs,
+    });
+
+    return {
+      extractorId,
+      config,
+      issues: [],
+      durationMs,
+      error: errorMessage,
+    };
+  }
+}
+
+/**
+ * Run multiple extractors in parallel
+ *
+ * @param documentText - Full document text to analyze
+ * @param config - Multi-extractor configuration
+ * @returns Combined results from all extractors
+ */
+export async function runMultiExtractor(
+  documentText: string,
+  config: MultiExtractorConfig
+): Promise<MultiExtractorResult> {
+  const startTime = Date.now();
+  const { extractors } = config;
+
+  logger.info(`[MultiExtractor] Starting parallel extraction`, {
+    extractorCount: extractors.length,
+    documentLength: documentText.length,
+  });
+
+  // Generate unique IDs for each extractor
+  const extractorsWithIds = extractors.map((ext, index) => ({
+    config: ext,
+    extractorId: generateExtractorId(ext, index, extractors),
+  }));
+
+  // Run all extractors in parallel
+  const extractorPromises = extractorsWithIds.map(({ config: extConfig, extractorId }) =>
+    runSingleExtractor(documentText, extConfig, extractorId, config.thresholds)
+  );
+
+  const settledResults = await Promise.allSettled(extractorPromises);
+
+  // Process results
+  const extractorResults: ExtractorResult[] = settledResults.map((result, index) => {
+    if (result.status === 'fulfilled') {
+      return result.value;
+    }
+
+    // Promise rejection (shouldn't happen since we catch inside runSingleExtractor)
+    const extConfig = extractorsWithIds[index];
+    return {
+      extractorId: extConfig.extractorId,
+      config: extConfig.config,
+      issues: [],
+      durationMs: 0,
+      error: result.reason instanceof Error ? result.reason.message : String(result.reason),
+    };
+  });
+
+  const totalDurationMs = Date.now() - startTime;
+  const totalIssuesFound = extractorResults.reduce(
+    (sum, r) => sum + r.issues.length,
+    0
+  );
+
+  // Log summary
+  const successCount = extractorResults.filter((r) => !r.error).length;
+  const failedCount = extractorResults.filter((r) => r.error).length;
+
+  logger.info(`[MultiExtractor] Parallel extraction complete`, {
+    totalDurationMs,
+    totalIssuesFound,
+    successCount,
+    failedCount,
+    extractorSummaries: extractorResults.map((r) => ({
+      extractorId: r.extractorId,
+      issuesFound: r.issues.length,
+      durationMs: r.durationMs,
+      error: r.error,
+    })),
+  });
+
+  return {
+    extractorResults,
+    totalDurationMs,
+    totalIssuesFound,
+  };
+}
+
+/**
+ * Flatten all issues from multi-extractor results with source tracking
+ *
+ * @param result - Multi-extractor result
+ * @returns Array of issues with extractorId attached
+ */
+export function flattenExtractorIssues(
+  result: MultiExtractorResult
+): Array<ExtractedFallacyIssue & { extractorId: string }> {
+  const allIssues: Array<ExtractedFallacyIssue & { extractorId: string }> = [];
+
+  for (const extractor of result.extractorResults) {
+    for (const issue of extractor.issues) {
+      allIssues.push({
+        ...issue,
+        extractorId: extractor.extractorId,
+      });
+    }
+  }
+
+  return allIssues;
+}
+
+/**
+ * Normalize text for comparison.
+ */
+function normalizeTextForDedup(text: string | undefined | null): string {
+  if (!text) return '';
+  return text.toLowerCase().replace(/\s+/g, ' ').trim();
+}
+
+/**
+ * Calculate Jaccard similarity between two texts based on word overlap.
+ * Returns a value between 0 (no overlap) and 1 (identical).
+ */
+function calculateJaccardSimilarity(textA: string, textB: string): number {
+  const wordsA = new Set(normalizeTextForDedup(textA).split(/\s+/).filter(Boolean));
+  const wordsB = new Set(normalizeTextForDedup(textB).split(/\s+/).filter(Boolean));
+
+  if (wordsA.size === 0 && wordsB.size === 0) return 1;
+  if (wordsA.size === 0 || wordsB.size === 0) return 0;
+
+  let intersection = 0;
+  for (const word of wordsA) {
+    if (wordsB.has(word)) intersection++;
+  }
+
+  const union = wordsA.size + wordsB.size - intersection;
+  return union > 0 ? intersection / union : 0;
+}
+
+/**
+ * Compute a quality score for an extracted issue.
+ * Higher = better quality (prefer to keep).
+ */
+function computeExtractedIssueQuality(issue: ExtractedFallacyIssue): number {
+  const textLength = issue.exactText.length;
+  const lengthScore = Math.log10(textLength + 1) / 4;
+  const severityNorm = issue.severityScore / 100;
+  const confidenceNorm = issue.confidenceScore / 100;
+  const importanceNorm = issue.importanceScore / 100;
+
+  return (
+    lengthScore * 0.4 +
+    confidenceNorm * 0.25 +
+    severityNorm * 0.2 +
+    importanceNorm * 0.15
+  );
+}
+
+/**
+ * Deduplicate extracted issues using Jaccard word-overlap similarity.
+ * When duplicates are found, keeps the higher-quality issue.
+ *
+ * This runs BEFORE the judge to reduce the number of issues it needs to process.
+ */
+export function deduplicateExtractedIssues(
+  issues: ExtractedFallacyIssue[]
+): DeduplicationResult {
+  // Filter out issues with no text (malformed responses from LLM)
+  const validIssues = issues.filter(issue => issue.exactText && issue.exactText.trim().length > 0);
+  if (validIssues.length < issues.length) {
+    logger.info(`[Dedup] Filtered out ${issues.length - validIssues.length} issues with empty/missing text`);
+  }
+
+  const unique: ExtractedFallacyIssue[] = [];
+
+  for (const issue of validIssues) {
+    let bestMatch: DuplicateMatch<ExtractedFallacyIssue> | null = null;
+
+    for (let i = 0; i < unique.length; i++) {
+      const kept = unique[i];
+      const similarity = calculateJaccardSimilarity(issue.exactText, kept.exactText);
+
+      if (similarity >= JACCARD_SIMILARITY_THRESHOLD) {
+        if (!bestMatch || similarity > bestMatch.similarity) {
+          bestMatch = { keptIdx: i, kept, similarity };
+        }
+      }
+    }
+
+    if (bestMatch) {
+      const newQuality = computeExtractedIssueQuality(issue);
+      const keptQuality = computeExtractedIssueQuality(bestMatch.kept);
+
+      if (newQuality > keptQuality) {
+        unique[bestMatch.keptIdx] = issue;
+      }
+    } else {
+      unique.push(issue);
+    }
+  }
+
+  const duplicatesRemoved = validIssues.length - unique.length;
+  const totalRemoved = issues.length - unique.length;
+  if (duplicatesRemoved > 0) {
+    logger.info(`[Dedup] Reduced ${validIssues.length} issues to ${unique.length} (${duplicatesRemoved} duplicates removed)`);
+  }
+
+  return { deduplicated: unique, removedCount: totalRemoved };
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/types.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/types.ts
new file mode 100644
index 000000000..6400fea49
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/extraction/types.ts
@@ -0,0 +1,325 @@
+/**
+ * Multi-Extractor Types
+ *
+ * Configuration and result types for running multiple fallacy extractors
+ * in parallel with LLM judge aggregation.
+ */
+
+import type { ExtractedFallacyIssue } from '../../../../tools/fallacy-extractor/types';
+import type { UnifiedUsageMetrics } from '../../../../utils/usageMetrics';
+
+// Re-export common types for backwards compatibility
+export {
+  type ReasoningEffort,
+  type ReasoningConfig,
+  type ProviderPreferences,
+  type ActualApiParams,
+  type ApiResponseMetrics,
+  EFFORT_TO_BUDGET_TOKENS,
+} from '../../../../types/common';
+
+import type {
+  ReasoningConfig,
+  ProviderPreferences,
+  ActualApiParams,
+  ApiResponseMetrics,
+} from '../../../../types/common';
+
+// ============================================================================
+// Configuration Types
+// ============================================================================
+
+/**
+ * Configuration for a single extractor instance
+ */
+export interface ExtractorConfig {
+  /** Model ID (Claude or OpenRouter format) */
+  model: string;
+
+  /**
+   * Temperature setting:
+   * - undefined: Use model-specific default (0 for Claude, 0.1 for OpenRouter)
+   * - number: Use this specific temperature
+   * - "default": Let the model use its own default (don't pass temperature)
+   */
+  temperature?: number | 'default';
+
+  /** Optional display label (auto-generated if not provided) */
+  label?: string;
+
+  /**
+   * @deprecated Use reasoning instead
+   * Whether to enable extended thinking/reasoning mode.
+   * - true (default): Enable extended thinking (Claude) / reasoning (OpenRouter/Gemini)
+   * - false: Disable extended thinking for faster, cheaper responses
+   */
+  thinking?: boolean;
+
+  /**
+   * Reasoning/thinking configuration (preferred over thinking boolean)
+   * - undefined: Use default (thinking enabled)
+   * - false: Disable thinking
+   * - { effort: ReasoningEffort }: Use effort level
+   * - { budget_tokens: number }: Custom token budget
+   */
+  reasoning?: ReasoningConfig;
+
+  /**
+   * Provider routing preferences (OpenRouter only)
+   * Allows specifying preferred providers for a model
+   */
+  provider?: ProviderPreferences;
+}
+
+/**
+ * Judge configuration from FALLACY_JUDGE env var
+ *
+ * Example:
+ * FALLACY_JUDGE='{"model":"google/gemini-3-flash-preview","temperature":"default","thinking":false,"enabled":true}'
+ */
+export interface JudgeConfig {
+  /** Model to use (Claude or OpenRouter format) */
+  model: string;
+
+  /** Temperature (number or "default" for model's native default) */
+  temperature?: number | 'default';
+
+  /**
+   * @deprecated Use reasoning instead
+   * Enable extended thinking/reasoning
+   */
+  thinking?: boolean;
+
+  /**
+   * Reasoning/thinking configuration (preferred over thinking boolean)
+   */
+  reasoning?: ReasoningConfig;
+
+  /**
+   * Provider routing preferences (OpenRouter only)
+   */
+  provider?: ProviderPreferences;
+
+  /** Whether the judge is enabled */
+  enabled: boolean;
+}
+
+/**
+ * Threshold configuration for extraction filtering
+ */
+export interface ExtractionThresholds {
+  /** Minimum severity score to include (0-100, default: 60) */
+  minSeverityThreshold?: number;
+
+  /** Maximum issues to return per extractor (default: 15) */
+  maxIssues?: number;
+}
+
+/**
+ * Configuration for multi-extractor execution
+ */
+export interface MultiExtractorConfig {
+  /** List of extractor configurations to run in parallel */
+  extractors: ExtractorConfig[];
+
+  /** Judge configuration */
+  judge: JudgeConfig;
+
+  /** Threshold configuration applied to each extractor */
+  thresholds?: ExtractionThresholds;
+}
+
+// ============================================================================
+// Extractor Result Types
+// ============================================================================
+
+/**
+ * Result from a single extractor run
+ */
+export interface ExtractorResult {
+  /** Unique identifier for this extractor (e.g., "sonnet-t0", "gemini-flash-t0.1") */
+  extractorId: string;
+
+  /** The configuration used for this extractor */
+  config: ExtractorConfig;
+
+  /** Issues extracted by this model */
+  issues: ExtractedFallacyIssue[];
+
+  /** Execution time in milliseconds */
+  durationMs: number;
+
+  /** Cost in USD (if available) */
+  costUsd?: number;
+
+  /** Error message if extraction failed */
+  error?: string;
+
+  /** Actual parameters sent to the API (source of truth) */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics from the API call */
+  responseMetrics?: ApiResponseMetrics;
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
+}
+
+/**
+ * Combined result from running multiple extractors in parallel
+ */
+export interface MultiExtractorResult {
+  /** Results from each extractor */
+  extractorResults: ExtractorResult[];
+
+  /** Wall clock time (parallel execution) */
+  totalDurationMs: number;
+
+  /** Total issues across all extractors (before dedup/judge) */
+  totalIssuesFound: number;
+}
+
+// ============================================================================
+// Judge Types
+// ============================================================================
+
+/**
+ * Reference to an issue from a specific extractor
+ */
+export interface ExtractorIssueRef {
+  extractorId: string;
+  issue: ExtractedFallacyIssue;
+}
+
+/**
+ * An issue after judge evaluation with provenance tracking
+ */
+export interface JudgedIssue {
+  /** The final merged/selected issue */
+  issue: ExtractedFallacyIssue;
+
+  /** Which extractors found this or similar issues */
+  sourceExtractors: string[];
+
+  /** The original issues that were merged/deduplicated into this one */
+  originalIssues: ExtractorIssueRef[];
+
+  /** Judge's decision */
+  decision: 'accepted' | 'merged' | 'rejected';
+
+  /** Judge's reasoning for this decision */
+  reasoning: string;
+}
+
+/**
+ * Output from the LLM judge aggregator
+ */
+export interface JudgeOutput {
+  /** Issues accepted by the judge */
+  acceptedIssues: JudgedIssue[];
+
+  /** Issues rejected by the judge (for telemetry) */
+  rejectedIssues: JudgedIssue[];
+
+  /** Judge execution time */
+  durationMs: number;
+
+  /** Judge cost in USD (if available) */
+  costUsd?: number;
+
+  /** Unified usage metrics for the judge */
+  unifiedUsage?: UnifiedUsageMetrics;
+}
+
+// ============================================================================
+// Telemetry Types
+// ============================================================================
+
+/**
+ * Telemetry for a single extractor
+ */
+export interface ExtractorTelemetry {
+  extractorId: string;
+  model: string;
+
+  /**
+   * Effective temperature used for this extractor.
+   * This is the actual value sent to the API (resolved from config).
+   */
+  temperature: number;
+
+  /**
+   * Original temperature configuration.
+   * - "default": Model's native default was used
+   * - number: Explicit temperature was configured
+   * - undefined: Our model-specific default was used
+   */
+  temperatureConfig?: number | 'default';
+
+  /**
+   * Whether extended thinking/reasoning was enabled.
+   * - true: Thinking enabled (Claude) / high reasoning (OpenRouter)
+   * - false: Thinking disabled for faster, cheaper responses
+   */
+  thinkingEnabled: boolean;
+
+  issuesFound: number;
+  durationMs: number;
+  costUsd?: number;
+  error?: string;
+
+  /** Breakdown of issues by type */
+  issuesByType: Record<string, number>;
+}
+
+/**
+ * Record of a judge decision (for drill-down)
+ */
+export interface JudgeDecisionRecord {
+  /** The quoted text from the issue */
+  issueText: string;
+
+  /** Issue type (e.g., "logical-fallacy", "missing-context") */
+  issueType: string;
+
+  /** Judge's decision */
+  decision: 'accepted' | 'merged' | 'rejected';
+
+  /** Judge's reasoning */
+  reasoning: string;
+
+  /** Which extractors found this issue */
+  sourceExtractors: string[];
+
+  /** Final severity after judge assessment */
+  finalSeverity?: number;
+
+  /** Final confidence after judge assessment */
+  finalConfidence?: number;
+}
+
+/**
+ * Complete telemetry for the extraction phase
+ */
+export interface ExtractionPhaseTelemetry {
+  /** Per-extractor breakdown */
+  extractors: ExtractorTelemetry[];
+
+  /** Total issues before judge aggregation */
+  totalIssuesBeforeJudge: number;
+
+  /** Total issues after judge aggregation */
+  totalIssuesAfterJudge: number;
+
+  /** Model used for judge */
+  judgeModel: string;
+
+  /** Judge execution time */
+  judgeDurationMs: number;
+
+  /** Judge cost in USD */
+  judgeCostUsd?: number;
+
+  /** Detailed decisions for drill-down */
+  judgeDecisions: JudgeDecisionRecord[];
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/index.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/index.ts
index 14a46db7a..665218a0f 100644
--- a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/index.ts
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/index.ts
@@ -1,20 +1,72 @@
-import {
-  getGlobalSessionManager,
-} from "../../../helicone/simpleSessionManager";
 import { logger } from "../../../shared/logger";
-import type { Comment, ToolChainResult } from "../../../shared/types";
+import type { Comment } from "../../../shared/types";
 import fallacyExtractorTool from "../../../tools/fallacy-extractor";
+import type { ExtractedFallacyIssue } from "../../../tools/fallacy-extractor/types";
 import fuzzyTextLocatorTool from "../../../tools/smart-text-searcher";
 import fallacyReviewTool from "../../../tools/fallacy-review";
+import supportedElsewhereFilterTool from "../../../tools/supported-elsewhere-filter";
+import principleOfCharityFilterTool from "../../../tools/principle-of-charity-filter";
+import fallacyJudgeTool from "../../../tools/fallacy-judge";
+import { decisionToIssue } from "../../../tools/fallacy-judge/types";
 import { TextChunk } from "../../TextChunk";
 import type {
   AnalysisResult,
   RoutingExample,
   SimpleAnalysisPlugin,
 } from "../../types";
-import { LIMITS, THRESHOLDS, ISSUE_TYPES } from "./constants";
+import { THRESHOLDS, ISSUE_TYPES } from "./constants";
 import { buildFallacyComment } from "./comments/builder";
 import { FallacyIssue } from "./FallacyIssue";
+import {
+  PipelineTelemetry,
+  PIPELINE_STAGES,
+  type PipelineExecutionRecord,
+  type ExtractionPhaseTelemetry,
+  type ExtractorTelemetry,
+  type JudgeDecisionRecord,
+  type ActualApiParams,
+  type ApiResponseMetrics,
+} from "./telemetry";
+import {
+  getMultiExtractorConfigFromProfile,
+  getDefaultTemperature,
+  getConfigSummary,
+} from "./extraction/config";
+import { runMultiExtractor, deduplicateExtractedIssues } from "./extraction/multiExtractor";
+import type { MultiExtractorConfig, ExtractorConfig, JudgeConfig } from "./extraction/types";
+import { prioritizeAndLimitIssues } from "./dedup";
+import type {
+  FallacyCheckerProfileConfig,
+  SupportedElsewhereFilterConfig,
+  PrincipleOfCharityFilterConfig,
+  ReasoningConfig,
+  FilterChainItem,
+} from "./profile-types";
+import { createDefaultProfileConfig } from "./profile-types";
+import { loadProfileOrDefault } from "./profile-loader";
+
+/**
+ * Options for FallacyCheckPlugin
+ */
+export interface FallacyCheckPluginOptions {
+  /**
+   * Profile ID to load from database.
+   * If provided, profile config is loaded from the database.
+   */
+  profileId?: string;
+
+  /**
+   * Agent ID used to load default profile if profileId is not found.
+   * Defaults to "system-fallacy-check".
+   */
+  agentId?: string;
+
+  /**
+   * Direct profile config - bypasses database loading.
+   * Use this for testing or when the config is already loaded.
+   */
+  profileConfig?: FallacyCheckerProfileConfig;
+}
 
 export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
   private documentText: string;
@@ -25,11 +77,91 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
   private summary: string = "";
   private analysis: string = "";
   private processingStartTime: number = 0;
+  private telemetryRecord: PipelineExecutionRecord | null = null;
+
+  // Profile configuration
+  private options: FallacyCheckPluginOptions;
+  private profileConfig: FallacyCheckerProfileConfig | null = null;
+  private profileLoaded = false;
 
-  constructor() {
-    // Initialize empty values - they'll be set in analyze()
+  constructor(options: FallacyCheckPluginOptions = {}) {
     this.documentText = "";
     this.chunks = [];
+    this.options = options;
+  }
+
+  /**
+   * Load the profile configuration.
+   * Called at the start of analyze() to ensure profile is loaded before use.
+   */
+  private async loadProfile(): Promise<FallacyCheckerProfileConfig> {
+    if (this.profileLoaded && this.profileConfig) {
+      return this.profileConfig;
+    }
+
+    // If config was provided directly, use it
+    if (this.options.profileConfig) {
+      this.profileConfig = this.options.profileConfig;
+      this.profileLoaded = true;
+      return this.profileConfig;
+    }
+
+    // If profileId is provided, load from database
+    if (this.options.profileId || this.options.agentId) {
+      try {
+        this.profileConfig = await loadProfileOrDefault(
+          this.options.profileId,
+          this.options.agentId || 'system-fallacy-check'
+        );
+        this.profileLoaded = true;
+        logger.info('FallacyCheckPlugin: Loaded profile config', {
+          profileId: this.options.profileId,
+          agentId: this.options.agentId,
+          hasProfileConfig: !!this.profileConfig,
+        });
+        return this.profileConfig;
+      } catch (error) {
+        logger.warn('FallacyCheckPlugin: Failed to load profile, using defaults', error);
+      }
+    }
+
+    // Fall back to default config (uses env vars internally)
+    const defaultConfig = createDefaultProfileConfig();
+    this.profileConfig = defaultConfig;
+    this.profileLoaded = true;
+    return defaultConfig;
+  }
+
+  /**
+   * Resolve thinking boolean from extractor config
+   * Checks reasoning config first, falls back to thinking boolean
+   */
+  private resolveThinkingForExtractor(config: ExtractorConfig | undefined): boolean {
+    if (!config) return true; // default enabled
+
+    // New reasoning config takes precedence
+    if (config.reasoning !== undefined) {
+      if (config.reasoning === false) return false;
+      return true; // any effort level or budget_tokens = enabled
+    }
+
+    // Fall back to legacy thinking boolean (default true)
+    return config.thinking !== false;
+  }
+
+  /**
+   * Resolve thinking boolean for judge config.
+   * Checks reasoning config first, falls back to thinking boolean.
+   */
+  private resolveThinkingForJudge(config: JudgeConfig): boolean {
+    // New reasoning config takes precedence
+    if (config.reasoning !== undefined) {
+      if (config.reasoning === false) return false;
+      return true; // any effort level or budget_tokens = enabled
+    }
+
+    // Fall back to legacy thinking boolean (default true)
+    return config.thinking !== false;
   }
 
   name(): string {
@@ -110,7 +242,6 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
     chunks: TextChunk[],
     documentText: string
   ): Promise<AnalysisResult> {
-    // Store the inputs
     this.processingStartTime = Date.now();
     this.documentText = documentText;
     this.chunks = chunks;
@@ -119,6 +250,22 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
       return this.getResults();
     }
 
+    // Load profile configuration FIRST before any analysis
+    const profileConfig = await this.loadProfile();
+
+    // Initialize telemetry - use local const to avoid repeated null assertions
+    const telemetry = new PipelineTelemetry(documentText.length);
+
+    // Record profile info in telemetry
+    telemetry.setProfileInfo({
+      profileId: this.options.profileId,
+      agentId: this.options.agentId || 'system-fallacy-check',
+      thresholds: profileConfig.thresholds,
+      extractorCount: profileConfig.models.extractors.length,
+      judgeEnabled: profileConfig.models.judge.enabled,
+      hasCustomPrompts: !!profileConfig.prompts,
+    });
+
     try {
       // Audit log: Analysis started
       logger.info("FallacyCheckPlugin: AUDIT: Analysis started", {
@@ -126,143 +273,80 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
         documentLength: documentText.length,
         chunkCount: chunks.length,
         operation: "fallacy-check-analysis",
+        profileId: this.options.profileId,
+        thresholds: profileConfig.thresholds,
+        hasCustomPrompts: !!profileConfig.prompts,
       });
 
-      logger.info("FallacyCheckPlugin: Starting analysis");
-      logger.info(`FallacyCheckPlugin: Processing ${chunks.length} chunks`);
-
-      // Phase 1: Extract epistemic issues from all chunks in parallel
-      const extractionPromises = this.chunks.map((chunk) =>
-        this.extractIssuesFromChunk(chunk)
-      );
+      logger.info("FallacyCheckPlugin: Starting analysis (single-pass mode)", {
+        profileId: this.options.profileId,
+        extractorCount: profileConfig.models.extractors.length,
+        judgeEnabled: profileConfig.models.judge.enabled,
+      });
 
-      const extractionResults = await Promise.allSettled(extractionPromises);
-
-      // Collect all extracted issues and track errors
-      const allIssues: FallacyIssue[] = [];
-      const extractionErrors: string[] = [];
-
-      for (const result of extractionResults) {
-        if (result.status === "fulfilled" && result.value) {
-          allIssues.push(...result.value.issues);
-          if (result.value.error) {
-            extractionErrors.push(result.value.error);
-          }
-        } else if (result.status === "rejected") {
-          const error =
-            result.reason instanceof Error
-              ? result.reason.message
-              : "Unknown extraction error";
-          extractionErrors.push(error);
-          logger.warn(`Issue extraction failed for chunk: ${error}`);
-        }
-      }
+      // Phase 1: Single-pass extraction on full document
+      telemetry.startStage(PIPELINE_STAGES.EXTRACTION, 1); // 1 = full document
+      const extractionResult = await this.extractIssuesFromDocument(documentText, telemetry);
+      const allIssues: FallacyIssue[] = extractionResult.issues;
+      telemetry.endStage(allIssues.length, {
+        error: extractionResult.error,
+        metadata: { documentLength: documentText.length },
+      });
+      telemetry.setFinalCounts({ issuesExtracted: allIssues.length });
 
-      // Log summary of errors if any occurred
-      if (extractionErrors.length > 0) {
-        logger.warn(
-          `Issue extraction completed with ${extractionErrors.length} errors`
-        );
+      if (extractionResult.error) {
+        logger.warn(`Issue extraction completed with error: ${extractionResult.error}`);
       }
 
-      // Audit log: Extraction phase completed
       logger.info("FallacyCheckPlugin: AUDIT: Extraction phase completed", {
         timestamp: new Date().toISOString(),
         issuesExtracted: allIssues.length,
-        extractionErrors: extractionErrors.length,
+        extractionError: extractionResult.error || null,
         phase: "extraction",
       });
 
-      // Deduplicate issues by similar text
-      this.issues = this.deduplicateIssues(allIssues);
-
-      // Phase 2: Generate comments for all issues in parallel
-      const commentPromises = this.issues.map(async (issue) => {
-        // Run in next tick to ensure true parallelism
-        await new Promise((resolve) => setImmediate(resolve));
-        const comment = await buildFallacyComment(
-          issue,
-          documentText,
-          { logger }
-        );
-        // Filter out comments with empty descriptions
-        if (
-          comment &&
-          comment.description &&
-          comment.description.trim() !== ""
-        ) {
-          return comment;
+      // Deduplication now happens inside extraction phase, so allIssues is already deduplicated
+      // Just prioritize and limit
+      const prioritizedIssues = prioritizeAndLimitIssues(allIssues);
+      telemetry.setFinalCounts({ issuesAfterDedup: prioritizedIssues.length });
+
+      // Phase 2: Run filters in filterChain order
+      // Iterate through the filter chain and run each enabled filter
+      let filteredIssues = prioritizedIssues;
+      for (const filterConfig of profileConfig.filterChain) {
+        if (!filterConfig.enabled) {
+          logger.info(`FallacyCheckPlugin: Filter ${filterConfig.type} is disabled, skipping`);
+          continue;
         }
-        return null;
-      });
-
-      const commentResults = await Promise.all(commentPromises);
-      const allComments = commentResults.filter(
-        (comment): comment is Comment => comment !== null
-      );
-
-      // Phase 3: Review and filter comments, generate summaries
-      try {
-        const reviewComments = allComments.map((comment, index) => ({
-          index,
-          header: comment.header || "Epistemic Issue",
-          description: comment.description,
-          level: comment.level || 'warning',
-          importance: comment.importance,
-          quotedText: comment.highlight.quotedText,
-        }));
-
-        // Audit log: Review phase started
-        logger.info("FallacyCheckPlugin: AUDIT: Review phase started", {
-          timestamp: new Date().toISOString(),
-          commentsToReview: allComments.length,
-          phase: "review",
-          operation: "fallacy-review-tool",
-        });
 
-        const reviewResult = await fallacyReviewTool.execute(
-          {
-            documentText,
-            comments: reviewComments,
-          },
-          { logger }
+        filteredIssues = await this.runFilter(
+          filterConfig,
+          filteredIssues,
+          documentText,
+          telemetry
         );
+      }
+      telemetry.setFinalCounts({ issuesAfterFiltering: filteredIssues.length });
 
-        // Filter comments based on review
-        this.comments = reviewResult.commentIndicesToKeep.map(
-          (idx) => allComments[idx]
-        );
+      this.issues = filteredIssues;
 
-        // Use summaries from review
-        this.summary = reviewResult.oneLineSummary;
-        this.analysis = reviewResult.documentSummary;
-
-        // Audit log: Review phase completed
-        logger.info("FallacyCheckPlugin: AUDIT: Review phase completed", {
-          timestamp: new Date().toISOString(),
-          commentsReviewed: allComments.length,
-          commentsKept: this.comments.length,
-          commentsFiltered: allComments.length - this.comments.length,
-          phase: "review",
-        });
+      // Phase 3: Generate comments for all issues in parallel
+      telemetry.startStage(PIPELINE_STAGES.COMMENT_GENERATION, this.issues.length);
+      const allComments = await this.generateCommentsForIssues(this.issues, documentText);
+      telemetry.endStage(allComments.length);
+      telemetry.setFinalCounts({ commentsGenerated: allComments.length });
 
-        logger.info(
-          `FallacyCheckPlugin: Review complete - kept ${this.comments.length}/${allComments.length} comments`
-        );
-      } catch (error) {
-        logger.error("FallacyCheckPlugin: Review failed, using fallback", error);
-        // Fallback: keep all comments and use old summary generation
-        this.comments = allComments;
-        const { summary, analysisSummary } = this.generateAnalysis();
-        this.summary = summary;
-        this.analysis = analysisSummary;
-      }
+      // Phase 4: Review and filter comments, generate summaries
+      telemetry.startStage(PIPELINE_STAGES.REVIEW, allComments.length);
+      await this.reviewAndFilterComments(allComments, documentText, telemetry);
 
       this.hasRun = true;
 
-      const totalDuration = Date.now() - this.processingStartTime;
+      // Finalize telemetry
+      this.telemetryRecord = telemetry.finalize(true);
+      telemetry.logSummary();
 
-      // Audit log: Analysis completed successfully
+      const totalDuration = Date.now() - this.processingStartTime;
       logger.info("FallacyCheckPlugin: AUDIT: Analysis completed", {
         timestamp: new Date().toISOString(),
         totalDurationMs: totalDuration,
@@ -281,6 +365,10 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
       const totalDuration = Date.now() - this.processingStartTime;
       const errorMessage = error instanceof Error ? error.message : String(error);
 
+      // Finalize telemetry with error
+      this.telemetryRecord = telemetry.finalize(false, errorMessage);
+      telemetry.logSummary();
+
       // Audit log: Analysis failed
       logger.error("FallacyCheckPlugin: AUDIT: Analysis failed", {
         timestamp: new Date().toISOString(),
@@ -310,45 +398,219 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
       analysis: this.analysis,
       comments: this.comments,
       cost: 0,
+      // Cast to Record<string, unknown> for JSON serialization
+      pipelineTelemetry: this.telemetryRecord as unknown as Record<string, unknown> | undefined,
     };
   }
 
-  private async extractIssuesFromChunk(chunk: TextChunk): Promise<{
+  /**
+   * Extract issues from the full document in a single pass.
+   * This provides complete context for better accuracy and reduces false positives
+   * from flagging intro claims that are supported later in the document.
+   *
+   * Supports multi-extractor mode when multiple extractors are configured
+   * in the profile or FALLACY_EXTRACTORS env var.
+   */
+  private async extractIssuesFromDocument(
+    documentText: string,
+    telemetry: PipelineTelemetry
+  ): Promise<{
     issues: FallacyIssue[];
     error?: string;
   }> {
+    // Use profile-based config if available, otherwise fall back to env vars
+    // Always use the multi-extractor path - it handles 1+ extractors, does dedup
+    // (useful even for single extractor since LLMs can produce duplicates),
+    // and has proper telemetry capture
+    const config = getMultiExtractorConfigFromProfile(this.profileConfig || undefined);
+    return this.extractWithMultiExtractor(documentText, telemetry, config);
+  }
+
+  /**
+   * Extract issues using the unified extractor path.
+   * Handles 1+ extractors, always does dedup (useful even with 1 extractor),
+   * and optionally runs the judge if enabled.
+   */
+  private async extractWithMultiExtractor(
+    documentText: string,
+    telemetry: PipelineTelemetry,
+    config: MultiExtractorConfig
+  ): Promise<{
+    issues: FallacyIssue[];
+    error?: string;
+  }> {
+    logger.info(`[FallacyCheckPlugin] Starting extraction`, {
+      extractorCount: config.extractors.length,
+      judgeEnabled: config.judge.enabled,
+      minSeverityThreshold: this.profileConfig?.thresholds.minSeverityThreshold,
+      maxIssues: this.profileConfig?.thresholds.maxIssues,
+      hasCustomPrompts: !!this.profileConfig?.prompts,
+    });
+    logger.info(getConfigSummary());
+
     try {
-      // Track tool execution if session manager is available
-      const sessionManager = getGlobalSessionManager();
-      const executeExtraction = async () => {
-        return await fallacyExtractorTool.execute(
-          {
-            text: chunk.text,
-            documentText: this.documentText, // Pass full document for location finding
-            chunkStartOffset: chunk.metadata?.position?.start, // Optimize location finding to search chunk first
-          },
-          {
-            logger,
-          }
+      // Phase 1: Run all extractors in parallel
+      const multiResult = await runMultiExtractor(documentText, {
+        ...config,
+        thresholds: {
+          minSeverityThreshold: this.profileConfig?.thresholds.minSeverityThreshold,
+          maxIssues: this.profileConfig?.thresholds.maxIssues,
+        },
+      });
+
+      // Collect telemetry for each extractor
+      const extractorsTelemetry: ExtractorTelemetry[] = multiResult.extractorResults.map(
+        (r) => ({
+          extractorId: r.extractorId,
+          model: r.config.model,
+          // Resolve temperature for telemetry: "default" -> model default, number -> use as-is
+          temperature: typeof r.config.temperature === 'number'
+            ? r.config.temperature
+            : getDefaultTemperature(r.config.model),
+          // Store original config for display
+          temperatureConfig: r.config.temperature,
+          thinkingEnabled: r.config.thinking !== false,
+          issuesFound: r.issues.length,
+          durationMs: r.durationMs,
+          // Get cost from unified usage (preferred) or legacy costUsd field
+          costUsd: r.unifiedUsage?.costUsd ?? r.costUsd,
+          error: r.error,
+          issuesByType: this.countIssuesByType(r.issues),
+          // Include actual API params and response metrics for UI display
+          actualApiParams: r.actualApiParams,
+          responseMetrics: r.responseMetrics,
+          // Include unified usage for detailed cost/token tracking
+          unifiedUsage: r.unifiedUsage,
+        })
+      );
+
+      // Phase 2: Deduplicate all issues using Jaccard similarity
+      const successfulExtractors = multiResult.extractorResults.filter((r) => !r.error);
+      const allExtractedIssues = successfulExtractors.flatMap((r) => r.issues);
+
+      let finalIssues: ExtractedFallacyIssue[];
+      let judgeDecisions: JudgeDecisionRecord[] = [];
+      let judgeDurationMs: number | undefined;
+      let judgeCostUsd: number | undefined;
+      let judgeUnifiedUsage: typeof multiResult.extractorResults[0]['unifiedUsage'];
+      let judgeActualApiParams: ActualApiParams | undefined;
+      let judgeResponseMetrics: ApiResponseMetrics | undefined;
+      let issuesAfterDedup = allExtractedIssues.length;
+
+      if (allExtractedIssues.length === 0) {
+        finalIssues = [];
+      } else {
+        // Always run Jaccard deduplication first
+        const dedupResult = deduplicateExtractedIssues(allExtractedIssues);
+        issuesAfterDedup = dedupResult.deduplicated.length;
+
+        logger.info(
+          `[FallacyCheckPlugin] Deduplication: ${allExtractedIssues.length} → ${issuesAfterDedup} issues (${dedupResult.removedCount} duplicates removed)`
         );
+
+        if (!config.judge.enabled) {
+          // Judge disabled - deduplication is the final step
+          logger.info(`[FallacyCheckPlugin] Judge disabled, using deduplicated issues`);
+          finalIssues = dedupResult.deduplicated;
+        } else {
+          // Judge enabled - run judge on deduplicated issues
+          const judgeInput = {
+            documentText,
+            issues: dedupResult.deduplicated.map((issue) => ({
+              extractorId: 'deduped', // Issues are already merged
+              exactText: issue.exactText,
+              issueType: issue.issueType,
+              fallacyType: issue.fallacyType,
+              severityScore: issue.severityScore,
+              confidenceScore: issue.confidenceScore,
+              importanceScore: issue.importanceScore,
+              reasoning: issue.reasoning,
+            })),
+            extractorIds: successfulExtractors.map((r) => r.extractorId),
+            // Pass judge config from profile to avoid env var fallback
+            judgeConfig: {
+              model: config.judge.model,
+              temperature: config.judge.temperature,
+              thinking: this.resolveThinkingForJudge(config.judge),
+              reasoning: config.judge.reasoning,
+              provider: config.judge.provider,
+              enabled: true, // We're inside the enabled branch
+            },
+          };
+
+          logger.info(
+            `[FallacyCheckPlugin] Running LLM judge on ${judgeInput.issues.length} deduplicated issues`
+          );
+
+          const judgeStartTime = Date.now();
+          const judgeResult = await fallacyJudgeTool.execute(judgeInput, { logger });
+          judgeDurationMs = Date.now() - judgeStartTime;
+          // Get cost and unified usage from judge result
+          judgeCostUsd = judgeResult.unifiedUsage?.costUsd;
+          judgeUnifiedUsage = judgeResult.unifiedUsage;
+          judgeActualApiParams = judgeResult.actualApiParams;
+          judgeResponseMetrics = judgeResult.responseMetrics;
+
+          // Convert judge decisions to issues
+          finalIssues = judgeResult.acceptedDecisions.map((d) => decisionToIssue(d));
+
+          // Record judge decisions for telemetry
+          judgeDecisions = [
+            ...judgeResult.acceptedDecisions.map((d) => ({
+              issueText: d.finalText,
+              issueType: d.finalIssueType,
+              decision: (d.decision === 'accept' || d.decision === 'merge' ? 'accepted' : 'rejected') as 'accepted' | 'merged' | 'rejected',
+              reasoning: d.judgeReasoning,
+              sourceExtractors: d.sourceExtractors,
+              finalSeverity: d.finalSeverity,
+              finalConfidence: d.finalConfidence,
+            })),
+            ...judgeResult.rejectedDecisions.map((d) => ({
+              issueText: d.finalText,
+              issueType: d.finalIssueType,
+              decision: 'rejected' as const,
+              reasoning: d.judgeReasoning,
+              sourceExtractors: d.sourceExtractors,
+              finalSeverity: d.finalSeverity,
+              finalConfidence: d.finalConfidence,
+            })),
+          ];
+
+          logger.info(
+            `[FallacyCheckPlugin] Judge aggregation complete: ${finalIssues.length} accepted, ${judgeResult.rejectedDecisions.length} rejected`
+          );
+        }
+      }
+
+      // Record extraction phase telemetry
+      const extractionTelemetry: ExtractionPhaseTelemetry = {
+        multiExtractorEnabled: true,
+        extractors: extractorsTelemetry,
+        totalIssuesBeforeJudge: multiResult.totalIssuesFound,
+        totalIssuesAfterDedup: issuesAfterDedup,
+        totalIssuesAfterJudge: finalIssues.length,
+        judgeModel: config.judge.model,
+        judgeDurationMs,
+        judgeCostUsd,
+        judgeUnifiedUsage,
+        judgeActualApiParams,
+        judgeResponseMetrics,
+        judgeDecisions,
       };
+      telemetry.setExtractionPhase(extractionTelemetry);
 
-      const result = sessionManager
-        ? await sessionManager.trackTool(
-            "extract-fallacy-issues",
-            executeExtraction
-          )
-        : await executeExtraction();
+      // Create FallacyIssue objects
+      const fullDocChunk = new TextChunk("full-document", documentText, {
+        position: { start: 0, end: documentText.length },
+      });
 
-      const issues = result.issues.map(
-        (issue) => new FallacyIssue(issue, chunk, this.processingStartTime)
+      const issues = finalIssues.map(
+        (issue) => new FallacyIssue(issue, fullDocChunk, this.processingStartTime)
       );
 
-      return {
-        issues,
-      };
+      return { issues };
     } catch (error) {
-      logger.error("Error extracting issues from chunk:", error);
+      logger.error("Error in multi-extractor mode:", error);
       return {
         issues: [],
         error: error instanceof Error ? error.message : "Unknown error",
@@ -356,56 +618,463 @@ export class FallacyCheckPlugin implements SimpleAnalysisPlugin {
     }
   }
 
-  private deduplicateIssues(issues: FallacyIssue[]): FallacyIssue[] {
-    const seen = new Set<string>();
-    const unique: FallacyIssue[] = [];
-
+  /**
+   * Count issues by type for telemetry
+   */
+  private countIssuesByType(issues: ExtractedFallacyIssue[]): Record<string, number> {
+    const counts: Record<string, number> = {};
     for (const issue of issues) {
-      const key = issue.text.toLowerCase().replace(/\s+/g, " ").trim();
-      if (!seen.has(key)) {
-        seen.add(key);
-        unique.push(issue);
+      counts[issue.issueType] = (counts[issue.issueType] || 0) + 1;
+    }
+    return counts;
+  }
+
+  /**
+   * Dispatch to the appropriate filter based on the filter config type.
+   * This enables dynamic filter chain ordering.
+   */
+  private async runFilter(
+    filterConfig: FilterChainItem,
+    issues: FallacyIssue[],
+    documentText: string,
+    telemetry: PipelineTelemetry
+  ): Promise<FallacyIssue[]> {
+    // Get the stage name for telemetry
+    const stageName = this.getFilterStageName(filterConfig.type);
+
+    logger.info(`FallacyCheckPlugin: AUDIT: ${filterConfig.type} filter started`, {
+      timestamp: new Date().toISOString(),
+      issuesToFilter: issues.length,
+      phase: stageName,
+      type: filterConfig.type,
+    });
+
+    switch (filterConfig.type) {
+      case 'principle-of-charity':
+        telemetry.startStage(stageName, issues.length);
+        return this.runPrincipleOfCharityFilter(
+          issues,
+          documentText,
+          telemetry,
+          filterConfig
+        );
+
+      case 'supported-elsewhere':
+        telemetry.startStage(stageName, issues.length);
+        return this.runSupportedElsewhereFilter(
+          issues,
+          documentText,
+          telemetry,
+          filterConfig
+        );
+
+      case 'dedup':
+        // Dedup is handled in extraction phase, but if someone re-adds it here, just pass through
+        logger.info("FallacyCheckPlugin: Dedup filter in chain (already handled in extraction phase)");
+        return issues;
+
+      case 'severity':
+        // Severity filtering - filter by minimum severity threshold
+        const minSeverity = filterConfig.minSeverity;
+        const afterSeverity = issues.filter((issue) => issue.severityScore >= minSeverity);
+        logger.info(`FallacyCheckPlugin: Severity filter: ${issues.length} → ${afterSeverity.length} (min: ${minSeverity})`);
+        return afterSeverity;
+
+      case 'confidence':
+        // Confidence filtering - filter by minimum confidence threshold
+        const minConfidence = filterConfig.minConfidence;
+        const afterConfidence = issues.filter((issue) => issue.confidenceScore >= minConfidence);
+        logger.info(`FallacyCheckPlugin: Confidence filter: ${issues.length} → ${afterConfidence.length} (min: ${minConfidence})`);
+        return afterConfidence;
+
+      case 'review':
+        // Review filter is handled later in the pipeline (after comment generation)
+        logger.info("FallacyCheckPlugin: Review filter in chain (handled after comment generation)");
+        return issues;
+
+      default:
+        // Exhaustive check - TypeScript will error if we miss a case
+        const _exhaustive: never = filterConfig;
+        logger.warn(`FallacyCheckPlugin: Unknown filter type, skipping: ${(filterConfig as FilterChainItem).type}`);
+        return issues;
+    }
+  }
+
+  /**
+   * Get the telemetry stage name for a filter type
+   */
+  private getFilterStageName(filterType: string): string {
+    switch (filterType) {
+      case 'principle-of-charity':
+        return PIPELINE_STAGES.PRINCIPLE_OF_CHARITY_FILTER;
+      case 'supported-elsewhere':
+        return PIPELINE_STAGES.SUPPORTED_ELSEWHERE_FILTER;
+      default:
+        return `${filterType}-filter`;
+    }
+  }
+
+  /**
+   * Run the principle-of-charity filter to remove issues that dissolve under charitable interpretation
+   */
+  private async runPrincipleOfCharityFilter(
+    issues: FallacyIssue[],
+    documentText: string,
+    telemetry: PipelineTelemetry,
+    filterConfig?: PrincipleOfCharityFilterConfig
+  ): Promise<FallacyIssue[]> {
+    try {
+      // Build filter input with config settings
+      const filterInput: {
+        documentText: string;
+        issues: Array<{
+          quotedText: string;
+          issueType: string;
+          reasoning: string;
+          locationOffset?: number;
+        }>;
+        model?: string;
+        temperature?: number;
+        reasoning?: ReasoningConfig;
+        provider?: { order?: string[]; allow_fallbacks?: boolean };
+        customPrompt?: string;
+      } = {
+        documentText,
+        issues: issues.map((issue) => ({
+          quotedText: issue.text,
+          issueType: issue.issueType,
+          reasoning: issue.issue.reasoning,
+          locationOffset: issue.issue.location?.startOffset,
+        })),
+      };
+
+      // Apply config settings if provided
+      if (filterConfig) {
+        if (filterConfig.model) {
+          filterInput.model = filterConfig.model;
+        }
+        if (filterConfig.temperature !== undefined && filterConfig.temperature !== 'default') {
+          filterInput.temperature = filterConfig.temperature;
+        }
+        if (filterConfig.reasoning !== undefined) {
+          filterInput.reasoning = filterConfig.reasoning;
+        }
+        if (filterConfig.provider) {
+          filterInput.provider = filterConfig.provider;
+        }
+        if (filterConfig.customPrompt) {
+          filterInput.customPrompt = filterConfig.customPrompt;
+        }
+      }
+
+      const filterResult = await principleOfCharityFilterTool.execute(
+        filterInput,
+        { logger }
+      );
+
+      // Keep only the issues that remain valid under charitable interpretation
+      const validIndices = new Set(
+        filterResult.validIssues.map((r) => r.index)
+      );
+      const filteredIssues = issues.filter((_, idx) =>
+        validIndices.has(idx)
+      );
+
+      // Log and record what was filtered
+      const dissolvedCount = filterResult.dissolvedIssues.length;
+      if (dissolvedCount > 0) {
+        logger.info(
+          `FallacyCheckPlugin: Filtered out ${dissolvedCount} issues (dissolved under charitable interpretation)`
+        );
+
+        // Record filtered items with their reasoning for telemetry
+        const filteredRecords = filterResult.dissolvedIssues.map((dissolved) => {
+          const originalIssue = issues[dissolved.index];
+          logger.debug(`  - Issue ${dissolved.index}: ${dissolved.explanation}`);
+          return {
+            stage: PIPELINE_STAGES.PRINCIPLE_OF_CHARITY_FILTER,
+            quotedText: originalIssue.text || `Issue at index ${dissolved.index}`,
+            header: originalIssue.issueType,
+            filterReason: `Charitable interpretation: ${dissolved.charitableInterpretation}. ${dissolved.explanation}`,
+            originalIndex: dissolved.index,
+          };
+        });
+        telemetry.recordFilteredItems(filteredRecords);
       }
+
+      // Record passed items (issues that remain valid)
+      if (filterResult.validIssues.length > 0) {
+        const passedRecords = filterResult.validIssues.map((valid) => {
+          const originalIssue = issues[valid.index];
+          return {
+            stage: PIPELINE_STAGES.PRINCIPLE_OF_CHARITY_FILTER,
+            quotedText: originalIssue.text || `Issue at index ${valid.index}`,
+            header: originalIssue.issueType,
+            passReason: valid.explanation,
+            originalIndex: valid.index,
+          };
+        });
+        telemetry.recordPassedItems(passedRecords);
+      }
+
+      logger.info("FallacyCheckPlugin: AUDIT: Principle-of-charity filter completed", {
+        timestamp: new Date().toISOString(),
+        issuesBeforeFilter: issues.length,
+        issuesAfterFilter: filteredIssues.length,
+        issuesFiltered: dissolvedCount,
+        phase: "principle-of-charity-filter",
+        costUsd: filterResult.unifiedUsage?.costUsd,
+      });
+
+      telemetry.endStage(filteredIssues.length, {
+        costUsd: filterResult.unifiedUsage?.costUsd,
+        actualApiParams: filterResult.actualApiParams,
+        responseMetrics: filterResult.responseMetrics,
+        unifiedUsage: filterResult.unifiedUsage,
+      });
+      return filteredIssues;
+    } catch (error) {
+      logger.warn("FallacyCheckPlugin: Principle-of-charity filter failed, keeping all issues", error);
+      telemetry.endStage(issues.length, {
+        error: error instanceof Error ? error.message : String(error),
+      });
+      return issues;
     }
+  }
 
-    // Calculate priority score for each issue
-    // Higher score = more important to address
-    const priorityScore = (issue: FallacyIssue) =>
-      issue.severityScore * 0.6 + issue.importanceScore * 0.4;
+  /**
+   * Run the supported-elsewhere filter to remove false positives
+   */
+  private async runSupportedElsewhereFilter(
+    issues: FallacyIssue[],
+    documentText: string,
+    telemetry: PipelineTelemetry,
+    filterConfig?: SupportedElsewhereFilterConfig
+  ): Promise<FallacyIssue[]> {
+    try {
+      // Build filter input with config settings
+      const filterInput: {
+        documentText: string;
+        issues: Array<{
+          quotedText: string;
+          issueType: string;
+          reasoning: string;
+          locationOffset?: number;
+        }>;
+        model?: string;
+        temperature?: number;
+        reasoning?: ReasoningConfig;
+        provider?: { order?: string[]; allow_fallbacks?: boolean };
+        customPrompt?: string;
+      } = {
+        documentText,
+        issues: issues.map((issue) => ({
+          quotedText: issue.text,
+          issueType: issue.issueType,
+          reasoning: issue.issue.reasoning,
+          locationOffset: issue.issue.location?.startOffset,
+        })),
+      };
 
-    // Sort by priority score (most important issues first)
-    const sortedIssues = unique.sort(
-      (a, b) => priorityScore(b) - priorityScore(a)
-    );
+      // Apply config settings if provided
+      if (filterConfig) {
+        if (filterConfig.model) {
+          filterInput.model = filterConfig.model;
+        }
+        if (filterConfig.temperature !== undefined && filterConfig.temperature !== 'default') {
+          filterInput.temperature = filterConfig.temperature;
+        }
+        if (filterConfig.reasoning !== undefined) {
+          filterInput.reasoning = filterConfig.reasoning;
+        }
+        if (filterConfig.provider) {
+          filterInput.provider = filterConfig.provider;
+        }
+        if (filterConfig.customPrompt) {
+          filterInput.customPrompt = filterConfig.customPrompt;
+        }
+      }
 
-    // Limit to maximum issues if we have too many
-    if (sortedIssues.length > LIMITS.MAX_ISSUES_TO_PROCESS) {
-      logger.info(
-        `Limiting issues from ${sortedIssues.length} to ${LIMITS.MAX_ISSUES_TO_PROCESS} based on priority scores`
+      const filterResult = await supportedElsewhereFilterTool.execute(
+        filterInput,
+        { logger }
       );
 
-      // Log details about what's being kept and discarded
-      const keptIssues = sortedIssues.slice(0, LIMITS.MAX_ISSUES_TO_PROCESS);
-      const discardedIssues = sortedIssues.slice(LIMITS.MAX_ISSUES_TO_PROCESS);
-
-      const avgKeptScore =
-        keptIssues.reduce((sum, i) => sum + priorityScore(i), 0) /
-        keptIssues.length;
-      const avgDiscardedScore =
-        discardedIssues.length > 0
-          ? discardedIssues.reduce((sum, i) => sum + priorityScore(i), 0) /
-            discardedIssues.length
-          : 0;
-
-      logger.debug(
-        `Priority scores - Kept issues avg: ${avgKeptScore.toFixed(1)}, ` +
-          `Discarded issues avg: ${avgDiscardedScore.toFixed(1)}`
+      // Keep only the issues that are NOT supported elsewhere
+      const unsupportedIndices = new Set(
+        filterResult.unsupportedIssues.map((r) => r.index)
+      );
+      const filteredIssues = issues.filter((_, idx) =>
+        unsupportedIndices.has(idx)
       );
 
-      return keptIssues;
+      // Log and record what was filtered
+      const supportedCount = filterResult.supportedIssues.length;
+      if (supportedCount > 0) {
+        logger.info(
+          `FallacyCheckPlugin: Filtered out ${supportedCount} issues (supported elsewhere in document)`
+        );
+
+        // Record filtered items with their reasoning for telemetry
+        const filteredRecords = filterResult.supportedIssues.map((supported) => {
+          const originalIssue = issues[supported.index];
+          logger.debug(`  - Issue ${supported.index}: ${supported.explanation}`);
+          return {
+            stage: PIPELINE_STAGES.SUPPORTED_ELSEWHERE_FILTER,
+            quotedText: originalIssue.text || `Issue at index ${supported.index}`,
+            header: originalIssue.issueType,
+            filterReason: supported.explanation,
+            supportLocation: supported.supportLocation,
+            originalIndex: supported.index,
+          };
+        });
+        telemetry.recordFilteredItems(filteredRecords);
+      }
+
+      // Record passed items (issues that are NOT supported elsewhere)
+      if (filterResult.unsupportedIssues.length > 0) {
+        const passedRecords = filterResult.unsupportedIssues.map((unsupported) => {
+          const originalIssue = issues[unsupported.index];
+          return {
+            stage: PIPELINE_STAGES.SUPPORTED_ELSEWHERE_FILTER,
+            quotedText: originalIssue.text || `Issue at index ${unsupported.index}`,
+            header: originalIssue.issueType,
+            passReason: unsupported.explanation,
+            originalIndex: unsupported.index,
+          };
+        });
+        telemetry.recordPassedItems(passedRecords);
+      }
+
+      logger.info("FallacyCheckPlugin: AUDIT: Supported-elsewhere filter completed", {
+        timestamp: new Date().toISOString(),
+        issuesBeforeFilter: issues.length,
+        issuesAfterFilter: filteredIssues.length,
+        issuesFiltered: supportedCount,
+        phase: "supported-elsewhere-filter",
+        costUsd: filterResult.unifiedUsage?.costUsd,
+      });
+
+      telemetry.endStage(filteredIssues.length, {
+        costUsd: filterResult.unifiedUsage?.costUsd,
+        actualApiParams: filterResult.actualApiParams,
+        responseMetrics: filterResult.responseMetrics,
+        unifiedUsage: filterResult.unifiedUsage,
+      });
+      return filteredIssues;
+    } catch (error) {
+      logger.warn("FallacyCheckPlugin: Supported-elsewhere filter failed, keeping all issues", error);
+      telemetry.endStage(issues.length, {
+        error: error instanceof Error ? error.message : String(error),
+      });
+      return issues;
     }
+  }
+
+  /**
+   * Generate comments for all issues in parallel
+   */
+  private async generateCommentsForIssues(
+    issues: FallacyIssue[],
+    documentText: string
+  ): Promise<Comment[]> {
+    const commentPromises = issues.map(async (issue) => {
+      // Run in next tick to ensure true parallelism
+      await new Promise((resolve) => setImmediate(resolve));
+      const comment = await buildFallacyComment(issue, documentText, { logger });
+      // Filter out comments with empty descriptions
+      if (comment?.description.trim()) {
+        return comment;
+      }
+      return null;
+    });
 
-    return sortedIssues;
+    const commentResults = await Promise.all(commentPromises);
+    return commentResults.filter((comment): comment is Comment => comment !== null);
+  }
+
+  /**
+   * Review and filter comments, generate summaries
+   */
+  private async reviewAndFilterComments(
+    allComments: Comment[],
+    documentText: string,
+    telemetry: PipelineTelemetry
+  ): Promise<void> {
+    try {
+      const reviewComments = allComments.map((comment, index) => ({
+        index,
+        header: comment.header || "Epistemic Issue",
+        description: comment.description,
+        level: comment.level || 'warning',
+        importance: comment.importance,
+        quotedText: comment.highlight.quotedText,
+      }));
+
+      logger.info("FallacyCheckPlugin: AUDIT: Review phase started", {
+        timestamp: new Date().toISOString(),
+        commentsToReview: allComments.length,
+        phase: "review",
+        operation: "fallacy-review-tool",
+      });
+
+      const reviewResult = await fallacyReviewTool.execute(
+        { documentText, comments: reviewComments },
+        { logger }
+      );
+
+      // Filter comments based on review
+      const keptIndices = new Set(reviewResult.commentIndicesToKeep);
+      this.comments = reviewResult.commentIndicesToKeep.map((idx) => allComments[idx]);
+      this.summary = reviewResult.oneLineSummary;
+      this.analysis = reviewResult.documentSummary;
+
+      // Record comments that were filtered by review
+      const filteredComments = allComments
+        .map((comment, idx) => ({ comment, idx }))
+        .filter(({ idx }) => !keptIndices.has(idx));
+
+      if (filteredComments.length > 0) {
+        const filteredRecords = filteredComments.map(({ comment, idx }) => ({
+          stage: PIPELINE_STAGES.REVIEW,
+          quotedText: comment.highlight.quotedText,
+          header: comment.header,
+          filterReason: 'Filtered by review (redundant, low-value, or questionable)',
+          originalIndex: idx,
+        }));
+        telemetry.recordFilteredItems(filteredRecords);
+      }
+
+      logger.info("FallacyCheckPlugin: AUDIT: Review phase completed", {
+        timestamp: new Date().toISOString(),
+        commentsReviewed: allComments.length,
+        commentsKept: this.comments.length,
+        commentsFiltered: allComments.length - this.comments.length,
+        phase: "review",
+        costUsd: reviewResult.unifiedUsage?.costUsd,
+      });
+
+      telemetry.endStage(this.comments.length, {
+        costUsd: reviewResult.unifiedUsage?.costUsd,
+      });
+      telemetry.setFinalCounts({ commentsKept: this.comments.length });
+
+      logger.info(
+        `FallacyCheckPlugin: Review complete - kept ${this.comments.length}/${allComments.length} comments`
+      );
+    } catch (error) {
+      logger.error("FallacyCheckPlugin: Review failed, using fallback", error);
+      // Fallback: keep all comments and use old summary generation
+      this.comments = allComments;
+      const { summary, analysisSummary } = this.generateAnalysis();
+      this.summary = summary;
+      this.analysis = analysisSummary;
+
+      telemetry.endStage(this.comments.length, {
+        error: error instanceof Error ? error.message : String(error),
+      });
+      telemetry.setFinalCounts({ commentsKept: this.comments.length });
+    }
   }
 
   private generateAnalysis(): { summary: string; analysisSummary: string } {
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-loader.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-loader.ts
new file mode 100644
index 000000000..6ef2d7eb1
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-loader.ts
@@ -0,0 +1,478 @@
+/**
+ * Profile Loader for Fallacy Checker
+ *
+ * Loads, validates, and manages profile configurations from the database.
+ * Provides methods to load profiles by ID or get the default profile for an agent.
+ */
+
+import { prisma } from '@roast/db';
+import { logger } from '../../../shared/logger';
+import type {
+  FallacyCheckerProfile,
+  FallacyCheckerProfileConfig,
+  ModelConfig,
+  ThresholdConfig,
+  FilterChainConfig,
+  FilterChainItem,
+  PromptConfig,
+  FilterType,
+  ReasoningConfig,
+} from './profile-types';
+import {
+  createDefaultProfileConfig,
+  DEFAULT_EXTRACTOR_MODEL,
+  DEFAULT_JUDGE_MODEL,
+  migrateFilterChainConfig,
+} from './profile-types';
+
+// ============================================================================
+// Profile Loading
+// ============================================================================
+
+/**
+ * Load a profile by ID
+ *
+ * @throws Error if profile not found
+ */
+export async function loadProfile(profileId: string): Promise<FallacyCheckerProfileConfig> {
+  const profile = await prisma.fallacyCheckerProfile.findUnique({
+    where: { id: profileId },
+  });
+
+  if (!profile) {
+    throw new Error(`Profile not found: ${profileId}`);
+  }
+
+  return validateAndMergeConfig(profile.config);
+}
+
+/**
+ * Load the default profile for an agent
+ *
+ * Returns the profile marked as default, or creates a default config if none exists.
+ */
+export async function loadDefaultProfile(agentId: string): Promise<FallacyCheckerProfileConfig> {
+  const profile = await prisma.fallacyCheckerProfile.findFirst({
+    where: {
+      agentId,
+      isDefault: true,
+    },
+  });
+
+  if (!profile) {
+    // No default profile found, return default config
+    return createDefaultProfileConfig();
+  }
+
+  return validateAndMergeConfig(profile.config);
+}
+
+/**
+ * Load a profile by ID or fall back to default for agent
+ */
+export async function loadProfileOrDefault(
+  profileId: string | undefined,
+  agentId: string
+): Promise<FallacyCheckerProfileConfig> {
+  if (profileId) {
+    try {
+      return await loadProfile(profileId);
+    } catch (error) {
+      // Log warning and fall back to default
+      logger.warn(`Failed to load profile ${profileId}, using default:`, error);
+    }
+  }
+  return loadDefaultProfile(agentId);
+}
+
+/**
+ * Get all profiles for an agent
+ */
+export async function getProfilesForAgent(agentId: string): Promise<FallacyCheckerProfile[]> {
+  const profiles = await prisma.fallacyCheckerProfile.findMany({
+    where: { agentId },
+    orderBy: [
+      { isDefault: 'desc' },
+      { name: 'asc' },
+    ],
+  });
+
+  return profiles.map((p) => ({
+    ...p,
+    config: validateAndMergeConfig(p.config),
+  }));
+}
+
+// ============================================================================
+// Validation
+// ============================================================================
+
+/**
+ * Validate and merge a raw config with defaults
+ *
+ * Handles missing fields and invalid values by falling back to defaults.
+ */
+export function validateAndMergeConfig(rawConfig: unknown): FallacyCheckerProfileConfig {
+  const defaults = createDefaultProfileConfig();
+
+  if (!rawConfig || typeof rawConfig !== 'object') {
+    return defaults;
+  }
+
+  const config = rawConfig as Record<string, unknown>;
+
+  return {
+    version: 1,
+    models: validateModels(config.models, defaults.models),
+    thresholds: validateThresholds(config.thresholds, defaults.thresholds),
+    prompts: validatePrompts(config.prompts),
+    filterChain: validateFilterChain(config.filterChain, defaults.filterChain),
+  };
+}
+
+/**
+ * Validate model configuration
+ */
+function validateModels(raw: unknown, defaults: ModelConfig): ModelConfig {
+  if (!raw || typeof raw !== 'object') {
+    return defaults;
+  }
+
+  const models = raw as Record<string, unknown>;
+
+  // Validate extractors
+  let extractors = defaults.extractors;
+  if (Array.isArray(models.extractors) && models.extractors.length > 0) {
+    extractors = models.extractors
+      .filter((e): e is Record<string, unknown> => typeof e === 'object' && e !== null)
+      .map((e) => ({
+        model: typeof e.model === 'string' ? e.model : DEFAULT_EXTRACTOR_MODEL,
+        temperature: typeof e.temperature === 'number' || e.temperature === 'default'
+          ? e.temperature
+          : undefined,
+        label: typeof e.label === 'string' ? e.label : undefined,
+        thinking: typeof e.thinking === 'boolean' ? e.thinking : undefined,
+        reasoning: validateReasoning(e.reasoning),
+        provider: validateProvider(e.provider),
+      }));
+
+    if (extractors.length === 0) {
+      extractors = defaults.extractors;
+    }
+  }
+
+  // Validate judge
+  let judge = defaults.judge;
+  if (typeof models.judge === 'object' && models.judge !== null) {
+    const j = models.judge as Record<string, unknown>;
+    judge = {
+      model: typeof j.model === 'string' ? j.model : DEFAULT_JUDGE_MODEL,
+      temperature: typeof j.temperature === 'number' || j.temperature === 'default'
+        ? j.temperature
+        : undefined,
+      thinking: typeof j.thinking === 'boolean' ? j.thinking : undefined,
+      reasoning: validateReasoning(j.reasoning),
+      enabled: typeof j.enabled === 'boolean' ? j.enabled : false,
+    };
+  }
+
+  return {
+    extractors,
+    judge,
+  };
+}
+
+/**
+ * Validate thresholds
+ */
+function validateThresholds(raw: unknown, defaults: ThresholdConfig): ThresholdConfig {
+  if (!raw || typeof raw !== 'object') {
+    return defaults;
+  }
+
+  const thresholds = raw as Record<string, unknown>;
+
+  const validNumber = (val: unknown, min: number, max: number, defaultVal: number): number => {
+    if (typeof val === 'number' && val >= min && val <= max) {
+      return val;
+    }
+    return defaultVal;
+  };
+
+  return {
+    minSeverityThreshold: validNumber(thresholds.minSeverityThreshold, 0, 100, defaults.minSeverityThreshold),
+    maxIssues: validNumber(thresholds.maxIssues, 1, 100, defaults.maxIssues),
+    dedupThreshold: validNumber(thresholds.dedupThreshold, 0, 1, defaults.dedupThreshold),
+    maxIssuesToProcess: validNumber(thresholds.maxIssuesToProcess, 1, 100, defaults.maxIssuesToProcess),
+  };
+}
+
+/**
+ * Validate prompts
+ */
+function validatePrompts(raw: unknown): PromptConfig | undefined {
+  if (!raw || typeof raw !== 'object') {
+    return undefined;
+  }
+
+  const prompts = raw as Record<string, unknown>;
+  const result: PromptConfig = {};
+
+  if (typeof prompts.extractorSystemPrompt === 'string' && prompts.extractorSystemPrompt.trim()) {
+    result.extractorSystemPrompt = prompts.extractorSystemPrompt;
+  }
+  if (typeof prompts.extractorUserPrompt === 'string' && prompts.extractorUserPrompt.trim()) {
+    result.extractorUserPrompt = prompts.extractorUserPrompt;
+  }
+  if (typeof prompts.judgeSystemPrompt === 'string' && prompts.judgeSystemPrompt.trim()) {
+    result.judgeSystemPrompt = prompts.judgeSystemPrompt;
+  }
+  if (typeof prompts.reviewSystemPrompt === 'string' && prompts.reviewSystemPrompt.trim()) {
+    result.reviewSystemPrompt = prompts.reviewSystemPrompt;
+  }
+
+  // Return undefined if no prompts are set
+  return Object.keys(result).length > 0 ? result : undefined;
+}
+
+/**
+ * Validate filter chain
+ */
+function validateFilterChain(raw: unknown, defaults: FilterChainConfig): FilterChainConfig {
+  // Handle empty/invalid input
+  if (!raw) {
+    return defaults;
+  }
+
+  // Migrate from old format if needed
+  const migrated = migrateFilterChainConfig(raw);
+
+  const validFilterTypes: FilterType[] = [
+    'dedup',
+    'principle-of-charity',
+    'supported-elsewhere',
+    'severity',
+    'confidence',
+    'review',
+  ];
+
+  // Validate each filter item
+  const validatedFilters: FilterChainItem[] = migrated
+    .filter((f): boolean => typeof f === 'object' && f !== null)
+    .filter((f) => validFilterTypes.includes((f as unknown as Record<string, unknown>).type as FilterType))
+    .map((f, index) => {
+      const raw = f as unknown as Record<string, unknown>;
+      const type = raw.type as FilterType;
+      const base = {
+        id: typeof raw.id === 'string' ? raw.id : `filter-${index}`,
+        type,
+        enabled: typeof raw.enabled === 'boolean' ? raw.enabled : true,
+      };
+
+      // Add type-specific fields
+      if (type === 'principle-of-charity') {
+        return {
+          ...base,
+          type: 'principle-of-charity' as const,
+          model: typeof raw.model === 'string' ? raw.model : undefined,
+          temperature: (typeof raw.temperature === 'number' || raw.temperature === 'default')
+            ? (raw.temperature as number | 'default')
+            : undefined,
+          reasoning: validateReasoning(raw.reasoning),
+          customPrompt: typeof raw.customPrompt === 'string' ? raw.customPrompt : undefined,
+        };
+      }
+
+      if (type === 'supported-elsewhere') {
+        return {
+          ...base,
+          type: 'supported-elsewhere' as const,
+          model: typeof raw.model === 'string' ? raw.model : undefined,
+          temperature: (typeof raw.temperature === 'number' || raw.temperature === 'default')
+            ? (raw.temperature as number | 'default')
+            : undefined,
+          reasoning: validateReasoning(raw.reasoning),
+          customPrompt: typeof raw.customPrompt === 'string' ? raw.customPrompt : undefined,
+        };
+      }
+
+      if (type === 'severity') {
+        return {
+          ...base,
+          type: 'severity' as const,
+          minSeverity: typeof raw.minSeverity === 'number' ? raw.minSeverity : 50,
+        };
+      }
+
+      if (type === 'confidence') {
+        return {
+          ...base,
+          type: 'confidence' as const,
+          minConfidence: typeof raw.minConfidence === 'number' ? raw.minConfidence : 50,
+        };
+      }
+
+      // Simple filters (dedup, review)
+      return base as FilterChainItem;
+    });
+
+  if (validatedFilters.length === 0) {
+    return defaults;
+  }
+
+  return validatedFilters;
+}
+
+/**
+ * Validate reasoning config
+ */
+function validateReasoning(raw: unknown): ReasoningConfig | undefined {
+  if (raw === false) return false;
+  if (!raw || typeof raw !== 'object') return undefined;
+
+  const r = raw as Record<string, unknown>;
+
+  if (typeof r.effort === 'string' &&
+      ['minimal', 'low', 'medium', 'high', 'xhigh'].includes(r.effort)) {
+    return { effort: r.effort as 'minimal' | 'low' | 'medium' | 'high' | 'xhigh' };
+  }
+
+  if (typeof r.budget_tokens === 'number' && r.budget_tokens >= 1024) {
+    return { budget_tokens: r.budget_tokens };
+  }
+
+  return undefined;
+}
+
+/**
+ * Validate provider preferences
+ */
+function validateProvider(raw: unknown): { order?: string[]; allow_fallbacks?: boolean } | undefined {
+  if (!raw || typeof raw !== 'object') return undefined;
+
+  const p = raw as Record<string, unknown>;
+  const result: { order?: string[]; allow_fallbacks?: boolean } = {};
+
+  if (Array.isArray(p.order) && p.order.every((item) => typeof item === 'string')) {
+    result.order = p.order;
+  }
+
+  if (typeof p.allow_fallbacks === 'boolean') {
+    result.allow_fallbacks = p.allow_fallbacks;
+  }
+
+  // Return undefined if no valid fields found
+  if (Object.keys(result).length === 0) return undefined;
+
+  return result;
+}
+
+// ============================================================================
+// Profile Creation/Update Helpers
+// ============================================================================
+
+/**
+ * Create a new profile
+ */
+export async function createProfile(
+  agentId: string,
+  name: string,
+  config: Partial<FallacyCheckerProfileConfig>,
+  options?: {
+    description?: string;
+    isDefault?: boolean;
+  }
+): Promise<FallacyCheckerProfile> {
+  const fullConfig = validateAndMergeConfig(config);
+
+  // If setting as default, unset other defaults first
+  if (options?.isDefault) {
+    await prisma.fallacyCheckerProfile.updateMany({
+      where: { agentId, isDefault: true },
+      data: { isDefault: false },
+    });
+  }
+
+  const profile = await prisma.fallacyCheckerProfile.create({
+    data: {
+      agentId,
+      name,
+      description: options?.description ?? null,
+      config: fullConfig as unknown as Record<string, unknown>,
+      isDefault: options?.isDefault ?? false,
+    },
+  });
+
+  return {
+    ...profile,
+    config: fullConfig,
+  };
+}
+
+/**
+ * Update a profile
+ */
+export async function updateProfile(
+  profileId: string,
+  updates: {
+    name?: string;
+    description?: string | null;
+    config?: Partial<FallacyCheckerProfileConfig>;
+    isDefault?: boolean;
+  }
+): Promise<FallacyCheckerProfile> {
+  const existing = await prisma.fallacyCheckerProfile.findUnique({
+    where: { id: profileId },
+  });
+
+  if (!existing) {
+    throw new Error(`Profile not found: ${profileId}`);
+  }
+
+  // Merge config if provided
+  const existingConfig = existing.config as Record<string, unknown> | null;
+  const newConfig = updates.config
+    ? validateAndMergeConfig({
+        ...(existingConfig ?? {}),
+        ...updates.config,
+      })
+    : validateAndMergeConfig(existingConfig);
+
+  // If setting as default, unset other defaults first
+  if (updates.isDefault) {
+    await prisma.fallacyCheckerProfile.updateMany({
+      where: { agentId: existing.agentId, isDefault: true, id: { not: profileId } },
+      data: { isDefault: false },
+    });
+  }
+
+  const profile = await prisma.fallacyCheckerProfile.update({
+    where: { id: profileId },
+    data: {
+      ...(updates.name !== undefined && { name: updates.name }),
+      ...(updates.description !== undefined && { description: updates.description }),
+      ...(updates.config !== undefined && { config: newConfig as unknown as Record<string, unknown> }),
+      ...(updates.isDefault !== undefined && { isDefault: updates.isDefault }),
+    },
+  });
+
+  return {
+    ...profile,
+    config: newConfig,
+  };
+}
+
+/**
+ * Delete a profile
+ */
+export async function deleteProfile(profileId: string): Promise<void> {
+  await prisma.fallacyCheckerProfile.delete({
+    where: { id: profileId },
+  });
+}
+
+// ============================================================================
+// Export Types
+// ============================================================================
+
+export type { FallacyCheckerProfile, FallacyCheckerProfileConfig };
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-types.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-types.ts
new file mode 100644
index 000000000..75ea7fd8f
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/profile-types.ts
@@ -0,0 +1,317 @@
+/**
+ * Fallacy Checker Profile Configuration Types
+ *
+ * These types define the configuration structure stored in the database
+ * for fallacy checker profiles. Profiles allow customizing the pipeline
+ * parameters: models, thresholds, and prompts.
+ */
+
+import type { ExtractorConfig, JudgeConfig, ProviderPreferences, ReasoningEffort, ReasoningConfig } from './extraction/types';
+
+// ============================================================================
+// Model Configuration Types
+// ============================================================================
+
+/**
+ * Model configuration for all pipeline stages
+ */
+export interface ModelConfig {
+  /**
+   * Extractors configuration.
+   * Supports multiple extractors with different models for ensemble extraction.
+   */
+  extractors: ExtractorConfig[];
+
+  /**
+   * Judge configuration for aggregating multi-extractor results.
+   * Only used when multiple extractors are configured.
+   */
+  judge: JudgeConfig;
+}
+
+// ============================================================================
+// Threshold Configuration Types
+// ============================================================================
+
+/**
+ * Threshold and limit configuration for the pipeline
+ */
+export interface ThresholdConfig {
+  /**
+   * Minimum severity score (0-100) for issues to be kept by extractor.
+   * Default: 60 (from MIN_SEVERITY_THRESHOLD in fallacy-extractor)
+   */
+  minSeverityThreshold: number;
+
+  /**
+   * Maximum number of issues returned by extractor.
+   * Default: 15 (from MAX_ISSUES in fallacy-extractor)
+   */
+  maxIssues: number;
+
+  /**
+   * Jaccard similarity threshold for deduplication (0-1).
+   * Higher = stricter matching (fewer duplicates detected).
+   * Default: 0.7 (from JACCARD_THRESHOLD in dedup.ts)
+   */
+  dedupThreshold: number;
+
+  /**
+   * Maximum issues to process through the full pipeline.
+   * Default: 25 (from LIMITS.MAX_ISSUES_TO_PROCESS in constants.ts)
+   */
+  maxIssuesToProcess: number;
+}
+
+// ============================================================================
+// Prompt Configuration Types
+// ============================================================================
+
+/**
+ * Prompts used in the pipeline.
+ * All prompts are optional - if not provided, defaults from the tools are used.
+ */
+export interface PromptConfig {
+  /**
+   * System prompt for the fallacy extractor.
+   * This is the main instruction prompt that defines how issues are detected.
+   */
+  extractorSystemPrompt?: string;
+
+  /**
+   * User prompt template for the fallacy extractor.
+   * Use {{text}} as placeholder for the document text.
+   */
+  extractorUserPrompt?: string;
+
+  /**
+   * System prompt for the LLM judge (multi-extractor aggregation).
+   */
+  judgeSystemPrompt?: string;
+
+  /**
+   * System prompt for the review/filter stage.
+   */
+  reviewSystemPrompt?: string;
+}
+
+// ============================================================================
+// Filter Chain Configuration Types
+// ============================================================================
+
+/**
+ * Available filter types
+ */
+export type FilterType =
+  | 'dedup'                   // Remove near-duplicate issues
+  | 'principle-of-charity'    // Apply charitable interpretation before critiquing
+  | 'supported-elsewhere'     // Filter issues addressed elsewhere in document
+  | 'severity'                // Filter by severity threshold
+  | 'confidence'              // Filter by confidence threshold
+  | 'review';                 // Human review / AI review filter
+
+// Re-export for backwards compatibility
+export type { ReasoningEffort, ReasoningConfig };
+
+/**
+ * Base filter configuration
+ */
+interface BaseFilterConfig {
+  id: string;
+  type: FilterType;
+  enabled: boolean;
+}
+
+/**
+ * Principle of Charity filter configuration
+ */
+export interface PrincipleOfCharityFilterConfig extends BaseFilterConfig {
+  type: 'principle-of-charity';
+  model?: string;
+  temperature?: number | 'default';
+  reasoning?: ReasoningConfig;
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+  customPrompt?: string;
+}
+
+/**
+ * Supported-elsewhere filter configuration
+ */
+export interface SupportedElsewhereFilterConfig extends BaseFilterConfig {
+  type: 'supported-elsewhere';
+  model?: string;
+  temperature?: number | 'default';
+  reasoning?: ReasoningConfig;
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+  customPrompt?: string;
+}
+
+/**
+ * Severity filter configuration
+ */
+export interface SeverityFilterConfig extends BaseFilterConfig {
+  type: 'severity';
+  minSeverity: number;  // 0-100
+}
+
+/**
+ * Confidence filter configuration
+ */
+export interface ConfidenceFilterConfig extends BaseFilterConfig {
+  type: 'confidence';
+  minConfidence: number;  // 0-100
+}
+
+/**
+ * Simple filter configuration (for dedup, review which don't need extra settings)
+ */
+export interface SimpleFilterConfig extends BaseFilterConfig {
+  type: 'dedup' | 'review';
+}
+
+/**
+ * Union of all filter configurations
+ */
+export type FilterChainItem =
+  | PrincipleOfCharityFilterConfig
+  | SupportedElsewhereFilterConfig
+  | SeverityFilterConfig
+  | ConfidenceFilterConfig
+  | SimpleFilterConfig;
+
+/**
+ * Filter chain configuration - array of filter items
+ */
+export type FilterChainConfig = FilterChainItem[];
+
+// Helper to migrate old format to new
+export function migrateFilterChainConfig(
+  config: unknown
+): FilterChainConfig {
+  // Already an array - return as-is
+  if (Array.isArray(config)) {
+    return config as FilterChainConfig;
+  }
+
+  // Old format: { filters: Array<{ type, enabled }> }
+  const oldFormat = config as { filters?: Array<{ type: FilterType; enabled: boolean }> } | undefined;
+  if (oldFormat?.filters && Array.isArray(oldFormat.filters)) {
+    return oldFormat.filters.map((f, i) => ({
+      id: `filter-${i}`,
+      type: f.type,
+      enabled: f.enabled,
+    } as FilterChainItem));
+  }
+
+  // Fallback to default
+  return DEFAULT_FILTER_CHAIN;
+}
+
+// ============================================================================
+// Main Profile Configuration Type
+// ============================================================================
+
+/**
+ * Complete profile configuration for the fallacy checker pipeline.
+ * This is the structure stored in the database's config JSON field.
+ */
+export interface FallacyCheckerProfileConfig {
+  /**
+   * Schema version for forward compatibility.
+   */
+  version: 1;
+
+  /**
+   * Model configurations for each pipeline stage.
+   */
+  models: ModelConfig;
+
+  /**
+   * Threshold and limit configurations.
+   */
+  thresholds: ThresholdConfig;
+
+  /**
+   * Custom prompts (optional - defaults used if not provided).
+   */
+  prompts?: PromptConfig;
+
+  /**
+   * Filter chain configuration.
+   */
+  filterChain: FilterChainConfig;
+}
+
+// ============================================================================
+// Profile Database Type
+// ============================================================================
+
+/**
+ * Profile as stored in the database (matches Prisma schema).
+ */
+export interface FallacyCheckerProfile {
+  id: string;
+  name: string;
+  description: string | null;
+  agentId: string;
+  config: FallacyCheckerProfileConfig;
+  isDefault: boolean;
+  createdAt: Date;
+  updatedAt: Date;
+}
+
+// ============================================================================
+// Default Values
+// ============================================================================
+
+/**
+ * Default extractor model
+ */
+export const DEFAULT_EXTRACTOR_MODEL = 'claude-sonnet-4-5-20250929';
+
+/**
+ * Default judge model
+ */
+export const DEFAULT_JUDGE_MODEL = 'claude-sonnet-4-5-20250929';
+
+/**
+ * Default threshold configuration (matches existing hardcoded values)
+ */
+export const DEFAULT_THRESHOLDS: ThresholdConfig = {
+  minSeverityThreshold: 60,  // From fallacy-extractor MIN_SEVERITY_THRESHOLD
+  maxIssues: 15,             // From fallacy-extractor MAX_ISSUES
+  dedupThreshold: 0.7,       // From dedup.ts JACCARD_THRESHOLD
+  maxIssuesToProcess: 25,    // From constants.ts LIMITS.MAX_ISSUES_TO_PROCESS
+};
+
+/**
+ * Default filter chain (current behavior)
+ * Order: dedup → principle-of-charity → supported-elsewhere → review
+ */
+export const DEFAULT_FILTER_CHAIN: FilterChainConfig = [
+  { id: 'default-dedup', type: 'dedup', enabled: true },
+  { id: 'default-principle-of-charity', type: 'principle-of-charity', enabled: true },
+  { id: 'default-supported-elsewhere', type: 'supported-elsewhere', enabled: true },
+  { id: 'default-review', type: 'review', enabled: true },
+];
+
+/**
+ * Create a default profile configuration
+ */
+export function createDefaultProfileConfig(): FallacyCheckerProfileConfig {
+  return {
+    version: 1,
+    models: {
+      extractors: [{ model: DEFAULT_EXTRACTOR_MODEL }],
+      judge: {
+        model: DEFAULT_JUDGE_MODEL,
+        enabled: false,
+      },
+    },
+    thresholds: DEFAULT_THRESHOLDS,
+    prompts: undefined,  // Use tool defaults
+    filterChain: DEFAULT_FILTER_CHAIN,
+  };
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/PipelineTelemetry.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/PipelineTelemetry.ts
new file mode 100644
index 000000000..6c2639616
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/PipelineTelemetry.ts
@@ -0,0 +1,300 @@
+/**
+ * Pipeline Telemetry Collector
+ *
+ * Collects and aggregates metrics during fallacy check pipeline execution.
+ * Provides a fluent API for tracking stages and finalizing results.
+ */
+
+import { v4 as uuidv4 } from 'uuid';
+import { logger } from '../../../../shared/logger';
+import type {
+  StageMetrics,
+  PipelineExecutionRecord,
+  PipelineStage,
+  FilteredItemRecord,
+  PassedItemRecord,
+  ExtractionPhaseTelemetry,
+  ProfileInfo,
+} from './types';
+
+/** Current pipeline version - increment when making significant changes */
+const PIPELINE_VERSION = '2.0.0'; // v2: single-pass extraction + supported-elsewhere filter
+
+/**
+ * Tracks metrics for an in-progress stage
+ */
+interface ActiveStage {
+  stageName: string;
+  startTime: number;
+  inputCount: number;
+  model?: string;
+}
+
+/**
+ * Pipeline Telemetry Collector
+ *
+ * Usage:
+ * ```ts
+ * const telemetry = new PipelineTelemetry(documentText.length);
+ *
+ * telemetry.startStage('extraction', 1);
+ * const issues = await extract();
+ * telemetry.endStage(issues.length);
+ *
+ * telemetry.startStage('filter', issues.length);
+ * const filtered = await filter(issues);
+ * telemetry.endStage(filtered.length);
+ *
+ * const record = telemetry.finalize(true);
+ * ```
+ */
+export class PipelineTelemetry {
+  private executionId: string;
+  private startedAt: Date;
+  private documentLength: number;
+  private stages: StageMetrics[] = [];
+  private activeStage: ActiveStage | null = null;
+  private filteredItems: FilteredItemRecord[] = [];
+  private passedItems: PassedItemRecord[] = [];
+  private extractionPhase: ExtractionPhaseTelemetry | null = null;
+  private profileInfo: ProfileInfo | null = null;
+  private finalCounts: PipelineExecutionRecord['finalCounts'] = {
+    issuesExtracted: 0,
+    issuesAfterDedup: 0,
+    issuesAfterFiltering: 0,
+    commentsGenerated: 0,
+    commentsKept: 0,
+  };
+
+  constructor(documentLength: number) {
+    this.executionId = uuidv4();
+    this.startedAt = new Date();
+    this.documentLength = documentLength;
+  }
+
+  /**
+   * Set profile info for this execution
+   */
+  setProfileInfo(info: ProfileInfo): this {
+    this.profileInfo = info;
+    return this;
+  }
+
+  /**
+   * Start tracking a new pipeline stage
+   */
+  startStage(
+    stageName: PipelineStage | string,
+    inputCount: number,
+    options?: { model?: string }
+  ): this {
+    // If there's an active stage that wasn't ended, end it with error
+    if (this.activeStage) {
+      logger.warn(
+        `[PipelineTelemetry] Stage '${this.activeStage.stageName}' was not properly ended. Ending with error.`
+      );
+      this.endStage(0, { error: 'Stage was not properly ended' });
+    }
+
+    this.activeStage = {
+      stageName,
+      startTime: Date.now(),
+      inputCount,
+      model: options?.model,
+    };
+
+    return this;
+  }
+
+  /**
+   * End the current stage and record metrics
+   */
+  endStage(
+    outputCount: number,
+    options?: {
+      costUsd?: number;
+      error?: string;
+      metadata?: Record<string, unknown>;
+      actualApiParams?: StageMetrics['actualApiParams'];
+      responseMetrics?: StageMetrics['responseMetrics'];
+      unifiedUsage?: StageMetrics['unifiedUsage'];
+    }
+  ): this {
+    if (!this.activeStage) {
+      logger.warn(
+        '[PipelineTelemetry] endStage called without an active stage'
+      );
+      return this;
+    }
+
+    const durationMs = Date.now() - this.activeStage.startTime;
+    const filteredCount = this.activeStage.inputCount - outputCount;
+
+    const metrics: StageMetrics = {
+      stageName: this.activeStage.stageName,
+      durationMs,
+      inputCount: this.activeStage.inputCount,
+      outputCount,
+      filteredCount: Math.max(0, filteredCount), // Don't report negative if output > input
+      model: this.activeStage.model,
+      costUsd: options?.costUsd,
+      error: options?.error,
+      metadata: options?.metadata,
+      actualApiParams: options?.actualApiParams,
+      responseMetrics: options?.responseMetrics,
+      unifiedUsage: options?.unifiedUsage,
+    };
+
+    this.stages.push(metrics);
+    this.activeStage = null;
+
+    return this;
+  }
+
+  /**
+   * Record a stage that already completed (for stages we can't wrap)
+   */
+  recordStage(
+    stageName: PipelineStage | string,
+    metrics: Omit<StageMetrics, 'stageName'>
+  ): this {
+    this.stages.push({
+      stageName,
+      ...metrics,
+    });
+    return this;
+  }
+
+  /**
+   * Update final counts (call after each major phase)
+   */
+  setFinalCounts(
+    counts: Partial<PipelineExecutionRecord['finalCounts']>
+  ): this {
+    this.finalCounts = {
+      ...this.finalCounts,
+      ...counts,
+    };
+    return this;
+  }
+
+  /**
+   * Record a filtered item with its reasoning
+   */
+  recordFilteredItem(item: FilteredItemRecord): this {
+    this.filteredItems.push(item);
+    return this;
+  }
+
+  /**
+   * Record multiple filtered items
+   */
+  recordFilteredItems(items: FilteredItemRecord[]): this {
+    this.filteredItems.push(...items);
+    return this;
+  }
+
+  /**
+   * Record an item that passed through a filter
+   */
+  recordPassedItem(item: PassedItemRecord): this {
+    this.passedItems.push(item);
+    return this;
+  }
+
+  /**
+   * Record multiple passed items
+   */
+  recordPassedItems(items: PassedItemRecord[]): this {
+    this.passedItems.push(...items);
+    return this;
+  }
+
+  /**
+   * Set extraction phase telemetry (for multi-extractor mode)
+   */
+  setExtractionPhase(telemetry: ExtractionPhaseTelemetry): this {
+    this.extractionPhase = telemetry;
+    return this;
+  }
+
+  /**
+   * Calculate total cost from all stages
+   */
+  private calculateTotalCost(): number | undefined {
+    const costs = this.stages
+      .map((s) => s.costUsd)
+      .filter((c): c is number => c !== undefined);
+
+    if (costs.length === 0) return undefined;
+    return costs.reduce((sum, cost) => sum + cost, 0);
+  }
+
+  /**
+   * Finalize and return the complete execution record
+   */
+  finalize(success: boolean, error?: string): PipelineExecutionRecord {
+    // End any active stage
+    if (this.activeStage) {
+      this.endStage(0, { error: error || 'Pipeline ended with active stage' });
+    }
+
+    const completedAt = new Date();
+    const totalDurationMs = completedAt.getTime() - this.startedAt.getTime();
+
+    return {
+      executionId: this.executionId,
+      startedAt: this.startedAt.toISOString(),
+      completedAt: completedAt.toISOString(),
+      totalDurationMs,
+      documentLength: this.documentLength,
+      stages: this.stages,
+      finalCounts: this.finalCounts,
+      success,
+      error,
+      totalCostUsd: this.calculateTotalCost(),
+      pipelineVersion: PIPELINE_VERSION,
+      filteredItems: this.filteredItems, // Always include (even if empty) so we know telemetry was captured
+      passedItems: this.passedItems.length > 0 ? this.passedItems : undefined, // Only include if there are items
+      extractionPhase: this.extractionPhase || undefined,
+      profileInfo: this.profileInfo || undefined,
+    };
+  }
+
+  /**
+   * Get execution ID for correlation
+   */
+  getExecutionId(): string {
+    return this.executionId;
+  }
+
+  /**
+   * Log a summary of the current telemetry state
+   */
+  logSummary(): void {
+    const stagesSummary = this.stages.map(stage => ({
+      name: stage.stageName,
+      status: stage.error ? 'error' : 'ok',
+      durationMs: stage.durationMs,
+      input: stage.inputCount,
+      output: stage.outputCount,
+      filtered: stage.filteredCount,
+      model: stage.model,
+      costUsd: stage.costUsd,
+      error: stage.error,
+    }));
+
+    const totalCost = this.calculateTotalCost();
+    const elapsed = Date.now() - this.startedAt.getTime();
+
+    logger.debug('[PipelineTelemetry] Summary', {
+      executionId: this.executionId,
+      documentLength: this.documentLength,
+      stagesCompleted: this.stages.length,
+      stages: stagesSummary,
+      finalCounts: this.finalCounts,
+      totalCostUsd: totalCost,
+      totalElapsedMs: elapsed,
+    });
+  }
+}
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/index.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/index.ts
new file mode 100644
index 000000000..751d7dba2
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/index.ts
@@ -0,0 +1,23 @@
+/**
+ * Pipeline Telemetry Module
+ *
+ * Exports telemetry types and collector for fallacy check pipeline observability.
+ */
+
+export { PipelineTelemetry } from './PipelineTelemetry';
+export {
+  type StageMetrics,
+  type PipelineExecutionRecord,
+  type PipelineStage,
+  type FilteredItemRecord,
+  type ExtractorTelemetry,
+  type JudgeDecisionRecord,
+  type ExtractionPhaseTelemetry,
+  type ProfileInfo,
+  type ActualApiParams,
+  type ApiResponseMetrics,
+  PIPELINE_STAGES,
+} from './types';
+
+// Re-export UnifiedUsageMetrics for consumers
+export type { UnifiedUsageMetrics } from '../../../../utils/usageMetrics';
diff --git a/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/types.ts b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/types.ts
new file mode 100644
index 000000000..efb2f352e
--- /dev/null
+++ b/internal-packages/ai/src/analysis-plugins/plugins/fallacy-check/telemetry/types.ts
@@ -0,0 +1,342 @@
+/**
+ * Pipeline Telemetry Types
+ *
+ * Structured metrics for tracking fallacy check pipeline execution.
+ * Used for observability, debugging, and regression detection.
+ */
+
+import type { UnifiedUsageMetrics } from '../../../../utils/usageMetrics';
+import type { ActualApiParams, ApiResponseMetrics } from '../../../../types/common';
+
+// Re-export for backwards compatibility
+export type { ActualApiParams, ApiResponseMetrics };
+
+/**
+ * Metrics for a single pipeline stage
+ */
+export interface StageMetrics {
+  /** Stage name (e.g., 'extraction', 'supported-elsewhere-filter') */
+  stageName: string;
+
+  /** Duration of the stage in milliseconds */
+  durationMs: number;
+
+  /** Number of items going into this stage */
+  inputCount: number;
+
+  /** Number of items coming out of this stage */
+  outputCount: number;
+
+  /** Number of items filtered/removed by this stage */
+  filteredCount: number;
+
+  /** Estimated cost in dollars (if applicable) */
+  costUsd?: number;
+
+  /** Model used for this stage (if applicable) */
+  model?: string;
+
+  /** Any error that occurred during this stage */
+  error?: string;
+
+  /** Additional stage-specific metadata */
+  metadata?: Record<string, unknown>;
+
+  /** Actual API parameters sent to the provider (for LLM-based stages) */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics from API call (for LLM-based stages) */
+  responseMetrics?: ApiResponseMetrics;
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
+}
+
+/**
+ * Details about a filtered item (issue or comment)
+ */
+export interface FilteredItemRecord {
+  /** Stage where filtering occurred */
+  stage: string;
+
+  /** Original text that was flagged */
+  quotedText: string;
+
+  /** Header/type of the issue */
+  header?: string;
+
+  /** Why this item was filtered */
+  filterReason: string;
+
+  /** Where support was found (for supported-elsewhere filter) */
+  supportLocation?: string;
+
+  /** Original index in the input array */
+  originalIndex: number;
+}
+
+/**
+ * Details about an item that passed through a filter
+ */
+export interface PassedItemRecord {
+  /** Stage where this item was evaluated */
+  stage: string;
+
+  /** Original text of the issue */
+  quotedText: string;
+
+  /** Header/type of the issue */
+  header?: string;
+
+  /** Why this item passed / the filter's reasoning */
+  passReason: string;
+
+  /** Original index in the input array */
+  originalIndex: number;
+}
+
+// ============================================================================
+// Multi-Extractor Telemetry Types
+// ============================================================================
+
+/**
+ * Telemetry for a single extractor run
+ */
+export interface ExtractorTelemetry {
+  /** Unique extractor ID (e.g., "sonnet-0", "gemini-flash-1") */
+  extractorId: string;
+
+  /** Model used */
+  model: string;
+
+  /**
+   * @deprecated Use actualApiParams.temperature instead
+   * Effective temperature used for this extractor.
+   */
+  temperature: number;
+
+  /**
+   * Original temperature configuration.
+   * - "default": Model's native default was used
+   * - number: Explicit temperature was configured
+   * - undefined: Our model-specific default was used
+   */
+  temperatureConfig?: number | 'default';
+
+  /**
+   * @deprecated Use actualApiParams for actual thinking config
+   * Whether extended thinking/reasoning was enabled.
+   */
+  thinkingEnabled: boolean;
+
+  /** Number of issues found by this extractor */
+  issuesFound: number;
+
+  /** Execution time in milliseconds */
+  durationMs: number;
+
+  /** Cost in USD (if available) */
+  costUsd?: number;
+
+  /** Error message if extraction failed */
+  error?: string;
+
+  /** Breakdown of issues by type */
+  issuesByType: Record<string, number>;
+
+  /**
+   * Actual API request parameters as sent to the provider.
+   * Captured right before the API call - this is the source of truth.
+   */
+  actualApiParams?: ActualApiParams;
+
+  /**
+   * Response metrics from the API call.
+   */
+  responseMetrics?: ApiResponseMetrics;
+
+  /**
+   * Unified usage metrics (includes cost, tokens, latency).
+   * This provides a consistent format across all providers (OpenRouter, Anthropic).
+   * The costUsd is directly from API for OpenRouter, calculated for Anthropic.
+   */
+  unifiedUsage?: UnifiedUsageMetrics;
+}
+
+/**
+ * Record of a judge decision (for drill-down)
+ */
+export interface JudgeDecisionRecord {
+  /** The quoted text from the issue */
+  issueText: string;
+
+  /** Issue type (e.g., "logical-fallacy", "missing-context") */
+  issueType: string;
+
+  /** Judge's decision */
+  decision: 'accepted' | 'merged' | 'rejected';
+
+  /** Judge's reasoning */
+  reasoning: string;
+
+  /** Which extractors found this issue */
+  sourceExtractors: string[];
+
+  /** Final severity after judge assessment */
+  finalSeverity?: number;
+
+  /** Final confidence after judge assessment */
+  finalConfidence?: number;
+}
+
+/**
+ * Complete telemetry for the extraction phase (multi-extractor mode)
+ */
+export interface ExtractionPhaseTelemetry {
+  /** Whether multi-extractor mode was used */
+  multiExtractorEnabled: boolean;
+
+  /** Per-extractor breakdown */
+  extractors: ExtractorTelemetry[];
+
+  /** Total issues from all extractors (before dedup) */
+  totalIssuesBeforeJudge: number;
+
+  /** Total issues after Jaccard deduplication (before judge) */
+  totalIssuesAfterDedup?: number;
+
+  /** Total issues after judge aggregation (final output) */
+  totalIssuesAfterJudge: number;
+
+  /** Model used for judge (if multi-extractor enabled) */
+  judgeModel?: string;
+
+  /** Judge execution time (if multi-extractor enabled) */
+  judgeDurationMs?: number;
+
+  /** Judge cost in USD (if available) */
+  judgeCostUsd?: number;
+
+  /** Unified usage metrics for the judge (if multi-extractor enabled) */
+  judgeUnifiedUsage?: UnifiedUsageMetrics;
+
+  /** Actual API parameters sent for judge call */
+  judgeActualApiParams?: ActualApiParams;
+
+  /** Response metrics from judge API call */
+  judgeResponseMetrics?: ApiResponseMetrics;
+
+  /** Detailed decisions for drill-down */
+  judgeDecisions: JudgeDecisionRecord[];
+}
+
+// ============================================================================
+// Profile Info
+// ============================================================================
+
+/**
+ * Information about the profile used for this execution
+ */
+export interface ProfileInfo {
+  /** Profile ID from database (if loaded from DB) */
+  profileId?: string;
+
+  /** Agent ID used for profile loading */
+  agentId: string;
+
+  /** Threshold configuration from profile */
+  thresholds: {
+    minSeverityThreshold: number;
+    maxIssues: number;
+    dedupThreshold: number;
+    maxIssuesToProcess: number;
+  };
+
+  /** Number of extractors configured */
+  extractorCount: number;
+
+  /** Whether judge is enabled */
+  judgeEnabled: boolean;
+
+  /** Whether custom prompts are configured in the profile */
+  hasCustomPrompts: boolean;
+}
+
+// ============================================================================
+// Pipeline Execution Record
+// ============================================================================
+
+/**
+ * Complete pipeline execution record
+ */
+export interface PipelineExecutionRecord {
+  /** Unique identifier for this execution */
+  executionId: string;
+
+  /** Timestamp when pipeline started */
+  startedAt: string;
+
+  /** Timestamp when pipeline completed */
+  completedAt: string;
+
+  /** Total duration of the entire pipeline in milliseconds */
+  totalDurationMs: number;
+
+  /** Document length in characters */
+  documentLength: number;
+
+  /** Metrics for each stage, in order of execution */
+  stages: StageMetrics[];
+
+  /** Final counts */
+  finalCounts: {
+    /** Total issues extracted initially */
+    issuesExtracted: number;
+    /** Issues after deduplication */
+    issuesAfterDedup: number;
+    /** Issues after all filtering */
+    issuesAfterFiltering: number;
+    /** Final comments generated */
+    commentsGenerated: number;
+    /** Comments kept after review */
+    commentsKept: number;
+  };
+
+  /** Overall success/failure status */
+  success: boolean;
+
+  /** Error message if pipeline failed */
+  error?: string;
+
+  /** Total estimated cost in dollars */
+  totalCostUsd?: number;
+
+  /** Pipeline version (for tracking changes over time) */
+  pipelineVersion: string;
+
+  /** Details about items that were filtered out (for debugging/validation) */
+  filteredItems?: FilteredItemRecord[];
+
+  /** Details about items that passed through filters (for debugging/validation) */
+  passedItems?: PassedItemRecord[];
+
+  /** Detailed extraction phase telemetry (multi-extractor mode) */
+  extractionPhase?: ExtractionPhaseTelemetry;
+
+  /** Profile configuration used for this execution */
+  profileInfo?: ProfileInfo;
+}
+
+/**
+ * Stage names used in the fallacy check pipeline
+ */
+export const PIPELINE_STAGES = {
+  EXTRACTION: 'extraction',
+  DEDUPLICATION: 'deduplication',
+  PRINCIPLE_OF_CHARITY_FILTER: 'principle-of-charity-filter',
+  SUPPORTED_ELSEWHERE_FILTER: 'supported-elsewhere-filter',
+  COMMENT_GENERATION: 'comment-generation',
+  REVIEW: 'review',
+} as const;
+
+export type PipelineStage = typeof PIPELINE_STAGES[keyof typeof PIPELINE_STAGES];
diff --git a/internal-packages/ai/src/analysis-plugins/types.ts b/internal-packages/ai/src/analysis-plugins/types.ts
index 36d69b0e5..9fef6882c 100644
--- a/internal-packages/ai/src/analysis-plugins/types.ts
+++ b/internal-packages/ai/src/analysis-plugins/types.ts
@@ -116,6 +116,7 @@ export interface AnalysisResult {
   comments: Comment[];
   cost: number;
   grade?: number; // Optional grade (0-100) for quality assessment
+  pipelineTelemetry?: Record<string, unknown>; // Pipeline execution telemetry (JSON-serializable)
 }
 
 export interface SimpleAnalysisPlugin {
diff --git a/internal-packages/ai/src/claude/wrapper.ts b/internal-packages/ai/src/claude/wrapper.ts
index 26563e587..c72c459bd 100644
--- a/internal-packages/ai/src/claude/wrapper.ts
+++ b/internal-packages/ai/src/claude/wrapper.ts
@@ -1,10 +1,15 @@
 import { Anthropic } from '@anthropic-ai/sdk';
 import { createAnthropicClient } from '../utils/anthropic';
 import { ANALYSIS_MODEL, RichLLMInteraction } from '../types';
-import { withRetry } from '../utils/retryUtils';
+// Note: withRetry was deprecated in favor of inline retry logic
 import { getCurrentHeliconeHeaders } from '../helicone/simpleSessionManager';
 import { logger } from '../shared/logger';
 import { getRemainingTimeMs } from '../shared/jobContext';
+import {
+  UnifiedUsageMetrics,
+  fromAnthropicUsage,
+  AnthropicRawUsage
+} from '../utils/usageMetrics';
 
 // Centralized model configuration
 export const MODEL_CONFIG = {
@@ -13,6 +18,12 @@ export const MODEL_CONFIG = {
   forecasting: ANALYSIS_MODEL,
 } as const;
 
+/** Extended thinking configuration */
+export interface ThinkingConfig {
+  type: "enabled";
+  budget_tokens: number;
+}
+
 export interface ClaudeCallOptions {
   model?: string;
   system?: string;
@@ -25,11 +36,51 @@ export interface ClaudeCallOptions {
   enablePromptCaching?: boolean; // Enable Anthropic prompt caching
   cacheSeed?: string; // Custom cache seed for Helicone response caching
   timeout?: number; // Custom timeout in milliseconds
+  /**
+   * Extended thinking mode configuration.
+   * - true: Enable with default budget of 10000 tokens
+   * - false/undefined: Disable extended thinking
+   * - ThinkingConfig: Enable with custom budget_tokens
+   * Note: Extended thinking requires temperature=1, so temperature is ignored when enabled.
+   */
+  thinking?: boolean | ThinkingConfig;
+}
+
+/** Actual API params as sent to Anthropic */
+export interface ClaudeActualParams {
+  model: string;
+  temperature: number;
+  maxTokens: number;
+  thinking?: {
+    type: 'enabled';
+    budget_tokens: number;
+  };
+}
+
+/** Response metrics from Claude API */
+export interface ClaudeResponseMetrics {
+  success: boolean;
+  latencyMs: number;
+  inputTokens: number;
+  outputTokens: number;
+  cacheReadTokens?: number;
+  cacheWriteTokens?: number;
+  stopReason?: string;
+  errorType?: string;
+  errorMessage?: string;
+  /** Full raw usage from Anthropic API */
+  rawUsage?: AnthropicRawUsage;
 }
 
 export interface ClaudeCallResult {
   response: Anthropic.Message;
   interaction: RichLLMInteraction;
+  /** Actual params sent to API - captured right before the call */
+  actualParams: ClaudeActualParams;
+  /** Response metrics */
+  responseMetrics: ClaudeResponseMetrics;
+  /** Unified usage metrics (includes calculated cost) */
+  unifiedUsage?: UnifiedUsageMetrics;
 }
 
 function buildPromptString(
@@ -48,7 +99,7 @@ function buildPromptString(
   return prompt.trim();
 }
 
-import { isApiError, type ApiError } from '../types/errors';
+import { isApiError } from '../types/errors';
 
 function isRetryableError(error: unknown): boolean {
   // Check if this is an API error with status code
@@ -104,9 +155,11 @@ export async function callClaude(
   
   // Make API call with manual retry logic for retryable errors only
   let response: Anthropic.Messages.Message;
-  let lastError: Error | null = null;
+  let _lastError: Error | null = null;
   const maxRetries = 3;
-  
+  let actualParams: ClaudeActualParams | undefined;
+  let apiCallStartTime: number = 0;
+
   for (let attempt = 1; attempt <= maxRetries; attempt++) {
     try {
       // Add delay between retries (exponential backoff)
@@ -115,11 +168,47 @@ export async function callClaude(
         await new Promise(resolve => setTimeout(resolve, delay));
       }
 
+      // Determine if extended thinking is enabled (default: false for tool calls to save cost)
+      // When thinking is enabled, temperature must be 1 and max_tokens must be > budget_tokens
+      const thinkingEnabled = options.thinking === true || typeof options.thinking === 'object';
+      const thinkingBudget = typeof options.thinking === 'object' && options.thinking.budget_tokens
+        ? options.thinking.budget_tokens
+        : 10000; // Default budget
+      // Claude's temperature range is 0-1 (unlike some other providers that allow 0-2)
+      // Cap any out-of-range values to prevent API errors
+      const requestedTemp = options.temperature ?? 0;
+      const effectiveTemperature = thinkingEnabled ? 1 : Math.min(Math.max(requestedTemp, 0), 1);
+      // When thinking is enabled, max_tokens must be greater than budget_tokens
+      const requestedMaxTokens = options.max_tokens || 4000;
+      const effectiveMaxTokens = thinkingEnabled
+        ? Math.max(requestedMaxTokens, thinkingBudget + 1000) // Ensure max_tokens > budget_tokens with buffer
+        : requestedMaxTokens;
+
       const requestOptions: Anthropic.Messages.MessageCreateParams = {
         model,
-        max_tokens: options.max_tokens || 4000,
-        temperature: options.temperature ?? 0,
-        messages: options.messages
+        max_tokens: effectiveMaxTokens,
+        temperature: effectiveTemperature,
+        messages: options.messages,
+        // Add thinking configuration when enabled
+        ...(thinkingEnabled && {
+          thinking: {
+            type: "enabled" as const,
+            budget_tokens: thinkingBudget,
+          }
+        }),
+      };
+
+      // Capture actual params being sent to API (for telemetry)
+      actualParams = {
+        model,
+        temperature: effectiveTemperature,
+        maxTokens: effectiveMaxTokens,
+        ...(thinkingEnabled && {
+          thinking: {
+            type: 'enabled' as const,
+            budget_tokens: thinkingBudget,
+          }
+        }),
       };
       
       if (options.system) {
@@ -166,7 +255,10 @@ export async function callClaude(
           reject(new Error(`Claude API call timed out after ${timeoutMs}ms`));
         }, timeoutMs);
       });
-      
+
+      // Capture timing for telemetry
+      apiCallStartTime = Date.now();
+
       const result = await Promise.race([
         anthropic.messages.create(requestOptions),
         timeoutPromise
@@ -175,10 +267,8 @@ export async function callClaude(
         clearTimeout(timeoutId);
       });
       
-      // Validate response structure
-      if (!result || !result.content || !result.usage) {
-        throw new Error('Malformed response from Claude API');
-      }
+      // Response structure is validated by TypeScript types
+      // result is guaranteed to have content and usage by Anthropic.Messages.Message type
       
       response = result;
       
@@ -204,7 +294,7 @@ export async function callClaude(
       break; // Success, exit retry loop
       
     } catch (error) {
-      lastError = error instanceof Error ? error : new Error(String(error));
+      _lastError = error instanceof Error ? error : new Error(String(error));
       
       // If error is not retryable, throw immediately
       if (!isRetryableError(error)) {
@@ -219,20 +309,21 @@ export async function callClaude(
       // Otherwise, continue to next retry attempt
     }
   }
-  
-  if (!response!) {
-    throw lastError || new Error('Max retries exhausted');
-  }
+
+  // TypeScript can't prove response is set (even though our logic guarantees it)
+  // because the for loop could theoretically complete without breaking.
+  // We know this is impossible because we always either break or throw.
+  const finalResponse = response!;
 
   // Automatically create interaction with proper format
   const interaction: RichLLMInteraction = {
     model,
     prompt: buildPromptString(options.system, options.messages),
-    response: JSON.stringify(response.content),
+    response: JSON.stringify(finalResponse.content),
     tokensUsed: {
-      prompt: response.usage.input_tokens,
-      completion: response.usage.output_tokens,
-      total: response.usage.input_tokens + response.usage.output_tokens
+      prompt: finalResponse.usage.input_tokens,
+      completion: finalResponse.usage.output_tokens,
+      total: finalResponse.usage.input_tokens + finalResponse.usage.output_tokens
     },
     timestamp: new Date(),
     duration: Date.now() - startTime
@@ -243,7 +334,39 @@ export async function callClaude(
     previousInteractions.push(interaction);
   }
 
-  return { response, interaction };
+  // Build raw usage object for unified metrics
+  const rawUsage: AnthropicRawUsage = {
+    input_tokens: finalResponse.usage.input_tokens,
+    output_tokens: finalResponse.usage.output_tokens,
+    cache_creation_input_tokens: (finalResponse.usage as { cache_creation_input_tokens?: number }).cache_creation_input_tokens,
+    cache_read_input_tokens: (finalResponse.usage as { cache_read_input_tokens?: number }).cache_read_input_tokens,
+  };
+
+  // Calculate latency
+  const latencyMs = apiCallStartTime > 0 ? Date.now() - apiCallStartTime : Date.now() - startTime;
+
+  // Build response metrics for telemetry
+  const responseMetrics: ClaudeResponseMetrics = {
+    success: true,
+    latencyMs,
+    inputTokens: finalResponse.usage.input_tokens,
+    outputTokens: finalResponse.usage.output_tokens,
+    cacheReadTokens: rawUsage.cache_read_input_tokens,
+    cacheWriteTokens: rawUsage.cache_creation_input_tokens,
+    stopReason: finalResponse.stop_reason || undefined,
+    rawUsage,
+  };
+
+  // Build unified usage metrics (includes calculated cost)
+  const unifiedUsage = fromAnthropicUsage(rawUsage, model, latencyMs);
+
+  return {
+    response: finalResponse,
+    interaction,
+    actualParams: actualParams!,
+    responseMetrics,
+    unifiedUsage,
+  };
 }
 
 /**
@@ -257,6 +380,10 @@ export async function callClaudeWithTool<T>(
   },
   previousInteractions?: RichLLMInteraction[]
 ): Promise<ClaudeCallResult & { toolResult: T }> {
+  // When thinking is enabled, we must use tool_choice: 'auto' because
+  // forced tool_choice is incompatible with extended thinking
+  const thinkingEnabled = options.thinking === true || typeof options.thinking === 'object';
+
   const toolOptions: ClaudeCallOptions = {
     ...options,
     tools: [{
@@ -264,7 +391,10 @@ export async function callClaudeWithTool<T>(
       description: options.toolDescription,
       input_schema: options.toolSchema
     }],
-    tool_choice: { type: "tool", name: options.toolName },
+    // Use 'auto' when thinking is enabled, otherwise force the specific tool
+    tool_choice: thinkingEnabled
+      ? { type: "auto" }
+      : { type: "tool", name: options.toolName },
     cacheSeed: options.cacheSeed // Pass through cache seed
   };
 
@@ -293,8 +423,8 @@ export async function callClaudeWithTool<T>(
   );
   if (!toolUse) {
     // Enhanced error message to check for max_tokens issue
-    const stopReason = result.response.stop_reason as string;
-    if (stopReason === 'max_tokens') {
+    // Note: 'max_tokens' stop reason exists but isn't in the SDK types
+    if ((result.response.stop_reason as string) === 'max_tokens') {
       throw new Error(
         `No tool use found - response was truncated due to max_tokens limit (${options.max_tokens || 4000} tokens)`
       );
@@ -307,12 +437,12 @@ export async function callClaudeWithTool<T>(
 
   // Check if tool input is empty or malformed (often happens with max_tokens)
   if (!toolUse.input || Object.keys(toolUse.input).length === 0) {
-    const stopReason = result.response.stop_reason as string;
-    if (stopReason === 'max_tokens') {
+    // Note: 'max_tokens' stop reason exists but isn't in the SDK types
+    if ((result.response.stop_reason as string) === 'max_tokens') {
       logger.error('[Claude] Tool returned empty due to max_tokens truncation', {
         tool: options.toolName,
         max_tokens: options.max_tokens || 4000,
-        stop_reason: stopReason
+        stop_reason: result.response.stop_reason
       });
       
       throw new Error(
diff --git a/internal-packages/ai/src/index.ts b/internal-packages/ai/src/index.ts
index 74f394cc4..834a997e2 100644
--- a/internal-packages/ai/src/index.ts
+++ b/internal-packages/ai/src/index.ts
@@ -20,8 +20,23 @@ export type { SimpleSessionConfig } from './helicone/simpleSessionManager';
 export * from './utils/tokenUtils';
 export * from './utils/anthropic';
 export * from './utils/retryUtils';
-export * from './utils/openrouter';
+// Client-safe openrouter types and constants (no async API functions)
+export * from './utils/openrouter-types';
+export * from './utils/allModels';
+// Client-safe reasoning budget utilities (no logger dependency)
+export * from './utils/reasoningBudget-client';
+export * from './utils/modelConfigResolver';
 export * from './types';
+// Export common types (note: ReasoningEffort is also exported from openrouter with same definition,
+// and ReasoningConfig has different meaning in openrouter - profile config vs API format)
+export {
+  type ReasoningConfig as ProfileReasoningConfig,
+  type ProviderPreferences,
+  type ActualApiParams,
+  type ApiResponseMetrics,
+  EFFORT_TO_BUDGET_TOKENS,
+  effortToBudgetTokens,
+} from './types/common';
 
 // Configuration
 export { initializeAI, type AIConfig } from './config';
@@ -36,27 +51,58 @@ export {
   type ModelName
 } from './utils/costCalculator';
 
+// Unified usage metrics
+export {
+  type UnifiedUsageMetrics,
+  type OpenRouterRawUsage,
+  type AnthropicRawUsage,
+  fromOpenRouterUsage,
+  fromAnthropicUsage,
+  calculateAnthropicCost,
+  getAnthropicPricing,
+  formatCost,
+  aggregateUsageMetrics,
+  ANTHROPIC_PRICING,
+} from './utils/usageMetrics';
+
 // Tool configs and generated metadata
 export * from './tools/configs';
 export { toolSchemas, getToolSchema, type ToolId } from './tools/generated-schemas';
 export { toolReadmes, getToolReadme } from './tools/generated-readmes';
 
-// Tool types (implementations in @roast/ai/server)
-export type { Tool, ToolContext, ToolConfig } from './tools/base/Tool';
-export type { DocumentChunkerOutput } from './tools/document-chunker';
-export type { TextLocationFinderOutput } from './tools/smart-text-searcher';
-export type { CheckMathOutput } from './tools/math-validator-llm';
-export type { CheckMathAgenticOutput as CheckMathWithMathJSOutput } from './tools/math-validator-mathjs/types';
-export type { CheckSpellingGrammarOutput, SpellingGrammarError } from './tools/spelling-grammar-checker';
-export type { ExtractFactualClaimsOutput, ExtractedFactualClaim } from './tools/factual-claims-extractor';
-export type { ExtractForecastingClaimsOutput, ExtractedForecast } from './tools/binary-forecasting-claims-extractor';
-export type { ExtractMathExpressionsOutput, ExtractedMathExpression } from './tools/math-expressions-extractor';
-export type { DetectLanguageConventionOutput } from './tools/language-convention-detector';
-export type { MathErrorDetails, MathVerificationStatus } from './tools/shared/math-schemas';
+// Default prompts for fallacy extractor (used by profile editor UI)
+export {
+  DEFAULT_EXTRACTOR_SYSTEM_PROMPT,
+  DEFAULT_EXTRACTOR_USER_PROMPT,
+} from './tools/fallacy-extractor/prompts';
+
+// Tool types - client-safe imports from types.ts (no logger dependency)
+// NOTE: Tool class is server-only, import from @roast/ai/server
+export type { ToolContext, ToolConfig } from './tools/base/types';
+
+// Client-safe tool output types (extracted to avoid server dependencies)
+export type {
+  DocumentChunk,
+  DocumentChunkerOutput,
+  TextLocationFinderOutput,
+  CheckMathOutput,
+  CheckMathWithMathJSOutput,
+  CheckSpellingGrammarOutput,
+  SpellingGrammarError,
+  ExtractFactualClaimsOutput,
+  ExtractedFactualClaim,
+  ExtractForecastingClaimsOutput,
+  ExtractedForecast,
+  ExtractMathExpressionsOutput,
+  ExtractedMathExpression,
+  DetectLanguageConventionOutput,
+  MathErrorDetails,
+  MathVerificationStatus,
+  MathErrorType,
+  MathSeverity,
+  MathVerificationDetails,
+} from './tools/client-types';
 
-// Plugin types (implementations in @roast/ai/server)
-export type { FullDocumentAnalysisResult } from './analysis-plugins/PluginManager';
-export type { Finding } from './analysis-plugins/types';
 export { PluginType } from './analysis-plugins/types/plugin-types';
 
 // Document and agent schemas
diff --git a/internal-packages/ai/src/server.ts b/internal-packages/ai/src/server.ts
index 584aeff18..3da0397d3 100644
--- a/internal-packages/ai/src/server.ts
+++ b/internal-packages/ai/src/server.ts
@@ -9,6 +9,7 @@ export * from './claude/wrapper';
 
 // Document analysis workflows (uses logger)
 export * from './workflows';
+export type { DocumentAnalysisResult } from './workflows';
 
 // Tools system - full implementations
 export * from './tools';
@@ -42,5 +43,19 @@ export {
   JobTimeoutError
 } from './shared/jobContext';
 
+// Reasoning budget resolver (full async version with caching)
+export {
+  resolveReasoningBudget,
+  invalidateEndpointsCache,
+  invalidateAllEndpointsCache,
+} from './utils/reasoningBudget';
+
+// OpenRouter API functions (server-only - require API calls)
+export {
+  callOpenRouter,
+  callOpenRouterChat,
+  callOpenRouterWithTool,
+} from './utils/openrouter';
+
 // Meta-evaluation system (uses Claude API)
 export * from './meta-eval';
\ No newline at end of file
diff --git a/internal-packages/ai/src/tools/base/Tool.ts b/internal-packages/ai/src/tools/base/Tool.ts
index df5b7a01d..47be775b2 100644
--- a/internal-packages/ai/src/tools/base/Tool.ts
+++ b/internal-packages/ai/src/tools/base/Tool.ts
@@ -1,24 +1,10 @@
 import { z } from 'zod';
 import { zodToJsonSchema } from 'zod-to-json-schema';
-import { logger as defaultLogger } from '../../shared/logger';
 import { getGlobalSessionManager } from '../../helicone/simpleSessionManager';
+import type { ToolConfig, ToolContext } from './types';
 
-export interface ToolConfig {
-  id: string;
-  name: string;
-  description: string;
-  version: string;
-  category: 'extraction' | 'checker' | 'research' | 'utility';
-  costEstimate?: string;
-  path?: string; // UI route path
-  status?: 'stable' | 'experimental' | 'beta';
-}
-
-export interface ToolContext {
-  userId?: string;
-  apiKey?: string;
-  logger: typeof defaultLogger;
-}
+// Re-export types for backwards compatibility
+export type { ToolConfig, ToolContext } from './types';
 
 export abstract class Tool<TInput = unknown, TOutput = unknown> {
   abstract config: ToolConfig;
@@ -28,13 +14,13 @@ export abstract class Tool<TInput = unknown, TOutput = unknown> {
   // Core method that all tools implement
   abstract execute(input: TInput, context: ToolContext): Promise<TOutput>;
   
-  // Optional hooks
-  async validateAccess(context: ToolContext): Promise<boolean> {
-    return true;
+  // Optional hooks (subclasses can override as async if needed)
+  validateAccess(_context: ToolContext): Promise<boolean> {
+    return Promise.resolve(true);
   }
-  
-  async beforeExecute(input: TInput, context: ToolContext): Promise<void> {}
-  async afterExecute(output: TOutput, context: ToolContext): Promise<void> {}
+
+  beforeExecute(_input: TInput, _context: ToolContext): Promise<void> { return Promise.resolve(); }
+  afterExecute(_output: TOutput, _context: ToolContext): Promise<void> { return Promise.resolve(); }
   
   // Schema conversion methods
   getInputJsonSchema() {
diff --git a/internal-packages/ai/src/tools/base/testRunner.ts b/internal-packages/ai/src/tools/base/testRunner.ts
index 72266b45f..0fea92a15 100644
--- a/internal-packages/ai/src/tools/base/testRunner.ts
+++ b/internal-packages/ai/src/tools/base/testRunner.ts
@@ -239,8 +239,9 @@ export async function runTestSuite<TInput, TExpected>(
   const results: TestResult[] = [];
   let totalScore = 0;
   
-  const testCasesToRun = options.filter 
-    ? suite.testCases.filter(tc => options.filter!.includes(tc.id))
+  const filterIds = options.filter;
+  const testCasesToRun = filterIds
+    ? suite.testCases.filter(tc => filterIds.includes(tc.id))
     : suite.testCases;
   
   for (let i = 0; i < testCasesToRun.length; i++) {
diff --git a/internal-packages/ai/src/tools/base/types.ts b/internal-packages/ai/src/tools/base/types.ts
index cd587c169..e765ecd82 100644
--- a/internal-packages/ai/src/tools/base/types.ts
+++ b/internal-packages/ai/src/tools/base/types.ts
@@ -1,5 +1,46 @@
 import { z } from 'zod';
 
+// ============================================================================
+// Tool Metadata Types (client-safe, no runtime dependencies)
+// ============================================================================
+
+/**
+ * Tool configuration metadata
+ */
+export interface ToolConfig {
+  id: string;
+  name: string;
+  description: string;
+  version: string;
+  category: 'extraction' | 'checker' | 'research' | 'utility';
+  costEstimate?: string;
+  path?: string; // UI route path
+  status?: 'stable' | 'experimental' | 'beta';
+}
+
+/**
+ * Logger interface that tools use for structured logging
+ */
+export interface Logger {
+  info: (message: string, ...args: unknown[]) => void;
+  warn: (message: string, ...args: unknown[]) => void;
+  error: (message: string, ...args: unknown[]) => void;
+  debug: (message: string, ...args: unknown[]) => void;
+}
+
+/**
+ * Context provided to tools during execution
+ */
+export interface ToolContext {
+  userId?: string;
+  apiKey?: string;
+  logger: Logger;
+}
+
+// ============================================================================
+// Common Response Types
+// ============================================================================
+
 // Common response wrapper
 export const toolResponseSchema = z.object({
   success: z.boolean(),
diff --git a/internal-packages/ai/src/tools/claim-evaluator/index.ts b/internal-packages/ai/src/tools/claim-evaluator/index.ts
index fa7bee177..de9fa506a 100644
--- a/internal-packages/ai/src/tools/claim-evaluator/index.ts
+++ b/internal-packages/ai/src/tools/claim-evaluator/index.ts
@@ -1,19 +1,15 @@
 import { z } from "zod";
 import { Tool, ToolContext } from "../base/Tool";
 import { claimEvaluatorConfig } from "../configs";
-import { createOpenRouterClient, OPENROUTER_MODELS, normalizeTemperature } from "../../utils/openrouter";
+import { callOpenRouterChat, OPENROUTER_MODELS, normalizeTemperature } from "../../utils/openrouter";
 import { HeliconeSessionManager, setGlobalSessionManager } from "../../helicone/simpleSessionManager";
 
 // Import from new modules
 import { generateClaimEvaluatorPrompt, DEFAULT_EXPLANATION_LENGTH } from "./prompt";
 import {
-  RefusalReason,
   TokenUsage,
-  MessageWithReasoning,
   EvaluationError,
   ClaimEvaluatorInput,
-  ModelEvaluation,
-  FailedEvaluation,
   EvaluationResult,
   ClaimEvaluatorOutput,
   extractProvider,
@@ -140,7 +136,6 @@ const outputSchema = z.object({
  * Evaluate a claim with a single model via OpenRouter with timeout
  */
 async function evaluateWithModel(
-  client: ReturnType<typeof createOpenRouterClient>,
   input: ClaimEvaluatorInput,
   model: string,
   context: ToolContext,
@@ -173,14 +168,14 @@ async function evaluateWithModel(
         { provider, model: modelName },
         async () => {
           return Promise.race([
-            evaluateWithModelImpl(client, input, model, context),
+            evaluateWithModelImpl(input, model, context),
             timeoutPromise,
           ]);
         }
       );
     } else {
       return Promise.race([
-        evaluateWithModelImpl(client, input, model, context),
+        evaluateWithModelImpl(input, model, context),
         timeoutPromise,
       ]);
     }
@@ -200,7 +195,6 @@ async function evaluateWithModel(
  * Implementation of model evaluation (wrapped with timeout)
  */
 async function evaluateWithModelImpl(
-  client: ReturnType<typeof createOpenRouterClient>,
   input: ClaimEvaluatorInput,
   model: string,
   context: ToolContext
@@ -233,37 +227,25 @@ async function evaluateWithModelImpl(
 
     // Track response time
     const startTime = Date.now();
-    const completion = await client.chat.completions.create(
-      {
-        model,
-        messages: [
-          {
-            role: 'user',
-            content: uniquePrompt,
-          },
-        ],
-        max_tokens: maxTokens,
-        temperature: actualTemperature, // Normalized per provider (Anthropic 0-1, others 0-2)
-        // Use OpenRouter's standard response_format parameter for JSON mode
-        // Works across all providers (OpenAI, Gemini, etc.) through OpenRouter
-        response_format: { type: 'json_object' },
-      },
-      {
-        // Pass headers to disable caching (via request options)
-        // Helicone caching: Use unique seed per request to prevent cache hits
-        headers: {
-          'X-No-Cache': 'true',
-          'Helicone-Cache-Enabled': 'false',
-          'Helicone-Cache-Seed': uniqueId, // Unique seed ensures no cache reuse
-        } as Record<string, string>,
-      }
-    );
+    const completion = await callOpenRouterChat({
+      model,
+      messages: [
+        {
+          role: 'user',
+          content: uniquePrompt,
+        },
+      ],
+      max_tokens: maxTokens,
+      temperature: actualTemperature, // Normalized per provider (Anthropic 0-1, others 0-2)
+      // Use OpenRouter's standard response_format parameter for JSON mode
+      // Works across all providers (OpenAI, Gemini, etc.) through OpenRouter
+      response_format: { type: 'json_object' },
+    });
     responseTimeMs = Date.now() - startTime;
 
-    const message = completion.choices[0]?.message as MessageWithReasoning | undefined;
-    rawContent = message?.content || undefined;
+    rawContent = completion.content || undefined;
     // Capture reasoning from both GPT-5 (reasoning) and o1/o3 (reasoning_content)
-    rawThinking = message?.reasoning || message?.reasoning_content || undefined;
+    rawThinking = completion.reasoning || undefined;
     rawTokenUsage = completion.usage as TokenUsage | undefined;
 
     if (!rawContent) {
@@ -284,7 +266,7 @@ async function evaluateWithModelImpl(
     let parsed: unknown;
     try {
       parsed = extractAndParseJSON(rawContent);
-    } catch (parseError: unknown) {
+    } catch {
       // Don't expose JSON parsing details to end users
       throw createEvaluationError(
         'Model returned invalid JSON',
@@ -404,8 +386,6 @@ export class ClaimEvaluatorTool extends Tool<ClaimEvaluatorInput, ClaimEvaluator
     setGlobalSessionManager(sessionManager);
 
     try {
-      const client = createOpenRouterClient();
-
       // Create array of all model-run combinations
       // Each model will be run 'runs' times independently
       const modelRuns: Array<{ model: string; runIndex: number }> = [];
@@ -417,7 +397,7 @@ export class ClaimEvaluatorTool extends Tool<ClaimEvaluatorInput, ClaimEvaluator
 
       // Evaluate with all models in parallel (across all runs)
       const results = await Promise.allSettled(
-        modelRuns.map(({ model }) => evaluateWithModel(client, input, model, context, sessionManager))
+        modelRuns.map(({ model }) => evaluateWithModel(input, model, context, sessionManager))
       );
 
       // Process results, maintaining index correspondence with modelRuns
@@ -530,7 +510,7 @@ export class ClaimEvaluatorTool extends Tool<ClaimEvaluatorInput, ClaimEvaluator
     }
   }
 
-  override async beforeExecute(
+  override beforeExecute(
     input: ClaimEvaluatorInput,
     context: ToolContext
   ): Promise<void> {
@@ -539,9 +519,10 @@ export class ClaimEvaluatorTool extends Tool<ClaimEvaluatorInput, ClaimEvaluator
     context.logger.info(
       `[ClaimEvaluator] Starting evaluation with ${modelCount} models, ${runs} run(s) each (${modelCount * runs} total evaluations)`
     );
+    return Promise.resolve();
   }
 
-  override async afterExecute(
+  override afterExecute(
     output: ClaimEvaluatorOutput,
     context: ToolContext
   ): Promise<void> {
@@ -549,6 +530,7 @@ export class ClaimEvaluatorTool extends Tool<ClaimEvaluatorInput, ClaimEvaluator
     context.logger.info(
       `[ClaimEvaluator] Completed with ${successCount} successful evaluations`
     );
+    return Promise.resolve();
   }
 }
 
diff --git a/internal-packages/ai/src/tools/client-types.ts b/internal-packages/ai/src/tools/client-types.ts
new file mode 100644
index 000000000..1cd76043d
--- /dev/null
+++ b/internal-packages/ai/src/tools/client-types.ts
@@ -0,0 +1,223 @@
+/**
+ * Client-safe tool types
+ *
+ * These types are extracted from tool implementations to avoid pulling in
+ * server-side dependencies when importing types for UI components.
+ *
+ * DO NOT import anything from tool index files or base/Tool.ts here.
+ */
+
+import type { LanguageConvention } from '../shared/types';
+
+// ============================================================================
+// Document Chunker Types
+// ============================================================================
+
+export interface DocumentChunk {
+  id: string;
+  text: string;
+  startOffset: number;
+  endOffset: number;
+  startLine: number;
+  endLine: number;
+  metadata: {
+    type?: 'paragraph' | 'section' | 'code' | 'list' | 'heading' | 'mixed';
+    headingContext?: string[];
+    isComplete: boolean;
+    confidence: number;
+  };
+}
+
+export interface DocumentChunkerOutput {
+  chunks: DocumentChunk[];
+  metadata: {
+    totalChunks: number;
+    averageChunkSize: number;
+    strategy: string;
+    warnings?: string[];
+  };
+}
+
+// ============================================================================
+// Text Location Finder Types
+// ============================================================================
+
+export interface TextLocationFinderOutput {
+  searchText: string;
+  found: boolean;
+  location?: {
+    startOffset: number;
+    endOffset: number;
+    quotedText: string;
+    strategy: string;
+    confidence: number;
+  };
+  error?: string;
+  processingTimeMs: number;
+  llmUsed?: boolean;
+}
+
+// ============================================================================
+// Math Validator Types
+// ============================================================================
+
+export type MathVerificationStatus = 'verified_true' | 'verified_false' | 'verified_warning' | 'cannot_verify';
+export type MathErrorType = 'calculation' | 'logic' | 'unit' | 'notation' | 'conceptual';
+export type MathSeverity = 'critical' | 'major' | 'minor';
+
+export interface MathErrorDetails {
+  errorType: MathErrorType;
+  severity: MathSeverity;
+  displayCorrection: string;
+  expectedValue?: string;
+  actualValue?: string;
+  steps?: Array<{
+    expression: string;
+    result: string;
+  }>;
+}
+
+export interface MathVerificationDetails {
+  mathJsExpression?: string;
+  computedValue?: string;
+  steps?: Array<{
+    expression: string;
+    result: string;
+  }>;
+}
+
+export interface CheckMathOutput {
+  statement: string;
+  status: MathVerificationStatus;
+  explanation: string;
+  reasoning: string;
+  errorDetails?: MathErrorDetails;
+}
+
+// CheckMathWithMathJSOutput is an alias for the agentic math validator output
+export interface CheckMathWithMathJSOutput {
+  statement: string;
+  status: MathVerificationStatus;
+  explanation: string;
+  verificationDetails?: MathVerificationDetails;
+  errorDetails?: MathErrorDetails;
+  error?: string;
+}
+
+// ============================================================================
+// Spelling & Grammar Checker Types
+// ============================================================================
+
+export interface SpellingGrammarError {
+  text: string;
+  correction: string;
+  displayCorrection: string;
+  type: 'spelling' | 'grammar';
+  context?: string;
+  importance: number;
+  confidence: number;
+  description?: string;
+  lineNumber?: number;
+}
+
+export interface CheckSpellingGrammarOutput {
+  errors: SpellingGrammarError[];
+  metadata?: {
+    totalErrorsFound: number;
+    convention: LanguageConvention | 'mixed';
+    processingTime?: number;
+  };
+}
+
+// ============================================================================
+// Factual Claims Extractor Types
+// ============================================================================
+
+export interface ExtractedFactualClaim {
+  exactText: string;
+  claim: string;
+  topic: string;
+  importanceScore: number;
+  checkabilityScore: number;
+  truthProbability: number;
+  highlight?: {
+    startOffset: number;
+    endOffset: number;
+    quotedText: string;
+    isValid: boolean;
+    prefix?: string;
+    error?: string;
+  };
+}
+
+export interface ExtractFactualClaimsOutput {
+  claims: ExtractedFactualClaim[];
+  summary: {
+    totalFound: number;
+    aboveThreshold: number;
+    averageQuality: number;
+  };
+}
+
+// ============================================================================
+// Forecasting Claims Extractor Types
+// ============================================================================
+
+export interface ExtractedForecast {
+  originalText: string;
+  thinking: string;
+  precisionScore: number;
+  verifiabilityScore: number;
+  importanceScore: number;
+  rewrittenPredictionText: string;
+  authorProbability?: number;
+  robustnessScore: number;
+  resolutionDate?: string;
+  minimalProbabilitySpan?: string;
+}
+
+export interface ExtractForecastingClaimsOutput {
+  forecasts: ExtractedForecast[];
+}
+
+// ============================================================================
+// Math Expressions Extractor Types
+// ============================================================================
+
+export interface ExtractedMathExpression {
+  originalText: string;
+  hasError: boolean;
+  errorType?: MathErrorType;
+  errorExplanation?: string;
+  correctedVersion?: string;
+  displayCorrection?: string;
+  complexityScore: number;
+  contextImportanceScore: number;
+  errorSeverityScore: number;
+  simplifiedExplanation?: string;
+  verificationStatus: 'verified' | 'unverified' | 'unverifiable';
+  severity?: MathSeverity;
+}
+
+export interface ExtractMathExpressionsOutput {
+  expressions: ExtractedMathExpression[];
+}
+
+// ============================================================================
+// Language Convention Detector Types
+// ============================================================================
+
+export interface DetectLanguageConventionOutput {
+  convention: LanguageConvention;
+  confidence: number;
+  consistency: number;
+  evidence: Array<{
+    word: string;
+    convention: LanguageConvention;
+    count: number;
+  }>;
+  documentType?: {
+    type: 'academic' | 'technical' | 'blog' | 'casual' | 'unknown';
+    confidence: number;
+  };
+}
diff --git a/internal-packages/ai/src/tools/configs.ts b/internal-packages/ai/src/tools/configs.ts
index ede583b23..9b49e6977 100644
--- a/internal-packages/ai/src/tools/configs.ts
+++ b/internal-packages/ai/src/tools/configs.ts
@@ -1,7 +1,7 @@
 // Tool configurations for client-side use (no implementations)
 // These can be safely imported in browser code
 // This is the single source of truth for tool metadata
-import type { ToolConfig } from "./base/Tool";
+import type { ToolConfig } from "./base/types";
 
 // ============================================================================
 // Tool Configs - Single Source of Truth
diff --git a/internal-packages/ai/src/tools/fallacy-extractor/index.ts b/internal-packages/ai/src/tools/fallacy-extractor/index.ts
index 40cba3ebc..09d10c51f 100644
--- a/internal-packages/ai/src/tools/fallacy-extractor/index.ts
+++ b/internal-packages/ai/src/tools/fallacy-extractor/index.ts
@@ -4,21 +4,28 @@ import {
   ISSUE_TYPES,
 } from "../../analysis-plugins/plugins/fallacy-check/constants";
 import { callClaudeWithTool } from "../../claude/wrapper";
+import { callOpenRouterWithTool } from "../../utils/openrouter";
 import {
   Tool,
   ToolContext,
 } from "../base/Tool";
 import { fallacyExtractorConfig } from "../configs";
 import { generateCacheSeed } from "../shared/cache-utils";
+import { withDateContext } from "../shared/llm-filter-utils";
 import fuzzyTextLocatorTool from "../smart-text-searcher";
 import { findLocationInChunk } from "../smart-text-searcher/chunk-location-finder";
+import type { UnifiedUsageMetrics } from "../../utils/usageMetrics";
 import type {
   FallacyExtractorInput,
   FallacyExtractorOutput,
   ExtractedFallacyIssue,
+  ActualApiParams,
+  ApiResponseMetrics,
 } from "./types";
-
-// Removed severity-calibration and genre imports - we trust the LLM's scores
+import {
+  DEFAULT_EXTRACTOR_SYSTEM_PROMPT,
+  DEFAULT_EXTRACTOR_USER_PROMPT,
+} from "./prompts";
 
 // Zod schemas
 const extractedFallacyIssueSchema = z.object({
@@ -79,9 +86,19 @@ const extractedFallacyIssueSchema = z.object({
 }) satisfies z.ZodType<ExtractedFallacyIssue>;
 
 const inputSchema = z.object({
-  text: z.string().min(1).max(50000).describe("Text chunk to analyze for epistemic issues and logical fallacies"),
-  documentText: z.string().optional().describe("Full document text (optional, used for accurate location finding)"),
+  text: z.string().max(50000).optional().describe("Text chunk to analyze (optional if documentText provided)"),
+  documentText: z.string().optional().describe("Full document text - used for analysis in single-pass mode, or for location finding in chunk mode"),
   chunkStartOffset: z.number().min(0).optional().describe("Byte offset where this chunk starts in the full document (optimization for location finding)"),
+  model: z.string().optional().describe("Model to use (Claude or OpenRouter model ID)"),
+  temperature: z.union([
+    z.number().min(0).max(2),
+    z.literal('default'),
+  ]).optional().describe("Temperature for extraction (default: 0 for Claude, 0.1 for OpenRouter, 'default' to use model's native default)"),
+  thinking: z.boolean().optional().describe("Enable extended thinking/reasoning (default: true for Claude, varies for OpenRouter)"),
+  customSystemPrompt: z.string().optional().describe("Custom system prompt override"),
+  customUserPrompt: z.string().optional().describe("Custom user prompt override (document text appended)"),
+  minSeverityThreshold: z.number().min(0).max(100).optional().describe("Minimum severity threshold (default: 60)"),
+  maxIssues: z.number().min(1).max(100).optional().describe("Maximum issues to return (default: 15)"),
 }) satisfies z.ZodType<FallacyExtractorInput>;
 
 const outputSchema = z.object({
@@ -104,242 +121,266 @@ export class FallacyExtractorTool extends Tool<
   ): Promise<FallacyExtractorOutput> {
     const executionStartTime = Date.now();
 
-    // Hardcoded configuration
-    const MIN_SEVERITY_THRESHOLD = 60; // Only report significant issues
-    const MAX_ISSUES = 15; // Limit to prevent overwhelming output
+    // Configuration - use input overrides or defaults
+    const MIN_SEVERITY_THRESHOLD = input.minSeverityThreshold ?? 60;
+    const MAX_ISSUES = input.maxIssues ?? 15;
+
+    // Use documentText for analysis if text is not provided (single-pass mode)
+    // This allows callers to just pass documentText for full-document analysis
+    const textToAnalyze = input.text || input.documentText || "";
+
+    // Prompt version for tracking - update this when prompt changes
+    const PROMPT_VERSION = "v2-justification-check";
+
+    // Determine which model to use:
+    // 1. input.model (explicit override)
+    // 2. FALLACY_EXTRACTOR_MODEL env var (for testing different models)
+    // 3. Default (Claude via callClaudeWithTool which uses its own default)
+    const modelId = input.model || process.env.FALLACY_EXTRACTOR_MODEL || undefined;
+    const isOpenRouterModel = modelId?.includes("/") || false; // OpenRouter models have format "provider/model"
+
+    // Debug logging for development
+    context.logger.debug(
+      `[FallacyExtractor] Running: model=${modelId || "default"} (${isOpenRouterModel ? "OpenRouter" : "Claude"}), mode=${input.text ? "chunk" : "single-pass"}, docLength=${textToAnalyze.length}`
+    );
 
     // Audit log: Tool execution started
     context.logger.info(
       "[FallacyExtractor] AUDIT: Tool execution started",
       {
         timestamp: new Date().toISOString(),
-        textLength: input.text.length,
+        promptVersion: PROMPT_VERSION,
+        textLength: textToAnalyze.length,
+        textPreview: textToAnalyze.substring(0, 100),
         minSeverityThreshold: MIN_SEVERITY_THRESHOLD,
         maxIssues: MAX_ISSUES,
         hasDocumentText: !!input.documentText,
         hasChunkOffset: input.chunkStartOffset !== undefined,
+        mode: input.text ? "chunk" : "single-pass",
       }
     );
 
     context.logger.info(
-      `[FallacyExtractor] Analyzing text for epistemic issues`
+      `[FallacyExtractor] PROMPT_VERSION=${PROMPT_VERSION} MODE=${input.text ? "chunk" : "single-pass"} DOC_LENGTH=${textToAnalyze.length}`
     );
 
-    const systemPrompt = `You are an expert epistemic critic analyzing reasoning quality and argumentation.
-
-**FOCUS**: Sophisticated epistemic issues, NOT basic fact-checking (handled by other tools).
-
-**🚨 CRITICAL: COMMITTING vs DISCUSSING**
-- Do NOT flag authors EXPLAINING, WARNING about, or ACKNOWLEDGING errors (good epistemics!)
-- Only flag authors MAKING the error themselves
-
-**🎯 SELECTIVITY**: Senior reviewer, not pedantic nitpicker.
-- Only flag issues that significantly mislead, clearly commit error, and matter to the argument
-- Default to NOT flagging. Aim for ~5-10 high-quality issues, not 20+ marginal ones
-- Only report severity ≥ 60, confidence ≥ 70
-
-**FALSE POSITIVE Examples (do NOT flag):**
-1. "Selection bias is a major problem in hiring research because we only see candidates who apply"
-   → Author EXPLAINING the concept, not committing the error
-2. "Be careful not to generalize from a single case study"
-   → Author WARNING about error
-3. "There isn't a cheap way to run true RCTs on hiring, so we're stuck with observational data and its selection biases"
-   → Author ACKNOWLEDGING limitation (good epistemics!)
-
-**TRUE POSITIVE Examples (DO flag):**
-1. "Our clients love us! 95% would recommend us to a friend"
-   → COMMITTING survivorship bias (only surveying existing clients)
-2. "Studies show that our approach is highly effective"
-   → COMMITTING weasel words (vague authority without citation)
-3. "Since launching in March 2020, we've delivered 847% returns"
-   → COMMITTING cherry-picked timeframe (market bottom)
-
-**CORE AREAS (prioritize these):**
-
-1. **Statistical Reasoning Errors**
-   - Base rate neglect (ignoring prior probabilities)
-   - Survivorship bias (only examining success cases)
-   - Selection bias (non-random samples)
-   - Framing: absolute vs relative risk ("50% increase" = 2% to 3%)
-
-2. **Sophisticated Logical Fallacies**
-   - False dichotomy (only presenting two options)
-   - Motte-bailey (defending weak claim by switching to strong one)
-   - Circular reasoning (conclusion in premises)
-   - Hasty generalization (insufficient evidence → broad claim)
-
-3. **Framing & Rhetorical Manipulation**
-   - Anchoring (biasing judgment with reference points)
-   - Denominator neglect ("10 deaths" vs "10 per million")
-   - Cherry-picked timeframes (ignoring unfavorable periods)
-   - False precision ("exactly 47.3%" when rough estimate warranted)
-
-4. **Suspicious Numbers**
-   - False precision: "47.3% annual returns" from "internal study"
-   - Too perfect: 98%, 99%, 99.9%, 100% = suspiciously high
-   - Impossibly exact: "Exactly 10x returns" vs "approximately 10x"
-
-5. **Missing Crucial Context**
-   - Only flag when you KNOW what's missing and it significantly changes interpretation
-   - Examples: Cherry-picked time periods, undisclosed conflicts of interest, missing comparison groups
-
-6. **Bad Faith Argumentation**
-   - Strawmanning (misrepresenting opposing views)
-   - Moving goalposts (changing criteria when challenged)
-   - Quote mining (taking quotes out of context)
-   - Whataboutism (deflecting criticism by pointing elsewhere)
-
-7. **Causal Reasoning Errors**
-   - Confounding variables (third variable causes both X and Y)
-   - Reverse causation (getting direction backwards)
-   - Post hoc ergo propter hoc ("after this, therefore because of this")
-
-8. **Temporal & Historical Errors**
-   - Hindsight bias ("I knew it all along" after outcome known)
-   - Cherry-picked timeframes: March 2020 (COVID bottom), March 2009 (financial crisis bottom)
-   - Suspiciously short time periods (<2 years for market claims)
-
-9. **Narrative Content Issues**
-   - Vague claims: "Amazing project", "great work" without specifics
-   - Uncritical authority appeals: "Worked at Google" (in what capacity?)
-   - Selective self-presentation: Only mentioning successes, hiding failures
-   - Implied causation: "After I joined, the company grew 10x" (post hoc)
-
-**AVOID FLAGGING** (other tools handle): Basic fact verification, math errors, grammar, probability forecasts
-
-**Severity Scoring** (0-100):
-- 80-100: Egregious manipulation seriously distorting reality (rare!)
-- 60-79: Clear, significant reasoning error affecting core claims
-- 40-59: Moderate issue (usually skip)
-- Below 40: Skip
-
-**For each issue provide:**
-- exactText: Exact text from document (must match exactly)
-- approximateLineNumber: Rough line number where text appears
-- issueType: misinformation, missing-context, deceptive-wording, logical-fallacy, or verified-accurate
-- fallacyType (for logical-fallacy): ad-hominem, straw-man, false-dilemma, slippery-slope, appeal-to-authority, appeal-to-emotion, appeal-to-nature, hasty-generalization, survivorship-bias, selection-bias, cherry-picking, circular-reasoning, equivocation, non-sequitur, other
-- severityScore (0-100): How serious is this issue
-- confidenceScore (0-100): Only flag if ≥ 70
-- importanceScore (0-100): How central to the document's argument
-- reasoning: Concise explanation using markdown formatting (numbered lists, bullet points)
-
-**Avoid redundancy**: Don't flag same fallacy type multiple times per chunk - report only the most severe instance.`;
-
-    const userPrompt = `Analyze this text for epistemic and reasoning issues:
-
-${input.text}
-
-Analyze ALL sections (argumentative, factual, biographical). Look for statistical errors, logical fallacies, rhetorical manipulation, and narrative issues like vague claims or selective self-presentation. Distribute findings across the entire text.`;
+    // Use custom prompts if provided, otherwise use defaults from prompts.ts
+    // Always prepend date context to prevent false positives on recent dates
+    const baseSystemPrompt = input.customSystemPrompt || DEFAULT_EXTRACTOR_SYSTEM_PROMPT;
+    const systemPrompt = withDateContext(baseSystemPrompt);
+    const userPrompt = input.customUserPrompt
+      ? `${input.customUserPrompt}\n\n${textToAnalyze}`
+      : `${DEFAULT_EXTRACTOR_USER_PROMPT}\n\n${textToAnalyze}`;
 
     const cacheSeed = generateCacheSeed("fallacy-extract", [
-      input.text,
+      textToAnalyze,
       MIN_SEVERITY_THRESHOLD,
       MAX_ISSUES,
     ]);
 
-    const result = await callClaudeWithTool<{
-      issues: ExtractedFallacyIssue[];
-      wasComplete: boolean;
-    }>({
-      system: systemPrompt,
-      messages: [
-        {
-          role: "user",
-          content: userPrompt,
-        },
-      ],
-      max_tokens: 8000,
-      temperature: 0,
-      toolName: "extract_fallacy_issues",
-      toolDescription: "Extract and score fallacy issues from text",
-      toolSchema: {
-        type: "object",
-        properties: {
-          issues: {
-            type: "array",
-            items: {
-              type: "object",
-              properties: {
-                exactText: {
-                  type: "string",
-                  description: "The exact text from the document",
-                },
-                issueType: {
-                  type: "string",
-                  enum: [
-                    ISSUE_TYPES.MISINFORMATION,
-                    ISSUE_TYPES.MISSING_CONTEXT,
-                    ISSUE_TYPES.DECEPTIVE_WORDING,
-                    ISSUE_TYPES.LOGICAL_FALLACY,
-                    ISSUE_TYPES.VERIFIED_ACCURATE,
-                  ],
-                  description: "Type of issue",
-                },
-                fallacyType: {
-                  type: "string",
-                  enum: [
-                    "ad-hominem",
-                    "straw-man",
-                    "false-dilemma",
-                    "slippery-slope",
-                    "appeal-to-authority",
-                    "appeal-to-emotion",
-                    "appeal-to-nature",
-                    "hasty-generalization",
-                    "survivorship-bias",
-                    "selection-bias",
-                    "cherry-picking",
-                    "circular-reasoning",
-                    "equivocation",
-                    "non-sequitur",
-                    "other",
-                  ],
-                  description: "Specific fallacy type (only for logical-fallacy issues)",
-                },
-                severityScore: {
-                  type: "number",
-                  description: "0-100: How severe is this issue",
-                },
-                confidenceScore: {
-                  type: "number",
-                  description: "0-100: How confident you are this is the fallacy",
-                },
-                reasoning: {
-                  type: "string",
-                  description: "Why this is an issue",
-                },
-                importanceScore: {
-                  type: "number",
-                  description: "0-100: How important to address",
-                },
-                approximateLineNumber: {
-                  type: "number",
-                  description: "Approximate line number where this text appears (optional, helps speed up location finding)",
-                },
+    // Shared tool schema for both Claude and OpenRouter
+    const toolSchema = {
+      type: "object" as const,
+      properties: {
+        issues: {
+          type: "array",
+          items: {
+            type: "object",
+            properties: {
+              exactText: {
+                type: "string",
+                description: "The exact text from the document",
+              },
+              issueType: {
+                type: "string",
+                enum: [
+                  ISSUE_TYPES.MISINFORMATION,
+                  ISSUE_TYPES.MISSING_CONTEXT,
+                  ISSUE_TYPES.DECEPTIVE_WORDING,
+                  ISSUE_TYPES.LOGICAL_FALLACY,
+                  ISSUE_TYPES.VERIFIED_ACCURATE,
+                ],
+                description: "Type of issue",
+              },
+              fallacyType: {
+                type: "string",
+                enum: [
+                  "ad-hominem",
+                  "straw-man",
+                  "false-dilemma",
+                  "slippery-slope",
+                  "appeal-to-authority",
+                  "appeal-to-emotion",
+                  "appeal-to-nature",
+                  "hasty-generalization",
+                  "survivorship-bias",
+                  "selection-bias",
+                  "cherry-picking",
+                  "circular-reasoning",
+                  "equivocation",
+                  "non-sequitur",
+                  "other",
+                ],
+                description: "Specific fallacy type (only for logical-fallacy issues)",
+              },
+              severityScore: {
+                type: "number",
+                description: "0-100: How severe is this issue",
+              },
+              confidenceScore: {
+                type: "number",
+                description: "0-100: How confident you are this is the fallacy",
+              },
+              reasoning: {
+                type: "string",
+                description: "Why this is an issue",
+              },
+              importanceScore: {
+                type: "number",
+                description: "0-100: How important to address",
+              },
+              approximateLineNumber: {
+                type: "number",
+                description: "Approximate line number where this text appears (optional, helps speed up location finding)",
               },
-              required: [
-                "exactText",
-                "issueType",
-                "severityScore",
-                "confidenceScore",
-                "reasoning",
-                "importanceScore",
-              ],
             },
+            required: [
+              "exactText",
+              "issueType",
+              "severityScore",
+              "confidenceScore",
+              "reasoning",
+              "importanceScore",
+            ],
           },
-          wasComplete: {
-            type: "boolean",
-            description: "Whether analysis was complete or had to be truncated",
-          },
         },
-        required: ["issues", "wasComplete"],
+        wasComplete: {
+          type: "boolean",
+          description: "Whether analysis was complete or had to be truncated",
+        },
       },
-      enablePromptCaching: true,
-      cacheSeed,
-    });
+      required: ["issues", "wasComplete"],
+    };
+
+    type ExtractorResults = {
+      issues: ExtractedFallacyIssue[];
+      wasComplete: boolean;
+    };
+
+    type ExtractorCallResult = {
+      toolResult: ExtractorResults;
+      actualParams?: ActualApiParams;
+      responseMetrics?: ApiResponseMetrics;
+      unifiedUsage?: UnifiedUsageMetrics;
+    };
+
+    let result: ExtractorCallResult;
+    let actualApiParams: ActualApiParams | undefined;
+    let responseMetrics: ApiResponseMetrics | undefined;
+    let unifiedUsage: UnifiedUsageMetrics | undefined;
+
+    // Determine temperature to use:
+    // - "default": Don't pass temperature, let model use its native default
+    // - undefined: Use our model-specific default (0 for Claude, 0.1 for OpenRouter)
+    // - number: Use explicit value
+    const useDefaultTemperature = input.temperature === 'default';
+    const defaultTemp = isOpenRouterModel ? 0.1 : 0;
+    const temperature = useDefaultTemperature ? undefined : (typeof input.temperature === 'number' ? input.temperature : defaultTemp);
+
+    // Thinking parameter: undefined/true = enabled, false = disabled
+    const thinkingEnabled = input.thinking !== false;
+
+    // For Anthropic models, convert reasoning effort to budget_tokens
+    // Anthropic supports up to 128K thinking tokens
+    const ANTHROPIC_MAX_THINKING_TOKENS = 128000;
+    const EFFORT_PERCENTAGES: Record<string, number> = {
+      minimal: 0.1,
+      low: 0.3,
+      medium: 0.5,
+      high: 0.7,
+      xhigh: 0.9,
+    };
 
-    let allIssues = result.toolResult.issues || [];
-    const wasComplete = result.toolResult.wasComplete ?? true;
+    // Calculate thinking config for Claude based on reasoning effort
+    const getClaudeThinkingConfig = (): boolean | { type: 'enabled'; budget_tokens: number } => {
+      if (!thinkingEnabled) return false;
+
+      // Only set explicit budget if effort level is specified
+      if (input.reasoningEffort && input.reasoningEffort !== 'none') {
+        const percentage = EFFORT_PERCENTAGES[input.reasoningEffort];
+        if (percentage) {
+          const budgetTokens = Math.floor(ANTHROPIC_MAX_THINKING_TOKENS * percentage);
+          return { type: 'enabled' as const, budget_tokens: budgetTokens };
+        }
+      }
+
+      // No effort specified - just return true, let wrapper use its default
+      return true;
+    };
+
+    if (isOpenRouterModel && modelId) {
+      // Use OpenRouter for non-Claude models (Gemini, GPT, etc.)
+      const providerInfo = input.provider?.order ? `, provider: [${input.provider.order.join(', ')}]` : '';
+      context.logger.debug(`[FallacyExtractor] Calling OpenRouter: model=${modelId}, temp=${temperature ?? 'default'}, thinking=${thinkingEnabled}, reasoningEffort=${input.reasoningEffort ?? 'not set'}${providerInfo}`);
+      const openRouterResult = await callOpenRouterWithTool<ExtractorResults>({
+        model: modelId,
+        system: systemPrompt,
+        messages: [{ role: "user", content: userPrompt }],
+        max_tokens: 8000,
+        ...(temperature !== undefined && { temperature }),
+        toolName: "extract_fallacy_issues",
+        toolDescription: "Extract and score fallacy issues from text",
+        toolSchema,
+        thinking: thinkingEnabled,
+        // Pass explicit reasoning effort if provided (from profile config)
+        ...(input.reasoningEffort !== undefined && { reasoningEffort: input.reasoningEffort }),
+        // Pass provider preferences if specified
+        ...(input.provider && { provider: input.provider }),
+      });
+      result = openRouterResult;
+      // Capture actual API params from OpenRouter response
+      actualApiParams = {
+        model: openRouterResult.actualParams.model,
+        temperature: openRouterResult.actualParams.temperature,
+        maxTokens: openRouterResult.actualParams.maxTokens,
+        reasoning: openRouterResult.actualParams.reasoning,
+      };
+      responseMetrics = openRouterResult.responseMetrics;
+      unifiedUsage = openRouterResult.unifiedUsage;
+    } else {
+      // Use Claude API directly
+      const claudeThinkingConfig = getClaudeThinkingConfig();
+      const thinkingBudgetInfo = typeof claudeThinkingConfig === 'object'
+        ? `budget: ${claudeThinkingConfig.budget_tokens}`
+        : (claudeThinkingConfig ? 'default' : 'disabled');
+      context.logger.debug(`[FallacyExtractor] Calling Claude API: model=${modelId || 'default'}, temp=${temperature ?? 'default'}, thinking=${thinkingBudgetInfo}, reasoningEffort=${input.reasoningEffort ?? 'not set'}`);
+      const claudeResult = await callClaudeWithTool<ExtractorResults>({
+        ...(modelId && { model: modelId }),
+        system: systemPrompt,
+        messages: [{ role: "user", content: userPrompt }],
+        max_tokens: 8000,
+        ...(temperature !== undefined && { temperature }),
+        toolName: "extract_fallacy_issues",
+        toolDescription: "Extract and score fallacy issues from text",
+        toolSchema,
+        enablePromptCaching: true,
+        cacheSeed,
+        thinking: claudeThinkingConfig,
+      });
+      result = claudeResult;
+      // Capture actual API params from Claude response
+      actualApiParams = {
+        model: claudeResult.actualParams.model,
+        temperature: claudeResult.actualParams.temperature,
+        maxTokens: claudeResult.actualParams.maxTokens,
+        thinking: claudeResult.actualParams.thinking,
+      };
+      responseMetrics = claudeResult.responseMetrics;
+      unifiedUsage = claudeResult.unifiedUsage;
+    }
+
+    let allIssues = result.toolResult.issues;
+    const wasComplete = result.toolResult.wasComplete;
 
     // Handle case where LLM returns issues as a JSON string
     if (typeof allIssues === "string") {
@@ -416,19 +457,19 @@ Analyze ALL sections (argumentative, factual, biographical). Look for statistica
           let locationResult;
 
           // OPTIMIZATION: If we have chunk offset, search in chunk first (much faster!)
-          if (input.chunkStartOffset !== undefined) {
+          if (input.chunkStartOffset !== undefined && input.text) {
             // Use optimized 3-tier chunk-based location finding
             locationResult = await findLocationInChunk(
               {
                 chunkText: input.text,
-                fullDocumentText: input.documentText,
+                fullDocumentText: input.documentText || input.text,
                 chunkStartOffset: input.chunkStartOffset,
                 searchText: issue.exactText,
                 lineNumberHint: issue.approximateLineNumber,
               },
               context
             );
-          } else {
+          } else if (input.documentText) {
             // No chunk offset, search in full document
             locationResult = await fuzzyTextLocatorTool.execute(
               {
@@ -443,6 +484,10 @@ Analyze ALL sections (argumentative, factual, biographical). Look for statistica
               },
               context
             );
+          } else {
+            // No document text available for location finding
+            issuesWithLocations.push(issue);
+            continue;
           }
 
           if (locationResult.found && locationResult.location) {
@@ -500,10 +545,12 @@ Analyze ALL sections (argumentative, factual, biographical). Look for statistica
       issues: issuesWithLocations,
       totalIssuesFound: allIssues.length,
       wasComplete,
+      actualApiParams,
+      responseMetrics,
+      unifiedUsage,
     };
   }
 }
 
-// Export singleton instance
 export const fallacyExtractorTool = new FallacyExtractorTool();
 export default fallacyExtractorTool;
diff --git a/internal-packages/ai/src/tools/fallacy-extractor/prompts.ts b/internal-packages/ai/src/tools/fallacy-extractor/prompts.ts
new file mode 100644
index 000000000..f088db236
--- /dev/null
+++ b/internal-packages/ai/src/tools/fallacy-extractor/prompts.ts
@@ -0,0 +1,117 @@
+/**
+ * Default prompts for fallacy extraction
+ *
+ * These are exported so they can be displayed in the profile editor UI
+ * as placeholders/defaults while allowing customization.
+ */
+
+export const DEFAULT_EXTRACTOR_SYSTEM_PROMPT = `You are an expert epistemic critic analyzing reasoning quality and argumentation.
+
+**FOCUS**: Sophisticated epistemic issues, NOT basic fact-checking (handled by other tools).
+
+**🚨 CRITICAL: COMMITTING vs DISCUSSING**
+- Do NOT flag authors EXPLAINING, WARNING about, or ACKNOWLEDGING errors (good epistemics!)
+- Only flag authors MAKING the error themselves
+
+**🚨 CRITICAL: CHECK FOR JUSTIFICATION ELSEWHERE**
+- Before flagging a claim as unsupported or a non sequitur, CHECK if the author provides justification ELSEWHERE in the document
+- Authors often state conclusions first, then explain reasoning later - this is valid argumentation
+- A claim in paragraph 2 may be fully justified by technical explanation in paragraph 5
+- Only flag as "non sequitur" if there is NO supporting reasoning ANYWHERE in the document
+- Read the ENTIRE document before deciding whether a logical leap exists
+
+**🎯 SELECTIVITY**: Senior reviewer, not pedantic nitpicker.
+- Only flag issues that significantly mislead, clearly commit error, and matter to the argument
+- Default to NOT flagging. Aim for ~5-10 high-quality issues, not 20+ marginal ones
+- Only report severity ≥ 60, confidence ≥ 70
+
+**FALSE POSITIVE Examples (do NOT flag):**
+1. "Selection bias is a major problem in hiring research because we only see candidates who apply"
+   → Author EXPLAINING the concept, not committing the error
+2. "Be careful not to generalize from a single case study"
+   → Author WARNING about error
+3. "There isn't a cheap way to run true RCTs on hiring, so we're stuck with observational data and its selection biases"
+   → Author ACKNOWLEDGING limitation (good epistemics!)
+
+**TRUE POSITIVE Examples (DO flag):**
+1. "Our clients love us! 95% would recommend us to a friend"
+   → COMMITTING survivorship bias (only surveying existing clients)
+2. "Studies show that our approach is highly effective"
+   → COMMITTING weasel words (vague authority without citation)
+3. "Since launching in March 2020, we've delivered 847% returns"
+   → COMMITTING cherry-picked timeframe (market bottom)
+
+**CORE AREAS (prioritize these):**
+
+1. **Statistical Reasoning Errors**
+   - Base rate neglect (ignoring prior probabilities)
+   - Survivorship bias (only examining success cases)
+   - Selection bias (non-random samples)
+   - Framing: absolute vs relative risk ("50% increase" = 2% to 3%)
+
+2. **Sophisticated Logical Fallacies**
+   - False dichotomy (only presenting two options)
+   - Motte-bailey (defending controversial claim by retreating to defensible one)
+   - Circular reasoning (conclusion in premises)
+   - Hasty generalization (insufficient evidence → broad claim)
+
+3. **Framing & Rhetorical Manipulation**
+   - Anchoring (biasing judgment with reference points)
+   - Denominator neglect ("10 deaths" vs "10 per million")
+   - Cherry-picked timeframes (ignoring unfavorable periods)
+   - False precision ("exactly 47.3%" when rough estimate warranted)
+
+4. **Suspicious Numbers**
+   - False precision: "47.3% annual returns" from "internal study"
+   - Too perfect: 98%, 99%, 99.9%, 100% = suspiciously high
+   - Impossibly exact: "Exactly 10x returns" vs "approximately 10x"
+
+5. **Missing Crucial Context**
+   - Only flag when you KNOW what's missing and it significantly changes interpretation
+   - Examples: Cherry-picked time periods, undisclosed conflicts of interest, missing comparison groups
+
+6. **Bad Faith Argumentation**
+   - Strawmanning (misrepresenting opposing views)
+   - Moving goalposts (changing criteria when challenged)
+   - Quote mining (taking quotes out of context)
+   - Whataboutism (deflecting criticism by pointing elsewhere)
+
+7. **Causal Reasoning Errors**
+   - Confounding variables (third variable causes both X and Y)
+   - Reverse causation (getting direction backwards)
+   - Post hoc ergo propter hoc ("after this, therefore because of this")
+
+8. **Temporal & Historical Errors**
+   - Hindsight bias ("I knew it all along" after outcome known)
+   - Cherry-picked timeframes: March 2020 (COVID bottom), March 2009 (financial crisis bottom)
+   - Suspiciously short time periods (<2 years for market claims)
+
+9. **Narrative Content Issues**
+   - Vague claims: "Amazing project", "great work" without specifics
+   - Uncritical authority appeals: "Worked at Google" (in what capacity?)
+   - Selective self-presentation: Only mentioning successes, hiding failures
+   - Implied causation: "After I joined, the company grew 10x" (post hoc)
+
+**AVOID FLAGGING** (other tools handle): Basic fact verification, math errors, grammar, probability forecasts
+
+**Severity Scoring** (0-100):
+- 80-100: Egregious manipulation seriously distorting reality (rare!)
+- 60-79: Clear, significant reasoning error affecting core claims
+- 40-59: Moderate issue (usually skip)
+- Below 40: Skip
+
+**For each issue provide:**
+- exactText: Exact text from document (must match exactly)
+- approximateLineNumber: Rough line number where text appears
+- issueType: misinformation, missing-context, deceptive-wording, logical-fallacy, or verified-accurate
+- fallacyType (for logical-fallacy): ad-hominem, straw-man, false-dilemma, slippery-slope, appeal-to-authority, appeal-to-emotion, appeal-to-nature, hasty-generalization, survivorship-bias, selection-bias, cherry-picking, circular-reasoning, equivocation, non-sequitur, other
+- severityScore (0-100): How serious is this issue
+- confidenceScore (0-100): Only flag if ≥ 70
+- importanceScore (0-100): How central to the document's argument
+- reasoning: Concise explanation using markdown formatting (numbered lists, bullet points)
+
+**Avoid redundancy**: Don't flag same fallacy type multiple times per chunk - report only the most severe instance.`;
+
+export const DEFAULT_EXTRACTOR_USER_PROMPT = `Analyze this text for epistemic and reasoning issues:
+
+Analyze ALL sections (argumentative, factual, biographical). Look for statistical errors, logical fallacies, rhetorical manipulation, and narrative issues like vague claims or selective self-presentation. Distribute findings across the entire text.`;
diff --git a/internal-packages/ai/src/tools/fallacy-extractor/types.ts b/internal-packages/ai/src/tools/fallacy-extractor/types.ts
index 6ce000779..c88081d5d 100644
--- a/internal-packages/ai/src/tools/fallacy-extractor/types.ts
+++ b/internal-packages/ai/src/tools/fallacy-extractor/types.ts
@@ -1,4 +1,9 @@
-import { IssueType } from '../../analysis-plugins/plugins/fallacy-check/constants';
+import type { IssueType } from '../../analysis-plugins/plugins/fallacy-check/constants';
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+import type { ActualApiParams, ApiResponseMetrics } from '../../types/common';
+
+// Re-export for backwards compatibility
+export type { ActualApiParams, ApiResponseMetrics };
 
 /**
  * Specific types of fallacies (for logical-fallacy issue type)
@@ -65,14 +70,78 @@ export interface ExtractedFallacyIssue {
  * Input for the epistemic issues extractor tool
  */
 export interface FallacyExtractorInput {
-  /** Text chunk to analyze */
-  text: string;
+  /** Text chunk to analyze (optional if documentText provided) */
+  text?: string;
 
-  /** Full document text (for accurate location finding in full doc) */
+  /** Full document text - used for analysis in single-pass mode, or for location finding in chunk mode */
   documentText?: string;
 
   /** Absolute offset where this chunk starts in the full document (optimization) */
   chunkStartOffset?: number;
+
+  /**
+   * Optional model to use for extraction.
+   * Can be a Claude model (default) or an OpenRouter model ID.
+   * Examples: "claude-sonnet-4-20250514", "google/gemini-3-flash-preview"
+   */
+  model?: string;
+
+  /**
+   * Optional temperature override for extraction.
+   * - undefined: Use model-specific default (0 for Claude, 0.1 for OpenRouter)
+   * - number: Use this specific temperature
+   * - "default": Let the model use its own default (don't pass temperature)
+   * Use higher values (0.3-0.7) to get more diverse extractions.
+   */
+  temperature?: number | 'default';
+
+  /**
+   * Whether to enable extended thinking/reasoning mode.
+   * - undefined/true: Enable extended thinking (Claude) / reasoning (OpenRouter/Gemini)
+   * - false: Disable extended thinking for faster, cheaper responses
+   */
+  thinking?: boolean;
+
+  /**
+   * Reasoning effort level for OpenRouter models (Gemini, etc.)
+   * - 'none': Disable reasoning
+   * - 'minimal', 'low', 'medium', 'high', 'xhigh': Effort levels
+   */
+  reasoningEffort?: 'none' | 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';
+
+  /**
+   * Provider routing preferences (OpenRouter only)
+   * Allows specifying preferred providers for a model
+   */
+  provider?: {
+    order?: string[];
+    allow_fallbacks?: boolean;
+  };
+
+  /**
+   * Optional custom system prompt override.
+   * If provided, replaces the default system prompt entirely.
+   */
+  customSystemPrompt?: string;
+
+  /**
+   * Optional custom user prompt override.
+   * If provided, replaces the default user prompt entirely.
+   * The document text will be appended to the end.
+   */
+  customUserPrompt?: string;
+
+  /**
+   * Optional minimum severity threshold override.
+   * Default: 60
+   */
+  minSeverityThreshold?: number;
+
+  /**
+   * Optional max issues override.
+   * Default: 15
+   */
+  maxIssues?: number;
 }
 
 /**
@@ -87,4 +156,13 @@ export interface FallacyExtractorOutput {
 
   /** Whether the analysis was complete or truncated */
   wasComplete: boolean;
+
+  /** Actual parameters sent to the API (source of truth) */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics from the API call */
+  responseMetrics?: ApiResponseMetrics;
+
+  /** Unified usage metrics (includes cost, tokens, latency across providers) */
+  unifiedUsage?: UnifiedUsageMetrics;
 }
diff --git a/internal-packages/ai/src/tools/fallacy-judge/config.ts b/internal-packages/ai/src/tools/fallacy-judge/config.ts
new file mode 100644
index 000000000..82eb8bcfa
--- /dev/null
+++ b/internal-packages/ai/src/tools/fallacy-judge/config.ts
@@ -0,0 +1,12 @@
+import type { ToolConfig } from '../base/Tool';
+
+export const fallacyJudgeConfig: ToolConfig = {
+  id: 'fallacy-judge',
+  name: 'Fallacy Judge Aggregator',
+  description:
+    'Aggregates fallacy issues from multiple extractors, merging duplicates and filtering weak single-source issues with explainable decisions',
+  version: '1.0.0',
+  category: 'utility',
+  path: '/tools/fallacy-judge',
+  status: 'beta',
+};
diff --git a/internal-packages/ai/src/tools/fallacy-judge/index.ts b/internal-packages/ai/src/tools/fallacy-judge/index.ts
new file mode 100644
index 000000000..fbb74eaff
--- /dev/null
+++ b/internal-packages/ai/src/tools/fallacy-judge/index.ts
@@ -0,0 +1,636 @@
+/**
+ * Fallacy Judge Aggregator Tool
+ *
+ * Aggregates issues from multiple extractors using an LLM judge to:
+ * 1. Group similar/duplicate issues across extractors
+ * 2. Merge duplicates into single best-formulation issues
+ * 3. Accept high-confidence multi-source issues
+ * 4. Reject low-confidence single-source issues
+ * 5. Provide reasoning for each decision
+ */
+
+import { z } from 'zod';
+import Anthropic from '@anthropic-ai/sdk';
+import { Tool, type ToolContext } from '../base/Tool';
+import { callClaude, callClaudeWithTool } from '../../claude/wrapper';
+import { callOpenRouterWithTool } from '../../utils/openrouter';
+import { resolveModelConfig, getReasoningDisplayString } from '../../utils/modelConfigResolver';
+import { withDateContext } from '../shared/llm-filter-utils';
+import { fallacyJudgeConfig } from './config';
+import type {
+  FallacyJudgeInput,
+  FallacyJudgeOutput,
+  JudgeDecision,
+  JudgeConfig,
+  ActualApiParams,
+  ApiResponseMetrics,
+} from './types';
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+import { DEFAULT_JUDGE_SYSTEM_PROMPT } from './prompts';
+
+// Default model for judge (can be overridden via env var)
+const DEFAULT_JUDGE_MODEL = 'claude-sonnet-4-5-20250929';
+const DEFAULT_CLAUDE_TEMPERATURE = 0.1;
+const DEFAULT_OPENROUTER_TEMPERATURE = 0.1;
+
+/**
+ * Check if a model is an OpenRouter model (contains '/')
+ */
+function isOpenRouterModel(model: string): boolean {
+  return model.includes('/');
+}
+
+/**
+ * Parse a single judge config object
+ */
+function parseJudgeConfigObject(parsed: unknown): JudgeConfig | null {
+  if (typeof parsed === 'object' && parsed !== null && typeof (parsed as Record<string, unknown>).model === 'string') {
+    const obj = parsed as Record<string, unknown>;
+
+    // Parse reasoning config
+    let reasoning: JudgeConfig['reasoning'] = undefined;
+    if (obj.reasoning !== undefined) {
+      if (obj.reasoning === false) {
+        reasoning = false;
+      } else if (typeof obj.reasoning === 'object' && obj.reasoning !== null) {
+        const r = obj.reasoning as Record<string, unknown>;
+        if (typeof r.effort === 'string') {
+          reasoning = { effort: r.effort as 'minimal' | 'low' | 'medium' | 'high' | 'xhigh' };
+        } else if (typeof r.budget_tokens === 'number') {
+          reasoning = { budget_tokens: r.budget_tokens };
+        }
+      }
+    }
+
+    // Parse provider preferences
+    let provider: JudgeConfig['provider'] = undefined;
+    if (typeof obj.provider === 'object' && obj.provider !== null) {
+      const p = obj.provider as Record<string, unknown>;
+      provider = {
+        order: Array.isArray(p.order) ? p.order as string[] : undefined,
+        allow_fallbacks: typeof p.allow_fallbacks === 'boolean' ? p.allow_fallbacks : undefined,
+      };
+    }
+
+    return {
+      model: obj.model as string,
+      temperature: typeof obj.temperature === 'number' ? obj.temperature :
+                   obj.temperature === 'default' ? 'default' : undefined,
+      thinking: typeof obj.thinking === 'boolean' ? obj.thinking : undefined,
+      reasoning,
+      provider,
+      label: typeof obj.label === 'string' ? obj.label : undefined,
+      enabled: obj.enabled !== false,
+    };
+  }
+  return null;
+}
+
+/**
+ * Parse FALLACY_JUDGES env var for array of judge configs
+ * Also accepts array in FALLACY_JUDGE for convenience
+ *
+ * Example:
+ * FALLACY_JUDGES='[{"model":"claude-sonnet-4-5-20250929","thinking":true},{"model":"google/gemini-3-flash-preview","thinking":false}]'
+ */
+export function getJudgesConfig(): JudgeConfig[] {
+  // Try FALLACY_JUDGES first, then FALLACY_JUDGE (both can contain arrays)
+  const judgesEnv = process.env.FALLACY_JUDGES || process.env.FALLACY_JUDGE;
+
+  if (judgesEnv) {
+    try {
+      const parsed = JSON.parse(judgesEnv);
+      if (Array.isArray(parsed)) {
+        const configs: JudgeConfig[] = [];
+        for (const item of parsed) {
+          const config = parseJudgeConfigObject(item);
+          if (config) {
+            configs.push(config);
+          }
+        }
+        if (configs.length > 0) {
+          return configs;
+        }
+      } else {
+        const config = parseJudgeConfigObject(parsed);
+        if (config && config.enabled) {
+          return [config];
+        }
+      }
+    } catch {
+      // Invalid JSON - fall through to default
+    }
+  }
+
+  return [];
+}
+
+/**
+ * Parse FALLACY_JUDGE env var for single judge config (legacy)
+ *
+ * Example:
+ * FALLACY_JUDGE='{"model":"google/gemini-3-flash-preview","temperature":"default","thinking":false,"enabled":true}'
+ */
+export function getJudgeConfig(): JudgeConfig {
+  const judgeEnv = process.env.FALLACY_JUDGE;
+
+  if (judgeEnv) {
+    try {
+      const parsed = JSON.parse(judgeEnv);
+      const config = parseJudgeConfigObject(parsed);
+      if (config) {
+        return config;
+      }
+      console.warn('[FallacyJudge] Invalid FALLACY_JUDGE format, using defaults');
+    } catch (e) {
+      console.warn('[FallacyJudge] Failed to parse FALLACY_JUDGE:', e);
+    }
+  }
+
+  // Default config when env var not set
+  return {
+    model: DEFAULT_JUDGE_MODEL,
+    enabled: false, // Disabled by default when not configured
+  };
+}
+
+/**
+ * Generate a display label for a judge config
+ */
+export function generateJudgeLabel(config: JudgeConfig): string {
+  if (config.label) {
+    return config.label;
+  }
+
+  // Extract short model name
+  let shortName: string;
+  if (isOpenRouterModel(config.model)) {
+    const parts = config.model.split('/');
+    shortName = parts[parts.length - 1].replace('-preview', '').replace('-latest', '');
+  } else {
+    if (config.model.includes('opus')) {
+      shortName = 'opus';
+    } else if (config.model.includes('sonnet')) {
+      shortName = 'sonnet';
+    } else if (config.model.includes('haiku')) {
+      shortName = 'haiku';
+    } else {
+      shortName = config.model.slice(0, 10);
+    }
+  }
+
+  // Build suffix parts
+  const suffixParts: string[] = [];
+
+  if (config.temperature === 'default') {
+    suffixParts.push('tDef');
+  } else if (config.temperature !== undefined) {
+    suffixParts.push(`t${config.temperature}`);
+  }
+
+  if (config.thinking === false) {
+    suffixParts.push('noThink');
+  } else if (config.thinking === true) {
+    suffixParts.push('think');
+  }
+
+  if (suffixParts.length > 0) {
+    return `${shortName}-${suffixParts.join('-')}`;
+  }
+
+  return shortName;
+}
+
+const extractorIssueInputSchema = z.object({
+  extractorId: z.string(),
+  exactText: z.string(),
+  issueType: z.string(),
+  fallacyType: z.string().optional(),
+  severityScore: z.number(),
+  confidenceScore: z.number(),
+  importanceScore: z.number(),
+  reasoning: z.string(),
+});
+
+const reasoningConfigSchema = z.union([
+  z.literal(false),
+  z.object({ effort: z.enum(['minimal', 'low', 'medium', 'high', 'xhigh']) }),
+  z.object({ budget_tokens: z.number() }),
+]);
+
+const providerPreferencesSchema = z.object({
+  order: z.array(z.string()).optional(),
+  allow_fallbacks: z.boolean().optional(),
+});
+
+const judgeConfigSchema = z.object({
+  model: z.string(),
+  temperature: z.union([z.number(), z.literal('default')]).optional(),
+  thinking: z.boolean().optional(),
+  reasoning: reasoningConfigSchema.optional(),
+  provider: providerPreferencesSchema.optional(),
+  label: z.string().optional(),
+  enabled: z.boolean(),
+});
+
+const inputSchema = z.object({
+  documentText: z.string().min(1),
+  issues: z.array(extractorIssueInputSchema),
+  extractorIds: z.array(z.string()),
+  judgeConfig: judgeConfigSchema.optional(),
+  customSystemPrompt: z.string().optional(),
+});
+
+const judgeDecisionSchema = z.object({
+  decision: z.enum(['accept', 'merge', 'reject']),
+  finalText: z.string(),
+  finalIssueType: z.string(),
+  finalFallacyType: z.string().optional(),
+  finalSeverity: z.number(),
+  finalConfidence: z.number(),
+  finalImportance: z.number(),
+  finalReasoning: z.string(),
+  sourceExtractors: z.array(z.string()),
+  sourceIssueIndices: z.array(z.number()),
+  judgeReasoning: z.string(),
+});
+
+const outputSchema = z.object({
+  acceptedDecisions: z.array(judgeDecisionSchema),
+  rejectedDecisions: z.array(judgeDecisionSchema),
+  summary: z.object({
+    totalInputIssues: z.number(),
+    uniqueGroups: z.number(),
+    acceptedCount: z.number(),
+    mergedCount: z.number(),
+    rejectedCount: z.number(),
+  }),
+});
+
+export class FallacyJudgeTool extends Tool<FallacyJudgeInput, FallacyJudgeOutput> {
+  config = fallacyJudgeConfig;
+  inputSchema = inputSchema;
+  outputSchema = outputSchema;
+
+  async execute(
+    input: FallacyJudgeInput,
+    context: ToolContext
+  ): Promise<FallacyJudgeOutput> {
+    const startTime = Date.now();
+    context.logger.info(
+      `[FallacyJudge] Aggregating ${input.issues.length} issues from ${input.extractorIds.length} extractors`
+    );
+
+    // If no issues or only one extractor, skip judge and return as-is
+    if (input.issues.length === 0) {
+      return {
+        acceptedDecisions: [],
+        rejectedDecisions: [],
+        summary: {
+          totalInputIssues: 0,
+          uniqueGroups: 0,
+          acceptedCount: 0,
+          mergedCount: 0,
+          rejectedCount: 0,
+        },
+      };
+    }
+
+    // If only one extractor, accept all issues (no aggregation needed)
+    if (input.extractorIds.length === 1) {
+      const acceptedDecisions = input.issues.map((issue, idx) => ({
+        decision: 'accept' as const,
+        finalText: issue.exactText,
+        finalIssueType: issue.issueType,
+        finalFallacyType: issue.fallacyType,
+        finalSeverity: issue.severityScore,
+        finalConfidence: issue.confidenceScore,
+        finalImportance: issue.importanceScore,
+        finalReasoning: issue.reasoning,
+        sourceExtractors: [issue.extractorId],
+        sourceIssueIndices: [idx],
+        judgeReasoning: 'Single extractor mode - all issues accepted',
+      }));
+
+      return {
+        acceptedDecisions,
+        rejectedDecisions: [],
+        summary: {
+          totalInputIssues: input.issues.length,
+          uniqueGroups: input.issues.length,
+          acceptedCount: input.issues.length,
+          mergedCount: 0,
+          rejectedCount: 0,
+        },
+      };
+    }
+
+    // Format issues for the LLM, sorted alphabetically by text to group similar issues together
+    // This makes it easier for the judge to spot duplicates/similar issues
+    const issuesWithIndices = input.issues.map((issue, idx) => ({ issue, originalIdx: idx }));
+    issuesWithIndices.sort((a, b) => (a.issue.exactText || '').localeCompare(b.issue.exactText || ''));
+
+    const formattedIssues = issuesWithIndices
+      .map(({ issue, originalIdx }) => {
+        const reasoning = issue.reasoning || '(no reasoning provided)';
+        const exactText = issue.exactText || '(no text)';
+        return `[Issue ${originalIdx}] Extractor: ${issue.extractorId}
+Text: "${exactText.substring(0, 150)}${exactText.length > 150 ? '...' : ''}"
+Type: ${issue.issueType}${issue.fallacyType ? ` (${issue.fallacyType})` : ''}
+Severity: ${issue.severityScore}, Confidence: ${issue.confidenceScore}, Importance: ${issue.importanceScore}
+Reasoning: ${reasoning.substring(0, 200)}${reasoning.length > 200 ? '...' : ''}`;
+      })
+      .join('\n\n');
+
+    // Use custom prompt if provided, otherwise use default from prompts.ts
+    // Always prepend date context to prevent false positives on recent dates
+    const baseSystemPrompt = input.customSystemPrompt || DEFAULT_JUDGE_SYSTEM_PROMPT;
+    const systemPrompt = withDateContext(baseSystemPrompt);
+
+    const userPrompt = `Aggregate these ${input.issues.length} issues from ${input.extractorIds.length} extractors (${input.extractorIds.join(', ')}):
+
+**Document Context** (first 1500 chars):
+${input.documentText.substring(0, 1500)}${input.documentText.length > 1500 ? '\n...[truncated]...' : ''}
+
+**Issues to Aggregate:**
+
+${formattedIssues}
+
+---
+
+Group similar issues together and provide your decisions. Remember:
+- Issues found by multiple extractors are more likely to be valid
+- Single-source issues need very high confidence (≥90) to be accepted
+- Explain your reasoning for each decision`;
+
+    try {
+      // Use passed config if provided, otherwise fall back to env var config
+      const judgeConfig = input.judgeConfig ?? getJudgeConfig();
+
+      // Use the unified model config resolver
+      const isOpenRouter = judgeConfig.model.includes('/');
+      const resolved = resolveModelConfig(judgeConfig, {
+        defaultTemperature: isOpenRouter ? DEFAULT_OPENROUTER_TEMPERATURE : DEFAULT_CLAUDE_TEMPERATURE,
+      });
+
+      const reasoningDisplay = getReasoningDisplayString(judgeConfig);
+      context.logger.info(
+        `[FallacyJudge] Using ${resolved.isOpenRouter ? 'OpenRouter' : 'Claude'} model: ${judgeConfig.model}, temp: ${resolved.temperature ?? 'default'}, reasoning: ${reasoningDisplay}`
+      );
+
+      type JudgeResultType = {
+        decisions: Array<{
+          decision: 'accept' | 'merge' | 'reject';
+          finalText: string;
+          finalIssueType: string;
+          finalFallacyType?: string;
+          finalSeverity: number;
+          finalConfidence: number;
+          finalImportance: number;
+          finalReasoning: string;
+          sourceExtractors: string[];
+          sourceIssueIndices: number[];
+          judgeReasoning: string;
+        }>;
+      };
+
+      const toolSchema = {
+        type: 'object' as const,
+        properties: {
+          decisions: {
+            type: 'array',
+            items: {
+              type: 'object',
+              properties: {
+                decision: {
+                  type: 'string',
+                  enum: ['accept', 'merge', 'reject'],
+                  description: 'Judge decision for this issue/group',
+                },
+                finalText: {
+                  type: 'string',
+                  description: 'Final text for the issue (best formulation)',
+                },
+                finalIssueType: {
+                  type: 'string',
+                  description: 'Final issue type',
+                },
+                finalFallacyType: {
+                  type: 'string',
+                  description: 'Final fallacy type (if applicable)',
+                },
+                finalSeverity: {
+                  type: 'number',
+                  description: 'Final severity score (0-100)',
+                },
+                finalConfidence: {
+                  type: 'number',
+                  description: 'Final confidence score (0-100)',
+                },
+                finalImportance: {
+                  type: 'number',
+                  description: 'Final importance score (0-100)',
+                },
+                finalReasoning: {
+                  type: 'string',
+                  description: 'Best reasoning for this issue',
+                },
+                sourceExtractors: {
+                  type: 'array',
+                  items: { type: 'string' },
+                  description: 'Which extractors found this issue',
+                },
+                sourceIssueIndices: {
+                  type: 'array',
+                  items: { type: 'number' },
+                  description: 'Indices of original issues in this group',
+                },
+                judgeReasoning: {
+                  type: 'string',
+                  description: 'Why you made this decision',
+                },
+              },
+              required: [
+                'decision',
+                'finalText',
+                'finalIssueType',
+                'finalSeverity',
+                'finalConfidence',
+                'finalImportance',
+                'finalReasoning',
+                'sourceExtractors',
+                'sourceIssueIndices',
+                'judgeReasoning',
+              ],
+            },
+          },
+        },
+        required: ['decisions'],
+      };
+
+      type JudgeCallResult = {
+        toolResult: JudgeResultType;
+        unifiedUsage?: UnifiedUsageMetrics;
+        actualApiParams?: ActualApiParams;
+        responseMetrics?: ApiResponseMetrics;
+      };
+
+      let result: JudgeCallResult;
+
+      if (resolved.isOpenRouter) {
+        // Use OpenRouter for non-Claude models
+        // Use 32000 max_tokens to handle large outputs with many issues (esp. with thinking)
+        const openRouterResult = await callOpenRouterWithTool<JudgeResultType>({
+          model: resolved.model,
+          system: systemPrompt,
+          messages: [{ role: 'user', content: userPrompt }],
+          max_tokens: 32000,
+          ...(resolved.temperature !== undefined && { temperature: resolved.temperature }),
+          toolName: 'aggregate_fallacy_issues',
+          toolDescription: 'Aggregate and deduplicate fallacy issues from multiple extractors',
+          toolSchema,
+          thinking: resolved.thinkingEnabled,
+          // Pass reasoning effort for budget calculation
+          ...(resolved.reasoningEffort && { reasoningEffort: resolved.reasoningEffort }),
+          // Pass provider preferences for routing
+          ...(resolved.provider && { provider: resolved.provider }),
+        });
+        result = {
+          toolResult: openRouterResult.toolResult,
+          unifiedUsage: openRouterResult.unifiedUsage,
+          actualApiParams: {
+            model: openRouterResult.actualParams.model,
+            temperature: openRouterResult.actualParams.temperature,
+            maxTokens: openRouterResult.actualParams.maxTokens,
+            reasoning: openRouterResult.actualParams.reasoning,
+          },
+          responseMetrics: {
+            success: openRouterResult.responseMetrics.success,
+            latencyMs: openRouterResult.responseMetrics.latencyMs,
+            inputTokens: openRouterResult.responseMetrics.inputTokens,
+            outputTokens: openRouterResult.responseMetrics.outputTokens,
+            stopReason: openRouterResult.responseMetrics.stopReason,
+          },
+        };
+      } else {
+        // Use Claude API directly
+        if (resolved.thinkingEnabled) {
+          // When thinking is enabled, use tool_choice: 'auto' to allow thinking
+          // (forced tool_choice like 'any' or specific tool is incompatible with extended thinking)
+          // Calculate max_tokens to accommodate thinking budget
+          const thinkingBudget = typeof resolved.claudeThinkingConfig === 'object'
+            ? resolved.claudeThinkingConfig.budget_tokens
+            : 10000;
+          const maxTokens = Math.max(16000, thinkingBudget + 4000);
+
+          const claudeResult = await callClaude(
+            {
+              model: resolved.model,
+              system: systemPrompt,
+              messages: [{ role: 'user', content: userPrompt }],
+              max_tokens: maxTokens,
+              ...(resolved.temperature !== undefined && { temperature: resolved.temperature }),
+              tools: [{
+                name: 'aggregate_fallacy_issues',
+                description: 'Aggregate and deduplicate fallacy issues from multiple extractors',
+                input_schema: toolSchema,
+              }],
+              tool_choice: { type: 'auto' },
+              thinking: resolved.claudeThinkingConfig,
+            },
+            []
+          );
+
+          // Extract tool result from response
+          const toolUse = claudeResult.response.content.find(
+            (c): c is Anthropic.Messages.ToolUseBlock => c.type === 'tool_use'
+          );
+          if (!toolUse) {
+            throw new Error('Judge did not call the aggregation tool - no tool use in response');
+          }
+          result = {
+            toolResult: toolUse.input as JudgeResultType,
+            unifiedUsage: claudeResult.unifiedUsage,
+          };
+        } else {
+          // Without thinking, use forced tool_choice for guaranteed structure
+          const claudeResult = await callClaudeWithTool<JudgeResultType>(
+            {
+              model: resolved.model,
+              system: systemPrompt,
+              messages: [{ role: 'user', content: userPrompt }],
+              max_tokens: 8000,
+              ...(resolved.temperature !== undefined && { temperature: resolved.temperature }),
+              toolName: 'aggregate_fallacy_issues',
+              toolDescription: 'Aggregate and deduplicate fallacy issues from multiple extractors',
+              toolSchema,
+              thinking: false,
+            },
+            []
+          );
+          result = {
+            toolResult: claudeResult.toolResult,
+            unifiedUsage: claudeResult.unifiedUsage,
+          };
+        }
+      }
+
+      // Separate accepted/rejected decisions
+      const acceptedDecisions: JudgeDecision[] = [];
+      const rejectedDecisions: JudgeDecision[] = [];
+      let mergedCount = 0;
+
+      for (const d of result.toolResult.decisions) {
+        const decision: JudgeDecision = {
+          decision: d.decision,
+          finalText: d.finalText,
+          finalIssueType: d.finalIssueType,
+          finalFallacyType: d.finalFallacyType,
+          finalSeverity: d.finalSeverity,
+          finalConfidence: d.finalConfidence,
+          finalImportance: d.finalImportance,
+          finalReasoning: d.finalReasoning,
+          sourceExtractors: d.sourceExtractors,
+          sourceIssueIndices: d.sourceIssueIndices,
+          judgeReasoning: d.judgeReasoning,
+        };
+
+        if (d.decision === 'reject') {
+          rejectedDecisions.push(decision);
+        } else {
+          acceptedDecisions.push(decision);
+          if (d.decision === 'merge') {
+            mergedCount++;
+          }
+        }
+      }
+
+      const durationMs = Date.now() - startTime;
+      context.logger.info(
+        `[FallacyJudge] Aggregation complete in ${(durationMs / 1000).toFixed(1)}s: ${acceptedDecisions.length} accepted, ${mergedCount} merged, ${rejectedDecisions.length} rejected`
+      );
+
+      return {
+        acceptedDecisions,
+        rejectedDecisions,
+        summary: {
+          totalInputIssues: input.issues.length,
+          uniqueGroups: result.toolResult.decisions.length,
+          acceptedCount: acceptedDecisions.length,
+          mergedCount,
+          rejectedCount: rejectedDecisions.length,
+        },
+        unifiedUsage: result.unifiedUsage,
+        actualApiParams: result.actualApiParams,
+        responseMetrics: result.responseMetrics,
+      };
+    } catch (error) {
+      context.logger.error('[FallacyJudge] Aggregation failed:', error);
+      // Re-throw to surface error to user - don't silently fallback
+      throw error;
+    }
+  }
+}
+
+const fallacyJudgeTool = new FallacyJudgeTool();
+export default fallacyJudgeTool;
diff --git a/internal-packages/ai/src/tools/fallacy-judge/prompts.ts b/internal-packages/ai/src/tools/fallacy-judge/prompts.ts
new file mode 100644
index 000000000..206435e6f
--- /dev/null
+++ b/internal-packages/ai/src/tools/fallacy-judge/prompts.ts
@@ -0,0 +1,33 @@
+/**
+ * Default prompts for fallacy judge
+ *
+ * These are exported so they can be displayed in the profile editor UI
+ * as placeholders/defaults while allowing customization.
+ */
+
+export const DEFAULT_JUDGE_SYSTEM_PROMPT = `You are an expert epistemic judge aggregating fallacy issues from multiple extractors.
+
+Your task is to:
+1. **Group similar issues** - Issues about the same text/concept from different extractors
+2. **Make decisions** for each group:
+   - **accept**: Issue is valid and found by 2+ extractors, OR single-source with very high confidence (≥90)
+   - **merge**: Multiple extractors found similar issues - combine into best formulation
+   - **reject**: Low-confidence single-source issue (likely false positive)
+
+**Decision Guidelines:**
+- Multi-source issues (found by 2+ extractors): Almost always accept or merge
+- Single-source with confidence ≥90: Accept
+- Single-source with confidence 80-89 and severity ≥80: Consider accepting
+- Single-source with confidence <80: Reject as likely false positive
+
+**When merging:**
+- Use the clearest/most specific text formulation
+- Take the highest severity and confidence scores
+- Combine reasoning from multiple sources
+- List ALL source extractors
+
+**Output Requirements:**
+- Every input issue must be accounted for in exactly one decision
+- sourceIssueIndices should reference the original issue indices
+- sourceExtractors should list which extractors contributed
+- judgeReasoning should explain your decision`;
diff --git a/internal-packages/ai/src/tools/fallacy-judge/types.ts b/internal-packages/ai/src/tools/fallacy-judge/types.ts
new file mode 100644
index 000000000..bb8773c6f
--- /dev/null
+++ b/internal-packages/ai/src/tools/fallacy-judge/types.ts
@@ -0,0 +1,178 @@
+/**
+ * Types for the Fallacy Judge Aggregator Tool
+ *
+ * The judge aggregates issues from multiple extractors,
+ * merging duplicates and filtering weak single-source issues.
+ */
+
+import type { ExtractedFallacyIssue } from '../fallacy-extractor/types';
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+import type {
+  ReasoningConfig,
+  ProviderPreferences,
+  ActualApiParams,
+  ApiResponseMetrics,
+} from '../../types/common';
+
+// Re-export for backwards compatibility
+export type { ReasoningConfig, ProviderPreferences, ActualApiParams, ApiResponseMetrics };
+
+/**
+ * Judge configuration from profile or FALLACY_JUDGE env var
+ *
+ * Example:
+ * FALLACY_JUDGE='{"model":"google/gemini-3-flash-preview","temperature":"default","reasoning":{"effort":"high"},"enabled":true}'
+ */
+export interface JudgeConfig {
+  /** Model to use (Claude or OpenRouter format) */
+  model: string;
+
+  /** Temperature (number or "default" for model's native default) */
+  temperature?: number | 'default';
+
+  /** @deprecated Use reasoning instead */
+  thinking?: boolean;
+
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+
+  /** Optional display label (auto-generated if not provided) */
+  label?: string;
+
+  /** Whether the judge is enabled */
+  enabled: boolean;
+}
+
+/**
+ * An issue from a specific extractor
+ */
+export interface ExtractorIssueInput {
+  /** Which extractor found this issue */
+  extractorId: string;
+
+  /** The exact text flagged */
+  exactText: string;
+
+  /** Issue type */
+  issueType: string;
+
+  /** Specific fallacy type (if applicable) */
+  fallacyType?: string;
+
+  /** Severity score (0-100) */
+  severityScore: number;
+
+  /** Confidence score (0-100) */
+  confidenceScore: number;
+
+  /** Importance score (0-100) */
+  importanceScore: number;
+
+  /** Reasoning from the extractor */
+  reasoning: string;
+}
+
+/**
+ * Input for the fallacy judge tool
+ */
+export interface FallacyJudgeInput {
+  /** Full document text for context */
+  documentText: string;
+
+  /** All issues from all extractors */
+  issues: ExtractorIssueInput[];
+
+  /** List of extractor IDs that contributed */
+  extractorIds: string[];
+
+  /** Optional config override (if not provided, reads from FALLACY_JUDGE env var) */
+  judgeConfig?: JudgeConfig;
+
+  /** Optional custom system prompt override */
+  customSystemPrompt?: string;
+}
+
+/**
+ * A judge decision on a single issue or group of similar issues
+ */
+export interface JudgeDecision {
+  /** Judge's decision on this issue/group */
+  decision: 'accept' | 'merge' | 'reject';
+
+  /** Final merged/accepted issue text */
+  finalText: string;
+
+  /** Final issue type */
+  finalIssueType: string;
+
+  /** Final fallacy type (if applicable) */
+  finalFallacyType?: string;
+
+  /** Final severity (may be adjusted by judge) */
+  finalSeverity: number;
+
+  /** Final confidence (may be adjusted by judge) */
+  finalConfidence: number;
+
+  /** Final importance (may be adjusted by judge) */
+  finalImportance: number;
+
+  /** Best reasoning from sources (or synthesized by judge) */
+  finalReasoning: string;
+
+  /** Which extractors found this issue */
+  sourceExtractors: string[];
+
+  /** Original issues from each extractor (indices into input.issues) */
+  sourceIssueIndices: number[];
+
+  /** Judge's reasoning for this decision */
+  judgeReasoning: string;
+}
+
+/**
+ * Output from the fallacy judge tool
+ */
+export interface FallacyJudgeOutput {
+  /** Decisions for accepted/merged issues */
+  acceptedDecisions: JudgeDecision[];
+
+  /** Decisions for rejected issues (for telemetry) */
+  rejectedDecisions: JudgeDecision[];
+
+  /** Summary stats */
+  summary: {
+    totalInputIssues: number;
+    uniqueGroups: number;
+    acceptedCount: number;
+    mergedCount: number;
+    rejectedCount: number;
+  };
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
+
+  /** Actual API parameters sent to the provider */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics from API call */
+  responseMetrics?: ApiResponseMetrics;
+}
+
+/**
+ * Convert judge decisions back to ExtractedFallacyIssue format
+ */
+export function decisionToIssue(decision: JudgeDecision): ExtractedFallacyIssue {
+  return {
+    exactText: decision.finalText,
+    issueType: decision.finalIssueType as ExtractedFallacyIssue['issueType'],
+    fallacyType: decision.finalFallacyType as ExtractedFallacyIssue['fallacyType'],
+    severityScore: decision.finalSeverity,
+    confidenceScore: decision.finalConfidence,
+    importanceScore: decision.finalImportance,
+    reasoning: decision.finalReasoning,
+  };
+}
diff --git a/internal-packages/ai/src/tools/fallacy-review/index.ts b/internal-packages/ai/src/tools/fallacy-review/index.ts
index 75c956503..f508e38e5 100644
--- a/internal-packages/ai/src/tools/fallacy-review/index.ts
+++ b/internal-packages/ai/src/tools/fallacy-review/index.ts
@@ -11,6 +11,7 @@ import { z } from "zod";
 import { Tool, type ToolContext } from "../base/Tool";
 import { callClaudeWithTool } from "../../claude/wrapper";
 import { MODEL_CONFIG } from "../../claude/wrapper";
+import { withDateContext } from "../shared/llm-filter-utils";
 import { fallacyReviewConfig } from "./config";
 import type {
   FallacyReviewInput,
@@ -30,6 +31,7 @@ const reviewCommentSchema = z.object({
 const inputSchema = z.object({
   documentText: z.string().min(1).describe("The full document text being analyzed for epistemic issues"),
   comments: z.array(reviewCommentSchema).min(0).describe("Array of epistemic comments to review and filter for redundancy"),
+  customSystemPrompt: z.string().optional().describe("Custom system prompt override"),
 }) satisfies z.ZodType<FallacyReviewInput>;
 
 const outputSchema = z.object({
@@ -74,7 +76,8 @@ Description: ${comment.description}
       })
       .join('\n---\n\n');
 
-    const systemPrompt = `You are an expert epistemic review editor. Your job is to:
+    // Default system prompt - can be overridden via input.customSystemPrompt
+    const defaultSystemPrompt = `You are an expert epistemic review editor. Your job is to:
 
 1. **Filter Comments** - Remove redundant, weak, or overly similar comments
    - Target keeping 50-90% of comments (be selective!)
@@ -102,6 +105,10 @@ Description: ${comment.description}
 - Be ruthless about redundancy
 - The document summary should read like a professional analysis, not just a list of issues`;
 
+    // Always prepend date context to prevent false positives on recent dates
+    const basePrompt = input.customSystemPrompt || defaultSystemPrompt;
+    const systemPrompt = withDateContext(basePrompt);
+
     const userPrompt = `Review the following epistemic analysis:
 
 **Document Text** (first 2000 chars):
@@ -167,10 +174,15 @@ Please review these comments and provide:
         `[FallacyReview] Filtered ${input.comments.length} comments down to ${validIndices.length}`
       );
 
+      if (result.unifiedUsage) {
+        context.logger.info(`[FallacyReview] Cost: $${result.unifiedUsage.costUsd.toFixed(6)}`);
+      }
+
       return {
         commentIndicesToKeep: validIndices,
         documentSummary: result.toolResult.documentSummary,
         oneLineSummary: result.toolResult.oneLineSummary,
+        unifiedUsage: result.unifiedUsage,
       };
     } catch (error) {
       context.logger.error("[FallacyReview] Review failed:", error);
diff --git a/internal-packages/ai/src/tools/fallacy-review/types.ts b/internal-packages/ai/src/tools/fallacy-review/types.ts
index ce3261dc3..fa03538d7 100644
--- a/internal-packages/ai/src/tools/fallacy-review/types.ts
+++ b/internal-packages/ai/src/tools/fallacy-review/types.ts
@@ -2,6 +2,8 @@
  * Types for the fallacy review tool
  */
 
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+
 /**
  * Simplified comment representation for review
  */
@@ -34,6 +36,9 @@ export interface FallacyReviewInput {
 
   /** Array of comments to review */
   comments: ReviewComment[];
+
+  /** Optional custom system prompt override */
+  customSystemPrompt?: string;
 }
 
 /**
@@ -48,4 +53,7 @@ export interface FallacyReviewOutput {
 
   /** One-sentence summary for evaluation header */
   oneLineSummary: string;
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
 }
diff --git a/internal-packages/ai/src/tools/generated-schemas.ts b/internal-packages/ai/src/tools/generated-schemas.ts
index 26d719d1c..45546c007 100644
--- a/internal-packages/ai/src/tools/generated-schemas.ts
+++ b/internal-packages/ai/src/tools/generated-schemas.ts
@@ -3,7 +3,7 @@
  * Generated by scripts/generate-tool-schemas.ts
  * DO NOT EDIT MANUALLY
  * 
- * Schema Hash: 2dc92b2afb89d952e1e754b74ea2707195835936258f1a5b1609257f8086cc86
+ * Schema Hash: 101175f0cd55ed3570c4c6eeda1489f9aea9928bfd8237833a014a2b9c4f4f10
  */
 
 export const toolSchemas = {
@@ -2064,6 +2064,8 @@ export const toolSchemas = {
             "enum": [
               "anthropic/claude-sonnet-4.5",
               "anthropic/claude-sonnet-4",
+              "google/gemini-3-pro-preview",
+              "google/gemini-3-flash-preview",
               "google/gemini-2.5-pro",
               "google/gemini-2.5-flash",
               "openai/gpt-5",
@@ -2331,6 +2333,10 @@ export const toolSchemas = {
           },
           "minItems": 0,
           "description": "Array of epistemic comments to review and filter for redundancy"
+        },
+        "customSystemPrompt": {
+          "type": "string",
+          "description": "Custom system prompt override"
         }
       },
       "required": [
@@ -2388,23 +2394,61 @@ export const toolSchemas = {
       "properties": {
         "text": {
           "type": "string",
-          "minLength": 1,
           "maxLength": 50000,
-          "description": "Text chunk to analyze for epistemic issues and logical fallacies"
+          "description": "Text chunk to analyze (optional if documentText provided)"
         },
         "documentText": {
           "type": "string",
-          "description": "Full document text (optional, used for accurate location finding)"
+          "description": "Full document text - used for analysis in single-pass mode, or for location finding in chunk mode"
         },
         "chunkStartOffset": {
           "type": "number",
           "minimum": 0,
           "description": "Byte offset where this chunk starts in the full document (optimization for location finding)"
+        },
+        "model": {
+          "type": "string",
+          "description": "Model to use (Claude or OpenRouter model ID)"
+        },
+        "temperature": {
+          "anyOf": [
+            {
+              "type": "number",
+              "minimum": 0,
+              "maximum": 2
+            },
+            {
+              "type": "string",
+              "const": "default"
+            }
+          ],
+          "description": "Temperature for extraction (default: 0 for Claude, 0.1 for OpenRouter, 'default' to use model's native default)"
+        },
+        "thinking": {
+          "type": "boolean",
+          "description": "Enable extended thinking/reasoning (default: true for Claude, varies for OpenRouter)"
+        },
+        "customSystemPrompt": {
+          "type": "string",
+          "description": "Custom system prompt override"
+        },
+        "customUserPrompt": {
+          "type": "string",
+          "description": "Custom user prompt override (document text appended)"
+        },
+        "minSeverityThreshold": {
+          "type": "number",
+          "minimum": 0,
+          "maximum": 100,
+          "description": "Minimum severity threshold (default: 60)"
+        },
+        "maxIssues": {
+          "type": "number",
+          "minimum": 1,
+          "maximum": 100,
+          "description": "Maximum issues to return (default: 15)"
         }
       },
-      "required": [
-        "text"
-      ],
       "additionalProperties": false,
       "$schema": "http://json-schema.org/draft-07/schema#"
     },
diff --git a/internal-packages/ai/src/tools/principle-of-charity-filter/config.ts b/internal-packages/ai/src/tools/principle-of-charity-filter/config.ts
new file mode 100644
index 000000000..ba85308c6
--- /dev/null
+++ b/internal-packages/ai/src/tools/principle-of-charity-filter/config.ts
@@ -0,0 +1,13 @@
+/**
+ * Configuration for principle-of-charity filter tool
+ */
+
+import type { ToolConfig } from "../base/Tool";
+
+export const principleOfCharityFilterConfig: ToolConfig = {
+  id: "principle-of-charity-filter",
+  name: "Principle of Charity Filter",
+  description: "Filters issues that dissolve when applying charitable interpretation",
+  version: "1.0.0",
+  category: "utility",
+};
diff --git a/internal-packages/ai/src/tools/principle-of-charity-filter/index.ts b/internal-packages/ai/src/tools/principle-of-charity-filter/index.ts
new file mode 100644
index 000000000..500e51c09
--- /dev/null
+++ b/internal-packages/ai/src/tools/principle-of-charity-filter/index.ts
@@ -0,0 +1,326 @@
+/**
+ * Principle of Charity Filter Tool
+ *
+ * Applies the principle of charity - interpreting arguments in their strongest,
+ * most reasonable form before critiquing. Filters out issues that don't hold
+ * when the author's argument is charitably interpreted.
+ */
+
+import { z } from "zod";
+import { Tool, type ToolContext } from "../base/Tool";
+import {
+  callLLMFilter,
+  withDateContext,
+  type ReasoningConfig,
+  type ProviderPreferences,
+} from "../shared/llm-filter-utils";
+import type {
+  PrincipleOfCharityFilterInput,
+  PrincipleOfCharityFilterOutput,
+  CharityFilterResult,
+} from "./types";
+import { DEFAULT_PRINCIPLE_OF_CHARITY_SYSTEM_PROMPT } from "./prompts";
+import { principleOfCharityFilterConfig } from "./config";
+
+// ============================================================================
+// Schemas
+// ============================================================================
+
+const issueSchema = z.object({
+  quotedText: z.string().describe("The exact text flagged as an issue"),
+  issueType: z.string().describe("Type of issue identified"),
+  reasoning: z.string().describe("The reasoning for why this was flagged"),
+  locationOffset: z.number().optional().describe("Approximate location in document"),
+});
+
+const reasoningSchema = z.union([
+  z.literal(false),
+  z.object({ effort: z.enum(["minimal", "low", "medium", "high", "xhigh"]) }),
+  z.object({ budget_tokens: z.number().min(1024) }),
+]);
+
+const providerSchema = z.object({
+  order: z.array(z.string()).optional(),
+  allow_fallbacks: z.boolean().optional(),
+});
+
+const inputSchema = z.object({
+  documentText: z.string().min(1).max(200000).describe("Full document text for context"),
+  issues: z.array(issueSchema).describe("Issues to evaluate with principle of charity"),
+  model: z.string().optional().describe("Model to use (Claude or OpenRouter model ID)"),
+  temperature: z.number().min(0).max(2).optional().describe("Temperature (0-2). Default 0.2"),
+  reasoning: reasoningSchema.optional().describe("Reasoning/thinking configuration"),
+  provider: providerSchema.optional().describe("Provider routing preferences (OpenRouter only)"),
+  customPrompt: z.string().optional().describe("Custom system prompt (overrides default)"),
+});
+
+const resultSchema = z.object({
+  index: z.number().describe("Index of the issue in the input array"),
+  remainsValid: z.boolean().describe("Whether issue remains valid under charitable interpretation"),
+  charitableInterpretation: z.string().describe("The charitable interpretation of the argument"),
+  explanation: z.string().describe("Explanation of why issue does/doesn't hold"),
+});
+
+const outputSchema = z.object({
+  validIssues: z.array(resultSchema).describe("Issues that remain valid under charity"),
+  dissolvedIssues: z.array(resultSchema).describe("Issues that dissolve under charity"),
+});
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+const FILTER_NAME = "PrincipleOfCharityFilter";
+const DEFAULT_TEMPERATURE = 0.2;
+const INTRO_LENGTH = 2000;
+const CONCLUSION_LENGTH = 1500;
+const CONTEXT_RADIUS = 500;
+const MAX_CONTEXT_LENGTH = 12000;
+
+// ============================================================================
+// Tool Schema for LLM
+// ============================================================================
+
+const toolSchema = {
+  type: "object" as const,
+  properties: {
+    results: {
+      type: "array",
+      items: {
+        type: "object",
+        properties: {
+          index: {
+            type: "number",
+            description: "Index of the issue (0-based)",
+          },
+          remainsValid: {
+            type: "boolean",
+            description: "Whether issue remains valid under charitable interpretation",
+          },
+          charitableInterpretation: {
+            type: "string",
+            description: "The most charitable interpretation of the author's argument",
+          },
+          explanation: {
+            type: "string",
+            description: "Explanation of why the issue does/doesn't hold",
+          },
+        },
+        required: ["index", "remainsValid", "charitableInterpretation", "explanation"],
+      },
+    },
+  },
+  required: ["results"],
+};
+
+type FilterResults = {
+  results: Array<{
+    index: number;
+    remainsValid: boolean;
+    charitableInterpretation: string;
+    explanation: string;
+  }>;
+};
+
+// ============================================================================
+// Tool Implementation
+// ============================================================================
+
+export class PrincipleOfCharityFilterTool extends Tool<
+  PrincipleOfCharityFilterInput,
+  PrincipleOfCharityFilterOutput
+> {
+  config = principleOfCharityFilterConfig;
+  inputSchema = inputSchema;
+  outputSchema = outputSchema;
+
+  async execute(
+    input: PrincipleOfCharityFilterInput,
+    context: ToolContext
+  ): Promise<PrincipleOfCharityFilterOutput> {
+    // Log input issues
+    context.logger.debug(`[${FILTER_NAME}] Starting with ${input.issues.length} issues`);
+    for (let i = 0; i < input.issues.length; i++) {
+      context.logger.debug(
+        `[${FILTER_NAME}] Issue ${i}: "${input.issues[i].quotedText.substring(0, 60)}..." (${input.issues[i].issueType})`
+      );
+    }
+
+    context.logger.info(
+      `[${FILTER_NAME}] Evaluating ${input.issues.length} issues with principle of charity`
+    );
+
+    // If no issues, return empty result
+    if (input.issues.length === 0) {
+      return {
+        validIssues: [],
+        dissolvedIssues: [],
+      };
+    }
+
+    // Format issues for the LLM
+    const formattedIssues = input.issues
+      .map((issue, idx) => {
+        return `**Issue ${idx}**:
+Text: "${issue.quotedText}"
+Type: ${issue.issueType}
+Reasoning: ${issue.reasoning}
+`;
+      })
+      .join("\n---\n\n");
+
+    // Prepare document text (truncate if needed)
+    const docForPrompt =
+      input.documentText.length <= 15000
+        ? input.documentText
+        : this.extractRelevantContext(input.documentText, input.issues);
+
+    const userPrompt = `Apply the Principle of Charity to evaluate these flagged issues:
+
+**Document Context**:
+${docForPrompt}
+
+**Issues to Evaluate**:
+
+${formattedIssues}
+
+For each issue:
+1. First, articulate the most charitable interpretation of the author's argument
+2. Then determine if the issue still holds under that interpretation
+3. Explain your reasoning`;
+
+    try {
+      // Always prepend date context to prevent false positives on recent dates
+      const basePrompt = input.customPrompt || DEFAULT_PRINCIPLE_OF_CHARITY_SYSTEM_PROMPT;
+      const result = await callLLMFilter<FilterResults>(
+        {
+          model: input.model,
+          modelEnvVar: "CHARITY_FILTER_MODEL",
+          systemPrompt: withDateContext(basePrompt),
+          userPrompt,
+          temperature: input.temperature ?? DEFAULT_TEMPERATURE,
+          reasoning: input.reasoning as ReasoningConfig | undefined,
+          provider: input.provider as ProviderPreferences | undefined,
+          toolName: "principle_of_charity_results",
+          toolDescription: "Results of evaluating issues with principle of charity",
+          toolSchema,
+          filterName: FILTER_NAME,
+        },
+        context
+      );
+
+      // Process results
+      const validIssues: CharityFilterResult[] = [];
+      const dissolvedIssues: CharityFilterResult[] = [];
+
+      for (const r of result.toolResult.results) {
+        // Validate index is in range
+        if (r.index < 0 || r.index >= input.issues.length) {
+          context.logger.warn(`[${FILTER_NAME}] Invalid index ${r.index}, skipping`);
+          continue;
+        }
+
+        const filterResult: CharityFilterResult = {
+          index: r.index,
+          remainsValid: r.remainsValid,
+          charitableInterpretation: r.charitableInterpretation,
+          explanation: r.explanation,
+        };
+
+        if (r.remainsValid) {
+          validIssues.push(filterResult);
+        } else {
+          dissolvedIssues.push(filterResult);
+        }
+      }
+
+      context.logger.info(
+        `[${FILTER_NAME}] ${dissolvedIssues.length}/${input.issues.length} issues dissolved (filtered out), ${validIssues.length} remain valid`
+      );
+
+      // Debug logging
+      for (const issue of validIssues) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Issue ${issue.index} REMAINS VALID: ${issue.explanation.substring(0, 100)}...`
+        );
+      }
+      for (const issue of dissolvedIssues) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Issue ${issue.index} DISSOLVED: ${issue.explanation.substring(0, 100)}...`
+        );
+      }
+
+      if (result.unifiedUsage) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Cost: $${result.unifiedUsage.costUsd.toFixed(6)}`
+        );
+      }
+
+      return {
+        validIssues,
+        dissolvedIssues,
+        unifiedUsage: result.unifiedUsage,
+        actualApiParams: result.actualApiParams,
+        responseMetrics: result.responseMetrics,
+      };
+    } catch (error) {
+      context.logger.error(`[${FILTER_NAME}] Filter failed:`, error);
+      // Fallback: assume all issues remain valid (keep them)
+      return {
+        validIssues: input.issues.map((_, idx) => ({
+          index: idx,
+          remainsValid: true,
+          charitableInterpretation: "Fallback: filter failed, preserving issue",
+          explanation: "Fallback: filter failed, preserving issue",
+        })),
+        dissolvedIssues: [],
+      };
+    }
+  }
+
+  /**
+   * Extract relevant context around the flagged issues
+   */
+  private extractRelevantContext(
+    documentText: string,
+    issues: PrincipleOfCharityFilterInput["issues"]
+  ): string {
+    const chunks: string[] = [];
+
+    // Always include intro
+    chunks.push("**[INTRODUCTION]**\n" + documentText.substring(0, INTRO_LENGTH));
+
+    // Include context around each issue
+    for (const issue of issues) {
+      if (issue.locationOffset !== undefined) {
+        const start = Math.max(0, issue.locationOffset - CONTEXT_RADIUS);
+        const end = Math.min(
+          documentText.length,
+          issue.locationOffset + issue.quotedText.length + CONTEXT_RADIUS
+        );
+        chunks.push(
+          `**[CONTEXT FOR: "${issue.quotedText.substring(0, 50)}..."]**\n` +
+            documentText.substring(start, end)
+        );
+      }
+    }
+
+    // Always include conclusion if document is long enough
+    if (documentText.length > INTRO_LENGTH + CONCLUSION_LENGTH) {
+      chunks.push(
+        "**[CONCLUSION]**\n" + documentText.substring(documentText.length - CONCLUSION_LENGTH)
+      );
+    }
+
+    // Combine and truncate to max length
+    let result = chunks.join("\n\n---\n\n");
+    if (result.length > MAX_CONTEXT_LENGTH) {
+      result = result.substring(0, MAX_CONTEXT_LENGTH) + "\n...[truncated]...";
+    }
+
+    return result;
+  }
+}
+
+export const principleOfCharityFilterTool = new PrincipleOfCharityFilterTool();
+export default principleOfCharityFilterTool;
diff --git a/internal-packages/ai/src/tools/principle-of-charity-filter/prompts.ts b/internal-packages/ai/src/tools/principle-of-charity-filter/prompts.ts
new file mode 100644
index 000000000..b3315133e
--- /dev/null
+++ b/internal-packages/ai/src/tools/principle-of-charity-filter/prompts.ts
@@ -0,0 +1,64 @@
+/**
+ * Default prompts for principle-of-charity filter
+ *
+ * These are exported so they can be displayed in the profile editor UI
+ * as placeholders/defaults while allowing customization.
+ */
+
+export const DEFAULT_PRINCIPLE_OF_CHARITY_SYSTEM_PROMPT = `You are an expert at applying the Principle of Charity in argument analysis.
+
+The Principle of Charity requires interpreting an argument in its **strongest, most reasonable form** before critiquing it. This means:
+- Assume the author is rational and arguing in good faith
+- Fill in unstated but reasonable assumptions
+- Choose the most plausible interpretation of ambiguous statements
+- Consider the full context of the argument
+
+Your task: For each flagged issue, first articulate the **strongest charitable interpretation** of the author's argument, then determine if the issue **still holds** under that interpretation.
+
+**FILTER OUT (issue dissolves) if**:
+- A reasonable reader would understand the author's intent without confusion
+- The issue relies on an uncharitable or overly literal reading
+- The author's meaning is clear from context even if imprecisely stated
+- The critique attacks a strawman rather than the author's actual point
+- Common rhetorical conventions explain the phrasing (e.g., "studies show" in casual writing)
+- The issue is technically correct but misses the forest for the trees
+
+**KEEP FLAGGING (issue remains valid) if**:
+- Even the most charitable interpretation has a genuine flaw
+- The logical error persists regardless of how generously we interpret the argument
+- The issue is about missing evidence that no interpretation can supply
+- The problem is fundamental to the argument, not just its expression
+- A reasonable reader would still be misled or confused
+
+**Examples of issues that DISSOLVE under charity**:
+
+1. Issue: "Hasty generalization - claims 'most people prefer X' without survey data"
+   Charitable interpretation: Author is sharing a common observation, not making a statistical claim
+   → DISSOLVES - reasonable readers understand this as informal observation, not rigorous claim
+
+2. Issue: "Appeal to authority - cites 'experts' without naming them"
+   Charitable interpretation: Author is summarizing general expert consensus in a casual context
+   → DISSOLVES - in blog posts/essays, "experts say" is understood as shorthand
+
+3. Issue: "False dichotomy - presents only two options"
+   Charitable interpretation: Author is highlighting the two most relevant options for this context
+   → DISSOLVES - reasonable simplification, not claiming these are the ONLY options
+
+**Examples of issues that REMAIN VALID under charity**:
+
+1. Issue: "Circular reasoning - conclusion assumes what it's trying to prove"
+   Charitable interpretation: Even granting all reasonable assumptions, the logic is circular
+   → REMAINS VALID - no interpretation fixes circular reasoning
+
+2. Issue: "Misrepresents source - claims paper says X when it says Y"
+   Charitable interpretation: Perhaps author misread, but the claim is factually wrong
+   → REMAINS VALID - charitable intent doesn't fix factual errors
+
+3. Issue: "Ad hominem - dismisses argument because of who made it"
+   Charitable interpretation: Author may dislike the person, but that doesn't address their argument
+   → REMAINS VALID - attacking the person instead of the argument is still a fallacy
+
+For each issue, provide:
+1. The most charitable interpretation of the author's argument
+2. Whether the issue remains valid under that interpretation
+3. Brief explanation of your reasoning`;
diff --git a/internal-packages/ai/src/tools/principle-of-charity-filter/types.ts b/internal-packages/ai/src/tools/principle-of-charity-filter/types.ts
new file mode 100644
index 000000000..b7aa920a7
--- /dev/null
+++ b/internal-packages/ai/src/tools/principle-of-charity-filter/types.ts
@@ -0,0 +1,91 @@
+/**
+ * Principle of Charity Filter Types
+ *
+ * This filter applies the "principle of charity" - interpreting arguments
+ * in their strongest, most reasonable form before critiquing. It filters out
+ * issues that don't hold up when the author's argument is charitably interpreted.
+ */
+
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+import type {
+  ReasoningEffort,
+  ReasoningConfig,
+  ProviderPreferences,
+  ActualApiParams,
+  ApiResponseMetrics,
+} from '../../types/common';
+
+// Re-export for backwards compatibility
+export type { ReasoningEffort, ReasoningConfig, ProviderPreferences, ActualApiParams, ApiResponseMetrics };
+
+export interface PrincipleOfCharityFilterInput {
+  /** Full document text for context */
+  documentText: string;
+
+  /** Issues to evaluate with principle of charity */
+  issues: CharityFilterIssue[];
+
+  /**
+   * Optional model to use for filtering.
+   * Can be a Claude model (default) or an OpenRouter model ID.
+   * Examples: "claude-sonnet-4-20250514", "google/gemini-3-flash-preview"
+   */
+  model?: string;
+
+  /** Temperature for the LLM (0-2). Default is 0.2 for thoughtful analysis. */
+  temperature?: number;
+
+  /** Reasoning/thinking configuration for Claude models */
+  reasoning?: ReasoningConfig;
+
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+
+  /** Custom system prompt (overrides default) */
+  customPrompt?: string;
+}
+
+export interface CharityFilterIssue {
+  /** The exact text flagged as an issue */
+  quotedText: string;
+
+  /** Type of issue identified */
+  issueType: string;
+
+  /** The reasoning for why this was flagged */
+  reasoning: string;
+
+  /** Approximate location in document (character offset) */
+  locationOffset?: number;
+}
+
+export interface PrincipleOfCharityFilterOutput {
+  /** Issues that remain valid even under charitable interpretation (keep flagging) */
+  validIssues: CharityFilterResult[];
+
+  /** Issues that dissolve under charitable interpretation (filter out) */
+  dissolvedIssues: CharityFilterResult[];
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
+
+  /** Actual API parameters sent */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics */
+  responseMetrics?: ApiResponseMetrics;
+}
+
+export interface CharityFilterResult {
+  /** Index of the issue in the input array */
+  index: number;
+
+  /** Whether this issue remains valid under charitable interpretation */
+  remainsValid: boolean;
+
+  /** The charitable interpretation of the author's argument */
+  charitableInterpretation: string;
+
+  /** Explanation of why the issue does/doesn't hold under charitable interpretation */
+  explanation: string;
+}
diff --git a/internal-packages/ai/src/tools/shared/llm-filter-utils.ts b/internal-packages/ai/src/tools/shared/llm-filter-utils.ts
new file mode 100644
index 000000000..c11ae36e9
--- /dev/null
+++ b/internal-packages/ai/src/tools/shared/llm-filter-utils.ts
@@ -0,0 +1,427 @@
+/**
+ * Shared LLM Filter Utilities
+ *
+ * Common utilities for LLM-based filter tools. Abstracts away the differences
+ * between Claude API and OpenRouter API calls.
+ */
+
+import { callClaudeWithTool } from "../../claude/wrapper";
+import { MODEL_CONFIG } from "../../claude/wrapper";
+import { callOpenRouterWithTool } from "../../utils/openrouter";
+import { effortToBudgetTokens } from "../../types/common";
+import type { UnifiedUsageMetrics } from "../../utils/usageMetrics";
+import type { ToolContext } from "../base/Tool";
+
+// ============================================================================
+// Types
+// ============================================================================
+
+/** Reasoning configuration (matches common.ts) */
+export type ReasoningConfig =
+  | false
+  | { effort: "minimal" | "low" | "medium" | "high" | "xhigh" }
+  | { budget_tokens: number };
+
+/** Provider preferences for OpenRouter */
+export interface ProviderPreferences {
+  order?: string[];
+  allow_fallbacks?: boolean;
+}
+
+/** Actual API parameters sent to provider */
+export interface ActualApiParams {
+  model: string;
+  temperature: number;
+  maxTokens: number;
+  thinking?: {
+    type: "enabled";
+    budget_tokens: number;
+  };
+  reasoning?: {
+    effort?: "minimal" | "low" | "medium" | "high" | "xhigh";
+    max_tokens?: number;
+  };
+}
+
+/** Response metrics from API call */
+export interface ApiResponseMetrics {
+  success: boolean;
+  latencyMs: number;
+  inputTokens?: number;
+  outputTokens?: number;
+  stopReason?: string;
+}
+
+/** Input for LLM filter call */
+export interface LLMFilterCallInput {
+  /** Model ID (Claude model or OpenRouter format like "provider/model") */
+  model?: string;
+  /** Environment variable name for model override (e.g., "FALLACY_FILTER_MODEL") */
+  modelEnvVar?: string;
+  /** System prompt */
+  systemPrompt: string;
+  /** User prompt */
+  userPrompt: string;
+  /** Temperature (defaults vary by filter) */
+  temperature: number;
+  /** Reasoning/thinking configuration */
+  reasoning?: ReasoningConfig;
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+  /** Tool name for structured output */
+  toolName: string;
+  /** Tool description for structured output */
+  toolDescription: string;
+  /** JSON Schema for tool output (must be object type) */
+  toolSchema: { type: "object"; properties: Record<string, unknown>; required?: string[] };
+  /** Max tokens for response (defaults: Claude=4000, OpenRouter=8000) */
+  maxTokens?: { claude?: number; openRouter?: number };
+  /** Filter name for logging */
+  filterName: string;
+}
+
+/** Output from LLM filter call */
+export interface LLMFilterCallOutput<T> {
+  toolResult: T;
+  unifiedUsage?: UnifiedUsageMetrics;
+  actualApiParams: ActualApiParams;
+  responseMetrics: ApiResponseMetrics;
+}
+
+// ============================================================================
+// Utility Functions
+// ============================================================================
+
+/**
+ * Detect if model is an OpenRouter model (contains "/" in ID)
+ */
+export function isOpenRouterModel(modelId: string): boolean {
+  return modelId.includes("/");
+}
+
+/**
+ * Resolve model ID from input, environment variable, or default
+ */
+export function resolveModelId(
+  inputModel: string | undefined,
+  envVarName: string | undefined
+): string {
+  if (inputModel) return inputModel;
+  const envValue = envVarName ? process.env[envVarName] : undefined;
+  if (envValue) return envValue;
+  return MODEL_CONFIG.analysis;
+}
+
+/**
+ * Build Claude thinking configuration from reasoning settings
+ */
+export function buildThinkingConfig(
+  reasoning: ReasoningConfig | undefined
+): { type: "enabled"; budget_tokens: number } | undefined {
+  if (reasoning === undefined || reasoning === false) {
+    return undefined;
+  }
+
+  if ("effort" in reasoning) {
+    return {
+      type: "enabled",
+      budget_tokens: effortToBudgetTokens(reasoning.effort),
+    };
+  }
+
+  if ("budget_tokens" in reasoning) {
+    return {
+      type: "enabled",
+      budget_tokens: reasoning.budget_tokens,
+    };
+  }
+
+  return undefined;
+}
+
+/**
+ * Build OpenRouter reasoning settings from reasoning config
+ */
+export function buildOpenRouterReasoning(
+  reasoning: ReasoningConfig | undefined
+): { enabled: boolean; effort?: "minimal" | "low" | "medium" | "high" | "xhigh" } {
+  if (reasoning === undefined || reasoning === false) {
+    return { enabled: false };
+  }
+
+  if ("effort" in reasoning) {
+    return { enabled: true, effort: reasoning.effort };
+  }
+
+  // For budget_tokens, we don't have a direct mapping to OpenRouter effort,
+  // so we enable thinking but don't set a specific effort level
+  return { enabled: true };
+}
+
+// ============================================================================
+// Main LLM Call Function
+// ============================================================================
+
+/**
+ * Call an LLM (Claude or OpenRouter) with tool use for filter operations.
+ *
+ * This function abstracts the differences between:
+ * - Claude API (direct calls with thinking parameter)
+ * - OpenRouter API (with reasoning effort parameter)
+ *
+ * @example
+ * const result = await callLLMFilter<MyToolResult>({
+ *   model: "claude-sonnet-4-5-20250929",
+ *   systemPrompt: "You are a helpful assistant...",
+ *   userPrompt: "Analyze this...",
+ *   temperature: 0.1,
+ *   reasoning: { effort: "medium" },
+ *   toolName: "my_tool",
+ *   toolDescription: "Description of output",
+ *   toolSchema: { ... },
+ *   filterName: "MyFilter",
+ * }, context);
+ */
+export async function callLLMFilter<T>(
+  input: LLMFilterCallInput,
+  context: ToolContext
+): Promise<LLMFilterCallOutput<T>> {
+  const modelId = resolveModelId(input.model, input.modelEnvVar);
+  const useOpenRouter = isOpenRouterModel(modelId);
+
+  const claudeMaxTokens = input.maxTokens?.claude ?? 4000;
+  const openRouterMaxTokens = input.maxTokens?.openRouter ?? 8000;
+
+  context.logger.debug(
+    `[${input.filterName}] Calling ${useOpenRouter ? "OpenRouter" : "Claude"}: model=${modelId}, temp=${input.temperature}`
+  );
+
+  if (useOpenRouter) {
+    return callOpenRouterFilter<T>(input, modelId, openRouterMaxTokens, context);
+  } else {
+    return callClaudeFilter<T>(input, modelId, claudeMaxTokens, context);
+  }
+}
+
+/**
+ * Call OpenRouter API with tool use
+ */
+async function callOpenRouterFilter<T>(
+  input: LLMFilterCallInput,
+  modelId: string,
+  maxTokens: number,
+  context: ToolContext
+): Promise<LLMFilterCallOutput<T>> {
+  const { enabled: thinkingEnabled, effort: reasoningEffort } = buildOpenRouterReasoning(input.reasoning);
+
+  const reasoningInfo = reasoningEffort ? `, reasoning: ${reasoningEffort}` : "";
+  context.logger.debug(
+    `[${input.filterName}] OpenRouter params: model=${modelId}, temp=${input.temperature}${reasoningInfo}`
+  );
+
+  const result = await callOpenRouterWithTool<T>({
+    model: modelId,
+    system: input.systemPrompt,
+    messages: [{ role: "user", content: input.userPrompt }],
+    max_tokens: maxTokens,
+    temperature: input.temperature,
+    toolName: input.toolName,
+    toolDescription: input.toolDescription,
+    toolSchema: input.toolSchema,
+    thinking: thinkingEnabled,
+    ...(reasoningEffort && { reasoningEffort }),
+    ...(input.provider && { provider: input.provider }),
+  });
+
+  // Map reasoning effort, filtering out 'none' which isn't valid for ActualApiParams
+  const reasoning = result.actualParams.reasoning;
+  const mappedReasoning = reasoning && reasoning.effort !== 'none'
+    ? { effort: reasoning.effort as Exclude<typeof reasoning.effort, 'none'>, max_tokens: reasoning.max_tokens }
+    : reasoning?.max_tokens ? { max_tokens: reasoning.max_tokens } : undefined;
+
+  return {
+    toolResult: result.toolResult,
+    unifiedUsage: result.unifiedUsage,
+    actualApiParams: {
+      model: result.actualParams.model,
+      temperature: result.actualParams.temperature,
+      maxTokens: result.actualParams.maxTokens,
+      reasoning: mappedReasoning,
+    },
+    responseMetrics: {
+      success: result.responseMetrics.success,
+      latencyMs: result.responseMetrics.latencyMs,
+      inputTokens: result.responseMetrics.inputTokens,
+      outputTokens: result.responseMetrics.outputTokens,
+      stopReason: result.responseMetrics.stopReason,
+    },
+  };
+}
+
+/**
+ * Call Claude API with tool use
+ */
+async function callClaudeFilter<T>(
+  input: LLMFilterCallInput,
+  modelId: string,
+  maxTokens: number,
+  context: ToolContext
+): Promise<LLMFilterCallOutput<T>> {
+  const thinkingConfig = buildThinkingConfig(input.reasoning);
+
+  context.logger.debug(
+    `[${input.filterName}] Claude params: model=${modelId}, temp=${input.temperature}, thinking=${
+      thinkingConfig ? `enabled (${thinkingConfig.budget_tokens} tokens)` : "disabled"
+    }`
+  );
+
+  const result = await callClaudeWithTool<T>({
+    model: modelId,
+    system: input.systemPrompt,
+    messages: [{ role: "user", content: input.userPrompt }],
+    max_tokens: maxTokens,
+    temperature: input.temperature,
+    toolName: input.toolName,
+    toolDescription: input.toolDescription,
+    toolSchema: input.toolSchema,
+    thinking: thinkingConfig,
+  });
+
+  return {
+    toolResult: result.toolResult,
+    unifiedUsage: result.unifiedUsage,
+    actualApiParams: {
+      model: modelId,
+      temperature: input.temperature,
+      maxTokens,
+      thinking: thinkingConfig,
+      reasoning: thinkingConfig ? { max_tokens: thinkingConfig.budget_tokens } : undefined,
+    },
+    responseMetrics: {
+      success: true,
+      latencyMs: 0, // Claude wrapper doesn't expose latency
+      inputTokens: result.unifiedUsage?.inputTokens,
+      outputTokens: result.unifiedUsage?.outputTokens,
+      stopReason: "tool_use",
+    },
+  };
+}
+
+// ============================================================================
+// Document Processing Utilities
+// ============================================================================
+
+/**
+ * Truncate document text for LLM context with key sections preserved.
+ * Keeps intro, conclusion, and optionally sections with specific keywords.
+ */
+export function truncateDocumentForContext(
+  documentText: string,
+  options: {
+    maxLength?: number;
+    introLength?: number;
+    conclusionLength?: number;
+    evidenceKeywords?: string[];
+  } = {}
+): string {
+  const {
+    maxLength = 12000,
+    introLength = 2000,
+    conclusionLength = 2000,
+    evidenceKeywords = [],
+  } = options;
+
+  // If document is short enough, return as-is
+  if (documentText.length <= maxLength) {
+    return documentText;
+  }
+
+  const chunks: string[] = [];
+
+  // Always include intro
+  chunks.push("**[INTRO/BEGINNING]**\n" + documentText.substring(0, introLength));
+
+  // Always include conclusion if document is long enough
+  if (documentText.length > introLength + conclusionLength) {
+    chunks.push(
+      "**[CONCLUSION/END]**\n" +
+        documentText.substring(documentText.length - conclusionLength)
+    );
+  }
+
+  // Find sections with evidence keywords if provided
+  if (evidenceKeywords.length > 0) {
+    const lines = documentText.split("\n");
+    let currentSection = "";
+    let sectionHasEvidence = false;
+
+    for (const line of lines) {
+      const lowerLine = line.toLowerCase();
+
+      if (evidenceKeywords.some((kw) => lowerLine.includes(kw))) {
+        sectionHasEvidence = true;
+      }
+
+      // Check for section headers
+      if (line.startsWith("#") || line.match(/^[A-Z][A-Z\s]{3,}$/)) {
+        if (sectionHasEvidence && currentSection.length > 100) {
+          chunks.push("**[EVIDENCE SECTION]**\n" + currentSection.substring(0, 1500));
+        }
+        currentSection = line + "\n";
+        sectionHasEvidence = false;
+      } else {
+        currentSection += line + "\n";
+      }
+    }
+  }
+
+  // Combine and truncate to max length
+  let result = chunks.join("\n\n---\n\n");
+  if (result.length > maxLength) {
+    result = result.substring(0, maxLength) + "\n...[truncated]...";
+  }
+
+  return result;
+}
+
+// ============================================================================
+// Date Context Utilities
+// ============================================================================
+
+/**
+ * Generate a date context string to prepend to system prompts.
+ *
+ * This is CRITICAL for preventing false positives where the model thinks
+ * recent dates are "in the future" due to training cutoff.
+ *
+ * @example
+ * const prompt = getDateContext() + baseSystemPrompt;
+ */
+export function getDateContext(): string {
+  const now = new Date();
+  const options: Intl.DateTimeFormatOptions = {
+    weekday: 'long',
+    year: 'numeric',
+    month: 'long',
+    day: 'numeric',
+  };
+  const formattedDate = now.toLocaleDateString('en-US', options);
+
+  return `**CURRENT DATE**: ${formattedDate}
+
+Use this date as your reference point when evaluating claims about timing and events.
+
+---
+
+`;
+}
+
+/**
+ * Prepend date context to a system prompt.
+ *
+ * @param systemPrompt - The base system prompt
+ * @returns System prompt with date context prepended
+ */
+export function withDateContext(systemPrompt: string): string {
+  return getDateContext() + systemPrompt;
+}
diff --git a/internal-packages/ai/src/tools/smart-text-searcher/index.ts b/internal-packages/ai/src/tools/smart-text-searcher/index.ts
index 2de7ab09e..6fa423354 100644
--- a/internal-packages/ai/src/tools/smart-text-searcher/index.ts
+++ b/internal-packages/ai/src/tools/smart-text-searcher/index.ts
@@ -98,8 +98,8 @@ export class FuzzyTextLocatorTool extends Tool<
   ): Promise<TextLocationFinderOutput> {
     const startTime = Date.now();
 
-    // Validate input exists
-    if (!input || !input.documentText || !input.searchText) {
+    // Validate required fields
+    if (!input.documentText || !input.searchText) {
       throw new Error(
         `Invalid input: documentText and searchText are required. Received: ${JSON.stringify(input)}`
       );
@@ -129,9 +129,8 @@ export class FuzzyTextLocatorTool extends Tool<
       );
       logger.debug("locationResult", {
         locationResult,
-        documentText: input.documentText,
-        searchText: input.searchText,
-        locationOptions,
+        documentLength: input.documentText.length,
+        searchText: input.searchText.substring(0, 100) + (input.searchText.length > 100 ? '...' : ''),
       });
 
       // Check if LLM was used based on the strategy
@@ -181,9 +180,9 @@ export class FuzzyTextLocatorTool extends Tool<
   }
 
   // Validation specific to this tool
-  override async validateAccess(context: ToolContext): Promise<boolean> {
+  override validateAccess(_context: ToolContext): Promise<boolean> {
     // This tool requires no special permissions - it's a pure utility
-    return true;
+    return Promise.resolve(true);
   }
 
   // Optional: provide usage examples
@@ -249,7 +248,8 @@ export class FuzzyTextLocatorTool extends Tool<
 }
 
 // Export the tool instance
-export default new FuzzyTextLocatorTool();
+const fuzzyTextLocatorTool = new FuzzyTextLocatorTool();
+export default fuzzyTextLocatorTool;
 
 // Simple document location interface for compatibility
 export interface DocumentLocation {
diff --git a/internal-packages/ai/src/tools/supported-elsewhere-filter/index.ts b/internal-packages/ai/src/tools/supported-elsewhere-filter/index.ts
new file mode 100644
index 000000000..a68d9802a
--- /dev/null
+++ b/internal-packages/ai/src/tools/supported-elsewhere-filter/index.ts
@@ -0,0 +1,295 @@
+/**
+ * Supported Elsewhere Filter Tool
+ *
+ * Checks if claims or arguments flagged as issues are actually supported,
+ * explained, or qualified elsewhere in the document. Common in well-structured
+ * writing where intro claims are backed up later in the text.
+ */
+
+import { z } from "zod";
+import { Tool, type ToolContext } from "../base/Tool";
+import {
+  callLLMFilter,
+  truncateDocumentForContext,
+  withDateContext,
+  type ReasoningConfig,
+  type ProviderPreferences,
+} from "../shared/llm-filter-utils";
+import type {
+  SupportedElsewhereFilterInput,
+  SupportedElsewhereFilterOutput,
+  SupportedElsewhereResult,
+} from "./types";
+import { DEFAULT_SUPPORTED_ELSEWHERE_SYSTEM_PROMPT } from "./prompts";
+
+// ============================================================================
+// Schemas
+// ============================================================================
+
+const issueSchema = z.object({
+  quotedText: z.string().describe("The exact text flagged as an issue"),
+  issueType: z.string().describe("Type of issue identified"),
+  reasoning: z.string().describe("The reasoning for why this was flagged"),
+  locationOffset: z.number().optional().describe("Approximate location in document"),
+});
+
+const reasoningSchema = z.union([
+  z.literal(false),
+  z.object({ effort: z.enum(["minimal", "low", "medium", "high", "xhigh"]) }),
+  z.object({ budget_tokens: z.number().min(1024) }),
+]);
+
+const providerSchema = z.object({
+  order: z.array(z.string()).optional(),
+  allow_fallbacks: z.boolean().optional(),
+});
+
+const inputSchema = z.object({
+  documentText: z.string().min(1).max(200000).describe("Full document text to search"),
+  issues: z.array(issueSchema).describe("Issues to check for support elsewhere"),
+  model: z.string().optional().describe("Model to use (Claude or OpenRouter model ID)"),
+  temperature: z.number().min(0).max(2).optional().describe("Temperature (0-2). Default 0.1"),
+  reasoning: reasoningSchema.optional().describe("Reasoning/thinking configuration"),
+  provider: providerSchema.optional().describe("Provider routing preferences (OpenRouter only)"),
+  customPrompt: z.string().optional().describe("Custom system prompt (overrides default)"),
+});
+
+const resultSchema = z.object({
+  index: z.number().describe("Index of the issue in the input array"),
+  isSupported: z.boolean().describe("Whether this issue is supported elsewhere"),
+  supportLocation: z.string().optional().describe("Where the support was found"),
+  explanation: z.string().describe("Explanation of the support or lack thereof"),
+});
+
+const outputSchema = z.object({
+  unsupportedIssues: z.array(resultSchema).describe("Issues NOT supported elsewhere"),
+  supportedIssues: z.array(resultSchema).describe("Issues ARE supported elsewhere"),
+});
+
+// ============================================================================
+// Tool Config
+// ============================================================================
+
+const supportedElsewhereFilterConfig = {
+  id: "supported-elsewhere-filter",
+  name: "Supported Elsewhere Filter",
+  description: "Checks if flagged issues are supported elsewhere in the document",
+  version: "1.0.0",
+  category: "utility" as const,
+};
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+const FILTER_NAME = "SupportedElsewhereFilter";
+const DEFAULT_TEMPERATURE = 0.1;
+const EVIDENCE_KEYWORDS = [
+  "method", "data", "result", "study", "research", "evidence",
+  "citation", "reference", "source", "appendix", "table", "figure",
+  "analysis", "finding", "sample", "participant", "measure",
+  "because", "therefore", "thus", "since", "reason", "explain",
+];
+
+// ============================================================================
+// Tool Schema for LLM
+// ============================================================================
+
+const toolSchema = {
+  type: "object" as const,
+  properties: {
+    results: {
+      type: "array",
+      items: {
+        type: "object",
+        properties: {
+          index: {
+            type: "number",
+            description: "Index of the issue (0-based)",
+          },
+          isSupported: {
+            type: "boolean",
+            description: "Whether this issue is supported elsewhere",
+          },
+          supportLocation: {
+            type: "string",
+            description: "Where the support was found (quote or description)",
+          },
+          explanation: {
+            type: "string",
+            description: "Explanation of why it is/isn't supported",
+          },
+        },
+        required: ["index", "isSupported", "explanation"],
+      },
+    },
+  },
+  required: ["results"],
+};
+
+type FilterResults = {
+  results: Array<{
+    index: number;
+    isSupported: boolean;
+    supportLocation?: string;
+    explanation: string;
+  }>;
+};
+
+// ============================================================================
+// Tool Implementation
+// ============================================================================
+
+export class SupportedElsewhereFilterTool extends Tool<
+  SupportedElsewhereFilterInput,
+  SupportedElsewhereFilterOutput
+> {
+  config = supportedElsewhereFilterConfig;
+  inputSchema = inputSchema;
+  outputSchema = outputSchema;
+
+  async execute(
+    input: SupportedElsewhereFilterInput,
+    context: ToolContext
+  ): Promise<SupportedElsewhereFilterOutput> {
+    // Log input issues
+    context.logger.debug(`[${FILTER_NAME}] Starting with ${input.issues.length} issues`);
+    for (let i = 0; i < input.issues.length; i++) {
+      context.logger.debug(
+        `[${FILTER_NAME}] Issue ${i}: "${input.issues[i].quotedText.substring(0, 60)}..." (${input.issues[i].issueType})`
+      );
+    }
+
+    context.logger.info(
+      `[${FILTER_NAME}] Checking ${input.issues.length} issues for support elsewhere`
+    );
+
+    // If no issues, return empty result
+    if (input.issues.length === 0) {
+      return {
+        unsupportedIssues: [],
+        supportedIssues: [],
+      };
+    }
+
+    // Format issues for the LLM
+    const formattedIssues = input.issues
+      .map((issue, idx) => {
+        return `**Issue ${idx}**:
+Text: "${issue.quotedText}"
+Type: ${issue.issueType}
+Reasoning: ${issue.reasoning}
+`;
+      })
+      .join("\n---\n\n");
+
+    // Prepare document text (truncate if needed)
+    const docForPrompt =
+      input.documentText.length <= 15000
+        ? input.documentText
+        : truncateDocumentForContext(input.documentText, {
+            evidenceKeywords: EVIDENCE_KEYWORDS,
+          });
+
+    const userPrompt = `Search this document for support for the flagged issues:
+
+**Full Document**:
+${docForPrompt}
+
+**Issues to Check**:
+
+${formattedIssues}
+
+For each issue, determine if it is supported elsewhere in the document.`;
+
+    try {
+      // Always prepend date context to prevent false positives on recent dates
+      const basePrompt = input.customPrompt || DEFAULT_SUPPORTED_ELSEWHERE_SYSTEM_PROMPT;
+      const result = await callLLMFilter<FilterResults>(
+        {
+          model: input.model,
+          modelEnvVar: "FALLACY_FILTER_MODEL",
+          systemPrompt: withDateContext(basePrompt),
+          userPrompt,
+          temperature: input.temperature ?? DEFAULT_TEMPERATURE,
+          reasoning: input.reasoning as ReasoningConfig | undefined,
+          provider: input.provider as ProviderPreferences | undefined,
+          toolName: "supported_elsewhere_results",
+          toolDescription: "Results of checking each issue for support elsewhere",
+          toolSchema,
+          filterName: FILTER_NAME,
+        },
+        context
+      );
+
+      // Process results
+      const unsupportedIssues: SupportedElsewhereResult[] = [];
+      const supportedIssues: SupportedElsewhereResult[] = [];
+
+      for (const r of result.toolResult.results) {
+        // Validate index is in range
+        if (r.index < 0 || r.index >= input.issues.length) {
+          context.logger.warn(`[${FILTER_NAME}] Invalid index ${r.index}, skipping`);
+          continue;
+        }
+
+        const filterResult: SupportedElsewhereResult = {
+          index: r.index,
+          isSupported: r.isSupported,
+          supportLocation: r.supportLocation,
+          explanation: r.explanation,
+        };
+
+        if (r.isSupported) {
+          supportedIssues.push(filterResult);
+        } else {
+          unsupportedIssues.push(filterResult);
+        }
+      }
+
+      context.logger.info(
+        `[${FILTER_NAME}] ${supportedIssues.length}/${input.issues.length} issues filtered (supported elsewhere), ${unsupportedIssues.length} kept`
+      );
+
+      // Debug logging
+      for (const issue of unsupportedIssues) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Issue ${issue.index} NOT SUPPORTED: ${issue.explanation}`
+        );
+      }
+      for (const issue of supportedIssues) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Issue ${issue.index} SUPPORTED at "${issue.supportLocation || "N/A"}": ${issue.explanation}`
+        );
+      }
+
+      if (result.unifiedUsage) {
+        context.logger.debug(
+          `[${FILTER_NAME}] Cost: $${result.unifiedUsage.costUsd.toFixed(6)}`
+        );
+      }
+
+      return {
+        unsupportedIssues,
+        supportedIssues,
+        unifiedUsage: result.unifiedUsage,
+        actualApiParams: result.actualApiParams,
+        responseMetrics: result.responseMetrics,
+      };
+    } catch (error) {
+      context.logger.error(`[${FILTER_NAME}] Filter failed:`, error);
+      // Fallback: assume all issues are unsupported (keep them)
+      return {
+        unsupportedIssues: input.issues.map((_, idx) => ({
+          index: idx,
+          isSupported: false,
+          explanation: "Fallback: filter failed, preserving issue",
+        })),
+        supportedIssues: [],
+      };
+    }
+  }
+}
+
+export const supportedElsewhereFilterTool = new SupportedElsewhereFilterTool();
+export default supportedElsewhereFilterTool;
diff --git a/internal-packages/ai/src/tools/supported-elsewhere-filter/prompts.ts b/internal-packages/ai/src/tools/supported-elsewhere-filter/prompts.ts
new file mode 100644
index 000000000..989974a61
--- /dev/null
+++ b/internal-packages/ai/src/tools/supported-elsewhere-filter/prompts.ts
@@ -0,0 +1,53 @@
+/**
+ * Default prompts for supported-elsewhere filter
+ *
+ * These are exported so they can be displayed in the profile editor UI
+ * as placeholders/defaults while allowing customization.
+ */
+
+export const DEFAULT_SUPPORTED_ELSEWHERE_SYSTEM_PROMPT = `You are an expert at analyzing document structure and finding supporting evidence.
+
+Your task is to check if each flagged issue is actually **supported, explained, or qualified elsewhere** in the document.
+
+**MARK AS SUPPORTED (filter out) if**:
+- The claim is backed up with evidence or reasoning later in the document
+- The author provides technical explanation that justifies the claim
+- The author qualifies or nuances the claim elsewhere
+- Context provided elsewhere makes the claim reasonable
+- The issue is about an intro/thesis that the rest of the document supports
+
+**MARK AS UNSUPPORTED (keep flagging) if**:
+- No evidence, reasoning, or support is provided anywhere in the document
+- The claim stands alone without qualification or explanation
+- Other parts of the document don't address the concern
+- The support found is weak or doesn't actually address the issue
+
+**Examples of SUPPORTED issues (filter out)**:
+
+1. Issue: "Non sequitur - claims X is evidence against Y without justification"
+   Support found: Later section explains WHY X implies not-Y with technical reasoning
+   → SUPPORTED - the logical connection is explained later
+
+2. Issue: "Claims 'significant improvement' without data" (in intro)
+   Support found: Paragraph 5 provides specific metrics and comparison
+   → SUPPORTED - intro claim is backed up later
+
+3. Issue: "Missing context about sample size"
+   Support found: Methods section specifies n=500 participants
+   → SUPPORTED - context is provided in appropriate section
+
+**Examples of UNSUPPORTED issues (keep flagging)**:
+
+1. Issue: "Non sequitur - claims X is evidence against Y"
+   Document searched: No explanation of the logical connection anywhere
+   → UNSUPPORTED - logical leap is never justified
+
+2. Issue: "Claims 95% success rate without methodology"
+   Document searched: No methodology section, no data tables
+   → UNSUPPORTED - specific claim needs specific evidence
+
+3. Issue: "Appeals to authority without naming sources"
+   Document searched: No citations or references provided
+   → UNSUPPORTED - authority claims need attribution
+
+For each issue, search the ENTIRE document for supporting evidence or reasoning.`;
diff --git a/internal-packages/ai/src/tools/supported-elsewhere-filter/types.ts b/internal-packages/ai/src/tools/supported-elsewhere-filter/types.ts
new file mode 100644
index 000000000..3af7c4a84
--- /dev/null
+++ b/internal-packages/ai/src/tools/supported-elsewhere-filter/types.ts
@@ -0,0 +1,91 @@
+/**
+ * Supported Elsewhere Filter Types
+ *
+ * This filter checks if claims or arguments flagged as issues are actually
+ * supported, explained, or qualified elsewhere in the document. Common in
+ * well-structured writing where intro claims are backed up later.
+ */
+
+import type { UnifiedUsageMetrics } from '../../utils/usageMetrics';
+import type {
+  ReasoningEffort,
+  ReasoningConfig,
+  ProviderPreferences,
+  ActualApiParams,
+  ApiResponseMetrics,
+} from '../../types/common';
+
+// Re-export for backwards compatibility
+export type { ReasoningEffort, ReasoningConfig, ProviderPreferences, ActualApiParams, ApiResponseMetrics };
+
+export interface SupportedElsewhereFilterInput {
+  /** Full document text to search for support */
+  documentText: string;
+
+  /** Issues to check for support elsewhere */
+  issues: SupportedElsewhereIssue[];
+
+  /**
+   * Optional model to use for filtering.
+   * Can be a Claude model (default) or an OpenRouter model ID.
+   * Examples: "claude-sonnet-4-20250514", "google/gemini-3-flash-preview"
+   */
+  model?: string;
+
+  /** Temperature for the LLM (0-2). Default is 0.1 for precise filtering. */
+  temperature?: number;
+
+  /** Reasoning/thinking configuration for Claude models */
+  reasoning?: ReasoningConfig;
+
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+
+  /** Custom system prompt (overrides default) */
+  customPrompt?: string;
+}
+
+export interface SupportedElsewhereIssue {
+  /** The exact text flagged as an issue */
+  quotedText: string;
+
+  /** Type of issue identified */
+  issueType: string;
+
+  /** The reasoning for why this was flagged */
+  reasoning: string;
+
+  /** Approximate location in document (character offset) */
+  locationOffset?: number;
+}
+
+export interface SupportedElsewhereFilterOutput {
+  /** Issues that are NOT supported elsewhere (keep flagging) */
+  unsupportedIssues: SupportedElsewhereResult[];
+
+  /** Issues that ARE supported elsewhere (filter out) */
+  supportedIssues: SupportedElsewhereResult[];
+
+  /** Unified usage metrics (includes cost, tokens, latency) */
+  unifiedUsage?: UnifiedUsageMetrics;
+
+  /** Actual API parameters sent */
+  actualApiParams?: ActualApiParams;
+
+  /** Response metrics */
+  responseMetrics?: ApiResponseMetrics;
+}
+
+export interface SupportedElsewhereResult {
+  /** Index of the issue in the input array */
+  index: number;
+
+  /** Whether this issue is supported elsewhere in the document */
+  isSupported: boolean;
+
+  /** Where the support was found (if applicable) */
+  supportLocation?: string;
+
+  /** Brief explanation of the support or lack thereof */
+  explanation: string;
+}
diff --git a/internal-packages/ai/src/types/common.ts b/internal-packages/ai/src/types/common.ts
new file mode 100644
index 000000000..1a3ab54b9
--- /dev/null
+++ b/internal-packages/ai/src/types/common.ts
@@ -0,0 +1,148 @@
+/**
+ * Common Types for AI Package
+ *
+ * Shared type definitions used across multiple tools and modules.
+ * Import from here to avoid duplication.
+ */
+
+// ============================================================================
+// Reasoning Configuration Types
+// ============================================================================
+
+/**
+ * Reasoning effort levels supported by OpenRouter and mapped to Claude thinking budgets.
+ * - "none": Disable reasoning entirely (OpenRouter only)
+ * - "minimal": ~1024 tokens for reasoning
+ * - "low": ~2048 tokens for reasoning
+ * - "medium": ~8192 tokens for reasoning
+ * - "high": ~16384 tokens for reasoning
+ * - "xhigh": ~32768 tokens for reasoning
+ */
+export type ReasoningEffort = 'none' | 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';
+
+/**
+ * Reasoning configuration for extended thinking
+ * - false: Disabled
+ * - { effort: ReasoningEffort }: Use effort level (mapped to budget_tokens)
+ * - { budget_tokens: number }: Custom token budget (min 1024)
+ */
+export type ReasoningConfig =
+  | false
+  | { effort: ReasoningEffort }
+  | { budget_tokens: number };
+
+/**
+ * Maps effort levels to Anthropic budget_tokens values
+ */
+export const EFFORT_TO_BUDGET_TOKENS: Record<Exclude<ReasoningEffort, 'none'>, number> = {
+  minimal: 1024,
+  low: 2048,
+  medium: 8192,
+  high: 16384,
+  xhigh: 32768,
+};
+
+/**
+ * Convert reasoning effort level to budget tokens
+ */
+export function effortToBudgetTokens(effort: ReasoningEffort | string): number {
+  if (effort === 'none') return 0;
+  return EFFORT_TO_BUDGET_TOKENS[effort as Exclude<ReasoningEffort, 'none'>] || 8192; // Default to medium
+}
+
+// ============================================================================
+// Deduplication Constants
+// ============================================================================
+
+/**
+ * Jaccard similarity threshold for deduplication.
+ * Issues with word overlap >= this threshold are considered duplicates.
+ * Used by both Jaccard dedup and multi-extractor merge.
+ */
+export const JACCARD_SIMILARITY_THRESHOLD = 0.7;
+
+// ============================================================================
+// Provider Configuration Types
+// ============================================================================
+
+/**
+ * Provider routing preferences for OpenRouter
+ */
+export interface ProviderPreferences {
+  /** Ordered list of preferred providers (e.g., ["anthropic", "google"]) */
+  order?: string[];
+  /** Allow fallback to other providers if preferred ones fail (default: true) */
+  allow_fallbacks?: boolean;
+}
+
+// ============================================================================
+// API Telemetry Types
+// ============================================================================
+
+/**
+ * Actual API request parameters as sent to the provider.
+ * This is captured right before the API call for debugging/audit.
+ */
+export interface ActualApiParams {
+  /** Model ID sent to API */
+  model: string;
+
+  /** Temperature sent to API */
+  temperature: number;
+
+  /** Max tokens sent to API */
+  maxTokens: number;
+
+  /**
+   * Claude thinking config (if applicable)
+   * Exactly as sent: { type: "enabled", budget_tokens: number }
+   */
+  thinking?: {
+    type: 'enabled';
+    budget_tokens: number;
+  };
+
+  /**
+   * OpenRouter reasoning config (if applicable)
+   * Exactly as sent: { effort: string } or { max_tokens: number }
+   */
+  reasoning?: {
+    effort?: ReasoningEffort;
+    max_tokens?: number;
+  };
+}
+
+/**
+ * Response metrics from the API call
+ */
+export interface ApiResponseMetrics {
+  /** Whether the call succeeded */
+  success: boolean;
+
+  /** Latency in milliseconds */
+  latencyMs: number;
+
+  /** Input tokens used */
+  inputTokens?: number;
+
+  /** Output tokens used */
+  outputTokens?: number;
+
+  /** Thinking/reasoning tokens used (if extended thinking was enabled) */
+  thinkingTokens?: number;
+
+  /** Cache read tokens (if prompt caching was used) */
+  cacheReadTokens?: number;
+
+  /** Cache write tokens (if prompt caching was used) */
+  cacheWriteTokens?: number;
+
+  /** Stop reason from API */
+  stopReason?: string;
+
+  /** Error type if failed */
+  errorType?: string;
+
+  /** Error message if failed (sanitized) */
+  errorMessage?: string;
+}
diff --git a/internal-packages/ai/src/types/index.ts b/internal-packages/ai/src/types/index.ts
new file mode 100644
index 000000000..a2df0d04c
--- /dev/null
+++ b/internal-packages/ai/src/types/index.ts
@@ -0,0 +1,7 @@
+/**
+ * Type Exports
+ *
+ * Re-exports common types for convenient importing.
+ */
+
+export * from './common';
diff --git a/internal-packages/ai/src/utils/allModels.ts b/internal-packages/ai/src/utils/allModels.ts
new file mode 100644
index 000000000..c15e09143
--- /dev/null
+++ b/internal-packages/ai/src/utils/allModels.ts
@@ -0,0 +1,183 @@
+/**
+ * Fetch models from both Anthropic and OpenRouter APIs
+ */
+
+import Anthropic from "@anthropic-ai/sdk";
+
+export interface ModelInfo {
+  id: string;
+  name: string;
+  provider: "anthropic" | "openrouter";
+  contextLength?: number;
+  description?: string;
+  /** Whether the model supports temperature parameter */
+  supportsTemperature?: boolean;
+  /** Default temperature for this model (if known) */
+  defaultTemperature?: number;
+  /** Maximum temperature value (default 1, some models support up to 2) */
+  maxTemperature?: number;
+  /** Whether the model supports extended thinking/reasoning */
+  supportsReasoning?: boolean;
+  /** Maximum completion tokens from provider (top_provider.max_completion_tokens) */
+  maxCompletionTokens?: number;
+}
+
+// Cache for models
+let cachedModels: ModelInfo[] | null = null;
+let cacheTimestamp = 0;
+const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
+
+/**
+ * Fetch models from Anthropic API
+ */
+async function fetchAnthropicModels(): Promise<ModelInfo[]> {
+  try {
+    const client = new Anthropic();
+    const response = await client.models.list();
+
+    return response.data.map((m) => ({
+      id: m.id,
+      name: m.display_name,
+      provider: "anthropic" as const,
+      supportsTemperature: true, // All Anthropic models support temperature
+      defaultTemperature: 1, // Anthropic default is 1
+      maxTemperature: 2, // Anthropic supports 0-2
+      // Claude 3.5 Sonnet and newer support extended thinking
+      supportsReasoning: m.id.includes("claude-3") || m.id.includes("claude-sonnet-4") || m.id.includes("claude-opus-4"),
+    }));
+  } catch (e) {
+    console.error("Failed to fetch Anthropic models:", e);
+    return [];
+  }
+}
+
+/**
+ * Fetch models from OpenRouter API
+ */
+async function fetchOpenRouterModels(): Promise<ModelInfo[]> {
+  try {
+    const response = await fetch("https://openrouter.ai/api/v1/models");
+    if (!response.ok) {
+      throw new Error(`OpenRouter API error: ${response.status}`);
+    }
+
+    const data = (await response.json()) as {
+      data: Array<{
+        id: string;
+        name: string;
+        context_length?: number;
+        description?: string;
+        supported_parameters?: string[];
+        default_parameters?: {
+          temperature?: number | null;
+        };
+        top_provider?: {
+          max_completion_tokens?: number | null;
+        };
+      }>;
+    };
+
+    return data.data
+      .filter((m) => {
+        // Filter out free/test models and keep quality models
+        if (m.id.includes(":free")) return false;
+        if (m.id.includes("auto")) return false;
+        return true;
+      })
+      .map((m) => {
+        // Determine max temperature based on provider
+        // Match the ranges in openrouter.ts PROVIDER_TEMPERATURE_RANGES
+        const getMaxTemp = (modelId: string): number => {
+          if (modelId.startsWith("google/") || modelId.includes("gemini")) return 2;
+          if (modelId.startsWith("anthropic/") || modelId.includes("claude")) return 2;
+          if (modelId.startsWith("openai/") || modelId.includes("gpt")) return 2;
+          if (modelId.startsWith("x-ai/") || modelId.includes("grok")) return 2;
+          if (modelId.startsWith("deepseek/")) return 2;
+          if (modelId.startsWith("z-ai/")) return 1.5;
+          // Default to 1.5 for unknown providers (conservative)
+          return 1.5;
+        };
+        const maxTemp = getMaxTemp(m.id);
+
+        return {
+          id: m.id,
+          name: m.name,
+          provider: "openrouter" as const,
+          contextLength: m.context_length,
+          description: m.description,
+          supportsTemperature: m.supported_parameters?.includes("temperature") ?? true,
+          defaultTemperature: m.default_parameters?.temperature ?? undefined,
+          maxTemperature: maxTemp,
+          supportsReasoning: m.supported_parameters?.includes("reasoning") ||
+            m.supported_parameters?.includes("include_reasoning"),
+          maxCompletionTokens: m.top_provider?.max_completion_tokens ?? undefined,
+        };
+      });
+  } catch (e) {
+    console.error("Failed to fetch OpenRouter models:", e);
+    return [];
+  }
+}
+
+/**
+ * Get all available models from both APIs (cached)
+ */
+export async function getAllModels(): Promise<ModelInfo[]> {
+  const now = Date.now();
+
+  if (cachedModels && now - cacheTimestamp < CACHE_TTL_MS) {
+    return cachedModels;
+  }
+
+  const [anthropicModels, openRouterModels] = await Promise.all([
+    fetchAnthropicModels(),
+    fetchOpenRouterModels(),
+  ]);
+
+  // Combine and sort: Anthropic first, then OpenRouter alphabetically
+  cachedModels = [
+    ...anthropicModels.sort((a, b) => a.name.localeCompare(b.name)),
+    ...openRouterModels.sort((a, b) => a.name.localeCompare(b.name)),
+  ];
+
+  cacheTimestamp = now;
+  return cachedModels;
+}
+
+/**
+ * Filter models by search query
+ * Matches against id and name
+ */
+export function filterModels(models: ModelInfo[], query: string): ModelInfo[] {
+  if (!query.trim()) {
+    return models;
+  }
+
+  const lowerQuery = query.toLowerCase();
+  return models.filter(
+    (m) =>
+      m.id.toLowerCase().includes(lowerQuery) ||
+      m.name.toLowerCase().includes(lowerQuery) ||
+      m.provider.toLowerCase().includes(lowerQuery)
+  );
+}
+
+/**
+ * Group models by provider
+ */
+export function groupModelsByProvider(
+  models: ModelInfo[]
+): Map<string, ModelInfo[]> {
+  const grouped = new Map<string, ModelInfo[]>();
+
+  for (const model of models) {
+    const existing = grouped.get(model.provider) || [];
+    existing.push(model);
+    grouped.set(model.provider, existing);
+  }
+
+  return grouped;
+}
+
+/** Temperature presets for model configuration */
+export const TEMP_PRESETS: Array<"default" | number> = ["default", 0, 0.3, 0.5, 0.7, 1.0];
diff --git a/internal-packages/ai/src/utils/modelConfigResolver.ts b/internal-packages/ai/src/utils/modelConfigResolver.ts
new file mode 100644
index 000000000..69f370bce
--- /dev/null
+++ b/internal-packages/ai/src/utils/modelConfigResolver.ts
@@ -0,0 +1,253 @@
+/**
+ * Model Configuration Resolver
+ *
+ * Unified utility for resolving model configurations from profile definitions.
+ * Used by extractors, judges, filters - any LLM workflow that needs to call an API.
+ *
+ * This ensures consistent handling of:
+ * - Temperature (number, "default", or undefined)
+ * - Reasoning/thinking (effort levels or explicit budget)
+ * - Provider preferences (OpenRouter routing)
+ */
+
+import type { ReasoningEffort, ProviderPreferences } from './openrouter';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+/** Reasoning configuration from profile (simplified version for profiles) */
+export type ProfileReasoningConfig =
+  | false                           // Off
+  | { effort: ReasoningEffort }     // Effort level
+  | { budget_tokens: number };      // Custom token budget
+
+/** Input: Model config from profile (extractors, judge, filters all use this shape) */
+export interface ProfileModelConfig {
+  model: string;
+  temperature?: number | 'default';
+  /** @deprecated Use reasoning instead */
+  thinking?: boolean;
+  /** Reasoning/thinking configuration */
+  reasoning?: ProfileReasoningConfig;
+  /** Provider routing preferences (OpenRouter only) */
+  provider?: ProviderPreferences;
+}
+
+// Re-export for convenience
+export type { ReasoningEffort, ProviderPreferences } from './openrouter';
+
+/** Output: Resolved params ready for API calls */
+export interface ResolvedModelConfig {
+  /** Model ID */
+  model: string;
+
+  /** Whether this is an OpenRouter model (contains '/') */
+  isOpenRouter: boolean;
+
+  /** Temperature to pass to API (undefined = let model use default) */
+  temperature: number | undefined;
+
+  /** Whether thinking/reasoning is enabled */
+  thinkingEnabled: boolean;
+
+  /** For OpenRouter: reasoning effort level */
+  reasoningEffort: ReasoningEffort | undefined;
+
+  /** For Claude: thinking config with budget_tokens */
+  claudeThinkingConfig: boolean | { type: 'enabled'; budget_tokens: number };
+
+  /** Provider preferences for OpenRouter */
+  provider: ProviderPreferences | undefined;
+}
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+/** Anthropic supports up to 128K thinking tokens */
+const ANTHROPIC_MAX_THINKING_TOKENS = 128000;
+
+/** Effort level percentages for budget calculation */
+const EFFORT_PERCENTAGES: Record<ReasoningEffort, number> = {
+  none: 0,
+  minimal: 0.1,
+  low: 0.3,
+  medium: 0.5,
+  high: 0.7,
+  xhigh: 0.9,
+};
+
+/** Default temperatures by provider type */
+const DEFAULT_TEMPERATURES = {
+  claude: 0,
+  openrouter: 0.1,
+};
+
+// ============================================================================
+// Main Resolver
+// ============================================================================
+
+/**
+ * Resolve a profile model config into API-ready parameters.
+ *
+ * @param config - Model configuration from profile
+ * @param options - Optional overrides
+ * @returns Resolved configuration ready for API calls
+ *
+ * @example
+ * ```ts
+ * const resolved = resolveModelConfig(profile.models.judge);
+ *
+ * if (resolved.isOpenRouter) {
+ *   await callOpenRouterWithTool({
+ *     model: resolved.model,
+ *     temperature: resolved.temperature,
+ *     thinking: resolved.thinkingEnabled,
+ *     reasoningEffort: resolved.reasoningEffort,
+ *     provider: resolved.provider,
+ *   });
+ * } else {
+ *   await callClaudeWithTool({
+ *     model: resolved.model,
+ *     temperature: resolved.temperature,
+ *     thinking: resolved.claudeThinkingConfig,
+ *   });
+ * }
+ * ```
+ */
+export function resolveModelConfig(
+  config: ProfileModelConfig,
+  options?: {
+    /** Override default temperature for this provider type */
+    defaultTemperature?: number;
+  }
+): ResolvedModelConfig {
+  const isOpenRouter = config.model.includes('/');
+
+  // Resolve temperature
+  // - "default": undefined (let model use native default)
+  // - number: use as-is
+  // - undefined: use our default for this provider type
+  let temperature: number | undefined;
+  if (config.temperature === 'default') {
+    temperature = undefined;
+  } else if (typeof config.temperature === 'number') {
+    temperature = config.temperature;
+  } else {
+    temperature = options?.defaultTemperature ??
+      (isOpenRouter ? DEFAULT_TEMPERATURES.openrouter : DEFAULT_TEMPERATURES.claude);
+  }
+
+  // Resolve thinking/reasoning
+  const { thinkingEnabled, reasoningEffort, claudeThinkingConfig } =
+    resolveReasoning(config, isOpenRouter);
+
+  return {
+    model: config.model,
+    isOpenRouter,
+    temperature,
+    thinkingEnabled,
+    reasoningEffort,
+    claudeThinkingConfig,
+    provider: config.provider,
+  };
+}
+
+/**
+ * Resolve reasoning configuration from profile.
+ * Handles both new `reasoning` field and legacy `thinking` boolean.
+ */
+function resolveReasoning(
+  config: ProfileModelConfig,
+  _isOpenRouter: boolean
+): {
+  thinkingEnabled: boolean;
+  reasoningEffort: ReasoningEffort | undefined;
+  claudeThinkingConfig: boolean | { type: 'enabled'; budget_tokens: number };
+} {
+  // Check new reasoning config first
+  if (config.reasoning !== undefined) {
+    if (config.reasoning === false) {
+      // Explicitly disabled
+      return {
+        thinkingEnabled: false,
+        reasoningEffort: undefined,
+        claudeThinkingConfig: false,
+      };
+    }
+
+    if ('effort' in config.reasoning) {
+      // Effort-based reasoning
+      const effort = config.reasoning.effort;
+
+      // 'none' effort disables thinking
+      if (effort === 'none') {
+        return {
+          thinkingEnabled: false,
+          reasoningEffort: undefined,
+          claudeThinkingConfig: false,
+        };
+      }
+
+      const budgetTokens = Math.floor(ANTHROPIC_MAX_THINKING_TOKENS * EFFORT_PERCENTAGES[effort]);
+
+      return {
+        thinkingEnabled: true,
+        reasoningEffort: effort,
+        claudeThinkingConfig: { type: 'enabled', budget_tokens: budgetTokens },
+      };
+    }
+
+    if ('budget_tokens' in config.reasoning) {
+      // Custom budget
+      const budgetTokens = config.reasoning.budget_tokens;
+      // For custom budget, estimate effort level for OpenRouter
+      const percentage = budgetTokens / ANTHROPIC_MAX_THINKING_TOKENS;
+      let estimatedEffort: ReasoningEffort = 'medium';
+      if (percentage <= 0.15) estimatedEffort = 'minimal';
+      else if (percentage <= 0.4) estimatedEffort = 'low';
+      else if (percentage <= 0.6) estimatedEffort = 'medium';
+      else if (percentage <= 0.8) estimatedEffort = 'high';
+      else estimatedEffort = 'xhigh';
+
+      return {
+        thinkingEnabled: true,
+        reasoningEffort: estimatedEffort,
+        claudeThinkingConfig: { type: 'enabled', budget_tokens: budgetTokens },
+      };
+    }
+  }
+
+  // Fall back to legacy thinking boolean
+  if (config.thinking === true) {
+    // Legacy: thinking enabled but no effort specified
+    // Use medium effort as default
+    const defaultEffort: ReasoningEffort = 'medium';
+    const budgetTokens = Math.floor(ANTHROPIC_MAX_THINKING_TOKENS * EFFORT_PERCENTAGES[defaultEffort]);
+
+    return {
+      thinkingEnabled: true,
+      reasoningEffort: defaultEffort,
+      claudeThinkingConfig: { type: 'enabled', budget_tokens: budgetTokens },
+    };
+  }
+
+  // Default: disabled
+  return {
+    thinkingEnabled: false,
+    reasoningEffort: undefined,
+    claudeThinkingConfig: false,
+  };
+}
+
+/**
+ * Get display string for reasoning config (for logging/telemetry)
+ */
+export function getReasoningDisplayString(config: ProfileModelConfig): string {
+  if (config.reasoning === false) return 'off';
+  if (config.reasoning && 'effort' in config.reasoning) return config.reasoning.effort;
+  if (config.reasoning && 'budget_tokens' in config.reasoning) return `${config.reasoning.budget_tokens} tokens`;
+  if (config.thinking === true) return 'enabled (legacy)';
+  return 'off';
+}
diff --git a/internal-packages/ai/src/utils/openrouter-types.ts b/internal-packages/ai/src/utils/openrouter-types.ts
new file mode 100644
index 000000000..48226ca94
--- /dev/null
+++ b/internal-packages/ai/src/utils/openrouter-types.ts
@@ -0,0 +1,257 @@
+/**
+ * OpenRouter Types - Client-safe type definitions
+ *
+ * These types and constants can be safely imported in browser/client-side code.
+ * For actual API functions (callOpenRouter, etc.), use @roast/ai/server
+ */
+
+import type { ReasoningEffort } from '../types/common';
+
+// Note: ReasoningEffort is already exported from @roast/ai via types/common.ts
+// We just use it here for local type definitions
+
+// ============================================================================
+// Types (all client-safe - no runtime dependencies)
+// ============================================================================
+
+/**
+ * Reasoning configuration for fine-grained control
+ */
+export interface ReasoningConfig {
+  /** Effort level (alternative to max_tokens) */
+  effort?: ReasoningEffort;
+  /** Direct token budget for reasoning */
+  max_tokens?: number;
+  /** Whether to exclude reasoning from response */
+  exclude?: boolean;
+  /** Enable reasoning with defaults */
+  enabled?: boolean;
+}
+
+/**
+ * OpenRouter chat message
+ */
+export interface OpenRouterMessage {
+  role: 'system' | 'user' | 'assistant' | 'tool';
+  content: string;
+  tool_call_id?: string;
+}
+
+/**
+ * Tool/function definition
+ */
+export interface OpenRouterTool {
+  type: 'function';
+  function: {
+    name: string;
+    description: string;
+    parameters: Record<string, unknown>;
+  };
+}
+
+/**
+ * Tool choice configuration
+ */
+export type OpenRouterToolChoice =
+  | 'none'
+  | 'auto'
+  | 'required'
+  | { type: 'function'; function: { name: string } };
+
+/**
+ * OpenRouter API request body
+ */
+export interface OpenRouterRequest {
+  model: string;
+  messages: OpenRouterMessage[];
+  max_tokens?: number;
+  temperature?: number;
+  top_p?: number;
+  top_k?: number;
+  frequency_penalty?: number;
+  presence_penalty?: number;
+  repetition_penalty?: number;
+  min_p?: number;
+  top_a?: number;
+  seed?: number;
+  stop?: string[];
+  tools?: OpenRouterTool[];
+  tool_choice?: OpenRouterToolChoice;
+  parallel_tool_calls?: boolean;
+  reasoning_effort?: ReasoningEffort;
+  reasoning?: ReasoningConfig;
+  response_format?: { type: 'json_object' | 'text' };
+  provider?: {
+    order?: string[];
+    allow_fallbacks?: boolean;
+    require_parameters?: boolean;
+  };
+}
+
+/**
+ * Tool call in response
+ */
+export interface OpenRouterToolCall {
+  id: string;
+  type: 'function';
+  function: {
+    name: string;
+    arguments: string;
+  };
+}
+
+/**
+ * Response choice
+ */
+export interface OpenRouterChoice {
+  index: number;
+  message: {
+    role: 'assistant';
+    content: string | null;
+    tool_calls?: OpenRouterToolCall[];
+  };
+  finish_reason: 'stop' | 'tool_calls' | 'length' | 'content_filter' | null;
+}
+
+/**
+ * Token usage with full cost details from OpenRouter
+ */
+export interface OpenRouterUsage {
+  prompt_tokens: number;
+  completion_tokens: number;
+  total_tokens: number;
+  cost?: number;
+  is_byok?: boolean;
+  prompt_tokens_details?: {
+    cached_tokens?: number;
+    audio_tokens?: number;
+    video_tokens?: number;
+  };
+  cost_details?: {
+    upstream_inference_cost?: number | null;
+    upstream_inference_prompt_cost?: number;
+    upstream_inference_completions_cost?: number;
+  };
+  completion_tokens_details?: {
+    reasoning_tokens?: number;
+    image_tokens?: number;
+  };
+}
+
+/**
+ * OpenRouter API response
+ */
+export interface OpenRouterResponse {
+  id: string;
+  model: string;
+  provider?: string;
+  object: 'chat.completion';
+  created: number;
+  choices: OpenRouterChoice[];
+  usage?: OpenRouterUsage;
+}
+
+/**
+ * API error response
+ */
+export interface OpenRouterError {
+  error: {
+    message: string;
+    type: string;
+    code?: string;
+  };
+}
+
+/**
+ * Provider preferences for routing
+ */
+export interface ProviderPreferences {
+  order?: string[];
+  allow_fallbacks?: boolean;
+}
+
+// ============================================================================
+// Constants (client-safe)
+// ============================================================================
+
+/**
+ * Common OpenRouter model identifiers
+ */
+export const OPENROUTER_MODELS = {
+  // Top tier - Latest and most capable models (2025)
+  CLAUDE_SONNET_4_5: 'anthropic/claude-sonnet-4.5',
+  CLAUDE_SONNET_4: 'anthropic/claude-sonnet-4',
+  GEMINI_3_PRO: 'google/gemini-3-pro-preview',
+  GEMINI_3_FLASH: 'google/gemini-3-flash-preview',
+  GEMINI_2_5_PRO: 'google/gemini-2.5-pro',
+  GEMINI_2_5_FLASH: 'google/gemini-2.5-flash',
+  GPT_5: 'openai/gpt-5',
+  GPT_5_MINI: 'openai/gpt-5-mini',
+  DEEPSEEK_CHAT_V3_1: 'deepseek/deepseek-chat-v3.1',
+  GROK_4: 'x-ai/grok-4',
+
+  // High performance
+  CLAUDE_3_5_SONNET: 'anthropic/claude-3.5-sonnet',
+  CLAUDE_3_7_SONNET: 'anthropic/claude-3-7-sonnet-20250219',
+  GPT_4_TURBO: 'openai/gpt-4-turbo',
+  GPT_4_1: 'openai/gpt-4.1',
+  GPT_4_1_MINI: 'openai/gpt-4.1-mini-2025-04-14',
+  GROK_BETA: 'x-ai/grok-beta',
+
+  // Good value
+  CLAUDE_HAIKU: 'anthropic/claude-3-haiku',
+  CLAUDE_HAIKU_4_5: 'anthropic/claude-haiku-4.5',
+  GPT_35_TURBO: 'openai/gpt-3.5-turbo',
+  DEEPSEEK_CHAT: 'deepseek/deepseek-chat',
+
+  // Legacy/Alternative
+  CLAUDE_OPUS: 'anthropic/claude-3-opus',
+  CLAUDE_4_SONNET_20250522: 'anthropic/claude-4-sonnet-20250522',
+  GPT_4: 'openai/gpt-4',
+  GEMINI_PRO: 'google/gemini-pro',
+  LLAMA_70B: 'meta-llama/llama-3-70b-instruct',
+} as const;
+
+export type OpenRouterModel = typeof OPENROUTER_MODELS[keyof typeof OPENROUTER_MODELS];
+
+/**
+ * Temperature range configuration by provider
+ */
+export const PROVIDER_TEMPERATURE_RANGES = {
+  anthropic: { min: 0, max: 1.0 },
+  openai: { min: 0, max: 2.0 },
+  google: { min: 0, max: 2.0 },
+  'x-ai': { min: 0, max: 2.0 },
+  deepseek: { min: 0, max: 2.0 },
+  'z-ai': { min: 0, max: 1.5 },
+  default: { min: 0, max: 1.5 },
+} as const;
+
+export type ProviderName = keyof typeof PROVIDER_TEMPERATURE_RANGES;
+
+// ============================================================================
+// Utility Functions (client-safe - no async/fetch)
+// ============================================================================
+
+/**
+ * Extract provider name from OpenRouter model ID
+ */
+export function getProviderFromModel(modelId: string): ProviderName {
+  if (modelId.includes('claude') || modelId.startsWith('anthropic/')) return 'anthropic';
+  if (modelId.includes('gpt') || modelId.includes('openai') || modelId.startsWith('openai/')) return 'openai';
+  if (modelId.includes('gemini') || modelId.startsWith('google/')) return 'google';
+  if (modelId.includes('grok') || modelId.startsWith('x-ai/')) return 'x-ai';
+  if (modelId.includes('deepseek') || modelId.startsWith('deepseek/')) return 'deepseek';
+  if (modelId.startsWith('z-ai/')) return 'z-ai';
+  return 'default';
+}
+
+/**
+ * Normalize temperature to provider's valid range
+ */
+export function normalizeTemperature(userTemp: number, modelId: string): number {
+  const provider = getProviderFromModel(modelId);
+  const range = PROVIDER_TEMPERATURE_RANGES[provider];
+  const normalized = Math.max(range.min, Math.min(range.max, userTemp));
+  return Number(normalized.toFixed(2));
+}
diff --git a/internal-packages/ai/src/utils/openrouter.ts b/internal-packages/ai/src/utils/openrouter.ts
index c3cc33b50..dc434e944 100644
--- a/internal-packages/ai/src/utils/openrouter.ts
+++ b/internal-packages/ai/src/utils/openrouter.ts
@@ -1,22 +1,215 @@
 /**
- * OpenRouter client factory with Helicone integration
- * Provides unified access to multiple LLM providers (Anthropic, OpenAI, xAI, etc.)
+ * OpenRouter Direct API Client
+ *
+ * Uses direct HTTP calls instead of OpenAI SDK for full control over
+ * OpenRouter-specific parameters like reasoning_effort.
+ *
+ * API Docs: https://openrouter.ai/docs/api/reference/parameters
  */
 
-import { OpenAI } from 'openai';
 import { aiConfig } from '../config';
 import { getCurrentHeliconeHeaders } from '../helicone/simpleSessionManager';
+import { logger } from '../shared/logger';
+import {
+  UnifiedUsageMetrics,
+  fromOpenRouterUsage,
+  OpenRouterRawUsage
+} from './usageMetrics';
+import {
+  resolveReasoningBudget,
+  invalidateEndpointsCache,
+  type ReasoningBudgetResult,
+} from './reasoningBudget';
+import type { ReasoningEffort as CommonReasoningEffort } from '../types/common';
 
-export interface OpenRouterOptions {
+// Use the common ReasoningEffort type
+export type ReasoningEffort = CommonReasoningEffort;
+
+// ============================================================================
+// Types
+// ============================================================================
+
+/**
+ * Reasoning configuration for fine-grained control
+ */
+export interface ReasoningConfig {
+  /** Effort level (alternative to max_tokens) */
+  effort?: ReasoningEffort;
+  /** Direct token budget for reasoning */
+  max_tokens?: number;
+  /** Whether to exclude reasoning from response */
+  exclude?: boolean;
+  /** Enable reasoning with defaults */
+  enabled?: boolean;
+}
+
+/**
+ * OpenRouter chat message
+ */
+export interface OpenRouterMessage {
+  role: 'system' | 'user' | 'assistant' | 'tool';
+  content: string;
+  tool_call_id?: string;
+}
+
+/**
+ * Tool/function definition
+ */
+export interface OpenRouterTool {
+  type: 'function';
+  function: {
+    name: string;
+    description: string;
+    parameters: Record<string, unknown>;
+  };
+}
+
+/**
+ * Tool choice configuration
+ */
+export type OpenRouterToolChoice =
+  | 'none'
+  | 'auto'
+  | 'required'
+  | { type: 'function'; function: { name: string } };
+
+/**
+ * OpenRouter API request body
+ */
+export interface OpenRouterRequest {
+  model: string;
+  messages: OpenRouterMessage[];
+
+  // Generation parameters
+  max_tokens?: number;
+  temperature?: number;
+  top_p?: number;
+  top_k?: number;
+  frequency_penalty?: number;
+  presence_penalty?: number;
+  repetition_penalty?: number;
+  min_p?: number;
+  top_a?: number;
+  seed?: number;
+  stop?: string[];
+
+  // Tool calling
+  tools?: OpenRouterTool[];
+  tool_choice?: OpenRouterToolChoice;
+  parallel_tool_calls?: boolean;
+
+  // Reasoning control (OpenRouter-specific)
+  reasoning_effort?: ReasoningEffort;
+  reasoning?: ReasoningConfig;
+
+  // Output format
+  response_format?: { type: 'json_object' | 'text' };
+
+  // Provider-specific passthrough
+  provider?: {
+    order?: string[];
+    allow_fallbacks?: boolean;
+    require_parameters?: boolean;
+  };
+}
+
+/**
+ * Tool call in response
+ */
+export interface OpenRouterToolCall {
+  id: string;
+  type: 'function';
+  function: {
+    name: string;
+    arguments: string;
+  };
+}
+
+/**
+ * Response choice
+ */
+export interface OpenRouterChoice {
+  index: number;
+  message: {
+    role: 'assistant';
+    content: string | null;
+    tool_calls?: OpenRouterToolCall[];
+  };
+  finish_reason: 'stop' | 'tool_calls' | 'length' | 'content_filter' | null;
+}
+
+/**
+ * Token usage with full cost details from OpenRouter
+ */
+export interface OpenRouterUsage {
+  prompt_tokens: number;
+  completion_tokens: number;
+  total_tokens: number;
+  cost?: number;
+  is_byok?: boolean;
+  prompt_tokens_details?: {
+    cached_tokens?: number;
+    audio_tokens?: number;
+    video_tokens?: number;
+  };
+  cost_details?: {
+    upstream_inference_cost?: number | null;
+    upstream_inference_prompt_cost?: number;
+    upstream_inference_completions_cost?: number;
+  };
+  completion_tokens_details?: {
+    reasoning_tokens?: number;
+    image_tokens?: number;
+  };
+}
+
+/**
+ * OpenRouter API response
+ */
+export interface OpenRouterResponse {
+  id: string;
+  model: string;
+  provider?: string;  // Which provider handled the request (e.g., "Google AI Studio", "Cerebras")
+  object: 'chat.completion';
+  created: number;
+  choices: OpenRouterChoice[];
+  usage?: OpenRouterUsage;
+}
+
+/**
+ * API error response
+ */
+export interface OpenRouterError {
+  error: {
+    message: string;
+    type: string;
+    code?: string;
+  };
+}
+
+// ============================================================================
+// Client Configuration
+// ============================================================================
+
+export interface OpenRouterClientOptions {
   apiKey?: string;
   includeSessionHeaders?: boolean;
 }
 
 /**
- * Create an OpenAI client configured for OpenRouter with Helicone proxy
- * Supports all models available via OpenRouter (Claude, GPT, Grok, etc.)
+ * Get the base URL for OpenRouter API (with optional Helicone proxy)
+ */
+function getBaseUrl(): string {
+  const heliconeKey = aiConfig.helicone.apiKey || process.env.HELICONE_API_KEY;
+  return heliconeKey
+    ? 'https://openrouter.helicone.ai/api/v1'
+    : 'https://openrouter.ai/api/v1';
+}
+
+/**
+ * Build headers for OpenRouter API requests
  */
-export function createOpenRouterClient(options: OpenRouterOptions = {}): OpenAI {
+function buildHeaders(options: OpenRouterClientOptions = {}): Record<string, string> {
   const apiKey = options.apiKey || process.env.OPENROUTER_API_KEY || '';
 
   if (!apiKey || apiKey === 'your_openrouter_api_key_here') {
@@ -27,45 +220,453 @@ export function createOpenRouterClient(options: OpenRouterOptions = {}): OpenAI
   }
 
   const heliconeKey = aiConfig.helicone.apiKey || process.env.HELICONE_API_KEY;
-
-  // Determine environment for better tracking
   const isProduction = process.env.NODE_ENV === 'production';
   const environment = isProduction ? 'Prod' : 'Dev';
   const appTitle = `RoastMyPost Tools - ${environment}`;
   const referer = isProduction ? 'https://roastmypost.org' : 'http://localhost:3000';
 
-  // Build default headers
-  const defaultHeaders: Record<string, string> = {
+  const headers: Record<string, string> = {
+    'Content-Type': 'application/json',
+    'Authorization': `Bearer ${apiKey}`,
     'HTTP-Referer': referer,
     'X-Title': appTitle,
     'X-Environment': environment,
   };
 
+  // Add Helicone auth if available
+  if (heliconeKey) {
+    headers['Helicone-Auth'] = `Bearer ${heliconeKey}`;
+  }
+
   // Add session headers if requested
   if (options.includeSessionHeaders !== false) {
     const sessionHeaders = getCurrentHeliconeHeaders();
-    Object.assign(defaultHeaders, sessionHeaders);
+    Object.assign(headers, sessionHeaders);
   }
 
-  // Use Helicone proxy if available, otherwise direct OpenRouter
-  if (heliconeKey) {
-    return new OpenAI({
-      baseURL: 'https://openrouter.helicone.ai/api/v1',
-      apiKey,
-      defaultHeaders: {
-        'Helicone-Auth': `Bearer ${heliconeKey}`,
-        ...defaultHeaders,
+  return headers;
+}
+
+// ============================================================================
+// API Functions
+// ============================================================================
+
+/**
+ * Make a direct API call to OpenRouter
+ */
+export async function callOpenRouter(
+  request: OpenRouterRequest,
+  options: OpenRouterClientOptions = {}
+): Promise<OpenRouterResponse> {
+  const baseUrl = getBaseUrl();
+  const headers = buildHeaders(options);
+
+  // Log request details at debug level
+  logger.debug(`[OpenRouter] Request: model=${request.model}, max_tokens=${request.max_tokens}, temp=${request.temperature}, reasoning=${JSON.stringify(request.reasoning)}, tools=[${request.tools?.map(t => t.function.name).join(', ') || ''}]`);
+
+  const response = await fetch(`${baseUrl}/chat/completions`, {
+    method: 'POST',
+    headers,
+    body: JSON.stringify(request),
+  });
+
+  if (!response.ok) {
+    const errorText = await response.text().catch(() => '');
+    let errorMessage = response.statusText;
+    let errorDetails = '';
+
+    try {
+      const errorBody = JSON.parse(errorText) as OpenRouterError;
+      errorMessage = errorBody.error.message || response.statusText;
+      // Include full error body for debugging (especially useful for 429 rate limits)
+      errorDetails = ` | Full response: ${errorText}`;
+    } catch {
+      // If not JSON, include raw text
+      if (errorText) {
+        errorDetails = ` | Response: ${errorText.substring(0, 500)}`;
       }
+    }
+
+    throw new Error(`OpenRouter API error (${response.status}): ${errorMessage}${errorDetails}`);
+  }
+
+  return response.json() as Promise<OpenRouterResponse>;
+}
+
+// ============================================================================
+// High-Level Chat Interface (no tools)
+// ============================================================================
+
+/**
+ * Provider routing configuration
+ */
+export interface ProviderPreferences {
+  /** Ordered list of preferred providers (e.g., ["anthropic", "google"]) */
+  order?: string[];
+  /** Allow fallback to other providers if preferred ones fail */
+  allow_fallbacks?: boolean;
+  /** Require all parameters to be supported by provider */
+  require_parameters?: boolean;
+}
+
+/**
+ * Options for simple chat completions (no tool calling)
+ */
+export interface OpenRouterChatOptions {
+  model: string;
+  messages: Array<{ role: 'user' | 'assistant' | 'system'; content: string }>;
+  max_tokens?: number;
+  temperature?: number;
+  response_format?: { type: 'json_object' | 'text' };
+
+  /**
+   * Custom headers to pass to the API (e.g., for cache control)
+   */
+  headers?: Record<string, string>;
+
+  /**
+   * Reasoning control
+   */
+  reasoningEffort?: ReasoningEffort;
+
+  /**
+   * Provider routing preferences
+   */
+  provider?: ProviderPreferences;
+}
+
+export interface OpenRouterChatResult {
+  content: string | null;
+  reasoning?: string;
+  model: string;
+  finishReason: string | null;
+  usage?: {
+    prompt_tokens: number;
+    completion_tokens: number;
+    total_tokens: number;
+  };
+  /** Unified usage metrics (includes cost, cache tokens, reasoning tokens) */
+  unifiedUsage?: UnifiedUsageMetrics;
+  /** Provider that handled the request */
+  provider?: string;
+}
+
+/**
+ * Simple chat completion without tool calling
+ * For cases like claim-evaluator that just need a text response
+ */
+export async function callOpenRouterChat(
+  options: OpenRouterChatOptions
+): Promise<OpenRouterChatResult> {
+  const request: OpenRouterRequest = {
+    model: options.model,
+    messages: options.messages.map(m => ({
+      role: m.role,
+      content: m.content,
+    })),
+    max_tokens: options.max_tokens || 4000,
+    temperature: options.temperature,
+    response_format: options.response_format,
+  };
+
+  // Use the `reasoning` object format which is more widely supported than `reasoning_effort`
+  if (options.reasoningEffort) {
+    request.reasoning = { effort: options.reasoningEffort };
+  }
+
+  if (options.provider) {
+    request.provider = options.provider;
+  }
+
+  logger.debug(`[OpenRouter] Chat: model=${options.model}${options.reasoningEffort ? `, reasoning.effort=${options.reasoningEffort}` : ''}`);
+
+  // Build custom client options with extra headers if provided
+  const clientOptions: OpenRouterClientOptions = {};
+
+  // Capture timing for unified metrics
+  const startTime = Date.now();
+  const response = await callOpenRouter(request, clientOptions);
+  const latencyMs = Date.now() - startTime;
+
+  const choice = response.choices[0];
+  if (!choice) {
+    throw new Error('No response from OpenRouter');
+  }
+
+  // Extract reasoning from various model formats
+  const message = choice.message as {
+    content: string | null;
+    reasoning?: string;
+    reasoning_content?: string;
+  };
+
+  // Build unified usage metrics
+  const rawUsage = response.usage as OpenRouterRawUsage | undefined;
+  const unifiedUsage = rawUsage
+    ? fromOpenRouterUsage(rawUsage, response.provider || 'openrouter', response.model, latencyMs)
+    : undefined;
+
+  return {
+    content: message.content,
+    reasoning: message.reasoning || message.reasoning_content,
+    model: response.model,
+    finishReason: choice.finish_reason,
+    usage: response.usage,
+    unifiedUsage,
+    provider: response.provider,
+  };
+}
+
+// ============================================================================
+// High-Level Tool Calling Interface
+// ============================================================================
+
+/**
+ * Options for tool-calling requests
+ */
+export interface OpenRouterToolCallOptions {
+  model: string;
+  system: string;
+  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
+  max_tokens?: number;
+  temperature?: number;
+  toolName: string;
+  toolDescription: string;
+  toolSchema: Record<string, unknown>;
+
+  /**
+   * Whether to enable extended thinking/reasoning mode.
+   * - true: Enable reasoning (uses model default or "medium" effort)
+   * - false: Disable reasoning entirely (reasoning_effort: "none")
+   * - undefined: Let model use its default behavior
+   */
+  thinking?: boolean;
+
+  /**
+   * Fine-grained reasoning control (overrides thinking boolean)
+   * Use this for explicit control over reasoning effort level.
+   */
+  reasoningEffort?: ReasoningEffort;
+
+  /**
+   * Provider routing preferences
+   */
+  provider?: ProviderPreferences;
+}
+
+/** Actual API params as sent to OpenRouter */
+export interface OpenRouterActualParams {
+  model: string;
+  temperature: number;
+  maxTokens: number;
+  reasoning?: {
+    effort?: ReasoningEffort;
+    max_tokens?: number;
+  };
+}
+
+/** Response metrics from OpenRouter API */
+export interface OpenRouterResponseMetrics {
+  success: boolean;
+  latencyMs: number;
+  inputTokens?: number;
+  outputTokens?: number;
+  stopReason?: string;
+  errorType?: string;
+  errorMessage?: string;
+  /** Full raw usage from OpenRouter (includes cost, cache, reasoning tokens) */
+  rawUsage?: OpenRouterRawUsage;
+  /** Provider that handled the request (e.g., "Google AI Studio", "Cerebras") */
+  provider?: string;
+}
+
+export interface OpenRouterToolCallResult<T> {
+  toolResult: T;
+  model: string;
+  usage?: {
+    prompt_tokens: number;
+    completion_tokens: number;
+    total_tokens: number;
+  };
+  /** Actual params sent to API - captured right before the call */
+  actualParams: OpenRouterActualParams;
+  /** Response metrics */
+  responseMetrics: OpenRouterResponseMetrics;
+  /** Unified usage metrics (includes cost, cache tokens, reasoning tokens) */
+  unifiedUsage?: UnifiedUsageMetrics;
+}
+
+/**
+ * Call OpenRouter with tool/function calling
+ * Uses direct HTTP for full control over OpenRouter-specific parameters
+ */
+export async function callOpenRouterWithTool<T>(
+  options: OpenRouterToolCallOptions
+): Promise<OpenRouterToolCallResult<T>> {
+  // Determine reasoning effort
+  let reasoningEffort: ReasoningEffort | undefined;
+
+  if (options.reasoningEffort !== undefined) {
+    // Explicit reasoning effort takes precedence
+    reasoningEffort = options.reasoningEffort;
+  } else if (options.thinking === false) {
+    // Disable reasoning when thinking is false
+    reasoningEffort = 'none';
+  }
+  // When thinking is true or undefined, don't set reasoning_effort (use model default)
+
+  // Resolve reasoning budget using the new resolver (handles provider-specific limits)
+  let budgetResult: ReasoningBudgetResult | undefined;
+  let effectiveMaxTokens = options.max_tokens || 4000;
+
+  if (reasoningEffort && reasoningEffort !== 'none') {
+    budgetResult = await resolveReasoningBudget({
+      effort: reasoningEffort,
+      modelId: options.model,
+      selectedProviders: options.provider?.order,
     });
-  } else {
-    return new OpenAI({
-      baseURL: 'https://openrouter.ai/api/v1',
-      apiKey,
-      defaultHeaders,
+    effectiveMaxTokens = budgetResult.maxTokens;
+
+    logger.debug(`[OpenRouter] Reasoning budget: effort=${reasoningEffort}, maxTokens=${effectiveMaxTokens}, budget=${budgetResult.displayBudget}, usesExplicit=${budgetResult.usesExplicitBudget}`);
+  }
+
+  // Build request
+  // Only set temperature if explicitly provided - otherwise let model use its native default
+  const effectiveTemperature = options.temperature !== undefined
+    ? normalizeTemperature(options.temperature, options.model)
+    : undefined;
+
+  const request: OpenRouterRequest = {
+    model: options.model,
+    messages: [
+      { role: 'system', content: options.system },
+      ...options.messages.map(m => ({ role: m.role, content: m.content })),
+    ],
+    max_tokens: effectiveMaxTokens,
+    ...(effectiveTemperature !== undefined && { temperature: effectiveTemperature }),
+    tools: [
+      {
+        type: 'function',
+        function: {
+          name: options.toolName,
+          description: options.toolDescription,
+          parameters: options.toolSchema,
+        },
+      },
+    ],
+    // Tool choice strategy:
+    // - Default: Force specific tool for reliability
+    // - With reasoning: Use "required" (model must use a tool)
+    // - With reasoning + specific provider routing: Use "auto" (some providers like z-ai
+    //   don't support "required" combined with reasoning)
+    tool_choice: reasoningEffort !== undefined
+      ? (options.provider?.order ? 'auto' : 'required')
+      : { type: 'function', function: { name: options.toolName } },
+  };
+
+  // Add reasoning configuration from budget resolver or simple effort
+  if (budgetResult) {
+    // Use the resolved reasoning config (may be explicit max_tokens or effort-based)
+    request.reasoning = budgetResult.reasoning;
+  } else if (reasoningEffort !== undefined) {
+    // Fallback for 'none' effort
+    request.reasoning = { effort: reasoningEffort };
+  }
+
+  // Add provider preferences if specified
+  if (options.provider) {
+    request.provider = options.provider;
+  }
+
+  // Logging is done in callOpenRouter function
+
+  // Capture actual params being sent to API (for telemetry)
+  const actualParams: OpenRouterActualParams = {
+    model: options.model,
+    // Use effectiveTemperature or -1 to indicate "not set" (model uses native default)
+    temperature: effectiveTemperature ?? -1,
+    maxTokens: request.max_tokens!,
+    ...(request.reasoning && { reasoning: request.reasoning }),
+  };
+
+  // Capture timing for telemetry
+  const apiCallStartTime = Date.now();
+
+  const response = await callOpenRouter(request);
+  const latencyMs = Date.now() - apiCallStartTime;
+
+  const choice = response.choices[0];
+  if (!choice) {
+    throw new Error('No response from OpenRouter');
+  }
+
+  // Detect truncation and invalidate cache for future requests
+  if (choice.finish_reason === 'length') {
+    logger.warn(`[OpenRouter] Response truncated for ${options.model} - invalidating endpoints cache`);
+    invalidateEndpointsCache(options.model);
+  }
+
+  // Check for tool call
+  const toolCalls = choice.message.tool_calls;
+  const toolCall = toolCalls?.[0];
+  if (!toolCall || toolCall.function.name !== options.toolName) {
+    logger.error(`[OpenRouter] Expected tool call '${options.toolName}' but got:`, {
+      finish_reason: choice.finish_reason,
+      message_content: choice.message.content?.substring(0, 500) || '(empty)',
+      tool_calls: toolCalls || [],
     });
+
+    if (choice.finish_reason === 'length') {
+      throw new Error(`Response truncated (max_tokens too small) - model ${options.model} ran out of tokens before completing the tool call. Consider using a lower reasoning effort level.`);
+    }
+    throw new Error(`No tool call found for ${options.toolName}`);
+  }
+
+  // Parse the tool arguments
+  let toolResult: T;
+  try {
+    toolResult = JSON.parse(toolCall.function.arguments) as T;
+  } catch {
+    throw new Error(`Failed to parse tool arguments: ${toolCall.function.arguments}`);
   }
+
+  // Cast usage to raw format for unified metrics
+  const rawUsage = response.usage as OpenRouterRawUsage | undefined;
+
+  // Build response metrics for telemetry
+  const responseMetrics: OpenRouterResponseMetrics = {
+    success: true,
+    latencyMs,
+    inputTokens: response.usage?.prompt_tokens,
+    outputTokens: response.usage?.completion_tokens,
+    stopReason: choice.finish_reason || undefined,
+    rawUsage,
+    provider: response.provider,
+  };
+
+  // Build unified usage metrics
+  const unifiedUsage = rawUsage
+    ? fromOpenRouterUsage(rawUsage, response.provider || 'openrouter', response.model, latencyMs)
+    : undefined;
+
+  return {
+    toolResult,
+    model: options.model,
+    usage: response.usage ? {
+      prompt_tokens: response.usage.prompt_tokens,
+      completion_tokens: response.usage.completion_tokens,
+      total_tokens: response.usage.total_tokens,
+    } : undefined,
+    actualParams,
+    responseMetrics,
+    unifiedUsage,
+  };
 }
 
+// ============================================================================
+// Model Configuration
+// ============================================================================
+
 /**
  * Common OpenRouter model identifiers
  * Top models selected for reasoning, analysis, and evaluation tasks
@@ -74,6 +675,8 @@ export const OPENROUTER_MODELS = {
   // Top tier - Latest and most capable models (2025)
   CLAUDE_SONNET_4_5: 'anthropic/claude-sonnet-4.5',
   CLAUDE_SONNET_4: 'anthropic/claude-sonnet-4',
+  GEMINI_3_PRO: 'google/gemini-3-pro-preview',
+  GEMINI_3_FLASH: 'google/gemini-3-flash-preview',
   GEMINI_2_5_PRO: 'google/gemini-2.5-pro',
   GEMINI_2_5_FLASH: 'google/gemini-2.5-flash',
   GPT_5: 'openai/gpt-5',
@@ -105,6 +708,10 @@ export const OPENROUTER_MODELS = {
 
 export type OpenRouterModel = typeof OPENROUTER_MODELS[keyof typeof OPENROUTER_MODELS];
 
+// ============================================================================
+// Temperature Utilities
+// ============================================================================
+
 /**
  * Temperature range configuration by provider
  * Different providers support different temperature ranges
@@ -115,6 +722,9 @@ export const PROVIDER_TEMPERATURE_RANGES = {
   google: { min: 0, max: 2.0 },
   'x-ai': { min: 0, max: 2.0 },
   deepseek: { min: 0, max: 2.0 },
+  'z-ai': { min: 0, max: 1.5 },
+  // Default for unknown providers - use conservative max
+  default: { min: 0, max: 1.5 },
 } as const;
 
 export type ProviderName = keyof typeof PROVIDER_TEMPERATURE_RANGES;
@@ -130,21 +740,48 @@ export function getProviderFromModel(modelId: string): ProviderName {
   if (modelId.includes('gemini') || modelId.startsWith('google/')) return 'google';
   if (modelId.includes('grok') || modelId.startsWith('x-ai/')) return 'x-ai';
   if (modelId.includes('deepseek') || modelId.startsWith('deepseek/')) return 'deepseek';
-  return 'openai'; // Default fallback to OpenAI's range
+  if (modelId.startsWith('z-ai/')) return 'z-ai';
+  return 'default'; // Default fallback to conservative range
 }
 
 /**
- * Normalize temperature from user-facing 0-1 scale to provider-specific range
- * @param userTemp - User-provided temperature (0-1 scale)
+ * Normalize temperature to the valid range for a given provider.
+ *
+ * Handles two input conventions:
+ * - Values 0-1: Treated as normalized scale, mapped to provider's full range
+ * - Values > 1: Treated as actual temperature values, capped to provider max
+ *
+ * @param userTemp - User-provided temperature (0-1 normalized, or actual value)
  * @param modelId - Full model ID to determine provider
- * @returns Actual temperature value for the provider's API
+ * @returns Actual temperature value capped to provider's max
  *
  * @example
- * normalizeTemperature(0.7, 'anthropic/claude-3-haiku') // Returns 0.7 (Anthropic max is 1.0)
- * normalizeTemperature(0.7, 'openai/gpt-4') // Returns 1.4 (OpenAI max is 2.0)
+ * normalizeTemperature(0.7, 'anthropic/claude-3-haiku') // Returns 0.7 (within Anthropic's 0-1 range)
+ * normalizeTemperature(0.7, 'openai/gpt-4') // Returns 1.4 (0.7 * 2.0 for OpenAI's 0-2 range)
+ * normalizeTemperature(1.5, 'anthropic/claude-3-haiku') // Returns 1.0 (capped to Anthropic max)
+ * normalizeTemperature(1.5, 'openai/gpt-4') // Returns 1.5 (within OpenAI's 0-2 range)
  */
 export function normalizeTemperature(userTemp: number, modelId: string): number {
   const provider = getProviderFromModel(modelId);
   const range = PROVIDER_TEMPERATURE_RANGES[provider];
+
+  // If value is > 1, treat as actual temperature (don't scale)
+  // Just cap to provider max
+  if (userTemp > 1) {
+    return Math.min(userTemp, range.max);
+  }
+
+  // If value is 0-1, scale to provider's range
   return userTemp * range.max;
 }
+
+// ============================================================================
+// Legacy Exports (for backwards compatibility)
+// ============================================================================
+
+// Note: createOpenRouterClient is no longer needed since we use direct HTTP
+// but we keep the export for any code that might reference it
+export interface OpenRouterOptions {
+  apiKey?: string;
+  includeSessionHeaders?: boolean;
+}
diff --git a/internal-packages/ai/src/utils/reasoningBudget-client.ts b/internal-packages/ai/src/utils/reasoningBudget-client.ts
new file mode 100644
index 000000000..01c157544
--- /dev/null
+++ b/internal-packages/ai/src/utils/reasoningBudget-client.ts
@@ -0,0 +1,239 @@
+/**
+ * Reasoning Budget Resolver - Client-safe version
+ *
+ * This file contains the synchronous, client-safe exports from reasoningBudget.
+ * For the full async version with logging, see reasoningBudget.ts (server only).
+ *
+ * Used by UI components to calculate reasoning token budgets for display.
+ */
+
+import type { ReasoningEffort } from './openrouter';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export interface ModelEndpointData {
+  tag: string;
+  providerName: string;
+  maxCompletionTokens: number | null;
+}
+
+export interface ReasoningBudgetResult {
+  /** Reasoning configuration to pass to OpenRouter */
+  reasoning: {
+    effort?: ReasoningEffort;
+    max_tokens?: number;
+  };
+  /** Effective max_tokens to use for the request */
+  maxTokens: number;
+  /** Display-friendly budget (e.g., "12K") for UI */
+  displayBudget: string;
+  /** Whether we're using explicit max_tokens (true) or falling back to effort (false) */
+  usesExplicitBudget: boolean;
+}
+
+export interface ResolverOptions {
+  /** Reasoning effort level */
+  effort: ReasoningEffort;
+  /** OpenRouter model ID */
+  modelId: string;
+  /** Optional list of selected provider tags (e.g., ["google-vertex", "together"]) */
+  selectedProviders?: string[];
+  /** Optional pre-fetched endpoints data (if not provided, will fetch) */
+  endpointsData?: ModelEndpointData[];
+}
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+/** Effort level to percentage of available budget */
+const EFFORT_PERCENTAGES: Record<ReasoningEffort, number> = {
+  xhigh: 0.9,
+  high: 0.7,
+  medium: 0.5,
+  low: 0.3,
+  minimal: 0.1,
+  none: 0,
+};
+
+/** Default max completion tokens when we can't determine from provider data */
+const DEFAULT_MAX_COMPLETION_TOKENS = 8192;
+
+/** Minimum output reserve (ensures enough tokens for tool call JSON responses) */
+const MIN_OUTPUT_RESERVE = 1000;
+
+/** Maximum output reserve (don't reserve too much for small models) */
+const MAX_OUTPUT_RESERVE = 4000;
+
+/** Output reserve as percentage of effective max tokens */
+const OUTPUT_RESERVE_PERCENTAGE = 0.25;
+
+/** Models that support reasoning.max_tokens (Anthropic-style explicit budget) */
+const SUPPORTS_EXPLICIT_BUDGET = [
+  'anthropic/',  // All Anthropic models
+  'zhipu/',      // GLM models support explicit reasoning budget
+  'deepseek/',   // DeepSeek reasoning models
+];
+
+/** Models that require reasoning.effort (OpenAI o-series style) */
+const REQUIRES_EFFORT_ONLY = [
+  'openai/o1',
+  'openai/o3',
+  'openai/o4',
+];
+
+// ============================================================================
+// Budget Calculation
+// ============================================================================
+
+/**
+ * Calculate the effective max completion tokens based on selected providers
+ */
+function calculateEffectiveMax(
+  endpoints: ModelEndpointData[],
+  selectedProviders?: string[]
+): number {
+  let relevantEndpoints = endpoints;
+
+  // Filter to selected providers if specified
+  if (selectedProviders && selectedProviders.length > 0) {
+    relevantEndpoints = endpoints.filter(ep =>
+      selectedProviders.includes(ep.tag)
+    );
+  }
+
+  // Get all non-null maxCompletionTokens values
+  const maxValues = relevantEndpoints
+    .map(ep => ep.maxCompletionTokens)
+    .filter((v): v is number => v !== null && v > 0);
+
+  // Return minimum of available values (conservative approach)
+  // If all null, fall back to default
+  if (maxValues.length === 0) {
+    return DEFAULT_MAX_COMPLETION_TOKENS;
+  }
+
+  return Math.min(...maxValues);
+}
+
+/**
+ * Calculate dynamic output reserve based on effective max
+ * Reserve enough for tool call JSON responses, but scale with model capacity
+ */
+function calculateOutputReserve(effectiveMax: number): number {
+  const percentageReserve = effectiveMax * OUTPUT_RESERVE_PERCENTAGE;
+  return Math.max(MIN_OUTPUT_RESERVE, Math.min(MAX_OUTPUT_RESERVE, percentageReserve));
+}
+
+/**
+ * Check if model supports explicit reasoning.max_tokens
+ */
+function supportsExplicitBudget(modelId: string): boolean {
+  // Check if any prefix matches
+  if (SUPPORTS_EXPLICIT_BUDGET.some(prefix => modelId.startsWith(prefix))) {
+    return true;
+  }
+
+  // OpenAI o-series requires effort only
+  if (REQUIRES_EFFORT_ONLY.some(prefix => modelId.startsWith(prefix))) {
+    return false;
+  }
+
+  // Default: assume supports explicit budget for better control
+  // Most modern reasoning models support it
+  return true;
+}
+
+/**
+ * Format budget for display (e.g., 12500 -> "12.5K")
+ */
+function formatDisplayBudget(tokens: number): string {
+  if (tokens >= 1000) {
+    const k = tokens / 1000;
+    // Show one decimal if not a round number
+    return k % 1 === 0 ? `${k}K` : `${k.toFixed(1)}K`;
+  }
+  return String(tokens);
+}
+
+// ============================================================================
+// Synchronous Resolver (Client-safe)
+// ============================================================================
+
+/**
+ * Synchronous version for cases where endpoints are already available
+ * (e.g., UI display where useModelEndpoints hook provides data)
+ */
+export function resolveReasoningBudgetSync(
+  options: ResolverOptions & { endpointsData: ModelEndpointData[] }
+): ReasoningBudgetResult {
+  const { effort, modelId, selectedProviders, endpointsData } = options;
+
+  // Handle 'none' effort
+  if (effort === 'none') {
+    return {
+      reasoning: { effort: 'none' },
+      maxTokens: DEFAULT_MAX_COMPLETION_TOKENS,
+      displayBudget: '',
+      usesExplicitBudget: false,
+    };
+  }
+
+  // Calculate effective max tokens
+  const effectiveMax = calculateEffectiveMax(endpointsData, selectedProviders);
+
+  // Calculate output reserve
+  const outputReserve = calculateOutputReserve(effectiveMax);
+
+  // Calculate available budget for reasoning
+  const available = effectiveMax - outputReserve;
+
+  // Calculate reasoning budget based on effort level
+  const effortPercentage = EFFORT_PERCENTAGES[effort];
+  const reasoningBudget = Math.floor(available * effortPercentage);
+
+  // Check API compatibility
+  const usesExplicit = supportsExplicitBudget(modelId);
+
+  if (usesExplicit) {
+    return {
+      reasoning: { max_tokens: reasoningBudget },
+      maxTokens: effectiveMax,
+      displayBudget: formatDisplayBudget(reasoningBudget),
+      usesExplicitBudget: true,
+    };
+  } else {
+    return {
+      reasoning: { effort },
+      maxTokens: effectiveMax,
+      displayBudget: `~${formatDisplayBudget(reasoningBudget)}`,
+      usesExplicitBudget: false,
+    };
+  }
+}
+
+/**
+ * Get display-friendly description of reasoning budget
+ * For use in UI to show users what they're getting
+ */
+export function getReasoningBudgetDescription(
+  effort: ReasoningEffort,
+  displayBudget: string
+): string {
+  if (effort === 'none' || !displayBudget) {
+    return 'Reasoning disabled';
+  }
+
+  const effortLabels: Record<ReasoningEffort, string> = {
+    xhigh: 'Very High',
+    high: 'High',
+    medium: 'Medium',
+    low: 'Low',
+    minimal: 'Minimal',
+    none: 'None',
+  };
+
+  return `${effortLabels[effort]} → ${displayBudget} reasoning tokens`;
+}
diff --git a/internal-packages/ai/src/utils/reasoningBudget.ts b/internal-packages/ai/src/utils/reasoningBudget.ts
new file mode 100644
index 000000000..94b9393ef
--- /dev/null
+++ b/internal-packages/ai/src/utils/reasoningBudget.ts
@@ -0,0 +1,399 @@
+/**
+ * Reasoning Budget Resolver
+ *
+ * Calculates optimal reasoning token budgets for OpenRouter models based on:
+ * - Effort level (xhigh, high, medium, low, minimal)
+ * - Provider-specific max_completion_tokens limits
+ * - Dynamic output reserve to ensure sufficient tokens for tool call responses
+ */
+
+import type { ReasoningEffort } from './openrouter';
+import { logger } from '../shared/logger';
+
+// ============================================================================
+// Types
+// ============================================================================
+
+export interface ModelEndpointData {
+  tag: string;
+  providerName: string;
+  maxCompletionTokens: number | null;
+}
+
+export interface ReasoningBudgetResult {
+  /** Reasoning configuration to pass to OpenRouter */
+  reasoning: {
+    effort?: ReasoningEffort;
+    max_tokens?: number;
+  };
+  /** Effective max_tokens to use for the request */
+  maxTokens: number;
+  /** Display-friendly budget (e.g., "12K") for UI */
+  displayBudget: string;
+  /** Whether we're using explicit max_tokens (true) or falling back to effort (false) */
+  usesExplicitBudget: boolean;
+}
+
+export interface ResolverOptions {
+  /** Reasoning effort level */
+  effort: ReasoningEffort;
+  /** OpenRouter model ID */
+  modelId: string;
+  /** Optional list of selected provider tags (e.g., ["google-vertex", "together"]) */
+  selectedProviders?: string[];
+  /** Optional pre-fetched endpoints data (if not provided, will fetch) */
+  endpointsData?: ModelEndpointData[];
+}
+
+// ============================================================================
+// Constants
+// ============================================================================
+
+/** Effort level to percentage of available budget */
+const EFFORT_PERCENTAGES: Record<ReasoningEffort, number> = {
+  xhigh: 0.9,
+  high: 0.7,
+  medium: 0.5,
+  low: 0.3,
+  minimal: 0.1,
+  none: 0,
+};
+
+/** Default max completion tokens when we can't determine from provider data */
+const DEFAULT_MAX_COMPLETION_TOKENS = 8192;
+
+/** Minimum output reserve (ensures enough tokens for tool call JSON responses) */
+const MIN_OUTPUT_RESERVE = 1000;
+
+/** Maximum output reserve (don't reserve too much for small models) */
+const MAX_OUTPUT_RESERVE = 4000;
+
+/** Output reserve as percentage of effective max tokens */
+const OUTPUT_RESERVE_PERCENTAGE = 0.25;
+
+/** Models that support reasoning.max_tokens (Anthropic-style explicit budget) */
+const SUPPORTS_EXPLICIT_BUDGET = [
+  'anthropic/',  // All Anthropic models
+  'zhipu/',      // GLM models support explicit reasoning budget
+  'deepseek/',   // DeepSeek reasoning models
+];
+
+/** Models that require reasoning.effort (OpenAI o-series style) */
+const REQUIRES_EFFORT_ONLY = [
+  'openai/o1',
+  'openai/o3',
+  'openai/o4',
+];
+
+// ============================================================================
+// Cache
+// ============================================================================
+
+interface CachedEndpoints {
+  data: ModelEndpointData[];
+  timestamp: number;
+}
+
+const endpointsCache = new Map<string, CachedEndpoints>();
+const CACHE_TTL_MS = 60 * 60 * 1000; // 1 hour
+
+/**
+ * Invalidate cache for a model (call this when truncation is detected)
+ */
+export function invalidateEndpointsCache(modelId: string): void {
+  endpointsCache.delete(modelId);
+  logger.debug(`[ReasoningBudget] Cache invalidated for ${modelId}`);
+}
+
+/**
+ * Invalidate entire cache
+ */
+export function invalidateAllEndpointsCache(): void {
+  endpointsCache.clear();
+  logger.debug(`[ReasoningBudget] All cache invalidated`);
+}
+
+// ============================================================================
+// Endpoint Fetching
+// ============================================================================
+
+/**
+ * Fetch model endpoints from OpenRouter API
+ */
+async function fetchModelEndpoints(modelId: string): Promise<ModelEndpointData[]> {
+  // Check cache
+  const cached = endpointsCache.get(modelId);
+  if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
+    return cached.data;
+  }
+
+  try {
+    // Note: Do NOT URL-encode the modelId - OpenRouter expects the literal path
+    // e.g., /models/google/gemini-3-flash-preview/endpoints (not google%2Fgemini...)
+    const response = await fetch(
+      `https://openrouter.ai/api/v1/models/${modelId}/endpoints`
+    );
+
+    if (!response.ok) {
+      logger.warn(`[ReasoningBudget] Failed to fetch endpoints for ${modelId}: ${response.status}`);
+      return [];
+    }
+
+    const json = await response.json() as {
+      data?: {
+        endpoints?: Array<{
+          tag?: string;
+          provider_name?: string;
+          max_completion_tokens?: number | null;
+        }>;
+      };
+    };
+
+    const endpoints = json.data?.endpoints || [];
+
+    // Parse and deduplicate by base tag
+    const seenTags = new Set<string>();
+    const parsed: ModelEndpointData[] = [];
+
+    for (const ep of endpoints) {
+      // Extract base tag (remove region suffixes like "/global")
+      const baseTag = ep.tag?.split('/')[0];
+      if (!baseTag || seenTags.has(baseTag)) continue;
+      seenTags.add(baseTag);
+
+      parsed.push({
+        tag: baseTag,
+        providerName: ep.provider_name || baseTag,
+        maxCompletionTokens: ep.max_completion_tokens ?? null,
+      });
+    }
+
+    // Cache the result
+    endpointsCache.set(modelId, { data: parsed, timestamp: Date.now() });
+
+    return parsed;
+  } catch (e) {
+    logger.warn(`[ReasoningBudget] Error fetching endpoints for ${modelId}:`, e);
+    return [];
+  }
+}
+
+// ============================================================================
+// Budget Calculation
+// ============================================================================
+
+/**
+ * Calculate the effective max completion tokens based on selected providers
+ */
+function calculateEffectiveMax(
+  endpoints: ModelEndpointData[],
+  selectedProviders?: string[]
+): number {
+  let relevantEndpoints = endpoints;
+
+  // Filter to selected providers if specified
+  if (selectedProviders && selectedProviders.length > 0) {
+    relevantEndpoints = endpoints.filter(ep =>
+      selectedProviders.includes(ep.tag)
+    );
+  }
+
+  // Get all non-null maxCompletionTokens values
+  const maxValues = relevantEndpoints
+    .map(ep => ep.maxCompletionTokens)
+    .filter((v): v is number => v !== null && v > 0);
+
+  // Return minimum of available values (conservative approach)
+  // If all null, fall back to default
+  if (maxValues.length === 0) {
+    return DEFAULT_MAX_COMPLETION_TOKENS;
+  }
+
+  return Math.min(...maxValues);
+}
+
+/**
+ * Calculate dynamic output reserve based on effective max
+ * Reserve enough for tool call JSON responses, but scale with model capacity
+ */
+function calculateOutputReserve(effectiveMax: number): number {
+  const percentageReserve = effectiveMax * OUTPUT_RESERVE_PERCENTAGE;
+  return Math.max(MIN_OUTPUT_RESERVE, Math.min(MAX_OUTPUT_RESERVE, percentageReserve));
+}
+
+/**
+ * Check if model supports explicit reasoning.max_tokens
+ */
+function supportsExplicitBudget(modelId: string): boolean {
+  // Check if any prefix matches
+  if (SUPPORTS_EXPLICIT_BUDGET.some(prefix => modelId.startsWith(prefix))) {
+    return true;
+  }
+
+  // OpenAI o-series requires effort only
+  if (REQUIRES_EFFORT_ONLY.some(prefix => modelId.startsWith(prefix))) {
+    return false;
+  }
+
+  // Default: assume supports explicit budget for better control
+  // Most modern reasoning models support it
+  return true;
+}
+
+/**
+ * Format budget for display (e.g., 12500 -> "12.5K")
+ */
+function formatDisplayBudget(tokens: number): string {
+  if (tokens >= 1000) {
+    const k = tokens / 1000;
+    // Show one decimal if not a round number
+    return k % 1 === 0 ? `${k}K` : `${k.toFixed(1)}K`;
+  }
+  return String(tokens);
+}
+
+// ============================================================================
+// Main Resolver
+// ============================================================================
+
+/**
+ * Resolve reasoning budget for an OpenRouter model
+ *
+ * @example
+ * const budget = await resolveReasoningBudget({
+ *   effort: 'xhigh',
+ *   modelId: 'zhipu/glm-4.7',
+ *   selectedProviders: ['zhipu']
+ * });
+ * // Returns: { reasoning: { max_tokens: 15000 }, maxTokens: 18000, displayBudget: "15K" }
+ */
+export async function resolveReasoningBudget(
+  options: ResolverOptions
+): Promise<ReasoningBudgetResult> {
+  const { effort, modelId, selectedProviders, endpointsData } = options;
+
+  // Handle 'none' effort - no reasoning needed
+  if (effort === 'none') {
+    return {
+      reasoning: { effort: 'none' },
+      maxTokens: DEFAULT_MAX_COMPLETION_TOKENS,
+      displayBudget: '',
+      usesExplicitBudget: false,
+    };
+  }
+
+  // Get endpoints data (from cache, pre-fetched, or fetch now)
+  const endpoints = endpointsData || await fetchModelEndpoints(modelId);
+
+  // Calculate effective max tokens
+  const effectiveMax = calculateEffectiveMax(endpoints, selectedProviders);
+
+  // Calculate output reserve
+  const outputReserve = calculateOutputReserve(effectiveMax);
+
+  // Calculate available budget for reasoning
+  const available = effectiveMax - outputReserve;
+
+  // Calculate reasoning budget based on effort level
+  const effortPercentage = EFFORT_PERCENTAGES[effort];
+  const reasoningBudget = Math.floor(available * effortPercentage);
+
+  // Check API compatibility
+  const usesExplicit = supportsExplicitBudget(modelId);
+
+  logger.debug(`[ReasoningBudget] ${modelId}: effectiveMax=${effectiveMax}, outputReserve=${outputReserve}, available=${available}, effort=${effort} (${effortPercentage * 100}%), reasoningBudget=${reasoningBudget}, usesExplicit=${usesExplicit}`);
+
+  if (usesExplicit) {
+    // Use explicit max_tokens for reasoning
+    return {
+      reasoning: { max_tokens: reasoningBudget },
+      maxTokens: effectiveMax,
+      displayBudget: formatDisplayBudget(reasoningBudget),
+      usesExplicitBudget: true,
+    };
+  } else {
+    // Fall back to effort-based (OpenAI o-series)
+    return {
+      reasoning: { effort },
+      maxTokens: effectiveMax,
+      displayBudget: `~${formatDisplayBudget(reasoningBudget)}`,
+      usesExplicitBudget: false,
+    };
+  }
+}
+
+/**
+ * Synchronous version for cases where endpoints are already available
+ * (e.g., UI display where useModelEndpoints hook provides data)
+ */
+export function resolveReasoningBudgetSync(
+  options: ResolverOptions & { endpointsData: ModelEndpointData[] }
+): ReasoningBudgetResult {
+  const { effort, modelId, selectedProviders, endpointsData } = options;
+
+  // Handle 'none' effort
+  if (effort === 'none') {
+    return {
+      reasoning: { effort: 'none' },
+      maxTokens: DEFAULT_MAX_COMPLETION_TOKENS,
+      displayBudget: '',
+      usesExplicitBudget: false,
+    };
+  }
+
+  // Calculate effective max tokens
+  const effectiveMax = calculateEffectiveMax(endpointsData, selectedProviders);
+
+  // Calculate output reserve
+  const outputReserve = calculateOutputReserve(effectiveMax);
+
+  // Calculate available budget for reasoning
+  const available = effectiveMax - outputReserve;
+
+  // Calculate reasoning budget based on effort level
+  const effortPercentage = EFFORT_PERCENTAGES[effort];
+  const reasoningBudget = Math.floor(available * effortPercentage);
+
+  // Check API compatibility
+  const usesExplicit = supportsExplicitBudget(modelId);
+
+  if (usesExplicit) {
+    return {
+      reasoning: { max_tokens: reasoningBudget },
+      maxTokens: effectiveMax,
+      displayBudget: formatDisplayBudget(reasoningBudget),
+      usesExplicitBudget: true,
+    };
+  } else {
+    return {
+      reasoning: { effort },
+      maxTokens: effectiveMax,
+      displayBudget: `~${formatDisplayBudget(reasoningBudget)}`,
+      usesExplicitBudget: false,
+    };
+  }
+}
+
+/**
+ * Get display-friendly description of reasoning budget
+ * For use in UI to show users what they're getting
+ */
+export function getReasoningBudgetDescription(
+  effort: ReasoningEffort,
+  displayBudget: string
+): string {
+  if (effort === 'none' || !displayBudget) {
+    return 'Reasoning disabled';
+  }
+
+  const effortLabels: Record<ReasoningEffort, string> = {
+    xhigh: 'Very High',
+    high: 'High',
+    medium: 'Medium',
+    low: 'Low',
+    minimal: 'Minimal',
+    none: 'None',
+  };
+
+  return `${effortLabels[effort]} → ${displayBudget} reasoning tokens`;
+}
diff --git a/internal-packages/ai/src/utils/usageMetrics.ts b/internal-packages/ai/src/utils/usageMetrics.ts
new file mode 100644
index 000000000..0b728708c
--- /dev/null
+++ b/internal-packages/ai/src/utils/usageMetrics.ts
@@ -0,0 +1,261 @@
+/**
+ * Unified Usage Metrics
+ *
+ * Provides a consistent format for capturing usage data from both
+ * OpenRouter and Anthropic APIs, including cost calculation.
+ */
+
+// =============================================================================
+// Types
+// =============================================================================
+
+/**
+ * Unified usage metrics that work across all providers
+ */
+export interface UnifiedUsageMetrics {
+  // Core token counts
+  inputTokens: number;
+  outputTokens: number;
+  totalTokens: number;
+
+  // Cost
+  costUsd: number;
+  isCostFromApi: boolean;  // true = from API (OpenRouter), false = calculated (Anthropic)
+
+  // Cache metrics
+  cacheReadTokens?: number;
+  cacheWriteTokens?: number;
+
+  // Reasoning/thinking tokens (subset of outputTokens)
+  reasoningTokens?: number;
+
+  // Provider info
+  provider: string;
+  model: string;
+
+  // Latency
+  latencyMs: number;
+}
+
+/**
+ * Raw usage data from OpenRouter API response
+ */
+export interface OpenRouterRawUsage {
+  prompt_tokens: number;
+  completion_tokens: number;
+  total_tokens: number;
+  cost?: number;
+  is_byok?: boolean;
+  prompt_tokens_details?: {
+    cached_tokens?: number;
+    audio_tokens?: number;
+    video_tokens?: number;
+  };
+  cost_details?: {
+    upstream_inference_cost?: number | null;
+    upstream_inference_prompt_cost?: number;
+    upstream_inference_completions_cost?: number;
+  };
+  completion_tokens_details?: {
+    reasoning_tokens?: number;
+    image_tokens?: number;
+  };
+}
+
+/**
+ * Raw usage data from Anthropic API response
+ */
+export interface AnthropicRawUsage {
+  input_tokens: number;
+  output_tokens: number;
+  cache_creation_input_tokens?: number;
+  cache_read_input_tokens?: number;
+  cache_creation?: {
+    ephemeral_5m_input_tokens?: number;
+    ephemeral_1h_input_tokens?: number;
+  };
+  service_tier?: string;
+}
+
+// =============================================================================
+// Anthropic Pricing (per million tokens, in USD)
+// =============================================================================
+
+interface ModelPricing {
+  input: number;      // $ per million input tokens
+  output: number;     // $ per million output tokens
+  cacheRead: number;  // $ per million cache read tokens
+  cacheWrite5m?: number;  // $ per million 5-minute cache write tokens
+  cacheWrite1h?: number;  // $ per million 1-hour cache write tokens
+}
+
+/**
+ * Anthropic model pricing table
+ * Source: https://platform.claude.com/docs/en/about-claude/pricing
+ * Last updated: 2025-01-18
+ */
+export const ANTHROPIC_PRICING: Record<string, ModelPricing> = {
+  // Opus models
+  'claude-opus-4-5-20251101': { input: 5, output: 25, cacheRead: 0.50, cacheWrite5m: 6.25, cacheWrite1h: 10 },
+  'claude-opus-4-1-20250805': { input: 15, output: 75, cacheRead: 1.50, cacheWrite5m: 18.75, cacheWrite1h: 30 },
+  'claude-opus-4-20250514': { input: 15, output: 75, cacheRead: 1.50, cacheWrite5m: 18.75, cacheWrite1h: 30 },
+
+  // Sonnet models
+  'claude-sonnet-4-5-20250929': { input: 3, output: 15, cacheRead: 0.30, cacheWrite5m: 3.75, cacheWrite1h: 6 },
+  'claude-sonnet-4-20250514': { input: 3, output: 15, cacheRead: 0.30, cacheWrite5m: 3.75, cacheWrite1h: 6 },
+  'claude-3-7-sonnet-20250219': { input: 3, output: 15, cacheRead: 0.30, cacheWrite5m: 3.75, cacheWrite1h: 6 },
+
+  // Haiku models
+  'claude-haiku-4-5-20251001': { input: 1, output: 5, cacheRead: 0.10, cacheWrite5m: 1.25, cacheWrite1h: 2 },
+  'claude-3-5-haiku-20241022': { input: 0.80, output: 4, cacheRead: 0.08, cacheWrite5m: 1, cacheWrite1h: 1.6 },
+  'claude-3-haiku-20240307': { input: 0.25, output: 1.25, cacheRead: 0.03, cacheWrite5m: 0.30, cacheWrite1h: 0.50 },
+};
+
+// Aliases for common model name patterns
+const MODEL_ALIASES: Record<string, string> = {
+  // Short names to full model IDs
+  'claude-opus-4-5': 'claude-opus-4-5-20251101',
+  'claude-opus-4-1': 'claude-opus-4-1-20250805',
+  'claude-opus-4': 'claude-opus-4-20250514',
+  'claude-sonnet-4-5': 'claude-sonnet-4-5-20250929',
+  'claude-sonnet-4': 'claude-sonnet-4-20250514',
+  'claude-sonnet-3-7': 'claude-3-7-sonnet-20250219',
+  'claude-haiku-4-5': 'claude-haiku-4-5-20251001',
+  'claude-3-5-haiku': 'claude-3-5-haiku-20241022',
+  'claude-3-haiku': 'claude-3-haiku-20240307',
+};
+
+/**
+ * Get pricing for an Anthropic model
+ */
+export function getAnthropicPricing(modelId: string): ModelPricing | null {
+  // Try direct lookup first
+  if (ANTHROPIC_PRICING[modelId]) {
+    return ANTHROPIC_PRICING[modelId];
+  }
+
+  // Try alias lookup
+  const aliasedId = MODEL_ALIASES[modelId];
+  if (aliasedId && ANTHROPIC_PRICING[aliasedId]) {
+    return ANTHROPIC_PRICING[aliasedId];
+  }
+
+  // Try to match by prefix (e.g., "claude-sonnet-4-5" matches "claude-sonnet-4-5-20250929")
+  for (const [key, pricing] of Object.entries(ANTHROPIC_PRICING)) {
+    if (modelId.startsWith(key.split('-202')[0]) || key.startsWith(modelId)) {
+      return pricing;
+    }
+  }
+
+  return null;
+}
+
+/**
+ * Calculate cost for Anthropic API usage
+ */
+export function calculateAnthropicCost(
+  modelId: string,
+  usage: AnthropicRawUsage
+): number {
+  const pricing = getAnthropicPricing(modelId);
+  if (!pricing) {
+    console.warn(`[UsageMetrics] No pricing found for Anthropic model: ${modelId}`);
+    return 0;
+  }
+
+  const inputCost = (usage.input_tokens / 1_000_000) * pricing.input;
+  const outputCost = (usage.output_tokens / 1_000_000) * pricing.output;
+  const cacheReadCost = ((usage.cache_read_input_tokens || 0) / 1_000_000) * pricing.cacheRead;
+
+  // Cache write cost (assume 5-minute cache by default)
+  const cacheWriteCost = pricing.cacheWrite5m
+    ? ((usage.cache_creation_input_tokens || 0) / 1_000_000) * pricing.cacheWrite5m
+    : 0;
+
+  return inputCost + outputCost + cacheReadCost + cacheWriteCost;
+}
+
+// =============================================================================
+// Conversion Functions
+// =============================================================================
+
+/**
+ * Convert OpenRouter raw usage to unified metrics
+ */
+export function fromOpenRouterUsage(
+  usage: OpenRouterRawUsage,
+  provider: string,
+  model: string,
+  latencyMs: number
+): UnifiedUsageMetrics {
+  return {
+    inputTokens: usage.prompt_tokens,
+    outputTokens: usage.completion_tokens,
+    totalTokens: usage.total_tokens,
+    costUsd: usage.cost ?? 0,
+    isCostFromApi: usage.cost !== undefined,
+    cacheReadTokens: usage.prompt_tokens_details?.cached_tokens,
+    reasoningTokens: usage.completion_tokens_details?.reasoning_tokens,
+    provider,
+    model,
+    latencyMs,
+  };
+}
+
+/**
+ * Convert Anthropic raw usage to unified metrics
+ */
+export function fromAnthropicUsage(
+  usage: AnthropicRawUsage,
+  model: string,
+  latencyMs: number
+): UnifiedUsageMetrics {
+  const totalTokens = usage.input_tokens + usage.output_tokens;
+  const costUsd = calculateAnthropicCost(model, usage);
+
+  return {
+    inputTokens: usage.input_tokens,
+    outputTokens: usage.output_tokens,
+    totalTokens,
+    costUsd,
+    isCostFromApi: false, // Anthropic doesn't return cost in API
+    cacheReadTokens: usage.cache_read_input_tokens,
+    cacheWriteTokens: usage.cache_creation_input_tokens,
+    provider: 'anthropic',
+    model,
+    latencyMs,
+  };
+}
+
+/**
+ * Format cost for display (e.g., "$0.0023" or "$1.50")
+ */
+export function formatCost(costUsd: number): string {
+  if (costUsd === 0) return '$0.00';
+  if (costUsd < 0.01) {
+    return `$${costUsd.toFixed(6)}`;
+  }
+  return `$${costUsd.toFixed(4)}`;
+}
+
+/**
+ * Aggregate multiple usage metrics into a single summary
+ */
+export function aggregateUsageMetrics(metrics: UnifiedUsageMetrics[]): Omit<UnifiedUsageMetrics, 'provider' | 'model'> & { providers: string[], models: string[] } {
+  const providers = [...new Set(metrics.map(m => m.provider))];
+  const models = [...new Set(metrics.map(m => m.model))];
+
+  return {
+    inputTokens: metrics.reduce((sum, m) => sum + m.inputTokens, 0),
+    outputTokens: metrics.reduce((sum, m) => sum + m.outputTokens, 0),
+    totalTokens: metrics.reduce((sum, m) => sum + m.totalTokens, 0),
+    costUsd: metrics.reduce((sum, m) => sum + m.costUsd, 0),
+    isCostFromApi: metrics.every(m => m.isCostFromApi),
+    cacheReadTokens: metrics.reduce((sum, m) => sum + (m.cacheReadTokens || 0), 0) || undefined,
+    cacheWriteTokens: metrics.reduce((sum, m) => sum + (m.cacheWriteTokens || 0), 0) || undefined,
+    reasoningTokens: metrics.reduce((sum, m) => sum + (m.reasoningTokens || 0), 0) || undefined,
+    latencyMs: metrics.reduce((sum, m) => sum + m.latencyMs, 0),
+    providers,
+    models,
+  };
+}
diff --git a/internal-packages/ai/src/workflows/documentAnalysis/analyzeDocument.ts b/internal-packages/ai/src/workflows/documentAnalysis/analyzeDocument.ts
index 8d1066dd9..ba9ad7176 100644
--- a/internal-packages/ai/src/workflows/documentAnalysis/analyzeDocument.ts
+++ b/internal-packages/ai/src/workflows/documentAnalysis/analyzeDocument.ts
@@ -1,33 +1,50 @@
 import { logger } from "../../utils/logger";
 import type { Document } from "../../types/documents";
 import type { Agent } from "../../types/agentSchema";
-import type { Comment } from "../../shared/types";
 import { checkJobTimeout } from "../../shared/jobContext";
 import { analyzeDocumentUnified } from "./unified";
 import { PluginType } from "../../analysis-plugins/types/plugin-types";
-import type { TaskResult } from "./shared/types";
+import type { DocumentAnalysisResult, TaskResult } from "./shared/types";
 import { generateComprehensiveAnalysis } from "./comprehensiveAnalysis";
 import { extractHighlightsFromAnalysis } from "./highlightExtraction";
 import { generateSelfCritique } from "./selfCritique";
 
+export interface AnalyzeDocumentOptions {
+  targetWordCount?: number;
+  targetHighlights?: number;
+  jobId?: string;
+  /** Profile ID for FallacyCheckPlugin configuration */
+  fallacyCheckProfileId?: string;
+}
+
 export async function analyzeDocument(
   document: Document,
   agentInfo: Agent,
-  targetWordCount: number = 500,
+  targetWordCountOrOptions: number | AnalyzeDocumentOptions = 500,
   targetHighlights: number = 5,
   jobId?: string
-): Promise<{
-  thinking: string;
-  analysis: string;
-  summary: string;
-  grade?: number;
-  selfCritique?: string;
-  highlights: Comment[];
-  tasks: TaskResult[];
-  jobLogString?: string; // Include job log string for Job.logs field
-}> {
-  const logPrefix = `[Job ${jobId || 'N/A'}]`;
-  logger.info(`${logPrefix} Starting document analysis for agent ${agentInfo.name}`);
+): Promise<DocumentAnalysisResult> {
+  // Handle both old signature (positional args) and new signature (options object)
+  let options: AnalyzeDocumentOptions;
+  if (typeof targetWordCountOrOptions === 'object') {
+    options = targetWordCountOrOptions;
+  } else {
+    options = {
+      targetWordCount: targetWordCountOrOptions,
+      targetHighlights,
+      jobId,
+    };
+  }
+
+  const effectiveJobId = options.jobId;
+  const effectiveTargetHighlights = options.targetHighlights ?? 5;
+  const effectiveTargetWordCount = options.targetWordCount ?? 500;
+  const fallacyCheckProfileId = options.fallacyCheckProfileId;
+
+  const logPrefix = `[Job ${effectiveJobId || 'N/A'}]`;
+  logger.info(`${logPrefix} Starting document analysis for agent ${agentInfo.name}`, {
+    fallacyCheckProfileId,
+  });
   // Validate that all plugin IDs are valid PluginType entries
   const validPlugins = (agentInfo.pluginIds || []).filter((p): p is PluginType =>
     Object.values(PluginType).includes(p as PluginType)
@@ -47,13 +64,15 @@ export async function analyzeDocument(
       .join(', ')
       .slice(0, 500);
     logger.info(`${logPrefix} Using plugin-based workflow for agent ${agentInfo.name} with plugins: ${pluginListForLog}`);
-    
+
     return await analyzeDocumentUnified(document, agentInfo, {
-      targetHighlights,
-      jobId,
+      targetHighlights: effectiveTargetHighlights,
+      jobId: effectiveJobId,
       plugins: {
         include: validPlugins
-      }
+      },
+      fallacyCheckProfileId,
+      fallacyCheckAgentId: agentInfo.id,
     });
   } else {
     // No plugins configured - use traditional LLM-based comprehensive analysis
@@ -69,8 +88,8 @@ export async function analyzeDocument(
     const comprehensiveAnalysisResult = await generateComprehensiveAnalysis(
       document,
       agentInfo,
-      targetWordCount,
-      targetHighlights
+      effectiveTargetWordCount,
+      effectiveTargetHighlights
     );
     tasks.push(comprehensiveAnalysisResult.task);
 
@@ -83,7 +102,7 @@ export async function analyzeDocument(
       document,
       agentInfo,
       comprehensiveAnalysisResult.outputs,
-      targetHighlights
+      effectiveTargetHighlights
     );
     tasks.push(highlightExtractionResult.task);
     
@@ -123,7 +142,8 @@ export async function analyzeDocument(
       selfCritique,
       highlights: highlightExtractionResult.outputs.highlights,
       tasks,
-      jobLogString
+      jobLogString,
+      pipelineTelemetry: undefined, // LLM workflow doesn't use pipeline telemetry
     };
   }
 }
diff --git a/internal-packages/ai/src/workflows/documentAnalysis/shared/types.ts b/internal-packages/ai/src/workflows/documentAnalysis/shared/types.ts
index 94669e2e9..0d36787db 100644
--- a/internal-packages/ai/src/workflows/documentAnalysis/shared/types.ts
+++ b/internal-packages/ai/src/workflows/documentAnalysis/shared/types.ts
@@ -1,4 +1,4 @@
-// Shared types used across documentAnalysis modules  
+// Shared types used across documentAnalysis modules
 import type { Comment } from "../../../shared/types";
 
 export interface TaskResult {
@@ -9,6 +9,22 @@ export interface TaskResult {
   log: string;
 }
 
+/**
+ * Complete result from document analysis workflow.
+ * Returned by analyzeDocument() function.
+ */
+export interface DocumentAnalysisResult {
+  thinking: string;
+  analysis: string;
+  summary: string;
+  grade?: number;
+  selfCritique?: string;
+  highlights: Comment[];
+  tasks: TaskResult[];
+  jobLogString?: string;
+  pipelineTelemetry?: Record<string, unknown>;
+}
+
 export interface ThinkingOutputs {
   thinking: string;
 }
diff --git a/internal-packages/ai/src/workflows/documentAnalysis/unified/index.ts b/internal-packages/ai/src/workflows/documentAnalysis/unified/index.ts
index 27ae4e91b..10126059f 100644
--- a/internal-packages/ai/src/workflows/documentAnalysis/unified/index.ts
+++ b/internal-packages/ai/src/workflows/documentAnalysis/unified/index.ts
@@ -10,7 +10,7 @@ import type { Agent, Comment as AiComment, Document } from "@roast/ai";
 import { checkJobTimeout } from "../../../shared/jobContext";
 import { PluginType } from "../../../analysis-plugins/types/plugin-types";
 import { PluginManager } from "../../../analysis-plugins/PluginManager";
-import type { TaskResult } from "../shared/types";
+import type { DocumentAnalysisResult } from "../shared/types";
 
 export interface UnifiedAnalysisOptions {
   targetHighlights?: number;
@@ -19,22 +19,17 @@ export interface UnifiedAnalysisOptions {
     include?: readonly PluginType[];
     exclude?: readonly PluginType[];
   };
+  /** Profile ID for FallacyCheckPlugin configuration */
+  fallacyCheckProfileId?: string;
+  /** Agent ID for FallacyCheckPlugin default profile loading */
+  fallacyCheckAgentId?: string;
 }
 
 export async function analyzeDocumentUnified(
   document: Document,
   agentInfo: Agent,
   options: UnifiedAnalysisOptions = {}
-): Promise<{
-  thinking: string;
-  analysis: string;
-  summary: string;
-  grade?: number;
-  selfCritique?: string;
-  highlights: AiComment[];
-  tasks: TaskResult[];
-  jobLogString?: string;
-}> {
+): Promise<DocumentAnalysisResult> {
   // Check timeout before starting plugin analysis
   checkJobTimeout();
 
@@ -46,6 +41,9 @@ export async function analyzeDocumentUnified(
       include: options.plugins.include ? [...options.plugins.include] : undefined,
       exclude: options.plugins.exclude ? [...options.plugins.exclude] : undefined,
     } : undefined,
+    // Pass profile options for FallacyCheckPlugin
+    fallacyCheckProfileId: options.fallacyCheckProfileId,
+    fallacyCheckAgentId: options.fallacyCheckAgentId,
   });
 
   // Delegate to plugin system
@@ -56,7 +54,7 @@ export async function analyzeDocumentUnified(
   // Filter and convert comments as needed
   const validAiComments = result.highlights.filter(
     (h): h is AiComment =>
-      !!(h.description && h.highlight && typeof h.highlight?.isValid === "boolean")
+      !!(h.description && h.highlight && typeof h.highlight.isValid === "boolean")
   );
 
   return {
@@ -68,6 +66,7 @@ export async function analyzeDocumentUnified(
     highlights: aiCommentsToDbComments(validAiComments) as any,
     tasks: result.tasks,
     jobLogString: result.jobLogString,
+    pipelineTelemetry: result.pipelineTelemetry,
   };
 }
 
diff --git a/internal-packages/ai/src/workflows/index.ts b/internal-packages/ai/src/workflows/index.ts
index 998cc1cb7..432494974 100644
--- a/internal-packages/ai/src/workflows/index.ts
+++ b/internal-packages/ai/src/workflows/index.ts
@@ -11,4 +11,4 @@ export { extractHighlightsFromAnalysis } from "./documentAnalysis/highlightExtra
 export { generateSelfCritique } from "./documentAnalysis/selfCritique";
 
 // Export types
-export type { TaskResult } from "./documentAnalysis/shared/types";
\ No newline at end of file
+export type { TaskResult, DocumentAnalysisResult } from "./documentAnalysis/shared/types";
\ No newline at end of file
diff --git a/internal-packages/db/prisma/migrations/20260107113055_add_pipeline_telemetry/migration.sql b/internal-packages/db/prisma/migrations/20260107113055_add_pipeline_telemetry/migration.sql
new file mode 100644
index 000000000..872c91a4a
--- /dev/null
+++ b/internal-packages/db/prisma/migrations/20260107113055_add_pipeline_telemetry/migration.sql
@@ -0,0 +1,2 @@
+-- AlterTable
+ALTER TABLE "public"."EvaluationVersion" ADD COLUMN     "pipelineTelemetry" JSONB;
diff --git a/internal-packages/db/prisma/migrations/20260107122529_add_validation_baselines/migration.sql b/internal-packages/db/prisma/migrations/20260107122529_add_validation_baselines/migration.sql
new file mode 100644
index 000000000..669bc1462
--- /dev/null
+++ b/internal-packages/db/prisma/migrations/20260107122529_add_validation_baselines/migration.sql
@@ -0,0 +1,48 @@
+-- CreateTable
+CREATE TABLE "public"."ValidationBaseline" (
+    "id" TEXT NOT NULL,
+    "name" TEXT NOT NULL,
+    "description" TEXT,
+    "agentId" TEXT NOT NULL,
+    "commitHash" TEXT,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "createdById" TEXT,
+
+    CONSTRAINT "ValidationBaseline_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "public"."ValidationBaselineSnapshot" (
+    "id" TEXT NOT NULL,
+    "baselineId" TEXT NOT NULL,
+    "evaluationVersionId" TEXT NOT NULL,
+
+    CONSTRAINT "ValidationBaselineSnapshot_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE INDEX "ValidationBaseline_agentId_idx" ON "public"."ValidationBaseline"("agentId");
+
+-- CreateIndex
+CREATE INDEX "ValidationBaseline_createdAt_idx" ON "public"."ValidationBaseline"("createdAt");
+
+-- CreateIndex
+CREATE INDEX "ValidationBaselineSnapshot_baselineId_idx" ON "public"."ValidationBaselineSnapshot"("baselineId");
+
+-- CreateIndex
+CREATE INDEX "ValidationBaselineSnapshot_evaluationVersionId_idx" ON "public"."ValidationBaselineSnapshot"("evaluationVersionId");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "ValidationBaselineSnapshot_baselineId_evaluationVersionId_key" ON "public"."ValidationBaselineSnapshot"("baselineId", "evaluationVersionId");
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationBaseline" ADD CONSTRAINT "ValidationBaseline_agentId_fkey" FOREIGN KEY ("agentId") REFERENCES "public"."Agent"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationBaseline" ADD CONSTRAINT "ValidationBaseline_createdById_fkey" FOREIGN KEY ("createdById") REFERENCES "public"."User"("id") ON DELETE SET NULL ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationBaselineSnapshot" ADD CONSTRAINT "ValidationBaselineSnapshot_baselineId_fkey" FOREIGN KEY ("baselineId") REFERENCES "public"."ValidationBaseline"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationBaselineSnapshot" ADD CONSTRAINT "ValidationBaselineSnapshot_evaluationVersionId_fkey" FOREIGN KEY ("evaluationVersionId") REFERENCES "public"."EvaluationVersion"("id") ON DELETE CASCADE ON UPDATE CASCADE;
diff --git a/internal-packages/db/prisma/migrations/20260107130534_add_validation_run_tables/migration.sql b/internal-packages/db/prisma/migrations/20260107130534_add_validation_run_tables/migration.sql
new file mode 100644
index 000000000..993aafb0c
--- /dev/null
+++ b/internal-packages/db/prisma/migrations/20260107130534_add_validation_run_tables/migration.sql
@@ -0,0 +1,61 @@
+-- CreateTable
+CREATE TABLE "public"."ValidationRun" (
+    "id" TEXT NOT NULL,
+    "baselineId" TEXT NOT NULL,
+    "name" TEXT,
+    "commitHash" TEXT,
+    "status" TEXT NOT NULL DEFAULT 'running',
+    "summary" TEXT,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "completedAt" TIMESTAMP(3),
+
+    CONSTRAINT "ValidationRun_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateTable
+CREATE TABLE "public"."ValidationRunSnapshot" (
+    "id" TEXT NOT NULL,
+    "runId" TEXT NOT NULL,
+    "baselineSnapshotId" TEXT NOT NULL,
+    "newEvaluationId" TEXT NOT NULL,
+    "status" TEXT NOT NULL,
+    "keptCount" INTEGER NOT NULL DEFAULT 0,
+    "newCount" INTEGER NOT NULL DEFAULT 0,
+    "lostCount" INTEGER NOT NULL DEFAULT 0,
+    "comparisonData" JSONB,
+
+    CONSTRAINT "ValidationRunSnapshot_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE INDEX "ValidationRun_baselineId_idx" ON "public"."ValidationRun"("baselineId");
+
+-- CreateIndex
+CREATE INDEX "ValidationRun_createdAt_idx" ON "public"."ValidationRun"("createdAt");
+
+-- CreateIndex
+CREATE INDEX "ValidationRun_status_idx" ON "public"."ValidationRun"("status");
+
+-- CreateIndex
+CREATE INDEX "ValidationRunSnapshot_runId_idx" ON "public"."ValidationRunSnapshot"("runId");
+
+-- CreateIndex
+CREATE INDEX "ValidationRunSnapshot_baselineSnapshotId_idx" ON "public"."ValidationRunSnapshot"("baselineSnapshotId");
+
+-- CreateIndex
+CREATE INDEX "ValidationRunSnapshot_status_idx" ON "public"."ValidationRunSnapshot"("status");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "ValidationRunSnapshot_runId_baselineSnapshotId_key" ON "public"."ValidationRunSnapshot"("runId", "baselineSnapshotId");
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationRun" ADD CONSTRAINT "ValidationRun_baselineId_fkey" FOREIGN KEY ("baselineId") REFERENCES "public"."ValidationBaseline"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationRunSnapshot" ADD CONSTRAINT "ValidationRunSnapshot_runId_fkey" FOREIGN KEY ("runId") REFERENCES "public"."ValidationRun"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationRunSnapshot" ADD CONSTRAINT "ValidationRunSnapshot_baselineSnapshotId_fkey" FOREIGN KEY ("baselineSnapshotId") REFERENCES "public"."ValidationBaselineSnapshot"("id") ON DELETE CASCADE ON UPDATE CASCADE;
+
+-- AddForeignKey
+ALTER TABLE "public"."ValidationRunSnapshot" ADD CONSTRAINT "ValidationRunSnapshot_newEvaluationId_fkey" FOREIGN KEY ("newEvaluationId") REFERENCES "public"."EvaluationVersion"("id") ON DELETE CASCADE ON UPDATE CASCADE;
diff --git a/internal-packages/db/prisma/migrations/20260118095326_add_fallacy_checker_profile/migration.sql b/internal-packages/db/prisma/migrations/20260118095326_add_fallacy_checker_profile/migration.sql
new file mode 100644
index 000000000..ab5b9d43f
--- /dev/null
+++ b/internal-packages/db/prisma/migrations/20260118095326_add_fallacy_checker_profile/migration.sql
@@ -0,0 +1,22 @@
+-- CreateTable
+CREATE TABLE "public"."FallacyCheckerProfile" (
+    "id" TEXT NOT NULL,
+    "name" TEXT NOT NULL,
+    "description" TEXT,
+    "agentId" TEXT NOT NULL,
+    "config" JSONB NOT NULL,
+    "isDefault" BOOLEAN NOT NULL DEFAULT false,
+    "createdAt" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    "updatedAt" TIMESTAMP(3) NOT NULL,
+
+    CONSTRAINT "FallacyCheckerProfile_pkey" PRIMARY KEY ("id")
+);
+
+-- CreateIndex
+CREATE INDEX "FallacyCheckerProfile_agentId_idx" ON "public"."FallacyCheckerProfile"("agentId");
+
+-- CreateIndex
+CREATE INDEX "FallacyCheckerProfile_isDefault_idx" ON "public"."FallacyCheckerProfile"("isDefault");
+
+-- CreateIndex
+CREATE UNIQUE INDEX "FallacyCheckerProfile_agentId_name_key" ON "public"."FallacyCheckerProfile"("agentId", "name");
diff --git a/internal-packages/db/prisma/migrations/20260118100032_add_profileid_to_validation_run/migration.sql b/internal-packages/db/prisma/migrations/20260118100032_add_profileid_to_validation_run/migration.sql
new file mode 100644
index 000000000..f75f8855d
--- /dev/null
+++ b/internal-packages/db/prisma/migrations/20260118100032_add_profileid_to_validation_run/migration.sql
@@ -0,0 +1,2 @@
+-- AlterTable
+ALTER TABLE "public"."ValidationRun" ADD COLUMN "profileId" TEXT;
diff --git a/internal-packages/db/prisma/schema.prisma b/internal-packages/db/prisma/schema.prisma
index a19fae279..a0d08f955 100644
--- a/internal-packages/db/prisma/schema.prisma
+++ b/internal-packages/db/prisma/schema.prisma
@@ -50,14 +50,15 @@ model User {
   evalsThisMonth   Int                @default(0)
   hourResetAt      DateTime?
   monthResetAt     DateTime?
-  accounts         Account[]
-  agents           Agent[]
-  agentEvalBatches AgentEvalBatch[]
-  apiKeys          ApiKey[]
-  documents        Document[]
-  sessions         Session[]
-  cancelledJobs    Job[]
-  claimEvaluations ClaimEvaluation[]
+  accounts            Account[]
+  agents              Agent[]
+  agentEvalBatches    AgentEvalBatch[]
+  apiKeys             ApiKey[]
+  documents           Document[]
+  sessions            Session[]
+  cancelledJobs       Job[]
+  claimEvaluations    ClaimEvaluation[]
+  validationBaselines ValidationBaseline[]
 }
 
 model VerificationToken {
@@ -142,12 +143,15 @@ model EvaluationVersion {
   selfCritique      String?
   version           Int                 @default(1)
   isStale           Boolean             @default(false)
-  comments          EvaluationComment[]
-  agentVersion      AgentVersion        @relation(fields: [agentVersionId], references: [id])
-  documentVersion   DocumentVersion     @relation(fields: [documentVersionId], references: [id], onDelete: Cascade)
-  evaluation        Evaluation          @relation(fields: [evaluationId], references: [id], onDelete: Cascade)
-  job               Job?
-  metaEvaluations   MetaEvaluation[]
+  pipelineTelemetry Json?
+  comments                    EvaluationComment[]
+  agentVersion                AgentVersion                 @relation(fields: [agentVersionId], references: [id])
+  documentVersion             DocumentVersion              @relation(fields: [documentVersionId], references: [id], onDelete: Cascade)
+  evaluation                  Evaluation                   @relation(fields: [evaluationId], references: [id], onDelete: Cascade)
+  job                         Job?
+  metaEvaluations             MetaEvaluation[]
+  validationBaselineSnapshots ValidationBaselineSnapshot[]
+  validationRunSnapshots      ValidationRunSnapshot[]      @relation("ValidationRunNewEvaluation")
 
   @@unique([evaluationId, version])
   @@index([evaluationId])
@@ -194,11 +198,12 @@ model Agent {
   isDeprecated     Boolean          @default(false)
   isRecommended    Boolean          @default(false)
   isLlmCostTracked Boolean          @default(true)
-  ephemeralBatch   AgentEvalBatch?  @relation("EphemeralAgent", fields: [ephemeralBatchId], references: [id], onDelete: Cascade)
-  submittedBy      User             @relation(fields: [submittedById], references: [id], onDelete: Cascade)
-  evalBatches      AgentEvalBatch[]
-  versions         AgentVersion[]
-  evaluations      Evaluation[]
+  ephemeralBatch       AgentEvalBatch?       @relation("EphemeralAgent", fields: [ephemeralBatchId], references: [id], onDelete: Cascade)
+  submittedBy          User                  @relation(fields: [submittedById], references: [id], onDelete: Cascade)
+  evalBatches          AgentEvalBatch[]
+  versions             AgentVersion[]
+  evaluations          Evaluation[]
+  validationBaselines  ValidationBaseline[]
 }
 
 /// This model contains an expression index which requires additional setup for migrations. Visit https://pris.ly/d/expression-indexes for more info.
@@ -439,3 +444,99 @@ model MetaEvaluationDimension {
   @@index([name])
   @@index([score])
 }
+
+/// Validation baseline - a saved reference point for regression testing
+model ValidationBaseline {
+  id          String   @id @default(cuid())
+  name        String   // "Pre-refactor", "v2.0 release"
+  description String?
+  agentId     String
+  commitHash  String?  // git commit when baseline was created
+  createdAt   DateTime @default(now())
+  createdById String?
+
+  agent     Agent                        @relation(fields: [agentId], references: [id], onDelete: Cascade)
+  createdBy User?                        @relation(fields: [createdById], references: [id])
+  snapshots ValidationBaselineSnapshot[]
+  runs      ValidationRun[]
+
+  @@index([agentId])
+  @@index([createdAt])
+}
+
+/// Links a baseline to specific evaluation versions (one per document)
+model ValidationBaselineSnapshot {
+  id                  String @id @default(cuid())
+  baselineId          String
+  evaluationVersionId String
+
+  baseline          ValidationBaseline      @relation(fields: [baselineId], references: [id], onDelete: Cascade)
+  evaluationVersion EvaluationVersion       @relation(fields: [evaluationVersionId], references: [id], onDelete: Cascade)
+  runSnapshots      ValidationRunSnapshot[]
+
+  @@unique([baselineId, evaluationVersionId])
+  @@index([baselineId])
+  @@index([evaluationVersionId])
+}
+
+/// A validation run - comparing new pipeline output against a baseline
+model ValidationRun {
+  id          String    @id @default(cuid())
+  baselineId  String
+  name        String?   // Optional name for the run
+  commitHash  String?   // git commit when run was executed
+  profileId   String?   // Optional profile ID used for this run
+  status      String    @default("running") // "running" | "completed" | "failed"
+  summary     String?   // Quick summary: "4 unchanged, 2 changed"
+  createdAt   DateTime  @default(now())
+  completedAt DateTime?
+
+  baseline  ValidationBaseline      @relation(fields: [baselineId], references: [id], onDelete: Cascade)
+  snapshots ValidationRunSnapshot[]
+
+  @@index([baselineId])
+  @@index([createdAt])
+  @@index([status])
+}
+
+/// Per-document results from a validation run
+model ValidationRunSnapshot {
+  id                 String @id @default(cuid())
+  runId              String
+  baselineSnapshotId String // The baseline snapshot being compared against
+  newEvaluationId    String // The new evaluation version from this run
+
+  // Comparison results
+  status    String // "unchanged" | "changed"
+  keptCount Int    @default(0) // Comments that matched
+  newCount  Int    @default(0) // New comments not in baseline
+  lostCount Int    @default(0) // Baseline comments not in new
+
+  // Store detailed diff as JSON for viewing later
+  comparisonData Json? // { matchedComments, newComments, lostComments }
+
+  run              ValidationRun              @relation(fields: [runId], references: [id], onDelete: Cascade)
+  baselineSnapshot ValidationBaselineSnapshot @relation(fields: [baselineSnapshotId], references: [id], onDelete: Cascade)
+  newEvaluation    EvaluationVersion          @relation("ValidationRunNewEvaluation", fields: [newEvaluationId], references: [id], onDelete: Cascade)
+
+  @@unique([runId, baselineSnapshotId])
+  @@index([runId])
+  @@index([baselineSnapshotId])
+  @@index([status])
+}
+
+/// Configuration profile for the fallacy checker pipeline
+model FallacyCheckerProfile {
+  id          String   @id @default(cuid())
+  name        String
+  description String?
+  agentId     String   // Links to agent (e.g., "system-fallacy-check")
+  config      Json     // FallacyCheckerProfileConfig
+  isDefault   Boolean  @default(false)
+  createdAt   DateTime @default(now())
+  updatedAt   DateTime @updatedAt
+
+  @@unique([agentId, name])
+  @@index([agentId])
+  @@index([isDefault])
+}
diff --git a/internal-packages/db/src/repositories/MetaEvaluationRepository.ts b/internal-packages/db/src/repositories/MetaEvaluationRepository.ts
index d88c7c635..ced7999f2 100644
--- a/internal-packages/db/src/repositories/MetaEvaluationRepository.ts
+++ b/internal-packages/db/src/repositories/MetaEvaluationRepository.ts
@@ -357,8 +357,8 @@ export class MetaEvaluationRepository {
         documentId: s.documentId,
         agentNames,
         runCount: s.runs.length,
-        firstRunAt: s.runs[0]?.createdAt || s.createdAt,
-        lastRunAt: s.runs[s.runs.length - 1]?.createdAt || s.createdAt,
+        firstRunAt: s.runs[0]?.createdAt ?? s.createdAt,
+        lastRunAt: s.runs[s.runs.length - 1]?.createdAt ?? s.createdAt,
       };
     });
   }
@@ -392,6 +392,20 @@ export class MetaEvaluationRepository {
     });
   }
 
+  /**
+   * Delete a series and all its runs.
+   */
+  async deleteSeries(seriesId: string): Promise<void> {
+    // Delete runs first (foreign key constraint)
+    await this.prisma.seriesRun.deleteMany({
+      where: { seriesId },
+    });
+    // Delete the series
+    await this.prisma.series.delete({
+      where: { id: seriesId },
+    });
+  }
+
   /**
    * Get detailed info about a specific series, including all runs.
    */
@@ -437,9 +451,7 @@ export class MetaEvaluationRepository {
     }
 
     const docVersion = series.document.versions[0];
-    if (!docVersion) {
-      return null;
-    }
+    // Note: TypeScript believes docVersion is always defined due to Prisma types
 
     // Get scoring info for all evaluation versions in this series
     const evalVersionIds = series.runs
@@ -519,11 +531,23 @@ export class MetaEvaluationRepository {
 
   /**
    * Get recent documents (non-ephemeral).
+   * @param titleFilter - Optional case-insensitive title search filter
    */
-  async getRecentDocuments(): Promise<DocumentChoice[]> {
+  async getRecentDocuments(titleFilter?: string): Promise<DocumentChoice[]> {
     const documents = await this.prisma.document.findMany({
       where: {
         ephemeralBatchId: null,
+        // Filter by title in versions if filter provided
+        ...(titleFilter && {
+          versions: {
+            some: {
+              title: {
+                contains: titleFilter,
+                mode: "insensitive" as const,
+              },
+            },
+          },
+        }),
       },
       include: {
         versions: {
@@ -533,7 +557,7 @@ export class MetaEvaluationRepository {
         },
       },
       orderBy: { createdAt: "desc" },
-      take: 30,
+      take: 100,
     });
 
     return documents
@@ -687,6 +711,676 @@ export class MetaEvaluationRepository {
   async disconnect(): Promise<void> {
     await this.prisma.$disconnect();
   }
+
+  // ==========================================================================
+  // Validation Framework Methods
+  // ==========================================================================
+
+  /**
+   * Get documents suitable for validation testing.
+   * Returns documents that have been evaluated by the specified agent.
+   */
+  async getValidationCorpusDocuments(
+    agentId: string,
+    options: { limit?: number; minContentLength?: number; filter?: string } = {}
+  ): Promise<
+    Array<{
+      documentId: string;
+      title: string;
+      contentLength: number;
+      lastEvaluatedAt: Date | null;
+      evaluationCount: number;
+    }>
+  > {
+    const { limit = 50, minContentLength = 100, filter } = options;
+
+    // Get documents that have evaluations from this agent
+    const evaluations = await this.prisma.evaluation.findMany({
+      where: {
+        agentId,
+        ...(filter && {
+          document: {
+            versions: { some: { title: { contains: filter, mode: "insensitive" } } },
+          },
+        }),
+      },
+      include: {
+        document: {
+          include: {
+            versions: {
+              orderBy: { version: "desc" },
+              take: 1,
+              select: { title: true, content: true },
+            },
+          },
+        },
+        versions: {
+          orderBy: { createdAt: "desc" },
+          take: 1,
+          select: { createdAt: true },
+        },
+        _count: { select: { versions: true } },
+      },
+      take: limit,
+    });
+
+    return evaluations
+      .filter((e) => {
+        const content = e.document.versions[0]?.content;
+        return content && content.length >= minContentLength;
+      })
+      .map((e) => ({
+        documentId: e.documentId,
+        title: e.document.versions[0]?.title || "Unknown",
+        contentLength: e.document.versions[0]?.content.length || 0,
+        lastEvaluatedAt: e.versions[0]?.createdAt ?? null,
+        evaluationCount: e._count.versions,
+      }));
+  }
+
+  /**
+   * Get evaluation snapshots for a set of documents from a specific agent.
+   * Returns the most recent EvaluationVersion for each document.
+   */
+  async getEvaluationSnapshots(
+    documentIds: string[],
+    agentId: string
+  ): Promise<
+    Array<{
+      evaluationVersionId: string;
+      agentId: string;
+      agentName: string;
+      createdAt: Date;
+      documentId: string;
+      documentTitle: string;
+      grade: number | null;
+      pipelineTelemetry: unknown;
+      comments: Array<{
+        id: string;
+        quotedText: string;
+        header: string | null;
+        description: string;
+        importance: number | null;
+        startOffset: number;
+        endOffset: number;
+      }>;
+    }>
+  > {
+    // Get the most recent evaluation version for each document
+    const evaluations = await this.prisma.evaluation.findMany({
+      where: {
+        agentId,
+        documentId: { in: documentIds },
+      },
+      include: {
+        agent: {
+          include: {
+            versions: {
+              orderBy: { version: "desc" },
+              take: 1,
+              select: { name: true },
+            },
+          },
+        },
+        document: {
+          include: {
+            versions: {
+              orderBy: { version: "desc" },
+              take: 1,
+              select: { title: true },
+            },
+          },
+        },
+        versions: {
+          orderBy: { createdAt: "desc" },
+          take: 1,
+          include: {
+            comments: {
+              include: {
+                highlight: true,
+              },
+            },
+          },
+        },
+      },
+    });
+
+    return evaluations
+      .filter((e) => e.versions.length > 0)
+      .map((e) => {
+        const version = e.versions[0];
+        return {
+          evaluationVersionId: version.id,
+          agentId: e.agentId,
+          agentName: e.agent.versions[0]?.name || e.agentId,
+          createdAt: version.createdAt,
+          documentId: e.documentId,
+          documentTitle: e.document.versions[0]?.title || "Unknown",
+          grade: version.grade,
+          pipelineTelemetry: version.pipelineTelemetry,
+          comments: version.comments.map((c) => ({
+            id: c.id,
+            quotedText: c.highlight.quotedText,
+            header: c.header,
+            description: c.description,
+            importance: c.importance,
+            startOffset: c.highlight.startOffset,
+            endOffset: c.highlight.endOffset,
+          })),
+        };
+      });
+  }
+
+  /**
+   * Get a specific evaluation version by ID with full details for comparison.
+   */
+  async getEvaluationSnapshotById(evaluationVersionId: string): Promise<{
+    evaluationVersionId: string;
+    agentId: string;
+    agentName: string;
+    createdAt: Date;
+    documentId: string;
+    documentTitle: string;
+    grade: number | null;
+    pipelineTelemetry: unknown;
+    comments: Array<{
+      id: string;
+      quotedText: string;
+      header: string | null;
+      description: string;
+      importance: number | null;
+      startOffset: number;
+      endOffset: number;
+    }>;
+  } | null> {
+    const version = await this.prisma.evaluationVersion.findUnique({
+      where: { id: evaluationVersionId },
+      include: {
+        evaluation: {
+          include: {
+            agent: {
+              include: {
+                versions: {
+                  orderBy: { version: "desc" },
+                  take: 1,
+                  select: { name: true },
+                },
+              },
+            },
+            document: {
+              include: {
+                versions: {
+                  orderBy: { version: "desc" },
+                  take: 1,
+                  select: { title: true },
+                },
+              },
+            },
+          },
+        },
+        comments: {
+          include: {
+            highlight: true,
+          },
+        },
+      },
+    });
+
+    if (!version) return null;
+
+    return {
+      evaluationVersionId: version.id,
+      agentId: version.agentId,
+      agentName: version.evaluation.agent.versions[0]?.name || version.agentId,
+      createdAt: version.createdAt,
+      documentId: version.evaluation.documentId,
+      documentTitle: version.evaluation.document.versions[0]?.title || "Unknown",
+      grade: version.grade,
+      pipelineTelemetry: version.pipelineTelemetry,
+      comments: version.comments.map((c) => ({
+        id: c.id,
+        quotedText: c.highlight.quotedText,
+        header: c.header,
+        description: c.description,
+        importance: c.importance,
+        startOffset: c.highlight.startOffset,
+        endOffset: c.highlight.endOffset,
+      })),
+    };
+  }
+
+  // ==========================================================================
+  // Validation Baseline Methods
+  // ==========================================================================
+
+  /**
+   * Create a new validation baseline from existing evaluation versions.
+   */
+  async createValidationBaseline(input: {
+    name: string;
+    description?: string;
+    agentId: string;
+    evaluationVersionIds: string[];
+    commitHash?: string;
+    createdById?: string;
+  }): Promise<{ id: string; name: string; snapshotCount: number }> {
+    const baseline = await this.prisma.validationBaseline.create({
+      data: {
+        name: input.name,
+        description: input.description,
+        agentId: input.agentId,
+        commitHash: input.commitHash,
+        createdById: input.createdById,
+        snapshots: {
+          create: input.evaluationVersionIds.map((evId) => ({
+            evaluationVersionId: evId,
+          })),
+        },
+      },
+      include: {
+        _count: { select: { snapshots: true } },
+      },
+    });
+
+    return {
+      id: baseline.id,
+      name: baseline.name,
+      snapshotCount: baseline._count.snapshots,
+    };
+  }
+
+  /**
+   * Get all validation baselines for an agent.
+   */
+  async getValidationBaselines(agentId: string): Promise<
+    Array<{
+      id: string;
+      name: string;
+      description: string | null;
+      commitHash: string | null;
+      createdAt: Date;
+      snapshotCount: number;
+    }>
+  > {
+    const baselines = await this.prisma.validationBaseline.findMany({
+      where: { agentId },
+      include: {
+        _count: { select: { snapshots: true } },
+      },
+      orderBy: { createdAt: "desc" },
+    });
+
+    return baselines.map((b) => ({
+      id: b.id,
+      name: b.name,
+      description: b.description,
+      commitHash: b.commitHash,
+      createdAt: b.createdAt,
+      snapshotCount: b._count.snapshots,
+    }));
+  }
+
+  /**
+   * Get evaluation snapshots from a baseline.
+   */
+  async getBaselineSnapshots(baselineId: string): Promise<
+    Array<{
+      evaluationVersionId: string;
+      agentId: string;
+      agentName: string;
+      createdAt: Date;
+      documentId: string;
+      documentTitle: string;
+      grade: number | null;
+      pipelineTelemetry: unknown;
+      comments: Array<{
+        id: string;
+        quotedText: string;
+        header: string | null;
+        description: string;
+        importance: number | null;
+        startOffset: number;
+        endOffset: number;
+      }>;
+    }>
+  > {
+    const baseline = await this.prisma.validationBaseline.findUnique({
+      where: { id: baselineId },
+      include: {
+        snapshots: {
+          include: {
+            evaluationVersion: {
+              include: {
+                evaluation: {
+                  include: {
+                    agent: {
+                      include: {
+                        versions: {
+                          orderBy: { version: "desc" },
+                          take: 1,
+                          select: { name: true },
+                        },
+                      },
+                    },
+                    document: {
+                      include: {
+                        versions: {
+                          orderBy: { version: "desc" },
+                          take: 1,
+                          select: { title: true },
+                        },
+                      },
+                    },
+                  },
+                },
+                comments: {
+                  include: {
+                    highlight: true,
+                  },
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    if (!baseline) return [];
+
+    return baseline.snapshots.map((s) => {
+      const ev = s.evaluationVersion;
+      return {
+        evaluationVersionId: ev.id,
+        agentId: ev.agentId,
+        agentName: ev.evaluation.agent.versions[0]?.name || ev.agentId,
+        createdAt: ev.createdAt,
+        documentId: ev.evaluation.documentId,
+        documentTitle: ev.evaluation.document.versions[0]?.title || "Unknown",
+        grade: ev.grade,
+        pipelineTelemetry: ev.pipelineTelemetry,
+        comments: ev.comments.map((c) => ({
+          id: c.id,
+          quotedText: c.highlight.quotedText,
+          header: c.header,
+          description: c.description,
+          importance: c.importance,
+          startOffset: c.highlight.startOffset,
+          endOffset: c.highlight.endOffset,
+        })),
+      };
+    });
+  }
+
+  /**
+   * Delete a validation baseline.
+   */
+  async deleteValidationBaseline(baselineId: string): Promise<void> {
+    await this.prisma.validationBaseline.delete({
+      where: { id: baselineId },
+    });
+  }
+
+  /**
+   * Get document IDs from a baseline (for running new evaluations).
+   */
+  async getBaselineDocumentIds(baselineId: string): Promise<string[]> {
+    const baseline = await this.prisma.validationBaseline.findUnique({
+      where: { id: baselineId },
+      include: {
+        snapshots: {
+          include: {
+            evaluationVersion: {
+              include: {
+                evaluation: {
+                  select: { documentId: true },
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    if (!baseline) return [];
+
+    return [...new Set(baseline.snapshots.map((s) => s.evaluationVersion.evaluation.documentId))];
+  }
+
+  // ==========================================================================
+  // Validation Run Methods
+  // ==========================================================================
+
+  /**
+   * Create a new validation run.
+   */
+  async createValidationRun(input: {
+    baselineId: string;
+    name?: string;
+    commitHash?: string;
+    profileId?: string;
+  }): Promise<{ id: string; baselineId: string; status: string; profileId?: string }> {
+    const run = await this.prisma.validationRun.create({
+      data: {
+        baselineId: input.baselineId,
+        name: input.name,
+        commitHash: input.commitHash,
+        profileId: input.profileId,
+        status: "running",
+      },
+    });
+
+    return {
+      id: run.id,
+      baselineId: run.baselineId,
+      status: run.status,
+      profileId: run.profileId ?? undefined,
+    };
+  }
+
+  /**
+   * Update validation run status and summary.
+   */
+  async updateValidationRunStatus(
+    runId: string,
+    status: "running" | "completed" | "failed",
+    summary?: string
+  ): Promise<void> {
+    await this.prisma.validationRun.update({
+      where: { id: runId },
+      data: {
+        status,
+        summary,
+        completedAt: status !== "running" ? new Date() : undefined,
+      },
+    });
+  }
+
+  /**
+   * Add a per-document result to a validation run.
+   */
+  async addValidationRunSnapshot(input: {
+    runId: string;
+    baselineSnapshotId: string;
+    newEvaluationId: string;
+    status: "unchanged" | "changed";
+    keptCount: number;
+    newCount: number;
+    lostCount: number;
+    comparisonData?: unknown;
+  }): Promise<{ id: string }> {
+    const snapshot = await this.prisma.validationRunSnapshot.create({
+      data: {
+        runId: input.runId,
+        baselineSnapshotId: input.baselineSnapshotId,
+        newEvaluationId: input.newEvaluationId,
+        status: input.status,
+        keptCount: input.keptCount,
+        newCount: input.newCount,
+        lostCount: input.lostCount,
+        comparisonData: input.comparisonData as object | undefined,
+      },
+    });
+
+    return { id: snapshot.id };
+  }
+
+  /**
+   * Get all validation runs for a baseline.
+   */
+  async getValidationRuns(baselineId: string): Promise<
+    Array<{
+      id: string;
+      name: string | null;
+      commitHash: string | null;
+      status: string;
+      summary: string | null;
+      createdAt: Date;
+      completedAt: Date | null;
+      snapshotCount: number;
+      unchangedCount: number;
+      changedCount: number;
+    }>
+  > {
+    const runs = await this.prisma.validationRun.findMany({
+      where: { baselineId },
+      include: {
+        snapshots: {
+          select: { status: true },
+        },
+      },
+      orderBy: { createdAt: "desc" },
+    });
+
+    return runs.map((r) => ({
+      id: r.id,
+      name: r.name,
+      commitHash: r.commitHash,
+      status: r.status,
+      summary: r.summary,
+      createdAt: r.createdAt,
+      completedAt: r.completedAt,
+      snapshotCount: r.snapshots.length,
+      unchangedCount: r.snapshots.filter((s) => s.status === "unchanged").length,
+      changedCount: r.snapshots.filter((s) => s.status === "changed").length,
+    }));
+  }
+
+  /**
+   * Get full details of a validation run including all snapshot comparisons.
+   */
+  async getValidationRunDetail(runId: string): Promise<{
+    id: string;
+    name: string | null;
+    commitHash: string | null;
+    status: string;
+    summary: string | null;
+    createdAt: Date;
+    completedAt: Date | null;
+    baseline: { id: string; name: string };
+    snapshots: Array<{
+      id: string;
+      status: string;
+      keptCount: number;
+      newCount: number;
+      lostCount: number;
+      documentId: string;
+      documentTitle: string;
+      comparisonData: unknown;
+    }>;
+  } | null> {
+    const run = await this.prisma.validationRun.findUnique({
+      where: { id: runId },
+      include: {
+        baseline: {
+          select: { id: true, name: true },
+        },
+        snapshots: {
+          include: {
+            baselineSnapshot: {
+              include: {
+                evaluationVersion: {
+                  include: {
+                    evaluation: {
+                      include: {
+                        document: {
+                          include: {
+                            versions: {
+                              orderBy: { version: "desc" },
+                              take: 1,
+                              select: { title: true },
+                            },
+                          },
+                        },
+                      },
+                    },
+                  },
+                },
+              },
+            },
+          },
+        },
+      },
+    });
+
+    if (!run) return null;
+
+    return {
+      id: run.id,
+      name: run.name,
+      commitHash: run.commitHash,
+      status: run.status,
+      summary: run.summary,
+      createdAt: run.createdAt,
+      completedAt: run.completedAt,
+      baseline: run.baseline,
+      snapshots: run.snapshots.map((s) => ({
+        id: s.id,
+        status: s.status,
+        keptCount: s.keptCount,
+        newCount: s.newCount,
+        lostCount: s.lostCount,
+        documentId: s.baselineSnapshot.evaluationVersion.evaluation.documentId,
+        documentTitle:
+          s.baselineSnapshot.evaluationVersion.evaluation.document.versions[0]?.title || "Unknown",
+        comparisonData: s.comparisonData,
+      })),
+    };
+  }
+
+  /**
+   * Delete a validation run.
+   */
+  async deleteValidationRun(runId: string): Promise<void> {
+    await this.prisma.validationRun.delete({
+      where: { id: runId },
+    });
+  }
+
+  /**
+   * Get baseline snapshot ID by baseline and document.
+   * Used when saving run results to link to the correct baseline snapshot.
+   */
+  async getBaselineSnapshotByDocument(
+    baselineId: string,
+    documentId: string
+  ): Promise<{ id: string; evaluationVersionId: string } | null> {
+    const snapshot = await this.prisma.validationBaselineSnapshot.findFirst({
+      where: {
+        baselineId,
+        evaluationVersion: {
+          evaluation: {
+            documentId,
+          },
+        },
+      },
+      select: {
+        id: true,
+        evaluationVersionId: true,
+      },
+    });
+
+    return snapshot;
+  }
 }
 
 // Default instance for convenience
diff --git a/internal-packages/jobs/.eslintrc.json b/internal-packages/jobs/.eslintrc.json
new file mode 100644
index 000000000..9f0f0ff9f
--- /dev/null
+++ b/internal-packages/jobs/.eslintrc.json
@@ -0,0 +1,6 @@
+{
+  "extends": ["../../apps/web/config/eslint/.eslintrc.json"],
+  "parserOptions": {
+    "project": "./tsconfig.lint.json"
+  }
+}
diff --git a/internal-packages/jobs/src/cli/process-pgboss-worker.ts b/internal-packages/jobs/src/cli/process-pgboss-worker.ts
index 4c90be921..d61ad9450 100644
--- a/internal-packages/jobs/src/cli/process-pgboss-worker.ts
+++ b/internal-packages/jobs/src/cli/process-pgboss-worker.ts
@@ -28,10 +28,18 @@ import { logger } from '../utils/logger';
 import { DOCUMENT_EVALUATION_JOB, type DocumentEvaluationJobData } from '../types/jobTypes';
 import type { JobWithMetadata } from 'pg-boss';
 import { isRetryableError } from '../errors/retryableErrors';
-import { getAgentTimeout, formatTimeout } from '../config/agentTimeouts';
+import { getAgentTimeout } from '../config/agentTimeouts';
 import { updateJobCostsFromHelicone } from '../scheduled-tasks/helicone-poller';
 import { JobReconciliationService } from '../scheduled-tasks/job-reconciliation';
 
+interface JobWithAgentVersions {
+  evaluation: {
+    agent: {
+      versions: Array<{ extendedCapabilityId?: string | null }>;
+    };
+  };
+}
+
 // Schedule constants
 const HELICONE_POLLER_SCHEDULE = '*/5 * * * *'; // Every 5 minutes
 const JOB_RECONCILIATION_SCHEDULE = '*/10 * * * *'; // Every 10 minutes
@@ -160,16 +168,16 @@ class PgBossWorker {
     return `${workerPrefix}[Job ${jobId}] ${message}`;
   }
 
-  private getJobTimeout(job: any): number {
+  private getJobTimeout(job: JobWithAgentVersions): number {
     const agentVersion = job.evaluation.agent.versions[0];
-    return getAgentTimeout(agentVersion?.extendedCapabilityId ?? undefined);
+    return getAgentTimeout(agentVersion.extendedCapabilityId ?? undefined);
   }
 
   private async processJob(pgBossJob: JobWithMetadata<DocumentEvaluationJobData>) {
-    const { jobId } = pgBossJob.data;
+    const { jobId, profileId } = pgBossJob.data;
     const { retryCount, retryLimit } = pgBossJob;
 
-    logger.info(this.formatLog(jobId, `Processing (attempt ${retryCount + 1}/${retryLimit + 1})`));
+    logger.info(this.formatLog(jobId, `Processing (attempt ${retryCount + 1}/${retryLimit + 1})${profileId ? ` with profile ${profileId}` : ''}`));
 
     try {
       const job = await this.jobRepository.findByIdWithRelations(jobId);
@@ -186,7 +194,7 @@ class PgBossWorker {
         { jobId, timeoutMs },
         async () => {
           await this.jobService.markAsRunning(jobId, retryCount + 1);
-          return this.jobOrchestrator.processJob(job);
+          return this.jobOrchestrator.processJob(job, { profileId: profileId || undefined });
         }
       );
 
@@ -238,8 +246,8 @@ class PgBossWorker {
 // Bootstrap
 const worker = new PgBossWorker();
 
-process.on('SIGTERM', () => worker.shutdown());
-process.on('SIGINT', () => worker.shutdown());
+process.on('SIGTERM', () => void worker.shutdown());
+process.on('SIGINT', () => void worker.shutdown());
 
 worker.start().catch(async (error) => {
   logger.error('🔥 Fatal error:', error);
diff --git a/internal-packages/jobs/src/core/JobOrchestrator.ts b/internal-packages/jobs/src/core/JobOrchestrator.ts
index e26c86305..a1dce09e9 100644
--- a/internal-packages/jobs/src/core/JobOrchestrator.ts
+++ b/internal-packages/jobs/src/core/JobOrchestrator.ts
@@ -14,12 +14,18 @@ import {
   PluginType,
   HeliconeSessionManager,
   setGlobalSessionManager,
+  Comment,
 } from '@roast/ai';
-import { analyzeDocument, getWorkerId } from '@roast/ai/server';
+import { analyzeDocument, getWorkerId, DocumentAnalysisResult } from '@roast/ai/server';
 import { JobService } from './JobService';
 
+export interface JobProcessingOptions {
+  /** Profile ID for plugin configuration (e.g., FallacyCheckPlugin) */
+  profileId?: string;
+}
+
 export interface JobOrchestratorInterface {
-  processJob(job: JobWithRelations): Promise<JobProcessingResult>;
+  processJob(job: JobWithRelations, options?: JobProcessingOptions): Promise<JobProcessingResult>;
 }
 
 export class JobOrchestrator implements JobOrchestratorInterface {
@@ -38,8 +44,8 @@ export class JobOrchestrator implements JobOrchestratorInterface {
   /**
    * Process a complete job from start to finish
    */
-  async processJob(job: JobWithRelations): Promise<JobProcessingResult> {
-    this.logger.info(this.formatLog(job.id, 'Starting processing...'));
+  async processJob(job: JobWithRelations, options?: JobProcessingOptions): Promise<JobProcessingResult> {
+    this.logger.info(this.formatLog(job.id, `Starting processing...${options?.profileId ? ` (profile: ${options.profileId})` : ''}`));
     const startTime = Date.now();
     let sessionManager: HeliconeSessionManager | undefined;
 
@@ -57,7 +63,7 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       }
 
       // Setup Helicone session tracking
-      sessionManager = await this.setupSessionTracking(job);
+      sessionManager = this.setupSessionTracking(job);
 
       this.logger.info(this.formatLog(job.id, 'Preparing job data...'));
       // Extract and validate job data
@@ -66,10 +72,11 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       this.logger.info(this.formatLog(job.id, 'Executing analysis...'));
       // Execute document analysis using @roast/ai workflows
       const analysisResult = await this.executeAnalysis(
-        documentForAnalysis, 
-        agent, 
-        job.id, 
-        sessionManager
+        documentForAnalysis,
+        agent,
+        job.id,
+        sessionManager,
+        options?.profileId
       );
 
       this.logger.info(this.formatLog(job.id, 'Saving analysis results...'));
@@ -126,41 +133,40 @@ export class JobOrchestrator implements JobOrchestratorInterface {
   /**
    * Setup Helicone session tracking for the job
    */
-  private async setupSessionTracking(job: JobWithRelations): Promise<HeliconeSessionManager | undefined> {
+  private setupSessionTracking(job: JobWithRelations): HeliconeSessionManager | undefined {
     try {
+      // TypeScript types guarantee these are defined (Prisma include with take: 1)
       const documentVersion = job.evaluation.document.versions[0];
       const agentVersion = job.evaluation.agent.versions[0];
-      
-      if (documentVersion && agentVersion) {
-        // Use originalJobId for retries to group them under the same session
-        const sessionId = job.originalJobId || job.id;
-        const truncatedTitle = documentVersion.title.length > 50 
-          ? documentVersion.title.slice(0, 50) + '...' 
-          : documentVersion.title;
-        
-        const sessionManager = HeliconeSessionManager.forJob(
-          sessionId,
-          `${agentVersion.name} evaluating ${truncatedTitle}`,
-          {
-            JobId: job.id,
-            JobAttempt: job.originalJobId ? 'retry' : 'initial',
-            DocumentId: job.evaluation.document.id,
-            AgentId: job.evaluation.agent.id,
-            AgentVersion: agentVersion.version.toString(),
-            EvaluationId: job.evaluation.id,
-            UserId: job.evaluation.agent.submittedBy?.id || 'anonymous',
-          }
-        );
-        
-        // Set as global for automatic header propagation
-        setGlobalSessionManager(sessionManager);
-        return sessionManager;
-      }
+
+      // Use originalJobId for retries to group them under the same session
+      const sessionId = job.originalJobId || job.id;
+      const truncatedTitle = documentVersion.title.length > 50
+        ? documentVersion.title.slice(0, 50) + '...'
+        : documentVersion.title;
+
+      const sessionManager = HeliconeSessionManager.forJob(
+        sessionId,
+        `${agentVersion.name} evaluating ${truncatedTitle}`,
+        {
+          JobId: job.id,
+          JobAttempt: job.originalJobId ? 'retry' : 'initial',
+          DocumentId: job.evaluation.document.id,
+          AgentId: job.evaluation.agent.id,
+          AgentVersion: agentVersion.version.toString(),
+          EvaluationId: job.evaluation.id,
+          UserId: job.evaluation.agent.submittedBy?.id ?? 'anonymous',
+        }
+      );
+
+      // Set as global for automatic header propagation
+      setGlobalSessionManager(sessionManager);
+      return sessionManager;
     } catch (error) {
       this.logger.warn(this.formatLog(job.id, '⚠️ Failed to create Helicone session manager:'), error);
       // Continue without session tracking rather than failing the job
     }
-    
+
     return undefined;
   }
 
@@ -168,17 +174,10 @@ export class JobOrchestrator implements JobOrchestratorInterface {
    * Prepare document and agent data for analysis
    */
   private prepareJobData(job: JobWithRelations) {
+    // TypeScript types guarantee these are defined (Prisma include with take: 1)
     const documentVersion = job.evaluation.document.versions[0];
     const agentVersion = job.evaluation.agent.versions[0];
 
-    if (!documentVersion) {
-      throw new Error('Document version not found');
-    }
-
-    if (!agentVersion) {
-      throw new Error('Agent version not found');
-    }
-
     // Prepare document for analysis using Prisma's computed fullContent field
     const documentForAnalysis: Document = {
       id: job.evaluation.document.id,
@@ -203,7 +202,7 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       selfCritiqueInstructions: agentVersion.selfCritiqueInstructions || undefined,
       providesGrades: agentVersion.providesGrades || false,
       extendedCapabilityId: agentVersion.extendedCapabilityId || undefined,
-      pluginIds: (agentVersion.pluginIds || []) as PluginType[], // Cast to PluginType[] since DB stores as strings
+      pluginIds: agentVersion.pluginIds as PluginType[], // Cast to PluginType[] since DB stores as strings
     };
 
     return { documentForAnalysis, agent, documentVersion, agentVersion };
@@ -213,23 +212,32 @@ export class JobOrchestrator implements JobOrchestratorInterface {
    * Execute the document analysis workflow
    */
   private async executeAnalysis(
-    documentForAnalysis: Document, 
-    agent: Agent, 
-    jobId: string, 
-    sessionManager?: HeliconeSessionManager
+    documentForAnalysis: Document,
+    agent: Agent,
+    jobId: string,
+    sessionManager?: HeliconeSessionManager,
+    profileId?: string
   ) {
+    // Use options-based signature to pass profileId
+    const analysisOptions = {
+      targetWordCount: 500,
+      targetHighlights: 5,
+      jobId,
+      fallacyCheckProfileId: profileId,
+    };
+
     // Track the analysis phase with session manager
-    return await (sessionManager 
+    return await (sessionManager
       ? sessionManager.trackAnalysis('document', async () => {
-          return analyzeDocument(documentForAnalysis, agent, 500, 5, jobId);
+          return analyzeDocument(documentForAnalysis, agent, analysisOptions);
         })
-      : analyzeDocument(documentForAnalysis, agent, 500, 5, jobId));
+      : analyzeDocument(documentForAnalysis, agent, analysisOptions));
   }
 
   /**
    * Save analysis results to database
    */
-  private async saveAnalysisResults(job: JobWithRelations, analysisResult: any, agent: Agent) {
+  private async saveAnalysisResults(job: JobWithRelations, analysisResult: DocumentAnalysisResult, agent: Agent) {
     const { tasks, ...evaluationOutputs } = analysisResult;
 
     // Get the latest version number for this evaluation
@@ -258,6 +266,7 @@ export class JobOrchestrator implements JobOrchestratorInterface {
         agentVersionId: agentVersion.id,
         evaluationId: job.evaluation.id,
         documentVersionId: documentVersion.id,
+        pipelineTelemetry: evaluationOutputs.pipelineTelemetry ?? undefined,
         job: {
           connect: {
             id: job.id,
@@ -281,7 +290,7 @@ export class JobOrchestrator implements JobOrchestratorInterface {
     }
 
     // Save highlights to database
-    const highlights = evaluationOutputs.highlights || [];
+    const highlights = evaluationOutputs.highlights;
     if (highlights.length > 0) {
       // Use fullContent (which includes markdownPrepend) for validation
       // since highlights were generated based on the full content
@@ -292,12 +301,12 @@ export class JobOrchestrator implements JobOrchestratorInterface {
 
   /**
    * Save highlights with validation
-   * 
+   *
    * Note: Highlights are linked to evaluations through comments (not directly).
    * This ensures every highlight has an associated comment for context.
    */
-  private async saveHighlights(highlights: any[], evaluationVersionId: string, fullContent: string, jobId: string) {
-    if (!highlights || highlights.length === 0) {
+  private async saveHighlights(highlights: Comment[], evaluationVersionId: string, fullContent: string, jobId: string) {
+    if (highlights.length === 0) {
       return;
     }
 
@@ -305,34 +314,22 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       // Validate highlight by checking if quotedText matches document at specified offsets
       let isValid = true;
       let error: string | null = null;
-      
-      if (!comment.highlight) {
-        isValid = false;
-        error = 'Highlight is missing';
-      } else {
-        try {
-          const actualText = fullContent.slice(
-            comment.highlight.startOffset, 
-            comment.highlight.endOffset
-          );
-          
-          if (actualText !== comment.highlight.quotedText) {
-            isValid = false;
-            error = `Text mismatch: expected "${comment.highlight.quotedText}" but found "${actualText}" at offsets ${comment.highlight.startOffset}-${comment.highlight.endOffset}`;
-            this.logger.warn(this.formatLog(jobId, `Invalid highlight detected: ${error}`));
-          }
-        } catch (highlightError) {
+
+      try {
+        const actualText = fullContent.slice(
+          comment.highlight.startOffset,
+          comment.highlight.endOffset
+        );
+
+        if (actualText !== comment.highlight.quotedText) {
           isValid = false;
-          error = `Validation error: ${highlightError instanceof Error ? highlightError.message : String(highlightError)}`;
-          this.logger.warn(this.formatLog(jobId, `Highlight validation failed: ${error}`));
+          error = `Text mismatch: expected "${comment.highlight.quotedText}" but found "${actualText}" at offsets ${comment.highlight.startOffset}-${comment.highlight.endOffset}`;
+          this.logger.warn(this.formatLog(jobId, `Invalid highlight detected: ${error}`));
         }
-      }
-
-      // Only create highlight if we have highlight data
-      if (!comment.highlight) {
-        // Skip this comment if no highlight data
-        this.logger.warn(this.formatLog(jobId, `Skipping comment without highlight data: ${comment.description}`));
-        continue;
+      } catch (highlightError) {
+        isValid = false;
+        error = `Validation error: ${highlightError instanceof Error ? highlightError.message : String(highlightError)}`;
+        this.logger.warn(this.formatLog(jobId, `Highlight validation failed: ${error}`));
       }
 
       // Create highlight with validation status
@@ -351,12 +348,12 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       await prisma.evaluationComment.create({
         data: {
           description: comment.description || 'No description',
-          importance: comment.importance || null,
-          grade: comment.grade || null,
-          header: comment.header || null,
-          level: comment.level || null,
-          source: comment.source || null,
-          metadata: comment.metadata || null,
+          importance: comment.importance ?? null,
+          grade: comment.grade ?? null,
+          header: comment.header ?? null,
+          level: comment.level ?? null,
+          source: comment.source ?? null,
+          metadata: comment.metadata ?? undefined,
           evaluationVersionId,
           highlightId: createdHighlight.id,
         },
@@ -370,7 +367,7 @@ export class JobOrchestrator implements JobOrchestratorInterface {
    */
   private createExecutionLog(
     job: JobWithRelations,
-    analysisResult: any,
+    analysisResult: DocumentAnalysisResult,
     durationInSeconds: number,
     startTime: number
   ): string {
@@ -390,14 +387,14 @@ export class JobOrchestrator implements JobOrchestratorInterface {
       `- Status: SUCCESS`,
       ``,
       `## Analysis Summary`,
-      `- Highlights generated: ${analysisResult.highlights?.length || 0}`,
+      `- Highlights generated: ${analysisResult.highlights.length}`,
       `- Grade: ${analysisResult.grade || 'N/A'}`,
       `- Self-critique: ${analysisResult.selfCritique ? 'Yes' : 'No'}`,
       ``,
       `## Task Breakdown`,
     ];
 
-    if (analysisResult.tasks && analysisResult.tasks.length > 0) {
+    if (analysisResult.tasks.length > 0) {
       for (const task of analysisResult.tasks) {
         log.push(`### ${task.name}`);
         log.push(`- Model: ${task.modelName}`);
diff --git a/internal-packages/jobs/src/core/JobService.ts b/internal-packages/jobs/src/core/JobService.ts
index 9a68bc25a..e4821f185 100644
--- a/internal-packages/jobs/src/core/JobService.ts
+++ b/internal-packages/jobs/src/core/JobService.ts
@@ -27,8 +27,12 @@ export class JobService {
   /**
    * Create a job for processing
    * Creates both Job table record and pg-boss queue entry
+   *
+   * @param evaluationId - The evaluation to process
+   * @param agentEvalBatchId - Optional batch ID for grouping jobs
+   * @param profileId - Optional profile ID for plugin configuration (e.g., FallacyCheckPlugin)
    */
-  async createJob(evaluationId: string, agentEvalBatchId?: string): Promise<JobEntity> {
+  async createJob(evaluationId: string, agentEvalBatchId?: string, profileId?: string): Promise<JobEntity> {
     // Lazy-init prevents race conditions by ensuring the queue is connected
     // before we try to use it. Safe to call repeatedly due to promise locking.
     //
@@ -47,6 +51,7 @@ export class JobService {
       jobId: job.id,
       evaluationId,
       agentEvalBatchId: agentEvalBatchId || null,
+      profileId: profileId || null,
     };
 
     try {
diff --git a/internal-packages/jobs/src/types/jobTypes.ts b/internal-packages/jobs/src/types/jobTypes.ts
index 4d4355c57..4510057d5 100644
--- a/internal-packages/jobs/src/types/jobTypes.ts
+++ b/internal-packages/jobs/src/types/jobTypes.ts
@@ -24,6 +24,8 @@ export interface DocumentEvaluationJobData {
   jobId: string; // Reference to Job table record
   evaluationId: string;
   agentEvalBatchId?: string | null;
+  /** Profile ID for plugin configuration (e.g., FallacyCheckPlugin) */
+  profileId?: string | null;
 }
 
 /**
diff --git a/internal-packages/jobs/tsconfig.lint.json b/internal-packages/jobs/tsconfig.lint.json
new file mode 100644
index 000000000..631f88a47
--- /dev/null
+++ b/internal-packages/jobs/tsconfig.lint.json
@@ -0,0 +1,5 @@
+{
+  "extends": "./tsconfig.json",
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
diff --git a/meta-evals/package.json b/meta-evals/package.json
index 3aee47594..937fe2bbc 100644
--- a/meta-evals/package.json
+++ b/meta-evals/package.json
@@ -8,12 +8,15 @@
     "start": "node --import tsx/esm src/index.tsx"
   },
   "dependencies": {
+    "@leeoniya/ufuzzy": "^1.0.18",
     "@roast/ai": "workspace:*",
     "@roast/db": "workspace:*",
     "dotenv": "^16.4.5",
+    "fuse.js": "^7.1.0",
     "ink": "^6.5.1",
     "ink-select-input": "^6.2.0",
     "ink-spinner": "^5.0.0",
+    "ink-text-input": "^6.0.0",
     "react": "^19.2.1"
   },
   "devDependencies": {
diff --git a/meta-evals/src/app.tsx b/meta-evals/src/app.tsx
index 4df1de038..f6a2d683b 100644
--- a/meta-evals/src/app.tsx
+++ b/meta-evals/src/app.tsx
@@ -12,7 +12,7 @@ import {
   type AgentChoice,
 } from "@roast/db";
 import { apiClient } from "./utils/apiClient";
-import { MainMenu, CreateBaseline, SeriesDetail, RankRuns, ScoreRun, type Screen } from "./components";
+import { MainMenu, ScoreRankMenu, CreateBaseline, SeriesDetail, RankRuns, ScoreRun, Validation, ExtractorLab, type Screen } from "./components";
 import { getAvailableModels, getRecommendedJudgeModels, DEFAULT_JUDGE_MODEL, type ModelInfo } from "./utils/models";
 
 // ============================================================================
@@ -128,8 +128,8 @@ export function App() {
 
   // Load initial data
   useEffect(() => {
-    loadMainMenu();
     loadModels();
+    setScreen({ type: "main-menu" });
   }, []);
 
   async function loadModels() {
@@ -144,10 +144,14 @@ export function App() {
   }
 
   async function loadMainMenu() {
+    setScreen({ type: "main-menu" });
+  }
+
+  async function loadScoreRankMenu() {
     setScreen({ type: "loading" });
     try {
       const series = await metaEvaluationRepository.getSeries();
-      setScreen({ type: "main-menu", series });
+      setScreen({ type: "score-rank-menu", series });
     } catch (e) {
       setError(String(e));
     }
@@ -171,13 +175,70 @@ export function App() {
     }
   }
 
+  async function searchDocuments(filter: string) {
+    try {
+      const docs = await metaEvaluationRepository.getRecentDocuments(filter || undefined);
+      setDocuments(docs);
+    } catch (e) {
+      // Silently fail - keep existing documents
+    }
+  }
+
+  async function startExtractorLab() {
+    setScreen({ type: "loading" });
+    try {
+      // Get agents and use first one (usually Fallacy Check)
+      const userId = await apiClient.getUserId();
+      const agentChoices = await metaEvaluationRepository.getAvailableAgents(userId);
+      if (agentChoices.length === 0) {
+        setError("No agents available");
+        return;
+      }
+      const agentId = agentChoices[0].id;
+
+      // Get validation corpus for this agent (same as Validation screen)
+      const corpusDocs = await metaEvaluationRepository.getValidationCorpusDocuments(
+        agentId,
+        { limit: 50, minContentLength: 200 }
+      );
+
+      // Map to DocumentChoice format
+      const docs = corpusDocs.map((d) => ({
+        id: d.documentId,
+        title: d.title,
+        createdAt: d.lastEvaluatedAt || new Date(),
+      }));
+
+      setDocuments(docs);
+      setScreen({ type: "extractor-lab" });
+    } catch (e) {
+      setError(String(e));
+    }
+  }
+
   // Handle keyboard shortcuts
+  // Disable "q" quit when on document step (text input is active)
+  const isTextInputActive = screen.type === "create-baseline" && screen.step === "document";
   useInput((input, key) => {
-    if (input === "q" || (key.ctrl && input === "c")) {
+    if (key.ctrl && input === "c") {
+      exit();
+    }
+    if (input === "q" && !isTextInputActive) {
       exit();
     }
     if (key.escape) {
-      if (screen.type !== "main-menu") {
+      // Screens with internal escape navigation handle it themselves
+      const screensWithInternalEscape = [
+        "main-menu",
+        "extractor-lab",
+        "validation",
+        "score-rank-menu",
+        "series-detail",
+        "rank-runs",
+        "score-run",
+        "create-baseline",
+      ];
+      if (!screensWithInternalEscape.includes(screen.type)) {
         loadMainMenu();
       }
     }
@@ -205,11 +266,10 @@ export function App() {
   if (screen.type === "main-menu") {
     return (
       <MainMenu
-        series={screen.series}
-        maxItems={maxListItems}
         height={termHeight}
-        onCreateBaseline={startCreateBaseline}
-        onSelectSeries={(id) => setScreen({ type: "series-detail", seriesId: id })}
+        onScoreRank={loadScoreRankMenu}
+        onValidation={() => setScreen({ type: "validation" })}
+        onExtractorLab={startExtractorLab}
         onExit={exit}
         judgeModel={judgeModel}
         availableModels={availableModels}
@@ -222,6 +282,25 @@ export function App() {
     );
   }
 
+  if (screen.type === "score-rank-menu") {
+    return (
+      <ScoreRankMenu
+        series={screen.series}
+        maxItems={maxListItems}
+        height={termHeight}
+        judgeModel={judgeModel}
+        onCreateSeries={startCreateBaseline}
+        onSelectSeries={(id) => setScreen({ type: "series-detail", seriesId: id })}
+        onDeleteSeries={async (id) => {
+          await metaEvaluationRepository.deleteSeries(id);
+          // Reload the menu
+          loadScoreRankMenu();
+        }}
+        onBack={loadMainMenu}
+      />
+    );
+  }
+
   if (screen.type === "create-baseline") {
     return (
       <CreateBaseline
@@ -240,6 +319,7 @@ export function App() {
           setSelectedAgents(ags);
           setScreen({ type: "create-baseline", step: "confirm" });
         }}
+        onSearchDocuments={searchDocuments}
         onConfirm={async () => {
           setScreen({ type: "create-baseline", step: "creating" });
           try {
@@ -252,10 +332,10 @@ export function App() {
             setScreen({ type: "series-detail", seriesId });
           } catch (e) {
             setError(String(e));
-            loadMainMenu();
+            loadScoreRankMenu();
           }
         }}
-        onBack={loadMainMenu}
+        onBack={loadScoreRankMenu}
       />
     );
   }
@@ -266,7 +346,7 @@ export function App() {
         seriesId={screen.seriesId}
         maxItems={maxListItems}
         height={termHeight}
-        onBack={loadMainMenu}
+        onBack={loadScoreRankMenu}
         onRunAgain={async (seriesId, documentId) => {
           try {
             await runAgain(seriesId, documentId);
@@ -318,5 +398,38 @@ export function App() {
     );
   }
 
+  if (screen.type === "validation") {
+    return (
+      <Validation
+        height={termHeight}
+        maxItems={maxListItems}
+        onBack={loadMainMenu}
+        onCreateBatch={async (agentId, documentIds) => {
+          // Create batch jobs for the agent on selected documents
+          const response = await apiClient.post<BatchCreateResponse>("/api/batches", {
+            agentId,
+            documentIds,
+            name: `Validation run`,
+          });
+
+          // Get job IDs from the batch
+          return await getJobsForBatch(response.batch.id);
+        }}
+      />
+    );
+  }
+
+  if (screen.type === "extractor-lab") {
+    return (
+      <ExtractorLab
+        height={termHeight}
+        maxItems={maxListItems}
+        documents={documents}
+        onSearchDocuments={searchDocuments}
+        onBack={loadMainMenu}
+      />
+    );
+  }
+
   return null;
 }
diff --git a/meta-evals/src/components/CreateBaseline.tsx b/meta-evals/src/components/CreateBaseline.tsx
index 8d3f789a0..73f6a3976 100644
--- a/meta-evals/src/components/CreateBaseline.tsx
+++ b/meta-evals/src/components/CreateBaseline.tsx
@@ -3,11 +3,12 @@
  */
 
 import React, { useState } from "react";
-import { Box, Text } from "ink";
+import { Box, Text, useInput } from "ink";
 import SelectInput from "ink-select-input";
 import Spinner from "ink-spinner";
 import type { DocumentChoice, AgentChoice } from "./types";
-import { truncate, formatDate } from "./helpers";
+import { truncate } from "./helpers";
+import { DocumentSelector } from "./DocumentSelector";
 
 interface CreateBaselineProps {
   step: "document" | "agents" | "confirm" | "creating";
@@ -19,6 +20,7 @@ interface CreateBaselineProps {
   height: number;
   onSelectDocument: (doc: DocumentChoice) => void;
   onSelectAgents: (agents: AgentChoice[]) => void;
+  onSearchDocuments: (filter: string) => void;
   onConfirm: () => void;
   onBack: () => void;
 }
@@ -33,11 +35,37 @@ export function CreateBaseline({
   height,
   onSelectDocument,
   onSelectAgents,
+  onSearchDocuments,
   onConfirm,
   onBack,
 }: CreateBaselineProps) {
   const [agentSelection, setAgentSelection] = useState<Set<string>>(new Set());
 
+  // Handle escape to go back (document step handles its own escape via DocumentSelector)
+  useInput((input, key) => {
+    if (key.escape && step !== "document") {
+      onBack();
+    }
+  });
+
+  // Document selection using reusable DocumentSelector
+  if (step === "document") {
+    return (
+      <DocumentSelector
+        title="Create New Baseline - Select Document"
+        subtitle="Step 1/2: Select a document"
+        borderColor="yellow"
+        height={height}
+        maxItems={maxItems}
+        documents={documents}
+        showFilter={true}
+        onFilterChange={onSearchDocuments}
+        onSelect={onSelectDocument}
+        onCancel={onBack}
+      />
+    );
+  }
+
   if (step === "creating") {
     return (
       <Box padding={1}>
@@ -48,6 +76,7 @@ export function CreateBaseline({
     );
   }
 
+  // Remaining steps: agents and confirm
   return (
     <Box flexDirection="column" borderStyle="round" borderColor="yellow" padding={1} height={height} overflow="hidden">
       <Box justifyContent="center" marginBottom={1}>
@@ -56,25 +85,6 @@ export function CreateBaseline({
         </Text>
       </Box>
 
-      {step === "document" && (
-        <>
-          <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
-            <Text>Step 1/2: Select a document ({documents.length} available)</Text>
-          </Box>
-          <SelectInput
-            items={documents.map((d, i) => ({
-              label: `${String(i + 1).padStart(2)} | ${truncate(d.title, 50)} | ${formatDate(d.createdAt)}`,
-              value: d.id,
-            }))}
-            limit={maxItems}
-            onSelect={(item) => {
-              const doc = documents.find((d) => d.id === item.value);
-              if (doc) onSelectDocument(doc);
-            }}
-          />
-        </>
-      )}
-
       {step === "agents" && (
         <>
           <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
diff --git a/meta-evals/src/components/DocumentSelector.tsx b/meta-evals/src/components/DocumentSelector.tsx
new file mode 100644
index 000000000..7d61f6fbd
--- /dev/null
+++ b/meta-evals/src/components/DocumentSelector.tsx
@@ -0,0 +1,267 @@
+/**
+ * DocumentSelector - Reusable component for selecting documents
+ *
+ * Supports both single-select and multi-select modes, with optional text filtering.
+ */
+
+import React, { useState, useEffect, useRef } from "react";
+import { Box, Text, useInput, useStdout } from "ink";
+import TextInput from "ink-text-input";
+import SelectInput from "ink-select-input";
+import Spinner from "ink-spinner";
+import type { DocumentChoice } from "@roast/db";
+import { formatDate } from "./helpers";
+
+function truncate(str: string, maxLen: number): string {
+  if (str.length <= maxLen) return str;
+  return str.slice(0, maxLen - 1) + "…";
+}
+
+export interface DocumentSelectorProps {
+  /** Title shown at the top */
+  title?: string;
+  /** Subtitle/instruction text */
+  subtitle?: string;
+  /** Border color */
+  borderColor?: string;
+  /** Container height */
+  height: number;
+  /** Max items to show in the list */
+  maxItems: number;
+  /** Documents to display */
+  documents: DocumentChoice[];
+  /** Enable text filter input */
+  showFilter?: boolean;
+  /** Called when filter text changes (for server-side filtering) */
+  onFilterChange?: (filter: string) => void;
+  /** Enable multi-select mode */
+  multiSelect?: boolean;
+  /** Pre-selected document IDs (for multi-select) */
+  selectedIds?: Set<string>;
+  /** Called when a document is selected (single-select mode) */
+  onSelect?: (doc: DocumentChoice) => void;
+  /** Called when selection changes (multi-select mode) */
+  onSelectionChange?: (selectedIds: Set<string>) => void;
+  /** Called when confirmed (multi-select mode) */
+  onConfirm?: (selectedDocs: DocumentChoice[]) => void;
+  /** Called when cancelled */
+  onCancel: () => void;
+  /** Confirm button label (multi-select mode) */
+  confirmLabel?: string;
+}
+
+export function DocumentSelector({
+  title = "Select Document",
+  subtitle,
+  borderColor = "cyan",
+  height,
+  maxItems,
+  documents,
+  showFilter = false,
+  onFilterChange,
+  multiSelect = false,
+  selectedIds: externalSelectedIds,
+  onSelect,
+  onSelectionChange,
+  onConfirm,
+  onCancel,
+  confirmLabel = "Confirm Selection",
+}: DocumentSelectorProps) {
+  const { stdout } = useStdout();
+  const [filter, setFilter] = useState("");
+  const [isSearching, setIsSearching] = useState(false);
+  const [internalSelectedIds, setInternalSelectedIds] = useState<Set<string>>(
+    externalSelectedIds || new Set()
+  );
+  const debounceRef = useRef<NodeJS.Timeout | null>(null);
+
+  // Calculate available width for title based on terminal width
+  // Layout: border(2) + padding(2) + "❯ "(2) + "  1 | "(6) + title + " | "(3) + date(12) = 27 overhead
+  // For multiSelect: "[x] "(4) instead of index, no date = 10 overhead
+  const termWidth = stdout?.columns ?? 120;
+  const overhead = multiSelect ? 10 : 27;
+  const titleWidth = Math.max(40, termWidth - overhead);
+
+  // Use external or internal selected IDs
+  const selectedIds = externalSelectedIds || internalSelectedIds;
+  const setSelectedIds = onSelectionChange
+    ? (ids: Set<string>) => onSelectionChange(ids)
+    : setInternalSelectedIds;
+
+  // Debounced filter change - only trigger when user actually types something
+  useEffect(() => {
+    if (!showFilter || !onFilterChange) return;
+    // Don't trigger search on empty filter - keep initial documents
+    if (filter.length === 0) return;
+
+    if (debounceRef.current) {
+      clearTimeout(debounceRef.current);
+    }
+
+    setIsSearching(true);
+
+    debounceRef.current = setTimeout(() => {
+      onFilterChange(filter);
+      setIsSearching(false);
+    }, 300);
+
+    return () => {
+      if (debounceRef.current) {
+        clearTimeout(debounceRef.current);
+      }
+    };
+  }, [filter, showFilter, onFilterChange]);
+
+  // Handle escape to cancel
+  useInput((input, key) => {
+    if (key.escape) {
+      onCancel();
+    }
+  });
+
+  function toggleDocument(docId: string) {
+    const newSelected = new Set(selectedIds);
+    if (newSelected.has(docId)) {
+      newSelected.delete(docId);
+    } else {
+      newSelected.add(docId);
+    }
+    setSelectedIds(newSelected);
+  }
+
+  function toggleAll() {
+    const allSelected = documents.every((d) => selectedIds.has(d.id));
+    if (allSelected) {
+      setSelectedIds(new Set());
+    } else {
+      setSelectedIds(new Set(documents.map((d) => d.id)));
+    }
+  }
+
+  // Build items list
+  const items: Array<{ label: string; value: string }> = [];
+
+  if (multiSelect) {
+    // Add "Select All" option
+    const allSelected = documents.length > 0 && documents.every((d) => selectedIds.has(d.id));
+    items.push({
+      label: `[${allSelected ? "x" : " "}] Select All (${documents.length})`,
+      value: "toggle-all",
+    });
+  }
+
+  // Add document items
+  const displayDocs = documents.slice(0, maxItems - (multiSelect ? 4 : 2));
+  for (let i = 0; i < displayDocs.length; i++) {
+    const d = displayDocs[i];
+    if (multiSelect) {
+      items.push({
+        label: `[${selectedIds.has(d.id) ? "x" : " "}] ${truncate(d.title, titleWidth)}`,
+        value: d.id,
+      });
+    } else {
+      items.push({
+        label: `${String(i + 1).padStart(2)} | ${truncate(d.title, titleWidth).padEnd(titleWidth)} | ${formatDate(new Date(d.createdAt))}`,
+        value: d.id,
+      });
+    }
+  }
+
+  if (documents.length > displayDocs.length) {
+    items.push({
+      label: `... and ${documents.length - displayDocs.length} more`,
+      value: "more",
+    });
+  }
+
+  if (multiSelect) {
+    const selectedCount = selectedIds.size;
+    items.push({
+      label: selectedCount > 0 ? `✓ ${confirmLabel} (${selectedCount} docs)` : "Select documents first",
+      value: "confirm",
+    });
+  }
+
+  items.push({ label: "← Cancel", value: "cancel" });
+
+  return (
+    <Box
+      flexDirection="column"
+      borderStyle="round"
+      borderColor={borderColor}
+      padding={1}
+      height={height}
+      overflow="hidden"
+    >
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color={borderColor}>
+          {title}
+        </Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
+        <Text>
+          {subtitle || `${documents.length} document${documents.length !== 1 ? "s" : ""} found`}
+          {filter && ` for "${filter}"`}
+        </Text>
+      </Box>
+
+      {showFilter && (
+        <Box marginBottom={1} paddingX={1}>
+          <Text dimColor>Search: </Text>
+          <TextInput
+            value={filter}
+            onChange={setFilter}
+            placeholder="type to filter..."
+          />
+          {isSearching && (
+            <Text dimColor>
+              {" "}
+              <Spinner type="dots" />
+            </Text>
+          )}
+        </Box>
+      )}
+
+      <SelectInput
+        items={items}
+        limit={maxItems - (showFilter ? 5 : 3)}
+        onSelect={(item) => {
+          if (item.value === "cancel") {
+            onCancel();
+          } else if (item.value === "more") {
+            // Ignore "more" item
+          } else if (multiSelect) {
+            if (item.value === "toggle-all") {
+              toggleAll();
+            } else if (item.value === "confirm") {
+              if (selectedIds.size > 0 && onConfirm) {
+                const selectedDocs = documents.filter((d) => selectedIds.has(d.id));
+                onConfirm(selectedDocs);
+              }
+            } else {
+              toggleDocument(item.value);
+            }
+          } else {
+            // Single select mode
+            const doc = documents.find((d) => d.id === item.value);
+            if (doc && onSelect) {
+              onSelect(doc);
+            }
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>
+          {multiSelect
+            ? "Enter Toggle | Escape Cancel"
+            : "Enter Select | Escape Cancel"}
+        </Text>
+      </Box>
+    </Box>
+  );
+}
+
+// Re-export types for convenience
+export type { DocumentChoice };
diff --git a/meta-evals/src/components/ExtractorLab.tsx b/meta-evals/src/components/ExtractorLab.tsx
new file mode 100644
index 000000000..d2f5e2fc8
--- /dev/null
+++ b/meta-evals/src/components/ExtractorLab.tsx
@@ -0,0 +1,497 @@
+/**
+ * Extractor Lab - Test extraction in isolation
+ *
+ * Allows running the fallacy extractor directly without the full pipeline,
+ * for quick iteration on extractor config and prompts.
+ */
+
+import React, { useState, useRef } from "react";
+import { useInput, useStdout } from "ink";
+import { prisma, type DocumentChoice } from "@roast/db";
+import {
+  getMultiExtractorConfig,
+  type ExtractorConfig,
+  type MultiExtractorResult,
+} from "@roast/ai/fallacy-extraction/lab";
+import { runMultiExtractor } from "@roast/ai/fallacy-extraction";
+import fallacyJudgeModule from "@roast/ai/fallacy-judge";
+import type { JudgeConfig } from "@roast/ai/fallacy-judge/types";
+import { ModelSelector } from "./ModelSelector";
+import { DocumentSelector } from "./DocumentSelector";
+
+// Import extracted modules
+import type {
+  ExtractorLabProps,
+  LabStep,
+  JudgeRunResult,
+  ExtractorIssue,
+  DedupStrategy,
+  DedupComparison,
+  MultiStrategyDedupResult,
+} from "./extractor-lab/types";
+import {
+  truncate,
+  simpleLogger,
+  TEMP_PRESETS,
+  calculateTextWidths,
+  runMultiStrategyDedup,
+} from "./extractor-lab/utils";
+import {
+  ErrorView,
+  RunningView,
+  RunningJudgeView,
+  ConfigureExtractorsView,
+  IssueDetailView,
+  ResultsView,
+  PreJudgeDedupView,
+  JudgeResultsView,
+  JudgeDecisionDetailView,
+  JudgeComparisonView,
+} from "./extractor-lab/views";
+
+// CommonJS/ESM interop
+const fallacyJudgeTool = (fallacyJudgeModule as unknown as { default?: typeof fallacyJudgeModule }).default ?? fallacyJudgeModule;
+const { getJudgesConfig, generateJudgeLabel } = fallacyJudgeModule as unknown as {
+  getJudgesConfig: () => JudgeConfig[];
+  generateJudgeLabel: (config: JudgeConfig) => string;
+};
+
+// Load extractor configs from FALLACY_EXTRACTORS env var, fallback to default
+function getInitialExtractorConfigs(): ExtractorConfig[] {
+  try {
+    const config = getMultiExtractorConfig();
+    return config.extractors;
+  } catch {
+    return [{ model: "claude-sonnet-4-5-20250929", temperature: "default", thinking: false }];
+  }
+}
+
+export function ExtractorLab({ height, maxItems, documents, onSearchDocuments, onBack }: ExtractorLabProps) {
+  const { stdout } = useStdout();
+  const [step, setStep] = useState<LabStep>({ type: "select-document" });
+  const [selectedDoc, setSelectedDoc] = useState<DocumentChoice | null>(null);
+  const [documentText, setDocumentText] = useState<string>("");
+
+  // Calculate widths
+  const termWidth = stdout?.columns ?? 120;
+  const { issueTextWidth, judgeTextWidth } = calculateTextWidths(termWidth);
+
+  const [extractorConfigs, setExtractorConfigs] = useState<ExtractorConfig[]>(getInitialExtractorConfigs);
+  const [availableJudges] = useState<JudgeConfig[]>(() => getJudgesConfig());
+  const [selectedJudgeIdxs, setSelectedJudgeIdxs] = useState<Set<number>>(() => new Set([0]));
+  const [selectedStrategy, setSelectedStrategy] = useState<DedupStrategy>("jaccard");
+  const [error, setError] = useState<string | null>(null);
+  const [highlightedItem, setHighlightedItem] = useState<string>("");
+
+  const stepRef = useRef(step);
+  stepRef.current = step;
+  const highlightedRef = useRef(highlightedItem);
+  highlightedRef.current = highlightedItem;
+
+  // ─────────────────────────────────────────────────────────────────────────────
+  // Actions
+  // ─────────────────────────────────────────────────────────────────────────────
+
+  async function loadDocumentText(docId: string) {
+    try {
+      const doc = await prisma.document.findUnique({
+        where: { id: docId },
+        include: {
+          versions: {
+            orderBy: { version: "desc" },
+            take: 1,
+            select: { content: true },
+          },
+        },
+      });
+      if (doc?.versions[0]?.content) {
+        setDocumentText(doc.versions[0].content);
+      } else {
+        setError("Document has no content");
+      }
+    } catch (err) {
+      setError(`Failed to load document: ${err}`);
+    }
+  }
+
+  async function runExtraction() {
+    setStep({ type: "running" });
+    try {
+      const result = await runMultiExtractor(documentText, {
+        extractors: extractorConfigs,
+        judge: { model: "", enabled: false }, // We'll run judge manually for instrumentation
+      });
+      setStep({ type: "results", result });
+    } catch (err) {
+      setError(`Extraction failed: ${err}`);
+      setStep({ type: "configure-extractors" });
+    }
+  }
+
+  function runPreJudgeDedup(extractionResult: MultiExtractorResult, navigate = true): MultiStrategyDedupResult | null {
+    try {
+      const multiDedup = runMultiStrategyDedup(extractionResult);
+      if (navigate) {
+        setStep({ type: "pre-judge-dedup", result: extractionResult, multiDedup, selectedStrategy });
+      }
+      return multiDedup;
+    } catch (err) {
+      setError(`Dedup failed: ${err}`);
+      return null;
+    }
+  }
+
+  async function runJudge(
+    extractionResult: MultiExtractorResult,
+    dedupIssues: ExtractorIssue[],
+    judgeConfig?: JudgeConfig,
+    judgeLabel?: string
+  ): Promise<JudgeRunResult> {
+    const extractorIds = extractionResult.extractorResults
+      .filter((r) => !r.error)
+      .map((r) => r.extractorId);
+
+    const startTime = Date.now();
+
+    try {
+      const judgeResult = await fallacyJudgeTool.execute(
+        {
+          documentText,
+          issues: dedupIssues,
+          extractorIds,
+          judgeConfig,
+        },
+        { logger: simpleLogger }
+      );
+
+      return {
+        config: judgeConfig!,
+        label: judgeLabel || "default",
+        result: judgeResult,
+        durationMs: Date.now() - startTime,
+      };
+    } catch (err) {
+      return {
+        config: judgeConfig!,
+        label: judgeLabel || "default",
+        result: {
+          acceptedDecisions: [],
+          rejectedDecisions: [],
+          summary: { totalInputIssues: dedupIssues.length, uniqueGroups: 0, acceptedCount: 0, mergedCount: 0, rejectedCount: 0 },
+        },
+        durationMs: Date.now() - startTime,
+        error: err instanceof Error ? err.message : String(err),
+      };
+    }
+  }
+
+  async function runMultipleJudges(
+    extractionResult: MultiExtractorResult,
+    dedupResult: DedupComparison,
+    judgeConfigs: JudgeConfig[]
+  ) {
+    // Convert DedupComparison to PreJudgeDedupResult for running-judge step
+    // Extract just the duplicate issues (not the match info)
+    const preDedupResult = {
+      unique: dedupResult.unique,
+      duplicates: dedupResult.duplicates.map(m => m.duplicate),
+      originalCount: dedupResult.originalCount,
+    };
+    setStep({ type: "running-judge", result: extractionResult, dedupResult: preDedupResult, judgeConfigs });
+
+    const results = await Promise.all(
+      judgeConfigs.map((config) =>
+        runJudge(extractionResult, dedupResult.unique, config, generateJudgeLabel(config))
+      )
+    );
+
+    if (results.length === 1 && !results[0].error) {
+      setStep({
+        type: "judge-results",
+        result: extractionResult,
+        judgeResult: results[0].result,
+        judgeLabel: results[0].label,
+      });
+    } else {
+      setStep({ type: "judge-comparison", result: extractionResult, judgeResults: results });
+    }
+  }
+
+  // ─────────────────────────────────────────────────────────────────────────────
+  // Keyboard handling
+  // ─────────────────────────────────────────────────────────────────────────────
+
+  useInput((input, key) => {
+    if (key.escape) {
+      const currentStep = stepRef.current;
+
+      if (currentStep.type === "issue-detail") {
+        setStep({ type: "results", result: currentStep.result });
+      } else if (currentStep.type === "judge-decision-detail") {
+        const { result, judgeResult, judgeLabel, judgeResults } = currentStep;
+        setStep({ type: "judge-results", result, judgeResult, judgeLabel, judgeResults });
+      } else if (currentStep.type === "judge-results") {
+        const { result, judgeResults } = currentStep;
+        if (judgeResults) {
+          setStep({ type: "judge-comparison", result, judgeResults });
+        } else {
+          const multiDedup = runPreJudgeDedup(result, false);
+          if (multiDedup) {
+            setStep({ type: "pre-judge-dedup", result, multiDedup, selectedStrategy });
+          }
+        }
+      } else if (currentStep.type === "judge-comparison") {
+        setStep({ type: "results", result: currentStep.result });
+      } else if (currentStep.type === "pre-judge-dedup") {
+        setStep({ type: "results", result: currentStep.result });
+      } else if (currentStep.type === "results") {
+        setStep({ type: "configure-extractors" });
+      } else if (currentStep.type === "configure-extractors" || currentStep.type === "add-extractor") {
+        setStep({ type: "select-document" });
+      } else if (currentStep.type === "select-document") {
+        onBack();
+      }
+    }
+
+    // Keyboard shortcuts for configure screen
+    if (stepRef.current.type === "configure-extractors") {
+      const highlighted = highlightedRef.current;
+
+      if (highlighted.startsWith("config-")) {
+        const idx = parseInt(highlighted.replace("config-", ""), 10);
+
+        if (input === "d") {
+          setExtractorConfigs(configs => {
+            if (configs.length <= 1) return configs;
+            return configs.filter((_, i) => i !== idx);
+          });
+        } else if (input === "t") {
+          setExtractorConfigs(configs =>
+            configs.map((c, i) => {
+              if (i !== idx) return c;
+              const currentTemp = c.temperature;
+              const currentIdx = TEMP_PRESETS.findIndex(t => t === currentTemp);
+              const nextIdx = (currentIdx + 1) % TEMP_PRESETS.length;
+              return { ...c, temperature: TEMP_PRESETS[nextIdx] };
+            })
+          );
+        }
+      }
+    }
+  });
+
+  // ─────────────────────────────────────────────────────────────────────────────
+  // Render
+  // ─────────────────────────────────────────────────────────────────────────────
+
+  if (error) {
+    return <ErrorView error={error} height={height} />;
+  }
+
+  if (step.type === "select-document") {
+    return (
+      <DocumentSelector
+        title="Extractor Lab - Select Document"
+        borderColor="magenta"
+        height={height}
+        maxItems={maxItems}
+        documents={documents}
+        showFilter={true}
+        onFilterChange={onSearchDocuments}
+        onSelect={async (doc) => {
+          setSelectedDoc(doc);
+          await loadDocumentText(doc.id);
+          setStep({ type: "configure-extractors" });
+        }}
+        onCancel={onBack}
+      />
+    );
+  }
+
+  if (step.type === "configure-extractors") {
+    return (
+      <ConfigureExtractorsView
+        height={height}
+        selectedDoc={selectedDoc}
+        documentText={documentText}
+        extractorConfigs={extractorConfigs}
+        onHighlight={setHighlightedItem}
+        onBack={() => setStep({ type: "select-document" })}
+        onRun={runExtraction}
+        onAdd={() => setStep({ type: "add-extractor" })}
+        onToggleThinking={(idx) => {
+          setExtractorConfigs(configs =>
+            configs.map((c, i) => i === idx ? { ...c, thinking: !c.thinking } : c)
+          );
+        }}
+      />
+    );
+  }
+
+  if (step.type === "add-extractor") {
+    return (
+      <ModelSelector
+        title="Add Extractor - Select Model"
+        borderColor="cyan"
+        height={height}
+        maxItems={maxItems}
+        onSelect={(model) => {
+          setExtractorConfigs([
+            ...extractorConfigs,
+            { model: model.id, temperature: "default", thinking: false },
+          ]);
+          setStep({ type: "configure-extractors" });
+        }}
+        onCancel={() => setStep({ type: "configure-extractors" })}
+      />
+    );
+  }
+
+  if (step.type === "running") {
+    return <RunningView height={height} extractorCount={extractorConfigs.length} />;
+  }
+
+  if (step.type === "results") {
+    return (
+      <ResultsView
+        height={height}
+        maxItems={maxItems}
+        result={step.result}
+        selectedDoc={selectedDoc}
+        issueTextWidth={issueTextWidth}
+        onBack={() => setStep({ type: "configure-extractors" })}
+        onRunDedup={() => runPreJudgeDedup(step.result)}
+        onViewIssue={(extractorIdx, issueIdx) => {
+          setStep({ type: "issue-detail", result: step.result, extractorIdx, issueIdx });
+        }}
+      />
+    );
+  }
+
+  if (step.type === "issue-detail") {
+    return (
+      <IssueDetailView
+        height={height}
+        result={step.result}
+        extractorIdx={step.extractorIdx}
+        issueIdx={step.issueIdx}
+      />
+    );
+  }
+
+  if (step.type === "pre-judge-dedup") {
+    const currentDedup = step.multiDedup[step.selectedStrategy];
+    return (
+      <PreJudgeDedupView
+        height={height}
+        maxItems={maxItems}
+        result={step.result}
+        multiDedup={step.multiDedup}
+        selectedStrategy={step.selectedStrategy}
+        availableJudges={availableJudges}
+        selectedJudgeIdxs={selectedJudgeIdxs}
+        issueTextWidth={issueTextWidth}
+        generateJudgeLabel={generateJudgeLabel}
+        onBack={() => setStep({ type: "results", result: step.result })}
+        onRunJudges={(configs, dedupResult) => runMultipleJudges(step.result, dedupResult, configs)}
+        onToggleJudge={(idx) => {
+          setSelectedJudgeIdxs(prev => {
+            const next = new Set(prev);
+            if (next.has(idx)) {
+              if (next.size > 1) next.delete(idx);
+            } else {
+              next.add(idx);
+            }
+            return next;
+          });
+        }}
+        onSelectStrategy={(strategy) => {
+          setSelectedStrategy(strategy);
+          setStep({ type: "pre-judge-dedup", result: step.result, multiDedup: step.multiDedup, selectedStrategy: strategy });
+        }}
+      />
+    );
+  }
+
+  if (step.type === "running-judge") {
+    return (
+      <RunningJudgeView
+        height={height}
+        judgeConfigs={step.judgeConfigs}
+        dedupResult={step.dedupResult}
+        generateJudgeLabel={generateJudgeLabel}
+      />
+    );
+  }
+
+  if (step.type === "judge-comparison") {
+    return (
+      <JudgeComparisonView
+        height={height}
+        maxItems={maxItems}
+        result={step.result}
+        judgeResults={step.judgeResults}
+        selectedDoc={selectedDoc}
+        termWidth={termWidth}
+        onBack={() => setStep({ type: "results", result: step.result })}
+        onViewJudge={(jr) => {
+          setStep({
+            type: "judge-results",
+            result: step.result,
+            judgeResult: jr.result,
+            judgeLabel: jr.label,
+            judgeResults: step.judgeResults,
+          });
+        }}
+      />
+    );
+  }
+
+  if (step.type === "judge-results") {
+    const { result, judgeResult, judgeLabel, judgeResults } = step;
+    return (
+      <JudgeResultsView
+        height={height}
+        maxItems={maxItems}
+        result={result}
+        judgeResult={judgeResult}
+        judgeLabel={judgeLabel}
+        judgeResults={judgeResults}
+        judgeTextWidth={judgeTextWidth}
+        onBack={() => {
+          if (judgeResults) {
+            setStep({ type: "judge-comparison", result, judgeResults });
+          } else {
+            const multiDedup = runPreJudgeDedup(result, false);
+            if (multiDedup) {
+              setStep({ type: "pre-judge-dedup", result, multiDedup, selectedStrategy });
+            }
+          }
+        }}
+        onViewDecision={(decision, isRejected) => {
+          setStep({
+            type: "judge-decision-detail",
+            result,
+            judgeResult,
+            decision,
+            isRejected,
+            judgeLabel,
+            judgeResults,
+          });
+        }}
+      />
+    );
+  }
+
+  if (step.type === "judge-decision-detail") {
+    return (
+      <JudgeDecisionDetailView
+        height={height}
+        decision={step.decision}
+        isRejected={step.isRejected}
+      />
+    );
+  }
+
+  return null;
+}
diff --git a/meta-evals/src/components/MainMenu.tsx b/meta-evals/src/components/MainMenu.tsx
index c213b6957..0bf955f6f 100644
--- a/meta-evals/src/components/MainMenu.tsx
+++ b/meta-evals/src/components/MainMenu.tsx
@@ -1,12 +1,10 @@
 /**
- * Main Menu Screen Component
+ * Main Menu Screen Component - Clean Router
  */
 
 import React, { useState } from "react";
 import { Box, Text, useInput } from "ink";
 import SelectInput from "ink-select-input";
-import type { SeriesSummary } from "./types";
-import { truncate } from "./helpers";
 
 interface ModelInfo {
   id: string;
@@ -14,11 +12,10 @@ interface ModelInfo {
 }
 
 interface MainMenuProps {
-  series: SeriesSummary[];
-  maxItems: number;
   height: number;
-  onCreateBaseline: () => void;
-  onSelectSeries: (id: string) => void;
+  onScoreRank: () => void;
+  onValidation: () => void;
+  onExtractorLab: () => void;
   onExit: () => void;
   judgeModel: string;
   availableModels: ModelInfo[];
@@ -33,11 +30,10 @@ const TEMPERATURE_OPTIONS = [0, 0.3, 0.5, 0.7, 1.0, 1.5, 2.0];
 const MAX_TOKENS_OPTIONS = [2048, 4096, 8192, 16384, 32768];
 
 export function MainMenu({
-  series,
-  maxItems,
   height,
-  onCreateBaseline,
-  onSelectSeries,
+  onScoreRank,
+  onValidation,
+  onExtractorLab,
   onExit,
   judgeModel,
   availableModels,
@@ -47,42 +43,21 @@ export function MainMenu({
   maxTokens,
   onSetMaxTokens,
 }: MainMenuProps) {
-  const [activeTab, setActiveTab] = useState<"series" | "settings">("series");
+  const [showSettings, setShowSettings] = useState(false);
   const [settingsSection, setSettingsSection] = useState<"model" | "temperature" | "maxTokens">("model");
 
-  // Handle tab switching
+  // Handle keyboard input
   useInput((input, key) => {
-    if (key.tab) {
-      setActiveTab((prev) => (prev === "series" ? "settings" : "series"));
+    if (key.escape && showSettings) {
+      setShowSettings(false);
     }
   });
 
   // Get display name for current model
   const currentModelName = availableModels.find((m) => m.id === judgeModel)?.displayName || judgeModel;
 
-  // Render tabs header
-  const renderTabs = () => (
-    <Box marginBottom={1}>
-      <Text
-        bold={activeTab === "series"}
-        color={activeTab === "series" ? "cyan" : "gray"}
-      >
-        [Series]
-      </Text>
-      <Text> </Text>
-      <Text
-        bold={activeTab === "settings"}
-        color={activeTab === "settings" ? "yellow" : "gray"}
-      >
-        [Settings]
-      </Text>
-      <Text dimColor>  (Tab to switch)</Text>
-    </Box>
-  );
-
-  // Settings tab
-  if (activeTab === "settings") {
-    // Build items based on current section
+  // Settings panel
+  if (showSettings) {
     let settingsItems: { label: string; value: string }[] = [];
     let sectionTitle = "";
 
@@ -95,7 +70,7 @@ export function MainMenu({
         })),
         { label: "-> Temperature", value: "goto:temperature" },
         { label: "-> Max Tokens", value: "goto:maxTokens" },
-        { label: "<- Back to Series", value: "back" },
+        { label: "<- Back", value: "back" },
       ];
     } else if (settingsSection === "temperature") {
       sectionTitle = "Temperature";
@@ -125,10 +100,9 @@ export function MainMenu({
           </Text>
         </Box>
 
-        {renderTabs()}
-
         <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
           <Box flexDirection="column">
+            <Text dimColor>For Score/Rank AI judge:</Text>
             <Text>
               <Text bold>Model: </Text>
               <Text color="green">{currentModelName}</Text>
@@ -147,7 +121,7 @@ export function MainMenu({
           items={settingsItems}
           onSelect={(item) => {
             if (item.value === "back") {
-              setActiveTab("series");
+              setShowSettings(false);
             } else if (item.value.startsWith("goto:")) {
               setSettingsSection(item.value.replace("goto:", "") as "model" | "temperature" | "maxTokens");
             } else if (item.value.startsWith("model:")) {
@@ -161,23 +135,18 @@ export function MainMenu({
         />
 
         <Box marginTop={1} justifyContent="center">
-          <Text dimColor>Tab Switch | Up/Down Navigate | Enter Select | q Quit</Text>
+          <Text dimColor>Up/Down Navigate | Enter Select | Escape Back</Text>
         </Box>
       </Box>
     );
   }
 
-  // Series tab (default)
-  // Limit series shown, reserve 2 slots for create/exit
-  const visibleSeries = series.slice(0, maxItems - 2);
+  // Main menu items
   const items = [
-    ...visibleSeries
-      .filter((s) => s.id) // Ensure valid IDs
-      .map((s, idx) => ({
-        label: `${truncate(s.documentTitle, 40)} | ${s.runCount} runs | ${s.agentNames.slice(0, 2).join(", ")}`,
-        value: s.id || `series-${idx}`, // Fallback key
-      })),
-    { label: "+ Create New Baseline", value: "create" },
+    { label: "Score/Rank", value: "score-rank" },
+    { label: "Validation", value: "validation" },
+    { label: "Extractor Lab", value: "extractor-lab" },
+    { label: "Settings", value: "settings" },
     { label: "Exit", value: "exit" },
   ];
 
@@ -189,37 +158,28 @@ export function MainMenu({
         </Text>
       </Box>
 
-      {renderTabs()}
-
       <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
         <Box flexDirection="column">
-          <Text>
-            {series.length === 0
-              ? "No evaluation series yet. Create a baseline to get started."
-              : visibleSeries.length < series.length
-                ? `Showing ${visibleSeries.length} of ${series.length} series`
-                : `${series.length} series available`}
-          </Text>
+          <Text>Compare and evaluate agent outputs</Text>
           <Text dimColor>
             Judge: <Text color="green">{currentModelName}</Text>
-            {" "}| Temp: <Text color="green">{temperature}</Text>
-            {" "}| Tokens: <Text color="green">{maxTokens}</Text>
           </Text>
         </Box>
       </Box>
 
       <SelectInput
         items={items}
-        limit={maxItems}
         onSelect={(item) => {
           if (item.value === "exit") onExit();
-          else if (item.value === "create") onCreateBaseline();
-          else onSelectSeries(item.value);
+          else if (item.value === "score-rank") onScoreRank();
+          else if (item.value === "validation") onValidation();
+          else if (item.value === "extractor-lab") onExtractorLab();
+          else if (item.value === "settings") setShowSettings(true);
         }}
       />
 
       <Box marginTop={1} justifyContent="center">
-        <Text dimColor>Tab Switch | Up/Down Navigate | Enter Select | q Quit</Text>
+        <Text dimColor>Up/Down Navigate | Enter Select | q Quit</Text>
       </Box>
     </Box>
   );
diff --git a/meta-evals/src/components/ModelSelector.tsx b/meta-evals/src/components/ModelSelector.tsx
new file mode 100644
index 000000000..8f34685af
--- /dev/null
+++ b/meta-evals/src/components/ModelSelector.tsx
@@ -0,0 +1,207 @@
+/**
+ * ModelSelector - Reusable component for selecting AI models
+ *
+ * Fetches models from both Anthropic and OpenRouter APIs,
+ * with text input filtering support.
+ */
+
+import React, { useState, useEffect, useRef } from "react";
+import { Box, Text, useInput } from "ink";
+import TextInput from "ink-text-input";
+import SelectInput from "ink-select-input";
+import Spinner from "ink-spinner";
+import {
+  getAllModels,
+  filterModels,
+  type ModelInfo,
+} from "@roast/ai";
+
+export interface ModelSelectorProps {
+  /** Title shown at the top */
+  title?: string;
+  /** Border color */
+  borderColor?: string;
+  /** Container height */
+  height: number;
+  /** Max items to show in the list */
+  maxItems: number;
+  /** Called when a model is selected */
+  onSelect: (model: ModelInfo) => void;
+  /** Called when cancelled */
+  onCancel: () => void;
+}
+
+export function ModelSelector({
+  title = "Select Model",
+  borderColor = "cyan",
+  height,
+  maxItems,
+  onSelect,
+  onCancel,
+}: ModelSelectorProps) {
+  const [models, setModels] = useState<ModelInfo[]>([]);
+  const [filteredModels, setFilteredModels] = useState<ModelInfo[]>([]);
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+  const [filter, setFilter] = useState("");
+  const debounceRef = useRef<NodeJS.Timeout | null>(null);
+
+  // Load models on mount
+  useEffect(() => {
+    loadModels();
+  }, []);
+
+  // Filter models when query changes (debounced)
+  useEffect(() => {
+    if (debounceRef.current) {
+      clearTimeout(debounceRef.current);
+    }
+
+    debounceRef.current = setTimeout(() => {
+      setFilteredModels(filterModels(models, filter));
+    }, 150);
+
+    return () => {
+      if (debounceRef.current) {
+        clearTimeout(debounceRef.current);
+      }
+    };
+  }, [filter, models]);
+
+  // Handle escape to cancel
+  useInput((input, key) => {
+    if (key.escape) {
+      onCancel();
+    }
+  });
+
+  async function loadModels() {
+    try {
+      const allModels = await getAllModels();
+      setModels(allModels);
+      setFilteredModels(allModels);
+      setLoading(false);
+    } catch (e) {
+      setError(`Failed to load models: ${e}`);
+      setLoading(false);
+    }
+  }
+
+  if (loading) {
+    return (
+      <Box
+        flexDirection="column"
+        borderStyle="round"
+        borderColor={borderColor}
+        padding={1}
+        height={height}
+      >
+        <Box justifyContent="center" marginBottom={1}>
+          <Text bold color={borderColor}>
+            {title}
+          </Text>
+        </Box>
+        <Box justifyContent="center" padding={2}>
+          <Text>
+            <Spinner type="dots" /> Loading models from APIs...
+          </Text>
+        </Box>
+      </Box>
+    );
+  }
+
+  if (error) {
+    return (
+      <Box
+        flexDirection="column"
+        borderStyle="round"
+        borderColor="red"
+        padding={1}
+        height={height}
+      >
+        <Text color="red">{error}</Text>
+        <Text dimColor>Press Escape to go back</Text>
+      </Box>
+    );
+  }
+
+  // Build list items grouped by provider
+  const anthropicModels = filteredModels.filter((m) => m.provider === "anthropic");
+  const openRouterModels = filteredModels.filter((m) => m.provider === "openrouter");
+
+  const items: Array<{ label: string; value: string }> = [];
+
+  if (anthropicModels.length > 0) {
+    items.push({ label: `── Anthropic (${anthropicModels.length}) ──`, value: "header-anthropic" });
+    for (const m of anthropicModels) {
+      items.push({ label: `  ${m.name} (${m.id})`, value: m.id });
+    }
+  }
+
+  if (openRouterModels.length > 0) {
+    items.push({ label: `── OpenRouter (${openRouterModels.length}) ──`, value: "header-openrouter" });
+    for (const m of openRouterModels) {
+      items.push({ label: `  ${m.name}`, value: m.id });
+    }
+  }
+
+  items.push({ label: "← Cancel", value: "cancel" });
+
+  return (
+    <Box
+      flexDirection="column"
+      borderStyle="round"
+      borderColor={borderColor}
+      padding={1}
+      height={height}
+      overflow="hidden"
+    >
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color={borderColor}>
+          {title}
+        </Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
+        <Text>
+          {filteredModels.length} models
+          {filter && ` matching "${filter}"`}
+          {" "}(Anthropic: {anthropicModels.length}, OpenRouter: {openRouterModels.length})
+        </Text>
+      </Box>
+
+      <Box marginBottom={1} paddingX={1}>
+        <Text dimColor>Filter: </Text>
+        <TextInput
+          value={filter}
+          onChange={setFilter}
+          placeholder="type to filter models..."
+        />
+      </Box>
+
+      <SelectInput
+        items={items}
+        limit={maxItems - 5}
+        onSelect={(item) => {
+          if (item.value === "cancel") {
+            onCancel();
+          } else if (item.value.startsWith("header-")) {
+            // Ignore header clicks
+          } else {
+            const model = filteredModels.find((m) => m.id === item.value);
+            if (model) {
+              onSelect(model);
+            }
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter Select | Escape Cancel</Text>
+      </Box>
+    </Box>
+  );
+}
+
+// Re-export types for convenience
+export type { ModelInfo };
diff --git a/meta-evals/src/components/RankRuns.tsx b/meta-evals/src/components/RankRuns.tsx
index 11ca6be1f..3c3a2b4c9 100644
--- a/meta-evals/src/components/RankRuns.tsx
+++ b/meta-evals/src/components/RankRuns.tsx
@@ -70,8 +70,11 @@ export function RankRuns({ seriesId, height, judgeModel, temperature, maxTokens,
     loadData();
   }, [seriesId]);
 
-  // Handle tab key to switch between tabs (must be before any conditional returns)
+  // Handle keyboard shortcuts (must be before any conditional returns)
   useInput((input, key) => {
+    if (key.escape) {
+      onBack();
+    }
     if (key.tab && savedSessions.length > 0 && !results) {
       setActiveTab((prev) => (prev === "saved" ? "new" : "saved"));
     }
diff --git a/meta-evals/src/components/ScoreRankMenu.tsx b/meta-evals/src/components/ScoreRankMenu.tsx
new file mode 100644
index 000000000..6724374a0
--- /dev/null
+++ b/meta-evals/src/components/ScoreRankMenu.tsx
@@ -0,0 +1,169 @@
+/**
+ * Score/Rank Menu Screen
+ *
+ * Shows series list for scoring and ranking agent outputs.
+ */
+
+import React, { useState } from "react";
+import { Box, Text, useInput } from "ink";
+import SelectInput from "ink-select-input";
+import type { SeriesSummary } from "./types";
+import { truncate } from "./helpers";
+import { ScreenContainer, InfoBox } from "./shared";
+
+interface ScoreRankMenuProps {
+  series: SeriesSummary[];
+  maxItems: number;
+  height: number;
+  judgeModel: string;
+  onCreateSeries: () => void;
+  onSelectSeries: (id: string) => void;
+  onDeleteSeries: (id: string) => Promise<void>;
+  onBack: () => void;
+}
+
+export function ScoreRankMenu({
+  series,
+  maxItems,
+  height,
+  judgeModel,
+  onCreateSeries,
+  onSelectSeries,
+  onDeleteSeries,
+  onBack,
+}: ScoreRankMenuProps) {
+  const [highlightedIndex, setHighlightedIndex] = useState(0);
+  const [confirmDelete, setConfirmDelete] = useState<string | null>(null);
+  const [isDeleting, setIsDeleting] = useState(false);
+
+  // Limit series shown, reserve slots for actions
+  const visibleSeries = series.slice(0, maxItems - 3);
+
+  // Handle keyboard input
+  useInput((input, key) => {
+    if (key.escape) {
+      if (confirmDelete) {
+        setConfirmDelete(null);
+      } else {
+        onBack();
+      }
+    }
+
+    // Delete with 'd' key
+    if (input === "d" && !confirmDelete && !isDeleting) {
+      const selectedSeries = visibleSeries[highlightedIndex];
+      if (selectedSeries) {
+        setConfirmDelete(selectedSeries.id);
+      }
+    }
+
+    // Confirm delete with 'y'
+    if (confirmDelete && input === "y" && !isDeleting) {
+      setIsDeleting(true);
+      onDeleteSeries(confirmDelete).finally(() => {
+        setConfirmDelete(null);
+        setIsDeleting(false);
+      });
+    }
+
+    // Cancel delete with 'n'
+    if (confirmDelete && input === "n") {
+      setConfirmDelete(null);
+    }
+  });
+
+  // Find series being deleted for confirmation message
+  const deletingSeries = confirmDelete ? visibleSeries.find((s) => s.id === confirmDelete) : null;
+
+  // Delete confirmation modal
+  if (confirmDelete && deletingSeries) {
+    return (
+      <ScreenContainer title="Score/Rank - Confirm Delete" borderColor="red" height={height}>
+        <Box
+          flexDirection="column"
+          justifyContent="center"
+          alignItems="center"
+          flexGrow={1}
+        >
+          <Box
+            flexDirection="column"
+            borderStyle="double"
+            borderColor="red"
+            paddingX={4}
+            paddingY={1}
+          >
+            <Box justifyContent="center" marginBottom={1}>
+              <Text bold color="red">
+                Confirm Delete
+              </Text>
+            </Box>
+            <Box marginBottom={1} justifyContent="center">
+              <Text>Delete this series?</Text>
+            </Box>
+            <Box marginBottom={1} justifyContent="center">
+              <Text color="yellow">"{truncate(deletingSeries.documentTitle, 45)}"</Text>
+            </Box>
+            <Box marginBottom={1} justifyContent="center">
+              <Text dimColor>
+                {deletingSeries.runCount} run{deletingSeries.runCount !== 1 ? "s" : ""} will be removed.
+              </Text>
+            </Box>
+            <Box justifyContent="center" marginTop={1}>
+              {isDeleting ? (
+                <Text color="yellow">Deleting...</Text>
+              ) : (
+                <Box gap={3}>
+                  <Text backgroundColor="red" color="white" bold> Y - Delete </Text>
+                  <Text backgroundColor="gray" color="white"> N - Cancel </Text>
+                </Box>
+              )}
+            </Box>
+          </Box>
+        </Box>
+      </ScreenContainer>
+    );
+  }
+
+  // Build menu items
+  const items = [
+    ...visibleSeries
+      .filter((s) => s.id)
+      .map((s, idx) => ({
+        label: `${truncate(s.documentTitle, 40)} | ${s.runCount} runs | ${s.agentNames.slice(0, 2).join(", ")}`,
+        value: s.id || `series-${idx}`,
+      })),
+    { label: "+ Create New Series", value: "create" },
+    { label: "<- Back to Main Menu", value: "back" },
+  ];
+
+  return (
+    <ScreenContainer title="Score/Rank - Series List" borderColor="cyan" height={height}>
+      <InfoBox>
+        <Text>
+          {series.length === 0
+            ? "No series yet. Create one to score/rank agent outputs."
+            : `${series.length} series | Judge: `}
+          {series.length > 0 && <Text color="green">{judgeModel}</Text>}
+        </Text>
+      </InfoBox>
+
+      <SelectInput
+        items={items}
+        limit={maxItems}
+        onHighlight={(item) => {
+          const idx = visibleSeries.findIndex((s) => s.id === item.value);
+          if (idx >= 0) setHighlightedIndex(idx);
+        }}
+        onSelect={(item) => {
+          if (item.value === "back") onBack();
+          else if (item.value === "create") onCreateSeries();
+          else onSelectSeries(item.value);
+        }}
+      />
+
+      <Box marginTop={1}>
+        <Text dimColor>Enter Select | d Delete | Escape Back</Text>
+      </Box>
+    </ScreenContainer>
+  );
+}
diff --git a/meta-evals/src/components/ScoreRun.tsx b/meta-evals/src/components/ScoreRun.tsx
index 8b5e55fbb..7bdb9222d 100644
--- a/meta-evals/src/components/ScoreRun.tsx
+++ b/meta-evals/src/components/ScoreRun.tsx
@@ -3,7 +3,7 @@
  */
 
 import React, { useState, useEffect } from "react";
-import { Box, Text } from "ink";
+import { Box, Text, useInput } from "ink";
 import SelectInput from "ink-select-input";
 import { metaEvaluationRepository } from "@roast/db";
 import { scoreComments, type ScoringResult } from "@roast/ai/meta-eval";
@@ -45,6 +45,13 @@ export function ScoreRun({ seriesId, height, judgeModel, temperature, maxTokens,
   const [showFullReasoning, setShowFullReasoning] = useState(false);
   const [isViewingSaved, setIsViewingSaved] = useState(false);
 
+  // Handle escape to go back
+  useInput((input, key) => {
+    if (key.escape) {
+      onBack();
+    }
+  });
+
   useEffect(() => {
     loadCompletedRuns();
   }, [seriesId]);
diff --git a/meta-evals/src/components/SeriesDetail.tsx b/meta-evals/src/components/SeriesDetail.tsx
index 7f700d0f0..793998c2a 100644
--- a/meta-evals/src/components/SeriesDetail.tsx
+++ b/meta-evals/src/components/SeriesDetail.tsx
@@ -3,7 +3,7 @@
  */
 
 import React, { useState, useEffect } from "react";
-import { Box, Text } from "ink";
+import { Box, Text, useInput } from "ink";
 import SelectInput from "ink-select-input";
 import { metaEvaluationRepository } from "@roast/db";
 import { truncate, formatDate, formatStatus, getStatusColor } from "./helpers";
@@ -58,6 +58,13 @@ export function SeriesDetail({
   const [clearing, setClearing] = useState(false);
   const [series, setSeries] = useState<SeriesDetailData | null>(null);
 
+  // Handle escape to go back
+  useInput((input, key) => {
+    if (key.escape) {
+      onBack();
+    }
+  });
+
   // Load and poll for updates - always poll every 2 seconds
   useEffect(() => {
     let mounted = true;
diff --git a/meta-evals/src/components/Validation.tsx b/meta-evals/src/components/Validation.tsx
new file mode 100644
index 000000000..5a7b27cbe
--- /dev/null
+++ b/meta-evals/src/components/Validation.tsx
@@ -0,0 +1,1363 @@
+/**
+ * Validation Screen Component
+ *
+ * Compare pipeline runs and detect regressions.
+ *
+ * Flow:
+ * 1. Select/create a baseline (saved evaluation snapshots)
+ * 2. Run pipeline on baseline documents with current code
+ * 3. Compare new results vs baseline
+ * 4. View regressions
+ */
+
+import React, { useState, useEffect } from "react";
+import { Box, Text, useInput } from "ink";
+import TextInput from "ink-text-input";
+import SelectInput from "ink-select-input";
+import Spinner from "ink-spinner";
+import { metaEvaluationRepository, type AgentChoice } from "@roast/db";
+import { truncate } from "./helpers";
+import { ScreenContainer, InfoBox } from "./shared";
+import { DocumentSelector } from "./DocumentSelector";
+import {
+  type ValidationDocument,
+  type DocumentComparisonResult,
+  type EvaluationSnapshot,
+  compareSnapshots,
+} from "../validation";
+
+type Tab = "baselines" | "run" | "history";
+
+/** Sanitize baseline name - remove newlines and extra whitespace */
+function sanitizeName(name: string): string {
+  return name.replace(/\s+/g, " ").trim();
+}
+
+interface ValidationProps {
+  height: number;
+  maxItems: number;
+  onBack: () => void;
+  onCreateBatch: (agentId: string, documentIds: string[]) => Promise<string[]>; // Returns job IDs
+}
+
+interface Baseline {
+  id: string;
+  name: string;
+  description: string | null;
+  commitHash: string | null;
+  createdAt: Date;
+  snapshotCount: number;
+}
+
+// CorpusDocument is just ValidationDocument (selection tracked separately via Set)
+
+interface ValidationRunSummary {
+  id: string;
+  name: string | null;
+  commitHash: string | null;
+  status: string;
+  summary: string | null;
+  createdAt: Date;
+  completedAt: Date | null;
+  snapshotCount: number;
+  unchangedCount: number;
+  changedCount: number;
+}
+
+export function Validation({ height, maxItems, onBack, onCreateBatch }: ValidationProps) {
+  const [activeTab, setActiveTab] = useState<Tab>("baselines");
+  const [loading, setLoading] = useState(true);
+  const [error, setError] = useState<string | null>(null);
+
+  // Agent state
+  const [agents, setAgents] = useState<AgentChoice[]>([]);
+  const [selectedAgent, setSelectedAgent] = useState<AgentChoice | null>(null);
+
+  // Baseline state
+  const [baselines, setBaselines] = useState<Baseline[]>([]);
+  const [selectedBaseline, setSelectedBaseline] = useState<Baseline | null>(null);
+  const [creatingBaseline, setCreatingBaseline] = useState(false);
+  const [newBaselineName, setNewBaselineName] = useState("");
+
+  // Corpus state (for creating new baseline)
+  const [corpusDocuments, setCorpusDocuments] = useState<ValidationDocument[]>([]);
+  const [selectedDocIds, setSelectedDocIds] = useState<Set<string>>(new Set());
+  const [showCorpusSelect, setShowCorpusSelect] = useState(false);
+
+  // Run state
+  const [isRunning, setIsRunning] = useState(false);
+  const [runProgress, setRunProgress] = useState({ phase: "", current: 0, total: 0 });
+
+  // Run state (for tracking current run to auto-select after completion)
+  const [currentRunId, setCurrentRunId] = useState<string | null>(null);
+
+  // History state
+  const [validationRuns, setValidationRuns] = useState<ValidationRunSummary[]>([]);
+  const [selectedRunId, setSelectedRunId] = useState<string | null>(null);
+  const [selectedRunDetail, setSelectedRunDetail] = useState<{
+    id: string;
+    name: string | null;
+    status: string;
+    summary: string | null;
+    createdAt: Date;
+    baseline: { id: string; name: string };
+    snapshots: Array<{
+      id: string;
+      status: string;
+      keptCount: number;
+      newCount: number;
+      lostCount: number;
+      documentId: string;
+      documentTitle: string;
+      comparisonData: unknown;
+    }>;
+  } | null>(null);
+  const [selectedSnapshotId, setSelectedSnapshotId] = useState<string | null>(null);
+  const [selectedCommentKey, setSelectedCommentKey] = useState<string | null>(null);
+
+  // Keyboard handling
+  useInput((input, key) => {
+    if (key.escape) {
+      if (selectedCommentKey) {
+        setSelectedCommentKey(null);
+      } else if (selectedSnapshotId) {
+        setSelectedSnapshotId(null);
+      } else if (selectedRunDetail) {
+        setSelectedRunDetail(null);
+        setSelectedRunId(null);
+      } else if (creatingBaseline) {
+        setCreatingBaseline(false);
+        setShowCorpusSelect(false);
+      } else if (activeTab !== "baselines") {
+        setActiveTab("baselines");
+      } else {
+        onBack();
+      }
+    }
+    if (key.tab && !creatingBaseline) {
+      setActiveTab((prev) => {
+        if (prev === "baselines") return "run";
+        if (prev === "run") return "history";
+        return "baselines";
+      });
+    }
+  });
+
+  // Load agents on mount
+  useEffect(() => {
+    loadAgents();
+  }, []);
+
+  // Load baselines when agent selected
+  useEffect(() => {
+    if (selectedAgent) {
+      loadBaselines(selectedAgent.id);
+      loadCorpus(selectedAgent.id);
+    }
+  }, [selectedAgent?.id]);
+
+  // Load validation runs when baseline selected
+  useEffect(() => {
+    if (selectedBaseline) {
+      loadValidationRuns(selectedBaseline.id);
+    }
+  }, [selectedBaseline?.id]);
+
+  async function loadAgents() {
+    try {
+      setLoading(true);
+      const { prisma } = await import("@roast/db");
+      const fallacyAgents = await prisma.agent.findMany({
+        where: {
+          isDeprecated: false,
+          ephemeralBatchId: null,
+          versions: {
+            some: {
+              pluginIds: { has: "fallacy-check" },
+            },
+          },
+        },
+        include: {
+          versions: {
+            orderBy: { version: "desc" },
+            take: 1,
+            select: { name: true, version: true },
+          },
+        },
+        take: 20,
+      });
+
+      const agentChoices: AgentChoice[] = fallacyAgents
+        .filter((a) => a.versions.length > 0)
+        .map((a) => ({
+          id: a.id,
+          name: a.versions[0].name,
+          version: a.versions[0].version,
+        }));
+
+      setAgents(agentChoices);
+      if (agentChoices.length > 0) {
+        setSelectedAgent(agentChoices[0]);
+      }
+      setLoading(false);
+    } catch (e) {
+      setError(String(e));
+      setLoading(false);
+    }
+  }
+
+  async function loadBaselines(agentId: string) {
+    try {
+      const data = await metaEvaluationRepository.getValidationBaselines(agentId);
+      setBaselines(data);
+      if (data.length > 0 && !selectedBaseline) {
+        setSelectedBaseline(data[0]);
+      }
+    } catch (e) {
+      setError(String(e));
+    }
+  }
+
+  async function loadCorpus(agentId: string, filter?: string) {
+    try {
+      const docs = await metaEvaluationRepository.getValidationCorpusDocuments(
+        agentId,
+        { limit: 50, minContentLength: 200, filter }
+      );
+      setCorpusDocuments(docs);
+      // Only reset selection on initial load, not on filter changes
+      if (!filter) {
+        setSelectedDocIds(new Set());
+      }
+    } catch (e) {
+      setError(String(e));
+    }
+  }
+
+  async function loadValidationRuns(baselineId: string) {
+    try {
+      const runs = await metaEvaluationRepository.getValidationRuns(baselineId);
+      setValidationRuns(runs);
+    } catch (e) {
+      setError(String(e));
+    }
+  }
+
+  async function loadRunDetail(runId: string) {
+    try {
+      setLoading(true);
+      const detail = await metaEvaluationRepository.getValidationRunDetail(runId);
+      setSelectedRunDetail(detail);
+      setLoading(false);
+    } catch (e) {
+      setError(String(e));
+      setLoading(false);
+    }
+  }
+
+  async function createBaseline() {
+    if (!selectedAgent || !newBaselineName.trim()) return;
+
+    if (selectedDocIds.size === 0) return;
+
+    try {
+      setLoading(true);
+
+      // Get current evaluation version IDs for selected documents
+      const snapshots = await metaEvaluationRepository.getEvaluationSnapshots(
+        Array.from(selectedDocIds),
+        selectedAgent.id
+      );
+
+      const result = await metaEvaluationRepository.createValidationBaseline({
+        name: newBaselineName.trim(),
+        agentId: selectedAgent.id,
+        evaluationVersionIds: snapshots.map((s) => s.evaluationVersionId),
+      });
+
+      // Reload baselines
+      await loadBaselines(selectedAgent.id);
+
+      // Select the new baseline
+      const newBaseline = baselines.find((b) => b.id === result.id);
+      if (newBaseline) setSelectedBaseline(newBaseline);
+
+      setCreatingBaseline(false);
+      setShowCorpusSelect(false);
+      setNewBaselineName("");
+      setLoading(false);
+    } catch (e) {
+      setError(String(e));
+      setLoading(false);
+    }
+  }
+
+  async function deleteBaseline(baselineId: string) {
+    try {
+      await metaEvaluationRepository.deleteValidationBaseline(baselineId);
+      if (selectedAgent) {
+        await loadBaselines(selectedAgent.id);
+      }
+      if (selectedBaseline?.id === baselineId) {
+        setSelectedBaseline(baselines[0] || null);
+      }
+    } catch (e) {
+      setError(String(e));
+    }
+  }
+
+  async function runValidation() {
+    if (!selectedAgent || !selectedBaseline) return;
+
+    setIsRunning(true);
+    setActiveTab("run");
+    setCurrentRunId(null);
+
+    let runId: string | null = null;
+
+    try {
+      // Phase 1: Create validation run record
+      setRunProgress({ phase: "Creating run...", current: 0, total: 0 });
+      const run = await metaEvaluationRepository.createValidationRun({
+        baselineId: selectedBaseline.id,
+        name: `Run ${new Date().toLocaleString()}`,
+      });
+      runId = run.id;
+      setCurrentRunId(runId);
+
+      // Phase 2: Get baseline snapshots
+      setRunProgress({ phase: "Loading baseline...", current: 0, total: 0 });
+      const baselineSnapshots = await metaEvaluationRepository.getBaselineSnapshots(selectedBaseline.id);
+
+      if (baselineSnapshots.length === 0) {
+        throw new Error("Baseline has no snapshots");
+      }
+
+      // Phase 3: Run pipeline on documents
+      setRunProgress({ phase: "Running pipeline...", current: 0, total: baselineSnapshots.length });
+      const documentIds = [...new Set(baselineSnapshots.map((s) => s.documentId))];
+
+      // Create batch jobs
+      const jobIds = await onCreateBatch(selectedAgent.id, documentIds);
+
+      // Phase 4: Wait for jobs to complete and get results
+      setRunProgress({ phase: "Waiting for jobs...", current: 0, total: jobIds.length });
+
+      // Poll for job completion
+      const { prisma } = await import("@roast/db");
+      let completed = 0;
+      const maxWaitMs = 5 * 60 * 1000; // 5 minutes
+      const startTime = Date.now();
+
+      while (completed < jobIds.length && Date.now() - startTime < maxWaitMs) {
+        await new Promise((r) => setTimeout(r, 2000)); // Poll every 2s
+
+        const jobs = await prisma.job.findMany({
+          where: { id: { in: jobIds } },
+          select: { id: true, status: true, evaluationVersionId: true },
+        });
+
+        completed = jobs.filter((j) => j.status === "COMPLETED" || j.status === "FAILED").length;
+        setRunProgress({ phase: "Waiting for jobs...", current: completed, total: jobIds.length });
+      }
+
+      // Phase 5: Get new evaluation versions and compare
+      setRunProgress({ phase: "Comparing results...", current: 0, total: baselineSnapshots.length });
+
+      const jobs = await prisma.job.findMany({
+        where: { id: { in: jobIds }, status: "COMPLETED" },
+        select: { evaluationVersionId: true },
+      });
+
+      const newVersionIds = jobs
+        .map((j) => j.evaluationVersionId)
+        .filter((id): id is string => id !== null);
+
+      // Get new snapshots
+      const newSnapshots = await Promise.all(
+        newVersionIds.map((id) => metaEvaluationRepository.getEvaluationSnapshotById(id))
+      );
+
+      // Compare and save results
+      const results: DocumentComparisonResult[] = [];
+      let unchangedCount = 0;
+      let changedCount = 0;
+
+      for (const baselineSnapshot of baselineSnapshots) {
+        const newSnapshot = newSnapshots.find(
+          (s) => s && s.documentId === baselineSnapshot.documentId
+        );
+
+        if (newSnapshot) {
+          const baselineEval = toEvaluationSnapshot(baselineSnapshot);
+          const currentEval = toEvaluationSnapshot(newSnapshot);
+          const comparison = compareSnapshots(baselineEval, currentEval);
+          results.push(comparison);
+
+          // Save snapshot result to database
+          const baselineSnapshotRecord = await metaEvaluationRepository.getBaselineSnapshotByDocument(
+            selectedBaseline.id,
+            baselineSnapshot.documentId
+          );
+
+          if (baselineSnapshotRecord && runId) {
+            const status = comparison.newComments.length === 0 && comparison.lostComments.length === 0
+              ? "unchanged"
+              : "changed";
+
+            if (status === "unchanged") unchangedCount++;
+            else changedCount++;
+
+            await metaEvaluationRepository.addValidationRunSnapshot({
+              runId,
+              baselineSnapshotId: baselineSnapshotRecord.id,
+              newEvaluationId: newSnapshot.evaluationVersionId,
+              status,
+              keptCount: comparison.matchedComments.length,
+              newCount: comparison.newComments.length,
+              lostCount: comparison.lostComments.length,
+              comparisonData: {
+                matchedComments: comparison.matchedComments,
+                newComments: comparison.newComments,
+                lostComments: comparison.lostComments,
+                // Include filter reasoning from the current run's telemetry
+                filteredItems: currentEval.pipelineTelemetry?.filteredItems,
+                // Include extraction phase telemetry for drill-down
+                extractionPhase: currentEval.pipelineTelemetry?.extractionPhase,
+                // Include pipeline counts for accurate math display
+                pipelineCounts: currentEval.pipelineTelemetry ? {
+                  issuesAfterDedup: currentEval.pipelineTelemetry.issuesAfterDedup,
+                  issuesAfterFiltering: currentEval.pipelineTelemetry.issuesAfterFiltering,
+                  commentsGenerated: currentEval.pipelineTelemetry.commentsGenerated,
+                  commentsKept: currentEval.pipelineTelemetry.commentsKept,
+                } : undefined,
+              },
+            });
+          }
+        }
+
+        setRunProgress((p) => ({ ...p, current: p.current + 1 }));
+      }
+
+      // Update run status
+      if (runId) {
+        const summary = `${unchangedCount} unchanged, ${changedCount} changed`;
+        await metaEvaluationRepository.updateValidationRunStatus(runId, "completed", summary);
+      }
+
+      // Reload runs list and navigate to history
+      if (selectedBaseline) {
+        await loadValidationRuns(selectedBaseline.id);
+      }
+
+      // Navigate to history and auto-load the run detail
+      setActiveTab("history");
+      if (runId) {
+        setSelectedRunId(runId);
+        await loadRunDetail(runId);
+      }
+    } catch (e) {
+      // Mark run as failed if it was created
+      if (runId) {
+        await metaEvaluationRepository.updateValidationRunStatus(runId, "failed", String(e));
+      }
+      setError(String(e));
+    } finally {
+      setIsRunning(false);
+    }
+  }
+
+  // Render tabs header
+  const renderTabs = () => (
+    <Box marginBottom={1}>
+      <Text bold={activeTab === "baselines"} color={activeTab === "baselines" ? "cyan" : "gray"}>
+        [Baselines]
+      </Text>
+      <Text> </Text>
+      <Text bold={activeTab === "run"} color={activeTab === "run" ? "yellow" : "gray"}>
+        [Run]
+      </Text>
+      <Text> </Text>
+      <Text bold={activeTab === "history"} color={activeTab === "history" ? "magenta" : "gray"}>
+        [History]
+      </Text>
+      <Text dimColor>  (Tab to switch)</Text>
+    </Box>
+  );
+
+  if (error) {
+    return (
+      <ScreenContainer title="Validation" borderColor="red" height={height}>
+        <Text color="red">Error: {error}</Text>
+        <Text dimColor>Press Escape to go back</Text>
+      </ScreenContainer>
+    );
+  }
+
+  if (loading) {
+    return (
+      <ScreenContainer title="Validation" borderColor="magenta" height={height}>
+        <Text><Spinner type="dots" /> Loading...</Text>
+      </ScreenContainer>
+    );
+  }
+
+  // Creating baseline - corpus selection using DocumentSelector
+  if (creatingBaseline && showCorpusSelect) {
+    // Convert ValidationDocument[] to DocumentChoice[] format
+    const documentsForSelector = corpusDocuments.map((d) => ({
+      id: d.documentId,
+      title: d.title,
+      createdAt: d.lastEvaluatedAt || new Date(),
+    }));
+
+    return (
+      <DocumentSelector
+        title={`New Baseline: ${newBaselineName}`}
+        subtitle="Select documents to include in baseline"
+        borderColor="cyan"
+        height={height}
+        maxItems={maxItems}
+        documents={documentsForSelector}
+        multiSelect={true}
+        selectedIds={selectedDocIds}
+        onSelectionChange={setSelectedDocIds}
+        showFilter={true}
+        onFilterChange={(f) => selectedAgent && loadCorpus(selectedAgent.id, f)}
+        confirmLabel="Create Baseline"
+        onConfirm={() => createBaseline()}
+        onCancel={() => {
+          setShowCorpusSelect(false);
+          setCreatingBaseline(false);
+        }}
+      />
+    );
+  }
+
+  // Creating baseline - name input
+  if (creatingBaseline) {
+    return (
+      <ScreenContainer title="Create New Baseline" borderColor="cyan" height={height}>
+        <InfoBox>
+          <Text>Enter a name for this baseline (e.g., "Pre-refactor", "v2.0")</Text>
+        </InfoBox>
+
+        <Box marginY={1}>
+          <Text>Name: </Text>
+          <TextInput
+            value={newBaselineName}
+            onChange={setNewBaselineName}
+            onSubmit={() => {
+              if (newBaselineName.trim()) {
+                setShowCorpusSelect(true);
+              }
+            }}
+          />
+        </Box>
+
+        <Box marginTop={1}>
+          <Text dimColor>Enter Continue | Escape Cancel</Text>
+        </Box>
+      </ScreenContainer>
+    );
+  }
+
+  // Run tab
+  if (activeTab === "run") {
+    return (
+      <ScreenContainer title="Run Validation" borderColor="yellow" height={height}>
+        {renderTabs()}
+
+        {isRunning ? (
+          <Box flexDirection="column" alignItems="center" marginTop={2}>
+            <Text><Spinner type="dots" /> {runProgress.phase}</Text>
+            {runProgress.total > 0 && (
+              <Text color="yellow">{runProgress.current}/{runProgress.total}</Text>
+            )}
+          </Box>
+        ) : selectedBaseline ? (
+          <Box flexDirection="column">
+            <InfoBox>
+              <Text>
+                Baseline: <Text color="cyan">{sanitizeName(selectedBaseline.name)}</Text>
+                {" "}({selectedBaseline.snapshotCount} docs)
+              </Text>
+            </InfoBox>
+
+            <SelectInput
+              items={[
+                { label: `▶ Run Pipeline & Compare`, value: "run" },
+                { label: "← Back to Baselines", value: "back" },
+              ]}
+              onSelect={(item) => {
+                if (item.value === "run") runValidation();
+                else setActiveTab("baselines");
+              }}
+            />
+          </Box>
+        ) : (
+          <Box flexDirection="column">
+            <Text color="yellow">No baseline selected. Create or select one first.</Text>
+            <SelectInput
+              items={[{ label: "← Back to Baselines", value: "back" }]}
+              onSelect={() => setActiveTab("baselines")}
+            />
+          </Box>
+        )}
+      </ScreenContainer>
+    );
+  }
+
+  // Comment detail view
+  if (selectedRunDetail && selectedSnapshotId && selectedCommentKey) {
+    const snapshot = selectedRunDetail.snapshots.find((s) => s.id === selectedSnapshotId);
+    if (snapshot) {
+      const data = snapshot.comparisonData as {
+        matchedComments?: Array<{ baselineComment?: { quotedText: string; header: string | null; description: string }; currentComment?: { quotedText: string; header: string | null; description: string } }>;
+        newComments?: Array<{ quotedText: string; header: string | null; description: string }>;
+        lostComments?: Array<{ quotedText: string; header: string | null; description: string }>;
+        filteredItems?: Array<{ stage: string; quotedText: string; header?: string; filterReason: string; supportLocation?: string }>;
+      } | null;
+
+      const matched = data?.matchedComments || [];
+      const newComments = data?.newComments || [];
+      const lost = data?.lostComments || [];
+      const filteredItems = data?.filteredItems || [];
+
+      let commentType = "";
+      let baselineComment: { quotedText: string; header: string | null; description: string } | null = null;
+      let currentComment: { quotedText: string; header: string | null; description: string } | null = null;
+      let filterInfo: { stage: string; filterReason: string; supportLocation?: string } | null = null;
+
+      if (selectedCommentKey.startsWith("kept-")) {
+        const idx = parseInt(selectedCommentKey.replace("kept-", ""), 10);
+        const match = matched[idx];
+        baselineComment = match?.baselineComment || null;
+        currentComment = match?.currentComment || null;
+        commentType = "Kept";
+      } else if (selectedCommentKey.startsWith("new-")) {
+        const idx = parseInt(selectedCommentKey.replace("new-", ""), 10);
+        currentComment = newComments[idx] || null;
+        commentType = "New";
+      } else if (selectedCommentKey.startsWith("lost-")) {
+        const idx = parseInt(selectedCommentKey.replace("lost-", ""), 10);
+        baselineComment = lost[idx] || null;
+        commentType = "Lost";
+
+        // Try to find filter reason for this lost comment
+        if (baselineComment && filteredItems.length > 0) {
+          // Match by quoted text (fuzzy match - check if texts contain each other)
+          const matchingFilter = filteredItems.find((f) => {
+            const fText = f.quotedText.toLowerCase().trim();
+            const bText = baselineComment!.quotedText.toLowerCase().trim();
+            // Check if either contains the other (for partial matches)
+            return fText.includes(bText) || bText.includes(fText) ||
+              // Also check header match as fallback
+              (f.header && baselineComment!.header && f.header.toLowerCase() === baselineComment!.header.toLowerCase());
+          });
+
+          if (matchingFilter) {
+            filterInfo = {
+              stage: matchingFilter.stage,
+              filterReason: matchingFilter.filterReason,
+              supportLocation: matchingFilter.supportLocation,
+            };
+          }
+        }
+      } else if (selectedCommentKey.startsWith("filtered-")) {
+        // Show filtered item detail view
+        const idx = parseInt(selectedCommentKey.replace("filtered-", ""), 10);
+        const filteredItem = filteredItems[idx];
+        if (filteredItem) {
+          const stageName = filteredItem.stage === 'supported-elsewhere-filter'
+            ? 'Supported Elsewhere Filter'
+            : filteredItem.stage === 'review'
+            ? 'Review Filter'
+            : filteredItem.stage;
+          return (
+            <ScreenContainer title={`Filtered Issue (${stageName})`} borderColor="magenta" height={height}>
+              <Box flexDirection="column" paddingX={1} overflowY="hidden">
+                <Box marginBottom={1}>
+                  <Text bold color="magenta">{filteredItem.header || "(no header)"}</Text>
+                </Box>
+
+                <Box marginBottom={1} borderStyle="single" borderColor="gray" paddingX={1} flexDirection="column">
+                  <Text bold>Quoted Text:</Text>
+                  <Text wrap="wrap">{filteredItem.quotedText}</Text>
+                </Box>
+
+                <Box marginBottom={1} borderStyle="single" borderColor="magenta" paddingX={1} flexDirection="column">
+                  <Text bold color="magenta">Why Filtered:</Text>
+                  <Text wrap="wrap">{filteredItem.filterReason}</Text>
+                </Box>
+
+                {filteredItem.supportLocation && (
+                  <Box marginBottom={1} borderStyle="single" borderColor="cyan" paddingX={1} flexDirection="column">
+                    <Text bold color="cyan">Support Found At:</Text>
+                    <Text wrap="wrap">{filteredItem.supportLocation}</Text>
+                  </Box>
+                )}
+
+                <Box marginTop={1}>
+                  <SelectInput
+                    items={[{ label: "← Back to Comments", value: "back" }]}
+                    onSelect={() => setSelectedCommentKey(null)}
+                  />
+                </Box>
+              </Box>
+            </ScreenContainer>
+          );
+        }
+      }
+
+      if (baselineComment || currentComment) {
+        const typeColor = commentType === "Kept" ? "green" : commentType === "New" ? "cyan" : "red";
+
+        // For Kept comments, show both versions side by side
+        if (commentType === "Kept" && baselineComment && currentComment) {
+          return (
+            <ScreenContainer title="Matched Comment (in both baseline & current)" borderColor="green" height={height}>
+              <Box flexDirection="column" paddingX={1} overflowY="hidden">
+                <Box marginBottom={1}>
+                  <Text bold color="green">{baselineComment.header || currentComment.header || "(no header)"}</Text>
+                </Box>
+
+                <Box marginBottom={1} flexDirection="column">
+                  <Text dimColor bold>BASELINE:</Text>
+                  <Text color="gray" wrap="wrap">"{baselineComment.quotedText}"</Text>
+                  <Text dimColor wrap="wrap">{baselineComment.description}</Text>
+                </Box>
+
+                <Box flexDirection="column">
+                  <Text dimColor bold>CURRENT:</Text>
+                  <Text color="yellow" wrap="wrap">"{currentComment.quotedText}"</Text>
+                  <Text wrap="wrap">{currentComment.description}</Text>
+                </Box>
+              </Box>
+
+              <Box marginTop={1}>
+                <SelectInput
+                  items={[{ label: "← Back to Comments", value: "back" }]}
+                  onSelect={() => setSelectedCommentKey(null)}
+                />
+              </Box>
+            </ScreenContainer>
+          );
+        }
+
+        // For Lost comments with filter reason, show detailed view
+        if (commentType === "Lost" && baselineComment && filterInfo) {
+          return (
+            <ScreenContainer title="Missing from Current (pipeline-filtered)" borderColor="red" height={height}>
+              <Box flexDirection="column" paddingX={1} overflowY="hidden">
+                <Box marginBottom={1}>
+                  <Text bold color="red">{baselineComment.header || "(no header)"}</Text>
+                </Box>
+
+                <Box marginBottom={1} flexDirection="column">
+                  <Text dimColor>Quoted text (from baseline):</Text>
+                  <Text color="yellow" wrap="wrap">"{baselineComment.quotedText}"</Text>
+                </Box>
+
+                <Box marginBottom={1} flexDirection="column">
+                  <Text dimColor>Description:</Text>
+                  <Text wrap="wrap">{baselineComment.description}</Text>
+                </Box>
+
+                <Box marginTop={1} borderStyle="single" borderColor="magenta" paddingX={1} flexDirection="column">
+                  <Text bold color="magenta">Filter Reason ({filterInfo.stage}):</Text>
+                  <Text wrap="wrap">{filterInfo.filterReason}</Text>
+                  {filterInfo.supportLocation && (
+                    <Box marginTop={1}>
+                      <Text dimColor>Support found at: </Text>
+                      <Text color="cyan" wrap="wrap">{filterInfo.supportLocation}</Text>
+                    </Box>
+                  )}
+                </Box>
+              </Box>
+
+              <Box marginTop={1}>
+                <SelectInput
+                  items={[{ label: "← Back to Comments", value: "back" }]}
+                  onSelect={() => setSelectedCommentKey(null)}
+                />
+              </Box>
+            </ScreenContainer>
+          );
+        }
+
+        // For New/Lost (without filter reason), show single version with label
+        const comment = currentComment || baselineComment;
+        const versionLabel = commentType === "New" ? "- new vs baseline" : "- in baseline only";
+
+        return (
+          <ScreenContainer title={`${commentType} Comment ${versionLabel}`} borderColor={typeColor} height={height}>
+            <Box flexDirection="column" paddingX={1} overflowY="hidden">
+              <Box marginBottom={1}>
+                <Text bold color={typeColor}>{comment!.header || "(no header)"}</Text>
+              </Box>
+
+              <Box marginBottom={1} flexDirection="column">
+                <Text dimColor>Quoted text:</Text>
+                <Text color="yellow" wrap="wrap">"{comment!.quotedText}"</Text>
+              </Box>
+
+              <Box flexDirection="column">
+                <Text dimColor>Description:</Text>
+                <Text wrap="wrap">{comment!.description}</Text>
+              </Box>
+
+              {commentType === "Lost" && !filterInfo && (
+                <Box marginTop={1} borderStyle="single" borderColor="gray" paddingX={1} flexDirection="column">
+                  <Text bold color="gray">Why is this missing from the current run?</Text>
+                  <Text wrap="wrap">
+                    {data?.filteredItems !== undefined
+                      ? "The LLM extractors did not detect this issue in the current run. This is normal variance between runs - LLMs don't always find the same issues."
+                      : "No telemetry available for this run (run predates telemetry feature)."}
+                  </Text>
+                </Box>
+              )}
+            </Box>
+
+            <Box marginTop={1}>
+              <SelectInput
+                items={[{ label: "← Back to Comments", value: "back" }]}
+                onSelect={() => setSelectedCommentKey(null)}
+              />
+            </Box>
+          </ScreenContainer>
+        );
+      }
+    }
+  }
+
+  // Document comparison detail view
+  if (selectedRunDetail && selectedSnapshotId) {
+    const snapshot = selectedRunDetail.snapshots.find((s) => s.id === selectedSnapshotId);
+    if (snapshot) {
+      const data = snapshot.comparisonData as {
+        matchedComments?: Array<{ baselineComment?: { quotedText: string; header: string | null }; currentComment?: { quotedText: string; header: string | null } }>;
+        newComments?: Array<{ quotedText: string; header: string | null; description: string }>;
+        lostComments?: Array<{ quotedText: string; header: string | null; description: string }>;
+        filteredItems?: Array<{ stage: string; quotedText: string; header?: string; filterReason: string; supportLocation?: string }>;
+        extractionPhase?: {
+          multiExtractorEnabled: boolean;
+          extractors: Array<{
+            extractorId: string;
+            model: string;
+            temperature: number;
+            temperatureConfig?: number | 'default';
+            thinkingEnabled: boolean;
+            issuesFound: number;
+            durationMs: number;
+            error?: string;
+          }>;
+          totalIssuesBeforeJudge: number;
+          totalIssuesAfterJudge: number;
+          judgeModel?: string;
+          judgeDurationMs?: number;
+          judgeDecisions: Array<{
+            issueText: string;
+            decision: 'accepted' | 'merged' | 'rejected';
+            reasoning: string;
+            sourceExtractors: string[];
+          }>;
+        };
+        pipelineCounts?: {
+          issuesAfterDedup: number;
+          issuesAfterFiltering: number;
+          commentsGenerated: number;
+          commentsKept: number;
+        };
+      } | null;
+
+      const matched = data?.matchedComments || [];
+      const newComments = data?.newComments || [];
+      const lost = data?.lostComments || [];
+      const filteredItems = data?.filteredItems || [];
+      const extractionPhase = data?.extractionPhase;
+      const pipelineCounts = data?.pipelineCounts;
+
+      // Helper to check if a lost comment has a filter reason
+      const hasFilterReason = (lostComment: { quotedText: string; header: string | null }) => {
+        if (filteredItems.length === 0) return false;
+        return filteredItems.some((f) => {
+          const fText = f.quotedText.toLowerCase().trim();
+          const lText = lostComment.quotedText.toLowerCase().trim();
+          return fText.includes(lText) || lText.includes(fText) ||
+            (f.header && lostComment.header && f.header.toLowerCase() === lostComment.header.toLowerCase());
+        });
+      };
+
+      // Build scrollable list of ALL comments - no truncation
+      const commentItems: Array<{ label: string; value: string }> = [];
+
+      // Add items grouped by category
+      if (matched.length > 0) {
+        matched.forEach((c, i) => {
+          const comment = c.baselineComment || c.currentComment;
+          const label = comment ? (comment.header || truncate(comment.quotedText, 50)) : "Unknown";
+          commentItems.push({
+            label: `= ${label}`,
+            value: `kept-${i}`,
+          });
+        });
+      }
+
+      if (newComments.length > 0) {
+        newComments.forEach((c, i) => {
+          commentItems.push({
+            label: `+ ${c.header || truncate(c.quotedText, 50)}`,
+            value: `new-${i}`,
+          });
+        });
+      }
+
+      if (lost.length > 0) {
+        lost.forEach((c, i) => {
+          const hasReason = hasFilterReason(c);
+          const suffix = hasReason ? " [filtered]" : "";
+          // Use truncated quotedText for consistency with filtered items
+          commentItems.push({
+            label: `- ${truncate(c.quotedText, 50)}${suffix}`,
+            value: `lost-${i}`,
+          });
+        });
+      }
+
+      // Build filtered items list separately
+      const filteredItemsList: Array<{ label: string; value: string }> = [];
+      if (filteredItems.length > 0) {
+        filteredItemsList.push({ label: "--- Filtered by pipeline ---", value: "sep-filtered" });
+        filteredItems.forEach((f, i) => {
+          const stageLabel = f.stage === 'supported-elsewhere-filter' ? 'F' : f.stage === 'review' ? 'R' : '?';
+          filteredItemsList.push({
+            label: `[${stageLabel}] ${truncate(f.quotedText, 50)}`,
+            value: `filtered-${i}`,
+          });
+        });
+      }
+
+      if (commentItems.length === 0) {
+        commentItems.push({ label: "  No comments in this comparison", value: "empty" });
+      }
+
+      commentItems.push({ label: "  ← Back", value: "back" });
+
+      // Count lost with filter reasons
+      const lostWithReason = lost.filter((c) => hasFilterReason(c)).length;
+
+      // Calculate totals
+      const baselineTotal = matched.length + lost.length;
+      const currentTotal = matched.length + newComments.length;
+
+      // Determine if there are any differences
+      const isUnchanged = lost.length === 0 && newComments.length === 0;
+
+      return (
+        <ScreenContainer title={truncate(snapshot.documentTitle, 50)} borderColor="blue" height={height}>
+          <Box marginBottom={1} paddingX={1} flexDirection="column">
+            {/* Summary counts */}
+            <Box marginBottom={1}>
+              <Text>
+                <Text dimColor>Baseline: </Text>
+                <Text bold>{baselineTotal} issues</Text>
+                <Text dimColor> → Current run: </Text>
+                <Text bold>{currentTotal} issues</Text>
+              </Text>
+            </Box>
+
+            {/* Comparison: what changed between baseline and current */}
+            <Box flexDirection="column" borderStyle="single" borderColor="gray" paddingX={1} marginBottom={1}>
+              <Text bold>Comparison:</Text>
+              <Text>
+                <Text color="green">✓ {matched.length} issues appear in BOTH baseline and current</Text>
+              </Text>
+              <Text>
+                <Text color="cyan">+ {newComments.length} issues are NEW (in current run, not in baseline)</Text>
+              </Text>
+              <Text>
+                <Text color="red">− {lost.length} issues are GONE (were in baseline, not in current run)</Text>
+              </Text>
+            </Box>
+
+            {/* Current run details: extraction → filter → review */}
+            {(extractionPhase || pipelineCounts) && (() => {
+              // Count filtered items by stage
+              const supportedElsewhereCount = filteredItems.filter(f => f.stage === 'supported-elsewhere-filter').length;
+              const reviewFilteredCount = filteredItems.filter(f => f.stage === 'review').length;
+
+              // Use actual pipeline counts when available (pipelineCounts is source of truth)
+              const afterDedup = pipelineCounts?.issuesAfterDedup ?? extractionPhase?.totalIssuesAfterJudge;
+              const afterFilter = pipelineCounts?.issuesAfterFiltering;
+              const commentsGenerated = pipelineCounts?.commentsGenerated;
+              const commentsKept = pipelineCounts?.commentsKept;
+
+              // Calculate what was filtered at each stage
+              const filteredBySupported = afterDedup !== undefined && afterFilter !== undefined ? afterDedup - afterFilter : supportedElsewhereCount;
+              const filteredByGeneration = afterFilter !== undefined && commentsGenerated !== undefined ? afterFilter - commentsGenerated : 0;
+              const filteredByReview = commentsGenerated !== undefined && commentsKept !== undefined ? commentsGenerated - commentsKept : reviewFilteredCount;
+
+              return (
+                <Box marginBottom={1} flexDirection="column">
+                  <Text bold>Current run details:</Text>
+                  {extractionPhase && (
+                    <>
+                      <Text dimColor>
+                        Extraction: {extractionPhase.extractors?.length || 0} models → {extractionPhase.totalIssuesBeforeJudge} issues → dedup → {afterDedup}
+                      </Text>
+                      {extractionPhase.extractors && extractionPhase.extractors.length > 0 && (
+                        <Text dimColor>
+                          {"  "}({extractionPhase.extractors.map(e => `${e.model.split('/').pop()}: ${e.issuesFound}`).join(', ')})
+                        </Text>
+                      )}
+                    </>
+                  )}
+                  {filteredBySupported > 0 && (
+                    <Text dimColor>
+                      Filter: {filteredBySupported} removed (supported elsewhere) → {afterFilter}
+                    </Text>
+                  )}
+                  {filteredByGeneration > 0 && (
+                    <Text dimColor>
+                      Comment gen: {filteredByGeneration} failed (empty/invalid) → {commentsGenerated}
+                    </Text>
+                  )}
+                  {filteredByReview > 0 && (
+                    <Text dimColor>
+                      Review: {filteredByReview} removed (redundant/low-value) → {commentsKept}
+                    </Text>
+                  )}
+                  <Text dimColor>
+                    Result: {commentsKept ?? currentTotal} comments kept
+                  </Text>
+                </Box>
+              );
+            })()}
+
+            {/* Simple status - no judgments, just facts */}
+            {isUnchanged && (
+              <Text color="green">✓ No differences</Text>
+            )}
+          </Box>
+
+          <SelectInput
+            items={[...filteredItemsList, ...commentItems]}
+            limit={maxItems}
+            onSelect={(item) => {
+              if (item.value === "back") {
+                setSelectedSnapshotId(null);
+              } else if (item.value.startsWith("kept-") || item.value.startsWith("new-") || item.value.startsWith("lost-") || item.value.startsWith("filtered-")) {
+                setSelectedCommentKey(item.value);
+              }
+            }}
+          />
+
+          <Box marginTop={1}>
+            <Text dimColor>Enter View Comment | Escape Back to Run</Text>
+          </Box>
+        </ScreenContainer>
+      );
+    }
+  }
+
+  // Run detail view
+  if (selectedRunDetail) {
+    const formatChangeSummary = (s: { keptCount: number; newCount: number; lostCount: number }) => {
+      const parts: string[] = [];
+      if (s.keptCount > 0) parts.push(`${s.keptCount} matched`);
+      if (s.newCount > 0) parts.push(`+${s.newCount} new`);
+      if (s.lostCount > 0) parts.push(`-${s.lostCount} missing`);
+      return parts.length > 0 ? parts.join(", ") : "no comments";
+    };
+
+    const unchangedCount = selectedRunDetail.snapshots.filter((s) => s.status === "unchanged").length;
+    const changedCount = selectedRunDetail.snapshots.filter((s) => s.status === "changed").length;
+
+    const items = [
+      ...selectedRunDetail.snapshots.slice(0, maxItems - 3).map((s) => {
+        const icon = s.status === "unchanged" ? "=" : "~";
+        return {
+          label: `[${icon}] ${truncate(s.documentTitle, 35)} | ${formatChangeSummary(s)}`,
+          value: s.id,
+        };
+      }),
+      { label: "← Back to History", value: "back" },
+    ];
+
+    return (
+      <ScreenContainer title={`Run: ${selectedRunDetail.name || selectedRunDetail.id.slice(0, 8)}`} borderColor="magenta" height={height}>
+        <InfoBox>
+          <Text>
+            <Text color="green">[=] {unchangedCount} unchanged</Text>
+            {" | "}
+            <Text color="yellow">[~] {changedCount} changed</Text>
+            {" | "}
+            Baseline: <Text color="cyan">{sanitizeName(selectedRunDetail.baseline.name)}</Text>
+          </Text>
+        </InfoBox>
+
+        <SelectInput
+          items={items}
+          onSelect={(item) => {
+            if (item.value === "back") {
+              setSelectedRunDetail(null);
+              setSelectedRunId(null);
+            } else {
+              setSelectedSnapshotId(item.value);
+            }
+          }}
+        />
+
+        <Box marginTop={1}>
+          <Text dimColor>Enter View Comments | Escape Back to History</Text>
+        </Box>
+      </ScreenContainer>
+    );
+  }
+
+  // History tab
+  if (activeTab === "history") {
+    const formatDate = (d: Date) => {
+      return new Date(d).toLocaleString("en-US", {
+        month: "short",
+        day: "numeric",
+        hour: "2-digit",
+        minute: "2-digit",
+      });
+    };
+
+    const items = [
+      ...validationRuns.slice(0, maxItems - 3).map((r) => {
+        const statusIcon = r.status === "completed"
+          ? (r.changedCount === 0 ? "=" : "~")
+          : r.status === "running" ? "*" : "x";
+
+        return {
+          label: `[${statusIcon}] ${formatDate(r.createdAt)} | ${r.summary || r.status}`,
+          value: `view:${r.id}`,
+        };
+      }),
+      { label: "← Back to Baselines", value: "back" },
+    ];
+
+    return (
+      <ScreenContainer title="Validation - Run History" borderColor="magenta" height={height}>
+        {renderTabs()}
+
+        <InfoBox>
+          <Text>
+            Baseline: <Text color="cyan">{selectedBaseline ? sanitizeName(selectedBaseline.name) : "None"}</Text>
+            {" | "}
+            {validationRuns.length} run{validationRuns.length !== 1 ? "s" : ""}
+          </Text>
+        </InfoBox>
+
+        {validationRuns.length === 0 ? (
+          <Box marginY={1}>
+            <Text dimColor>No runs yet. Go to Run tab to execute a validation run.</Text>
+          </Box>
+        ) : (
+          <SelectInput
+            items={items}
+            onSelect={(item) => {
+              if (item.value === "back") {
+                setActiveTab("baselines");
+              } else if (item.value.startsWith("view:")) {
+                const runId = item.value.replace("view:", "");
+                setSelectedRunId(runId);
+                loadRunDetail(runId);
+              }
+            }}
+          />
+        )}
+
+        <Box marginTop={1}>
+          <Text dimColor>Enter View Details | Tab Switch | Escape Back</Text>
+        </Box>
+      </ScreenContainer>
+    );
+  }
+
+  // Baselines tab (default)
+  const items = [
+    { label: "+ Create New Baseline", value: "create" },
+    ...baselines.map((b) => ({
+      label: `${selectedBaseline?.id === b.id ? "● " : "○ "}${sanitizeName(b.name)} (${b.snapshotCount} docs)`,
+      value: `select:${b.id}`,
+    })),
+    ...(selectedBaseline ? [{ label: "- Delete Selected Baseline", value: "delete" }] : []),
+    { label: "← Back to Main Menu", value: "back" },
+  ];
+
+  return (
+    <ScreenContainer title="Validation - Baselines" borderColor="magenta" height={height}>
+      {renderTabs()}
+
+      <InfoBox>
+        <Text>
+          Agent: <Text color="cyan">{selectedAgent?.name || "None"}</Text>
+          {selectedBaseline && (
+            <>
+              {" | "}
+              Selected: <Text color="green">{sanitizeName(selectedBaseline.name)}</Text>
+            </>
+          )}
+        </Text>
+      </InfoBox>
+
+      <SelectInput
+        items={items}
+        onSelect={(item) => {
+          if (item.value === "back") {
+            onBack();
+          } else if (item.value === "create") {
+            setCreatingBaseline(true);
+            setNewBaselineName("");
+          } else if (item.value === "delete" && selectedBaseline) {
+            deleteBaseline(selectedBaseline.id);
+          } else if (item.value.startsWith("select:")) {
+            const baselineId = item.value.replace("select:", "");
+            const baseline = baselines.find((b) => b.id === baselineId);
+            if (baseline) setSelectedBaseline(baseline);
+          }
+        }}
+      />
+
+      <Box marginTop={1}>
+        <Text dimColor>Enter Select | Tab → Run | Escape Back</Text>
+      </Box>
+    </ScreenContainer>
+  );
+}
+
+/**
+ * Convert repository snapshot to EvaluationSnapshot type.
+ */
+function toEvaluationSnapshot(snapshot: {
+  evaluationVersionId: string;
+  agentId: string;
+  agentName: string;
+  createdAt: Date;
+  documentId: string;
+  documentTitle: string;
+  grade: number | null;
+  pipelineTelemetry: unknown;
+  comments: Array<{
+    id: string;
+    quotedText: string;
+    header: string | null;
+    description: string;
+    importance: number | null;
+    startOffset: number;
+    endOffset: number;
+  }>;
+}): EvaluationSnapshot {
+  return {
+    evaluationVersionId: snapshot.evaluationVersionId,
+    agentId: snapshot.agentId,
+    agentName: snapshot.agentName,
+    createdAt: snapshot.createdAt,
+    documentId: snapshot.documentId,
+    documentTitle: snapshot.documentTitle,
+    comments: snapshot.comments,
+    grade: snapshot.grade,
+    pipelineTelemetry: extractTelemetry(snapshot.pipelineTelemetry),
+  };
+}
+
+/**
+ * Extract pipeline telemetry snapshot from raw data.
+ */
+function extractTelemetry(raw: unknown): {
+  totalDurationMs: number;
+  issuesExtracted: number;
+  issuesAfterDedup: number;
+  issuesAfterFiltering: number;
+  commentsGenerated: number;
+  commentsKept: number;
+  filteredItems?: Array<{
+    stage: string;
+    quotedText: string;
+    header?: string;
+    filterReason: string;
+    supportLocation?: string;
+    originalIndex: number;
+  }>;
+  extractionPhase?: {
+    multiExtractorEnabled: boolean;
+    extractors: Array<{
+      extractorId: string;
+      model: string;
+      temperature: number;
+      temperatureConfig?: number | 'default';
+      thinkingEnabled: boolean;
+      issuesFound: number;
+      durationMs: number;
+      error?: string;
+    }>;
+    totalIssuesBeforeJudge: number;
+    totalIssuesAfterJudge: number;
+    judgeModel?: string;
+    judgeDurationMs?: number;
+    judgeDecisions: Array<{
+      issueText: string;
+      decision: 'accepted' | 'merged' | 'rejected';
+      reasoning: string;
+      sourceExtractors: string[];
+    }>;
+  };
+} | null {
+  if (!raw || typeof raw !== "object") return null;
+
+  const telemetry = raw as Record<string, unknown>;
+  const finalCounts = telemetry.finalCounts as Record<string, number> | undefined;
+
+  if (!finalCounts) return null;
+
+  // Extract filtered items if present
+  const filteredItems = telemetry.filteredItems as Array<{
+    stage: string;
+    quotedText: string;
+    header?: string;
+    filterReason: string;
+    supportLocation?: string;
+    originalIndex: number;
+  }> | undefined;
+
+  // Extract extraction phase telemetry if present
+  const extractionPhase = telemetry.extractionPhase as {
+    multiExtractorEnabled: boolean;
+    extractors: Array<{
+      extractorId: string;
+      model: string;
+      temperature: number;
+      temperatureConfig?: number | 'default';
+      thinkingEnabled: boolean;
+      issuesFound: number;
+      durationMs: number;
+      error?: string;
+    }>;
+    totalIssuesBeforeJudge: number;
+    totalIssuesAfterJudge: number;
+    judgeModel?: string;
+    judgeDurationMs?: number;
+    judgeDecisions: Array<{
+      issueText: string;
+      decision: 'accepted' | 'merged' | 'rejected';
+      reasoning: string;
+      sourceExtractors: string[];
+    }>;
+  } | undefined;
+
+  return {
+    totalDurationMs: (telemetry.totalDurationMs as number) || 0,
+    issuesExtracted: finalCounts.issuesExtracted || 0,
+    issuesAfterDedup: finalCounts.issuesAfterDedup || 0,
+    issuesAfterFiltering: finalCounts.issuesAfterFiltering || 0,
+    commentsGenerated: finalCounts.commentsGenerated || 0,
+    commentsKept: finalCounts.commentsKept || 0,
+    filteredItems,
+    extractionPhase,
+  };
+}
diff --git a/meta-evals/src/components/extractor-lab/fuzzy-dedup.ts b/meta-evals/src/components/extractor-lab/fuzzy-dedup.ts
new file mode 100644
index 000000000..81d154921
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/fuzzy-dedup.ts
@@ -0,0 +1,323 @@
+/**
+ * Fuzzy deduplication strategies for comparing extraction issues.
+ *
+ * Four strategies:
+ * 1. Exact - Normalized exact match
+ * 2. Jaccard - Word overlap similarity
+ * 3. Fuse.js - Fuzzy search with Bitap algorithm
+ * 4. uFuzzy - Lightweight fuzzy search
+ */
+
+import Fuse from "fuse.js";
+import uFuzzy from "@leeoniya/ufuzzy";
+import type {
+  ExtractorIssue,
+  DedupStrategy,
+  DedupComparison,
+  DuplicateMatch,
+  MultiStrategyDedupResult,
+} from "./types";
+
+// ============================================================================
+// Normalization
+// ============================================================================
+
+function normalizeText(text: string): string {
+  return text.toLowerCase().replace(/\s+/g, " ").trim();
+}
+
+function getWords(text: string): Set<string> {
+  return new Set(
+    normalizeText(text)
+      .split(/\s+/)
+      .filter((w) => w.length > 2)
+  );
+}
+
+// ============================================================================
+// Similarity Functions
+// ============================================================================
+
+/**
+ * Jaccard similarity: intersection over union of words
+ */
+export function jaccardSimilarity(a: string, b: string): number {
+  const wordsA = getWords(a);
+  const wordsB = getWords(b);
+
+  if (wordsA.size === 0 && wordsB.size === 0) return 1;
+  if (wordsA.size === 0 || wordsB.size === 0) return 0;
+
+  const intersection = [...wordsA].filter((w) => wordsB.has(w)).length;
+  const union = new Set([...wordsA, ...wordsB]).size;
+
+  return intersection / union;
+}
+
+/**
+ * Check if one text contains the other (after normalization)
+ */
+export function isSubstring(a: string, b: string): boolean {
+  const normA = normalizeText(a);
+  const normB = normalizeText(b);
+  return normA.includes(normB) || normB.includes(normA);
+}
+
+/**
+ * Fuse.js similarity score (0 = perfect match, 1 = no match)
+ */
+export function fuseSimilarity(a: string, b: string): number {
+  const fuse = new Fuse([{ text: b }], {
+    keys: ["text"],
+    includeScore: true,
+    threshold: 1.0, // Accept all results, we'll check score ourselves
+    ignoreLocation: true,
+    minMatchCharLength: 2,
+  });
+
+  const results = fuse.search(a);
+  if (results.length > 0 && results[0].score !== undefined) {
+    return results[0].score;
+  }
+  return 1;
+}
+
+/**
+ * uFuzzy similarity (returns 0-1, higher = more similar)
+ */
+export function ufuzzySimilarity(a: string, b: string): number {
+  const uf = new uFuzzy({
+    intraMode: 1,
+    intraIns: 1,
+    intraSub: 1,
+    intraTrn: 1,
+    intraDel: 1,
+  });
+
+  const haystack = [b];
+  const [idxs, info] = uf.search(haystack, a);
+
+  if (idxs && idxs.length > 0 && info && info.ranges[0]) {
+    const ranges = info.ranges[0];
+    let matchedChars = 0;
+    for (let i = 0; i < ranges.length; i += 2) {
+      matchedChars += ranges[i + 1] - ranges[i];
+    }
+    return matchedChars / Math.max(a.length, b.length);
+  }
+
+  return 0;
+}
+
+// ============================================================================
+// Deduplication Strategies
+// ============================================================================
+
+/**
+ * Calculate similarity between two issues using the specified strategy.
+ * Returns { isDuplicate, similarity } where similarity is 0-1 (higher = more similar)
+ */
+export function calculateSimilarity(
+  a: ExtractorIssue,
+  b: ExtractorIssue,
+  strategy: DedupStrategy,
+  threshold = 0.5
+): { isDuplicate: boolean; similarity: number } {
+  const textA = a.exactText;
+  const textB = b.exactText;
+
+  switch (strategy) {
+    case "exact": {
+      const isMatch = normalizeText(textA) === normalizeText(textB);
+      return { isDuplicate: isMatch, similarity: isMatch ? 1 : 0 };
+    }
+
+    case "jaccard": {
+      // Check substring first
+      if (isSubstring(textA, textB)) {
+        return { isDuplicate: true, similarity: 1 };
+      }
+      const sim = jaccardSimilarity(textA, textB);
+      return { isDuplicate: sim >= threshold, similarity: sim };
+    }
+
+    case "fuse": {
+      // Check substring first
+      if (isSubstring(textA, textB)) {
+        return { isDuplicate: true, similarity: 1 };
+      }
+      // Fuse score: 0 = perfect, 1 = no match. Convert to 0-1 similarity.
+      const fuseScore = fuseSimilarity(textA, textB);
+      const sim = 1 - fuseScore;
+      return { isDuplicate: fuseScore < 0.4, similarity: sim };
+    }
+
+    case "ufuzzy": {
+      // Check substring first
+      if (isSubstring(textA, textB)) {
+        return { isDuplicate: true, similarity: 1 };
+      }
+      const sim = ufuzzySimilarity(textA, textB);
+      return { isDuplicate: sim > threshold, similarity: sim };
+    }
+
+    default:
+      return { isDuplicate: false, similarity: 0 };
+  }
+}
+
+/**
+ * Compute a quality score for an issue.
+ * Higher = better quality (prefer to keep).
+ * Factors: text length (more context), severity, confidence, importance.
+ */
+function computeIssueQuality(issue: ExtractorIssue): number {
+  // Normalize text length (log scale to prevent extremely long texts from dominating)
+  const lengthScore = Math.log10(issue.exactText.length + 1) / 4; // ~0.5-1.0 for typical lengths
+
+  // Combine severity, confidence, importance (each 0-100, normalize to 0-1)
+  const severityNorm = issue.severityScore / 100;
+  const confidenceNorm = issue.confidenceScore / 100;
+  const importanceNorm = issue.importanceScore / 100;
+
+  // Weighted combination: prefer longer text, then higher scores
+  // Length is most important (40%), then confidence (25%), severity (20%), importance (15%)
+  return (
+    lengthScore * 0.4 +
+    confidenceNorm * 0.25 +
+    severityNorm * 0.2 +
+    importanceNorm * 0.15
+  );
+}
+
+/**
+ * Deduplicate issues using a specific strategy.
+ * Returns unique issues and duplicate matches with similarity info.
+ *
+ * When duplicates are found, KEEPS the issue with higher quality score
+ * (longer text + higher severity/confidence/importance).
+ */
+export function deduplicateWithStrategy(
+  issues: ExtractorIssue[],
+  strategy: DedupStrategy
+): DedupComparison {
+  const unique: ExtractorIssue[] = [];
+  const duplicates: DuplicateMatch[] = [];
+
+  for (const issue of issues) {
+    // Check if this issue is a duplicate of any already-kept issue
+    let bestMatch: { keptIdx: number; kept: ExtractorIssue; similarity: number } | null = null;
+
+    for (let i = 0; i < unique.length; i++) {
+      const kept = unique[i];
+      const { isDuplicate, similarity } = calculateSimilarity(issue, kept, strategy);
+      if (isDuplicate) {
+        if (!bestMatch || similarity > bestMatch.similarity) {
+          bestMatch = { keptIdx: i, kept, similarity };
+        }
+      }
+    }
+
+    if (bestMatch) {
+      // Found a duplicate - decide which to keep based on quality score
+      const newQuality = computeIssueQuality(issue);
+      const keptQuality = computeIssueQuality(bestMatch.kept);
+
+      if (newQuality > keptQuality) {
+        // New issue is better - swap: remove kept, add new, mark kept as duplicate
+        duplicates.push({
+          duplicate: bestMatch.kept,
+          matchedTo: issue,
+          similarity: bestMatch.similarity,
+        });
+        unique[bestMatch.keptIdx] = issue;
+      } else {
+        // Kept issue is better - mark new as duplicate
+        duplicates.push({
+          duplicate: issue,
+          matchedTo: bestMatch.kept,
+          similarity: bestMatch.similarity,
+        });
+      }
+    } else {
+      unique.push(issue);
+    }
+  }
+
+  return {
+    strategy,
+    unique,
+    duplicates,
+    originalCount: issues.length,
+  };
+}
+
+/**
+ * Run all dedup strategies and return comparison results
+ */
+export function runAllDedupStrategies(
+  issues: ExtractorIssue[]
+): MultiStrategyDedupResult {
+  console.error(`[DEDUP] Running dedup on ${issues.length} issues...`);
+
+  const t0 = Date.now();
+  const exact = deduplicateWithStrategy(issues, "exact");
+  console.error(`[DEDUP] exact: ${Date.now() - t0}ms`);
+
+  const t1 = Date.now();
+  const jaccard = deduplicateWithStrategy(issues, "jaccard");
+  console.error(`[DEDUP] jaccard: ${Date.now() - t1}ms`);
+
+  const t2 = Date.now();
+  const fuse = deduplicateWithStrategy(issues, "fuse");
+  console.error(`[DEDUP] fuse: ${Date.now() - t2}ms`);
+
+  // NOTE: uFuzzy is disabled due to performance issues (hangs on large texts)
+  // const t3 = Date.now();
+  // const ufuzzy = deduplicateWithStrategy(issues, "ufuzzy");
+  // console.error(`[DEDUP] ufuzzy: ${Date.now() - t3}ms`);
+
+  // Return same as jaccard for now (uFuzzy disabled)
+  const ufuzzy: DedupComparison = {
+    strategy: "ufuzzy",
+    unique: jaccard.unique,
+    duplicates: jaccard.duplicates,
+    originalCount: jaccard.originalCount,
+  };
+  console.error(`[DEDUP] ufuzzy: DISABLED (using jaccard results)`);
+
+  console.error(`[DEDUP] Total: ${Date.now() - t0}ms`);
+
+  return { exact, jaccard, fuse, ufuzzy };
+}
+
+/**
+ * Flatten extractor results into issues with extractor IDs
+ */
+export function flattenExtractorResults(
+  extractorResults: Array<{
+    extractorId: string;
+    issues: Array<{
+      exactText: string;
+      issueType: string;
+      fallacyType?: string;
+      severityScore: number;
+      confidenceScore: number;
+      importanceScore: number;
+      reasoning: string;
+    }>;
+  }>
+): ExtractorIssue[] {
+  return extractorResults.flatMap((r) =>
+    r.issues.map((issue) => ({
+      extractorId: r.extractorId,
+      exactText: issue.exactText,
+      issueType: issue.issueType,
+      fallacyType: issue.fallacyType,
+      severityScore: issue.severityScore,
+      confidenceScore: issue.confidenceScore,
+      importanceScore: issue.importanceScore,
+      reasoning: issue.reasoning,
+    }))
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/index.ts b/meta-evals/src/components/extractor-lab/index.ts
new file mode 100644
index 000000000..e2adaf3dd
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/index.ts
@@ -0,0 +1,3 @@
+export * from "./types";
+export * from "./utils";
+export * from "./views";
diff --git a/meta-evals/src/components/extractor-lab/types.ts b/meta-evals/src/components/extractor-lab/types.ts
new file mode 100644
index 000000000..16c08853d
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/types.ts
@@ -0,0 +1,104 @@
+/**
+ * Types for Extractor Lab component
+ */
+
+import type { DocumentChoice } from "@roast/db";
+import type {
+  ExtractorConfig,
+  MultiExtractorResult,
+} from "@roast/ai/fallacy-extraction/lab";
+import type {
+  FallacyJudgeOutput,
+  JudgeDecision,
+  JudgeConfig,
+} from "@roast/ai/fallacy-judge/types";
+
+export type { DocumentChoice, ExtractorConfig, MultiExtractorResult, FallacyJudgeOutput, JudgeDecision, JudgeConfig };
+
+/** Props for the main ExtractorLab component */
+export interface ExtractorLabProps {
+  height: number;
+  maxItems: number;
+  documents: DocumentChoice[];
+  onSearchDocuments: (filter: string) => void;
+  onBack: () => void;
+}
+
+/** Result from a single judge run with its config */
+export interface JudgeRunResult {
+  config: JudgeConfig;
+  label: string;
+  result: FallacyJudgeOutput;
+  durationMs: number;
+  error?: string;
+}
+
+/** Issue with extractor source info for pre-judge dedup */
+export interface ExtractorIssue {
+  extractorId: string;
+  exactText: string;
+  issueType: string;
+  fallacyType?: string;
+  severityScore: number;
+  confidenceScore: number;
+  importanceScore: number;
+  reasoning: string;
+}
+
+/** Result from pre-judge deduplication */
+export interface PreJudgeDedupResult {
+  /** Unique issues to send to judge */
+  unique: ExtractorIssue[];
+  /** Duplicate issues removed */
+  duplicates: ExtractorIssue[];
+  /** Original total count */
+  originalCount: number;
+}
+
+/** Dedup strategy identifier */
+export type DedupStrategy = "exact" | "jaccard" | "fuse" | "ufuzzy";
+
+/** A duplicate issue with info about what it matched */
+export interface DuplicateMatch {
+  duplicate: ExtractorIssue;
+  matchedTo: ExtractorIssue;
+  similarity: number;  // 0-1 similarity score
+}
+
+/** Result from a single dedup strategy */
+export interface DedupComparison {
+  strategy: DedupStrategy;
+  unique: ExtractorIssue[];
+  duplicates: DuplicateMatch[];
+  originalCount: number;
+}
+
+/** Results from all dedup strategies for comparison */
+export interface MultiStrategyDedupResult {
+  exact: DedupComparison;
+  jaccard: DedupComparison;
+  fuse: DedupComparison;
+  ufuzzy: DedupComparison;
+}
+
+/** All possible steps/views in the Extractor Lab */
+export type LabStep =
+  | { type: "select-document" }
+  | { type: "configure-extractors" }
+  | { type: "add-extractor" }
+  | { type: "running" }
+  | { type: "results"; result: MultiExtractorResult }
+  | { type: "issue-detail"; result: MultiExtractorResult; extractorIdx: number; issueIdx: number }
+  | { type: "pre-judge-dedup"; result: MultiExtractorResult; multiDedup: MultiStrategyDedupResult; selectedStrategy: DedupStrategy }
+  | { type: "running-judge"; result: MultiExtractorResult; dedupResult: PreJudgeDedupResult; judgeConfigs: JudgeConfig[] }
+  | { type: "judge-comparison"; result: MultiExtractorResult; judgeResults: JudgeRunResult[] }
+  | { type: "judge-results"; result: MultiExtractorResult; judgeResult: FallacyJudgeOutput; judgeLabel: string; judgeResults?: JudgeRunResult[] }
+  | { type: "judge-decision-detail"; result: MultiExtractorResult; judgeResult: FallacyJudgeOutput; decision: JudgeDecision; isRejected: boolean; judgeLabel: string; judgeResults?: JudgeRunResult[] };
+
+/** Logger interface for judge tool */
+export interface SimpleLogger {
+  info: (...args: unknown[]) => void;
+  warn: (...args: unknown[]) => void;
+  error: (...args: unknown[]) => void;
+  debug: (...args: unknown[]) => void;
+}
diff --git a/meta-evals/src/components/extractor-lab/utils.ts b/meta-evals/src/components/extractor-lab/utils.ts
new file mode 100644
index 000000000..dd26fb1d0
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/utils.ts
@@ -0,0 +1,67 @@
+/**
+ * Utility functions for Extractor Lab
+ */
+
+import {
+  getMultiExtractorConfig,
+  type ExtractorConfig,
+} from "@roast/ai/fallacy-extraction/lab";
+import type { SimpleLogger, MultiExtractorResult, MultiStrategyDedupResult } from "./types";
+import { flattenExtractorResults, runAllDedupStrategies } from "./fuzzy-dedup";
+
+/** Temperature presets for cycling */
+export const TEMP_PRESETS = ["default", 0, 0.3, 0.5, 0.7, 1.0] as const;
+
+/** Truncate string to fit terminal width */
+export function truncate(str: string, maxLen: number): string {
+  if (str.length <= maxLen) return str;
+  return str.slice(0, maxLen - 1) + "…";
+}
+
+/** Simple logger for the judge tool */
+export const simpleLogger: SimpleLogger = {
+  info: (...args: unknown[]) => console.error("[INFO]", ...args),
+  warn: (...args: unknown[]) => console.error("[WARN]", ...args),
+  error: (...args: unknown[]) => console.error("[ERROR]", ...args),
+  debug: (..._args: unknown[]) => {},
+};
+
+/** Load extractor configs from FALLACY_EXTRACTORS env var, fallback to default */
+export function getInitialExtractorConfigs(): ExtractorConfig[] {
+  try {
+    const config = getMultiExtractorConfig();
+    return config.extractors;
+  } catch {
+    return [{ model: "claude-sonnet-4-5-20250929", temperature: "default", thinking: false }];
+  }
+}
+
+/** Generate a label for an extractor config */
+export function generateExtractorLabel(config: ExtractorConfig): string {
+  const modelShort = config.model.split("/").pop()?.replace(/-\d{8}$/, "") ?? config.model;
+  const tempStr = config.temperature === "default" ? "tDef" : `t${config.temperature}`;
+  const thinkStr = config.thinking ? "think" : "noThink";
+  return `${modelShort}-${tempStr}-${thinkStr}`;
+}
+
+/** Run all pre-judge deduplication strategies on extractor results */
+export function runMultiStrategyDedup(extractionResult: MultiExtractorResult): MultiStrategyDedupResult {
+  // Flatten all issues from all extractors
+  const allIssues = flattenExtractorResults(extractionResult.extractorResults);
+
+  // Run all dedup strategies for comparison
+  return runAllDedupStrategies(allIssues);
+}
+
+/** Calculate text widths based on terminal width */
+export function calculateTextWidths(termWidth: number) {
+  // For extraction results: "  🔴 [issueType] text"
+  // Overhead: indicator(2) + spaces(2) + emoji(2) + space(1) + [type](~18) + space(1) = ~26
+  const issueTextWidth = Math.max(40, termWidth - 6 - 26);
+
+  // For judge decisions: "[+] type.padEnd(18) text [A,B]"
+  // Overhead: indicator(2) + [+]space(4) + type(18) + space(1) + space(1) + [A,B](10) = 36
+  const judgeTextWidth = Math.max(40, termWidth - 6 - 36);
+
+  return { issueTextWidth, judgeTextWidth };
+}
diff --git a/meta-evals/src/components/extractor-lab/views/ConfigureExtractorsView.tsx b/meta-evals/src/components/extractor-lab/views/ConfigureExtractorsView.tsx
new file mode 100644
index 000000000..5c1030751
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/ConfigureExtractorsView.tsx
@@ -0,0 +1,86 @@
+import React from "react";
+import { Box, Text } from "ink";
+import SelectInput from "ink-select-input";
+import type { ExtractorConfig, DocumentChoice } from "../types";
+
+interface ConfigureExtractorsViewProps {
+  height: number;
+  selectedDoc: DocumentChoice | null;
+  documentText: string;
+  extractorConfigs: ExtractorConfig[];
+  onHighlight: (value: string) => void;
+  onBack: () => void;
+  onRun: () => void;
+  onAdd: () => void;
+  onToggleThinking: (idx: number) => void;
+}
+
+export function ConfigureExtractorsView({
+  height,
+  selectedDoc,
+  documentText,
+  extractorConfigs,
+  onHighlight,
+  onBack,
+  onRun,
+  onAdd,
+  onToggleThinking,
+}: ConfigureExtractorsViewProps) {
+  const items = [
+    { label: "▶ Run Extraction", value: "run" },
+    { label: "─────────────────", value: "divider" },
+    ...extractorConfigs.map((config, idx) => ({
+      label: `[${idx + 1}] ${config.model} (t=${config.temperature}, think=${config.thinking})`,
+      value: `config-${idx}`,
+    })),
+    { label: "+ Add Extractor", value: "add" },
+    { label: "─────────────────", value: "divider2" },
+    { label: "← Back to Documents", value: "back" },
+  ];
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="magenta" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="magenta">Extractor Lab - Configure</Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
+        <Box flexDirection="column">
+          <Text>
+            <Text bold>Document: </Text>
+            <Text color="green">{selectedDoc?.title}</Text>
+          </Text>
+          <Text>
+            <Text bold>Text length: </Text>
+            <Text>{documentText.length} chars</Text>
+          </Text>
+          <Text>
+            <Text bold>Extractors: </Text>
+            <Text>{extractorConfigs.length}</Text>
+          </Text>
+        </Box>
+      </Box>
+
+      <SelectInput
+        items={items.filter(i => !i.value.startsWith("divider"))}
+        onHighlight={(item) => onHighlight(item.value)}
+        onSelect={(item) => {
+          if (item.value === "back") {
+            onBack();
+          } else if (item.value === "run") {
+            onRun();
+          } else if (item.value === "add") {
+            onAdd();
+          } else if (item.value.startsWith("config-")) {
+            const idx = parseInt(item.value.replace("config-", ""), 10);
+            onToggleThinking(idx);
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter=toggle think | t=cycle temp | d=delete | Esc=back</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/ErrorView.tsx b/meta-evals/src/components/extractor-lab/views/ErrorView.tsx
new file mode 100644
index 000000000..feba37a25
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/ErrorView.tsx
@@ -0,0 +1,16 @@
+import React from "react";
+import { Box, Text } from "ink";
+
+interface ErrorViewProps {
+  error: string;
+  height: number;
+}
+
+export function ErrorView({ error, height }: ErrorViewProps) {
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="red" padding={1} height={height}>
+      <Text color="red">Error: {error}</Text>
+      <Text dimColor>Press Escape to go back</Text>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/IssueDetailView.tsx b/meta-evals/src/components/extractor-lab/views/IssueDetailView.tsx
new file mode 100644
index 000000000..eac56efdd
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/IssueDetailView.tsx
@@ -0,0 +1,49 @@
+import React from "react";
+import { Box, Text } from "ink";
+import type { MultiExtractorResult } from "../types";
+
+interface IssueDetailViewProps {
+  height: number;
+  result: MultiExtractorResult;
+  extractorIdx: number;
+  issueIdx: number;
+}
+
+export function IssueDetailView({ height, result, extractorIdx, issueIdx }: IssueDetailViewProps) {
+  const extractor = result.extractorResults[extractorIdx];
+  const issue = extractor.issues[issueIdx];
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="blue" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="blue">Issue Detail</Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1} flexDirection="column">
+        <Text><Text bold>Extractor: </Text><Text color="yellow">{extractor.extractorId}</Text></Text>
+        <Text><Text bold>Type: </Text><Text color="cyan">{issue.issueType}</Text>{issue.fallacyType && <Text dimColor> ({issue.fallacyType})</Text>}</Text>
+        <Text><Text bold>Severity: </Text><Text color={issue.severityScore >= 70 ? 'red' : issue.severityScore >= 40 ? 'yellow' : 'green'}>{issue.severityScore}/100</Text></Text>
+        <Text><Text bold>Confidence: </Text><Text>{issue.confidenceScore}/100</Text></Text>
+        <Text><Text bold>Importance: </Text><Text>{issue.importanceScore}/100</Text></Text>
+      </Box>
+
+      <Box flexDirection="column" marginBottom={1}>
+        <Text bold underline>Quoted Text:</Text>
+        <Box marginLeft={1} marginTop={1}>
+          <Text color="gray" wrap="wrap">"{issue.exactText}"</Text>
+        </Box>
+      </Box>
+
+      <Box flexDirection="column" marginBottom={1}>
+        <Text bold underline>Reasoning:</Text>
+        <Box marginLeft={1} marginTop={1}>
+          <Text wrap="wrap">{issue.reasoning}</Text>
+        </Box>
+      </Box>
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Press Escape to go back to results</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/JudgeComparisonView.tsx b/meta-evals/src/components/extractor-lab/views/JudgeComparisonView.tsx
new file mode 100644
index 000000000..a07fa9631
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/JudgeComparisonView.tsx
@@ -0,0 +1,133 @@
+import React from "react";
+import { Box, Text } from "ink";
+import SelectInput from "ink-select-input";
+import type { MultiExtractorResult, JudgeRunResult, DocumentChoice } from "../types";
+import { truncate } from "../utils";
+
+interface JudgeComparisonViewProps {
+  height: number;
+  maxItems: number;
+  result: MultiExtractorResult;
+  judgeResults: JudgeRunResult[];
+  selectedDoc: DocumentChoice | null;
+  termWidth: number;
+  onBack: () => void;
+  onViewJudge: (judgeResult: JudgeRunResult) => void;
+}
+
+export function JudgeComparisonView({
+  height,
+  maxItems,
+  result,
+  judgeResults,
+  selectedDoc,
+  termWidth,
+  onBack,
+  onViewJudge,
+}: JudgeComparisonViewProps) {
+  const totalInputIssues = result.extractorResults.reduce((sum, r) => sum + r.issues.length, 0);
+
+  // Build comparison items
+  const comparisonItems: Array<{ label: string; value: string }> = [];
+
+  // Header row
+  comparisonItems.push({
+    label: `── Judge Comparison: ${judgeResults.length} judges, ${totalInputIssues} input issues ──`,
+    value: "header",
+  });
+
+  // Each judge row
+  judgeResults.forEach((jr, idx) => {
+    const status = jr.error ? "❌ Error" : `✅ ${jr.result.summary.acceptedCount} accepted, ${jr.result.summary.mergedCount} merged, ${jr.result.summary.rejectedCount} rejected`;
+    const duration = `${(jr.durationMs / 1000).toFixed(1)}s`;
+    comparisonItems.push({
+      label: `[${idx + 1}] ${jr.label.padEnd(30)} ${duration.padEnd(8)} ${status}`,
+      value: `judge-${idx}`,
+    });
+
+    // If error, show error details
+    if (jr.error) {
+      comparisonItems.push({
+        label: `    Error: ${truncate(jr.error, termWidth - 20)}`,
+        value: `error-${idx}`,
+      });
+    }
+  });
+
+  // Summary stats
+  comparisonItems.push({
+    label: "────────────────────────────────────────────────────────────────────────────",
+    value: "sep-1",
+  });
+
+  // Agreement summary - find issues accepted by all judges
+  const successfulJudges = judgeResults.filter(jr => !jr.error);
+  if (successfulJudges.length > 1) {
+    // Get accepted issue texts from each judge for comparison
+    const acceptedByJudge = successfulJudges.map(jr =>
+      new Set(jr.result.acceptedDecisions.map(d => d.finalText.toLowerCase().trim()))
+    );
+
+    // Find issues in ALL judges (intersection)
+    const unanimouslyAccepted = [...acceptedByJudge[0]].filter(text =>
+      acceptedByJudge.every(set => set.has(text))
+    ).length;
+
+    // Find issues in ANY judge (union)
+    const allAccepted = new Set(acceptedByJudge.flatMap(set => [...set])).size;
+
+    const agreementPct = allAccepted > 0 ? Math.round((unanimouslyAccepted / allAccepted) * 100) : 0;
+
+    comparisonItems.push({
+      label: `📊 Agreement: ${unanimouslyAccepted}/${allAccepted} issues accepted by all judges (${agreementPct}%)`,
+      value: "stats-1",
+    });
+  }
+
+  comparisonItems.push({
+    label: "────────────────────────────────────────────────────────────────────────────",
+    value: "sep-2",
+  });
+  comparisonItems.push({ label: "← Back to Extraction Results", value: "back" });
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="magenta" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="magenta">Extractor Lab - Judge Comparison: </Text>
+        <Text color="green">{selectedDoc?.title}</Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
+        <Text>
+          <Text bold>Input: </Text><Text>{totalInputIssues} issues from {result.extractorResults.length} extractors</Text>
+          <Text>  |  </Text>
+          <Text bold>Judges run: </Text><Text color="cyan">{judgeResults.length}</Text>
+          <Text>  |  </Text>
+          <Text bold>Successful: </Text><Text color="green">{judgeResults.filter(j => !j.error).length}</Text>
+        </Text>
+      </Box>
+
+      <SelectInput
+        items={comparisonItems.filter(i => !i.value.startsWith("sep-") && !i.value.startsWith("header") && !i.value.startsWith("stats-"))}
+        limit={maxItems - 5}
+        onSelect={(item) => {
+          if (item.value === "back") {
+            onBack();
+          } else if (item.value.startsWith("error-")) {
+            return;
+          } else if (item.value.startsWith("judge-")) {
+            const idx = parseInt(item.value.replace("judge-", ""), 10);
+            const jr = judgeResults[idx];
+            if (!jr.error) {
+              onViewJudge(jr);
+            }
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter=View Judge Details | Escape=Back to Results</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/JudgeDecisionDetailView.tsx b/meta-evals/src/components/extractor-lab/views/JudgeDecisionDetailView.tsx
new file mode 100644
index 000000000..419f56acc
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/JudgeDecisionDetailView.tsx
@@ -0,0 +1,72 @@
+import React from "react";
+import { Box, Text } from "ink";
+import type { JudgeDecision } from "../types";
+
+interface JudgeDecisionDetailViewProps {
+  height: number;
+  decision: JudgeDecision;
+  isRejected: boolean;
+}
+
+export function JudgeDecisionDetailView({ height, decision, isRejected }: JudgeDecisionDetailViewProps) {
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor={isRejected ? "red" : "green"} padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color={isRejected ? "red" : "green"}>
+          Judge Decision: {decision.decision.toUpperCase()}
+        </Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1} flexDirection="column">
+        <Text>
+          <Text bold>Decision: </Text>
+          <Text color={isRejected ? "red" : "green"}>{decision.decision}</Text>
+        </Text>
+        <Text>
+          <Text bold>Type: </Text>
+          <Text color="cyan">{decision.finalIssueType}</Text>
+          {decision.finalFallacyType && <Text dimColor> ({decision.finalFallacyType})</Text>}
+        </Text>
+        <Text>
+          <Text bold>Severity: </Text>
+          <Text color={decision.finalSeverity >= 70 ? "red" : decision.finalSeverity >= 40 ? "yellow" : "green"}>
+            {decision.finalSeverity}/100
+          </Text>
+          <Text>  |  </Text>
+          <Text bold>Confidence: </Text><Text>{decision.finalConfidence}/100</Text>
+          <Text>  |  </Text>
+          <Text bold>Importance: </Text><Text>{decision.finalImportance}/100</Text>
+        </Text>
+        <Text>
+          <Text bold>Source Extractors: </Text>
+          <Text color="yellow">{decision.sourceExtractors.join(", ")}</Text>
+        </Text>
+      </Box>
+
+      <Box flexDirection="column" marginBottom={1}>
+        <Text bold underline>Quoted Text:</Text>
+        <Box marginLeft={1} marginTop={1}>
+          <Text color="gray" wrap="wrap">"{decision.finalText}"</Text>
+        </Box>
+      </Box>
+
+      <Box flexDirection="column" marginBottom={1}>
+        <Text bold underline>Judge Reasoning:</Text>
+        <Box marginLeft={1} marginTop={1}>
+          <Text wrap="wrap" color="cyan">{decision.judgeReasoning}</Text>
+        </Box>
+      </Box>
+
+      <Box flexDirection="column" marginBottom={1}>
+        <Text bold underline>Issue Reasoning:</Text>
+        <Box marginLeft={1} marginTop={1}>
+          <Text wrap="wrap">{decision.finalReasoning}</Text>
+        </Box>
+      </Box>
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Press Escape to go back to judge results</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/JudgeResultsView.tsx b/meta-evals/src/components/extractor-lab/views/JudgeResultsView.tsx
new file mode 100644
index 000000000..f392fd54d
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/JudgeResultsView.tsx
@@ -0,0 +1,116 @@
+import React from "react";
+import { Box, Text } from "ink";
+import SelectInput from "ink-select-input";
+import type { MultiExtractorResult, FallacyJudgeOutput, JudgeDecision, JudgeRunResult } from "../types";
+import { truncate } from "../utils";
+
+interface JudgeResultsViewProps {
+  height: number;
+  maxItems: number;
+  result: MultiExtractorResult;
+  judgeResult: FallacyJudgeOutput;
+  judgeLabel: string;
+  judgeResults?: JudgeRunResult[];
+  judgeTextWidth: number;
+  onBack: () => void;
+  onViewDecision: (decision: JudgeDecision, isRejected: boolean) => void;
+}
+
+export function JudgeResultsView({
+  height,
+  maxItems,
+  result,
+  judgeResult,
+  judgeLabel,
+  judgeTextWidth,
+  onBack,
+  onViewDecision,
+}: JudgeResultsViewProps) {
+  const totalInputIssues = result.extractorResults.reduce((sum, r) => sum + r.issues.length, 0);
+
+  // Create legend mapping extractor IDs to short keys (A, B, C, ...)
+  const extractorIds = result.extractorResults.map(r => r.extractorId);
+  const extractorKeys: Record<string, string> = {};
+  extractorIds.forEach((id, i) => {
+    extractorKeys[id] = String.fromCharCode(65 + i); // A, B, C, ...
+  });
+
+  // Helper to convert extractor IDs to short keys
+  const sourcesToKeys = (sources: string[]): string => {
+    return sources.map(s => extractorKeys[s] || "?").join(",");
+  };
+
+  // Build list of judge decisions
+  const decisionItems: Array<{ label: string; value: string }> = [];
+
+  // Accepted/merged decisions
+  judgeResult.acceptedDecisions.forEach((decision, idx) => {
+    const symbol = decision.decision === "merge" ? "[*]" : "[+]";
+    const keys = sourcesToKeys(decision.sourceExtractors);
+    const text = truncate(decision.finalText.replace(/\n/g, ' '), judgeTextWidth).padEnd(judgeTextWidth);
+    decisionItems.push({
+      label: `${symbol} ${decision.finalIssueType.padEnd(18)} ${text} [${keys}]`,
+      value: `accepted-${idx}`,
+    });
+  });
+
+  // Rejected decisions
+  judgeResult.rejectedDecisions.forEach((decision, idx) => {
+    const keys = sourcesToKeys(decision.sourceExtractors);
+    const text = truncate(decision.finalText.replace(/\n/g, ' '), judgeTextWidth).padEnd(judgeTextWidth);
+    decisionItems.push({
+      label: `[x] ${decision.finalIssueType.padEnd(18)} ${text} [${keys}]`,
+      value: `rejected-${idx}`,
+    });
+  });
+
+  decisionItems.push({ label: "───────────────────────────────────────────────────────────────────────────────────────", value: "sep-1" });
+  decisionItems.push({ label: "← Back", value: "back" });
+
+  // Build legend string
+  const legendParts = extractorIds.map((id, i) => `${String.fromCharCode(65 + i)}=${id}`);
+  const legendStr = legendParts.join("  ");
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="cyan" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="cyan">Judge Results{judgeLabel ? `: ${judgeLabel}` : ""}</Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1} flexDirection="column">
+        <Text>
+          <Text bold>Input: </Text><Text>{totalInputIssues} issues</Text>
+          <Text>  --&gt;  </Text>
+          <Text bold color="green">{judgeResult.summary.acceptedCount} accepted</Text>
+          <Text>  |  </Text>
+          <Text bold color="yellow">{judgeResult.summary.mergedCount} merged</Text>
+          <Text>  |  </Text>
+          <Text bold color="red">{judgeResult.summary.rejectedCount} rejected</Text>
+        </Text>
+        <Text dimColor>Legend: [+]=accept [*]=merge [x]=reject  |  {legendStr}</Text>
+      </Box>
+
+      <SelectInput
+        items={decisionItems}
+        limit={maxItems - 5}
+        onSelect={(item) => {
+          if (item.value.startsWith("sep-")) {
+            return;
+          } else if (item.value === "back") {
+            onBack();
+          } else if (item.value.startsWith("accepted-")) {
+            const idx = parseInt(item.value.replace("accepted-", ""), 10);
+            onViewDecision(judgeResult.acceptedDecisions[idx], false);
+          } else if (item.value.startsWith("rejected-")) {
+            const idx = parseInt(item.value.replace("rejected-", ""), 10);
+            onViewDecision(judgeResult.rejectedDecisions[idx], true);
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter=View Detail | Escape=Back</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/PreJudgeDedupView.tsx b/meta-evals/src/components/extractor-lab/views/PreJudgeDedupView.tsx
new file mode 100644
index 000000000..496d010ea
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/PreJudgeDedupView.tsx
@@ -0,0 +1,201 @@
+import React from "react";
+import { Box, Text } from "ink";
+import SelectInput from "ink-select-input";
+import * as fs from "fs";
+import * as path from "path";
+import type { MultiExtractorResult, MultiStrategyDedupResult, DedupStrategy, JudgeConfig, DedupComparison } from "../types";
+
+interface PreJudgeDedupViewProps {
+  height: number;
+  maxItems: number;
+  result: MultiExtractorResult;
+  multiDedup: MultiStrategyDedupResult;
+  selectedStrategy: DedupStrategy;
+  availableJudges: JudgeConfig[];
+  selectedJudgeIdxs: Set<number>;
+  issueTextWidth: number;
+  generateJudgeLabel: (config: JudgeConfig) => string;
+  onBack: () => void;
+  onRunJudges: (selectedConfigs: JudgeConfig[], dedupResult: DedupComparison) => void;
+  onToggleJudge: (idx: number) => void;
+  onSelectStrategy: (strategy: DedupStrategy) => void;
+}
+
+const STRATEGY_LABELS: Record<DedupStrategy, string> = {
+  exact: "Exact Match",
+  jaccard: "Jaccard (word overlap)",
+  fuse: "Fuse.js (fuzzy)",
+  ufuzzy: "uFuzzy (fuzzy)",
+};
+
+/** Export full dedup analysis to a file for validation */
+function exportDedupAnalysis(multiDedup: MultiStrategyDedupResult, selectedStrategy: DedupStrategy): string {
+  const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
+  const filename = `dedup-analysis-${timestamp}.txt`;
+  const filepath = path.join(process.cwd(), filename);
+
+  let output = "=".repeat(80) + "\n";
+  output += "DEDUP ANALYSIS EXPORT\n";
+  output += `Generated: ${new Date().toISOString()}\n`;
+  output += "=".repeat(80) + "\n\n";
+
+  // Summary
+  output += "STRATEGY COMPARISON:\n";
+  output += "-".repeat(40) + "\n";
+  for (const strategy of ["exact", "jaccard", "fuse", "ufuzzy"] as DedupStrategy[]) {
+    const dedup = multiDedup[strategy];
+    output += `${strategy.padEnd(10)}: ${dedup.unique.length} unique, ${dedup.duplicates.length} duplicates (from ${dedup.originalCount} total)\n`;
+  }
+  output += "\n";
+
+  // Detailed analysis for each strategy
+  for (const strategy of ["exact", "jaccard", "fuse", "ufuzzy"] as DedupStrategy[]) {
+    const dedup = multiDedup[strategy];
+
+    output += "=".repeat(80) + "\n";
+    output += `STRATEGY: ${STRATEGY_LABELS[strategy].toUpperCase()}\n`;
+    output += "=".repeat(80) + "\n\n";
+
+    if (dedup.duplicates.length === 0) {
+      output += "No duplicates found.\n\n";
+    } else {
+      output += `DUPLICATE PAIRS (${dedup.duplicates.length}):\n`;
+      output += "-".repeat(40) + "\n\n";
+
+      dedup.duplicates.forEach((match, idx) => {
+        output += `${idx + 1}. REMOVED [${match.duplicate.extractorId}]:\n`;
+        output += `   "${match.duplicate.exactText}"\n\n`;
+        output += `   KEPT [${match.matchedTo.extractorId}] (similarity: ${Math.round(match.similarity * 100)}%):\n`;
+        output += `   "${match.matchedTo.exactText}"\n\n`;
+        output += "-".repeat(40) + "\n\n";
+      });
+    }
+
+    output += `UNIQUE ISSUES AFTER DEDUP (${dedup.unique.length}):\n`;
+    output += "-".repeat(40) + "\n\n";
+    dedup.unique.forEach((issue, idx) => {
+      output += `${idx + 1}. [${issue.extractorId}] ${issue.issueType}${issue.fallacyType ? ` (${issue.fallacyType})` : ""}\n`;
+      output += `   "${issue.exactText}"\n\n`;
+    });
+    output += "\n";
+  }
+
+  fs.writeFileSync(filepath, output);
+  return filepath;
+}
+
+export function PreJudgeDedupView({
+  height,
+  maxItems,
+  multiDedup,
+  selectedStrategy,
+  availableJudges,
+  selectedJudgeIdxs,
+  generateJudgeLabel,
+  onBack,
+  onRunJudges,
+  onToggleJudge,
+}: PreJudgeDedupViewProps) {
+  // Use jaccard as the default/only strategy for now
+  const currentDedup = multiDedup.jaccard;
+  const { unique, duplicates, originalCount } = currentDedup;
+
+  // Build items list
+  const items: Array<{ label: string; value: string }> = [];
+
+  // Judge selection
+  items.push({ label: "── Select Judges ──", value: "header-judges" });
+  if (availableJudges.length > 0) {
+    availableJudges.forEach((judge, idx) => {
+      const label = generateJudgeLabel(judge);
+      const isSelected = selectedJudgeIdxs.has(idx);
+      const prefix = isSelected ? "[x]" : "[ ]";
+      const thinkStr = judge.thinking ? "think" : "noThink";
+      const tempStr =
+        judge.temperature === "default"
+          ? "tDef"
+          : judge.temperature !== undefined
+          ? `t${judge.temperature}`
+          : "";
+      items.push({
+        label: `${prefix} ${label} (${tempStr ? tempStr + ", " : ""}${thinkStr})`,
+        value: `judge-${idx}`,
+      });
+    });
+
+    items.push({ label: "────────────────────────────────────────", value: "sep-3" });
+
+    const selectedCount = selectedJudgeIdxs.size;
+    const judgeLabel =
+      selectedCount === 1
+        ? generateJudgeLabel(availableJudges[[...selectedJudgeIdxs][0]])
+        : `${selectedCount} judges`;
+    items.push({
+      label: `⚖️  Run ${judgeLabel} (${unique.length} deduplicated issues)`,
+      value: "run-judge",
+    });
+  } else {
+    items.push({
+      label: `⚠️  No judges configured. Set FALLACY_JUDGES env var`,
+      value: "no-judges",
+    });
+  }
+
+  items.push({ label: "📄 Export Full Analysis to File", value: "export" });
+  items.push({ label: "← Back to Extraction Results", value: "back" });
+
+  // Filter non-selectable items (headers and separators)
+  const selectableItems = items.filter(
+    (i) =>
+      !i.value.startsWith("header-") &&
+      !i.value.startsWith("sep-")
+  );
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="yellow" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="yellow">
+          Pre-Judge Deduplication
+        </Text>
+      </Box>
+
+      {/* Dedup summary */}
+      <Box borderStyle="single" borderColor="cyan" marginBottom={1} paddingX={1} flexDirection="column">
+        <Text>
+          <Text bold>Deduplication: </Text>
+          <Text>{originalCount} issues → </Text>
+          <Text bold color="green">{unique.length} unique</Text>
+          {duplicates.length > 0 && (
+            <Text color="red"> ({duplicates.length} duplicates removed)</Text>
+          )}
+        </Text>
+        <Text dimColor>
+          Using Jaccard word-overlap similarity. Quality-based selection keeps longer/higher-scored issues.
+        </Text>
+      </Box>
+
+      <SelectInput
+        items={selectableItems}
+        limit={maxItems - 10}
+        onSelect={(item) => {
+          if (item.value === "back") {
+            onBack();
+          } else if (item.value === "export") {
+            const filepath = exportDedupAnalysis(multiDedup, selectedStrategy);
+            console.error(`\n📄 Exported full analysis to: ${filepath}\n`);
+          } else if (item.value === "run-judge") {
+            const selectedConfigs = [...selectedJudgeIdxs].map((idx) => availableJudges[idx]);
+            onRunJudges(selectedConfigs, currentDedup);
+          } else if (item.value.startsWith("judge-")) {
+            const idx = parseInt(item.value.replace("judge-", ""), 10);
+            onToggleJudge(idx);
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter=Select | Escape=Back</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/ResultsView.tsx b/meta-evals/src/components/extractor-lab/views/ResultsView.tsx
new file mode 100644
index 000000000..d2100b0a9
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/ResultsView.tsx
@@ -0,0 +1,102 @@
+import React from "react";
+import { Box, Text } from "ink";
+import SelectInput from "ink-select-input";
+import type { MultiExtractorResult, DocumentChoice } from "../types";
+import { truncate } from "../utils";
+
+interface ResultsViewProps {
+  height: number;
+  maxItems: number;
+  result: MultiExtractorResult;
+  selectedDoc: DocumentChoice | null;
+  issueTextWidth: number;
+  onBack: () => void;
+  onRunDedup: () => void;
+  onViewIssue: (extractorIdx: number, issueIdx: number) => void;
+}
+
+export function ResultsView({
+  height,
+  maxItems,
+  result,
+  selectedDoc,
+  issueTextWidth,
+  onBack,
+  onRunDedup,
+  onViewIssue,
+}: ResultsViewProps) {
+  const totalIssues = result.extractorResults.reduce((sum, r) => sum + r.issues.length, 0);
+
+  // Build flat list of issues with extractor info
+  const issueItems: Array<{ label: string; value: string }> = [];
+
+  result.extractorResults.forEach((r, extractorIdx) => {
+    // Add extractor header
+    const tempStr = r.config.temperature === 'default' ? 'tDef' : `t${r.config.temperature}`;
+    const thinkStr = r.config.thinking ? '' : ' noThink';
+    issueItems.push({
+      label: `── ${r.extractorId} (${tempStr}${thinkStr}) - ${r.issues.length} issues, ${(r.durationMs / 1000).toFixed(1)}s ──`,
+      value: `header-${extractorIdx}`,
+    });
+    // Add issues for this extractor
+    r.issues.forEach((issue, issueIdx) => {
+      const severityColor = issue.severityScore >= 70 ? '🔴' : issue.severityScore >= 40 ? '🟡' : '🟢';
+      issueItems.push({
+        label: `  ${severityColor} [${issue.issueType}] ${truncate(issue.exactText.replace(/\n/g, ' '), issueTextWidth)}`,
+        value: `issue-${extractorIdx}-${issueIdx}`,
+      });
+    });
+  });
+
+  // Actions at the bottom
+  issueItems.push({ label: "───────────────────────────────────────────────────────────────────────────", value: "sep-1" });
+
+  // Deduplicate button (only if we have issues)
+  if (totalIssues > 0) {
+    issueItems.push({
+      label: `▶ Deduplicate & Prepare for Judge (${totalIssues} issues)`,
+      value: "run-dedup",
+    });
+  }
+  issueItems.push({ label: "← Back to Configure", value: "back" });
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="green" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="green">Extractor Lab - Extraction Results: </Text>
+        <Text color="cyan">{selectedDoc?.title}</Text>
+      </Box>
+
+      <Box borderStyle="single" borderColor="gray" marginBottom={1} paddingX={1}>
+        <Text>
+          <Text bold>Duration: </Text><Text>{(result.totalDurationMs / 1000).toFixed(1)}s</Text>
+          <Text>  |  </Text>
+          <Text bold>Issues: </Text><Text color="cyan">{totalIssues}</Text>
+          <Text>  |  </Text>
+          <Text bold>Extractors: </Text><Text>{result.extractorResults.length}</Text>
+        </Text>
+      </Box>
+
+      <SelectInput
+        items={issueItems}
+        limit={maxItems - 3}
+        onSelect={(item) => {
+          if (item.value.startsWith("sep-") || item.value.startsWith("header-")) {
+            return;
+          } else if (item.value === "back") {
+            onBack();
+          } else if (item.value === "run-dedup") {
+            onRunDedup();
+          } else if (item.value.startsWith("issue-")) {
+            const [, extractorIdx, issueIdx] = item.value.split("-");
+            onViewIssue(parseInt(extractorIdx), parseInt(issueIdx));
+          }
+        }}
+      />
+
+      <Box marginTop={1} justifyContent="center">
+        <Text dimColor>Enter View Detail | Escape Back</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/RunningJudgeView.tsx b/meta-evals/src/components/extractor-lab/views/RunningJudgeView.tsx
new file mode 100644
index 000000000..28b83a629
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/RunningJudgeView.tsx
@@ -0,0 +1,35 @@
+import React from "react";
+import { Box, Text } from "ink";
+import Spinner from "ink-spinner";
+import type { JudgeConfig, PreJudgeDedupResult } from "../types";
+
+interface RunningJudgeViewProps {
+  height: number;
+  judgeConfigs: JudgeConfig[];
+  dedupResult: PreJudgeDedupResult;
+  generateJudgeLabel: (config: JudgeConfig) => string;
+}
+
+export function RunningJudgeView({ height, judgeConfigs, dedupResult, generateJudgeLabel }: RunningJudgeViewProps) {
+  const judgeLabels = judgeConfigs.map(c => generateJudgeLabel(c)).join(", ");
+
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="yellow" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="yellow">Running LLM Judge</Text>
+      </Box>
+
+      <Box justifyContent="center" padding={2}>
+        <Text>
+          <Spinner type="dots" /> Running {judgeConfigs.length} judge(s): {judgeLabels}
+        </Text>
+      </Box>
+
+      <Box justifyContent="center">
+        <Text dimColor>
+          Aggregating {dedupResult.unique.length} unique issues (from {dedupResult.originalCount} total)...
+        </Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/RunningView.tsx b/meta-evals/src/components/extractor-lab/views/RunningView.tsx
new file mode 100644
index 000000000..2d4cc78ba
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/RunningView.tsx
@@ -0,0 +1,28 @@
+import React from "react";
+import { Box, Text } from "ink";
+import Spinner from "ink-spinner";
+
+interface RunningViewProps {
+  height: number;
+  extractorCount: number;
+}
+
+export function RunningView({ height, extractorCount }: RunningViewProps) {
+  return (
+    <Box flexDirection="column" borderStyle="round" borderColor="yellow" padding={1} height={height}>
+      <Box justifyContent="center" marginBottom={1}>
+        <Text bold color="yellow">Extractor Lab - Running</Text>
+      </Box>
+
+      <Box justifyContent="center" padding={2}>
+        <Text>
+          <Spinner type="dots" /> Running {extractorCount} extractor(s)...
+        </Text>
+      </Box>
+
+      <Box justifyContent="center">
+        <Text dimColor>This may take a minute...</Text>
+      </Box>
+    </Box>
+  );
+}
diff --git a/meta-evals/src/components/extractor-lab/views/index.ts b/meta-evals/src/components/extractor-lab/views/index.ts
new file mode 100644
index 000000000..725d15722
--- /dev/null
+++ b/meta-evals/src/components/extractor-lab/views/index.ts
@@ -0,0 +1,10 @@
+export { ErrorView } from "./ErrorView";
+export { RunningView } from "./RunningView";
+export { RunningJudgeView } from "./RunningJudgeView";
+export { ConfigureExtractorsView } from "./ConfigureExtractorsView";
+export { IssueDetailView } from "./IssueDetailView";
+export { ResultsView } from "./ResultsView";
+export { PreJudgeDedupView } from "./PreJudgeDedupView";
+export { JudgeResultsView } from "./JudgeResultsView";
+export { JudgeDecisionDetailView } from "./JudgeDecisionDetailView";
+export { JudgeComparisonView } from "./JudgeComparisonView";
diff --git a/meta-evals/src/components/helpers.ts b/meta-evals/src/components/helpers.ts
index c5170dad6..7f7a9e55d 100644
--- a/meta-evals/src/components/helpers.ts
+++ b/meta-evals/src/components/helpers.ts
@@ -3,14 +3,18 @@
  */
 
 export function truncate(str: string, maxLen: number): string {
-  if (str.length <= maxLen) return str;
-  return str.slice(0, maxLen - 3) + "...";
+  // Sanitize: replace newlines/tabs with spaces, collapse multiple spaces
+  const clean = str.replace(/[\n\r\t]+/g, ' ').replace(/\s+/g, ' ').trim();
+  if (clean.length <= maxLen) return clean;
+  return clean.slice(0, maxLen - 3) + "...";
 }
 
 export function formatDate(date: Date): string {
-  const month = String(date.getMonth() + 1).padStart(2, "0");
-  const day = String(date.getDate()).padStart(2, "0");
-  return `${month}-${day}`;
+  return date.toLocaleDateString("en-US", {
+    month: "short",
+    day: "numeric",
+    year: "numeric",
+  });
 }
 
 export function formatStatus(status: string): string {
diff --git a/meta-evals/src/components/index.ts b/meta-evals/src/components/index.ts
index 8bca17dd5..3245b32be 100644
--- a/meta-evals/src/components/index.ts
+++ b/meta-evals/src/components/index.ts
@@ -3,9 +3,14 @@
  */
 
 export { MainMenu } from "./MainMenu";
+export { ScoreRankMenu } from "./ScoreRankMenu";
 export { CreateBaseline } from "./CreateBaseline";
 export { SeriesDetail } from "./SeriesDetail";
 export { RankRuns } from "./RankRuns";
 export { ScoreRun } from "./ScoreRun";
+export { Validation } from "./Validation";
+export { ExtractorLab } from "./ExtractorLab";
+export { ModelSelector, type ModelInfo } from "./ModelSelector";
+export { DocumentSelector } from "./DocumentSelector";
 export * from "./helpers";
 export * from "./types";
diff --git a/meta-evals/src/components/types.ts b/meta-evals/src/components/types.ts
index 838ff68dc..164c52b92 100644
--- a/meta-evals/src/components/types.ts
+++ b/meta-evals/src/components/types.ts
@@ -6,10 +6,13 @@ import type { SeriesSummary, DocumentChoice, AgentChoice } from "@roast/db";
 
 export type Screen =
   | { type: "loading" }
-  | { type: "main-menu"; series: SeriesSummary[] }
+  | { type: "main-menu" }
+  | { type: "score-rank-menu"; series: SeriesSummary[] }
   | { type: "create-baseline"; step: "document" | "agents" | "confirm" | "creating" }
   | { type: "series-detail"; seriesId: string }
   | { type: "rank-runs"; seriesId: string }
-  | { type: "score-run"; seriesId: string };
+  | { type: "score-run"; seriesId: string }
+  | { type: "validation" }
+  | { type: "extractor-lab" };
 
 export type { SeriesSummary, DocumentChoice, AgentChoice };
diff --git a/meta-evals/src/utils/allModels.ts b/meta-evals/src/utils/allModels.ts
new file mode 100644
index 000000000..9081f8e3e
--- /dev/null
+++ b/meta-evals/src/utils/allModels.ts
@@ -0,0 +1,136 @@
+/**
+ * Fetch models from both Anthropic and OpenRouter APIs
+ */
+
+import Anthropic from "@anthropic-ai/sdk";
+
+export interface ModelInfo {
+  id: string;
+  name: string;
+  provider: "anthropic" | "openrouter";
+  contextLength?: number;
+  description?: string;
+}
+
+// Cache for models
+let cachedModels: ModelInfo[] | null = null;
+let cacheTimestamp = 0;
+const CACHE_TTL_MS = 5 * 60 * 1000; // 5 minutes
+
+/**
+ * Fetch models from Anthropic API
+ */
+async function fetchAnthropicModels(): Promise<ModelInfo[]> {
+  try {
+    const client = new Anthropic();
+    const response = await client.models.list();
+
+    return response.data.map((m) => ({
+      id: m.id,
+      name: m.display_name,
+      provider: "anthropic" as const,
+    }));
+  } catch (e) {
+    console.error("Failed to fetch Anthropic models:", e);
+    return [];
+  }
+}
+
+/**
+ * Fetch models from OpenRouter API
+ */
+async function fetchOpenRouterModels(): Promise<ModelInfo[]> {
+  try {
+    const response = await fetch("https://openrouter.ai/api/v1/models");
+    if (!response.ok) {
+      throw new Error(`OpenRouter API error: ${response.status}`);
+    }
+
+    const data = (await response.json()) as {
+      data: Array<{
+        id: string;
+        name: string;
+        context_length?: number;
+        description?: string;
+      }>;
+    };
+
+    return data.data
+      .filter((m) => {
+        // Filter out free/test models and keep quality models
+        if (m.id.includes(":free")) return false;
+        if (m.id.includes("auto")) return false;
+        return true;
+      })
+      .map((m) => ({
+        id: m.id,
+        name: m.name,
+        provider: "openrouter" as const,
+        contextLength: m.context_length,
+        description: m.description,
+      }));
+  } catch (e) {
+    console.error("Failed to fetch OpenRouter models:", e);
+    return [];
+  }
+}
+
+/**
+ * Get all available models from both APIs (cached)
+ */
+export async function getAllModels(): Promise<ModelInfo[]> {
+  const now = Date.now();
+
+  if (cachedModels && now - cacheTimestamp < CACHE_TTL_MS) {
+    return cachedModels;
+  }
+
+  const [anthropicModels, openRouterModels] = await Promise.all([
+    fetchAnthropicModels(),
+    fetchOpenRouterModels(),
+  ]);
+
+  // Combine and sort: Anthropic first, then OpenRouter alphabetically
+  cachedModels = [
+    ...anthropicModels.sort((a, b) => a.name.localeCompare(b.name)),
+    ...openRouterModels.sort((a, b) => a.name.localeCompare(b.name)),
+  ];
+
+  cacheTimestamp = now;
+  return cachedModels;
+}
+
+/**
+ * Filter models by search query
+ * Matches against id and name
+ */
+export function filterModels(models: ModelInfo[], query: string): ModelInfo[] {
+  if (!query.trim()) {
+    return models;
+  }
+
+  const lowerQuery = query.toLowerCase();
+  return models.filter(
+    (m) =>
+      m.id.toLowerCase().includes(lowerQuery) ||
+      m.name.toLowerCase().includes(lowerQuery) ||
+      m.provider.toLowerCase().includes(lowerQuery)
+  );
+}
+
+/**
+ * Group models by provider
+ */
+export function groupModelsByProvider(
+  models: ModelInfo[]
+): Map<string, ModelInfo[]> {
+  const grouped = new Map<string, ModelInfo[]>();
+
+  for (const model of models) {
+    const existing = grouped.get(model.provider) || [];
+    existing.push(model);
+    grouped.set(model.provider, existing);
+  }
+
+  return grouped;
+}
diff --git a/meta-evals/src/utils/apiClient.ts b/meta-evals/src/utils/apiClient.ts
index 1cdf24b70..b748d1356 100644
--- a/meta-evals/src/utils/apiClient.ts
+++ b/meta-evals/src/utils/apiClient.ts
@@ -81,22 +81,65 @@ export class ApiClient {
     const { sessionToken } = await this.getSessionInfo();
 
     const url = `${API_BASE}${path}`;
-    const response = await fetch(url, {
-      ...options,
-      headers: {
-        "Content-Type": "application/json",
-        Cookie: `authjs.session-token=${sessionToken}`,
-        ...options.headers,
-      },
-    });
 
-    const data = await response.json();
+    let response: Response;
+    try {
+      response = await fetch(url, {
+        ...options,
+        headers: {
+          "Content-Type": "application/json",
+          Cookie: `authjs.session-token=${sessionToken}`,
+          ...options.headers,
+        },
+      });
+    } catch (error) {
+      // Network error - server not running, wrong port, etc.
+      const message = error instanceof Error ? error.message : String(error);
+      if (message.includes("ECONNREFUSED") || message.includes("fetch failed")) {
+        throw new ApiError(
+          0,
+          `Cannot connect to API at ${API_BASE}. Is the web server running? Try: pnpm run dev`,
+          { originalError: message }
+        );
+      }
+      throw new ApiError(0, `Network error: ${message}`, { originalError: message });
+    }
+
+    // Handle empty responses
+    const text = await response.text();
+    if (!text) {
+      if (!response.ok) {
+        throw new ApiError(
+          response.status,
+          `API returned ${response.status} ${response.statusText} with empty response`,
+          { url, status: response.status }
+        );
+      }
+      throw new ApiError(
+        response.status,
+        `API returned empty response. Is the server running correctly at ${API_BASE}?`,
+        { url, status: response.status }
+      );
+    }
+
+    // Parse JSON
+    let data: T;
+    try {
+      data = JSON.parse(text);
+    } catch {
+      throw new ApiError(
+        response.status,
+        `API returned invalid JSON. Status: ${response.status}. Response: ${text.slice(0, 200)}`,
+        { url, status: response.status, responseText: text.slice(0, 500) }
+      );
+    }
 
     if (!response.ok) {
-      throw new ApiError(response.status, data.error || "API request failed", data);
+      const errorData = data as { error?: string };
+      throw new ApiError(response.status, errorData.error || "API request failed", data);
     }
 
-    return { data: data as T, status: response.status };
+    return { data, status: response.status };
   }
 
   /**
diff --git a/meta-evals/src/validation/compare.ts b/meta-evals/src/validation/compare.ts
new file mode 100644
index 000000000..9e7cbef91
--- /dev/null
+++ b/meta-evals/src/validation/compare.ts
@@ -0,0 +1,389 @@
+/**
+ * Comparison Logic for Validation Framework
+ *
+ * Compares evaluation snapshots and detects regressions.
+ */
+
+import type {
+  ComparableComment,
+  EvaluationSnapshot,
+  PipelineTelemetrySnapshot,
+  CommentComparisonResult,
+  DocumentComparisonResult,
+  RegressionFlag,
+  RegressionType,
+} from "./types";
+import { REGRESSION_THRESHOLDS } from "./types";
+
+/**
+ * Calculate similarity between two strings using Levenshtein distance.
+ * Returns a value between 0 (completely different) and 1 (identical).
+ */
+function stringSimilarity(a: string, b: string): number {
+  if (a === b) return 1;
+  if (a.length === 0 || b.length === 0) return 0;
+
+  // Normalize strings for comparison
+  const normalize = (s: string) => s.toLowerCase().trim();
+  const normA = normalize(a);
+  const normB = normalize(b);
+
+  if (normA === normB) return 1;
+
+  // Calculate Levenshtein distance
+  const matrix: number[][] = [];
+  for (let i = 0; i <= normA.length; i++) {
+    matrix[i] = [i];
+  }
+  for (let j = 0; j <= normB.length; j++) {
+    matrix[0][j] = j;
+  }
+  for (let i = 1; i <= normA.length; i++) {
+    for (let j = 1; j <= normB.length; j++) {
+      const cost = normA[i - 1] === normB[j - 1] ? 0 : 1;
+      matrix[i][j] = Math.min(
+        matrix[i - 1][j] + 1, // deletion
+        matrix[i][j - 1] + 1, // insertion
+        matrix[i - 1][j - 1] + cost // substitution
+      );
+    }
+  }
+
+  const maxLen = Math.max(normA.length, normB.length);
+  return 1 - matrix[normA.length][normB.length] / maxLen;
+}
+
+/**
+ * Check if two comments match based on quoted text.
+ * Uses fuzzy matching since quoted text might vary slightly between runs.
+ */
+function commentsMatch(
+  a: ComparableComment,
+  b: ComparableComment,
+  threshold = 0.8
+): { matches: boolean; confidence: number } {
+  // First try exact match on quoted text
+  if (a.quotedText === b.quotedText) {
+    return { matches: true, confidence: 1 };
+  }
+
+  // Check if offset ranges overlap significantly
+  const overlapStart = Math.max(a.startOffset, b.startOffset);
+  const overlapEnd = Math.min(a.endOffset, b.endOffset);
+  const overlap = Math.max(0, overlapEnd - overlapStart);
+  const unionLength =
+    Math.max(a.endOffset, b.endOffset) - Math.min(a.startOffset, b.startOffset);
+  const overlapRatio = unionLength > 0 ? overlap / unionLength : 0;
+
+  // If offsets overlap significantly, check text similarity
+  if (overlapRatio > 0.5) {
+    const textSimilarity = stringSimilarity(a.quotedText, b.quotedText);
+    if (textSimilarity >= threshold) {
+      return { matches: true, confidence: textSimilarity };
+    }
+  }
+
+  // Fallback: pure text similarity for comments on same region
+  const textSimilarity = stringSimilarity(a.quotedText, b.quotedText);
+  if (textSimilarity >= threshold) {
+    return { matches: true, confidence: textSimilarity };
+  }
+
+  return { matches: false, confidence: textSimilarity };
+}
+
+/**
+ * Match comments between baseline and current snapshots.
+ * Returns matched pairs, new comments, and lost comments.
+ */
+function matchComments(
+  baseline: ComparableComment[],
+  current: ComparableComment[]
+): {
+  matched: CommentComparisonResult[];
+  newComments: ComparableComment[];
+  lostComments: ComparableComment[];
+} {
+  const matched: CommentComparisonResult[] = [];
+  const unmatchedBaseline = new Set(baseline.map((_, i) => i));
+  const unmatchedCurrent = new Set(current.map((_, i) => i));
+
+  // Greedy matching: find best match for each baseline comment
+  for (let i = 0; i < baseline.length; i++) {
+    let bestMatch: { index: number; confidence: number } | null = null;
+
+    for (let j = 0; j < current.length; j++) {
+      if (!unmatchedCurrent.has(j)) continue;
+
+      const result = commentsMatch(baseline[i], current[j]);
+      if (result.matches) {
+        if (!bestMatch || result.confidence > bestMatch.confidence) {
+          bestMatch = { index: j, confidence: result.confidence };
+        }
+      }
+    }
+
+    if (bestMatch) {
+      matched.push({
+        status: "matched",
+        baselineComment: baseline[i],
+        currentComment: current[bestMatch.index],
+        matchConfidence: bestMatch.confidence,
+      });
+      unmatchedBaseline.delete(i);
+      unmatchedCurrent.delete(bestMatch.index);
+    }
+  }
+
+  // Remaining baseline comments are "lost"
+  const lostComments = Array.from(unmatchedBaseline).map((i) => baseline[i]);
+
+  // Remaining current comments are "new"
+  const newComments = Array.from(unmatchedCurrent).map((i) => current[i]);
+
+  return { matched, newComments, lostComments };
+}
+
+/**
+ * Extract telemetry snapshot from raw pipeline telemetry.
+ */
+function extractTelemetrySnapshot(
+  raw: unknown
+): PipelineTelemetrySnapshot | null {
+  if (!raw || typeof raw !== "object") return null;
+
+  const telemetry = raw as Record<string, unknown>;
+  const finalCounts = telemetry.finalCounts as Record<string, number> | undefined;
+
+  if (!finalCounts) return null;
+
+  return {
+    totalDurationMs: (telemetry.totalDurationMs as number) || 0,
+    issuesExtracted: finalCounts.issuesExtracted || 0,
+    issuesAfterDedup: finalCounts.issuesAfterDedup || 0,
+    issuesAfterFiltering: finalCounts.issuesAfterFiltering || 0,
+    commentsGenerated: finalCounts.commentsGenerated || 0,
+    commentsKept: finalCounts.commentsKept || 0,
+  };
+}
+
+/**
+ * Detect regressions between baseline and current telemetry.
+ */
+function detectTelemetryRegressions(
+  baseline: PipelineTelemetrySnapshot | null,
+  current: PipelineTelemetrySnapshot | null
+): RegressionFlag[] {
+  const regressions: RegressionFlag[] = [];
+
+  if (!baseline || !current) return regressions;
+
+  // Extraction drop
+  if (baseline.issuesExtracted > 0) {
+    const extractionDropPercent =
+      ((baseline.issuesExtracted - current.issuesExtracted) /
+        baseline.issuesExtracted) *
+      100;
+
+    if (extractionDropPercent >= REGRESSION_THRESHOLDS.EXTRACTION_DROP_PERCENT) {
+      regressions.push({
+        type: "extraction_drop",
+        severity: "error",
+        message: `Extraction dropped ${extractionDropPercent.toFixed(0)}% (${baseline.issuesExtracted} → ${current.issuesExtracted})`,
+        details: {
+          baselineCount: baseline.issuesExtracted,
+          currentCount: current.issuesExtracted,
+          dropPercent: extractionDropPercent,
+        },
+      });
+    }
+  }
+
+  // Duration spike
+  if (baseline.totalDurationMs > 0) {
+    const durationIncreasePercent =
+      ((current.totalDurationMs - baseline.totalDurationMs) /
+        baseline.totalDurationMs) *
+      100;
+
+    if (durationIncreasePercent >= REGRESSION_THRESHOLDS.DURATION_SPIKE_PERCENT) {
+      regressions.push({
+        type: "duration_spike",
+        severity: "warning",
+        message: `Duration increased ${durationIncreasePercent.toFixed(0)}% (${baseline.totalDurationMs}ms → ${current.totalDurationMs}ms)`,
+        details: {
+          baselineMs: baseline.totalDurationMs,
+          currentMs: current.totalDurationMs,
+          increasePercent: durationIncreasePercent,
+        },
+      });
+    }
+  }
+
+  return regressions;
+}
+
+/**
+ * Compare two evaluation snapshots and detect regressions.
+ */
+export function compareSnapshots(
+  baseline: EvaluationSnapshot,
+  current: EvaluationSnapshot
+): DocumentComparisonResult {
+  // Match comments
+  const { matched, newComments, lostComments } = matchComments(
+    baseline.comments,
+    current.comments
+  );
+
+  // Calculate aggregate metrics
+  const scoreChange =
+    baseline.grade !== null && current.grade !== null
+      ? current.grade - baseline.grade
+      : null;
+
+  const commentCountChange = current.comments.length - baseline.comments.length;
+
+  // Extract telemetry
+  const baselineTelemetry = extractTelemetrySnapshot(baseline.pipelineTelemetry);
+  const currentTelemetry = extractTelemetrySnapshot(current.pipelineTelemetry);
+
+  const extractionChange =
+    baselineTelemetry && currentTelemetry && baselineTelemetry.issuesExtracted > 0
+      ? ((currentTelemetry.issuesExtracted - baselineTelemetry.issuesExtracted) /
+          baselineTelemetry.issuesExtracted) *
+        100
+      : null;
+
+  const durationChange =
+    baselineTelemetry && currentTelemetry
+      ? currentTelemetry.totalDurationMs - baselineTelemetry.totalDurationMs
+      : null;
+
+  // Detect regressions
+  const regressions: RegressionFlag[] = [];
+
+  // Score drop
+  if (scoreChange !== null && scoreChange < -REGRESSION_THRESHOLDS.SCORE_DROP) {
+    regressions.push({
+      type: "score_drop",
+      severity: "error",
+      message: `Score dropped by ${Math.abs(scoreChange).toFixed(1)} (${baseline.grade} → ${current.grade})`,
+      details: {
+        baselineScore: baseline.grade,
+        currentScore: current.grade,
+        drop: Math.abs(scoreChange),
+      },
+    });
+  }
+
+  // Lost comments threshold
+  if (baseline.comments.length > 0) {
+    const lostPercent =
+      (lostComments.length / baseline.comments.length) * 100;
+
+    if (lostPercent >= REGRESSION_THRESHOLDS.LOST_COMMENTS_PERCENT) {
+      regressions.push({
+        type: "lost_comments",
+        severity: "error",
+        message: `Lost ${lostPercent.toFixed(0)}% of comments (${lostComments.length}/${baseline.comments.length})`,
+        details: {
+          lostCount: lostComments.length,
+          baselineCount: baseline.comments.length,
+          lostPercent,
+        },
+      });
+    }
+  }
+
+  // High-importance comments lost
+  const highImportanceLost = lostComments.filter(
+    (c) =>
+      c.importance !== null &&
+      c.importance >= REGRESSION_THRESHOLDS.HIGH_IMPORTANCE_THRESHOLD
+  );
+
+  if (highImportanceLost.length > 0) {
+    regressions.push({
+      type: "lost_high_importance",
+      severity: "error",
+      message: `Lost ${highImportanceLost.length} high-importance comment(s)`,
+      details: {
+        lostComments: highImportanceLost.map((c) => ({
+          header: c.header,
+          importance: c.importance,
+          quotedText: c.quotedText.slice(0, 50),
+        })),
+      },
+    });
+  }
+
+  // Telemetry regressions
+  regressions.push(
+    ...detectTelemetryRegressions(baselineTelemetry, currentTelemetry)
+  );
+
+  return {
+    documentId: baseline.documentId,
+    documentTitle: baseline.documentTitle,
+    baseline,
+    current,
+    matchedComments: matched,
+    newComments,
+    lostComments,
+    scoreChange,
+    commentCountChange,
+    extractionChange,
+    durationChange,
+    regressions,
+  };
+}
+
+/**
+ * Determine overall status from regressions.
+ */
+export function getComparisonStatus(
+  result: DocumentComparisonResult
+): "ok" | "warning" | "error" {
+  const hasError = result.regressions.some((r) => r.severity === "error");
+  const hasWarning = result.regressions.some((r) => r.severity === "warning");
+
+  if (hasError) return "error";
+  if (hasWarning) return "warning";
+  return "ok";
+}
+
+/**
+ * Format a comparison result for display.
+ */
+export function formatComparisonSummary(
+  result: DocumentComparisonResult
+): string {
+  const status = getComparisonStatus(result);
+  const icon = status === "ok" ? "✅" : status === "warning" ? "⚠️" : "❌";
+
+  const parts = [
+    `${icon} ${result.documentTitle}`,
+    `  Comments: ${result.baseline.comments.length} → ${result.current.comments.length}`,
+  ];
+
+  if (result.scoreChange !== null) {
+    const sign = result.scoreChange >= 0 ? "+" : "";
+    parts.push(`  Score: ${result.baseline.grade} → ${result.current.grade} (${sign}${result.scoreChange.toFixed(1)})`);
+  }
+
+  if (result.newComments.length > 0) {
+    parts.push(`  New: ${result.newComments.length}`);
+  }
+
+  if (result.lostComments.length > 0) {
+    parts.push(`  Lost: ${result.lostComments.length}`);
+  }
+
+  for (const regression of result.regressions) {
+    const rIcon = regression.severity === "error" ? "🔴" : "🟡";
+    parts.push(`  ${rIcon} ${regression.message}`);
+  }
+
+  return parts.join("\n");
+}
diff --git a/meta-evals/src/validation/index.ts b/meta-evals/src/validation/index.ts
new file mode 100644
index 000000000..b5db5c37c
--- /dev/null
+++ b/meta-evals/src/validation/index.ts
@@ -0,0 +1,8 @@
+/**
+ * Validation Framework
+ *
+ * Run fallacy checker against corpus, compare results, detect regressions.
+ */
+
+export * from "./types";
+export * from "./compare";
diff --git a/meta-evals/src/validation/types.ts b/meta-evals/src/validation/types.ts
new file mode 100644
index 000000000..ec95309dd
--- /dev/null
+++ b/meta-evals/src/validation/types.ts
@@ -0,0 +1,207 @@
+/**
+ * Types for Validation Framework
+ *
+ * Used to compare pipeline runs and detect regressions.
+ */
+
+/**
+ * A document selected for validation testing
+ */
+export interface ValidationDocument {
+  documentId: string;
+  title: string;
+  contentLength: number;
+  lastEvaluatedAt: Date | null;
+  evaluationCount: number;
+}
+
+/**
+ * Simplified comment for comparison purposes
+ */
+export interface ComparableComment {
+  id: string;
+  quotedText: string;
+  header: string | null;
+  description: string;
+  importance: number | null;
+  startOffset: number;
+  endOffset: number;
+}
+
+/**
+ * An evaluation snapshot for comparison
+ */
+export interface EvaluationSnapshot {
+  evaluationVersionId: string;
+  agentId: string;
+  agentName: string;
+  createdAt: Date;
+  documentId: string;
+  documentTitle: string;
+  comments: ComparableComment[];
+  grade: number | null;
+  pipelineTelemetry: PipelineTelemetrySnapshot | null;
+}
+
+/**
+ * Record of an item filtered by the pipeline
+ */
+export interface FilteredItemSnapshot {
+  stage: string;
+  quotedText: string;
+  header?: string;
+  filterReason: string;
+  supportLocation?: string;
+  originalIndex: number;
+}
+
+/**
+ * Simplified telemetry for comparison
+ */
+export interface PipelineTelemetrySnapshot {
+  totalDurationMs: number;
+  issuesExtracted: number;
+  issuesAfterDedup: number;
+  issuesAfterFiltering: number;
+  commentsGenerated: number;
+  commentsKept: number;
+  /** Items filtered out with their reasoning */
+  filteredItems?: FilteredItemSnapshot[];
+  /** Extraction phase telemetry (multi-extractor mode) */
+  extractionPhase?: ExtractionPhaseSnapshot;
+}
+
+/**
+ * Telemetry for a single extractor run
+ */
+export interface ExtractorSnapshot {
+  extractorId: string;
+  model: string;
+  temperature: number;
+  issuesFound: number;
+  durationMs: number;
+  error?: string;
+}
+
+/**
+ * Extraction phase telemetry (for multi-extractor mode)
+ */
+export interface ExtractionPhaseSnapshot {
+  multiExtractorEnabled: boolean;
+  extractors: ExtractorSnapshot[];
+  totalIssuesBeforeJudge: number;
+  totalIssuesAfterJudge: number;
+  judgeModel?: string;
+  judgeDurationMs?: number;
+  judgeDecisions: Array<{
+    issueText: string;
+    decision: 'accepted' | 'merged' | 'rejected';
+    reasoning: string;
+    sourceExtractors: string[];
+  }>;
+}
+
+/**
+ * Result of comparing a single comment between runs
+ */
+export interface CommentComparisonResult {
+  status: "matched" | "new" | "lost";
+  baselineComment?: ComparableComment;
+  currentComment?: ComparableComment;
+  matchConfidence?: number; // 0-1 for fuzzy matches
+}
+
+/**
+ * Result of comparing two evaluation snapshots
+ */
+export interface DocumentComparisonResult {
+  documentId: string;
+  documentTitle: string;
+  baseline: EvaluationSnapshot;
+  current: EvaluationSnapshot;
+
+  // Comment-level changes
+  matchedComments: CommentComparisonResult[];
+  newComments: ComparableComment[];
+  lostComments: ComparableComment[];
+
+  // Aggregate metrics
+  scoreChange: number | null; // current - baseline (null if either missing)
+  commentCountChange: number; // current - baseline
+
+  // Pipeline telemetry changes
+  extractionChange: number | null; // % change in issues extracted
+  durationChange: number | null; // ms change
+
+  // Regression flags
+  regressions: RegressionFlag[];
+}
+
+/**
+ * A specific regression detected
+ */
+export interface RegressionFlag {
+  type: RegressionType;
+  severity: "warning" | "error";
+  message: string;
+  details?: Record<string, unknown>;
+}
+
+export type RegressionType =
+  | "score_drop"
+  | "lost_comments"
+  | "lost_high_importance"
+  | "extraction_drop"
+  | "duration_spike";
+
+/**
+ * Thresholds for regression detection
+ */
+export const REGRESSION_THRESHOLDS = {
+  // Score drop > 1 point is a regression
+  SCORE_DROP: 1,
+  // Losing > 50% of comments is a regression
+  LOST_COMMENTS_PERCENT: 50,
+  // Any lost comment with importance > 70 is a regression
+  HIGH_IMPORTANCE_THRESHOLD: 70,
+  // Extraction dropping > 30% is a regression
+  EXTRACTION_DROP_PERCENT: 30,
+  // Duration increase > 100% is a warning
+  DURATION_SPIKE_PERCENT: 100,
+} as const;
+
+/**
+ * Summary of a validation run
+ */
+export interface ValidationRunSummary {
+  runId: string;
+  createdAt: Date;
+  description: string;
+  documentCount: number;
+
+  // Aggregate results
+  noRegressionCount: number;
+  warningCount: number;
+  errorCount: number;
+
+  // Can be set as new baseline
+  canBeBaseline: boolean;
+}
+
+/**
+ * Full validation run with all comparisons
+ */
+export interface ValidationRun {
+  summary: ValidationRunSummary;
+  comparisons: DocumentComparisonResult[];
+}
+
+/**
+ * Input for creating a validation run
+ */
+export interface CreateValidationRunInput {
+  description: string;
+  documentIds: string[];
+  agentId: string;
+  baselineRunId?: string; // If not specified, uses most recent for each doc
+}
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index 830279c22..631fc66fa 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -401,6 +401,9 @@ importers:
       '@roast/db':
         specifier: workspace:*
         version: link:../db
+      fuse.js:
+        specifier: ^7.1.0
+        version: 7.1.0
       mathjs:
         specifier: ^14.0.1
         version: 14.6.0
@@ -515,6 +518,9 @@ importers:
 
   meta-evals:
     dependencies:
+      '@leeoniya/ufuzzy':
+        specifier: ^1.0.18
+        version: 1.0.18
       '@roast/ai':
         specifier: workspace:*
         version: link:../internal-packages/ai
@@ -524,6 +530,9 @@ importers:
       dotenv:
         specifier: ^16.4.5
         version: 16.6.1
+      fuse.js:
+        specifier: ^7.1.0
+        version: 7.1.0
       ink:
         specifier: ^6.5.1
         version: 6.5.1(@types/react@19.2.7)(react@19.2.1)
@@ -533,6 +542,9 @@ importers:
       ink-spinner:
         specifier: ^5.0.0
         version: 5.0.0(ink@6.5.1(@types/react@19.2.7)(react@19.2.1))(react@19.2.1)
+      ink-text-input:
+        specifier: ^6.0.0
+        version: 6.0.0(ink@6.5.1(@types/react@19.2.7)(react@19.2.1))(react@19.2.1)
       react:
         specifier: ^19.2.1
         version: 19.2.1
@@ -4213,6 +4225,10 @@ packages:
   functions-have-names@1.2.3:
     resolution: {integrity: sha512-xckBUXyTIqT97tq2x2AMb+g163b5JFysYk0x4qxNFwbfQkmNZoiRHb6sPzI9/QV33WeuvVYBUIiD4NzNIyqaRQ==}
 
+  fuse.js@7.1.0:
+    resolution: {integrity: sha512-trLf4SzuuUxfusZADLINj+dE8clK1frKdmqiJNb1Es75fmI5oY6X2mxLVUciLLjxqw/xr72Dhy+lER6dGd02FQ==}
+    engines: {node: '>=10'}
+
   gensync@1.0.0-beta.2:
     resolution: {integrity: sha512-3hN7NaskYvMDLQY55gnW3NQ+mesEAepTqlg+VEbj7zzqEMBVNhzcGYYeqFo/TlYz6eQiFcp1HcsCZO+nGgS8zg==}
     engines: {node: '>=6.9.0'}
@@ -4425,6 +4441,13 @@ packages:
       ink: '>=4.0.0'
       react: '>=18.0.0'
 
+  ink-text-input@6.0.0:
+    resolution: {integrity: sha512-Fw64n7Yha5deb1rHY137zHTAbSTNelUKuB5Kkk2HACXEtwIHBCf9OH2tP/LQ9fRYTl1F0dZgbW0zPnZk6FA9Lw==}
+    engines: {node: '>=18'}
+    peerDependencies:
+      ink: '>=5'
+      react: '>=18'
+
   ink@6.5.1:
     resolution: {integrity: sha512-wF3j/DmkM8q5E+OtfdQhCRw8/0ahkc8CUTgEddxZzpEWPslu7YPL3t64MWRoI9m6upVGpfAg4ms2BBvxCdKRLQ==}
     engines: {node: '>=20'}
@@ -5078,6 +5101,7 @@ packages:
   next@15.3.6:
     resolution: {integrity: sha512-oI6D1zbbsh6JzzZFDCSHnnx6Qpvd1fSkVJu/5d8uluqnxzuoqtodVZjYvNovooznUq8udSAiKp7MbwlfZ8Gm6w==}
     engines: {node: ^18.18.0 || ^19.8.0 || >= 20.0.0}
+    deprecated: This version has a security vulnerability. Please upgrade to a patched version. See https://nextjs.org/blog/security-update-2025-12-11 for more details.
     hasBin: true
     peerDependencies:
       '@opentelemetry/api': ^1.1.0
@@ -6572,6 +6596,7 @@ packages:
   whatwg-encoding@3.1.1:
     resolution: {integrity: sha512-6qN4hJdMwfYBtE3YBTTHhoeuUrDBPZmbQaxWAqSALV/MeEnR5z1xd8UKud2RAkFoPkmB+hli1TZSnyi84xz1vQ==}
     engines: {node: '>=18'}
+    deprecated: Use @exodus/bytes instead for a more spec-conformant and faster implementation
 
   whatwg-mimetype@3.0.0:
     resolution: {integrity: sha512-nt+N2dzIutVRxARx1nghPKGv1xHikU7HKdfafKkLNLindmPU/ch3U31NOCGGA/dmPcmb1VlofO0vnKAcsm0o/Q==}
@@ -9913,7 +9938,7 @@ snapshots:
       sirv: 3.0.1
       tinyglobby: 0.2.14
       tinyrainbow: 2.0.0
-      vitest: 3.2.4(@types/debug@4.1.12)(@types/node@20.19.9)(@vitest/ui@3.2.4)(happy-dom@18.0.1)(jiti@2.5.1)(jsdom@24.1.3)(terser@5.43.1)(tsx@4.21.0)(yaml@2.8.1)
+      vitest: 3.2.4(@types/debug@4.1.12)(@types/node@22.17.0)(@vitest/ui@3.2.4)(happy-dom@18.0.1)(jiti@2.5.1)(jsdom@24.1.3)(terser@5.43.1)(tsx@4.21.0)(yaml@2.8.1)
 
   '@vitest/utils@3.2.4':
     dependencies:
@@ -11229,6 +11254,8 @@ snapshots:
 
   functions-have-names@1.2.3: {}
 
+  fuse.js@7.1.0: {}
+
   gensync@1.0.0-beta.2: {}
 
   get-east-asian-width@1.4.0: {}
@@ -11501,6 +11528,13 @@ snapshots:
       ink: 6.5.1(@types/react@19.2.7)(react@19.2.1)
       react: 19.2.1
 
+  ink-text-input@6.0.0(ink@6.5.1(@types/react@19.2.7)(react@19.2.1))(react@19.2.1):
+    dependencies:
+      chalk: 5.6.2
+      ink: 6.5.1(@types/react@19.2.7)(react@19.2.1)
+      react: 19.2.1
+      type-fest: 4.41.0
+
   ink@6.5.1(@types/react@19.2.7)(react@19.2.1):
     dependencies:
       '@alcalzone/ansi-tokenize': 0.2.2
diff --git a/research/ideation/2025-12-15-fact-fallacy-check-improvements.md b/research/ideation/2025-12-15-fact-fallacy-check-improvements.md
index 21ea817fb..529d9b2f0 100644
--- a/research/ideation/2025-12-15-fact-fallacy-check-improvements.md
+++ b/research/ideation/2025-12-15-fact-fallacy-check-improvements.md
@@ -84,3 +84,161 @@ Per-collection dimensions:
 - Results stored in DB (`MetaEvaluation` table)
 - CLI shell in `meta-evals/` for dev/testing
 - Future: run in production, show to users, enable voting
+
+---
+
+## Part 3: Fallacy Checker Refactor (2025-01)
+
+Based on user feedback (LessWrong/EA Forum): too aggressive, flags intro claims supported later, misses rhetorical context.
+
+### Architecture
+
+```
+Extract (single-pass, wide net)
+    ↓
+Filter (multi-stage)
+  - Principle of Charity
+  - Supported Elsewhere?
+  - Dedup / severity threshold
+    ↓
+Comment (pure transformation)
+    ↓
+Review (summarize only — no filtering)
+```
+
+### 3.1 Single-Pass Extraction
+
+Replace chunked extraction with single LLM call on full document. Cast wide net.
+
+### 3.2 Filter: Principle of Charity
+
+Separate filtering step. For each issue: "Does this hold under the strongest interpretation of the argument?"
+
+### 3.3 Filter: Supported Elsewhere?
+
+"Is this claim supported, explained, or qualified elsewhere in the document?"
+
+### 3.4 Simplify Review
+
+Remove filtering logic from review prompt. Focus only on generating summaries.
+
+### 3.5 Model Testing Results (2025-01-04)
+
+**Supported-Elsewhere Filter - Model Comparison:**
+
+| Model | Issues Kept | Cost | Notes |
+|-------|-------------|------|-------|
+| Claude Opus | 0/5 | ~$0.06 | Most aggressive filtering |
+| Claude Sonnet | 1-2/5 | ~$0.02 | Too conservative |
+| Gemini 3 Flash | 0/5 | $0.003 | Agrees with Opus, very fast |
+| Gemini 3 Pro | 0/5 | $0.054 | Agrees with Opus, detailed explanations |
+
+**Conclusion:** Opus, Gemini Flash, and Gemini Pro all agree that intro claims justified by later technical sections should be filtered. Sonnet is the outlier - too conservative. **Gemini 3 Flash is the best choice** for the filter: cheap ($0.003), fast, and accurate.
+
+**Extraction - Model Comparison:**
+
+| Model | Issues Found | Notes |
+|-------|--------------|-------|
+| Claude Sonnet | 5 | Standard extraction |
+| Gemini 3 Flash | 4 | Slightly different profile - missed 2 issues but found 1 different one |
+
+Both sets of extracted issues were 100% false positives (all filtered by supported-elsewhere). The extraction differences don't matter in practice since the filter catches them all.
+
+### 3.6 OpenRouter Integration
+
+Added OpenRouter support for multi-model testing:
+
+```bash
+# Environment variables for model override
+FALLACY_EXTRACTOR_MODEL=google/gemini-3-flash-preview
+FALLACY_FILTER_MODEL=google/gemini-3-flash-preview
+```
+
+**Implemented:**
+- `callOpenRouterWithTool<T>()` - Generic wrapper for OpenRouter tool calling
+- Temperature normalization per provider (Anthropic 0-1, others 0-2)
+- Auto-detection of OpenRouter models (contains `/` in model ID)
+- Added Gemini 3 Pro/Flash model IDs to `OPENROUTER_MODELS`
+
+### 3.7 Next Steps
+
+#### Extraction
+- Try specialized prompts per issue type (logical fallacies, missing context, rhetorical manipulation)
+- Test more models (Flash, others) individually and in combination
+
+#### Filtering
+- **Principle of Charity filter** (not yet implemented) - "Does this hold under the strongest interpretation?"
+- **Dedup / severity threshold** (not yet implemented) - consolidate similar issues, enforce minimum severity
+- Consider per-claim verification (separate LLM calls) - batch approach works but may miss nuances
+- Consider Gemini 3 Flash for production (16x cheaper, same accuracy)
+
+#### Review
+- No changes needed - already simplified to summary-only
+
+#### Cross-Cutting: Multi-Expert Aggregation
+- Run multiple models in parallel, aggregate by majority vote or confidence-weighted
+- Reduces both false positives and false negatives
+- Cost-effective: cheap models (Flash) + one premium model
+- Alternative: same model at different temperatures for diversity
+
+#### Cross-Cutting: Pipeline Observability
+- Add metrics/logging per stage: issues in → issues out, time, cost
+- Enable tracing through full pipeline for debugging
+- Start with structured logs, consider dedicated metrics later
+- Goal: understand where issues are caught/missed, identify bottlenecks
+
+#### Cross-Cutting: Validation & Regression Testing
+- **Use meta-evals infrastructure** - already has UI for quick iteration and process parts implemented
+- Run against recent unique docs in dev DB (imported from prod), compare to previous results
+- Find cases with genuine fallacies that should NOT be filtered (validate filter accuracy)
+- Track: issues found, issues filtered, final comments generated
+- Measure delta from original to understand impact of changes
+- Don't need meta-evals scoring/rating yet - just use the execution framework
+- Goal: ensure changes are improvements, catch regressions early
+
+---
+
+### 3.8 Prioritized Implementation Plan
+
+**Principle: Measure before changing. Validate before deploying.**
+
+#### Phase 1: Foundation (do this first)
+*Can't improve what we can't measure. Can't validate without a baseline.*
+
+1. **Pipeline observability** - Add structured logging per stage (issues in/out, time, cost). Quick win, enables everything else.
+2. **Validation framework** - Set up meta-evals to run against dev DB documents. Establish baseline of current behavior before making changes.
+
+#### Phase 2: Filter Improvements (one at a time, measured)
+*Each change validated against baseline. Stop if regressions detected.*
+
+3. **Dedup/severity threshold** - Mechanical, low risk. Consolidate similar issues, enforce minimums.
+4. **Principle of Charity filter** - LLM-based, higher complexity. "Does this hold under strongest interpretation?"
+5. **Per-claim verification** - Only if batch approach shows accuracy issues in validation.
+
+#### Phase 3: Extraction Improvements
+*Filters are solid, now refine the input.*
+
+6. **Specialized prompts** - Split by issue type (logical fallacies, missing context, rhetorical). Measure each variant.
+7. **Model testing** - Compare Flash vs Claude for extraction quality/cost tradeoff.
+
+#### Phase 4: Optimizations
+*Core pipeline works well, now optimize for cost and accuracy.*
+
+8. **Gemini Flash for production** - 16x cheaper, validated as accurate. Easy win.
+9. **Multi-expert aggregation** - Run multiple models, aggregate results. Higher accuracy, diminishing returns.
+
+#### Phase 5: Meta-Evaluation & Review Improvements
+*Use meta-evals scoring/rating to improve comment quality.*
+
+10. **Meta-eval scoring** - Enable quality scoring on generated comments (accuracy, clarity, importance, tone).
+11. **Review stage improvements** - Use meta-eval feedback to refine comment generation and summaries.
+12. **Feedback loop** - Iterate on prompts based on meta-eval scores, track improvement over time.
+
+#### Key Risks & Mitigations
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| Filter too aggressive (misses real issues) | Users see fewer issues than they should | Validation with known-fallacy documents |
+| Filter too lenient (keeps false positives) | User trust eroded | Regression testing against baseline |
+| Changes make things worse silently | Wasted effort, user harm | Observability + regression framework (Phase 1) |
+| Over-engineering before validating | Wasted effort | Phase 1 first, measure before building |