refactor(llm): one robust JSON-parsing seam (Phase 1) by gurdenbatra · Pull Request #15 · Dark-Matter-Labs/cof-learning-system

gurdenbatra · 2026-06-12T08:49:29Z

Why

LLM responses were parsed six different ways across ~10 call sites, which is the root cause of the recurring fenced-JSON bugs (the distillation failure earlier in this project was one instance):

- raw JSON.parse(content) with no fence handling (setup, webScanner): a ```json-fenced response threw, silently yielding "Failed to parse" / zero signals
- ad-hoc fence regexes + object-only extractJsonObject (extraction ×3, process, reflection): object-only extraction couldn't handle process step-3's array response, and the regex missed trailing prose
- the robust path (distillation, tour): fine, but duplicated
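The first two failure modes can be reproduced in a few lines. This is a minimal sketch, not code from the repository: `rawParse` and `objectOnlyExtract` are hypothetical stand-ins for the old call sites, and the sample replies are invented for illustration.

```typescript
// Build the Markdown fence programmatically so this example stays readable.
const FENCE = "`".repeat(3);

// Hypothetical model reply: valid JSON wrapped in a fence, followed by prose.
const fencedReply = `${FENCE}json\n{"signals": [1, 2]}\n${FENCE}\nHope this helps!`;

// Failure mode 1: raw JSON.parse rejects the fence, and a catch-all
// handler turns the exception into "no data".
function rawParse(content: string): unknown {
  try {
    return JSON.parse(content);
  } catch {
    return null; // the caller cannot tell "bad JSON" from "no signals"
  }
}

// Failure mode 2: an object-only extractor slices from the first "{" to the
// last "}", which cuts the brackets off a top-level array.
function objectOnlyExtract(content: string): string | null {
  const start = content.indexOf("{");
  const end = content.lastIndexOf("}");
  return start !== -1 && end > start ? content.slice(start, end + 1) : null;
}

const arrayReply = '[{"id": 1}, {"id": 2}]';
const sliced = objectOnlyExtract(arrayReply); // brackets are lost here
```

In both cases the payload itself is well-formed JSON; only the handling around it fails, which is why the failures showed up as empty results rather than as visible errors.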

What

New src/lib/llm/parse.ts, the single seam every agent parses through:

- extractJson(text): strips ``` fences + leading/trailing prose, extracts the first balanced object {} or array [] (fixes the array case)
- parseLlmJson(content, schema): parse + Zod validate
- tryParseLlmJson(...): non-throwing variant (returns null on failure)
- parseLlmJsonLoose(content): parse without a schema, for callers that validate downstream
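The balanced-extraction idea behind extractJson can be sketched as below. This is an illustration under stated assumptions, not the contents of src/lib/llm/parse.ts: the names `extractJsonSketch` and `parseLooseSketch` are hypothetical, and the scanner assumes the first `{` or `[` in the text begins the JSON payload, so prose containing a bracket before the payload would defeat it.

```typescript
// Return the first balanced {...} or [...] span in `text`, or null if none.
// Fences and prose before or after the span are simply skipped over.
function extractJsonSketch(text: string): string | null {
  const start = text.search(/[\[{]/);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a JSON string, braces and brackets are literal characters.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // ran out of input before the span closed
}

// Schema-less parse for callers that validate downstream.
// Throws if nothing JSON-like is found or the extracted span is invalid.
function parseLooseSketch(content: string): unknown {
  const json = extractJsonSketch(content);
  if (json === null) throw new Error("No JSON found in LLM response");
  return JSON.parse(json);
}
```

The scanner counts nesting depth without checking that each closer matches its opener; mismatched pairs are left for JSON.parse to reject. A schema-validating variant would pass the result of `parseLooseSketch` to a validator such as a Zod schema.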

All 10 sites migrated. Each keeps its prior error semantics: setup's "Failed to parse", extraction's PDF_UNREADABLE, correction's empty-on-failure, tour's normalizeTour. The meeting/document extractors gain an explicit object guard (they previously relied on the `any` that JSON.parse returns).
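A rough sketch of how a non-throwing parse and an explicit object guard let a call site keep its own failure semantics follows. Everything here is hypothetical: `tryParse`, `isPlainObject`, and `parseCorrections` are illustrative names, the `corrections` field is invented, and a plain type guard stands in for the Zod schema so the example has no dependencies.

```typescript
type Guard<T> = (value: unknown) => value is T;

// Explicit object guard: JSON.parse returns `any`, so null, arrays, and
// primitives would otherwise flow through unchecked.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Non-throwing parse: syntax errors and shape mismatches both become null.
function tryParse<T>(content: string, guard: Guard<T>): T | null {
  try {
    const parsed: unknown = JSON.parse(content);
    return guard(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

// Hypothetical call site with "empty on failure" semantics: whatever goes
// wrong, the caller receives an empty list instead of an exception.
function parseCorrections(content: string): string[] {
  const obj = tryParse(content, isPlainObject);
  const items = obj?.["corrections"];
  return Array.isArray(items)
    ? items.filter((x): x is string => typeof x === "string")
    : [];
}
```

Keeping the failure policy at the call site, rather than inside the parser, is what allows one shared seam to serve sites that throw, sites that return a sentinel, and sites that fall back to an empty value.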

Scope note

This is the parser consolidation: it removes the fenced-JSON / silent-failure bug class at its root without touching the provider abstraction. The API-level structured-outputs option (output_config.format for a hard schema guarantee) is a natural follow-up that can build on this seam.

Test plan

- 14 new unit tests (parse.test.ts): fences, prose, arrays, strings-with-braces, schema validation, non-throwing variant
- existing distillation / extraction / agent suites unchanged and green
- vitest run → 552 pass · clean tsc --noEmit → 0 · eslint . → 0

Part of Phase 1, alongside #13 (cost table) and #14 (withAuth). Independent of both.

LLM responses were parsed six different ways across ~10 call sites:

- raw JSON.parse(content) with NO fence handling (setup, webScanner): a ```json-fenced response threw and silently produced zero results
- ad-hoc fence regexes + object-only extractJsonObject (extraction ×3, process, reflection): object-only extraction broke process step-3's array
- the robust extractJsonObject path (distillation, tour)

Adds src/lib/llm/parse.ts with one seam:

- extractJson(): strips fences / prose, extracts the first balanced object OR array (fixes the array case)
- parseLlmJson(content, schema): parse + Zod validate
- tryParseLlmJson(): non-throwing variant
- parseLlmJsonLoose(): parse without schema for callers that validate downstream

All sites migrated to these; each keeps its prior error semantics (setup's 'Failed to parse', extraction's PDF_UNREADABLE, correction's empty-on-failure). Meeting/document extraction gain an explicit object guard (they relied on JSON.parse's `any`).

14 unit tests for the helper; existing agent tests unchanged and green.

This deletes the fenced-JSON / silent-zero-results bug class at its root. The API-level structured-outputs option (output_config.format) can build on this seam in a follow-up.

vercel · 2026-06-12T08:49:35Z

Project	Deployment	Actions	Updated (UTC)
cof-learning-system	Ready	Preview, Comment	Jun 12, 2026 8:49am

vercel Bot deployed to Preview June 12, 2026 08:49

gurdenbatra merged commit afa7496 into main Jun 12, 2026
3 checks passed

gurdenbatra mentioned this pull request Jun 12, 2026

refactor(api): migrate 38 routes to withAuth + 401 JSON for /api (Phase 1) #16

Merged

4 tasks

gurdenbatra merged 1 commit into main from phase-1-structured-outputs
