refactor(llm): one robust JSON-parsing seam (Phase 1) #15

Merged
gurdenbatra merged 1 commit into main from phase-1-structured-outputs on Jun 12, 2026

@gurdenbatra

Why

LLM responses were parsed six different ways across ~10 call sites — the root cause of the recurring fenced-JSON bugs (the distillation failure earlier this project was one instance):

  • raw JSON.parse(content) with no fence handling (setup, webScanner) → a ```json-fenced response threw, silently yielding "Failed to parse" / zero signals
  • ad-hoc fence regexes + object-only extractJsonObject (extraction ×3, process, reflection) → object-only extraction couldn't handle process step-3's array response, and the regex missed trailing prose
  • the robust path (distillation, tour) — fine, but duplicated
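
The first two failure modes can be reproduced in a few lines. This is a minimal sketch: `objectOnlyExtract` is a hypothetical stand-in for an ad-hoc object-only extractor, not the project's actual regex, and the sample replies are invented for illustration.

```typescript
// Build the fence marker at runtime so this example contains no literal code fence.
const fenceMark = "`".repeat(3);

// Returns true when JSON.parse rejects the input.
function throwsOnParse(content: string): boolean {
  try {
    JSON.parse(content);
    return false;
  } catch {
    return true;
  }
}

// HYPOTHETICAL stand-in for an ad-hoc, object-only extractor (assumption, not the real code).
function objectOnlyExtract(content: string): string | null {
  const match = content.match(/\{[\s\S]*\}/); // greedy: first "{" through last "}"
  return match ? match[0] : null;
}

const fencedReply = `${fenceMark}json\n{"signals": []}\n${fenceMark}`;
const arrayReply = '[{"step": 1}, {"step": 2}]';
const proseReply = '{"ok": true} Note: empty objects look like {}.';

// Raw JSON.parse rejects the fenced reply outright.
console.log(throwsOnParse(fencedReply));
// Object-only extraction strips the array brackets and leaves invalid JSON.
console.log(throwsOnParse(objectOnlyExtract(arrayReply) ?? ""));
// A greedy regex swallows trailing prose that happens to contain braces.
console.log(throwsOnParse(objectOnlyExtract(proseReply) ?? ""));
```

All three checks report a parse failure, which is the class of bug a single balanced extractor is meant to remove.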

What

New src/lib/llm/parse.ts — the single seam every agent parses through:

  • extractJson(text) — strips ``` fences + leading/trailing prose, extracts the first balanced object {} or array [] (fixes the array case)
  • parseLlmJson(content, schema) — parse + Zod validate
  • tryParseLlmJson(...) — non-throwing (returns null)
  • parseLlmJsonLoose(content) — parse without a schema for callers that validate downstream
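
A rough sketch of how such a seam could work, assuming a simple bracket scanner that tracks string state. This is not the contents of src/lib/llm/parse.ts: the helper `findBalancedEnd` is hypothetical, and a plain type-guard function stands in for the Zod schema so the example stays dependency-free.

```typescript
// Returns the index of the closer matching the opener at `start`, or -1 if unbalanced.
// Brackets inside JSON strings are ignored, including after escaped quotes.
function findBalancedEnd(text: string, start: number): number {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") stack.push(ch === "{" ? "}" : "]");
    else if (ch === "}" || ch === "]") {
      if (stack.pop() !== ch) return -1; // mismatched closer
      if (stack.length === 0) return i;
    }
  }
  return -1;
}

// Skips fences and leading prose, then returns the first balanced {...} or [...] span.
function extractJson(text: string): string | null {
  for (let start = 0; start < text.length; start++) {
    if (text[start] !== "{" && text[start] !== "[") continue;
    const end = findBalancedEnd(text, start);
    if (end !== -1) return text.slice(start, end + 1);
  }
  return null;
}

// Parse without a schema; throws when no JSON is found or it is malformed.
function parseLlmJsonLoose(content: string): unknown {
  const json = extractJson(content);
  if (json === null) throw new Error("No JSON object or array found in LLM response");
  return JSON.parse(json);
}

// Non-throwing variant. ASSUMPTION: a type guard replaces the Zod schema here.
function tryParseLlmJson<T>(content: string, isValid: (v: unknown) => v is T): T | null {
  try {
    const value = parseLlmJsonLoose(content);
    return isValid(value) ? value : null;
  } catch {
    return null;
  }
}

// Usage: a fenced reply with surrounding prose and a brace inside a string.
const ticks = "`".repeat(3);
const reply = `Sure, here it is:\n${ticks}json\n{"note": "a } inside a string", "ids": [1, 2]}\n${ticks}\nHope that helps!`;
const isNumberArray = (v: unknown): v is number[] =>
  Array.isArray(v) && v.every((x) => typeof x === "number");

console.log(extractJson(reply));
console.log(tryParseLlmJson("Result: [1, 2, 3] done", isNumberArray));
console.log(tryParseLlmJson("no JSON here", isNumberArray));
```

Scanning for the first balanced span, instead of matching with a regex, is what lets one helper serve both object and array responses. A limitation of this sketch: bracketed prose such as "[see below]" ahead of the payload would be picked up as the first balanced span.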

All 10 sites migrated. Each keeps its prior error semantics: setup's "Failed to parse", extraction's PDF_UNREADABLE, correction's empty-on-failure, tour's normalizeTour. The meeting/document extractors gain an explicit object guard (they previously leaned on JSON.parse's any).
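
An explicit object guard of the kind described for the extractors might look like the sketch below. The names `isPlainObject` and `toExtractionResult` and the result shape are assumptions for illustration; only the PDF_UNREADABLE error code comes from this PR.

```typescript
// Narrows an unknown parsed value to a plain (non-null, non-array) object.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// HYPOTHETICAL result shape; the real extractors may report failures differently.
type ExtractionResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; error: "PDF_UNREADABLE" };

// Keeps the caller's existing error semantics while rejecting non-object payloads.
function toExtractionResult(parsed: unknown): ExtractionResult {
  if (!isPlainObject(parsed)) return { ok: false, error: "PDF_UNREADABLE" };
  return { ok: true, data: parsed };
}

console.log(toExtractionResult(JSON.parse('{"title": "Q3 review"}')).ok);
console.log(toExtractionResult(JSON.parse('["not", "an", "object"]')).ok);
```

With the parsed value typed as unknown instead of any, the compiler forces this check before any property access.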

Scope note

This is the parser consolidation — it deletes the fenced-JSON / silent-failure class at its root without touching the provider abstraction. The API-level structured-outputs option (output_config.format for a hard schema guarantee) is a natural follow-up that can build on this seam.

Test plan

  • 14 new unit tests (parse.test.ts) — fences, prose, arrays, strings-with-braces, schema validation, non-throwing variant
  • existing distillation / extraction / agent suites unchanged and green
  • vitest run → 552 pass · clean tsc --noEmit → 0 · eslint . → 0

Part of Phase 1, alongside #13 (cost table) and #14 (withAuth). Independent of both.

LLM responses were parsed six different ways across ~10 call sites:
- raw JSON.parse(content) with NO fence handling (setup, webScanner): a ```json-fenced
  response threw and silently produced zero results
- ad-hoc fence regexes + object-only extractJsonObject (extraction ×3,
  process, reflection) — object-only extraction broke process step-3's array
- the robust extractJsonObject path (distillation, tour)

Adds src/lib/llm/parse.ts with one seam:
- extractJson(): strips fences / prose, extracts the first balanced object OR
  array (fixes the array case)
- parseLlmJson(content, schema): parse + Zod validate
- tryParseLlmJson(): non-throwing variant
- parseLlmJsonLoose(): parse without schema for callers that validate downstream

All sites migrated to these; each keeps its prior error semantics (setup's
'Failed to parse', extraction's PDF_UNREADABLE, correction's empty-on-failure).
Meeting/document extraction gain an explicit object guard (they relied on
JSON.parse's `any`). 14 unit tests for the helper; existing agent tests
unchanged and green.

This deletes the fenced-JSON / silent-zero-results bug class at its root. The
API-level structured-outputs option (output_config.format) can build on this
seam in a follow-up.

@gurdenbatra gurdenbatra merged commit afa7496 into main Jun 12, 2026
3 checks passed