refactor(llm): one robust JSON-parsing seam (Phase 1) #15
Merged
LLM responses were parsed six different ways across ~10 call sites:
- raw `JSON.parse(content)` with no fence handling (`setup`, `webScanner`): a `` ```json ``-fenced response threw and silently produced zero results
- ad-hoc fence regexes + object-only `extractJsonObject` (`extraction` ×3, `process`, `reflection`): object-only extraction broke `process` step 3's array response
- the robust `extractJsonObject` path (`distillation`, `tour`)

Adds `src/lib/llm/parse.ts` with one seam:
- `extractJson()`: strips fences / prose and extracts the first balanced object OR array (fixes the array case)
- `parseLlmJson(content, schema)`: parse + Zod validate
- `tryParseLlmJson()`: non-throwing variant
- `parseLlmJsonLoose()`: parse without a schema for callers that validate downstream

All sites migrated to these; each keeps its prior error semantics (`setup`'s "Failed to parse", `extraction`'s `PDF_UNREADABLE`, `correction`'s empty-on-failure). The meeting/document extractors gain an explicit object guard (they relied on `JSON.parse`'s `any`). 14 unit tests for the helper; existing agent tests unchanged and green.

This deletes the fenced-JSON / silent-zero-results bug class at its root. The API-level structured-outputs option (`output_config.format`) can build on this seam in a follow-up.
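A minimal, dependency-free sketch of how the seam listed above could fit together. It is not the PR's code: the `SchemaLike<T>` interface, the `balancedEnd` helper, and the error message are hypothetical. `SchemaLike` is a structural stand-in that any Zod schema satisfies, since Zod schemas expose `parse(data: unknown)`. Rather than regex-stripping fences, this version scans for the first balanced `{...}` or `[...]` that `JSON.parse` accepts, ignoring brackets inside string literals, so fences and surrounding prose are skipped alike.

```typescript
// Hypothetical sketch of src/lib/llm/parse.ts; not the PR's actual code.

// Structural stand-in for a Zod schema: parse() returns T or throws.
export interface SchemaLike<T> {
  parse(input: unknown): T;
}

// Returns the index just past the bracket that closes text[start], or -1 if
// the value is unterminated or mismatched. Brackets inside JSON string
// literals (including ones after escaped quotes) are not counted.
function balancedEnd(text: string, start: number): number {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") {
      if (closers.pop() !== ch) return -1; // mismatched closer
      if (closers.length === 0) return i + 1; // outermost value closed
    }
  }
  return -1;
}

// Returns the first balanced {...} or [...] substring that JSON.parse accepts.
// Fence markers and prose contain no opener, so they are simply skipped.
export function extractJson(text: string): string | null {
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== "{" && text[i] !== "[") continue;
    const end = balancedEnd(text, i);
    if (end === -1) continue;
    const candidate = text.slice(i, end);
    try {
      JSON.parse(candidate);
      return candidate;
    } catch {
      // Balanced but not valid JSON (e.g. "[see below]"); try the next opener.
    }
  }
  return null;
}

// Parses without a schema, for callers that validate downstream.
export function parseLlmJsonLoose(content: string): unknown {
  const json = extractJson(content);
  if (json === null) {
    throw new Error("No JSON object or array found in LLM response");
  }
  return JSON.parse(json);
}

// Parses and validates; throws if either step fails.
export function parseLlmJson<T>(content: string, schema: SchemaLike<T>): T {
  return schema.parse(parseLlmJsonLoose(content));
}

// Non-throwing variant: returns null on any parse or validation failure.
export function tryParseLlmJson<T>(content: string, schema: SchemaLike<T>): T | null {
  try {
    return parseLlmJson(content, schema);
  } catch {
    return null;
  }
}

// Usage, with a hand-written schema standing in for z.array(z.string()).
export const stringArray: SchemaLike<string[]> = {
  parse(input) {
    if (!Array.isArray(input) || !input.every((x) => typeof x === "string")) {
      throw new Error("expected string[]");
    }
    return input;
  },
};

const reply = 'Sure! Steps:\n```json\n["scan", "{extract}", "store"]\n```\nDone.';
const steps = parseLlmJson(reply, stringArray); // ["scan", "{extract}", "store"]
const wrongShape = tryParseLlmJson('{"not": "an array"}', stringArray); // null
```

A call site with empty-on-failure semantics, such as `correction`, could keep that behavior with something like `tryParseLlmJson(content, schema) ?? []`. One consequence of the "first balanced value" rule: if prose before the payload itself contains a valid bracketed value (e.g. `see [1]`), that value is returned instead, and schema validation is what turns it into a visible failure rather than silently wrong data.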
Why
LLM responses were parsed six different ways across ~10 call sites. That fragmentation is the root cause of the recurring fenced-JSON bugs (the distillation failure earlier in this project was one instance):
- `JSON.parse(content)` with no fence handling (`setup`, `webScanner`) → a `` ```json ``-fenced response threw, silently yielding "Failed to parse" / zero signals
- ad-hoc fence regexes + object-only `extractJsonObject` (`extraction` ×3, `process`, `reflection`) → object-only extraction couldn't handle `process` step 3's array response, and the regex missed trailing prose
- the robust `extractJsonObject` path (`distillation`, `tour`): fine, but duplicated

What
New `src/lib/llm/parse.ts`, the single seam every agent parses through:
- `extractJson(text)`: strips `` ``` `` fences + leading/trailing prose, extracts the first balanced object `{}` or array `[]` (fixes the array case)
- `parseLlmJson(content, schema)`: parse + Zod validate
- `tryParseLlmJson(...)`: non-throwing (returns `null`)
- `parseLlmJsonLoose(content)`: parse without a schema for callers that validate downstream

All 10 sites migrated. Each keeps its prior error semantics: `setup`'s "Failed to parse", `extraction`'s `PDF_UNREADABLE`, `correction`'s empty-on-failure, `tour`'s `normalizeTour`. The meeting/document extractors gain an explicit object guard (they previously leaned on `JSON.parse`'s `any`).

Scope note
This is the parser consolidation: it deletes the fenced-JSON / silent-failure class at its root without touching the provider abstraction. The API-level structured-outputs option (`output_config.format`, for a hard schema guarantee) is a natural follow-up that can build on this seam.

Test plan
- 14 unit tests (`parse.test.ts`): fences, prose, arrays, strings-with-braces, schema validation, non-throwing variant
- `vitest run` → 552 pass · clean
- `tsc --noEmit` → 0
- `eslint .` → 0

Part of Phase 1, alongside #13 (cost table) and #14 (`withAuth`). Independent of both.