fix(reviewer): verdict severity tier + diagnostic write + UI banner + SSR lineage #96
Merged
… SSR lineage

Five connected defects surfaced by live bowerbird chat 019E6E3318873C26DCA60409B84F90E9 (gemini wrote 29 lines of severity findings, billed $0.05, was falsely marked verdict_ambiguous → fallback chain fired → collision → contradictory UI banners; kimi failed silently with no diagnostic trace; SSR phantom CLAUDE card for antigravity slots).

1. verdict.ts: recognise `### CRITICAL` / `**HIGH**` headers (with >=200-char body) as implicit request_changes. Trailing lookahead requires a heading terminator (`:`, `\n`, `**`, EOL) so prose like `### High-level review` or `**High-traffic endpoint**` doesn't false-positive. Codex + gemini both caught the hyphen-suffix issue in the chorus self-review 019E6E7A5D1DF943AD275D5595460D9A.
2. reviewer.ts + reviewer-driver.ts: factored `writeAttemptRow()` helper. Every null return (errored, new `empty_no_error`, `verdict_ambiguous`) AND every thrown exception now writes a row to `_attempts.jsonl` plus a `[reviewer]` daemon log line. Zero rows in a slot's `_attempts.jsonl` is now a true bug signal, not a diagnostic gap. Also stamps lineage+model into `_stats.json` on every successful completion.
3. run-artifacts route + participant-card: API derives per-swap `actuallyRan: boolean` by matching the swap's `to` against the slot's final `_stats.json` lineage stamp. UI suppresses the swap banner block entirely when primary produced the displayed answer AND no swap actually ran. Per-entry `actuallyRan` (not `isLast`) drives the strikethrough + "actually ran" badge.
4. runs/[runId]/page.tsx: SSR readChatRounds had its own inline AGENT_TO_LINEAGE missing `antigravity-cli` / `grok-cli` and defaulting unknown agents to literal "claude". Initial server-rendered HTML classified antigravity participants as claude → phantom CLAUDE card alongside the synthesised ANTIGRAVITY placeholder, vanishing on the first client-side /api/run-artifacts poll. Now uses the shared AGENT_TO_UI_LINEAGE map and passes raw agent name through on miss.
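The lookup change in item 4 can be sketched as below. This is a minimal illustration, not the code in page.tsx: the map entries are examples (the text only confirms that `antigravity-cli` should resolve to `"antigravity"`; the `claude-code` key and the `"grok"` label are assumptions), and `lineageForAgent` / `legacyLineageForAgent` are hypothetical helper names.

```typescript
// Illustrative stand-in for the shared AGENT_TO_UI_LINEAGE map.
// Assumption: these entries are examples; the real map's contents are not shown here.
const AGENT_TO_UI_LINEAGE: ReadonlyMap<string, string> = new Map<string, string>([
  ["claude-code", "claude"], // hypothetical agent name
  ["antigravity-cli", "antigravity"],
  ["grok-cli", "grok"], // assumed lineage label
]);

// Old behaviour as described: a stale inline map plus a hard-coded "claude" default.
const STALE_INLINE_MAP: ReadonlyMap<string, string> = new Map<string, string>([
  ["claude-code", "claude"],
]);

function legacyLineageForAgent(agent: string): string {
  // Any agent missing from the stale map is silently classified as claude.
  return STALE_INLINE_MAP.get(agent) ?? "claude";
}

// New behaviour: consult the shared map and pass the raw agent name through on a miss.
function lineageForAgent(agent: string): string {
  return AGENT_TO_UI_LINEAGE.get(agent) ?? agent;
}
```

Passing the raw name through means an unmapped agent shows up under its own label instead of being attributed to claude, so a missing map entry stays visible.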
Tests: 993 pass (was 989; added 4 hyphen-suffix false-positive cases to verdict tests, including a verbatim replay of the real bowerbird gemini review classifying as request_changes).
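The severity-header rule and its hyphen-suffix guard could look roughly like the sketch below. It is not the regex from verdict.ts: the names `SEVERITY_HEADER`, `MIN_BODY_CHARS`, and `implicitVerdict` are hypothetical, case-insensitive matching is assumed (the `### High-level review` example suggests it), and the 200-character check is assumed to apply to the trimmed review text as a whole.

```typescript
// Hypothetical sketch of the implicit request_changes tier.
// A line qualifies when it starts with a markdown heading marker or `**`,
// followed by CRITICAL or HIGH, followed by a heading terminator.
// The lookahead accepts ":", "**", or end of line (the `m` flag makes `$`
// match before "\n" as well as at end of input).
const SEVERITY_HEADER =
  /^[ \t]*(?:#{1,6}[ \t]*|\*\*)(?:CRITICAL|HIGH)(?=[ \t]*(?::|\*\*|$))/im;

// Assumption: the ">=200-char body" threshold is measured on the trimmed text.
const MIN_BODY_CHARS = 200;

type ImplicitVerdict = "request_changes" | null;

function implicitVerdict(reviewText: string): ImplicitVerdict {
  if (reviewText.trim().length < MIN_BODY_CHARS) return null;
  return SEVERITY_HEADER.test(reviewText) ? "request_changes" : null;
}
```

Because the lookahead consumes nothing, `### High-level review` fails at the `-` and `**High-traffic endpoint**` fails the same way, while `### CRITICAL: ...` and `**HIGH**` still match.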
Summary
Five connected defects in the chorus reviewer chain, surfaced by live bowerbird chat 019E6E3318873C26DCA60409B84F90E9:

- gemini wrote 29 lines of `### CRITICAL` / `### HIGH` findings, billed $0.05, was falsely marked `verdict_ambiguous` → fallback chain fired → collided with kimi's prior claim on claude-sonnet-4-6 → chain exhausted. UI rendered `DONE` + gemini's real content + three contradictory amber banners (LINEAGE_FALLBACK + FALLBACK_COLLISION + "actually ran" badge on a voice that didn't run).
- kimi failed silently: … `answer.md`, empty `_attempts.jsonl`, daemon log shows only `fallback fired`, with no record of why kimi failed.
- SSR rendered a phantom `Reviewer · CLAUDE` card next to the real `ANTIGRAVITY · gemini-3.5-flash` placeholder on initial page load (appears on refresh, vanishes after seconds when the client-side `/api/run-artifacts` poll lands).

Fixes

1. Recognise severity headers (`### CRITICAL` / `**HIGH**` with >=200-char body) → implicit `request_changes`. Trailing lookahead requires a heading terminator (`:`, `\n`, `**`, EOL) so `### High-level review` etc. don't false-positive. Files: `src/daemon/runner/verdict.ts`
2. `writeAttemptRow()` helper writes a row on every null return (errored, `empty_no_error`, `verdict_ambiguous`) and in the thrown-exception catch. Closes the silent-failure gap: zero rows in `_attempts.jsonl` is now a true bug signal. Files: `reviewer.ts`, `reviewer-driver.ts`
3. API derives per-swap `actuallyRan: boolean` from the slot's `_stats.json` lineage+model stamp. UI suppresses the banner block when primary produced the answer and no swap actually ran. Per-entry `actuallyRan` (not `isLast`) drives the strikethrough + badge. Files: `route.ts`, `participant-card.tsx`, `types.ts`
4. SSR `readChatRounds` had its own inline `AGENT_TO_LINEAGE` missing `antigravity-cli` / `grok-cli`, defaulting unknown agents to literal `"claude"`. Now uses the shared `AGENT_TO_UI_LINEAGE` map and passes raw agent name through on miss. Files: `src/app/runs/[runId]/page.tsx`

Chorus self-review

Fired on this exact diff (chat 019E6E7A5D1DF943AD275D5595460D9A). 5/7 reviewers approve. Codex and gemini convergently flagged the severity-regex hyphen-suffix issue (`### High-level review` would match); applied the trailing-lookahead fix and added 4 false-positive test cases.

Test plan

- `pnpm typecheck` clean
- `pnpm test`: 993 pass / 2 skipped (was 989; +4 hyphen-suffix false-positive cases including verbatim replay of the real bowerbird gemini review)
- `pnpm build && pnpm build:server` clean
- … `actuallyRan: false` on the gemini swap → UI suppresses banner on reload
- … `[reviewer] attempt failed kind=verdict_ambiguous` for qwen3.7-max in the chorus self-review chat
- … `lineage: "antigravity"` instead of falling through to `"claude"`
- … `/runs/<chat>` and confirm no phantom CLAUDE card on initial load
- … `request_changes` end-to-end on a fresh chat
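The `actuallyRan` derivation and the banner suppression rule can be sketched as follows. The record shapes are assumptions made for illustration: only the swap's `to` field, the `_stats.json` lineage stamp, and the `actuallyRan` flag come from the description above, while `SwapEntry`, `SlotStats`, `annotateSwaps`, and `shouldShowSwapBanner` are hypothetical names. The sketch also assumes `to` holds a lineage string that is directly comparable to the stamp.

```typescript
// Hypothetical shapes; only `to`, `lineage`, `model`, and `actuallyRan` are named in the PR text.
interface SwapEntry {
  from: string;
  to: string;
  reason: string;
}

interface SlotStats {
  lineage?: string;
  model?: string;
}

interface AnnotatedSwap extends SwapEntry {
  actuallyRan: boolean;
}

// A swap "actually ran" only if its target matches the lineage stamped
// into the slot's final stats. Missing stats means nothing is marked as run.
function annotateSwaps(
  swaps: readonly SwapEntry[],
  stats: SlotStats | null,
): AnnotatedSwap[] {
  const ranLineage = stats?.lineage;
  return swaps.map((swap) => ({
    ...swap,
    actuallyRan: ranLineage !== undefined && swap.to === ranLineage,
  }));
}

// Suppress the banner block when the primary produced the displayed answer
// and none of the recorded swaps actually ran.
function shouldShowSwapBanner(
  swaps: readonly AnnotatedSwap[],
  primaryProducedAnswer: boolean,
): boolean {
  if (swaps.length === 0) return false;
  const anySwapRan = swaps.some((swap) => swap.actuallyRan);
  return !(primaryProducedAnswer && !anySwapRan);
}
```

In the gemini case above, the recorded swap targets claude but the stamp says gemini, so the entry gets `actuallyRan: false` and the banner block is hidden.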