fix(reviewer): verdict severity tier + diagnostic write + UI banner + SSR lineage by chorus-codes · Pull Request #96 · chorus-codes/chorus

chorus-codes · 2026-05-28T12:24:48Z

Summary

Five connected defects in the chorus reviewer chain, surfaced by live bowerbird chat 019E6E3318873C26DCA60409B84F90E9:

- Gemini-3.1-pro-preview wrote 29 lines of `### CRITICAL` / `### HIGH` findings and billed $0.05, yet was falsely marked `verdict_ambiguous`. The fallback chain then fired, collided with kimi's prior claim on claude-sonnet-4-6, and was exhausted. The UI rendered DONE with gemini's real content plus three contradictory amber banners (`LINEAGE_FALLBACK`, `FALLBACK_COLLISION`, and an "actually ran" badge on a voice that didn't run).
- In the same chat, kimi-k2.6 produced a 0-byte `answer.md` and an empty `_attempts.jsonl`. The daemon log shows only that the fallback fired, with no record of why kimi failed.
- A phantom "Reviewer · CLAUDE" card appeared next to the real "ANTIGRAVITY · gemini-3.5-flash" placeholder on initial page load. It appears on refresh and vanishes a few seconds later, once the client-side `/api/run-artifacts` poll lands.

Fixes

| # | Fix | Files |
|---|-----|-------|
| 1 | Severity-style reviews (a `### CRITICAL` / `HIGH` heading with a body of at least 200 chars) are treated as an implicit `request_changes`. A trailing lookahead requires a heading terminator (`:`, `\n`, `**`, EOL), so headings such as `### High-level review` don't false-positive. | `src/daemon/runner/verdict.ts` |
| 2 | A diagnostic write helper is now called from every null-return path (errored, the new `empty_no_error`, `verdict_ambiguous`) and from the thrown-exception catch. This closes the silent-failure gap: zero rows in `_attempts.jsonl` is now a true bug signal. | `reviewer.ts`, `reviewer-driver.ts` |
| 3 | The API derives a per-swap `actuallyRan: boolean` from the lineage+model stamp in the slot's `_stats.json`. The UI suppresses the banner block when the primary produced the answer and no swap actually ran. Per-entry `actuallyRan` (not `isLast`) drives the strikethrough and badge. | `route.ts`, `participant-card.tsx`, `types.ts` |
| 4 | SSR `readChatRounds` had its own inline `AGENT_TO_LINEAGE` map, which was missing `antigravity-cli` / `grok-cli` and defaulted unknown agents to the literal `"claude"`. It now uses the shared `AGENT_TO_UI_LINEAGE` map and passes the raw agent name through on a miss. | `src/app/runs/[runId]/page.tsx` |
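Fix 4 comes down to replacing a hard-coded default with a pass-through. The TypeScript sketch below illustrates the before/after behaviour. Apart from `antigravity-cli` → `antigravity` (the lineage the test plan reports), the map entries are assumptions, and `legacyLineage` / `uiLineage` are hypothetical names rather than the actual code in `page.tsx` or the shared map.

```typescript
// Illustrative only: the real AGENT_TO_UI_LINEAGE lives in the chorus codebase.
// "antigravity-cli" -> "antigravity" matches the lineage the test plan reports;
// the other entries are assumptions made for this sketch.
const AGENT_TO_UI_LINEAGE: ReadonlyMap<string, string> = new Map([
  ["claude-code", "claude"],
  ["antigravity-cli", "antigravity"],
  ["grok-cli", "grok"],
]);

// Before (hypothetical reconstruction): an inline map that lacked newer agents
// and mapped every miss to the literal "claude", producing the phantom card.
function legacyLineage(agent: string): string {
  const inlineMap: ReadonlyMap<string, string> = new Map([["claude-code", "claude"]]);
  return inlineMap.get(agent) ?? "claude";
}

// After: consult the shared map and pass the raw agent name through on a miss,
// so an unmapped agent renders under its own name instead of as "claude".
function uiLineage(agent: string): string {
  return AGENT_TO_UI_LINEAGE.get(agent) ?? agent;
}
```

With the pass-through, a missing map entry shows up as an unfamiliar label rather than silently impersonating another lineage.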

Chorus self-review

Chorus ran on this exact diff (chat 019E6E7A5D1DF943AD275D5595460D9A), and 5/7 reviewers approved. Codex and gemini independently flagged the same severity-regex hyphen-suffix issue (`### High-level review` would match). The trailing-lookahead fix was applied and 4 false-positive test cases were added.
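The trailing-lookahead fix for the hyphen-suffix issue can be sketched as a single case-insensitive, multiline regex. This is an illustration, not the code in `verdict.ts`. Several details are assumptions: the names `SEVERITY_HEADING` and `implicitSeverityVerdict`, the tolerance for spaces before the terminator, and measuring the 200-char threshold over the whole trimmed review.

```typescript
// Illustrative sketch; not the code in src/daemon/runner/verdict.ts.
const MIN_SEVERITY_BODY_CHARS = 200;

// ^ with the m flag anchors at each line start. A severity heading is an ATX
// heading ("### ") or a bold opener ("**") followed by CRITICAL or HIGH
// (case-insensitive via the i flag). The lookahead then demands a terminator:
// optional spaces, then ":", "**", or end of line ($ under m also covers "\n").
// "### High-level review" fails because "-" follows "High"; so does
// "**High-traffic endpoint**".
const SEVERITY_HEADING =
  /^(?:#{1,6}[ \t]+|\*\*)(?:CRITICAL|HIGH)(?=[ \t]*(?::|\*\*|$))/im;

function implicitSeverityVerdict(review: string): "request_changes" | null {
  // Short reviews never count as implicit request_changes.
  if (review.trim().length < MIN_SEVERITY_BODY_CHARS) return null;
  // No g flag, so test() carries no lastIndex state between calls.
  return SEVERITY_HEADING.test(review) ? "request_changes" : null;
}
```

Requiring a terminator, rather than merely a word boundary, is what rejects `High-level`: a `\b` would still match before the hyphen.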

Test plan

- `pnpm typecheck`: clean
- `pnpm test`: 993 pass / 2 skipped (was 989; +4 hyphen-suffix false-positive cases, including a verbatim replay of the real bowerbird gemini review)
- `pnpm build && pnpm build:server`: clean
- Daemon restarted on the new build; cockpit and daemon both return HTTP 200
- The existing bowerbird chat API now returns `actuallyRan: false` on the gemini swap, so the UI suppresses the banner on reload
- New diagnostic write verified live: `[reviewer] attempt failed kind=verdict_ambiguous for qwen3.7-max` in the chorus self-review chat
- The SSR antigravity participant is now serialised with `lineage: "antigravity"` instead of falling through to `"claude"`
- Verify in browser: hard-reload `/runs/<chat>` and confirm no phantom CLAUDE card appears on initial load
- Verify in browser: a gemini severity-style review classifies as `request_changes` end-to-end on a fresh chat
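The `actuallyRan: false` result on the bowerbird gemini swap follows from the derivation in fix 3. Below is a minimal sketch. It assumes hypothetical `Swap` / `SlotStats` / `SwapView` shapes (the real ones live in `types.ts`) and assumes a swap's `to` uses the same vocabulary as the `_stats.json` lineage stamp. `deriveSwapViews` and `shouldShowSwapBanner` are illustrative names.

```typescript
// Hypothetical shapes for illustration; the real types live in types.ts.
interface Swap {
  from: string; // lineage the slot fell back from
  to: string; // lineage the fallback chain tried next (assumed same vocabulary as the stamp)
}

interface SlotStats {
  lineage?: string; // stamped into _stats.json on successful completion
  model?: string;
}

interface SwapView extends Swap {
  actuallyRan: boolean;
}

// A swap "actually ran" only if its target matches the lineage stamp that the
// slot's final _stats.json records for the displayed answer.
function deriveSwapViews(swaps: readonly Swap[], stats: SlotStats | null): SwapView[] {
  const finalLineage = stats?.lineage;
  return swaps.map((swap) => ({
    ...swap,
    actuallyRan: finalLineage !== undefined && swap.to === finalLineage,
  }));
}

// Hide the whole banner block when the primary produced the displayed answer
// and none of the recorded swaps actually ran.
function shouldShowSwapBanner(
  primaryLineage: string,
  stats: SlotStats | null,
  views: readonly SwapView[],
): boolean {
  const primaryProducedAnswer = stats?.lineage === primaryLineage;
  const anySwapRan = views.some((view) => view.actuallyRan);
  return !(primaryProducedAnswer && !anySwapRan);
}
```

Deriving the flag per entry from the stamp, instead of treating the last swap as the one that ran, is what stops the "actually ran" badge from landing on a voice that never produced output.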

… SSR lineage

Five connected defects surfaced by live bowerbird chat 019E6E3318873C26DCA60409B84F90E9 (gemini wrote 29 lines of severity findings, billed $0.05, was falsely marked verdict_ambiguous → fallback chain fired → collision → contradictory UI banners; kimi failed silently with no diagnostic trace; SSR phantom CLAUDE card for antigravity slots).

1. verdict.ts: recognise `### CRITICAL` / `**HIGH**` headers (with a >=200-char body) as implicit request_changes. A trailing lookahead requires a heading terminator (`:`, `\n`, `**`, EOL) so prose like `### High-level review` or `**High-traffic endpoint**` doesn't false-positive. Codex and gemini both caught the hyphen-suffix issue in the chorus self-review 019E6E7A5D1DF943AD275D5595460D9A.

2. reviewer.ts + reviewer-driver.ts: factored out a `writeAttemptRow()` helper. Every null return (errored, new `empty_no_error`, `verdict_ambiguous`) AND every thrown exception now writes a row to `_attempts.jsonl` plus a `[reviewer]` daemon log line. Zero rows in a slot's `_attempts.jsonl` is now a true bug signal, not a diagnostic gap. Also stamps lineage+model into `_stats.json` on every successful completion.

3. run-artifacts route + participant-card: the API derives a per-swap `actuallyRan: boolean` by matching the swap's `to` against the slot's final `_stats.json` lineage stamp. The UI suppresses the swap banner block entirely when the primary produced the displayed answer AND no swap actually ran. Per-entry `actuallyRan` (not `isLast`) drives the strikethrough + "actually ran" badge.

4. runs/[runId]/page.tsx: SSR readChatRounds had its own inline AGENT_TO_LINEAGE, which was missing `antigravity-cli` / `grok-cli` and defaulted unknown agents to the literal "claude". The initial server-rendered HTML therefore classified antigravity participants as claude, producing a phantom CLAUDE card alongside the synthesised ANTIGRAVITY placeholder that vanished on the first client-side /api/run-artifacts poll. Now uses the shared AGENT_TO_UI_LINEAGE map and passes the raw agent name through on a miss.
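The `writeAttemptRow()` pattern from item 2 can be sketched as follows. This is a synchronous simplification under stated assumptions: the row fields, the `exception` kind for thrown errors, the `reviewOnce` wrapper, and the `AttemptSinks` indirection (standing in for appending to `_attempts.jsonl` and writing the daemon log) are all hypothetical. Only the kind names `errored`, `empty_no_error` and `verdict_ambiguous` and the `[reviewer] attempt failed kind=<kind> for <model>` log shape come from this PR.

```typescript
// Failure kinds: the first three are named in this PR; "exception" is a
// hypothetical label for the thrown-exception path.
type AttemptFailureKind = "errored" | "empty_no_error" | "verdict_ambiguous" | "exception";

// Hypothetical row shape for one line of _attempts.jsonl.
interface AttemptRow {
  ts: string;
  model: string;
  kind: AttemptFailureKind;
  detail?: string;
}

// Stand-ins for "append a line to the slot's _attempts.jsonl" and
// "write to the daemon log", injected so the sketch has no I/O dependency.
interface AttemptSinks {
  appendJsonl: (line: string) => void;
  log: (line: string) => void;
}

function writeAttemptRow(
  sinks: AttemptSinks,
  model: string,
  kind: AttemptFailureKind,
  detail?: string,
): AttemptRow {
  const row: AttemptRow = {
    ts: new Date().toISOString(),
    model,
    kind,
    ...(detail !== undefined ? { detail } : {}),
  };
  sinks.appendJsonl(JSON.stringify(row) + "\n");
  sinks.log(`[reviewer] attempt failed kind=${kind} for ${model}`);
  return row;
}

interface RunResult {
  text: string;
  error?: string;
}

// Every null-return path and the catch block go through writeAttemptRow, so a
// failed attempt can no longer leave _attempts.jsonl empty.
function reviewOnce(
  model: string,
  run: () => RunResult,
  classify: (text: string) => string | null,
  sinks: AttemptSinks,
): string | null {
  try {
    const result = run();
    if (result.error !== undefined) {
      writeAttemptRow(sinks, model, "errored", result.error);
      return null;
    }
    if (result.text.trim() === "") {
      writeAttemptRow(sinks, model, "empty_no_error");
      return null;
    }
    const verdict = classify(result.text);
    if (verdict === null) {
      writeAttemptRow(sinks, model, "verdict_ambiguous");
      return null;
    }
    return verdict;
  } catch (err) {
    writeAttemptRow(sinks, model, "exception", err instanceof Error ? err.message : String(err));
    return null;
  }
}
```

Routing every failure exit through one helper is what makes an empty `_attempts.jsonl` meaningful: if a reviewer returned null or threw, a row exists.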
Tests: 993 pass (was 989; added 4 hyphen-suffix false-positive cases to verdict tests, including a verbatim replay of the real bowerbird gemini review classifying as request_changes).

chorus-codes merged commit 8fbe6c3 into main May 28, 2026
2 checks passed

chorus-codes deleted the fix/verdict-severity-and-diagnostics branch May 28, 2026 12:37

chorus-codes mentioned this pull request on May 28, 2026 in chore: bump to v0.8.60 #97 (Merged)

chorus-codes merged 1 commit into main from fix/verdict-severity-and-diagnostics
