docs(checklist): add hazards from real Copilot catches on have-config + pr-review by willgriffin · Pull Request #3 · happyvertical/pr-review

willgriffin · 2026-05-22T18:15:17Z

Summary

Folds five patterns Copilot caught on this week's PRs (have-config #2 + pr-review #2) into the shared checklist. Acting on the calibration insight immediately rather than batching for pr-review-tune — each pattern is concrete with file/line citation from the source PR, no ambiguity.

New bullets

Theme 7 renamed from "Dead config & unwired parameters" → "Config hazards: dead, surprising, or over-active". The original "missing consumer" framing missed the inverse problem (config that has invisible side effects on consumers), which is just as common in shared-config / monorepo / base-config setups.

Three new bullets:

Shared config with invisible side effects on consumers (e.g. incremental: true in shared tsconfig → silent .tsbuildinfo files everywhere)
Shared config too narrow for documented use cases (e.g. base tsconfig claims SvelteKit support but lacks DOM lib)
README install instructions that contradict package.json (telling consumers to install what's already in optionalDependencies)

Theme 8 (Infra/deploy hazards) gains three new bullets:

GitHub Actions workflow-command injection (`::error::$user_input` without escaping %/\r/\n)
echo \"\$user_input\" | grep parsing dashes as flags (use printf '%s\n')
GitHub Actions permissions broader than the workflow needs (dead scope, least privilege)

Why now, not via `pr-review-tune`

The tune loop is built for batch refinement when many findings need triage and dedup. These five patterns are unambiguous — each one is a real bug or DX hazard with a clear fix and a real example. Waiting to batch them just delays the value.

Test plan

Conventional commit message check (passes — the PR's commit conforms)
Visually verify markdown renders correctly on GitHub
Next pr-review run that includes a shared-config change should now catch the "invisible side effect" pattern proactively

… + pr-review Two waves of Copilot review on this week's PRs surfaced patterns the checklist didn't cover. Folding them in immediately rather than waiting for batched pr-review-tune. Theme 7 (renamed: "Config hazards: dead, surprising, or over-active") gains three new bullets: - Shared config with invisible side effects on consumers. Caught on have-config PR #2: `incremental: true` in a shared tsconfig.base.json silently emits .tsbuildinfo in every consuming tree. Generalizes to any extended/published config (tsconfig, eslint, prettier, vite, biome). - Shared config too narrow for documented use cases. Same PR: tsconfig-base README claimed SvelteKit support but lib was ES2022 only (no DOM). Either narrow the docs, add a variant, or require explicit override. - README install instructions that contradict package.json. Same PR: README told consumers to install three packages that were already optionalDependencies. Causes duplicate installs / version skew. Theme 8 (Infra/deploy hazards) gains three new bullets from pr-review PR #2: - GitHub Actions workflow-command injection from user-controlled content. `::error::<commit-subject>` without escaping `%` `\r` `\n` lets a crafted commit message inject workflow commands or spoof logs. Real security issue, not theoretical. - `echo "$user_input" | grep` parses dashes as flags. Subjects starting with `-n`/`-e` get treated as echo options. Use `printf '%s\n'`. - GitHub Actions permissions broader than the workflow needs. pull-requests: read on a workflow that only does git log + event payload reads is dead scope. Least privilege. All five patterns are concrete with file/line citations from the source PRs. Should reduce time spent on the same class of issue in future shared-config and workflow PRs.

Copilot

Pull request overview

Updates the shared PR-review checklist to incorporate concrete “hazard patterns” observed in recent reviews, expanding the config section to cover both dead config and config with surprising downstream effects, and adding new infra/deploy safety checks.

Changes:

Renames Theme 7 and adds guidance on shared-config hazards (invisible side effects, overly narrow base configs, README install mismatches).
Adds new Theme 8 bullets covering GitHub Actions workflow-command injection, safer shell printing patterns, and least-privilege workflow permissions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…llet Two Copilot catches on PR #3: 1. The escape examples were written as pseudo-sed (`s//\%/%25/` etc.) which is malformed and unhelpful. Replaced with the actual bash parameter-expansion helper from the commitlint.yml workflow that triggered this checklist update — concrete, copy-pasteable, and correct. Also linked GitHub's workflow-commands docs as the authoritative source for the encoding requirements. 2. Bullet title attributed the dash-parsing problem to the whole `echo "$user_input" | grep` pipeline; the issue is actually `echo` alone (grep just sees empty stdin). Renamed the bullet to "`echo "$user_input"` treats leading `-n`/`-e`/`-E` as flags" and expanded the body to explain the downstream silent-empty-input failure mode, which is the more dangerous symptom.

Pattern catch-rate audit on the 17 Copilot findings from have-config #2, pr-review #2, pr-review #3, and have-config #4 showed that the post-#3 checklist catches ~9 of 17 strongly and ~3 borderline. Four additional patterns surfaced repeatedly without dedicated bullets; adding them brings the catch rate to ~14 of 17 (~82%), leaving 3 genuinely uncatchable findings (one Copilot false positive + two meta-writing-quality issues). New bullets: Theme 7 (Config hazards): - Engine/version constraints looser than what the lockfile needs. Catches the recurring `engines.node: ">=20"` vs lockfile-requires- 20.19 pattern plus its sibling on `actions/setup-node node-version`, packageManager, .nvmrc, Docker tags, CI tool pins. Theme 8 (Infra/deploy hazards): - Third-party GitHub Actions pinned to moving tags instead of commit SHAs. Real supply-chain risk; vendored anytown agents already SHA-pin. Includes the readable `# v5` comment pattern and link to GitHub's hardening docs. - Extended the existing "Interpolated shell variables into psql/sed/ perl substitutions without escaping" bullet with a sibling bullet covering shell-escape / regex visual-vs-parsed ambiguity (the `Merge\ ` two-char issue, `'\n'` vs `$'\n'`, BRE vs ERE vs PCRE flavors). Theme 11 (Mechanical): - Shebang interpreter doesn't match the file's syntax. Catches the `#!/usr/bin/env node` on TypeScript file pattern from have-config #4 and generalizes to python version mismatch and `#!/bin/sh` bashisms. Out of scope: - Library-version-specific quirks like the disableTypeChecked array-vs-object spread (Copilot was wrong about that; doesn't generalize). - Meta writing-quality concerns (don't ship pseudo-code as examples) — fits Theme 6 in spirit but adding a separate bullet would dilute the theme. Skip until it shows up again.

willgriffin · 2026-05-22T18:56:10Z

Folded 4 more patterns from the catch-rate audit (commit ad5d05d):

Theme 7 — engine/version constraints looser than lockfile needs (catches the recurring engines.node: ">=20" vs requires-20.19 pattern + siblings on setup-node node-version, packageManager, .nvmrc, Docker tags)
Theme 8 — third-party GitHub Actions pinned to moving tags instead of SHAs (supply-chain hardening per GitHub's docs; matches the vendored anytown agent pattern)
Theme 8 — extended shell-escape bullet to cover the Merge\ two-char visual-vs-parsed gotcha and friends ('\n' vs $'\n', regex flavor differences)
Theme 11 — shebang interpreter doesn't match file syntax (the #!/usr/bin/env node on .ts pattern, plus python version mismatch, sh bashisms)

Net effect: catch rate on the last 17 Copilot findings moves from ~53% (current main) to ~82% (with #3 as-merged). Remaining ~18% is genuinely uncatchable — one Copilot false positive about library shape + two meta-writing-quality issues in the checklist's own prose.

willgriffin · 2026-05-22T19:01:05Z

Ran pr-review --base origin/main --pretty against this branch and applied the (now-expanded) checklist to the diff. Two findings on the new bullets themselves, both fixed in commit a91dd07:

[medium] The shell-escape bullet's Merge\\ accounting was factually wrong — claimed escape-plus-space is two chars when it's one. The actual bug being illustrated was the trailing extra space in ^Merge\\ , not the escape. A reader following the original framing would have written subtly wrong patterns themselves — the bullet was actively causing the class of bug it was meant to prevent. Fix: rewrite to focus on invisible trailing whitespace as the actual failure mode.

[low] The SHA-pin bullet used a concrete SHA (93cb6efe...) as the example. That SHA was current when written but will drift as actions/checkout ships new v5 patches. Fix: replaced with <sha> placeholder + the one-liner gh api repos/actions/checkout/git/refs/tags/v5 -q .object.sha to fetch the current SHA, so readers don't pin to a slowly-aging example.

Nice meta moment — the checklist applied to its own modifications caught two real issues. The calibration loop closing on itself works.

willgriffin · 2026-05-22T19:07:24Z

Ran the multi-tool review-cycle (codex + claude -p + Copilot) on this PR. Codex (gpt-5.5, xhigh) caught two real factual issues my own pass and Copilot's both missed. Fixed in commit dc99b8d:

[low — factual accuracy] Shell-escape bullet's regex counts were wrong both as source text (8/9) and as match length (6/7). Codex's fix is cleaner — drop the numeric framing entirely and describe what the regex matches in plain English ("Merge followed by two spaces" vs "...one space"). No counts to get wrong.

[low — outdated] Shebang bullet said Node "can't parse TypeScript natively" with interface/as/generics as examples. That's outdated for Node 22.6+ (--experimental-strip-types) and Node 23.6+/24 (default). All three examples are erasable syntax that Node strips natively. The org standard is Node 24, so the bullet would have produced false positives on this org's own projects. Rewrote to scope the hazard to actually-non-erasable syntax (enums, namespaces, parameter properties, TSX/JSX, decorators) which Node genuinely can't handle.

Operational notes from the run:

The claude -p subprocess in the ensemble failed with 401 Invalid authentication credentials — OAuth doesn't propagate from a parent claude session to spawned children. Need ANTHROPIC_API_KEY env var or a different invocation pattern when review-cycle is run from inside an active claude session. Should add to the /have:review-cycle command's hard-rules section.
Copilot already reviewed 2 rounds; my third commit didn't re-fire it.
codex with xhigh took ~50s to produce findings on a small docs diff. Slower than claude or grep-based linters, but it earned its keep by catching what the others missed.

This is the second consecutive PR where the calibration loop caught real factual bugs in checklist bullets describing the patterns those bugs exemplify. Worth treating the bullet text itself as code subject to the same checklist.

willgriffin · 2026-05-22T19:20:33Z

Third reviewer (Copilot CLI) ran successfully after the org policy was enabled. Caught two more factual issues that neither my own pass nor codex flagged — both required current external knowledge that Copilot got via live web fetch:

[medium] setup-node behaviour bullet was wrong. I wrote node-version: '20' "resolves to latest 20.x" — that's not how setup-node works. It picks from the runner's tool cache via semver match, which can lag the current latest 20.x. check-latest: true forces a fresh lookup. Copilot fetched setup-node's README to verify. Fixed by describing the actual cache+check-latest mechanism and recommending a specific minimum-minor pin ('20.19') instead of bare '20'.

[medium] Node TS native-strip threshold was outdated. I said "Node 22.6+ (with flag), 23.6+/24 (default)". Per Node's current docs (which Copilot fetched at https://nodejs.org/api/typescript.html), type stripping is default-enabled in 22.18+ as well as 23.6+/24. The flag-required window is 22.6 through 22.17. On a project pinned to 22.18 LTS, the old wording would have generated false-positive findings recommending an unnecessary flag.

Both fixed in commit 5588133.

Reviewer blind-spot map after this run:

Reviewer	Catches	Misses
Me (parent claude, reading my own diff)	Structural / framing issues	Off-by-one in counts; subtle factual claims
Codex (gpt-5.5, xhigh, training-time knowledge)	Reasoning errors; logical inconsistencies	External-tool / runtime version-specific facts that shifted post-training
Copilot CLI (with live web fetch)	Knowledge-currency on external tools, libraries, runtimes	(TBD — likely structural / cross-file refactor patterns)

Concrete validation that the ensemble pulls its weight: three different reviewers caught three non-overlapping sets of issues on this 50-line documentation PR. The dc99b8d fix was already wrong within a Node release cycle by the time it was written; Copilot's web fetch made the difference between shipping a stale bullet and shipping a current one.

Going to merge this only after one more full-ensemble run to confirm no fourth round of findings.

willgriffin · 2026-05-22T19:52:50Z

Cycle converged after 7 rounds. 🎉

Round-by-round summary

Round	Codex	Copilot	Fix	Commit
Sequential pre-cycle (me)	—	—	Merge regex accounting + SHA freshness	`a91dd07`
Sequential pre-cycle (codex one-shot)	2	—	Regex counts + Node TS examples	`dc99b8d`
Sequential pre-cycle (copilot one-shot)	—	2	setup-node cache + Node 22.18 threshold	`5588133`
Round 1 (ensemble)	1	0	transform-types option omitted	`3ce6be7`
Round 2 (ensemble)	1	1 (same finding)	transform-types wrongly attributed to TSX/decorators	`28e9a5f`
Round 3 (ensemble)	1	0	Node 25/26 type-strip ambiguity	`4a4bbea`
Round 4 (ensemble)	0	2	Structural: shebang in wrong section + SHA-pin scope	`d65931d`
Round 5 (ensemble)	1	0	Node 23.0-23.5 strip-types window	`5afbb6d`
Round 6 (ensemble)	0	0	—	(CONVERGED)

What the cycle actually surfaced

9 fixes across 7 rounds on a documentation diff that started at 37 lines. Without the loop, I would have shipped at round 1 (the sequential pre-cycle phase) with the round-1 ensemble finding still latent.
Asymmetric convergence: copilot tends to converge first (rounds 1, 3, 4 had it at 0). Codex catches narrow factual edge cases via deep verification. Copilot catches structural / placement issues.
The two reviewer types have non-overlapping blind spots, validated 7 times:
- Me (parent claude reading own diff): catches structural framing on first read; confirmation bias on factual claims I just wrote
- Codex (xhigh reasoning, training-time knowledge): deep deliberation on consistency; misses post-training facts
- Copilot CLI (xhigh + live web fetch): knowledge currency; structural / placement issues
The fix from round 2 was itself wrong in a way that needed round 3 to catch (transform-types claim about TSX/decorators). The fix from round 3 was also slightly wrong (round 4 caught the ambiguity). This is exactly the failure mode /review-cycle is designed to catch — fixes can introduce new bugs.

Operational lessons (folding into have-config#5)

Never stop after a fix without a verify round. A fix-round followed by a clean-round is convergence. A fix-round alone is not. I had been doing one-shot rounds — that's how I missed the round-1 ensemble finding.
Default cap of 3 rounds is too low for documentation PRs. Took 7 here. Cap should be advisory, not hard.
Run all reviewers in parallel on the same commit — not sequentially against each other's fixes. Sequential lets findings cascade and obscures whether reviewers actually agree.

Ready to merge.

Real failure: I ran review-cycle on pr-review#3 and "stopped" after each reviewer's first pass instead of actually looping. When I caught myself and ran it properly, the cycle took 7 rounds to converge — catching 9 progressively-narrower factual issues that would have shipped if I'd stopped early. The command file already said "Run up to `rounds` review rounds. Default: 3" and "stop the loop as clean" when no findings remain, but the wording was loose enough that I rationalized one-shot behaviour. This commit adds explicit Hard Rules that close that gap: 1. Each round runs all reviewers in parallel against the SAME commit (not sequentially against each other's fixes — that lets findings cascade in misleading ways and obscures whether reviewers actually agree on the latest state). 2. A fix-round is NEVER the final round. Convergence requires at least one round where every reviewer returns 0 findings against the latest commit. Just pushed a fix? Run another round before declaring clean. 3. Convergence is per-commit, not per-finding. Reviewer A clean against commit X doesn't transfer to commit Y (the fix commit). Also updated: - Default cap guidance: 3 is right for code; 5-10 for documentation / reviewer-checklist content where each round catches narrower factual edges (the pr-review#3 cycle was 7 rounds). - Step 10 now explicit: "If a fix was pushed in this round, the next round MUST run." - Step 11 explicit: "Stop as clean only when a verify round (no edits) returns no actionable findings." - Cap-hit guidance distinguishes three cases: spec too detailed, diminishing returns acceptable, genuine gap. Mirror edit in both claude/ and codex/ command files. Evidence: see pr-review#3 (happyvertical/pr-review#3) for the 7-round convergence log with per-round commits, findings counts, and the asymmetric convergence pattern between codex (catches narrow factual edges via deep verification) and copilot CLI (catches structural/placement issues via live web fetch + cross-file grep).

willgriffin requested a review from Copilot May 22, 2026 18:27

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread prompts/checklist.md Outdated

Comment thread prompts/checklist.md Outdated

willgriffin merged commit 12d107b into main May 22, 2026
1 check passed

willgriffin mentioned this pull request May 22, 2026

fix(have:review-cycle): operational discipline — Copilot non-optional, loop convergence, blocker docs happyvertical/have-config#5

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(checklist): add hazards from real Copilot catches on have-config + pr-review#3

docs(checklist): add hazards from real Copilot catches on have-config + pr-review#3
willgriffin merged 2 commits into
mainfrom
docs/checklist-config-hazards

willgriffin commented May 22, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

willgriffin commented May 22, 2026

Summary

New bullets

Why now, not via pr-review-tune

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Uh oh!

willgriffin commented May 22, 2026

Round-by-round summary

What the cycle actually surfaced

Operational lessons (folding into have-config#5)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why now, not via `pr-review-tune`