Skip to content

fix(release): close 4 release-path bugs found by pr-review ensemble#6

Merged
willgriffin merged 16 commits into
mainfrom
fix/release-workflow-correctness
May 23, 2026
Merged

fix(release): close 4 release-path bugs found by pr-review ensemble#6
willgriffin merged 16 commits into
mainfrom
fix/release-workflow-correctness

Conversation

@willgriffin
Copy link
Copy Markdown
Contributor

Summary

Follow-up to the just-merged direct-publish flow (#4). Ran pr-review ensemble (codex-cli + copilot-cli) retroactively against the release workflow + auto-changeset script — caught 4 real bugs that would degrade release behavior in different scenarios.

Findings → fixes

Sev Finding Both reviewers?
medium Single-package version gate skipped manual changesets bumping only one of prettier-config / tsconfig-base ✓ both
medium Publish-then-push order produced registry/git drift when main advanced mid-run ✓ both
medium Breaking-change footers in commit bodies got classified as patch instead of minor codex-cli only
low Unused id-token: write permission copilot-cli only

Fix details

  1. Working-tree gate instead of single-package: git diff --quiet && git diff --cached --quiet after changeset:version catches any package's changes — auto-generated, manual subset, or no-op alike. The old eslint-config-only check was a coupling assumption that didn't hold for manual changesets.

  2. Reordered publish flow: now commit → fetch origin/main → verify it hasn't moved → push commit → publish → push tags. If origin/main advanced, we abort before publish with a clear error. Next run picks up the new state. Narrow window remains where partial publish failure could leave tags un-pushed, but that's recoverable (publish is idempotent for already-published versions).

  3. Read commit body in auto-changeset.ts: format now includes %b (with \x00-sentinel separators so embedded newlines/pipes don't break parsing). isBreaking() checks both subject and body for BREAKING CHANGE / BREAKING-CHANGE. Matches what the Conventional Commits spec actually says about where to put breaking markers.

  4. Dropped id-token: write: no step in the workflow uses OIDC. GitHub Packages publish via App-minted token doesn't need it.

Test plan

  • Smoke test: pnpm changeset:auto defers correctly to existing manual changeset infrastructure (current state of the repo)
  • YAML structure intact (workflow loads — relied on github's validation since local PyYAML wasn't installed)
  • First release after merge: verify the working-tree gate fires correctly, push-before-publish order succeeds, tags get pushed
  • Test manual changeset for non-eslint package: drop a manual .changeset/foo.md bumping only prettier-config, verify it publishes (previously would silently skip)
  • Test breaking-change footer: commit with feat: x\n\nBREAKING CHANGE: y, verify auto-changeset produces a minor bump (previously would produce patch)

Notes

Ran pr-review (codex-cli + copilot-cli ensemble) retroactively
against the just-merged direct-publish flow (PR #4). Both reviewers
caught the same two release-path bugs; codex caught a third about
breaking-change parsing; copilot caught a least-privilege gap. All
four are real and fixable in one focused PR.

1. [medium] Single-package version gate skipped manual changesets.
   The workflow compared only `packages/eslint-config/package.json`
   before/after `changeset:version`. A valid manual changeset
   bumping only `@happyvertical/prettier-config` or
   `@happyvertical/tsconfig-base` would update those package files,
   pass through `changeset:version`, and then hit the eslint-only
   gate which sees no change → exit 0 without commit/publish. The
   manual release silently vanishes.

   Fixed by gating on the working tree itself
   (`git diff --quiet && git diff --cached --quiet`) after
   `changeset:version`. Catches every case: auto-generated bumps to
   all packages, manual changesets that touch any subset, and
   no-op runs where nothing was releasable.

2. [medium] Publish ran before push, allowing registry/git drift.
   Original order: commit → publish → push. If main advanced
   mid-run (concurrency only serializes release runs, not human
   pushes), publish would succeed and the push would fail
   non-fast-forward. Result: registry has new versions, main
   doesn't have the bump commit, next run tries to republish.

   Fixed by reordering: commit → fetch origin/main → verify it
   hasn't moved past our parent → push commit → publish → push
   tags. If origin/main advanced, we abort BEFORE publishing with
   a clear error message; the next push to main re-runs and picks
   up the new state. The narrow window where publish runs after
   push but before tags-push can still produce missing tags on a
   partial publish failure, but that's recoverable (rerun
   `changeset:publish`; published versions are idempotent).

3. [medium] Breaking-change footers were ignored. auto-changeset.ts
   read commit subjects only via `--pretty=format:%H|||%s` and
   matched `BREAKING CHANGE` against the subject. Standard
   Conventional Commits put breaking markers in a body footer:

       feat: change config shape

       BREAKING CHANGE: consumers must update their extends path

   That commit was being classified as patch — shipping a breaking
   change under a patch version.

   Fixed by including `%b` in the log format (with sentinels for
   safe parsing across newlines and pipes in bodies), passing the
   body through to `isBreaking()`, and matching both
   `BREAKING CHANGE` and `BREAKING-CHANGE` in subject OR body.

4. [low] `id-token: write` permission was declared but no step uses
   OIDC. GitHub Packages publishing doesn't need it; we don't mint
   attestations. Removed per least-privilege.

Smoke test: `pnpm changeset:auto` defers correctly to existing
manual changeset infrastructure.
Round-1 fix in f2ec77a introduced two bugs that the round-2 ensemble caught:

1. [HIGH] `\x00FIELD\x00` / `\x00COMMIT\x00` sentinels in argv

   Node's child_process refuses NUL bytes in argv with a TypeError. The
   try/catch in `git()` swallowed it as "no output", so
   getCommitsSinceLastRelease() always returned [], real.length === 0,
   and auto-changeset early-returned every run — silently breaking the
   whole release pipeline.

   Fix: use git's pretty-format escapes `%x1f` / `%x1e` which emit
   ASCII 31 / 30 in the OUTPUT, not in argv. Neither byte appears in
   commit text, and they don't trip Node's argv validation.

   Also tightened `git()` to distinguish git's non-zero exits (expected
   for `describe --tags` with no tags) from Node-level errors (our bugs)
   — Node-level errors now log to stderr so the next such typo doesn't
   silently skip releases.

2. [medium] No recovery path for partial publish failure

   If `pnpm changeset:publish` fails after the version commit was already
   pushed, the version bump is on main but packages aren't published.
   Re-running via push is skipped by the chore(release) guard, and the
   main-advanced check would abort anyway.

   Fix: add workflow_dispatch trigger with mode=publish-only that skips
   auto-changeset + version + commit + push and runs only changeset
   publish against current main. The job-level if: no longer skips
   workflow_dispatch invocations.
… push

Round-3 ensemble found three real release-correctness hazards in the
publish-only recovery path added in 10f879f:

1. [HIGH] Recovery could publish the wrong source tree

   `pnpm changeset:publish` runs against whatever ref the
   workflow_dispatch checked out. If main has advanced beyond the
   failed release commit `R` (any commit `N` landed in the interval),
   recovery publishes R's package.json versions with N's source code —
   the registry artifact for that version silently contains
   post-version-commit changes.

   Fix: guard the recovery step. Refuse to run unless HEAD is a
   `chore(release):` commit AND HEAD === origin/main. Recovery is only
   safe immediately after a failed publish, before any new commits
   land. If main has advanced, operator must resolve manually (revert
   the new commits, or hand-publish from a clean checkout of `R`).

2. [medium] Recovery doesn't recreate lost local tags

   In a partial failure, the first run can publish + tag some packages
   locally before erroring out. Local tags die with the runner. On
   recovery, `changeset publish` skips already-published versions
   (correctly idempotent), so the tags are never recreated by publish
   alone, and `git push --tags` pushes nothing for them. The next
   auto run's tag-based release boundary then scans too far back and
   double-includes old commits.

   Fix: run `pnpm changeset tag` after publish in the recovery step.
   That reads package.json versions and creates any missing tags
   regardless of publish outcome.

3. [medium] Tag-push step also pushed the main branch ref

   `git push --tags origin main` pushes BOTH tags and the main branch.
   In the auto path, main was already pushed at step 5 before publish
   ran. If another commit lands between that step-5 push and the
   step-7 tag push, the redundant main push fails with
   non-fast-forward — AFTER packages have already published to the
   registry, leaving registry/git drift.

   Fix: `git push origin --tags` in both paths. Recovery never needs
   to advance main; auto already pushed it explicitly.
Round-4 ensemble caught two more medium-severity correctness hazards in
the workflow:

1. [medium] `git fetch origin main` may not update `refs/remotes/origin/main`

   With a source-only refspec (`main` instead of `main:main`), git's
   behaviour depends on the remote's configured fetch refspec, shallow
   state, and partial-clone setup. Under some configurations it only
   updates FETCH_HEAD, leaving `refs/remotes/origin/main` stale. The
   subsequent `git rev-parse origin/main` then compares against a stale
   value — the guard passes even though main has actually advanced, and
   the publish runs against the wrong tree.

   Fix: explicit destination refspec
   `+refs/heads/main:refs/remotes/origin/main` in both auto and
   recovery paths. The `+` forces non-fast-forward updates so the
   remote-tracking branch always reflects the remote's current state.

2. [medium] Commit subject embedded in `::error::` workflow command unescaped

   The recovery guard's failure message interpolates `$HEAD_MSG`
   (commit subject — user-controlled) directly into a workflow
   command. GitHub workflow commands require `%`, CR, LF to be escaped
   (`%` → `%25`, CR → `%0D`, LF → `%0A`) before interpolating
   user-controlled content; without escaping, a crafted commit subject
   can corrupt the command payload or inject additional workflow
   commands into the log parser.

   Fix: inline bash parameter expansion to escape the three bytes
   before embedding. Verified the escape produces correct output for
   subjects containing all three characters.

   (The non-error `echo` in the success path interpolates only the
   short SHA from `git rev-parse --short HEAD`, which is hex-only and
   can't contain `%`/CR/LF — no escaping needed there.)
…osition

Round-5 ensemble caught a real bump-classification bug, empirically
verified against this branch's own commits:

`isBreaking()` body-scan used `\bBREAKING[- ]CHANGE\b`, matching the
phrase anywhere in the commit body. That includes narrative text,
docstring examples, and any commit that documents what a
breaking-change footer is.

Concrete failure: commit `f2ec77a` on this branch has a docstring
example in its body that reads
`BREAKING CHANGE: consumers must update their extends path`. Under the
loose body regex, that triggers `isBreaking() === true` and forces a
minor bump for the whole release — even though every commit on the
branch is `fix(release): ...` (patch only).

Per Conventional Commits spec, `BREAKING CHANGE:` (or
`BREAKING-CHANGE:`) is a footer line. Anchor the regex with
`^BREAKING[- ]CHANGE: ` and the multiline flag so only line-anchored
occurrences match. Narrative mentions and indented examples no longer
false-positive.

Verified on this branch's 4 commits: all now classify as patch (PATCH
verdict), matching the intended `fix:` semantics.
Round-5 tightened the subject regex to `^BREAKING[- ]CHANGE: ` (line
start anchor), matching the body regex. That fixed the docstring-false-
positive but accidentally narrowed too far: the previously documented
inline subject form `feat: BREAKING CHANGE: remove legacy preset` no
longer matches, so a breaking release could ship under a patch version.

Failure path:
- Subject `feat: BREAKING CHANGE: remove legacy preset`
- `!` regex on line 141 — no match (`feat:` not `feat!:`)
- body footer regex — no match (no body footer)
- subject `^BREAKING[- ]CHANGE: ` — no match (starts with `feat:`)
- `isBreaking()` → false → patch bump for a breaking change

Fix: keep the body regex line-anchored (footer-correct) but loosen the
subject regex back to `\bBREAKING[- ]CHANGE: ` — word boundary +
colon-space. The colon requirement keeps narrative phrasing like
"avoid BREAKING CHANGE false positives" out; the word-boundary lets
the marker appear anywhere in the subject including after `feat:` /
`fix:` prefixes.

Verified 8 edge cases (5 subject + 3 body) including indented
docstring examples, narrative prose, hyphenated variants, and the
restored inline subject form. All pass.
Rounds 5 and 6 of the review loop were both bug-fixing the same
feature — and that feature does not match how this org actually writes
commits. The org's convention is `feat!:` / `feat(scope)!:`, not
`BREAKING CHANGE:` footers.

Footer scanning is a regex-tuning trap:
- Round 5 caught the body regex matching docstring narrative
  (`f2ec77a`'s own JSDoc example would have triggered a false minor
  bump for this very PR).
- Round 6 caught the round-5 tightening dropping the legitimate inline
  subject form (`feat: BREAKING CHANGE: ...`).

Every fix for one false positive introduces a window for a different
false negative — narrative vs footer, indented vs unindented, inline
subject vs bare subject, hyphenated vs spaced. None of those edge
cases would arise if we simply recognized `!`.

Drop footer scanning entirely. `isBreaking()` is now one line. As a
follow-on simplification, the body fetch goes away too: the git pretty
format drops to `%H %s`, the field/record separator machinery
disappears, and parsing uses newlines (git subjects can't contain
newlines, so they're unambiguous).

If a footer-marked commit ever lands (e.g. from an external
contributor unfamiliar with the org convention), the documented
escape hatch is to drop a manual `.changeset/*.md` file — the
deferral check at the top of main() yields to it.

Net: -51 / +27 lines, removes the two highest-friction regex paths in
the file.
…t errors

Round-7 ensemble found two more correctness issues:

1. [medium] Empty-changeset trap

   `pnpm changeset:version` reads and DELETES changeset files even
   when their frontmatter is empty (no bumps). The old gate
   `git diff --quiet && git diff --cached --quiet` would see the
   deletion as a change, commit it, run publish as a no-op, advance
   no tags. The next auto run would then re-scan commits from the
   same (unmoved) tag boundary and re-include the commits the empty
   changeset was meant to skip.

   Fix: scope the gate to `packages/*/package.json` only. If no
   package version actually bumped, exit clean — leave the deleted
   changeset file uncommitted on the runner (next push re-fetches it
   from origin anyway).

2. [low] Node-level git errors went silent

   The round-2 `git()` helper logged Node-level failures (NUL in argv,
   missing binary, spawn errors) but still returned empty. Callers
   interpret empty as "no commits" / "no tags" — a real spawn-time
   bug would skip the release with a green workflow run.

   Fix: throw on Node-level errors (e.status undefined/null), keep
   the empty-return path only for git's own non-zero exits (which are
   expected for `describe --tags` with no tags). Verified: NUL in argv
   now throws ERR_INVALID_ARG_VALUE rather than silently returning.
…curate commit title

Round-8 ensemble found three issues, including a regression from
round-7's empty-changeset fix:

1. [medium] Tag-push failures lost the release boundary

   If `pnpm changeset:publish` succeeds but `git push origin --tags`
   fails, packages are on the registry, the commit is on main, but
   the per-package tags are not on origin. The recovery path requires
   HEAD to still be the `chore(release):` commit — once any new
   commit lands, recovery is blocked, and the next auto run scans
   `LAST_TAG..HEAD`, filters out the chore commit, and re-includes
   the commits that were already published. Duplicate release with
   duplicate changelog entries.

   Fix: add a step-0 guard that scans for an untagged `chore(release):`
   commit between the last `@happyvertical/*@*` tag and HEAD. If
   found, abort with an operator-action message pointing at the
   resolution (hand-push the missing tags, then re-run).

2. [medium] No-op changesets were stuck on main (regression from round-7)

   Round-7's package-json gate exits early without committing the
   deletion of an empty manual changeset. The cleanup never reaches
   origin, so the no-op changeset stays on main — every future push
   repeats the same no-op cycle, silently blocking real auto-changeset
   runs.

   Fix: when the gate finds no package bumps but the tree changed,
   commit + push the cleanup with a `chore(release):` title (so the
   next push doesn't re-trigger us via the job-level guard).
   Includes the same origin-main-advanced check used in the main
   path to avoid drift.

3. [low] Subset releases got a misleading commit title

   The old `chore(release): v$AFTER` used eslint-config's version
   unconditionally. A manual changeset that bumps only prettier-config
   would commit `v<old-eslint-version>` while actually publishing
   a different package.

   Fix: build the commit title from the package names that actually
   bumped this run (`git diff --name-only -- 'packages/*/package.json'`
   → comma-separated list). Per-package versions live in the tags
   created by `changeset publish`; the commit title carries which
   packages changed.
…mits only

Round-9 caught a regression where round-8's two fixes interacted badly:

- Fix 8.1 added a preflight detecting any `chore(release):` commit
  between the last per-package tag and HEAD (signal of failed
  tag-push from a previous release).
- Fix 8.2 added a cleanup path that creates `chore(release): clean up
  consumed no-op changesets` commits when a manual no-op changeset
  needs to be removed from main.

After a cleanup commit lands, the preflight grep matches it (same
`chore(release):` prefix), and EVERY future release self-blocks
forever — the operator sees "previous release didn't complete tag
push" errors on commits that were never a release.

Fix: narrow the preflight grep to `^chore(release): bump ` — the
exact format the version-bump commit uses (set by fix 8.3). The
cleanup commit's `clean up consumed no-op changesets` suffix doesn't
match, so it's correctly ignored by the partial-release detector.

Verified the grep behavior in a synthetic git repo: pattern matches
`chore(release): bump eslint-config,prettier-config`, rejects
`chore(release): clean up consumed no-op changesets` and
`fix(release): not a release commit`.
…very guidance

Round-10 ensemble found two issues with the round-8 partial-release
detector:

1. [HIGH] First-release partial publish bypassed the guard

   The detector only ran when `LAST_PKG_TAG` was non-empty. That
   excluded the exact case the guard is meant to catch in a fresh
   repo: a first release that publishes packages but fails to push
   tags. With no prior tags, `LAST_PKG_TAG` is empty, the guard is
   skipped, and `auto-changeset` scans all history, filters out the
   chore commit, and re-includes the commits that were already
   published — duplicate release.

   Fix: when no package tag exists, scan all history (`UNTAGGED_RANGE=HEAD`)
   instead of skipping the guard. Same detection, same abort path.

2. [medium] Error message could lead operators to publish-skip

   The old message said "previous release published packages but
   failed to push tags" and told operators to push the missing tags.
   But the same untagged-bump-commit state can arise from a partial
   publish failure (some packages published, some didn't) before tag
   push was even attempted. If an operator follows the old guidance
   and pushes tags for versions that don't exist on the registry,
   future auto runs treat those tags as the release boundary and
   skip publishing the missing versions — silent data loss.

   Fix: distinguish the two failure modes explicitly. Recovery
   guidance now:
   (a) Recommends workflow_dispatch publish-only when HEAD is still
       the bump commit. That path is idempotent on both axes:
       changeset:publish skips already-published versions, and
       changeset tag creates tags only for versions in package.json.
       Single action covers both partial-publish and tag-push-failure.
   (b) When HEAD has advanced past the bump commit (the case where
       this guard fires), tells operators to manually verify
       registry state before pushing any tags, with a concrete
       example of how to tag-only what's actually published.
…t guards

Round-11 ensemble caught two findings, both rooted in the same bug
class: matching commit message contents (body) instead of just the
subject.

1. [medium] Untagged-release detector matched commit bodies

   `git log --grep` searches the FULL commit message — subject and
   body. Round-8 added `--grep='^chore(release): bump '` thinking it
   would only match release commit subjects, but a regular commit
   whose body documents the literal string `chore(release): bump …`
   (e.g. as a code example, like several of my own commit messages
   on this PR) sets `UNTAGGED_RELEASE` and aborts every future run.

   Fix: pipe `git log --pretty='%H %s'` through awk to filter on the
   subject line only. Verified in a synthetic repo: false-positive
   commit (body mentions `chore(release): bump …`) was matched by
   `--grep` but correctly skipped by the awk pipeline; the real
   bump commit still matched.

2. [medium] Publish-only recovery accepted cleanup commits

   Round-3's recovery guard checked `^chore(release):` — too loose.
   The cleanup commit from round-8 (`chore(release): clean up
   consumed no-op changesets`) matches that prefix. If an operator
   runs publish-only while that cleanup is at HEAD, `changeset tag`
   creates per-package tags pointing at the cleanup SHA instead of
   the actual version-bump commit. Future release scans then trust
   the wrong commit as the release boundary.

   Fix: narrow to `^chore(release): bump ` — the exact format of the
   real version-bump commits, matching the detector in step 0.
… push

Round-12 ensemble found two findings:

1. [medium] No-op cleanup path didn't preserve "skipped" boundary

   Round-7/8 added a path that commits the cleanup of an empty manual
   changeset (`chore(release): clean up consumed no-op changesets`)
   so the empty file wouldn't keep blocking future runs. But:
   - The cleanup commits no tag
   - auto-changeset.ts's `LAST_PKG_TAG` still points at the previous
     real release
   - The commits the operator intended to "skip release" via the
     empty changeset stay in the next auto run's scan range
   - Next push releases them anyway

   The cleanup path was treating empty changesets as a valid "skip
   release" signal that this workflow can't actually honor (no
   commit-range bookkeeping persists across runs).

   Fix: remove the cleanup machinery. Fail loudly when an empty
   changeset is detected. Operator's only valid actions are:
   (a) fill in package bumps in the changeset frontmatter, or
   (b) delete the empty changeset file.

   This also kills the round-9 fix's reason-to-exist (narrowing the
   detector to exclude `chore(release): clean up` commits) — those
   commits no longer get created. The narrowing stays as defense in
   depth.

2. [medium] Non-atomic tag push left partial state undetectable

   `git push origin --tags` pushes all local tags but is NOT atomic:
   a network failure mid-push can land some tags and leave others
   local-only. The step-0 untagged-bump detector relies on
   `LAST_PKG_TAG` being absent when the previous release's tags
   never reached origin — a partial push sets `LAST_PKG_TAG` to one
   of the freshly-pushed sibling tags, the detector treats the bump
   commit as already-released, and the missing tags persist
   silently.

   Fix: `git push --atomic origin --tags` in both auto and recovery
   paths. With atomic, either all tags reach origin or none do, so
   the detector's "no tag pushed" branch is the only possible
   partial-tag state.
…t subject alone

Round-13 ensemble added copilot-cli (codex-cli had solo'd rounds 2-12).
Copilot caught a real release-deadlock that codex missed across 12
rounds:

The step-0 partial-release detector and the publish-only recovery
guard both relied on subject pattern `^chore(release): bump ` to
identify workflow-created release commits. But that subject is also
a valid human Conventional Commit — e.g. `chore(release): bump
pnpm/action-setup` for a dependency update.

Deadlock path:
- Human pushes `chore(release): bump pnpm/action-setup`
- Job-level `if:` skips that push (filter on chore(release): prefix)
- Any future push runs the workflow
- Step-0 detector finds the human commit, treats it as a failed
  release commit, aborts with operator-action error
- No release ever happens until someone manually deletes the human
  commit (impossible — it's already on main) or rewrites history

Fix: combine subject pattern with committer-identity check. The
workflow sets `have-release-bot` as the git committer for its own
commits (line 81), so committer identity is a reliable machine-only
marker that humans can't accidentally produce.

- Step-0 detector: `git log --committer='have-release-bot' ...`
  combined with the existing awk subject filter
- Recovery guard: also check `git log -1 --pretty=%cn HEAD` matches
  `have-release-bot`, in addition to the subject grep

Verified in a synthetic repo: human commit with matching subject is
correctly excluded; bot commit with same subject still matches.

Process note: this finding shipped after rounds 2-12 of codex-cli
solo because the loop wasn't running the ensemble it documents. From
now: copilot-cli runs each round alongside codex-cli.

Also rejecting copilot's second finding (loss of `BREAKING CHANGE`
recognition) — that was the deliberate round-6.5 simplification (see
commit 7e13016), not a regression. Copilot didn't see that design
conversation.
…geset + escape HEAD_COMMITTER

Round-14 ensemble (codex-cli + copilot-cli both running this round —
correcting rounds 2-13 which were codex-cli solo). Copilot caught
two real findings codex missed; codex caught one of them at lower
severity. Deduplicated:

1. [medium] Subject-only chore(release) match in TWO more code paths
   (extension of round-13 finding into job gate + auto-changeset)

   Round-13 fixed the step-0 detector and the publish-only recovery
   guard to require BOTH subject pattern AND `have-release-bot`
   committer identity. But two other places still used subject-only
   matching:

   - Job-level `if:` filter (line 42): skips ANY commit starting
     with `chore(release):` regardless of committer. A human commit
     like `chore(release): bump pnpm/action-setup` (valid Conventional
     Commit for a dependency update) would never trigger the release
     workflow for its own changes.
   - auto-changeset.ts filter (line 161): excludes ANY commit with
     that subject prefix from the generated changelog summary. The
     same human commit would be silently absent from release notes
     even when a subsequent commit triggers a release.

   Net effect: human-authored `chore(release):` commits become
   invisible in release pipeline AND release notes.

   Fix:
   - Job-level if: combine `startsWith` with
     `github.event.head_commit.committer.name == 'have-release-bot'`.
     GitHub's push event payload includes committer.name so we can
     gate at the workflow level without running any steps.
   - auto-changeset.ts: re-add committer field to git log output
     (using `%cn` with `\x1f` field separator — newline still
     separates records since subjects/committers can't contain
     newlines). Update the filter to require BOTH conditions.

   Verified in synthetic git repos that human commits with matching
   subject pass through the filter while bot commits with matching
   subject are excluded.

2. [medium] HEAD_COMMITTER printed in ::error:: without escaping
   (caught by both reviewers, copilot rated medium / codex rated low)

   Round-13 added HEAD_COMMITTER to the recovery guard's failure
   output but only escaped HEAD_MSG. Copilot empirically verified
   that git accepts committer names containing `%0A::warning::pwn`
   — direct workflow-command injection risk.

   Fix: extract escape logic into a local `escape_wc` function and
   apply to BOTH user-controlled git metadata fields before
   embedding in workflow commands.

Process note: the round-1-13 codex-cli-solo run produced 22 findings
across 12 rounds. Copilot's first-round catch on a fundamental
deadlock scenario (covered by round 13) and this round's two findings
(both real) show why the ensemble is non-negotiable. Going forward:
both run every round; ensemble dedup happens at fix time.
…id SIGPIPE bypass

Round-15 ensemble: codex-cli clean, copilot-cli caught a real
high-confidence bug.

The step-0 untagged-release detector used `awk '/.../ {print $1; exit}'`
to find the first match efficiently. Under `set -o pipefail`, that
early exit closes awk's stdin, git log dies on SIGPIPE (exit 141),
and the pipeline returns non-zero.

Bash's default behavior (no `inherit_errexit`) is that command
substitution swallows pipeline failures silently. So
`UNTAGGED_RELEASE=$(... | awk ... exit)` produces empty output when
the pipeline fails — the subsequent `if [ -n "$UNTAGGED_RELEASE" ]`
check sees empty, skips the abort, and the detector is bypassed
ENTIRELY. The release proceeds against a state that contains an
untagged release commit, producing the exact duplicate-changelog
scenario this detector was added to prevent.

Fix: full-scan awk with deferred print:
```awk
/^[a-f0-9]+ chore\(release\): bump / && !found {hash=$1; found=1}
END {if (found) print hash}
```

Same "first match wins" semantics, no early exit, no SIGPIPE.
Verified on three input cases (match, no-match, empty) under
`set -euo pipefail` — all return cleanly without tripping the
pipeline failure path.

Not applying copilot's other two findings:
- Legacy `chore(release): v<ver>` format support: this repo has no
  prior releases; no legacy state exists. The detector only needs
  to recognize commits this workflow itself creates.
- Stronger committer identity (name + email): real concern but the
  threat model is "anyone with push access". Push access already
  permits many forms of release pipeline interference. Adding email
  check is marginal defense in depth; keeping name-only for now.
@willgriffin willgriffin merged commit 1520f15 into main May 23, 2026
1 check passed
@willgriffin willgriffin deleted the fix/release-workflow-correctness branch May 23, 2026 05:14
willgriffin added a commit that referenced this pull request May 23, 2026
…msg in error output

PR #5's commits use `fix(review-cycle,ship):` (multi-scope, one
edit landing in both slash command surfaces). The old regex
`(\([a-z0-9-]+\))?` rejected the comma, blocking the PR from
merging despite the commits being well-formed Conventional Commits.

Relax to `[a-z0-9][a-z0-9,/-]*`:
- First char must be alphanumeric — prevents stray punctuation like
  `(,foo)` or `(-foo)`.
- Allows comma for multi-scope commits.
- Allows forward slash for dep-name scopes like
  `chore(tibdex/github-app-token):`.
- Still rejects uppercase, colons in scope, leading punctuation.
- Verified against 9 synthetic cases (5 pass, 4 fail as expected)
  AND all 13 PR #5 commits (all pass).

Also fix the existing `echo "::error::Invalid commit message: $msg"`
to escape `%`, CR, LF before embedding the user-controlled commit
message in a workflow command. This is the same lesson from PR #6
round 14 / round 4: GitHub workflow commands parse `%`, `%0D`, and
`%0A` as their respective bytes, so an attacker (or just an
inconvenient commit message) containing those bytes can corrupt the
command payload or inject additional workflow commands. Extract
escape logic into `escape_wc()` for reuse.

Also switched `echo "$msg" | grep` to `printf '%s' "$msg" | grep` —
`echo` can interpret leading `-n`/`-e`/`-E` as flags in some shells,
so a commit subject starting with one of those would silently
produce wrong output. `printf '%s'` is unconditional.
willgriffin added a commit that referenced this pull request May 23, 2026
…, no rejections)

Round-4 ensemble (codex-cli + copilot-cli + claude sub-agent + me).
Surfaced 6 distinct findings; 4 accepted, 1 declined as nit, 1 was
my own self-review note that overlapped with #1.

1. [medium, ALL 4 REVIEWERS] Cap-blocked-verify case overloaded
   `partial` status, breaking `/ship`'s partial-handling contract

   Round-3 fix #6 said "report status `partial`" when a P0/P1/P2 fix
   landed in the final permitted round (no verify possible). But the
   Status contract defines `partial` strictly as "skipped required
   reviewer". `/ship` then routes `partial` only by `Skipped
   reviewers`, which would be empty in this case — undefined
   behavior, agent could fall through to "treat as clean and ship
   unverified fix".

   Four-way confirmation:
   - codex: "make it `blocked`/`needs-verify`"
   - copilot: "either add explicit `verify-round-missed` reason +
     new ship branch, or classify as `blocked` to keep partial
     strict"
   - claude sub-agent: "either add third partial sub-branch in
     ship.md, or use different status (`blocked` with verify-needed
     marker)"
   - me (self-review): "contract overload — needs broadened
     definition or different status"

   Fix: reclassify as `blocked` with reason
   `verify-round-blocked-by-cap`. Three coordinated edits:
   (a) Round-cap section: change "report status as `partial`" to
       "report status as `blocked` with reason
       `verify-round-blocked-by-cap`".
   (b) Status contract `blocked =` definition: extend to enumerate
       "P0/P1/P2 fix landed in the final permitted round with no
       verify round possible". Reasoning: an unverified fix counts
       as potentially unaccepted because we don't yet know if the
       fix introduced new findings.
   (c) Status contract `partial =` definition: add "Single cause
       only — other 'incomplete' states (unverified fix, validation
       failed) are `blocked`, not `partial`" to lock down the
       single-cause invariant.
   (d) ship.md `blocked` branch: add explicit sub-case for
       `verify-round-blocked-by-cap` directing operator to re-run
       `/review-cycle rounds=N+1`.

2. [low, copilot + claude] `.gitignore` comment cited prior incident
   filenames the patterns don't actually catch

   Round-3 comment claimed defense-in-depth for `.deny-test.jsonl`
   / `.revparse-test.jsonl` from the 1fc8677 incident, but the
   narrow `copilot-session-*.jsonl` patterns don't match those
   names. Misleading framing.

   Fix: rewrite the comment to be honest about what the patterns
   cover and don't cover. Note explicitly that the narrow patterns
   were a deliberate round-7 walkback (avoided `*-test.jsonl`
   wildcard that would hide legitimate fixtures), and that the
   structural defense for arbitrary probe filenames is the /tmp
   rule, not gitignore.

3. [low, codex] Commitlint workflow had unused `pull-requests: read`
   permission

   The job only does `actions/checkout` + `git log` over commit
   SHAs from the event payload. No PR API calls. `pull-requests:
   read` is dead scope. Per least-privilege, dropped it.

4. [low, claude] Commitlint error message didn't mention `@` (the
   regex was updated to allow it in round 3 but the help text
   lagged)

   A user hitting the error with `feat(@foo/bar): ...` for an
   unrelated reason would read the help message and conclude `@`
   isn't supported. One-line fix to add "optional leading @
   (scoped npm packages)" to the chars list.

DECLINED:

5. [nit, claude] Regex accepts `feat(@scope):` (no `/pkg` suffix) —
   not a valid npm-scoped reference

   Tightening to require `/` after `@` would reject legitimate
   non-npm uses (e.g. someone wanting to use `@`-prefix for their
   own scope convention). The regex isn't claiming to enforce
   npm-semantic validity. Out of scope for a commit-message linter.

Round-4 ran the full 4-reviewer ensemble in parallel against the
same commit (1dd7688). Three rounds total of 4-reviewer ensembles
now — the claude-sub-agent-via-Agent-tool workaround is proving
durable enough to fold into the docs as the canonical claude-cli
substitute when OAuth fails.
willgriffin added a commit that referenced this pull request May 23, 2026
… converged)

Round-10 ensemble (codex + copilot + claude sub-agent + me).
4 distinct findings, all converging on the same conclusion: the
WIP-commit recipe added in round 8 and reworked in round 9 has
accumulated too many independent footguns to keep patching.

The footgun inventory (caught across this and prior rounds):

1. [HIGH, round 10, codex + copilot 2-way] `git reset --hard` step
   loses untracked files included in the WIP commit. `git diff`
   doesn't capture them, so the restore patch can't recreate them.
   Concrete local data loss for new files/binary assets.

2. [medium, round 10, claude sub-agent] `git apply` of the
   unstaged patch is unsound when staged + unstaged changes touch
   overlapping hunks in the same file. The unstaged patch's
   context is the INDEX state, but we apply it to a worktree at
   pre-WIP HEAD (no staged applied yet). Hunks fuzz-fail, leave
   `.rej` files, silent failure if not run with `set -e`.

3. [medium, round 10, claude sub-agent] `git commit` failures
   (extremely common — commitlint hooks, gitleaks, gpg signing,
   biome, plus this very repo's own commitlint workflow which
   rejects `wip:` as a non-Conventional-Commits type) silently
   make `WIP_SHA` empty, and the downstream verify check
   misdiagnoses it as "HEAD moved during review". User goes
   hunting for non-existent HEAD movement.

4. [medium, round 10, copilot + claude + me 3-way] Fixed `/tmp/`
   paths collide across concurrent runs and across retry runs
   after a failure (second invocation overwrites recovery state
   from first failed run).

5. [round 9, codex + claude] `git add -A` stages unrelated
   untracked files which end up in the WIP commit AND in
   `pr-review --base`'s diff sent to external reviewers —
   potential local data leak.

6. [round 9, claude] `git reset --mixed HEAD~1` is positional;
   if HEAD moved (interrupted run, stray amend, side commit),
   wrong commit gets reset.

7. [round 9, claude + me 2-way] Hard Rules section forbids
   `git reset --hard`; recipe's "carve-out" was only inline,
   not in the Hard Rules section itself.

8. [round 9, me] WIP dance destroys user's pre-WIP staging
   discipline (carefully staged/unstaged split → everything
   unstaged after reset).

That's 8 distinct sharp edges across 2 rounds. Each round of fixes
introduces another caveat to document. Same pattern as PR #6's
round-6.5 BREAKING-CHANGE-scanning removal: when a recipe needs
more caveats than recipe, delete it.

Fix: remove the WIP-commit alternative entirely. Replace with a
fully-fleshed-out snapshot-comparison recipe that:
- Uses `mktemp -d` for per-run snapshot directory (no /tmp
  collisions).
- Captures status + unstaged diff + staged diff + untracked file
  hashes BEFORE the reviewer.
- Captures same shape AFTER.
- Diffs the four pairs; ANY difference → round invalid.
- Catches same-status-but-different-content cases (e.g.
  `M path → M path` with different bytes) that a simple
  "is status empty" check would miss.
- Catches added/modified untracked files via the hash list.
- Preserves staging discipline.
- No `git reset` calls → Hard Rules don't need a carve-out.

Brief "why not WIP" explanation kept inline as a tombstone so
future contributors don't reinvent the WIP recipe — enumerates the
8 footguns so the choice is documented, not just made.

Net: -71 / +49 lines across both variants. Removes the most
caveat-heavy section of these docs and replaces it with a single
correct recipe.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant