ci(changelog-review): refactor to triage-only + skip-if-exists#1302
Merged
Conversation
The previous design tried to do triage AND rule edits in a single Claude invocation. With a 62-version gap (2.1.76 → 2.1.138) this failed in run 25651584541 after burning $63.69 and 2h 7min: - 201 permission denials (4 per turn) bleeding the budget on commands that weren't in additional_permissions - The agent shipped a correct PR (#1299) mid-run, misdiagnosed it as an empty branch 72 seconds later, closed it, then spent ~75 minutes thrashing to "recreate" it before hitting max-turns at turn 51/50 Restructure splits the work into two phases: 1. Triage (this workflow): open ONE tracking issue summarizing relevant changes grouped by candidate rule file, ratchet .claude-code-version-check.json, open a JSON-only PR. No rule edits. Bounded budget: --max-turns 25. 2. Apply (existing claude.yml @-mention responder): user mentions @claude on the tracking issue to drive targeted rule edits. Each apply is small-scope and gets its own turn budget. Other changes: - Skip-if-exists check: before running Claude, look for an open issue matching "Review Claude Code changelog: <tracked> → <latest>". If found, skip. force_full_review=true overrides. - Expand additional_permissions to cover gh api/label, jq, grep, awk, cat, wc, date — eliminates the per-turn denial bleed observed in the failed run. - Truncate excerpts above 200KB to keep prompt cache predictable. - Replace open-ended rule-edit prompt with strict triage steps and a final sanity-check (one commit, one file, no rule edits) so the agent fails fast on misdiagnosis instead of looping. The recovery of the failed run's actual work is in PR #1301; this PR fixes the workflow so the next big gap doesn't repeat the failure.
3 tasks
laurigates
added a commit
that referenced
this pull request
May 11, 2026
## Summary Recovers the rule updates produced by run [25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541), which hit `error_max_turns` after running for 2h 7min and 51/50 turns. The work itself was correct — the bot misdiagnosed its own PR #1299 as an empty branch 72 seconds after opening it, then spent the remaining ~75 minutes thrashing to re-create what it had already shipped, and hit max-turns. This PR salvages commit `e58aed8c` as a single clean commit so the documentation work isn't wasted. ## Recovery scope - **`.claude-code-version-check.json`** — `lastCheckedVersion` → `2.1.138`, consolidated `reviewedChanges` entry covering 2.1.77 → 2.1.138 - **`hooks-reference.md`** — `effort.level` in Common Fields (2.1.133+), `duration_ms` on PostToolUse (2.1.119+), new "PostToolUse — Replace Tool Output" section with `updatedToolOutput` (2.1.121+) - **`skill-development.md`** — `${CLAUDE_EFFORT}` string substitution row (2.1.120+) - **`agent-development.md`** — `--agent` mode hooks/mcpServers note (2.1.116+), `worktree.baseRef` setting table (2.1.133+), subagent Skill-tool discovery note (2.1.133+) - **`plugin-structure.md`** — `bin/` directory entry, `experimental` block in Optional Fields (2.1.129+), `skills` entry fix note (2.1.136+) - **`agentic-permissions.md`** — `autoMode.hard_deny` subsection (2.1.136+) - **`sandbox-guidance.md`** — `CLAUDE_CODE_SESSION_ID` env var (2.1.132+), `sandbox.network.deniedDomains` (2.1.113+) ## Run resource usage The repo is on Claude Max 20x, so the SDK's notional cost figure isn't actually invoiced — but it's still a useful workload-size signal because it's computed directly from token counts. The SDK result line for the failed run: ``` "subtype": "error_max_turns", "duration_ms": 7611491, # 2h 7min "num_turns": 51, "total_cost_usd": 63.69354, # notional, not billed under Max 20x "permission_denials_count": 201 ``` For comparison, the same workflow's prior weekly runs: | Run | Date | Conclusion | Turns | Duration | Notional $ | |---|---|---|---|---|---| | [25315527872](https://github.com/laurigates/claude-plugins/actions/runs/25315527872) | 2026-05-04 | success | 41 | 8m 31s | $2.01 | | [24991465607](https://github.com/laurigates/claude-plugins/actions/runs/24991465607) | 2026-04-27 | success | 4 | 47m 13s | $57.45 | | [24662476905](https://github.com/laurigates/claude-plugins/actions/runs/24662476905) | 2026-04-20 | max_turns | 41 | 35m 51s | $39.18 | | [24453356948](https://github.com/laurigates/claude-plugins/actions/runs/24453356948) | 2026-04-15 | success | 20 | 1m 30s | $0.40 | | [24027905006](https://github.com/laurigates/claude-plugins/actions/runs/24027905006) | 2026-04-06 | success | 9 | 11m 02s | $6.18 | | [23739805721](https://github.com/laurigates/claude-plugins/actions/runs/23739805721) | 2026-03-30 | success | 38 | 7m 10s | $1.47 | | [25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541) | 2026-05-11 | **max_turns** | **51** | **2h 7m** | **$63.69** | The failed run's notional cost is ~60-100× a typical PR review run ($0.50–$1) and ~30× a normal weekly changelog-review tick. Worth noting: per-call token counts are in the SDK's hidden full output stream — only the aggregate cost/turn/duration is exposed in the result event. The 4-turn / $57 outlier on 2026-04-27 suggests the cost is dominated by context size on first hit after a long gap, not by turn count alone. ## Test plan - [ ] Spot-check each rule edit against the upstream changelog bullet that motivated it - [ ] Verify `.claude-code-version-check.json` parses and ratchets `lastCheckedVersion` to `2.1.138` - [ ] Confirm no placeholder/template artifacts remain (the failed run's 16 commits included several "fix placeholder" iterations; the final tree at `e58aed8c` was checked clean before salvage) ## Follow-up Workflow restructure to triage-as-issues + skip-if-exists is in #1302 to prevent the next 60-version backlog from blowing the budget again. 🤖 Recovered with [Claude Code](https://claude.com/claude-code)
Owner
Author
|
@claude let's create the follow-up issues |
|
Claude finished @laurigates's task in 3m 34s —— View job Follow-up issues for PR #1302
Created 3 issues:
#1303 covers the unverified test plan items (actionlint, dry-run dispatch, skip-if-exists) — these need a live run now that the tracked version is at 2.1.138. #1304 and #1305 are the two explicit "not blocking" follow-ups from the PR description. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Restructures the changelog-review workflow so a 62-version gap no longer blows the turn budget. The previous design tried to do triage AND rule edits in a single Claude invocation; with the 2.1.76 → 2.1.138 gap that pattern failed in run 25651584541 at turn 51/50 after 2h 7min. (Recovery of that run's actual rule-edit work is in #1301.)
Why it failed (data)
The SDK reports
total_cost_usd(notional, not billed under Max 20x — but still a clean proxy for token volume since it's computed from token counts). Per-call token counts aren't exposed at the result level, only the aggregate.Comparison against this workflow's recent weekly runs:
For reference, a typical
claude-code-review.ymlPR review run is $0.50–$1.00 with 15-20 turns. The changelog-review workflow on a fresh gap is 30–100× that, dominated by context size on first hit (4 turns / $57 in one run) rather than turn count alone.What changed
Triage-only phase (this workflow):
.claude-code-version-check.json--max-turns 25with sonnet.Apply phase (delegated to existing
claude.yml):@claude please update <rule-file> for issue #Non the tracking issueSkip-if-exists check:
gh issue list --label changelog-review --state openfor a title matching the current range. If found, skip.workflow_dispatch+force_full_review=trueoverrides.Expanded
additional_permissions:Bash(gh api *),Bash(gh label *),Bash(jq *),Bash(grep *),Bash(awk *),Bash(wc *),Bash(cat *),Bash(date *), etc.Prompt hardening:
git diff/git logbefore destructive recovery")Why this design over alternatives
Task/SendMessagecount as turns from the parent loop. Only separate jobs get independent budgets.Test plan
workflow_dispatchwithforce_full_review=trueAFTER docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138 #1301 merges (tracked version will be 2.1.138, latest will be whatever's current — small or no gap, ideal for dry-running the new shape)Follow-up (not blocking)
laurigates/.githubasreusable-changelog-review.ymlonce the new shape is proven — per.claude/rules/ci-cd-workflows.mdwe prefer reusable workflows for cross-repo automation, though this one is plugin-collection specific so reuse value is currently low.show_full_output: falsesetting inanthropics/claude-code-actionkeeps the per-call token counts hidden. If we ever want to track token-level efficiency over time, we could either flip that on for the triage job specifically, or scrape the OpenTelemetry log events that the SDK emits (richer detail than the summary result).🤖 Drafted with Claude Code