docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138#1301
Merged
Conversation
Salvages the rule updates produced by the changelog-review workflow run that hit max-turns (run 25651584541). Original PR #1299 was closed by the bot on a false-positive "empty branch" self-diagnosis; the diff at commit e58aed8 is recovered here as a single clean commit. Updates: - .claude-code-version-check.json: lastCheckedVersion → 2.1.138 with a consolidated reviewedChanges entry covering 2.1.77 → 2.1.138 - hooks-reference.md: effort.level in Common Fields, duration_ms on PostToolUse, new "PostToolUse — Replace Tool Output" section - skill-development.md: $CLAUDE_EFFORT string substitution - agent-development.md: --agent mode hooks/mcpServers note, worktree.baseRef table, subagent Skill-tool discovery note - plugin-structure.md: bin/ directory, experimental block, skills entry fix note - agentic-permissions.md: autoMode.hard_deny subsection - sandbox-guidance.md: CLAUDE_CODE_SESSION_ID, deniedDomains
This was referenced May 11, 2026
laurigates
added a commit
that referenced
this pull request
May 11, 2026
## Summary Restructures the changelog-review workflow so a 62-version gap no longer blows the turn budget. The previous design tried to do triage AND rule edits in a single Claude invocation; with the 2.1.76 → 2.1.138 gap that pattern failed in [run 25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541) at turn 51/50 after 2h 7min. (Recovery of that run's actual rule-edit work is in #1301.) ## Why it failed (data) The SDK reports `total_cost_usd` (notional, not billed under Max 20x — but still a clean proxy for token volume since it's computed from token counts). Per-call token counts aren't exposed at the result level, only the aggregate. Comparison against this workflow's recent weekly runs: | Run | Date | Turns | Duration | Notional $ | Notes | |---|---|---|---|---|---| | 2026-05-04 | success | 41 | 8m 31s | $2.01 | Normal weekly run | | 2026-04-27 | success | 4 | 47m | $57.45 | Big-context first-hit; 4 huge turns | | 2026-04-20 | max_turns | 41 | 36m | $39.18 | Same failure mode, smaller gap | | 2026-04-15 | success | 20 | 1m 30s | $0.40 | No-op week | | 2026-04-06 | success | 9 | 11m | $6.18 | Modest gap | | **2026-05-11** | **max_turns** | **51** | **2h 7m** | **$63.69** | The recovered run | For reference, a typical `claude-code-review.yml` PR review run is $0.50–$1.00 with 15-20 turns. The changelog-review workflow on a fresh gap is **30–100× that**, dominated by context size on first hit (4 turns / $57 in one run) rather than turn count alone. ## What changed **Triage-only phase (this workflow):** - Read the excerpt + the JSON tracking file - Open ONE tracking issue summarizing relevant changes grouped by candidate rule file - Ratchet `.claude-code-version-check.json` - Open one JSON-only PR - **No rule edits.** Bounded to `--max-turns 25` with sonnet. **Apply phase (delegated to existing `claude.yml`):** - User mentions `@claude please update <rule-file> for issue #N` on the tracking issue - Each apply is small-scope with its own turn budget — single rule file at a time **Skip-if-exists check:** - Before invoking Claude, query `gh issue list --label changelog-review --state open` for a title matching the current range. If found, skip. `workflow_dispatch` + `force_full_review=true` overrides. - Makes the workflow safe to re-run and idempotent across cron triggers. **Expanded `additional_permissions`:** - Adds `Bash(gh api *)`, `Bash(gh label *)`, `Bash(jq *)`, `Bash(grep *)`, `Bash(awk *)`, `Bash(wc *)`, `Bash(cat *)`, `Bash(date *)`, etc. - The failed run logged **201 permission denials in 51 turns** — ~4 wasted turns per usable turn just retrying denied commands. These additions cover the patterns the agent reached for. **Prompt hardening:** - Strict step-by-step triage flow with explicit non-goals ("DO NOT edit rule files") - Final sanity check: confirm exactly one commit, exactly one file changed, no rule-file edits, before finishing the turn - Explicit warning about the previous run's self-misdiagnosis loop ("if you think a branch is empty, re-check with `git diff`/`git log` before destructive recovery") - Excerpt truncated to 200KB if larger — keeps the prompt cache predictable ## Why this design over alternatives | Option | Why not | |---|---| | Matrix: one job per version | 62 jobs is noisy; most versions are pure bug fixes (e.g. 2.1.137 is one bullet) | | Sub-agents / agent team within one run | Doesn't multiply turn budget — `Task`/`SendMessage` count as turns from the parent loop. Only separate jobs get independent budgets. | | Keep monolithic single-PR design | Will fail again on the next gap | | Iterative-with-checkpoint | Possible, but the issue/apply split matches the historical pattern (#78, #786, #854, #909, #949) and lets human review gate each apply | ## Test plan - [ ] Validate YAML parse + actionlint - [ ] Dry-run via `workflow_dispatch` with `force_full_review=true` AFTER #1301 merges (tracked version will be 2.1.138, latest will be whatever's current — small or no gap, ideal for dry-running the new shape) - [ ] Verify the triage issue is created with grouped per-rule-file sections - [ ] Verify the JSON-only PR contains exactly one file change - [ ] Run a second dispatch (without force) and confirm skip-if-exists fires ## Follow-up (not blocking) - Consider extracting the workflow into `laurigates/.github` as `reusable-changelog-review.yml` once the new shape is proven — per `.claude/rules/ci-cd-workflows.md` we prefer reusable workflows for cross-repo automation, though this one is plugin-collection specific so reuse value is currently low. - The `show_full_output: false` setting in `anthropics/claude-code-action` keeps the per-call token counts hidden. If we ever want to track token-level efficiency over time, we could either flip that on for the triage job specifically, or scrape the OpenTelemetry log events that the SDK emits (richer detail than the summary result). 🤖 Drafted with [Claude Code](https://claude.com/claude-code)
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Recovers the rule updates produced by run 25651584541, which hit
error_max_turnsafter running for 2h 7min and 51/50 turns. The work itself was correct — the bot misdiagnosed its own PR #1299 as an empty branch 72 seconds after opening it, then spent the remaining ~75 minutes thrashing to re-create what it had already shipped, and hit max-turns. This PR salvages commite58aed8cas a single clean commit so the documentation work isn't wasted.Recovery scope
.claude-code-version-check.json—lastCheckedVersion→2.1.138, consolidatedreviewedChangesentry covering 2.1.77 → 2.1.138hooks-reference.md—effort.levelin Common Fields (2.1.133+),duration_mson PostToolUse (2.1.119+), new "PostToolUse — Replace Tool Output" section withupdatedToolOutput(2.1.121+)skill-development.md—${CLAUDE_EFFORT}string substitution row (2.1.120+)agent-development.md—--agentmode hooks/mcpServers note (2.1.116+),worktree.baseRefsetting table (2.1.133+), subagent Skill-tool discovery note (2.1.133+)plugin-structure.md—bin/directory entry,experimentalblock in Optional Fields (2.1.129+),skillsentry fix note (2.1.136+)agentic-permissions.md—autoMode.hard_denysubsection (2.1.136+)sandbox-guidance.md—CLAUDE_CODE_SESSION_IDenv var (2.1.132+),sandbox.network.deniedDomains(2.1.113+)Run resource usage
The repo is on Claude Max 20x, so the SDK's notional cost figure isn't actually invoiced — but it's still a useful workload-size signal because it's computed directly from token counts. The SDK result line for the failed run:
For comparison, the same workflow's prior weekly runs:
The failed run's notional cost is ~60-100× a typical PR review run ($0.50–$1) and ~30× a normal weekly changelog-review tick. Worth noting: per-call token counts are in the SDK's hidden full output stream — only the aggregate cost/turn/duration is exposed in the result event. The 4-turn / $57 outlier on 2026-04-27 suggests the cost is dominated by context size on first hit after a long gap, not by turn count alone.
Test plan
.claude-code-version-check.jsonparses and ratchetslastCheckedVersionto2.1.138e58aed8cwas checked clean before salvage)Follow-up
Workflow restructure to triage-as-issues + skip-if-exists is in #1302 to prevent the next 60-version backlog from blowing the budget again.
🤖 Recovered with Claude Code