docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138 by laurigates · Pull Request #1301 · laurigates/claude-plugins

laurigates · 2026-05-11T10:08:20Z

Summary

Recovers the rule updates produced by run 25651584541, which hit error_max_turns after running for 2h 7min and 51/50 turns. The work itself was correct — the bot misdiagnosed its own PR #1299 as an empty branch 72 seconds after opening it, then spent the remaining ~75 minutes thrashing to re-create what it had already shipped, and hit max-turns. This PR salvages commit e58aed8c as a single clean commit so the documentation work isn't wasted.

Recovery scope

.claude-code-version-check.json — lastCheckedVersion → 2.1.138, consolidated reviewedChanges entry covering 2.1.77 → 2.1.138
hooks-reference.md — effort.level in Common Fields (2.1.133+), duration_ms on PostToolUse (2.1.119+), new "PostToolUse — Replace Tool Output" section with updatedToolOutput (2.1.121+)
skill-development.md — ${CLAUDE_EFFORT} string substitution row (2.1.120+)
agent-development.md — --agent mode hooks/mcpServers note (2.1.116+), worktree.baseRef setting table (2.1.133+), subagent Skill-tool discovery note (2.1.133+)
plugin-structure.md — bin/ directory entry, experimental block in Optional Fields (2.1.129+), skills entry fix note (2.1.136+)
agentic-permissions.md — autoMode.hard_deny subsection (2.1.136+)
sandbox-guidance.md — CLAUDE_CODE_SESSION_ID env var (2.1.132+), sandbox.network.deniedDomains (2.1.113+)

Run resource usage

The repo is on Claude Max 20x, so the SDK's notional cost figure isn't actually invoiced — but it's still a useful workload-size signal because it's computed directly from token counts. The SDK result line for the failed run:

"subtype": "error_max_turns",
"duration_ms": 7611491,        # 2h 7min
"num_turns": 51,
"total_cost_usd": 63.69354,    # notional, not billed under Max 20x
"permission_denials_count": 201

For comparison, the same workflow's prior weekly runs:

Run	Date	Conclusion	Turns	Duration	Notional $
25315527872	2026-05-04	success	41	8m 31s	$2.01
24991465607	2026-04-27	success	4	47m 13s	$57.45
24662476905	2026-04-20	max_turns	41	35m 51s	$39.18
24453356948	2026-04-15	success	20	1m 30s	$0.40
24027905006	2026-04-06	success	9	11m 02s	$6.18
23739805721	2026-03-30	success	38	7m 10s	$1.47
25651584541	2026-05-11	max_turns	51	2h 7m	$63.69

The failed run's notional cost is ~60-100× a typical PR review run ($0.50–$1) and ~30× a normal weekly changelog-review tick. Worth noting: per-call token counts are in the SDK's hidden full output stream — only the aggregate cost/turn/duration is exposed in the result event. The 4-turn / $57 outlier on 2026-04-27 suggests the cost is dominated by context size on first hit after a long gap, not by turn count alone.

Test plan

Spot-check each rule edit against the upstream changelog bullet that motivated it
Verify .claude-code-version-check.json parses and ratchets lastCheckedVersion to 2.1.138
Confirm no placeholder/template artifacts remain (the failed run's 16 commits included several "fix placeholder" iterations; the final tree at e58aed8c was checked clean before salvage)

Follow-up

Workflow restructure to triage-as-issues + skip-if-exists is in #1302 to prevent the next 60-version backlog from blowing the budget again.

🤖 Recovered with Claude Code

Salvages the rule updates produced by the changelog-review workflow run that hit max-turns (run 25651584541). Original PR #1299 was closed by the bot on a false-positive "empty branch" self-diagnosis; the diff at commit e58aed8 is recovered here as a single clean commit. Updates: - .claude-code-version-check.json: lastCheckedVersion → 2.1.138 with a consolidated reviewedChanges entry covering 2.1.77 → 2.1.138 - hooks-reference.md: effort.level in Common Fields, duration_ms on PostToolUse, new "PostToolUse — Replace Tool Output" section - skill-development.md: $CLAUDE_EFFORT string substitution - agent-development.md: --agent mode hooks/mcpServers note, worktree.baseRef table, subagent Skill-tool discovery note - plugin-structure.md: bin/ directory, experimental block, skills entry fix note - agentic-permissions.md: autoMode.hard_deny subsection - sandbox-guidance.md: CLAUDE_CODE_SESSION_ID, deniedDomains

## Summary Restructures the changelog-review workflow so a 62-version gap no longer blows the turn budget. The previous design tried to do triage AND rule edits in a single Claude invocation; with the 2.1.76 → 2.1.138 gap that pattern failed in [run 25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541) at turn 51/50 after 2h 7min. (Recovery of that run's actual rule-edit work is in #1301.) ## Why it failed (data) The SDK reports `total_cost_usd` (notional, not billed under Max 20x — but still a clean proxy for token volume since it's computed from token counts). Per-call token counts aren't exposed at the result level, only the aggregate. Comparison against this workflow's recent weekly runs: | Run | Date | Turns | Duration | Notional $ | Notes | |---|---|---|---|---|---| | 2026-05-04 | success | 41 | 8m 31s | $2.01 | Normal weekly run | | 2026-04-27 | success | 4 | 47m | $57.45 | Big-context first-hit; 4 huge turns | | 2026-04-20 | max_turns | 41 | 36m | $39.18 | Same failure mode, smaller gap | | 2026-04-15 | success | 20 | 1m 30s | $0.40 | No-op week | | 2026-04-06 | success | 9 | 11m | $6.18 | Modest gap | | **2026-05-11** | **max_turns** | **51** | **2h 7m** | **$63.69** | The recovered run | For reference, a typical `claude-code-review.yml` PR review run is $0.50–$1.00 with 15-20 turns. The changelog-review workflow on a fresh gap is **30–100× that**, dominated by context size on first hit (4 turns / $57 in one run) rather than turn count alone. ## What changed **Triage-only phase (this workflow):** - Read the excerpt + the JSON tracking file - Open ONE tracking issue summarizing relevant changes grouped by candidate rule file - Ratchet `.claude-code-version-check.json` - Open one JSON-only PR - **No rule edits.** Bounded to `--max-turns 25` with sonnet. **Apply phase (delegated to existing `claude.yml`):** - User mentions `@claude please update <rule-file> for issue #N` on the tracking issue - Each apply is small-scope with its own turn budget — single rule file at a time **Skip-if-exists check:** - Before invoking Claude, query `gh issue list --label changelog-review --state open` for a title matching the current range. If found, skip. `workflow_dispatch` + `force_full_review=true` overrides. - Makes the workflow safe to re-run and idempotent across cron triggers. **Expanded `additional_permissions`:** - Adds `Bash(gh api *)`, `Bash(gh label *)`, `Bash(jq *)`, `Bash(grep *)`, `Bash(awk *)`, `Bash(wc *)`, `Bash(cat *)`, `Bash(date *)`, etc. - The failed run logged **201 permission denials in 51 turns** — ~4 wasted turns per usable turn just retrying denied commands. These additions cover the patterns the agent reached for. **Prompt hardening:** - Strict step-by-step triage flow with explicit non-goals ("DO NOT edit rule files") - Final sanity check: confirm exactly one commit, exactly one file changed, no rule-file edits, before finishing the turn - Explicit warning about the previous run's self-misdiagnosis loop ("if you think a branch is empty, re-check with `git diff`/`git log` before destructive recovery") - Excerpt truncated to 200KB if larger — keeps the prompt cache predictable ## Why this design over alternatives | Option | Why not | |---|---| | Matrix: one job per version | 62 jobs is noisy; most versions are pure bug fixes (e.g. 2.1.137 is one bullet) | | Sub-agents / agent team within one run | Doesn't multiply turn budget — `Task`/`SendMessage` count as turns from the parent loop. Only separate jobs get independent budgets. | | Keep monolithic single-PR design | Will fail again on the next gap | | Iterative-with-checkpoint | Possible, but the issue/apply split matches the historical pattern (#78, #786, #854, #909, #949) and lets human review gate each apply | ## Test plan - [ ] Validate YAML parse + actionlint - [ ] Dry-run via `workflow_dispatch` with `force_full_review=true` AFTER #1301 merges (tracked version will be 2.1.138, latest will be whatever's current — small or no gap, ideal for dry-running the new shape) - [ ] Verify the triage issue is created with grouped per-rule-file sections - [ ] Verify the JSON-only PR contains exactly one file change - [ ] Run a second dispatch (without force) and confirm skip-if-exists fires ## Follow-up (not blocking) - Consider extracting the workflow into `laurigates/.github` as `reusable-changelog-review.yml` once the new shape is proven — per `.claude/rules/ci-cd-workflows.md` we prefer reusable workflows for cross-repo automation, though this one is plugin-collection specific so reuse value is currently low. - The `show_full_output: false` setting in `anthropics/claude-code-action` keeps the per-call token counts hidden. If we ever want to track token-level efficiency over time, we could either flip that on for the triage job specifically, or scrape the OpenTelemetry log events that the SDK emits (richer detail than the summary result). 🤖 Drafted with [Claude Code](https://claude.com/claude-code)

laurigates added changelog-review Changelog review tracking maintenance labels May 11, 2026

laurigates self-assigned this May 11, 2026

This was referenced May 11, 2026

docs(rules): apply changelog review 2.1.77-2.1.138 #1299

Closed

ci(changelog-review): refactor to triage-only + skip-if-exists #1302

Merged

laurigates merged commit 8c4a638 into main May 11, 2026
3 checks passed

laurigates deleted the chore/changelog-review-2.1.138 branch May 11, 2026 12:02

claude Bot mentioned this pull request May 11, 2026

ci(changelog-review): validate new triage-only workflow shape #1303

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138#1301

docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138#1301
laurigates merged 1 commit into
mainfrom
chore/changelog-review-2.1.138

laurigates commented May 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

laurigates commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Recovery scope

Run resource usage

Test plan

Follow-up

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

laurigates commented May 11, 2026 •

edited

Loading