Skip to content

docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138#1301

Merged
laurigates merged 1 commit into
mainfrom
chore/changelog-review-2.1.138
May 11, 2026
Merged

docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138#1301
laurigates merged 1 commit into
mainfrom
chore/changelog-review-2.1.138

Conversation

@laurigates
Copy link
Copy Markdown
Owner

@laurigates laurigates commented May 11, 2026

Summary

Recovers the rule updates produced by run 25651584541, which hit error_max_turns after running for 2h 7min and 51/50 turns. The work itself was correct — the bot misdiagnosed its own PR #1299 as an empty branch 72 seconds after opening it, then spent the remaining ~75 minutes thrashing to re-create what it had already shipped, and hit max-turns. This PR salvages commit e58aed8c as a single clean commit so the documentation work isn't wasted.

Recovery scope

  • .claude-code-version-check.jsonlastCheckedVersion2.1.138, consolidated reviewedChanges entry covering 2.1.77 → 2.1.138
  • hooks-reference.mdeffort.level in Common Fields (2.1.133+), duration_ms on PostToolUse (2.1.119+), new "PostToolUse — Replace Tool Output" section with updatedToolOutput (2.1.121+)
  • skill-development.md${CLAUDE_EFFORT} string substitution row (2.1.120+)
  • agent-development.md--agent mode hooks/mcpServers note (2.1.116+), worktree.baseRef setting table (2.1.133+), subagent Skill-tool discovery note (2.1.133+)
  • plugin-structure.mdbin/ directory entry, experimental block in Optional Fields (2.1.129+), skills entry fix note (2.1.136+)
  • agentic-permissions.mdautoMode.hard_deny subsection (2.1.136+)
  • sandbox-guidance.mdCLAUDE_CODE_SESSION_ID env var (2.1.132+), sandbox.network.deniedDomains (2.1.113+)

Run resource usage

The repo is on Claude Max 20x, so the SDK's notional cost figure isn't actually invoiced — but it's still a useful workload-size signal because it's computed directly from token counts. The SDK result line for the failed run:

"subtype": "error_max_turns",
"duration_ms": 7611491,        # 2h 7min
"num_turns": 51,
"total_cost_usd": 63.69354,    # notional, not billed under Max 20x
"permission_denials_count": 201

For comparison, the same workflow's prior weekly runs:

Run Date Conclusion Turns Duration Notional $
25315527872 2026-05-04 success 41 8m 31s $2.01
24991465607 2026-04-27 success 4 47m 13s $57.45
24662476905 2026-04-20 max_turns 41 35m 51s $39.18
24453356948 2026-04-15 success 20 1m 30s $0.40
24027905006 2026-04-06 success 9 11m 02s $6.18
23739805721 2026-03-30 success 38 7m 10s $1.47
25651584541 2026-05-11 max_turns 51 2h 7m $63.69

The failed run's notional cost is ~60-100× a typical PR review run ($0.50–$1) and ~30× a normal weekly changelog-review tick. Worth noting: per-call token counts are in the SDK's hidden full output stream — only the aggregate cost/turn/duration is exposed in the result event. The 4-turn / $57 outlier on 2026-04-27 suggests the cost is dominated by context size on first hit after a long gap, not by turn count alone.

Test plan

  • Spot-check each rule edit against the upstream changelog bullet that motivated it
  • Verify .claude-code-version-check.json parses and ratchets lastCheckedVersion to 2.1.138
  • Confirm no placeholder/template artifacts remain (the failed run's 16 commits included several "fix placeholder" iterations; the final tree at e58aed8c was checked clean before salvage)

Follow-up

Workflow restructure to triage-as-issues + skip-if-exists is in #1302 to prevent the next 60-version backlog from blowing the budget again.

🤖 Recovered with Claude Code

Salvages the rule updates produced by the changelog-review workflow run
that hit max-turns (run 25651584541). Original PR #1299 was closed by
the bot on a false-positive "empty branch" self-diagnosis; the diff at
commit e58aed8 is recovered here as a single clean commit.

Updates:
- .claude-code-version-check.json: lastCheckedVersion → 2.1.138 with a
  consolidated reviewedChanges entry covering 2.1.77 → 2.1.138
- hooks-reference.md: effort.level in Common Fields, duration_ms on
  PostToolUse, new "PostToolUse — Replace Tool Output" section
- skill-development.md: $CLAUDE_EFFORT string substitution
- agent-development.md: --agent mode hooks/mcpServers note,
  worktree.baseRef table, subagent Skill-tool discovery note
- plugin-structure.md: bin/ directory, experimental block, skills
  entry fix note
- agentic-permissions.md: autoMode.hard_deny subsection
- sandbox-guidance.md: CLAUDE_CODE_SESSION_ID, deniedDomains
@laurigates laurigates added changelog-review Changelog review tracking maintenance labels May 11, 2026
@laurigates laurigates self-assigned this May 11, 2026
@laurigates laurigates merged commit 8c4a638 into main May 11, 2026
3 checks passed
@laurigates laurigates deleted the chore/changelog-review-2.1.138 branch May 11, 2026 12:02
laurigates added a commit that referenced this pull request May 11, 2026
## Summary

Restructures the changelog-review workflow so a 62-version gap no longer
blows the turn budget. The previous design tried to do triage AND rule
edits in a single Claude invocation; with the 2.1.76 → 2.1.138 gap that
pattern failed in [run
25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541)
at turn 51/50 after 2h 7min. (Recovery of that run's actual rule-edit
work is in #1301.)

## Why it failed (data)

The SDK reports `total_cost_usd` (notional, not billed under Max 20x —
but still a clean proxy for token volume since it's computed from token
counts). Per-call token counts aren't exposed at the result level, only
the aggregate.

Comparison against this workflow's recent weekly runs:

| Run | Date | Turns | Duration | Notional $ | Notes |
|---|---|---|---|---|---|
| 2026-05-04 | success | 41 | 8m 31s | $2.01 | Normal weekly run |
| 2026-04-27 | success | 4 | 47m | $57.45 | Big-context first-hit; 4
huge turns |
| 2026-04-20 | max_turns | 41 | 36m | $39.18 | Same failure mode,
smaller gap |
| 2026-04-15 | success | 20 | 1m 30s | $0.40 | No-op week |
| 2026-04-06 | success | 9 | 11m | $6.18 | Modest gap |
| **2026-05-11** | **max_turns** | **51** | **2h 7m** | **$63.69** | The
recovered run |

For reference, a typical `claude-code-review.yml` PR review run is
$0.50–$1.00 with 15-20 turns. The changelog-review workflow on a fresh
gap is **30–100× that**, dominated by context size on first hit (4 turns
/ $57 in one run) rather than turn count alone.

## What changed

**Triage-only phase (this workflow):**
- Read the excerpt + the JSON tracking file
- Open ONE tracking issue summarizing relevant changes grouped by
candidate rule file
- Ratchet `.claude-code-version-check.json`
- Open one JSON-only PR
- **No rule edits.** Bounded to `--max-turns 25` with sonnet.

**Apply phase (delegated to existing `claude.yml`):**
- User mentions `@claude please update <rule-file> for issue #N` on the
tracking issue
- Each apply is small-scope with its own turn budget — single rule file
at a time

**Skip-if-exists check:**
- Before invoking Claude, query `gh issue list --label changelog-review
--state open` for a title matching the current range. If found, skip.
`workflow_dispatch` + `force_full_review=true` overrides.
- Makes the workflow safe to re-run and idempotent across cron triggers.

**Expanded `additional_permissions`:**
- Adds `Bash(gh api *)`, `Bash(gh label *)`, `Bash(jq *)`, `Bash(grep
*)`, `Bash(awk *)`, `Bash(wc *)`, `Bash(cat *)`, `Bash(date *)`, etc.
- The failed run logged **201 permission denials in 51 turns** — ~4
wasted turns per usable turn just retrying denied commands. These
additions cover the patterns the agent reached for.

**Prompt hardening:**
- Strict step-by-step triage flow with explicit non-goals ("DO NOT edit
rule files")
- Final sanity check: confirm exactly one commit, exactly one file
changed, no rule-file edits, before finishing the turn
- Explicit warning about the previous run's self-misdiagnosis loop ("if
you think a branch is empty, re-check with `git diff`/`git log` before
destructive recovery")
- Excerpt truncated to 200KB if larger — keeps the prompt cache
predictable

## Why this design over alternatives

| Option | Why not |
|---|---|
| Matrix: one job per version | 62 jobs is noisy; most versions are pure
bug fixes (e.g. 2.1.137 is one bullet) |
| Sub-agents / agent team within one run | Doesn't multiply turn budget
— `Task`/`SendMessage` count as turns from the parent loop. Only
separate jobs get independent budgets. |
| Keep monolithic single-PR design | Will fail again on the next gap |
| Iterative-with-checkpoint | Possible, but the issue/apply split
matches the historical pattern (#78, #786, #854, #909, #949) and lets
human review gate each apply |

## Test plan

- [ ] Validate YAML parse + actionlint
- [ ] Dry-run via `workflow_dispatch` with `force_full_review=true`
AFTER #1301 merges (tracked version will be 2.1.138, latest will be
whatever's current — small or no gap, ideal for dry-running the new
shape)
- [ ] Verify the triage issue is created with grouped per-rule-file
sections
- [ ] Verify the JSON-only PR contains exactly one file change
- [ ] Run a second dispatch (without force) and confirm skip-if-exists
fires

## Follow-up (not blocking)

- Consider extracting the workflow into `laurigates/.github` as
`reusable-changelog-review.yml` once the new shape is proven — per
`.claude/rules/ci-cd-workflows.md` we prefer reusable workflows for
cross-repo automation, though this one is plugin-collection specific so
reuse value is currently low.
- The `show_full_output: false` setting in
`anthropics/claude-code-action` keeps the per-call token counts hidden.
If we ever want to track token-level efficiency over time, we could
either flip that on for the triage job specifically, or scrape the
OpenTelemetry log events that the SDK emits (richer detail than the
summary result).

🤖 Drafted with [Claude Code](https://claude.com/claude-code)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog-review Changelog review tracking maintenance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant