ci(changelog-review): refactor to triage-only + skip-if-exists by laurigates · Pull Request #1302 · laurigates/claude-plugins

laurigates · 2026-05-11T10:11:30Z

Summary

Restructures the changelog-review workflow so a 62-version gap no longer blows the turn budget. The previous design tried to do triage AND rule edits in a single Claude invocation; with the 2.1.76 → 2.1.138 gap that pattern failed in run 25651584541 at turn 51/50 after 2h 7min. (Recovery of that run's actual rule-edit work is in #1301.)

Why it failed (data)

The SDK reports total_cost_usd (notional, not billed under Max 20x — but still a clean proxy for token volume since it's computed from token counts). Per-call token counts aren't exposed at the result level, only the aggregate.

Comparison against this workflow's recent weekly runs:

Run	Date	Turns	Duration	Notional $	Notes
2026-05-04	success	41	8m 31s	$2.01	Normal weekly run
2026-04-27	success	4	47m	$57.45	Big-context first-hit; 4 huge turns
2026-04-20	max_turns	41	36m	$39.18	Same failure mode, smaller gap
2026-04-15	success	20	1m 30s	$0.40	No-op week
2026-04-06	success	9	11m	$6.18	Modest gap
2026-05-11	max_turns	51	2h 7m	$63.69	The recovered run

For reference, a typical claude-code-review.yml PR review run is $0.50–$1.00 with 15-20 turns. The changelog-review workflow on a fresh gap is 30–100× that, dominated by context size on first hit (4 turns / $57 in one run) rather than turn count alone.

What changed

Triage-only phase (this workflow):

Read the excerpt + the JSON tracking file
Open ONE tracking issue summarizing relevant changes grouped by candidate rule file
Ratchet .claude-code-version-check.json
Open one JSON-only PR
No rule edits. Bounded to --max-turns 25 with sonnet.

Apply phase (delegated to existing claude.yml):

User mentions @claude please update <rule-file> for issue #N on the tracking issue
Each apply is small-scope with its own turn budget — single rule file at a time

Skip-if-exists check:

Before invoking Claude, query gh issue list --label changelog-review --state open for a title matching the current range. If found, skip. workflow_dispatch + force_full_review=true overrides.
Makes the workflow safe to re-run and idempotent across cron triggers.

Expanded additional_permissions:

Adds Bash(gh api *), Bash(gh label *), Bash(jq *), Bash(grep *), Bash(awk *), Bash(wc *), Bash(cat *), Bash(date *), etc.
The failed run logged 201 permission denials in 51 turns — ~4 wasted turns per usable turn just retrying denied commands. These additions cover the patterns the agent reached for.

Prompt hardening:

Strict step-by-step triage flow with explicit non-goals ("DO NOT edit rule files")
Final sanity check: confirm exactly one commit, exactly one file changed, no rule-file edits, before finishing the turn
Explicit warning about the previous run's self-misdiagnosis loop ("if you think a branch is empty, re-check with git diff/git log before destructive recovery")
Excerpt truncated to 200KB if larger — keeps the prompt cache predictable

Why this design over alternatives

Option	Why not
Matrix: one job per version	62 jobs is noisy; most versions are pure bug fixes (e.g. 2.1.137 is one bullet)
Sub-agents / agent team within one run	Doesn't multiply turn budget — `Task`/`SendMessage` count as turns from the parent loop. Only separate jobs get independent budgets.
Keep monolithic single-PR design	Will fail again on the next gap
Iterative-with-checkpoint	Possible, but the issue/apply split matches the historical pattern (#78, #786, #854, #909, #949) and lets human review gate each apply

Test plan

Validate YAML parse + actionlint
Dry-run via workflow_dispatch with force_full_review=true AFTER docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138 #1301 merges (tracked version will be 2.1.138, latest will be whatever's current — small or no gap, ideal for dry-running the new shape)
Verify the triage issue is created with grouped per-rule-file sections
Verify the JSON-only PR contains exactly one file change
Run a second dispatch (without force) and confirm skip-if-exists fires

Follow-up (not blocking)

Consider extracting the workflow into laurigates/.github as reusable-changelog-review.yml once the new shape is proven — per .claude/rules/ci-cd-workflows.md we prefer reusable workflows for cross-repo automation, though this one is plugin-collection specific so reuse value is currently low.
The show_full_output: false setting in anthropics/claude-code-action keeps the per-call token counts hidden. If we ever want to track token-level efficiency over time, we could either flip that on for the triage job specifically, or scrape the OpenTelemetry log events that the SDK emits (richer detail than the summary result).

🤖 Drafted with Claude Code

@claude

The previous design tried to do triage AND rule edits in a single Claude invocation. With a 62-version gap (2.1.76 → 2.1.138) this failed in run 25651584541 after burning $63.69 and 2h 7min: - 201 permission denials (4 per turn) bleeding the budget on commands that weren't in additional_permissions - The agent shipped a correct PR (#1299) mid-run, misdiagnosed it as an empty branch 72 seconds later, closed it, then spent ~75 minutes thrashing to "recreate" it before hitting max-turns at turn 51/50 Restructure splits the work into two phases: 1. Triage (this workflow): open ONE tracking issue summarizing relevant changes grouped by candidate rule file, ratchet .claude-code-version-check.json, open a JSON-only PR. No rule edits. Bounded budget: --max-turns 25. 2. Apply (existing claude.yml @-mention responder): user mentions @claude on the tracking issue to drive targeted rule edits. Each apply is small-scope and gets its own turn budget. Other changes: - Skip-if-exists check: before running Claude, look for an open issue matching "Review Claude Code changelog: <tracked> → <latest>". If found, skip. force_full_review=true overrides. - Expand additional_permissions to cover gh api/label, jq, grep, awk, cat, wc, date — eliminates the per-turn denial bleed observed in the failed run. - Truncate excerpts above 200KB to keep prompt cache predictable. - Replace open-ended rule-edit prompt with strict triage steps and a final sanity-check (one commit, one file, no rule edits) so the agent fails fast on misdiagnosis instead of looping. The recovery of the failed run's actual work is in PR #1301; this PR fixes the workflow so the next big gap doesn't repeat the failure.

## Summary Recovers the rule updates produced by run [25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541), which hit `error_max_turns` after running for 2h 7min and 51/50 turns. The work itself was correct — the bot misdiagnosed its own PR #1299 as an empty branch 72 seconds after opening it, then spent the remaining ~75 minutes thrashing to re-create what it had already shipped, and hit max-turns. This PR salvages commit `e58aed8c` as a single clean commit so the documentation work isn't wasted. ## Recovery scope - **`.claude-code-version-check.json`** — `lastCheckedVersion` → `2.1.138`, consolidated `reviewedChanges` entry covering 2.1.77 → 2.1.138 - **`hooks-reference.md`** — `effort.level` in Common Fields (2.1.133+), `duration_ms` on PostToolUse (2.1.119+), new "PostToolUse — Replace Tool Output" section with `updatedToolOutput` (2.1.121+) - **`skill-development.md`** — `${CLAUDE_EFFORT}` string substitution row (2.1.120+) - **`agent-development.md`** — `--agent` mode hooks/mcpServers note (2.1.116+), `worktree.baseRef` setting table (2.1.133+), subagent Skill-tool discovery note (2.1.133+) - **`plugin-structure.md`** — `bin/` directory entry, `experimental` block in Optional Fields (2.1.129+), `skills` entry fix note (2.1.136+) - **`agentic-permissions.md`** — `autoMode.hard_deny` subsection (2.1.136+) - **`sandbox-guidance.md`** — `CLAUDE_CODE_SESSION_ID` env var (2.1.132+), `sandbox.network.deniedDomains` (2.1.113+) ## Run resource usage The repo is on Claude Max 20x, so the SDK's notional cost figure isn't actually invoiced — but it's still a useful workload-size signal because it's computed directly from token counts. The SDK result line for the failed run: ``` "subtype": "error_max_turns", "duration_ms": 7611491, # 2h 7min "num_turns": 51, "total_cost_usd": 63.69354, # notional, not billed under Max 20x "permission_denials_count": 201 ``` For comparison, the same workflow's prior weekly runs: | Run | Date | Conclusion | Turns | Duration | Notional $ | |---|---|---|---|---|---| | [25315527872](https://github.com/laurigates/claude-plugins/actions/runs/25315527872) | 2026-05-04 | success | 41 | 8m 31s | $2.01 | | [24991465607](https://github.com/laurigates/claude-plugins/actions/runs/24991465607) | 2026-04-27 | success | 4 | 47m 13s | $57.45 | | [24662476905](https://github.com/laurigates/claude-plugins/actions/runs/24662476905) | 2026-04-20 | max_turns | 41 | 35m 51s | $39.18 | | [24453356948](https://github.com/laurigates/claude-plugins/actions/runs/24453356948) | 2026-04-15 | success | 20 | 1m 30s | $0.40 | | [24027905006](https://github.com/laurigates/claude-plugins/actions/runs/24027905006) | 2026-04-06 | success | 9 | 11m 02s | $6.18 | | [23739805721](https://github.com/laurigates/claude-plugins/actions/runs/23739805721) | 2026-03-30 | success | 38 | 7m 10s | $1.47 | | [25651584541](https://github.com/laurigates/claude-plugins/actions/runs/25651584541) | 2026-05-11 | **max_turns** | **51** | **2h 7m** | **$63.69** | The failed run's notional cost is ~60-100× a typical PR review run ($0.50–$1) and ~30× a normal weekly changelog-review tick. Worth noting: per-call token counts are in the SDK's hidden full output stream — only the aggregate cost/turn/duration is exposed in the result event. The 4-turn / $57 outlier on 2026-04-27 suggests the cost is dominated by context size on first hit after a long gap, not by turn count alone. ## Test plan - [ ] Spot-check each rule edit against the upstream changelog bullet that motivated it - [ ] Verify `.claude-code-version-check.json` parses and ratchets `lastCheckedVersion` to `2.1.138` - [ ] Confirm no placeholder/template artifacts remain (the failed run's 16 commits included several "fix placeholder" iterations; the final tree at `e58aed8c` was checked clean before salvage) ## Follow-up Workflow restructure to triage-as-issues + skip-if-exists is in #1302 to prevent the next 60-version backlog from blowing the budget again. 🤖 Recovered with [Claude Code](https://claude.com/claude-code)

laurigates · 2026-05-11T12:03:22Z

@claude let's create the follow-up issues

claude · 2026-05-11T12:03:36Z

Claude finished @laurigates's task in 3m 34s —— View job

Follow-up issues for PR #1302

Gather context from PR description
Review the workflow file for additional follow-up candidates
Create follow-up issues

Created 3 issues:

Issue	Title	Source
#1303	ci(changelog-review): validate new triage-only workflow shape	PR test plan (unchecked items)
#1304	ci(changelog-review): extract as reusable workflow once shape is proven	PR follow-up item 1
#1305	ci(changelog-review): track per-call token efficiency over time	PR follow-up item 2

#1303 covers the unverified test plan items (actionlint, dry-run dispatch, skip-if-exists) — these need a live run now that the tracked version is at 2.1.138.

#1304 and #1305 are the two explicit "not blocking" follow-ups from the PR description.

laurigates added github-actions Pull requests that update GitHub Actions code maintenance labels May 11, 2026

laurigates self-assigned this May 11, 2026

laurigates mentioned this pull request May 11, 2026

docs(rules): apply Claude Code changelog review 2.1.76 → 2.1.138 #1301

Merged

3 tasks

laurigates merged commit 257b69c into main May 11, 2026
3 checks passed

laurigates deleted the ci/changelog-review-triage-only branch May 11, 2026 12:03

This was referenced May 11, 2026

ci(changelog-review): validate new triage-only workflow shape #1303

Open

ci(changelog-review): extract as reusable workflow once shape is proven #1304

Open

ci(changelog-review): track per-call token efficiency over time #1305

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci(changelog-review): refactor to triage-only + skip-if-exists#1302

ci(changelog-review): refactor to triage-only + skip-if-exists#1302
laurigates merged 1 commit into
mainfrom
ci/changelog-review-triage-only

laurigates commented May 11, 2026 •

edited

Loading

Uh oh!

laurigates commented May 11, 2026

Uh oh!

Uh oh!

claude Bot commented May 11, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

laurigates commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why it failed (data)

What changed

Why this design over alternatives

Test plan

Follow-up (not blocking)

Uh oh!

laurigates commented May 11, 2026

Uh oh!

Uh oh!

claude Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Follow-up issues for PR #1302

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

laurigates commented May 11, 2026 •

edited

Loading

claude Bot commented May 11, 2026 •

edited

Loading