Skip to content

Defend against ACP server reporting used > size#364

Closed
timvisher-dd wants to merge 10 commits intoxenodium:mainfrom
timvisher-dd:timvisher-dd/tests/add-shell-usage-regression-tests
Closed

Defend against ACP server reporting used > size#364
timvisher-dd wants to merge 10 commits intoxenodium:mainfrom
timvisher-dd:timvisher-dd/tests/add-shell-usage-regression-tests

Conversation

@timvisher-dd
Copy link
Copy Markdown
Contributor

@timvisher-dd timvisher-dd commented Mar 3, 2026

The ACP server (claude-agent-acp) has a bug where model switches cause used to exceed size in session/update notifications. For example, switching from Opus 1M to Sonnet 200k drops size to 200000 while used keeps growing past it (observed: 419574/200000 = 209.8%). This results in nonsensical context indicators and percentages e.g. (from a real session)

 Context: 420k/200k (209.8%)
  Tokens: 32 in · 11k out · 11m cached (11m total)
    Cost: USD75.96

While I intend to get a fix for this improper reporting into claude-agent-acp, agent-shell should be robust against it. To that end, when the garbage usage data is observed, the UI now signals unreliable data instead of showing nonsense:

  • Context indicator shows ? with warning face when used > size
  • Formatted usage shows raw numbers with (?) instead of a bogus percentage
  • A regression test replays the real observed ACP traffic from the model-switch scenario so this class of bug is caught going forward

This also adds comprehensive ERT test coverage for agent-shell-usage.el: notification updates, indicator scaling/colors, compaction replay, token saving, and number formatting.

For the claude-agent-acp side of this see agentclientprotocol/claude-agent-acp#412

Test plan

  • All 21 ERT tests pass
  • checkdoc clean on both files
  • byte-compile clean on both files
  • Manual baking verification

@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from e528a97 to 3e99679 Compare March 3, 2026 19:49
@timvisher-dd timvisher-dd changed the title Add usage tracking tests and fix existing test failures Add usage tracking and context indicator regression tests Mar 3, 2026
@timvisher-dd timvisher-dd marked this pull request as ready for review March 3, 2026 20:08
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 3e99679 to 6d914d4 Compare March 11, 2026 17:31
@timvisher-dd timvisher-dd changed the title Add usage tracking and context indicator regression tests Defend against ACP server reporting used > size Mar 11, 2026
@timvisher-dd timvisher-dd marked this pull request as draft March 12, 2026 15:54
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 6d914d4 to c98e427 Compare March 13, 2026 01:20
@timvisher-dd timvisher-dd marked this pull request as ready for review March 14, 2026 22:00
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch 2 times, most recently from e2ce5c0 to f64cdbd Compare March 14, 2026 22:02
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from f64cdbd to 2f153b2 Compare March 14, 2026 23:53
@timvisher-dd timvisher-dd reopened this Mar 15, 2026
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 2f153b2 to 8b63348 Compare March 15, 2026 16:05
timvisher-dd and others added 10 commits March 15, 2026 14:49
CI workflow runs byte-compilation and ERT tests on push/PR using
GitHub Actions with deps checked out from timvisher-dd/acp.el-plus
and xenodium/shell-maker.

bin/test parses ci.yml with yq so local runs stay in sync with CI
automatically. It symlinks local dependency checkouts into deps/ to
match the CI layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New library for sending desktop notifications from Emacs.

In GUI mode on macOS, uses native UNUserNotificationCenter via a
dynamic module (agent-shell-alert-mac.dylib) compiled JIT on first
use (inspired by vterm). When compilation fails (e.g. missing Xcode
CLI tools), a message recommends `xcode-select --install`.

In terminal mode, auto-detects the host terminal emulator and sends
the appropriate OSC escape sequence:
- OSC 9: iTerm2, Ghostty, WezTerm, foot, mintty, ConEmu
- OSC 99: kitty
- OSC 777: urxvt, VTE-based terminals

Inside tmux, wraps in DCS passthrough (checking allow-passthrough
first). Falls back to osascript on macOS when the terminal is
unknown or tmux passthrough is not enabled.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follows the same pattern as acp.el: a boolean toggle
(agent-shell-logging-enabled, off by default), a per-shell log
buffer stored in state, and label+format-string logging. Adds log
calls to idle notification start/cancel/fire for observability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After each agent turn completes, a 30s timer starts. Any user input
in the buffer cancels it; otherwise it fires a desktop notification
via agent-shell-alert. The echo area message is only shown when the
shell buffer is not the active buffer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validate that agent-shell-buffers and agent-shell-project-buffers
reflect (buffer-list) ordering correctly: switch-to-buffer and
select-window promote, with-current-buffer does not, bury-buffer
demotes, and project filtering preserves order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fails PRs that modify .el files or tests/ without also updating
README.org, ensuring the soft-fork features list stays current.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add comprehensive ERT tests for agent-shell-usage.el covering
notification updates, context indicator scaling/colors, compaction
replay, token saving, and number formatting.

The ACP server has a bug where model switches cause used to exceed
size in session/update notifications. Rather than clamping, signal
unreliable data: indicator shows ? with warning face, format shows
(?) instead of a bogus percentage. A regression test replays real
observed traffic from the Opus 1M -> Sonnet 200k switch scenario.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@timvisher-dd timvisher-dd force-pushed the timvisher-dd/tests/add-shell-usage-regression-tests branch from 8b63348 to b9ca659 Compare March 15, 2026 20:03
@timvisher-dd timvisher-dd deleted the timvisher-dd/tests/add-shell-usage-regression-tests branch March 15, 2026 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant