fix: stale PostToolUse notifications and background task output leak by timvisher-dd · Pull Request #353 · agentclientprotocol/claude-agent-acp

timvisher-dd · 2026-02-27T01:01:42Z

Problem

Two independent bugs caused stale or garbled output in ACP clients (agent-shell):

PostToolUse hook race: The SDK fires PostToolUse hooks ~42ms before the streaming handler registers callbacks, causing "No onPostToolUseHook found" errors and lost session/update notifications. For subagent child tool uses, the gap is 10-60+ seconds.
Background task output leak: The SDK handles background task completion differently depending on task type. For local_agent tasks (subagents), the SDK internally defers the result message via its iP() function (minified; likely hasRunningDeferrableTasks) — the turn stays open until the agent task finishes. But for local_bash tasks (run_in_background Bash commands), the SDK does not defer — it emits result immediately, then yields task_notification followed by an internal model turn after the result. prompt() returned at the first result, leaving the internal turn in the iterator buffer. The next prompt() call processed stale background task output instead of the user's actual input — the model responded to its own background task completion rather than "yes". This asymmetry is why the fix only tracks local_bash tasks: local_agent is already handled by the SDK. It also points at the ideal upstream fix: extending the SDK's iP() / hasRunningDeferrableTasks to include local_bash tasks would eliminate the leak at the source, making the adapter's poll loop, queue peeking, and SDK internal API access unnecessary. The entire Fix 2 is a workaround for this gap.

Three contamination layers

The background task leak manifests through three independent mechanisms. All three must be understood to evaluate why the indefinite poll (Fix 2) is the only viable solution:

Layer 1 — Iterator-level internal turns: The SDK yields task_notification → assistant → result (sometimes with init / stream_event interspersed) through the query iterator after the user turn's result. If prompt() returns at the first result, these leak into the next prompt() call's iterator buffer. This is what Fix 2 directly addresses.
Layer 2 — SDK context injection: The SDK injects <task-notification> and <system-reminder> content directly into the LLM's conversation history as user-role messages, before the adapter's prompt loop runs. This is a separate mechanism from the iterator — these messages never appear on the ACP wire (confirmed: zero task_notification, task_started, or local_command_output in traffic.eld debug captures). The model sees system-dominated content, concludes "the user hasn't said anything," responds to the notifications, and end_turns without addressing the real question. Keeping the turn open also fixes this: if the turn never ends while tasks are pending, there is no "next turn" for the SDK to inject stale notifications into.
Layer 3 — local_command_output forwarding: The adapter's local_command_output handler (system message case) forwards linter/hook output as agent_message_chunk to ACP clients. This is visible agent text that the user didn't request. Not addressed by this PR — it's a separate contamination vector unrelated to background tasks.

Why the user's prompt gets "swallowed"

The user-visible symptom is that they must send the same message twice. Below is a complete walkthrough from the debug logs (x.acp-debug-20260315-000100/) showing the exact sequence.

Step 1: Background task launched during previous turn

The agent launched a codex review as a background bash task. ACP wire traffic shows the task ID and output path in tool_call_update notifications:

// ACP wire — tool_call_update with backgroundTaskId
{"jsonrpc":"2.0","method":"session/update","params":{
  "sessionId":"dc75a7b7-…",
  "update":{
    "_meta":{"claudeCode":{
      "toolResponse":{"stdout":"","stderr":"","backgroundTaskId":"bsmr1wpsf"},
      "toolName":"Bash"}},
    "toolCallId":"toolu_01UabGxp5WKpecR7S9fXGKAP",
    "sessionUpdate":"tool_call_update"}}}

// Separate tool_call_update with the output file path (same toolCallId)
{"jsonrpc":"2.0","method":"session/update","params":{
  "sessionId":"dc75a7b7-…",
  "update":{
    "_meta":{"terminal_output":{
      "terminal_id":"toolu_01UabGxp5WKpecR7S9fXGKAP",
      "data":"Command running in background with ID: bsmr1wpsf. Output is being written to: /private/tmp/claude-504/…/tasks/bsmr1wpsf.output"}},
    "toolCallId":"toolu_01UabGxp5WKpecR7S9fXGKAP",
    "sessionUpdate":"tool_call_update"}}}

The turn ends normally (end_turn). The background task is still running. Note: task_notification, task_started, and local_command_output never appear in traffic.eld — they are purely SDK-internal.

Step 2: User sends their next prompt

;; ACP wire (log.txt:6610) — user's actual question
{"jsonrpc":"2.0","method":"session/prompt","id":4,
 "params":{"sessionId":"dc75a7b7-…",
  "prompt":[{"type":"text","text":"this should be very little added code, right?
    everything about the subcommand parsing, selectors, targets, etc.
    should all be shared code between plan and init?"}]}}

Step 3: Adapter consumes stale internal turn (Layer 1 — working)

;; claude-agent-acp STDERR (log.txt:6615,6619)
Session dc75a7b7-…: consuming background task result
Session dc75a7b7-…: consuming background task result

The backgroundInitPending mechanism detects the stale init → result cycle and consumes it. Layer 1 is working.

Step 4: Model responds — but to the wrong content (Layer 2 — broken)

Despite Layer 1 consuming the iterator-level leak, the SDK has already injected <task-notification> and system content into the LLM's conversation history as user-role messages. The model's thinking stream reveals it:

;; ACP wire (log.txt:6623-6696) — agent_thought_chunk stream
"The user hasn't said anything - these are just system notifications
 about linter changes to BUILD.bazel and terraform.go (reordering
 the srcs list alphabetically), and a background task completing.
 The linter changes look fine - just alphabetical sorting of the
 srcs list. Nothing for me to do here unless the user asks something.

 Wait, I should check if the user is actually prompting me or if
 this is just notifications. Looking at the message, there's no
 user text - just system reminders and a task notification."

The model explicitly says "the user hasn't said anything" and "there's no user text" — even though the user's question is present in the session/prompt request. The SDK-injected content dominates the context.

Step 5: Visible response addresses linter output, not the user's question

;; ACP wire (log.txt:7106-7146) — agent_message_chunk stream
"Linter reordered the `srcs` list in BUILD.bazel alphabetically
 — looks correct. Everything checks out."

The model responds to the linter/system content and end_turns. The user's actual question about shared code between plan and init is never addressed.

What the user sees in agent-shell

╭─ Agent ─────────────────────────────────────────────────────────╮
│ Full review running in background. I'll share results when it   │
│ completes.                                                      │
╰─────────────────────────────────────────────────────────────────╯

> this should be very little added code, right? everything about
> the subcommand parsing, selectors, targets, etc. should all be
> shared code between plan and init?

╭─ Agent ─────────────────────────────────────────────────────────╮
│ Linter reordered the `srcs` list in BUILD.bazel alphabetically  │
│ — looks correct. Everything checks out.                         │
╰─────────────────────────────────────────────────────────────────╯

> this should be very little added code, right? everything about
> the subcommand parsing, selectors, targets, etc. should all be
> shared code between plan and init?

╭─ Agent ─────────────────────────────────────────────────────────╮
│ Yes — the `plan` subcommand reuses the existing selector,       │
│ target, and output infrastructure. The new code is mostly…      │
╰─────────────────────────────────────────────────────────────────╯

The user must send the identical message twice. The first attempt is consumed by the model responding to SDK-injected system content. Only the second attempt gets a real answer.

Observed in this investigation session

This exact pattern reproduced live during the investigation. Three <task-notification> messages from subagent find commands leaked into the conversation. The [bg-task-leak] warning fired:

[bg-task-leak] result received with 1 unresolved background task(s) [b4oxxatfz]
  but no task_notification in queue — possible internal turn leak into next prompt

The user's question about the [bg-task-leak] warning was swallowed — the model responded to the stale <task-notification> instead. The user had to resend.

Root Cause Analysis

Bug 1: PostToolUse hook timing

The SDK fires hooks synchronously from its tool execution path, but streaming events that trigger registerHookCallback() are consumed asynchronously from the query iterator. For fast tools, the hook fires before the content_block_start event is processed (~42ms gap). For subagent child tools, the gap is 10-60+ seconds because messages are relayed only when the subagent finishes.

Bug 2: SDK internal turns from background tasks

Reading the minified SDK source revealed the root cause: the SDK's iP() function (minified; likely hasRunningDeferrableTasks() or shouldDeferResult() based on usage) checks whether to defer the result message for running background tasks, but only includes local_agent tasks — NOT local_bash tasks. Similarly, the internal queue class q4 (likely InputStreamQueue or MessageBuffer) is where we peek for buffered task_notification messages. When a background Bash task completes after the turn's result, the SDK emits:

[148] result/success         <- prompt() returns here (premature)
[149] system/task_notification <- bg task completed
[150] system/init            <- internal turn starts
[151-167] stream_event deltas <- model responds to task completion
[168] assistant message
[172] result/success         <- internal turn ends (leaked to next prompt)

Verified: the Claude TUI hides this by rendering internal turns inline, but ACP clients can't — they see the stale output on the next prompt.

Wire-level evidence: task_notification, task_started, and local_command_output never appear in ACP wire traffic (traffic.eld captures from agent-shell debug sessions). These are purely SDK-internal messages that the adapter sees through the query iterator but that never cross the ACP protocol boundary. This distinction matters for debugging: ACP clients cannot observe or intercept these messages — only the adapter can.

Independence Proof

Empirically verified both fixes are independently necessary:

Reverted fire-and-stash -> 10 PostToolUse tests fail, bg-task-leak tests pass
Reverted internal turn fix -> bg-task-leak tests fail, PostToolUse tests pass

Fix 1: Non-blocking fire-and-stash (tools.ts)

When the hook fires before registration:

Stash { toolInput, toolResponse } and return { continue: true } immediately
When registerHookCallback() runs later, find stash, execute callback, clean up
Periodic sweep (60s, 5min TTL, unref()) cleans orphaned entries

Handles both the 42ms race and 10-60s subagent delay with zero blocking.

Fix 2: Indefinite poll loop for internal turn consumption (acp-agent.ts)

After receiving a result/success, the adapter keeps the turn open indefinitely until all background tasks resolve or the user cancels. This fixes both Layer 1 (iterator-level internal turns are consumed before prompt() returns) and Layer 2 (no stale notifications exist to inject into the next turn's context). It also aligns with the turn-based nature of ACP — the agent shouldn't appear idle while background work is pending. The competing Agent Communication Protocol (agentcommunicationprotocol.dev) validates this design: runs stay in-progress until terminal state, with no concept of "done but also not done."

Secondary benefit: agent-shell rendering continuity

Keeping the turn open also fixes the "idle then flood" rendering behavior observed in agent-shell. Without this fix, the old sequence was: (1) agent finishes main work, prompt() returns end_turn, (2) background task continues running, (3) agent-shell sees end_turn and stops rendering agent output, (4) background task output arrives but agent-shell doesn't render it (UI appears idle), (5) user sends next message, (6) agent-shell starts rendering again and floods all buffered content at once. With the indefinite poll loop, the turn stays open while background tasks run, so agent-shell continues to render sessionUpdate notifications (tool_call_update, agent_message_chunk) live. This was confirmed by Codex analysis and by the E2E poll harness — Turn A held open ~43s while the background task ran, with output activity logged throughout. The same edge-case gaps apply: error/cancellation paths can still cause premature end_turn (see Known limitations).

Design evolution

The fix went through several iterations based on review feedback:

v1: Peek once with setTimeout(0) — yielded one macrotask tick, peeked at queue[0]. Race condition: task_notification could arrive later.
v2: Poll loop with 30s inactivity timeout — polled every 1s for up to 30s of inactivity. File growth reset the timer. Risk: timeout could fire before slow tasks complete, reintroducing contamination.
v3 (current): Indefinite poll, no timeout — the turn stays open until task_notification arrives or the user cancels. This is the only design that fully prevents contamination.

How it works

Track local_bash background tasks via task_started messages (Map with outputPath, taskType, toolUseId, firstSeenAt, lastActivityAt). Ignore local_agent — SDK handles those via its internal iP() check.
Cache output paths from terminal_output even before task_started arrives (earlyOutputPaths Map), to handle SDK message ordering variations.
After result/success, if pendingTaskIds is non-empty, enter the poll loop:
- Poll every 1s
- Check cancellation each iteration (returns { stopReason: "cancelled" })
- Scan the full SDK internal queue (inputStream.queue) for any task_notification — not just queue[0], to handle other system messages queued before it
- Monitor output file growth via fsp.stat() as a per-task heartbeat
- Log aggregated task summary immediately on entry and every 30s, including per-task inactiveFor and a warning: "cancellation risks later prompt contamination"
- Log per-task output activity on file size changes
When task_notification is found in queue, save the prompt response and continue the outer message loop to consume the internal turn.

Key design decisions

No timeout: A timeout that fires before task_notification arrives reintroduces the exact contamination bug we're fixing — and there is no recovery path. Once the turn ends with tasks pending, the SDK will inject <task-notification> into the next turn's conversation context as user-role messages (Layer 2). The adapter cannot intercept or filter SDK-level context injection — it only controls what it pushes via session.input.push(), not what the SDK adds alongside it. A "pump prompt" strategy (sending a synthetic prompt to consume the notification) fails for the same reason: the pump prompt text faces the same contamination as a real user prompt. The only clean options are: wait for the task (current design), or kill the task on cancel. The user can always cancel.
Queue scanning: We scan queueArr.some(...) instead of just queue[0] because the SDK may queue other system messages (e.g., init) before the task_notification.
Output path caching: The earlyOutputPaths Map handles the case where terminal_output (with the background task ID and output file path) arrives before task_started. When task_started fires, it merges the cached path.
Only result/success path: The poll loop only runs for successful results. Error paths (max_tokens, error_during_execution, etc.) return/throw immediately — this is a documented known limitation. The internal turn still leaks for error results, but this is rare and less impactful.

Known limitations

Error result paths don't drain: If the main result is max_tokens or any error_* variant, the function returns/throws immediately without waiting for background tasks. Internal turns can still leak in these cases. Documented in the error_during_execution test.
SDK internal access: Uses inputStream.queue which is not a public API. The task_type === "local_bash" filter relies on an untyped SDK field verified by reading minified source. Both are documented as workarounds pending upstream SDK changes.
Indefinite wait risk: If task_notification never arrives (hung task, SDK bug), prompt() blocks forever. The 30s progress logs make this visible, and cancellation is the escape hatch.
local_command_output forwarding (Layer 3): The adapter forwards local_command_output system messages as agent_message_chunk to ACP clients. This causes linter/hook output to appear as visible agent text unrelated to the user's question. This is a separate contamination vector not addressed by this PR — it's independent of background tasks and would need its own fix (gating or reclassifying the output).

Changes

Source

src/tools.ts: Fire-and-stash mechanism replacing blocking Promise.race
src/acp-agent.ts: Indefinite poll loop for bg task internal turn consumption, structured logging, earlyOutputPaths ordering fix, pendingTaskIds Map with rich metadata (taskType, toolUseId, firstSeenAt, lastActivityAt)
src/embed.d.ts: Type declaration shim for single-file bun build module

Tests (159 pass, 6 skipped, 0 flaky)

src/tests/bg-task-leak.test.ts (16 tests): Full SDK message sequence from real trace data
- Sync/async internal turn consumption
- Cross-prompt stale message isolation
- local_agent vs local_bash filtering
- Aggregated log fires immediately + every 30s (escaped by cancellation)
- Terminal statuses (completed/failed/stopped)
- error_during_execution known limitation
- Multiple back-to-back internal turns
- Task_notification arriving during poll (500ms, 5s delays)
- Cancellation during poll loop returns immediately
- Queue scan detects task_notification behind other messages
src/tests/tools.test.ts (10 new tests): Fire-and-stash contract
- Happy path, 42ms race, subagent delay, batch, error handling
- All deterministic (fake timers + microtask flushing, zero real delays)

Infrastructure

bin/test: Local CI mirror — parses .github/workflows/ci.yml with yq
src/tests/authorization.test.ts: Type fix for auth extension property
Build fixes: SDK import path correction, embed module declaration

E2E Verification

Standalone harness (x.bg-task-leak-harness.mjs) spawns real ACP agent, launches background Bash tasks via subagents, and validates Turn B responds to "yes" (not stale background output):

Pre-fix: 3/3 leak detected — Turn B returned in 1ms with "The background task completed... still waiting on your answer"
Post-fix: 3/3 clean — Turn B returned in 9-19s with file creation tool calls

Codex Review Summary

5 rounds of automated review (Codex CLI --sandbox read-only). Final findings:

Severity	Finding	Status
High	Indefinite poll loop can block forever	Intentional — timeout reintroduces the bug. Cancellation is the escape. 30s progress logs provide visibility.
Medium	Error/max_tokens paths don't drain internal turns	Known limitation, documented in test. Follow-up.
Low	Stale `toolUseCallbacks` sweep comment (tools.ts)	Pre-existing, outside our diff.

Risk

Low for fire-and-stash: Happy path unchanged. Stash path replaces 5s block with immediate return.

Medium for internal turn fix: Accesses SDK internals (inputStream.queue) not in public API. The task_type === "local_bash" filter relies on an untyped SDK field verified by reading the minified SDK source. Both are documented as workarounds pending upstream SDK changes. The indefinite poll loop is a deliberate tradeoff: it prevents prompt contamination at the cost of requiring cancellation if a task never resolves. This tradeoff is forced — no timeout-based alternative exists that doesn't reintroduce contamination, because there is no recovery mechanism once the turn ends with pending tasks (the SDK's context injection cannot be intercepted by the adapter).

Test plan

npm run lint + npm run test:run — 159 tests pass, 0 lint errors
E2E harness: 3/3 iterations clean (background task consumed during Turn A)
5 rounds of Codex review — all findings addressed or documented as known limitations
Fire-and-stash independence empirically proven (revert -> 10 tests fail)
Manual: verify no [hook-trace] errors in agent-shell during subagent workloads
Manual: verify [bg-task-poll] logs appear in stderr during background task waits
E2E: long-running background task poll loop harness (tracked as claude-code-acp-uzw) — verified ~35s background task triggers poll loop, logs output file activity and aggregated summaries to STDERR, consumes task_notification cleanly, Turn B responds to "yes" without contamination (results in x.poll-harness-results.txt)
Upstream: file SDK issue for local_bash result deferral

benbrandt · 2026-03-03T13:44:07Z

@timvisher-dd how did you testing go/

timvisher-dd · 2026-03-03T15:48:47Z

@timvisher-dd how did you testing go/

Still working through it! Do you mind if I leave this open in Draft?

benbrandt · 2026-03-03T19:38:43Z

All good, thanks for testing!

timvisher-dd · 2026-03-05T02:30:07Z

@benbrandt I'd like to do a bit of manual testing tomorrow with this change but I'm confident that it's directionally correct. Does that seem right to you?

timvisher-dd · 2026-03-06T17:03:42Z

@benbrandt This is feeling pretty good to me. Definitely give it a critical eye because it's a whack-a-mole style problem but I've been running it all day in this state and haven't seen a spurious Notice or odd shell behavior from agent-shell.

timvisher-dd · 2026-03-10T13:15:24Z

Actually I guess I take that back again. Even on my dev integration branches I still see problems with this every now and then. I'm clearly not understanding the full scope of the interactions. I'll drop this back to draft because it's helpful to remind me to keep pressing on it it but if y'all'd rather it be closed for now I'd be fine with that, too. Sorry for the noise here. (－‸ლ)

timvisher-dd · 2026-03-10T14:39:50Z

Actually I guess I take that back again. Even on my dev integration branches I still see problems with this every now and then. I'm clearly not understanding the full scope of the interactions. I'll drop this back to draft because it's helpful to remind me to keep pressing on it it but if y'all'd rather it be closed for now I'd be fine with that, too. Sorry for the noise here. (－‸ლ)

LOL globdammit actually I think the issue was that my local integration branch wasn't including this fix this morning. (ﾉಥ益ಥ）ﾉ ┻━┻

timvisher-dd · 2026-03-12T23:35:03Z

@benbrandt The latest code here is behaving promisingly. I think I may finally have cracked something. xD

…g prompt When a background Bash task (run_in_background) completes, the SDK emits a task_notification followed by an internal model turn after the initial result message. prompt() was returning at the first result, leaving the internal turn in the iterator buffer. The next prompt() call would then process stale background task output instead of the user's actual input. Track pending background tasks via task_started messages and peek at the SDK's internal queue after receiving a result. If a task_notification is queued, defer returning and continue consuming the internal turn. This is a pragmatic workaround for the SDK not deferring local_bash background tasks the way it defers local_agent tasks (upstream issue to be filed). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…loop yield Two issues found during E2E harness testing: 1. Only track local_bash tasks in pendingTaskIds — the SDK already defers results for local_agent tasks, so Agent subagent task_started events were inflating the count and the notification never arrives via task_notification (agents complete through the tool flow). 2. Yield to the event loop (setTimeout(0)) before peeking the SDK queue. The task_notification is enqueued asynchronously after the result message; peeking synchronously missed it every time. The yield gives the SDK a tick to process the background task completion. E2E harness: 3/3 iterations now consume the internal turn during Turn A (verified by bg task completion text appearing in Turn A's response and Turn B correctly acting on "yes" by creating summary files). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Two gaps found during E2E harness testing: 1. local_agent task_started must not trigger internal turn detection — the SDK already handles agent task deferral. Without this test, agent task IDs inflated pendingTaskIds and were never resolved. 2. task_notification arriving during the setTimeout(0) yield must still be consumed. The mock simulates the real-world race by pushing internal turn messages into the queue asynchronously. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Codex review flagged that the comment said "microtask yield" but setTimeout(0) is a macrotask. Updated to accurately describe the timing semantics and document the best-effort nature. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Two unit test gaps identified from harness/codex analysis: 1. Warning path: when task_notification never arrives within the setTimeout(0) yield, prompt() should return (not hang) and log a warning with the unresolved task IDs. 2. Failed task_notification: status "failed" should clear pendingTaskIds just like "completed", preventing false-positive internal turn detection on subsequent results. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…ack-to-back internal turns Three remaining gaps from codex review: 1. stopped task_notification status clears pendingTaskIds (same as failed/completed — verifies all terminal statuses work) 2. error_during_execution result with pending bg tasks does NOT drain internal turns (documents known limitation — drain only runs for result/success) 3. Multiple back-to-back background task completions producing two consecutive internal turns are both consumed in a single prompt() call Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…shing Real setTimeout delays (5-50ms) in tests are flaky under CI load and add unnecessary wall-clock time. Replace with: - tools.test.ts: flushMicrotasks() helper that chains Promise.resolve() calls to flush the .then() microtask queue deterministically - bg-task-leak.test.ts: vi.useFakeTimers() + vi.advanceTimersByTimeAsync(0) for the async task_notification test that needs both the production setTimeout(0) yield and the test's deferred push to fire Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…d document depth Two improvements from codex review of the timer fix: 1. Add Promise.resolve() flush before vi.advanceTimersByTimeAsync(0) to ensure the production setTimeout(0) is registered before advancing 2. Document the flushMicrotasks depth (5 iterations) with rationale Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- tools.test.ts: replace unused vi import with afterEach import - bg-task-leak.test.ts: remove unused updates destructuring - bin/test: new script that runs the same checks as CI (format, lint, build, test) against git-tracked files only Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The test helpers (makeResultMessage, makeNormalTurnMessages, makeBgTaskInternalTurnMessages) return untyped SDK message shapes that get spliced together with task_started/task_notification objects. TypeScript inferred strict element types from the literal returns, causing tsc errors when splice inserted objects with different shapes. Explicitly typing returns as any/any[] matches how the mock Query already uses any[] and avoids 11 spurious tsc errors. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Uses yq to extract run steps from .github/workflows/ci.yml so the script stays in sync automatically if CI changes. Skips npm ci (deps already installed locally) and overrides format:check to use git-tracked files only (matching CI's clean checkout behavior). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Three categories of build errors fixed: 1. @anthropic-ai/claude-code/sdk-tools.js: module moved to @anthropic-ai/claude-agent-sdk/sdk-tools.js — update import path 2. ClientCapabilities.auth: ACP extension property not yet in the type definition — cast to any for the two access sites 3. @anthropic-ai/claude-agent-sdk/embed: module only exists in single-file bun builds — add src/embed.d.ts type declaration shim Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

iP() → likely hasRunningDeferrableTasks q4 → likely InputStreamQueue or MessageBuffer Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The SDK ships sdk-tools.d.ts but doesn't export ./sdk-tools in its package.json exports map. With a stale local node_modules (0.2.68), tsc resolved the .js import via the loose .d.ts file. After npm ci installs 0.2.71 (matching package.json), tsc can't resolve the path. Add a declare module shim with a triple-slash reference to the SDK's sdk-tools.d.ts so tsc can resolve the import regardless of the SDK's exports configuration. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

… npm ci The previous version skipped npm ci entirely, assuming local deps were correct. This let a stale SDK version (0.2.68 vs 0.2.71) mask a build error that CI caught. Now bin/test checks if node_modules matches package-lock.json (via npm ls) and runs npm ci if out of sync. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The bg-task-leak tests hung because the mock query never produced a user replay message. In production, session.input.push() causes the SDK to replay the user message through the query iterator, setting promptReplayed = true before the result arrives. Without this, every result was consumed as a "background task result" and the loop blocked forever on the next query.next(). Patch createAgentWithSession so input.push() splices a replay message into the mock queue right before the first result, matching SDK behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

…ection promptReplayed depended on the SDK replaying the user message through the query iterator before the result arrived. When the SDK omits or delays the replay, promptReplayed stays false and every result is consumed as a "background task result" — causing the real response to be discarded. Replace with backgroundInitPending: set true on system/init, cleared by any real activity (stream_events, assistant messages, compact_boundary, local_command_output, user replay). If still true when result arrives, it's a background task (init→result with no intervening activity). This matches the approach from timvisher/fix-prompt-replay-hang (d21169d) and doesn't depend on user message replay timing. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The previous approach yielded once with setTimeout(0) and peeked at queue[0] — a race where task_notification hadn't landed yet caused the model to respond to stale background task output on the next prompt. Replace with a poll loop (1s interval, 30s inactivity timeout) that: - Scans the full SDK queue for task_notification (not just [0]) - Monitors output file growth via fsp.stat as a heartbeat - Checks session.cancelled each iteration for prompt responsiveness - Caches output paths from terminal_output even if they arrive before task_started (earlyOutputPaths Map) Changes pendingTaskIds from Set<string> to Map<string, {outputPath}> to carry the output file path per task. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Update delayed-notification test for 1s poll interval (was setTimeout(0)) - Update never-arrives test to use fake timers past 30s timeout - Add test: task_notification arriving mid-poll (5s delay) - Add test: cancellation during poll loop returns immediately - Add test: queue scan detects task_notification behind other messages - Add test: poll loop with fs.stat mock (timeout path) - Add vi.mock("node:fs/promises") for controllable stat behavior Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Extend pendingTaskIds entry with taskType, toolUseId, firstSeenAt, and lastActivityAt to carry richer context for diagnostics. Add three log points (stderr only, no ACP sessionUpdate): - [bg-task-poll] entering: fires immediately on poll loop entry - [bg-task-poll] waiting: fires every 30s with inactiveFor/remaining - [bg-task-poll] output activity: fires on each file size change - [bg-task-timeout]: now per-task with full structured fields All logs are throttled to avoid spam. Cancellation still short-circuits before any logging. No behavior changes. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

The 30s inactivity timeout was a compromise that could still leak internal turns into the next prompt if it fired before task_notification arrived. Remove it entirely — the poll loop now runs indefinitely (`while (0 < pendingTaskIds.size)`) until all background tasks resolve or the user cancels. Logging: - Single aggregated line on entry and every 30s with all pending tasks - Each task shows task_id, task_type, tool_use_id, outputPath, inactiveFor - Line warns: "cancellation risks later prompt contamination" - Per-task output activity log retained Tests: - Remove timeout test (would hang without timeout exit) - Remove fs.stat mock test (relied on timeout exit) - Remove unused vi.mock("node:fs/promises") infrastructure - Add test verifying aggregated log fires immediately + at 30s, using cancellation as escape hatch Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

When a cancel arrived while the SDK was emitting background task init → result sequences, backgroundInitPending consumed the result and looped back to await session.query.next() — but the SDK had already been cancelled and would never produce "our" result. The prompt hung forever, leaving the shell unrecoverable. Move session.cancelled check before backgroundInitPending so cancellation is detected immediately on any result message. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Document secondary benefit: keeping the turn open prevents the idle-then-flood rendering behavior in agent-shell - Check off E2E poll harness test plan item with results Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Covers the scenario where cancellation arrives while the prompt loop is consuming background task init → result cycles. Before the fix (0516bd8), session.cancelled was checked after backgroundInitPending, so the loop re-entered await session.query.next() on a cancelled SDK and hung forever. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

pr.md is a local draft file used by the timvisher_gh workflow and should not be checked in. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

cla-bot bot added the cla-signed label Feb 27, 2026

timvisher-dd mentioned this pull request Feb 27, 2026

Stale session/update notifications leak into next prompt turn xenodium/agent-shell#345

Closed

timvisher-dd marked this pull request as ready for review February 27, 2026 01:16

timvisher-dd marked this pull request as draft February 27, 2026 04:25

xenodium mentioned this pull request Feb 27, 2026

[Bug] Tool calls and messages being delayed until I send new prompts xenodium/agent-shell#201

Closed

ElleNajt mentioned this pull request Feb 27, 2026

fix: drain background task turns before processing prompt response #359

Closed

xenodium mentioned this pull request Mar 3, 2026

session/load support is currently broken xenodium/agent-shell#361

Closed

5 tasks

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch 2 times, most recently from 8068985 to 07a7bf1 Compare March 5, 2026 02:29

timvisher-dd changed the title ~~fix: await pending PostToolUse hooks before prompt() returns~~ fix: suppress expected PostToolUse errors for subagent and server tools Mar 5, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from 07a7bf1 to 5ed2d93 Compare March 5, 2026 16:46

timvisher-dd changed the title ~~fix: suppress expected PostToolUse errors for subagent and server tools~~ fix: defer PostToolUse hook execution when callback not yet registered Mar 5, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch 4 times, most recently from a9aa618 to fa6761d Compare March 6, 2026 14:00

timvisher-dd marked this pull request as ready for review March 6, 2026 17:02

timvisher-dd marked this pull request as draft March 10, 2026 13:15

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from fa6761d to c1746d8 Compare March 10, 2026 14:40

timvisher-dd changed the title ~~fix: defer PostToolUse hook execution when callback not yet registered~~ fix: replace blocking PostToolUse hook deferral with non-blocking fire-and-stash model Mar 11, 2026

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from 17ec70d to 04c4e76 Compare March 12, 2026 17:51

timvisher-dd changed the title ~~fix: replace blocking PostToolUse hook deferral with non-blocking fire-and-stash model~~ fix: stale PostToolUse notifications and background task output leak Mar 12, 2026

timvisher-dd and others added 29 commits March 23, 2026 08:52

style: fix prettier formatting in test files

7884fdf

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

fix: improve bin/test comment explaining git-tracked-only format check

954d96f

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

docs: annotate minified SDK names with likely original names

00c9470

iP() → likely hasRunningDeferrableTasks q4 → likely InputStreamQueue or MessageBuffer Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

style: format bg-task-leak test with prettier

4ee2b23

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

chore: remove pr.md from tracking

8088cb2

pr.md is a local draft file used by the timvisher_gh workflow and should not be checked in. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

fix: remove duplicate imports in tools.ts introduced by rebase

bbc0f2c

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

timvisher-dd force-pushed the fix/stale-post-tool-hook-notifications branch from fead2be to bbc0f2c Compare March 23, 2026 12:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: stale PostToolUse notifications and background task output leak#353

fix: stale PostToolUse notifications and background task output leak#353
timvisher-dd wants to merge 30 commits intoagentclientprotocol:mainfrom
timvisher-dd:fix/stale-post-tool-hook-notifications

timvisher-dd commented Feb 27, 2026 •

edited

Loading

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 3, 2026

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 5, 2026

Uh oh!

timvisher-dd commented Mar 6, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timvisher-dd commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Three contamination layers

Why the user's prompt gets "swallowed"

Step 1: Background task launched during previous turn

Step 2: User sends their next prompt

Step 3: Adapter consumes stale internal turn (Layer 1 — working)

Step 4: Model responds — but to the wrong content (Layer 2 — broken)

Step 5: Visible response addresses linter output, not the user's question

What the user sees in agent-shell

Observed in this investigation session

Root Cause Analysis

Bug 1: PostToolUse hook timing

Bug 2: SDK internal turns from background tasks

Independence Proof

Fix 1: Non-blocking fire-and-stash (tools.ts)

Fix 2: Indefinite poll loop for internal turn consumption (acp-agent.ts)

Secondary benefit: agent-shell rendering continuity

Design evolution

How it works

Key design decisions

Known limitations

Changes

Source

Tests (159 pass, 6 skipped, 0 flaky)

Infrastructure

E2E Verification

Codex Review Summary

Risk

Test plan

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 3, 2026

Uh oh!

benbrandt commented Mar 3, 2026

Uh oh!

timvisher-dd commented Mar 5, 2026

Uh oh!

timvisher-dd commented Mar 6, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 10, 2026

Uh oh!

timvisher-dd commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

timvisher-dd commented Feb 27, 2026 •

edited

Loading