fix(poll-loop): suppress duplicate text when send_message fires mid-turn #2531
cfis wants to merge 1 commit into
|
cherrypicking this. thanks! |
|
one issue i quickly ran into: with the boolean flag, it drops subsequent messages even if they are different. e.g. agent said "on it - looking at your calendar" mid-turn, the final response was the availability from the calendar but it never delivered that because turn_send_invoked was set to true by the mid-turn message. i'm mitigating this by comparing the contents of the mid-turn message to the subsequent responses... and only skipping when they are verbatim. there may be a more comprehensive solution that compares and drops based on similarities as well (not just verbatim). |
|
@taslim great catch — you're right, the boolean approach drops legitimate distinct content. I just pushed a revised version that does per-payload verbatim matching instead. What changed:
Your calendar scenario now flows correctly: Verified live on a running install:
Known limitation: paraphrased duplicates aren't caught (e.g. tool sends "X"; SDK closing text says "I've sent X to the channel"). I considered similarity scoring but it's fragile and the failure mode (suppressing real content) is much worse than the success mode (catching a paraphrased duplicate). Verbatim-only is the narrower correct gate; better to ship the occasional paraphrased duplicate than to drop a real answer. Open to revisiting if you have a less fragile approach in mind. |
ae46adc to f391776
|
@cfis i agree that similarity comparison is fragile. tried and dropped it as well. verbatim-only makes sense. small normalization tweak you might want: also - your note about not unit testing this because of the SDK seam tracked when it was the boolean version. with per-payload comparison, dispatchResultText is just string matching - no SDK needed. wrote some tests on my fork - happy to push to your branch if you want. |
When an agent calls send_message or send_file as an MCP tool mid-turn,
the Claude SDK still emits a closing-text result event afterward. If
that result's <message to="..."> block body is a verbatim repeat of
what the tool already shipped, nanoclaw was delivering it as a second
chat row — the user saw the tool's message followed by a duplicate.
Fix: record each text payload that send_message / send_file delivers
into turn_sent_payloads (a JSON-encoded array in session_state).
dispatchResultText filters parsed <message to="name">body</message>
blocks against that set; bodies that are a verbatim match for an
already-sent payload are skipped with a log line. Distinct content
the agent emits as part of the same final result (e.g. a progress
update via send_message followed by the actual answer in the result)
flows through normally.
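
The filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `parseMessageBlocks`, `filterAlreadySent`, and the regular expression are assumptions made for the example, and the real dispatchResultText may parse blocks and log skips differently.

```typescript
interface MessageBlock {
  to: string;
  body: string;
}

// Extract <message to="name">body</message> blocks from a result text.
// The pattern is an assumption for illustration; it does not handle nesting.
function parseMessageBlocks(resultText: string): MessageBlock[] {
  const pattern = /<message to="([^"]*)">([\s\S]*?)<\/message>/g;
  const blocks: MessageBlock[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(resultText)) !== null) {
    blocks.push({ to: match[1] ?? "", body: match[2] ?? "" });
  }
  return blocks;
}

// Split blocks into those to deliver and those whose body is a verbatim
// match for a payload the tool already sent this turn.
function filterAlreadySent(
  blocks: MessageBlock[],
  sentPayloads: string[],
): { deliver: MessageBlock[]; skipped: MessageBlock[] } {
  const sent = new Set(sentPayloads);
  const deliver: MessageBlock[] = [];
  const skipped: MessageBlock[] = [];
  for (const block of blocks) {
    (sent.has(block.body) ? skipped : deliver).push(block);
  }
  return { deliver, skipped };
}

// Sample input: the tool already sent "X"; the result repeats it and adds an answer.
const sampleResult =
  '<message to="cli">X</message>\n<message to="cli">The answer is 42</message>';
const sampleOutcome = filterAlreadySent(parseMessageBlocks(sampleResult), ["X"]);
```

Only exact string equality counts as a duplicate here, which mirrors the verbatim-only gate; a caller would log each entry in `skipped` rather than dispatch it.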
Per-payload comparison is deliberate. An earlier iteration used a
boolean "did send_message fire this turn?" flag and suppressed the
entire result text on any send — but that conflates "the tool was
called" with "everything after it is a duplicate." It silently
dropped legitimate distinct content: send_message("looking it up") +
result with the calendar availability → user never saw the answer.
The verbatim-match check is the narrower correct gate. Paraphrased
duplicates aren't caught; that's the deliberate tradeoff. Better to
ship an occasional paraphrased duplicate than to suppress a real
answer.
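
To make the difference concrete, here is a small side-by-side sketch of the two gates on the calendar scenario. Both functions are hypothetical reconstructions for illustration (`booleanGate` and `perPayloadGate` are not names from the codebase), and the availability text is invented sample data.

```typescript
type ResultBlock = { to: string; body: string };

// Earlier approach (reconstructed): one flag for the whole turn.
// Any send suppresses everything in the result.
function booleanGate(blocks: ResultBlock[], turnSendInvoked: boolean): ResultBlock[] {
  return turnSendInvoked ? [] : blocks;
}

// Revised approach: drop only bodies identical to something already sent.
function perPayloadGate(blocks: ResultBlock[], sentPayloads: string[]): ResultBlock[] {
  const sent = new Set(sentPayloads);
  return blocks.filter((block) => !sent.has(block.body));
}

// Mid-turn progress update, followed by a distinct final answer.
const sentThisTurn = ["looking it up"];
const finalBlocks: ResultBlock[] = [{ to: "cli", body: "You are free Tuesday at 3pm" }];

const booleanOutcome = booleanGate(finalBlocks, sentThisTurn.length > 0);
const perPayloadOutcome = perPayloadGate(finalBlocks, sentThisTurn);
```

With the boolean gate the final answer is dropped because a send happened at all; with the per-payload gate it is kept because its body differs from the progress update.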
add_reaction does not record a payload (reaction + closing summary is
a valid reply pattern).
turn_sent_payloads lives in SQLite session_state, not a module
variable. The nanoclaw MCP server (which owns send_message and
send_file) runs in a separate process from the poll-loop: index.ts
spawns it via `bun run mcp-tools/index.ts` and the SDK connects over
stdio. SQLite is the only IPC channel between the two processes;
module-level state is invisible across the boundary, so a module
variable would leave the suppression as dead code.
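
A rough sketch of how a JSON-encoded array under one session_state key could be read and appended to is shown below. The `SessionState` interface and the in-memory `Map` are stand-ins assumed for the example so it runs without a database; the helper names are hypothetical, and the real code reads and writes SQLite.

```typescript
// Hypothetical stand-in for the session_state table: a string key/value store.
interface SessionState {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

const PAYLOADS_KEY = "turn_sent_payloads";

// Decode the stored JSON array; a missing or corrupt value means "nothing sent".
function readSentPayloads(state: SessionState): string[] {
  const raw = state.get(PAYLOADS_KEY);
  if (raw === undefined) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((p): p is string => typeof p === "string")
      : [];
  } catch {
    return [];
  }
}

// Append one payload. This read-modify-write is not atomic; with a real
// database shared by two processes it would belong inside a transaction.
function recordSentPayload(state: SessionState, text: string): void {
  const payloads = readSentPayloads(state);
  payloads.push(text);
  state.set(PAYLOADS_KEY, JSON.stringify(payloads));
}

function clearSentPayloads(state: SessionState): void {
  state.set(PAYLOADS_KEY, "[]");
}

// In-memory store for demonstration only.
const demoState: SessionState = new Map<string, string>();
recordSentPayload(demoState, "on it");
recordSentPayload(demoState, "report.pdf attached");
```

In this picture the MCP server process would call `recordSentPayload` when a send succeeds, and the poll-loop process would call `readSentPayloads` before dispatching result text.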
Cleared at four boundaries: top of runPollLoop() (handles SIGKILL
mid-turn; stale payloads would otherwise survive into the next
container's first turn and suppress legitimate messages that happen
to match), in the
stale-session-invalid catch (so retry paths don't inherit the failed
attempt's payloads), in the outer finally (defense in depth at turn
boundary), and after each result event in processQuery (so follow-up
messages pushed into the same open query stream don't carry payloads
from a prior result).
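
The four clearing points can be pictured with a simplified, synchronous skeleton. Everything here is an assumption made for illustration: the `Sketch` functions, the `QueryEvent` shape, and the stale-session check are hypothetical, and the real runPollLoop and processQuery are asynchronous and considerably more involved.

```typescript
type QueryEvent = { kind: "result"; text: string } | { kind: "other" };

// Hypothetical check; the real code identifies a stale session differently.
function isStaleSession(err: unknown): boolean {
  return err instanceof Error && err.message === "stale session";
}

function processQuerySketch(
  events: QueryEvent[],
  clear: () => void,
  dispatch: (text: string) => void,
): void {
  for (const queryEvent of events) {
    if (queryEvent.kind === "result") {
      dispatch(queryEvent.text);
      clear(); // boundary 4: follow-ups in the same stream start clean
    }
  }
}

function runPollLoopSketch(
  turns: Array<() => QueryEvent[]>,
  clear: () => void,
  dispatch: (text: string) => void,
): void {
  clear(); // boundary 1: drop payloads left by a process killed mid-turn
  for (const nextTurn of turns) {
    try {
      processQuerySketch(nextTurn(), clear, dispatch);
    } catch (err) {
      if (!isStaleSession(err)) throw err;
      clear(); // boundary 2: a retry must not inherit the failed attempt's payloads
    } finally {
      clear(); // boundary 3: defense in depth at the turn boundary
    }
  }
}

// Demonstration: count clears across one normal turn and one stale-session turn.
let clearCount = 0;
const dispatched: string[] = [];
runPollLoopSketch(
  [
    () => [{ kind: "other" }, { kind: "result", text: "a" }, { kind: "result", text: "b" }],
    () => {
      throw new Error("stale session");
    },
  ],
  () => {
    clearCount += 1;
  },
  (text) => {
    dispatched.push(text);
  },
);
```

Several of these clears overlap on the happy path, which is the point: each one covers a failure mode in which the others would not run.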
Verified live against the real Claude Agent SDK:
- Verbatim match: send_message("X"), result contains
<message to="cli">X</message> → suppression log fires,
outbound has one row (the tool's write).
- Distinct content: send_message("Looking it up"), result contains
<message to="cli">The answer is 42</message> → no suppression
log; the block flows through dispatchResultText normally.
Thanks @taslim for the false-positive catch on the boolean version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f391776 to 6af88f8
|
@taslim thanks - applied both your ideas in 6af88f8. Normalization tweak- Then added 13 tests (
Happy to review any additional ones if I missed test cases you were considering — push directly to |
|
@cfis your tests are actually much better than the ones i had planned. tagging @gavrielc or @gabi-simons to review and help merge this please. |
Type of Change
Description
What. When an agent calls `send_message` or `send_file` as an MCP tool mid-turn, the Claude SDK still emits a closing-text result event afterward. nanoclaw was dispatching that closing text as a second chat row, so the user saw the tool's message followed by a near-duplicate summary from the model wrapping up. Now suppressed.
Why. Two delivery paths for the same intent (the tool wrote it, and then the SDK's closing text described it) felt like a bug from the user's side, especially on chat channels with low information density (Signal, WhatsApp, CLI) where the duplicate is jarring.
How it works.
How it was tested. Live against the real Claude Agent SDK on a running install:
The mock-based unit tests in `integration.test.ts` deliberately don't cover this — the bug is in the seam between the real SDK and the real cross-process MCP server, which mock providers can't reproduce. The 89 existing tests all pass.
Usage. Transparent. Existing `send_message` / `send_file` callers (agents) need no change. The improvement is visible to users as cleaner replies on send-mid-turn flows.
For Skills