
fix: skip ingesting empty error/aborted assistant messages #172

Open
craigamcw wants to merge 1 commit into Martian-Engineering:main from craigamcw:fix/skip-empty-error-messages

craigamcw commented Mar 24, 2026

Summary

  • Ingestion guard: ingestSingle now skips assistant messages where stopReason is "error" or "aborted" and content is empty ([], "", null). Messages with partial content before the error are still preserved.
  • Assembly guard (defense-in-depth): resolveMessageItem skips empty assistant messages during context assembly when both the stored content text and message_parts are empty. This catches any previously ingested empty messages without affecting tool-call-only assistant turns.
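The ingestion-side condition can be expressed as a small predicate. The sketch below is illustrative only: `AgentMessage`, `ContentBlock`, and `shouldSkipIngest` are hypothetical names, and the real message types consumed by `ingestSingle` may be shaped differently.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw/LCM types may differ.
type ContentBlock = { type: string; text?: string };

interface AgentMessage {
  role: "user" | "assistant" | "tool";
  stopReason?: string;
  content: ContentBlock[] | string | null;
}

// null, "" and [] all count as "no content".
function isEmptyContent(content: AgentMessage["content"]): boolean {
  return content === null || content.length === 0;
}

// Skip only assistant turns that ended with "error" or "aborted" AND carry
// no content at all; partial output produced before the failure is kept.
function shouldSkipIngest(msg: AgentMessage): boolean {
  if (msg.role !== "assistant") return false;
  if (msg.stopReason !== "error" && msg.stopReason !== "aborted") return false;
  return isEmptyContent(msg.content);
}
```

With this predicate, `{ role: "assistant", stopReason: "error", content: [] }` is skipped, while the same message carrying a partial text block, or an assistant message that stopped normally, is still ingested.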

Problem

When a cloud LLM provider returns a transient 500 error, OpenClaw appends an assistant message with stopReason: "error" and empty content to the session JSONL. LCM ingests these into the database. On retry, the accumulated empty messages are assembled into context, creating a positive feedback loop:

  1. API returns 500 → empty error message appended to session
  2. LCM ingests the empty message into its database
  3. Next turn: assembler includes the empty message in context
  4. API receives increasingly large payload with many empty assistant turns
  5. API continues to fail → more empty messages ingested → repeat
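To make the loop concrete, here is a toy simulation. It does not model LCM or any real provider: `callProvider`, `simulate`, and the failure rule (the first call fails transiently, and any later call fails while the context still holds an empty assistant turn) are assumptions chosen only to reproduce the five steps above.

```typescript
// Toy model of the retry loop; not LCM or provider code.
type Msg = {
  role: "user" | "assistant";
  stopReason?: "stop" | "error";
  content: string;
};

const isEmptyAssistant = (m: Msg): boolean => m.role === "assistant" && m.content === "";

// Hypothetical failure rule: the first call fails transiently, and any later
// call fails while the context still contains an empty assistant turn.
function callProvider(context: Msg[], attempt: number): Msg {
  if (attempt === 0 || context.some(isEmptyAssistant)) {
    return { role: "assistant", stopReason: "error", content: "" };
  }
  return { role: "assistant", stopReason: "stop", content: "ok" };
}

// Runs up to maxAttempts turns. The whole stored history is sent as context
// each time; `guard` decides whether empty error replies are ingested.
function simulate(guard: boolean, maxAttempts: number): { stored: Msg[]; recovered: boolean } {
  const stored: Msg[] = [{ role: "user", content: "hello" }];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const reply = callProvider(stored, attempt);
    const emptyError = reply.stopReason === "error" && reply.content === "";
    if (!(guard && emptyError)) stored.push(reply); // ingestion step
    if (reply.stopReason === "stop") return { stored, recovered: true };
  }
  return { stored, recovered: false };
}

const countEmpty = (msgs: Msg[]): number => msgs.filter(isEmptyAssistant).length;

const unguarded = simulate(false, 5);
const guarded = simulate(true, 5);
console.log(unguarded.recovered, countEmpty(unguarded.stored)); // false 5
console.log(guarded.recovered, countEmpty(guarded.stored)); // true 0
```

Without the guard, every retry stores another empty assistant turn and the loop never exits; with it, the failed turn is never stored and the next request succeeds.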

In production, this manifested as a permanently broken agent: the LCM database had accumulated 175 messages (dozens of them empty or duplicated), assembled alongside a 31KB system prompt and 32 tools, and the cloud model API rejected every request with a 500. The only recovery was manual database surgery.

Test plan

  • New test: skips ingest for assistant messages with stopReason error and empty content. Covers empty array, empty string, aborted, normal messages, and error-with-content: the empty error/aborted cases are skipped, while normal messages and error messages with partial content are still ingested.
  • All 390 existing tests pass (no regressions)
  • Verified in production: agent recovered after clearing corrupted LCM data; fix prevents recurrence

🤖 Generated with Claude Code

When an API call returns a 500 or similar transient error, OpenClaw
appends an assistant message with stopReason "error" and empty content
to the session. LCM ingests these into the database, and on retry the
accumulated empty messages are assembled into context — creating a
positive feedback loop where each retry sends a larger, malformed
payload that continues to fail.

This commit adds two defenses:

1. engine.ts (ingestSingle): Skip assistant messages where stopReason
   is "error" or "aborted" AND content is empty ([], "", null). Messages
   with actual partial content before the error are still preserved.

2. assembler.ts (resolveMessageItem): Defense-in-depth — skip empty
   assistant messages during context assembly when both the stored
   content text and message_parts are empty. This catches any
   previously ingested empty messages without affecting legitimate
   assistant messages that have tool calls (which have empty text
   content but non-empty parts).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
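On the assembly side the check has to look at both the text and the parts, because a tool-call-only assistant turn has empty text but non-empty parts. A minimal sketch, assuming hypothetical `StoredMessage` and `MessagePart` shapes and an `assembleContext` helper; the real LCM schema and the signature of `resolveMessageItem` may differ:

```typescript
// Hypothetical stored-row shapes; the actual LCM schema may differ.
interface MessagePart {
  kind: "text" | "tool_call" | "tool_result";
  payload: string;
}

interface StoredMessage {
  id: number;
  role: "user" | "assistant" | "tool";
  content: string; // stored text content
  parts: MessagePart[]; // rows from a message_parts-style table
}

// True only when an assistant row has neither text nor parts.
// A tool-call-only turn (empty text, non-empty parts) is kept.
function isEmptyAssistantRow(row: StoredMessage): boolean {
  return row.role === "assistant" && row.content === "" && row.parts.length === 0;
}

// Drops empty assistant rows before they become context items.
function assembleContext(rows: StoredMessage[]): StoredMessage[] {
  return rows.filter((row) => !isEmptyAssistantRow(row));
}

const rows: StoredMessage[] = [
  { id: 1, role: "user", content: "run the tests", parts: [] },
  { id: 2, role: "assistant", content: "", parts: [] }, // leftover empty error turn
  { id: 3, role: "assistant", content: "", parts: [{ kind: "tool_call", payload: "{}" }] },
  { id: 4, role: "assistant", content: "Done.", parts: [] },
];

console.log(assembleContext(rows).map((r) => r.id)); // [ 1, 3, 4 ]
```

Keeping the filter at assembly time means databases that already contain empty rows recover without a migration, while the ingestion guard stops new ones from being written.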