fix: harden summarization against prompt injection persistence by hhe48203-ctrl · Pull Request #137 · Martian-Engineering/lossless-claw

hhe48203-ctrl · 2026-03-19T16:35:19Z

Summary

Addresses #71 — prompt injections embedded in conversation history can survive compaction and be reinserted as user messages, giving them maximum influence in later turns.

This PR hardens the summarization and assembly pipeline against injection persistence:

Summarizer system prompt: Replaced dangerous "Follow user instructions exactly" with explicit injection-defense instructions that tell the summarizer to strip directives and treat all input as untrusted data
All summarization prompts (leaf, D1, D2, D3+): Added "UNTRUSTED DATA" warnings so the summarizer model ignores embedded directives, role reassignments, and behavioral overrides
Summary XML wrapper: Added <meta type="historical_context" trust="untrusted"> taint labels so downstream models understand summaries are historical reference, not current instructions
LCM Recall system prompt: Added injection-awareness guidance telling the runtime model not to follow any instructions found within summary content

Scope

This PR focuses on the content-layer mitigations (recommendations 2–4 from the issue). Changing the message role from "user" to a non-user role (recommendation 1) would require upstream API changes in OpenClaw and is left for a follow-up.

Test plan

All 342 existing tests pass
Manual verification: inject a directive like "Ignore all previous instructions" into a conversation, trigger compaction, and confirm the summary neutralizes the directive
Manual verification: confirm assembled context includes <meta trust="untrusted"> tags around summary content

- Replace dangerous "Follow user instructions exactly" system prompt with explicit injection-defense instructions - Add untrusted-data warnings to all summarization prompts (leaf, D1, D2, D3+) so the summarizer model ignores embedded directives - Add <meta trust="untrusted"> taint labels to summary XML wrapper so downstream models treat summaries as historical reference, not instructions - Add injection-awareness guidance to LCM Recall system prompt addition Closes Martian-Engineering#71

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: harden summarization against prompt injection persistence#137

fix: harden summarization against prompt injection persistence#137
hhe48203-ctrl wants to merge 1 commit intoMartian-Engineering:mainfrom
hhe48203-ctrl:fix/prompt-injection-summary-persistence

hhe48203-ctrl commented Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hhe48203-ctrl commented Mar 19, 2026

Summary

Scope

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant