fix: harden summarization against prompt injection persistence#137
Open
hhe48203-ctrl wants to merge 1 commit intoMartian-Engineering:mainfrom
Open
Conversation
- Replace dangerous "Follow user instructions exactly" system prompt with explicit injection-defense instructions - Add untrusted-data warnings to all summarization prompts (leaf, D1, D2, D3+) so the summarizer model ignores embedded directives - Add <meta trust="untrusted"> taint labels to summary XML wrapper so downstream models treat summaries as historical reference, not instructions - Add injection-awareness guidance to LCM Recall system prompt addition Closes Martian-Engineering#71
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses #71 — prompt injections embedded in conversation history can survive compaction and be reinserted as
usermessages, giving them maximum influence in later turns.This PR hardens the summarization and assembly pipeline against injection persistence:
<meta type="historical_context" trust="untrusted">taint labels so downstream models understand summaries are historical reference, not current instructionsScope
This PR focuses on the content-layer mitigations (recommendations 2–4 from the issue). Changing the message
rolefrom"user"to a non-user role (recommendation 1) would require upstream API changes in OpenClaw and is left for a follow-up.Test plan
<meta trust="untrusted">tags around summary content