🪝 feat: updatedPrompt on UserPromptSubmit + Message-Content Redaction Primitive #230

Closed
dustinhealy wants to merge 4 commits into main from feat/userpromptsubmit-prompt-rewrite

dustinhealy (Collaborator) commented Jun 8, 2026

Summary

Adds two complementary primitives that together let consumers rewrite the user's prompt before it reaches the model.

updatedPrompt on UserPromptSubmit hook output. Hooks can now return { updatedPrompt: string } and the value is applied to the prompt that flows into the model. Aggregation follows the existing pattern: parallel hooks fold via last-writer-wins, matching how updatedInput is handled. applyPromptOverride in run.ts performs the swap using a prototype-preserving clone so downstream consumers that check instanceof HumanMessage still work, and handles both string and multimodal content-block shapes.
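The last-writer-wins fold described above can be sketched roughly as follows. This is a minimal illustration, not the code in executeHooks.ts: the `HookOutputSketch` and `AggregatedSketch` shapes and the `foldHookOutputs` name are hypothetical, and only the `updatedPrompt` and `additionalContext` fields come from the description.

```typescript
// Hypothetical, simplified shapes; the real hook types carry more fields.
interface HookOutputSketch {
  additionalContext?: string;
  updatedPrompt?: string;
}

interface AggregatedSketch {
  additionalContexts: string[];
  updatedPrompt?: string;
}

// Folds the outputs of hooks that ran in parallel. The outputs array is
// assumed to be in registration order, so a later hook that sets
// updatedPrompt overwrites an earlier one (last-writer-wins).
function foldHookOutputs(outputs: HookOutputSketch[]): AggregatedSketch {
  const result: AggregatedSketch = { additionalContexts: [] };
  for (const output of outputs) {
    if (output.additionalContext !== undefined) {
      result.additionalContexts.push(output.additionalContext);
    }
    if (typeof output.updatedPrompt === 'string') {
      result.updatedPrompt = output.updatedPrompt;
    }
  }
  return result;
}
```

A hook that returns no `updatedPrompt` leaves an earlier hook's rewrite untouched, which is why the fold only assigns when the field is a string.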

redactSensitiveText primitive (src/messageContentRedaction.ts). Pure helper that scrubs caller-supplied regex patterns from a string, keeping the first capture group as a visible prefix so consumers can still tell which family of secret matched (sk-[REDACTED] vs Bearer [REDACTED]). No built-in pattern catalog. Validates the /g flag, rejects duplicate pattern ids, returns per-pattern match counts.
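To make the contract concrete, here is a hedged re-implementation of the behavior described above. It is not the source of src/messageContentRedaction.ts: the function name `redactSketch`, its return shape, and the plain `Error` used for duplicate ids are assumptions made for illustration.

```typescript
// Hypothetical shapes modeled on the description; field names are assumptions.
interface SensitivePatternSketch {
  id: string;
  label: string;
  pattern: RegExp; // must carry the g flag; first capture group = visible prefix
}

interface PatternMatchSketch {
  patternId: string;
  patternLabel: string;
  count: number;
}

function redactSketch(
  text: string,
  patterns: SensitivePatternSketch[],
  redactionText: string = '[REDACTED]'
): { text: string; matches: PatternMatchSketch[] } {
  // Validate up front so a bad catalog fails loudly instead of leaking.
  const seenIds = new Set<string>();
  for (const { id, pattern } of patterns) {
    if (seenIds.has(id)) throw new Error(`duplicate pattern id: ${id}`);
    seenIds.add(id);
    if (!pattern.global) throw new TypeError(`pattern "${id}" must use the g flag`);
  }

  let output = text;
  const matches: PatternMatchSketch[] = [];
  for (const { id, label, pattern } of patterns) {
    let count = 0;
    pattern.lastIndex = 0;
    output = output.replace(pattern, (_match: string, prefix: unknown) => {
      count += 1;
      // If the pattern has no capture group, the second argument is the
      // match offset (a number), so fall back to an empty prefix.
      return `${typeof prefix === 'string' ? prefix : ''}${redactionText}`;
    });
    if (count > 0) matches.push({ patternId: id, patternLabel: label, count });
  }
  return { text: output, matches };
}
```

With a caller-supplied catalog such as `/(sk-)[A-Za-z0-9]+/g` and `/(Bearer )[A-Za-z0-9.]+/g`, the sketch keeps the `sk-` or `Bearer ` prefix and replaces the rest of each match.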

The two are decoupled. The hook field is useful to any consumer doing prompt rewriting (PII, content policy, translation, normalization). The redaction helper is useful as a standalone scrubbing utility.

Test plan

  • npm test -- --testPathPatterns=updatedPrompt passes
  • npm test -- --testPathPatterns=message-content-redaction passes
  • Full test suite green locally
  • CI green on the PR
  • Downstream LibreChat consumer (companion PR) builds and exercises both primitives end-to-end

A reusable text-scrubbing primitive any consumer of @librechat/agents
can call to redact credential-shaped substrings. Designed as a
building block, not a baked-in policy: ships zero opinionated
patterns. The caller supplies the pattern catalog; the primitive
applies it.

Exports (re-exported from the package index):
- DEFAULT_REDACTION_TEXT = '[REDACTED]'
- types: SensitivePattern, PatternMatch, MessageContentRedactionConfig
- redactSensitiveText(text, config)   - plain string
- redactSensitiveValue(value, config) - recursive walk over objects/arrays
- filterMessageContent(messages, config) - LangChain BaseMessage[] arrays
  with clone-on-mutate that preserves prototype so instanceof still
  holds; only message.content is touched

Each pattern's first capture group is preserved in the redacted output
so trace/log readers can still tell which family of secret was matched.
Match aggregation surfaces PatternMatch[] with { patternId,
patternLabel, count } for UI/telemetry consumers.

Extends the existing UserPromptSubmit hook surface so a hook can
rewrite the user's prompt before the model turn runs. The hook
surface already supports decision, additionalContext,
preventContinuation, and PreToolUse.updatedInput; this adds the last
missing capability for upstream guardrails to redact (rather than
just block) without forking the message array.

- UserPromptSubmitHookOutput: add optional updatedPrompt: string
- AggregatedHookResult: add optional updatedPrompt: string
- executeHooks.ts: applyUpdatedPrompt fold, last-writer-wins in
  registration order (matches updatedInput / updatedOutput semantics)
- run.ts: applyPromptOverride helper rewrites the last human message
  in place with the aggregated updatedPrompt before continuing into
  the graph. Handles both string-content and multimodal content-block
  arrays: in the array case, the first text block's text is replaced
  with updatedPrompt and any other text blocks are dropped; non-text
  blocks (images, files) are preserved in original order

Tests cover the fold, type-guard, multi-hook last-writer-wins,
coexistence with decision/additionalContext, and an integration where
the messageContentRedaction primitive is used as a UserPromptSubmit
hook for credential scrubbing.

Closes #229.

Three review-driven fixes:

- redactSensitiveText now throws TypeError at config-resolve time if
  any pattern lacks the global (g) flag. Without /g, String.replace
  only swaps the first match per string and subsequent matches leak
  silently. The library should fail loudly, not degrade.

- Documents the prefix-capture-group contract on SensitivePattern.
  The replacement is always $1<redactionText>, which requires the
  first capture group to be the visible prefix at the start of the
  match. Patterns whose first group is mid-match (e.g.
  /secret=([a-z]+)/g) corrupted output before; the doc makes the
  contract explicit and points operators at the empty-group escape
  for "no visible prefix."

- applyPromptOverride now preserves the message prototype (so
  instanceof checks on subclasses still hold) and every own
  property other than content. Previously the helper constructed a
  fresh HumanMessage with only four base fields, silently dropping
  tool_calls, usage_metadata, custom subclass fields, and the
  subclass identity itself. Matches the cloneMessageWithContent
  pattern already used in messageContentRedaction.ts. Also documents
  the multimodal collapse semantics (the hook sees concatenated
  text from extractPromptText and can only return one string, so
  text blocks collapse to one block; non-text blocks preserved).
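The first two fixes rest on standard `String.prototype.replace` behavior, which the following snippet illustrates with plain regular expressions. The sample strings are made up, and the `()` form of the empty-group escape is an assumption about what the documentation recommends.

```typescript
const REDACTION = '[REDACTED]';

// 1. Without the g flag, only the first match is replaced, so the second
//    key survives in the output.
const leaky = 'sk-aaa sk-bbb'.replace(/(sk-)[a-z]+/, `$1${REDACTION}`);
// leaky === 'sk-[REDACTED] sk-bbb'

const safe = 'sk-aaa sk-bbb'.replace(/(sk-)[a-z]+/g, `$1${REDACTION}`);
// safe === 'sk-[REDACTED] sk-[REDACTED]'

// 2. The replacement is always $1 followed by the redaction text. If the
//    first group captures the secret rather than a leading prefix, the
//    secret is echoed back and the real prefix is lost.
const corrupted = 'secret=hunter'.replace(/secret=([a-z]+)/g, `$1${REDACTION}`);
// corrupted === 'hunter[REDACTED]'

const prefixed = 'secret=hunter'.replace(/(secret=)[a-z]+/g, `$1${REDACTION}`);
// prefixed === 'secret=[REDACTED]'

// 3. An empty first group yields "no visible prefix".
const noPrefix = 'token abc123'.replace(/()abc[0-9]+/g, `$1${REDACTION}`);
// noPrefix === 'token [REDACTED]'
```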

Drops redactSensitiveValue and filterMessageContent from the public
surface. Neither has an in-tree consumer; consumers like LibreChat
only need redactSensitiveText. Smaller API surface, less to defend
on review.

Also adds a duplicate-pattern-id guard in resolveConfig (the match
aggregator keys by id; silent merging was a footgun), and drops the
"skips non-string updatedPrompt values" test (it cast a number to
string at runtime to exercise an impossible TypeScript path).

dustinhealy (Collaborator, Author) commented:

@codex review

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61632c1ebc

);
}
seenIds.add(id);
if (!pattern.global) {

P2: Reject sticky redaction patterns

When a caller supplies a regexp with both g and y flags, this validation passes, but redactStringInto resets lastIndex to 0 before replace; sticky regexps only match at that exact position, so a secret later in the prompt (for example "my key sk-ant-...") is left unchanged and matches stays empty. Since this helper is validating caller-supplied patterns to avoid silent leaks, also reject pattern.sticky or clone the regexp without y.
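The sticky-flag behavior described in this comment can be reproduced with plain JavaScript regex semantics. The helper names `isSafeRedactionPattern` and `withoutSticky` below are hypothetical, showing the two remedies the comment suggests rather than anything in the PR.

```typescript
const prompt = 'my key sk-ant-abc123';

const globalOnly = /(sk-)[a-z0-9-]+/g;
const globalSticky = /(sk-)[a-z0-9-]+/gy;

// A gy pattern passes a check that only looks at `global`, but with the
// sticky flag each match must begin exactly at lastIndex. Replacement starts
// at index 0, where "my key" does not match, so nothing is redacted.
const stickyResult = prompt.replace(globalSticky, '$1[REDACTED]');
// stickyResult === 'my key sk-ant-abc123'

const globalResult = prompt.replace(globalOnly, '$1[REDACTED]');
// globalResult === 'my key sk-[REDACTED]'

// Remedy A (hypothetical helper): reject sticky patterns during validation.
function isSafeRedactionPattern(pattern: RegExp): boolean {
  return pattern.global && !pattern.sticky;
}

// Remedy B (hypothetical helper): clone the pattern without the y flag.
function withoutSticky(pattern: RegExp): RegExp {
  return pattern.sticky
    ? new RegExp(pattern.source, pattern.flags.replace('y', ''))
    : pattern;
}

const repaired = prompt.replace(withoutSticky(globalSticky), '$1[REDACTED]');
// repaired === 'my key sk-[REDACTED]'
```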

danny-avila closed this Jun 8, 2026

2 participants