🪝 feat: updatedPrompt on UserPromptSubmit + Message-Content Redaction Primitive #230
dustinhealy wants to merge 4 commits into
A reusable text-scrubbing primitive any consumer of @librechat/agents
can call to redact credential-shaped substrings. Designed as a
building block, not a baked-in policy: ships zero opinionated
patterns. The caller supplies the pattern catalog; the primitive
applies it.
Exports (re-exported from the package index):
- DEFAULT_REDACTION_TEXT = '[REDACTED]'
- types: SensitivePattern, PatternMatch, MessageContentRedactionConfig
- redactSensitiveText(text, config) - plain string
- redactSensitiveValue(value, config) - recursive walk over objects/arrays
- filterMessageContent(messages, config) - LangChain BaseMessage[] arrays
with clone-on-mutate that preserves prototype so instanceof still
holds; only message.content is touched
Each pattern's first capture group is preserved in the redacted output
so trace/log readers can still tell which family of secret was matched.
Match aggregation surfaces PatternMatch[] with { patternId,
patternLabel, count } for UI/telemetry consumers.
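A minimal sketch of the behavior described above, assuming hypothetical shapes for SensitivePattern and PatternMatch (the field names `id`, `label`, and `regex`, and the function name `redactSketch`, are illustrative and not taken from the package). It keeps the first capture group as a visible prefix and aggregates one count per pattern:

```typescript
// Hypothetical shapes for illustration only; the package's real types may differ.
interface SensitivePattern {
  id: string;
  label: string;
  // Assumed to carry the /g flag, with the visible prefix as capture group 1.
  regex: RegExp;
}

interface PatternMatch {
  patternId: string;
  patternLabel: string;
  count: number;
}

const DEFAULT_REDACTION_TEXT = '[REDACTED]';

function redactSketch(
  text: string,
  patterns: SensitivePattern[],
  redactionText: string = DEFAULT_REDACTION_TEXT
): { text: string; matches: PatternMatch[] } {
  const matches: PatternMatch[] = [];
  let result = text;
  for (const pattern of patterns) {
    let count = 0;
    result = result.replace(pattern.regex, (_match: string, prefix: unknown) => {
      count += 1;
      // Without a capture group the second argument is the match offset
      // (a number), so only string prefixes are kept.
      return `${typeof prefix === 'string' ? prefix : ''}${redactionText}`;
    });
    if (count > 0) {
      matches.push({ patternId: pattern.id, patternLabel: pattern.label, count });
    }
  }
  return { text: result, matches };
}

// The caller supplies the catalog; nothing is built in.
const catalog: SensitivePattern[] = [
  { id: 'sk-key', label: 'API key', regex: /(sk-)[A-Za-z0-9]{8,}/g },
  { id: 'bearer', label: 'Bearer token', regex: /(Bearer )[A-Za-z0-9._-]+/g },
];

const demo = redactSketch('key sk-abcdef123456 and Bearer abc.def', catalog);
// demo.text === 'key sk-[REDACTED] and Bearer [REDACTED]'
```

The sketch uses a replacer callback so it can count matches in the same pass; the actual implementation may aggregate differently.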
Extends the existing UserPromptSubmit hook surface so a hook can rewrite the user's prompt before the model turn runs. The hook surface already supports decision, additionalContext, preventContinuation, and PreToolUse.updatedInput; this adds the last missing capability for upstream guardrails to redact (rather than just block) without forking the message array.
- UserPromptSubmitHookOutput: add optional updatedPrompt: string
- AggregatedHookResult: add optional updatedPrompt: string
- executeHooks.ts: applyUpdatedPrompt fold, last-writer-wins in registration order (matches updatedInput / updatedOutput semantics)
- run.ts: applyPromptOverride helper rewrites the last human message in place with the aggregated updatedPrompt before continuing into the graph. Handles both string-content and multimodal content-block arrays: in the array case, the first text block's text is replaced with updatedPrompt and any other text blocks are dropped; non-text blocks (images, files) are preserved in original order
Tests cover the fold, type-guard, multi-hook last-writer-wins, coexistence with decision/additionalContext, and an integration where the messageContentRedaction primitive is used as a UserPromptSubmit hook for credential scrubbing. Closes #229.
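The fold and the content rewrite described in this commit could look roughly like the sketch below. The content-block types are simplified stand-ins for LangChain's message content, and `foldUpdatedPrompt` and `overrideContent` are hypothetical names, not the helpers in executeHooks.ts or run.ts. Leaving an array with no text block unchanged is an assumption; the commit message does not say what happens in that case.

```typescript
// Simplified, hypothetical content shapes (not LangChain's actual types).
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: string };
type Content = string | ContentBlock[];

// Last-writer-wins: walk hook outputs in registration order and keep
// the most recent string value.
function foldUpdatedPrompt(
  outputs: Array<{ updatedPrompt?: string }>
): string | undefined {
  let updated: string | undefined;
  for (const output of outputs) {
    if (typeof output.updatedPrompt === 'string') {
      updated = output.updatedPrompt;
    }
  }
  return updated;
}

// String content is swapped wholesale. In the array case the first text
// block receives the new prompt, later text blocks are dropped, and
// non-text blocks keep their original order.
function overrideContent(content: Content, updatedPrompt: string): Content {
  if (typeof content === 'string') {
    return updatedPrompt;
  }
  const next: ContentBlock[] = [];
  let replaced = false;
  for (const block of content) {
    if (block.type !== 'text') {
      next.push(block);
    } else if (!replaced) {
      next.push({ type: 'text', text: updatedPrompt });
      replaced = true;
    }
  }
  return next;
}

const folded = foldUpdatedPrompt([
  {},
  { updatedPrompt: 'first rewrite' },
  { updatedPrompt: 'second rewrite' },
  {},
]);
// folded === 'second rewrite'

const rewritten = overrideContent(
  [
    { type: 'text', text: 'my key is sk-abc' },
    { type: 'image_url', image_url: 'https://example.com/a.png' },
    { type: 'text', text: 'trailing text' },
  ],
  'my key is sk-[REDACTED]'
);
// rewritten holds one text block followed by the image block
```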
Three review-driven fixes:
- redactSensitiveText now throws TypeError at config-resolve time if any pattern lacks the global (g) flag. Without /g, String.replace only swaps the first match per string and subsequent matches leak silently. The library should fail loudly, not degrade.
- Documents the prefix-capture-group contract on SensitivePattern. The replacement is always $1<redactionText>, which requires the first capture group to be the visible prefix at the start of the match. Patterns whose first group is mid-match (e.g. /secret=([a-z]+)/g) corrupted output before; the doc makes the contract explicit and points operators at the empty-group escape for "no visible prefix."
- applyPromptOverride now preserves the message prototype (so instanceof checks on subclasses still hold) and every own property other than content. Previously the helper constructed a fresh HumanMessage with only four base fields, silently dropping tool_calls, usage_metadata, custom subclass fields, and the subclass identity itself. Matches the cloneMessageWithContent pattern already used in messageContentRedaction.ts.
Also documents the multimodal collapse semantics (the hook sees concatenated text from extractPromptText and can only return one string, so text blocks collapse to one block; non-text blocks preserved).
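The pitfalls behind these fixes can be reproduced with plain String.replace, and the clone fix with Object.create. This is an illustrative sketch only: the message classes and `cloneWithContent` are hypothetical stand-ins, not LangChain's HumanMessage or the package's cloneMessageWithContent, and reading the "empty-group escape" as a leading `()` group is an assumption.

```typescript
// 1. Without /g only the first match is replaced; the second key leaks.
const firstOnly = 'a sk-111 b sk-222'.replace(/(sk-)\d+/, '$1[REDACTED]');
// firstOnly === 'a sk-[REDACTED] b sk-222'

// 2. A mid-match first group breaks the $1<redactionText> contract:
//    the secret itself lands in $1 and the label is lost.
const corrupted = 'secret=hunter'.replace(/secret=([a-z]+)/g, '$1[REDACTED]');
// corrupted === 'hunter[REDACTED]'

// 3. Contract-conforming variants: a visible prefix, or an empty group
//    when no prefix should remain.
const withPrefix = 'secret=hunter'.replace(/(secret=)[a-z]+/g, '$1[REDACTED]');
const noPrefix = 'secret=hunter'.replace(/()secret=[a-z]+/g, '$1[REDACTED]');
// withPrefix === 'secret=[REDACTED]', noPrefix === '[REDACTED]'

// 4. Prototype-preserving clone with hypothetical message classes.
class SketchMessage {
  content: string;
  constructor(content: string) {
    this.content = content;
  }
}

class TracedHumanMessage extends SketchMessage {
  traceId: string;
  constructor(content: string, traceId: string) {
    super(content);
    this.traceId = traceId;
  }
}

function cloneWithContent<T extends { content: string }>(message: T, content: string): T {
  // Reuse the original prototype so instanceof still holds, copy own
  // enumerable properties (shallowly), then override content.
  const clone = Object.create(Object.getPrototypeOf(message));
  return Object.assign(clone, message, { content }) as T;
}

const original = new TracedHumanMessage('my key sk-123', 'trace-1');
const copy = cloneWithContent(original, 'my key sk-[REDACTED]');
// copy is still a TracedHumanMessage, keeps traceId, and original is untouched
```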
Drops redactSensitiveValue and filterMessageContent from the public surface. Neither has an in-tree consumer; consumers like LibreChat only need redactSensitiveText. Smaller API surface, less to defend on review. Also adds a duplicate-pattern-id guard in resolveConfig (the match aggregator keys by id; silent merging was a footgun), and drops the "skips non-string updatedPrompt values" test (it cast a number to string at runtime to exercise an impossible TypeScript path).
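A rough sketch of the two config-time guards mentioned so far (duplicate ids and the missing /g flag). The `PatternSketch` shape and `validatePatterns` name are hypothetical; the TypeError for a missing /g flag follows the earlier commit message, while the error type for duplicate ids is an assumption.

```typescript
// Hypothetical pattern shape; the real SensitivePattern may differ.
interface PatternSketch {
  id: string;
  pattern: RegExp;
}

function validatePatterns(patterns: PatternSketch[]): PatternSketch[] {
  const seenIds = new Set<string>();
  for (const { id, pattern } of patterns) {
    // The match aggregator keys by id, so duplicates would merge silently.
    if (seenIds.has(id)) {
      throw new Error(`Duplicate pattern id: ${id}`);
    }
    seenIds.add(id);
    // Without /g, String.replace stops after the first match.
    if (!pattern.global) {
      throw new TypeError(`Pattern "${id}" must use the global (g) flag`);
    }
  }
  return patterns;
}

// Small helper for the demo: returns the thrown error, if any.
function captureError(run: () => unknown): unknown {
  try {
    run();
    return undefined;
  } catch (error) {
    return error;
  }
}

const duplicateError = captureError(() =>
  validatePatterns([
    { id: 'key', pattern: /(sk-)\w+/g },
    { id: 'key', pattern: /(pk-)\w+/g },
  ])
);
const missingGlobalError = captureError(() =>
  validatePatterns([{ id: 'key', pattern: /(sk-)\w+/ }])
);
const validResult = captureError(() =>
  validatePatterns([{ id: 'key', pattern: /(sk-)\w+/g }])
);
```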
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61632c1ebc
    );
  }
  seenIds.add(id);
  if (!pattern.global) {
Reject sticky redaction patterns
When a caller supplies a regexp with both g and y flags, this validation passes, but redactStringInto resets lastIndex to 0 before replace; sticky regexps only match at that exact position, so a secret later in the prompt (for example "my key sk-ant-...") is left unchanged and matches stays empty. Since this helper is validating caller-supplied patterns to avoid silent leaks, also reject pattern.sticky or clone the regexp without y.
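The failure mode in this comment can be reproduced with plain String.replace, and both suggested remedies are small. The helper names `assertNotSticky` and `withoutSticky` below are hypothetical, not part of the PR:

```typescript
const stickyPattern = /(sk-ant-)\w+/gy;
const prompt = 'my key sk-ant-abc123';

// With both g and y, replace starts at lastIndex 0 and a sticky regex may
// only match at exactly that position, so the later secret is untouched.
const leaked = prompt.replace(stickyPattern, '$1[REDACTED]');
// leaked === 'my key sk-ant-abc123'

// Option 1: reject sticky patterns at validation time.
function assertNotSticky(pattern: RegExp): void {
  if (pattern.sticky) {
    throw new TypeError('Redaction patterns must not use the sticky (y) flag');
  }
}

// Option 2: clone the regexp without the y flag, keeping the other flags.
function withoutSticky(pattern: RegExp): RegExp {
  return pattern.sticky
    ? new RegExp(pattern.source, pattern.flags.replace('y', ''))
    : pattern;
}

const scrubbed = prompt.replace(withoutSticky(stickyPattern), '$1[REDACTED]');
// scrubbed === 'my key sk-ant-[REDACTED]'
```

Rejecting fails loudly, in line with the existing /g check; cloning is more forgiving but silently changes the caller's pattern.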
Summary
Adds two complementary primitives that together let consumers rewrite the user's prompt before it reaches the model.
- updatedPrompt on UserPromptSubmit hook output. Hooks can now return { updatedPrompt: string } and the value is applied to the prompt that flows into the model. Aggregation follows the existing pattern: parallel hooks fold via last-writer-wins, matching how updatedInput is handled. applyPromptOverride in run.ts performs the swap using a prototype-preserving clone so downstream consumers that check instanceof HumanMessage still work, and handles both string and multimodal content-block shapes.
- redactSensitiveText primitive (src/messageContentRedaction.ts). Pure helper that scrubs caller-supplied regex patterns from a string, keeping the first capture group as a visible prefix so consumers can still tell which family of secret matched (sk-[REDACTED] vs Bearer [REDACTED]). No built-in pattern catalog. Validates the /g flag, rejects duplicate pattern ids, returns per-pattern match counts.
The two are decoupled. The hook field is useful to any consumer doing prompt rewriting (PII, content policy, translation, normalization). The redaction helper is useful as a standalone scrubbing utility.
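To illustrate the decoupling, here is a sketch of two unrelated hook bodies that both rely only on the updatedPrompt field. The hook signature (a function from prompt string to output object) and the names `scrubHook` and `normalizeHook` are assumptions for illustration; the real hook registration API is not shown in this PR description.

```typescript
// Hypothetical, reduced view of the hook output: only the new field.
interface HookOutputSketch {
  updatedPrompt?: string;
}

// A redaction-style hook: rewrites only when something was scrubbed.
function scrubHook(prompt: string): HookOutputSketch {
  const scrubbed = prompt.replace(/(Bearer )[A-Za-z0-9._-]+/g, '$1[REDACTED]');
  return scrubbed === prompt ? {} : { updatedPrompt: scrubbed };
}

// A non-redaction hook: whitespace normalization uses the same field.
function normalizeHook(prompt: string): HookOutputSketch {
  const normalized = prompt.trim().replace(/\s+/g, ' ');
  return normalized === prompt ? {} : { updatedPrompt: normalized };
}

const scrubOutput = scrubHook('Authorization: Bearer abc.def');
// scrubOutput.updatedPrompt === 'Authorization: Bearer [REDACTED]'

const untouchedOutput = scrubHook('hello');
// untouchedOutput.updatedPrompt === undefined

const normalizeOutput = normalizeHook('  hi   there ');
// normalizeOutput.updatedPrompt === 'hi there'
```

Because aggregation is last-writer-wins, if both hooks returned updatedPrompt for the same prompt, only the later-registered rewrite would survive; hooks that need to compose would have to do so inside a single hook.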
Test plan
- npm test -- --testPathPatterns=updatedPrompt passes
- npm test -- --testPathPatterns=message-content-redaction passes