feat: add native llm guard pipeline #523
jeffscottward wants to merge 3 commits into RunMaestro:main from
Conversation
📝 Walkthrough

This PR implements a TypeScript-native LLM Guard security module for protecting LLM prompts and responses. It detects and handles PII, secrets, and prompt injection threats through configurable pre-processing and post-processing pipelines integrated into the process spawning and output handling flows.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ProcessHandler as Process Handler
    participant LLMGuardPre as LLMGuard Pre
    participant ProcessMgr as Process Manager
    participant LLMGuardPost as LLMGuard Post
    participant Output
    Client->>ProcessHandler: spawn(config with prompt)
    ProcessHandler->>LLMGuardPre: runLlmGuardPre(prompt)
    LLMGuardPre->>LLMGuardPre: Detect PII/Secrets<br/>Anonymize & Vault<br/>Check Prompt Injection
    LLMGuardPre-->>ProcessHandler: {sanitizedPrompt, vault, findings, blocked?}
    alt Guard Blocked
        ProcessHandler-->>Client: Error (blocked)
    else Guard Allowed
        ProcessHandler->>ProcessHandler: Log findings if any
        ProcessHandler->>ProcessMgr: spawn(sanitizedPrompt, llmGuardState)
        ProcessMgr->>ProcessMgr: Execute process
        ProcessMgr-->>ProcessHandler: response
        ProcessHandler->>LLMGuardPost: runLlmGuardPost(response, vault)
        LLMGuardPost->>LLMGuardPost: Deanonymize PII<br/>Redact Secrets<br/>Detect Leakage
        LLMGuardPost-->>ProcessHandler: {sanitizedResponse, findings, blocked?}
        alt Post Guard Blocked
            ProcessHandler-->>Output: Blocked message
        else Post Guard Allowed
            ProcessHandler-->>Output: Sanitized response
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed (5 of 5)
Greptile Summary

This PR introduces a native TypeScript LLM guard pipeline that runs before process spawn (pre-scan: secret redaction, PII anonymization, prompt-injection detection) and after result emission (post-scan: PII deanonymization, output secret redaction, PII-leakage detection). Guard state — including the PII vault and input findings — is threaded through.

Key findings:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant R as Renderer (IPC)
    participant PH as process.ts handler
    participant LG as llm-guard (pre)
    participant V as PiiVault
    participant PM as ProcessManager.spawn
    participant CP as ChildProcess
    participant SH as StdoutHandler
    participant LGO as llm-guard (post)
    participant BM as BufferManager
    R->>PH: process:spawn { prompt, toolType, ... }
    PH->>PH: normalizeLlmGuardConfig(settingsStore.get)
    alt toolType !== 'terminal' && prompt exists
        PH->>LG: runLlmGuardPre(prompt, config)
        LG->>LG: redactSecrets(prompt)
        LG->>V: anonymizePii(sanitized, vault)
        V-->>LG: placeholder mappings stored
        LG->>LG: detectPromptInjection(prompt)
        LG-->>PH: { sanitizedPrompt, vault, findings, blocked }
        alt blocked === true
            PH-->>R: throw Error (blocked by guard)
        end
        PH->>PH: effectivePrompt = sanitizedPrompt
        PH->>PH: llmGuardState = { config, vault, inputFindings }
    end
    PH->>PM: spawn({ ...config, prompt: effectivePrompt, llmGuardState })
    PM->>CP: spawn child process
    CP-->>SH: stdout data
    SH->>SH: parse JSON lines
    alt result message
        SH->>LGO: applyOutputGuard(sessionId, managedProcess, resultText)
        LGO->>LGO: PiiVault.deanonymize(text, vault)
        LGO->>LGO: redactSecrets(deanonymized)
        LGO->>LGO: detectPiiLeakage(redacted, vault)
        LGO-->>SH: guardedText (or block sentinel)
        SH->>BM: emitDataBuffered(sessionId, guardedText)
    end
```

Last reviewed commit: 5318d36
```typescript
function applyReplacements(
  text: string,
  findings: LlmGuardFinding[],
  replacementBuilder: (finding: LlmGuardFinding, index: number) => string
): { text: string; findings: LlmGuardFinding[] } {
  const sortedFindings = [...findings].sort((a, b) => b.start - a.start);
  let nextText = text;

  sortedFindings.forEach((finding, reverseIndex) => {
    const index = sortedFindings.length - reverseIndex;
    const replacement = replacementBuilder(finding, index);
    nextText =
      nextText.slice(0, finding.start) + replacement + nextText.slice(finding.end);
    finding.replacement = replacement;
  });

  return {
    text: nextText,
    findings: sortedFindings.sort((a, b) => a.start - b.start),
  };
}
```
Overlapping findings corrupt output text
applyReplacements sorts findings by start descending and applies replacements right-to-left, relying on the invariant that each replacement operates on the original text's positions. However, when two patterns independently match overlapping spans (e.g. SECRET_CONNECTION_STRING matching postgres://sk-abc...@host/db while SECRET_OPENAI_KEY also matches sk-abc... inside that same range), both findings are collected and passed to applyReplacements together. The first replacement modifies the string in-place; the second then slices using stale start/end indices from the original text, producing garbled output — characters from the first replacement will be duplicated or overwritten.
This affects both redactSecrets (multiple secret patterns run on the same text) and anonymizePii (multiple PII patterns run on the same text, e.g. a credit card number overlapping a phone number via the very broad CREDIT_CARD_REGEX).
To fix, deduplicate/merge overlapping findings before applying replacements — keep only the longest (outermost) match when two findings overlap:
```typescript
function mergeOverlapping(findings: LlmGuardFinding[]): LlmGuardFinding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: LlmGuardFinding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    if (last && f.start < last.end) continue; // skip overlapping/contained finding
    merged.push(f);
  }
  return merged;
}
```

Then call `applyReplacements(text, mergeOverlapping(findings), ...)` in both `redactSecrets` and `anonymizePii`.
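To see why the merge is necessary and that it behaves as intended, the step can be exercised standalone. The `Finding` shape below is a simplified stand-in for the PR's `LlmGuardFinding`:

```typescript
// Minimal stand-in for the PR's LlmGuardFinding (hypothetical simplification).
interface Finding {
  type: string;
  start: number;
  end: number;
}

// Keep only the outermost finding when spans overlap, per the suggested fix.
function mergeOverlapping(findings: Finding[]): Finding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: Finding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    // A finding starting before the previous one ends is contained or overlapping.
    if (last && f.start < last.end) continue;
    merged.push(f);
  }
  return merged;
}

// Example: a connection-string match that fully contains an API-key match.
const findings: Finding[] = [
  { type: 'SECRET_OPENAI_KEY', start: 22, end: 45 },
  { type: 'SECRET_CONNECTION_STRING', start: 10, end: 60 },
];
const kept = mergeOverlapping(findings);
// Only the outer SECRET_CONNECTION_STRING survives, so applyReplacements
// never slices with stale indices.
```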
```diff
@@ -0,0 +1,59 @@
+export type LlmGuardAction = 'warn' | 'sanitize' | 'block';
```
warn action declared but not implemented
LlmGuardAction includes 'warn' as a valid value, but neither runLlmGuardPre nor runLlmGuardPost have any logic branch for action === 'warn'. At runtime, 'warn' silently behaves identically to 'sanitize' — secrets are still redacted and PII is anonymized, but no warning is surfaced to the caller. If 'warn' is meant to detect-and-log without modifying the prompt, the implementation needs an explicit branch. If it's not yet supported, it should be removed from the union type to prevent callers from configuring a non-functional mode.
```diff
-export type LlmGuardAction = 'warn' | 'sanitize' | 'block';
+export type LlmGuardAction = 'sanitize' | 'block';
```
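Alternatively, if `'warn'` is meant to be detect-and-log, the missing branch could take roughly this shape. The `applyAction` helper and result type here are hypothetical illustrations, not the PR's actual API:

```typescript
type LlmGuardAction = 'warn' | 'sanitize' | 'block';

// Hypothetical result shape mirroring the guard's pre-scan output.
interface GuardResult {
  sanitizedPrompt: string;
  findings: string[];
  blocked: boolean;
}

// 'warn' surfaces findings but leaves the prompt untouched;
// 'sanitize' swaps in the redacted text; 'block' blocks on any finding.
function applyAction(
  action: LlmGuardAction,
  prompt: string,
  sanitized: string,
  findings: string[]
): GuardResult {
  switch (action) {
    case 'warn':
      return { sanitizedPrompt: prompt, findings, blocked: false };
    case 'sanitize':
      return { sanitizedPrompt: sanitized, findings, blocked: false };
    case 'block':
      return { sanitizedPrompt: prompt, findings, blocked: findings.length > 0 };
  }
}
```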
```typescript
const PHONE_REGEX = /\b(?:\+?1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?){2}\d{4}\b/g;
const SSN_REGEX = /\b\d{3}-\d{2}-\d{4}\b/g;
const IPV4_REGEX = /\b(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b/g;
const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;
```
CREDIT_CARD_REGEX is extremely broad and will produce many false positives
/\b(?:\d[ -]*?){13,19}\b/g matches virtually any 13–19 digit sequence separated by optional spaces or hyphens. In typical developer prompts, this pattern will fire on long numeric identifiers, timestamps, hex values rendered as decimals, issue numbers, etc. Even with the Luhn check (which passes for ~10% of random digit strings), the false-positive rate in developer-oriented text is expected to be very high.
A narrower pattern that requires the common four-group card format would significantly reduce false positives:
```diff
-const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;
+const CREDIT_CARD_REGEX = /\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
```
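For reference, the Luhn filter mentioned above is straightforward; this is a generic sketch of the standard algorithm, not necessarily the PR's implementation:

```typescript
// Standard Luhn checksum: walking right to left, double every second digit,
// subtract 9 when the doubled value exceeds 9, and require sum % 10 === 0.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, '');
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

Even so, roughly one in ten random digit strings of a given length passes Luhn, which is why narrowing the regex itself matters.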
```typescript
},
{
  type: 'SECRET_OPENAI_KEY',
  regex: /\bsk-[A-Za-z0-9]{20,}\b/g,
```
SECRET_OPENAI_KEY pattern is too permissive
/\bsk-[A-Za-z0-9]{20,}\b/g will match any identifier starting with sk- followed by 20+ alphanumeric characters. The sk- prefix is widely used across many APIs and internal identifiers (e.g. sk-client-prod-identifier-12345abc…). Modern OpenAI API keys use more specific prefixes (sk-proj-, sk-svcacct-) that could be matched more precisely, and the legacy sk- format has a specific length (51 total characters). Tightening the pattern reduces false positives that could silently redact non-sensitive identifiers from prompts:
```diff
-  regex: /\bsk-[A-Za-z0-9]{20,}\b/g,
+  {
+    type: 'SECRET_OPENAI_KEY',
+    regex: /\bsk-(?:proj-|svcacct-|org-)?[A-Za-z0-9]{20,}T3BlbkFJ[A-Za-z0-9]{20,}\b|\bsk-[A-Za-z0-9]{48}\b/g,
+    confidence: 0.96,
+  },
```
Alternatively, if false positives are acceptable, at minimum raise the length threshold and require the sk- to appear in an API-key context (after =, :, etc.) to reduce ambient matching.
```typescript
let blockReason: string | undefined;

if (effectiveConfig.input.detectPromptInjection) {
  const promptInjectionFindings = detectPromptInjection(prompt);
```
Prompt-injection detection runs on original prompt, not sanitized text
detectPromptInjection(prompt) uses the original prompt variable rather than sanitizedPrompt (which has already had secrets redacted and PII anonymized). While this doesn't cause incorrect blocking behavior (only confidence scores drive the block decision), it means the findings array returned from runLlmGuardPre contains injection findings whose start/end positions point into the original text, inconsistent with the positions of the secret/PII findings which point into the sanitized text. If callers ever use these positions to highlight findings in the UI, this will produce misaligned highlights.
```diff
-  const promptInjectionFindings = detectPromptInjection(prompt);
+  const promptInjectionFindings = detectPromptInjection(sanitizedPrompt);
```
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/ipc/handlers/process.ts (1)
132-162: ⚠️ Potential issue | 🟠 Major

Don't log the raw prompt before the pre-scan.

This branch still builds `promptPreview` from `config.prompt`, so Windows debug logs can persist secrets/PII even when LLM Guard is enabled. Move that preview below `runLlmGuardPre()` and derive it from `effectivePrompt`, or keep this log to metadata only.

Also applies to: 171-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/ipc/handlers/process.ts` around lines 132 - 162, The current log block builds and logs promptPreview from config.prompt (via promptPreview) before running runLlmGuardPre(), potentially leaking secrets/PII; move the prompt-preview construction and logging to after runLlmGuardPre() and derive it from effectivePrompt (or remove the preview) so that LLM guard has already sanitized/approved the prompt; update references around logFn and sessionSshRemoteConfig accordingly to only include prompt metadata before the pre-scan and the actual preview from effectivePrompt after runLlmGuardPre().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/__tests__/main/security/llm-guard.test.ts`:
- Around line 15-18: The test uses a literal PAT-looking string in
runLlmGuardPre which trips secret scanners; change the fixture to build the
token at runtime from concatenated fragments (e.g., const part1 = 'ghp_'; const
part2 = '12345' + '67890'...; const token = part1 + part2) and pass that token
into runLlmGuardPre instead of embedding "ghp_..."; apply the same
fragment-concatenation approach to the other ghp_ fixtures referenced around
lines 37-43 so no full PAT-shaped literal remains in the test file.
In `@src/main/ipc/handlers/process.ts`:
- Around line 163-191: The llmGuardState is only set when effectivePrompt is
truthy, leaving non-terminal spawns without prompts unseeded; always derive
llmGuardConfig via normalizeLlmGuardConfig and, for non-terminal tool types,
call runLlmGuardPre using effectivePrompt || '' (or an explicit empty string) so
you always populate llmGuardState (config, vault, inputFindings) and still
respect findings/blocking logic and sanitizedPrompt replacement; update the
block around runLlmGuardPre and the assignment to llmGuardState (and mirror the
same change in the other occurrence handling lines for the post-scan path) so
processManager.spawn always receives a valid llmGuardState.
In `@src/main/process-manager/handlers/ExitHandler.ts`:
- Around line 33-57: The parse-failure fallback paths currently emit raw
remainingLine/jsonBuffer without redaction; ensure every emission path passes
output through ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.
In `@src/main/process-manager/handlers/StdoutHandler.ts`:
- Around line 503-527: The applyOutputGuard logic is only used for structured
emissions, so plain-text and JSON-parse fallback branches still emit raw stdout;
update the plain-text emission branch and the JSON-parse fallback in
StdoutHandler.ts to call applyOutputGuard(sessionId, managedProcess, resultText)
(which uses runLlmGuardPost) before forwarding output, handle a blocked result
by suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.
In `@src/main/security/llm-guard/index.ts`:
- Around line 273-279: The pre-scan and post-scan PII pattern sets are out of
sync: anonymizePii() currently handles PII_IP_ADDRESS and PII_CREDIT_CARD (with
Luhn validation) but detectPiiLeakage() only re-scans email/phone/SSN; unify
them by extracting a shared piiPatterns array (including PII_EMAIL, PII_PHONE,
PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD) and reference it from both
anonymizePii() and detectPiiLeakage(), and ensure the PII_CREDIT_CARD entry
includes the Luhn filter logic so credit cards are validated consistently on
both paths (also update the rescanning branch used when action === 'block' to
use the shared list).
- Around line 57-64: The current rules PROMPT_INJECTION_ROLE_OVERRIDE and
PROMPT_INJECTION_NEW_INSTRUCTIONS use overly permissive regexes and a single-hit
confidence threshold that immediately sets blocked; tighten these regexes to
require stronger cues (e.g., include role nouns or punctuation like "you are now
the|an|a <role>" or require "new instructions:" preceded by a directive word)
and/or raise their confidence values, and change the blocking logic so blocked
is set only when multiple signals are present (e.g., require two or more
matching rules or a combined confidence sum threshold)—modify the rules array
entries for type 'PROMPT_INJECTION_ROLE_OVERRIDE' and
'PROMPT_INJECTION_NEW_INSTRUCTIONS' and adjust the code that computes `blocked`
to aggregate matches instead of blocking on a single match (affecting the rule
matching/aggregation logic used elsewhere around the other related rules).
---
Outside diff comments:
In `@src/main/ipc/handlers/process.ts`:
- Around line 132-162: The current log block builds and logs promptPreview from
config.prompt (via promptPreview) before running runLlmGuardPre(), potentially
leaking secrets/PII; move the prompt-preview construction and logging to after
runLlmGuardPre() and derive it from effectivePrompt (or remove the preview) so
that LLM guard has already sanitized/approved the prompt; update references
around logFn and sessionSshRemoteConfig accordingly to only include prompt
metadata before the pre-scan and the actual preview from effectivePrompt after
runLlmGuardPre().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3dcfd996-01fa-4b9d-b962-beeae9758ba2
📒 Files selected for processing (12)
- src/__tests__/main/ipc/handlers/process.test.ts
- src/__tests__/main/process-manager/handlers/ExitHandler.test.ts
- src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts
- src/__tests__/main/security/llm-guard.test.ts
- src/main/ipc/handlers/process.ts
- src/main/process-manager/handlers/ExitHandler.ts
- src/main/process-manager/handlers/StdoutHandler.ts
- src/main/process-manager/spawners/ChildProcessSpawner.ts
- src/main/process-manager/types.ts
- src/main/security/llm-guard/index.ts
- src/main/security/llm-guard/types.ts
- src/main/security/llm-guard/vault.ts
```typescript
const result = runLlmGuardPre(
  'Contact john@example.com with token ghp_123456789012345678901234567890123456',
  enabledConfig
);
```
Avoid PAT-shaped literals in test fixtures.
These ghp_... strings are secret-scanner hits and can fail CI or incident automation even though they're synthetic. Build the token at runtime from split fragments instead of checking the full pattern into the repo, and apply the same cleanup to the other new ghp_ fixtures in this PR.
🧪 Proposed fix

```diff
+const githubToken = ['gh', 'p_', '123456789012345678901234567890123456'].join('');
+
 describe('llm guard', () => {
   it('anonymizes pii and redacts secrets during pre-scan', () => {
     const result = runLlmGuardPre(
-      'Contact john@example.com with token ghp_123456789012345678901234567890123456',
+      `Contact john@example.com with token ${githubToken}`,
       enabledConfig
     );
@@
   it('deanonymizes vault values and redacts output secrets during post-scan', () => {
     const result = runLlmGuardPost(
-      'Reach [EMAIL_1] and rotate ghp_123456789012345678901234567890123456',
+      `Reach [EMAIL_1] and rotate ${githubToken}`,
       {
         entries: [{ placeholder: '[EMAIL_1]', original: 'john@example.com', type: 'PII_EMAIL' }],
       },
```

Also applies to: 37-43
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.
(github-pat)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/__tests__/main/security/llm-guard.test.ts` around lines 15 - 18, The test
uses a literal PAT-looking string in runLlmGuardPre which trips secret scanners;
change the fixture to build the token at runtime from concatenated fragments
(e.g., const part1 = 'ghp_'; const part2 = '12345' + '67890'...; const token =
part1 + part2) and pass that token into runLlmGuardPre instead of embedding
"ghp_..."; apply the same fragment-concatenation approach to the other ghp_
fixtures referenced around lines 37-43 so no full PAT-shaped literal remains in
the test file.
```typescript
let effectivePrompt = config.prompt;
let llmGuardState: LlmGuardState | undefined;
const llmGuardConfig = normalizeLlmGuardConfig(
  (settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
    | Partial<LlmGuardConfig>
    | undefined) ?? DEFAULT_LLM_GUARD_CONFIG
);

if (config.toolType !== 'terminal' && effectivePrompt) {
  const guardResult = runLlmGuardPre(effectivePrompt, llmGuardConfig);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Input findings detected', 'LLMGuard', {
      sessionId: config.sessionId,
      toolType: config.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    throw new Error(guardResult.blockReason ?? 'Prompt blocked by LLM Guard.');
  }

  effectivePrompt = guardResult.sanitizedPrompt;
  llmGuardState = {
    config: llmGuardConfig,
    vault: guardResult.vault,
    inputFindings: guardResult.findings,
  };
}
```
Seed llmGuardState for prompt-less agent spawns.
llmGuardState is only assigned when effectivePrompt is truthy. Non-terminal spawns without an initial prompt therefore reach processManager.spawn() with no guard context to carry into the post-scan path.
💡 Suggested fix

```diff
-  let effectivePrompt = config.prompt;
-  let llmGuardState: LlmGuardState | undefined;
   const llmGuardConfig = normalizeLlmGuardConfig(
     (settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
       | Partial<LlmGuardConfig>
       | undefined) ?? DEFAULT_LLM_GUARD_CONFIG
   );
+  let effectivePrompt = config.prompt;
+  let llmGuardState: LlmGuardState | undefined =
+    config.toolType !== 'terminal'
+      ? {
+          config: llmGuardConfig,
+          vault: { entries: [] },
+          inputFindings: [],
+        }
+      : undefined;
```

Also applies to: 537-568
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/ipc/handlers/process.ts` around lines 163 - 191, The llmGuardState
is only set when effectivePrompt is truthy, leaving non-terminal spawns without
prompts unseeded; always derive llmGuardConfig via normalizeLlmGuardConfig and,
for non-terminal tool types, call runLlmGuardPre using effectivePrompt || '' (or
an explicit empty string) so you always populate llmGuardState (config, vault,
inputFindings) and still respect findings/blocking logic and sanitizedPrompt
replacement; update the block around runLlmGuardPre and the assignment to
llmGuardState (and mirror the same change in the other occurrence handling lines
for the post-scan path) so processManager.spawn always receives a valid
llmGuardState.
```typescript
private applyOutputGuard(
  sessionId: string,
  managedProcess: ManagedProcess,
  resultText: string
): string {
  const guardState = managedProcess.llmGuardState;
  if (!guardState?.config?.enabled) {
    return resultText;
  }

  const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
      sessionId,
      toolType: managedProcess.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
  }

  return guardResult.sanitizedResponse;
}
```
Parse-failure exit paths still emit unguarded output.
The new guard covers parsed results, but the fallbacks at Lines 126-129 and 303-310 still emit remainingLine/jsonBuffer verbatim. A truncated or malformed final payload would bypass redaction at the last emission point.
🔒 Proposed fix

```diff
       } catch {
         // If parsing fails, emit the raw line as data
-        this.bufferManager.emitDataBuffered(sessionId, remainingLine);
+        this.bufferManager.emitDataBuffered(
+          sessionId,
+          this.applyOutputGuard(sessionId, managedProcess, remainingLine)
+        );
       }
@@
       } catch (error) {
         logger.error('[ProcessManager] Failed to parse JSON response', 'ProcessManager', {
           sessionId,
           error: String(error),
         });
         // Emit raw buffer as fallback
-        this.emitter.emit('data', sessionId, managedProcess.jsonBuffer!);
+        this.emitter.emit(
+          'data',
+          sessionId,
+          this.applyOutputGuard(sessionId, managedProcess, managedProcess.jsonBuffer!)
+        );
       }
```

Also applies to: 120-123, 277-281
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/process-manager/handlers/ExitHandler.ts` around lines 33 - 57, The
parse-failure fallback paths currently emit raw remainingLine/jsonBuffer without
redaction; ensure every emission path passes output through
ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.
```typescript
private applyOutputGuard(
  sessionId: string,
  managedProcess: ManagedProcess,
  resultText: string
): string {
  const guardState = managedProcess.llmGuardState;
  if (!guardState?.config?.enabled) {
    return resultText;
  }

  const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
      sessionId,
      toolType: managedProcess.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
  }

  return guardResult.sanitizedResponse;
}
```
Non-parser stdout still bypasses LLM Guard.
applyOutputGuard() only wraps structured result emissions. The plain-text branch at Line 132 and the JSON-parse fallback at Line 214 still forward raw output, so malformed or parser-less agent responses can leak secrets/PII without redaction or blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/process-manager/handlers/StdoutHandler.ts` around lines 503 - 527,
The applyOutputGuard logic is only used for structured emissions, so plain-text
and JSON-parse fallback branches still emit raw stdout; update the plain-text
emission branch and the JSON-parse fallback in StdoutHandler.ts to call
applyOutputGuard(sessionId, managedProcess, resultText) (which uses
runLlmGuardPost) before forwarding output, handle a blocked result by
suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.
```typescript
  type: 'PROMPT_INJECTION_ROLE_OVERRIDE',
  regex: /you\s+are\s+now\s+(?:a|an)?/gi,
  confidence: 0.84,
},
{
  type: 'PROMPT_INJECTION_NEW_INSTRUCTIONS',
  regex: /\bnew\s+instructions?\s*:/gi,
  confidence: 0.78,
```
`block` mode is too easy to trigger here.

Line 58 makes the role-override pattern match bare "you are now", and Line 82's default threshold means that single hit blocks the request. `new instructions:` has the same problem. Tighten these patterns or require multiple signals before setting `blocked`.
Also applies to: 81-82, 142-149
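One way to require multiple signals before blocking, sketched with hypothetical names (the real rule and threshold wiring live in `index.ts`, and the `1.5` combined-confidence threshold here is an illustrative assumption):

```typescript
// Hypothetical shape of a matched injection rule.
interface RuleHit {
  type: string;
  confidence: number;
}

// Block only when two or more rules fire, or when the summed confidence
// of the matches crosses a combined threshold.
function shouldBlock(hits: RuleHit[], combinedThreshold = 1.5): boolean {
  if (hits.length >= 2) return true;
  const total = hits.reduce((sum, h) => sum + h.confidence, 0);
  return total >= combinedThreshold;
}

// A lone "you are now" match (confidence 0.84) no longer blocks,
// but role-override plus new-instructions together still does.
```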
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/security/llm-guard/index.ts` around lines 57 - 64, The current rules
PROMPT_INJECTION_ROLE_OVERRIDE and PROMPT_INJECTION_NEW_INSTRUCTIONS use overly
permissive regexes and a single-hit confidence threshold that immediately sets
blocked; tighten these regexes to require stronger cues (e.g., include role
nouns or punctuation like "you are now the|an|a <role>" or require "new
instructions:" preceded by a directive word) and/or raise their confidence
values, and change the blocking logic so blocked is set only when multiple
signals are present (e.g., require two or more matching rules or a combined
confidence sum threshold)—modify the rules array entries for type
'PROMPT_INJECTION_ROLE_OVERRIDE' and 'PROMPT_INJECTION_NEW_INSTRUCTIONS' and
adjust the code that computes `blocked` to aggregate matches instead of blocking
on a single match (affecting the rule matching/aggregation logic used elsewhere
around the other related rules).
```typescript
const piiPatterns = [
  { type: 'PII_EMAIL', regex: EMAIL_REGEX, confidence: 0.99 },
  { type: 'PII_PHONE', regex: PHONE_REGEX, confidence: 0.92 },
  { type: 'PII_SSN', regex: SSN_REGEX, confidence: 0.97 },
  { type: 'PII_IP_ADDRESS', regex: IPV4_REGEX, confidence: 0.88 },
  { type: 'PII_CREDIT_CARD', regex: CREDIT_CARD_REGEX, confidence: 0.75 },
];
```
Keep post-scan PII coverage in sync with pre-scan coverage.
anonymizePii() handles PII_IP_ADDRESS and PII_CREDIT_CARD, but detectPiiLeakage() only rescans email/phone/SSN. In action: 'block' mode those two classes can still leave the system unchecked. Reuse a shared PII pattern list, including the Luhn filter for credit cards, on both paths.
Also applies to: 309-315
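A shared list could look roughly like this. The regexes below are simplified stand-ins for the module's actual `EMAIL_REGEX`/`SSN_REGEX`/`CREDIT_CARD_REGEX` constants, and `scan` only illustrates how both paths would consume one list:

```typescript
// Hypothetical shared pattern entry; `validate` carries per-type filters
// such as the Luhn check for credit cards.
interface PiiPattern {
  type: string;
  regex: RegExp;
  confidence: number;
  validate?: (match: string) => boolean;
}

// Generic Luhn checksum used to filter credit-card candidates.
function luhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, '');
  if (digits.length < 13) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) { d *= 2; if (d > 9) d -= 9; }
    sum += d;
  }
  return sum % 10 === 0;
}

// Simplified stand-in patterns; the real module's regexes would go here.
const SHARED_PII_PATTERNS: PiiPattern[] = [
  { type: 'PII_EMAIL', regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, confidence: 0.99 },
  { type: 'PII_SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g, confidence: 0.97 },
  { type: 'PII_CREDIT_CARD', regex: /\b(?:\d[ -]?){13,19}\b/g, confidence: 0.75, validate: luhn },
];

// Both anonymizePii() and detectPiiLeakage() would iterate this one list,
// so pre-scan and post-scan coverage can no longer drift apart.
function scan(text: string): string[] {
  const types: string[] = [];
  for (const p of SHARED_PII_PATTERNS) {
    for (const m of text.matchAll(p.regex)) {
      if (!p.validate || p.validate(m[0])) types.push(p.type);
    }
  }
  return types;
}
```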
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/security/llm-guard/index.ts` around lines 273 - 279, The pre-scan
and post-scan PII pattern sets are out of sync: anonymizePii() currently handles
PII_IP_ADDRESS and PII_CREDIT_CARD (with Luhn validation) but detectPiiLeakage()
only re-scans email/phone/SSN; unify them by extracting a shared piiPatterns
array (including PII_EMAIL, PII_PHONE, PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD)
and reference it from both anonymizePii() and detectPiiLeakage(), and ensure the
PII_CREDIT_CARD entry includes the Luhn filter logic so credit cards are
validated consistently on both paths (also update the rescanning branch used
when action === 'block' to use the shared list).
Summary
Testing
Closes #522
Summary by CodeRabbit
Release Notes
New Features
Tests