feat: add native llm guard pipeline#523

Open
jeffscottward wants to merge 3 commits into RunMaestro:main from jeffscottward:issue/522-llm-guard-mvp

Conversation

@jeffscottward
Contributor

@jeffscottward jeffscottward commented Mar 6, 2026

Summary

  • add a native TypeScript LLM guard module with secrets redaction, basic PII anonymization/deanonymization, and prompt-injection heuristics
  • run the guard before process spawn and carry guard state through the managed process lifecycle for output scanning
  • cover the new prompt/result interception paths with focused unit tests

Testing

  • npm test -- src/tests/main/security/llm-guard.test.ts src/tests/main/ipc/handlers/process.test.ts src/tests/main/process-manager/handlers/StdoutHandler.test.ts
  • npm run build:prompts
  • npm run lint

Closes #522

Summary by CodeRabbit

Release Notes

  • New Features

    • Added LLM Guard security module with pre- and post-processing of prompts and responses
    • Integrated PII anonymization, secret redaction, and prompt injection detection
    • Configurable guard actions: warn, sanitize, or block
    • PII deanonymization in responses with vault management for sensitive data placeholders
  • Tests

    • Comprehensive test coverage for LLM Guard functionality, process spawning, and output handling

@coderabbitai

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

This PR implements a TypeScript-native LLM Guard security module for protecting LLM prompts and responses. It detects and handles PII, secrets, and prompt injection threats through configurable pre-processing and post-processing pipelines integrated into the process spawning and output handling flows.

Changes

Cohort / File(s) Summary
LLM Guard Core Module
src/main/security/llm-guard/types.ts, src/main/security/llm-guard/vault.ts, src/main/security/llm-guard/index.ts
New security module with pattern-based detection for PII (emails, phones, SSNs, credit cards, IPs), secrets (GitHub tokens, AWS keys, connection strings), and prompt injection signals. Includes PiiVault for managing anonymization mappings, configurable actions (warn/sanitize/block), and pre/post-processing guard functions (runLlmGuardPre, runLlmGuardPost) with optional blocking and detailed findings tracking.
Process Handler Pre-check Integration
src/main/ipc/handlers/process.ts
Integrates runLlmGuardPre into prompt processing before spawning. Computes llmGuardConfig, runs pre-check, logs findings, blocks if needed, and propagates llmGuardState. Replaces isWindows() with local constant and uses effectivePrompt throughout spawn logic and SSH/stdin handling.
Output Handler Post-check Integration
src/main/process-manager/handlers/ExitHandler.ts, src/main/process-manager/handlers/StdoutHandler.ts
Adds applyOutputGuard helper to both handlers that runs runLlmGuardPost on emitted outputs. Routes text through guard for sanitization/blocking based on llmGuardState, logs findings when detected, and replaces output with guarded text or blocked indicator when applicable.
Type Extensions
src/main/process-manager/types.ts, src/main/process-manager/spawners/ChildProcessSpawner.ts
Extends ProcessConfig and ManagedProcess interfaces with optional llmGuardState property. ChildProcessSpawner now propagates config.llmGuardState to spawned process record.
Test Coverage
src/__tests__/main/ipc/handlers/process.test.ts, src/__tests__/main/process-manager/handlers/ExitHandler.test.ts, src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts, src/__tests__/main/security/llm-guard.test.ts
Comprehensive test coverage validating: process handler sanitize/block modes, exit handler vault deanonymization with redaction, stdout handler placeholder replacement, and llm-guard unit tests for pre-scan anonymization, post-scan deanonymization, and block-mode prompt injection detection.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ProcessHandler as Process Handler
    participant LLMGuardPre as LLMGuard Pre
    participant ProcessMgr as Process Manager
    participant LLMGuardPost as LLMGuard Post
    participant Output

    Client->>ProcessHandler: spawn(config with prompt)
    ProcessHandler->>LLMGuardPre: runLlmGuardPre(prompt)
    LLMGuardPre->>LLMGuardPre: Detect PII/Secrets<br/>Anonymize & Vault<br/>Check Prompt Injection
    LLMGuardPre-->>ProcessHandler: {sanitizedPrompt, vault, findings, blocked?}
    
    alt Guard Blocked
        ProcessHandler-->>Client: Error (blocked)
    else Guard Allowed
        ProcessHandler->>ProcessHandler: Log findings if any
        ProcessHandler->>ProcessMgr: spawn(sanitizedPrompt, llmGuardState)
        ProcessMgr->>ProcessMgr: Execute process
        ProcessMgr-->>ProcessHandler: response
        
        ProcessHandler->>LLMGuardPost: runLlmGuardPost(response, vault)
        LLMGuardPost->>LLMGuardPost: Deanonymize PII<br/>Redact Secrets<br/>Detect Leakage
        LLMGuardPost-->>ProcessHandler: {sanitizedResponse, findings, blocked?}
        
        alt Post Guard Blocked
            ProcessHandler-->>Output: Blocked message
        else Post Guard Allowed
            ProcessHandler-->>Output: Sanitized response
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add native llm guard pipeline' clearly and concisely describes the main change—introducing a TypeScript-native LLM guard security pipeline, which aligns with the primary objective of the PR.
Linked Issues check ✅ Passed The PR implements core Phase 1 MVP requirements: secrets redaction, basic PII anonymization/deanonymization with vault, prompt-injection detection, and configurable actions (warn/sanitize/block). Pre- and post-processing hooks are integrated into the process spawn pipeline.
Out of Scope Changes check ✅ Passed All changes directly support LLM guard integration: new guard module with detection logic, vault system, pre/post-processing in handlers, type extensions, and focused unit tests. No unrelated modifications detected.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR introduces a native TypeScript LLM guard pipeline that runs before process spawn (pre-scan: secret redaction, PII anonymization, prompt-injection detection) and after result emission (post-scan: PII deanonymization, output secret redaction, PII-leakage detection). Guard state — including the PII vault and input findings — is threaded through ProcessConfig and ManagedProcess so the StdoutHandler can deanonymize placeholders in LLM output. The overall architecture is sound and well-integrated into the existing spawn lifecycle.

Key findings:

  • Overlapping regex findings corrupt replacement output: applyReplacements applies matches right-to-left using original text positions, but does not deduplicate or merge overlapping spans. When two patterns match the same region (e.g. a connection string that also triggers the SECRET_OPENAI_KEY pattern, or a credit card matching the phone regex), the second replacement slices into already-modified text using stale indices, producing garbled output. This needs overlap detection before calling applyReplacements.
  • warn action is declared but not implemented: LlmGuardAction includes 'warn' as a valid config value, but both runLlmGuardPre and runLlmGuardPost only branch on 'block'; 'warn' silently behaves as 'sanitize' with no differentiation.
  • CREDIT_CARD_REGEX is extremely broad: the pattern /\b(?:\d[ -]*?){13,19}\b/ matches any 13–19 digit sequence with optional separators. Even with Luhn filtering, this will produce significant false positives in developer prompts (long numeric IDs, timestamps, hex values).
  • SECRET_OPENAI_KEY regex over-matches: sk-[A-Za-z0-9]{20,} matches any sk- prefixed identifier, a common pattern well beyond OpenAI keys.
  • Prompt-injection findings use original text positions: detectPromptInjection is passed the unmodified prompt instead of sanitizedPrompt, making finding start/end positions inconsistent with those of the secret/PII findings in the same result array.

Confidence Score: 2/5

  • Not safe to merge — the overlapping-findings bug in applyReplacements can produce silently corrupted prompts, undermining the security guarantees this feature is intended to provide.
  • The integration with the spawn lifecycle and the vault deanonymization are well-implemented, but the core replacement logic has a correctness bug (overlapping spans → garbled text) that can silently damage prompts rather than protecting them. Additionally, the overly broad regex patterns create a high false-positive risk that could redact legitimate content in normal developer use, and the warn action is a dead configuration path that will confuse users who configure it.
  • src/main/security/llm-guard/index.ts requires the most attention — the applyReplacements overlap issue and the pattern quality problems are all concentrated there.

Important Files Changed

Filename Overview
src/main/security/llm-guard/index.ts Core guard module — contains a logic bug where overlapping regex findings from multiple patterns corrupt the replacement output; warn action silently falls through to sanitize behavior; CREDIT_CARD_REGEX and SECRET_OPENAI_KEY patterns are overly broad causing high false-positive risk in developer prompts.
src/main/security/llm-guard/types.ts Type definitions are well-structured; the warn action variant in LlmGuardAction is declared but has no implementation in the guard logic, making it a dead/misleading config option.
src/main/security/llm-guard/vault.ts Simple, correct PII vault implementation; deanonymize uses safe string splitting to avoid regex-escaping issues with placeholder values.
src/main/ipc/handlers/process.ts Guard integration is clean; effectivePrompt correctly threads through all spawn paths; isWindows refactored from import to inline check correctly; processManager.spawn is now awaited (safe even if synchronous).
src/main/process-manager/handlers/StdoutHandler.ts Output guard applied correctly at all three result-emission paths; blocked responses are replaced with a user-visible sentinel string rather than silently dropped, which is appropriate.
src/main/process-manager/spawners/ChildProcessSpawner.ts Minimal one-line change propagating llmGuardState from ProcessConfig into the ManagedProcess record; straightforward and correct.
src/main/process-manager/types.ts Adds optional llmGuardState field to both ProcessConfig and ManagedProcess; clean, non-breaking change.
src/tests/main/security/llm-guard.test.ts Focused unit tests cover the three core flows (pre-sanitize, post-deanonymize, block-mode injection); good coverage for the happy paths, though no tests for overlapping-pattern behavior.
src/tests/main/ipc/handlers/process.test.ts Two new integration-style tests verify guard state is passed into spawn and that block-mode rejects the promise; covers the important branching paths in process.ts.
src/tests/main/process-manager/handlers/StdoutHandler.test.ts New test validates deanonymization and output-secret redaction in the stdout path; correctly asserts the emitted string contains both the restored email and the redacted token placeholder.
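The split-based deanonymization noted for vault.ts above can be sketched like this (a minimal stand-in with an assumed vault-entry shape, not the PR's actual code):

```typescript
// Minimal sketch of placeholder restoration via split/join, which avoids
// having to regex-escape placeholder text. The VaultEntry shape is assumed
// from the review summary, not copied from the PR.
interface VaultEntry {
  placeholder: string;
  original: string;
}

function deanonymize(text: string, entries: VaultEntry[]): string {
  let result = text;
  for (const entry of entries) {
    // split/join replaces every occurrence without treating the
    // placeholder as a regex pattern.
    result = result.split(entry.placeholder).join(entry.original);
  }
  return result;
}

console.log(
  deanonymize('Reach [EMAIL_1] about [EMAIL_1]', [
    { placeholder: '[EMAIL_1]', original: 'john@example.com' },
  ])
); // → Reach john@example.com about john@example.com
```

Because `[EMAIL_1]` contains regex metacharacters, a naive `new RegExp(placeholder)` replacement would misbehave; split/join sidesteps that entirely.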

Sequence Diagram

sequenceDiagram
    participant R as Renderer (IPC)
    participant PH as process.ts handler
    participant LG as llm-guard (pre)
    participant V as PiiVault
    participant PM as ProcessManager.spawn
    participant CP as ChildProcess
    participant SH as StdoutHandler
    participant LGO as llm-guard (post)
    participant BM as BufferManager

    R->>PH: process:spawn { prompt, toolType, ... }
    PH->>PH: normalizeLlmGuardConfig(settingsStore.get)
    alt toolType !== 'terminal' && prompt exists
        PH->>LG: runLlmGuardPre(prompt, config)
        LG->>LG: redactSecrets(prompt)
        LG->>V: anonymizePii(sanitized, vault)
        V-->>LG: placeholder mappings stored
        LG->>LG: detectPromptInjection(prompt)
        LG-->>PH: { sanitizedPrompt, vault, findings, blocked }
        alt blocked === true
            PH-->>R: throw Error (blocked by guard)
        end
        PH->>PH: effectivePrompt = sanitizedPrompt
        PH->>PH: llmGuardState = { config, vault, inputFindings }
    end
    PH->>PM: spawn({ ...config, prompt: effectivePrompt, llmGuardState })
    PM->>CP: spawn child process
    CP-->>SH: stdout data
    SH->>SH: parse JSON lines
    alt result message
        SH->>LGO: applyOutputGuard(sessionId, managedProcess, resultText)
        LGO->>LGO: PiiVault.deanonymize(text, vault)
        LGO->>LGO: redactSecrets(deanonymized)
        LGO->>LGO: detectPiiLeakage(redacted, vault)
        LGO-->>SH: guardedText (or block sentinel)
        SH->>BM: emitDataBuffered(sessionId, guardedText)
    end

Last reviewed commit: 5318d36

Comment on lines +220 to +240
function applyReplacements(
  text: string,
  findings: LlmGuardFinding[],
  replacementBuilder: (finding: LlmGuardFinding, index: number) => string
): { text: string; findings: LlmGuardFinding[] } {
  const sortedFindings = [...findings].sort((a, b) => b.start - a.start);
  let nextText = text;

  sortedFindings.forEach((finding, reverseIndex) => {
    const index = sortedFindings.length - reverseIndex;
    const replacement = replacementBuilder(finding, index);
    nextText =
      nextText.slice(0, finding.start) + replacement + nextText.slice(finding.end);
    finding.replacement = replacement;
  });

  return {
    text: nextText,
    findings: sortedFindings.sort((a, b) => a.start - b.start),
  };
}

Overlapping findings corrupt output text

applyReplacements sorts findings by start descending and applies replacements right-to-left, relying on the invariant that each replacement operates on the original text's positions. However, when two patterns independently match overlapping spans (e.g. SECRET_CONNECTION_STRING matching postgres://sk-abc...@host/db while SECRET_OPENAI_KEY also matches sk-abc... inside that same range), both findings are collected and passed to applyReplacements together. The first replacement modifies the string in-place; the second then slices using stale start/end indices from the original text, producing garbled output — characters from the first replacement will be duplicated or overwritten.

This affects both redactSecrets (multiple secret patterns run on the same text) and anonymizePii (multiple PII patterns run on the same text, e.g. a credit card number overlapping a phone number via the very broad CREDIT_CARD_REGEX).

To fix, deduplicate/merge overlapping findings before applying replacements — keep only the longest (outermost) match when two findings overlap:

function mergeOverlapping(findings: LlmGuardFinding[]): LlmGuardFinding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: LlmGuardFinding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    if (last && f.start < last.end) continue; // skip overlapping/contained finding
    merged.push(f);
  }
  return merged;
}

Then call applyReplacements(text, mergeOverlapping(findings), ...) in both redactSecrets and anonymizePii.
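A self-contained check of the proposed mergeOverlapping behavior, with LlmGuardFinding reduced to just the fields the helper reads:

```typescript
// Demonstrates the overlap-merge fix on the scenario described above: a
// connection-string finding that fully contains an sk- key finding. Only
// start/end/type are modeled here; the full LlmGuardFinding shape is assumed.
interface LlmGuardFinding {
  type: string;
  start: number;
  end: number;
}

// Keep only the earliest-starting, longest match when spans overlap.
function mergeOverlapping(findings: LlmGuardFinding[]): LlmGuardFinding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: LlmGuardFinding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    if (last && f.start < last.end) continue; // skip overlapping/contained finding
    merged.push(f);
  }
  return merged;
}

// A connection string spanning [10, 60) that also contains an sk- key at [21, 50):
const findings: LlmGuardFinding[] = [
  { type: 'SECRET_OPENAI_KEY', start: 21, end: 50 },
  { type: 'SECRET_CONNECTION_STRING', start: 10, end: 60 },
];
console.log(mergeOverlapping(findings)); // only the outermost finding survives
```

With only non-overlapping findings reaching applyReplacements, the right-to-left slicing invariant holds again.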

@@ -0,0 +1,59 @@
export type LlmGuardAction = 'warn' | 'sanitize' | 'block';

warn action declared but not implemented

LlmGuardAction includes 'warn' as a valid value, but neither runLlmGuardPre nor runLlmGuardPost have any logic branch for action === 'warn'. At runtime, 'warn' silently behaves identically to 'sanitize' — secrets are still redacted and PII is anonymized, but no warning is surfaced to the caller. If 'warn' is meant to detect-and-log without modifying the prompt, the implementation needs an explicit branch. If it's not yet supported, it should be removed from the union type to prevent callers from configuring a non-functional mode.

Suggested change
-export type LlmGuardAction = 'warn' | 'sanitize' | 'block';
+export type LlmGuardAction = 'sanitize' | 'block';
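Alternatively, if 'warn' is kept, a detect-only branch could look like the following sketch (applyAction and the result shape are hypothetical names for illustration, not the PR's API):

```typescript
// Hypothetical sketch of an explicit 'warn' branch: findings are still
// reported, but the prompt passes through unmodified. Names and shapes here
// are assumptions, not taken from the PR.
type LlmGuardAction = 'warn' | 'sanitize' | 'block';

interface GuardResult {
  sanitizedPrompt: string;
  findings: string[];
  blocked: boolean;
}

function applyAction(
  prompt: string,
  sanitized: string,
  findings: string[],
  action: LlmGuardAction
): GuardResult {
  switch (action) {
    case 'block':
      // Refuse the prompt outright when anything was detected.
      return { sanitizedPrompt: prompt, findings, blocked: findings.length > 0 };
    case 'warn':
      // Surface findings but leave the prompt untouched.
      return { sanitizedPrompt: prompt, findings, blocked: false };
    case 'sanitize':
      // Replace the prompt with its redacted/anonymized form.
      return { sanitizedPrompt: sanitized, findings, blocked: false };
  }
}
```

Either path is defensible; the important part is that each declared action has an observable behavior.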

const PHONE_REGEX = /\b(?:\+?1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?){2}\d{4}\b/g;
const SSN_REGEX = /\b\d{3}-\d{2}-\d{4}\b/g;
const IPV4_REGEX = /\b(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b/g;
const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;

CREDIT_CARD_REGEX is extremely broad and will produce many false positives

/\b(?:\d[ -]*?){13,19}\b/g matches virtually any 13–19 digit sequence separated by optional spaces or hyphens. In typical developer prompts, this pattern will fire on long numeric identifiers, timestamps, hex values rendered as decimals, issue numbers, etc. Even with the Luhn check (which passes for ~10% of random digit strings), the false-positive rate in developer-oriented text is expected to be very high.

A narrower pattern that requires the common four-group card format would significantly reduce false positives:

Suggested change
-const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;
+const CREDIT_CARD_REGEX = /\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
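For context, the Luhn checksum that filters candidate card matches can be written as follows (a standalone sketch, not the PR's implementation):

```typescript
// Standalone Luhn validity check: strips separators, doubles every second
// digit from the right, and requires the digit sum to be divisible by 10.
// Roughly 1 in 10 random digit strings passes, which is why the regex must
// do most of the filtering.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, '');
  if (digits.length < 13 || digits.length > 19 || /\D/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9; // same as summing the two digits of d
    }
    sum += d;
  }
  return sum % 10 === 0;
}

console.log(luhnValid('4111 1111 1111 1111')); // true — standard test Visa number
console.log(luhnValid('1234567890123')); // false — plausible-looking ID fails Luhn
```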

  },
  {
    type: 'SECRET_OPENAI_KEY',
    regex: /\bsk-[A-Za-z0-9]{20,}\b/g,

SECRET_OPENAI_KEY pattern is too permissive

/\bsk-[A-Za-z0-9]{20,}\b/g will match any identifier starting with sk- followed by 20+ alphanumeric characters. The sk- prefix is widely used across many APIs and internal identifiers (e.g. sk-client-prod-identifier-12345abc…). Modern OpenAI API keys use more specific prefixes (sk-proj-, sk-svcacct-) that could be matched more precisely, and the legacy sk- format has a specific length (51 total characters). Tightening the pattern reduces false positives that could silently redact non-sensitive identifiers from prompts:

Suggested change
-regex: /\bsk-[A-Za-z0-9]{20,}\b/g,
+{
+  type: 'SECRET_OPENAI_KEY',
+  regex: /\bsk-(?:proj-|svcacct-|org-)?[A-Za-z0-9]{20,}T3BlbkFJ[A-Za-z0-9]{20,}\b|\bsk-[A-Za-z0-9]{48}\b/g,
+  confidence: 0.96,
+},

Alternatively, if false positives are acceptable, at minimum raise the length threshold and require the sk- to appear in an API-key context (after =, :, etc.) to reduce ambient matching.
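A quick comparison of the two patterns (the internal-identifier sample is illustrative; global flags are dropped here so .test() carries no lastIndex state):

```typescript
// Contrasting the PR's broad sk- pattern with the tightened alternative
// suggested above. The sample internal identifier is made up for
// illustration; it is not taken from any real system.
const broadOpenAiKey = /\bsk-[A-Za-z0-9]{20,}\b/;
const tightOpenAiKey =
  /\bsk-(?:proj-|svcacct-|org-)?[A-Za-z0-9]{20,}T3BlbkFJ[A-Za-z0-9]{20,}\b|\bsk-[A-Za-z0-9]{48}\b/;

const internalId = 'sk-clientprodidentifier12345abc'; // 28 chars after sk-, not a key
const legacyKey = 'sk-' + 'A'.repeat(48); // legacy 51-character key shape

console.log(broadOpenAiKey.test(internalId)); // true  — broad pattern over-matches
console.log(tightOpenAiKey.test(internalId)); // false — tightened pattern skips it
console.log(tightOpenAiKey.test(legacyKey)); // true  — legacy key shape still caught
```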

let blockReason: string | undefined;

if (effectiveConfig.input.detectPromptInjection) {
  const promptInjectionFindings = detectPromptInjection(prompt);

Prompt-injection detection runs on original prompt, not sanitized text

detectPromptInjection(prompt) uses the original prompt variable rather than sanitizedPrompt (which has already had secrets redacted and PII anonymized). While this doesn't cause incorrect blocking behavior (only confidence scores drive the block decision), it means the findings array returned from runLlmGuardPre contains injection findings whose start/end positions point into the original text, inconsistent with the positions of the secret/PII findings which point into the sanitized text. If callers ever use these positions to highlight findings in the UI, this will produce misaligned highlights.

Suggested change
-const promptInjectionFindings = detectPromptInjection(prompt);
+const promptInjectionFindings = detectPromptInjection(sanitizedPrompt);


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/main/ipc/handlers/process.ts (1)

132-162: ⚠️ Potential issue | 🟠 Major

Don't log the raw prompt before the pre-scan.

This branch still builds promptPreview from config.prompt, so Windows debug logs can persist secrets/PII even when LLM Guard is enabled. Move that preview below runLlmGuardPre() and derive it from effectivePrompt, or keep this log to metadata only.

Also applies to: 171-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/ipc/handlers/process.ts` around lines 132 - 162, The current log
block builds and logs promptPreview from config.prompt (via promptPreview)
before running runLlmGuardPre(), potentially leaking secrets/PII; move the
prompt-preview construction and logging to after runLlmGuardPre() and derive it
from effectivePrompt (or remove the preview) so that LLM guard has already
sanitized/approved the prompt; update references around logFn and
sessionSshRemoteConfig accordingly to only include prompt metadata before the
pre-scan and the actual preview from effectivePrompt after runLlmGuardPre().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/__tests__/main/security/llm-guard.test.ts`:
- Around line 15-18: The test uses a literal PAT-looking string in
runLlmGuardPre which trips secret scanners; change the fixture to build the
token at runtime from concatenated fragments (e.g., const part1 = 'ghp_'; const
part2 = '12345' + '67890'...; const token = part1 + part2) and pass that token
into runLlmGuardPre instead of embedding "ghp_..."; apply the same
fragment-concatenation approach to the other ghp_ fixtures referenced around
lines 37-43 so no full PAT-shaped literal remains in the test file.

In `@src/main/ipc/handlers/process.ts`:
- Around line 163-191: The llmGuardState is only set when effectivePrompt is
truthy, leaving non-terminal spawns without prompts unseeded; always derive
llmGuardConfig via normalizeLlmGuardConfig and, for non-terminal tool types,
call runLlmGuardPre using effectivePrompt || '' (or an explicit empty string) so
you always populate llmGuardState (config, vault, inputFindings) and still
respect findings/blocking logic and sanitizedPrompt replacement; update the
block around runLlmGuardPre and the assignment to llmGuardState (and mirror the
same change in the other occurrence handling lines for the post-scan path) so
processManager.spawn always receives a valid llmGuardState.

In `@src/main/process-manager/handlers/ExitHandler.ts`:
- Around line 33-57: The parse-failure fallback paths currently emit raw
remainingLine/jsonBuffer without redaction; ensure every emission path passes
output through ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.

In `@src/main/process-manager/handlers/StdoutHandler.ts`:
- Around line 503-527: The applyOutputGuard logic is only used for structured
emissions, so plain-text and JSON-parse fallback branches still emit raw stdout;
update the plain-text emission branch and the JSON-parse fallback in
StdoutHandler.ts to call applyOutputGuard(sessionId, managedProcess, resultText)
(which uses runLlmGuardPost) before forwarding output, handle a blocked result
by suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.

In `@src/main/security/llm-guard/index.ts`:
- Around line 273-279: The pre-scan and post-scan PII pattern sets are out of
sync: anonymizePii() currently handles PII_IP_ADDRESS and PII_CREDIT_CARD (with
Luhn validation) but detectPiiLeakage() only re-scans email/phone/SSN; unify
them by extracting a shared piiPatterns array (including PII_EMAIL, PII_PHONE,
PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD) and reference it from both
anonymizePii() and detectPiiLeakage(), and ensure the PII_CREDIT_CARD entry
includes the Luhn filter logic so credit cards are validated consistently on
both paths (also update the rescanning branch used when action === 'block' to
use the shared list).
- Around line 57-64: The current rules PROMPT_INJECTION_ROLE_OVERRIDE and
PROMPT_INJECTION_NEW_INSTRUCTIONS use overly permissive regexes and a single-hit
confidence threshold that immediately sets blocked; tighten these regexes to
require stronger cues (e.g., include role nouns or punctuation like "you are now
the|an|a <role>" or require "new instructions:" preceded by a directive word)
and/or raise their confidence values, and change the blocking logic so blocked
is set only when multiple signals are present (e.g., require two or more
matching rules or a combined confidence sum threshold)—modify the rules array
entries for type 'PROMPT_INJECTION_ROLE_OVERRIDE' and
'PROMPT_INJECTION_NEW_INSTRUCTIONS' and adjust the code that computes `blocked`
to aggregate matches instead of blocking on a single match (affecting the rule
matching/aggregation logic used elsewhere around the other related rules).
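The multi-signal blocking suggested in that last prompt can be sketched as (rule and finding shapes are assumptions, not the PR's types):

```typescript
// Sketch of aggregated block logic: block only when two or more injection
// rules fire, or a single rule's combined confidence crosses a threshold.
// InjectionFinding and the 1.5 threshold are illustrative assumptions.
interface InjectionFinding {
  type: string;
  confidence: number;
}

function shouldBlock(findings: InjectionFinding[], threshold = 1.5): boolean {
  if (findings.length >= 2) return true; // multiple independent signals
  const combined = findings.reduce((sum, f) => sum + f.confidence, 0);
  return combined >= threshold; // or one overwhelmingly strong signal
}

console.log(shouldBlock([{ type: 'PROMPT_INJECTION_ROLE_OVERRIDE', confidence: 0.7 }])); // false
console.log(
  shouldBlock([
    { type: 'PROMPT_INJECTION_ROLE_OVERRIDE', confidence: 0.7 },
    { type: 'PROMPT_INJECTION_NEW_INSTRUCTIONS', confidence: 0.6 },
  ])
); // true
```

This keeps a lone "you are now" phrase from blocking a legitimate prompt while still stopping prompts that trip several heuristics at once.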


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3dcfd996-01fa-4b9d-b962-beeae9758ba2

📥 Commits

Reviewing files that changed from the base of the PR and between 58ec09a and 3516cfe.

📒 Files selected for processing (12)
  • src/__tests__/main/ipc/handlers/process.test.ts
  • src/__tests__/main/process-manager/handlers/ExitHandler.test.ts
  • src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts
  • src/__tests__/main/security/llm-guard.test.ts
  • src/main/ipc/handlers/process.ts
  • src/main/process-manager/handlers/ExitHandler.ts
  • src/main/process-manager/handlers/StdoutHandler.ts
  • src/main/process-manager/spawners/ChildProcessSpawner.ts
  • src/main/process-manager/types.ts
  • src/main/security/llm-guard/index.ts
  • src/main/security/llm-guard/types.ts
  • src/main/security/llm-guard/vault.ts

Comment on lines +15 to +18
const result = runLlmGuardPre(
  'Contact john@example.com with token ghp_123456789012345678901234567890123456',
  enabledConfig
);

⚠️ Potential issue | 🟠 Major

Avoid PAT-shaped literals in test fixtures.

These ghp_... strings are secret-scanner hits and can fail CI or incident automation even though they're synthetic. Build the token at runtime from split fragments instead of checking the full pattern into the repo, and apply the same cleanup to the other new ghp_ fixtures in this PR.

🧪 Proposed fix
+const githubToken = ['gh', 'p_', '123456789012345678901234567890123456'].join('');
+
 describe('llm guard', () => {
 	it('anonymizes pii and redacts secrets during pre-scan', () => {
 		const result = runLlmGuardPre(
-			'Contact john@example.com with token ghp_123456789012345678901234567890123456',
+			`Contact john@example.com with token ${githubToken}`,
 			enabledConfig
 		);
@@
 	it('deanonymizes vault values and redacts output secrets during post-scan', () => {
 		const result = runLlmGuardPost(
-			'Reach [EMAIL_1] and rotate ghp_123456789012345678901234567890123456',
+			`Reach [EMAIL_1] and rotate ${githubToken}`,
 			{
 				entries: [{ placeholder: '[EMAIL_1]', original: 'john@example.com', type: 'PII_EMAIL' }],
 			},

Also applies to: 37-43

🧰 Tools
🪛 Gitleaks (8.30.0)

[high] 16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.

(github-pat)


Comment on lines +163 to +191
let effectivePrompt = config.prompt;
let llmGuardState: LlmGuardState | undefined;
const llmGuardConfig = normalizeLlmGuardConfig(
  (settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
    | Partial<LlmGuardConfig>
    | undefined) ?? DEFAULT_LLM_GUARD_CONFIG
);

if (config.toolType !== 'terminal' && effectivePrompt) {
  const guardResult = runLlmGuardPre(effectivePrompt, llmGuardConfig);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Input findings detected', 'LLMGuard', {
      sessionId: config.sessionId,
      toolType: config.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    throw new Error(guardResult.blockReason ?? 'Prompt blocked by LLM Guard.');
  }

  effectivePrompt = guardResult.sanitizedPrompt;
  llmGuardState = {
    config: llmGuardConfig,
    vault: guardResult.vault,
    inputFindings: guardResult.findings,
  };
}

⚠️ Potential issue | 🟠 Major

Seed llmGuardState for prompt-less agent spawns.

llmGuardState is only assigned when effectivePrompt is truthy. Non-terminal spawns without an initial prompt therefore reach processManager.spawn() with no guard context to carry into the post-scan path.

💡 Suggested fix
-				let effectivePrompt = config.prompt;
-				let llmGuardState: LlmGuardState | undefined;
 				const llmGuardConfig = normalizeLlmGuardConfig(
 					(settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
 						| Partial<LlmGuardConfig>
 						| undefined) ?? DEFAULT_LLM_GUARD_CONFIG
 				);
+				let effectivePrompt = config.prompt;
+				let llmGuardState: LlmGuardState | undefined =
+					config.toolType !== 'terminal'
+						? {
+								config: llmGuardConfig,
+								vault: { entries: [] },
+								inputFindings: [],
+							}
+						: undefined;

Also applies to: 537-568

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/ipc/handlers/process.ts` around lines 163 - 191, The llmGuardState
is only set when effectivePrompt is truthy, leaving non-terminal spawns without
prompts unseeded; always derive llmGuardConfig via normalizeLlmGuardConfig and,
for non-terminal tool types, call runLlmGuardPre using effectivePrompt || '' (or
an explicit empty string) so you always populate llmGuardState (config, vault,
inputFindings) and still respect findings/blocking logic and sanitizedPrompt
replacement; update the block around runLlmGuardPre and the assignment to
llmGuardState (and mirror the same change in the other occurrence handling lines
for the post-scan path) so processManager.spawn always receives a valid
llmGuardState.
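Both the suggested fix and the prompt above reduce to the same invariant: every non-terminal spawn carries a guard state, prompt or not. A minimal sketch of that invariant, with `LlmGuardState` and the config shape simplified from the PR and `seedGuardState` a hypothetical helper name:

```typescript
interface LlmGuardState {
	config: { enabled: boolean };
	vault: { entries: unknown[] };
	inputFindings: Array<{ type: string }>;
}

// Hypothetical helper: always returns a state for non-terminal
// tools, even when no initial prompt exists, so the post-scan
// path never runs without guard context.
function seedGuardState(
	toolType: string,
	config: { enabled: boolean }
): LlmGuardState | undefined {
	if (toolType === 'terminal') return undefined;
	return { config, vault: { entries: [] }, inputFindings: [] };
}
```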

Comment on lines +33 to +57
private applyOutputGuard(
sessionId: string,
managedProcess: ManagedProcess,
resultText: string
): string {
const guardState = managedProcess.llmGuardState;
if (!guardState?.config?.enabled) {
return resultText;
}

const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
if (guardResult.findings.length > 0) {
logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
sessionId,
toolType: managedProcess.toolType,
findings: guardResult.findings.map((finding) => finding.type),
});
}

if (guardResult.blocked) {
return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
}

return guardResult.sanitizedResponse;
}

⚠️ Potential issue | 🟠 Major

Parse-failure exit paths still emit unguarded output.

The new guard covers parsed results, but the fallbacks at Lines 126-129 and 303-310 still emit remainingLine/jsonBuffer verbatim. A truncated or malformed final payload would bypass redaction at the last emission point.

🔒 Proposed fix
 		} catch {
 			// If parsing fails, emit the raw line as data
-			this.bufferManager.emitDataBuffered(sessionId, remainingLine);
+			this.bufferManager.emitDataBuffered(
+				sessionId,
+				this.applyOutputGuard(sessionId, managedProcess, remainingLine)
+			);
 		}
@@
 		} catch (error) {
 			logger.error('[ProcessManager] Failed to parse JSON response', 'ProcessManager', {
 				sessionId,
 				error: String(error),
 			});
 			// Emit raw buffer as fallback
-			this.emitter.emit('data', sessionId, managedProcess.jsonBuffer!);
+			this.emitter.emit(
+				'data',
+				sessionId,
+				this.applyOutputGuard(sessionId, managedProcess, managedProcess.jsonBuffer!)
+			);
 		}

Also applies to: 120-123, 277-281

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/process-manager/handlers/ExitHandler.ts` around lines 33 - 57, The
parse-failure fallback paths currently emit raw remainingLine/jsonBuffer without
redaction; ensure every emission path passes output through
ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.

Comment on lines +503 to +527
private applyOutputGuard(
sessionId: string,
managedProcess: ManagedProcess,
resultText: string
): string {
const guardState = managedProcess.llmGuardState;
if (!guardState?.config?.enabled) {
return resultText;
}

const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
if (guardResult.findings.length > 0) {
logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
sessionId,
toolType: managedProcess.toolType,
findings: guardResult.findings.map((finding) => finding.type),
});
}

if (guardResult.blocked) {
return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
}

return guardResult.sanitizedResponse;
}

⚠️ Potential issue | 🟠 Major

Non-parser stdout still bypasses LLM Guard.

applyOutputGuard() only wraps structured result emissions. The plain-text branch at Line 132 and the JSON-parse fallback at Line 214 still forward raw output, so malformed or parser-less agent responses can leak secrets/PII without redaction or blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/process-manager/handlers/StdoutHandler.ts` around lines 503 - 527,
The applyOutputGuard logic is only used for structured emissions, so plain-text
and JSON-parse fallback branches still emit raw stdout; update the plain-text
emission branch and the JSON-parse fallback in StdoutHandler.ts to call
applyOutputGuard(sessionId, managedProcess, resultText) (which uses
runLlmGuardPost) before forwarding output, handle a blocked result by
suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.
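The guarded flow the prompt describes can be sketched independently of the handler internals. `GuardResult`, `PostGuard`, and `guardEmission` are stand-in names, not the PR's actual API; the idea is one choke point that every emission path funnels through:

```typescript
interface GuardResult {
	blocked: boolean;
	blockReason?: string;
	sanitizedResponse: string;
	findings: Array<{ type: string }>;
}

// Stand-in for runLlmGuardPost; anything with this shape drops in.
type PostGuard = (text: string) => GuardResult;

// Structured results, plain-text branches, and JSON-parse
// fallbacks all call this before emitting, so raw output can
// never reach the emitter unguarded.
function guardEmission(text: string, guard: PostGuard): string {
	const result = guard(text);
	if (result.blocked) {
		return `[Maestro LLM Guard blocked response] ${result.blockReason ?? 'Sensitive content detected.'}`;
	}
	return result.sanitizedResponse;
}
```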

Comment on lines +57 to +64
type: 'PROMPT_INJECTION_ROLE_OVERRIDE',
regex: /you\s+are\s+now\s+(?:a|an)?/gi,
confidence: 0.84,
},
{
type: 'PROMPT_INJECTION_NEW_INSTRUCTIONS',
regex: /\bnew\s+instructions?\s*:/gi,
confidence: 0.78,

⚠️ Potential issue | 🟠 Major

block mode is too easy to trigger here.

Line 58 makes the role-override pattern match the bare phrase “you are now”, and Line 82's default threshold means a single hit blocks the request. The new instructions: pattern has the same problem. Tighten these patterns or require multiple signals before setting blocked.

Also applies to: 81-82, 142-149

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/security/llm-guard/index.ts` around lines 57 - 64, The current rules
PROMPT_INJECTION_ROLE_OVERRIDE and PROMPT_INJECTION_NEW_INSTRUCTIONS use overly
permissive regexes and a single-hit confidence threshold that immediately sets
blocked; tighten these regexes to require stronger cues (e.g., include role
nouns or punctuation like "you are now the|an|a <role>" or require "new
instructions:" preceded by a directive word) and/or raise their confidence
values, and change the blocking logic so blocked is set only when multiple
signals are present (e.g., require two or more matching rules or a combined
confidence sum threshold)—modify the rules array entries for type
'PROMPT_INJECTION_ROLE_OVERRIDE' and 'PROMPT_INJECTION_NEW_INSTRUCTIONS' and
adjust the code that computes `blocked` to aggregate matches instead of blocking
on a single match (affecting the rule matching/aggregation logic used elsewhere
around the other related rules).
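One way to aggregate instead of blocking on a single hit, sketched with illustrative rules and a hypothetical combined-confidence threshold (the tightened role-override regex here requires a role noun after the phrase; the values are not the PR's actual numbers):

```typescript
interface InjectionRule {
	type: string;
	regex: RegExp;
	confidence: number;
}

const rules: InjectionRule[] = [
	// Require an article plus a role word after "you are now" so
	// the bare phrase alone no longer matches.
	{
		type: 'PROMPT_INJECTION_ROLE_OVERRIDE',
		regex: /you\s+are\s+now\s+(?:a|an|the)\s+\w+/gi,
		confidence: 0.84,
	},
	{
		type: 'PROMPT_INJECTION_NEW_INSTRUCTIONS',
		regex: /\bnew\s+instructions?\s*:/gi,
		confidence: 0.78,
	},
	{
		type: 'PROMPT_INJECTION_IGNORE_PREVIOUS',
		regex: /ignore\s+(?:all\s+)?previous\s+instructions/gi,
		confidence: 0.9,
	},
];

// Block only when two or more rules fire, or when the summed
// confidence of the matches clears a higher combined bar.
function shouldBlock(prompt: string, combinedThreshold = 1.5): boolean {
	const hits = rules.filter((rule) => {
		rule.regex.lastIndex = 0; // reset state left over from the g flag
		return rule.regex.test(prompt);
	});
	const total = hits.reduce((sum, rule) => sum + rule.confidence, 0);
	return hits.length >= 2 || total >= combinedThreshold;
}
```

A single match like “you are now a reviewer” warns but no longer blocks on its own; stacking it with an “ignore previous instructions” cue does.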

Comment on lines +273 to +279
const piiPatterns = [
{ type: 'PII_EMAIL', regex: EMAIL_REGEX, confidence: 0.99 },
{ type: 'PII_PHONE', regex: PHONE_REGEX, confidence: 0.92 },
{ type: 'PII_SSN', regex: SSN_REGEX, confidence: 0.97 },
{ type: 'PII_IP_ADDRESS', regex: IPV4_REGEX, confidence: 0.88 },
{ type: 'PII_CREDIT_CARD', regex: CREDIT_CARD_REGEX, confidence: 0.75 },
];

⚠️ Potential issue | 🟠 Major

Keep post-scan PII coverage in sync with pre-scan coverage.

anonymizePii() handles PII_IP_ADDRESS and PII_CREDIT_CARD, but detectPiiLeakage() only rescans email/phone/SSN. In action: 'block' mode, those two classes can still pass through unchecked. Reuse a shared PII pattern list, including the Luhn filter for credit cards, on both paths.

Also applies to: 309-315

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/security/llm-guard/index.ts` around lines 273 - 279, The pre-scan
and post-scan PII pattern sets are out of sync: anonymizePii() currently handles
PII_IP_ADDRESS and PII_CREDIT_CARD (with Luhn validation) but detectPiiLeakage()
only re-scans email/phone/SSN; unify them by extracting a shared piiPatterns
array (including PII_EMAIL, PII_PHONE, PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD)
and reference it from both anonymizePii() and detectPiiLeakage(), and ensure the
PII_CREDIT_CARD entry includes the Luhn filter logic so credit cards are
validated consistently on both paths (also update the rescanning branch used
when action === 'block' to use the shared list).
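A sketch of the shared-list approach, with the Luhn checksum applied as a per-pattern filter on credit-card candidates. The regexes here are simplified stand-ins for the module's actual EMAIL/SSN/credit-card patterns:

```typescript
// Luhn checksum: double every second digit from the right,
// subtract 9 from anything over 9, and require the digit sum
// to be a multiple of 10.
function passesLuhn(candidate: string): boolean {
	const digits = candidate.replace(/[\s-]/g, '');
	let sum = 0;
	for (let i = 0; i < digits.length; i++) {
		let d = Number(digits[digits.length - 1 - i]);
		if (i % 2 === 1) {
			d *= 2;
			if (d > 9) d -= 9;
		}
		sum += d;
	}
	return digits.length >= 13 && sum % 10 === 0;
}

interface PiiPattern {
	type: string;
	regex: RegExp;
	filter?: (match: string) => boolean;
}

// One list, referenced by both anonymizePii() and
// detectPiiLeakage(), so pre- and post-scan coverage cannot drift.
const sharedPiiPatterns: PiiPattern[] = [
	{ type: 'PII_EMAIL', regex: /[\w.+-]+@[\w-]+\.[\w.]+/g },
	{ type: 'PII_SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
	{
		type: 'PII_CREDIT_CARD',
		regex: /\b(?:\d[ -]?){13,19}\b/g,
		filter: passesLuhn,
	},
];

function detect(text: string): string[] {
	const found: string[] = [];
	for (const pattern of sharedPiiPatterns) {
		for (const match of text.matchAll(pattern.regex)) {
			if (!pattern.filter || pattern.filter(match[0])) {
				found.push(pattern.type);
			}
		}
	}
	return found;
}
```

With the filter on the shared entry, a digit run that fails Luhn is ignored on both the anonymize and the leakage-detection path, instead of only on one.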


Successfully merging this pull request may close these issues.

Feature Request: TypeScript-Native LLM Guard for Input/Output Protection
