feat: add native llm guard pipeline #523
jeffscottward wants to merge 3 commits into RunMaestro:main from
Conversation
📝 Walkthrough

This PR implements a TypeScript-native LLM Guard security module for protecting LLM prompts and responses. It detects and handles PII, secrets, and prompt injection threats through configurable pre-processing and post-processing pipelines integrated into the process spawning and output handling flows.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ProcessHandler as Process Handler
    participant LLMGuardPre as LLMGuard Pre
    participant ProcessMgr as Process Manager
    participant LLMGuardPost as LLMGuard Post
    participant Output
    Client->>ProcessHandler: spawn(config with prompt)
    ProcessHandler->>LLMGuardPre: runLlmGuardPre(prompt)
    LLMGuardPre->>LLMGuardPre: Detect PII/Secrets<br/>Anonymize & Vault<br/>Check Prompt Injection
    LLMGuardPre-->>ProcessHandler: {sanitizedPrompt, vault, findings, blocked?}
    alt Guard Blocked
        ProcessHandler-->>Client: Error (blocked)
    else Guard Allowed
        ProcessHandler->>ProcessHandler: Log findings if any
        ProcessHandler->>ProcessMgr: spawn(sanitizedPrompt, llmGuardState)
        ProcessMgr->>ProcessMgr: Execute process
        ProcessMgr-->>ProcessHandler: response
        ProcessHandler->>LLMGuardPost: runLlmGuardPost(response, vault)
        LLMGuardPost->>LLMGuardPost: Deanonymize PII<br/>Redact Secrets<br/>Detect Leakage
        LLMGuardPost-->>ProcessHandler: {sanitizedResponse, findings, blocked?}
        alt Post Guard Blocked
            ProcessHandler-->>Output: Blocked message
        else Post Guard Allowed
            ProcessHandler-->>Output: Sanitized response
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed (5 of 5)
Greptile Summary

This PR introduces a native TypeScript LLM guard pipeline that runs before process spawn (pre-scan: secret redaction, PII anonymization, prompt-injection detection) and after result emission (post-scan: PII deanonymization, output secret redaction, PII-leakage detection). Guard state — including the PII vault and input findings — is threaded through.

Key findings:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant R as Renderer (IPC)
    participant PH as process.ts handler
    participant LG as llm-guard (pre)
    participant V as PiiVault
    participant PM as ProcessManager.spawn
    participant CP as ChildProcess
    participant SH as StdoutHandler
    participant LGO as llm-guard (post)
    participant BM as BufferManager
    R->>PH: process:spawn { prompt, toolType, ... }
    PH->>PH: normalizeLlmGuardConfig(settingsStore.get)
    alt toolType !== 'terminal' && prompt exists
        PH->>LG: runLlmGuardPre(prompt, config)
        LG->>LG: redactSecrets(prompt)
        LG->>V: anonymizePii(sanitized, vault)
        V-->>LG: placeholder mappings stored
        LG->>LG: detectPromptInjection(prompt)
        LG-->>PH: { sanitizedPrompt, vault, findings, blocked }
        alt blocked === true
            PH-->>R: throw Error (blocked by guard)
        end
        PH->>PH: effectivePrompt = sanitizedPrompt
        PH->>PH: llmGuardState = { config, vault, inputFindings }
    end
    PH->>PM: spawn({ ...config, prompt: effectivePrompt, llmGuardState })
    PM->>CP: spawn child process
    CP-->>SH: stdout data
    SH->>SH: parse JSON lines
    alt result message
        SH->>LGO: applyOutputGuard(sessionId, managedProcess, resultText)
        LGO->>LGO: PiiVault.deanonymize(text, vault)
        LGO->>LGO: redactSecrets(deanonymized)
        LGO->>LGO: detectPiiLeakage(redacted, vault)
        LGO-->>SH: guardedText (or block sentinel)
        SH->>BM: emitDataBuffered(sessionId, guardedText)
    end
```

Last reviewed commit: 5318d36
```typescript
function applyReplacements(
  text: string,
  findings: LlmGuardFinding[],
  replacementBuilder: (finding: LlmGuardFinding, index: number) => string
): { text: string; findings: LlmGuardFinding[] } {
  const sortedFindings = [...findings].sort((a, b) => b.start - a.start);
  let nextText = text;

  sortedFindings.forEach((finding, reverseIndex) => {
    const index = sortedFindings.length - reverseIndex;
    const replacement = replacementBuilder(finding, index);
    nextText =
      nextText.slice(0, finding.start) + replacement + nextText.slice(finding.end);
    finding.replacement = replacement;
  });

  return {
    text: nextText,
    findings: sortedFindings.sort((a, b) => a.start - b.start),
  };
}
```
Overlapping findings corrupt output text
applyReplacements sorts findings by start descending and applies replacements right-to-left, relying on the invariant that each replacement operates on the original text's positions. However, when two patterns independently match overlapping spans (e.g. SECRET_CONNECTION_STRING matching postgres://sk-abc...@host/db while SECRET_OPENAI_KEY also matches sk-abc... inside that same range), both findings are collected and passed to applyReplacements together. The first replacement modifies the string in-place; the second then slices using stale start/end indices from the original text, producing garbled output — characters from the first replacement will be duplicated or overwritten.
This affects both redactSecrets (multiple secret patterns run on the same text) and anonymizePii (multiple PII patterns run on the same text, e.g. a credit card number overlapping a phone number via the very broad CREDIT_CARD_REGEX).
To fix, deduplicate/merge overlapping findings before applying replacements — keep only the longest (outermost) match when two findings overlap:
```typescript
function mergeOverlapping(findings: LlmGuardFinding[]): LlmGuardFinding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: LlmGuardFinding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    if (last && f.start < last.end) continue; // skip overlapping/contained finding
    merged.push(f);
  }
  return merged;
}
```

Then call `applyReplacements(text, mergeOverlapping(findings), ...)` in both `redactSecrets` and `anonymizePii`.
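To see why the merge is necessary and that it behaves as intended, the step can be exercised standalone. The `Finding` shape below is a simplified stand-in for the PR's `LlmGuardFinding`:

```typescript
// Minimal stand-in for the PR's LlmGuardFinding (hypothetical simplification).
interface Finding {
  type: string;
  start: number;
  end: number;
}

// Keep only the outermost finding when spans overlap, per the suggested fix.
function mergeOverlapping(findings: Finding[]): Finding[] {
  const sorted = [...findings].sort((a, b) => a.start - b.start || b.end - a.end);
  const merged: Finding[] = [];
  for (const f of sorted) {
    const last = merged[merged.length - 1];
    // A finding starting before the previous one ends is contained or overlapping.
    if (last && f.start < last.end) continue;
    merged.push(f);
  }
  return merged;
}

// Example: a connection-string match that fully contains an API-key match.
const findings: Finding[] = [
  { type: 'SECRET_OPENAI_KEY', start: 22, end: 45 },
  { type: 'SECRET_CONNECTION_STRING', start: 10, end: 60 },
];
const kept = mergeOverlapping(findings);
// Only the outer SECRET_CONNECTION_STRING survives, so applyReplacements
// never slices with stale indices.
```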
```diff
@@ -0,0 +1,59 @@
+export type LlmGuardAction = 'warn' | 'sanitize' | 'block';
```
warn action declared but not implemented
LlmGuardAction includes 'warn' as a valid value, but neither runLlmGuardPre nor runLlmGuardPost have any logic branch for action === 'warn'. At runtime, 'warn' silently behaves identically to 'sanitize' — secrets are still redacted and PII is anonymized, but no warning is surfaced to the caller. If 'warn' is meant to detect-and-log without modifying the prompt, the implementation needs an explicit branch. If it's not yet supported, it should be removed from the union type to prevent callers from configuring a non-functional mode.
```diff
-export type LlmGuardAction = 'warn' | 'sanitize' | 'block';
+export type LlmGuardAction = 'sanitize' | 'block';
```
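Alternatively, if `'warn'` is meant to be detect-and-log, the missing branch could take roughly this shape. The `applyAction` helper and result type here are hypothetical illustrations, not the PR's actual API:

```typescript
type LlmGuardAction = 'warn' | 'sanitize' | 'block';

// Hypothetical result shape mirroring the guard's pre-scan output.
interface GuardResult {
  sanitizedPrompt: string;
  findings: string[];
  blocked: boolean;
}

// 'warn' surfaces findings but leaves the prompt untouched;
// 'sanitize' swaps in the redacted text; 'block' blocks on any finding.
function applyAction(
  action: LlmGuardAction,
  prompt: string,
  sanitized: string,
  findings: string[]
): GuardResult {
  switch (action) {
    case 'warn':
      return { sanitizedPrompt: prompt, findings, blocked: false };
    case 'sanitize':
      return { sanitizedPrompt: sanitized, findings, blocked: false };
    case 'block':
      return { sanitizedPrompt: prompt, findings, blocked: findings.length > 0 };
  }
}
```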
```typescript
const PHONE_REGEX = /\b(?:\+?1[-.\s]?)?(?:\(?\d{3}\)?[-.\s]?){2}\d{4}\b/g;
const SSN_REGEX = /\b\d{3}-\d{2}-\d{4}\b/g;
const IPV4_REGEX = /\b(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}\b/g;
const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;
```
CREDIT_CARD_REGEX is extremely broad and will produce many false positives
/\b(?:\d[ -]*?){13,19}\b/g matches virtually any 13–19 digit sequence separated by optional spaces or hyphens. In typical developer prompts, this pattern will fire on long numeric identifiers, timestamps, hex values rendered as decimals, issue numbers, etc. Even with the Luhn check (which passes for ~10% of random digit strings), the false-positive rate in developer-oriented text is expected to be very high.
A narrower pattern that requires the common four-group card format would significantly reduce false positives:
```diff
-const CREDIT_CARD_REGEX = /\b(?:\d[ -]*?){13,19}\b/g;
+const CREDIT_CARD_REGEX = /\b(?:4\d{3}|5[1-5]\d{2}|3[47]\d{2}|6(?:011|5\d{2}))[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g;
```
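For reference, the Luhn filter mentioned above is straightforward; this is a generic sketch of the standard algorithm, not necessarily the PR's implementation:

```typescript
// Standard Luhn checksum: walking right to left, double every second digit,
// subtract 9 when the doubled value exceeds 9, and require sum % 10 === 0.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, '');
  if (!/^\d{13,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

Even so, roughly one in ten random digit strings of a given length passes Luhn, which is why narrowing the regex itself matters.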
```typescript
},
{
  type: 'SECRET_OPENAI_KEY',
  regex: /\bsk-[A-Za-z0-9]{20,}\b/g,
```
SECRET_OPENAI_KEY pattern is too permissive
/\bsk-[A-Za-z0-9]{20,}\b/g will match any identifier starting with sk- followed by 20+ alphanumeric characters. The sk- prefix is widely used across many APIs and internal identifiers (e.g. sk-client-prod-identifier-12345abc…). Modern OpenAI API keys use more specific prefixes (sk-proj-, sk-svcacct-) that could be matched more precisely, and the legacy sk- format has a specific length (51 total characters). Tightening the pattern reduces false positives that could silently redact non-sensitive identifiers from prompts:
```diff
-  regex: /\bsk-[A-Za-z0-9]{20,}\b/g,
+  {
+    type: 'SECRET_OPENAI_KEY',
+    regex: /\bsk-(?:proj-|svcacct-|org-)?[A-Za-z0-9]{20,}T3BlbkFJ[A-Za-z0-9]{20,}\b|\bsk-[A-Za-z0-9]{48}\b/g,
+    confidence: 0.96,
+  },
```
Alternatively, if false positives are acceptable, at minimum raise the length threshold and require the sk- to appear in an API-key context (after =, :, etc.) to reduce ambient matching.
```typescript
let blockReason: string | undefined;

if (effectiveConfig.input.detectPromptInjection) {
  const promptInjectionFindings = detectPromptInjection(prompt);
```
Prompt-injection detection runs on original prompt, not sanitized text
detectPromptInjection(prompt) uses the original prompt variable rather than sanitizedPrompt (which has already had secrets redacted and PII anonymized). While this doesn't cause incorrect blocking behavior (only confidence scores drive the block decision), it means the findings array returned from runLlmGuardPre contains injection findings whose start/end positions point into the original text, inconsistent with the positions of the secret/PII findings which point into the sanitized text. If callers ever use these positions to highlight findings in the UI, this will produce misaligned highlights.
```diff
-  const promptInjectionFindings = detectPromptInjection(prompt);
+  const promptInjectionFindings = detectPromptInjection(sanitizedPrompt);
```
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/ipc/handlers/process.ts (1)
132-162: ⚠️ Potential issue | 🟠 Major

Don't log the raw prompt before the pre-scan.

This branch still builds `promptPreview` from `config.prompt`, so Windows debug logs can persist secrets/PII even when LLM Guard is enabled. Move that preview below `runLlmGuardPre()` and derive it from `effectivePrompt`, or keep this log to metadata only.

Also applies to: 171-191
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/ipc/handlers/process.ts` around lines 132 - 162, The current log block builds and logs promptPreview from config.prompt (via promptPreview) before running runLlmGuardPre(), potentially leaking secrets/PII; move the prompt-preview construction and logging to after runLlmGuardPre() and derive it from effectivePrompt (or remove the preview) so that LLM guard has already sanitized/approved the prompt; update references around logFn and sessionSshRemoteConfig accordingly to only include prompt metadata before the pre-scan and the actual preview from effectivePrompt after runLlmGuardPre().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/__tests__/main/security/llm-guard.test.ts`:
- Around line 15-18: The test uses a literal PAT-looking string in
runLlmGuardPre which trips secret scanners; change the fixture to build the
token at runtime from concatenated fragments (e.g., const part1 = 'ghp_'; const
part2 = '12345' + '67890'...; const token = part1 + part2) and pass that token
into runLlmGuardPre instead of embedding "ghp_..."; apply the same
fragment-concatenation approach to the other ghp_ fixtures referenced around
lines 37-43 so no full PAT-shaped literal remains in the test file.
In `@src/main/ipc/handlers/process.ts`:
- Around line 163-191: The llmGuardState is only set when effectivePrompt is
truthy, leaving non-terminal spawns without prompts unseeded; always derive
llmGuardConfig via normalizeLlmGuardConfig and, for non-terminal tool types,
call runLlmGuardPre using effectivePrompt || '' (or an explicit empty string) so
you always populate llmGuardState (config, vault, inputFindings) and still
respect findings/blocking logic and sanitizedPrompt replacement; update the
block around runLlmGuardPre and the assignment to llmGuardState (and mirror the
same change in the other occurrence handling lines for the post-scan path) so
processManager.spawn always receives a valid llmGuardState.
In `@src/main/process-manager/handlers/ExitHandler.ts`:
- Around line 33-57: The parse-failure fallback paths currently emit raw
remainingLine/jsonBuffer without redaction; ensure every emission path passes
output through ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.
In `@src/main/process-manager/handlers/StdoutHandler.ts`:
- Around line 503-527: The applyOutputGuard logic is only used for structured
emissions, so plain-text and JSON-parse fallback branches still emit raw stdout;
update the plain-text emission branch and the JSON-parse fallback in
StdoutHandler.ts to call applyOutputGuard(sessionId, managedProcess, resultText)
(which uses runLlmGuardPost) before forwarding output, handle a blocked result
by suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.
In `@src/main/security/llm-guard/index.ts`:
- Around line 273-279: The pre-scan and post-scan PII pattern sets are out of
sync: anonymizePii() currently handles PII_IP_ADDRESS and PII_CREDIT_CARD (with
Luhn validation) but detectPiiLeakage() only re-scans email/phone/SSN; unify
them by extracting a shared piiPatterns array (including PII_EMAIL, PII_PHONE,
PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD) and reference it from both
anonymizePii() and detectPiiLeakage(), and ensure the PII_CREDIT_CARD entry
includes the Luhn filter logic so credit cards are validated consistently on
both paths (also update the rescanning branch used when action === 'block' to
use the shared list).
- Around line 57-64: The current rules PROMPT_INJECTION_ROLE_OVERRIDE and
PROMPT_INJECTION_NEW_INSTRUCTIONS use overly permissive regexes and a single-hit
confidence threshold that immediately sets blocked; tighten these regexes to
require stronger cues (e.g., include role nouns or punctuation like "you are now
the|an|a <role>" or require "new instructions:" preceded by a directive word)
and/or raise their confidence values, and change the blocking logic so blocked
is set only when multiple signals are present (e.g., require two or more
matching rules or a combined confidence sum threshold)—modify the rules array
entries for type 'PROMPT_INJECTION_ROLE_OVERRIDE' and
'PROMPT_INJECTION_NEW_INSTRUCTIONS' and adjust the code that computes `blocked`
to aggregate matches instead of blocking on a single match (affecting the rule
matching/aggregation logic used elsewhere around the other related rules).
---
Outside diff comments:
In `@src/main/ipc/handlers/process.ts`:
- Around line 132-162: The current log block builds and logs promptPreview from
config.prompt (via promptPreview) before running runLlmGuardPre(), potentially
leaking secrets/PII; move the prompt-preview construction and logging to after
runLlmGuardPre() and derive it from effectivePrompt (or remove the preview) so
that LLM guard has already sanitized/approved the prompt; update references
around logFn and sessionSshRemoteConfig accordingly to only include prompt
metadata before the pre-scan and the actual preview from effectivePrompt after
runLlmGuardPre().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3dcfd996-01fa-4b9d-b962-beeae9758ba2
📒 Files selected for processing (12)
- src/__tests__/main/ipc/handlers/process.test.ts
- src/__tests__/main/process-manager/handlers/ExitHandler.test.ts
- src/__tests__/main/process-manager/handlers/StdoutHandler.test.ts
- src/__tests__/main/security/llm-guard.test.ts
- src/main/ipc/handlers/process.ts
- src/main/process-manager/handlers/ExitHandler.ts
- src/main/process-manager/handlers/StdoutHandler.ts
- src/main/process-manager/spawners/ChildProcessSpawner.ts
- src/main/process-manager/types.ts
- src/main/security/llm-guard/index.ts
- src/main/security/llm-guard/types.ts
- src/main/security/llm-guard/vault.ts
```typescript
const result = runLlmGuardPre(
  'Contact john@example.com with token ghp_123456789012345678901234567890123456',
  enabledConfig
);
```
Avoid PAT-shaped literals in test fixtures.
These ghp_... strings are secret-scanner hits and can fail CI or incident automation even though they're synthetic. Build the token at runtime from split fragments instead of checking the full pattern into the repo, and apply the same cleanup to the other new ghp_ fixtures in this PR.
🧪 Proposed fix

```diff
+const githubToken = ['gh', 'p_', '123456789012345678901234567890123456'].join('');
+
 describe('llm guard', () => {
   it('anonymizes pii and redacts secrets during pre-scan', () => {
     const result = runLlmGuardPre(
-      'Contact john@example.com with token ghp_123456789012345678901234567890123456',
+      `Contact john@example.com with token ${githubToken}`,
       enabledConfig
     );
@@
   it('deanonymizes vault values and redacts output secrets during post-scan', () => {
     const result = runLlmGuardPost(
-      'Reach [EMAIL_1] and rotate ghp_123456789012345678901234567890123456',
+      `Reach [EMAIL_1] and rotate ${githubToken}`,
       {
         entries: [{ placeholder: '[EMAIL_1]', original: 'john@example.com', type: 'PII_EMAIL' }],
       },
```

Also applies to: 37-43
🧰 Tools
🪛 Gitleaks (8.30.0)
[high] 16-16: Uncovered a GitHub Personal Access Token, potentially leading to unauthorized repository access and sensitive content exposure.
(github-pat)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/__tests__/main/security/llm-guard.test.ts` around lines 15 - 18, The test
uses a literal PAT-looking string in runLlmGuardPre which trips secret scanners;
change the fixture to build the token at runtime from concatenated fragments
(e.g., const part1 = 'ghp_'; const part2 = '12345' + '67890'...; const token =
part1 + part2) and pass that token into runLlmGuardPre instead of embedding
"ghp_..."; apply the same fragment-concatenation approach to the other ghp_
fixtures referenced around lines 37-43 so no full PAT-shaped literal remains in
the test file.
```typescript
let effectivePrompt = config.prompt;
let llmGuardState: LlmGuardState | undefined;
const llmGuardConfig = normalizeLlmGuardConfig(
  (settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
    | Partial<LlmGuardConfig>
    | undefined) ?? DEFAULT_LLM_GUARD_CONFIG
);

if (config.toolType !== 'terminal' && effectivePrompt) {
  const guardResult = runLlmGuardPre(effectivePrompt, llmGuardConfig);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Input findings detected', 'LLMGuard', {
      sessionId: config.sessionId,
      toolType: config.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    throw new Error(guardResult.blockReason ?? 'Prompt blocked by LLM Guard.');
  }

  effectivePrompt = guardResult.sanitizedPrompt;
  llmGuardState = {
    config: llmGuardConfig,
    vault: guardResult.vault,
    inputFindings: guardResult.findings,
  };
}
```
Seed llmGuardState for prompt-less agent spawns.
llmGuardState is only assigned when effectivePrompt is truthy. Non-terminal spawns without an initial prompt therefore reach processManager.spawn() with no guard context to carry into the post-scan path.
💡 Suggested fix

```diff
-  let effectivePrompt = config.prompt;
-  let llmGuardState: LlmGuardState | undefined;
   const llmGuardConfig = normalizeLlmGuardConfig(
     (settingsStore.get('llmGuardConfig', DEFAULT_LLM_GUARD_CONFIG) as
       | Partial<LlmGuardConfig>
       | undefined) ?? DEFAULT_LLM_GUARD_CONFIG
   );
+  let effectivePrompt = config.prompt;
+  let llmGuardState: LlmGuardState | undefined =
+    config.toolType !== 'terminal'
+      ? {
+          config: llmGuardConfig,
+          vault: { entries: [] },
+          inputFindings: [],
+        }
+      : undefined;
```

Also applies to: 537-568
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/ipc/handlers/process.ts` around lines 163 - 191, The llmGuardState
is only set when effectivePrompt is truthy, leaving non-terminal spawns without
prompts unseeded; always derive llmGuardConfig via normalizeLlmGuardConfig and,
for non-terminal tool types, call runLlmGuardPre using effectivePrompt || '' (or
an explicit empty string) so you always populate llmGuardState (config, vault,
inputFindings) and still respect findings/blocking logic and sanitizedPrompt
replacement; update the block around runLlmGuardPre and the assignment to
llmGuardState (and mirror the same change in the other occurrence handling lines
for the post-scan path) so processManager.spawn always receives a valid
llmGuardState.
```typescript
private applyOutputGuard(
  sessionId: string,
  managedProcess: ManagedProcess,
  resultText: string
): string {
  const guardState = managedProcess.llmGuardState;
  if (!guardState?.config?.enabled) {
    return resultText;
  }

  const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
      sessionId,
      toolType: managedProcess.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
  }

  return guardResult.sanitizedResponse;
}
```
Parse-failure exit paths still emit unguarded output.
The new guard covers parsed results, but the fallbacks at Lines 126-129 and 303-310 still emit remainingLine/jsonBuffer verbatim. A truncated or malformed final payload would bypass redaction at the last emission point.
🔒 Proposed fix

```diff
       } catch {
         // If parsing fails, emit the raw line as data
-        this.bufferManager.emitDataBuffered(sessionId, remainingLine);
+        this.bufferManager.emitDataBuffered(
+          sessionId,
+          this.applyOutputGuard(sessionId, managedProcess, remainingLine)
+        );
       }
@@
       } catch (error) {
         logger.error('[ProcessManager] Failed to parse JSON response', 'ProcessManager', {
           sessionId,
           error: String(error),
         });
         // Emit raw buffer as fallback
-        this.emitter.emit('data', sessionId, managedProcess.jsonBuffer!);
+        this.emitter.emit(
+          'data',
+          sessionId,
+          this.applyOutputGuard(sessionId, managedProcess, managedProcess.jsonBuffer!)
+        );
       }
```

Also applies to: 120-123, 277-281
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/process-manager/handlers/ExitHandler.ts` around lines 33 - 57, The
parse-failure fallback paths currently emit raw remainingLine/jsonBuffer without
redaction; ensure every emission path passes output through
ExitHandler.applyOutputGuard before sending/logging. Locate the
parse-failure/fallback branches that emit remainingLine or jsonBuffer (the
sections invoking unguarded emission in the same class that reference
ManagedProcess and runLlmGuardPost) and replace direct emissions with a call to
this.applyOutputGuard(sessionId, managedProcess, <payload>) and then emit the
guarded string; keep existing warning/blocked handling from applyOutputGuard so
blocked responses are handled consistently.
```typescript
private applyOutputGuard(
  sessionId: string,
  managedProcess: ManagedProcess,
  resultText: string
): string {
  const guardState = managedProcess.llmGuardState;
  if (!guardState?.config?.enabled) {
    return resultText;
  }

  const guardResult = runLlmGuardPost(resultText, guardState.vault, guardState.config);
  if (guardResult.findings.length > 0) {
    logger.warn('[LLMGuard] Output findings detected', 'LLMGuard', {
      sessionId,
      toolType: managedProcess.toolType,
      findings: guardResult.findings.map((finding) => finding.type),
    });
  }

  if (guardResult.blocked) {
    return `[Maestro LLM Guard blocked response] ${guardResult.blockReason ?? 'Sensitive content detected.'}`;
  }

  return guardResult.sanitizedResponse;
}
```
Non-parser stdout still bypasses LLM Guard.
applyOutputGuard() only wraps structured result emissions. The plain-text branch at Line 132 and the JSON-parse fallback at Line 214 still forward raw output, so malformed or parser-less agent responses can leak secrets/PII without redaction or blocking.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/process-manager/handlers/StdoutHandler.ts` around lines 503 - 527,
The applyOutputGuard logic is only used for structured emissions, so plain-text
and JSON-parse fallback branches still emit raw stdout; update the plain-text
emission branch and the JSON-parse fallback in StdoutHandler.ts to call
applyOutputGuard(sessionId, managedProcess, resultText) (which uses
runLlmGuardPost) before forwarding output, handle a blocked result by
suppressing or replacing the emission with the guard block message, and emit
guardResult.sanitizedResponse instead of the original resultText when not
blocked. Locate the plain-text branch and the JSON-parse fallback in
StdoutHandler and replace raw forwards with the guarded flow so all stdout paths
are redacted/blocked consistently.
```typescript
  type: 'PROMPT_INJECTION_ROLE_OVERRIDE',
  regex: /you\s+are\s+now\s+(?:a|an)?/gi,
  confidence: 0.84,
},
{
  type: 'PROMPT_INJECTION_NEW_INSTRUCTIONS',
  regex: /\bnew\s+instructions?\s*:/gi,
  confidence: 0.78,
```
`block` mode is too easy to trigger here.

Line 58 makes the role-override pattern match bare "you are now", and Line 82's default threshold means that single hit blocks the request. `new instructions:` has the same problem. Tighten these patterns or require multiple signals before setting `blocked`.
Also applies to: 81-82, 142-149
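One way to require multiple signals before blocking, sketched with hypothetical names (the real rule and threshold wiring live in `index.ts`, and the `1.5` combined-confidence threshold here is an illustrative assumption):

```typescript
// Hypothetical shape of a matched injection rule.
interface RuleHit {
  type: string;
  confidence: number;
}

// Block only when two or more rules fire, or when the summed confidence
// of the matches crosses a combined threshold.
function shouldBlock(hits: RuleHit[], combinedThreshold = 1.5): boolean {
  if (hits.length >= 2) return true;
  const total = hits.reduce((sum, h) => sum + h.confidence, 0);
  return total >= combinedThreshold;
}

// A lone "you are now" match (confidence 0.84) no longer blocks,
// but role-override plus new-instructions together still does.
```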
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/security/llm-guard/index.ts` around lines 57 - 64, The current rules
PROMPT_INJECTION_ROLE_OVERRIDE and PROMPT_INJECTION_NEW_INSTRUCTIONS use overly
permissive regexes and a single-hit confidence threshold that immediately sets
blocked; tighten these regexes to require stronger cues (e.g., include role
nouns or punctuation like "you are now the|an|a <role>" or require "new
instructions:" preceded by a directive word) and/or raise their confidence
values, and change the blocking logic so blocked is set only when multiple
signals are present (e.g., require two or more matching rules or a combined
confidence sum threshold)—modify the rules array entries for type
'PROMPT_INJECTION_ROLE_OVERRIDE' and 'PROMPT_INJECTION_NEW_INSTRUCTIONS' and
adjust the code that computes `blocked` to aggregate matches instead of blocking
on a single match (affecting the rule matching/aggregation logic used elsewhere
around the other related rules).
```typescript
const piiPatterns = [
  { type: 'PII_EMAIL', regex: EMAIL_REGEX, confidence: 0.99 },
  { type: 'PII_PHONE', regex: PHONE_REGEX, confidence: 0.92 },
  { type: 'PII_SSN', regex: SSN_REGEX, confidence: 0.97 },
  { type: 'PII_IP_ADDRESS', regex: IPV4_REGEX, confidence: 0.88 },
  { type: 'PII_CREDIT_CARD', regex: CREDIT_CARD_REGEX, confidence: 0.75 },
];
```
Keep post-scan PII coverage in sync with pre-scan coverage.
anonymizePii() handles PII_IP_ADDRESS and PII_CREDIT_CARD, but detectPiiLeakage() only rescans email/phone/SSN. In action: 'block' mode those two classes can still leave the system unchecked. Reuse a shared PII pattern list, including the Luhn filter for credit cards, on both paths.
Also applies to: 309-315
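A shared list could look roughly like this. The regexes below are simplified stand-ins for the module's actual `EMAIL_REGEX`/`SSN_REGEX`/`CREDIT_CARD_REGEX` constants, and `scan` only illustrates how both paths would consume one list:

```typescript
// Hypothetical shared pattern entry; `validate` carries per-type filters
// such as the Luhn check for credit cards.
interface PiiPattern {
  type: string;
  regex: RegExp;
  confidence: number;
  validate?: (match: string) => boolean;
}

// Generic Luhn checksum used to filter credit-card candidates.
function luhn(candidate: string): boolean {
  const digits = candidate.replace(/[ -]/g, '');
  if (digits.length < 13) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) { d *= 2; if (d > 9) d -= 9; }
    sum += d;
  }
  return sum % 10 === 0;
}

// Simplified stand-in patterns; the real module's regexes would go here.
const SHARED_PII_PATTERNS: PiiPattern[] = [
  { type: 'PII_EMAIL', regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, confidence: 0.99 },
  { type: 'PII_SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g, confidence: 0.97 },
  { type: 'PII_CREDIT_CARD', regex: /\b(?:\d[ -]?){13,19}\b/g, confidence: 0.75, validate: luhn },
];

// Both anonymizePii() and detectPiiLeakage() would iterate this one list,
// so pre-scan and post-scan coverage can no longer drift apart.
function scan(text: string): string[] {
  const types: string[] = [];
  for (const p of SHARED_PII_PATTERNS) {
    for (const m of text.matchAll(p.regex)) {
      if (!p.validate || p.validate(m[0])) types.push(p.type);
    }
  }
  return types;
}
```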
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/main/security/llm-guard/index.ts` around lines 273 - 279, The pre-scan
and post-scan PII pattern sets are out of sync: anonymizePii() currently handles
PII_IP_ADDRESS and PII_CREDIT_CARD (with Luhn validation) but detectPiiLeakage()
only re-scans email/phone/SSN; unify them by extracting a shared piiPatterns
array (including PII_EMAIL, PII_PHONE, PII_SSN, PII_IP_ADDRESS, PII_CREDIT_CARD)
and reference it from both anonymizePii() and detectPiiLeakage(), and ensure the
PII_CREDIT_CARD entry includes the Luhn filter logic so credit cards are
validated consistently on both paths (also update the rescanning branch used
when action === 'block' to use the shared list).
Summary
Testing
Closes #522
Summary by CodeRabbit
Release Notes
New Features
Tests