Summary
Build a TypeScript-native security scanning library (inspired by LLM Guard) that Maestro can use directly to provide input sanitization, output validation, PII protection, prompt injection detection, and secrets scanning across all managed AI sessions — with zero Python dependency.
Background
LLM Guard by Protect AI provides excellent security primitives for LLMs, but it's Python-only. Rather than running a Python sidecar, we should replicate the core scanning capabilities in TypeScript so Maestro can integrate them natively without additional runtime dependencies.
Given Maestro's role as an orchestration layer managing multiple concurrent AI agents, it's uniquely positioned to implement these security measures centrally rather than requiring each underlying agent to handle security independently.
Architecture: TypeScript Scanner Library
@maestro/llm-guard
├── src/
│ ├── index.ts # Main exports
│ ├── scanner.ts # Base scanner interface & pipeline runner
│ ├── vault.ts # Anonymize/Deanonymize vault (PII mapping store)
│ ├── input/ # Input scanners
│ │ ├── pii-scanner.ts # PII detection & anonymization
│ │ ├── secrets-scanner.ts # Secrets/credentials detection
│ │ ├── prompt-injection.ts # Prompt injection detection
│ │ ├── ban-topics.ts # Topic blocking
│ │ ├── ban-substrings.ts # Substring/word blocking
│ │ └── invisible-chars.ts # Invisible/homoglyph character detection
│ ├── output/ # Output scanners
│ │ ├── pii-leakage.ts # PII leakage detection in responses
│ │ ├── deanonymize.ts # Restore anonymized PII from vault
│ │ ├── malicious-url.ts # Malicious URL detection
│ │ ├── sensitive-content.ts # Sensitive content filtering
│ │ └── code-scanner.ts # Dangerous code pattern detection
│ └── utils/
│ ├── regex-patterns.ts # Shared regex patterns
│ ├── entropy.ts # Shannon entropy calculator
│ └── tokenizer.ts # Lightweight tokenizer for heuristic checks
Core Scanner Interface
interface ScanResult {
isValid: boolean;
score: number; // 0.0 (safe) to 1.0 (dangerous)
sanitizedText?: string; // Modified text with redactions applied
findings: Finding[]; // Individual detections
}
interface Finding {
type: string; // e.g. "PII_EMAIL", "SECRET_AWS_KEY", "PROMPT_INJECTION"
value: string; // The matched content
start: number; // Start position in original text
end: number; // End position in original text
replacement?: string; // What it was replaced with (if sanitized)
confidence: number; // Detection confidence 0-1
}
interface Scanner {
name: string;
scan(text: string, config?: ScannerConfig): Promise<ScanResult>;
}
interface Guard {
pre(context: PreHookContext): Promise<PreHookResult>;
post(context: PostHookContext): Promise<PostHookResult>;
addInputScanner(scanner: Scanner): void;
addOutputScanner(scanner: Scanner): void;
}

Proposed Features & TypeScript Implementation Strategy
1. PII Detection & Anonymize/Deanonymize
TypeScript approach: Regex patterns + heuristic NER (no ML model needed for common PII types).
Detectable entity types:
| Entity | Detection Method |
|---|---|
| Email addresses | Regex |
| Phone numbers | Regex + libphonenumber-js |
| SSN / Tax IDs | Regex with format validation |
| Credit card numbers | Regex + Luhn checksum |
| IP addresses (v4/v6) | Regex |
| Crypto wallet addresses | Regex (BTC, ETH patterns) |
| Person names | Heuristic: capitalized word sequences near PII context, or optional integration with a names dictionary |
| Physical addresses | Regex patterns for common address formats |
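Several rows above pair a regex with a format validator. For credit cards, the validator is the standard Luhn checksum; a minimal self-contained sketch (function name is illustrative, not part of the proposed API):

```typescript
// Luhn checksum: double every second digit from the right, subtract 9 from
// any doubled digit above 9, and check that the sum is divisible by 10.
// Used to confirm that a regex-matched digit run is plausibly a card number.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}
```

Running the regex first and the checksum second keeps the scan cheap: the checksum only executes on the handful of candidate matches, which sharply reduces false positives from random digit runs.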
Vault system for Anonymize/Deanonymize:
class PiiVault {
private mappings: Map<string, string> = new Map();
anonymize(text: string, findings: Finding[]): { text: string; vault: PiiVault } {
// Replace each PII finding with a placeholder like [EMAIL_1], [PERSON_1]
// Store mapping: "[EMAIL_1]" -> "john@acme.com"
}
deanonymize(text: string): string {
// Restore all placeholders to original values
}
}

User types: "Fix the bug for customer John Smith (john@acme.com)"
AI sees: "Fix the bug for customer [PERSON_1] ([EMAIL_1])"
User sees: "I've fixed the bug for John Smith (john@acme.com)..."
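The exchange above can be prototyped with a stripped-down vault. This is an illustrative sketch, not the proposed PiiVault API (class and method names here are assumptions):

```typescript
// Maps placeholders like [EMAIL_1] back to original values so the post-hook
// can restore PII before the user sees the response.
class SimpleVault {
  private mappings = new Map<string, string>();
  private counters = new Map<string, number>();

  // Register a value and get its typed, numbered placeholder.
  placeholderFor(type: string, value: string): string {
    const n = (this.counters.get(type) ?? 0) + 1;
    this.counters.set(type, n);
    const placeholder = `[${type}_${n}]`;
    this.mappings.set(placeholder, value);
    return placeholder;
  }

  // Restore every placeholder in the text to its original value.
  deanonymize(text: string): string {
    let out = text;
    for (const [placeholder, value] of this.mappings) {
      out = out.split(placeholder).join(value);
    }
    return out;
  }
}
```

Because the vault is created per request and handed from the pre-hook to the post-hook, mappings never persist beyond a single round trip.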
2. Secrets Detection & Redaction
TypeScript approach: Regex patterns (ported from detect-secrets & trufflehog patterns) + Shannon entropy analysis.
Detectable secret types:
- AWS access keys & secret keys (AKIA..., 40-char base64)
- Azure keys and connection strings
- GitHub tokens (ghp_, gho_, ghs_, ghu_, github_pat_)
- GitLab tokens (glpat-)
- Slack tokens (xoxb-, xoxp-, xoxs-)
- Google API keys (AIza...)
- Stripe keys (sk_live_, pk_live_)
- JWT tokens (eyJ...)
- Private keys (PEM format detection: -----BEGIN.*PRIVATE KEY-----)
- Generic high-entropy strings (Shannon entropy > 3.0 for hex, > 4.5 for base64, in line with detect-secrets defaults)
- Database connection strings (postgres://, mysql://, mongodb://)
- Twilio, SendGrid, Mailgun, and other SaaS API keys
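The generic high-entropy check in the list above relies on Shannon entropy, measured in bits per character; a self-contained sketch of the calculation:

```typescript
// Shannon entropy in bits per character: sum of -p * log2(p) over the
// frequency p of each distinct character in the string.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const freq = new Map<string, number>();
  for (const ch of s) freq.set(ch, (freq.get(ch) ?? 0) + 1);
  let entropy = 0;
  for (const count of freq.values()) {
    const p = count / s.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A long random base64 token scores close to the 6-bit-per-character maximum of its alphabet, while typical prose and identifiers sit well below, which is what makes a simple threshold workable.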
class SecretsScanner implements Scanner {
name = "secrets";
private patterns: SecretPattern[] = [
{ type: "AWS_ACCESS_KEY", regex: /AKIA[0-9A-Z]{16}/g, confidence: 0.95 },
{ type: "GITHUB_TOKEN", regex: /ghp_[A-Za-z0-9_]{36,}/g, confidence: 0.95 },
{ type: "PRIVATE_KEY", regex: /-----BEGIN\s?(RSA|EC|DSA|OPENSSH)?\s?PRIVATE KEY-----/g, confidence: 0.99 },
// ... 30+ patterns
];
private entropyCheck(str: string, charset: "hex" | "base64"): number {
// Shannon entropy calculation
}
}

3. Prompt Injection Detection
TypeScript approach: Multi-layered heuristic detection (no ML model, but surprisingly effective).
Detection layers:
- Known payload patterns: Regex matching common injection patterns (ignore previous instructions, you are now, system prompt:, <|im_start|>, etc.)
- Structural analysis: Detect role-switching attempts, unusual delimiter usage, nested instruction blocks
- Instruction density scoring: Flag prompts with unusually high density of imperative verbs and instruction-like patterns
- Encoding detection: Detect base64-encoded instructions, Unicode tricks, invisible characters
- Behavioral scoring: Combine signal scores from multiple layers into a composite risk score
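As one illustration of the encoding-detection layer, a sketch that decodes base64-looking runs and re-applies a known-payload pattern. The length cutoff and the single pattern are assumptions; a real scanner would reuse the full injection pattern list:

```typescript
// Find base64-looking runs, decode them (Node's Buffer decodes leniently),
// and flag any run that decodes to a known injection phrase.
const INJECTION = /ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)/i;
const BASE64_RUN = /[A-Za-z0-9+/]{24,}={0,2}/g;

function findEncodedInjections(text: string): string[] {
  const hits: string[] = [];
  for (const run of text.match(BASE64_RUN) ?? []) {
    const decoded = Buffer.from(run, "base64").toString("utf8");
    if (INJECTION.test(decoded)) hits.push(run);
  }
  return hits;
}
```

The same decode-then-rescan trick extends to URL-encoding and hex, and the cutoff keeps ordinary words from being treated as candidate payloads.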
class PromptInjectionScanner implements Scanner {
name = "prompt-injection";
private injectionPatterns = [
/ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)/i,
/you\s+are\s+now\s+/i,
/new\s+instructions?\s*:/i,
/system\s*prompt\s*:/i,
/\[INST\]|\[\/INST\]|<\|im_start\|>|<\|im_end\|>/i,
/do\s+not\s+follow\s+(the\s+)?(previous|above|original)/i,
/act\s+as\s+(if\s+)?(you\s+are|a)\s+/i,
// ... more patterns
];
private detectEncodedPayloads(text: string): Finding[] {
// Check for base64-encoded instructions
// Check for Unicode homoglyph obfuscation
// Check for zero-width character hiding
}
}

4. Output Content Filtering
TypeScript approach: Keyword/pattern-based filtering + URL reputation checking.
- Malicious URL Detection: Check against known phishing/malware URL patterns, suspicious TLDs, IP-based URLs, URL shortener abuse
- Sensitive Content Leakage: Re-run PII scanner on output to catch any PII the model generated beyond what was in the input
- Code Scanner: Detect dangerous code patterns (e.g., eval(), exec(), rm -rf, shell injection vectors, SQL injection patterns)
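Two of the cheaper URL heuristics can be sketched directly; the TLD list below is a tiny illustrative sample, not a curated reputation feed:

```typescript
// Flag raw-IP URLs and URLs on a sample of commonly abused TLDs.
const URL_PATTERN = /https?:\/\/([^\s/]+)/g;
const SUSPICIOUS_TLDS = [".tk", ".top", ".zip"];

function suspiciousUrls(text: string): string[] {
  const hits: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = URL_PATTERN.exec(text)) !== null) {
    const host = m[1].toLowerCase().split(":")[0]; // drop any port
    const isRawIp = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
    const hasBadTld = SUSPICIOUS_TLDS.some((tld) => host.endsWith(tld));
    if (isRawIp || hasBadTld) hits.push(m[0]);
  }
  return hits;
}
```

A production version would also check URL shorteners, punycode hosts, and a maintained denylist, but the shape of the scanner stays the same.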
5. Content Policy Enforcement
- Ban Topics: Configurable topic list with keyword matching and semantic similarity (using lightweight embedding comparison if needed)
- Ban Substrings: Simple string/regex matching against a blocklist
- Invisible Characters: Strip or flag zero-width characters, homoglyphs, and bidirectional text override characters
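The invisible-character pass is simple enough to sketch directly; the ranges below cover zero-width and bidirectional-override characters (homoglyph mapping would need a separate confusables table):

```typescript
// Zero-width chars (U+200B–U+200F), bidi embeddings/overrides (U+202A–U+202E),
// word joiner (U+2060), and zero-width no-break space / BOM (U+FEFF).
const INVISIBLE = /[\u200B-\u200F\u202A-\u202E\u2060\uFEFF]/g;

function stripInvisible(text: string): { clean: string; count: number } {
  const count = (text.match(INVISIBLE) ?? []).length;
  return { clean: text.replace(INVISIBLE, ""), count };
}
```

Reporting the count alongside the cleaned text lets the UI distinguish "stripped one stray BOM" from "this prompt was hiding a payload".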
Integration: Pre/Post Hook Architecture
The guard library integrates into Maestro via pre-hook (before sending to LLM) and post-hook (before returning response to user) middleware. All LLM calls in Maestro flow through these hooks automatically.
interface GuardHooks {
pre: (context: PreHookContext) => Promise<PreHookResult>;
post: (context: PostHookContext) => Promise<PostHookResult>;
}
interface PreHookContext {
prompt: string; // Raw user prompt
files?: string[]; // File contents attached to context
session: SessionMeta; // Session info (terminal, playbook, group chat, etc.)
config: GuardConfig; // User's security settings
}
interface PreHookResult {
sanitizedPrompt: string; // Cleaned prompt to send to LLM
vault: PiiVault; // Vault instance for deanonymizing the response
findings: Finding[]; // What was detected/redacted
blocked: boolean; // If true, do not send to LLM at all
blockReason?: string; // Why it was blocked (prompt injection, banned topic, etc.)
}
interface PostHookContext {
response: string; // Raw LLM response
vault: PiiVault; // Vault from the pre-hook (for deanonymization)
originalPrompt: string; // The original (unsanitized) user prompt
session: SessionMeta;
config: GuardConfig;
}
interface PostHookResult {
sanitizedResponse: string; // Cleaned response to show to user (PII restored, secrets caught)
findings: Finding[]; // What was detected in the output
blocked: boolean; // If true, do not show response to user
blockReason?: string;
}

Hook Flow
User Input
|
v
[PRE-HOOK] -- PII anonymize, secrets redact, injection detect, ban check
|
|-- blocked? --> show warning to user, do not call LLM
|
v
LLM Provider (sanitized prompt)
|
v
[POST-HOOK] -- deanonymize PII, scan for leakage, URL check, code scan
|
|-- blocked? --> show warning, suppress response
|
v
User sees clean response
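The pre-hook side of this flow amounts to running the registered input scanners in order. A simplified sketch, with the interfaces trimmed to just the fields used and the threshold value illustrative:

```typescript
interface SketchResult {
  score: number;          // 0.0 (safe) to 1.0 (dangerous)
  sanitizedText?: string; // text with redactions applied, if any
}

interface SketchScanner {
  name: string;
  scan(text: string): Promise<SketchResult>;
}

// Run scanners sequentially, threading sanitized text through each stage and
// short-circuiting as soon as the composite risk score crosses the threshold.
async function runInputPipeline(
  scanners: SketchScanner[],
  text: string,
  threshold = 0.7,
): Promise<{ blocked: boolean; text: string; score: number }> {
  let current = text;
  let maxScore = 0;
  for (const scanner of scanners) {
    const result = await scanner.scan(current);
    maxScore = Math.max(maxScore, result.score);
    if (result.sanitizedText !== undefined) current = result.sanitizedText;
    if (maxScore >= threshold) return { blocked: true, text: current, score: maxScore };
  }
  return { blocked: false, text: current, score: maxScore };
}
```

Sequential execution matters here: later scanners see the already-redacted text, so a secret redacted by one scanner cannot trigger a duplicate finding in the next.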
Registration in Maestro
import { createGuard } from "@maestro/llm-guard";
const guard = createGuard({
pii: { enabled: true, anonymize: true },
secrets: { enabled: true, redactMode: "partial" },
promptInjection: { enabled: true, threshold: 0.7 },
banTopics: { enabled: false, topics: [] },
});
// Register hooks globally — all LLM calls flow through them
maestro.hooks.register("llm:pre", guard.pre);
maestro.hooks.register("llm:post", guard.post);

Per-Feature Behavior
| Maestro Feature | Pre-Hook | Post-Hook |
|---|---|---|
| AI Terminal | Sanitize user prompt, store vault | Deanonymize response, scan for leakage |
| Auto Run / Playbooks | Validate task prompt before each agent call | Scan agent output before marking task complete |
| Group Chat | Scan each agent's outgoing message | Scan incoming messages for injection/leakage |
| File Context | Scan file contents for secrets/PII before attaching | N/A |
| Mobile Remote | Additional validation on remote commands | Scan response before forwarding to mobile |
User Interface Suggestions
Settings Panel
Security Settings
+-- Input Protection
| +-- [x] Anonymize PII (Names, Emails, SSNs...)
| +-- [x] Redact Secrets (API keys, tokens, private keys)
| +-- [x] Detect Prompt Injection
| +-- [ ] Block Banned Topics: [Configure...]
|
+-- Output Protection
| +-- [x] Deanonymize (restore PII in responses)
| +-- [x] Detect Sensitive Content Leakage
| +-- [ ] Malicious URL Detection
| +-- [ ] Code Pattern Analysis
|
+-- Sensitivity
+-- Detection Threshold: [====|======] 0.7
+-- Action: (*) Warn ( ) Block ( ) Log Only
Real-time Indicators
- Shield icon in terminal when protection is active
- Badge showing "3 items anonymized" when PII is detected
- Warning toast when prompt injection detected
- Audit log of all security events
Dependencies (TypeScript-native, no Python)
| Package | Purpose | Size |
|---|---|---|
| None (built-in) | Regex-based PII, secrets, injection detection | 0 |
| libphonenumber-js | Phone number validation | ~150KB |
| luhn (or inline) | Credit card validation | <1KB |
| punycode | IDN/homoglyph detection | Built into Node |
Performance Targets
- Input scanning: <20ms latency for regex-based scanners (faster than Python llm-guard)
- Output scanning: <50ms latency
- Memory overhead: <50MB (no ML models in default config)
- Zero cold-start penalty (no Python interpreter to launch)
Priority/Phases
Phase 1 - Core Protection (MVP)
- Scanner interface and pipeline runner
- Secrets detection and redaction (30+ patterns + entropy)
- Basic PII detection (email, phone, SSN, credit card, IP)
- PII Vault with anonymize/deanonymize
- Settings UI for enabling/disabling features
- Security event logging
Phase 2 - Prompt Safety
- Prompt injection detection (heuristic, multi-layer)
- Invisible character / encoding attack detection
- Ban substrings and ban topics
- Output PII leakage detection
- Real-time UI indicators
Phase 3 - Advanced & Enterprise
- Malicious URL detection
- Dangerous code pattern detection
- Custom regex patterns (user-defined)
- Per-session security policies
- Group Chat inter-agent protection
- Audit log export
Benefits Over Python LLM Guard
- Zero external runtime: No Python, no pip, no virtual environments
- Native integration: Import directly into Maestro's TypeScript codebase
- Faster startup: No Python interpreter cold start
- Lower memory: Regex-based scanning uses far less RAM than ML models
- Easier distribution: Ships as part of the app, no sidecar process
- Better DX: Full type safety, same toolchain as the rest of Maestro
Trade-offs
- No ML-based toxicity/bias detection — kept intentionally to avoid heavy dependencies; pattern-based approaches cover the critical cases
- Name detection less accurate than spaCy NER — mitigated by contextual heuristics and the fact that other PII types (email, phone) catch most sensitive data
- Prompt injection detection is heuristic — published pattern sets catch a large share of known injection techniques, and layering structural, density, and encoding checks on top closes the gap further
Related Links
- LLM Guard GitHub (Python reference implementation)
- detect-secrets patterns (regex patterns to port)
- trufflehog detector patterns (additional secret patterns)
- libphonenumber-js
- Rebuff (prompt injection) (heuristic patterns reference)
Additional Context
Maestro's architecture as a pass-through orchestration layer makes it ideal for implementing security scanning — it already intercepts all communication between users and AI agents. A TypeScript-native implementation means this security layer adds negligible overhead and ships as a first-class part of the application rather than an optional sidecar.