
Feature Request: TypeScript-Native LLM Guard for Input/Output Protection #522

@Ashraf-Ali-aa

Description

Summary

Build a TypeScript-native security scanning library (inspired by LLM Guard) that Maestro can use directly to provide input sanitization, output validation, PII protection, prompt injection detection, and secrets scanning across all managed AI sessions — with zero Python dependency.

Background

LLM Guard by Protect AI provides excellent security primitives for LLMs, but it's Python-only. Rather than running a Python sidecar, we should replicate the core scanning capabilities in TypeScript so Maestro can integrate them natively without additional runtime dependencies.

Given Maestro's role as an orchestration layer managing multiple concurrent AI agents, it's uniquely positioned to implement these security measures centrally rather than requiring each underlying agent to handle security independently.

Architecture: TypeScript Scanner Library

@maestro/llm-guard
├── src/
│   ├── index.ts                    # Main exports
│   ├── scanner.ts                  # Base scanner interface & pipeline runner
│   ├── vault.ts                    # Anonymize/Deanonymize vault (PII mapping store)
│   ├── input/                      # Input scanners
│   │   ├── pii-scanner.ts          # PII detection & anonymization
│   │   ├── secrets-scanner.ts      # Secrets/credentials detection
│   │   ├── prompt-injection.ts     # Prompt injection detection
│   │   ├── ban-topics.ts           # Topic blocking
│   │   ├── ban-substrings.ts       # Substring/word blocking
│   │   └── invisible-chars.ts      # Invisible/homoglyph character detection
│   ├── output/                     # Output scanners
│   │   ├── pii-leakage.ts          # PII leakage detection in responses
│   │   ├── deanonymize.ts          # Restore anonymized PII from vault
│   │   ├── malicious-url.ts        # Malicious URL detection
│   │   ├── sensitive-content.ts    # Sensitive content filtering
│   │   └── code-scanner.ts         # Dangerous code pattern detection
│   └── utils/
│       ├── regex-patterns.ts       # Shared regex patterns
│       ├── entropy.ts              # Shannon entropy calculator
│       └── tokenizer.ts            # Lightweight tokenizer for heuristic checks

Core Scanner Interface

interface ScanResult {
  isValid: boolean;
  score: number;           // 0.0 (safe) to 1.0 (dangerous)
  sanitizedText?: string;  // Modified text with redactions applied
  findings: Finding[];     // Individual detections
}

interface Finding {
  type: string;            // e.g. "PII_EMAIL", "SECRET_AWS_KEY", "PROMPT_INJECTION"
  value: string;           // The matched content
  start: number;           // Start position in original text
  end: number;             // End position in original text
  replacement?: string;    // What it was replaced with (if sanitized)
  confidence: number;      // Detection confidence 0-1
}

interface Scanner {
  name: string;
  scan(text: string, config?: ScannerConfig): Promise<ScanResult>;
}

interface Guard {
  pre(context: PreHookContext): Promise<PreHookResult>;
  post(context: PostHookContext): Promise<PostHookResult>;
  addInputScanner(scanner: Scanner): void;
  addOutputScanner(scanner: Scanner): void;
}
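To make the interface concrete, here is a minimal sketch of one scanner plus a pipeline runner that short-circuits on the first failing result. `banSubstrings` and `runPipeline` are illustrative names, not a committed API; the types are trimmed copies of the interfaces above.

```typescript
// Types mirror the ScanResult/Finding/Scanner interfaces above,
// trimmed to what this sketch needs.
interface Finding {
  type: string;
  value: string;
  start: number;
  end: number;
  confidence: number;
}

interface ScanResult {
  isValid: boolean;
  score: number;
  sanitizedText?: string;
  findings: Finding[];
}

interface Scanner {
  name: string;
  scan(text: string): Promise<ScanResult>;
}

// Illustrative scanner: flags every occurrence of a banned substring.
function banSubstrings(banned: string[]): Scanner {
  return {
    name: "ban-substrings",
    async scan(text: string): Promise<ScanResult> {
      const findings: Finding[] = [];
      for (const word of banned) {
        let idx = text.indexOf(word);
        while (idx !== -1) {
          findings.push({
            type: "BANNED_SUBSTRING",
            value: word,
            start: idx,
            end: idx + word.length,
            confidence: 1,
          });
          idx = text.indexOf(word, idx + word.length);
        }
      }
      return {
        isValid: findings.length === 0,
        score: findings.length > 0 ? 1 : 0,
        findings,
      };
    },
  };
}

// Pipeline runner: scanners run in order; the first invalid result wins,
// and sanitized text from one scanner feeds the next.
async function runPipeline(text: string, scanners: Scanner[]): Promise<ScanResult> {
  for (const scanner of scanners) {
    const result = await scanner.scan(text);
    if (!result.isValid) return result;
    text = result.sanitizedText ?? text;
  }
  return { isValid: true, score: 0, sanitizedText: text, findings: [] };
}
```

A production pipeline would also carry per-scanner config and merge findings across scanners instead of short-circuiting, but the shape of the loop is the same.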

Proposed Features & TypeScript Implementation Strategy

1. PII Detection & Anonymize/Deanonymize

TypeScript approach: Regex patterns + heuristic NER (no ML model needed for common PII types).

Detectable entity types:

Entity                    Detection Method
Email addresses           Regex
Phone numbers             Regex + libphonenumber-js
SSN / Tax IDs             Regex with format validation
Credit card numbers       Regex + Luhn checksum
IP addresses (v4/v6)      Regex
Crypto wallet addresses   Regex (BTC, ETH patterns)
Person names              Heuristic: capitalized word sequences near PII context, or optional integration with a names dictionary
Physical addresses        Regex patterns for common address formats
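The Luhn checksum used to confirm credit card candidates is small enough to implement inline. A sketch (`luhnValid` is an illustrative name):

```typescript
// Luhn checksum: walking right to left, double every second digit and
// subtract 9 when the doubled value exceeds 9; a valid card number's
// digit sum is a multiple of 10. Used to cut regex false positives.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}
```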

Vault system for Anonymize/Deanonymize:

class PiiVault {
  private mappings: Map<string, string> = new Map();

  anonymize(text: string, findings: Finding[]): { text: string; vault: PiiVault } {
    // Replace each PII finding with a numbered placeholder like [EMAIL_1]
    // and store the mapping placeholder -> original value.
    for (const finding of findings) {
      const placeholder = `[${finding.type}_${this.mappings.size + 1}]`;
      this.mappings.set(placeholder, finding.value);
      text = text.split(finding.value).join(placeholder);
    }
    return { text, vault: this };
  }

  deanonymize(text: string): string {
    // Restore all placeholders to their original values.
    let restored = text;
    for (const [placeholder, original] of this.mappings) {
      restored = restored.split(placeholder).join(original);
    }
    return restored;
  }
}

Example:

User types: "Fix the bug for customer John Smith (john@acme.com)"
AI sees:    "Fix the bug for customer [PERSON_1] ([EMAIL_1])"
User sees:  "I've fixed the bug for John Smith (john@acme.com)..."

2. Secrets Detection & Redaction

TypeScript approach: Regex patterns (ported from detect-secrets & trufflehog patterns) + Shannon entropy analysis.

Detectable secret types:

  • AWS access keys & secret keys (AKIA..., 40-char base64)
  • Azure keys and connection strings
  • GitHub tokens (ghp_, gho_, ghs_, ghu_, github_pat_)
  • GitLab tokens (glpat-)
  • Slack tokens (xoxb-, xoxp-, xoxs-)
  • Google API keys (AIza...)
  • Stripe keys (sk_live_, pk_live_)
  • JWT tokens (eyJ...)
  • Private keys (PEM format detection: -----BEGIN.*PRIVATE KEY-----)
  • Generic high-entropy strings (Shannon entropy > 3.0 for hex, > 4.5 for base64, matching the detect-secrets defaults; pure hex cannot exceed 4.0 bits per character)
  • Database connection strings (postgres://, mysql://, mongodb://)
  • Twilio, SendGrid, Mailgun, and other SaaS API keys

class SecretsScanner implements Scanner {
  name = "secrets";

  private patterns: SecretPattern[] = [
    { type: "AWS_ACCESS_KEY", regex: /AKIA[0-9A-Z]{16}/g, confidence: 0.95 },
    { type: "GITHUB_TOKEN", regex: /ghp_[A-Za-z0-9_]{36,}/g, confidence: 0.95 },
    { type: "PRIVATE_KEY", regex: /-----BEGIN\s?(RSA|EC|DSA|OPENSSH)?\s?PRIVATE KEY-----/g, confidence: 0.99 },
    // ... 30+ patterns
  ];

  private entropyCheck(str: string, charset: "hex" | "base64"): number {
    // Shannon entropy calculation
  }
}
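The `entropyCheck` body above can be backed by a plain Shannon entropy function like this sketch; the per-charset thresholds listed earlier would then be compared against its result:

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)) over
// the character frequency distribution. Uniform random hex tops out at
// 4 bits/char and base64 at 6, which is why the two charsets need
// different thresholds.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const freq = new Map<string, number>();
  for (const ch of s) {
    freq.set(ch, (freq.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  for (const count of freq.values()) {
    const p = count / s.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A candidate token would first be classified by charset (hex vs base64) and flagged only when its entropy clears that charset's threshold.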

3. Prompt Injection Detection

TypeScript approach: Multi-layered heuristic detection (no ML model, but surprisingly effective).

Detection layers:

  1. Known payload patterns: Regex matching common injection patterns (ignore previous instructions, you are now, system prompt:, <|im_start|>, etc.)
  2. Structural analysis: Detect role-switching attempts, unusual delimiter usage, nested instruction blocks
  3. Instruction density scoring: Flag prompts with unusually high density of imperative verbs and instruction-like patterns
  4. Encoding detection: Detect base64-encoded instructions, Unicode tricks, invisible characters
  5. Behavioral scoring: Combine signal scores from multiple layers into a composite risk score

class PromptInjectionScanner implements Scanner {
  name = "prompt-injection";

  private injectionPatterns = [
    /ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)/i,
    /you\s+are\s+now\s+/i,
    /new\s+instructions?\s*:/i,
    /system\s*prompt\s*:/i,
    /\[INST\]|\[\/INST\]|<\|im_start\|>|<\|im_end\|>/i,
    /do\s+not\s+follow\s+(the\s+)?(previous|above|original)/i,
    /act\s+as\s+(if\s+)?(you\s+are|a)\s+/i,
    // ... more patterns
  ];

  private detectEncodedPayloads(text: string): Finding[] {
    // Check for base64-encoded instructions
    // Check for Unicode homoglyph obfuscation
    // Check for zero-width character hiding
  }
}
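Two of the checks sketched in `detectEncodedPayloads` need no dependencies at all. A sketch, where the zero-width character list and the instruction-word heuristic are illustrative choices:

```typescript
// Zero-width code points plus bidi overrides; any hit is suspicious in
// a user prompt.
const INVISIBLE_CHARS = /[\u200B\u200C\u200D\u2060\uFEFF\u202A-\u202E]/g;

function findInvisibleChars(text: string): { char: string; index: number }[] {
  const hits: { char: string; index: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = INVISIBLE_CHARS.exec(text)) !== null) {
    hits.push({ char: m[0], index: m.index });
  }
  return hits;
}

// Heuristic base64 check: long base64-looking runs that decode to
// printable ASCII containing instruction-like words get flagged.
function looksLikeEncodedInstruction(text: string): boolean {
  const b64Run = /[A-Za-z0-9+/]{24,}={0,2}/g;
  let m: RegExpExecArray | null;
  while ((m = b64Run.exec(text)) !== null) {
    const decoded = Buffer.from(m[0], "base64").toString("utf8");
    if (/^[\x20-\x7E]+$/.test(decoded) && /ignore|instruction|system|prompt/i.test(decoded)) {
      return true;
    }
  }
  return false;
}
```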

4. Output Content Filtering

TypeScript approach: Keyword/pattern-based filtering + URL reputation checking.

  • Malicious URL Detection: Check against known phishing/malware URL patterns, suspicious TLDs, IP-based URLs, URL shortener abuse
  • Sensitive Content Leakage: Re-run PII scanner on output to catch any PII the model generated beyond what was in the input
  • Code Scanner: Detect dangerous code patterns (e.g., eval(), exec(), rm -rf, shell injection vectors, SQL injection patterns)
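A sketch of what the code scanner's pattern table might look like (the pattern list here is illustrative, not exhaustive):

```typescript
// Dangerous-code patterns for the output code scanner. Each entry pairs
// a finding type with a detection regex.
const DANGEROUS_PATTERNS: { type: string; regex: RegExp }[] = [
  { type: "EVAL", regex: /\beval\s*\(/ },
  { type: "EXEC", regex: /\bexec\s*\(/ },
  { type: "DESTRUCTIVE_SHELL", regex: /\brm\s+-rf\s+\// },
  { type: "SQL_INJECTION", regex: /('\s*OR\s*'1'\s*=\s*'1|UNION\s+SELECT)/i },
];

// Returns the finding types triggered by a code snippet.
function scanCode(snippet: string): string[] {
  return DANGEROUS_PATTERNS
    .filter((p) => p.regex.test(snippet))
    .map((p) => p.type);
}
```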

5. Content Policy Enforcement

  • Ban Topics: Configurable topic list with keyword matching and semantic similarity (using lightweight embedding comparison if needed)
  • Ban Substrings: Simple string/regex matching against a blocklist
  • Invisible Characters: Strip or flag zero-width characters, homoglyphs, and bidirectional text override characters

Integration: Pre/Post Hook Architecture

The guard library integrates into Maestro via pre-hook (before sending to LLM) and post-hook (before returning response to user) middleware. All LLM calls in Maestro flow through these hooks automatically.

interface GuardHooks {
  pre: (context: PreHookContext) => Promise<PreHookResult>;
  post: (context: PostHookContext) => Promise<PostHookResult>;
}

interface PreHookContext {
  prompt: string;             // Raw user prompt
  files?: string[];           // File contents attached to context
  session: SessionMeta;       // Session info (terminal, playbook, group chat, etc.)
  config: GuardConfig;        // User's security settings
}

interface PreHookResult {
  sanitizedPrompt: string;    // Cleaned prompt to send to LLM
  vault: PiiVault;            // Vault instance for deanonymizing the response
  findings: Finding[];        // What was detected/redacted
  blocked: boolean;           // If true, do not send to LLM at all
  blockReason?: string;       // Why it was blocked (prompt injection, banned topic, etc.)
}

interface PostHookContext {
  response: string;           // Raw LLM response
  vault: PiiVault;            // Vault from the pre-hook (for deanonymization)
  originalPrompt: string;     // The original (unsanitized) user prompt
  session: SessionMeta;
  config: GuardConfig;
}

interface PostHookResult {
  sanitizedResponse: string;  // Cleaned response to show to user (PII restored, secrets caught)
  findings: Finding[];        // What was detected in the output
  blocked: boolean;           // If true, do not show response to user
  blockReason?: string;
}

Hook Flow

User Input
    |
    v
[PRE-HOOK] -- PII anonymize, secrets redact, injection detect, ban check
    |
    |-- blocked? --> show warning to user, do not call LLM
    |
    v
LLM Provider (sanitized prompt)
    |
    v
[POST-HOOK] -- deanonymize PII, scan for leakage, URL check, code scan
    |
    |-- blocked? --> show warning, suppress response
    |
    v
User sees clean response
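The flow above reduces to a small wrapper function. A sketch with hypothetical names, where `callLlm` stands in for whatever provider call Maestro makes:

```typescript
// Pre-hook -> LLM -> post-hook flow. Result shapes are trimmed versions
// of the PreHookResult/PostHookResult interfaces in this proposal.
interface PreResult { sanitizedPrompt: string; blocked: boolean; blockReason?: string; }
interface PostResult { sanitizedResponse: string; blocked: boolean; blockReason?: string; }

async function runGuarded(
  prompt: string,
  pre: (p: string) => Promise<PreResult>,
  callLlm: (p: string) => Promise<string>,
  post: (r: string) => Promise<PostResult>,
): Promise<string> {
  const preResult = await pre(prompt);
  if (preResult.blocked) {
    // Never reaches the provider; surface the reason instead.
    return `Blocked before LLM call: ${preResult.blockReason ?? "policy violation"}`;
  }
  const raw = await callLlm(preResult.sanitizedPrompt);
  const postResult = await post(raw);
  if (postResult.blocked) {
    return `Response suppressed: ${postResult.blockReason ?? "policy violation"}`;
  }
  return postResult.sanitizedResponse;
}
```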

Registration in Maestro

import { createGuard } from "@maestro/llm-guard";

const guard = createGuard({
  pii: { enabled: true, anonymize: true },
  secrets: { enabled: true, redactMode: "partial" },
  promptInjection: { enabled: true, threshold: 0.7 },
  banTopics: { enabled: false, topics: [] },
});

// Register hooks globally — all LLM calls flow through them
maestro.hooks.register("llm:pre", guard.pre);
maestro.hooks.register("llm:post", guard.post);

Per-Feature Behavior

Maestro Feature        Pre-Hook                                              Post-Hook
AI Terminal            Sanitize user prompt, store vault                     Deanonymize response, scan for leakage
Auto Run / Playbooks   Validate task prompt before each agent call           Scan agent output before marking task complete
Group Chat             Scan each agent's outgoing message                    Scan incoming messages for injection/leakage
File Context           Scan file contents for secrets/PII before attaching   N/A
Mobile Remote          Additional validation on remote commands              Scan response before forwarding to mobile

User Interface Suggestions

Settings Panel

Security Settings
+-- Input Protection
|   +-- [x] Anonymize PII (Names, Emails, SSNs...)
|   +-- [x] Redact Secrets (API keys, tokens, private keys)
|   +-- [x] Detect Prompt Injection
|   +-- [ ] Block Banned Topics: [Configure...]
|
+-- Output Protection
|   +-- [x] Deanonymize (restore PII in responses)
|   +-- [x] Detect Sensitive Content Leakage
|   +-- [ ] Malicious URL Detection
|   +-- [ ] Code Pattern Analysis
|
+-- Sensitivity
    +-- Detection Threshold: [====|======] 0.7
    +-- Action: (*) Warn  ( ) Block  ( ) Log Only

Real-time Indicators

  • Shield icon in terminal when protection is active
  • Badge showing "3 items anonymized" when PII is detected
  • Warning toast when prompt injection detected
  • Audit log of all security events

Dependencies (TypeScript-native, no Python)

Package             Purpose                                         Size
None (built-in)     Regex-based PII, secrets, injection detection   0
libphonenumber-js   Phone number validation                         ~150KB
luhn (or inline)    Credit card validation                          <1KB
punycode            IDN/homoglyph detection                         Built into Node

Performance Targets

  • Input scanning: <20ms latency for regex-based scanners (faster than Python llm-guard)
  • Output scanning: <50ms latency
  • Memory overhead: <50MB (no ML models in default config)
  • Zero cold-start penalty (no Python interpreter to launch)

Priority/Phases

Phase 1 - Core Protection (MVP)

  • Scanner interface and pipeline runner
  • Secrets detection and redaction (30+ patterns + entropy)
  • Basic PII detection (email, phone, SSN, credit card, IP)
  • PII Vault with anonymize/deanonymize
  • Settings UI for enabling/disabling features
  • Security event logging

Phase 2 - Prompt Safety

  • Prompt injection detection (heuristic, multi-layer)
  • Invisible character / encoding attack detection
  • Ban substrings and ban topics
  • Output PII leakage detection
  • Real-time UI indicators

Phase 3 - Advanced & Enterprise

  • Malicious URL detection
  • Dangerous code pattern detection
  • Custom regex patterns (user-defined)
  • Per-session security policies
  • Group Chat inter-agent protection
  • Audit log export

Benefits Over Python LLM Guard

  1. Zero external runtime: No Python, no pip, no virtual environments
  2. Native integration: Import directly into Maestro's TypeScript codebase
  3. Faster startup: No Python interpreter cold start
  4. Lower memory: Regex-based scanning uses far less RAM than ML models
  5. Easier distribution: Ships as part of the app, no sidecar process
  6. Better DX: Full type safety, same toolchain as the rest of Maestro

Trade-offs

  • No ML-based toxicity/bias detection — kept intentionally to avoid heavy dependencies; pattern-based approaches cover the critical cases
  • Name detection less accurate than spaCy NER — mitigated by contextual heuristics and the fact that other PII types (email, phone) catch most sensitive data
  • Prompt injection detection is heuristic — but research shows pattern-based approaches catch 85%+ of known injection techniques, and multi-layer heuristics close the gap further

Additional Context

Maestro's architecture as a pass-through orchestration layer makes it ideal for implementing security scanning — it already intercepts all communication between users and AI agents. A TypeScript-native implementation means this security layer adds negligible overhead and ships as a first-class part of the application rather than an optional sidecar.
