
Feature Request: TypeScript-Native LLM Guard for Input/Output Protection #522

@Ashraf-Ali-aa

Description

Summary

Build a TypeScript-native security scanning library (inspired by LLM Guard) that Maestro can use directly to provide input sanitization, output validation, PII protection, prompt injection detection, and secrets scanning across all managed AI sessions — with zero Python dependency.

Background

LLM Guard by Protect AI provides excellent security primitives for LLMs, but it's Python-only. Rather than running a Python sidecar, we should replicate the core scanning capabilities in TypeScript so Maestro can integrate them natively without additional runtime dependencies.

Given Maestro's role as an orchestration layer managing multiple concurrent AI agents, it's uniquely positioned to implement these security measures centrally rather than requiring each underlying agent to handle security independently.

Architecture: TypeScript Scanner Library

@maestro/llm-guard
├── src/
│   ├── index.ts                    # Main exports
│   ├── scanner.ts                  # Base scanner interface & pipeline runner
│   ├── vault.ts                    # Anonymize/Deanonymize vault (PII mapping store)
│   ├── input/                      # Input scanners
│   │   ├── pii-scanner.ts          # PII detection & anonymization
│   │   ├── secrets-scanner.ts      # Secrets/credentials detection
│   │   ├── prompt-injection.ts     # Prompt injection detection
│   │   ├── ban-topics.ts           # Topic blocking
│   │   ├── ban-substrings.ts       # Substring/word blocking
│   │   └── invisible-chars.ts      # Invisible/homoglyph character detection
│   ├── output/                     # Output scanners
│   │   ├── pii-leakage.ts          # PII leakage detection in responses
│   │   ├── deanonymize.ts          # Restore anonymized PII from vault
│   │   ├── malicious-url.ts        # Malicious URL detection
│   │   ├── sensitive-content.ts    # Sensitive content filtering
│   │   └── code-scanner.ts         # Dangerous code pattern detection
│   └── utils/
│       ├── regex-patterns.ts       # Shared regex patterns
│       ├── entropy.ts              # Shannon entropy calculator
│       └── tokenizer.ts            # Lightweight tokenizer for heuristic checks

Core Scanner Interface

interface ScanResult {
  isValid: boolean;
  score: number;           // 0.0 (safe) to 1.0 (dangerous)
  sanitizedText?: string;  // Modified text with redactions applied
  findings: Finding[];     // Individual detections
}

interface Finding {
  type: string;            // e.g. "PII_EMAIL", "SECRET_AWS_KEY", "PROMPT_INJECTION"
  value: string;           // The matched content
  start: number;           // Start position in original text
  end: number;             // End position in original text
  replacement?: string;    // What it was replaced with (if sanitized)
  confidence: number;      // Detection confidence 0-1
}

interface Scanner {
  name: string;
  scan(text: string, config?: ScannerConfig): Promise<ScanResult>;
}

interface Guard {
  pre(context: PreHookContext): Promise<PreHookResult>;
  post(context: PostHookContext): Promise<PostHookResult>;
  addInputScanner(scanner: Scanner): void;
  addOutputScanner(scanner: Scanner): void;
}
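To make the interface concrete, here is a minimal sketch of one scanner plus a pipeline runner that short-circuits on the first failing result. `banSubstrings` and `runPipeline` are illustrative names, not a committed API; the types are trimmed copies of the interfaces above.

```typescript
// Types mirror the ScanResult/Finding/Scanner interfaces above,
// trimmed to what this sketch needs.
interface Finding {
  type: string;
  value: string;
  start: number;
  end: number;
  confidence: number;
}

interface ScanResult {
  isValid: boolean;
  score: number;
  sanitizedText?: string;
  findings: Finding[];
}

interface Scanner {
  name: string;
  scan(text: string): Promise<ScanResult>;
}

// Illustrative scanner: flags every occurrence of a banned substring.
function banSubstrings(banned: string[]): Scanner {
  return {
    name: "ban-substrings",
    async scan(text: string): Promise<ScanResult> {
      const findings: Finding[] = [];
      for (const word of banned) {
        let idx = text.indexOf(word);
        while (idx !== -1) {
          findings.push({
            type: "BANNED_SUBSTRING",
            value: word,
            start: idx,
            end: idx + word.length,
            confidence: 1,
          });
          idx = text.indexOf(word, idx + word.length);
        }
      }
      return {
        isValid: findings.length === 0,
        score: findings.length > 0 ? 1 : 0,
        findings,
      };
    },
  };
}

// Pipeline runner: scanners run in order; the first invalid result wins,
// and sanitized text from one scanner feeds the next.
async function runPipeline(text: string, scanners: Scanner[]): Promise<ScanResult> {
  for (const scanner of scanners) {
    const result = await scanner.scan(text);
    if (!result.isValid) return result;
    text = result.sanitizedText ?? text;
  }
  return { isValid: true, score: 0, sanitizedText: text, findings: [] };
}
```

A production pipeline would also carry per-scanner config and merge findings across scanners instead of short-circuiting, but the shape of the loop is the same.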

Proposed Features & TypeScript Implementation Strategy

1. PII Detection & Anonymize/Deanonymize

TypeScript approach: Regex patterns + heuristic NER (no ML model needed for common PII types).

Detectable entity types:

Entity                    Detection Method
Email addresses           Regex
Phone numbers             Regex + libphonenumber-js
SSN / Tax IDs             Regex with format validation
Credit card numbers       Regex + Luhn checksum
IP addresses (v4/v6)      Regex
Crypto wallet addresses   Regex (BTC, ETH patterns)
Person names              Heuristic: capitalized word sequences near PII context, or optional integration with a names dictionary
Physical addresses        Regex patterns for common address formats
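The Luhn checksum used to confirm credit card candidates is small enough to implement inline. A sketch (`luhnValid` is an illustrative name):

```typescript
// Luhn checksum: walking right to left, double every second digit and
// subtract 9 when the doubled value exceeds 9; a valid card number's
// digit sum is a multiple of 10. Used to cut regex false positives.
function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[\s-]/g, "");
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}
```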

Vault system for Anonymize/Deanonymize:

class PiiVault {
  private mappings: Map<string, string> = new Map();

  anonymize(text: string, findings: Finding[]): { text: string; vault: PiiVault } {
    // Replace each PII finding with a numbered placeholder like [EMAIL_1]
    // and store the mapping placeholder -> original value.
    for (const finding of findings) {
      const placeholder = `[${finding.type}_${this.mappings.size + 1}]`;
      this.mappings.set(placeholder, finding.value);
      text = text.split(finding.value).join(placeholder);
    }
    return { text, vault: this };
  }

  deanonymize(text: string): string {
    // Restore all placeholders to their original values.
    let restored = text;
    for (const [placeholder, original] of this.mappings) {
      restored = restored.split(placeholder).join(original);
    }
    return restored;
  }
}

Example:

User types: "Fix the bug for customer John Smith (john@acme.com)"
AI sees:    "Fix the bug for customer [PERSON_1] ([EMAIL_1])"
User sees:  "I've fixed the bug for John Smith (john@acme.com)..."

2. Secrets Detection & Redaction

TypeScript approach: Regex patterns (ported from detect-secrets & trufflehog patterns) + Shannon entropy analysis.

Detectable secret types:

  • AWS access keys & secret keys (AKIA..., 40-char base64)
  • Azure keys and connection strings
  • GitHub tokens (ghp_, gho_, ghs_, ghu_, github_pat_)
  • GitLab tokens (glpat-)
  • Slack tokens (xoxb-, xoxp-, xoxs-)
  • Google API keys (AIza...)
  • Stripe keys (sk_live_, pk_live_)
  • JWT tokens (eyJ...)
  • Private keys (PEM format detection: -----BEGIN.*PRIVATE KEY-----)
  • Generic high-entropy strings (Shannon entropy > 3.0 for hex, > 4.5 for base64, matching the detect-secrets defaults; pure hex cannot exceed 4.0 bits per character)
  • Database connection strings (postgres://, mysql://, mongodb://)
  • Twilio, SendGrid, Mailgun, and other SaaS API keys

class SecretsScanner implements Scanner {
  name = "secrets";

  private patterns: SecretPattern[] = [
    { type: "AWS_ACCESS_KEY", regex: /AKIA[0-9A-Z]{16}/g, confidence: 0.95 },
    { type: "GITHUB_TOKEN", regex: /ghp_[A-Za-z0-9_]{36,}/g, confidence: 0.95 },
    { type: "PRIVATE_KEY", regex: /-----BEGIN\s?(RSA|EC|DSA|OPENSSH)?\s?PRIVATE KEY-----/g, confidence: 0.99 },
    // ... 30+ patterns
  ];

  private entropyCheck(str: string, charset: "hex" | "base64"): number {
    // Shannon entropy calculation
  }
}
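The `entropyCheck` body above can be backed by a plain Shannon entropy function like this sketch; the per-charset thresholds listed earlier would then be compared against its result:

```typescript
// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)) over
// the character frequency distribution. Uniform random hex tops out at
// 4 bits/char and base64 at 6, which is why the two charsets need
// different thresholds.
function shannonEntropy(s: string): number {
  if (s.length === 0) return 0;
  const freq = new Map<string, number>();
  for (const ch of s) {
    freq.set(ch, (freq.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  for (const count of freq.values()) {
    const p = count / s.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A candidate token would first be classified by charset (hex vs base64) and flagged only when its entropy clears that charset's threshold.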

3. Prompt Injection Detection

TypeScript approach: Multi-layered heuristic detection (no ML model, but surprisingly effective).

Detection layers:

  1. Known payload patterns: Regex matching common injection patterns (ignore previous instructions, you are now, system prompt:, <|im_start|>, etc.)
  2. Structural analysis: Detect role-switching attempts, unusual delimiter usage, nested instruction blocks
  3. Instruction density scoring: Flag prompts with unusually high density of imperative verbs and instruction-like patterns
  4. Encoding detection: Detect base64-encoded instructions, Unicode tricks, invisible characters
  5. Behavioral scoring: Combine signal scores from multiple layers into a composite risk score

class PromptInjectionScanner implements Scanner {
  name = "prompt-injection";

  private injectionPatterns = [
    /ignore\s+(all\s+)?(previous|above|prior)\s+(instructions|prompts|context)/i,
    /you\s+are\s+now\s+/i,
    /new\s+instructions?\s*:/i,
    /system\s*prompt\s*:/i,
    /\[INST\]|\[\/INST\]|<\|im_start\|>|<\|im_end\|>/i,
    /do\s+not\s+follow\s+(the\s+)?(previous|above|original)/i,
    /act\s+as\s+(if\s+)?(you\s+are|a)\s+/i,
    // ... more patterns
  ];

  private detectEncodedPayloads(text: string): Finding[] {
    // Check for base64-encoded instructions
    // Check for Unicode homoglyph obfuscation
    // Check for zero-width character hiding
  }
}
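Two of the checks sketched in `detectEncodedPayloads` need no dependencies at all. A sketch, where the zero-width character list and the instruction-word heuristic are illustrative choices:

```typescript
// Zero-width code points plus bidi overrides; any hit is suspicious in
// a user prompt.
const INVISIBLE_CHARS = /[\u200B\u200C\u200D\u2060\uFEFF\u202A-\u202E]/g;

function findInvisibleChars(text: string): { char: string; index: number }[] {
  const hits: { char: string; index: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = INVISIBLE_CHARS.exec(text)) !== null) {
    hits.push({ char: m[0], index: m.index });
  }
  return hits;
}

// Heuristic base64 check: long base64-looking runs that decode to
// printable ASCII containing instruction-like words get flagged.
function looksLikeEncodedInstruction(text: string): boolean {
  const b64Run = /[A-Za-z0-9+/]{24,}={0,2}/g;
  let m: RegExpExecArray | null;
  while ((m = b64Run.exec(text)) !== null) {
    const decoded = Buffer.from(m[0], "base64").toString("utf8");
    if (/^[\x20-\x7E]+$/.test(decoded) && /ignore|instruction|system|prompt/i.test(decoded)) {
      return true;
    }
  }
  return false;
}
```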

4. Output Content Filtering

TypeScript approach: Keyword/pattern-based filtering + URL reputation checking.

  • Malicious URL Detection: Check against known phishing/malware URL patterns, suspicious TLDs, IP-based URLs, URL shortener abuse
  • Sensitive Content Leakage: Re-run PII scanner on output to catch any PII the model generated beyond what was in the input
  • Code Scanner: Detect dangerous code patterns (e.g., eval(), exec(), rm -rf, shell injection vectors, SQL injection patterns)
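A sketch of what the code scanner's pattern table might look like (the pattern list here is illustrative, not exhaustive):

```typescript
// Dangerous-code patterns for the output code scanner. Each entry pairs
// a finding type with a detection regex.
const DANGEROUS_PATTERNS: { type: string; regex: RegExp }[] = [
  { type: "EVAL", regex: /\beval\s*\(/ },
  { type: "EXEC", regex: /\bexec\s*\(/ },
  { type: "DESTRUCTIVE_SHELL", regex: /\brm\s+-rf\s+\// },
  { type: "SQL_INJECTION", regex: /('\s*OR\s*'1'\s*=\s*'1|UNION\s+SELECT)/i },
];

// Returns the finding types triggered by a code snippet.
function scanCode(snippet: string): string[] {
  return DANGEROUS_PATTERNS
    .filter((p) => p.regex.test(snippet))
    .map((p) => p.type);
}
```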

5. Content Policy Enforcement

  • Ban Topics: Configurable topic list with keyword matching and semantic similarity (using lightweight embedding comparison if needed)
  • Ban Substrings: Simple string/regex matching against a blocklist
  • Invisible Characters: Strip or flag zero-width characters, homoglyphs, and bidirectional text override characters

Integration: Pre/Post Hook Architecture

The guard library integrates into Maestro via pre-hook (before sending to LLM) and post-hook (before returning response to user) middleware. All LLM calls in Maestro flow through these hooks automatically.

interface GuardHooks {
  pre: (context: PreHookContext) => Promise<PreHookResult>;
  post: (context: PostHookContext) => Promise<PostHookResult>;
}

interface PreHookContext {
  prompt: string;             // Raw user prompt
  files?: string[];           // File contents attached to context
  session: SessionMeta;       // Session info (terminal, playbook, group chat, etc.)
  config: GuardConfig;        // User's security settings
}

interface PreHookResult {
  sanitizedPrompt: string;    // Cleaned prompt to send to LLM
  vault: PiiVault;            // Vault instance for deanonymizing the response
  findings: Finding[];        // What was detected/redacted
  blocked: boolean;           // If true, do not send to LLM at all
  blockReason?: string;       // Why it was blocked (prompt injection, banned topic, etc.)
}

interface PostHookContext {
  response: string;           // Raw LLM response
  vault: PiiVault;            // Vault from the pre-hook (for deanonymization)
  originalPrompt: string;     // The original (unsanitized) user prompt
  session: SessionMeta;
  config: GuardConfig;
}

interface PostHookResult {
  sanitizedResponse: string;  // Cleaned response to show to user (PII restored, secrets caught)
  findings: Finding[];        // What was detected in the output
  blocked: boolean;           // If true, do not show response to user
  blockReason?: string;
}

Hook Flow

User Input
    |
    v
[PRE-HOOK] -- PII anonymize, secrets redact, injection detect, ban check
    |
    |-- blocked? --> show warning to user, do not call LLM
    |
    v
LLM Provider (sanitized prompt)
    |
    v
[POST-HOOK] -- deanonymize PII, scan for leakage, URL check, code scan
    |
    |-- blocked? --> show warning, suppress response
    |
    v
User sees clean response
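The flow above reduces to a small wrapper function. A sketch with hypothetical names, where `callLlm` stands in for whatever provider call Maestro makes:

```typescript
// Pre-hook -> LLM -> post-hook flow. Result shapes are trimmed versions
// of the PreHookResult/PostHookResult interfaces in this proposal.
interface PreResult { sanitizedPrompt: string; blocked: boolean; blockReason?: string; }
interface PostResult { sanitizedResponse: string; blocked: boolean; blockReason?: string; }

async function runGuarded(
  prompt: string,
  pre: (p: string) => Promise<PreResult>,
  callLlm: (p: string) => Promise<string>,
  post: (r: string) => Promise<PostResult>,
): Promise<string> {
  const preResult = await pre(prompt);
  if (preResult.blocked) {
    // Never reaches the provider; surface the reason instead.
    return `Blocked before LLM call: ${preResult.blockReason ?? "policy violation"}`;
  }
  const raw = await callLlm(preResult.sanitizedPrompt);
  const postResult = await post(raw);
  if (postResult.blocked) {
    return `Response suppressed: ${postResult.blockReason ?? "policy violation"}`;
  }
  return postResult.sanitizedResponse;
}
```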

Registration in Maestro

import { createGuard } from "@maestro/llm-guard";

const guard = createGuard({
  pii: { enabled: true, anonymize: true },
  secrets: { enabled: true, redactMode: "partial" },
  promptInjection: { enabled: true, threshold: 0.7 },
  banTopics: { enabled: false, topics: [] },
});

// Register hooks globally — all LLM calls flow through them
maestro.hooks.register("llm:pre", guard.pre);
maestro.hooks.register("llm:post", guard.post);

Per-Feature Behavior

Maestro Feature        Pre-Hook                                              Post-Hook
AI Terminal            Sanitize user prompt, store vault                     Deanonymize response, scan for leakage
Auto Run / Playbooks   Validate task prompt before each agent call           Scan agent output before marking task complete
Group Chat             Scan each agent's outgoing message                    Scan incoming messages for injection/leakage
File Context           Scan file contents for secrets/PII before attaching   N/A
Mobile Remote          Additional validation on remote commands              Scan response before forwarding to mobile

User Interface Suggestions

Settings Panel

Security Settings
+-- Input Protection
|   +-- [x] Anonymize PII (Names, Emails, SSNs...)
|   +-- [x] Redact Secrets (API keys, tokens, private keys)
|   +-- [x] Detect Prompt Injection
|   +-- [ ] Block Banned Topics: [Configure...]
|
+-- Output Protection
|   +-- [x] Deanonymize (restore PII in responses)
|   +-- [x] Detect Sensitive Content Leakage
|   +-- [ ] Malicious URL Detection
|   +-- [ ] Code Pattern Analysis
|
+-- Sensitivity
    +-- Detection Threshold: [====|======] 0.7
    +-- Action: (*) Warn  ( ) Block  ( ) Log Only

Real-time Indicators

  • Shield icon in terminal when protection is active
  • Badge showing "3 items anonymized" when PII is detected
  • Warning toast when prompt injection detected
  • Audit log of all security events

Dependencies (TypeScript-native, no Python)

Package             Purpose                                         Size
None (built-in)     Regex-based PII, secrets, injection detection   0
libphonenumber-js   Phone number validation                         ~150KB
luhn (or inline)    Credit card validation                          <1KB
punycode            IDN/homoglyph detection                         Built into Node

Performance Targets

  • Input scanning: <20ms latency for regex-based scanners (faster than Python llm-guard)
  • Output scanning: <50ms latency
  • Memory overhead: <50MB (no ML models in default config)
  • Zero cold-start penalty (no Python interpreter to launch)

Priority/Phases

Phase 1 - Core Protection (MVP)

  • Scanner interface and pipeline runner
  • Secrets detection and redaction (30+ patterns + entropy)
  • Basic PII detection (email, phone, SSN, credit card, IP)
  • PII Vault with anonymize/deanonymize
  • Settings UI for enabling/disabling features
  • Security event logging

Phase 2 - Prompt Safety

  • Prompt injection detection (heuristic, multi-layer)
  • Invisible character / encoding attack detection
  • Ban substrings and ban topics
  • Output PII leakage detection
  • Real-time UI indicators

Phase 3 - Advanced & Enterprise

  • Malicious URL detection
  • Dangerous code pattern detection
  • Custom regex patterns (user-defined)
  • Per-session security policies
  • Group Chat inter-agent protection
  • Audit log export

Benefits Over Python LLM Guard

  1. Zero external runtime: No Python, no pip, no virtual environments
  2. Native integration: Import directly into Maestro's TypeScript codebase
  3. Faster startup: No Python interpreter cold start
  4. Lower memory: Regex-based scanning uses far less RAM than ML models
  5. Easier distribution: Ships as part of the app, no sidecar process
  6. Better DX: Full type safety, same toolchain as the rest of Maestro

Trade-offs

  • No ML-based toxicity/bias detection — kept intentionally to avoid heavy dependencies; pattern-based approaches cover the critical cases
  • Name detection less accurate than spaCy NER — mitigated by contextual heuristics and the fact that other PII types (email, phone) catch most sensitive data
  • Prompt injection detection is heuristic — but research shows pattern-based approaches catch 85%+ of known injection techniques, and multi-layer heuristics close the gap further

Additional Context

Maestro's architecture as a pass-through orchestration layer makes it ideal for implementing security scanning — it already intercepts all communication between users and AI agents. A TypeScript-native implementation means this security layer adds negligible overhead and ships as a first-class part of the application rather than an optional sidecar.
