Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
266 changes: 266 additions & 0 deletions docs/plans/2026-02-13-secrets-protection-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,266 @@
# Secrets Protection Design

## Problem

Shellguard currently has zero protection against reading sensitive files or
leaking secrets. Any allowed command (`cat`, `grep`, `head`, `printenv`, etc.)
can freely access `.env`, private keys, cloud credentials, and similar files.
The `download_file` tool has no path validation at all.

### Threat Model

1. **Malicious exfiltration** — prompt injection or model misbehavior causes the
LLM to issue tool calls that read secrets (e.g., `cat .env`, `printenv`)
2. **Accidental exposure** — the LLM innocently reads secrets during
debugging/exploration and includes them in responses, exposing them in
logs/chat history
3. **Privacy** — users don't want secrets sent to LLM API endpoints at all

All three concerns demand the same solution: **local heuristics that block or
redact secrets before they ever leave the machine.** No network calls, no LLM
involvement.

## Architecture

A new standalone `secrets` package (zero internal dependencies) provides two
capabilities integrated into the existing pipeline:

```
User Input
Parse (parser)
Validate (validator) ← existing manifest-based validation
CheckSecrets (secrets) ← NEW: rejects sensitive file access
Reconstruct (ssh)
Execute (ssh)
Truncate (output)
ScrubSecrets (secrets) ← NEW: redacts secrets from output
Return to LLM
```

Both stages are function fields on `Core` for test injection, following the
existing pattern (`Parse`, `Validate`, `Reconstruct`, `Truncate`).

## Phase 1: Pre-execution Path Checking

### Sensitive Path Patterns

Default patterns are hardcoded in the package. Users can override via config.

| Category | Patterns |
| ----------------- | ----------------------------------------------------------------------------------------- |
| Env files | `.env`, `.env.*` (`.env.local`, `.env.production`, etc.) |
| SSH keys | `.ssh/id_*`, `.ssh/authorized_keys`, `.ssh/known_hosts` |
| TLS/Certs | `*.pem`, `*.key`, `*.pfx`, `*.p12` |
| Cloud credentials | `.aws/credentials`, `.aws/config`, `.gcloud/credentials.db`, `.azure/`, `.config/gcloud/` |
| K8s/Docker | `.kube/config`, `.docker/config.json` |
| App credentials | `credentials.json`, `service-account*.json`, `.netrc`, `.pgpass`, `.my.cnf` |
| Git | `.git-credentials`, `.gitconfig` |
| System | `/etc/shadow`, `/etc/gshadow`, `/etc/master.passwd` |
| Generic | `*secret*`, `*credential*`, `*token*` in filenames |

### How It Works

A new function `secrets.CheckPipeline(pipeline, config)` is called from
`Core.Execute()` between `Validate` and `Reconstruct`. For each segment's args:

1. Normalize the path (`path.Clean`, resolve `~`, strip trailing slashes)
2. Check the **basename** against filename patterns (`.env`, `id_rsa`, etc.)
3. Check the **full path** against directory patterns (`.ssh/`, `.aws/`, etc.)
4. If a match is found, return a `SecretsError` with the specific pattern matched

### Special Command Handling

- **`printenv` with no args** — blocked (dumps all env vars including secrets)
- **`printenv VAR_NAME`** — blocked if `VAR_NAME` matches sensitive env var
patterns: `*KEY*`, `*SECRET*`, `*TOKEN*`, `*PASSWORD*`, `*CREDENTIAL*`, `*AUTH*`
- **`download_file`** — `remotePath` checked against the same pattern set in
`Core.DownloadFile()`

### Configuration

```go
type SecretsConfig struct {
// AllowedPaths overrides default blocking for specific paths.
// e.g., [".env.example", "/app/config/credentials.json"]
AllowedPaths []string `yaml:"allowed_paths"`

// AdditionalPatterns adds more patterns to the default set.
AdditionalPatterns []string `yaml:"additional_patterns"`

// DisablePathCheck disables pre-execution path checking entirely.
DisablePathCheck bool `yaml:"disable_path_check"`

// DisableOutputScrub disables post-execution output scrubbing.
DisableOutputScrub bool `yaml:"disable_output_scrub"`
}
```

Hard block by default. Users can allowlist specific paths if they explicitly
choose to allow access.

## Phase 2: Post-execution Output Scrubbing

Catches secrets that appear in command output — e.g., `grep -r "database"
/app/config/` might return lines containing connection strings with embedded
passwords.

### Patterns (High Confidence — Low False Positive Risk)

| Pattern | Example | Redacted to |
| ------------------ | ------------------------------------------------- | -------------------------------------- |
| AWS Access Key | `AKIA1234567890ABCDEF` | `AKIA***REDACTED***` |
| AWS Secret Key | 40-char base64 after `aws_secret_access_key` | `***REDACTED***` |
| GitHub tokens | `ghp_xxxx`, `gho_xxxx`, `ghs_xxxx`, `github_pat_` | `ghp_***REDACTED***` |
| Stripe/OpenAI keys | `sk-xxxx`, `sk_live_xxxx`, `pk_live_xxxx` | `sk-***REDACTED***` |
| Private key blocks | `-----BEGIN (RSA\|EC\|OPENSSH) PRIVATE KEY-----` | `***REDACTED_PRIVATE_KEY***` |
Comment on lines +125 to +131
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use “GitHub” capitalization in the tokens row.

The platform name should be capitalized as “GitHub” in the pattern description.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~129-~129: The official name of this software platform is spelled with a capital “H”.
Context: ... | ghp_xxxx, gho_xxxx, ghs_xxxx, github_pat_ | ghp_***REDACTED*** ...

(GITHUB)

🤖 Prompt for AI Agents
In `@docs/plans/2026-02-13-secrets-protection-design.md` around lines 125 - 131,
The table entry labeled "GitHub tokens" uses lowercase "GitHub" and should be
fixed: update the Pattern cell from "GitHub tokens" (currently "GitHub tokens"?)
to use the correct capitalization "GitHub tokens" and ensure any occurrences of
"github" in that row (e.g., the examples `ghp_xxxx`, `gho_xxxx`, `ghs_xxxx`,
`github_pat_`) remain unchanged except for the platform name; specifically edit
the row containing the pattern string "GitHub tokens" (and the example values
`ghp_xxxx`, `gho_xxxx`, `ghs_xxxx`, `github_pat_`) to capitalize the platform
name as "GitHub".

| Bearer tokens | `Authorization: Bearer xxxx` | `Authorization: Bearer ***REDACTED***` |

### Patterns (Medium Confidence — Tightened to Reduce False Positives)

| Pattern | Tightening Heuristic |
| ---------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| JWTs (`eyJ...`) | Require all 3 dot-separated segments to be valid base64 AND total length > 30 chars |
| Connection strings (`postgresql://user:pass@host`) | Only redact the password portion between `://user:` and `@` |
| Generic key-value (`password=`, `secret=`, `token=`) | Require value portion to look like a secret: min length, mixed case/digits, not a common word. `password_policy=strict` should NOT match. |

### Implementation

`secrets.ScrubOutput(text string) string` applies all compiled regex patterns.
Patterns are compiled once at package init. Output is already capped at 64KB by
the truncation stage, so running ~10-15 compiled regexes is sub-millisecond.

## Integration Points (Changes to Existing Code)

### `shellguard.go`

Add `SecretsConfig` to `Config` struct. Pass it through to `Core`.

### `server/server.go`

Add two function fields to `Core`:

```go
type Core struct {
// ... existing fields ...
CheckSecrets func(*parser.Pipeline) error // default: secrets.CheckPipeline
ScrubSecrets func(string) string // default: secrets.ScrubOutput
}
```

Update `Core.Execute()`:

```go
func (c *Core) Execute(ctx context.Context, in ExecuteInput) (output.CommandResult, error) {
pipeline, err := c.Parse(in.Command)
// ...
err = c.Validate(pipeline, c.Registry)
// ...
err = c.CheckSecrets(pipeline) // NEW
// ...
cmd := c.Reconstruct(pipeline, ...)
result := c.Runner.Execute(ctx, host, cmd, timeout)
truncated := c.Truncate(...)
// Scrub both stdout and stderr // NEW
truncated.Stdout = c.ScrubSecrets(truncated.Stdout)
truncated.Stderr = c.ScrubSecrets(truncated.Stderr)
return truncated, nil
}
```

Update `Core.DownloadFile()` to check `remotePath` against secrets patterns.

### No changes to `validator/` or `output/`

Secrets checking is a separate stage, keeping clean separation of concerns.

## Package Layout

```
secrets/
secrets.go # Core types, SecretsConfig, constructor, SecretsError
paths.go # Sensitive path patterns, CheckPipeline(), CheckPath()
scrub.go # Output scrubbing patterns, ScrubOutput()
paths_test.go # Path checking unit tests
scrub_test.go # Output scrubbing unit tests + false-positive regression
security_test.go # Attack vector tests (bypass attempts)
fuzz_test.go # Fuzz tests for both path checking and scrubbing
```

## Testing Strategy

Following existing conventions: `testing` only, no testify, `got`/`want`
assertions, `t.Helper()` in helpers.

### Path Tests (`paths_test.go`)

Table-driven tests covering all pattern categories:

- **Blocked:** `cat .env`, `head .ssh/id_rsa`, `grep -r foo credentials.json`,
`printenv`, `printenv AWS_SECRET_KEY`
- **Allowed:** `cat README.md`, `head main.go`, `printenv PATH`, `printenv HOME`
- **Allowlist override:** `.env.example` in allowed list → `cat .env.example` passes
- **Path normalization:** `../../.env`, `./foo/../.env`, `~/.ssh/id_rsa`

### Scrub Tests (`scrub_test.go`)

Table-driven with input/expected output pairs:

- **Positive:** Each pattern category with realistic examples
- **False-positive regression:** `token_count=5`, `password_policy=strict`,
`secret_garden.txt`, base64 strings that start with `eyJ` but aren't JWTs

### Security Tests (`security_test.go`)

`TestSec_` prefix. Bypass attempts:

- Path traversal: `../../.env`, `/app/../../../etc/shadow`
- Encoding tricks: URL-encoded paths, unicode homoglyphs
- Indirect access: `find / -name .env`, `grep -rl password /etc/`
- Argument hiding: flags that take path values (`grep -f .env foo.txt`)

### Fuzz Tests (`fuzz_test.go`)

- `FuzzCheckPath` — random strings never panic, always return valid error or nil
- `FuzzScrubOutput` — random input never panics, output length ≤ input length +
redaction marker overhead

### Cross-layer Tests (`security_pipeline_test.go`)

Add cases to existing test file:

- `cat .env` → rejected
- `cat README.md` → allowed
- `printenv` (bare) → rejected
- `printenv PATH` → allowed
- `head .aws/credentials` → rejected

## Open Questions

1. **`grep -f .env`** — Should we check flag values that take file paths?
Manifests know which flags have `takes_value: true`, but we'd need to know
which of those values are file paths. Could inspect based on the flag name
(e.g., `-f`, `--file`, `--include-from`). Start simple, iterate.

2. **Symlinks** — We can't resolve symlinks before execution (the file is on the
remote host). The heuristic only checks the path string as written. This is
an accepted limitation of the local-heuristic approach.

3. **`find` output** — `find / -name .env` doesn't read secrets, it lists paths.
Should we block `find` commands that search for sensitive filenames? Leaning
toward yes for `-name .env` patterns but this adds complexity.
Loading