Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
321 changes: 321 additions & 0 deletions docs/security/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,321 @@
# ToolHive Security Documentation

This directory contains comprehensive security documentation for the ToolHive platform, including threat models, attack trees, and security best practices.

## πŸ“š Documentation Index

### [Attack Tree](./attack-tree.md)
Visual representation of potential attack vectors against ToolHive across all deployment modes. Includes:
- **Attack chains** showing step-by-step compromise paths
- **Risk classifications** (High/Medium/Low) for each attack vector
- **Cost estimates** for attacker effort and prerequisites
- **Threat actor profiles** from script kiddies to nation-state actors
- **Key attack chains** with detailed mitigation strategies

**Use this when**: Planning defense-in-depth strategies, prioritizing security investments, or assessing threat exposure.

### [Threat Model](./threat-model.md)
STRIDE-based threat analysis of all ToolHive components. Includes:
- **Data flow diagrams** (DFDs) for Local, Kubernetes, and Remote MCP scenarios
- **STRIDE analysis** (Spoofing, Tampering, Repudiation, Information Disclosure, DoS, Privilege Escalation)
- **Critical asset inventory** with sensitivity classifications
- **Trust boundaries** between system components
- **Top 10 critical threats** with immediate action items
- **Security control recommendations** for each component

**Use this when**: Designing new features, reviewing architectural changes, or conducting security assessments.

## 🎯 Quick Reference

### Critical Security Assets (Priority Order)

| Asset | Location | Protection Mechanism |
|-------|----------|---------------------|
| 1. **Secrets (API Keys, Tokens)** | OS Keyring, K8s Secrets | AES-256-GCM, RBAC |
| 2. **Container Runtime Socket** | `/var/run/docker.sock` | Socket authentication, rootless mode |
| 3. **OAuth Access Tokens** | Memory, optional cache | PKCE, short TTL, HTTPS |
| 4. **JWT Signing Keys** | Config files, K8s Secrets | Strong algorithms (RS256/ES256), rotation |
| 5. **etcd Cluster** | Kubernetes control plane | Encryption at rest, network isolation |

### Top 5 Attack Vectors to Mitigate First

1. **Secrets Exposure** (Attack Tree: `SECRETS_LOCAL`, `K8S_SECRETS`)
- Implement file permissions 0600 on encrypted secrets
- Enable etcd encryption at rest
- Audit secret access patterns

2. **Container Runtime Abuse** (Attack Tree: `DOCKER_SOCKET`)
- Never mount Docker socket into containers
- Use rootless containers where possible
- Implement runtime authentication

3. **Privilege Escalation** (Threat Model: Elevation of Privilege)
- Drop all container capabilities by default
- Enforce Pod Security Standards in Kubernetes
- Never allow privileged containers

4. **Supply Chain Compromise** (Attack Tree: `SUPPLY`)
- Implement image signature verification
- Scan dependencies and images regularly
- Use registry allow-lists

5. **Authentication Bypass** (Threat Model: Spoofing)
- Enforce strong JWT signing (RS256/ES256)
- Implement PKCE for all OAuth flows
- Validate issuer, audience, and signature

## πŸ›‘οΈ Security by Deployment Mode

### Local Deployment (CLI/Desktop)

**Primary Threats:**
- Local secret theft from OS keyring
- Container escape via Docker socket
- Electron vulnerabilities in Desktop UI
- Process memory extraction

**Key Mitigations:**
- Enable OS keyring encryption
- Use rootless container runtime
- Keep Electron framework updated
- Implement code signing for binaries

**Documentation**: See [Threat Model Β§4.1, Β§4.2, Β§4.6](./threat-model.md)

### Kubernetes Deployment (Operator)

**Primary Threats:**
- RBAC misconfiguration allowing secret access
- CRD injection to deploy malicious workloads
- etcd direct access
- Operator privilege escalation

**Key Mitigations:**
- Implement least-privilege RBAC
- Enable admission webhooks
- Encrypt etcd at rest
- Use namespace isolation

**Documentation**: See [Threat Model Β§4.3, Β§4.4, Β§4.7](./threat-model.md)

### Remote MCP Servers

**Primary Threats:**
- OAuth/OIDC flow compromise
- PKCE bypass leading to session hijacking
- Man-in-the-middle attacks
- Token theft and replay

**Key Mitigations:**
- Enforce PKCE mandatory
- Implement certificate pinning
- Use short-lived tokens
- Validate issuer and audience

**Documentation**: See [Threat Model Β§4.10](./threat-model.md), [Remote MCP Authentication](../remote-mcp-authentication.md)

## πŸ” Security Review Checklist

Use this checklist when reviewing pull requests or new features:

### Authentication & Authorization
- [ ] JWT tokens use strong algorithms (RS256/ES256, not HS256)
- [ ] OAuth flows enforce PKCE
- [ ] Cedar policies follow least-privilege principle
- [ ] User inputs are validated before authentication checks
- [ ] Token expiry times are reasonable (access: 15m, refresh: 7d)

### Secrets Management
- [ ] Secrets never hardcoded in code or configs
- [ ] Secrets referenced by name, not embedded
- [ ] Secrets redacted in all logs
- [ ] File permissions 0600 on secret storage
- [ ] K8s secrets use SecretKeyRef, not direct values

### Container Security
- [ ] Containers run as non-root user
- [ ] All capabilities dropped, only required ones added
- [ ] No privileged containers allowed
- [ ] Resource limits (CPU, memory, PID) specified
- [ ] Network isolation enabled for untrusted workloads
- [ ] Volume mounts are read-only where possible

### Network Security
- [ ] All external connections use HTTPS/TLS
- [ ] Certificate validation enabled (no InsecureSkipVerify)
- [ ] Egress proxy enforces allow-list for isolated workloads
- [ ] No sensitive data in URLs or query parameters
- [ ] Rate limiting implemented on public endpoints

### Input Validation
- [ ] All user inputs validated against allow-list
- [ ] Path traversal checks for file operations
- [ ] Command injection prevention (no shell=true)
- [ ] JSON/YAML parsing uses safe libraries
- [ ] Maximum size limits on inputs

### Kubernetes
- [ ] RBAC follows least-privilege (no cluster-admin)
- [ ] Admission webhooks validate CRDs
- [ ] Pod Security Standards enforced
- [ ] Network policies restrict pod-to-pod traffic
- [ ] Secrets mounted as volumes, not environment variables

### Audit & Monitoring
- [ ] Security-relevant events logged (auth, authz, secret access)
- [ ] Logs include correlation IDs for tracing
- [ ] No sensitive data in logs (credentials, PII, tokens)
- [ ] Distributed tracing enabled (OpenTelemetry)
- [ ] Alerts configured for security events

## πŸ“Š Risk Assessment Matrix

| Likelihood β†’ <br> Impact ↓ | Low | Medium | High |
|---------------------------|-----|--------|------|
| **Critical** | Medium Risk | High Risk | **Critical Risk** |
| **High** | Low Risk | Medium Risk | High Risk |
| **Medium** | Low Risk | Low Risk | Medium Risk |
| **Low** | Acceptable | Low Risk | Low Risk |

### Risk Categories
- **Critical Risk**: Immediate action required, security incident likely
- **High Risk**: Address within current sprint, significant threat
- **Medium Risk**: Address within quarter, moderate threat
- **Low Risk**: Address as time permits, minimal threat
- **Acceptable**: No action needed, acceptable risk level

## πŸ” Security Best Practices

### For Developers

1. **Never commit secrets** to version control
- Use `.gitignore` for config files with secrets
- Scan commits with tools like `git-secrets` or `trufflehog`

2. **Validate all inputs** at the earliest possible point
- Reject invalid inputs, don't try to sanitize
- Use allow-lists, not deny-lists

3. **Fail securely** when errors occur
- Default to deny access on error
- Log security-relevant errors
- Don't expose internal details in error messages

4. **Use security linters**
- `gosec` for Go code
- `bandit` for Python code
- `eslint-plugin-security` for JavaScript

5. **Write security tests**
- Test authentication bypass scenarios
- Test authorization with different user roles
- Test input validation with fuzzing

### For Operators

1. **Keep systems patched**
- Enable Dependabot/Renovate for dependencies
- Subscribe to security mailing lists
- Test patches in staging before production

2. **Monitor security events**
- Set up alerts for failed authentication
- Monitor for unusual secret access patterns
- Track container escape attempts

3. **Practice principle of least privilege**
- Grant minimum required RBAC permissions
- Use namespace isolation
- Regular access reviews

4. **Backup and disaster recovery**
- Regular backups of secrets and configs
- Test restore procedures
- Document incident response plan

5. **Security training**
- Regular security awareness training
- Threat modeling workshops
- Tabletop exercises for incident response

## 🚨 Reporting Security Vulnerabilities

**DO NOT** open public GitHub issues for security vulnerabilities.

Instead, follow our [Security Policy](../../SECURITY.md):

1. Email [email protected] with details
2. Include proof-of-concept if available
3. Wait for response before public disclosure
4. Coordinate disclosure timeline with security team

We typically respond within 48 hours and aim to patch critical issues within 7 days.

## πŸ“– Related Documentation

### ToolHive Architecture
- [Architecture Overview](../arch/00-overview.md)
- [Deployment Modes](../arch/01-deployment-modes.md)
- [Secrets Management](../arch/04-secrets-management.md)
- [RunConfig and Permissions](../arch/05-runconfig-and-permissions.md)

### Security Features
- [Authorization Framework](../authz.md) - Cedar policies
- [Remote MCP Authentication](../remote-mcp-authentication.md) - OAuth/OIDC
- [Middleware](../middleware.md) - Auth/Authz/Audit chain
- [Runtime Implementation Guide](../runtime-implementation-guide.md) - Security mapping

### Operational Security
- [Kubernetes Integration](../kubernetes-integration.md)
- [Operator Documentation](../../cmd/thv-operator/README.md)
- [Observability](../observability.md) - Logging and monitoring

## πŸ”„ Maintenance

### Review Schedule

| Activity | Frequency | Owner | Next Due |
|----------|-----------|-------|----------|
| Threat model review | Quarterly | Security Team | 2026-02-19 |
| Attack tree update | Quarterly | Security Team | 2026-02-19 |
| Penetration testing | Annually | External Auditor | TBD |
| Security training | Bi-annually | All Teams | TBD |
| Incident response drill | Quarterly | DevOps + Security | TBD |

### Change Management

When to update these documents:
- βœ… New features that handle secrets or authentication
- βœ… Changes to RBAC or permission models
- βœ… New deployment modes or components
- βœ… After security incidents or near-misses
- βœ… New threat intelligence or attack patterns
- ❌ Minor bug fixes without security impact
- ❌ Documentation-only changes
- ❌ Performance optimizations

### Version History

| Version | Date | Changes | Author |
|---------|------|---------|--------|
| 1.0 | 2025-11-19 | Initial release with attack tree and threat model | Security Team |

## 🀝 Contributing

Security improvements are always welcome! When contributing:

1. **For new features**: Update threat model with STRIDE analysis
2. **For security fixes**: Reference threat model sections addressed
3. **For architectural changes**: Update attack tree with new vectors
4. **For incident learnings**: Document in threat model and attack tree

See [CONTRIBUTING.md](../../CONTRIBUTING.md) for general contribution guidelines.

## πŸ“ License

These security documents are part of the ToolHive project and are licensed under [Apache 2.0](../../LICENSE).

---

**Questions or concerns?** Contact [email protected] or open a discussion in our [Discord](https://discord.gg/stacklok).

Loading
Loading