The agents system implements a comprehensive security model designed to prevent unauthorized use, prompt injection attacks, and malicious code insertion. The system uses multiple layers of defense including user authentication, keyword-based command triggers, and real-time commit validation during PR processing.
- Zero Trust by Default: No action is taken without explicit authorization
- Defense in Depth: Multiple security layers that work independently
- Audit Trail: All actions are logged with user attribution
- Fail Secure: Any security failure results in no action taken
- Real-time Validation: Continuous security checks during execution
Our agents implement defense-in-depth with multiple security layers:
- GitHub Actions `if` conditions prevent workflows from running for unauthorized users
- Fail-fast security checks terminate workflows immediately for unauthorized access
- Minimal GITHUB_TOKEN permissions following principle of least privilege
- Allow List Based Authorization: Only specific GitHub usernames can trigger agent actions
- Rate Limiting: Prevents abuse with configurable request limits per user
- Repository Validation: Restricts agents to specific repositories
- Comprehensive Security Checks: All layers validated before any action
The agent admins list is configured in `.agents.yaml` under the `security.agent_admins` field. Only these users can trigger agent actions via `[Action][Agent]` keywords. The repository owner (extracted from the `GITHUB_REPOSITORY` environment variable) is always included automatically.
Agent admins have the following special capabilities:
- Trigger Agent Actions: Use `[Action][Agent]` keywords to invoke agents
- Extend Iteration Limits: Use `[CONTINUE]` to allow more agent iterations (see below)
- Authoritative Comments: Their comments are treated as authoritative in review contexts
To prevent infinite loops where agents repeatedly try to fix the same issues, the system tracks iteration counts based on PR comments. Each agent type has its own independent counter.
- Comment-Based Tracking: Iterations are counted by parsing PR comments with agent metadata markers
- Separate Counters: The `review-fix` agent (responds to AI reviews) and the `failure-fix` agent (responds to CI failures) have independent iteration counts
- Max Iterations: By default, each agent is limited to 5 iterations before pausing
- Metadata Format: Comments include `<!-- agent-metadata:type=TYPE:iteration=N -->` for tracking
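As a rough illustration of how comment-based tracking can be queried, the metadata marker can be counted with standard tooling; this is a minimal sketch (the PR number and agent type are placeholders), not the production Rust implementation:

```bash
# Count prior iterations for one agent type by scanning PR comments
# for the metadata marker (values here are illustrative).
PR=123
AGENT_TYPE="review-fix"
count=$(gh pr view "$PR" --json comments --jq '.comments[].body' |
  grep -c "agent-metadata:type=${AGENT_TYPE}:")
echo "Agent ${AGENT_TYPE} has used ${count} iterations on PR #${PR}"
```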
Agent admins can extend an agent's iteration limit by posting a comment containing `[CONTINUE]`:

```text
[CONTINUE]

I've reviewed the progress. Let the agent continue working on this.
```
How it works:
- Each `[CONTINUE]` adds the base limit to the effective max
- Base limit is 5, so: 1x `[CONTINUE]` = max 10, 2x `[CONTINUE]` = max 15, etc.
- The iteration count itself is not reset - it keeps incrementing
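The arithmetic is effective max = base x (1 + number of `[CONTINUE]` comments). A sketch of that computation (the PR number is a placeholder, and the real implementation only counts admin-authored comments, which this one-liner omits):

```bash
# effective_max = base_limit * (1 + number of [CONTINUE] comments)
PR=123
BASE_LIMIT=5
continues=$(gh pr view "$PR" --json comments --jq '.comments[].body' |
  grep -ci '\[CONTINUE\]')
effective_max=$((BASE_LIMIT * (1 + continues)))
echo "Effective iteration limit: ${effective_max}"
```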
Key Details:
- Case-insensitive: `[CONTINUE]`, `[continue]`, and `[Continue]` all work
- Admin-only: Only users in `security.agent_admins` can extend limits
- Cumulative: Multiple `[CONTINUE]` comments stack (each adds 5 more iterations)
- Per-PR: The count applies to the entire PR comment history
- PR validation runs, agent tries to fix issues (iterations 1-5)
- Agent hits max iterations (5), posts "Iteration Limit Reached" message
- Human reviews progress and decides the agent should continue
- Admin comments: `[CONTINUE] Making good progress, keep going`
- Effective max is now 10 - agent can run iterations 6-10
- If needed, another `[CONTINUE]` would extend to 15, and so on
```yaml
# In .agents.yaml
automation:
  max_auto_fix_iterations: 5  # Base limit per agent (extended by [CONTINUE])
```

Agents are controlled exclusively through a keyword trigger system that requires explicit commands from authorized users. This prevents accidental activation and provides clear audit trails.
The trigger format is: `[Action][Agent]`
Security Properties:
- Case-insensitive matching for user convenience
- Must be exact format with square brackets
- Only the most recent trigger is processed
- Invalid triggers are ignored (fail secure)
- `[Approved]` - Approve and process the issue/PR (includes fix and implement requests)
- `[Review]` - Review and address feedback
- `[Close]` - Close the issue/PR
- `[Summarize]` - Provide a summary
- `[Debug]` - Debug the issue
- `[Claude]` - Claude Code agent
- `[Gemini]` - Gemini CLI agent
- `[OpenCode]` - Open-source coding AI
- `[Crush]` - Charm Bracelet Crush AI shell assistant
- `[Approved][Claude]` - Have Claude process the issue/PR
- `[Approved][OpenCode]` - Have OpenCode implement or fix the request
- `[Review][Gemini]` - Have Gemini review and address PR feedback
- `[Summarize][Claude]` - Have Claude summarize the discussion
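For illustration, the trigger format can be matched with a simple pattern; this sketch is not the shipped parser (the Rust `github-agents` CLI handles parsing, including the only-most-recent-trigger rule):

```bash
# Match the [Action][Agent] format (illustrative only).
comment="[Approved][Claude] please handle this"
if [[ "$comment" =~ \[([A-Za-z]+)\]\[([A-Za-z]+)\] ]]; then
  action="${BASH_REMATCH[1],,}"  # -> "approved"
  agent="${BASH_REMATCH[2],,}"   # -> "claude"
  echo "action=${action} agent=${agent}"
fi
```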
- User Action: An allowed user comments with `[Action][Agent]`
- Authentication: System verifies user is in `agent_admins`
- Authorization: System checks rate limits and repository permissions
- Validation: System ensures trigger is on latest commit (for PRs)
- Execution: Agent performs requested action
- Audit: All actions logged with full context
Security settings are configured in `.agents.yaml`:
```yaml
security:
  # Users authorized to trigger agent actions via [Approved][Agent] keywords
  # CRITICAL: Only add trusted human users - these can execute code via agents
  agent_admins:
    - AndrewAltimit # Repository owner

  # Trusted sources for comment context (used in PR reviews)
  # Comments from these accounts are marked as trusted when providing context to AI
  # This does NOT grant them ability to trigger agent actions
  trusted_sources:
    - AndrewAltimit # Repository owner
    - github-actions[bot] # GitHub Actions bot
    - dependabot[bot] # Dependabot
```

- `agent_admins`: Array of GitHub usernames authorized to trigger agent actions (humans only)
- `trusted_sources`: Array of accounts whose comments are trusted for context (includes bots)
- `log_violations`: Whether to log security violations
- `reject_message`: Custom message shown to unauthorized users
- `rate_limit_window_minutes`: Time window for rate limiting (default: 60)
- `rate_limit_max_requests`: Maximum requests per window (default: 10)
- `allowed_repositories`: Array of allowed repositories (empty = all repos from owner)
You can also set the allow list via environment variable:
```bash
export AI_AGENT_ALLOWED_USERS="user1,user2,bot-name[bot]"
```

You can also set allowed repositories via environment variable:

```bash
export AI_AGENT_ALLOWED_REPOS="owner/repo1,owner/repo2"
```

The security functionality is implemented in Rust and available via the `github-agents` CLI:
```bash
# Check if user is allowed
github-agents security check-user --username "AndrewAltimit"

# Check if action is allowed
github-agents security check-action --action "issue_approved"

# Validate PR commit hasn't changed
github-agents security validate-pr-commit --pr 123 --expected-sha "abc1234"

# Parse trigger from comment
github-agents security parse-trigger --comment "[Approved][Claude]"
```

Effective security requires trained human oversight. All team members working with agents should complete the AI Safety Training Guide to understand:
- Hidden Capabilities: AI systems may develop or hide capabilities that only emerge under specific conditions
- Deceptive Alignment: Systems may behave safely during testing but activate harmful behaviors when deployed
- Scalable Oversight: Techniques for managing AI systems that exceed human capabilities in specific domains
- Specification Gaming: How AI systems optimize exactly what we measure, not what we intend
- Trust Frameworks: Building appropriate trust levels without over-reliance on AI decisions
The security system enforces human oversight through:
- Keyword Triggers: Explicit human approval required for all actions
- Allow List: Only authorized users can trigger agent actions
- Commit Validation: Human approval tied to specific code states
- Emergency Procedures: Clear protocols for suspected misalignment
The PR monitoring system implements sophisticated commit-level security to prevent code injection attacks during the review and modification process.
Without commit validation, an attacker could:
- Create an innocent-looking PR
- Wait for approval from an authorized user
- Push malicious code after approval but before AI processing
- Have the AI agent unknowingly work on and push malicious code
Stage 1 - Approval Commit Tracking
- When `[Approved][Claude]` is issued, the system records the exact commit SHA
- This creates an immutable "point-in-time" snapshot of what was approved
- The approval is cryptographically tied to the repository state
Stage 2 - Pre-Execution Validation
- Prevents any work if the PR has changed since approval
- Immediate failure with clear security message
Stage 3 - Pre-Push Validation
- Final check before any code enters the repository
- Drops all work if PR was modified during processing
- Prevents race conditions and TOCTOU attacks
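A minimal sketch of the SHA comparison behind Stages 2 and 3, assuming the approval SHA was recorded earlier (the production check is `github-agents security validate-pr-commit`):

```bash
# Compare the PR's current head against the SHA recorded at approval time.
PR=123
APPROVED_SHA="abc1234"  # recorded when [Approved][Claude] was issued
current_sha=$(gh pr view "$PR" --json headRefOid --jq .headRefOid)
# Prefix match, since the recorded SHA may be in short form.
if [[ "$current_sha" != "$APPROVED_SHA"* ]]; then
  echo "SECURITY: PR head changed since approval - dropping all work" >&2
  exit 1
fi
```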
The system implements real-time secret masking through gh-validator, a Rust-based GitHub CLI wrapper that validates and sanitizes all GitHub comments before they are posted. This is a deterministic, automatic process that ensures secrets can never appear in public comments.
```text
Agent gh command -> gh-validator (shadows gh) -> Secret Masking -> Real gh CLI -> GitHub
```
The `gh-validator` binary is installed as `gh` in a higher-priority PATH directory (e.g., `~/.local/bin/gh`), shadowing the real GitHub CLI. When any `gh` command runs:
- Pass-through for non-content commands: Commands like `gh pr list` execute immediately
- Validation for content commands: Commands with `--body`, `--body-file`, `--title`, etc. are validated:
  - Secrets are masked based on `.secrets.yaml` configuration
  - Unicode emojis are blocked (may display as corrupted characters)
  - Formatting is validated for reaction images
  - URLs in `--body-file` are verified to exist (with SSRF protection)
- Execution: After validation, the real `gh` binary is called with (potentially modified) arguments
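A quick way to confirm the shadowing is in effect (paths shown are examples):

```bash
# The wrapper should resolve first; the real CLI remains further down PATH.
command -v gh   # e.g. /home/user/.local/bin/gh (the validator)
type -a gh      # lists the wrapper first, then the real gh binary
```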
An example `.secrets.yaml` configuration:

```yaml
environment_variables:
  - GITHUB_TOKEN
  - OPENROUTER_API_KEY
  - DB_PASSWORD

patterns:
  - name: GITHUB_TOKEN
    pattern: "ghp_[A-Za-z0-9_]{36,}"

auto_detection:
  enabled: true
  include_patterns: ["*_TOKEN", "*_SECRET", "*_KEY"]
  exclude_patterns: ["PUBLIC_*"]
```

The validator searches for `.secrets.yaml` in:
- Current working directory (and parent directories up to the git root)
- Binary directory (and parent directories)
- `~/.secrets.yaml`
- `~/.config/gh-validator/.secrets.yaml`
Common secret formats are detected and masked:
- GitHub tokens: `ghp_*`, `ghs_*`, `github_pat_*`
- API keys: `sk-*`, `pk-*`
- JWT tokens: `eyJ*`
- Bearer tokens
- URLs with embedded credentials
- Private key blocks
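As a sanity check, the `GITHUB_TOKEN` pattern from the configuration above can be exercised against a fake token:

```bash
# The 36-character fake suffix below matches the documented ghp_ pattern.
echo "token is ghp_0123456789abcdef0123456789abcdef0123" |
  grep -E "ghp_[A-Za-z0-9_]{36,}"
```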
```bash
# Quick install (recommended)
curl -sSL https://raw.githubusercontent.com/AndrewAltimit/template-repo/main/tools/rust/gh-validator/install.sh | bash

# Ensure ~/.local/bin comes before /usr/bin in PATH
export PATH="$HOME/.local/bin:$PATH"
```

- Universal: Works with all agents and automation tools using the `gh` CLI
- Automatic: No agent configuration required - just install and forget
- Fail-Closed: If configuration is missing or URLs can't be verified, commands are blocked
- SSRF Protection: Only whitelisted hostnames allowed for reaction images
- Single Binary: No runtime dependencies, fast startup, cross-platform support
- Transparent: Agents are unaware of masking (only stderr notification)
See `tools/rust/gh-validator/README.md` for complete documentation.
Both git-guard and gh-validator are hardened against bypass through the Wrapper Guard system, which relocates real binaries behind group-restricted permissions and provides structured audit logging. See Wrapper Guard Documentation for the full security model.
The agents use a sophisticated deduplication system to prevent duplicate processing and ensure each issue/PR is only handled once per trigger.
- Comment-Based State Tracking
  - Every agent action results in a comment with the `[Agent]` tag
  - These comments serve as persistent "claims" on issues/PRs
  - Before processing, agents check for existing claims
- Deduplication Flow

  ```text
  New Issue/PR Event
          |
  Time Filter (last 24 hours)      <- Deterministic pre-filter
          |
  Has [Action][Agent] trigger?     <- Only process explicit requests
          |
  Security checks passed?
          |
  Has [Agent] comment?             <- THE KEY CHECK
          |
    No?  -> Process & Post Comment (stake claim)
    Yes? -> Skip (already claimed)
  ```

- Implementation Details
  - Uses `has_agent_comment()` to check for ANY comment containing `[Agent]`
  - If found, skips processing entirely
  - Simple but effective for preventing duplicate processing
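For illustration, a rough shell equivalent of the key check (the real `has_agent_comment()` lives in the Rust implementation; `[Claude]` stands in for any `[Agent]` tag, and the issue number is a placeholder):

```bash
# Skip processing if any existing comment already claims this issue.
ISSUE=123
if gh issue view "$ISSUE" --json comments --jq '.comments[].body' |
    grep -q '\[Claude\]'; then
  echo "Already claimed by an agent - skipping"
  exit 0
fi
```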
The workflows use GitHub Environments for secure secret management:
```yaml
jobs:
  monitor-issues:
    environment: production  # Uses environment secrets
    steps:
      - name: Run agent
        env:
          GITHUB_TOKEN: ${{ secrets.AGENT_TOKEN }}
```

Setup Required:
- Go to Settings -> Environments -> New environment
- Create a "production" environment
- Add secret: `AGENT_TOKEN` (your GitHub PAT)
- Add variable: `ENABLE_AGENTS=true` (to enable the feature)
- Configure protection rules as needed
See GitHub Environments Setup Guide for detailed instructions.
For local testing:
```bash
# Option 1: Use environment variable
export GITHUB_TOKEN="your-token-here"
github-agents issue-monitor

# Option 2: Use gh CLI authentication (recommended)
gh auth login
github-agents issue-monitor
```

The agents require a fine-grained Personal Access Token with exactly these permissions:
| Permission | Access Level | Why It's Needed |
|---|---|---|
| Actions | Read | View workflow runs and logs |
| Commit statuses | Read | Check CI/CD status on PRs |
| Contents | Read + Write | Clone repo, create branches, push commits |
| Issues | Read + Write | Read issues, post comments |
| Pull requests | Read + Write | Read PRs, create PRs, post comments |
Important: Do NOT grant any Account permissions - only Repository permissions are needed.
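To confirm which credentials `gh` is using and that they authenticate, a quick check:

```bash
# Shows the active account and token source (and scopes, where available).
gh auth status
# Confirm API access with a harmless read-only call.
gh api rate_limit --jq .rate.remaining
```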
- Rotate tokens every 90 days
- Use GitHub's token expiration feature
- Monitor token usage in GitHub Settings
The repository includes an advanced Sleeper Agents System for identifying potential backdoors and hidden behaviors in AI models:
- Backdoor Triggers: Hidden activation patterns that cause unexpected behavior
- Deceptive Alignment: Models pretending to be aligned during testing
- Goal Misgeneralization: Models pursuing different objectives than trained
- Hidden Capabilities: Abilities that only emerge under specific conditions
- Residual Stream Analysis: Using TransformerLens to examine internal model activations
- Attention Pattern Analysis: Identifying suspicious attention head behaviors
- Layer-wise Probing: Detecting hidden representations across model layers
- Behavioral Testing: Comprehensive test suites for various attack scenarios
```bash
# Run sleeper agents tests in CI/CD
docker compose run --rm sleeper-eval-cpu python -m packages.sleeper_agents.cli evaluate \
  --model "gpt2" --test-suite "robustness"
```

See the Sleeper Agents Documentation for detailed usage instructions.
- Keep Allow List Minimal: Only add trusted users and bots
- Review Regularly: Periodically audit the allow list
- Monitor Logs: Check for security violations in agent logs
- Never Disable: Keep security enabled in production
- Use Bot Accounts: Create dedicated bot accounts for automation
If a security incident occurs:
- Immediate: Disable agents via environment variable (see the example after this list)
- Investigate: Check logs for unauthorized attempts
- Remediate: Remove compromised users from allow list
- Document: Record incident details
- Improve: Update security measures based on findings
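For the immediate-disable step, a minimal sketch, assuming the `ENABLE_AGENTS` variable from the environment setup above gates all agent workflows:

```bash
# Flip the feature flag off in the production environment to halt agents.
gh variable set ENABLE_AGENTS --body "false" --env production
```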
All agents are configured to run in fully autonomous mode for CI/CD environments. This is a critical requirement for automated workflows.
In CI/CD environments (GitHub Actions, GitLab CI, etc.):
- No human interaction is possible (no TTY)
- Workflows must run unattended
- Interactive prompts would block pipelines indefinitely
- Agents run in sandboxed environments for security
Each agent has specific flags for autonomous operation:
- Claude: `--print --dangerously-skip-permissions`
- Gemini: `-m model -p prompt` (non-interactive by design)
- OpenCode: `--non-interactive`
- Crush: `--non-interactive --no-update`
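Putting the flags together, non-interactive invocations might look like the following sketch (the prompts and model name are placeholders):

```bash
# Claude Code: print mode, no permission prompts (runs unattended)
claude --print --dangerously-skip-permissions "Fix the failing unit test"
# Gemini CLI: model and prompt supplied up front (non-interactive by design)
gemini -m gemini-2.5-pro -p "Review this diff for security issues"
```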
All agents must follow these guidelines to prevent accidentally notifying random GitHub users:
- NEVER use @ mentions unless referring to actual repository maintainers
- Do NOT use @Gemini, @Claude, @OpenAI, etc. - these may ping unrelated GitHub users
- Instead, refer to agents without the @ symbol: "Gemini", "Claude", "OpenAI"
- Only @ mention users who are:
- The repository owner
- Active contributors listed in the repository
- Users who have explicitly asked to be mentioned
When referencing AI reviews, use phrases like:
- "As noted in Gemini's review..."
- "Addressing Claude's feedback..."
- "Per the AI agent's suggestion..."
- NEVER hardcode tokens in code
- NEVER commit tokens to the repository
- NEVER log tokens without redaction
- NEVER use tokens in command line arguments (they appear in process lists)
- NEVER share tokens between environments (use separate environments)
- NEVER disable environment protection rules for production
- NEVER disable automatic secret masking in `.secrets.yaml`
- NEVER bypass PreToolUse hooks when posting GitHub comments
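One way to honor the command-line rule above is to pass tokens through the environment for a single invocation rather than as arguments:

```bash
# Wrong: a token passed as an argument appears in `ps` output.
# some-tool --token "ghp_..."
# Right: supply it via the environment for just this invocation.
GITHUB_TOKEN="$(gh auth token)" github-agents issue-monitor
```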