
XuebinMa/agent-guard


agent-guard

Outbound change control for AI coding agents — action and content. Your agent writes code and runs tests freely; the moment it tries to push, publish, deploy, or send a secret out, agent-guard is the gate.


agent-guard is for developers running AI coding agents — Claude Code, Cursor, Codex CLI, Aider — who don't want the next git push --force to be the agent's idea, and don't want a stray .env quietly making it into the model's context.

Two layers of outbound control plus a signed audit trail, on one decision surface:

  • Action layer (today): gate git push, npm publish, docker push, gh release create, non-local HTTP mutations, and rm -rf before they become real
  • Content layer (experimental, opt-in): detect credentials and PII in outbound content (write_file content and http_request body); scanning tool inputs before they reach the LLM provider is on the roadmap
  • Audit layer: every decision is signed with Ed25519, producing tamper-evident receipts ready for EU AI Act articles 28-31 evidence

Best fit: solo and small-team devs running coding agents in real workflows. Local-first by design — no cloud, no telemetry, no data leaves your machine.

Why now: EU AI Act enforcement begins 2026-08-02. Claude Code's PreToolUse hook has known gaps with MCP tools (#33106). DNS-tunnel credential exfiltration exploits (CVE-2025-55284) are already in the wild. The cost of "the agent did something irreversible" is no longer hypothetical.



Verify Locally

If you are touching the repository itself, use the shared verification entrypoint:

./scripts/verify.sh full

Useful narrower paths:

  • ./scripts/verify.sh rust
  • ./scripts/verify.sh lint
  • ./scripts/verify.sh python
  • ./scripts/verify.sh node

The verification script uses temporary directories for Python build/test work so routine verification does not leave venv_* style residue in the repository root.


Try The Preset First

The fastest adoption path is the zero-config outbound preset. It covers all five action-layer categories (code egress, package release, artifact egress, remote mutation, destructive shell) with sensible defaults, so you do not have to write your first rule.

cargo install --path crates/guard-hook   # one-time install of the Claude Code adapter
guard-hook check \
  --policy presets/coding-agent-outbound.yaml \
  --agent-id smoke-test < event.json

A real git push from the agent then surfaces as an ask decision; a git push --force is denied outright; a cargo build passes through with no friction. See presets/README.md for adoption with the Rust SDK, Node binding, or Claude Code PreToolUse hook, and for the contributing guide on new presets.
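To try the command above you need an event.json. The sketch below generates one, assuming guard-hook reads the same JSON that Claude Code passes to a PreToolUse hook on stdin (session_id, transcript_path, cwd, hook_event_name, tool_name, tool_input). Which of these fields guard-hook actually requires is an assumption, and make_event and all values in it are hypothetical placeholders.

```python
import json


def make_event(command: str, cwd: str = "/tmp/demo-repo") -> dict:
    """Build a PreToolUse-style event for a Bash tool call.

    Field names follow Claude Code's hook input; treating them as
    guard-hook's expected input is an assumption, not a documented contract.
    """
    return {
        "session_id": "smoke-test-session",             # placeholder value
        "transcript_path": "/tmp/demo-transcript.jsonl",  # placeholder value
        "cwd": cwd,
        "hook_event_name": "PreToolUse",
        "tool_name": "Bash",
        "tool_input": {"command": command},
    }


if __name__ == "__main__":
    # Print to stdout so the caller decides where the file goes.
    print(json.dumps(make_event("git push origin main"), indent=2))
```

Saved as a script (the name make_event.py is arbitrary), redirect its output to event.json and feed that file to guard-hook check as shown above. Swapping the command string for git push --force or cargo build lets you probe the other two outcomes.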

For a runnable decision preview of the preset — an agent finishing a feature, then the gate firing on git push — use the bundled demo:

npm ci --prefix crates/agent-guard-node
npm run build:debug --prefix crates/agent-guard-node
npm run demo:outbound --prefix crates/agent-guard-node

If you prefer a runnable end-to-end demo of the multi-side-effect runtime, the Node side-effect wedge is also wired up:

npm ci --prefix crates/agent-guard-node
npm run build:debug --prefix crates/agent-guard-node
npm run demo:wedge --prefix crates/agent-guard-node

What you should see:

=== agent-guard side-effect wedge ===

[1] shell decision: execute
[2] file decision: execute
[3] http decision: execute
[4] remote publish decision: ask_for_approval

That path is documented in Side-Effect Wedge Demo. For the fastest shell-only proof, use Three-Minute Proof.


What It Does

The core runtime decision now looks like this:

agent action (outbound moment)
  -> agent-guard
  -> execute | deny | ask_for_approval | handoff
  -> optional guard-owned execution
  -> Ed25519-signed audit record

This is the difference between:

  • hoping the model behaves
  • and putting an explicit gate in front of every outbound action
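As a rough illustration of that gate, the sketch below maps a shell command onto the four decision values. It is not agent-guard's policy engine: the Decision enum, the prefix list, and the decide function are hypothetical, and the rules only mirror the preset behaviour described earlier (plain git push asks, forced push is denied, local builds pass).

```python
import shlex
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    DENY = "deny"
    ASK_FOR_APPROVAL = "ask_for_approval"
    HANDOFF = "handoff"  # listed for completeness; unused in this sketch


# Hypothetical outbound command prefixes that should pause for a human.
ASK_PREFIXES = [
    ("git", "push"),
    ("npm", "publish"),
    ("docker", "push"),
    ("gh", "release", "create"),
]


def decide(command: str) -> Decision:
    try:
        argv = shlex.split(command)
    except ValueError:
        return Decision.DENY  # fail closed on input that cannot be parsed
    # A forced push is denied outright rather than offered for approval.
    if argv[:2] == ["git", "push"] and any(a in ("--force", "-f") for a in argv):
        return Decision.DENY
    for prefix in ASK_PREFIXES:
        if tuple(argv[: len(prefix)]) == prefix:
            return Decision.ASK_FOR_APPROVAL
    return Decision.EXECUTE  # everything else is treated as local work
```

The point of the design is that the decision is computed before anything runs, so a deny or ask_for_approval never depends on the model choosing to behave.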

Today, the runtime can already own execution for:

  • shell / terminal
  • file write
  • outbound mutation HTTP

Together those three surfaces cover the action-layer categories the preset bundles (code egress, package release, artifact egress, remote mutation, destructive shell).


Why Developers Adopt It

  • A real outbound boundary, not prompt-only safety: git push, npm publish, docker push, rm -rf, kubectl apply all hit a decision point before they become real.
  • Zero-config preset: a copy-able policy that covers the five action-layer categories on day one — no rule-writing required.
  • Small integration surface: wrap existing LangChain-style tools or OpenAI-style handlers, or hook into Claude Code's PreToolUse via guard-hook. No runtime rewrite.
  • Tamper-evident audit: every decision is Ed25519-signed, JSONL-formatted, and ready to map onto EU AI Act articles 28-31 without an enterprise control plane.
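To make "tamper-evident" concrete, here is a minimal sketch of signing and verifying one JSONL audit line. It deliberately swaps Ed25519 for HMAC-SHA256 so that it runs on the Python standard library alone; the record fields and both function names are hypothetical and do not reflect agent-guard's actual receipt format.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Stable key order and separators so the same record always signs the same bytes.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Return one JSONL line carrying the record and its HMAC tag."""
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return json.dumps({"record": record, "sig": tag}, sort_keys=True)


def verify_line(line: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    entry = json.loads(line)
    expected = hmac.new(key, _canonical(entry["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["sig"])
```

Editing any field of a stored line makes verify_line return False. The practical difference with Ed25519 is that verification needs only the public key, so an auditor can check records without being able to forge them; an HMAC verifier holds the same secret as the signer.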

Best Fit Right Now

  • solo and small-team devs running Claude Code / Cursor / Codex CLI / Aider against real codebases
  • shell-enabled coding agents that publish, push, deploy, or otherwise produce outbound effects
  • teams that want a tamper-evident audit trail before the EU AI Act enforcement deadline

Not The First Thing To Reach For

  • chat-only assistants with no tool execution
  • teams looking for a full orchestration framework
  • teams expecting a finished enterprise control plane on day one

Adjacent Layer: Loop Governance

agent-guard controls the outbound side effect on each tool call. It deliberately does not govern the surrounding autonomous loop: budget caps, verifier gates, retry admission, and JSONL run records address a different failure mode (a 47-retry overnight bill rather than a single rogue git push).

For that layer, see MartinLoop: it wraps autonomous coding agents with budgets, verifier gates, and run records. The two layers compose — MartinLoop decides whether the next attempt is admitted; agent-guard decides whether the side effects inside that attempt are allowed to leave.


Current Scope

What is strong today (action layer):

  • the five outbound action categories — code egress, package release, artifact egress, remote mutation, destructive shell — are covered by a zero-config preset
  • shell / terminal, file write, and outbound mutation HTTP are the underlying runtime proof surfaces
  • normalized runtime decisions, approval flows, and Ed25519-signed audit records are available now
  • the SDK already includes policy signing, execution receipts, metrics, anomaly detection, and SIEM export beyond the narrow wedge

What is experimental and opt-in (content layer):

  • credential / PII detection on outbound content — write_file content and http_request body — behind the off-by-default content feature, with three enforcement modes (block / mask / warn). See Content layer below.

What is on the roadmap (content layer and beyond):

  • detection on tool inputs (prompts) before they reach the LLM provider, not just outbound effects
  • HTTP method matching in policy (today the schema is URL-only; method-aware filtering goes host-side — see presets/README.md)
  • distribution as a Claude Code plugin / ECC marketplace entry
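Until method matching lands in the policy schema, a host can do the method check itself and only consult the guard for requests that mutate a non-local target. A minimal sketch, assuming the host sees the method and URL before dispatch; needs_gate and both constant sets are hypothetical names, not part of agent-guard:

```python
from urllib.parse import urlsplit

# Methods treated as mutating in this sketch.
MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
# Hosts treated as local; extend for your own loopback aliases.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def needs_gate(method: str, url: str) -> bool:
    """True when the request is a mutation aimed at a non-local host."""
    host = urlsplit(url).hostname or ""
    # A URL with no parseable host is not in LOCAL_HOSTS, so it is gated.
    return method.upper() in MUTATING_METHODS and host not in LOCAL_HOSTS
```

Requests for which needs_gate returns True would then go through the URL-based policy; reads and local mutations skip it.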

What to understand before integrating:

  • raw runtime APIs expose execute | deny | ask_for_approval | handoff
  • adapter enforcement is still strongest on shell-like execution paths today
  • Bash has the deepest validator path; read_file / write_file normalize paths and fail closed on symlink escapes; HTTP policy matching is URL-centric (see roadmap)
  • Python and Node bindings use the SDK's default sandbox selection in the current release; explicit backend selection is deferred until pilot demand surfaces
  • capability coverage is intentionally narrow rather than generic
  • broader policy workflow and control-plane ideas are future expansion paths, not the phase-one hook
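The "fail closed on symlink escapes" behaviour mentioned above can be pictured with a small sketch: resolve the requested path fully, then refuse it unless the result still sits under the workspace root. This is an illustration of the general technique, not agent-guard's validator, and resolve_inside is a hypothetical name.

```python
import os


def resolve_inside(root: str, requested: str) -> str:
    """Return the real path of `requested` under `root`, or raise.

    os.path.realpath follows symlinks and collapses "..", so a link that
    points outside the root is caught by the containment check below.
    """
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, requested))
    if os.path.commonpath([root_real, target]) != root_real:
        # Fail closed: anything that resolves outside the root is rejected.
        raise PermissionError(f"path escapes workspace: {requested!r}")
    return target
```

Checking the resolved path rather than the string the agent supplied is what closes the gap: a path such as link/secret looks harmless until link turns out to point at a directory outside the workspace.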

Content layer (experimental)

The action layer decides whether a call may leave. The content layer inspects what leaves with it. It is off by default — opt in with the content feature flag — and currently scans two surfaces: write_file content and http_request body.

Add a content block to any tool rule:

tools:
  http_request:
    mode: full_access
    content:
      mode: block          # block | mask | warn
      detect: [secrets, pii]   # optional; defaults to both

The three modes:

| Mode  | Effect |
|-------|--------|
| block | Deny the call when sensitive content is detected (SENSITIVE_CONTENT_BLOCKED). |
| mask  | Execute a redacted copy, in which each finding becomes [REDACTED:<label>], and emit a ContentFinding audit record. |
| warn  | Execute unchanged, but emit a ContentFinding audit record. |

Findings only ever expose the kind of data (e.g. AWS Access Key, Email), never the raw matched substring — audit records carry labels and counts, not secrets.
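The three modes and the labels-only rule can be sketched in a few lines. Everything here is illustrative: the two regexes are simplified stand-ins for the real detector set, and apply_content_policy with its return shape is a hypothetical helper, not the SDK's API.

```python
import re

# Simplified example detectors; the shipped set is broader.
DETECTORS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def apply_content_policy(text: str, mode: str):
    """Return (decision, payload_to_send, findings).

    findings maps a label to a count; the matched substrings are never kept.
    """
    if mode not in ("block", "mask", "warn"):
        raise ValueError(f"unknown mode: {mode}")
    findings = {}
    redacted = text
    for label, pattern in DETECTORS.items():
        redacted, count = pattern.subn(f"[REDACTED:{label}]", redacted)
        if count:
            findings[label] = count
    if not findings:
        return "execute", text, findings
    if mode == "block":
        return "deny", None, findings      # nothing leaves
    if mode == "mask":
        return "execute", redacted, findings  # redacted copy leaves
    return "execute", text, findings       # warn: original leaves, finding recorded
```

For example, masking "key=AKIAIOSFODNN7EXAMPLE mail=dev@example.com" yields "key=[REDACTED:AWS Access Key] mail=[REDACTED:Email]" with findings {"AWS Access Key": 1, "Email": 1}.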

Run the example:

cargo run -p agent-guard-sdk --example content_policy --features content

This is a spike-grade detector set (named patterns + entropy fallback for secrets, regex + Luhn for PII), not a compliance-grade DLP engine. Treat it as a safety net, not the primary control.
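For readers unfamiliar with the two generic techniques named above, here is a minimal sketch of a Luhn checksum (used to cut false positives on card-like digit runs) and Shannon entropy (a common fallback signal for random-looking secrets). These are textbook versions; how the detector set thresholds or combines them is not shown here, and both function names are hypothetical.

```python
import math
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits in `number` (spaces and dashes ignored)."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())
```

A repeated character scores 0 bits and a long random token scores several bits per character, which is why entropy works as a catch-all behind the named patterns. It also explains the false positives that make this a safety net rather than a primary control.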



Framework Entry Points

  • Claude Code: the guard-hook PreToolUse adapter is the lowest-friction entry — point one --policy flag at the outbound preset
  • Node: strongest programmatic surface, with wrappers for LangChain-style tools and OpenAI-style handlers
  • Python: wrap_langchain_tool / wrap_openai_tool are available; a real-package validation script ships, and an automated CI version matrix is the remaining gap
  • Rust SDK: most direct integration path for hosts that want explicit control over side-effect decisioning and execution
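The wrapper style of integration boils down to the pattern below: the tool function stays as it is, and a thin layer asks for a decision first. This is a generic sketch, not the signature of wrap_langchain_tool or wrap_openai_tool; guard_tool and the decide callback are hypothetical, with the callback assumed to return one of the decision strings.

```python
import functools
from typing import Callable


def guard_tool(tool: Callable[[str], str], decide: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `tool` so it only runs when `decide` returns "execute"."""

    @functools.wraps(tool)
    def guarded(arg: str) -> str:
        decision = decide(arg)
        if decision == "execute":
            return tool(arg)
        # deny, ask_for_approval, and handoff all stop the call in this sketch;
        # a real host would route approvals instead of raising.
        raise PermissionError(f"{decision}: {arg!r}")

    return guarded
```

Because the wrapper returns a callable with the same shape as the original, the agent framework does not need to know the guard exists.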

Contributing

We welcome security research and contributions. Please see CONTRIBUTING.md for details.

Copyright © 2026 agent-guard team. Distributed under the MIT License.

About

AI Agent permission enforcement and sandbox security SDK, extracted from claw-code
