agent-guard

Outbound change control for AI coding agents — action and content. Your agent writes code and runs tests freely; the moment it tries to push, publish, deploy, or send a secret out, agent-guard is the gate.

agent-guard is for developers running AI coding agents — Claude Code, Cursor, Codex CLI, Aider — who don't want the next git push --force to be the agent's idea, and don't want a stray .env quietly making it into the model's context.

Two layers of outbound control plus an audit trail, one decision surface:

Action layer (today): gate git push, npm publish, docker push, gh release create, non-local HTTP mutations, rm -rf — before they become real
Content layer (experimental, opt-in): detect credentials and PII in outbound content today; detection on tool inputs before they reach the LLM provider is roadmap
Audit layer: every decision signed with Ed25519 — tamper-evident receipts ready for EU AI Act articles 28-31 evidence

Best fit: solo and small-team devs running coding agents in real workflows. Local-first by design — no cloud, no telemetry, no data leaves your machine.

Why now: EU AI Act enforcement begins 2026-08-02. Claude Code's PreToolUse hook has known gaps with MCP tools (#33106). DNS-tunnel credential exfiltration exploits (CVE-2025-55284) are already in the wild. The cost of "the agent did something irreversible" is no longer hypothetical.

Latest Release

Prerelease: v0.2.0-rc1
Announcement: GitHub Discussions #1

Verify Locally

If you are touching the repository itself, use the shared verification entrypoint:

./scripts/verify.sh full

Useful narrower paths:

./scripts/verify.sh rust
./scripts/verify.sh lint
./scripts/verify.sh python
./scripts/verify.sh node

The verification script uses temporary directories for Python build/test work so routine verification does not leave venv_* style residue in the repository root.

Try The Preset First

The fastest adoption path is the zero-config outbound preset. It covers all five action-layer categories (code egress, package release, artifact egress, remote mutation, destructive shell) with sensible defaults, so you do not have to write your first rule.

cargo install --path crates/guard-hook   # one-time install of the Claude Code adapter
guard-hook check \
  --policy presets/coding-agent-outbound.yaml \
  --agent-id smoke-test < event.json

A real git push from the agent then surfaces as an ask decision; a git push --force is denied outright; a cargo build passes through with no friction. See presets/README.md for adoption with the Rust SDK, Node binding, or Claude Code PreToolUse hook, and for the contributing guide on new presets.
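The command above reads a tool-call event from event.json, which this README does not show. The following is a minimal sketch for producing one, assuming guard-hook accepts the Claude Code PreToolUse hook payload shape (tool_name plus tool_input.command). The helper names and placeholder values are hypothetical, and the exact fields guard-hook requires are an assumption; presets/README.md is the authority.

```python
import json


def make_pretooluse_event(command: str, cwd: str = "/tmp/demo") -> dict:
    """Build a Bash tool-call event in the Claude Code PreToolUse shape.

    Assumption: guard-hook reads this shape from stdin. Field values
    below are placeholders, not values the project prescribes.
    """
    return {
        "session_id": "smoke-test-session",  # placeholder
        "cwd": cwd,
        "hook_event_name": "PreToolUse",
        "tool_name": "Bash",
        "tool_input": {"command": command},
    }


def write_event(path: str, command: str) -> None:
    """Write the event as JSON so it can be piped into `guard-hook check`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(make_pretooluse_event(command), fh, indent=2)


if __name__ == "__main__":
    write_event("event.json", "git push origin main")
```

Swapping the command string for git push --force or cargo build gives a quick way to exercise the three outcomes described above.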

For a runnable decision preview of the preset — an agent finishing a feature, then the gate firing on git push — use the bundled demo:

npm ci --prefix crates/agent-guard-node
npm run build:debug --prefix crates/agent-guard-node
npm run demo:outbound --prefix crates/agent-guard-node

If you prefer a runnable end-to-end demo of the multi-side-effect runtime, the Node side-effect wedge is also wired up:

npm ci --prefix crates/agent-guard-node
npm run build:debug --prefix crates/agent-guard-node
npm run demo:wedge --prefix crates/agent-guard-node

What you should see:

=== agent-guard side-effect wedge ===

[1] shell decision: execute
[2] file decision: execute
[3] http decision: execute
[4] remote publish decision: ask_for_approval

That path is documented in Side-Effect Wedge Demo. For the fastest shell-only proof, use Three-Minute Proof.

What It Does

The core runtime decision now looks like this:

agent action (outbound moment)
  -> agent-guard
  -> execute | deny | ask_for_approval | handoff
  -> optional guard-owned execution
  -> Ed25519-signed audit record
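
As a rough illustration of the decision step in that flow, here is a toy sketch in Python. The four decision names come from this README; the decide function and its rules are hypothetical and only mirror the outcomes described for the preset (plain git push asks, git push --force is denied, cargo build executes). It is not the real policy engine.

```python
import shlex
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    DENY = "deny"
    ASK_FOR_APPROVAL = "ask_for_approval"
    HANDOFF = "handoff"  # listed for completeness; the toy rules never return it


def decide(command: str) -> Decision:
    """Toy rules that mirror the preset outcomes described in this README."""
    argv = shlex.split(command)
    if argv[:2] == ["git", "push"]:
        # Only the flag this README names is treated as a force push.
        if "--force" in argv[2:]:
            return Decision.DENY
        return Decision.ASK_FOR_APPROVAL
    # Everything else passes through in this sketch.
    return Decision.EXECUTE


if __name__ == "__main__":
    for cmd in ("git push origin main", "git push --force origin main", "cargo build"):
        print(f"{cmd!r} -> {decide(cmd).value}")
```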

This is the difference between:

hoping the model behaves
and putting an explicit gate in front of every outbound action

Today, the runtime can already own execution for:

shell / terminal
file write
outbound mutating HTTP requests

Together those three surfaces cover the action-layer categories the preset bundles (code egress, package release, artifact egress, remote mutation, destructive shell).

Why Developers Adopt It

A real outbound boundary, not prompt-only safety: git push, npm publish, docker push, rm -rf, kubectl apply all hit a decision point before they become real.
Zero-config preset: a copy-able policy that covers the five action-layer categories on day one — no rule-writing required.
Small integration surface: wrap existing LangChain-style tools or OpenAI-style handlers, or hook into Claude Code's PreToolUse via guard-hook. No runtime rewrite.
Tamper-evident audit: every decision is Ed25519-signed, JSONL-formatted, and ready to map onto EU AI Act articles 28-31 without an enterprise control plane.
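
To make the "small integration surface" point concrete, here is a hedged sketch of the wrapping pattern in Python. The guarded decorator, toy_check, and ActionDenied are hypothetical names invented for illustration; the real wrappers (wrap_langchain_tool, wrap_openai_tool) have signatures this README does not show.

```python
from functools import wraps
from typing import Callable


class ActionDenied(Exception):
    """Raised when the (hypothetical) check does not return 'execute'."""


def guarded(check: Callable[[str, dict], str]) -> Callable:
    """Decorator factory: consult check(tool_name, kwargs) before running the tool."""
    def decorate(tool: Callable) -> Callable:
        @wraps(tool)
        def wrapper(**kwargs):
            decision = check(tool.__name__, kwargs)
            if decision != "execute":
                raise ActionDenied(f"{tool.__name__}: {decision}")
            return tool(**kwargs)
        return wrapper
    return decorate


def toy_check(tool_name: str, payload: dict) -> str:
    # Stand-in policy: anything that publishes needs approval.
    return "ask_for_approval" if tool_name == "publish_package" else "execute"


@guarded(toy_check)
def read_version(path: str) -> str:
    return f"read {path}"


@guarded(toy_check)
def publish_package(name: str) -> str:
    return f"published {name}"
```

The existing handler body stays untouched; only the decoration changes, which is the sense in which no runtime rewrite is needed.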

Best Fit Right Now

solo and small-team devs running Claude Code / Cursor / Codex CLI / Aider against real codebases
shell-enabled coding agents that publish, push, deploy, or otherwise produce outbound effects
teams that want a tamper-evident audit trail before the EU AI Act enforcement deadline

Not The First Thing To Reach For

chat-only assistants with no tool execution
teams looking for a full orchestration framework
teams expecting a finished enterprise control plane on day one

Adjacent Layer: Loop Governance

agent-guard controls the outbound side effect on each tool call. It deliberately does not govern the surrounding autonomous loop: budget caps, verifier gates, retry admission, and JSONL run records address a different failure mode (a 47-retry overnight bill vs. a single rogue git push).

For that layer, see MartinLoop: it wraps autonomous coding agents with budgets, verifier gates, and run records. The two layers compose — MartinLoop decides whether the next attempt is admitted; agent-guard decides whether the side effects inside that attempt are allowed to leave.

Current Scope

What is strong today (action layer):

the five outbound action categories — code egress, package release, artifact egress, remote mutation, destructive shell — are covered by a zero-config preset
shell / terminal, file write, and outbound mutating HTTP requests are the underlying runtime proof surfaces
normalized runtime decisions, approval flows, and Ed25519-signed audit records are available now
the SDK already includes policy signing, execution receipts, metrics, anomaly detection, and SIEM export beyond the narrow wedge
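
The idea behind a tamper-evident JSONL audit record can be sketched with the Python standard library. Important caveat: agent-guard signs with Ed25519 (asymmetric), which the standard library does not provide, so this sketch swaps in HMAC-SHA256 (symmetric) purely as a stand-in. The record fields and function names are hypothetical, not the project's audit schema.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_record(record: dict, key: bytes) -> str:
    """Return one JSONL line: the record plus a tag over its canonical form.

    HMAC-SHA256 is a stand-in here; the real project uses Ed25519 signatures.
    """
    tag = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return json.dumps({"record": record, "sig": tag}, sort_keys=True)


def verify_line(line: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    envelope = json.loads(line)
    expected = hmac.new(key, _canonical(envelope["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])


if __name__ == "__main__":
    key = b"demo-key"  # placeholder secret for the sketch
    line = sign_record({"agent_id": "smoke-test", "tool": "bash", "decision": "deny"}, key)
    print(verify_line(line, key))                             # untouched line verifies
    print(verify_line(line.replace("deny", "execute"), key))  # edited line does not
```

With a real Ed25519 signature, verification needs only the public key, so an auditor can check receipts without being able to forge them; the HMAC stand-in does not have that property.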

What is experimental and opt-in (content layer):

credential / PII detection on outbound content — write_file content and http_request body — behind the off-by-default content feature, with three enforcement modes (block / mask / warn). See Content layer below.

What is roadmap (content layer):

detection on tool inputs (prompts) before they reach the LLM provider, not just outbound effects
HTTP method matching in policy (today the schema is URL-only; method-aware filtering goes host-side — see presets/README.md)
distribution as a Claude Code plugin / ECC marketplace entry
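
Because policy matching is URL-only today, method-aware filtering has to happen in the host before the guard is consulted. A minimal sketch of such a host-side pre-filter, assuming that POST, PUT, PATCH, and DELETE count as mutations and that localhost, 127.0.0.1, and ::1 count as local (both sets are assumptions, as is the function name):

```python
from urllib.parse import urlsplit

# Assumption: these methods are the ones treated as mutations.
MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
# Assumption: these hosts are considered local and exempt.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def needs_policy_check(method: str, url: str) -> bool:
    """Return True when the request is a non-local mutation.

    The host would call this first and only forward matching requests
    to the URL-based policy check.
    """
    host = urlsplit(url).hostname or ""
    return method.upper() in MUTATING_METHODS and host not in LOCAL_HOSTS


if __name__ == "__main__":
    print(needs_policy_check("POST", "https://api.example.com/v1/items"))
    print(needs_policy_check("GET", "https://api.example.com/v1/items"))
    print(needs_policy_check("POST", "http://localhost:3000/items"))
```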

What to understand before integrating:

raw runtime APIs expose execute | deny | ask_for_approval | handoff
adapter enforce mode is still strongest on shell-like execution paths today
Bash has the deepest validator path; read_file / write_file normalize paths and fail closed on symlink escapes; HTTP policy matching is URL-centric (see roadmap)
Python and Node bindings use the SDK's default sandbox selection in the current release; explicit backend selection is deferred until pilot demand surfaces
capability coverage is intentionally narrow, not generic
broader policy workflow and control-plane ideas are future expansion paths, not the phase-one hook
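
The "normalize paths and fail closed on symlink escapes" behavior mentioned above can be illustrated with a short Python sketch. This is not the SDK's implementation; resolve_inside and PathEscape are hypothetical names, and the approach shown (resolve symlinks, then require the result to stay under the workspace root) is one common way to get that property.

```python
import os


class PathEscape(Exception):
    """Raised when a requested path resolves outside the allowed root."""


def resolve_inside(root: str, requested: str) -> str:
    """Resolve symlinks and '..' segments, then refuse anything outside root.

    Failing closed means raising instead of falling back to the raw path.
    """
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, requested))
    if os.path.commonpath([root_real, target]) != root_real:
        raise PathEscape(f"{requested!r} resolves outside {root!r}")
    return target


if __name__ == "__main__":
    print(resolve_inside("/tmp/agent-guard-demo", "src/main.rs"))
    try:
        resolve_inside("/tmp/agent-guard-demo", "../outside.txt")
    except PathEscape as err:
        print("denied:", err)
```

Because os.path.realpath follows symlinks, a link inside the workspace that points elsewhere is caught by the same check as a literal ../ traversal.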

Content layer (experimental)

The action layer decides whether a call may leave. The content layer inspects what leaves with it. It is off by default — opt in with the content feature flag — and currently scans two surfaces: write_file content and http_request body.

Add a content block to any tool rule:

tools:
  http_request:
    mode: full_access
    content:
      mode: block          # block | mask | warn
      detect: [secrets, pii]   # optional; defaults to both

The three modes:

| Mode | Effect |
| --- | --- |
| `block` | Deny the call when sensitive content is detected (`SENSITIVE_CONTENT_BLOCKED`). |
| `mask` | Execute a redacted copy, where each finding becomes `[REDACTED:<label>]`, and emit a `ContentFinding` audit record. |
| `warn` | Execute unchanged, but emit a `ContentFinding` audit record. |

Findings only ever expose the kind of data (e.g. AWS Access Key, Email), never the raw matched substring — audit records carry labels and counts, not secrets.
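
A minimal sketch of what mask mode does, written in Python for illustration. The two regexes are simplified assumptions, not the SDK's detector set, and the mask function is a hypothetical name. Only the [REDACTED:<label>] format and the labels-and-counts shape of the findings are taken from the description above.

```python
import re
from collections import Counter

# Simplified illustrative patterns; the real detector set is broader.
PATTERNS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def mask(text: str) -> tuple[str, dict[str, int]]:
    """Replace each finding with [REDACTED:<label>] and return label counts.

    The findings carry labels and counts only, never the matched substring.
    """
    counts: Counter = Counter()
    for label, pattern in PATTERNS.items():
        def redact(_match, label=label):
            counts[label] += 1
            return f"[REDACTED:{label}]"
        text = pattern.sub(redact, text)
    return text, dict(counts)


if __name__ == "__main__":
    masked, findings = mask("key=AKIAIOSFODNN7EXAMPLE contact ops@example.com")
    print(masked)
    print(findings)
```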

Run the example:

cargo run -p agent-guard-sdk --example content_policy --features content

This is a spike-grade detector set (named patterns + entropy fallback for secrets, regex + Luhn for PII), not a compliance-grade DLP engine. Treat it as a safety net, not the primary control.
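
For readers unfamiliar with the two generic techniques named above, here is a short Python sketch of a Luhn checksum (used to cut false positives on card-like numbers) and Shannon entropy (a common fallback signal for random-looking secrets). These are textbook versions; how the SDK combines them, and any thresholds it applies, are not shown in this README.

```python
import math
from collections import Counter


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum over the digits in candidate (spaces and dashes ignored)."""
    nums = [int(c) for c in candidate if c.isdigit()]
    if len(nums) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, n in enumerate(reversed(nums)):
        if i % 2 == 1:
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def shannon_entropy(s: str) -> float:
    """Bits per character; higher values suggest random-looking strings."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
    print(shannon_entropy("aaaaaaaa"), shannon_entropy("abcd"))
```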

Fastest Paths

Outbound preset: the zero-config policy for coding-agent users — start here
Claude Code plugin: one-command install — /plugin marketplace add XuebinMa/agent-guard, then /plugin install agent-guard@agent-guard
Claude Code PreToolUse hook: wire guard-hook into your live Claude Code session manually
Node Quickstart: shortest programmatic path for a new developer
Side-Effect Wedge Demo: runnable proof of the multi-side-effect runtime
Secure Shell Tools: first integration when shell is the dominant risk
Check vs Enforce: when to keep your handler vs when to move execution into agent-guard
Framework Support Matrix: current Node / Python / Rust adoption surfaces
User Manual: install, policy basics, and SDK integration

Framework Entry Points

Claude Code: the guard-hook PreToolUse adapter is the lowest-friction entry — point one --policy flag at the outbound preset
Node: strongest programmatic surface, with wrappers for LangChain-style tools and OpenAI-style handlers
Python: wrap_langchain_tool / wrap_openai_tool are available and a real-package validation script ships; an automated CI version matrix is the remaining gap
Rust SDK: most direct integration path for hosts that want explicit control over side-effect decisioning and execution

Contributing

We welcome security research and contributions. Please see CONTRIBUTING.md for details.