Outbound change control for AI coding agents: action and content. Your agent writes code and runs tests freely; the moment it tries to push, publish, deploy, or send a secret out, agent-guard is the gate.
agent-guard is for developers running AI coding agents — Claude Code, Cursor, Codex CLI, Aider — who don't want the next git push --force to be the agent's idea, and don't want a stray .env quietly making it into the model's context.
Two layers of outbound control, one decision surface:
- Action layer (today): gate git push, npm publish, docker push, gh release create, non-local HTTP mutations, and rm -rf before they become real
- Content layer (experimental and opt-in today; tool-input scanning is on the roadmap): detect credentials and PII in tool inputs and outputs before they reach the LLM provider or an external API
- Audit layer: every decision signed with Ed25519 — tamper-evident receipts ready for EU AI Act articles 28-31 evidence
Best fit: solo and small-team devs running coding agents in real workflows. Local-first by design — no cloud, no telemetry, no data leaves your machine.
Why now: EU AI Act enforcement begins 2026-08-02. Claude Code's PreToolUse hook has known gaps with MCP tools (#33106). DNS-tunnel credential exfiltration exploits (CVE-2025-55284) are already in the wild. The cost of "the agent did something irreversible" is no longer hypothetical.
- Prerelease: v0.2.0-rc1
- Announcement: GitHub Discussions #1
If you are touching the repository itself, use the shared verification entrypoint:

    ./scripts/verify.sh full

Useful narrower paths:

    ./scripts/verify.sh rust
    ./scripts/verify.sh lint
    ./scripts/verify.sh python
    ./scripts/verify.sh node

The verification script uses temporary directories for Python build/test work so routine verification does not leave venv_* style residue in the repository root.
The fastest adoption path is the zero-config outbound preset. It covers all five action-layer categories (code egress, package release, artifact egress, remote mutation, destructive shell) with sensible defaults, so you do not have to write your first rule.

    cargo install --path crates/guard-hook # one-time install of the Claude Code adapter
    guard-hook check \
      --policy presets/coding-agent-outbound.yaml \
      --agent-id smoke-test < event.json

A real git push from the agent then surfaces as an ask decision; a git push --force is denied outright; a cargo build passes through with no friction. See presets/README.md for adoption with the Rust SDK, Node binding, or Claude Code PreToolUse hook, and for the contributing guide on new presets.
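The three outcomes described above (ask on a push, deny on a force push, pass-through for a build) can be pictured with a small matcher. This is only an illustrative sketch in Python: the rule tables and the `classify` function are hypothetical and are not how the preset or guard-hook is implemented; the real rules live in presets/coding-agent-outbound.yaml.

```python
import shlex
from enum import Enum


class Decision(Enum):
    EXECUTE = "execute"
    DENY = "deny"
    ASK = "ask_for_approval"


# Hypothetical rule tables for illustration only; the shipped preset may
# match commands differently.
DENY_RULES = [("git", "push", "--force"), ("git", "push", "-f")]
ASK_RULES = [("git", "push"), ("npm", "publish"), ("docker", "push")]


def _matches(tokens, rule):
    # A rule matches when the command starts with the rule's program and
    # subcommand, and every remaining rule word appears among the arguments.
    head, extra = rule[:2], rule[2:]
    return tuple(tokens[:2]) == head and all(flag in tokens[2:] for flag in extra)


def classify(command: str) -> Decision:
    tokens = shlex.split(command)
    # Deny rules are checked first so a force push never downgrades to "ask".
    if any(_matches(tokens, rule) for rule in DENY_RULES):
        return Decision.DENY
    if any(_matches(tokens, rule) for rule in ASK_RULES):
        return Decision.ASK
    return Decision.EXECUTE
```

Checking deny rules before ask rules is the design choice worth noting: the more specific, more dangerous form of a command wins over the general one.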
For a runnable decision preview of the preset (an agent finishing a feature, then the gate firing on git push), use the bundled demo:

    npm ci --prefix crates/agent-guard-node
    npm run build:debug --prefix crates/agent-guard-node
    npm run demo:outbound --prefix crates/agent-guard-node

If you prefer a runnable end-to-end demo of the multi-side-effect runtime, the Node side-effect wedge is also wired up:

    npm ci --prefix crates/agent-guard-node
    npm run build:debug --prefix crates/agent-guard-node
    npm run demo:wedge --prefix crates/agent-guard-node

What you should see:

    === agent-guard side-effect wedge ===
    [1] shell decision: execute
    [2] file decision: execute
    [3] http decision: execute
    [4] remote publish decision: ask_for_approval

That path is documented in Side-Effect Wedge Demo. For the fastest shell-only proof, use Three-Minute Proof.
The core runtime decision now looks like this:

    agent action (outbound moment)
      -> agent-guard
      -> execute | deny | ask_for_approval | handoff
      -> optional guard-owned execution
      -> Ed25519-signed audit record

This is the difference between:
- hoping the model behaves
- and putting an explicit gate in front of every outbound action
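To make the flow concrete, here is a minimal Python sketch of a gate that decides first, runs the action itself only on execute, and records every decision. The `Gate` and `Outcome` classes and the `decide` callback are hypothetical names for illustration and are not the agent-guard API; treating every non-execute decision as "return to the caller without running" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# The four decision values named in the flow above.
DECISIONS = ("execute", "deny", "ask_for_approval", "handoff")


@dataclass
class Outcome:
    decision: str
    output: Optional[str] = None  # set only when the gate ran the action


@dataclass
class Gate:
    decide: Callable[[str], str]  # hypothetical policy callback
    audit_log: List[dict] = field(default_factory=list)

    def handle(self, action: str, run: Callable[[str], str]) -> Outcome:
        decision = self.decide(action)
        if decision not in DECISIONS:
            decision = "deny"  # fail closed on anything unexpected
        # Guard-owned execution: the action runs here, not in the caller.
        output = run(action) if decision == "execute" else None
        # Every decision is recorded, whether or not anything ran.
        self.audit_log.append({"action": action, "decision": decision})
        return Outcome(decision, output)
```

The point of the shape is that the caller never holds an unchecked path to the side effect: it hands over the action and gets back a decision plus, at most, the result.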
Today, the runtime can already own execution for:
- shell / terminal
- file write
- outbound mutation HTTP
Together those three surfaces cover the action-layer categories the preset bundles (code egress, package release, artifact egress, remote mutation, destructive shell).
- A real outbound boundary, not prompt-only safety: git push, npm publish, docker push, rm -rf, and kubectl apply all hit a decision point before they become real.
- Zero-config preset: a copyable policy that covers the five action-layer categories on day one, with no rule-writing required.
- Small integration surface: wrap existing LangChain-style tools or OpenAI-style handlers, or hook into Claude Code's PreToolUse via guard-hook. No runtime rewrite.
- Tamper-evident audit: every decision is Ed25519-signed, JSONL-formatted, and ready to map onto EU AI Act articles 28-31 without an enterprise control plane.
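As a rough picture of what a tamper-evident JSONL record involves, the sketch below signs a canonical serialization of each decision and verifies it later. Two loud assumptions: the record fields are invented for illustration, and because Ed25519 is not in the Python standard library, HMAC-SHA256 is swapped in as a stand-in. HMAC is symmetric (the verifier needs the same secret), so it does not give the public verifiability that Ed25519 signatures do.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Sorted keys and fixed separators so the same record always
    # serializes to the same bytes before signing.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record: dict, key: bytes) -> str:
    """Return one JSONL line holding the record and its signature."""
    sig = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return json.dumps({"record": record, "sig": sig}, sort_keys=True)


def verify_line(line: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    entry = json.loads(line)
    expected = hmac.new(key, _canonical(entry["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry["sig"])
```

Any edit to a stored record changes its canonical bytes, so verification fails unless the editor also holds the signing key.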
Good fit:
- solo and small-team devs running Claude Code / Cursor / Codex CLI / Aider against real codebases
- shell-enabled coding agents that publish, push, deploy, or otherwise produce outbound effects
- teams that want a tamper-evident audit trail before the EU AI Act enforcement deadline
Not a fit:
- chat-only assistants with no tool execution
- teams looking for a full orchestration framework
- teams expecting a finished enterprise control plane on day one
agent-guard controls the outbound side effect on each tool call. It deliberately does not govern the surrounding autonomous loop — budget caps, verifier gates, retry admission, and JSONL run records are a different failure mode (a 47-retry overnight bill vs. a single rogue git push).
For that layer, see MartinLoop: it wraps autonomous coding agents with budgets, verifier gates, and run records. The two layers compose — MartinLoop decides whether the next attempt is admitted; agent-guard decides whether the side effects inside that attempt are allowed to leave.
What is strong today (action layer):
- the five outbound action categories — code egress, package release, artifact egress, remote mutation, destructive shell — are covered by a zero-config preset
- shell / terminal, file write, and outbound mutation HTTP are the underlying runtime proof surfaces
- normalized runtime decisions, approval flows, and Ed25519-signed audit records are available now
- the SDK already includes policy signing, execution receipts, metrics, anomaly detection, and SIEM export beyond the narrow wedge
What is experimental and opt-in (content layer):
- credential / PII detection on outbound content (write_file content and http_request body) behind the off-by-default content feature, with three enforcement modes (block / mask / warn). See Content layer below.
What is roadmap (content layer):
- detection on tool inputs (prompts) before they reach the LLM provider, not just outbound effects
- HTTP method matching in policy (today the schema is URL-only; method-aware filtering goes host-side — see presets/README.md)
- distribution as a Claude Code plugin / ECC marketplace entry
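Because policy matching is URL-only today, method-aware filtering has to happen in the host before the policy is consulted. A minimal Python sketch of such a host-side pre-filter follows; the function name, the set of mutating methods, and the list of local hosts are all assumptions for illustration, not part of agent-guard.

```python
from urllib.parse import urlsplit

# Assumed definitions for this sketch: which methods count as mutations
# and which hosts count as local.
MUTATING_METHODS = {"POST", "PUT", "PATCH", "DELETE"}
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def needs_policy_check(method: str, url: str) -> bool:
    """True when a request is a non-local mutation the host should gate."""
    host = urlsplit(url).hostname or ""
    return method.upper() in MUTATING_METHODS and host not in LOCAL_HOSTS
```

A host could call this before each outbound request and only route the URL through the policy when it returns True.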
What to understand before integrating:
- raw runtime APIs expose execute | deny | ask_for_approval | handoff
- adapter enforce is still strongest on shell-like execution paths today
- Bash has the deepest validator path; read_file / write_file normalize paths and fail closed on symlink escapes; HTTP policy matching is URL-centric (see roadmap)
- Python and Node bindings use the SDK's default sandbox selection in the current release; explicit backend selection is deferred until pilot demand surfaces
- capability coverage is intentionally narrow, not generic
- broader policy workflow and control-plane ideas are future expansion paths, not the phase-one hook
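The "normalize paths and fail closed on symlink escapes" behaviour mentioned above can be sketched in Python as follows. This shows the general technique only, under the assumption of a single workspace root; `resolve_inside` is a hypothetical helper, not the SDK's validator.

```python
import os


def resolve_inside(root: str, requested: str) -> str:
    """Resolve `requested` under `root`, refusing anything that escapes it."""
    root_real = os.path.realpath(root)
    # realpath collapses ".." segments and follows symlinks, so the check
    # below sees where the path actually lands on disk.
    target = os.path.realpath(os.path.join(root_real, requested))
    try:
        inside = os.path.commonpath([root_real, target]) == root_real
    except ValueError:
        # Raised for paths that cannot be compared (e.g. different drives).
        inside = False
    if not inside:
        # Fail closed: raise rather than fall back to the unresolved path.
        raise PermissionError(f"path escapes workspace: {requested!r}")
    return target
```

Resolving before comparing is what catches the symlink case: a path like link/secret.txt looks like it is inside the workspace until the link is followed.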
The action layer decides whether a call may leave. The content layer inspects
what leaves with it. It is off by default — opt in with the content
feature flag — and currently scans two surfaces: write_file content and
http_request body.
Add a content block to any tool rule:

    tools:
      http_request:
        mode: full_access
        content:
          mode: block             # block | mask | warn
          detect: [secrets, pii]  # optional; defaults to both

The three modes:

| Mode | Effect |
|---|---|
| block | Deny the call when sensitive content is detected (SENSITIVE_CONTENT_BLOCKED). |
| mask | Execute a redacted copy, in which each finding becomes [REDACTED:<label>], and emit a ContentFinding audit record. |
| warn | Execute unchanged, but emit a ContentFinding audit record. |
Findings only ever expose the kind of data (e.g. AWS Access Key, Email),
never the raw matched substring — audit records carry labels and counts, not secrets.
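The difference between the three modes, and the labels-and-counts shape of a finding, can be illustrated with a short Python sketch. The two regular expressions and the `apply_content_policy` function are hypothetical stand-ins chosen for the example; they are not the detectors or the API that ship with the content feature.

```python
import re
from collections import Counter

# Hypothetical detectors; the labels mirror the examples in the text.
PATTERNS = {
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def apply_content_policy(body: str, mode: str):
    """Return (decision, body_to_send, findings) for one outbound body."""
    if mode not in ("block", "mask", "warn"):
        raise ValueError(f"unknown mode: {mode}")
    findings = Counter()
    redacted = body
    for label, pattern in PATTERNS.items():
        redacted, count = pattern.subn(f"[REDACTED:{label}]", redacted)
        if count:
            findings[label] = count  # label and count only, never the match
    if not findings:
        return "execute", body, {}
    if mode == "block":
        return "deny", None, dict(findings)
    if mode == "mask":
        return "execute", redacted, dict(findings)
    return "execute", body, dict(findings)  # warn: send unchanged, keep findings
```

Note that the findings dictionary is built from labels and match counts alone, so it can be written to an audit log without copying the secret into it.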
Run the example:

    cargo run -p agent-guard-sdk --example content_policy --features content

This is a spike-grade detector set (named patterns + entropy fallback for secrets, regex + Luhn for PII), not a compliance-grade DLP engine. Treat it as a safety net, not the primary control.
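For readers unfamiliar with the two generic techniques named above, here is a small Python sketch of a Luhn checksum (used to separate plausible card numbers from arbitrary digit runs) and Shannon entropy (a common fallback signal for random-looking secrets). These are textbook versions, not the project's detectors; the 13-digit minimum is an assumption for the example.

```python
import math
from collections import Counter


def luhn_valid(number: str) -> bool:
    """Luhn checksum over the digits of `number`, ignoring separators."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 13:  # assumed lower bound for card-like numbers
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def shannon_entropy(text: str) -> float:
    """Bits per character; higher values suggest random-looking tokens."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())
```

A detector built on entropy needs a threshold and a minimum token length to be useful, and both are tuning choices this sketch leaves open.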
- Outbound preset: the zero-config policy for coding-agent users; start here
- Claude Code plugin: one-command install with /plugin marketplace add XuebinMa/agent-guard, then /plugin install agent-guard@agent-guard
- Claude Code PreToolUse hook: wire guard-hook into your live Claude Code session manually
- Node Quickstart: shortest programmatic path for a new developer
- Side-Effect Wedge Demo: runnable proof of the multi-side-effect runtime
- Secure Shell Tools: first integration when shell is the dominant risk
- Check vs Enforce: when to keep your handler vs when to move execution into agent-guard
- Framework Support Matrix: current Node / Python / Rust adoption surfaces
- User Manual: install, policy basics, and SDK integration
Additional references:
- Claude Code: the guard-hook PreToolUse adapter is the lowest-friction entry; point one --policy flag at the outbound preset
- Node: strongest programmatic surface, with wrappers for LangChain-style tools and OpenAI-style handlers
- Python: wrap_langchain_tool / wrap_openai_tool are available and a real-package validation script ships; an automated CI version matrix is the remaining gap
- Rust SDK: most direct integration path for hosts that want explicit control over side-effect decisioning and execution
We welcome security research and contributions. Please see CONTRIBUTING.md for details.
Copyright © 2026 agent-guard team. Distributed under the MIT License.