A framework for governing AI coding agents through infrastructure instead of instructions.
The common assumption is that Claude's problem is memory. It forgets facts, decisions, and other nuances between sessions, so the go-to "solution" is to make it remember more by applying bandaids like session logs or plans in endless markdown files, classic databases, spawning dozens of subagents to argue it out with little context, and attempting marathon sessions to prove you've avoided compaction; that last one is actually nightmare fuel, not an accomplishment. I'll take "context rot" for $500, Alex.
The uncomfortable truth is that Claude makes bad decisions even when it remembers everything.
Smells like teen spirit. The analogy that comes to mind more often than not, for Claude, is that developers feel like "babysitters." If you haven't felt this way, then you should get your hands in the code more often. But, to be fair to Claude, it's more like a (usually) well-behaved teenager. Plenty of potential, and eager to be helpful, but they start the task before reading the instructions, are easily distracted, can have poor follow-through, and think guessing is the same thing as knowing. You can tell it to "be careful" all day. It won't help. What helps is a system that makes "careful" the default. The guardrails engage before the mistake, not rules that scold afterwards. But it's also balanced enough to encourage the right behavior without feeling like more chores.
Memory plugins ask: "How do we help Claude remember?" PACT asks: "How do we help Claude grow up?"
If you've ever witnessed Claude editing a file it hasn't read, fix one layer and break three downstream, guess at a package API instead of checking docs, deeclare "done" while half the work is missing, or worse, then install PACT. Full context doesn't fix any of these. They're reasoning failures, not recall failures.
PACT is a modular governance framework for AI coding agents (Claude Code, Cursor, Copilot Workspace, etc.). Use all of it or just the parts that fill gaps in your existing setup — if you already have a memory layer, a task manager, or a workflow orchestrator, PACT detects that and only scaffolds what's missing.
PACT was built by a solo developer for solo developers, then designed to scale up. The smaller your team, the more you need it. There's no code reviewer to notice the stale cache, no teammate to check the dependency chain, no QA to catch the UI regression. Besides scaling, smaller projects can also refer inherit knowledge from a larger one through delegation* or a shared stack-level instance. PACT's subsystems activate on the patterns they detect and based on what you explictly ask for.
PACT has 6 major features:
- Mechanical Enforcement — Shell hooks that block violations before they land.
- Required Checkpoints — Output-level reasoning gates that force visible, structured analysis before acting, which are immune to cognitive load.
- Cognitive Redirections — Questions the agent asks itself at decision points. Easily the most novel feature, and their effectiveness will surprise you.
- Context Replacement — Architecture maps and lifecycle flows that replace memory.
- Compound Intelligence — Research synthesis, cross-system knowledge directory, and capability baseline that make each session smarter than the last.
- Vector Memory — Semantic search across bugs, solutions, research, and task feedback using local embeddings (no API keys, no cloud). YAML stays authoritative; vector search finds the right file faster.
There are also many minor features:
- Multi-Agent Resilience — When Claude is down, switch to Gemini (or vice versa) with zero context loss. An auto-detection banner fires on Claude degradation with exact instructions to continue working.
- Multi-Model Delegation — Claude Opus costs $15/M input and $75/M output tokens. That's the right price for architecture and security decisions, but overkill for reading a changelog or generating boilerplate tests.
pact-delegateroutes those tasks to specialized worker models: research to Arcee Trinity ($0.90/M output — 98% cheaper) and boilerplate code to MiniMax M2.5 ($0.99/M output — 99% cheaper). Claude orchestrates, reviews, and commits. Workers execute. Hooks verify all output regardless of which model wrote it. When Claude is unavailable, Gemini (free tier available) takes over as orchestrator and manages the same workers. Swap models by editing one line inmodel_roster.yaml. - Observability & Feedback — Real-time dashboard that visualizes agent activity, captures user prompts, tracks tasks, and feeds user ratings back into future sessions.
- Distributed Cognition — Auto-dispatched subagents for dependency tracing, knowledge research, and pre-commit review so the main session stays focused on the user's task.
- Cutting Room Floor — Complex visuals (heat maps, animations, shaders, charts) can be prototyped outside the app framework before committing.
- Project Philosophy — Define what your product believes — core principles, decision filters, and anti-patterns that govern every product decision across sessions. The counterpart to the aesthetic skill: aesthetics govern how things look, philosophy governs why things exist.
- Project Scale & Delegation — Three tiers (Seed, Growth, Full) so small projects get governance without overhead, plus delegation so projects can inherit knowledge from a parent PACT instance — either a specific larger project (satellite) or a shared technology stack (stack).
- Progress Tracking — Three-layer system that ensures agents leave breadcrumbs during long operations. A staleness hook warns after 30+ edits or 20+ minutes without a PENDING_WORK update; a
progress_updatecheckpoint forces structured state documentation at milestones; a cognitive redirection asks "am I leaving breadcrumbs?" during multi-step work. Prevents the universal failure mode where the next session opens a stale task tracker and starts from scratch. ... and more.
Every recommendation in Anthropic's Claude Prompting Best Practices guide is addressed by PACT — clear instructions, context with rationale, structured XML, role-setting, examples, anti-hallucination guards, investigation-before-answering mandates, state management, subagent orchestration, autonomy/safety balancing, and anti-overengineering. PACT was built from production failures before these were published as best practices.
Every rule falls into one of two categories:
Hook-enforceable — Can be detected by text pattern matching. These become PreToolUse hooks that mechanically block violations. Examples:
- Forbidden imports (
import hive,import firebase) - Banned functions (
print(),debugPrint()) - Hardcoded secrets (
api_key = "sk-...") - Raw SQL with string interpolation
- Editing a file that hasn't been read this session
- Force-pushing to main/master
- Committing when local is behind remote
Self-enforceable — Requires semantic understanding. These stay as cognitive redirections in the instructions file. Examples:
- "Fresh-read the entity before saving" (stale data bug)
- "Update both the list AND the map cache in provider methods"
- "Trace 3 hops in both directions before editing"
- "Check what your changes made stale before declaring done"
- "Walk through the user journey after building UI"
- "Research before writing workarounds"
The rule: If a violation can be detected by grep, it's a hook. If it requires understanding, it's a checkpoint or a redirection.
PACT uses three layers of enforcement, from strongest to lightest:
| Layer | Mechanism | Can be skipped? | Use for |
|---|---|---|---|
| Hooks | Shell scripts that block tool calls | No — mechanical | Pattern-matchable violations (secrets, forbidden imports, git safety) |
| Checkpoints | Output-level <checkpoint> blocks the agent must produce |
Hard to skip — format requirement | Reasoning that historically fails under load (bug analysis, solution comparison, dependency tracing) |
| Redirections | Questions the agent asks itself | Yes — guidance only | Lighter decisions where a prompt is sufficient |
Checkpoints solve the core failure mode of cognitive redirections: rules encoded as prose get skipped when the agent is under cognitive pressure. A checkpoint is a structured block the agent must output before acting. It's visible to the user, verifiable, and much harder to skip than an internal question.
Eight checkpoint types:
-
bug_fix— Triggers when the user reports something broken. Forces the agent to trace the causal chain from symptom to root cause and create a bug tracker file before writing any fix. -
solution_compare— Triggers when the agent considers 2+ approaches. Forces a side-by-side comparison with research sources named. Prevents the "spiral" pattern of trying approaches sequentially without structured evaluation. -
package_verify— Triggers when calling a package API. Forces the agent to cite where it verified the API (docs, knowledge file, WebSearch) instead of guessing from training data. -
dependency_trace— Triggers when editing a file with downstream dependents. Forces the agent to trace 3 hops in both directions using the architecture map before making changes. -
done_check— Triggers when declaring a task complete. Forces the agent to re-read the user's exact request and list stale artifacts. -
ui_work— Triggers before building or modifying a UI element. Forces the agent to audit existing reusable widgets, read reference screens for design guidance, and declare which pattern it's following. Prevents bespoke UI that drifts from the app's visual language. -
delegation_check— Triggers before starting web research, doc reading, boilerplate code generation, test scaffolding, or content classification. Forces the agent to run the delegation decision tree and justify whether to delegate to a worker model or keep the task. Makes the cost trade-off visible: "This is a $75/M task that a $0.90/M model handles." -
progress_update— Triggers when a logical unit of work completes during a multi-step operation (agent returns, batch processed, phase finished). Forces the agent to document what just completed, the current state with concrete counts, and whether PENDING_WORK.yaml was updated. Prevents the universal failure mode where an agent works for hours without leaving breadcrumbs, and the next session starts from scratch.
Research basis: Claude API docs on extended thinking confirm that system prompts don't reach into internal thinking blocks. Output-level format requirements are the proven mechanism for structured reasoning — they're visible, verifiable, and survive cognitive load. (Extended thinking docs, Prompt engineering best practices)
Cognitive redirections are the lighter layer — questions the agent asks itself at decision points. They work well for routine decisions but historically fail under cognitive pressure (which is why the six patterns above were upgraded to checkpoints).
- When about to edit any file:
"What depends on this, and what does this depend on?"
- When declaring a task done:
"What did my changes just make stale?"
- When a package doesn't behave as expected:
"Do I actually know this package, or am I guessing?"
- Before starting any UI work:
"What already exists that I should reuse or reference?"
- After finishing any UI build:
"Am I the user right now?"
The agent has autonomy to add new redirections — and to promote a redirection to a checkpoint when it notices the redirection being skipped under load. Future sessions inherit this awareness.
| Architecture Map | Lifecycle Flow | |
|---|---|---|
| Answers | "What files are involved?" | "What's the safe order of operations?" |
| Contains | Tables, services, state, screens, caches, cascades | States, ordering, assumptions, memory model |
| Nature | Static structure (spatial) | Dynamic behavior (temporal) |
| Analogy | Circuit diagram | Timing diagram |
If you're writing a table name in a flow doc → stop, that belongs in the map. If you're writing "this must happen before that" in the map → stop, that belongs in a flow.
A fresh AI session has training data and a context window. A session running PACT has training data + context window + every synthesis every previous session earned. That's compound intelligence — it grows with every session.
Three systems make this work:
Research Knowledge Base (docs/reference/research/) — When the agent researches something non-trivial (combining project code analysis with online docs/papers/APIs), the synthesis — the insight that neither source had alone — is saved as a structured YAML file. Future sessions find these via tags, build on them, and evolve them through four actions: deepen, reframe, update, supersede.
Knowledge Directory (docs/reference/KNOWLEDGE_DIRECTORY.yaml) — A single-file tag index across ALL knowledge systems (research, bugs, solutions, packages, feature flows). One read shows every file that touches a topic without opening them individually. Hook-enforced: commits that include knowledge files without updating the directory are blocked.
Capability Baseline (docs/reference/PACT_BASELINE.yaml) — PACT's self-awareness layer. Records what the agent can do natively, what PACT compensates for, and how capabilities change over time. When the agent provider ships a new feature that makes a PACT rule redundant, this file is how the agent notices. When a new capability makes PACT stronger, this file is how the agent leans into it.
PACT includes a real-time dashboard that visualizes everything your AI agent does — every file edit, preflight check, hook block, commit, and governance update appears as a card in a horizontal timeline. Your own prompts show up too, so you can see the full conversation flow alongside the agent's actions.
Task Rating System — Click "Track Next Task" on any session, describe what you're asking the agent to do, and all subsequent events flow into that task's sub-row. When the task is done, rate it 1-5 with category tags (UI, Backend, Logic, Missed Requirements, Hallucination, etc.) and free-text feedback on what went right and wrong. Ratings are stored permanently and compiled into a scorecard (~/.claude/pact-scorecard.md) that the agent reads at the start of every session — creating a feedback loop where past ratings directly shape future behavior.
What you see:
- Session lanes with model identity (Claude/Gemini) and project name
- Task sub-rows with collapse/expand, diagnosis, and rating
- Per-type animated icons (the pencil writes while editing, the lightning strikes on preflight checks)
- Activity timeline, sidebar metrics, scorecard with rolling average and streak tracking
- "Track From Here" on any prompt card to retroactively mark where a new task started
- Diagnosis per session or per task — coverage analysis of which PACT subsystems were exercised
Start the dashboard:
python .claude/hooks/pact-server.py &
# Opens at http://127.0.0.1:7246Or set "dashboard": "auto" in ~/.claude/pact-config.json to start it automatically every session.
These are built-in Claude Code slash commands — part of the plugin system. Run them directly in your Claude Code session:
/plugin marketplace add jonathanmr22/pact
/plugin install pact@pactThis gives you:
- 14 hooks — automatically active (read-before-write, secrets blocker, git safety, multi-session coordination, edit warnings, PreFlight architectural checks, feature flow protection, issue tracker gate, knowledge directory pairing, progress breadcrumb staleness, session tracking, timestamps, status page health check, prompt capture, PACT event logging)
- 3 subagents — auto-dispatched for dependency tracing, research, and pre-commit review
- 5 slash commands —
/pact-init,/pact-check,/pact-flow,/pact-bug,/pact-recall - Live dashboard — real-time visualization of agent activity, task tracking, and rating system
Then run /pact-init in your project to scaffold the governance files. The init process will:
- Assess project scale — recommend Seed, Growth, or Full tier based on project complexity
- Check for delegation — ask if this project shares infrastructure with a parent project or technology stack
- Detect existing tools — present an overlap table so you make informed decisions
- Scaffold — create only the files appropriate for your tier and delegation setup
Install the plugin fully. Do not ask the user to pick subsystems before init. The /pact-init skill handles tier selection, delegation, and overlap detection — that's where informed decisions happen, not at plugin install time.
After installation, uncomment rules in the hook scripts for your project's specific patterns (forbidden imports, banned functions, critical file patterns, etc.). The hooks live in your Claude Code plugin cache — edit them directly or override with project-level hooks.
If you're not using Claude Code, copy the templates manually:
your-project/
├── CLAUDE.md # ← from templates/instructions.md
├── SYSTEM_MAP.yaml # ← from templates/architecture_map.yaml
├── cutting_room/ # ← from templates/cutting_room/
│ ├── _INDEX.yaml
│ └── _TRIAL_TEMPLATE.yaml
├── .claude/
│ ├── hooks/
│ │ ├── pre-edit-rules.sh
│ │ ├── pre-bash-guard.sh
│ │ ├── pre-edit-feature-flow.sh
│ │ ├── post-edit-warnings.sh
│ │ ├── post-read-tracker.sh
│ │ ├── post-edit-progress-check.sh
│ │ ├── post-edit-timestamp.sh
│ │ ├── post-sentry-bug-reminder.sh
│ │ └── session-register.sh
│ ├── bugs/
│ │ ├── _INDEX.yaml
│ │ └── _SOLUTIONS.yaml
│ ├── sessions.yaml # (auto-maintained by hooks)
│ └── memory/
│ ├── PENDING_WORK.yaml
│ └── file_edit_log.yaml # (auto-populated by hooks)
├── docs/
│ ├── feature_flows/ # ← lifecycle flow docs
│ ├── plans/ # ← implementation plans
│ └── reference/
│ ├── packages/ # ← per-package knowledge files
│ ├── research/ # ← cross-session research synthesis
│ │ └── _RESEARCH.yaml
│ ├── KNOWLEDGE_DIRECTORY.yaml # ← cross-system tag index
│ └── PACT_BASELINE.yaml # ← agent capability baseline
Configure hooks in .claude/settings.local.json (or your agent's equivalent):
{
"hooks": {
"PreToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/pre-edit-rules.sh" },
{ "type": "command", "command": "bash .claude/hooks/pre-edit-feature-flow.sh" }
]
},
{
"matcher": "Bash",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/pre-bash-guard.sh" }
]
}
],
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/post-edit-warnings.sh" },
{ "type": "command", "command": "bash .claude/hooks/post-edit-timestamp.sh" }
]
},
{
"matcher": "Read",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/post-read-tracker.sh" }
]
},
{
"matcher": "mcp__sentry",
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/post-sentry-bug-reminder.sh" }
]
}
],
"SessionStart": [
{
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/session-register.sh" }
]
},
{
"hooks": [
{ "type": "command", "command": "bash .claude/hooks/session-status-check.sh", "timeout": 8000 }
]
}
]
}
}3 PreToolUse hooks that block violations (secrets, git safety, critical files without flow docs). 8 PostToolUse hooks that warn or log (file size, workaround language, progress staleness, edit timestamps, read tracking, preflight architectural checks). 3 SessionStart hooks that coordinate (session registration, status page monitoring, dashboard startup).
Full hook reference with patterns and thresholds →
Output-level <checkpoint> blocks the agent must produce before acting. Bug fix tracing, solution comparison, package verification, dependency tracing, completion check, UI audit, delegation justification, progress documentation. Visible to the user, verifiable, resistant to cognitive load.
Checkpoint types and format reference →
Questions the agent asks itself at decision points. The lightest enforcement layer — guidance, not gates. "What depends on this?" "Do I actually know this package?" "Can I delegate this?" The agent can add new redirections and promote them to checkpoints when they're consistently skipped.
Claude Opus ($75/M output) orchestrates and reviews. Trinity ($0.90/M — 98% cheaper) handles research. M2.5 ($0.99/M — 99% cheaper) generates boilerplate code. Gemini (free tier) takes over as orchestrator when Claude is down. pact-delegate CLI routes tasks, logs token usage, and tracks quality. Every delegation is visible in the terminal and logged with actual costs.
Delegation guide with CLI reference →
Live visualization of every file edit, hook block, preflight check, and commit as a horizontal timeline. Task rating system (1-5) feeds a scorecard the agent reads at session start. Diagnosis panels show which PACT subsystems were exercised.
Dashboard setup and features →
Local embeddings (no API keys) index bugs, solutions, research, and feedback. Query with natural language instead of grepping filenames. /pact-recall slash command for inline search. Auto-indexed when knowledge files are created.
Research synthesis, knowledge directory, capability baseline, package knowledge, bug tracking with reusable solutions. Every session's work enriches future sessions. No knowledge dies when a context window closes.
Compound intelligence deep dive →
| Agent | What It Does |
|---|---|
pact-tracer |
Traces dependency chains before edits — returns impact report |
pact-researcher |
Checks existing knowledge, researches if needed, saves synthesis |
pact-reviewer |
Pre-commit governance review — staleness audit, dependency check |
| Command | What It Does |
|---|---|
/pact-init |
Scaffold governance files (tier-aware, overlap-detecting) |
/pact-check |
Run cognitive redirections against current changes |
/pact-flow |
Generate a lifecycle flow document |
/pact-bug |
Create/update structured bug investigation files |
/pact-recall |
Semantic search across PACT vector memory |
Claude + Gemini share the same hooks, same rules, same task tracker. When one is down, switch to the other with zero context loss. Adapter scripts translate between hook formats. Auto-detection banner when Claude is degraded.
Multi-agent setup guide → | Comparison with other plugins → | Real-world examples →
PACT generates some files that should be committed (hooks, architecture maps, knowledge files) and some that shouldn't (runtime logs, event streams, PID files). See templates/pact-gitignore for the recommended exclusions, or add these to your project's .gitignore:
# PACT runtime (auto-generated, session-specific)
.claude/sessions.yaml
.claude/pact-server.pid
.claude/pact-events.jsonl
.claude/memory/file_edit_log.yaml
# Session worktrees (if worktree isolation is enabled)
.worktrees/- Identify your top 3 recurring agent patterns to improve
- Create
pre-edit-rules.shwith those 3 patterns - Create
post-read-tracker.sh(read-before-write enforcement) - Create
pre-bash-guard.sh(git safety + multi-session coordination + knowledge directory pairing) - Create
silent-linter.shfor your project's analyzer - Write
SYSTEM_MAP.yamlfor your most-changed features - Write cognitive redirections from your actual experience
- Create
PENDING_WORK.yamlfor cross-session continuity - Add session start protocol to instructions file
- Create
docs/reference/packages/for package knowledge - Create
docs/reference/research/_RESEARCH.yamlfor cross-session research synthesis - Create
docs/reference/KNOWLEDGE_DIRECTORY.yamlas the cross-system tag index - Create
docs/reference/PACT_BASELINE.yamlwith your agent's current capabilities - Create
docs/feature_flows/for lifecycle flows of critical systems - Write your first feature flow for your highest-risk system
- Create
cutting_room/for visual prototyping - Create
.claude/bugs/with_INDEX.yamland_SOLUTIONS.yaml - Set up
session-register.shfor multi-session awareness - Optional: Enable worktree isolation — set
"worktree_isolation": truein~/.claude/pact-config.json. Each session gets its own git branch; merges to main require user approval. Recommended if agents commit too eagerly or you run parallel sessions. Add.worktrees/to.gitignore. - Set up the PACT dashboard (
pact-server.py,pact-dashboard.html,pact-event-logger.sh) - Configure dashboard startup preference in
~/.claude/pact-config.json(ask/auto/off) - Add
pact-prompt-logger.shtoUserPromptSubmithooks for prompt capture - Copy PACT subagents to
.claude/agents/(pact-tracer, pact-researcher, pact-reviewer) - Add Subagent Delegation section to your instructions file
- Run
pact-migrate.pyto build the vector search index from existing knowledge files - Add
.pact-gitignoreentries to your.gitignore
PACT collects anonymous feedback at two milestones — Day 2 and Week 2 of use. Your Claude will ask if you'd like to generate a report. The report captures:
- Which PACT subsystems you used vs ignored
- Task rating averages and common issue categories
- What helped (from you and Claude's perspective)
- What caused friction
- Workarounds Claude had to invent — these are the most valuable signal because they show exactly where PACT has gaps that should become hooks, checks, or templates
- Hooks that blocked legitimate work (false positives)
The report is generated locally at ~/.claude/pact-feedback-report.yaml. It contains no identifying information — no project names, no file paths, no usernames, no code. Individual frameworks can be mentioned if relevant to the feedback, but your full stack combination is never included. Only aggregate PACT usage counts and generic descriptions of what helped or didn't. Nothing is sent anywhere unless you explicitly choose to share it after reviewing every line. To submit:
PACT is released into the public domain. Use it however you want.
