PACT — Programmatic Agent Constraint Toolkit

A framework for governing AI coding agents through infrastructure instead of instructions.

The common assumption is that Claude's problem is memory. It forgets facts, decisions, and other nuances between sessions, so the go-to "solution" is to make it remember more by applying bandaids like session logs or plans in endless markdown files, classic databases, spawning dozens of subagents to argue it out with little context, and attempting marathon sessions to prove you've avoided compaction; that last one is actually nightmare fuel, not an accomplishment. I'll take "context rot" for $500, Alex.

The uncomfortable truth is that Claude makes bad decisions even when it remembers everything.

Smells like teen spirit. The analogy that comes to mind more often than not, for Claude, is that developers feel like "babysitters." If you haven't felt this way, then you should get your hands in the code more often. But, to be fair to Claude, it's more like a (usually) well-behaved teenager. Plenty of potential, and eager to be helpful, but they start the task before reading the instructions, are easily distracted, can have poor follow-through, and think guessing is the same thing as knowing. You can tell it to "be careful" all day. It won't help. What helps is a system that makes "careful" the default. The guardrails engage before the mistake, not rules that scold afterwards. But it's also balanced enough to encourage the right behavior without feeling like more chores.

Memory plugins ask: "How do we help Claude remember?" PACT asks: "How do we help Claude grow up?"

If you've ever witnessed Claude editing a file it hasn't read, fix one layer and break three downstream, guess at a package API instead of checking docs, deeclare "done" while half the work is missing, or worse, then install PACT. Full context doesn't fix any of these. They're reasoning failures, not recall failures.

What is PACT, & Who Is It For?

PACT is a modular governance framework for AI coding agents (Claude Code, Cursor, Copilot Workspace, etc.). Use all of it or just the parts that fill gaps in your existing setup — if you already have a memory layer, a task manager, or a workflow orchestrator, PACT detects that and only scaffolds what's missing.

PACT was built by a solo developer for solo developers, then designed to scale up. The smaller your team, the more you need it. There's no code reviewer to notice the stale cache, no teammate to check the dependency chain, no QA to catch the UI regression. Besides scaling, smaller projects can also refer inherit knowledge from a larger one through delegation* or a shared stack-level instance. PACT's subsystems activate on the patterns they detect and based on what you explictly ask for.

PACT has 6 major features:

Mechanical Enforcement — Shell hooks that block violations before they land.
Required Checkpoints — Output-level reasoning gates that force visible, structured analysis before acting, which are immune to cognitive load.
Cognitive Redirections — Questions the agent asks itself at decision points. Easily the most novel feature, and their effectiveness will surprise you.
Context Replacement — Architecture maps and lifecycle flows that replace memory.
Compound Intelligence — Research synthesis, cross-system knowledge directory, and capability baseline that make each session smarter than the last.
Vector Memory — Semantic search across bugs, solutions, research, and task feedback using local embeddings (no API keys, no cloud). YAML stays authoritative; vector search finds the right file faster.

There are also many minor features:

Multi-Agent Resilience — When Claude is down, switch to Gemini (or vice versa) with zero context loss. An auto-detection banner fires on Claude degradation with exact instructions to continue working.
Multi-Model Delegation — Claude Opus costs $15/M input and $75/M output tokens. That's the right price for architecture and security decisions, but overkill for reading a changelog or generating boilerplate tests. pact-delegate routes those tasks to specialized worker models: research to Arcee Trinity ($0.90/M output — 98% cheaper) and boilerplate code to MiniMax M2.5 ($0.99/M output — 99% cheaper). Claude orchestrates, reviews, and commits. Workers execute. Hooks verify all output regardless of which model wrote it. When Claude is unavailable, Gemini (free tier available) takes over as orchestrator and manages the same workers. Swap models by editing one line in model_roster.yaml.
Observability & Feedback — Real-time dashboard that visualizes agent activity, captures user prompts, tracks tasks, and feeds user ratings back into future sessions.
Distributed Cognition — Auto-dispatched subagents for dependency tracing, knowledge research, and pre-commit review so the main session stays focused on the user's task.
Cutting Room Floor — Complex visuals (heat maps, animations, shaders, charts) can be prototyped outside the app framework before committing.
Project Philosophy — Define what your product believes — core principles, decision filters, and anti-patterns that govern every product decision across sessions. The counterpart to the aesthetic skill: aesthetics govern how things look, philosophy governs why things exist.
Project Scale & Delegation — Three tiers (Seed, Growth, Full) so small projects get governance without overhead, plus delegation so projects can inherit knowledge from a parent PACT instance — either a specific larger project (satellite) or a shared technology stack (stack).
Progress Tracking — Three-layer system that ensures agents leave breadcrumbs during long operations. A staleness hook warns after 30+ edits or 20+ minutes without a PENDING_WORK update; a progress_update checkpoint forces structured state documentation at milestones; a cognitive redirection asks "am I leaving breadcrumbs?" during multi-step work. Prevents the universal failure mode where the next session opens a stale task tracker and starts from scratch. ... and more.

Every recommendation in Anthropic's Claude Prompting Best Practices guide is addressed by PACT — clear instructions, context with rationale, structured XML, role-setting, examples, anti-hallucination guards, investigation-before-answering mandates, state management, subagent orchestration, autonomy/safety balancing, and anti-overengineering. PACT was built from production failures before these were published as best practices.

Core Concepts

Hook-Enforceable vs Self-Enforceable Rules

Every rule falls into one of two categories:

Hook-enforceable — Can be detected by text pattern matching. These become PreToolUse hooks that mechanically block violations. Examples:

Forbidden imports (import hive, import firebase)
Banned functions (print(), debugPrint())
Hardcoded secrets (api_key = "sk-...")
Raw SQL with string interpolation
Editing a file that hasn't been read this session
Force-pushing to main/master
Committing when local is behind remote

Self-enforceable — Requires semantic understanding. These stay as cognitive redirections in the instructions file. Examples:

"Fresh-read the entity before saving" (stale data bug)
"Update both the list AND the map cache in provider methods"
"Trace 3 hops in both directions before editing"
"Check what your changes made stale before declaring done"
"Walk through the user journey after building UI"
"Research before writing workarounds"

The rule: If a violation can be detected by grep, it's a hook. If it requires understanding, it's a checkpoint or a redirection.

Three Enforcement Layers

PACT uses three layers of enforcement, from strongest to lightest:

Layer	Mechanism	Can be skipped?	Use for
Hooks	Shell scripts that block tool calls	No — mechanical	Pattern-matchable violations (secrets, forbidden imports, git safety)
Checkpoints	Output-level `<checkpoint>` blocks the agent must produce	Hard to skip — format requirement	Reasoning that historically fails under load (bug analysis, solution comparison, dependency tracing)
Redirections	Questions the agent asks itself	Yes — guidance only	Lighter decisions where a prompt is sufficient

Required Checkpoints (New in 0.9.0)

Checkpoints solve the core failure mode of cognitive redirections: rules encoded as prose get skipped when the agent is under cognitive pressure. A checkpoint is a structured block the agent must output before acting. It's visible to the user, verifiable, and much harder to skip than an internal question.

Eight checkpoint types:

bug_fix — Triggers when the user reports something broken. Forces the agent to trace the causal chain from symptom to root cause and create a bug tracker file before writing any fix.
solution_compare — Triggers when the agent considers 2+ approaches. Forces a side-by-side comparison with research sources named. Prevents the "spiral" pattern of trying approaches sequentially without structured evaluation.
package_verify — Triggers when calling a package API. Forces the agent to cite where it verified the API (docs, knowledge file, WebSearch) instead of guessing from training data.
dependency_trace — Triggers when editing a file with downstream dependents. Forces the agent to trace 3 hops in both directions using the architecture map before making changes.
done_check — Triggers when declaring a task complete. Forces the agent to re-read the user's exact request and list stale artifacts.
ui_work — Triggers before building or modifying a UI element. Forces the agent to audit existing reusable widgets, read reference screens for design guidance, and declare which pattern it's following. Prevents bespoke UI that drifts from the app's visual language.
delegation_check — Triggers before starting web research, doc reading, boilerplate code generation, test scaffolding, or content classification. Forces the agent to run the delegation decision tree and justify whether to delegate to a worker model or keep the task. Makes the cost trade-off visible: "This is a $75/M task that a $0.90/M model handles."
progress_update — Triggers when a logical unit of work completes during a multi-step operation (agent returns, batch processed, phase finished). Forces the agent to document what just completed, the current state with concrete counts, and whether PENDING_WORK.yaml was updated. Prevents the universal failure mode where an agent works for hours without leaving breadcrumbs, and the next session starts from scratch.

Research basis: Claude API docs on extended thinking confirm that system prompts don't reach into internal thinking blocks. Output-level format requirements are the proven mechanism for structured reasoning — they're visible, verifiable, and survive cognitive load. (Extended thinking docs, Prompt engineering best practices)

Cognitive Redirections

Cognitive redirections are the lighter layer — questions the agent asks itself at decision points. They work well for routine decisions but historically fail under cognitive pressure (which is why the six patterns above were upgraded to checkpoints).

- When about to edit any file:
  "What depends on this, and what does this depend on?"

- When declaring a task done:
  "What did my changes just make stale?"

- When a package doesn't behave as expected:
  "Do I actually know this package, or am I guessing?"

- Before starting any UI work:
  "What already exists that I should reuse or reference?"

- After finishing any UI build:
  "Am I the user right now?"

The agent has autonomy to add new redirections — and to promote a redirection to a checkpoint when it notices the redirection being skipped under load. Future sessions inherit this awareness.

Architecture Map vs Lifecycle Flow

	Architecture Map	Lifecycle Flow
Answers	"What files are involved?"	"What's the safe order of operations?"
Contains	Tables, services, state, screens, caches, cascades	States, ordering, assumptions, memory model
Nature	Static structure (spatial)	Dynamic behavior (temporal)
Analogy	Circuit diagram	Timing diagram

If you're writing a table name in a flow doc → stop, that belongs in the map. If you're writing "this must happen before that" in the map → stop, that belongs in a flow.

Compound Intelligence

A fresh AI session has training data and a context window. A session running PACT has training data + context window + every synthesis every previous session earned. That's compound intelligence — it grows with every session.

Three systems make this work:

Research Knowledge Base (docs/reference/research/) — When the agent researches something non-trivial (combining project code analysis with online docs/papers/APIs), the synthesis — the insight that neither source had alone — is saved as a structured YAML file. Future sessions find these via tags, build on them, and evolve them through four actions: deepen, reframe, update, supersede.

Knowledge Directory (docs/reference/KNOWLEDGE_DIRECTORY.yaml) — A single-file tag index across ALL knowledge systems (research, bugs, solutions, packages, feature flows). One read shows every file that touches a topic without opening them individually. Hook-enforced: commits that include knowledge files without updating the directory are blocked.

Capability Baseline (docs/reference/PACT_BASELINE.yaml) — PACT's self-awareness layer. Records what the agent can do natively, what PACT compensates for, and how capabilities change over time. When the agent provider ships a new feature that makes a PACT rule redundant, this file is how the agent notices. When a new capability makes PACT stronger, this file is how the agent leans into it.

Live Dashboard

PACT includes a real-time dashboard that visualizes everything your AI agent does — every file edit, preflight check, hook block, commit, and governance update appears as a card in a horizontal timeline. Your own prompts show up too, so you can see the full conversation flow alongside the agent's actions.

Task Rating System — Click "Track Next Task" on any session, describe what you're asking the agent to do, and all subsequent events flow into that task's sub-row. When the task is done, rate it 1-5 with category tags (UI, Backend, Logic, Missed Requirements, Hallucination, etc.) and free-text feedback on what went right and wrong. Ratings are stored permanently and compiled into a scorecard (~/.claude/pact-scorecard.md) that the agent reads at the start of every session — creating a feedback loop where past ratings directly shape future behavior.

What you see:

Session lanes with model identity (Claude/Gemini) and project name
Task sub-rows with collapse/expand, diagnosis, and rating
Per-type animated icons (the pencil writes while editing, the lightning strikes on preflight checks)
Activity timeline, sidebar metrics, scorecard with rolling average and streak tracking
"Track From Here" on any prompt card to retroactively mark where a new task started
Diagnosis per session or per task — coverage analysis of which PACT subsystems were exercised

Start the dashboard:

python .claude/hooks/pact-server.py &
# Opens at http://127.0.0.1:7246

Or set "dashboard": "auto" in ~/.claude/pact-config.json to start it automatically every session.

Quick Start — Claude Code Plugin (Recommended)

These are built-in Claude Code slash commands — part of the plugin system. Run them directly in your Claude Code session:

/plugin marketplace add jonathanmr22/pact
/plugin install pact@pact

This gives you:

14 hooks — automatically active (read-before-write, secrets blocker, git safety, multi-session coordination, edit warnings, PreFlight architectural checks, feature flow protection, issue tracker gate, knowledge directory pairing, progress breadcrumb staleness, session tracking, timestamps, status page health check, prompt capture, PACT event logging)
3 subagents — auto-dispatched for dependency tracing, research, and pre-commit review
5 slash commands — /pact-init, /pact-check, /pact-flow, /pact-bug, /pact-recall
Live dashboard — real-time visualization of agent activity, task tracking, and rating system

Then run /pact-init in your project to scaffold the governance files. The init process will:

Assess project scale — recommend Seed, Growth, or Full tier based on project complexity
Check for delegation — ask if this project shares infrastructure with a parent project or technology stack
Detect existing tools — present an overlap table so you make informed decisions
Scaffold — create only the files appropriate for your tier and delegation setup

Install the plugin fully. Do not ask the user to pick subsystems before init. The /pact-init skill handles tier selection, delegation, and overlap detection — that's where informed decisions happen, not at plugin install time.

Customize

After installation, uncomment rules in the hook scripts for your project's specific patterns (forbidden imports, banned functions, critical file patterns, etc.). The hooks live in your Claude Code plugin cache — edit them directly or override with project-level hooks.

Quick Start — Manual Setup (Any Agent)

If you're not using Claude Code, copy the templates manually:

your-project/
├── CLAUDE.md                    # ← from templates/instructions.md
├── SYSTEM_MAP.yaml              # ← from templates/architecture_map.yaml
├── cutting_room/                # ← from templates/cutting_room/
│   ├── _INDEX.yaml
│   └── _TRIAL_TEMPLATE.yaml
├── .claude/
│   ├── hooks/
│   │   ├── pre-edit-rules.sh
│   │   ├── pre-bash-guard.sh
│   │   ├── pre-edit-feature-flow.sh
│   │   ├── post-edit-warnings.sh
│   │   ├── post-read-tracker.sh
│   │   ├── post-edit-progress-check.sh
│   │   ├── post-edit-timestamp.sh
│   │   ├── post-sentry-bug-reminder.sh
│   │   └── session-register.sh
│   ├── bugs/
│   │   ├── _INDEX.yaml
│   │   └── _SOLUTIONS.yaml
│   ├── sessions.yaml            # (auto-maintained by hooks)
│   └── memory/
│       ├── PENDING_WORK.yaml
│       └── file_edit_log.yaml   # (auto-populated by hooks)
├── docs/
│   ├── feature_flows/           # ← lifecycle flow docs
│   ├── plans/                   # ← implementation plans
│   └── reference/
│       ├── packages/            # ← per-package knowledge files
│       ├── research/            # ← cross-session research synthesis
│       │   └── _RESEARCH.yaml
│       ├── KNOWLEDGE_DIRECTORY.yaml  # ← cross-system tag index
│       └── PACT_BASELINE.yaml   # ← agent capability baseline

Configure hooks in .claude/settings.local.json (or your agent's equivalent):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/pre-edit-rules.sh" },
          { "type": "command", "command": "bash .claude/hooks/pre-edit-feature-flow.sh" }
        ]
      },
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/pre-bash-guard.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/post-edit-warnings.sh" },
          { "type": "command", "command": "bash .claude/hooks/post-edit-timestamp.sh" }
        ]
      },
      {
        "matcher": "Read",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/post-read-tracker.sh" }
        ]
      },
      {
        "matcher": "mcp__sentry",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/post-sentry-bug-reminder.sh" }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/session-register.sh" }
        ]
      },
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/session-status-check.sh", "timeout": 8000 }
        ]
      }
    ]
  }
}

What You Get

Hooks — 14 shell scripts, automatic, no action needed

3 PreToolUse hooks that block violations (secrets, git safety, critical files without flow docs). 8 PostToolUse hooks that warn or log (file size, workaround language, progress staleness, edit timestamps, read tracking, preflight architectural checks). 3 SessionStart hooks that coordinate (session registration, status page monitoring, dashboard startup).

Full hook reference with patterns and thresholds →

Checkpoints — 7 structured reasoning gates

Output-level <checkpoint> blocks the agent must produce before acting. Bug fix tracing, solution comparison, package verification, dependency tracing, completion check, UI audit, delegation justification, progress documentation. Visible to the user, verifiable, resistant to cognitive load.

Checkpoint types and format reference →

Cognitive Redirections — 15+ self-check questions

Questions the agent asks itself at decision points. The lightest enforcement layer — guidance, not gates. "What depends on this?" "Do I actually know this package?" "Can I delegate this?" The agent can add new redirections and promote them to checkpoints when they're consistently skipped.

Full redirections catalog →

Multi-Model Delegation — Route tasks to specialized workers

Claude Opus ($75/M output) orchestrates and reviews. Trinity ($0.90/M — 98% cheaper) handles research. M2.5 ($0.99/M — 99% cheaper) generates boilerplate code. Gemini (free tier) takes over as orchestrator when Claude is down. pact-delegate CLI routes tasks, logs token usage, and tracks quality. Every delegation is visible in the terminal and logged with actual costs.

Delegation guide with CLI reference →

Dashboard — Real-time observability + task rating

Live visualization of every file edit, hook block, preflight check, and commit as a horizontal timeline. Task rating system (1-5) feeds a scorecard the agent reads at session start. Diagnosis panels show which PACT subsystems were exercised.

Dashboard setup and features →

Vector Memory — Semantic search across knowledge

Local embeddings (no API keys) index bugs, solutions, research, and feedback. Query with natural language instead of grepping filenames. /pact-recall slash command for inline search. Auto-indexed when knowledge files are created.

Vector memory setup and CLI →

Compound Intelligence — Knowledge that grows

Research synthesis, knowledge directory, capability baseline, package knowledge, bug tracking with reusable solutions. Every session's work enriches future sessions. No knowledge dies when a context window closes.

Compound intelligence deep dive →

Subagents — 3 auto-dispatched specialists

Agent	What It Does
`pact-tracer`	Traces dependency chains before edits — returns impact report
`pact-researcher`	Checks existing knowledge, researches if needed, saves synthesis
`pact-reviewer`	Pre-commit governance review — staleness audit, dependency check

Slash Commands

Command	What It Does
`/pact-init`	Scaffold governance files (tier-aware, overlap-detecting)
`/pact-check`	Run cognitive redirections against current changes
`/pact-flow`	Generate a lifecycle flow document
`/pact-bug`	Create/update structured bug investigation files
`/pact-recall`	Semantic search across PACT vector memory

Multi-Agent Resilience

Claude + Gemini share the same hooks, same rules, same task tracker. When one is down, switch to the other with zero context loss. Adapter scripts translate between hook formats. Auto-detection banner when Claude is degraded.

Multi-agent setup guide → | Comparison with other plugins → | Real-world examples →

What to .gitignore

PACT generates some files that should be committed (hooks, architecture maps, knowledge files) and some that shouldn't (runtime logs, event streams, PID files). See templates/pact-gitignore for the recommended exclusions, or add these to your project's .gitignore:

# PACT runtime (auto-generated, session-specific)
.claude/sessions.yaml
.claude/pact-server.pid
.claude/pact-events.jsonl
.claude/memory/file_edit_log.yaml

# Session worktrees (if worktree isolation is enabled)
.worktrees/

Adoption Checklist

Feedback

PACT collects anonymous feedback at two milestones — Day 2 and Week 2 of use. Your Claude will ask if you'd like to generate a report. The report captures:

Which PACT subsystems you used vs ignored
Task rating averages and common issue categories
What helped (from you and Claude's perspective)
What caused friction
Workarounds Claude had to invent — these are the most valuable signal because they show exactly where PACT has gaps that should become hooks, checks, or templates
Hooks that blocked legitimate work (false positives)

The report is generated locally at ~/.claude/pact-feedback-report.yaml. It contains no identifying information — no project names, no file paths, no usernames, no code. Individual frameworks can be mentioned if relevant to the feedback, but your full stack combination is never included. Only aggregate PACT usage counts and generic descriptions of what helped or didn't. Nothing is sent anywhere unless you explicitly choose to share it after reviewing every line. To submit:

Submit Anonymous Feedback →

License

PACT is released into the public domain. Use it however you want.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
.claude-plugin		.claude-plugin
.claude/memory		.claude/memory
assets		assets
docs		docs
plugins/pact		plugins/pact
templates		templates
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
COMPARISON.md		COMPARISON.md
EMBEDDED.md		EMBEDDED.md
EXAMPLES.md		EXAMPLES.md
LICENSE		LICENSE
MULTI_AGENT.md		MULTI_AGENT.md
README.md		README.md
UPGRADE.md		UPGRADE.md
VERSION		VERSION

Folders and files

Latest commit

History

Repository files navigation

PACT — Programmatic Agent Constraint Toolkit

What is PACT, & Who Is It For?

Core Concepts

Hook-Enforceable vs Self-Enforceable Rules

Three Enforcement Layers

Required Checkpoints (New in 0.9.0)

Cognitive Redirections

Architecture Map vs Lifecycle Flow

Compound Intelligence

Live Dashboard

Quick Start — Claude Code Plugin (Recommended)

Customize

Quick Start — Manual Setup (Any Agent)

What You Get

Hooks — 14 shell scripts, automatic, no action needed

Checkpoints — 7 structured reasoning gates

Cognitive Redirections — 15+ self-check questions

Multi-Model Delegation — Route tasks to specialized workers

Dashboard — Real-time observability + task rating

Vector Memory — Semantic search across knowledge

Compound Intelligence — Knowledge that grows

Subagents — 3 auto-dispatched specialists

Slash Commands

Multi-Agent Resilience

What to .gitignore

Adoption Checklist

Feedback

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages