SoulForge
An AI-native agent engineering substrate for building observable, replayable, Base-native agents.
SoulForge is a production-conscious repository architecture for AI-assisted agent engineering. Its primary interface is not a framework import or a scaffold command. Its primary interface is a cloned repo plus a natural-language instruction to a coding agent:
"Claude, build me a Base-native research agent with memory and x402 payments."
"Codex, create a long-horizon monitoring agent with Bankr integration and evals."
The repository itself is part of the product. Its folders, contracts, examples, evals, observability, and .ai/ guidance are arranged so Claude, Codex, Cursor, OpenAI agents, and local coding agents can navigate the system quickly and create high-quality agents with fewer architectural mistakes.
The thesis:
primitives + working examples > framework lock-in
SoulForge is not a runtime. It is not a template CLI dressed up as a product. The repo provides copyable primitives, typed reference implementations, runnable examples, eval harnesses, observability, memory, and machine-readable guidance intended to make AI-generated agents more reliable and closer to production-grade.
Agents should be easy to create, hard to create incorrectly. The repo gives AI systems deterministic structure, strong contracts, naming conventions, eval expectations, and observability requirements so natural-language requests land as coherent implementations.
The repository structure is product surface. .ai/, docs/, examples, contracts, tests, and naming conventions are designed as a navigation system for autonomous coding agents.
Primitives over frameworks. SoulForge produces copyable infrastructure, not a package that every agent must import.
Human-readable souls. Souls stay markdown-first. Schema validates structure, but humans remain the primary authors.
Provider-agnostic policy. Souls do not encode OpenAI, Anthropic, Gemini, local model, or hosted-tool assumptions. Examples can pin providers. Primitives should not.
Economic actions must be observable. If agents can move money, execution must be replayable, actions must emit telemetry, receipts must persist, limits must be enforceable, and approvals must be explicit.
git clone https://github.com/0xAxiom/soulforge.git
cd soulforge
npm install

Then ask your coding agent:
Build a Base-native research agent with memory, x402 payments, eval goldens, and JSONL observability.
The agent should read:
- README.md
- CLAUDE.md
- docs/ARCHITECTURE.md
- .ai/repo-map.json
- .ai/task-routing.md
- the nearest module README for each primitive it touches
The implementation should produce a standalone agent or reference example with this shape:
| File | Purpose |
|---|---|
| soul.md | Human-readable behavior, scope, and refusal policy |
| src/contracts.ts | Zod input/output schemas |
| src/tools.ts | Isolated tool wiring |
| src/memory.ts | Local memory lifecycle |
| src/endpoint.ts | Request handler composition |
| src/observability.ts | JSONL trace sink |
| src/eval.ts | Local replay runner |
| eval/goldens/ | Golden cases, including refusal cases |
| .env.example | Runtime knobs and safe defaults |
The scaffold command is a supporting accelerator when the coding agent wants a known-good starting point:
npx soulforge new base-research --template research-agent
npx soulforge new paid-agent --template x402-paid-agent
npx soulforge new dry-run-trader --template trading-agent

Generated projects are examples of the expected structure. They are not the core product and they are not a runtime dependency.
soulforge/
├── .ai/ machine-readable guidance for coding agents
├── CLAUDE.md operating contract for AI contributors
├── docs/ architecture and release docs
├── generator/ optional scaffold accelerator and template examples
├── souls/ markdown souls, schema, validator
├── tools/ optional typed capability modules
├── endpoints/ endpoint templates and working demos
├── memory/ short-term, long-term, recall, reflect
├── eval/ traces, goldens, score, diff, cache
├── observability/ JSONL cost, latency, error, receipt events
└── research/ implementation research notes
The implementation folders are peers. No primitive secretly owns the others.
| Module | Status | Verification |
|---|---|---|
| souls/ | Schema, examples, validate CLI | npm run validate-souls |
| memory/ | Local-first memory contracts: Map KV, SQLite long-term, recall, reflection, telemetry | npm run test -- memory |
| eval/ | JSONL traces, goldens, scoring, diff, cache | npm run eval -- run --soul souls/examples/starter-soul.md |
| observability/ | JSONL sink, cost ledger, latency histogram, error grouping | npm run test -- observability |
| tools/bankr/ | Dry-run-first Bankr adapter with receipts and guardrails | npm run test -- tools/bankr |
| generator/ | Optional scaffold accelerator and six structure examples | npm run scaffold:smoke |
| endpoints/ | Endpoint contracts, x402 template, URL inspector examples | npm run test -- endpoints |
Quality gates:
npm run lint
npm run typecheck
npm run test
npm run build
npm run validate-souls

Never:
- add hidden runtimes
- bypass eval
- bypass observability
- use unstructured tool outputs
- hardcode providers into souls
- create giant god-agents
- skip replayability
- skip idempotency for financial actions
Always:
- use structured outputs
- emit telemetry
- create eval goldens
- keep tools isolated
- keep primitives composable
- preserve local-first operability
- document env vars and failure behavior
When modifying SoulForge:
- Read README.md.
- Read CLAUDE.md.
- Read docs/ARCHITECTURE.md.
- Read .ai/repo-map.json and the relevant .ai/*.md guide.
- Inspect neighboring modules before adding abstractions.
- Preserve primitive boundaries.
- Add tests for code changes.
- Add eval goldens for agent behavior changes.
- Add observability for runtime paths.
- Update docs in the same change.
Use .ai/ for machine guidance, examples for local patterns, generator/ only when a scaffold speeds up implementation, and the primitive folders for real code.
flowchart LR
Dev["Developer"]
Prompt["Natural-language request<br/>Claude / Codex / Cursor"]
Guide[".ai guidance<br/>repo-map + task routing"]
Examples["Examples + patterns<br/>neighboring modules"]
Soul["Soul<br/>markdown policy"]
Tools["Tools<br/>typed Zod contracts"]
Memory["Memory<br/>local state + reflection"]
Endpoint["Endpoint<br/>HTTP/local interface"]
Eval["Eval<br/>goldens + replay"]
Obs["Observability<br/>JSONL traces"]
Gen["Optional scaffold<br/>soulforge new"]
Deploy["Deployment<br/>copyable project"]
Dev --> Prompt --> Guide --> Examples --> Soul --> Tools --> Memory --> Endpoint --> Eval --> Obs --> Deploy
Guide -. accelerates .-> Gen -. provides structure .-> Soul
Tools --> Obs
Endpoint --> Obs
Use short-term memory for active turn state, SQLite long-term memory for durable records, recall for deterministic retrieval, and reflection for manual session summarization. Include provenance and trace identifiers on persisted records.
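A minimal sketch of how these four roles might fit together, using in-memory stand-ins so it runs without dependencies. The names `MemoryRecord`, `remember`, `recall`, and `reflect` are hypothetical and are not taken from the memory/ module; the array below stands in for the SQLite long-term store.

```typescript
// Hypothetical record shape: every persisted record carries provenance and a trace id.
interface MemoryRecord {
  id: string;
  text: string;
  provenance: string; // where the record came from, e.g. a tool name
  traceId: string;    // links the record back to an observability trace
  createdAt: number;  // epoch milliseconds, supplied by the caller for determinism
}

// Short-term memory: active turn state only.
const shortTerm = new Map<string, string>();

// Stand-in for the SQLite long-term store (an assumption for this sketch).
const longTerm: MemoryRecord[] = [];

function remember(record: MemoryRecord): void {
  longTerm.push(record);
}

// Deterministic recall: case-insensitive substring match, ordered by time then id,
// so the same store and query always return the same records in the same order.
function recall(query: string, limit = 5): MemoryRecord[] {
  const needle = query.toLowerCase();
  return longTerm
    .filter((r) => r.text.toLowerCase().includes(needle))
    .sort((a, b) => a.createdAt - b.createdAt || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, limit);
}

// Manual reflection: summarize a session only when asked, and persist the
// summary with its own provenance so it can be told apart from raw records.
function reflect(traceId: string, now: number): MemoryRecord {
  const session = longTerm.filter(
    (r) => r.traceId === traceId && r.provenance !== "reflection",
  );
  const summary: MemoryRecord = {
    id: `reflection-${traceId}`,
    text: `Session ${traceId}: ${session.length} record(s) stored.`,
    provenance: "reflection",
    traceId,
    createdAt: now,
  };
  remember(summary);
  return summary;
}
```

Passing timestamps in rather than reading the clock keeps recall ordering reproducible when an eval replays the same session.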
Validate request input, verify payment, execute the tool, persist a receipt, emit observability, and score the behavior with eval goldens. Never execute the paid tool before payment validation.
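The ordering above can be sketched as a single handler. Everything here is hypothetical: `handlePaidRequest`, the `Deps` interface, and the event names are illustration only, and `verifyPayment` stands in for real x402 verification, which this sketch does not implement. Scoring with eval goldens happens afterwards against the emitted trace, so it is not part of the handler.

```typescript
interface PaidRequest { query: string; paymentProof: string }
interface Receipt { traceId: string; paymentProof: string; result: string }
type HandlerResult =
  | { ok: true; receipt: Receipt }
  | { ok: false; reason: "invalid_input" | "payment_required" };

// Injected dependencies keep the handler mockable in tests.
interface Deps {
  verifyPayment: (proof: string) => boolean;
  runTool: (query: string) => string;
  persistReceipt: (receipt: Receipt) => void;
  emit: (event: { traceId: string; type: string }) => void;
}

// Plain type guard standing in for a Zod schema so the sketch has no dependencies.
function isPaidRequest(value: unknown): value is PaidRequest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const query = v["query"];
  const proof = v["paymentProof"];
  return typeof query === "string" && query.length > 0 && typeof proof === "string";
}

function handlePaidRequest(input: unknown, traceId: string, deps: Deps): HandlerResult {
  // 1. Validate request input.
  if (!isPaidRequest(input)) {
    deps.emit({ traceId, type: "refused.invalid_input" });
    return { ok: false, reason: "invalid_input" };
  }
  // 2. Verify payment. The paid tool must not run before this check passes.
  if (!deps.verifyPayment(input.paymentProof)) {
    deps.emit({ traceId, type: "refused.payment_required" });
    return { ok: false, reason: "payment_required" };
  }
  // 3. Execute the tool, 4. persist the receipt, 5. emit observability.
  const result = deps.runTool(input.query);
  const receipt: Receipt = { traceId, paymentProof: input.paymentProof, result };
  deps.persistReceipt(receipt);
  deps.emit({ traceId, type: "completed" });
  return { ok: true, receipt };
}
```

Refusals emit telemetry too, so a replay can show why a request never reached the tool.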
Planner emits typed tasks. Executor performs bounded tool calls. Planner reviews the result. Eval scores the full trace. Memory stores handoffs and failure escalations.
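One way to read this loop, as a sketch under stated assumptions: the `Task`, `StepResult`, and `Trace` shapes, the `runPlan` and `needsReplan` functions, and the `maxCalls` budget are all hypothetical names chosen for illustration. "Bounded" is interpreted here as a fixed cap on tool calls per plan.

```typescript
interface Task { id: string; tool: string; input: string }
interface StepResult { taskId: string; ok: boolean; output: string }
interface Trace { steps: StepResult[]; escalations: string[] }

type Tool = (input: string) => string;

// Executor: runs typed tasks in order, never exceeding maxCalls tool invocations.
// Failures are recorded in the trace instead of being thrown, so the planner
// and eval both see the full picture.
function runPlan(tasks: Task[], tools: Record<string, Tool>, maxCalls: number): Trace {
  const trace: Trace = { steps: [], escalations: [] };
  let calls = 0;
  for (const task of tasks) {
    if (calls >= maxCalls) {
      trace.escalations.push(`${task.id}: call budget exhausted`);
      continue;
    }
    const tool = tools[task.tool];
    if (!tool) {
      trace.escalations.push(`${task.id}: unknown tool ${task.tool}`);
      continue;
    }
    calls += 1;
    try {
      trace.steps.push({ taskId: task.id, ok: true, output: tool(task.input) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      trace.steps.push({ taskId: task.id, ok: false, output: message });
      trace.escalations.push(`${task.id}: ${message}`);
    }
  }
  return trace;
}

// Planner review: any escalation means the plan needs another pass.
function needsReplan(trace: Trace): boolean {
  return trace.escalations.length > 0;
}
```

The returned trace is the object an eval would score, and the escalation strings are what memory would store as handoffs.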
Default to dry-run. Require live flag, spending cap, network allowlist, idempotency key, scoped wallet or sub-account, receipt persistence, and telemetry.
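The idempotency and spending-cap requirements can be illustrated with a small in-memory ledger. This is a sketch only: `ReceiptLedger`, `SwapReceipt`, and their fields are hypothetical names, the cap is treated as a cumulative budget across live calls, and a real implementation would persist receipts rather than hold them in a `Map`.

```typescript
interface SwapReceipt {
  idempotencyKey: string;
  mode: "dry-run" | "live";
  amountUsd: number;
}

class ReceiptLedger {
  private readonly receipts = new Map<string, SwapReceipt>();
  private readonly capUsd: number;
  private spentUsd = 0;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Returns the stored receipt when the same key and mode were seen before,
  // so a retried request never spends twice.
  execute(idempotencyKey: string, amountUsd: number, live = false): SwapReceipt {
    const mode = live ? "live" : "dry-run";
    const storeKey = `${mode}:${idempotencyKey}`;
    const existing = this.receipts.get(storeKey);
    if (existing) return existing;

    // Only live actions count against the cap; dry runs move no money.
    if (live) {
      const next = this.spentUsd + amountUsd;
      if (next > this.capUsd) {
        throw new Error(`spending cap exceeded: ${next} > ${this.capUsd}`);
      }
      this.spentUsd = next;
    }

    const receipt: SwapReceipt = { idempotencyKey, mode, amountUsd };
    this.receipts.set(storeKey, receipt);
    return receipt;
  }

  totalSpentUsd(): number {
    return this.spentUsd;
  }
}
```

Because `live` defaults to `false`, a caller has to opt in explicitly before anything counts against the cap.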
SoulForge treats Base as first-class agent infrastructure: payment, identity, scoped wallets, spend permissions, receipts, and paid service calls are part of the execution model. The goal is not to make hype-driven trading bots. The goal is to make economic agents inspectable, replayable, and bounded enough for serious software teams to operate.
Base-native agents can compose:
| Capability | SoulForge boundary |
|---|---|
| Receive payments | x402 endpoint manifests, Base Pay, and receipt capture |
| Manage scoped wallets | Sub-accounts or task wallets with limited balances |
| Use spend permissions | Explicit allowances, expirations, revocation, and inspection |
| Perform controlled swaps | Typed tool calls, simulation first, dry-run default |
| Execute strategies | Idempotent action keys, spending caps, live-mode gates |
| Persist execution receipts | JSONL receipts that memory, eval, and observability replay |
| Coordinate paid calls | x402 client/server flows with hard spending caps |
Financial actions are never autonomous by default, unlimited, hidden, or unobservable.
SoulForge treats financial execution like infrastructure operations, not "AI magic."
flowchart LR
U["User"]
X["x402 endpoint<br/>payment-gated request"]
S["Soul<br/>human-authored policy"]
T["Tool Layer<br/>typed schemas and caps"]
G{"Economic boundary<br/>dry-run/live flag<br/>spending cap<br/>idempotency key<br/>scoped wallet"}
B["Bankr tool<br/>portfolio, prices, swaps"]
Base["Base<br/>USDC, Base Pay, spend permissions"]
O["Observability<br/>cost, latency, errors, trace ids"]
E["Eval<br/>replay, score, regression diff"]
M["Memory<br/>receipts, summaries, recall"]
R["Receipt persistence<br/>payment proof, job id, tx hash"]
U --> X --> S --> T --> G --> B --> Base --> R
R --> O
R --> E
R --> M
O --> E
tools/bankr/ is an optional programmable-finance adapter. It exposes typed, dry-run-first primitives for price checks, portfolio reads, swap simulation, guarded live swap submission, and execution receipts.
Guardrails:
- dry-run default
- Base and Base Sepolia allowlist
- explicit live: true for execution
- spendingCapUsd required for live swaps
- idempotencyKey required for live swaps
- no direct sign/submit API exposed
- observability on success and failure
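A preflight check along these lines could enforce the guardrails before any swap is attempted. The field names `live`, `spendingCapUsd`, and `idempotencyKey` follow the list above; the `SwapRequest` and `Preflight` shapes, the `checkSwapRequest` function, and the network identifiers are assumptions for illustration and are not the actual tools/bankr API.

```typescript
interface SwapRequest {
  network: string;
  amountUsd: number;
  live?: boolean;
  spendingCapUsd?: number;
  idempotencyKey?: string;
}

interface Preflight {
  mode: "dry-run" | "live" | "rejected";
  reasons: string[];
}

// Network identifiers are assumed; the source only names Base and Base Sepolia.
const ALLOWED_NETWORKS: readonly string[] = ["base", "base-sepolia"];

function checkSwapRequest(req: SwapRequest): Preflight {
  const reasons: string[] = [];

  if (!ALLOWED_NETWORKS.includes(req.network)) {
    reasons.push(`network not allowlisted: ${req.network}`);
  }

  // Anything other than an explicit live: true stays a dry run.
  if (req.live !== true) {
    return reasons.length > 0 ? { mode: "rejected", reasons } : { mode: "dry-run", reasons };
  }

  if (req.spendingCapUsd === undefined || req.spendingCapUsd <= 0) {
    reasons.push("spendingCapUsd is required for live swaps");
  } else if (req.amountUsd > req.spendingCapUsd) {
    reasons.push("amountUsd exceeds spendingCapUsd");
  }

  if (!req.idempotencyKey) {
    reasons.push("idempotencyKey is required for live swaps");
  }

  return reasons.length > 0 ? { mode: "rejected", reasons } : { mode: "live", reasons };
}
```

Collecting every reason instead of stopping at the first gives telemetry a complete record of why a request was refused.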
Run:
npm run test -- tools/bankr
npx tsx tools/bankr/examples/dry-run-swap.ts

| Artifact | Convention |
|---|---|
| Soul | souls/examples/<agent-name>-soul.md or generated soul.md |
| Tool module | tools/<tool-name>/src/index.ts |
| Tool test | tools/<tool-name>/src/<tool-name>.test.ts |
| Endpoint example | endpoints/examples/<agent-name>/ |
| Eval golden | eval/goldens/<soul-name>/golden-001.json |
| Generated golden | <agent>/eval/goldens/golden-001.json |
| Telemetry | ~/.soulforge/obs/YYYY-MM-DD.jsonl |
| Research note | research/YYYY-MM-DD-topic.md |
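
As an illustration of the telemetry convention in the table above, the following sketch builds the daily file path and serializes one event as a JSONL line. Only the path pattern comes from the table; the `ObsEvent` shape, the function names, and the choice of UTC for the date are assumptions. The home directory is passed in so the functions stay pure and testable.

```typescript
// Hypothetical event shape. The four event types mirror the cost, latency,
// error, and receipt events listed for observability/.
interface ObsEvent {
  ts: string; // ISO 8601 timestamp
  traceId: string;
  type: "cost" | "latency" | "error" | "receipt";
  data: Record<string, unknown>;
}

// Builds <home>/.soulforge/obs/YYYY-MM-DD.jsonl for the given date.
// Using the UTC calendar day is an assumption of this sketch.
function telemetryPath(homeDir: string, date: Date): string {
  const day = date.toISOString().slice(0, 10);
  return `${homeDir}/.soulforge/obs/${day}.jsonl`;
}

// JSONL: exactly one JSON object per line, terminated by a newline.
function toJsonlLine(event: ObsEvent): string {
  return JSON.stringify(event) + "\n";
}
```

Appending the output of `toJsonlLine` to the file at `telemetryPath` gives one file per day that can be read back line by line.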
Every primitive documents:
| Contract | Required answer |
|---|---|
| Inputs | What schemas or types are accepted? |
| Outputs | What structured object is returned? |
| Side effects | What can be written, called, paid, or posted? |
| Persistence | What survives process exit? |
| Observability | Which events are emitted? |
| Failure behavior | What error shape or refusal path is used? |
| Replay guarantees | What can eval or a human reproduce later? |
AI contributors should not infer these. Add or update the contract when behavior changes.
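One possible way to make the contract table checkable is to encode it as a type with one field per row. This is a sketch, not an existing SoulForge API: `PrimitiveContract`, `REQUIRED_ANSWERS`, and `missingAnswers` are hypothetical names.

```typescript
// One field per row of the contract table.
interface PrimitiveContract {
  inputs: string;
  outputs: string;
  sideEffects: string;
  persistence: string;
  observability: string;
  failureBehavior: string;
  replayGuarantees: string;
}

const REQUIRED_ANSWERS: (keyof PrimitiveContract)[] = [
  "inputs",
  "outputs",
  "sideEffects",
  "persistence",
  "observability",
  "failureBehavior",
  "replayGuarantees",
];

// Returns the contract questions that are still unanswered or blank,
// in the same order as the table.
function missingAnswers(contract: Partial<PrimitiveContract>): string[] {
  return REQUIRED_ANSWERS.filter((key) => {
    const answer = contract[key];
    return answer === undefined || answer.trim() === "";
  });
}
```

A check like this could run in a test so that a primitive with an incomplete contract fails fast instead of leaving the gap for a coding agent to guess.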
Machine-readable checklist for every generated or reference agent:
- soul is markdown and validates if placed under souls/examples/
- tools have typed input and output schemas
- endpoint validates before execution
- tests cover success and refusal/failure
- eval goldens include at least one refusal case
- observability emits trace identifiers
- memory lifecycle is explicit
- external calls are mockable
- economic actions default to dry-run
- live economic actions require caps and idempotency
- README includes env docs and verification commands
Most repos are difficult for AI because architecture is implicit: inconsistent layout, weak typing, hidden side effects, no evals, poor observability, and unclear ownership boundaries.
SoulForge optimizes for:
- explicit primitives
- deterministic structure
- machine-readable repo maps
- natural-language task routing
- strong TypeScript and Zod contracts
- local-first memory and observability
- replayable evals
- isolated tools
- highly legible examples and optional scaffolds
AI agents should generate understandable systems, not opaque magic stacks.
License: MIT.