A single OpenAI-compatible entry point that gates each LLM request behind a per-request programmatic payment (x402 V2), forwards to an upstream provider, and reports its own overhead.

| Path | Role |
|---|---|
| crates/x402 | x402 V2 protocol crate, transport-agnostic, no LLM coupling |
| poc-x402-llm | the router binary, consumes the crate |
| test-agent | x402 V2 buyer that pays and measures overhead |
| docs/ | architecture and decision records |

The crate is the reusable artifact; the binary is one consumer. The
boundary is strict (see docs/DECISIONS.md, D2/D7).
The crate is pinned to the canonical x402 spec, not to memory. Wire
types and the facilitator contract are aligned with
specs/transports-v2/http.md and go/FACILITATOR.md of the x402
foundation repository. Golden tests decode the verbatim Base64 header
values from the spec. V2 only (x402Version: 2); V1 is rejected.

- Rust stable, Node 20+
- A Base Sepolia recipient wallet (receives test payments)
- A funded Base Sepolia buyer wallet for test-agent:
  - test ETH for gas headroom (Base / Coinbase Developer Platform faucet)
  - test USDC for payments (Circle faucet)
- An upstream provider endpoint, model id and API key

No real funds are involved: Base Sepolia is a testnet.
Copy config.example.toml to config.local.toml (gitignored) and fill
in real values. Any key can be overridden by environment:
AR_<SECTION>__<KEY>, e.g. AR_UPSTREAM__API_KEY. The upstream
endpoint, model id, API key and recipient address never enter the
versioned tree.
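The override naming rule can be sketched as a small parser. This is a minimal illustration, not the router's actual configuration loader: the function name `parse_override` is hypothetical, and lowercasing the section and key is an assumption based on lowercase paths such as config.server.bind.

```rust
/// Splits an override variable name such as `AR_UPSTREAM__API_KEY` into a
/// `(section, key)` pair. Returns `None` when the name does not follow the
/// `AR_<SECTION>__<KEY>` shape.
/// Assumption: section and key map to lowercase config names.
pub fn parse_override(name: &str) -> Option<(String, String)> {
    let rest = name.strip_prefix("AR_")?;
    // The first double underscore separates section from key; single
    // underscores inside the key (as in API_KEY) are preserved.
    let (section, key) = rest.split_once("__")?;
    if section.is_empty() || key.is_empty() {
        return None;
    }
    Some((section.to_lowercase(), key.to_lowercase()))
}
```

Under these assumptions, `AR_UPSTREAM__API_KEY` resolves to section `upstream`, key `api_key`, and names without the `AR_` prefix are ignored.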

    cargo run -p poc-x402-llm   # serves on config.server.bind
    cd test-agent && npm install
    EVM_PRIVATE_KEY=0x... RUNS=20 npm start   # pays and measures

The agent issues a request, receives 402 with a PAYMENT-REQUIRED
challenge, constructs and signs a payment, retries with
PAYMENT-SIGNATURE, and on success reads the timing headers.
The router emits per-request timing headers in microseconds:

| Header | Meaning |
|---|---|
| x-verify-us | facilitator /verify round trip |
| x-upstream-us | upstream provider call |
| x-settle-us | facilitator /settle round trip |
| x-overhead-us | total minus upstream (the router's own cost) |
| x-total-us | full handler |

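As a rough sketch of how these five values relate, the snippet below derives the headers from per-stage durations. The `Timings` struct and `timing_headers` function are hypothetical names for illustration and are not taken from the router's source; only the header names and the "total minus upstream" rule come from the table.

```rust
use std::time::Duration;

/// Durations measured by a handler for one request (hypothetical struct).
pub struct Timings {
    pub verify: Duration,
    pub upstream: Duration,
    pub settle: Duration,
    pub total: Duration,
}

/// Builds the five timing headers, all in whole microseconds.
pub fn timing_headers(t: &Timings) -> Vec<(&'static str, String)> {
    // Overhead is everything the handler spent outside the upstream call.
    // saturating_sub guards against a negative result from clock jitter.
    let overhead = t.total.saturating_sub(t.upstream);
    vec![
        ("x-verify-us", t.verify.as_micros().to_string()),
        ("x-upstream-us", t.upstream.as_micros().to_string()),
        ("x-settle-us", t.settle.as_micros().to_string()),
        ("x-overhead-us", overhead.as_micros().to_string()),
        ("x-total-us", t.total.as_micros().to_string()),
    ]
}
```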
test-agent reports median / p95 / p99 over the configured run count.
The target from docs/ARCHITECTURE.md is p99 overhead < 5 ms; the
overhead here additionally includes the payment round trips, which is
the honest figure for a payment-gated path.
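The reported statistics can be reproduced along the following lines. This sketch assumes the nearest-rank percentile definition, which may differ from what test-agent actually uses; the function names are hypothetical. With only 20 runs, nearest-rank p99 is simply the slowest run, which is worth keeping in mind when reading the tail figures.

```rust
/// Nearest-rank percentile over an ascending-sorted slice; `p` in 1..=100.
/// Assumption: nearest-rank is one common definition, not necessarily the
/// one test-agent implements.
pub fn percentile(sorted: &[u64], p: usize) -> Option<u64> {
    if sorted.is_empty() || p == 0 || p > 100 {
        return None;
    }
    // rank = ceil(p * n / 100), computed in integers to avoid float rounding.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// Takes `(total_us, upstream_us)` pairs, one per run, and returns the
/// (median, p95, p99) of overhead = total - upstream.
pub fn overhead_summary(runs: &[(u64, u64)]) -> Option<(u64, u64, u64)> {
    let mut overheads: Vec<u64> = runs
        .iter()
        .map(|&(total, upstream)| total.saturating_sub(upstream))
        .collect();
    overheads.sort_unstable();
    Some((
        percentile(&overheads, 50)?,
        percentile(&overheads, 95)?,
        percentile(&overheads, 99)?,
    ))
}
```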
Upstream: DeepSeek deepseek-chat. Facilitator: hosted x402.org.
All values in milliseconds.
| stage | median | p95 | p99 |
|---|---|---|---|
| verify | 228 | 273 | 330 |
| upstream | 824 | 1050 | 1206 |
| settle | 593 | 1138 | 2229 |
| overhead (total − upstream) | 846 | 1410 | 2439 |
| total | 1693 | 2372 | 3645 |
The architecture target (docs/ARCHITECTURE.md) is p99 overhead
< 5 ms. Measured p99 overhead is ~2439 ms — about 490× over
target. The Rust proxy's own cost (parsing, codec, dispatch) is
negligible; the overhead is entirely verify + settle, two
synchronous facilitator round trips on the hot path, with settle
blocking on Base Sepolia confirmation.
Conclusion: synchronous per-request settlement is not viable for a
latency-sensitive payment-gated LLM proxy. Settlement must leave the
hot path (deferred/async, or the x402 batch-settlement scheme), and
verify must be optimized (local/optimistic verification, caching).
The settlement mode is a structural V1 architecture decision, not a late add-on.
See docs/DECISIONS.md, D10.
Same harness, same upstream. Overhead = total − upstream. Three points on the latency vs economic-risk curve:
| mode | overhead p50 | p95 | p99 | vs 5 ms target | risk added |
|---|---|---|---|---|---|
| baseline (sync settle, facilitator verify) | 846 ms | 1410 ms | 2439 ms | ~490× over | none (strict) |
| async settle, facilitator verify | 242 ms | 413 ms | 956 ms | ~191× over | serve-before-settle |
| async settle, cached verify (TTL, per payer) | 0.067 ms | 0.117 ms | 0.141 ms | ~35× under | + per-payer trust window |
On-chain: async settlements do land (RECIPIENT +42 over the two runs),
but only 42 of the ~46 requested settlements arrived. This confirms that a
bare tokio::spawn is not durable; a real system needs a settlement queue
with reconciliation.
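The reconciliation idea can be sketched as a ledger that records every settlement before it is dispatched and later reports which ones were never confirmed. This is an in-memory illustration only, so it is not durable by itself; a real queue would persist entries. All names (`SettlementLedger`, `record`, `confirm`, `pending`) are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SettleState {
    Pending,
    Confirmed,
}

/// Tracks settlements by id. Assumption: a production version would write
/// each entry to durable storage before the request is served.
#[derive(Debug, Default)]
pub struct SettlementLedger {
    entries: BTreeMap<u64, SettleState>,
}

impl SettlementLedger {
    /// Records a settlement as pending. Idempotent: never downgrades an
    /// entry that is already confirmed.
    pub fn record(&mut self, id: u64) {
        self.entries.entry(id).or_insert(SettleState::Pending);
    }

    /// Marks a known settlement as confirmed on-chain; false if unknown.
    pub fn confirm(&mut self, id: u64) -> bool {
        match self.entries.get_mut(&id) {
            Some(state) => {
                *state = SettleState::Confirmed;
                true
            }
            None => false,
        }
    }

    /// Settlements that were requested but never confirmed: retry candidates.
    pub fn pending(&self) -> Vec<u64> {
        self.entries
            .iter()
            .filter(|(_, s)| **s == SettleState::Pending)
            .map(|(id, _)| *id)
            .collect()
    }
}
```

A periodic reconciliation pass would compare `pending()` against on-chain state and resubmit whatever is missing.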
Reading: with strict x402 (synchronous settle + verify) the 5 ms
target is missed by ~490×. The Rust proxy's own cost is ~67 µs (p50)
— it is not the problem. The target is reachable only by moving
settlement off the hot path and caching verification per payer,
i.e. by purchasing latency with two explicit economic risks:
serve-before-settle (needs a durable settlement queue + reconciliation)
and a per-payer trust window (needs per-payer spend caps + a short
TTL). The x402 payment mode is a latency/risk dial and a V1
architecture decision. See docs/DECISIONS.md, D11.
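To make the per-payer trust window concrete, here is a minimal sketch of a verification cache bounded by both a TTL and a spend cap. It is an assumption-laden illustration, not the router's implementation: the type and method names are hypothetical, and the cap is modelled as the total amount served without a facilitator check since the last successful verification.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    ServeFromCache,
    VerifyWithFacilitator,
}

struct TrustEntry {
    verified_at: Instant,
    unverified_spend: u64,
}

/// Per-payer trust window: skip the facilitator only while the last
/// verification is fresh and unverified spend stays within the cap.
pub struct VerifyCache {
    ttl: Duration,
    spend_cap: u64,
    entries: HashMap<String, TrustEntry>,
}

impl VerifyCache {
    pub fn new(ttl: Duration, spend_cap: u64) -> Self {
        Self { ttl, spend_cap, entries: HashMap::new() }
    }

    /// Decides whether a request of `amount` may be served on trust.
    pub fn check(&mut self, payer: &str, amount: u64, now: Instant) -> Decision {
        if let Some(entry) = self.entries.get_mut(payer) {
            let fresh = now.saturating_duration_since(entry.verified_at) < self.ttl;
            let projected = entry.unverified_spend.saturating_add(amount);
            if fresh && projected <= self.spend_cap {
                entry.unverified_spend = projected;
                return Decision::ServeFromCache;
            }
        }
        Decision::VerifyWithFacilitator
    }

    /// Call after the facilitator accepts a payment: opens a new window.
    pub fn record_verified(&mut self, payer: &str, now: Instant) {
        self.entries.insert(
            payer.to_string(),
            TrustEntry { verified_at: now, unverified_spend: 0 },
        );
    }
}
```

The TTL bounds how long a payer is trusted and the cap bounds how much can be lost inside that window; both knobs trade latency against exposure.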
One-click on-chain evidence: see RESULTS.md for the per-loop claim/settle transaction table (each settle linked to Base Sepolia BaseScan, settledAmount equal to the on-chain recipient delta to the raw unit) and a copy-pasteable reproduction recipe.
A Rust x402::batch_settlement core was implemented and proven
correct: EIP-712 domain / channelId / voucher digest and
secp256k1 recovery are byte-exact against viem (the library the x402
client SDK signs with), 5 golden tests pinned to authoritative
vectors. No public Rust implementation of batch-settlement existed.
End-to-end measured on Base Sepolia. One on-chain deposit in warmup (payer −0.1 USDC, escrow +0.1 USDC), then 20 requests served against off-chain cumulative vouchers. Recipient balance unchanged: vouchers are never settled per request — value is committed once on-chain and claimed/settled in batches off the hot path.
Voucher steady-state overhead (= total − upstream), settle is 0 by construction:
| build | median | p95 | p99 |
|---|---|---|---|
| release (target-cpu=native) | 0.21 ms | 0.38 ms | 0.39 ms |
| debug | 3.47 ms | 5.52 ms | 5.59 ms |
The overhead is entirely local secp256k1 recover + keccak EIP-712 + response build — no facilitator round trip, no chain, no per-request settlement. Debug inflates the crypto ~16×; release is the representative figure.
Reading: batch-settlement clears the 5 ms p99 target by ~13×
(release) with no risk compromise — unlike POC-2 mode-3, which hit
0.14 ms only by accepting a per-payer trust window. Here every request
carries a capital-backed EIP-712 voucher against an on-chain escrow;
settlement is batched off the hot path. This is the principled
solution, not a latency/safety trade. The batched claim/settle that
moves escrow → recipient is now demonstrated end-to-end on Base
Sepolia (the facilitator holds the delegated receiver-authorizer).
See docs/DECISIONS.md, D12.
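The accounting behind cumulative vouchers can be illustrated without any cryptography. The sketch below assumes the voucher signature has already been recovered and checked, and shows only the bookkeeping: each voucher states a running total, the server charges the difference from the previous total, and the total may never exceed the escrowed deposit. The names `Channel`, `accept`, and `VoucherError` are hypothetical, as are the amounts used in any example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum VoucherError {
    /// Cumulative amount did not grow (replayed or stale voucher).
    NotIncreasing,
    /// Cumulative amount is not backed by the on-chain deposit.
    ExceedsDeposit,
    /// The increase over the previous voucher does not cover the price.
    Underpaid,
}

/// Off-chain view of one payment channel, amounts in raw token units.
pub struct Channel {
    pub deposit: u64,
    pub last_cumulative: u64,
}

impl Channel {
    /// Accepts a voucher for one request and returns the amount it adds.
    /// Assumption: signature verification happened before this call.
    pub fn accept(&mut self, cumulative: u64, price: u64) -> Result<u64, VoucherError> {
        if cumulative <= self.last_cumulative {
            return Err(VoucherError::NotIncreasing);
        }
        if cumulative > self.deposit {
            return Err(VoucherError::ExceedsDeposit);
        }
        let delta = cumulative - self.last_cumulative;
        if delta < price {
            return Err(VoucherError::Underpaid);
        }
        self.last_cumulative = cumulative;
        Ok(delta)
    }
}
```

Because only the latest voucher matters, a single batched claim for `last_cumulative` settles every request served so far.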
Tranche 2 (bounded channel-manager core). Server-authored
claim/settle types + a router core (multi-channel aggregation,
idempotent claim→settle, facilitator-delegated authorizer) are
implemented and unit-proven (9 batch tests). The full
escrow → recipient loop is demonstrated on Base Sepolia in a single
process: one deposit (payer → escrow), 20+ off-chain vouchers, then
one /admin/claim-settle.

RECIPIENT USDC, read on-chain before and after, strictly increased:
0.086 → 0.142 USDC (settle tx 0x58da7777…, settledAmount 56000 == the
on-chain delta), and a second loop 0.142 → 0.165 (settle tx
0x7c6a7fe8…, settledAmount 23000). Re-run at smaller deposit footprint
(multiplier 20) on one server process: 4 consecutive single-trigger
closures, each settledAmount 21000 == on-chain RECIPIENT delta;
RECIPIENT 0.182 → 0.266 USDC.

Three lifecycle bugs were found via the facilitator's own error codes and fixed:

- The reference client opens the channel with a fixed DEFAULT_SALT (zero), so every run reused the same channel and the deposit was rejected with cumulative_below_claimed. Fixed with a fresh per-run channel salt.
- The settle leg was submitted before the claim transaction was mined, colliding on the shared facilitator-delegated sender nonce (replacement transaction underpriced), and the facilitator's settle view further lagged a freshly mined claim (nothing_to_settle). Fixed by confirming the claim receipt on-chain before settle, plus a bounded in-process settle retry.
- The deposit branch wiped the per-channel state on every top-up, so the claim leg was bound to the post-top-up voucher's small maxClaimableAmount and settledAmount settled well below the channel's actual throughput. Fixed by preserving the highest cumulative voucher ever signed for the channel (last_max/last_sig) across top-ups.

Hot path unchanged (release p99 378 µs). Caveat: demonstrated
single-process with a fresh channel per run; durable cross-restart
channel-state reconciliation, refund, and withdrawal monitoring remain
intentionally out of scope. See docs/DECISIONS.md, D12
(Tranche 2 outcome).
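The fix for the top-up bug can be sketched as follows. The field names last_max and last_sig come from the text; everything else (`ChannelState`, `top_up`, `record_voucher`) is a hypothetical shape chosen for illustration and may not match the router's actual types.

```rust
/// Per-channel state kept by the server (illustrative shape).
#[derive(Debug, Default)]
pub struct ChannelState {
    pub deposit: u64,
    /// Highest cumulative voucher amount ever signed for this channel.
    pub last_max: u64,
    /// Signature belonging to that highest voucher.
    pub last_sig: Option<String>,
}

impl ChannelState {
    /// A top-up only raises the deposit. It must not reset last_max or
    /// last_sig, otherwise the next claim would be bound to a smaller
    /// voucher than the channel has actually earned.
    pub fn top_up(&mut self, amount: u64) {
        self.deposit = self.deposit.saturating_add(amount);
    }

    /// Keeps a voucher only if it exceeds the highest seen so far.
    /// Returns true when the stored voucher was replaced.
    pub fn record_voucher(&mut self, cumulative: u64, sig: &str) -> bool {
        if cumulative > self.last_max {
            self.last_max = cumulative;
            self.last_sig = Some(sig.to_string());
            true
        } else {
            false
        }
    }
}
```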

- No / invalid PAYMENT-SIGNATURE → 402 with a fresh challenge.
- Verification invalid → 402 with the facilitator reason.
- Payment verified but upstream fails → settlement is skipped; the client keeps its funds; the router returns 502.
- Resource delivered but settlement fails → surfaced as an error; a production system would queue a settlement retry (out of POC scope).

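The four cases above amount to a small decision table for the strict synchronous mode, sketched here as a pure function. The types `Attempt` and `Reply` and the function `decide` are hypothetical; the source does not state which status code a settlement failure maps to, so that case is kept as a named variant without a number.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Reply {
    /// HTTP 402 with a fresh challenge, optionally carrying the
    /// facilitator's rejection reason.
    PaymentRequired { reason: Option<String> },
    /// HTTP 502: upstream failed after the payment was verified.
    BadGateway,
    /// Resource was produced but settlement failed (status unspecified).
    SettlementError,
    Delivered,
}

/// Outcome of each stage for one request (illustrative inputs).
pub struct Attempt {
    pub signature_ok: bool,
    pub verify: Result<(), String>,
    pub upstream_ok: bool,
    pub settle_ok: bool,
}

/// Returns the reply and whether settlement was attempted at all.
pub fn decide(a: &Attempt) -> (Reply, bool) {
    if !a.signature_ok {
        return (Reply::PaymentRequired { reason: None }, false);
    }
    if let Err(reason) = &a.verify {
        return (Reply::PaymentRequired { reason: Some(reason.clone()) }, false);
    }
    if !a.upstream_ok {
        // Settlement is skipped, so the client keeps its funds.
        return (Reply::BadGateway, false);
    }
    if !a.settle_ok {
        return (Reply::SettlementError, true);
    }
    (Reply::Delivered, true)
}
```

The ordering matters: settlement is attempted only after the upstream call succeeds, which is what keeps a failed upstream from costing the client anything.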
Addresses, hex and signatures are validated only by JSON shape, not strongly typed; local cryptographic verification of the EIP-3009 signature is not implemented (only needed for a facilitator-less mode). Streaming is a separate POC: per-request payment plus SSE raises an open question (charge per request or per token).