Cost-aware LLM API router — x402 POC

A single OpenAI-compatible entry point that gates each LLM request behind a per-request programmatic payment (x402 V2), forwards to an upstream provider, and reports its own overhead.

Layout

Path	Role
`crates/x402`	x402 V2 protocol crate, transport-agnostic, no LLM coupling
`poc-x402-llm`	the router binary, consumes the crate
`test-agent`	x402 V2 buyer that pays and measures overhead
`docs/`	architecture and decision records

The crate is the reusable artifact; the binary is one consumer. The boundary is strict (see docs/DECISIONS.md, D2/D7).

Protocol correctness

The crate is pinned to the canonical x402 spec, not to memory. Wire types and the facilitator contract are aligned with specs/transports-v2/http.md and go/FACILITATOR.md of the x402 foundation repository. Golden tests decode the verbatim Base64 header values from the spec. V2 only (x402Version: 2); V1 is rejected.

Prerequisites

Rust stable, Node 20+
A Base Sepolia recipient wallet (receives test payments)
A funded Base Sepolia buyer wallet for test-agent:
- test ETH for gas headroom — Base / Coinbase Developer Platform faucet
- test USDC for payments — Circle faucet
An upstream provider endpoint, model id and API key

No real funds are involved: Base Sepolia is a testnet.

Configuration

Copy config.example.toml to config.local.toml (gitignored) and fill in real values. Any key can be overridden by environment: AR_<SECTION>__<KEY>, e.g. AR_UPSTREAM__API_KEY. The upstream endpoint, model id, API key and recipient address never enter the versioned tree.

Run

cargo run -p poc-x402-llm                       # serves on config.server.bind
cd test-agent && npm install
EVM_PRIVATE_KEY=0x... RUNS=20 npm start         # pays and measures

The agent issues a request, receives 402 with a PAYMENT-REQUIRED challenge, constructs and signs a payment, retries with PAYMENT-SIGNATURE, and on success reads the timing headers.

Measurement

The router emits per-request timing headers in microseconds:

Header	Meaning
`x-verify-us`	facilitator `/verify` round trip
`x-upstream-us`	upstream provider call
`x-settle-us`	facilitator `/settle` round trip
`x-overhead-us`	total minus upstream — the router's own cost
`x-total-us`	full handler

test-agent reports median / p95 / p99 over the configured run count. The target from docs/ARCHITECTURE.md is p99 overhead < 5 ms; the overhead here additionally includes the payment round trips, which is the honest figure for a payment-gated path.

Results — POC-1: sync settle, facilitator verify (Base Sepolia, n=20)

Upstream: DeepSeek deepseek-chat. Facilitator: hosted x402.org. All values in milliseconds.

stage	median	p95	p99
verify	228	273	330
upstream	824	1050	1206
settle	593	1138	2229
overhead (total − upstream)	846	1410	2439
total	1693	2372	3645

The architecture target (docs/ARCHITECTURE.md) is p99 overhead < 5 ms. Measured p99 overhead is ~2439 ms — about 490× over target. The Rust proxy's own cost (parsing, codec, dispatch) is negligible; the overhead is entirely verify + settle, two synchronous facilitator round trips on the hot path, with settle blocking on Base Sepolia confirmation.

Conclusion: synchronous per-request settlement is not viable for a latency-sensitive payment-gated LLM proxy. Settlement must leave the hot path (deferred/async, or the x402 batch-settlement scheme), and verify must be optimized (local/optimistic verification, caching). The settlement mode is a structuring V1 decision, not a late add-on. See docs/DECISIONS.md, D10.

Results — POC-2: the latency/risk frontier (Base Sepolia, n=20)

Same harness, same upstream. Overhead = total − upstream. Three points on the latency vs economic-risk curve:

mode	overhead p50	p95	p99	vs 5 ms target	risk added
baseline (sync settle, facilitator verify)	846 ms	1410 ms	2439 ms	~490× over	none (strict)
async settle, facilitator verify	242 ms	413 ms	956 ms	~191× over	serve-before-settle
async settle, cached verify (TTL, per payer)	0.067 ms	0.117 ms	0.141 ms	~35× under	+ per-payer trust window

On-chain: async settlements land (RECIPIENT +42 over the two runs), but 42 of ~46 requested — confirming a tokio::spawn is not durable; a real system needs a settlement queue with reconciliation.

Reading: with strict x402 (synchronous settle + verify) the 5 ms target is missed by ~490×. The Rust proxy's own cost is ~67 µs (p50) — it is not the problem. The target is reachable only by moving settlement off the hot path and caching verification per payer, i.e. by purchasing latency with two explicit economic risks: serve-before-settle (needs a durable settlement queue + reconciliation) and a per-payer trust window (needs per-payer spend caps + a short TTL). The x402 payment mode is a latency/risk dial and a V1 architecture decision. See docs/DECISIONS.md, D11.

Results — POC-3: batch-settlement (Base Sepolia, n=20 pure vouchers)

One-click on-chain evidence: see RESULTS.md for the per-loop claim/settle transaction table (each settle linked to Base Sepolia BaseScan, settledAmount equal to the on-chain recipient delta to the raw unit) and a copy-pasteable reproduction recipe.

A Rust x402::batch_settlement core was implemented and proven correct: EIP-712 domain / channelId / voucher digest and secp256k1 recovery are byte-exact against viem (the library the x402 client SDK signs with), 5 golden tests pinned to authoritative vectors. No public Rust implementation of batch-settlement existed.

End-to-end measured on Base Sepolia. One on-chain deposit in warmup (payer −0.1 USDC, escrow +0.1 USDC), then 20 requests served against off-chain cumulative vouchers. Recipient balance unchanged: vouchers are never settled per request — value is committed once on-chain and claimed/settled in batches off the hot path.

Voucher steady-state overhead (= total − upstream), settle is 0 by construction:

build	median	p95	p99
release (target-cpu=native)	0.21 ms	0.38 ms	0.39 ms
debug	3.47 ms	5.52 ms	5.59 ms

The overhead is entirely local secp256k1 recover + keccak EIP-712 + response build — no facilitator round trip, no chain, no per-request settlement. Debug inflates the crypto ~16×; release is the representative figure.

Reading: batch-settlement clears the 5 ms p99 target by ~13× (release) with no risk compromise — unlike POC-2 mode-3, which hit 0.14 ms only by accepting a per-payer trust window. Here every request carries a capital-backed EIP-712 voucher against an on-chain escrow; settlement is batched off the hot path. This is the principled solution, not a latency/safety trade. The batched claim/settle that moves escrow → recipient is now demonstrated end-to-end on Base Sepolia (the facilitator holds the delegated receiver-authorizer). See docs/DECISIONS.md, D12.

Tranche 2 (bounded channel-manager core). Server-authored claim/settle types + a router core (multi-channel aggregation, idempotent claim→settle, facilitator-delegated authorizer) are implemented and unit-proven (9 batch tests). The full escrow → recipient loop is demonstrated on Base Sepolia in a single process: one deposit (payer → escrow), 20+ off-chain vouchers, then one /admin/claim-settle. RECIPIENT USDC, read on-chain before and after, strictly increased — 0.086 → 0.142 USDC (settle tx 0x58da7777…, settledAmount 56000 == the on-chain delta), and a second loop 0.142 → 0.165 (settle tx 0x7c6a7fe8…, settledAmount 23000). Re-run at smaller deposit footprint (multiplier 20) on one server process: 4 consecutive single-trigger closures, each settledAmount 21000 == on-chain RECIPIENT delta; RECIPIENT 0.182 → 0.266 USDC. Three lifecycle bugs were found via the facilitator's own error codes and fixed: (1) the reference client opens the channel with a fixed DEFAULT_SALT (zero), so every run reused the same channel and the deposit was rejected with cumulative_below_claimed — fixed with a fresh per-run channel salt; (2) the settle leg was submitted before the claim transaction was mined, colliding on the shared facilitator-delegated sender nonce (replacement transaction underpriced), and the facilitator's settle view further lagged a freshly mined claim (nothing_to_settle) — fixed by confirming the claim receipt on-chain before settle, plus a bounded in-process settle retry; (3) the deposit branch wiped the per-channel state on every top-up, so the claim leg was bound to the post-top-up voucher's small maxClaimableAmount and settledAmount settled well below the channel's actual throughput — fixed by preserving the highest cumulative voucher ever signed for the channel (last_max/last_sig) across top-ups. Hot path unchanged (release p99 378 µs). Caveat: demonstrated single-process with a fresh channel per run; durable cross-restart channel-state reconciliation, refund, and withdrawal monitoring remain intentionally out of scope. See docs/DECISIONS.md, D12 (Tranche 2 outcome).

Documented failure modes

No / invalid PAYMENT-SIGNATURE → 402 with a fresh challenge.
Verification invalid → 402 with the facilitator reason.
Payment verified but upstream fails → settlement is skipped; the client keeps its funds; the router returns 502.
Resource delivered but settlement fails → surfaced as an error; a production system would queue a settlement retry (out of POC scope).

Known scope limits

Addresses, hex and signatures are validated only by JSON shape, not strongly typed; local cryptographic verification of the EIP-3009 signature is not implemented (only needed for a facilitator-less mode). Streaming is a separate POC: per-request payment plus SSE raises an open question (charge per request or per token).

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.cargo		.cargo
crates/x402		crates/x402
docs		docs
poc-x402-llm		poc-x402-llm
test-agent		test-agent
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md
RESULTS.md		RESULTS.md
config.example.toml		config.example.toml
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Cost-aware LLM API router — x402 POC

Layout

Protocol correctness

Prerequisites

Configuration

Run

Measurement

Results — POC-1: sync settle, facilitator verify (Base Sepolia, n=20)

Results — POC-2: the latency/risk frontier (Base Sepolia, n=20)

Results — POC-3: batch-settlement (Base Sepolia, n=20 pure vouchers)

Documented failure modes

Known scope limits

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Cost-aware LLM API router — x402 POC

Layout

Protocol correctness

Prerequisites

Configuration

Run

Measurement

Results — POC-1: sync settle, facilitator verify (Base Sepolia, n=20)

Results — POC-2: the latency/risk frontier (Base Sepolia, n=20)

Results — POC-3: batch-settlement (Base Sepolia, n=20 pure vouchers)

Documented failure modes

Known scope limits

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages