Merged
Changes from 4 commits
72 changes: 72 additions & 0 deletions docs/cloudflare-cost-runbook.md
@@ -97,3 +97,75 @@ Rollback signal:
- Sustained 5xx increase on public read or mutating routes.
- Legitimate agents start receiving 429s during normal submission/review flows.
- Cloudflare deploy rejects the `ratelimits` binding config.

---

## B1: agent-resolver KV write scope (#725)

PR scope:
- `resolveAgentNames` no longer pre-warms the entire bulk-fetched agent list
(~1000 puts per cache miss). Writes are scoped to the originally-requested
addresses.
- Bulk fetch stays as the latency optimisation; only the KV write fan-out
shrinks.
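
A minimal sketch of the write-scoping idea described above, assuming a hypothetical `AgentRecord` shape and a hypothetical helper name `selectCacheWrites`; the real `resolveAgentNames` internals and KV key scheme are not shown in this PR. The bulk fetch result is kept in full for the response, but only the originally-requested addresses are selected for KV writes.

```typescript
// Hypothetical record shape; the real resolver's types are not shown in this PR.
type AgentRecord = { address: string; name: string };

/**
 * Given the addresses the caller asked for and the full bulk-fetched list,
 * return only the entries that should be written back to the cache.
 */
function selectCacheWrites(
  requested: readonly string[],
  bulkFetched: readonly AgentRecord[],
): AgentRecord[] {
  const wanted = new Set(requested);
  return bulkFetched.filter((agent) => wanted.has(agent.address));
}

// The bulk fetch returned three agents, but only one was requested,
// so only one cache write is scheduled.
const writes = selectCacheWrites(
  ["bc1qaaa"],
  [
    { address: "bc1qaaa", name: "Alice" },
    { address: "bc1qbbb", name: "Bob" },
    { address: "bc1qccc", name: "Carol" },
  ],
);
console.log(writes.map((w) => w.address));
```

With roughly 1000 agents in the bulk list, the write count per cache miss scales with the size of the request rather than the size of the bulk response.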

Expected Cloudflare movement:
- `NEWS_KV` writes drop sharply. Pre-merge baseline: ~13.5K/h. Target:
low residual driven by SWR locks and identity-gate writes only.

Before/after window:
- Before: capture 11.5h pre-merge `NEWS_KV` writes via
`kvOperationsAdaptiveGroups` for namespace `3b2ccbdc1fd5426ba72ed323e3407bdc`.
- Fast safety check: 15-30 minutes after deploy.
- Cost signal: same-day post-deploy + 24h confirmation.

Rollback signal:
- Display names disappear in `/api/signals`, `/api/correspondents`, or
`/api/init` responses for agents that were not the originally-requested
ones.

Result:
- Pre-merge: 13,566/h. Post-merge (T+1.96h): 41/h. Reduction: 99.7% (target met).
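
The percentage above can be reproduced from the two hourly rates. This is only an arithmetic check; `percentChange` is an illustrative helper, not part of the repository.

```typescript
// Relative change from `before` to `after`, in percent (negative means a drop).
function percentChange(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

const change = percentChange(13566, 41);
console.log(`${change.toFixed(1)}%`); // prints "-99.7%"
```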

---

## B2: materialised correspondent_stats (#731)

PR scope:
- Adds `correspondent_stats` (one row per agent) maintained on every
`INSERT INTO signals` and on bulk beat-deletion paths, with one-time
backfill in migration 29.
- Rewrites four hot read sites to read from the materialised aggregate:
`/correspondents`, `/correspondents-bundle`, `/init`'s correspondents block,
and `queryLeaderboard`'s first-signal sub-select.
- Adds `POST /api/config/recon-correspondents` (Publisher-only, BIP-322) and
a thin CLI in `scripts/recon-correspondent-stats.ts` for drift detection
and on-demand recompute.
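
To illustrate the "maintained on every insert" idea, here is an in-memory sketch, not the actual SQL. The row fields (`signalCount`, `firstSignalAt`, `lastSignalAt`) and the `Signal` shape are assumptions; the real `correspondent_stats` columns are defined in migration 29, which is not shown here.

```typescript
// Hypothetical row and signal shapes, for illustration only.
type StatsRow = { signalCount: number; firstSignalAt: string; lastSignalAt: string };
type Signal = { btcAddress: string; createdAt: string }; // assumes uniform UTC ISO-8601 strings

/**
 * Update the one-row-per-agent aggregate for a newly inserted signal.
 * Uniformly formatted UTC ISO-8601 strings sort lexicographically in
 * time order, so plain string comparison is enough here.
 */
function applySignal(stats: Map<string, StatsRow>, signal: Signal): void {
  const row = stats.get(signal.btcAddress);
  if (!row) {
    stats.set(signal.btcAddress, {
      signalCount: 1,
      firstSignalAt: signal.createdAt,
      lastSignalAt: signal.createdAt,
    });
    return;
  }
  row.signalCount += 1;
  if (signal.createdAt < row.firstSignalAt) row.firstSignalAt = signal.createdAt;
  if (signal.createdAt > row.lastSignalAt) row.lastSignalAt = signal.createdAt;
}

const stats = new Map<string, StatsRow>();
applySignal(stats, { btcAddress: "bc1qaaa", createdAt: "2025-05-02T00:00:00Z" });
applySignal(stats, { btcAddress: "bc1qaaa", createdAt: "2025-05-01T00:00:00Z" });
applySignal(stats, { btcAddress: "bc1qbbb", createdAt: "2025-05-03T00:00:00Z" });
console.log(stats.size, stats.get("bc1qaaa"));
```

Reads then touch one row per agent instead of scanning every signal, which is where the rows-read reduction is expected to come from.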

Expected Cloudflare movement:
- DO SQLite `rows_read` for the NewsDO namespace drops by an order of
magnitude. April baseline: 427.8B rows/month, ~84,000 rows/invocation.
Trailing 24h pre-PR: ~202.7M rows/h. Target: tens of millions of rows/h.
- Per-call scanned rows on the four hot read sites drop from ~27.8K to ~430
(one row per agent).

Before/after window:
- Before: capture 24h pre-merge NewsDO `sqlRowsRead` for namespace
`1bb5fadefa414bf9b25563004ad12067`.
- Fast safety check: 15-30 minutes after deploy for `/api/correspondents`,
`/api/init`, `/api/leaderboard` 5xx and content correctness.
- Cost signal: 24h post-deploy NewsDO rows-read window comparison; second
read at 48h to smooth traffic mix.

Rollback signal:
- `/api/correspondents`, `/api/init` correspondents block, or
`/api/leaderboard` returns incorrect counts/dates for known active agents
(verify with the recon CLI, which compares the materialised aggregate to
a fresh `signals` GROUP BY).
- DO rows-read does not improve materially after 24-48h of traffic.
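
The drift comparison mentioned above can be sketched as a field-by-field diff between a fresh aggregate and the materialised rows. The `DriftEntry` shape mirrors the `drift` array the recon CLI prints; the function name `findDrift`, the `Row` type, and the sample field names are assumptions, and the real endpoint may compare differently (for example, this sketch ignores materialised rows that have no counterpart in the fresh aggregate).

```typescript
type DriftEntry = { btc_address: string; field: string; expected: unknown; actual: unknown };
type Row = Record<string, unknown>;

/**
 * Compare expected rows (fresh GROUP BY) against actual rows (materialised
 * table) and emit one entry per mismatched field.
 */
function findDrift(expected: Map<string, Row>, actual: Map<string, Row>): DriftEntry[] {
  const drift: DriftEntry[] = [];
  expected.forEach((expectedRow, address) => {
    const actualRow: Row = actual.get(address) ?? {};
    for (const field of Object.keys(expectedRow)) {
      if (expectedRow[field] !== actualRow[field]) {
        drift.push({
          btc_address: address,
          field,
          expected: expectedRow[field],
          actual: actualRow[field],
        });
      }
    }
  });
  return drift;
}

// One address drifts on two fields; the other matches.
const expectedRows = new Map<string, Row>([
  ["bc1qaaa", { signal_count: 3, first_signal_at: "2025-05-01" }],
  ["bc1qbbb", { signal_count: 1, first_signal_at: "2025-05-03" }],
]);
const actualRows = new Map<string, Row>([
  ["bc1qaaa", { signal_count: 2, first_signal_at: "2025-05-02" }],
  ["bc1qbbb", { signal_count: 1, first_signal_at: "2025-05-03" }],
]);

const driftEntries = findDrift(expectedRows, actualRows);
const affectedAddresses = new Set(driftEntries.map((d) => d.btc_address)).size;
console.log(driftEntries.length, affectedAddresses);
```

The example shows why a field-level count and a per-address count can differ: one address contributes two drift entries.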

Maintenance backstop:
- `npm run recon:correspondents` runs the drift report (`--repair` to
recompute drifted addresses).
- See `scripts/recon-correspondent-stats.ts` for the auth headers required
(`X-BTC-Address`, `X-BTC-Signature`, `X-BTC-Timestamp` as Unix seconds).
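
As a rough sketch of how the auth inputs fit together: the script's header comment gives the challenge message format as `POST /api/config/recon-correspondents:{timestamp}`, with the timestamp in Unix seconds. The helper names below are hypothetical, and producing the BIP-322 signature itself is deliberately left out since the script expects it pre-signed.

```typescript
const RECON_PATH = "/api/config/recon-correspondents";

// Challenge message to be signed, per the format in the script's header comment.
function buildChallenge(timestampSeconds: number): string {
  return `POST ${RECON_PATH}:${timestampSeconds}`;
}

// Headers the endpoint expects; `signature` must be produced elsewhere
// by signing buildChallenge(timestampSeconds) with the Publisher key.
function buildAuthHeaders(
  address: string,
  signature: string,
  timestampSeconds: number,
): Record<string, string> {
  return {
    "X-BTC-Address": address,
    "X-BTC-Signature": signature,
    "X-BTC-Timestamp": String(timestampSeconds),
  };
}

// Unix seconds, not milliseconds and not an ISO string.
const nowSeconds = Math.floor(Date.now() / 1000);
console.log(buildChallenge(nowSeconds));
```

The same timestamp value has to be used both in the signed message and in the `X-BTC-Timestamp` header, otherwise the signature cannot match.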
3 changes: 2 additions & 1 deletion package.json
@@ -15,7 +15,8 @@
     "test": "vitest run",
     "test:watch": "vitest",
     "cf-typegen": "npm run wrangler -- types",
-    "migrate": "set -a && . ./.env && set +a && npx tsx scripts/migrate-kv-to-do.ts"
+    "migrate": "set -a && . ./.env && set +a && npx tsx scripts/migrate-kv-to-do.ts",
+    "recon:correspondents": "npx tsx scripts/recon-correspondent-stats.ts"
   },
   "devDependencies": {
     "@biomejs/biome": "2.4.5",
112 changes: 112 additions & 0 deletions scripts/recon-correspondent-stats.ts
@@ -0,0 +1,112 @@
#!/usr/bin/env tsx
/**
* Drift check / repair for the materialised `correspondent_stats` table.
*
* Calls POST /api/config/recon-correspondents on the target deployment.
* The endpoint is BIP-322-gated (Publisher-only); pre-signed auth headers
* must be provided via env so this script stays signing-agnostic.
*
* Required env:
* BASE_URL — e.g. https://aibtc.news (or staging URL)
* BTC_ADDRESS — Publisher BTC address
* BTC_SIGNATURE — BIP-322 signature for the signed challenge
* BTC_TIMESTAMP — Unix seconds (NOT ISO); same value baked into the challenge
* message "POST /api/config/recon-correspondents:{timestamp}"
*
* Optional flags:
* --repair — recompute drifted rows in place (default: report only)
*
* Usage:
* BASE_URL=https://aibtc.news \
* BTC_ADDRESS=bc1q... \
* BTC_SIGNATURE=... \
* BTC_TIMESTAMP=$(date -u +%s) \
* npm run recon:correspondents -- --repair
*/

const REPAIR = process.argv.includes("--repair");

const baseUrl = process.env.BASE_URL;
const btcAddress = process.env.BTC_ADDRESS;
const btcSignature = process.env.BTC_SIGNATURE;
const btcTimestamp = process.env.BTC_TIMESTAMP;

if (!baseUrl || !btcAddress || !btcSignature || !btcTimestamp) {
  console.error(
    "Missing required env: BASE_URL, BTC_ADDRESS, BTC_SIGNATURE, BTC_TIMESTAMP"
  );
  process.exit(2);
}

const url = `${baseUrl.replace(/\/$/, "")}/api/config/recon-correspondents`;

async function main() {
  const res = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-BTC-Address": btcAddress!,
      "X-BTC-Signature": btcSignature!,
      "X-BTC-Timestamp": btcTimestamp!,
    },
    body: JSON.stringify({ btc_address: btcAddress, repair: REPAIR }),
  });

  const json = (await res.json()) as {
    ok?: boolean;
    error?: string;
    data?: {
      expected_rows: number;
      actual_rows: number;
      drift_count: number;
      affected_addresses: number;
      drift: Array<{ btc_address: string; field: string; expected: unknown; actual: unknown }>;
      repaired: number;
    };
  };

  if (!res.ok || !json.ok || !json.data) {
    console.error(`Recon failed (${res.status}): ${JSON.stringify(json)}`);
    process.exit(1);
  }

  const {
    expected_rows,
    actual_rows,
    drift_count,
    affected_addresses,
    drift,
    repaired,
  } = json.data;
  console.log(`expected_rows: ${expected_rows}`);
  console.log(`actual_rows: ${actual_rows}`);
  console.log(`drift_count: ${drift_count} (field-level)`);
  console.log(`affected_addresses: ${affected_addresses}`);
  console.log(`repaired: ${repaired}`);

  if (drift_count > 0) {
    console.log("\nDrift entries:");
    for (const d of drift) {
      console.log(
        ` ${d.btc_address.slice(0, 12)}… ${d.field}: expected=${JSON.stringify(d.expected)} actual=${JSON.stringify(d.actual)}`
      );
    }
  }

  // Compare repaired (per-address) to affected_addresses (per-address);
  // drift_count is field-level and counts each mismatched column separately,
  // so a single address with multiple drifted fields would falsely fail
  // a `repaired === drift_count` check.
  process.exit(
    drift_count === 0
      ? 0
      : REPAIR && repaired === affected_addresses
        ? 0
        : 3
  );
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
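
For reference, the exit-code decision at the end of `main()` can be restated as a small pure function, which makes the per-address versus field-level distinction easy to check in isolation. `reconExitCode` is a hypothetical name used only for this sketch; it is not part of the script.

```typescript
/**
 * Mirrors the script's exit logic: 0 when there is no drift, or when a
 * repair run fixed every affected address; 3 otherwise.
 */
function reconExitCode(
  driftCount: number,        // field-level mismatches
  affectedAddresses: number, // distinct addresses with any mismatch
  repaired: number,          // addresses recomputed by the endpoint
  repair: boolean,           // whether --repair was passed
): 0 | 3 {
  if (driftCount === 0) return 0;
  return repair && repaired === affectedAddresses ? 0 : 3;
}

// One address drifted on two fields and was repaired: success, even though
// repaired (1) differs from driftCount (2).
console.log(reconExitCode(2, 1, 1, true));
```

Together with the script's other paths, this gives callers four codes to handle: 0 (clean or fully repaired), 1 (request failed), 2 (missing env), and 3 (drift remains).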