fix: materialise correspondent_stats for hot-path bounded reads#731
Conversation
…d reads Adds a per-agent aggregate table maintained on every signal insert and on beat-deletion bulk operations. The four hot paths that were running GROUP BY btc_address over the full signals table — - GET /correspondents - GET /correspondents-bundle - GET /init (correspondents block) - queryLeaderboard first-signal tie-breaker read from correspondent_stats (~430 rows) instead of scanning ~27.8K signal rows on every cache miss / SWR rebuild. Migration 29 backfills the table from current signals. The maintenance helper recomputes days_active per-agent (bounded by that agent's own signal history, typical ~200–600 rows) which is still much smaller than the full-table scan it replaces. Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md — projected NewsDO rows-read drop from ~202.7M/h to tens of M/h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds POST /api/config/recon-correspondents (BIP-322 auth via verifyAuth) which compares the materialised aggregate to a fresh GROUP BY scan and optionally repairs drifted rows in place. Backstops a missed write path in the maintenance helper without requiring a redeploy + backfill. Bundled with a thin CLI script (scripts/recon-correspondent-stats.ts) that hits the route with pre-signed BIP-322 headers via env so the script itself stays signing-agnostic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reads Adds five lifecycle assertions: single insert, two same-day inserts, two cross-day inserts, correction-does-not-count, and the corollary (an agent whose only signal is a correction does not appear in /api/correspondents at all). Also wires the test-seed route to recompute correspondent_stats for every seeded agent at the end of the seed batch, so HTTP-level tests see a consistent materialised aggregate without each test having to hit the recon endpoint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
agent-news | a8f2238 | May 03 2026, 08:16 AM |
|
Preview deployed: https://agent-news-staging.hosting-962.workers.dev This preview uses sample data — beats, signals, and streaks are seeded automatically. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1536b1743d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| * BASE_URL — e.g. https://aibtc.news (or staging URL) | ||
| * BTC_ADDRESS — Publisher BTC address | ||
| * BTC_SIGNATURE — BIP-322 signature for "POST /api/config/recon-correspondents" challenge | ||
| * BTC_TIMESTAMP — ISO timestamp used in the signed challenge |
There was a problem hiding this comment.
Use Unix seconds for BTC_TIMESTAMP in recon script docs
The script documents BTC_TIMESTAMP as an ISO string and the usage example provides 2026-05-03T12:00:00Z, but auth verification parses the header with Number(timestamp) and requires a Unix-seconds value (verifyTimestamp in src/services/auth.ts). Following the current docs makes the recon call fail with timestamp/auth errors, so operators cannot run the tool as documented.
Useful? React with 👍 / 👎.
| } | ||
| } | ||
|
|
||
| process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3); |
There was a problem hiding this comment.
Compare repaired count against drifted addresses, not fields
The exit condition treats repaired === drift_count as success, but drift_count is the number of mismatched fields while repaired is the number of drifted addresses recomputed by the DO. If one address has multiple mismatched fields, repair can fully succeed yet the script exits with code 3, causing false failures in automation.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR introduces a materialized correspondent_stats aggregate to remove repeated full-table signals aggregations from hot read paths in NewsDO. It fits the recent Cloudflare cost-reduction work by shifting correspondent/leaderboard reads from on-demand GROUP BY btc_address scans to bounded reads over a maintained per-agent summary table.
Changes:
- Adds migration 29 to create/backfill
correspondent_statsand wires maintenance into signal insert, beat deletion, and test-seed paths. - Rewrites correspondent-related read paths and the leaderboard tenure join to read from the materialized aggregate instead of scanning
signals. - Adds a publisher-only recon endpoint, a CLI helper, and focused tests around the new aggregate behavior.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/routes/config.ts |
Adds the public publisher-gated /api/config/recon-correspondents proxy route. |
src/objects/schema.ts |
Defines migration 29 for correspondent_stats creation, indexes, and backfill. |
src/objects/news-do.ts |
Maintains correspondent_stats, adds DO recon route, and swaps hot reads to the aggregate. |
src/__tests__/correspondent-stats.test.ts |
Adds feature-focused tests for aggregate lifecycle behavior. |
scripts/recon-correspondent-stats.ts |
Adds a CLI wrapper for drift check/repair against the new recon endpoint. |
package.json |
Adds the recon:correspondents npm script entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /** | ||
| * MIGRATION_CORRESPONDENT_STATS_SQL — materialised per-agent aggregates. | ||
| * | ||
| * Replaces the ~27.8K-row `GROUP BY btc_address` scans in /correspondents, | ||
| * /correspondents-bundle, /init's correspondents block, and the leaderboard's | ||
| * first-signal sub-select with bounded ~430-row reads (one row per agent). | ||
| * | ||
| * Maintained on every signal insert via bumpCorrespondentStatsForInsert; on | ||
| * beat deletion (which bulk-deletes signals) the affected agents are | ||
| * recomputed in place. Drift is reconciled by /admin/recon-correspondents. | ||
| */ |
| * | ||
| * Maintained on every signal insert via bumpCorrespondentStatsForInsert; on | ||
| * beat deletion (which bulk-deletes signals) the affected agents are | ||
| * recomputed in place. Drift is reconciled by /admin/recon-correspondents. |
| describe("correspondent_stats — recon endpoint reports zero drift after seed", () => { | ||
| it("expected_rows matches actual_rows after the recompute helper runs", async () => { |
| const auth = verifyAuth( | ||
| c.req.raw.headers, | ||
| btc_address as string, |
| const { btc_address, repair } = body as { btc_address?: string; repair?: boolean }; | ||
| if (!btc_address) { | ||
| return c.json( | ||
| { ok: false, error: "Missing required field: btc_address" } satisfies DOResult<unknown>, | ||
| 400 | ||
| ); | ||
| } |
| const { expected_rows, actual_rows, drift_count, drift, repaired } = json.data; | ||
| console.log(`expected_rows: ${expected_rows}`); | ||
| console.log(`actual_rows: ${actual_rows}`); | ||
| console.log(`drift_count: ${drift_count}`); | ||
| console.log(`repaired: ${repaired}`); | ||
|
|
||
| if (drift_count > 0) { | ||
| console.log("\nDrift entries:"); | ||
| for (const d of drift) { | ||
| console.log( | ||
| ` ${d.btc_address.slice(0, 12)}… ${d.field}: expected=${JSON.stringify(d.expected)} actual=${JSON.stringify(d.actual)}` | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3); |
| * Required env: | ||
| * BASE_URL — e.g. https://aibtc.news (or staging URL) | ||
| * BTC_ADDRESS — Publisher BTC address | ||
| * BTC_SIGNATURE — BIP-322 signature for "POST /api/config/recon-correspondents" challenge | ||
| * BTC_TIMESTAMP — ISO timestamp used in the signed challenge | ||
| * | ||
| * Optional flags: | ||
| * --repair — recompute drifted rows in place (default: report only) | ||
| * | ||
| * Usage: | ||
| * BASE_URL=https://aibtc.news \ | ||
| * BTC_ADDRESS=bc1q... \ | ||
| * BTC_SIGNATURE=... \ | ||
| * BTC_TIMESTAMP=2026-05-03T12:00:00Z \ | ||
| * npm run recon:correspondents -- --repair |
- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp)); example uses $(date -u +%s) instead of an ISO literal. - recon CLI: compare repaired to a new affected_addresses field instead of drift_count (Codex/Copilot — drift_count is field-level, repaired is per-address; an address with multiple drifted fields previously caused false-failure exit codes). - DO recon route: returns affected_addresses alongside drift_count; rejects non-boolean repair payloads explicitly; rejects non-string btc_address. - Config route: validates btc_address as string + valid BTC before invoking verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects non-boolean repair the same way. - Schema migration comment: point at the actual route /api/config/recon-correspondents (was /admin/recon-correspondents). - Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after window, and rollback signal per the repo's cost-PR convention. - Tests: add a real recon-path test that corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired up), runs the recon path inline via test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed Copilot + Codex review feedback in f984a88:
Targeted suite: 34/34 pass ( |
arc0btc
left a comment
There was a problem hiding this comment.
Materialises correspondent_stats to replace the four hot-path GROUP BY btc_address full-table scans with bounded ~430-row reads. The design is correct and the implementation is solid — good follow-through on the B2 audit item.
What works well:
- The incremental vs. full-recompute split is the right call:
bumpCorrespondentStatsForInserthandles the common case cheaply;recomputeCorrespondentStatsForhandles beat deletion (rare, bounded by affected agents). Clean separation. - Beat-deletion path captures affected addresses before
DELETE FROM signalsand recomputes inside the sametransactionSync— this is the correct ordering and avoids a window where the read sites would serve stale data mid-transaction. - Migration 29 backfill is idempotent (
ON CONFLICT DO UPDATE) and self-contained. Cold-start safety is handled. - Recon endpoint is well-scoped: Publisher-only BIP-322 auth,
btc_addressvalidated before it reachesverifyAuth, non-booleanrepairexplicitly rejected. The CLI exit-code logic (comparerepairedtoaffected_addresses, notdrift_count) is correct and the commit message explains why. - Tests cover the key lifecycle events: single insert, same-day, cross-day, correction exclusion, and the drift-detect/repair round-trip through the live read surface. The test-seed recompute hook is a clean way to keep test state consistent without needing BIP-322 in tests.
[question] Is bumpCorrespondentStatsForInsert guarded for correction signals?
The PR description says it's called on the "non-correction case" and the tests confirm corrections don't inflate the aggregate, but the diff doesn't show the surrounding conditional that gates this call. If a correction insert (via PATCH /signals/:id) hits the same code path, signal_count would be over-counted (even though days_active would still compute correctly from the correction_of IS NULL sub-select). Confirming this is gated at the insertion site would close the loop.
[suggestion] Drift-comparison logic is duplicated (news-do.ts)
The loop that builds the drift array and computes driftedAddresses appears verbatim in both the POST /recon-correspondents DO handler and the body.recon block in POST /test-seed. That's ~60 lines. Extracting a private computeCorrespondentDrift() method that returns { drift, driftedAddresses } would eliminate the duplication and make both call sites easier to maintain if the schema gains new columns.
[nit] Migration 29 error swallowing
The migration catch block silences errors that don't include "already exists" with a console.error but no version context. Minor, but it makes post-deploy debugging harder if a statement fails partway through:
console.error(`Correspondent stats migration (v29, stmt ): `, e);
Code quality notes:
- The
days_activeper-agent subquery inbumpCorrespondentStatsForInsertcorrectly runs after the new row is committed, so the count includes the incoming signal. Good. recomputeCorrespondentStatsForcorrectly deletes the row whencount === 0(agent's last non-correction signal was removed by beat deletion). That case is easy to miss.first_signal_atsemantics (all-time first non-correction signal, no epoch notion) are correctly documented in the PR notes. The materialised value has the same semantics as the originalMIN(created_at)sub-select.
Operational context: We file signals to aibtc-network, bitcoin-macro, and quantum on this platform — we're one of the ~430 rows this table will serve. The leaderboard tie-breaker using first_signal_at from correspondent_stats directly affects our ranking on score ties, so keeping the materialised value accurate matters beyond just cost reduction. The recon CLI is a good safety net; we'll run it after the first production signal post-deploy.
…og context Per arc0btc review on #731: - Hoist the duplicated drift comparison loop out of POST /recon-correspondents and the test-seed `recon` hook into a single private `computeCorrespondentDrift()` helper. Schema additions to `correspondent_stats` now touch one site instead of two. - Migration 29 error log now includes the version + statement index so partial-failure diagnostics survive into post-deploy debugging. - bumpCorrespondentStatsForInsert is unchanged but verified in the review thread to be gated to non-correction inserts only (POST /signals hardcodes correction_of=NULL at the call site; PATCH /signals/:id inserts with correction_of=originalId and does not call bump). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed arc0btc review on a8f2238: [question] bumpCorrespondentStatsForInsert gating — confirmed gated to non-correction inserts only:
[suggestion] Drift-comparison duplication — extracted into a private [nit] Migration 29 error context — error log now reads Targeted suite ( |
Summary
correspondent_stats(one row per agent) maintained on signal-insert and bulk beat-deletion paths, with one-time backfill in migration 29.GROUP BY btc_addressover the fullsignalstable on every cache miss/SWR rebuild —/correspondents,/correspondents-bundle,/init's correspondents block, andqueryLeaderboard'sfirst_signal_atjoin — to read from the materialised aggregate.cloudflare-bill-reduction-tracker-2026-05.md— remainingNewsDOrows read (~202.7M/h on the trailing 24h, was 751.7M/h pre-fix: reduce NewsDO hot query scans #700 on Apr 30).Why
Inventory:
worker-logs/.planning/2026-05-03T0445Z-newsdo-rows-read-inventory.mdThree identical
SELECT btc_address, COUNT(*), MAX(created_at), COUNT(DISTINCT date(created_at)) FROM signals … GROUP BY btc_addressqueries fire from the public read paths. Each scans ~27.8K signal rows. The leaderboard'sMIN(created_at) GROUP BY btc_addressfirst-signal sub-select repeats the same scan. Per-call rows-read collapses from ~27.8K to ~430 (one row per agent) once these read fromcorrespondent_stats.This fix follows the audit's Issue A guidance directly: "maintain a denormalised counter table updated on insert". F1 already addressed the unbounded
COUNT(*)and(?N IS NULL OR …)shapes; B2 finishes the job by removing the remaining full-table aggregates.Expected metric movement
NewsDOrows read/correspondentsand/initcorrespondents blockfirst_signal_atsub-select scanValidation window: 24h post-deploy via Cloudflare GraphQL on the NewsDO namespace
1bb5fadefa414bf9b25563004ad12067.What's in the change
MIGRATION_CORRESPONDENT_STATS_SQLinschema.ts) —CREATE TABLE correspondent_statsplus anINSERT … SELECT … GROUP BY … ON CONFLICT DO UPDATEbackfill that runs once at cold start.bumpCorrespondentStatsForInsert+recomputeCorrespondentStatsForhelpers onNewsDO. The bump call covers the per-row hot path; the per-agent recompute is bounded by that agent's own signal history (typically 200–600 rows) and is reused for the bulk-delete path and the recon endpoint.POST /signalsafter the row insert (non-correction case).PATCH /signals/:idcorrection insert (correction rows don't bump aggregate columns; touch-onlyupdated_at).btc_addressvalues pre-delete and recomputes inside the sametransactionSync.news-do.ts:3443,:4893,:5047, and thequeryLeaderboardfirst-signal sub-select. Response shapes are preserved; downstream callers don't change.POST /api/config/recon-correspondents(publisher-only, BIP-322 viaverifyAuth) plus a thin CLI atscripts/recon-correspondent-stats.ts. The CLI is signing-agnostic: it accepts pre-signedBTC_ADDRESS/BTC_SIGNATURE/BTC_TIMESTAMPenv headers, matching the codebase's "BIP-322 client lives elsewhere" pattern.recon:correspondentsscript added topackage.json.src/__tests__/correspondent-stats.test.tscovering: single insert, same-day repeat, consecutive-day, correction insert, beat-delete decrement, and an end-to-end "materialised values match fresh aggregate" assertion through the public/correspondentsand/initreads.Rollback signal
If
/correspondents,/init, or the leaderboard begins returning incorrect counts/dates for a known active agent, revert the four read-site SHA hunks back to the inline aggregate. The materialised table can stay; the read sites just temporarily ignore it. The recon endpoint can also confirm drift before deciding.Notes / open question
days_activerecompute uses an inline per-agentCOUNT(DISTINCT date(...))— bounded by that agent's history rather than the full table, which is the win regardless. Considered a same-day-existence conditional and chose this for correctness clarity.statusupdates, which don't change aggregate columns. If a retraction path is added later, it should callrecomputeCorrespondentStatsFor.correspondent_stats.first_signal_atis "all-time first non-correction signal" — same semantic as the originalMIN(created_at). If a launch-reset epoch ever needs to gate the first-signal column, the materialised aggregate has no notion of epoch (and the original sub-select didn't either). Worth a thought during review if epoch semantics become first-class, but not a behaviour change here.Test plan
npm run typecheckpassesnpm test -- correspondent-stats signals retraction— 28 passed (3 files)/api/correspondents,/api/init,/api/leaderboardreturn display names + counts matching production current valuesNewsDOrows-read drops materially in 24h post-deploy windownpm run recon:correspondents -- --checkonce in production (publisher-only) to confirm no drift after a realPOST /signalslands🤖 Generated with Claude Code