fix: materialise correspondent_stats for hot-path bounded reads by whoabuddy · Pull Request #731 · aibtcdev/agent-news

whoabuddy · 2026-05-03T06:18:18Z

Summary

Adds correspondent_stats (one row per agent) maintained on signal-insert and bulk beat-deletion paths, with one-time backfill in migration 29.
Rewrites the four hot read sites that previously did GROUP BY btc_address over the full signals table on every cache miss/SWR rebuild — /correspondents, /correspondents-bundle, /init's correspondents block, and queryLeaderboard's first_signal_at join — to read from the materialised aggregate.
Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md: the remaining NewsDO rows read, ~202.7M/h over the trailing 24h (down from 751.7M/h before #700, "fix: reduce NewsDO hot query scans", landed on Apr 30).

Why

Inventory: worker-logs/.planning/2026-05-03T0445Z-newsdo-rows-read-inventory.md

Three identical SELECT btc_address, COUNT(*), MAX(created_at), COUNT(DISTINCT date(created_at)) FROM signals … GROUP BY btc_address queries fire from the public read paths. Each scans ~27.8K signal rows. The leaderboard's MIN(created_at) GROUP BY btc_address first-signal sub-select repeats the same scan. Per-call rows-read collapses from ~27.8K to ~430 (one row per agent) once these read from correspondent_stats.
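For reference, the per-agent values those GROUP BY queries produce can be modelled in memory. This is a minimal TypeScript sketch, not NewsDO code. The SignalRow shape, the correction_of filter (taken from the first_signal_at semantics in the notes), and using the ISO date prefix as a stand-in for SQLite's date(created_at) are all assumptions. It makes the cost visible: one pass over every signal row, which is what each cache miss paid before this change.

```typescript
// Hypothetical row shape: only the columns the aggregate reads.
interface SignalRow {
  btc_address: string;
  created_at: string; // ISO-8601 UTC; string order equals time order for this format
  correction_of: string | null;
}

interface AgentAggregate {
  signal_count: number; // COUNT(*)
  first_signal_at: string; // MIN(created_at)
  last_signal_at: string; // MAX(created_at)
  days_active: number; // COUNT(DISTINCT date(created_at))
}

// Stand-in for SQLite date(created_at): the YYYY-MM-DD prefix of an ISO UTC string.
function utcDay(iso: string): string {
  return iso.slice(0, 10);
}

// One pass over every signal row: the full-table work a cache miss used to pay.
function aggregateAll(rows: SignalRow[]): Map<string, AgentAggregate> {
  const out = new Map<string, AgentAggregate>();
  const days = new Map<string, Set<string>>();
  for (const r of rows) {
    if (r.correction_of !== null) continue; // assumed: corrections are not counted
    const cur = out.get(r.btc_address);
    if (cur === undefined) {
      out.set(r.btc_address, {
        signal_count: 1,
        first_signal_at: r.created_at,
        last_signal_at: r.created_at,
        days_active: 0, // filled in after the pass
      });
      days.set(r.btc_address, new Set([utcDay(r.created_at)]));
      continue;
    }
    cur.signal_count += 1;
    if (r.created_at < cur.first_signal_at) cur.first_signal_at = r.created_at;
    if (r.created_at > cur.last_signal_at) cur.last_signal_at = r.created_at;
    days.get(r.btc_address)!.add(utcDay(r.created_at));
  }
  out.forEach((agg, addr) => {
    agg.days_active = days.get(addr)!.size;
  });
  return out;
}
```

An agent whose only row is a correction produces no aggregate at all, which matches the corollary asserted in the lifecycle tests.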

This fix follows the audit's Issue A guidance directly: "maintain a denormalised counter table updated on insert". F1 already addressed the unbounded COUNT(*) and (?N IS NULL OR …) shapes; B2 finishes the job by removing the remaining full-table aggregates.

Expected metric movement

Metric	Pre-PR (24h trailing)	Target
`NewsDO` rows read	~202.7M/h	tens of M/h
Per-call scanned rows on `/correspondents` and `/init` correspondents block	~27.8K	~430
Leaderboard `first_signal_at` sub-select scan	~27.8K	~430
Endpoint correctness	100%	100% (lifecycle test asserts materialised values match a fresh aggregate)

Validation window: 24h post-deploy via Cloudflare GraphQL on the NewsDO namespace 1bb5fadefa414bf9b25563004ad12067.

What's in the change

Migration 29 (MIGRATION_CORRESPONDENT_STATS_SQL in schema.ts) — CREATE TABLE correspondent_stats plus an INSERT … SELECT … GROUP BY … ON CONFLICT DO UPDATE backfill that runs once at cold start.
bumpCorrespondentStatsForInsert + recomputeCorrespondentStatsFor helpers on NewsDO. The bump call covers the per-row hot path; the per-agent recompute is bounded by that agent's own signal history (typically 200–600 rows) and is reused for the bulk-delete path and the recon endpoint.
Maintenance call sites:
- POST /signals after the row insert (non-correction case).
- PATCH /signals/:id correction insert (correction rows don't bump the aggregate columns; only updated_at is touched).
- Bulk beat-deletion path captures affected btc_address values pre-delete and recomputes inside the same transactionSync.
Read-site rewrites in news-do.ts:3443, :4893, :5047, and the queryLeaderboard first-signal sub-select. Response shapes are preserved; downstream callers don't change.
Recon endpoint POST /api/config/recon-correspondents (publisher-only, BIP-322 via verifyAuth) plus a thin CLI at scripts/recon-correspondent-stats.ts. The CLI is signing-agnostic: it accepts pre-signed BTC_ADDRESS / BTC_SIGNATURE / BTC_TIMESTAMP env headers, matching the codebase's "BIP-322 client lives elsewhere" pattern. recon:correspondents script added to package.json.
Tests in src/__tests__/correspondent-stats.test.ts covering: single insert, same-day repeat, consecutive-day, correction insert, beat-delete decrement, and an end-to-end "materialised values match fresh aggregate" assertion through the public /correspondents and /init reads.
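Migration 29's create-plus-backfill can be pictured as two SQLite statements. The sketch below writes them as TypeScript string constants in the style of schema.ts. The column names (signal_count, first_signal_at, last_signal_at, days_active, updated_at), the correction_of IS NULL filter, and datetime('now') for updated_at are assumptions; the real MIGRATION_CORRESPONDENT_STATS_SQL, including its indexes, is not reproduced here. The ON CONFLICT(btc_address) DO UPDATE clause is what lets the backfill re-run safely.

```typescript
// Hypothetical sketch; column names are assumed, not copied from schema.ts.
// Each entry is executed as one statement by the migration runner.
const CORRESPONDENT_STATS_MIGRATION_SKETCH: string[] = [
  `CREATE TABLE IF NOT EXISTS correspondent_stats (
     btc_address     TEXT PRIMARY KEY,
     signal_count    INTEGER NOT NULL,
     first_signal_at TEXT NOT NULL,
     last_signal_at  TEXT NOT NULL,
     days_active     INTEGER NOT NULL,
     updated_at      TEXT NOT NULL
   )`,
  // One-time backfill. Having a WHERE clause also avoids SQLite's documented
  // parsing ambiguity between INSERT ... SELECT and the upsert ON CONFLICT clause.
  `INSERT INTO correspondent_stats
     (btc_address, signal_count, first_signal_at, last_signal_at, days_active, updated_at)
   SELECT btc_address, COUNT(*), MIN(created_at), MAX(created_at),
          COUNT(DISTINCT date(created_at)), datetime('now')
   FROM signals
   WHERE correction_of IS NULL
   GROUP BY btc_address
   ON CONFLICT(btc_address) DO UPDATE SET
     signal_count    = excluded.signal_count,
     first_signal_at = excluded.first_signal_at,
     last_signal_at  = excluded.last_signal_at,
     days_active     = excluded.days_active,
     updated_at      = excluded.updated_at`,
];
```

Because the backfill is a single INSERT ... SELECT ... GROUP BY, the full-table scan is paid once at cold start rather than on every cache miss.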

Rollback signal

If /correspondents, /init, or the leaderboard begins returning incorrect counts/dates for a known active agent, revert the four read-site SHA hunks back to the inline aggregate. The materialised table can stay; the read sites just temporarily ignore it. The recon endpoint can also confirm drift before deciding.

Notes / open question

The days_active recompute uses an inline per-agent COUNT(DISTINCT date(...)). It is bounded by that agent's history rather than the full table, which is the win either way. A same-day-existence conditional was considered; the recompute was chosen because its correctness is easier to verify.
No retraction-of-signal path exists in this codebase today; only status updates, which don't change aggregate columns. If a retraction path is added later, it should call recomputeCorrespondentStatsFor.
correspondent_stats.first_signal_at is "all-time first non-correction signal" — same semantic as the original MIN(created_at). If a launch-reset epoch ever needs to gate the first-signal column, the materialised aggregate has no notion of epoch (and the original sub-select didn't either). Worth a thought during review if epoch semantics become first-class, but not a behaviour change here.
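The two days_active strategies weighed in the first note can be compared side by side. In this hypothetical sketch, daysActiveRecompute mirrors the per-agent COUNT(DISTINCT date(...)) approach and daysActiveIncremental mirrors the same-day-existence conditional; taking the UTC day from the ISO date prefix is an assumption.

```typescript
// Assumed stand-in for SQLite date(): the YYYY-MM-DD prefix of an ISO UTC string.
const utcDay = (iso: string): string => iso.slice(0, 10);

// Approach taken: recompute from the agent's own rows.
function daysActiveRecompute(times: string[]): number {
  return new Set(times.map(utcDay)).size;
}

// Alternative considered: on insert, bump only when no earlier signal
// from the same agent falls on the new row's UTC day.
function daysActiveIncremental(prevDays: number, priorTimes: string[], newTime: string): number {
  const day = utcDay(newTime);
  return priorTimes.some((t) => utcDay(t) === day) ? prevDays : prevDays + 1;
}
```

Both give the same answer as rows arrive, even out of order. The difference appears on deletion: the incremental form would need a matching "was this the last signal on that day?" check, while the recompute covers inserts and deletes with one query.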

Test plan

npm run typecheck passes
npm test -- correspondent-stats signals retraction — 28 passed (3 files)
post-deploy smoke: /api/correspondents, /api/init, /api/leaderboard return display names + counts matching production current values
Cloudflare GraphQL: NewsDO rows-read drops materially in 24h post-deploy window
Run npm run recon:correspondents -- --check once in production (publisher-only) to confirm no drift after a real POST /signals lands

🤖 Generated with Claude Code

…d reads

Adds a per-agent aggregate table maintained on every signal insert and on beat-deletion bulk operations. The four hot paths that were running GROUP BY btc_address over the full signals table:

- GET /correspondents
- GET /correspondents-bundle
- GET /init (correspondents block)
- queryLeaderboard first-signal tie-breaker

now read from correspondent_stats (~430 rows) instead of scanning ~27.8K signal rows on every cache miss / SWR rebuild. Migration 29 backfills the table from current signals. The maintenance helper recomputes days_active per agent (bounded by that agent's own signal history, typically ~200–600 rows), which is still much smaller than the full-table scan it replaces.

Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md: projected NewsDO rows-read drop from ~202.7M/h to tens of M/h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds POST /api/config/recon-correspondents (BIP-322 auth via verifyAuth), which compares the materialised aggregate to a fresh GROUP BY scan and optionally repairs drifted rows in place. This backstops a missed write path in the maintenance helper without requiring a redeploy + backfill. Bundled with a thin CLI script (scripts/recon-correspondent-stats.ts) that hits the route with pre-signed BIP-322 headers via env, so the script itself stays signing-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…reads

Adds five lifecycle assertions: single insert, two same-day inserts, two cross-day inserts, correction-does-not-count, and the corollary (an agent whose only signal is a correction does not appear in /api/correspondents at all). Also wires the test-seed route to recompute correspondent_stats for every seeded agent at the end of the seed batch, so HTTP-level tests see a consistent materialised aggregate without each test having to hit the recon endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-05-03T06:18:25Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Updated (UTC)
✅ Deployment successful!	agent-news	`a8f2238`	May 03 2026, 08:16 AM

github-actions · 2026-05-03T06:18:53Z

Preview deployed: https://agent-news-staging.hosting-962.workers.dev

This preview uses sample data — beats, signals, and streaks are seeded automatically.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1536b1743d


chatgpt-codex-connector · 2026-05-03T06:24:41Z

+ *   BASE_URL              — e.g. https://aibtc.news (or staging URL)
+ *   BTC_ADDRESS           — Publisher BTC address
+ *   BTC_SIGNATURE         — BIP-322 signature for "POST /api/config/recon-correspondents" challenge
+ *   BTC_TIMESTAMP         — ISO timestamp used in the signed challenge


Use Unix seconds for BTC_TIMESTAMP in recon script docs

The script documents BTC_TIMESTAMP as an ISO string and the usage example provides 2026-05-03T12:00:00Z, but auth verification parses the header with Number(timestamp) and requires a Unix-seconds value (verifyTimestamp in src/services/auth.ts). Following the current docs makes the recon call fail with timestamp/auth errors, so operators cannot run the tool as documented.
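Why the ISO example fails, plus a hypothetical pre-flight check the CLI could run. Number() on an ISO string returns NaN, so that header value can never pass a numeric timestamp check. Converting the value client-side would not help, because BTC_TIMESTAMP is part of the signed challenge and the signature must cover the exact value sent; rejecting early is the safer option. checkUnixSeconds is an illustrative name, not part of the script.

```typescript
// Hypothetical pre-flight check for the recon CLI: the server reads the header
// with Number(timestamp), so anything that is not Unix seconds cannot pass.
function checkUnixSeconds(value: string): number {
  const n = Number(value); // NaN for ISO strings such as "2026-05-03T12:00:00Z"
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`BTC_TIMESTAMP must be Unix seconds (e.g. $(date -u +%s)), got: ${value}`);
  }
  return n;
}
```

For reference, the ISO literal in the original usage example, 2026-05-03T12:00:00Z, is 1777809600 in Unix seconds.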


chatgpt-codex-connector · 2026-05-03T06:24:41Z

+    }
+  }
+
+  process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3);


Compare repaired count against drifted addresses, not fields

The exit condition treats repaired === drift_count as success, but drift_count is the number of mismatched fields while repaired is the number of drifted addresses recomputed by the DO. If one address has multiple mismatched fields, repair can fully succeed yet the script exits with code 3, causing false failures in automation.
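A sketch of the corrected exit condition. DriftEntry mirrors the fields the script prints (btc_address, field, expected, actual). reconExitCode is a hypothetical helper that counts distinct drifted addresses client-side; the follow-up commit instead has the endpoint return an affected_addresses field. Either way the comparison is per address, not per field.

```typescript
// Shape of the drift entries the script prints: one entry per mismatched field.
interface DriftEntry {
  btc_address: string;
  field: string;
  expected: unknown;
  actual: unknown;
}

// Hypothetical: exit 0 when there is no drift, or when --repair recomputed every
// drifted address; 3 otherwise. `repaired` counts addresses, so compare it to the
// number of distinct addresses in `drift`, not to drift.length.
function reconExitCode(drift: DriftEntry[], repair: boolean, repaired: number): number {
  if (drift.length === 0) return 0;
  const affectedAddresses = new Set(drift.map((d) => d.btc_address)).size;
  return repair && repaired === affectedAddresses ? 0 : 3;
}
```

With one address drifting in two fields and a complete repair (repaired = 1), the original drift_count comparison checks 1 against 2 and exits 3; this version exits 0.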


Copilot

Pull request overview

This PR introduces a materialized correspondent_stats aggregate to remove repeated full-table signals aggregations from hot read paths in NewsDO. It fits the recent Cloudflare cost-reduction work by shifting correspondent/leaderboard reads from on-demand GROUP BY btc_address scans to bounded reads over a maintained per-agent summary table.

Changes:

Adds migration 29 to create/backfill correspondent_stats and wires maintenance into signal insert, beat deletion, and test-seed paths.
Rewrites correspondent-related read paths and the leaderboard tenure join to read from the materialized aggregate instead of scanning signals.
Adds a publisher-only recon endpoint, a CLI helper, and focused tests around the new aggregate behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.


File	Description
`src/routes/config.ts`	Adds the public publisher-gated `/api/config/recon-correspondents` proxy route.
`src/objects/schema.ts`	Defines migration 29 for `correspondent_stats` creation, indexes, and backfill.
`src/objects/news-do.ts`	Maintains `correspondent_stats`, adds DO recon route, and swaps hot reads to the aggregate.
`src/__tests__/correspondent-stats.test.ts`	Adds feature-focused tests for aggregate lifecycle behavior.
`scripts/recon-correspondent-stats.ts`	Adds a CLI wrapper for drift check/repair against the new recon endpoint.
`package.json`	Adds the `recon:correspondents` npm script entry.


+/**
+ * MIGRATION_CORRESPONDENT_STATS_SQL — materialised per-agent aggregates.
+ *
+ * Replaces the ~27.8K-row `GROUP BY btc_address` scans in /correspondents,
+ * /correspondents-bundle, /init's correspondents block, and the leaderboard's
+ * first-signal sub-select with bounded ~430-row reads (one row per agent).
+ *
+ * Maintained on every signal insert via bumpCorrespondentStatsForInsert; on
+ * beat deletion (which bulk-deletes signals) the affected agents are
+ * recomputed in place. Drift is reconciled by /admin/recon-correspondents.
+ */



+describe("correspondent_stats — recon endpoint reports zero drift after seed", () => {
+  it("expected_rows matches actual_rows after the recompute helper runs", async () => {


+  const auth = verifyAuth(
+    c.req.raw.headers,
+    btc_address as string,


+      const { btc_address, repair } = body as { btc_address?: string; repair?: boolean };
+      if (!btc_address) {
+        return c.json(
+          { ok: false, error: "Missing required field: btc_address" } satisfies DOResult<unknown>,
+          400
+        );
+      }


+  const { expected_rows, actual_rows, drift_count, drift, repaired } = json.data;
+  console.log(`expected_rows: ${expected_rows}`);
+  console.log(`actual_rows:   ${actual_rows}`);
+  console.log(`drift_count:   ${drift_count}`);
+  console.log(`repaired:      ${repaired}`);
+
+  if (drift_count > 0) {
+    console.log("\nDrift entries:");
+    for (const d of drift) {
+      console.log(
+        `  ${d.btc_address.slice(0, 12)}…  ${d.field}: expected=${JSON.stringify(d.expected)} actual=${JSON.stringify(d.actual)}`
+      );
+    }
+  }
+
+  process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3);


+ * Required env:
+ *   BASE_URL              — e.g. https://aibtc.news (or staging URL)
+ *   BTC_ADDRESS           — Publisher BTC address
+ *   BTC_SIGNATURE         — BIP-322 signature for "POST /api/config/recon-correspondents" challenge
+ *   BTC_TIMESTAMP         — ISO timestamp used in the signed challenge
+ *
+ * Optional flags:
+ *   --repair              — recompute drifted rows in place (default: report only)
+ *
+ * Usage:
+ *   BASE_URL=https://aibtc.news \
+ *   BTC_ADDRESS=bc1q... \
+ *   BTC_SIGNATURE=... \
+ *   BTC_TIMESTAMP=2026-05-03T12:00:00Z \
+ *   npm run recon:correspondents -- --repair


- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp)); example uses $(date -u +%s) instead of an ISO literal.
- recon CLI: compare repaired to a new affected_addresses field instead of drift_count (Codex/Copilot: drift_count is field-level, repaired is per-address; an address with multiple drifted fields previously caused false-failure exit codes).
- DO recon route: returns affected_addresses alongside drift_count; rejects non-boolean repair payloads explicitly; rejects non-string btc_address.
- Config route: validates btc_address as string + valid BTC before invoking verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects non-boolean repair the same way.
- Schema migration comment: point at the actual route /api/config/recon-correspondents (was /admin/recon-correspondents).
- Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after window, and rollback signal per the repo's cost-PR convention.
- Tests: add a real recon-path test that corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired up), runs the recon path inline via test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

whoabuddy · 2026-05-03T06:33:38Z

Addressed Copilot + Codex review feedback in f984a88:

recon CLI BTC_TIMESTAMP documented as Unix seconds (matches auth.ts Number(timestamp)); example uses $(date -u +%s).
recon CLI exit logic compares repaired against the new affected_addresses field (per-address) instead of drift_count (field-level) — fixes the false-failure case where one address has multiple drifted fields.
DO recon endpoint returns affected_addresses alongside drift_count; rejects non-boolean repair and non-string btc_address explicitly.
/api/config/recon-correspondents validates btc_address as a valid BTC address before calling verifyAuth (no more 500 from .toLowerCase() on a non-string), and rejects non-boolean repair.
Stale /admin/recon-correspondents reference in migration 29's doc comment now points at the actual POST /api/config/recon-correspondents route.
docs/cloudflare-cost-runbook.md updated with B1 (#725, fix: scope agent-resolver NEWS_KV writes to requested addresses) and B2 (this PR) entries, each with metric, before/after window, and rollback signal, per the repo's cost-PR convention.
New test exercises the recon path end-to-end: seeds signals, corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired), runs recon inline via test-seed, and asserts repaired === affected_addresses after --repair.

Targeted suite: 34/34 pass (correspondent-stats config signals retraction).

arc0btc

Materialises correspondent_stats to replace the four hot-path GROUP BY btc_address full-table scans with bounded ~430-row reads. The design is correct and the implementation is solid — good follow-through on the B2 audit item.

What works well:

The incremental vs. full-recompute split is the right call: bumpCorrespondentStatsForInsert handles the common case cheaply; recomputeCorrespondentStatsFor handles beat deletion (rare, bounded by affected agents). Clean separation.
Beat-deletion path captures affected addresses before DELETE FROM signals and recomputes inside the same transactionSync — this is the correct ordering and avoids a window where the read sites would serve stale data mid-transaction.
Migration 29 backfill is idempotent (ON CONFLICT DO UPDATE) and self-contained. Cold-start safety is handled.
Recon endpoint is well-scoped: Publisher-only BIP-322 auth, btc_address validated before it reaches verifyAuth, non-boolean repair explicitly rejected. The CLI exit-code logic (compare repaired to affected_addresses, not drift_count) is correct and the commit message explains why.
Tests cover the key lifecycle events: single insert, same-day, cross-day, correction exclusion, and the drift-detect/repair round-trip through the live read surface. The test-seed recompute hook is a clean way to keep test state consistent without needing BIP-322 in tests.
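The ordering called out above for the beat-deletion path (capture addresses, delete, recompute, all in one transaction) can be sketched against a minimal interface shaped like a SQLite-backed Durable Object's ctx.storage, which exposes sql.exec(...).toArray() and transactionSync. The StorageLike interface, the beat_slug column, and the injected recompute callback are assumptions for illustration. The addresses have to be read before the DELETE because afterwards no rows remain to find them by.

```typescript
// Minimal slice of a SQLite-backed Durable Object's ctx.storage, declared here
// so the sketch is self-contained (real types live in @cloudflare/workers-types).
interface StorageLike {
  sql: { exec(query: string, ...bindings: unknown[]): { toArray(): Record<string, unknown>[] } };
  transactionSync<T>(fn: () => T): T;
}

// Hypothetical: delete one beat's signals and repair correspondent_stats for every
// agent that had a signal on it. The beat_slug column name is assumed.
// Returns the number of agents touched.
function deleteBeatSignals(
  storage: StorageLike,
  beatSlug: string,
  recomputeFor: (btcAddress: string) => void,
): number {
  return storage.transactionSync(() => {
    // 1. Capture affected agents first: after the DELETE they can no longer be found.
    const affected = storage.sql
      .exec("SELECT DISTINCT btc_address FROM signals WHERE beat_slug = ?", beatSlug)
      .toArray()
      .map((row) => String(row.btc_address));
    // 2. Bulk delete.
    storage.sql.exec("DELETE FROM signals WHERE beat_slug = ?", beatSlug);
    // 3. Rebuild each affected row in the same transaction, so the delete and
    //    the aggregate repair commit (or roll back) together.
    for (const addr of affected) recomputeFor(addr);
    return affected.length;
  });
}
```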

[question] Is bumpCorrespondentStatsForInsert guarded for correction signals?
The PR description says it's called on the "non-correction case" and the tests confirm corrections don't inflate the aggregate, but the diff doesn't show the surrounding conditional that gates this call. If a correction insert (via PATCH /signals/:id) hits the same code path, signal_count would be over-counted (even though days_active would still compute correctly from the correction_of IS NULL sub-select). Confirming this is gated at the insertion site would close the loop.

[suggestion] Drift-comparison logic is duplicated (news-do.ts)
The loop that builds the drift array and computes driftedAddresses appears verbatim in both the POST /recon-correspondents DO handler and the body.recon block in POST /test-seed. That's ~60 lines. Extracting a private computeCorrespondentDrift() method that returns { drift, driftedAddresses } would eliminate the duplication and make both call sites easier to maintain if the schema gains new columns.
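A sketch of what such a shared helper could look like, returning the { expected_rows, actual_rows, drift, driftedAddresses } shape mentioned in the follow-up. The StatsRow fields, the field list, and treating a row missing on either side as drift in every field are assumptions. computeDrift is a hypothetical stand-in for computeCorrespondentDrift(), which reads its inputs from SQL rather than from maps.

```typescript
// Hypothetical column set; the real correspondent_stats row may carry more.
interface StatsRow {
  signal_count: number;
  first_signal_at: string;
  last_signal_at: string;
  days_active: number;
}

interface DriftEntry {
  btc_address: string;
  field: keyof StatsRow;
  expected: StatsRow[keyof StatsRow] | null; // null = row missing on that side
  actual: StatsRow[keyof StatsRow] | null;
}

const FIELDS: (keyof StatsRow)[] = ["signal_count", "first_signal_at", "last_signal_at", "days_active"];

// expected: fresh per-agent aggregate; actual: materialised rows.
function computeDrift(expected: Map<string, StatsRow>, actual: Map<string, StatsRow>) {
  const drift: DriftEntry[] = [];
  const addresses = Array.from(new Set(Array.from(expected.keys()).concat(Array.from(actual.keys()))));
  for (const addr of addresses) {
    const e = expected.get(addr);
    const a = actual.get(addr);
    for (const field of FIELDS) {
      const ev = e === undefined ? null : e[field];
      const av = a === undefined ? null : a[field];
      if (ev !== av) drift.push({ btc_address: addr, field, expected: ev, actual: av });
    }
  }
  return {
    expected_rows: expected.size,
    actual_rows: actual.size,
    drift,
    driftedAddresses: Array.from(new Set(drift.map((d) => d.btc_address))),
  };
}
```

Adding a column to correspondent_stats then means extending StatsRow and FIELDS in one place, which is the maintenance benefit the suggestion is after.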

[nit] Migration 29 error swallowing
The migration catch block logs errors that don't include "already exists" via console.error, but without any version context. Minor, but it makes post-deploy debugging harder if a statement fails partway through:

console.error(`Correspondent stats migration (v29, stmt ): `, e);
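A minimal sketch of a statement loop that produces the suggested log line. The Exec callback, the "already exists" message check, and continuing past a failed statement are assumptions drawn from the review's description, not the schema.ts runner.

```typescript
type Exec = (statement: string) => void;
type Logger = (message: string, error: unknown) => void;

// Hypothetical migration runner: executes each statement in order, ignores
// "already exists" errors (a re-run after a partial apply), and logs anything
// else with the migration version and statement index. Continuing past a
// failure is an assumption of this sketch. Returns the number of failures.
function runMigration(version: number, statements: string[], exec: Exec, log: Logger = console.error): number {
  let failures = 0;
  statements.forEach((statement, i) => {
    try {
      exec(statement);
    } catch (e) {
      if (e instanceof Error && e.message.includes("already exists")) return;
      failures += 1;
      log(`Correspondent stats migration (v${version}, stmt ${i}) failed:`, e);
    }
  });
  return failures;
}
```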

Code quality notes:

The days_active per-agent subquery in bumpCorrespondentStatsForInsert correctly runs after the new row is committed, so the count includes the incoming signal. Good.
recomputeCorrespondentStatsFor correctly deletes the row when count === 0 (agent's last non-correction signal was removed by beat deletion). That case is easy to miss.
first_signal_at semantics (all-time first non-correction signal, no epoch notion) are correctly documented in the PR notes. The materialised value has the same semantics as the original MIN(created_at) sub-select.
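The two behaviours noted above (the bump's per-agent days_active subquery seeing the new row, and the recompute deleting the row once the count reaches zero) can be modelled with an in-memory store. In this hypothetical TypeScript sketch, bumpStatsForInsert and recomputeStatsFor stand in for bumpCorrespondentStatsForInsert and recomputeCorrespondentStatsFor, the array filter stands in for an indexed per-agent lookup in SQLite, and the column names are assumed.

```typescript
// Hypothetical in-memory stand-ins for the signals table and correspondent_stats.
interface Signal {
  btc_address: string;
  created_at: string; // ISO-8601 UTC
  correction_of: string | null;
}

interface Stats {
  signal_count: number;
  first_signal_at: string;
  last_signal_at: string;
  days_active: number;
}

const utcDay = (iso: string): string => iso.slice(0, 10);

class StatsStore {
  signals: Signal[] = [];
  stats = new Map<string, Stats>();

  // One agent's counted rows. In SQLite this would be an indexed lookup on
  // btc_address (a few hundred rows); the array filter is only a stand-in.
  private agentRows(addr: string): Signal[] {
    return this.signals.filter((s) => s.btc_address === addr && s.correction_of === null);
  }

  // Hot path: runs after the new row is written, so days_active includes it.
  bumpStatsForInsert(addr: string, createdAt: string): void {
    const days = new Set(this.agentRows(addr).map((s) => utcDay(s.created_at))).size;
    const cur = this.stats.get(addr);
    if (cur === undefined) {
      this.stats.set(addr, { signal_count: 1, first_signal_at: createdAt, last_signal_at: createdAt, days_active: days });
      return;
    }
    cur.signal_count += 1;
    if (createdAt < cur.first_signal_at) cur.first_signal_at = createdAt;
    if (createdAt > cur.last_signal_at) cur.last_signal_at = createdAt;
    cur.days_active = days;
  }

  // Slow path (bulk delete, recon): rebuild one agent's row from its own history.
  recomputeStatsFor(addr: string): void {
    const times = this.agentRows(addr).map((s) => s.created_at).sort();
    if (times.length === 0) {
      this.stats.delete(addr); // no counted signals left, so drop the row
      return;
    }
    this.stats.set(addr, {
      signal_count: times.length,
      first_signal_at: times[0],
      last_signal_at: times[times.length - 1],
      days_active: new Set(times.map(utcDay)).size,
    });
  }

  insert(sig: Signal): void {
    this.signals.push(sig);
    // Corrections never bump the aggregate.
    if (sig.correction_of === null) this.bumpStatsForInsert(sig.btc_address, sig.created_at);
  }
}
```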

Operational context: We file signals to aibtc-network, bitcoin-macro, and quantum on this platform — we're one of the ~430 rows this table will serve. The leaderboard tie-breaker using first_signal_at from correspondent_stats directly affects our ranking on score ties, so keeping the materialised value accurate matters beyond just cost reduction. The recon CLI is a good safety net; we'll run it after the first production signal post-deploy.
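How first_signal_at can act as the leaderboard tie-breaker is easiest to see as a sort over score rows joined to the materialised values. The row shape, the ascending direction (earlier first signal wins a tie), and placing agents without a correspondent_stats row last are all assumptions; the PR only states that first_signal_at breaks ties.

```typescript
// Hypothetical leaderboard row; the real queryLeaderboard returns more fields.
interface LeaderRow {
  btc_address: string;
  score: number;
}

// Sort by score (desc), breaking ties by first_signal_at (asc: earlier wins, an
// assumed direction). firstSignalAt holds the materialised values, one per agent.
function rankLeaderboard(rows: LeaderRow[], firstSignalAt: Map<string, string>): LeaderRow[] {
  const LAST = "\uffff"; // sorts after any ISO timestamp: agents without a stats row go last
  return rows.slice().sort((a, b) => {
    if (a.score !== b.score) return b.score - a.score;
    const fa = firstSignalAt.get(a.btc_address) ?? LAST;
    const fb = firstSignalAt.get(b.btc_address) ?? LAST;
    return fa < fb ? -1 : fa > fb ? 1 : 0;
  });
}
```

Because the tie-break reads one value per agent from the ~430-row table, a drifted first_signal_at would reorder tied agents silently, which is why the recon check matters for ranking as well as cost.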

…og context

Per arc0btc review on #731:

- Hoist the duplicated drift comparison loop out of POST /recon-correspondents and the test-seed `recon` hook into a single private `computeCorrespondentDrift()` helper. Schema additions to `correspondent_stats` now touch one site instead of two.
- Migration 29 error log now includes the version + statement index so partial-failure diagnostics survive into post-deploy debugging.
- bumpCorrespondentStatsForInsert is unchanged but verified in the review thread to be gated to non-correction inserts only (POST /signals hardcodes correction_of=NULL at the call site; PATCH /signals/:id inserts with correction_of=originalId and does not call bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

whoabuddy · 2026-05-03T08:16:37Z

Addressed arc0btc review on a8f2238:

[question] bumpCorrespondentStatsForInsert gating — confirmed gated to non-correction inserts only:

POST /signals is the only call site (line 2719). The INSERT INTO signals immediately above hardcodes correction_of NULL (line 2704), so the bump never sees a correction signal.
PATCH /signals/:id inserts a separate row with correction_of = originalId and does not call bump. Corrections never inflate signal_count.

[suggestion] Drift-comparison duplication — extracted into a private computeCorrespondentDrift() helper that returns { expected_rows, actual_rows, drift, driftedAddresses }. The publisher recon endpoint and the test-seed recon hook now share the same code path; future schema additions to correspondent_stats touch one place instead of two. Net diff: -29 LOC.

[nit] Migration 29 error context — error log now reads Correspondent stats migration (v29, stmt N) failed: so partial-failure diagnostics carry the version + statement index.

Targeted suite (correspondent-stats config): 11/11 pass.

whoabuddy and others added 3 commits May 2, 2026 23:07

Copilot AI review requested due to automatic review settings May 3, 2026 06:18

Copilot started reviewing on behalf of whoabuddy May 3, 2026 06:18

chatgpt-codex-connector Bot reviewed May 3, 2026


Copilot AI reviewed May 3, 2026


arc0btc approved these changes May 3, 2026


whoabuddy merged commit b589771 into main May 3, 2026
7 checks passed

whoabuddy deleted the fix/correspondent-stats-materialized branch May 3, 2026 14:14

github-actions Bot mentioned this pull request May 3, 2026

chore(main): release agent-news 1.29.0 #692 (Open)

whoabuddy merged 5 commits into main from fix/correspondent-stats-materialized

3 participants
