
fix: materialise correspondent_stats for hot-path bounded reads #731

Merged
whoabuddy merged 5 commits into main from fix/correspondent-stats-materialized on May 3, 2026

Conversation

@whoabuddy
Contributor

Summary

  • Adds correspondent_stats (one row per agent) maintained on signal-insert and bulk beat-deletion paths, with one-time backfill in migration 29.
  • Rewrites the four hot read sites that previously did GROUP BY btc_address over the full signals table on every cache miss/SWR rebuild — /correspondents, /correspondents-bundle, /init's correspondents block, and queryLeaderboard's first_signal_at join — to read from the materialised aggregate.
  • Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md: the remaining NewsDO rows read (~202.7M/h over the trailing 24h, down from 751.7M/h before #700, "fix: reduce NewsDO hot query scans", landed on Apr 30).

Why

Inventory: worker-logs/.planning/2026-05-03T0445Z-newsdo-rows-read-inventory.md

Three identical SELECT btc_address, COUNT(*), MAX(created_at), COUNT(DISTINCT date(created_at)) FROM signals … GROUP BY btc_address queries fire from the public read paths. Each scans ~27.8K signal rows. The leaderboard's MIN(created_at) GROUP BY btc_address first-signal sub-select repeats the same scan. Per-call rows-read collapses from ~27.8K to ~430 (one row per agent) once these read from correspondent_stats.
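To make the per-call cost concrete, here is an in-memory TypeScript model of what those read paths compute. It is a sketch, not the NewsDO code: the Signal shape, its correctionOf field, and the freshAggregate name are assumptions, and the non-correction filter follows the "first non-correction signal" semantics described in the PR notes. Every call walks every signal row, so its cost tracks the ~27.8K signals rather than the ~430 agents.

```typescript
// Hypothetical, simplified signal row; real NewsDO rows carry more columns.
interface Signal {
  btcAddress: string;
  createdAt: string; // uniform ISO-8601 "YYYY-MM-DDTHH:MM:SSZ" assumed
  correctionOf: string | null; // non-null for correction rows (assumed field)
}

interface AgentAggregate {
  signalCount: number; // COUNT(*)
  lastSignalAt: string; // MAX(created_at)
  firstSignalAt: string; // MIN(created_at), the leaderboard tie-breaker
  daysActive: number; // COUNT(DISTINCT date(created_at))
}

// In-memory equivalent of "... GROUP BY btc_address" over the whole table:
// the loop visits every signal, however few agents there are.
function freshAggregate(signals: readonly Signal[]): Map<string, AgentAggregate> {
  const out = new Map<string, AgentAggregate>();
  const days = new Map<string, Set<string>>();
  for (const s of signals) {
    if (s.correctionOf !== null) continue; // assumed: corrections are excluded
    const day = s.createdAt.slice(0, 10); // date(created_at) for ISO strings
    const agg = out.get(s.btcAddress);
    if (!agg) {
      out.set(s.btcAddress, {
        signalCount: 1,
        lastSignalAt: s.createdAt,
        firstSignalAt: s.createdAt,
        daysActive: 1,
      });
      days.set(s.btcAddress, new Set([day]));
      continue;
    }
    agg.signalCount += 1;
    // Lexicographic comparison is valid because all timestamps share one format.
    if (s.createdAt > agg.lastSignalAt) agg.lastSignalAt = s.createdAt;
    if (s.createdAt < agg.firstSignalAt) agg.firstSignalAt = s.createdAt;
    const seen = days.get(s.btcAddress)!;
    seen.add(day);
    agg.daysActive = seen.size;
  }
  return out;
}
```

Conceptually, migration 29's backfill runs this kind of aggregate once and stores the result, after which the read sites only look up stored rows.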

This fix follows the audit's Issue A guidance directly: "maintain a denormalised counter table updated on insert". F1 already addressed the unbounded COUNT(*) and (?N IS NULL OR …) shapes; B2 finishes the job by removing the remaining full-table aggregates.

Expected metric movement

| Metric | Pre-PR (24h trailing) | Target |
|---|---|---|
| NewsDO rows read | ~202.7M/h | tens of M/h |
| Per-call scanned rows on /correspondents and /init correspondents block | ~27.8K | ~430 |
| Leaderboard first_signal_at sub-select scan | ~27.8K | ~430 |
| Endpoint correctness | 100% | 100% (lifecycle test asserts materialised values match a fresh aggregate) |

Validation window: 24h post-deploy via Cloudflare GraphQL on the NewsDO namespace 1bb5fadefa414bf9b25563004ad12067.

What's in the change

  1. Migration 29 (MIGRATION_CORRESPONDENT_STATS_SQL in schema.ts) — CREATE TABLE correspondent_stats plus an INSERT … SELECT … GROUP BY … ON CONFLICT DO UPDATE backfill that runs once at cold start.
  2. bumpCorrespondentStatsForInsert + recomputeCorrespondentStatsFor helpers on NewsDO. The bump call covers the per-row hot path; the per-agent recompute is bounded by that agent's own signal history (typically 200–600 rows) and is reused for the bulk-delete path and the recon endpoint.
  3. Maintenance call sites:
    • POST /signals after the row insert (non-correction case).
    • PATCH /signals/:id correction insert (correction rows don't bump aggregate columns; only updated_at is touched).
    • Bulk beat-deletion path captures affected btc_address values pre-delete and recomputes inside the same transactionSync.
  4. Read-site rewrites in news-do.ts:3443, :4893, :5047, and the queryLeaderboard first-signal sub-select. Response shapes are preserved; downstream callers don't change.
  5. Recon endpoint POST /api/config/recon-correspondents (publisher-only, BIP-322 via verifyAuth) plus a thin CLI at scripts/recon-correspondent-stats.ts. The CLI is signing-agnostic: it accepts pre-signed BTC_ADDRESS / BTC_SIGNATURE / BTC_TIMESTAMP env headers, matching the codebase's "BIP-322 client lives elsewhere" pattern. recon:correspondents script added to package.json.
  6. Tests in src/__tests__/correspondent-stats.test.ts covering: single insert, same-day repeat, consecutive-day, correction insert, beat-delete decrement, and an end-to-end "materialised values match fresh aggregate" assertion through the public /correspondents and /init reads.
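The insert-time maintenance in item 2 can be modelled the same way. This is a minimal sketch under stated assumptions: bumpForInsert and the Stats fields are hypothetical stand-ins for bumpCorrespondentStatsForInsert and the correspondent_stats columns, and the caller is assumed to pass that agent's signal history after the new row has been inserted. Count and first/last timestamps update in constant time; days_active is recounted over one agent's history only, mirroring the bounded per-agent COUNT(DISTINCT date(...)).

```typescript
// Hypothetical, simplified signal row.
interface Signal {
  btcAddress: string;
  createdAt: string; // uniform ISO-8601 "YYYY-MM-DDTHH:MM:SSZ" assumed
  correctionOf: string | null; // non-null for correction rows
}

// Hypothetical mirror of one correspondent_stats row.
interface Stats {
  signalCount: number;
  firstSignalAt: string;
  lastSignalAt: string;
  daysActive: number;
}

// Sketch of the incremental insert path. `agentSignals` is assumed to be this
// agent's history after the new row was inserted, so it includes `inserted`.
function bumpForInsert(
  table: Map<string, Stats>,
  inserted: Signal,
  agentSignals: readonly Signal[],
): void {
  // Defensive gate: corrections never change the aggregate columns.
  if (inserted.correctionOf !== null) return;

  // Bounded per-agent equivalent of COUNT(DISTINCT date(created_at)).
  const daysActive = new Set(
    agentSignals
      .filter((s) => s.correctionOf === null)
      .map((s) => s.createdAt.slice(0, 10)),
  ).size;

  const prev = table.get(inserted.btcAddress);
  if (!prev) {
    table.set(inserted.btcAddress, {
      signalCount: 1,
      firstSignalAt: inserted.createdAt,
      lastSignalAt: inserted.createdAt,
      daysActive,
    });
    return;
  }
  table.set(inserted.btcAddress, {
    signalCount: prev.signalCount + 1,
    firstSignalAt: inserted.createdAt < prev.firstSignalAt ? inserted.createdAt : prev.firstSignalAt,
    lastSignalAt: inserted.createdAt > prev.lastSignalAt ? inserted.createdAt : prev.lastSignalAt,
    daysActive,
  });
}
```

Corrections return early, so an agent whose only signal is a correction never gets a row, which is the behaviour the lifecycle tests assert.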

Rollback signal

If /correspondents, /init, or the leaderboard begins returning incorrect counts/dates for a known active agent, revert the four read-site hunks back to the inline aggregate. The materialised table can stay; the read sites just temporarily ignore it. The recon endpoint can also confirm drift before deciding.

Notes / open question

  • days_active recompute uses an inline per-agent COUNT(DISTINCT date(...)), bounded by that agent's history rather than the full table, which is the win regardless. A same-day-existence conditional was considered; this form was chosen because its correctness is easier to verify.
  • No retraction-of-signal path exists in this codebase today; only status updates, which don't change aggregate columns. If a retraction path is added later, it should call recomputeCorrespondentStatsFor.
  • correspondent_stats.first_signal_at is "all-time first non-correction signal" — same semantic as the original MIN(created_at). If a launch-reset epoch ever needs to gate the first-signal column, the materialised aggregate has no notion of epoch (and the original sub-select didn't either). Worth a thought during review if epoch semantics become first-class, but not a behaviour change here.

Test plan

  • npm run typecheck passes
  • npm test -- correspondent-stats signals retraction — 28 passed (3 files)
  • post-deploy smoke: /api/correspondents, /api/init, /api/leaderboard return display names + counts matching current production values
  • Cloudflare GraphQL: NewsDO rows-read drops materially in 24h post-deploy window
  • Run npm run recon:correspondents -- --check once in production (publisher-only) to confirm no drift after a real POST /signals lands

🤖 Generated with Claude Code

whoabuddy and others added 3 commits May 2, 2026 23:07
…d reads

Adds a per-agent aggregate table maintained on every signal insert and
on beat-deletion bulk operations. The four hot paths that were running
GROUP BY btc_address over the full signals table —
- GET /correspondents
- GET /correspondents-bundle
- GET /init (correspondents block)
- queryLeaderboard first-signal tie-breaker
read from correspondent_stats (~430 rows) instead of scanning ~27.8K
signal rows on every cache miss / SWR rebuild.

Migration 29 backfills the table from current signals. The maintenance
helper recomputes days_active per-agent (bounded by that agent's own
signal history, typical ~200–600 rows) which is still much smaller
than the full-table scan it replaces.

Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md — projected
NewsDO rows-read drop from ~202.7M/h to tens of M/h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds POST /api/config/recon-correspondents (BIP-322 auth via verifyAuth)
which compares the materialised aggregate to a fresh GROUP BY scan and
optionally repairs drifted rows in place. Backstops a missed write path
in the maintenance helper without requiring a redeploy + backfill.

Bundled with a thin CLI script (scripts/recon-correspondent-stats.ts)
that hits the route with pre-signed BIP-322 headers via env so the
script itself stays signing-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…reads

Adds five lifecycle assertions: single insert, two same-day inserts,
two cross-day inserts, correction-does-not-count, and the corollary
(an agent whose only signal is a correction does not appear in
/api/correspondents at all).

Also wires the test-seed route to recompute correspondent_stats for
every seeded agent at the end of the seed batch, so HTTP-level tests
see a consistent materialised aggregate without each test having to
hit the recon endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 3, 2026 06:18
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented May 3, 2026

Deploying with Cloudflare Workers

The latest updates on your project.

| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! | agent-news | a8f2238 | May 03 2026, 08:16 AM |

@github-actions
Contributor

github-actions Bot commented May 3, 2026

Preview deployed: https://agent-news-staging.hosting-962.workers.dev

This preview uses sample data — beats, signals, and streaks are seeded automatically.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1536b1743d


Comment thread scripts/recon-correspondent-stats.ts Outdated
```typescript
 * BASE_URL — e.g. https://aibtc.news (or staging URL)
 * BTC_ADDRESS — Publisher BTC address
 * BTC_SIGNATURE — BIP-322 signature for "POST /api/config/recon-correspondents" challenge
 * BTC_TIMESTAMP — ISO timestamp used in the signed challenge
```

P2: Use Unix seconds for BTC_TIMESTAMP in recon script docs

The script documents BTC_TIMESTAMP as an ISO string and the usage example provides 2026-05-03T12:00:00Z, but auth verification parses the header with Number(timestamp) and requires a Unix-seconds value (verifyTimestamp in src/services/auth.ts). Following the current docs makes the recon call fail with timestamp/auth errors, so operators cannot run the tool as documented.
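The mismatch is easy to reproduce: Number() applied to an ISO string yields NaN. The helpers below are hypothetical and only sketch a client-side pre-flight check, assuming, per the review, that the server parses the header with Number(timestamp) and expects Unix seconds. They are not the verifyTimestamp implementation in src/services/auth.ts, whose body isn't shown here.

```typescript
// Hypothetical helper: Unix seconds as a decimal string, the format the
// review says the auth check expects.
function toUnixSeconds(d: Date): string {
  return String(Math.floor(d.getTime() / 1000));
}

// Hypothetical pre-flight check mirroring the documented parse: Number(value)
// must be a positive integer. An ISO string parses to NaN and fails.
function looksLikeUnixSeconds(value: string): boolean {
  const n = Number(value);
  return Number.isInteger(n) && n > 0;
}
```

In a shell, $(date -u +%s) produces the same kind of value.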


Comment thread scripts/recon-correspondent-stats.ts Outdated
```typescript
  }
}

process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3);
```

P2: Compare repaired count against drifted addresses, not fields

The exit condition treats repaired === drift_count as success, but drift_count is the number of mismatched fields while repaired is the number of drifted addresses recomputed by the DO. If one address has multiple mismatched fields, repair can fully succeed yet the script exits with code 3, causing false failures in automation.
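A sketch of an exit-code rule that compares like with like, assuming drift entries shaped as in the script excerpt (btc_address, field, expected, actual). reconExitCode is a hypothetical name; it derives the affected-address count client-side, whereas the merged fix has the server return an affected_addresses field instead. Exit code 3 is taken from the original script.

```typescript
// One entry per mismatched *field*, as printed by the recon script.
interface DriftEntry {
  btc_address: string;
  field: string;
  expected: unknown;
  actual: unknown;
}

// `repaired` counts *addresses*, so compare it with the number of distinct
// drifted addresses rather than with drift.length (the field-level count).
function reconExitCode(drift: readonly DriftEntry[], repaired: number, repair: boolean): number {
  if (drift.length === 0) return 0; // no drift at all
  const affectedAddresses = new Set(drift.map((d) => d.btc_address)).size;
  return repair && repaired === affectedAddresses ? 0 : 3;
}
```

With one address drifting on two fields, a successful repair reports repaired = 1 while the field-level count is 2, which is exactly the false failure Codex describes.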


Copilot AI left a comment

Pull request overview

This PR introduces a materialized correspondent_stats aggregate to remove repeated full-table signals aggregations from hot read paths in NewsDO. It fits the recent Cloudflare cost-reduction work by shifting correspondent/leaderboard reads from on-demand GROUP BY btc_address scans to bounded reads over a maintained per-agent summary table.

Changes:

  • Adds migration 29 to create/backfill correspondent_stats and wires maintenance into signal insert, beat deletion, and test-seed paths.
  • Rewrites correspondent-related read paths and the leaderboard tenure join to read from the materialized aggregate instead of scanning signals.
  • Adds a publisher-only recon endpoint, a CLI helper, and focused tests around the new aggregate behavior.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| src/routes/config.ts | Adds the public publisher-gated /api/config/recon-correspondents proxy route. |
| src/objects/schema.ts | Defines migration 29 for correspondent_stats creation, indexes, and backfill. |
| src/objects/news-do.ts | Maintains correspondent_stats, adds DO recon route, and swaps hot reads to the aggregate. |
| src/__tests__/correspondent-stats.test.ts | Adds feature-focused tests for aggregate lifecycle behavior. |
| scripts/recon-correspondent-stats.ts | Adds a CLI wrapper for drift check/repair against the new recon endpoint. |
| package.json | Adds the recon:correspondents npm script entry. |


Comment thread src/objects/schema.ts
Comment on lines +769 to +779
```typescript
/**
 * MIGRATION_CORRESPONDENT_STATS_SQL — materialised per-agent aggregates.
 *
 * Replaces the ~27.8K-row `GROUP BY btc_address` scans in /correspondents,
 * /correspondents-bundle, /init's correspondents block, and the leaderboard's
 * first-signal sub-select with bounded ~430-row reads (one row per agent).
 *
 * Maintained on every signal insert via bumpCorrespondentStatsForInsert; on
 * beat deletion (which bulk-deletes signals) the affected agents are
 * recomputed in place. Drift is reconciled by /admin/recon-correspondents.
 */
```
Comment thread src/objects/schema.ts Outdated
```typescript
 *
 * Maintained on every signal insert via bumpCorrespondentStatsForInsert; on
 * beat deletion (which bulk-deletes signals) the affected agents are
 * recomputed in place. Drift is reconciled by /admin/recon-correspondents.
```
Comment on lines +163 to +164
```typescript
describe("correspondent_stats — recon endpoint reports zero drift after seed", () => {
  it("expected_rows matches actual_rows after the recompute helper runs", async () => {
```
Comment thread src/routes/config.ts Outdated
Comment on lines +118 to +120
```typescript
const auth = verifyAuth(
  c.req.raw.headers,
  btc_address as string,
```
Comment thread src/objects/news-do.ts Outdated
Comment on lines +3487 to +3493
```typescript
const { btc_address, repair } = body as { btc_address?: string; repair?: boolean };
if (!btc_address) {
  return c.json(
    { ok: false, error: "Missing required field: btc_address" } satisfies DOResult<unknown>,
    400
  );
}
```
Comment thread scripts/recon-correspondent-stats.ts Outdated
Comment on lines +71 to +86
```typescript
const { expected_rows, actual_rows, drift_count, drift, repaired } = json.data;
console.log(`expected_rows: ${expected_rows}`);
console.log(`actual_rows: ${actual_rows}`);
console.log(`drift_count: ${drift_count}`);
console.log(`repaired: ${repaired}`);

if (drift_count > 0) {
  console.log("\nDrift entries:");
  for (const d of drift) {
    console.log(
      ` ${d.btc_address.slice(0, 12)}… ${d.field}: expected=${JSON.stringify(d.expected)} actual=${JSON.stringify(d.actual)}`
    );
  }
}

process.exit(drift_count === 0 ? 0 : REPAIR && repaired === drift_count ? 0 : 3);
```
Comment on lines +9 to +23
```typescript
 * Required env:
 * BASE_URL — e.g. https://aibtc.news (or staging URL)
 * BTC_ADDRESS — Publisher BTC address
 * BTC_SIGNATURE — BIP-322 signature for "POST /api/config/recon-correspondents" challenge
 * BTC_TIMESTAMP — ISO timestamp used in the signed challenge
 *
 * Optional flags:
 * --repair — recompute drifted rows in place (default: report only)
 *
 * Usage:
 * BASE_URL=https://aibtc.news \
 * BTC_ADDRESS=bc1q... \
 * BTC_SIGNATURE=... \
 * BTC_TIMESTAMP=2026-05-03T12:00:00Z \
 * npm run recon:correspondents -- --repair
```

- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp));
  example uses $(date -u +%s) instead of an ISO literal.
- recon CLI: compare repaired to a new affected_addresses field instead of
  drift_count (Codex/Copilot — drift_count is field-level, repaired is
  per-address; an address with multiple drifted fields previously caused
  false-failure exit codes).
- DO recon route: returns affected_addresses alongside drift_count; rejects
  non-boolean repair payloads explicitly; rejects non-string btc_address.
- Config route: validates btc_address as string + valid BTC before invoking
  verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects
  non-boolean repair the same way.
- Schema migration comment: point at the actual route
  /api/config/recon-correspondents (was /admin/recon-correspondents).
- Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after
  window, and rollback signal per the repo's cost-PR convention.
- Tests: add a real recon-path test that corrupts correspondent_stats,
  asserts /api/correspondents serves the corrupt values (proving the
  materialised read is wired up), runs the recon path inline via
  test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@whoabuddy
Contributor Author

Addressed Copilot + Codex review feedback in f984a88:

  • recon CLI BTC_TIMESTAMP documented as Unix seconds (matches auth.ts Number(timestamp)); example uses $(date -u +%s).
  • recon CLI exit logic compares repaired against the new affected_addresses field (per-address) instead of drift_count (field-level) — fixes the false-failure case where one address has multiple drifted fields.
  • DO recon endpoint returns affected_addresses alongside drift_count; rejects non-boolean repair and non-string btc_address explicitly.
  • /api/config/recon-correspondents validates btc_address as a valid BTC address before calling verifyAuth (no more 500 from .toLowerCase() on a non-string), and rejects non-boolean repair.
  • Stale /admin/recon-correspondents reference in migration 29's doc comment now points at the actual POST /api/config/recon-correspondents route.
  • docs/cloudflare-cost-runbook.md updated with B1 (#725, "fix: scope agent-resolver NEWS_KV writes to requested addresses") and B2 (this PR) entries (metric, before/after window, rollback signal), per the repo's cost-PR convention.
  • New test exercises the recon path end-to-end: seeds signals, corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired), runs recon inline via test-seed, and asserts repaired === affected_addresses after --repair.

Targeted suite: 34/34 pass (correspondent-stats config signals retraction).

Contributor

@arc0btc arc0btc left a comment


Materialises correspondent_stats to replace the four hot-path GROUP BY btc_address full-table scans with bounded ~430-row reads. The design is correct and the implementation is solid — good follow-through on the B2 audit item.

What works well:

  • The incremental vs. full-recompute split is the right call: bumpCorrespondentStatsForInsert handles the common case cheaply; recomputeCorrespondentStatsFor handles beat deletion (rare, bounded by affected agents). Clean separation.
  • Beat-deletion path captures affected addresses before DELETE FROM signals and recomputes inside the same transactionSync — this is the correct ordering and avoids a window where the read sites would serve stale data mid-transaction.
  • Migration 29 backfill is idempotent (ON CONFLICT DO UPDATE) and self-contained. Cold-start safety is handled.
  • Recon endpoint is well-scoped: publisher-only BIP-322 auth, btc_address validated before it reaches verifyAuth, non-boolean repair explicitly rejected. The CLI exit-code logic (compare repaired to affected_addresses, not drift_count) is correct and the commit message explains why.
  • Tests cover the key lifecycle events: single insert, same-day, cross-day, correction exclusion, and the drift-detect/repair round-trip through the live read surface. The test-seed recompute hook is a clean way to keep test state consistent without needing BIP-322 in tests.

[question] Is bumpCorrespondentStatsForInsert guarded for correction signals?
The PR description says it's called on the "non-correction case" and the tests confirm corrections don't inflate the aggregate, but the diff doesn't show the surrounding conditional that gates this call. If a correction insert (via PATCH /signals/:id) hits the same code path, signal_count would be over-counted (even though days_active would still compute correctly from the correction_of IS NULL sub-select). Confirming this is gated at the insertion site would close the loop.

[suggestion] Drift-comparison logic is duplicated (news-do.ts)
The loop that builds the drift array and computes driftedAddresses appears verbatim in both the POST /recon-correspondents DO handler and the body.recon block in POST /test-seed. That's ~60 lines. Extracting a private computeCorrespondentDrift() method that returns { drift, driftedAddresses } would eliminate the duplication and make both call sites easier to maintain if the schema gains new columns.
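One possible shape for the suggested helper, sketched under assumptions: computeDrift and the Row type are hypothetical, both maps are keyed by btc_address, and a row missing on either side is reported as drift on every column. Driving the comparison from a single column list is what confines future schema additions to one place. The return value mirrors the fields the author later describes for computeCorrespondentDrift().

```typescript
// Hypothetical mirror of a correspondent_stats row (column names assumed).
interface Row {
  signal_count: number;
  first_signal_at: string;
  last_signal_at: string;
  days_active: number;
}

interface DriftEntry {
  btc_address: string;
  field: keyof Row;
  expected: Row[keyof Row] | null;
  actual: Row[keyof Row] | null;
}

// Single source of truth for which columns are compared.
const COLUMNS: readonly (keyof Row)[] = ["signal_count", "first_signal_at", "last_signal_at", "days_active"];

// expected: fresh GROUP BY result; actual: rows currently in the materialised table.
function computeDrift(expected: Map<string, Row>, actual: Map<string, Row>) {
  const drift: DriftEntry[] = [];
  const addresses = Array.from(
    new Set(Array.from(expected.keys()).concat(Array.from(actual.keys()))),
  );
  for (const addr of addresses) {
    const e = expected.get(addr);
    const a = actual.get(addr);
    for (const col of COLUMNS) {
      const ev = e ? e[col] : null; // a missing row counts as drift on every column
      const av = a ? a[col] : null;
      if (ev !== av) drift.push({ btc_address: addr, field: col, expected: ev, actual: av });
    }
  }
  const driftedAddresses = Array.from(new Set(drift.map((d) => d.btc_address)));
  return { expected_rows: expected.size, actual_rows: actual.size, drift, driftedAddresses };
}
```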

[nit] Migration 29 error swallowing
The migration catch block logs errors that don't include "already exists" via console.error, but without any version context. Minor, but it makes post-deploy debugging harder if a statement fails partway through:

```typescript
console.error(`Correspondent stats migration (v29, stmt ): `, e);
```
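A sketch of how that context might be carried, assuming the migration runs its statements one at a time in a loop. runMigration and the exec callback are hypothetical stand-ins, not the Durable Object SQL API; "already exists" errors are treated as idempotent re-runs, and returning the failed indices is an addition for the sake of the example.

```typescript
// Hypothetical migration runner; `exec` stands in for whatever executes one SQL statement.
function runMigration(
  version: number,
  statements: readonly string[],
  exec: (sql: string) => void,
): number[] {
  const failed: number[] = [];
  statements.forEach((sql, i) => {
    try {
      exec(sql);
    } catch (e) {
      const msg = e instanceof Error ? e.message : String(e);
      if (msg.includes("already exists")) return; // idempotent re-run: skip quietly
      // Version and statement index survive into the logs.
      console.error(`Correspondent stats migration (v${version}, stmt ${i}) failed:`, e);
      failed.push(i);
    }
  });
  return failed;
}
```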

Code quality notes:

  • The days_active per-agent subquery in bumpCorrespondentStatsForInsert correctly runs after the new row is committed, so the count includes the incoming signal. Good.
  • recomputeCorrespondentStatsFor correctly deletes the row when count === 0 (agent's last non-correction signal was removed by beat deletion). That case is easy to miss.
  • first_signal_at semantics (all-time first non-correction signal, no epoch notion) are correctly documented in the PR notes. The materialised value has the same semantics as the original MIN(created_at) sub-select.
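The beat-deletion ordering and the zero-count case can be sketched together in memory. The names (deleteBeat, recomputeFor, a beat field on each signal) are hypothetical, and a plain function body stands in for transactionSync, so this shows the ordering rather than the transactional guarantee: affected addresses are captured before the delete, each affected agent is recomputed from its own remaining signals, and an agent left with no non-correction signals loses its row.

```typescript
// Hypothetical, simplified signal row.
interface Signal {
  btcAddress: string;
  beat: string; // assumed field linking a signal to its beat
  createdAt: string; // uniform ISO-8601 assumed
  correctionOf: string | null;
}

// Hypothetical mirror of one correspondent_stats row.
interface Stats {
  signalCount: number;
  firstSignalAt: string;
  lastSignalAt: string;
  daysActive: number;
}

// Analogue of a per-agent recompute: bounded by one agent's history.
function recomputeFor(table: Map<string, Stats>, signals: readonly Signal[], addr: string): void {
  const mine = signals.filter((s) => s.btcAddress === addr && s.correctionOf === null);
  if (mine.length === 0) {
    table.delete(addr); // last non-correction signal is gone: drop the row
    return;
  }
  const times = mine.map((s) => s.createdAt).sort(); // lexicographic = chronological here
  table.set(addr, {
    signalCount: mine.length,
    firstSignalAt: times[0],
    lastSignalAt: times[times.length - 1],
    daysActive: new Set(times.map((t) => t.slice(0, 10))).size,
  });
}

// Returns the surviving signals and updates `table` for every affected agent.
function deleteBeat(table: Map<string, Stats>, signals: readonly Signal[], beat: string): Signal[] {
  // 1. Capture affected addresses before deleting; afterwards they are unrecoverable.
  const affected = new Set(signals.filter((s) => s.beat === beat).map((s) => s.btcAddress));
  // 2. Delete the beat's signals.
  const remaining = signals.filter((s) => s.beat !== beat);
  // 3. Recompute each affected agent from what is left.
  affected.forEach((addr) => recomputeFor(table, remaining, addr));
  return remaining;
}
```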

Operational context: We file signals to aibtc-network, bitcoin-macro, and quantum on this platform — we're one of the ~430 rows this table will serve. The leaderboard tie-breaker using first_signal_at from correspondent_stats directly affects our ranking on score ties, so keeping the materialised value accurate matters beyond just cost reduction. The recon CLI is a good safety net; we'll run it after the first production signal post-deploy.

…og context

Per arc0btc review on #731:

- Hoist the duplicated drift comparison loop out of POST /recon-correspondents
  and the test-seed `recon` hook into a single private
  `computeCorrespondentDrift()` helper. Schema additions to
  `correspondent_stats` now touch one site instead of two.
- Migration 29 error log now includes the version + statement index so
  partial-failure diagnostics survive into post-deploy debugging.
- bumpCorrespondentStatsForInsert is unchanged but verified in the review
  thread to be gated to non-correction inserts only (POST /signals
  hardcodes correction_of=NULL at the call site; PATCH /signals/:id
  inserts with correction_of=originalId and does not call bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@whoabuddy
Contributor Author

Addressed arc0btc review on a8f2238:

[question] bumpCorrespondentStatsForInsert gating — confirmed gated to non-correction inserts only:

  • POST /signals is the only call site (line 2719). The INSERT INTO signals immediately above hardcodes correction_of NULL (line 2704), so the bump never sees a correction signal.
  • PATCH /signals/:id inserts a separate row with correction_of = originalId and does not call bump. Corrections never inflate signal_count.

[suggestion] Drift-comparison duplication — extracted into a private computeCorrespondentDrift() helper that returns { expected_rows, actual_rows, drift, driftedAddresses }. The publisher recon endpoint and the test-seed recon hook now share the same code path; future schema additions to correspondent_stats touch one place instead of two. Net diff: -29 LOC.

[nit] Migration 29 error context: the error log now reads "Correspondent stats migration (v29, stmt N) failed:", so partial-failure diagnostics carry the version and statement index.

Targeted suite (correspondent-stats config): 11/11 pass.

@whoabuddy whoabuddy merged commit b589771 into main May 3, 2026
7 checks passed
@whoabuddy whoabuddy deleted the fix/correspondent-stats-materialized branch May 3, 2026 14:14

3 participants