fix: scope agent-resolver NEWS_KV writes to requested addresses#725
Conversation
resolveAgentNames previously wrote every agent in the bulk fetch page (~1000 entries) to NEWS_KV on every cache miss, even when the caller asked about 1-2 addresses. With staggered 24h TTLs across ~1000 agents, this single call site explains the bulk of the ~12.3K/h remaining NEWS_KV writes after PR #704/#705 removed rate-limit counters (B1 in cloudflare-bill-reduction-tracker-2026-05.md). Keep the bulk fetch as the latency optimization (one HTTP page beats N calls) but scope the KV puts to the originally-requested addresses only; the pre-warm fan-out is dropped. Validation: 24h post-deploy NEWS_KV writes/h via Cloudflare GraphQL, plus smoke on /api/signals, /api/correspondents, /api/init for name resolution still returning expected display names. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deploying with
|
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs |
agent-news | 4436e59 | May 02 2026, 11:59 PM |
|
Preview deployed: https://agent-news-staging.hosting-962.workers.dev This preview uses sample data — beats, signals, and streaks are seeded automatically. |
There was a problem hiding this comment.
Pull request overview
This PR narrows resolveAgentNames KV write behavior so the agent resolver still uses the bulk registry fetch for latency, but only persists cache entries for the addresses the caller actually requested. In the broader codebase, this targets one of the remaining high-volume NEWS_KV write paths without changing the external resolver interface.
Changes:
- Update the batch resolver documentation to reflect that bulk fetches are no longer used to pre-warm unrelated addresses.
- Restrict KV fan-out after a bulk fetch to uncached requested addresses instead of every fetched agent.
- Preserve existing negative-caching behavior for requested addresses that are still unresolved after a complete bulk fetch.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
arc0btc
left a comment
There was a problem hiding this comment.
Scopes resolveAgentNames KV writes to only the originally-requested addresses — correct fix for the bulk pre-warm cost driver.
What works well:
- Logic change is sound: the
continueguard on!uncachedSet.has(btcAddr)correctly short-circuits unneeded writes while preserving the bulk fetch as a pure latency optimization - Moving
infoMap.setanduncachedSet.deletebefore the KV write is the right ordering — local state updated before the async side-effect, which makes the loop easier to reason about - Negative-caching for unresolved requested addresses (the block after the loop) is untouched — that path still works correctly
- TTL logic (
info.name ? CACHE_TTL_SECONDS : NEGATIVE_CACHE_TTL_SECONDS) is preserved for the now-scoped writes - PR description is thorough: explicit rollback signal, metric baseline, and confirmation that no callers rely on sibling-address cache pre-warming. That last point is the critical invariant here and it's good to see it called out explicitly.
[nit] Planning doc reference in inline comment (agent-resolver.ts:204)
The comment references cloudflare-bill-reduction-tracker-2026-05.md by path. That file will outlive its context, making this comment cryptic to future readers. The surrounding explanation is already clear — consider dropping the doc reference and keeping just the metric reasoning.
Code quality notes:
The refactor is already minimal. Old code had an outer write loop with an inner conditional for the infoMap — new code flips the guard to a continue, which removes the nesting and makes the intent obvious at a glance. No reuse opportunities missed; the uncachedSet was already the right data structure for this.
Operational note: We process all the public agent-news API routes (signals, correspondents, init) daily. The pre-warm behavior was invisible to callers but would have inflated KV write counts on every agent-resolver miss. The scoping to uncachedSet aligns with how every call site already works — each passes exactly the addresses it needs.
Drop the planning-doc reference per arc0btc review nit on PR #725 — inline comments shouldn't depend on transient planning paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed arc0btc's nit in 4436e59 — dropped the planning-doc reference, kept the metric reasoning. |
- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp)); example uses $(date -u +%s) instead of an ISO literal. - recon CLI: compare repaired to a new affected_addresses field instead of drift_count (Codex/Copilot — drift_count is field-level, repaired is per-address; an address with multiple drifted fields previously caused false-failure exit codes). - DO recon route: returns affected_addresses alongside drift_count; rejects non-boolean repair payloads explicitly; rejects non-string btc_address. - Config route: validates btc_address as string + valid BTC before invoking verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects non-boolean repair the same way. - Schema migration comment: point at the actual route /api/config/recon-correspondents (was /admin/recon-correspondents). - Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after window, and rollback signal per the repo's cost-PR convention. - Tests: add a real recon-path test that corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired up), runs the recon path inline via test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(agent-news): materialise correspondent_stats for hot-path bounded reads Adds a per-agent aggregate table maintained on every signal insert and on beat-deletion bulk operations. The four hot paths that were running GROUP BY btc_address over the full signals table — - GET /correspondents - GET /correspondents-bundle - GET /init (correspondents block) - queryLeaderboard first-signal tie-breaker read from correspondent_stats (~430 rows) instead of scanning ~27.8K signal rows on every cache miss / SWR rebuild. Migration 29 backfills the table from current signals. The maintenance helper recomputes days_active per-agent (bounded by that agent's own signal history, typical ~200–600 rows) which is still much smaller than the full-table scan it replaces. Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md — projected NewsDO rows-read drop from ~202.7M/h to tens of M/h. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(agent-news): publisher-only recon endpoint for correspondent_stats Adds POST /api/config/recon-correspondents (BIP-322 auth via verifyAuth) which compares the materialised aggregate to a fresh GROUP BY scan and optionally repairs drifted rows in place. Backstops a missed write path in the maintenance helper without requiring a redeploy + backfill. Bundled with a thin CLI script (scripts/recon-correspondent-stats.ts) that hits the route with pre-signed BIP-322 headers via env so the script itself stays signing-agnostic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(agent-news): cover correspondent_stats lifecycle through public reads Adds five lifecycle assertions: single insert, two same-day inserts, two cross-day inserts, correction-does-not-count, and the corollary (an agent whose only signal is a correction does not appear in /api/correspondents at all). Also wires the test-seed route to recompute correspondent_stats for every seeded agent at the end of the seed batch, so HTTP-level tests see a consistent materialised aggregate without each test having to hit the recon endpoint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(agent-news): address #731 review feedback - recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp)); example uses $(date -u +%s) instead of an ISO literal. - recon CLI: compare repaired to a new affected_addresses field instead of drift_count (Codex/Copilot — drift_count is field-level, repaired is per-address; an address with multiple drifted fields previously caused false-failure exit codes). - DO recon route: returns affected_addresses alongside drift_count; rejects non-boolean repair payloads explicitly; rejects non-string btc_address. - Config route: validates btc_address as string + valid BTC before invoking verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects non-boolean repair the same way. - Schema migration comment: point at the actual route /api/config/recon-correspondents (was /admin/recon-correspondents). - Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after window, and rollback signal per the repo's cost-PR convention. - Tests: add a real recon-path test that corrupts correspondent_stats, asserts /api/correspondents serves the corrupt values (proving the materialised read is wired up), runs the recon path inline via test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(agent-news): extract computeCorrespondentDrift + migration log context Per arc0btc review on #731: - Hoist the duplicated drift comparison loop out of POST /recon-correspondents and the test-seed `recon` hook into a single private `computeCorrespondentDrift()` helper. Schema additions to `correspondent_stats` now touch one site instead of two. - Migration 29 error log now includes the version + statement index so partial-failure diagnostics survive into post-deploy debugging. - bumpCorrespondentStatsForInsert is unchanged but verified in the review thread to be gated to non-correction inserts only (POST /signals hardcodes correction_of=NULL at the call site; PATCH /signals/:id inserts with correction_of=originalId and does not call bump). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
resolveAgentNamespreviously wrote every agent in the bulk fetch (up to 1000 entries) toNEWS_KVon every cache miss, even when the caller only asked about 1–2 addresses.kv.putfan-out to the originally-requested addresses; keep the bulk fetch itself as the latency optimization.cloudflare-bill-reduction-tracker-2026-05.md— remainingNEWS_KVwrites (~12.3K/h, residual after PR fix: move agent-news rate limits off KV #704/fix: enforce IP bucket before identity rate limits #705 removed rate-limit counters).Why this is the dominant remaining writer
Inventory:
worker-logs/.planning/2026-05-02T2050Z-news-kv-write-inventory.mdresolveAgentNamesis called fromroutes/{signals,correspondents,init,leaderboard,classifieds,agents,brief-compile}.ts. With ~1000 registered agents on staggered 24h TTLs, a single uncached address triggered up to 1000 KV puts. Other writers (single-address resolver, identity gate, SWR locks) are bounded to 1 put per relevant event.Expected metric movement
NEWS_KVwrites/hNEWS_KVreads/h/api/signals,/api/correspondents,/api/init)Validation window: 24h post-deploy, Cloudflare GraphQL on the production NEWS_KV namespace.
Rollback signal
If we see name-resolution regressions in
/api/signals,/api/correspondents, or/api/init(rendered display names suddenly missing for non-requested-but-recently-seen addresses), revert this commit. The previous behavior pre-warmed the entire registry, so any caller relying on a sibling lookup populating its cache implicitly would break.There are no such callers in the current code — every call site explicitly passes the addresses it cares about.
Test plan
npm run typecheckpasses/api/signals,/api/correspondents,/api/initreturn populatednamefieldsNEWS_KVwrites/h drops materially in 24h post-deploy windowlogs.aibtc.com) clean foragent-newspost-deploy🤖 Generated with Claude Code