Skip to content

fix: scope agent-resolver NEWS_KV writes to requested addresses#725

Merged
whoabuddy merged 2 commits into
mainfrom
fix/agent-resolver-scoped-kv-writes
May 3, 2026
Merged

fix: scope agent-resolver NEWS_KV writes to requested addresses#725
whoabuddy merged 2 commits into
mainfrom
fix/agent-resolver-scoped-kv-writes

Conversation

@whoabuddy
Copy link
Copy Markdown
Contributor

Summary

  • resolveAgentNames previously wrote every agent in the bulk fetch (up to 1000 entries) to NEWS_KV on every cache miss, even when the caller only asked about 1–2 addresses.
  • Scope the kv.put fan-out to the originally-requested addresses; keep the bulk fetch itself as the latency optimization.
  • Targets B1 in cloudflare-bill-reduction-tracker-2026-05.md — remaining NEWS_KV writes (~12.3K/h, residual after PR fix: move agent-news rate limits off KV #704/fix: enforce IP bucket before identity rate limits #705 removed rate-limit counters).

Why this is the dominant remaining writer

Inventory: worker-logs/.planning/2026-05-02T2050Z-news-kv-write-inventory.md

resolveAgentNames is called from routes/{signals,correspondents,init,leaderboard,classifieds,agents,brief-compile}.ts. With ~1000 registered agents on staggered 24h TTLs, a single uncached address triggered up to 1000 KV puts. Other writers (single-address resolver, identity gate, SWR locks) are bounded to 1 put per relevant event.

Expected metric movement

Metric Pre-PR (last 24h) Target
NEWS_KV writes/h 12.3K low single-digit K/h (driven by SWR locks + identity gate)
NEWS_KV reads/h ~50K unchanged
Name resolution correctness 100% 100% (smoke /api/signals, /api/correspondents, /api/init)

Validation window: 24h post-deploy, Cloudflare GraphQL on the production NEWS_KV namespace.

Rollback signal

If we see name-resolution regressions in /api/signals, /api/correspondents, or /api/init (rendered display names suddenly missing for non-requested-but-recently-seen addresses), revert this commit. The previous behavior pre-warmed the entire registry, so any caller relying on a sibling lookup populating its cache implicitly would break.

There are no such callers in the current code — every call site explicitly passes the addresses it cares about.

Test plan

  • npm run typecheck passes
  • post-deploy smoke: /api/signals, /api/correspondents, /api/init return populated name fields
  • Cloudflare GraphQL: NEWS_KV writes/h drops materially in 24h post-deploy window
  • Central logs (logs.aibtc.com) clean for agent-news post-deploy

🤖 Generated with Claude Code

resolveAgentNames previously wrote every agent in the bulk fetch page
(~1000 entries) to NEWS_KV on every cache miss, even when the caller
asked about 1-2 addresses. With staggered 24h TTLs across ~1000 agents,
this single call site explains the bulk of the ~12.3K/h remaining
NEWS_KV writes after PR #704/#705 removed rate-limit counters (B1 in
cloudflare-bill-reduction-tracker-2026-05.md).

Keep the bulk fetch as the latency optimization (one HTTP page beats N
calls) but scope the KV puts to the originally-requested addresses
only; the pre-warm fan-out is dropped.

Validation: 24h post-deploy NEWS_KV writes/h via Cloudflare GraphQL,
plus smoke on /api/signals, /api/correspondents, /api/init for name
resolution still returning expected display names.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 23:54
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 2, 2026

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Updated (UTC)
✅ Deployment successful!
View logs
agent-news 4436e59 May 02 2026, 11:59 PM

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 2, 2026

Preview deployed: https://agent-news-staging.hosting-962.workers.dev

This preview uses sample data — beats, signals, and streaks are seeded automatically.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR narrows resolveAgentNames KV write behavior so the agent resolver still uses the bulk registry fetch for latency, but only persists cache entries for the addresses the caller actually requested. In the broader codebase, this targets one of the remaining high-volume NEWS_KV write paths without changing the external resolver interface.

Changes:

  • Update the batch resolver documentation to reflect that bulk fetches are no longer used to pre-warm unrelated addresses.
  • Restrict KV fan-out after a bulk fetch to uncached requested addresses instead of every fetched agent.
  • Preserve existing negative-caching behavior for requested addresses that are still unresolved after a complete bulk fetch.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@arc0btc arc0btc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scopes resolveAgentNames KV writes to only the originally-requested addresses — correct fix for the bulk pre-warm cost driver.

What works well:

  • Logic change is sound: the continue guard on !uncachedSet.has(btcAddr) correctly short-circuits unneeded writes while preserving the bulk fetch as a pure latency optimization
  • Moving infoMap.set and uncachedSet.delete before the KV write is the right ordering — local state updated before the async side-effect, which makes the loop easier to reason about
  • Negative-caching for unresolved requested addresses (the block after the loop) is untouched — that path still works correctly
  • TTL logic (info.name ? CACHE_TTL_SECONDS : NEGATIVE_CACHE_TTL_SECONDS) is preserved for the now-scoped writes
  • PR description is thorough: explicit rollback signal, metric baseline, and confirmation that no callers rely on sibling-address cache pre-warming. That last point is the critical invariant here and it's good to see it called out explicitly.

[nit] Planning doc reference in inline comment (agent-resolver.ts:204)
The comment references cloudflare-bill-reduction-tracker-2026-05.md by path. That file will outlive its context, making this comment cryptic to future readers. The surrounding explanation is already clear — consider dropping the doc reference and keeping just the metric reasoning.

Code quality notes:
The refactor is already minimal. Old code had an outer write loop with an inner conditional for the infoMap — new code flips the guard to a continue, which removes the nesting and makes the intent obvious at a glance. No reuse opportunities missed; the uncachedSet was already the right data structure for this.

Operational note: We process all the public agent-news API routes (signals, correspondents, init) daily. The pre-warm behavior was invisible to callers but would have inflated KV write counts on every agent-resolver miss. The scoping to uncachedSet aligns with how every call site already works — each passes exactly the addresses it needs.

Drop the planning-doc reference per arc0btc review nit on PR #725 —
inline comments shouldn't depend on transient planning paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@whoabuddy
Copy link
Copy Markdown
Contributor Author

Addressed arc0btc's nit in 4436e59 — dropped the planning-doc reference, kept the metric reasoning.

@whoabuddy whoabuddy merged commit 95598b0 into main May 3, 2026
7 checks passed
@whoabuddy whoabuddy deleted the fix/agent-resolver-scoped-kv-writes branch May 3, 2026 03:02
whoabuddy added a commit that referenced this pull request May 3, 2026
- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp));
  example uses $(date -u +%s) instead of an ISO literal.
- recon CLI: compare repaired to a new affected_addresses field instead of
  drift_count (Codex/Copilot — drift_count is field-level, repaired is
  per-address; an address with multiple drifted fields previously caused
  false-failure exit codes).
- DO recon route: returns affected_addresses alongside drift_count; rejects
  non-boolean repair payloads explicitly; rejects non-string btc_address.
- Config route: validates btc_address as string + valid BTC before invoking
  verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects
  non-boolean repair the same way.
- Schema migration comment: point at the actual route
  /api/config/recon-correspondents (was /admin/recon-correspondents).
- Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after
  window, and rollback signal per the repo's cost-PR convention.
- Tests: add a real recon-path test that corrupts correspondent_stats,
  asserts /api/correspondents serves the corrupt values (proving the
  materialised read is wired up), runs the recon path inline via
  test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
whoabuddy added a commit that referenced this pull request May 3, 2026
* feat(agent-news): materialise correspondent_stats for hot-path bounded reads

Adds a per-agent aggregate table maintained on every signal insert and
on beat-deletion bulk operations. The four hot paths that were running
GROUP BY btc_address over the full signals table —
- GET /correspondents
- GET /correspondents-bundle
- GET /init (correspondents block)
- queryLeaderboard first-signal tie-breaker
read from correspondent_stats (~430 rows) instead of scanning ~27.8K
signal rows on every cache miss / SWR rebuild.

Migration 29 backfills the table from current signals. The maintenance
helper recomputes days_active per-agent (bounded by that agent's own
signal history, typical ~200–600 rows) which is still much smaller
than the full-table scan it replaces.

Targets B2 in cloudflare-bill-reduction-tracker-2026-05.md — projected
NewsDO rows-read drop from ~202.7M/h to tens of M/h.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(agent-news): publisher-only recon endpoint for correspondent_stats

Adds POST /api/config/recon-correspondents (BIP-322 auth via verifyAuth)
which compares the materialised aggregate to a fresh GROUP BY scan and
optionally repairs drifted rows in place. Backstops a missed write path
in the maintenance helper without requiring a redeploy + backfill.

Bundled with a thin CLI script (scripts/recon-correspondent-stats.ts)
that hits the route with pre-signed BIP-322 headers via env so the
script itself stays signing-agnostic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(agent-news): cover correspondent_stats lifecycle through public reads

Adds five lifecycle assertions: single insert, two same-day inserts,
two cross-day inserts, correction-does-not-count, and the corollary
(an agent whose only signal is a correction does not appear in
/api/correspondents at all).

Also wires the test-seed route to recompute correspondent_stats for
every seeded agent at the end of the seed batch, so HTTP-level tests
see a consistent materialised aggregate without each test having to
hit the recon endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(agent-news): address #731 review feedback

- recon CLI: doc BTC_TIMESTAMP as Unix seconds (auth.ts parses Number(timestamp));
  example uses $(date -u +%s) instead of an ISO literal.
- recon CLI: compare repaired to a new affected_addresses field instead of
  drift_count (Codex/Copilot — drift_count is field-level, repaired is
  per-address; an address with multiple drifted fields previously caused
  false-failure exit codes).
- DO recon route: returns affected_addresses alongside drift_count; rejects
  non-boolean repair payloads explicitly; rejects non-string btc_address.
- Config route: validates btc_address as string + valid BTC before invoking
  verifyAuth (avoid 500 from .toLowerCase() on non-string input); rejects
  non-boolean repair the same way.
- Schema migration comment: point at the actual route
  /api/config/recon-correspondents (was /admin/recon-correspondents).
- Cost runbook: add B1 (#725) and B2 (#731) entries with metric, before/after
  window, and rollback signal per the repo's cost-PR convention.
- Tests: add a real recon-path test that corrupts correspondent_stats,
  asserts /api/correspondents serves the corrupt values (proving the
  materialised read is wired up), runs the recon path inline via
  test-seed (gated on ENVIRONMENT), and asserts repaired == affected_addresses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(agent-news): extract computeCorrespondentDrift + migration log context

Per arc0btc review on #731:

- Hoist the duplicated drift comparison loop out of POST /recon-correspondents
  and the test-seed `recon` hook into a single private
  `computeCorrespondentDrift()` helper. Schema additions to
  `correspondent_stats` now touch one site instead of two.
- Migration 29 error log now includes the version + statement index so
  partial-failure diagnostics survive into post-deploy debugging.
- bumpCorrespondentStatsForInsert is unchanged but verified in the review
  thread to be gated to non-correction inserts only (POST /signals
  hardcodes correction_of=NULL at the call site; PATCH /signals/:id
  inserts with correction_of=originalId and does not call bump).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants