feat(dashboard): rework trading-comp dashboard to D1 balances + cron chunking (blocks #651) #764

## Summary

#651's dashboard rebuilder fans out to ~2 subrequests per agent (Hiro `/balances` + mempool.space) inside a single Worker invocation. At ~500 agents this hits the 1000-subrequest-per-invocation cap; at ~600 agents it hits the 30s Worker CPU limit. This is the same architectural wall Phase 1.4 hit during inbox reconciliation last week (see `CHECKPOINT-2026-05-10T0203Z-inbox-subrequest-limit.md` in the D1-migration quest planning folder).

Rework #651 to use a **Durable Object alarm** as the chunked rebuilder, writing per-agent rows to the existing D1 `balances` table.

## Failure thresholds (current PR design)

| Agents | Subrequests per rebuild | Worker CPU per rebuild | Status |
|---:|---:|---:|---|
| 430 (current) | ~860 | ~21s | Under both limits, barely |
| 500 | ~1000 | ~25s | At subrequest cap — first failures |
| 600 | ~1200 | ~30s | Both walls breached |
| 1000 | ~2000 | ~50s | Architecturally impossible in one invocation |

Neither limit is raisable via wrangler config or plan upgrade.
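
The rows above follow from simple per-agent arithmetic. A minimal sketch, assuming ~2 subrequests and ~50 ms of CPU per agent (back-derived from the 430-agent row, not independently measured); `estimateRebuild` and the constant names are illustrative only:

```typescript
// Assumed per-agent costs, back-derived from the table (not measured here).
const SUBREQUESTS_PER_AGENT = 2; // one Hiro /balances call + one mempool.space call
const CPU_MS_PER_AGENT = 50; // ~21s / 430 agents, rounded
const SUBREQUEST_CAP = 1000; // per-invocation cap cited in the Summary
const CPU_LIMIT_MS = 30_000; // 30s Worker CPU limit cited in the Summary

interface RebuildEstimate {
  subrequests: number;
  cpuMs: number;
  hitsSubrequestCap: boolean;
  hitsCpuLimit: boolean;
}

// Estimate the cost of rebuilding every agent inside one Worker invocation.
function estimateRebuild(agents: number): RebuildEstimate {
  const subrequests = agents * SUBREQUESTS_PER_AGENT;
  const cpuMs = agents * CPU_MS_PER_AGENT;
  return {
    subrequests,
    cpuMs,
    hitsSubrequestCap: subrequests >= SUBREQUEST_CAP,
    hitsCpuLimit: cpuMs >= CPU_LIMIT_MS,
  };
}
```

Under these assumptions, `estimateRebuild(500)` reports the subrequest cap reached with CPU still under the limit, and `estimateRebuild(600)` reports both limits reached, matching the table.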

## Architecture

**Scheduler: Durable Object alarm.** Chosen over Cron Triggers because we need cursor state, strict-once execution per tick, and an adaptive interval (back off on Hiro 429s). This is an established org pattern; see `Reference patterns` below.

- **`DashboardRebuildDO` (new):** single global instance, holds rebuild cursor in DO storage. `alarm()` handler does one chunk per tick.
- **Chunking:** `CHUNK_SIZE = 100` agents per alarm. Per-tick subrequests: ~200 (100 Hiro + 100 mempool) + 1–2 for prices. Well under the 1000 cap with margin.
- **Pre-fetch outside the lock:** per the `NonceDO` precedent — do all Hiro/mempool I/O before entering `blockConcurrencyWhile`, then apply writes inside the lock. Sequential fetches inside the lock can exceed 30s and crash the DO.
- **Adaptive interval:** `ALARM_INTERVAL_ACTIVE_MS` (~2 min) when the agent set is healthy; `ALARM_INTERVAL_IDLE_MS` (~10 min) when the most recent ticks saw upstream failures. Always re-schedule from the alarm handler, including the catch path.
- **Storage:** writes per-agent rows to `balances` D1 table (migration `006_balances.sql`, currently dead schema — this is its intended writer). One row per (agent, token) — schema already defined in RFC §balances.
- **Reader:** `app/api/dashboard/route.ts` + `/dashboard` page read from D1, ranked + paginated server-side. Edge cache via `Cache-Control: s-maxage=60, stale-while-revalidate=300` (same as #651) so most reads never reach D1.
- **Reusable from #651 unchanged:** `lib/balances/fetch.ts`, `lib/balances/prices.ts`, `lib/balances/types.ts`, `/dashboard` page UI, all discovery wiring (`llms.txt`, `openapi.json`, `agent.json`).
- **Removed:** `lib/balances/snapshot.ts` (KV snapshot + SWR + building sentinel — replaced by DO alarm + D1).
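
The cursor and interval logic described in the list above could look roughly like this. It is a sketch only: `nextChunk` and `nextAlarmDelayMs` are hypothetical helper names, and the "more than half of recent ticks failed" health rule is an assumption, since the issue does not pin down the exact back-off condition.

```typescript
const CHUNK_SIZE = 100;
const ALARM_INTERVAL_ACTIVE_MS = 2 * 60 * 1000; // ~2 min
const ALARM_INTERVAL_IDLE_MS = 10 * 60 * 1000; // ~10 min

interface ChunkResult {
  chunk: string[];
  nextCursor: number;
}

// Take the next slice of agents starting at the cursor.
// The cursor wraps to 0 once the end of the list is reached, and also
// resets if the agent list shrank below the stored cursor.
function nextChunk(
  agentIds: string[],
  cursor: number,
  chunkSize: number = CHUNK_SIZE,
): ChunkResult {
  if (agentIds.length === 0) return { chunk: [], nextCursor: 0 };
  const start = cursor >= agentIds.length ? 0 : cursor;
  const end = Math.min(start + chunkSize, agentIds.length);
  return {
    chunk: agentIds.slice(start, end),
    nextCursor: end >= agentIds.length ? 0 : end,
  };
}

// Hypothetical health rule: back off to the idle interval when more than
// half of the recent ticks recorded an upstream failure.
function nextAlarmDelayMs(recentTickFailed: boolean[]): number {
  const failures = recentTickFailed.filter(Boolean).length;
  return failures * 2 > recentTickFailed.length
    ? ALARM_INTERVAL_IDLE_MS
    : ALARM_INTERVAL_ACTIVE_MS;
}
```

With 430 agents this yields five chunks per rotation (four of 100 and one of 30), after which the cursor returns to 0.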

## Reference patterns

The org has established DO patterns to copy from, so this isn't a new architectural primitive for the team:

- **`aibtcdev/x402-sponsor-relay/src/durable-objects/nonce-do.ts`** — the closest analog. Cursor-rotated wallet chunks per alarm, parallel Hiro pre-fetch outside the lock, adaptive `ALARM_INTERVAL_ACTIVE_MS` / `ALARM_INTERVAL_IDLE_MS`, self-rescheduling via `state.storage.setAlarm()` including the catch path. Encodes the exact "don't fan out inside `blockConcurrencyWhile`" lesson.
- **`aibtcdev/x402-sponsor-relay/src/durable-objects/stats-do.ts`** — DO + `DurableObjectStorage["sql"]` for aggregate state with periodic refresh.
- **`aibtcdev/agent-news/src/objects/news-do.ts`** — `DurableObject<Env>` + SQL storage shape; confirms the pattern is established across the org.
- **`wrangler.jsonc` binding shape** (from x402-sponsor-relay): `durable_objects.bindings: [{ name: "DASHBOARD_REBUILD_DO", class_name: "DashboardRebuildDO" }]` + `migrations: [{ tag: "v1", new_sqlite_classes: ["DashboardRebuildDO"] }]`. Add to top-level + each env block per the #666 pattern.
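
To make the "pre-fetch outside the lock" lesson concrete, here is a minimal sketch of the two-phase shape. It is not taken from `nonce-do.ts`: `processChunk`, `fetchBalance`, `writeRows`, and the `Balance` shape are hypothetical, and `LockLike` stands in for the one `DurableObjectState` method the sketch needs so it can run outside the Workers runtime.

```typescript
type Balance = { agentId: string; sats: number };

// Stand-in for the slice of DurableObjectState used here.
interface LockLike {
  blockConcurrencyWhile<T>(callback: () => Promise<T>): Promise<T>;
}

async function processChunk(
  lock: LockLike,
  chunk: string[],
  fetchBalance: (agentId: string) => Promise<Balance>, // hypothetical upstream fetcher
  writeRows: (rows: Balance[]) => Promise<void>, // hypothetical row writer
): Promise<{ written: number; failed: number }> {
  // Phase 1: all upstream I/O runs in parallel BEFORE the lock is taken.
  // allSettled keeps one failing agent from discarding the whole chunk.
  const settled = await Promise.allSettled(chunk.map((id) => fetchBalance(id)));
  const rows: Balance[] = [];
  for (const result of settled) {
    if (result.status === "fulfilled") rows.push(result.value);
  }

  // Phase 2: only the write runs inside the lock, so the lock is held briefly.
  await lock.blockConcurrencyWhile(() => writeRows(rows));
  return { written: rows.length, failed: settled.length - rows.length };
}
```

In the real DO, `lock` would be the instance's state object and `failed` could feed the adaptive-interval decision.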

## Acceptance

- DO alarm processes 100-agent chunks; full agent set covered every ~10 min at current scale (5 chunks for 430 agents at the ~2 min active interval)
- All Hiro/mempool I/O happens outside `blockConcurrencyWhile`; DO write phase inside the lock only does SQL + alarm reschedule
- D1 `balances` rows present for every agent in `cache:agent-list` within one full rotation
- `/api/dashboard` reads from D1, returns identical JSON shape to current PR
- Subrequest count per alarm invocation never exceeds ~250 (chunk + price fetch + headroom)
- Alarm always reschedules (success path + catch path), with adaptive interval based on observed upstream health
- Vitest covers: cursor advance, chunk pre-fetch outside lock, partial-failure handling, D1 read pagination, edge-cache headers preserved, alarm reschedule on success + on throw
- Smoke window post-merge: 60 min, confirm D1 row counts grow monotonically + dashboard renders + alarm log shows rescheduling + no Worker errors
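
The "alarm always reschedules" criterion can be sketched as a `try`/`finally` wrapper that is straightforward to cover in Vitest. This is an illustration under assumptions: `runAlarmTick` and `AlarmStorageLike` are hypothetical names, and choosing the idle interval after a single failed tick is a simplification of the adaptive rule.

```typescript
// Stand-in for the alarm part of DurableObjectStorage.
interface AlarmStorageLike {
  setAlarm(scheduledTime: number): Promise<void>;
}

async function runAlarmTick(
  storage: AlarmStorageLike,
  doChunk: () => Promise<void>, // hypothetical: pre-fetch + write for one chunk
  delays: { activeMs: number; idleMs: number },
  now: () => number = Date.now,
): Promise<"ok" | "failed"> {
  let outcome: "ok" | "failed" = "ok";
  try {
    await doChunk();
  } catch {
    // Record the failure; the reschedule below still runs.
    outcome = "failed";
  } finally {
    // Reschedule on both paths so the rotation never stalls.
    const delay = outcome === "ok" ? delays.activeMs : delays.idleMs;
    await storage.setAlarm(now() + delay);
  }
  return outcome;
}
```

The sketch swallows the error after recording the failure; whether to log or rethrow after rescheduling is left open. Injecting `now` keeps the scheduled time deterministic in tests.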

## Out of scope

- Historical balance time-series queries (the table will accumulate data; query API can come later)
- SIP-10 metadata enrichment beyond what #651 already does
- Trading-comp ranking integration with `swaps` (separate, lives in #738; the #738 scheduler follow-up should use the same DO-alarm pattern — separate decision)

## Context

- Discovered during D1-migration quest retro 2026-05-11
- Subrequest-cap precedent: Phase 1.4 reconciliation hit the same wall on a 12K-subrequest inbox loop
- Schema: `docs/rfc-d1-schema.md` §balances + migration `006_balances.sql`
- Scheduler-pattern precedent: `aibtcdev/x402-sponsor-relay` NonceDO + StatsDO; `aibtcdev/agent-news` NewsDO
- Blocks: #651
- Umbrella: #652

🤖 Generated with [Claude Code](https://claude.com/claude-code)
