Summary
#651's dashboard rebuilder fans out to ~2 subrequests per agent (Hiro /balances + mempool.space) inside a single Worker invocation. At ~500 agents this hits the 1000-subrequest-per-invocation cap; at ~600 agents it hits the 30s Worker CPU limit. This is the same architectural wall Phase 1.4 hit during inbox reconciliation last week (see CHECKPOINT-2026-05-10T0203Z-inbox-subrequest-limit.md in the D1-migration quest planning folder).
Rework #651 to use a Durable Object alarm as the chunked rebuilder, writing per-agent rows to the existing D1 balances table.
Failure thresholds (current PR design)
| Agents | Subrequests per rebuild | Worker CPU per rebuild | Status |
| --- | --- | --- | --- |
| 430 (current) | ~860 | ~21s | Under both limits, barely |
| 500 | ~1000 | ~25s | At subrequest cap; first failures |
| 600 | ~1200 | ~30s | Both walls breached |
| 1000 | ~2000 | ~50s | Architecturally impossible in one invocation |
Neither limit is raisable via wrangler config or plan upgrade.
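The thresholds above are close to linear in agent count. A minimal sketch of that projection, assuming ~2 subrequests and ~50 ms of CPU per agent (both read off the table rows, not measured separately); `projectRebuildCost` and its field names are illustrative and not part of #651:

```typescript
// Limits as quoted in this issue.
const SUBREQUEST_CAP = 1000;
const CPU_LIMIT_MS = 30_000;

// Assumptions inferred from the table: ~2 subrequests and ~50 ms CPU per agent.
const SUBREQUESTS_PER_AGENT = 2;
const CPU_MS_PER_AGENT = 50;

interface RebuildCost {
  agents: number;
  subrequests: number;
  cpuMs: number;
  atSubrequestCap: boolean; // true once the projected count reaches the cap
  atCpuLimit: boolean;      // true once projected CPU reaches the limit
}

// Project the cost of rebuilding every agent in a single Worker invocation.
function projectRebuildCost(agents: number): RebuildCost {
  const subrequests = agents * SUBREQUESTS_PER_AGENT;
  const cpuMs = agents * CPU_MS_PER_AGENT;
  return {
    agents,
    subrequests,
    cpuMs,
    atSubrequestCap: subrequests >= SUBREQUEST_CAP,
    atCpuLimit: cpuMs >= CPU_LIMIT_MS,
  };
}
```

Under these assumptions `projectRebuildCost(500)` reports the subrequest cap reached, and `projectRebuildCost(600)` reports both limits reached, matching the table.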
Architecture
Scheduler: Durable Object alarm. Chosen over Cron Triggers because we need cursor state, strict-once execution per tick, and adaptive interval (back off on Hiro 429s). Established org pattern — see Reference patterns below.
DashboardRebuildDO (new): single global instance, holds rebuild cursor in DO storage. alarm() handler does one chunk per tick.
Chunking: CHUNK_SIZE = 100 agents per alarm. Per-tick subrequests: ~200 (100 Hiro + 100 mempool) + 1–2 for prices. Well under the 1000 cap with margin.
Pre-fetch outside the lock: per the NonceDO precedent — do all Hiro/mempool I/O before entering blockConcurrencyWhile, then apply writes inside the lock. Sequential fetches inside the lock can exceed 30s and crash the DO.
Adaptive interval: ALARM_INTERVAL_ACTIVE_MS (~2 min) when the agent set is healthy; ALARM_INTERVAL_IDLE_MS (~10 min) when most recent ticks saw upstream failures. Always re-schedule from the alarm handler, including the catch path.
Storage: writes per-agent rows to balances D1 table (migration 006_balances.sql, currently dead schema — this is its intended writer). One row per (agent, token) — schema already defined in RFC §balances.
Removed: lib/balances/snapshot.ts (KV snapshot + SWR + building sentinel, replaced by DO alarm + D1).
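A rough sketch of one alarm tick as described in this section: read the cursor, pre-fetch the chunk in parallel, write, advance the cursor, and always re-arm. `TickStorage`, `BalanceRow`, `fetchBalances`, `writeRows`, the `"cursor"` key, and the "more than half the chunk failed" back-off rule are assumptions for illustration. The real DO would use `state.storage`, wrap only the write phase in `blockConcurrencyWhile`, and take its columns from 006_balances.sql.

```typescript
// Hypothetical minimal slice of DO storage (get/put/setAlarm only).
interface TickStorage {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
  setAlarm(scheduledTime: number): Promise<void>;
}

const CHUNK_SIZE = 100;
const ALARM_INTERVAL_ACTIVE_MS = 2 * 60_000; // ~2 min, per this issue
const ALARM_INTERVAL_IDLE_MS = 10 * 60_000;  // ~10 min, per this issue

// Hypothetical row shape; the real schema is defined in the RFC.
interface BalanceRow { agent: string; token: string; amount: string; }
type FetchBalances = (agent: string) => Promise<BalanceRow[]>;
type WriteRows = (rows: BalanceRow[]) => Promise<void>;

// Pick the next chunk and the cursor that follows it, wrapping at the end.
function nextChunk(agents: string[], cursor: number): { chunk: string[]; next: number } {
  if (agents.length === 0) return { chunk: [], next: 0 };
  const start = cursor >= agents.length ? 0 : cursor;
  const end = Math.min(start + CHUNK_SIZE, agents.length);
  return { chunk: agents.slice(start, end), next: end >= agents.length ? 0 : end };
}

// One tick. Returns the interval used to re-arm the alarm.
async function runTick(
  storage: TickStorage,
  agents: string[],
  fetchBalances: FetchBalances,
  writeRows: WriteRows,
  now: number = Date.now(),
): Promise<number> {
  let interval = ALARM_INTERVAL_IDLE_MS; // used if the tick throws
  try {
    const cursor = (await storage.get<number>("cursor")) ?? 0;
    const { chunk, next } = nextChunk(agents, cursor);

    // Phase 1: all upstream I/O in parallel, before any lock would be taken.
    const results = await Promise.allSettled(chunk.map((a) => fetchBalances(a)));
    const rows: BalanceRow[] = [];
    let failures = 0;
    for (const r of results) {
      if (r.status === "fulfilled") rows.push(...r.value);
      else failures++;
    }

    // Phase 2: writes only. The cursor advances even past failed agents;
    // they are picked up again on the next rotation.
    await writeRows(rows);
    await storage.put("cursor", next);

    // Assumed rule: back off when more than half the chunk failed upstream.
    interval = failures * 2 > chunk.length ? ALARM_INTERVAL_IDLE_MS : ALARM_INTERVAL_ACTIVE_MS;
  } catch (err) {
    console.error("rebuild tick failed", err);
  } finally {
    // Always re-arm, including on the catch path.
    await storage.setAlarm(now + interval);
  }
  return interval;
}
```

With CHUNK_SIZE = 100, the current 430 agents take 5 ticks per full rotation, roughly 10 minutes at the active interval.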
Reference patterns
Org has established DO patterns to copy from — this isn't a new architectural primitive for the team:
aibtcdev/x402-sponsor-relay/src/durable-objects/nonce-do.ts — the closest analog. Cursor-rotated wallet chunks per alarm, parallel Hiro pre-fetch outside the lock, adaptive ALARM_INTERVAL_ACTIVE_MS / ALARM_INTERVAL_IDLE_MS, self-rescheduling via state.storage.setAlarm() including the catch path. Encodes the exact "don't fan out inside blockConcurrencyWhile" lesson.
aibtcdev/x402-sponsor-relay/src/durable-objects/stats-do.ts — DO + DurableObjectStorage["sql"] for aggregate state with periodic refresh.
aibtcdev/agent-news/src/objects/news-do.ts — DurableObject<Env> + SQL storage shape; confirms the pattern is established across the org.
wrangler.jsonc binding shape (from x402-sponsor-relay): durable_objects.bindings: [{ name: "DASHBOARD_REBUILD_DO", class_name: "DashboardRebuildDO" }] + migrations: [{ tag: "v1", new_sqlite_classes: ["DashboardRebuildDO"] }]. Add to top-level + each env block per the #666 pattern (fix(rate-limit): env separation + DEPLOY_ENV + bucket rename + test handler exercise (#663)).
Architecture (continued)
Read path: app/api/dashboard/route.ts + /dashboard page read from D1, ranked + paginated server-side. Edge cache via Cache-Control: s-maxage=60, stale-while-revalidate=300 (same as #651, feat: trading-comp dashboard with multi-token portfolio + USD totals) so most reads never reach D1.
Kept: lib/balances/fetch.ts, lib/balances/prices.ts, lib/balances/types.ts, /dashboard page UI, all discovery wiring (llms.txt, openapi.json, agent.json).
Acceptance
… blockConcurrencyWhile; DO write phase inside the lock only does SQL + alarm reschedule
balances rows present for every agent in cache:agent-list within one full rotation
/api/dashboard reads from D1, returns identical JSON shape to current PR
Out of scope
swaps (separate, lives in #738, feat(competition): Phase 3.1 verifier + read routes + allowlist + scheduler; the #738 scheduler follow-up should use the same DO-alarm pattern, a separate decision)
Context
docs/rfc-d1-schema.md §balances + migration 006_balances.sql
aibtcdev/x402-sponsor-relay NonceDO + StatsDO; aibtcdev/agent-news NewsDO
🤖 Generated with Claude Code