platform: SchedulerDO Tenero refresh task not populating KV (root cause behind #792/#793 leaderboard workaround) #794

@secret-mars

Summary

/api/prices returns {"prices":{}} in production ~2h after the Phase 3.1 deploy (#738), and per-token reads show fetchedAt: null for every entry in STATIC_TOKEN_IDS. The KV cache the route reads from has never been populated. #793 worked around this for /leaderboard (the browser now fetches Tenero directly), but the root cause is still unaddressed: SchedulerDO.runTenero is not populating tenero:price:* in KV, which breaks every other consumer of /api/prices.

Repro

$ curl -sS -H 'Accept: application/json' https://aibtc.com/api/prices
{
  "prices": {},
  "supportedTokens": [
    "stx",
    "SM3VDXK3WZZSA84XXFKAFAF15NNZX32CTSG82JFQ4.sbtc-token::sbtc",
    "SP4SZE494VC2YC5JYG7AYFQ44F5Q4PYV7DVMDPBG.ststx-token::ststx"
  ]
}

$ for t in 'stx' \
           'SM3VDXK3WZZSA84XXFKAFAF15NNZX32CTSG82JFQ4.sbtc-token::sbtc' \
           'SP4SZE494VC2YC5JYG7AYFQ44F5Q4PYV7DVMDPBG.ststx-token::ststx'; do
    curl -sS -H 'Accept: application/json' "https://aibtc.com/api/prices?token=$t"
    echo
  done
{"tokenId":"stx","priceUsd":null,"fetchedAt":null}
{"tokenId":"SM3VDXK3WZZSA84XXFKAFAF15NNZX32CTSG82JFQ4.sbtc-token::sbtc","priceUsd":null,"fetchedAt":null}
{"tokenId":"SP4SZE494VC2YC5JYG7AYFQ44F5Q4PYV7DVMDPBG.ststx-token::ststx","priceUsd":null,"fetchedAt":null}

(Probes at 2026-05-13T02:30Z, ~2h after #738 merge at 00:24:40Z.)
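For anyone re-running the probes, here is a minimal sketch of the check I'm applying to the per-token responses. The response shape is copied from the output above; the helper name neverCachedTokens is hypothetical and is not something in the repo.

```typescript
// Shape of a per-token /api/prices?token=... response, as seen in the repro output.
interface TokenPriceResponse {
  tokenId: string;
  priceUsd: number | null;
  fetchedAt: number | null;
}

// Hypothetical helper: returns the token IDs that have no cache entry at all.
// Per the route's self-doc, fetchedAt is null when no cache entry exists yet.
function neverCachedTokens(responses: TokenPriceResponse[]): string[] {
  return responses.filter((r) => r.fetchedAt === null).map((r) => r.tokenId);
}

// The three responses captured in the repro above.
const probed: TokenPriceResponse[] = [
  { tokenId: "stx", priceUsd: null, fetchedAt: null },
  {
    tokenId: "SM3VDXK3WZZSA84XXFKAFAF15NNZX32CTSG82JFQ4.sbtc-token::sbtc",
    priceUsd: null,
    fetchedAt: null,
  },
  {
    tokenId: "SP4SZE494VC2YC5JYG7AYFQ44F5Q4PYV7DVMDPBG.ststx-token::ststx",
    priceUsd: null,
    fetchedAt: null,
  },
];

const missing = neverCachedTokens(probed);
console.log(`${missing.length}/${probed.length} tokens have never been cached`);
```

If the list comes back empty on a later probe, the scheduler has written at least once and this issue is about staleness rather than a cold cache.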

Expected vs actual

| | Expected | Actual |
| --- | --- | --- |
| GET /api/prices (Accept: application/json) | prices: { stx: {…}, sbtc: {…}, ststx: {…} } populated within ~5min of first SchedulerDO tick | prices: {} |
| GET /api/prices?token=stx | priceUsd: <number>, fetchedAt: <unix-ms> after first successful Tenero fetch | priceUsd: null, fetchedAt: null; per the self-doc, "fetchedAt: null when no cache entry exists yet" |
| SchedulerDO.runTenero cadence | ~5min refresh per /api/prices self-doc + the docstring in app/api/prices/route.ts | None successful in 2h+ (fetchedAt: null definitive for all 3 entries) |
| /leaderboard Volume USD | Computed from KV-cached prices (pre-#792/#793) | Was reading $0 across all rows; #793 routed around by fetching Tenero direct from browser |

Why this is not closed by #792 + #793

#792 + #793 fix the leaderboard rendering: per the merged change, the client now calls https://api.tenero.io/v1/stacks/tokens/{contract_id} directly. That's a clean workaround for the user-visible bug.

But the KV cache itself remains empty, which means:

  • /api/prices is broken for any non-leaderboard consumer (LLMs, third-party indexers, agent tools that prefer the cached server-side read over a direct Tenero call). The route's docstring still advertises "Cached by the SchedulerDO (~5 min refresh cadence) from Tenero" — that contract isn't being honored.
  • The cost-shaping promise of the route ("scales with KV reads, not upstream API quota") is moot — there's nothing to read.
  • The SchedulerDO's lastTeneroRunAt, lastTeneroResult, and consecutiveFailures.tenero fields are observability primitives that #784 (admin-controls move to v2) wired up, but they're operator-side only, so external diagnostics can't tell whether the task is firing and failing or not firing at all.

Hypotheses (in order of likelihood per current evidence)

  1. TENERO_API_KEY env var missing in production → all fetchTokenPriceUsd calls in lib/scheduler/tenero-task.ts return non-200 → no setCachedTokenPrice happens → KV stays empty. Per tenero-task.ts:77, only r.status === 200 writes to KV; r.status === 0 || r.status >= 500 and r.status === 429 paths just bump failed. Auth failures (likely 401) fall under that "non-200 == no write" branch.

  2. SCHEDULER DO alarm never fired in production. The ctx.waitUntil(env.SCHEDULER.get(…).status()) kick in app/leaderboard/page.tsx is fire-and-forget; if that throws on the first call (binding misconfigured, instance name mismatch with the v2 cutover in #784), the constructor's alarm-arming step never runs and alarm() is never scheduled. lastTeneroRunAt: null in DO storage would confirm.

  3. consecutiveFailures.tenero hit the pause threshold + pausedUntil blocks future ticks. #779 added a monthly-quota backoff (TENERO_MONTH_QUOTA_BACKOFF_MS = 24h); if Tenero returned a 429 with month_remaining: 0 on the first tick post-deploy, the alarm pauses for 24h.

  4. KV namespace binding misconfigured in v2 instance (post-#784 cutover). The DO instance migrated; if KV (or whatever binding setCachedTokenPrice uses) didn't migrate cleanly, writes fail silently.

(1) is testable purely from logs (tenero.refresh_started should fire, followed by tenero.kv_write_failed or 4xx response handling). (2) is testable from DO storage state. (3) is testable from consecutiveFailures + pausedUntil. (4) is testable by checking KV namespace bindings on the deployed worker.
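To make the silent-failure mode in hypothesis (1) concrete, here is a minimal sketch of the "non-200 == no write" behaviour as I read it. It is not the code in lib/scheduler/tenero-task.ts: the function name applyFetchResults, the result and tally shapes, the in-memory Map standing in for KV, and setting rateLimited on a 429 are all assumptions for illustration. Only the tenero:price:* key prefix and the "only 200 writes" rule come from the source.

```typescript
// Hypothetical stand-ins; the real types in tenero-task.ts may differ.
interface FetchResult {
  status: number; // 0 is used here for a network-level failure
  priceUsd?: number;
}
interface Tally {
  succeeded: number;
  failed: number;
  rateLimited: boolean;
}
type FakeKv = Map<string, { priceUsd: number; fetchedAt: number }>;

// Applies one refresh round: only a 200 with a numeric price writes to the cache.
function applyFetchResults(
  results: Record<string, FetchResult>,
  kv: FakeKv,
  now: number,
): Tally {
  const tally: Tally = { succeeded: 0, failed: 0, rateLimited: false };
  for (const [tokenId, r] of Object.entries(results)) {
    if (r.status === 200 && typeof r.priceUsd === "number") {
      kv.set(`tenero:price:${tokenId}`, { priceUsd: r.priceUsd, fetchedAt: now });
      tally.succeeded += 1;
      continue;
    }
    // Assumption: a 429 is the only status that flags rate limiting.
    if (r.status === 429) tally.rateLimited = true;
    // 401, 429, 0 and 5xx all land here: counted as failed, nothing written.
    tally.failed += 1;
  }
  return tally;
}

// All three tokens rejected with 401, e.g. a missing TENERO_API_KEY.
const kv: FakeKv = new Map();
const tally = applyFetchResults(
  { stx: { status: 401 }, sbtc: { status: 401 }, ststx: { status: 401 } },
  kv,
  Date.now(),
);
console.log(tally, "kv entries:", kv.size);
```

Under these assumptions, an auth failure on every token looks identical from the outside to a scheduler that never ran: the cache stays empty and /api/prices keeps returning fetchedAt: null.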

Diagnostic ask

A single admin-side scheduler status snapshot would isolate which of (1)-(4) applies:

curl -sS -H "X-Admin-Key: $ADMIN_KEY" "https://aibtc.com/api/admin/scheduler?name=v2"
# Expected fields per worker.ts: 
#   lastTeneroRunAt, lastTeneroResult, consecutiveFailures.tenero, 
#   nextRunAfter.tenero, pausedUntil

If lastTeneroRunAt === null → branch (2). If lastTeneroRunAt populated but lastTeneroResult.succeeded === 0 repeatedly → branch (1) or (4). If consecutiveFailures.tenero >= threshold and pausedUntil > now → branch (3).
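The triage above can be sketched as a small pure function. The field names are the ones listed in the status snapshot; the exact types, the pauseThreshold parameter, and the classifySnapshot name are assumptions on my side, since I can't see the admin response.

```typescript
// Assumed snapshot shape; field names are from the issue, types are guesses.
interface SchedulerSnapshot {
  lastTeneroRunAt: number | null;
  lastTeneroResult: { succeeded: number; failed: number } | null;
  consecutiveFailures: { tenero: number };
  pausedUntil: number | null;
}

type Verdict =
  | "branch-2-alarm-never-fired"
  | "branch-3-paused-by-backoff"
  | "branch-1-or-4-fetch-or-kv-write-failing"
  | "inconclusive";

// Hypothetical classifier for a single snapshot taken at time `now` (unix ms).
function classifySnapshot(
  s: SchedulerSnapshot,
  now: number,
  pauseThreshold: number,
): Verdict {
  if (s.lastTeneroRunAt === null) return "branch-2-alarm-never-fired";
  const paused = s.pausedUntil !== null && s.pausedUntil > now;
  if (s.consecutiveFailures.tenero >= pauseThreshold && paused) {
    return "branch-3-paused-by-backoff";
  }
  if (s.lastTeneroResult !== null && s.lastTeneroResult.succeeded === 0) {
    return "branch-1-or-4-fetch-or-kv-write-failing";
  }
  return "inconclusive";
}

console.log(
  classifySnapshot(
    {
      lastTeneroRunAt: null,
      lastTeneroResult: null,
      consecutiveFailures: { tenero: 0 },
      pausedUntil: null,
    },
    Date.now(),
    3, // placeholder threshold, not the real constant
  ),
);
```

The pause check runs before the succeeded === 0 check because a paused scheduler would also show zero successes. A single snapshot also can't establish "repeatedly", so a branch (1)/(4) verdict is worth confirming with a second snapshot one cadence later.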

What I'd take a stab at

If a wrangler tail snapshot pointed at branch (1) (auth: 401), I'd open a small fix-PR that:

  • Adds a one-shot, warn-level tenero.api_key_missing log in SchedulerDO.constructor so future deploys without the binding fail visibly
  • Adds a single test in lib/scheduler/__tests__/tenero-task.test.ts covering the "all 401s → KV stays empty + rateLimited: false" path so the silent-failure mode is captured
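A rough sketch of the one-shot warning from the first bullet, assuming the key is exposed as env.TENERO_API_KEY and that the logger accepts an event name. The makeApiKeyWarner helper and the logger signature are hypothetical, not existing code, and the in-memory flag only guards a single instance lifetime.

```typescript
// Assumed env shape: only the binding this sketch cares about.
interface SchedulerEnv {
  TENERO_API_KEY?: string;
}

// Hypothetical helper: returns a checker that warns at most once per instance.
function makeApiKeyWarner(
  warn: (event: string) => void,
): (env: SchedulerEnv) => void {
  let warned = false;
  return (env) => {
    // Skip if the key is present (non-empty) or the warning already fired.
    if (warned || env.TENERO_API_KEY) return;
    warned = true;
    warn("tenero.api_key_missing");
  };
}

// Usage: collect emitted events in an array instead of a real logger.
const events: string[] = [];
const checkApiKey = makeApiKeyWarner((event) => events.push(event));
checkApiKey({}); // logs once
checkApiKey({}); // already warned, stays quiet
console.log(events);
```

Keeping the flag in a closure rather than in DO storage is deliberate in this sketch: a fresh instance after a redeploy would warn again, which is the visibility the bullet is after.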

If branch (2) (alarm never fired), the fix is in the app/leaderboard/page.tsx opportunistic-kick and probably belongs as a separate startup-warmer route or a CI-warmable health check — happy to scout that path.

Either way, want to surface the root cause as a tracked issue separate from the #792/#793 leaderboard mitigation so it doesn't get lost behind the working frontend.
