fix(traffic): bound the Vercel sync drain so a dense/slow window can't wedge a source #681
Open
arberx wants to merge 1 commit into
…t wedge a source

A drifted watermark or a dense/slow request-logs window made the synchronous Vercel sync drain run for many minutes, timing out the caller and leaving the run stuck 'running', with the source ingesting nothing until a manual reset.

Two bounds on the incremental Vercel sync, on top of the existing per-fetch 30s timeout + retry (which never bounded the TOTAL drain):

- Window cap (VERCEL_MAX_SYNC_WINDOW_MS = 24h): clamp the start forward so a watermark that drifted past the cap can't request a multi-day pull. The skipped span is surfaced (warn), not silent; a backfill recovers it.
- Drain wall-clock deadline (DEFAULT_VERCEL_SYNC_DEADLINE_MS = 4m, override via vercelSyncDeadlineMs): the adaptive drain stops before a sub-window once the budget elapses and reports how far it got. The route commits that partial window and advances lastSyncedAt only to there (the additive rollup makes a partial window safe), so the next sync resumes from the boundary instead of one sync grinding unbounded. If nothing drained before the budget, the run fails (visible) instead of orphaning a 'running' row.

No API surface change (internal options only), so no SDK regen. 4.70.0 -> 4.70.1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Problem
A Vercel traffic sync drains its `request-logs` window synchronously inside the sync request. Two cases made that drain run for many minutes, timing out the caller (surfacing as a misleading "could not connect") and leaving the run stuck `running` while the source ingested nothing until a manual `reset`:

- A dense or slow `request-logs` window.
- A drifted watermark (the `traffic-sync` schedule was paused/missing) requests a pull back to the 30-day default window.

The adapter already had a per-fetch 30s timeout + retry, but nothing bounded the TOTAL drain; the only limit was a cap of 5000 sub-windows.
Fix
Two bounds on the incremental Vercel sync:

- Window cap (`VERCEL_MAX_SYNC_WINDOW_MS = 24h`): clamp the start forward so a drifted watermark can't request a multi-day pull. The skipped pre-cap span is surfaced via `warn` (a backfill recovers it), never silently dropped.
- Drain wall-clock deadline (`DEFAULT_VERCEL_SYNC_DEADLINE_MS = 4m`, override `vercelSyncDeadlineMs`): `drainVercelTrafficEvents` stops before starting a sub-window once the budget elapses and reports `drainedThroughMs` (the last fully-drained boundary). The route then:
  - commits the partial window and advances `lastSyncedAt` only to `drainedThroughMs` (the incremental rollup is additive, so a partial window is safe), so the next sync resumes from the boundary; a dense backlog converges over several syncs instead of one unbounded grind;
  - fails the run (visibly) if nothing drained before the budget, instead of orphaning a `running` row.

Retention handling is unchanged and still takes precedence (a clamped-to-tail window fails so `lastSyncedAt` never advances across missing history).

No API surface change (internal route/adapter options only), so no SDK regen. Patch bump 4.70.0 → 4.70.1.
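For reviewers who want the two bounds in one place, here is a minimal sketch of the idea, not the code in this PR. Only the two constants and the `drainedThroughMs` / `deadlineReached` field names come from the description above. `clampSyncWindow`, `drainWithDeadline`, `settleRun`, the fixed `stepMs` stepping (the real drain is described as adaptive), and the synchronous `pullSubWindow` callback (a real adapter would presumably await a network fetch) are all assumptions made for illustration.

```typescript
// Illustrative sketch only; helper names and shapes are hypothetical.
const VERCEL_MAX_SYNC_WINDOW_MS = 24 * 60 * 60 * 1000; // 24h, per the PR text
const DEFAULT_VERCEL_SYNC_DEADLINE_MS = 4 * 60 * 1000; // 4m, per the PR text

interface ClampedWindow {
  startMs: number;
  endMs: number;
  skippedMs: number; // pre-cap span that was skipped; a backfill would recover it
}

// Window cap: clamp the start forward so the pull never spans more than the cap.
function clampSyncWindow(
  lastSyncedAtMs: number,
  nowMs: number,
  maxWindowMs: number = VERCEL_MAX_SYNC_WINDOW_MS,
): ClampedWindow {
  const startMs = Math.max(lastSyncedAtMs, nowMs - maxWindowMs);
  return { startMs, endMs: nowMs, skippedMs: startMs - lastSyncedAtMs };
}

interface DrainResult {
  drainedThroughMs: number; // last fully drained boundary
  deadlineReached: boolean;
}

// Drain deadline: check the wall-clock budget BEFORE starting each sub-window,
// so a sub-window is either fully drained or not started at all.
function drainWithDeadline(
  startMs: number,
  endMs: number,
  stepMs: number, // assumption: fixed-size steps instead of adaptive splitting
  pullSubWindow: (fromMs: number, toMs: number) => void,
  deadlineMs: number = DEFAULT_VERCEL_SYNC_DEADLINE_MS,
  now: () => number = Date.now, // injectable clock, handy for tests
): DrainResult {
  const startedAt = now();
  let cursorMs = startMs;
  while (cursorMs < endMs) {
    if (now() - startedAt >= deadlineMs) {
      return { drainedThroughMs: cursorMs, deadlineReached: true };
    }
    const toMs = Math.min(cursorMs + stepMs, endMs);
    pullSubWindow(cursorMs, toMs);
    cursorMs = toMs;
  }
  return { drainedThroughMs: cursorMs, deadlineReached: false };
}

type RunOutcome =
  | { status: "completed" | "partial"; advanceLastSyncedAtTo: number }
  | { status: "failed" }; // watermark is left untouched

// Route-side decision: commit whatever was drained, or fail visibly on zero progress.
function settleRun(windowStartMs: number, result: DrainResult): RunOutcome {
  if (result.deadlineReached && result.drainedThroughMs <= windowStartMs) {
    return { status: "failed" };
  }
  return {
    status: result.deadlineReached ? "partial" : "completed",
    advanceLastSyncedAtTo: result.drainedThroughMs,
  };
}
```

Checking the budget before a sub-window, rather than aborting one mid-flight, is what keeps `drainedThroughMs` a clean boundary that the next sync can resume from.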
Tests
- `integration-vercel` drain unit: a full drain reports `drainedThroughMs == endDate` and `deadlineReached: false`; an already-passed deadline stops before the first pull (zero progress); a mid-window deadline stops with partial progress and reports the boundary.
- `api-routes` route: a zero budget fails the run without advancing the watermark; a 5-day-drifted watermark is capped to the last 24h.

Full suite 1140 passing; typecheck clean; 0 lint errors.
Follow-up (not in this PR)
The sync route is still synchronous: a manual `traffic sync` CLI call can out-wait its own HTTP timeout even though the daemon now bounds and completes the work. Making the sync route async (return `runId` immediately, like backfill does) would remove the misleading "could not connect" entirely; worth a separate change.

🤖 Generated with Claude Code