fix: Add stream retry with exponential backoff to AnthropicProvider #1229
Conversation
Issue: #1228

Problem: AnthropicProvider had retryWithBackoff around the initial API call but no retry logic for mid-stream errors during stream consumption. Transient network errors (like 'terminated') that occurred during streaming would break the agent loop instead of being retried.

Solution:
- Added stream retry constants (matching the OpenAIResponsesProvider pattern):
  - STREAM_RETRY_INITIAL_DELAY_MS = 5000ms (5 seconds)
  - STREAM_RETRY_MAX_DELAY_MS = 30000ms (30 seconds)
  - STREAM_RETRY_MAX_ATTEMPTS = 4 (consistent retry count for Anthropic)
- Imported the delay utility for exponential backoff
- Wrapped the entire stream consumption in a retry loop (lines 1948-2260):
  - Tracks a streamingAttempt counter and currentDelay for backoff
  - On a transient stream error: checks if the error is retryable via isNetworkTransientError
  - If retryable and attempts remain: logs, delays with exponential backoff + jitter, makes a fresh API call to get a new stream, then restarts consumption
  - If not retryable or attempts are exhausted: throws the error
- Key implementation details:
  - When the stream fails mid-consumption, makes a new API call (apiCallWithResponse) to get a fresh stream from the Anthropic SDK
  - Uses exponential backoff with 30% jitter to avoid a thundering herd
  - Matches the OpenAIResponsesProvider pattern (PR #1227) for consistency

Testing:
- Created AnthropicProvider.streamRetry.test.ts with comprehensive test coverage:
  - Test: retries when streaming terminates after a successful API call
  - Test: waits with exponential backoff between retries (validates that the backoff increases)
  - Test: fails after max attempts with the error thrown (validates 4 attempts)
  - Test: throws non-retryable errors immediately (no retry for a bad request)

All tests pass, including the full verification cycle:
- npm run test (all 5555 tests pass)
- npm run lint (clean)
- npm run typecheck (clean)
- npm run format (clean)
- npm run build (clean)
- node scripts/start.js --profile-load synthetic "write me a haiku" (works)

Fixes #1228
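A minimal, self-contained sketch of the retry loop the description above walks through. The provider's real code wraps its actual stream consumption and uses the project's `isNetworkTransientError` and `delay` helpers, so every name below (`consumeStreamWithRetry`, `makeApiCall`, `isTransient`, `StreamEvent`) is a stand-in rather than the shipped implementation:

```typescript
type StreamEvent = { text: string };

async function consumeStreamWithRetry(
  makeApiCall: () => Promise<AsyncIterable<StreamEvent>>,
  maxAttempts = 4,
  initialDelayMs = 5000,
  maxDelayMs = 30000,
): Promise<string> {
  let attempt = 0;
  let currentDelay = initialDelayMs;

  while (true) {
    attempt += 1;
    let output = '';
    try {
      // A broken stream cannot be resumed, so each attempt makes a
      // fresh API call and restarts consumption from the beginning.
      const stream = await makeApiCall();
      for await (const event of stream) {
        output += event.text;
      }
      return output; // stream completed without a mid-stream error
    } catch (err) {
      if (!isTransient(err) || attempt >= maxAttempts) {
        throw err; // non-retryable, or attempts exhausted
      }
      // Exponential backoff with +/-30% jitter to avoid a thundering herd.
      const jitter = currentDelay * 0.3 * (Math.random() * 2 - 1);
      await new Promise((resolve) =>
        setTimeout(resolve, Math.max(0, currentDelay + jitter)),
      );
      currentDelay = Math.min(currentDelay * 2, maxDelayMs);
    }
  }
}

// Simplified stand-in for the project's transient-error check.
function isTransient(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return /terminated|ECONNRESET|ETIMEDOUT|socket hang up/i.test(msg);
}
```

The key design point called out in the description is that a retry does not try to continue the old iterator; it requests a brand-new stream from the SDK and re-consumes it.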
🚥 Pre-merge checks: 4 passed
LLxprt PR Review – PR #1229
Title: fix: Add stream retry with exponential backoff to AnthropicProvider

Issue Alignment
Meets requirements. The PR directly addresses issue #1228 by implementing stream retry with exponential backoff for AnthropicProvider, mirroring the fix from PR #1227 for OpenAIResponsesProvider. Key evidence:
Side Effects
Code Quality
Tests and Coverage
Coverage impact: Increase
Tests are behavioral (testing API calls, delays, error handling) rather than mock theater.

Verdict
Ready – The implementation correctly addresses issue #1228 with comprehensive test coverage. The changes are consistent with the pattern established in PR #1227 and properly respect user-configured ephemeral settings.
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-24.x-ubuntu-latest' artifact from the main CI run.
…ded constants

Stream retry for both OpenAIResponsesProvider and AnthropicProvider now reads retries/retrywait from ephemeral settings (user-configurable profile settings), consistent with how fetch-level retries already work.

OpenAIResponsesProvider:
- Reads retries/retrywait from options.invocation.ephemerals
- Defaults: isCodex ? 5 : 4 attempts, 5000ms initial delay
- Passes the settings to the inner retryWithBackoff call

AnthropicProvider:
- Reuses maxAttempts/initialDelayMs from getRetryConfig(), which already reads from ephemeral settings (default: 6 attempts, 4000ms)
- Removed the redundant hardcoded STREAM_RETRY_* constants

Tests updated to verify that ephemeral settings are respected.
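A rough sketch of the settings precedence this commit describes. The exact shape of `options.invocation.ephemerals` and the `isCodex` flag are assumptions here; only the resolution order (user-configured value first, then the provider default) is the point:

```typescript
interface EphemeralSettings {
  retries?: number;   // user-configured max attempts ('retries')
  retrywait?: number; // user-configured initial delay in ms ('retrywait')
}

function resolveStreamRetryConfig(
  ephemerals: EphemeralSettings | undefined,
  isCodex: boolean,
): { maxAttempts: number; initialDelayMs: number } {
  // Defaults quoted in the commit message for OpenAIResponsesProvider.
  const defaultAttempts = isCodex ? 5 : 4;
  const defaultDelayMs = 5000;
  return {
    maxAttempts: ephemerals?.retries ?? defaultAttempts,
    initialDelayMs: ephemerals?.retrywait ?? defaultDelayMs,
  };
}

// A profile that sets retries=6 and retrywait=2000 overrides both defaults.
console.log(resolveStreamRetryConfig({ retries: 6, retrywait: 2000 }, false));
// -> { maxAttempts: 6, initialDelayMs: 2000 }
```

For AnthropicProvider the same values come from the existing getRetryConfig() rather than a new lookup, which is why the hardcoded STREAM_RETRY_* constants could be removed.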
TLDR
Fixes the issue where transient network errors (like `terminated`, `connection reset`, etc.) during AnthropicProvider stream consumption were breaking the agent loop instead of being properly retried with exponential backoff. This mirrors the fix from PR #1227 for OpenAIResponsesProvider.
Both providers now read retry configuration from ephemeral settings (`retries` and `retrywait`) instead of hardcoded constants, ensuring consistency with fetch-level retry behavior and respecting user-configured profile settings. Bucket failover via `onPersistent429` is preserved in Anthropic stream retries.
Dive Deeper
Problem
The AnthropicProvider had `retryWithBackoff` around the initial API call, but when streaming is enabled, the stream consumption loop only caught errors and re-threw them, with no retry or backoff. Additionally, the initial implementation hardcoded retry constants instead of reading from the ephemeral settings that users configure in profiles.
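For contrast, a minimal sketch of the pre-fix shape described above (none of these names are the provider's real code): an error thrown mid-iteration is simply rethrown, so one transient failure ends the whole turn.

```typescript
// Stand-in for a stream that dies with a transient network error.
async function* flakyStream(): AsyncGenerator<string> {
  yield 'partial text';
  throw new Error('terminated');
}

async function consumeOnce(): Promise<void> {
  try {
    for await (const chunk of flakyStream()) {
      console.log('chunk:', chunk);
    }
  } catch (err) {
    // Pre-fix behavior: rethrow with no retry and no backoff,
    // even though a fresh stream would likely have succeeded.
    throw err;
  }
}

consumeOnce().catch((err) => console.error('agent loop broken:', err));
```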
Solution
AnthropicProvider stream retry - Wrapped stream consumption in a retry loop:
Ephemeral settings integration - Both providers now read from user-configurable settings:
Exponential backoff with jitter: same formula as `retryWithBackoff`: jitter = currentDelay * 0.3 * (Math.random() * 2 - 1); the delay doubles on each retry and is capped at 30s (a worked delay schedule is sketched after this list)
Bucket failover preserved: Anthropic stream retry passes `onPersistent429Callback` to the inner `retryWithBackoff` call, so load-balanced/bucketed profiles work correctly
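A worked view of the delay schedule that formula produces, assuming the 5000ms initial delay and 30000ms cap quoted earlier in this PR. Note that the jitter is applied to the capped nominal delay, so a jittered value can slightly exceed the cap:

```typescript
function backoffSchedule(
  initialDelayMs = 5000,
  maxDelayMs = 30000,
  attempts = 4,
): string[] {
  const rows: string[] = [];
  let delay = initialDelayMs;
  for (let i = 1; i <= attempts; i++) {
    // +/-30% jitter around the nominal delay for this attempt.
    const low = Math.round(delay * 0.7);
    const high = Math.round(delay * 1.3);
    rows.push(`attempt ${i}: ~${delay}ms (jittered range ${low}-${high}ms)`);
    delay = Math.min(delay * 2, maxDelayMs); // double, capped at 30s
  }
  return rows;
}

console.log(backoffSchedule().join('\n'));
// attempt 1: ~5000ms (jittered range 3500-6500ms)
// attempt 2: ~10000ms (jittered range 7000-13000ms)
// attempt 3: ~20000ms (jittered range 14000-26000ms)
// attempt 4: ~30000ms (jittered range 21000-39000ms)
```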
Reviewer Test Plan
Testing Matrix
Linked issues / bugs
closes #1228