Skip to content

fix: improve account rotation with request timeouts and interruption safety#523

Open
9nunya wants to merge 8 commits intoNoeFabris:mainfrom
9nunya:fix/account-rotation-timeout-hang
Open

fix: improve account rotation with request timeouts and interruption safety#523
9nunya wants to merge 8 commits intoNoeFabris:mainfrom
9nunya:fix/account-rotation-timeout-hang

Conversation

@9nunya
Copy link

@9nunya 9nunya commented Mar 2, 2026

Summary

This PR addresses the "stuck request" and "interruption freeze" issues in the Antigravity plugin.

  • Request Timeouts: Introduced request_timeout_seconds (default: 600s) to prevent the plugin from hanging indefinitely on "stuck" accounts that accept connections but never produce output.
  • Interruption Safety: Modified the request loop to immediately exit upon receiving a user AbortSignal (e.g., pressing ESC). This fixes the "spin loop" that caused 100% CPU usage, memory leaks, and TUI/Web server crashes.
  • Proactive Quota Management: Added a mechanism to refresh the entire account pool's quotas if more than 75% of accounts are rate-limited or over soft-quota. This ensures the plugin recovers as soon as any account reset occurs.
  • Automatic Account Rotation: Accounts that time out are now penalized in the HealthScoreTracker and automatically rotated to the next available account.

Why

  • Indefinite Deadlocks: Hanging requests on certain Antigravity endpoints were previously causing permanent stalls.
  • Memory Leak & Crash: On user interruption, the catch-and-retry logic entered a high-speed spin loop because the AbortSignal remained in the aborted state.
  • Blind Quota Blocking: Quota resets were only detected opportunistically, requiring a manual Check quotas run to unblock a pool that had already reset at the provider level.

Changes

  • src/plugin/config/schema.ts: Added request_timeout_seconds configuration.
  • src/plugin.ts:
    • Implemented AbortSignal.any() to combine timeout and user interruption signals.
    • Added explicit check for AbortError to immediately break the retry loop on user action.
    • Implemented automatic rotation and health penalty for timed-out requests.
  • src/plugin/accounts.ts: Added shouldRefreshAllQuotas() heuristic and pool update logic.
  • src/plugin/quota.ts: Implemented triggerAsyncQuotaRefreshForAll() for proactive pool management.

Testing

  • Build: npm run build
  • Manual Verification: Symlinked the local build to ~/.cache/opencode/node_modules/opencode-antigravity-auth and verified:
    • Immediate TUI recovery on ESC (no freeze).
    • Automatic rotation on simulated 5s timeouts.
    • Proactive pool refresh when multiple accounts are blocked.

…safety

- Implement AbortSignal.timeout (180s default) to prevent hanging on stuck accounts
- Fix TUI/web server freeze by immediately exiting request loop on user interrupt
- Add proactive pool quota refresh when majority of accounts are rate-limited
- MARK unhealthy accounts on timeout to trigger automatic rotation
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds proactive background quota refresh with cooldown and a new exported trigger triggerAsyncQuotaRefreshForAll. AccountManager.updateQuotaCache now keys by refreshToken and persists changes; AccountManager.shouldRefreshAllQuotas was added. A Node.js AbortSignal.any polyfill was included and per-request timeouts are introduced via request_timeout_seconds (default 600), merging caller abort signals with a timeout signal. Request paths now explicitly handle aborts/timeouts (account rotation, logging, toasts, cooldown persistence). Tests updated for token-keyed quotas.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: introducing request timeouts and improving interruption safety for account rotation.
Description check ✅ Passed The description clearly explains the problem being solved (stuck requests, interruption freezes, blind quota blocking) and maps changes to specific files and features.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

src/plugin.ts Outdated
const timeoutMs = (config.request_timeout_seconds ?? 180) * 1000;
const timeoutSignal = AbortSignal.timeout(timeoutMs);
const combinedSignal = abortSignal
? (AbortSignal as any).any([abortSignal, timeoutSignal])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL: AbortSignal.any() compatibility issue - may cause runtime errors

AbortSignal.any() was added in Node.js 20.17.0, but this project supports Node.js >=20.0.0. Users on Node.js 20.0.0 - 20.16.x will encounter a runtime error.

Suggested change
? (AbortSignal as any).any([abortSignal, timeoutSignal])
// Create a combined signal for timeout and user abort
const timeoutMs = (config.request_timeout_seconds ?? 180) * 1000;
const timeoutSignal = AbortSignal.timeout(timeoutMs);
const combinedSignal = abortSignal
? (abortSignal.any ? abortSignal.any([abortSignal, timeoutSignal]) : mergeAbortSignals(abortSignal, timeoutSignal))
: timeoutSignal;

Alternatively, you could add a polyfill at the top of the file:

// Polyfill for older Node.js versions
if (typeof AbortSignal.any === 'undefined') {
  AbortSignal.any = function(signals: AbortSignal[]): AbortSignal {
    const controller = new AbortController();
    for (const signal of signals) {
      if (signal.aborted) {
        controller.abort(signal.reason);
        break;
      }
      signal.addEventListener('abort', () => {
        controller.abort(signal.reason);
      }, { once: true });
    }
    return controller.signal;
  };
}

@kilo-code-bot
Copy link
Contributor

kilo-code-bot bot commented Mar 2, 2026

Code Review Summary

Status: No New Issues Found | Recommendation: Merge

Overview

This PR introduces timeout handling for API requests with proper Node.js version compatibility fallbacks. The changes address several issues raised in previous reviews.

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0

Changes Verified

  • AbortSignal.timeout() fallback: The new code at lines 2107-2113 properly checks for AbortSignal.timeout availability and falls back to setTimeout + AbortController for older Node.js versions
  • Timeout scheduling in fallback: The fallback path now correctly schedules timeout using setTimeout(() => controller.abort(new Error('Timeout')), effectiveTimeoutMs)
  • Timeout error handling: New code properly detects timeout errors (AbortError/TimeoutError) and rotates to the next account
  • API modernization: updateQuotaCache now uses string refreshToken instead of numeric index, which is a more robust approach
Files Reviewed (4 files)
  • src/plugin.ts - Timeout handling with fallback, error handling
  • src/plugin/accounts.ts - API change: updateQuotaCache signature, new shouldRefreshAllQuotas method
  • src/plugin/config/schema.ts - New request_timeout_seconds config option (default 600s)
  • src/plugin/quota.ts - New triggerAsyncQuotaRefreshForAll function

Pre-existing Issues (Not in This Diff)

The following issues from the existing comments table are in code that existed before this PR and are NOT addressed by this diff:

  • mergeAbortSignals cleanup registration (line 144-145 in src/plugin.ts) - early return on aborted signal bypasses cleanup
  • Schema default discrepancy (180s vs 600s) - existing comment

These are architectural issues in the existing codebase that would require separate PRs to fix.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin/accounts.ts`:
- Around line 1171-1184: shouldRefreshAllQuotas currently only counts
rate-limited (acc.rateLimitResetTimes) and cooling-down (acc.coolingDownUntil)
accounts, so it misses accounts blocked by soft quota; update the blockedCount
filter inside shouldRefreshAllQuotas (which uses getEnabledAccounts) to also
treat accounts as blocked if their soft-quota marker is in the future (e.g.,
check a field like acc.softQuotaUntil !== undefined && acc.softQuotaUntil > now
or the actual soft-quota timestamp property your Account object uses) and
include that condition in the returned (blockedCount / enabled.length)
calculation.

In `@src/plugin/quota.ts`:
- Around line 421-424: The loop writing async refresh results uses positional
result.index which can become stale; change the logic in the results handling so
you locate the target account by a stable identifier instead of result.index:
have the refresher include a stable accountId (or accountKey) in each result,
then replace accountManager.updateQuotaCache(result.index, ...) with a lookup
that finds the account (e.g., accountManager.findById(accountId) or a new
updateQuotaCacheById(accountId, groups)) and apply groups to that account;
update the producer of results to populate result.accountId if needed and update
callers of updateQuotaCache/its signature accordingly so writes use stable IDs,
not snapshot positions.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between f0ee206 and 52d9b51.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (4)
  • src/plugin.ts
  • src/plugin/accounts.ts
  • src/plugin/config/schema.ts
  • src/plugin/quota.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
🧬 Code graph analysis (1)
src/plugin/quota.ts (1)
src/plugin/types.ts (1)
  • PluginClient (42-42)
🔇 Additional comments (8)
src/plugin/config/schema.ts (2)

271-279: request_timeout_seconds schema is well-bounded and documented.

The new config field is clearly constrained and matches the intended timeout behavior.


476-476: Default value alignment looks correct.

DEFAULT_CONFIG.request_timeout_seconds matches the schema default, preventing config drift.

src/plugin/accounts.ts (1)

1158-1164: Quota cache persistence on update is a good reliability improvement.

This prevents refreshed quota data from being lost across restarts.

src/plugin/quota.ts (1)

13-17: Pool refresh cooldown guard is a solid anti-thrashing addition.

The cooldown prevents repeated full-pool refresh bursts under heavy retry pressure.

src/plugin.ts (4)

47-47: Import/type updates are clean and improve clarity.

triggerAsyncQuotaRefreshForAll wiring and explicit AccountManager | null typing are both good changes.

Also applies to: 93-93


2025-2029: Proactive pool-refresh trigger placement is good.

Triggering refresh right before request execution helps recover from pooled blocking without stalling the foreground path.


2422-2444: Abort-vs-timeout split is a strong fix.

Immediate rethrow on user abort plus timeout-based account rotation directly addresses interruption freezes and stuck-account retries.


2031-2036: The code already guards AbortSignal.any with a module-level polyfill.

A compatibility polyfill exists at lines 89–111 that checks if (typeof (AbortSignal as any).any !== "function") and provides a fallback implementation using AbortController. This guard runs at module initialization before any code at line 2061 executes, ensuring AbortSignal.any is always available despite the Node.js >=20.0.0 constraint (which predates the 20.17.0 release when AbortSignal.any was added). The existing polyfill implementation is correct and handles all necessary edge cases.

Likely an incorrect or invalid review comment.

src/plugin.ts Outdated

// Create a combined signal for timeout and user abort
const timeoutMs = (config.request_timeout_seconds ?? 180) * 1000;
const timeoutSignal = AbortSignal.timeout(timeoutMs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRITICAL: AbortSignal.timeout() compatibility issue - will cause runtime errors

AbortSignal.timeout() was added in Node.js 20.11.0, but this project supports Node.js >=20.0.0. Users on Node.js 20.0.0 - 20.10.x will encounter a TypeError: AbortSignal.timeout is not a function runtime error.

Suggested change
const timeoutSignal = AbortSignal.timeout(timeoutMs);
// Create a combined signal for timeout and user abort
const timeoutMs = (config.request_timeout_seconds ?? 180) * 1000;
const timeoutSignal = (typeof AbortSignal.timeout === 'function')
? AbortSignal.timeout(timeoutMs)
: createTimeoutSignal(timeoutMs);
const combinedSignal = abortSignal
? (AbortSignal as any).any([abortSignal, timeoutSignal])
: timeoutSignal;

You'll need to add a createTimeoutSignal polyfill function alongside the existing AbortSignal.any() polyfill.

@9nunya
Copy link
Author

9nunya commented Mar 2, 2026

Fixing, just wanna note: this PR has improved my experience with the plugin, before i have come back to opencode being idle forever and causing memory leaks that put my system in a OOM freeze, with this fix, abort works when plugin is in the "retry" cycle forever, and the retry cycle is more forgiving / stable, leading to my sessions never freezing forever

- AccountManager.shouldRefreshAllQuotas now includes soft quota checks in its blockedCount calculation
- AccountManager.updateQuotaCache now uses stable refreshToken instead of stale positional index
- Quota results now include refreshToken for reliable identification during async refreshes
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin.ts`:
- Around line 2448-2470: Move the abort-signal check to run before any
error-type gating inside the error handling section: immediately check if
abortSignal?.aborted and, if so, call pushDebug("user-interrupted: stopping
request loop") and rethrow the current error to stop the loop; do this prior to
the existing instanceof Error && name ("AbortError"/"TimeoutError") branch so
user interruptions are detected regardless of error classification (affects the
error handler using abortSignal, pushDebug, shouldSwitchAccount, lastError,
getHealthTracker, and accountManager).

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 52d9b51 and 3757ec3.

📒 Files selected for processing (1)
  • src/plugin.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
🧬 Code graph analysis (1)
src/plugin.ts (3)
src/plugin/accounts.ts (1)
  • AccountManager (298-1264)
src/plugin/quota.ts (1)
  • triggerAsyncQuotaRefreshForAll (404-430)
src/plugin/rotation.ts (1)
  • getHealthTracker (441-446)
🔇 Additional comments (5)
src/plugin.ts (5)

47-47: Proactive quota refresh import is correctly wired.

triggerAsyncQuotaRefreshForAll is imported and then used in the request path, which keeps this change cohesive.


83-107: Compatibility polyfill looks good.

This safely backfills AbortSignal.any() for older Node 20.x runtimes and handles already-aborted signals correctly.


119-119: Type narrowing for activeAccountManager is clear and correct.

Explicit AccountManager | null keeps module state intent readable and type-safe.


2051-2055: Proactive refresh trigger is well-placed.

Running the pool refresh check pre-request with a non-blocking call is a solid resilience improvement.


2057-2067: Timeout + user-abort signal composition is correct.

This is the right pattern to avoid stuck requests while still honoring user interruption.

9nunya added 2 commits March 2, 2026 05:12
- Increase default request_timeout_seconds to 600s (10 mins)
- Apply 3x multiplier for streaming requests (max 30 mins) to prevent premature timeouts on active streams
- Fix actualTimeoutSec logic in catch block by moving declaration outside try scope
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/plugin/accounts.ts (2)

1184-1193: Comment wording should match threshold semantics.

Implementation uses >= 0.75 but the comment says “more than 75%”. Suggest updating wording to “75% or more” for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin/accounts.ts` around lines 1184 - 1193, The comment above the
blocked-count check is inconsistent with the condition: the code uses >= 0.75
(75% or more) but the comment says “more than 75%”; update the comment to say
“75% or more” (or alternatively change the comparison to > 0.75 if you truly
mean strictly greater than 75%) so the wording matches the implementation in the
function that calculates blockedCount and returns (blockedCount /
enabled.length) >= 0.75.

1185-1190: Blocked-account detection should be family-aware when family is provided.

Current logic treats any active rate-limit key as blocked. For family-scoped checks, this can overcount (e.g., one Gemini style limited while the requested family/model is still usable), causing unnecessary pool-wide refreshes.

♻️ Suggested adjustment
 const blockedCount = enabled.filter(acc => {
-  const isRateLimited = Object.values(acc.rateLimitResetTimes).some(t => t !== undefined && t > now);
+  clearExpiredRateLimits(acc);
+  const isRateLimited = family
+    ? isRateLimitedForFamily(acc, family, model)
+    : Object.values(acc.rateLimitResetTimes).some(t => t !== undefined && t > now);
   const isCoolingDown = acc.coolingDownUntil !== undefined && acc.coolingDownUntil > now;
   const isOverSoftQuota = family ? isOverSoftQuotaThreshold(acc, family, thresholdPercent, cacheTtlMs, model) : false;
   return isRateLimited || isCoolingDown || isOverSoftQuota;
 }).length;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin/accounts.ts` around lines 1185 - 1190, The blockedCount
computation is overcounting when a family is provided because isRateLimited
checks all acc.rateLimitResetTimes keys instead of only those relevant to the
requested family/model; update the predicate inside enabled.filter (where
blockedCount is computed) so that isRateLimited only considers
rateLimitResetTimes entries that apply to the given family or model (e.g.,
filter keys by family/model-specific keys or call a new helper like
isRateLimitedForFamily(account, family, model) ), keep the existing
isCoolingDown and isOverSoftQuota logic, and ensure the check uses the same now
timestamp and threshold arguments as isOverSoftQuotaThreshold to avoid counting
unrelated rate-limit keys as blocked.
src/plugin.ts (2)

2052-2056: Run proactive pool-refresh check once per endpoint cycle.

This condition currently executes for every fallback endpoint attempt. Restricting it to the first endpoint attempt avoids redundant checks and repeated debug noise.

⚙️ Suggested patch
- if (accountManager.shouldRefreshAllQuotas(family, config.soft_quota_threshold_percent, softQuotaCacheTtlMs, model)) {
+ if (i === 0 && accountManager.shouldRefreshAllQuotas(family, config.soft_quota_threshold_percent, softQuotaCacheTtlMs, model)) {
   pushDebug("proactive-quota-refresh: pool is mostly blocked, refreshing all");
   void triggerAsyncQuotaRefreshForAll(accountManager, client, providerId);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin.ts` around lines 2052 - 2056, The proactive quota-refresh block
calling accountManager.shouldRefreshAllQuotas(...) and
triggerAsyncQuotaRefreshForAll(...) should only run once per endpoint cycle;
wrap that existing block with a guard that ensures it executes only on the first
endpoint attempt (for example check an existing attempt-index or add an
isFirstEndpointAttempt flag and require attemptIndex === 0) so the check and
pushDebug("proactive-quota-refresh: pool is mostly blocked, refreshing
all")/triggerAsyncQuotaRefreshForAll(...) are not executed on every fallback
endpoint try.

2468-2476: Persist timeout cooldown state after marking account as cooling down.

This branch mutates account availability but doesn’t request persistence; adding a save request makes timeout penalties survive process restarts.

💾 Suggested patch
   getHealthTracker().recordFailure(account.index);
   accountManager.markAccountCoolingDown(account, 60000, "network-error");
+  accountManager.requestSaveToDisk();
   
   await showToast(
     `⏳ Account stuck (${actualTimeoutSec}s). Rotating to next available...`,
     "warning"
   );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin.ts` around lines 2468 - 2476, After calling
accountManager.markAccountCoolingDown(account, 60000, "network-error") persist
that state so the cooldown survives restarts: immediately invoke the account
manager's persistence/save method (e.g., accountManager.saveAccounts() or
accountManager.persistAccounts()/requestSave() depending on the API) right after
markAccountCoolingDown, before showing the toast and setting
shouldSwitchAccount/lastError, so the timeout penalty is written to storage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin/accounts.test.ts`:
- Around line 1871-1885: The test title and expectation disagree: the spec says
"more than 75%" but the fixture has exactly 75% rate-limited accounts and
expects true; update either the test title or the fixture to match intended
behavior. Either rename the test string to "returns true when 75% or more
accounts are rate-limited" to reflect the current fixture, or change the stored
fixture in the test to make >75% (e.g., mark the fourth account as rate-limited)
so the case exercises AccountManager.shouldRefreshAllQuotas() for strictly
greater than 75%.

---

Nitpick comments:
In `@src/plugin.ts`:
- Around line 2052-2056: The proactive quota-refresh block calling
accountManager.shouldRefreshAllQuotas(...) and
triggerAsyncQuotaRefreshForAll(...) should only run once per endpoint cycle;
wrap that existing block with a guard that ensures it executes only on the first
endpoint attempt (for example check an existing attempt-index or add an
isFirstEndpointAttempt flag and require attemptIndex === 0) so the check and
pushDebug("proactive-quota-refresh: pool is mostly blocked, refreshing
all")/triggerAsyncQuotaRefreshForAll(...) are not executed on every fallback
endpoint try.
- Around line 2468-2476: After calling
accountManager.markAccountCoolingDown(account, 60000, "network-error") persist
that state so the cooldown survives restarts: immediately invoke the account
manager's persistence/save method (e.g., accountManager.saveAccounts() or
accountManager.persistAccounts()/requestSave() depending on the API) right after
markAccountCoolingDown, before showing the toast and setting
shouldSwitchAccount/lastError, so the timeout penalty is written to storage.

In `@src/plugin/accounts.ts`:
- Around line 1184-1193: The comment above the blocked-count check is
inconsistent with the condition: the code uses >= 0.75 (75% or more) but the
comment says “more than 75%”; update the comment to say “75% or more” (or
alternatively change the comparison to > 0.75 if you truly mean strictly greater
than 75%) so the wording matches the implementation in the function that
calculates blockedCount and returns (blockedCount / enabled.length) >= 0.75.
- Around line 1185-1190: The blockedCount computation is overcounting when a
family is provided because isRateLimited checks all acc.rateLimitResetTimes keys
instead of only those relevant to the requested family/model; update the
predicate inside enabled.filter (where blockedCount is computed) so that
isRateLimited only considers rateLimitResetTimes entries that apply to the given
family or model (e.g., filter keys by family/model-specific keys or call a new
helper like isRateLimitedForFamily(account, family, model) ), keep the existing
isCoolingDown and isOverSoftQuota logic, and ensure the check uses the same now
timestamp and threshold arguments as isOverSoftQuotaThreshold to avoid counting
unrelated rate-limit keys as blocked.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 3757ec3 and 203d4db.

📒 Files selected for processing (5)
  • src/plugin.ts
  • src/plugin/accounts.test.ts
  • src/plugin/accounts.ts
  • src/plugin/config/schema.ts
  • src/plugin/quota.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/plugin/config/schema.ts
  • src/plugin/quota.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
🧬 Code graph analysis (3)
src/plugin/accounts.ts (3)
src/plugin/quota.ts (2)
  • QuotaGroup (19-19)
  • QuotaGroupSummary (21-25)
src/plugin/storage.ts (1)
  • ModelFamily (130-130)
src/plugin/transform/types.ts (1)
  • ModelFamily (3-3)
src/plugin/accounts.test.ts (2)
src/plugin/accounts.ts (2)
  • resolveQuotaGroup (239-244)
  • AccountManager (298-1273)
src/plugin/storage.ts (1)
  • AccountStorageV4 (217-225)
src/plugin.ts (3)
src/plugin/accounts.ts (1)
  • AccountManager (298-1273)
src/plugin/quota.ts (1)
  • triggerAsyncQuotaRefreshForAll (407-433)
src/plugin/rotation.ts (1)
  • getHealthTracker (441-446)
🔇 Additional comments (5)
src/plugin/accounts.test.ts (3)

1583-1836: Token-keyed quota cache test migration looks good.

These updates correctly align test setup with updateQuotaCache(refreshToken, quotaGroups) and keep soft-quota scenarios readable.


1864-1867: Good addition for Gemini Pro model-group coverage.

The extra resolveQuotaGroup("gemini", "gemini-3-pro") assertion is a useful guardrail for model-family routing.


1887-1919: Coverage for pool-health heuristic is solid.

The added positive/negative cases for shouldRefreshAllQuotas (soft-quota blocked vs healthy pool) are well targeted.

src/plugin/accounts.ts (1)

1158-1167: Refresh-token keyed quota cache update is a good API correction.

This aligns persistence behavior with the new quota refresh result shape and avoids index-coupling.

src/plugin.ts (1)

2058-2074: Abort/timeout handling improvements look strong.

Combining user abort with per-request timeout and prioritizing user interruption before retry classification is a solid stability fix.

Also applies to: 2455-2478

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

This PR successfully addresses critical reliability issues in the Antigravity plugin by implementing request timeouts, user interruption safety, and proactive quota management.

Key Improvements:

  • Request Timeouts: Adds configurable timeout (default 600s) with 3x multiplier for streaming requests (capped at 30 minutes) to prevent indefinite hangs on stuck accounts
  • Interruption Safety: Implements priority checking for user abort signals before timeout handling, preventing the spin loop that caused 100% CPU usage and memory leaks
  • Proactive Quota Refresh: Automatically refreshes quota pool when 75%+ of accounts are blocked, with 1-minute cooldown to prevent API spam
  • Account Rotation: Timeouts now penalize health scores and trigger automatic rotation with disk persistence for restart resilience
  • Node.js Compatibility: Provides proper polyfills for AbortSignal.any() and AbortSignal.timeout() for Node.js 20.0-20.16

Implementation Quality:

  • Proper error handling with user interruption prioritized over timeout errors
  • RefreshToken-based quota cache updates (more robust than index-based)
  • Comprehensive test coverage including edge cases for the new heuristics
  • Disk persistence of timeout cooldowns ensures resilience across restarts

The code demonstrates solid software engineering with clear separation of concerns, proper async handling, and defensive programming patterns.

Confidence Score: 4/5

  • This PR is safe to merge with low risk - addresses critical reliability issues with well-tested implementations
  • Score reflects thorough implementation of timeout handling and interruption safety with proper polyfills, comprehensive tests, and good error handling. Minor point deducted due to AbortSignal polyfill complexity and the fact that timeout behavior under edge cases (very long streaming requests) could benefit from additional runtime validation
  • Pay attention to src/plugin.ts - the AbortSignal handling logic is critical for preventing deadlocks and memory leaks. Verify the timeout behavior works correctly in your Node.js version

Important Files Changed

Filename Overview
src/plugin/config/schema.ts Added request_timeout_seconds configuration with proper validation (30-3600s range, default 600s)
src/plugin.ts Implemented timeout handling, user interruption safety, and proactive quota refresh with proper AbortSignal polyfills
src/plugin/accounts.ts Enhanced updateQuotaCache to use refreshToken lookup and added shouldRefreshAllQuotas heuristic for pool management
src/plugin/quota.ts Added triggerAsyncQuotaRefreshForAll with cooldown protection and proper error handling

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Request Start] --> B{Create Timeout Signal}
    B -->|Native Support| C[AbortSignal.timeout]
    B -->|Fallback| D[Controller + setTimeout]
    C --> E[Merge with User Signal]
    D --> E
    E --> F[Fetch with Combined Signal]
    F -->|Success| G[Return Response]
    F -->|Error| H{Check User Abort?}
    H -->|Yes| I[Throw Immediately<br/>Prevent Spin Loop]
    H -->|No| J{AbortError/<br/>TimeoutError?}
    J -->|Yes| K[Mark Account Cooling Down<br/>Penalty Health Score]
    K --> L[Persist to Disk]
    L --> M[Rotate to Next Account]
    J -->|No| N[Other Error Handling]
    M --> O{Pool 75%+ Blocked?}
    O -->|Yes| P[Trigger Async Quota Refresh<br/>with 1min Cooldown]
    O -->|No| Q[Continue]
    P --> Q
Loading

Last reviewed commit: 2b9b575

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

* When a request times out, the account is marked unhealthy and the
* plugin automatically rotates to the next available account.
*
* @default 600 (10 minutes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description states default is 180s, but code shows 600s (10 minutes)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin/config/schema.ts
Line: 276

Comment:
PR description states default is 180s, but code shows 600s (10 minutes)

How can I resolve this? If you propose a fix, please make it concise.

- Ensure proactive quota refresh only triggers on the first endpoint attempt
- Persist account cooldown state immediately after a request timeout
- Refine shouldRefreshAllQuotas to use family-specific rate limiting check
- Update test descriptions to match implementation (75% or more)
@9nunya
Copy link
Author

9nunya commented Mar 2, 2026

Damn dude

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin.ts`:
- Around line 2470-2472: Wrap the accountManager.saveToDisk() call in a
try/catch so disk errors cannot abort the timeout-recovery/rotation path; catch
any error from accountManager.saveToDisk(), log it (using the existing logger in
this module), and continue executing the timeout/account rotation logic (i.e.,
do not rethrow or return). Ensure the change is applied where
accountManager.saveToDisk() is called so rotation completes even if saveToDisk
fails.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 203d4db and a4745c1.

📒 Files selected for processing (3)
  • src/plugin.ts
  • src/plugin/accounts.test.ts
  • src/plugin/accounts.ts
📜 Review details
🧰 Additional context used
🧬 Code graph analysis (3)
src/plugin.ts (2)
src/plugin/accounts.ts (1)
  • AccountManager (298-1275)
src/plugin/quota.ts (1)
  • triggerAsyncQuotaRefreshForAll (407-433)
src/plugin/accounts.ts (4)
src/plugin/refresh-queue.ts (1)
  • refreshToken (212-227)
src/plugin/quota.ts (2)
  • QuotaGroup (19-19)
  • QuotaGroupSummary (21-25)
src/plugin/transform/index.ts (1)
  • ModelFamily (9-9)
src/plugin/storage.ts (1)
  • ModelFamily (130-130)
src/plugin/accounts.test.ts (1)
src/plugin/accounts.ts (1)
  • resolveQuotaGroup (239-244)
🔇 Additional comments (5)
src/plugin/accounts.ts (2)

1158-1167: Good move to refresh-token keyed quota cache updates.

Using refreshToken as the lookup key avoids index drift issues and the queued persistence call keeps cache state durable without blocking hot paths.


1174-1195: Heuristic update looks solid.

Including cooldown + soft-quota in shouldRefreshAllQuotas() (with >= 0.75) is the right behavior for proactive pool recovery.

src/plugin/accounts.test.ts (2)

1583-1835: Nice API migration coverage in tests.

The updated updateQuotaCache("refreshToken", ...) usage consistently exercises the new token-keyed behavior across soft-quota and wait-time paths.


1871-1885: Threshold wording now matches expectation.

"75% or more" correctly matches the 3/4 fixture and expected true result.

src/plugin.ts (1)

2052-2074: Great reliability hardening on request control flow.

The proactive quota-refresh hook plus combined timeout/user-abort signaling and early user-abort exit significantly reduce hang/spin risk.

Also applies to: 2455-2460

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/plugin.ts (1)

2066-2069: ⚠️ Potential issue | 🔴 Critical

Guard AbortSignal.timeout() for older supported Node 20.x runtimes.

Line 2066 calls AbortSignal.timeout(...) unconditionally. If your support floor is Node 20.0.0, this can still throw at runtime on older 20.x minors.

💡 Proposed fix
+function createTimeoutSignal(timeoutMs: number): AbortSignal {
+  const controller = new AbortController();
+  const id = setTimeout(() => {
+    const reason = new DOMException("The operation timed out", "TimeoutError");
+    controller.abort(reason);
+  }, timeoutMs);
+  controller.signal.addEventListener("abort", () => clearTimeout(id), { once: true });
+  return controller.signal;
+}
...
-const timeoutSignal = AbortSignal.timeout(effectiveTimeoutMs);
+const timeoutSignal =
+  typeof AbortSignal.timeout === "function"
+    ? AbortSignal.timeout(effectiveTimeoutMs)
+    : createTimeoutSignal(effectiveTimeoutMs);
#!/bin/bash
set -euo pipefail

echo "== Node engine range (if declared) =="
if [ -f package.json ]; then
  jq -r '.engines.node // "engines.node not set"' package.json
fi

echo
echo "== AbortSignal.timeout callsites =="
rg -n 'AbortSignal\.timeout\s*\('

echo
echo "== Existing timeout polyfill/helper presence =="
rg -n 'createTimeoutSignal|TimeoutError|DOMException'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin.ts` around lines 2066 - 2069, The call to
AbortSignal.timeout(effectiveTimeoutMs) is unguarded and can throw on older Node
20.x runtimes; update the logic that builds timeoutSignal/combinedSignal in the
code that uses AbortSignal.timeout, AbortSignal.any, timeoutSignal,
combinedSignal, abortSignal and effectiveTimeoutMs to first check for these
static methods (e.g., typeof AbortSignal.timeout === 'function' and typeof
(AbortSignal as any).any === 'function') and if missing create an equivalent
timeout signal via an AbortController + setTimeout, and if AbortSignal.any is
missing implement a simple combinator that listens to multiple signals and
aborts a controller; then use those safe fallbacks when constructing
combinedSignal so the code works on older Node 20.x versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugin.ts`:
- Around line 90-103: The polyfill for (AbortSignal as any).any currently adds
listeners but never detaches sibling listeners when one signal aborts; update
the any implementation to track each listener (e.g., store a Map<AbortSignal,
EventListener>) when attaching via addEventListener, and in both the
immediate-aborted branch and inside the abort callback call
controller.abort(signal.reason) then iterate over the other signals and call
removeEventListener with the stored listener to detach them, ensuring you still
return controller.signal; also ensure you clean up listeners if the returned
controller.signal is aborted from elsewhere to avoid leaks.

---

Duplicate comments:
In `@src/plugin.ts`:
- Around line 2066-2069: The call to AbortSignal.timeout(effectiveTimeoutMs) is
unguarded and can throw on older Node 20.x runtimes; update the logic that
builds timeoutSignal/combinedSignal in the code that uses AbortSignal.timeout,
AbortSignal.any, timeoutSignal, combinedSignal, abortSignal and
effectiveTimeoutMs to first check for these static methods (e.g., typeof
AbortSignal.timeout === 'function' and typeof (AbortSignal as any).any ===
'function') and if missing create an equivalent timeout signal via an
AbortController + setTimeout, and if AbortSignal.any is missing implement a
simple combinator that listens to multiple signals and aborts a controller; then
use those safe fallbacks when constructing combinedSignal so the code works on
older Node 20.x versions.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a4745c1 and 506698b.

📒 Files selected for processing (1)
  • src/plugin.ts
📜 Review details
🧰 Additional context used
🧬 Code graph analysis (1)
src/plugin.ts (2)
src/plugin/accounts.ts (1)
  • AccountManager (298-1275)
src/plugin/quota.ts (1)
  • triggerAsyncQuotaRefreshForAll (407-433)
🔇 Additional comments (2)
src/plugin.ts (2)

2455-2475: Abort-first check and guarded cooldown persistence look good.

Line 2455’s abort-first exit and Lines 2471-2475’s saveToDisk() guard correctly protect interruption handling and keep timeout rotation resilient to disk failures.


186-187: refreshToken-keyed quota cache update is the right direction.

Using results[0].refreshToken for cache updates aligns with token-identity semantics and avoids stale email/index coupling.

src/plugin.ts Outdated
Comment on lines +90 to +103
(AbortSignal as any).any = function (signals: AbortSignal[]): AbortSignal {
const controller = new AbortController();
for (const signal of signals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
signal.addEventListener(
"abort",
() => {
controller.abort(signal.reason);
},
{ once: true }
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== AbortSignal.any polyfill block =="
sed -n '83,110p' src/plugin.ts

echo
echo "== Check for listener cleanup in polyfill =="
rg -n 'removeEventListener|cleanup|listeners' src/plugin.ts

echo
echo "== Full polyfill function (lines 83-120) =="
sed -n '83,120p' src/plugin.ts

Repository: NoeFabris/opencode-antigravity-auth

Length of output: 2461


Polyfill should clean up sibling listeners after first abort.

When any signal aborts, remaining listeners attached to other signals are never detached. The { once: true } option only removes listeners after they fire, not before. In fallback runtimes (Node 20.0-20.16), repeated requests with overlapping signals cause listeners to accumulate on those signals, creating a memory leak.

Proposed fix
 (AbortSignal as any).any = function (signals: AbortSignal[]): AbortSignal {
   const controller = new AbortController();
+  const listeners = new Map<AbortSignal, () => void>();
+  const cleanup = () => {
+    for (const [s, fn] of listeners) s.removeEventListener("abort", fn);
+    listeners.clear();
+  };
   for (const signal of signals) {
     if (signal.aborted) {
       controller.abort(signal.reason);
       return controller.signal;
     }
-    signal.addEventListener(
-      "abort",
-      () => {
-        controller.abort(signal.reason);
-      },
-      { once: true }
-    );
+    const onAbort = () => {
+      cleanup();
+      controller.abort(signal.reason);
+    };
+    listeners.set(signal, onAbort);
+    signal.addEventListener("abort", onAbort, { once: true });
   }
   return controller.signal;
 };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(AbortSignal as any).any = function (signals: AbortSignal[]): AbortSignal {
const controller = new AbortController();
for (const signal of signals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
signal.addEventListener(
"abort",
() => {
controller.abort(signal.reason);
},
{ once: true }
);
(AbortSignal as any).any = function (signals: AbortSignal[]): AbortSignal {
const controller = new AbortController();
const listeners = new Map<AbortSignal, () => void>();
const cleanup = () => {
for (const [s, fn] of listeners) s.removeEventListener("abort", fn);
listeners.clear();
};
for (const signal of signals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
const onAbort = () => {
cleanup();
controller.abort(signal.reason);
};
listeners.set(signal, onAbort);
signal.addEventListener("abort", onAbort, { once: true });
}
return controller.signal;
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/plugin.ts` around lines 90 - 103, The polyfill for (AbortSignal as
any).any currently adds listeners but never detaches sibling listeners when one
signal aborts; update the any implementation to track each listener (e.g., store
a Map<AbortSignal, EventListener>) when attaching via addEventListener, and in
both the immediate-aborted branch and inside the abort callback call
controller.abort(signal.reason) then iterate over the other signals and call
removeEventListener with the stored listener to detach them, ensuring you still
return controller.signal; also ensure you clean up listeners if the returned
controller.signal is aborted from elsewhere to avoid leaks.

Comment on lines +2104 to +2111
// Safely create timeout signal with fallback for older Node.js versions
let timeoutSignal: AbortSignal;
if (typeof AbortSignal.timeout === 'function') {
timeoutSignal = AbortSignal.timeout(effectiveTimeoutMs);
} else {
const controller = new AbortController();
timeoutSignal = controller.signal;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout never scheduled in fallback. When AbortSignal.timeout is unavailable (Node 20.0-20.10), this creates a controller that never aborts, making the timeout feature completely non-functional.

Suggested change
// Safely create timeout signal with fallback for older Node.js versions
let timeoutSignal: AbortSignal;
if (typeof AbortSignal.timeout === 'function') {
timeoutSignal = AbortSignal.timeout(effectiveTimeoutMs);
} else {
const controller = new AbortController();
timeoutSignal = controller.signal;
}
let timeoutSignal: AbortSignal;
if (typeof AbortSignal.timeout === 'function') {
timeoutSignal = AbortSignal.timeout(effectiveTimeoutMs);
} else {
const controller = new AbortController();
setTimeout(() => controller.abort(new Error('Timeout')), effectiveTimeoutMs);
timeoutSignal = controller.signal;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin.ts
Line: 2104-2111

Comment:
timeout never scheduled in fallback. When `AbortSignal.timeout` is unavailable (Node 20.0-20.10), this creates a controller that never aborts, making the timeout feature completely non-functional.

```suggestion
                let timeoutSignal: AbortSignal;
                if (typeof AbortSignal.timeout === 'function') {
                  timeoutSignal = AbortSignal.timeout(effectiveTimeoutMs);
                } else {
                  const controller = new AbortController();
                  setTimeout(() => controller.abort(new Error('Timeout')), effectiveTimeoutMs);
                  timeoutSignal = controller.signal;
                }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +137 to +144
for (const signal of activeSignals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
signal.addEventListener("abort", onAbort, { once: true });
}
return controller.signal;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing cleanup registration. Unlike the AbortSignal.any polyfill (line 109), this doesn't register cleanup on controller.signal, causing listener leaks if none of the input signals abort.

Suggested change
for (const signal of activeSignals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
signal.addEventListener("abort", onAbort, { once: true });
}
return controller.signal;
for (const signal of activeSignals) {
if (signal.aborted) {
controller.abort(signal.reason);
return controller.signal;
}
signal.addEventListener("abort", onAbort, { once: true });
}
controller.signal.addEventListener("abort", cleanup, { once: true });
return controller.signal;
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin.ts
Line: 137-144

Comment:
missing cleanup registration. Unlike the `AbortSignal.any` polyfill (line 109), this doesn't register cleanup on `controller.signal`, causing listener leaks if none of the input signals abort.

```suggestion
  for (const signal of activeSignals) {
    if (signal.aborted) {
      controller.abort(signal.reason);
      return controller.signal;
    }
    signal.addEventListener("abort", onAbort, { once: true });
  }
  controller.signal.addEventListener("abort", cleanup, { once: true });
  return controller.signal;
```

How can I resolve this? If you propose a fix, please make it concise.

continue;
}

let effectiveTimeoutMs = (config.request_timeout_seconds ?? 600) * 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

declared but immediately overwritten at line 2100, making this initialization unnecessary

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugin.ts
Line: 2029

Comment:
declared but immediately overwritten at line 2100, making this initialization unnecessary

How can I resolve this? If you propose a fix, please make it concise.

- Improve AbortSignal.any polyfill with proper listener cleanup
- Add AbortSignal.timeout guard and fallback for older Node.js versions
- Implement mergeAbortSignals helper for safe signal combination
- Add exhaustive listener cleanup on signal abortion
@9nunya 9nunya force-pushed the fix/account-rotation-timeout-hang branch from e5c8544 to 2b9b575 Compare March 2, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant