Skip to content

Conversation

@nathanclevenger
Copy link

Problem

The proxy was failing with:

TypeError [ERR_INVALID_ARG_TYPE]: Bun.serve expects idleTimeout to be 255 or less

PR #12 set idleTimeout: 300 (5 minutes), but Bun requires it to be ≤255 seconds.

Solution

Changed idleTimeout from 300 to 255 seconds (4.25 minutes).

This is still sufficient for long-running Claude API requests while staying within Bun's limit.

Testing

  • Confirms idleTimeout is now within Bun's allowed range
  • Maintains adequate timeout for Claude API requests

Claude Code and others added 26 commits October 14, 2025 12:21
- Changed condition from always() to !cancelled()
- Prevents 'Claude encountered an error' message when workflow is cancelled
- Still posts error messages on actual failures
- Solves issue where concurrent workflow runs trigger error messages unnecessarily
- Add 'assigned' to validActions in validateTrackProgressEvent()
- Add 'assigned' to supportedActions for PR event detection
- Add tests for assigned action with track_progress
- Enables automation workflows that trigger on PR assignment

Fixes: platform workflow failures when auto-assigning PRs after CHANGES_REQUESTED reviews
- Update validation to accept AWS_BEARER_TOKEN_BEDROCK as alternative to full AWS credentials
- AWS_REGION still required
- Supports both authentication methods:
  - Bearer token (simpler): AWS_BEARER_TOKEN_BEDROCK
  - IAM credentials: AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY
- Aligns with Claude Code CLI documentation

References: https://docs.claude.com/en/docs/claude-code/amazon-bedrock
Implements automatic retry with exponential backoff when Claude API returns 429 rate limit errors:

- Detects rate limit errors in CLI output (429, 'Too many requests', 'rate limit', 'throttl')
- Throws retriable error instead of hard failing
- Adds runClaudeWithRetry wrapper with exponential backoff
- Default: 5 attempts with 1min → 2min → 4min → 8min → 16min delays
- Only retries on rate limit errors, fails fast on other errors
- Updates main entrypoint to use runClaudeWithRetry by default

This prevents complete workflow failures when hitting transient API rate limits,
improving reliability of automated code reviews and other Claude-powered workflows.

Resolves: dot-do/platform#7400 (comment)
Strategy to minimize costs while handling rate limits:
1. First attempt: Use AWS Bedrock (we have credits, prefer this)
2. If rate limited: Switch to Anthropic API immediately (costs money but available)
3. If still rate limited: Retry with Anthropic using exponential backoff

Key changes:
- Detect rate limit on first Bedrock attempt
- Switch environment from Bedrock to Anthropic API
- No delay when switching providers (immediate retry)
- Clear logging shows which provider is being used
- TODO added noting AWS rate limit increase has been requested

This prevents long retry loops with Bedrock when we can immediately
fallback to Anthropic API as a secondary provider.
When a code review is submitted via `gh pr review`, delete the initial
"Claude Code is working..." comment instead of updating it to show
"Claude finished @user's task". This reduces noise in PRs with multiple
review iterations.

The logic:
1. Check if this is a PR (not an issue)
2. Check if the bot submitted a review in the last 5 minutes
3. If yes, delete the progress comment instead of updating it
4. If deletion fails, fall back to normal update behavior

This ensures that only the review itself is visible, not both the
review and a separate completion comment.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Addresses code review feedback:
- Extract review detection logic to testable helper function
- Add type guard for review.submitted_at to prevent undefined dates
- Return structured result with review ID and timestamp for better logging
- Make time window explicit (5 minutes) with clear parameter

This improves:
- Testability: Helper function can be unit tested independently
- Type safety: Explicit check for submitted_at before date conversion
- Clarity: Structured return type makes intent clear
- Maintainability: All review detection logic in one place

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Fixes crash when pullRequest.files is null (occurs on large PRs >100 files
or when GitHub API returns null for files field).

## Problem

The code was accessing `pullRequest.files.nodes` without checking if
`files` was null first, causing:
```
TypeError: null is not an object (evaluating 'pullRequest.files.nodes')
```

This happens on PRs with >100 changed files where the GraphQL query
`files(first: 100)` returns null.

## Solution

- Use optional chaining: `pullRequest.files?.nodes || []`
- Add warning log when files data is null
- Gracefully fallback to empty array for changedFiles

## Impact

- Prevents crashes on large PRs
- Reviews can now proceed even without full file list
- User gets clear warning about missing files data

Co-Authored-By: Claude <[email protected]>
…ilover

## Problem

The old code would stick with Anthropic after the first 429, never trying
Bedrock again on subsequent attempts. This wastes Anthropic credits when
Bedrock might work on the next try.

## Correct Behavior

**Each attempt:**
1. ALWAYS try AWS Bedrock first (maximize free credits)
2. If Bedrock returns 429 → IMMEDIATELY try Anthropic (no delay)
3. If Anthropic succeeds → done
4. If both fail → next attempt starts over with Bedrock

**Example flow:**
- Attempt 1: Bedrock (429) → Anthropic (success) ✅
- Attempt 2: Bedrock (429) → Anthropic (success) ✅
- Attempt 3: Bedrock (429) → Anthropic (success) ✅

Every attempt gives Bedrock a chance to work (in case 429 was temporary),
but fails over to Anthropic immediately with zero delay.

## Key Changes

- **Reset to Bedrock** at start of each attempt
- **Zero delays** anywhere
- **Immediate failover** on any 429
- Removed exponential backoff parameters (unused)

## Impact

- 💰 Maximizes AWS Bedrock credit usage (tries every time)
- ⚡ Zero delays on failover (immediate)
- 💰 Only uses Anthropic credits when Bedrock unavailable
- 🚀 Fast retry cycles

Co-Authored-By: Claude <[email protected]>
Replaces complex retry logic with single execution through claude-lb worker proxy.
The proxy handles Bedrock-first routing with immediate HTTP-level Anthropic failover.

Key changes:
- Set ANTHROPIC_BASE_URL to route through claude-lb.dotdo.workers.dev
- Simplify runClaudeWithRetry to single execution (no retry loop)
- Remove CLI-level retry overhead (~177s MCP initialization per retry)
- Proxy handles failover at HTTP level (milliseconds, not minutes)

Benefits:
- Single Claude CLI process (no re-initialization)
- HTTP-level failover (<1s vs 3+ minutes)
- Unified AI Gateway tracking for both providers
- Pass-through authentication (no secrets in worker)
- Bedrock-first routing maintained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Implements local HTTP proxy server that intercepts Claude CLI requests
and adds authentication headers before forwarding to claude-lb worker.

Architecture:
```
Claude CLI
  ↓ ANTHROPIC_BASE_URL=http://127.0.0.1:18765
  ↓ Makes request
Local Proxy Server (this action)
  ↓ Adds headers:
  ↓   x-bedrock-token: $AWS_BEARER_TOKEN_BEDROCK
  ↓   x-anthropic-key: $ANTHROPIC_API_KEY
  ↓ Forwards to https://claude-lb.dotdo.workers.dev
claude-lb Worker (NO secrets!)
  ├─ Try Bedrock with x-bedrock-token
  └─ On 429, instant Anthropic failover with x-anthropic-key
  ↓ Returns response
Local Proxy → Claude CLI
```

Security Benefits:
- 🔒 Zero secrets in worker
- 🔒 Credentials only in GitHub Secrets
- 🔒 Fresh credentials per request
- ⚡ <1 second failover
- 💪 Conversation state preserved

Files Changed:
- base-action/src/proxy-server.ts (new) - Local HTTP proxy
- base-action/src/index.ts - Start proxy and route through it

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Supports 3 modes: bedrock-anthropic, anthropic-only, direct
- Auto-detects mode based on available credentials
- Adds direct Anthropic API fallback if proxy fails
- Includes stats/health endpoint for monitoring
- Maintains secure pass-through auth (no secrets in worker)
- Enables centralized observability via Cloudflare AI Gateway

Fixes: claude-lb proxy now works without Bedrock credentials
This fixes a control flow bug where the /health endpoint was unreachable
because it came AFTER the 404 response for non-/v1/messages URLs.

When a request came in for /health:
1. Line 171 checked: req.url !== '/v1/messages' (true for /health)
2. Line 172 returned 404 immediately
3. Lines 164-168 (health check) were never reached (dead code)

Solution: Move /health endpoint check BEFORE the 404 response, making
it the first check in the request handler.

Fixes automated Claude Code reviews returning 'API Error: 404 Not Found'
Bun.serve requires idleTimeout to be 255 seconds or less.
The previous value of 300 caused: TypeError [ERR_INVALID_ARG_TYPE]:
Bun.serve expects idleTimeout to be 255 or less

This changes the timeout from 5 minutes (300s) to 4.25 minutes (255s),
which is still sufficient for long-running Claude API requests.
const server = Bun.serve({
port: PROXY_PORT,
hostname: '127.0.0.1',
idleTimeout: 255, // Max allowed by Bun (4.25 minutes) for long-running Claude API requests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idleTimeout: 0

will make it unlimited

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants