This is a Python-based proxy server designed to forward requests to the Cerebras API while managing multiple API keys through a round-robin rotation mechanism. It handles rate limiting (429) and server errors (500) by automatically rotating keys and retrying requests.
- Smart API Key Management:
  - Sticks with one key until it hits rate limits (429 errors)
  - Automatically switches to the next available key only when needed
  - Tracks cooldown periods for each key
  - Waits and retries if all keys are rate-limited instead of failing immediately
- Request Forwarding: Forwards all incoming HTTP requests to the Cerebras API endpoint (`https://api.cerebras.ai/v1/`).
- Dynamic Authorization: Injects the `Authorization: Bearer <api_key>` header dynamically for each request using the current key.
- Intelligent Error Handling:
  - Automatically rotates keys on `429 Too Many Requests` or `500 Internal Server Error` responses
  - Marks keys as temporarily unavailable and tracks when they can be retried
  - Waits for the next available key instead of immediately failing
- Concurrency Support: Built with `aiohttp` to efficiently handle multiple concurrent requests with thread-safe key rotation.
- Status Monitoring: Built-in `/_status` endpoint to monitor API key health and rotation state.
- Request/Response Logging: Optional filesystem logging to save all requests and responses as JSON files for auditing, debugging, or analysis (enabled by default).
- Automatic Tool Call Validation: Detects and fixes missing tool responses in chat completion requests by automatically injecting fake "failed" responses to maintain valid conversation flow.
- Python 3.7 or higher
- `aiohttp` library
- `Brotli` library (required for handling Brotli-compressed responses from the Cerebras API)

Install the required libraries:

```
pip install aiohttp Brotli
```

Or install from requirements.txt:

```
pip install -r requirements.txt
```

The proxy server uses environment variables for configuration:
- `CEREBRAS_API_KEYS`: JSON string containing your Cerebras API keys.

  Example JSON format:

  ```json
  {
    "key1": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "key2": "sk-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
    "key3": "sk-zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
  }
  ```

- `CEREBRAS_COOLDOWN`: Number of seconds to wait before retrying a rate-limited key (default: 60)

  Example: `CEREBRAS_COOLDOWN=90`

- `LOG_REQUESTS`: Enable or disable request/response logging (default: true)

  Example: `LOG_REQUESTS=false`

- `LOG_DIR`: Directory to save request/response logs (default: `./logs`)

  Example: `LOG_DIR=/var/log/cerebras-proxy`
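For illustration, parsing these variables can be sketched as follows. The `load_config` helper name and the ability to pass a fake environment are assumptions for this example; only the variable names and defaults come from the documentation above.

```python
import json
import os

def load_config(env=os.environ):
    """Illustrative helper (not from the codebase): read the proxy's
    settings from environment variables with the documented defaults."""
    keys = json.loads(env["CEREBRAS_API_KEYS"])            # {"key1": "sk-...", ...}
    cooldown = float(env.get("CEREBRAS_COOLDOWN", "60"))   # seconds
    log_requests = env.get("LOG_REQUESTS", "true").lower() == "true"
    log_dir = env.get("LOG_DIR", "./logs")
    return keys, cooldown, log_requests, log_dir

# Example usage with a fake environment:
fake_env = {"CEREBRAS_API_KEYS": '{"key1": "sk-aaa", "key2": "sk-bbb"}',
            "CEREBRAS_COOLDOWN": "90"}
keys, cooldown, log_requests, log_dir = load_config(fake_env)
print(len(keys), cooldown, log_requests, log_dir)  # 2 90.0 True ./logs
```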
- Set the `CEREBRAS_API_KEYS` environment variable with your JSON configuration:

  ```
  export CEREBRAS_API_KEYS='{"key1":"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","key2":"sk-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy","key3":"sk-zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"}'
  ```

- Run the proxy server:

  ```
  python proxy_server.py
  ```

By default, the server will start on `127.0.0.1:8080`. You can modify the host and port by adjusting the parameters in the `run()` method call within `proxy_server.py`.
This project includes Docker configuration for easy deployment.

- Build the Docker image:

  ```
  docker build -t cerebras-proxy .
  ```

- Run the container with your API keys:

  ```
  docker run -p 18080:8080 -e CEREBRAS_API_KEYS='{"key1":"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","key2":"sk-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"}' cerebras-proxy
  ```
- Create a `.env` file in the project root with your API keys:

  ```
  CEREBRAS_API_KEYS={"key1":"sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx","key2":"sk-yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"}
  ```

- Build and run with Docker Compose:

  ```
  docker-compose up --build
  ```

The proxy will be available at http://localhost:18080.

Note: Logs are automatically mounted to `./logs` on the host machine for easy access. You can customize logging behavior in the `.env` file:

```
LOG_REQUESTS=true
LOG_DIR=/app/logs
```
Make requests to the proxy server as you would to the Cerebras API, but use the proxy's address instead.

For example, if the Cerebras API endpoint is:

```
POST https://api.cerebras.ai/v1/chat/completions
```

you would send your request to:

```
POST http://127.0.0.1:8080/chat/completions
```

The proxy will handle adding the appropriate `Authorization` header.
Unlike traditional round-robin proxies that switch keys on every request, this proxy uses intelligent key management:
- Sticky Keys: The proxy sticks with one key and uses it for all requests until it encounters a problem
- Smart Switching: Only rotates to the next key when receiving a 429 (rate limit) or 500 (server error)
- Cooldown Tracking: Remembers when each key was rate-limited and won't try it again until the cooldown expires
- Automatic Waiting: If all keys are rate-limited, the proxy waits for the soonest available key instead of failing
- State Persistence: The current key position is maintained across requests, so you don't always start with key1
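The sticky-key behavior above can be sketched roughly as follows. This is a simplified synchronous illustration, not the project's actual implementation: the real `ApiKeyManager` is asynchronous and lock-protected, and `StickyKeyManager` is an invented name for this example.

```python
import time

class StickyKeyManager:
    """Illustrative sketch of the sticky rotation strategy: keep using one
    key, and only move on when it is marked rate-limited."""

    def __init__(self, keys, cooldown=60, clock=time.monotonic):
        self.keys = list(keys.items())   # [(name, secret), ...]
        self.cooldown = cooldown
        self.clock = clock               # injectable for testing
        self.current = 0                 # sticky position, kept across requests
        self.unavailable_until = {}      # key name -> timestamp when usable again

    def get_key(self):
        """Return the current key, skipping only keys still in cooldown."""
        now = self.clock()
        for _ in range(len(self.keys)):
            name, secret = self.keys[self.current]
            if self.unavailable_until.get(name, 0) <= now:
                return name, secret
            self.current = (self.current + 1) % len(self.keys)
        # All keys cooling down: report how long until the soonest comes back.
        wait = min(self.unavailable_until.values()) - now
        raise RuntimeError(f"all keys rate-limited; soonest available in {wait:.1f}s")

    def mark_rate_limited(self, name):
        """Called on a 429/500: start the cooldown and advance to the next key."""
        self.unavailable_until[name] = self.clock() + self.cooldown
        self.current = (self.current + 1) % len(self.keys)
```

In the real proxy, the "all keys rate-limited" case waits for the soonest cooldown to expire rather than raising; the exception here just makes the wait time visible in a synchronous sketch.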
Example Flow:

```
Request 1-100:   Uses key1 (all successful)
Request 101:     key1 hits rate limit → switches to key2
Request 102-200: Uses key2 (all successful)
Request 201:     key2 hits rate limit → switches to key3
Request 202:     key3 hits rate limit → waits for key1 cooldown to expire
Request 203-300: Uses key1 again (cooldown expired)
```
The proxy provides a built-in status endpoint to monitor API key health and rotation state.
Access the status endpoint at:
GET http://localhost:18080/_status
Example response:

```json
{
  "keys": [
    {
      "name": "key1",
      "available": true,
      "rate_limited_for": 0,
      "error_count": 0
    },
    {
      "name": "key2",
      "available": false,
      "rate_limited_for": 45.2,
      "error_count": 3
    }
  ],
  "current_key": "key1"
}
```

Fields:

- `available`: Whether the key is currently available for use
- `rate_limited_for`: Seconds remaining until the key can be retried (0 if available)
- `error_count`: Number of consecutive errors for this key
- `current_key`: Name of the key currently being used
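A monitoring script can interpret such a response using only these documented fields. `summarize_status` below is a hypothetical helper, not part of the project; it assumes only the field names shown in the example above.

```python
import json

def summarize_status(payload):
    """Hypothetical helper: condense a /_status response into the current
    key, the available keys, and the cooldowns of unavailable ones."""
    status = json.loads(payload) if isinstance(payload, str) else payload
    available = [k["name"] for k in status["keys"] if k["available"]]
    cooling = {k["name"]: k["rate_limited_for"]
               for k in status["keys"] if not k["available"]}
    return status["current_key"], available, cooling

example = {
    "keys": [
        {"name": "key1", "available": True, "rate_limited_for": 0, "error_count": 0},
        {"name": "key2", "available": False, "rate_limited_for": 45.2, "error_count": 3},
    ],
    "current_key": "key1",
}
print(summarize_status(example))  # ('key1', ['key1'], {'key2': 45.2})
```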
The proxy includes optional filesystem logging to save all requests and responses for auditing, debugging, or analysis purposes.
By default, logging is enabled. To disable it:
```
export LOG_REQUESTS=false
```

Logs are saved to `./logs` by default. You can specify a custom log directory:

```
export LOG_DIR=/var/log/cerebras-proxy
```

Each request/response pair is saved as a separate JSON file with the following naming convention:

```
YYYYMMDD_HHMMSS_microseconds_METHOD_path_requestid.json
```

Logs are organized in date-based subdirectories:

```
logs/
├── 2025-11-06/
│   ├── 20251106_143022_123456_POST_chat_completions_abc123de.json
│   ├── 20251106_143023_234567_POST_chat_completions_xyz789ab.json
│   └── ...
└── 2025-11-07/
    └── ...
```
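The naming convention above can be reproduced with a small helper. `log_path` is an illustrative function written for this example; the proxy's own code may build paths differently.

```python
from datetime import datetime
from pathlib import Path
import uuid

def log_path(base_dir, method, path, when=None, request_id=None):
    """Illustrative: build a log file path following the documented
    convention YYYYMMDD_HHMMSS_microseconds_METHOD_path_requestid.json
    inside a date-based subdirectory."""
    when = when or datetime.now()
    request_id = request_id or uuid.uuid4().hex[:8]
    safe_path = path.strip("/").replace("/", "_")   # chat/completions -> chat_completions
    name = (f"{when:%Y%m%d_%H%M%S}_{when.microsecond:06d}"
            f"_{method}_{safe_path}_{request_id}.json")
    return Path(base_dir) / f"{when:%Y-%m-%d}" / name

p = log_path("logs", "POST", "chat/completions",
             when=datetime(2025, 11, 6, 14, 30, 22, 123456),
             request_id="abc123de")
print(p)  # logs/2025-11-06/20251106_143022_123456_POST_chat_completions_abc123de.json
```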
Each log file contains:
```json
{
  "timestamp": "2025-11-06T14:30:22.123456",
  "request_id": "abc123de",
  "request": {
    "method": "POST",
    "path": "chat/completions",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "[REDACTED]"
    },
    "body": {
      "model": "llama3.1-70b",
      "messages": [...]
    }
  },
  "response": {
    "status": 200,
    "headers": {
      "Content-Type": "application/json"
    },
    "body": {
      "id": "chat-...",
      "choices": [...]
    }
  },
  "duration_ms": 1234.56
}
```

- Authorization headers are automatically redacted in logs to prevent API key leakage
- Binary data is base64-encoded if the body is not valid JSON or UTF-8
- Logs are stored locally and never transmitted elsewhere
- Consider disk space when enabling logging for high-traffic deployments
The proxy automatically detects and fixes invalid tool call sequences in chat completion requests. This prevents API errors when clients fail to provide responses for tool calls.
When processing /chat/completions requests, the proxy:
- Scans the message array for assistant messages containing `tool_calls`
- Tracks pending tool calls that are waiting for responses
- Detects missing responses when:
  - A tool call is followed by a non-tool message (like a user message)
  - A tool call appears at the end of the messages array with no response
- Automatically injects fake responses with `content: "failed"` for each missing tool call
- Updates the Content-Length header to match the modified request body
Before (Invalid - would cause a 422 error):

```json
{
  "messages": [
    {
      "role": "assistant",
      "tool_calls": [{"id": "call_123", "function": {...}}]
    },
    {
      "role": "user",
      "content": "test"
    }
  ]
}
```

After (Valid - automatically fixed by the proxy):

```json
{
  "messages": [
    {
      "role": "assistant",
      "tool_calls": [{"id": "call_123", "function": {...}}]
    },
    {
      "role": "tool",
      "tool_call_id": "call_123",
      "content": "failed"
    },
    {
      "role": "user",
      "content": "test"
    }
  ]
}
```

When the fix is applied, you'll see log messages like:

```
WARNING:__main__:Found 1 tool_calls without responses. Injecting fake 'failed' responses.
INFO:__main__:Injected fake tool response for tool_call_id: call_123
INFO:__main__:Applied tool_call fix: 2 -> 3 messages (size: 1234 -> 1456 bytes)
```
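The injection pass described above can be sketched as follows. `fix_tool_calls` is an illustrative name, not the proxy's actual function, and this sketch omits the logging and Content-Length update.

```python
def fix_tool_calls(messages):
    """Sketch: inject a fake {"role": "tool", "content": "failed"} reply for
    every tool call that never received a response, preserving order."""
    fixed = []
    pending = []  # tool_call ids still waiting for a "tool" reply

    def flush():
        # Inject fake "failed" responses for every unanswered tool call.
        for call_id in pending:
            fixed.append({"role": "tool", "tool_call_id": call_id,
                          "content": "failed"})
        pending.clear()

    for msg in messages:
        if msg.get("role") == "tool":
            # A genuine tool reply answers its pending call.
            if msg.get("tool_call_id") in pending:
                pending.remove(msg["tool_call_id"])
            fixed.append(msg)
            continue
        flush()  # any non-tool message means the pending calls went unanswered
        fixed.append(msg)
        if msg.get("role") == "assistant":
            pending.extend(c["id"] for c in msg.get("tool_calls", []))
    flush()      # tool calls dangling at the end of the array
    return fixed
```

Running this on the "Before" example above yields the "After" sequence: the fake `tool` message is inserted between the assistant's tool call and the user message.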
This feature is always enabled for all chat completion requests and requires no configuration.
This proxy server is built using the aiohttp framework for high performance and concurrency. It consists of two main components:
- `ApiKeyManager` (api_key_manager.py): Manages the pool of API keys with intelligent rotation:
  - Tracks each key's state (available vs. rate-limited)
  - Maintains the current active key instead of rotating on every request
  - Only switches keys when a rate limit (429) or server error (500) is encountered
  - Implements automatic wait/retry when all keys are temporarily unavailable
  - Uses `asyncio.Lock` for thread-safe operation across concurrent requests
- `ProxyServer` (proxy_server.py): The main application that listens for HTTP requests:
  - A catch-all route forwards requests to the Cerebras API
  - Integrates with `ApiKeyManager` to get the current key and handle rotation
  - Automatically retries with the next available key on failures
  - Provides a `/_status` endpoint for monitoring key health
The proxy implements smart error handling with automatic recovery:
- Rate Limiting (429): When a key hits rate limits:
  - The key is marked as unavailable with a cooldown period (default 60 seconds)
  - The proxy automatically switches to the next available key
  - The request is retried immediately with the new key
  - After the cooldown period, the key becomes available again
- Server Errors (500): Treated similarly to rate limits:
  - The key is marked as temporarily unavailable
  - The proxy automatically rotates to the next available key
  - The request is retried with the new key
- All Keys Rate-Limited: If all keys are temporarily unavailable:
  - The proxy calculates which key will become available soonest
  - Waits for that cooldown period
  - Automatically retries the request when a key becomes available
  - No immediate `503` error - the proxy handles the waiting for you
- Other Errors: Non-retryable errors (4xx client errors, network issues) are returned to the client immediately
- Maximum Retries: After (number of keys × 2) attempts, the proxy returns `503 Service Unavailable`
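The retry policy above can be condensed into a small loop. This is a self-contained sketch: `send` stands in for the real async forwarding call, key rotation is simplified to a list index, and `forward_with_retries` is an invented name.

```python
def forward_with_retries(send, keys):
    """Sketch of the documented retry policy: stick with the current key,
    rotate on 429/500, and give up with 503 after (number of keys × 2)
    attempts. `send(key)` returns an (http_status, body) pair."""
    current = 0
    for _ in range(len(keys) * 2):            # maximum retries
        status, body = send(keys[current])
        if status in (429, 500):
            current = (current + 1) % len(keys)   # rotate and retry
            continue
        return status, body                    # success or non-retryable error
    return 503, "Service Unavailable"
```

Non-retryable statuses (e.g. a 422 from a malformed request) fall through the `if` and are returned to the caller on the first attempt, matching the "Other Errors" rule above.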
This project is licensed under the MIT License - see the LICENSE file for details.