````diff
 # Gcli-Nexus
 
-> High-performance Gemini CLI reverse proxy that talks to the raw Cloud Code Gemini endpoints while presenting Gemini-native responses.
+[](https://github.com/Yoo1tic/gcli-nexus/releases/latest)
+[](LICENSE)
 
-## Highlights
+**Gcli-Nexus is a high-performance Rust adapter that bridges Gemini CLI (Cloud Code) to the standard Gemini API.**
 
-- **Gemini-native proxy for the official CLI**: accepts `/v1beta/models/{model}:generateContent` and `:streamGenerateContent` payloads from `geminicli` while converting upstream CLI envelopes back into the standard Gemini response shape.
-- **SSE-friendly normalization**: streaming events land in the Gemini-native `candidates/usageMetadata/modelVersion` shape so dashboards and SDKs can consume them directly.
-- **Credential pool with actor scheduling**: a `ractor`-driven worker manages Google OAuth credentials stored in SQLite, separates “big” and “tiny” model queues, cools down projects that hit 429, and refreshes tokens only when a credential is near expiry or fails authentication.
-- **Operable out of the box**: `.env` configuration via Figment/dotenvy, SQLite (`data.db`) that bootstraps automatically, structured tracing, and a `mimalloc` global allocator for predictable latency.
-- **One-click browser auth**: hitting `/auth` in a browser jumps straight to Google OAuth for login/consent.
+It acts as a headless protocol bridge that turns Google account OAuth credentials (the kind Gemini CLI uses) into a drop-in `/v1beta/models` interface. It normalizes proprietary CLI streams into standard JSON/SSE, so it works with LangChain, curl, and other Gemini API clients.
 
````
````diff
-## Quick start
+## Highlights
+
+- **Protocol Standardization**: Exposes native Gemini API endpoints (`:generateContent`, `:streamGenerateContent`) backed by Cloud Code credentials.
+- **Actor-Driven Concurrency**: Built on `ractor`, so scheduling state is owned by a single actor and needs no locks, enabling high throughput with minimal resource overhead.
+- **Headless & Self-Healing**: No management UI. Live traffic detects invalid tokens, which are pulled from rotation while the pool is repaired asynchronously.
+- **Hot-Swapping**: Scale capacity instantly via the `/auth` endpoint without restarting the service or dropping connections.
+- **Portable**: Ships as a single static binary (Linux/macOS/Windows) or a lightweight Docker container.
+
+## API Endpoints
+
+Authentication requires the `x-goog-api-key` header (or the `?key=` query parameter).
+
+| Endpoint                                       | Method | Auth | Description                                                       |
+| :--------------------------------------------- | :----- | :--- | :---------------------------------------------------------------- |
+| `/v1beta/models/{model}:generateContent`       | `POST` | ✅   | **Core Interface**. Standard chat completion (unary).             |
+| `/v1beta/models/{model}:streamGenerateContent` | `POST` | ✅   | **Core Interface**. Standard chat completion (streaming).         |
+| `/v1beta/models`                               | `GET`  | ✅   | Lists supported models in standard Gemini JSON format.            |
+| `/v1beta/openai/models`                        | `GET`  | ✅   | Lists supported models in OpenAI-compatible format.               |
+| `/auth`                                        | `GET`  | ❌   | **Hot-Swapping**. Initiates OAuth flow to inject new credentials. |
+| `/oauth2callback`                              | `GET`  | ❌   | Internal callback handler for Google OAuth redirects.             |
+
````
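The authentication rule above accepts a key from either the `x-goog-api-key` header or the `?key=` query parameter. The sketch below shows that rule in std-only Rust. `extract_api_key` and `is_authorized` are hypothetical helpers, not Gcli-Nexus's actual middleware. Headers are modeled as `(name, value)` pairs, and the query value is not percent-decoded.

```rust
/// Hypothetical helper: return the client's API key, preferring the
/// `x-goog-api-key` header and falling back to the `key` query parameter.
fn extract_api_key<'a>(headers: &[(&str, &'a str)], query: &'a str) -> Option<&'a str> {
    // HTTP header names are case-insensitive.
    if let Some(&(_, value)) = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("x-goog-api-key"))
    {
        return Some(value);
    }
    // Fall back to `?key=...` (no percent-decoding in this sketch).
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(name, _)| *name == "key")
        .map(|(_, value)| value)
}

/// Hypothetical check of the supplied key against the configured NEXUS_KEY.
fn is_authorized(headers: &[(&str, &str)], query: &str, nexus_key: &str) -> bool {
    extract_api_key(headers, query) == Some(nexus_key)
}

fn main() {
    let headers = [("X-Goog-Api-Key", "secret")];
    println!("{}", is_authorized(&headers, "", "secret")); // true
    println!("{}", is_authorized(&[], "alt=sse&key=secret", "secret")); // true
    println!("{}", is_authorized(&[], "key=wrong", "secret")); // false
}
```

Plain `==` is not a constant-time comparison, so a hardened implementation might prefer a constant-time equality check for the secret.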
````diff
+## Quick Start
 
 ### Prerequisites
 
-- Google Cloud projects that already have Gemini CLI access; export each account as the JSON blob that contains `client_id`, `client_secret`, `refresh_token`, `project_id`, etc.
-- For the prebuilt binary: Linux host with SQLite available (no Rust toolchain required).
-- For containers: Docker + docker compose. Building from source remains possible with Rust 1.78+ if needed.
-
-### Run the prebuilt binary
-
-1. Copy the sample environment file and fill in secrets:
-   ```bash
-   cp .env.example .env
-   # edit NEXUS_KEY plus (optionally) DATABASE_URL, BIGMODEL_LIST, PROXY...
-   ```
-2. Drop every Gemini credential JSON into the folder referenced by `CRED_PATH` (default `./credentials`). On startup the actor will normalize, refresh, and persist them into SQLite. Additions today require a restart to be ingested.
-3. Download the latest release binary for your platform, make it executable, and run it from the project root:
-   ```bash
-   chmod +x gcli-nexus
-   ./gcli-nexus
-   ```
-   The server binds `0.0.0.0:8188`. Logs reveal how many credentials were activated and whether a proxy is in use.
-
-### Run with docker compose
-
-1. Copy the compose template and set secrets:
-   ```bash
-   cp docker-compose.yml.example docker-compose.yml
-   # edit NEXUS_KEY and other options in docker-compose.yml
-   ```
-2. Ensure local folders exist for persistence and credentials:
-   ```bash
-   mkdir -p data credentials
-   # place credential JSON files under ./credentials
-   ```
-3. Start the stack:
-   ```bash
-   docker compose up -d
-   ```
-   The service listens on `0.0.0.0:8188` and stores SQLite data under `./data`.
+- **Google Account**: A Google account with access to Gemini CLI (Cloud Code).
+- **Environment**:
+  - **Docker** (Recommended) for containerized deployment.
+  - **Linux Host** with SQLite if running the binary directly.
 
````
````diff
-## Configuration
+### 1. Start the Service
+
+You can start Gcli-Nexus immediately with an empty credential pool.
+
+#### Option A: Docker Compose (Recommended)
+
+1. **Set Up Directories**:
+
+   ```bash
+   mkdir -p gcli-nexus/data
+   cd gcli-nexus
+   ```
+
+2. **Create the Compose File**: Copy `docker-compose.yml.example` to `docker-compose.yml` (or write your own) and set `NEXUS_KEY`.
+
+3. **Launch**:
+
+   ```bash
+   docker compose up -d
+   ```
+
+#### Option B: Prebuilt Binary
+
+1. **Prepare the Environment**:
+
+   ```bash
+   cp .env.example .env
+   # Edit .env to set NEXUS_KEY and MODEL_LIST
+   ```
+
+2. **Run**:
+
+   ```bash
+   chmod +x gcli-nexus
+   ./gcli-nexus
+   ```
+
+The server binds to `0.0.0.0:8188` by default (configurable via `LISTEN_ADDR` and `LISTEN_PORT`).
+
+### 2. Onboard Credentials (Instant & Dynamic)
+
+Gcli-Nexus supports **hot-swapping**: you can add credentials at runtime without restarting the service.
 
````
````diff
-| Env var                            | Required | Default            | Description                                                                                                           |
-| ------------------------------------ | -------- | ------------------ | --------------------------------------------------------------------------------------------------------------------- |
-| `NEXUS_KEY`                          | Yes      | _none_             | Shared secret checked on every request via `x-goog-api-key`, `Authorization: Bearer`, or `?key=`.                     |
-| `DATABASE_URL`                       | No       | `sqlite://data.db` | SQLite DSN; the actor creates the file/migrations automatically.                                                      |
-| `LOGLEVEL`                           | No       | `info`             | Tracing level (`error`, `warn`, `info`, `debug`, `trace`). `RUST_LOG` still works as a fallback.                      |
-| `BIGMODEL_LIST`                      | No       | `[]`               | JSON array of model names treated as “big”. They get their own queue/cooldown bucket to avoid starving lighter chats. |
-| `CRED_PATH`                          | No       | unset              | Directory that is scanned once during startup for credential JSON; leave unset to rely purely on SQLite contents.     |
-| `OAUTH_TPS`                          | No       | `10`               | OAuth refresh requests per second; refresh buffer/burst sizes are derived as `OAUTH_TPS * 2`.                         |
-| `GEMINI_RETRY_MAX_TIMES`             | No       | `3`                | Max retry attempts for Gemini CLI upstream calls.                                                                     |
-| `ENABLE_MULTIPLEXING`                | No       | `false`            | Allow outbound reqwest clients to use HTTP/2 multiplexing; keep `false` to force HTTP/1-only behavior.                |
-| `PROXY`                              | No       | unset              | Outbound HTTP proxy applied to both the Gemini caller and the OAuth refresh client (supports HTTP/SOCKS).             |
-| `DATABASE_URL`, `PROXY`, `CRED_PATH` | —        | —                  | Accept absolute or relative paths; Figment merges `.env` values automatically.                                        |
+#### Method A: Browser-Based Auto Ingestion (Easiest)
 
-### Credential lifecycle
+1. Navigate to `http://<your-server-ip>:8188/auth` in your browser.
+2. Complete the Google OAuth login flow.
+3. **Done.** The credential is automatically captured, persisted to SQLite, and **immediately injected** into the scheduling queue.
+4. Repeat for as many accounts as needed.
 
-1. **Ingestion**: Each JSON file is parsed via `GoogleCredential::from_payload`, refreshed immediately, and upserted into SQLite. Duplicate `project_id`s are replaced atomically.
-2. **Queues**: Active credentials are pushed into both the “big” and “tiny” queues; requests choose a queue based on whether `model` matches `BIGMODEL_LIST`.
-3. **Rate limits**: When a 429 response contains `quotaResetTimeStamp`, the actor parks the credential for that many seconds before putting it back in queue.
-4. **Refresh flow**: 401/403 responses trigger `ReportInvalid` → refresh pipeline → DB update → re-enqueue. Failing refreshes disable the credential (status=false).
-5. **Persistence**: Because the DB is authoritative, restarts reuse the latest access tokens/expiry timestamps without re-reading every JSON file.
+#### Method B: Manual JSON File (Legacy)
 
````
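The removed *Credential lifecycle* notes say that every active credential sits in both a "big" and a "tiny" queue. Each request picks a queue according to whether its `model` appears in `BIGMODEL_LIST`. The source does not say whether the new `MODEL_LIST` configuration keeps this split.

The sketch below shows the selection step. It assumes credentials are identified by `project_id` strings and that the queues are plain `VecDeque`s. The round-robin rotation is also an assumption. The real scheduler runs inside a `ractor` actor, which is not shown.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical scheduler state: every active credential sits in both queues.
struct Queues {
    big_models: HashSet<String>, // parsed from BIGMODEL_LIST
    big: VecDeque<String>,       // project_ids serving "big" models
    tiny: VecDeque<String>,      // project_ids serving all other models
}

impl Queues {
    fn new(big_models: &[&str], project_ids: &[&str]) -> Self {
        let ids: VecDeque<String> = project_ids.iter().map(|s| s.to_string()).collect();
        Queues {
            big_models: big_models.iter().map(|s| s.to_string()).collect(),
            big: ids.clone(),
            tiny: ids,
        }
    }

    /// Choose the queue for `model`, take the next credential, and rotate it to
    /// the back so consecutive requests spread across projects (assumed round-robin).
    fn next_credential(&mut self, model: &str) -> Option<String> {
        let queue = if self.big_models.contains(model) {
            &mut self.big
        } else {
            &mut self.tiny
        };
        let id = queue.pop_front()?;
        queue.push_back(id.clone());
        Some(id)
    }
}

fn main() {
    let mut q = Queues::new(&["gemini-2.5-pro"], &["proj-a", "proj-b"]);
    println!("{:?}", q.next_credential("gemini-2.5-pro")); // Some("proj-a"), big queue
    println!("{:?}", q.next_credential("gemini-2.5-flash")); // Some("proj-a"), tiny queue
    println!("{:?}", q.next_credential("gemini-2.5-pro")); // Some("proj-b")
}
```

Because the two queues rotate independently, heavy traffic on "big" models does not consume the turn order used by lighter models. This is the starvation concern the old `BIGMODEL_LIST` description mentions.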
````diff
-## API usage
+If you already have credential JSON files (containing `project_id` and `refresh_token`), place them into the `credentials/` directory.
 
-### Authentication
+- **Docker**: Place files in the mapped `./credentials` volume.
+- **Binary**: Place files in the directory referenced by `CRED_PATH`.
 
-- Send `x-goog-api-key: <NEXUS_KEY>` (preferred).
-- Or append `?key=<NEXUS_KEY>` to the request URL.
-- Visit `/auth` in a browser to be redirected to Google OAuth for login/consent.
+_Note: Manually added files are scanned only at startup, so they require a restart to be ingested; Method A takes effect immediately._
+
+### Credential JSON Format (For Method B)
+
+```json
+{
+  "project_id": "my-gcp-project",
+  "refresh_token": "1//0gExampleRefreshToken"
+}
+```
 
-### Generate content (non-streaming)
+_Only `project_id` and `refresh_token` are strictly required. Missing fields (like `access_token`) are filled in automatically during the first refresh._
+
+### Usage
+
+Gcli-Nexus exposes a standard Gemini-compatible surface.
+
+**Generate Content:**
 
 ```bash
 curl -X POST http://localhost:8188/v1beta/models/gemini-2.5-pro:generateContent \
   -H "x-goog-api-key: $NEXUS_KEY" \
   -H "Content-Type: application/json" \
   -d '{
-    "contents":[{"role":"user","parts":[{"text":"hello from gcli-nexus"}]}]
+    "contents":[{"role":"user","parts":[{"text":"Hello World"}]}]
   }'
 ```
 
````
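The credential format for Method B requires only `project_id` and `refresh_token`. Everything else is filled in by the first refresh. The sketch below expresses that rule using a hypothetical `Credential` struct. Apart from the two required keys and `access_token`, the field set is an assumption. JSON parsing (for example with `serde_json`) is left out to keep the example std-only.

```rust
/// Hypothetical in-memory form of a credential file.
#[derive(Debug, Default)]
struct Credential {
    project_id: Option<String>,
    refresh_token: Option<String>,
    access_token: Option<String>, // filled by the first refresh if absent
}

#[derive(Debug, PartialEq)]
enum CredentialError {
    Missing(&'static str),
}

/// Reject credentials that lack a required field; empty strings count as missing.
fn validate(cred: &Credential) -> Result<(), CredentialError> {
    let present = |f: &Option<String>| f.as_deref().map_or(false, |s| !s.is_empty());
    if !present(&cred.project_id) {
        return Err(CredentialError::Missing("project_id"));
    }
    if !present(&cred.refresh_token) {
        return Err(CredentialError::Missing("refresh_token"));
    }
    Ok(())
}

/// A credential without an access token must be refreshed before first use.
fn needs_refresh(cred: &Credential) -> bool {
    cred.access_token.is_none()
}

fn main() {
    let cred = Credential {
        project_id: Some("my-gcp-project".into()),
        refresh_token: Some("1//0gExampleRefreshToken".into()),
        ..Default::default()
    };
    println!("{:?} refresh={}", validate(&cred), needs_refresh(&cred)); // Ok(()) refresh=true
}
```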
````diff
-### Streaming
+## Configuration
 
-```bash
-curl --no-buffer -X POST \
-  http://localhost:8188/v1beta/models/gemini-2.5-pro:streamGenerateContent \
-  -H "x-goog-api-key: $NEXUS_KEY" \
-  -H "Content-Type: application/json" \
-  -d '{"contents":[{"role":"user","parts":[{"text":"stream"}]}]}'
-```
````
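The removed *Streaming* example reads `:streamGenerateContent` with `curl --no-buffer`. The old highlights say streaming events arrive as SSE in the Gemini `candidates/usageMetadata/modelVersion` shape. The sketch below assumes each event carries one JSON chunk on `data:` lines and splits a buffered stream into event payloads.

This is a minimal std-only sketch. It handles `data:` fields and blank-line event boundaries, and it ignores `event:`, `id:`, `retry:` and comment lines. It does not handle bare `\r` line endings.

```rust
/// Minimal SSE splitter: collect the `data:` payload of each event.
/// A blank line ends an event, and multiple `data:` lines in one event are
/// joined with '\n', as the SSE format specifies.
fn sse_data_events(stream: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut data_lines: Vec<&str> = Vec::new();
    for line in stream.lines() {
        if line.is_empty() {
            // A blank line dispatches the pending event, if there is one.
            if !data_lines.is_empty() {
                events.push(data_lines.join("\n"));
                data_lines.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // Drop a single optional space after the colon.
            data_lines.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other fields and ":" comment lines are ignored in this sketch.
    }
    // An event not terminated by a blank line is discarded, as the SSE spec requires.
    events
}

fn main() {
    let body = "data: {\"candidates\":[]}\n\n: keep-alive\ndata: {\"usageMetadata\":{}}\n\n";
    for event in sse_data_events(body) {
        println!("{event}");
    }
}
```

In a real client, each returned payload would then be parsed as JSON and its `candidates` text appended to the output.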
````diff
+| Env var                  | Required | Default                                                          | Description                                                                                    |
+| ------------------------ | -------- | ---------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| `LOGLEVEL`               | No       | `info`                                                           | Logging verbosity for tracing (`error`, `warn`, `info`, `debug`, `trace`).                     |
+| `LISTEN_ADDR`            | No       | `0.0.0.0`                                                        | HTTP server listen address.                                                                    |
+| `LISTEN_PORT`            | No       | `8188`                                                           | HTTP server listen port.                                                                       |
+| `NEXUS_KEY`              | Yes      | `pwd`                                                            | API key that authorizes inbound requests. Replace the placeholder default `pwd` with a secret. |
+| `MODEL_LIST`             | No       | `["gemini-2.5-flash", "gemini-2.5-pro", "gemini-3-pro-preview"]` | JSON array of supported Gemini models.                                                         |
+| `CRED_PATH`              | No       | `./credentials`                                                  | Optional directory of credential JSON files to preload; scanned once at startup.               |
+| `OAUTH_TPS`              | No       | `5`                                                              | Maximum OAuth refresh requests per second (TPS) for the refresh worker.                        |
+| `ENABLE_MULTIPLEXING`    | No       | `false`                                                          | Allow reqwest clients to use HTTP/2 multiplexing. Leave `false` to force HTTP/1.               |
+| `GEMINI_RETRY_MAX_TIMES` | No       | `3`                                                              | Max retry attempts for Gemini CLI upstream calls.                                              |
+| `PROXY`                  | No       | unset                                                            | Optional outbound HTTP proxy (`scheme://user:pass@host:port`). Leave unset if not needed.      |
+
````
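`OAUTH_TPS` caps the number of OAuth refresh requests sent per second. The removed configuration table adds that refresh buffer/burst sizes are derived as `OAUTH_TPS * 2`. A token bucket is one standard way to enforce that kind of limit.

The sketch below uses a token bucket with capacity `2 * OAUTH_TPS` that refills at `OAUTH_TPS` tokens per second. It is an assumption about the mechanism, not Gcli-Nexus's code. Time is passed in explicitly, in seconds, so the behavior is deterministic.

```rust
/// Token bucket: refills at `rate` tokens per second, up to `capacity`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64,
    last: f64, // time of the last refill, in seconds
}

impl TokenBucket {
    /// Hypothetical constructor mirroring the old docs: burst = OAUTH_TPS * 2.
    fn from_oauth_tps(tps: u32, now: f64) -> Self {
        let capacity = f64::from(tps * 2);
        TokenBucket { capacity, tokens: capacity, rate: f64::from(tps), last: now }
    }

    /// Try to spend one token at time `now`; false means the refresh must wait.
    fn try_acquire(&mut self, now: f64) -> bool {
        let elapsed = (now - self.last).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::from_oauth_tps(5, 0.0); // OAUTH_TPS default is 5
    let burst = (0..20).filter(|_| bucket.try_acquire(0.0)).count();
    println!("burst at t=0: {burst}"); // 10
    println!("after 0.5s: {}", bucket.try_acquire(0.5)); // true (2.5 tokens refilled)
}
```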
````diff
+## Technical Details
+
+### 1. Dynamic Scalability (Hot-Swapping)
+
+Adding capacity is instantaneous.
+
+- **Zero-Touch Ingestion**: Visit the `/auth` endpoint to authenticate a new account. The credential is automatically persisted to the database and **immediately injected** into the scheduling loop.
+- **No Restarts**: Grow the credential pool at runtime without restarting the service or dropping connections.
+
+### 2. Traffic-Driven Maintenance
+
+We don't run background cron jobs to check for expired tokens. Instead, live traffic acts as the probe.
+
+- **Lazy Self-Healing**: A credential's validity is checked only when a request uses it. Invalid tokens (401/403) are immediately quarantined and repaired asynchronously.
+- **Auto-Convergence**: The more traffic the proxy serves, the sooner invalid credentials are found and repaired, so the pool converges to a clean state fastest under high concurrency.
+
````
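The traffic-driven maintenance above comes down to classifying each upstream status and deciding what happens to the credential that produced it. The old *Error semantics* list in this diff covers the same status codes.

The sketch below is hypothetical. `Action`, `classify` and `call_with_retry` are illustrative names, and the upstream call is simulated by a closure. `max_retries` stands in for `GEMINI_RETRY_MAX_TIMES`, assumed to mean one initial attempt plus up to that many retries. Moving on to the next credential after a 429 is also an assumption.

```rust
/// What the scheduler does with a credential after an upstream response.
#[derive(Debug, PartialEq)]
enum Action {
    Return,     // success: hand the response to the client
    Cooldown,   // 429: park the credential until its quota resets
    Quarantine, // 401/403: pull the credential and refresh it asynchronously
    Retry,      // other failures: try the next credential
}

fn classify(status: u16) -> Action {
    match status {
        200..=299 => Action::Return,
        429 => Action::Cooldown,
        401 | 403 => Action::Quarantine,
        _ => Action::Retry,
    }
}

/// Hypothetical retry loop: `call(i)` simulates one upstream attempt with the
/// i-th credential and returns its HTTP status. Returns the final status and
/// the credentials quarantined along the way.
fn call_with_retry(mut call: impl FnMut(usize) -> u16, max_retries: usize) -> (u16, Vec<usize>) {
    let mut quarantined = Vec::new();
    let mut attempt = 0;
    loop {
        let status = call(attempt);
        match classify(status) {
            Action::Return => return (status, quarantined),
            Action::Quarantine => quarantined.push(attempt),
            // Parked or skipped; the next iteration uses another credential.
            Action::Cooldown | Action::Retry => {}
        }
        if attempt == max_retries {
            return (status, quarantined);
        }
        attempt += 1;
    }
}

fn main() {
    // Simulated upstream: credential 0 is revoked, 1 is rate-limited, 2 works.
    let statuses = [401, 429, 200];
    let (status, quarantined) = call_with_retry(|i| statuses[i], 3);
    println!("final={status} quarantined={quarantined:?}"); // final=200 quarantined=[0]
}
```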
````diff
+### 3. Zero-Lock Concurrency
+
+Built on the **Actor Model (Ractor)**, Gcli-Nexus avoids the mutex contention common in traditional multi-threaded proxies.
 
-### Error semantics
+- **In-Memory Scheduling**: On the critical path (Client -> Actor -> Client), a single actor makes every scheduling decision, processing messages one at a time without blocking.
+- **Decoupled IO**: Database writes (SQLite WAL) and OAuth refreshes are offloaded to detached workers, keeping proxy latency stable under load.
 
-- `401/403` from upstream map to a temporary `502/500` locally after a refresh attempt; the credential is refreshed before reuse.
-- `429` returns upstream headers/body untouched while the offending credential cools down.
-- `503` with `{"error":"no available credential"}` means all queues are empty or cooling—add more credentials or wait for cooldowns.
````
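Gcli-Nexus uses `ractor`. The sketch below reproduces the same actor idea with only `std::sync::mpsc` and a thread, so treat it as an analogy rather than the project's code. One thread owns the credential queue and handles messages one at a time, so the queue needs no mutex.

All message names are hypothetical:

- `Add` models the hot-swap path behind `/auth`.
- `ReportInvalid` models quarantining after a 401/403.
- `Get` carries a reply channel, the way an actor call returns a value.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Messages understood by the hypothetical scheduler actor.
enum Msg {
    Add(String),                 // hot-swap: inject a new credential (e.g. after /auth)
    Get(Sender<Option<String>>), // ask for the next credential; reply on the channel
    ReportInvalid(String),       // 401/403: drop the credential from rotation
}

/// Spawn the actor; only its thread ever touches `queue`, so no lock is needed.
fn spawn_scheduler() -> Sender<Msg> {
    let (tx, rx) = channel::<Msg>();
    thread::spawn(move || {
        let mut queue: VecDeque<String> = VecDeque::new();
        // Messages are processed sequentially until every Sender is dropped.
        for msg in rx {
            match msg {
                Msg::Add(id) => queue.push_back(id),
                Msg::Get(reply) => {
                    let next = queue.pop_front();
                    if let Some(id) = &next {
                        queue.push_back(id.clone()); // round-robin rotation
                    }
                    let _ = reply.send(next); // the caller may have gone away
                }
                Msg::ReportInvalid(id) => {
                    // A real implementation would also hand `id` to a refresh worker.
                    queue.retain(|q| q != &id);
                }
            }
        }
    });
    tx
}

/// Ask the actor for a credential and wait for its reply.
fn get(scheduler: &Sender<Msg>) -> Option<String> {
    let (reply_tx, reply_rx) = channel();
    scheduler.send(Msg::Get(reply_tx)).ok()?;
    reply_rx.recv().ok().flatten()
}

fn main() {
    let scheduler = spawn_scheduler();
    println!("{:?}", get(&scheduler)); // None: the pool starts empty
    scheduler.send(Msg::Add("proj-a".into())).unwrap(); // hot-swap, no restart
    scheduler.send(Msg::Add("proj-b".into())).unwrap();
    println!("{:?}", get(&scheduler)); // Some("proj-a")
    scheduler.send(Msg::ReportInvalid("proj-b".into())).unwrap();
    println!("{:?}", get(&scheduler)); // Some("proj-a"): proj-b was quarantined
}
```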
````diff
+### 4. Precision Rate Limiting
 
-## Operations
+Handling upstream rate limits (429) is a scheduling problem, not an error-handling problem.
 
-- **Logging**: Structured tracing goes to stdout; set `LOGLEVEL=debug` for detailed actor logs (queue lengths, refresh states). Use `RUST_LOG` for per-module overrides.
-- **Database**: `data.db` lives at the path inside `DATABASE_URL`; backup the file periodically if you care about history.
-- **Proxying**: Set `PROXY` (e.g. `http://127.0.0.1:1080`) if your network requires outbound proxying; both Gemini traffic and OAuth refresh calls use it.
-- **Credential rotation**: Update the JSON file, restart the binary, or seed SQLite manually; the actor upserts by `project_id`.
-- **Security**: Treat `.env`, `credentials/*.json`, and `data.db` as sensitive—they contain refresh and access tokens.
+- **The Waiting Room**: Rate-limited credentials are parked in a **binary heap** ordered by quota-reset time.
+- **Timer-Driven Wakeups**: No polling. The earliest reset time is read from the top of the heap in O(1), and each credential is returned to the active queue (an O(log n) pop) as soon as its quota resets.
 
````
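The waiting room can be sketched with `std::collections::BinaryHeap`. Wrapping each entry in `std::cmp::Reverse` turns Rust's max-heap into a min-heap keyed by quota-reset time. Peeking at the earliest deadline is O(1), and each reclaim is an O(log n) pop. A real scheduler would sleep until `next_wakeup()` with a timer instead of polling.

The following are assumptions of the sketch:

- `park`, `next_wakeup` and `reclaim_due` are hypothetical names.
- Timestamps are plain milliseconds supplied by the caller.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Rate-limited credentials ordered by the time (ms) their quota resets.
#[derive(Default)]
struct WaitingRoom {
    // Reverse turns std's max-heap into a min-heap: the earliest reset is on top.
    heap: BinaryHeap<Reverse<(u64, String)>>,
}

impl WaitingRoom {
    /// Park a credential after a 429 until `reset_at_ms`. O(log n).
    fn park(&mut self, project_id: &str, reset_at_ms: u64) {
        self.heap.push(Reverse((reset_at_ms, project_id.to_string())));
    }

    /// Earliest reset time, i.e. what a timer would sleep until. O(1).
    fn next_wakeup(&self) -> Option<u64> {
        self.heap.peek().map(|Reverse((t, _))| *t)
    }

    /// Move every credential whose quota has reset by `now_ms` back to `active`.
    fn reclaim_due(&mut self, now_ms: u64, active: &mut VecDeque<String>) {
        while let Some(Reverse((t, _))) = self.heap.peek() {
            if *t > now_ms {
                break; // everything left resets later
            }
            if let Some(Reverse((_, id))) = self.heap.pop() {
                active.push_back(id);
            }
        }
    }
}

fn main() {
    let mut room = WaitingRoom::default();
    let mut active = VecDeque::new();
    room.park("proj-b", 2_000);
    room.park("proj-a", 1_000);
    println!("{:?}", room.next_wakeup()); // Some(1000)
    room.reclaim_due(1_500, &mut active);
    println!("{:?} {:?}", active, room.next_wakeup()); // ["proj-a"] Some(2000)
}
```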
````diff
 ## License
 
````