sync: v1.3.11

NikkeTryHard · NikkeTryHard · commit 47f7b732ddd1 · 2026-03-01T21:18:47.000-06:00
diff --git a/README.md b/README.md
@@ -332,7 +332,10 @@ The proxy reads accounts from `accounts.json` in the config directory:
 | `ZEROGRAVITY_SYSTEM_MODE`     | `stealth`     | `stealth` = keep backend prompt, inject user system prompt as override; `minimal` = replace 20KB prompt with minimal identity + user prompt |
 | `ZEROGRAVITY_SENSITIVE_WORDS` | built-in list | Comma-separated client names to obfuscate in requests (zero-width spaces), or `none` to disable                                             |
 | `ZEROGRAVITY_MODEL_ALIASES`   | —             | Map custom model names to built-in models, e.g. `gpt-4o:gemini-3-flash,gpt-4:opus-4.6`. Also configurable via `zg alias` or `aliases.json`  |
-| `ZEROGRAVITY_API_BODY_LIMIT_MB` | `32` (clamped `1..100`) | Max request body size in MiB for API routes (`/v1/*`) | `64` |
+| `ZEROGRAVITY_API_BODY_LIMIT_MB` | `32` (clamped `1..100`) | Max request body size in MiB for API routes (`/v1/*`)                                                                                       |
+| `ZEROGRAVITY_QUOTA_CAP`       | `0.2`         | Per-account quota usage cap (0.0-1.0); reaching it triggers rotation. `0` to disable. Also available as `--quota-cap` CLI flag              |
+| `ZEROGRAVITY_UPSTREAM_PROXY`  | —             | Route all outbound MITM traffic through a proxy (`http://`, `socks5://`, `socks5h://`)                                                      |
+| `ZEROGRAVITY_HTTP_PROXY`      | —             | Pass HTTP/HTTPS proxy settings to the backend child process                                                                                 |
 
 **System prompt mode:** When your client sends a system prompt (e.g. via OpenAI `system` role), ZeroGravity injects it into the request. In `stealth` mode, the backend's identity is stripped and your prompt takes over. In `minimal` mode, the entire 20KB backend prompt is replaced — saves tokens but may trigger rate limiting on Pro models.
 
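Review note on the README hunk above: the `ZEROGRAVITY_API_BODY_LIMIT_MB` and `ZEROGRAVITY_QUOTA_CAP` rows can be read as the following minimal sketch. It is not the proxy's implementation. The function names `body_limit_bytes` and `quota_cap` are hypothetical, and two behaviors are assumptions the table does not state: that an unparsable body limit falls back to the default, and that an out-of-range quota cap is rejected.

```python
from typing import Mapping, Optional

MIB = 1024 * 1024


def body_limit_bytes(env: Mapping[str, str]) -> int:
    """Body limit in bytes: default 32 MiB, clamped to 1..100 MiB as documented."""
    raw = env.get("ZEROGRAVITY_API_BODY_LIMIT_MB", "32")
    try:
        mib = int(raw)
    except ValueError:
        mib = 32  # assumption: unparsable values fall back to the default
    return min(max(mib, 1), 100) * MIB


def quota_cap(env: Mapping[str, str]) -> Optional[float]:
    """Per-account quota cap as a fraction, or None when rotation-by-cap is disabled."""
    cap = float(env.get("ZEROGRAVITY_QUOTA_CAP", "0.2"))
    if cap == 0:
        return None  # documented: `0` disables the cap
    if not 0.0 < cap <= 1.0:
        # assumption: the docs give the range but not the out-of-range behavior
        raise ValueError("ZEROGRAVITY_QUOTA_CAP must be within 0.0-1.0")
    return cap
```

With an empty environment this yields a 32 MiB body limit and a cap of `0.2`, matching the defaults in the table.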
diff --git a/docs/api.md b/docs/api.md
@@ -265,3 +265,33 @@ curl http://localhost:8741/v1beta/models/gemini-3-flash:generateContent \
 | `GET/POST` | `/`                               | Compatibility root (returns status)   |
 | `POST`     | `/api/event_logging/batch`        | Compatibility event logging endpoint  |
 | `GET/POST` | `/.well-known/{*path}`            | Compatibility well-known endpoint     |
+
+## Behavior Notes
+
+### Default Output Tokens
+
+When a client omits `max_tokens` (Anthropic), `max_completion_tokens` (OpenAI), or `max_output_tokens` (OpenAI Responses), the proxy defaults to **64,000 tokens**, just below Gemini's 65,536 ceiling. The MITM layer enforces a minimum of 4,096 regardless of the value sent. Clients that previously got errors for a missing `max_tokens` now receive this default instead.
+
+### Thinking Budget (Claude Aliases)
+
+Claude aliases (`opus-4.6`, `sonnet-4.6`) are backed by Gemini models. When clients send `budget_tokens` via the Anthropic Messages API, the proxy maps it to Gemini thinking levels:
+
+| budget_tokens | Gemini thinkingLevel |
+|---------------|---------------------|
+| 0             | disabled            |
+| 1 -- 512      | minimal             |
+| 513 -- 1024   | low                 |
+| 1025 -- 4096  | medium              |
+| 4097+         | high                |
+
+Gemini 3+ models reject raw integer budgets with `400 INVALID_ARGUMENT`, so the proxy applies this mapping automatically; clients can keep sending `budget_tokens` unchanged.
+
+### Tool Calling
+
+OpenAI and Anthropic tool/function declarations are translated to Gemini's format. During translation:
+- **Constraint hints** (`minLength`, `maxLength`, `pattern`, `format`, `default`, `examples`) are preserved as description text since Gemini strips these JSON Schema keys
+- **Nullable properties** are removed from `required` arrays
+- **Union types** (e.g. `["string", "array"]`) produce "Accepts: string | array" description hints
+- **Non-standard keys** (`strict`, `x-*` prefixed, etc.) are stripped before sending to Gemini
+
+Gemini-native tool declarations (via `/v1beta/`) pass through with zero translation.
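
Review note on the `docs/api.md` hunk above: the thinking-budget table and the default-output-token rule can be sketched as below. This is an illustrative sketch, not the proxy's code. The names `thinking_level` and `resolve_max_output_tokens` are hypothetical. Rejecting negative budgets and treating the 4,096 minimum as a simple floor on an explicit value are both assumptions; the docs only give the table and the two numbers.

```python
from typing import Optional

DEFAULT_MAX_OUTPUT_TOKENS = 64_000  # default stated in the added docs
MIN_OUTPUT_TOKENS = 4_096           # minimum stated in the added docs


def thinking_level(budget_tokens: int) -> str:
    """Map an Anthropic-style budget_tokens value to a level, per the table."""
    if budget_tokens < 0:
        # assumption: the table starts at 0, so negatives are treated as invalid
        raise ValueError("budget_tokens must be non-negative")
    if budget_tokens == 0:
        return "disabled"
    if budget_tokens <= 512:
        return "minimal"
    if budget_tokens <= 1024:
        return "low"
    if budget_tokens <= 4096:
        return "medium"
    return "high"


def resolve_max_output_tokens(requested: Optional[int]) -> int:
    """Apply the documented default when the field is omitted, then the floor."""
    if requested is None:
        return DEFAULT_MAX_OUTPUT_TOKENS
    # assumption: "minimum of 4,096 regardless" means small values are raised
    return max(requested, MIN_OUTPUT_TOKENS)
```

The boundary values are the part worth checking against the real implementation: in this sketch 512 maps to `minimal`, 513 to `low`, 4096 to `medium`, and 4097 to `high`.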
diff --git a/docs/docker.md b/docs/docker.md
@@ -111,9 +111,11 @@ docker run -d --name zerogravity \
 | `ZEROGRAVITY_IDE_VERSION`     | Auto-detected           | Preferred override for reported IDE version          | `1.19.4`                    |
 | `ZEROGRAVITY_CLIENT_VERSION`  | Auto-detected           | Override the client version string                   | `1.15.8`                    |
 | `ZEROGRAVITY_DEVICE_FINGERPRINT` | Auto-detected         | Override reported device fingerprint (UUID required) | `11111111-2222-4333-8444-555555555555` |
-| `ZEROGRAVITY_API_BODY_LIMIT_MB` | `32` (clamped `1..100`) | Max request body size in MiB for API routes (`/v1/*`) | `64`                        |
+| `ZEROGRAVITY_API_BODY_LIMIT_MB` | `32` (clamped `1..100`) | Max request body size in MiB for API routes (`/v1/*`) | `64`                       |
 | `SSL_CERT_FILE`               | System default          | Custom CA certificate bundle path                    | `/etc/ssl/certs/ca.pem`     |
 | `RUST_LOG`                    | `warn` (runtime default) / `info` (`zg docker-init` template) | Log level | `debug`                     |
+| `ZEROGRAVITY_DOH`             | `0` (disabled)          | Enable DNS-over-HTTPS via dns.google (`1` to enable)  | `1`                         |
+| `ZEROGRAVITY_STREAM_IDLE_TIMEOUT_SECS` | `120`          | Stream idle timeout in seconds before closing          | `300`                       |
 
 ### Customization
 
@@ -133,7 +135,7 @@ Serializes generation requests to prevent thundering-herd failures when multiple
 | ------------------------------- | -------- | ---------------------------------------------------------- | -------- |
 | `ZEROGRAVITY_QUEUE_ENABLED`     | `true`   | Set to `false`, `0`, or `no` to disable the queue entirely | `false`  |
 | `ZEROGRAVITY_QUEUE_CONCURRENCY` | `2`      | Max concurrent requests to Google                          | `4`      |
-| `ZEROGRAVITY_QUEUE_INTERVAL_MS` | `300`    | Anti-burst gap between consecutive requests (ms)           | `500`    |
+| `ZEROGRAVITY_QUEUE_INTERVAL_MS` | `0`      | Anti-burst gap between consecutive requests (ms)           | `500`    |
 | `ZEROGRAVITY_QUEUE_TIMEOUT_MS`  | `600000` | Max wait time in queue before HTTP 408                     | `300000` |
 | `ZEROGRAVITY_QUEUE_MAX_SIZE`    | `50`     | Max queue depth; excess requests get HTTP 503              | `100`    |
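
Review note on the queue table above: after this change the documented default for `ZEROGRAVITY_QUEUE_INTERVAL_MS` is `0`, so no anti-burst gap applies unless one is set. A minimal sketch of reading these variables with the documented defaults follows. `QueueConfig` and `load_queue_config` are hypothetical names, and the case-insensitive match on the disable values is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class QueueConfig:
    enabled: bool = True
    concurrency: int = 2       # max concurrent requests to Google
    interval_ms: int = 0       # anti-burst gap; default is 0 after this commit
    timeout_ms: int = 600_000  # max queue wait before HTTP 408
    max_size: int = 50         # max queue depth; excess requests get HTTP 503


def load_queue_config(env: Mapping[str, str]) -> QueueConfig:
    """Build a QueueConfig from an environment-like mapping."""
    raw_enabled = env.get("ZEROGRAVITY_QUEUE_ENABLED", "true")
    # documented disable values: false, 0, no (case-insensitivity is assumed)
    enabled = raw_enabled.strip().lower() not in {"false", "0", "no"}
    return QueueConfig(
        enabled=enabled,
        concurrency=int(env.get("ZEROGRAVITY_QUEUE_CONCURRENCY", "2")),
        interval_ms=int(env.get("ZEROGRAVITY_QUEUE_INTERVAL_MS", "0")),
        timeout_ms=int(env.get("ZEROGRAVITY_QUEUE_TIMEOUT_MS", "600000")),
        max_size=int(env.get("ZEROGRAVITY_QUEUE_MAX_SIZE", "50")),
    )
```

Passing `os.environ` as `env` reads the live process environment; passing a plain dict keeps the function easy to exercise in isolation.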