A local API server + Chrome extension that exposes an OpenAI-compatible API by routing requests through real AI chat interfaces in your browser. No API keys needed for the AI providers themselves — it drives the UI directly.
Your code → POST /v1/chat/completions → FastAPI server → WebSocket → Chrome extension → AI chat tab → response
Works with any tool that speaks the OpenAI API: Python openai SDK, LangChain, LlamaIndex, Continue.dev, Cursor, open-webui, curl, Postman — anything.
| Provider | Models |
|---|---|
| Claude (claude.ai) | claude-3-5-sonnet-20241022, claude-3-5-haiku, claude-3-opus, claude-3-sonnet, claude-3-haiku |
| ChatGPT (chat.openai.com) | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o3, o4-mini |
| Gemini (gemini.google.com) | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash |
| Copilot (copilot.microsoft.com) | copilot, copilot-gpt-4 |
| Poe (poe.com) | poe-claude-3-5-sonnet, poe-gpt-4o, poe-llama-3, poe-mistral-large |
| HuggingChat (huggingface.co) | Llama-3.1-70B, Mistral-7B, Phi-3-mini, Command-R+ |
| Le Chat (chat.mistral.ai) | mistral-large, mistral-medium, codestral |
| You.com | you-gpt-4o, you-claude-3-5 |
| Perplexity | perplexity-pro |
| Grok (grok.com) | grok-2, grok-2-mini |
- Python 3.10+
- Google Chrome with Developer Mode enabled
- Accounts on whichever AI sites you want to use (free tiers work)
cd api-server
pip install -r requirements.txt
python main.py

Server starts at http://localhost:8000. Your API key is printed on first run; it is also visible at the dashboard.
- Open chrome://extensions/
- Enable Developer mode (top right toggle)
- Click Load unpacked
- Select the extension/ folder
The extension icon appears in your toolbar. Click it to see connection status and your API key.
Open at least one of the supported sites in Chrome and make sure you are logged in. The extension will find and use that tab automatically.
from openai import OpenAI
client = OpenAI(
api_key="bb-sk-your-key-here",
base_url="http://localhost:8000/v1"
)
# Chat completions
response = client.chat.completions.create(
model="gpt-4.1-mini",
messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)
# Responses API (also supported)
response = client.responses.create(
model="gpt-4.1-mini",
input="Hello"
)
print(response.output[0].content[0].text)

# Single message
python test_client.py --prompt "What is 2+2?"
# Interactive chat with persistent memory
python test_client.py --chat --model gpt-4.1-mini
# Use a specific model
python test_client.py --chat --model claude-3-5-sonnet-20241022
# With a system prompt
python test_client.py --chat --system "You are a cybersecurity expert"
# Resume a previous session
python test_client.py --chat --session my-session
# List all saved sessions
python test_client.py --sessions
# Clear a session
python test_client.py --clear --session my-session

$res = Invoke-RestMethod `
-Uri "http://localhost:8000/v1/chat/completions" `
-Method POST `
-Headers @{ "Authorization" = "Bearer bb-sk-your-key-here" } `
-ContentType "application/json" `
-Body '{"model":"gpt-4.1-mini","messages":[{"role":"user","content":"Hello"}]}'
$res.choices[0].message.content

curl http://localhost:8000/v1/chat/completions \
-H "Authorization: Bearer bb-sk-your-key-here" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4.1-mini","messages":[{"role":"user","content":"Hello"}]}'

Open http://localhost:8000/dashboard in any browser while the server is running.
| Tab | What it does |
|---|---|
| Overview | Live connection status, active key, supported sites |
| API Keys | Generate single or bulk keys in 10 different styles |
| Models | Browse all 45+ supported models |
| Test API | Fire a live request directly from the browser |
Keys are fake — they exist only for local auth and tool compatibility. Generate keys in any style to match whatever tool you are integrating with:
# Single key
curl -X POST http://localhost:8000/v1/keys \
-H "Content-Type: application/json" \
-d '{"style":"openai","label":"my-tool"}'
# Bulk — multiple styles at once
curl -X POST http://localhost:8000/v1/keys/bulk \
-H "Content-Type: application/json" \
-d '{"keys":[{"style":"openai","count":3},{"style":"anthropic","count":2}]}'

Supported styles: browser-bridge, openai, anthropic, google, huggingface, mistral, cohere, copilot, perplexity, grok
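The same two key requests can be built from Python using only the standard library. This is a minimal sketch that mirrors the curl calls above: the endpoint paths and JSON bodies come from this README, while the helper names (`build_key_request`, `build_bulk_request`, `send`) are hypothetical. The response schema is not documented here, so `send` returns the decoded JSON without interpreting it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default server address from this README


def _post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the local server."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_key_request(style: str, label: str) -> urllib.request.Request:
    """Request for POST /v1/keys: one key in the given style."""
    return _post("/v1/keys", {"style": style, "label": label})


def build_bulk_request(counts: dict[str, int]) -> urllib.request.Request:
    """Request for POST /v1/keys/bulk: several styles in one call."""
    keys = [{"style": style, "count": n} for style, n in counts.items()]
    return _post("/v1/keys/bulk", {"keys": keys})


def send(request: urllib.request.Request):
    """Send a request; needs the server running. Returns the decoded JSON as-is."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With the server running, `send(build_bulk_request({"openai": 3, "anthropic": 2}))` should issue the same request as the bulk curl example.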
The extension uses sessionStorage fingerprinting to track every tab it opens:
- A new request comes in for a model (e.g. gpt-4o)
- Background script checks its registry for an existing tab on chat.openai.com
- If a tab exists, it asks the content script: "Is this tab ours, and has it been used?"
  - Ours + unused → reuse the tab as-is, inject prompt directly
  - Ours + used → click "New Chat" inside the same tab, then inject
  - Not ours → open a brand new tab (won't interfere with your own browsing)
- If no tab exists → open a new tab, stamp it, inject prompt
This means you can keep browsing normally — the tool only ever uses tabs it opened itself.
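The decision rules above can be summarised as a small pure function. The extension itself is JavaScript; the Python below is only an illustrative sketch of the branching, and the names `TabAction` and `choose_tab_action` are hypothetical rather than taken from the extension's code.

```python
from enum import Enum


class TabAction(Enum):
    REUSE = "reuse the tab as-is and inject the prompt"
    NEW_CHAT = "click New Chat in the same tab, then inject"
    OPEN_NEW = "open a new tab, stamp it, then inject"


def choose_tab_action(tab_exists: bool, is_ours: bool = False,
                      is_used: bool = False) -> TabAction:
    """Pick what to do with a provider tab, following the rules listed above."""
    # No tab at all, or a tab the user opened themselves: never touch it,
    # open a fresh one instead.
    if not tab_exists or not is_ours:
        return TabAction.OPEN_NEW
    # The tab is ours: start a new chat only if it already holds a conversation.
    return TabAction.NEW_CHAT if is_used else TabAction.REUSE
```

Keeping the choice free of side effects would make it easy to test separately from the DOM automation that carries it out.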
pipeline.py provides input/output filtering and persistent session context for the test client:
- Input filter: strips null bytes, normalises whitespace
- Output filter: removes scraped UI artifacts (buttons, labels, navigation text that sometimes leaks into DOM scraping)
- Session context: per-session chat history stored in session_context.json, survives restarts
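A rough sketch of what these three pieces might look like follows. It is not the actual pipeline.py: the function names, the exact whitespace rules, the `UI_ARTIFACTS` labels, and the JSON layout (a mapping from session name to a list of messages) are all assumptions made for illustration.

```python
import json
import re
from pathlib import Path

# Hypothetical examples of UI labels that could leak into scraped text.
UI_ARTIFACTS = {"Copy", "Regenerate", "Share", "New chat"}


def filter_input(text: str) -> str:
    """Strip null bytes and normalise whitespace (assumed rules)."""
    text = text.replace("\x00", "")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line
    return text.strip()


def filter_output(text: str) -> str:
    """Drop lines that consist only of a known UI label."""
    kept = [ln for ln in text.splitlines() if ln.strip() not in UI_ARTIFACTS]
    return "\n".join(kept).strip()


def append_message(path: Path, session: str, role: str, content: str) -> list[dict]:
    """Append one message to a session's history and persist it as JSON."""
    store = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    history = store.setdefault(session, [])
    history.append({"role": role, "content": content})
    path.write_text(json.dumps(store, indent=2), encoding="utf-8")
    return history
```

Because the history is written to disk after every message, a session resumed with `--session` can pick up where it left off after a restart.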
| Method | Path | Description |
|---|---|---|
| GET | / | Health check |
| GET | /dashboard | Web dashboard |
| GET | /v1/status | JSON status (extension connected, keys, models) |
| GET | /v1/models | List all models |
| POST | /v1/chat/completions | OpenAI chat completions |
| POST | /v1/completions | Legacy text completions |
| POST | /v1/responses | OpenAI Responses API |
| POST | /v1/embeddings | Stub embeddings (zero vectors) |
| POST | /v1/keys | Generate a key |
| POST | /v1/keys/bulk | Generate multiple keys |
| GET | /v1/keys | List all keys |
| DELETE | /v1/keys/{key} | Revoke a key |
| WS | /ws/extension | Extension WebSocket |
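Since /v1/embeddings is a stub, it may help to see what a zero-vector reply could look like. The sketch below builds a payload in the shape of an OpenAI embeddings response; the function name `stub_embeddings_response`, the default dimension of 1536, and the zeroed usage counts are assumptions, not details confirmed by the server code.

```python
def stub_embeddings_response(inputs: list[str], model: str, dim: int = 1536) -> dict:
    """Build an OpenAI-shaped embeddings payload where every vector is all zeros."""
    return {
        "object": "list",
        "data": [
            {"object": "embedding", "index": i, "embedding": [0.0] * dim}
            for i in range(len(inputs))
        ],
        "model": model,
        # Assumed: a stub has no real token accounting to report.
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
```

Zero vectors keep clients that expect an embeddings endpoint from failing, but they carry no information, so similarity search over them is not meaningful.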
browser-api-bridge/
├── api-server/
│ ├── main.py # FastAPI server, all endpoints
│ ├── websocket_manager.py # WebSocket connection to extension
│ ├── queue_manager.py # Serial request queue
│ ├── keygen.py # Fake API key generator
│ ├── pipeline.py # Input/output filters + session context
│ ├── test_client.py # CLI test client
│ ├── dashboard.html # Web dashboard (served at /dashboard)
│ └── requirements.txt
└── extension/
├── manifest.json # MV3 manifest
├── background.js # Service worker, WebSocket, tab registry
├── content.js # DOM automation, fingerprinting
├── popup.html # Extension popup
└── popup.js
- Responses depend on the AI site's UI — if a site updates its DOM structure, selectors may need updating
- One request at a time (serial queue) — parallel requests are queued and processed in order
- Requires the browser to stay open while the server is running
- Free tier rate limits of the AI sites still apply
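The one-request-at-a-time behaviour can be illustrated with an `asyncio.Lock`, which hands the lock to waiters in the order they arrived. This is a sketch of the idea only, not the contents of queue_manager.py; `SerialQueue`, `submit`, and the fake request handler are hypothetical names.

```python
import asyncio


class SerialQueue:
    """Run submitted coroutines one at a time, in arrival order."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()  # asyncio.Lock wakes waiters first-come, first-served

    async def submit(self, handler, *args):
        # Only one handler runs at a time; later callers wait here.
        async with self._lock:
            return await handler(*args)


async def demo() -> list[str]:
    queue = SerialQueue()
    log: list[str] = []

    async def fake_request(name: str, delay: float) -> str:
        log.append(f"start {name}")
        await asyncio.sleep(delay)  # stands in for the browser round trip
        log.append(f"end {name}")
        return name

    # Three "parallel" requests; the slowest is submitted first.
    await asyncio.gather(
        queue.submit(fake_request, "a", 0.02),
        queue.submit(fake_request, "b", 0.01),
        queue.submit(fake_request, "c", 0.0),
    )
    return log
```

Running `asyncio.run(demo())` shows each request finishing before the next one starts, even though all three were submitted together.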
MIT