LLM Council is a local-first web app that runs the same prompt through multiple Ollama models, has them critique and rank each other, then produces a final synthesized answer from a designated chairman model.
This project extends the multi-model collaboration direction explored by @karpathy.
The council-style workflow, peer critique, and chairman synthesis pattern were inspired by that line of thinking, with a modified implementation and hybrid local/cloud runtime built independently in this repository.
- Runs one user prompt across multiple models (`COUNCIL_MODELS`)
- Shows each model's raw answer (Stage 1)
- Runs anonymized peer ranking across model outputs (Stage 2)
- Streams the chairman's final synthesis token-by-token (Stage 3)
- Stores conversations as JSON files on disk
- Auto-generates conversation titles from the first user message
LLM Council uses a hybrid runtime approach: users can choose either local Ollama or Ollama Cloud for the same council workflow.
- `OLLAMA_MODE=local`: queries your local Ollama server (default: `http://localhost:11434`)
- `OLLAMA_MODE=cloud`: queries the Ollama Cloud endpoint (default: `https://ollama.com/api/chat`)
- Local mode: better privacy, lower per-request cost, works offline once models are available.
- Cloud mode: easier access to hosted models without local GPU constraints.
- Same UI and same 3-stage council logic in both modes.
- You select one mode at a time via `.env`; `COUNCIL_MODELS` must match the selected mode.
- In local mode, models must be installed in your local Ollama instance.
- In cloud mode, model IDs must be valid cloud-available model IDs for your account.
- If `OLLAMA_MODE=cloud` and `OLLAMA_CLOUD_API_KEY` is missing, requests will fail and the UI shows a warning.
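The mode selection described above can be pictured as a small helper. This is a sketch only — the function name and exact fallback behavior are assumptions, but the defaults match the documented ones:

```python
# Hypothetical helper mirroring the local/cloud mode selection.
# `env` stands in for the variables loaded from .env.
def resolve_chat_endpoint(env: dict) -> tuple[str, dict]:
    """Return (url, headers) for the active Ollama mode."""
    mode = env.get("OLLAMA_MODE", "local")
    if mode == "cloud":
        key = env.get("OLLAMA_CLOUD_API_KEY", "")
        if not key:
            # Without a key, cloud requests fail; surface it early.
            raise RuntimeError("OLLAMA_MODE=cloud requires OLLAMA_CLOUD_API_KEY")
        url = env.get("OLLAMA_CLOUD_API_URL", "https://ollama.com/api/chat")
        return url, {"Authorization": f"Bearer {key}"}
    base = env.get("OLLAMA_API_URL", "http://localhost:11434")
    return base.rstrip("/") + "/api/chat", {}
```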
The app does not simply pick one “winning” model answer. Instead, it runs a multi-stage deliberation and synthesis process.
- The user prompt is sent to every model in `COUNCIL_MODELS`.
- Responses are collected in parallel.
- Failed model calls are skipped; successful outputs are kept.
Output: a list of raw model responses (`model`, `response`).
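The Stage 1 fan-out above (parallel calls, failed calls skipped) can be sketched as follows; `ask_model` is a hypothetical stand-in for the real Ollama chat call, not the repo's actual function:

```python
import asyncio

# Sketch of the Stage 1 fan-out: one prompt, many models, failures skipped.
async def stage1(prompt: str, models: list[str], ask_model) -> list[dict]:
    tasks = [ask_model(m, prompt) for m in models]
    # return_exceptions=True lets one failing model not sink the whole stage.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Keep only successful (model, response) pairs, preserving model order.
    return [
        {"model": m, "response": r}
        for m, r in zip(models, results)
        if not isinstance(r, Exception)
    ]
```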
- Stage 1 responses are anonymized as `Response A`, `Response B`, etc.
- Each model receives all anonymized responses and evaluates the strengths and weaknesses of each one.
- Each model then provides a strict final ranking from best to worst.
- Rankings are parsed from each model output.
- Aggregate ranking is computed by averaging rank position across reviewers.
Output:
- per-model ranking writeups (`stage2`)
- parsed rankings
- aggregate ranking table (average rank; lower is better)
This is the “voting” step: every model acts as a reviewer/judge of the full response set.
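The aggregation step (average rank position across reviewers, lower is better) can be sketched as a small pure function; the name and input shape are illustrative, not the repo's actual API:

```python
from collections import defaultdict

# Each reviewer submits an ordered list of anonymized labels (best first);
# the aggregate rank for a label is its mean position across reviewers.
def aggregate_rankings(rankings: list[list[str]]) -> list[tuple[str, float]]:
    positions = defaultdict(list)
    for ranking in rankings:
        for pos, label in enumerate(ranking, start=1):
            positions[label].append(pos)
    table = [(label, sum(p) / len(p)) for label, p in positions.items()]
    # Sort so the best (lowest average rank) comes first.
    return sorted(table, key=lambda item: item[1])
```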
- The chairman model receives the original user question, all Stage 1 responses, and all Stage 2 ranking analyses.
- It generates a final synthesized answer that aims to combine the strongest points and resolve disagreements.
- The UI streams this final answer token-by-token over SSE (`/message/stream`).
Decision rule in practice:
- The final answer shown to the user is the chairman synthesis, not a raw top-ranked Stage 1 response.
- Stage 2 voting influences the chairman prompt context.
- If chairman synthesis fails, the backend falls back to the first available Stage 1 response.
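The token-by-token streaming above can be framed as Server-Sent Events. A minimal sketch — the event names and payload shape are assumptions, not the repo's actual wire format:

```python
import json

# Frame one SSE event: "event:" line, "data:" line, blank-line terminator.
def sse_event(event: str, payload: dict) -> str:
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"

# Generator yielding one frame per chairman token, then a terminal event.
def stream_stage3(tokens):
    for tok in tokens:
        yield sse_event("chairman_token", {"token": tok})
    yield sse_event("done", {})
```

In FastAPI, a generator like this could be handed to a streaming response so the frontend receives tokens as they arrive.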
- Backend: FastAPI, httpx, uvicorn, python-dotenv
- Frontend: React 19 + Vite + react-markdown
- Runtime: Ollama local or Ollama Cloud
- Storage: JSON files in `data/conversations/`
- Python 3.10+
- uv
- Node.js 18+ and npm
- Ollama (local mode) or Ollama Cloud API key (cloud mode)
```sh
make install
```

Equivalent manual install:

```sh
uv sync
cd frontend && npm install
```

Create a `.env` file in the repo root.
Local mode example:

```env
OLLAMA_MODE=local
OLLAMA_API_URL=http://localhost:11434
COUNCIL_MODELS=model-a,model-b,model-c
CHAIRMAN_MODEL=model-b
TITLE_MODEL=model-a
```

Cloud mode example:
```env
OLLAMA_MODE=cloud
OLLAMA_CLOUD_API_URL=https://ollama.com/api/chat
OLLAMA_CLOUD_API_KEY=your_api_key_here
COUNCIL_MODELS=your-cloud-model-1,your-cloud-model-2
CHAIRMAN_MODEL=your-cloud-model-1
TITLE_MODEL=your-cloud-model-1
```

For local mode, start Ollama and pull the configured models:

```sh
ollama serve

# Pull the models you configured in COUNCIL_MODELS
ollama pull model-a
ollama pull model-b
ollama pull model-c
```

Switching modes later is just a matter of updating `.env` and restarting the backend.
```sh
make dev
```

- Backend: `http://localhost:8001`
- Frontend: `http://localhost:5173`
- `make dev`: run backend + frontend together
- `make backend`: run backend only (uvicorn on port 8001)
- `make frontend`: run frontend only (Vite on port 5173)
- `make install`: install backend + frontend dependencies
- `make clean`: remove local build caches
| Variable | Description | Default |
|---|---|---|
| `OLLAMA_MODE` | `local` or `cloud` | `local` |
| `OLLAMA_API_URL` | Local Ollama base URL (normalizes `/api/chat` or `/api/generate`) | `http://localhost:11434` |
| `OLLAMA_CLOUD_API_URL` | Ollama Cloud chat endpoint | `https://ollama.com/api/chat` |
| `OLLAMA_CLOUD_API_KEY` | API key used when `OLLAMA_MODE=cloud` | empty |
| `COUNCIL_MODELS` | Comma-separated model list used in Stage 1 and Stage 2 | set by user |
| `CHAIRMAN_MODEL` | Preferred Stage 3 synthesis model | first model in `COUNCIL_MODELS` |
| `TITLE_MODEL` | Model used for first-message conversation title generation | `CHAIRMAN_MODEL` |
| `DATA_DIR` | Conversation storage directory | `data/conversations` |
- `GET /`: health check
- `GET /api/runtime-config`: active runtime config exposed to the UI
- `GET /api/conversations`: list conversation metadata
- `POST /api/conversations`: create a conversation
- `GET /api/conversations/{id}`: fetch one conversation
- `POST /api/conversations/{id}/message`: run the full 3-stage pipeline
- `POST /api/conversations/{id}/message/stream`: stream stage events (SSE)
- In cloud mode, `COUNCIL_MODELS` must be valid cloud-available model IDs for your account.
- Stage 3 streaming is implemented with Server-Sent Events (SSE).
- Conversation files are plain JSON and can be inspected directly under `data/conversations/`.
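Because conversations are plain JSON, they can be inspected with a few lines of Python. The `title` field here is an assumption based on the auto-title feature; the actual file schema may differ:

```python
import json
from pathlib import Path

# List conversation titles from one-JSON-file-per-conversation storage.
def list_conversations(data_dir: str = "data/conversations") -> list[str]:
    titles = []
    for path in sorted(Path(data_dir).glob("*.json")):
        doc = json.loads(path.read_text())
        # Fall back to the filename if a title field is absent.
        titles.append(doc.get("title", path.stem))
    return titles
```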
