Quorum

Adaptive RAG evaluation with a Council-of-LLMs — up to 70x cost reduction by routing each test case to the optimal strategy based on risk.

Live Demo — zero setup, no API keys needed. Click "Run Demo" and watch adaptive orchestration in real time.

Overview

RAG systems fail silently. The retriever fetches wrong documents, the generator hallucinates, stale data gets served as truth — and nothing breaks. No error, no alert, just bad information reaching users.

Evaluating every query with a full multi-judge panel costs ~$0.0035/case. At scale this becomes prohibitive — but a single cheap judge misses subtle failures on high-stakes queries.

Quorum scores each test case for risk and routes it to the optimal evaluation strategy, spending budget only where it matters.

How It Works

                    ┌──────────────────┐
                    │   Risk Scorer    │
                    │  (real analysis) │
                    └────────┬─────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
         risk ≥ 0.8    0.4 ≤ risk < 0.8  risk < 0.4
              │              │              │
     ┌────────┴────────┐ ┌───┴───┐     ┌────┴────┐
     │    Council      │ │Hybrid │     │ Single  │
     │ 3 judges + agg. │ │ det + │     │ Gemini  │
     │ ~$0.0035/case   │ │1 judge│     │~$0.00005│
     └─────────────────┘ └───────┘     └─────────┘
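
The thresholds in the diagram can be sketched as a small routing function. This is a minimal illustration, not Quorum's actual router: the name `selectStrategy`, the strategy labels, and the assumption that risk lies in [0, 1] are all hypothetical. Only the 0.4 and 0.8 cut-offs come from the diagram.

```javascript
// Hypothetical sketch of threshold routing; not the project's actual code.
// Only the 0.4 and 0.8 cut-offs are taken from the diagram above.
const COUNCIL_THRESHOLD = 0.8;
const HYBRID_THRESHOLD = 0.4;

function selectStrategy(risk) {
  // Assumption: risk scores are normalized to the range [0, 1].
  if (typeof risk !== 'number' || Number.isNaN(risk) || risk < 0 || risk > 1) {
    throw new RangeError(`risk must be a number in [0, 1], got ${risk}`);
  }
  if (risk >= COUNCIL_THRESHOLD) return 'council';
  if (risk >= HYBRID_THRESHOLD) return 'hybrid';
  return 'single';
}

console.log(selectStrategy(0.92)); // council
console.log(selectStrategy(0.4));  // hybrid
console.log(selectStrategy(0.1));  // single
```

Both boundaries are inclusive on the more expensive side, so a score of exactly 0.8 gets the full council and exactly 0.4 gets hybrid.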

Council Mode (risk ≥ 0.8)

Medical dosages, legal requirements, safety procedures — queries where errors have real consequences get the full treatment: OpenAI (faithfulness) + Anthropic (groundedness) + Gemini (context relevancy), synthesized by Claude Sonnet 4.

Hybrid Mode (0.4 ≤ risk < 0.8)

Technical explanations and financial advice get zero-cost deterministic checks first (Jaccard similarity, entity matching, freshness, completeness), then a single LLM judge validates the result. The verdict is computed locally, so no aggregator call is needed.
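
As an illustration of what one zero-cost check might look like, here is a minimal sketch of token-level Jaccard similarity between an answer and a retrieved passage. The function names and the tokenizer (lowercase ASCII alphanumeric runs) are assumptions made for this example. Quorum's own check may tokenize and threshold differently.

```javascript
// Hypothetical sketch of one deterministic check: token-level Jaccard
// similarity, |A ∩ B| / |A ∪ B|. The tokenizer is an assumption.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function jaccard(a, b) {
  const setA = tokenize(a);
  const setB = tokenize(b);
  if (setA.size === 0 && setB.size === 0) return 1; // two empty texts are identical
  let shared = 0;
  for (const token of setA) if (setB.has(token)) shared += 1;
  return shared / (setA.size + setB.size - shared);
}

const answer = 'The capital of France is Paris.';
const context = 'Paris is the capital and largest city of France.';
console.log(jaccard(answer, context).toFixed(2)); // 0.67
```

All six answer tokens appear among the nine context tokens, giving 6/9. A low score would suggest the answer uses vocabulary the retrieved context does not support.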

Single Mode (risk < 0.4)

"What is the capital of Japan?" — one Gemini call, done. No wasted spend on trivial factoid queries.
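
The "up to 70x" figure follows from the per-case prices in the diagram: ~$0.0035 for council against ~$0.00005 for single. The sketch below estimates the blended cost of a run. The function name and the shape of `counts` are hypothetical, and because the source gives no hybrid price, the caller has to supply one.

```javascript
// Hypothetical cost model. Council and single prices come from the diagram;
// the source gives no hybrid price, so the caller must supply one.
const COUNCIL_COST = 0.0035;
const SINGLE_COST = 0.00005;

function estimateCost(counts, hybridCost) {
  const { council = 0, hybrid = 0, single = 0 } = counts;
  if (hybrid > 0 && typeof hybridCost !== 'number') {
    throw new TypeError('hybridCost is required when hybrid cases are present');
  }
  const adaptive =
    council * COUNCIL_COST + hybrid * (hybridCost ?? 0) + single * SINGLE_COST;
  const allCouncil = (council + hybrid + single) * COUNCIL_COST;
  // Reduction factor relative to sending every case to the full council.
  const reduction = adaptive > 0 ? allCouncil / adaptive : 1;
  return { adaptive, allCouncil, reduction };
}

// Best case: every case is trivial and goes to single mode.
console.log(estimateCost({ single: 1000 }).reduction.toFixed(0)); // 70
```

The 70x bound is reached only when every case routes to single mode. Any council or hybrid cases in the mix pull the reduction factor down.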

Real-Time Streaming

17 SSE event types stream the entire evaluation lifecycle:

risk_scored → strategy_selected → judge_start → judge_complete → aggregator_start → ...

The frontend renders judges appearing staggered with live score animations — council shows 3 judge cards, hybrid shows deterministic checks + 1 judge, single shows 1 judge. All driven by SSE events, not hardcoded layouts.
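
To make the event flow concrete, here is a minimal sketch of parsing a single Server-Sent Events frame on the client. The event name `risk_scored` is from the list above. The JSON payload and its `risk` field are assumptions, since the payload shapes are documented in ARCHITECTURE.md rather than here.

```javascript
// Hypothetical sketch: parse one SSE frame into { event, data }.
// The payload shape ({"risk": ...}) is an assumption for illustration.
function parseSseFrame(frame) {
  let event = 'message'; // SSE default when no "event:" field is present
  const dataLines = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === '' || line.startsWith(':')) continue; // blank line or comment
    const idx = line.indexOf(':');
    const field = idx === -1 ? line : line.slice(0, idx);
    let value = idx === -1 ? '' : line.slice(idx + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') event = value;
    else if (field === 'data') dataLines.push(value);
  }
  const raw = dataLines.join('\n'); // multiple data lines are joined by newlines
  return { event, data: raw === '' ? null : JSON.parse(raw) };
}

const sampleFrame = 'event: risk_scored\ndata: {"risk":0.92}\n\n';
const parsed = parseSseFrame(sampleFrame);
console.log(parsed.event, parsed.data.risk); // risk_scored 0.92
```

In a browser, the built-in `EventSource` does this parsing for you. A hand-rolled parser like this is mainly useful when reading the stream through `fetch`.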

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Frontend (React + Vite)                       │
│  Upload → Strategy Select → Live Streaming → History → Costs    │
└────────────────────────────────┬────────────────────────────────┘
                                 │ SSE + REST
┌────────────────────────────────┴────────────────────────────────┐
│                     Backend (Express + MongoDB)                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │                    Adaptive Router                        │   │
│  │  Risk Scorer → Strategy Selector → Cost Tracker          │   │
│  │       ↓               ↓                ↓                 │   │
│  │  ┌─────────┐  ┌──────────────────────────┐               │   │
│  │  │Determin.│  │      Orchestrator         │               │   │
│  │  │ Checks  │  │  OpenAI │ Anthropic │ Gem │               │   │
│  │  │(0-cost) │  │         ↓                 │               │   │
│  │  └─────────┘  │      Aggregator           │               │   │
│  │               └──────────────────────────┘               │   │
│  └──────────────────────────────────────────────────────────┘   │
│                          ↓                                      │
│                    Webhook Service → Slack / HTTP                │
└─────────────────────────────────────────────────────────────────┘

See ARCHITECTURE.md for detailed system diagrams, data flow, and SSE protocol documentation.

Quick Start

Demo Mode (no dependencies)

git clone https://github.com/AlexLopezGomez/Quorum---Council-LLMs.git && cd Quorum---Council-LLMs
cd frontend && npm ci && npm run build && cd ..
cd backend && npm ci
DEMO_MODE=true node src/index.js

Open http://localhost:3000 — click "Run Demo" to launch a 10-case adaptive evaluation.

No MongoDB, no API keys, no Docker. The demo runs the real orchestration engine with mocked judge I/O at the boundary.

Full Mode

cd backend && cp .env.example .env  # Add OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY
docker run -d -p 27017:27017 mongo:7
npm run dev &
cd ../frontend && npm run dev &

Open http://localhost:5173.

Docker

docker-compose up --build
# Open http://localhost:8080

SDK

Zero-dependency ESM module for capturing RAG interactions:

import { Quorum } from '@quorum/sdk';

const quorum = new Quorum({ endpoint: 'http://localhost:3000' });

quorum.capture({
  input: 'What is the capital of France?',
  actualOutput: 'The capital of France is Paris.',
  retrievalContext: ['Paris is the capital and largest city of France.'],
});

await quorum.close();

The SDK sends captures in batches and retries with exponential backoff, and PII sanitization is on by default. See sdk/README.md.
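
For readers unfamiliar with the retry pattern, the following is a generic sketch of capped exponential backoff with full jitter. It is not the SDK's implementation: `backoffDelay`, `sendWithRetry`, and every default value are hypothetical, and `send` stands in for whatever function actually posts a batch.

```javascript
// Hypothetical sketch of capped exponential backoff with full jitter.
// Not the SDK's actual code; names and defaults are assumptions.
function backoffDelay(attempt, { baseMs = 200, capMs = 10_000, random = Math.random } = {}) {
  // The ceiling doubles with each attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick a uniform delay in [0, ceiling).
  return Math.floor(random() * ceiling);
}

async function sendWithRetry(send, batch, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await send(batch);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of attempts, surface the error
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

The jitter spreads retries from many clients over time so they do not hit a recovering server in lockstep.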

CLI

npx quorum test --file test-cases.json    # Run evaluation
npx quorum init                            # Scaffold config
npx quorum validate                        # Validate test case format

API

Method	Path	Purpose
POST	`/api/evaluate`	Start evaluation (accepts strategy + riskOverride)
GET	`/api/stream/:jobId`	SSE stream (replays + live)
GET	`/api/results/:jobId`	Poll for results
GET	`/api/history`	Cursor-paginated history
GET	`/api/history/:jobId/cost`	Cost breakdown with savings estimate
GET	`/api/stats`	Aggregated statistics
GET	`/api/docs`	Swagger UI (interactive docs)

Tech Stack

Backend: Node.js 20+, Express, MongoDB/Mongoose, Zod, SSE
Frontend: React 18, Vite 6, TailwindCSS 3, Lucide React
SDK: Zero-dependency ESM with native fetch
LLM Providers: OpenAI (gpt-4o-mini), Anthropic (claude-3-haiku, claude-sonnet-4), Google (gemini-2.5-flash)

Documentation

ARCHITECTURE.md — System diagrams, data flow, SSE protocol
DECISIONS.md — Architectural decision records
DESIGN_SYSTEM.md — Frontend component patterns
sdk/README.md — SDK integration guide

Research & Benchmarks

Public benchmark results: /benchmarks
Research paper: forthcoming

Contributing

Community contributions are welcome. Start with CONTRIBUTING.md for local setup, architecture notes, and pull request expectations.

Production Monitoring

Monitor live RAG traffic with the council. After any inference call, submit the sample fire-and-forget — it never blocks your production path:

// After your RAG pipeline response
fetch('https://your-quorum.app/api/sample', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + process.env.QUORUM_SERVICE_KEY,
  },
  body: JSON.stringify({
    query,       // the user's question
    response,    // your model's answer
    contexts,    // retrieved context passages (array of strings)
  }),
}).catch(() => {}); // never await, never throw

Quorum samples at 5% by default (configurable via SAMPLE_RATE env var). View the Monitoring dashboard for score trends, baseline comparison, and drift alerts.

Rate limit note: The /api/sample endpoint shares the global 30 RPM limit across all /api routes. At 5% sample rate, you need fewer than 600 RPM of production traffic to stay under the limit.
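
The arithmetic behind that figure can be sketched as below. The function name is hypothetical, and the calculation takes the note at face value by assuming only sampled requests count against the limit.

```javascript
// Sketch of the arithmetic in the note above. Assumption: only the sampled
// fraction of production traffic counts against the shared rate limit.
function maxProductionRpm(limitRpm, sampleRate) {
  if (!(sampleRate > 0 && sampleRate <= 1)) {
    throw new RangeError('sampleRate must be in (0, 1]');
  }
  return limitRpm / sampleRate;
}

console.log(Math.round(maxProductionRpm(30, 0.05))); // 600
console.log(Math.round(maxProductionRpm(30, 0.25))); // 120
```

Because the 30 RPM budget is shared with every other /api route, the usable headroom is lower whenever the dashboard or other clients are calling the API at the same time.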

License

MIT
