██████╗███████╗███╗ ██╗████████╗██╗███╗ ██╗███████╗██╗
██╔════╝██╔════╝████╗ ██║╚══██╔══╝██║████╗ ██║██╔════╝██║
██║ █████╗ ██╔██╗ ██║ ██║ ██║██╔██╗ ██║█████╗ ██║
██║ ██╔══╝ ██║╚██╗██║ ██║ ██║██║╚██╗██║██╔══╝ ██║
╚██████╗███████╗██║ ╚████║ ██║ ██║██║ ╚████║███████╗███████╗
╚═════╝╚══════╝╚═╝ ╚═══╝ ╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝╚══════╝
Semantic model routing with hard-block enforcement for Claude Code.
A Claude Code router that automatically routes prompts to Sonnet or Opus based on complexity — saving Opus quota for tasks that actually need it.
Centinel sits between your keystrokes and Claude. It encodes every prompt into a semantic embedding, measures its distance to pre-built simple and complex centroids, and either recommends or hard-blocks mismatched model usage before the prompt ever reaches the model.
Running all prompts through Opus wastes quota on trivial requests. Running complex reasoning tasks through Sonnet produces worse results. Neither is obvious in the moment — you just type and send.
Centinel intercepts every prompt, classifies its complexity in ~10ms, compares it against your current model, and takes action when there is a mismatch.
Centinel uses all-MiniLM-L6-v2 to encode prompts into 384-dimensional vectors, then computes cosine similarity against two pre-built centroid vectors — one representing the center of mass of 72 simple prototype prompts, one for 43 complex prototypes.
your prompt
│
┌───────┴────────┐
│ all-MiniLM │ (~10ms warm)
│ 384-dim embed │
└───────┬────────┘
│
┌─────────────┼─────────────┐
▼ ▼
cosine(embed, cosine(embed,
simple_centroid) complex_centroid)
│ │
└─────────────┬─────────────┘
│
sim_complex > sim_simple?
/ \
YES NO
│ │
OPUS SONNET
When the two similarity scores are within confidence_threshold of each other, Centinel biases toward Opus — borderline prompts are treated as complex by default.
Every routing decision carries a direction that describes the relationship between your current model and the recommendation:
- step_up — your current model is weaker than the task requires. You are under-powered.
- step_down — your current model is stronger than the task needs. You are over-powered.
This distinction matters because the appropriate response differs: a step-up is a capability problem that may affect output quality, while a step-down is a cost problem.
At the start of every session, Centinel's SessionStart hook injects a set of subagent tier-selection rules directly into the session's additionalContext. Claude reads these rules once and applies them to every Agent tool call it makes during the session — no per-prompt directive needed.
The injected rules look like this:
SIMPLE tasks → subagent_type="centinel:sonnet-executor"
COMPLEX tasks → subagent_type="centinel:opus-executor"
Examples of simple tasks: file reads, grep/glob searches, small edits,
git status/log, fixing typos, renaming, formatting, quick lookups.
Examples of complex tasks: architecture decisions, multi-file refactoring,
deep analysis, writing new features, debugging complex issues, planning.
The sonnet-executor and opus-executor agent definition files carry model: sonnet and model: opus in their YAML frontmatter. Any subagent Claude spawns using those subagent_type values runs on the specified model automatically, without further instruction.
This means subagent model selection is handled at the session level, not repeated on every prompt.
Centinel supports two action modes, configured in ~/.claude/model-router.json.
warn is the default. When a mismatch is detected, Centinel writes a visible recommendation to the conversation and passes the prompt through to Claude unchanged. Claude responds, but you see the suggestion.
block is the hard enforcement mode. When a mismatch is detected, Centinel exits with code 2. Claude Code interprets this as a hook failure and stops the prompt from reaching Claude entirely. You see a message explaining the mismatch and what to do. Claude never runs.
You: Design the architecture for a distributed caching system
┌─────────────────────────────────────────────────────────────┐
│ BLOCKED — task needs OPUS │
│ Reason: complex task (sim_complex=0.561 > sim_simple=0.081)│
│ Current model is under-powered for this request. │
│ → Run: /model opus then re-send your message. │
│ → Or prefix with '~' to bypass routing. │
└─────────────────────────────────────────────────────────────┘
Block granularity is configurable independently per direction:
| Setting | Default | Effect |
|---|---|---|
block.on_step_up |
true |
Block when current model is weaker than needed |
block.on_step_down |
false |
Block when current model is stronger than needed |
With the defaults, Centinel hard-blocks under-powered prompts but only warns on over-powered ones, since a capability mismatch is more consequential than a cost mismatch.
Prefix any message with ~ to skip routing entirely for that one prompt:
~Design the architecture for a distributed caching system
Centinel strips the prefix and passes the clean prompt to Claude. No block, no warning. The model you are currently on handles the request.
This is useful when the router misclassifies a specific prompt, or when you know the task well enough to make the call yourself without switching models.
Requirements: Python 3.10+, Claude Code
git clone https://github.com/your-username/centinel
cd centinel
bash install.shRestart your Claude Code session after installation.
What the installer does:
- Installs
numpyandsentence-transformers - Builds centroid embeddings from the bundled prototype prompts (~30 seconds, one time only)
- Writes a default configuration to
~/.claude/model-router.json - Registers
UserPromptSubmitandSessionStarthooks in~/.claude/settings.json
Uninstall:
bash uninstall.shRemoves hooks from settings.json. Your configuration file and logs are not deleted.
~/.claude/model-router.json
{
"enabled": true,
"action": "warn",
"default_model": "sonnet",
"confidence_threshold": 0.02,
"tiers": {
"sonnet": "claude-sonnet-4-6",
"opus": "claude-opus-4-6"
},
"block": {
"on_step_up": true,
"on_step_down": false
},
"budget": {
"enabled": false,
"daily_opus_limit": 50,
"fallback_on_limit": "sonnet"
},
"overrides": {
"bypass_prefix": "~",
"always_opus_keywords": [],
"always_sonnet_keywords": []
},
"logging": {
"enabled": true,
"log_file": "~/.claude/model-router.log"
}
}A project-level override file at .claude/model-router.json inside any project directory merges with the global config. Project settings take precedence.
Forcing keywords to a specific tier:
"overrides": {
"always_opus_keywords": ["security audit", "production incident"],
"always_sonnet_keywords": ["quick check", "typo"]
}Any prompt containing a listed keyword bypasses the classifier entirely and routes to the specified tier at confidence 0.95.
# Classify a single prompt
centinel classify "fix the typo in README"
centinel classify "design a distributed caching system" --current-model sonnet
# View today's routing summary
centinel stats
# View daily history from the log file
centinel stats history
centinel stats history --days 7
# View the last N routing decisions
centinel stats log
centinel stats log -n 50
# Clear counters and log
centinel stats reset
# Show active configuration
centinel config
# Write default config to ~/.claude/model-router.json
centinel init
# Classify a file of prompts, one per line
centinel batch prompts.txt
echo "fix typo" | centinel batch -
# Rebuild centroids from prototype prompts
centinel build-centroidsExample classify output:
Recommended: OPUS [step_up]
Model: claude-opus-4-6
Action: block
Direction: step_up
Confidence: 0.479
Complexity: 1.000
Latency: 10ms
Reason: Complex task (sim_complex=0.561 > sim_simple=0.081)
Example history output:
=== Routing History (last 7 days) ===
Total routes: 142
Opus: 41 (28.9%)
Sonnet: 101 (71.1%)
Step-ups: 12 (under-powered detections)
Step-downs: 29 (over-powered detections)
Savings: ~71% routed to Sonnet
Date Total Opus Sonnet Up Down Opus%
---------------------------------------------------
2026-03-23 25 7 18 2 5 28%
2026-03-22 31 9 22 3 7 29%
The centroids are built from prototype prompts in claude_model_router/prototypes.py. To adapt the classifier to your own prompt patterns, edit that file and rebuild:
centinel build-centroidsThe file contains two lists — SIMPLE_PROMPTS and COMPLEX_PROMPTS — with one prompt string per entry. The more representative your prototypes, the more accurately the classifier generalizes.
centinel/
├── claude_model_router/
│ ├── classifier.py # CentroidClassifier — cosine similarity routing
│ ├── encoder.py # Lazy-loaded SentenceTransformer singleton
│ ├── prototypes.py # Seed prompts for centroid construction
│ ├── build_centroids.py # Encodes prototypes, saves .npy centroid files
│ ├── config.py # Config loading: global → project → env vars
│ ├── router.py # ModelRouter: classifier + budget + logging
│ └── cli.py # CLI entry point
├── hooks/
│ ├── session_init_hook.py # SessionStart: injects subagent tier rules
│ └── model_router_hook.py # UserPromptSubmit: warn or hard block
├── agents/
│ ├── sonnet-executor.md # Agent definition with model: sonnet
│ └── opus-executor.md # Agent definition with model: opus
├── data/
│ ├── simple_centroid.npy # Pre-built centroid, committed to repo
│ └── complex_centroid.npy
├── tests/
├── install.sh
├── uninstall.sh
└── pyproject.toml
- Python 3.10 or later
- numpy >= 1.24
- sentence-transformers >= 2.2 (downloads ~80MB model on first run)
- Claude Code with hook support
Centinel is a Claude Code model router — part of a broader class of tools for Claude Code hook development and LLM routing. Related concepts and projects:
- Claude Code router — hook-based prompt routing between Sonnet and Opus
- LLM model routing — directing prompts to the most cost-effective model tier
- Claude Code hooks —
UserPromptSubmitandSessionStartautomation - Semantic routing — embedding-based classification vs keyword/rule-based approaches
- RouteLLM — academic framework for learned routing between strong and weak LLMs
If you are looking for a Claude Sonnet vs Opus router, a Claude Code cost optimization tool, or a way to auto-route Claude prompts by complexity, Centinel addresses all three.
MIT