NadirRouter · doramirdor · May 29, 2026 · May 27, 2026
diff --git a/MODEL_CARD.md b/MODEL_CARD.md
@@ -0,0 +1,220 @@
+# Model Card — `wide_deep_asym_v3` (Nadir pre-generation classifier)
+
+This card documents the architecture, training corpus, contamination
+posture, and published benchmark numbers for the pre-generation tier
+classifier that powers Nadir's routing decisions.
+
+NadirClaw, the open-source router in this repo, ships **the architecture
+description, the bundled trained weights, the heuristics on top of
+them**, and the cascade rule engine (see `nadirclaw/cascade.py`,
+`nadirclaw/cascade_rules/`, `nadirclaw/heuristic_verifier.py`,
+`nadirclaw/wide_deep_classifier.py`). The `wide_deep_asym_v3.pt`
+checkpoint (and its symmetric-loss companion `wide_deep_sym_v3.pt`,
+~900 KB each) lives under `nadirclaw/models/` and is loaded
+automatically by `nadirclaw.wide_deep_classifier.get_wide_deep_classifier()`.
+The weights and code are MIT-licensed alongside the rest of the
+package. [Nadir Pro](https://getnadir.com) layers a hosted dashboard,
+team billing, the trained DeBERTa-v3-small cascade verifier, and
+closed-loop retraining over the same classifier.
+
+- **Router name**: `nadir`
+- **Classifier family**: wide-and-deep asymmetric (`wide_deep_asym`)
+- **Production artifact**: `wide_deep_asym_v3.pt` (bundled in NadirClaw + used in Nadir Pro)
+- **Companion artifact**: `wide_deep_sym_v3.pt` (symmetric-loss variant that fixes the asym head's simple-class collapse under argmax decoding)
+- **Card last updated**: 2026-05-27
+- **Schema version**: 1
+
+---
+
+## 1. Architecture
+
+`wide_deep_asym_v3` is a wide-and-deep classifier trained on prompt
+features to predict a routing tier in `{simple, medium, complex}`.
+
+- **Wide branch** — structural and lexical features (length buckets,
+  code-fence indicators, math symbol density, JSON shape hints,
+  question-word counts).
+- **Deep branch** — BGE sentence-transformer embedding (`bge-small-en`).
+- **Head** — three-way softmax over `{simple, medium, complex}`.
+- **Loss** — asymmetric cross-entropy with downgrade penalty
+  `λ = 3` in v3. Downgrades (predicting `simple` when `complex` was
+  correct) are penalised 3× more than upgrades.
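
The asymmetric loss described above can be sketched in a few lines. This is a minimal illustration, not the training code: the card does not give the exact formula, so the per-example weighting below (scale the cross-entropy by λ whenever the argmax tier sits below the true tier) is an assumption, and `asymmetric_cross_entropy` is a hypothetical name.

```python
import math

TIERS = ("simple", "medium", "complex")
DOWNGRADE_PENALTY = 3.0  # lambda = 3, as stated for v3


def asymmetric_cross_entropy(probs, true_idx, lam=DOWNGRADE_PENALTY):
    """Cross-entropy, scaled by `lam` when the predicted tier is a downgrade.

    ASSUMPTION: this weighting scheme is illustrative; the card only says
    downgrades are penalised 3x more than upgrades.
    """
    predicted_idx = max(range(len(probs)), key=probs.__getitem__)
    base = -math.log(max(probs[true_idx], 1e-12))  # clamp to avoid log(0)
    weight = lam if predicted_idx < true_idx else 1.0
    return weight * base


# Same probability on the true class, opposite error directions.
downgrade = asymmetric_cross_entropy([0.7, 0.2, 0.1], TIERS.index("complex"))
upgrade = asymmetric_cross_entropy([0.1, 0.2, 0.7], TIERS.index("simple"))
print(downgrade / upgrade)
```

With equal probability mass on the true class, the downgrade case costs λ times the upgrade case, which is the behaviour the loss bullet describes.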
+
+Tier mapping for the reference deployment (Anthropic Claude 4.x ladder):
+
+| Tier | Model |
+| --- | --- |
+| simple | `claude-haiku-4-5` |
+| medium | `claude-sonnet-4-6` |
+| complex | `claude-opus-4-6` |
+
+### Inputs / outputs
+
+**Input**: a single user message (string). Multi-turn messages are
+concatenated by the production analyzer before classification.
+
+**Output**:
+
+- `tier` in `{simple, medium, complex}`
+- `model` (the corresponding tier model name)
+- `complexity_score` in `[0, 1]`
+- `classifier_confidence` in `[0, 1]` (softmax top-class probability)
+- `latency_ms` (single-core CPU)
+- `classifier_version` (`wide_deep_asym_v3`)
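
As a rough picture of the output record, the fields above map naturally onto a small dataclass. The names `RoutingDecision` and `build_decision` are hypothetical and are not taken from the NadirClaw codebase; picking the tier by argmax is also an assumption, since a cost-sensitive decision rule may choose differently.

```python
from dataclasses import dataclass

# Tier mapping for the reference deployment, copied from the table above.
TIER_TO_MODEL = {
    "simple": "claude-haiku-4-5",
    "medium": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}


@dataclass(frozen=True)
class RoutingDecision:  # hypothetical container for the listed output fields
    tier: str
    model: str
    complexity_score: float
    classifier_confidence: float
    latency_ms: float
    classifier_version: str = "wide_deep_asym_v3"


def build_decision(probabilities, complexity_score, latency_ms):
    """Assemble a decision from softmax probabilities (argmax tier assumed)."""
    tier = max(probabilities, key=probabilities.get)
    return RoutingDecision(
        tier=tier,
        model=TIER_TO_MODEL[tier],
        complexity_score=complexity_score,
        classifier_confidence=probabilities[tier],  # top-class probability
        latency_ms=latency_ms,
    )


decision = build_decision(
    {"simple": 0.1, "medium": 0.2, "complex": 0.7},
    complexity_score=0.8,
    latency_ms=40.0,
)
print(decision)
```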
+
+---
+
+## 2. Training data
+
+Training is **deliberately disjoint** from RouterBench and RouterArena.
+
+Sources for `wide_deep_asym_v3`:
+
+- Internal Nadir labeled batches (`backend/labeled_data/v3/...`, not
+  part of this open-source repo).
+- Prior labeled batches under `v2/`, `raw/`, `batches/`.
+
+The verifier corpus used to train the post-generation cascade verifier
+is stored separately and was not used to train the pre-generation
+classifier.
+
+### Contamination audit status
+
+| Held-out set | Audit run | Overlap | Verdict |
+| --- | --- | --- | --- |
+| RouterBench `0shot` | 2026-05-24 | 0 of 36,481 | DISJOINT |
+| RouterArena `sub_10` | 2026-05-27 | 0 of 809 | DISJOINT |
+| RouterArena `full` | 2026-05-27 | 0 of 8,399 | DISJOINT |
+
+Audits are reproducible from this repo with the script in
+`verifier/contamination_audit.py`. Hash recipe:
+`sha256(NFC(prompt).strip().casefold().utf8)`.
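
The hash recipe above translates directly into standard-library Python. The sketch below is not the contents of `verifier/contamination_audit.py`; `prompt_hash` and `count_overlap` are hypothetical helper names used only to show how the recipe yields an overlap count.

```python
import hashlib
import unicodedata


def prompt_hash(prompt: str) -> str:
    """sha256 over the NFC-normalised, stripped, case-folded UTF-8 prompt."""
    normalized = unicodedata.normalize("NFC", prompt).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_overlap(training_prompts, held_out_prompts) -> int:
    """Number of held-out prompts whose hash also appears in the training set."""
    training_hashes = {prompt_hash(p) for p in training_prompts}
    return sum(1 for p in held_out_prompts if prompt_hash(p) in training_hashes)


overlap = count_overlap(
    ["What is 2+2?"],
    ["  what is 2+2?  ", "Explain quicksort."],
)
print(overlap)
```

Because the recipe hashes exact (normalised) strings, it detects verbatim and case/whitespace/Unicode-form duplicates only; paraphrased prompts would not register as overlap.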
+
+---
+
+## 3. Performance
+
+### Pre-generation classifier (this card)
+
+- **Held-out RouterBench** (n=11,420 prompts):
+  - AUROC **0.961** for the binary "should escalate?" decision when
+    composed with the post-generation verifier.
+  - Expected Calibration Error (ECE) **0.016** at the production
+    operating point.
+- **RouterArena `sub_10`** (n=809, public leaderboard):
+  - Composite score **0.7118**, currently projected #5 on the public
+    leaderboard (ahead of NotDiamond, Auto Router, Martian).
+  - RouterArena submission PR:
+    https://github.com/RouteWorks/RouterArena/pull/112
+- **Pre-generation, prompt-only** (no verifier): AUROC ~0.62 on
+  RouterBench cross-family triples. The pre-generation ceiling is the
+  architectural reason Nadir layers a post-generation cascade verifier
+  on top.
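
For readers unfamiliar with the calibration metric quoted above, here is a minimal sketch of Expected Calibration Error using equal-width confidence bins. The card does not state the binning scheme or bin count behind the 0.016 figure, so `n_bins=10` and the equal-width choice are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |average confidence - accuracy| over confidence bins.

    ASSUMPTION: equal-width bins over [0, 1]; the card does not specify this.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, h in bucket if h) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Four predictions at 0.9 confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
print(round(ece, 3))
```

A low ECE means the reported confidence can be compared against a fixed threshold such as τ with some trust, which is why it is quoted alongside AUROC.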
+
+### Cascade verifier (separately published)
+
+The post-generation cross-encoder verifier is shipped in **Nadir Pro**
+as DeBERTa-v3-small INT8 quantized. NadirClaw ships a rule-based
+heuristic verifier with the same interface (see
+`nadirclaw/heuristic_verifier.py`); the heuristic version reaches ~0.60
+AUROC on the same held-out triples but catches the bulk of refusals,
+truncations, and JSON-format failures.
+
+Composed-system numbers (classifier + Pro verifier) on RouterBench:
+
+- AUROC 0.961, ECE 0.016.
+- At τ=0.80: **98.3% of always-Opus quality preserved**
+  (catastrophic-downgrade rate 1.7%), with composed cost ~60% lower
+  than always-Opus.
+- Verifier latency 192.9 ms per call, single-core CPU, INT8 qnnpack.
+
+τ-sweep (from the same held-out report):
+
+| τ | accept rate | catastrophic | wasted escalation | quality preserved |
+| --- | --- | --- | --- | --- |
+| 0.70 | 0.69 | 0.024 | 0.078 | 97.6% |
+| 0.75 | 0.67 | 0.019 | 0.089 | 98.1% |
+| **0.80** | **0.67** | **0.017** | **0.092** | **98.3%** |
+| 0.90 | 0.64 | 0.011 | 0.108 | 98.9% |
+
+τ=0.80 is the production operating point. NadirClaw's cascade defaults
+to τ=0.80 in `DEFAULT_ACCEPTANCE_THRESHOLD`.
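
The role of τ in the cascade can be sketched as follows. This is not the code in `nadirclaw/cascade.py`: `run_cascade` and its three callables are hypothetical stand-ins, and only the constant name `DEFAULT_ACCEPTANCE_THRESHOLD` and its value come from the card.

```python
from typing import Callable, Tuple

DEFAULT_ACCEPTANCE_THRESHOLD = 0.80  # production operating point (tau)


def run_cascade(
    prompt: str,
    cheap_model: Callable[[str], str],       # hypothetical: prompt -> response
    expensive_model: Callable[[str], str],   # hypothetical: prompt -> response
    verifier: Callable[[str, str], float],   # hypothetical: score in [0, 1]
    threshold: float = DEFAULT_ACCEPTANCE_THRESHOLD,
) -> Tuple[str, bool]:
    """Return (response, escalated). Accept the cheap draft if score >= tau."""
    draft = cheap_model(prompt)
    score = verifier(prompt, draft)
    if score >= threshold:
        return draft, False
    return expensive_model(prompt), True


accepted = run_cascade("q", lambda p: "cheap", lambda p: "expensive", lambda p, r: 0.9)
escalated = run_cascade("q", lambda p: "cheap", lambda p: "expensive", lambda p, r: 0.5)
print(accepted, escalated)
```

Raising τ accepts fewer cheap drafts, which matches the direction of the sweep table: the accept rate falls and wasted escalation rises as the catastrophic rate drops.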
+
+---
+
+## 4. Intended use
+
+- Pre-generation tier selection for LLM routing on the Claude 4.x
+  ladder, or any three-model ladder mapped to the same tiers.
+- Public-benchmark evaluation (RouterBench, RouterArena).
+
+### Out of scope
+
+- Not a quality verifier on its own — the post-generation cascade
+  verifier closes the pre-generation gap.
+- Not a guarantee of model output correctness. The router's job is to
+  pick a model.
+- Not validated on languages other than English at the published
+  thresholds.
+
+---
+
+## 5. Limitations
+
+1. **Pre-generation ceiling.** Prompt-only classification has bounded
+   AUROC on cross-family distributions (~0.62 on RouterBench). The
+   router cannot know whether Haiku will get the answer right; it can
+   only know whether Haiku *usually* gets that *kind* of prompt right.
+   The post-generation cascade verifier is the architectural answer.
+2. **Per-domain variance.** Verifier AUROC ranges from ~1.0 on
+   factual-recall (MMLU-style) prompts down to ~0.77 on long-form
+   summarisation and ~0.65 on code generation. The default
+   `cascade_rules` profile encodes those weak-verifier domains as
+   force-escalate / set-threshold rules so the cascade does not rely
+   on the verifier where it is known to be unreliable.
+3. **Training data is not adversarial.** The classifier has not been
+   stress-tested against prompt-injection-style inputs designed to
+   force a particular tier.
+4. **Asymmetric loss at λ=3.** The router prefers upgrades over
+   downgrades, which inflates the wasted-escalation rate on
+   pure-cheap prompts. This is intentional: catastrophic downgrade is
+   more expensive in customer trust than wasted Sonnet calls.
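
Limitation 2 mentions force-escalate and set-threshold rules for domains where the verifier is weak. The sketch below shows one way such rules could be evaluated before the verifier runs. It does not reproduce the bundled YAML schema or the rule engine: the `Rule` class, the regex patterns, and the 0.90 threshold are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:  # hypothetical in-memory form of a cascade rule
    pattern: str
    action: str  # "force_escalate" or "set_threshold"
    value: Optional[float] = None


# Illustrative rules only; patterns and values are assumptions.
RULES = [
    Rule(pattern=r"\bdef \w+\(|\bclass \w+", action="force_escalate"),
    Rule(pattern=r"\bsummari[sz]e\b", action="set_threshold", value=0.90),
]


def apply_rules(prompt: str, default_threshold: float = 0.80,
                rules=RULES) -> Tuple[bool, float]:
    """Return (force_escalate, threshold); the first matching rule wins."""
    for rule in rules:
        if not re.search(rule.pattern, prompt, flags=re.IGNORECASE):
            continue
        if rule.action == "force_escalate":
            return True, default_threshold
        if rule.action == "set_threshold" and rule.value is not None:
            return False, rule.value
    return False, default_threshold


print(apply_rules("Fix this: def parse(x): return x"))
print(apply_rules("Please summarize this article"))
```

The point of the design is that known weak-verifier domains bypass or tighten the verifier check instead of trusting a score that is unreliable there.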
+
+---
+
+## 6. NadirClaw vs Nadir Pro on this card
+
+| Component | NadirClaw (OSS) | Nadir Pro |
+| --- | --- | --- |
+| Pre-generation classifier | Binary centroid (~10 ms), DistilBERT (opt-in), **or** bundled `wide_deep_asym_v3` (~40 ms CPU) | `wide_deep_asym_v3` with closed-loop retraining + provider-health-aware ranking |
+| Post-generation verifier | Rule-based heuristic, ~1 ms | DeBERTa-v3-small INT8, ~193 ms, AUROC 0.96 |
+| Cascade rule engine | Same engine; `default.yaml` + `multi_provider.yaml` profiles bundled | Same engine, same profiles, plus per-tenant overrides |
+| Default τ | 0.80 | 0.80 (env override `CASCADE_DEFAULT_THRESHOLD`) |
+| Contamination audit utility | `verifier/contamination_audit.py` | Same script, plus internal corpus loader |
+
+To use the bundled trained classifier directly, import it from
+`nadirclaw.wide_deep_classifier`:
+
+```python
+from nadirclaw.wide_deep_classifier import get_wide_deep_classifier
+
+clf = get_wide_deep_classifier(
+    checkpoint_variant="asym",            # or "symmetric"
+    decision_rule="cost_sensitive",       # pair asym with cost-sensitive
+    cost_lambda=20.0,                     # max-safe, ~47% cost vs always-Opus
+)
+result = clf.classify("Your prompt here")
+print(result.tier, result.confidence, result.probabilities)
+```
+
+For closed-loop retraining on your own workload, the path is: log
+decisions and outcomes from NadirClaw, then re-train a copy of the
+wide-and-deep head with the same architecture. Nadir Pro automates
+this loop for hosted customers.
+
+---
+
+## 7. Contact
+
+- Project: https://getnadir.com
+- GitHub: https://github.com/NadirRouter/NadirClaw
+- Email: hello@getnadir.dev
diff --git a/README.md b/README.md
@@ -59,6 +59,54 @@ SIMPLE  "Write a docstring"         → gemini-flash    $0.0002
 
 > **Your keys. Your models. No middleman.** NadirClaw runs locally and routes directly to providers. No third-party proxy, no subsidized tokens, no platform that can pull the plug on you. [Why this matters.](docs/vs-clawrouter.md)
 
+## Benchmarks
+
+NadirClaw and Nadir Pro share the same routing architecture. The numbers
+below are from the trained classifier + DeBERTa verifier in Nadir Pro.
+By default, the NadirClaw OSS classifier is a simpler binary centroid
+that trades some accuracy for zero training cost; the trained
+`wide_deep_asym_v3` checkpoint is bundled as an option. Both run the
+same cascade rule engine (`nadirclaw/cascade_rules/`).
+
+### RouterBench (held-out, n=11,420)
+
+The composed system (pre-generation classifier + post-generation
+cascade verifier, τ=0.80):
+
+| Metric | Value |
+| --- | ---: |
+| AUROC | **0.961** |
+| Expected Calibration Error (ECE) | **0.016** |
+| Quality preserved vs always-Opus | **98.3%** |
+| Catastrophic-downgrade rate | 1.7% |
+| Composed cost vs always-Opus | -60% |
+
+The full τ-sweep and per-domain breakdown are in [`MODEL_CARD.md`](MODEL_CARD.md).
+
+### RouterArena (sub_10, n=809, public leaderboard)
+
+| Metric | Value |
+| --- | ---: |
+| Composite score | **0.7118** |
+| Projected leaderboard rank | **#5** |
+| Routers below (selected) | NotDiamond-0001, Auto Router, Martian |
+
+RouterArena submission PR (live):
+[RouteWorks/RouterArena#112](https://github.com/RouteWorks/RouterArena/pull/112).
+
+### Contamination audit
+
+Zero overlap between Nadir's training corpus and any of the three held-out sets:
+
+| Held-out set | Audit run | Overlap |
+| --- | --- | --- |
+| RouterBench `0shot` | 2026-05-24 | 0 of 36,481 |
+| RouterArena `sub_10` | 2026-05-27 | 0 of 809 |
+| RouterArena `full` | 2026-05-27 | 0 of 8,399 |
+
+The audit is reproducible from this repo:
+[`verifier/contamination_audit.py`](verifier/contamination_audit.py).
+Hash recipe: `sha256(NFC(prompt).strip().casefold().utf8)`.
+
 ## Quick Start
 
 ```bash
@@ -93,7 +141,9 @@ NadirClaw is the free, open-source core. If you are routing production traffic o
 |---|---|---|
 | **License** | MIT | Proprietary |
 | **Deploy** | Self-hosted, localhost | `api.getnadir.com` or self-host via Docker |
-| **Classifier** | Binary centroid (~10ms) or opt-in 3-class DistilBERT | Trained classifier + 3-tier routing, higher accuracy |
+| **Pre-generation classifier** | Binary centroid (~10ms), opt-in DistilBERT, or **bundled** `wide_deep_asym_v3` trained checkpoint (~40ms CPU; see [`MODEL_CARD.md`](MODEL_CARD.md)) | Same trained classifier + closed-loop retraining, provider-health-aware ranking |
+| **Post-generation verifier** | Rule-based heuristic (refusal / length / JSON checks, ~1ms) | Trained DeBERTa-v3-small cross-encoder, AUROC 0.96 on RouterBench held-out |
+| **Verifier-gated cascade** | Yes (heuristic verifier) | Yes (trained verifier) |
 | **Storage** | Local JSONL + SQLite | Postgres (Supabase), multi-tenant |
 | **Dashboard** | Terminal + local web | Hosted web dashboard, per-team analytics |
 | **Cost tracking** | `nadirclaw savings` CLI | Live dashboard, monthly invoices, projected savings |
@@ -110,6 +160,8 @@ NadirClaw is the free, open-source core. If you are routing production traffic o
 - **Smart routing** — classifies prompts in ~10ms using sentence embeddings
 - **Pluggable classifier** — `binary` (default, ~10ms centroid classifier) or `distilbert` (3-class fine-tuned DistilBERT that natively predicts simple/mid/complex). Select with `NADIRCLAW_COMPLEXITY_ANALYZER`
 - **Three-tier routing** — simple / mid / complex tiers with configurable score thresholds (`NADIRCLAW_TIER_THRESHOLDS`); set `NADIRCLAW_MID_MODEL` for a cost-effective middle tier
+- **Verifier-gated cascade** — run the cheap model first, score its response with a rule-based heuristic verifier (refusals, truncations, JSON-format failures, ~1ms), and escalate to the expensive tier when the score falls below τ=0.80. Nadir Pro uses the same architecture but swaps the heuristic verifier for the trained DeBERTa cross-encoder. See `nadirclaw/cascade.py`.
+- **Cascade rule engine** — declarative YAML rules drive per-prompt overrides: `force_escalate` on patterns where the verifier is unreliable (code, summarisation), `set_threshold` to raise the verifier bar on borderline domains, `force_cheap` for trivially-easy patterns, `set_max_tokens` for length budgeting. Hot-reload from disk; profiles live in `nadirclaw/cascade_rules/profiles/`.
 - **Agentic task detection** — auto-detects tool use, multi-step loops, and agent system prompts; forces complex model for agentic requests
 - **Reasoning detection** — identifies prompts needing chain-of-thought and routes to reasoning-optimized models
 - **Vision routing** — auto-detects image content in messages and routes to vision-capable models (GPT-4o, Claude, Gemini)