220 changes: 220 additions & 0 deletions MODEL_CARD.md
# Model Card — `wide_deep_asym_v3` (Nadir pre-generation classifier)

This card documents the architecture, training corpus, contamination
posture, and published benchmark numbers for the pre-generation tier
classifier that powers Nadir's routing decisions.

NadirClaw, the open-source router in this repo, ships the architecture
description, **the bundled trained weights**, the heuristics built on top
of them, and the cascade rule engine (see `nadirclaw/cascade.py`,
`nadirclaw/cascade_rules/`, `nadirclaw/heuristic_verifier.py`,
`nadirclaw/wide_deep_classifier.py`). The `wide_deep_asym_v3.pt`
checkpoint (and its symmetric-loss companion `wide_deep_sym_v3.pt`,
~900 KB each) lives under `nadirclaw/models/` and is loaded
automatically by `nadirclaw.wide_deep_classifier.get_wide_deep_classifier()`.
The weights and code are MIT-licensed alongside the rest of the
package. [Nadir Pro](https://getnadir.com) layers a hosted dashboard,
team billing, the trained DeBERTa-v3-small cascade verifier, and
closed-loop retraining over the same classifier.

- **Router name**: `nadir`
- **Classifier family**: wide-and-deep asymmetric (`wide_deep_asym`)
- **Production artifact**: `wide_deep_asym_v3.pt` (bundled in NadirClaw + used in Nadir Pro)
- **Companion artifact**: `wide_deep_sym_v3.pt` (symmetric-loss variant that fixes the asym head's simple-class collapse under argmax decoding)
- **Card last updated**: 2026-05-27
- **Schema version**: 1

---

## 1. Architecture

`wide_deep_asym_v3` is a wide-and-deep classifier trained on prompt
features to predict a routing tier in `{simple, medium, complex}`.

- **Wide branch** — structural and lexical features (length buckets,
code-fence indicators, math symbol density, JSON shape hints,
question-word counts).
- **Deep branch** — BGE sentence-transformer embedding (`bge-small-en`).
- **Head** — three-way softmax over `{simple, medium, complex}`.
- **Loss** — asymmetric cross-entropy with downgrade penalty
`λ = 3` in v3. Downgrades (predicting a cheaper tier than the correct
one, e.g. `simple` when `complex` was correct) are penalised 3× more
than upgrades.
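The card does not give the exact loss formula. The sketch below shows one plausible way to make downgrades cost λ times more than upgrades. It adds an expected misrouting penalty to plain cross-entropy. Probability mass on a tier cheaper than the label is weighted by `downgrade_lambda`, and mass on a more capable tier is weighted by 1. The function names and the penalty form are assumptions, not the Nadir training code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_loss(logits, true_idx, downgrade_lambda=3.0):
    """Cross-entropy plus an expected misrouting penalty (assumed form).

    Tiers are indexed 0=simple, 1=medium, 2=complex. Probability mass on
    tiers cheaper than the label (downgrades) is weighted by
    `downgrade_lambda`; mass on more capable tiers (upgrades) by 1.
    """
    probs = softmax(logits)
    ce = -math.log(probs[true_idx])
    penalty = 0.0
    for k, p in enumerate(probs):
        if k < true_idx:      # downgrade: cheaper tier than needed
            penalty += downgrade_lambda * p
        elif k > true_idx:    # upgrade: more capable tier than needed
            penalty += p
    return ce + penalty


# Mirror-image mistakes: identical cross-entropy, different penalty.
down = asymmetric_loss([2.0, 0.0, 0.0], true_idx=2)  # leans simple, label complex
up = asymmetric_loss([0.0, 0.0, 2.0], true_idx=0)    # leans complex, label simple
print(f"downgrade loss {down:.3f} vs upgrade loss {up:.3f}")
```

With `downgrade_lambda=1.0` the two mirror-image cases score the same. Raising it is what tilts training toward upgrades.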

Tier mapping for the reference deployment (Anthropic Claude 4.x ladder):

| Tier | Model |
| --- | --- |
| simple | `claude-haiku-4-5` |
| medium | `claude-sonnet-4-6` |
| complex | `claude-opus-4-6` |

### Inputs / outputs

**Input**: a single user message (string). Multi-turn messages are
concatenated by the production analyzer before classification.

**Output**:

- `tier` in `{simple, medium, complex}`
- `model` (the corresponding tier model name)
- `complexity_score` in `[0, 1]`
- `classifier_confidence` in `[0, 1]` (softmax top-class probability)
- `latency_ms` (single-core CPU)
- `classifier_version` (`wide_deep_asym_v3`)
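The output contract above can be pictured as a small record. This is a hypothetical sketch with several assumptions:

- The `RoutingResult` and `build_result` names are illustrative.
- It uses plain argmax decoding over the three tier probabilities.
- `complexity_score` is passed in, because the card does not say how it is derived.

The object actually returned by `nadirclaw.wide_deep_classifier` may use different attribute names.

```python
from dataclasses import dataclass

# Tier-to-model mapping from the reference deployment table above.
TIER_MODELS = {
    "simple": "claude-haiku-4-5",
    "medium": "claude-sonnet-4-6",
    "complex": "claude-opus-4-6",
}


@dataclass(frozen=True)
class RoutingResult:  # hypothetical container, not the package's class
    tier: str
    model: str
    complexity_score: float
    classifier_confidence: float
    latency_ms: float
    classifier_version: str = "wide_deep_asym_v3"


def build_result(probs, complexity_score, latency_ms):
    """Assemble the documented output fields from tier probabilities."""
    if not 0.0 <= complexity_score <= 1.0:
        raise ValueError("complexity_score must lie in [0, 1]")
    tier = max(probs, key=probs.get)  # argmax decoding (assumption)
    return RoutingResult(
        tier=tier,
        model=TIER_MODELS[tier],
        complexity_score=complexity_score,
        classifier_confidence=probs[tier],  # softmax top-class probability
        latency_ms=latency_ms,
    )


result = build_result({"simple": 0.15, "medium": 0.7, "complex": 0.15},
                      complexity_score=0.55, latency_ms=38.0)
print(result.tier, result.model, result.classifier_confidence)
```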

---

## 2. Training data

Training is **deliberately disjoint** from RouterBench and RouterArena.

Sources for `wide_deep_asym_v3`:

- Internal Nadir labeled batches (`backend/labeled_data/v3/...`, not
part of this open-source repo).
- Prior labeled batches under `v2/`, `raw/`, `batches/`.

The verifier corpus used to train the post-generation cascade verifier
is stored separately and was not used to train the pre-generation
classifier.

### Contamination audit status

| Held-out set | Audit run | Overlap | Verdict |
| --- | --- | --- | --- |
| RouterBench `0shot` | 2026-05-24 | 0 of 36,481 | DISJOINT |
| RouterArena `sub_10` | 2026-05-27 | 0 of 809 | DISJOINT |
| RouterArena `full` | 2026-05-27 | 0 of 8,399 | DISJOINT |

Audits are reproducible from this repo with the script in
`verifier/contamination_audit.py`. Hash recipe:
`sha256(NFC(prompt).strip().casefold().utf8)`.
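The hash recipe maps directly onto Python's standard library. The sketch below is not `verifier/contamination_audit.py`; it is a minimal illustration with two assumptions. Digests are hex-encoded. A held-out prompt counts as overlapping when its normalised hash also appears among the training hashes.

```python
import hashlib
import unicodedata


def prompt_hash(prompt: str) -> str:
    """sha256(NFC(prompt).strip().casefold().utf8), hex-encoded (assumed)."""
    normalized = unicodedata.normalize("NFC", prompt).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_overlap(train_prompts, heldout_prompts) -> int:
    """Number of held-out prompts whose hash also occurs in the training set."""
    train_hashes = {prompt_hash(p) for p in train_prompts}
    return sum(1 for p in heldout_prompts if prompt_hash(p) in train_hashes)


train = ["What is 2+2?", "Summarise this email."]
heldout = ["what is 2+2? ", "Name a prime number."]
print(f"{count_overlap(train, heldout)} of {len(heldout)} overlap")
```

Casefolding plus NFC means that case changes, composed vs. decomposed accents (`é` vs `e` + combining acute), and surrounding whitespace do not hide an exact duplicate. Paraphrases are not caught by this recipe.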

---

## 3. Performance

### Pre-generation classifier (this card)

- **Held-out RouterBench** (n=11,420 prompts):
  - AUROC **0.961** on the binary "should escalate?" decision, measured
    with the classifier composed with the post-generation verifier.
  - Expected Calibration Error (ECE) **0.016** at the production
    operating point.
- **RouterArena `sub_10`** (n=809, public leaderboard):
  - Composite score **0.7118**, currently projected #5 on the public
    leaderboard (ahead of NotDiamond, Auto Router, Martian).
  - RouterArena submission PR:
    https://github.com/RouteWorks/RouterArena/pull/112
- **Pre-generation, prompt-only** (no verifier): AUROC ~0.62 on
  RouterBench cross-family triples. This pre-generation ceiling is the
  architectural reason Nadir layers a post-generation cascade verifier
  on top.

### Cascade verifier (separately published)

The post-generation cross-encoder verifier is shipped in **Nadir Pro**
as DeBERTa-v3-small INT8 quantized. NadirClaw ships a rule-based
heuristic verifier with the same interface (see
`nadirclaw/heuristic_verifier.py`); the heuristic version reaches ~0.60
AUROC on the same held-out triples but catches the bulk of refusals,
truncations, and JSON-format failures.
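Here is a toy heuristic verifier and verifier-gated cascade in the same spirit. Everything in it is hypothetical:

- the refusal regex;
- the score deductions;
- the `finish_reason` convention;
- the `cheap_model` / `strong_model` callables.

None of these are taken from `nadirclaw/heuristic_verifier.py` or `nadirclaw/cascade.py`. The sketch only shows the control flow: score the cheap answer, serve it if the score is at least τ, otherwise escalate.

```python
import json
import re

# Hypothetical refusal markers; the real heuristic verifier's rules may differ.
REFUSAL_RE = re.compile(
    r"\b(?:i can(?:'|no)t help|i(?:'m| am) unable to|as an ai)\b",
    re.IGNORECASE,
)


def heuristic_score(response, expect_json=False, finish_reason="stop"):
    """Toy verifier score in [0, 1]; lower means 'escalate'."""
    if not response.strip():
        return 0.0                      # empty answer: always escalate
    score = 1.0
    if REFUSAL_RE.search(response):
        score -= 0.6                    # looks like a refusal
    if finish_reason == "length":
        score -= 0.4                    # truncated by the token limit
    if expect_json:
        try:
            json.loads(response)
        except json.JSONDecodeError:
            score -= 0.5                # JSON was requested but not returned
    return max(score, 0.0)


def cascade(prompt, cheap_model, strong_model, tau=0.80, expect_json=False):
    """Serve the cheap answer if it scores >= tau, otherwise escalate.

    Each model is a callable returning (text, finish_reason).
    """
    text, finish = cheap_model(prompt)
    if heuristic_score(text, expect_json, finish) >= tau:
        return text, "cheap"
    text, _ = strong_model(prompt)
    return text, "escalated"
```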

Composed-system numbers (classifier + Pro verifier) on RouterBench:

- AUROC 0.961, ECE 0.016.
- At τ=0.80: **98.3% of always-Opus quality preserved** (catastrophic
  ≤ 1.7%), composed cost ~60% lower than always-Opus.
- Verifier latency 192.9 ms per call, single-core CPU, INT8 qnnpack.

τ-sweep (from the same held-out report):

| τ | accept rate | catastrophic | wasted escalation | quality preserved |
| --- | --- | --- | --- | --- |
| 0.70 | 0.69 | 0.024 | 0.078 | 97.6% |
| 0.75 | 0.67 | 0.019 | 0.089 | 98.1% |
| **0.80** | **0.67** | **0.017** | **0.092** | **98.3%** |
| 0.90 | 0.64 | 0.011 | 0.108 | 98.9% |

τ=0.80 is the production operating point. NadirClaw's cascade defaults
to τ=0.80 in `DEFAULT_ACCEPTANCE_THRESHOLD`.
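A τ-sweep like the one above can be recomputed from per-prompt verifier scores and ground-truth labels. The card does not define its columns, so this sketch assumes the following definitions, all as fractions of all prompts:

- **Accept**: verifier score ≥ τ.
- **Catastrophic**: the cheap answer was accepted but was not acceptable.
- **Wasted escalation**: the prompt was escalated although the cheap answer was fine.
- **Quality preserved**: 1 - catastrophic. This matches every row of the published table.

Function and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SweepRow:
    tau: float
    accept_rate: float
    catastrophic: float
    wasted_escalation: float
    quality_preserved: float


def sweep(scores, cheap_ok, taus):
    """Per-threshold cascade metrics under the assumed definitions.

    scores   -- verifier score in [0, 1] for each cheap-model response
    cheap_ok -- True if that cheap response was acceptable (ground truth)
    """
    n = len(scores)
    rows = []
    for tau in taus:
        accepted = [s >= tau for s in scores]
        acc = sum(accepted) / n
        cat = sum(a and not ok for a, ok in zip(accepted, cheap_ok)) / n
        wasted = sum((not a) and ok for a, ok in zip(accepted, cheap_ok)) / n
        rows.append(SweepRow(tau, acc, cat, wasted, 1.0 - cat))
    return rows


for row in sweep([0.95, 0.85, 0.72, 0.60], [True, False, True, False],
                 [0.70, 0.80, 0.90]):
    print(row)
```

Raising τ trades catastrophic accepts for wasted escalations, which is the same direction the published table moves.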

---

## 4. Intended use

- Pre-generation tier selection for LLM routing on the Claude 4.x
ladder, or any three-model ladder mapped to the same tiers.
- Public-benchmark evaluation (RouterBench, RouterArena).

### Out of scope

- Not a quality verifier on its own — the post-generation cascade
verifier closes the pre-generation gap.
- Not a guarantee of model output correctness. The router's job is to
pick a model.
- Not validated on languages other than English at the published
thresholds.

---

## 5. Limitations

1. **Pre-generation ceiling.** Prompt-only classification has bounded
AUROC on cross-family distributions (~0.62 on RouterBench). The
router cannot know whether Haiku will get the answer right; it can
only know whether Haiku *usually* gets that *kind* of prompt right.
The post-generation cascade verifier is the architectural answer.
2. **Per-domain variance.** Verifier AUROC ranges from ~1.0 on
factual-recall (MMLU-style) prompts down to ~0.65 on code
generation and ~0.77 on long-form summarisation. The default
`cascade_rules` profile encodes those weak-verifier domains as
force-escalate / set-threshold rules so the cascade does not rely
on the verifier where it is known to be unreliable.
3. **Training data is not adversarial.** The classifier has not been
stress-tested against prompt-injection-style inputs designed to
force a particular tier.
4. **Asymmetric loss at λ=3.** The router prefers upgrades over
downgrades, which inflates the wasted-escalation rate on
pure-cheap prompts. This is intentional: catastrophic downgrade is
more expensive in customer trust than wasted Sonnet calls.
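Item 2's force-escalate and set-threshold rules can be pictured as a small per-prompt override pass that runs before the cascade. The rule list below is a hypothetical in-memory stand-in for a YAML profile. Several things in it are assumptions:

- the real schema in `nadirclaw/cascade_rules/profiles/`;
- the patterns;
- the precedence: the first matching force rule wins, and threshold rules only ever raise τ.

```python
import re
from dataclasses import dataclass

# Hypothetical rules; the bundled YAML profiles may use a different schema.
RULES = [
    {"pattern": r"```|\bdef |\bclass |\bfunction\b", "action": "force_escalate"},
    {"pattern": r"\bsummari[sz]e\b", "action": "set_threshold", "value": 0.90},
    {"pattern": r"^\s*(?:hi|hello|thanks)\b", "action": "force_cheap"},
]


@dataclass
class Decision:
    mode: str          # "cascade", "escalate", or "cheap"
    threshold: float   # tau used when mode == "cascade"


def apply_rules(prompt: str, default_tau: float = 0.80) -> Decision:
    """Evaluate rules in order; force_* rules short-circuit the cascade."""
    tau = default_tau
    for rule in RULES:
        if not re.search(rule["pattern"], prompt, re.IGNORECASE):
            continue
        if rule["action"] == "force_escalate":
            return Decision("escalate", tau)
        if rule["action"] == "force_cheap":
            return Decision("cheap", tau)
        if rule["action"] == "set_threshold":
            tau = max(tau, rule["value"])  # only ever raise the bar
    return Decision("cascade", tau)


print(apply_rules("Please summarize this article"))
```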

---

## 6. NadirClaw vs Nadir Pro on this card

| Component | NadirClaw (OSS) | Nadir Pro |
| --- | --- | --- |
| Pre-generation classifier | Binary centroid (~10 ms), DistilBERT (opt-in), **or** bundled `wide_deep_asym_v3` (~40 ms CPU) | `wide_deep_asym_v3` with closed-loop retraining + provider-health-aware ranking |
| Post-generation verifier | Rule-based heuristic, ~1 ms | DeBERTa-v3-small INT8, ~193 ms, AUROC 0.96 |
| Cascade rule engine | Same engine; `default.yaml` + `multi_provider.yaml` profiles bundled | Same engine, same profiles, plus per-tenant overrides |
| Default τ | 0.80 | 0.80 (env override `CASCADE_DEFAULT_THRESHOLD`) |
| Contamination audit utility | `verifier/contamination_audit.py` | Same script, plus internal corpus loader |

To use the bundled trained classifier directly, import it from
`nadirclaw.wide_deep_classifier`:

```python
from nadirclaw.wide_deep_classifier import get_wide_deep_classifier

clf = get_wide_deep_classifier(
    checkpoint_variant="asym",       # or "symmetric"
    decision_rule="cost_sensitive",  # pair asym with cost-sensitive
    cost_lambda=20.0,                # max-safe, ~47% cost vs always-Opus
)
result = clf.classify("Your prompt here")
print(result.tier, result.confidence, result.probabilities)
```
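The card does not define what `decision_rule="cost_sensitive"` and `cost_lambda` compute. One common pattern they may resemble is minimum-expected-cost decoding. Instead of taking the argmax tier, it picks the tier whose serving cost plus λ times the expected downgrade penalty is lowest. The sketch below illustrates that idea only. `tier_cost`, the linear downgrade penalty, and the function name are hypothetical, not the package's implementation.

```python
TIERS = ("simple", "medium", "complex")  # cheapest to most capable


def min_expected_cost_tier(probs, cost_lambda=20.0, tier_cost=(1.0, 3.0, 15.0)):
    """Pick the tier with the lowest expected cost instead of the argmax tier.

    probs       -- classifier probabilities for (simple, medium, complex)
    cost_lambda -- weight on the expected downgrade penalty
    tier_cost   -- hypothetical relative serving cost of each tier
    """
    best_tier, best_cost = None, float("inf")
    for k, serve_cost in enumerate(tier_cost):
        # Expected downgrade severity if routed to tier k: probability that the
        # true tier is j > k, weighted by how many tiers the route fell short.
        downgrade = sum(p * (j - k) for j, p in enumerate(probs) if j > k)
        expected = serve_cost + cost_lambda * downgrade
        if expected < best_cost:  # strict '<' keeps the cheaper tier on ties
            best_tier, best_cost = TIERS[k], expected
    return best_tier


probs = (0.6, 0.3, 0.1)  # argmax would pick "simple"
print(min_expected_cost_tier(probs))                   # medium
print(min_expected_cost_tier(probs, cost_lambda=0.0))  # simple
```

With `cost_lambda=0` this rule ignores downgrade risk and always returns the cheapest tier. As λ grows, it moves to more capable tiers whenever meaningful probability sits above the cheapest one.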

For closed-loop retraining on your own workload, the path is: log
decisions and outcomes from NadirClaw, then re-train a copy of the
wide-and-deep head with the same architecture. Nadir Pro automates
this loop for hosted customers.

---

## 7. Contact

- Project: https://getnadir.com
- GitHub: https://github.com/NadirRouter/NadirClaw
- Email: hello@getnadir.dev
54 changes: 53 additions & 1 deletion README.md

> **Your keys. Your models. No middleman.** NadirClaw runs locally and routes directly to providers. No third-party proxy, no subsidized tokens, no platform that can pull the plug on you. [Why this matters.](docs/vs-clawrouter.md)

## Benchmarks

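NadirClaw and Nadir Pro share the same routing architecture. The numbers
below are from the trained `wide_deep_asym_v3` classifier plus the
DeBERTa verifier in Nadir Pro. NadirClaw's default classifier is a
simpler binary centroid that trades some accuracy for zero training
cost; NadirClaw also bundles the same trained `wide_deep_asym_v3`
checkpoint, but pairs it with a rule-based heuristic verifier rather
than the DeBERTa verifier. Both run the same cascade rule engine
(`nadirclaw/cascade_rules/`).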

### RouterBench (held-out, n=11,420)

The composed system (pre-generation classifier + post-generation
cascade verifier, τ=0.80):

| Metric | Value |
| --- | ---: |
| AUROC | **0.961** |
| Expected Calibration Error (ECE) | **0.016** |
| Quality preserved vs always-Opus | **98.3%** |
| Catastrophic-downgrade rate | 1.7% |
| Composed cost vs always-Opus | -60% |

The full τ-sweep and per-domain breakdown are in [`MODEL_CARD.md`](MODEL_CARD.md).

### RouterArena (sub_10, n=809, public leaderboard)

| Metric | Value |
| --- | ---: |
| Composite score | **0.7118** |
| Projected leaderboard rank | **#5** |
| Routers below (selected) | NotDiamond-0001, Auto Router, Martian |

RouterArena submission PR (live):
[RouteWorks/RouterArena#112](https://github.com/RouteWorks/RouterArena/pull/112).

### Contamination audit

Zero overlap between Nadir's training corpus and any of the held-out sets:

| Held-out set | Audit run | Overlap |
| --- | --- | --- |
| RouterBench `0shot` | 2026-05-24 | 0 of 36,481 |
| RouterArena `sub_10` | 2026-05-27 | 0 of 809 |
| RouterArena `full` | 2026-05-27 | 0 of 8,399 |

The audit is reproducible from this repo:
[`verifier/contamination_audit.py`](verifier/contamination_audit.py).
Hash recipe: `sha256(NFC(prompt).strip().casefold().utf8)`.

## Quick Start

```bash
```
|---|---|---|
| **License** | MIT | Proprietary |
| **Deploy** | Self-hosted, localhost | `api.getnadir.com` or self-host via Docker |
| **Pre-generation classifier** | Binary centroid (~10ms), opt-in DistilBERT, or **bundled** `wide_deep_asym_v3` trained checkpoint (~40ms CPU; see [`MODEL_CARD.md`](MODEL_CARD.md)) | Same trained classifier + closed-loop retraining, provider-health-aware ranking |
| **Post-generation verifier** | Rule-based heuristic (refusal / length / JSON checks, ~1ms) | Trained DeBERTa-v3-small cross-encoder, AUROC 0.96 on RouterBench held-out |
| **Verifier-gated cascade** | Yes (heuristic verifier) | Yes (trained verifier) |
| **Storage** | Local JSONL + SQLite | Postgres (Supabase), multi-tenant |
| **Dashboard** | Terminal + local web | Hosted web dashboard, per-team analytics |
| **Cost tracking** | `nadirclaw savings` CLI | Live dashboard, monthly invoices, projected savings |
- **Smart routing** — classifies prompts in ~10ms using sentence embeddings
- **Pluggable classifier** — `binary` (default, ~10ms centroid classifier) or `distilbert` (3-class fine-tuned DistilBERT that natively predicts simple/mid/complex). Select with `NADIRCLAW_COMPLEXITY_ANALYZER`
- **Three-tier routing** — simple / mid / complex tiers with configurable score thresholds (`NADIRCLAW_TIER_THRESHOLDS`); set `NADIRCLAW_MID_MODEL` for a cost-effective middle tier
- **Verifier-gated cascade** — cheap model first, score the response with a rule-based heuristic verifier (refusals, truncations, JSON-format failures, ~1ms), escalate to the expensive tier when the score falls below τ=0.80. Same architecture as Nadir Pro, swap the verifier for the trained DeBERTa cross-encoder. See `nadirclaw/cascade.py`.
- **Cascade rule engine** — declarative YAML rules drive per-prompt overrides: `force_escalate` on patterns where the verifier is unreliable (code, summarisation), `set_threshold` to raise the verifier bar on borderline domains, `force_cheap` for trivially-easy patterns, `set_max_tokens` for length budgeting. Hot-reload from disk; profiles live in `nadirclaw/cascade_rules/profiles/`.
- **Agentic task detection** — auto-detects tool use, multi-step loops, and agent system prompts; forces complex model for agentic requests
- **Reasoning detection** — identifies prompts needing chain-of-thought and routes to reasoning-optimized models
- **Vision routing** — auto-detects image content in messages and routes to vision-capable models (GPT-4o, Claude, Gemini)