61 changes: 61 additions & 0 deletions README.md
@@ -446,6 +446,67 @@ N-tier dispatch with the bundled 3-tier profile, or write your own.
Schema reference: `nadirclaw/tier_config/schema.py`. Sample profiles:
`nadirclaw/tier_config/profiles/`.

### Trained verifier (NadirRouter/cascade-verifier-v1)

The default `n2_default` profile escalates via the rule-based
`HeuristicVerifier` shipped in this repo. It needs no extra
dependencies, runs in under 1 ms per call, and catches the obvious
failure modes (refusals, truncation, JSON parse failure). For the
subtler "looks right but is factually wrong" tail, NadirClaw v0.19
ships an opt-in trained cross-encoder verifier.
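
To make the three failure modes concrete, here is a minimal sketch of what rule-based checks of this kind could look like. It is not the repo's `HeuristicVerifier`: the function name `heuristic_score`, the `REFUSAL_MARKERS` list, and the punctuation-based truncation test are all illustrative assumptions.

```python
import json

# Hypothetical refusal phrases; the real HeuristicVerifier's rules may differ.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to", "as an ai")

# Characters that plausibly end a complete answer (assumption for this sketch).
TERMINAL_CHARS = ".!?\"')]}`"


def heuristic_score(answer: str, expect_json: bool = False) -> float:
    """Return 0.0 if an obvious failure mode is detected, else 1.0."""
    text = answer.strip()
    if not text:
        return 0.0

    # Refusal check: look for stock refusal phrases.
    lowered = text.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return 0.0

    if expect_json:
        # JSON parse failure check.
        try:
            json.loads(text)
        except json.JSONDecodeError:
            return 0.0
    elif text[-1] not in TERMINAL_CHARS:
        # Crude truncation signal: prose that stops without closing punctuation.
        return 0.0

    return 1.0
```

Checks like these are cheap string operations, which is consistent with the sub-millisecond cost described above, but they cannot judge factual correctness.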

This is the frozen DeBERTa-v3-small snapshot used in the
[RouterArena PR #112](https://github.com/RouteWorks/RouterArena/pull/112)
submission (arena_F 0.7358). It is released under MIT as
[`NadirRouter/cascade-verifier-v1`](https://huggingface.co/NadirRouter/cascade-verifier-v1)
on HuggingFace so the RouterArena number is reproducible end-to-end
with the open-source router.

**Install with the optional extras:**

```bash
# Quote the requirement so shells such as zsh do not glob the brackets.
pip install "nadirclaw[trained]"
```

This pulls in `transformers>=4.40` and `torch>=2.0`. Users who do not
want the transformer stack pay nothing — the heuristic remains the
default.
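
The "pay nothing" property comes from deferring the heavy imports until the trained path is actually requested. A minimal sketch of that pattern follows, assuming a hypothetical helper `load_optional` (not part of NadirClaw) that imports a module on demand and points at the pip extra when it is missing.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, extra: str) -> ModuleType:
    """Import `module_name` on demand; hint at the pip extra if it is absent.

    `load_optional` is a hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f'run: pip install "nadirclaw[{extra}]"'
        ) from exc
```

Because the import happens inside the function, a heuristic-only install never touches `transformers` or `torch` at module load time.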

**Activate the trained verifier:**

```bash
export NADIRCLAW_TIERS_PROFILE=n2_trained
```

The `n2_trained` profile uses the same N=2 cascade ladder as
`n2_default` but routes verifier decisions through the trained
DeBERTa-v3-small cross-encoder. Weights load lazily on first cascade
call (~500 MB checkpoint, ~10 s download into the HF cache; subsequent
runs hit the cache).
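
Lazy loading of this kind can be sketched as a thin wrapper that calls a loader only on first use. The class `LazyVerifier` and its `loader` argument are assumptions made for illustration; the actual `TrainedVerifier` may structure this differently.

```python
from typing import Callable, Optional

ScoreFn = Callable[[str, str], float]


class LazyVerifier:
    """Defers an expensive model load until the first score() call.

    Hypothetical sketch; not the NadirClaw implementation.
    """

    def __init__(self, loader: Callable[[], ScoreFn]) -> None:
        self._loader = loader
        self._model: Optional[ScoreFn] = None
        self.load_count = 0  # exposed so callers can confirm a single load

    def score(self, prompt: str, answer: str) -> float:
        if self._model is None:
            # First call: pay the download/initialisation cost once.
            self._model = self._loader()
            self.load_count += 1
        return self._model(prompt, answer)
```

Constructing the wrapper is free; only the first `score()` call triggers the loader, and later calls reuse the loaded model.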

**Direct API:**

```python
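from nadirclaw.trained_verifier import TrainedVerifier

# Example inputs: the user prompt and the cheap tier's candidate answer.
prompt = "What is the capital of France?"
cheap_answer = "Paris."

verifier = TrainedVerifier(threshold=0.80)
result = verifier.score(prompt, cheap_answer)
print(result.score, result.accepted)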
```
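
For context, here is a rough sketch of how a cascade might consume such a score: try the cheap tier, accept its answer if the verifier score clears the threshold, otherwise escalate to the adjacent tier. The function `run_cascade`, its arguments, and the `>=` acceptance comparison are assumptions for illustration, not the logic in `nadirclaw/cascade.py`.

```python
from typing import Callable, Sequence, Tuple

TierCaller = Callable[[str], str]
ScoreFn = Callable[[str, str], float]


def run_cascade(
    prompt: str,
    tiers: Sequence[TierCaller],
    score_fn: ScoreFn,
    threshold: float = 0.80,
    max_escalations: int = 1,
) -> Tuple[int, str]:
    """Return (tier_index, answer) for the first accepted answer.

    Hypothetical sketch of adjacent escalation; the final reachable tier's
    answer is returned unverified because there is nowhere left to escalate.
    """
    if not tiers:
        raise ValueError("need at least one tier")

    last_hop = min(len(tiers) - 1, max_escalations)
    for hop in range(last_hop + 1):
        answer = tiers[hop](prompt)
        if hop == last_hop or score_fn(prompt, answer) >= threshold:
            return hop, answer

    raise AssertionError("unreachable: the last hop always returns")
```

With `max_escalations=1` and two tiers this mirrors the N=2 ladder: at most one hop from the cheap tier to the strong tier.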

**What is and is not released**

| | OSS (NadirClaw v0.19) | Pro (Nadir hosted) |
| --- | --- | --- |
| Frozen verifier weights | YES (`cascade-verifier-v1`, MIT) | YES |
| Training pipeline | NO | YES (corpus + judge + curriculum) |
| Adaptive retraining loop | NO | YES |
| Custom-routed quality scoring | NO | YES |

The frozen snapshot is enough to reproduce the RouterArena result; the
adaptive retraining keeps the production verifier current as new model
families ship.

## Usage with Gemini

Gemini is the default simple model. NadirClaw calls Gemini natively via the Google GenAI SDK for best performance.
2 changes: 1 addition & 1 deletion nadirclaw/__init__.py
@@ -1,3 +1,3 @@
"""NadirClaw — Open-source LLM router."""

-__version__ = "0.18.0"
+__version__ = "0.19.0"
22 changes: 21 additions & 1 deletion nadirclaw/cascade.py
@@ -469,7 +469,27 @@ def __init__(
         self.tier_callers = dict(tier_callers)
         self.selector = TierSelector(tier_profile)
         self.threshold = float(tier_profile.cascade.acceptance_threshold)
-        self.verifier = verifier or get_heuristic_verifier(threshold=self.threshold)
+        # Pick verifier: explicit constructor arg wins; otherwise read
+        # `cascade.verifier` from the profile. "heuristic" (default)
+        # keeps the zero-dependency rule-based verifier. "trained"
+        # lazily loads the DeBERTa-v3-small cross-encoder from
+        # NadirRouter/cascade-verifier-v1 (requires the optional
+        # `nadirclaw[trained]` extras).
+        if verifier is not None:
+            self.verifier = verifier
+        elif tier_profile.cascade.verifier == "trained":
+            # Local import so the heuristic-only install path does not
+            # pull in transformers/torch at module load time.
+            from nadirclaw.trained_verifier import (  # noqa: PLC0415
+                get_trained_verifier,
+            )
+
+            self.verifier = get_trained_verifier(
+                threshold=self.threshold,
+                model_id=tier_profile.cascade.verifier_model,
+            )
+        else:
+            self.verifier = get_heuristic_verifier(threshold=self.threshold)
         self.rule_engine = rule_engine
         self._consecutive_errors: int = 0
         self._kill_switch: bool = False
58 changes: 58 additions & 0 deletions nadirclaw/tier_config/profiles/n2_trained.yaml
@@ -0,0 +1,58 @@
# N=2 default + trained verifier profile (NadirClaw, MIT).
#
# Same N=2 tier layout as n2_default, but routes verifier decisions
# through the trained DeBERTa-v3-small cross-encoder released as
# NadirRouter/cascade-verifier-v1 on HuggingFace, instead of the
# rule-based HeuristicVerifier.
#
# Use this profile to reproduce the RouterArena PR #112 result
# (arena_F 0.7358) end-to-end with the open-source router. Requires
# the `trained` extras:
#
# pip install nadirclaw[trained]
#
# Activate with:
#
# export NADIRCLAW_TIERS_PROFILE=n2_trained
#
# The heuristic verifier remains the default (n2_default) so users
# who do not want the transformer stack pay nothing for it.

version: 1
mode: tiered

selector:
  classifier: wide_deep_asym_v3
  lambda_cost: 1.0

cascade:
  escalation: adjacent
  acceptance_threshold: 0.80
  rules_profile: default
  max_escalations: 1
  # Use the trained DeBERTa-v3-small cross-encoder instead of the
  # rule-based heuristic. Loaded lazily on first cascade call.
  verifier: trained
  verifier_model: NadirRouter/cascade-verifier-v1

tiers:
  # ----- Cheap tier: workhorses for simple/mid prompts. -----
  - name: cheap
    score_min: 0.00
    model_pool:
      - gpt-4o-mini
      - qwen3-235b-a22b-2507
      - deepseek-v3.2
      - claude-3-haiku-20240307
    max_output_tokens: 2048

  # ----- Strong tier: reasoning models for the verifier-rejected tail. -----
  - name: strong
    score_min: 0.65
    model_pool:
      - gpt-5-mini
      - deepseek-reasoner
      - deepseek-v4-flash
      - grok-4-1-fast-reasoning
      - claude-sonnet-4
    max_output_tokens: 4096
14 changes: 14 additions & 0 deletions nadirclaw/tier_config/schema.py
@@ -64,6 +64,15 @@ class CascadeConfig(BaseModel):
     # Safety cap: never escalate more than this many hops, even in
     # adjacent mode. None = N-1 (walk the full ladder).
     max_escalations: Optional[int] = Field(default=None, ge=0)
+    # Which verifier the cascade should use. `heuristic` (default) uses
+    # the rule-based HeuristicVerifier shipped in this repo. `trained`
+    # loads NadirRouter/cascade-verifier-v1 from HuggingFace and
+    # requires the `trained` extras (pip install nadirclaw[trained]).
+    verifier: str = "heuristic"
+    # HuggingFace model id or local path for the trained verifier.
+    # Only consulted when verifier == "trained". Defaults to the
+    # released v1 snapshot.
+    verifier_model: str = "NadirRouter/cascade-verifier-v1"

     @model_validator(mode="after")
     def _check_mode(self) -> "CascadeConfig":
@@ -72,6 +81,11 @@ def _check_mode(self) -> "CascadeConfig":
                 f"cascade.escalation must be 'adjacent' or 'jump', "
                 f"got {self.escalation!r}. ('learned' is a Pro-only mode.)"
             )
+        if self.verifier not in ("heuristic", "trained"):
+            raise ValueError(
+                f"cascade.verifier must be 'heuristic' or 'trained', "
+                f"got {self.verifier!r}."
+            )
         return self

