Commit 7061764
Weave Router (v0.27) submission (#92)
* Add Weave Router (v0.27) submission
Weave Router is a cluster-routing system over a 12-model BYOK pool spanning
Anthropic, OpenAI, Google, and OpenRouter providers. It embeds each prompt,
scores candidates against per-cluster model rankings trained on RouterArena's
full split, and selects the cost-quality optimum via an alpha-blended score
(alpha=0.40).
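A minimal sketch of what an alpha-blended cost-quality selection could look like. The commit message does not give the formula, so the blend below (alpha weighting quality against max-normalized cost), the `Candidate` type, and the model names and numbers are all assumptions for illustration, not Weave Router's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    model: str      # hypothetical model label
    quality: float  # assumed per-cluster quality estimate in [0, 1]
    cost: float     # assumed expected cost per query (arbitrary units)


def pick_model(candidates, alpha=0.40):
    """Return the candidate with the highest blended score.

    Assumed form: score = alpha * quality - (1 - alpha) * (cost / max_cost).
    The real router may weight or normalize differently.
    """
    max_cost = max(c.cost for c in candidates) or 1.0

    def score(c):
        return alpha * c.quality - (1.0 - alpha) * (c.cost / max_cost)

    return max(candidates, key=score)


# Illustrative pool; values are made up.
pool = [
    Candidate("small", quality=0.60, cost=0.001),
    Candidate("medium", quality=0.75, cost=0.004),
    Candidate("large", quality=0.90, cost=0.020),
]
best = pick_model(pool)
```

Under this assumed form, a lower alpha favors cheaper models and alpha=1.0 reduces to picking the highest-quality candidate.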
The pool is intentionally multi-provider: for example, a customer who brings
only an OpenAI key still gets a 3-tier choice.
Files added:
- router_inference/config/weave-router.json
- router_inference/predictions/weave-router.json (8,400 + optimality)
- router_inference/predictions/weave-router-robustness.json (420)
Files patched (additive only):
- universal_model_names.py: 11 entries for the 12-model pool
(gpt-4.1 + kimi-k2.5 already present upstream)
- model_cost/model_cost.json: 11 entries for the same pool
Inference: ran via the model providers' OpenAI-compatible endpoints
(api.openai.com, generativelanguage.googleapis.com, openrouter.ai).
Concurrency capped to 60 in-flight per provider.
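The per-provider cap could be enforced along these lines. This is a sketch only: `ProviderLimiter` and `fake_call` are hypothetical names, and the commit does not say how the cap was actually implemented.

```python
import asyncio
from collections import defaultdict

MAX_IN_FLIGHT = 60  # cap stated in the commit message


class ProviderLimiter:
    """Hypothetical helper: one semaphore per provider bounds in-flight calls."""

    def __init__(self, limit=MAX_IN_FLIGHT):
        self._limit = limit
        self._semaphores = {}
        self._active = defaultdict(int)
        self.peak = defaultdict(int)  # highest observed concurrency per provider

    async def run(self, provider, coro_fn, *args):
        if provider not in self._semaphores:
            self._semaphores[provider] = asyncio.Semaphore(self._limit)
        async with self._semaphores[provider]:
            self._active[provider] += 1
            self.peak[provider] = max(self.peak[provider], self._active[provider])
            try:
                return await coro_fn(*args)
            finally:
                self._active[provider] -= 1


async def fake_call(prompt):
    # Stand-in for a real API request.
    await asyncio.sleep(0.001)
    return prompt.upper()


async def demo():
    limiter = ProviderLimiter(limit=3)  # small limit so the cap is visible
    tasks = [limiter.run("openrouter", fake_call, f"p{i}") for i in range(10)]
    results = await asyncio.gather(*tasks)
    return results, limiter.peak["openrouter"]


results, peak = asyncio.run(demo())
```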
* fix: drop duplicate claude-sonnet-4-5 from model_cost.json
Upstream already has claude-sonnet-4-5 at line 54; my surgical append
re-added it. check-json hook caught the duplicate. Removing the
re-added block leaves upstream's entry intact.
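A duplicate key like this can be detected with the standard library alone. The sketch below is not the check-json hook itself; it uses `json.loads` with `object_pairs_hook`, and the sample entries and their fields are placeholders rather than the real contents of model_cost.json.

```python
import json


def find_duplicate_keys(text):
    """Return keys that appear more than once within the same JSON object."""
    duplicates = []

    def hook(pairs):
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=hook)
    return duplicates


# Placeholder structure and values; only the repeated key matters here.
sample = (
    '{"gpt-4.1": {"placeholder": 1},'
    ' "claude-sonnet-4-5": {"placeholder": 2},'
    ' "claude-sonnet-4-5": {"placeholder": 2}}'
)
dupes = find_duplicate_keys(sample)
```

Plain `json.loads` silently keeps the last value for a repeated key, which is why a dedicated check is needed.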
* fix: align prompts with RouterArena's prompt_formatted + flip silent-success rows
Two validator failures from the /evaluate run:
1. 559 rows had generated_answer="" but success=true. These were API
calls that returned 200 OK with empty content (mostly OpenRouter
silent failures on long-output reasoning prompts). Flipped success
to false; they grade as 0 (no answer).
2. ~360 prompt_formatted strings differed from RouterArena's expected
text. Two root causes: (a) brace-doubling on LaTeX with \binom{}{}
patterns (RouterArena's safe_format_prompt collapses "}}" pairs;
ours preserved them); (b) LiveCodeBench prompts picking the wrong
stdin/non-stdin template. Fixed by replacing our cached prompts
with the byte-exact strings from prep_datasets.py's router_data.json
and router_robustness.json.
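Both fixes above amount to small passes over the prediction rows. A sketch, assuming each row is a dict with the `generated_answer`, `success`, and `prompt_formatted` fields named in this message; the `id` key and the `reference` mapping are hypothetical stand-ins for however rows are matched to the reference strings from router_data.json.

```python
def flip_silent_successes(rows):
    """Mark rows with an empty answer as failed; return how many were flipped."""
    flipped = 0
    for row in rows:
        if row.get("success") is True and row.get("generated_answer") == "":
            row["success"] = False
            flipped += 1
    return flipped


def realign_prompts(rows, reference):
    """Overwrite prompt_formatted with the reference string when they differ.

    `reference` maps a row id to the expected prompt (hypothetical layout).
    Returns the ids of the rows that were changed.
    """
    changed = []
    for row in rows:
        expected = reference[row["id"]]
        if row["prompt_formatted"] != expected:
            row["prompt_formatted"] = expected
            changed.append(row["id"])
    return changed


# Made-up rows for illustration.
rows = [
    {"id": "q1", "prompt_formatted": "A {{x}}", "generated_answer": "", "success": True},
    {"id": "q2", "prompt_formatted": "B", "generated_answer": "42", "success": True},
]
reference = {"q1": "A {x}", "q2": "B"}
flipped = flip_silent_successes(rows)
changed = realign_prompts(rows, reference)
```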
Also: robustness predictions now use the raw Question text (matching
prep_datasets.py:30) instead of our locally-formatted prompts.
check_config_prediction_files.py weave-router full --check-generated-result
now passes locally.
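The brace-doubling root cause in this fix can be reproduced with plain `str.format` semantics. This is an illustration of the general mechanism only: `escape_braces`, `build_prompt`, and the template are hypothetical, and RouterArena's `safe_format_prompt` may work differently.

```python
def escape_braces(text):
    """Double braces so the text survives a later str.format call."""
    return text.replace("{", "{{").replace("}", "}}")


def build_prompt(template, question, collapse):
    """Insert an escaped question into the template.

    With collapse=True a final str.format() pass turns "{{" and "}}" back
    into single braces; with collapse=False the doubled braces remain.
    """
    merged = template.replace("{question}", escape_braces(question))
    return merged.format() if collapse else merged


template = "Question: {question}"  # hypothetical template
question = r"Compute \binom{5}{2}."

collapsed = build_prompt(template, question, collapse=True)
preserved = build_prompt(template, question, collapse=False)
```

Here `collapsed` keeps the LaTeX as written, while `preserved` contains `\binom{{5}}{{2}}`, so a byte-exact comparison of the two prompts fails.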
* Fix a typo after merge conflict
---------
Co-authored-by: Yifan Lu <111810457+yl231@users.noreply.github.com>
1 parent 812282b commit 7061764
5 files changed
Lines changed: 314402 additions & 0 deletions
File tree
- model_cost
- router_inference
- config
- predictions