Commit 7061764

steventohme and yl231 authored
Weave Router (v0.27) submission (#92)
* Add Weave Router (v0.27) submission

  Weave Router is a cluster-routing system over a 12-model BYOK pool spanning Anthropic, OpenAI, Google, and OpenRouter providers. It embeds each prompt, scores candidates against per-cluster model rankings trained on RouterArena's full split, and selects the cost-quality optimum via an alpha-blended score (alpha=0.40). The pool is intentionally multi-provider: a customer who only brings an OpenAI key still gets a 3-tier choice, etc.

  Files added:
  - router_inference/config/weave-router.json
  - router_inference/predictions/weave-router.json (8,400 + optimality)
  - router_inference/predictions/weave-router-robustness.json (420)

  Files patched (additive only):
  - universal_model_names.py: 11 entries for the 12-model pool (gpt-4.1 + kimi-k2.5 already present upstream)
  - model_cost/model_cost.json: 11 entries for the same pool

  Inference: ran via the model providers' OpenAI-compatible endpoints (api.openai.com, generativelanguage.googleapis.com, openrouter.ai). Concurrency capped to 60 in-flight per provider.

* fix: drop duplicate claude-sonnet-4-5 from model_cost.json

  Upstream already has claude-sonnet-4-5 at line 54; my surgical append re-added it. check-json hook caught the duplicate. Removing the re-added block leaves upstream's entry intact.

* fix: align prompts with RouterArena's prompt_formatted + flip silent-success rows

  Two validator failures from /evaluate run:

  1. 559 rows had generated_answer="" but success=true. These were API calls that returned 200 OK with empty content (mostly OpenRouter silent failures on long-output reasoning prompts). Flipped success to false; they grade as 0 (no answer).
  2. ~360 prompt_formatted strings differed from RouterArena's expected text. Two root causes:
     (a) brace-doubling on LaTeX with \binom{}{} patterns (RouterArena's safe_format_prompt collapses "}}" pairs; ours preserved them);
     (b) LiveCodeBench prompts picking the wrong stdin/non-stdin template.

  Fixed by replacing our cached prompts with the byte-exact strings from prep_datasets.py's router_data.json and router_robustness.json.

  Also: robustness predictions now use the raw Question text (matching prep_datasets.py:30) instead of our locally-formatted prompts.

  check_config_prediction_files.py weave-router full --check-generated-result now passes locally.

* Fix a typo after merged conflict

---------

Co-authored-by: Yifan Lu <111810457+yl231@users.noreply.github.com>
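The silent-success fix described in the commit message (rows with generated_answer="" but success=true flipped to success=false) can be sketched roughly as follows. This is a minimal illustration, not the script the authors ran: it assumes each prediction row is a dict carrying `generated_answer` and `success` keys, and the helper name `flip_silent_successes` is hypothetical.

```python
from typing import Any


def flip_silent_successes(rows: list[dict[str, Any]]) -> int:
    """Mark rows that claim success but carry an empty answer as failed.

    Returns the number of rows that were changed. The row layout
    (a list of dicts) is an assumption made for this sketch.
    """
    flipped = 0
    for row in rows:
        # An empty string is the case named in the commit message.
        if row.get("success") is True and row.get("generated_answer", "") == "":
            row["success"] = False
            flipped += 1
    return flipped


# Toy rows for illustration only; these are not taken from the predictions file.
rows = [
    {"generated_answer": "42", "success": True},
    {"generated_answer": "", "success": True},
    {"generated_answer": "", "success": False},
]
changed = flip_silent_successes(rows)
print(changed, [row["success"] for row in rows])
```

Only the second toy row is changed: the first has a real answer and the third was already marked as failed.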
1 parent 812282b commit 7061764

5 files changed

Lines changed: 314402 additions & 0 deletions

model_cost/model_cost.json

Lines changed: 40 additions & 0 deletions
@@ -183,6 +183,46 @@
     "input_token_price_per_million": 1.00,
     "output_token_price_per_million": 3.20
   },
+  "claude-opus-4-7": {
+    "input_token_price_per_million": 15.00,
+    "output_token_price_per_million": 75.00
+  },
+  "claude-haiku-4-5": {
+    "input_token_price_per_million": 0.80,
+    "output_token_price_per_million": 4.00
+  },
+  "gpt-5.5": {
+    "input_token_price_per_million": 5.00,
+    "output_token_price_per_million": 30.00
+  },
+  "gpt-5.4-mini": {
+    "input_token_price_per_million": 0.40,
+    "output_token_price_per_million": 1.60
+  },
+  "gpt-4.1": {
+    "input_token_price_per_million": 2.00,
+    "output_token_price_per_million": 8.00
+  },
+  "gemini-3.1-pro-preview": {
+    "input_token_price_per_million": 2.00,
+    "output_token_price_per_million": 12.00
+  },
+  "gemini-3.1-flash-lite-preview": {
+    "input_token_price_per_million": 0.10,
+    "output_token_price_per_million": 0.40
+  },
+  "deepseek/deepseek-v4-pro": {
+    "input_token_price_per_million": 0.435,
+    "output_token_price_per_million": 0.870
+  },
+  "qwen/qwen3.5-flash-02-23": {
+    "input_token_price_per_million": 0.065,
+    "output_token_price_per_million": 0.260
+  },
+  "deepseek/deepseek-v4-flash": {
+    "input_token_price_per_million": 0.140,
+    "output_token_price_per_million": 0.280
+  },
   "qwen/qwen3-235b-a22b-2507": {
     "input_token_price_per_million": 0.071,
     "output_token_price_per_million": 0.1
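To make the units in these entries concrete, here is a small sketch of how a per-request cost could be derived from the two per-million prices. The field names come from model_cost.json; the function name `request_cost_usd` and the token counts are hypothetical, and whether RouterArena's evaluator uses exactly this arithmetic is an assumption.

```python
def request_cost_usd(prices: dict[str, float], input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given prices quoted per million tokens."""
    total = (
        input_tokens * prices["input_token_price_per_million"]
        + output_tokens * prices["output_token_price_per_million"]
    )
    return total / 1_000_000


# Two entries copied from the diff above.
model_cost = {
    "claude-haiku-4-5": {
        "input_token_price_per_million": 0.80,
        "output_token_price_per_million": 4.00,
    },
    "gpt-5.4-mini": {
        "input_token_price_per_million": 0.40,
        "output_token_price_per_million": 1.60,
    },
}

# Hypothetical request: 1,000 prompt tokens and 500 completion tokens.
cost = request_cost_usd(model_cost["claude-haiku-4-5"], 1_000, 500)
print(f"{cost:.4f}")
```

For the hypothetical request, claude-haiku-4-5 works out to (1,000 × 0.80 + 500 × 4.00) / 1,000,000, or roughly 0.0028 USD.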
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+{
+  "pipeline_params": {
+    "router_name": "weave-router",
+    "router_cls_name": "WeaveRouter",
+    "models": [
+      "claude-opus-4-7",
+      "claude-sonnet-4-5",
+      "claude-haiku-4-5",
+      "gpt-5.5",
+      "gpt-5.4-mini",
+      "gpt-4.1",
+      "gemini-3.1-pro-preview",
+      "gemini-3.1-flash-lite-preview",
+      "deepseek/deepseek-v4-pro",
+      "qwen/qwen3.5-flash-02-23",
+      "deepseek/deepseek-v4-flash",
+      "moonshotai/kimi-k2.5"
+    ],
+    "description": "Weave Router (v0.27): cluster-routing over a 12-model BYOK pool spanning Anthropic, OpenAI, Google, and OpenRouter providers. Embeds each prompt, scores against per-cluster model rankings trained on RouterArena's full split, and selects the cost-quality optimum via an alpha-blended score (alpha=0.40). The pool is intentionally multi-provider so a customer who only brings an OpenAI key still gets a 3-tier choice.",
+    "alpha": 0.40,
+    "router_version": "v0.27",
+    "router_homepage": "https://workweave.ai"
+  }
+}
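The config records `alpha` but not the formula behind the "alpha-blended score". The sketch below shows one plausible reading, purely as an assumption: score each candidate as `alpha * quality - (1 - alpha) * normalised_cost` and pick the maximum. The function name `pick_model`, the direction in which alpha weights quality against cost, the cost normalisation, and the quality numbers are all hypothetical; only the prices and alpha=0.40 come from this commit.

```python
def pick_model(quality: dict[str, float], cost: dict[str, float], alpha: float) -> str:
    """Pick the model with the best blended score.

    Assumed blend (not confirmed by the submission):
        score = alpha * quality - (1 - alpha) * cost / max_cost
    """
    max_cost = max(cost.values())

    def score(model: str) -> float:
        return alpha * quality[model] - (1.0 - alpha) * cost[model] / max_cost

    return max(quality, key=score)


# Output prices per million tokens, copied from model_cost.json in this commit.
cost = {"claude-opus-4-7": 75.00, "claude-haiku-4-5": 4.00, "gpt-5.4-mini": 1.60}
# Hypothetical per-cluster quality estimates in [0, 1]; not from the submission.
quality = {"claude-opus-4-7": 0.90, "claude-haiku-4-5": 0.75, "gpt-5.4-mini": 0.60}

choice = pick_model(quality, cost, alpha=0.40)
print(choice)
```

With these toy numbers the blend at alpha=0.40 prefers claude-haiku-4-5, while alpha=1.0 (quality only) would pick claude-opus-4-7. That shows how a single scalar can move the selection along the cost-quality trade-off under the assumed formula.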
