
Add orcarouter-adaptive router predictions #104

Merged
yl231 merged 4 commits into RouteWorks:main from Continuum-AI-Corp:submit/orcarouter-adaptive-v2 on May 25, 2026

@ZhenghuaBao

Contributor

Method

orcarouter-adaptive is a LinUCB contextual bandit with embedding-augmented features.

  • Algorithm: per-arm LinUCB ridge regression with UCB exploration. Each /decide call scores all candidate arms and picks the argmax.
  • Feature vector: lexical features concatenated with a sentence-transformer embedding (all-MiniLM-L6-v2, L2-normalized, CPU inference for determinism across environments).
  • Reward shape: per-query Arena Score — aligns the bandit's training objective with the leaderboard evaluation metric.
  • Pre-training: per-arm parameters are initialized via offline observations before evaluation. /decide calls during evaluation are read-only on bandit state.
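The scoring step described above can be sketched as follows. This is a minimal illustration of textbook per-arm LinUCB, not the submitted router's code: the class and function names, the ridge strength, the exploration weight `alpha`, the toy feature values, and the arm names `model-a` and `model-b` are all assumptions made for the example. The embedding is passed in as a plain vector so the sketch does not depend on a sentence-transformer being installed.

```python
import numpy as np


class LinUCBArm:
    """Ridge-regression state for one arm: A = ridge*I + sum(x x^T), b = sum(r x)."""

    def __init__(self, dim, ridge=1.0):
        self.A = ridge * np.eye(dim)
        self.b = np.zeros(dim)

    def update(self, x, reward):
        # Used only during offline pre-training in this sketch.
        self.A += np.outer(x, x)
        self.b += reward * x

    def ucb(self, x, alpha):
        # theta = A^-1 b, so theta.x = b.(A^-1 x) because A is symmetric.
        A_inv_x = np.linalg.solve(self.A, x)
        mean = self.b @ A_inv_x
        bonus = alpha * np.sqrt(x @ A_inv_x)
        return mean + bonus


def build_features(lexical, embedding):
    """Concatenate lexical features with an L2-normalized embedding."""
    emb = np.asarray(embedding, dtype=float)
    norm = np.linalg.norm(emb)
    if norm > 0:
        emb = emb / norm
    return np.concatenate([np.asarray(lexical, dtype=float), emb])


def decide(arms, x, alpha=1.0):
    """Score every arm and return the argmax; never mutates arm state."""
    scores = {name: arm.ucb(x, alpha) for name, arm in arms.items()}
    return max(scores, key=scores.get)


# Toy pre-training: hypothetical arms and rewards, for illustration only.
x = build_features([0.5, 1.0], [3.0, 4.0])
arms = {"model-a": LinUCBArm(x.size), "model-b": LinUCBArm(x.size)}
for _ in range(20):
    arms["model-a"].update(x, 0.9)
    arms["model-b"].update(x, 0.2)

state_before = arms["model-a"].A.copy()
choice = decide(arms, x)
state_unchanged = np.array_equal(state_before, arms["model-a"].A)
```

Because `decide` only reads `A` and `b`, repeated calls on the same query return the same arm, which matches the read-only behavior during evaluation described above.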

Pool

10 models across 5 providers:

| Provider | Models |
| --- | --- |
| Anthropic | claude-haiku-4-5-20251001, claude-sonnet-4 |
| Google | gemini-2.5-flash, gemini-2.5-flash-lite |
| OpenAI | gpt-4o-mini, gpt-5-mini |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Alibaba | qwen3-235b-a22b-instruct-2507, qwen3-30b-a3b-instruct-2507 |

Files

  • router_inference/config/orcarouter-adaptive.json — router config
  • router_inference/predictions/orcarouter-adaptive.json — full-split predictions with populated generated_result
  • router_inference/predictions/orcarouter-adaptive-robustness.json — robustness predictions (routing-only)
  • universal_model_names.py — provider-prefix → bare-name mappings for the pool
  • model_cost/model_cost.json — token cost entries for pool models not previously listed

Local validation via router_inference/check_config_prediction_files.py passes for both full and robustness splits.

ZhenghuaBao and others added 2 commits May 21, 2026 18:49
10-model pool: anthropic claude-haiku/sonnet-4, google gemini-2.5-flash/-lite,
openai gpt-4o-mini/gpt-5-mini, deepseek chat/reasoner, alibaba qwen3-235b/30b.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add prefix→bare mappings (universal_model_names.py) and cost entries
(model_cost/model_cost.json) for the 5 pool models not already present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ZhenghuaBao

Contributor Author

/evaluate

@github-actions

Router Evaluation Results

Router: orcarouter-adaptive
Dataset Split: full

RouterArena Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.7208 |
| Accuracy | 75.54% |
| Total Cost | $8.380033 |
| Avg Cost per Query | $0.000998 |
| Avg Cost per 1K Queries | $0.9976 |
| Number of Queries | 8400 |
| Robustness Score | 0.2262 |

Evaluation completed by RouterArena automated workflow
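
The cost rows above are consistent with each other, which can be checked with a few lines of arithmetic. The sketch below assumes the averages are simply the total cost divided by the number of queries; the variable names are illustrative and are not taken from the evaluation workflow.

```python
total_cost = 8.380033   # "Total Cost" reported above, in USD
num_queries = 8400      # "Number of Queries" reported above

avg_cost_per_query = total_cost / num_queries
avg_cost_per_1k = avg_cost_per_query * 1000

per_query_str = f"{avg_cost_per_query:.6f}"   # 0.000998
per_1k_str = f"{avg_cost_per_1k:.4f}"         # 0.9976
per_1k_2dp = f"{avg_cost_per_1k:.2f}"         # 1.00
print(per_query_str, per_1k_str, per_1k_2dp)
```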

@ZhenghuaBao

Contributor Author

Hi @yl231,

All pipelines pass and evaluation results are included (full 8400 + 420 robustness splits). We'd like to merge OrcaRouter Adaptive as an online learning entry — a different regime from the offline/pre-trained routers currently on the leaderboard, intended as a reference point for contextual bandits in routing.

Thanks in advance for the review!

yl231 added a commit that referenced this pull request May 21, 2026
OrcaRouter-Adaptive (LinUCB contextual bandit with embedding features)
submitted in #104 by @ZhenghuaBao. With Acc-Cost Arena 72.08 it slots
between Sqwish (75.27) and Azure-Model-Router (71.87), landing at 🥈.
Every row below shifts down by one (16 → 17 entries).

Bot-reported metrics:
  Acc-Cost Arena  72.08
  Accuracy        75.54
  Cost/1K Queries $1.00 ($0.9976 rounded to 2dp like other rows)
  Robustness      22.62

The submitted prediction file does not include for_optimality entries,
so the three optimality columns are dashed (same convention as GPT-5
and the Azure row).

URL and affiliation cells are left blank for the maintainer to fill in
pre-merge, matching the Auto Router / Sqwish Router pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ZhenghuaBao and others added 2 commits May 22, 2026 09:52
The Merge branch 'main' resolution dropped the `},` separator between the
last orcarouter-adaptive entry and the first upstream-added entry,
breaking JSON parse for the pre-commit check-json hook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
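
A minimal sketch of the failure mode described in this commit message, assuming the affected file is a JSON object of per-entry objects. The keys and values below are hypothetical stand-ins, not the repository's real entries. Dropping the `},` between two entries leaves the first object unclosed, so any strict JSON parser rejects the file.

```python
import json

# Hypothetical entries; only the missing "}," separator matters here.
fixed = '{"entry-a": {"cost": 1.0}, "entry-b": {"cost": 2.0}}'
broken = '{"entry-a": {"cost": 1.0 "entry-b": {"cost": 2.0}}'


def parses_as_json(text):
    """Return True if the text is valid JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


fixed_ok = parses_as_json(fixed)
broken_ok = parses_as_json(broken)
print(fixed_ok, broken_ok)
```

Running a parse like this on the resolved file after a merge catches the problem before a JSON-validating hook does.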
@yl231 yl231 self-assigned this May 25, 2026
@yl231 yl231 merged commit 571bf08 into RouteWorks:main May 25, 2026
5 checks passed