Add orcarouter-adaptive router predictions #104
Merged
yl231 merged 4 commits on May 25, 2026
10-model pool: anthropic claude-haiku/sonnet-4, google gemini-2.5-flash/-lite, openai gpt-4o-mini/gpt-5-mini, deepseek chat/reasoner, alibaba qwen3-235b/30b.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add prefix→bare mappings (universal_model_names.py) and cost entries (model_cost/model_cost.json) for the 5 pool models not already present.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
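As a rough illustration of what this commit describes, the sketch below shows a prefix-to-bare lookup plus a check that every pool model has a cost entry. The `provider/model` key format, the names `PREFIX_TO_BARE`, `MODEL_COST`, `to_bare_name` and `missing_cost_entries`, and the cost fields are all assumptions made for the example; the real `universal_model_names.py` and `model_cost/model_cost.json` may be organized differently.

```python
from typing import Dict, Iterable, List

# Hypothetical entries: the "provider/model" key format is an assumption,
# not the repository's actual naming scheme.
PREFIX_TO_BARE: Dict[str, str] = {
    "openai/gpt-4o-mini": "gpt-4o-mini",
    "openai/gpt-5-mini": "gpt-5-mini",
}

# Placeholder cost table keyed by bare name. The values are dummies,
# not real prices, and the field names are illustrative.
MODEL_COST: Dict[str, Dict[str, float]] = {
    "gpt-4o-mini": {"input": 0.0, "output": 0.0},
}


def to_bare_name(name: str) -> str:
    """Map a provider-prefixed name to its bare name; pass unknown names through."""
    return PREFIX_TO_BARE.get(name, name)


def missing_cost_entries(pool: Iterable[str]) -> List[str]:
    """Return bare names from the pool that have no entry in MODEL_COST."""
    return [bare for bare in map(to_bare_name, pool) if bare not in MODEL_COST]


# With the placeholder tables above, gpt-5-mini still lacks a cost entry.
print(missing_cost_entries(PREFIX_TO_BARE))
```

A check of this shape makes it easy to see which pool models still need cost entries before predictions are evaluated.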
Comment (Contributor, Author):
/evaluate
Router Evaluation Results
RouterArena Metrics
Evaluation completed by RouterArena automated workflow
Comment (Contributor, Author):
Hi @yl231, all pipelines pass and evaluation results are included (full 8400 + 420 robustness splits). We'd like to merge OrcaRouter Adaptive as an online learning entry. This is a different regime from the offline/pre-trained routers currently on the leaderboard, and it is intended as a reference point for contextual bandits in routing. Thanks in advance for the review!
yl231 added a commit that referenced this pull request on May 21, 2026
OrcaRouter-Adaptive (LinUCB contextual bandit with embedding features) submitted in #104 by @ZhenghuaBao. With Acc-Cost Arena 72.08 it slots between Sqwish (75.27) and Azure-Model-Router (71.87), landing at 🥈. Every row below shifts down by one (16 → 17 entries).
Bot-reported metrics:
- Acc-Cost Arena: 72.08
- Accuracy: 75.54
- Cost/1K Queries: $1.00 ($0.9976 rounded to 2dp like other rows)
- Robustness: 22.62
The submitted prediction file does not include for_optimality entries, so the three optimality columns are dashed (same convention as GPT-5 and the Azure row). URL and affiliation cells are left blank for the maintainer to fill in pre-merge, matching the Auto Router / Sqwish Router pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yl231 added a commit that referenced this pull request on May 21, 2026
OrcaRouter-Adaptive (LinUCB contextual bandit with embedding features) submitted in #104 by @ZhenghuaBao. With Acc-Cost Arena 72.08 it slots between Sqwish (75.27) and Azure-Model-Router (71.87), landing at 🥈. Every row below shifts down by one (16 → 17 entries).
Bot-reported metrics:
- Acc-Cost Arena: 72.08
- Accuracy: 75.54
- Cost/1K Queries: $1.00 ($0.9976 rounded to 2dp like other rows)
- Robustness: 22.62
The submitted prediction file does not include for_optimality entries, so the three optimality columns are dashed (same convention as GPT-5 and the Azure row). URL and affiliation cells are left blank for the maintainer to fill in pre-merge, matching the Auto Router / Sqwish Router pattern.
Co-authored-by: yl231 <yl231@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
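The placement and rounding described in this commit message can be checked with a small sketch. Only the two neighbouring scores quoted above are used; the remaining leaderboard rows are omitted, and the helper name `insert_ranked` is hypothetical rather than part of the repository.

```python
import bisect
from typing import List, Tuple

Row = Tuple[str, float]

# Only the two neighbouring rows quoted in the commit message; all other
# leaderboard rows are omitted here.
neighbors: List[Row] = [("Sqwish", 75.27), ("Azure-Model-Router", 71.87)]


def insert_ranked(rows: List[Row], name: str, score: float) -> List[Row]:
    """Insert (name, score) into rows that are sorted by descending score."""
    # bisect works on ascending keys, so search over negated scores.
    keys = [-s for _, s in rows]
    idx = bisect.bisect_right(keys, -score)
    return rows[:idx] + [(name, score)] + rows[idx:]


updated = insert_ranked(neighbors, "OrcaRouter-Adaptive", 72.08)
print([name for name, _ in updated])

# Cost per 1K queries rendered to two decimal places, like the other rows.
cost_cell = f"${0.9976:.2f}"
print(cost_cell)  # $1.00
```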
The Merge branch 'main' resolution dropped the `},` separator between the last orcarouter-adaptive entry and the first upstream-added entry, breaking JSON parse for the pre-commit check-json hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
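To illustrate the failure mode this commit fixes, the sketch below parses a tiny JSON object with and without the `},` separator between two entries. The keys and fields are placeholders rather than the real file contents, and `parses` is a hypothetical helper that only approximates what a JSON validity check does.

```python
import json

# Placeholder two-entry object; keys and fields are illustrative only.
fixed = """{
  "entry-a": {
    "value": 1
  },
  "entry-b": {
    "value": 2
  }
}
"""

# Simulate the bad merge resolution by dropping the first "}," separator line.
broken = fixed.replace("  },\n", "", 1)


def parses(text: str) -> bool:
    """Return True if text is valid JSON, printing the error location if not."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError as err:
        print(f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}")
        return False


print(parses(fixed))
print(parses(broken))
```

Running a check like this locally after resolving merge conflicts in a JSON file catches the problem before the pre-commit hook does.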
yl231 approved these changes on May 25, 2026
Method
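orcarouter-adaptive is a LinUCB contextual bandit with embedding-augmented features.
- Each `/decide` call scores all candidate arms and picks the argmax.
- Embeddings: `all-MiniLM-L6-v2`, L2-normalized, CPU inference for determinism across environments.
- `/decide` calls during evaluation are read-only on bandit state.
Pool
10 models across 5 providers:
- anthropic: claude-haiku/sonnet-4
- google: gemini-2.5-flash/-lite
- openai: gpt-4o-mini/gpt-5-mini
- deepseek: chat/reasoner
- alibaba: qwen3-235b/30b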
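For readers unfamiliar with LinUCB, the scoring described under Method can be sketched roughly as follows. This is a minimal illustration, not the submitted router: it assumes the disjoint (per-arm) LinUCB variant, a ridge prior of 1.0, an exploration weight `alpha`, and hand-made 2-D vectors standing in for `all-MiniLM-L6-v2` embeddings. The names `LinUCBArm`, `decide`, and the arm labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


class LinUCBArm:
    """Per-arm ridge-regression state (A^-1 and b) for disjoint LinUCB."""

    def __init__(self, dim: int, ridge: float = 1.0) -> None:
        # A starts as ridge * I, so A^-1 starts as I / ridge.
        self.a_inv = [[1.0 / ridge if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x: Sequence[float]) -> List[float]:
        return [sum(r * xj for r, xj in zip(row, x)) for row in self.a_inv]

    def score(self, x: Sequence[float], alpha: float) -> float:
        """UCB score theta.x + alpha * sqrt(x^T A^-1 x); does not mutate state."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        """Sherman-Morrison rank-one update of A^-1, then b += reward * x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def decide(arms: Dict[str, LinUCBArm], features: Sequence[float],
           alpha: float = 0.1) -> str:
    """Score every arm on the L2-normalized features and return the argmax."""
    x = l2_normalize(features)
    return max(arms, key=lambda name: arms[name].score(x, alpha))


# Toy 2-D "embeddings": one arm was rewarded on axis 0, the other on axis 1.
arms = {"cheap-model": LinUCBArm(dim=2), "strong-model": LinUCBArm(dim=2)}
arms["cheap-model"].update([1.0, 0.0], reward=1.0)
arms["cheap-model"].update([0.0, 1.0], reward=0.0)
arms["strong-model"].update([0.0, 1.0], reward=1.0)
arms["strong-model"].update([1.0, 0.0], reward=0.0)

print(decide(arms, [2.0, 0.0]))  # cheap-model
print(decide(arms, [0.0, 3.0]))  # strong-model
```

Keeping `decide` free of calls to `update` mirrors the read-only evaluation behaviour described above: repeated calls on the same input return the same arm.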
Files
- `router_inference/config/orcarouter-adaptive.json`: router config
- `router_inference/predictions/orcarouter-adaptive.json`: full-split predictions with populated `generated_result`
- `router_inference/predictions/orcarouter-adaptive-robustness.json`: robustness predictions (routing-only)
- `universal_model_names.py`: provider-prefix → bare-name mappings for the pool
- `model_cost/model_cost.json`: token cost entries for pool models not previously listed
Local validation via `router_inference/check_config_prediction_files.py` passes for both `full` and `robustness` splits.