diff --git a/eval/routerarena/leaderboard-pr/README.upstream.updated.md b/eval/routerarena/leaderboard-pr/README.upstream.updated.md
new file mode 100644
index 0000000..24c56b4
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/README.upstream.updated.md
@@ -0,0 +1,256 @@
+
+ RouterArena logo + +
+

+ Blog + arXiv: RouterArena + Hugging Face Dataset +
+

+ +
+ +

Make Router Evaluation Open and Standardized

+ +

+ RouterArena Diagram +

+
+**RouterArena** is an open evaluation platform and leaderboard for **LLM routers**, systems that automatically select the best model for a given query. As the LLM ecosystem diversifies with models varying in size, capability, and cost, **routing** has become critical for balancing performance and cost. Yet LLM routers currently lack a standardized evaluation framework to assess how effectively they trade off accuracy, cost, and other related metrics.
+
+RouterArena bridges this gap by providing an open evaluation platform and benchmarking framework for both open-source and commercial routers. It has the following key features:
+
+- 🌍 **Diverse Data Coverage**: A principled, diverse evaluation dataset spanning 9 domains and 44 categories with easy, medium, and hard difficulty levels.
+- 📊 **Comprehensive Metrics**: Five router-critical metrics measuring accuracy, cost, optimality, robustness, and latency.
+- ⚙️ **Automated Evaluation**: An automated evaluation framework to simplify the evaluation process for open-source and commercial routers.
+- 🏆 **Live Leaderboard**: A live leaderboard to track the performance of routers across multiple dimensions.
+
+*We aim for RouterArena to serve as a foundation for the community to evaluate, understand, and advance LLM routing systems.*
+
+> [!IMPORTANT]
+> **RouterArena is an evaluation-only dataset.** Submissions that train, fit, or tune any router component on RouterArena data (including the label files) will be rejected, and any accepted submission found in violation will be withdrawn.
+
+# Current Leaderboard
+
+For more details, please see our [website](https://routeworks.github.io/leaderboard) and [blog](https://huggingface.co/blog/JerryPotter/who-routes-the-routers).
+
+| Rank | Router | Affiliation | Acc-Cost Arena | Accuracy | Cost/1K Queries | Optimal Selection | Optimal Cost | Optimal Accuracy | Latency | Robustness |
+|------|--------------------|-----------------------------|--------|----------|---------|-----------------|--------------|----------------|---------|------------|
+| 🥇 | [Sqwish Router](https://www.sqwish.ai/) | | 75.27 | 76.40 | $0.18 | 7.41 | 25.10 | 90.47 | — | 100.00 |
+| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | — | — | — | — | 25.48 |
+| 🥉 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | — | — | — | — | 100.00 |
+| 4 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | — | — | — | — | 22.62 |
+| 5 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | — | — | — | — | 71.43 |
+| 6 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | — | 45.71 |
+| 7 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | — | 49.52 |
+| 8 | [vLLM‑SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | — | 90.95 |
+| 9 | [MIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
+| 10 | [NIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
+| 11 | [GPT‑5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | — | — | — | — | — |
+| 12 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
+| 13 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | — | — |
+| 14 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | — | — | — | — | 100.00 |
+| 15 | [RouterBench‑MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
+| 16 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | — | 55.91 |
+| 17 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
+| 18 | [RouterBench‑KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
+| 19 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
+| 20 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
+
+🎓 Open-source  💼 Closed-source 
+
+
+
+
+
+# Evaluating Your Router
+
+## 1. Setup
+
+### Step 1.1: Install uv and RouterArena
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+git clone https://github.com/RouteWorks/RouterArena.git && cd RouterArena
+uv sync
+```
+
+### Step 1.2: Download Dataset
+Download the dataset from [HF dataset](https://huggingface.co/datasets/RouteWorks/RouterArena).
+
+```bash
+uv run python ./scripts/process_datasets/prep_datasets.py
+```
+
+### Step 1.3: Set Up API Keys (Optional)
+
+In the project root, copy `.env.example` to `.env` and fill in the API keys in `.env`. This step is **required only if you use our pipeline for LLM inference**.
+
+```bash
+# Example .env file
+OPENAI_API_KEY=
+ANTHROPIC_API_KEY=
+# ...
+```
+
+See the [`ModelInference`](./llm_inference/model_inference.py) class for the complete list of supported providers and required environment variables. You can extend that class to support more models, or open a GitHub issue to request support for new providers.
+
+## 2. Get Routing Decisions
+
+Follow the steps below to obtain your router's model choice for each query. Start with the `sub_10` split (a 10% subset) for local testing. Once your setup works, you can evaluate:
+- on the `full` dataset for full local evaluation and official leaderboard submission.
+- on the `robustness` dataset for robustness evaluation.
+
+### Step 2.1: Prepare Config File
+
+Create a config file in `./router_inference/config/.json`. An example config file is included [here](./router_inference/config/your-router.json).
+
+```json
+{
+  "pipeline_params": {
+    "router_name": "your-router",
+    "router_cls_name": "your_router_class_name",
+    "models": [
+      "gpt-4o-mini",
+      "claude-3-haiku-20240307",
+      "gemini-2.0-flash-001"
+    ]
+  }
+}
+```
+
+For each model in your config, add an entry with its pricing per million tokens to [`model_cost/model_cost.json`](./model_cost/model_cost.json), in this format:
+
+```json
+{
+  "gpt-4o-mini": {
+    "input_token_price_per_million": 0.15,
+    "output_token_price_per_million": 0.6
+  }
+}
+```
+
+> [!NOTE]
+> Ensure all models in the config files above are listed in [`./universal_model_names.py`](./universal_model_names.py). If you add a new model, you must also add its API inference endpoint in [`llm_inference/model_inference.py`](./llm_inference/model_inference.py).
+
+### Step 2.2: Create Your Router Class and Generate Prediction File
+
+Create your own router class by inheriting from `BaseRouter` and implementing the `_get_prediction()` method. See [`router_inference/router/example_router.py`](./router_inference/router/example_router.py) for a complete example.
+
+Then, modify [`router_inference/router/__init__.py`](./router_inference/router/__init__.py) to include your router class:
+
+```python
+# Add an import for your router class alongside the existing ones
+from router_inference.router.my_router import MyRouter
+
+__all__ = ["BaseRouter", "ExampleRouter", "MyRouter"]
+```
+
+Finally, generate the prediction file:
+
+```bash
+uv run python ./router_inference/generate_prediction_file.py your-router [sub_10|full|robustness]
+```
+
+> [!NOTE]
+> - The `` argument must match your config filename (without the `.json` extension). For example, if your config file is `router_inference/config/my-router.json`, use `my-router` as the argument.
+> - Your `_get_prediction()` method must return a model name that exists in your config file's `models` list. The base class will automatically validate this.
+
+### Step 2.3: Validate Config and Prediction Files
+
+```bash
+uv run python ./router_inference/check_config_prediction_files.py your-router [sub_10|full|robustness]
+```
+
+This script checks that (1) all model names are valid, (2) the prediction file has the correct size (809 entries for `sub_10`, 8400 for `full`, 420 for `robustness`), and (3) all entries have valid `global_index`, `prompt`, and `prediction` fields.
+
+## 3. Run LLM Inference
+
+Run the inference script to make API calls for each query using the selected models:
+
+```bash
+uv run python ./llm_inference/run.py your-router
+```
+
+The script loads your prediction file, makes API calls using the models specified in the `prediction` field, and saves results incrementally. It uses cached results when available and saves progress after each query, so you can safely interrupt and resume. Results are saved to `./cached_results/` for reuse across routers.
+> [!NOTE]
+> - For robustness evaluation, we only measure the model-selection flip ratio after adding noise to the original prompt, so no additional LLM inference is required for this stage.
+
+## 4. Run Router Evaluation
+
+As the last step, run the evaluation script:
+
+```bash
+uv run python ./llm_evaluation/run.py your-router [sub_10|full|robustness]
+```
+
+> [!TIP]
+> - Use `sub_10` or `full` to evaluate on those datasets.
+> - Use `robustness` to run robustness-only evaluation (expects `-robustness.json`).
+
+# Submitting to the Leaderboard
+
+To get your router on the leaderboard, open a Pull Request with your router's prediction files to trigger our automated evaluation workflow:
+
+1. **Add your files**:
+   - `router_inference/config/.json` - Your router configuration
+   - `router_inference/predictions/.json` - Your prediction file with `generated_result` fields populated
+   - `router_inference/predictions/-robustness.json` - Your prediction file for robustness evaluation (no `generated_result` fields needed)
+2. **Open a Pull Request to the `main` branch and comment `/evaluate` on it**
+   - When the PR is ready for evaluation, post `/evaluate` as a PR comment to trigger the evaluation workflow. See an example [here](https://github.com/RouteWorks/RouterArena/pull/71#issuecomment-3904936480).
+   - The automated workflow will:
+     - Validate your submission
+     - Run evaluation on the full dataset
+     - Post results as a comment on your PR
+     - Update the leaderboard upon approval
+
+The figure below shows the evaluation pipeline.
+
+

+ RouterArena Evaluation Pipeline +

+
+# Contributing
+
+We welcome and appreciate contributions and collaborations of any kind.
+
+We use pre-commit to ensure a consistent coding style. You can set it up with:
+
+```bash
+pip install pre-commit
+pre-commit install
+```
+
+Before pushing your code, run the following and make sure your code passes all checks.
+
+```bash
+pre-commit run --all-files
+```
+
+# Contacts
+
+Feel free to contact us for contributions and collaborations.
+
+```
+Yifan Lu (yifan.lu@rice.edu)
+Rixin Liu (rixin.liu@rice.edu)
+Jiarong Xing (jxing@rice.edu)
+```
+
+# Citation
+If you find our project helpful, please give us a star and cite us as:
+
+```bibtex
+@misc{lu2025routerarenaopenplatformcomprehensive,
+  title         = {RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers},
+  author        = {Yifan Lu and Rixin Liu and Jiayi Yuan and Xingqi Cui and Shenrun Zhang and Hongyi Liu and Jiarong Xing},
+  year          = {2025},
+  eprint        = {2510.00202},
+  archivePrefix = {arXiv},
+  primaryClass  = {cs.LG},
+  url           = {https://arxiv.org/abs/2510.00202}
+}
+```
diff --git a/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md b/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md
new file mode 100644
index 0000000..95055aa
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md
@@ -0,0 +1,65 @@
+# RouterArena leaderboard README update: Nadir at #2
+
+PR [RouteWorks/RouterArena#112](https://github.com/RouteWorks/RouterArena/pull/112)
+(`nadir-cascade-v2`) was merged on 2026-05-31. The automated evaluation bot
+returned the official scores below, but the leaderboard table in
+`RouteWorks/RouterArena/README.md` has not yet been regenerated to include the
+entry. This directory holds a ready-to-submit README edit that inserts Nadir at
+its earned rank.
+
+## Official evaluation result (from the PR's CI bot)
+
+| Metric | Value |
+|---|---|
+| Acc-Cost Arena score | **0.7333** (→ 73.33) |
+| Accuracy | 74.87% |
+| Avg cost / 1K queries | $0.2932 (→ $0.29) |
+| Robustness | 0.2548 (→ 25.48) |
+| Queries | 8,400 (full split) |
+| Gate checks | Passed all 4 |
+
+Ranked by **Acc-Cost Arena** (the leaderboard's ranking column), 73.33 lands at
+**#2**, behind Sqwish Router (75.27) and ahead of Weave Router (72.82).
+
+> Note: the Robustness score (25.48) is low relative to the top entries
+> (100.00). The leaderboard ranks by Acc-Cost Arena, so #2 stands, but expect a
+> maintainer to ask about robustness. The PR notes a planned follow-up to
+> rebuild the robustness predictions to mirror main routing.
+
+## The change
+
+Insert one row for `Nadir Cascade` as the new 🥈, demote Weave to 🥉 and
+OrcaRouter to rank 4, and shift ranks 4–19 down to 5–20. See
+`nadir_leaderboard.patch` (a unified diff against
+`RouteWorks/RouterArena/main:README.md`) and `README.upstream.updated.md` (the
+full updated upstream README for reference).
+
+```
+| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | — | — | — | — | 25.48 |
+```
+
+## How to open the upstream PR (from a RouterArena fork)
+
+This repo's automation is scoped to `doramirdor/getnadir.dev` and cannot push
+to `RouteWorks/RouterArena`, so the upstream PR has to be opened from a fork.
+
+```bash
+# 1. Fork RouteWorks/RouterArena on GitHub (once), then:
+git clone https://github.com//RouterArena.git
+cd RouterArena
+git remote add upstream https://github.com/RouteWorks/RouterArena.git
+git fetch upstream && git checkout -b leaderboard-add-nadir upstream/main
+
+# 2. Apply the patch from this directory
+git apply /path/to/getnadir.dev/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
+# (if the upstream README moved and the patch won't apply cleanly,
+# copy the single Nadir row above into the table manually and renumber)
+
+# 3. Commit and push to your fork
+git add README.md
+git commit -m "Add Nadir Cascade to leaderboard (#2, Acc-Cost Arena 73.33)"
+git push -u origin leaderboard-add-nadir
+
+# 4. Open a PR from your fork's branch to RouteWorks/RouterArena:main
+# referencing the merged submission PR #112.
+```
diff --git a/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch b/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
new file mode 100644
index 0000000..fe288cb
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
@@ -0,0 +1,46 @@
+--- a/README.md
++++ b/README.md
+@@ -38,24 +38,25 @@
+ | Rank | Router | Affiliation | Acc-Cost Arena | Accuracy | Cost/1K Queries | Optimal Selection | Optimal Cost | Optimal Accuracy | Latency | Robustness |
+ |------|--------------------|-----------------------------|--------|----------|---------|-----------------|--------------|----------------|---------|------------|
+ | 🥇 | [Sqwish Router](https://www.sqwish.ai/) | | 75.27 | 76.40 | $0.18 | 7.41 | 25.10 | 90.47 | — | 100.00 |
+-| 🥈 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | — | — | — | — | 100.00 |
+-| 🥉 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | — | — | — | — | 22.62 |
+-| 4 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | — | — | — | — | 71.43 |
+-| 5 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | — | 45.71 |
+-| 6 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | — | 49.52 |
+-| 7 | [vLLM‑SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | — | 90.95 |
+-| 8 | [MIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
+-| 9 | [NIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
+-| 10 | [GPT‑5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | — | — | — | — | — |
+-| 11 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
+-| 12 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | — | — |
+-| 13 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | — | — | — | — | 100.00 |
+-| 14 | [RouterBench‑MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
+-| 15 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | — | 55.91 |
+-| 16 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
+-| 17 | [RouterBench‑KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
+-| 18 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
+-| 19 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
++| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | — | — | — | — | 25.48 |
++| 🥉 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | — | — | — | — | 100.00 |
++| 4 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | — | — | — | — | 22.62 |
++| 5 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | — | — | — | — | 71.43 |
++| 6 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | — | 45.71 |
++| 7 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | — | 49.52 |
++| 8 | [vLLM‑SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | — | 90.95 |
++| 9 | [MIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
++| 10 | [NIRT‑BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
++| 11 | [GPT‑5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | — | — | — | — | — |
++| 12 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
++| 13 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | — | — |
++| 14 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | — | — | — | — | 100.00 |
++| 15 | [RouterBench‑MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
++| 16 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | — | 55.91 |
++| 17 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
++| 18 | [RouterBench‑KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
++| 19 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
++| 20 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
+ 
+ 🎓 Open-source  💼 Closed-source 
+
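A short script can double-check the placement in `nadir_leaderboard.patch` by sorting on Acc-Cost Arena. The sketch below is a hypothetical helper. It is not part of RouterArena or this PR. It makes three assumptions:

- Ranks are assigned purely by descending Acc-Cost Arena score, and no upstream tie-breaking rule is modelled.
- Ranks 1 to 3 are shown as 🥇/🥈/🥉 medals, as in the table above.
- The scores are those copied from the context and removed rows of the patch.

```python
"""Hypothetical sanity check (not part of RouterArena): place a new entry on
the leaderboard by sorting on Acc-Cost Arena score, highest first."""

# Medal labels used for the top three ranks in the README table.
MEDALS = {1: "🥇", 2: "🥈", 3: "🥉"}

# (router, Acc-Cost Arena) for the 19 entries on upstream main before this
# change, copied from the context and removed rows of nadir_leaderboard.patch.
EXISTING = [
    ("Sqwish Router", 75.27),
    ("Weave Router", 72.82),
    ("OrcaRouter-Adaptive", 72.08),
    ("Azure-Model-Router", 71.87),
    ("R2-Router", 71.60),
    ("Auto Router", 70.05),
    ("vLLM-SR", 67.23),
    ("MIRT-BERT", 66.89),
    ("NIRT-BERT", 66.12),
    ("GPT-5", 64.32),
    ("CARROT", 63.87),
    ("Chayan", 63.83),
    ("AgentForge Router", 60.12),
    ("RouterBench-MLP", 57.56),
    ("NotDiamond", 57.29),
    ("GraphRouter", 57.22),
    ("RouterBench-KNN", 55.48),
    ("RouteLLM", 48.07),
    ("RouterDC", 33.75),
]


def rank_label(rank: int) -> str:
    """Return the medal for ranks 1-3 and the plain number otherwise."""
    return MEDALS.get(rank, str(rank))


def insert_and_rank(rows, name, score):
    """Add (name, score) to rows and return (label, name, score) tuples
    ordered by descending score. sorted() is stable even with reverse=True,
    so tied entries keep their input order; the real leaderboard's tie rule
    is an unknown here."""
    ranked = sorted(rows + [(name, score)], key=lambda row: row[1], reverse=True)
    return [(rank_label(i), n, s) for i, (n, s) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    board = insert_and_rank(EXISTING, "Nadir Cascade", 73.33)
    # Top four rows: Sqwish Router, Nadir Cascade, Weave Router, OrcaRouter-Adaptive.
    for label, name, score in board[:4]:
        print(f"| {label} | {name} | {score:.2f} |")
```

The script sorts the full list instead of editing ranks by hand. This makes it easy to confirm that every entry from Weave Router down moves by exactly one position and that the table ends at rank 20, matching the added rows in the patch.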