diff --git a/eval/routerarena/leaderboard-pr/README.upstream.updated.md b/eval/routerarena/leaderboard-pr/README.upstream.updated.md
new file mode 100644
index 0000000..24c56b4
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/README.upstream.updated.md
@@ -0,0 +1,256 @@
+
+
+ Make Router Evaluation Open and Standardized
+
+
+
+
+
+**RouterArena** is an open evaluation platform and leaderboard for **LLM routers**: systems that automatically select the best model for a given query. As the LLM ecosystem diversifies with models varying in size, capability, and cost, **routing** has become critical for balancing performance and cost. Yet, LLM routers currently lack a standardized evaluation framework to assess how effectively they trade off accuracy, cost, and other related metrics.
+
+RouterArena bridges this gap by providing an open evaluation platform and benchmarking framework for both open-source and commercial routers. It has the following key features:
+
+- **Diverse Data Coverage**: A principled, diverse evaluation dataset spanning 9 domains and 44 categories with easy, medium, and hard difficulty levels.
+- **Comprehensive Metrics**: Five router-critical metrics measuring accuracy, cost, optimality, robustness, and latency.
+- **Automated Evaluation**: An automated evaluation framework to simplify the evaluation process for open-source and commercial routers.
+- **Live Leaderboard**: A live leaderboard to track the performance of routers across multiple dimensions.
+
+*We aim for RouterArena to serve as a foundation for the community to evaluate, understand, and advance LLM routing systems.*
+
+> [!IMPORTANT]
+> **RouterArena is an evaluation-only dataset.** Submissions that train, fit, or tune any router component on RouterArena data (including the label files) will be rejected, and any accepted submission found in violation will be withdrawn.
+
+# Current Leaderboard
+
+For more details, please see our [website](https://routeworks.github.io/leaderboard) and [blog](https://huggingface.co/blog/JerryPotter/who-routes-the-routers).
+
+| Rank | Router | Affiliation | Acc-Cost Arena | Accuracy | Cost/1K Queries | Optimal Selection | Optimal Cost | Optimal Accuracy | Latency | Robustness |
+|------|--------------------|-----------------------------|--------|----------|---------|-----------------|--------------|----------------|---------|------------|
+| 🥇 | [Sqwish Router](https://www.sqwish.ai/) | | 75.27 | 76.40 | $0.18 | 7.41 | 25.10 | 90.47 | - | 100.00 |
+| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | - | - | - | - | 25.48 |
+| 🥉 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | - | - | - | - | 100.00 |
+| 4 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | - | - | - | - | 22.62 |
+| 5 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | - | - | - | - | 71.43 |
+| 6 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | - | 45.71 |
+| 7 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | - | 49.52 |
+| 8 | [vLLM-SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | - | 90.95 |
+| 9 | [MIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
+| 10 | [NIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
+| 11 | [GPT-5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | - | - | - | - | - |
+| 12 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
+| 13 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | - | - |
+| 14 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | - | - | - | - | 100.00 |
+| 15 | [RouterBench-MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
+| 16 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | - | 55.91 |
+| 17 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
+| 18 | [RouterBench-KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
+| 19 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
+| 20 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
+
+🎓 Open-source  💼 Closed-source
+
+
+
+
+
+# Evaluating Your Router
+
+## 1. Setup
+
+### Step 1.1: Install uv and RouterArena
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+cd RouterArena
+uv sync
+```
+
+### Step 1.2: Download Dataset
+Download the dataset from [HF dataset](https://huggingface.co/datasets/RouteWorks/RouterArena).
+
+```bash
+uv run python ./scripts/process_datasets/prep_datasets.py
+```
+
+### Step 1.3: Set Up API Keys (Optional)
+
+In the project root, copy `.env.example` to `.env` and fill in your API keys. This step is **required only if you use our pipeline for LLM inference**.
+
+```bash
+# Example .env file
+OPENAI_API_KEY=
+ANTHROPIC_API_KEY=
+# ...
+```
+
+See the [`ModelInference`](./llm_inference/model_inference.py) class for the complete list of supported providers and required environment variables. You can extend that class to support more models, or submit a GitHub issue to request support for new providers.
+
+## 2. Get Routing Decisions
+
+Follow the steps below to obtain your router's model choices for each query. Start with the `sub_10` split (a 10% subset) for local testing. Once your setup works, you can evaluate:
+- on the `full` dataset for full local evaluation and official leaderboard submission.
+- on the `robustness` dataset for robustness evaluation.
+
+### Step 2.1: Prepare Config File
+
+Create a config file in `./router_inference/config/.json`. An example config file is included [here](./router_inference/config/your-router.json).
+
+```json
+{
+  "pipeline_params": {
+    "router_name": "your-router",
+    "router_cls_name": "your_router_class_name",
+    "models": [
+      "gpt-4o-mini",
+      "claude-3-haiku-20240307",
+      "gemini-2.0-flash-001"
+    ]
+  }
+}
+```
+
+For each model in your config, add an entry with the pricing per million tokens in this format at [`model_cost/model_cost.json`](./model_cost/model_cost.json):
+
+```json
+{
+  "gpt-4o-mini": {
+    "input_token_price_per_million": 0.15,
+    "output_token_price_per_million": 0.6
+  }
+}
+```
+
+> [!NOTE]
+> Ensure all models in the config files above are listed in [`./universal_model_names.py`](./universal_model_names.py). If you add a new model, you must also add the API inference endpoint in [`llm_inference/model_inference.py`](./llm_inference/model_inference.py).
+
+### Step 2.2: Create Your Router Class and Generate Prediction File
+
+Create your own router class by inheriting from `BaseRouter` and implementing the `_get_prediction()` method. See [`router_inference/router/example_router.py`](./router_inference/router/example_router.py) for a complete example.
+
+Then, modify [`router_inference/router/__init__.py`](./router_inference/router/__init__.py) to include your router class:
+
+```python
+# Import your router class
+from router_inference.router.my_router import MyRouter
+
+__all__ = ["BaseRouter", "ExampleRouter", "MyRouter"]
+```
+
+Finally, generate the prediction file:
+
+```bash
+uv run python ./router_inference/generate_prediction_file.py your-router [sub_10|full|robustness]
+```
+
+> [!NOTE]
+> - The router-name argument (`your-router` in the command above) must match your config filename (without the `.json` extension). For example, if your config file is `router_inference/config/my-router.json`, use `my-router` as the argument.
+> - Your `_get_prediction()` method must return a model name that exists in your config file's `models` list. The base class will automatically validate this.
+
+### Step 2.3: Validate Config and Prediction Files
+
+```bash
+uv run python ./router_inference/check_config_prediction_files.py your-router [sub_10|full|robustness]
+```
+
+This script checks: (1) all model names are valid, (2) prediction file has correct size (809 for `sub_10`, 8400 for `full`, 420 for `robustness`), and (3) all entries have valid `global_index`, `prompt`, and `prediction` fields.
+
+## 3. Run LLM Inference
+
+Run the inference script to make API calls for each query using the selected models:
+
+```bash
+uv run python ./llm_inference/run.py your-router
+```
+
+The script loads your prediction file, makes API calls using the models specified in the `prediction` field, and saves results incrementally. It uses cached results when available and saves progress after each query, so you can safely interrupt and resume. Results are saved to `./cached_results/` for reuse across routers.
+> [!NOTE]
+> - For robustness evaluation, we only measure the model-selection flip ratio after adding noise to the original prompt, so no additional LLM inference is required for this stage.
+
+## 4. Run Router Evaluation
+
+As the last step, run the evaluation script:
+
+```bash
+uv run python ./llm_evaluation/run.py your-router [sub_10|full|robustness]
+```
+
+> [!TIP]
+> - Use `sub_10` or `full` to evaluate on those datasets.
+> - Use `robustness` to run robustness-only evaluation (expects `-robustness.json`).
+
+# Submitting to the leaderboard
+
+To get your router on the leaderboard, you can open a Pull Request with your router's prediction file to trigger our automated evaluation workflow. Details are as follows:
+
+1. **Add your files**:
+   - `router_inference/config/.json` - Your router configuration
+   - `router_inference/predictions/.json` - Your prediction file with `generated_result` fields populated
+   - `router_inference/predictions/-robustness.json` - Your prediction file for robustness evaluation, no `generated_result` fields needed
+2. **Open a Pull Request against the `main` branch and comment `/evaluate` on it**
+   - When the PR is ready for evaluation, comment `/evaluate` on the PR to trigger the evaluation workflow. See an example [here](https://github.com/RouteWorks/RouterArena/pull/71#issuecomment-3904936480).
+   - The automated workflow will:
+     - Validate your submission
+     - Run evaluation on the full dataset
+     - Post results as a comment on your PR
+     - Update the leaderboard upon approval
+
+The Figure below shows the evaluation pipeline.
+
+
+
+
+
+# Contributing
+
+We welcome and appreciate contributions and collaborations of any kind.
+
+We use pre-commit to ensure a consistent coding style. Set it up with:
+
+```bash
+pip install pre-commit
+pre-commit install
+```
+
+Before pushing your code, run the following and make sure your code passes all checks.
+
+```bash
+pre-commit run --all-files
+```
+
+# Contacts
+
+Feel free to contact us for contributions and collaborations.
+
+```
+Yifan Lu (yifan.lu@rice.edu)
+Rixin Liu (rixin.liu@rice.edu)
+Jiarong Xing (jxing@rice.edu)
+```
+
+# Citation
+If you find our project helpful, please give us a star and cite us:
+
+```bibtex
+@misc{lu2025routerarenaopenplatformcomprehensive,
+ title = {RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers},
+ author = {Yifan Lu and Rixin Liu and Jiayi Yuan and Xingqi Cui and Shenrun Zhang and Hongyi Liu and Jiarong Xing},
+ year = {2025},
+ eprint = {2510.00202},
+ archivePrefix= {arXiv},
+ primaryClass = {cs.LG},
+ url = {https://arxiv.org/abs/2510.00202}
+}
+```
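
Step 2.3 of the README above lists the checks applied to a prediction file: valid model names, the expected number of entries per split, and the presence of `global_index`, `prompt`, and `prediction` in every entry. The sketch below illustrates those checks in plain Python. It is not the project's `check_config_prediction_files.py`; the function name `validate_predictions`, the constants, and the sample entries are assumptions made for this example, while the split sizes and field names are the ones stated in the README.

```python
from typing import Any

# Expected entry counts per split, as stated in Step 2.3 of the README.
EXPECTED_SIZES = {"sub_10": 809, "full": 8400, "robustness": 420}
# Fields every prediction entry must carry, also from Step 2.3.
REQUIRED_FIELDS = ("global_index", "prompt", "prediction")


def validate_predictions(
    entries: list[dict[str, Any]],
    split: str,
    allowed_models: set[str],
) -> list[str]:
    """Return human-readable problems; an empty list means the file looks fine."""
    problems: list[str] = []

    expected = EXPECTED_SIZES[split]
    if len(entries) != expected:
        problems.append(f"expected {expected} entries for {split}, got {len(entries)}")

    for i, entry in enumerate(entries):
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            problems.append(f"entry {i}: missing fields {missing}")
            continue
        # The predicted model must be one of the models declared in the config.
        if entry["prediction"] not in allowed_models:
            problems.append(f"entry {i}: unknown model {entry['prediction']!r}")

    return problems


if __name__ == "__main__":
    # Hypothetical two-entry sample; a real sub_10 file would have 809 entries.
    models = {"gpt-4o-mini", "claude-3-haiku-20240307"}
    sample = [
        {"global_index": 0, "prompt": "2 + 2 = ?", "prediction": "gpt-4o-mini"},
        {"global_index": 1, "prompt": "Summarize this.", "prediction": "not-a-model"},
    ]
    for problem in validate_predictions(sample, "sub_10", models):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a submitter see every issue in a single pass before opening a PR.
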
diff --git a/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md b/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md
new file mode 100644
index 0000000..95055aa
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/SUBMIT_INSTRUCTIONS.md
@@ -0,0 +1,65 @@
+# RouterArena leaderboard README update: Nadir at #2
+
+PR [RouteWorks/RouterArena#112](https://github.com/RouteWorks/RouterArena/pull/112)
+(`nadir-cascade-v2`) was merged on 2026-05-31. The automated evaluation bot
+returned the official scores below, but the leaderboard table in
+`RouteWorks/RouterArena/README.md` has not yet been regenerated to include the
+entry. This directory holds a ready-to-submit README edit that inserts Nadir at
+its earned rank.
+
+## Official evaluation result (from the PR's CI bot)
+
+| Metric | Value |
+|---|---|
+| Acc-Cost Arena score | **0.7333** (73.33 on the leaderboard) |
+| Accuracy | 74.87% |
+| Avg cost / 1K queries | $0.2932 (rounded to $0.29) |
+| Robustness | 0.2548 (25.48 on the leaderboard) |
+| Queries | 8,400 (full split) |
+| Gate checks | Passed all 4 |
+
+Ranked by **Acc-Cost Arena** (the leaderboard's ranking column), 73.33 lands at
+**#2**, behind Sqwish Router (75.27) and ahead of Weave Router (72.82).
+
+> Note: the Robustness score (25.48) is low relative to the top entries
+> (100.00). The leaderboard ranks by Acc-Cost Arena, so #2 stands, but expect a
+> maintainer to ask about robustness. The PR notes a planned follow-up to
+> rebuild the robustness predictions to mirror main routing.
+
+## The change
+
+Insert one row for `Nadir Cascade` as the new 🥈, demote Weave to 🥉 and
+OrcaRouter to rank 4, and shift ranks 4-19 down to 5-20. See
+`nadir_leaderboard.patch` (a unified diff against
+`RouteWorks/RouterArena/main:README.md`) and `README.upstream.updated.md` (the
+full updated upstream README for reference).
+
+```
+| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | - | - | - | - | 25.48 |
+```
+
+## How to open the upstream PR (from a RouterArena fork)
+
+This repo's automation is scoped to `doramirdor/getnadir.dev` and cannot push
+to `RouteWorks/RouterArena`, so the upstream PR has to be opened from a fork.
+
+```bash
+# 1. Fork RouteWorks/RouterArena on GitHub (once), then:
+git clone https://github.com//RouterArena.git
+cd RouterArena
+git remote add upstream https://github.com/RouteWorks/RouterArena.git
+git fetch upstream && git checkout -b leaderboard-add-nadir upstream/main
+
+# 2. Apply the patch from this directory
+git apply /path/to/getnadir.dev/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
+# (if the upstream README moved and the patch won't apply cleanly,
+# copy the single Nadir row above into the table manually and renumber)
+
+# 3. Commit and push to your fork
+git add README.md
+git commit -m "Add Nadir Cascade to leaderboard (#2, Acc-Cost Arena 73.33)"
+git push -u origin leaderboard-add-nadir
+
+# 4. Open a PR from your fork's branch to RouteWorks/RouterArena:main
+# referencing the merged submission PR #112.
+```
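
The rank claimed in SUBMIT_INSTRUCTIONS.md follows from sorting on the Acc-Cost Arena column. A minimal sketch of that arithmetic, using scores copied from the leaderboard table; the helper name `rank_of` is hypothetical, and only the top five pre-existing entries are included for brevity:

```python
# Acc-Cost Arena scores of the top entries before the insertion,
# copied from the leaderboard table.
existing = {
    "Sqwish Router": 75.27,
    "Weave Router": 72.82,
    "OrcaRouter-Adaptive": 72.08,
    "Azure-Model-Router": 71.87,
    "R2-Router": 71.60,
}


def rank_of(name: str, score: float, board: dict[str, float]) -> int:
    """Return the 1-based rank of a new entry when sorted by score, descending."""
    merged = {**board, name: score}
    ordered = sorted(merged, key=merged.get, reverse=True)
    return ordered.index(name) + 1


if __name__ == "__main__":
    print(rank_of("Nadir Cascade", 73.33, existing))  # prints 2
```

Every entry that scores below the new one moves down by exactly one place, which is why the patch renumbers the former ranks 4 through 19.
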
diff --git a/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch b/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
new file mode 100644
index 0000000..fe288cb
--- /dev/null
+++ b/eval/routerarena/leaderboard-pr/nadir_leaderboard.patch
@@ -0,0 +1,46 @@
+--- a/README.md
++++ b/README.md
+@@ -38,24 +38,25 @@
+ | Rank | Router | Affiliation | Acc-Cost Arena | Accuracy | Cost/1K Queries | Optimal Selection | Optimal Cost | Optimal Accuracy | Latency | Robustness |
+ |------|--------------------|-----------------------------|--------|----------|---------|-----------------|--------------|----------------|---------|------------|
+ | 🥇 | [Sqwish Router](https://www.sqwish.ai/) | | 75.27 | 76.40 | $0.18 | 7.41 | 25.10 | 90.47 | - | 100.00 |
+-| 🥈 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | - | - | - | - | 100.00 |
+-| 🥉 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | - | - | - | - | 22.62 |
+-| 4 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | - | - | - | - | 71.43 |
+-| 5 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | - | 45.71 |
+-| 6 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | - | 49.52 |
+-| 7 | [vLLM-SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | - | 90.95 |
+-| 8 | [MIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
+-| 9 | [NIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
+-| 10 | [GPT-5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | - | - | - | - | - |
+-| 11 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
+-| 12 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | - | - |
+-| 13 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | - | - | - | - | 100.00 |
+-| 14 | [RouterBench-MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
+-| 15 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | - | 55.91 |
+-| 16 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
+-| 17 | [RouterBench-KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
+-| 18 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
+-| 19 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
++| 🥈 | [Nadir Cascade](https://getnadir.com) | | 73.33 | 74.87 | $0.29 | - | - | - | - | 25.48 |
++| 🥉 | [Weave Router](https://workweave.ai) | 🎓 Weave | 72.82 | 76.32 | $0.94 | - | - | - | - | 100.00 |
++| 4 | [OrcaRouter-Adaptive](https://www.orcarouter.ai/) | | 72.08 | 75.54 | $1.00 | - | - | - | - | 22.62 |
++| 5 | [Azure-Model-Router](https://ai.azure.com/catalog/models/model-router) [[Web]](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/model-router) | 💼 Microsoft | 71.87 | 72.82 | $0.22 | - | - | - | - | 71.43 |
++| 6 | [R2-Router](https://arxiv.org/abs/2602.02823/) | 🎓 UCF | 71.60 | 71.23 | $0.06 | 24.51 | 48.70 | 99.85 | - | 45.71 |
++| 7 | [Auto Router]() | | 70.05 | 70.17 | $0.12 | 37.58 | 40.02 | 86.04 | - | 49.52 |
++| 8 | [vLLM-SR](https://vllm-semantic-router.com/) [[Code]](https://github.com/vllm-project/semantic-router) [[HF]](https://huggingface.co/llm-semantic-router) | 🎓 vLLM SR Team | 67.23 | 66.53 | $0.06 | 84.66 | 90.71 | 89.24 | - | 90.95 |
++| 9 | [MIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.89 | 66.88 | $0.15 | 3.44 | 19.62 | 78.18 | 27.03 | 61.19 |
++| 10 | [NIRT-BERT](https://arxiv.org/pdf/2506.01048) [[Code]](https://github.com/Mercidaiha/IRT-Router) | 🎓 USTC | 66.12 | 66.34 | $0.21 | 3.83 | 14.04 | 77.88 | 10.42 | 49.29 |
++| 11 | [GPT-5](https://openai.com/index/introducing-gpt-5/) | 💼 OpenAI | 64.32 | 73.96 | $10.02 | - | - | - | - | - |
++| 12 | [CARROT](https://arxiv.org/abs/2502.03261) [[Code]](https://github.com/somerstep/CARROT) [[HF]](https://huggingface.co/CARROT-LLM-Routing) | 🎓 UMich | 63.87 | 67.21 | $2.06 | 2.68 | 6.77 | 78.63 | 1.50 | 89.05 |
++| 13 | [Chayan](https://huggingface.co/adaptive-classifier/chayan) [[HF]](https://huggingface.co/adaptive-classifier/chayan) | 🎓 Adaptive Classifier | 63.83 | 64.89 | $0.56 | 43.03 | 43.75 | 88.74 | - | - |
++| 14 | [AgentForge Router]() | | 60.12 | 59.16 | $0.09 | - | - | - | - | 100.00 |
++| 15 | [RouterBench-MLP](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 57.56 | 61.62 | $4.83 | 13.39 | 24.45 | 83.32 | 90.91 | 80.00 |
++| 16 | [NotDiamond](https://www.notdiamond.ai/) | 💼 NotDiamond | 57.29 | 60.83 | $4.10 | 1.55 | 2.14 | 76.81 | - | 55.91 |
++| 17 | [GraphRouter](https://arxiv.org/abs/2410.03834) [[Code]](https://github.com/ulab-uiuc/GraphRouter) | 🎓 UIUC | 57.22 | 57.00 | $0.34 | 4.73 | 38.33 | 74.25 | 2.70 | 94.29 |
++| 18 | [RouterBench-KNN](https://arxiv.org/pdf/2403.12031) [[Code]](https://github.com/withmartian/routerbench) [[HF]](https://huggingface.co/datasets/withmartian/routerbench) | 🎓 Martian | 55.48 | 58.69 | $4.27 | 13.09 | 25.49 | 78.77 | 1.33 | 83.33 |
++| 19 | [RouteLLM](https://arxiv.org/abs/2406.18665) [[Code]](https://github.com/lm-sys/RouteLLM) [[HF]](https://huggingface.co/routellm) | 🎓 Berkeley | 48.07 | 47.04 | $0.27 | 99.72 | 99.63 | 68.76 | 0.40 | 100.00 |
++| 20 | [RouterDC](https://arxiv.org/abs/2409.19886) [[Code]](https://github.com/shuhao02/RouterDC) | 🎓 SUSTech | 33.75 | 32.01 | $0.07 | 39.84 | 73.00 | 49.05 | 10.75 | 85.24 |
+
+ 🎓 Open-source  💼 Closed-source
+
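
After applying a table patch like the one above, it can be worth confirming that the rows are still ordered by the ranking column. The following is a rough sketch under stated assumptions: the helper names `arena_scores` and `is_sorted_descending` are hypothetical, the embedded table is a shortened four-column stand-in for the real leaderboard, and the Acc-Cost Arena score is assumed to be the fourth cell of each row, as in the README.

```python
# Shortened stand-in for the leaderboard table (first four columns only).
TABLE = """
| Rank | Router | Affiliation | Acc-Cost Arena |
|------|--------|-------------|----------------|
| 1 | Sqwish Router | | 75.27 |
| 2 | Nadir Cascade | | 73.33 |
| 3 | Weave Router | Weave | 72.82 |
| 4 | OrcaRouter-Adaptive | | 72.08 |
"""


def arena_scores(table: str) -> list[float]:
    """Extract the fourth cell of every data row as a float."""
    scores: list[float] = []
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) < 4:
            continue  # blank lines and non-table text
        try:
            scores.append(float(cells[3]))
        except ValueError:
            continue  # header and separator rows are not numeric
    return scores


def is_sorted_descending(values: list[float]) -> bool:
    """True when no value is larger than the one before it."""
    return all(a >= b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    scores = arena_scores(TABLE)
    print(scores, is_sorted_descending(scores))
```

The parser splits on `|` only, so it would need adjusting for tables whose cells contain escaped pipe characters.
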