RouterArena leaderboard: place Nadir at #2 #10
Draft
doramirdor wants to merge 1 commit into
PR RouteWorks/RouterArena#112 (nadir-cascade-v2) was merged on 2026-05-31 with an official Acc-Cost Arena score of 0.7333 (73.33 on the leaderboard's 0-100 scale; 74.87% accuracy, $0.2932/1K), which ranks #2 behind Sqwish (75.27). Stage a ready-to-submit upstream README edit, patch, and submission instructions, since the live leaderboard table has not yet been regenerated to include the entry. https://claude.ai/code/session_013z3xX1X21qcdFZPU5q3kS8
What
RouteWorks/RouterArena#112 (nadir-cascade-v2) was merged on 2026-05-31, but the leaderboard table in RouterArena's README hasn't been regenerated to include the entry yet. This PR stages a ready-to-submit README edit that inserts Nadir at its earned rank.
Official evaluation result (RouterArena CI bot)
Ranked by Acc-Cost Arena (the leaderboard's ranking column), Nadir's 73.33 lands at #2, behind Sqwish (75.27) and ahead of Weave (72.82).
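A minimal Python sketch of that ranking rule, assuming the leaderboard simply sorts entries by Acc-Cost Arena score in descending order. It uses only the three scores quoted in this PR; the `Entry` type and the `rank_by_acc_cost` / `rank_of` helpers are hypothetical and are not part of RouterArena.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    acc_cost_arena: float  # Acc-Cost Arena score on the leaderboard's 0-100 scale


def rank_by_acc_cost(entries: list[Entry]) -> list[Entry]:
    """Return entries ordered best-first by Acc-Cost Arena score (assumed sort rule)."""
    return sorted(entries, key=lambda e: e.acc_cost_arena, reverse=True)


def rank_of(entries: list[Entry], name: str) -> int:
    """1-based rank of the named entry after sorting by Acc-Cost Arena."""
    ordered = rank_by_acc_cost(entries)
    return next(i for i, e in enumerate(ordered, start=1) if e.name == name)


# The CI bot reports 0.7333; the board compares it against 75.27 and 72.82,
# i.e. on a 0-100 scale, so scale it by 100 before ranking.
nadir_score = round(0.7333 * 100, 2)

entries = [
    Entry("Sqwish", 75.27),
    Entry("Weave", 72.82),
    Entry("Nadir", nadir_score),
]
print(rank_of(entries, "Nadir"))  # 2
```

Accuracy, cost, and Robustness are deliberately not inputs here, which mirrors a board ranked on the single Acc-Cost Arena column.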
Files (under eval/routerarena/leaderboard-pr/)
- nadir_leaderboard.patch: unified diff against RouteWorks/RouterArena:main/README.md that inserts Nadir as 🥈 and shifts every rank below it down by one.
- README.upstream.updated.md: the full updated upstream README, for reference.
- SUBMIT_INSTRUCTIONS.md: steps to open the PR from a RouterArena fork.
This session's GitHub access is scoped to doramirdor/getnadir.dev, so it can't push to or open a PR against RouteWorks/RouterArena. The upstream PR must be opened from a fork; SUBMIT_INSTRUCTIONS.md has the exact commands.
Caveat
The Robustness score (25.48) is low vs the top entries (100.00). The board ranks by Acc-Cost Arena so #2 stands, but a maintainer may ask about it; PR #112 notes a planned follow-up to rebuild the robustness predictions.
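The table edit that nadir_leaderboard.patch describes (insert one row, push the rows below it down a rank) can be sketched in Python as below. This is illustrative only: `Row`, `insert_and_renumber`, `rank_label`, and the medal mapping for ranks 1 and 3 are assumptions rather than RouterArena's README tooling, and the board holds only the entries named in this PR. Robustness is not an input, consistent with a board ranked by Acc-Cost Arena alone.

```python
import bisect
from dataclasses import dataclass, replace

# Assumption: ranks 1-3 are rendered as medal emoji, lower ranks as plain numbers.
MEDALS = {1: "🥇", 2: "🥈", 3: "🥉"}


@dataclass(frozen=True)
class Row:
    name: str
    score: float  # Acc-Cost Arena score on the 0-100 scale
    rank: int = 0


def rank_label(rank: int) -> str:
    """Label shown in the hypothetical rank column."""
    return MEDALS.get(rank, str(rank))


def insert_and_renumber(rows: list[Row], new: Row) -> list[Row]:
    """Insert `new` into `rows` (sorted best-first) and renumber all ranks.

    Rows scoring above the new entry keep their rank; rows below it shift
    down by one. On a tie the existing row stays ahead of the new one.
    """
    # bisect needs ascending keys, so search over the negated scores.
    keys = [-row.score for row in rows]
    pos = bisect.bisect_right(keys, -new.score)
    merged = rows[:pos] + [new] + rows[pos:]
    # Build fresh rows instead of mutating the caller's list.
    return [replace(row, rank=i) for i, row in enumerate(merged, start=1)]


board = [Row("Sqwish", 75.27, 1), Row("Weave", 72.82, 2)]
board = insert_and_renumber(board, Row("Nadir", 73.33))
table = [(rank_label(row.rank), row.name) for row in board]
# table == [("🥇", "Sqwish"), ("🥈", "Nadir"), ("🥉", "Weave")]
```

Using `bisect_right` on negated scores keeps the insert a single positional lookup on an already-sorted table, which matches how a patch adds one row without reordering anything above it.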
https://claude.ai/code/session_013z3xX1X21qcdFZPU5q3kS8
Generated by Claude Code