RouterArena leaderboard: place Nadir at #2 (#10)

Draft
doramirdor wants to merge 1 commit into main from claude/readme-routerarena-update-2HcNn

@doramirdor
Owner

What

RouteWorks/RouterArena#112 (nadir-cascade-v2) was merged on 2026-05-31, but the leaderboard table in RouterArena's README hasn't been regenerated to include the entry yet. This stages a ready-to-submit README edit that inserts Nadir at its earned rank.

Official evaluation result (RouterArena CI bot)

| Metric | Value |
| --- | --- |
| Acc-Cost Arena score | 0.7333 (→ 73.33) |
| Accuracy | 74.87% |
| Avg cost / 1K queries | $0.2932 |
| Robustness | 0.2548 (→ 25.48) |
| Queries | 8,400 (full split) |

Ranked by Acc-Cost Arena (the leaderboard's ranking column), 73.33 lands at #2, behind Sqwish (75.27) and ahead of Weave (72.82).
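
As a quick sanity check on the placement, here is a minimal sketch of the arithmetic, assuming the board shows raw 0-1 scores scaled to 0-100 and ranks higher scores first. The function names are hypothetical, and only the two neighbor scores quoted in this PR are used; the full upstream table is not reproduced here.

```python
def to_board_score(raw: float) -> float:
    """Scale a raw 0-1 score to the 0-100 value shown on the board (assumed convention)."""
    return round(raw * 100, 2)


def rank_of(new_score: float, existing_scores: list[float]) -> int:
    """Return the 1-based rank of new_score, where a higher score ranks first.

    Assumption: ties share the better rank (only strictly higher scores count).
    """
    return 1 + sum(1 for s in existing_scores if s > new_score)


# Only the two neighbors named in this PR: Sqwish (75.27) and Weave (72.82).
neighbors = [75.27, 72.82]
nadir = to_board_score(0.7333)
print(nadir, rank_of(nadir, neighbors))
```

With only these two neighbors the check cannot confirm that no other entry sits above 73.33; that still relies on the upstream table.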

Files (under eval/routerarena/leaderboard-pr/)

  • nadir_leaderboard.patch — unified diff against RouteWorks/RouterArena:main/README.md inserting Nadir as 🥈 and shifting ranks below it down by one.
  • README.upstream.updated.md — full updated upstream README for reference.
  • SUBMIT_INSTRUCTIONS.md — steps to open the PR from a RouterArena fork.
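
To illustrate what the patch does (insert one row and shift every rank below it down by one), here is a rough sketch using Python's difflib. The three-column row layout, the helper names, and the a/README.md and b/README.md labels are assumptions for illustration only; the real upstream table has its own columns and medal markers, and the actual diff is the one in nadir_leaderboard.patch.

```python
import difflib

Row = tuple[int, str, float]  # (rank, router name, Acc-Cost Arena score)


def insert_at_rank(rows: list[Row], name: str, score: float, rank: int) -> list[Row]:
    """Insert a new entry at `rank` and bump the rank of every entry at or below it."""
    head = rows[: rank - 1]
    tail = [(r + 1, n, s) for r, n, s in rows[rank - 1:]]
    return head + [(rank, name, score)] + tail


def render(rows: list[Row]) -> list[str]:
    """Render rows as table lines (hypothetical simplified layout)."""
    return [f"| {r} | {n} | {s:.2f} |\n" for r, n, s in rows]


# Only the two neighbors quoted in this PR; Weave is assumed to hold #2 before the change.
before = [(1, "Sqwish", 75.27), (2, "Weave", 72.82)]
after = insert_at_rank(before, "Nadir", 73.33, 2)

patch = "".join(
    difflib.unified_diff(
        render(before), render(after), fromfile="a/README.md", tofile="b/README.md"
    )
)
print(patch)
```

The printed diff removes the old rank-2 row and adds two rows in its place, which is the same shape of change the staged patch makes against the upstream README.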

⚠️ Cannot open the upstream PR from here

This session's GitHub access is scoped to doramirdor/getnadir.dev, so it can't push to or open a PR against RouteWorks/RouterArena. The upstream PR must be opened from a fork — SUBMIT_INSTRUCTIONS.md has the exact commands.

Caveat

The Robustness score (25.48) is low compared with the top entries (100.00). The board ranks by Acc-Cost Arena, so the #2 placement stands, but a maintainer may ask about the gap; PR #112 notes a planned follow-up to rebuild the robustness predictions.

https://claude.ai/code/session_013z3xX1X21qcdFZPU5q3kS8


Generated by Claude Code

PR RouteWorks/RouterArena#112 (nadir-cascade-v2) merged 2026-05-31 with
official Acc-Cost Arena score 0.7333 (74.87% accuracy, $0.2932/1K), which
ranks #2 behind Sqwish (75.27). Stage a ready-to-submit upstream README
edit, patch, and submission instructions, since the live leaderboard table
has not yet been regenerated to include the entry.

https://claude.ai/code/session_013z3xX1X21qcdFZPU5q3kS8
@netlify

netlify Bot commented May 31, 2026

👷 Deploy Preview for nadir-landing processing.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b77f925 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nadir-landing/deploys/6a1cc401b7690c0008f12ee0 |

@netlify

netlify Bot commented May 31, 2026

Deploy Preview for nadir-landing canceled.

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b77f925 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/nadir-landing/deploys/6a1cc401b7690c0008f12ee0 |
