Refresh Azure-Model-Router metrics from PR #98 re-evaluation #107

Merged
yl231 merged 1 commit into main from yl231-refresh-azure-metrics
May 21, 2026
yl231 commented May 21, 2026

Contributor

The Azure submission was re-evaluated on May 20 after the CI disk-space
fix (#99) cleared the OOM. Updating the leaderboard row to the new
RouterArena metrics:

  Acc-Cost Arena 66.66 → 71.87
  Accuracy       68.09 → 72.82
  Cost/1K        $0.54 → $0.22
  Robustness     54.07 → 71.43

The new Azure prediction file has no for_optimality entries, so the
three optimality columns are dashed (same convention as GPT-5). The
previous 22.52/46.32/81.96 values were computed from an earlier Azure
submission that did include those entries and don't apply to the
current file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
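
As a rough illustration of the update described above, the sketch below recomputes the old-to-new deltas and applies the dashed-column convention when a submission has no optimality scores. The metric values are copied from the description. The function names, the `"-"` placeholder character, and the idea of passing optimality scores as an optional tuple are assumptions made for the example, not the repository's actual code.

```python
from decimal import Decimal
from typing import Optional, Sequence

# (old, new) values copied from the PR description.
METRICS = {
    "Acc-Cost Arena": (Decimal("66.66"), Decimal("71.87")),
    "Accuracy": (Decimal("68.09"), Decimal("72.82")),
    "Cost/1K": (Decimal("0.54"), Decimal("0.22")),
    "Robustness": (Decimal("54.07"), Decimal("71.43")),
}


def deltas(metrics: dict) -> dict:
    """Return new minus old for each metric, using exact decimal arithmetic."""
    return {name: new - old for name, (old, new) in metrics.items()}


def optimality_cells(
    scores: Optional[Sequence[Decimal]],
    n_columns: int = 3,
    placeholder: str = "-",  # assumed dash character; the real table may differ
) -> list:
    """Format the optimality columns (hypothetical helper).

    When the prediction file has no for_optimality entries, `scores` is None
    and every optimality column is rendered as a dash.
    """
    if scores is None:
        return [placeholder] * n_columns
    return [str(s) for s in scores]


if __name__ == "__main__":
    for name, diff in deltas(METRICS).items():
        print(f"{name}: {diff:+}")
    # Current Azure file: no for_optimality entries, so the columns are dashed.
    print(optimality_cells(None))
```

Using `Decimal` built from strings keeps the differences exact to two places, which avoids floating-point noise when the numbers are compared against the leaderboard row.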
yl231 merged commit 9822cf9 into main May 21, 2026
10 checks passed
yl231 deleted the yl231-refresh-azure-metrics branch May 21, 2026 20:46