[DON'T MERGE] Update the eval procedure & Test the Evaluation Workflow #41

Closed
yl231 wants to merge 7 commits into main from eval-pipe-update

Conversation

@yl231

yl231 commented Nov 27, 2025

Contributor

Summary

Updates the evaluation pipeline for public dataset support and adds multi-threaded evaluation.

Key changes:

  • Multi-threaded evaluation with configurable workers (--num-workers)
  • Migrated to public RouteWorks/RouterArena dataset (removed HF_TOKEN)
  • Fixed column name handling for dataset compatibility
  • Added glm-4-air-router submission
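As a rough illustration of the first bullet, here is a minimal sketch of how a worker count taken from `--num-workers` might drive a thread pool. Only the flag name comes from this PR; `evaluate_query`, `evaluate_all`, `parse_args`, the default of 4 workers, and the record fields are hypothetical stand-ins, not the repository's actual code.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor


def evaluate_query(query: dict) -> dict:
    """Hypothetical per-query scorer: exact match against the reference answer."""
    correct = query["prediction"].strip() == query["answer"].strip()
    return {"id": query["id"], "correct": correct}


def evaluate_all(queries: list[dict], num_workers: int = 4) -> list[dict]:
    """Score queries on a thread pool; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(evaluate_query, queries))


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the worker count; the default of 4 is an assumption."""
    parser = argparse.ArgumentParser(description="Evaluation sketch")
    parser.add_argument("--num-workers", type=int, default=4,
                        help="number of worker threads")
    return parser.parse_args(argv)
```

Calling `evaluate_all(queries, parse_args().num_workers)` would then score the whole list. Threads suit this kind of workload when each query is dominated by I/O such as API calls; CPU-bound scoring would likely need processes instead.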

Pre-commit checks pass. Ready for evaluation workflow.

@github-actions

📊 Router Evaluation Results

Router: glm-4-air-router
Dataset Split: full

Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.5617 |
| Accuracy | 54.65% |
| Total Cost | $0.396728 |
| Avg Cost per Query | $0.000047 |
| Avg Cost per 1K Queries | $0.0472 |
| Number of Queries | 8400 |
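
The two average-cost rows follow from the total cost and the query count. A small sketch of that arithmetic, assuming the workflow simply divides and rounds (the variable names and rounding precision are illustrative, not taken from the workflow's code):

```python
total_cost = 0.396728   # Total Cost reported above, in USD
num_queries = 8400      # Number of Queries reported above

avg_cost_per_query = total_cost / num_queries
avg_cost_per_1k = avg_cost_per_query * 1000

print(f"Avg Cost per Query: ${avg_cost_per_query:.6f}")
print(f"Avg Cost per 1K Queries: ${avg_cost_per_1k:.4f}")
```

Formatted to six and four decimal places respectively, these reproduce the reported $0.000047 and $0.0472.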

Evaluation completed by RouterArena automated workflow

@yl231

yl231 commented Nov 28, 2025

Contributor Author

The test passed, so I will close this test PR.

yl231 closed this Nov 28, 2025
@github-actions

📊 Router Evaluation Results

Router: glm-4-air-router
Dataset Split: full

Metrics

| Metric | Value |
| --- | --- |
| RouterArena Score | 0.5617 |
| Accuracy | 54.65% |
| Total Cost | $0.396728 |
| Avg Cost per Query | $0.000047 |
| Avg Cost per 1K Queries | $0.0472 |
| Number of Queries | 8400 |

Evaluation completed by RouterArena automated workflow

jiarong0907 deleted the eval-pipe-update branch November 28, 2025 21:31