Qwen2.5-Math-7B base model (without fine-tuning) does not reproduce the scores reported in the paper #5

chen-t-r · 2025-03-12T11:50:32Z

Scores reported in the paper for the Qwen2.5-Math-7B base model:

| math | minerva_math | gsm8k | olympiadbench | amc23 | aime24 | theoremqa | avg |
|---|---|---|---|---|---|---|---|
| 55.4 | 13.6 | 91.6 | 16.1 | 40.0 | 10.0 | None | 37.8 |

Scores I actually obtained when evaluating the Qwen2.5-Math-7B base model:

| math | minerva_math | gsm8k | olympiadbench | amc23 | aime24 | theoremqa | avg |
|---|---|---|---|---|---|---|---|
| 53.0 | 14.0 | 59.0 | 16.1 | 45.0 | 13.3 | 28.7 | 32.7 |

The largest discrepancy is on gsm8k: 91.6 in the paper vs. 59.0 in my run.
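Note that the two "avg" values are not directly comparable, because the paper reports no theoremqa score. The sketch below recomputes per-benchmark differences and averages from the numbers above. It assumes (inferred, not stated in the paper) that "avg" is the unweighted mean over the benchmarks that have a score; under that assumption it reproduces both reported averages (37.8 and 32.7).

```python
# Scores copied from the two tables above; theoremqa is not reported in the paper.
BENCHMARKS = ["math", "minerva_math", "gsm8k", "olympiadbench", "amc23", "aime24", "theoremqa"]
paper = dict(zip(BENCHMARKS, [55.4, 13.6, 91.6, 16.1, 40.0, 10.0, None]))
mine = dict(zip(BENCHMARKS, [53.0, 14.0, 59.0, 16.1, 45.0, 13.3, 28.7]))


def mean(values):
    """Unweighted mean; assumed (not confirmed) to be how the 'avg' column is computed."""
    return sum(values) / len(values)


# Compare only the benchmarks where both rows report a score.
shared = [b for b in BENCHMARKS if paper[b] is not None and mine[b] is not None]
deltas = {b: round(mine[b] - paper[b], 1) for b in shared}

paper_avg = mean([paper[b] for b in shared])        # 6 benchmarks
mine_avg_shared = mean([mine[b] for b in shared])   # same 6 benchmarks
mine_avg_all = mean([mine[b] for b in BENCHMARKS])  # all 7, incl. theoremqa

for b, d in deltas.items():
    print(f"{b:>14}: {d:+.1f}")
print(f"paper avg (6 benchmarks):        {paper_avg:.1f}")
print(f"my avg (same 6 benchmarks):      {mine_avg_shared:.1f}")
print(f"my avg (all 7, incl. theoremqa): {mine_avg_all:.1f}")
```

On the six shared benchmarks the gap in averages is about 4.4 points (37.8 vs. 33.4). The gsm8k difference alone (-32.6) accounts for more than all of it; the other five benchmarks net out slightly in favour of my run.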
