Commit 1ec8886

Merge branch 'main' into openhands/fix-issue-81
2 parents 6cbf718 + 00bdb7f commit 1ec8886

54 files changed (+52872, -38 lines)

evaluation/qwen_eval/README.md

Lines changed: 17 additions & 3 deletions
@@ -9,6 +9,20 @@ pip install -r requirements.txt
 
 
 ## Usage
+### Generation & Evaluation
+
+To run the evaluation:
+
+1. **Configure the prompt type**: Add your custom prompt type in `utils.py` and specify it in `sh/run_evaluate.sh`
+
+2. **Set the model path**: Update the `MODEL_NAME_OR_PATH` variable in `sh/run_evaluate.sh` with your model's path
+
+3. **Run evaluation**: Execute the following command to generate predictions and evaluate results:
+```bash
+bash sh/run_evaluate.sh
+```
+
+### Leaderboard
 ```bash
 BASE_DIR=./eval_example/
 
@@ -17,11 +31,11 @@ python evaluate_final.py --eval_path $BASE_DIR
 BASE_DIR is the directory that contains different datasets folders. The structure of the directory is as follows:
 ```bash
 BASE_DIR
-├── math500
+├── dataset1
 │ ├── example.jsonl
-├── minerva_math
+├── dataset2
 │ ├── example.jsonl
-├── olympiadbench
+├── dataset3
 │ ├── example.jsonl
 ```
 
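As a rough illustration of the `BASE_DIR` layout described in the README hunk above, the sketch below lists each dataset folder and the `.jsonl` files inside it, then flags folders that contain none. The helper names `find_dataset_files` and `missing_datasets` are hypothetical and not part of the repository. Whether `evaluate_final.py` requires exactly this layout is an assumption drawn from the README text, not from the script itself.

```python
import tempfile
from pathlib import Path


def find_dataset_files(base_dir):
    """Map each dataset folder under base_dir to its sorted .jsonl file names.

    Hypothetical helper: it only inspects the directory tree and does not
    call or depend on evaluate_final.py.
    """
    base = Path(base_dir)
    layout = {}
    for folder in sorted(p for p in base.iterdir() if p.is_dir()):
        layout[folder.name] = sorted(f.name for f in folder.glob("*.jsonl"))
    return layout


def missing_datasets(layout):
    """Return the names of dataset folders that contain no .jsonl file."""
    return [name for name, files in layout.items() if not files]


if __name__ == "__main__":
    # Build a throwaway tree shaped like the README example, then inspect it.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("dataset1", "dataset2", "dataset3"):
            folder = Path(tmp) / name
            folder.mkdir()
            (folder / "example.jsonl").write_text("{}\n")
        layout = find_dataset_files(tmp)
        print(layout)
        print("folders without .jsonl files:", missing_datasets(layout))
```

Running a check like this on a real `BASE_DIR` before invoking `python evaluate_final.py --eval_path $BASE_DIR` may help catch an empty or misnamed dataset folder early.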

7 binary files not shown (8.8 KB, 6.73 KB, 5.95 KB, 12.7 KB, 6.56 KB, 5.24 KB, 9.41 KB).

evaluation/qwen_eval/data/aime24/test.jsonl

Lines changed: 30 additions & 0 deletions
Large diffs are not rendered by default.

evaluation/qwen_eval/data/amc23/test.jsonl

Lines changed: 40 additions & 0 deletions
Large diffs are not rendered by default.
