
Add VL-RewardBench #703

Merged: 15 commits into open-compass:main on Jan 1, 2025
Conversation

TobiasLee (Contributor)

Hi there,

Thanks for your awesome project, which helps a lot with LMM evaluation & development!

This PR incorporates our recently released VL-RewardBench.
Example script:

python run.py --data VL-RewardBench --model GPT4o 
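
For completeness, a hedged example (not part of the PR): assuming GPT4o_MINI is the model key corresponding to the GPT4o-mini results below, the mini variant can be evaluated with the same invocation:

python run.py --data VL-RewardBench --model GPT4o_MINI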

Saved results for GPT4o-mini:

"hallucination","reasoning","general","Macro Accuracy","Overall Consistency"
"0.4552736982643525","0.6477987421383647","0.4371584699453552","0.5134103034493575","0.5016"

and GPT4O:

"hallucination","reasoning","general","Macro Accuracy","Overall Consistency"
"0.7076101468624834","0.6509433962264151","0.4918032786885246","0.616785607259141","0.6616"

The results are consistent with those we reported, with small variance.

kennymckormick merged commit 276d90a into open-compass:main on Jan 1, 2025
1 check passed
kennymckormick (Member)

Evaluation Results of GPT4o-20241120:

hallucination: 0.753004
reasoning: 0.676101
general: 0.535519
Macro Accuracy: 0.654875
Overall Consistency: 0.7016