
Add VL-RewardBench #703

Merged: 15 commits into open-compass:main on Jan 1, 2025
Conversation

TobiasLee (Contributor)

Hi there,

Thanks for your awesome project, which helps a lot with LMM evaluation & development!

This PR incorporates our recently released VL-RewardBench.
Example script:

python run.py --data VL-RewardBench --model GPT4o 
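
For completeness, a hedged example (not part of the PR): assuming GPT4o_MINI is the model key corresponding to the GPT4o-mini results below, the mini variant can be evaluated with the same invocation:

python run.py --data VL-RewardBench --model GPT4o_MINI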

Saved results for GPT4o-mini:

"hallucination","reasoning","general","Macro Accuracy","Overall Consistency"
"0.4552736982643525","0.6477987421383647","0.4371584699453552","0.5134103034493575","0.5016"

and GPT4O:

"hallucination","reasoning","general","Macro Accuracy","Overall Consistency"
"0.7076101468624834","0.6509433962264151","0.4918032786885246","0.616785607259141","0.6616"

The results are consistent with those we reported, with small variance.

kennymckormick merged commit 276d90a into open-compass:main on Jan 1, 2025
1 check passed
kennymckormick (Member)

Evaluation Results of GPT4o-20241120:

hallucination: 0.753004
reasoning: 0.676101
general: 0.535519
Macro Accuracy: 0.654875
Overall Consistency: 0.7016