GXYM/VCapsBench

VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation

Figure: Visualization of evaluation metrics across different models (RadarChartPlot.py)

VCapsBench is a comprehensive benchmark for evaluating the quality of video captions generated by vision-language models. This repository provides:

  • 🎥 A large-scale dataset with diverse video content
  • ⚖️ Fine-grained evaluation results for multiple models
  • 📊 Visualization tools for analyzing caption quality
  • 🤖 Scripts for generating and evaluating captions

📂 Dataset Download

Main Dataset

Access the VCapsBench dataset on Hugging Face:
HF Dataset

Raw Data Files

| File | Description | Link |
|------|-------------|------|
| VCapsbench_Caption_ALL.csv.zip | Raw caption dataset | Download |
| gemini_eval_results.zip | Evaluation results (Gemini-2.5-Pro) | Download |
| gpt_eval_results.zip | Evaluation results (GPT-4.1) | Download |
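
Before running an evaluation it can help to check which caption columns the raw CSV actually contains. The following is a minimal sketch using only the Python standard library; it assumes the archive holds a single UTF-8 CSV with a header row, which this README does not confirm. The function name `read_caption_columns` is hypothetical and is not part of this repository.

```python
import csv
import io
import zipfile


def read_caption_columns(zip_path, csv_name=None):
    """Return the header row of a CSV stored inside a zip archive.

    Assumption: the archive contains a UTF-8 CSV whose first row is the header.
    """
    with zipfile.ZipFile(zip_path) as archive:
        if csv_name is None:
            # Fall back to the first CSV member found in the archive.
            csv_members = [
                name for name in archive.namelist()
                if name.lower().endswith(".csv")
            ]
            if not csv_members:
                raise ValueError(f"no CSV file found in {zip_path}")
            csv_name = csv_members[0]
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # Only the header is read, so large files are not loaded fully.
            return next(csv.reader(text), [])
```

Calling `read_caption_columns("VCapsbench_Caption_ALL.csv.zip")` on the downloaded archive should list the column names that can be evaluated as caption columns, provided the layout assumption holds.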

🛠️ Scripts

Video Caption Generation (VLMs)

Supported models:

  • Qwen2.5-VL-72B
  • Qwen2.5-VL-7B
  • Qwen2VL-7B
  • InternVL2.5-8B
  • NVILA-8B
  • LLaVA-Video-7B
  • VideoLLaMA3-7B

Evaluation Scripts

#!/bin/bash
# eval.sh - Batch evaluate multiple caption outputs

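# Clear proxy settings inherited from the environment
unset http_proxy
unset https_proxy

# Configuration
input_file="VCapsBench_Caption_ALL.csv"   # caption CSV, one column per captioning model
dataset_path="VCapsBench_100KQA.jsonl"    # QA pairs used to score the captions
max_workers=128                           # passed to --max_workers
llm="gemini"                              # evaluator LLM: "gemini" or "gpt4o"
output_dir="eval_results-gemini-2.5"

# Columns of input_file to evaluate, one run per column
caption_cols=(
    "gpt4o_cap"
    "Qwen2.5-VL-72B"
    "gemini2.5_pro-05-06"
    "gemini2.5_pre_flash"
)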

# Run evaluations
for caption_col in "${caption_cols[@]}"; do
    python3 LLM4eval-m.py \
        --input_file "$input_file" \
        --dataset_path "$dataset_path" \
        --output_dir "$output_dir" \
        --caption_col "$caption_col" \
        --llm "$llm" \
        --max_workers "$max_workers"
done
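
For environments where a Python driver is more convenient than a shell loop, the same batch can be sketched as below. This is an illustrative equivalent, not a script shipped with the repository: `build_command` and `run_all` are hypothetical names, and only the flags shown in eval.sh above are used. Unlike the shell loop, `subprocess.run(..., check=True)` stops at the first failing run.

```python
import os
import subprocess
import sys

# Values copied from eval.sh above.
CONFIG = {
    "input_file": "VCapsBench_Caption_ALL.csv",
    "dataset_path": "VCapsBench_100KQA.jsonl",
    "output_dir": "eval_results-gemini-2.5",
    "llm": "gemini",
    "max_workers": 128,
}
CAPTION_COLS = [
    "gpt4o_cap",
    "Qwen2.5-VL-72B",
    "gemini2.5_pro-05-06",
    "gemini2.5_pre_flash",
]


def build_command(caption_col, config=CONFIG, python=sys.executable):
    """Build the LLM4eval-m.py argument list for one caption column."""
    return [
        python, "LLM4eval-m.py",
        "--input_file", config["input_file"],
        "--dataset_path", config["dataset_path"],
        "--output_dir", config["output_dir"],
        "--caption_col", caption_col,
        "--llm", config["llm"],
        "--max_workers", str(config["max_workers"]),
    ]


def run_all(caption_cols=CAPTION_COLS, dry_run=True):
    """Run (or, with dry_run=True, only print) one evaluation per column."""
    # Mirror the `unset http_proxy` / `unset https_proxy` lines of eval.sh.
    env = {k: v for k, v in os.environ.items()
           if k not in ("http_proxy", "https_proxy")}
    commands = [build_command(col) for col in caption_cols]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True, env=env)
    return commands
```

Calling `run_all()` prints the four commands without executing them; `run_all(dry_run=False)` launches the evaluations sequentially.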

📈 Visualization Tools

| Script | Description |
|--------|-------------|
| RadarChartPlot.py | Compare model performance across metrics |
| WordLength.py | Analyze caption length distribution |
| wordlength_IR_CR_plot.py | Relationship between length and quality |
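
To illustrate what a caption length distribution looks like in practice, here is a small standard-library sketch. It is not the code of WordLength.py: the helper names are hypothetical, and counting words by whitespace splitting and grouping them into fixed-width bins are assumptions made for the example.

```python
from collections import Counter


def caption_word_lengths(captions):
    """Count whitespace-separated words in each caption."""
    return [len(caption.split()) for caption in captions]


def length_histogram(lengths, bin_width=50):
    """Group word counts into bins keyed by each bin's lower edge."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    # Sort by bin edge so the result reads like a histogram.
    return dict(sorted(counts.items()))
```

For example, `length_histogram(caption_word_lengths(captions))` returns a mapping such as `{0: ..., 50: ..., 100: ...}` that can be passed to any plotting library.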

📄 Citation

@article{zhang2025vcapsbench,
  title={VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation},
  author={Zhang, Shi-Xue and Wang, Hongfa and Huang, Duojun and Li, Xin and Zhu, Xiaobin and Yin, Xu-Cheng},
  journal={arXiv preprint arXiv:2505.23484},
  year={2025}
}
