(NeurIPS24) NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Links:

🚩 News

Usage

  • VQA Task

    There are two ways to use and evaluate the NaturalBench benchmark:

    1. Evaluation based on the example code:

    Learn how to use and evaluate NaturalBench by reviewing the simple example in naturalbench_vqa.py; a rough scoring sketch is also provided at the end of this section.

    2. Evaluation with lmms-eval and VLMEvalKit:

    Please refer to the official documentation of lmms-eval and VLMEvalKit for more details.

    • lmms-eval:

      python3 -m accelerate.commands.launch \
          --num_processes=1 \
          -m lmms_eval \
          --model llava_onevision \
          --model_args pretrained="lmms-lab/llava-onevision-qwen2-7b-ov" \
          --tasks naturalbench \
          --batch_size 1 \
          --log_samples \
          --log_samples_suffix llava_onevision_naturalbench \
          --output_path ./logs/
    • VLMEvalKit:

      python run.py --data NaturalBenchDataset --model llava-onevision-qwen2-7b-ov-hf --verbose
  • Retrieval Task

    To run the retrieval task, first install the t2v_metrics package, then run the evaluation code:

    python naturalbench_retrieval.py
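
A minimal usage sketch of the t2v_metrics scorer that the retrieval evaluation builds on is shown below. This is an illustration only, not the contents of naturalbench_retrieval.py: the VQAScore model name ("clip-flant5-xxl"), the placeholder image paths and captions, and the assumption that the scorer returns an (images x texts) score matrix are assumptions on our part; check the t2v_metrics documentation for the exact interface.

    # Hypothetical sketch, NOT the official naturalbench_retrieval.py.
    # Assumes the t2v_metrics package exposes a VQAScore class that returns a
    # (num_images x num_texts) score matrix; verify against the package docs.
    import t2v_metrics

    score_model = t2v_metrics.VQAScore(model="clip-flant5-xxl")  # assumed model id

    images = ["example_image_0.jpg", "example_image_1.jpg"]              # placeholder paths
    texts = ["caption describing image 0", "caption describing image 1"]  # placeholder captions

    scores = score_model(images=images, texts=texts)  # assumed shape: (2, 2)

    # A retrieval-style check: each image should score highest with its own caption.
    correct = bool(scores[0, 0] > scores[0, 1]) and bool(scores[1, 1] > scores[1, 0])
    print(scores)
    print("retrieval correct:", correct)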
    
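Returning to the VQA task above: the official scoring lives in naturalbench_vqa.py, but as a rough, hedged sketch of the idea, the snippet below aggregates per-sample correctness into question-, image-, and group-level accuracies. The 2-images-by-2-questions grouping and the metric names (Acc, Q-Acc, I-Acc, G-Acc) follow our reading of the NaturalBench paper; the field names and exact logic in naturalbench_vqa.py may differ.

    # Hypothetical illustration of NaturalBench-style scoring, NOT the official
    # naturalbench_vqa.py. Each sample pairs 2 images with 2 questions, giving
    # 4 (image, question) cases; grid[i][q] marks whether the model answered
    # question q correctly on image i.
    from typing import Dict, List

    def score_samples(correct_grids: List[List[List[bool]]]) -> Dict[str, float]:
        n = len(correct_grids)
        acc = q_acc = i_acc = g_acc = 0
        for grid in correct_grids:
            # Acc: plain accuracy over all 4 (image, question) cases.
            acc += sum(grid[0]) + sum(grid[1])
            # Q-Acc: a question counts only if it is correct on BOTH images.
            q_acc += sum(grid[0][q] and grid[1][q] for q in range(2))
            # I-Acc: an image counts only if BOTH of its questions are correct.
            i_acc += sum(all(grid[i]) for i in range(2))
            # G-Acc: the sample counts only if all 4 cases are correct.
            g_acc += int(all(grid[0]) and all(grid[1]))
        return {"Acc": acc / (4 * n), "Q-Acc": q_acc / (2 * n),
                "I-Acc": i_acc / (2 * n), "G-Acc": g_acc / n}

    if __name__ == "__main__":
        # Two toy samples: one fully correct, one correct on 3 of 4 cases.
        print(score_samples([
            [[True, True], [True, True]],
            [[True, True], [True, False]],
        ]))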

Citation Information

@inproceedings{naturalbench,
  title={NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples},
  author={Li, Baiqi and Lin, Zhiqiu and Peng, Wenxuan and Nyandwi, Jean de Dieu and Jiang, Daniel and Ma, Zixian and Khanuja, Simran and Krishna, Ranjay and Neubig, Graham and Ramanan, Deva},
  booktitle={The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024},
  url={https://openreview.net/forum?id=Dx88A9Zgnv}
}