TRAVL logo

TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility

[Preprint 2025] Official code of the paper “TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility”

Keywords: Video-Language Models, Physics Plausibility, Video Reasoning, Trajectory-aware Attention, Benchmarking

Saman Motamed¹,²✉, Minghao Chen², Luc Van Gool¹, Iro Laina²

¹INSAIT, Sofia University "St. Kliment Ohridski"   ²Visual Geometry Group, University of Oxford

arXiv   Project Page   Video Summary

ImplausiBench on Hugging Face   TRAVL dataset on Hugging Face

Please support our work by leaving a star on our repo! ⭐⭐⭐

TRAVL teaser

🔥 Update Log

TODO 🏃🏻‍♂️

  • Training code (LLaVA-NeXT + TRAVL)
  • LLM Judge evaluation script
  • Release LLaVA-NeXT + TRAVL weights

Table of Contents

  • Overview 👀
  • Datasets
  • ImplausiBench Leaderboard
  • Cite us 😇
  • Contact ☎️
  • Acknowledgement 💖

Overview 👀

  • Modern VLMs summarize the overall content of a video quite well, yet they fail to reason about fine-grained physical interactions within it.
  • TRAVL is a lightweight, modular attention recipe, pairing spatial with trajectory-aware temporal attention, that helps VLMs judge physics implausibility more reliably (see the sketch after this list).
  • ImplausiBench is our 300-video benchmark (150 real, 150 implausible) with paired, style-matched videos and grounded MCQs to evaluate visual-temporal reasoning beyond language shortcuts.
  • TRAVL Dataset is our curated dataset of 3,482 videos with 19,708 physics‑focused Q/A pairs.
  • Paper: https://arxiv.org/abs/2510.07550
  • Project page: https://sam-motamed.github.io/projects/TRAVL
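
Below is a minimal PyTorch-style sketch of the general idea: factorized attention that first attends spatially within each frame, then temporally along per-token trajectories. It is an illustration under stated assumptions, not the paper's implementation; the module name, hyperparameters, and the trajectory proxy (reusing each spatial index across frames instead of tracked point trajectories) are all ours.

# Illustrative sketch of spatial + trajectory-aware temporal attention.
# Names and the trajectory proxy are assumptions, not the paper's code.
import torch
import torch.nn as nn

class FactorizedVideoAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.norm_t = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens_per_frame, dim)
        B, T, N, D = x.shape
        # 1) Spatial attention: tokens attend within their own frame.
        s = x.reshape(B * T, N, D)
        h = self.norm_s(s)
        s = s + self.spatial(h, h, h, need_weights=False)[0]
        # 2) Temporal attention along each spatial index across frames,
        #    a crude stand-in for attention along tracked trajectories.
        t = s.reshape(B, T, N, D).permute(0, 2, 1, 3).reshape(B * N, T, D)
        h = self.norm_t(t)
        t = t + self.temporal(h, h, h, need_weights=False)[0]
        return t.reshape(B, N, T, D).permute(0, 2, 1, 3)

# Shape check: (batch=2, frames=8, tokens=16, dim=256) in and out.
print(FactorizedVideoAttention(256)(torch.randn(2, 8, 16, 256)).shape)

Factorizing this way keeps attention cost at roughly O(T·N² + N·T²) rather than O((T·N)²) for full joint attention over all video tokens, which is why a recipe in this style stays lightweight.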

Datasets

ImplausiBench (Benchmark Dataset)

A 300-video benchmark (150 real, 150 implausible) for evaluating visual-temporal physics plausibility with paired clips (shared first frame & style) and grounded MCQs that reduce language-only shortcuts.

  • Hugging Face: https://huggingface.co/datasets/INSAIT-Institute/ImplausiBench
  • What’s inside
    • ImplausiBench/real/*.mp4 & ImplausiBench/implausible/*.mp4
    • ImplausiBench-MCQA.json: grounded multiple-choice questions per pair
  • Metrics reported: human & LLM-judge accuracy on the Real / Implausible subsets (150 videos each); a loading sketch follows the download command below

Download

git lfs install
git clone https://huggingface.co/datasets/INSAIT-Institute/ImplausiBench data/implausibench
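
Once cloned, the MCQs can be loaded and scored along these lines. This is a hedged sketch: the JSON record fields ("video", "question", "options", "answer") and the answer_with_model helper are illustrative assumptions, not the benchmark's documented interface.

# Hedged sketch: score a model on the ImplausiBench MCQs.
# Record fields and answer_with_model() are hypothetical.
import json
from pathlib import Path

root = Path("data/implausibench")
mcqa = json.loads((root / "ImplausiBench-MCQA.json").read_text())

def answer_with_model(video_path, question, options):
    """Stand-in for your VLM's inference call."""
    raise NotImplementedError

correct = 0
for item in mcqa:  # assumed: [{"video", "question", "options", "answer"}, ...]
    pred = answer_with_model(root / item["video"], item["question"], item["options"])
    correct += int(pred == item["answer"])
print(f"accuracy: {correct / len(mcqa):.1%}")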

TRAVL (Tuning Dataset)

Our curated tuning dataset of 3,482 videos with 19,708 physics-focused Q/A pairs.

# Option A: huggingface_hub
pip install -U huggingface_hub
python - << 'PY'
from huggingface_hub import snapshot_download
snapshot_download(repo_id="INSAIT-Institute/TRAVL", repo_type="dataset", local_dir="data/travl")
PY

# Option B: git-lfs
git lfs install
git clone https://huggingface.co/datasets/INSAIT-Institute/TRAVL data/travl
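
To check the dataset's size and layout before pulling all the videos, you can list the repo's files first. list_repo_files is a real huggingface_hub API; the snippet is just a preview of whatever the repo contains.

# Optional: preview the dataset's file layout before downloading.
from huggingface_hub import list_repo_files

files = list_repo_files("INSAIT-Institute/TRAVL", repo_type="dataset")
print(f"{len(files)} files, first few:", files[:5])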

ImplausiBench Leaderboard

Accuracies (%) on the Implausible (generated) and Real subsets (150 videos each).
We report both human and LLM-judge scores, sorted by human accuracy on the Implausible subset (best to worst). A sketch of the LLM-judge scoring pattern follows the table.

| Model | Implausible (Human) | Implausible (LLM) | Real (Human) | Real (LLM) |
|---|---|---|---|---|
| LLaVA-NeXT (TRAVL) | 52.7 | 28.7 | 47.3 | 31.3 |
| Gemini 2.5 Pro | 41.3 | 29.3 | 100.0 | 78.0 |
| LLaVA-NeXT (SFT) | 34.0 | 22.0 | 45.3 | 23.3 |
| GPT-4o | 32.7 | 21.3 | 84.7 | 64.0 |
| Qwen2.5-VL | 18.7 | 12.0 | 96.7 | 74.7 |
| InternVL 2.5 | 12.7 | 4.7 | 92.7 | 76.0 |
| LLaVA-NeXT (pretrained) | 3.3 | 2.7 | 98.7 | 62.7 |
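
The official LLM-judge evaluation script is still on the TODO list above. As a rough illustration of the scoring pattern behind the LLM columns, here is a hedged sketch that asks a judge model whether a free-form answer matches the reference; the openai client calls are real, but the judge model choice and prompt are our assumptions, not the paper's protocol.

# Hedged sketch of a generic LLM-judge check (not the official script).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge(question: str, reference: str, model_answer: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {model_answer}\n"
        "Does the model answer match the reference? Reply yes or no."
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative judge model
        messages=[{"role": "user", "content": prompt}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")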

Cite us 😇

@article{motamed2025travl,
    title={TRAVL: A Recipe for Making Video-Language Models Better Judges of Physics Implausibility},
    author={Saman Motamed and Minghao Chen and Luc Van Gool and Iro Laina},
    year={2025},
    eprint={2510.07550},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Contact ☎️

Questions or feedback? Reach us at [email protected].

Acknowledgement 💖

Our work was made possible by the efforts of the following works. Thanks to all the contributors!
