Awesome-Video-Hallucination

A curated paper list on hallucination in Video Large Language Models (Vid-LLMs), covering 29 benchmarks and 42 mitigation methods. Updated monthly via arXiv search.

📄 Survey Paper: Distorted or Fabricated? A Survey on Hallucination in Video LLMs

🔎 Interactive Browser: Search and filter papers by type, mechanism, venue, year, and resources.

Latest Updates

[2026/05] Classified recent papers from new_papers.md, expanding the list to 29 benchmarks and 42 mitigation methods.
[2026/04] Our survey has been accepted to ACL 2026 Findings. 👉 arXiv:2604.12944
[2026/03] Monthly arXiv search is live. Newly found, unclassified papers are listed in new_papers.md.
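
The monthly arXiv search mentioned in the updates above could be approximated along these lines. This is a minimal sketch, not the repository's actual script: the keyword list and the `build_query_url` helper are assumptions made for illustration, and the example only builds a query URL for the public arXiv API without fetching anything.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; the keywords below are illustrative assumptions,
# not the search terms this repository actually uses.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keywords, max_results=50):
    """Build an arXiv API URL that lists the newest papers matching all keywords."""
    search = " AND ".join(f'all:"{k}"' for k in keywords)
    params = {
        "search_query": search,
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_query_url(["video", "hallucination"])
print(url)
```

The response to such a query is an Atom feed, so a real pipeline would still need to parse it and filter out papers that are already classified.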

Taxonomy of Video Hallucinations

Mechanism-driven taxonomy of Vid-LLM hallucinations. Solid fill = benchmarks; striped fill = mitigation methods.

Evaluation Benchmarks

Note

Benchmarks follow the taxonomy above. Each entry includes venue, date, and available resources.

Legend: the Resources column may link to a Project Page, a GitHub Repository, or a Dataset; - = Not Available

🔵 Spatiotemporal Dynamics Benchmarks (Dynamic Distortion)

Event Misordering (5 papers)

Title	Benchmark	Venue	Date	Resources
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	VidHalluc	CVPR 2025	12/2024
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation	HAVEN	arXiv 2025	03/2025
MHBench: Demystifying Motion Hallucination in VideoLLMs	MHBench	AAAI 2025	01/2025
KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding	KPM-Bench	arXiv 2026	02/2026	-
ARGUS: Hallucination and Omission Evaluation in Video-LLMs	ARGUS	ICCV 2025	06/2025

Duration Distortion (2 papers)

Title	Benchmark	Venue	Date	Resources
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models	VideoHallucer	arXiv 2024	06/2024
Online Video Understanding: OVBench and VideoChat-Online	OVBench	CVPR 2025	01/2025

Frequency Confusion (2 papers)

Title	Benchmark	Venue	Date	Resources
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs	VidHal	arXiv 2024	11/2024
Vript: A Video Is Worth Thousands of Words	Vript	NeurIPS 2024	06/2024

🟢 Referential Inconsistency Benchmarks (Dynamic Distortion)

Character Conflation (2 papers)

Title	Benchmark	Venue	Date	Resources
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding	EGOILLUSION	EMNLP 2025	11/2025
MESH: Measuring Hallucinations in Large Video Models	MESH	ACM MM 2025	09/2025

Scene Conflation (1 paper)

Title	Benchmark	Venue	Date	Resources
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding	ELV-Halluc	arXiv 2025	08/2025

🟠 Context-Driven Fabrication Benchmarks (Content Fabrication)

Object-Action Hallucination (2 papers)

Title	Benchmark	Venue	Date	Resources
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding	VideoHallu	NeurIPS 2025	05/2025
Models See Hallucinations: Evaluating the Factuality in Video Captioning	FactVC	EMNLP 2023	03/2023

Scene-Event Hallucination (4 papers)

Title	Benchmark	Venue	Date	Resources
EventHallusion: Diagnosing Event Hallucinations in Video LLMs	EventHallusion	arXiv 2024	09/2024
NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models	NOAH	arXiv 2025	11/2025
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives	RoadSocial	CVPR 2025	02/2025
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs	CCTVBench	arXiv 2026	04/2026	-

Compositional and Factuality Hallucination (6 papers)

Title	Benchmark	Venue	Date	Resources
INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs	INFACT	arXiv 2026	03/2026	-
Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models	OmniVCHall	arXiv 2026	01/2026
VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations	VideoHEDGE	arXiv 2026	01/2026
DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding	DualFact	ACL 2026 Findings	04/2026	-
Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models	GasVideo-1000	arXiv 2026	04/2026
When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models	VisualTextTrap	arXiv 2026	04/2026	-

🟣 Audio-Visual Conflict Benchmarks (Content Fabrication)

Action Attribution (4 papers)

Title	Benchmark	Venue	Date	Resources
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models	AVHBench	ICLR 2025	10/2024
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio	CMM	arXiv 2024	10/2024
Exploring Audio Hallucination in Egocentric Video Understanding	Audio Hallucination QA	ICASSP 2026	04/2026	-
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models	AVHalluBench	arXiv 2024	05/2024

Emotion Inference (1 paper)

Title	Benchmark	Venue	Date	Resources
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	EmotionHallucer	arXiv 2025	05/2025

Mitigation Strategies

Note

Methods are grouped by target hallucination type. The Training-Free column marks whether a method works without extra training (✔︎) or requires it (✘).

🔵 Spatiotemporal Dynamics Mitigation (Dynamic Distortion)

Event Misordering (5 papers)

Title	Method	Venue	Date	Training-Free	Resources
SEASON: Mitigating Temporal Hallucination in Video LLMs via Self-Diagnostic Contrastive Decoding	SEASON	arXiv 2025	12/2025	✔︎	-
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation	Video-thinking (TDPO)	arXiv 2025	03/2025	✘
SmartSight: Mitigating Hallucination in Video-LLMs via Temporal Attention Collapse	SmartSight	AAAI 2026	12/2025	✔︎	-
VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos	VideoTemp-o3	arXiv 2026	02/2026	✘
CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models	MixDPO	arXiv 2026	01/2026	✘	-

Duration Distortion (8 papers)

Title	Method	Venue	Date	Training-Free	Resources
Temporal Insight Enhancement: Mitigating Temporal Hallucination in Video Understanding by MLLMs	Temporal Insight	ICPR 2024	01/2024	✔︎	-
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	DINO-HEAL	CVPR 2025	12/2024	✔︎
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering	TAAE	arXiv 2025	05/2025	✘	-
VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning	VideoTIR	arXiv 2026	03/2026	✘	-
When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition	FrameRepeat	arXiv 2026	03/2026	✘	-
Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding	Video-TwG	arXiv 2026	02/2026	✘	-
Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding	CoE	ICME 2026	01/2026	✘	-
Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models	DTR	arXiv 2026	04/2026	✔︎	-

Frequency Confusion (3 papers)

Title	Method	Venue	Date	Training-Free	Resources
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding	VTG-LLM	AAAI 2025	05/2024	✘
Vript: A Video Is Worth Thousands of Words	Vriptor	NeurIPS 2024	06/2024	✘
KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding	MoPE	arXiv 2026	02/2026	✘	-

🟢 Referential Inconsistency Mitigation (Dynamic Distortion)

Character Conflation (2 papers)

Title	Method	Venue	Date	Training-Free	Resources
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens	Vista-LLaMA	CVPR 2024	12/2023	✘
Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding	VideoPLR	arXiv 2025	11/2025	✘

Scene Conflation (2 papers)

Title	Method	Venue	Date	Training-Free	Resources
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding	ELV-Halluc-DPO	arXiv 2025	08/2025	✘
Online Video Understanding: OVBench and VideoChat-Online	VideoChat-Online	CVPR 2025	01/2025	✘

🟠 Context-Driven Fabrication Mitigation (Content Fabrication)

Object-Action Hallucination (2 papers)

Title	Method	Venue	Date	Training-Free	Resources
Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment	SANTA	WACV 2026	12/2025	✘
EventHallusion: Diagnosing Event Hallucinations in Video LLMs	TCD	arXiv 2024	09/2024	✔︎

Scene-Event Hallucination (9 papers)

Title	Method	Venue	Date	Training-Free	Resources
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations	MASH-VLM	CVPR 2025	03/2025	✘	-
PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning	PaMi-VDPO	arXiv 2025	04/2025	✘	-
Hallucination Reduction in Video-Language Models via Hierarchical Multimodal Consistency	MMA	IJCAI 2025	08/2025	✘	-
Clue Matters: Leveraging Latent Visual Clues to Empower Video Reasoning	ClueNet	arXiv 2026	03/2026	✘	-
GraphThinker: Reinforcing Video Reasoning with Event Graph Thinking	GraphThinker	arXiv 2026	02/2026	✘	-
MACD: Model-Aware Contrastive Decoding via Counterfactual Data	MACD	arXiv 2026	02/2026	✔︎	-
Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding	STSCD	arXiv 2026	01/2026	✔︎	-
Video-ToC: Video Tree-of-Cue Reasoning	Video-ToC	arXiv 2026	04/2026	✘
CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs	C-TCD	arXiv 2026	04/2026	✔︎	-

Both Object-Action & Scene-Event (7 papers)

Title	Method	Venue	Date	Training-Free	Resources
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	VistaDPO	ICML 2025	04/2025	✘
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding	VideoHallu-GRPO	NeurIPS 2025	05/2025	✘
Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models	TriCD	arXiv 2026	01/2026	✘
Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs	SToP	arXiv 2026	04/2026	✔︎	-
When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models	VTHM-MoE	arXiv 2026	04/2026	✘	-
STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models	STEAR	arXiv 2026	04/2026	✔︎	-
Reinforcing Consistency in Video MLLMs with Structured Rewards	Structured Rewards	arXiv 2026	04/2026	✘	-

🟣 Audio-Visual Conflict Mitigation (Content Fabrication)

Action Attribution (3 papers)

Title	Method	Venue	Date	Training-Free	Resources
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models	AVHModel-Align-FT	ICLR 2025	10/2024	✘
AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding	AVCD	NeurIPS 2025	05/2025	✔︎
Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization	mrDPO	arXiv 2024	10/2024	✘

Emotion Inference (1 paper)

Title	Method	Venue	Date	Training-Free	Resources
EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models	PEP-MEK	arXiv 2025	05/2025	✔︎

Citation

If this repository or survey helps your work, please cite:

@article{huang2026distorted,
  title={Distorted or Fabricated? A Survey on Hallucination in Video LLMs},
  author={Huang, Yiyang and Zhang, Yitian and Wang, Yizhou and Zhang, Mingyuan and Shi, Liang and Zeng, Huimin and Fu, Yun},
  journal={arXiv preprint arXiv:2604.12944},
  year={2026}
}

Contributing

Tip

Contributions are welcome:

🔀 Pull Request — Add new papers, update resource links, or correct errors
🐛 Open an Issue — Report mistakes, suggest missing papers, or request features

Resource gaps tracked in data/papers.json:

Add official code links for 33 entries. Browse: missing code
Add official project pages for 58 entries. Browse: missing project pages
Add official dataset or leaderboard links when available.
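
To find which entries still lack a resource link, a contributor could scan the records in a way similar to the sketch below. The real schema of data/papers.json is not shown in this README, so the field names (`name`, `code`, `project_page`, `dataset`) and the sample records are assumptions for illustration only.

```python
import json

# Hypothetical records: the actual structure of data/papers.json may differ.
SAMPLE = json.loads("""
[
  {"name": "ExampleBench", "code": "https://example.com/repo",
   "project_page": null, "dataset": null},
  {"name": "ExampleMethod", "code": null,
   "project_page": null, "dataset": null}
]
""")


def missing(entries, field):
    """Return the names of entries whose `field` is absent or empty."""
    return [e["name"] for e in entries if not e.get(field)]


for field in ("code", "project_page", "dataset"):
    names = missing(SAMPLE, field)
    print(f"{field}: {len(names)} missing -> {names}")
```

Replacing `SAMPLE` with the parsed contents of the real file would work only if the field names are adjusted to match it.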

📝 PR Format Guide

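Use this structure for new entries (mitigation entries also need a Training-Free cell, ✔︎ or ✘, between the date and Resources, as in the tables above):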

| [**Paper Title**](paper_link) | Method/Benchmark Name | Venue | MM/YYYY | Resources |
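
A row in this format can be generated rather than typed by hand. The helper below is a minimal sketch: `format_entry` and its parameters are hypothetical names, the link is a placeholder, and placing the Training-Free cell between the date and Resources is an assumption based on the column order of the mitigation tables above.

```python
import re


def format_entry(title, link, name, venue, date, resources="-", training_free=None):
    """Build one Markdown table row in the PR format.

    `training_free` is only set for mitigation methods; benchmarks leave it None.
    """
    # Dates in this list use MM/YYYY, e.g. 12/2025.
    if not re.fullmatch(r"(0[1-9]|1[0-2])/\d{4}", date):
        raise ValueError(f"date must be MM/YYYY, got {date!r}")
    cells = [f"[**{title}**]({link})", name, venue, date]
    if training_free is not None:
        cells.append("✔︎" if training_free else "✘")
    cells.append(resources)
    return "| " + " | ".join(cells) + " |"


# Example using an entry already in the list; "paper_link" is a placeholder.
row = format_entry(
    title="SEASON: Mitigating Temporal Hallucination in Video LLMs "
          "via Self-Diagnostic Contrastive Decoding",
    link="paper_link",
    name="SEASON",
    venue="arXiv 2025",
    date="12/2025",
    training_free=True,
)
print(row)
```

Validating the date up front catches the most common formatting slip before the row reaches a pull request.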

If this repository helps, please consider giving it a ⭐

Maintained by the SmileLab team at Northeastern University.
