hukcc/Awesome-Video-Hallucination

Awesome-Video-Hallucination

arXiv · ACL 2026 Findings · Auto arXiv Update · License: MIT

A curated paper list on hallucination in Video Large Language Models (Vid-LLMs), covering 29 benchmarks and 42 mitigation methods. The list is updated monthly through an arXiv search.

📄 Survey Paper: Distorted or Fabricated? A Survey on Hallucination in Video LLMs

🔎 Interactive Browser: Search and filter papers by type, mechanism, venue, year, and resources.

Figure: Framework overview.


Latest Updates

  • [2026/05] Classified recent papers from new_papers.md, expanding the list to 29 benchmarks and 42 mitigation methods.
  • [2026/04] Our survey has been accepted to ACL 2026 Findings. 👉 arXiv:2604.12944
  • [2026/03] Monthly arXiv search is live. Newly found, unclassified papers are listed in new_papers.md.

Taxonomy of Video Hallucinations


Mechanism-driven taxonomy of Vid-LLM hallucinations. Solid fill = benchmarks; striped fill = mitigation methods.


Evaluation Benchmarks

Note

Benchmarks are grouped according to the taxonomy above. Each entry lists its venue, date, and available resources.

Legend: page = Project Page   code = GitHub Repository   dataset = Dataset   leaderboard = Leaderboard   - = Not Available

🔵 Spatiotemporal Dynamics Benchmarks (Dynamic Distortion)

Event Misordering (5 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | VidHalluc | CVPR 2025 | 12/2024 | page, code |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | HAVEN | arXiv 2025 | 03/2025 | code |
| MHBench: Demystifying Motion Hallucination in VideoLLMs | MHBench | AAAI 2025 | 01/2025 | code |
| KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding | KPM-Bench | arXiv 2026 | 02/2026 | - |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | ARGUS | ICCV 2025 | 06/2025 | code |

Duration Distortion (2 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | VideoHallucer | arXiv 2024 | 06/2024 | code |
| Online Video Understanding: OVBench and VideoChat-Online | OVBench | CVPR 2025 | 01/2025 | page, code |

Frequency Confusion (2 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| VidHal: Benchmarking Temporal Hallucinations in Vision LLMs | VidHal | arXiv 2024 | 11/2024 | code |
| Vript: A Video Is Worth Thousands of Words | Vript | NeurIPS 2024 | 06/2024 | code |

🟢 Referential Inconsistency Benchmarks (Dynamic Distortion)

Character Conflation (2 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding | EGOILLUSION | EMNLP 2025 | 11/2025 | page |
| MESH: Measuring Hallucinations in Large Video Models | MESH | ACM MM 2025 | 09/2025 | code |

Scene Conflation (1 paper)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc | arXiv 2025 | 08/2025 | code |

🟠 Context-Driven Fabrication Benchmarks (Content Fabrication)

Object-Action Hallucination (2 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu | NeurIPS 2025 | 05/2025 | code |
| Models See Hallucinations: Evaluating the Factuality in Video Captioning | FactVC | EMNLP 2023 | 03/2023 | code |

Scene-Event Hallucination (4 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | EventHallusion | arXiv 2024 | 09/2024 | code |
| NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models | NOAH | arXiv 2025 | 11/2025 | page, code |
| RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives | RoadSocial | CVPR 2025 | 02/2025 | page, code |
| CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs | CCTVBench | arXiv 2026 | 04/2026 | - |

Compositional and Factuality Hallucination (6 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs | INFACT | arXiv 2026 | 03/2026 | - |
| Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models | OmniVCHall | arXiv 2026 | 01/2026 | code |
| VideoHEDGE: Entropy-Based Hallucination Detection for Video-VLMs via Semantic Clustering and Spatiotemporal Perturbations | VideoHEDGE | arXiv 2026 | 01/2026 | code |
| DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding | DualFact | ACL 2026 Findings | 04/2026 | - |
| Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models | GasVideo-1000 | arXiv 2026 | 04/2026 | page |
| When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models | VisualTextTrap | arXiv 2026 | 04/2026 | - |

🟣 Audio-Visual Conflict Benchmarks (Content Fabrication)

Action Attribution (4 papers)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHBench | ICLR 2025 | 10/2024 | code |
| The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | CMM | arXiv 2024 | 10/2024 | page, code |
| Exploring Audio Hallucination in Egocentric Video Understanding | Audio Hallucination QA | ICASSP 2026 | 04/2026 | - |
| CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models | AVHalluBench | arXiv 2024 | 05/2024 | dataset, leaderboard |

Emotion Inference (1 paper)
| Title | Benchmark | Venue | Date | Resources |
|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | EmotionHallucer | arXiv 2025 | 05/2025 | code |

Mitigation Strategies

Note

Methods are grouped by the hallucination type they target. The Training-Free column indicates whether a method requires extra training (✘) or not (✔︎).

🔵 Spatiotemporal Dynamics Mitigation (Dynamic Distortion)

Event Misordering (5 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| SEASON: Mitigating Temporal Hallucination in Video LLMs via Self-Diagnostic Contrastive Decoding | SEASON | arXiv 2025 | 12/2025 | ✔︎ | - |
| Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Video-thinking (TDPO) | arXiv 2025 | 03/2025 | | code |
| SmartSight: Mitigating Hallucination in Video-LLMs via Temporal Attention Collapse | SmartSight | AAAI 2026 | 12/2025 | ✔︎ | - |
| VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos | VideoTemp-o3 | arXiv 2026 | 02/2026 | | page, code |
| CounterVid: Counterfactual Video Generation for Mitigating Action and Temporal Hallucinations in Video-Language Models | MixDPO | arXiv 2026 | 01/2026 | | - |

Duration Distortion (8 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| Temporal Insight Enhancement: Mitigating Temporal Hallucination in Video Understanding by MLLMs | Temporal Insight | ICPR 2024 | 01/2024 | ✔︎ | - |
| VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding | DINO-HEAL | CVPR 2025 | 12/2024 | ✔︎ | page, code |
| Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering | TAAE | arXiv 2025 | 05/2025 | | - |
| VideoTIR: Accurate Understanding for Long Videos with Efficient Tool-Integrated Reasoning | VideoTIR | arXiv 2026 | 03/2026 | | - |
| When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition | FrameRepeat | arXiv 2026 | 03/2026 | | - |
| Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding | Video-TwG | arXiv 2026 | 02/2026 | | - |
| Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding | CoE | ICME 2026 | 01/2026 | | - |
| Relaxing Anchor-Frame Dominance for Mitigating Hallucinations in Video Large Language Models | DTR | arXiv 2026 | 04/2026 | ✔︎ | - |

Frequency Confusion (3 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | VTG-LLM | AAAI 2025 | 05/2024 | | code |
| Vript: A Video Is Worth Thousands of Words | Vriptor | NeurIPS 2024 | 06/2024 | | code |
| KPM-Bench: A Kinematic Parsing Motion Benchmark for Fine-grained Motion-centric Video Understanding | MoPE | arXiv 2026 | 02/2026 | | - |

🟢 Referential Inconsistency Mitigation (Dynamic Distortion)

Character Conflation (2 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens | Vista-LLaMA | CVPR 2024 | 12/2023 | | page, code |
| Alternating Perception-Reasoning for Hallucination-Resistant Video Understanding | VideoPLR | arXiv 2025 | 11/2025 | | code |

Scene Conflation (2 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | ELV-Halluc-DPO | arXiv 2025 | 08/2025 | | code |
| Online Video Understanding: OVBench and VideoChat-Online | VideoChat-Online | CVPR 2025 | 01/2025 | | page, code |

🟠 Context-Driven Fabrication Mitigation (Content Fabrication)

Object-Action Hallucination (2 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| Mitigating Object and Action Hallucinations in Multimodal LLMs via Self-Augmented Contrastive Alignment | SANTA | WACV 2026 | 12/2025 | | page |
| EventHallusion: Diagnosing Event Hallucinations in Video LLMs | TCD | arXiv 2024 | 09/2024 | ✔︎ | code |

Scene-Event Hallucination (9 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations | MASH-VLM | CVPR 2025 | 03/2025 | | - |
| PaMi-VDPO: Mitigating Video Hallucinations by Prompt-Aware Multi-Instance Video Preference Learning | PaMi-VDPO | arXiv 2025 | 04/2025 | | - |
| Hallucination Reduction in Video-Language Models via Hierarchical Multimodal Consistency | MMA | IJCAI 2025 | 08/2025 | | - |
| Clue Matters: Leveraging Latent Visual Clues to Empower Video Reasoning | ClueNet | arXiv 2026 | 03/2026 | | - |
| GraphThinker: Reinforcing Video Reasoning with Event Graph Thinking | GraphThinker | arXiv 2026 | 02/2026 | | - |
| MACD: Model-Aware Contrastive Decoding via Counterfactual Data | MACD | arXiv 2026 | 02/2026 | ✔︎ | - |
| Mitigating Hallucinations in Video Large Language Models via Spatiotemporal-Semantic Contrastive Decoding | STSCD | arXiv 2026 | 01/2026 | ✔︎ | - |
| Video-ToC: Video Tree-of-Cue Reasoning | Video-ToC | arXiv 2026 | 04/2026 | | code |
| CCTVBench: Contrastive Consistency Traffic VideoQA Benchmark for Multimodal LLMs | C-TCD | arXiv 2026 | 04/2026 | ✔︎ | - |

Both Object-Action & Scene-Event (7 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | VistaDPO | ICML 2025 | 04/2025 | | code |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | VideoHallu-GRPO | NeurIPS 2025 | 05/2025 | | code |
| Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models | TriCD | arXiv 2026 | 01/2026 | | code |
| Sink-Token-Aware Pruning for Fine-Grained Video Understanding in Efficient Video LLMs | SToP | arXiv 2026 | 04/2026 | ✔︎ | - |
| When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models | VTHM-MoE | arXiv 2026 | 04/2026 | | - |
| STEAR: Layer-Aware Spatiotemporal Evidence Intervention for Hallucination Mitigation in Video Large Language Models | STEAR | arXiv 2026 | 04/2026 | ✔︎ | - |
| Reinforcing Consistency in Video MLLMs with Structured Rewards | Structured Rewards | arXiv 2026 | 04/2026 | | - |

🟣 Audio-Visual Conflict Mitigation (Content Fabrication)

Action Attribution (3 papers)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | AVHModel-Align-FT | ICLR 2025 | 10/2024 | | code |
| AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding | AVCD | NeurIPS 2025 | 05/2025 | ✔︎ | code |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | mrDPO | arXiv 2024 | 10/2024 | | page, code |

Emotion Inference (1 paper)
| Title | Method | Venue | Date | Training-Free | Resources |
|---|---|---|---|---|---|
| EmotionHallucer: Evaluating Emotion Hallucinations in Multimodal Large Language Models | PEP-MEK | arXiv 2025 | 05/2025 | ✔︎ | code |

Citation

If this repository or survey helps your work, please cite:

```bibtex
@article{huang2026distorted,
  title={Distorted or Fabricated? A Survey on Hallucination in Video LLMs},
  author={Huang, Yiyang and Zhang, Yitian and Wang, Yizhou and Zhang, Mingyuan and Shi, Liang and Zeng, Huimin and Fu, Yun},
  journal={arXiv preprint arXiv:2604.12944},
  year={2026}
}
```

Contributing

Tip

Contributions are welcome:

🔀 Pull Request — Add new papers, update resource links, or correct errors
🐛 Open an Issue — Report mistakes, suggest missing papers, or request features

Resource gaps tracked in data/papers.json:

  • Add official code links for 33 entries. Browse: missing code
  • Add official project pages for 58 entries. Browse: missing project pages
  • Add official dataset or leaderboard links when available.
📝 PR Format Guide

Use this structure for new entries:

```markdown
| [**Paper Title**](paper_link) | Method/Benchmark Name | Venue | MM/YYYY | Resources |
```
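Before opening a pull request, a new row can be checked against this template. The snippet below is a minimal sketch of such a check. `check_row` is a hypothetical helper, not a script shipped with this repository, and its rules (five cells, a bold linked title, an `MM/YYYY` date, `-` when no resource exists) are assumptions read off the template and the legend. It covers only the five-column template; the mitigation tables carry an extra Training-Free column that this sketch does not handle.

```python
import re

# Hypothetical helper (not part of this repository): validates one row against
# | [**Paper Title**](paper_link) | Method/Benchmark Name | Venue | MM/YYYY | Resources |
TITLE_RE = re.compile(r"^\[\*\*.+\*\*\]\(\S+\)$")  # [**Title**](link)
DATE_RE = re.compile(r"^(0[1-9]|1[0-2])/\d{4}$")   # MM/YYYY with month 01-12


def check_row(row: str) -> list[str]:
    """Return a list of problems found in one Markdown table row (empty if OK)."""
    row = row.strip()
    if not (row.startswith("|") and row.endswith("|")):
        return ["row must start and end with '|'"]
    # Splitting on '|' yields empty strings before the first and after the last pipe.
    cells = [c.strip() for c in row.split("|")[1:-1]]
    if len(cells) != 5:
        return [f"expected 5 cells, found {len(cells)}"]
    title, name, venue, date, resources = cells
    problems = []
    if not TITLE_RE.match(title):
        problems.append("title must look like [**Title**](link)")
    if not name:
        problems.append("method/benchmark name is empty")
    if not venue:
        problems.append("venue is empty")
    if not DATE_RE.match(date):
        problems.append("date must be MM/YYYY")
    if not resources:
        problems.append("resources cell is empty (use - if none)")
    return problems
```

For example, a VidHal row written as `| [**VidHal: Benchmarking Temporal Hallucinations in Vision LLMs**](paper_link) | VidHal | arXiv 2024 | 11/2024 | code |` passes with an empty list, while the same row with `13/2024` as its date is reported as `date must be MM/YYYY`.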

If this repository helps, please consider giving it a star.

Maintained by the SmileLab team at Northeastern University.

About

[ACL 2026] Paper list on Video LLM hallucination. Stars and contributions are welcome!
