GitHub - pipixin321/HolmesVAU: ✨✨✨Official implementation of "Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity"

Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity

If you like our project, please give us a star ⭐ on GitHub to follow the latest updates.

✨Highlights

Abstract: How can we enable models to comprehend video anomalies occurring over varying temporal scales and contexts? Traditional Video Anomaly Understanding (VAU) methods focus on frame-level anomaly prediction, often missing the interpretability of complex and diverse real-world anomalies. Recent multimodal approaches leverage visual and textual data but lack hierarchical annotations that capture both short-term and long-term anomalies.

To address this challenge, we introduce HIVAU-70k, a large-scale benchmark for hierarchical video anomaly understanding across any granularity. We develop a semi-automated annotation engine that efficiently scales high-quality annotations by combining manual video segmentation with recursive free-text annotation using large language models (LLMs). This results in over 70,000 multi-granular annotations organized at clip-level, event-level, and video-level segments.

For efficient anomaly detection in long videos, we propose the Anomaly-focused Temporal Sampler (ATS). ATS integrates an anomaly scorer with a density-aware sampler to adaptively select frames based on anomaly scores, ensuring that the multimodal LLM concentrates on anomaly-rich regions, which significantly enhances both efficiency and accuracy. Extensive experiments demonstrate that our hierarchical instruction data markedly improves anomaly comprehension. The integrated ATS and visual-language model outperform traditional methods in processing long videos.
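The density-aware sampling idea described above can be illustrated with a short sketch. This is not the code from the ATS folder; it is a minimal illustration under stated assumptions: per-frame anomaly scores are already available as a list of floats, the function name `density_aware_sample` and the smoothing constant `tau` are hypothetical, and frames are picked by inverting the cumulative sum of the scores so that high-score regions receive more samples.

```python
from bisect import bisect_left
from itertools import accumulate


def density_aware_sample(scores, num_frames, tau=0.1):
    """Return frame indices that are denser where anomaly scores are high.

    scores     -- per-frame anomaly scores (assumed non-negative floats)
    num_frames -- number of frames to pick
    tau        -- hypothetical smoothing constant; keeps low-score regions
                  from being skipped entirely
    """
    if not scores or num_frames <= 0:
        return []
    # Treat (score + tau) as an unnormalized sampling density.
    density = [max(s, 0.0) + tau for s in scores]
    # Cumulative sum of the density over time.
    cumulative = list(accumulate(density))
    total = cumulative[-1]
    # Place the targets evenly along the cumulative axis (bin midpoints).
    targets = [(k + 0.5) * total / num_frames for k in range(num_frames)]
    # Map each target back to the first frame whose cumulative value reaches it.
    last = len(scores) - 1
    return [min(bisect_left(cumulative, t), last) for t in targets]


if __name__ == "__main__":
    # Toy score curve: frames 50 to 59 are anomalous, the rest are normal.
    toy_scores = [0.0] * 50 + [1.0] * 10 + [0.0] * 40
    print(density_aware_sample(toy_scores, num_frames=8, tau=0.05))
```

With a flat score curve this reduces to uniform sampling, while a peak in the scores pulls most of the selected indices into the peak region. A larger `tau` moves the result back toward uniform sampling.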

📅 TODO

🔧 Benchmarks

Download videos

Download the source videos for UCF-Crime and XD-Violence from their respective homepages.

Check the folder

Place the training and test videos of each dataset in the corresponding [ucf-crime/xd-violence]/videos/[train/test] folder, and make sure the directory structure matches the layout below.

├── HIVAU-70k
    ├── instruction
        ├── merge_instruction_test_final.jsonl
        └── merge_instruction_train_final.jsonl
    ├── raw_annotations
        ├── ucf_database_train.json
        ├── ucf_database_test.json
        ├── xd_database_train.json
        └── xd_database_test.json
    └── videos
        ├── ucf-crime
            ├── clips
            ├── events
            └── videos
                ├── train
                    ├── Abuse001_x264.mp4
                    ├── ...
                └── test
                    ├── Abuse028_x264.mp4
                    ├── ...
        └── xd-violence
            ├── clips
            ├── events
            └── videos
                ├── train
                    ├── A.Beautiful.Mind.2001__#00-01-45_00-02-50_label_A.mp4
                    ├── ...
                └── test
                    ├── A.Beautiful.Mind.2001__#00-25-20_00-29-20_label_A.mp4
                    ├── ...
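Before starting the multi-hour split, it may help to confirm that the layout above is in place. The sketch below is independent of the repository's own `check_video.py`, whose behavior is not described here. The function name `missing_paths` is hypothetical, and the `clips` and `events` folders are left out of the check on the assumption that they are filled in by the split step.

```python
from pathlib import Path

# Paths copied from the directory tree above, relative to the HIVAU-70k root.
EXPECTED_FILES = [
    "instruction/merge_instruction_test_final.jsonl",
    "instruction/merge_instruction_train_final.jsonl",
    "raw_annotations/ucf_database_train.json",
    "raw_annotations/ucf_database_test.json",
    "raw_annotations/xd_database_train.json",
    "raw_annotations/xd_database_test.json",
]
EXPECTED_DIRS = [
    f"videos/{dataset}/videos/{split}"
    for dataset in ("ucf-crime", "xd-violence")
    for split in ("train", "test")
]


def missing_paths(root):
    """Return the expected files and folders that are absent under root."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    for rel_dir in EXPECTED_DIRS:
        folder = root / rel_dir
        if not folder.is_dir():
            missing.append(rel_dir)
        elif not any(folder.glob("*.mp4")):
            # The folder exists but holds no source videos yet.
            missing.append(f"{rel_dir}/*.mp4")
    return missing


if __name__ == "__main__":
    problems = missing_paths("HIVAU-70k")
    if problems:
        print("Missing:")
        for item in problems:
            print("  " + item)
    else:
        print("Layout looks complete.")
```

Run it from the directory that contains `HIVAU-70k`; an empty result means every file and video folder listed in the tree was found.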

Split videos

This process takes several hours:

cd HIVAU-70k
python split_video.py
python check_video.py

Citation

If you find this repo useful for your research, please consider citing our papers:

@article{zhang2024holmesvau,
  title={Holmes-vau: Towards long-term video anomaly understanding at any granularity},
  author={Zhang, Huaxin and Xu, Xiaohao and Wang, Xiang and Zuo, Jialong and Huang, Xiaonan and Gao, Changxin and Zhang, Shanjun and Yu, Li and Sang, Nong},
  journal={arXiv preprint arXiv:2412.06171},
  year={2024}
}

@article{zhang2024holmesvad,
  title={Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM},
  author={Zhang, Huaxin and Xu, Xiaohao and Wang, Xiang and Zuo, Jialong and Han, Chuchu and Huang, Xiaonan and Gao, Changxin and Wang, Yuehuan and Sang, Nong},
  journal={arXiv preprint arXiv:2406.12235},
  year={2024}
}
