# VIBE: Video-to-Text Information Bottleneck Evaluation for TL;DR

This repository contains the source code for VIBE.
VIBE is an annotation-free method that selects video summaries by scoring task relevance and visual grounding without retraining. Human studies show VIBE improves accuracy and reduces response time over naive VLM summaries and full videos across three datasets.
- System Plot and Major Results
- Project Structure
- Prerequisites
- Key Utility Functions
- Model Configuration
- Citation
## System Plot and Major Results

Our results show that video captions selected by VIBE achieve a better trade-off between human response time and task accuracy than naive VLM captions or full videos.
## Project Structure

```
.
├── config/                     # Configuration files for different tasks
│   ├── AIVideoConf.yaml        # AI Conference Video analysis config
│   ├── LongVideoBench.yaml     # Long video benchmark config
│   └── TrafficQA.yaml          # Traffic QA config
│
└── src/                        # Source code
    ├── AIConfVideo_script/     # AI Conference Video analysis
    ├── LongVideoBench_script/  # Long video benchmark
    ├── TrafficQA_script/       # Traffic QA
    ├── client/                 # Client-side code
    ├── server/                 # Server-side code
    └── utils/                  # Utility functions
```
## Prerequisites

See `requirements.txt` for the required packages and their versions; install them with `pip install -r requirements.txt`.
## Key Utility Functions

The `src/utils/` directory contains several important utility modules:
### `tf_idf.py`

TF-IDF based keyword extraction from a text corpus:

- Extracts important keywords using Term Frequency-Inverse Document Frequency
- Supports customizable parameters for document frequency thresholds
- Handles n-grams and stop words
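
Below is a minimal sketch of this kind of extraction using scikit-learn's `TfidfVectorizer`; the function name and parameter defaults are illustrative assumptions, not the module's actual API.

```python
# Illustrative sketch of TF-IDF keyword extraction; not the actual API of tf_idf.py.
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(corpus, top_k=10, min_df=2, max_df=0.8):
    """Return the top_k highest-scoring terms across a corpus of documents."""
    vectorizer = TfidfVectorizer(
        ngram_range=(1, 2),    # unigrams and bigrams
        stop_words="english",  # drop common stop words
        min_df=min_df,         # ignore terms in fewer than min_df documents
        max_df=max_df,         # ignore terms in more than max_df of documents
    )
    tfidf = vectorizer.fit_transform(corpus)
    # Sum each term's TF-IDF score over all documents and rank the terms.
    scores = tfidf.sum(axis=0).A1
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda t: t[1], reverse=True)
    return [term for term, _ in ranked[:top_k]]

print(extract_keywords(["video summarization with captions",
                        "caption selection for video QA"]))
```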
### `easyocr_utils.py`

Optical Character Recognition (OCR) utilities:

- Text detection and recognition in images
- Keyword-based text masking
- Support for multiple languages
- Confidence threshold filtering
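
A hedged sketch of how such utilities typically look with the `easyocr` package; the helper names and the masking strategy below are illustrative assumptions, not the module's implementation.

```python
# Illustrative sketch using easyocr; helper names are assumptions, not easyocr_utils.py's API.
import cv2
import easyocr
import numpy as np

reader = easyocr.Reader(["en"])  # accepts multiple language codes, e.g. ["en", "ch_sim"]

def detect_text(image_path, min_confidence=0.5):
    """Run OCR and keep detections above a confidence threshold."""
    results = reader.readtext(image_path)  # list of (bbox, text, confidence)
    return [(bbox, text) for bbox, text, conf in results if conf >= min_confidence]

def mask_keywords(image_path, keywords, min_confidence=0.5):
    """Black out detected text regions whose content matches any keyword."""
    image = cv2.imread(image_path)
    for bbox, text in detect_text(image_path, min_confidence):
        if any(k.lower() in text.lower() for k in keywords):
            pts = np.array(bbox, dtype=np.int32)  # bbox is four corner points
            cv2.fillPoly(image, [pts], color=(0, 0, 0))
    return image
```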
### `scene_detect.py`

Scene detection utilities:

- Video scene boundary detection
- Keyframe extraction
- Scene transition analysis
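
A minimal sketch of boundary detection plus keyframe grabbing, assuming PySceneDetect (`scenedetect` >= 0.6) and OpenCV; taking the first frame of each scene as its keyframe is an illustrative choice, not necessarily what the module does.

```python
# Illustrative sketch using PySceneDetect; not scene_detect.py's actual API.
import cv2
from scenedetect import detect, ContentDetector

def scene_keyframes(video_path, threshold=27.0):
    """Detect scene boundaries and grab the first frame of each scene."""
    scenes = detect(video_path, ContentDetector(threshold=threshold))
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    for start, _ in scenes:  # each scene is a (start, end) pair of FrameTimecodes
        cap.set(cv2.CAP_PROP_POS_FRAMES, start.get_frames())
        ok, frame = cap.read()
        if ok:
            keyframes.append((start.get_timecode(), frame))
    cap.release()
    return keyframes
```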
### `fill_in_mask.py`

Image inpainting utilities:

- Mask filling and image completion
- Region-based image editing
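
A minimal inpainting sketch using OpenCV's classical `cv2.inpaint`; the actual module may rely on a different (e.g. learned) inpainting method.

```python
# Illustrative sketch of mask-based inpainting; fill_in_mask.py may differ.
import cv2

def fill_masked_region(image_path, mask_path, radius=3):
    """Fill nonzero regions of a binary mask by inpainting from surrounding pixels."""
    image = cv2.imread(image_path)
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)  # 8-bit, nonzero = inpaint here
    return cv2.inpaint(image, mask, radius, cv2.INPAINT_TELEA)
```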
### `nltk_utils.py`

NLP utilities:

- Text preprocessing
- Tokenization and lemmatization
- Language model integration
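
A minimal preprocessing sketch with NLTK; the pipeline below (lowercase, tokenize, drop stop words, lemmatize) is an illustrative assumption about what the module's preprocessing covers.

```python
# Illustrative NLTK preprocessing sketch; not nltk_utils.py's actual API.
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# One-time downloads of the required NLTK resources.
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet", "omw-1.4"):
    nltk.download(resource, quiet=True)

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and lemmatize a piece of text."""
    stop_words = set(nltk.corpus.stopwords.words("english"))
    lemmatizer = WordNetLemmatizer()
    tokens = word_tokenize(text.lower())
    return [lemmatizer.lemmatize(t) for t in tokens
            if t.isalpha() and t not in stop_words]

print(preprocess("The cars were detected crossing the intersections."))
```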
### `primary_area.py`

Area detection and analysis:

- Primary region detection
- Area-based feature extraction
- Spatial analysis utilities
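
These bullet points are generic, so the sketch below is only a loose illustration: it treats "primary region detection" as largest-contour extraction with OpenCV, which is purely an assumption about the intended behavior.

```python
# Loose illustration of primary-region detection; an assumption, not primary_area.py's logic.
import cv2

def primary_region(image, threshold=127):
    """Return the bounding box (x, y, w, h) of the largest bright region, or None."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)  # primary region = largest contour
    return cv2.boundingRect(largest)
```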
## Model Configuration

The project supports multiple vision-language models for video understanding:
- InternVL-2.5-8B-MPO
- InternVL3-38B
- Qwen2.5-VL-72B-Instruct-AWQ
You may modify the config files in `config/` to run any other model supported by vLLM.
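
As a hedged illustration of querying one of these models once it is served (e.g. via vLLM's `vllm serve` command), the sketch below calls a vLLM OpenAI-compatible endpoint; the port, prompt, and client setup are assumptions, not taken from this repo's `client/` code.

```python
# Illustrative client call against a vLLM OpenAI-compatible server; endpoint and
# prompt are assumptions, not this repository's client implementation.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
    messages=[{"role": "user",
               "content": "Summarize the key events in this video caption: ..."}],
)
print(response.choices[0].message.content)
```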
## Citation

```bibtex
@misc{chen2025vibevideototextinformationbottleneck,
      title={VIBE: Video-to-Text Information Bottleneck Evaluation for TL;DR},
      author={Shenghui Chen and Po-han Li and Sandeep Chinchali and Ufuk Topcu},
      year={2025},
      eprint={2505.17423},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2505.17423},
}
```