Leverage AI-powered insights to analyze surveillance videos with advanced activity detection, object recognition, and video summarization. This project provides both a command-line interface (CLI) and a modern Streamlit web UI for interactive analysis.
- Video Analysis: Extracts key activities, objects, and summaries from surveillance videos.
- AI Model Integration: Supports Vision-Language Models (VLM), LLMs, and CLIP for deep video understanding.
- Object Detection: Uses YOLO-based models for frame-level annotation.
- Streamlit UI: User-friendly web interface for uploading, analyzing, and visualizing results.
- CLI Tools: Command-line utilities for batch processing and automation.
```bash
git clone https://github.com/FadelMamar/video-qa.git
cd video-qa
```

2. Install uv

On Linux/macOS:

```bash
curl -Ls https://astral.sh/uv/install.sh | sh
```

On Windows (PowerShell):

```powershell
irm https://astral.sh/uv/install.ps1 | iex
```

For more details or troubleshooting, see the uv installation guide.
It's recommended to use a virtual environment.
```bash
uv venv .venv --python 3.11
```
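Optionally activate the environment before installing (a typical activation step, assuming the default `.venv` layout created above; `uv pip` also picks up a local `.venv` automatically):

```bash
# Linux/macOS
source .venv/bin/activate

# Windows (PowerShell)
.venv\Scripts\Activate.ps1
```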
```bash
uv pip install -e .
```

If you plan to use Llama-based models, you need the llama.cpp server executable.
- Visit the llama.cpp releases page and download the latest pre-built binary for your platform, or install it with a package manager:
  - Windows: run `winget install llama.cpp`
  - Linux/macOS: run `brew install llama.cpp`
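To confirm the server binary is reachable, you can run it once with `--help` (this assumes `llama-server` is on your PATH; otherwise use its full path):

```bash
llama-server --help
```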
After downloading, set the LLAMA_SERVER_PATH variable in your .env to the path of the llama-server (or llama-server.exe on Windows) executable, e.g.:
LLAMA_SERVER_PATH=llama-server
- See instructions above or in the llama.cpp documentation.
This project uses environment variables for configuration. An example configuration file is provided as example.env in the project root.
- Copy the example file to create your own `.env`:

  ```bash
  cp example.env .env
  ```

  On Windows (PowerShell):

  ```powershell
  copy example.env .env
  ```
- Open `.env` in a text editor and update the values as needed for your environment.
Below are some important environment variables you may want to configure in your .env file:
- `HF_TOKEN`: HuggingFace API token (if required)
- `OPENAI_API_KEY`: OpenAI API key (default: `sk-no-key-required`)
- `HOST`: Host address for local endpoints (default: `localhost`)
- `VLM_PORT`: Port for the Vision-Language Model endpoint (e.g., `8000`)
- `LLM_PORT`: Port for the Language Model endpoint (e.g., `8008`)
- `CTX_SIZE`: Context size for model inference (e.g., `20000`)
- `MODEL_NAME`: Vision-Language Model identifier or path (e.g., `ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:q4_k_m`)
- `LLM_MODEL`: Language Model identifier or path (e.g., `ggml-org/Qwen3-0.6B-GGUF:f16`)
- `LLAMA_SERVER_PATH`: Path to the Llama server executable (if used)
- `LLAMA_SERVER_LOG`: Path to the Llama server log file (e.g., `llama_server.log`)
- `TEMPERATURE`: Sampling temperature for model inference (e.g., `0.7`)
- `DSPY_CACHEDIR`: Directory for DSPy cache (e.g., `.cache_dspy`)
- `VIDEO_PREPROCESSED_DIR`: Directory for preprocessed videos (e.g., `preprocessed_videos`)
- `CLIP_MODEL`: Path or identifier for your CLIP model (if required)
Example .env values:
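A minimal sketch of a `.env` using the example values listed above (adjust ports, model identifiers, and paths for your setup):

```ini
HOST=localhost
VLM_PORT=8000
LLM_PORT=8008
CTX_SIZE=20000
MODEL_NAME=ggml-org/Qwen2.5-VL-3B-Instruct-GGUF:q4_k_m
LLM_MODEL=ggml-org/Qwen3-0.6B-GGUF:f16
LLAMA_SERVER_PATH=llama-server
LLAMA_SERVER_LOG=llama_server.log
TEMPERATURE=0.7
DSPY_CACHEDIR=.cache_dspy
VIDEO_PREPROCESSED_DIR=preprocessed_videos
OPENAI_API_KEY=sk-no-key-required
```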
Refer to example.env for all available options and their default/example values.
```bash
uv run streamlit run app/ui.py
```

- Open your browser at the provided URL (usually http://localhost:8501).
- Upload a video or provide a path to a local video file.
- Configure analysis options in the sidebar.
- Click "Start Analysis" to view results and logs.
Analyze a video directly from the terminal:
```bash
python cli.py analyze <path_to_video> --args='{"sample_freq":5, ...}' --activity_analysis=True
```

- Replace `<path_to_video>` with your video file.
- Adjust `--args` as needed (see `PredictionConfig` in `watcher/config.py`).
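For example, to analyze a local recording using only the documented `sample_freq` option (the file path here is just an illustration; the other accepted `--args` keys are defined in `PredictionConfig`):

```bash
python cli.py analyze ./videos/entrance_cam.mp4 --args='{"sample_freq":5}' --activity_analysis=True
```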
```bash
python cli.py launch_vlm --model_name="your-vlm-model" --port=8001 --ctx_size=20000
```

```text
video-qa/
  app/                    # Streamlit UI
  src/watcher/            # Core analysis, detection, and utils
  cli.py                  # Command-line interface
  models/                 # Model checkpoints
  preprocessed_videos/
  tests/                  # Test scripts
  README.md
  pyproject.toml
```
- Python 3.11
- Streamlit
- torch, torchvision
- ultralytics
- dotenv
- Other dependencies as listed in `pyproject.toml`
- Upload or specify a video in the UI.
- Click "Start Analysis".
- View key insights, frame-level metadata, and logs.
- Models: Place your YOLO or VLM models in the specified paths and update `.env`.
- Sampling Frequency: Adjust in the UI or CLI for faster or more detailed analysis.
- Device: Uses the CPU by default. For GPU inference, make sure torch detects CUDA (see the quick check after these tips).
- If you see errors about missing models, check your `.env` paths.
- For large videos, use the "Video path" input instead of uploading.
- Logs are available in the UI under "Logs" expanders.
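A quick, generic torch check for the GPU case (not specific to this project):

```python
import torch

# True means a CUDA-enabled torch build is installed and a GPU is visible;
# otherwise inference falls back to the CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```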
Pull requests and issues are welcome! Please open an issue for bugs or feature requests.
