This directory contains benchmark results for video augmentation libraries.
The video benchmarks measure the throughput of several augmentation libraries on video transformations. They compare CPU-based processing (Albumentations) with GPU-accelerated processing (Kornia and torchvision).
The benchmarks use the UCF101 dataset, which contains 13,320 videos from 101 action categories. The videos are realistic, collected from YouTube, and include a wide variety of camera motion, object appearance, pose, scale, viewpoint, and background. This makes it an excellent dataset for benchmarking video augmentation performance across diverse real-world scenarios.
You can download the dataset from: https://www.crcv.ucf.edu/data/UCF101/UCF101.rar
- Video Loading: Videos are loaded using library-specific loaders:
  - OpenCV for Albumentations
  - PyTorch tensors for Kornia
- Warmup Phase:
  - Performs adaptive warmup until performance variance stabilizes
  - Uses configurable parameters for stability detection
  - Implements early stopping for slow transforms
- Measurement Phase:
  - Multiple runs of each transform
  - Measures throughput (videos/second)
  - Calculates statistical metrics (median, standard deviation)
- Environment Control:
  - CPU benchmarks are run single-threaded
  - GPU benchmarks utilize the specified GPU device
  - Thread settings are controlled for consistent results
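The warmup and measurement phases above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the function names are hypothetical, the stability test (coefficient of variation of the most recent window falling below `warmup_threshold`) is an assumption about how the listed parameters are used, and early stopping for slow transforms is left out.

```python
import statistics
import time
from typing import Callable, Sequence, Tuple


def throughput(transform: Callable, videos: Sequence) -> float:
    """Apply `transform` to every video once and return videos per second."""
    start = time.perf_counter()
    for video in videos:
        transform(video)
    elapsed = time.perf_counter() - start
    return len(videos) / max(elapsed, 1e-12)  # guard against a zero reading


def adaptive_warmup(
    transform: Callable,
    videos: Sequence,
    max_warmup_iterations: int = 100,
    warmup_window: int = 5,
    warmup_threshold: float = 0.05,
    min_warmup_windows: int = 3,
) -> int:
    """Run warmup passes until throughput looks stable; return the pass count."""
    samples = []
    for iteration in range(1, max_warmup_iterations + 1):
        samples.append(throughput(transform, videos))
        # Always collect a minimum number of full windows before testing stability.
        if iteration < warmup_window * min_warmup_windows:
            continue
        recent = samples[-warmup_window:]
        # Assumed criterion: relative spread of the latest window is small enough.
        variation = statistics.stdev(recent) / statistics.mean(recent)
        if variation < warmup_threshold:
            break
    return len(samples)


def measure(transform: Callable, videos: Sequence, num_runs: int = 5) -> Tuple[float, float]:
    """Return (median, standard deviation) of throughput over `num_runs` runs."""
    runs = [throughput(transform, videos) for _ in range(num_runs)]
    return statistics.median(runs), statistics.stdev(runs)
```

A caller would run `adaptive_warmup(transform, videos)` first and then report the pair returned by `measure(transform, videos)`. For GPU transforms, a real harness would also need to synchronize the device before reading the clock, which this sketch does not do.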
The benchmarks compare:
- Albumentations: CPU-based processing (single thread)
- Kornia: GPU-accelerated processing (NVIDIA GPUs)
- torchvision: GPU-accelerated processing (NVIDIA GPUs)
This provides insights into the trade-offs between CPU and GPU processing for video augmentation.
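For the single-threaded CPU runs, one way to pin the thread count is sketched below. The environment variable names match those recorded in the benchmark metadata; the helper name `pin_single_thread` is hypothetical, and the OpenCV and PyTorch calls are an assumption about how such pinning is typically done rather than a copy of the benchmark code.

```python
import os

# Variable names as recorded under thread_settings in the benchmark metadata.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def pin_single_thread() -> dict:
    """Limit common numeric backends to one thread and report what was applied.

    Call this before importing NumPy or other numeric libraries, because many
    backends read these variables only once at import time.
    """
    for name in THREAD_ENV_VARS:
        os.environ[name] = "1"
    applied = {"environment": {name: os.environ[name] for name in THREAD_ENV_VARS}}

    try:
        import cv2  # optional dependency

        cv2.setNumThreads(1)
        cv2.ocl.setUseOpenCL(False)
        applied["opencv"] = {
            "threads": cv2.getNumThreads(),
            "opencl": cv2.ocl.useOpenCL(),
        }
    except ImportError:
        applied["opencv"] = "not installed"

    try:
        import torch  # optional dependency

        torch.set_num_threads(1)
        applied["pytorch"] = {"threads": torch.get_num_threads()}
    except ImportError:
        applied["pytorch"] = "not installed"

    return applied
```

Returning the applied settings makes it easy to store them next to the results, so that a later reader can check which limits were in force for a given run.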
To run the video benchmarks:
./run_video_single.sh -l albumentations -d /path/to/videos -o /path/to/output
To run all libraries and generate a comparison:
./run_video_all.sh -d /path/to/videos -o /path/to/output
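Each number is the throughput in videos per second; higher is better. The Speedup column is the Albumentations throughput divided by the throughput of the fastest other library for each transform, so values below 1x mean another library is faster, and N/A means no other library supports the transform. Note that the Albumentations column was measured on a single ARM CPU core, while the Kornia and torchvision columns were measured on an NVIDIA GeForce RTX 4090.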
Transform | albumentations (videos per second) arm (1 core) | kornia (videos per second) NVIDIA GeForce RTX 4090 | torchvision (videos per second) NVIDIA GeForce RTX 4090 | Speedup (Alb/fastest other) |
---|---|---|---|---|
Affine | 4.45 ± 0.06 | 21.39 ± 0.05 | 452.58 ± 0.14 | 0.01x |
AutoContrast | 20.85 ± 0.10 | 21.41 ± 0.02 | 577.72 ± 16.86 | 0.04x |
Blur | 49.61 ± 1.95 | 20.61 ± 0.06 | N/A | 2.41x |
Brightness | 56.84 ± 1.94 | 21.85 ± 0.02 | 755.52 ± 435.17 | 0.08x |
CLAHE | 8.89 ± 0.09 | N/A | N/A | N/A |
CenterCrop128 | 733.66 ± 4.03 | 70.12 ± 1.29 | 1133.39 ± 234.60 | 0.65x |
ChannelDropout | 58.28 ± 2.96 | 21.81 ± 0.03 | N/A | 2.67x |
ChannelShuffle | 46.92 ± 2.29 | 19.99 ± 0.03 | 958.35 ± 0.20 | 0.05x |
CoarseDropout | 65.62 ± 1.82 | N/A | N/A | N/A |
ColorJitter | 10.67 ± 0.23 | 18.79 ± 0.03 | 68.75 ± 0.13 | 0.16x |
Contrast | 58.81 ± 1.10 | 21.69 ± 0.04 | 546.55 ± 13.23 | 0.11x |
CornerIllumination | 4.80 ± 0.47 | 2.60 ± 0.07 | N/A | 1.84x |
Elastic | 4.31 ± 0.07 | N/A | 126.83 ± 1.28 | 0.03x |
Equalize | 13.09 ± 0.22 | 4.21 ± 0.00 | 191.55 ± 1.25 | 0.07x |
Erasing | 69.44 ± 3.31 | N/A | 254.59 ± 6.57 | 0.27x |
GaussianBlur | 25.63 ± 0.42 | 21.61 ± 0.05 | 543.44 ± 11.50 | 0.05x |
GaussianIllumination | 7.10 ± 0.15 | 20.33 ± 0.08 | N/A | 0.35x |
GaussianNoise | 8.40 ± 0.19 | 22.38 ± 0.08 | N/A | 0.38x |
Grayscale | 152.01 ± 11.18 | 22.24 ± 0.04 | 838.40 ± 466.76 | 0.18x |
HSV | 6.48 ± 0.35 | N/A | N/A | N/A |
HorizontalFlip | 8.69 ± 0.21 | 21.86 ± 0.07 | 977.87 ± 49.03 | 0.01x |
Hue | 14.47 ± 0.33 | 19.53 ± 0.02 | N/A | 0.74x |
Invert | 67.77 ± 2.60 | 21.91 ± 0.23 | 843.27 ± 176.00 | 0.08x |
JpegCompression | 19.62 ± 0.20 | N/A | N/A | N/A |
LinearIllumination | 4.81 ± 0.25 | 4.29 ± 0.19 | N/A | 1.12x |
MedianBlur | 13.87 ± 0.33 | 8.39 ± 0.09 | N/A | 1.65x |
MotionBlur | 33.49 ± 0.66 | N/A | N/A | N/A |
Normalize | 21.70 ± 0.18 | 21.82 ± 0.02 | 460.80 ± 0.18 | 0.05x |
OpticalDistortion | 4.29 ± 0.10 | N/A | N/A | N/A |
Pad | 68.10 ± 0.91 | N/A | 759.68 ± 337.78 | 0.09x |
Perspective | 4.37 ± 0.08 | N/A | 434.75 ± 0.14 | 0.01x |
PlankianJitter | 21.29 ± 0.67 | 10.85 ± 0.01 | N/A | 1.96x |
PlasmaBrightness | 3.37 ± 0.03 | 16.94 ± 0.36 | N/A | 0.20x |
PlasmaContrast | 2.64 ± 0.01 | 16.97 ± 0.03 | N/A | 0.16x |
PlasmaShadow | 6.08 ± 0.05 | 19.03 ± 0.50 | N/A | 0.32x |
Posterize | 56.50 ± 2.44 | N/A | 631.46 ± 14.74 | 0.09x |
RGBShift | 31.73 ± 0.71 | 22.27 ± 0.04 | N/A | 1.42x |
Rain | 23.09 ± 1.52 | 3.77 ± 0.00 | N/A | 6.12x |
RandomCrop128 | 695.33 ± 29.37 | 65.33 ± 0.35 | 1132.79 ± 15.23 | 0.61x |
RandomGamma | 183.49 ± 6.45 | 21.63 ± 0.02 | N/A | 8.48x |
RandomResizedCrop | 15.48 ± 1.12 | 6.29 ± 0.03 | 182.09 ± 15.75 | 0.09x |
Resize | 15.67 ± 0.49 | 5.87 ± 0.03 | 139.96 ± 35.04 | 0.11x |
Rotate | 28.62 ± 0.76 | 21.53 ± 0.05 | 534.18 ± 0.16 | 0.05x |
SaltAndPepper | 9.88 ± 0.19 | 8.82 ± 0.12 | N/A | 1.12x |
Saturation | 8.42 ± 0.14 | 36.56 ± 0.12 | N/A | 0.23x |
Sharpen | 25.02 ± 0.30 | 17.86 ± 0.03 | 420.09 ± 8.99 | 0.06x |
Shear | 4.41 ± 0.08 | N/A | N/A | N/A |
Snow | 12.72 ± 0.21 | N/A | N/A | N/A |
Solarize | 52.02 ± 1.45 | 20.73 ± 0.02 | 628.42 ± 5.91 | 0.08x |
ThinPlateSpline | 4.30 ± 0.14 | 44.90 ± 0.67 | N/A | 0.10x |
VerticalFlip | 9.57 ± 0.27 | 21.96 ± 0.24 | 977.92 ± 5.22 | 0.01x |
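The Speedup column in the table above can be reproduced from the other columns. The sketch below is an illustration with a hypothetical helper name, using `None` to stand for an N/A cell; the sample values are copied from three rows of the table.

```python
from typing import Optional, Sequence


def speedup(albumentations: float, others: Sequence[Optional[float]]) -> Optional[float]:
    """Ratio of Albumentations throughput to the fastest other library.

    `None` stands for an N/A cell. Returns None when no other library
    supports the transform.
    """
    available = [value for value in others if value is not None]
    if not available:
        return None
    return albumentations / max(available)


# Throughputs copied from the table: (albumentations, [kornia, torchvision])
rows = {
    "Blur": (49.61, [20.61, None]),
    "Affine": (4.45, [21.39, 452.58]),
    "CLAHE": (8.89, [None, None]),
}

for name, (alb, others) in rows.items():
    ratio = speedup(alb, others)
    print(name, "N/A" if ratio is None else f"{ratio:.2f}x")
# Blur 2.41x
# Affine 0.01x
# CLAHE N/A
```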
system_info:
python_version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0]
platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.31
processor: x86_64
cpu_count: 64
timestamp: 2025-03-11T11:14:57.765540+00:00
library_versions:
torchvision: 0.21.0
numpy: 2.2.3
pillow: 11.1.0
opencv-python-headless: not installed
torch: 2.6.0
opencv-python: not installed
thread_settings:
environment: {'OMP_NUM_THREADS': '1', 'OPENBLAS_NUM_THREADS': '1', 'MKL_NUM_THREADS': '1', 'VECLIB_MAXIMUM_THREADS': '1', 'NUMEXPR_NUM_THREADS': '1'}
opencv: not installed
pytorch: {'threads': 32, 'gpu_available': True, 'gpu_device': 0, 'gpu_name': 'NVIDIA GeForce RTX 4090', 'gpu_memory_total': 23.55084228515625, 'gpu_memory_allocated': 15.05643081665039}
pillow: {'threads': 'unknown', 'simd': False}
benchmark_params:
num_videos: 200
num_runs: 10
max_warmup_iterations: 100
warmup_window: 5
warmup_threshold: 0.05
min_warmup_windows: 3
precision: torch.float16
system_info:
python_version: 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 10:37:40) [Clang 14.0.6 ]
platform: macOS-15.1-arm64-arm-64bit
processor: arm
cpu_count: 16
timestamp: 2025-03-11T01:57:36.320659+00:00
library_versions:
albumentations: 2.0.5
numpy: 2.2.3
pillow: 11.1.0
opencv-python-headless: 4.11.0.86
torch: 2.6.0
opencv-python: not installed
thread_settings:
environment: {'OMP_NUM_THREADS': '1', 'OPENBLAS_NUM_THREADS': '1', 'MKL_NUM_THREADS': '1', 'VECLIB_MAXIMUM_THREADS': '1', 'NUMEXPR_NUM_THREADS': '1'}
opencv: {'threads': 1, 'opencl': False}
pytorch: {'threads': 1, 'gpu_available': False, 'gpu_device': None}
pillow: {'threads': 'unknown', 'simd': False}
benchmark_params:
num_videos: 200
num_runs: 5
max_warmup_iterations: 100
warmup_window: 5
warmup_threshold: 0.05
min_warmup_windows: 3
system_info:
python_version: 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0]
platform: Linux-5.15.0-131-generic-x86_64-with-glibc2.31
processor: x86_64
cpu_count: 64
timestamp: 2025-03-11T00:46:14.791885+00:00
library_versions:
kornia: 0.8.0
numpy: 2.2.3
pillow: 11.1.0
opencv-python-headless: not installed
torch: 2.6.0
opencv-python: not installed
thread_settings:
environment: {'OMP_NUM_THREADS': '1', 'OPENBLAS_NUM_THREADS': '1', 'MKL_NUM_THREADS': '1', 'VECLIB_MAXIMUM_THREADS': '1', 'NUMEXPR_NUM_THREADS': '1'}
opencv: not installed
pytorch: {'threads': 32, 'gpu_available': True, 'gpu_device': 0, 'gpu_name': 'NVIDIA GeForce RTX 4090', 'gpu_memory_total': 23.55084228515625, 'gpu_memory_allocated': 15.05643081665039}
pillow: {'threads': 'unknown', 'simd': False}
benchmark_params:
num_videos: 200
num_runs: 5
max_warmup_iterations: 100
warmup_window: 5
warmup_threshold: 0.05
min_warmup_windows: 3
precision: torch.float16
The benchmark results show interesting trade-offs between CPU and GPU processing:
- CPU Advantages:
  - Better for simple transformations with low computational complexity
  - No data transfer overhead between CPU and GPU
  - More consistent performance across different transform types
- GPU Advantages:
  - Significantly faster for complex transformations
  - Better scaling with video resolution
  - More efficient for batch processing
Based on the benchmark results, we recommend:
- For simple transformations on a small number of videos, CPU processing may be sufficient
- For complex transformations or batch processing, GPU acceleration provides significant benefits
- Consider the specific transformations you need and their relative performance on CPU vs GPU
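To apply the last recommendation, one option is to look up the fastest library for each transform you actually need. The sketch below is a hypothetical helper built on a handful of throughputs copied from the results table (`None` marks N/A). Because the CPU and GPU numbers come from different machines, the outcome reflects these particular setups and may differ on other hardware.

```python
from typing import Dict, Optional, Tuple

# Throughputs (videos/second) copied from the results table; None marks N/A.
RESULTS: Dict[str, Dict[str, Optional[float]]] = {
    "RandomGamma": {"albumentations": 183.49, "kornia": 21.63, "torchvision": None},
    "Rotate": {"albumentations": 28.62, "kornia": 21.53, "torchvision": 534.18},
    "ThinPlateSpline": {"albumentations": 4.30, "kornia": 44.90, "torchvision": None},
    "CLAHE": {"albumentations": 8.89, "kornia": None, "torchvision": None},
}


def fastest_library(transform: str) -> Tuple[str, float]:
    """Return (library, throughput) for the fastest library supporting `transform`."""
    supported = {
        library: value
        for library, value in RESULTS[transform].items()
        if value is not None
    }
    best = max(supported, key=supported.get)
    return best, supported[best]


# Example pipeline: report the fastest option for each required transform.
for name in ("RandomGamma", "Rotate", "ThinPlateSpline", "CLAHE"):
    library, rate = fastest_library(name)
    print(f"{name}: {library} ({rate:.2f} videos/s)")
```

If the per-transform winners are spread across libraries, the cost of moving data between CPU and GPU can outweigh the gain from mixing them, so it is worth comparing whole-pipeline throughput as well.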