
Commit ce033f9

subinKim14 committed: initial commit (0 parents)

18 files changed: +2066 −0 lines

README.md

+104
# Scalable Neural Video Representations with Learnable Positional Features (NVP)

Official PyTorch implementation of
["**Scalable Neural Video Representations with Learnable Positional Features**"](https://arxiv.org/xxxxx) (NeurIPS 2022) by
[Subin Kim*](https://subin-kim-cv.github.io/),
[Sihyun Yu*](https://sihyun.me/),
[Jaeho Lee](https://jaeho-lee.github.io/),
and [Jinwoo Shin](https://alinlab.kaist.ac.kr/shin.html).

### [Project Page](https://subin-kim-cv.github.io/NVP) | [Paper](xxxx) | [Slide](https://subin-kim-cv.github.io/assets/2022_NVP/slide/kim2022NVP.pdf)
12+
13+
<p align="center">
14+
<img src=figures/teaser_dynamic_compressed.gif width="900">
15+
<img src=figures/teaser_compressed.gif width="900">
16+
</p>

## 1. Requirements

### Environments
Required packages are listed in `environment.yaml`. In addition, install the following packages:

```
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
```

* Note that this [repository](https://github.com/subin-kim-cv/tiny-cuda-nn) differs slightly from the original implementation of [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn).

### Datasets
First, download the UVG-HD dataset from the following link:

* [UVG-HD](http://ultravideo.fi/#testsequences)

Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg, where INPUT is the input file name and OUTPUT is the directory in which to save the decompressed RGB frames:

```
ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p \
  -i INPUT.yuv OUTPUT/f%05d.png
```
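For batch extraction across several sequences, the ffmpeg invocation above can be built programmatically. A minimal sketch, assuming the default UVG-HD geometry (1920x1080, 120 fps, yuv420p); `ffmpeg_extract_cmd` is a hypothetical helper, not part of this repository:

```python
import subprocess

def ffmpeg_extract_cmd(yuv_path: str, out_dir: str,
                       size: str = "1920x1080", fps: int = 120) -> list:
    """Build the ffmpeg argv that decodes a raw YUV file into PNG frames,
    mirroring the command in the README."""
    return [
        "ffmpeg", "-f", "rawvideo", "-vcodec", "rawvideo",
        "-s", size, "-r", str(fps), "-pix_fmt", "yuv420p",
        "-i", yuv_path, f"{out_dir}/f%05d.png",
    ]

# Example (file name is an assumption; adjust to your download):
cmd = ffmpeg_extract_cmd("Jockey.yuv", "data/Jockey")
# Execute with: subprocess.run(cmd, check=True)
```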

## 2. Training
Run the code with a single GPU:

```train
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json
```

* Option `--logging_root` denotes the path in which to save the experiment log.
* Option `--experiment_name` denotes the subdirectory under `--logging_root` in which the log files (results, checkpoints, configuration, etc.) are saved.
* Option `--dataset` denotes the path of the RGB sequences (e.g., `~/data/Jockey`).
* Option `--num_frames` denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for the other videos in UVG-HD).
* To reconstruct videos with 300 frames, change the value of `"t_resolution"` in the configuration file to 300.
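The `"t_resolution"` edit above can also be scripted. A minimal sketch, following the layout of `config/config_nvp_s.json` in this commit (the helper name is hypothetical):

```python
import json

def set_t_resolution(config: dict, num_frames: int) -> dict:
    """Match the temporal grid resolution to the number of video frames."""
    config["nvp"]["3d_encoding"]["t_resolution"] = num_frames
    return config

# Example with the relevant fragment of config_nvp_s.json;
# in practice, json.load the file, patch it, and json.dump it back.
config = {"nvp": {"3d_encoding": {"otype": "SparseGrid", "t_resolution": 600}}}
config = set_t_resolution(config, 300)  # e.g., for ShakeNDry (300 frames)
```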

## 3. Evaluation
Evaluation without compression of the parameters (quantization only):

```
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json
```

* Option `--save` denotes whether to save the reconstructed frames.
* One can specify the option `--s_interp` for video superresolution results; it denotes the superresolution scale (e.g., 8).
* One can specify the option `--t_interp` for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
63+
Evaluation with compression of parameters using well-known image and video codecs
64+
65+
1. Save the quantized parameters
66+
```
67+
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json
68+
```
69+
2. Compress the saved sparse positional image-/video-like features using codecs.
70+
* Execute "compression.ipynb".
71+
* Please change the logging_root and experiment_name in "compression.ipynb" appropriately.
72+
* One can change qscale, crf, framerate which changes the compression ratio of sparse positinal features.
73+
* qscale ranges from 1 to 31, where larger values mean the worse quality (2~5 recommended).
74+
* crf ranges from 0 to 51 where larger values mean the worse quality (20~25 recommended)
75+
* framerate (25 or 40 recommended)
76+
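Under the hood, the notebook's codec step reduces to two ffmpeg invocations: JPEG for the 2D keyframe features and HEVC for the 3D sparse-grid features. A minimal sketch of the command construction, with flags mirroring `compression.ipynb` (paths are placeholders):

```python
def jpeg_cmd(qscale: int, src_png: str, dst_jpg: str) -> list:
    """JPEG-compress one keyframe feature image at the given qscale (1-31)."""
    return ["ffmpeg", "-hide_banner", "-i", src_png,
            "-qscale:v", str(qscale), dst_jpg]

def hevc_cmd(crf: int, framerate: int, src_pattern: str, dst_mp4: str) -> list:
    """HEVC-compress the sparse-grid feature frames as a video;
    bframes=0 matches the notebook's x265 settings."""
    return ["ffmpeg", "-framerate", str(framerate), "-i", src_pattern,
            "-c:v", "hevc", "-preset", "slow", "-x265-params", "bframes=0",
            "-crf", str(crf), dst_mp4]

jpeg = jpeg_cmd(2, "keyframes/xy/dim0/00.png", "dst/xy/dim0/002/00.jpg")
hevc = hevc_cmd(21, 25, "sparsegrid/dim0/%05d.png", "dst/sparsegrid/dim0/21_25.mp4")
```

Run each list with `subprocess.run(..., check=True)`; the notebook uses IPython `!` shell magics for the same effect.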

3. Evaluate with the compressed parameters:
```
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
```
* Option `--save` denotes whether to save the reconstructed frames.
* Specify the options `--qscale`, `--crf`, and `--framerate` with the same values used in `compression.ipynb`.

## 4. Results
Reconstructed video results of NVP on UVG-HD, more temporally dynamic videos, and a 4K long video are available on the [project page](https://subin-kim-cv.github.io/NVP/).

Our model achieves the following performance on UVG-HD (Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, Yachtride) with a single NVIDIA V100 32GB GPU:

| Encoding Time | BPP | PSNR (&#8593;) | FLIP (&#8595;) | LPIPS (&#8595;) |
| -------------- | ------ | ------------------ | ------------------ | ------------------ |
| ~5 minutes | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100 |
| ~10 minutes | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098 |
| ~1 hour | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106 |
| ~8 hours | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083 |

## Citation
```
```

## Reference
We used code from the following repositories: [SIREN](https://github.com/vsitzmann/siren), [Modulation](https://github.com/lucidrains/siren-pytorch), and [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn).

compression.ipynb

+114
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "def jpeg_compression(scale, src, dst, config):\n",
        "\n",
        "    dim = config[\"n_features_per_level\"]\n",
        "    n_levels = config[\"n_levels\"]\n",
        "\n",
        "    for d in range(dim):\n",
        "        for i in range(n_levels):\n",
        "            src_path = os.path.join(src, f\"dim{d}\", f\"{str(i).zfill(2)}.png\")\n",
        "            save_path = os.path.join(dst, f\"dim{d}\", str(scale).zfill(3))\n",
        "            os.makedirs(save_path, exist_ok=True)\n",
        "            save_path = os.path.join(save_path, f\"{str(i).zfill(2)}.jpg\")\n",
        "\n",
        "            if os.path.isfile(save_path):\n",
        "                !rm $save_path\n",
        "            !ffmpeg -hide_banner -i $src_path -qscale:v $scale $save_path\n",
        "\n",
        "\n",
        "# keyint=7:min-keyint=7:no-scenecut:me=full:subme=7:bframes=0\n",
        "def hevc_compression(crf, framerate, src, dst, config):\n",
        "\n",
        "    dim = config[\"n_features_per_level\"]\n",
        "\n",
        "    for d in range(dim):\n",
        "        src_path = os.path.join(src, f\"dim{d}\", \"%05d.png\")\n",
        "        os.makedirs(os.path.join(dst, f\"dim{d}\"), exist_ok=True)\n",
        "        save_path = os.path.join(dst, f\"dim{d}\", f\"{crf}_{framerate}.mp4\")\n",
        "\n",
        "        if os.path.isfile(save_path):\n",
        "            !rm $save_path\n",
        "\n",
        "        !ffmpeg -framerate $framerate -i $src_path -c:v hevc -preset slow -x265-params bframes=0 -crf $crf $save_path\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compress Learnable Keyframes and Sparse Grid"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "datasets = [\"jockey\"]\n",
        "\n",
        "for data in datasets:\n",
        "    experiment_name = f\"{data}\"\n",
        "    config_path = f\"./logs_nvp/{experiment_name}/config.json\"\n",
        "    base_path = f\"./logs_nvp/{experiment_name}/compression\"\n",
        "\n",
        "\n",
        "    with open(config_path, 'r') as f:\n",
        "        config = json.load(f)\n",
        "\n",
        "    config = config[\"nvp\"]\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"xy\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"xy\")\n",
        "    jpeg_compression(scale=2, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_xy\"])\n",
        "\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"xt\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"xt\")\n",
        "    jpeg_compression(scale=3, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_xt\"])\n",
        "\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"yt\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"yt\")\n",
        "    jpeg_compression(scale=3, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_yt\"])\n",
        "\n",
        "    sparsegrid_path = os.path.join(base_path, \"src\", \"sparsegrid\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"sparsegrid\")\n",
        "    hevc_compression(crf=21, framerate=25, src=sparsegrid_path, dst=save_path, config=config[\"3d_encoding\"])\n"
      ]
    }
  ],
  "metadata": {
    "interpreter": {
      "hash": "53f0588b3c374ffb8d6102e58c8bd6fde8c17a0cc886162e9d708a7e3ec6b0c9"
    },
    "kernelspec": {
      "display_name": "Python 3.7.13 ('inr')",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    },
    "orig_nbformat": 4
  },
  "nbformat": 4,
  "nbformat_minor": 2
}

config/config_nvp_l.json

+40
{
    "nvp": {
        "2d_encoding_xy": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_xt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_yt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "3d_encoding": {
            "otype": "SparseGrid",
            "n_features_per_level": 4,
            "x_resolution": 300,
            "y_resolution": 300,
            "t_resolution": 600,
            "upsample": false
        },
        "network": {
            "n_neurons": 128,
            "n_hidden_layers": 3
        }
    }
}

config/config_nvp_s.json

+40
{
    "nvp": {
        "2d_encoding_xy": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_xt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_yt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "3d_encoding": {
            "otype": "SparseGrid",
            "n_features_per_level": 2,
            "x_resolution": 300,
            "y_resolution": 300,
            "t_resolution": 600,
            "upsample": false
        },
        "network": {
            "n_neurons": 128,
            "n_hidden_layers": 3
        }
    }
}
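The small (`config_nvp_s.json`) and large (`config_nvp_l.json`) configs differ only in `n_features_per_level` (2 vs. 4) across all encodings. A minimal sketch for checking this kind of difference between two config dicts; the `config_diff` helper is hypothetical, not part of the repository:

```python
def config_diff(a: dict, b: dict, prefix: str = ""):
    """Yield (dotted_key_path, a_value, b_value) for leaf values that differ."""
    for k in a:
        va, vb = a[k], b.get(k)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from config_diff(va, vb, f"{prefix}{k}.")
        elif va != vb:
            yield (f"{prefix}{k}", va, vb)

# Usage with fragments of the two shipped configs; in practice,
# json.load both files and compare config_s["nvp"] vs config_l["nvp"].
nvp_s = {"3d_encoding": {"otype": "SparseGrid", "n_features_per_level": 2}}
nvp_l = {"3d_encoding": {"otype": "SparseGrid", "n_features_per_level": 4}}
for path, sv, lv in config_diff(nvp_s, nvp_l):
    print(path, sv, lv)
```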
