
Commit ce033f9

subinKim14 committed: initial commit (0 parents)

18 files changed: +2066 −0 lines

README.md

+104
# Scalable Neural Video Representations with Learnable Positional Features (NVP)

Official PyTorch implementation of
["**Scalable Neural Video Representations with Learnable Positional Features**"](https://arxiv.org/xxxxx) (NeurIPS 2022) by
[Subin Kim*](https://subin-kim-cv.github.io/),
[Sihyun Yu*](https://sihyun.me/),
[Jaeho Lee](https://jaeho-lee.github.io/),
and [Jinwoo Shin](https://alinlab.kaist.ac.kr/shin.html).

### [Project Page](https://subin-kim-cv.github.io/NVP) | [Paper](xxxx) | [Slide](https://subin-kim-cv.github.io/assets/2022_NVP/slide/kim2022NVP.pdf)
12+
13+
<p align="center">
14+
<img src=figures/teaser_dynamic_compressed.gif width="900">
15+
<img src=figures/teaser_compressed.gif width="900">
16+
</p>

## 1. Requirements

### Environments
Required packages are listed in `environment.yaml`. In addition, install the following packages:

```
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install git+https://github.com/subin-kim-cv/tiny-cuda-nn/#subdirectory=bindings/torch
```

* Note that this [repository](https://github.com/subin-kim-cv/tiny-cuda-nn) differs slightly from the original implementation of [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn).

### Datasets
First, download the UVG-HD dataset from the following link:

* [UVG-HD](http://ultravideo.fi/#testsequences)

Then, extract RGB sequences from the original YUV videos of UVG-HD using ffmpeg, where INPUT is the input file name and OUTPUT is the directory in which to save the decompressed RGB frames:

```
ffmpeg -f rawvideo -vcodec rawvideo -s 1920x1080 -r 120 -pix_fmt yuv420p \
  -i INPUT.yuv OUTPUT/f%05d.png
```
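For batch extraction across several sequences, the ffmpeg invocation above can be built programmatically. A minimal sketch, assuming the default UVG-HD geometry (1920x1080, 120 fps, yuv420p); `ffmpeg_extract_cmd` is a hypothetical helper, not part of this repository:

```python
import subprocess

def ffmpeg_extract_cmd(yuv_path: str, out_dir: str,
                       size: str = "1920x1080", fps: int = 120) -> list:
    """Build the ffmpeg argv that decodes a raw YUV file into PNG frames,
    mirroring the command in the README."""
    return [
        "ffmpeg", "-f", "rawvideo", "-vcodec", "rawvideo",
        "-s", size, "-r", str(fps), "-pix_fmt", "yuv420p",
        "-i", yuv_path, f"{out_dir}/f%05d.png",
    ]

# Example (file name is an assumption; adjust to your download):
cmd = ffmpeg_extract_cmd("Jockey.yuv", "data/Jockey")
# Execute with: subprocess.run(cmd, check=True)
```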

## 2. Training
Run the code with a single GPU:

```train
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/train_video.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./config/config_nvp_s.json
```

* Option `--logging_root` denotes the path in which to save the experiment log.
* Option `--experiment_name` denotes the subdirectory under `--logging_root` in which the log files (results, checkpoints, configuration, etc.) are saved.
* Option `--dataset` denotes the path of the RGB sequences (e.g., `~/data/Jockey`).
* Option `--num_frames` denotes the number of frames to reconstruct (300 for the ShakeNDry video and 600 for the other videos in UVG-HD).
* To reconstruct videos with 300 frames, change the value of `"t_resolution"` in the configuration file to 300.
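The `"t_resolution"` edit above can also be scripted. A minimal sketch, following the layout of `config/config_nvp_s.json` in this commit (the helper name is hypothetical):

```python
import json

def set_t_resolution(config: dict, num_frames: int) -> dict:
    """Match the temporal grid resolution to the number of video frames."""
    config["nvp"]["3d_encoding"]["t_resolution"] = num_frames
    return config

# Example with the relevant fragment of config_nvp_s.json;
# in practice, json.load the file, patch it, and json.dump it back.
config = {"nvp": {"3d_encoding": {"otype": "SparseGrid", "t_resolution": 600}}}
config = set_t_resolution(config, 300)  # e.g., for ShakeNDry (300 frames)
```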

## 3. Evaluation
Evaluation without compression of the parameters (quantization only):

```
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json
```

* Option `--save` denotes whether to save the reconstructed frames.
* One can specify the option `--s_interp` for video superresolution results; it denotes the superresolution scale (e.g., 8).
* One can specify the option `--t_interp` for video frame interpolation results; it denotes the temporal interpolation scale (e.g., 8).
63+
Evaluation with compression of parameters using well-known image and video codecs
64+
65+
1. Save the quantized parameters
66+
```
67+
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json
68+
```
69+
2. Compress the saved sparse positional image-/video-like features using codecs.
70+
* Execute "compression.ipynb".
71+
* Please change the logging_root and experiment_name in "compression.ipynb" appropriately.
72+
* One can change qscale, crf, framerate which changes the compression ratio of sparse positinal features.
73+
* qscale ranges from 1 to 31, where larger values mean the worse quality (2~5 recommended).
74+
* crf ranges from 0 to 51 where larger values mean the worse quality (20~25 recommended)
75+
* framerate (25 or 40 recommended)
76+
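Under the hood, the notebook's codec step reduces to two ffmpeg invocations: JPEG for the 2D keyframe features and HEVC for the 3D sparse-grid features. A minimal sketch of the command construction, with flags mirroring `compression.ipynb` (paths are placeholders):

```python
def jpeg_cmd(qscale: int, src_png: str, dst_jpg: str) -> list:
    """JPEG-compress one keyframe feature image at the given qscale (1-31)."""
    return ["ffmpeg", "-hide_banner", "-i", src_png,
            "-qscale:v", str(qscale), dst_jpg]

def hevc_cmd(crf: int, framerate: int, src_pattern: str, dst_mp4: str) -> list:
    """HEVC-compress the sparse-grid feature frames as a video;
    bframes=0 matches the notebook's x265 settings."""
    return ["ffmpeg", "-framerate", str(framerate), "-i", src_pattern,
            "-c:v", "hevc", "-preset", "slow", "-x265-params", "bframes=0",
            "-crf", str(crf), dst_mp4]

jpeg = jpeg_cmd(2, "keyframes/xy/dim0/00.png", "dst/xy/dim0/002/00.jpg")
hevc = hevc_cmd(21, 25, "sparsegrid/dim0/%05d.png", "dst/sparsegrid/dim0/21_25.mp4")
```

Run each list with `subprocess.run(..., check=True)`; the notebook uses IPython `!` shell magics for the same effect.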

3. Evaluate with the compressed parameters:
```
CUDA_VISIBLE_DEVICES=0 python experiment_scripts/eval_compression.py --logging_root ./logs_nvp --experiment_name <EXPERIMENT_NAME> --dataset <DATASET> --num_frames <NUM_FRAMES> --config ./logs_nvp/<EXPERIMENT_NAME>/config_nvp_s.json --qscale 2 3 3 --crf 21 --framerate 25
```
* Option `--save` denotes whether to save the reconstructed frames.
* Specify the options `--qscale`, `--crf`, and `--framerate` with the same values used in `compression.ipynb`.

## 4. Results
Reconstructed video results of NVP on UVG-HD, more temporally dynamic videos, and a 4K long video are available on the [project page](https://subin-kim-cv.github.io/NVP/).

Our model achieves the following performance on UVG-HD (Beauty, Bosphorus, Honeybee, Jockey, ReadySetGo, ShakeNDry, Yachtride) with a single NVIDIA V100 32GB GPU:

| Encoding Time | BPP | PSNR (&#8593;) | FLIP (&#8595;) | LPIPS (&#8595;) |
| -------------- | ------ | ------------------ | ------------------ | ------------------ |
| ~5 minutes | 0.901 | 34.57 $\pm$ 2.62 | 0.075 $\pm$ 0.021 | 0.190 $\pm$ 0.100 |
| ~10 minutes | 0.901 | 35.79 $\pm$ 2.31 | 0.065 $\pm$ 0.016 | 0.160 $\pm$ 0.098 |
| ~1 hour | 0.901 | 37.61 $\pm$ 2.20 | 0.052 $\pm$ 0.011 | 0.145 $\pm$ 0.106 |
| ~8 hours | 0.210 | 36.46 $\pm$ 2.18 | 0.067 $\pm$ 0.017 | 0.135 $\pm$ 0.083 |

## Citation
```
```

## Reference
We used code from the following repositories: [SIREN](https://github.com/vsitzmann/siren), [Modulation](https://github.com/lucidrains/siren-pytorch), and [tiny-cuda-nn](https://github.com/NVlabs/tiny-cuda-nn).

compression.ipynb

+114
{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "def jpeg_compression(scale, src, dst, config):\n",
        "\n",
        "    dim = config[\"n_features_per_level\"]\n",
        "    n_levels = config[\"n_levels\"]\n",
        "\n",
        "    for d in range(dim):\n",
        "        for i in range(n_levels):\n",
        "            src_path = os.path.join(src, f\"dim{d}\", f\"{str(i).zfill(2)}.png\")\n",
        "            save_path = os.path.join(dst, f\"dim{d}\", str(scale).zfill(3))\n",
        "            os.makedirs(save_path, exist_ok=True)\n",
        "            save_path = os.path.join(save_path, f\"{str(i).zfill(2)}.jpg\")\n",
        "\n",
        "            if os.path.isfile(save_path):\n",
        "                !rm $save_path\n",
        "            !ffmpeg -hide_banner -i $src_path -qscale:v $scale $save_path\n",
        "\n",
        "\n",
        "# keyint=7:min-keyint=7:no-scenecut:me=full:subme=7:bframes=0\n",
        "def hevc_compression(crf, framerate, src, dst, config):\n",
        "\n",
        "    dim = config[\"n_features_per_level\"]\n",
        "\n",
        "    for d in range(dim):\n",
        "        src_path = os.path.join(src, f\"dim{d}\", \"%05d.png\")\n",
        "        os.makedirs(os.path.join(dst, f\"dim{d}\"), exist_ok=True)\n",
        "        save_path = os.path.join(dst, f\"dim{d}\", f\"{crf}_{framerate}.mp4\")\n",
        "\n",
        "        if os.path.isfile(save_path):\n",
        "            !rm $save_path\n",
        "\n",
        "        !ffmpeg -framerate $framerate -i $src_path -c:v hevc -preset slow -x265-params bframes=0 -crf $crf $save_path\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Compress Learnable Keyframes and Sparse Grid"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import json\n",
        "datasets = [\"jockey\"]\n",
        "\n",
        "for data in datasets:\n",
        "    experiment_name = f\"{data}\"\n",
        "    config_path = f\"./logs_nvp/{experiment_name}/config.json\"\n",
        "    base_path = f\"./logs_nvp/{experiment_name}/compression\"\n",
        "\n",
        "\n",
        "    with open(config_path, 'r') as f:\n",
        "        config = json.load(f)\n",
        "\n",
        "    config = config[\"nvp\"]\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"xy\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"xy\")\n",
        "    jpeg_compression(scale=2, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_xy\"])\n",
        "\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"xt\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"xt\")\n",
        "    jpeg_compression(scale=3, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_xt\"])\n",
        "\n",
        "    keyframe_path = os.path.join(base_path, \"src\", \"keyframes\", \"yt\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"keyframes\", \"yt\")\n",
        "    jpeg_compression(scale=3, src=keyframe_path, dst=save_path, config=config[\"2d_encoding_yt\"])\n",
        "\n",
        "    sparsegrid_path = os.path.join(base_path, \"src\", \"sparsegrid\")\n",
        "    save_path = os.path.join(base_path, \"dst\", \"sparsegrid\")\n",
        "    hevc_compression(crf=21, framerate=25, src=sparsegrid_path, dst=save_path, config=config[\"3d_encoding\"])\n"
      ]
    }
  ],
  "metadata": {
    "interpreter": {
      "hash": "53f0588b3c374ffb8d6102e58c8bd6fde8c17a0cc886162e9d708a7e3ec6b0c9"
    },
    "kernelspec": {
      "display_name": "Python 3.7.13 ('inr')",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    },
    "orig_nbformat": 4
  },
  "nbformat": 4,
  "nbformat_minor": 2
}

config/config_nvp_l.json

+40
{
    "nvp": {
        "2d_encoding_xy": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_xt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_yt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 4,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "3d_encoding": {
            "otype": "SparseGrid",
            "n_features_per_level": 4,
            "x_resolution": 300,
            "y_resolution": 300,
            "t_resolution": 600,
            "upsample": false
        },
        "network": {
            "n_neurons": 128,
            "n_hidden_layers": 3
        }
    }
}

config/config_nvp_s.json

+40
{
    "nvp": {
        "2d_encoding_xy": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_xt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "2d_encoding_yt": {
            "otype": "DenseGrid",
            "n_levels": 16,
            "n_features_per_level": 2,
            "log2_hashmap_size": 32,
            "base_resolution": 16,
            "per_level_scale": 1.35
        },
        "3d_encoding": {
            "otype": "SparseGrid",
            "n_features_per_level": 2,
            "x_resolution": 300,
            "y_resolution": 300,
            "t_resolution": 600,
            "upsample": false
        },
        "network": {
            "n_neurons": 128,
            "n_hidden_layers": 3
        }
    }
}
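The small (`config_nvp_s.json`) and large (`config_nvp_l.json`) configs differ only in `n_features_per_level` (2 vs. 4) across all encodings. A minimal sketch for checking this kind of difference between two config dicts; the `config_diff` helper is hypothetical, not part of the repository:

```python
def config_diff(a: dict, b: dict, prefix: str = ""):
    """Yield (dotted_key_path, a_value, b_value) for leaf values that differ."""
    for k in a:
        va, vb = a[k], b.get(k)
        if isinstance(va, dict) and isinstance(vb, dict):
            yield from config_diff(va, vb, f"{prefix}{k}.")
        elif va != vb:
            yield (f"{prefix}{k}", va, vb)

# Usage with fragments of the two shipped configs; in practice,
# json.load both files and compare config_s["nvp"] vs config_l["nvp"].
nvp_s = {"3d_encoding": {"otype": "SparseGrid", "n_features_per_level": 2}}
nvp_l = {"3d_encoding": {"otype": "SparseGrid", "n_features_per_level": 4}}
for path, sv, lv in config_diff(nvp_s, nvp_l):
    print(path, sv, lv)
```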
