Fast SR-UNet

This repository contains the implementation of [1]: an architecture combined with a GAN-based training procedure for obtaining a fast neural network that achieves a lower bitrate than the H.265 codec at the same quality, or better quality at the same bitrate.

Requirements:

  • PyTorch with CUDA support: $ conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
  • LPIPS: $ pip install lpips
  • FFmpeg compiled with the H.265 codec and with the VMAF metric enabled. My build is included in the helper/ directory, but it will likely not work on other machines; for reference, check the official FFmpeg compilation guide and the VMAF GitHub repository.
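Before preparing the dataset, it may help to sanity-check the environment. This quick snippet (not part of the repository) verifies that CUDA and LPIPS are usable:

    # Quick environment check (not part of this repository).
    import torch
    import lpips

    print("CUDA available:", torch.cuda.is_available())

    # LPIPS expects inputs in [-1, 1].
    loss_fn = lpips.LPIPS(net='alex')
    x = torch.rand(1, 3, 64, 64) * 2 - 1
    y = torch.rand(1, 3, 64, 64) * 2 - 1
    print("LPIPS distance:", loss_fn(x, y).item())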

The dataset:

First, the dataset we use for training is BVI-DVC. Two helper scripts prepare it: compress_train_videos.sh spatially compresses and encodes each video, and extract_train_frames.sh extracts the training frames.
The training dataset should follow this naming scheme (assuming the videos are encoded with CRF 23):

  [DATASET_DIR]/
      frames_HQ/
          [clipName1]/
              [clipName1]_001.png
              [clipName1]_002.png
              ...
              [clipName1]_064.png
          [clipName2]/
              ...
          [clipNameN]/
              ...
      frames_QF23/
          [clipName1]/
              [clipName1]_001.png
              [clipName1]_002.png
              ...
              [clipName1]_064.png
          [clipName2]/
              ...
          [clipNameN]/
              ...
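As an illustration of how this paired layout could be consumed, below is a hedged sketch of a PyTorch dataset that matches each compressed frame with its high-quality counterpart; PairedFramesDataset is a hypothetical name, and the repository's actual loader may differ:

    # Hypothetical loader for the paired layout above; the repository's
    # own dataset class may differ.
    from pathlib import Path
    from PIL import Image
    from torch.utils.data import Dataset
    import torchvision.transforms.functional as TF

    class PairedFramesDataset(Dataset):
        def __init__(self, dataset_dir, crf=23):
            hq_root = Path(dataset_dir) / "frames_HQ"
            lq_root = Path(dataset_dir) / f"frames_QF{crf}"
            # HQ and compressed frames share clip and file names.
            self.pairs = [
                (lq_root / p.parent.name / p.name, p)
                for p in sorted(hq_root.glob("*/*.png"))
            ]

        def __len__(self):
            return len(self.pairs)

        def __getitem__(self, idx):
            lq_path, hq_path = self.pairs[idx]
            lq = TF.to_tensor(Image.open(lq_path).convert("RGB"))
            hq = TF.to_tensor(Image.open(hq_path).convert("RGB"))
            return lq, hq  # compressed input, high-quality target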

Training the model:

To train the SR-UNet described in the paper for 2x super resolution (as used in the model for the 540p -> 1080p upscaling), you can use this command:

$ python train.py --arch srunet --device 0 --upscale 2 --export [EXPORT_DIR] \
--epochs 80 --dataset [DATASET_DIR] --crf 23

Or, since most of these arguments are defaults, simply

$ python train.py --dataset [DATASET_DIR]

For more information about the other parameters, inspect utils.py or try

$ python train.py -h

However, in the bandwidth experiments we employed a lighter model, trained on a range of CRFs, which performs an easier 1.5x upscale (720p -> 1080p). It can be trained with the following command:

$ python train.py --arch srunet --layer_multiplier 0.7 --n_filters 48 --downsample 0.75 --device 0 \
--upscale 2 --export [EXPORT_DIR] --epochs 80 --dataset [DATASET_DIR] --crf [CRF]
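As background on what a GAN-based training step optimizes here, the following is a hedged sketch combining a pixel term, an LPIPS perceptual term, and an adversarial term; the actual losses, weights, and discriminator are defined in train.py and may differ:

    # Hedged sketch of one generator update in GAN-based SR training with
    # an LPIPS perceptual term; the real losses and weights in train.py
    # may differ.
    import torch
    import torch.nn.functional as F
    import lpips

    perceptual = lpips.LPIPS(net='vgg').cuda()

    def generator_loss(G, D, lq, hq, w_adv=1e-3):
        sr = G(lq)                        # upscaled prediction
        loss_pix = F.l1_loss(sr, hq)      # pixel fidelity
        # LPIPS expects inputs scaled to [-1, 1]
        loss_per = perceptual(sr * 2 - 1, hq * 2 - 1).mean()
        # adversarial term: push the discriminator towards "real"
        logits = D(sr)
        loss_adv = F.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
        return loss_pix + loss_per + w_adv * loss_adv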

Testing the models:

You may want to test your models. In our paper we tested on the 1080p clips available from [Derf's Collection](https://media.xiph.org/video/derf/) in Y4M format. To prepare the test set of encoded clips, you can use the compress_test_videos.sh helper script. This time the test set is structured as follows, and there is no need to extract each frame:

    [TEST_DIR]/
        encoded540CRF23/
            aspen_1080p.mp4
            crowd_run_1080p50.mp4
            ducks_take_off_1080p50.mp4
            ...
            touchdown_pass_1080p.mp4
        aspen_1080p.y4m
        crowd_run_1080p50.y4m
        ducks_take_off_1080p50.y4m
        ...
        touchdown_pass_1080p.y4m

Finally, to test a model (e.g. the one performing the 1.5x upscale) whose name is [MODEL_NAME], you can use the command:

$ python evaluate_model.py --model [MODEL_NAME] --arch srunet --layer_mult 0.7 --n_filters 48 \
--downsample 0.75 --device 0 --upscale 2 --crf 23 --test_dir [TEST_DIR] --testinputres 720 --testoutputres 1080

The experimental results are then printed on screen and also saved to a .csv file.
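If you want to score a single clip by hand with the same VMAF metric, a hedged sketch using an FFmpeg build with libvmaf enabled (file names below are placeholders) could look like:

    # Hedged sketch: scoring one upscaled clip against its reference with
    # an FFmpeg build that includes libvmaf (file names are placeholders).
    import subprocess

    cmd = [
        "ffmpeg",
        "-i", "upscaled_1080p.mp4",    # distorted ("main") input
        "-i", "reference_1080p.y4m",   # pristine reference
        "-lavfi", "libvmaf", "-f", "null", "-",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    # FFmpeg reports the aggregate score on stderr, e.g. "VMAF score: 93.4"
    print([line for line in out.splitlines() if "VMAF score" in line])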

Inference with the model using render.py

You can use the script render.py to run the model in real time and upscale your clips. Examples:

  • For 2x upscaling
    $ python render.py --clipname path/to/clip.mp4 --model models/srunet_2x_crf23.pth
    
  • For 1.5x upscaling
    $ python render.py --clipname path/to/clip.mp4 --model models/srunet_1.5x_crf23.pth \
    --layer_mult 0.7 --n_filters 48 --downsample 0.75
    

You will notice that by default the output is split into two halves: the input is on the left, and the upscaled version on the right. You can show only the upscaled version by adding the flag --show-only-upscaled.

Regarding performance, a GTX 1080 Ti is enough to render at 30 fps when upscaling 540p -> 1080p, and at 25 fps when upscaling 720p -> 1080p. Note that in the paper we also employed Nvidia Apex to speed up inference.
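Apex-style mixed precision is now available natively in PyTorch; purely as a hedged sketch (this is not what render.py does verbatim, and upscale_frame is a hypothetical helper), half-precision inference could look like:

    # Hedged sketch: half-precision inference with PyTorch-native AMP, a
    # modern stand-in for the Nvidia Apex setup mentioned above.
    import torch

    @torch.no_grad()
    def upscale_frame(model: torch.nn.Module, frame: torch.Tensor) -> torch.Tensor:
        """Upscale a single CHW frame tensor under autocast mixed precision."""
        model = model.eval().cuda()
        with torch.cuda.amp.autocast():
            sr = model(frame.unsqueeze(0).cuda())
        return sr.squeeze(0).float().cpu()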

Examples

Example clips: TajMahal (check this link for the complete clip) and Venice.

References:

This code is the implementation of my Master's Degree thesis, from which my supervisors and I wrote the paper:

  • [1] Federico Vaccaro, Marco Bertini, Tiberio Uricchio, and Alberto Del Bimbo, "Fast video visual quality and resolution improvement using SR-UNet", accepted at ACM MM '21.
