DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video (AAAI2023)

Paper | Demo video | Supplementary materials

🤔 How was this reduction in inference latency achieved?

To achieve this, several changes were implemented:

  • Removed DeepSpeech and switched to wav2vec for audio feature extraction, which runs quickly in PyTorch.
  • Trained a lightweight model that maps wav2vec features to DeepSpeech features, so the rest of the pipeline stays unchanged.
  • Sped up frame extraction.

Together, these changes reduce inference latency by up to 60% compared to the original implementation while maintaining output quality.
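A latency figure like the one above depends on hardware and input length, so it can be worth measuring on your own machine. The following is a minimal sketch of how that comparison could be done, not the benchmark used for this repository. `run_baseline` and `run_optimized` are hypothetical stand-ins for the two pipelines, and the helper names are assumptions made for illustration.

```python
import time
from statistics import median


def median_latency(fn, repeats=5):
    """Return the median wall-clock time in seconds over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def latency_reduction_percent(baseline_s, optimized_s):
    """Return how much lower optimized_s is than baseline_s, as a percentage."""
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (baseline_s - optimized_s) / baseline_s


# Hypothetical stand-ins: replace these with calls into the two pipelines.
def run_baseline():
    time.sleep(0.02)


def run_optimized():
    time.sleep(0.01)


if __name__ == "__main__":
    base = median_latency(run_baseline)
    fast = median_latency(run_optimized)
    print(f"latency reduction: {latency_reduction_percent(base, fast):.1f}%")
```

Using the median rather than the mean keeps a single slow run (for example, a cold model load) from dominating the result.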

Additionally, a Docker setup has been added to make facial landmark extraction faster, simpler, and more automated.

Tested on:

  • Windows 11
  • Python version >= 3.9

📖 Prerequisites

To get started, follow these steps:

  • Download the resources (asserts.zip) from Google Drive. Unzip the file and place the resulting directory in the current directory (./). The zip file includes the model that maps wav2vec features to DeepSpeech features, along with all the other models.
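After unzipping, a quick check that the directory landed in the right place can save a failed inference run later. The sketch below only checks the two paths that the inference command in this README refers to. The full contents of asserts.zip are not listed here, so `EXPECTED_PATHS` is an assumption and likely incomplete, and `missing_assets` is a hypothetical helper name.

```python
from pathlib import Path

# Paths referenced by the inference command in this README.
# Assumption: the archive may contain more files than these.
EXPECTED_PATHS = [
    "asserts/clip_training_DINet_256mouth.pth",
    "asserts/examples",
]


def missing_assets(root=".", expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing resources:", ", ".join(missing))
    else:
        print("All expected resources found.")
```

Run it from the repository root, since the paths are resolved relative to the current directory.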

Install Instructions

Set up a Conda environment by executing the following commands.

  conda create -n dinet python=3.9
  conda activate dinet

Clone repository

  git clone https://github.com/illeng/DINet_optimized_Win.git
  cd DINet_optimized_Win

Install Dependencies

  pip install -r requirements.txt

Install torch 1.11.0

  pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html

Install tensorflow 2.5.0

  pip install tensorflow==2.5.0

Install pysoundfile

  conda install -c conda-forge pysoundfile
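Because the steps above pin specific versions, it can help to confirm what actually ended up in the environment. The following sketch compares installed versions against the pins from the commands above using the standard-library `importlib.metadata`. `check_pins` and `public_version` are hypothetical helper names, and ignoring local build tags such as `+cu113` is a simplifying assumption.

```python
from importlib import metadata

# Versions pinned by the install commands in this README.
PINNED = {
    "torch": "1.11.0",
    "torchvision": "0.12.0",
    "torchaudio": "0.11.0",
    "tensorflow": "2.5.0",
}


def public_version(version):
    """Drop a local build tag, so '1.11.0+cu113' compares as '1.11.0'."""
    return version.split("+", 1)[0]


def check_pins(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for each mismatch."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or public_version(installed) != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pins().items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the expected version.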

🚀 Inference

Run inference with example videos:

  python inference.py --mouth_region_size=256 --source_video_path=./asserts/examples/testxxx.mp4 --source_openface_landmark_path=./asserts/examples/testxxx.csv --driving_audio_path=./asserts/examples/driving_audio_xxx.wav --pretrained_clip_DINet_path=./asserts/clip_training_DINet_256mouth.pth
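The command is long, so when running several videos it may be convenient to build it from Python. This is a minimal sketch that uses only the flags shown in the command above. `build_inference_command` and `run_inference` are hypothetical helpers and are not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_command(video, landmarks, audio, checkpoint,
                            mouth_region_size=256):
    """Build the argument list for inference.py from the README's flags."""
    return [
        sys.executable, "inference.py",
        f"--mouth_region_size={mouth_region_size}",
        f"--source_video_path={video}",
        f"--source_openface_landmark_path={landmarks}",
        f"--driving_audio_path={audio}",
        f"--pretrained_clip_DINet_path={checkpoint}",
    ]


def run_inference(video, landmarks, audio, checkpoint, mouth_region_size=256):
    """Check that all input files exist, then run inference.py."""
    inputs = (video, landmarks, audio, checkpoint)
    missing = [str(p) for p in inputs if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"missing input files: {missing}")
    cmd = build_inference_command(video, landmarks, audio, checkpoint,
                                  mouth_region_size)
    # check=True raises CalledProcessError if inference.py exits with an error.
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, which are common on Windows.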

To run inference on a custom video, first use OpenFace to detect smooth facial landmarks for it.
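Before passing a landmark file to inference, it can be useful to confirm that it parses and has the expected shape. The sketch below is not the loader used by inference.py. It assumes the CSV has a header row with columns named `x_0` to `x_67` and `y_0` to `y_67` (a 68-point 2D layout), possibly with spaces after the commas. `load_openface_landmarks` is a hypothetical helper name.

```python
import csv

NUM_LANDMARKS = 68  # assumption: 68-point 2D landmark layout


def load_openface_landmarks(lines):
    """Parse landmark rows from an iterable of CSV lines (e.g. an open file).

    Returns one entry per data row, each a list of 68 (x, y) float tuples.
    Raises KeyError if an expected landmark column is missing.
    """
    # skipinitialspace drops the space that may follow each comma.
    reader = csv.DictReader(lines, skipinitialspace=True)
    frames = []
    for row in reader:
        # Strip any remaining whitespace around column names.
        row = {key.strip(): value for key, value in row.items()}
        points = [
            (float(row[f"x_{i}"]), float(row[f"y_{i}"]))
            for i in range(NUM_LANDMARKS)
        ]
        frames.append(points)
    return frames
```

For a file on disk, open it with `open(path, newline="")` and pass the file object as `lines`.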

Acknowledgements

The AdaAT operator is borrowed from AdaAT. The DeepSpeech feature extraction is borrowed from AD-NeRF. The basic modules are borrowed from first-order. Thanks to the authors for releasing their code.