Kai Zhang1, Zhenyu Zhang1, Jian Yang1, Zhenheng Yang2, Ying Tai1†
- 2025.01.09: The online demo of STAR is now live! Note that because of ZeroGPU's duration limit, a run may exceed the allocated GPU time. To avoid this, you can duplicate the demo and assign it a paid GPU.
- 2025.01.07: The pretrained STAR models (I2VGen-XL and CogVideoX-5B versions) and the inference code have been released.
- Inference code
- Online demo
- Training code
👀 More visual results can be found on our Project Page and in our Video Demo.
```shell
## git clone this repository
git clone https://github.com/NJU-PCALab/STAR.git
cd STAR

## create an environment
conda create -n star python=3.10
conda activate star
pip install -r requirements.txt
sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
```
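Before running inference, it can help to confirm that the tools used in the installation steps are on your `PATH`. The helper below is a minimal sketch; the function name `check_deps` is hypothetical and is not part of the STAR repository.

```shell
# Hypothetical helper (not part of the STAR repo): report any required
# command-line tools that cannot be found on PATH.
# Returns 0 if every tool is found, 1 otherwise.
check_deps() {
  local missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: check the tools used in the installation steps.
# check_deps conda pip ffmpeg && echo "all dependencies found"
```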
| Base Model | Type | URL |
|---|---|---|
| I2VGen-XL | Light Degradation | 🔗 |
| I2VGen-XL | Heavy Degradation | 🔗 |
| CogVideoX-5B | Heavy Degradation | 🔗 |
Step 1: Download the pretrained STAR model from HuggingFace.

We provide two versions of the I2VGen-XL-based model: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (e.g., low-resolution videos downloaded from video websites). Put the weights in `pretrained_weight/`.
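Which I2VGen-XL weight to use depends on how degraded the input is. The sketch below maps a degradation level to the corresponding file under `pretrained_weight/` and checks that the file is present; the function name `select_weight` and its `light`/`heavy` arguments are hypothetical, not part of the repository.

```shell
# Hypothetical helper: print the path of the I2VGen-XL-based STAR weight
# for a degradation level ("light" or "heavy"). An optional second argument
# overrides the weight directory (default: pretrained_weight).
select_weight() {
  local level="$1" dir="${2:-pretrained_weight}" file
  case "$level" in
    light) file="light_deg.pt" ;;
    heavy) file="heavy_deg.pt" ;;
    *) echo "unknown degradation level: $level" >&2; return 2 ;;
  esac
  if [ ! -f "$dir/$file" ]; then
    echo "weight not found: $dir/$file" >&2
    return 1
  fi
  echo "$dir/$file"
}

# Example: a low-resolution video downloaded from a video website.
# model_path=$(select_weight light) || exit 1
```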
Put the test videos in `input/video/`.
For the prompt, there are three options:
1. No prompt.
2. Automatically generate a prompt (e.g., using Pllava).
3. Manually write the prompt.

Put the prompt `.txt` file in `input/text/`.
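A minimal sketch of preparing the input folders, including option 3 (a manually written prompt). The helper name `prepare_inputs` is hypothetical, and writing one prompt per line to `input/text/prompt.txt` is an assumption; check how `inference_sr.sh` reads `txt_file_path` before relying on this layout.

```shell
# Hypothetical helper: create the input folders under a project root,
# copy one test video in, and optionally append a manually written prompt.
# ASSUMPTION: prompts are stored one per line in input/text/prompt.txt;
# verify the expected format against inference_sr.sh.
prepare_inputs() {
  local root="$1" video="$2" prompt="${3:-}"
  mkdir -p "$root/input/video" "$root/input/text" || return 1
  cp "$video" "$root/input/video/" || return 1
  if [ -n "$prompt" ]; then
    # Option 3: a manually written prompt (omit it for option 1, no prompt).
    printf '%s\n' "$prompt" >> "$root/input/text/prompt.txt"
  fi
}

# Example (hypothetical file name):
# prepare_inputs . ~/Downloads/clip.mp4 "A dog running on the beach."
```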
Change the paths in `video_super_resolution/scripts/inference_sr.sh` to your corresponding local paths, including `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`.
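If you prefer not to edit the script by hand, the paths can be set with `sed`. This is a sketch that assumes each variable is assigned at the start of its own line as `name=value` (or `name="value"`); the helper name `set_script_var` and the example paths are hypothetical, so check the script's actual syntax first. The original file is kept as `inference_sr.sh.bak`.

```shell
# Hypothetical helper: rewrite a "name=value" assignment line in a shell script.
# ASSUMPTION: the variable is assigned at the start of a line, e.g. save_dir=...
set_script_var() {
  local file="$1" name="$2" value="$3"
  # '|' is the sed delimiter so '/' in paths needs no escaping.
  # Values containing '|', '&' or a single quote would need extra escaping.
  # -i.bak edits in place and keeps a backup (works with GNU and BSD sed).
  sed -i.bak "s|^${name}=.*|${name}='${value}'|" "$file"
}

script=video_super_resolution/scripts/inference_sr.sh
# set_script_var "$script" video_folder_path "$PWD/input/video"
# set_script_var "$script" txt_file_path "$PWD/input/text/prompt.txt"
# set_script_var "$script" model_path "$PWD/pretrained_weight/heavy_deg.pt"
# set_script_var "$script" save_dir "$PWD/results"
```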
```shell
bash video_super_resolution/scripts/inference_sr.sh
```
If you encounter an out-of-memory (OOM) error, set a smaller `frame_length` in `inference_sr.sh`.
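One simple way to back off after an OOM error is to halve `frame_length` and rerun. The sketch below assumes `frame_length` is assigned as a plain integer at the start of a line (`frame_length=N`) in `inference_sr.sh`; the helper name `halve_frame_length` is hypothetical.

```shell
# Hypothetical helper: halve the integer frame_length assigned in a script
# (never going below 1) and print the new value.
# ASSUMPTION: the script contains a line of the form frame_length=N.
halve_frame_length() {
  local file="$1" current new
  current=$(sed -n 's/^frame_length=\([0-9][0-9]*\).*/\1/p' "$file" | head -n 1)
  if [ -z "$current" ]; then
    echo "no 'frame_length=N' line found in $file" >&2
    return 1
  fi
  new=$(( current / 2 ))
  if [ "$new" -lt 1 ]; then
    new=1
  fi
  # Replace only the number; keep a backup of the original script.
  sed -i.bak "s/^frame_length=[0-9][0-9]*/frame_length=${new}/" "$file"
  echo "$new"
}

# Example: halve frame_length, then rerun inference.
# halve_frame_length video_super_resolution/scripts/inference_sr.sh
# bash video_super_resolution/scripts/inference_sr.sh
```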
Refer to these instructions for inference with the CogVideoX-5B-based model.
Please note that the CogVideoX-5B-based model supports only 720x480 input.
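Since the CogVideoX-5B-based model accepts only 720x480 input, videos at other resolutions need to be resized first. A minimal sketch using `ffmpeg` (installed in the dependency step): it scales the video to fit inside 720x480 (width x height) while preserving the aspect ratio, then pads the remainder with black bars. The helper name `to_720x480` is hypothetical, and whether padding or stretching (`-vf scale=720:480`) gives better results with STAR is not stated here.

```shell
# Hypothetical helper: resize a video to exactly 720x480 (width x height),
# keeping the aspect ratio and padding with black bars where needed.
to_720x480() {
  local in="$1" out="$2"
  ffmpeg -y -loglevel error -i "$in" \
    -vf "scale=720:480:force_original_aspect_ratio=decrease,pad=720:480:(ow-iw)/2:(oh-ih)/2" \
    "$out"
}

# Example (hypothetical file names):
# to_720x480 input/video/clip.mp4 input/video/clip_720x480.mp4
```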
This project is based on I2VGen-XL, VEnhancer, CogVideoX, and OpenVid-1M. Thanks for their awesome work.
If our project helps your research or work, please consider citing our paper:
```bibtex
@misc{xie2025starspatialtemporalaugmentationtexttovideo,
  title={STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution},
  author={Rui Xie and Yinhong Liu and Penghao Zhou and Chen Zhao and Jun Zhou and Kai Zhang and Zhenyu Zhang and Jian Yang and Zhenheng Yang and Ying Tai},
  year={2025},
  eprint={2501.02976},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.02976},
}
```
If you have any questions, please don't hesitate to reach out by email.
The I2VGen-XL-based models are distributed under the terms of the MIT License.
The CogVideoX-5B-based model is distributed under the terms of the CogVideoX License.