This repository is an official implementation of the paper "Progressive Focused Transformer for Single Image Super-Resolution", CVPR, 2025.
By Wei Long, Xingyu Zhou, Leheng Zhang, and Shuhang Gu.
Abstract: Transformer-based methods have achieved remarkable results in image super-resolution tasks because they can capture non-local dependencies in low-quality input images. However, this feature-intensive modeling approach is computationally expensive because it calculates the similarities between numerous features that are irrelevant to the query features when obtaining attention weights. These unnecessary similarity calculations not only degrade the reconstruction performance but also introduce significant computational overhead. How to accurately identify the features that are important to the current query features and avoid similarity calculations between irrelevant features remains an urgent problem. To address this issue, we propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance on various single image super-resolution benchmarks.
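To make the PFA idea concrete, below is a generic, self-contained sketch of "focused" attention that restricts each query to its top-k most similar keys before the softmax. It only illustrates the principle; it is not the PFA module in `basicsr/archs/pft_arch.py`, and the tensor shapes and `top_k` value are arbitrary assumptions.

```python
# Generic sketch of "focused" attention: each query attends only to its
# top-k most similar keys instead of all tokens. This illustrates the idea
# only; it is NOT the PFA implementation in basicsr/archs/pft_arch.py.
import torch

def topk_focused_attention(q, k, v, top_k=32):
    """q, k, v: (B, N, C) token features; returns (B, N, C) aggregated values."""
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale      # (B, N, N) full similarity matrix
    top_val, top_idx = attn.topk(top_k, dim=-1)   # keep only the k best keys per query
    sparse = torch.full_like(attn, float('-inf'))
    sparse.scatter_(-1, top_idx, top_val)         # mask out all other positions
    weights = sparse.softmax(dim=-1)              # softmax over the kept keys only
    return weights @ v

# Toy usage with assumed shapes.
q = k = v = torch.randn(1, 64, 48)
out = topk_focused_attention(q, k, v, top_k=8)
print(out.shape)  # torch.Size([1, 64, 48])
```

Unlike this naive version, which still forms the full similarity matrix before discarding entries, PFA filters out irrelevant tokens before computing similarities by linking the attention maps across layers, which is where the computational saving described above comes from.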
- Environment
- Inference
- Training
- Testing
- Results
- Visual Results
- Visualization of Attention Distributions
- Acknowledgements
- Citation
## Environment

- Python 3.9
- PyTorch 2.5.0
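An optional sanity check (an assumed convenience snippet, not part of the repository) to confirm the interpreter and PyTorch versions before installing:

```python
# Optional check for the versions listed above (not provided by this repo).
import sys
import torch

print(sys.version.split()[0])     # expected: 3.9.x
print(torch.__version__)          # expected: 2.5.0 (e.g. 2.5.0+cu124)
print(torch.cuda.is_available())  # a CUDA-enabled build is assumed for training and the ops_smm extension
```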
```bash
git clone https://github.com/LabShuHangGU/PFT-SR.git
conda create -n PFT python=3.9
conda activate PFT
pip install -r requirements.txt
python setup.py develop
cd ./ops_smm
./make.sh
```

## Inference

Use `inference.py` for fast inference on a single image or on multiple images within the same folder.
```bash
# For classical SR
python inference.py -i inference_image.png -o results/test/ --scale 4 --task classical
python inference.py -i inference_images/ -o results/test/ --scale 4 --task classical

# For lightweight SR
python inference.py -i inference_image.png -o results/test/ --scale 4 --task lightweight
python inference.py -i inference_images/ -o results/test/ --scale 4 --task lightweight
```

The PFT SR model processes the image `inference_image.png` or the images within the `inference_images/` directory. The results are saved to the output directory given by `-o` (`results/test/` in the examples above).
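Optionally, the same CLI can be driven from Python, e.g. to queue several inputs in one go. The snippet below is only an assumed convenience wrapper around the documented flags, not a function provided by the repository.

```python
# Drive the documented inference CLI from Python. Only the flags shown above
# (-i, -o, --scale, --task) are used; the list of inputs is illustrative.
import subprocess

inputs = ["inference_image.png", "inference_images/"]
for inp in inputs:
    subprocess.run(
        ["python", "inference.py", "-i", inp, "-o", "results/test/",
         "--scale", "4", "--task", "lightweight"],
        check=True,
    )
```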
## Training

- Download the training dataset DF2K (DIV2K + Flickr2K) and put it in the folder `./datasets`.
- It is recommended to follow the data preparation guide from BasicSR for faster data reading speed.
- Refer to the training configuration files in the `./options/train` folder for detailed settings.

- PFT (Classical Image Super-Resolution)
```bash
# batch size = 8 (GPUs) × 4 (per GPU)
# training dataset: DF2K

# ×2 scratch, input size = 64×64, 500k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/001_PFT_SRx2_scratch.yml --launcher pytorch

# ×3 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/002_PFT_SRx3_finetune.yml --launcher pytorch

# ×4 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/003_PFT_SRx4_finetune.yml --launcher pytorch
```

- PFT-light (Lightweight Image Super-Resolution)
```bash
# batch size = 4 (GPUs) × 8 (per GPU)
# training dataset: DIV2K

# ×2 scratch, input size = 64×64, 500k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/101_PFT_light_SRx2_scratch.yml --launcher pytorch

# ×3 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/102_PFT_light_SRx3_finetune.yml --launcher pytorch

# ×4 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/103_PFT_light_SRx4_finetune.yml --launcher pytorch
```

## Testing

- Download the testing data (Set5 + Set14 + BSD100 + Urban100 + Manga109 [download]) and put them in the folder `./datasets`.
- Download the pretrained models and put them in the folder `./experiments/pretrained_models`.
- Refer to the testing configuration files in the `./options/test` folder for detailed settings.

- PFT (Classical Image Super-Resolution)
- The patchwise_testing strategy is now integrated into `basicsr/models/pft_model.py`. This allows inference on RTX 4090 GPUs without running into memory issues (a minimal sketch of the general patch-wise idea follows the classical-SR test commands below).
```bash
python basicsr/test.py -opt options/test/001_PFT_SRx2_scratch.yml
python basicsr/test.py -opt options/test/002_PFT_SRx3_finetune.yml
python basicsr/test.py -opt options/test/003_PFT_SRx4_finetune.yml
```
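The actual patch-wise testing used by the repository lives in `basicsr/models/pft_model.py`; the sketch below only illustrates the general tile-and-stitch idea, with an assumed tile size, overlap, and a generic `model` callable. It is not the repository's implementation.

```python
# Minimal sketch of patch-wise (tiled) SR testing to limit GPU memory use.
# NOT the implementation in basicsr/models/pft_model.py; tile size, overlap
# and the `model` callable are illustrative assumptions.
import torch

@torch.no_grad()
def patchwise_sr(model, lq, scale=4, tile=256, overlap=16):
    """Super-resolve a 1xCxHxW low-quality tensor tile by tile."""
    _, c, h, w = lq.shape
    out = lq.new_zeros((1, c, h * scale, w * scale))
    weight = torch.zeros_like(out)
    stride = tile - overlap
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            # Clamp the tile so it never runs past the image border.
            t, l = min(top, max(h - tile, 0)), min(left, max(w - tile, 0))
            b, r = min(t + tile, h), min(l + tile, w)
            sr_patch = model(lq[:, :, t:b, l:r])   # run the network on one tile
            out[:, :, t*scale:b*scale, l*scale:r*scale] += sr_patch
            weight[:, :, t*scale:b*scale, l*scale:r*scale] += 1
    return out / weight.clamp(min=1)               # average overlapping regions
```

Here `model` is any callable that maps a low-resolution tensor to its ×scale output; overlapping regions of neighbouring tiles are averaged when the tiles are stitched back together.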
- PFT-light (Lightweight Image Super-Resolution)

```bash
python basicsr/test.py -opt options/test/101_PFT_light_SRx2_scratch.yml
python basicsr/test.py -opt options/test/102_PFT_light_SRx3_finetune.yml
python basicsr/test.py -opt options/test/103_PFT_light_SRx4_finetune.yml
```

## Results

- Classical Image Super-Resolution
- Lightweight Image Super-Resolution
## Visualization of Attention Distributions

- Uncomment the code at this location to enable attention map saving: https://github.com/LabShuHangGU/PFT-SR/blob/master/basicsr/archs/pft_arch.py#L316-L328
- Perform inference on the image you want to visualize to generate and save the attention maps under the `./results/Attention_map` directory:

```bash
python inference.py -i inference_image.png -o results/test/ --scale 4 --task lightweight
```
- Modify the corresponding paths and specify the window location you want to visualize in `VisualAttention.py` (windows are indexed from left to right, top to bottom, assuming the stride equals the window size; see the indexing sketch after these steps).
- Run the following command to visualize the attention map:

```bash
python VisualAttention.py
```
Note that PFT employs a shifted-window operation, so the corresponding positions in the attention maps differ between odd-numbered and even-numbered layers.
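As a quick reference for the window-indexing convention described above, the snippet below converts a pixel coordinate into a window index under the assumption of non-overlapping windows (stride equal to the window size). The window size of 16 is only an illustrative value, not necessarily the one used by PFT.

```python
# Illustrative helper for the window-indexing convention above: windows are
# counted left to right, top to bottom, with stride equal to the window size.
# The window_size value is an example, not necessarily PFT's setting.
def window_index(row, col, image_width, window_size=16):
    """Return the index of the non-overlapping window containing pixel (row, col)."""
    windows_per_row = (image_width + window_size - 1) // window_size  # ceil division
    return (row // window_size) * windows_per_row + (col // window_size)

# Example: in a 128-pixel-wide image with 16x16 windows, pixel (20, 35)
# lies in the second row of windows, third column -> index 1 * 8 + 2 = 10.
print(window_index(20, 35, image_width=128))  # -> 10
```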
## Acknowledgements

This code is built on BasicSR and ATD.
## Citation

```bibtex
@article{long2025progressive,
  title={Progressive Focused Transformer for Single Image Super-Resolution},
  author={Long, Wei and Zhou, Xingyu and Zhang, Leheng and Gu, Shuhang},
  journal={arXiv preprint arXiv:2503.20337},
  year={2025}
}
```