This repository is an official implementation of the paper "Progressive Focused Transformer for Single Image Super-Resolution", CVPR, 2025.
By Wei Long, Xingyu Zhou, Leheng Zhang, and Shuhang Gu.
Abstract: Transformer-based methods have achieved remarkable results in image super-resolution tasks because they can capture non-local dependencies in low-quality input images. However, this feature-intensive modeling approach is computationally expensive because it calculates the similarities between numerous features that are irrelevant to the query features when obtaining attention weights. These unnecessary similarity calculations not only degrade the reconstruction performance but also introduce significant computational overhead. How to accurately identify the features that are important to the current query features and avoid similarity calculations between irrelevant features remains an urgent problem. To address this issue, we propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance on various single image super-resolution benchmarks.
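To make the PFA idea concrete, below is a generic, self-contained sketch of "focused" attention that restricts each query to its top-k most similar keys before the softmax. It only illustrates the principle; it is not the PFA module in `basicsr/archs/pft_arch.py`, and the tensor shapes and `top_k` value are arbitrary assumptions.

```python
# Generic sketch of "focused" attention: each query attends only to its
# top-k most similar keys instead of all tokens. This illustrates the idea
# only; it is NOT the PFA implementation in basicsr/archs/pft_arch.py.
import torch

def topk_focused_attention(q, k, v, top_k=32):
    """q, k, v: (B, N, C) token features; returns (B, N, C) aggregated values."""
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale      # (B, N, N) full similarity matrix
    top_val, top_idx = attn.topk(top_k, dim=-1)   # keep only the k best keys per query
    sparse = torch.full_like(attn, float('-inf'))
    sparse.scatter_(-1, top_idx, top_val)         # mask out all other positions
    weights = sparse.softmax(dim=-1)              # softmax over the kept keys only
    return weights @ v

# Toy usage with assumed shapes.
q = k = v = torch.randn(1, 64, 48)
out = topk_focused_attention(q, k, v, top_k=8)
print(out.shape)  # torch.Size([1, 64, 48])
```

Unlike this naive version, which still forms the full similarity matrix before discarding entries, PFA filters out irrelevant tokens before computing similarities by linking the attention maps across layers, which is where the computational saving described above comes from.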
- Environment
- Inference
- Training
- Testing
- Results
- Visual Results
- Visualization of Attention Distributions
- Acknowledgements
- Citation
## Environment

- Python 3.9
- PyTorch 2.5.0
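An optional sanity check (an assumed convenience snippet, not part of the repository) to confirm the interpreter and PyTorch versions before installing:

```python
# Optional check for the versions listed above (not provided by this repo).
import sys
import torch

print(sys.version.split()[0])     # expected: 3.9.x
print(torch.__version__)          # expected: 2.5.0 (e.g. 2.5.0+cu124)
print(torch.cuda.is_available())  # a CUDA-enabled build is assumed for training and the ops_smm extension
```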
```bash
git clone https://github.com/LabShuHangGU/PFT-SR.git
conda create -n PFT python=3.9
conda activate PFT
pip install -r requirements.txt
python setup.py develop
cd ./ops_smm
./make.sh
```

## Inference

Use `inference.py` for fast inference on a single image or on multiple images within the same folder.
```bash
# For classical SR
python inference.py -i inference_image.png -o results/test/ --scale 4 --task classical
python inference.py -i inference_images/ -o results/test/ --scale 4 --task classical

# For lightweight SR
python inference.py -i inference_image.png -o results/test/ --scale 4 --task lightweight
python inference.py -i inference_images/ -o results/test/ --scale 4 --task lightweight
```

The PFT SR model processes the image `inference_image.png` or the images within the `inference_images/` directory. The results are saved to the output directory given by `-o` (`results/test/` in the examples above).
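Optionally, the same CLI can be driven from Python, e.g. to queue several inputs in one go. The snippet below is only an assumed convenience wrapper around the documented flags, not a function provided by the repository.

```python
# Drive the documented inference CLI from Python. Only the flags shown above
# (-i, -o, --scale, --task) are used; the list of inputs is illustrative.
import subprocess

inputs = ["inference_image.png", "inference_images/"]
for inp in inputs:
    subprocess.run(
        ["python", "inference.py", "-i", inp, "-o", "results/test/",
         "--scale", "4", "--task", "lightweight"],
        check=True,
    )
```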
## Training

- Download the training dataset DF2K (DIV2K + Flickr2K) and put it in the folder `./datasets`.
- It is recommended to follow the data preparation guide from BasicSR for faster data reading speed.
- Refer to the training configuration files in the `./options/train` folder for detailed settings.

- PFT (Classical Image Super-Resolution)
```bash
# batch size = 8 (GPUs) × 4 (per GPU)
# training dataset: DF2K

# ×2 scratch, input size = 64×64, 500k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/001_PFT_SRx2_scratch.yml --launcher pytorch

# ×3 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/002_PFT_SRx3_finetune.yml --launcher pytorch

# ×4 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --use-env --nproc_per_node=8 --master_port=1145 basicsr/train.py -opt options/train/003_PFT_SRx4_finetune.yml --launcher pytorch
```

- PFT-light (Lightweight Image Super-Resolution)
```bash
# batch size = 4 (GPUs) × 8 (per GPU)
# training dataset: DIV2K

# ×2 scratch, input size = 64×64, 500k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/101_PFT_light_SRx2_scratch.yml --launcher pytorch

# ×3 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/102_PFT_light_SRx3_finetune.yml --launcher pytorch

# ×4 finetune, input size = 64×64, 250k iterations
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --use-env --nproc_per_node=4 --master_port=1145 basicsr/train.py -opt options/train/103_PFT_light_SRx4_finetune.yml --launcher pytorch
```

## Testing

- Download the testing data (Set5 + Set14 + BSD100 + Urban100 + Manga109 [download]) and put them in the folder `./datasets`.
- Download the pretrained models and put them in the folder `./experiments/pretrained_models`.
- Refer to the testing configuration files in the `./options/test` folder for detailed settings.

- PFT (Classical Image Super-Resolution)
- The patchwise_testing strategy is now integrated into `basicsr/models/pft_model.py`. This allows inference on RTX 4090 GPUs without running into memory issues (a minimal sketch of the general patch-wise idea follows the classical-SR test commands below).
```bash
python basicsr/test.py -opt options/test/001_PFT_SRx2_scratch.yml
python basicsr/test.py -opt options/test/002_PFT_SRx3_finetune.yml
python basicsr/test.py -opt options/test/003_PFT_SRx4_finetune.yml
```
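The actual patch-wise testing used by the repository lives in `basicsr/models/pft_model.py`; the sketch below only illustrates the general tile-and-stitch idea, with an assumed tile size, overlap, and a generic `model` callable. It is not the repository's implementation.

```python
# Minimal sketch of patch-wise (tiled) SR testing to limit GPU memory use.
# NOT the implementation in basicsr/models/pft_model.py; tile size, overlap
# and the `model` callable are illustrative assumptions.
import torch

@torch.no_grad()
def patchwise_sr(model, lq, scale=4, tile=256, overlap=16):
    """Super-resolve a 1xCxHxW low-quality tensor tile by tile."""
    _, c, h, w = lq.shape
    out = lq.new_zeros((1, c, h * scale, w * scale))
    weight = torch.zeros_like(out)
    stride = tile - overlap
    for top in range(0, h, stride):
        for left in range(0, w, stride):
            # Clamp the tile so it never runs past the image border.
            t, l = min(top, max(h - tile, 0)), min(left, max(w - tile, 0))
            b, r = min(t + tile, h), min(l + tile, w)
            sr_patch = model(lq[:, :, t:b, l:r])   # run the network on one tile
            out[:, :, t*scale:b*scale, l*scale:r*scale] += sr_patch
            weight[:, :, t*scale:b*scale, l*scale:r*scale] += 1
    return out / weight.clamp(min=1)               # average overlapping regions
```

Here `model` is any callable that maps a low-resolution tensor to its ×scale output; overlapping regions of neighbouring tiles are averaged when the tiles are stitched back together.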
- PFT-light (Lightweight Image Super-Resolution)

```bash
python basicsr/test.py -opt options/test/101_PFT_light_SRx2_scratch.yml
python basicsr/test.py -opt options/test/102_PFT_light_SRx3_finetune.yml
python basicsr/test.py -opt options/test/103_PFT_light_SRx4_finetune.yml
```

## Results

- Classical Image Super-Resolution
- Lightweight Image Super-Resolution
## Visualization of Attention Distributions

- Uncomment the code at this location to enable attention map saving: https://github.com/LabShuHangGU/PFT-SR/blob/master/basicsr/archs/pft_arch.py#L316-L328
- Perform inference on the image you want to visualize to generate and save the attention maps under the `./results/Attention_map` directory:

```bash
python inference.py -i inference_image.png -o results/test/ --scale 4 --task lightweight
```
- Modify the corresponding paths and specify the window location you want to visualize in `VisualAttention.py` (windows are indexed from left to right, top to bottom, assuming the stride equals the window size; see the indexing sketch after these steps).
- Run the following command to visualize the attention map:

```bash
python VisualAttention.py
```
Note that PFT employs a shifted-window operation, so the corresponding positions in the attention maps differ between odd-numbered and even-numbered layers.
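As a quick reference for the window-indexing convention described above, the snippet below converts a pixel coordinate into a window index under the assumption of non-overlapping windows (stride equal to the window size). The window size of 16 is only an illustrative value, not necessarily the one used by PFT.

```python
# Illustrative helper for the window-indexing convention above: windows are
# counted left to right, top to bottom, with stride equal to the window size.
# The window_size value is an example, not necessarily PFT's setting.
def window_index(row, col, image_width, window_size=16):
    """Return the index of the non-overlapping window containing pixel (row, col)."""
    windows_per_row = (image_width + window_size - 1) // window_size  # ceil division
    return (row // window_size) * windows_per_row + (col // window_size)

# Example: in a 128-pixel-wide image with 16x16 windows, pixel (20, 35)
# lies in the second row of windows, third column -> index 1 * 8 + 2 = 10.
print(window_index(20, 35, image_width=128))  # -> 10
```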
## Acknowledgements

This code is built on BasicSR and ATD.
## Citation

```bibtex
@article{long2025progressive,
  title={Progressive Focused Transformer for Single Image Super-Resolution},
  author={Long, Wei and Zhou, Xingyu and Zhang, Leheng and Gu, Shuhang},
  journal={arXiv preprint arXiv:2503.20337},
  year={2025}
}
```