SIND: Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment


Official implementation of "Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment" (IEEE TCSVT 2025)

πŸ† NTIRE 2024 Portrait Quality Assessment Challenge Winner

Abstract

The inherent diversity of visual scenes poses a fundamental challenge in blind image quality assessment (BIQA), leading to compromised model generalizability on unseen scenes. We find that human annotations for images of different visual scenes follow distinctly different quality score distributions, causing existing BIQA models to overfit to these scene-specific distributions.

This work presents SIND (Scene-INvariant Distribution), a generalizable BIQA model that addresses this challenge through a distribution alignment framework. Our approach automatically scales and shifts cross-scene distributions into a unified distribution, enabling scene-invariant and quality-aware feature representation. Additionally, we design a token-complementary patch reasoning network to extract comprehensive quality-aware features from both image overview and details.
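For intuition, the overview-and-details idea can be pictured as encoding one downscaled view of the whole image alongside full-resolution patches and fusing the two token streams. The snippet below is only a rough sketch of that view construction, assuming a hypothetical 224-pixel encoder input; the actual token-complementary patch reasoning network is defined in the paper and in train_test_IQA.py.

import torch
import torch.nn.functional as F

def global_local_views(image: torch.Tensor, patch: int = 224):
    """Illustrative overview + detail split (an assumption, not the repo's exact pipeline)."""
    # Overview: the whole image resized to the encoder input size.
    overview = F.interpolate(image.unsqueeze(0), size=(patch, patch),
                             mode="bilinear", align_corners=False)
    # Details: non-overlapping full-resolution crops.
    _, h, w = image.shape
    crops = [image[:, i:i + patch, j:j + patch].unsqueeze(0)
             for i in range(0, h - patch + 1, patch)
             for j in range(0, w - patch + 1, patch)]
    details = torch.cat(crops) if crops else overview
    return overview, details

overview, details = global_local_views(torch.rand(3, 512, 768))
print(overview.shape, details.shape)  # (1, 3, 224, 224) and (6, 3, 224, 224)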


Distribution alignment visualization: (a) Training data from three scenes, (b) Prediction without alignment, (c) Distribution alignment process, (d) Prediction with alignment

Framework


Overview of SIND: (1) Scene Sampling selects k scenes and n images per scene, (2) Distribution Alignment learns scene-specific transformations, (3) Model learns scene-invariant distributions
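Conceptually, the alignment step gives every scene its own learnable scale and shift so that per-scene score distributions are mapped into one shared, scene-invariant distribution before the regression loss is applied. Below is a minimal PyTorch sketch of that idea with hypothetical per-scene parameters; the exact formulation used by SIND is given in the paper and implemented behind --loss_type scale_shift in train_test_IQA.py.

import torch
import torch.nn as nn

class SceneScaleShift(nn.Module):
    """Illustrative per-scene affine alignment (an assumption, not the official code)."""

    def __init__(self, num_scenes: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_scenes))   # one scale per scene
        self.shift = nn.Parameter(torch.zeros(num_scenes))  # one shift per scene

    def forward(self, pred: torch.Tensor, scene_id: torch.Tensor) -> torch.Tensor:
        # pred: (B,) raw quality predictions; scene_id: (B,) integer scene labels
        return pred * self.scale[scene_id] + self.shift[scene_id]

# Hypothetical training step: align per-scene predictions, then apply an L1 loss.
align = SceneScaleShift(num_scenes=9)
pred = torch.randn(8)                  # stand-in model outputs for one mini-batch
mos = torch.rand(8)                    # stand-in ground-truth scores
scene_id = torch.randint(0, 9, (8,))
loss = nn.functional.l1_loss(align(pred, scene_id), mos)
loss.backward()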

πŸ“ Data Preparation

Technical Quality Assessment (TQA)

Dataset Links

| Dataset     | Images | Description                                       |
|-------------|--------|---------------------------------------------------|
| SPAQ        | 11,125 | Smartphone Photography Attribute and Quality      |
| KonIQ-10k   | 10,073 | Konstanz Natural Image Quality Database           |
| LIVEC/LIVEW | 1,162  | LIVE In the Wild Image Quality Challenge Database |
| LIVE        | 779    | Laboratory for Image & Video Engineering          |
| BID         | 585    | Blur Image Database                               |
| CID2013     | 474    | Color Image Database 2013                         |

Aesthetic Quality Assessment (AQA)

| Dataset | Images | Description                                                  |
|---------|--------|--------------------------------------------------------------|
| EVA     | 4,070  | Explainable Visual Aesthetics                                |
| PARA    | 31,220 | Personalized image Aesthetics database with Rich Attributes |
  1. Download datasets following the links above

  2. Organize data structure:

    data/
    β”œβ”€β”€ SPAQ/
    β”œβ”€β”€ KonIQ-10k/
    β”œβ”€β”€ LIVEC/
    β”œβ”€β”€ CID2013/
    β”œβ”€β”€ BID/
    β”œβ”€β”€ LIVE/
    β”œβ”€β”€ EVA/
    └── PARA/
    
  3. Prepare data splits (Optional):

    # Generate custom data splits
    python data_json/data_json_generator.py
    
    # Or use provided splits (recommended)
    # Pre-generated JSON files are available in data_json/

πŸ“ Note: Most of the datasets are splited following Q-Align, please refer data_json/data_json_generator.py for details.

🎯 Quick Start & Training

Cross-scene Validation

Train and evaluate on a single dataset with leave-one-out cross-scene validation. Refer to the paper and train_test_IQA.py for more details.

# [Example] Train and evaluate on SPAQ dataset with leave-one-out cross-scene validation

cd "$(dirname "$0")"
pro_dir="./exp_log/leave_one_out/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq \
                    --test_dataset spaq \
                    --exp_type leave-one-out \
                    >> $pro_dir/train.log 

# Main configurations:
# - train_dataset: spaq / koniq10k / eva / para
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: leave-one-out
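The --scene_sampling flag sets how many scenes each GPU's mini-batch is drawn from, so that every batch contains a few well-populated scene groups for the alignment to operate on. The function below is a rough illustration of that batch construction, not the repository's actual sampler.

import random
from collections import defaultdict

def sample_scene_batch(samples, scenes_per_batch=2, batch_size=64, seed=None):
    """Illustrative scene sampling (an assumption, not the repo's sampler):
    pick k scenes, then draw batch_size // k images from each."""
    rng = random.Random(seed)
    by_scene = defaultdict(list)
    for path, scene in samples:
        by_scene[scene].append(path)
    chosen = rng.sample(sorted(by_scene), scenes_per_batch)
    per_scene = batch_size // scenes_per_batch
    batch = []
    for s in chosen:
        batch += [(p, s) for p in rng.choices(by_scene[s], k=per_scene)]
    return batch

# Toy example: 6 images over 3 scenes, batch of 4 drawn from 2 scenes.
data = [("img0.jpg", "indoor"), ("img1.jpg", "indoor"),
        ("img2.jpg", "night"), ("img3.jpg", "night"),
        ("img4.jpg", "landscape"), ("img5.jpg", "landscape")]
print(sample_scene_batch(data, scenes_per_batch=2, batch_size=4, seed=0))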

Cross-dataset Evaluation

Train on SPAQ and/or KonIQ-10k and test on other datasets without fine-tuning. Refer to the paper and cross_set_exp.sh for more details.

# [Example] Train on SPAQ, test on KonIQ-10k, SPAQ, LIVEW, LIVE, BID, CID2013
# use scene-domain alignment (default)
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq \
                    --test_dataset koniq10k spaq livec live bid cid2013 \
                    --exp_type cross-set \
                    >> $pro_dir/train.log 

# [Example] Train on SPAQ and KonIQ-10k, test on KonIQ-10k, SPAQ, LIVEW, LIVE, BID, CID2013
# use dataset-domain alignment
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq_koniq_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq koniq10k \
                    --test_dataset koniq10k spaq livec live bid cid2013 \
                    --exp_type cross-set \
                    --dataset_domain \
                    >> $pro_dir/train.log 

# Main configurations:
# - train_dataset: spaq / koniq10k / spaq koniq10k
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: cross-set
# - dataset_domain: use dataset-domain alignment instead of scene-domain alignment
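In terms of the alignment sketch above, --dataset_domain only changes which label indexes the per-domain scale and shift: the dataset a sample comes from rather than its scene. The snippet below is just that illustration, with hypothetical field names.

# Illustration only; field names are hypothetical.
def domain_ids(batch, dataset_domain: bool):
    if dataset_domain:
        return [item["dataset_id"] for item in batch]  # dataset-domain alignment
    return [item["scene_id"] for item in batch]        # scene-domain alignment

batch = [{"scene_id": 3, "dataset_id": 0}, {"scene_id": 7, "dataset_id": 1}]
print(domain_ids(batch, dataset_domain=True))   # -> [0, 1]
print(domain_ids(batch, dataset_domain=False))  # -> [3, 7]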

Intra-dataset Evaluation

Following the experimental settings of LIQE, we mix six datasets for the training set and learn a single set of weights that is then tested on all six test sets. Refer to random_split_exp.sh for more details.

We conduct experiments on six IQA datasets, among which LIVE, CSIQ, and KADID-10k contain synthetic distortions, while LIVE Challenge, BID, and KonIQ-10K include realistic distortions. We randomly sample 70% and 10% images from each dataset to construct the training and validation set, respectively, leaving the remaining 20% for testing. We repeat this procedure ten times, and report median SRCC and PLCC results as prediction monotonicity and precision measures, respectively.

# [Example] Joint training on mix datasets: koniq10k livec bid kadid10k csiq live
# use dataset-domain alignment
pro_dir="./exp_log/random-split/our_epoch35_bs128_all_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset koniq10k livec bid kadid10k csiq live \
                    --test_dataset koniq10k livec bid kadid10k csiq live \
                    --exp_type random-split \
                    --dataset_domain \
                    >> $pro_dir/train.log 
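The median SRCC and PLCC reported under this protocol can be computed with scipy as sketched below; this is a generic illustration of the metrics, not the repository's evaluation code.

import numpy as np
from scipy import stats

def srcc_plcc(pred, mos):
    """Spearman rank and Pearson linear correlation between predictions and MOS."""
    srcc = stats.spearmanr(pred, mos).correlation
    plcc = stats.pearsonr(pred, mos)[0]
    return srcc, plcc

# Median over repeated random splits, as in the LIQE-style protocol above.
rng = np.random.default_rng(0)
results = []
for _ in range(10):
    mos = rng.random(200)
    pred = mos + 0.1 * rng.standard_normal(200)  # stand-in for model predictions
    results.append(srcc_plcc(pred, mos))
srcc_med, plcc_med = np.median(np.array(results), axis=0)
print(f"median SRCC {srcc_med:.3f}, median PLCC {plcc_med:.3f}")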

Experiment Types

The framework supports three types of experiments:

  • leave-one-out: Cross-scene validation within a single dataset
  • cross-set: Cross-dataset validation
  • random-split: Random split validation following the LIQE setup

Main Arguments

  • --train_dataset: Training dataset(s), e.g. spaq, koniq10k, eva, para, or a mixture such as koniq10k livec bid kadid10k csiq live
  • --test_dataset: Testing dataset(s), e.g. spaq, koniq10k, eva, para, livec, live, bid, cid2013, kadid10k, csiq
  • --local_global: Enable token-complementary patch reasoning
  • --loss_type: scale_shift if using alignment, l1 if not using alignment
  • --scene_sampling: Number of scenes per GPU in a mini-batch (0 for no scene sampling)
  • --exp_type: Experiment type (leave-one-out, cross-set, random-split)
  • --dataset_domain: Use dataset-domain alignment instead of scene-domain alignment

📊 Results

Cross-scene Validation Performance

Leave-one-out cross-scene validation on SPAQ, KonIQ-10k, EVA, and PARA datasets:

| Dataset   | SRCC  | PLCC  |
|-----------|-------|-------|
| SPAQ      | 0.861 | 0.889 |
| KonIQ-10k | 0.919 | 0.935 |
| EVA       | 0.778 | 0.792 |
| PARA      | 0.902 | 0.940 |

Cross-dataset Evaluation

Generalization performance across different datasets:


Cross-dataset generalization performance of different methods when trained on SPAQ or KonIQ-10k. The metric is (SRCC+PLCC)/2.

Intra-dataset Evaluation

Performance following the LIQE protocol:


SRCC/PLCC values of intra-dataset validation for the TQA task.

NTIRE 2024 Champion

Our method won the NTIRE 2024 Portrait Quality Assessment Challenge, demonstrating superior generalization capabilities, especially on DXOMARK's internal Challenge Test dataset.


Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey

The final submission code and trained weights for the competition are available in NTIRE2024/Submission.

Citation

If you find this work useful for your research, please cite our paper:

@ARTICLE{SIND2025,
  author={Huang, Yipo and Duan, Zhichao and Chen, Pengfei and Cai, Li and Li, Leida and Lin, Weisi},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment}, 
  year={2025},
  doi={10.1109/TCSVT.2025.3595208}}

License

This project is released under the MIT License.

Acknowledgements

  • Built upon OpenCLIP and HyperIQA
  • Evaluation metrics: SRCC (Spearman Rank Correlation Coefficient) and PLCC (Pearson Linear Correlation Coefficient)
