Official implementation of "Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment" (IEEE TCSVT 2025)
NTIRE 2024 Portrait Quality Assessment Challenge Winner
The inherent diversity of visual scenes poses a fundamental challenge in blind image quality assessment (BIQA), compromising model generalizability on unseen scenes. We find that human annotations for images from different visual scenes follow distinctly different quality distributions, causing existing BIQA models to overfit to these scene-specific distributions.
This work presents SIND (Scene-INvariant Distribution), a generalizable BIQA model that addresses this challenge through a distribution alignment framework. Our approach automatically scales and shifts cross-scene distributions into a unified distribution, enabling scene-invariant and quality-aware feature representation. Additionally, we design a token-complementary patch reasoning network to extract comprehensive quality-aware features from both image overview and details.
Distribution alignment visualization: (a) Training data from three scenes, (b) Prediction without alignment, (c) Distribution alignment process, (d) Prediction with alignment
Overview of SIND: (1) Scene Sampling selects k scenes and n images per scene, (2) Distribution Alignment learns scene-specific transformations, (3) Model learns scene-invariant distributions
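One way to picture the scale-shift alignment is as a per-group, scale-and-shift-invariant regression loss: within each scene sampled into the mini-batch, predictions and MOS are both standardized before being compared, so only their relative ordering and spread matter. The snippet below is a minimal sketch of that idea, assuming a PyTorch setup with per-image MOS labels and integer scene IDs; the function name `scale_shift_loss` and the exact normalization are assumptions, not the repository's `scale_shift` implementation.

```python
# Minimal sketch of a per-scene scale-and-shift-invariant loss.
# Illustration only; the repository's scale_shift loss may differ in detail.
import torch


def scale_shift_loss(pred, mos, group_ids, eps=1e-6):
    """L1 loss after mapping each group's predictions and MOS to a unified
    (zero-mean, unit-std) distribution, so cross-scene offsets do not matter."""
    loss = 0.0
    groups = group_ids.unique()
    for g in groups:
        m = group_ids == g
        p, y = pred[m], mos[m]
        p = (p - p.mean()) / (p.std() + eps)   # scale and shift predictions
        y = (y - y.mean()) / (y.std() + eps)   # scale and shift MOS
        loss = loss + (p - y).abs().mean()
    return loss / len(groups)


# Example: a mini-batch drawn from two scenes, four images per scene.
pred = torch.randn(8)
mos = torch.rand(8) * 100                      # e.g. MOS on a 0-100 scale
scene_ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
print(scale_shift_loss(pred, mos, scene_ids))
```

Each group needs at least two images for the standard deviation to be defined, which is one reason the scene sampling step draws several images per scene.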
Dataset Links
Dataset | Images | Description |
---|---|---|
SPAQ | 11,125 | Smartphone Photography Attribute and Quality |
KonIQ-10k | 10,073 | Konstanz Natural Image Quality Database |
LIVEC/LIVEW | 1,162 | LIVE In the Wild Image Quality Challenge Database |
LIVE | 779 | Laboratory for Image & Video Engineering |
BID | 585 | Blur Image Database |
CID2013 | 474 | Color Image Database 2013 |
Dataset | Images | Description |
---|---|---|
EVA | 4,070 | Explainable Visual Aesthetics |
PARA | 31,220 | Personalized image Aesthetics database with Rich Attributes |
- Download datasets following the links above
- Organize data structure:

data/
├── SPAQ/
├── KonIQ-10k/
├── LIVEC/
├── CID2013/
├── BID/
├── LIVE/
├── EVA/
└── PARA/

- Prepare data splits (Optional):

# Generate custom data splits
python data_json/data_json_generator.py

# Or use provided splits (recommended)
# Pre-generated JSON files are available in data_json/
Note: Most of the datasets are split following Q-Align; please refer to data_json/data_json_generator.py for details.
Train and evaluate on a single dataset with leave-one-out cross-scene validation. Refer to the paper and train_test_IQA.py for more details.
# [Example] Train and evaluate on SPAQ dataset with leave-one-out cross-scene validation
cd "$(dirname "$0")"
pro_dir="./exp_log/leave_one_out/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
train_test_IQA.py \
--clip_model $clip_model \
--epochs 35 \
--lr 1e-5 \
--warmup_epoch 5 \
--weight_decay 1e-5 \
--batch_size 64 \
--local_global \
--loss_type scale_shift \
--scene_sampling 2 \
--project_dir $pro_dir \
--train_dataset spaq \
--test_dataset spaq \
--exp_type leave-one-out \
>> $pro_dir/train.log
# Main configurations:
# - train_dataset: spaq / koniq10k / eva / para
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: leave-one-out
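For context on what --scene_sampling 2 means for batch construction: each GPU's mini-batch is built from a fixed number of scenes with several images per scene, so the alignment loss always sees complete per-scene groups. The following is a hypothetical sketch of such a scene-grouped sampler; the class name SceneBatchSampler and the scene_labels input are assumptions, not the repository's sampler.

```python
# Hypothetical scene-grouped batch sampler (illustration only).
# Each batch on a GPU contains k_scenes scenes with n_per_scene images each.
import random
from collections import defaultdict

from torch.utils.data import Sampler


class SceneBatchSampler(Sampler):
    def __init__(self, scene_labels, k_scenes=2, n_per_scene=32, num_batches=100):
        self.by_scene = defaultdict(list)        # scene id -> image indices
        for idx, scene in enumerate(scene_labels):
            self.by_scene[scene].append(idx)
        self.k, self.n, self.num_batches = k_scenes, n_per_scene, num_batches

    def __iter__(self):
        for _ in range(self.num_batches):
            batch = []
            for scene in random.sample(list(self.by_scene), self.k):
                # sample with replacement so small scenes still yield n images
                batch += random.choices(self.by_scene[scene], k=self.n)
            yield batch                          # k_scenes * n_per_scene indices

    def __len__(self):
        return self.num_batches
```

With --batch_size 64 and --scene_sampling 2, each GPU would then hold 2 scenes with 32 images each (matching the bs128 in the log directory name across the two GPUs); such a sampler is passed via DataLoader(dataset, batch_sampler=...).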
Train on SPAQ and/or KonIQ-10k and test on other datasets without fine-tuning. Refer to the paper and cross_set_exp.sh for more details.
# [Example] Train on SPAQ, test on KonIQ-10k, SPAQ, LIVEC, LIVE, BID, CID2013
# use scene-domain alignment (default)
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
train_test_IQA.py \
--clip_model $clip_model \
--epochs 35 \
--lr 1e-5 \
--warmup_epoch 5 \
--weight_decay 1e-5 \
--batch_size 64 \
--local_global \
--loss_type scale_shift \
--scene_sampling 2 \
--project_dir $pro_dir \
--train_dataset spaq \
--test_dataset koniq10k spaq livec live bid cid2013 \
--exp_type cross-set \
>> $pro_dir/train.log
# [Example] Train on SPAQ and KonIQ-10k, test on KonIQ-10k, SPAQ, LIVEW, LIVE, BID, CID2013
# use dataset-domain alignment
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq_koniq_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
train_test_IQA.py \
--clip_model $clip_model \
--epochs 35 \
--lr 1e-5 \
--warmup_epoch 5 \
--weight_decay 1e-5 \
--batch_size 64 \
--local_global \
--loss_type scale_shift \
--scene_sampling 2 \
--project_dir $pro_dir \
--train_dataset spaq koniq10k \
--test_dataset koniq10k spaq livec live bid cid2013 \
--exp_type cross-set \
--dataset_domain \
>> $pro_dir/train.log
# Main configurations:
# - train_dataset: spaq / koniq10k / spaq koniq10k
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: cross-set
# - dataset_domain: use dataset-domain alignment instead of scene-domain alignment
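For intuition, --dataset_domain only changes the grouping key used by the alignment: images are grouped by their source dataset instead of by scene. Reusing the hypothetical scale_shift_loss sketch from above:

```python
# Hypothetical: dataset-domain alignment groups by source dataset, not scene.
import torch

pred = torch.randn(8)
mos = torch.rand(8) * 100
dataset_ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])    # e.g. 0 = SPAQ, 1 = KonIQ-10k
loss = scale_shift_loss(pred, mos, dataset_ids)          # same loss, different groups
```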
Following the experimental settings of LIQE, we mix six datasets for the training set and learn a single set of weights to test on the six test sets. Refer to random_split_exp.sh for more details.
We conduct experiments on six IQA datasets, among which LIVE, CSIQ, and KADID-10k contain synthetic distortions, while LIVE Challenge, BID, and KonIQ-10k include realistic distortions. We randomly sample 70% and 10% of the images from each dataset to construct the training and validation sets, respectively, leaving the remaining 20% for testing. We repeat this procedure ten times and report median SRCC and PLCC results as prediction monotonicity and precision measures, respectively.
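As a concrete illustration of how the reported numbers are aggregated under this protocol, the generic snippet below computes median SRCC and PLCC over repeated splits with scipy; the random data and the loop are placeholders, not the repository's evaluation code.

```python
# Generic sketch of the random-split reporting: median SRCC / PLCC over
# 10 random splits (placeholder data; illustration only).
import numpy as np
from scipy import stats

srcc_runs, plcc_runs = [], []
for seed in range(10):                           # 10 random 70/10/20 splits
    rng = np.random.default_rng(seed)
    mos = rng.uniform(0, 100, size=200)          # stand-in for test-set MOS
    pred = mos + rng.normal(0, 10, size=200)     # stand-in for model predictions
    srcc_runs.append(stats.spearmanr(pred, mos).correlation)   # monotonicity
    plcc_runs.append(stats.pearsonr(pred, mos)[0])             # precision

print(f"median SRCC {np.median(srcc_runs):.3f}, median PLCC {np.median(plcc_runs):.3f}")
```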
# [Example] Joint training on mix datasets: koniq10k livec bid kadid10k csiq live
# use dataset-domain alignment
pro_dir="./exp_log/random-split/our_epoch35_bs128_all_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"
CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
train_test_IQA.py \
--clip_model $clip_model \
--epochs 35 \
--lr 1e-5 \
--warmup_epoch 5 \
--weight_decay 1e-5 \
--batch_size 64 \
--local_global \
--loss_type scale_shift \
--scene_sampling 2 \
--project_dir $pro_dir \
--train_dataset koniq10k livec bid kadid10k csiq live \
--test_dataset koniq10k livec bid kadid10k csiq live \
--exp_type random-split \
--dataset_domain \
>> $pro_dir/train.log
The framework supports three types of experiments:
- leave-one-out: Cross-scene validation within a single dataset
- cross-set: Cross-dataset validation
- random-split: Random split validation following the LIQE setup

Main arguments:

- --train_dataset: Training dataset(s) (spaq, koniq10k, eva, para)
- --test_dataset: Testing dataset(s) (spaq, koniq10k, eva, para, livec, live, bid, cid2013)
- --local_global: Enable token-complementary patch reasoning (see the sketch below)
- --loss_type: scale_shift if using alignment, l1 if not using alignment
- --scene_sampling: Number of scenes per GPU in a mini-batch (0 for no scene sampling)
- --exp_type: Experiment type (leave-one-out, cross-set, random-split)
- --dataset_domain: Use dataset-domain alignment instead of scene-domain alignment
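The --local_global flag corresponds to the token-complementary patch reasoning mentioned above, which combines an overview of the whole image with full-resolution local detail. The sketch below is purely illustrative of that idea under assumed module names (GlobalLocalHead, precomputed CLIP-style features); it is not the paper's actual network.

```python
# Purely illustrative global/local fusion head (assumed design; not the
# paper's token-complementary patch reasoning network).
import torch
import torch.nn as nn


class GlobalLocalHead(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.fuse = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, global_feat, local_tokens):
        # global_feat: (B, D) feature of the resized full image (overview)
        # local_tokens: (B, N, D) features of full-resolution local patches (details)
        q = global_feat.unsqueeze(1)                       # overview queries the details
        fused, _ = self.fuse(q, local_tokens, local_tokens)
        return self.score(fused.squeeze(1)).squeeze(-1)    # (B,) predicted quality


head = GlobalLocalHead()
g = torch.randn(4, 512)       # e.g. backbone features of 4 downsized images
l = torch.randn(4, 9, 512)    # e.g. backbone features of 9 crops per image
print(head(g, l).shape)       # torch.Size([4])
```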
Leave-one-out cross-scene validation on SPAQ, KonIQ-10k, EVA, and PARA datasets:
Dataset | SRCC | PLCC |
---|---|---|
SPAQ | 0.861 | 0.889 |
KonIQ-10k | 0.919 | 0.935 |
EVA | 0.778 | 0.792 |
PARA | 0.902 | 0.940 |
Generalization performance across different datasets:
Cross-dataset generalization performance of different methods when trained on SPAQ or KonIQ-10k; the metric is (SRCC+PLCC)/2.
Performance following the LIQE protocol:
SRCC/PLCC values of intra-dataset validation for the IQA task.
Our method won the NTIRE 2024 Portrait Quality Assessment Challenge, demonstrating superior generalization capabilities, especially on DXOMARK's internal Challenge Test dataset.
Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey
The final submission code and trained weights for the competition are available in the NTIRE2024/Submission directory.
If you find this work useful for your research, please cite our paper:
@ARTICLE{SIND2025,
author={Huang, Yipo and Duan, Zhichao and Chen, Pengfei and Cai, Li and Li, Leida and Lin, Weisi},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment},
year={2025},
doi={10.1109/TCSVT.2025.3595208}}
This project is released under the MIT License.