SIND: Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment


Official implementation of "Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment" (IEEE TCSVT 2025)

πŸ† NTIRE 2024 Portrait Quality Assessment Challenge Winner

Abstract

The inherent diversity of visual scenes poses a fundamental challenge in blind image quality assessment (BIQA), leading to compromised model generalizability on unseen scenes. We find that human annotations for images of different visual scenes follow distinctly different quality score distributions, causing existing BIQA models to overfit to these scene-specific distributions.

This work presents SIND (Scene-INvariant Distribution), a generalizable BIQA model that addresses this challenge through a distribution alignment framework. Our approach automatically scales and shifts cross-scene distributions into a unified distribution, enabling scene-invariant and quality-aware feature representation. Additionally, we design a token-complementary patch reasoning network to extract comprehensive quality-aware features from both image overview and details.
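For intuition, the overview-and-details idea can be pictured as encoding one downscaled view of the whole image alongside full-resolution patches and fusing the two token streams. The snippet below is only a rough sketch of that view construction, assuming a hypothetical 224-pixel encoder input; the actual token-complementary patch reasoning network is defined in the paper and in train_test_IQA.py.

import torch
import torch.nn.functional as F

def global_local_views(image: torch.Tensor, patch: int = 224):
    """Illustrative overview + detail split (an assumption, not the repo's exact pipeline)."""
    # Overview: the whole image resized to the encoder input size.
    overview = F.interpolate(image.unsqueeze(0), size=(patch, patch),
                             mode="bilinear", align_corners=False)
    # Details: non-overlapping full-resolution crops.
    _, h, w = image.shape
    crops = [image[:, i:i + patch, j:j + patch].unsqueeze(0)
             for i in range(0, h - patch + 1, patch)
             for j in range(0, w - patch + 1, patch)]
    details = torch.cat(crops) if crops else overview
    return overview, details

overview, details = global_local_views(torch.rand(3, 512, 768))
print(overview.shape, details.shape)  # (1, 3, 224, 224) and (6, 3, 224, 224)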


Distribution alignment visualization: (a) Training data from three scenes, (b) Prediction without alignment, (c) Distribution alignment process, (d) Prediction with alignment

Framework


Overview of SIND: (1) Scene Sampling selects k scenes and n images per scene, (2) Distribution Alignment learns scene-specific transformations, (3) Model learns scene-invariant distributions
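Conceptually, the alignment step gives every scene its own learnable scale and shift so that per-scene score distributions are mapped into one shared, scene-invariant distribution before the regression loss is applied. Below is a minimal PyTorch sketch of that idea with hypothetical per-scene parameters; the exact formulation used by SIND is given in the paper and implemented behind --loss_type scale_shift in train_test_IQA.py.

import torch
import torch.nn as nn

class SceneScaleShift(nn.Module):
    """Illustrative per-scene affine alignment (an assumption, not the official code)."""

    def __init__(self, num_scenes: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(num_scenes))   # one scale per scene
        self.shift = nn.Parameter(torch.zeros(num_scenes))  # one shift per scene

    def forward(self, pred: torch.Tensor, scene_id: torch.Tensor) -> torch.Tensor:
        # pred: (B,) raw quality predictions; scene_id: (B,) integer scene labels
        return pred * self.scale[scene_id] + self.shift[scene_id]

# Hypothetical training step: align per-scene predictions, then apply an L1 loss.
align = SceneScaleShift(num_scenes=9)
pred = torch.randn(8)                  # stand-in model outputs for one mini-batch
mos = torch.rand(8)                    # stand-in ground-truth scores
scene_id = torch.randint(0, 9, (8,))
loss = nn.functional.l1_loss(align(pred, scene_id), mos)
loss.backward()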

πŸ“ Data Preparation

Technical Quality Assessment (TQA)

Dataset Links

| Dataset     | Images | Description                                       |
|-------------|--------|---------------------------------------------------|
| SPAQ        | 11,125 | Smartphone Photography Attribute and Quality      |
| KonIQ-10k   | 10,073 | Konstanz Natural Image Quality Database           |
| LIVEC/LIVEW | 1,162  | LIVE In the Wild Image Quality Challenge Database |
| LIVE        | 779    | Laboratory for Image & Video Engineering          |
| BID         | 585    | Blur Image Database                               |
| CID2013     | 474    | Color Image Database 2013                         |

Aesthetic Quality Assessment (AQA)

| Dataset | Images | Description                                                  |
|---------|--------|--------------------------------------------------------------|
| EVA     | 4,070  | Explainable Visual Aesthetics                                |
| PARA    | 31,220 | Personalized image Aesthetics database with Rich Attributes |
  1. Download datasets following the links above

  2. Organize data structure:

    data/
    β”œβ”€β”€ SPAQ/
    β”œβ”€β”€ KonIQ-10k/
    β”œβ”€β”€ LIVEC/
    β”œβ”€β”€ CID2013/
    β”œβ”€β”€ BID/
    β”œβ”€β”€ LIVE/
    β”œβ”€β”€ EVA/
    └── PARA/
    
  3. Prepare data splits (Optional):

    # Generate custom data splits
    python data_json/data_json_generator.py
    
    # Or use provided splits (recommended)
    # Pre-generated JSON files are available in data_json/

πŸ“ Note: Most of the datasets are splited following Q-Align, please refer data_json/data_json_generator.py for details.

🎯 Quick Start & Training

Cross-scene Validation

Train and evaluate on a single dataset with leave-one-out cross-scene validation. Refer to the paper and train_test_IQA.py for more details.

# [Example] Train and evaluate on SPAQ dataset with leave-one-out cross-scene validation

cd "$(dirname "$0")"
pro_dir="./exp_log/leave_one_out/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq \
                    --test_dataset spaq \
                    --exp_type leave-one-out \
                    >> $pro_dir/train.log 

# Main configurations:
# - train_dataset: spaq / koniq10k / eva / para
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: leave-one-out
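The --scene_sampling flag sets how many scenes each GPU's mini-batch is drawn from, so that every batch contains a few well-populated scene groups for the alignment to operate on. The function below is a rough illustration of that batch construction, not the repository's actual sampler.

import random
from collections import defaultdict

def sample_scene_batch(samples, scenes_per_batch=2, batch_size=64, seed=None):
    """Illustrative scene sampling (an assumption, not the repo's sampler):
    pick k scenes, then draw batch_size // k images from each."""
    rng = random.Random(seed)
    by_scene = defaultdict(list)
    for path, scene in samples:
        by_scene[scene].append(path)
    chosen = rng.sample(sorted(by_scene), scenes_per_batch)
    per_scene = batch_size // scenes_per_batch
    batch = []
    for s in chosen:
        batch += [(p, s) for p in rng.choices(by_scene[s], k=per_scene)]
    return batch

# Toy example: 6 images over 3 scenes, batch of 4 drawn from 2 scenes.
data = [("img0.jpg", "indoor"), ("img1.jpg", "indoor"),
        ("img2.jpg", "night"), ("img3.jpg", "night"),
        ("img4.jpg", "landscape"), ("img5.jpg", "landscape")]
print(sample_scene_batch(data, scenes_per_batch=2, batch_size=4, seed=0))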

Cross-dataset Evaluation

Train on SPAQ and/or KonIQ-10k and test on other datasets without fine-tuning. Refer to the paper and cross_set_exp.sh for more details.

# [Example] Train on SPAQ, test on KonIQ-10k, SPAQ, LIVEW, LIVE, BID, CID2013
# use scene-domain alignment (default)
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq \
                    --test_dataset koniq10k spaq livec live bid cid2013 \
                    --exp_type cross-set \
                    >> $pro_dir/train.log 

# [Example] Train on SPAQ and KonIQ-10k, test on KonIQ-10k, SPAQ, LIVEW, LIVE, BID, CID2013
# use dataset-domain alignment
cd "$(dirname "$0")"
pro_dir="./exp_log/cross-set/our_epoch35_bs128_spaq_koniq_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset spaq koniq10k \
                    --test_dataset koniq10k spaq livec live bid cid2013 \
                    --exp_type cross-set \
                    --dataset_domain \
                    >> $pro_dir/train.log 

# Main configurations:
# - train_dataset: spaq / koniq10k / spaq koniq10k
# - loss_type: scale_shift if using alignment, l1 if not using alignment
# - scene_sampling: 2 (per GPU) if using alignment, 0 if not using alignment
# - local_global: enable token-complementary patch reasoning
# - exp_type: cross-set
# - dataset_domain: use dataset-domain alignment instead of scene-domain alignment
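In terms of the alignment sketch above, --dataset_domain only changes which label indexes the per-domain scale and shift: the dataset a sample comes from rather than its scene. The snippet below is just that illustration, with hypothetical field names.

# Illustration only; field names are hypothetical.
def domain_ids(batch, dataset_domain: bool):
    if dataset_domain:
        return [item["dataset_id"] for item in batch]  # dataset-domain alignment
    return [item["scene_id"] for item in batch]        # scene-domain alignment

batch = [{"scene_id": 3, "dataset_id": 0}, {"scene_id": 7, "dataset_id": 1}]
print(domain_ids(batch, dataset_domain=True))   # -> [0, 1]
print(domain_ids(batch, dataset_domain=False))  # -> [3, 7]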

Intra-dataset Evaluation

Following the experimental settings of LIQE, we mix six datasets for the training set and learn a single set of weights that is then tested on all six test sets. Refer to random_split_exp.sh for more details.

We conduct experiments on six IQA datasets, among which LIVE, CSIQ, and KADID-10k contain synthetic distortions, while LIVE Challenge, BID, and KonIQ-10K include realistic distortions. We randomly sample 70% and 10% images from each dataset to construct the training and validation set, respectively, leaving the remaining 20% for testing. We repeat this procedure ten times, and report median SRCC and PLCC results as prediction monotonicity and precision measures, respectively.

# [Example] Joint training on mix datasets: koniq10k livec bid kadid10k csiq live
# use dataset-domain alignment
pro_dir="./exp_log/random-split/our_epoch35_bs128_all_dataset"
mkdir -p $pro_dir
clip_model="openai/ViT-B-16"

CUDA_VISIBLE_DEVICES=0,1 accelerate launch \
                    train_test_IQA.py  \
                    --clip_model $clip_model \
                    --epochs 35 \
                    --lr 1e-5  \
                    --warmup_epoch 5 \
                    --weight_decay 1e-5 \
                    --batch_size 64 \
                    --local_global \
                    --loss_type scale_shift  \
                    --scene_sampling 2 \
                    --project_dir $pro_dir \
                    --train_dataset koniq10k livec bid kadid10k csiq live \
                    --test_dataset koniq10k livec bid kadid10k csiq live \
                    --exp_type random-split \
                    --dataset_domain \
                    >> $pro_dir/train.log 
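The median SRCC and PLCC reported under this protocol can be computed with scipy as sketched below; this is a generic illustration of the metrics, not the repository's evaluation code.

import numpy as np
from scipy import stats

def srcc_plcc(pred, mos):
    """Spearman rank and Pearson linear correlation between predictions and MOS."""
    srcc = stats.spearmanr(pred, mos).correlation
    plcc = stats.pearsonr(pred, mos)[0]
    return srcc, plcc

# Median over repeated random splits, as in the LIQE-style protocol above.
rng = np.random.default_rng(0)
results = []
for _ in range(10):
    mos = rng.random(200)
    pred = mos + 0.1 * rng.standard_normal(200)  # stand-in for model predictions
    results.append(srcc_plcc(pred, mos))
srcc_med, plcc_med = np.median(np.array(results), axis=0)
print(f"median SRCC {srcc_med:.3f}, median PLCC {plcc_med:.3f}")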

Experiment Types

The framework supports three types of experiments:

  • leave-one-out: Cross-scene validation within a single dataset
  • cross-set: Cross-dataset validation
  • random-split: Random split validation following the LIQE setup

Main Arguments

  • --train_dataset: Training dataset(s), e.g. spaq, koniq10k, eva, para, or a mixture such as koniq10k livec bid kadid10k csiq live
  • --test_dataset: Testing dataset(s), e.g. spaq, koniq10k, eva, para, livec, live, bid, cid2013, kadid10k, csiq
  • --local_global: Enable token-complementary patch reasoning
  • --loss_type: scale_shift if using alignment, l1 if not using alignment
  • --scene_sampling: Number of scenes per GPU in a mini-batch (0 for no scene sampling)
  • --exp_type: Experiment type (leave-one-out, cross-set, random-split)
  • --dataset_domain: Use dataset-domain alignment instead of scene-domain alignment

📊 Results

Cross-scene Validation Performance

Leave-one-out cross-scene validation on SPAQ, KonIQ-10k, EVA, and PARA datasets:

| Dataset   | SRCC  | PLCC  |
|-----------|-------|-------|
| SPAQ      | 0.861 | 0.889 |
| KonIQ-10k | 0.919 | 0.935 |
| EVA       | 0.778 | 0.792 |
| PARA      | 0.902 | 0.940 |

Cross-dataset Evaluation

Generalization performance across different datasets:


Cross-dataset generalization performance of different methods when trained on SPAQ or KonIQ-10k. The metric is (SRCC+PLCC)/2.

Intra-dataset Evaluation

Performance following the LIQE protocol:


SRCC/PLCC values of intra-dataset validation for the TQA task.

NTIRE 2024 Champion

Our method won the NTIRE 2024 Portrait Quality Assessment Challenge, demonstrating superior generalization capabilities, especially on DXOMARK's internal Challenge Test dataset.


Deep Portrait Quality Assessment. A NTIRE 2024 Challenge Survey

The final submission code and trained weights for the competition are available in NTIRE2024/Submission.

Citation

If you find this work useful for your research, please cite our paper:

@ARTICLE{SIND2025,
  author={Huang, Yipo and Duan, Zhichao and Chen, Pengfei and Cai, Li and Li, Leida and Lin, Weisi},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Learning Scene-invariant Distribution for Generalizable Blind Image Quality Assessment}, 
  year={2025},
  doi={10.1109/TCSVT.2025.3595208}}

License

This project is released under the MIT License.

Acknowledgements

  • Built upon OpenCLIP and HyperIQA
  • Evaluation metrics: SRCC (Spearman Rank Correlation Coefficient) and PLCC (Pearson Linear Correlation Coefficient)
