# Deep Learning for Sentiment Image Classification

This project applies convolutional neural networks (CNNs) to classify facial images into six emotion categories — Happy, Sad, Fear, Pain, Anger, and Disgust.
It focuses on developing a compact, interpretable, and generalisable model for sentiment understanding using deep learning, with emphasis on augmentation strategies, capacity control, and domain adaptation.

The dataset used in this project was collected from multiple online sources and is available on Kaggle, alongside the "New Domain 6 Emotions" set used for the cross-domain evaluation.

## Project Overview

The goal is to build a deep learning pipeline that can:

- Accurately predict human emotions from facial imagery.
- Handle small, imbalanced datasets through augmentation and class weighting.
- Improve generalisation with targeted data transformations and adaptive fine-tuning.
- Benchmark performance under domain-shift scenarios.

## Dataset Summary

| Emotion | Sample Count | Share (%) |
|---------|--------------|-----------|
| Happy   | 230          | 20.0      |
| Sad     | 224          | 19.5      |
| Anger   | 214          | 18.6      |
| Pain    | 162          | 14.1      |
| Disgust | 159          | 13.8      |
| Fear    | 159          | 13.8      |

Dataset size: ~1.15K labeled images.
Class imbalance mitigated using WeightedRandomSampler and weighted loss.
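A minimal PyTorch sketch of this imbalance handling, assuming the per-class counts from the table above (the flat label list below is a stand-in for the real dataset's labels):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Per-class counts from the table above
# (order: Happy, Sad, Anger, Pain, Disgust, Fear)
counts = torch.tensor([230.0, 224.0, 214.0, 162.0, 159.0, 159.0])

# Inverse-frequency class weights: minority classes get weights above 1
class_weights = counts.sum() / (len(counts) * counts)

# Per-sample weights for the sampler (labels reconstructed from the counts;
# a real run would read them from the dataset)
labels = [cls for cls, n in enumerate(counts.tolist()) for _ in range(int(n))]
sample_weights = [class_weights[y].item() for y in labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# Weighted loss with the label smoothing used in the baseline setup
criterion = torch.nn.CrossEntropyLoss(weight=class_weights, label_smoothing=0.05)
```

Passing `sampler=sampler` to the `DataLoader` oversamples minority classes per batch, while the weighted loss rebalances the gradient signal; the two can be used together or separately.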


## Experiments & Results

### 1️⃣ Baseline Model — SmallCNN

- Parameters: 389,958
- Architecture: 4 Conv–ReLU–MaxPool blocks + GlobalAvgPool + Dropout + Linear
- Input size: 160×160×3
- Optimizer: AdamW (LR=1e-3, WD=1e-4)
- Loss: Weighted Cross-Entropy + Label Smoothing (0.05)
- Train/Val/Test split: 70/15/15

| Model | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|-------|----------|-----------------|--------------|----------|
| SmallCNN (Baseline) | 0.1734 | 0.1320 | 0.2042 | 0.1272 |

The baseline performance barely exceeds random chance (≈16.7%), suggesting insufficient capacity and limited invariance to facial variations.
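A minimal PyTorch sketch of the baseline described above. The uniform channel-doubling layout (32→64→128→256) is an assumption, though it reproduces the stated 389,958-parameter count exactly; the dropout rate is also an assumption:

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # One Conv -> ReLU -> MaxPool block, as in the baseline description
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class SmallCNN(nn.Module):
    def __init__(self, num_classes=6, widths=(32, 64, 128, 256)):
        super().__init__()
        chans = (3,) + tuple(widths)
        self.features = nn.Sequential(
            *[conv_block(chans[i], chans[i + 1]) for i in range(len(widths))]
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.drop = nn.Dropout(0.3)          # dropout rate is an assumption
        self.fc = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(self.drop(x))

model = SmallCNN()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```

Scaling the `widths` tuple by a constant factor gives a WiderCNN-style variant of the kind compared in experiment 3.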


### 2️⃣ Data Augmentation — Standard Policy

Augmentations: random resized crop, horizontal flip, small rotation (±10°), and mild brightness/contrast jitter.
All other settings were held constant between the augmented and non-augmented runs.

| Variant | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|---------|----------|-----------------|--------------|----------|
| No Augmentation | 0.2197 | 0.1868 | 0.2366 | 0.1946 |
| With Augmentation | 0.2023 | 0.0745 | 0.2431 | 0.1126 |

While augmentation improved recall on minority classes, it reduced precision and overall F1, indicating that the augmentation strength needed tuning.


### 3️⃣ Model Capacity — SmallCNN vs WiderCNN

| Model | Parameters | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|-------|-----------|----------|-----------------|--------------|----------|
| SmallCNN | 390K | 0.2081 | 0.1556 | 0.2083 | 0.1334 |
| WiderCNN | 764K | 0.2197 | 0.2023 | 0.2161 | 0.1585 |

Observation: WiderCNN outperformed SmallCNN in macro metrics (+0.025 F1), validating moderate capacity scaling for small datasets.


### 4️⃣ Targeted Augmentation Refinement

After error analysis (e.g., Sad ↔ Fear, Pain ↔ Anger confusions), grayscale jitter and translation were added to simulate lighting and pose variability.

| Metric | Before | After |
|--------|--------|-------|
| Accuracy | 0.1734 | 0.2081 |
| Macro Precision | 0.1320 | 0.1885 |
| Macro Recall | 0.2042 | 0.2478 |
| Macro F1 | 0.1272 | 0.1483 |

Targeted augmentation increased macro-F1 by 16.6% relative to the baseline, reducing bias across the emotion classes.

*(Figures: Before Refinement vs. After Refinement)*

### 5️⃣ Cross-Domain Generalisation

Model performance was evaluated on a new dataset with different lighting conditions and demographics.

| Variant | Accuracy | Macro Precision | Macro Recall | Macro F1 |
|---------|----------|-----------------|--------------|----------|
| Baseline | 0.1255 | 0.0417 | 0.1294 | 0.0586 |
| TTA (flip averaging) | 0.1255 | 0.0417 | 0.1294 | 0.0586 |
| Few-Shot Adapt (10 imgs/class) | 0.1365 | 0.0476 | 0.1386 | 0.0645 |

Few-shot fine-tuning improved macro F1 by ~10% (0.0586 → 0.0645), suggesting that even limited adaptation can recover some performance under severe domain shift.
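The two adaptation strategies above, flip-averaging TTA and head-only few-shot fine-tuning, can be sketched as follows; the `fc` head attribute name, learning rate, and epoch count are assumptions:

```python
import torch
import torch.nn as nn

def predict_tta(model, x):
    """Flip-averaging TTA: average the softmax outputs of each image
    and its horizontal mirror."""
    model.eval()
    with torch.no_grad():
        probs = model(x).softmax(dim=1)
        probs = probs + model(torch.flip(x, dims=[3])).softmax(dim=1)  # flip width axis
    return probs / 2

def few_shot_adapt(model, loader, epochs=5, lr=1e-4):
    """Few-shot adaptation: fine-tune only the classifier head on a handful
    of labelled target-domain images (e.g., 10 per class)."""
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():  # unfreeze only the final linear layer
        p.requires_grad = True
    opt = torch.optim.AdamW(model.fc.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return model
```

Freezing the backbone keeps the adaptation stable with only 10 images per class; unfreezing the last convolutional block as well is a common variant when slightly more target data is available.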


## ⚠️ Notes

The reported results are approximate and may vary slightly between runs due to:

- Random weight initialisation
- Stochastic data loading and augmentation
- Hardware and runtime variability (especially on GPU environments)

Each experiment was executed under the same configuration, but reproducibility across environments may show ±2–3% metric deviation.
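A typical seeding helper that pins down the variation sources listed above (a sketch; full GPU determinism may additionally require `torch.use_deterministic_algorithms(True)`):

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42):
    # Seed every RNG the training pipeline touches
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade cuDNN autotuning speed for reproducible convolution results
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```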

## Tech Stack

| Category | Tools / Libraries |
|----------|-------------------|
| Language | Python 3.12 |
| Frameworks | PyTorch, Torchvision |
| Data Processing | Pandas, NumPy |
| Visualization | Matplotlib, Seaborn |
| Metrics | scikit-learn |
| Environment | Google Colab / Local GPU |

## Key Insights

- WiderCNN achieved the best overall balance with a 0.1585 macro-F1, outperforming the smaller baseline.
- Error-driven refinement of the augmentation policy improved macro-F1 by ~16%.
- Few-shot adaptation boosted new-domain macro-F1 from 0.0586 to 0.0645, confirming the model's adaptability.
- Despite these improvements, the task remains challenging due to high inter-class similarity and limited data diversity.

## Future Work

- Incorporate transfer learning from pretrained backbones (e.g., ResNet-18 or models trained on VGGFace2).
- Experiment with Vision Transformers (ViT) to capture global emotion context.
- Integrate domain-adversarial training for better robustness under distribution shift.
- Expand dataset diversity and use synthetic data for underrepresented classes.

## Author

Author: Frank Dinh
Email: dinh.qnhat@gmail.com
Year: 2025

© 2025 All Rights Reserved. Unauthorized reuse or redistribution is prohibited.