A real-time face detection app that predicts the ethnicity of a detected face and reads aloud a cultural fun fact using text-to-speech. Built with PyTorch, OpenCV, Hugging Face Transformers, and Edge TTS.
- Webcam detects a face using OpenCV's Haar Cascade
- A ResNet-34 model classifies the detected face's ethnicity
- An LLM generates a fun cultural fact based on the result
- Edge TTS reads the fact aloud in a contextually appropriate voice
The pretrained model used is from the FairFace project — a ResNet-34 trained on the FairFace dataset for balanced, multi-racial face classification.
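If the weights follow the layout of FairFace's own reference inference script (as far as we know, a torchvision ResNet-34 whose final layer outputs 18 logits: 7 race, 2 gender, 9 age), turning raw outputs into labels is a slice and a softmax per head. The label order below is an assumption to check against the label maps in `utils/transforms.py`, and `decode_fairface` is a hypothetical helper.

```python
import numpy as np

# Assumed order of the 18 FairFace logits: [0:7] race, [7:9] gender, [9:18] age
RACE_LABELS = ["White", "Black", "Latino_Hispanic", "East Asian",
               "Southeast Asian", "Indian", "Middle Eastern"]
GENDER_LABELS = ["Male", "Female"]
AGE_LABELS = ["0-2", "3-9", "10-19", "20-29", "30-39",
              "40-49", "50-59", "60-69", "70+"]


def softmax(x: np.ndarray) -> np.ndarray:
    z = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def decode_fairface(logits) -> dict:
    """Map one image's 18 logits to {head: (label, probability)}."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    if logits.shape[0] != 18:
        raise ValueError(f"expected 18 logits, got {logits.shape[0]}")
    heads = {
        "race": (logits[0:7], RACE_LABELS),
        "gender": (logits[7:9], GENDER_LABELS),
        "age": (logits[9:18], AGE_LABELS),
    }
    result = {}
    for name, (head_logits, labels) in heads.items():
        probs = softmax(head_logits)  # each head is normalised independently
        idx = int(np.argmax(probs))
        result[name] = (labels[idx], float(probs[idx]))
    return result
```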
- Initially trained a custom `FaceIdentifierModel` on the UTKFace dataset (`train.py`, `dataset_utk.py`)
- Switched to the FairFace dataset for better racial diversity coverage (`dataset_fairface.py`)
- F1 scores on the custom-trained model were too low for reliable real-time use
- Switched to the pretrained FairFace ResNet-34 weights for production use
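The README does not say which F1 variant was measured. For a multi-class task where balance across groups matters, a common choice is macro-averaged F1, which weights every class equally so weak performance on a small class is not hidden by a large one. The sketch below computes it in plain Python under that assumption; `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` from scikit-learn (already in `requirements.txt`) should give the same result over the labels present in either list.

```python
from collections import Counter


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    if not classes:
        raise ValueError("no labels to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    scores = []
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)
```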
The `.pt` weight files are not included in this repo (too large for GitHub). Download manually:
- Go to the FairFace GitHub repository
- Download `res34_fair_align_multi_7_20190809.pt`
- Place it in the `models/` folder
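To fail fast with a clear message when the weights are missing, a startup check along these lines could run before the model is built. `find_weights` is a hypothetical helper; only the file name and the `models/` folder come from the steps above.

```python
from pathlib import Path
from typing import Union

WEIGHTS_NAME = "res34_fair_align_multi_7_20190809.pt"


def find_weights(models_dir: Union[str, Path] = "models") -> Path:
    """Return the path to the FairFace weights, or raise with download instructions."""
    path = Path(models_dir) / WEIGHTS_NAME
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found. Download {WEIGHTS_NAME} from the FairFace "
            f"GitHub repository and place it in {Path(models_dir)}/"
        )
    return path
```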
- Windows (DirectML is used for AMD GPU acceleration)
- Python 3.11
- AMD GPU (or fall back to CPU by editing `predict.py` accordingly)
```
git clone https://github.com/Kiranlimtl/Racial-Fun-Fact.git
cd Racial-Fun-Fact
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

The Qwen2-1.5B model (~3GB) will be downloaded automatically from Hugging Face on first run. Make sure you have enough disk space and a stable internet connection before starting.

```
python predict.py
```

Press Q to quit.
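One way to confirm there is room for the ~3GB Qwen2-1.5B download is to check free space where Hugging Face caches models. This is a hypothetical pre-flight helper, not part of the repo. It assumes the default cache root (`~/.cache/huggingface`, overridable with the `HF_HOME` environment variable) and ignores other cache settings such as `HF_HUB_CACHE`.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

QWEN_DOWNLOAD_GB = 3.0  # approximate size quoted in this README


def hf_cache_root() -> Path:
    """Default Hugging Face cache root; HF_HOME overrides it (other settings ignored)."""
    return Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))


def nearest_existing(path: Path) -> Path:
    """shutil.disk_usage needs an existing path, so walk up until one exists."""
    path = Path(path).expanduser().resolve()
    while not path.exists():
        path = path.parent
    return path


def enough_disk_space(required_gb: float = QWEN_DOWNLOAD_GB,
                      path: Optional[Path] = None) -> bool:
    """True if at least `required_gb` GiB is free on the drive holding `path`."""
    target = nearest_existing(path if path is not None else hf_cache_root())
    free_gb = shutil.disk_usage(target).free / 1024**3
    return free_gb >= required_gb
```

Running `print(enough_disk_space())` before the first `python predict.py` reports whether roughly 3GB is free on the drive that holds the cache.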
```
Racial-Fun-Fact/
├── models/
│ ├── model.py # Custom model definition (from UTK training phase)
│ └── __init__.py
├── utils/
│ └── transforms.py # Image transforms and race/gender label maps
├── data/ # Dataset scripts (not used in inference)
├── predict.py # Main real-time inference script
├── llm.py # Fun fact generation via LLM
├── app.py # App entry point
├── train.py # Training script (UTK/FairFace experimentation)
├── requirements.txt
└── README.md
```
```
torch
torchvision
torch-directml
transformers
accelerate
opencv-python
Pillow
numpy
matplotlib
scikit-learn
tqdm
pygame
edge-tts
requests
```
Note: `torch-directml` is Windows-only for AMD GPU support. On Linux with an AMD GPU, install the ROCm build of PyTorch instead. On NVIDIA GPUs, the standard CUDA build works out of the box.
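Instead of editing `predict.py` by hand for each backend, device choice could be made at runtime. The `pick_device` helper below is a hypothetical sketch, not how the repo currently works: it tries `torch_directml.device()` first, then CUDA (ROCm builds of PyTorch also report AMD GPUs through `torch.cuda`), then falls back to CPU.

```python
import torch


def pick_device() -> torch.device:
    """Prefer DirectML (Windows + AMD) if installed, else CUDA/ROCm, else CPU."""
    try:
        import torch_directml  # only importable when torch-directml is installed
        return torch_directml.device()
    except ImportError:
        pass
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API as well
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```

Both the model and each input tensor would then be moved with `.to(pick_device())`, so the rest of the inference code stays backend-agnostic.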
- The app waits 5 seconds after a face is detected before generating a fact, to allow the model to stabilise its prediction across 30 frames
- A new fun fact is generated every 20 seconds while a face remains on screen
- If no face is detected for 3 seconds, the session resets
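The timing rules above can be modelled as a small frame-driven state machine. This is a sketch, not the logic in `predict.py`: the `FactSession` class is hypothetical, and smoothing by a majority vote over the last 30 per-frame predictions is an assumption about what stabilising the prediction across 30 frames means.

```python
from collections import Counter, deque
from typing import Deque, Hashable, Optional

WARMUP_S = 5.0          # wait after a face first appears before the first fact
FACT_INTERVAL_S = 20.0  # minimum gap between facts while the face stays on screen
RESET_AFTER_S = 3.0     # no face for this long ends the session
VOTE_WINDOW = 30        # number of recent per-frame predictions to vote over


class FactSession:
    """Frame-driven timer; `now` is a timestamp in seconds (e.g. time.monotonic())."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.votes: Deque[Hashable] = deque(maxlen=VOTE_WINDOW)
        self.first_seen: Optional[float] = None
        self.last_seen: Optional[float] = None
        self.last_fact: Optional[float] = None

    def update(self, now: float, label: Optional[Hashable]) -> Optional[Hashable]:
        """Feed one frame's prediction (None = no face detected).

        Returns the majority label when a new fact should be generated, else None.
        """
        # End the session if no face has been seen for RESET_AFTER_S seconds
        if self.last_seen is not None and now - self.last_seen >= RESET_AFTER_S:
            self.reset()
        if label is None:
            return None

        if self.first_seen is None:
            self.first_seen = now
        self.last_seen = now
        self.votes.append(label)

        if now - self.first_seen < WARMUP_S:
            return None
        if self.last_fact is not None and now - self.last_fact < FACT_INTERVAL_S:
            return None
        self.last_fact = now
        # Majority vote over the recent window smooths out flickering predictions
        return Counter(self.votes).most_common(1)[0][0]
```

In a capture loop, `session.update(time.monotonic(), label)` would be called once per frame, and a non-`None` return value would trigger the LLM call and TTS playback. Passing `now` in rather than reading the clock inside the class keeps the logic testable without a webcam.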