
Yad-Tech: Real-Time ASL Recognition System


Overview

Welcome to the Yad-Tech project repository! This project was developed as part of the Samsung x Misk Innovation Campus program. Our aim is to bridge the communication gap between the Deaf and hearing communities by translating American Sign Language (ASL) gestures into text in real-time. This tool also serves as an educational resource for learning ASL.

Yad-Tech combines Convolutional Neural Networks (CNNs) for image-based gesture recognition and a Random Forest model for real-time prediction adjustments. By integrating these models with Mediapipe for hand landmark detection and OpenCV for webcam support, we created a system that is both highly accurate and efficient in live settings.

Data Analysis

Our initial focus was on the ASL alphabet, covering 29 classes: the 26 letters plus three functional signs ("space," "delete," and "nothing"). The dataset, sourced from Kaggle, includes over 87,000 images, roughly balanced across classes. To address the remaining class imbalance and enhance recognition, we applied data augmentation to underrepresented classes.
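As an illustration of the augmentation step, here is a minimal sketch using Keras' ImageDataGenerator; the directory layout, target class, and augmentation parameters are placeholders rather than the project's exact settings:

```python
# Illustrative per-class augmentation; parameters and paths are assumptions, not the project's settings.
import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=10,               # small rotations only; hand shape is orientation-sensitive
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    brightness_range=(0.8, 1.2),
)

# Generate extra samples for one underrepresented class directory
# (hypothetical layout: data/train/<class_name>/*.jpg).
os.makedirs("data/train_augmented/J", exist_ok=True)
gen = augmenter.flow_from_directory(
    "data/train",
    classes=["J"],                   # hypothetical underrepresented class
    target_size=(64, 64),
    batch_size=32,
    save_to_dir="data/train_augmented/J",
    save_format="jpeg",
)
for _ in range(10):                  # each iteration writes one augmented batch to disk
    next(gen)
```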


Modeling

The CNN model was built using a Keras Sequential architecture for high accuracy in image classification. After preprocessing and splitting the dataset into training and test sets, the model achieved remarkable results:

  • Training Accuracy: 99.6%
  • Test Accuracy: 98%

This high performance indicates the CNN’s robustness in identifying static ASL gestures accurately.
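For reference, here is a minimal sketch of what a Keras Sequential CNN for this task might look like; the layer sizes, 64x64 input shape, and training hyperparameters below are assumptions, not the exact architecture in this repository:

```python
# Minimal Sequential CNN sketch for 29 ASL classes; layer sizes and input shape are assumptions.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Input(shape=(64, 64, 3)),              # assumed input size
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(256, activation="relu"),
    Dropout(0.5),                          # dropout to curb overfitting (see "Key Improvements")
    Dense(29, activation="softmax"),       # 26 letters + "space", "delete", "nothing"
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# history = model.fit(X_train, y_train, epochs=10, validation_data=(X_test, y_test))
```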


The model achieved excellent performance across all ASL alphabet classes, with an overall accuracy of 98%. Key metrics, including precision, recall, and F1-score, are consistently high, with most classes scoring above 0.97 in each metric.

The confusion matrix shows that the model performs well, with high accuracy across most ASL alphabet classes, evidenced by strong values along the diagonal. Minor misclassifications occur between some visually similar classes, but overall, the model generalizes effectively with minimal errors. Fine-tuning or targeted data augmentation could further improve accuracy for the few challenging classes.

Model Accuracy

The training accuracy rapidly increased, reaching nearly 100% within the first few epochs. This indicates that the model quickly learned patterns in the training data. Validation accuracy shows an overall upward trend, although it fluctuates slightly, especially in the early epochs. By the final epochs, validation accuracy stabilizes and aligns closely with training accuracy, suggesting that the model generalizes well without overfitting.

Model Loss

The training loss consistently decreased and approached zero, indicating that the model is minimizing errors on the training data effectively. The validation loss shows a steep decline initially, mirroring the validation accuracy improvement. While it fluctuates slightly across epochs, it ultimately stabilizes close to the training loss, which further indicates good generalization.
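The curves described above can be reproduced from the Keras training history; a short sketch, assuming `history` is the object returned by `model.fit(..., validation_data=...)`:

```python
# Plot accuracy and loss curves from a Keras History object (assumed to come from model.fit).
import matplotlib.pyplot as plt

fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

ax_acc.plot(history.history["accuracy"], label="train")
ax_acc.plot(history.history["val_accuracy"], label="validation")
ax_acc.set_title("Model Accuracy")
ax_acc.set_xlabel("epoch")
ax_acc.legend()

ax_loss.plot(history.history["loss"], label="train")
ax_loss.plot(history.history["val_loss"], label="validation")
ax_loss.set_title("Model Loss")
ax_loss.set_xlabel("epoch")
ax_loss.legend()

plt.tight_layout()
plt.show()
```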

To enhance real-time performance, we used a Random Forest Classifier trained on hand landmark data extracted using Mediapipe. This model operates on a confidence threshold to ensure only high-certainty predictions are displayed. The Random Forest model achieved:

  • Accuracy: 99.9% on validation data
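A hedged sketch of how landmark features can be extracted with Mediapipe and used to train a scikit-learn Random Forest; the dataset directory layout and the 42-value feature vector (x, y per landmark) are assumptions:

```python
# Train a Random Forest on Mediapipe hand landmarks; paths and feature layout are assumptions.
import os
import cv2
import mediapipe as mp
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def landmarks_from_image(path):
    """Return a flat [x1, y1, ..., x21, y21] vector, or None if no hand is detected."""
    image = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    result = hands.process(image)
    if not result.multi_hand_landmarks:
        return None
    pts = result.multi_hand_landmarks[0].landmark
    return [coord for lm in pts for coord in (lm.x, lm.y)]

DATA_DIR = "asl_dataset"                 # hypothetical layout: asl_dataset/<class_name>/*.jpg
features, labels = [], []
for class_name in os.listdir(DATA_DIR):
    class_dir = os.path.join(DATA_DIR, class_name)
    for fname in os.listdir(class_dir):
        vec = landmarks_from_image(os.path.join(class_dir, fname))
        if vec is not None:              # skip images where no hand was found
            features.append(vec)
            labels.append(class_name)

X_train, X_val, y_train, y_val = train_test_split(features, labels, test_size=0.2, stratify=labels)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("validation accuracy:", accuracy_score(y_val, clf.predict(X_val)))
```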


• Accuracy: The model achieved an accuracy of 99.8%, indicating it is highly effective at recognizing and classifying signs.

• Precision, Recall, and F1-Score: Each class shows nearly perfect scores (close to 1.00), confirming consistent performance across all gestures. This balanced performance is crucial for real-time applications, avoiding bias toward any specific gesture.


The confusion matrix shows that the model performs exceptionally well across all classes, with almost perfect classification for each sign. The model misclassifies very few instances, as shown by the small number of off-diagonal values, suggesting minimal errors.

The integration of CNN and Random Forest allows the system to operate smoothly in real-time, effectively balancing accuracy and efficiency.


For real-time ASL gesture recognition, we leveraged a Random Forest model combined with a webcam feed and the Mediapipe library for hand tracking. Mediapipe extracts hand landmarks, allowing us to capture and preprocess real-time gesture data from users’ hands, which is then passed through the Random Forest model for prediction. The model classifies each gesture and updates the output text based on the identified sign language letter or action. Key steps in the real-time recognition process:

• Hand Landmark Detection: The Mediapipe library captures and tracks hand landmarks in each video frame, generating x, y coordinates of key points on the user's hand.

• Data Preparation: To ensure consistency with our model’s input requirements, the landmark data is normalized and arranged to match the model’s expected input length.

• Prediction and Text Update: The preprocessed data is passed to the trained Random Forest model, which predicts the ASL letter or gesture. When the confidence level exceeds a specified threshold (e.g., 50%), the system updates the text output accordingly. For actions like "space" and "delete," the text is modified with appropriate spacing or character removal.
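Putting these steps together, here is a condensed sketch of the real-time loop; the model path, label strings, and the 0.5 threshold are assumptions, and debouncing of repeated insertions (see "Key Improvements" below) is omitted for brevity:

```python
# Condensed real-time loop: webcam -> Mediapipe landmarks -> Random Forest -> text update.
# Model path, labels, and threshold are assumptions; debouncing of repeated insertions is omitted.
import pickle
import cv2
import mediapipe as mp
import numpy as np

with open("rf_model.pkl", "rb") as f:            # hypothetical path to the trained Random Forest
    clf = pickle.load(f)

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)
text, THRESHOLD = "", 0.5

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        pts = result.multi_hand_landmarks[0].landmark
        features = np.array([[c for lm in pts for c in (lm.x, lm.y)]])
        proba = clf.predict_proba(features)[0]
        confidence, label = proba.max(), clf.classes_[proba.argmax()]
        if confidence >= THRESHOLD:              # only act on high-certainty predictions
            if label == "space":
                text += " "
            elif label == "delete":
                text = text[:-1]
            elif label != "nothing":
                text += label
    cv2.putText(frame, text, (10, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Yad-Tech", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```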

Testing and Improvements

The Yad-Tech ASL recognition system was rigorously tested on both pre-split test data and additional user-generated images to ensure robust performance. The CNN model achieved a high accuracy of 98%, with misclassifications primarily in visually similar gestures. The Random Forest model, paired with MediaPipe for real-time recognition, showed reliable accuracy during live webcam testing.

Key Improvements:

• Hyperparameter Tuning and Dropout Layers: Adjustments to CNN hyperparameters and the addition of dropout layers helped control overfitting, boosting generalization.

• Confidence Threshold Optimization: Fine-tuning confidence thresholds in the Random Forest model reduced noise, ensuring reliable character insertion in real-time.

• User Feedback: User testing informed refinements to delay settings, making text insertion smoother for real-time interaction.
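To illustrate the threshold and delay tuning described above, here is a small sketch of debounced text insertion; the 0.5 threshold and 1.5-second delay are placeholder values, not the project's tuned settings:

```python
# Debounced insertion: accept a prediction only if confidence and elapsed time are sufficient.
import time

CONF_THRESHOLD = 0.5        # placeholder; tuned via "Confidence Threshold Optimization"
INSERT_DELAY_S = 1.5        # placeholder; tuned from user feedback on insertion pacing
_last_insert = 0.0

def maybe_insert(label, confidence, text):
    """Return the updated text, or the unchanged text if the prediction should be ignored."""
    global _last_insert
    now = time.time()
    if confidence < CONF_THRESHOLD or now - _last_insert < INSERT_DELAY_S:
        return text
    _last_insert = now
    if label == "space":
        return text + " "
    if label == "delete":
        return text[:-1]
    if label == "nothing":
        return text
    return text + label
```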

Conclusions

Yad-Tech combines advanced image classification and real-time processing to create a powerful tool for ASL translation. Our CNN and Random Forest models achieved high accuracy rates, with the Sequential model reaching a test accuracy of 98% and the Random Forest delivering near-perfect real-time accuracy. These models have the potential to enhance communication and learning, bridging a crucial gap between the Deaf and hearing communities.

Future Improvements

To further improve robustness, we aim to:

  • Expand the dataset to include more diverse gestures and variations.
  • Extend the system to recognize dynamic gestures and incorporate regional sign languages.

About

An ASL Translator using CNN for image classification and real-time hand detection with OpenCV and MediaPipe. Achieves high accuracy with user-friendly applications in Streamlit and Flask, supporting both image uploads and real-time webcam translations.
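As a hedged sketch of how the image-upload path in the Streamlit app might look (the model path, input size, and class list are assumptions):

```python
# Minimal Streamlit page for image-upload prediction; model path, input size, and classes are assumptions.
import numpy as np
import streamlit as st
from PIL import Image
from tensorflow.keras.models import load_model

model = load_model("asl_cnn.h5")         # hypothetical path to the trained CNN
CLASSES = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["space", "delete", "nothing"]

st.title("Yad-Tech ASL Translator")
uploaded = st.file_uploader("Upload an ASL hand image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded).convert("RGB").resize((64, 64))
    st.image(image, caption="Uploaded image")
    batch = np.expand_dims(np.asarray(image) / 255.0, axis=0)
    probs = model.predict(batch)[0]
    st.write(f"Prediction: **{CLASSES[int(probs.argmax())]}** ({probs.max():.1%} confidence)")
```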


Team Members

Rinad Almjishai (Repo Owner)

Wael habib Alkiyani

Jawad Abdullah Sherbini

Albatul Ali Abusaq

Luluh Khalid Alyahya

Yasser Ahmed Alzahrani

Ali Abu Ali
