Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper


by Matyáš Boháček and Marek Hrúz, University of West Bohemia
Should you have any questions or inquiries, feel free to contact us here.


This is a fork of the original repository created by Matyáš Boháček, based on the research paper "Sign Pose-based Transformer for Word-level Sign Language Recognition."

Research Contributions

In the course of our research using Spoter, we have made several contributions, presented in the following papers and extended abstract:

In this paper, we examine the importance of pose estimation models for landmark-based sign language recognition. Specifically, we explore using 29 and 71 landmarks from the Mediapipe, Openpose, and RHnet models as input to both Spoter and a graph-based model. Our analysis shows that Mediapipe combined with Spoter fits our dataset best. Interestingly, while 71 points yields positive results, subsequent experiments revealed that 54 points performs even better.

The strategies detailed in this paper are designed to counter overfitting by modifying both the data and the training process. Our findings underscore the effectiveness of combining the AEC and PUCP305 datasets, which yields notable improvements in our results. We also highlight the importance of data augmentation, label smoothing, and reduced model complexity for improving generalization (see the label smoothing sketch after these summaries). These insights led us to adjust several Spoter parameters to better balance complexity and performance.

This study investigates how shortening videos that have missing landmarks affects the performance of our sign language recognition model. We find that while reducing video length causes only a slight drop in performance, we ultimately chose to retain the videos with missing parts in our dataset.
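As a concrete illustration of the label smoothing mentioned above, here is a minimal sketch using PyTorch's built-in support; the smoothing value 0.1 is a placeholder for illustration, not the exact setting from our experiments:

import torch.nn as nn

# Cross-entropy with label smoothing softens the one-hot targets,
# which helps counter overfitting on small sign language datasets.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)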

Please feel free to explore this repository and the associated papers to gain a deeper understanding of our research and its outcomes.

For any questions, comments, or collaborations, please don't hesitate to get in touch!

Get Started

First, make sure to install all necessary dependencies using:

pip install -r requirements.txt

Create an account on Weights & Biases to enable experiment tracking and reproducibility, then set your WANDB_API_KEY in your environment:

export WANDB_API_KEY=your_api_key_here
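Alternatively, you can authenticate from Python; wandb.login() picks up WANDB_API_KEY from the environment when it is set:

import wandb

# Reads WANDB_API_KEY from the environment if no key is passed explicitly.
wandb.login()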

To train the model, simply specify the hyperparameters and run the following:

python -m train
  --experiment_name [str; name of the experiment to name the output logs and plots in WandB]
  
  --epochs [int; number of epochs]
  --lr [float; learning rate]
  
  --training_set_path [str; path to the H5 file with training set's skeletal data]
  --validation_set_path [str; path to the H5 file with validation set's skeletal data]
  --device [int; GPU number if GPU is available]
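For example, a hypothetical run (the paths and hyperparameter values below are placeholders, not the exact settings from our experiments):

python -m train \
  --experiment_name spoter_aec_example \
  --epochs 100 \
  --lr 0.001 \
  --training_set_path data/DGI305-AEC--50--mediapipe--Train.hdf5 \
  --validation_set_path data/DGI305-AEC--50--mediapipe--Val.hdf5 \
  --device 0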

The hyperparameter modifications made during our research are hardcoded in the repository, so you can experiment directly with the hyperparameters we provide.

Reproduce our results

To reproduce our results, you need to download the following files from this link:

  • DGI305-AEC--50--mediapipe--Train.hdf5
  • DGI305-AEC--50--mediapipe--Val.hdf5
  • points_54.csv

To obtain these files, please contact us.
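Once you have the files, you can inspect their layout before training; a minimal sketch using h5py (the internal structure of the HDF5 files is an assumption here, which is exactly what the script prints out):

import h5py

# Walk the HDF5 tree and print every dataset's name, shape, and dtype
# to discover how the skeletal data is organized.
with h5py.File("DGI305-AEC--50--mediapipe--Train.hdf5", "r") as f:
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(describe)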

Test the inference

To run inference, you will need to download the following files from the same link:

  • points_54.csv
  • meaning.json
  • spoter-50Classes-68_5Top1acc_87Top5acc.pth

To obtain these files, please contact us.

After obtaining the necessary files, you can run the inference script:

python inference.py

This will allow you to test the model's performance on new data and see its predictions in action.
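If you want to examine the downloaded files before running the script, here is a minimal sketch; it assumes meaning.json maps class indices to glosses and that the .pth file is a standard PyTorch checkpoint, both of which should be verified against the actual files:

import json
import torch

# Class-index-to-gloss mapping (assumed structure of meaning.json).
with open("meaning.json", "r", encoding="utf-8") as f:
    meaning = json.load(f)
print(len(meaning), "entries")

# Load the checkpoint on CPU to inspect it without a GPU; recent
# PyTorch versions may additionally require weights_only=False for
# checkpoints that pickle full model objects.
checkpoint = torch.load("spoter-50Classes-68_5Top1acc_87Top5acc.pth", map_location="cpu")
print(type(checkpoint))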

License

The code is published under the Apache License 2.0, which allows both academic and commercial use, as in the original repository by Matyáš Boháček.
