Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper


by Matyáš Boháček and Marek Hrúz, University of West Bohemia
Should you have any questions or inquiries, feel free to contact us here.


This is a fork of the original repository created by Matyáš Boháček, based on the research paper "Sign Pose-based Transformer for Word-level Sign Language Recognition."

Research Contributions

In the course of our research using Spoter, we have made several contributions, presented in the following papers and extended abstract:

In this paper, we examine the importance of pose estimation models for landmark-based sign language recognition. Specifically, we explore using 29 and 71 landmarks from the Mediapipe, Openpose, and RHnet models as input to both Spoter and a graph-based model. Our analysis shows that Mediapipe combined with Spoter fits our dataset best. Interestingly, while 71 points yields positive results, subsequent experiments revealed that 54 points performs even better.

The strategies detailed in this paper are designed to counter overfitting by modifying both the data and the training process. Our findings underscore the effectiveness of combining the AEC and PUCP305 datasets, which yields notable improvements in our results. We also highlight the importance of data augmentation, label smoothing, and reduced model complexity for improving generalization (see the label smoothing sketch after these summaries). These insights led us to adjust several Spoter parameters to better balance complexity and performance.

This study investigates how shortening videos that have missing landmarks affects the performance of our sign language recognition model. We find that while reducing video length causes only a slight drop in performance, we ultimately chose to retain the videos with missing parts in our dataset.
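As a concrete illustration of the label smoothing mentioned above, here is a minimal sketch using PyTorch's built-in support; the smoothing value 0.1 is a placeholder for illustration, not the exact setting from our experiments:

import torch.nn as nn

# Cross-entropy with label smoothing softens the one-hot targets,
# which helps counter overfitting on small sign language datasets.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)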

Please feel free to explore this repository and the associated papers to gain a deeper understanding of our research and its outcomes.

For any questions, comments, or collaborations, please don't hesitate to get in touch!

Get Started

First, make sure to install all necessary dependencies using:

pip install -r requirements.txt

Create an account on Weights & Biases to enable experiment tracking and reproducibility, then set your WANDB_API_KEY in your environment:

export WANDB_API_KEY=your_api_key_here
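Alternatively, you can authenticate from Python; wandb.login() picks up WANDB_API_KEY from the environment when it is set:

import wandb

# Reads WANDB_API_KEY from the environment if no key is passed explicitly.
wandb.login()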

To train the model, simply specify the hyperparameters and run the following:

python -m train
  --experiment_name [str; name of the experiment to name the output logs and plots in WandB]
  
  --epochs [int; number of epochs]
  --lr [float; learning rate]
  
  --training_set_path [str; path to the H5 file with training set's skeletal data]
  --validation_set_path [str; path to the H5 file with validation set's skeletal data]
  --device [int; GPU number if GPU is available]
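For example, a hypothetical run (the paths and hyperparameter values below are placeholders, not the exact settings from our experiments):

python -m train \
  --experiment_name spoter_aec_example \
  --epochs 100 \
  --lr 0.001 \
  --training_set_path data/DGI305-AEC--50--mediapipe--Train.hdf5 \
  --validation_set_path data/DGI305-AEC--50--mediapipe--Val.hdf5 \
  --device 0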

The hyperparameter modifications made during our research are hardcoded in the repository, so you can experiment directly with the hyperparameters we provide.

Reproduce our results

To reproduce our results, you need to download the following files from this link:

  • DGI305-AEC--50--mediapipe--Train.hdf5
  • DGI305-AEC--50--mediapipe--Val.hdf5
  • points_54.csv

To obtain these files, please contact us.
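Once you have the files, you can inspect their layout before training; a minimal sketch using h5py (the internal structure of the HDF5 files is an assumption here, which is exactly what the script prints out):

import h5py

# Walk the HDF5 tree and print every dataset's name, shape, and dtype
# to discover how the skeletal data is organized.
with h5py.File("DGI305-AEC--50--mediapipe--Train.hdf5", "r") as f:
    def describe(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(describe)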

Test the inference

To run inference, you will need to download the following files from the same link:

  • points_54.csv
  • meaning.json
  • spoter-50Classes-68_5Top1acc_87Top5acc.pth

To obtain these files, please contact us.

After obtaining the necessary files, you can run the inference script:

python inference.py

This will allow you to test the model's performance on new data and see its predictions in action.
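If you want to examine the downloaded files before running the script, here is a minimal sketch; it assumes meaning.json maps class indices to glosses and that the .pth file is a standard PyTorch checkpoint, both of which should be verified against the actual files:

import json
import torch

# Class-index-to-gloss mapping (assumed structure of meaning.json).
with open("meaning.json", "r", encoding="utf-8") as f:
    meaning = json.load(f)
print(len(meaning), "entries")

# Load the checkpoint on CPU to inspect it without a GPU; recent
# PyTorch versions may additionally require weights_only=False for
# checkpoints that pickle full model objects.
checkpoint = torch.load("spoter-50Classes-68_5Top1acc_87Top5acc.pth", map_location="cpu")
print(type(checkpoint))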

License

The code is published under the Apache License 2.0, which allows both academic and commercial use, as in the original repository by Matyáš Boháček.
