[WIP] End-to-End Mispronunciation Detection and Diagnosis (MDD) Service

This project implements an End-to-End Mispronunciation Detection and Diagnosis (E2E-MDD) service based on an end-to-end phone recognizer using wav2vec CTC. The goal is to provide:

MDD Feedback: Detailed diagnostic feedback on mispronunciations (Dictate).
Pronunciation Scoring: Using Goodness of Pronunciation (GOP) to assess phoneme-level scores.
Teaching Suggestions: Automatically generated pedagogical feedback using Large Language Models (LLMs).

Features

End-to-End Design: Streamlines the MDD process by integrating wav2vec CTC for phoneme recognition.
Comprehensive Feedback: Combines GOP scoring with LLM-generated teaching suggestions.

References

GOP Methodology
- Cao, X., Fan, Z., Svendsen, T., & Salvi, G. (2024). A Framework for Phoneme-Level Pronunciation Assessment Using CTC. Proc. Interspeech 2024 (pp. 302-306).
LLM Feedback Generation
- Zhong, H., Xie, Y., & Yao, Z. (2024). Leveraging Large Language Models to Refine Automatic Feedback Generation at Articulatory Level in Computer-Aided Pronunciation Training. Proc. Interspeech 2024 (pp. 2600-2604).

Setup

Step 1: Install Dependencies

Follow these steps to set up the environment:

conda environment

conda create -n wav2vec2-mdd python==3.8.0
conda activate wav2vec2-mdd

install requirements

pip install -r requirements.txt

torch cuda version ( see https://pytorch.org/get-started/previous-versions) )

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116

Assuming you've already installed HuggingFace transformers library, you need also to install the ctcdecode library

git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .

Step 2: Prepare the Model

Download the pre-trained model:
- model.zip
Place the downloaded file into the directory:
- models/mdd/
Navigate to the directory:
```
cd models/mdd
```
Unzip the file:
```
unzip wav2vec2-mdd.zip
```

Usage

To start the MDD service, use the following command:

python app.py --timeout 12000

Testing

To test the service, execute:

python client.py

Acknowledgement

This project was made possible through the contributions of:

Fu-An Chao (Wav2vec2-mdd)
Yu-Hsuan Fang (LLM Feedback)

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
.github/workflows		.github/workflows
__pycache__		__pycache__
conf		conf
example		example
local		local
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
app.py		app.py
build.envd		build.envd
client.py		client.py
decoder.py		decoder.py
index.html		index.html
path.sh		path.sh
requirements.txt		requirements.txt
utils.py		utils.py
wer.py		wer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

[WIP] End-to-End Mispronunciation Detection and Diagnosis (MDD) Service

Features

References

Setup

Step 1: Install Dependencies

Step 2: Prepare the Model

Usage

Testing

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

[WIP] End-to-End Mispronunciation Detection and Diagnosis (MDD) Service

Features

References

Setup

Step 1: Install Dependencies

Step 2: Prepare the Model

Usage

Testing

Acknowledgement

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages