This project implements an End-to-End Mispronunciation Detection and Diagnosis (E2E-MDD) service based on an end-to-end phone recognizer using wav2vec CTC. The goal is to provide:
- MDD Feedback: Detailed diagnostic feedback on mispronunciations (Dictate).
- Pronunciation Scoring: Using Goodness of Pronunciation (GOP) to assess phoneme-level scores.
- Teaching Suggestions: Automatically generated pedagogical feedback using Large Language Models (LLMs).
- End-to-End Design: Streamlines the MDD process by integrating wav2vec CTC for phoneme recognition.
- Comprehensive Feedback: Combines GOP scoring with LLM-generated teaching suggestions.
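To make the recognition and diagnosis idea concrete, here is a minimal sketch of how a CTC phone recognizer's frame-level output can be turned into a phone sequence and compared with the expected pronunciation. It is not this project's implementation: the `<blank>` symbol, the phone labels, and the helper names `ctc_greedy_collapse` and `diagnose` are assumptions made for illustration, and the alignment uses Python's `difflib` rather than a weighted edit-distance aligner.

```python
from difflib import SequenceMatcher
from itertools import groupby

BLANK = "<blank>"  # hypothetical blank symbol; the real vocabulary may differ


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame argmax labels: merge repeats, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


def diagnose(canonical, recognized):
    """Align canonical and recognized phones and list the differences."""
    errors = []
    matcher = SequenceMatcher(a=canonical, b=recognized, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            errors.append(("substitution", canonical[i1:i2], recognized[j1:j2]))
        elif tag == "delete":
            errors.append(("deletion", canonical[i1:i2], []))
        elif tag == "insert":
            errors.append(("insertion", [], recognized[j1:j2]))
    return errors


# Illustrative data: a learner says "sink" where "think" was expected.
frames = ["<blank>", "s", "s", "<blank>", "ih", "ih", "ng", "k", "k", "<blank>"]
canonical = ["th", "ih", "ng", "k"]

recognized = ctc_greedy_collapse(frames)
errors = diagnose(canonical, recognized)
print(recognized)  # ['s', 'ih', 'ng', 'k']
print(errors)      # [('substitution', ['th'], ['s'])]
```

A real system would typically decode with a beam search (for example via ctcdecode) and attach scores such as GOP to each aligned phone. The sketch only shows the shape of the diagnosis step.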
- GOP Methodology
  - Cao, X., Fan, Z., Svendsen, T., & Salvi, G. (2024). A Framework for Phoneme-Level Pronunciation Assessment Using CTC. Proc. Interspeech 2024 (pp. 302-306).
- LLM Feedback Generation
  - Zhong, H., Xie, Y., & Yao, Z. (2024). Leveraging Large Language Models to Refine Automatic Feedback Generation at Articulatory Level in Computer-Aided Pronunciation Training. Proc. Interspeech 2024 (pp. 2600-2604).
Follow these steps to set up the environment:
- conda environment
conda create -n wav2vec2-mdd python==3.8.0
conda activate wav2vec2-mdd
- install requirements
pip install -r requirements.txt
- Install the CUDA build of torch (see https://pytorch.org/get-started/previous-versions)
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
- Assuming you have already installed the HuggingFace transformers library, you also need to install the ctcdecode library
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode && pip install .
- Download the pre-trained model:
- Place the downloaded file into the directory:
models/mdd/
- Navigate to the directory:
cd models/mdd
- Unzip the file:
unzip wav2vec2-mdd.zip
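After completing the steps above, a quick check along the following lines can confirm that the main dependencies are importable before launching the service. This is only a sketch: the module list is an assumption based on the packages named in the setup steps, and `missing_modules` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Top-level import names; assumed to match the packages installed above.
REQUIRED_MODULES = ["torch", "torchaudio", "transformers", "ctcdecode"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it runs quickly and does not touch the GPU. It does not verify versions or CUDA availability.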
To start the MDD service, use the following command:
python app.py --timeout 12000
To test the service, execute:
python client.py
This project was made possible through the contributions of:
- Fu-An Chao (Wav2vec2-mdd)
- Yu-Hsuan Fang (LLM Feedback)