Text-to-speech extension for Oobabooga's text-generation-webui using Coqui TTS.
Assuming you already have the WebUI set up:
- Install eSpeak-NG and ensure it is in your PATH
- Activate the conda environment with the cmd_xxx.bat script or by running conda activate textgen
- Enter the text-generation-webui/extensions/ directory and clone this repository:
cd text-generation-webui/extensions/
git clone https://github.com/Fire-Input/text-generation-webui-coqui-tts coqui_tts
- Return to the text-generation-webui root directory (the path below is relative to it) and install the requirements:
pip install -r extensions/coqui_tts/requirements.txt
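Before starting the WebUI, a quick check can confirm two prerequisites from the steps above: that eSpeak-NG is reachable on PATH and that the requirements file sits where the pip command expects it. This is a minimal sketch, not part of the extension: preflight is a hypothetical helper, and it assumes the eSpeak-NG executable is named espeak-ng (some installs expose it as espeak) and that you run it from the text-generation-webui root.

```python
import shutil
from pathlib import Path


def preflight(webui_root="."):
    """Hypothetical pre-install check; not part of the coqui_tts extension.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    # Assumption: the executable is "espeak-ng"; some installs only provide "espeak".
    if shutil.which("espeak-ng") is None and shutil.which("espeak") is None:
        problems.append("eSpeak-NG not found on PATH")
    # The pip command above uses a path relative to the WebUI root directory.
    req = Path(webui_root) / "extensions" / "coqui_tts" / "requirements.txt"
    if not req.is_file():
        problems.append(f"missing {req}; run pip from the text-generation-webui root")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

If it prints nothing, both prerequisites were found.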
- The coqui_tts extension will automatically download the pretrained model tts_models/en/vctk/vits by default. It is less than 200MB in size and will be downloaded to /home/USER/.local/share/tts on Linux and C:\Users\USER\AppData\Local\tts on Windows.
- When running oobabooga, the TTS package (version TTS==0.17.4) may throw an error about numpy if you are using Python < 3.11. If it does, run pip install numpy==1.24.4 and pip install numba==0.57.1 to install the versions of numpy and numba most compatible with this TTS version, then restart the WebUI. Ignore any error messages about incompatible package versions; the TTS package needs to update its requirements.txt to later versions of numpy and numba.
- Custom models are not supported yet.
- Every time you generate new audio, Coqui prints a log message to the console. This is normal and unfortunately cannot be disabled.
- Audio files are saved to text-generation-webui/extensions/coqui_tts/outputs/
- A lot of the code is copied from the ElevenLabs extension.
- Some code is also copied from da3dsoul's fork.
- Coqui Studio is not supported yet: I do not have a Coqui Studio API key, so I cannot test it.
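To find the downloaded model on disk, the default locations from the notes above can be written as a small function. This is a hypothetical helper for illustration, not Coqui TTS's own lookup code: it covers only the Linux and Windows defaults this README lists, ignores any environment-variable overrides the library may support, and assumes the model folder is named after the model with / replaced by -- (for example tts_models--en--vctk--vits).

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_model_dir(platform, user, model_name="tts_models/en/vctk/vits"):
    """Hypothetical helper mirroring the default locations listed in this README.

    Assumption: the model folder name is the model name with "/" replaced by "--".
    Environment-variable overrides that Coqui TTS may honor are ignored here.
    """
    folder = model_name.replace("/", "--")
    if platform == "linux":
        return PurePosixPath("/home", user, ".local", "share", "tts", folder)
    if platform == "win32":
        # PureWindowsPath normalizes the forward slash in "C:/Users" to a backslash.
        return PureWindowsPath("C:/Users", user, "AppData", "Local", "tts", folder)
    raise ValueError(f"no documented default for platform {platform!r}")
```

For example, default_model_dir("linux", "alice") returns /home/alice/.local/share/tts/tts_models--en--vctk--vits; passing sys.platform and your own user name checks your machine.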
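The numpy/numba workaround from the notes above can be scripted as a check that prints the suggested pip commands. This is a minimal sketch under stated assumptions: pin_commands and installed_versions are hypothetical helpers, and they only encode this README's advice (Python < 3.11, numpy==1.24.4, numba==0.57.1) rather than detecting the actual error raised by TTS==0.17.4.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins this README recommends for TTS==0.17.4 on Python < 3.11.
RECOMMENDED = {"numpy": "1.24.4", "numba": "0.57.1"}


def pin_commands(python_version, installed):
    """Return the pip commands suggested by this README.

    python_version: a tuple such as sys.version_info or (3, 9, 16).
    installed: a {package: version or None} dict of what is currently installed.
    """
    if tuple(python_version[:2]) >= (3, 11):
        return []  # the README only reports the numpy error on Python < 3.11
    return [
        f"pip install {name}=={want}"
        for name, want in RECOMMENDED.items()
        if installed.get(name) != want
    ]


def installed_versions(names):
    """Look up installed distribution versions; None means not installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for cmd in pin_commands(sys.version_info, installed_versions(RECOMMENDED)):
        print(cmd)
```

Run it inside the textgen conda environment. If it prints nothing, you are either on Python 3.11 or later or already have the recommended versions installed.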
Tested with the following setup:
- Windows 11
- Conda installation inside WSL2
- WSL2 Ubuntu 22.04
- Python 3.9.16
- numpy==1.21.6
- Conda 23.3.1
- CUDA 11.7
- WebUI commit: 68dcbc7ebda3f0d9700dde43d0d29324f5c244b1
- eSpeak-NG 1.50