This README documents the process of setting up and running a custom Bangla Text-to-Speech (TTS) model using the Coqui TTS library with a VITS female voice model. It includes the steps taken to resolve compatibility issues and successfully generate speech output.
- Python Version: Python 3.10
- Operating System: macOS (tested on MacBook Pro)
- Virtual Environment: A Python virtual environment is recommended to manage dependencies.
- Model Files: Custom VITS female Bangla model files (`model_file.pth` and `config.json`) located at `/Users/Desktop/cq/tts_models--bn--custom--vits_female/`.
- Dependencies:
  - Coqui TTS library (specific version required, see below)
  - Additional libraries: `torch`, `numpy`, `soundfile`, `librosa`
- Create and Activate a Virtual Environment:
    python3 -m venv /Users/Desktop/cq/env
    source /Users/Desktop/cq/env/bin/activate
- Install Coqui TTS (Specific Version):
  - Initially attempted to install the latest Coqui TTS version using `pip install TTS`, but the model download failed due to compatibility issues.
  - The custom model (`tts_models--bn--custom--vits_female`) was manually downloaded from the source as a ZIP file (`tts_models--bn--custom--vits_female.zip`).
  - Unzipped the model to `/Users/Desktop/cq/tts_models--bn--custom--vits_female/`, containing `model_file.pth` and `config.json`.
  - Installed Coqui TTS version 0.13.0, as the model appeared compatible with the 0.13.x series (its directory name is `v0.13.3_models`):
      pip install TTS==0.13.0
  - Installed additional dependencies:
      pip install torch numpy soundfile librosa
- Resolve Compatibility Issues:
  - Running the initial script resulted in errors:
    - `AttributeError: 'TTS' object has no attribute 'is_multi_lingual'`: Occurred because the model's `config.json` lacked an `is_multi_lingual` field.
    - `TypeError: argument of type 'NoneType' is not iterable`: Occurred in the `is_coqui_studio` check because `model_name` was `None`.
  - Fix 1: Modify `config.json`:
    - Opened `/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json`.
    - Added `"is_multi_lingual": false` to indicate the model is single-language (Bangla):
        {
          "output_path": "/home/ansary/Shabab/",
          "is_multi_lingual": false,
          "logger_uri": null,
          "run_name": "vits_4_nov",
          ...
        }
    - Saved the file.
  - Fix 2: Patch the TTS Library:
    - The `TypeError` persisted due to `self.model_name` being `None` in the `is_coqui_studio` check.
    - Modified the Coqui TTS library to handle `model_name=None`:
      - Located `/Users/Desktop/cq/env/lib/python3.10/site-packages/TTS/api.py`.
      - Found the `is_coqui_studio` property (around line 296):
          @property
          def is_coqui_studio(self):
              return "coqui_studio" in self.model_name
      - Replaced it with:
          @property
          def is_coqui_studio(self):
              model_name = self.model_name if self.model_name is not None else ""
              return "coqui_studio" in model_name
      - Saved the file.
      - Note: This is a temporary workaround. Consider updating to a newer TTS version for a permanent fix (see Troubleshooting).
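The `config.json` change from Fix 1 can also be scripted instead of made by hand. The following is a minimal sketch, not part of the original setup: the function name `ensure_single_language` and the `.bak` backup file are illustrative assumptions. It only adds the key when it is missing, so it is safe to run more than once.

```python
import json
from pathlib import Path


def ensure_single_language(config_path):
    """Add "is_multi_lingual": false to a JSON config if the key is missing.

    Returns True if the file was modified, False if the key was already there.
    """
    path = Path(config_path)
    original_text = path.read_text(encoding="utf-8")
    config = json.loads(original_text)

    if "is_multi_lingual" in config:
        return False

    # Keep a backup next to the original so the change can be undone.
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original_text, encoding="utf-8")

    # The new key is appended at the end; JSON key order does not matter here.
    config["is_multi_lingual"] = False
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return True
```

For the setup described here, the call would be `ensure_single_language("/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json")`.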
- Create the Script:
  - Save the following code in `/Users/Desktop/cq/text2speech.py`:
      from TTS.api import TTS

      # Load the female Bangla model from the local path
      tts = TTS(
          model_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/model_file.pth",
          config_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json",
          gpu=False
      )

      # Synthesize speech
      tts.tts_to_file(
          text="আকাশে মেঘের ভেলা, নদীতে স্রোতের খেলা, প্রকৃতির এই রূপে মন হয় উতলা। সবুজ পাহাড়, ফুলের বাগান, বাংলার সৌন্দর্যে মুগ্ধ সব মানুষের মন।",
          file_path="/Users/Desktop/cq/bangla_output.wav"
      )
- Run the Script:
    python /Users/Desktop/cq/text2speech.py
- Verify Output:
  - The script generates `bangla_output.wav` in `/Users/Desktop/cq/`, the `file_path` passed to `tts_to_file`.
  - Play the output to verify:
      afplay /Users/Desktop/cq/bangla_output.wav
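Besides listening to the file, the output can be checked programmatically. The sketch below uses only the standard-library `wave` module and assumes the generated file is uncompressed PCM, which that module requires. The helper name `wav_summary` is illustrative.

```python
import wave


def wav_summary(wav_path):
    """Return channel count, sample rate and duration of a PCM WAV file."""
    with wave.open(str(wav_path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Guard against a zero sample rate in a malformed header.
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `wav_summary("/Users/Desktop/cq/bangla_output.wav")` after running the script should report a non-zero duration. A duration of zero, or a `wave.Error`, suggests that synthesis did not complete or that the file is not plain PCM.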
- Error: `AttributeError: 'TTS' object has no attribute 'is_multi_lingual'`:
  - Ensure `"is_multi_lingual": false` is added to `config.json`.
- Error: `TypeError: argument of type 'NoneType' is not iterable`:
  - Verify the `is_coqui_studio` patch in `TTS/api.py` is applied correctly.
- Error: `KeyError: 'bn'`:
  - Avoid using `model_name` in the `TTS` constructor, as it triggers a model zoo lookup for non-existent Bangla models.
- Library Compatibility:
  - If issues persist, try updating to the latest TTS version:
      pip install --upgrade TTS
  - Alternatively, try an older version (e.g., 0.11.0) if the model was trained with an earlier version:
      pip install TTS==0.11.0
- Dependencies:
  - Ensure all required libraries are installed:
      pip install torch numpy soundfile librosa
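- Verbose Output:
  - Add `progress_bar=True` to `tts_to_file` for debugging:
      tts.tts_to_file(
          text="...",
          file_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/bangla_output.wav",
          progress_bar=True
      )
  - If this call raises a `TypeError` for an unexpected keyword argument, pass `progress_bar=True` to the `TTS(...)` constructor instead, which accepts it.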
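When working through the compatibility and dependency items above, it helps to know exactly which versions are installed in the active virtual environment. The following sketch uses the standard-library `importlib.metadata` module. The package list mirrors the dependencies named in this README, and the helper name `installed_versions` is illustrative.

```python
from importlib import metadata

# Distribution names as used with pip in this README.
PACKAGES = ("TTS", "torch", "numpy", "soundfile", "librosa")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment activated. If `TTS` does not report `0.13.0`, the `is_coqui_studio` patch location and behavior described in this README may not apply.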
- The model is a single-language VITS model for Bangla, trained with Coqui TTS version ~0.13.3 (based on the directory name `v0.13.3_models`).
- The `is_coqui_studio` patch is a temporary workaround. Updating to a newer TTS version may eliminate the need for this modification.
- The `UserWarning` about `torch.nn.utils.weight_norm` is benign and can be ignored.
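Because edits inside `site-packages` are lost whenever the package is reinstalled, the `is_coqui_studio` workaround could in principle be applied at runtime instead. The sketch below shows the idea on a stand-in class. `FakeTTS` and `install_safe_is_coqui_studio` are hypothetical names, and whether patching the real `TTS` class this way covers every code path in version 0.13.0 has not been verified here.

```python
def install_safe_is_coqui_studio(cls):
    """Replace cls.is_coqui_studio with a property that tolerates model_name=None."""

    def is_coqui_studio(self):
        # Fall back to an empty string when model_name is None or missing.
        model_name = getattr(self, "model_name", None) or ""
        return "coqui_studio" in model_name

    cls.is_coqui_studio = property(is_coqui_studio)
    return cls


# Stand-in class (hypothetical) that reproduces the failing property.
class FakeTTS:
    def __init__(self, model_name=None):
        self.model_name = model_name

    @property
    def is_coqui_studio(self):
        return "coqui_studio" in self.model_name  # TypeError when None


install_safe_is_coqui_studio(FakeTTS)
print(FakeTTS().is_coqui_studio)                     # False
print(FakeTTS("coqui_studio/en/x").is_coqui_studio)  # True
```

With the real library, the equivalent would presumably be to call `install_safe_is_coqui_studio(TTS)` right after `from TTS.api import TTS` and before constructing the `TTS` object. Treat that as an assumption to test, not a confirmed fix.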
- The custom Bangla VITS models are available in two variants: male and female voices.
- Model Structure:
    "bn": { "custom": { "vits-male", "vits-female" } }
- Download Links:
  - Bangla Male Model: tts_models--bn--custom--vits_male.zip
  - Bangla Female Model: tts_models--bn--custom--vits_female.zip
- The female model was downloaded manually because the automatic model download failed with the TTS version installed via `pip install TTS`.
- Unzip the downloaded ZIP file to `/Users/Desktop/cq/tts_models--bn--custom--vits_female/` to obtain `model_file.pth` and `config.json`.
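The unzip step can be scripted together with a check that both required files are present. This is a minimal sketch using only the standard library. The helper name `extract_and_check` is illustrative, and because the internal layout of the ZIP file is not documented here, the function searches the extracted tree recursively instead of assuming a fixed location.

```python
import zipfile
from pathlib import Path

REQUIRED_FILES = ("model_file.pth", "config.json")


def extract_and_check(zip_path, dest_dir):
    """Extract the archive and map each required file to its path, or None."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    found = {}
    for name in REQUIRED_FILES:
        # Search recursively in case the archive has a top-level folder.
        matches = sorted(dest.rglob(name))
        found[name] = matches[0] if matches else None
    return found


# Example call (paths taken from this README; the ZIP location is an assumption):
# extract_and_check(
#     "/Users/Desktop/cq/tts_models--bn--custom--vits_female.zip",
#     "/Users/Desktop/cq/tts_models--bn--custom--vits_female/",
# )
```

Any entry that comes back as `None` means the archive did not contain that file. The returned paths can be passed directly as `model_path` and `config_path` when constructing `TTS`.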