Bangla TTS with Custom VITS Female Model

This README documents the process of setting up and running a custom Bangla Text-to-Speech (TTS) model using the Coqui TTS library with a VITS female voice model. It includes the steps taken to resolve compatibility issues and successfully generate speech output.

Prerequisites

Python Version: Python 3.10
Operating System: macOS (tested on MacBook Pro)
Virtual Environment: A Python virtual environment is recommended to manage dependencies.
Model Files: Custom VITS female Bangla model files (model_file.pth and config.json) located at /Users/Desktop/cq/tts_models--bn--custom--vits_female/.
Dependencies:
- Coqui TTS library (specific version required, see below)
- Additional libraries: torch, numpy, soundfile, librosa

Setup Instructions

Create and Activate a Virtual Environment:

python3 -m venv /Users/Desktop/cq/env
source /Users/Desktop/cq/env/bin/activate

Install Coqui TTS (Specific Version):
- Initially attempted to install the latest Coqui TTS version using pip install TTS, but the model download failed due to compatibility issues.
- The custom model (tts_models--bn--custom--vits_female) was manually downloaded from the source as a ZIP file (tts_models--bn--custom--vits_female.zip).
- Unzipped the model to /Users/Desktop/cq/tts_models--bn--custom--vits_female/, containing model_file.pth and config.json.
- Installed Coqui TTS version 0.13.0, a release from the same 0.13.x series that the model appears to target (the model’s directory name is v0.13.3_models):
```
pip install TTS==0.13.0
```
- Installed additional dependencies:
```
pip install torch numpy soundfile librosa
```
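
To confirm that the pinned version and the extra libraries actually landed in the active virtual environment, a check like the following may help. This is a minimal sketch and not part of the original workflow: the helper name `installed_versions` is hypothetical, and it uses only the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names as they appear in the pip commands above.
    names = ["TTS", "torch", "numpy", "soundfile", "librosa"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the virtual environment activated; a `TTS` entry other than 0.13.0 suggests the pin was not applied to this environment.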
Resolve Compatibility Issues:
- Running the initial script resulted in errors:
  - AttributeError: 'TTS' object has no attribute 'is_multi_lingual': Occurred because the model’s config.json lacked an is_multi_lingual field.
  - TypeError: argument of type 'NoneType' is not iterable: Occurred in the is_coqui_studio check because model_name was None.
- Fix 1: Modify config.json:
  - Opened /Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json.
  - Added "is_multi_lingual": false to indicate the model is single-language (Bangla):
```
{
    "output_path": "/home/ansary/Shabab/",
    "is_multi_lingual": false,
    "logger_uri": null,
    "run_name": "vits_4_nov",
    ...
}
```
  - Saved the file.
- Fix 2: Patch the TTS Library:
  - The TypeError persisted due to self.model_name being None in the is_coqui_studio check.
  - Modified the Coqui TTS library to handle model_name=None:
    - Located /Users/Desktop/cq/env/lib/python3.10/site-packages/TTS/api.py.
    - Found the is_coqui_studio property (around line 296):
      @property
      def is_coqui_studio(self):
          return "coqui_studio" in self.model_name
    - Replaced it with:
      @property
      def is_coqui_studio(self):
          model_name = self.model_name if self.model_name is not None else ""
          return "coqui_studio" in model_name
    - Saved the file.
    - Note: This is a temporary workaround. Consider updating to a newer TTS version for a permanent fix (see Troubleshooting).
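
The config.json edit from Fix 1 can also be scripted rather than done by hand. The following is a minimal sketch under stated assumptions: the function name `add_single_language_flag` is hypothetical, and it assumes config.json is a plain JSON object. It appends the key at the end of the object instead of the second position shown above, which makes no difference to JSON parsing, and it rewrites the file with 4-space indentation.

```python
import json
from pathlib import Path


def add_single_language_flag(config_path):
    """Add "is_multi_lingual": false to a config file if the key is missing.

    Returns True if the file was changed, False if the key already existed.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if "is_multi_lingual" in config:
        return False
    config["is_multi_lingual"] = False
    # ensure_ascii=False keeps any non-ASCII characters in the config readable.
    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return True
```

Calling `add_single_language_flag("/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json")` applies the change once; a second call returns False and leaves the file untouched. Keeping a backup copy of config.json first is a reasonable precaution.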

Running the TTS Script

Create the Script:

Save the following code in /Users/Desktop/cq/text2speech.py:

from TTS.api import TTS

# Load the female Bangla model from the local path
tts = TTS(
    model_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/model_file.pth",
    config_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json",
    gpu=False
)

# Synthesize speech
tts.tts_to_file(
    text="আকাশে মেঘের ভেলা, নদীতে স্রোতের খেলা, প্রকৃতির এই রূপে মন হয় উতলা। সবুজ পাহাড়, ফুলের বাগান, বাংলার সৌন্দর্যে মুগ্ধ সব মানুষের মন।",
    file_path="/Users/Desktop/cq/bangla_output.wav"
)

Run the Script:
```
python /Users/Desktop/cq/text2speech.py
```
Verify Output:
- The script writes bangla_output.wav to /Users/Desktop/cq/, the file_path given in the script.
- Play the output to verify:
```
afplay /Users/Desktop/cq/bangla_output.wav
```
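
Besides listening to the file, a quick programmatic check can confirm that the output is a non-empty, readable WAV. This is a minimal sketch: the helper name `wav_summary` is hypothetical, and it assumes the file is PCM-encoded, since the standard-library `wave` module raises `wave.Error` for other encodings.

```python
import wave


def wav_summary(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return rate, frames / rate


if __name__ == "__main__":
    import sys

    # Usage: python check_wav.py /Users/Desktop/cq/bangla_output.wav
    if len(sys.argv) > 1:
        rate, seconds = wav_summary(sys.argv[1])
        print(f"{rate} Hz, {seconds:.2f} s")
```

A duration of zero, or an exception while opening the file, suggests that synthesis did not complete.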

Troubleshooting

Error: AttributeError: 'TTS' object has no attribute 'is_multi_lingual':
- Ensure "is_multi_lingual": false is added to config.json.
Error: TypeError: argument of type 'NoneType' is not iterable:
- Verify the is_coqui_studio patch in TTS/api.py is applied correctly.
Error: KeyError: 'bn':
- Avoid using model_name in the TTS constructor, as it triggers a model zoo lookup for non-existent Bangla models.
Library Compatibility:
- If issues persist, try updating to the latest TTS version:
```
pip install --upgrade TTS
```
- Alternatively, try an older version (e.g., 0.11.0) if the model was trained with an earlier version:
```
pip install TTS==0.11.0
```
Dependencies:
- Ensure all required libraries are installed:
```
pip install torch numpy soundfile librosa
```

Verbose Output:

Add progress_bar=True to tts_to_file for debugging:

tts.tts_to_file(
    text="...",
    file_path="/Users/Desktop/cq/bangla_output.wav",
    progress_bar=True
)

Notes

The model is a single-language VITS model for Bangla, trained with Coqui TTS version ~0.13.3 (based on the directory name v0.13.3_models).
The is_coqui_studio patch is a temporary workaround. Updating to a newer TTS version may eliminate the need for this modification.
The UserWarning about torch.nn.utils.weight_norm is benign and can be ignored.
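
Because edits inside site-packages are lost whenever the package is reinstalled, the same is_coqui_studio workaround could instead be applied at runtime from the calling script. The following is a sketch of that alternative, not the approach used in this setup: the helper name `patch_is_coqui_studio` is hypothetical, and it assumes the class exposes `is_coqui_studio` as a property that reads `self.model_name`, as in the snippet shown in the setup steps.

```python
def patch_is_coqui_studio(cls):
    """Replace cls.is_coqui_studio with a property that tolerates model_name=None."""

    def is_coqui_studio(self):
        # Treat a missing or None model name as an empty string so that the
        # "in" test never receives None.
        model_name = getattr(self, "model_name", None) or ""
        return "coqui_studio" in model_name

    cls.is_coqui_studio = property(is_coqui_studio)
    return cls


# Intended use, before the model is constructed (requires Coqui TTS installed):
#   from TTS.api import TTS
#   patch_is_coqui_studio(TTS)
```

The replacement behaves the same as the hand-edited property, but it lives in the project script and leaves the installed library unmodified.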

Model Source

The custom Bangla VITS models are available in two variants: male and female voices.

Model Structure:

"bn": {
    "custom": {
        "vits-male",
        "vits-female"
    }
}

Download Links:
- Bangla Male Model: tts_models--bn--custom--vits_male.zip
- Bangla Female Model: tts_models--bn--custom--vits_female.zip
The female model was downloaded manually because the automatic model download through the Coqui TTS library failed.
Unzip the downloaded ZIP file to /Users/Desktop/cq/tts_models--bn--custom--vits_female/ to obtain model_file.pth and config.json.
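
The unzip step can be scripted so that the two required files are verified right after extraction. This is a minimal sketch under stated assumptions: the function name `extract_model` is hypothetical, and because the internal layout of the archive is not documented here, it searches the destination recursively in case the files sit inside a subfolder.

```python
import zipfile
from pathlib import Path

REQUIRED_FILES = ("model_file.pth", "config.json")


def extract_model(zip_path, dest_dir):
    """Unzip a model archive and return the paths of the required files.

    Raises FileNotFoundError if either file is missing after extraction.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)

    located = {}
    for name in REQUIRED_FILES:
        # rglob also finds the files if the archive wraps them in a subfolder.
        matches = sorted(dest.rglob(name))
        if not matches:
            raise FileNotFoundError(f"{name} not found under {dest}")
        located[name] = matches[0]
    return located
```

The returned paths can be passed as model_path and config_path to the TTS constructor, which avoids hard-coding a location that turns out to be one directory level off.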

Repository: zafi5/Text2Speech_bn

Folders and files

Latest commit

History

Repository files navigation

Bangla TTS with Custom VITS Female Model

Prerequisites

Setup Instructions

Running the TTS Script

Troubleshooting

Notes

Model Source

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages