Open-source Bangla Text-to-Speech using Coqui TTS. Converts Bangla text to natural-sounding speech with pretrained models.

zafi5/Text2Speech_bn

Bangla TTS with Custom VITS Female Model

This README documents the process of setting up and running a custom Bangla Text-to-Speech (TTS) model using the Coqui TTS library with a VITS female voice model. It includes the steps taken to resolve compatibility issues and successfully generate speech output.

Prerequisites

  • Python Version: Python 3.10
  • Operating System: macOS (tested on MacBook Pro)
  • Virtual Environment: A Python virtual environment is recommended to manage dependencies.
  • Model Files: Custom VITS female Bangla model files (model_file.pth and config.json) located at /Users/Desktop/cq/tts_models--bn--custom--vits_female/.
  • Dependencies:
    • Coqui TTS library (specific version required, see below)
    • Additional libraries: torch, numpy, soundfile, librosa

Setup Instructions

  1. Create and Activate a Virtual Environment:

    python3 -m venv /Users/Desktop/cq/env
    source /Users/Desktop/cq/env/bin/activate
  2. Install Coqui TTS (Specific Version):

    • Initially attempted to install the latest Coqui TTS version using pip install TTS, but the model download failed due to compatibility issues.
    • The custom model (tts_models--bn--custom--vits_female) was manually downloaded from the source as a ZIP file (tts_models--bn--custom--vits_female.zip).
    • Unzipped the model to /Users/Desktop/cq/tts_models--bn--custom--vits_female/, containing model_file.pth and config.json.
    • Installed Coqui TTS version 0.13.0. The model appears to target the 0.13.x line (its directory name, v0.13.3_models, points to version 0.13.3):
      pip install TTS==0.13.0
    • Installed additional dependencies:
      pip install torch numpy soundfile librosa
  3. Resolve Compatibility Issues:

    • Running the initial script resulted in errors:
      • AttributeError: 'TTS' object has no attribute 'is_multi_lingual': Occurred because the model’s config.json lacked an is_multi_lingual field.
      • TypeError: argument of type 'NoneType' is not iterable: Occurred in the is_coqui_studio check because model_name was None.
    • Fix 1: Modify config.json:
      • Opened /Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json.
      • Added "is_multi_lingual": false to indicate the model is single-language (Bangla):
        {
            "output_path": "/home/ansary/Shabab/",
            "is_multi_lingual": false,
            "logger_uri": null,
            "run_name": "vits_4_nov",
            ...
        }
      • Saved the file.
    • Fix 2: Patch the TTS Library:
      • The TypeError persisted due to self.model_name being None in the is_coqui_studio check.
      • Modified the Coqui TTS library to handle model_name=None:
        • Located /Users/Desktop/cq/env/lib/python3.10/site-packages/TTS/api.py.
        • Found the is_coqui_studio property (around line 296):
          @property
          def is_coqui_studio(self):
              return "coqui_studio" in self.model_name
        • Replaced it with:
          @property
          def is_coqui_studio(self):
              model_name = self.model_name if self.model_name is not None else ""
              return "coqui_studio" in model_name
        • Saved the file.
        • Note: This is a temporary workaround. Consider updating to a newer TTS version for a permanent fix (see Troubleshooting).
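
Fix 1 can also be applied from a script instead of by hand. The following is a minimal sketch, assuming config.json is plain JSON without comments; the helper name add_is_multi_lingual is hypothetical and not part of Coqui TTS. It keeps a .bak copy and appends the key at the end of the file, which is equivalent for JSON since key order does not matter.

```python
import json
import shutil
from pathlib import Path


def add_is_multi_lingual(config_path, value=False):
    """Add "is_multi_lingual" to a config.json if the key is missing.

    Hypothetical helper. Returns True if the file was changed,
    False if the key was already present.
    """
    config_path = Path(config_path)
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)

    if "is_multi_lingual" in config:
        return False

    # Keep an untouched copy next to the original before rewriting it.
    shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))

    config["is_multi_lingual"] = value
    with config_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any Bangla characters readable in the file.
        json.dump(config, f, indent=4, ensure_ascii=False)
    return True
```

Calling add_is_multi_lingual("/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json") once should be enough; a second call leaves the file untouched. Note that the file is re-serialized, so its original whitespace is not preserved.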

Running the TTS Script

  1. Create the Script:

    • Save the following code in /Users/Desktop/cq/text2speech.py:
      from TTS.api import TTS
      
      # Load the female Bangla model from the local path
      tts = TTS(
          model_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/model_file.pth",
          config_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json",
          gpu=False
      )
      
      # Synthesize speech
      tts.tts_to_file(
          text="আকাশে মেঘের ভেলা, নদীতে স্রোতের খেলা, প্রকৃতির এই রূপে মন হয় উতলা। সবুজ পাহাড়, ফুলের বাগান, বাংলার সৌন্দর্যে মুগ্ধ সব মানুষের মন।",
          file_path="/Users/Desktop/cq/bangla_output.wav"
      )
  2. Run the Script:

    python /Users/Desktop/cq/text2speech.py
  3. Verify Output:

    • The script generates bangla_output.wav in /Users/Desktop/cq/.
    • Play the output to verify:
      afplay /Users/Desktop/cq/bangla_output.wav
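
Besides listening to the file, the output can be checked programmatically. This sketch uses only Python's standard wave module and assumes the generated file is uncompressed PCM WAV; the function name wav_info is hypothetical.

```python
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return rate, wav.getnchannels(), frames / float(rate)
```

For example, wav_info("/Users/Desktop/cq/bangla_output.wav") returning a duration of 0.0 seconds would suggest that synthesis produced an empty file.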

Troubleshooting

  • Error: AttributeError: 'TTS' object has no attribute 'is_multi_lingual':
    • Ensure "is_multi_lingual": false is added to config.json.
  • Error: TypeError: argument of type 'NoneType' is not iterable:
    • Verify the is_coqui_studio patch in TTS/api.py is applied correctly.
  • Error: KeyError: 'bn':
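    • Avoid using model_name in the TTS constructor. It triggers a model zoo lookup, which fails when the installed version's model list has no Bangla (bn) entry. Load the model with model_path and config_path instead.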
  • Library Compatibility:
    • If issues persist, try updating to the latest TTS version:
      pip install --upgrade TTS
    • Alternatively, try an older version (e.g., 0.11.0) if the model was trained with an earlier version:
      pip install TTS==0.11.0
  • Dependencies:
    • Ensure all required libraries are installed:
      pip install torch numpy soundfile librosa
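  • Verbose Output:
    • progress_bar is a TTS constructor argument that controls the download progress bar; in the 0.13 line tts_to_file does not appear to accept it, so pass it when loading the model:
      tts = TTS(
          model_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/model_file.pth",
          config_path="/Users/Desktop/cq/tts_models--bn--custom--vits_female/config.json",
          progress_bar=True,
          gpu=False
      )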
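
When chasing version-related errors, it helps to see exactly which versions are installed in the active virtual environment. The sketch below uses the standard importlib.metadata module (Python 3.8+); the package list mirrors the pip commands in this README, and the function name installed_versions is hypothetical.

```python
from importlib import metadata

# Distribution names as used in the pip commands in this README.
PACKAGES = ("TTS", "torch", "numpy", "soundfile", "librosa")


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the virtual environment activated, so the report reflects the environment the TTS script actually uses.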

Notes

  • The model is a single-language VITS model for Bangla, trained with Coqui TTS version ~0.13.3 (based on the directory name v0.13.3_models).
  • The is_coqui_studio patch is a temporary workaround. Updating to a newer TTS version may eliminate the need for this modification.
  • The UserWarning about torch.nn.utils.weight_norm is benign and can be ignored.
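
An alternative to editing api.py inside site-packages is to override the property at runtime, so the change lives in your own script and survives reinstalling the library. This is a sketch only: the helper patch_is_coqui_studio is hypothetical, it reproduces the same check as the manual patch, and it has not been verified against every TTS release.

```python
def patch_is_coqui_studio(tts_class):
    """Replace is_coqui_studio on tts_class with a None-tolerant version."""

    def is_coqui_studio(self):
        # Same check as the manual patch: treat a missing model_name as "".
        model_name = self.model_name if self.model_name is not None else ""
        return "coqui_studio" in model_name

    tts_class.is_coqui_studio = property(is_coqui_studio)
    return tts_class


# Hypothetical usage, assuming Coqui TTS is installed:
# from TTS.api import TTS
# patch_is_coqui_studio(TTS)
# tts = TTS(model_path="...", config_path="...", gpu=False)
```

Because the property is replaced on the class, the patch should apply regardless of whether it runs before or after the TTS object is created, as long as it runs before tts_to_file is called.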

Model Source

  • The custom Bangla VITS models are available in two variants: male and female voices.
  • Model Structure:
    "bn": {
        "custom": {
            "vits-male",
            "vits-female"
        }
    }
  • Download Links:
  • The female model was manually downloaded because automatic downloading failed after installing the latest version with pip install TTS.
  • Unzip the downloaded ZIP file to /Users/Desktop/cq/tts_models--bn--custom--vits_female/ to obtain model_file.pth and config.json.
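
The unzip step can be scripted with Python's standard zipfile module. The archive's internal layout is an assumption here, so this sketch searches the extracted tree for the two expected files rather than relying on fixed paths; the function name extract_model is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, dest_dir):
    """Extract the model archive and locate the files the TTS script needs.

    Returns a dict mapping each expected file name to its extracted path,
    or to None if the archive did not contain it.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)

    found = {}
    for name in ("model_file.pth", "config.json"):
        # Search recursively in case the archive has a top-level folder.
        matches = sorted(dest_dir.rglob(name))
        found[name] = matches[0] if matches else None
    return found
```

The returned paths can be passed directly as model_path and config_path when constructing TTS. Only extract archives from a source you trust.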
