[Bug] Docker Image configuration error when running TTS server. #3454

Closed
EvarDion opened this issue Dec 20, 2023 · 12 comments · Fixed by idiap/coqui-ai-TTS#252
Labels
bug Something isn't working wontfix This will not be worked on but feel free to help.

Comments

@EvarDion

EvarDion commented Dec 20, 2023

Describe the bug

VITS is working fine but a number of other multilingual models are failing to run because of a configuration issue.

A partial list of the models that don't work:

tts_models/multilingual/multi-dataset/xtts_v2
tts_models/multilingual/multi-dataset/bark
tts_models/en/multi-dataset/tortoise-v2

To Reproduce

Download and run the Docker image on Windows 10, following the tutorial instructions here:

The setting I used was GPU = true.

Expected behavior

Models should run.

Logs

StackTrace:

root@709cd4fb2c7c:~# python3 TTS/server/server.py --use_cuda true --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Environment

"CUDA": {
        "GPU": [
            "NVIDIA GeForce RTX 3060"
        ],
        "available": true,
        "version": "11.8"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "2.1.1+cu118",
        "TTS": "0.22.0",
        "numpy": "1.22.0"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            ""
        ],
        "processor": "x86_64",
        "python": "3.10.12",
        "version": "#1 SMP Thu Oct 5 21:02:42 UTC 2023"
    }

Additional context

I did a git clone of the latest repo into the Docker container and reinstalled all of the dependencies, and the error still occurs, so I'm guessing it's still an unresolved issue.

No response

@EvarDion EvarDion added the bug Something isn't working label Dec 20, 2023
@EvarDion
Author

EvarDion commented Dec 21, 2023

I managed to get the config to load properly by adding the following code at line 105 of TTS/server/server.py

# Check the model path for a config file if none is supplied.
if config_path is None:
    print("looking for config in: ", model_path)
    model_config_path = os.path.join(model_path, "config.json")
    print("model_config_path:", model_config_path)
    if os.path.exists(model_config_path):
        config_path = model_config_path
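
For anyone who would rather test this fallback outside of server.py, here is a minimal standalone sketch of the same idea. The function name `resolve_config_path` is hypothetical and not part of the TTS codebase, and the handling of a checkpoint file path (as opposed to a directory) is an added assumption.

```python
import os
from typing import Optional


def resolve_config_path(model_path: Optional[str], config_path: Optional[str]) -> Optional[str]:
    """Return config_path if given, else <model dir>/config.json when that file exists.

    Hypothetical helper, not part of TTS; it mirrors the workaround above.
    """
    if config_path is not None:
        return config_path
    if model_path is None:
        return None
    # Assumption: model_path is either the model directory or a checkpoint file inside it.
    model_dir = model_path if os.path.isdir(model_path) else os.path.dirname(model_path)
    candidate = os.path.join(model_dir, "config.json")
    return candidate if os.path.exists(candidate) else None
```

Returning None when nothing is found leaves the decision to the caller, which can then print a clear error instead of hitting the TypeError from os.path.splitext.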

UPDATE: The Web UI does not load the speaker IDs for xtts_v2, bark and tortoise-v2 so I guess this is a feature that is still a work in progress.

TEMPORARY FIX:
(How to call xtts_v2 with an HTTP GET request)

Example GET request:

http://[::1]:5002/api/tts?text=Hello%20how%20are%20you%20today.%20I%20am%20a%20robot.%20How%20may%20I%20help%20you%3F&speaker_id=Daisy%20Studious&style_wav=&language_id=en

Direct Link
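
If you want to build these URLs from a script instead of escaping them by hand, something like the following should work. It is only a sketch: `build_tts_url` is a made-up helper name, and the host, port, and parameter names are simply copied from the example request above.

```python
from urllib.parse import quote, urlencode


def build_tts_url(text, speaker_id, language_id,
                  base_url="http://[::1]:5002", style_wav=""):
    """Build a GET URL for the server's /api/tts endpoint (hypothetical helper)."""
    params = {
        "text": text,
        "speaker_id": speaker_id,
        "style_wav": style_wav,
        "language_id": language_id,
    }
    # quote_via=quote encodes spaces as %20 rather than '+', matching the example URL.
    return f"{base_url}/api/tts?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_tts_url(
        "Hello how are you today. I am a robot. How may I help you?",
        speaker_id="Daisy Studious",
        language_id="en",
    ))
```

Note that the speaker name is passed as a plain string, without extra quote characters around it.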

Command for listing speaker IDs:

  tts --list_speaker_idxs --model_name tts_models/multilingual/multi-dataset/xtts_v2

@relesssar

Same problem.

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

@EvarDion
Author

> Same problem.

If you want to run the server, edit server.py and manually apply the code fix from my comment above. The UI does not list the speaker_ids, though, so you will need to construct an HTTP GET request yourself if you want to hear all the voices (see the examples above).

@djdookie

Temporary fix works for me.

Any ideas if voice cloning (--speaker_wav /path/to/sample.wav) also works via tts-server?
If yes and we can successfully use this parameter in the GET-request, where should the sample.wav be stored?

@EvarDion
Author

> Any ideas if voice cloning (--speaker_wav /path/to/sample.wav) also works via tts-server? If yes and we can successfully use this parameter in the GET-request, where should the sample.wav be stored?

Sorry have not tried it yet.


stale bot commented Feb 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label Feb 2, 2024
@stale stale bot closed this as completed Feb 10, 2024
@strevg

strevg commented Apr 7, 2024

Same issue on MacBook M1 Pro:

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
 > tts_models/multilingual/multi-dataset/xtts_v2 is already downloaded.
Traceback (most recent call last):
  File "/root/TTS/server/server.py", line 104, in <module>
    synthesizer = Synthesizer(
  File "/root/TTS/utils/synthesizer.py", line 93, in __init__
    self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
  File "/root/TTS/utils/synthesizer.py", line 183, in _load_tts
    self.tts_config = load_config(tts_config_path)
  File "/root/TTS/config/__init__.py", line 82, in load_config
    ext = os.path.splitext(config_path)[1]
  File "/usr/local/lib/python3.10/posixpath.py", line 118, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType

@chiefMarlin

Same issue

@MP242

MP242 commented May 28, 2024

hey,

After downloading the model with the first command below, rerun the server with explicit --model_path and --config_path, as in the second command.

python3 TTS/server/server.py --model_name tts_models/multilingual/multi-dataset/xtts_v2
python3 TTS/server/server.py \
    --model_path ~/.local/share/tts/tts_models/multilingual/multi-dataset/xtts_v2 \
    --config_path ~/.local/share/tts/tts_models/multilingual/multi-dataset/xtts_v2/config.json

@kopp

kopp commented Sep 2, 2024

In the latest docker (ghcr.io/coqui-ai/tts-cpu from 2024-09-01) the paths changed, i.e. now run

tts-server \
  --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 \
  --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json
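
Since the download directory layout differs between image versions (nested `tts_models/multilingual/...` folders in the earlier comment, a single `tts_models--multilingual--...` folder here), a small lookup that tries both can save some guessing. This is just a sketch: `find_model_dir` is a hypothetical helper, and the default root is taken from the paths quoted in this thread.

```python
from pathlib import Path
from typing import Optional


def find_model_dir(model_name: str, root: Optional[Path] = None) -> Optional[Path]:
    """Return the first known layout under root that contains config.json (hypothetical helper)."""
    # Assumption: models live under ~/.local/share/tts, as in the commands in this thread.
    root = root if root is not None else Path.home() / ".local" / "share" / "tts"
    candidates = [
        root / model_name.replace("/", "--"),    # newer flat layout
        root.joinpath(*model_name.split("/")),   # older nested layout
    ]
    for candidate in candidates:
        if (candidate / "config.json").is_file():
            return candidate
    return None
```

The returned directory can be passed as --model_path, with `config.json` inside it as --config_path.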

@stevenlafl

stevenlafl commented Sep 19, 2024

Yeah, needs speaker "json" now

# docker
docker run --name coqui --rm -it -p 5002:5002 --gpus all -v ./tts:/root/.local/share --entrypoint /bin/bash ghcr.io/coqui-ai/tts

tts-server \
  --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 \
  --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json \
  --speakers_file_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/speakers_xtts.pth \
  --use_cuda true

or docker compose:

services:
  coqui:
    container_name: coqui
    image: ghcr.io/coqui-ai/tts
    build:
      context: ./TTS
    ports:
      - 5002:5002
    environment:
      - COQUI_TOS_AGREED=1
    #entrypoint: ["python3", "TTS/server/server.py", "--model_name", "tts_models/multilingual/multi-dataset/xtts_v2", "--use_cuda", "true"]
    entrypoint:
      - "/bin/bash"
      - "-c"
      - "tts-server --model_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2 --config_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json --speakers_file_path ~/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/speakers_xtts.pth --use_cuda true"
    volumes:
      - ./tts:/root/.local/share
    deploy:
      resources:
        reservations:
          devices:
            - count: all # exposes all GPUs; alternatively, use an integer such as `count: 1`
              capabilities: [gpu]

Then you can call the API:

http://[::1]:5002/api/tts?text=Hello%20how%20are%20you%20today.%20I%20am%20a%20robot.%20How%20may%20I%20help%20you%3F&speaker_id=Daisy%20Studious&style_wav=&language_id=en

This URL uses "Daisy Studious", but the full speaker list is:

[
  'Claribel Dervla',   'Daisy Studious',     'Gracie Wise',
  'Tammie Ema',        'Alison Dietlinde',   'Ana Florence',
  'Annmarie Nele',     'Asya Anara',         'Brenda Stern',
  'Gitta Nikolina',    'Henriette Usha',     'Sofia Hellen',
  'Tammy Grit',        'Tanja Adelina',      'Vjollca Johnnie',
  'Andrew Chipper',    'Badr Odhiambo',      'Dionisio Schuyler',
  'Royston Min',       'Viktor Eka',         'Abrahan Mack',
  'Adde Michal',       'Baldur Sanjin',      'Craig Gutsy',
  'Damien Black',      'Gilberto Mathias',   'Ilkin Urbano',
  'Kazuhiko Atallah',  'Ludvig Milivoj',     'Suad Qasim',
  'Torcull Diarmuid',  'Viktor Menelaos',    'Zacharie Aimilios',
  'Nova Hogarth',      'Maja Ruoho',         'Uta Obando',
  'Lidiya Szekeres',   'Chandra MacFarland', 'Szofi Granger',
  'Camilla Holmström', 'Lilya Stainthorpe',  'Zofija Kendrick',
  'Narelle Moon',      'Barbora MacLean',    'Alexandra Hisakawa', 
  'Alma María',        'Rosemary Okafor',    'Ige Behringer', 
  'Filip Traverse',    'Damjan Chapman',     'Wulf Carlevaro', 
  'Aaron Dreschner',   'Kumar Dahl',         'Eugenio Mataracı',
  'Ferran Simen',      'Xavier Hayasaka',    'Luis Moray',
  'Marcos Rudaski'
]

Thanks @EvarDion @kopp for parts of this

Note: if you get AttributeError: 'NoneType' object has no attribute 'name_to_id', it's because the server doesn't like quotes around the speaker_id. Pass the name unquoted, as in the URL above.
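
If the speaker name comes from user input or a copied list entry, stripping any wrapping quotes before building the request may avoid that error. A rough sketch, assuming the problem is literal quote characters around the value; `clean_speaker_id` is a made-up helper, not something the server provides.

```python
def clean_speaker_id(raw: str) -> str:
    """Strip whitespace and one pair of matching surrounding quotes (hypothetical helper)."""
    value = raw.strip()
    # Only remove quotes when the same quote character opens and closes the value.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1].strip()
    return value
```

For example, an entry copied from the list as `'Daisy Studious'` becomes `Daisy Studious`, which can then be URL-encoded as usual.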

@eginhard
Contributor

Our fork now supports all Coqui TTS models in the server. Specifying a speaker_wav file is not possible yet, this is tracked in idiap#254.

9 participants