Synthesizing Audio with Unseen Speakers Using Pre-trained VITS Model #3741
adil-ahmed asked this question in General Q&A (unanswered)
I've been using a pre-trained VITS model (VCTK dataset) for text-to-speech synthesis. I've successfully obtained a list of available speakers using the command:
!tts --model_name tts_models/en/vctk/vits --list_speaker_idxs
Additionally, I've synthesized audio from one of the speakers (p234) using the following code:
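The snippet looked roughly like the sketch below, using the Coqui TTS Python API (the text and output path here are placeholders, not the exact values I used):

```python
from TTS.api import TTS

# Load the same pre-trained multi-speaker VITS model as in the CLI command above.
tts = TTS(model_name="tts_models/en/vctk/vits")

# "p234" is one of the speaker IDs returned by --list_speaker_idxs.
tts.tts_to_file(
    text="Hello, this is a test sentence.",  # placeholder text
    speaker="p234",
    file_path="p234_output.wav",  # placeholder output path
)
```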
Now I'm facing a challenge: I need to synthesize audio from the same pre-trained model, but in the voice of a speaker who was not present in the training dataset. I understand that I need to provide a reference audio clip for this purpose (zero-shot voice cloning).
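For context, the call pattern I'm imagining is sketched below. It is borrowed from voice-cloning models such as YourTTS, which accept a `speaker_wav` argument; I'm not sure whether the plain VCTK VITS checkpoint supports this at all, so the model name and file paths here are assumptions:

```python
from TTS.api import TTS

# Sketch only: zero-shot cloning via a reference clip, as exposed by
# voice-cloning models such as YourTTS. Not verified for the plain
# VCTK VITS checkpoint, which may lack a speaker encoder.
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

tts.tts_to_file(
    text="Hello from an unseen speaker.",
    speaker_wav="reference_speaker.wav",  # hypothetical path to the reference audio
    language="en",
    file_path="cloned_output.wav",  # hypothetical output path
)
```

Is this the right direction, or is there a way to do it with the VCTK VITS model directly?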
Can someone guide me on how to achieve this? Any suggestions or code examples would be highly appreciated.
Thank you!