For your information: Run VITS models from Coqui with sherpa-onnx (supporting Android, Raspberry Pi, etc) #3194
Replies: 7 comments 14 replies
-
Does this integrate with the Android text-to-speech output API? I mean, can we switch the preferred engine to Coqui?
-
For me, the application skips words when reading: I tested the three French versions for arm64-v8a and they all skipped words.
-
@csukuangfj thanks!
-
@csukuangfj Awesome, I tried the speech recognition model example on iOS. There are also already some methods for getting a VITS TTS model working; are there examples expected soon showing how to use the TTS on iOS?
-
Hi, the Coqui language models really sound great! Just one thing, and I don't know how to fix it; I guess it's not trivial: most of the Coqui models can't speak numbers when you submit e.g. "1 2 3" to the engine. If you submit the numbers converted to words, e.g. "un deux trois", that works, obviously. However, if you have an application, say navigation software, you have no control over what that program sends to the API, and such programs usually don't convert numbers to text. I have only observed this issue with most (perhaps all) of the Coqui models, except for English, which seems to be the only one that handles numbers correctly. Also, all Piper models seem to be fine. You can test it using the frontend via the link that you provided above: https://huggingface.co/spaces/k2-fsa/text-to-speech. Here is a list of Coqui models facing this issue: It would be great if this could be fixed, as the Coqui models are some excellent voices.
-
Regarding the issue with speaking numbers: I guess there is a preprocessor script that translates numbers to words, which is implemented only for the English language. If that is the case, could you point me to that script in the English package, please? I'm a developer, though without knowledge of your AI toolchain, so I might be able to provide such a script for some of the other languages as a contribution to your project. A rough sketch of what I have in mind is below.
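To make it concrete, here is a minimal sketch of the kind of preprocessing I mean, using the third-party num2words package (which supports French and many other languages). The function name is my own and not taken from your codebase:

```python
# Hypothetical normalization pass: spell out digit sequences before the text
# is handed to the TTS engine, using the third-party num2words package.
import re
from num2words import num2words

def normalize_numbers(text: str, lang: str = "fr") -> str:
    """Replace integer tokens such as '300' with words, e.g. 'trois cents'."""
    return re.sub(r"\d+", lambda m: num2words(int(m.group(0)), lang=lang), text)

print(normalize_numbers("1 2 3"))
# -> un deux trois
print(normalize_numbers("Tournez à droite dans 300 mètres"))
# -> Tournez à droite dans trois cents mètres
```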
-
You are right, but rather than starting from scratch, my thinking was that this technique is already in use for vits-coqui-en (it must be, since numbers work there). I just can't find the location of the respective code in the VITS/Coqui project. Seeing how it is integrated would speed things up, because once you see how it works and where it plugs in, it's not a big deal to make adaptations. So, if you have a clue, please give a hint.
-
FYI: We now support exporting VITS models from Coqui to ONNX and running them with sherpa-onnx.
sherpa-onnx supports both text-to-speech and speech-to-text; it runs on Linux/macOS/Windows/Android/iOS
and provides APIs for various languages, e.g., C++/C/Python/C#/Kotlin/Swift/Java/Go, etc.
The following Colab notebook shows how to convert VITS models from Coqui to sherpa-onnx:
https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing
You can also try the exported models by visiting the following Hugging Face space:
https://huggingface.co/spaces/k2-fsa/text-to-speech
We also have pre-built Android APKs for the VITS English models from Coqui:
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html
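For reference, here is a minimal Python sketch of running an exported model with the sherpa-onnx Python API; the file names (model.onnx, lexicon.txt, tokens.txt, generated.wav) are placeholders for the files shipped with each exported model:

```python
# Minimal sketch: synthesize speech with an exported Coqui VITS model using
# the sherpa-onnx Python API and save the result with soundfile.
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="model.onnx",      # exported VITS model
            lexicon="lexicon.txt",   # word-to-phone mapping
            tokens="tokens.txt",     # token/symbol table
        ),
        num_threads=2,
    ),
)

tts = sherpa_onnx.OfflineTts(config)
audio = tts.generate("Hello from a Coqui VITS model in sherpa-onnx!",
                     sid=0, speed=1.0)
sf.write("generated.wav", audio.samples, samplerate=audio.sample_rate)
```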