For your information: Run VITS models from Coqui with sherpa-onnx (supporting Android, Raspberry Pi, etc) #3194
Replies: 7 comments 14 replies
-
Does this integrate with the Android text-to-speech output API? I mean, can we switch the preferred engine to Coqui?
-
For me, the application skips words when reading: I tested the three French versions for arm64-v8a and they all skipped words.
-
@csukuangfj thanks!
-
@csukuangfj Awesome, I tried the speech recognition model example on iOS. There are also already some methods for getting a VITS TTS model working; are there examples expected soon showing how to use the TTS on iOS?
-
Hi, the Coqui language models really sound great! Just one thing, and I don't know how to fix it; I guess it's not trivial: most of the Coqui models can't speak numbers when you submit e.g. "1 2 3" to the engine. If you submit the numbers converted to words, e.g. "un deux trois", that works, obviously. However, if you have an application, say navigation software, you have no control over what that program sends to the API, and such programs usually don't convert numbers to text. I have only observed this issue with most (perhaps all) of the Coqui models, except for English, which seems to be the only one that handles numbers correctly. Also, all Piper models seem to be fine. You can test it using the frontend via the link that you provided above: https://huggingface.co/spaces/k2-fsa/text-to-speech. Here is a list of Coqui models facing this issue: It would be great if this could be fixed, as the Coqui models are some excellent voices.
-
Regarding the issue with speaking numbers: I guess there is a preprocessor script that translates numbers to words, which is implemented only for the English language. If that is the case, could you point me to that script in the English package, please? I'm a developer, though without knowledge of your AI toolchain, so I might be able to provide such a script for some of the other languages as a contribution to your project. A rough sketch of what I have in mind is below.
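To make it concrete, here is a minimal sketch of the kind of preprocessing I mean, using the third-party num2words package (which supports French and many other languages). The function name is my own and not taken from your codebase:

```python
# Hypothetical normalization pass: spell out digit sequences before the text
# is handed to the TTS engine, using the third-party num2words package.
import re
from num2words import num2words

def normalize_numbers(text: str, lang: str = "fr") -> str:
    """Replace integer tokens such as '300' with words, e.g. 'trois cents'."""
    return re.sub(r"\d+", lambda m: num2words(int(m.group(0)), lang=lang), text)

print(normalize_numbers("1 2 3"))
# -> un deux trois
print(normalize_numbers("Tournez à droite dans 300 mètres"))
# -> Tournez à droite dans trois cents mètres
```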
-
You are right, but rather than starting from scratch, my thinking was that this technique is already in use for vits-coqui-en (it must be, since numbers work there). I just can't find the location of the respective code in the VITS/Coqui project. Seeing how it is integrated would speed things up, because once you see how it works and where it plugs in, it's not a big deal to make adaptations. So, if you have a clue, please give a hint.
-
FYI: We now support exporting VITS models from Coqui to ONNX and running them with sherpa-onnx.
sherpa-onnx supports both text-to-speech and speech-to-text; it runs on Linux/macOS/Windows/Android/iOS
and provides APIs for various languages, e.g., C++/C/Python/C#/Kotlin/Swift/Java/Go, etc.
The following Colab notebook shows how to convert VITS models from Coqui to sherpa-onnx:
https://colab.research.google.com/drive/1cI9VzlimS51uAw4uCR-OBeSXRPBc4KoK?usp=sharing
You can also try the exported models by visiting the following Hugging Face space:
https://huggingface.co/spaces/k2-fsa/text-to-speech
We also have pre-built Android APKs for the VITS English models from Coqui:
https://k2-fsa.github.io/sherpa/onnx/tts/apk.html
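For reference, here is a minimal Python sketch of running an exported model with the sherpa-onnx Python API; the file names (model.onnx, lexicon.txt, tokens.txt, generated.wav) are placeholders for the files shipped with each exported model:

```python
# Minimal sketch: synthesize speech with an exported Coqui VITS model using
# the sherpa-onnx Python API and save the result with soundfile.
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="model.onnx",      # exported VITS model
            lexicon="lexicon.txt",   # word-to-phone mapping
            tokens="tokens.txt",     # token/symbol table
        ),
        num_threads=2,
    ),
)

tts = sherpa_onnx.OfflineTts(config)
audio = tts.generate("Hello from a Coqui VITS model in sherpa-onnx!",
                     sid=0, speed=1.0)
sf.write("generated.wav", audio.samples, samplerate=audio.sample_rate)
```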