Kokoro FastAPI Audio Streaming Support? #445
I use FastKoko directly to read back material I'm writing for a book. When processing text, the start of the material begins playing back while the rest of the input is still being processed. This is under Docker 4.59 on Windows 11, on a Legion 7 with an i9 and RTX 4090. I'd put the delay at perhaps 1-2 seconds tops. YMMV. For real-time conversation this might be a problem; otherwise, life's good.
It would seem that my implementation could be the issue. I am using OpenAI TTS.
For what it's worth, I did finally get streaming to work with Kokoro for Home Assistant! It required wyoming_openai to handle the API rather than OpenAI TTS. Streaming is not about the speed of audio generation; yes, Kokoro is fast. Instead, streaming is the ability to run text generation in parallel with TTS audio generation: the first sentence the LLM produces can be synthesized to audio while the LLM is still writing the second sentence, then the third, and so on (see the sketch below). The alternative is linear processing, where the LLM has to generate the full final text before TTS audio generation can begin. The result of enabling streaming is a shorter gap between LLM prompt processing and the first spoken word in an STT > LLM > TTS voice assistant pipeline.
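To make the parallelism concrete, here is a minimal Python sketch of sentence-boundary chunking: it buffers streamed LLM tokens, yields each sentence as soon as it is complete, and hands it to a TTS call. The `tokens` iterable and the `synthesize` callable are hypothetical stand-ins for your LLM token stream and whatever TTS client you use (wyoming_openai, a Kokoro HTTP call, etc.); this is not the actual wyoming_openai implementation.

```python
# Sketch: start TTS on each completed sentence while the LLM is still
# generating the rest of the reply. `tokens` and `synthesize` are assumptions.
import re
from typing import Callable, Iterable, Iterator

SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def sentences_from_stream(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as the token stream finishes them."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        # Everything except the last fragment is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    if buffer.strip():
        yield buffer.strip()

def speak_while_generating(tokens: Iterable[str],
                           synthesize: Callable[[str], None]) -> None:
    """Send each finished sentence to TTS without waiting for the full reply."""
    for sentence in sentences_from_stream(tokens):
        synthesize(sentence)  # audio for sentence N plays while N+1 is written
```

With this pattern the time-to-first-word depends only on how fast the LLM finishes its first sentence plus one short TTS call, rather than on the length of the whole reply.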
Piper TTS streams audio on sentence boundaries, which significantly reduces wait times for long text-to-speech responses. This is extremely helpful with LLMs: as the text is generated, TTS generation starts for each completed sentence.
Is this possible with Kokoro for Home Assistant Voice Assist?
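For reference, on the HTTP side this usually comes down to whether the server returns audio as a chunked response that a client can start playing before generation finishes. Below is a minimal sketch against an OpenAI-compatible speech endpoint of the kind Kokoro-FastAPI exposes; the URL, port, model name, voice id, and the `stream` field are assumptions, so check your server's API docs before relying on them.

```python
# Sketch: pull chunked audio from an OpenAI-compatible TTS endpoint.
# Endpoint URL, port 8880, "kokoro"/"af_bella", and "stream" are assumptions.
import requests

def stream_tts(text: str,
               url: str = "http://localhost:8880/v1/audio/speech") -> None:
    payload = {
        "model": "kokoro",          # assumed model name
        "voice": "af_bella",        # assumed voice id
        "input": text,
        "response_format": "mp3",
        "stream": True,             # assumed flag; some servers stream by default
    }
    with requests.post(url, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open("reply.mp3", "wb") as f:
            for chunk in resp.iter_content(chunk_size=4096):
                if chunk:
                    # First chunks arrive while the rest is still being
                    # generated, so playback (or piping to a player) can
                    # start immediately instead of waiting for the full file.
                    f.write(chunk)
```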