Hi everyone,

I'm trying to implement a chatbot assistant with a voice (TTS) function using Next.js and the Vercel AI SDK. As far as I know, `AssistantResponse` only returns as a stream to the `useAssistant` hook, so my current workflow is:
1. The chat message is sent to the chat API route `/api/chat`.
2. The backend returns an `AssistantResponse`.
3. The `useAssistant` hook checks that the response is finished.
4. The last message is sent to the voice API route `/api/voice`.
5. The voice route sends the request to the OpenAI `tts-1` model.
6. The TTS response is streamed back to the UI.
The problem with this approach is that it creates a noticeable delay between the text and voice responses (i.e., by the time the audio arrives, the user may already be typing their next message).

So is there any workaround for this use case? For example, combining the two routes into one (e.g., waiting for the `AssistantResponse` on the backend and sending its text to the TTS model immediately).
Thank you all in advance!