Add support for TextToSpeech WebSockets #46

Open
StephenHodgson opened this issue May 13, 2024 · 15 comments

@StephenHodgson
Member

Support WebSockets for text-to-speech.

ElevenLabs-DotNet-Proxy should also support forwarding WebSocket connections.

@StephenHodgson StephenHodgson added the enhancement New feature or request label May 13, 2024
@StephenHodgson StephenHodgson self-assigned this May 13, 2024
@ocinon

ocinon commented Jul 22, 2024

@StephenHodgson did you start implementing WebSockets by any chance?
Also, I saw the speech-to-speech model in your 3.0.0 draft, but there is no support yet, correct?

@StephenHodgson
Member Author

Yes, I was already doing this for the Unity package and was considering porting it once it's done.

@ocinon

ocinon commented Jul 25, 2024

@StephenHodgson I couldn't find any previous WebSocket implementation in your Unity repo. As I needed it, I implemented it for the DotNet version here:
ocinon/ElevenLabs-DotNet@93457e1

It extends the client slightly and tries to pick up the same patterns the repo used before. It lacks proxy support and tests. If you have any notes, let me know.

@StephenHodgson
Member Author

StephenHodgson commented Jul 25, 2024

Feel free to open a pull request!

My only feedback is to rebase onto the development branch.

@odillner

Any updates on this? It would be very useful in a project I'm part of.

@ocinon

ocinon commented Oct 22, 2024

Sorry for never updating the thread. After some back-and-forth with ElevenLabs support, it turned out that their WebSocket implementation has a 20-second timeout. This is fine for batch conversions but makes it pretty useless for low-volume use or for prototyping voice-to-voice bots and similar use cases.

It might be possible to keep sending a single-space string (" ") as a keep-alive signal, but I stopped spending time on it because, in my informal testing, I didn't see any speed increase over the REST API. The code exists, and I could push it for reference.

@odillner

Thanks for the quick response!

Well that's disappointing, but thanks for doing the legwork.

I'm gonna do some testing on my own, so please push the code.

@ocinon

ocinon commented Oct 22, 2024

It's here: ocinon/ElevenLabs-DotNet

I updated it to the latest ElevenLabs version. Keep-alive messages don't seem to work. However, ElevenLabs support just told me that they added an "inactivity timeout" setting that raises the timeout to up to 180 seconds. I added it to the code. Happy testing!

Some basic testing code:

using ElevenLabsClient client = new(ELEVEN_LABS_KEY);
await using FileStream fileStream
    = new("output.mp3", FileMode.Create, FileAccess.Write, FileShare.Read);

// Open the socket. The callback runs for every audio chunk that arrives.
await client.TextToSpeechWebSocketEndpoint.StartTextToSpeechAsync(
    Voice.Arnold,
    async voiceClip =>
    {
        if (voiceClip == null)
        {
            Console.WriteLine("Received null voice clip.");
            return;
        }

        Console.WriteLine($"Received voice clip with {voiceClip.ClipData.Length} bytes.");
        await fileStream.WriteAsync(voiceClip.ClipData);
    },
    null, null, Model.TurboV2_5, OutputFormat.MP3_44100_128, null, null, null,
    180); // 180: the inactivity timeout in seconds (the maximum mentioned above)

while (true)
{
    Console.Write("Enter text to convert to speech: ");
    string? text = Console.ReadLine();

    // A null line means stdin was closed, so stop instead of looping forever.
    if (text is null or "exit") { break; }

    // "flush" and "trigger" are console commands; they send "." as placeholder text.
    bool?  flush   = text == "flush" ? true : null;
    bool   trigger = text == "trigger";
    string prompt  = text is "flush" or "trigger" ? "." : text;
    await client.TextToSpeechWebSocketEndpoint.SendTextToSpeechAsync(prompt, flush, trigger);
}

await client.TextToSpeechWebSocketEndpoint.EndTextToSpeechAsync();
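
For readers who want to see where the inactivity timeout discussed above might end up, here is a minimal sketch of building the socket URL by hand. It assumes the timeout is sent as an `inactivity_timeout` query parameter on the `stream-input` endpoint and that 180 seconds is the upper limit, as support reportedly said. The endpoint path, the parameter names, and the `TtsSocketUri` helper are assumptions for illustration and are not taken from the fork's code.

```csharp
using System;

// Hypothetical helper, not part of ElevenLabs-DotNet.
public static class TtsSocketUri
{
    // Assumed upper limit, based on the "up to 180 seconds" quoted in this thread.
    private const int MaxInactivityTimeoutSeconds = 180;

    public static Uri Build(string voiceId, string modelId, int inactivityTimeoutSeconds)
    {
        if (string.IsNullOrWhiteSpace(voiceId))
        {
            throw new ArgumentException("A voice id is required.", nameof(voiceId));
        }

        if (string.IsNullOrWhiteSpace(modelId))
        {
            throw new ArgumentException("A model id is required.", nameof(modelId));
        }

        // Keep the requested value inside the assumed valid range.
        int timeout = Math.Clamp(inactivityTimeoutSeconds, 1, MaxInactivityTimeoutSeconds);

        // Assumed endpoint shape and query parameter names; check the official docs.
        string url =
            $"wss://api.elevenlabs.io/v1/text-to-speech/{Uri.EscapeDataString(voiceId)}/stream-input" +
            $"?model_id={Uri.EscapeDataString(modelId)}&inactivity_timeout={timeout}";

        return new Uri(url);
    }
}
```

Calling `TtsSocketUri.Build(voiceId, "eleven_turbo_v2_5", 500)` would clamp the timeout to 180 rather than send a value the server might reject. The model id here is only an example value.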

@StephenHodgson
Member Author

@ocinon feel free to open a PR on the main project for everyone else to get :)

@StephenHodgson
Member Author

I've also been playing with WebSocket support for my OpenAI-DotNet project and will likely port some of that over as well, especially around the WebSocket client. It's just a bit of an abstraction layer to help keep the socket alive, listening, etc.
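
To make the "listening" part of such an abstraction layer concrete, here is a rough sketch of a receive loop built only on `System.Net.WebSockets`. It is not the OpenAI-DotNet implementation. `WebSocketListener` and `ListenAsync` are hypothetical names, and keep-alive handling is deliberately left out.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: reads whole messages from any WebSocket until it closes.
public static class WebSocketListener
{
    public static async Task ListenAsync(
        WebSocket socket,
        Func<WebSocketMessageType, byte[], Task> onMessage,
        CancellationToken cancellationToken)
    {
        var buffer = new byte[8192];
        using var message = new MemoryStream();

        while (socket.State == WebSocketState.Open && !cancellationToken.IsCancellationRequested)
        {
            WebSocketReceiveResult result =
                await socket.ReceiveAsync(new ArraySegment<byte>(buffer), cancellationToken);

            if (result.MessageType == WebSocketMessageType.Close)
            {
                // The server started the close handshake; acknowledge it and stop.
                await socket.CloseOutputAsync(
                    WebSocketCloseStatus.NormalClosure, "closing", CancellationToken.None);
                break;
            }

            // A message can span several frames, so collect frames until EndOfMessage.
            message.Write(buffer, 0, result.Count);

            if (result.EndOfMessage)
            {
                await onMessage(result.MessageType, message.ToArray());
                message.SetLength(0); // reset for the next message
            }
        }
    }
}
```

With a connected `ClientWebSocket`, this would typically run as a background task next to the code that sends text, with the callback parsing each JSON message and handing decoded audio to the caller.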

@ocinon

ocinon commented Oct 22, 2024

@StephenHodgson should we push it into the development branch for now? Could you open that one for me?

@StephenHodgson
Member Author

Sure I'll push a development branch right now for you to target :)

@StephenHodgson StephenHodgson linked a pull request Oct 22, 2024 that will close this issue
@StephenHodgson
Member Author

You may want to rebase your changes, though, and make sure you've synced with upstream.

@ocinon

ocinon commented Oct 22, 2024

It's up to date but not rebased. One sec.

@ocinon

ocinon commented Oct 22, 2024

Done


3 participants