From 3ef95608a93ce06a58be7c2a41a52cdbbb55fc78 Mon Sep 17 00:00:00 2001
From: Alain Rafiki <6798298+alainrafiki@users.noreply.github.com>
Date: Mon, 14 Oct 2024 11:30:18 -0500
Subject: [PATCH] Update README.md

Removed an unnecessary article.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index fb26200..fefb12a 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 A complete voice chat app powered by a speech-to-speech language model and bidirectional streaming.

-On the backend is Kyutai Lab's [Moshi](https://github.com/kyutai-labs/moshi) model, which will continuously listen, plan, and respond to a the user. It uses the [Mimi](https://huggingface.co/kyutai/mimi) streaming encoder/decoder model to maintain an unbroken stream of audio in and out, and a [speech-text foundation model](https://huggingface.co/kyutai/moshiko-pytorch-bf16) to determine when and how to respond.
+On the backend is Kyutai Lab's [Moshi](https://github.com/kyutai-labs/moshi) model, which will continuously listen, plan, and respond to the user. It uses the [Mimi](https://huggingface.co/kyutai/mimi) streaming encoder/decoder model to maintain an unbroken stream of audio in and out, and a [speech-text foundation model](https://huggingface.co/kyutai/moshiko-pytorch-bf16) to determine when and how to respond.

 Thanks to bidirectional websocket streaming and use of the [Opus audio codec](https://opus-codec.org/) for compressing audio across the network, response times on good internet can be nearly instantaneous, closely matching the cadence of human speech.