speak-to-llm

Speak-to-llm is a straightforward tool that lets you chat with Large Language Models through the terminal, featuring low-latency Text-To-Speech powered by OpenAI and ElevenLabs. It supports all models available on Ollama, including the latest ones like Llama 3.3, Phi-4, and Qwen 2.5.

How It Works

The low latency is achieved through a streaming approach:

  1. Sentence-by-Sentence Processing: As the LLM generates text, the application splits the response into sentences.
  2. Parallel Processing: Each complete sentence is immediately sent to the TTS service while the LLM continues generating the rest of the response.
  3. Audio Streaming: Audio is streamed back and played as soon as it's available, rather than waiting for the entire response to be processed.

This approach reduces the perceived latency compared to traditional methods that wait for the entire LLM response before starting TTS processing or streaming.
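
A minimal sketch of this pipeline in Python may help. The sentence splitter and the synthesize_and_play stand-in are illustrative assumptions, not the project's actual code; in the real application the stand-in would issue a TTS request to OpenAI or ElevenLabs and queue the audio for playback.

import queue
import re
import threading

# Sentences end at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def stream_sentences(token_stream):
    """Accumulate streamed tokens and yield each complete sentence."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:  # all but the last part are complete
            yield sentence
        buffer = parts[-1]           # keep the unfinished remainder
    if buffer.strip():
        yield buffer

def synthesize_and_play(sentence):
    print(f"[tts] {sentence}")       # stand-in for a real TTS request

def tts_worker(q):
    """Play sentences in arrival order while generation continues."""
    while (sentence := q.get()) is not None:
        synthesize_and_play(sentence)

q = queue.Queue()
worker = threading.Thread(target=tts_worker, args=(q,))
worker.start()
# Stand-in for a streamed LLM response:
for sentence in stream_sentences(iter(["Hello ", "there. ", "How are ", "you?"])):
    q.put(sentence)                  # TTS starts before generation finishes
q.put(None)
worker.join()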


Demo 🤖

[demo video]

Requirements

Ollama

The application uses Ollama's API to serve the local Large Language Models.

To install Ollama, visit: https://ollama.com/download
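
Once Ollama is installed, you can download a model before the first run (the model name here is only an example):

ollama pull llama3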

OpenAI

If you want to use Text-To-Speech, you need to set your OpenAI API key as an environment variable.

export OPENAI_API_KEY=your_openai_api_key

If you are not familiar with the process, you can follow this guide.
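
On Windows, the equivalent for the current PowerShell session is:

$env:OPENAI_API_KEY = "your_openai_api_key"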

Optional (For Windows)

For installing the other requirements, I recommend using a package manager like Chocolatey on Windows.

To install Chocolatey, run PowerShell as an administrator and execute this command:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

ffmpeg

Download and install ffmpeg from https://ffmpeg.org/download.html and add it to your PATH.

Or with Chocolatey (run the shell as administrator):

choco install ffmpeg

MPV (needed for ElevenLabs TTS)

If you want to use ElevenLabs, you need to install mpv: https://mpv.io/installation/

Or with Chocolatey (run the shell as administrator):

choco install mpv

Installation

Clone the repository and install the requirements. You can use uv or install directly from the requirements.txt file.

git clone https://github.com/makefinks/speak-to-llm.git
cd speak-to-llm
uv sync
# or pip install -r requirements.txt

If you have an NVIDIA GPU, it is highly recommended to install the CUDA version of torch. Without it, Whisper will run on your CPU and be significantly slower.

If you used uv:


uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126

If you used pip directly:


pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126
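
You can check that torch actually detects your GPU before running the application:

python -c "import torch; print(torch.cuda.is_available())"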

Usage

python speak_llm.py

Arguments

--normal_stream

Disables the sentence-wise streaming that is used by default. This introduces higher latency for longer outputs and larger models, where generating the whole response can take a while, but comes with improved stability.

--whisper <model_name>

Which Whisper model to use for transcription (All Models). Use large-v3 for the best quality if you have ~10 GB of VRAM.

--llm_model <model_name>

Which LLM to run with Ollama. Here is a list of available models on Ollama. Defaults to Llama3-8b.

--tts <tts_provider>

Which Text-To-Speech provider to use. Options are "openai" or "elevenlabs".

--lang

Which language the conversation will be in. Options are "en" and "multi". ElevenLabs supports a turbo model for English that is used when "en" is selected as the conversation language. The "multi" option covers every other language but comes with higher latency.

--voice_id <elevenlabs_voice_id>

Which voice to use for ElevenLabs TTS.

--silent

If the silent flag is set, no TTS API requests are made and the output is text only.
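
For example, combining several of the flags above (the model names are only examples):

python speak_llm.py --whisper large-v3 --llm_model llama3 --tts openai --lang en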
