speak-to-llm

Speak-to-llm is a straightforward tool that lets you chat with Large Language Models through the terminal, featuring low-latency Text-To-Speech powered by OpenAI and ElevenLabs. It supports all models available on Ollama, including the latest ones like Llama 3.3, Phi-4, and Qwen 2.5.

How It Works

The low latency is achieved through a streaming approach:

  1. Sentence-by-Sentence Processing: As the LLM generates text, the application splits the response into sentences.
  2. Parallel Processing: Each complete sentence is immediately sent to the TTS service while the LLM continues generating the rest of the response.
  3. Audio Streaming: Audio is streamed back and played as soon as it's available, rather than waiting for the entire response to be processed.

This approach reduces the perceived latency compared to traditional methods that wait for the entire LLM response before starting TTS processing or streaming.
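
A minimal sketch of this pipeline in Python may help. The sentence splitter and the synthesize_and_play stand-in are illustrative assumptions, not the project's actual code; in the real application the stand-in would issue a TTS request to OpenAI or ElevenLabs and queue the audio for playback.

import queue
import re
import threading

# Sentences end at ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def stream_sentences(token_stream):
    """Accumulate streamed tokens and yield each complete sentence."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:  # all but the last part are complete
            yield sentence
        buffer = parts[-1]           # keep the unfinished remainder
    if buffer.strip():
        yield buffer

def synthesize_and_play(sentence):
    print(f"[tts] {sentence}")       # stand-in for a real TTS request

def tts_worker(q):
    """Play sentences in arrival order while generation continues."""
    while (sentence := q.get()) is not None:
        synthesize_and_play(sentence)

q = queue.Queue()
worker = threading.Thread(target=tts_worker, args=(q,))
worker.start()
# Stand-in for a streamed LLM response:
for sentence in stream_sentences(iter(["Hello ", "there. ", "How are ", "you?"])):
    q.put(sentence)                  # TTS starts before generation finishes
q.put(None)
worker.join()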


Demo 🤖

[demo video]

Requirements

Ollama

The application uses Ollama's API to serve the local Large Language Models.

To install Ollama, visit: https://ollama.com/download
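
Once Ollama is installed, you can download a model before the first run (the model name here is only an example):

ollama pull llama3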

OpenAI

If you want to use Text-To-Speech, you need to set your OpenAI API key as an environment variable.

export OPENAI_API_KEY=your_openai_api_key

If you are not familiar with the process, you can follow this guide.
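
On Windows, the equivalent for the current PowerShell session is:

$env:OPENAI_API_KEY = "your_openai_api_key"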

Optional (For Windows)

For installing the other requirements, I recommend using a package manager like Chocolatey on Windows.

To install Chocolatey, run PowerShell as an administrator and execute this command:

Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

ffmpeg

Download and install ffmpeg from https://ffmpeg.org/download.html and add it to your PATH.

Or with Chocolatey (run the shell as administrator):

choco install ffmpeg

MPV (needed for ElevenLabs TTS)

If you want to use ElevenLabs, you need to install mpv: https://mpv.io/installation/

Or with Chocolatey (run the shell as administrator):

choco install mpv

Installation

Clone the repository and install the requirements. You can use uv or install directly from the requirements.txt file.

git clone https://github.com/makefinks/speak-to-llm.git
cd speak-to-llm
uv sync
# or pip install -r requirements.txt

If you have an NVIDIA GPU, it is highly recommended to install the CUDA version of torch. Without it, Whisper will run on your CPU and be significantly slower.

If you used uv:


uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126

If you used pip directly:


pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu126
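
You can check that torch actually detects your GPU before running the application:

python -c "import torch; print(torch.cuda.is_available())"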

Usage

python speak_llm.py

Arguments

--normal_stream

Disables the sentence-wise streaming that is used by default. This introduces higher latency for longer outputs and larger models, where generating the whole response can take a while, but comes with improved stability.

--whisper <model_name>

Which Whisper model to use for transcription (All Models). Use large-v3 for the best quality if you have ~10 GB of VRAM.

--llm_model <model_name>

Which LLM to run with Ollama. Here is a list of available models on Ollama. Defaults to Llama3-8b.

--tts <tts_provider>

Which Text-To-Speech provider to use. Options are "openai" or "elevenlabs".

--lang

Which language the conversation will be in. Options are "en" and "multi". ElevenLabs supports a turbo model for English that is used when "en" is selected as the conversation language. The "multi" option covers every other language but comes with higher latency.

--voice_id <elevenlabs_voice_id>

Which voice to use for ElevenLabs TTS.

--silent

If the silent flag is set, no TTS API requests are made and the output is text only.
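
For example, combining several of the flags above (the model names are only examples):

python speak_llm.py --whisper large-v3 --llm_model llama3 --tts openai --lang en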
