- 2025/09/16: Released models (pVAD, turn-detector) and services (FireRedTTS1, FireRedASR) for the cascade system.
FireRedChat offers a fully self-hosted solution for building real-time voice AI agents. It integrates robust TTS (Text-to-Speech), ASR (Automatic Speech Recognition), pVAD (Personalized Voice Activity Detection), and EoT (End-of-Turn) functionalities, enabling developers to create customizable, privacy-focused AI agents with:
- No reliance on external APIs
- Zero data leakage
- Complete deployment control
The system architecture includes:
- LiveKit RTC Server: Acts as the core video/audio chat meeting room for real-time communication.
- Agents (AI-Agent Bot Server): Handles the intelligent agents that process and respond to user interactions.
- Agents-Playground (WebUI): A user-friendly web interface for joining and interacting with the chat rooms online.
- Redis Server: Enables multi-node hosting for scalability and data persistence across instances.
- TTS Server: Handles text-to-speech conversion for agent responses.
- ASR Server: Handles automatic speech recognition for user inputs.
Additionally, you'll need to host your own LLM (Large Language Model) server to power the AI agents. This can be a lightweight setup such as an Ollama or vLLM server, or a more advanced stack such as Dify, depending on your needs.
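Both Ollama and vLLM expose an OpenAI-compatible `/v1/chat/completions` endpoint, which is what the agent typically calls. A minimal stdlib-only sketch of building such a request follows; the base URL, port (`11434` is Ollama's default), and model name are assumptions — adjust them to your deployment:

```python
import json
import urllib.request

# Sketch of calling an OpenAI-compatible chat endpoint, as exposed by both
# Ollama (http://localhost:11434/v1) and vLLM (http://localhost:8000/v1).
# The base URL and model name are assumptions -- adjust to your deployment.

def build_chat_request(base_url: str, model: str, user_text: str) -> urllib.request.Request:
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("http://localhost:11434/v1", "qwen2.5", "Hello!")
# with urllib.request.urlopen(req) as resp:  # requires a running LLM server
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

Any backend that speaks this API (Ollama, vLLM, or a Dify-fronted deployment) can be swapped in by changing only the base URL and model name.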
| Module | Description |
|---|---|
| 🆕 TTS Service | Accelerated FireRedTTS with text normalization and G2P (Grapheme-to-Phoneme) support. |
| 🆕 ASR Service | Automatic speech recognition with punctuation model integration. |
| 🆕 pVAD | Personalized Voice Activity Detection for improved barge-in experience. |
| 🆕 Turn-Detector | Compact end-of-turn detection for quicker response (English and Chinese). |
| 🆕 Context-Aware TTS (coming soon) | Context-aware TTS with text normalization and G2P. |
| 🆕 Audio LLM Service (coming soon) | Acoustically aware LLM with vLLM acceleration. |
| Fork of livekit/agents | Core framework for real-time voice AI agent development. |
| Fork of livekit/agents-playground | Intuitive web UI for user interaction. |
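To make the roles of pVAD and the turn detector concrete, here is an illustrative sketch (not FireRedChat's actual implementation) of how the two signals could be combined per audio frame: pVAD decides whether the *enrolled* speaker is talking, so background voices do not trigger barge-in, and the end-of-turn probability decides when the agent should reply. The class name and threshold are invented for this example:

```python
from dataclasses import dataclass

# Illustrative sketch (not the project's real code) of combining a
# personalized-VAD decision with an end-of-turn (EoT) probability.

@dataclass
class TurnGate:
    eot_threshold: float = 0.8   # assumed EoT probability threshold
    agent_speaking: bool = False

    def on_frame(self, target_speaker_active: bool, eot_prob: float) -> str:
        """Return the action to take for one audio frame."""
        if target_speaker_active and self.agent_speaking:
            self.agent_speaking = False
            return "barge_in"    # user interrupted: stop TTS playback
        if target_speaker_active:
            return "listen"      # user still holds the turn
        if eot_prob >= self.eot_threshold:
            self.agent_speaking = True
            return "respond"     # turn ended: run ASR -> LLM -> TTS
        return "wait"            # silence, but the turn is not over yet

gate = TurnGate()
assert gate.on_frame(target_speaker_active=True, eot_prob=0.1) == "listen"
assert gate.on_frame(target_speaker_active=False, eot_prob=0.95) == "respond"
assert gate.on_frame(target_speaker_active=True, eot_prob=0.0) == "barge_in"
```

The key design point is that only the enrolled speaker's activity can interrupt the agent; a non-personalized VAD would let any nearby voice cut the agent off.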
Try the demo or follow these steps to deploy your own instance. Note that you'll need to set up your own LLM server (e.g., Ollama or Dify) to integrate with the AI-Agent Bot Server for full functionality.
In this step, we'll set up the foundational services: the LiveKit RTC Server (for real-time communication), the Redis Server (for multi-node support), and the WebUI (for browser-based access).
First, clone the repository:
```bash
git clone --recurse-submodules https://github.com/FireRedTeam/FireRedChat.git
```

(Optional) If you have a domain name, follow the official doc and generate your configuration files:

```bash
mkdir output
docker pull livekit/generate
docker run --rm -it -v $PWD:/output livekit/generate
```

To quickly start the RTC Server, Redis Server, and WebUI services on the same machine, use Docker Compose:

```bash
cd docker
docker-compose up -d
```

Once running:
- The RTC Server will be hosted at `0.0.0.0:7880`. It uses two UDP ports per user for data communication (e.g., audio/video streams).
- The WebUI can be accessed at `0.0.0.0:3000` in your browser, allowing users to join chat rooms online.
Note: Secured connections are not required for local development if accessing from the same machine hosting the LiveKit RTC Server.
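Clients authenticate to the LiveKit RTC Server with an HS256-signed JWT minted from your API key and secret; the WebUI normally generates these for you, but a stdlib-only sketch shows what the token contains. The `devkey`/`secret` pair below is a placeholder and must match the key pair configured on your LiveKit server; the exact grant claims may vary by LiveKit version, so treat this as illustrative:

```python
import base64
import hashlib
import hmac
import json
import time

# Sketch of minting a LiveKit-style access token (HS256 JWT) with only the
# stdlib. The key/secret values are placeholders for your server's key pair.

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def livekit_token(api_key: str, api_secret: str, identity: str, room: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,                  # API key goes in the issuer claim
        "sub": identity,                 # participant identity
        "exp": int(time.time()) + 3600,  # 1-hour validity
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = (
        f"{b64url(json.dumps(header).encode())}.{b64url(json.dumps(claims).encode())}"
    )
    sig = hmac.new(api_secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"

token = livekit_token("devkey", "secret", "alice", "demo-room")
```

In practice, prefer the official LiveKit server SDKs for token generation; this sketch only makes the token's structure visible.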
To secure the LiveKit RTC Server connection (upgrading ws://0.0.0.0:7880 to wss://) and serve the WebUI securely via HTTPS, configure Nginx as a reverse proxy. For detailed instructions, refer to the Foundry VTT LiveKit Hosting Guide.
Notes:
- After configuring Nginx, update your LiveKit client to use `wss://your-domain.com/livekit` instead of `ws://0.0.0.0:7880`. The WebUI will be accessible at `https://your-domain.com`.
- Ensure port 443 is open on your firewall for HTTPS/WSS traffic, and that the UDP port range (e.g., `50000-60000`) is open for LiveKit media streams.
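For orientation, a minimal sketch of the two reverse-proxy blocks is shown below, assuming the domain `your-domain.com`, Let's Encrypt certificate paths, and the default ports from above. Path handling for the LiveKit signaling endpoint can differ between server versions, so consult the linked guide for a complete, tested configuration:

```nginx
# Sketch: Nginx reverse proxy for LiveKit (wss) and the WebUI (https).
# Assumes certificates are already provisioned (e.g., via certbot).
server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    # WebSocket signaling: wss://your-domain.com/livekit -> ws://127.0.0.1:7880
    # (the trailing slashes strip the /livekit/ prefix before proxying)
    location /livekit/ {
        proxy_pass http://127.0.0.1:7880/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }

    # WebUI: https://your-domain.com -> http://127.0.0.1:3000
    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
    }
}
```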
Launch the additional servers required for voice processing and AI functionality.
- FireRedASR Service (Automatic Speech Recognition):
  Refer to the ASR Server README for setup instructions.

- FireRedTTS Service (Text-to-Speech):
  Refer to the TTS Server README for setup instructions.

- LLM Service:
  Deploy a self-hosted LLM server to power the AI agents. Options include Ollama for a lightweight setup, vLLM for high-performance inference, or Dify on top for advanced workflows.

  ```bash
  # Ollama example; refer to https://ollama.com for the official guide
  curl -fsSL https://ollama.com/install.sh | sh
  ollama run qwen2.5
  ```

- Agents Service (AI-Agent Bot Server):
  When a user joins a chat room, a bot worker is automatically dispatched to handle interactions. This service drives the intelligent agents and depends on the ASR, TTS, and LLM services for full functionality. For detailed setup instructions, refer to the Agents Service README.
We extend our gratitude to the following open-source projects:
- livekit/livekit (RTC)
- livekit/agents
- livekit/agents-playground
- speechbrain/spkrec-ecapa-voxceleb
- google-bert/bert-base-multilingual-cased
The content provided is for academic purposes only and is intended to demonstrate technical capabilities.