Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -614,6 +614,7 @@ set(SOURCES_CORE
src/cpp/server/backends/whisper_server.cpp
src/cpp/server/backends/moonshine_server.cpp
src/cpp/server/backends/kokoro_server.cpp
src/cpp/server/backends/chatterbox_server.cpp
src/cpp/server/backends/sd_server.cpp
src/cpp/server/backends/vllm_server.cpp
src/cpp/server/backends/backend_utils.cpp
Expand Down
117 changes: 117 additions & 0 deletions docs/guide/configuration/chatterbox.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# Chatterbox Backend Options

Lemonade integrates [Chatterbox](https://github.com/resemble-ai/chatterbox) (Resemble AI) as a **text-to-speech** backend, exposed through the OpenAI-compatible `/v1/audio/speech` endpoint (the same endpoint the Kokoro backend uses). Unlike Kokoro (CPU/Metal only), Chatterbox is a PyTorch model and supports **GPU acceleration** across vendors:

1. **GPU by default, CPU fallback.** Chatterbox auto-selects the best available device — CUDA (NVIDIA), ROCm (AMD, Linux), or Metal/MPS (Apple Silicon) — and falls back to CPU when no GPU is present. PyTorch's ROCm build drives AMD GPUs through the CUDA API, so a single bundle per device covers all GPU architectures.
2. **Expressive, multilingual, voice cloning.** Chatterbox supports emotion/exaggeration control, 23+ languages (multilingual variant), and zero-shot voice cloning from a reference clip.
3. **Byte-level streaming.** When the loaded Chatterbox build provides `generate_stream`, audio is emitted incrementally as raw PCM16 @ 24 kHz (`stream_format: "audio"`) for low time-to-first-audio; otherwise it falls back transparently to a single full-utterance response.

## Available Backends

Chatterbox auto-selects in this preference order (first available wins):

| Backend | Device | Platforms |
|---------|--------|-----------|
| `metal` | Apple Silicon GPU (MPS) | macOS arm64 |
| `cuda` | NVIDIA GPU | Windows x64, Linux x64 |
| `rocm` | AMD GPU (HIP via PyTorch ROCm) | Linux x64 |
| `cpu` | CPU | Windows x64, Linux x64, macOS arm64 |

Each is a self-contained PyInstaller bundle from [lemonade-sdk/chatterbox-rocm](https://github.com/lemonade-sdk/chatterbox-rocm) with an embedded Python runtime, the device-appropriate PyTorch wheel, and the `chatterbox-tts` library. No system Python is required (or touched) on the host; Lemonade additionally sets `PYTHONNOUSERSITE=1` at launch.

## Install

The correct device bundle is installed automatically the first time a Chatterbox model is loaded. To install explicitly:

```bash
lemonade backends install chatterbox:cuda # or rocm / metal / cpu
```

Or via HTTP:
```bash
curl -X POST http://localhost:13305/api/v1/install \
-H 'Content-Type: application/json' \
-d '{"recipe": "chatterbox", "backend": "cuda"}'
```

Bundle versions are pinned in [`backend_versions.json`](https://github.com/lemonade-sdk/lemonade/blob/main/src/cpp/resources/backend_versions.json) (`chatterbox.{cuda,rocm,metal,cpu}`), with tags following the upstream library version (`chatterbox0.1.7` = `chatterbox-tts` 0.1.7). Bundles are built automatically by [lemonade-sdk/chatterbox-rocm](https://github.com/lemonade-sdk/chatterbox-rocm), a distribution-only repo that tracks `chatterbox-tts` PyPI releases — no Chatterbox code is forked; the `main.py` wrapper in `tools/chatterbox-server/` here is frozen together with the PyPI wheel into a self-contained bundle.

## Models

Three variants are registered in [`server_models.json`](https://github.com/lemonade-sdk/lemonade/blob/main/src/cpp/resources/server_models.json), downloading from Hugging Face into the standard HF cache:

| Model | Variant | Checkpoint |
|-------|---------|-----------|
| `Chatterbox` | English (`ChatterboxTTS`) | `ResembleAI/chatterbox` |
| `Chatterbox-Multilingual` | 23+ languages (`ChatterboxMultilingualTTS`) | `ResembleAI/chatterbox` |
| `Chatterbox-Turbo` | Fast English w/ paralinguistic tags (`ChatterboxTurboTTS`) | `ResembleAI/chatterbox-turbo` |

```bash
lemonade pull Chatterbox
```

To register your own Chatterbox checkpoint (loaded via `from_local`):

```bash
lemonade pull user.MyChatterbox \
--checkpoint main ResembleAI/chatterbox \
--recipe chatterbox
```

## Use

### Speech synthesis (OpenAI-compatible)

```bash
curl http://localhost:13305/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{"model": "Chatterbox", "input": "Hello from Lemonade.", "response_format": "mp3"}' \
--output speech.mp3
```

`response_format` accepts `mp3` (default), `wav`, `pcm`, `flac`, and `opus`.

### Streaming

Request raw PCM streaming (24 kHz, signed 16-bit, little-endian):

```bash
curl http://localhost:13305/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{"model": "Chatterbox", "input": "Streaming audio.", "stream_format": "audio"}' \
--output speech.pcm
```

### Voice cloning and expressive controls

The OpenAI `voice` field is treated as a reference-audio path for zero-shot voice cloning when it points at an existing file (or pass `audio_prompt_path` explicitly). Chatterbox-specific controls `exaggeration`, `cfg_weight`, and `temperature` are passed through, and `language_id` selects the language for the multilingual variant:

```bash
curl http://localhost:13305/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{"model": "Chatterbox-Multilingual", "input": "Bonjour.", "language_id": "fr", "exaggeration": 0.6}' \
--output bonjour.mp3
```

## Tuning

Force a specific device (overriding auto-selection) via config or per-load:

```bash
lemonade config set chatterbox.backend=cpu
```

Free-form CLI args can be appended to `chatterbox-server` via `chatterbox_args`:

```bash
lemonade config set chatterbox_args="..."
```

(`--ckpt-dir`, `--variant`, `--device`, `--host`, and `--port` are managed by Lemonade and rejected as custom args.)

## Known gotchas

- **ROCm is Linux-only.** PyTorch publishes ROCm wheels for Linux only, so the `rocm` bundle is offered on Linux x64. On Windows, AMD GPUs fall back to the `cpu` bundle.
- **Large bundles.** Chatterbox ships a full PyTorch runtime; the GPU bundles are multi-gigabyte downloads. The first load also downloads the model weights (~2 GB) from Hugging Face.
- **GPU memory.** Chatterbox participates in the GPU LRU like other GPU models; on tight-VRAM systems it may evict (or be evicted by) an LLM. Use `--max-loaded-models` and per-model eviction settings to tune coexistence.
- **Streaming support is version-dependent.** Byte-level streaming uses Chatterbox's `generate_stream` when present in the installed build; otherwise the wrapper returns the full utterance as a single chunk over the same streaming contract.
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,7 @@ nav:
- llama.cpp: guide/configuration/llamacpp.md
- vLLM: guide/configuration/vllm.md
- Moonshine: guide/configuration/moonshine.md
- Chatterbox: guide/configuration/chatterbox.md
- Cloud Offload: guide/configuration/cloud.md
- FAQ: guide/faq.md
- Development:
Expand Down
1 change: 1 addition & 0 deletions src/app/src/renderer/BackendManager.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ const RECIPE_ORDER = new Map([
'moonshine',
'sd-cpp',
'kokoro',
'chatterbox',
'flm',
'ryzenai-llm',
'vllm',
Expand Down
1 change: 1 addition & 0 deletions src/app/src/renderer/utils/recipeNames.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ export const RECIPE_DISPLAY_NAMES: Record<string, string> = {
'moonshine': 'Moonshine',
'sd-cpp': 'StableDiffusion.cpp',
'kokoro': 'Kokoro',
'chatterbox': 'Chatterbox',
'cloud': 'Cloud',
'vllm': 'vLLM ROCm (experimental)',
};
55 changes: 55 additions & 0 deletions src/cpp/include/lemon/backends/chatterbox_server.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
#pragma once

#include "../server_capabilities.h"
#include "../wrapped_server.h"
#include "backend_utils.h"
#include <string>

namespace lemon {
namespace backends {

// Chatterbox text-to-speech backend (Resemble AI). Runs the self-contained
// chatterbox-server subprocess (PyInstaller bundle from the
// lemonade-sdk/chatterbox-rocm distribution repo) and forwards OpenAI-style
// /v1/audio/speech requests to it. Supports CUDA, ROCm, Metal, and CPU; the
// device variant is auto-selected (GPU when available, else CPU) via the
// RECIPE_DEFS preference order in system_info.cpp.
class ChatterboxServer : public WrappedServer, public ITextToSpeechServer {
public:
static InstallParams get_install_params(const std::string& backend, const std::string& version);

inline static const BackendSpec SPEC = BackendSpec(
"chatterbox",
#ifdef _WIN32
"chatterbox-server.exe"
#else
"chatterbox-server"
#endif
, get_install_params
, true // supports_split_archive: GPU bundles exceed GitHub's 2 GiB asset limit
);

explicit ChatterboxServer(const std::string& log_level,
ModelManager* model_manager,
BackendManager* backend_manager);

~ChatterboxServer() override;

void load(const std::string& model_name,
const ModelInfo& model_info,
const RecipeOptions& options,
bool do_not_upgrade = false) override;

void unload() override;

// ICompletionServer implementation (not supported - return errors)
json chat_completion(const json& request) override;
json completion(const json& request) override;
json responses(const json& request) override;

// ITextToSpeechServer implementation
void audio_speech(const json& request, httplib::DataSink& sink) override;
};

} // namespace backends
} // namespace lemon
5 changes: 5 additions & 0 deletions src/cpp/include/lemon/model_manager.h
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,11 @@ struct ModelInfo {
// Moonshine-specific model architecture (e.g., 2 = TINY_STREAMING, 4 = SMALL_STREAMING, 5 = MEDIUM_STREAMING)
int moonshine_arch = -1;

// Chatterbox-specific model variant ("english", "multilingual", "turbo").
// Selects which Chatterbox class chatterbox-server instantiates. Empty
// defaults to "english".
std::string chatterbox_variant;

// Utility
std::string checkpoint(const std::string& type = "main") const { return checkpoints.count(type) ? checkpoints.at(type) : ""; }
std::string resolved_path(const std::string& type = "main") const { return resolved_paths.count(type) ? resolved_paths.at(type) : ""; }
Expand Down
3 changes: 3 additions & 0 deletions src/cpp/include/lemon/model_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,9 @@ inline DeviceType get_device_type_from_recipe(const std::string& recipe) {
return DEVICE_CPU;
} else if (recipe == "kokoro") {
return DEVICE_CPU;
} else if (recipe == "chatterbox") {
// Defaults to GPU; ChatterboxServer overrides to CPU when the cpu backend is selected.
return DEVICE_GPU;
} else if (is_collection_recipe(recipe)) {
return DEVICE_NONE;
} else if (recipe == "cloud") {
Expand Down
6 changes: 6 additions & 0 deletions src/cpp/resources/backend_versions.json
Original file line number Diff line number Diff line change
Expand Up @@ -92,5 +92,11 @@
"moonshine": {
"cpu": "moonshine0.0.62"
},
"chatterbox": {
"cuda": "chatterbox0.1.7",
"rocm": "chatterbox0.1.7",
"metal": "chatterbox0.1.7",
"cpu": "chatterbox0.1.7"
},
"clear_bin_if_lemonade_below": "9.4.0"
}
28 changes: 28 additions & 0 deletions src/cpp/resources/server_models.json
Original file line number Diff line number Diff line change
Expand Up @@ -1581,6 +1581,34 @@
],
"size": 0.354
},
"Chatterbox": {
"checkpoint": "ResembleAI/chatterbox",
"recipe": "chatterbox",
"chatterbox_variant": "english",
"suggested": true,
"labels": [
"tts"
],
"size": 3.2
},
"Chatterbox-Multilingual": {
"checkpoint": "ResembleAI/chatterbox",
"recipe": "chatterbox",
"chatterbox_variant": "multilingual",
"labels": [
"tts"
],
"size": 3.2
},
"Chatterbox-Turbo": {
"checkpoint": "ResembleAI/chatterbox-turbo",
"recipe": "chatterbox",
"chatterbox_variant": "turbo",
"labels": [
"tts"
],
"size": 3.0
},
"RealESRGAN-x4plus": {
"checkpoint": "amd/realesrgan-x4plus:RealESRGAN_x4plus.pth",
"recipe": "sd-cpp",
Expand Down
2 changes: 2 additions & 0 deletions src/cpp/server/backends/backend_utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include "lemon/backends/llamacpp_server.h"
#include "lemon/backends/whisper_server.h"
#include "lemon/backends/sd_server.h"
#include "lemon/backends/chatterbox_server.h"
#include "lemon/backends/kokoro_server.h"
#include "lemon/backends/ryzenaiserver.h"
#include "lemon/backends/vllm_server.h"
Expand Down Expand Up @@ -43,6 +44,7 @@ namespace lemon::backends {
if (recipe == "whispercpp") return &WhisperServer::SPEC;
if (recipe == "sd-cpp") return &SDServer::SPEC;
if (recipe == "kokoro") return &KokoroServer::SPEC;
if (recipe == "chatterbox") return &ChatterboxServer::SPEC;
if (recipe == "ryzenai-llm") return &::lemon::RyzenAIServer::SPEC;
if (recipe == "vllm") return &VLLMServer::SPEC;
if (recipe == "flm") return &FastFlowLMServer::SPEC;
Expand Down
Loading
Loading