22 Apr 07:03

Immutable

v0.4.5

ed2eec5

v0.4.5 Latest

Latest

Second hotfix for the "offline mode is enabled" crash on model load. 0.4.4 reverted the inference-path offline guards but kept the same trap on the load path, so users who updated to 0.4.4 kept hitting the exact error the release was supposed to fix (#526). This release removes the load-path guards and patches the transformers tokenizer load to be robust to HuggingFace metadata failures at the source, so the class of bug can't recur.

Reliability

Load no longer fails with "offline mode is enabled" (#530, fixes #526). transformers 4.57.x added an unconditional huggingface_hub.model_info() call inside AutoTokenizer.from_pretrained (via _patch_mistral_regex) that runs for every non-local repo load, regardless of cache state or whether the target model is actually a Mistral variant. The load-time HF_HUB_OFFLINE guard from 0.4.2 turned that into a hard crash for cached online users the moment 0.4.4 removed the inference-path guard that had been masking the problem. Fix wraps _patch_mistral_regex so any exception from the HF metadata check is caught and the tokenizer is returned unchanged — matching the success-path behavior for non-Mistral repos. The wrapper installs at backend.backends import time so it covers Qwen Base, Qwen CustomVoice, TADA, and every other transformers-backed engine on Windows, Linux, and CUDA alike. The load-time force_offline_if_cached guards were removed — with the wrapper in place they provide zero value and only risk re-introducing the same failure mode.
No more 30s pause when generating without a network. The HuggingFace metadata timeout called out as a known caveat in 0.4.4 is covered by the same patch; offline users no longer wait for the check to time out before load completes.

Assets 23

cuda-libs-cu128-v1.tar.gz

sha256:d47699a500182b28813442c909ee72a3f2c532e14d2ee1349a92e3af67505e1d

1.97 GB 2026-04-22T05:39:10Z
cuda-libs-cu128-v1.tar.gz.sha256

sha256:1ef90b7efd9b4db0e59eeebe8b1117095d2767931789eefe6637d6330ef25d66

93 Bytes 2026-04-22T05:39:10Z
cuda-libs.json

sha256:4bd6315b0d9cfaee2c34b2f4a875516c33baa347ac4a42cbe3c5f9a80dd424f5

193 Bytes 2026-04-22T05:39:10Z
latest.json

sha256:5771879ed027b2eea33f50353a226af8ed77c9ec4786490b74db6b8cc304319c

5.92 KB 2026-04-22T05:55:50Z
voicebox-server-cuda.tar.gz

sha256:2fbdc4739fe69195e3e1d7912f56b20e4f823cff64d4ad15cf817a25120e2c49

973 MB 2026-04-22T05:39:10Z
voicebox-server-cuda.tar.gz.sha256

sha256:66ff328bf5240e1f216947f6c750768b7810692aa5f0e80567d20a4797e76e50

95 Bytes 2026-04-22T05:39:10Z
Voicebox_0.4.5_aarch64.dmg

sha256:6491af9c293554f3dbeece196ca33a2c83bae0435882a059d3709594a0595a15

483 MB 2026-04-22T05:31:15Z
Voicebox_0.4.5_x64-setup.exe

sha256:0f60b0c807f7e98b864302780db8fe7f21b74eb756dae836a6b73580ee349f24

482 MB 2026-04-22T05:43:50Z
Voicebox_0.4.5_x64-setup.exe.sig

sha256:fafdea8c83d0c0e303ba67e4258879a22deb92d7fd7a0f65c4084c4175a197c5

420 Bytes 2026-04-22T05:44:04Z
Voicebox_0.4.5_x64-setup.nsis.zip

sha256:54d9da2509ee14af98de795b7516460067b72b15e9c41f171a952f33bb68d88f

482 MB 2026-04-22T05:44:04Z
Source code (zip)

2026-04-22T05:06:19Z
Source code (tar.gz)

2026-04-22T05:06:19Z
Release attestation (json)

2026-04-22T05:06:19Z

21 Apr 12:10

github-actions

Immutable

v0.4.4

74e0044

v0.4.4

Hotfix for a regression in 0.4.3 where generation and transcription could fail outright with "offline mode is enabled" even when the user was online.

Reliability

Inference no longer fails with "offline mode is enabled" while online (#524, reverts the inference-path guards from #503). 0.4.3 wrapped every inference body (generate, transcribe, create_voice_clone_prompt) with a process-wide HF_HUB_OFFLINE flip to stop lazy HuggingFace lookups from hanging when the network drops mid-inference (#462). That flag also blocks legitimate metadata calls (e.g. HfApi().model_info for revision resolution) so online users started seeing generation fail outright. Inference now runs with the process's default HF state. Load-time offline guards — which weren't the source of the regression — stay in place.

Known caveat: users generating without an internet connection may see brief pauses during inference while HuggingFace metadata lookups time out (typically ~30s, after which the library recovers). A proper offline-mode toggle is planned for 0.4.5.

Assets 23

21 Apr 07:51

github-actions

Immutable

v0.4.3

328bdca

voicebox v0.4.3

A patch focused on two user-impacting reliability fixes: macOS DMG notarization (unblocks brew install voicebox on macOS 15 Sequoia and fixes spurious "app isn't signed" Gatekeeper dialogs on older Intel Macs) and Kokoro Japanese voice initialization on fresh installs.

macOS

DMGs are now notarized and stapled (#523). Tauri's bundler notarizes the .app inside the DMG but ships the DMG wrapper itself unnotarized. Gatekeeper rejects that on macOS 15 Sequoia (confirmed by Homebrew Cask CI failing on both arm and intel Sequoia runners) and causes the "the app is not signed" dialog on older Intel Macs when Apple's notarization servers are slow or unreachable (#509). The release workflow now submits each DMG to notarytool, staples the ticket, verifies with spctl, and overwrites the draft-release asset tauri-action uploaded. Adds ~5-10 min per macOS job.

Backend

Kokoro Japanese voices no longer crash on fresh installs (#521, fixes #514). misaki[ja] pulls in fugashi, which needs a MeCab dictionary on disk. The unidic package that was being installed ships no data and expects a ~526MB runtime download that just setup doesn't run (and which wouldn't survive PyInstaller anyway). Swapped to unidic-lite, which bundles a MeCab-compatible dict inside the wheel (~50MB). Collected in build_binary.py so frozen builds pick up unidic_lite/dicdir/.

Assets 23

21 Apr 00:39

github-actions

Immutable

v0.4.2

0f97300

voicebox v0.4.2

This release localizes the entire app. English, Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), and Japanese (ja) are wired up end-to-end across every tab, modal, dialog, and toast — 559 translation keys per locale, parity verified. Plus a batch of reliability fixes: offline-mode now actually stays offline, Chatterbox accepts reference samples it used to reject, MLX Qwen 0.6B points at the right repo, and macOS system audio survives backgrounding.

Internationalization (#508)

i18next foundation with an in-app language switcher that re-renders the tree on change — lazy-loaded components were holding stale strings without an explicit key-bump on the React root.
Four locales at full coverage: English, Simplified Chinese, Traditional Chinese, Japanese. No partial/English-fallback surfaces.
Every user-visible surface translated: Stories (list, content editor, dialogs, toasts), Effects (list, detail, chain editor, built-in preset names), Voices (table, search, inspector, Create/Edit modal, audio sample panels), Audio Channels (list, dialogs, device picker), history + story dropdown menus, ProfileCard / ProfileList / HistoryTable, and the unsupported-model note.
Relative dates localize via date-fns locale objects (3 days ago → 3 天前 / 3 日前) — Intl.RelativeTimeFormat doesn't produce the phrasing we use in the history table.
Dev-build version suffix (v0.4.2 (dev) / (开发版) / (開發版) / (開発版)) is now locale-aware.
559 translation keys across all four locales.

Reliability

HF_HUB_OFFLINE now guards every inference path (#503) — some engines were still attempting a HuggingFace metadata roundtrip on first load when offline mode was enabled, causing hangs on airgapped or flaky networks.
Chatterbox reference samples are preprocessed instead of rejected (#502) — samples outside the expected sample rate or channel layout are resampled to match, rather than failing with an opaque error.
MLX Qwen 0.6B repo path fixed (#501) — now points at the published mlx-community repo so the model actually downloads on Apple Silicon.
macOS system audio survives backgrounding (#486, closes #41) — WKWebView was tearing down the audio session when the app lost focus, silently killing system-audio capture.
MLX backend miniaudio dependency pinned (#506) — mlx_audio.stt needs it at runtime and nothing else transitively pulled it in, so --no-deps installs were breaking on first use.

Landing / Docs

New /download page (#487) — no more dumping first-time visitors onto the GitHub releases list. The API example snippet on the landing page also got an accuracy pass.
Download redirects work behind reverse proxies (#498) — uses the public origin instead of localhost when resolving platform-specific installer URLs.
MDX docs audited against the multi-engine backend (#484) — stale single-engine assumptions removed.
Three more tutorials + mobile navbar / hero CTA fixes (#483).

Linux

Still not shipping. The re-enable attempt (#488) landed on main but CI still hangs in the tauri-action bundler step on ubuntu-22.04 — no output for 25+ minutes after rpm bundling, even with createUpdaterArtifacts: false and --bundles deb,rpm. The matrix entry is disabled again for 0.4.2; the ubuntu-specific setup steps stay in the workflow so re-enabling is a one-line change once we identify the hang. Next release will take another pass.

New Contributors

@shekharyv — download redirects behind reverse proxies (#498)

Assets 23

19 Apr 01:53

github-actions

Immutable

v0.4.1

3c1e851

v0.4.1

A fast follow-up to 0.4.0 focused on making the new engines actually load in the production binary — plus generation cancellation, Linux system-audio capture, and the repo's first PR-time type check. Five first-time contributors shipped in this release.

0.4.0 introduced three new TTS engines, but the frozen PyInstaller binary tripped over several Python-ecosystem quirks that don't show up in the dev venv: transformers opening .py sources at runtime, scipy.stats._distn_infrastructure hitting a frozen-importer NameError, and chatterbox-multilingual failing to find its Chinese segmenter dictionary. This release patches all of those in one sweep.

Frozen-Binary Reliability (#438)

Kokoro now bundles .py sources alongside .pyc via --collect-all kokoro so transformers' _can_set_attn_implementation regex scan can read them — previously FileNotFoundError: kokoro/modules.py killed Kokoro loading in production builds
Chatterbox Multilingual now bundles spacy_pkuseg/dicts/default.pkl and the package's native .so extensions via --collect-all spacy_pkuseg — previously the Chinese word segmenter crashed with FileNotFoundError on first load
scipy.stats._distn_infrastructure — new runtime hook source-patches the trailing del obj (which raises NameError under PyInstaller's frozen importer because the preceding list comprehension evaluates empty) to globals().pop('obj', None), unblocking librosa → scipy.signal → scipy.stats for every TTS engine that depends on librosa
transformers.masking_utils — same runtime hook forces _is_torch_greater_or_equal_than_2_6 = False so the older sdpa_mask_older_torch path is selected; the 2.6+ path uses TransformGetItemToIndex(), a real torch._dynamo graph transform our permissive stub can't reproduce
torch._dynamo — no-op stub replaces the real module before transformers imports it, preventing the torch._numpy._ufuncs import crash (NameError: name 'name' is not defined) that blocked Kokoro and every engine pulling in flex_attention
.spec paths are now repo-relative instead of absolute, so the generated spec is portable across machines and CI

Generation

Cancel queued or running generations (#444) — new /generate/{id}/cancel endpoint and a Stop button on the history row while generating. The serial queue now tracks per-ID state (queued / running / cancelled) so queued jobs are skipped before the worker picks them up and running jobs are .cancel()-ed mid-flight; run_generation catches CancelledError and marks the row failed with a "cancelled" error.
Legacy data/ path prefix resolution (#440) — generations stored with the old data/ prefix under pre-0.4 installs now resolve correctly after the storage root moved, fixing 404s for historical audio.

Model Migration

Migration dialog no longer hangs when the cache is empty (#439) — the backend now emits a completion SSE event even when zero models are moved.
Storage-change flow surfaces a toast when there's nothing to migrate (#433) instead of proceeding with a no-op move and restarting the server.
Deleting all generations from a voice profile now deletes the associated version files and DB rows too (#447) — previously orphaned versions accumulated in storage.

Platform

Linux system audio capture (#457) — cpal's ALSA backend doesn't expose PulseAudio/PipeWire monitor sources by name, so the previous device-name search never matched and silently fell back to the microphone. Detection now uses pactl get-default-sink + pactl list short sources and routes via PULSE_SOURCE, with the name-based search retained as a fallback when pactl is absent.

Frontend CI

First PR-time quality gate (#418) — new .github/workflows/ci.yml runs bun run typecheck + bun run build:web on every PR. Fixed pre-existing type issues that were being suppressed with @ts-expect-error, cleaned up a dep-array typo ([platform.metadata.isTauricheckOnMountcheckForUpdates]) in useAutoUpdater, and removed 100+ lines of dead ModelItem code from ModelManagement.tsx.
Follow-up: widened apiClient.migrateModels() return type to include moved and errors so the storage-change handler typechecks against the real backend response (#470).

Docs

Clarified in the Quick Start + README that paralinguistic tags ([laugh], [sigh]) only work with Chatterbox Turbo; other engines read them as literal text (#450).

New Contributors

@Bortlesboat — generation cancellation (#444)
@gaojulong — migration dialog hang fix (#439)
@fuleinist — migration no-op toast (#433)
@erionjuniordeandrade-a11y — frontend CI + type hardening (#418)
@estefrac — Linux pactl system-audio capture (#457)

Assets 23

16 Apr 22:06

github-actions

Immutable

v0.4.0

625e1ba

v0.4.0

The biggest Voicebox release yet. Three new TTS engines bring the lineup to seven — HumeAI TADA, Kokoro 82M, and Qwen CustomVoice join Qwen3-TTS, LuxTTS, Chatterbox Multilingual, and Chatterbox Turbo. GPU support broadens to Intel Arc (XPU) and NVIDIA Blackwell (RTX 50-series), with runtime diagnostics that warn when your PyTorch build doesn't match your GPU. The CUDA backend is now split into independently versioned server and library archives, so upgrading no longer redownloads 4 GB of PyTorch/CUDA DLLs.

This release also marks a big community moment: 13 new contributors shipped fixes and features in 0.4.0. Thirty-plus bug fixes target the most-reported issues in the tracker — numpy 2.x TTS crashes, Windows background-server reliability, macOS 11 launch failures, audio playback silence, Stories clip-splitting races, history status staleness, and more.

New TTS Engines

HumeAI TADA — Expressive English & Multilingual (#296)

Added tada-1b (English) and tada-3b-ml (multilingual) backends
Replaced descript-audio-codec with a lightweight DAC shim to cut dependencies
Switched audio decoding to soundfile to sidestep torchcodec bundling issues
Redirected gated Llama tokenizer lookups to an ungated mirror so model loading works out of the box
Fixed tokenizer patch that was corrupting AutoTokenizer for other engines
Fixed TorchScript error in frozen builds

Kokoro 82M — Fast Lightweight TTS (#325)

Added Kokoro 82M engine with a new voice profile type system that distinguishes preset voices from cloned profiles
Profile grid now handles engine compatibility directly — removed redundant dropdown filtering
Tightened Kokoro profile handling so preset voices can't be edited like cloned profiles

Qwen CustomVoice (#328)

Added qwen-custom-voice preset engine backed by Qwen3-TTS
Enforced preset/profile engine compatibility across the generation flow
Floating generator now shows all engines instead of silently filtering

Voice Profile UX

Until 0.4, every engine in Voicebox was a cloning model, so every voice profile was usable with every engine and the profile grid just showed them all. Introducing Kokoro and Qwen CustomVoice — which work from preset voices rather than cloned samples — broke that assumption for the first time. An early cut on main filtered the grid by the selected engine, which left users running pre-release builds thinking their cloned voices had vanished whenever they switched to a preset-only engine.

This release ships the resolution before it ever reaches a tagged version:

Grey-out instead of filter — all profiles are always visible; unsupported ones render dimmed with a compatibility hint at the bottom of the grid
Auto-switch on selection — clicking a greyed-out profile selects it AND switches the engine to a compatible one, instead of silently doing nothing
Instruct toggle restored for Qwen CustomVoice — the floating generate box now reveals a delivery-instructions input (tone, emotion, pace) when CustomVoice is selected. Hidden across the board while the new multi-engine lineup was stabilizing because most engines don't honor the kwarg; now conditionally exposed only for the one engine that was actually trained for instruction-based style control
Supported profiles sort first; the grid scrolls the selected profile into view after engine/sort changes
Fixed engine desync on tab navigation — the form now initializes its engine from the store
Fixed the disabled-and-selected card click edge case by bouncing selection to re-trigger the auto-switch
Cleaned up scroll effect timers (requestAnimationFrame + setTimeout) to prevent stale DOM writes on unmount or rapid selection changes

GPU & Platform

Intel Arc (XPU) Support (#320)

First-class Intel Arc support across all PyTorch-based backends
Device-aware seeding, XPU detection in the GPU status panel, and setup flow detection
Reports correct device name and VRAM in settings

Blackwell / RTX 50-series Support (#316, #401)

Upgraded the CUDA backend from cu126 → cu128 for RTX 50-series support
Added sm_120+PTX to the CUDA build via TORCH_CUDA_ARCH_LIST for forward-compatibility with Blackwell architectures (closes 5 open reports: #386, #395, #396, #399, #400)
GPU settings UI fixes around install/uninstall state

GPU Compatibility Diagnostics (#367, adapted)

New check_cuda_compatibility() compares the current device's compute capability against the bundled PyTorch's architecture list
Health endpoint exposes a gpu_compatibility_warning field so the UI can surface mismatches
Startup logs a WARN when the installed PyTorch build doesn't support the detected GPU
GPU status label shows [UNSUPPORTED - see logs] — no more silent "no kernel image" failures

Split CUDA Backend (#298)

CUDA backend now ships as two independently versioned archives: a small server binary and a large libs archive (the ~4 GB of PyTorch/CUDA DLLs)
Upgrading Voicebox no longer redownloads the libs archive when only the server binary changed
Added asyncio.Lock around download_cuda_binary() so auto-update and manual download can't race on the same temp file (#428)
Updated package_cuda.py for PyInstaller 6.18 onedir layout
Temp archives are always cleaned up on failure, even when the install aborts mid-extract

Bug Fixes

Critical: TTS Generation

numpy 2.x torch.from_numpy crash (#361) — torch compiled against numpy 1.x ABI fails silently when paired with numpy 2.x, causing RuntimeError: Numpy is not available / Unable to create tensor on every TTS request in bundled macOS Intel / Rosetta builds. Pinned numpy<2.0 in requirements and added a PyInstaller runtime hook with a ctypes.memmove fallback as belt-and-suspenders. Hardened afterward to raise on unknown dtypes instead of silently reinterpreting bytes as float32.

Platform Reliability

Windows background server (#402) — "keep server running after close" now actually keeps the server running. The HTTP /watchdog/disable request could lose the race against process exit on Windows; added a .keep-running sentinel file as a synchronous fallback, with stale-sentinel cleanup on startup to avoid orphan server processes
macOS 11 launch crash (#424) — weak-linked ScreenCaptureKit so the app can launch on macOS < 12.3 instead of crashing at dyld resolution. Gated system audio capture behind a real sw_vers version check so unsupported systems cleanly advertise "not available" rather than crashing at runtime
macOS Intel (x86_64) setup (#416) — relaxed torch>=2.7.0 → torch>=2.2.0. PyTorch dropped pre-built x86_64 wheels after 2.2.2, so Intel Mac devs could no longer pip install. Now resolves to the latest compatible torch per platform
Offline model loading (#318) — Qwen TTS and Whisper force offline mode when loading cached models, so startup works without network access
GUI startup with external server (#319) — fixed GUI launch when pointed at a remote/external server, and added data refresh on server switch; hardened health validation and error handling
Qwen3-TTS cache split on Windows (adapted from #218) — route Qwen3TTSModel.from_pretrained through hf_constants.HF_HUB_CACHE so the speech tokenizer and preprocessor_config.json resolve from a single cache root
Qwen3-TTS bundling (#305) — bundle qwen_tts source files in the PyInstaller build to fix inspect.getsource errors in frozen builds
Backend import paths (#345) — moved lazy imports to top-level with absolute paths to resolve the "Failed to Save" preset error caused by ModuleNotFoundError in production builds
Effects service import (#384) — fixed ModuleNotFoundError on preset create/update by switching to relative imports (#349)

Audio & Playback

cpal stream silent playback (#405) — cpal::Stream was dropped on function return immediately after play(), causing every playback to fall silent. Now holds the stream until either the buffer drains or the stop flag fires (#404)

Stories & History

Clip-splitting race (#403) — rapid double-clicks on split could race through split_story_item with inconsistent state. Added with_for_update() row locking on the backend and an isPending guard on the frontend (#366)
History status staleness (#394) — GET /history/{id} was hardcoding status="completed" regardless of the DB row, breaking any client polling for job completion. Now returns status, error, engine, model_size, and is_favorited from the actual row
"Clear failed" bulk button (#412) — new DELETE /history/failed endpoint and a header strip showing `"N failed generati...

Assets 23

17 Mar 09:46

github-actions

Immutable

v0.3.0

ac68052

v0.3.0

This release rewrites the backend into a modular architecture, overhauls the settings UI into routed sub-pages, fixes audio player freezing, migrates documentation to Fumadocs, and ships a batch of bug fixes targeting the most-reported issues from the tracker.

The backend's 3,000-line monolith main.py has been decomposed into domain routers, a services layer, and a proper database package. A style guide and ruff configuration now enforce consistency. On the frontend, settings have been split into dedicated routed pages with server logs, a changelog viewer, and an about page. The audio player no longer freezes mid-playback, and model loading status is now visible in the UI. Seven user-reported bugs have been fixed, including server crashes during sample uploads, generation list staleness, cryptic error messages, and CUDA support for RTX 50-series GPUs.

Settings Overhaul (#294)

Split settings into routed sub-tabs: General, Generation, GPU, Logs, Changelog, About
Added live server log viewer with auto-scroll
Added in-app changelog page that parses CHANGELOG.md at build time
Added About page with version info, license, and generation folder quick-open
Extracted reusable SettingRow component for consistent setting layouts

Audio Player Fix (#293)

Fixed audio player freezing during playback
Improved playback UX with better state management and listener cleanup
Fixed restart race condition during regeneration
Added stable keys for audio element re-rendering
Improved accessibility across player controls

Backend Refactor (#285)

Extracted all routes from main.py into 13 domain routers under backend/routes/ — main.py dropped from ~3,100 lines to ~10
Moved CRUD and service modules into backend/services/, platform detection into backend/utils/
Split monolithic database.py into a database/ package with separate models, session, migrations, and seed modules
Added backend/STYLE_GUIDE.md and pyproject.toml with ruff linting config
Removed dead code: unused _get_cuda_dll_excludes, stale studio.py, example_usage.py, old Makefile
Deduplicated shared logic across TTS backends into backends/base.py
Improved startup logging with version, platform, data directory, and database stats
Fixed startup database session leak — sessions now rollback and close in finally block
Isolated shutdown unload calls so one backend failure doesn't block the others
Handled null duration in story_items migration
Reject model migration when target is a subdirectory of source cache

Documentation Rewrite (#288)

Migrated docs site from Mintlify to Fumadocs (Next.js-based)
Rewrote introduction and root page with content from README
Added "Edit on GitHub" links and last-updated timestamps on all pages
Generated OpenAPI spec and auto-generated API reference pages
Removed stale planning docs (CUDA_BACKEND_SWAP, EXTERNAL_PROVIDERS, MLX_AUDIO, TTS_PROVIDER_ARCHITECTURE, etc.)
Sidebar groups now expand by default; root redirects to /docs
Added OG image metadata and /og preview page

UI & Frontend

Added model loading status indicator and effects preset dropdown (3187344)
Fixed take-label race condition during regeneration
Added accessible focus styling to select component
Softened select focus indicator opacity
Addressed 4 critical and 12 major issues from CodeRabbit review

Bug Fixes (#295)

Fixed sample uploads crashing the server — audio decoding now runs in a thread pool instead of blocking the async event loop (#278)
Fixed generation list not updating when a generation completes — switched to refetchQueries for reliable cache busting, added SSE error fallback, and page reset on completion (#231)
Fixed error toasts showing [object Object] instead of the actual error message (#290)
Added Whisper model selection (base, small, medium, large, turbo) and expanded language support to the /transcribe endpoint (#233)
Upgraded CUDA backend build from cu121 to cu126 for RTX 50-series (Blackwell) GPU support (#289)
Handled client disconnects in SSE and streaming endpoints to suppress [Errno 32] Broken Pipe errors (#248)
Fixed Docker build failure from pip hash mismatch on Qwen3-TTS dependencies (#286)
Added 50 MB upload size limit with chunked reads to prevent unbounded memory allocation on sample uploads
Eliminated redundant double audio decode in sample processing pipeline

Platform Fixes

Replaced netstat with TcpStream + PowerShell for Windows port detection (#277)
Fixed Docker frontend build and cleaned up Docker docs
Fixed macOS download links to use .dmg instead of .app.tar.gz
Added dynamic download redirect routes to landing site

Release Tooling

Added draft-release-notes and release-bump agent skills
Wired CI release workflow to extract notes from CHANGELOG.md for GitHub Releases
Backfilled changelog with all historical releases

Assets 22

15 Mar 23:28

github-actions

v0.2.3

34e17bd

v0.2.3

The "it works in dev but not in prod" release. This version fixes a series of PyInstaller bundling issues that prevented model downloading, loading, generation, and progress tracking from working in production builds.

Model Downloads Now Actually Work

The v0.2.1/v0.2.2 builds could not download or load models that weren't already cached from a dev install. This release fixes the entire chain:

Chatterbox, Chatterbox Turbo, and LuxTTS all download, load, and generate correctly in bundled builds
Real-time download progress — byte-level progress bars now work in production. The root cause: huggingface_hub silently disables tqdm progress bars based on logger level, which prevented our progress tracker from receiving byte updates. We now force-enable the internal counter regardless.
Fixed Python 3.12.0 code.replace() bug — the macOS build was on Python 3.12.0, which has a known CPython bug that corrupts bytecode when PyInstaller rewrites code objects. This caused NameError: name 'obj' is not defined crashes during scipy/torch imports. Upgraded to Python 3.12.13.

PyInstaller Fixes

Collect all inflect files — typeguard's @typechecked decorator calls inspect.getsource() at import time, which needs .py source files, not just bytecode. Fixes LuxTTS "could not get source code" error.
Collect all perth files — bundles the pretrained watermark model (hparams.yaml, .pth.tar) needed by Chatterbox at runtime
Collect all piper_phonemize files — bundles espeak-ng-data/ (phoneme tables, language dicts) needed by LuxTTS for text-to-phoneme conversion
Set ESPEAK_DATA_PATH in frozen builds so the espeak-ng C library finds the bundled data instead of looking at /usr/share/espeak-ng-data/
Collect all linacodec files — fixes inspect.getsource error in Vocos codec
Collect all zipvoice files — fixes source code lookup in LuxTTS voice cloning
Copy metadata for requests, transformers, huggingface-hub, tokenizers, safetensors, tqdm — fixes importlib.metadata lookups in frozen binary
Add hidden imports for chatterbox, chatterbox_turbo, luxtts, zipvoice backends
Add multiprocessing.freeze_support() to fix resource_tracker subprocess crash in frozen binary
--noconsole now only applied on Windows — macOS/Linux need stdout/stderr for Tauri sidecar log capture
Hardened sys.stdout/sys.stderr devnull redirect to test writability, not just None check

Updater

Fixed updater artifact generation with v1Compatible for tauri-action signature files
Updated tauri-action to v0.6 to fix updater JSON and .sig generation

Other Fixes

Full traceback logging on all backend model loading errors (was just str(e) before)

Assets 21

15 Mar 16:55

github-actions

v0.2.2

e7f749f

v0.2.2

UPDATE: I'm working on a rewrite of the model downloading, it's absolute hell and takes a while to test as it always works in dev and never in prod builds. Will have a solution up ASAP. If you're eager to test 0.2.x please compile from source. Next update will solve model downloading and the updater issue for good.

v0.2.2

Fix Chatterbox model support in bundled builds [SIKE fixed in 0.2.3]
Fix LuxTTS/ZipVoice support in bundled builds [SIKE fixed in 0.2.3]
Auto-update CUDA binary when app version changes
CUDA download progress bar
Fix server process staying alive on macOS (SIGHUP handling, watchdog grace period)
Hide console window when running CUDA binary on Windows

Assets 12

15 Mar 15:27

github-actions

v0.2.1

a5269d2

v0.2.1

The best local voice cloning tool, just got better...

See the new website: https://voicebox.sh

Released 2026-03-15 — v0.2.1 on GitHub (version bump due to an immutable release tag on GitHub)

Voicebox v0.1.x was a single-engine voice cloning app built around Qwen3-TTS. v0.2.0 is a ground-up rethink: four TTS engines, 23 languages, paralinguistic emotion controls, a post-processing effects pipeline, unlimited generation length, an async generation queue, and support for every major GPU vendor. Plus Docker.

New TTS Engines

Multi-Engine Architecture

Voicebox now runs four independent TTS engines behind a thread-safe per-engine backend registry. Switch engines per-generation from a single dropdown — no restart required.

Engine	Languages	Size	Key Strengths
Qwen3-TTS 1.7B	10	~3.5 GB	Highest quality, delivery instructions ("speak slowly", "whisper")
Qwen3-TTS 0.6B	10	~1.2 GB	Lighter, faster variant
LuxTTS	English	~300 MB	CPU-friendly, 48 kHz output, 150x realtime
Chatterbox Multilingual	23	~3.2 GB	Broadest language coverage, zero-shot cloning
Chatterbox Turbo	English	~1.5 GB	350M params, low latency, paralinguistic tags

Chatterbox Multilingual — 23 Languages (#257)

Zero-shot voice cloning in Arabic, Chinese, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, and Turkish. The language dropdown dynamically filters to show only languages supported by the selected engine.

LuxTTS — Lightweight English TTS (#254)

A fast, CPU-friendly English engine. ~300 MB download, 48 kHz output, runs at 150x realtime on CPU. Good for quick drafts and machines without a GPU.

Chatterbox Turbo — Expressive English (#258)

A fast 350M-parameter English model with inline paralinguistic tags.

Paralinguistic Tags Autocomplete (#265)

Type / in the text input with Chatterbox Turbo selected to open an autocomplete for 9 expressive tags that the model synthesizes inline with speech:

[laugh] [chuckle] [gasp] [cough] [sigh] [groan] [sniff] [shush] [clear throat]

Tags render as inline badges in a rich text editor and serialize cleanly to the API.

Generation

Unlimited Generation Length — Auto-Chunking (#266)

Long text is now automatically split at sentence boundaries, generated per-chunk, and crossfaded back together. Engine-agnostic — works with all four engines.

Auto-chunking limit slider — 100–5,000 chars (default 800)
Crossfade slider — 0–200ms (default 50ms), or 0 for a hard cut
Max text length raised to 50,000 characters
Smart splitting respects abbreviations (Dr., e.g., a.m.), CJK punctuation, and never breaks inside [tags]

Asynchronous Generation Queue (#269)

Generation is now fully non-blocking. Submit a generation and start typing the next one immediately.

Serial execution queue prevents GPU contention
Real-time SSE status streaming (generating → completed / failed)
Failed generations can be retried without re-entering text
Stale generations from crashes are auto-recovered on startup
Generating status pill shown inline in the story editor

Generation Versions

Every generation now supports multiple versions with provenance tracking:

Original — the unprocessed TTS output, always preserved
Effects versions — apply different effects chains to create new versions from any source
Takes — regenerate with the same text/voice but a new seed
Source tracking — each version records which version it was derived from
Version pinning in stories — pin a specific version to a story track clip
Favorites — star generations for quick access

Language Parameter Fix

Qwen TTS models now correctly receive the selected language. The generation form syncs with the voice profile's language setting.

Post-Processing Effects (#271)

A full audio effects system powered by Spotify's pedalboard library. Apply effects after generation, preview in real time, and build reusable presets.

Effect	Description
Pitch Shift	±12 semitones
Reverb	Room size, damping, wet/dry mix
Delay	Adjustable time, feedback, mix
Chorus / Flanger	Modulated delay — short for metallic, long for lush
Compressor	Threshold, ratio, attack, release
Gain	-40 to +40 dB
High-Pass Filter	Configurable cutoff frequency
Low-Pass Filter	Configurable cutoff frequency

4 built-in presets — Robotic, Radio, Echo Chamber, Deep Voice
Custom presets — create unlimited drag-and-drop effect chains
Per-profile default effects — assign a chain to a voice profile, auto-applies to every generation
Live preview — audition effects against existing audio before committing
Source version selection — apply effects to any version of a generation, not just the latest

Platform Support

Windows Support (#272)

Full Windows support with CUDA GPU detection, cross-platform justfile, and clean server shutdown using taskkill /T for the process tree.

Linux (#262)

Pre-built Linux binaries are not available for this release — the release CI is still broken on Linux and we're working on fixing it. However, this release includes significant Linux improvements that make compiling from source much easier:

AMD ROCm GPU acceleration with automatic HSA_OVERRIDE_GFX_VERSION for unlisted GPUs
NVIDIA GBM buffer crash fix (#210)
WebKitGTK microphone access for voice sample recording
Cross-platform justfile with Linux-specific setup targets
See the README for build-from-source instructions — we'll ship Linux CI builds as soon as we can

NVIDIA CUDA Backend Swap (#252)

The CPU-only release can download and swap in a CUDA-accelerated backend from within the app. Downloads split parts to work around GitHub's 2GB asset limit, verifies SHA-256 checksums, and restarts the server automatically.

Intel Arc (XPU) and DirectML

PyTorch backend supports Intel Arc GPUs via IPEX/XPU and any-GPU on Windows via DirectML.

Docker + Web Deployment (#161)

Run Voicebox headless:

docker compose up

3-stage build, non-root runtime, health checks, persistent model cache. Binds to localhost only by default.

Whisper Turbo

Added openai/whisper-large-v3-turbo as a transcription model option.

Model Management (#268)

Per-model unload — free GPU memory without deleting downloaded models
Custom models directory — set VOICEBOX_MODELS_DIR to store models anywhere
Model folder migration — move all models to a new location with progress tracking
Download cancel/clear UI — cancel in-progress downloads, VS Code-style problems panel for errors (#238)
Restructured settings UI — server settings and model management split into cleaner sections

Security & Reliability

CORS hardening — explicit allowlist of local origins instead of wildcard *; extensible via VOICEBOX_CORS_ORIGINS (#88)
Network access toggle — fully disable outbound requests for air-gapped deployments (#133)
Offline crash fix — Voicebox no longer crashes when HuggingFace is unreachable (#152)
Atomic audio saves — two-phase write prevents corrupted files on crash or disk-full (#263)
Filesystem health endpoint — proactive disk space and directory writability checks
Errno-specific error messages — clear feedback for permission denied, disk full, missing directory
Chatterbox float64 dtype fix — patches S3Tokenizer and VoiceEncoder to cast float64→float32, preventing crashes on certain audio inputs (#264)
Watchdog respects keep-server-running — /watchdog/disable endpoint prevents the server from shutting down when the app window closes, if configured
Server shutdown on Windows — clean process tree termination with taskkill /T and os._exit fallback

Accessibility (#243)

Screen reader support (tested with NVDA/Narrator) across all major UI surfaces
Keyboard navigation for voice cards, history rows, model management, and story editor
State-aware aria-label attributes on all interactive controls

UI Polish

Redesigned landing page with animated ControlUI hero, multi-engine copy, model cards, and voice creator section (#274)
Glassmorphic active state for sidebar buttons with accent border shine
Voices tab overhaul with inline inspector
Re...

Assets 12

Releases: jamiepine/voicebox

v0.4.5

Reliability

Uh oh!

v0.4.4

Reliability

Uh oh!

voicebox v0.4.3

macOS

Backend

Uh oh!

voicebox v0.4.2

Internationalization (#508)

Reliability

Landing / Docs

Linux

New Contributors

Uh oh!

v0.4.1

Frozen-Binary Reliability (#438)

Generation

Model Migration

Platform

Frontend CI

Docs

New Contributors

Uh oh!

v0.4.0

New TTS Engines

HumeAI TADA — Expressive English & Multilingual (#296)

Kokoro 82M — Fast Lightweight TTS (#325)

Qwen CustomVoice (#328)

Voice Profile UX

GPU & Platform

Intel Arc (XPU) Support (#320)

Blackwell / RTX 50-series Support (#316, #401)

GPU Compatibility Diagnostics (#367, adapted)

Split CUDA Backend (#298)

Bug Fixes

Critical: TTS Generation

Platform Reliability

Audio & Playback

Stories & History

Uh oh!

v0.3.0

Settings Overhaul (#294)

Audio Player Fix (#293)

Backend Refactor (#285)

Documentation Rewrite (#288)

UI & Frontend

Bug Fixes (#295)

Platform Fixes

Release Tooling

Uh oh!

v0.2.3

Model Downloads Now Actually Work

PyInstaller Fixes

Updater

Other Fixes

Uh oh!

v0.2.2

v0.2.2

Uh oh!

v0.2.1

The best local voice cloning tool, just got better...

New TTS Engines

Multi-Engine Architecture

Chatterbox Multilingual — 23 Languages (#257)

LuxTTS — Lightweight English TTS (#254)

Chatterbox Turbo — Expressive English (#258)

Paralinguistic Tags Autocomplete (#265)

Generation

Unlimited Generation Length — Auto-Chunking (#266)

Asynchronous Generation Queue (#269)

Generation Versions

Language Parameter Fix

Post-Processing Effects (#271)

Platform Support

Windows Support (#272)

Linux (#262)