feat: chunked TTS generation for long text (engine-agnostic) #266
Conversation
Text exceeding max_chunk_chars (default 800) is automatically split at sentence boundaries, generated per chunk, and concatenated with a 50ms crossfade. Works with all engines (Qwen, LuxTTS, Chatterbox, Turbo).

- Abbreviation-aware sentence splitter (Dr., Mr., e.g., decimals)
- CJK sentence-ending punctuation support
- Paralinguistic tag preservation ([laugh], [cough], etc.)
- Per-chunk seed variation to avoid correlated RNG artefacts
- Per-chunk Chatterbox trim (catches hallucination at each boundary)
- max_chunk_chars exposed as per-request param on GenerationRequest
- Text max_length raised to 50,000 characters

Closes #99
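The 50ms crossfade mentioned above can be pictured as a linear blend of the overlapping chunk tails. This is an illustrative sketch, not the PR's actual `concatenate_audio_chunks` implementation; the name `crossfade_concat` is invented for the example.

```python
import numpy as np

def crossfade_concat(chunks, sample_rate, crossfade_ms=50):
    """Concatenate audio chunks, blending each boundary with a linear crossfade."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    fade_len = int(sample_rate * crossfade_ms / 1000)
    out = np.asarray(chunks[0], dtype=np.float32)
    for nxt in chunks[1:]:
        nxt = np.asarray(nxt, dtype=np.float32)
        n = min(fade_len, len(out), len(nxt))
        if n == 0:
            out = np.concatenate([out, nxt])  # hard cut when crossfade_ms == 0
            continue
        ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
        # Fade out the tail of the accumulated audio while fading in the next chunk
        blended = out[-n:] * (1.0 - ramp) + nxt[:n] * ramp
        out = np.concatenate([out[:-n], blended, nxt[n:]])
    return out
```

Each boundary overlaps `fade_len` samples, so total length shrinks by that amount per join; `crossfade_ms=0` degenerates to a plain concatenation, matching the "clean hard cut" setting described later.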
📝 Walkthrough

Adds engine-agnostic chunked TTS: sentence/clause-aware text splitting, per-chunk generation (with optional per-engine trimming), crossfade-based audio concatenation, and wiring through backend endpoints plus frontend settings for chunk size and crossfade.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as "API Handler"
    participant Chunker as "Text Chunker"
    participant Backend as "TTS Backend"
    participant Trimmer as "Trim Fn"
    participant Concat as "Audio Concatenator"
    Client->>API: POST /tts/generate (text, max_chunk_chars, crossfade_ms, voice_prompt...)
    API->>Chunker: split_text_into_chunks(text, max_chunk_chars)
    Chunker-->>API: chunks[]
    loop for each chunk
        API->>Backend: generate(chunk, voice_prompt, language, seed, instruct)
        Backend-->>API: audio_chunk, sample_rate
        alt trim_fn present
            API->>Trimmer: trim_fn(audio_chunk, sample_rate)
            Trimmer-->>API: trimmed_chunk
        else
            Note right of API: use audio_chunk as-is
        end
    end
    API->>Concat: concatenate_audio_chunks(chunks[], sample_rate, crossfade_ms)
    Concat-->>API: final_audio
    API-->>Client: audio_stream + duration
```
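The dispatch loop in the diagram can be sketched as one engine-agnostic function. Here `generate_fn`, `split_fn`, `concat_fn`, and `trim_fn` are hypothetical stand-ins for the real backend protocol, so treat this as a shape sketch rather than the code in `backend/utils/chunked_tts.py`.

```python
import numpy as np

def generate_chunked(text, generate_fn, split_fn, concat_fn,
                     max_chunk_chars=800, crossfade_ms=50,
                     seed=None, trim_fn=None):
    """Split text, generate audio per chunk, optionally trim, then concatenate."""
    audio_chunks, sample_rate = [], None
    for i, chunk in enumerate(split_fn(text, max_chunk_chars)):
        # Per-chunk seed variation: seed N yields chunk seeds N, N+1, N+2, ...
        chunk_seed = None if seed is None else seed + i
        audio, sr = generate_fn(chunk, seed=chunk_seed)
        if trim_fn is not None:
            audio = trim_fn(audio, sr)  # e.g. per-chunk Chatterbox trim
        if sample_rate is None:
            sample_rate = sr
        elif sr != sample_rate:
            raise ValueError(f"inconsistent sample rates: {sample_rate} vs {sr}")
        audio_chunks.append(np.asarray(audio, dtype=np.float32))
    return concat_fn(audio_chunks, sample_rate, crossfade_ms), sample_rate
```

The sample-rate check mirrors one of the review findings below: only the first chunk's rate is trusted, so a mismatching later chunk should fail loudly rather than be concatenated at the wrong rate.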
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Persisted setting (default 800 chars) controls how long text is split before generation. Lower values improve quality for long outputs by keeping each chunk well within the model's context window.

- Slider in Server Connection settings (100-2000 chars, step 50)
- Stored in localStorage via Zustand persist
- Passed as max_chunk_chars on every generation request
- Frontend text limit raised to 50,000 to match backend
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/src/stores/serverStore.ts (1)
36-37: Normalize `maxChunkChars` in the setter to prevent invalid persisted values.

If persisted state is edited or corrupted, out-of-range values can flow to requests and trigger backend validation failures. Clamp in the store setter.
Proposed change

```diff
+const MIN_CHUNK_CHARS = 100;
+const MAX_CHUNK_CHARS = 5000;
+const CHUNK_STEP = 50;
+
 export const useServerStore = create<ServerStore>()(
   persist(
     (set) => ({
@@
       maxChunkChars: 800,
-      setMaxChunkChars: (value) => set({ maxChunkChars: value }),
+      setMaxChunkChars: (value) => {
+        const normalized = Math.max(
+          MIN_CHUNK_CHARS,
+          Math.min(MAX_CHUNK_CHARS, Math.round(value / CHUNK_STEP) * CHUNK_STEP),
+        );
+        set({ maxChunkChars: normalized });
+      },
   }),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/stores/serverStore.ts` around lines 36 - 37, The setter setMaxChunkChars currently writes the raw value into state which lets out-of-range or corrupted persisted values slip through; update setMaxChunkChars to clamp the incoming value to a safe range (e.g. between a defined MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds around 800) using Math.max/Math.min before calling set({ maxChunkChars: ... }) so the store always persists and returns a normalized, valid maxChunkChars.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 170-172: ConnectionForm currently sets the slider max to 2000
which conflicts with the feature/back-end contract; change the slider max prop
from 2000 to 5000 for both slider occurrences in ConnectionForm (the two blocks
around the shown min/max/step props) and update any related constants or
validation logic in the ConnectionForm component that enforce a 2000 upper bound
so they match the new 5000 limit.
---
Nitpick comments:
In `@app/src/stores/serverStore.ts`:
- Around line 36-37: The setter setMaxChunkChars currently writes the raw value
into state which lets out-of-range or corrupted persisted values slip through;
update setMaxChunkChars to clamp the incoming value to a safe range (e.g.
between a defined MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds
around 800) using Math.max/Math.min before calling set({ maxChunkChars: ... })
so the store always persists and returns a normalized, valid maxChunkChars.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f66caec-9ecc-43dd-bf5e-093f67303ad2
📒 Files selected for processing (4)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/lib/api/types.ts
- app/src/lib/hooks/useGenerationForm.ts
- app/src/stores/serverStore.ts
```tsx
min={100}
max={2000}
step={50}
```
Expand slider max to 5000 to match the feature contract.
Line 170-172 currently caps at 2000, but the feature/back-end contract supports up to 5000. This limits valid user configuration unnecessarily.
Proposed change

```diff
 <Slider
   id="maxChunkChars"
   value={[maxChunkChars]}
   onValueChange={([value]) => setMaxChunkChars(value)}
   min={100}
-  max={2000}
+  max={5000}
   step={50}
   aria-label="Auto-chunking character limit"
 />
 <p className="text-sm text-muted-foreground">
   Long text is split into chunks at sentence boundaries before generating. Lower values
-  can improve quality for long outputs. Default is 800.
+  can improve quality for long outputs. Range is 100–5000. Default is 800.
 </p>
```

Also applies to: 175-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 170 - 172,
ConnectionForm currently sets the slider max to 2000 which conflicts with the
feature/back-end contract; change the slider max prop from 2000 to 5000 for both
slider occurrences in ConnectionForm (the two blocks around the shown
min/max/step props) and update any related constants or validation logic in the
ConnectionForm component that enforce a 2000 upper bound so they match the new
5000 limit.
Persisted setting (default 50ms) controls how audio chunks are blended together. Set to 0 for a clean hard cut with no overlap.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/utils/chunked_tts.py`:
- Around line 141-143: The loop that scans CJK sentence-end punctuation updates
the split candidate `best` without verifying tag boundaries, which can split
inside bracketed tags; modify the for-loop in chunked_tts.py (the re.finditer
block) to skip any match where `_inside_bracket_tag(match.start())` returns true
and only update `best` when the match is outside bracket tags, preserving the
"never split inside [ ... ]" guarantee; ensure you call `_inside_bracket_tag`
with the match start index before assigning to `best`.
- Around line 298-301: concatenate_audio_chunks currently assumes a single
sample rate by only using the first chunk's rate (sample_rate variable), which
can produce incorrect timing/pitch if later chunks have different rates; update
chunked_tts.py to iterate audio_chunks and validate that each chunk's sample
rate matches the chosen sample_rate (or resample mismatched chunks to
sample_rate) before calling concatenate_audio_chunks, raising a clear error if
automatic resampling is not performed; reference the sample_rate variable,
audio_chunks list, and the concatenate_audio_chunks function when locating where
to add the validation/resampling logic.
- Around line 125-135: The period handling currently only grabs contiguous
letters before the dot, missing dotted abbreviations like "e.g." or "U.S.";
update the block when char == "." to scan backwards collecting letters and dots
(stop at first char that is neither letter nor '.'), form the candidate token
from that span, normalize it by removing '.' and lowercasing, then check that
normalized token against _ABBREVIATIONS (e.g., treat "e.g." -> "eg"); preserve
the decimal-number skip by ensuring if the character immediately before the
scanned span is a digit you continue as before. Use the existing local names
(char, pos, text, word_start, _ABBREVIATIONS) to locate and replace the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 398401aa-e13a-4674-aa73-6c1585e829ff
📒 Files selected for processing (7)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/lib/api/types.ts
- app/src/lib/hooks/useGenerationForm.ts
- app/src/stores/serverStore.ts
- backend/main.py
- backend/models.py
- backend/utils/chunked_tts.py
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/stores/serverStore.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/main.py
```python
if char == ".":
    # Walk backwards to find the preceding word
    word_start = pos - 1
    while word_start >= 0 and text[word_start].isalpha():
        word_start -= 1
    word = text[word_start + 1 : pos].lower()
    if word in _ABBREVIATIONS:
        continue
    # Skip decimal numbers (digit immediately before the period)
    if word_start >= 0 and text[word_start].isdigit():
        continue
```
Abbreviation detection misses dotted forms like e.g. and U.S.
Current token extraction only reads contiguous letters before the `.`. That makes dotted abbreviations like `e.g.` and `U.S.` produce the tokens `g` and `s`, so they can be misclassified as sentence boundaries.
💡 Proposed fix

```diff
 if char == ".":
-    # Walk backwards to find the preceding word
-    word_start = pos - 1
-    while word_start >= 0 and text[word_start].isalpha():
-        word_start -= 1
-    word = text[word_start + 1 : pos].lower()
-    if word in _ABBREVIATIONS:
+    # Capture preceding token including internal dots (e.g. "e.g", "u.s")
+    token_match = re.search(r"([A-Za-z](?:[A-Za-z.]*)?)$", text[:pos])
+    token = token_match.group(1).lower() if token_match else ""
+    if token in _ABBREVIATIONS:
         continue
     # Skip decimal numbers (digit immediately before the period)
-    if word_start >= 0 and text[word_start].isdigit():
+    if pos > 0 and text[pos - 1].isdigit():
         continue
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if char == ".":
    # Capture preceding token including internal dots (e.g. "e.g", "u.s")
    token_match = re.search(r"([A-Za-z](?:[A-Za-z.]*)?)$", text[:pos])
    token = token_match.group(1).lower() if token_match else ""
    if token in _ABBREVIATIONS:
        continue
    # Skip decimal numbers (digit immediately before the period)
    if pos > 0 and text[pos - 1].isdigit():
        continue
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 125 - 135, The period handling
currently only grabs contiguous letters before the dot, missing dotted
abbreviations like "e.g." or "U.S."; update the block when char == "." to scan
backwards collecting letters and dots (stop at first char that is neither letter
nor '.'), form the candidate token from that span, normalize it by removing '.'
and lowercasing, then check that normalized token against _ABBREVIATIONS (e.g.,
treat "e.g." -> "eg"); preserve the decimal-number skip by ensuring if the
character immediately before the scanned span is a digit you continue as before.
Use the existing local names (char, pos, text, word_start, _ABBREVIATIONS) to
locate and replace the logic.
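For intuition, here is a minimal sketch of the normalization this prompt describes: scan backwards over letters and dots, strip the dots, and compare the lowercased token against the abbreviation set. The function name and the abbreviation set below are illustrative, not the module's actual identifiers.

```python
_ABBREVIATIONS = {"dr", "mr", "mrs", "eg", "ie", "us"}  # illustrative subset

def is_abbreviation_period(text, pos):
    """Return True if the '.' at text[pos] ends a known abbreviation."""
    # Scan backwards over letters AND dots so "e.g." and "U.S." stay one token.
    start = pos
    while start > 0 and (text[start - 1].isalpha() or text[start - 1] == "."):
        start -= 1
    # Normalize: drop internal dots and lowercase ("e.g" -> "eg", "U.S" -> "us")
    token = text[start:pos].replace(".", "").lower()
    return token in _ABBREVIATIONS
```

A splitter would call this at each candidate period and skip the split when it returns True, so `Dr. Smith` and `e.g. apples` never become chunk boundaries.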
```python
for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
    if m.start() > best:
        best = m.start()
```
CJK sentence-end matching can still split inside bracket tags
The CJK pass updates best without checking _inside_bracket_tag, which breaks the “never split inside [ ... ]” guarantee for tags containing CJK punctuation.
💡 Proposed fix

```diff
 # CJK sentence-ending punctuation
 for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
-    if m.start() > best:
+    if _inside_bracket_tag(text, m.start()):
+        continue
+    if m.start() > best:
         best = m.start()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 141 - 143, The loop that scans CJK
sentence-end punctuation updates the split candidate `best` without verifying
tag boundaries, which can split inside bracketed tags; modify the for-loop in
chunked_tts.py (the re.finditer block) to skip any match where
`_inside_bracket_tag(match.start())` returns true and only update `best` when
the match is outside bracket tags, preserving the "never split inside [ ... ]"
guarantee; ensure you call `_inside_bracket_tag` with the match start index
before assigning to `best`.
```python
if sample_rate is None:
    sample_rate = chunk_sr

audio = concatenate_audio_chunks(audio_chunks, sample_rate, crossfade_ms=crossfade_ms)
```
Validate sample rate consistency across generated chunks
Concatenation assumes all chunk audio is at one sample rate, but only the first chunk’s rate is kept. If any later chunk differs, output timing/pitch is wrong while metadata still reports the first rate.
💡 Proposed fix

```diff
 audio_chunks.append(np.asarray(chunk_audio, dtype=np.float32))
 if sample_rate is None:
     sample_rate = chunk_sr
+elif chunk_sr != sample_rate:
+    raise ValueError(
+        f"Inconsistent sample rates across chunks: expected {sample_rate}, got {chunk_sr}"
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 298 - 301,
concatenate_audio_chunks currently assumes a single sample rate by only using
the first chunk's rate (sample_rate variable), which can produce incorrect
timing/pitch if later chunks have different rates; update chunked_tts.py to
iterate audio_chunks and validate that each chunk's sample rate matches the
chosen sample_rate (or resample mismatched chunks to sample_rate) before calling
concatenate_audio_chunks, raising a clear error if automatic resampling is not
performed; reference the sample_rate variable, audio_chunks list, and the
concatenate_audio_chunks function when locating where to add the
validation/resampling logic.
- Split chunking/crossfade sliders into dedicated GenerationSettings card
- Merge connection status badges into ConnectionForm (remove ServerStatus card)
- 2-column grid layout for the entire settings page
- GPU Acceleration: remove icon, badge, and MLX info card
- Models: merge 'Other Voice Models' into single 'Voice Generation' list
- Model detail: remove 'Downloaded' badge, border above actions, swap badges above stats row, match disk size font to stats
Actionable comments posted: 2
🧹 Nitpick comments (2)
app/src/components/ServerSettings/ConnectionForm.tsx (1)
66-66: Avoid adding a non-interactive card to the tab order.

Line 66 sets `tabIndex={0}` on a container with no keyboard interaction, which adds an extra focus stop.

Proposed fix

```diff
-<Card role="region" aria-label="Server Connection" tabIndex={0}>
+<Card role="region" aria-label="Server Connection">
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ServerSettings/ConnectionForm.tsx` at line 66, The Card in ConnectionForm (the Card element in the ServerSettings/ConnectionForm component) is non-interactive but has tabIndex={0}, creating an unnecessary focus stop; remove the tabIndex={0} attribute from the Card (or change it to tabIndex={-1} only if you must keep it focusable for a specific a11y reason) so the container is not in the tab order, leaving interactive child controls tabbable as usual.

app/src/components/ServerSettings/GpuAcceleration.tsx (1)
225-229: Avoid hardcoding `PyTorch` for every non-MLX native backend.

At Line 228, the fallback label can misreport backend type. Prefer explicit mapping from `health.backend_type` and a neutral fallback.

♻️ Proposed refactor

```diff
+const backendLabel =
+  health.backend_type === 'mlx'
+    ? 'MLX'
+    : health.backend_type === 'pytorch'
+      ? 'PyTorch'
+      : health.backend_type || 'GPU backend';
 ...
 {isCurrentlyCuda
   ? 'CUDA (GPU accelerated)'
   : hasNativeGpu
-    ? `${health.backend_type === 'mlx' ? 'MLX' : 'PyTorch'} (GPU accelerated)`
+    ? `${backendLabel} (GPU accelerated)`
     : 'CPU'}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ServerSettings/GpuAcceleration.tsx` around lines 225 - 229, The JSX is hardcoding "PyTorch" for any native non-MLX backend; update the logic in GpuAcceleration.tsx to derive the label from health.backend_type instead of a fixed "PyTorch" string — introduce a small mapping or helper (e.g., getBackendLabel or BACKEND_LABELS) and use it in the ternary expression with a neutral fallback like 'Native' or the raw backend_type when unknown, keeping the existing isCurrentlyCuda and hasNativeGpu checks intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 116-118: The VRAM badge is currently gated by a truthy check so a
valid value of 0 won’t render; update the condition in ConnectionForm (the check
around health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 615-620: hfModelInfo.downloads and hfModelInfo.likes are used
without null-safety which can crash if the HF API omits them; in the render
where Download and Heart icons are shown, pass guarded values to formatDownloads
(e.g., use optional chaining or nullish coalescing) so
formatDownloads(hfModelInfo.downloads ?? 0) and
formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.
---
Nitpick comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Line 66: The Card in ConnectionForm (the Card element in the
ServerSettings/ConnectionForm component) is non-interactive but has
tabIndex={0}, creating an unnecessary focus stop; remove the tabIndex={0}
attribute from the Card (or change it to tabIndex={-1} only if you must keep it
focusable for a specific a11y reason) so the container is not in the tab order,
leaving interactive child controls tabbable as usual.
In `@app/src/components/ServerSettings/GpuAcceleration.tsx`:
- Around line 225-229: The JSX is hardcoding "PyTorch" for any native non-MLX
backend; update the logic in GpuAcceleration.tsx to derive the label from
health.backend_type instead of a fixed "PyTorch" string — introduce a small
mapping or helper (e.g., getBackendLabel or BACKEND_LABELS) and use it in the
ternary expression with a neutral fallback like 'Native' or the raw backend_type
when unknown, keeping the existing isCurrentlyCuda and hasNativeGpu checks
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 139496ff-45e2-45c7-ba9a-385e6de80bd9
📒 Files selected for processing (6)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/components/ServerSettings/GenerationSettings.tsx
- app/src/components/ServerSettings/GpuAcceleration.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/components/ServerSettings/ServerStatus.tsx
- app/src/components/ServerTab/ServerTab.tsx
```tsx
{health.vram_used_mb && (
  <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
)}
```
Render VRAM badge when usage is 0 MB.
Line 116 uses a truthy check, so 0 won’t render even though it’s valid data.
Proposed fix

```diff
-{health.vram_used_mb && (
+{health.vram_used_mb != null && (
   <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
 )}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
{health.vram_used_mb != null && (
  <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
)}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 116 - 118,
The VRAM badge is currently gated by a truthy check so a valid value of 0 won’t
render; update the condition in ConnectionForm (the check around
health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.
```tsx
<Download className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.downloads)}
</span>
<span className="flex items-center gap-1" title="Likes">
  <Heart className="h-3.5 w-3.5" />
  {formatDownloads(hfModelInfo.likes)}
```
Guard HF stats formatting against missing numeric fields.
At line 616 and 620, unguarded hfModelInfo.downloads and hfModelInfo.likes lack defensive checks despite receiving external API data that may not include these fields. While the type definition marks them as required, the codebase defensively guards other optional API fields. Add null-safe defaults to prevent crashes if the HuggingFace API response omits these fields.
🛡️ Proposed fix

```diff
-{formatDownloads(hfModelInfo.downloads)}
+{formatDownloads(hfModelInfo.downloads ?? 0)}
 ...
-{formatDownloads(hfModelInfo.likes)}
+{formatDownloads(hfModelInfo.likes ?? 0)}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
<Download className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.downloads ?? 0)}
</span>
<span className="flex items-center gap-1" title="Likes">
  <Heart className="h-3.5 w-3.5" />
  {formatDownloads(hfModelInfo.likes ?? 0)}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 615 -
620, hfModelInfo.downloads and hfModelInfo.likes are used without null-safety
which can crash if the HF API omits them; in the render where Download and Heart
icons are shown, pass guarded values to formatDownloads (e.g., use optional
chaining or nullish coalescing) so formatDownloads(hfModelInfo.downloads ?? 0)
and formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.
Summary
Long text that exceeds TTS model context limits is now automatically chunked, generated per-segment, and concatenated with crossfade. Works with all engines (Qwen, LuxTTS, Chatterbox, Chatterbox Turbo).
- Abbreviation-aware sentence splitting (`Dr.`, `Mr.`, `e.g.`, decimals), CJK punctuation support, and paralinguistic tag preservation (`[laugh]`, `[cough]`)
- `text` max_length raised from 5,000 to 50,000 characters

Auto-chunking limit setting
A slider in Server Connection settings lets users control the chunk size (100-2000 chars, default 800). Lower values improve quality for long outputs by keeping each chunk well within the model's context window. The setting is persisted in localStorage.
Changes
- backend/utils/chunked_tts.py — new `generate_chunked()` wrapper
- backend/main.py — `/generate` and `/generate/stream` route through `generate_chunked()`
- backend/models.py — `text` max_length -> 50000, new `max_chunk_chars` field on `GenerationRequest`
- app/src/stores/serverStore.ts — `maxChunkChars` setting (default 800)
- app/src/lib/api/types.ts — `max_chunk_chars` on `GenerationRequest`
- app/src/lib/hooks/useGenerationForm.ts
- app/src/components/ServerSettings/ConnectionForm.tsx

Design decisions
Engine-agnostic layer -- Chunking wraps the standard `TTSBackend.generate()` protocol in the dispatch layer (main.py), not inside individual backends. Every engine gets long text support for free.

Per-chunk trim -- Chatterbox `trim_tts_output()` is applied to each chunk individually, catching hallucinated trailing noise at every boundary instead of only at the end.

Per-chunk seed variation -- Seed `N` produces chunk seeds `N, N+1, N+2, ...` to avoid correlated RNG artefacts while keeping output deterministic.

Persisted setting, not per-request -- Chunk size is a "set and forget" preference stored in localStorage, not a form field cluttering the generation UI. It's still sent as `max_chunk_chars` on every request so the backend respects it.

No quality selector -- The original PR #99 included a "standard vs high" resampler (24kHz -> 44.1kHz via soxr). This was dropped because upsampling cannot recover frequencies above Nyquist that were never generated.
What this replaces from PR #99
Cherry-picks the core chunking idea from #99 and fixes:
- `language` parameter was missing from `_generate_single()` (NameError on every generation)
- `max_length=50000` applied globally but only Qwen got chunking; now all engines chunk

Closes #99
cc @glaucusj-sai -- thanks for the original implementation, the sentence-boundary splitting and crossfade concat ideas carried through