feat: chunked TTS generation for long text (engine-agnostic)#266

Merged
jamiepine merged 4 commits into main from feat/chunked-tts
Mar 13, 2026

Conversation

jamiepine (Owner) commented Mar 13, 2026

Summary

Long text that exceeds TTS model context limits is now automatically chunked, generated per-segment, and concatenated with crossfade. Works with all engines (Qwen, LuxTTS, Chatterbox, Chatterbox Turbo).

  • Text is split at sentence boundaries with abbreviation awareness (Dr., Mr., e.g., decimals), CJK punctuation support, and paralinguistic tag preservation ([laugh], [cough])
  • Audio chunks are joined with a 50ms linear crossfade to eliminate clicks
  • Short text (<=800 chars) uses the single-shot fast path with zero overhead
  • text max_length raised from 5,000 to 50,000 characters
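
The 50ms linear crossfade described above can be sketched in a few lines of numpy. This is a hypothetical stand-in, not the PR's actual concatenate_audio_chunks; the name and signature are assumptions for illustration:

```python
import numpy as np

def crossfade_concat(chunks, sample_rate, crossfade_ms=50):
    """Join audio chunks with a linear crossfade to avoid boundary clicks.

    Hypothetical stand-in for the PR's concatenate_audio_chunks; signature
    and behavior are assumptions, not the project's actual API.
    """
    fade = int(sample_rate * crossfade_ms / 1000)
    out = np.asarray(chunks[0], dtype=np.float32)
    for nxt in chunks[1:]:
        nxt = np.asarray(nxt, dtype=np.float32)
        n = min(fade, len(out), len(nxt))
        if n == 0:  # crossfade_ms=0 -> clean hard cut
            out = np.concatenate([out, nxt])
            continue
        # Fade the tail of the accumulated audio out while the head of
        # the next chunk fades in, overlapping n samples.
        ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
        blended = out[-n:] * (1.0 - ramp) + nxt[:n] * ramp
        out = np.concatenate([out[:-n], blended, nxt[n:]])
    return out
```

Because the fades sum to unity at every sample, joining two equal-amplitude chunks leaves no level dip at the seam, which is what eliminates audible clicks.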

Auto-chunking limit setting

A slider in Server Connection settings lets users control the chunk size (100-2000 chars, default 800). Lower values improve quality for long outputs by keeping each chunk well within the model's context window. The setting is persisted in localStorage.

Changes

File | Change
backend/utils/chunked_tts.py | New: text splitting, audio concat, engine-agnostic generate_chunked() wrapper
backend/main.py | Both /generate and /generate/stream route through generate_chunked()
backend/models.py | text max_length -> 50000, new max_chunk_chars field on GenerationRequest
app/src/stores/serverStore.ts | Persisted maxChunkChars setting (default 800)
app/src/lib/api/types.ts | max_chunk_chars on GenerationRequest
app/src/lib/hooks/useGenerationForm.ts | Wire store value through to API, text max -> 50000
app/src/components/ServerSettings/ConnectionForm.tsx | Auto-chunking limit slider UI

Design decisions

Engine-agnostic layer -- Chunking wraps the standard TTSBackend.generate() protocol in the dispatch layer (main.py), not inside individual backends. Every engine gets long text support for free.
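
A rough sketch of that dispatch-layer shape, folding in the per-chunk trim and seed-variation decisions below. All names here are hypothetical; the real generate_chunked splits at sentence boundaries and crossfades joins, while this illustration uses fixed-size slices and plain concatenation:

```python
import numpy as np

def generate_chunked(generate_fn, text, max_chunk_chars=800,
                     base_seed=0, trim_fn=None):
    """Dispatch-layer wrapper sketch: any engine whose generate_fn(text, seed)
    returns (audio, sample_rate) gets long-text support for free.

    Names and signature are assumptions for illustration; the real wrapper
    splits at sentence boundaries and crossfades chunk joins.
    """
    if len(text) <= max_chunk_chars:
        return generate_fn(text, base_seed)  # short-text fast path, zero overhead
    chunks = [text[i:i + max_chunk_chars]
              for i in range(0, len(text), max_chunk_chars)]
    pieces, sample_rate = [], None
    for i, chunk in enumerate(chunks):
        # Seed N produces chunk seeds N, N+1, N+2, ... -- deterministic,
        # but avoids identical RNG streams across chunks.
        audio, sr = generate_fn(chunk, base_seed + i)
        sample_rate = sr if sample_rate is None else sample_rate
        if trim_fn is not None:
            audio = trim_fn(audio, sr)  # e.g. per-chunk Chatterbox trim
        pieces.append(np.asarray(audio, dtype=np.float32))
    return np.concatenate(pieces), sample_rate
```

Because the wrapper only depends on the generate(text, seed) -> (audio, sample_rate) shape, a new engine needs no chunking code of its own.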

Per-chunk trim -- Chatterbox trim_tts_output() is applied to each chunk individually, catching hallucinated trailing noise at every boundary instead of only at the end.

Per-chunk seed variation -- Seed N produces chunk seeds N, N+1, N+2, ... to avoid correlated RNG artefacts while keeping output deterministic.

Persisted setting, not per-request -- Chunk size is a "set and forget" preference stored in localStorage, not a form field cluttering the generation UI. It's still sent as max_chunk_chars on every request so the backend respects it.

No quality selector -- The original PR #99 included a "standard vs high" resampler (24kHz -> 44.1kHz via soxr). This was dropped: upsampling cannot recover frequencies above the source's Nyquist limit (12 kHz for 24 kHz audio), because they were never generated in the first place.
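
The Nyquist argument can be made concrete with a quick spectral check (illustrative only, with an assumed helper name; the 44.1 kHz vs 48 kHz target is beside the point):

```python
import numpy as np

def band_fraction_above(x, sr, cutoff_hz):
    """Fraction of spectral magnitude above cutoff_hz (illustration helper)."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    return spec[freqs > cutoff_hz].sum() / spec.sum()

sr = 24_000
t = np.arange(sr) / sr                 # 1 second of samples
tone = np.sin(2 * np.pi * 5_000 * t)   # 5 kHz tone at 24 kHz
# Everything a 24 kHz signal can hold lives at or below 12 kHz; an upsample
# to 44.1 kHz only widens the empty band above that -- it does not fill it.
```

A proper resampler preserves the existing band exactly; there is no hidden high band for it to restore.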

What this replaces from PR #99

Cherry-picks the core chunking idea from #99 and fixes:

  • Critical bug: language parameter was missing from _generate_single() (NameError on every generation)
  • Design flaw: Only modified one of five backends; now engine-agnostic
  • Design flaw: max_length=50000 applied globally but only Qwen got chunking; now all engines chunk
  • Misleading feature: Removed snake-oil quality selector / soxr upsampling
  • Bug: Same seed passed to all chunks; now varies per chunk
  • Improvement: Abbreviation-aware splitting, CJK support, paralinguistic tag safety

Closes #99

cc @glaucusj-sai -- thanks for the original implementation; the sentence-boundary splitting and crossfade-concat ideas carried through.

Summary by CodeRabbit

  • New Features

    • Increased text input limit to 50,000 characters.
    • Intelligent, configurable text chunking for long TTS (default 800 chars) and crossfade between segments (default 50 ms).
    • Client and API accept max_chunk_chars and crossfade_ms; UI sliders added to adjust them.
    • Per-segment trimming and smoother audio concatenation for long-text speech.
  • UI

    • Added generation settings panel and connection health/status indicators in Server Settings.

Text exceeding max_chunk_chars (default 800) is automatically split at
sentence boundaries, generated per-chunk, and concatenated with a 50ms
crossfade.  Works with all engines (Qwen, LuxTTS, Chatterbox, Turbo).

- Abbreviation-aware sentence splitter (Dr., Mr., e.g., decimals)
- CJK sentence-ending punctuation support
- Paralinguistic tag preservation ([laugh], [cough], etc.)
- Per-chunk seed variation to avoid correlated RNG artefacts
- Per-chunk Chatterbox trim (catches hallucination at each boundary)
- max_chunk_chars exposed as per-request param on GenerationRequest
- Text max_length raised to 50,000 characters
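
The abbreviation-aware boundary detection listed above can be approximated like this (a simplified sketch with hypothetical names; the real split_text_into_chunks keeps a larger abbreviation table and also refuses to split inside bracketed tags):

```python
import re

# Illustrative subset; the real module's table is larger.
_ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "e.g", "i.e", "etc"}

def sentence_boundaries(text):
    """Indices just past sentence-ending punctuation (Latin . ! ? and
    CJK 。！？), skipping abbreviations, decimals, and in-token dots."""
    out = []
    for m in re.finditer(r"[.!?\u3002\uff01\uff1f]", text):
        pos = m.start()
        if text[pos] == ".":
            # A dot followed by a letter/digit is internal: "e.g.", "3.14"
            if text[pos + 1 : pos + 2].isalnum():
                continue
            # Token before the dot, allowing internal dots ("e.g", "i.e")
            tok = re.search(r"[A-Za-z][A-Za-z.]*$", text[:pos])
            if tok and tok.group(0).lower() in _ABBREVIATIONS:
                continue
        out.append(pos + 1)
    return out
```

Chunking then walks these boundaries greedily, packing sentences into each chunk until adding the next one would exceed max_chunk_chars.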

Closes #99
coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Adds engine-agnostic chunked TTS: sentence/clause-aware text splitting, per-chunk generation (with optional per-engine trimming), crossfade-based audio concatenation, and wiring through backend endpoints plus frontend settings for chunk size and crossfade.

Changes

Cohort / File(s) Summary
Chunked Generation Utilities
backend/utils/chunked_tts.py
New module: split_text_into_chunks, concatenate_audio_chunks, generate_chunked, DEFAULT_MAX_CHUNK_CHARS; sentence/clause-aware splitting, per-chunk seeds, optional trim_fn, and crossfade concatenation.
Backend Generation Integration
backend/main.py
Replaces direct TTS calls at three sites with generate_chunked, passes max_chunk_chars and crossfade_ms, wires per-engine trim_fn for chatterbox variants, and preserves duration/sample-rate handling.
Models / Validation
backend/models.py
GenerationRequest.text max_length increased 5,000→50,000; added max_chunk_chars: int and crossfade_ms: int with defaults and bounds.
Frontend: API types & Hooks
app/src/lib/api/types.ts, app/src/lib/hooks/useGenerationForm.ts
Adds optional max_chunk_chars? and crossfade_ms? to request types; bump generation form text max length to 50,000; hook includes max_chunk_chars and crossfade_ms in mutation payload.
Frontend: Persistent Store
app/src/stores/serverStore.ts
Adds persisted maxChunkChars (default 800) and crossfadeMs (default 50) with setters.
Frontend: Server Settings UI
app/src/components/ServerSettings/GenerationSettings.tsx, app/src/components/ServerSettings/ConnectionForm.tsx, app/src/components/ServerTab/ServerTab.tsx
New GenerationSettings component (sliders for chunk size and crossfade); ConnectionForm shows connection health; ServerTab now renders GenerationSettings instead of ServerStatus; wiring to store.
Frontend: Misc UI adjustments
app/src/components/ServerSettings/GpuAcceleration.tsx, app/src/components/ServerSettings/ModelManagement.tsx, app/src/components/ServerSettings/ServerStatus.tsx
UI simplifications and reflows: removed ModelProgress UI, refined model lists and badges, and simplified GPU/status renderings.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as "API Handler"
    participant Chunker as "Text Chunker"
    participant Backend as "TTS Backend"
    participant Trimmer as "Trim Fn"
    participant Concat as "Audio Concatenator"

    Client->>API: POST /tts/generate (text, max_chunk_chars, crossfade_ms, voice_prompt...)
    API->>Chunker: split_text_into_chunks(text, max_chunk_chars)
    Chunker-->>API: chunks[]

    loop for each chunk
        API->>Backend: generate(chunk, voice_prompt, language, seed, instruct)
        Backend-->>API: audio_chunk, sample_rate
        alt trim_fn present
            API->>Trimmer: trim_fn(audio_chunk, sample_rate)
            Trimmer-->>API: trimmed_chunk
        else
            Note right of API: use audio_chunk as-is
        end
    end

    API->>Concat: concatenate_audio_chunks(chunks[], sample_rate, crossfade_ms)
    Concat-->>API: final_audio
    API-->>Client: audio_stream + duration

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Poem

🐰 I nibble lines and stitch their song,

Chunks hop in order, none jarringly long.
Seeds for each, trims soft and neat,
Crossfades hum and make ends meet.
Hooray — a seamless voice, spry and strong.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main feature: implementing chunked TTS generation for handling long text inputs in an engine-agnostic manner.
Linked Issues check ✅ Passed The PR fully addresses all core coding requirements from #99: engine-agnostic chunked TTS with sentence-boundary splitting, 50ms crossfade concatenation, fast short-text path, 50k char limit, per-chunk trimming for Chatterbox, seed variation, persisted UI controls, and explicitly excludes soxr resampling/quality selector.
Out of Scope Changes check ✅ Passed UI reorganization in ServerSettings (ModelManagement, GpuAcceleration, ServerStatus, ServerTab) is incidental to chunking integration and not core to the feature; all changes properly support chunking or simplify existing UI without introducing unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Persisted setting (default 800 chars) controls how long text is split
before generation.  Lower values improve quality for long outputs by
keeping each chunk well within the model's context window.

- Slider in Server Connection settings (100–2000 chars, step 50)
- Stored in localStorage via Zustand persist
- Passed as max_chunk_chars on every generation request
- Frontend text limit raised to 50,000 to match backend
coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
app/src/stores/serverStore.ts (1)

36-37: Normalize maxChunkChars in the setter to prevent invalid persisted values.

If persisted state is edited/corrupted, out-of-range values can flow to requests and trigger backend validation failures. Clamp in the store setter.

Proposed change
+const MIN_CHUNK_CHARS = 100;
+const MAX_CHUNK_CHARS = 5000;
+const CHUNK_STEP = 50;
+
 export const useServerStore = create<ServerStore>()(
   persist(
     (set) => ({
@@
       maxChunkChars: 800,
-      setMaxChunkChars: (value) => set({ maxChunkChars: value }),
+      setMaxChunkChars: (value) => {
+        const normalized = Math.max(
+          MIN_CHUNK_CHARS,
+          Math.min(MAX_CHUNK_CHARS, Math.round(value / CHUNK_STEP) * CHUNK_STEP),
+        );
+        set({ maxChunkChars: normalized });
+      },
     }),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/stores/serverStore.ts` around lines 36 - 37, The setter
setMaxChunkChars currently writes the raw value into state which lets
out-of-range or corrupted persisted values slip through; update setMaxChunkChars
to clamp the incoming value to a safe range (e.g. between a defined
MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds around 800) using
Math.max/Math.min before calling set({ maxChunkChars: ... }) so the store always
persists and returns a normalized, valid maxChunkChars.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 170-172: ConnectionForm currently sets the slider max to 2000
which conflicts with the feature/back-end contract; change the slider max prop
from 2000 to 5000 for both slider occurrences in ConnectionForm (the two blocks
around the shown min/max/step props) and update any related constants or
validation logic in the ConnectionForm component that enforce a 2000 upper bound
so they match the new 5000 limit.

---

Nitpick comments:
In `@app/src/stores/serverStore.ts`:
- Around line 36-37: The setter setMaxChunkChars currently writes the raw value
into state which lets out-of-range or corrupted persisted values slip through;
update setMaxChunkChars to clamp the incoming value to a safe range (e.g.
between a defined MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds
around 800) using Math.max/Math.min before calling set({ maxChunkChars: ... })
so the store always persists and returns a normalized, valid maxChunkChars.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4f66caec-9ecc-43dd-bf5e-093f67303ad2

📥 Commits

Reviewing files that changed from the base of the PR and between 70ca7f6 and 837f852.

📒 Files selected for processing (4)
  • app/src/components/ServerSettings/ConnectionForm.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • app/src/stores/serverStore.ts

Comment on lines +170 to +172
min={100}
max={2000}
step={50}

⚠️ Potential issue | 🟠 Major

Expand slider max to 5000 to match the feature contract.

Line 170-172 currently caps at 2000, but the feature/back-end contract supports up to 5000. This limits valid user configuration unnecessarily.

Proposed change
             <Slider
               id="maxChunkChars"
               value={[maxChunkChars]}
               onValueChange={([value]) => setMaxChunkChars(value)}
               min={100}
-              max={2000}
+              max={5000}
               step={50}
               aria-label="Auto-chunking character limit"
             />
             <p className="text-sm text-muted-foreground">
               Long text is split into chunks at sentence boundaries before generating. Lower values
-              can improve quality for long outputs. Default is 800.
+              can improve quality for long outputs. Range is 100–5000. Default is 800.
             </p>

Also applies to: 175-178

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 170 - 172,
ConnectionForm currently sets the slider max to 2000 which conflicts with the
feature/back-end contract; change the slider max prop from 2000 to 5000 for both
slider occurrences in ConnectionForm (the two blocks around the shown
min/max/step props) and update any related constants or validation logic in the
ConnectionForm component that enforce a 2000 upper bound so they match the new
5000 limit.

Persisted setting (default 50ms) controls how audio chunks are blended
together.  Set to 0 for a clean hard cut with no overlap.
coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/utils/chunked_tts.py`:
- Around line 141-143: The loop that scans CJK sentence-end punctuation updates
the split candidate `best` without verifying tag boundaries, which can split
inside bracketed tags; modify the for-loop in chunked_tts.py (the re.finditer
block) to skip any match where `_inside_bracket_tag(match.start())` returns true
and only update `best` when the match is outside bracket tags, preserving the
"never split inside [ ... ]" guarantee; ensure you call `_inside_bracket_tag`
with the match start index before assigning to `best`.
- Around line 298-301: concatenate_audio_chunks currently assumes a single
sample rate by only using the first chunk's rate (sample_rate variable), which
can produce incorrect timing/pitch if later chunks have different rates; update
chunked_tts.py to iterate audio_chunks and validate that each chunk's sample
rate matches the chosen sample_rate (or resample mismatched chunks to
sample_rate) before calling concatenate_audio_chunks, raising a clear error if
automatic resampling is not performed; reference the sample_rate variable,
audio_chunks list, and the concatenate_audio_chunks function when locating where
to add the validation/resampling logic.
- Around line 125-135: The period handling currently only grabs contiguous
letters before the dot, missing dotted abbreviations like "e.g." or "U.S.";
update the block when char == "." to scan backwards collecting letters and dots
(stop at first char that is neither letter nor '.'), form the candidate token
from that span, normalize it by removing '.' and lowercasing, then check that
normalized token against _ABBREVIATIONS (e.g., treat "e.g." -> "eg"); preserve
the decimal-number skip by ensuring if the character immediately before the
scanned span is a digit you continue as before. Use the existing local names
(char, pos, text, word_start, _ABBREVIATIONS) to locate and replace the logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 398401aa-e13a-4674-aa73-6c1585e829ff

📥 Commits

Reviewing files that changed from the base of the PR and between 837f852 and 97292ec.

📒 Files selected for processing (7)
  • app/src/components/ServerSettings/ConnectionForm.tsx
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • app/src/stores/serverStore.ts
  • backend/main.py
  • backend/models.py
  • backend/utils/chunked_tts.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • app/src/components/ServerSettings/ConnectionForm.tsx
  • app/src/stores/serverStore.ts
  • app/src/lib/hooks/useGenerationForm.ts
  • backend/main.py

Comment on lines +125 to +135
if char == ".":
# Walk backwards to find the preceding word
word_start = pos - 1
while word_start >= 0 and text[word_start].isalpha():
word_start -= 1
word = text[word_start + 1 : pos].lower()
if word in _ABBREVIATIONS:
continue
# Skip decimal numbers (digit immediately before the period)
if word_start >= 0 and text[word_start].isdigit():
continue

⚠️ Potential issue | 🟠 Major

Abbreviation detection misses dotted forms like e.g. and U.S.

Current token extraction only reads contiguous letters before the period, so a dotted abbreviation yields just its final letter ("g" for "e.g.", "s" for "U.S.") and can be misclassified as a sentence boundary.

💡 Proposed fix
         if char == ".":
-            # Walk backwards to find the preceding word
-            word_start = pos - 1
-            while word_start >= 0 and text[word_start].isalpha():
-                word_start -= 1
-            word = text[word_start + 1 : pos].lower()
-            if word in _ABBREVIATIONS:
+            # Capture preceding token including internal dots (e.g. "e.g", "u.s")
+            token_match = re.search(r"([A-Za-z](?:[A-Za-z.]*)?)$", text[:pos])
+            token = token_match.group(1).lower() if token_match else ""
+            if token in _ABBREVIATIONS:
                 continue
             # Skip decimal numbers (digit immediately before the period)
-            if word_start >= 0 and text[word_start].isdigit():
+            if pos > 0 and text[pos - 1].isdigit():
                 continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/chunked_tts.py` around lines 125 - 135, The period handling
currently only grabs contiguous letters before the dot, missing dotted
abbreviations like "e.g." or "U.S."; update the block when char == "." to scan
backwards collecting letters and dots (stop at first char that is neither letter
nor '.'), form the candidate token from that span, normalize it by removing '.'
and lowercasing, then check that normalized token against _ABBREVIATIONS (e.g.,
treat "e.g." -> "eg"); preserve the decimal-number skip by ensuring if the
character immediately before the scanned span is a digit you continue as before.
Use the existing local names (char, pos, text, word_start, _ABBREVIATIONS) to
locate and replace the logic.

Comment on lines +141 to +143
for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
if m.start() > best:
best = m.start()

⚠️ Potential issue | 🟠 Major

CJK sentence-end matching can still split inside bracket tags

The CJK pass updates best without checking _inside_bracket_tag, which breaks the “never split inside [ ... ]” guarantee for tags containing CJK punctuation.

💡 Proposed fix
     # CJK sentence-ending punctuation
     for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
-        if m.start() > best:
+        if _inside_bracket_tag(text, m.start()):
+            continue
+        if m.start() > best:
             best = m.start()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/chunked_tts.py` around lines 141 - 143, The loop that scans CJK
sentence-end punctuation updates the split candidate `best` without verifying
tag boundaries, which can split inside bracketed tags; modify the for-loop in
chunked_tts.py (the re.finditer block) to skip any match where
`_inside_bracket_tag(match.start())` returns true and only update `best` when
the match is outside bracket tags, preserving the "never split inside [ ... ]"
guarantee; ensure you call `_inside_bracket_tag` with the match start index
before assigning to `best`.

Comment on lines +298 to +301
if sample_rate is None:
sample_rate = chunk_sr

audio = concatenate_audio_chunks(audio_chunks, sample_rate, crossfade_ms=crossfade_ms)

⚠️ Potential issue | 🟠 Major

Validate sample rate consistency across generated chunks

Concatenation assumes all chunk audio is at one sample rate, but only the first chunk’s rate is kept. If any later chunk differs, output timing/pitch is wrong while metadata still reports the first rate.

💡 Proposed fix
         audio_chunks.append(np.asarray(chunk_audio, dtype=np.float32))
         if sample_rate is None:
             sample_rate = chunk_sr
+        elif chunk_sr != sample_rate:
+            raise ValueError(
+                f"Inconsistent sample rates across chunks: expected {sample_rate}, got {chunk_sr}"
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/chunked_tts.py` around lines 298 - 301,
concatenate_audio_chunks currently assumes a single sample rate by only using
the first chunk's rate (sample_rate variable), which can produce incorrect
timing/pitch if later chunks have different rates; update chunked_tts.py to
iterate audio_chunks and validate that each chunk's sample rate matches the
chosen sample_rate (or resample mismatched chunks to sample_rate) before calling
concatenate_audio_chunks, raising a clear error if automatic resampling is not
performed; reference the sample_rate variable, audio_chunks list, and the
concatenate_audio_chunks function when locating where to add the
validation/resampling logic.

- Split chunking/crossfade sliders into dedicated GenerationSettings card
- Merge connection status badges into ConnectionForm (remove ServerStatus card)
- 2-column grid layout for the entire settings page
- GPU Acceleration: remove icon, badge, and MLX info card
- Models: merge 'Other Voice Models' into single 'Voice Generation' list
- Model detail: remove 'Downloaded' badge, border above actions, swap
  badges above stats row, match disk size font to stats
coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
app/src/components/ServerSettings/ConnectionForm.tsx (1)

66-66: Avoid adding a non-interactive card to the tab order.

Line 66 sets tabIndex={0} on a container with no keyboard interaction, which adds an extra focus stop.

Proposed fix
-    <Card role="region" aria-label="Server Connection" tabIndex={0}>
+    <Card role="region" aria-label="Server Connection">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ConnectionForm.tsx` at line 66, The Card in
ConnectionForm (the Card element in the ServerSettings/ConnectionForm component)
is non-interactive but has tabIndex={0}, creating an unnecessary focus stop;
remove the tabIndex={0} attribute from the Card (or change it to tabIndex={-1}
only if you must keep it focusable for a specific a11y reason) so the container
is not in the tab order, leaving interactive child controls tabbable as usual.
app/src/components/ServerSettings/GpuAcceleration.tsx (1)

225-229: Avoid hardcoding PyTorch for every non-MLX native backend.

At Line 228, the fallback label can misreport backend type. Prefer explicit mapping from health.backend_type and a neutral fallback.

♻️ Proposed refactor
+  const backendLabel =
+    health.backend_type === 'mlx'
+      ? 'MLX'
+      : health.backend_type === 'pytorch'
+        ? 'PyTorch'
+        : health.backend_type || 'GPU backend';

...
             {isCurrentlyCuda
               ? 'CUDA (GPU accelerated)'
               : hasNativeGpu
-                ? `${health.backend_type === 'mlx' ? 'MLX' : 'PyTorch'} (GPU accelerated)`
+                ? `${backendLabel} (GPU accelerated)`
                 : 'CPU'}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/GpuAcceleration.tsx` around lines 225 -
229, The JSX is hardcoding "PyTorch" for any native non-MLX backend; update the
logic in GpuAcceleration.tsx to derive the label from health.backend_type
instead of a fixed "PyTorch" string — introduce a small mapping or helper (e.g.,
getBackendLabel or BACKEND_LABELS) and use it in the ternary expression with a
neutral fallback like 'Native' or the raw backend_type when unknown, keeping the
existing isCurrentlyCuda and hasNativeGpu checks intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 116-118: The VRAM badge is currently gated by a truthy check so a
valid value of 0 won’t render; update the condition in ConnectionForm (the check
around health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.

In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 615-620: hfModelInfo.downloads and hfModelInfo.likes are used
without null-safety which can crash if the HF API omits them; in the render
where Download and Heart icons are shown, pass guarded values to formatDownloads
(e.g., use optional chaining or nullish coalescing) so
formatDownloads(hfModelInfo.downloads ?? 0) and
formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.

---

Nitpick comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Line 66: The Card in ConnectionForm (the Card element in the
ServerSettings/ConnectionForm component) is non-interactive but has
tabIndex={0}, creating an unnecessary focus stop; remove the tabIndex={0}
attribute from the Card (or change it to tabIndex={-1} only if you must keep it
focusable for a specific a11y reason) so the container is not in the tab order,
leaving interactive child controls tabbable as usual.

In `@app/src/components/ServerSettings/GpuAcceleration.tsx`:
- Around line 225-229: The JSX is hardcoding "PyTorch" for any native non-MLX
backend; update the logic in GpuAcceleration.tsx to derive the label from
health.backend_type instead of a fixed "PyTorch" string — introduce a small
mapping or helper (e.g., getBackendLabel or BACKEND_LABELS) and use it in the
ternary expression with a neutral fallback like 'Native' or the raw backend_type
when unknown, keeping the existing isCurrentlyCuda and hasNativeGpu checks
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 139496ff-45e2-45c7-ba9a-385e6de80bd9

📥 Commits

Reviewing files that changed from the base of the PR and between 97292ec and 9aa7080.

📒 Files selected for processing (6)
  • app/src/components/ServerSettings/ConnectionForm.tsx
  • app/src/components/ServerSettings/GenerationSettings.tsx
  • app/src/components/ServerSettings/GpuAcceleration.tsx
  • app/src/components/ServerSettings/ModelManagement.tsx
  • app/src/components/ServerSettings/ServerStatus.tsx
  • app/src/components/ServerTab/ServerTab.tsx

Comment on lines +116 to +118
{health.vram_used_mb && (
<Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
)}

⚠️ Potential issue | 🟡 Minor

Render VRAM badge when usage is 0 MB.

Line 116 uses a truthy check, so 0 won’t render even though it’s valid data.

Proposed fix
-              {health.vram_used_mb && (
+              {health.vram_used_mb != null && (
                 <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
               )}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 116 - 118,
The VRAM badge is currently gated by a truthy check so a valid value of 0 won’t
render; update the condition in ConnectionForm (the check around
health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.
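The issue reduces to JavaScript truthiness: `0` is falsy, so a `value && <Badge/>` gate silently drops a valid zero reading. A minimal sketch of the guard, with `shouldRenderVram` as a hypothetical helper rather than code from this PR:

```typescript
// Hypothetical guard: 0 MB is valid data and should render; absent or
// non-numeric values should not.
function shouldRenderVram(vramUsedMb: number | null | undefined): boolean {
  return vramUsedMb != null && Number.isFinite(vramUsedMb);
}
```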

Comment on lines +615 to +620
<Download className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.downloads)}
</span>
<span className="flex items-center gap-1" title="Likes">
<Heart className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.likes)}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify local typing and current call-site assumptions for HuggingFace stats fields.

# 1) Check HuggingFaceModelInfo field nullability in frontend types
fd "types.ts" app/src -t f -x rg -n "interface HuggingFaceModelInfo|downloads|likes" {}

# 2) Inspect all formatDownloads call sites in this file
rg -n "formatDownloads\\(" app/src/components/ServerSettings/ModelManagement.tsx -C 2

Repository: jamiepine/voicebox

Length of output: 914


🏁 Script executed:

# Get more context around the problematic lines
sed -n '590,650p' app/src/components/ServerSettings/ModelManagement.tsx | cat -n

Repository: jamiepine/voicebox

Length of output: 3353


🏁 Script executed:

# Check where hfModelInfo is defined/assigned
rg -n "hfModelInfo\s*=" app/src/components/ServerSettings/ModelManagement.tsx -B 2 -A 2

Length of output: 44


🏁 Script executed:

# Look for where this JSX is rendered and how hfModelInfo is obtained
rg -n "hfModelInfo" app/src/components/ServerSettings/ModelManagement.tsx | head -20

Length of output: 1215


🏁 Script executed:

# Get the complete HuggingFaceModelInfo interface definition
sed -n '131,150p' app/src/components/ServerSettings/ModelManagement.tsx | cat -n

Length of output: 824


🏁 Script executed:

# Also check the formatDownloads function to see if it guards against undefined
sed -n '51,60p' app/src/components/ServerSettings/ModelManagement.tsx | cat -n

Length of output: 459


🏁 Script executed:

# Get HuggingFaceModelInfo interface from the file
rg -n "export interface HuggingFaceModelInfo" app/src/components/ServerSettings/ModelManagement.tsx -A 20

Length of output: 44


🏁 Script executed:

# Find types.ts files in the project
fd "types.ts" app/src -t f

Length of output: 232


🏁 Script executed:

# Check if HuggingFaceModelInfo is imported or defined elsewhere
rg -n "interface HuggingFaceModelInfo" app/src --type ts

Length of output: 130


🏁 Script executed:

# Get the full HuggingFaceModelInfo interface definition
sed -n '131,165p' app/src/lib/api/types.ts | cat -n

Length of output: 1030


Guard HF stats formatting against missing numeric fields.

At lines 616 and 620, hfModelInfo.downloads and hfModelInfo.likes are passed to formatDownloads without defensive checks, even though they come from external API data that may omit these fields. While the type definition marks them as required, the codebase defensively guards other optional API fields. Add null-safe defaults to prevent crashes if the HuggingFace API response omits them.

🛡️ Proposed fix
-                        {formatDownloads(hfModelInfo.downloads)}
+                        {formatDownloads(hfModelInfo.downloads ?? 0)}
...
-                        {formatDownloads(hfModelInfo.likes)}
+                        {formatDownloads(hfModelInfo.likes ?? 0)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 615 -
620, hfModelInfo.downloads and hfModelInfo.likes are used without null-safety
which can crash if the HF API omits them; in the render where Download and Heart
icons are shown, pass guarded values to formatDownloads (e.g., use optional
chaining or nullish coalescing) so formatDownloads(hfModelInfo.downloads ?? 0)
and formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.
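A null-safe formatter along these lines would make the guard unnecessary at every call site. `formatCount` and its thresholds are assumptions; the PR's actual formatDownloads implementation is not shown here:

```typescript
// Hypothetical null-safe counterpart to formatDownloads: missing HF stats
// default to 0 instead of crashing arithmetic on undefined.
function formatCount(n: number | null | undefined): string {
  const v = n ?? 0;
  if (v >= 1_000_000) return (v / 1_000_000).toFixed(1) + "M";
  if (v >= 1_000) return (v / 1_000).toFixed(1) + "K";
  return String(v);
}
```

Defaulting inside the formatter centralizes the fix, but the per-call-site `?? 0` in the suggestion above is equally valid and smaller in diff terms.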

@jamiepine jamiepine merged commit 325714b into main Mar 13, 2026
1 check passed