feat: chunked TTS generation for long text (engine-agnostic) #266
Conversation
Text exceeding max_chunk_chars (default 800) is automatically split at sentence boundaries, generated per chunk, and concatenated with a 50ms crossfade. Works with all engines (Qwen, LuxTTS, Chatterbox, Turbo).

- Abbreviation-aware sentence splitter (Dr., Mr., e.g., decimals)
- CJK sentence-ending punctuation support
- Paralinguistic tag preservation ([laugh], [cough], etc.)
- Per-chunk seed variation to avoid correlated RNG artefacts
- Per-chunk Chatterbox trim (catches hallucination at each boundary)
- max_chunk_chars exposed as per-request param on GenerationRequest
- Text max_length raised to 50,000 characters

Closes #99
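The 50ms crossfade mentioned above can be pictured as a linear blend of the overlapping chunk tails. This is an illustrative sketch, not the PR's actual `concatenate_audio_chunks` implementation; the name `crossfade_concat` is invented for the example.

```python
import numpy as np

def crossfade_concat(chunks, sample_rate, crossfade_ms=50):
    """Concatenate audio chunks, blending each boundary with a linear crossfade."""
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    fade_len = int(sample_rate * crossfade_ms / 1000)
    out = np.asarray(chunks[0], dtype=np.float32)
    for nxt in chunks[1:]:
        nxt = np.asarray(nxt, dtype=np.float32)
        n = min(fade_len, len(out), len(nxt))
        if n == 0:
            out = np.concatenate([out, nxt])  # hard cut when crossfade_ms == 0
            continue
        ramp = np.linspace(0.0, 1.0, n, dtype=np.float32)
        # Fade out the tail of the accumulated audio while fading in the next chunk
        blended = out[-n:] * (1.0 - ramp) + nxt[:n] * ramp
        out = np.concatenate([out[:-n], blended, nxt[n:]])
    return out
```

Each boundary overlaps `fade_len` samples, so total length shrinks by that amount per join; `crossfade_ms=0` degenerates to a plain concatenation, matching the "clean hard cut" setting described later.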
📝 Walkthrough

Adds engine-agnostic chunked TTS: sentence/clause-aware text splitting, per-chunk generation (with optional per-engine trimming), crossfade-based audio concatenation, and wiring through backend endpoints plus frontend settings for chunk size and crossfade.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as "API Handler"
    participant Chunker as "Text Chunker"
    participant Backend as "TTS Backend"
    participant Trimmer as "Trim Fn"
    participant Concat as "Audio Concatenator"
    Client->>API: POST /tts/generate (text, max_chunk_chars, crossfade_ms, voice_prompt...)
    API->>Chunker: split_text_into_chunks(text, max_chunk_chars)
    Chunker-->>API: chunks[]
    loop for each chunk
        API->>Backend: generate(chunk, voice_prompt, language, seed, instruct)
        Backend-->>API: audio_chunk, sample_rate
        alt trim_fn present
            API->>Trimmer: trim_fn(audio_chunk, sample_rate)
            Trimmer-->>API: trimmed_chunk
        else
            Note right of API: use audio_chunk as-is
        end
    end
    API->>Concat: concatenate_audio_chunks(chunks[], sample_rate, crossfade_ms)
    Concat-->>API: final_audio
    API-->>Client: audio_stream + duration
```
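The dispatch loop in the diagram can be sketched as one engine-agnostic function. Here `generate_fn`, `split_fn`, `concat_fn`, and `trim_fn` are hypothetical stand-ins for the real backend protocol, so treat this as a shape sketch rather than the code in `backend/utils/chunked_tts.py`.

```python
import numpy as np

def generate_chunked(text, generate_fn, split_fn, concat_fn,
                     max_chunk_chars=800, crossfade_ms=50,
                     seed=None, trim_fn=None):
    """Split text, generate audio per chunk, optionally trim, then concatenate."""
    audio_chunks, sample_rate = [], None
    for i, chunk in enumerate(split_fn(text, max_chunk_chars)):
        # Per-chunk seed variation: seed N yields chunk seeds N, N+1, N+2, ...
        chunk_seed = None if seed is None else seed + i
        audio, sr = generate_fn(chunk, seed=chunk_seed)
        if trim_fn is not None:
            audio = trim_fn(audio, sr)  # e.g. per-chunk Chatterbox trim
        if sample_rate is None:
            sample_rate = sr
        elif sr != sample_rate:
            raise ValueError(f"inconsistent sample rates: {sample_rate} vs {sr}")
        audio_chunks.append(np.asarray(audio, dtype=np.float32))
    return concat_fn(audio_chunks, sample_rate, crossfade_ms), sample_rate
```

The sample-rate check mirrors one of the review findings below: only the first chunk's rate is trusted, so a mismatching later chunk should fail loudly rather than be concatenated at the wrong rate.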
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Persisted setting (default 800 chars) controls how long text is split before generation. Lower values improve quality for long outputs by keeping each chunk well within the model's context window.

- Slider in Server Connection settings (100-2000 chars, step 50)
- Stored in localStorage via Zustand persist
- Passed as max_chunk_chars on every generation request
- Frontend text limit raised to 50,000 to match backend
Actionable comments posted: 1
🧹 Nitpick comments (1)
app/src/stores/serverStore.ts (1)
36-37: Normalize `maxChunkChars` in the setter to prevent invalid persisted values.

If persisted state is edited or corrupted, out-of-range values can flow to requests and trigger backend validation failures. Clamp in the store setter.
Proposed change

```diff
+const MIN_CHUNK_CHARS = 100;
+const MAX_CHUNK_CHARS = 5000;
+const CHUNK_STEP = 50;
+
 export const useServerStore = create<ServerStore>()(
   persist(
     (set) => ({
@@
       maxChunkChars: 800,
-      setMaxChunkChars: (value) => set({ maxChunkChars: value }),
+      setMaxChunkChars: (value) => {
+        const normalized = Math.max(
+          MIN_CHUNK_CHARS,
+          Math.min(MAX_CHUNK_CHARS, Math.round(value / CHUNK_STEP) * CHUNK_STEP),
+        );
+        set({ maxChunkChars: normalized });
+      },
   }),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/stores/serverStore.ts` around lines 36 - 37, The setter setMaxChunkChars currently writes the raw value into state which lets out-of-range or corrupted persisted values slip through; update setMaxChunkChars to clamp the incoming value to a safe range (e.g. between a defined MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds around 800) using Math.max/Math.min before calling set({ maxChunkChars: ... }) so the store always persists and returns a normalized, valid maxChunkChars.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 170-172: ConnectionForm currently sets the slider max to 2000
which conflicts with the feature/back-end contract; change the slider max prop
from 2000 to 5000 for both slider occurrences in ConnectionForm (the two blocks
around the shown min/max/step props) and update any related constants or
validation logic in the ConnectionForm component that enforce a 2000 upper bound
so they match the new 5000 limit.
---
Nitpick comments:
In `@app/src/stores/serverStore.ts`:
- Around line 36-37: The setter setMaxChunkChars currently writes the raw value
into state which lets out-of-range or corrupted persisted values slip through;
update setMaxChunkChars to clamp the incoming value to a safe range (e.g.
between a defined MIN_CHUNK_CHARS and MAX_CHUNK_CHARS or sensible numeric bounds
around 800) using Math.max/Math.min before calling set({ maxChunkChars: ... })
so the store always persists and returns a normalized, valid maxChunkChars.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4f66caec-9ecc-43dd-bf5e-093f67303ad2
📒 Files selected for processing (4)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/lib/api/types.ts
- app/src/lib/hooks/useGenerationForm.ts
- app/src/stores/serverStore.ts
```tsx
min={100}
max={2000}
step={50}
```
Expand slider max to 5000 to match the feature contract.
Line 170-172 currently caps at 2000, but the feature/back-end contract supports up to 5000. This limits valid user configuration unnecessarily.
Proposed change

```diff
 <Slider
   id="maxChunkChars"
   value={[maxChunkChars]}
   onValueChange={([value]) => setMaxChunkChars(value)}
   min={100}
-  max={2000}
+  max={5000}
   step={50}
   aria-label="Auto-chunking character limit"
 />
 <p className="text-sm text-muted-foreground">
   Long text is split into chunks at sentence boundaries before generating. Lower values
-  can improve quality for long outputs. Default is 800.
+  can improve quality for long outputs. Range is 100–5000. Default is 800.
 </p>
```

Also applies to: 175-178
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 170 - 172,
ConnectionForm currently sets the slider max to 2000 which conflicts with the
feature/back-end contract; change the slider max prop from 2000 to 5000 for both
slider occurrences in ConnectionForm (the two blocks around the shown
min/max/step props) and update any related constants or validation logic in the
ConnectionForm component that enforce a 2000 upper bound so they match the new
5000 limit.
Persisted setting (default 50ms) controls how audio chunks are blended together. Set to 0 for a clean hard cut with no overlap.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/utils/chunked_tts.py`:
- Around line 141-143: The loop that scans CJK sentence-end punctuation updates
the split candidate `best` without verifying tag boundaries, which can split
inside bracketed tags; modify the for-loop in chunked_tts.py (the re.finditer
block) to skip any match where `_inside_bracket_tag(match.start())` returns true
and only update `best` when the match is outside bracket tags, preserving the
"never split inside [ ... ]" guarantee; ensure you call `_inside_bracket_tag`
with the match start index before assigning to `best`.
- Around line 298-301: concatenate_audio_chunks currently assumes a single
sample rate by only using the first chunk's rate (sample_rate variable), which
can produce incorrect timing/pitch if later chunks have different rates; update
chunked_tts.py to iterate audio_chunks and validate that each chunk's sample
rate matches the chosen sample_rate (or resample mismatched chunks to
sample_rate) before calling concatenate_audio_chunks, raising a clear error if
automatic resampling is not performed; reference the sample_rate variable,
audio_chunks list, and the concatenate_audio_chunks function when locating where
to add the validation/resampling logic.
- Around line 125-135: The period handling currently only grabs contiguous
letters before the dot, missing dotted abbreviations like "e.g." or "U.S.";
update the block when char == "." to scan backwards collecting letters and dots
(stop at first char that is neither letter nor '.'), form the candidate token
from that span, normalize it by removing '.' and lowercasing, then check that
normalized token against _ABBREVIATIONS (e.g., treat "e.g." -> "eg"); preserve
the decimal-number skip by ensuring if the character immediately before the
scanned span is a digit you continue as before. Use the existing local names
(char, pos, text, word_start, _ABBREVIATIONS) to locate and replace the logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 398401aa-e13a-4674-aa73-6c1585e829ff
📒 Files selected for processing (7)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/lib/api/types.ts
- app/src/lib/hooks/useGenerationForm.ts
- app/src/stores/serverStore.ts
- backend/main.py
- backend/models.py
- backend/utils/chunked_tts.py
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/stores/serverStore.ts
- app/src/lib/hooks/useGenerationForm.ts
- backend/main.py
```python
if char == ".":
    # Walk backwards to find the preceding word
    word_start = pos - 1
    while word_start >= 0 and text[word_start].isalpha():
        word_start -= 1
    word = text[word_start + 1 : pos].lower()
    if word in _ABBREVIATIONS:
        continue
    # Skip decimal numbers (digit immediately before the period)
    if word_start >= 0 and text[word_start].isdigit():
        continue
```
Abbreviation detection misses dotted forms like e.g. and U.S.
Current token extraction only reads contiguous letters before the `.`. That makes dotted abbreviations like `e.g.` and `U.S.` produce the tokens `g` and `s`, so they can be misclassified as sentence boundaries.
💡 Proposed fix

```diff
 if char == ".":
-    # Walk backwards to find the preceding word
-    word_start = pos - 1
-    while word_start >= 0 and text[word_start].isalpha():
-        word_start -= 1
-    word = text[word_start + 1 : pos].lower()
-    if word in _ABBREVIATIONS:
+    # Capture preceding token including internal dots (e.g. "e.g", "u.s")
+    token_match = re.search(r"([A-Za-z](?:[A-Za-z.]*)?)$", text[:pos])
+    token = token_match.group(1).lower() if token_match else ""
+    if token in _ABBREVIATIONS:
         continue
     # Skip decimal numbers (digit immediately before the period)
-    if word_start >= 0 and text[word_start].isdigit():
+    if pos > 0 and text[pos - 1].isdigit():
         continue
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if char == ".":
    # Capture preceding token including internal dots (e.g. "e.g", "u.s")
    token_match = re.search(r"([A-Za-z](?:[A-Za-z.]*)?)$", text[:pos])
    token = token_match.group(1).lower() if token_match else ""
    if token in _ABBREVIATIONS:
        continue
    # Skip decimal numbers (digit immediately before the period)
    if pos > 0 and text[pos - 1].isdigit():
        continue
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 125 - 135, The period handling
currently only grabs contiguous letters before the dot, missing dotted
abbreviations like "e.g." or "U.S."; update the block when char == "." to scan
backwards collecting letters and dots (stop at first char that is neither letter
nor '.'), form the candidate token from that span, normalize it by removing '.'
and lowercasing, then check that normalized token against _ABBREVIATIONS (e.g.,
treat "e.g." -> "eg"); preserve the decimal-number skip by ensuring if the
character immediately before the scanned span is a digit you continue as before.
Use the existing local names (char, pos, text, word_start, _ABBREVIATIONS) to
locate and replace the logic.
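For intuition, here is a minimal sketch of the normalization this prompt describes: scan backwards over letters and dots, strip the dots, and compare the lowercased token against the abbreviation set. The function name and the abbreviation set below are illustrative, not the module's actual identifiers.

```python
_ABBREVIATIONS = {"dr", "mr", "mrs", "eg", "ie", "us"}  # illustrative subset

def is_abbreviation_period(text, pos):
    """Return True if the '.' at text[pos] ends a known abbreviation."""
    # Scan backwards over letters AND dots so "e.g." and "U.S." stay one token.
    start = pos
    while start > 0 and (text[start - 1].isalpha() or text[start - 1] == "."):
        start -= 1
    # Normalize: drop internal dots and lowercase ("e.g" -> "eg", "U.S" -> "us")
    token = text[start:pos].replace(".", "").lower()
    return token in _ABBREVIATIONS
```

A splitter would call this at each candidate period and skip the split when it returns True, so `Dr. Smith` and `e.g. apples` never become chunk boundaries.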
```python
for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
    if m.start() > best:
        best = m.start()
```
CJK sentence-end matching can still split inside bracket tags
The CJK pass updates best without checking _inside_bracket_tag, which breaks the “never split inside [ ... ]” guarantee for tags containing CJK punctuation.
💡 Proposed fix

```diff
 # CJK sentence-ending punctuation
 for m in re.finditer(r"[\u3002\uff01\uff1f]", text):
-    if m.start() > best:
+    if _inside_bracket_tag(text, m.start()):
+        continue
+    if m.start() > best:
         best = m.start()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 141 - 143, The loop that scans CJK
sentence-end punctuation updates the split candidate `best` without verifying
tag boundaries, which can split inside bracketed tags; modify the for-loop in
chunked_tts.py (the re.finditer block) to skip any match where
`_inside_bracket_tag(match.start())` returns true and only update `best` when
the match is outside bracket tags, preserving the "never split inside [ ... ]"
guarantee; ensure you call `_inside_bracket_tag` with the match start index
before assigning to `best`.
```python
if sample_rate is None:
    sample_rate = chunk_sr

audio = concatenate_audio_chunks(audio_chunks, sample_rate, crossfade_ms=crossfade_ms)
```
Validate sample rate consistency across generated chunks
Concatenation assumes all chunk audio is at one sample rate, but only the first chunk’s rate is kept. If any later chunk differs, output timing/pitch is wrong while metadata still reports the first rate.
💡 Proposed fix

```diff
 audio_chunks.append(np.asarray(chunk_audio, dtype=np.float32))
 if sample_rate is None:
     sample_rate = chunk_sr
+elif chunk_sr != sample_rate:
+    raise ValueError(
+        f"Inconsistent sample rates across chunks: expected {sample_rate}, got {chunk_sr}"
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/utils/chunked_tts.py` around lines 298 - 301,
concatenate_audio_chunks currently assumes a single sample rate by only using
the first chunk's rate (sample_rate variable), which can produce incorrect
timing/pitch if later chunks have different rates; update chunked_tts.py to
iterate audio_chunks and validate that each chunk's sample rate matches the
chosen sample_rate (or resample mismatched chunks to sample_rate) before calling
concatenate_audio_chunks, raising a clear error if automatic resampling is not
performed; reference the sample_rate variable, audio_chunks list, and the
concatenate_audio_chunks function when locating where to add the
validation/resampling logic.
- Split chunking/crossfade sliders into dedicated GenerationSettings card
- Merge connection status badges into ConnectionForm (remove ServerStatus card)
- 2-column grid layout for the entire settings page
- GPU Acceleration: remove icon, badge, and MLX info card
- Models: merge 'Other Voice Models' into single 'Voice Generation' list
- Model detail: remove 'Downloaded' badge, border above actions, swap badges above stats row, match disk size font to stats
Actionable comments posted: 2
🧹 Nitpick comments (2)
app/src/components/ServerSettings/ConnectionForm.tsx (1)
66-66: Avoid adding a non-interactive card to the tab order.

Line 66 sets `tabIndex={0}` on a container with no keyboard interaction, which adds an extra focus stop.

Proposed fix

```diff
-<Card role="region" aria-label="Server Connection" tabIndex={0}>
+<Card role="region" aria-label="Server Connection">
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ServerSettings/ConnectionForm.tsx` at line 66, The Card in ConnectionForm (the Card element in the ServerSettings/ConnectionForm component) is non-interactive but has tabIndex={0}, creating an unnecessary focus stop; remove the tabIndex={0} attribute from the Card (or change it to tabIndex={-1} only if you must keep it focusable for a specific a11y reason) so the container is not in the tab order, leaving interactive child controls tabbable as usual.

app/src/components/ServerSettings/GpuAcceleration.tsx (1)
225-229: Avoid hardcoding `PyTorch` for every non-MLX native backend.

At Line 228, the fallback label can misreport backend type. Prefer explicit mapping from `health.backend_type` and a neutral fallback.

♻️ Proposed refactor

```diff
+const backendLabel =
+  health.backend_type === 'mlx'
+    ? 'MLX'
+    : health.backend_type === 'pytorch'
+      ? 'PyTorch'
+      : health.backend_type || 'GPU backend';
 ...
 {isCurrentlyCuda
   ? 'CUDA (GPU accelerated)'
   : hasNativeGpu
-    ? `${health.backend_type === 'mlx' ? 'MLX' : 'PyTorch'} (GPU accelerated)`
+    ? `${backendLabel} (GPU accelerated)`
     : 'CPU'}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ServerSettings/GpuAcceleration.tsx` around lines 225 - 229, The JSX is hardcoding "PyTorch" for any native non-MLX backend; update the logic in GpuAcceleration.tsx to derive the label from health.backend_type instead of a fixed "PyTorch" string — introduce a small mapping or helper (e.g., getBackendLabel or BACKEND_LABELS) and use it in the ternary expression with a neutral fallback like 'Native' or the raw backend_type when unknown, keeping the existing isCurrentlyCuda and hasNativeGpu checks intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Around line 116-118: The VRAM badge is currently gated by a truthy check so a
valid value of 0 won’t render; update the condition in ConnectionForm (the check
around health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 615-620: hfModelInfo.downloads and hfModelInfo.likes are used
without null-safety which can crash if the HF API omits them; in the render
where Download and Heart icons are shown, pass guarded values to formatDownloads
(e.g., use optional chaining or nullish coalescing) so
formatDownloads(hfModelInfo.downloads ?? 0) and
formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.
---
Nitpick comments:
In `@app/src/components/ServerSettings/ConnectionForm.tsx`:
- Line 66: The Card in ConnectionForm (the Card element in the
ServerSettings/ConnectionForm component) is non-interactive but has
tabIndex={0}, creating an unnecessary focus stop; remove the tabIndex={0}
attribute from the Card (or change it to tabIndex={-1} only if you must keep it
focusable for a specific a11y reason) so the container is not in the tab order,
leaving interactive child controls tabbable as usual.
In `@app/src/components/ServerSettings/GpuAcceleration.tsx`:
- Around line 225-229: The JSX is hardcoding "PyTorch" for any native non-MLX
backend; update the logic in GpuAcceleration.tsx to derive the label from
health.backend_type instead of a fixed "PyTorch" string — introduce a small
mapping or helper (e.g., getBackendLabel or BACKEND_LABELS) and use it in the
ternary expression with a neutral fallback like 'Native' or the raw backend_type
when unknown, keeping the existing isCurrentlyCuda and hasNativeGpu checks
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 139496ff-45e2-45c7-ba9a-385e6de80bd9
📒 Files selected for processing (6)
- app/src/components/ServerSettings/ConnectionForm.tsx
- app/src/components/ServerSettings/GenerationSettings.tsx
- app/src/components/ServerSettings/GpuAcceleration.tsx
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/components/ServerSettings/ServerStatus.tsx
- app/src/components/ServerTab/ServerTab.tsx
```tsx
{health.vram_used_mb && (
  <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
)}
```
Render VRAM badge when usage is 0 MB.
Line 116 uses a truthy check, so 0 won’t render even though it’s valid data.
Proposed fix

```diff
-{health.vram_used_mb && (
+{health.vram_used_mb != null && (
   <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
 )}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
{health.vram_used_mb != null && (
  <Badge variant="outline">VRAM: {health.vram_used_mb.toFixed(0)} MB</Badge>
)}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ConnectionForm.tsx` around lines 116 - 118,
The VRAM badge is currently gated by a truthy check so a valid value of 0 won’t
render; update the condition in ConnectionForm (the check around
health.vram_used_mb before rendering <Badge>) to explicitly allow zero by
checking for null/undefined (e.g., health.vram_used_mb !== null &&
health.vram_used_mb !== undefined or Number.isFinite(health.vram_used_mb)) so 0
MB displays while still avoiding rendering when the value is absent.
```tsx
<Download className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.downloads)}
</span>
<span className="flex items-center gap-1" title="Likes">
  <Heart className="h-3.5 w-3.5" />
  {formatDownloads(hfModelInfo.likes)}
```
Guard HF stats formatting against missing numeric fields.
At line 616 and 620, unguarded hfModelInfo.downloads and hfModelInfo.likes lack defensive checks despite receiving external API data that may not include these fields. While the type definition marks them as required, the codebase defensively guards other optional API fields. Add null-safe defaults to prevent crashes if the HuggingFace API response omits these fields.
🛡️ Proposed fix

```diff
-{formatDownloads(hfModelInfo.downloads)}
+{formatDownloads(hfModelInfo.downloads ?? 0)}
 ...
-{formatDownloads(hfModelInfo.likes)}
+{formatDownloads(hfModelInfo.likes ?? 0)}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
<Download className="h-3.5 w-3.5" />
{formatDownloads(hfModelInfo.downloads ?? 0)}
</span>
<span className="flex items-center gap-1" title="Likes">
  <Heart className="h-3.5 w-3.5" />
  {formatDownloads(hfModelInfo.likes ?? 0)}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 615 -
620, hfModelInfo.downloads and hfModelInfo.likes are used without null-safety
which can crash if the HF API omits them; in the render where Download and Heart
icons are shown, pass guarded values to formatDownloads (e.g., use optional
chaining or nullish coalescing) so formatDownloads(hfModelInfo.downloads ?? 0)
and formatDownloads(hfModelInfo.likes ?? 0) (or equivalent) are used; update the
ModelManagement component render around the Download/Heart spans to default
missing numeric fields to 0 before calling formatDownloads.
Summary
Long text that exceeds TTS model context limits is now automatically chunked, generated per-segment, and concatenated with crossfade. Works with all engines (Qwen, LuxTTS, Chatterbox, Chatterbox Turbo).
- Abbreviation-aware sentence splitting (`Dr.`, `Mr.`, `e.g.`, decimals), CJK punctuation support, and paralinguistic tag preservation (`[laugh]`, `[cough]`)
- `text` max_length raised from 5,000 to 50,000 characters

Auto-chunking limit setting
A slider in Server Connection settings lets users control the chunk size (100-2000 chars, default 800). Lower values improve quality for long outputs by keeping each chunk well within the model's context window. The setting is persisted in localStorage.
Changes
- backend/utils/chunked_tts.py — new `generate_chunked()` wrapper
- backend/main.py — `/generate` and `/generate/stream` route through `generate_chunked()`
- backend/models.py — `text` max_length -> 50000, new `max_chunk_chars` field on `GenerationRequest`
- app/src/stores/serverStore.ts — `maxChunkChars` setting (default 800)
- app/src/lib/api/types.ts — `max_chunk_chars` on `GenerationRequest`
- app/src/lib/hooks/useGenerationForm.ts
- app/src/components/ServerSettings/ConnectionForm.tsx

Design decisions
Engine-agnostic layer -- Chunking wraps the standard `TTSBackend.generate()` protocol in the dispatch layer (main.py), not inside individual backends. Every engine gets long text support for free.

Per-chunk trim -- Chatterbox `trim_tts_output()` is applied to each chunk individually, catching hallucinated trailing noise at every boundary instead of only at the end.

Per-chunk seed variation -- Seed `N` produces chunk seeds `N, N+1, N+2, ...` to avoid correlated RNG artefacts while keeping output deterministic.

Persisted setting, not per-request -- Chunk size is a "set and forget" preference stored in localStorage, not a form field cluttering the generation UI. It's still sent as `max_chunk_chars` on every request so the backend respects it.

No quality selector -- The original PR #99 included a "standard vs high" resampler (24kHz -> 44.1kHz via soxr). This was dropped because upsampling cannot recover frequencies above Nyquist that were never generated.
What this replaces from PR #99
Cherry-picks the core chunking idea from #99 and fixes:
- `language` parameter was missing from `_generate_single()` (NameError on every generation)
- `max_length=50000` applied globally but only Qwen got chunking; now all engines chunk

Closes #99
cc @glaucusj-sai -- thanks for the original implementation, the sentence-boundary splitting and crossfade concat ideas carried through