Merge voice latency and fail-fast experiments by BH3GEI · Pull Request #1504 · octos-org/octos

BH3GEI · 2026-06-25T22:44:47Z

Summary

Merge the useful parts of the voice latency experiments: Volcano ws_binary streaming TTS, voice/audio_chunk UI protocol events, shared no-redirect cloud TTS client.
Merge fail-fast LLM policy handling so voice turns do not retry, fail over, or hedge after fail-fast errors, and surface TurnFailure consistently.
Keep the current main branch security boundary for per-profile cloud TTS config, HTTPS-only Volcano endpoints, endpoint allowlist, and redirects disabled.

Branches reviewed

Merged: origin/perf/voice-tts-latency
Merged: origin/feat/voice-llm-error-fail-fast
Not merged separately: origin/perf/voice-cloud-tts-shared-client, subsumed by the latency branch and reconciled with main security handling.
Not merged: origin/test/voice-perf-plus-selection, conflicts heavily and the core voice selection/perf pieces are already present on main.
Not merged: origin/fix/tts-final-syllable-truncation, already present on main.

Validation

cargo fmt --check
git diff --check
cargo test -p octos-core --lib voice_audio_chunk_round_trips_through_rpc_notification: 1 passed, 0 failed
cargo test -p octos-cli --features api voice_turn: 57 passed, 0 failed
cargo test -p octos-cli --features api volcano_ws: 7 passed, 0 failed, 1 ignored live Volcano test requiring VOLC_TTS_APPID/VOLC_TTS_TOKEN
cargo test -p octos-llm --lib failfast: 12 passed, 0 failed
cargo test -p octos-agent --lib failfast: 5 passed, 0 failed
cargo test -p octos-cli --features api ui_protocol: 532 passed, 0 failed, 2 ignored
OCTOS_LEDGER_SOAK=1 cargo test -p octos-cli --features api ui_protocol -- --ignored --nocapture: 2 passed, 0 failed

Per-sentence reply synthesis built a fresh `reqwest::Client` on every call, so each sentence paid a new TCP+TLS handshake to bytedance. Share one process-wide client via `OnceLock` so reqwest pools connections (keep-alive) across the sentence-streamed TTS path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
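A minimal sketch of the sharing pattern this commit describes, assuming a stand-in `TtsHttpClient` type so it compiles with the standard library alone. The real code holds a `reqwest::Client`; the type, the build counter, and the function names below are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Stand-in for an HTTP client such as `reqwest::Client` (hypothetical type).
#[derive(Debug)]
pub struct TtsHttpClient {
    pub id: usize,
}

/// Counts how many clients were ever constructed in this process.
static CLIENTS_BUILT: AtomicUsize = AtomicUsize::new(0);

impl TtsHttpClient {
    fn build() -> Self {
        // Real code would configure timeouts and the redirect policy here.
        let id = CLIENTS_BUILT.fetch_add(1, Ordering::SeqCst) + 1;
        TtsHttpClient { id }
    }
}

/// Returns the single process-wide client, building it on first use only.
pub fn shared_tts_client() -> &'static TtsHttpClient {
    static CLIENT: OnceLock<TtsHttpClient> = OnceLock::new();
    CLIENT.get_or_init(TtsHttpClient::build)
}

/// Number of clients constructed so far (for illustration and testing).
pub fn clients_built() -> usize {
    CLIENTS_BUILT.load(Ordering::SeqCst)
}
```

Every per-sentence call would go through `shared_tts_client()`, so the pooled connections held by the one client can be reused instead of being rebuilt per sentence.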

Path X for cloud-TTS latency: keep the BV001 voice but switch from the non-streaming HTTP `query` to the v1 ws_binary `submit` protocol, where the server streams audio chunks as it synthesizes. The V3 large-model (大模型) endpoint rejects BV001 (verified: resource/speaker mismatch), so v1 ws is the right surface for this voice; auth/body/cluster are identical to the existing HTTP path.

Self-contained `volcano_ws` module:
- encode_request_frame / parse_server_frame / build_submit_payload: pure binary-protocol logic, unit-tested (including truncation safety).
- synthesize_ws: builds an explicit rustls TLS connector (ring provider + native roots, mirroring the octos-bus WS channels, since the process installs no global CryptoProvider), opens the WebSocket, streams audio frames until the negative-sequence final chunk, and writes one file (drop-in for synthesize_volcano). It buffers to a complete file for now; streaming chunks onward to the client is the next step (⑤).
- a #[ignore] live test against the real endpoint (needs VOLC_TTS_*). Verified live: BV001 streams over ws into valid audio.

tokio-tungstenite/rustls/rustls-native-certs become api-gated deps. Not yet wired into synthesize_reply; that lands with the streaming delivery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
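To illustrate what "pure binary-protocol logic" with truncation safety can look like, here is a sketch over a deliberately simplified frame layout: a 4-byte big-endian signed sequence number, a 4-byte big-endian payload length, then the payload. This is not the actual Volcano ws_binary wire format, and `encode_frame`, `parse_frame`, `AudioFrame`, and `FrameError` are hypothetical names. It keeps only the two properties the commit message mentions: a negative sequence number marks the final chunk, and short input yields an error instead of a panic.

```rust
/// One decoded audio frame in the simplified, illustrative layout.
#[derive(Debug, PartialEq)]
pub struct AudioFrame {
    pub seq: i32,
    pub payload: Vec<u8>,
}

impl AudioFrame {
    /// A negative sequence number marks the final chunk of the stream.
    pub fn is_final(&self) -> bool {
        self.seq < 0
    }
}

#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// The buffer ended before the header or the declared payload did.
    Truncated { needed: usize, got: usize },
}

/// Bytes taken by the sequence number plus the length prefix.
const HEADER_LEN: usize = 8;

/// Encodes `seq` and `payload` as: i32 BE | u32 BE length | payload bytes.
pub fn encode_frame(seq: i32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&seq.to_be_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Parses one frame, checking every length before slicing.
pub fn parse_frame(bytes: &[u8]) -> Result<AudioFrame, FrameError> {
    if bytes.len() < HEADER_LEN {
        return Err(FrameError::Truncated { needed: HEADER_LEN, got: bytes.len() });
    }
    let seq = i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let len = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
    let needed = HEADER_LEN.saturating_add(len);
    if bytes.len() < needed {
        return Err(FrameError::Truncated { needed, got: bytes.len() });
    }
    Ok(AudioFrame { seq, payload: bytes[HEADER_LEN..needed].to_vec() })
}
```

Keeping the encoder and parser free of I/O is what makes them unit-testable without a live endpoint, which matches the split the commit describes.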

Split synthesize_ws into a streaming core, synthesize_ws_stream, which invokes an on_chunk callback per audio frame as it arrives, and a thin collect→file wrapper (synthesize_ws) over it. Transport-agnostic: the collect→file path keeps ④a working (live test still green), and the ⑤ push-to-client path (audio frames over the UI WebSocket) will consume synthesize_ws_stream directly. No behaviour change to the file path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
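The split can be sketched as a streaming core that calls back per frame plus a thin wrapper that collects into one buffer. The sketch assumes the frames arrive from an in-memory iterator rather than a WebSocket, and the names `Frame`, `synthesize_stream`, and `synthesize_collect` are hypothetical. The `is_last` argument mirrors the flag that a later commit in this PR adds to the callback.

```rust
/// One audio frame from the synthesis stream (hypothetical shape).
pub struct Frame {
    pub seq: i32,
    pub audio: Vec<u8>,
}

/// Streaming core: invokes `on_chunk` for each frame as it arrives and stops
/// at the negative-sequence final frame. Returns the number of frames seen.
pub fn synthesize_stream<I, F>(frames: I, mut on_chunk: F) -> Result<usize, String>
where
    I: IntoIterator<Item = Frame>,
    F: FnMut(&[u8], bool),
{
    let mut count = 0;
    for frame in frames {
        let is_last = frame.seq < 0;
        on_chunk(&frame.audio, is_last);
        count += 1;
        if is_last {
            return Ok(count);
        }
    }
    // The source ran dry without ever sending a final frame.
    Err("stream ended before the final frame".to_string())
}

/// Thin wrapper: same stream, but buffered into one contiguous byte vector
/// (standing in for the collect-to-file path).
pub fn synthesize_collect<I>(frames: I) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Frame>,
{
    let mut out = Vec::new();
    synthesize_stream(frames, |chunk, _is_last| out.extend_from_slice(chunk))?;
    Ok(out)
}
```

Because the wrapper is only a callback that appends, the file path and a push-to-client path can share the same core without either knowing about the other's transport.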

Server-protocol half of streamed voice-reply audio (transport B). Adds:
- VoiceAudioChunkEvent { session_id, topic, turn_id, segment_id, seq, mime, audio_b64, last } + a VoiceAudioChunk variant on UiNotification, wired through method()/session_id()/topic()/stamp_topic/params/decode.
- methods::VOICE_AUDIO_CHUNK ("voice/audio_chunk") in the notification methods list.
- UI_PROTOCOL_FEATURE_VOICE_AUDIO_V1 ("event.voice_audio.v1") in the known-features registry; gates emission so clients that don't negotiate it keep getting whole-file file/attached audio.
- round-trip test + updated capability goldens.

Chunks sharing a segment_id form one playable utterance (one reply sentence); seq orders them, last marks the segment end. The cli emit path + frontend MSE playback follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
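A sketch of how a consumer might use `seq` and `last` to reassemble one segment. The field names come from the commit message; the field types, the `SegmentBuffer` helper, the `chunk` constructor, and the assumption that `seq` starts at 0 within a segment are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Field names follow the commit message; the types are assumptions.
#[derive(Debug, Clone)]
pub struct VoiceAudioChunkEvent {
    pub session_id: String,
    pub topic: Option<String>,
    pub turn_id: String,
    pub segment_id: String,
    pub seq: u32,
    pub mime: String,
    pub audio_b64: String,
    pub last: bool,
}

/// Convenience constructor for the example (hypothetical helper).
pub fn chunk(segment_id: &str, seq: u32, audio_b64: &str, last: bool) -> VoiceAudioChunkEvent {
    VoiceAudioChunkEvent {
        session_id: "session".to_string(),
        topic: None,
        turn_id: "turn".to_string(),
        segment_id: segment_id.to_string(),
        seq,
        mime: "audio/mpeg".to_string(),
        audio_b64: audio_b64.to_string(),
        last,
    }
}

/// Collects the chunks of ONE segment; callers keep one buffer per segment_id.
#[derive(Default)]
pub struct SegmentBuffer {
    chunks: BTreeMap<u32, String>,
    last_seq: Option<u32>,
}

impl SegmentBuffer {
    /// Stores a chunk; arrival order does not matter.
    pub fn push(&mut self, ev: &VoiceAudioChunkEvent) {
        if ev.last {
            self.last_seq = Some(ev.seq);
        }
        self.chunks.insert(ev.seq, ev.audio_b64.clone());
    }

    /// Complete once the `last` chunk was seen and no seq is missing,
    /// assuming seq starts at 0 within a segment.
    pub fn is_complete(&self) -> bool {
        match self.last_seq {
            Some(last) => self.chunks.len() as u64 == u64::from(last) + 1,
            None => false,
        }
    }

    /// Base64 payloads in playback order (BTreeMap iterates by ascending seq).
    pub fn ordered_chunks(&self) -> Vec<&str> {
        self.chunks.values().map(String::as_str).collect()
    }
}
```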

Add the `voice_audio` per-connection capability to ConnectionUiFeatures: negotiated from `event.voice_audio.v1` (header/query), advertised back in the requested-features set, and enforced in the live/replay capability filter so `voice/audio_chunk` notifications only reach connections that opted into progressive playback (others keep whole-file `file/attached`). Wire the new UiNotification variant through the cli's exhaustive matches (cursor extraction = non-cursor-bearing; ledger session-id lookup). No emitter yet — the voice worker push + frontend MSE playback follow. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
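The negotiate-then-filter behaviour might look roughly like the following. The feature and method strings are taken from the commit messages. The comma-separated request format, the `negotiate` and `allows` methods, and the single-field struct are assumptions made for the sketch.

```rust
pub const FEATURE_VOICE_AUDIO_V1: &str = "event.voice_audio.v1";
pub const METHOD_VOICE_AUDIO_CHUNK: &str = "voice/audio_chunk";

/// Per-connection capability flags; only the one relevant field is shown.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ConnectionUiFeatures {
    pub voice_audio: bool,
}

impl ConnectionUiFeatures {
    /// Parses a comma-separated feature list (assumed header/query format).
    pub fn negotiate(requested: &str) -> Self {
        let voice_audio = requested
            .split(',')
            .map(str::trim)
            .any(|feature| feature == FEATURE_VOICE_AUDIO_V1);
        Self { voice_audio }
    }

    /// Capability filter: gated methods pass only if negotiated; everything
    /// else is delivered as before.
    pub fn allows(&self, method: &str) -> bool {
        match method {
            METHOD_VOICE_AUDIO_CHUNK => self.voice_audio,
            _ => true,
        }
    }
}
```

A connection that never sends the feature name keeps receiving `file/attached` and simply never sees `voice/audio_chunk`, so older clients need no changes.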

…(B-2) The voice-turn worker now pushes cloud TTS audio progressively instead of only writing a whole file. For each reply sentence, when the connection negotiated event.voice_audio.v1, it drives the v1 ws `submit` stream and emits one `voice/audio_chunk` (base64 frame, per-sentence segment_id, incrementing seq, last on the final frame) per audio chunk via the ephemeral send path, so the client can play before the sentence finishes synthesizing. Falls through to the existing whole-file `file/attached` path when the client didn't negotiate, the turn isn't cloud-routed, or the stream fails.
- voice_turn::synthesize_reply_streaming: cloud-only streaming entry that threads the VOLC_TTS_* config into volcano_ws::synthesize_ws_stream and passes the audio mime to the chunk callback.
- synthesize_ws_stream callback gains an is_last flag.
- worker captures ws.clone() + features.voice_audio to emit per chunk.

Live-verified end to end against BV001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add FailFast guard to chat() and chat_stream() retry loops: when current_llm_call_policy() == FailFast, return the first error immediately without any backoff or retry. Normal-policy behavior is byte-for-byte unchanged. Tests: should_call_inner_once_when_failfast_even_if_retryable, should_retry_when_normal_policy (CountingProvider mock, always-429). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
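A sketch of the guard inside a retry loop. It assumes a thread-local policy in place of the per-turn task-local the PR uses, treats every error as retryable, and omits backoff. `with_policy` and `chat_with_retry` are hypothetical names; `current_llm_call_policy` and `LlmCallPolicy::FailFast` are the names used in the commit messages.

```rust
use std::cell::Cell;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

thread_local! {
    // Thread-local stand-in for the PR's task-local policy (an assumption).
    static POLICY: Cell<LlmCallPolicy> = Cell::new(LlmCallPolicy::Normal);
}

/// Reads the policy currently in effect on this thread.
pub fn current_llm_call_policy() -> LlmCallPolicy {
    POLICY.with(|p| p.get())
}

/// Runs `f` with `policy` in effect, then restores the previous policy.
pub fn with_policy<T>(policy: LlmCallPolicy, f: impl FnOnce() -> T) -> T {
    let prev = POLICY.with(|p| p.replace(policy));
    let out = f();
    POLICY.with(|p| p.set(prev));
    out
}

/// Calls `call` until it succeeds or `max_retries` retries are used up.
/// Under FailFast the first error is returned immediately.
pub fn chat_with_retry<T, E>(
    max_retries: u32,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(err) => {
                let fail_fast = current_llm_call_policy() == LlmCallPolicy::FailFast;
                if fail_fast || attempt >= max_retries {
                    return Err(err);
                }
                // Real code would check retryability and back off here.
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 3`, an always-failing provider is called four times under the normal policy and exactly once under FailFast, which is the property the two tests named in the commit check.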

- should_retry_when_normal_policy: switch from rate_limited(Some(0)) to ServerError{503} so rate_limit_delay returns None and calculate_delay uses the configured 1-2ms delays (was 3×30s=90s, now <1ms).
- CountingProvider: add chat_stream impl (same counter + error as chat) so the mock can't silently panic on stream calls.
- Add should_call_inner_once_when_failfast_on_stream: exercises the chat_stream FailFast guard, asserts inner called exactly once.

All 40 retry tests pass in 0.01s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

FallbackProvider (chat + chat_stream) and ProviderChain (chat_inner + chat_stream) now return the primary error immediately when current_llm_call_policy() == FailFast, skipping all fallback/lane-switch logic. Normal-policy failover behavior is byte-for-byte unchanged. Tests: should_not_failover_when_failfast, should_not_failover_stream_when_failfast, should_failover_when_normal_policy (fallback.rs) and should_not_switch_lane_when_failfast, should_not_switch_lane_stream_when_failfast (failover.rs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
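The failover short-circuit can be sketched in the same spirit. For brevity the policy is passed in as an argument here rather than read from `current_llm_call_policy()`, and providers are modelled as plain closures; `Provider` and `chat_with_fallback` are hypothetical names, not the PR's `FallbackProvider` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// A provider modelled as a closure returning a reply or an error message.
pub type Provider<'a> = &'a dyn Fn() -> Result<String, String>;

/// Tries `primary`, then each fallback in order. Under FailFast the primary
/// error is returned immediately and no fallback is ever called.
pub fn chat_with_fallback(
    policy: LlmCallPolicy,
    primary: Provider<'_>,
    fallbacks: &[Provider<'_>],
) -> Result<String, String> {
    let primary_err = match primary() {
        Ok(reply) => return Ok(reply),
        Err(err) => err,
    };
    if policy == LlmCallPolicy::FailFast {
        return Err(primary_err);
    }
    // Normal policy: walk the fallbacks, remembering the most recent error.
    let mut last_err = primary_err;
    for provider in fallbacks {
        match provider() {
            Ok(reply) => return Ok(reply),
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```

Placing the check after the primary call and before the fallback loop leaves the normal-policy path untouched, consistent with the "byte-for-byte unchanged" claim in the commit message.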

Under LlmCallPolicy::FailFast, chat() now skips hedged_chat entirely (no proactive double-provider fire) and both chat() and chat_stream() return immediately on the first error without trying the next slot. Normal-policy hedge and failover behavior is byte-for-byte unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…lFast Add `&& current_llm_call_policy() != LlmCallPolicy::FailFast` guard to the image-modality 400 fallback in both `chat` and `chat_stream`. Under FailFast the condition falls to the else branch that converts the 400 into a typed LlmError and returns immediately — no second HTTP POST. Normal behaviour (retry text-only on image-modality 400) is unchanged. TDD: two wiremock tests (chat + chat_stream) assert exactly 1 request reaches the mock server when FailFast is active. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Under `LlmCallPolicy::FailFast`, `call_llm_with_hooks` now runs at most one attempt (retry_max=0) and skips the non-streaming fallback in both the stream-error branch and the empty-response branch. Normal policy behaviour is byte-for-byte unchanged. Adds two inline tests that assert chat_stream=1 and chat=0 under FailFast for both failure modes (stream error and empty response). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…2nd call, hook-deny excluded Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…iew/voice-experiments-merge-check # Conflicts: # crates/octos-cli/src/api/ui_protocol.rs # crates/octos-cli/src/api/voice_turn.rs # crates/octos-core/src/ui_protocol.rs

…into review/voice-experiments-merge-check

BH3GEI · 2026-06-25T23:04:22Z

Deployment and validation update:

Deployed the PR branch build to the current octos.mofa.ai backend bundle.
Deployed binary: octos 1.1.0 (207a4a8 2026-06-26).
Previous bundle was backed up at /Users/mac/repos/octos-dev-state/bundle/octos.backup-20260626-065005-pre-voice-experiments.
LaunchAgent io.octos.dev-backend is running with the new bundle.
https://octos.mofa.ai/login returns HTTP 200.
Authenticated /api/my/profile returns admin profile running.
OMiniX runtime is healthy, service_registered/service_running true, voice_models_ready true, ASR and TTS models ready.

Local Rust validation:

cargo fmt --check: passed
git diff --check: passed
octos-core voice/audio_chunk round-trip: 1 passed, 0 failed
octos-cli voice_turn: 57 passed, 0 failed
octos-cli volcano_ws: 7 passed, 0 failed, 1 live Volcano test requires VOLC_TTS_APPID/VOLC_TTS_TOKEN and was not runnable on this machine
octos-llm failfast: 12 passed, 0 failed
octos-agent failfast: 5 passed, 0 failed
octos-cli ui_protocol: 532 passed, 0 failed, 2 ignored
octos-cli ui_protocol ignored tests with OCTOS_LEDGER_SOAK=1: 2 passed, 0 failed

Frontend / deployed smoke:

Playwright against https://octos.mofa.ai with system Chrome: 26 passed, 1 failed because wake-word-model.spec imports /src/home/voice/wake-word-model.ts, which is only available under Vite dev and not in the production build.
The production Home auto-listen wake phrase test passed against https://octos.mofa.ai.
The source-level wake-word model score test passed against local Vite: 1 passed, 0 failed.

GitHub checks:

Required CI checks reported by gh pr checks are passing.
PR remains blocked only by review requirement.

alan0x and others added 20 commits June 22, 2026 16:31

chore(voice): sync Cargo.lock for ws/rustls deps

9798f9c

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(llm): add per-turn LlmCallPolicy task-local

1f77bfb

feat(agent): add TurnFailure projection + voice empty-response check

e8d2346

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(agent): FailFast LLM bail emits TurnFailure (classify once), no …

60d0f0f

…2nd call, hook-deny excluded Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Merge remote-tracking branch 'origin/perf/voice-tts-latency' into rev…

2c9b2f9

…iew/voice-experiments-merge-check # Conflicts: # crates/octos-cli/src/api/ui_protocol.rs # crates/octos-cli/src/api/voice_turn.rs # crates/octos-core/src/ui_protocol.rs

Merge remote-tracking branch 'origin/feat/voice-llm-error-fail-fast' …

53b3621

…into review/voice-experiments-merge-check

fix(voice): format merged voice TTS client

014ed66

fix(voice): align Volcano client reuse test

207a4a8

BH3GEI wants to merge 20 commits into main from review/voice-experiments-merge-check
