Merge voice latency and fail-fast experiments #1504

Open
BH3GEI wants to merge 20 commits into main from review/voice-experiments-merge-check

BH3GEI commented Jun 25, 2026

Collaborator

Summary

  • Merge the useful parts of the voice latency experiments: Volcano ws_binary streaming TTS, voice/audio_chunk UI protocol events, shared no-redirect cloud TTS client.
  • Merge fail-fast LLM policy handling so voice turns do not retry, fail over, or hedge after fail-fast errors, and surface TurnFailure consistently.
  • Keep the current main branch security boundary for per-profile cloud TTS config, HTTPS-only Volcano endpoints, endpoint allowlist, and redirects disabled.

Branches reviewed

  • Merged: origin/perf/voice-tts-latency
  • Merged: origin/feat/voice-llm-error-fail-fast
  • Not merged separately: origin/perf/voice-cloud-tts-shared-client, subsumed by the latency branch and reconciled with main security handling.
  • Not merged: origin/test/voice-perf-plus-selection, conflicts heavily and the core voice selection/perf pieces are already present on main.
  • Not merged: origin/fix/tts-final-syllable-truncation, already present on main.

Validation

  • cargo fmt --check
  • git diff --check
  • cargo test -p octos-core --lib voice_audio_chunk_round_trips_through_rpc_notification: 1 passed, 0 failed
  • cargo test -p octos-cli --features api voice_turn: 57 passed, 0 failed
  • cargo test -p octos-cli --features api volcano_ws: 7 passed, 0 failed, 1 ignored live Volcano test requiring VOLC_TTS_APPID/VOLC_TTS_TOKEN
  • cargo test -p octos-llm --lib failfast: 12 passed, 0 failed
  • cargo test -p octos-agent --lib failfast: 5 passed, 0 failed
  • cargo test -p octos-cli --features api ui_protocol: 532 passed, 0 failed, 2 ignored
  • OCTOS_LEDGER_SOAK=1 cargo test -p octos-cli --features api ui_protocol -- --ignored --nocapture: 2 passed, 0 failed

alan0x and others added 20 commits June 22, 2026 16:31
Per-sentence reply synthesis built a fresh `reqwest::Client` on every
call, so each sentence paid a new TCP+TLS handshake to bytedance. Share
one process-wide client via `OnceLock` so reqwest pools connections
(keep-alive) across the sentence-streamed TTS path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
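
The shared-client change in the commit above can be sketched with `std::sync::OnceLock` alone. This is a minimal illustration, not the code from the branch: `PooledClient`, `shared_client`, and `build_count` are hypothetical names, and `PooledClient` stands in for `reqwest::Client` so the example needs no external crates.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the client is constructed (for demonstration only).
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for an HTTP client that owns a connection pool.
/// The commit message names `reqwest::Client` as the real type.
pub struct PooledClient;

impl PooledClient {
    fn build() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        PooledClient
    }
}

/// One process-wide slot; initialised lazily on first use.
static SHARED_CLIENT: OnceLock<PooledClient> = OnceLock::new();

/// Returns the same client on every call, so per-sentence synthesis
/// would reuse pooled connections instead of building a new client.
pub fn shared_client() -> &'static PooledClient {
    SHARED_CLIENT.get_or_init(PooledClient::build)
}

/// Number of times the client has been built so far.
pub fn build_count() -> usize {
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

Calling `shared_client()` once per sentence returns the same `&'static` reference each time, which is what lets a pooling client keep its connections alive between sentences.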
Path X for cloud-TTS latency: keep the BV001 voice but switch from the
non-streaming HTTP `query` to the v1 ws_binary `submit` protocol, where
the server streams audio chunks as it synthesizes. The V3 大模型 endpoint
rejects BV001 (verified: resource/speaker mismatch), so v1 ws is the
right surface for this voice; auth/body/cluster are identical to the
existing HTTP path.

Self-contained `volcano_ws` module:
- encode_request_frame / parse_server_frame / build_submit_payload —
  pure binary-protocol logic, unit-tested (incl. truncation safety).
- synthesize_ws — builds an explicit rustls TLS connector (ring provider
  + native roots, mirroring the octos-bus WS channels, since the process
  installs no global CryptoProvider), opens the WebSocket, streams audio
  frames until the negative-sequence final chunk, and writes one file
  (drop-in for synthesize_volcano). Buffering to a complete file for now;
  streaming chunks onward to the client is the next step (⑤).
- a #[ignore] live test against the real endpoint (needs VOLC_TTS_*).
  Verified live: BV001 streams over ws into valid audio.

tokio-tungstenite/rustls/rustls-native-certs become api-gated deps. Not
yet wired into synthesize_reply — that lands with the streaming delivery.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
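
To make the "truncation safety" and "negative-sequence final chunk" points concrete, here is a small sketch of a length-checked frame parser. The frame layout below (`seq` as a big-endian `i32`, then a big-endian `u32` length, then the payload) is invented for illustration and is not the Volcano ws_binary wire format; `AudioFrame`, `FrameError`, `encode_frame`, and `parse_frame` are hypothetical names rather than the functions in `volcano_ws`.

```rust
/// Hypothetical frame layout, for illustration only (NOT the Volcano format):
/// [seq: i32 big-endian][len: u32 big-endian][payload: len bytes]
#[derive(Debug, PartialEq)]
pub struct AudioFrame {
    pub seq: i32,
    pub payload: Vec<u8>,
}

impl AudioFrame {
    /// A negative sequence number marks the final chunk of the stream.
    pub fn is_final(&self) -> bool {
        self.seq < 0
    }
}

#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// Fewer than the 8 header bytes were available.
    TruncatedHeader,
    /// The header promised more payload bytes than were present.
    TruncatedPayload { expected: usize, got: usize },
}

/// Serialises one frame in the hypothetical layout above.
pub fn encode_frame(seq: i32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&seq.to_be_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Parses one frame, returning an error instead of panicking on short input.
/// Bytes after the declared payload length are ignored in this sketch.
pub fn parse_frame(bytes: &[u8]) -> Result<AudioFrame, FrameError> {
    if bytes.len() < 8 {
        return Err(FrameError::TruncatedHeader);
    }
    let seq = i32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let len = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
    let body = &bytes[8..];
    if body.len() < len {
        return Err(FrameError::TruncatedPayload { expected: len, got: body.len() });
    }
    Ok(AudioFrame { seq, payload: body[..len].to_vec() })
}
```

Keeping the parser a pure function over `&[u8]` is what makes it unit-testable without a socket, which matches how the commit describes the module.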
Split synthesize_ws into a streaming core, synthesize_ws_stream, which
invokes an on_chunk callback per audio frame as it arrives, and a thin
collect→file wrapper (synthesize_ws) over it. Transport-agnostic: the
collect→file path keeps ④a working (live test still green), and the ⑤
push-to-client path (audio frames over the UI WebSocket) will consume
synthesize_ws_stream directly. No behaviour change to the file path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
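
The split described above, a streaming core plus a thin collecting wrapper, might look roughly like the following. This is a synchronous sketch under stated assumptions: an iterator of frames stands in for the WebSocket receive loop, errors are plain `String`s, and `stream_frames` / `collect_frames` are hypothetical names, not the signatures of `synthesize_ws_stream` / `synthesize_ws`.

```rust
/// Streaming core: invokes `on_chunk` once per audio frame as it arrives.
/// `frames` is an assumed stand-in for reading frames off a WebSocket.
/// Returns the number of frames delivered, or the first transport error.
pub fn stream_frames<I, F>(frames: I, mut on_chunk: F) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
    F: FnMut(&[u8]),
{
    let mut delivered = 0;
    for frame in frames {
        let bytes = frame?; // stop at the first transport error
        on_chunk(&bytes);
        delivered += 1;
    }
    Ok(delivered)
}

/// Thin wrapper: reuses the streaming core and buffers every chunk,
/// mirroring the collect-to-file path (here collected into memory).
pub fn collect_frames<I>(frames: I) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
{
    let mut audio = Vec::new();
    stream_frames(frames, |chunk| audio.extend_from_slice(chunk))?;
    Ok(audio)
}
```

Because the wrapper only supplies a callback, a push-to-client consumer can call the core directly with a different callback and leave the buffering path untouched.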
Server-protocol half of streamed voice-reply audio (transport B). Adds:
- VoiceAudioChunkEvent { session_id, topic, turn_id, segment_id, seq,
  mime, audio_b64, last } + a VoiceAudioChunk variant on UiNotification,
  wired through method()/session_id()/topic()/stamp_topic/params/decode.
- methods::VOICE_AUDIO_CHUNK ("voice/audio_chunk") in the notification
  methods list.
- UI_PROTOCOL_FEATURE_VOICE_AUDIO_V1 ("event.voice_audio.v1") in the
  known-features registry; gates emission so clients that don't negotiate
  it keep getting whole-file file/attached audio.
- round-trip test + updated capability goldens.

Chunks sharing a segment_id form one playable utterance (one reply
sentence); seq orders them, last marks the segment end. The cli emit
path + frontend MSE playback follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
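
As an illustration of how `segment_id`, `seq`, and `last` fit together, a receiver could reassemble one utterance along these lines. This is a hedged sketch: only the field names come from the commit message, while the field types, the already-decoded `audio` bytes (in place of `audio_b64`), and the `assemble_segment` helper are assumptions.

```rust
/// Receiver-side view of one chunk. Field names follow the commit message;
/// the types are assumptions, and `audio` holds bytes already decoded
/// from `audio_b64`.
#[derive(Debug, Clone)]
pub struct VoiceAudioChunk {
    pub segment_id: String,
    pub seq: u32,
    pub audio: Vec<u8>,
    pub last: bool,
}

/// Returns the concatenated audio for `segment_id` once the chunk flagged
/// `last` has arrived, ordering chunks by `seq`. Returns `None` while the
/// segment is still incomplete. Gaps in `seq` are not detected here.
pub fn assemble_segment(chunks: &[VoiceAudioChunk], segment_id: &str) -> Option<Vec<u8>> {
    let mut parts: Vec<&VoiceAudioChunk> = chunks
        .iter()
        .filter(|c| c.segment_id == segment_id)
        .collect();
    parts.sort_by_key(|c| c.seq);

    // The highest-seq chunk must carry the `last` flag.
    let complete = parts.last().map_or(false, |c| c.last);
    if !complete {
        return None;
    }

    let mut audio = Vec::new();
    for part in &parts {
        audio.extend_from_slice(&part.audio);
    }
    Some(audio)
}
```

A progressive player would not wait for `last` before starting playback; the helper only shows how the three fields define one playable utterance.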
Add the `voice_audio` per-connection capability to ConnectionUiFeatures:
negotiated from `event.voice_audio.v1` (header/query), advertised back in
the requested-features set, and enforced in the live/replay capability
filter so `voice/audio_chunk` notifications only reach connections that
opted into progressive playback (others keep whole-file `file/attached`).
Wire the new UiNotification variant through the cli's exhaustive matches
(cursor extraction = non-cursor-bearing; ledger session-id lookup).

No emitter yet — the voice worker push + frontend MSE playback follow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
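
The negotiation and filter described above could be sketched as follows. The feature string and method name are taken from the commit messages; `ConnectionFeatures`, `negotiate`, and `should_deliver` are hypothetical simplifications, and the sketch assumes every other notification is delivered unconditionally, which the real filter may not do.

```rust
/// Feature and method names as given in the commit messages.
pub const FEATURE_VOICE_AUDIO_V1: &str = "event.voice_audio.v1";
pub const METHOD_VOICE_AUDIO_CHUNK: &str = "voice/audio_chunk";

/// Simplified per-connection capability set (hypothetical shape).
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ConnectionFeatures {
    pub voice_audio: bool,
}

/// Derives the capability set from the features a client requested,
/// e.g. via a header or query parameter.
pub fn negotiate(requested: &[&str]) -> ConnectionFeatures {
    ConnectionFeatures {
        voice_audio: requested.iter().any(|f| *f == FEATURE_VOICE_AUDIO_V1),
    }
}

/// Capability filter: chunk notifications reach only connections that
/// opted in. Assumption: all other methods pass through in this sketch.
pub fn should_deliver(method: &str, features: ConnectionFeatures) -> bool {
    match method {
        METHOD_VOICE_AUDIO_CHUNK => features.voice_audio,
        _ => true,
    }
}
```

A connection that never requested `event.voice_audio.v1` therefore never sees `voice/audio_chunk` and keeps receiving whole-file `file/attached` audio.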
…(B-2)

The voice-turn worker now pushes cloud TTS audio progressively instead of
only writing a whole file. For each reply sentence, when the connection
negotiated event.voice_audio.v1, it drives the v1 ws `submit` stream and
emits one `voice/audio_chunk` (base64 frame, per-sentence segment_id,
incrementing seq, last on the final frame) per audio chunk via the
ephemeral send path — so the client can play before the sentence finishes
synthesizing. Falls through to the existing whole-file `file/attached`
path when the client didn't negotiate, the turn isn't cloud-routed, or
the stream fails.

- voice_turn::synthesize_reply_streaming: cloud-only streaming entry that
  threads the VOLC_TTS_* config into volcano_ws::synthesize_ws_stream and
  passes the audio mime to the chunk callback.
- synthesize_ws_stream callback gains an is_last flag.
- worker captures ws.clone() + features.voice_audio to emit per chunk.

Live-verified end to end against BV001.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
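
The fall-through rule in this commit (stream only when negotiated and cloud-routed, otherwise or on failure use the whole-file path) can be condensed into one decision function. This is an illustrative sketch: `Delivery` and `deliver_sentence` are hypothetical names, and the `stream` closure stands in for driving the ws `submit` stream and emitting chunks.

```rust
/// Which path delivered the sentence audio (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// Progressive path; carries the number of chunks emitted.
    StreamedChunks(usize),
    /// Existing whole-file `file/attached` path.
    WholeFile,
}

/// Tries the streaming path only when the connection negotiated
/// `event.voice_audio.v1` and the turn is cloud-routed. Any stream
/// error falls through to the whole-file path.
pub fn deliver_sentence<S>(negotiated: bool, cloud_routed: bool, stream: S) -> Delivery
where
    S: FnOnce() -> Result<usize, String>,
{
    if negotiated && cloud_routed {
        if let Ok(chunks) = stream() {
            return Delivery::StreamedChunks(chunks);
        }
    }
    Delivery::WholeFile
}
```

Note that the closure is never invoked when either precondition is false, so a client that did not opt in incurs no streaming work.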
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add FailFast guard to chat() and chat_stream() retry loops: when
current_llm_call_policy() == FailFast, return the first error immediately
without any backoff or retry. Normal-policy behavior is byte-for-byte unchanged.
Tests: should_call_inner_once_when_failfast_even_if_retryable,
should_retry_when_normal_policy (CountingProvider mock, always-429).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
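
A retry loop with the FailFast guard described above might be shaped like this. It is a simplified, synchronous sketch: the policy is passed as a parameter rather than read from `current_llm_call_policy()`, errors are plain `String`s, and `chat_with_retry` with its fixed backoff is a hypothetical stand-in for the real retry provider.

```rust
use std::time::Duration;

/// Mirrors the two policies named in the commit messages.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Calls `call` until it succeeds or retries are exhausted.
/// Under `FailFast` the first error is returned immediately,
/// with no backoff sleep and no second attempt.
pub fn chat_with_retry<F>(
    policy: LlmCallPolicy,
    max_retries: u32,
    backoff: Duration,
    mut call: F,
) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
{
    let mut attempt = 0;
    loop {
        match call() {
            Ok(reply) => return Ok(reply),
            Err(err) => {
                if policy == LlmCallPolicy::FailFast || attempt >= max_retries {
                    return Err(err);
                }
                attempt += 1;
                std::thread::sleep(backoff);
            }
        }
    }
}
```

With an always-failing provider, `FailFast` results in exactly one inner call, while `Normal` with `max_retries = 3` results in four, which is the same property the counting-mock tests in the commit assert.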
- should_retry_when_normal_policy: switch from rate_limited(Some(0)) to
  ServerError{503} so rate_limit_delay returns None and calculate_delay
  uses the configured 1-2ms delays (was 3×30s=90s, now <1ms).
- CountingProvider: add chat_stream impl (same counter + error as chat)
  so the mock can't silently panic on stream calls.
- Add should_call_inner_once_when_failfast_on_stream: exercises the
  chat_stream FailFast guard, asserts inner called exactly once.

All 40 retry tests pass in 0.01s.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FallbackProvider (chat + chat_stream) and ProviderChain (chat_inner +
chat_stream) now return the primary error immediately when
current_llm_call_policy() == FailFast, skipping all fallback/lane-switch
logic. Normal-policy failover behavior is byte-for-byte unchanged.
Tests: should_not_failover_when_failfast, should_not_failover_stream_when_failfast,
should_failover_when_normal_policy (fallback.rs) and
should_not_switch_lane_when_failfast, should_not_switch_lane_stream_when_failfast
(failover.rs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
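
The failover guard follows the same idea one layer up: under FailFast the primary error is surfaced and the fallback is never consulted. The sketch below is an assumption-laden simplification, with closures in place of providers, `String` errors, the policy passed explicitly, and a hypothetical `chat_with_fallback` function rather than `FallbackProvider` itself.

```rust
/// Mirrors the two policies named in the commit messages.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Tries `primary`; on error, consults `fallback` only under `Normal`.
/// Under `FailFast` the primary error is returned unchanged and the
/// fallback closure is never invoked.
pub fn chat_with_fallback<P, S>(
    policy: LlmCallPolicy,
    primary: P,
    fallback: S,
) -> Result<String, String>
where
    P: FnOnce() -> Result<String, String>,
    S: FnOnce() -> Result<String, String>,
{
    match primary() {
        Ok(reply) => Ok(reply),
        Err(err) if policy == LlmCallPolicy::FailFast => Err(err),
        Err(_) => fallback(),
    }
}
```

Returning the primary error itself, rather than a wrapped or fallback error, keeps the failure a voice turn reports tied to the provider that was actually called.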
Under LlmCallPolicy::FailFast, chat() now skips hedged_chat entirely
(no proactive double-provider fire) and both chat() and chat_stream()
return immediately on the first error without trying the next slot.
Normal-policy hedge and failover behavior is byte-for-byte unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lFast

Add `&& current_llm_call_policy() != LlmCallPolicy::FailFast` guard to
the image-modality 400 fallback in both `chat` and `chat_stream`. Under
FailFast the condition falls to the else branch that converts the 400
into a typed LlmError and returns immediately — no second HTTP POST.

Normal behaviour (retry text-only on image-modality 400) is unchanged.

TDD: two wiremock tests (chat + chat_stream) assert exactly 1 request
reaches the mock server when FailFast is active.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Under `LlmCallPolicy::FailFast`, `call_llm_with_hooks` now runs at
most one attempt (retry_max=0) and skips the non-streaming fallback
in both the stream-error branch and the empty-response branch.
Normal policy behaviour is byte-for-byte unchanged.

Adds two inline tests that assert chat_stream=1 and chat=0 under
FailFast for both failure modes (stream error and empty response).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…2nd call, hook-deny excluded

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iew/voice-experiments-merge-check

# Conflicts:
#	crates/octos-cli/src/api/ui_protocol.rs
#	crates/octos-cli/src/api/voice_turn.rs
#	crates/octos-core/src/ui_protocol.rs
BH3GEI commented Jun 25, 2026

Collaborator Author

Deployment and validation update:

  • Deployed the PR branch build to the current octos.mofa.ai backend bundle.
  • Deployed binary: octos 1.1.0 (207a4a8 2026-06-26).
  • Previous bundle was backed up at /Users/mac/repos/octos-dev-state/bundle/octos.backup-20260626-065005-pre-voice-experiments.
  • LaunchAgent io.octos.dev-backend is running with the new bundle.
  • https://octos.mofa.ai/login returns HTTP 200.
  • Authenticated /api/my/profile returns admin profile running.
  • OMiniX runtime is healthy, service_registered/service_running true, voice_models_ready true, ASR and TTS models ready.

Local Rust validation:

  • cargo fmt --check: passed
  • git diff --check: passed
  • octos-core voice/audio_chunk round-trip: 1 passed, 0 failed
  • octos-cli voice_turn: 57 passed, 0 failed
  • octos-cli volcano_ws: 7 passed, 0 failed, 1 live Volcano test requires VOLC_TTS_APPID/VOLC_TTS_TOKEN and was not runnable on this machine
  • octos-llm failfast: 12 passed, 0 failed
  • octos-agent failfast: 5 passed, 0 failed
  • octos-cli ui_protocol: 532 passed, 0 failed, 2 ignored
  • octos-cli ui_protocol ignored tests with OCTOS_LEDGER_SOAK=1: 2 passed, 0 failed

Frontend / deployed smoke:

  • Playwright against https://octos.mofa.ai with system Chrome: 26 passed, 1 failed because wake-word-model.spec imports /src/home/voice/wake-word-model.ts, which is only available under Vite dev and not in the production build.
  • The production Home auto-listen wake phrase test passed against https://octos.mofa.ai.
  • The source-level wake-word model score test passed against local Vite: 1 passed, 0 failed.

GitHub checks:

  • Required CI checks reported by gh pr checks are passing.
  • PR remains blocked only by review requirement.

2 participants