Merge voice latency and fail-fast experiments #1504
Open
BH3GEI wants to merge 20 commits into
Per-sentence reply synthesis built a fresh `reqwest::Client` on every call, so each sentence paid a new TCP+TLS handshake to ByteDance. Share one process-wide client via `OnceLock` so reqwest pools connections (keep-alive) across the sentence-streamed TTS path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
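A minimal sketch of the shared-client pattern described above. `HttpClient` is a hypothetical stand-in for `reqwest::Client` so the example compiles with the standard library alone; the build counter exists only to make the reuse observable and is not part of the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for `reqwest::Client` (assumption, not the PR's type).
#[derive(Debug)]
pub struct HttpClient {
    pub id: usize,
}

/// Counts how many clients were ever constructed, for illustration only.
static BUILDS: AtomicUsize = AtomicUsize::new(0);

impl HttpClient {
    fn build() -> Self {
        let id = BUILDS.fetch_add(1, Ordering::SeqCst) + 1;
        HttpClient { id }
    }
}

/// Process-wide client: constructed on first use, reused by every later call.
pub fn shared_client() -> &'static HttpClient {
    static CLIENT: OnceLock<HttpClient> = OnceLock::new();
    CLIENT.get_or_init(HttpClient::build)
}

/// Number of clients built so far.
pub fn build_count() -> usize {
    BUILDS.load(Ordering::SeqCst)
}
```

With the real crate the initializer would presumably be `reqwest::Client::new`; because the same instance is returned on every call, its connection pool is shared by every sentence.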
Path X for cloud-TTS latency: keep the BV001 voice but switch from the non-streaming HTTP `query` to the v1 ws_binary `submit` protocol, where the server streams audio chunks as it synthesizes. The V3 large-model (大模型) endpoint rejects BV001 (verified: resource/speaker mismatch), so v1 ws is the right surface for this voice; auth/body/cluster are identical to the existing HTTP path.
Self-contained `volcano_ws` module:
- encode_request_frame / parse_server_frame / build_submit_payload: pure binary-protocol logic, unit-tested (incl. truncation safety).
- synthesize_ws: builds an explicit rustls TLS connector (ring provider + native roots, mirroring the octos-bus WS channels, since the process installs no global CryptoProvider), opens the WebSocket, streams audio frames until the negative-sequence final chunk, and writes one file (drop-in for synthesize_volcano). Buffering to a complete file for now; streaming chunks onward to the client is the next step (⑤).
- a #[ignore] live test against the real endpoint (needs VOLC_TTS_*).
Verified live: BV001 streams over ws into valid audio. tokio-tungstenite/rustls/rustls-native-certs become api-gated deps. Not yet wired into synthesize_reply; that lands with the streaming delivery.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
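The frame helpers named above might look roughly like the sketch below. The byte layout (a 4-byte header, then big-endian length-prefixed payload, with a signed sequence number on server frames) and the header byte values are assumptions made for illustration, not taken from the PR or verified against the service. The only behaviours mirrored from the commit message are the negative-sequence final chunk and truncation safety.

```rust
/// One decoded server audio frame (illustrative type, not the PR's).
#[derive(Debug, PartialEq)]
pub struct AudioFrame {
    pub seq: i32,
    pub is_last: bool,
    pub audio: Vec<u8>,
}

/// Assumed fixed header length in bytes.
const HEADER_LEN: usize = 4;

/// Reads a big-endian u32 at `at`, or None if the buffer is too short.
fn read_u32(buf: &[u8], at: usize) -> Option<u32> {
    let b = buf.get(at..at.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Encodes a request: assumed header + payload length + payload.
pub fn encode_request_frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + 4 + payload.len());
    // Placeholder header bytes; the real values are protocol-defined.
    out.extend_from_slice(&[0x11, 0x10, 0x10, 0x00]);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Parses a server frame: header, i32 sequence, u32 length, audio bytes.
/// Returns None on any truncated input instead of panicking.
pub fn parse_server_frame(buf: &[u8]) -> Option<AudioFrame> {
    let seq = read_u32(buf, HEADER_LEN)? as i32;
    let len = read_u32(buf, HEADER_LEN + 4)? as usize;
    let start = HEADER_LEN + 8;
    let end = start.checked_add(len)?;
    let audio = buf.get(start..end)?.to_vec();
    // A negative sequence number marks the final chunk.
    Some(AudioFrame { seq, is_last: seq < 0, audio })
}
```

Every read goes through `slice::get`, so a frame cut short at any point yields `None` rather than an out-of-bounds panic.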
Split synthesize_ws into a streaming core, synthesize_ws_stream, which invokes an on_chunk callback per audio frame as it arrives, and a thin collect→file wrapper (synthesize_ws) over it. Transport-agnostic: the collect→file path keeps ④a working (live test still green), and the ⑤ push-to-client path (audio frames over the UI WebSocket) will consume synthesize_ws_stream directly. No behaviour change to the file path.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
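The core/wrapper split can be sketched synchronously as below. The iterator of frames stands in for the WebSocket receive loop, and the function names and `String` error type are illustrative assumptions; the real functions are async and the wrapper writes a file rather than returning a buffer.

```rust
/// Streaming core: hands each audio frame to `on_chunk` as it arrives.
/// Returns the number of frames delivered, or the first error encountered.
pub fn synthesize_stream<I, F>(frames: I, mut on_chunk: F) -> Result<usize, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
    F: FnMut(&[u8]),
{
    let mut count = 0;
    for frame in frames {
        let bytes = frame?; // stop at the first transport error
        on_chunk(&bytes);
        count += 1;
    }
    Ok(count)
}

/// Thin wrapper: collects every chunk into one buffer.
pub fn synthesize_collect<I>(frames: I) -> Result<Vec<u8>, String>
where
    I: IntoIterator<Item = Result<Vec<u8>, String>>,
{
    let mut out = Vec::new();
    synthesize_stream(frames, |chunk| out.extend_from_slice(chunk))?;
    Ok(out)
}
```

Because the wrapper is only a callback that appends, a push-to-client consumer can reuse the same core with a callback that forwards each chunk instead.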
Server-protocol half of streamed voice-reply audio (transport B). Adds:
- VoiceAudioChunkEvent { session_id, topic, turn_id, segment_id, seq,
mime, audio_b64, last } + a VoiceAudioChunk variant on UiNotification,
wired through method()/session_id()/topic()/stamp_topic/params/decode.
- methods::VOICE_AUDIO_CHUNK ("voice/audio_chunk") in the notification
methods list.
- UI_PROTOCOL_FEATURE_VOICE_AUDIO_V1 ("event.voice_audio.v1") in the
known-features registry; gates emission so clients that don't negotiate
it keep getting whole-file file/attached audio.
- round-trip test + updated capability goldens.
Chunks sharing a segment_id form one playable utterance (one reply
sentence); seq orders them, last marks the segment end. The cli emit
path + frontend MSE playback follow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
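To illustrate how a consumer could use `segment_id`, `seq`, and `last`, here is a hedged reassembly sketch. The struct keeps only the fields needed for ordering (audio is shown already base64-decoded), `SegmentAssembler` is a hypothetical helper, and the assumption that `seq` starts at 0 and is contiguous is mine rather than something the commit states.

```rust
use std::collections::BTreeMap;

/// Trimmed-down chunk: only the fields needed for reassembly.
#[derive(Debug, Clone)]
pub struct VoiceAudioChunk {
    pub segment_id: String,
    pub seq: u32,
    pub audio: Vec<u8>, // already base64-decoded
    pub last: bool,
}

/// Hypothetical helper that groups chunks by segment and orders them by seq.
#[derive(Default)]
pub struct SegmentAssembler {
    pending: BTreeMap<String, BTreeMap<u32, (Vec<u8>, bool)>>,
}

impl SegmentAssembler {
    /// Adds a chunk; returns the whole utterance once every chunk from
    /// seq 0 up to the one flagged `last` is present (assumed numbering).
    pub fn push(&mut self, chunk: VoiceAudioChunk) -> Option<Vec<u8>> {
        let VoiceAudioChunk { segment_id, seq, audio, last } = chunk;
        let seg = self.pending.entry(segment_id.clone()).or_default();
        seg.insert(seq, (audio, last));
        // Find the seq of the chunk flagged `last`, if it has arrived yet.
        let last_seq = seg.iter().find(|(_, v)| v.1).map(|(k, _)| *k)?;
        if seg.len() as u64 != u64::from(last_seq) + 1 {
            return None; // still missing at least one earlier chunk
        }
        let done = self.pending.remove(&segment_id)?;
        // BTreeMap iterates in key order, so bytes come out ordered by seq.
        Some(done.into_values().flat_map(|(bytes, _)| bytes).collect())
    }
}
```

A progressive player would feed each chunk to the decoder immediately rather than waiting for `last`; the assembler only shows how the three fields relate.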
Add the `voice_audio` per-connection capability to ConnectionUiFeatures: negotiated from `event.voice_audio.v1` (header/query), advertised back in the requested-features set, and enforced in the live/replay capability filter so `voice/audio_chunk` notifications only reach connections that opted into progressive playback (others keep whole-file `file/attached`). Wire the new UiNotification variant through the cli's exhaustive matches (cursor extraction = non-cursor-bearing; ledger session-id lookup). No emitter yet; the voice worker push + frontend MSE playback follow.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
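The negotiate-then-filter behaviour can be sketched as follows. The feature and method strings come from the commit messages; `ConnectionFeatures`, `negotiate`, `should_deliver`, and the comma-separated request format are assumptions for illustration only.

```rust
/// Feature string from the PR's known-features registry.
pub const FEATURE_VOICE_AUDIO_V1: &str = "event.voice_audio.v1";

/// Illustrative per-connection feature set (the real type has more fields).
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct ConnectionFeatures {
    pub voice_audio: bool,
}

/// Parses a requested-features string. The comma-separated format is an
/// assumption; the PR only says the feature arrives via header or query.
pub fn negotiate(requested: &str) -> ConnectionFeatures {
    ConnectionFeatures {
        voice_audio: requested
            .split(',')
            .map(str::trim)
            .any(|f| f == FEATURE_VOICE_AUDIO_V1),
    }
}

/// Capability filter: audio chunks only reach connections that opted in;
/// every other notification method passes through unchanged.
pub fn should_deliver(method: &str, features: ConnectionFeatures) -> bool {
    match method {
        "voice/audio_chunk" => features.voice_audio,
        _ => true,
    }
}
```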
…(B-2)
The voice-turn worker now pushes cloud TTS audio progressively instead of only writing a whole file. For each reply sentence, when the connection negotiated event.voice_audio.v1, it drives the v1 ws `submit` stream and emits one `voice/audio_chunk` (base64 frame, per-sentence segment_id, incrementing seq, last on the final frame) per audio chunk via the ephemeral send path, so the client can play before the sentence finishes synthesizing. Falls through to the existing whole-file `file/attached` path when the client didn't negotiate, the turn isn't cloud-routed, or the stream fails.
- voice_turn::synthesize_reply_streaming: cloud-only streaming entry that threads the VOLC_TTS_* config into volcano_ws::synthesize_ws_stream and passes the audio mime to the chunk callback.
- synthesize_ws_stream callback gains an is_last flag.
- worker captures ws.clone() + features.voice_audio to emit per chunk.
Live-verified end to end against BV001.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
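The fall-through rules above can be expressed as a small decision function. This is a simplified sketch: it takes the frames as an already-collected `Vec` (the PR emits each chunk as it arrives), skips base64 encoding, starts `seq` at 0, and treats an empty stream as a failure. All of those choices, and the names `ChunkEvent`, `Delivery`, and `plan_delivery`, are assumptions.

```rust
/// Illustrative event carrying the ordering fields from the protocol.
#[derive(Debug, PartialEq)]
pub struct ChunkEvent {
    pub segment_id: String,
    pub seq: u32,
    pub last: bool,
    pub audio: Vec<u8>,
}

/// How one reply sentence ends up being delivered.
#[derive(Debug, PartialEq)]
pub enum Delivery {
    Streamed(Vec<ChunkEvent>),
    WholeFile,
}

/// Streams only when the client negotiated the feature, the turn is
/// cloud-routed, and the stream succeeded; otherwise uses the whole file.
pub fn plan_delivery(
    negotiated: bool,
    cloud_routed: bool,
    segment_id: &str,
    frames: Result<Vec<Vec<u8>>, String>,
) -> Delivery {
    if !negotiated || !cloud_routed {
        return Delivery::WholeFile;
    }
    let frames = match frames {
        // Assumption: an empty stream is treated like a failed one.
        Ok(f) if !f.is_empty() => f,
        _ => return Delivery::WholeFile,
    };
    let n = frames.len();
    let events = frames
        .into_iter()
        .enumerate()
        .map(|(i, audio)| ChunkEvent {
            segment_id: segment_id.to_string(),
            seq: i as u32,    // incrementing per chunk
            last: i + 1 == n, // set only on the final frame
            audio,
        })
        .collect();
    Delivery::Streamed(events)
}
```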
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add FailFast guard to chat() and chat_stream() retry loops: when current_llm_call_policy() == FailFast, return the first error immediately without any backoff or retry. Normal-policy behavior is byte-for-byte unchanged.
Tests: should_call_inner_once_when_failfast_even_if_retryable, should_retry_when_normal_policy (CountingProvider mock, always-429).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
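A minimal sketch of a retry loop with this guard. The policy is passed as a parameter here, whereas the PR reads it from `current_llm_call_policy()`; `call_with_retry` and the omitted backoff are illustrative assumptions.

```rust
/// Mirrors the policy names used in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Calls `call` until it succeeds or retries run out. Under FailFast the
/// first error is returned immediately, with no backoff and no retry.
pub fn call_with_retry<T, E>(
    policy: LlmCallPolicy,
    max_retries: u32,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) if policy == LlmCallPolicy::FailFast || attempt >= max_retries => {
                return Err(e);
            }
            Err(_) => {
                attempt += 1;
                // A real implementation would sleep for a backoff delay here.
            }
        }
    }
}
```

With `max_retries = 3`, an always-failing call runs four times under `Normal` and exactly once under `FailFast`, which is the property the commit's counting mock asserts.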
- should_retry_when_normal_policy: switch from rate_limited(Some(0)) to
ServerError{503} so rate_limit_delay returns None and calculate_delay
uses the configured 1-2ms delays (was 3×30s=90s, now <1ms).
- CountingProvider: add chat_stream impl (same counter + error as chat)
so the mock can't silently panic on stream calls.
- Add should_call_inner_once_when_failfast_on_stream: exercises the
chat_stream FailFast guard, asserts inner called exactly once.
All 40 retry tests pass in 0.01s.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FallbackProvider (chat + chat_stream) and ProviderChain (chat_inner + chat_stream) now return the primary error immediately when current_llm_call_policy() == FailFast, skipping all fallback/lane-switch logic. Normal-policy failover behavior is byte-for-byte unchanged.
Tests: should_not_failover_when_failfast, should_not_failover_stream_when_failfast, should_failover_when_normal_policy (fallback.rs) and should_not_switch_lane_when_failfast, should_not_switch_lane_stream_when_failfast (failover.rs).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
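The failover guard can be sketched the same way. Providers are modelled as plain closures and the policy is an explicit argument; both are simplifications, and returning the last fallback's error under `Normal` is an assumption rather than the PR's documented behaviour.

```rust
/// Mirrors the policy names used in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Tries the primary provider, then each fallback in order. Under FailFast
/// the primary's error is returned without consulting any fallback.
pub fn chat_with_fallback<T, E>(
    policy: LlmCallPolicy,
    primary: &dyn Fn() -> Result<T, E>,
    fallbacks: &[&dyn Fn() -> Result<T, E>],
) -> Result<T, E> {
    let primary_err = match primary() {
        Ok(v) => return Ok(v),
        Err(e) => e,
    };
    if policy == LlmCallPolicy::FailFast {
        return Err(primary_err);
    }
    let mut last_err = primary_err;
    for fallback in fallbacks {
        match fallback() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```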
Under LlmCallPolicy::FailFast, chat() now skips hedged_chat entirely (no proactive double-provider fire) and both chat() and chat_stream() return immediately on the first error without trying the next slot. Normal-policy hedge and failover behavior is byte-for-byte unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lFast
Add `&& current_llm_call_policy() != LlmCallPolicy::FailFast` guard to the image-modality 400 fallback in both `chat` and `chat_stream`. Under FailFast the condition falls to the else branch that converts the 400 into a typed LlmError and returns immediately, with no second HTTP POST. Normal behaviour (retry text-only on image-modality 400) is unchanged.
TDD: two wiremock tests (chat + chat_stream) assert exactly 1 request reaches the mock server when FailFast is active.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
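A hedged sketch of the guarded image-modality fallback. `HttpError`, `is_image_modality_400` (including its body-matching heuristic), and the `post(with_images)` closure are hypothetical stand-ins; only the shape of the guard follows the commit message.

```rust
/// Mirrors the policy names used in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Hypothetical HTTP error carrying the status and response body.
#[derive(Debug, PartialEq)]
pub struct HttpError {
    pub status: u16,
    pub body: String,
}

/// Hypothetical detector for "this model rejects image input" responses.
fn is_image_modality_400(e: &HttpError) -> bool {
    e.status == 400 && e.body.contains("image")
}

/// Posts the request; on an image-modality 400 it retries text-only,
/// unless the policy is FailFast, in which case the error is returned as-is.
pub fn chat_with_image_fallback(
    policy: LlmCallPolicy,
    has_images: bool,
    mut post: impl FnMut(bool) -> Result<String, HttpError>,
) -> Result<String, HttpError> {
    match post(has_images) {
        Err(e)
            if has_images
                && is_image_modality_400(&e)
                && policy != LlmCallPolicy::FailFast =>
        {
            post(false) // second POST, text-only
        }
        other => other,
    }
}
```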
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Under `LlmCallPolicy::FailFast`, `call_llm_with_hooks` now runs at most one attempt (retry_max=0) and skips the non-streaming fallback in both the stream-error branch and the empty-response branch. Normal policy behaviour is byte-for-byte unchanged.
Adds two inline tests that assert chat_stream=1 and chat=0 under FailFast for both failure modes (stream error and empty response).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
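The two skipped branches can be illustrated with a stripped-down stand-in for `call_llm_with_hooks`. Hooks and the retry count are omitted, and turning an empty response into an `"empty response"` error under FailFast is an assumption; the commit only states that the non-streaming fallback is skipped.

```rust
/// Mirrors the policy names used in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LlmCallPolicy {
    Normal,
    FailFast,
}

/// Tries the streaming call first. Under Normal policy a stream error or an
/// empty response falls back to the non-streaming call; under FailFast the
/// fallback is skipped and the failure is returned directly.
pub fn call_llm(
    policy: LlmCallPolicy,
    mut chat_stream: impl FnMut() -> Result<String, String>,
    mut chat: impl FnMut() -> Result<String, String>,
) -> Result<String, String> {
    let streamed = chat_stream();
    let needs_fallback = match &streamed {
        Ok(text) => text.is_empty(),
        Err(_) => true,
    };
    if !needs_fallback {
        return streamed;
    }
    if policy == LlmCallPolicy::FailFast {
        // Assumption: an empty response is reported as an error.
        return streamed.and_then(|_| Err("empty response".to_string()));
    }
    chat()
}
```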
…2nd call, hook-deny excluded
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iew/voice-experiments-merge-check
# Conflicts:
# crates/octos-cli/src/api/ui_protocol.rs
# crates/octos-cli/src/api/voice_turn.rs
# crates/octos-core/src/ui_protocol.rs
…into review/voice-experiments-merge-check
Deployment and validation update:
Local Rust validation:
Frontend / deployed smoke:
GitHub checks:
Summary
Branches reviewed
Validation