fix(voice): hold mic-mute window open until response stream ends #318
Merged
Before: playNextAudio() drained the audio queue with an 800ms debounce.
Under load, Groq Orpheus has been observed taking 22-25s per TTS chunk;
the queue empties, STT resumes, the mic captures the late-arriving
chunk as user speech.
After: while the server response stream is still open, drain wait
extends to 30s. Stream `finally{}` clears the flag and collapses any
pending long-wait drain back to the short 800ms window so the mic
returns promptly after the response completes.
Rationale, monitoring checklist, and rollback steps:
docs/reference/voice-flow-mic-mute-drain.md
SpeechRecognition lifecycle, AudioContext, queue ordering, PTT, and
wake-word paths are unchanged. Single-commit revert restores prior
behavior exactly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
When a streamed response includes multiple TTS chunks, the audio queue can briefly empty between chunks. With Groq Orpheus generation latency reaching 22-25s per chunk under load, the existing 800ms drain timer fired well before the next chunk arrived → STT resumed → mic captured the late-arriving TTS audio as user speech.
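The timing mismatch described above can be checked with a few lines of arithmetic. This is only an illustrative model: `sttResumesEarly` and the constant names are hypothetical and do not appear in `src/app.js`; the numbers are the ones quoted in this description.

```javascript
// Hypothetical model of the race: the drain timer starts when the audio
// queue empties, and STT resumes if it fires before the next chunk arrives.
function sttResumesEarly(chunkGapMs, drainMs) {
  return drainMs < chunkGapMs;
}

const DRAIN_SHORT_MS = 800;            // existing debounce
const observedGapsMs = [22000, 25000]; // per-chunk latency range under load

for (const gapMs of observedGapsMs) {
  // With an 800ms drain, both observed gaps let STT resume too early.
  console.log(`gap=${gapMs}ms resumesEarly=${sttResumesEarly(gapMs, DRAIN_SHORT_MS)}`);
}
```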
This PR extends `playNextAudio()`'s drain window to 30s while the response stream is still open (`_streamingResponseActive = true`). When the stream ends, `finally{}` clears the flag and collapses any pending long-wait drain back to the 800ms window so the mic returns promptly.
Incident
User report (2026-05-19, 1381-char response):
Scope
Touches only `ClawdbotMode` in `src/app.js`:
- `_streamingResponseActive = false`
- `_streamingResponseActive = true` immediately before `fetch(?stream=1)`
- `playNextAudio()` empty-queue branch: use 30000ms drain when flag is true, 800ms otherwise
- `finally{}`: clear flag + collapse pending long drain to short window
Untouched (and intentionally so):
- `SpeechRecognition` lifecycle in `WebSpeechSTT.js` (single instance, no destroy/recreate)
- `VoiceSession.onSpeakingChange` handler
- `TTSPlayer` queue ordering and AudioContext path
- `_textDoneReceived` interject guard
Rationale + monitoring checklist
`docs/reference/voice-flow-mic-mute-drain.md` (added in this PR): symptom, root cause, why 30s, what to monitor post-deploy, rollback steps.
Rollback
Single commit. `git revert 9c0952e` restores prior behavior exactly (every drain falls back to 800ms; new flag remains declared but unused).
Risk
If `_streamingResponseActive` ever leaks `true` after a stream ends, the mic stays muted indefinitely and appears dead to the user. Guards: `finally{}` clears the flag unconditionally, and the existing 60s `INACTIVITY_TIMEOUT_MS` aborts the request and triggers `finally{}` if the stream truly hangs.
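For reviewers who want the shape of the flag handling in one place, here is a minimal sketch of the drain selection and the `finally{}` collapse. It is not the code in `src/app.js`: `DrainGate`, `armDrain`, `beginStream`, `endStream`, `withStream`, and the injectable `timers` argument are all hypothetical names chosen for illustration, and the 60s `INACTIVITY_TIMEOUT_MS` abort path is not modeled.

```javascript
// Sketch only. Apart from _streamingResponseActive, every name here is
// hypothetical and chosen for illustration.
const DRAIN_SHORT_MS = 800;
const DRAIN_LONG_MS = 30000;

const realTimers = {
  setTimeout: (fn, ms) => setTimeout(fn, ms),
  clearTimeout: (id) => clearTimeout(id),
};

class DrainGate {
  constructor(onDrained, timers = realTimers) {
    this._streamingResponseActive = false;
    this._onDrained = onDrained; // e.g. resume STT / unmute the mic
    this._timers = timers;
    this._drainTimer = null;
    this._drainMs = 0;
  }

  // Call when the audio queue is found empty.
  armDrain() {
    this._schedule(this._streamingResponseActive ? DRAIN_LONG_MS : DRAIN_SHORT_MS);
  }

  beginStream() {
    this._streamingResponseActive = true;
  }

  // Intended to run from a finally block, so it executes on success,
  // error, or abort alike.
  endStream() {
    this._streamingResponseActive = false;
    // Collapse a pending long drain to the short window.
    if (this._drainTimer !== null && this._drainMs === DRAIN_LONG_MS) {
      this._schedule(DRAIN_SHORT_MS);
    }
  }

  _schedule(ms) {
    if (this._drainTimer !== null) this._timers.clearTimeout(this._drainTimer);
    this._drainMs = ms;
    this._drainTimer = this._timers.setTimeout(() => {
      this._drainTimer = null;
      this._onDrained();
    }, ms);
  }
}

// Usage shape: the flag is cleared even if consuming the stream throws.
async function withStream(gate, consume) {
  gate.beginStream();
  try {
    await consume();
  } finally {
    gate.endStream();
  }
}
```

Putting the reset in `endStream()` and calling it only from `finally` keeps the flag from staying `true` after a failed or aborted stream, which is the leak this section is concerned with.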