fix(voice): hold mic-mute window open until response stream ends#318

Merged
MCERQUA merged 1 commit into dev from fix/mic-mute-hold-during-tts-pending (May 19, 2026)

MCERQUA (Owner) commented May 19, 2026

Summary

When a streamed response includes multiple TTS chunks, the audio queue can empty between chunks while the next one is still being generated. With Groq Orpheus generation latency reaching 22-25s per chunk under load, the existing 800ms drain timer fired long before the next chunk arrived → STT resumed → the mic captured the late-arriving TTS audio as user speech.

This PR extends playNextAudio()'s drain window to 30s while the response stream is still open (_streamingResponseActive = true). When the stream ends, finally{} clears the flag and collapses any pending long-wait drain back to the 800ms window so the mic returns promptly.
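A minimal sketch of the drain-window selection and the finally{} collapse, assuming a timer-based drain. The class name `MicMuteDrain`, the `resumeStt` callback, the constant names, and the injectable timer functions are hypothetical and not taken from src/app.js; only the 800ms/30s values and the flag semantics come from this PR.

```javascript
// Hypothetical sketch: names below are illustrative, not the real ClawdbotMode code.
// Only the 800 ms / 30 s values and the flag semantics come from the PR text.
const SHORT_DRAIN_MS = 800;
const LONG_DRAIN_MS = 30000;

class MicMuteDrain {
  // resumeStt: called when the mic may listen again (assumed callback).
  // timers: injectable so the logic can be tested without real waits.
  constructor(resumeStt, timers = {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  }) {
    this.resumeStt = resumeStt;
    this.timers = timers;
    this._streamingResponseActive = false;
    this.pending = null; // { handle, long } while a drain timer is armed
  }

  // Empty-queue branch of playNextAudio(): arm the drain timer.
  onQueueEmpty() {
    this.cancelDrain();
    const long = this._streamingResponseActive;
    const handle = this.timers.set(() => {
      this.pending = null;
      this.resumeStt();
    }, long ? LONG_DRAIN_MS : SHORT_DRAIN_MS);
    this.pending = { handle, long };
  }

  // A new TTS chunk started playing before the timer fired: keep the mic muted.
  cancelDrain() {
    if (this.pending) {
      this.timers.clear(this.pending.handle);
      this.pending = null;
    }
  }

  // Streaming handler finally{}: clear the flag, then collapse a pending
  // long-wait drain to the short window (re-armed from now, not from when
  // the queue first emptied).
  onStreamEnd() {
    this._streamingResponseActive = false;
    if (this.pending && this.pending.long) {
      this.onQueueEmpty(); // flag is now false, so this arms SHORT_DRAIN_MS
    }
  }
}
```

Re-arming the short timer from the moment the stream ends is one reasonable reading of "collapse"; an implementation could instead subtract the time already waited.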

Incident

User report (2026-05-19, 1381-char response):

Response complete (1381 chars, LLM: 39679ms)
🔊 Playing TTS (TTS: 0ms)
🔊 Playing TTS (TTS: 0ms)
                              ← 22-second silent gap, mic went hot
🔊 Playing TTS (TTS: 22021ms)
🔊 Playing TTS (TTS: 22068ms)
🔊 Playing TTS (TTS: 22069ms)
...

Scope

Touches only ClawdbotMode in src/app.js:

  • Constructor: declare _streamingResponseActive = false
  • Stream start: set _streamingResponseActive = true immediately before fetch(?stream=1)
  • playNextAudio() empty-queue branch: use 30000ms drain when flag is true, 800ms otherwise
  • Streaming handler finally{}: clear flag + collapse pending long drain to short window
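The flag lifecycle in these bullets can be sketched as a try/finally around the stream read. This is a hypothetical shape, not the actual ClawdbotMode handler: `runStreamingResponse`, `openStream` (standing in for the `fetch(?stream=1)` call plus body reading), `onChunk`, and `collapseDrain` are illustrative names.

```javascript
// Hypothetical sketch of the streaming handler's flag lifecycle.
// mode:        object standing in for ClawdbotMode (has _streamingResponseActive
//              and an assumed collapseDrain() method)
// openStream:  stands in for fetch(...?stream=1) plus reading the body;
//              must return an async iterable of text chunks
// onChunk:     e.g. hands each chunk to TTS (assumed)
async function runStreamingResponse(mode, openStream, onChunk) {
  // Set immediately before the request so any drain armed while the
  // response is in flight uses the long window.
  mode._streamingResponseActive = true;
  try {
    for await (const chunk of openStream()) {
      onChunk(chunk);
    }
  } finally {
    // Runs on normal completion, on network errors, and on abort,
    // so the flag cannot stay true after the stream is gone.
    mode._streamingResponseActive = false;
    mode.collapseDrain();
  }
}
```

Because the request itself is opened inside the `try`, even a failure to start the stream still reaches the `finally` and clears the flag.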

Untouched (and intentionally so):

  • SpeechRecognition lifecycle in WebSpeechSTT.js (single instance, no destroy/recreate)
  • VoiceSession.onSpeakingChange handler
  • TTSPlayer queue ordering and AudioContext path
  • PTT and wake-word flows
  • _textDoneReceived interject guard

Rationale + monitoring checklist

docs/reference/voice-flow-mic-mute-drain.md (added in this PR) — symptom, root cause, why 30s, what to monitor post-deploy, rollback steps.

Rollback

Single commit. git revert 9c0952e restores prior behavior exactly (every drain falls back to 800ms; new flag remains declared but unused).

Risk

  • If _streamingResponseActive ever leaks true after a stream ends, mic stays muted indefinitely → user mic appears dead. Guards: finally{} clears the flag unconditionally, plus the existing 60s INACTIVITY_TIMEOUT_MS will abort the request and trigger finally{} if the stream truly hangs.
  • 30s is longer than any real-world inter-chunk gap observed so far. If Orpheus latency degrades further, the queue may still sit empty past the 30s window; in that case the value can be bumped or made configurable, but a window > 30s starts approaching the 60s inactivity timeout, so it's better to fix the generator.
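The first risk relies on the existing 60s INACTIVITY_TIMEOUT_MS aborting a hung request so that finally{} runs. One common way to build such a watchdog is an AbortController whose timer is reset on every received chunk; the sketch below assumes that shape. `createInactivityWatchdog` and its return fields are hypothetical, and this is not necessarily how src/app.js implements the timeout.

```javascript
// Hypothetical inactivity watchdog: aborts if no chunk arrives for timeoutMs.
// INACTIVITY_TIMEOUT_MS mirrors the 60 s value named in the PR.
const INACTIVITY_TIMEOUT_MS = 60000;

function createInactivityWatchdog(timeoutMs = INACTIVITY_TIMEOUT_MS) {
  const controller = new AbortController();
  let timer = null;

  // (Re)start the countdown; called once up front and again per chunk.
  const arm = () => {
    clearTimeout(timer);
    timer = setTimeout(() => controller.abort(), timeoutMs);
  };
  arm();

  return {
    signal: controller.signal,       // pass as fetch(url, { signal })
    touch: arm,                      // call whenever a chunk is received
    stop: () => clearTimeout(timer), // call from finally{} once the stream ends
  };
}
```

With `fetch(url, { signal: watchdog.signal })`, an abort rejects the pending body read, so the streaming handler's finally{} runs and clears `_streamingResponseActive` even if the server never closes the stream.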

Before: playNextAudio() drained the audio queue with an 800ms debounce.
Under load, Groq Orpheus has been observed taking 22-25s per TTS chunk;
the queue empties, STT resumes, the mic captures the late-arriving
chunk as user speech.

After: while the server response stream is still open, drain wait
extends to 30s. Stream `finally{}` clears the flag and collapses any
pending long-wait drain back to the short 800ms window so the mic
returns promptly after the response completes.

Rationale, monitoring checklist, and rollback steps:
docs/reference/voice-flow-mic-mute-drain.md

SpeechRecognition lifecycle, AudioContext, queue ordering, PTT, and
wake-word paths are unchanged. Single-commit revert restores prior
behavior exactly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MCERQUA merged commit 7fabd8d into dev on May 19, 2026
2 checks passed