MCERQUA · MCERQUA · May 19, 2026 · May 19, 2026
diff --git a/docs/reference/voice-flow-mic-mute-drain.md b/docs/reference/voice-flow-mic-mute-drain.md
@@ -0,0 +1,73 @@
+# Voice Flow — Mic Mute Drain Window (2026-05-19)
+
+**Component:** `src/app.js` — ClawdbotMode streaming response handler + `playNextAudio()`
+**Branch / PR:** `fix/mic-mute-hold-during-tts-pending` → see GitHub PR
+**Origin incident:** 2026-05-19, mic captured tail of TTS audio as user speech mid-response.
+
+---
+
+## Symptom
+
+During a long streamed response, the user reported the mic was "off in timing with the actual audio playback" — STT picked up the end of the TTS being played, then the mic appeared hot before the next TTS chunk played. The Action Console showed:
+
+```
+Response complete (1381 chars, LLM: 39679ms)
+🔊 Playing TTS (TTS: 0ms)            ← chunk 1
+🔊 Playing TTS (TTS: 0ms)            ← chunk 2
+                                       ← 22-second silent gap
+🔊 Playing TTS (TTS: 22021ms)        ← chunk 3 (generation took 22s)
+🔊 Playing TTS (TTS: 22068ms)
+🔊 Playing TTS (TTS: 22069ms)
+🔊 Playing TTS (TTS: 23637ms)
+🔊 Playing TTS (TTS: 23644ms)
+🔊 Playing TTS (TTS: 23646ms)
+🔊 Playing TTS (TTS: 23650ms)
+```
+
+## Root Cause
+
+`ClawdbotMode.playNextAudio()` started an 800ms drain timer whenever the audio queue emptied. That window was sized as a debounce for short inter-sentence gaps, but Groq Orpheus TTS has been observed taking **22–25 seconds** to generate a single chunk under load. The timer therefore expired long before the next chunk arrived, and the still-empty queue triggered `onListening()` → `stt.resume()`, leaving the mic hot. When the late chunk finally played, the mic captured it as user speech.
+
+## Fix
+
+Extend the drain window dynamically based on whether the **server response stream is still open**:
+
+| Stream state | Drain wait |
+|---|---|
+| `_streamingResponseActive = true` (chunks may still arrive) | **30,000 ms** |
+| `_streamingResponseActive = false` (stream ended) | **800 ms** (unchanged) |
+
+New flag `_streamingResponseActive` (declared in constructor):
+- Set `true` immediately before the `fetch(?stream=1)` call
+- Set `false` in the streaming handler's `finally{}` block
+- When the stream ends and a long drain timer is pending against an empty queue, that block collapses the pending timer and re-invokes `playNextAudio()` so the short-window drain fires and the mic returns promptly
+
+## Why 30s
+
+Worst observed Orpheus gen latency in the incident was ~24s. 30s gives margin without crossing into "something's actually wrong" territory. If a chunk truly never arrives, the existing `INACTIVITY_TIMEOUT_MS = 60000` in the stream reader aborts the request, after which the `finally{}` clears `_streamingResponseActive` and the short-window drain releases the mic.
+
+## What did NOT change
+
+- SpeechRecognition lifecycle — still the same single instance, still `abort()` on mute, `start()` on resume. Per project rule, NEVER destroy/recreate SR instances.
+- The 800ms inter-sentence debounce is preserved for the normal case (post-stream drain).
+- AudioContext + queue ordering — untouched.
+- `_textDoneReceived` flag and interject logic — untouched.
+- PTT and wake-word flows — untouched.
+
+## Rollback
+
+Single commit. `git revert <sha>` returns the file to pre-fix behavior: every drain falls back to 800ms unconditionally, and the `_streamingResponseActive` flag is removed along with the code that sets and reads it.
+
+## Monitoring
+
+Things to watch after deploy:
+1. **Echo captures decrease** — search `[VoiceSession] Ignoring transcript during TTS` in browser logs. Should drop on long responses.
+2. **Mic-hot timing matches audio playback** — Action Console "Playing TTS" lines should always precede `LISTENING` status transitions for streamed responses with audio.
+3. **Stop button behavior** — should remain visible the entire time TTS is in-flight, even during 20+ second Orpheus generation gaps.
+4. **No new stuck-muted states**: if `_streamingResponseActive` ever leaks `true` after a stream ends, every later empty-queue drain would wait the full 30s instead of 800ms, so the mic would stay muted far longer than intended. The `finally{}` block and the 60s inactivity timeout both guard against this; verify by checking that long responses fully release the mic afterward.
+
+## Related
+
+- `src/providers/WebSpeechSTT.js` — `mute()` / `resume()` semantics (no changes here)
+- `src/core/VoiceSession.js` — `onSpeakingChange` handler (no changes here)
+- Server-side TTS chunk timing — see openclaw response chunking + Groq Orpheus provider in OpenVoiceUI/`tts_providers/`
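
The Action Console excerpt in the Symptom section can be checked against both drain windows with a small script. This is a minimal sketch, not part of the PR: `ttsLatencies` and `summarize` are hypothetical helper names, the line format is assumed to match the excerpt exactly, and treating the `TTS:` figure as the wait before each chunk is an interpretation of that excerpt.

```javascript
// Assumed line shape, taken from the Symptom excerpt:
//   "🔊 Playing TTS (TTS: 22021ms)"
const TTS_LINE = /Playing TTS \(TTS: (\d+)ms\)/;

// Extract the reported TTS figure (in ms) from every matching console line.
function ttsLatencies(lines) {
    return lines
        .map((line) => TTS_LINE.exec(line))
        .filter((match) => match !== null)
        .map((match) => Number(match[1]));
}

// Count how many chunks exceeded each drain window.
// 800 and 30000 are the two windows named in the Fix section.
function summarize(lines, shortDrainMs = 800, longDrainMs = 30000) {
    const latencies = ttsLatencies(lines);
    return {
        chunks: latencies.length,
        maxMs: latencies.length ? Math.max(...latencies) : 0,
        overShortDrain: latencies.filter((ms) => ms > shortDrainMs).length,
        overLongDrain: latencies.filter((ms) => ms > longDrainMs).length,
    };
}
```

Fed the incident lines, a non-zero `overShortDrain` with a zero `overLongDrain` would be consistent with the choice of 30s: the old window was exceeded, the new one was not.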
diff --git a/src/app.js b/src/app.js
@@ -3462,6 +3462,15 @@ connectAiradio();
                 // fetch. Checked in sendMessage() to fall through to the normal
                 // fresh-request path instead of interject.
                 this._textDoneReceived = false;
+                // Drain-timer extension: while the server response stream is still
+                // open, more TTS chunks may arrive after LONG gaps (Groq Orpheus
+                // has taken 22-25s per chunk under load). The default 800ms drain
+                // expires while the queue sits empty between chunks → STT resumes
+                // → mic captures the next chunk as user speech. While this flag is
+                // true, playNextAudio() uses a 30s drain instead. The stream's
+                // finally{} clears the flag, after which the normal 800ms drain
+                // releases the mic promptly. Origin: 2026-05-19 echo capture bug.
+                this._streamingResponseActive = false;
 
                 // Use shared STT instance instead of creating a new one
                 // This prevents conflicts with VoiceConversation's STT
@@ -3943,6 +3952,7 @@ connectAiradio();
                     const gatewayAgentId = localStorage.getItem('gateway_agent_id') || null;
                     this._fetchAbortController = new AbortController();
                     this._textDoneReceived = false;  // new stream — reset the race-window guard
+                    this._streamingResponseActive = true;  // stream open — TTS chunks may arrive with long gaps; see constructor note
                     const response = await fetch(`${this.config.serverUrl}/api/conversation?stream=1`, {
                         method: 'POST',
                         signal: this._fetchAbortController.signal,
@@ -4648,6 +4658,17 @@ connectAiradio();
                     if (_inactivityTimer) clearTimeout(_inactivityTimer);
                     this._sending = false;
                     this._fetchAbortController = null;
+                    // Stream is done. Future drain timer fires should use the short
+                    // 800ms wait again. If an extended-wait drain timer is currently
+                    // pending and the queue is empty, collapse it to the short window
+                    // so the mic returns promptly after the response ends.
+                    // See constructor note on _streamingResponseActive.
+                    this._streamingResponseActive = false;
+                    if (this._drainTimer && this.audioQueue.length === 0) {
+                        clearTimeout(this._drainTimer);
+                        this._drainTimer = null;
+                        this.playNextAudio();  // re-run drain logic with short wait now
+                    }
                     // Safety net: if no audio was queued/played, STT never gets restarted
                     // via onListening callback. Ensure mic comes back after a short delay.
                     // Only fires if call is still active (_voiceActive) — prevents restart after hang-up.
@@ -4924,6 +4945,16 @@ connectAiradio();
                     // Don't immediately transition to listening — more TTS chunks
                     // may be in-flight from streamed sentences. Wait briefly and
                     // check again so the stop button doesn't flash between sentences.
+                    //
+                    // 2026-05-19: extend the drain window while the server response
+                    // stream is still open. Groq Orpheus has been observed taking
+                    // 22-25 SECONDS to generate a single TTS chunk under load; the
+                    // old 800ms wait expires while the queue is empty between
+                    // chunks, STT resumes, and the mic captures the late chunk as
+                    // user speech (echo). While _streamingResponseActive is true,
+                    // wait up to 30s. The stream's finally{} clears the flag and
+                    // re-arms the short timer so the mic releases promptly.
+                    const drainMs = this._streamingResponseActive ? 30000 : 800;
                     if (!this._drainTimer) {
                         this._drainTimer = setTimeout(() => {
                             this._drainTimer = null;
@@ -4956,7 +4987,7 @@ connectAiradio();
                                     }, 600);
                                 }
                             }
-                        }, 800);  // 800ms grace period for next TTS chunk to arrive
+                        }, drainMs);
                     }
                     return;
                 }
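
The interaction between the drain timer and the stream's `finally{}` block can be exercised in isolation with a stripped-down model. This is a sketch under stated assumptions, not the code in `src/app.js`: `DrainModel`, `endStream`, `timers`, and `lastDrainMs` are hypothetical names, audio playback is reduced to shifting the queue, and the real timer callback contains additional logic (such as the nested 600ms timeout) that is omitted here.

```javascript
// Hypothetical stand-in for the drain logic in ClawdbotMode; not the real class.
const SHORT_DRAIN_MS = 800;    // inter-sentence debounce, from the diff
const LONG_DRAIN_MS = 30000;   // wait while the response stream is open, from the diff

class DrainModel {
    constructor(onListening, timers) {
        this.audioQueue = [];
        this._drainTimer = null;
        this._streamingResponseActive = false;
        this.onListening = onListening;   // assumed callback that un-mutes the mic
        // Injectable timers so the model can be driven without real delays.
        this.timers = timers || {
            set: (fn, ms) => setTimeout(fn, ms),
            clear: (id) => clearTimeout(id),
        };
        this.lastDrainMs = null;          // recorded only for inspection
    }

    playNextAudio() {
        if (this.audioQueue.length > 0) {
            this.audioQueue.shift();      // real code would play the chunk here
            return;
        }
        // Queue is empty: choose the drain window from the stream state.
        const drainMs = this._streamingResponseActive ? LONG_DRAIN_MS : SHORT_DRAIN_MS;
        if (!this._drainTimer) {
            this.lastDrainMs = drainMs;
            this._drainTimer = this.timers.set(() => {
                this._drainTimer = null;
                if (this.audioQueue.length === 0) this.onListening();
            }, drainMs);
        }
    }

    // Mirrors the finally{} hunk: clear the flag, then collapse a pending
    // long timer into the short window if nothing is queued.
    endStream() {
        this._streamingResponseActive = false;
        if (this._drainTimer && this.audioQueue.length === 0) {
            this.timers.clear(this._drainTimer);
            this._drainTimer = null;
            this.playNextAudio();
        }
    }
}
```

Passing a fake `timers` object that records `(fn, ms)` pairs makes it possible to check that a 30s timer armed mid-stream is replaced by an 800ms one as soon as `endStream()` runs, without waiting on a real clock.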