Summary
Add true streaming support to both TTS (Kokoro) and ASR (Whisper) backends while maintaining subprocess isolation for guaranteed memory cleanup.
Current State
| Backend | Input | Output | Subprocess Isolation |
|---|---|---|---|
| Kokoro TTS | Text | Full audio (wait for completion) | ✅ |
| Whisper ASR | Audio chunks (WebSocket) | Full transcription (wait for completion) | ✅ |
Proposed: Unified Streaming Architecture
Use multiprocessing.Queue for IPC to stream chunks while keeping subprocess isolation:
┌─────────────────────────────────────────────────────────────┐
│ Main Process                                                │
│  ┌─────────────┐    Queue (in)    ┌─────────────────────┐   │
│  │ API Layer   │ ───────────────► │                     │   │
│  │ (FastAPI)   │                  │ Subprocess Worker   │   │
│  │             │ ◄─────────────── │ (Model loaded)      │   │
│  └─────────────┘    Queue (out)   └─────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
| Backend | Input Queue | Output Queue |
|---|---|---|
| TTS (Kokoro) | Text + voice + speed | Audio chunks (~3s batches) |
| ASR (Whisper) | Audio chunks | Partial transcriptions |
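A minimal sketch of how each worker could shape its output before putting it on the output queue. The helper names batch_audio and partial_transcripts, the 24 kHz sample rate, and the ASR approach of re-transcribing a growing buffer are assumptions for illustration, not part of this proposal; audio is represented as plain float lists to keep the sketch dependency-free.

```python
from typing import Callable, Iterable, Iterator, List

SAMPLE_RATE = 24_000  # assumption: Kokoro's output sample rate in Hz
BATCH_SECONDS = 3.0   # "~3s batches" from the table above


def batch_audio(segments: Iterable[List[float]],
                sample_rate: int = SAMPLE_RATE,
                batch_seconds: float = BATCH_SECONDS) -> Iterator[List[float]]:
    """TTS side (hypothetical helper): merge generated segments into ~3 s batches.

    Whole segments (e.g. sentences) are kept together, so a batch is emitted
    once it reaches the target length; batches are ~3 s, not exactly 3 s.
    """
    target = int(sample_rate * batch_seconds)
    buffer: List[float] = []
    for segment in segments:
        buffer.extend(segment)
        if len(buffer) >= target:
            yield buffer  # worker would out_q.put(buffer) here
            buffer = []
    if buffer:  # flush the shorter final batch
        yield buffer


def partial_transcripts(chunks: Iterable[List[float]],
                        transcribe: Callable[[List[float]], str],
                        min_new_samples: int) -> Iterator[str]:
    """ASR side (hypothetical helper): re-transcribe the growing buffer
    every time at least min_new_samples of new audio has arrived."""
    buffer: List[float] = []
    new = 0
    for chunk in chunks:
        buffer.extend(chunk)
        new += len(chunk)
        if new >= min_new_samples:
            yield transcribe(buffer)  # partial result covering all audio so far
            new = 0
    if new:  # final pass over audio not yet covered by a partial result
        yield transcribe(buffer)
```

Re-transcribing the whole buffer costs more as the utterance grows; since Whisper works on 30-second windows, a real ASR worker would likely trim or window the buffer (for example at silence boundaries) rather than keep it unbounded.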
Benefits
- Lower latency: The first audio chunk (TTS) or partial transcript (ASR) is delivered before the whole request finishes
- Memory isolation preserved: Subprocess termination guarantees GPU memory release
- Unified pattern: Same architecture for TTS and ASR
- Code reuse: Shared StreamingSubprocessBackend base class
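A minimal sketch of what the shared StreamingSubprocessBackend base class could look like. Only the class name comes from this proposal; the _serve helper, the load_model/generate/stream/close methods, the None sentinel, and the toy WordBackend subclass are hypothetical. It defaults to the spawn start method because a forked child cannot reuse a CUDA context initialized in the parent.

```python
import multiprocessing as mp
from abc import ABC, abstractmethod
from typing import Any, Iterator

_DONE = None  # sentinel: end of one stream (out queue) or shutdown (in queue)


def _serve(backend_cls, in_q, out_q) -> None:
    """Subprocess entry point shared by every backend (hypothetical helper)."""
    model = backend_cls.load_model()  # the heavy model only ever lives in the child
    while (request := in_q.get()) is not _DONE:
        try:
            for chunk in backend_cls.generate(model, request):
                out_q.put(chunk)  # stream each chunk as soon as it exists
        except Exception as exc:
            out_q.put(exc)  # forward the failure so the caller does not hang
            continue
        out_q.put(_DONE)


class StreamingSubprocessBackend(ABC):
    """Shared base: subclasses only define how to load a model and stream output."""

    def __init__(self, start_method: str = "spawn") -> None:
        # spawn by default: a forked child cannot reuse the parent's CUDA context
        ctx = mp.get_context(start_method)
        self._in, self._out = ctx.Queue(), ctx.Queue()
        self._proc = ctx.Process(
            target=_serve, args=(type(self), self._in, self._out), daemon=True
        )
        self._proc.start()

    @staticmethod
    @abstractmethod
    def load_model() -> Any:
        """Load the model inside the subprocess (e.g. Kokoro or Whisper)."""

    @staticmethod
    @abstractmethod
    def generate(model: Any, request: Any) -> Iterator[Any]:
        """Yield output chunks (audio batches or partial transcripts)."""

    def stream(self, request: Any) -> Iterator[Any]:
        """Send one request and yield its chunks as the worker produces them."""
        self._in.put(request)
        while (chunk := self._out.get()) is not _DONE:
            if isinstance(chunk, Exception):
                raise chunk  # re-raise the worker's error in the main process
            yield chunk

    def close(self, timeout: float = 5.0) -> None:
        """Stop the worker; its exit is what releases the model's (GPU) memory."""
        self._in.put(_DONE)
        self._proc.join(timeout)
        if self._proc.is_alive():  # still busy or stuck after timeout: kill it
            self._proc.terminate()
            self._proc.join()


class WordBackend(StreamingSubprocessBackend):
    """Toy subclass for illustration: 'streams' a request word by word."""

    @staticmethod
    def load_model() -> Any:
        return str.upper  # stand-in for a real model object

    @staticmethod
    def generate(model: Any, request: Any) -> Iterator[Any]:
        for word in request.split():
            yield model(word)
```

With spawn, backends must be constructed under an `if __name__ == "__main__":` guard. A real version would also need to drain or discard leftover chunks when a client disconnects mid-stream, otherwise they would be read as part of the next request.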
Implementation Notes
- Kokoro-FastAPI achieves ~300ms first-token latency but uses an in-process model (no subprocess isolation)
- Our approach trades some latency (~3s batches) for guaranteed memory cleanup
- Could use multiprocessing.Manager().Queue() instead, since its proxy is picklable and can be passed to ProcessPoolExecutor tasks (a plain multiprocessing.Queue cannot)
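A minimal sketch of the Manager().Queue() option. A plain multiprocessing.Queue refuses to be pickled except when it is handed to a child at process start, so it cannot be passed to ProcessPoolExecutor.submit; a manager queue is a picklable proxy and can. The synthesize task and the run helper are hypothetical stand-ins for a TTS job.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from typing import List


def synthesize(text: str, out_q) -> None:
    """Hypothetical pool task: push chunks to the queue while it runs."""
    try:
        for word in text.split():
            out_q.put(word.upper())  # stand-in for an audio chunk
    finally:
        out_q.put(None)  # always end the stream, even if generation fails


def run(text: str, start_method: str = "spawn") -> List[str]:
    ctx = mp.get_context(start_method)
    with ctx.Manager() as manager, ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        out_q = manager.Queue()  # proxy object: picklable, so it can be a task argument
        future = pool.submit(synthesize, text, out_q)
        chunks = []
        while (chunk := out_q.get()) is not None:
            chunks.append(chunk)  # a streaming API would yield each chunk here
        future.result()  # re-raise any exception from the worker
    return chunks
```

The trade-off is that every put/get on a manager queue is a round trip through the manager's server process, which adds overhead compared with a plain Queue passed at process start.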
Estimate
~330 lines total for both backends with a shared base class.