feat: Linux support, AMD ROCm, Whisper Turbo, and spawn fix #262
Cherry-picked and adapted from PR #89 and #214:
- Linux audio capture via PulseAudio/PipeWire monitor sources (cpal)
- AMD ROCm GPU support: HSA_OVERRIDE_GFX_VERSION env var, ROCm detection
- Whisper Turbo model (openai/whisper-large-v3-turbo) in all endpoints
- Cleaner Whisper language handling via generate_kwargs
- tauri::async_runtime::spawn fix to prevent panic on app shutdown
- Enable Linux (ubuntu-22.04) in release CI matrix
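The "language handling via generate_kwargs" item can be pictured with a minimal Python sketch. It assumes the backend calls a Hugging Face Whisper model's `generate()`, which accepts `language` and `task` keyword arguments. The helper name `build_generate_kwargs` and the constant `WHISPER_TURBO_REPO` are hypothetical and do not come from the PR.

```python
from typing import Optional

# Repo id named in the PR description; the constant name is illustrative only.
WHISPER_TURBO_REPO = "openai/whisper-large-v3-turbo"


def build_generate_kwargs(language: Optional[str] = None) -> dict:
    """Build keyword arguments for a Whisper generate() call.

    Hypothetical helper: when no language is given, the key is omitted
    so that Whisper falls back to its own language detection.
    """
    kwargs = {"task": "transcribe"}
    if language:
        kwargs["language"] = language
    return kwargs
```

Under these assumptions a call site would look like `model.generate(input_features, **build_generate_kwargs("en"))`, keeping the language decision in one place instead of scattering conditionals around the call.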
Caution: Review failed. Pull request was closed or merged during review.
📝 Walkthrough
This pull request adds Ubuntu 22.04 to the release workflow matrix, introduces a Whisper turbo model variant with improved language handling in the PyTorch backend, enhances GPU detection for AMD ROCm, and implements comprehensive Linux audio capture functionality with WAV encoding support.
Sequence Diagram(s)
sequenceDiagram
participant Client as Client
participant Capture as start_capture()
participant Device as Audio Device
participant Thread as Capture Thread
participant State as Shared State
participant Stop as stop_capture()
participant WAV as WAV Encoder
participant Output as Base64 Result
Client->>Capture: Initialize audio capture
Capture->>State: Set up state & channels
Capture->>Thread: Spawn capture thread
Thread->>Device: Select monitor/fallback device
Thread->>Device: Configure stream (sample format)
Thread->>State: Stream audio data into samples
Thread->>State: Check timeout or stop signal
Note over Thread: Wait for stop signal or max duration
Client->>Stop: Stop capture
Stop->>State: Signal stop flag
Stop->>State: Wait briefly for thread
Stop->>State: Retrieve captured samples
Stop->>WAV: Convert samples to 16-bit PCM WAV
WAV->>WAV: Encode WAV bytes to base64
Stop->>Output: Return base64 string
Output->>Client: Audio data ready
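The stop path in the diagram (samples to 16-bit PCM WAV to base64) can be sketched as follows. The PR implements this in Rust in linux.rs; the Python below is only an illustration of the same steps using the standard library. The function name, the default sample rate, and the channel count are assumptions, not values taken from the PR.

```python
import base64
import io
import struct
import wave


def samples_to_wav_base64(samples, sample_rate=44100, channels=2):
    """Encode float samples in [-1.0, 1.0] as 16-bit PCM WAV, then base64.

    Illustrative only: sample_rate and channels are assumed defaults.
    """
    # Clamp each sample, then scale it to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack("<%dh" % len(pcm), *pcm)  # little-endian int16

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)

    # The WAV header and data are now in the buffer; return them as text.
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

Clamping before scaling means out-of-range input saturates rather than wrapping around, which would otherwise produce audible clicks.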
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed
Summary
Cherry-picks the good parts of PR #89 (@rocopolas) and #214 without the macOS-breaking changes, plus enables Linux in the release CI.
- Linux audio capture: cpal monitor devices, replacing todo!() stubs
- AMD ROCm: HSA_OVERRIDE_GFX_VERSION env var for unlisted AMD GPUs, MIOPEN_LOG_LEVEL noise suppression, ROCm vs CUDA detection in GPU status
- Whisper Turbo: openai/whisper-large-v3-turbo added to all endpoints (model status, download, delete, transcribe)
- Spawn fix: tauri::async_runtime::spawn instead of tokio::spawn in window close handler to prevent panic on shutdown
- Linux CI: ubuntu-22.04 in the CI release matrix

Changes NOT taken from #89
- tauri.conf.json resource removal (would break macOS builds)
- package.json dev script changes (opinionated, unnecessary)
- base to turbo change (should be a separate user-facing decision)

Files changed
- .github/workflows/release.yml: enable Linux build
- backend/main.py: ROCm env vars, ROCm detection, Whisper Turbo in 3 config dicts
- backend/backends/pytorch_backend.py: Whisper Turbo in HF repos, cleaner language kwargs
- tauri/src-tauri/src/audio_capture/linux.rs: full implementation (~300 lines)
- tauri/src-tauri/src/main.rs: async_runtime::spawn fix

Supersedes the relevant parts of #89 and #214.
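For readers unfamiliar with the ROCm pieces, here is a rough Python sketch of the idea behind the backend/main.py changes. It is not the PR's code: the function names, the `gfx_override` parameter, and the `"4"` (warning) log level are assumptions. The detection rule relies on the fact that ROCm builds of PyTorch report `torch.cuda.is_available()` as true and set `torch.version.hip` to a version string, while CUDA builds leave it as `None`.

```python
def apply_rocm_env(env, gfx_override=None):
    """Fill in ROCm-related variables without clobbering user settings.

    Hypothetical helper: `env` is a dict such as os.environ, and
    `gfx_override` is an assumed way of passing a GFX version for GPUs
    that ROCm does not list officially.
    """
    if gfx_override and "HSA_OVERRIDE_GFX_VERSION" not in env:
        env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    # Assumed value: keep MIOpen at warning level to reduce log noise.
    env.setdefault("MIOPEN_LOG_LEVEL", "4")
    return env


def gpu_backend(cuda_available, hip_version):
    """Classify the GPU backend from two PyTorch-reported values.

    Pass torch.cuda.is_available() and getattr(torch.version, "hip", None).
    """
    if not cuda_available:
        return "cpu"
    return "rocm" if hip_version else "cuda"
```

Environment variables like these only take effect if they are set before the GPU runtime initializes, so a helper of this kind would need to run before `torch` is imported.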