feat: Linux support, AMD ROCm, Whisper Turbo, and spawn fix by jamiepine · Pull Request #262 · jamiepine/voicebox

jamiepine · 2026-03-13T10:35:42Z

Summary

Cherry-picks the good parts of PR #89 (@rocopolas) and #214 without the macOS-breaking changes, plus enables Linux in the release CI.

- Linux audio capture: full PulseAudio/PipeWire implementation via cpal monitor devices, replacing the `todo!()` stubs
- AMD ROCm GPU support: `HSA_OVERRIDE_GFX_VERSION` env var for unlisted AMD GPUs, `MIOPEN_LOG_LEVEL` noise suppression, and ROCm vs. CUDA detection in GPU status
- Whisper Turbo: `openai/whisper-large-v3-turbo` added to all endpoints (model status, download, delete, transcribe)
- Spawn fix: `tauri::async_runtime::spawn` instead of `tokio::spawn` in the window close handler, to prevent a panic on shutdown
- Linux release builds: uncommented `ubuntu-22.04` in the CI release matrix
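The ROCm bullet involves two steps that are easy to get wrong: the environment variables must be set before `torch` is imported, and a ROCm build of PyTorch reports its GPU through `torch.cuda` just like a CUDA build, so `torch.version.hip` is what tells them apart. The sketch below illustrates both under stated assumptions: `configure_rocm_env` and `gpu_backend_label` are hypothetical names, the GFX override is applied only when a value is supplied (the PR's actual value is not shown here), and `"3"` for `MIOPEN_LOG_LEVEL` is an assumed setting.

```python
from collections.abc import MutableMapping
from typing import Optional


def configure_rocm_env(
    env: MutableMapping[str, str],
    gfx_override: Optional[str] = None,
) -> None:
    """Set ROCm-related variables; must run before `import torch`.

    setdefault() means values the user already exported take precedence.
    """
    # Hypothetical parameter: a GFX version to spoof for AMD GPUs missing from
    # ROCm's support list. Only applied if the caller provides one.
    if gfx_override is not None:
        env.setdefault("HSA_OVERRIDE_GFX_VERSION", gfx_override)
    # Assumed value: on MIOpen's logging scale, 3 keeps errors and drops warnings.
    env.setdefault("MIOPEN_LOG_LEVEL", "3")


def gpu_backend_label(cuda_available: bool, hip_version: Optional[str]) -> str:
    """ROCm wheels set torch.version.hip to a string; CUDA wheels leave it None."""
    if not cuda_available:
        return "CPU"
    return "ROCm" if hip_version else "CUDA"
```

In a backend entry point this would run as `configure_rocm_env(os.environ)` before `import torch`, followed by `gpu_backend_label(torch.cuda.is_available(), torch.version.hip)`.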

Changes NOT taken from #89

- `tauri.conf.json` resource removal (would break macOS builds)
- `package.json` dev script changes (opinionated, unnecessary)
- Default Whisper model change from base to turbo (should be a separate, user-facing decision)

Files changed

- `.github/workflows/release.yml`: enable Linux build
- `backend/main.py`: ROCm env vars, ROCm detection, Whisper Turbo in 3 config dicts
- `backend/backends/pytorch_backend.py`: Whisper Turbo in HF repos, cleaner language kwargs
- `tauri/src-tauri/src/audio_capture/linux.rs`: full implementation (~300 lines)
- `tauri/src-tauri/src/main.rs`: `async_runtime::spawn` fix
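Whisper Turbo had to be registered in three config dicts in `backend/main.py` plus the HF repo mapping in `pytorch_backend.py`. The hypothetical sketch below pictures that mapping: `WHISPER_HF_REPOS`, `DEFAULT_WHISPER_SIZE`, and `resolve_whisper_repo` are illustrative names, the set of size keys is an assumption, and only the `turbo` entry (`openai/whisper-large-v3-turbo`) and the unchanged `base` default come from this PR.

```python
from typing import Dict, Optional

# Hypothetical registry; the backend's real dict names and size keys may differ.
WHISPER_HF_REPOS: Dict[str, str] = {
    "base": "openai/whisper-base",
    "small": "openai/whisper-small",
    "medium": "openai/whisper-medium",
    "large": "openai/whisper-large-v3",
    "turbo": "openai/whisper-large-v3-turbo",  # variant added by this PR
}

# The PR deliberately keeps "base" as the default; turbo stays opt-in.
DEFAULT_WHISPER_SIZE = "base"


def resolve_whisper_repo(size: Optional[str] = None) -> str:
    """Map a model size from an API request to its Hugging Face repo id."""
    key = size or DEFAULT_WHISPER_SIZE
    try:
        return WHISPER_HF_REPOS[key]
    except KeyError:
        raise ValueError(
            f"unknown Whisper model size {key!r}; "
            f"expected one of {sorted(WHISPER_HF_REPOS)}"
        ) from None
```

Deriving status, download, and delete handling from one mapping like this would make the next variant a one-line change; it is a design option, not what the PR does.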

Supersedes the relevant parts of #89 and #214.

Summary by CodeRabbit

New Features
- Added Linux audio capture support for recording audio input.
- Introduced Whisper turbo model variant for faster speech-to-text transcription.
- Enhanced AMD GPU (ROCm) detection and support alongside NVIDIA CUDA.
Bug Fixes
- Fixed app shutdown handling to prevent crashes during exit.
Chores
- Added Ubuntu 22.04 to release workflow.

Cherry-picked and adapted from PR #89 and #214:

- Linux audio capture via PulseAudio/PipeWire monitor sources (cpal)
- AMD ROCm GPU support: `HSA_OVERRIDE_GFX_VERSION` env var, ROCm detection
- Whisper Turbo model (`openai/whisper-large-v3-turbo`) in all endpoints
- Cleaner Whisper language handling via `generate_kwargs`
- `tauri::async_runtime::spawn` fix to prevent panic on app shutdown
- Enable Linux (`ubuntu-22.04`) in release CI matrix
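The cleaner language handling amounts to passing `forced_decoder_ids` only when the caller actually chose a language, so Whisper's own language detection still runs otherwise. A minimal sketch, assuming a processor method shaped like Hugging Face's `WhisperProcessor.get_decoder_prompt_ids(language=..., task=...)`; it is injected as a parameter so the example runs without loading a model, and `build_generate_kwargs` is a hypothetical name.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Stand-in type for processor.get_decoder_prompt_ids, which returns
# (position, token_id) pairs for the forced decoder prefix.
PromptIdsFn = Callable[..., List[Tuple[int, int]]]


def build_generate_kwargs(
    language: Optional[str], get_prompt_ids: PromptIdsFn
) -> Dict[str, Any]:
    """Force decoder ids only for an explicit language; else let Whisper auto-detect."""
    if not language:
        return {}
    return {"forced_decoder_ids": get_prompt_ids(language=language, task="transcribe")}
```

With a real processor this would be called as `model.generate(input_features, **build_generate_kwargs(language, processor.get_decoder_prompt_ids))`.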

coderabbitai · 2026-03-13T10:36:01Z

Caution: Review failed. The pull request was closed or merged during review.

Walkthrough

This pull request adds Ubuntu 22.04 to the release workflow matrix, introduces a Whisper turbo model variant with improved language handling in the PyTorch backend, enhances GPU detection for AMD ROCm, and implements comprehensive Linux audio capture functionality with WAV encoding support.
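The Linux capture path prefers a PulseAudio/PipeWire monitor source (the loopback of system output) and falls back to another input when none is present, and `is_supported` reports whether any usable device exists. The real code is Rust using cpal; the Python sketch below only illustrates the selection logic. The name-based rule (a device whose name contains "monitor") is an assumed heuristic, not necessarily what `linux.rs` does, and both functions are hypothetical analogs that take plain name lists for testability.

```python
from typing import Optional, Sequence


def pick_capture_device(
    input_names: Sequence[str], default_name: Optional[str]
) -> Optional[str]:
    """Prefer a monitor (loopback) source; otherwise fall back to an ordinary input."""
    # Assumed heuristic: PulseAudio/PipeWire loopback sources carry "monitor"
    # in their name, e.g. "alsa_output.<card>.analog-stereo.monitor".
    for name in input_names:
        if "monitor" in name.lower():
            return name
    # Fallback 1: the system default input (typically a microphone).
    if default_name is not None:
        return default_name
    # Fallback 2: any input at all.
    return input_names[0] if input_names else None


def is_supported(input_names: Sequence[str], default_name: Optional[str]) -> bool:
    """Capture is possible if selection yields any device."""
    return pick_capture_device(input_names, default_name) is not None
```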

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/Release: `.github/workflows/release.yml` | Uncommented the Ubuntu 22.04 matrix entry (platform `ubuntu-22.04`, empty args, Python 3.12, PyTorch backend) so the release workflow also produces Linux builds. |
| Backend Model Support: `backend/backends/pytorch_backend.py`, `backend/main.py` | Added the Whisper turbo model variant (`openai/whisper-large-v3-turbo`); refactored transcription language handling to build `generate_kwargs` with `forced_decoder_ids` only when a language is provided; GPU detection now distinguishes ROCm (AMD) from CUDA (NVIDIA) via `torch.version.hip`; AMD ROCm environment variables are initialized early to suppress MIOpen warnings. |
| Tauri Audio Capture: `tauri/src-tauri/src/audio_capture/linux.rs` | Implemented the complete Linux audio capture pipeline: `start_capture` initializes state and spawns a capture thread that selects a monitor device (with fallback), configures streams for the F32, I16, and U16 sample formats, and enforces a maximum-duration timeout; `stop_capture` exports the captured samples as 16-bit PCM WAV encoded to base64; added real-time error logging and shared error propagation; `is_supported` now detects monitor devices with fallback logic. |
| Tauri Runtime: `tauri/src-tauri/src/main.rs` | Replaced `tokio::spawn` with `tauri::async_runtime::spawn` in window close handling to prevent panics during app shutdown when the Tokio runtime has already been dropped; the existing flow (wait for a frontend signal or a timeout before closing the window) is unchanged. |
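`stop_capture` turns whatever the stream delivered into a base64 WAV string. Because the stream may produce F32, I16, or U16 samples, one natural approach is to normalize everything to floats in [-1, 1] first and quantize once to 16-bit PCM at the end. The standard-library sketch below shows that pipeline; the function names are hypothetical, samples are assumed to be interleaved across channels, and the exact scaling `linux.rs` applies may differ.

```python
import base64
import io
import struct
import wave
from typing import Sequence


def i16_to_f32(s: int) -> float:
    """Signed 16-bit sample to float in [-1, 1)."""
    return s / 32768.0


def u16_to_f32(s: int) -> float:
    """Unsigned 16-bit audio is centred on 32768."""
    return (s - 32768) / 32768.0


def samples_to_wav_base64(samples: Sequence[float], sample_rate: int, channels: int) -> str:
    """Clamp float samples, write a 16-bit PCM WAV in memory, return it as base64 text."""
    pcm = struct.pack(
        f"<{len(samples)}h",  # little-endian int16, as WAV requires
        *(int(max(-1.0, min(1.0, x)) * 32767) for x in samples),
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:  # wave leaves a caller-supplied file open
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return base64.b64encode(buf.getvalue()).decode("ascii")
```

Clamping first keeps slightly overshooting float samples in range; without it, `struct.pack` would raise `struct.error` on such values.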

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Capture as start_capture()
    participant Device as Audio Device
    participant Thread as Capture Thread
    participant State as Shared State
    participant Stop as stop_capture()
    participant WAV as WAV Encoder
    participant Output as Base64 Result

    Client->>Capture: Initialize audio capture
    Capture->>State: Set up state & channels
    Capture->>Thread: Spawn capture thread
    Thread->>Device: Select monitor/fallback device
    Thread->>Device: Configure stream (sample format)
    Thread->>State: Stream audio data into samples
    Thread->>State: Check timeout or stop signal
    Note over Thread: Wait for stop signal or max duration

    Client->>Stop: Stop capture
    Stop->>State: Signal stop flag
    Stop->>State: Wait briefly for thread
    Stop->>State: Retrieve captured samples
    Stop->>WAV: Convert samples to 16-bit PCM WAV
    WAV->>WAV: Encode WAV bytes to base64
    Stop->>Output: Return base64 string
    Output->>Client: Audio data ready
```
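The start/stop protocol in the diagram (a capture thread that runs until a stop flag is set or the maximum duration passes, and a `stop_capture` that signals, waits briefly, then collects the samples) can be modeled with Python's `threading` primitives. This is an analog of the Rust flow, not a port: `CaptureSession` and `read_chunk` are hypothetical, with `read_chunk` standing in for the cpal stream callback.

```python
import threading
import time
from typing import Callable, List, Optional


class CaptureSession:
    """Worker appends samples until stopped or max_duration elapses."""

    def __init__(self, read_chunk: Callable[[], List[float]], max_duration: float) -> None:
        self._read_chunk = read_chunk  # hypothetical stand-in for the audio callback
        self._max_duration = max_duration
        self._stop = threading.Event()
        self._lock = threading.Lock()  # guards _samples (the "shared state")
        self._samples: List[float] = []
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        deadline = time.monotonic() + self._max_duration
        while not self._stop.is_set() and time.monotonic() < deadline:
            chunk = self._read_chunk()
            with self._lock:
                self._samples.extend(chunk)
            self._stop.wait(0.01)  # pacing that returns early once stop is set

    def stop(self, join_timeout: float = 0.5) -> List[float]:
        self._stop.set()  # signal stop flag
        if self._thread is not None:
            self._thread.join(timeout=join_timeout)  # wait briefly, never forever
        with self._lock:
            return list(self._samples)  # copy of captured samples
```

Using `Event.wait` as the pacing sleep lets the worker exit as soon as stop is signalled, and the bounded `join` means a stuck device cannot block `stop` indefinitely.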

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰✨ A turbo whisper now flows,
Linux captures audio's sweet throes,
GPU knows its AMD soul,
ROCm rocks and plays its role!
Ubuntu joins the CI race,
Code harmonizes—what a pace! 🎵

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main features introduced: Linux support, AMD ROCm, Whisper Turbo, and a spawn fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%. |


jamiepine merged commit f58c7c1 into main Mar 13, 2026
1 check was pending

jamiepine mentioned this pull request in Linux Support #89 (Closed) on Mar 13, 2026

jamiepine merged 1 commit into main from feat/linux-rocm-whisper-turbo