feat: Linux support, AMD ROCm, Whisper Turbo, and spawn fix #262

Merged: jamiepine merged 1 commit into main from feat/linux-rocm-whisper-turbo on Mar 13, 2026

@jamiepine (Owner) commented Mar 13, 2026

Summary

Cherry-picks the good parts of PR #89 (@rocopolas) and #214 without the macOS-breaking changes, plus enables Linux in the release CI.

  • Linux audio capture: full PulseAudio/PipeWire implementation via cpal monitor devices, replacing todo!() stubs
  • AMD ROCm GPU support: HSA_OVERRIDE_GFX_VERSION env var for unlisted AMD GPUs, MIOPEN_LOG_LEVEL noise suppression, ROCm vs CUDA detection in GPU status
  • Whisper Turbo: openai/whisper-large-v3-turbo added to all endpoints (model status, download, delete, transcribe)
  • Spawn fix: tauri::async_runtime::spawn instead of tokio::spawn in the window close handler to prevent a panic on shutdown
  • Linux release builds: uncommented ubuntu-22.04 in the CI release matrix
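
As a rough illustration of the ROCm environment setup described above, the sketch below sets the two variables only when the user has not already set them. The helper name `configure_rocm_env` and its parameters are hypothetical, and the values the PR actually writes are not shown on this page, so both are left as arguments. The one firm requirement is ordering: the variables have to be in the environment before `torch` is imported.

```python
import os
from typing import MutableMapping, Optional


def configure_rocm_env(
    env: MutableMapping[str, str] = os.environ,
    gfx_override: Optional[str] = None,
    miopen_log_level: Optional[str] = None,
) -> None:
    """Set ROCm-related variables without clobbering user-provided values.

    Hypothetical helper: the real backend/main.py may structure this differently.
    """
    if gfx_override is not None:
        # Lets ROCm treat an unlisted GPU as a supported architecture.
        env.setdefault("HSA_OVERRIDE_GFX_VERSION", gfx_override)
    if miopen_log_level is not None:
        # Lowers MIOpen logging noise.
        env.setdefault("MIOPEN_LOG_LEVEL", miopen_log_level)
```

A call such as `configure_rocm_env(gfx_override="10.3.0", miopen_log_level="4")` would go at the very top of the entry point; both values are illustrative only and depend on the GPU in use.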

Changes NOT taken from #89

  • tauri.conf.json resource removal (would break macOS builds)
  • package.json dev script changes (opinionated, unnecessary)
  • Default Whisper model change from base to turbo (should be a separate user-facing decision)

Files changed

  • .github/workflows/release.yml — enable Linux build
  • backend/main.py — ROCm env vars, ROCm detection, Whisper Turbo in 3 config dicts
  • backend/backends/pytorch_backend.py — Whisper Turbo in HF repos, cleaner language kwargs
  • tauri/src-tauri/src/audio_capture/linux.rs — full implementation (~300 lines)
  • tauri/src-tauri/src/main.rs — async_runtime::spawn fix
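
The "cleaner language kwargs" change in pytorch_backend.py can be pictured roughly as follows. This is a sketch under assumptions, not the PR's code: `build_generate_kwargs` is a hypothetical name, and the prompt-id lookup is passed in as a callable so the example runs without `transformers` installed. In real use the callable would presumably be `WhisperProcessor.get_decoder_prompt_ids`.

```python
from typing import Any, Callable, Dict, Optional


def build_generate_kwargs(
    language: Optional[str],
    get_decoder_prompt_ids: Callable[..., Any],
) -> Dict[str, Any]:
    """Return kwargs for model.generate(), forcing a language only if one is given."""
    generate_kwargs: Dict[str, Any] = {}
    if language:
        # Only constrain decoding when the caller asked for a specific language;
        # otherwise Whisper is left to detect the language itself.
        generate_kwargs["forced_decoder_ids"] = get_decoder_prompt_ids(
            language=language, task="transcribe"
        )
    return generate_kwargs
```

The result would then be splatted into the generation call, for example `model.generate(input_features, **build_generate_kwargs(lang, processor.get_decoder_prompt_ids))`, so the no-language path passes no extra arguments at all.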

Supersedes the relevant parts of #89 and #214.

Summary by CodeRabbit

  • New Features

    • Added Linux audio capture support for recording audio input.
    • Introduced Whisper turbo model variant for faster speech-to-text transcription.
    • Enhanced AMD GPU (ROCm) detection and support alongside NVIDIA CUDA.
  • Bug Fixes

    • Fixed app shutdown handling to prevent crashes during exit.
  • Chores

    • Added Ubuntu 22.04 to release workflow.

Cherry-picked and adapted from PR #89 and #214:

- Linux audio capture via PulseAudio/PipeWire monitor sources (cpal)
- AMD ROCm GPU support: HSA_OVERRIDE_GFX_VERSION env var, ROCm detection
- Whisper Turbo model (openai/whisper-large-v3-turbo) in all endpoints
- Cleaner Whisper language handling via generate_kwargs
- tauri::async_runtime::spawn fix to prevent panic on app shutdown
- Enable Linux (ubuntu-22.04) in release CI matrix

@coderabbitai (bot, Contributor) commented Mar 13, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This pull request adds Ubuntu 22.04 to the release workflow matrix, introduces a Whisper turbo model variant with improved language handling in the PyTorch backend, enhances GPU detection for AMD ROCm, and implements comprehensive Linux audio capture functionality with WAV encoding support.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/Release: .github/workflows/release.yml | Uncommented the Ubuntu 22.04 matrix entry (platform "ubuntu-22.04", empty args, Python 3.12, PyTorch backend) to enable Linux release builds. |
| Backend Model Support: backend/backends/pytorch_backend.py, backend/main.py | Added the Whisper turbo model variant ("openai/whisper-large-v3-turbo"). Refactored language handling in transcription to build generate_kwargs conditionally, setting forced_decoder_ids only when a language is provided. Enhanced GPU detection to distinguish ROCm (AMD) from CUDA (NVIDIA) via torch.version.hip. Initialized AMD ROCm environment variables early to prevent MIOpen warnings. |
| Tauri Audio Capture: tauri/src-tauri/src/audio_capture/linux.rs | Implemented the complete Linux audio capture pipeline. start_capture initializes state, spawns a capture thread that selects a monitor or fallback device, configures streams for multiple sample formats (F32, I16, U16), and enforces a maximum-duration timeout. stop_capture exports the captured samples as 16-bit PCM WAV and encodes the result to base64. Added real-time error logging and shared error propagation. is_supported now detects monitor devices, with fallback logic. |
| Tauri Runtime: tauri/src-tauri/src/main.rs | Replaced tokio::spawn with tauri::async_runtime::spawn in window close handling to prevent panics during app shutdown when the Tokio runtime is dropped. The existing control flow (wait for the frontend signal or a timeout before closing the window) is preserved. |
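
The ROCm versus CUDA distinction mentioned in the table relies on the fact that ROCm builds of PyTorch still report through the `torch.cuda` API, while `torch.version.hip` is set only on those builds. A minimal sketch, assuming hypothetical function names and status labels (the strings the real GPU status endpoint returns are not shown here):

```python
from typing import Optional


def classify_gpu_backend(cuda_available: bool, hip_version: Optional[str]) -> str:
    """Map PyTorch's view of the GPU onto a human-readable backend label."""
    if not cuda_available:
        return "none"
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    return "ROCm" if hip_version else "CUDA"


def gpu_backend_from_torch() -> str:
    """Query the installed PyTorch build (imported lazily so this module loads without it)."""
    import torch

    return classify_gpu_backend(
        torch.cuda.is_available(), getattr(torch.version, "hip", None)
    )
```

Keeping the decision in a pure function makes it straightforward to unit test without a GPU or a PyTorch install.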

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Capture as start_capture()
    participant Device as Audio Device
    participant Thread as Capture Thread
    participant State as Shared State
    participant Stop as stop_capture()
    participant WAV as WAV Encoder
    participant Output as Base64 Result

    Client->>Capture: Initialize audio capture
    Capture->>State: Set up state & channels
    Capture->>Thread: Spawn capture thread
    Thread->>Device: Select monitor/fallback device
    Thread->>Device: Configure stream (sample format)
    Thread->>State: Stream audio data into samples
    Thread->>State: Check timeout or stop signal
    Note over Thread: Wait for stop signal or max duration
    
    Client->>Stop: Stop capture
    Stop->>State: Signal stop flag
    Stop->>State: Wait briefly for thread
    Stop->>State: Retrieve captured samples
    Stop->>WAV: Convert samples to 16-bit PCM WAV
    WAV->>WAV: Encode WAV bytes to base64
    Stop->>Output: Return base64 string
    Output->>Client: Audio data ready
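
The last three steps of the diagram (samples to 16-bit PCM WAV, then base64) are implemented in Rust in linux.rs. The sketch below restates the same data transformation in Python using only the standard library, purely to make the byte-level steps concrete. The function name, the assumption that samples are interleaved floats in [-1.0, 1.0], and the default sample rate and channel count are all illustrative rather than taken from the PR.

```python
import base64
import io
import struct
import wave
from typing import Sequence


def samples_to_wav_base64(
    samples: Sequence[float], sample_rate: int = 48000, channels: int = 2
) -> str:
    """Encode interleaved float samples as a base64 string of a 16-bit PCM WAV file."""
    pcm = bytearray()
    for sample in samples:
        # Clamp to the valid range so out-of-range input cannot overflow int16.
        clamped = max(-1.0, min(1.0, sample))
        pcm += struct.pack("<h", int(clamped * 32767))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(pcm))

    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

Returning base64 text rather than raw bytes keeps the payload safe to pass through a JSON-style command boundary, which is presumably why the capture command returns a string.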

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰✨ A turbo whisper now flows,
Linux captures audio's sweet throes,
GPU knows its AMD soul,
ROCm rocks and plays its role!
Ubuntu joins the CI race,
Code harmonizes—what a pace! 🎵

🚥 Pre-merge checks: ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main features introduced: Linux support, AMD ROCm, Whisper Turbo, and a spawn fix. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@jamiepine merged commit f58c7c1 into main on Mar 13, 2026
1 check was pending
@jamiepine mentioned this pull request on Mar 13, 2026