Local speech-to-text that types into any focused application on GNOME Wayland. Uses whisper.cpp with Intel Arc iGPU (SYCL) acceleration.
Hotkey (Ctrl+Alt+V) → whisper-ptt script
↓ toggles on/off
rec (sox) → silence detection → audio clip
↓ (background)
curl → whisper-server (SYCL GPU, port 8178) → transcribed text
↓
ydotool type → typed into focused app
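
The flow above can be sketched in shell roughly as follows. This is a minimal illustration and not the actual `whisper-ptt` script: the `/inference` endpoint and form fields follow the whisper.cpp server example, while the clip path, silence thresholds, and all function names are assumptions made for this sketch. The toggle logic is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the record -> transcribe -> type flow.
# Not the real whisper-ptt script; paths, thresholds and names are assumptions.

SERVER_URL="http://127.0.0.1:8178/inference"  # port from the diagram; /inference per the whisper.cpp server example
CLIP="${TMPDIR:-/tmp}/whisper-ptt-clip.wav"   # assumed clip location

# Record 16 kHz mono audio; stop after about 2 s of silence (sox "silence" effect).
record_clip() {
    rec -q -r 16000 -c 1 "$CLIP" silence 1 0.1 2% 1 2.0 2%
}

# Send the clip to whisper-server and print the plain-text transcription.
transcribe_clip() {
    curl -s "$SERVER_URL" -F file=@"$CLIP" -F response_format=text
}

# Trim leading and trailing whitespace (including newlines) from a string.
clean_text() {
    local s="$1"
    s="${s#"${s%%[![:space:]]*}"}"
    s="${s%"${s##*[![:space:]]}"}"
    printf '%s' "$s"
}

# Type text into the focused window through ydotoold; skip empty input.
type_text() {
    [ -n "$1" ] && ydotool type "$1"
}

# One dictation round: record, transcribe, clean up, type.
dictate_once() {
    record_clip || return 1
    local text
    text="$(clean_text "$(transcribe_clip)")"
    type_text "$text"
}
```

Calling `dictate_once` from a terminal runs a single round. Keeping `clean_text` separate makes it easy to check the text handling without a microphone or a running server.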
| Component | Purpose | Location |
|---|---|---|
| whisper-ptt | Main toggle script | ~/.local/bin/whisper-ptt |
| whisper-server | Persistent whisper.cpp HTTP server | systemd user service |
| ydotoold | Virtual keyboard daemon | systemd user service |
| whisper.cpp | Speech-to-text engine (SYCL build) | ~/projects/whisper.cpp/ |
| Model | ggml-base.en (142MB) | ~/projects/whisper.cpp/models/ggml-base.en.bin |
# Core
sudo dnf install sox wtype ydotool intel-compute-runtime oneapi-level-zero cmake
# Intel oneAPI Base Toolkit (SYCL compiler + runtime)
sudo dnf install intel-oneapi-base-toolkit
# Build dependencies
sudo dnf install gcc gcc-c++ make  # Fedora equivalent of Debian's build-essential

# whisper-server: keeps the model loaded in memory for fast inference
systemctl --user status whisper-server
systemctl --user restart whisper-server
# ydotoold: virtual keyboard daemon (required for ydotool)
systemctl --user status ydotoold
# Service files
~/.config/systemd/user/whisper-server.service
~/.config/systemd/user/ydotoold.service

The hotkey is set via GNOME custom shortcuts (gsettings):
# View current binding
gsettings get org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ binding
# Change binding
gsettings set org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ binding '<Ctrl><Alt>v'

| Say | Action | Keybinding |
|---|---|---|
| "enter" / "press enter" / "submit" / "send it" | Press Enter, reset line | Enter |
| "new line" / "newline" | Press Enter | Enter |
| "tab" / "press tab" | Press Tab | Tab |
| "delete" / "undo" / "delete that" / "undo that" | Undo last input | Ctrl+_ |
| "delete line" / "clear line" | Clear entire line | Ctrl+E, Ctrl+U |
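
One way the phrase matching in the table could work is sketched below. It is an assumption about the approach, not the code in `whisper-ptt`: the function name `map_command` is hypothetical, and the normalization step (lowercasing and stripping punctuation) is added here because Whisper output is usually capitalized and punctuated.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map a transcribed phrase to a key action from the table.
# Prints the action name, or "type" when the phrase is ordinary dictation.

map_command() {
    local phrase
    # Lowercase and drop punctuation, e.g. "Press Enter." -> "press enter".
    phrase="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')"
    # Trim surrounding whitespace.
    phrase="$(printf '%s' "$phrase" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"

    case "$phrase" in
        "enter"|"press enter"|"submit"|"send it")  echo "Enter" ;;
        "new line"|"newline")                      echo "Enter" ;;
        "tab"|"press tab")                         echo "Tab" ;;
        "delete"|"undo"|"delete that"|"undo that") echo "Ctrl+_" ;;
        "delete line"|"clear line")                echo "Ctrl+E Ctrl+U" ;;
        *)                                         echo "type" ;;
    esac
}
```

Only a transcription that consists entirely of a command phrase matches, so a sentence that merely contains the word "enter" falls through to `type` and is typed as text.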
# Edit the service to change the model path
vim ~/.config/systemd/user/whisper-server.service
# Available models: ggml-base.en.bin (142MB), ggml-small.en.bin (466MB)
systemctl --user daemon-reload
systemctl --user restart whisper-server

The server uses --prompt to bias recognition toward coding terms.
Edit the prompt in ~/.config/systemd/user/whisper-server.service.
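
For orientation, the sketch below prints what such a unit file might look like, with the model and prompt as parameters. It is not the author's service file: the binary path `build/bin/whisper-server`, the host, and the example prompt are assumptions, and any oneAPI environment setup the SYCL build needs is omitted. The `--model`, `--host`, `--port`, and `--prompt` options are whisper.cpp server options; port 8178 and the model directory come from this document.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render a whisper-server user unit to stdout.
# Binary path, host and prompt text are assumptions, not the author's values.

render_unit() {
    local model="${1:-ggml-base.en.bin}"
    local prompt="${2:-function, variable, async, commit, refactor}"  # example prompt only
    local root="%h/projects/whisper.cpp"  # systemd expands %h to the user's home directory

    cat <<EOF
[Unit]
Description=whisper.cpp HTTP server (SYCL)

[Service]
ExecStart=${root}/build/bin/whisper-server --model ${root}/models/${model} --host 127.0.0.1 --port 8178 --prompt "${prompt}"
Restart=on-failure

[Install]
WantedBy=default.target
EOF
}
```

Running `render_unit ggml-small.en.bin` prints a variant that points at the larger model. Compare the output with the existing service file before copying anything over, then run `systemctl --user daemon-reload` and restart the service.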
# Current session log (overwritten each session)
cat /tmp/whisper-ptt.log
# Format: [TIME] audio=Xs latency=Yms text="..." cmd="..."

systemctl --user disable --now whisper-server ydotoold
rm ~/.config/systemd/user/whisper-server.service
rm ~/.config/systemd/user/ydotoold.service
systemctl --user daemon-reload

rm ~/.local/bin/whisper-ptt
gsettings reset org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ binding
gsettings reset org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ command
gsettings reset org.gnome.settings-daemon.plugins.media-keys.custom-keybinding:/org/gnome/settings-daemon/plugins/media-keys/custom-keybindings/custom0/ name
# Note: this empties the whole custom-keybindings list, including any other custom shortcuts
gsettings set org.gnome.settings-daemon.plugins.media-keys custom-keybindings '[]'

rm -rf ~/projects/whisper.cpp

sudo rm /etc/udev/rules.d/70-uinput.rules
sudo udevadm control --reload-rules

sudo dnf remove ydotool sox wtype intel-compute-runtime oneapi-level-zero intel-oneapi-base-toolkit cmake
# Remove user from groups
sudo gpasswd -d $USER input
sudo gpasswd -d $USER render
sudo gpasswd -d $USER video

- Segfault on Ctrl+C: harmless; sox doesn't handle SIGINT cleanly. Use the hotkey toggle instead.
- "Compositor does not support virtual keyboard protocol": wtype doesn't work on GNOME. Use ydotool.
- Shortcuts don't work: `gsd-media-keys` may not be running or may have crashed. Fix:

      # Check if running
      ps aux | grep gsd-media-keys
      # Restart it
      kill $(pgrep gsd-media-keys) 2>/dev/null; sleep 1; /usr/libexec/gsd-media-keys &

  This is a known issue on GNOME 49 / Fedora 43 where `gsd-media-keys` crashes silently. If it happens frequently, add to autostart:

      mkdir -p ~/.config/autostart
      cat > ~/.config/autostart/gsd-media-keys.desktop << 'EOF'
      [Desktop Entry]
      Type=Application
      Name=GSD Media Keys
      Exec=/usr/libexec/gsd-media-keys
      X-GNOME-Autostart-Phase=Initialization
      NoDisplay=true
      EOF

- Socket errors from ydotool: ensure the ydotoold user service is running and `YDOTOOL_SOCKET` is not set.
- Slow first transcription: SYCL kernel JIT compile. Subsequent calls are cached and fast.
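
Several of the checks above can be bundled into a small preflight script. The sketch below is hypothetical: the function names are invented for illustration, and it only reports problems rather than fixing them.

```shell
#!/usr/bin/env bash
# Hypothetical preflight checks for the issues listed above.

# The troubleshooting note says YDOTOOL_SOCKET should not be set.
check_socket_env() {
    if [ -n "${YDOTOOL_SOCKET:-}" ]; then
        echo "YDOTOOL_SOCKET is set to '$YDOTOOL_SOCKET'; unset it" >&2
        return 1
    fi
    return 0
}

# Report whether a systemd user service is active.
check_service() {
    if systemctl --user is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: NOT active (try: systemctl --user restart $1)" >&2
        return 1
    fi
}

# Run all checks; returns non-zero if any of them failed.
preflight() {
    local rc=0
    check_socket_env || rc=1
    check_service ydotoold || rc=1
    check_service whisper-server || rc=1
    pgrep gsd-media-keys >/dev/null || { echo "gsd-media-keys is not running" >&2; rc=1; }
    return "$rc"
}
```

Sourcing the file and running `preflight` prints one line per service and returns non-zero if anything needs attention.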