11# talktype
22
3- Push-to-talk speech-to-text for Linux. Bind a keyboard shortcut, press it to
4- start recording, press it again to transcribe and type the text wherever your
5- cursor is.
3+ Push-to-talk speech-to-text for Linux. Press a hotkey to start recording, press
4+ it again to transcribe and type the text wherever your cursor is. No GUI, no
5+ app to keep running, just a keyboard shortcut.
66
7- Transcription is pluggable — ships with
8- [faster-whisper](https://github.com/SYSTRAN/faster-whisper) by default, but you
9- can swap in any model or tool that reads audio and prints text.
7+ - **Pluggable backends**: swap transcription models without changing anything else
8+ - **Works everywhere**: GNOME, Sway, Hyprland, i3, X11
9+ - **~100 lines of bash**: easy to read, easy to hack on
10+
11+ Ships with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) by
12+ default, plus optional [Parakeet](https://huggingface.co/nvidia/parakeet-ctc-1.1b)
13+ and [Moonshine](https://huggingface.co/UsefulSensors/moonshine-base) backends.
14+ Or bring your own — anything that reads a WAV and prints text works.
1015
1116> **Note:** This project is in early development, so expect rough edges. If you
1217> run into issues, please [open a bug](https://github.com/csheaff/talktype/issues).
1318
1419## Requirements
1520
1621- Linux (Wayland or X11)
17- - PipeWire (default on most modern distros)
22+ - Audio recorder: [ffmpeg](https://ffmpeg.org/) (preferred) or PipeWire (`pw-record`)
1823- [ydotool](https://github.com/ReimuNotMoe/ydotool) for typing text
1924 (user must be in the `input` group; see Install)
20- - [socat](https://linux.die.net/man/1/socat) (only needed for server mode)
25+ - [socat](https://linux.die.net/man/1/socat) (for server-backed transcription)
2126
2227For the default backend (faster-whisper):
2328- NVIDIA GPU with CUDA (or use CPU mode — see Whisper backend options)
@@ -53,6 +58,22 @@ Then **reboot** for the group change to take effect.
5358make model
5459```
5560
61+ ## Configuration
62+
63+ talktype reads `~/.config/talktype/config` on startup (follows `$XDG_CONFIG_HOME`).
64+ This works everywhere — GNOME shortcuts, terminals, Sway, cron — no need to set
65+ environment variables in each context.
66+
67+ ```bash
68+ mkdir -p ~/.config/talktype
69+ cat > ~/.config/talktype/config << 'EOF'
70+ TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
71+ EOF
72+ ```
73+
74+ Any `TALKTYPE_*` variable can go in this file. Environment variables still work
75+ and are applied after the config file, so they override it.
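The precedence described above (config file first, environment second) can be sketched in bash as follows. This is only an illustration under stated assumptions, not talktype's actual loader: the function name `load_talktype_config` is hypothetical, and the sketch needs bash 4 or newer for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "config file first, environment wins".
# load_talktype_config is an illustrative name, not part of talktype.
load_talktype_config() {
    local config="${XDG_CONFIG_HOME:-$HOME/.config}/talktype/config"
    local name
    local -A from_env=()

    # Remember every TALKTYPE_* variable that is already set.
    for name in "${!TALKTYPE_@}"; do
        from_env[$name]="${!name}"
    done

    # Read the config file if it exists; it is plain shell assignments.
    if [ -f "$config" ]; then
        # shellcheck disable=SC1090
        . "$config"
    fi

    # Re-apply the remembered values so the environment overrides the file.
    for name in "${!from_env[@]}"; do
        printf -v "$name" '%s' "${from_env[$name]}"
    done
}

load_talktype_config
```

With `TALKTYPE_CMD` set in both places, the value from the environment survives, while variables that appear only in the file are picked up from the file.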
76+
5677## Setup
5778
5879Bind `talktype` to a keyboard shortcut:
@@ -75,21 +96,19 @@ bindsym $mod+d exec talktype
7596
7697## Backends
7798
78- Three backends are included. Each has a one-shot script (loads model per
79- invocation) and a server mode (loads model once, keeps it in memory).
99+ Three backends are included. Server backends auto-start on first use — the
100+ model loads once and stays in memory for fast subsequent transcriptions.
80101
81102### Whisper (default)
82103
83- The default backend uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
84- Best with a GPU.
104+ [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Best with a GPU.
105+ Works out of the box after `make install` with no config needed.
85106
86- ```bash
87- # One-shot (default, no extra setup needed)
88- talktype
107+ For faster repeated use, switch to server mode in your config:
89108
90- # Server mode (faster — model stays in memory)
91- ./transcribe-server start
92- export TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
109+ ```bash
110+ # ~/.config/talktype/config
111+ TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
93112```
94113
95114| Variable | Default | Description |
@@ -99,17 +118,19 @@ export TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
99118| `WHISPER_DEVICE` | `cuda` | `cuda` or `cpu` |
100119| `WHISPER_COMPUTE` | `float16` | `float16` (GPU), `int8` or `float32` (CPU) |
101120
102- ### Parakeet (GPU, best accuracy)
121+ ### Parakeet (GPU, best word accuracy)
103122
104123[NVIDIA Parakeet CTC 1.1B](https://huggingface.co/nvidia/parakeet-ctc-1.1b)
105- via HuggingFace Transformers. 1.1B params, excellent accuracy.
124+ via HuggingFace Transformers. 1.1B params, excellent word accuracy.
125+ Note: CTC model — outputs lowercase text without punctuation.
106126
107127```bash
108128make parakeet
129+ ```
109130
110- # Server mode (recommended — 4.2GB model)
111- ./backends/parakeet-server start
112- export TALKTYPE_CMD="/path/to/talktype/backends/parakeet-server transcribe"
131+ ```bash
132+ # ~/.config/talktype/config
133+ TALKTYPE_CMD="/path/to/talktype/backends/parakeet-server transcribe"
113134```
114135
115136### Moonshine (CPU, lightweight)
@@ -119,25 +140,34 @@ Sensors. 61.5M params, purpose-built for CPU/edge inference.
119140
120141```bash
121142make moonshine
143+ ```
122144
123- # One-shot (fine for this small model)
124- export TALKTYPE_CMD="/path/to/talktype/backends/moonshine"
125-
126- # Or server mode
127- ./backends/moonshine-server start
128- export TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"
145+ ```bash
146+ # ~/.config/talktype/config
147+ TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"
129148```
130149
131150Set `MOONSHINE_MODEL=UsefulSensors/moonshine-tiny` for an even smaller 27M
132151param model.
133152
153+ ### Manual server management
154+
155+ The server starts automatically on first transcription. You can also manage
156+ it directly:
157+
158+ ```bash
159+ ./backends/parakeet-server start # start manually
160+ ./backends/parakeet-server stop # stop the server
161+ ```
162+
134163### Custom backends
135164
136165Set `TALKTYPE_CMD` to any command that takes a WAV file path as its last
137166argument and prints text to stdout:
138167
139168```bash
140- export TALKTYPE_CMD="/path/to/my-transcriber"
169+ # ~/.config/talktype/config
170+ TALKTYPE_CMD="/path/to/my-transcriber"
141171```
142172
143173Your command will be called as: `$TALKTYPE_CMD /path/to/recording.wav`
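As a rough illustration of that contract, a custom backend might look like the sketch below. Everything in it is hypothetical: `my_transcriber` is an invented name and the `printf` line is a placeholder standing in for a real speech-to-text call. The only parts taken from the README are that the WAV path arrives as the last argument and the transcript goes to stdout.

```bash
#!/usr/bin/env bash
# Hypothetical custom backend, not shipped with talktype.
# When saved as an executable script, end the file with: my_transcriber "$@"
my_transcriber() {
    if [ "$#" -lt 1 ]; then
        echo "usage: my-transcriber [options] /path/to/recording.wav" >&2
        return 2
    fi

    # The WAV path is always the last argument; earlier ones are options.
    local wav="${!#}"

    if [ ! -r "$wav" ]; then
        echo "my-transcriber: cannot read $wav" >&2
        return 1
    fi

    # Placeholder: swap this line for a call to a real engine.
    # Only the transcript goes to stdout; diagnostics go to stderr.
    printf 'placeholder transcript for %s\n' "$(basename "$wav")"
}
```

Keeping diagnostics on stderr matters here because whatever the command prints to stdout is treated as the transcript.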