Commit f67aefd

Merge pull request #5 from csheaff/dev
Add config file, server auto-start, and bug fixes
2 parents 7a2a3ee + 2d8988c commit f67aefd

9 files changed: +126 -54 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ install: deps venv
 
 # Install system dependencies (requires sudo)
 deps:
-	sudo apt install -y ydotool pipewire libnotify-bin python3-venv socat
+	sudo apt install -y ydotool ffmpeg pipewire libnotify-bin python3-venv socat
 
 # Create Python venv with faster-whisper (default backend)
 venv: .venv/.done
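
The dependency list now installs ffmpeg alongside PipeWire, and the README in this commit describes ffmpeg as the preferred recorder with `pw-record` as the fallback. A minimal sketch of how a script could choose between the two; the function name `pick_recorder` is hypothetical and does not appear in this commit:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of the first recorder found on PATH.
# Prefers ffmpeg, falls back to pw-record, returns 1 if neither is available.
pick_recorder() {
    local candidate
    for candidate in ffmpeg pw-record; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

A caller might use it as `recorder=$(pick_recorder) || { echo "Missing: ffmpeg or pipewire" >&2; exit 1; }`, which mirrors the combined "ffmpeg or pipewire" message added to `check_deps` in the `talktype` diff.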

README.md

Lines changed: 60 additions & 30 deletions
@@ -1,23 +1,28 @@
 # talktype
 
-Push-to-talk speech-to-text for Linux. Bind a keyboard shortcut, press it to
-start recording, press it again to transcribe and type the text wherever your
-cursor is.
+Push-to-talk speech-to-text for Linux. Press a hotkey to start recording, press
+it again to transcribe and type the text wherever your cursor is. No GUI, no
+app to keep running — just a keyboard shortcut.
 
-Transcription is pluggable — ships with
-[faster-whisper](https://github.com/SYSTRAN/faster-whisper) by default, but you
-can swap in any model or tool that reads audio and prints text.
+- **Pluggable backends** — swap transcription models without changing anything else
+- **Works everywhere** — GNOME, Sway, Hyprland, i3, X11
+- **~100 lines of bash** — easy to read, easy to hack on
+
+Ships with [faster-whisper](https://github.com/SYSTRAN/faster-whisper) by
+default, plus optional [Parakeet](https://huggingface.co/nvidia/parakeet-ctc-1.1b)
+and [Moonshine](https://huggingface.co/UsefulSensors/moonshine-base) backends.
+Or bring your own — anything that reads a WAV and prints text works.
 
 > **Note:** This project is in early development — expect rough edges. If you
 > run into issues, please [open a bug](https://github.com/csheaff/talktype/issues).
 
 ## Requirements
 
 - Linux (Wayland or X11)
-- PipeWire (default on most modern distros)
+- Audio recorder: [ffmpeg](https://ffmpeg.org/) (preferred) or PipeWire (`pw-record`)
 - [ydotool](https://github.com/ReimuNotMoe/ydotool) for typing text
   (user must be in the `input` group — see Install)
-- [socat](https://linux.die.net/man/1/socat) (only needed for server mode)
+- [socat](https://linux.die.net/man/1/socat) (for server-backed transcription)
 
 For the default backend (faster-whisper):
 - NVIDIA GPU with CUDA (or use CPU mode — see Whisper backend options)
@@ -53,6 +58,22 @@ Then **reboot** for the group change to take effect.
 make model
 ```
 
+## Configuration
+
+talktype reads `~/.config/talktype/config` on startup (follows `$XDG_CONFIG_HOME`).
+This works everywhere — GNOME shortcuts, terminals, Sway, cron — no need to set
+environment variables in each context.
+
+```bash
+mkdir -p ~/.config/talktype
+cat > ~/.config/talktype/config << 'EOF'
+TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
+EOF
+```
+
+Any `TALKTYPE_*` variable can go in this file. Environment variables still work
+and are applied after the config file, so they override it.
+
 ## Setup
 
 Bind `talktype` to a keyboard shortcut:
@@ -75,21 +96,19 @@ bindsym $mod+d exec talktype
 
 ## Backends
 
-Three backends are included. Each has a one-shot script (loads model per
-invocation) and a server mode (loads model once, keeps it in memory).
+Three backends are included. Server backends auto-start on first use — the
+model loads once and stays in memory for fast subsequent transcriptions.
 
 ### Whisper (default)
 
-The default backend uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper).
-Best with a GPU.
+[faster-whisper](https://github.com/SYSTRAN/faster-whisper). Best with a GPU.
+Works out of the box after `make install` with no config needed.
 
-```bash
-# One-shot (default, no extra setup needed)
-talktype
+For faster repeated use, switch to server mode in your config:
 
-# Server mode (faster — model stays in memory)
-./transcribe-server start
-export TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
+```bash
+# ~/.config/talktype/config
+TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
 ```
 
 | Variable | Default | Description |
@@ -99,17 +118,19 @@ export TALKTYPE_CMD="/path/to/talktype/transcribe-server transcribe"
 | `WHISPER_DEVICE` | `cuda` | `cuda` or `cpu` |
 | `WHISPER_COMPUTE` | `float16` | `float16` (GPU), `int8` or `float32` (CPU) |
 
-### Parakeet (GPU, best accuracy)
+### Parakeet (GPU, best word accuracy)
 
 [NVIDIA Parakeet CTC 1.1B](https://huggingface.co/nvidia/parakeet-ctc-1.1b)
-via HuggingFace Transformers. 1.1B params, excellent accuracy.
+via HuggingFace Transformers. 1.1B params, excellent word accuracy.
+Note: CTC model — outputs lowercase text without punctuation.
 
 ```bash
 make parakeet
+```
 
-# Server mode (recommended — 4.2GB model)
-./backends/parakeet-server start
-export TALKTYPE_CMD="/path/to/talktype/backends/parakeet-server transcribe"
+```bash
+# ~/.config/talktype/config
+TALKTYPE_CMD="/path/to/talktype/backends/parakeet-server transcribe"
 ```
 
 ### Moonshine (CPU, lightweight)
@@ -119,25 +140,34 @@ Sensors. 61.5M params, purpose-built for CPU/edge inference.
 
 ```bash
 make moonshine
+```
 
-# One-shot (fine for this small model)
-export TALKTYPE_CMD="/path/to/talktype/backends/moonshine"
-
-# Or server mode
-./backends/moonshine-server start
-export TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"
+```bash
+# ~/.config/talktype/config
+TALKTYPE_CMD="/path/to/talktype/backends/moonshine-server transcribe"
 ```
 
 Set `MOONSHINE_MODEL=UsefulSensors/moonshine-tiny` for an even smaller 27M
 param model.
 
+### Manual server management
+
+The server starts automatically on first transcription. You can also manage
+it directly:
+
+```bash
+./backends/parakeet-server start # start manually
+./backends/parakeet-server stop # stop the server
+```
+
 ### Custom backends
 
 Set `TALKTYPE_CMD` to any command that takes a WAV file path as its last
 argument and prints text to stdout:
 
 ```bash
-export TALKTYPE_CMD="/path/to/my-transcriber"
+# ~/.config/talktype/config
+TALKTYPE_CMD="/path/to/my-transcriber"
 ```
 
 Your command will be called as: `$TALKTYPE_CMD /path/to/recording.wav`
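
The new Configuration section says environment variables override the config file. Because the config is a shell file that gets sourced, a plain `VAR=value` line replaces whatever the environment supplied, so in the hunks shown here the outcome depends on how each entry is written. A minimal sketch of an entry that defers to the environment through default-value parameter expansion; `load_config` and the `/from/...` paths are illustrative stand-ins, not names from this commit:

```bash
#!/usr/bin/env bash
# Sketch: a sourced config file whose entries defer to the environment.
# load_config is a hypothetical stand-in for the script's config-loading step.
load_config() {
    local config="$1"
    # shellcheck disable=SC1090
    if [ -f "$config" ]; then . "$config"; fi
}

demo_dir=$(mktemp -d)

# Plain assignment: always replaces a value inherited from the environment.
printf '%s\n' 'TALKTYPE_CMD="/from/config"' > "$demo_dir/plain"

# Default-value form: applies only when the variable is unset or empty.
printf '%s\n' 'TALKTYPE_CMD="${TALKTYPE_CMD:-/from/config}"' > "$demo_dir/deferring"

plain_result=$(TALKTYPE_CMD="/from/env"; load_config "$demo_dir/plain"; echo "$TALKTYPE_CMD")
deferring_result=$(TALKTYPE_CMD="/from/env"; load_config "$demo_dir/deferring"; echo "$TALKTYPE_CMD")
unset_result=$(unset TALKTYPE_CMD; load_config "$demo_dir/deferring"; echo "$TALKTYPE_CMD")

echo "plain:     $plain_result"      # /from/config
echo "deferring: $deferring_result"  # /from/env
echo "unset env: $unset_result"      # /from/config
rm -rf "$demo_dir"
```

Writing entries in the `${VAR:-value}` form keeps a one-off `TALKTYPE_CMD=... talktype` invocation working as an override while the file still supplies the default for GNOME shortcuts and cron.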

backends/moonshine-server

Lines changed: 6 additions & 2 deletions
@@ -19,6 +19,11 @@ case "${1:-}" in
             echo "Already running (PID $(cat "$PIDFILE"))"
             exit 0
         fi
+        if [ ! -x "$VENV/bin/python3" ]; then
+            echo "Moonshine backend not installed. Run: make moonshine" >&2
+            exit 1
+        fi
+        rm -f "$PIDFILE" "$SOCK"
         echo "Starting moonshine server (loading $MODEL)..."
         "$VENV/bin/python3" "$SCRIPT_DIR/moonshine-daemon.py" "$SOCK" "$MODEL" &
         PID=$!
@@ -46,8 +51,7 @@ case "${1:-}" in
         ;;
     transcribe)
         if [ ! -S "$SOCK" ]; then
-            echo "Moonshine server not running. Start it with: backends/moonshine-server start" >&2
-            exit 1
+            "$0" start >&2 || exit 1
         fi
         echo "$2" | socat - UNIX-CONNECT:"$SOCK"
         ;;
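
The `transcribe` branch now starts the server itself when the socket is missing instead of failing with a hint. The pattern can be sketched on its own as below; `ensure_server` and its argument layout are hypothetical and only illustrate the start-on-demand check used in the server scripts:

```bash
#!/usr/bin/env bash
# Sketch of start-on-demand: run the start command only when the socket is absent.
# Usage: ensure_server SOCKET START_CMD [ARGS...]
ensure_server() {
    local sock="$1"
    shift
    # -S is true only for an existing socket file.
    if [ ! -S "$sock" ]; then
        # Send startup messages to stderr so stdout carries only the transcript.
        "$@" >&2 || return 1
    fi
    return 0
}
```

Redirecting the start output to stderr matters because `talktype` captures the command's stdout as the text to type, so a "Starting ... server" line on stdout would end up typed at the cursor.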

backends/parakeet-daemon.py

Lines changed: 8 additions & 5 deletions
@@ -3,6 +3,7 @@
 import sys
 import socket
 import signal
+import torch
 import soundfile as sf
 from transformers import AutoProcessor, AutoModelForCTC
 
@@ -17,11 +18,13 @@
 
 def transcribe(audio_path):
     audio, sr = sf.read(audio_path)
-    inputs = processor(audio, sampling_rate=sr)
-    inputs.to(model.device, dtype=model.dtype)
-    predicted_ids = model.generate(**inputs)
-    texts = processor.batch_decode(predicted_ids, skip_special_tokens=True)
-    return texts[0].strip() if texts else ""
+    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
+    inputs = inputs.to(model.device, dtype=model.dtype)
+    with torch.no_grad():
+        logits = model(**inputs).logits
+    predicted_ids = torch.argmax(logits, dim=-1)
+    text = processor.batch_decode(predicted_ids, skip_special_tokens=True)
+    return text[0].strip() if text else ""
 
 
 def cleanup(*_):

backends/parakeet-server

Lines changed: 6 additions & 4 deletions
@@ -18,12 +18,16 @@ case "${1:-}" in
             echo "Already running (PID $(cat "$PIDFILE"))"
             exit 0
         fi
+        if [ ! -x "$VENV/bin/python3" ]; then
+            echo "Parakeet backend not installed. Run: make parakeet" >&2
+            exit 1
+        fi
+        rm -f "$PIDFILE" "$SOCK"
         echo "Starting parakeet server (loading model)..."
         "$VENV/bin/python3" "$SCRIPT_DIR/parakeet-daemon.py" "$SOCK" &
         PID=$!
         disown "$PID"
         echo "$PID" > "$PIDFILE"
-        # Wait for socket to appear
         for i in $(seq 1 60); do
             [ -S "$SOCK" ] && break
             sleep 1
@@ -45,10 +49,8 @@ case "${1:-}" in
         fi
         ;;
     transcribe)
-        # Called by talktype — sends audio path to the server, prints result
         if [ ! -S "$SOCK" ]; then
-            echo "Parakeet server not running. Start it with: backends/parakeet-server start" >&2
-            exit 1
+            "$0" start >&2 || exit 1
         fi
         echo "$2" | socat - UNIX-CONNECT:"$SOCK"
         ;;

talktype

Lines changed: 35 additions & 7 deletions
@@ -13,9 +13,15 @@
 #
 set -euo pipefail
 
+# ── Load user config (works from GNOME shortcuts, cron, etc.) ──
+TALKTYPE_CONFIG="${TALKTYPE_CONFIG:-${XDG_CONFIG_HOME:-$HOME/.config}/talktype/config}"
+# shellcheck disable=SC1090
+[ -f "$TALKTYPE_CONFIG" ] && source "$TALKTYPE_CONFIG"
+
 TALKTYPE_DIR="${TALKTYPE_DIR:-${XDG_RUNTIME_DIR:-/tmp}/talktype}"
 PIDFILE="$TALKTYPE_DIR/rec.pid"
 AUDIOFILE="$TALKTYPE_DIR/rec.wav"
+NOTIFYFILE="$TALKTYPE_DIR/notify.id"
 
 mkdir -p "$TALKTYPE_DIR"
 
@@ -35,16 +41,33 @@ if [ -z "${TALKTYPE_CMD:-}" ]; then
     TALKTYPE_CMD="$VENV_DIR/bin/python3 $SCRIPT_DIR/transcribe $WHISPER_MODEL $WHISPER_LANG $WHISPER_DEVICE $WHISPER_COMPUTE"
 fi
 
+# ── Notification helper ──
+notify() {
+    local icon="$1" msg="$2"
+    local -a args=(-a TalkType -u critical -i "$icon" -p "TalkType" "$msg")
+    if [ -f "$NOTIFYFILE" ]; then
+        args+=(-r "$(cat "$NOTIFYFILE")")
+    fi
+    notify-send "${args[@]}" 2>/dev/null | head -1 > "$NOTIFYFILE" || true
+}
+
+notify_close() {
+    if [ -f "$NOTIFYFILE" ]; then
+        notify-send -a TalkType -r "$(cat "$NOTIFYFILE")" -e "TalkType" "" 2>/dev/null || true
+        rm -f "$NOTIFYFILE"
+    fi
+}
+
 # ── Check core dependencies ──
 check_deps() {
     local missing=()
     command -v ydotool &>/dev/null || missing+=(ydotool)
-    command -v pw-record &>/dev/null || missing+=(pipewire)
+    command -v ffmpeg &>/dev/null || command -v pw-record &>/dev/null || missing+=("ffmpeg or pipewire")
     command -v notify-send &>/dev/null || missing+=(libnotify-bin)
 
     if [ ${#missing[@]} -gt 0 ]; then
         echo "Missing: ${missing[*]}" >&2
-        notify-send -h string:x-canonical-private-synchronous:talktype -t 3000 -i dialog-error "TalkType" "Missing: ${missing[*]}" 2>/dev/null || true
+        notify-send -t 3000 -i dialog-error "TalkType" "Missing: ${missing[*]}" 2>/dev/null || true
         exit 1
     fi
 }
@@ -55,21 +78,26 @@ check_deps
 if [ -f "$PIDFILE" ]; then
     PID=$(cat "$PIDFILE")
     kill "$PID" 2>/dev/null || true
-    wait "$PID" 2>/dev/null || true
+    # Wait for recorder to finalize the file (not a child, so wait(1) won't work)
+    while kill -0 "$PID" 2>/dev/null; do sleep 0.05; done
     rm -f "$PIDFILE"
 
+    notify process-working "Transcribing..."
+
     # Run the transcription command with the audio file as last arg
     TEXT=$($TALKTYPE_CMD "$AUDIOFILE")
 
     rm -f "$AUDIOFILE"
 
     if [ -z "$TEXT" ]; then
-        notify-send -h string:x-canonical-private-synchronous:talktype -t 1500 -i dialog-warning "TalkType" "No speech detected" 2>/dev/null || true
+        notify dialog-warning "No speech detected"
         exit 0
     fi
 
-    # Type text at cursor via ydotool (works on any Wayland compositor)
-    ydotool type -- "$TEXT"
+    notify_close
+
+    # Type text at cursor via ydotool
+    ydotool type --key-delay 50 -- "$TEXT"
 
 # ── Otherwise → start recording ──
 else
@@ -84,5 +112,5 @@ else
     PID=$!
     disown "$PID"
     echo "$PID" > "$PIDFILE"
-    notify-send -h string:x-canonical-private-synchronous:talktype -t 1500 -i audio-input-microphone "TalkType" "Listening..." 2>/dev/null || true
+    notify audio-input-microphone "Listening..."
 fi
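
The stop path swaps `wait` for a `kill -0` polling loop because the shell's `wait` builtin only works on children of the current shell, and the recorder was launched by an earlier `talktype` invocation. A minimal sketch of the same idea with an upper bound on the wait; the function name `wait_for_exit` and the tick limit are assumptions added for illustration and are not part of the commit:

```bash
#!/usr/bin/env bash
# Poll until PID is gone, or give up after MAX_TICKS polls of 0.05 s each.
# Returns 0 once the process has exited, 1 on timeout.
wait_for_exit() {
    local pid="$1" max_ticks="${2:-100}" ticks=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        ticks=$((ticks + 1))
        [ "$ticks" -ge "$max_ticks" ] && return 1
        sleep 0.05
    done
    return 0
}
```

With the default of 100 ticks the caller waits at most about five seconds, so a recorder that ignores the signal cannot hang the hotkey handler indefinitely. The loop in the commit has no such bound, which is simpler and fine as long as the recorder always exits on `kill`.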

test/server.bats

Lines changed: 3 additions & 3 deletions
@@ -73,12 +73,12 @@ start_mock_daemon() {
 
 # ── Server wrapper logic ──
 
-@test "transcribe fails with helpful message when server not running" {
-    # Test each server script's transcribe command without a running server
+@test "transcribe auto-start fails gracefully when backend not installed" {
+    # With no venv installed, transcribe should attempt auto-start and fail
     for server in transcribe-server backends/parakeet-server backends/moonshine-server; do
         run "$REPO_DIR/$server" transcribe /tmp/test.wav
         [ "$status" -eq 1 ]
-        [[ "$output" == *"not running"* ]]
+        [[ "$output" == *"not installed"* ]]
     done
 }
 
test/talktype.bats

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 # with simple mocks so we can test the control flow in isolation.
 
 setup() {
+    export TALKTYPE_CONFIG="/dev/null"
     export TALKTYPE_DIR="$BATS_TEST_TMPDIR/talktype"
     export TALKTYPE_CMD="$BATS_TEST_DIRNAME/mock-transcribe"
 
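
Pointing `TALKTYPE_CONFIG` at `/dev/null` keeps a developer's real config out of the test run. With the `[ -f ... ] && source` guard from the `talktype` diff, the file is not even sourced, since `-f` is true only for regular files and `/dev/null` is a character device. A quick check of that behaviour (the variable names are illustrative):

```bash
#!/usr/bin/env bash
# /dev/null exists (-e) but is not a regular file (-f), so a `[ -f ]` guard skips it.
config="/dev/null"
if [ -e "$config" ]; then exists=yes; else exists=no; fi
if [ -f "$config" ]; then regular=yes; else regular=no; fi
echo "exists=$exists regular=$regular"   # exists=yes regular=no
```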

transcribe-server

Lines changed: 6 additions & 2 deletions
@@ -22,6 +22,11 @@ case "${1:-}" in
             echo "Already running (PID $(cat "$PIDFILE"))"
             exit 0
         fi
+        if [ ! -x "$VENV/bin/python3" ]; then
+            echo "Whisper backend not installed. Run: make install" >&2
+            exit 1
+        fi
+        rm -f "$PIDFILE" "$SOCK"
         echo "Starting whisper server (loading $WHISPER_MODEL model)..."
         "$VENV/bin/python3" "$SCRIPT_DIR/whisper-daemon.py" "$SOCK" "$WHISPER_MODEL" "$WHISPER_LANG" "$WHISPER_DEVICE" "$WHISPER_COMPUTE" &
         PID=$!
@@ -49,8 +54,7 @@ case "${1:-}" in
         ;;
     transcribe)
         if [ ! -S "$SOCK" ]; then
-            echo "Whisper server not running. Start it with: transcribe-server start" >&2
-            exit 1
+            "$0" start >&2 || exit 1
         fi
         echo "$2" | socat - UNIX-CONNECT:"$SOCK"
         ;;
