Utter turns an Android phone into a microphone for dictation on Linux. Speak into the phone, and the transcribed text appears in your Linux text editor as if you had typed it on the keyboard.
┌─────────────┐           ┌──────────────┐           ┌──────────────┐
│   Android   │ WebSocket │    Relay     │ WebSocket │    Linux     │
│     App     │◄─────────►│    Server    │◄─────────►│    Client    │
│             │           │              │           │              │
│ Voice Input │           │  User-based  │           │   Keyboard   │
│ Google STT  │           │   Routing    │           │  Simulation  │
└─────────────┘           └──────────────┘           └──────────────┘
- Android App (Kotlin)
  - Simple UI with EditText accepting voice input
  - Uses Android's built-in speech-to-text (Google Keyboard)
  - WebSocket client to send text to relay server
- Relay Server (Node.js/TypeScript)
  - WebSocket server to relay messages
  - Routes text from Android → Linux client
  - Manages client connections with device pairing
  - Deployed to cloud (Railway, Render, or Fly.io)
- Linux Client (Python)
  - WebSocket client to receive text
  - Simulates keyboard input using xdotool or ydotool
  - Runs as background service
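The keyboard-simulation step of the Linux client could look roughly like the sketch below. The function names (`pick_tool`, `build_type_command`, `type_text`) are hypothetical, and choosing ydotool on Wayland and xdotool otherwise is an assumption of this sketch, not something the plan specifies.

```python
import os
import subprocess
from typing import List, Optional


def pick_tool(session_type: Optional[str] = None) -> str:
    """Choose a typing tool (assumed rule: ydotool on Wayland, xdotool otherwise)."""
    if session_type is None:
        session_type = os.environ.get("XDG_SESSION_TYPE", "")
    # xdotool only works under X11; ydotool injects events through uinput.
    return "ydotool" if session_type == "wayland" else "xdotool"


def build_type_command(text: str, tool: str) -> List[str]:
    """Return the argv that asks the chosen tool to type `text`."""
    if tool == "xdotool":
        # --clearmodifiers releases held modifier keys while typing.
        return ["xdotool", "type", "--clearmodifiers", text]
    if tool == "ydotool":
        return ["ydotool", "type", text]
    raise ValueError(f"unsupported tool: {tool}")


def type_text(text: str, tool: Optional[str] = None) -> None:
    """Type `text` into the focused window; raises if the tool exits non-zero."""
    command = build_type_command(text, tool or pick_tool())
    # An argv list (no shell) means dictated text is never interpreted by a shell.
    subprocess.run(command, check=True)
```

Calling `type_text("Hello world")` would type into whichever window has focus. Note that ydotool typically needs its `ydotoold` daemon running and access to `/dev/uinput`, and that text beginning with `-` may need extra handling that this sketch omits.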
Phase 1 goal: Validate core functionality without added complexity
Features:
- Linux client listens on local WebSocket (port 8080)
- Android app connects directly to Linux IP (same WiFi)
- Test voice input → text transmission → keyboard simulation
- No authentication or encryption needed (same trusted network)
Success Criteria:
- Speak into Android phone
- See text appear in Linux text editor
- Latency < 500ms
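A minimal sketch of the Phase 1 listener, assuming the JSON message shape this plan uses (`type`, `content`, `timestamp`) and the third-party `websockets` package (version 10.1 or later, where a handler takes a single connection argument). The names `extract_text` and `serve` are hypothetical, and `type_text` stands for whatever function performs the keyboard simulation.

```python
import asyncio
import json
from typing import Callable, Optional, Union


def extract_text(raw: Union[str, bytes]) -> Optional[str]:
    """Return the dictated text from one message, or None if it should be ignored."""
    try:
        message = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    if not isinstance(message, dict) or message.get("type") != "text":
        return None
    content = message.get("content")
    return content if isinstance(content, str) and content else None


async def serve(type_text: Callable[[str], None],
                host: str = "0.0.0.0", port: int = 8080) -> None:
    """Listen for the Android app on the local network and type what it sends."""
    # Imported lazily so extract_text() can be used without the dependency.
    import websockets

    async def handler(websocket) -> None:
        async for raw in websocket:
            text = extract_text(raw)
            if text is not None:
                # Blocking call; acceptable for a single-client prototype.
                type_text(text)

    async with websockets.serve(handler, host, port):
        await asyncio.Future()  # run until cancelled
```

Running `asyncio.run(serve(print))` would print received text instead of typing it, which is a convenient way to check transmission before wiring in keyboard simulation.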
Phase 2 goal: Enable connectivity over the internet
Features:
- Create minimal WebSocket relay server (~50-100 lines)
- Simple device pairing (shared 6-digit PIN code)
- Deploy to free cloud hosting
- Both clients connect through relay
- No encryption yet (plaintext relay)
Pairing Flow:
- Generate PIN on Linux client
- Enter PIN in Android app
- Server associates both connections with same session ID
- Messages relayed between paired devices
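The pairing flow above can be sketched as a small in-memory registry. This is written in Python only to keep these sketches in one language (the plan's relay is Node.js/TypeScript); the class and method names are hypothetical, and PIN expiry and attempt limits are deliberately left out.

```python
import secrets
from typing import Dict, Optional, Tuple


class PairingRegistry:
    """Tracks pending PINs and which two connections share a session."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}               # PIN -> Linux connection ID
        self._session_of: Dict[str, str] = {}            # connection ID -> session ID
        self._members: Dict[str, Tuple[str, str]] = {}   # session ID -> (linux, android)

    def create_pin(self, linux_id: str) -> str:
        """Generate a 6-digit PIN for the Linux client to display."""
        while True:
            pin = f"{secrets.randbelow(10**6):06d}"
            if pin not in self._pending:
                self._pending[pin] = linux_id
                return pin

    def claim_pin(self, pin: str, android_id: str) -> Optional[str]:
        """Pair the Android connection; return the session ID, or None if the PIN is unknown."""
        linux_id = self._pending.pop(pin, None)  # PINs are single-use
        if linux_id is None:
            return None
        session_id = secrets.token_hex(8)
        self._members[session_id] = (linux_id, android_id)
        self._session_of[linux_id] = session_id
        self._session_of[android_id] = session_id
        return session_id

    def peer_of(self, connection_id: str) -> Optional[str]:
        """Return the connection a message from `connection_id` should be relayed to."""
        session_id = self._session_of.get(connection_id)
        if session_id is None:
            return None
        linux_id, android_id = self._members[session_id]
        return android_id if connection_id == linux_id else linux_id
```

A 6-digit PIN has only a million possible values, so a public relay would likely want short PIN lifetimes and a cap on failed attempts.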
Success Criteria:
- Android and Linux can be on different networks
- Pairing works reliably
- Messages route correctly
Phase 3 goal: Make it reliable enough for daily use
Features:
- Auto-reconnection on network drops
- Send-on-timeout (auto-send after 2 seconds of silence)
- Better error handling and status indicators
- Android: Persistent notification showing connection status
- Linux: System tray icon with status
- Message queue for offline scenarios
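The send-on-timeout feature is essentially a debounce: recognized text accumulates until the speaker has been silent for the timeout, then it is sent as one message. A language-neutral sketch of that logic, in Python for consistency with the other sketches (the real implementation would presumably live in the Android app); `SilenceBuffer` is a hypothetical name and the injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, List, Optional


class SilenceBuffer:
    """Collects text fragments and releases them after a period of silence."""

    def __init__(self, timeout: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._parts: List[str] = []
        self._last_input: Optional[float] = None

    def feed(self, text: str) -> None:
        """Record a new fragment and restart the silence timer."""
        self._parts.append(text)
        self._last_input = self._clock()

    def poll(self) -> Optional[str]:
        """Return the buffered message once the timeout has elapsed, else None."""
        if self._last_input is None:
            return None
        if self._clock() - self._last_input < self._timeout:
            return None
        message = " ".join(self._parts)
        self._parts.clear()
        self._last_input = None
        return message
```

A caller would invoke `poll()` on a short timer and send whatever it returns.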
Success Criteria:
- Can use reliably throughout the day
- Gracefully handles network interruptions
- Clear feedback on connection status
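Two of the reliability features, auto-reconnection and the offline message queue, might be sketched as follows. Exponential backoff with a cap is one common reconnection policy and is an assumption here, as are the names `backoff_delays` and `OfflineQueue` and all default values.

```python
import random
from collections import deque
from typing import Deque, Iterator, List


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   jitter: float = 0.0) -> Iterator[float]:
    """Yield reconnect delays that double each attempt, up to `cap` seconds."""
    delay = base
    while True:
        # Optional jitter spreads out reconnects from many clients.
        yield delay + random.uniform(0, jitter * delay)
        delay = min(cap, delay * 2)


class OfflineQueue:
    """Holds messages while disconnected; the oldest are dropped when full."""

    def __init__(self, max_items: int = 100) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def push(self, message: str) -> None:
        self._items.append(message)

    def drain(self) -> List[str]:
        """Return queued messages in order and empty the queue."""
        items = list(self._items)
        self._items.clear()
        return items
```

On each failed connection attempt the client would sleep for the next value from `backoff_delays()`; after a successful reconnect it would create a fresh generator and send everything returned by `drain()`.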
Phase 4 goal: Make it secure enough for a public relay server
- Device Registration: Each device gets unique ID + public/private key pair
- Pairing Flow:
  - Generate one-time pairing code on Linux client
  - Enter code in Android app to establish trust
  - Exchange public keys through relay
  - Store paired device IDs locally
- Session Auth: Use cryptographic signatures to verify identity
  - No passwords stored on server
  - Relay server only validates signatures against public keys
- End-to-End Encryption:
  - Text encrypted on Android before sending
  - Only Linux client can decrypt (relay server is blind)
  - Use libsodium/NaCl (sealed boxes or box encryption)
  - Similar to happy-server's zero-knowledge model
- Transport Security:
  - WSS (WebSocket Secure) instead of WS
  - HTTPS for any REST endpoints
  - Valid TLS certificates (Let's Encrypt)
- No Server Storage: Text relayed in real-time, never persisted
- Ephemeral Sessions: Messages deleted after delivery
- Forward Secrecy: Consider rotating session keys
- Rate Limiting: Prevent abuse on relay server
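For the rate-limiting item, a token bucket per connection is one simple option. The plan does not name an algorithm, so this choice, the class name `TokenBucket`, and the parameters are all assumptions of the sketch.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts of up to `capacity` messages, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = float(capacity)
        self._tokens = float(capacity)
        self._clock = clock
        self._updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller is over the limit."""
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._updated = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

The relay would keep one bucket per device or connection and drop (or close the connection on) any message for which `allow()` returns False.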
Security Philosophy:
The relay server should only know: "Device A wants to send to Device B". It should NEVER know: "What text is being sent".
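One way to picture this split is an envelope whose routing fields are visible to the relay while the payload stays opaque. The sketch below covers only the relay side and assumes the ciphertext was already produced on the phone (for example with a libsodium sealed box); `Envelope` and `BlindRelay` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Envelope:
    sender: str        # device ID of the Android app
    recipient: str     # device ID of the Linux client
    ciphertext: bytes  # encrypted text; the relay never decrypts or inspects it


class BlindRelay:
    """Routes envelopes by recipient ID without touching the payload."""

    def __init__(self) -> None:
        self._connections: Dict[str, Callable[[Envelope], None]] = {}

    def connect(self, device_id: str, deliver: Callable[[Envelope], None]) -> None:
        """Register the callback used to deliver envelopes to a connected device."""
        self._connections[device_id] = deliver

    def disconnect(self, device_id: str) -> None:
        self._connections.pop(device_id, None)

    def route(self, envelope: Envelope) -> bool:
        """Forward the envelope unchanged; return False if the recipient is offline."""
        deliver = self._connections.get(envelope.recipient)
        if deliver is None:
            return False
        deliver(envelope)  # nothing is stored after delivery
        return True
```

Because `route` reads only `recipient`, a compromised relay in this model would learn who talks to whom, but not the dictated text.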
Android App:
- Language: Kotlin
- Libraries:
  - OkHttp or Scarlet for WebSocket
  - Material Design components
  - Android Speech-to-Text API
Relay Server:
- Language: Node.js + TypeScript
- Libraries:
  - socket.io or ws for WebSocket
  - express for REST endpoints (pairing)
  - dotenv for configuration
- Deployment: Railway, Render, or Fly.io (free tier)
Linux Client:
- Language: Python 3.9+
- Libraries:
  - websockets for WebSocket client
  - subprocess to call xdotool or ydotool
  - asyncio for async event loop
- Deployment: systemd service
utter/
├── docs/
│   └── PLAN.md          # This file
├── android-app/         # Kotlin Android app
│   ├── app/
│   ├── build.gradle
│   └── README.md
├── relay-server/        # Node.js/TypeScript server
│   ├── src/
│   ├── package.json
│   ├── tsconfig.json
│   └── README.md
├── utterd/              # Python client
│   ├── utterd
│   └── README.md
└── README.md            # Main project README
Phase 1 (direct connection):
Android App → [Send] → Linux Client → xdotool → Text appears
     ↓                      ↓
Voice input          Simulate typing

Phase 2 (via relay):
Android App → [Send] → Relay Server → Linux Client → xdotool → Text appears
     ↓                      ↓              ↓
Voice input            Route by ID  Simulate typing
{
  "type": "text",
  "content": "Hello world",
  "timestamp": 1697654321000
}
Instead of Phase 4 security, you could:
- Deploy relay server on your own VPS
- Use firewall rules to lock it down
- Run over VPN (Tailscale/WireGuard)
- Much simpler, but less convenient for public use
This project is inspired by happy-server, which demonstrates:
- WebSocket-based relay architecture
- End-to-end encryption with zero-knowledge server
- Real-time synchronization between clients
We adopt a similar relay pattern but simplify for our specific use case.
- Support for multiple Linux clients (e.g., work desktop + laptop)
- Clipboard sync in addition to keyboard simulation
- Voice commands (e.g., "new line", "backspace")
- iOS app
- Text formatting hints (markdown, code blocks)
- Offline queue with retry logic
- Usage analytics (local only, privacy-preserving)
- Phase 1: Build and test on same network
- Phase 2: Deploy relay server, test over internet
- Phase 3: Add polish and reliability features
- Phase 4: Add security when ready for public use
Start simple, iterate fast, add complexity only when needed.