A macOS desktop app that turns your voice into text — offline by default, with optional cloud enhancement. Press a hotkey, speak naturally, and text appears at your cursor. Supports 11 languages, real-time translation, and AI-powered text polishing.
Works 100% offline with production-quality results. No account, no API key, no internet required. Just install, download a model, and start speaking. Cloud features (AI polish, translation, cloud ASR) are optional enhancements — the core voice input experience is fully local.
Core principles: privacy first (audio never leaves your device in offline mode), low latency (sub-second response), and zero configuration (works out of the box).
| | Offline Mode | Smart Mode | Cloud Mode |
|---|---|---|---|
| ASR Engine | Sherpa-ONNX (local) | Sherpa-ONNX (local) | Cloud ASR (Groq, DashScope, etc.) |
| AI Polish | — | LLM correction + translation | — |
| Privacy | 100% offline | ASR offline, LLM via cloud | Audio sent to cloud |
| Latency | Fastest | +0.5~1s for LLM | Depends on network |
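The mode comparison can be read as two independent switches: where ASR runs, and whether an LLM pass follows. The sketch below encodes the table that way; `ModeProfile` and its field names are hypothetical and not taken from the SpeakOut codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeProfile:
    """Hypothetical summary of one column of the mode table."""
    asr_local: bool    # True when Sherpa-ONNX runs on the device
    llm_polish: bool   # True when recognized text goes through an LLM

    @property
    def audio_leaves_device(self) -> bool:
        # Audio is only uploaded when ASR itself runs in the cloud.
        return not self.asr_local

    @property
    def text_sent_to_llm(self) -> bool:
        return self.llm_polish


MODES = {
    "offline": ModeProfile(asr_local=True, llm_polish=False),
    "smart": ModeProfile(asr_local=True, llm_polish=True),
    "cloud": ModeProfile(asr_local=False, llm_polish=False),
}
```

In this reading, Smart Mode keeps audio local and only the recognized text reaches the LLM, which matches the Privacy row.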
- 8 Offline Models — SenseVoice, Paraformer, Whisper Large-v3, FireRedASR, and more
- Two Trigger Modes — Hold to Speak (PTT) or Tap to Toggle
- Streaming & Offline — Real-time subtitles while speaking, or higher accuracy after release
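The two trigger modes differ only in how key-down and key-up events map to the recording state. The following is a minimal sketch of that mapping, assuming the hotkey layer delivers raw key-down/key-up events (possibly with auto-repeat); `HotkeyTrigger` and `TriggerMode` are illustrative names, not the app's actual classes.

```python
from enum import Enum


class TriggerMode(Enum):
    HOLD_TO_SPEAK = "ptt"
    TAP_TO_TOGGLE = "toggle"


class HotkeyTrigger:
    """Tracks whether recording should be active for one trigger mode."""

    def __init__(self, mode: TriggerMode) -> None:
        self.mode = mode
        self.recording = False
        self._held = False  # used to ignore auto-repeated key-down events

    def on_key_down(self) -> bool:
        if self._held:
            return self.recording  # auto-repeat: no state change
        self._held = True
        if self.mode is TriggerMode.HOLD_TO_SPEAK:
            self.recording = True
        else:
            self.recording = not self.recording
        return self.recording

    def on_key_up(self) -> bool:
        self._held = False
        if self.mode is TriggerMode.HOLD_TO_SPEAK:
            self.recording = False
        return self.recording
```

In Hold to Speak, recording follows the key; in Tap to Toggle, only the first key-down of each press flips the state.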
| Languages | Input | Output | Translation |
|---|---|---|---|
| Chinese, English, Japanese, Korean, Cantonese | All modes | All modes | — |
| Spanish, French, German, Russian, Portuguese | Whisper / Cloud | Smart Mode | Via LLM |
- Auto-detect — Let the model detect what language you're speaking
- Translation Mode — Set different input/output languages (e.g., speak Chinese → output English). Requires Smart Mode.
- Script Control — Choose Simplified or Traditional Chinese output
| Provider | Protocol | Highlights |
|---|---|---|
| DashScope (Aliyun) | WebSocket | Paraformer realtime, Chinese optimized |
| Groq | REST (Whisper) | Fast, 99 languages |
| OpenAI | REST (Whisper/GPT-4o) | Most accurate multilingual |
| Volcengine (ByteDance) | WebSocket (binary) | Seed-ASR, highest Chinese accuracy |
| iFlytek | WebSocket | 202 dialects |
| Tencent Cloud | WebSocket | 5h/month free |
LLM post-processing: fix homophones, remove filler words, translate, enforce output language.
- 12 LLM Providers — DashScope, DeepSeek, Volcengine, OpenAI, Anthropic, Zhipu, Kimi, MiniMax, Gemini, iFlytek, Groq, Ollama (local)
- Professional Vocabulary — Industry dictionaries (Tech/Medical/Legal/Finance/Education) + personal dictionary
- Typewriter Mode (Alpha) — Stream LLM output character by character to cursor
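To make the post-processing step concrete, here is one way the listed tasks and a vocabulary list could be combined into a single LLM prompt. This is a sketch under assumptions: the function name `build_polish_prompt`, the prompt wording, and the parameters are all hypothetical, and the app's real prompts are not shown in this document.

```python
from typing import Sequence


def build_polish_prompt(
    transcript: str,
    output_language: str,
    vocabulary: Sequence[str] = (),
) -> str:
    """Assemble an illustrative polish prompt for a raw ASR transcript."""
    lines = [
        "You are post-processing a speech recognition transcript.",
        "- Fix homophone errors.",
        "- Remove filler words.",
        f"- Write the result in {output_language}, translating if needed.",
    ]
    if vocabulary:
        # Terms from industry or personal dictionaries are passed as hints.
        lines.append(
            "- Prefer these domain terms when they fit: " + ", ".join(vocabulary)
        )
    lines.append("Return only the corrected text.")
    lines.append("")
    lines.append("Transcript:")
    lines.append(transcript)
    return "\n".join(lines)
```

Calling `build_polish_prompt("um deploy to cube control", "English", ["kubectl"])` yields a prompt that carries both the cleanup instructions and the term hint, which is the idea behind injecting dictionary terms for domain awareness.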
Hotkey-driven productivity features on top of voice input:
- Flash Notes — Dedicated hotkey, speak and auto-save as timestamped Markdown to any folder
- AI Organize — Select any text, press hotkey, LLM restructures logic and appends below the original (keeps source intact)
- Instant Translation — Speak source language, output target language in real time (works with any AI Polish LLM)
- Correction Feedback — Spot an ASR/LLM mistake? Fix it inline, press the feedback hotkey — LLM diffs against the last recording trace and auto-learns the term into your vocabulary
- AI Debug — Hold hotkey to capture screenshot + voice description of a bug, auto-sent to Claude Code / Cursor / bound AI coding windows (up to 5 slots)
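Correction Feedback relies on comparing what was recognized with what the user corrected. The app delegates that comparison to an LLM; the sketch below substitutes Python's standard `difflib` to show the underlying idea of extracting (wrong, right) term pairs. The function name is hypothetical, and whitespace tokenization is an assumption that would not suit unsegmented scripts such as Chinese.

```python
import difflib
from typing import List, Tuple


def extract_corrections(recognized: str, corrected: str) -> List[Tuple[str, str]]:
    """Return (recognized, corrected) phrase pairs where the user replaced text."""
    before = recognized.split()
    after = corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Only substitutions are treated as candidate vocabulary terms;
            # pure insertions and deletions are ignored in this sketch.
            pairs.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return pairs
```

Each returned pair is a candidate for the personal vocabulary, for example mapping a misheard phrase to the intended technical term.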
- Bluetooth Detection — Auto-detects headset connect/disconnect
- Device Selection — Choose preferred mic in settings
- Pre-segmentation — 3s pause triggers background decoding, minimizing final wait on stop
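Pre-segmentation can be pictured as pause detection over the incoming PCM frames: once enough silence follows speech, the audio so far is handed off for decoding. The sketch below assumes 16 kHz int16 samples and a simple RMS energy threshold; the threshold value and the function names are assumptions, and the app's actual detector may differ.

```python
import math
from typing import Iterable, List, Sequence, Tuple

SAMPLE_RATE = 16_000
PAUSE_SECONDS = 3.0
SILENCE_RMS = 500.0  # assumed energy threshold for int16 samples


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def split_on_pauses(
    frames: Iterable[Sequence[int]],
    pause_seconds: float = PAUSE_SECONDS,
    sample_rate: int = SAMPLE_RATE,
    silence_rms: float = SILENCE_RMS,
) -> Tuple[List[List[int]], List[int]]:
    """Return (segments ready for background decoding, remaining samples)."""
    segments: List[List[int]] = []
    current: List[int] = []
    silent_samples = 0
    has_speech = False
    for frame in frames:
        current.extend(frame)
        if rms(frame) < silence_rms:
            silent_samples += len(frame)
        else:
            silent_samples = 0
            has_speech = True
        # Close a segment only after speech followed by a long enough pause.
        if has_speech and silent_samples >= pause_seconds * sample_rate:
            segments.append(current)
            current = []
            silent_samples = 0
            has_speech = False
    return segments, current
```

The remainder returned alongside the segments is what would still need decoding when the user stops, which is why the final wait shrinks.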
- Download `SpeakOut.dmg` from Releases
- Drag to `/Applications`
- First launch: `xattr -cr /Applications/SpeakOut.app` (required until Developer ID signing)
- Grant permissions: Input Monitoring, Accessibility, Microphone
- Follow the onboarding wizard to download a voice model
- macOS 13+ (Ventura or later)
- ~230MB of disk space for the default model, up to ~1.4GB for the largest (Whisper Large-v3 is ~1.0GB, FireRedASR Large ~1.4GB)
| Model | Languages | Size |
|---|---|---|
| Zipformer Bilingual | Zh/En | ~490MB |
| Paraformer Streaming | Zh/En | ~1GB |
| Model | Languages | Size | Notes |
|---|---|---|---|
| SenseVoice 2024 | Zh/En/Ja/Ko/Yue | ~228MB | Default, built-in punctuation |
| SenseVoice 2025 | Zh/En/Ja/Ko/Yue | ~158MB | Cantonese enhanced |
| Paraformer Offline | Zh/En | ~217MB | Mature & stable |
| Paraformer Dialect | Zh/En + Sichuan | ~218MB | Dialect support |
| Whisper Large-v3 | 99 languages | ~1.0GB | Best multilingual |
| FireRedASR Large | Zh/En + dialects | ~1.4GB | Highest capacity |
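The offline model table can also be used programmatically, for instance to find the smallest download that covers a language within a disk budget. The sketch below transcribes the table with approximate sizes in MB. The language codes, the `"*"` wildcard standing in for Whisper's 99 languages, and the omission of dialect coverage are simplifying assumptions, and `pick_smallest_model` is a hypothetical helper.

```python
from typing import FrozenSet, NamedTuple, Optional


class Model(NamedTuple):
    name: str
    languages: FrozenSet[str]
    size_mb: int  # approximate, from the table above


CJK_PLUS = frozenset({"zh", "en", "ja", "ko", "yue"})
ZH_EN = frozenset({"zh", "en"})
ANY = frozenset({"*"})  # assumption: treat Whisper as covering any language

MODELS = [
    Model("SenseVoice 2024", CJK_PLUS, 228),
    Model("SenseVoice 2025", CJK_PLUS, 158),
    Model("Paraformer Offline", ZH_EN, 217),
    Model("Paraformer Dialect", ZH_EN, 218),
    Model("Whisper Large-v3", ANY, 1000),
    Model("FireRedASR Large", ZH_EN, 1400),
]


def pick_smallest_model(language: str, max_size_mb: int) -> Optional[Model]:
    """Smallest model that supports `language` and fits in `max_size_mb`."""
    candidates = [
        m for m in MODELS
        if m.size_mb <= max_size_mb
        and (language in m.languages or "*" in m.languages)
    ]
    return min(candidates, key=lambda m: m.size_mb, default=None)
```

Smallest is not the same as best: the app ships SenseVoice 2024 as its default, so size is only one possible selection criterion.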
Hotkey → native_input.m (CGEventTap)
→ C Ring Buffer (16kHz PCM)
→ CoreEngine FFI polling
→ ASR (8 offline models / 6 cloud providers)
→ LLM polish + translation (optional, 12 providers)
→ Clipboard paste to active app
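The ring buffer between the audio callback and the polling engine is the piece of this pipeline that is easiest to get wrong. The real one is written in C inside the native library; the Python sketch below only illustrates the producer/consumer contract. The class name, the overwrite-oldest overflow policy, and the lack of locking are all assumptions made for the sake of a short example.

```python
from typing import List, Sequence


class PcmRingBuffer:
    """Fixed-capacity sample buffer: a producer writes, a consumer polls."""

    def __init__(self, capacity: int) -> None:
        self._buf = [0] * capacity
        self._capacity = capacity
        self._written = 0  # total samples ever written
        self._read = 0     # total samples ever read or dropped

    def write(self, samples: Sequence[int]) -> None:
        for s in samples:
            self._buf[self._written % self._capacity] = s
            self._written += 1
        # Assumed policy: if the writer laps the reader, drop the oldest data.
        if self._written - self._read > self._capacity:
            self._read = self._written - self._capacity

    def read(self, max_samples: int) -> List[int]:
        n = min(max_samples, self._written - self._read)
        out = [self._buf[(self._read + i) % self._capacity] for i in range(n)]
        self._read += n
        return out
```

A polling consumer would call `read` on a timer and pass whatever comes back to the ASR engine; an empty list simply means no new audio has arrived yet.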
| Layer | Path | Description |
|---|---|---|
| Engine | `lib/engine/` | CoreEngine, ASR providers, model management |
| Service | `lib/services/` | Config, LLM, billing, diary, audio devices |
| UI | `lib/ui/` | macOS-native UI (macos_ui), settings, overlay |
| Native | `native_lib/` | Objective-C: CGEventTap + AudioQueue ring buffer |
| Gateway | `gateway/` | Cloudflare Workers (Hono): license, billing, version check |
Codebase: ~29,000 lines across 86 files. 598 tests.
flutter pub get # Dependencies
flutter analyze # Static analysis (0 issues)
flutter test # Run tests (598 tests)
flutter build macos --release # Build
./scripts/install.sh # Install to /Applications
./scripts/create_styled_dmg.sh # Create DMG
# Native library (after modifying native_input.m)
cd native_lib && clang -dynamiclib -framework Cocoa -framework Carbon \
-framework AVFoundation -framework AudioToolbox -framework CoreAudio \
-framework Accelerate -o libnative_input.dylib native_input.m -fobjc-arc
- Offline Mode: audio never leaves your device
- Credentials: API keys are stored in SharedPreferences (local, not synced); export/backup includes plaintext keys and requires explicit user confirmation
- Logging: user speech content is never logged by default; developer-mode logs may include input/output text for debugging
- Independent Review: passed 4 rounds of independent third-party security review
Copyright © 2026 Leon. All Rights Reserved.
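- Fully usable offline: no account, no internet connection, and no API key required; install and go. The 8 local models are built on Sherpa-ONNX, Chinese and English recognition accuracy rivals cloud services, and audio never leaves the device
- Three working modes: Offline (privacy first), Smart (offline recognition + AI polish), and Cloud (high accuracy)
- Two trigger modes: Hold to Speak (PTT) or Tap to Toggle; PTT and Toggle can share a single key
- Pre-segmentation: a 3-second pause detected during recording triggers background decoding automatically, so stopping only waits for the last segment, which significantly reduces the wait
- Chinese, English, Japanese, Korean, and Cantonese, plus Spanish, French, German, Russian, and Portuguese, with auto-detection for input and output
- Interpreter Mode: any combination such as Chinese input → English output, translated automatically by the LLM (requires Smart Mode)
- Flash Notes: dedicated hotkey; speech is saved directly as Markdown, archived by day in a custom folder
- AI Organize: select text and press the hotkey; the LLM restructures the logic in depth and appends the result on the line below the original
- Instant Translation: hold to speak and the speech is translated into the target language automatically, without affecting normal recording
- Correction Feedback: spot a recognition error, fix it, and press the feedback key; the LLM compares against the most recent recording and learns the term into the vocabulary
- AI Debug: hold to capture a screenshot plus a voice description of the bug, sent automatically to the bound Claude Code / Cursor windows (up to 5 slots)
- 6 cloud ASR providers: Aliyun Bailian (DashScope realtime), Groq, OpenAI, Volcengine, iFlytek, Tencent Cloud
- 12 LLM providers: Bailian, DeepSeek, Doubao, OpenAI, Claude, Zhipu, Kimi, MiniMax, Gemini, iFlytek, Groq, Ollama (local)
- Preset providers: new users see the full list as soon as they open Cloud Accounts; click to configure and use
- Account import/export: cross-device migration in JSONL format, credentials included
- Industry dictionaries + personal vocabulary: terms are injected into the LLM for domain awareness
- API keys stored locally: SharedPreferences, never uploaded or synced; exported backups contain plaintext keys and require user confirmation
- Signed and notarized: Developer ID signature + Apple notarization; download and double-click to run, with no Gatekeeper warning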
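- Download `SpeakOut.dmg` from Releases
- Drag it to `/Applications`
- Before first launch: `xattr -cr /Applications/SpeakOut.app`
- Grant permissions: Input Monitoring, Accessibility, Microphone
- Follow the onboarding guide to download a voice model, then start using the app

System requirements: macOS 13+, 230MB to 1.4GB of disk space (depending on the model chosen)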
