AI-powered voice assistant with MCP integration - A fork of Whispo that transforms your voice into intelligent actions with advanced speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.
Platform Support: macOS (Apple Silicon & Intel) with full MCP agent functionality.
⚠️ Windows/Linux: MCP tools are not currently supported; see v0.2.2 for dictation-only builds.
Voice Recording:
- Hold Ctrl (macOS/Linux) or Ctrl+/ (Windows) to start recording
- Release to stop recording and transcribe
- Text is automatically inserted into your active application
MCP Agent Mode (macOS only):
- Hold Ctrl+Alt to start recording for agent mode
- Release Ctrl+Alt to process with MCP tools
- Watch real-time progress as the agent executes tools
- Results are automatically inserted or displayed
Text Input:
- Ctrl+T (macOS/Linux) or Ctrl+Shift+T (Windows) for direct typing
- Voice-to-Text: Hold Ctrl (macOS/Linux) or Ctrl+/ (Windows) to record
- Toggle Voice Dictation: Press the Fn key to start/stop recording (configurable)
- Multi-Language Support: 30+ languages including Spanish, French, German, Chinese, Japanese, Arabic, Hindi
- Text-to-Speech (TTS): AI-generated speech with 50+ voices across OpenAI, Groq, and Gemini
- Auto-Play TTS: Automatic speech playback for seamless conversations
- MCP Agent Mode: Hold Ctrl+Alt for intelligent tool execution with real-time progress
- MCP Integration: Connect to any MCP-compatible tools and services
- OAuth 2.1 Support: Secure authentication for MCP servers with deep link integration
- Tool Management: Per-server tool toggles and approval prompts
- Conversation Continuity: Context preservation across agent interactions
- Cross-Platform: Native builds for macOS, Windows, and Linux (MCP agent features are currently macOS-only)
- Rate Limit Handling: Exponential backoff retry for API rate limits (429 errors)
- Model Selection: Choose specific models for OpenAI, Groq, and Gemini providers
- Debug Modes: Comprehensive logging for LLM calls and tool execution
- Universal Integration: Works with any text-input application
- Text Input: Ctrl+T (macOS/Linux) or Ctrl+Shift+T (Windows) for direct input
- Dark/Light Themes: Toggle between dark and light modes
- Resizable Panels: Drag-to-resize interface components
- Kill Switch: Emergency stop for agent operations (Ctrl+Shift+Escape)
- Conversation Management: Full conversation history with tool call visualization
Built with modern technologies for cross-platform performance:
- Electron: Main process for system integration, MCP orchestration, and TTS processing
- React + TypeScript: Modern UI with real-time progress tracking and conversation management
- Rust: High-performance keyboard monitoring and text injection across platforms
- MCP Client: Full Model Context Protocol implementation with OAuth 2.1 support
- Multi-Provider AI: OpenAI, Groq, and Gemini integration for speech, text, and TTS
Prerequisites: Node.js 18+, pnpm, Rust toolchain
⚠️ Important: This project uses pnpm as its package manager. Using npm or yarn may cause installation issues, especially with Electron binaries. If you don't have pnpm installed:
npm install -g pnpm
# Setup
git clone https://github.com/aj47/SpeakMCP.git
cd SpeakMCP
pnpm install
pnpm build-rs # Build Rust binary for your platform
pnpm dev # Start development server
# Platform-specific builds
pnpm build # Production build for current platform
pnpm build:mac # macOS build (Apple Silicon + Intel)
pnpm build:win # Windows build (x64)
pnpm build:linux # Linux build (x64)
# Testing
pnpm test # Run test suite
pnpm test:run # Run tests once (CI mode)
pnpm test:coverage # Run tests with coverage
"Electron uninstall" error when running pnpm dev:
This usually means Electron binaries weren't installed correctly. Fix it by:
# Clean install with pnpm
rm -rf node_modules
pnpm install
Multiple lock files (package-lock.json, pnpm-lock.yaml, bun.lock):
If you have multiple lock files, you've mixed package managers. Clean up:
# Remove all lock files except pnpm's
rm -f package-lock.json bun.lock
rm -rf node_modules
pnpm install
Node version mismatch:
This project works best with Node.js 18-20. Check your version:
node --version # Should be v18.x, v19.x, or v20.x
If using nvm, switch to the recommended version:
nvm use 20
AI Providers: OpenAI, Groq, Google Gemini
- Configure API keys and custom base URLs in settings
- Select specific models for each provider
- Multi-language speech recognition support
- TTS with 50+ voices across providers
MCP Servers: Configure tools in mcpServers JSON format:
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
    },
    "web-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-web-search"],
      "env": {"BRAVE_API_KEY": "your-key"}
    }
  }
}
Keyboard Shortcuts:
- Hold Ctrl (macOS/Linux) / Ctrl+/ (Windows): Voice recording
- Fn Key: Toggle voice dictation (press once to start/stop)
- Hold Ctrl+Alt: MCP agent mode (macOS only)
- Ctrl+T (macOS/Linux) / Ctrl+Shift+T (Windows): Text input mode
- Ctrl+Shift+Escape: Kill switch for agent operations
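The mcpServers JSON format shown in the configuration section can be checked before it is handed to an MCP client. The following is a minimal sketch, not SpeakMCP's actual code: the names `McpServerConfig`, `McpConfig`, and `validateMcpConfig` are hypothetical, and the only assumption about the format is what the example shows (a `command` string, an optional `args` array of strings, and an optional `env` map of strings).

```typescript
// Hypothetical types mirroring the mcpServers JSON example in this README.
interface McpServerConfig {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerConfig>;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

function isStringRecord(value: unknown): value is Record<string, string> {
  return isPlainObject(value) && Object.values(value).every((v) => typeof v === "string");
}

// Returns a list of problems found; an empty list means the shape matches McpConfig.
function validateMcpConfig(json: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch (_err) {
    return ["not valid JSON"];
  }
  if (!isPlainObject(parsed) || !isPlainObject(parsed.mcpServers)) {
    return ["missing mcpServers object"];
  }
  const errors: string[] = [];
  for (const [name, server] of Object.entries(parsed.mcpServers)) {
    if (!isPlainObject(server)) {
      errors.push(`${name}: must be an object`);
      continue;
    }
    if (typeof server.command !== "string" || server.command === "") {
      errors.push(`${name}: command must be a non-empty string`);
    }
    if (server.args !== undefined && !isStringArray(server.args)) {
      errors.push(`${name}: args must be an array of strings`);
    }
    if (server.env !== undefined && !isStringRecord(server.env)) {
      errors.push(`${name}: env must map strings to strings`);
    }
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a settings screen report every misconfigured server at once.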
MCP (Model Context Protocol) enables AI assistants to connect to external tools. SpeakMCP implements a full MCP client with advanced capabilities.
Enhanced Features:
- Intelligent Tool Selection: Automatically determines which tools to use
- Real-time Progress: Visual feedback with TTS narration during execution
- Conversation Continuity: Context preservation across multi-turn interactions
- OAuth 2.1 Integration: Secure authentication for MCP servers
- Rate Limit Handling: Automatic retry with exponential backoff
- Kill Switch: Emergency stop functionality with Ctrl+Shift+Escape
- Tool Management: Per-server tool toggles and approval prompts
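The rate limit handling listed above (automatic retry with exponential backoff on 429 errors) could look roughly like the sketch below. It is an illustration under stated assumptions, not SpeakMCP's actual implementation: `withRateLimitRetry`, `backoffDelay`, and `RetryOptions` are hypothetical names, and the code assumes a rate-limited call throws an error carrying a numeric `status` field.

```typescript
// Hypothetical error shape: a thrown value that may carry an HTTP status code.
interface HttpError {
  status?: number;
}

interface RetryOptions {
  maxRetries: number; // retries allowed after the first attempt
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound on any single delay
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Delay doubles with each attempt (0-based) and is capped at maxDelayMs.
function backoffDelay(attempt: number, baseDelayMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// Retries `call` only when it fails with status 429; other errors are rethrown at once.
async function withRateLimitRetry<T>(call: () => Promise<T>, options: RetryOptions): Promise<T> {
  const sleep =
    options.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = (err as HttpError | null)?.status;
      if (status !== 429 || attempt >= options.maxRetries) throw err;
      await sleep(backoffDelay(attempt, options.baseDelayMs, options.maxDelayMs));
    }
  }
}
```

Production code often adds random jitter to each delay so that many clients do not retry in lockstep; that is left out here to keep the sketch short.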
Example commands:
- "Create a new project folder and add a README"
- "Search for latest AI news and summarize the top 3 articles"
- "Send a message to the team about today's progress"
- "Analyze this codebase and suggest improvements"
For development and troubleshooting, SpeakMCP includes comprehensive debug logging:
# Enable all debug modes
pnpm dev d # Shortest option
pnpm dev debug-all # Readable format
# Enable specific modes
pnpm dev debug-llm # LLM calls and responses
pnpm dev debug-tools # MCP tool execution
pnpm dev debug-ui # UI focus, renders, and state changes
See DEBUGGING.md for detailed debugging instructions.
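To illustrate how the debug arguments above relate to one another, here is a minimal sketch that maps them to per-area flags. It is not the project's actual parser: `DebugFlags` and `parseDebugArgs` are hypothetical names, and the only behavior assumed is what the list states (`d` and `debug-all` enable everything, and the other three each enable one area).

```typescript
// Hypothetical flag set; one field per debug mode named in this README.
interface DebugFlags {
  llm: boolean; // LLM calls and responses
  tools: boolean; // MCP tool execution
  ui: boolean; // UI focus, renders, and state changes
}

function parseDebugArgs(args: string[]): DebugFlags {
  const flags: DebugFlags = { llm: false, tools: false, ui: false };
  for (const arg of args) {
    switch (arg) {
      case "d":
      case "debug-all":
        flags.llm = true;
        flags.tools = true;
        flags.ui = true;
        break;
      case "debug-llm":
        flags.llm = true;
        break;
      case "debug-tools":
        flags.tools = true;
        break;
      case "debug-ui":
        flags.ui = true;
        break;
      default:
        break; // unrelated arguments are ignored
    }
  }
  return flags;
}
```

In a Node.js process this would typically be called with `process.argv.slice(2)`; the modes can be combined, so `debug-llm debug-ui` would enable two areas and leave tool logging off.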
We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.
Get help on Discord | More info at techfren.net
This project is licensed under the AGPL-3.0 License.
- Whispo - This project is a fork of Whispo, the original AI voice assistant
- OpenAI for Whisper speech recognition and GPT models
- Anthropic for Claude and MCP protocol development
- Model Context Protocol for the extensible tool integration standard
- Electron for cross-platform desktop framework
- React for the user interface
- Rust for system-level integration
- Groq for fast inference capabilities
- Google for Gemini models
Made with ❤️ by the SpeakMCP team