SpeakMCP

🎤 AI-powered voice assistant with MCP integration - A fork of Whispo that turns your voice into actions using speech recognition, LLM processing, and Model Context Protocol (MCP) tool execution.

🎬 Preview

speakmcp-vid.mp4

🚀 Quick Start

Download

📥 Download Latest Release

Platform Support: macOS (Apple Silicon & Intel) with full MCP agent functionality. ⚠️ Windows/Linux: MCP tools are not currently supported; see v0.2.2 for dictation-only builds.

Basic Usage

Voice Recording:

Hold Ctrl (macOS/Linux) or Ctrl+/ (Windows) to start recording
Release to stop recording and transcribe
Text is automatically inserted into your active application

MCP Agent Mode (macOS only):

Hold Ctrl+Alt to start recording for agent mode
Release Ctrl+Alt to process with MCP tools
Watch real-time progress as the agent executes tools
Results are automatically inserted or displayed

Text Input:

Ctrl+T (macOS/Linux) or Ctrl+Shift+T (Windows) for direct typing

✨ Features

🎤 Voice & Speech

Voice-to-Text: Hold Ctrl (macOS/Linux) or Ctrl+/ (Windows) to record
Toggle Voice Dictation: Press Fn key to start/stop recording (configurable)
Multi-Language Support: 30+ languages including Spanish, French, German, Chinese, Japanese, Arabic, Hindi
Text-to-Speech (TTS): AI-generated speech with 50+ voices across OpenAI, Groq, and Gemini
Auto-Play TTS: Automatic speech playback for seamless conversations

🤖 AI Agent & MCP

MCP Agent Mode: Hold Ctrl+Alt for intelligent tool execution with real-time progress
MCP Integration: Connect to any MCP-compatible tools and services
OAuth 2.1 Support: Secure authentication for MCP servers with deep link integration
Tool Management: Per-server tool toggles and approval prompts
Conversation Continuity: Context preservation across agent interactions

🛠️ Platform & Performance

Cross-Platform: Native builds for macOS, Windows, and Linux (MCP agent features are currently macOS only)
Rate Limit Handling: Exponential backoff retry for API rate limits (429 errors)
Model Selection: Choose specific models for OpenAI, Groq, and Gemini providers
Debug Modes: Comprehensive logging for LLM calls and tool execution
Universal Integration: Works with any text-input application
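
The rate limit handling listed above can be pictured with a minimal sketch of exponential backoff. This is not SpeakMCP's actual implementation: the `HttpError` shape, the function names, and the default delays are all assumptions made for illustration.

```typescript
// Hypothetical error shape: any thrown object that carries an HTTP status code.
interface HttpError {
  status: number;
}

function isRateLimited(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as Partial<HttpError>).status === 429
  );
}

// The delay doubles with each attempt (0-based) and is capped at maxMs.
// Default values are illustrative, not taken from the project.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 3,
  baseMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Give up on non-429 errors, or once the retry budget is spent.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With these defaults, the waits before the first three retries are 1000 ms, 2000 ms, and 4000 ms. Production retry code often adds random jitter as well; it is left out here to keep the sketch short.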

🎨 User Experience

Text Input: Ctrl+T (macOS/Linux) or Ctrl+Shift+T (Windows) for direct input
Dark/Light Themes: Toggle between dark and light modes
Resizable Panels: Drag-to-resize interface components
Kill Switch: Emergency stop for agent operations (Ctrl+Shift+Escape)
Conversation Management: Full conversation history with tool call visualization

🏗️ Architecture

Built with modern technologies for cross-platform performance:

Electron: Main process for system integration, MCP orchestration, and TTS processing
React + TypeScript: Modern UI with real-time progress tracking and conversation management
Rust: High-performance keyboard monitoring and text injection across platforms
MCP Client: Full Model Context Protocol implementation with OAuth 2.1 support
Multi-Provider AI: OpenAI, Groq, and Gemini integration for speech, text, and TTS

🛠️ Development

Prerequisites: Node.js 18+ (18-20 recommended), pnpm, Rust toolchain

⚠️ Important: This project uses pnpm as its package manager. Using npm or yarn may cause installation issues, especially with Electron binaries. If you don't have pnpm installed:
npm install -g pnpm

# Setup
git clone https://github.com/aj47/SpeakMCP.git
cd SpeakMCP
pnpm install
pnpm build-rs  # Build Rust binary for your platform
pnpm dev       # Start development server

# Platform-specific builds
pnpm build        # Production build for current platform
pnpm build:mac    # macOS build (Apple Silicon + Intel)
pnpm build:win    # Windows build (x64)
pnpm build:linux  # Linux build (x64)

# Testing
pnpm test         # Run test suite
pnpm test:run     # Run tests once (CI mode)
pnpm test:coverage # Run tests with coverage

🔧 Troubleshooting Development Setup

"Electron uninstall" error when running pnpm dev:

This usually means Electron binaries weren't installed correctly. Fix it by:

# Clean install with pnpm
rm -rf node_modules
pnpm install

Multiple lock files (package-lock.json, pnpm-lock.yaml, bun.lock):

If you have multiple lock files, you've mixed package managers. Clean up:

# Remove all lock files except pnpm's
rm -f package-lock.json bun.lock
rm -rf node_modules
pnpm install

Node version mismatch:

This project works best with Node.js 18-20. Check your version:

node --version  # Should be v18.x, v19.x, or v20.x

If using nvm, switch to the recommended version:

nvm use 20
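
If you want to run the same check from a script, the sketch below tests whether a version string falls in the recommended 18-20 range. The helper name `isSupportedNode` is hypothetical and not part of the project.

```typescript
// Returns true when the major version lies within [min, max].
// Accepts strings such as "v20.11.0" or "18.19.1".
function isSupportedNode(version: string, min = 18, max = 20): boolean {
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) return false;
  const major = Number(match[1]);
  return major >= min && major <= max;
}
```

In a Node.js script you would typically pass `process.version` as the argument.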

⚙️ Configuration

AI Providers: OpenAI, Groq, Google Gemini

Configure API keys and custom base URLs in settings
Select specific models for each provider
Multi-language speech recognition support
TTS with 50+ voices across providers

MCP Servers: Configure tools in mcpServers JSON format:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path"]
    },
    "web-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": {"BRAVE_API_KEY": "your-key"}
    }
  }
}
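
Before pasting a config into settings, it can help to sanity-check its shape. The sketch below validates only the fields that appear in the example above (`command`, `args`, `env`). The function names are hypothetical, and SpeakMCP's real schema may accept more fields or enforce different rules.

```typescript
function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isStringArray(v: unknown): boolean {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function isStringMap(v: unknown): boolean {
  if (!isRecord(v)) return false;
  for (const key of Object.keys(v)) {
    if (typeof v[key] !== "string") return false;
  }
  return true;
}

// Returns a list of problems; an empty list means the config matches
// the assumed shape.
function validateMcpConfig(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["not valid JSON"];
  }
  const servers = isRecord(parsed) ? parsed["mcpServers"] : undefined;
  if (!isRecord(servers)) {
    return ['missing top-level "mcpServers" object'];
  }
  const problems: string[] = [];
  for (const name of Object.keys(servers)) {
    const server = servers[name];
    if (!isRecord(server)) {
      problems.push(`${name}: entry must be an object`);
      continue;
    }
    const command = server["command"];
    if (typeof command !== "string" || command === "") {
      problems.push(`${name}: "command" must be a non-empty string`);
    }
    const args = server["args"];
    if (args !== undefined && !isStringArray(args)) {
      problems.push(`${name}: "args" must be an array of strings`);
    }
    const env = server["env"];
    if (env !== undefined && !isStringMap(env)) {
      problems.push(`${name}: "env" must map strings to strings`);
    }
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a settings UI show every issue at once.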

Keyboard Shortcuts:

Hold Ctrl (macOS/Linux) / Ctrl+/ (Windows): Voice recording
Fn Key: Toggle voice dictation (press once to start, press again to stop)
Hold Ctrl+Alt: MCP agent mode (macOS only)
Ctrl+T (macOS/Linux) / Ctrl+Shift+T (Windows): Text input mode
Ctrl+Shift+Escape: Kill switch for agent operations
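
The platform differences in the list above can be summarized as a lookup. This sketch only restates the defaults documented here. The `Platform` values mirror what Node.js reports in `process.platform`, while the `Shortcuts` type and `shortcutsFor` function are hypothetical names, not part of the app.

```typescript
type Platform = "darwin" | "linux" | "win32";

interface Shortcuts {
  voiceRecording: string;
  toggleDictation: string;
  agentMode: string | null; // null where agent mode is unavailable
  textInput: string;
  killSwitch: string;
}

function shortcutsFor(platform: Platform): Shortcuts {
  const isWindows = platform === "win32";
  return {
    voiceRecording: isWindows ? "Ctrl+/" : "Hold Ctrl",
    toggleDictation: "Fn",
    // Agent mode is documented as macOS only.
    agentMode: platform === "darwin" ? "Hold Ctrl+Alt" : null,
    textInput: isWindows ? "Ctrl+Shift+T" : "Ctrl+T",
    killSwitch: "Ctrl+Shift+Escape",
  };
}
```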

🤖 MCP Agent Mode

MCP (Model Context Protocol) enables AI assistants to connect to external tools. SpeakMCP implements a full MCP client with advanced capabilities.

Enhanced Features:

Intelligent Tool Selection: Automatically determines which tools to use
Real-time Progress: Visual feedback with TTS narration during execution
Conversation Continuity: Context preservation across multi-turn interactions
OAuth 2.1 Integration: Secure authentication for MCP servers
Rate Limit Handling: Automatic retry with exponential backoff
Kill Switch: Emergency stop functionality with Ctrl+Shift+Escape
Tool Management: Per-server tool toggles and approval prompts

Example commands:

"Create a new project folder and add a README"
"Search for latest AI news and summarize the top 3 articles"
"Send a message to the team about today's progress"
"Analyze this codebase and suggest improvements"

🐛 Debug Mode

For development and troubleshooting, SpeakMCP includes comprehensive debug logging:

# Enable all debug modes
pnpm dev d               # Shortest option
pnpm dev debug-all       # Readable format

# Enable specific modes
pnpm dev debug-llm       # LLM calls and responses
pnpm dev debug-tools     # MCP tool execution
pnpm dev debug-ui        # UI focus, renders, and state changes

See DEBUGGING.md for detailed debugging instructions.
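
To illustrate how the flags above relate to each other, here is a minimal sketch that maps command-line arguments to debug categories. The flag names come from the commands above; the `DebugFlags` type and `parseDebugArgs` function are assumptions for illustration, and the project's actual parsing may differ (see DEBUGGING.md).

```typescript
interface DebugFlags {
  llm: boolean;
  tools: boolean;
  ui: boolean;
}

function parseDebugArgs(args: string[]): DebugFlags {
  const flags: DebugFlags = { llm: false, tools: false, ui: false };
  for (const arg of args) {
    switch (arg) {
      case "d":
      case "debug-all":
        // Both spellings turn on every category.
        flags.llm = true;
        flags.tools = true;
        flags.ui = true;
        break;
      case "debug-llm":
        flags.llm = true;
        break;
      case "debug-tools":
        flags.tools = true;
        break;
      case "debug-ui":
        flags.ui = true;
        break;
      default:
        // Unrecognized arguments are ignored in this sketch.
        break;
    }
  }
  return flags;
}
```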

🤝 Contributing

We welcome contributions! Fork the repo, create a feature branch, and open a Pull Request.

💬 Get help on Discord | 🌐 More info at techfren.net

📄 License

This project is licensed under the AGPL-3.0 License.

🙏 Acknowledgments

Whispo - This project is a fork of Whispo, the original AI voice assistant
OpenAI for Whisper speech recognition and GPT models
Anthropic for Claude and for developing MCP
Model Context Protocol for the extensible tool integration standard
Electron for cross-platform desktop framework
React for the user interface
Rust for system-level integration
Groq for fast inference capabilities
Google for Gemini models

Made with ❤️ by the SpeakMCP team
