Control remote desktops over VNC with native OCR β no cloud vision APIs needed.
Claude KVM is an MCP server that gives AI agents full control over remote desktop environments via VNC. It pairs a lightweight Node.js proxy with a native Swift daemon that handles screen capture, keyboard/mouse input, and on-device text detection using Apple Vision OCR. The daemon connects to any standard VNC server and translates MCP tool calls into RFB protocol actions β click, type, scroll, drag, screenshot, and element detection β all running locally on macOS with zero external API costs for vision tasks.
VNC_HOST(required): Hostname or IP of the VNC server to connect to.VNC_PORT(required): Port number of the VNC server (typically 5900).VNC_PASSWORD(optional): VNC or Apple Remote Desktop password. Required if the VNC server uses authentication.VNC_USERNAME(optional): Username for Apple Remote Desktop (ARD) authentication.CLAUDE_KVM_DAEMON_PATH(required): Path to theclaude-kvm-daemonbinary. Install via Homebrew:brew install ARAS-Workspace/tap/claude-kvm-daemon
Developer Tools
Remote Desktop Automation, GUI Testing, CI/CD Visual Testing, Screen Scraping, macOS Remote Control, Accessibility Automation, Desktop Agent Orchestration
- Full VNC desktop control β mouse clicks, double-clicks, dragging, scrolling, keyboard input, and clipboard paste
- On-device OCR via Apple Vision β detect all text elements with bounding boxes, zero API cost, ~50ms
- Screenshot and visual diff β capture full screen, crop around cursor, or detect changes since baseline
- Action queue β batch up to 20 sequential actions in a single tool call for efficient multi-step workflows
- Native Swift daemon β platform-optimized performance on Apple Silicon with static-linked LibVNC
- Runtime configuration β adjust timing parameters, display scaling, and behavior without reconnecting
- Apple Remote Desktop support β ARD authentication (type 30), Meta-to-Super key remapping for macOS targets
- Coordinate scaling β automatic mapping between scaled display space and native framebuffer resolution
- CI/CD integration β tested end-to-end on GitHub Actions with screen recording and step-by-step screenshots
- "Take a screenshot and describe what you see on the remote desktop"
- "Open the Applications folder and launch Calculator"
- "Use detect_elements to find all text on screen, then click the Submit button"
- Tool: vnc_command β Control the remote desktop: screenshot, click, type, scroll, OCR detection, and more
- Tool: action_queue β Batch multiple actions (up to 20) in a single call for efficient automation
mcp, vnc, kvm, remote-desktop, automation, ocr, apple-vision, macos, apple-silicon, native, gui-testing, screen-capture, desktop-control, ci-cd, swift, libvnc, rfb-protocol, clipboard, keyboard, mouse