GitHub - KingsYR123/MiniMax-MCP: Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech and video/image generation APIs. This server allows MCP clients like Claude Desktop, Cursor, Windsurf, OpenAI Agents and others to generate speech, clone voices, generate video, generate image and more.

Documentation

中文文档
MiniMax-MCP-JS - Official JavaScript implementation of MiniMax MCP

Quickstart with MCP Client

Get your API key from MiniMax.
Install uv (Python package manager), install with curl -LsSf https://astral.sh/uv/install.sh | sh or see the uv repo for additional install methods.
Important: The API host and key vary by region and must match; otherwise, you'll encounter an Invalid API key error.

Region	Global	Mainland
MINIMAX_API_KEY	go get from MiniMax Global	go get from MiniMax
MINIMAX_API_HOST	https://api.minimax.io	https://api.minimaxi.com

Claude Desktop

Go to Claude > Settings > Developer > Edit Config > claude_desktop_config.json to include the following:

{
  "mcpServers": {
    "MiniMax": {
      "command": "uvx",
      "args": [
        "minimax-mcp",
        "-y"
      ],
      "env": {
        "MINIMAX_API_KEY": "insert-your-api-key-here",
        "MINIMAX_MCP_BASE_PATH": "local-output-dir-path, such as /User/xxx/Desktop",
        "MINIMAX_API_HOST": "api host, https://api.minimax.io | https://api.minimaxi.com",
        "MINIMAX_API_RESOURCE_MODE": "optional, [url|local], url is default, audio/image/video are downloaded locally or provided in URL format"
      }
    }
  }
}

⚠️ Warning: The API key needs to match the host. If an error "API Error: invalid api key" occurs, please check your api host:

Global Host：https://api.minimax.io
Mainland Host：https://api.minimaxi.com

If you're using Windows, you will have to enable "Developer Mode" in Claude Desktop to use the MCP server. Click "Help" in the hamburger menu in the top left and select "Enable Developer Mode".

Cursor

Go to Cursor -> Preferences -> Cursor Settings -> MCP -> Add new global MCP Server to add above config.

That's it. Your MCP client can now interact with MiniMax through these tools:

Transport

We support two transport types: stdio and sse.

stdio	SSE
Run locally	Can be deployed locally or in the cloud
Communication through `stdout`	Communication through `network`
Input: Supports processing `local files` or valid `URL` resources	Input: When deployed in the cloud, it is recommended to use `URL` for input

Available Tools

tool	description
`text_to_audio`	Convert text to audio with a given voice
`list_voices`	List all voices available
`voice_clone`	Clone a voice using provided audio files
`generate_video`	Generate a video from a prompt
`text_to_image`	Generate a image from a prompt
`query_video_generation`	Query the result of video generation task
`music_generation`	Generate a music track from a prompt and lyrics
`voice_design`	Generate a voice from a prompt using preview text

Release Notes

July 2, 2025

🆕 What's New

Voice Design: New voice_design tool - create custom voices from descriptive prompts with preview audio
Video Enhancement: Added MiniMax-Hailuo-02 model with ultra-clear quality and duration/resolution controls
Music Generation: Enhanced music_generation tool powered by music-1.5 model

📈 Enhanced Tools

voice_design - Generate personalized voices from text descriptions
generate_video - Now supports MiniMax-Hailuo-02 with 6s/10s duration and 768P/1080P resolution options
music_generation - High-quality music creation with music-1.5 model

FAQ

1. invalid api key

Please ensure your API key and API host are regionally aligned

Region	Global	Mainland
MINIMAX_API_KEY	go get from MiniMax Global	go get from MiniMax
MINIMAX_API_HOST	https://api.minimax.io	https://api.minimaxi.com

2. spawn uvx ENOENT

Please confirm its absolute path by running this command in your terminal:

which uvx

Once you obtain the absolute path (e.g., /usr/local/bin/uvx), update your configuration to use that path (e.g., "command": "/usr/local/bin/uvx").

3. How to use `generate_video` in async-mode

Define completion rules before starting: Alternatively, these rules can be configured in your IDE settings (e.g., Cursor):

Deep Agent CLI

Overview

MiniMax Deep Agent is an LLM-driven multimodal command-line AI assistant based on MiniMax's own large models and MCP multimodal toolchain. It implements text-to-image, text-to-video, text-to-music, text-to-speech, and other capabilities.

Key Features

Full MiniMax tech stack: Uses MiniMax-M2.5 for reasoning and MiniMax MCP Server for tools, with the same API Key
True Agent: LLM makes decisions in a loop, not hardcoded routing
MCP protocol integration: Connects to tool servers via standard MCP protocol, tools are pluggable
Web search: Real-time information retrieval through Tavily API, supporting news, weather, document queries

Quickstart

Configure environment variables in .env file:

# Required — API authentication
MINIMAX_API_KEY=your_key_here
MINIMAX_API_HOST=https://api.minimaxi.com  # Mainland China
# MINIMAX_API_HOST=https://api.minimax.io  # Global

# Optional — Agent behavior
MINIMAX_CHAT_MODEL=MiniMax-M2.5            # Inference model
MINIMAX_MCP_BASE_PATH=~/Desktop            # File save directory
MINIMAX_API_RESOURCE_MODE=local            # local|url

# Optional — Web search (Tavily)
TAVILY_API_KEY=tvly-xxxxx                  # Get from https://tavily.com

# Optional — Debug
DEBUG=1                                     # Output logs to terminal

Start the agent:

# One-click start (recommended)
./run_agent.sh

# Or manually
uv run --python 3.12 python deep_agent.py

# Debug mode
DEBUG=1 uv run --python 3.12 python deep_agent.py

Usage Examples

Simple task: "Draw a cat"

User: "画一只猫"
Agent: "当然可以！基于\"猫\"这个描述，我来为你生成图片。"
(Agent calls text_to_image tool)
Agent: "已帮你生成图片，保存在 /Desktop/image_xxx.jpg"

Compound task: "Draw a beach sunset, then add relaxing music"

User: "画一张海边日落，然后配上轻松的音乐"
Agent: "好的，我会先帮你生成海边日落的图片，然后为你创作一首轻松的音乐。"
(Agent first calls text_to_image, then music_generation)
Agent: "完成了！图片保存在 xxx，音乐保存在 xxx"

Web search task: "What's the latest AI news"

User: "最近有什么 AI 新闻"
Agent: "让我帮你搜索最新的 AI 新闻。"
(Agent calls web_search tool)
Agent: "最近 AI 领域的重要新闻有：1. ... 2. ... 3. ..."

Search + generation task: "Check today's weather in Hangzhou and broadcast it"

User: "查一下今天杭州天气，然后用语音播报"
Agent: "我需要先搜索杭州今天的天气，然后用语音播报结果。"
(Agent first calls web_search, then text_to_audio)
Agent: "杭州今天晴，25°C。语音播报已生成，保存在 xxx"

Example usage

⚠️ Warning: Using these tools may incur costs.

1. broadcast a segment of the evening news

2. clone a voice

3. generate a video

4. generate images

Web Search Capability

Overview

The Deep Agent includes a built-in web_search tool powered by Tavily Search API, designed specifically for AI agents.

Features

Returns AI-generated summaries + original search results
Supports real-time information: news, weather, prices, documentation, etc.
LLM autonomously decides when to search (e.g., for real-time questions or uncertain knowledge)

Configuration

Get a Tavily API key from https://tavily.com
Add the API key to your .env file:
```
TAVILY_API_KEY=tvly-xxxxx
```

Usage

The agent will automatically use the web_search tool when needed:

When you ask about current events (e.g., "What's the weather today?")
When you ask for the latest information (e.g., "Latest AI news")
When the LLM is uncertain about a fact

Architecture

Overall Architecture

┌─────────────────────────────────────────────────────────┐
│                    Deep Agent CLI                        │
│                   (deep_agent.py)                        │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │              Agent Loop (ReAct)                    │  │
│  │                                                   │  │
│  │  User Input                                       │  │
│  │      │                                            │  │
│  │      ▼                                            │  │
│  │  ┌─────────┐   tool_calls   ┌──────────────┐     │  │
│  │  │ MiniMax │ ─────────────→ │  Tool Router │     │  │
│  │  │  Chat   │                │  (call_tool)  │     │  │
│  │  │  API    │ ←───────────── │              │     │  │
│  │  │ (M2.5)  │   tool_result  └──┬───────┬──┘     │  │
│  │  └────┬────┘                   │       │         │  │
│  │       │ text              stdio │       │ HTTPS   │  │
│  │       ▼                        ▼       ▼         │  │
│  │  User Output          MCP Server  Local Tools    │  │
│  │                       (9 tools)   (web_search)   │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                    │                          │
                    │ HTTPS                    │ HTTPS
                    ▼                          ▼
          ┌───────────────────┐      ┌─────────────────┐
          │  MiniMax Cloud API │      │  Tavily Search  │
          │  - Image Gen      │      │  API            │
          │  - Video Gen      │      └─────────────────┘
          │  - Music Gen      │
          │  - TTS / Voice    │
          └───────────────────┘

Three-Tier Architecture

Tier	Component	Responsibility
Inference Layer	MiniMax Chat API (M2.5)	Understand user intent, plan steps, select tools, organize responses
Protocol Layer	MCP Client ↔ MCP Server + Local Tools	Tool discovery, parameter passing, result return
Execution Layer	MiniMax Cloud API + Tavily API	Multimodal generation + real-time information retrieval

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
.github/ISSUE_TEMPLATE		.github/ISSUE_TEMPLATE
docs		docs
minimax_mcp		minimax_mcp
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README-CN.md		README-CN.md
README.md		README.md
deep_agent.py		deep_agent.py
deep_agent_enhanced.py		deep_agent_enhanced.py
mcp_server_config_demo.json		mcp_server_config_demo.json
pyproject.toml		pyproject.toml
run_agent.sh		run_agent.sh
setup.py		setup.py
uv.lock		uv.lock

Folders and files

Latest commit

History

Repository files navigation

Documentation

Quickstart with MCP Client

Claude Desktop

Cursor

Transport

Available Tools

Release Notes

July 2, 2025

🆕 What's New

📈 Enhanced Tools

FAQ

1. invalid api key

2. spawn uvx ENOENT

3. How to use generate_video in async-mode

Deep Agent CLI

Overview

Key Features

Quickstart

Usage Examples

Simple task: "Draw a cat"

Compound task: "Draw a beach sunset, then add relaxing music"

Web search task: "What's the latest AI news"

Search + generation task: "Check today's weather in Hangzhou and broadcast it"

Example usage

1. broadcast a segment of the evening news

2. clone a voice

3. generate a video

4. generate images

Web Search Capability

Overview

Features

Configuration

Usage

Architecture

Overall Architecture

Three-Tier Architecture

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

3. How to use `generate_video` in async-mode

Packages