KingsYR123/MiniMax-MCP
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech and video/image generation APIs. This server allows MCP clients like Claude Desktop, Cursor, Windsurf, OpenAI Agents, and others to generate speech, clone voices, generate videos, generate images, and more.

Documentation

Quickstart with MCP Client

  1. Get your API key from MiniMax.
  2. Install uv (Python package manager), install with curl -LsSf https://astral.sh/uv/install.sh | sh or see the uv repo for additional install methods.
  3. Important: The API host and key vary by region and must match; otherwise, you'll encounter an Invalid API key error.
Region           | Global                  | Mainland
MINIMAX_API_KEY  | Get from MiniMax Global | Get from MiniMax
MINIMAX_API_HOST | https://api.minimax.io  | https://api.minimaxi.com

Claude Desktop

Go to Claude > Settings > Developer > Edit Config > claude_desktop_config.json to include the following:

{
  "mcpServers": {
    "MiniMax": {
      "command": "uvx",
      "args": [
        "minimax-mcp",
        "-y"
      ],
      "env": {
        "MINIMAX_API_KEY": "insert-your-api-key-here",
        "MINIMAX_MCP_BASE_PATH": "local-output-dir-path, such as /User/xxx/Desktop",
        "MINIMAX_API_HOST": "api host, https://api.minimax.io | https://api.minimaxi.com",
        "MINIMAX_API_RESOURCE_MODE": "optional, [url|local], url is default, audio/image/video are downloaded locally or provided in URL format"
      }
    }
  }
}

⚠️ Warning: The API key must match the host. If you see "API Error: invalid api key", check your API host:

  • Global host: https://api.minimax.io
  • Mainland host: https://api.minimaxi.com

If you're using Windows, you will have to enable "Developer Mode" in Claude Desktop to use the MCP server. Click "Help" in the hamburger menu in the top left and select "Enable Developer Mode".
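Because the key/host mismatch described above is the most common setup error, it can help to sanity-check the config file before launching the client. A minimal sketch using only the standard library (the placeholder-prefix check is a hypothetical illustration, not part of minimax-mcp):

```python
import json

# Hosts as documented above.
VALID_HOSTS = {"https://api.minimax.io", "https://api.minimaxi.com"}

def check_config(config_text: str) -> list[str]:
    """Return a list of problems found in a claude_desktop_config.json blob."""
    problems = []
    env = json.loads(config_text)["mcpServers"]["MiniMax"]["env"]
    if env.get("MINIMAX_API_HOST") not in VALID_HOSTS:
        problems.append("MINIMAX_API_HOST must be api.minimax.io or api.minimaxi.com")
    if env.get("MINIMAX_API_KEY", "").startswith("insert-"):
        problems.append("MINIMAX_API_KEY is still the placeholder value")
    return problems
```

Note that this only catches structural mistakes; it cannot tell whether a syntactically valid key actually belongs to the chosen region.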

Cursor

Go to Cursor -> Preferences -> Cursor Settings -> MCP -> Add new global MCP Server to add the above config.

That's it. Your MCP client can now interact with MiniMax through the tools listed below.

Transport

We support two transport types: stdio and sse.

              | stdio                                       | SSE
Deployment    | Runs locally                                | Can be deployed locally or in the cloud
Communication | Through stdout                              | Through the network
Input         | Supports local files or valid URL resources | URL input recommended when deployed in the cloud
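Under the hood, the stdio transport exchanges newline-delimited JSON-RPC 2.0 messages over the server process's stdin/stdout. As a rough sketch, this is the shape of the message a client writes to start a session (the protocol version and client info values here are illustrative; check the MCP specification for the current ones):

```python
import json

def make_initialize_request(request_id: int = 1) -> str:
    """Serialize an MCP `initialize` request as one newline-terminated
    JSON-RPC 2.0 message, as used by the stdio transport."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # illustrative version string
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        },
    }
    return json.dumps(msg) + "\n"
```

With SSE the same JSON-RPC payloads travel over HTTP instead, which is why cloud deployments prefer URL inputs: the server may not share a filesystem with the client.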

Available Tools

Tool                   | Description
text_to_audio          | Convert text to audio with a given voice
list_voices            | List all available voices
voice_clone            | Clone a voice using provided audio files
generate_video         | Generate a video from a prompt
text_to_image          | Generate an image from a prompt
query_video_generation | Query the result of a video generation task
music_generation       | Generate a music track from a prompt and lyrics
voice_design           | Generate a voice from a prompt using preview text

Release Notes

July 2, 2025

🆕 What's New

  • Voice Design: New voice_design tool - create custom voices from descriptive prompts with preview audio
  • Video Enhancement: Added MiniMax-Hailuo-02 model with ultra-clear quality and duration/resolution controls
  • Music Generation: Enhanced music_generation tool powered by music-1.5 model

📈 Enhanced Tools

  • voice_design - Generate personalized voices from text descriptions
  • generate_video - Now supports MiniMax-Hailuo-02 with 6s/10s duration and 768P/1080P resolution options
  • music_generation - High-quality music creation with music-1.5 model

FAQ

1. invalid api key

Please ensure your API key and API host are regionally aligned.

Region           | Global                  | Mainland
MINIMAX_API_KEY  | Get from MiniMax Global | Get from MiniMax
MINIMAX_API_HOST | https://api.minimax.io  | https://api.minimaxi.com

2. spawn uvx ENOENT

Please confirm the absolute path of uvx by running this command in your terminal:

which uvx

Once you obtain the absolute path (e.g., /usr/local/bin/uvx), update your configuration to use that path (e.g., "command": "/usr/local/bin/uvx").
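The lookup and config rewrite can also be automated; a sketch using the standard library (`shutil.which` performs the same PATH search as `which uvx`):

```python
import shutil

def resolve_command(config: dict, name: str = "uvx") -> dict:
    """Replace a bare command in an mcpServers config with its absolute path.

    Raises FileNotFoundError if the command is not on PATH, which is the
    situation behind the `spawn uvx ENOENT` error."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found on PATH -- install uv first")
    for server in config.get("mcpServers", {}).values():
        if server.get("command") == name:
            server["command"] = path
    return config
```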

3. How to use generate_video in async-mode

Define completion rules before starting the task. Alternatively, these rules can be configured in your IDE settings (e.g., Cursor).
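The async flow is submit-then-poll: generate_video returns a task id, and query_video_generation is polled until the task finishes. A sketch of that loop, with the two tool calls passed in as callables (the "Success"/"Fail" status strings are illustrative; check the actual tool output for the real values):

```python
import time

def wait_for_video(submit, query, prompt: str,
                   interval: float = 10.0, timeout: float = 600.0) -> dict:
    """Submit a video generation task and poll until it completes.

    `submit(prompt)` should return a dict containing "task_id";
    `query(task_id)` should return a dict with a "status" field.
    Both are simplified stand-ins for the generate_video and
    query_video_generation tools."""
    task_id = submit(prompt)["task_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = query(task_id)
        if result["status"] in ("Success", "Fail"):  # illustrative statuses
            return result
        time.sleep(interval)
    raise TimeoutError(f"video task {task_id} did not finish in {timeout}s")
```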

Deep Agent CLI

Overview

MiniMax Deep Agent is an LLM-driven, multimodal command-line AI assistant built on MiniMax's own large models and the MiniMax MCP multimodal toolchain. It supports text-to-image, text-to-video, text-to-music, text-to-speech, and other capabilities.

Key Features

  • Full MiniMax tech stack: uses MiniMax-M2.5 for reasoning and the MiniMax MCP Server for tools, with the same API key for both
  • True agent: the LLM makes decisions in a loop rather than following hardcoded routing
  • MCP protocol integration: connects to tool servers via the standard MCP protocol, so tools are pluggable
  • Web search: real-time information retrieval through the Tavily API, supporting news, weather, and document queries

Quickstart

  1. Configure environment variables in .env file:

    # Required — API authentication
    MINIMAX_API_KEY=your_key_here
    MINIMAX_API_HOST=https://api.minimaxi.com  # Mainland China
    # MINIMAX_API_HOST=https://api.minimax.io  # Global
    
    # Optional — Agent behavior
    MINIMAX_CHAT_MODEL=MiniMax-M2.5            # Inference model
    MINIMAX_MCP_BASE_PATH=~/Desktop            # File save directory
    MINIMAX_API_RESOURCE_MODE=local            # local|url
    
    # Optional — Web search (Tavily)
    TAVILY_API_KEY=tvly-xxxxx                  # Get from https://tavily.com
    
    # Optional — Debug
    DEBUG=1                                     # Output logs to terminal
  2. Start the agent:

    # One-click start (recommended)
    ./run_agent.sh
    
    # Or manually
    uv run --python 3.12 python deep_agent.py
    
    # Debug mode
    DEBUG=1 uv run --python 3.12 python deep_agent.py
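If you prefer not to depend on a dotenv package, the .env format above is simple enough to load with the standard library. A minimal sketch (it skips blank lines and comments and strips inline `# ...` comments like those in the example; quoting edge cases are ignored):

```python
import os

def load_dotenv(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Existing environment variables win over file values
    (os.environ.setdefault), matching common dotenv behavior."""
    loaded = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            value = value.split("#", 1)[0].strip()  # drop inline comments
            loaded[key.strip()] = value
            os.environ.setdefault(key.strip(), value)
    return loaded
```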

Usage Examples

Simple task: "Draw a cat"

User: "Draw a cat"
Agent: "Sure! Based on the description 'cat', I'll generate an image for you."
(Agent calls the text_to_image tool)
Agent: "The image has been generated and saved to /Desktop/image_xxx.jpg"

Compound task: "Draw a beach sunset, then add relaxing music"

User: "Draw a beach sunset, then add relaxing music"
Agent: "OK, I'll first generate the beach-sunset image, then compose a relaxing piece of music for you."
(Agent first calls text_to_image, then music_generation)
Agent: "Done! The image is saved at xxx and the music at xxx"

Web search task: "What's the latest AI news"

User: "What's the latest AI news?"
Agent: "Let me search for the latest AI news for you."
(Agent calls the web_search tool)
Agent: "Recent highlights in AI: 1. ... 2. ... 3. ..."

Search + generation task: "Check today's weather in Hangzhou and broadcast it"

User: "Check today's weather in Hangzhou, then announce it by voice"
Agent: "I'll first search for today's weather in Hangzhou, then read out the result."
(Agent first calls web_search, then text_to_audio)
Agent: "Hangzhou is sunny today, 25°C. The voice broadcast has been generated and saved at xxx"

Example usage

⚠️ Warning: Using these tools may incur costs.

1. Broadcast a segment of the evening news

2. Clone a voice

3. Generate a video

4. Generate images

Web Search Capability

Overview

The Deep Agent includes a built-in web_search tool powered by the Tavily Search API, which is designed specifically for AI agents.

Features

  • Returns AI-generated summaries + original search results
  • Supports real-time information: news, weather, prices, documentation, etc.
  • LLM autonomously decides when to search (e.g., for real-time questions or uncertain knowledge)

Configuration

  1. Get a Tavily API key from https://tavily.com
  2. Add the API key to your .env file:
    TAVILY_API_KEY=tvly-xxxxx

Usage

The agent will automatically use the web_search tool when needed:

  • When you ask about current events (e.g., "What's the weather today?")
  • When you ask for the latest information (e.g., "Latest AI news")
  • When the LLM is uncertain about a fact
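At the transport level, a web_search call reduces to one HTTPS POST against Tavily's search endpoint. A sketch that only builds the request without sending it (the endpoint and payload field names follow Tavily's public docs as best I recall; verify them against the current API reference before relying on this):

```python
import json
import urllib.request

def build_search_request(api_key: str, query: str,
                         max_results: int = 5) -> urllib.request.Request:
    """Construct (but do not send) a Tavily search request.

    Send it with urllib.request.urlopen(req) once the payload shape has
    been checked against Tavily's current API documentation."""
    payload = {
        "api_key": api_key,      # newer API versions also accept a Bearer header
        "query": query,
        "max_results": max_results,
        "include_answer": True,  # request the AI-generated summary
    }
    return urllib.request.Request(
        "https://api.tavily.com/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```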

Architecture

Overall Architecture

┌─────────────────────────────────────────────────────────┐
│                    Deep Agent CLI                        │
│                   (deep_agent.py)                        │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │              Agent Loop (ReAct)                    │  │
│  │                                                   │  │
│  │  User Input                                       │  │
│  │      │                                            │  │
│  │      ▼                                            │  │
│  │  ┌─────────┐   tool_calls   ┌──────────────┐     │  │
│  │  │ MiniMax │ ─────────────→ │  Tool Router │     │  │
│  │  │  Chat   │                │  (call_tool)  │     │  │
│  │  │  API    │ ←───────────── │              │     │  │
│  │  │ (M2.5)  │   tool_result  └──┬───────┬──┘     │  │
│  │  └────┬────┘                   │       │         │  │
│  │       │ text              stdio │       │ HTTPS   │  │
│  │       ▼                        ▼       ▼         │  │
│  │  User Output          MCP Server  Local Tools    │  │
│  │                       (9 tools)   (web_search)   │  │
│  └───────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────┘
                    │                          │
                    │ HTTPS                    │ HTTPS
                    ▼                          ▼
          ┌───────────────────┐      ┌─────────────────┐
          │  MiniMax Cloud API │      │  Tavily Search  │
          │  - Image Gen      │      │  API            │
          │  - Video Gen      │      └─────────────────┘
          │  - Music Gen      │
          │  - TTS / Voice    │
          └───────────────────┘

Three-Tier Architecture

Tier            | Component                             | Responsibility
Inference Layer | MiniMax Chat API (M2.5)               | Understand user intent, plan steps, select tools, organize responses
Protocol Layer  | MCP Client ↔ MCP Server + Local Tools | Tool discovery, parameter passing, result return
Execution Layer | MiniMax Cloud API + Tavily API        | Multimodal generation + real-time information retrieval
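The inference/protocol/execution split can be compressed into a few lines: the chat model proposes tool calls, a router executes them, and results are fed back until the model answers in plain text. A toy sketch with the model and tools stubbed out (no real MiniMax or MCP calls; the reply format is a simplified stand-in for the actual chat completions response):

```python
def agent_loop(chat, tools: dict, user_input: str, max_turns: int = 8) -> str:
    """Minimal ReAct-style agent loop.

    `chat(messages)` returns either {"text": ...} for a final answer or
    {"tool": name, "args": {...}} for a tool call -- a simplified stand-in
    for the MiniMax Chat API response."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = chat(messages)
        if "text" in reply:                 # final answer: exit the loop
            return reply["text"]
        # Tool Router step: dispatch to an MCP tool or a local tool
        result = tools[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "name": reply["tool"],
                         "content": str(result)})
    raise RuntimeError("agent exceeded max_turns without a final answer")
```

The max_turns bound is the one non-obvious design choice: without it, a model that keeps emitting tool calls would loop forever.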

