Add MCP server for PTZ app with FastMCP interface #1

Open
saumya-pailwan wants to merge 3 commits into plebbyd:main from saumya-pailwan:ptz-mcp-server
@saumya-pailwan

This adds the MCP server (sage_mcp.py) and integrates job submission logic tailored for the PTZ object detection pipeline. Includes custom submit_ptz_app_job() and SAGE-compatible configurations.

@plebbyd

plebbyd commented Nov 14, 2025

Owner

@saumya-pailwan A few questions:

  1. Why are the commits in this PR attributed to Dario? Did you collaborate with him on this?
  2. How is someone supposed to use this MCP server if it is running on the node?

@saumya-pailwan

saumya-pailwan commented Nov 17, 2025

Author

@plebbyd Great questions! Let me clarify:

1. Regarding the commits:

No, I did not collaborate with Dario on this work; these commits are mine alone. They appear under his name because I pushed from the H100 cluster, where the wheel user account was authenticated as Dario. I identified and resolved this authentication mix-up with Sean's help, which is why my later work (including the agent system/PlantNet integration) shows correctly under my account.

All the MCP server implementation, including:

  • camera_mcp_server.py (hardware abstraction layer)
  • sage_mcp.py (job submission orchestration)
  • mcp_client.py (client interface)
  • Integration with existing PTZ control logic

was developed independently as part of exploring agentic camera control architecture.

2. Usage of the MCP server:

The MCP server provides two complementary deployment options, depending on the use case:

Option 1: Standalone PTZ MCP Server (Current Implementation)

The camera_mcp_server.py runs as a separate service on the edge node, exposing camera controls as MCP tools.

Deployment:

# On the edge node with camera access
python camera_mcp_server.py \
  --username camera_user \
  --password camera_pass \
  --cameraip 130.202.23.153

This exposes tools at http://localhost:8000/mcp:

  • get_position()
  • move_absolute(pan, tilt, zoom)
  • move_relative(pan, tilt, zoom)
  • take_snapshot()
  • stop_movement()
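
The difference between the absolute and relative moves can be illustrated with a small in-memory stand-in. This is only a sketch: `FakePTZ`, its pan/tilt/zoom limits, and the returned dictionaries are assumptions for illustration, not the behavior of camera_mcp_server.py or of the real camera.

```python
from dataclasses import dataclass


def _clamp(value: float, low: float, high: float) -> float:
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


@dataclass
class FakePTZ:
    """Hypothetical in-memory camera; the limits below are assumed, not from the PR."""
    pan: float = 0.0    # degrees, assumed range -180..180
    tilt: float = 0.0   # degrees, assumed range -90..90
    zoom: float = 1.0   # assumed range 1..30

    def get_position(self) -> dict:
        return {"pan": self.pan, "tilt": self.tilt, "zoom": self.zoom}

    def move_absolute(self, pan: float, tilt: float, zoom: float) -> dict:
        # Absolute: the arguments are the target position itself.
        self.pan = _clamp(pan, -180.0, 180.0)
        self.tilt = _clamp(tilt, -90.0, 90.0)
        self.zoom = _clamp(zoom, 1.0, 30.0)
        return self.get_position()

    def move_relative(self, pan: float, tilt: float, zoom: float) -> dict:
        # Relative: the arguments are offsets from the current position.
        return self.move_absolute(self.pan + pan, self.tilt + tilt, self.zoom + zoom)


camera = FakePTZ()
camera.move_absolute(pan=45, tilt=0, zoom=1)
after_relative = camera.move_relative(pan=10, tilt=5, zoom=0)
```

Clamping inside `move_absolute` means a relative move that would overshoot a limit stops at the limit instead of failing; a real server might prefer to return an error.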

Usage from Detection Pipeline:

# Original approach (direct camera control)
camera = CameraControl(ip, user, password)
camera.absolute_control(pan=45, tilt=0, zoom=1)
image = camera.snap_shot()

# Through MCP server
from mcp_client import MCPClient

mcp_client = MCPClient("http://localhost:8000")
mcp_client.move_absolute(pan=45, tilt=0, zoom=1)
image_bytes = mcp_client.take_snapshot()
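
Because the two paths use different method names (`absolute_control`/`snap_shot` versus `move_absolute`/`take_snapshot`), a thin adapter would let the detection pipeline stay unchanged when switching between them. A minimal sketch: `PTZBackend`, `DirectBackend`, `capture_at`, and `StubCamera` are hypothetical names, and it is assumed that `snap_shot()` returns image bytes. Only the camera and client method names come from the snippet above.

```python
from typing import Protocol


class PTZBackend(Protocol):
    """Hypothetical interface the pipeline codes against."""
    def move_absolute(self, pan: float, tilt: float, zoom: float) -> None: ...
    def take_snapshot(self) -> bytes: ...


class DirectBackend:
    """Wraps a CameraControl-like object so it matches the MCP client's method names."""
    def __init__(self, camera) -> None:
        self._camera = camera

    def move_absolute(self, pan: float, tilt: float, zoom: float) -> None:
        self._camera.absolute_control(pan=pan, tilt=tilt, zoom=zoom)

    def take_snapshot(self) -> bytes:
        return self._camera.snap_shot()


def capture_at(backend: PTZBackend, pan: float, tilt: float, zoom: float) -> bytes:
    """Pipeline step: point the camera, then grab a frame."""
    backend.move_absolute(pan=pan, tilt=tilt, zoom=zoom)
    return backend.take_snapshot()


class StubCamera:
    """Stand-in for CameraControl so the sketch runs without hardware."""
    def __init__(self) -> None:
        self.calls = []

    def absolute_control(self, pan, tilt, zoom) -> None:
        self.calls.append(("absolute_control", pan, tilt, zoom))

    def snap_shot(self) -> bytes:
        return b"fake-jpeg"


stub = StubCamera()
frame = capture_at(DirectBackend(stub), pan=45, tilt=0, zoom=1)
```

An object with `move_absolute` and `take_snapshot` methods, such as the `mcp_client` above, could be passed to `capture_at` directly without a wrapper.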

Usage with LLM Agents:

# Claude/GPT can now control the camera
llm_prompt = """
You have access to these camera control tools:
- get_position() -> returns current pan/tilt/zoom
- move_absolute(pan, tilt, zoom) -> moves camera
- take_snapshot() -> captures image

Task: Scan the area for people by rotating the camera.
"""

# Agent makes decisions and calls tools through MCP
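
On the client side, "calls tools through MCP" amounts to a dispatch loop: the model emits a tool name plus arguments, and the client runs the matching tool and returns the result. The sketch below replaces the LLM with a scripted list of calls and the MCP session with plain functions. `TOOLS`, `run_tool_calls`, the simulated state, and the 90-degree pan steps are illustrative assumptions.

```python
from typing import Any, Callable, Dict, List

# Simulated camera state; a real deployment would forward calls to the MCP server.
state = {"pan": 0.0, "tilt": 0.0, "zoom": 1.0}


def get_position() -> Dict[str, float]:
    return dict(state)


def move_absolute(pan: float, tilt: float, zoom: float) -> Dict[str, float]:
    state.update(pan=pan, tilt=tilt, zoom=zoom)
    return dict(state)


def take_snapshot() -> str:
    # Returns a label instead of image bytes to keep the sketch small.
    return f"snapshot@pan={state['pan']}"


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_position": get_position,
    "move_absolute": move_absolute,
    "take_snapshot": take_snapshot,
}


def run_tool_calls(calls: List[Dict[str, Any]]) -> List[Any]:
    """Execute each {"name": ..., "arguments": ...} request and collect the results."""
    results = []
    for call in calls:
        tool = TOOLS.get(call["name"])
        if tool is None:
            # Unknown tools are reported back to the caller rather than raising.
            results.append({"error": f"unknown tool {call['name']}"})
            continue
        results.append(tool(**call.get("arguments", {})))
    return results


# Scripted stand-in for the model's plan: sweep the pan axis in 90-degree steps.
plan = []
for pan in (0, 90, 180, 270):
    plan.append({"name": "move_absolute", "arguments": {"pan": pan, "tilt": 0, "zoom": 1}})
    plan.append({"name": "take_snapshot"})

results = run_tool_calls(plan)
snapshots = [r for r in results if isinstance(r, str)]
```

In an actual agent, each result would be appended to the conversation so the model can decide the next move based on what the snapshot shows.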

Option 2: Integration with Sage-MCP Server

The PTZ camera tools can be integrated into the main Sage-MCP server, making camera control available alongside other Sage tools (data queries, job submission, etc.).

Once integrated, users could interact via:

Cursor/Claude:

"Get the current position of camera at 130.202.23.153"
"Move the PTZ camera to pan=45, tilt=10, zoom=5"
"Take a snapshot from the camera and show me what it sees"

HTTP API:

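# MCP requests are JSON-RPC 2.0; this assumes the session was already
# initialized (the server may also require the Mcp-Session-Id header)
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "get_ptz_position",
      "arguments": {
        "camera_ip": "130.202.23.153",
        "username": "admin",
        "password": "secret"
      }
    }
  }'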
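
An MCP tool call over HTTP is a JSON-RPC 2.0 exchange: the request uses the `tools/call` method with `name` and `arguments` fields, and the reply carries either a `result` or an `error` object. The sketch below builds such a request and unwraps a reply. `build_tool_call` and `unwrap_response` are hypothetical helpers, the sample reply is invented for illustration, and the case where the server answers with an event stream instead of a single JSON body is ignored.

```python
import json
from itertools import count

_request_ids = count(1)  # request ids must not be reused within a session


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def unwrap_response(raw: str, expected_id: int) -> dict:
    """Return the JSON-RPC result, raising on an error object or an id mismatch."""
    message = json.loads(raw)
    if message.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return message["result"]


request = build_tool_call(
    "get_ptz_position",
    {"camera_ip": "130.202.23.153", "username": "admin", "password": "secret"},
)
body = json.dumps(request)  # this string would be the HTTP POST body

# Invented reply, shaped like a successful tools/call result.
sample_reply = json.dumps({
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {"content": [{"type": "text", "text": "pan=45 tilt=10 zoom=5"}], "isError": False},
})
result = unwrap_response(sample_reply, request["id"])
```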

Python MCP Client:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def control_camera():
    # Launch sage_mcp.py as a subprocess and talk to it over stdio
    server_params = StdioServerParameters(
        command="python",
        args=["sage_mcp.py"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # The MCP handshake must complete before any tool call
            await session.initialize()

            result = await session.call_tool(
                "get_ptz_position",
                {
                    "camera_ip": "130.202.23.153",
                    "username": "admin",
                    "password": "secret"
                }
            )
            print(f"Current position: {result}")

asyncio.run(control_camera())
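
Note that `call_tool` returns a result object rather than a bare value. In the MCP Python SDK this object carries an `isError` flag and a `content` list whose text items have `type == "text"` and a `text` attribute. A sketch of unpacking it follows, using `SimpleNamespace` stand-ins so it runs without the SDK or a server. `tool_text` is a hypothetical helper, and the JSON payload returned by `get_ptz_position` is assumed.

```python
import json
from types import SimpleNamespace


def tool_text(result) -> str:
    """Join the text blocks of a tool result, raising if the tool reported an error."""
    texts = [item.text for item in result.content if getattr(item, "type", None) == "text"]
    joined = "\n".join(texts)
    if getattr(result, "isError", False):
        raise RuntimeError(f"tool call failed: {joined}")
    return joined


# Stand-in for what session.call_tool(...) might return; the payload shape is assumed.
fake_result = SimpleNamespace(
    isError=False,
    content=[SimpleNamespace(type="text", text='{"pan": 45.0, "tilt": 0.0, "zoom": 1.0}')],
)
position = json.loads(tool_text(fake_result))
```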

Design Decisions

After implementing this MCP abstraction, I developed the PlantNet species identification workflow, which demonstrated that:

  • Direct integration (current production code) is more practical for deterministic, high-performance pipelines
  • MCP abstraction enables exploratory/agentic use cases where LLMs need flexible camera control

The MCP architecture would be particularly valuable for:

  • Multi-modal agents that combine camera control with Sage data queries
  • Interactive research where humans/agents explore scenes adaptively
  • Remote orchestration of distributed camera networks

Additional Context:
The sage_mcp.py in this PR also provides SAGE job submission capabilities (similar to what's in the main Sage-MCP repo), enabling remote deployment and monitoring of PTZ detection jobs across multiple edge nodes.

3 participants