
Add multimodal input support for images and files#116

Merged
wu-changxing merged 11 commits into main from claude/update-feature-info-92LEq
Mar 13, 2026
Conversation

@wu-changxing
Collaborator

Description

This PR adds comprehensive multimodal input support to the Connectonion framework, allowing agents to accept images and files alongside text prompts across all connection types (HTTP, WebSocket, and relay). Includes file upload validation with configurable size limits and detailed documentation.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Changes Made

Core Features

  • Agent multimodal input: Extended Agent.input() to accept images (list of base64 data URLs) and files (list of dicts with name and base64 data)
  • Message formatting: Automatically converts multimodal inputs to proper LLM message format (text + image_url + file objects)
  • File validation: Added validate_files() function in config.py to enforce configurable file size and count limits
  • Host configuration: Added max_file_size (MB) and max_files_per_request parameters to host() function

API Endpoints

  • POST /input: Now accepts optional images and files fields in request body
  • WebSocket /ws: Supports images and files in INPUT messages
  • GET /info: Returns new accepted_inputs field describing supported input types and file limits
  • Error handling: Returns 400 status with descriptive error messages when file validation fails
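For example, a failed validation might produce a body like the following (the field name here is hypothetical; the PR only specifies a 400 status with a descriptive message):

```json
{
  "error": "report.pdf exceeds the configured max_file_size of 10 MB"
}
```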

Client Library

  • connect.input(): Added images and files parameters for remote agent calls
  • connect.input_async(): Async version also supports multimodal inputs
  • Message building: Properly constructs WebSocket messages with multimodal content
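The message-building step can be sketched roughly as follows. The function name and exact message shape are assumptions for illustration, not the library's actual API; only the INPUT message type and the images/files fields are described in this PR:

```python
def build_input_message(prompt, images=None, files=None):
    """Hypothetical sketch: assemble a WebSocket INPUT message
    carrying text plus optional multimodal attachments."""
    message = {"type": "INPUT", "prompt": prompt}
    if images:
        message["images"] = list(images)  # base64 data URLs
    if files:
        # Each file is a dict with a display name and a base64 data URL.
        message["files"] = [{"name": f["name"], "data": f["data"]} for f in files]
    return message

msg = build_input_message(
    "Describe this image",
    images=["data:image/png;base64,iVBORw0KGgo="],
)
```

Omitting empty fields keeps text-only messages byte-compatible with the pre-multimodal protocol.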

Documentation

  • Added "Multimodal Input (Images & Files)" section with examples for HTTP, WebSocket, and Python client
  • Updated /info endpoint documentation with accepted_inputs schema
  • Added file size limit configuration examples in host.yaml and code
  • Updated interactive docs UI with image/file input fields and file attachment support

Testing

  • Added unit tests for Agent.input() with images, files, and combined multimodal inputs
  • Added tests for multiple files in single request
  • Added HTTP endpoint tests verifying images/files are passed to handlers
  • Added file validation tests (oversized files, too many files, edge cases)
  • Added info_handler tests verifying accepted_inputs field with custom config

Example Usage

```python
from connectonion import Agent, host

# Local agent with multimodal input
agent = Agent(name="analyzer")

# Send with images
result = agent.input("Describe this image", images=["data:image/png;base64,iVBORw0KGgo..."])

# Send with files
result = agent.input("Summarize this document", files=[
    {"name": "report.pdf", "data": "data:application/pdf;base64,JVBERi..."}
])

# Send with both
result = agent.input("Analyze", images=[...], files=[...])

# Configure file limits when hosting
host(create_agent, max_file_size=50, max_files_per_request=5)
```

```bash
# HTTP request with images
curl -X POST http://localhost:8000/input \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What do you see?",
    "images": ["data:image/png;base64,iVBORw0KGgo..."]
  }'

# HTTP request with files
curl -X POST http://localhost:8000/input \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize this",
    "files": [{"name": "doc.pdf", "data": "data:application/pdf;base64,JVBERi..."}]
  }'
```


https://claude.ai/code/session_019ktNJoNpDXEgiiZEf4ataq

claude added 11 commits March 12, 2026 08:21
- Add files parameter to agent.input() accepting base64-encoded file dicts
- Thread files through HTTP POST /input, WebSocket /ws, and RemoteAgent
- Add accepted_inputs field to /info endpoint (text, images, files)
- Add unit tests for single file, multiple files, and mixed images+files

The HTTP POST /input handler was silently dropping images (pre-existing
bug) while WebSocket /ws was passing them. Now both endpoints have
parity: images and files are extracted from request data and forwarded
to the route handler.

Also fixes the existing test mock to accept **kw and adds a new test
that verifies images and files flow through the HTTP handler correctly.

The validate_files function in config.py was defined but never called.
Now both HTTP and WebSocket handlers validate files against configured
limits (max_file_size, max_files_per_request from host.yaml) before
processing. HTTP returns 400 on validation failure; WebSocket returns
ERROR message.

Defaults: 10MB per file, 10 files per request (configurable in host.yaml).
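A minimal sketch of what such a helper might look like. The names validate_files, max_file_size, and max_files_per_request mirror the PR; the decoding logic and error wording are assumptions:

```python
import base64

def validate_files(files, max_file_size=10, max_files_per_request=10):
    """Hypothetical sketch of the validation helper: enforce a file-count
    limit and a per-file size limit (max_file_size is in megabytes)."""
    if len(files) > max_files_per_request:
        raise ValueError(
            f"Too many files: {len(files)} > {max_files_per_request}"
        )
    for f in files:
        # Strip the "data:<mime>;base64," prefix before decoding.
        payload = f["data"].split(",", 1)[-1]
        size_mb = len(base64.b64decode(payload)) / (1024 * 1024)
        if size_mb > max_file_size:
            raise ValueError(f"{f['name']} exceeds {max_file_size} MB")
```

Measuring the decoded payload (rather than the data-URL string, which is ~33% larger) keeps the limit meaningful for the saved file.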

The /info endpoint now returns file upload limits so clients know
constraints before uploading:

  "accepted_inputs": {
    "text": true,
    "images": true,
    "files": {
      "max_file_size_mb": 10,
      "max_files_per_request": 10
    }
  }

Limits come from host.yaml config (or defaults). Clients can check
these before sending files to avoid 400 errors.
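Given that schema, a client-side pre-check might look like the sketch below (the helper itself is not part of the library; only the accepted_inputs shape comes from the PR):

```python
import base64

def fits_limits(files, accepted_inputs):
    """Hypothetical client-side pre-check against the /info limits."""
    limits = accepted_inputs["files"]
    if len(files) > limits["max_files_per_request"]:
        return False
    max_bytes = limits["max_file_size_mb"] * 1024 * 1024
    # Decode each data URL payload to measure the real file size.
    return all(
        len(base64.b64decode(f["data"].split(",", 1)[-1])) <= max_bytes
        for f in files
    )

info = {
    "text": True,
    "images": True,
    "files": {"max_file_size_mb": 10, "max_files_per_request": 10},
}
```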

- docs/network/host.md: Add images/files to request format, multimodal
  input section, updated /info response with accepted_inputs
- co_ai prompts host.md: Mirror same changes for AI assistant context
- static/docs.html: Add file/image upload UI to POST /input, show
  accepted_inputs in agent info bar, wire attachments to HTTP and WS

Instead of embedding raw base64 file data in LLM messages (which no
model supports natively), files are now:
1. Decoded from base64 data URLs and saved to .co/uploads/
2. Referenced in a system-reminder text block listing the file paths

Images remain as image_url content blocks (all models support these).
The agent's tools can then read the saved files by path.
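The save step could be sketched like this (save_upload is a hypothetical name; the .co/uploads/ location and the Path(name).name sanitization are from this PR):

```python
import base64
from pathlib import Path

def save_upload(file, uploads_dir=".co/uploads"):
    """Hypothetical sketch: decode a base64 data URL and write the
    payload under the uploads directory, returning the saved path."""
    uploads = Path(uploads_dir)
    uploads.mkdir(parents=True, exist_ok=True)
    # Path(name).name strips directory components such as "../".
    safe_name = Path(file["name"]).name
    payload = file["data"].split(",", 1)[-1]
    path = uploads / safe_name
    path.write_bytes(base64.b64decode(payload))
    return path
```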

Files are decoded and saved to .co/uploads/, then the agent receives
only the file path via a system reminder, unlike images, which are
passed directly to the LLM.

Instruct the agent to use read_file or other tools to read uploaded
files, rather than the vague "read these files" phrasing.

Sanitize filenames with Path(name).name to strip directory components
like "../../etc/passwd" → "passwd". Prevents writing outside .co/uploads/.
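The fix relies on pathlib keeping only the final path component:

```python
from pathlib import Path

# Path(...).name drops every directory component, defeating traversal.
assert Path("../../etc/passwd").name == "passwd"
assert Path("report.pdf").name == "report.pdf"
```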

Documents three options for dedicated file transfer endpoints:
admin-only (minimal), full API, and hybrid approach.
Recommends starting with admin-only read endpoints.

wu-changxing merged commit eae3a17 into main on Mar 13, 2026
0 of 4 checks passed