
Add multimodal input support for images and files#116

Merged
wu-changxing merged 11 commits into main from claude/update-feature-info-92LEq
Mar 13, 2026
Conversation

@wu-changxing
Collaborator

Description

This PR adds comprehensive multimodal input support to the Connectonion framework, allowing agents to accept images and files alongside text prompts across all connection types (HTTP, WebSocket, and relay). Includes file upload validation with configurable size limits and detailed documentation.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Changes Made

Core Features

  • Agent multimodal input: Extended Agent.input() to accept images (list of base64 data URLs) and files (list of dicts with name and base64 data)
  • Message formatting: Automatically converts multimodal inputs to proper LLM message format (text + image_url + file objects)
  • File validation: Added validate_files() function in config.py to enforce configurable file size and count limits
  • Host configuration: Added max_file_size (MB) and max_files_per_request parameters to host() function

API Endpoints

  • POST /input: Now accepts optional images and files fields in request body
  • WebSocket /ws: Supports images and files in INPUT messages
  • GET /info: Returns new accepted_inputs field describing supported input types and file limits
  • Error handling: Returns 400 status with descriptive error messages when file validation fails
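For example, a failed validation might produce a body like the following (the field name here is hypothetical; the PR only specifies a 400 status with a descriptive message):

```json
{
  "error": "report.pdf exceeds the configured max_file_size of 10 MB"
}
```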

Client Library

  • connect.input(): Added images and files parameters for remote agent calls
  • connect.input_async(): Async version also supports multimodal inputs
  • Message building: Properly constructs WebSocket messages with multimodal content
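The message-building step can be sketched roughly as follows. The function name and exact message shape are assumptions for illustration, not the library's actual API; only the INPUT message type and the images/files fields are described in this PR:

```python
def build_input_message(prompt, images=None, files=None):
    """Hypothetical sketch: assemble a WebSocket INPUT message
    carrying text plus optional multimodal attachments."""
    message = {"type": "INPUT", "prompt": prompt}
    if images:
        message["images"] = list(images)  # base64 data URLs
    if files:
        # Each file is a dict with a display name and a base64 data URL.
        message["files"] = [{"name": f["name"], "data": f["data"]} for f in files]
    return message

msg = build_input_message(
    "Describe this image",
    images=["data:image/png;base64,iVBORw0KGgo="],
)
```

Omitting empty fields keeps text-only messages byte-compatible with the pre-multimodal protocol.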

Documentation

  • Added "Multimodal Input (Images & Files)" section with examples for HTTP, WebSocket, and Python client
  • Updated /info endpoint documentation with accepted_inputs schema
  • Added file size limit configuration examples in host.yaml and code
  • Updated interactive docs UI with image/file input fields and file attachment support

Testing

  • Added unit tests for Agent.input() with images, files, and combined multimodal inputs
  • Added tests for multiple files in single request
  • Added HTTP endpoint tests verifying images/files are passed to handlers
  • Added file validation tests (oversized files, too many files, edge cases)
  • Added info_handler tests verifying accepted_inputs field with custom config

Example Usage

```python
from connectonion import Agent, host

# Local agent with multimodal input
agent = Agent(name="analyzer")

# Send with images
result = agent.input("Describe this image", images=["data:image/png;base64,iVBORw0KGgo..."])

# Send with files
result = agent.input("Summarize this document", files=[
    {"name": "report.pdf", "data": "data:application/pdf;base64,JVBERi..."}
])

# Send with both
result = agent.input("Analyze", images=[...], files=[...])

# Configure file limits when hosting
host(create_agent, max_file_size=50, max_files_per_request=5)
```

```bash
# HTTP request with images
curl -X POST http://localhost:8000/input \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What do you see?",
    "images": ["data:image/png;base64,iVBORw0KGgo..."]
  }'

# HTTP request with files
curl -X POST http://localhost:8000/input \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize this",
    "files": [{"name": "doc.pdf", "data": "data:application/pdf;base64,JVBERi..."}]
  }'
```


https://claude.ai/code/session_019ktNJoNpDXEgiiZEf4ataq

claude added 11 commits March 12, 2026 08:21
- Add files parameter to agent.input() accepting base64-encoded file dicts
- Thread files through HTTP POST /input, WebSocket /ws, and RemoteAgent
- Add accepted_inputs field to /info endpoint (text, images, files)
- Add unit tests for single file, multiple files, and mixed images+files

The HTTP POST /input handler was silently dropping images (pre-existing
bug) while WebSocket /ws was passing them. Now both endpoints have
parity: images and files are extracted from request data and forwarded
to the route handler.

Also fixes the existing test mock to accept **kw and adds a new test
that verifies images and files flow through the HTTP handler correctly.

The validate_files function in config.py was defined but never called.
Now both HTTP and WebSocket handlers validate files against configured
limits (max_file_size, max_files_per_request from host.yaml) before
processing. HTTP returns 400 on validation failure; WebSocket returns
ERROR message.

Defaults: 10MB per file, 10 files per request (configurable in host.yaml).
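A minimal sketch of what such a helper might look like. The names validate_files, max_file_size, and max_files_per_request mirror the PR; the decoding logic and error wording are assumptions:

```python
import base64

def validate_files(files, max_file_size=10, max_files_per_request=10):
    """Hypothetical sketch of the validation helper: enforce a file-count
    limit and a per-file size limit (max_file_size is in megabytes)."""
    if len(files) > max_files_per_request:
        raise ValueError(
            f"Too many files: {len(files)} > {max_files_per_request}"
        )
    for f in files:
        # Strip the "data:<mime>;base64," prefix before decoding.
        payload = f["data"].split(",", 1)[-1]
        size_mb = len(base64.b64decode(payload)) / (1024 * 1024)
        if size_mb > max_file_size:
            raise ValueError(f"{f['name']} exceeds {max_file_size} MB")
```

Measuring the decoded payload (rather than the data-URL string, which is ~33% larger) keeps the limit meaningful for the saved file.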

The /info endpoint now returns file upload limits so clients know
constraints before uploading:

  "accepted_inputs": {
    "text": true,
    "images": true,
    "files": {
      "max_file_size_mb": 10,
      "max_files_per_request": 10
    }
  }

Limits come from host.yaml config (or defaults). Clients can check
these before sending files to avoid 400 errors.
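Given that schema, a client-side pre-check might look like the sketch below (the helper itself is not part of the library; only the accepted_inputs shape comes from the PR):

```python
import base64

def fits_limits(files, accepted_inputs):
    """Hypothetical client-side pre-check against the /info limits."""
    limits = accepted_inputs["files"]
    if len(files) > limits["max_files_per_request"]:
        return False
    max_bytes = limits["max_file_size_mb"] * 1024 * 1024
    # Decode each data URL payload to measure the real file size.
    return all(
        len(base64.b64decode(f["data"].split(",", 1)[-1])) <= max_bytes
        for f in files
    )

info = {
    "text": True,
    "images": True,
    "files": {"max_file_size_mb": 10, "max_files_per_request": 10},
}
```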

- docs/network/host.md: Add images/files to request format, multimodal
  input section, updated /info response with accepted_inputs
- co_ai prompts host.md: Mirror same changes for AI assistant context
- static/docs.html: Add file/image upload UI to POST /input, show
  accepted_inputs in agent info bar, wire attachments to HTTP and WS

Instead of embedding raw base64 file data in LLM messages (which no
model supports natively), files are now:
1. Decoded from base64 data URLs and saved to .co/uploads/
2. Referenced in a system-reminder text block listing the file paths

Images remain as image_url content blocks (all models support these).
The agent's tools can then read the saved files by path.
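The save step could be sketched like this (save_upload is a hypothetical name; the .co/uploads/ location and the Path(name).name sanitization are from this PR):

```python
import base64
from pathlib import Path

def save_upload(file, uploads_dir=".co/uploads"):
    """Hypothetical sketch: decode a base64 data URL and write the
    payload under the uploads directory, returning the saved path."""
    uploads = Path(uploads_dir)
    uploads.mkdir(parents=True, exist_ok=True)
    # Path(name).name strips directory components such as "../".
    safe_name = Path(file["name"]).name
    payload = file["data"].split(",", 1)[-1]
    path = uploads / safe_name
    path.write_bytes(base64.b64decode(payload))
    return path
```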

Files are decoded and saved to .co/uploads/, then the agent receives
only the file path via a system reminder, unlike images, which are
passed directly to the LLM.

Instruct the agent to use read_file or other tools to read uploaded
files, rather than the vague "read these files" phrasing.

Sanitize filenames with Path(name).name to strip directory components
like "../../etc/passwd" → "passwd". Prevents writing outside .co/uploads/.
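The fix relies on pathlib keeping only the final path component:

```python
from pathlib import Path

# Path(...).name drops every directory component, defeating traversal.
assert Path("../../etc/passwd").name == "passwd"
assert Path("report.pdf").name == "report.pdf"
```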

Documents three options for dedicated file transfer endpoints:
admin-only (minimal), full API, and hybrid approach.
Recommends starting with admin-only read endpoints.

wu-changxing merged commit eae3a17 into main on Mar 13, 2026
0 of 4 checks passed