Add multimodal input support for images and files #116
Merged
wu-changxing merged 11 commits into main on Mar 13, 2026
- Add files parameter to agent.input() accepting base64-encoded file dicts
- Thread files through HTTP POST /input, WebSocket /ws, and RemoteAgent
- Add accepted_inputs field to /info endpoint (text, images, files)
- Add unit tests for single file, multiple files, and mixed images+files
https://claude.ai/code/session_019ktNJoNpDXEgiiZEf4ataq
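A minimal client-side sketch of building one entry for the new files parameter. It assumes each dict carries a "name" key and a "data" key holding a base64 data URL; the key name "data" and the helper make_file_entry are illustrative assumptions, not confirmed framework names.

```python
import base64
import mimetypes
from pathlib import Path


def make_file_entry(path: str) -> dict:
    """Return a {"name", "data"} dict for the files parameter (hypothetical helper)."""
    p = Path(path)
    # Guess a MIME type from the extension; fall back to a generic binary type.
    mime, _ = mimetypes.guess_type(p.name)
    mime = mime or "application/octet-stream"
    # Assumption: "data" is a base64 data URL, the format the server later decodes.
    payload = base64.b64encode(p.read_bytes()).decode("ascii")
    return {"name": p.name, "data": f"data:{mime};base64,{payload}"}
```

A call might then look like agent.input("Summarize this report", files=[make_file_entry("report.csv")]), with the same list of dicts sent over HTTP, WebSocket, or RemoteAgent.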
The HTTP POST /input handler was silently dropping images (a pre-existing bug), while WebSocket /ws passed them through. Both endpoints now have parity: images and files are extracted from the request data and forwarded to the route handler. This commit also fixes the existing test mock to accept **kw and adds a test verifying that images and files flow through the HTTP handler.
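The parity fix amounts to reading the optional fields from the request body and passing them along. A hypothetical sketch, where handle_input, the "prompt" key, and the keyword names passed to the route handler are assumptions rather than the framework's actual names:

```python
from typing import Any, Callable


def handle_input(data: dict, route_handler: Callable[..., Any]) -> Any:
    """Forward prompt, images, and files from a parsed JSON body (sketch)."""
    prompt = data.get("prompt", "")
    # Missing keys default to None so text-only requests keep working.
    images = data.get("images")
    files = data.get("files")
    return route_handler(prompt, images=images, files=files)
```

A test double written as def mock_route(prompt, **kw) accepts the new keyword arguments without further changes, which is why the existing mock needed **kw.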
The validate_files function in config.py was defined but never called. Both the HTTP and WebSocket handlers now validate files against the configured limits (max_file_size and max_files_per_request from host.yaml) before processing. On validation failure, HTTP returns 400 and WebSocket returns an ERROR message. Defaults: 10 MB per file and 10 files per request (configurable in host.yaml).
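One way these checks could look, as a sketch. validate_files_sketch is a hypothetical stand-in for config.validate_files; its signature, the error-string return convention, and treating 1 MB as 2**20 bytes are all assumptions.

```python
import base64


def validate_files_sketch(files, max_file_size_mb=10, max_files_per_request=10):
    """Return an error message, or None when every file is within limits.

    Hypothetical stand-in for config.validate_files; defaults mirror the
    documented 10 MB / 10 files, and 1 MB is assumed to be 2**20 bytes.
    """
    if not files:
        return None
    if len(files) > max_files_per_request:
        return f"Too many files: {len(files)} > {max_files_per_request}"
    limit = max_file_size_mb * 1024 * 1024
    for f in files:
        data = f.get("data", "")
        # Strip a "data:<mime>;base64," prefix if present, then measure decoded bytes.
        payload = data.partition(",")[2] if data.startswith("data:") else data
        size = len(base64.b64decode(payload))
        if size > limit:
            return f"{f.get('name', '?')} is {size} bytes; limit is {limit}"
    return None
```

An HTTP handler would turn a non-None result into a 400 response, and the WebSocket handler would send it back as an ERROR message.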
The /info endpoint now returns file upload limits so clients know the
constraints before uploading:
  "accepted_inputs": {
    "text": true,
    "images": true,
    "files": {
      "max_file_size_mb": 10,
      "max_files_per_request": 10
    }
  }
Limits come from host.yaml config (or defaults). Clients can check
these before sending files to avoid 400 errors.
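A client-side preflight against the accepted_inputs block might look like the following sketch. preflight is a hypothetical helper; it assumes the response shape shown in this commit message, that a missing or false "files" entry means files are unsupported, and that 1 MB means 2**20 bytes.

```python
import base64


def preflight(info: dict, files: list) -> list:
    """List problems that would make the upload fail (hypothetical helper)."""
    limits = info.get("accepted_inputs", {}).get("files")
    if not limits:
        # No file limits advertised: treat files as unsupported.
        return ["Agent does not accept files"] if files else []
    problems = []
    max_count = limits["max_files_per_request"]
    if len(files) > max_count:
        problems.append(f"{len(files)} files exceeds max_files_per_request={max_count}")
    max_bytes = limits["max_file_size_mb"] * 1024 * 1024  # assumes 1 MB = 2**20 bytes
    for f in files:
        payload = f["data"].split(",", 1)[-1]  # drop the "data:<mime>;base64," prefix
        size = len(base64.b64decode(payload))
        if size > max_bytes:
            problems.append(f"{f['name']} is {size} bytes, over {limits['max_file_size_mb']} MB")
    return problems
```

Fetching /info once and reusing the result keeps the extra round trip off every request.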
- docs/network/host.md: Add images/files to the request format, add a multimodal input section, and update the /info response with accepted_inputs
- co_ai prompts host.md: Mirror the same changes for the AI assistant context
- static/docs.html: Add a file/image upload UI to POST /input, show accepted_inputs in the agent info bar, and wire attachments to HTTP and WS
Instead of embedding raw base64 file data in LLM messages (which most
models cannot consume natively), files are now:
1. Decoded from base64 data URLs and saved to .co/uploads/{filename}
2. Referenced in a system-reminder text block listing the file paths
The agent's tools can then read the saved files by path.
Images remain image_url content blocks, which vision-capable models accept directly.
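The decode-and-save step could be sketched as below. save_upload is hypothetical; the sketch assumes each file's "data" is a base64 data URL and applies Path(name).name so that directory components in the client-supplied name are dropped.

```python
import base64
from pathlib import Path


def save_upload(file: dict, upload_dir: Path = Path(".co/uploads")) -> Path:
    """Decode one {"name", "data"} entry and write it under upload_dir (sketch)."""
    header, _, payload = file["data"].partition(",")
    if not header.startswith("data:") or ";base64" not in header:
        raise ValueError("expected a base64 data URL")
    # Keep only the final path component so "../" segments cannot escape upload_dir.
    name = Path(file["name"]).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {file['name']!r}")
    upload_dir.mkdir(parents=True, exist_ok=True)
    target = upload_dir / name
    target.write_bytes(base64.b64decode(payload))
    return target
```

The returned paths are what the system-reminder block would list for the agent.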
Files are decoded and saved to .co/uploads/, and the agent receives only the file path via a system reminder. Images, by contrast, are passed directly to the LLM.
Instruct the agent to use read_file or other tools to read uploaded files, replacing the vague "read these files" phrasing.
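A sketch of assembling one user turn, assuming OpenAI-style content blocks ({"type": "image_url", ...} and {"type": "text", ...}). build_user_content, the <system-reminder> tag format, and the reminder wording are illustrative assumptions, not the PR's exact text.

```python
def build_user_content(prompt: str, images: list, saved_paths: list) -> list:
    """Assemble content blocks for one user message (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    # Images go to the model directly as image_url blocks.
    for url in images:
        content.append({"type": "image_url", "image_url": {"url": url}})
    # Files are referenced by path only; the agent reads them with its tools.
    if saved_paths:
        listing = "\n".join(f"- {p}" for p in saved_paths)
        content.append({
            "type": "text",
            "text": "<system-reminder>\nThe user uploaded these files. "
                    "Use read_file or other tools to read them:\n"
                    f"{listing}\n</system-reminder>",
        })
    return content
```

Naming the tool explicitly gives the model a concrete action, which is the point of replacing the vaguer "read these files" wording.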
Sanitize filenames with Path(name).name to strip directory components (e.g. "../../etc/passwd" → "passwd"). This prevents writing outside .co/uploads/.
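Path(name).name keeps only the last path component as the host OS understands it. On a POSIX server, backslashes are ordinary filename characters, so a sketch might add a PureWindowsPath pass (an extra step not described in this commit) and reject names that reduce to "" or "..". safe_upload_name is a hypothetical helper.

```python
from pathlib import PurePosixPath, PureWindowsPath


def safe_upload_name(name: str) -> str:
    """Reduce a client-supplied filename to a single safe component (sketch)."""
    # Same idea as Path(name).name: keep only the last "/"-separated component.
    base = PurePosixPath(name).name
    # Extra step (assumption): also split on "\", which POSIX treats as a plain character.
    base = PureWindowsPath(base).name
    # ".." or an empty name would point at a directory, not a file.
    if base in ("", ".", ".."):
        raise ValueError(f"unusable filename: {name!r}")
    return base
```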
Documents three options for dedicated file transfer endpoints: admin-only (minimal), a full API, and a hybrid approach. Recommends starting with admin-only read endpoints.
Description
This PR adds multimodal input support to the ConnectOnion framework, allowing agents to accept images and files alongside text prompts over every connection type (HTTP, WebSocket, and relay). It also validates file uploads against configurable size and count limits and updates the documentation.
Type of Change
Changes Made
Core Features
- Agent.input() accepts images (a list of base64 data URLs) and files (a list of dicts with a name and base64 data)
- validate_files() in config.py enforces configurable file size and count limits
- host() takes max_file_size (MB) and max_files_per_request parameters
API Endpoints
- images and files fields in the request body
- images and files in INPUT messages
- accepted_inputs field describing supported input types and file limits
Client Library
- images and files parameters for remote agent calls
Documentation
- /info endpoint documentation with the accepted_inputs schema
- host.yaml and code
Testing
- Agent.input() tests with images, files, and combined multimodal inputs
- info_handler tests verifying the accepted_inputs field with custom config
Example Usage
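A sketch of a POST /input request carrying text, images, and files together. It assumes the text field is named "prompt" and that base_url points at a running host() server; both are assumptions, and build_input_request is a hypothetical helper that only builds the request.

```python
import json
import urllib.request


def build_input_request(base_url: str, prompt: str, images=None, files=None) -> urllib.request.Request:
    """Build (but do not send) a multimodal POST /input request (sketch)."""
    body = {"prompt": prompt}  # assumption: text goes under "prompt"
    # Only include the optional fields when there is something to send.
    if images:
        body["images"] = images
    if files:
        body["files"] = files
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/input",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen(req) would return the agent's response, or raise urllib.error.HTTPError with status 400 if the files exceed the limits reported by /info.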
Testing