Allow Prompt/Sampling Messages to contain multiple content blocks. #198

Draft · wants to merge 1 commit into base: main

Conversation

@evalstate (Contributor) commented Mar 13, 2025

Tool Call Results allow the return of an array of Text, Image and EmbeddedResource content blocks. This is consistent with messaging APIs (e.g. OpenAI, Anthropic), which allow multiple content blocks within a "User" or "Assistant" message.

The current API treats Prompt and Sampling messages as singular, i.e. they can contain only one content block. This means that client code for message handling needs to "special case" multi-part messages by recognizing consecutive same-role messages and concatenating them, which also potentially loses the semantics of the "Message" container.
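
To illustrate that special-casing, here is a minimal, hypothetical sketch (the helper name and the plain-dict message shapes are illustrative, not SDK types) of a client merging consecutive single-content messages of the same role back into one logical multi-part message:

from typing import Any

def coalesce_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Merge consecutive messages that share a role into a single message
    # whose "content" is a list of the individual content blocks.
    merged: list[dict[str, Any]] = []
    for message in messages:
        if merged and merged[-1]["role"] == message["role"]:
            merged[-1]["content"].append(message["content"])
        else:
            merged.append({"role": message["role"], "content": [message["content"]]})
    return merged

# Two single-content messages that are logically one user turn.
messages = [
    {"role": "user", "content": {"type": "text", "text": "What's in this image?"}},
    {"role": "user", "content": {"type": "image", "data": "<base64>", "mimeType": "image/png"}},
]
print(coalesce_messages(messages))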

Motivation and Context

  1. Consistency across schema: Currently CallToolResultSchema uses an array of content items, while PromptMessageSchema and SamplingMessageSchema use a single content item. This inconsistency creates implementation complexity (a before/after sketch of the message shape follows this list).

  2. Alignment with LLM provider APIs: Modern LLM APIs like OpenAI's Chat Completions and Anthropic's Messages API support multiple content blocks per message:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single "user" message carrying two content blocks: text plus an image.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }],
)
  3. Improved expressiveness: Allows for natural combinations like:
  • Text with supporting images in the same message
  • Text with embedded code snippets as separate blocks
  • Multiple resource references within a logical message unit
  4. Simplified client implementations: Eliminates the need for clients to split/join content across multiple messages to represent what is logically a single message with multiple parts.
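
To make the proposed shape concrete, the following sketch (plain Python dicts; the single-content form matches the current spec, while the array form is the shape this PR proposes) shows a user message before and after the change:

# Current schema: exactly one content block per PromptMessage/SamplingMessage.
message_current = {
    "role": "user",
    "content": {"type": "text", "text": "Please review the attached diagram."},
}

# Proposed schema: "content" becomes an array, mirroring CallToolResult.content.
message_proposed = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Please review the attached diagram."},
        {"type": "image", "data": "<base64-encoded image>", "mimeType": "image/png"},
    ],
}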

How Has This Been Tested?

Breaking Changes

This breaking change can be mitigated with a protocol version check that converts between the single content element and an array.
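
A rough sketch of such a shim, assuming a hypothetical flag for whether the negotiated protocol version supports content arrays (the actual protocol revision that would adopt the array form is not decided here):

def to_array_form(message: dict) -> dict:
    # Old -> new: wrap a single content block in a one-element array.
    content = message["content"]
    return message if isinstance(content, list) else {**message, "content": [content]}

def to_single_form(message: dict) -> list[dict]:
    # New -> old: split a multi-part message into several single-content
    # messages with the same role (the grouping semantics are lost).
    return [{"role": message["role"], "content": block} for block in message["content"]]

def convert_for_peer(message: dict, peer_supports_content_arrays: bool) -> list[dict]:
    message = to_array_form(message)
    return [message] if peer_supports_content_arrays else to_single_form(message)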

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

@PederHP (Contributor) commented Mar 14, 2025

An alternative to the breaking change could be to use a new name for the field, or to add the array of content as a new type of content. I'm not saying either is better than a breaking change, just that it's worth considering: in practice many clients/servers will likely not support multiple protocol versions, which means that non-backwards-compatible schema changes will break compatibility. Maybe that's okay, but I thought I'd mention it anyway.

@evalstate (Contributor, Author) commented

I did think about this one quite hard, but I think the mitigating factors are:

  • Relatively low take-up of the Prompts/Sampling features reduces the risk. Those using the features will likely be in a position to adapt. It would be nice to know if others have similar feedback.
  • Conversion for the general case is quite straightforward.
  • A new name/field would introduce duplication and tech debt. It might make sense as a migration path, but internally I'm now coding to the assumption that Messages have multiple content blocks.

@PederHP (Contributor) commented Mar 14, 2025

> I did think about this one quite hard, but I think the mitigating factors are:
>
>   • Relatively low take-up of the Prompts/Sampling features reduces the risk. Those using the features will likely be in a position to adapt. It would be nice to know if others have similar feedback.
>   • Conversion for the general case is quite straightforward.
>   • A new name/field would introduce duplication and tech debt. It might make sense as a migration path, but internally I'm now coding to the assumption that Messages have multiple content blocks.

I agree, but I think it makes sense to have articulated and considered the alternatives.

@evalstate (Contributor, Author) commented

Well, it's put here as a draft to provoke the conversation and get input from the maintainers. I'm happy to put the work into a solution of any type (compatibility preserving, etc.) if we agree this is something worth doing, but there will be a lot of documentation to write if we proceed with any option. Thank you.

@dsp-ant (Member) commented Mar 20, 2025

Curious what @jspahrsummers and @jerome3o-anthropic have to say, but I think this approach makes sense. It'll be a bit painful for clients to update, but I think that's probably okay. Luckily the protocol is versioned and so we can deal with different result types.

@evalstate (Contributor, Author) commented

On this one, I am planning to write a discussion thread showing examples of this and potential workarounds with sample code.

@jspahrsummers (Member) commented

Yep, no objections from me.

@cliffhall left a comment

LGTM! 👍
