RFC: Client / Server Content capabilities #223
A draft supplement is supplied below, intended for inclusion in the documentation once the right place is identified, if we progress with this PR.

ClientCapabilities contentTypes

ToolAnnotation generatesHint

Audience Annotations

For Resources annotated with
The structured output piece strikes me as the biggest issue here. Are you imagining that structured output always flows through an embedded resource? If not, the TextContent type is notably missing a MIME type property, so as near as I can tell there's no mechanism to actually communicate a MIME type like application/json to a client. If you're imagining that this always goes through an embedded resource unless the return is explicitly text/plain, with only text/plain mapping to TextContent, then, given that many MCP servers use the text type to return JSON, this proposal would double the number of calls for a JSON-heavy MCP server that wanted to adhere exactly to the spec.

I'd also think it would be useful to have prescriptive documentation around this. I know this is a philosophical discussion, but is a PDF an image content type with a MIME type of application/pdf, or, since it's a flat file format, does it need to be an embedded resource? In the negotiation scenario, are you saying the tool call will return either a text type for text/plain (again, as in the example you gave), an image type for application/pdf, or an embedded resource, say if it's returning something structured?

At a high level, I like this idea as a negotiation, but there might need to be some supporting changes to handle the structured output piece efficiently. The tools documentation would need to be updated with at least minimal guidance on how servers should treat these content type requests, AND there should be documentation for clients with recommendations on what is expected of the client/host if it sends in renders MIME types.
The question on 371 was whether to use a
A PDF is a binary object that would be delivered as a "BlobResourceContents" with a MIME type of
Questions on structured output specifically should be raised on #371. These are optional capabilities that Host and MCP Server implementors can take advantage of to build enhanced applications.
Per #180 (comment), I believe
The suggestion was to use an EmbeddedResource of TextResourceContents type, which contains a mimeType, a uri and

Edited to say that a mimeType on TextContent is potentially a good addition; my preference would be to fix it as
Ok, then this is basically dependent on #371 going through? That should be called out in the PR.
This doesn't actually address the concern around documentation (though it nicely explains the thinking; but again, this is an intrinsic dependency on #371). This PR has no documentation updates stating that a server SHOULD take these actions based on the MIME types the client supports. It also gives no guidance on the order of precedence. Speaking as someone working on the server side, without documentation it would be unclear how servers should respond to different client MIME type capabilities, and implementations would be spotty. Should render be preferred over tokenize, or vice versa, or should you return two content types if the renders and tokenizes sets differ? This circles back to your comments on the search PR suggesting prescriptive guidance for client developers on how to handle the search capability; that same sort of guidance would help on the server side with potentially contradictory client MIME types.
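To make the precedence question concrete, here is one possible server-side policy, sketched in TypeScript. It is hypothetical: neither this PR nor the spec defines it, and the choose function, the Representation type and the text/plain fallback are all assumptions for illustration. It picks separately for the model (tokenizes) and the User (renders), then returns one or two blocks scoped with the existing audience role values.

```typescript
type Role = "user" | "assistant";

// Hypothetical: one way a server might hold the same output in several formats.
interface Representation {
  mimeType: string;
  data: string; // text, or base64 for binary formats
}

interface ContentTypesHint {
  renders?: string[];
  tokenizes?: string[];
}

interface Chosen {
  representation: Representation;
  audience: Role[];
}

// Hypothetical policy, not defined by the PR or the spec:
// 1. With no hints, fall back to text/plain for everyone.
// 2. Otherwise pick, in the server's own preference order, the first
//    representation the model can tokenize and the first the User can render.
// 3. Same pick: one block for both audiences. Different: two blocks.
function choose(reps: Representation[], hint?: ContentTypesHint): Chosen[] {
  const plain = reps.find((r) => r.mimeType === "text/plain");
  const both: Role[] = ["user", "assistant"];
  if (!hint || (!hint.renders && !hint.tokenizes)) {
    return plain ? [{ representation: plain, audience: both }] : [];
  }
  const pick = (accepted?: string[]) =>
    reps.find((r) => accepted?.includes(r.mimeType) ?? false) ?? plain;
  const forModel = pick(hint.tokenizes);
  const forUser = pick(hint.renders);
  if (forModel && forModel === forUser) {
    return [{ representation: forModel, audience: both }];
  }
  const out: Chosen[] = [];
  if (forUser) out.push({ representation: forUser, audience: ["user"] });
  if (forModel) out.push({ representation: forModel, audience: ["assistant"] });
  return out;
}

// Example data; the payloads are placeholders.
const reps: Representation[] = [
  { mimeType: "application/pdf", data: "JVBERi0..." },
  { mimeType: "text/plain", data: "Quarterly report: ..." },
];
```

With renders covering application/pdf and tokenizes covering only text/plain, this policy sends the PDF to the User and the plain-text version to the model; other precedence rules are equally defensible, which is why documented guidance matters.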
This proposal has no dependency on #371 and predates it by 4 weeks.
Ah, gotcha. Then I guess the direct feedback is that this PR should pick up the fields required to universally communicate the MIME type back on responses. Again, coming from the server side, I'm not sure how I'd support application/json or PDF (without looking at #371, which we're saying we shouldn't have to look at since it's not a dependency).
The content types are already within the protocol, and well documented here: https://modelcontextprotocol.io/specification/2025-03-26/server/resources The terminology for MAY, SHOULD and so on is defined here: https://modelcontextprotocol.io/specification/2025-03-26
To put the request for documentation in context, here's the guidance offered by the HTTP spec on content types: it runs to multiple pages and includes recommendations on defaults, recommendations on sniffing, how to handle unknown responses from both the server and client side, and more. It's also worth mentioning that the HTTP server use case is actually quite a bit simpler, insofar as an HTTP server can only return a single result to a request and only gets a single Accept header from the requestor. MCP servers, by contrast, can potentially return multiple responses, face a much more complex matrix of considerations for what to return, and with this PR are actually presented with two distinct accept lists. This PR creates a similar mechanism within MCP, but without any documented guidance on what should be returned by default in the absence of accepted content types, which accept list should take precedence, whether it's acceptable or preferable to return multiple content blocks if the client accepts multiple MIME types, etc.

Regarding the comment on content types being well documented, this is from your PR: contentTypes. I think this is what confused me, because as written here, contentTypes only partially applies to CallToolResults (since a MIME type is notably absent from the text response). So, maybe a slight rewording, OR pull an optional MIME type onto TextContent as part of this PR?
Actually, this all raises an interesting question: right now, MIME types are primarily on resources (plus image content and audio content). Would a server ever change the resources it presents to the client based on the accept lists?
CallToolResult returns an array of content.
Yup, and my point is that one of the array element options doesn't have a MIME type.
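The point about the array element options can be seen by asking each content variant for its MIME type. The sketch below uses trimmed-down versions of the 2025-03-26 content types (annotations omitted); the mimeTypeOf helper and the example:// URI are hypothetical. It also shows the workaround mentioned earlier in the thread: labelling JSON by wrapping it in an embedded resource whose TextResourceContents carries a mimeType.

```typescript
// Trimmed-down content types from the 2025-03-26 schema (annotations omitted).
interface TextContent { type: "text"; text: string }
interface ImageContent { type: "image"; data: string; mimeType: string }
interface AudioContent { type: "audio"; data: string; mimeType: string }
interface TextResourceContents { uri: string; mimeType?: string; text: string }
interface BlobResourceContents { uri: string; mimeType?: string; blob: string }
interface EmbeddedResource {
  type: "resource";
  resource: TextResourceContents | BlobResourceContents;
}
type Content = TextContent | ImageContent | AudioContent | EmbeddedResource;

// Hypothetical helper: the MIME type a client can read off a content block.
function mimeTypeOf(c: Content): string | undefined {
  switch (c.type) {
    case "text":
      return undefined; // TextContent has no mimeType field
    case "image":
    case "audio":
      return c.mimeType;
    case "resource":
      return c.resource.mimeType; // optional on resource contents
  }
}

// JSON labelled via an embedded resource (the URI is a made-up example).
const asResource: Content = {
  type: "resource",
  resource: {
    uri: "example://results/42",
    mimeType: "application/json",
    text: JSON.stringify({ ok: true }),
  },
};

// The same JSON as plain TextContent: the client cannot tell it is JSON.
const asText: Content = { type: "text", text: JSON.stringify({ ok: true }) };
```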
From the introduction to this PR:
"adapt their content delivery" means Servers adjusting the outputs of Prompts, Resources or Tools based on the content type hints from the Client. This is a good point to clarify for this discussion, thank you.
I've updated the comment in #371 to include an example
This PR is not proposing "accept lists", but optional content type hints. Since this PR is adding optional hints to the existing protocol, it may be more appropriate to start a separate discussion in the forums on that topic and whether MCP should contain that guidance. As other points of discussion for this PR, I'd like to also get feedback on:
VS Code and many other clients allow users to change models on the fly, even within the same chat session. With this proposal, if that happens, a client would need to stop and restart its MCP connection to announce a different set of content types that it's able to tokenize. Since some servers can be stateful (e.g. playwright/puppeteer), this can't be done safely. I think we would need some way to announce a new (sub)set of capabilities to servers.
I understand. I might think about this the other way, though: if the Client can match content consumption/generation (e.g.
In the current state of this PR, yes, we might want to warn the user about the risk. But if there were a way to signal a change in capabilities, then it would 'just work' (given a well-implemented MCP server) and we wouldn't have to warn the user about anything 🙂
There are lots of scenarios, and I think there's another idea going around about exposing direct model information to the MCP Server. I guess we need to figure out the right level of abstraction for the Protocol. We already have mid-lifecycle capability changes from Server to Client (e.g. ToolListChangedNotification), so it doesn't seem out of the question.
Instead of specifying content types as a capability during the initialization phase, what about specifying them as a |
|
Follow-up thought: perhaps we should add |
I can't see the harm in Clients using the _meta field for that; the point is to advertise to the MCP Server what can be handled (not a guarantee that if it's sent it will be handled; it's a hint). Ultimately it's the Host app's choice whether to allow the User to select a model or to optimize model selection itself. We have to be careful: at some point the abstractions between Client and Server get so leaky that MCP becomes an inhibitor rather than an interop enabler!
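A minimal sketch of the _meta idea, assuming a hypothetical "contentTypes" key under params._meta (nothing in the spec reserves that name). A per-request hint, when well-formed, overrides what the client declared at initialization; otherwise the session-level value is used. Either way the result is still only a hint.

```typescript
interface ContentTypesHint {
  renders?: string[];
  tokenizes?: string[];
}

// Shape of tools/call params, reduced to what this sketch needs.
interface CallToolParams {
  name: string;
  arguments?: { [key: string]: unknown };
  _meta?: { [key: string]: unknown };
}

// Hypothetical key name; nothing in the spec reserves it.
const META_KEY = "contentTypes";

function isStringList(x: unknown): boolean {
  return x === undefined || (Array.isArray(x) && x.every((s) => typeof s === "string"));
}

// Validate untrusted input before treating it as a hint.
function isHint(v: unknown): v is ContentTypesHint {
  if (typeof v !== "object" || v === null) return false;
  const o = v as { renders?: unknown; tokenizes?: unknown };
  return isStringList(o.renders) && isStringList(o.tokenizes);
}

// Per-request hint wins if well-formed; otherwise fall back to the
// capabilities the client sent during initialization.
function effectiveHint(
  params: CallToolParams,
  session?: ContentTypesHint,
): ContentTypesHint | undefined {
  const perRequest = params._meta?.[META_KEY];
  return isHint(perRequest) ? perRequest : session;
}
```

Falling back silently on a malformed hint keeps the "hint, not guarantee" semantics: a bad _meta value never causes a request to fail.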
This issue is kind of related. I wonder if this could be combined or also supported in some way: #604 |
I think there is a bigger issue here, which is "content negotiation". @connor4312's point about requirements changing between tool calls is a good example. I'm deferring this for now, but I have a strong suspicion that we want something different, more akin to HTTP Accept headers on each request itself.
Here's an idea of how dynamic capabilities could be represented:

```diff
diff --git a/schema/draft/schema.ts b/schema/draft/schema.ts
index c688dc3..f9d66b5 100644
--- a/schema/draft/schema.ts
+++ b/schema/draft/schema.ts
@@ -200,9 +200,10 @@ export interface InitializedNotification extends Notification {
 }
 /**
- * Capabilities a client may support. Known capabilities are defined here, in this schema, but this is not a closed set: any client can define its own, additional capabilities.
+ * Part of {@link ClientCapabilities} which are sent during initialization and
+ * cannot be changed during the course of a session.
  */
-export interface ClientCapabilities {
+export interface StaticClientCapabilities {
   /**
    * Experimental, non-standard capabilities that the client supports.
    */
@@ -224,6 +225,13 @@ export interface ClientCapabilities {
    * Present if the client supports elicitation from the server.
    */
   elicitation?: object;
+}
+
+/**
+ * Part of {@link ClientCapabilities} which can be dynamically changed during
+ * the course of a session.
+ */
+export interface DynamicClientCapabilities {
   /**
    * Present if the client advertises content types it can handle.
    */
@@ -239,6 +247,12 @@ export interface ClientCapabilities {
   };
 }
+
+/**
+ * Capabilities a client may support. Known capabilities are defined here, in this schema, but this is not a closed set: any client can define its own, additional capabilities.
+ */
+export interface ClientCapabilities extends DynamicClientCapabilities, StaticClientCapabilities {}
+
 /**
  * Capabilities that a server may support. Known capabilities are defined here, in this schema, but this is not a closed set: any server can define its own, additional capabilities.
  */
@@ -1333,6 +1347,22 @@ export interface ElicitResult extends Result {
   content?: { [key: string]: unknown };
 }
+/**
+ * A notification from the client to the server, informing it that its capabilities
+ * have changed. This is typically used when the client has updated its underlying
+ * model or configuration.
+ */
+export interface ClientCapabilitiesChangedNotification extends Notification {
+  method: "notifications/client_capabilities/changed";
+  params: {
+    /**
+     * The new client capabilities that the client supports.
+     */
+    capabilities: DynamicClientCapabilities;
+  };
+}
+

 /* Client messages */
 export type ClientRequest =
   | PingRequest
@@ -1353,7 +1383,8 @@ export type ClientNotification =
   | CancelledNotification
   | ProgressNotification
   | InitializedNotification
-  | RootsListChangedNotification;
+  | RootsListChangedNotification
+  | ClientCapabilitiesChangedNotification;
 export type ClientResult = EmptyResult | CreateMessageResult | ListRootsResult | ElicitResult;
```

As MCP, and my understanding of it as a client implementor, have grown, I no longer think per-request headers are ideal, mainly because of sampling: sampling requests and responses will represent different content types and can be emitted asynchronously, outside the lifecycle of any particular client request. So I think a push mechanism for the client to announce changed capabilities is preferable.
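On the server side, handling the proposed notification could look roughly like this. The method name and the static/dynamic split come from the diff above; the SessionCapabilities class, the trimmed field lists, and the assumption that each notification replaces the previous dynamic set wholesale are illustrative only.

```typescript
interface ContentTypes {
  renders?: string[];
  tokenizes?: string[];
}

// Mirrors the split in the proposed diff, trimmed to a few fields.
interface StaticClientCapabilities {
  roots?: { listChanged?: boolean };
  sampling?: object;
  elicitation?: object;
}
interface DynamicClientCapabilities {
  contentTypes?: ContentTypes;
}
type ClientCapabilities = StaticClientCapabilities & DynamicClientCapabilities;

interface IncomingNotification {
  method: string;
  params?: { [key: string]: unknown };
}

const CHANGED = "notifications/client_capabilities/changed";

// Hypothetical per-session store on the server.
class SessionCapabilities {
  private current: ClientCapabilities;

  constructor(initial: ClientCapabilities) {
    this.current = { ...initial };
  }

  get(): ClientCapabilities {
    return this.current;
  }

  // Returns true if the notification was consumed.
  handle(n: IncomingNotification): boolean {
    if (n.method !== CHANGED || !n.params) return false;
    const next = (n.params.capabilities ?? {}) as DynamicClientCapabilities;
    // Assumption: the notification carries the complete new dynamic set,
    // so it replaces the old one; static fields from initialize are kept.
    this.current = { ...this.current, contentTypes: next.contentTypes };
    return true;
  }
}
```

Keeping static fields untouched matches the diff's intent that they "cannot be changed during the course of a session".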
Tagging @kentcdodds and referring to #679 |
I agree with @connor4312 here, and (as a server implementer) I think it would be useful for this to go both ways (so the server can announce changed capabilities as well as the client). In general, what I mean by #679 is that both clients and servers should communicate both what they can offer and what they can accept. Before now I hadn't considered that these capabilities could change over time, and I'm not sure I completely understand the use case there, but I do think that the client and server should both be able to communicate their full capabilities.
VS Code and most other clients let you change the model you're using during a chat session, or even change it automatically depending on the query. Different models natively understand different sets of MIME types, and we don't want to have to restart MCP servers to announce updated capabilities when that happens.
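On the client side, a Host could recompute its tokenizes list whenever the model changes and only notify servers when the set actually differs. This TypeScript sketch assumes the notification method from the proposed schema diff in this thread; the MODEL_INPUTS table, the model IDs and the ContentTypeAnnouncer class are hypothetical placeholders.

```typescript
interface ContentTypes {
  renders: string[];
  tokenizes: string[];
}

type Send = (method: string, params: object) => void;

// Hypothetical table; a real Host would derive this from provider metadata.
const MODEL_INPUTS: { [modelId: string]: string[] } = {
  "text-only-model": ["text/plain"],
  "multimodal-model": ["text/plain", "image/png", "image/jpeg", "application/pdf"],
};

// Order-insensitive comparison of two MIME type lists.
function sameSet(a: string[], b: string[]): boolean {
  const sa = new Set(a);
  const sb = new Set(b);
  return sa.size === sb.size && Array.from(sa).every((x) => sb.has(x));
}

class ContentTypeAnnouncer {
  private last: ContentTypes;
  private readonly send: Send;

  // `initial` is whatever was already declared during initialization.
  constructor(initial: ContentTypes, send: Send) {
    this.last = initial;
    this.send = send;
  }

  onModelChanged(modelId: string): void {
    const next: ContentTypes = {
      renders: this.last.renders, // rendering depends on the Host UI, not the model
      tokenizes: MODEL_INPUTS[modelId] ?? ["text/plain"], // unknown model: assume text only
    };
    if (sameSet(next.tokenizes, this.last.tokenizes)) return; // nothing to announce
    this.last = next;
    this.send("notifications/client_capabilities/changed", {
      capabilities: { contentTypes: next },
    });
  }
}
```

Suppressing no-op announcements matters when the Host switches models automatically per query, since servers would otherwise see a notification on almost every turn.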
That makes complete sense. I don't want to take things too far off the rails here, but I think the discussion over on this issue has a bearing on what could happen in this PR as well: #679 (comment)
This PR adds a new contentTypes capability to the ClientCapabilities and a generatesHint Tool Annotation, allowing clients to advertise which MIME types they can render to Users and tokenize for LLM consumption. It also allows Tools to advertise the content types they may generate in a CallToolResult.

This enhancement works with the existing annotations system to optionally enable MCP Servers to adapt their content delivery to best match Host capabilities.
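As a rough illustration of the proposed shapes, here is a TypeScript sketch. The renders and tokenizes field names are taken from the review discussion; everything else (the ContentTypesCapability name, the overlap helper, the example MIME lists and tool) is a hypothetical assumption, not the PR's schema text. It shows how a server might intersect a tool's generatesHint with what the Host says it can render or tokenize.

```typescript
// Hypothetical shapes; only renders/tokenizes and generatesHint come
// from the discussion, the rest is illustrative.
interface ContentTypesCapability {
  renders?: string[]; // MIME types the Host can render to the User
  tokenizes?: string[]; // MIME types the Host's LLM can consume
}

interface ClientCapabilities {
  contentTypes?: ContentTypesCapability;
  // roots, sampling, elicitation, ... omitted
}

interface ToolAnnotations {
  title?: string;
  generatesHint?: string[]; // MIME types the tool may emit
}

// Case-insensitive intersection, keeping the order of `offered`.
function overlap(offered: string[] = [], accepted: string[] = []): string[] {
  const acceptedSet = new Set(accepted.map((t) => t.toLowerCase()));
  return offered.filter((t) => acceptedSet.has(t.toLowerCase()));
}

const clientCaps: ClientCapabilities = {
  contentTypes: {
    renders: ["text/plain", "text/markdown", "image/png", "application/pdf"],
    tokenizes: ["text/plain", "image/png"],
  },
};

const reportTool: ToolAnnotations = {
  title: "Generate report",
  generatesHint: ["application/pdf", "text/plain"],
};

// forUser: application/pdf, text/plain; forModel: text/plain
const forUser = overlap(reportTool.generatesHint, clientCaps.contentTypes?.renders);
const forModel = overlap(reportTool.generatesHint, clientCaps.contentTypes?.tokenizes);
```

Because both sides only publish hints, the intersection tells a server what is likely to be useful, not what is guaranteed to be handled.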
Motivation and Context
Different Host application/LLM pairs have different content handling requirements and capabilities (e.g. Chat Applications, IDEs, Video/Content Editing Suite, Agentic Applications).
This addition allows MCP Servers to make informed decisions to:
Update 2025-04-19:
The addition also enhances interoperability for implementors of the A2A protocol, which defines input and output modes for Agents. See AgentCard here and AgentSkill here.
How Has This Been Tested?
The extension has not been directly tested; however, some example scenarios are:
application/pdf or downgrade to text/plain based on LLM capabilities.

Breaking Changes
The change is backwards compatible.
Types of changes
Checklist
Additional context
This is not intended to be a complicated content-type negotiation protocol, but to provide a simple way for participating Hosts and Servers to provide better User Experiences across a range of deployment scenarios. The list of MIME types is intended to be indicative and neither restrictive nor exhaustive.
An agreed convention for Resources where audience: [User], priority: 1 is used to indicate content that should be rendered and not tokenized would further enhance the proposal. For example, a PDF could be sent for rendering, with the main content sent as text/plain for the LLM. I do not think a reciprocal server capability is necessary, as "Roots" provide the ability for the Host to provide arbitrary content to the server.

Update 2025-04-19

After consideration, a Server "generates" capability is appropriate. By convention, Servers that support "Structured Outputs" would advertise "application/json" in their generates list.

Update 2025-04-24
Migrated the Server "generates" capability to a generatesHint in ToolAnnotation. By convention, Servers that support "Structured Outputs" would advertise the content type (e.g. application/json or application/xml). This would be compatible with the potential addition of a Schema related to this tool.

This PR has been opened for discussion and refinement, with additional documentation to be prepared if there is agreement in principle.
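To illustrate the audience convention proposed under Additional context (audience: [User], priority: 1 meaning "render, don't tokenize"), here is a sketch of how a Host might split returned blocks. The role values follow the existing annotations (audience entries are "user" or "assistant", priority runs from 0 to 1); the Block type and the isRenderOnly and splitForHost functions are hypothetical.

```typescript
type Role = "user" | "assistant";

interface Annotations {
  audience?: Role[];
  priority?: number; // 0 (least) to 1 (most important)
}

// Hypothetical, simplified view of a returned content block.
interface Block {
  type: string;
  mimeType?: string;
  annotations?: Annotations;
}

// Proposed convention (not in the spec): audience exactly ["user"]
// with priority 1 means "render for the User, don't tokenize".
function isRenderOnly(b: Block): boolean {
  const audience = b.annotations?.audience ?? [];
  return b.annotations?.priority === 1 && audience.length === 1 && audience[0] === "user";
}

function splitForHost(blocks: Block[]): { renderOnly: Block[]; forModel: Block[] } {
  const renderOnly: Block[] = [];
  const forModel: Block[] = [];
  for (const b of blocks) {
    (isRenderOnly(b) ? renderOnly : forModel).push(b);
  }
  return { renderOnly, forModel };
}
```

Everything that is not render-only still goes to the model as usual; whether the Host also displays those blocks remains the Host's decision.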