
[proposal] Add Augmentation capability for context enrichment #142

Open — wants to merge 7 commits into main

Conversation

PederHP
Contributor

@PederHP PederHP commented Jan 15, 2025

Motivation and Context

This PR introduces an Augmentation capability to MCP, addressing a fundamental need in AI applications: Application-controlled context transformation. While MCP currently supports user-controlled context through Prompts, model-controlled actions through Tools, and static/semi-static data through Resources, there's no standardized way for applications to dynamically modify context based on their own logic and timing.

Key benefits of adding Augmentation as a distinct capability:

  1. Application Control: Enables immediate, efficient context modification when and how the application determines it's needed, without requiring model decisions or tool invocations.

  2. Standardization: Provides a protocol-level solution for common patterns like:

    • Real-time UI action tracking for context-aware assistance
    • Dynamic environment data injection (time, location, system state)
    • Content transformation and enrichment
    • Retrieval augmented generation
    • Pre/post processing of context
  3. Protocol Evolution: Rather than adding multiple feature-specific capabilities, this provides a flexible, generic mechanism for Application-controlled context transformation, keeping the protocol focused on fundamental capabilities.

The capability is designed to be:

  • Generic enough to support diverse use cases
  • Efficient (avoiding unnecessary model invocations)
  • Extensible through schemas and metadata
  • Easy to implement and understand

This approach allows applications to leverage standardized context modification patterns while maintaining flexibility in how they implement specific features.

The hint system provides powerful flexibility in how augmented context is applied:

  • Replace: Indicates the augmented content should replace the original (useful for PII masking, content moderation)
  • Append/Prepend: For traditional RAG systems adding retrieved context
  • Safe/Unsafe: For moderation systems indicating content status
  • Custom: Servers can define additional hints for specialized use cases

This approach avoids the need for complex delta protocols while maintaining flexibility. Each server can implement its own hint semantics, and clients can choose how to handle different hint types.
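As an illustrative sketch of how a client might interpret these hints, consider the helper below. The type names and the `applyHint` function are hypothetical, not part of the proposed schema; in particular, the fall-through behavior for unknown hints is an assumption.

```typescript
// Hypothetical client-side helper illustrating the proposed hint semantics.
// Type names and the "unknown hints keep the original" behavior are
// assumptions for this sketch, not part of the proposed schema.

type AugmentedContent = { content: { type: string; text: string } };

interface AugmentResult {
  contents: AugmentedContent[];
  hint?: string; // "replace" | "append" | "prepend" | server-defined
}

function applyHint(original: string, result: AugmentResult): string {
  const augmented = result.contents.map((c) => c.content.text).join("\n");
  switch (result.hint) {
    case "replace": // e.g. PII masking, moderation rewrites
      return augmented;
    case "prepend": // e.g. RAG context placed ahead of the user message
      return `${augmented}\n${original}`;
    case "append":
      return `${original}\n${augmented}`;
    default:
      // Custom or unknown hints are left to application-specific handling;
      // here we conservatively keep the original context.
      return original;
  }
}
```

The point of the sketch is that the client, not the server, decides what each hint means for its context assembly; a moderation-focused client could treat an unknown hint as a rejection instead.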

How Has This Been Tested?

It has not, as this is a starting point for discussion.

Breaking Changes

None.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Note: While document upload capabilities (e.g., for PDF processing) are a common requirement in RAG systems, this is a broader protocol consideration that affects multiple capabilities and would need to be addressed separately.

This PR is submitted in response to the discussion at https://github.com/orgs/modelcontextprotocol/discussions/138 . While schema changes are complete, proposed documentation will be added if there is agreement on the approach and implementation details.

Here are a number of examples showing the flexibility of this capability. It can basically be used for any kind of application-controlled content processing, filtering, transformation, etc.

Example 1: RAG

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "retrieve",
    "context": {
      "type": "text",
      "text": "What are the key features of our new electric vehicle model?"
    },
    "arguments": {
      "maxResults": 3,
      "minRelevance": 0.7
    }
  }
}

And the corresponding response could be:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "The 2025 Model A features a 400-mile range on a single charge and supports ultra-fast charging..."
        },
        "properties": {
          "relevance": 0.92
        }
      },
      {
        "content": {
          "type": "text",
          "text": "Safety features include advanced driver assistance with 360-degree sensor coverage..."
        },
        "properties": {
          "relevance": 0.85
        }
      },
      {
        "content": {
          "type": "text",
          "text": "The dual motor configuration delivers 0-60mph acceleration in 3.8 seconds..."
        },
        "properties": {
          "relevance": 0.78
        }
      }
    ],
    "hint": "prepend"
  }
}
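On the client side, issuing the request above could look like the following sketch. The method name and parameter shape follow this proposal; the request-building helper itself is hypothetical (real MCP SDKs would provide their own request plumbing).

```typescript
// Hypothetical helper that builds an augmentation/augment JSON-RPC request.
// The method name and params shape follow this proposal; everything else
// (the helper itself, transport handling) is assumed for illustration.

interface AugmentParams {
  name: string;
  context: { type: "text"; text: string };
  arguments?: Record<string, unknown>;
}

function buildAugmentRequest(id: number, params: AugmentParams) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "augmentation/augment",
    params,
  };
}

// Reproduces the RAG request from Example 1:
const request = buildAugmentRequest(1, {
  name: "retrieve",
  context: {
    type: "text",
    text: "What are the key features of our new electric vehicle model?",
  },
  arguments: { maxResults: 3, minRelevance: 0.7 },
});
```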

Example 2: Moderation

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "moderate",
    "context": {
      "type": "audio",
      "data": "base64ABC...snip...",
      "mimeType": "audio/wav"
    },
    "arguments": {
      "categories": ["hate", "violence", "explicit"],
      "thresholds": {
        "hate": 0.7,
        "violence": 0.8,
        "explicit": 0.6
      }
    }
  }
}

Example response with clean content:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": { 
          "type": "text",
          "text": ""
        },
        "properties": {
          "status": "ok",
          "scores": {
            "hate": 0.1,
            "violence": 0.05,
            "explicit": 0.15
          }
        }
      }
    ]
  }
}

Example response with problematic content:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "I apologize, but I cannot process this audio as it appears to contain content that violates our content policies regarding explicit language."
        },
        "properties": {
          "status": "rejected",
          "scores": {
            "hate": 0.2,
            "violence": 0.3,
            "explicit": 0.85
          }
        }
      }
    ]
  }
}

Example 3: Tool Relevance Filtering

An example request that reduces the number of tools sent to an LLM by analyzing the request before sending it. The server could implement this via embeddings or a fast, cheap LLM.

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "augmentation/augment",
  "params": {
    "name": "toolRelevance",
    "context": {
      "type": "text",
      "text": "Can you help me figure out why the performance of my investment portfolio has suddenly changed?"
    },
    "arguments": {
      "tools": [
        {
          "name": "search_news",
          "description": "Search recent news articles",
          "inputSchema": {
            "type": "object",
            "properties": {
              "query": { "type": "string" },
              "days": { "type": "number" }
            }
          }
        },
        {
          "name": "get_stock_data",
          "description": "Fetch stock market performance data",
          "inputSchema": {
            "type": "object",
            "properties": {
              "symbol": { "type": "string" },
              "period": { "type": "string" }
            }
          }
        },
        {
          "name": "weather_forecast",
          "description": "Get weather forecast for a location",
          "inputSchema": {
            "type": "object",
            "properties": {
              "location": { "type": "string" },
              "days": { "type": "number" }
            }
          }
        }
      ]
    }
  }
}

Response showing tool relevance scoring:

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "contents": [
      {
        "content": {
          "type": "text",
          "text": "get_stock_data"
        },
        "properties": {
          "relevance": 0.95
        }
      },
      {
        "content": {
          "type": "text",
          "text": "search_news"
        },
        "properties": {
          "relevance": 0.82
        }
      },
      {
        "content": {
          "type": "text",
          "text": "weather_forecast"
        },
        "properties": {
          "relevance": 0.12
        }
      }
    ],
    "hint": "filter"
  }
}
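One way a client could act on the `filter` hint is to drop tools below a relevance threshold before building the LLM request. The sketch below is illustrative only; the 0.5 cutoff and the decision to filter client-side are assumptions, not part of the proposal.

```typescript
// Hypothetical client-side use of the "filter" hint: keep only the tools
// the augmentation scored above a threshold. The 0.5 default is an assumption.

interface ScoredContent {
  content: { type: string; text: string }; // text holds the tool name
  properties: { relevance: number };
}

function filterTools<T extends { name: string }>(
  tools: T[],
  scored: ScoredContent[],
  threshold = 0.5,
): T[] {
  const keep = new Set(
    scored
      .filter((s) => s.properties.relevance >= threshold)
      .map((s) => s.content.text),
  );
  return tools.filter((t) => keep.has(t.name));
}
```

With the scores from the example response, `weather_forecast` (0.12) would be dropped and only `get_stock_data` and `search_news` would be sent to the model.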

Example 4: User App Activity Injection

(TODO)

Example 5: Vehicle Assistance App

(TODO)

@PederHP PederHP changed the title Add Augmentation capability for context enrichment [proposal] Add Augmentation capability for context enrichment Jan 15, 2025
Contributor

@gsabran gsabran left a comment


This looks promising! Is paginating the augmentation response a concern? I don't have an opinion but it seemed worth asking.

@PederHP
Contributor Author

PederHP commented Jan 16, 2025

This looks promising! Is paginating the augmentation response a concern? I don't have an opinion but it seemed worth asking.

It should match Sampling I think. I can't recall if those can be paginated. My intuition is that augmentations which need pagination can be batched instead.

I guess an augmentation which added a ton of context would benefit from pagination, though I can't really think of what that would be in practice. I suppose message history could conceivably be a prepend augmentation that grows rather large. But on the other hand, it will be sent as context to an LLM in a single request, which speaks in favor of not paginating.

Comment on lines 777 to 780
/**
* Optional additional properties.
*/
[key: string]: unknown;
Contributor

I don't think it's good for specs to have those catch-all properties. The point of the spec is to specify the interaction model. I know this exists already in MCP, but mostly in places where the schema is specified at runtime. For instance, this freeform type exists in CallToolRequest, but it should match what the server describes in list tools.

Contributor

If that sounds like a good change, we can remove the properties wrapper and directly inline the few that are specified

Contributor Author

I actually think it's important for this capability to have it, for the same reason that Tools has it. And there is a list method for augmentations. I think it is important to support a wide variety of Application-controlled context modifications/injections. Even within RAG there are multiple variants. One might want to provide maxTokens, another might have an argument for whether to include keyword search, there could be reranker parameters, etc.

I realize that others might prefer a more strictly RAG aligned capability, and for that it would make sense to avoid complicating the schema. But I think there is a strong case for generalizing non-Model controlled context modification. Otherwise it might end up with a succession of capabilities being added for each new kind of context injection. And personally, I think the Tools paradigm is awesome, and something to try and emulate on an Application level. Because a capability which is that flexible would encompass so many things.

Yes, it does mean that a RAG MCP server needs to provide a schema and the Application needs to have logic in place that matches it. But wouldn't that always be the case for any kind of Application-controlled context modification? There are already so many different RAG variants. I think having a protocol that is as flexible as possible increases the chances it will be useful for any kind of RAG. Essentially a (context, properties) to (context, metadata) function. That fits so many things.

/**
* Relevance of the content, as often used in RAG systems.
*/
relevance?: number;
Contributor

Other optional property worth considering would be uri to allow to link to the augmented content, as is a common pattern, eg:

(Screenshots: examples of RAG results linking back to their source content)

Contributor Author

For some RAG systems yes, this can make good sense. Not opposed to adding it, but I think it also plays into the whole generic vs RAG-specific discussion. If this is a more generic capability, I think there might be a point where too many convenience properties are added, because they could already be added through the property dictionary, and thus suit the individual RAG system.

Not opposed to this, just want to see what others think about the whole generic vs more RAG-specific before actually adding it to the PR. This is a starting point for discussion still, more than a very focused spec change request. Still have to convince the maintainers that additional Application-controlled capabilities are needed, for one thing, and comments elsewhere indicate that at least some of them aren't entirely onboard with that yet.

@PederHP PederHP force-pushed the augmentation branch 2 times, most recently from bc143aa to 01f091b on January 23, 2025 at 19:03
@tadasant
Contributor

Thanks @PederHP for putting this together and sharing all these examples. Very helpful in trying to think through this.

I see the need for AI-powered apps to leverage the use cases you have described and I have a few questions to try to help me unravel whether I think adding a new primitive like this is the right solution for them.

The primary smell that worries me is this table that @jspahrsummers pointed out:

(Screenshot: the table of MCP primitives and their control levels)

If we add Augmentations to this table, it seems like it may be duplicative of the Resources row.

Taking RAG alone as a use case, what's the downside of using Resource Templates? The idea that there are relevant documents that I want to pull into my context window seems to fit well enough with the concept of "Resources". @gsabran pointed out a way to use Resource Templates in his original question. You've definitely convinced me that using Tools for RAG is wrong, but what's the counterargument to using a Resource Template? Might there be a path to supporting RAG best by extending the flow of Resource results to include optional annotations like relevance?

Is it possible that RAG is a separate beast from your other examples, like moderation and tool filtering? Fundamentally, RAG vs. those two use cases feels different to me. RAG is just a more refined way of pulling external Resources into your context. Moderation and tool filtering feels like the inverse: taking some pre-existing data and performing analysis or a mutation on it.

Even if they are different, I could still see us wanting Augmentations in addition to providing guidance on using Resource Templates for RAG. It does seem like the idea of application-controlled functions that mutate or analyze input data could be useful, and maybe then the Description column on the table I mentioned at the top is something like "mutations performed on data passed and managed by the client application" to dodge redundancy with the Resources row.

@PederHP
Contributor Author

PederHP commented Feb 16, 2025

If we add Augmentations to this table, it seems like it may be duplicative of the Resources row.

Thank you so much for the feedback! I am glad to hear thoughts on the concept of application-level context injection, because it's a topic I find very interesting, and there hasn't been much activity on this proposal so far.

While the table might suggest a one-to-one mapping between control levels and primitives, I don't believe this was the intent or should be a constraint. Different primitives can exist at the same control level while serving distinct purposes. The key is that each primitive represents a well-defined, elegant abstraction with clear boundaries and responsibilities.

Taking RAG alone as a use case, what's the downside of using Resource Templates? The idea that there are relevant documents that I want to pull into my context window seems to fit well enough with the concept of "Resources". @gsabran pointed out a way to use Resource Templates in his original question. You've definitely convinced me that using Tools for RAG is wrong, but what's the counterargument to using a Resource Template? Might there be a path to supporting RAG best by extending the flow of Resource results to include optional annotations like relevance?

Is it possible that RAG is a separate beast from your other examples, like moderation and tool filtering? Fundamentally, RAG vs. those two use cases feels different to me. RAG is just a more refined way of pulling external Resources into your context. Moderation and tool filtering feels like the inverse: taking some pre-existing data and performing analysis or a mutation on it.

Even if they are different, I could still see us wanting Augmentations in addition to providing guidance on using Resource Templates for RAG. It does seem like the idea of application-controlled functions that mutate or analyze input data could be useful, and maybe then the Description column on the table I mentioned at the top is something like "mutations performed on data passed and managed by the client application" to dodge redundancy with the Resources row.

The reason Resource Templates don't work is that the context to be injected is a function of the context provided by the user. RAG is often not about retrieving the right part of a document, but retrieving the right chunk from some document among many, based on the context. This is not a very "Resource-like" pattern.

The fundamental issue is that Resource Templates represent static patterns for accessing data, while RAG represents a dynamic transformation process. Let me illustrate this with a concrete example:

Let's say we had an MCP server which provides RAG based on a user's mails and some set of documents. This is not at all a hypothetical scenario, it's a rather common one. We can consider the mails a sub-type of documents because it's a nicer abstraction, but then we still have an MCP server which is essentially a function from:

  • Context (ie the user's chat message + 0 to N messages from the current conversation, which is application-controlled) to
  • Augmentation context (0 to N chunks from the mails and documents with meta-data and re-ranked)

This is a classic RAG setup. The 'A' in RAG does stand for Augmentation, after all. How the MCP server does the search and reranking is not the most important thing here. The main thing to notice is that parameters like minRelevance and maxResults are auxiliary, whereas the Context is not. RAG without Context doesn't make sense.

This is why I think Resources are not a good fit. RAG is a transformative process, not document retrieval. It's conceptually more like a Tool than a Resource. All the other examples I had were more to show that I think a capability which is narrowly RAG is also wrong, because classic RAG is just a specialization of a context to context function.

Perhaps the simplest Augmentation is one that adds a timestamp to messages. This is almost too trivial to even exist, but bear with me. A "what date and time is it?" tool is one of the most wasteful things in practice. True, the model can decide whether to call back and get the time and date when needed, and not otherwise. But the schema of the tool has a bigger token footprint than always injecting the date and time. And the tool_use round trip includes the full context, so it has a token cost which scales with context and isn't constant! And there is added latency.

Time and date should almost always be an application level concern. Now this could be a resource. But I think conceptually it's wrong to have a dynamic resource that provides date and time. That's much more like a function.

To summarize: what I'm proposing is an application-level context to context function capability. While I chose the name "Augmentation" because it aligns with established patterns like RAG, the key characteristics that distinguish this from Resources are:

  1. Its functional nature - transforming context rather than providing data
  2. Its dynamic relationship with the entire context
  3. Its efficiency for operations that would be wasteful as Tools or awkward as Resources

This is why I believe two distinct application-level primitives are warranted and complementary rather than redundant.

@tadasant
Contributor

Thanks @PederHP for your thorough response! I think I'm coming around to your point of view on this. This bit helped click for me:

This is why I think Resources are not a good fit. RAG is a transformative process, not document retrieval. It's conceptually more like a Tool than a Resource. All the other examples I had were more to show that I think a capability which is narrowly RAG is also wrong, because classic RAG is just a specialization of a context to context function.

To try to put my own example to your point...

When I am coding and chatting with Cursor (imagine this is my MCP client app), I often have a need to pull specific files from my codebase into my context. This is a clear example of Resource management. I hit the @ button to signify that I want to pull in a Resource, Cursor shows me Completion options across my codebase, and I pull it in. This operation is very direct and fast: all I'm doing as the user is directing my application (Cursor) to pull in a very specific resource, in its entirety, into my context.

(Screenshot: Cursor's @ picker for pulling a specific file into context)

But sometimes, I don't know precisely what file(s) I want to pull into my context. Cursor has one such feature, "use entire codebase", which I assume basically performs a RAG operation on my entire codebase under the hood on what I've written in my user message. This is different than above, because the input to RAG is a query rather than a direct reference to a specific Resource.

(Screenshot: Cursor's "use entire codebase" option)

Indeed, Cursor even adds additional parameters ("Files to include") I can tweak to proceed. And then the UX after starting the query is different in that it shows me "what files we ended up considering" -- which is helpful UX here (where it would be useless in the first context where I know exactly what Resource I am choosing to pull into my context):

(Screenshot: Cursor showing which files were considered after the query)

I think you're right, the second example deserves different treatment at the protocol level. While we could maybe in theory extend Resources to be more "dynamic" to allow optional custom input parameters & optional return parameters, I think we would overcomplicate the Resources primitive for many cases where it is as simple as allowing a reference to some specific document and getting exactly that document back.

I haven't thought deeply about whether I would change anything about your specific spec proposal, but hope this helps anyone else reading align on the high level goals here.

@bilby91

bilby91 commented Feb 26, 2025

I really like this proposal and the new use cases it enables. Recently, I implemented a “RAG over MCP Tool” that allows an LLM to gather relevant context for performing a RAG call to the MCP server. In my implementation, the LLM is aware that it can call the tool multiple times if the context gathered in previous executions isn’t sufficient to fully satisfy the user’s request.

import { z } from "zod";
// Assumed imports: an MCP `server` instance and an AWS Bedrock Agent Runtime
// `client` are created elsewhere in this snippet's surrounding code.
import { RetrieveCommand } from "@aws-sdk/client-bedrock-agent-runtime";

server.tool(
  "search_file_context",
  {
    file_id: z.string(),
    user_prompt: z.string(),
  },
  async ({ file_id, user_prompt }) => {
    const command = new RetrieveCommand({
      knowledgeBaseId: "XXXX",
      retrievalQuery: {
        text: user_prompt
      },
      retrievalConfiguration: {
        vectorSearchConfiguration: {
          numberOfResults: 1,
          filter: {
            equals: {
              key: "file_id",
              value: file_id
            }
          }
        }
      }
    });

    const response = await client.send(command);

    return {
      content: [
        { 
          type: "text", 
          text: response.retrievalResults![0].content!.text!
        }
      ]
    }
  }
);

I’m curious about how this abstraction could be leveraged on the client side to address the issue of insufficient context. For example, should the client be responsible for automatically orchestrating multiple calls until a satisfactory result is achieved, or should this logic be built into the agent?

@Mehdi-Bl

Why can't this be done using a classic tool?
Why should it be a layer on top? And how reliable is RAG, to trust it completely at this layer? RAG has its limitations too; it's not a silver bullet.

@PederHP
Contributor Author

PederHP commented Mar 14, 2025

Why can't this be done using a classic tool? Why should it be a layer on top? And how reliable is RAG, to trust it completely at this layer? RAG has its limitations too; it's not a silver bullet.

RAG can be done with a tool, but tools are called by the model. This would be called by the application. They serve different purposes and have different strengths and weaknesses. Two different types of RAG.

Example: current time. You can make a tool the model can call and tell it to always call it on a new message. This adds an extra round-trip to every message, and it adds extra token use. Or you could have an augmentation which auto-injects into every message.

Think of Augmentation as a message post-processor. Tools are a callback mechanism - and as such have the power of the model being able to selectively use them. Augmentations have the advantage of not needing a round-trip and saving tokens - so if you know before sending a message what to inject, Augmentations would do this. If you need the model to decide, Augmentations wouldn't work.
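As a concrete sketch of the post-processor framing, a client could run every outgoing message through its registered augmentations before sending, with no model round-trip. Everything below is illustrative: the pipeline shape, the timestamp format, and the decision to prepend are all assumptions for the example, not part of the proposed schema.

```typescript
// Hypothetical message post-processing pipeline. Each augmentation is an
// application-invoked context-to-context function, applied before the
// message is sent to the model.

type Augmentation = (text: string) => string;

// Example augmentation: inject the current time into every message.
// The clock is injected as a parameter so the behavior is testable;
// the "[sent: ...]" format is an assumption for this sketch.
const injectTimestamp = (now: () => Date): Augmentation =>
  (text) => `[sent: ${now().toISOString()}]\n${text}`;

function augmentMessage(text: string, augmentations: Augmentation[]): string {
  // Apply augmentations in order, each receiving the previous output.
  return augmentations.reduce((acc, augment) => augment(acc), text);
}
```

This is the token-cost argument in miniature: the timestamp rides along with the message the application was sending anyway, instead of costing a full tool_use round trip.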

@LucaButBoring LucaButBoring mentioned this pull request Apr 8, 2025