PoC: InferenceClient is also a MCPClient #2986

Open · wants to merge 8 commits into main

Conversation

julien-c (Member) commented on Apr 8, 2025:

Required reading

https://modelcontextprotocol.io/quickstart/client

TL;DR: MCP is a standard API to expose sets of Tools that can be hooked to LLMs
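
For context, each tool exposed by an MCP server ends up converted into the standard chat-completion "tools" format (see the snippet from mcp_client.py further down). A hypothetical example of one converted entry, with an illustrative tool name that is not from this PR:

# Hypothetical example of a converted tool entry, as passed to the LLM:
example_tool = {
    "type": "function",
    "function": {
        "name": "space_search",  # illustrative tool name
        "description": "Search Hugging Face Spaces",
        "parameters": {  # the MCP tool's inputSchema (JSON Schema)
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}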

Summary of how to use this

import os

# Note: this is a PoC, so the exact import path may change;
# here we assume MCPClient is re-exported from huggingface_hub.
from huggingface_hub import MCPClient

client = MCPClient(
    provider="together",
    model="Qwen/Qwen2.5-72B-Instruct",
    api_key=os.environ["HF_TOKEN"],
)
await client.add_mcp_server(
    "node", ["--disable-warning=ExperimentalWarning", f"{os.path.expanduser('~')}/Desktop/hf-mcp/index.ts"]
    # in the future we'll (very likely) be able to just pass a remote URL,
    # but for now, we need the MCP server to run locally
)

response = await client.process_query(
    """
    find an app that generates 3D models from text,
    and also get the best paper about transformers
    """
)
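
Both calls are coroutines, so they have to run inside an event loop. A minimal, self-contained way to run the snippet above; the asyncio wrapper and the server path are illustrative, not part of the PR:

import asyncio
import os

from huggingface_hub import MCPClient  # assumed import path (PoC)


async def main() -> None:
    client = MCPClient(
        provider="together",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
    )
    # Path to a locally checked-out MCP server; replace with your own
    await client.add_mcp_server("node", ["path/to/hf-mcp/index.ts"])
    response = await client.process_query("find an app that generates 3D models from text")
    print(response)


asyncio.run(main())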

Open questions

  • Should we implement this in (Async)InferenceClient directly, or in a distinct class, like here?

Where to find the MCP Server used here as an example

Note that you can replace it with any MCP server, for instance one from this doc: https://modelcontextprotocol.io/examples

https://gist.github.com/julien-c/0500ba922e1b38f2dc30447fb81f7dc6

Script output

Generation from LLM with tools

3D Model Generation from Text

Here are some of the best apps that can generate 3D models from text:

  1. Shap-E

  2. Hunyuan3D-1

  3. LGM

  4. 3D-Adapter

  5. Fictiverse-Voxel_XL_Lora

Best Paper on Transformers

One of the most influential and highly cited papers on transformers is:

  • Title: "Attention Is All You Need"
  • Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin
  • Link: huggingface.co/papers/1706.03762
  • Description: This paper introduced the Transformer architecture, which has become a cornerstone in natural language processing and many other areas of deep learning.

If you are looking for more recent advancements, here are a few other notable papers:

  • Title: "RoFormer: Enhanced Transformer with Rotary Position Embedding"

  • Link: huggingface.co/papers/2104.09864

  • Description: This paper introduces RoFormer, which improves the Transformer by using rotary position embeddings.

  • Title: "Performer: Generalized Attention with Gaussian Kernels for Sequence Modeling"

  • Link: huggingface.co/papers/2009.14794

  • Description: This paper presents Performer, a more efficient and scalable version of the Transformer.

  • Title: "Longformer: The Long-Document Transformer"

  • Link: huggingface.co/papers/2004.05150

  • Description: This paper introduces Longformer, which extends the Transformer to handle very long documents.

These resources should provide you with a solid foundation for both generating 3D models from text and understanding the latest advancements in transformer models.

@HuggingFaceDocBuilderDev commented: The docs for this PR live here; all documentation changes will be reflected on that endpoint, and the docs are available until 30 days after the last update.

julien-c requested a review from mishig25 on April 8, 2025 at 12:51.
mcp_client.py (outdated diff):
self.exit_stack = AsyncExitStack()
self.available_tools: List[ChatCompletionInputTool] = []

async def add_mcp_server(self, command: str, args: List[str]):
mishig25 (Contributor) commented on Apr 8, 2025:
Why not name this method connect_to_server?
We can't add multiple MCP servers to a single instance of the client, can we?

julien-c (Member, Author) replied:

yes we can

julien-c (Member, Author) replied:

I mean we would need to store a map of sessions, but conceptually there's nothing preventing us from doing it.
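
For illustration, a rough sketch of such a map of sessions, assuming the mcp Python SDK (ClientSession, stdio_client); the class and attribute names are placeholders, not the PR's code:

from contextlib import AsyncExitStack
from typing import Dict, List

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

ToolName = str


class MCPClientSketch:
    def __init__(self) -> None:
        self.exit_stack = AsyncExitStack()
        # One ClientSession per connected MCP server, keyed by the tools it exposes
        self.sessions: Dict[ToolName, ClientSession] = {}

    async def add_mcp_server(self, command: str, args: List[str]) -> None:
        params = StdioServerParameters(command=command, args=args)
        read, write = await self.exit_stack.enter_async_context(stdio_client(params))
        session = await self.exit_stack.enter_async_context(ClientSession(read, write))
        await session.initialize()
        # Route each tool name back to the session that serves it
        for tool in (await session.list_tools()).tools:
            self.sessions[tool.name] = session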

mishig25 (Contributor) commented on Apr 8, 2025:

let me know if the following question is out of scope.
Question: how would I connect to multiple MCP servers? Would it look like option 1 or option 2?

Option 1

client1 = MCPClient()
client1.add_mcp_server()

client2 = MCPClient()
client2.add_mcp_server()

or

Option 2

client = MCPClient()
client.add_mcp_server(server1)
client.add_mcp_server(server2)

?

A contributor commented:

Another design question: class MCPClient(AsyncInferenceClient) vs. class AsyncInferenceClient(*args, mcp_clients: list[MCPClient])?
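
Purely illustrative skeletons of the two shapes under discussion; neither is the PR's code, and the names in shape 2 are hypothetical:

from typing import List, Optional


class AsyncInferenceClient:
    """Stand-in stub for the existing huggingface_hub AsyncInferenceClient."""


# Shape 1: the MCP-aware client subclasses the async inference client.
class MCPClient(AsyncInferenceClient):
    async def add_mcp_server(self, command: str, args: List[str]) -> None:
        ...  # connect to the server and register its tools


# Shape 2: the inference client takes MCP clients as a constructor argument.
class AsyncInferenceClientWithMCP(AsyncInferenceClient):  # hypothetical name
    def __init__(self, *args, mcp_clients: Optional[List[MCPClient]] = None, **kwargs) -> None:
        super().__init__()
        self.mcp_clients = list(mcp_clients or [])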

A member commented:

Sorry to chime in unannounced, but from a very removed external user standpoint, I find this all very confusing - I just don't think what you coded should be called MCPClient 😄

When I came to this PR I was fully expecting MCPClient to be passed as parameter to InferenceClient, though I hear @Wauplin above, so why not a wrapper. But the end result is really more of an InferenceClientWithEmbeddedMCP to me, not an MCPClient.

That being said, it's just about semantics, but I'm kind of a semantics extremist, sorry about that (and feel free to completely disregard this message, as is very likely XD)

julien-c (Member, Author) replied:

> I was fully expecting MCPClient to be passed as parameter to InferenceClient

What do you mean as parameter? Do you have an example signature?

The same member replied:

second option of #2986 (comment)

julien-c (Member, Author) replied:

ah yes, sure, we can probably add this I guess

julien-c (Member, Author) added:

Ah, actually, with the async/await stuff I'm not so sure.

grll left a review:

A few comments off the top of my head; SSE support would not be so hard to add and could really be a nice addition.

self.exit_stack = AsyncExitStack()
self.available_tools: List[ChatCompletionInputTool] = []

async def add_mcp_server(self, command: str, args: List[str], env: Dict[str, str]):
grll commented:

You would need to lighten the requirements on your args a bit if you want to make it work with SSE, or is the intent just to support stdio? I see the rest seems to focus on stdio, so maybe it's by design.
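
For reference, a sketch of how the method signature could accommodate both transports, assuming the mcp Python SDK's stdio_client and sse_client helpers; the sse_url parameter is hypothetical and this is not the PR's code:

from typing import Dict, List, Optional

from mcp import ClientSession, StdioServerParameters
from mcp.client.sse import sse_client
from mcp.client.stdio import stdio_client


# Method body sketch for the MCP client class above (relies on self.exit_stack)
async def add_mcp_server(
    self,
    command: Optional[str] = None,
    args: Optional[List[str]] = None,
    env: Optional[Dict[str, str]] = None,
    sse_url: Optional[str] = None,  # hypothetical parameter for SSE servers
) -> None:
    if sse_url is not None:
        # Remote server over SSE: only a URL is required
        read, write = await self.exit_stack.enter_async_context(sse_client(sse_url))
    else:
        # Local server over stdio: spawn it as a subprocess
        params = StdioServerParameters(command=command, args=args or [], env=env)
        read, write = await self.exit_stack.enter_async_context(stdio_client(params))
    session = await self.exit_stack.enter_async_context(ClientSession(read, write))
    await session.initialize()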

"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.inputSchema,
grll commented:

Just a note that I have seen some MCP servers with jsonref ($ref) in their tool schemas, which sometimes confuses the model. In mcpadapt I had to resolve the jsonref before passing the schema to the model; might be minor for now.

A follow-up comment:

Confused, or sometimes plainly unsupported by the model SDK, like Google GenAI...
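
For illustration, a minimal sketch of inlining internal $ref pointers in a tool's inputSchema before passing it to the model. mcpadapt uses the jsonref package for this; the hand-rolled resolver below is just a sketch and does not guard against recursive schemas:

from typing import Any


def resolve_internal_refs(schema: dict) -> Any:
    """Replace internal JSON pointers like {"$ref": "#/$defs/Foo"} with the referenced object."""

    def _resolve(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/"):
                target: Any = schema
                for part in ref[2:].split("/"):
                    target = target[part]  # walk e.g. "$defs" -> "Foo" from the schema root
                return _resolve(target)
            return {key: _resolve(value) for key, value in node.items()}
        if isinstance(node, list):
            return [_resolve(item) for item in node]
        return node

    return _resolve(schema)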

ToolName: TypeAlias = str


class MCPClient:
grll commented:

Could potentially also support two syntaxes, like the MCPClient in smolagents:

  1. the one you have, with try/finally
  2. one as a context manager, where you directly pass the MCP client settings plus the MCP servers with the tools you want; this would allow something like "with MCPClient(...) as client:" and then just running client.chat.completion with the tools (a sketch follows below)
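
For illustration, a hypothetical version of the context-manager syntax, shown as an async context manager since the client is async; the servers argument and __aenter__/__aexit__ support are assumptions, not the PR's API:

import asyncio
import os

from huggingface_hub import MCPClient  # assumed import path (PoC)


async def main() -> None:
    # Hypothetical: the client connects to the given servers on __aenter__
    # and cleans up its AsyncExitStack on __aexit__.
    async with MCPClient(
        provider="together",
        model="Qwen/Qwen2.5-72B-Instruct",
        api_key=os.environ["HF_TOKEN"],
        servers=[("node", ["path/to/hf-mcp/index.ts"])],  # hypothetical parameter
    ) as client:
        response = await client.process_query("find an app that generates 3D models from text")
        print(response)


asyncio.run(main())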
