
Document known limitations and/or inconsistencies of the Responses API in Llama Stack #3575

@jwm4

Description

🚀 Describe the new functionality needed

The Llama Stack documentation for Responses should include a list of known limitations and/or inconsistencies relative to OpenAI's Responses API. We should also keep this list up to date as limitations are addressed and as new OpenAI features emerge that Llama Stack has not yet implemented. Ideally, the list would reference the specific dated version of https://platform.openai.com/docs/changelog that we are comparing against, so that if something changes on the OpenAI side before we update the list, readers can at least see which version of OpenAI we compared to. See also #3040 for more about known limitations.

Here are some raw notes about some of the things that would go into this documentation:

Streaming

#2364. This appears to be a work in progress.
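For reference, here is a minimal sketch of what streaming looks like through the OpenAI Python client pointed at a Llama Stack server. The base_url and model id are placeholders that depend on your deployment; later sketches in this issue reuse this `client`.

```python
from openai import OpenAI

# Placeholder endpoint; adjust to however your Llama Stack deployment
# exposes its OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

stream = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Write a haiku about observability.",
    stream=True,
)

for event in stream:
    # Streaming yields typed events; print text deltas as they arrive.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```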

MCP and function tools don't work if they have no arguments

#3560
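A minimal reproduction shape, assuming a zero-argument function tool (the tool itself is made up for illustration):

```python
# A function tool whose JSON schema declares no arguments; per #3560,
# tools shaped like this reportedly fail in Llama Stack's Responses API.
tools = [{
    "type": "function",
    "name": "get_server_time",  # hypothetical tool
    "description": "Return the current server time.",
    "parameters": {"type": "object", "properties": {}},
}]

response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What time is it on the server?",
    tools=tools,
)
```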

Tool Choice

In OpenAI, this subset of the Responses API lets you set restrictions or requirements for which tools are used when generating a response. This is not yet fully implemented in Llama Stack. #3548
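In OpenAI's API the parameter looks like the sketch below (tool definitions omitted; `get_weather` is a made-up function tool):

```python
# Force the model to call one specific tool rather than letting it choose.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What's the weather in Paris?",
    tools=tools,  # assume a get_weather function tool is defined here
    tool_choice={"type": "function", "name": "get_weather"},
)

# Other values OpenAI accepts include "auto", "required", and "none".
```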

Global Guardrails

When you call the OpenAI Responses API, the model outputs presumably pass through safety models configured by OpenAI's administrators. Llama Stack should have a way to configure one or more safety models (or non-model logic?) to be applied to all Responses requests, either through run.yaml or some sort of administrative API.

User-Controlled Guardrails

Because OpenAI applies its own global guardrails, it has not released a way for users to configure their own (either as a complement to or a replacement for the global ones). However, Llama Stack users might want that, as noted in #3325. This could potentially be a non-breaking (additive) difference from the OpenAI APIs.
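Purely as a hypothetical sketch of what an additive extension could look like (nothing below exists today; the `guardrails` field and shield ids are invented to illustrate the shape discussed in #3325):

```python
# HYPOTHETICAL: `guardrails` is not a real parameter in OpenAI or Llama Stack;
# extra_body is just the OpenAI SDK's escape hatch for non-standard fields.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Tell me about your safety policies.",
    extra_body={"guardrails": ["llama-guard", "my-custom-shield"]},
)
```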

Safety Identification and Tracking

In OpenAI’s platform, an account holder can track their agentic users via a safety identifier passed with each request. When a request is found to violate moderation or safety rules, the account holder is alerted and automated actions can be taken.

#3549
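On OpenAI's side this is the `safety_identifier` request parameter; a sketch of the call shape (the identifier value is a placeholder):

```python
# OpenAI recommends passing a stable, opaque identifier (e.g., a hash of the
# end user's id) rather than raw personal data.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Hello!",
    safety_identifier="sha256-of-end-user-id",  # placeholder value
)
```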

Built-in Tools

One of the core platform features of Responses is its ecosystem of built-in tools, which can lower the barrier to entry for developing certain kinds of agentic workflows. In general, these tools tend to be aligned with the specific tools a given model was trained to use, which makes their existence awkward for a model-agnostic framework like Llama Stack. A couple of built-in tools already exist in Llama Stack (file search, web search, ??). We might not want to add more, but we should at least document which ones we support and which of OpenAI's Responses built-in tools we do not (e.g., code interpreter). Note that some built-in tools may drag in additional APIs, such as the Containers API for the code interpreter tool.

No issues for adding more built-in tools have been filed upstream, and it is not clear whether there is any demand for them.
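For reference, built-in tools in the Responses API are requested by type, roughly like this (the vector store id is a placeholder, and exact tool type names vary between OpenAI API versions and Llama Stack):

```python
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Search the project docs for rate limits.",
    tools=[
        {"type": "web_search"},
        {"type": "file_search", "vector_store_ids": ["vs_placeholder"]},
    ],
)
```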

Prompt Templates

In OpenAI’s platform, prompts can be templated using a structured language similar to Python’s Jinja. Templates can be stored for free on the server side, and anyone in the organization can then access and use them when creating a response, which makes it easier to share workflows and recipes for getting quality results. This is rumored to be a work in progress, but I haven't found a link to an open issue for it.
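On OpenAI's side, a stored prompt template is referenced by id at request time, roughly like this (the id and variables are placeholders):

```python
# The stored template supplies the prompt text (and can also pin the model);
# `variables` fills the template's placeholders.
response = client.responses.create(
    prompt={
        "id": "pmpt_placeholder",
        "version": "2",
        "variables": {"city": "Paris"},
    },
)
```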

Connectors

Connectors are MCP servers that are maintained and managed by the provider of the Responses API (e.g., Llama Stack). OpenAI documents its connectors at https://platform.openai.com/docs/guides/tools-connectors-mcp . Would we want to add some, all, or none of the ones OpenAI supports? Or a mechanism for administrators to add their own connectors via run.yaml or some API?
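For reference, OpenAI exposes connectors through the existing MCP tool type, roughly like this (the label, connector id, and token are placeholders in the style of their docs):

```python
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Summarize my latest design doc.",
    tools=[{
        "type": "mcp",
        "server_label": "google_drive",
        "connector_id": "connector_googledrive",
        "authorization": "<end-user OAuth token>",  # placeholder
        "require_approval": "never",
    }],
)
```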

MCP Elicitations

Elicitations are a way for servers to request additional information from users through the client during interactions; e.g., a tool for some service could ask the user for their username before proceeding. See https://modelcontextprotocol.io/specification/draft/client/elicitation for more information. Does this work in OpenAI's reference implementation of Responses? Does it work in Llama Stack today? If the answers differ, this is a known incompatibility; even if they are the same, it is a known limitation.
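For context, a server-side elicitation looks roughly like this with the MCP Python SDK (a sketch; helper names may differ across SDK versions, and the tool is made up):

```python
from pydantic import BaseModel
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("demo")

class Username(BaseModel):
    username: str

@mcp.tool()
async def greet(ctx: Context) -> str:
    """Ask the connected client for a username before proceeding."""
    result = await ctx.elicit(message="What is your username?", schema=Username)
    if result.action == "accept":
        return f"Hello, {result.data.username}!"
    return "No username provided."
```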

MCP Sampling

Sampling is a way for an MCP tool to ask the generative AI model something. See https://modelcontextprotocol.io/specification/draft/client/sampling . Does this work in OpenAI's reference implementation of Responses? Does it work in Llama Stack today? If the answers differ, this is a known incompatibility; even if they are the same, it is a known limitation.
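And a server-side sampling request looks roughly like this with the MCP Python SDK (again a sketch; the tool is made up):

```python
from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("demo")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    """Ask the client's model to summarize some text via MCP sampling."""
    result = await ctx.session.create_message(
        messages=[SamplingMessage(
            role="user",
            content=TextContent(type="text", text=f"Summarize:\n{text}"),
        )],
        max_tokens=200,
    )
    return result.content.text if result.content.type == "text" else ""
```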

Reasoning

#3551

Service Tier

#3550

logprobs

#3552

max_tool_calls

#3563

max_output_tokens

#3562

metadata

#3564

instructions

#3566

incomplete_details

#3567

background

#3568
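For reference, several of the request parameters tracked above appear together in a single OpenAI-style call like this (all values are placeholders; support in Llama Stack is exactly what the issues above track):

```python
response = client.responses.create(
    model="llama3.2:3b",                # placeholder model id
    input="Plan a three-day trip to Lisbon.",
    instructions="You are a terse travel agent.",  # #3566
    reasoning={"effort": "low"},        # #3551, reasoning-capable models only
    service_tier="default",             # #3550
    max_output_tokens=500,              # #3562
    max_tool_calls=3,                   # #3563
    metadata={"ticket": "3575"},        # #3564
    background=True,                    # #3568: run asynchronously
)
# #3552: logprobs are requested via separate options, omitted here.
# #3567: if the response was cut off, the reason lands in incomplete_details.
print(response.status, response.incomplete_details)
```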

parallel_tool_calls

There is a rumor that parallel_tool_calls doesn't work. Someone should test this to verify and open a ticket if it's true.
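A quick way to verify (a sketch; the model id and `get_weather` tool are placeholders): ask for two independent facts and count the function_call items that come back in one turn.

```python
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What is the weather in Paris and in Tokyo?",
    tools=tools,
    parallel_tool_calls=True,
)

calls = [item for item in response.output if item.type == "function_call"]
print(f"{len(calls)} tool call(s) in one turn")  # expect 2 if parallel calls work
```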

Response branching

Response branching, as discussed in the Agents vs. OpenAI Responses API doc, is not working.
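Branching means forking two different continuations from the same stored response via previous_response_id, e.g.:

```python
base = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Draft a tagline for a bakery.",
)

# Two independent branches that share the same parent turn.
branch_a = client.responses.create(
    model="llama3.2:3b",
    input="Make it funnier.",
    previous_response_id=base.id,
)
branch_b = client.responses.create(
    model="llama3.2:3b",
    input="Make it more formal.",
    previous_response_id=base.id,
)
```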

Some that are already resolved:

Whenever any of the items above gets resolved, I move it here for reference (mainly so that if someone remembers one of these from an earlier release, they can look here and see that it has been resolved).

require_approval parameter for MCP tools in Responses API does not work

#3443

MCP tools don't work if they have array-type arguments

This was fixed for the Agent API in #3003 and for Responses in #3602.

MCP tools don't work if they have an input and output JSON schema with $ref, $defs

#3365
