
Document known limitations and/or inconsistencies of the Responses API in Llama Stack #3575

@jwm4

Description

🚀 Describe the new functionality needed

The Llama Stack documentation for Responses should include a list of known limitations and/or inconsistencies relative to OpenAI's Responses API. We should also keep this list up to date as limitations are addressed and as new OpenAI features emerge that Llama Stack has not yet implemented. Ideally, the list would reference the specific dated version of https://platform.openai.com/docs/changelog that we are comparing against, so that if something changes on the OpenAI side before we update the list, readers can at least see which version of OpenAI we compared to. See also #3040 for more about known limitations.

Here are some raw notes about some of the things that would go into this documentation:

Streaming

#2364. This appears to be a work in progress.
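For reference, here is a minimal sketch of what streaming looks like through the OpenAI Python client pointed at a Llama Stack server. The base_url and model id are placeholders that depend on your deployment; later sketches in this issue reuse this `client`.

```python
from openai import OpenAI

# Placeholder endpoint; adjust to however your Llama Stack deployment
# exposes its OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

stream = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Write a haiku about observability.",
    stream=True,
)

for event in stream:
    # Streaming yields typed events; print text deltas as they arrive.
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```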

MCP and function tools don't work if they have no arguments

#3560
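A minimal reproduction shape, assuming a zero-argument function tool (the tool itself is made up for illustration):

```python
# A function tool whose JSON schema declares no arguments; per #3560,
# tools shaped like this reportedly fail in Llama Stack's Responses API.
tools = [{
    "type": "function",
    "name": "get_server_time",  # hypothetical tool
    "description": "Return the current server time.",
    "parameters": {"type": "object", "properties": {}},
}]

response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What time is it on the server?",
    tools=tools,
)
```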

Tool Choice

In OpenAI, this subset of the Responses API lets you set restrictions or requirements for which tools are used when generating a response. This is not yet fully implemented in Llama Stack. #3548
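In OpenAI's API the parameter looks like the sketch below (tool definitions omitted; `get_weather` is a made-up function tool):

```python
# Force the model to call one specific tool rather than letting it choose.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What's the weather in Paris?",
    tools=tools,  # assume a get_weather function tool is defined here
    tool_choice={"type": "function", "name": "get_weather"},
)

# Other values OpenAI accepts include "auto", "required", and "none".
```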

Global Guardrails

When you call the OpenAI Responses API, the model outputs presumably pass through safety models configured by OpenAI's administrators. Llama Stack should have a way to configure one or more safety models (or non-model logic?) to be applied to all Responses requests, either through run.yaml or some sort of administrative API.

User-Controlled Guardrails

Because OpenAI applies its own global guardrails, it has not released a way for users to configure their own (either as a complement to or a replacement for the global ones). However, Llama Stack users might want that, as noted in #3325. This could potentially be a non-breaking (additive) difference from the OpenAI APIs.
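Purely as a hypothetical sketch of what an additive extension could look like (nothing below exists today; the `guardrails` field and shield ids are invented to illustrate the shape discussed in #3325):

```python
# HYPOTHETICAL: `guardrails` is not a real parameter in OpenAI or Llama Stack;
# extra_body is just the OpenAI SDK's escape hatch for non-standard fields.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Tell me about your safety policies.",
    extra_body={"guardrails": ["llama-guard", "my-custom-shield"]},
)
```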

Safety Identification and Tracking

In OpenAI’s platform, an account holder can track their agentic users via a safety identifier passed with each request. When a request is found to violate moderation or safety rules, the account holder is alerted and automated actions can be taken.

#3549
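On OpenAI's side this is the `safety_identifier` request parameter; a sketch of the call shape (the identifier value is a placeholder):

```python
# OpenAI recommends passing a stable, opaque identifier (e.g., a hash of the
# end user's id) rather than raw personal data.
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Hello!",
    safety_identifier="sha256-of-end-user-id",  # placeholder value
)
```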

Built-in Tools

One of the core platform features of Responses is its ecosystem of built-in tools, which can lower the barrier to entry for developing certain kinds of agentic workflows. In general, these tools tend to be aligned with the specific tools a given model was trained to use, which makes their existence awkward for a model-agnostic framework like Llama Stack. A couple of built-in tools already exist in Llama Stack (file search, web search, ??). We might not want to add more, but we should at least document which ones we support and which of OpenAI's Responses built-in tools we do not (e.g., code interpreter). Note that some built-in tools may drag in additional APIs, such as the Containers API for the code interpreter tool.

No issues for adding more built-in tools have been filed upstream, and it is not clear whether there is any demand for them.
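For reference, built-in tools in the Responses API are requested by type, roughly like this (the vector store id is a placeholder, and exact tool type names vary between OpenAI API versions and Llama Stack):

```python
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Search the project docs for rate limits.",
    tools=[
        {"type": "web_search"},
        {"type": "file_search", "vector_store_ids": ["vs_placeholder"]},
    ],
)
```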

Prompt Templates

In OpenAI’s platform, prompts can be templated using a structured language similar to Python’s Jinja. Templates can be stored for free on the server side, and anyone in the organization can then access and use them when creating a response, which makes it easier to share workflows and recipes for getting quality results. This is rumored to be a work in progress, but I haven't found a link to an open issue for it.
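On OpenAI's side, a stored prompt template is referenced by id at request time, roughly like this (the id and variables are placeholders):

```python
# The stored template supplies the prompt text (and can also pin the model);
# `variables` fills the template's placeholders.
response = client.responses.create(
    prompt={
        "id": "pmpt_placeholder",
        "version": "2",
        "variables": {"city": "Paris"},
    },
)
```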

Connectors

Connectors are MCP servers that are maintained and managed by the provider of the Responses API (e.g., Llama Stack). OpenAI documents its connectors at https://platform.openai.com/docs/guides/tools-connectors-mcp . Would we want to add some, all, or none of the ones OpenAI supports? Or a mechanism for administrators to add their own connectors via run.yaml or some API?
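For reference, OpenAI exposes connectors through the existing MCP tool type, roughly like this (the label, connector id, and token are placeholders in the style of their docs):

```python
response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Summarize my latest design doc.",
    tools=[{
        "type": "mcp",
        "server_label": "google_drive",
        "connector_id": "connector_googledrive",
        "authorization": "<end-user OAuth token>",  # placeholder
        "require_approval": "never",
    }],
)
```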

MCP Elicitations

Elicitations are a way for servers to request additional information from users through the client during interactions; e.g., a tool for some service could ask the user for their username before proceeding. See https://modelcontextprotocol.io/specification/draft/client/elicitation for more information. Does this work in OpenAI's reference implementation of Responses? Does it work in Llama Stack today? If the answers differ, this is a known incompatibility; even if they are the same, it is a known limitation.
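For context, a server-side elicitation looks roughly like this with the MCP Python SDK (a sketch; helper names may differ across SDK versions, and the tool is made up):

```python
from pydantic import BaseModel
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("demo")

class Username(BaseModel):
    username: str

@mcp.tool()
async def greet(ctx: Context) -> str:
    """Ask the connected client for a username before proceeding."""
    result = await ctx.elicit(message="What is your username?", schema=Username)
    if result.action == "accept":
        return f"Hello, {result.data.username}!"
    return "No username provided."
```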

MCP Sampling

Sampling is a way for an MCP tool to ask the generative AI model something. See https://modelcontextprotocol.io/specification/draft/client/sampling . Does this work in OpenAI's reference implementation of Responses? Does it work in Llama Stack today? If the answers differ, this is a known incompatibility; even if they are the same, it is a known limitation.
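And a server-side sampling request looks roughly like this with the MCP Python SDK (again a sketch; the tool is made up):

```python
from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("demo")

@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    """Ask the client's model to summarize some text via MCP sampling."""
    result = await ctx.session.create_message(
        messages=[SamplingMessage(
            role="user",
            content=TextContent(type="text", text=f"Summarize:\n{text}"),
        )],
        max_tokens=200,
    )
    return result.content.text if result.content.type == "text" else ""
```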

Reasoning

#3551

Service Tier

#3550

logprobs

#3552

max_tool_calls

#3563

max_output_tokens

#3562

metadata

#3564

instructions

#3566

incomplete_details

#3567

background

#3568
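For reference, several of the request parameters tracked above appear together in a single OpenAI-style call like this (all values are placeholders; support in Llama Stack is exactly what the issues above track):

```python
response = client.responses.create(
    model="llama3.2:3b",                # placeholder model id
    input="Plan a three-day trip to Lisbon.",
    instructions="You are a terse travel agent.",  # #3566
    reasoning={"effort": "low"},        # #3551, reasoning-capable models only
    service_tier="default",             # #3550
    max_output_tokens=500,              # #3562
    max_tool_calls=3,                   # #3563
    metadata={"ticket": "3575"},        # #3564
    background=True,                    # #3568: run asynchronously
)
# #3552: logprobs are requested via separate options, omitted here.
# #3567: if the response was cut off, the reason lands in incomplete_details.
print(response.status, response.incomplete_details)
```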

parallel_tool_calls

There is a rumor that parallel_tool_calls doesn't work. Someone should test this to verify and open a ticket if it's true.
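A quick way to verify (a sketch; the model id and `get_weather` tool are placeholders): ask for two independent facts and count the function_call items that come back in one turn.

```python
tools = [{
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="What is the weather in Paris and in Tokyo?",
    tools=tools,
    parallel_tool_calls=True,
)

calls = [item for item in response.output if item.type == "function_call"]
print(f"{len(calls)} tool call(s) in one turn")  # expect 2 if parallel calls work
```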

Response branching

Response branching, as discussed in the Agents vs. OpenAI Responses API doc, is not working.
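Branching means forking two different continuations from the same stored response via previous_response_id, e.g.:

```python
base = client.responses.create(
    model="llama3.2:3b",  # placeholder model id
    input="Draft a tagline for a bakery.",
)

# Two independent branches that share the same parent turn.
branch_a = client.responses.create(
    model="llama3.2:3b",
    input="Make it funnier.",
    previous_response_id=base.id,
)
branch_b = client.responses.create(
    model="llama3.2:3b",
    input="Make it more formal.",
    previous_response_id=base.id,
)
```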

Some that are already resolved:

Whenever any of the items above gets resolved, I move it here for reference (mainly so that if someone remembers one of these from an earlier release, they can look here and see that it has been resolved).

require_approval parameter for MCP tools in Responses API does not work

#3443

MCP tools don't work if they have array-type arguments

This was fixed for the Agent API in #3003 and for Responses in #3602.

MCP tools don't work if they have an input and output JSON schema with $ref, $defs

#3365
