Commit 958d0dc

Pass parallel_tool_calls directly and document intended usage in integration test
Signed-off-by: Anastas Stoyanovsky <[email protected]>
1 parent 91f1b35 commit 958d0dc

File tree

8 files changed: +31 / -196 lines


docs/docs/providers/agents/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 description: |
   Agents
 
-  APIs for creating and interacting with agentic systems.
+  APIs for creating and interacting with agentic systems.
 sidebar_label: Agents
 title: Agents
 ---
@@ -13,6 +13,6 @@ title: Agents
 
 Agents
 
-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.
 
 This section contains documentation for all available providers for the **agents** API.
docs/docs/providers/batches/index.mdx

Lines changed: 12 additions & 12 deletions
@@ -1,15 +1,15 @@
 ---
 description: |
   The Batches API enables efficient processing of multiple requests in a single operation,
-  particularly useful for processing large datasets, batch evaluation workflows, and
-  cost-effective inference at scale.
+  particularly useful for processing large datasets, batch evaluation workflows, and
+  cost-effective inference at scale.
 
-  The API is designed to allow use of openai client libraries for seamless integration.
+  The API is designed to allow use of openai client libraries for seamless integration.
 
-  This API provides the following extensions:
-  - idempotent batch creation
+  This API provides the following extensions:
+  - idempotent batch creation
 
-  Note: This API is currently under active development and may undergo changes.
+  Note: This API is currently under active development and may undergo changes.
 sidebar_label: Batches
 title: Batches
 ---
@@ -19,14 +19,14 @@ title: Batches
 ## Overview
 
 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.
 
-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.
 
-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation
 
-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.
 
 This section contains documentation for all available providers for the **batches** API.
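
Since the description above says the Batches API is meant to be driven through the OpenAI client libraries, a minimal usage sketch may help. Everything below (the base URL, port, file name, and the contents of the JSONL input) is an assumption for illustration, not part of this commit.

```python
# Hypothetical sketch: exercising the OpenAI-compatible Batches API against a
# Llama Stack server. The base URL and input file are assumptions, not values
# from this commit.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

# Upload a JSONL file of chat-completion requests (the batch input format
# defined by the OpenAI Batches API).
batch_input = client.files.create(
    file=open("requests.jsonl", "rb"),
    purpose="batch",
)

# Create the batch; each line in the file targets /v1/chat/completions.
batch = client.batches.create(
    input_file_id=batch_input.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Check batch progress; results are fetched from the output file once completed.
status = client.batches.retrieve(batch.id)
print(status.status)
```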

docs/docs/providers/eval/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 description: |
   Evaluations
 
-  Llama Stack Evaluation API for running evaluations on model and agent candidates.
+  Llama Stack Evaluation API for running evaluations on model and agent candidates.
 sidebar_label: Eval
 title: Eval
 ---
@@ -13,6 +13,6 @@ title: Eval
 
 Evaluations
 
-Llama Stack Evaluation API for running evaluations on model and agent candidates.
+Llama Stack Evaluation API for running evaluations on model and agent candidates.
 
 This section contains documentation for all available providers for the **eval** API.

docs/docs/providers/files/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 description: |
   Files
 
-  This API is used to upload documents that can be used with other Llama Stack APIs.
+  This API is used to upload documents that can be used with other Llama Stack APIs.
 sidebar_label: Files
 title: Files
 ---
@@ -13,6 +13,6 @@ title: Files
 
 Files
 
-This API is used to upload documents that can be used with other Llama Stack APIs.
+This API is used to upload documents that can be used with other Llama Stack APIs.
 
 This section contains documentation for all available providers for the **files** API.

docs/docs/providers/inference/index.mdx

Lines changed: 10 additions & 10 deletions
@@ -2,12 +2,12 @@
 description: |
   Inference
 
-  Llama Stack Inference API for generating completions, chat completions, and embeddings.
+  Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-  This API provides the raw interface to the underlying models. Three kinds of models are supported:
-  - LLM models: these models generate "raw" and "chat" (conversational) completions.
-  - Embedding models: these models generate embeddings to be used for semantic search.
-  - Rerank models: these models reorder the documents based on their relevance to a query.
+  This API provides the raw interface to the underlying models. Three kinds of models are supported:
+  - LLM models: these models generate "raw" and "chat" (conversational) completions.
+  - Embedding models: these models generate embeddings to be used for semantic search.
+  - Rerank models: these models reorder the documents based on their relevance to a query.
 sidebar_label: Inference
 title: Inference
 ---
@@ -18,11 +18,11 @@ title: Inference
 
 Inference
 
-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.
 
-This API provides the raw interface to the underlying models. Three kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
-- Rerank models: these models reorder the documents based on their relevance to a query.
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query.
 
 This section contains documentation for all available providers for the **inference** API.
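
As a companion to the description above, here is a minimal sketch of exercising the chat-completion and embedding sides of the Inference API through the OpenAI-compatible client. The base URL and model identifiers are assumptions, not values taken from this commit.

```python
# Hypothetical sketch of two common Inference API paths via the
# OpenAI-compatible client. Base URL and model IDs are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

# Chat (conversational) completion with an LLM model.
chat = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Summarize what a rerank model does."}],
)
print(chat.choices[0].message.content)

# Embeddings with an embedding model, e.g. for semantic search.
emb = client.embeddings.create(
    model="all-MiniLM-L6-v2",
    input=["semantic search over documents"],
)
print(len(emb.data[0].embedding))
```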

docs/docs/providers/safety/index.mdx

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 description: |
   Safety
 
-  OpenAI-compatible Moderations API.
+  OpenAI-compatible Moderations API.
 sidebar_label: Safety
 title: Safety
 ---
@@ -13,6 +13,6 @@ title: Safety
 
 Safety
 
-OpenAI-compatible Moderations API.
+OpenAI-compatible Moderations API.
 
 This section contains documentation for all available providers for the **safety** API.
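
For orientation, calling an OpenAI-compatible Moderations endpoint typically looks like the sketch below; the base URL and the safety model identifier are assumptions for illustration only.

```python
# Hypothetical sketch of the OpenAI-compatible Moderations API exposed by the
# safety providers. Base URL and model ID are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

result = client.moderations.create(
    model="llama-guard",  # illustrative safety model identifier
    input="How do I pick a lock?",
)
print(result.results[0].flagged)
```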

src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py

Lines changed: 1 addition & 0 deletions
@@ -242,6 +242,7 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
             messages=messages,
             # Pydantic models are dict-compatible but mypy treats them as distinct types
             tools=self.ctx.chat_tools,  # type: ignore[arg-type]
+            parallel_tool_calls=self.parallel_tool_calls,
             stream=True,
             temperature=self.ctx.temperature,
             response_format=response_format,
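
The pattern behind this one-line change is forwarding the request's parallel_tool_calls value straight into the underlying chat completion call. A simplified, hypothetical sketch of that pass-through follows; the class and method names are illustrative, not the actual llama-stack implementation.

```python
# Hypothetical, simplified pass-through of parallel_tool_calls from a Responses
# request into an OpenAI-compatible chat completion call. Class and method
# names are illustrative only.
from openai import AsyncOpenAI


class ResponseStreamer:
    def __init__(self, client: AsyncOpenAI, parallel_tool_calls: bool = True):
        self.client = client
        # Stored from the incoming Responses request.
        self.parallel_tool_calls = parallel_tool_calls

    async def stream_completion(self, model: str, messages: list, tools: list):
        # Forward the flag directly instead of re-deriving or dropping it.
        return await self.client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            parallel_tool_calls=self.parallel_tool_calls,
            stream=True,
        )
```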

tests/integration/agents/test_openai_responses.py

Lines changed: 0 additions & 166 deletions
@@ -516,169 +516,3 @@ def test_response_with_instructions(openai_client, client_with_models, text_model_id):
 
     # Verify instructions from previous response was not carried over to the next response
     assert response_with_instructions2.instructions == instructions2
-
-
-@pytest.mark.skip(reason="Tool calling is not reliable.")
-def test_max_tool_calls_with_function_tools(openai_client, client_with_models, text_model_id):
-    """Test handling of max_tool_calls with function tools in responses."""
-    if isinstance(client_with_models, LlamaStackAsLibraryClient):
-        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
-
-    client = openai_client
-    max_tool_calls = 1
-
-    tools = [
-        {
-            "type": "function",
-            "name": "get_weather",
-            "description": "Get weather information for a specified location",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "location": {
-                        "type": "string",
-                        "description": "The city name (e.g., 'New York', 'London')",
-                    },
-                },
-            },
-        },
-        {
-            "type": "function",
-            "name": "get_time",
-            "description": "Get current time for a specified location",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "location": {
-                        "type": "string",
-                        "description": "The city name (e.g., 'New York', 'London')",
-                    },
-                },
-            },
-        },
-    ]
-
-    # First create a response that triggers function tools
-    response = client.responses.create(
-        model=text_model_id,
-        input="Can you tell me the weather in Paris and the current time?",
-        tools=tools,
-        stream=False,
-        max_tool_calls=max_tool_calls,
-    )
-
-    # Verify we got two function calls and that the max_tool_calls do not affect function tools
-    assert len(response.output) == 2
-    assert response.output[0].type == "function_call"
-    assert response.output[0].name == "get_weather"
-    assert response.output[0].status == "completed"
-    assert response.output[1].type == "function_call"
-    assert response.output[1].name == "get_time"
-    assert response.output[0].status == "completed"
-
-    # Verify we have a valid max_tool_calls field
-    assert response.max_tool_calls == max_tool_calls
-
-
-def test_max_tool_calls_invalid(openai_client, client_with_models, text_model_id):
-    """Test handling of invalid max_tool_calls in responses."""
-    if isinstance(client_with_models, LlamaStackAsLibraryClient):
-        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
-
-    client = openai_client
-
-    input = "Search for today's top technology news."
-    invalid_max_tool_calls = 0
-    tools = [
-        {"type": "web_search"},
-    ]
-
-    # Create a response with an invalid max_tool_calls value i.e. 0
-    # Handle ValueError from LLS and BadRequestError from OpenAI client
-    with pytest.raises((ValueError, BadRequestError)) as excinfo:
-        client.responses.create(
-            model=text_model_id,
-            input=input,
-            tools=tools,
-            stream=False,
-            max_tool_calls=invalid_max_tool_calls,
-        )
-
-    error_message = str(excinfo.value)
-    assert f"Invalid max_tool_calls={invalid_max_tool_calls}; should be >= 1" in error_message, (
-        f"Expected error message about invalid max_tool_calls, got: {error_message}"
-    )
-
-
-def test_max_tool_calls_with_builtin_tools(openai_client, client_with_models, text_model_id):
-    """Test handling of max_tool_calls with built-in tools in responses."""
-    if isinstance(client_with_models, LlamaStackAsLibraryClient):
-        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
-
-    client = openai_client
-
-    input = "Search for today's top technology and a positive news story. You MUST make exactly two separate web search calls."
-    max_tool_calls = [1, 5]
-    tools = [
-        {"type": "web_search"},
-    ]
-
-    # First create a response that triggers web_search tools without max_tool_calls
-    response = client.responses.create(
-        model=text_model_id,
-        input=input,
-        tools=tools,
-        stream=False,
-    )
-
-    # Verify we got two web search calls followed by a message
-    assert len(response.output) == 3
-    assert response.output[0].type == "web_search_call"
-    assert response.output[0].status == "completed"
-    assert response.output[1].type == "web_search_call"
-    assert response.output[1].status == "completed"
-    assert response.output[2].type == "message"
-    assert response.output[2].status == "completed"
-    assert response.output[2].role == "assistant"
-
-    # Next create a response that triggers web_search tools with max_tool_calls set to 1
-    response_2 = client.responses.create(
-        model=text_model_id,
-        input=input,
-        tools=tools,
-        stream=False,
-        max_tool_calls=max_tool_calls[0],
-    )
-
-    # Verify we got one web search tool call followed by a message
-    assert len(response_2.output) == 2
-    assert response_2.output[0].type == "web_search_call"
-    assert response_2.output[0].status == "completed"
-    assert response_2.output[1].type == "message"
-    assert response_2.output[1].status == "completed"
-    assert response_2.output[1].role == "assistant"
-
-    # Verify we have a valid max_tool_calls field
-    assert response_2.max_tool_calls == max_tool_calls[0]
-
-    # Finally create a response that triggers web_search tools with max_tool_calls set to 5
-    response_3 = client.responses.create(
-        model=text_model_id,
-        input=input,
-        tools=tools,
-        stream=False,
-        max_tool_calls=max_tool_calls[1],
-    )
-
-    # Verify we got two web search calls followed by a message
-    assert len(response_3.output) == 3
-    assert response_3.output[0].type == "web_search_call"
-    assert response_3.output[0].status == "completed"
-    assert response_3.output[1].type == "web_search_call"
-    assert response_3.output[1].status == "completed"
-    assert response_3.output[2].type == "message"
-    assert response_3.output[2].status == "completed"
-    assert response_3.output[2].role == "assistant"
-
-    # Verify we have a valid max_tool_calls field
-    assert response_3.max_tool_calls == max_tool_calls[1]
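
The commit message mentions documenting intended usage of parallel_tool_calls in an integration test. A test exercising the flag in this module could plausibly take the shape below; it reuses fixture names from the deleted tests above but is not the test actually added by this commit.

```python
# Hypothetical sketch of a parallel_tool_calls integration test, reusing the
# openai_client / text_model_id fixtures seen in the deleted tests above.
# This is NOT the test added by this commit, only an illustration of the
# intended usage of the flag.
def test_response_parallel_tool_calls_passthrough(openai_client, text_model_id):
    tools = [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get weather information for a specified location",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
            },
        },
    ]

    # Ask for several independent lookups; with parallel_tool_calls disabled,
    # the model is expected to emit at most one function call per turn.
    response = openai_client.responses.create(
        model=text_model_id,
        input="What is the weather in Paris, London, and Tokyo?",
        tools=tools,
        parallel_tool_calls=False,
        stream=False,
    )

    # The flag should be echoed back on the response object unchanged.
    assert response.parallel_tool_calls is False
```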
