Commit 78d9c74

LLMObs OOTB tool selection documentation (#31734)
* tool selection documentation v1
* remove incomplete sentence
* update descriptions and example script
* add images and refine example
* new image
* Update content/en/llm_observability/evaluations/ootb_evaluations.md

Co-authored-by: Janine Chan <[email protected]>

---------

Co-authored-by: Janine Chan <[email protected]>
1 parent fe59377 commit 78d9c74

2 files changed

+59
-0
lines changed


content/en/llm_observability/evaluations/ootb_evaluations.md

Lines changed: 59 additions & 0 deletions
@@ -301,6 +301,65 @@ After instrumenting your application to send session-end spans, configure the ev

This configuration ensures evaluations run only on complete sessions. This provides accurate assessments of user intention resolution.

#### Tool selection

This check evaluates whether the agent has successfully selected the appropriate tools to address the user's request.

{{< img src="llm_observability/evaluations/tool_selection_failure.png" alt="A tool selection failure detected by the evaluation in LLM Observability" style="width:100%;" >}}

| Evaluation Stage | Evaluation Method | Evaluation Definition |
|---|---|---|
| Evaluated on LLM spans| Evaluated using LLM | Tool Selection verifies that the tools chosen by the LLM align with the user's request and the available tools. The evaluation identifies cases where irrelevant or incorrect tool calls were made.|

##### Instrumentation

This evaluation is supported in dd-trace version 3.12 and above. The example below uses the OpenAI Agents SDK to illustrate how tools are made available to the agent and to the evaluation:

{{< code-block lang="python" >}}
from ddtrace.llmobs import LLMObs
from agents import Agent, ModelSettings, function_tool

@function_tool
def add_numbers(a: int, b: int) -> int:
    """
    Adds two numbers together.
    """
    return a + b

@function_tool
def subtract_numbers(a: int, b: int) -> int:
    """
    Subtracts two numbers.
    """
    return a - b


# List of tools available to the agent
math_tutor_agent = Agent(
    name="Math Tutor",
    handoff_description="Specialist agent for math questions",
    instructions="You provide help with math problems. Please use the tools to find the answer.",
    model="o3-mini",
    tools=[
        add_numbers, subtract_numbers
    ],
)

history_tutor_agent = Agent(
    name="History Tutor",
    handoff_description="Specialist agent for history questions",
    instructions="You provide help with history problems.",
    model="o3-mini",
)

# The triage agent decides which specialized agent to hand off the task to — another type of tool selection covered by this evaluation.
triage_agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(temperature=0),
    instructions='What is the sum of 1 to 10?',
    handoffs=[math_tutor_agent, history_tutor_agent],
)
{{< /code-block >}}

### Security and Safety evaluations