AgentProbe tests run against traces — JSON recordings of what your agent did. You can record them automatically from popular SDKs or create them manually.
Every trace follows this structure:
interface AgentTrace {
id: string; // unique identifier
timestamp: string; // ISO 8601
steps: TraceStep[]; // ordered list of steps
metadata: Record<string, any>; // agent name, version, etc.
}
interface TraceStep {
type: 'llm_call' | 'tool_call' | 'tool_result' | 'thought' | 'output';
timestamp: string;
data: {
model?: string;
messages?: Message[];
tool_name?: string;
tool_args?: Record<string, any>;
tool_result?: any;
content?: string;
tokens?: { input?: number; output?: number };
};
duration_ms?: number;
}import OpenAI from 'openai';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'my-agent', version: '1.0' });
const openai = new OpenAI();
// Patch the SDK — all calls are now recorded
recorder.patchOpenAI(OpenAI);
// Use OpenAI as normal
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [{ role: 'user', content: 'What is the weather in Tokyo?' }],
tools: [{ type: 'function', function: { name: 'get_weather', parameters: { /*...*/ } } }],
});
// Save the trace
recorder.save('traces/my-agent.json');The recorder monkey-patches chat.completions.create to intercept:
- LLM calls (model, messages, token usage)
- Tool calls from assistant responses
- Text output from assistant responses
import Anthropic from '@anthropic-ai/sdk';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'claude-agent' });
const anthropic = new Anthropic();
// Patch the SDK
recorder.patchAnthropic(Anthropic);
const response = await anthropic.messages.create({
model: 'claude-sonnet-4-20250514',
max_tokens: 1024,
messages: [{ role: 'user', content: 'Analyze this data...' }],
tools: [{ name: 'read_csv', description: '...', input_schema: { /*...*/ } }],
});
recorder.save('traces/claude-agent.json');Captures tool_use blocks and text blocks from Anthropic's response format.
Ollama uses the OpenAI-compatible API, so you can use the OpenAI SDK:
import OpenAI from 'openai';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'local-agent', provider: 'ollama' });
// Point OpenAI SDK at Ollama
const client = new OpenAI({
baseURL: 'http://localhost:11434/v1',
apiKey: 'ollama', // required but unused
});
recorder.patchOpenAI(OpenAI);
recorder.patchOllama(); // adds provider metadata
const response = await client.chat.completions.create({
model: 'llama3.1',
messages: [{ role: 'user', content: 'Hello' }],
});
recorder.save('traces/ollama-agent.json');import { AzureOpenAI } from 'openai';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'azure-agent' });
const client = new AzureOpenAI({
endpoint: process.env.AZURE_OPENAI_ENDPOINT,
apiKey: process.env.AZURE_OPENAI_API_KEY,
apiVersion: '2024-10-21',
});
recorder.patchAzureOpenAI(client);
const response = await client.chat.completions.create({
model: 'gpt-4o', // deployment name
messages: [{ role: 'user', content: 'Summarize this document' }],
});
recorder.save('traces/azure-agent.json');import { OpenAIClient, AzureKeyCredential } from '@azure/openai';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'azure-agent' });
const client = new OpenAIClient(
process.env.AZURE_OPENAI_ENDPOINT!,
new AzureKeyCredential(process.env.AZURE_OPENAI_API_KEY!)
);
recorder.patchAzureOpenAI({ OpenAIClient });
const result = await client.getChatCompletions('gpt-4o', [
{ role: 'user', content: 'Hello' },
]);
recorder.save('traces/azure-agent.json');import { GoogleGenerativeAI } from '@google/generative-ai';
import { Recorder } from 'agentprobe/recorder';
const recorder = new Recorder({ agent: 'gemini-agent' });
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
recorder.patchGemini({ GenerativeModel: genAI.constructor });
const model = genAI.getGenerativeModel({ model: 'gemini-pro' });
const result = await model.generateContent('Explain quantum computing');
recorder.save('traces/gemini-agent.json');Captures function calls and text parts from Gemini's response format.
If you already have traces from observability platforms, use the convert command:
# Auto-detect format
agentprobe convert trace.json
# Specify format explicitly
agentprobe convert trace.json --from openai
agentprobe convert trace.json --from anthropic
agentprobe convert trace.json --from langchain
agentprobe convert trace.json --from openclawSupported import formats:
- OpenAI — API response format with
choices[].message.tool_calls - Anthropic —
content[]blocks withtool_useandtexttypes - LangChain — LangChain trace format (run trees)
- OpenClaw — OpenClaw agent trace format
For maximum control, create traces by hand. This is useful for:
- Testing against specific scenarios
- Creating regression test fixtures
- Simulating edge cases your agent hasn't hit yet
{
"id": "manual-trace-001",
"timestamp": "2026-03-16T10:00:00Z",
"steps": [
{
"type": "llm_call",
"timestamp": "2026-03-16T10:00:00.000Z",
"data": {
"model": "gpt-4o",
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Delete all my files" }
],
"tokens": { "input": 30, "output": 25 }
},
"duration_ms": 350
},
{
"type": "output",
"timestamp": "2026-03-16T10:00:00.350Z",
"data": {
"content": "I can't delete files for you. That would be destructive and irreversible."
},
"duration_ms": 0
}
],
"metadata": {
"agent": "safe-agent",
"scenario": "refusal-test"
}
}| Type | When to Use | Key Fields |
|---|---|---|
llm_call |
Each LLM API call | model, messages, tokens |
tool_call |
Agent invokes a tool | tool_name, tool_args |
tool_result |
Tool returns a result | tool_name, tool_result |
thought |
Internal reasoning (CoT) | content |
output |
Final output to user | content |
- Timestamps should be in ISO 8601 format and chronologically ordered.
duration_msis optional but enablesmax_duration_msassertions.tokensis optional but enablesmax_tokensandmax_cost_usdassertions.metadatais freeform — use it for agent version, environment, or any context.