[Feature] Runtime behavioral trust scoring for tool calls via MCP Observatory #3454

## Problem

When agents call external tools (MCP servers, APIs, third-party services), there's no runtime mechanism to evaluate whether that tool is behaving normally *right now*. Static allow-lists don't catch a previously-trusted server that starts returning malformed data, timing out intermittently, or exhibiting anomalous latency patterns. The Agents SDK currently trusts all configured tools equally.

## Proposal

Add an optional **behavioral trust scoring hook** that runs before each tool invocation, using runtime telemetry from the [Dominion Observatory](https://dominion-observatory.sgdata.workers.dev) — a behavioral trust registry tracking 14,800+ MCP servers with anonymized latency, success rate, and anomaly data.

The hook would:
1. **Before tool call**: query the Observatory for the tool server's current trust score (cached with a configurable TTL)
2. **Policy decision**: based on a configurable threshold, either proceed, warn, or block the call
3. **After tool call**: report anonymized telemetry (server_url, tool_name, success, latency_ms, http_status; no prompts, no tool arguments, no user data) back to the Observatory
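Step 1's "cached with a configurable TTL" could look roughly like the sketch below. It is a minimal, self-contained illustration only: `TrustScoreCache` and the `fetch` callable are hypothetical names, and `fetch` stands in for whatever lookup the Observatory SDK actually performs (for example `check_trust`). Nothing here is part of an existing SDK.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TrustScoreCache:
    """Hypothetical per-server cache of trust scores with a fixed TTL.

    `fetch` is an assumed callable mapping a server URL to a score
    (or None when no score is available); it stands in for the real lookup.
    """

    def __init__(
        self,
        fetch: Callable[[str], Optional[float]],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        # server_url -> (score, time at which the score was fetched)
        self._entries: Dict[str, Tuple[Optional[float], float]] = {}

    def get(self, server_url: str) -> Optional[float]:
        now = self._clock()
        entry = self._entries.get(server_url)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh entry: no lookup needed
        score = self._fetch(server_url)  # missing or expired: look it up again
        self._entries[server_url] = (score, now)
        return score
```

Injecting `clock` keeps expiry testable, and defaulting to `time.monotonic` means wall-clock adjustments cannot stretch or shorten a TTL. Unknown (`None`) scores are cached too, so an unscored server is not re-queried on every call.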

## What this looks like

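```python
from agents import Agent, Runner
from dominion_observatory import check_trust, report

TRUST_THRESHOLD = 40  # calls to servers scoring below this are blocked


class ToolBlockedError(Exception):
    """Raised by the gate to stop a call to a low-trust tool server."""


# As a guardrail / hook
async def trust_gate(ctx, tool_call):
    result = check_trust(tool_call.server_url)
    # Assumes `trust_score` is None when no score is available; such calls
    # pass. Compare against None explicitly so that a score of 0 is still
    # treated as below the threshold (a plain truthiness check would let it through).
    if result.trust_score is not None and result.trust_score < TRUST_THRESHOLD:
        raise ToolBlockedError(
            f"Trust score {result.trust_score} below threshold "
            f"{TRUST_THRESHOLD} for {tool_call.server_url}"
        )


agent = Agent(
    name="my-agent",
    tools=[...],
    before_tool_call=trust_gate,  # proposed hook point
)
```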

## Why this matters for agents

- **Tool-calling agents are only as reliable as their tools.** A coding agent that calls a code-execution MCP server with a trust score of 12 should behave differently than one calling a server scoring 95.
- **Multi-agent systems amplify the risk.** When Agent A delegates to Agent B, which calls Tool C, a behavioral anomaly at Tool C cascades back through B to A. A runtime trust check at the moment Tool C is invoked can stop the call before the cascade starts.
- **Static security ≠ runtime trust.** A tool can pass all static checks (signed, verified publisher, correct schema) and still be operationally degraded. Behavioral trust is the missing signal.

## Existing infrastructure

- **Observatory**: live at `dominion-observatory.sgdata.workers.dev`, tracking 14,800+ servers, 87,000+ interactions
- **Python SDK**: `pip install dominion-observatory` — `check_trust(server_url)` returns score + anomaly flags
- **LangChain integration**: [`ObservatoryTrustCallbackHandler`](https://github.com/vdineshk/daee-engine/tree/main/packages/dominion-observatory-sdk/python) — same pattern, already built
- **TypeScript SDK**: `@dominion/trust-provider` on npm — includes `beforeSettle` hook for the [x402 protocol](https://github.com/x402-foundation/x402/pull/2300)
- **Privacy**: no prompts, tool arguments, tool outputs, user IDs, or IPs are sent to the Observatory. Only anonymized telemetry (server_url, success, latency_ms, tool_name, http_status).
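To make the privacy boundary in the last bullet concrete, here is a hedged sketch of an after-call reporter whose record type simply has no field that could carry arguments or outputs. `TelemetryRecord`, `timed_call`, and the `send` callable are hypothetical; the field names mirror the list above, and `send` stands in for the SDK's `report` function, whose real signature is not assumed here.

```python
import time
from dataclasses import asdict, dataclass
from typing import Any, Awaitable, Callable, Dict, Optional


@dataclass(frozen=True)
class TelemetryRecord:
    # Only the fields listed in the proposal; this type has no place for
    # prompts, tool arguments, tool outputs, user IDs, or IPs.
    server_url: str
    tool_name: str
    success: bool
    latency_ms: float
    http_status: Optional[int] = None


async def timed_call(
    server_url: str,
    tool_name: str,
    call: Callable[[], Awaitable[Any]],
    send: Callable[[Dict[str, Any]], None],
) -> Any:
    """Await `call`, then pass a TelemetryRecord (as a dict) to `send`.

    The tool's result goes back to the caller but is never reported.
    Exceptions from `call` still propagate after the record is sent.
    """
    start = time.perf_counter()
    success = False
    try:
        result = await call()
        success = True
        return result
    finally:
        record = TelemetryRecord(
            server_url=server_url,
            tool_name=tool_name,
            success=success,
            latency_ms=(time.perf_counter() - start) * 1000.0,
        )
        try:
            send(asdict(record))
        except Exception:
            pass  # a telemetry failure must never break the tool call
```

`http_status` is left as `None` here because a generic awaitable exposes no status code; an HTTP-aware wrapper would fill it in.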

## Integration points

The cleanest integration would be:
1. A **`before_tool_call` / `after_tool_call` hook** on the Agent or Runner (if not already exposed)
2. A **`TrustGuardrail`** that plugs into the existing guardrails system
3. Optional: a **`trust_policy`** config on Agent that sets threshold + fail behavior (block / warn / log)
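The optional `trust_policy` config (item 3) could be modelled as a small frozen object that separates the decision from its enforcement. This is a sketch under stated assumptions: `TrustPolicy`, `OnLowTrust`, `enforce`, and `allow_unknown` are hypothetical names, not an existing Agents SDK or Observatory API, and the default threshold of 40 is borrowed from the example earlier in this issue.

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional

logger = logging.getLogger("trust_policy")


class OnLowTrust(Enum):
    BLOCK = "block"
    WARN = "warn"
    LOG = "log"


class ToolBlockedError(Exception):
    """Raised when a policy in BLOCK mode rejects a tool call."""


@dataclass(frozen=True)
class TrustPolicy:
    threshold: float = 40.0
    on_low_trust: OnLowTrust = OnLowTrust.BLOCK
    allow_unknown: bool = True  # fail open when no score is available

    def decide(self, score: Optional[float]) -> str:
        """Return "proceed", or the configured fail behavior's name."""
        if score is None:
            return "proceed" if self.allow_unknown else self.on_low_trust.value
        if score >= self.threshold:
            return "proceed"
        return self.on_low_trust.value


def enforce(policy: TrustPolicy, server_url: str, score: Optional[float]) -> None:
    """Apply the policy's decision: raise, warn, log, or do nothing."""
    action = policy.decide(score)
    msg = f"trust score {score!r} below {policy.threshold} for {server_url}"
    if action == "block":
        raise ToolBlockedError(msg)
    if action == "warn":
        logger.warning(msg)
    elif action == "log":
        logger.info(msg)
```

Keeping `decide` pure makes the policy easy to unit-test, while `enforce` is the piece a `before_tool_call` hook or `TrustGuardrail` would call.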

Happy to contribute a PR if there's interest. The Python SDK and the hook pattern are already built — it's a matter of wiring them into the Agents SDK's lifecycle.

## References

- [Dominion Observatory](https://dominion-observatory.sgdata.workers.dev) — live behavioral trust data
- [Observatory Python SDK](https://github.com/vdineshk/daee-engine/tree/main/packages/dominion-observatory-sdk/python)
- [x402 Trust-Provider Interface](https://github.com/vdineshk/daee-engine/tree/main/packages/trust-provider) — TypeScript `beforeSettle` hook
- [LangChain integration](https://github.com/vdineshk/daee-engine/blob/main/packages/dominion-observatory-sdk/python/dominion_observatory/langchain.py) — same pattern for LangChain agents
- Related: [CrewAI #5789](https://github.com/crewAIInc/crewAI/issues/5789), [LangChain #37376](https://github.com/langchain-ai/langchain/issues/37376)
