[Feature] Runtime behavioral trust scoring for tool calls via MCP Observatory #3454

@vdineshk

Problem

When agents call external tools (MCP servers, APIs, third-party services), there's no runtime mechanism to evaluate whether that tool is behaving normally right now. Static allow-lists don't catch a previously-trusted server that starts returning malformed data, timing out intermittently, or exhibiting anomalous latency patterns. The Agents SDK currently trusts all configured tools equally.

Proposal

Add an optional behavioral trust scoring hook that runs before each tool invocation, using runtime telemetry from the Dominion Observatory — a behavioral trust registry tracking 14,800+ MCP servers with anonymized latency, success rate, and anomaly data.

The hook would:

  1. Before tool call: query the Observatory for the tool server's current trust score (cached with configurable TTL)
  2. Policy decision: based on a configurable threshold, either proceed, warn, or block the call
  3. After tool call: report anonymized telemetry (server_url, success, latency_ms — no prompts, no tool arguments, no user data) back to the Observatory
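
To make step 1 concrete, here is a minimal sketch of a per-server score cache with a configurable TTL. It is an illustration only: `TrustCache`, `fetch_score`, and the 300-second default are hypothetical names and values, not part of the Observatory SDK. In practice `fetch_score` could wrap `check_trust` and return its score.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class TrustCache:
    """Caches one trust score per server URL for ttl_seconds (hypothetical helper)."""

    # Hypothetical lookup function; it could wrap check_trust(server_url).
    fetch_score: Callable[[str], Optional[float]]
    ttl_seconds: float = 300.0  # assumed default, not taken from the SDK
    clock: Callable[[], float] = time.monotonic  # injectable for testing

    def __post_init__(self) -> None:
        # Maps server_url -> (time the score was fetched, score)
        self._entries: Dict[str, Tuple[float, Optional[float]]] = {}

    def get(self, server_url: str) -> Optional[float]:
        now = self.clock()
        cached = self._entries.get(server_url)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]  # still fresh: skip the network round trip
        score = self.fetch_score(server_url)
        self._entries[server_url] = (now, score)
        return score
```

With a cache like this, the hook only queries the Observatory once per server per TTL window, which keeps the added latency off the hot path for repeated tool calls.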

What this looks like

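from agents import Agent, Runner
from dominion_observatory import check_trust, report

TRUST_THRESHOLD = 40  # block calls to servers scoring below this


class ToolBlockedError(Exception):
    """Raised when a tool server's trust score is below the threshold."""


# As a guardrail / hook
async def trust_gate(ctx, tool_call):
    result = check_trust(tool_call.server_url)
    # Compare against None explicitly so that a score of 0 is still blocked
    if result.trust_score is not None and result.trust_score < TRUST_THRESHOLD:
        raise ToolBlockedError(
            f"Trust score {result.trust_score} below threshold for {tool_call.server_url}"
        )

agent = Agent(
    name="my-agent",
    tools=[...],
    before_tool_call=trust_gate,  # proposed hook point
)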

Why this matters for agents

  • Tool-calling agents are only as reliable as their tools. A coding agent that calls a code-execution MCP server with a trust score of 12 should behave differently than one calling a server scoring 95.
  • Multi-agent systems amplify the risk. When Agent A delegates to Agent B which calls Tool C, a behavioral anomaly at Tool C cascades. Runtime trust scoring catches this before the cascade.
  • Static security ≠ runtime trust. A tool can pass all static checks (signed, verified publisher, correct schema) and still be operationally degraded. Behavioral trust is the missing signal.

Existing infrastructure

  • Observatory: live at dominion-observatory.sgdata.workers.dev, tracking 14,800+ servers, 87,000+ interactions
  • Python SDK: pip install dominion-observatory; check_trust(server_url) returns score + anomaly flags
  • LangChain integration: ObservatoryTrustCallbackHandler — same pattern, already built
  • TypeScript SDK: @dominion/trust-provider on npm — includes beforeSettle hook for the x402 protocol
  • Privacy: no prompts, tool arguments, tool outputs, user IDs, or IPs are sent to the Observatory. Only anonymized telemetry (server_url, success, latency_ms, tool_name, http_status).
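
As a rough illustration of the privacy point, the sketch below builds a telemetry payload from an allow-list of the five fields named above and drops everything else. `TelemetryRecord` and `build_payload` are hypothetical names for this example; the actual wire format used by the Observatory is not shown here.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# The only fields the proposal says are reported.
ALLOWED_FIELDS = frozenset(
    {"server_url", "success", "latency_ms", "tool_name", "http_status"}
)


@dataclass(frozen=True)
class TelemetryRecord:
    """Hypothetical shape of one anonymized telemetry report."""

    server_url: str
    success: bool
    latency_ms: float
    tool_name: Optional[str] = None
    http_status: Optional[int] = None


def build_payload(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allow-listed keys so prompts or tool arguments are never included."""
    filtered = {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}
    return asdict(TelemetryRecord(**filtered))
```

Filtering by allow-list rather than by deny-list means a new field added to the tool-call context is excluded by default until someone deliberately opts it in.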

Integration points

The cleanest integration would be:

  1. A before_tool_call / after_tool_call hook on the Agent or Runner (if not already exposed)
  2. A TrustGuardrail that plugs into the existing guardrails system
  3. Optional: a trust_policy config on Agent that sets threshold + fail behavior (block / warn / log)
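
One possible shape for the trust_policy config in item 3 is sketched below. Everything here is an assumption for discussion: the `TrustPolicy` class, its field names, and the choice to let servers with no score through are illustrative and not an existing Agents SDK or Observatory API.

```python
import logging
import warnings
from dataclasses import dataclass
from typing import Literal, Optional

logger = logging.getLogger("trust_policy")


class ToolBlockedError(Exception):
    """Raised when a call is blocked by the trust policy."""


@dataclass(frozen=True)
class TrustPolicy:
    """Hypothetical config: a score threshold plus what to do when a server fails it."""

    threshold: float = 40.0
    on_fail: Literal["block", "warn", "log"] = "block"

    def enforce(self, server_url: str, score: Optional[float]) -> bool:
        """Return True if the score passes, False if it failed but the call may proceed.

        Raises ToolBlockedError when the score fails and on_fail is "block".
        Assumption: a missing score (None) is treated as passing.
        """
        if score is None or score >= self.threshold:
            return True
        message = (
            f"Trust score {score} below threshold {self.threshold} for {server_url}"
        )
        if self.on_fail == "block":
            raise ToolBlockedError(message)
        if self.on_fail == "warn":
            warnings.warn(message, RuntimeWarning)
        else:
            logger.info(message)
        return False
```

A before_tool_call hook could then reduce to a single call such as `policy.enforce(tool_call.server_url, score)`, with the block / warn / log choice living in configuration rather than in hook code.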

Happy to contribute a PR if there's interest. The Python SDK and the hook pattern are already built — it's a matter of wiring them into the Agents SDK's lifecycle.
