
Add an agent-cost-reduction benchmark scenario #37

Description

@madeinplutofabio

Summary

NCP's adoption pitch is "reduce repeated orchestration cost in agentic systems". A targeted benchmark that compares an LLM-only agent loop against an LLM+NCP loop on the same workflow would make this pitch concrete and defensible.

Acceptance criteria

  • Pick one repeatable agent workflow. The lead-qualification graph (#29, "Add a practical lead-qualification MCP example graph") is a natural fit.
  • Run two scenarios on the same input dataset:
    • A) The LLM agent performs the full workflow through prompt/tool orchestration on every call.
    • B) The LLM agent calls one NCP graph (via ncp-mcp-server) that runs the deterministic part of the workflow.
  • Measure at minimum:
    • Number of LLM calls per workflow invocation
    • Total prompt and output tokens
    • End-to-end latency
    • Traceability and debuggability (qualitative: what can you reproduce from the trace alone?)
  • Document assumptions honestly (model version, prompt strategy, dataset, cost-per-token used for projections).
  • Add results under bench/ or docs/. Include raw measurement data, not just summary numbers.
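
One possible shape for the raw measurement data and the summary step is sketched below. It is only an illustration under stated assumptions: the `RunRecord` fields, the `summarize` and `to_jsonl` helpers, and the per-1k-token price parameters are hypothetical names, not part of NCP or ncp-mcp-server, and the numbers in the demo are placeholders rather than measurements.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean, median


@dataclass(frozen=True)
class RunRecord:
    """One workflow invocation (hypothetical schema, not an NCP type)."""
    scenario: str        # "A" = LLM-only loop, "B" = LLM + NCP graph
    input_id: str        # identifier of the dataset row that was processed
    llm_calls: int       # number of LLM calls in this invocation
    prompt_tokens: int
    output_tokens: int
    latency_s: float     # end-to-end wall-clock latency in seconds


def to_jsonl(records):
    """Serialize raw records as JSON Lines so they can be committed as-is."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def summarize(records, scenario, usd_per_1k_prompt, usd_per_1k_output):
    """Aggregate one scenario; the prices are assumptions the report must state."""
    runs = [r for r in records if r.scenario == scenario]
    if not runs:
        raise ValueError(f"no records for scenario {scenario!r}")
    prompt = sum(r.prompt_tokens for r in runs)
    output = sum(r.output_tokens for r in runs)
    return {
        "runs": len(runs),
        "mean_llm_calls": mean(r.llm_calls for r in runs),
        "prompt_tokens": prompt,
        "output_tokens": output,
        "median_latency_s": median(r.latency_s for r in runs),
        "projected_usd": prompt / 1000 * usd_per_1k_prompt
        + output / 1000 * usd_per_1k_output,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; these are NOT benchmark results.
    demo = [
        RunRecord("A", "lead-001", 4, 1000, 200, 2.0),
        RunRecord("B", "lead-001", 1, 500, 100, 1.0),
    ]
    print(to_jsonl(demo))
    for name in ("A", "B"):
        print(name, summarize(demo, name, usd_per_1k_prompt=1.0, usd_per_1k_output=2.0))
```

Keeping one record per invocation, rather than only totals, lets readers recompute the summary with their own cost-per-token assumptions.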

Why this matters

Done carefully, this benchmark gives adopters a measured, reproducible cost-reduction figure to take to their teams, backed by raw data and stated assumptions.

Out of scope

  • Closed-source models that cannot be fully scripted.
  • Implying NCP is a drop-in replacement for LLMs. The point is to show where deterministic graphs replace orchestration overhead.

Where to read

Labels

  • area:benchmarks: Benchmarks, datasets, and performance work
  • area:examples: Example graphs, bricks, and adopter-facing samples
  • type:research: Research, evaluation, or investigation
