Add an agent-cost-reduction benchmark scenario #37

## Summary

NCP's adoption pitch is that it reduces repeated orchestration cost in agentic systems. A targeted benchmark that compares an LLM-only agent loop against an LLM+NCP loop on the same workflow would make this pitch concrete and defensible.

## Acceptance criteria

- [ ] Pick one repeatable agent workflow. The `lead-qualification` graph (#29) is a natural fit.
- [ ] Run two scenarios on the same input dataset:
  - **A)** The LLM agent performs the full workflow on every invocation through prompt and tool orchestration.
  - **B)** The LLM agent calls a single NCP graph (via `ncp-mcp-server`) that executes the deterministic part of the workflow.
- [ ] Measure at minimum:
  - Number of LLM calls per workflow invocation
  - Total prompt and output tokens
  - End-to-end latency
  - Traceability and debuggability (qualitative: what can you reproduce from the trace alone?)
- [ ] Document assumptions honestly (model version, prompt strategy, dataset, cost-per-token used for projections).
- [ ] Add results under `bench/` or `docs/`. Include raw measurement data, not just summary numbers.
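The measurements above can be captured once per workflow invocation and then aggregated per scenario. The Python sketch below is a hypothetical harness shape, not an existing NCP, `ncp-mcp-server`, or `bench/` API: the `InvocationRecord` fields, the function names, and the per-1k-token price parameters are all assumptions. How LLM call counts and token counts are actually collected (for example, from the model provider's usage fields) is left to the real harness.

```python
"""Minimal benchmark-harness sketch for comparing scenario A (LLM-only) against
scenario B (LLM + NCP graph). All names here are hypothetical; nothing is taken
from NCP, ncp-mcp-server, or BENCHMARK.md."""
import json
import statistics
from dataclasses import asdict, dataclass


@dataclass
class InvocationRecord:
    scenario: str       # "A" = LLM-only orchestration, "B" = LLM + one NCP graph call
    input_id: str       # row id from the shared input dataset
    llm_calls: int      # LLM API calls made during this workflow invocation
    prompt_tokens: int  # summed over all LLM calls in the invocation
    output_tokens: int  # summed over all LLM calls in the invocation
    latency_s: float    # end-to-end wall-clock latency in seconds


def summarize(records, scenario, usd_per_1k_prompt, usd_per_1k_output):
    """Aggregate one scenario's raw records; token prices are an explicit assumption."""
    rows = [r for r in records if r.scenario == scenario]
    if not rows:
        raise ValueError(f"no records for scenario {scenario!r}")
    n = len(rows)
    prompt = sum(r.prompt_tokens for r in rows)
    output = sum(r.output_tokens for r in rows)
    # Projected spend for all rows of this scenario under the assumed prices.
    cost = prompt / 1000 * usd_per_1k_prompt + output / 1000 * usd_per_1k_output
    return {
        "scenario": scenario,
        "invocations": n,
        "mean_llm_calls": statistics.mean(r.llm_calls for r in rows),
        "mean_prompt_tokens": prompt / n,
        "mean_output_tokens": output / n,
        "median_latency_s": statistics.median(r.latency_s for r in rows),
        "projected_cost_per_invocation_usd": cost / n,
    }


def reduction(baseline, candidate, key):
    """Relative reduction of `key` in candidate vs. baseline (0.8 means 80% lower)."""
    return 1 - candidate[key] / baseline[key]


def to_jsonl(records):
    """Serialize raw per-invocation records, one JSON object per line."""
    return "".join(json.dumps(asdict(r)) + "\n" for r in records)
```

Keeping the raw records separate from `summarize` means every summary number can be recomputed from the published raw data, and the token prices stay an explicit input that the documented assumptions can cite. The median is used for latency because it is less sensitive to occasional network outliers than the mean.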

## Why this matters

If done carefully, this benchmark gives adopters a defensible, reproducible cost-reduction number they can take to their teams.

## Out of scope

- Closed-source models that cannot be fully scripted.
- Implying NCP is a drop-in replacement for LLMs. The point is to show where deterministic graphs replace orchestration overhead.

## Where to read

- `BENCHMARK.md` for the existing benchmark methodology to align with
- `bench/datasets/` for the existing dataset format
- The lead-qualification graph issue (#29) for the workflow under test
