
MCP surface scaling: tiny default surface, facets, discovery meta-tools, routine_eval #32

Description

@hartsock

Problem

With ~20 plugin suites, the naive trajectory is 100–150 MCP tools. Every connected agent pays the full schema cost in context (~30–45k tokens before any work), and tool-selection accuracy degrades with menu size. The surface must scale sub-linearly with plugins.

Design (layered)

Tier 0 — steps, not tools (the standing rule)

Plugin suites add step types (config-driven, zero MCP surface) by default. Read paths live in routines/reports (routine_run is the compression). Only mutations earn a tool. Every plugin issue's tool list gets reviewed against this rule.

Tier 1 — generic store dispatch

Collapse per-table CRUD into three kind-keyed tools — store_put {kind, record}, store_query {kind, filter}, store_close {kind, id} — replacing the N×(add/list/done) pattern as kinds multiply. Per-kind record schemas live in the registry (discoverable via Tier 3 / #30), not in tools/list. Existing specific tools (reminder_add…) remain as thin aliases during a deprecation window, then fold in.
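
A minimal sketch of the kind-keyed dispatch, to make the shape concrete. Everything here is an assumption for illustration: the `Store` type, the flat string-map `Record`, the "required fields" stand-in for a per-kind schema, and the choice that `store_query` returns only open rows. The point is only that adding a kind is a registry entry, not three new tools.

```rust
use std::collections::HashMap;

/// Records are flat string maps here to keep the sketch dependency-free.
type Record = HashMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: u64,
    record: Record,
    closed: bool,
}

#[derive(Debug, PartialEq)]
enum StoreError {
    UnknownKind(String),
    MissingField { kind: String, field: String },
    NotFound { kind: String, id: u64 },
}

/// Hypothetical kind-keyed store. Per-kind schemas (reduced to a list of
/// required fields) live in this registry, not in tools/list.
#[derive(Default)]
struct Store {
    required: HashMap<String, Vec<String>>,
    rows: HashMap<String, Vec<Row>>,
    next_id: u64,
}

impl Store {
    fn register_kind(&mut self, kind: &str, required: &[&str]) {
        let fields = required.iter().map(|f| f.to_string()).collect();
        self.required.insert(kind.to_string(), fields);
        self.rows.entry(kind.to_string()).or_default();
    }

    /// store_put {kind, record}
    fn put(&mut self, kind: &str, record: Record) -> Result<u64, StoreError> {
        let required = self
            .required
            .get(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        for field in required {
            if !record.contains_key(field) {
                return Err(StoreError::MissingField {
                    kind: kind.to_string(),
                    field: field.clone(),
                });
            }
        }
        self.next_id += 1;
        let id = self.next_id;
        self.rows
            .entry(kind.to_string())
            .or_default()
            .push(Row { id, record, closed: false });
        Ok(id)
    }

    /// store_query {kind, filter}: open rows matching every filter pair
    /// (excluding closed rows is an assumption of this sketch).
    fn query(&self, kind: &str, filter: &Record) -> Result<Vec<&Row>, StoreError> {
        let rows = self
            .rows
            .get(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        Ok(rows
            .iter()
            .filter(|row| !row.closed)
            .filter(|row| filter.iter().all(|(k, v)| row.record.get(k) == Some(v)))
            .collect())
    }

    /// store_close {kind, id}
    fn close(&mut self, kind: &str, id: u64) -> Result<(), StoreError> {
        let rows = self
            .rows
            .get_mut(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        let row = rows.iter_mut().find(|r| r.id == id).ok_or(StoreError::NotFound {
            kind: kind.to_string(),
            id,
        })?;
        row.closed = true;
        Ok(())
    }
}
```

Under this shape a specific tool such as reminder_add would reduce to a thin alias that calls `put` with a fixed kind, which is what would make the deprecation window cheap.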

Tier 2 — facets (a leash for the tool surface)

Exposure is configured with [mcp] expose = ["core", "review", "journal", …] and can be overridden via $MODULEX_TOOLS (same three-tier sourcing discipline as Caveats; deny-by-default beyond "core"). tools/list reflects only the exposed facets, and notifications/tools/list_changed is emitted if the set changes. modulex-mcp --facet <name> mounts a narrow single-suite server, so a client can attach several small servers instead of one giant one.
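
A rough sketch of how facet resolution and filtering could fit together. The precedence shown (environment override, then config, then a built-in "core" default) is my reading of "three-tier sourcing", not something this issue pins down. The comma-separated format for $MODULEX_TOOLS, the `Provenance` enum, and the tool names are likewise hypothetical.

```rust
use std::collections::BTreeSet;

/// Where the effective facet set came from, e.g. for the startup banner.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provenance {
    Env,
    Config,
    Default,
}

struct Tool {
    name: &'static str,
    facet: &'static str,
}

/// Resolve exposed facets. Assumed precedence: env override, then the
/// config `expose` list, then deny-by-default "core" only.
fn resolve_facets(env: Option<&str>, config: Option<&[&str]>) -> (BTreeSet<String>, Provenance) {
    if let Some(raw) = env {
        // Assumption: the env var holds a comma-separated facet list.
        let set: BTreeSet<String> = raw
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect();
        return (set, Provenance::Env);
    }
    if let Some(list) = config {
        let set: BTreeSet<String> = list.iter().map(|s| s.to_string()).collect();
        return (set, Provenance::Config);
    }
    (BTreeSet::from(["core".to_string()]), Provenance::Default)
}

/// What tools/list would return: only tools whose facet is exposed.
fn list_tools(all: &[Tool], exposed: &BTreeSet<String>) -> Vec<&'static str> {
    all.iter()
        .filter(|t| exposed.contains(t.facet))
        .map(|t| t.name)
        .collect()
}

/// A list_changed notification is only warranted when the set differs.
fn should_notify(old: &BTreeSet<String>, new: &BTreeSet<String>) -> bool {
    old != new
}
```

Returning the provenance alongside the set keeps the banner honest about which source won, which matters most when an env override silently widens the surface.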

Tier 3 — discovery meta-tools (constant-size long tail)

Three always-present tools keep the default surface at ~8 entries forever:

  • tool_search {query} → matching tool names + one-liners
  • tool_describe {name} → full input schema on demand
  • tool_invoke {name, args} → validated dispatch to any registered (and facet-permitted) tool

Schema loading is lazy and server-side, so every MCP client benefits, not just harnesses with native deferred-tool support. Builds on #30 (Tool-surface self-documentation: generate agent skills from the live tool registry): its registry self-documentation and its declared mutates flag.
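
One way the discovery trio could sit on top of a registry, sketched under assumptions: the `Registry` and `ToolEntry` types, string-typed args and schemas, and the demo tools are all hypothetical. Argument validation against the schema is omitted. Whether tool_search should also hide tools outside the exposed facets is left open here; this sketch only enforces facets at invoke time.

```rust
use std::collections::BTreeMap;

type Handler = fn(&str) -> Result<String, String>;

struct ToolEntry {
    summary: &'static str,
    /// Full input schema; only handed out by tool_describe.
    schema: &'static str,
    facet: &'static str,
    handler: Handler,
}

struct Registry {
    tools: BTreeMap<&'static str, ToolEntry>,
    exposed: Vec<&'static str>,
}

impl Registry {
    /// tool_search {query}: names plus one-liners, never full schemas.
    fn search(&self, query: &str) -> Vec<(&'static str, &'static str)> {
        let q = query.to_lowercase();
        self.tools
            .iter()
            .filter(|(name, e)| name.contains(&q) || e.summary.to_lowercase().contains(&q))
            .map(|(name, e)| (*name, e.summary))
            .collect()
    }

    /// tool_describe {name}: full schema on demand.
    fn describe(&self, name: &str) -> Option<&'static str> {
        self.tools.get(name).map(|e| e.schema)
    }

    /// tool_invoke {name, args}: facet check first, then dispatch.
    fn invoke(&self, name: &str, args: &str) -> Result<String, String> {
        let entry = self
            .tools
            .get(name)
            .ok_or_else(|| format!("unknown tool: {name}"))?;
        if !self.exposed.contains(&entry.facet) {
            return Err(format!("tool {name} is outside the exposed facets"));
        }
        (entry.handler)(args)
    }
}

/// Demo handler: echoes its arguments back.
fn echo(args: &str) -> Result<String, String> {
    Ok(format!("echo: {args}"))
}

/// Demo registry with one exposed tool and one tool in an unexposed facet.
fn demo_registry() -> Registry {
    let mut tools = BTreeMap::new();
    tools.insert(
        "echo",
        ToolEntry { summary: "Echo the arguments back", schema: "{\"type\":\"string\"}", facet: "core", handler: echo },
    );
    tools.insert(
        "journal_add",
        ToolEntry { summary: "Append a journal entry", schema: "{\"type\":\"object\"}", facet: "journal", handler: echo },
    );
    Registry { tools, exposed: vec!["core"] }
}
```

With `demo_registry()`, a search for "journal" finds journal_add and its one-liner, but invoking it fails until the journal facet is exposed. That is the intended split: discovery is cheap, dispatch is leashed.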

Tier 4 — routine_eval (composition escape hatch)

One tool accepting an inline routine definition (steps array; identical semantics to config routines; same leash, same soft-failure report). Agents compose ad-hoc multi-step queries instead of the project ever needing 40 one-shot query tools. Dry-run honored; the resulting report is stored with a generation like any run.
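
A sketch of what "same leash, same soft-failure report" might look like in code. The semantics are assumed: a step outside the leash is recorded as failed without aborting the run, a dry run records each permitted step as planned instead of executing it, and every run (dry or not) bumps a generation counter. The `Engine` type and the step kinds are hypothetical.

```rust
#[derive(Debug, Clone, PartialEq)]
enum StepOutcome {
    Ok(String),
    /// Dry run: the step was permitted but not executed.
    Planned,
    /// Soft failure: recorded in the report, the run continues.
    Failed(String),
}

struct Step {
    kind: String,
    arg: String,
}

impl Step {
    fn new(kind: &str, arg: &str) -> Self {
        Step { kind: kind.to_string(), arg: arg.to_string() }
    }
}

struct Report {
    generation: u64,
    dry_run: bool,
    outcomes: Vec<StepOutcome>,
}

struct Engine {
    /// The leash, reduced here to a list of permitted step kinds.
    allowed_kinds: Vec<&'static str>,
    generation: u64,
    reports: Vec<Report>,
}

impl Engine {
    fn new(allowed_kinds: Vec<&'static str>) -> Self {
        Engine { allowed_kinds, generation: 0, reports: Vec::new() }
    }

    /// routine_eval: run an inline steps array through the same path a
    /// config routine would take, and return the stored report's generation.
    fn routine_eval(&mut self, steps: &[Step], dry_run: bool) -> u64 {
        let outcomes: Vec<StepOutcome> = steps
            .iter()
            .map(|s| {
                if !self.allowed_kinds.iter().any(|k| *k == s.kind) {
                    StepOutcome::Failed(format!("step kind '{}' is outside the leash", s.kind))
                } else if dry_run {
                    StepOutcome::Planned
                } else {
                    // Stand-in for real step execution.
                    StepOutcome::Ok(format!("{}({})", s.kind, s.arg))
                }
            })
            .collect();
        self.generation += 1;
        let generation = self.generation;
        self.reports.push(Report { generation, dry_run, outcomes });
        generation
    }

    /// Look up a stored report by generation.
    fn report(&self, generation: u64) -> Option<&Report> {
        self.reports.iter().find(|r| r.generation == generation)
    }
}
```

Keying reports by generation rather than a timestamp matches the "generation counters not wall-clock" rule stated under Development discipline.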

Budget

  • Default tools/list: ≤ 12 entries regardless of plugin count (run/list/step/report/steps + store_put/query/close + tool_search/describe/invoke + routine_eval).
  • CI check: a test pins the default-facet tool count so surface growth is a deliberate, reviewed change.
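
The CI pin could be as small as the sketch below. `default_tool_names` is a hypothetical stand-in for whatever the server returns from tools/list under the default facets. The short names (run, list, step, report, steps) are copied from the budget line as written and are not meant as the real tool identifiers.

```rust
use std::collections::BTreeSet;

/// Budget from this issue: the default surface stays at or under 12 entries.
const DEFAULT_SURFACE_BUDGET: usize = 12;

/// Hypothetical stand-in for the default-facet tools/list result.
fn default_tool_names() -> Vec<&'static str> {
    vec![
        "run", "list", "step", "report", "steps",
        "store_put", "store_query", "store_close",
        "tool_search", "tool_describe", "tool_invoke",
        "routine_eval",
    ]
}

/// Fails on duplicate names or when the surface exceeds the budget.
fn check_surface(names: &[&str], budget: usize) -> Result<(), String> {
    let unique: BTreeSet<&str> = names.iter().copied().collect();
    if unique.len() != names.len() {
        return Err("duplicate tool names in the default surface".to_string());
    }
    if names.len() > budget {
        return Err(format!(
            "default surface has {} tools, budget is {}",
            names.len(),
            budget
        ));
    }
    Ok(())
}

/// The pin itself: adding a default tool beyond the budget fails CI until
/// someone deliberately raises the constant in a reviewed change.
#[test]
fn default_surface_stays_within_budget() {
    assert_eq!(check_surface(&default_tool_names(), DEFAULT_SURFACE_BUDGET), Ok(()));
}
```

A stricter variant would assert the exact count rather than a ceiling, so removals get reviewed too. Which of the two the project wants is a judgment call this sketch does not settle.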

Acceptance

  • Facet config + env override with banner (provenance, like the leash)
  • Discovery trio implemented; tool_invoke enforces facet + (future) mutates-policy
  • routine_eval with inline steps, leashed identically to config routines
  • Store dispatch trio; per-kind schemas via tool_describe
  • Default-surface size test in CI

Development discipline

Rust-first, jujutsu-style (no panics in lib code, RFC-1574 doc comments, lower-level tests over e2e). Follow Foundation contributor rules + the rust-tdd skill: TDD, regression tests, 80% coverage floor, README per crate, exec/net leashed, generation counters not wall-clock, credentials as references only. Plugin model per #10.

Metadata

Labels: plugin (New plugin / step-type family)
