Problem
With ~20 plugin suites the naive trajectory is 100–150 MCP tools. Every connected agent pays the full schema cost in context (~30–45k tokens before any work) and tool-selection accuracy degrades with menu size. The surface must scale sub-linearly with plugins.
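The figures above can be checked with simple arithmetic. This is a minimal sketch, assuming an average of about 300 tokens per tool schema; that average is only what 30–45k tokens over 100–150 tools implies, not a measurement. The function names are hypothetical.

```rust
/// Rough context cost, in tokens, of advertising `tools` schemas in tools/list.
/// `tokens_per_tool` is an assumed average, not a measured value.
pub fn schema_cost(tools: u32, tokens_per_tool: u32) -> u32 {
    tools.saturating_mul(tokens_per_tool)
}

/// Naive surface size if every plugin suite exposes `tools_per_suite` tools.
pub fn naive_surface(suites: u32, tools_per_suite: u32) -> u32 {
    suites.saturating_mul(tools_per_suite)
}
```

With 20 suites at 5 tools each, naive_surface gives 100 tools and schema_cost(100, 300) gives 30,000 tokens. Under the same assumption a fixed 12-entry surface costs 3,600 tokens regardless of plugin count.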
Design (layered)
Tier 0 — steps, not tools (the standing rule)
Plugin suites add step types (config-driven, zero MCP surface) by default. Read paths live in routines/reports (routine_run is the compression). Only mutations earn a tool. Every plugin issue's tool list gets reviewed against this rule.
Tier 1 — generic store dispatch
Collapse per-table CRUD into three kind-keyed tools — store_put {kind, record}, store_query {kind, filter}, store_close {kind, id} — replacing the N×(add/list/done) pattern as kinds multiply. Per-kind record schemas live in the registry (discoverable via Tier 3 / #30), not in tools/list. Existing specific tools (reminder_add…) remain as thin aliases during a deprecation window, then fold in.
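A minimal in-memory sketch of the trio, to make the kind-keyed shape concrete. Everything beyond the three tool names and reminder_add is an assumption: the Store type, the flat string-map record, the "reminder" kind, the "text" field, and the choice that store_query hides closed records.

```rust
use std::collections::BTreeMap;

/// A record is a flat string map here; real per-kind schemas would live in the registry.
pub type Record = BTreeMap<String, String>;

#[derive(Debug, PartialEq)]
pub enum StoreError {
    UnknownKind(String),
    UnknownId { kind: String, id: u64 },
}

struct Row {
    id: u64,
    record: Record,
    open: bool,
}

/// In-memory stand-in for the store: one table per registered kind.
#[derive(Default)]
pub struct Store {
    tables: BTreeMap<String, Vec<Row>>,
    next_id: u64,
}

impl Store {
    /// Registers a kind; the trio rejects kinds that were never registered.
    pub fn register_kind(&mut self, kind: &str) {
        self.tables.entry(kind.to_string()).or_default();
    }

    /// store_put {kind, record}: insert a record and return its id.
    pub fn store_put(&mut self, kind: &str, record: Record) -> Result<u64, StoreError> {
        let table = self
            .tables
            .get_mut(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        self.next_id += 1;
        table.push(Row { id: self.next_id, record, open: true });
        Ok(self.next_id)
    }

    /// store_query {kind, filter}: open records whose fields match every filter entry.
    pub fn store_query(&self, kind: &str, filter: &Record) -> Result<Vec<(u64, Record)>, StoreError> {
        let table = self
            .tables
            .get(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        Ok(table
            .iter()
            .filter(|row| row.open)
            .filter(|row| filter.iter().all(|(k, v)| row.record.get(k) == Some(v)))
            .map(|row| (row.id, row.record.clone()))
            .collect())
    }

    /// store_close {kind, id}: mark a record as done.
    pub fn store_close(&mut self, kind: &str, id: u64) -> Result<(), StoreError> {
        let table = self
            .tables
            .get_mut(kind)
            .ok_or_else(|| StoreError::UnknownKind(kind.to_string()))?;
        let row = table
            .iter_mut()
            .find(|row| row.id == id)
            .ok_or_else(|| StoreError::UnknownId { kind: kind.to_string(), id })?;
        row.open = false;
        Ok(())
    }
}

/// Thin alias for the deprecation window: forwards to store_put with a fixed kind.
pub fn reminder_add(store: &mut Store, text: &str) -> Result<u64, StoreError> {
    let mut record = Record::new();
    record.insert("text".to_string(), text.to_string());
    store.store_put("reminder", record)
}
```

Adding a new kind in this sketch means one register_kind call and no new tool, which is the property the tier is after.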
Tier 2 — facets (a leash for the tool surface)
[mcp] expose = ["core", "review", "journal", …] with $MODULEX_TOOLS override (same three-tier sourcing discipline as Caveats; deny-by-default beyond "core"). tools/list reflects only exposed facets; notifications/tools/list_changed emitted if the set changes. modulex-mcp --facet <name> mounts a narrow single-suite server, so a client can attach several small servers instead of one giant one.
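A sketch of how facet resolution and filtering might look. The precedence (environment override, then config, then "core" only), the comma-separated format of the override, and the ToolEntry shape are all assumptions; the issue only says the sourcing follows the same discipline as Caveats. The tool names in the usage note are hypothetical.

```rust
use std::collections::BTreeSet;

/// A registered tool and the facet it belongs to (hypothetical shape).
pub struct ToolEntry {
    pub name: &'static str,
    pub facet: &'static str,
}

/// Resolves the exposed facets.
/// Assumed precedence: env override, then config `expose`, then nothing extra.
/// "core" is always included, so everything beyond it is deny-by-default.
pub fn resolve_facets(env_override: Option<&str>, config_expose: Option<&[&str]>) -> BTreeSet<String> {
    let mut facets: BTreeSet<String> = match (env_override, config_expose) {
        (Some(raw), _) => raw
            .split(',')
            .map(str::trim)
            .filter(|s| !s.is_empty())
            .map(String::from)
            .collect(),
        (None, Some(list)) => list.iter().map(|s| s.to_string()).collect(),
        (None, None) => BTreeSet::new(),
    };
    facets.insert("core".to_string());
    facets
}

/// tools/list reflects only tools whose facet is exposed.
pub fn visible_tools<'a>(registry: &'a [ToolEntry], facets: &BTreeSet<String>) -> Vec<&'a str> {
    registry
        .iter()
        .filter(|tool| facets.contains(tool.facet))
        .map(|tool| tool.name)
        .collect()
}

/// True when the visible set changed and a list_changed notification is due.
pub fn list_changed(before: &[&str], after: &[&str]) -> bool {
    before != after
}
```

For a registry holding routine_run (core), review_note (review) and journal_entry (journal), resolve_facets(None, None) exposes only routine_run, and an override of "journal" exposes routine_run plus journal_entry while ignoring the config list.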
Tier 3 — discovery meta-tools (constant-size long tail)
Three always-present tools keep the default surface ~8 entries forever:
tool_search {query} → matching tool names + one-liners
tool_describe {name} → full input schema on demand
tool_invoke {name, args} → validated dispatch to any registered (and facet-permitted) tool
Schema loading is lazy and server-side, so EVERY MCP client benefits, not just harnesses with native deferred-tool support. Builds on #30 (Tool-surface self-documentation: generate agent skills from the live tool registry), using its registry self-documentation and its declared mutates flag.
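The three meta-tools could be sketched over a small registry as follows. The ToolSpec fields, the error variants, the use of a required-argument list in place of a full input schema, and the choice to hide non-exposed facets from search are assumptions for illustration; the mutates flag is carried but not enforced, since the mutates-policy is described as future work.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Registry entry; hypothetical fields standing in for the live tool registry.
pub struct ToolSpec {
    pub name: &'static str,
    pub summary: &'static str,
    pub facet: &'static str,
    /// Declared by the registry; not enforced in this sketch.
    pub mutates: bool,
    /// Required argument names, a stand-in for a full input schema.
    pub required_args: &'static [&'static str],
}

#[derive(Debug, PartialEq)]
pub enum InvokeError {
    UnknownTool,
    FacetDenied,
    MissingArg(&'static str),
}

pub struct Registry {
    pub tools: Vec<ToolSpec>,
    pub exposed: BTreeSet<String>,
}

impl Registry {
    fn find(&self, name: &str) -> Option<&ToolSpec> {
        self.tools.iter().find(|tool| tool.name == name)
    }

    /// tool_search {query}: names plus one-liners, never full schemas.
    pub fn tool_search(&self, query: &str) -> Vec<(&'static str, &'static str)> {
        let needle = query.to_lowercase();
        self.tools
            .iter()
            .filter(|tool| self.exposed.contains(tool.facet))
            .filter(|tool| {
                tool.name.contains(&needle) || tool.summary.to_lowercase().contains(&needle)
            })
            .map(|tool| (tool.name, tool.summary))
            .collect()
    }

    /// tool_describe {name}: the schema is returned only when asked for.
    pub fn tool_describe(&self, name: &str) -> Option<&'static [&'static str]> {
        self.find(name).map(|tool| tool.required_args)
    }

    /// tool_invoke {name, args}: facet check, then validation, then dispatch.
    pub fn tool_invoke(
        &self,
        name: &str,
        args: &BTreeMap<String, String>,
    ) -> Result<String, InvokeError> {
        let tool = self.find(name).ok_or(InvokeError::UnknownTool)?;
        if !self.exposed.contains(tool.facet) {
            return Err(InvokeError::FacetDenied);
        }
        for &arg in tool.required_args {
            if !args.contains_key(arg) {
                return Err(InvokeError::MissingArg(arg));
            }
        }
        // Dispatch is stubbed; a real server would call the tool's handler here.
        Ok(format!("dispatched {}", tool.name))
    }
}
```

Only these three entries need to appear in tools/list; every other registered tool costs context only when an agent searches for it or asks for its schema.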
Tier 4 — routine_eval (composition escape hatch)
One tool accepting an inline routine definition (steps array; identical semantics to config routines; same leash, same soft-failure report). Agents compose ad-hoc multi-step queries instead of the project ever needing 40 one-shot query tools. Dry-run honored; the resulting report is stored with a generation like any run.
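A rough sketch of routine_eval under stated assumptions: steps are reduced to a kind plus one argument, the leash is a set of permitted step kinds, execution is stubbed, and dry-run reports are stored like real ones. The type and field names are hypothetical.

```rust
use std::collections::BTreeSet;

/// One inline step; `kind` names a step type, `arg` is its single parameter (simplified).
#[derive(Debug, Clone, PartialEq)]
pub struct Step {
    pub kind: String,
    pub arg: String,
}

/// Per-step result. A denied step is recorded instead of aborting the run (soft failure).
#[derive(Debug, Clone, PartialEq)]
pub enum StepOutcome {
    Ran(String),
    WouldRun,
    Denied,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Report {
    pub generation: u64,
    pub dry_run: bool,
    pub outcomes: Vec<StepOutcome>,
}

/// Evaluator state: the leash, a generation counter, and the stored reports.
pub struct Evaluator {
    pub leash: BTreeSet<String>,
    pub generation: u64,
    pub stored: Vec<Report>,
}

impl Evaluator {
    /// Evaluates an inline routine under the same leash a config routine would get.
    pub fn routine_eval(&mut self, steps: &[Step], dry_run: bool) -> Report {
        let outcomes: Vec<StepOutcome> = steps
            .iter()
            .map(|step| {
                if !self.leash.contains(&step.kind) {
                    StepOutcome::Denied
                } else if dry_run {
                    StepOutcome::WouldRun
                } else {
                    // Stub: a real step would run here and return its output.
                    StepOutcome::Ran(format!("{}({})", step.kind, step.arg))
                }
            })
            .collect();
        // Generation counter, not wall-clock time, identifies the run.
        self.generation += 1;
        let report = Report { generation: self.generation, dry_run, outcomes };
        self.stored.push(report.clone());
        report
    }
}
```

A step outside the leash shows up as Denied in the report while the remaining steps still run, so an agent can see exactly which part of an ad-hoc query was refused.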
Budget
tools/list: ≤ 12 entries regardless of plugin count (run/list/step/report/steps + store_put/query/close + tool_search/describe/invoke + routine_eval).
Acceptance
tool_invoke enforces facet + (future) mutates-policy
routine_eval with inline steps, leashed identically to config routines
Store dispatch trio; per-kind schemas via tool_describe
Default-surface size test in CI
Development discipline
Rust-first, jujutsu-style (no panics in lib code, RFC-1574 doc comments, lower-level tests over e2e). Follow Foundation contributor rules + the rust-tdd skill: TDD, regression tests, 80% coverage floor, README per crate, exec/net leashed, generation counters not wall-clock, credentials as references only. Plugin model per #10.
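In keeping with the test-first rules above, a CI guard for the Budget (tools/list ≤ 12 entries) might look like the sketch below. The function and constant names are hypothetical, and the first five entries are the abbreviations exactly as written in the Budget line, not confirmed tool names.

```rust
/// Budget from the design: tools/list stays at or below this many entries.
pub const SURFACE_BUDGET: usize = 12;

/// Default surface as abbreviated in the Budget line; full names are not confirmed.
pub const DEFAULT_SURFACE: [&str; 12] = [
    "run", "list", "step", "report", "steps",
    "store_put", "store_query", "store_close",
    "tool_search", "tool_describe", "tool_invoke",
    "routine_eval",
];

/// Returns the surface size, or an error message when it exceeds the budget.
/// No panics: the caller (a test or a CI binary) decides how to fail.
pub fn check_default_surface(tool_names: &[&str], budget: usize) -> Result<usize, String> {
    if tool_names.len() > budget {
        Err(format!(
            "default surface has {} tools, budget is {}",
            tool_names.len(),
            budget
        ))
    } else {
        Ok(tool_names.len())
    }
}

#[cfg(test)]
mod surface_budget_tests {
    use super::*;

    #[test]
    fn default_surface_within_budget() {
        assert!(check_default_surface(&DEFAULT_SURFACE, SURFACE_BUDGET).is_ok());
    }
}
```

In a real crate the list would presumably come from the live registry with only default facets exposed, so that adding a plugin that widens the default surface fails the build.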