Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 15 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,21 @@ It is a telescope, not the sky: the point is *starting the day with context*,
identically for every agent and human that asks. Don't grow product features
into the engine — grow step types and plugins.

## The foundation pass (read docs/FOUNDATION.md first)

This project's reason to exist is anti-bloat **by construction**: the data
contract (#26), progressive disclosure (#32), and the plugin crate model
(#10). Until that pass is complete, no plugin-backlog work (#11–#31) starts.
Standing law from it:

- **Steps before tools** — new capability is a step type unless it mutates;
tool additions justify themselves in the PR.
- **The tool budget is a CI-pinned constant** (default `tools/list` ≤ 12).
- **`data` schemas are versioned contracts** — breaking one is a breaking
release with a regression test.
- **Every PR states its disclosure tier** (step / store kind / facet tool /
discovered tool).

## Hard rules

1. **All subprocess spawns go through `ExecGate::spawn`**
Expand Down
19 changes: 18 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,16 @@ structured report. The same routine runs identically from a terminal, from
Claude Code, from newt-agent, or from any other MCP client — deterministically,
with no credentials in config and every subprocess gated by a capability leash.

**Why it exists:** every MCP server bloats by default — tool schemas flood
agent context, agents end up string-matching prose, and features pile into a
monolith. Modulex is built so that *cannot happen*: a versioned **data
contract** (agents never parse prose), **progressive disclosure** (the
default tool surface is ≤ 12 entries forever, CI-pinned; everything else is
discovered at the moment of need), and a **declared-authority plugin model**
(capability arrives as feature-gated step types, not core patches). Agent
cost stays constant while capability grows — **by construction, not by
vigilance**. See [docs/FOUNDATION.md](docs/FOUNDATION.md).

The flagship routine is the **good-morning dashboard**: tend your repos, count
your deadlines, scan your boards, list the reviews waiting on you — one
invocation, one report.
Expand All @@ -22,7 +32,14 @@ invocation, one report.

## Status

Early development. See `modulex.toml.example` for the configuration surface.
Early development — the [foundation pass](docs/FOUNDATION.md) (data contract
[#26], progressive disclosure [#32], plugin crate model [#10]) lands before
the plugin backlog (#11–#31). See `modulex.toml.example` for the
configuration surface.

[#26]: https://github.com/hartsock/modulex-mcp/issues/26
[#32]: https://github.com/hartsock/modulex-mcp/issues/32
[#10]: https://github.com/hartsock/modulex-mcp/issues/10

## Quick start

Expand Down
223 changes: 223 additions & 0 deletions docs/FOUNDATION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,223 @@
# The Foundation Pass

**Why this project exists, and what must be built first.**

Tracking issues: [#26] (data contract) · [#32] (progressive disclosure) ·
[#10] (plugin crate model). Everything else waits for these three.

[#26]: https://github.com/hartsock/modulex-mcp/issues/26
[#32]: https://github.com/hartsock/modulex-mcp/issues/32
[#10]: https://github.com/hartsock/modulex-mcp/issues/10

---

## 1. The problem: MCP servers bloat by default

Every MCP server faces the same gravity. Capabilities arrive as tools; tools
arrive as schemas; and **every connected agent pays for every schema in its
context window on every session, whether it uses the tool or not.** A server
with 120 tools costs an agent ~35–45k tokens before any work happens — and
tool-selection accuracy *drops* as the menu grows. The result, across the
ecosystem, is a familiar failure shape:

- **Surface bloat** — `tools/list` grows linearly with features.
- **Prose coupling** — agents parse human-formatted output with string
matching, so every cosmetic change breaks somebody's automation.
- **Monolith creep** — features land in the core because there is no
disciplined seam for them to land anywhere else, so every user carries
every dependency.

Plugin ecosystems make all three worse: more contributors, more tools, more
formats, faster.

modulex exists to demonstrate the alternative: **a plugin-extensible MCP
server whose cost to agents stays constant while its capability grows** —
not by discipline and review vigilance, but **by construction**. That
property is the product. The morning-dashboard routine is the demo.

## 2. The three pillars

Bloat has three faces (context cost, format coupling, dependency creep).
Each pillar eliminates one *structurally*:

### Pillar A — The data contract ([#26])

> Reports serve humans AND agents. The markdown is for the human; every step
> also emits a stable, versioned, typed `data` payload. **An agent never
> parses prose.**

- Every step type documents its `data` schema in rustdoc and exposes it
machine-readably.
- Schemas are versioned with the crate; a breaking shape change is a
breaking release with a regression test that proves the old shape fails
loudly.
- `format: "data"` on `routine_run` / `step_run` / `report_get` returns only
the structured payloads — the agent-native view of a report.

This kills prose coupling: cosmetic report changes can never break an agent,
and schema changes are visible, versioned events.

### Pillar B — Progressive disclosure ([#32])

> An agent always sees a **small, stable index** of what is possible;
> detail — schemas, knowledge, data — is disclosed **at the moment of
> need**, never preloaded.

Four mechanisms, one principle:

1. **Steps, not tools.** Capability lands as *step types* (config-driven,
zero MCP surface). Read paths live in routines; only **mutations** earn a
tool.
2. **Generic store dispatch.** `store_put` / `store_query` / `store_close`
(kind-keyed) instead of N×CRUD tools; per-kind record schemas are
disclosed on demand, not listed.
3. **Facets.** `[mcp] expose = [...]` (with an env override, same three-tier
sourcing as the exec leash) gates which tool groups a connection sees —
deny-by-default beyond `core`. One binary can also mount as a narrow
single-facet server.
4. **Discovery meta-tools.** `tool_search` → `tool_describe` →
`tool_invoke`: the long tail is searchable and callable without ever
appearing in `tools/list`. Plus `routine_eval` — one tool that accepts an
inline routine definition, so arbitrary composition costs one schema.

**The budget: the default `tools/list` is ≤ 12 entries, forever, pinned by a
CI test.** Growing it is a deliberate, reviewed change to a named constant —
not a side effect of adding a feature.

This kills surface bloat: context cost is proportional to what an agent
*uses*, not to what the server *has*.

### Pillar C — The plugin crate model ([#10])

> Plugins are crates with a narrow, declared seam — not patches to the core.

- A plugin is a `modulex-plugin-<name>` crate exporting
`register(registry)`: its step types, its (facet-scoped) tool specs, its
store migrations.
- **Feature-gated linkage**: each plugin's dependencies stay out of every
build that doesn't ask for it (`--no-default-features` → lean engine;
`full` → everything). CI builds the matrix.
- **Declared authority**: plugins declare the programs they spawn and the
hosts they reach; the declarations feed the default deny-all-except
exec/net grants. A plugin cannot quietly widen the leash.
- **Owned store slices**: per-plugin schema versions in the store's `meta`
table; plugins migrate their own tables and never touch core's.

This kills monolith creep: the core stays a small engine; capability is
opt-in at compile time, disclosed at runtime, and leashed at execution time.

## 3. How the pillars interlock

A new capability, end to end, under all three pillars:

```
capability = step type (B1: zero tool surface)
+ data schema (A: versioned, machine-readable)
+ [mutation tool] (B: facet-scoped, discovered not listed)
+ plugin crate (C: feature-gated deps, declared leash)
```

The costs that normally grow linearly become flat or opt-in:

| Cost | Naive MCP server | modulex |
|---|---|---|
| Agent context (tools/list) | O(features) | **O(1)** — ≤12 entries, CI-pinned |
| Agent context (per task) | all schemas | only the 1–2 schemas it pulls |
| Format breakage risk | every report tweak | **zero** — data contract versioned |
| Binary size / dep graph | every user carries all | feature-gated, opt-in |
| Leash surface | grows silently | declared per plugin, deny-by-default |

That table is the pitch. Nothing in it depends on reviewer vigilance; every
row is enforced by a test, a type, or a build flag.

## 4. Implementation plan (the foundation pass)

Ordered, small PRs. Nothing from the plugin backlog starts until F5 is done.

### F1 — Data contract core (#26)

- `StepData` documentation convention + per-step-type JSON schema, exposed
via an extended `steps_list` (name, description, data schema).
- Populate `data` for every existing builtin (git steps → typed per-repo
status enums; deadline/countdown → numeric days; reminders already typed).
- `format: "data"` on the three report-bearing tools.
- Tests: every builtin has a schema; every schema validates against the
step's real output in the existing test suite; a schema-stability
regression harness (golden schemas, breaking change = failing test).

### F2 — Tool registry refactor (prerequisite for B and C)

- Replace the match-based tool dispatch with a `ToolSpec` registry: name,
description, input schema, **`mutates: bool`** (declared, not guessed),
facet, handler.
- Builtins re-register through it; behavior identical (existing server tests
must pass unchanged — that's the regression net).

### F3 — Progressive disclosure (#32)

- Facets: `[mcp] expose` + env override, provenance banner (leash-style);
`tools/list` filtered by facet; `notifications/tools/list_changed`.
- Discovery trio: `tool_search`, `tool_describe`, `tool_invoke` (facet- and
mutates-aware dispatch).
- Store dispatch trio: `store_put`/`store_query`/`store_close`; existing
specific tools become aliases inside a non-default `store-classic` facet.
- **The budget test**: default-facet tool count pinned in CI.

### F4 — `routine_eval`

- Inline routine definitions (steps array, identical semantics to config),
same leash, same soft-failure report, generation-stamped like any run.
- Declared-default grant interaction: inline steps may only use programs
already in the grant — `routine_eval` discloses capability, never widens
authority.

### F5 — Reference plugin (#10 proven end to end)

- Extract/build ONE plugin under the model (`modulex-plugin-health` is the
candidate: no network, few deps, obviously useful) with its own README,
tests, feature flag, facet, data schemas, declared programs.
- Write `docs/PLUGIN_AUTHORING.md` from the experience — the contract every
backlog issue (#11–#31) then follows.
- CI: feature-matrix builds (`--no-default-features`, default, `full`).

### Definition of done for the pass

- [ ] An agent runs the morning routine and acts on **every** section
without one string-match against markdown (`format: "data"` only).
- [ ] `tools/list` ≤ 12 with all foundation features merged — and a CI test
fails if it grows.
- [ ] `tool_describe`/`tool_invoke` reach every registered tool, including
the reference plugin's, with facets enforced.
- [ ] `cargo build --no-default-features` yields a lean engine without the
reference plugin's deps; `full` includes it.
- [ ] `docs/PLUGIN_AUTHORING.md` exists and the reference plugin's PR was
written against it.

## 5. The anti-bloat rules (standing law)

These outlive the pass; PR review enforces them; most are CI-enforced:

1. **Steps before tools.** New capability is a step type unless it mutates
something on the user's behalf. Tool additions justify themselves in the
PR description.
2. **The budget is a constant.** The default tool surface is a named,
CI-pinned number. Changing it is its own commit with its own reasoning.
3. **Schemas are contracts.** `data` shapes version with the crate;
breaking one without the regression-harness dance is a blocked PR.
4. **Plugins declare their authority.** Programs and hosts in the manifest
seam, surfaced by `doctor`, folded into deny-by-default grants.
5. **Disclosure tier stated in every PR.** Each addition names where it
lands: step / store kind / facet tool / discovered tool. "Default
surface" is an exceptional answer that needs an exceptional reason.
6. All existing hard rules hold (leashed exec/net, unserializable secrets,
generation counters not wall-clock, soft failures, TDD + regression
tests, 80% coverage floor, README per crate).

## 6. What this gives the community

A worked, tested answer to the question every MCP server hits at scale:
*how do you grow capability without growing the bill every agent pays?*
The pattern — *data contract + progressive disclosure + declared-authority
plugins* — is general. modulex is its reference implementation: small
enough to read, disciplined enough to copy, and demonstrated end-to-end on
a routine real people run every morning.
Loading