F1.5: live-contract test tier — verified beliefs about real tools (#36) #37

Merged
hartsock merged 3 commits into main from f1.5/live-contract-tests
Jun 6, 2026
@hartsock hartsock commented Jun 6, 2026

Owner

Summary

Stacked on #35 — retarget to main after it merges (and #34 before it — the usual order).

Tier 3 of the testing strategy: MODULEX_LIVE_TESTS=1-gated tests that run REAL tools through the real TokioSpawner + leash and assert only the output shapes our mock fixtures encode:

| Live test | Belief verified |
|---|---|
| live_git_status_and_unpushed_states | clean/dirty/no-upstream state mapping, via the real handlers against a test-created repo |
| live_gh_pr_list_json_shape | gh --json emits number/title/author.login (skips if absent/unauthed) |
| live_glab_presence | glab presence/version (auth-marker belief documented as host-only) |
| live_harness_contract_end_to_end | JSON-on-stdout harness contract against a real process |
| live_plugin_protocol_with_real_python | modulex-plugin/1 stdin/stdout round-trip with real python3 |

  • Opt-in by construction: without the env var every test exits with a notice, so `just check` and PR CI remain host-independent (the probe-seam lesson, kept).
  • FIXTURE-SYNC rule: mock fixtures mimicking real CLIs now carry comments citing their live test; CLAUDE.md codifies it.
  • `just live-test` recipe + .github/workflows/live-contract.yml (manual dispatch + weekly cron, GH_TOKEN-authed) so drift surfaces on a cadence.
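
The opt-in gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in tests/live_contract.rs: the helper names (`live_tests_enabled`, `live_enabled_from`, `skip_notice`) are hypothetical, and treating only the exact value `1` as "enabled" is an assumption.

```rust
use std::env;

/// Hypothetical helper: reads the opt-in variable from the environment.
fn live_tests_enabled() -> bool {
    live_enabled_from(env::var("MODULEX_LIVE_TESTS").ok().as_deref())
}

/// Pure decision logic, split out so it can be checked without touching
/// the process environment. Assumption: only the exact value "1" opts in.
fn live_enabled_from(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Returns a skip notice when the tier is disabled, or None when the
/// caller should go on to contact real tools.
fn skip_notice(test_name: &str, enabled: bool) -> Option<String> {
    if enabled {
        None
    } else {
        Some(format!("{test_name}: skipped (set MODULEX_LIVE_TESTS=1 to run)"))
    }
}

#[test]
fn live_example() {
    if let Some(notice) = skip_notice("live_example", live_tests_enabled()) {
        eprintln!("{notice}");
        return; // reported as passed, with zero tool contact
    }
    // ... a real tool invocation would go here ...
}
```

With this shape a default run still reports every live test as passed, which matches the "5 passed (all skip-notices)" behaviour in the test plan.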

Test plan

  • Skip path: default run → 5 passed (all skip-notices, zero tool contact)
  • Live on a real host: 5/5 green — gh shape verified over live PRs from a public repo, real git states match our enums exactly, real sh/python3 round-trips
  • just check green: 126 tests total, clippy -D warnings all-features, fmt clean

Fixes #36

hartsock and others added 3 commits June 5, 2026 22:09
WHAT: StepHandler grows description() and data_schema() (required —
the compiler enforces the contract on every handler, including plugin
seams). All 17 builtins now emit typed StepResult.data: git steps carry
per-repo state enums (clean/dirty, all-pushed/unpushed/no-upstream,
ok/diverged/fetch-failed/...), deadline/countdown are refactored to
compute-then-render with numeric days/work-day payloads, github-pr-scan
parses PRs into typed records, gitlab steps carry per-target state +
raw passthrough, mr-sla/board/chores/reminders/url-watch all typed;
script reports exit_code; harness/python/Python-registered handlers are
documented passthroughs. StepRegistry::specs() + Engine::step_specs()
expose (type, description, schema); MCP steps_list returns them;
routine_run/step_run/report_get gain format="data"
(Report::to_data_json — payloads only, no prose); CLI gains --data.
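
A rough sketch of the "required methods" idea, under stated assumptions: the trait below is a trimmed stand-in rather than the real StepHandler, the `step_type` method and the "git-status" type name are hypothetical, and the schema is kept as plain text to avoid pulling in a JSON crate.

```rust
/// Hypothetical stand-in for the real trait; actual signatures may differ.
trait StepHandler {
    fn step_type(&self) -> &'static str;
    /// No default body, so a handler that omits it does not compile.
    fn description(&self) -> &'static str;
    /// JSON Schema for the handler's typed data, kept as text here.
    fn data_schema(&self) -> &'static str;
}

struct GitStatus;

impl StepHandler for GitStatus {
    fn step_type(&self) -> &'static str {
        "git-status" // illustrative name only
    }
    fn description(&self) -> &'static str {
        "Reports clean/dirty state per repository"
    }
    fn data_schema(&self) -> &'static str {
        r#"{"type":"object","properties":{"state":{"enum":["clean","dirty"]}}}"#
    }
}

/// Sketch of a registry listing (type, description, schema) triples.
fn specs(handlers: &[Box<dyn StepHandler>]) -> Vec<(&'static str, &'static str, &'static str)> {
    handlers
        .iter()
        .map(|h| (h.step_type(), h.description(), h.data_schema()))
        .collect()
}
```

Because neither method has a default body, forgetting one is a compile error rather than a runtime surprise, which is the enforcement the commit message attributes to the compiler.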

Enforcement: tests/data_contract.rs pins every schema to a checked-in
golden (tests/golden/<type>.json; UPDATE_GOLDEN_SCHEMAS=1 regenerates —
a schema change is a visible reviewed diff and a breaking release) and
drives 15 builtins through mock spawner + in-memory store, validating
every executed step's data against its schema (jsonschema, dev-dep
only — runtime never validates).
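
The golden-pinning step could look something like the sketch below. It assumes a byte-for-byte text comparison and hypothetical helper names (`check_golden`, `update_requested`); the real tests/data_contract.rs may compare parsed JSON instead.

```rust
use std::{env, fs, path::Path};

/// Hypothetical helper: true when regeneration was requested.
fn update_requested() -> bool {
    matches!(env::var("UPDATE_GOLDEN_SCHEMAS").as_deref(), Ok("1"))
}

/// Compares `schema` with the checked-in golden file. When `update` is
/// true the golden is rewritten instead, so the change shows up as a diff.
fn check_golden(golden: &Path, schema: &str, update: bool) -> Result<(), String> {
    if update {
        return fs::write(golden, schema)
            .map_err(|e| format!("cannot write {}: {e}", golden.display()));
    }
    let pinned = fs::read_to_string(golden)
        .map_err(|e| format!("cannot read {}: {e}", golden.display()))?;
    if pinned == schema {
        Ok(())
    } else {
        Err(format!(
            "schema drifted from {}; rerun with UPDATE_GOLDEN_SCHEMAS=1 and review the diff",
            golden.display()
        ))
    }
}
```

A test would call `check_golden(path, handler_schema, update_requested())` once per step type and fail on the first `Err`.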

WHY: FOUNDATION pillar A (#26): agents never parse prose. Verified
live: `modulex run demo --data` returns pure typed payloads.

Disclosure tier: step-data contract (no new tools; steps_list enriched
in place).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

WHAT: Spawner gains program_available() (default = the real PATH probe;
TokioSpawner inherits it). ExecGate::program_available delegates to the
spawner, and the engine's soft-skip probe now asks the gate instead of
the host PATH. MockSpawner answers for its scripted world — everything
available unless named via the new .missing([..]) builder — and the
engine's missing-tool test uses an explicit missing set instead of a
hopefully-absent binary name.

WHY: CI failure on the data-contract test: the glab-backed steps soft-
skipped on the runner (no glab installed) but ran on the dev box — the
probe was an environment dependency outside the seam, violating the
'mocked engines are host-independent' rule. Regression coverage: the
contract test itself (fails on any host missing a CLI under the old
code) plus the reworked missing_tool_soft_skips_the_step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
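
The probe seam from this commit can be illustrated with a trimmed-down sketch. The trait and struct bodies here are assumptions for illustration (the real Spawner has more methods, and the real PATH probe may differ); only the names Spawner, TokioSpawner, MockSpawner, program_available and .missing([..]) come from the commit message.

```rust
use std::collections::HashSet;
use std::env;

/// Trimmed-down, hypothetical version of the seam.
trait Spawner {
    /// Default: probe the host PATH. Real spawners inherit this.
    fn program_available(&self, program: &str) -> bool {
        env::var_os("PATH")
            .map(|paths| env::split_paths(&paths).any(|dir| dir.join(program).is_file()))
            .unwrap_or(false)
    }
}

/// Stand-in for the real spawner; it keeps the default host probe.
struct TokioSpawner;
impl Spawner for TokioSpawner {}

/// Scripted world: everything is available unless named as missing.
#[derive(Default)]
struct MockSpawner {
    missing: HashSet<String>,
}

impl MockSpawner {
    /// Builder that marks the given program names as absent.
    fn missing<I, S>(mut self, names: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.missing.extend(names.into_iter().map(Into::into));
        self
    }
}

impl Spawner for MockSpawner {
    /// Never consults the host, so results are identical on every machine.
    fn program_available(&self, program: &str) -> bool {
        !self.missing.contains(program)
    }
}
```

For example, `MockSpawner::default().missing(["glab"])` reports glab as absent and every other program as present, regardless of what the CI runner has installed.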

WHAT: tests/live_contract.rs — tier 3, opt-in via MODULEX_LIVE_TESTS=1
(every test exits with a notice without it; default check/CI stay
host-independent). Five live verifications through the REAL TokioSpawner
+ leash: (1) git clean/dirty/no-upstream state mapping driven through
the real GitStatus/GitUnpushed handlers against a test-created temp
repo; (2) gh pr list --json field shape (number/title/author.login)
against a stable public repo, skipping when gh is absent or
unauthenticated; (3) glab presence/version probe; (4) the harness
JSON-on-stdout contract end-to-end with a real shell script; (5) the
modulex-plugin/1 protocol with a real python3 plugin (stdin request,
stdout response, typed mapping). Absent tools skip with a visible
notice — correct in this tier, whose job is host-dependence.
FIXTURE-SYNC citations added at the mock-fixture sites (data_contract,
github tests); CLAUDE.md testing rule extended; just live-test recipe;
.github/workflows/live-contract.yml (workflow_dispatch + weekly cron,
GH_TOKEN-authed) so fixture drift surfaces on a cadence.
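
As a loose illustration of verifications (4) and (5), the sketch below drives a real child process over stdin/stdout using only the standard library. It is a stand-in under heavy assumptions: it uses `sh` and `cat` rather than the real harness or a python3 plugin, bypasses TokioSpawner and the leash, and the modulex-plugin/1 message format is not shown in this PR, so the payload is an arbitrary JSON object. The helper names are hypothetical.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Runs `sh -c script`, feeds `request` on stdin, returns trimmed stdout.
/// Note: the script is expected to read its stdin (as `cat` does).
fn round_trip(script: &str, request: &str) -> std::io::Result<String> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the request; the temporary handle is dropped at the end of
    // this statement, so the child sees EOF on stdin.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(request.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}

/// Deliberately loose shape check: stdout is a single JSON-looking object.
fn looks_like_json_object(s: &str) -> bool {
    s.starts_with('{') && s.ends_with('}')
}
```

A live test built this way asserts only the output shape, which is the point of the tier: confirming that what the mocks pretend a process emits is what a real process emits.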

WHY: #36 — mocks encode beliefs about external tools; this tier makes
them verified beliefs. Ran live on this host: 5/5 green (gh shape
verified over real PRs).

Disclosure tier: tests only — zero surface change.

Fixes #36

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hartsock hartsock added the risk:low Scoped, tested, no CI/build changes label Jun 6, 2026
@hartsock hartsock changed the base branch from f1/data-contract to main June 6, 2026 10:01
hartsock commented Jun 6, 2026

Owner Author

Retargeted to main — this PR now carries BOTH F1 and F1.5.

#35 was merged into its stale base branch (integrate/stranded-merges) after that branch had already landed, so F1 never reached main — the third stacked-merge incident. This branch contains the full F1 + F1.5 history, so merging this single PR onto main lands the complete data contract + live-test tier.

Process change to stop this recurring: from now on every PR in this repo is based on main directly, even when work is logically stacked — a child PR's diff temporarily shows its parent's commits, but a UI merge then does the right thing in any order. No more retarget-before-merge choreography.

@hartsock hartsock merged commit e6bac69 into main Jun 6, 2026
1 check passed