F1.5: live-contract test tier — verified beliefs about real tools (#36)#37
Merged
WHAT: StepHandler grows description() and data_schema(). Both are required: the compiler enforces the contract on every handler, including plugin seams. All 17 builtins now emit typed StepResult.data:

- git steps carry per-repo state enums (clean/dirty, all-pushed/unpushed/no-upstream, ok/diverged/fetch-failed/...)
- deadline/countdown are refactored to compute-then-render with numeric days/work-day payloads
- github-pr-scan parses PRs into typed records
- gitlab steps carry per-target state + raw passthrough
- mr-sla/board/chores/reminders/url-watch are all typed
- script reports exit_code
- harness/python/Python-registered handlers are documented passthroughs

StepRegistry::specs() + Engine::step_specs() expose (type, description, schema); MCP steps_list returns them; routine_run/step_run/report_get gain format="data" (Report::to_data_json: payloads only, no prose); CLI gains --data.

Enforcement: tests/data_contract.rs pins every schema to a checked-in golden (tests/golden/<type>.json; UPDATE_GOLDEN_SCHEMAS=1 regenerates, so a schema change is a visible reviewed diff and a breaking release) and drives 15 builtins through mock spawner + in-memory store, validating every executed step's data against its schema (jsonschema, dev-dep only; runtime never validates).

WHY: FOUNDATION pillar A (#26): agents never parse prose. Verified live: `modulex run demo --data` returns pure typed payloads.

Disclosure tier: step-data contract (no new tools; steps_list enriched in place).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
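The golden-pinning rule in this commit message can be sketched with the standard library alone. This is a minimal illustration, not the code in tests/data_contract.rs: `GoldenOutcome`, `check_golden`, and `update_requested` are hypothetical names, and the schema is treated as an opaque string rather than a parsed JSON document.

```rust
use std::env;
use std::fs;
use std::path::Path;

/// Result of comparing a handler's current schema with its checked-in golden.
#[derive(Debug, PartialEq)]
enum GoldenOutcome {
    /// The golden file matches the schema the handler emits today.
    Matches,
    /// The golden file was rewritten because regeneration was requested.
    Regenerated,
}

/// True when the caller opted into regeneration via UPDATE_GOLDEN_SCHEMAS=1.
fn update_requested() -> bool {
    env::var("UPDATE_GOLDEN_SCHEMAS").map(|v| v == "1").unwrap_or(false)
}

/// Pin `current` (the schema text) to the golden file at `golden`.
/// With `update` set, the golden is overwritten; otherwise any difference
/// (ignoring trailing whitespace) is reported as an error.
fn check_golden(golden: &Path, current: &str, update: bool) -> Result<GoldenOutcome, String> {
    if update {
        fs::write(golden, current)
            .map_err(|e| format!("cannot write {}: {}", golden.display(), e))?;
        return Ok(GoldenOutcome::Regenerated);
    }
    let pinned = fs::read_to_string(golden)
        .map_err(|e| format!("cannot read {}: {}", golden.display(), e))?;
    if pinned.trim_end() == current.trim_end() {
        Ok(GoldenOutcome::Matches)
    } else {
        Err(format!(
            "schema drift against {}; rerun with UPDATE_GOLDEN_SCHEMAS=1 and review the diff",
            golden.display()
        ))
    }
}
```

A test would call `check_golden(path, &schema_text, update_requested())` once per step type, so an unreviewed schema change fails the suite instead of silently shipping.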
WHAT: Spawner gains program_available() (default = the real PATH probe; TokioSpawner inherits it). ExecGate::program_available delegates to the spawner, and the engine's soft-skip probe now asks the gate instead of the host PATH. MockSpawner answers for its scripted world (everything is available unless named via the new .missing([..]) builder), and the engine's missing-tool test uses an explicit missing set instead of a hopefully-absent binary name.

WHY: CI failure on the data-contract test: the glab-backed steps soft-skipped on the runner (no glab installed) but ran on the dev box. The probe was an environment dependency outside the seam, violating the 'mocked engines are host-independent' rule.

Regression coverage: the contract test itself (fails on any host missing a CLI under the old code) plus the reworked missing_tool_soft_skips_the_step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
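The probe seam described above might look roughly like the following. It is a simplified sketch under stated assumptions: the real Spawner trait is async and spawns processes, `RealSpawner` stands in for TokioSpawner, `should_soft_skip` is a hypothetical helper, and the default probe only checks that a file with the program's name exists on PATH (no executable-bit or Windows extension handling).

```rust
use std::collections::HashSet;
use std::env;

/// Simplified stand-in for the Spawner seam (the real trait does more).
trait Spawner {
    /// Default answer: look for a file named `program` in any PATH directory.
    fn program_available(&self, program: &str) -> bool {
        env::var_os("PATH")
            .map(|paths| env::split_paths(&paths).any(|dir| dir.join(program).is_file()))
            .unwrap_or(false)
    }
}

/// Stand-in for the real spawner: it inherits the PATH probe unchanged.
struct RealSpawner;
impl Spawner for RealSpawner {}

/// Mock spawner: every program is available unless named via `.missing([..])`.
#[derive(Default)]
struct MockSpawner {
    missing: HashSet<String>,
}

impl MockSpawner {
    fn new() -> Self {
        Self::default()
    }

    /// Builder that declares which programs the scripted world lacks.
    fn missing<I, S>(mut self, programs: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        self.missing.extend(programs.into_iter().map(Into::into));
        self
    }
}

impl Spawner for MockSpawner {
    /// Answers from the scripted set only; the host PATH is never consulted.
    fn program_available(&self, program: &str) -> bool {
        !self.missing.contains(program)
    }
}

/// The gate delegates availability questions to whatever spawner it wraps.
struct ExecGate<S: Spawner> {
    spawner: S,
}

impl<S: Spawner> ExecGate<S> {
    fn program_available(&self, program: &str) -> bool {
        self.spawner.program_available(program)
    }
}

/// Engine-side decision: soft-skip a step when its required tool is absent.
fn should_soft_skip<S: Spawner>(gate: &ExecGate<S>, required: &str) -> bool {
    !gate.program_available(required)
}
```

Because the mock overrides the probe, a test built on `MockSpawner::new().missing(["glab"])` gives the same answer on a CI runner and a dev box, which is the host-independence property the commit restores.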
WHAT: tests/live_contract.rs: tier 3, opt-in via MODULEX_LIVE_TESTS=1 (every test exits with a notice without it; default check/CI stay host-independent). Five live verifications through the REAL TokioSpawner + leash:

(1) git clean/dirty/no-upstream state mapping driven through the real GitStatus/GitUnpushed handlers against a test-created temp repo;
(2) gh pr list --json field shape (number/title/author.login) against a stable public repo, skipping when gh is absent or unauthenticated;
(3) glab presence/version probe;
(4) the harness JSON-on-stdout contract end-to-end with a real shell script;
(5) the modulex-plugin/1 protocol with a real python3 plugin (stdin request, stdout response, typed mapping).

Absent tools skip with a visible notice, which is correct in this tier, whose job is host-dependence. FIXTURE-SYNC citations added at the mock-fixture sites (data_contract, github tests); CLAUDE.md testing rule extended; just live-test recipe; .github/workflows/live-contract.yml (workflow_dispatch + weekly cron, GH_TOKEN-authed) so fixture drift surfaces on a cadence.

WHY (#36): mocks encode beliefs about external tools; this tier makes them verified beliefs. Ran live on this host: 5/5 green (gh shape verified over real PRs).

Disclosure tier: tests only, zero surface change.

Fixes #36

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
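The opt-in gate for this tier can be sketched as below. The helper names (`live_tests_enabled_from`, `live_tests_enabled`, `run_live`) are hypothetical and are not taken from tests/live_contract.rs; only the MODULEX_LIVE_TESTS=1 convention and the skip-with-notice behaviour come from the commit message.

```rust
use std::env;

/// Pure decision: the tier runs only when the variable is exactly "1".
fn live_tests_enabled_from(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads MODULEX_LIVE_TESTS from the process environment.
fn live_tests_enabled() -> bool {
    live_tests_enabled_from(env::var("MODULEX_LIVE_TESTS").ok().as_deref())
}

/// Runs `body` when enabled; otherwise prints a visible notice and returns
/// without touching any real tool. The return value says whether it ran.
fn run_live(name: &str, enabled: bool, body: impl FnOnce()) -> bool {
    if !enabled {
        eprintln!(
            "SKIP {}: set MODULEX_LIVE_TESTS=1 to run live-contract tests",
            name
        );
        return false;
    }
    body();
    true
}
```

Each live test would then wrap its body in `run_live("live_glab_presence", live_tests_enabled(), || { ... })`, so the default check and PR CI pass without the external tools installed.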
Retargeted to #35 was merged into its stale base branch ( Process change to stop this recurring: from now on every PR in this repo is based on |
Summary
Stacked on #35 — retarget to main after it merges (and #34 before it — the usual order).
Tier 3 of the testing strategy:
`MODULEX_LIVE_TESTS=1`-gated tests that run REAL tools through the real `TokioSpawner` + leash and assert only the output shapes our mock fixtures encode:

- `live_git_status_and_unpushed_states`
- `live_gh_pr_list_json_shape`: `gh --json` emits `number`/`title`/`author.login` (skips if absent/unauthed)
- `live_glab_presence`
- `live_harness_contract_end_to_end`
- `live_plugin_protocol_with_real_python`: `modulex-plugin/1` stdin/stdout round-trip with real python3

`just check` and PR CI remain host-independent (the probe-seam lesson, kept). `just live-test` recipe + `.github/workflows/live-contract.yml` (manual dispatch + weekly cron, `GH_TOKEN`-authed) so drift surfaces on a cadence.

Test plan
- `just check` green: 126 tests total, clippy `-D warnings` all-features, fmt clean

Fixes #36