spec(runner): sync spec with code drift, add OpenShell desired state#1643
spec(runner): sync spec with code drift, add OpenShell desired state#1643markturansky wants to merge 1 commit into
Conversation
✅ Deploy Preview for cheerful-kitten-f556a0 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
markturansky
left a comment
There was a problem hiding this comment.
amber review — Round 1
Three findings from cross-checking the spec changes against the current codebase on main.
Required — Must Fix
Broken companion doc references in OpenShell section
The OpenShell Desired State section references two documents that do not exist anywhere in the repo:
See: docs/internal/agents/openshell-security-analysis.md
See: docs/internal/agents/openshell-runner-adaptation.md
Checked docs/internal/agents/ on main and on this branch — only README.md and an active/ subdirectory are present. Neither file exists.
Fix: remove the See: links, or replace them with a forward reference (TBD / planned) until the docs are created.
Notable Gap
operational_events.py missing from spec layout
This PR's stated goal is to sync the spec with code drift. operational_events.py was introduced in PR #1633 at bridges/claude/operational_events.py and is present in the repo, but it is absent from the File Layout table in this spec update.
The bridges/claude/ section now lists fixtures/ and other additions — operational_events.py should be included for completeness.
Architectural Note (non-blocking)
_grpc_client.py in OpenShell "Files to Modify" may be a no-op
The OpenShell desired-state section lists _grpc_client.py as a file to modify for the sandbox adaptation. gRPC communication already happens in the runner process (outside the Claude subprocess boundary), so sandboxing the Claude subprocess with Landlock/seccomp-BPF should not require changes to the gRPC client. Worth confirming whether this file actually needs modification before implementation begins, to avoid scope creep in the adaptation work.
Blocking on the companion doc fix before approve.
fc5c3e2 to
90ccbcd
Compare
markturansky
left a comment
There was a problem hiding this comment.
amber review — Round 2
All three Round 1 findings addressed. Codebase verification passed: populate_runtime_credentials() in platform/auth.py ✅, secret_redaction.py ✅, _grpc_client.py ✅, operational_events.py ✅, SessionWorker in bridges/claude/session.py ✅.
One notable gap in the companion doc to fix before implementation starts — the spec and companion doc disagree on what gets sandboxed.
Notable Gap — Sandboxing scope inconsistent between spec and companion doc
runner.spec.md (the spec) says the Claude subprocess is sandboxed:
Desired: Runner container → OpenShell Supervisor → Claude subprocess (sandboxed netns)
Files to Modify: bridges/claude/bridge.py — "Launch Claude CLI via OpenShell Supervisor instead of direct subprocess." gRPC: "Verify no changes needed — gRPC runs in runner process outside sandbox boundary."
openshell-runner-adaptation.md (companion doc) describes wrapping the entire uvicorn process:
CMD ["openshell-sandbox", "--provider", "ambient", "--", \
"/bin/bash", "-c", "umask 0022 && cd /app/ambient-runner && uvicorn main:app ..."]The Supervisor wraps the uvicorn process, applying Landlock + seccomp + netns before exec.
In this model, the runner itself is inside the sandbox, and the companion doc's Files to Modify table accordingly lists _grpc_client.py — "Proxy-aware channel construction for intra-cluster gRPC."
These are different architectures. If an implementer follows the companion doc's Dockerfile, they'll sandbox the runner (not just Claude subprocess), and gRPC will need proxy-awareness — directly contradicting the spec. If they follow the spec, the Claude subprocess is sandboxed via bridge.py, and gRPC stays outside the sandbox.
Fix: align the companion doc to match the spec's model. Update the Dockerfile section to show openshell-sandbox wrapping the Claude CLI invocation (from bridge.py), not the uvicorn entrypoint. Update _grpc_client.py entry in companion doc's Files to Modify to "No changes needed — gRPC runs in runner process, outside Claude subprocess sandbox boundary."
Spec itself is correct and internally consistent — approving the spec. Companion doc should be fixed before this moves out of draft.
Round 2: LGTM on the spec. Companion doc fix is pre-implementation, not pre-merge.
90ccbcd to
f13261a
Compare
markturansky
left a comment
There was a problem hiding this comment.
amber review — Round 3 LGTM
Round 2 finding addressed. The companion doc (openshell-runner-adaptation.md) now correctly describes sandboxing only the Claude subprocess:
- Dockerfile keeps uvicorn entrypoint unchanged;
openshell-sandboxis invoked bybridge.pywhen launching the Claude CLI, not wrapped around the runner itself - Explicit prose added: "The runner itself (FastAPI/uvicorn, gRPC client, credential fetching) runs outside the sandbox boundary"
_grpc_client.pyrow in Files to Modify updated to "No changes needed — gRPC runs in runner process, outside Claude subprocess sandbox boundary"- Step 5 is now a standalone "No changes needed" section with a clear rationale
Spec and companion docs are now consistent. No further findings.
Approved. Ready to undraft when the author is satisfied with the OpenShell desired-state section.
jsell-rh
left a comment
There was a problem hiding this comment.
Nice analysis — the security model breakdown and strategy comparison are thorough. A few things we learned integrating OpenShell in the ADP project that are worth flagging:
1. inference.local doesn't work outside OpenShell-managed sandboxes. The placeholder/proxy pattern (openshell:resolve:env:*) relies on the Supervisor owning the network namespace — it creates a veth pair, routes all traffic through 10.200.0.1:3128, and rewrites placeholders at the HTTP layer. If the runner pod keeps its default network namespace (no unshare(CLONE_NEWNET)), the proxy won't intercept requests and placeholders will be sent as-is to upstream APIs. Strategy 1's implementation steps should account for the Supervisor needing NET_ADMIN capability to set up the netns before the agent process starts.
2. OpenShell's provider management API is gRPC-only, not REST. The Operator integration (Step 7) will need a gRPC client to call CreateProvider, SetClusterInference, etc. on the Gateway. There are no REST equivalents — the Gateway multiplexes gRPC and HTTP on port 8080, but provider/inference management is exclusively via the openshell.v1.OpenShell gRPC service. The proto definitions are at proto/openshell.proto and proto/inference.proto in the upstream repo.
4. The Supervisor needs to control process lifecycle, not just run as a sidecar. The Supervisor expects to be the process that fork/execs the agent — it applies Landlock, seccomp, and privilege drop in the pre_exec closure (after fork, before exec). Running it as a separate sidecar container in the same pod doesn't give it control over the agent's process setup. For standard K8s pods, the Supervisor binary needs to be the entrypoint that wraps the Claude CLI invocation (as the PR's Step 3 suggests with the Dockerfile change), not a parallel container. The bridge.py modification (Step 6) is the right place to invoke it.
5. The gh wrapper and git credential helper pattern works at the HTTP layer only. The proxy can rewrite Authorization: Bearer openshell:resolve:env:GITHUB_TOKEN in HTTP requests, but it can't intercept credential usage in non-HTTP contexts. The git credential helper works because git ultimately makes HTTPS requests that pass through the proxy. But if any tool reads the placeholder token and uses it for SSH auth or embeds it in a non-HTTP protocol, the placeholder string will be sent verbatim. Worth documenting this limitation — the current install_gh_wrapper() and install_git_credential_helper() patterns should work fine since they're HTTP-based, but future credential types (SSH keys, kubeconfig client certs) would need a different approach.
- Fix source layout: add model.py, observability files, fixtures/, remove duplicate workspace.py - Document AGUI_TOKEN session auth middleware and SDK_OPTIONS env var - Document runtime model switching via POST /model - Add 'Desired State: OpenShell Credential Isolation' section with migration path 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
f13261a to
597cc4c
Compare
Summary
model.py,observability_config.py,observability_privacy.py,mlflow_observability.py,fixtures/), removes duplicateworkspace.pyentryAGUI_TOKENsession auth middleware,SDK_OPTIONSenv var,POST /modelendpointTest plan
🤖 Generated with Claude Code