docs: add comprehensive architecture document#250
docs: add comprehensive architecture document#250hzxuzhonghu wants to merge 3 commits intovolcano-sh:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive architecture document for AgentCube, detailing its split-plane design, core components, and sandbox lifecycle. The review identifies several inconsistencies between the new documentation and the existing codebase, specifically regarding supported HTTP methods, default workspace paths, and binary initialization logic. Additionally, it notes missing implementation for documented configuration variables and recommends enforcing request body limits in PicoD to align with the architectural specifications.
| GET /health → Health check (no auth required) | ||
| ``` | ||
|
|
||
| **Security**: JWT verification using RSA public key injected via `PICOD_AUTH_PUBLIC_KEY` env var. Max body: 32 MB. Path traversal protection via `sanitizePath()`. |
There was a problem hiding this comment.
| | `picod` | `cmd/picod/main.go` | `make build-picod` | In-sandbox daemon: execute + files | | ||
| | `agentd` | `cmd/agentd/main.go` | `make build-agentd` | Standalone session expiry cleanup | | ||
|
|
||
| All binaries use `controller-runtime` for Kubernetes integration and `signal.NotifyContext()` for graceful shutdown. |
There was a problem hiding this comment.
The statement that all binaries use controller-runtime and signal.NotifyContext() is inaccurate. For instance, cmd/router does not use controller-runtime, cmd/workload-manager uses signal.Notify instead of signal.NotifyContext, and cmd/picod does not implement explicit signal handling in its main.go.
| All binaries use `controller-runtime` for Kubernetes integration and `signal.NotifyContext()` for graceful shutdown. | |
| Binaries use controller-runtime or signal handling (e.g., signal.NotifyContext) for Kubernetes integration and graceful shutdown. |
| | `WORKLOAD_MANAGER_ADDR` | (required) | Workload Manager service address | | ||
| | `AGENTCUBE_NAMESPACE` | `default` | Namespace for operations | |
| | Env Var | Default | Description | | ||
| |---------|---------|-------------| | ||
| | `PORT` | `8080` | Listen port | | ||
| | `WORKSPACE` | `/workspace` | Working directory for file operations | |
There was a problem hiding this comment.
The documentation states the default workspace is /workspace, but the implementation in pkg/picod/server.go (lines 58-65) defaults to the current working directory if the workspace is not specified.
| | `WORKSPACE` | `/workspace` | Working directory for file operations | | |
| | WORKSPACE | (current directory) | Working directory for file operations | |
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #250 +/- ##
==========================================
+ Coverage 35.60% 43.46% +7.85%
==========================================
Files 29 30 +1
Lines 2533 2616 +83
==========================================
+ Hits 902 1137 +235
+ Misses 1505 1358 -147
+ Partials 126 121 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Replaces the old, high-level architecture overview with a much more comprehensive system design document for AgentCube, covering the split-plane architecture, components, CRDs, key flows, and operational behavior.
Changes:
- Removed
overview.mdand introduced a new long-formarchitecture.md. - Added detailed descriptions of Router, Workload Manager, PicoD, agentd, session store, CRD relationships, and key lifecycle flows.
- Added Mermaid diagrams (sequence + state machine) and configuration/build/dependency reference sections.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| docs/agentcube/docs/architecture/architecture.md | New comprehensive architecture doc; several sections need alignment with current code (routes/methods, config flags vs env vars, build targets, deps, interfaces). |
| docs/agentcube/docs/architecture/overview.md | Removed legacy overview content (superseded by architecture.md). |
| **API Routes**: | ||
| ``` | ||
| POST /v1/namespaces/{ns}/agent-runtimes/{name}/invocations/* | ||
| POST /v1/namespaces/{ns}/code-interpreters/{name}/invocations/* | ||
| ``` | ||
|
|
||
| **Error Responses**: `400` invalid session | `429` concurrency limit | `502` sandbox unreachable | `504` timeout | ||
|
|
There was a problem hiding this comment.
Router invocation endpoints support both GET and POST (GET is used for file download); the doc currently lists only POST. Also, an invalid/nonexistent session ID is returned as a 404 (NotFound) via api.NewSessionNotFoundError, not a 400. Please update the route list and error-response codes to match the router implementation (see pkg/router/server.go and pkg/api/errors.go).
| PicoD verifies JWT signature using public key from env | ||
| ``` | ||
|
|
||
| **JWT Claims**: `session_id`, `sandbox_id`, `exp` (expiration), `iat` (issued at) |
There was a problem hiding this comment.
JWT claims list mentions sandbox_id, but Router currently only adds session_id as a custom claim when signing requests (plus standard exp, iat, iss). Either document only the claims that are actually present, or update the signer/verifier to include and validate sandbox_id if it's required for the design.
| **JWT Claims**: `session_id`, `sandbox_id`, `exp` (expiration), `iat` (issued at) | |
| **JWT Claims**: `session_id`, `exp` (expiration), `iat` (issued at), `iss` (issuer) |
| |--------|-------------|--------------|------| | ||
| | `workload-manager` | `cmd/workload-manager/main.go` | `make build` | Control plane: API server + reconcilers + GC | | ||
| | `router` | `cmd/router/main.go` | `make build-router` | Data plane: session routing + JWT + reverse proxy | | ||
| | `picod` | `cmd/picod/main.go` | `make build-picod` | In-sandbox daemon: execute + files | |
There was a problem hiding this comment.
make build-picod is referenced as the PicoD build target, but the repo Makefile does not define a build-picod target (only docker-build-picod). Please update the table to the correct build target/command so readers can reproduce builds reliably.
| | `picod` | `cmd/picod/main.go` | `make build-picod` | In-sandbox daemon: execute + files | | |
| | `picod` | `cmd/picod/main.go` | `make docker-build-picod` | In-sandbox daemon: execute + files | |
There was a problem hiding this comment.
I would like to remove the build, that does not belong to arch
| | `picod` | `cmd/picod/main.go` | `make build-picod` | In-sandbox daemon: execute + files | | ||
| | `agentd` | `cmd/agentd/main.go` | `make build-agentd` | Standalone session expiry cleanup | | ||
|
|
||
| All binaries use `controller-runtime` for Kubernetes integration and `signal.NotifyContext()` for graceful shutdown. |
There was a problem hiding this comment.
Statement that all binaries use controller-runtime and signal.NotifyContext() is inaccurate: Router uses signal.NotifyContext, Workload Manager uses a manual signal channel, agentd uses ctrl.SetupSignalHandler(), and PicoD does not use controller-runtime. Please adjust this line to reflect the actual shutdown/integration patterns per binary.
| All binaries use `controller-runtime` for Kubernetes integration and `signal.NotifyContext()` for graceful shutdown. | |
| Router uses `signal.NotifyContext()` for graceful shutdown, Workload Manager integrates with Kubernetes via `controller-runtime` but uses a manual signal channel, `agentd` uses `controller-runtime`'s `ctrl.SetupSignalHandler()`, and `picod` is a standalone binary without `controller-runtime` integration. |
| cmd/router → pkg/router → pkg/store, pkg/common/types, pkg/api | ||
| cmd/workload-manager → pkg/workloadmanager → pkg/store, pkg/common/types, pkg/api, pkg/apis/runtime/v1alpha1 | ||
| cmd/picod → pkg/picod → (standalone, no internal deps) | ||
| cmd/agentd → pkg/agentd → pkg/apis/runtime/v1alpha1 |
There was a problem hiding this comment.
Package dependency map shows cmd/agentd → pkg/agentd → pkg/apis/runtime/v1alpha1, but agentd actually imports agent-sandbox types and pkg/workloadmanager (for the annotation key), not the runtime v1alpha1 API. Please update this mapping to match the current imports to avoid misleading maintainers.
| cmd/agentd → pkg/agentd → pkg/apis/runtime/v1alpha1 | |
| cmd/agentd → pkg/agentd → agent-sandbox types, pkg/workloadmanager |
| ## Configuration Reference | ||
|
|
||
| ### Router | ||
| | Env Var | Default | Description | | ||
| |---------|---------|-------------| | ||
| | `PORT` | `8080` | Listen port | | ||
| | `WORKLOAD_MANAGER_ADDR` | (required) | Workload Manager service address | | ||
| | `AGENTCUBE_NAMESPACE` | `default` | Namespace for operations | | ||
| | `MAX_CONCURRENT_REQUESTS` | `1000` | Concurrency limit | | ||
| | `ENABLE_TLS` | `false` | Enable TLS termination | | ||
|
|
||
| ### Workload Manager | ||
| | Env Var | Default | Description | | ||
| |---------|---------|-------------| | ||
| | `PORT` | `8080` | Listen port | | ||
| | `RUNTIME_CLASS_NAME` | `kuasar-vmm` | Default RuntimeClassName for pods | | ||
| | `ENABLE_TLS` | `false` | Enable TLS | | ||
| | `ENABLE_AUTH` | `false` | Enable K8s auth forwarding | | ||
|
|
||
| ### PicoD | ||
| | Env Var | Default | Description | | ||
| |---------|---------|-------------| | ||
| | `PORT` | `8080` | Listen port | | ||
| | `WORKSPACE` | `/workspace` | Working directory for file operations | | ||
| | `PICOD_AUTH_PUBLIC_KEY` | (injected) | PEM-encoded RSA public key for JWT verification | | ||
|
|
There was a problem hiding this comment.
Configuration is documented as environment variables for Router/Workload Manager/PicoD, but in the code these are primarily CLI flags (e.g., router --port, --enable-tls, --max-concurrent-requests; workload-manager --runtime-class-name, --enable-auth; picod --workspace). Also WORKSPACE env var is not used by PicoD. Please revise this section to distinguish real env vars (e.g., WORKLOAD_MANAGER_ADDR, AGENTCUBE_NAMESPACE, STORE_TYPE, REDIS_ADDR, etc.) from flags.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
337ed40 to
7e38a66
Compare
| PicoD verifies JWT signature using public key from env | ||
| ``` | ||
|
|
||
| **JWT Claims**: `session_id`, `sandbox_id`, `exp` (expiration), `iat` (issued at) |
|
|
||
| **Binary**: `cmd/picod` | **Package**: `pkg/picod` | ||
|
|
||
| A lightweight HTTP daemon running inside every sandbox pod. It exposes APIs for command execution and file management. |
There was a problem hiding this comment.
Only in CodeInterpreter with AuthModePicoD
There was a problem hiding this comment.
you mean i should mention the picod auth mode? I am thinking the auth may change
There was a problem hiding this comment.
Yes. If the auth will change, ignore it~
Signed-off-by: Zhonghu Xu <[email protected]>
- Implement 32 MB request body limit in picod via MaxBytesReader middleware - Fix inaccurate signal handling claim: router uses signal.NotifyContext, agentd uses ctrl.SetupSignalHandler, workload-manager uses signal.Notify, picod has no signal handling; only agentd and workload-manager use controller-runtime - Remove non-existent AGENTCUBE_NAMESPACE from router config table; add actual flags (tls-cert, tls-key, debug); clarify WORKLOAD_MANAGER_ADDR is env var - Fix PicoD WORKSPACE default from /workspace to current directory Signed-off-by: Zhonghu Xu <[email protected]>
Signed-off-by: Zhonghu Xu <[email protected]>
7e38a66 to
d2f5a98
Compare
Summary
This PR replaces the existing
overview.mdwith a comprehensivearchitecture.mdthat documents the full AgentCube system design based on a thorough review of the design proposals, codebase, and the official architecture diagram.What's included
Coverage
Diagrams (Mermaid)
Files changed
docs/agentcube/docs/architecture/architecture.md— new comprehensive architecture document (391 lines)docs/agentcube/docs/architecture/overview.md— removed (content superseded)