Add browser agent example with session reuse #255
hzxuzhonghu merged 2 commits into volcano-sh:main
Pull request overview
Adds a new end-to-end “browser agent” example that orchestrates a Playwright MCP tool runtime through the AgentCube Router with session reuse, plus accompanying Kubernetes/Docker assets and refreshed architecture documentation.
Changes:
- Introduces `example/browser-agent/` (FastAPI service + Dockerfile + manifests) to call a Playwright MCP AgentRuntime via the Router and reuse `x-agentcube-session-id`.
- Adds request body size limiting middleware to PicoD’s Gin server.
- Replaces the docs architecture “overview” page with a new, expanded `architecture.md`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| pkg/picod/server.go | Adds global Gin middleware for 32MB request body limit + multipart memory cap. |
| example/browser-agent/requirements.txt | Defines Python dependencies for the browser agent service. |
| example/browser-agent/README.md | Documents the browser agent architecture, deployment, and session reuse flow. |
| example/browser-agent/Dockerfile | Builds the browser agent container image (uv + Python 3.12). |
| example/browser-agent/deployment.yaml | Deploys the browser agent into Kubernetes with env-based configuration. |
| example/browser-agent/browser-use-tool.yaml | Defines the Playwright MCP tool as an AgentRuntime workload. |
| example/browser-agent/browser_agent.py | Implements FastAPI endpoint, LLM planning/summarization, and MCP client calls via Router. |
| docs/agentcube/docs/architecture/overview.md | Removes the prior architecture overview doc page. |
| docs/agentcube/docs/architecture/architecture.md | Adds a new, more comprehensive architecture document. |
| // Limit request body size to 32 MB to prevent DoS attacks | ||
| engine.Use(func(c *gin.Context) { | ||
| c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 32<<20) | ||
| c.Next() | ||
| }) | ||
| engine.MaxMultipartMemory = 32 << 20 |
AuthMiddleware() in pkg/picod/auth.go already wraps c.Request.Body with http.MaxBytesReader using the MaxBodySize constant (32MB). Adding another MaxBytesReader middleware here is redundant and uses a separate hard-coded limit, increasing the risk of the limits drifting. Consider removing this middleware and using engine.MaxMultipartMemory = MaxBodySize (or a shared constant) if you want multipart parsing to match the enforced body size.
Suggested change:
| - // Limit request body size to 32 MB to prevent DoS attacks | |
| - engine.Use(func(c *gin.Context) { | |
| -     c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 32<<20) | |
| -     c.Next() | |
| - }) | |
| - engine.MaxMultipartMemory = 32 << 20 | |
| + engine.MaxMultipartMemory = MaxBodySize |
| class PlaywrightMCPClient: | ||
|     """Client for calling the Playwright MCP tool via AgentCube Router.""" | ||
|     def __init__(self): | ||
|         self.base_url = ( | ||
|             f"{ROUTER_URL}/v1/namespaces/{PLAYWRIGHT_MCP_NAMESPACE}" | ||
|             f"/agent-runtimes/{PLAYWRIGHT_MCP_NAME}/invocations/mcp" | ||
|         ) | ||
|         self.session_id: Optional[str] = None |
PlaywrightMCPClient stores a mutable self.session_id and run_task() falls back to session_id or self.session_id. Because the FastAPI app uses a single global browser_client, concurrent or unrelated requests can accidentally reuse another caller's AgentCube session (cross-user sandbox reuse) and leak browser state. Make session reuse strictly client-provided (require/echo session_id) or scope the client/session to a single request (no global mutable state).
| # ========================= FastAPI App ========================= | ||
| app = FastAPI(title="Browser Agent", description="AI agent with Playwright MCP tool") | ||
| browser_client = PlaywrightMCPClient() |
browser_client is instantiated as a module-level singleton. Combined with the client's internal mutable session tracking, this can cause request cross-talk under concurrency. Even if you remove self.session_id, consider constructing the MCP client per request (or keeping it stateless) to avoid any future shared mutable state issues.
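One way to act on this comment is to keep the client free of per-caller state and pass the session ID explicitly on every call. The sketch below is illustrative only: `StatelessMCPClient`, `TaskResult`, and the locally generated session ID are assumptions made for the example, not code from this PR. A real client would take the session ID from the Router's response rather than generating it.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical result type: the answer plus the session it ran in."""
    answer: str
    session_id: str


class StatelessMCPClient:
    """Holds only immutable configuration, so sharing one instance is safe."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def run_task(self, task: str, session_id: Optional[str] = None) -> TaskResult:
        # Reuse a session only when the caller passed one explicitly.
        # Otherwise start a fresh one. Generating the ID locally is a
        # stand-in for reading it from the Router's response.
        effective_session = session_id or uuid.uuid4().hex
        return TaskResult(answer=f"ran {task!r}", session_id=effective_session)
```

With this shape, two unrelated callers can never inherit each other's session, and a caller who wants reuse echoes back the `session_id` it received earlier.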
| except Exception as e: | ||
|     log.exception("Browser task execution failed") | ||
|     return ChatResponse(answer=f"Error: {e}", success=False) |
The general exception handler returns ChatResponse(answer=f"Error: {e}"), which can leak internal exception details (URLs, headers, stack context) to callers. Prefer returning a generic user-facing error message and logging the exception server-side (you already call log.exception(...)).
Suggested change:
| - except Exception as e: | |
| -     log.exception("Browser task execution failed") | |
| -     return ChatResponse(answer=f"Error: {e}", success=False) | |
| + except Exception: | |
| +     log.exception("Browser task execution failed") | |
| +     return ChatResponse( | |
| +         answer="The browser task failed due to an internal error. Please try again.", | |
| +         success=False, | |
| +     ) |
| - "--allowed-hosts" | ||
| - "*" |
This AgentRuntime config passes --allowed-hosts *, effectively disabling host allowlisting in the Playwright MCP server. That makes SSRF/internal-network access much easier if an untrusted prompt/user controls browsing targets. Consider removing this flag (use tool defaults) or setting a restrictive allowlist that matches your intended use (and/or enforce allowlists at the network policy level).
Suggested change (remove these lines):
| - "--allowed-hosts" | |
| - "*" |
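Independently of how the Playwright MCP server interprets its own flags, the agent service could refuse to browse to hosts it does not expect. The following is a minimal sketch of such an application-level allowlist check. The `ALLOWED_HOSTS` values and the `is_allowed_url` helper are hypothetical and not part of this PR.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist, for illustration only.
ALLOWED_HOSTS = frozenset({"kubernetes.io", "github.com"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist
    or is a subdomain of an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return any(
        host == allowed or host.endswith("." + allowed)
        for allowed in allowed_hosts
    )
```

A check like this complements, and does not replace, a Kubernetes NetworkPolicy, since a page on an allowed host can still redirect elsewhere.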
| | `OPENAI_API_KEY` | (required) | LLM API key | | ||
| | `OPENAI_API_BASE` | `https://api.openai.com/v1` | LLM API base URL | | ||
| | `OPENAI_MODEL` | `gpt-4o` | LLM model name | | ||
| | `ROUTER_URL` | `http://router.agentcube.svc.cluster.local:8080` | AgentCube Router URL | |
The documented default ROUTER_URL here (http://router.agentcube.svc.cluster.local:8080) doesn’t match the service name used by the Helm chart (agentcube-router) and your provided deployment.yaml (http://agentcube-router.agentcube.svc.cluster.local:8080). This will cause confusion / misconfiguration when users follow the README defaults. Update the README (and ideally browser_agent.py defaults) to a consistent Router service address.
Suggested change:
| - | `ROUTER_URL` | `http://router.agentcube.svc.cluster.local:8080` | AgentCube Router URL | | |
| + | `ROUTER_URL` | `http://agentcube-router.agentcube.svc.cluster.local:8080` | AgentCube Router URL | |
| OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1") | ||
| OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o") | ||
| ROUTER_URL = os.environ.get("ROUTER_URL", "http://router.agentcube.svc.cluster.local:8080") |
ROUTER_URL defaults to http://router.agentcube.svc.cluster.local:8080, but the Helm chart’s Router Service is named agentcube-router (and the provided Deployment sets that value). Consider updating this default to the chart’s service DNS name so running the agent outside Kubernetes manifests behaves consistently with the documented install path.
Suggested change:
| - ROUTER_URL = os.environ.get("ROUTER_URL", "http://router.agentcube.svc.cluster.local:8080") | |
| + ROUTER_URL = os.environ.get("ROUTER_URL", "http://agentcube-router.agentcube.svc.cluster.local:8080") |
| value: "https://api.deepseek.com/v1" # Change to your LLM API base | ||
| - name: OPENAI_MODEL | ||
| value: "deepseek-chat" |
This Deployment hard-codes OPENAI_API_BASE/OPENAI_MODEL to DeepSeek-specific values, while the README and code defaults describe an OpenAI-compatible base at https://api.openai.com/v1 and gpt-4o. Consider aligning the example manifests with the README defaults (or clearly documenting that the manifest is configured for DeepSeek by default) to avoid surprise misconfiguration.
Suggested change:
| -       value: "https://api.deepseek.com/v1" # Change to your LLM API base | |
| -     - name: OPENAI_MODEL | |
| -       value: "deepseek-chat" | |
| +       value: "https://api.openai.com/v1" # Change to your LLM API base | |
| +     - name: OPENAI_MODEL | |
| +       value: "gpt-4o" |
| # AgentCube Architecture | ||
| > A Kubernetes-native platform that treats AI agents, code interpreters, MCP servers, and other AI tool runtimes (browser-use, computer-use, etc.) as first-class, serverless workloads with microVM-based sandbox isolation. | ||
| --- |
overview.md has been removed in favor of this new architecture.md, but other docs still link to ./architecture/overview.md (e.g., docs/agentcube/docs/getting-started.md). Please update/redirect those references to avoid broken links in the rendered documentation site.
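To locate the remaining references before updating them, a small script can scan the docs tree for the old path. This is a rough sketch under stated assumptions: `find_stale_links` is a hypothetical helper, and it only does a plain substring match on Markdown files rather than resolving relative links.

```python
from pathlib import Path
from typing import List, Tuple


def find_stale_links(
    docs_root: str, stale_target: str = "architecture/overview.md"
) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for every Markdown line
    under docs_root that still mentions stale_target."""
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(docs_root).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if stale_target in line:
                hits.append((str(path), line_no, line.strip()))
    return hits
```

For example, `find_stale_links("docs/agentcube/docs")` would list each file and line that still points at the removed page.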
| | **AgentRuntime** | `runtime.agentcube.io/v1alpha1` | AgentCube | User-facing agent runtime definition | | ||
| | **CodeInterpreter** | `runtime.agentcube.io/v1alpha1` | AgentCube | Code execution environment with warm pool support | |
The CRD API group in this table (runtime.agentcube.io/v1alpha1) doesn’t match the actual CRD group used elsewhere in the repo/examples (runtime.agentcube.volcano.sh/v1alpha1). Please update the API group here to the correct value to avoid misleading users copying these manifests.
Suggested change:
| - | **AgentRuntime** | `runtime.agentcube.io/v1alpha1` | AgentCube | User-facing agent runtime definition | | |
| - | **CodeInterpreter** | `runtime.agentcube.io/v1alpha1` | AgentCube | Code execution environment with warm pool support | | |
| + | **AgentRuntime** | `runtime.agentcube.volcano.sh/v1alpha1` | AgentCube | User-facing agent runtime definition | | |
| + | **CodeInterpreter** | `runtime.agentcube.volcano.sh/v1alpha1` | AgentCube | Code execution environment with warm pool support | |
Code Review
This pull request introduces comprehensive architecture documentation for AgentCube and adds a practical 'Browser Agent' example that utilizes the Playwright MCP tool within a sandboxed environment. It also includes security hardening in the picod daemon by enforcing a 32MB request body limit. Review feedback identifies a critical thread-safety issue where session_id is stored in a global singleton, which could lead to session leakage in concurrent environments. Additionally, there are concerns regarding the use of non-standard MCP attributes, fragile JSON extraction from LLM responses, and unpopulated fields in the chat response.
| f"{ROUTER_URL}/v1/namespaces/{PLAYWRIGHT_MCP_NAMESPACE}" | ||
| f"/agent-runtimes/{PLAYWRIGHT_MCP_NAME}/invocations/mcp" | ||
| ) | ||
| self.session_id: Optional[str] = None |
Storing session_id as an instance variable in a global singleton (browser_client at line 322) is not thread-safe in a FastAPI environment. Concurrent requests from different users will overwrite this value, leading to session leakage or incorrect session reuse. The session_id should be managed per-request or passed explicitly through the call stack.
| if getattr(result, "structuredContent", None): | ||
|     parts.append(json.dumps(result.structuredContent, ensure_ascii=True)) |
The structuredContent attribute is not part of the standard mcp.types.CallToolResult in the official Model Context Protocol specification. This check will likely always return None, making the code unreachable. Please verify if this is a custom extension or if you intended to process the content list instead.
| if content.startswith("```"): | ||
|     content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip() | ||
| plan = json.loads(content) |
Manual string splitting to extract JSON from markdown code blocks is fragile and will fail if the LLM output format varies slightly (e.g., missing newlines or different tick styles). Consider using a more robust approach like a regular expression or a dedicated JSON extraction utility.
Suggested change:
| - if content.startswith("```"): | |
| -     content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip() | |
| - plan = json.loads(content) | |
| + import re | |
| + content = planning_response.content.strip() | |
| + json_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", content, re.DOTALL) | |
| + if json_match: | |
| +     content = json_match.group(1) | |
| + plan = json.loads(content) |
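As an alternative to matching fences at all, the standard library's `json.JSONDecoder.raw_decode` can parse the first JSON object found anywhere in the text, which also covers responses with leading prose or no fence. This is a sketch of that swapped-in technique, not the approach taken in the PR, and `extract_first_json_object` is a hypothetical helper name.

```python
import json
from typing import Any


def extract_first_json_object(text: str) -> Any:
    """Return the first JSON object embedded in text, ignoring any
    surrounding prose or markdown fences. Raises ValueError if no
    candidate position parses as JSON."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing characters after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in LLM response")
```

Because the decoder handles nesting itself, this avoids the non-greedy regex pitfalls around nested braces.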
| answer=answer, | ||
| success=success, | ||
| session_id=result.get("session_id"), | ||
| urls_visited=result.get("urls_visited", []), |
|
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #255 +/- ##
==========================================
+ Coverage 35.60% 43.37% +7.76%
==========================================
Files 29 30 +1
Lines 2533 2610 +77
==========================================
+ Hits 902 1132 +230
+ Misses 1505 1355 -150
+ Partials 126 123 -3
Signed-off-by: Zhonghu Xu <[email protected]>
| captured_session_id = session_id or self.session_id | ||
| transport_client_holder: dict[str, httpx.AsyncClient] = {} | ||
| tool_round_limit = max_rounds or max_steps |
run_task() falls back to self.session_id when session_id isn’t provided. Since browser_client is a module-level singleton, this can cause unintended cross-request/session reuse (and potential data leakage) between different callers. Consider removing the implicit fallback (require explicit session_id to reuse) or storing session state per client/request rather than on a shared instance.
This is expected; this example demonstrates the session reuse capability.
| # Extract JSON from LLM response (handle markdown code blocks) | ||
| content = planning_response.content.strip() | ||
| if content.startswith("```"): | ||
|     content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip() | ||
| plan = json.loads(content) |
planning_response.content isn’t guaranteed to be a string (LangChain message content can be str | list[...]). Calling .strip() directly can raise at runtime. Consider normalizing with _message_content_to_text(planning_response.content) before JSON extraction.
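For illustration, a normalizer of this kind could look like the sketch below. It is not the PR's `_message_content_to_text`, whose implementation is not shown here. The name `message_content_to_text` and the assumption that list items are either plain strings or `{"type": "text", "text": ...}` dictionaries are mine.

```python
from typing import Any


def message_content_to_text(content: Any) -> str:
    """Flatten message content that may be a string or a list of
    strings / text blocks into a single string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and block.get("type") == "text":
                # Non-text blocks (images, tool calls) are skipped.
                parts.append(str(block.get("text", "")))
        return "".join(parts)
    # Fall back to the string form for any other type.
    return str(content)
```

Calling `.strip()` on the returned value is then always safe, whichever shape the model returned.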
|
/assign @acsoto |
FAUST-BENCHOU left a comment
works well for me.
lgtm.
| # Reuse the same browser session (pass session_id from previous response) | ||
| curl -s http://localhost:8000/chat \ | ||
| -H 'Content-Type: application/json' \ | ||
| -d '{"message": "Now find the deprecation list from the same release", "session_id": "<SESSION_ID>"}' \ | ||
| | python -m json.tool | ||
| ``` |
it works well for me
(base) zhoujinyu@zhoujinyudeMacBook-Air agentcube % curl -s http://localhost:8000/chat \
-H 'Content-Type: application/json' \
-d '{"message": "Search for the latest news about Kubernetes 1.33 release"}' \
| python -m json.tool
{
"answer": "Kubernetes 1.33, codenamed \"Octarine,\" was released on April 23, 2025. The latest patch is 1.33.10 (as of March 2026).\n\n**Key Highlights:**\n* **Stable Features:** Sidecar containers, in-place pod resource resize (Beta), multiple Service CIDRs, and the nftables backend for kube-proxy.\n* **Beta Features:** User namespaces for Linux pods (default on), asynchronous scheduler preemption, and mounting OCI images as volumes.\n* **Deprecations:** The stable Endpoints API is deprecated in favor of EndpointSlices. The in-tree `gitRepo` volume driver has been removed.\n\nThe release includes 64 total enhancements from a large community of contributors, with a focus on security, performance, and resource management.",
"success": true,
"session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d",
"urls_visited": [],
"steps": 10
}
(base) zhoujinyu@zhoujinyudeMacBook-Air agentcube % curl -s http://localhost:8000/chat \
-H 'Content-Type: application/json' \
-d '{"message": "Now find the deprecation list from the same release", "session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d"}' \
| python -m json.tool
{
"answer": "Based on the browser result, I cannot retrieve the specific deprecation list you requested due to technical constraints. However, here is how you can find it yourself and a summary of common deprecations in recent Kubernetes releases:\n\n**To find the exact list:**\n1. Go to the official Kubernetes release notes: [https://kubernetes.io/releases/](https://kubernetes.io/releases/)\n2. Select the specific release you need.\n3. Look for sections titled \"Deprecations\" or search for \"deprecated\" within the notes.\n\n**Common deprecation categories in recent releases typically include:**\n- **Legacy and beta APIs** being phased out in favor of stable versions.\n- **In-tree cloud provider plugins** moving to out-of-tree components.\n- **Older kubectl flags and commands** with newer alternatives.\n- **Storage and network plugins** transitioning to CSI and newer standards.\n\nFor the precise and complete list, please refer to the official release notes for your specific Kubernetes version.",
"success": true,
"session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d",
"urls_visited": [],
"steps": 4
}
But the second example may be too hard for the agent: it answered "I cannot retrieve the specific deprecation list you requested due to technical constraints".
Maybe it can be changed to:
curl -s http://localhost:8000/chat \
-H 'Content-Type: application/json' \
-d '{"message": "Now find the Patch Releases list from the same release", "session_id": "<SESSION_ID>"}' \
| python -m json.tool
or another, easier question, since we only need to prove that session ID reuse works here.
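The two-step flow exercised above (send a message, read `session_id` from the response, send it back on the next call) can also be scripted. The sketch below assumes the `/chat` request and response fields shown in the transcript. `ChatSession` and `http_post_json` are hypothetical helpers, and the transport is injectable so the logic can be checked without a running service.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional


def http_post_json(url: str, payload: Dict) -> Dict:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class ChatSession:
    """Client-side helper that remembers the session_id between calls."""

    def __init__(
        self, url: str, post: Callable[[str, Dict], Dict] = http_post_json
    ) -> None:
        self.url = url
        self._post = post
        self.session_id: Optional[str] = None

    def send(self, message: str) -> Dict:
        payload: Dict = {"message": message}
        if self.session_id:
            # Echo the ID from the previous response to reuse the session.
            payload["session_id"] = self.session_id
        reply = self._post(self.url, payload)
        self.session_id = reply.get("session_id") or self.session_id
        return reply
```

Here the state lives in one caller's own `ChatSession` object, so reuse stays explicit from the server's point of view.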
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: acsoto
Signed-off-by: Zhonghu Xu <[email protected]>
Summary
Testing
/root/go/src/agent-cube/.venv/bin/python -m py_compile example/browser-agent/browser_agent.py
Notes
go.mod and generated API files were intentionally left out of this PR because they contain stash conflict markers and are not part of the browser-agent change set.
Fix #254