Add browser agent example with session reuse #255

Merged
hzxuzhonghu merged 2 commits into volcano-sh:main from hzxuzhonghu:browser-use
Apr 15, 2026
Conversation

Member

@hzxuzhonghu hzxuzhonghu commented Apr 3, 2026

Summary

  • add a browser agent example backed by the Playwright MCP runtime
  • deploy the browser agent and browser-use tool with Kubernetes manifests and a Dockerfile
  • preserve AgentCube session reuse across MCP calls and stop cleanly when the tool-call limit is reached

Testing

  • /root/go/src/agent-cube/.venv/bin/python -m py_compile example/browser-agent/browser_agent.py

Notes

  • unrelated local changes in go.mod and generated API files were intentionally left out of this PR because they contain stash conflict markers and are not part of the browser-agent change set

Fix #254

Copilot AI review requested due to automatic review settings April 3, 2026 09:50
Contributor

Copilot AI left a comment

Pull request overview

Adds a new end-to-end “browser agent” example that orchestrates a Playwright MCP tool runtime through the AgentCube Router with session reuse, plus accompanying Kubernetes/Docker assets and refreshed architecture documentation.

Changes:

  • Introduces example/browser-agent/ (FastAPI service + Dockerfile + manifests) to call a Playwright MCP AgentRuntime via the Router and reuse x-agentcube-session-id.
  • Adds request body size limiting middleware to PicoD’s Gin server.
  • Replaces the docs architecture “overview” page with a new, expanded architecture.md.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| pkg/picod/server.go | Adds global Gin middleware for 32MB request body limit + multipart memory cap. |
| example/browser-agent/requirements.txt | Defines Python dependencies for the browser agent service. |
| example/browser-agent/README.md | Documents the browser agent architecture, deployment, and session reuse flow. |
| example/browser-agent/Dockerfile | Builds the browser agent container image (uv + Python 3.12). |
| example/browser-agent/deployment.yaml | Deploys the browser agent into Kubernetes with env-based configuration. |
| example/browser-agent/browser-use-tool.yaml | Defines the Playwright MCP tool as an AgentRuntime workload. |
| example/browser-agent/browser_agent.py | Implements FastAPI endpoint, LLM planning/summarization, and MCP client calls via Router. |
| docs/agentcube/docs/architecture/overview.md | Removes the prior architecture overview doc page. |
| docs/agentcube/docs/architecture/architecture.md | Adds a new, more comprehensive architecture document. |

Comment thread pkg/picod/server.go Outdated
Comment on lines +76 to +81
    // Limit request body size to 32 MB to prevent DoS attacks
    engine.Use(func(c *gin.Context) {
        c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 32<<20)
        c.Next()
    })
    engine.MaxMultipartMemory = 32 << 20
Copilot AI Apr 3, 2026

AuthMiddleware() in pkg/picod/auth.go already wraps c.Request.Body with http.MaxBytesReader using the MaxBodySize constant (32MB). Adding another MaxBytesReader middleware here is redundant and uses a separate hard-coded limit, increasing the risk of the limits drifting. Consider removing this middleware and using engine.MaxMultipartMemory = MaxBodySize (or a shared constant) if you want multipart parsing to match the enforced body size.

Suggested change
-     // Limit request body size to 32 MB to prevent DoS attacks
-     engine.Use(func(c *gin.Context) {
-         c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 32<<20)
-         c.Next()
-     })
-     engine.MaxMultipartMemory = 32 << 20
+     engine.MaxMultipartMemory = MaxBodySize

Comment on lines +163 to +172
class PlaywrightMCPClient:
    """Client for calling the Playwright MCP tool via AgentCube Router."""

    def __init__(self):
        self.base_url = (
            f"{ROUTER_URL}/v1/namespaces/{PLAYWRIGHT_MCP_NAMESPACE}"
            f"/agent-runtimes/{PLAYWRIGHT_MCP_NAME}/invocations/mcp"
        )
        self.session_id: Optional[str] = None

Copilot AI Apr 3, 2026

PlaywrightMCPClient stores a mutable self.session_id and run_task() falls back to session_id or self.session_id. Because the FastAPI app uses a single global browser_client, concurrent or unrelated requests can accidentally reuse another caller's AgentCube session (cross-user sandbox reuse) and leak browser state. Make session reuse strictly client-provided (require/echo session_id) or scope the client/session to a single request (no global mutable state).
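
One way to act on this comment is to keep the client stateless and carry the session ID in a value created per request. The sketch below is only an illustration under assumptions: `McpCallContext` and `context_for_request` are hypothetical names that do not appear in the PR, and sending `x-agentcube-session-id` as a request header is inferred from the PR overview rather than confirmed against the Router API.

```python
from dataclasses import dataclass
from typing import Optional

# Header name taken from the PR overview; its exact use is an assumption.
SESSION_HEADER = "x-agentcube-session-id"


@dataclass(frozen=True)
class McpCallContext:
    """Immutable per-request context; nothing here is shared between callers."""

    base_url: str
    session_id: Optional[str] = None

    def headers(self) -> dict[str, str]:
        # Send the session header only when the caller explicitly supplied an ID.
        return {SESSION_HEADER: self.session_id} if self.session_id else {}


def context_for_request(
    router_url: str, namespace: str, name: str, session_id: Optional[str]
) -> McpCallContext:
    """Build a fresh context for one /chat request (hypothetical helper)."""
    base_url = (
        f"{router_url}/v1/namespaces/{namespace}"
        f"/agent-runtimes/{name}/invocations/mcp"
    )
    return McpCallContext(base_url=base_url, session_id=session_id)
```

Because the context is frozen and built inside the request handler, two concurrent callers cannot overwrite each other's session ID. Reuse then happens only when a caller passes back the `session_id` it received earlier.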

Comment on lines +320 to +323
# ========================= FastAPI App =========================
app = FastAPI(title="Browser Agent", description="AI agent with Playwright MCP tool")
browser_client = PlaywrightMCPClient()

Copilot AI Apr 3, 2026

browser_client is instantiated as a module-level singleton. Combined with the client's internal mutable session tracking, this can cause request cross-talk under concurrency. Even if you remove self.session_id, consider constructing the MCP client per request (or keeping it stateless) to avoid any future shared mutable state issues.

Comment on lines +373 to +375
    except Exception as e:
        log.exception("Browser task execution failed")
        return ChatResponse(answer=f"Error: {e}", success=False)
Copilot AI Apr 3, 2026

The general exception handler returns ChatResponse(answer=f"Error: {e}"), which can leak internal exception details (URLs, headers, stack context) to callers. Prefer returning a generic user-facing error message and logging the exception server-side (you already call log.exception(...)).

Suggested change
-     except Exception as e:
-         log.exception("Browser task execution failed")
-         return ChatResponse(answer=f"Error: {e}", success=False)
+     except Exception:
+         log.exception("Browser task execution failed")
+         return ChatResponse(
+             answer="The browser task failed due to an internal error. Please try again.",
+             success=False,
+         )

Comment on lines +24 to +25
- "--allowed-hosts"
- "*"
Copilot AI Apr 3, 2026

This AgentRuntime config passes --allowed-hosts *, effectively disabling host allowlisting in the Playwright MCP server. That makes SSRF/internal-network access much easier if an untrusted prompt/user controls browsing targets. Consider removing this flag (use tool defaults) or setting a restrictive allowlist that matches your intended use (and/or enforce allowlists at the network policy level).

Suggested change (delete these lines)
-         - "--allowed-hosts"
-         - "*"
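
Separately from whatever the server flag enforces, the agent itself could reject navigation targets outside an explicit allowlist before issuing a tool call. The following is a minimal sketch of that idea and is not code from the PR: `is_allowed_url` is a hypothetical helper, and the policy (http/https only, exact host or subdomain match) is an assumption chosen for illustration.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        # Rejects file://, javascript:, and other non-web schemes.
        return False
    host = (parts.hostname or "").lower()
    allowed = {h.lower() for h in allowed_hosts}
    # Accept an exact match or a true subdomain of an allowed host.
    return any(host == h or host.endswith("." + h) for h in allowed)
```

A check like this complements, rather than replaces, a Kubernetes NetworkPolicy, since it only covers URLs that the agent code sees.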

| `OPENAI_API_KEY` | (required) | LLM API key |
| `OPENAI_API_BASE` | `https://api.openai.com/v1` | LLM API base URL |
| `OPENAI_MODEL` | `gpt-4o` | LLM model name |
| `ROUTER_URL` | `http://router.agentcube.svc.cluster.local:8080` | AgentCube Router URL |
Copilot AI Apr 3, 2026

The documented default ROUTER_URL here (http://router.agentcube.svc.cluster.local:8080) doesn’t match the service name used by the Helm chart (agentcube-router) and your provided deployment.yaml (http://agentcube-router.agentcube.svc.cluster.local:8080). This will cause confusion / misconfiguration when users follow the README defaults. Update the README (and ideally browser_agent.py defaults) to a consistent Router service address.

Suggested change
- | `ROUTER_URL` | `http://router.agentcube.svc.cluster.local:8080` | AgentCube Router URL |
+ | `ROUTER_URL` | `http://agentcube-router.agentcube.svc.cluster.local:8080` | AgentCube Router URL |

Comment thread example/browser-agent/browser_agent.py Outdated
OPENAI_API_BASE = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o")

ROUTER_URL = os.environ.get("ROUTER_URL", "http://router.agentcube.svc.cluster.local:8080")
Copilot AI Apr 3, 2026

ROUTER_URL defaults to http://router.agentcube.svc.cluster.local:8080, but the Helm chart’s Router Service is named agentcube-router (and the provided Deployment sets that value). Consider updating this default to the chart’s service DNS name so running the agent outside Kubernetes manifests behaves consistently with the documented install path.

Suggested change
- ROUTER_URL = os.environ.get("ROUTER_URL", "http://router.agentcube.svc.cluster.local:8080")
+ ROUTER_URL = os.environ.get("ROUTER_URL", "http://agentcube-router.agentcube.svc.cluster.local:8080")

Comment on lines +44 to +46
value: "https://api.deepseek.com/v1" # Change to your LLM API base
- name: OPENAI_MODEL
value: "deepseek-chat"
Copilot AI Apr 3, 2026

This Deployment hard-codes OPENAI_API_BASE/OPENAI_MODEL to DeepSeek-specific values, while the README and code defaults describe an OpenAI-compatible base at https://api.openai.com/v1 and gpt-4o. Consider aligning the example manifests with the README defaults (or clearly documenting that the manifest is configured for DeepSeek by default) to avoid surprise misconfiguration.

Suggested change
-           value: "https://api.deepseek.com/v1" # Change to your LLM API base
-         - name: OPENAI_MODEL
-           value: "deepseek-chat"
+           value: "https://api.openai.com/v1" # Change to your LLM API base
+         - name: OPENAI_MODEL
+           value: "gpt-4o"

Comment on lines +1 to +5
# AgentCube Architecture

> A Kubernetes-native platform that treats AI agents, code interpreters, MCP servers, and other AI tool runtimes (browser-use, computer-use, etc.) as first-class, serverless workloads with microVM-based sandbox isolation.

---
Copilot AI Apr 3, 2026

overview.md has been removed in favor of this new architecture.md, but other docs still link to ./architecture/overview.md (e.g., docs/agentcube/docs/getting-started.md). Please update/redirect those references to avoid broken links in the rendered documentation site.

Comment on lines +170 to +171
| **AgentRuntime** | `runtime.agentcube.io/v1alpha1` | AgentCube | User-facing agent runtime definition |
| **CodeInterpreter** | `runtime.agentcube.io/v1alpha1` | AgentCube | Code execution environment with warm pool support |
Copilot AI Apr 3, 2026

The CRD API group in this table (runtime.agentcube.io/v1alpha1) doesn’t match the actual CRD group used elsewhere in the repo/examples (runtime.agentcube.volcano.sh/v1alpha1). Please update the API group here to the correct value to avoid misleading users copying these manifests.

Suggested change
- | **AgentRuntime** | `runtime.agentcube.io/v1alpha1` | AgentCube | User-facing agent runtime definition |
- | **CodeInterpreter** | `runtime.agentcube.io/v1alpha1` | AgentCube | Code execution environment with warm pool support |
+ | **AgentRuntime** | `runtime.agentcube.volcano.sh/v1alpha1` | AgentCube | User-facing agent runtime definition |
+ | **CodeInterpreter** | `runtime.agentcube.volcano.sh/v1alpha1` | AgentCube | Code execution environment with warm pool support |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces comprehensive architecture documentation for AgentCube and adds a practical 'Browser Agent' example that utilizes the Playwright MCP tool within a sandboxed environment. It also includes security hardening in the picod daemon by enforcing a 32MB request body limit. Review feedback identifies a critical thread-safety issue where session_id is stored in a global singleton, which could lead to session leakage in concurrent environments. Additionally, there are concerns regarding the use of non-standard MCP attributes, fragile JSON extraction from LLM responses, and unpopulated fields in the chat response.

f"{ROUTER_URL}/v1/namespaces/{PLAYWRIGHT_MCP_NAMESPACE}"
f"/agent-runtimes/{PLAYWRIGHT_MCP_NAME}/invocations/mcp"
)
self.session_id: Optional[str] = None
high (security)

Storing session_id as an instance variable in a global singleton (browser_client at line 322) is not thread-safe in a FastAPI environment. Concurrent requests from different users will overwrite this value, leading to session leakage or incorrect session reuse. The session_id should be managed per-request or passed explicitly through the call stack.

Comment on lines +130 to +131
    if getattr(result, "structuredContent", None):
        parts.append(json.dumps(result.structuredContent, ensure_ascii=True))
medium

The structuredContent attribute is not part of the standard mcp.types.CallToolResult in the official Model Context Protocol specification. This check will likely always return None, making the code unreachable. Please verify if this is a custom extension or if you intended to process the content list instead.
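
Whichever way that question resolves for the MCP SDK version in use, a defensive reader can collect text from the `content` list and treat `structuredContent` as optional. The sketch below is not the PR's implementation: `tool_result_to_text` is a hypothetical name, and `SimpleNamespace` objects stand in for the SDK's result types so the example runs without the `mcp` package.

```python
import json
from types import SimpleNamespace
from typing import Any


def tool_result_to_text(result: Any) -> str:
    """Flatten a tool result into text, tolerating missing optional fields."""
    parts: list[str] = []
    # Text blocks in the content list are the primary payload.
    for item in getattr(result, "content", None) or []:
        text = getattr(item, "text", None)
        if text:
            parts.append(text)
    # structuredContent is read via getattr so its absence is harmless.
    structured = getattr(result, "structuredContent", None)
    if structured:
        parts.append(json.dumps(structured, ensure_ascii=True))
    return "\n".join(parts)


# Stand-in result object for illustration only (not an SDK type).
demo_result = SimpleNamespace(content=[SimpleNamespace(text="Page title: Example")])
demo_text = tool_result_to_text(demo_result)
```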

Comment on lines +345 to +347
if content.startswith("```"):
    content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip()
plan = json.loads(content)
medium

Manual string splitting to extract JSON from markdown code blocks is fragile and will fail if the LLM output format varies slightly (e.g., missing newlines or different tick styles). Consider using a more robust approach like a regular expression or a dedicated JSON extraction utility.

Suggested change
- if content.startswith("```"):
-     content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip()
- plan = json.loads(content)
+ import re
+ content = planning_response.content.strip()
+ json_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", content, re.DOTALL)
+ if json_match:
+     content = json_match.group(1)
+ plan = json.loads(content)
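
As an alternative to pattern matching on fences, the standard library's `json.JSONDecoder.raw_decode` can parse the first JSON object starting at a given offset and ignore whatever follows it. The sketch below shows that approach as one possible "dedicated JSON extraction utility"; `extract_first_json_object` is a hypothetical helper and is not part of the PR.

```python
import json
from typing import Any


def extract_first_json_object(text: str) -> Any:
    """Return the first JSON object embedded in text, ignoring fences or prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value at `start` and tolerates trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in LLM response")
```

This handles nested objects, which a non-greedy regular expression can get wrong, and it does not depend on the tick style or on newlines around the fence.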

answer=answer,
success=success,
session_id=result.get("session_id"),
urls_visited=result.get("urls_visited", []),
medium

The urls_visited field in ChatResponse is never populated because the run_task method does not track or return visited URLs. This results in an empty list being returned to the client even after successful browsing tasks.
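
One way to populate the field would be to record the `url` argument of each navigation tool call as the agent loop runs. The sketch below makes several assumptions that the PR does not confirm: tool calls are available as dicts with `name` and `args` keys, the navigation tool is called `browser_navigate`, and `collect_visited_urls` is a hypothetical helper.

```python
from typing import Any, Iterable


def collect_visited_urls(
    tool_calls: Iterable[dict[str, Any]],
    navigate_tool: str = "browser_navigate",  # assumed tool name
) -> list[str]:
    """Return URLs from navigation tool calls, first-seen order, no duplicates."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for call in tool_calls:
        if call.get("name") != navigate_tool:
            continue
        url = (call.get("args") or {}).get("url")
        if isinstance(url, str) and url:
            seen.setdefault(url, None)
    return list(seen)
```

`run_task` could then return this list under a `urls_visited` key so that `result.get("urls_visited", [])` no longer always falls back to the empty default.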

codecov-commenter commented Apr 3, 2026

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.37%. Comparing base (845b798) to head (2ba0e12).
⚠️ Report is 163 commits behind head on main.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #255      +/-   ##
==========================================
+ Coverage   35.60%   43.37%   +7.76%     
==========================================
  Files          29       30       +1     
  Lines        2533     2610      +77     
==========================================
+ Hits          902     1132     +230     
+ Misses       1505     1355     -150     
+ Partials      126      123       -3     
Flag Coverage Δ
unittests 43.37% <ø> (+7.76%) ⬆️

Copilot AI review requested due to automatic review settings April 7, 2026 01:50
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Comment on lines +212 to +214
captured_session_id = session_id or self.session_id
transport_client_holder: dict[str, httpx.AsyncClient] = {}
tool_round_limit = max_rounds or max_steps
Copilot AI Apr 7, 2026

run_task() falls back to self.session_id when session_id isn’t provided. Since browser_client is a module-level singleton, this can cause unintended cross-request/session reuse (and potential data leakage) between different callers. Consider removing the implicit fallback (require explicit session_id to reuse) or storing session state per client/request rather than on a shared instance.

Member Author

This is expected; this example demonstrates the session reuse capability.

Comment thread example/browser-agent/browser_agent.py
Comment on lines +356 to +360
# Extract JSON from LLM response (handle markdown code blocks)
content = planning_response.content.strip()
if content.startswith("```"):
    content = content.split("\n", 1)[1].rsplit("```", 1)[0].strip()
plan = json.loads(content)
Copilot AI Apr 7, 2026

planning_response.content isn’t guaranteed to be a string (LangChain message content can be str | list[...]). Calling .strip() directly can raise at runtime. Consider normalizing with _message_content_to_text(planning_response.content) before JSON extraction.
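
The normalization step described here could look roughly like the sketch below. It is not the PR's `_message_content_to_text`, whose body is not shown in this thread. `message_content_to_text` is a hypothetical stand-in, and it assumes list content consists of plain strings or dict blocks that carry a `text` key.

```python
from typing import Any


def message_content_to_text(content: Any) -> str:
    """Normalize message content (str or list of blocks) into one string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        parts: list[str] = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and isinstance(block.get("text"), str):
                # Keep text blocks; skip images and other non-text blocks.
                parts.append(block["text"])
        return "".join(parts)
    return "" if content is None else str(content)
```

Calling `.strip()` on the returned value is then safe regardless of which shape the model produced.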

Comment thread example/browser-agent/browser_agent.py
Comment thread example/browser-agent/browser_agent.py
Comment thread example/browser-agent/browser_agent.py
Comment thread example/browser-agent/browser_agent.py
@hzxuzhonghu
Member Author

/assign @acsoto

Contributor

@FAUST-BENCHOU FAUST-BENCHOU left a comment

works well for me.
lgtm.

Comment on lines +85 to +90
# Reuse the same browser session (pass session_id from previous response)
curl -s http://localhost:8000/chat \
-H 'Content-Type: application/json' \
-d '{"message": "Now find the deprecation list from the same release", "session_id": "<SESSION_ID>"}' \
| python -m json.tool
```
Contributor

@FAUST-BENCHOU FAUST-BENCHOU Apr 9, 2026

It works well for me:

(base) zhoujinyu@zhoujinyudeMacBook-Air agentcube % curl -s http://localhost:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "Search for the latest news about Kubernetes 1.33 release"}' \
  | python -m json.tool

{
    "answer": "Kubernetes 1.33, codenamed \"Octarine,\" was released on April 23, 2025. The latest patch is 1.33.10 (as of March 2026).\n\n**Key Highlights:**\n*   **Stable Features:** Sidecar containers, in-place pod resource resize (Beta), multiple Service CIDRs, and the nftables backend for kube-proxy.\n*   **Beta Features:** User namespaces for Linux pods (default on), asynchronous scheduler preemption, and mounting OCI images as volumes.\n*   **Deprecations:** The stable Endpoints API is deprecated in favor of EndpointSlices. The in-tree `gitRepo` volume driver has been removed.\n\nThe release includes 64 total enhancements from a large community of contributors, with a focus on security, performance, and resource management.",
    "success": true,
    "session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d",
    "urls_visited": [],
    "steps": 10
}
(base) zhoujinyu@zhoujinyudeMacBook-Air agentcube % curl -s http://localhost:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "Now find the deprecation list from the same release", "session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d"}' \
  | python -m json.tool
{
    "answer": "Based on the browser result, I cannot retrieve the specific deprecation list you requested due to technical constraints. However, here is how you can find it yourself and a summary of common deprecations in recent Kubernetes releases:\n\n**To find the exact list:**\n1. Go to the official Kubernetes release notes: [https://kubernetes.io/releases/](https://kubernetes.io/releases/)\n2. Select the specific release you need.\n3. Look for sections titled \"Deprecations\" or search for \"deprecated\" within the notes.\n\n**Common deprecation categories in recent releases typically include:**\n- **Legacy and beta APIs** being phased out in favor of stable versions.\n- **In-tree cloud provider plugins** moving to out-of-tree components.\n- **Older kubectl flags and commands** with newer alternatives.\n- **Storage and network plugins** transitioning to CSI and newer standards.\n\nFor the precise and complete list, please refer to the official release notes for your specific Kubernetes version.",
    "success": true,
    "session_id": "7b6d29da-62df-4e03-b34a-d73e555fec5d",
    "urls_visited": [],
    "steps": 4
}

But the second example may be too hard for the agent; it answered "I cannot retrieve the specific deprecation list you requested due to technical constraints".
Maybe it can be changed to:

curl -s http://localhost:8000/chat \
 -H 'Content-Type: application/json' \
 -d '{"message": "Now find the Patch Releases list from the same release", "session_id": "<SESSION_ID>"}' \
 | python -m json.tool

or another easier question, since we only need to show that session ID reuse works here.
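
For scripted checks of the same flow, the two curl calls can be mirrored in Python with only the standard library. This is a sketch under the assumptions visible in the transcript above (a `/chat` endpoint on `http://localhost:8000` that accepts `message` and optional `session_id`); `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Optional


def build_chat_request(
    message: str,
    session_id: Optional[str] = None,
    base_url: str = "http://localhost:8000",  # address used in the transcript
) -> urllib.request.Request:
    """Build the POST /chat request; include session_id only when reusing one."""
    payload: dict[str, Any] = {"message": message}
    if session_id:
        payload["session_id"] = session_id
    return urllib.request.Request(
        f"{base_url}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(message: str, session_id: Optional[str] = None) -> dict[str, Any]:
    """Send one chat turn and return the decoded JSON response."""
    with urllib.request.urlopen(build_chat_request(message, session_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A session reuse check would call `chat(...)` once, read `session_id` from the response, pass it to a second `chat(...)` call, and compare the two returned IDs.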

Member Author

good suggestion

Member

@acsoto acsoto left a comment

LGTM

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: acsoto
Once this PR has been reviewed and has the lgtm label, please ask for approval from hzxuzhonghu. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Zhonghu Xu <[email protected]>
@hzxuzhonghu hzxuzhonghu merged commit 45b3d5d into volcano-sh:main Apr 15, 2026
11 of 12 checks passed
@hzxuzhonghu hzxuzhonghu deleted the browser-use branch April 15, 2026 02:14
Development

Successfully merging this pull request may close these issues.

Add a sample on browser-use

6 participants