[Feature] Browser automation tool (web interaction and data extraction) #48

@Wan-ZL

Description

Give Genesis the ability to interact with websites: fetch structured data, fill forms, extract content, and take screenshots. This transforms Genesis from "AI that talks about the web" to "AI that uses the web."

Why This Matters

  • Most user requests ultimately involve web interaction
  • The current web_fetch tool only returns raw HTML; it cannot render or interact with JavaScript-driven content
  • Browser automation enables: price comparison, form filling, data extraction, screenshot capture
  • Competitors (OpenClaw, Lindy) offer browser automation as a core skill

Acceptance Criteria

  • BrowserService: assistant/server/services/browser.py
  • Uses Playwright for headless browser automation
  • Tools registered: browse_url, extract_content, take_screenshot, fill_form
  • browse_url: Navigate to URL, wait for content, return text/structured data
  • extract_content: CSS selector-based content extraction from a page
  • take_screenshot: Capture page screenshot, store in memory/files/
  • fill_form: Fill form fields by selector and submit
  • All tools require SYSTEM permission level
  • Resource limits: 30-second maximum page load time, at most 5 concurrent browsers
  • Sandbox: Playwright runs in headless mode with restricted permissions
  • URL allowlist/blocklist configurable via settings
  • At least 12 tests (mock browser interactions)
  • Documentation: assistant/docs/BROWSER_AUTOMATION.md
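A minimal sketch of how browse_url might combine the limits above, assuming Playwright's async API and Python 3.10+. The class layout, return shape, and constant names are illustrative assumptions, not the final design of assistant/server/services/browser.py.

```python
import asyncio

# Limits taken from the acceptance criteria; the constant names are assumptions.
PAGE_LOAD_TIMEOUT_MS = 30_000
MAX_CONCURRENT_BROWSERS = 5


class BrowserService:
    """Hypothetical sketch of the service; not a definitive implementation."""

    def __init__(self, max_concurrent: int = MAX_CONCURRENT_BROWSERS) -> None:
        # The semaphore caps how many browsers run at the same time.
        self._slots = asyncio.Semaphore(max_concurrent)

    async def browse_url(self, url: str) -> dict:
        # Imported lazily so this module still loads where Playwright is absent.
        from playwright.async_api import async_playwright

        async with self._slots:
            async with async_playwright() as p:
                browser = await p.chromium.launch(headless=True)
                try:
                    # A fresh context per request keeps cookies isolated.
                    context = await browser.new_context()
                    page = await context.new_page()
                    await page.goto(
                        url, timeout=PAGE_LOAD_TIMEOUT_MS, wait_until="load"
                    )
                    return {
                        "url": page.url,
                        "title": await page.title(),
                        "text": await page.inner_text("body"),
                    }
                finally:
                    # Closing the browser also disposes of its contexts.
                    await browser.close()
```

Launching a browser per call keeps the sketch simple. A real service would more likely reuse one browser and create only a new context per request.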

Technical Notes

  • Playwright is preferred over Selenium (faster, better API, async support)
  • Headless mode only (no visible browser window)
  • Browser context isolated per request (no cookie leakage)
  • Screenshots stored with metadata for future reference
  • Consider page content caching to reduce redundant requests
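The caching note above could be sketched as a small in-memory cache with a time-to-live. The PageCache name, the 300-second default, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """In-memory TTL cache keyed by URL (hypothetical helper)."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Optional[str]:
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, content = entry
        if self._clock() - stored_at >= self._ttl:
            # Entry is stale: drop it and report a miss.
            del self._entries[url]
            return None
        return content

    def put(self, url: str, content: str) -> None:
        # Record the content together with the time it was stored.
        self._entries[url] = (self._clock(), content)
```

A caller would check get(url) before opening a browser and call put(url, text) after a successful load. Pages that change per user or per session probably should not be cached this way.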

Security Considerations

  • SYSTEM permission required (browser can access local network)
  • URL filtering to prevent accessing sensitive internal services
  • No credential auto-fill (user must explicitly provide form data)
  • Rate limiting to prevent abuse
  • Browser sandbox limits filesystem access by page content (file:// URLs still need to be rejected by the URL filter)
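The URL filtering item could look roughly like the check below, using only the standard library. The function name and the allowlist/blocklist parameters are hypothetical stand-ins for whatever the settings end up exposing.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url, allowlist=(), blocklist=()):
    """Return True if the URL passes the scheme, host, and list checks.

    allowlist and blocklist hold hostnames; an empty allowlist means
    "any host that is not otherwise rejected". Both are assumed settings.
    """
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    host = (parts.hostname or "").lower()
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False  # rejects file://, javascript:, and host-less URLs
    if host in blocklist or host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name rather than an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False  # internal, loopback, or link-local address
    if allowlist and host not in allowlist:
        return False
    return True
```

This only inspects the URL string. A hostname that resolves to a private address would still pass, so a production filter would likely also need to check the resolved IP at connection time.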

Priority Rationale

MEDIUM: Powerful capability but complex to implement safely. Should be built after the core accessibility and memory features are solid.

Phase

Phase 8: Always-On Partner

Dependencies

  • None (uses existing permission system)

Related

  • Phase 8 ADR: planner_iteration/decisions/ADR-004-phase8-always-on-partner.md
