diff --git a/CHANGELOG.md b/CHANGELOG.md index 9328ab0a9..1ddeb9e78 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,6 +13,7 @@ ### Bug Fixes - Fixed **Lightpanda engine** compatibility (#1050) + - Fixed **Helium auto-connect discovery** by scanning Helium user data directories on macOS and updating CDP guidance/help to treat Helium as a supported Chromium-based browser - Fixed **Windows daemon TCP bind** failing when Hyper-V reserves the port by falling back to an OS-assigned port and writing it to a `.port` file (#1041) - Fixed **Windows dashboard relay** using Unix socket instead of TCP (#1038) - Fixed **radio/checkbox elements** being dropped from compact snapshot tree because the `ref=` check required a leading `[` that those elements lack (#1008) diff --git a/README.md b/README.md index 19b10f31c..d2395f28e 100644 --- a/README.md +++ b/README.md @@ -373,19 +373,21 @@ agent-browser provides multiple ways to persist login sessions so you don't re-a |----------|----------|------------| | **Persistent profile** | Full browser state (cookies, IndexedDB, service workers, cache) across restarts | `--profile ` / `AGENT_BROWSER_PROFILE` | | **Session persistence** | Auto-save/restore cookies + localStorage by name | `--session-name ` / `AGENT_BROWSER_SESSION_NAME` | -| **Import from your browser** | Grab auth from a Chrome session you already logged into | `--auto-connect` + `state save` | +| **Import from your browser** | Grab auth from a Chrome/Chromium-family browser session you already logged into | `--auto-connect` + `state save` | | **State file** | Load a previously saved state JSON on launch | `--state ` / `AGENT_BROWSER_STATE` | | **Auth vault** | Store credentials locally (encrypted), login by name | `auth save` / `auth login` | ### Import auth from your browser -If you are already logged in to a site in Chrome, you can grab that auth state and reuse it: +If you are already logged in to a site in Chrome, Chromium, Brave, or Helium, you can grab that auth state and reuse it: ```bash -# 1. Launch Chrome with remote debugging enabled +# 1. Launch a Chromium-based browser with remote debugging enabled # macOS: "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 -# Or use --auto-connect to discover an already-running Chrome +# Helium: +open -na "Helium" --args --remote-debugging-port=9222 +# Or use --auto-connect to discover an already-running browser # 2. Connect and save the authenticated state agent-browser --auto-connect state save ./my-auth.json @@ -584,7 +586,7 @@ This is useful for multimodal AI models that can reason about visual layout, unl | `--screenshot-format ` | Screenshot format: `png`, `jpeg` (or `AGENT_BROWSER_SCREENSHOT_FORMAT` env) | | `--headed` | Show browser window (not headless) (or `AGENT_BROWSER_HEADED` env) | | `--cdp ` | Connect via Chrome DevTools Protocol (port or WebSocket URL) | -| `--auto-connect` | Auto-discover and connect to running Chrome (or `AGENT_BROWSER_AUTO_CONNECT` env) | +| `--auto-connect` | Auto-discover and connect to a running Chromium-based browser, including Helium (or `AGENT_BROWSER_AUTO_CONNECT` env) | | `--color-scheme ` | Color scheme: `dark`, `light`, `no-preference` (or `AGENT_BROWSER_COLOR_SCHEME` env) | | `--download-path ` | Default download directory (or `AGENT_BROWSER_DOWNLOAD_PATH` env) | | `--content-boundaries` | Wrap page output in boundary markers for LLM safety (or `AGENT_BROWSER_CONTENT_BOUNDARIES` env) | @@ -926,10 +928,10 @@ This enables control of: ### Auto-Connect -Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port: +Use `--auto-connect` to automatically discover and connect to a running Chromium-based browser without specifying a port: ```bash -# Auto-discover running Chrome with remote debugging +# Auto-discover a running Chromium-based browser with remote debugging agent-browser --auto-connect open example.com agent-browser --auto-connect snapshot @@ -937,17 +939,17 @@ agent-browser --auto-connect snapshot AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot ``` -Auto-connect discovers Chrome by: +Auto-connect discovers browsers by: -1. Reading Chrome's `DevToolsActivePort` file from the default user data directory +1. Reading `DevToolsActivePort` from the default user data directories for Chrome, Chrome Canary, Chromium, Brave, and Helium 2. Falling back to probing common debugging ports (9222, 9229) 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection This is useful when: -- Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port) +- Chrome 144+ or another Chromium-based browser has remote debugging enabled with a dynamic port - You want a zero-configuration connection to your existing browser -- You don't want to track which port Chrome is using +- You don't want to track which port the browser is using ## Streaming (Browser Preview) diff --git a/cli/src/native/cdp/chrome.rs b/cli/src/native/cdp/chrome.rs index d454e18b9..0c91e174c 100644 --- a/cli/src/native/cdp/chrome.rs +++ b/cli/src/native/cdp/chrome.rs @@ -553,7 +553,9 @@ pub async fn auto_connect_cdp() -> Result { } } - Err("No running Chrome instance found. Launch Chrome with --remote-debugging-port or use --cdp.".to_string()) + Err( + "No running Chromium-based browser instance found. Launch Chrome, Chromium, Brave, or Helium with --remote-debugging-port or use --cdp.".to_string(), + ) } fn is_port_reachable(port: u16) -> bool { @@ -574,6 +576,8 @@ fn get_chrome_user_data_dirs() -> Vec { "Google/Chrome Canary", "Chromium", "BraveSoftware/Brave-Browser", + "net.imput.helium", + "net.imput.helium-cdp", ] { dirs.push(base.join(name)); } @@ -1050,4 +1054,21 @@ mod tests { assert!(!dir.exists(), "Temp dir should be cleaned up on drop"); } + + #[cfg(target_os = "macos")] + #[test] + fn test_get_chrome_user_data_dirs_includes_helium() { + let dirs = get_chrome_user_data_dirs(); + let paths: Vec = dirs + .iter() + .map(|p| p.to_string_lossy().to_string()) + .collect(); + + assert!(paths + .iter() + .any(|p| p.ends_with("Library/Application Support/net.imput.helium"))); + assert!(paths + .iter() + .any(|p| p.ends_with("Library/Application Support/net.imput.helium-cdp"))); + } } diff --git a/cli/src/output.rs b/cli/src/output.rs index e357cff66..6a450bc29 100644 --- a/cli/src/output.rs +++ b/cli/src/output.rs @@ -2791,7 +2791,7 @@ Authentication: (or AGENT_BROWSER_SESSION_NAME env) --state Load saved auth state (cookies + storage) from JSON file (or AGENT_BROWSER_STATE env) - --auto-connect Connect to a running Chrome to reuse its auth state + --auto-connect Connect to a running Chromium-based browser to reuse auth state Tip: agent-browser --auto-connect state save ./auth.json --headers HTTP headers scoped to URL's origin (e.g., Authorization bearer token) @@ -2863,7 +2863,7 @@ Environment: AGENT_BROWSER_DEBUG Debug output AGENT_BROWSER_IGNORE_HTTPS_ERRORS Ignore HTTPS certificate errors AGENT_BROWSER_PROVIDER Browser provider (ios, browserbase, kernel, browseruse, browserless) - AGENT_BROWSER_AUTO_CONNECT Auto-discover and connect to running Chrome + AGENT_BROWSER_AUTO_CONNECT Auto-discover and connect to a running Chromium-based browser AGENT_BROWSER_ALLOW_FILE_ACCESS Allow file:// URLs to access local files AGENT_BROWSER_COLOR_SCHEME Color scheme preference (dark, light, no-preference) AGENT_BROWSER_DOWNLOAD_PATH Default download directory for browser downloads @@ -2906,7 +2906,7 @@ Examples: agent-browser screenshot --annotate # Labeled screenshot for vision models agent-browser wait --load networkidle # Wait for slow pages to load agent-browser --cdp 9222 snapshot # Connect via CDP port - agent-browser --auto-connect snapshot # Auto-discover running Chrome + agent-browser --auto-connect snapshot # Auto-discover running Chrome/Chromium/Brave/Helium agent-browser stream enable # Start runtime streaming on an auto-selected port agent-browser stream status # Inspect runtime streaming state agent-browser --color-scheme dark open example.com # Dark mode diff --git a/docs/src/app/cdp-mode/page.mdx b/docs/src/app/cdp-mode/page.mdx index 6cc80105b..8a7df3196 100644 --- a/docs/src/app/cdp-mode/page.mdx +++ b/docs/src/app/cdp-mode/page.mdx @@ -4,6 +4,7 @@ Connect to an existing browser via Chrome DevTools Protocol: ```bash # Start Chrome with: google-chrome --remote-debugging-port=9222 +# Or on macOS: open -na "Helium" --args --remote-debugging-port=9222 # Connect once, then run commands without --cdp agent-browser connect 9222 @@ -34,10 +35,10 @@ The `--cdp` flag accepts either: ## Auto-Connect -Use `--auto-connect` to automatically discover and connect to a running Chrome instance without specifying a port: +Use `--auto-connect` to automatically discover and connect to a running Chromium-based browser without specifying a port: ```bash -# Auto-discover running Chrome with remote debugging +# Auto-discover a running Chromium-based browser with remote debugging agent-browser --auto-connect open example.com agent-browser --auto-connect snapshot @@ -45,17 +46,17 @@ agent-browser --auto-connect snapshot AGENT_BROWSER_AUTO_CONNECT=1 agent-browser snapshot ``` -Auto-connect discovers Chrome by: +Auto-connect discovers browsers by: -1. Reading Chrome's `DevToolsActivePort` file from the default user data directory +1. Reading `DevToolsActivePort` from the default user data directories for Chrome, Chrome Canary, Chromium, Brave, and Helium 2. Falling back to probing common debugging ports (9222, 9229) 3. If HTTP-based discovery (`/json/version`, `/json/list`) fails, falling back to a direct WebSocket connection This is useful when: -- Chrome 144+ has remote debugging enabled via `chrome://inspect/#remote-debugging` (which uses a dynamic port) +- Chrome 144+ or another Chromium-based browser has remote debugging enabled with a dynamic port - You want a zero-configuration connection to your existing browser -- You don't want to track which port Chrome is using +- You don't want to track which port the browser is using ## Color scheme @@ -77,7 +78,7 @@ AGENT_BROWSER_COLOR_SCHEME=dark agent-browser --cdp 9222 open https://example.co This enables control of: - Electron apps -- Chrome/Chromium with remote debugging +- Chrome/Chromium/Brave/Helium with remote debugging - WebView2 applications - Remote browser services (via WebSocket URL) - Any browser exposing a CDP endpoint @@ -103,7 +104,7 @@ This enables control of: --exactExact text match --headedShow browser window {"--cdp "}CDP connection (port or WebSocket URL) - --auto-connectAuto-discover and connect to running Chrome + --auto-connectAuto-discover and connect to a running Chromium-based browser, including Helium --color-scheme <scheme>Persistent color scheme (dark, light, no-preference) --debugDebug output diff --git a/docs/src/app/commands/page.mdx b/docs/src/app/commands/page.mdx index a3e1d05ac..7448d251c 100644 --- a/docs/src/app/commands/page.mdx +++ b/docs/src/app/commands/page.mdx @@ -368,7 +368,7 @@ agent-browser reload # Reload page --screenshot-format # Format: png (default), jpeg (or AGENT_BROWSER_SCREENSHOT_FORMAT) --headed # Show browser window (not headless) --cdp # Connect via Chrome DevTools Protocol (port or WebSocket URL) ---auto-connect # Auto-discover and connect to running Chrome +--auto-connect # Auto-discover and connect to a running Chromium-based browser, including Helium --color-scheme # Color scheme: dark, light, no-preference --download-path # Default download directory --content-boundaries # Wrap page output in boundary markers for LLM safety diff --git a/docs/src/app/configuration/page.mdx b/docs/src/app/configuration/page.mdx index 36ca99047..e084e527c 100644 --- a/docs/src/app/configuration/page.mdx +++ b/docs/src/app/configuration/page.mdx @@ -163,7 +163,7 @@ These environment variables configure additional daemon and runtime behavior: VariableDescriptionDefault - AGENT_BROWSER_AUTO_CONNECTAuto-discover and connect to a running Chrome instance.(disabled) + AGENT_BROWSER_AUTO_CONNECTAuto-discover and connect to a running Chromium-based browser, including Helium.(disabled) AGENT_BROWSER_ALLOW_FILE_ACCESSAllow file:// URLs to access local files.(disabled) AGENT_BROWSER_COLOR_SCHEMEColor scheme preference (dark, light, no-preference).(none) AGENT_BROWSER_DOWNLOAD_PATHDefault directory for browser downloads.(temp directory) diff --git a/docs/src/app/sessions/page.mdx b/docs/src/app/sessions/page.mdx index 0801feeb3..d00de43c5 100644 --- a/docs/src/app/sessions/page.mdx +++ b/docs/src/app/sessions/page.mdx @@ -55,19 +55,22 @@ The profile directory stores: ## Import auth from your browser -If you are already logged in to a site in Chrome, you can grab that auth state and reuse it in agent-browser. This is the fastest way to bypass login flows, OAuth, SSO, or 2FA. +If you are already logged in to a site in Chrome, Chromium, Brave, or Helium, you can grab that auth state and reuse it in agent-browser. This is the fastest way to bypass login flows, OAuth, SSO, or 2FA. -**Step 1:** Start Chrome with remote debugging: +**Step 1:** Start a Chromium-based browser with remote debugging: ```bash # macOS "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 +# macOS (Helium) +open -na "Helium" --args --remote-debugging-port=9222 + # Linux google-chrome --remote-debugging-port=9222 ``` -Log in to your target site(s) in this Chrome window. +Log in to your target site(s) in that browser window. `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect. Only use on trusted machines and close Chrome when done. diff --git a/skills/agent-browser/SKILL.md b/skills/agent-browser/SKILL.md index a0979e693..331d0d843 100644 --- a/skills/agent-browser/SKILL.md +++ b/skills/agent-browser/SKILL.md @@ -53,7 +53,7 @@ When automating a site that requires login, choose the approach that fits: **Option 1: Import auth from the user's browser (fastest for one-off tasks)** ```bash -# Connect to the user's running Chrome (they're already logged in) +# Connect to the user's running Chromium-based browser (they're already logged in) agent-browser --auto-connect state save ./auth.json # Use that auth state agent-browser --state ./auth.json open https://app.example.com/dashboard @@ -345,7 +345,7 @@ agent-browser session list ### Connect to Existing Chrome ```bash -# Auto-discover running Chrome with remote debugging enabled +# Auto-discover a running Chromium-based browser with remote debugging enabled agent-browser --auto-connect open https://example.com agent-browser --auto-connect snapshot @@ -353,7 +353,7 @@ agent-browser --auto-connect snapshot agent-browser --cdp 9222 snapshot ``` -Auto-connect discovers Chrome via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails. +Auto-connect discovers Chrome, Chromium, Brave, and Helium via `DevToolsActivePort`, common debugging ports (9222, 9229), and falls back to a direct WebSocket connection if HTTP-based CDP discovery fails. ### Color Scheme (Dark Mode) diff --git a/skills/agent-browser/references/authentication.md b/skills/agent-browser/references/authentication.md index 89f4788a1..631ad4ba3 100644 --- a/skills/agent-browser/references/authentication.md +++ b/skills/agent-browser/references/authentication.md @@ -21,14 +21,17 @@ Login flows, session persistence, OAuth, 2FA, and authenticated browsing. ## Import Auth from Your Browser -The fastest way to authenticate is to reuse cookies from a Chrome session you are already logged into. +The fastest way to authenticate is to reuse cookies from a Chromium-based browser session you are already logged into. -**Step 1: Start Chrome with remote debugging** +**Step 1: Start a Chromium-based browser with remote debugging** ```bash # macOS "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222 +# macOS (Helium) +open -na "Helium" --args --remote-debugging-port=9222 + # Linux google-chrome --remote-debugging-port=9222 @@ -43,7 +46,7 @@ Log in to your target site(s) in this Chrome window as you normally would. **Step 2: Grab the auth state** ```bash -# Auto-discover the running Chrome and save its cookies + localStorage +# Auto-discover the running browser and save its cookies + localStorage agent-browser --auto-connect state save ./my-auth.json ``` @@ -58,7 +61,7 @@ agent-browser state load ./my-auth.json agent-browser open https://app.example.com/dashboard ``` -This works for any site, including those with complex OAuth flows, SSO, or 2FA -- as long as Chrome already has valid session cookies. +This works for any site, including those with complex OAuth flows, SSO, or 2FA -- as long as the browser already has valid session cookies. > **Security note:** State files contain session tokens in plaintext. Add them to `.gitignore`, delete when no longer needed, and set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest. See [Security Best Practices](#security-best-practices).