Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
629 changes: 223 additions & 406 deletions .agents/skills/agent-browser/SKILL.md

Large diffs are not rendered by default.

16 changes: 1 addition & 15 deletions .agents/skills/agent-browser/references/authentication.md
Original file line number Diff line number Diff line change
@@ -1,20 +1,6 @@
# Authentication Patterns

Login flows, session persistence, OAuth, 2FA, and authenticated browsing.

**Related**: [session-management.md](session-management.md) for state persistence details, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [Basic Login Flow](#basic-login-flow)
- [Saving Authentication State](#saving-authentication-state)
- [Restoring Authentication](#restoring-authentication)
- [OAuth / SSO Flows](#oauth--sso-flows)
- [Two-Factor Authentication](#two-factor-authentication)
- [HTTP Basic Auth](#http-basic-auth)
- [Cookie-Based Auth](#cookie-based-auth)
- [Token Refresh Handling](#token-refresh-handling)
- [Security Best Practices](#security-best-practices)
Patterns for handling login flows, session persistence, and authenticated browsing.

## Basic Login Flow

Expand Down
29 changes: 5 additions & 24 deletions .agents/skills/agent-browser/references/proxy-support.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,13 @@
# Proxy Support

Proxy configuration for geo-testing, rate limiting avoidance, and corporate environments.

**Related**: [commands.md](commands.md) for global options, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [Basic Proxy Configuration](#basic-proxy-configuration)
- [Authenticated Proxy](#authenticated-proxy)
- [SOCKS Proxy](#socks-proxy)
- [Proxy Bypass](#proxy-bypass)
- [Common Use Cases](#common-use-cases)
- [Verifying Proxy Connection](#verifying-proxy-connection)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
Configure proxy servers for browser automation, useful for geo-testing, rate limiting avoidance, and corporate environments.

## Basic Proxy Configuration

Use the `--proxy` flag or set proxy via environment variable:
Set proxy via environment variable before starting:

```bash
# Via CLI flag
agent-browser --proxy "http://proxy.example.com:8080" open https://example.com

# Via environment variable
# HTTP proxy
export HTTP_PROXY="http://proxy.example.com:8080"
agent-browser open https://example.com

Expand Down Expand Up @@ -61,13 +45,10 @@ agent-browser open https://example.com

## Proxy Bypass

Skip proxy for specific domains using `--proxy-bypass` or `NO_PROXY`:
Skip proxy for specific domains:

```bash
# Via CLI flag
agent-browser --proxy "http://proxy.example.com:8080" --proxy-bypass "localhost,*.internal.com" open https://example.com

# Via environment variable
# Bypass proxy for local addresses
export NO_PROXY="localhost,127.0.0.1,.internal.company.com"
agent-browser open https://internal.company.com # Direct connection
agent-browser open https://external.com # Via proxy
Expand Down
14 changes: 1 addition & 13 deletions .agents/skills/agent-browser/references/session-management.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,6 @@
# Session Management

Multiple isolated browser sessions with state persistence and concurrent browsing.

**Related**: [authentication.md](authentication.md) for login patterns, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [Named Sessions](#named-sessions)
- [Session Isolation Properties](#session-isolation-properties)
- [Session State Persistence](#session-state-persistence)
- [Common Patterns](#common-patterns)
- [Default Session](#default-session)
- [Session Cleanup](#session-cleanup)
- [Best Practices](#best-practices)
Run multiple isolated browser sessions concurrently with state persistence.

## Named Sessions

Expand Down
34 changes: 13 additions & 21 deletions .agents/skills/agent-browser/references/snapshot-refs.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,21 @@
# Snapshot and Refs
# Snapshot + Refs Workflow

Compact element references that reduce context usage dramatically for AI agents.
The core innovation of agent-browser: compact element references that reduce context usage dramatically for AI agents.

**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
## How It Works

## Contents

- [How Refs Work](#how-refs-work)
- [Snapshot Command](#the-snapshot-command)
- [Using Refs](#using-refs)
- [Ref Lifecycle](#ref-lifecycle)
- [Best Practices](#best-practices)
- [Ref Notation Details](#ref-notation-details)
- [Troubleshooting](#troubleshooting)

## How Refs Work

Traditional approach:
### The Problem
Traditional browser automation sends full DOM to AI agents:
```
Full DOM/HTML → AI parses → CSS selector → Action (~3000-5000 tokens)
Full DOM/HTML sent → AI parses → Generates CSS selector → Executes action
~3000-5000 tokens per interaction
```

agent-browser approach:
### The Solution
agent-browser uses compact snapshots with refs:
```
Compact snapshot → @refs assigned → Direct interaction (~200-400 tokens)
Compact snapshot → @refs assigned → Direct ref interaction
~200-400 tokens per interaction
```

## The Snapshot Command
Expand Down Expand Up @@ -174,8 +166,8 @@ agent-browser snapshot -i
### Element Not Visible in Snapshot

```bash
# Scroll down to reveal element
agent-browser scroll down 1000
# Scroll to reveal element
agent-browser scroll --bottom
agent-browser snapshot -i

# Or wait for dynamic content
Expand Down
13 changes: 1 addition & 12 deletions .agents/skills/agent-browser/references/video-recording.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,6 @@
# Video Recording

Capture browser automation as video for debugging, documentation, or verification.

**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [Basic Recording](#basic-recording)
- [Recording Commands](#recording-commands)
- [Use Cases](#use-cases)
- [Best Practices](#best-practices)
- [Output Format](#output-format)
- [Limitations](#limitations)
Capture browser automation sessions as video for debugging, documentation, or verification.

## Basic Recording

Expand Down
86 changes: 36 additions & 50 deletions .agents/skills/agent-browser/templates/authenticated-session.sh
Original file line number Diff line number Diff line change
@@ -1,81 +1,67 @@
#!/bin/bash
# Template: Authenticated Session Workflow
# Purpose: Login once, save state, reuse for subsequent runs
# Usage: ./authenticated-session.sh <login-url> [state-file]
# Login once, save state, reuse for subsequent runs
#
# RECOMMENDED: Use the auth vault instead of this template:
# echo "<pass>" | agent-browser auth save myapp --url <login-url> --username <user> --password-stdin
# agent-browser auth login myapp
# The auth vault stores credentials securely and the LLM never sees passwords.
# Usage:
# ./authenticated-session.sh <login-url> [state-file]
#
# Environment variables:
# APP_USERNAME - Login username/email
# APP_PASSWORD - Login password
#
# Two modes:
# 1. Discovery mode (default): Shows form structure so you can identify refs
# 2. Login mode: Performs actual login after you update the refs
#
# Setup steps:
# 1. Run once to see form structure (discovery mode)
# 2. Update refs in LOGIN FLOW section below
# 3. Set APP_USERNAME and APP_PASSWORD
# 4. Delete the DISCOVERY section
# Setup:
# 1. Run once to see your form structure
# 2. Note the @refs for your fields
# 3. Uncomment LOGIN FLOW section and update refs

set -euo pipefail

LOGIN_URL="${1:?Usage: $0 <login-url> [state-file]}"
STATE_FILE="${2:-./auth-state.json}"

echo "Authentication workflow: $LOGIN_URL"
echo "Authentication workflow for: $LOGIN_URL"

# ================================================================
# SAVED STATE: Skip login if valid saved state exists
# ================================================================
# ══════════════════════════════════════════════════════════════
# SAVED STATE: Skip login if we have valid saved state
# ══════════════════════════════════════════════════════════════
if [[ -f "$STATE_FILE" ]]; then
echo "Loading saved state from $STATE_FILE..."
if agent-browser --state "$STATE_FILE" open "$LOGIN_URL" 2>/dev/null; then
agent-browser wait --load networkidle
echo "Loading saved authentication state..."
agent-browser state load "$STATE_FILE"
agent-browser open "$LOGIN_URL"
agent-browser wait --load networkidle

CURRENT_URL=$(agent-browser get url)
if [[ "$CURRENT_URL" != *"login"* ]] && [[ "$CURRENT_URL" != *"signin"* ]]; then
echo "Session restored successfully"
agent-browser snapshot -i
exit 0
fi
echo "Session expired, performing fresh login..."
agent-browser close 2>/dev/null || true
else
echo "Failed to load state, re-authenticating..."
CURRENT_URL=$(agent-browser get url)
if [[ "$CURRENT_URL" != *"login"* ]] && [[ "$CURRENT_URL" != *"signin"* ]]; then
echo "Session restored successfully!"
agent-browser snapshot -i
exit 0
fi
echo "Session expired, performing fresh login..."
rm -f "$STATE_FILE"
fi

# ================================================================
# DISCOVERY MODE: Shows form structure (delete after setup)
# ================================================================
# ══════════════════════════════════════════════════════════════
# DISCOVERY MODE: Show form structure (remove after setup)
# ══════════════════════════════════════════════════════════════
echo "Opening login page..."
agent-browser open "$LOGIN_URL"
agent-browser wait --load networkidle

echo ""
echo "Login form structure:"
echo "---"
echo "┌─────────────────────────────────────────────────────────┐"
echo "│ LOGIN FORM STRUCTURE │"
echo "├─────────────────────────────────────────────────────────┤"
agent-browser snapshot -i
echo "---"
echo "└─────────────────────────────────────────────────────────┘"
echo ""
echo "Next steps:"
echo " 1. Note the refs: username=@e?, password=@e?, submit=@e?"
echo " 2. Update the LOGIN FLOW section below with your refs"
echo " 3. Set: export APP_USERNAME='...' APP_PASSWORD='...'"
echo " 1. Note refs: @e? = username, @e? = password, @e? = submit"
echo " 2. Uncomment LOGIN FLOW section below"
echo " 3. Replace @e1, @e2, @e3 with your refs"
echo " 4. Delete this DISCOVERY MODE section"
echo ""
agent-browser close
exit 0

# ================================================================
# ══════════════════════════════════════════════════════════════
# LOGIN FLOW: Uncomment and customize after discovery
# ================================================================
# ══════════════════════════════════════════════════════════════
# : "${APP_USERNAME:?Set APP_USERNAME environment variable}"
# : "${APP_PASSWORD:?Set APP_PASSWORD environment variable}"
#
Expand All @@ -92,14 +78,14 @@ exit 0
# # Verify login succeeded
# FINAL_URL=$(agent-browser get url)
# if [[ "$FINAL_URL" == *"login"* ]] || [[ "$FINAL_URL" == *"signin"* ]]; then
# echo "Login failed - still on login page"
# echo "ERROR: Login failed - still on login page"
# agent-browser screenshot /tmp/login-failed.png
# agent-browser close
# exit 1
# fi
#
# # Save state for future runs
# echo "Saving state to $STATE_FILE"
# echo "Saving authentication state to: $STATE_FILE"
# agent-browser state save "$STATE_FILE"
# echo "Login successful"
# echo "Login successful!"
# agent-browser snapshot -i
71 changes: 35 additions & 36 deletions .agents/skills/agent-browser/templates/capture-workflow.sh
Original file line number Diff line number Diff line change
@@ -1,69 +1,68 @@
#!/bin/bash
# Template: Content Capture Workflow
# Purpose: Extract content from web pages (text, screenshots, PDF)
# Usage: ./capture-workflow.sh <url> [output-dir]
#
# Outputs:
# - page-full.png: Full page screenshot
# - page-structure.txt: Page element structure with refs
# - page-text.txt: All text content
# - page.pdf: PDF version
#
# Optional: Load auth state for protected pages
# Extract content from web pages with optional authentication

set -euo pipefail

TARGET_URL="${1:?Usage: $0 <url> [output-dir]}"
OUTPUT_DIR="${2:-.}"

echo "Capturing: $TARGET_URL"
echo "Capturing content from: $TARGET_URL"
mkdir -p "$OUTPUT_DIR"

# Optional: Load authentication state
# Optional: Load authentication state if needed
# if [[ -f "./auth-state.json" ]]; then
# echo "Loading authentication state..."
# agent-browser state load "./auth-state.json"
# fi

# Navigate to target
# Navigate to target page
agent-browser open "$TARGET_URL"
agent-browser wait --load networkidle

# Get metadata
TITLE=$(agent-browser get title)
URL=$(agent-browser get url)
echo "Title: $TITLE"
echo "URL: $URL"
# Get page metadata
echo "Page title: $(agent-browser get title)"
echo "Page URL: $(agent-browser get url)"

# Capture full page screenshot
agent-browser screenshot --full "$OUTPUT_DIR/page-full.png"
echo "Saved: $OUTPUT_DIR/page-full.png"
echo "Screenshot saved: $OUTPUT_DIR/page-full.png"

# Get page structure with refs
# Get page structure
agent-browser snapshot -i > "$OUTPUT_DIR/page-structure.txt"
echo "Saved: $OUTPUT_DIR/page-structure.txt"
echo "Structure saved: $OUTPUT_DIR/page-structure.txt"

# Extract all text content
# Extract main content
# Adjust selector based on target site structure
# agent-browser get text @e1 > "$OUTPUT_DIR/main-content.txt"

# Extract specific elements (uncomment as needed)
# agent-browser get text "article" > "$OUTPUT_DIR/article.txt"
# agent-browser get text "main" > "$OUTPUT_DIR/main.txt"
# agent-browser get text ".content" > "$OUTPUT_DIR/content.txt"

# Get full page text
agent-browser get text body > "$OUTPUT_DIR/page-text.txt"
echo "Saved: $OUTPUT_DIR/page-text.txt"
echo "Text content saved: $OUTPUT_DIR/page-text.txt"

# Save as PDF
# Optional: Save as PDF
agent-browser pdf "$OUTPUT_DIR/page.pdf"
echo "Saved: $OUTPUT_DIR/page.pdf"

# Optional: Extract specific elements using refs from structure
# agent-browser get text @e5 > "$OUTPUT_DIR/main-content.txt"
echo "PDF saved: $OUTPUT_DIR/page.pdf"

# Optional: Handle infinite scroll pages
# for i in {1..5}; do
# agent-browser scroll down 1000
# agent-browser wait 1000
# done
# agent-browser screenshot --full "$OUTPUT_DIR/page-scrolled.png"
# Optional: Capture with scrolling for infinite scroll pages
# scroll_and_capture() {
# local count=0
# while [[ $count -lt 5 ]]; do
# agent-browser scroll down 1000
# agent-browser wait 1000
# ((count++))
# done
# agent-browser screenshot --full "$OUTPUT_DIR/page-scrolled.png"
# }
# scroll_and_capture

# Cleanup
agent-browser close

echo ""
echo "Capture complete:"
echo "Capture complete! Files saved to: $OUTPUT_DIR"
ls -la "$OUTPUT_DIR"
Loading
Loading