Stagehand agent improvements #1094

tkattkat · 2025-09-23T22:41:17Z

why

This PR enhances the Stagehand agent with model routing, an expanded toolset, and more robust context management to improve performance and reliability across different LLM providers.

what changed

Model Routing

Model-specific tool filtering: Tools are now dynamically included/excluded based on the model being used
Anthropic-optimized toolset: When using Claude models with storeActions: false, enables specialized tools for better performance
Custom system prompts: Different system prompts are applied based on the model to optimize behavior

New Tools Added

Anthropic-Specific Tools (enabled when storeActions: false)

clickAndHold: Coordinate-based click-and-hold
type: Coordinate-based typing
click: Coordinate-based clicking
dragAndDrop: Coordinate-based drag and drop

Model-Agnostic Tools

think: Allows the agent to reason through problems before acting
keys: Keyboard input handling for complex key combinations
search: Web search capability (auto-enabled when EXA_API_KEY is provided)
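
A minimal sketch of how the tool routing described above could work. The names `selectTools` and `RoutingOptions`, and the substring check used to detect a Claude model, are illustrative assumptions and not the implementation in this PR.

```typescript
// Illustrative sketch only: these names are hypothetical, not Stagehand's API.
type ToolName =
  | "think"
  | "keys"
  | "search"
  | "click"
  | "type"
  | "clickAndHold"
  | "dragAndDrop";

interface RoutingOptions {
  modelName: string;
  storeActions: boolean;
  exaApiKey?: string;
}

const MODEL_AGNOSTIC_TOOLS: ToolName[] = ["think", "keys"];
const ANTHROPIC_TOOLS: ToolName[] = ["click", "type", "clickAndHold", "dragAndDrop"];

function selectTools(opts: RoutingOptions): ToolName[] {
  const tools: ToolName[] = [...MODEL_AGNOSTIC_TOOLS];
  // The search tool is only exposed when an Exa key is available.
  if (opts.exaApiKey) tools.push("search");
  // Assumption: a Claude model is detected by a substring match on its name.
  const isClaude = opts.modelName.toLowerCase().indexOf("claude") !== -1;
  // Coordinate-based tools are only added for Claude with storeActions: false.
  if (isClaude && !opts.storeActions) tools.push(...ANTHROPIC_TOOLS);
  return tools;
}
```

With this shape, a call such as `selectTools({ modelName: "claude-sonnet-4-5-20250929", storeActions: false })` would include the coordinate-based tools but omit `search`, since no key was passed.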

Enhanced Context Management

Image optimization: Automatically removes old images, keeping only the 2 most recent
A11y tree management: Maintains only the 2 most recent accessibility trees
Checkpointing: Creates conversation summaries every 25 tool calls
Token-based summarization: When context exceeds 120,000 tokens, automatically summarizes content
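
The pruning and summarization rules above can be sketched roughly as follows. The `Message` shape and the helper names are hypothetical, and the token count is assumed to come from some external estimator; only the thresholds (2 items, 25 tool calls, 120,000 tokens) are taken from this description.

```typescript
// Illustrative sketch only: the Message shape and helper names are hypothetical.
interface Message {
  kind: "text" | "image" | "a11yTree";
  content: string;
}

// Keep only the `keep` most recent messages of the given kind; leave others untouched.
function pruneByKind(messages: Message[], kind: Message["kind"], keep: number): Message[] {
  let seen = 0;
  const result: Message[] = [];
  // Walk from newest to oldest so the most recent items are the ones retained.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.kind === kind) {
      seen++;
      if (seen > keep) continue;
    }
    result.push(m);
  }
  return result.reverse();
}

// Checkpoint (summarize the conversation) every 25 tool calls.
function shouldCheckpoint(toolCallCount: number): boolean {
  return toolCallCount > 0 && toolCallCount % 25 === 0;
}

// Summarize when the estimated context size exceeds 120,000 tokens.
function exceedsTokenBudget(estimatedTokens: number): boolean {
  return estimatedTokens > 120000;
}

// Apply both "2 most recent" rules in sequence.
function pruneContext(messages: Message[]): Message[] {
  return pruneByKind(pruneByKind(messages, "image", 2), "a11yTree", 2);
}
```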

Enhanced Type Safety

Discriminated union types: AgentToolCall and AgentToolResult provide complete type safety
Tool-specific typing: Each tool has strongly typed parameters and return values
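
For readers unfamiliar with the pattern, here is a small sketch of a discriminated union over tool calls. The type `ToolCallSketch` and its fields are hypothetical stand-ins and do not reproduce the actual `AgentToolCall` definition from this PR.

```typescript
// Illustrative sketch only: not the actual AgentToolCall type.
type ToolCallSketch =
  | { toolName: "click"; args: { x: number; y: number } }
  | { toolName: "type"; args: { x: number; y: number; text: string } }
  | { toolName: "think"; args: { reasoning: string } };

function describeCall(call: ToolCallSketch): string {
  // Switching on the discriminant narrows `call.args` to the matching shape.
  switch (call.toolName) {
    case "click":
      return `click at (${call.args.x}, ${call.args.y})`;
    case "type":
      return `type "${call.args.text}" at (${call.args.x}, ${call.args.y})`;
    case "think":
      return `think: ${call.args.reasoning}`;
    default: {
      // Compile-time exhaustiveness check: adding a new variant without a case fails here.
      const unreachable: never = call;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch is what makes the compiler flag any tool variant that is added to the union without a matching case.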

test plan

Tested locally
Tested on Browserbase
Tested with and without an Exa key to ensure the search tool is only present in the prompt and tools when the key is present
Tested with Claude 4 and with other models to ensure the system prompt and tools are routed correctly based on the model in use

…onfigu… (#1073) …ration settings

# why
Updated docs to match the new fingerprint params in the Browserbase docs here: https://docs.browserbase.com/guides/stealth-customization#customization-options

# what changed
Update browser configuration docs to reflect the docs changes.

# test plan

# why
Currently, Playwright's screenshot command is not available when the execution environment is Stagehand. A customer has indicated they would prefer to use Playwright's native screenshot command instead of CDP when using Browserbase, as the CDP screenshot causes unexpected behavior for their target site.

# what changed
- added a StagehandScreenshotOptions type with a useCDP argument
- extended the page type to accept custom Stagehand screenshot options
- updated the screenshot proxy to default useCDP to true if the env is browserbase, and to use the Playwright screenshot if useCDP is false
- added an eval for screenshot with and without CDP

# test plan
- tested and confirmed functionality with the eval and an external example script (not committed)
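
The defaulting rule for `useCDP` described in this change could be sketched as below. The function `resolveUseCDP`, the `EnvSketch` type, and the choice of `false` as the default outside Browserbase are assumptions for illustration; the commit message only states the Browserbase default.

```typescript
// Illustrative sketch only: names and the non-Browserbase default are assumptions.
type EnvSketch = "BROWSERBASE" | "LOCAL";

interface ScreenshotOptionsSketch {
  useCDP?: boolean;
}

function resolveUseCDP(env: EnvSketch, options: ScreenshotOptionsSketch = {}): boolean {
  // An explicit caller choice wins; otherwise default to CDP only on Browserbase.
  return options.useCDP ?? (env === "BROWSERBASE");
}
```

A caller on Browserbase who wants Playwright's native screenshot would pass `{ useCDP: false }` explicitly.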

…1057)

# why
We want to build a best-in-class agent in Stagehand. Therefore, we need more eval benchmarks.

# what changed
- Added Web-bench evals dataset
- Added a subset of OS World evals: those that can be run in a Chrome browser (desktop-based tasks omitted)
- Added LICENSE notices to the copied evals tasks
- Added ground truth / expected result to some WebVoyager tasks using reference_answer.json from the Browser Use public evals repo

Improvements to `pnpm run evals -man` to better describe how to run evals.

# test plan
Evals should run locally and on Browserbase for these new benchmarks.

# why
Initial instructions didn't mention uv or pip prerequisites and also didn't mention venv. Fix reduces friction on first timers.

# what changed
- added link to install uv
- added details for initializing venv
- adjusted code example respectively

# test plan
docs change

… with LanguageModelV1 + LiteLLM works for python (#1086)

# why
1. aisdk not yet available through npm package
2. customLLM provider only works with LanguageModelV1
3. LiteLLM compatible providers are supported in python

# what changed
1. change docs to install stagehand from git repo
2. pin versions that use LanguageModelV1

# test plan
local test

# why
Our existing screenshot service is a dummy, time-triggered service. It does not trigger based on any actions of the agent.

# what changed
- Added an image hash diff algorithm (quick check with MSE, verify with SSIM) to detect whether there was an actual UI change, and only store the screenshot in the buffer if so.
- Added a screenshot interceptor which copies each screenshot the agent takes to a buffer (if different enough from the previous screenshot) to be used later for evals.
- There's also a small refactor of the agent initialization config to enable the screenshot collector service to be attached.

# test plan
Tests pass locally

---------

Co-authored-by: Miguel <[email protected]>
Co-authored-by: miguel <[email protected]>
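
A rough sketch of the two-stage check (cheap MSE first, SSIM to confirm) described in this change. It assumes both screenshots are already decoded to equal-size grayscale byte arrays, uses a single global SSIM window rather than a sliding one, and the threshold values are placeholders; none of these details are taken from the actual implementation.

```typescript
// Illustrative sketch only: thresholds and the global (single-window) SSIM are assumptions.
function mse(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length === 0) throw new Error("images must be same non-zero size");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum / a.length;
}

function globalSsim(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length === 0) throw new Error("images must be same non-zero size");
  const n = a.length;
  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;
  // Standard SSIM stabilizing constants for 8-bit pixel values.
  const c1 = (0.01 * 255) ** 2;
  const c2 = (0.03 * 255) ** 2;
  return ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2));
}

function hasUiChanged(prev: Uint8Array, next: Uint8Array, mseThreshold = 30, ssimThreshold = 0.98): boolean {
  // Quick check: near-identical frames are rejected without computing SSIM.
  if (mse(prev, next) < mseThreshold) return false;
  // Verification: only count it as a change if structural similarity is low enough.
  return globalSsim(prev, next) < ssimThreshold;
}
```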

Why
Custom AI SDK tools and MCP integrations weren't working properly with Anthropic CUA: parameters were empty {} and tools weren't tracked.

What Changed
- Convert Zod schemas to JSON Schema before sending to Anthropic (using zodToJsonSchema)
- Track custom tool calls in the actions array
- Silence "Unknown tool name" warnings for custom tools

Test Plan
Tested with examples file.
- Parameters passed correctly ({"city":"San Francisco"} instead of {})
- Custom tools execute and appear in actions array
- No warnings

# why
Adding support for Gemini's new Computer Use model

# what changed
We partnered with Google Deepmind to help integrate and test their new Computer Use models.

<img width="1238" height="655" alt="Screenshot 2025-10-07 at 1 14 44 PM" src="https://github.com/user-attachments/assets/af0d854a-8e55-4937-a071-10335497f686" />

The new model tag `gemini-2.5-pro-computer-use-preview-10-2025` is available for Stagehand Agent. You can try it today with the example `cua-example.ts`.

To learn more, check out the blog post [https://www.browserbase.com/blog/evaluating-browser-agents](https://www.browserbase.com/blog/evaluating-browser-agents)

---------

Co-authored-by: tkattkat <[email protected]>
Co-authored-by: Kylejeong2 <[email protected]>
Co-authored-by: Sameel <[email protected]>

@tkattkat

This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#1082](#1082) [`8c0fd01`](8c0fd01) Thanks [@tkattkat](https://github.com/tkattkat)! - Pass stagehand object to agent instead of stagehand page
- [#1104](#1104) [`a1ad06c`](a1ad06c) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix logging for stagehand agent
- [#1066](#1066) [`9daa584`](9daa584) Thanks [@tkattkat](https://github.com/tkattkat)! - Add playwright arguments to agent execute response
- [#1077](#1077) [`7f38b3a`](7f38b3a) Thanks [@tkattkat](https://github.com/tkattkat)! - adds support for stagehand agent in the api
- [#1032](#1032) [`bf2d0e7`](bf2d0e7) Thanks [@miguelg719](https://github.com/miguelg719)! - Fix for zod peer dependency support
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - Replace operator handler with base of new agent
- [#1089](#1089) [`536f366`](536f366) Thanks [@miguelg719](https://github.com/miguelg719)! - Fixed info logs on api session create
- [#1103](#1103) [`889cb6c`](889cb6c) Thanks [@tkattkat](https://github.com/tkattkat)! - patch custom tool support in anthropic cua client
- [#1056](#1056) [`6a002b2`](6a002b2) Thanks [@chrisreadsf](https://github.com/chrisreadsf)! - remove need for duplicate project id if already passed to Stagehand
- [#1090](#1090) [`8ff5c5a`](8ff5c5a) Thanks [@miguelg719](https://github.com/miguelg719)! - Improve failed act error logs
- [#1014](#1014) [`6966201`](6966201) Thanks [@tkattkat](https://github.com/tkattkat)! - replace operator agent with scaffold for new stagehand agent
- [#1107](#1107) [`3ccf335`](3ccf335) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix: url extraction not working inside an array
- [#1102](#1102) [`a99aa48`](a99aa48) Thanks [@miguelg719](https://github.com/miguelg719)! - Add current page and date context to agent
- [#1110](#1110) [`dda52f1`](dda52f1) Thanks [@miguelg719](https://github.com/miguelg719)! - Add support for new Gemini Computer Use models

## @browserbasehq/[email protected]

### Minor Changes

- [#1057](#1057) [`b7be89e`](b7be89e) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - added web voyager ground truth (optional), added web bench, and subset of OSWorld evals which run on a browser

### Patch Changes

- [#1072](#1072) [`dc2d420`](dc2d420) Thanks [@filip-michalsky](https://github.com/filip-michalsky)! - improve evals screenshot service - add img hashing diff to add screenshots and change to screenshot intercepts from the agent
- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies \[[`8c0fd01`](8c0fd01), [`a1ad06c`](a1ad06c), [`9daa584`](9daa584), [`7f38b3a`](7f38b3a), [`bf2d0e7`](bf2d0e7), [`6966201`](6966201), [`536f366`](536f366), [`889cb6c`](889cb6c), [`6a002b2`](6a002b2), [`8ff5c5a`](8ff5c5a), [`6966201`](6966201), [`3ccf335`](3ccf335), [`a99aa48`](a99aa48), [`dda52f1`](dda52f1)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

# why
The original example used JavaScript destructuring syntax [table] which doesn't work in Python. Fixed to use proper Python array indexing.

# what changed
fixed example to proper python syntax

# test plan

Co-authored-by: Steven Bryan <[email protected]>

# why
- need to set default viewport when running on browserbase. previously, we only defined the default inside the exported `StagehandConfig`

# what changed
- set default viewport to 1288 * 711 when running on browserbase

# test plan
- tested locally
- regression evals

@seanmcguire12

This PR was opened by the [Changesets release](https://github.com/changesets/action) GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.

# Releases

## @browserbasehq/[email protected]

### Patch Changes

- [#1114](#1114) [`c0fbc51`](c0fbc51) Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - configure default viewport when running on browserbase

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies \[[`c0fbc51`](c0fbc51)]:
  - @browserbasehq/[email protected]

## @browserbasehq/[email protected]

### Patch Changes

- Updated dependencies \[[`c0fbc51`](c0fbc51)]:
  - @browserbasehq/[email protected]

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

tkattkat and others added 30 commits September 10, 2025 13:40

add playwright arguments to agent (#1066)

9daa584

# why
solves #1060 patch regression of playwright arguments being removed from agent execute response

# what changed
agent.execute now returns playwright arguments in its response

# test plan
tested locally

[docs] add info on not needing project id in browserbase session params to docs (#1065)

f6f05b0

# why
reflect project id changes in docs

# what changed
advanced configuration comments

# test plan
reviewed via mintlify on localhost

Export aisdk (#1058)

c886544

# why
Easier to use for Custom LLM Clients and keep users up to date with our aisdk file

# what changed
added export of aisdk to lib/index.ts

# test plan
build local stagehand, import local AISdkClient, run Azure Stagehand session

[docs] export aisdk (#1074)

3c39a05

# why
Updating docs to reflect aisdk can be imported directly

# what changed
The model page

# test plan
Reviewed page with mintlify dev locally

Fix zod peer dependency support (#1032)

bf2d0e7

add stagehand agent to api (#1077)

7f38b3a

# why
Currently, we do not support stagehand agent within the api

# what changed
When api is enabled, stagehand agent now routes through the api

# test plan
Tested locally

update xpath in observe_vantechjournal (#1088)

b9c8102

# why
- webpage structure changed, needed to update the xpath in the expected locator

Fix session create logs on api (#1089)

536f366

Improve failed act logs (#1090)

8ff5c5a

initial commit

13b0603

update agent types

c097fba

update logger type

ae974a5

update log levels

4b26bf1

remove logger helper, and use inline

d6434d6

extract changes

a4b277e

remove unnecessary return values from tools

0f376a2

clean up action handler types

5801b85

remove aria tree caching

9ad0e6d

move system prompt to prompt.ts

5dc0d1d

pass stagehand, instead of stagehandPage to agent (#1082)

8c0fd01

# why
currently we pass stagehand page to agent, this results in our page management having issues when facing new tabs

# what changed
the stagehand object is now passed instead of stagehandPage

# test plan
tested locally

Merge remote-tracking branch 'origin/main' into stagehand-agent-improvements

edce0cc

remove unnecessary type casting

b2742dd

remove more unnecessary type casting

ab1ec2d

update aria tool

01a7de3

tkattkat and others added 29 commits September 24, 2025 12:26

add back model routing

529f226

add move

f8fdd5c

prompt changes

0125390

update prompts

a7bf3a7

adjust prompt

97cba02

increase checkpoint interval

eb18df9

update prompt

937b378

update prompt

ab941b1

update evals cli docs (#1096)

108de3c

change scroll to be percentage based

a77eeea

adjust prompt

d1821cb

update screenshot tool

ff0d942

adding support for new claude 4.5 sonnet agent model (#1099)

e0e6b30

# why
anthropic released a new sota computer use model

# what changed
added claude-sonnet-4-5-20250929 as a model to the list

# test plan
ran evals

custom tool schema sbased on model

919eb3b

update tool routing

057bdca

update scroll

9a8ee76

Add current date and page url to agent context (#1102)

a99aa48

# why
To improve context

# what changed
Added current page and url to the system prompt

# test plan

Additional agent logging (#1104)

a1ad06c

# why
To inform the user throughout the agent execution process

# what changed
Added logs to tool calls, and on the stagehand agent handler

# test plan
- [x] tested locally

Include import statements in extract code examples (#1105)

0791404

PR to make clearer the dependencies for `extract` (for those who haven't used zod or pydantic before)

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

google cua docs (#1111)

9a29937

Merge remote-tracking branch 'origin/main' into stagehand-agent-improvements

46c0fa2

fix type errors from merge

ceae3bd

miguelg719 force-pushed the main branch from 4994eab to bd0a799 Compare October 29, 2025 16:15
