add goto tool to anthropic cua client #1112

tkattkat · 2025-10-07T21:08:47Z

why

Currently, the Anthropic CUA client does not have a goto tool, which limits its ability to navigate the web.

what changed

Added a goto tool to the AnthropicCUAClient.

test plan

tested locally & on browserbase

# why solves #1060 patch regression of playwright arguments being removed from agent execute response # what changed agent.execute now returns playwright arguments in its response # test plan tested locally

…ms to docs (#1065) # why reflect project id changes in docs # what changed advanced configuration comments # test plan reviewed via mintlify on localhost

# why Easier to use for Custom LLM Clients and keep users up to date with our aisdk file # what changed added export of aisdk to lib/index.ts # test plan build local stagehand, import local AISdkClient, run Azure Stagehand session

…onfigu… (#1073) …ration settings # why Updated docs to match the new fingerprint params in the Browserbase docs here: https://docs.browserbase.com/guides/stealth-customization#customization-options # what changed Update browser configuration docs to reflect the docs changes. # test plan

# why Updating docs to reflect aisdk can be imported directly # what changed The model page # test plan Reviewed page with mintlify dev locally


# why Currently, we do not support stagehand agent within the api # what changed When api is enabled, stagehand agent now routes through the api # test plan Tested locally

# why

Playwright's screenshot command is currently unavailable when the execution environment is Stagehand. A customer has indicated they would prefer Playwright's native screenshot command over CDP when using Browserbase, because CDP screenshots cause unexpected behavior on their target site.

# what changed

- added a StagehandScreenshotOptions type with a useCDP argument
- extended the page type to accept custom Stagehand screenshot options
- updated the screenshot proxy to default useCDP to true if the env is browserbase, and to use the Playwright screenshot if useCDP is false
- added an eval for screenshot with and without CDP

# test plan

- tested and confirmed functionality with the eval and an external example script (not committed)
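The defaulting rule described in this commit can be sketched as below. This is only an illustration under stated assumptions: the `resolveUseCDP` helper and the `StagehandEnv` union are hypothetical names introduced here, and the fallback to Playwright's screenshot outside Browserbase is an assumption, since the commit message only states the Browserbase default.

```typescript
// Hypothetical names that mirror the commit description; not copied from the commit.
type StagehandEnv = "LOCAL" | "BROWSERBASE";

interface StagehandScreenshotOptions {
  useCDP?: boolean; // when omitted, a default is chosen from the environment
}

// Returns true when the screenshot should go through CDP,
// false when Playwright's native screenshot should be used.
function resolveUseCDP(
  env: StagehandEnv,
  options: StagehandScreenshotOptions = {},
): boolean {
  // An explicit caller choice always wins.
  if (options.useCDP !== undefined) return options.useCDP;
  // Default stated in the commit: CDP on Browserbase.
  // Assumption: Playwright's screenshot is used in other environments.
  return env === "BROWSERBASE";
}
```

With this shape, a caller on Browserbase who hits CDP-specific rendering problems could pass `{ useCDP: false }` to opt into Playwright's screenshot path.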

…1057)

# why

We want to build a best-in-class agent in Stagehand, so we need more eval benchmarks.

# what changed

- Added Web-bench evals dataset
- Added a subset of OS World evals: those that can be run in a Chrome browser (desktop-based tasks omitted)
- Added LICENSE notices to the copied evals tasks
- Added ground truth / expected result to some WebVoyager tasks using reference_answer.json from the Browser Use public evals repo
- Improved `pnpm run evals -man` to better describe how to run evals

# test plan

Evals should run locally and on Browserbase for these new benchmarks.

# why Initial instructions didn't mention uv or pip prerequisites and also didn't mention venv. Fix reduces friction on first timers. # what changed - added link to install uv - added details for initializing venv - adjusted code example respectively # test plan docs change

# why - webpage structure changed, needed to update the xpath in the expected locator

… with LanguageModelV1 + LiteLLM works for python (#1086) # why 1. aisdk not yet available through npm package 2. customLLM provider only works with LanguageModelV1 3. LiteLLM compatible providers are supported in python # what changed 1. change docs to install stagehand from git repo 2. pin versions that use LanguageModelV1 # test plan local test

# why currently we pass stagehand page to agent, this results in our page management having issues when facing new tabs # what changed the stagehand object is now passed instead of stagehandPage # test plan tested locally

# why

Our existing screenshot service is a dummy, time-triggered service. It does not trigger based on any actions of the agent.

# what changed

- Added an image hash diff algorithm (quick check with MSE, verify with SSIM) to detect whether the UI actually changed, and only store the screenshot in the buffer if it did.
- Added a screenshot interceptor that copies each screenshot the agent takes to a buffer (if different enough from the previous screenshot) for later use in evals.
- Small refactor of the agent initialization config so the screenshot collector service can be attached.

# test plan

Tests pass locally

---------

Co-authored-by: Miguel <[email protected]>
Co-authored-by: miguel <[email protected]>
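The "quick check with MSE, verify with SSIM" idea from this commit can be illustrated roughly as follows. This is a sketch under assumptions, not the committed code: images are taken to be flat grayscale arrays of equal size, SSIM is computed over a single global window (real implementations usually average over local windows), and the function names and both thresholds are hypothetical.

```typescript
// Grayscale image as a flat array of 0..255 values.
type Gray = ArrayLike<number>;

function mean(a: Gray): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i];
  return sum / a.length;
}

// Mean squared error: the cheap first pass.
function mse(a: Gray, b: Gray): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum / a.length;
}

// Single-window (global) SSIM with the usual constants for 8-bit images.
function globalSsim(a: Gray, b: Gray): number {
  const c1 = (0.01 * 255) ** 2;
  const c2 = (0.03 * 255) ** 2;
  const ma = mean(a);
  const mb = mean(b);
  let va = 0;
  let vb = 0;
  let cov = 0;
  for (let i = 0; i < a.length; i++) {
    const da = a[i] - ma;
    const db = b[i] - mb;
    va += da * da;
    vb += db * db;
    cov += da * db;
  }
  va /= a.length;
  vb /= a.length;
  cov /= a.length;
  return ((2 * ma * mb + c1) * (2 * cov + c2)) /
    ((ma * ma + mb * mb + c1) * (va + vb + c2));
}

// Hypothetical thresholds; they would need tuning on real screenshots.
function hasUiChanged(
  prev: Gray,
  next: Gray,
  mseThreshold = 30,
  ssimThreshold = 0.98,
): boolean {
  if (prev.length !== next.length) return true; // different size: treat as a change
  if (mse(prev, next) < mseThreshold) return false; // quick check: nearly identical
  return globalSsim(prev, next) < ssimThreshold; // verify structural difference
}
```

A collector built on this would keep a screenshot only when `hasUiChanged(previous, current)` returns true, which avoids filling the buffer with frames that show no real change.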

# why To help make sense of eval test cases and results # what changed Added metadata to eval runs, cleaned deprecated code # test plan


# why anthropic released a new sota computer use model # what changed added claude-sonnet-4-5-20250929 as a model to the list # test plan ran evals

Why

Custom AI SDK tools and MCP integrations weren't working properly with Anthropic CUA: parameters were empty ({}) and tools weren't tracked.

What Changed

- Convert Zod schemas to JSON Schema before sending to Anthropic (using zodToJsonSchema)
- Track custom tool calls in the actions array
- Silence "Unknown tool name" warnings for custom tools

Test Plan

Tested with examples file.

- Parameters passed correctly ({"city":"San Francisco"} instead of {})
- Custom tools execute and appear in actions array
- No warnings

# why To improve context # what changed Added current page and url to the system prompt # test plan

# why To inform the user throughout the agent execution process # what changed Added logs to tool calls, and on the stagehand agent handler # test plan - [x] tested locally

PR to make clearer the dependencies for `extract` (for those who haven't used zod or pydantic before) --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

# why Adding support for Gemini's new Computer Use model # what changed We partnered with Google Deepmind to help integrate and test their new Computer Use models. <img width="1238" height="655" alt="Screenshot 2025-10-07 at 1 14 44 PM" src="https://github.com/user-attachments/assets/af0d854a-8e55-4937-a071-10335497f686" /> The new model tag `gemini-2.5-pro-computer-use-preview-10-2025` is available for Stagehand Agent. You can try it today with the example `cua-example.ts` To learn more, check out the blog post [https://www.browserbase.com/blog/evaluating-browser-agents](https://www.browserbase.com/blog/evaluating-browser-agents) --------- Co-authored-by: tkattkat <[email protected]> Co-authored-by: Kylejeong2 <[email protected]> Co-authored-by: Sameel <[email protected]>

changeset-bot · 2025-10-07T21:08:51Z

🦋 Changeset detected

Latest commit: 044763d

The changes in this PR will be included in the next version bump.


greptile-apps

Greptile Overview

Summary

This pull request adds a new "goto" tool to the AnthropicCUAClient to enable URL navigation capabilities. The implementation follows the established pattern for tool integration within the client, adding three key components:

Tool Definition: A new goto tool is added to the API request configuration with proper JSON schema validation, requiring a URL parameter as a string input.
Tool Execution Handling: The takeAction method now includes logic to process goto tool requests, providing basic success/error messaging when the tool is invoked.
Action Conversion: The convertToolUseToAction method can now convert goto tool use items into AgentAction objects with the appropriate function name and URL arguments.
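
The tool definition and action conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `ToolUseItem` and `AgentAction` shapes, the `gotoTool` constant, and the standalone `convertToolUseToAction` function are simplified stand-ins assumed for the example. Only the tool name `goto` and its required string `url` parameter come from the summary.

```typescript
// Hypothetical, simplified shapes; the real client types may differ.
interface ToolUseItem {
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface AgentAction {
  type: string;
  [key: string]: unknown;
}

// Tool definition: a custom tool whose JSON Schema requires a string `url`.
const gotoTool = {
  name: "goto",
  description: "Navigate to a specific URL",
  input_schema: {
    type: "object",
    properties: {
      url: { type: "string", description: "The URL to navigate to" },
    },
    required: ["url"],
  },
};

// Action conversion: map a `goto` tool_use block to an action carrying the URL.
function convertToolUseToAction(item: ToolUseItem): AgentAction | null {
  if (item.name !== "goto") return null; // other tools are out of scope here
  const url = item.input.url;
  if (typeof url !== "string" || url.length === 0) return null;
  return { type: "function", name: "goto", arguments: { url } };
}
```

Returning `null` for a missing or non-string `url` is a design choice of this sketch; it keeps malformed tool calls out of the actions list instead of passing an undefined URL downstream.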

This change addresses a fundamental limitation where the Anthropic CUA client lacked basic web navigation capabilities, which is essential for web automation tasks that need to move between different URLs during execution. The implementation integrates seamlessly with the existing tool architecture and follows the same patterns used by other built-in tools in the client.

Important Files Changed

Changed Files

| Filename | Score | Overview |
| --- | --- | --- |
| lib/agent/AnthropicCUAClient.ts | 2/5 | Added goto tool definition, execution handling, and action conversion but missing actual navigation implementation |

Confidence score: 2/5

This PR has a critical implementation gap that prevents it from functioning as intended
Score reflects incomplete functionality where the goto tool is defined but doesn't perform actual navigation
Pay close attention to lib/agent/AnthropicCUAClient.ts as it needs the missing action handler call to complete the navigation functionality

Sequence Diagram

sequenceDiagram
    participant User
    participant AnthropicCUAClient
    participant AnthropicAPI
    participant ScreenshotProvider
    participant ActionHandler

    User->>AnthropicCUAClient: "execute(executionOptions)"
    AnthropicCUAClient->>AnthropicCUAClient: "createInitialInputItems(instruction)"
    
    loop "while !completed && currentStep < maxSteps"
        AnthropicCUAClient->>AnthropicCUAClient: "executeStep(inputItems, logger)"
        AnthropicCUAClient->>AnthropicCUAClient: "getAction(inputItems)"
        AnthropicCUAClient->>AnthropicAPI: "beta.messages.create(requestParams with goto tool)"
        AnthropicAPI-->>AnthropicCUAClient: "response with content blocks"
        
        alt "tool_use block found"
            AnthropicCUAClient->>AnthropicCUAClient: "convertToolUseToAction(toolUseItem)"
            
            alt "goto tool"
                AnthropicCUAClient->>AnthropicCUAClient: "return goto action"
            else "computer tool"
                AnthropicCUAClient->>AnthropicCUAClient: "return computer action"
            else "custom tool"
                AnthropicCUAClient->>AnthropicCUAClient: "return custom_tool action"
            end
            
            opt "actionHandler exists"
                AnthropicCUAClient->>ActionHandler: "actionHandler(action)"
                ActionHandler-->>AnthropicCUAClient: "action executed"
            end
            
            AnthropicCUAClient->>AnthropicCUAClient: "takeAction(toolUseItems, logger)"
            
            alt "computer tool"
                AnthropicCUAClient->>ScreenshotProvider: "captureScreenshot()"
                ScreenshotProvider-->>AnthropicCUAClient: "base64 screenshot"
            else "goto tool"
                AnthropicCUAClient->>AnthropicCUAClient: "create success message"
            else "custom tool"
                AnthropicCUAClient->>AnthropicCUAClient: "execute custom tool"
            end
        else "text block found"
            AnthropicCUAClient->>AnthropicCUAClient: "extract message text"
        end
        
        AnthropicCUAClient->>AnthropicCUAClient: "update nextInputItems with assistant response"
    end
    
    AnthropicCUAClient-->>User: "AgentResult with success, actions, message, completed"

1 file reviewed, 1 comment

greptile-apps · 2025-10-07T21:09:45Z

lib/agent/AnthropicCUAClient.ts

+          if (item.name === "goto") {
+            try {
+              const url = item.input.url;
+              toolResult = `Successfully navigated to ${url}`;
+            } catch (error) {
+              toolResult = `Error with goto: ${error instanceof Error ? error.message : String(error)}`;
+            }


logic: The goto tool claims success but doesn't actually perform navigation. It only extracts the URL and returns a success message without calling any navigation logic.
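
One way to close the gap flagged here is to perform the navigation before building the tool result, so the success message is produced only when navigation actually resolves. The sketch below assumes a caller-supplied `navigate` callback (hypothetical; in a Playwright-backed setup it could wrap `page.goto`) and a standalone `runGotoTool` helper invented for illustration. It is not the fix that was committed to this PR.

```typescript
// Hypothetical callback that performs the real navigation.
type Navigate = (url: string) => Promise<void>;

async function runGotoTool(
  input: { url?: unknown },
  navigate: Navigate,
): Promise<string> {
  try {
    const url = input.url;
    if (typeof url !== "string" || url.length === 0) {
      throw new Error("goto requires a non-empty string `url`");
    }
    await navigate(url); // the step missing from the reviewed diff
    return `Successfully navigated to ${url}`;
  } catch (error) {
    // Same error format as the reviewed diff, now reachable on real failures.
    return `Error with goto: ${error instanceof Error ? error.message : String(error)}`;
  }
}
```

With this shape the `catch` branch becomes meaningful: a failed navigation is reported back to the model instead of a success message that does not match the browser state.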


greptile-apps

Greptile Overview

Summary

This PR adds a `goto` tool to the `AnthropicCUAClient` class to enable web navigation capabilities for the Anthropic Computer Use Agent (CUA). The implementation follows the established pattern in the codebase by adding the tool definition to the tools array, handling execution in the `takeAction` method, and converting it to an `AgentAction` in the `convertToolUseToAction` method.

The change addresses a stated limitation where Anthropic's CUA lacks navigation functionality, which is essential for web automation tasks. The goto tool is defined with a simple schema requiring only a URL parameter and integrates into the existing tool execution pipeline alongside the computer tool and custom tools.

However, the current implementation is fundamentally incomplete - it only extracts the URL parameter and returns a success message without actually performing any navigation. This means the tool will report successful navigation while the browser remains on the current page, creating a mismatch between the agent's understanding and the actual browser state.

Important Files Changed

Changed Files

| Filename | Score | Overview |
| --- | --- | --- |
| lib/agent/AnthropicCUAClient.ts | 1/5 | Added goto tool definition and handling, but implementation only simulates navigation without performing actual browser navigation |

Confidence score: 1/5

This PR is not safe to merge due to a critical logic flaw that will cause agent confusion
Score reflects incomplete implementation that claims success while performing no actual navigation
Pay close attention to the goto tool execution logic in AnthropicCUAClient.ts which needs actual navigation implementation

Sequence Diagram

sequenceDiagram
    participant User
    participant AnthropicCUAClient
    participant Anthropic_API
    participant ActionHandler
    participant ScreenshotProvider

    User->>AnthropicCUAClient: "execute(instruction)"
    AnthropicCUAClient->>AnthropicCUAClient: "createInitialInputItems(instruction)"
    
    loop "Until completed or maxSteps reached"
        AnthropicCUAClient->>AnthropicCUAClient: "executeStep(inputItems)"
        AnthropicCUAClient->>Anthropic_API: "client.beta.messages.create(requestParams)"
        Note over AnthropicCUAClient,Anthropic_API: "Includes computer tool and new goto tool"
        Anthropic_API-->>AnthropicCUAClient: "response with content blocks"
        
        AnthropicCUAClient->>AnthropicCUAClient: "convertToolUseToAction(toolUseItem)"
        
        opt "If actionHandler exists and actions found"
            loop "For each action"
                AnthropicCUAClient->>ActionHandler: "actionHandler(action)"
                ActionHandler-->>AnthropicCUAClient: "action completed"
            end
        end
        
        alt "Tool is computer"
            AnthropicCUAClient->>ScreenshotProvider: "captureScreenshot()"
            ScreenshotProvider-->>AnthropicCUAClient: "base64 screenshot"
        else "Tool is goto"
            Note over AnthropicCUAClient: "Process goto URL navigation"
        else "Tool is custom tool"
            AnthropicCUAClient->>AnthropicCUAClient: "tools[toolName].execute(input)"
        end
        
        AnthropicCUAClient->>AnthropicCUAClient: "takeAction(toolUseItems)"
        AnthropicCUAClient->>AnthropicCUAClient: "Update nextInputItems for conversation"
    end
    
    AnthropicCUAClient-->>User: "AgentResult with success, actions, message, completed"

1 file reviewed, no comments

tkattkat and others added 26 commits September 10, 2025 13:40

add playwright arguments to agent (#1066)

9daa584

# why solves #1060 patch regression of playwright arguments being removed from agent execute response # what changed agent.execute now returns playwright arguments in its response # test plan tested locally

[docs] add info on not needing project id in browserbase session para…

f6f05b0

…ms to docs (#1065) # why reflect project id changes in docs # what changed advanced configuration comments # test plan reviewed via mintlify on localhost

Export aisdk (#1058)

c886544

# why Easier to use for Custom LLM Clients and keep users up to date with our aisdk file # what changed added export of aisdk to lib/index.ts # test plan build local stagehand, import local AISdkClient, run Azure Stagehand session

[docs] export aisdk (#1074)

3c39a05

# why Updating docs to reflect aisdk can be imported directly # what changed The model page # test plan Reviewed page with mintlify dev locally

Fix zod peer dependency support (#1032)

bf2d0e7

# why # what changed # test plan

add stagehand agent to api (#1077)

7f38b3a

# why Currently, we do not support stagehand agent within the api # what changed When api is enabled, stagehand agent now routes through the api # test plan Tested locally

update xpath in observe_vantechjournal (#1088)

b9c8102

# why - webpage structure changed, needed to update the xpath in the expected locator

Fix session create logs on api (#1089)

536f366

Improve failed act logs (#1090)

8ff5c5a

pass stagehand, instead of stagehandPage to agent (#1082)

8c0fd01

# why currently we pass stagehand page to agent, this results in our page management having issues when facing new tabs # what changed the stagehand object is now passed instead of stagehandPage # test plan tested locally

Eval metadata (#1092)

f89b13e

# why To help make sense of eval test cases and results # what changed Added metadata to eval runs, cleaned deprecated code # test plan

update evals cli docs (#1096)

108de3c

# why # what changed # test plan

adding support for new claude 4.5 sonnet agent model (#1099)

e0e6b30

# why anthropic released a new sota computer use model # what changed added claude-sonnet-4-5-20250929 as a model to the list # test plan ran evals

Add current date and page url to agent context (#1102)

a99aa48

# why To improve context # what changed Added current page and url to the system prompt # test plan

Additional agent logging (#1104)

a1ad06c

# why To inform the user throughout the agent execution process # what changed Added logs to tool calls, and on the stagehand agent handler # test plan - [x] tested locally

Include import statements in extract code examples (#1105)

0791404

PR to make clearer the dependencies for `extract` (for those who haven't used zod or pydantic before) --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

add goto tool to anthropic cua client

88b3644

greptile-apps bot reviewed Oct 7, 2025


tkattkat marked this pull request as draft October 7, 2025 21:17

tkattkat marked this pull request as ready for review October 7, 2025 21:43

greptile-apps bot reviewed Oct 7, 2025


changeset

044763d

miguelg719 added the act, extract, and observe labels Oct 23, 2025

miguelg719 force-pushed the main branch from 4994eab to bd0a799 Compare October 29, 2025 16:15
