New Env: Online-Mind2Web #168

Genteki · 2025-10-13T19:38:43Z

Original #156

A HUD remote-browser-based environment for the Online-Mind2Web dataset.

Note

Introduces a new Online-Mind2Web environment with a remote-browser MCP server, multi-provider support, setup/eval tools, recording, and Docker packaging.

Environment: environments/online_mind2web
- Adds pyproject, Dockerfile, README, .gitignore, and test_task.json.
MCP Server (src/hud_controller/server.py)
- Boots a remote-browser MCP server, attaches a persistent context, initializes provider, Playwright, and computer tools; exposes telemetry and graceful shutdown.
Persistent Context (context.py)
- New context server to persist browser/provider state, telemetry, and Playwright handle across hot-reloads.
Providers (providers/)
- Implements AnchorBrowserProvider, BrowserBaseProvider, HyperBrowserProvider, SteelProvider with launch/close/status APIs and live view URLs; registry and proxy helper.
Tools (tools/)
- BrowserExecutor: maps computer actions to Playwright.
- AnthropicComputerToolWithRecord and OpenAIComputerToolWithRecord: add screenshot/history recording to /screenshot and /action_history.
Setup Hub (setup/)
- Tools for navigation, cookies, and basic interactions (navigate_to_url, set_cookies/clear_cookies, click_element/fill_input/select_option).
Evaluation Hub (evaluate/)
- webjudge (multi-screenshot + keypoint analysis via GPT-4o), autonomous_eval (single-screenshot VLM check), and overall_judge (aggregate) returning EvaluationResult.
Problems (problems/)
- Registry/decorator plus sample problems: navigate/verify, form interaction, button click, Google search.
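
The registry/decorator pattern mentioned above can be sketched roughly as follows. This is an illustrative assumption, not the code in `problems/`: the names `PROBLEMS`, `problem`, `NavigateAndVerify`, `get_setup`, and `get_evaluation` are hypothetical, as are the argument payloads. Only the tool names `navigate_to_url` and `webjudge` come from the summary.

```python
# Hypothetical registry mapping problem names to problem classes.
PROBLEMS: dict[str, type] = {}


def problem(name: str, description: str = ""):
    """Class decorator that registers a problem under a unique name."""

    def decorator(cls: type) -> type:
        if name in PROBLEMS:
            raise ValueError(f"problem {name!r} is already registered")
        cls.problem_name = name
        cls.description = description
        PROBLEMS[name] = cls
        return cls

    return decorator


@problem("navigate_and_verify", description="Open a page and check the result.")
class NavigateAndVerify:
    def get_setup(self) -> dict:
        # Names a setup-hub tool; the argument shape is assumed.
        return {"name": "navigate_to_url", "arguments": {"url": "https://example.com"}}

    def get_evaluation(self) -> dict:
        # Names an evaluate-hub tool; the argument shape is assumed.
        return {"name": "webjudge", "arguments": {"task": "Open example.com"}}


def get_problem(name: str):
    """Instantiate a registered problem, failing loudly on unknown names."""
    try:
        return PROBLEMS[name]()
    except KeyError:
        raise ValueError(f"unknown problem: {name!r}") from None
```

Registering at import time means that adding a new problem only requires defining a decorated class; nothing else has to be edited to make it discoverable by name.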
Packaging & Run
- Docker image to start context + MCP server; script entry hud-om2w; instructions to eval local JSON or HuggingFace dataset.

Written by Cursor Bugbot for commit f70d067. This will update automatically on new commits.

promptless · 2025-10-13T19:57:42Z

📝 Documentation updates detected!

New suggestion: Add comprehensive Online-Mind2Web environment documentation for PR #168
Updated existing suggestion: Add comprehensive Mind2Web evaluation documentation (updated for PR #156)

Create Online-Mind2Web folder

c9666f9

reformat

96e5448

Add evaluate tool for choice

88a041b

Genteki changed the title ~~Online-Mind2Web Folder~~ New Env: Online-Mind2Web Oct 23, 2025

Merge branch 'hud-evals:main' into Online-Mind2Web

f70d067
