New Env: Online-Mind2Web #168
Open
Original #156
A hud remote-browser-based environment for the Online-Mind2Web dataset.
Note
Introduces a new Online-Mind2Web environment with a remote-browser MCP server, multi-provider support, setup/eval tools, recording, and Docker packaging.
- New environment under `environments/online_mind2web`, whose files include `.gitignore` and `test_task.json`.
- Server (`src/hud_controller/server.py`)
- Context (`context.py`)
- Providers (`providers/`): `AnchorBrowserProvider`, `BrowserBaseProvider`, `HyperBrowserProvider`, and `SteelProvider` with launch/close/status APIs and live view URLs; registry and proxy helper.
- Tools (`tools/`):
  - `BrowserExecutor`: maps computer actions to Playwright.
  - `AnthropicComputerToolWithRecord` and `OpenAIComputerToolWithRecord`: add screenshot/history recording to `/screenshot` and `/action_history`.
- Setup (`setup/`): `navigate_to_url`, `set_cookies`/`clear_cookies`, `click_element`/`fill_input`/`select_option`.
- Evaluate (`evaluate/`): `webjudge` (multi-screenshot + keypoint analysis via GPT-4o), `autonomous_eval` (single-screenshot VLM check), and `overall_judge` (aggregate), returning `EvaluationResult`.
- Problems (`problems/`)
- `hud-om2w`; instructions to evaluate a local JSON file or a HuggingFace dataset.

Written by Cursor Bugbot for commit f70d067. This will update automatically on new commits.
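For readers unfamiliar with the provider pattern summarized above, here is a minimal sketch of how a registry of browser providers with launch/close/status methods could be laid out. Everything in it is an assumption made for illustration: the `BrowserProvider` base class, the `register` and `get_provider` helpers, the `FakeProvider` stand-in, and the placeholder URLs are hypothetical and are not taken from the PR. The real providers most likely call remote services and may be asynchronous.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Type


class BrowserProvider(ABC):
    """Hypothetical provider interface (an assumption, not the PR's actual API)."""

    @abstractmethod
    def launch(self) -> str:
        """Start a browser session and return a connection URL."""

    @abstractmethod
    def close(self) -> None:
        """End the browser session."""

    @abstractmethod
    def status(self) -> str:
        """Return a short status string for the session."""

    def live_view_url(self) -> Optional[str]:
        """Return a live view URL if the provider offers one."""
        return None


# Registry mapping a provider name to its class.
PROVIDERS: Dict[str, Type[BrowserProvider]] = {}


def register(name: str) -> Callable[[Type[BrowserProvider]], Type[BrowserProvider]]:
    """Class decorator that adds a provider class to the registry."""
    def decorator(cls: Type[BrowserProvider]) -> Type[BrowserProvider]:
        PROVIDERS[name] = cls
        return cls
    return decorator


def get_provider(name: str) -> BrowserProvider:
    """Instantiate a registered provider, or raise ValueError if unknown."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None


@register("fake")
class FakeProvider(BrowserProvider):
    """In-memory stand-in so the sketch runs without any remote service."""

    def __init__(self) -> None:
        self._state = "idle"

    def launch(self) -> str:
        self._state = "running"
        return "ws://localhost:9222/devtools/browser/fake"  # placeholder URL

    def close(self) -> None:
        self._state = "closed"

    def status(self) -> str:
        return self._state

    def live_view_url(self) -> Optional[str]:
        # Only meaningful while a session is running.
        return "http://localhost:9222/live" if self._state == "running" else None


if __name__ == "__main__":
    provider = get_provider("fake")
    print(provider.launch(), provider.status())
    provider.close()
```

Keeping the registry as a plain dictionary keyed by name means a provider can be selected from configuration (for example an environment variable) without the caller importing any concrete provider class.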