-
Notifications
You must be signed in to change notification settings - Fork 37
Online-Mind2Web Example (Rebased) #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 Documentation updates detected! Updated existing suggestion: Add comprehensive Mind2Web evaluation documentation (updated for PR #156) |
if openai_api_key: | ||
logging.info( | ||
f"DEBUG: Raw key repr: {repr(openai_api_key[:10])}" | ||
) # Show first 50 chars with repr to see any weird characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Logging Mismatch: Key Truncation
The logging statement for openai_api_key
shows only the first 10 characters, but its inline comment indicates it should display the first 50. This mismatch between the code's behavior and its description can be confusing during debugging.
Additional Locations (1)
if main_score >= score_threshold: | ||
# Include high-scoring screenshots in final evaluation | ||
final_images = [] | ||
for screenshot_b64 in screenshot_history: # All screenshots |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Online-Mind2Web Evaluation
The file changes here have been rebased and modified for
hud-python==0.4.50
. Original PR: #145Update
AnthropicComputerToolWithRecord
: Claude computer tool with screenshot recording, using callback functionOpenAIComputerToolWithRecord
OpenAI computer tool with screenshot recording, using callback functionwebjudge_online_mind2web.py]
Evaluating method, using GPT-4o, based on problem description, screenshot history and action history. [Reference: Online-Mind2Web]autonomous_eval.py
: Evaluating method, using GPT-4o, based on problem description, final screenshot and action history. [Reference: Online-Mind2Web]AnthropicComputerTool
,OpenAIComputerTool
with the ones with history recordingUpdates Oct 8, 2025
Online-Mind2Web supported in
remote-browser
envNote
Introduce Anthropic/OpenAI computer tools that auto-save screenshots and record actions, and add Online‑Mind2Web evaluators (autonomous_eval, webjudge, overall_judge) wired into both browser and remote-browser environments.
evaluate/online_mind2web
withautonomous_eval
andwebjudge
using GPT‑4o, leveraging screenshot and action history.environments/browser/server/evaluate/__init__.py
and include in server.evaluate/autonomous_eval.py
,evaluate/webjudge.py
, andevaluate/overall_judge.py
; register viaevaluate
hub.AnthropicComputerToolWithRecord
andOpenAIComputerToolWithRecord
that:/screenshot
on key actions./action_history/action_history.txt
.browser/server/main.py
and remote server wiring; export from local tool packages.hud-python
pin to@main
inbrowser/server/pyproject.toml
.Written by Cursor Bugbot for commit a2cfa28. This will update automatically on new commits. Configure here.