v0.1

@Jeomon Jeomon released this 17 Jun 16:46
· 56 commits to main since this release

Key Features & Updates

  • Dual Agent Modes: Supports both non-vision and vision-based agent operation, so the agent can run on either an LLM or a VLM.
  • Scrollable vs. Interactive Elements: Scrollable and interactive elements are now identified separately, which improves DOM recognition and interaction.
  • Scrolling Logic: Enables scrolling through distinct webpage sections, including nested containers.
  • HTML → Markdown: Upgraded to markdownify in the Scrape Tool for better content conversion.
  • Tab Management: Tracks the number of open tabs and the active tab, and supports basic tab control.
  • Extensible Tools: Add custom tools to the agent via the additional_tools parameter.
  • Iframe & Shadow DOM Access: Enhanced ability to interact with embedded or encapsulated elements.
  • Structured Output: Returns well-defined BaseModel outputs using the structured_output parameter.
  • Human-in-the-Loop: Add manual checkpoints in the workflow via the include_human_in_loop parameter (thanks @tanmaysk001!)
  • Inference Wrapper: Fixed a bug in the OpenRouter implementation (thanks @thecoderwithHat).
  • Navigation Fixes: Improved handling of edge-case navigations across complex sites.
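
To make the Tab Management item more concrete, here is a minimal sketch of the kind of state it describes: a count of open tabs, the active tab, and basic open/switch/close control. The `TabTracker` class and its method names are hypothetical and only illustrate the idea; they are not the project's actual API or implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TabTracker:
    """Hypothetical bookkeeping for browser tabs (illustration only)."""

    tabs: List[str] = field(default_factory=list)  # URLs of open tabs, in order
    active: Optional[int] = None  # index of the active tab, None if no tab is open

    @property
    def count(self) -> int:
        """Number of currently open tabs."""
        return len(self.tabs)

    def open(self, url: str) -> int:
        """Open a new tab, make it active, and return its index."""
        self.tabs.append(url)
        self.active = len(self.tabs) - 1
        return self.active

    def switch(self, index: int) -> None:
        """Make the tab at `index` the active one."""
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        self.active = index

    def close(self, index: int) -> None:
        """Close the tab at `index` and keep `active` pointing at a valid tab."""
        if not 0 <= index < len(self.tabs):
            raise IndexError(f"no tab at index {index}")
        del self.tabs[index]
        if not self.tabs:
            self.active = None
        elif index < self.active or self.active >= len(self.tabs):
            # Either a tab before the active one was removed, or the active
            # tab was the last one; in both cases shift the index down by one.
            self.active -= 1
```

Closing the active tab in this sketch hands focus to the tab that takes its position, or to the new last tab if the closed one was at the end. A real browser may choose differently, so treat that rule as an assumption.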