Releases: 4fuu/open-browser-cli
0.6.1 (631369d)
browser-cli 0.6.1
0.6.1 是一次 Linux 二进制兼容性修复版本,不含功能变更。
变更
- Linux 预编译二进制从 glibc 动态链接切换为 musl 静态链接
- 此前的二进制在 CI(Ubuntu 24.04,glibc 2.39)上编译,无法在 glibc 版本较旧的发行版(如 Ubuntu 20.04、Debian 11 等)上运行。
- 切换到 musl 后产物完全静态链接,可在任意 Linux 发行版上直接运行,无 glibc 版本限制。
- 受影响的目标:x86_64-unknown-linux-musl、aarch64-unknown-linux-musl。
- macOS 和 Windows 二进制不受影响。
兼容性说明
- 这是一次纯构建层修复,CLI 行为与协议完全不变。
- 已在使用 0.6.0 的用户可直接替换二进制,无需任何配置变更。
English
0.6.1 is a Linux binary compatibility fix with no functional changes.
Changes
- Linux prebuilt binaries now use musl static linking instead of glibc dynamic linking
- Previous binaries were compiled on Ubuntu 24.04 (glibc 2.39) and would fail to run on older distributions such as Ubuntu 20.04 or Debian 11.
- The switch to musl produces fully self-contained binaries that run on any Linux distribution regardless of the installed glibc version.
- Affected targets: x86_64-unknown-linux-musl and aarch64-unknown-linux-musl.
- macOS and Windows binaries are not affected.
Compatibility notes
- This is a build-layer fix only; CLI behavior and protocol are completely unchanged.
- Users already on 0.6.0 can drop in the new binary without any configuration changes.
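The glibc constraint described above can be illustrated with a small version check. This is a simplified sketch, not part of browser-cli: it assumes a dynamically linked binary needs a host glibc at least as new as the one it was built against (in practice the requirement depends on the newest symbol version the binary references), and the function names are hypothetical. The host version 2.31 used below is the glibc shipped by Ubuntu 20.04.

```python
def parse_glibc(version: str) -> tuple[int, int]:
    """Turn a glibc version string such as '2.39' into a comparable tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_run_dynamic(build_glibc: str, host_glibc: str) -> bool:
    """Simplified rule: the host glibc must be at least as new as the build glibc."""
    # Compare as integer tuples; comparing the raw strings would sort "2.4" above "2.39".
    return parse_glibc(host_glibc) >= parse_glibc(build_glibc)


# A binary built against glibc 2.39 on a host that ships glibc 2.31.
print(can_run_dynamic("2.39", "2.31"))
# The same binary on a host with the same glibc as the build machine.
print(can_run_dynamic("2.39", "2.39"))
```

A statically linked musl binary carries its own libc, so no check of this kind applies to it.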
Full Changelog: 0.6.0...0.6.1
0.6.0 (9e21fc1)
browser-cli 0.6.0
0.6.0 是一次以浏览器资源获取、媒体状态可见性和扩展发布稳定性增强为主的版本更新,汇总了自 0.5.4 以来这一轮 CLI 能力扩展、页面结构化增强、扩展侧交互修正,以及浏览器扩展打包/发布流程改进。
亮点
- 新增截图与资源下载能力
- 新增 browser-cli screenshot <session-id>,支持 --output、--quality、--json 等参数,可直接把当前页面视口截图保存到本地。
- 新增 browser-cli download <session-id> <target>,可用元素 ID 或直接 URL 作为目标,并通过浏览器当前会话上下文下载图片或文件,保留登录态 / cookie 访问能力。
- 下载结果会自动推断文件名并做安全清洗;截图和下载都支持结构化 JSON 输出,便于 Agent / 脚本串联。
- 结构化页面视图开始暴露媒体播放状态
- 浏览器快照现在会提取 <audio> / <video> 的播放状态,包括 playing / paused / ended、当前时间、总时长、静音状态,以及视频分辨率。
- Rust 侧新增专用 media 节点,XML / JSON 输出会保留这些字段,让页面上的媒体控件与播放状态可以被更稳定地理解和利用。
- 插件目标遍历也已同步理解媒体节点,为后续插件扩展和自动化规则提供更完整的语义基础。
- 扩展侧交互与采集行为更稳定
- 当点击或输入触发页面滚动时,浏览器里的光标提示现在会跟随目标元素移动,不再轻易出现动画脉冲停在旧位置的情况。
- 截图流程补强了质量参数校验,并在捕获完成后恢复原先的活动标签页,减少截图动作对当前浏览上下文的打扰。
- 浏览器扩展打包与 Chrome 集成更稳妥
- 扩展构建现在会生成浏览器专用产物,Chrome / Firefox 各自使用对应的 manifest 与包内容进行发布。
- Chrome 包不再混用 Manifest V2 的 background.scripts 与 Manifest V3 的 service_worker,解压加载时的产物路径也已与最终 zip 内容保持一致。
- macOS 上的 Chrome Native Messaging host 默认安装路径已修正到正确目录,减少 browser-cli setup --browser chrome 后扩展仍无法连接 relay 的问题。
兼容性与行为说明
- 这是一次向后兼容的版本更新;新增的 screenshot 和 download 命令是增量能力,不会影响现有 open / page / click / type / wait 工作流。
- screenshot --full-page 目前仍会回退为视口截图,并输出提示;这次版本先补齐统一命令入口与截图链路,完整全页拼接仍可后续继续增强。
- download 面向“浏览器当前会话可访问”的资源:它会复用扩展背景页的网络上下文,因此更适合下载登录后图片、文件链接或页面中可直接访问的媒体资源。
- Chrome 用户升级到这一版后,建议 reload 扩展;macOS 用户如果之前已经执行过 browser-cli setup --browser chrome --extension-id <id>,建议重新执行一次以把 native host manifest 写入修正后的目录。
English
0.6.0 is a feature release focused on browser-side resource capture, richer media-state visibility, and more reliable extension packaging and integration, covering this round of CLI expansion, structured-page enhancements, extension interaction fixes, and browser-extension release improvements since 0.5.4.
Highlights
- New screenshot and resource-download workflows
- Added browser-cli screenshot <session-id> with options such as --output, --quality, and --json, so the current page viewport can be captured directly to a local file.
- Added browser-cli download <session-id> <target>, where the target can be either an element ID or a direct URL, and downloads are performed through the browser session context so authenticated resources remain accessible.
- Download output now auto-detects and sanitizes filenames, and both screenshot and download flows support structured JSON responses for agent and script pipelines.
- Structured page output now exposes media playback state
- Browser snapshots now extract playback information from <audio> / <video> elements, including playing / paused / ended, current time, duration, mute state, and video resolution.
- The Rust side now models these as dedicated media nodes, and both XML and JSON output preserve the media fields so playback surfaces can be understood more reliably.
- Plugin target traversal has been updated to understand media nodes as well, which gives future plugin rules and automation flows a stronger semantic base.
- Extension-side interaction and capture behavior is more robust
- When click or type actions trigger page scrolling, the visible cursor overlay now stays attached to the active target instead of leaving pulse feedback behind at stale coordinates.
- Screenshot handling now validates quality input more carefully and restores the previously active tab after capture, reducing disruption to the current browsing context.
- Browser-extension packaging and Chrome integration are more reliable
- Extension builds now produce browser-specific artifacts, and Chrome / Firefox release packages are assembled from their corresponding generated manifests.
- The Chrome package no longer mixes Manifest V2 background.scripts with Manifest V3 service_worker, and archive paths now match the final unpacked zip layout.
- On macOS, the default Chrome Native Messaging host install path has been corrected so browser-cli setup --browser chrome is less likely to leave the extension unable to connect to the relay.
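Filename inference and sanitization for downloads, as described in the highlights, could look roughly like the following. This is a minimal sketch under assumptions: the release notes do not describe the actual rules browser-cli applies, so the character whitelist, the fallback name, and the function name infer_filename are all hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def infer_filename(url: str, fallback: str = "download") -> str:
    """Derive a safe local filename from a URL (illustrative rules only)."""
    # Use only the path component, so query strings and fragments are ignored.
    path = unquote(urlsplit(url).path)
    # Keep the last path segment; this also discards directory traversal parts.
    name = PurePosixPath(path).name
    # Replace anything outside a conservative whitelist with an underscore.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    # Avoid hidden files and names made only of dots.
    name = name.lstrip(".")
    return name or fallback


print(infer_filename("https://example.com/files/report%202024.pdf?token=abc"))
print(infer_filename("https://example.com/"))
```

Taking only the final path segment before cleaning characters keeps the result inside the chosen output directory even when the URL contains encoded `../` sequences.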
Compatibility and behavior notes
- This is a backward-compatible release. The new screenshot and download commands are additive and do not break existing open, page, click, type, or wait flows.
- screenshot --full-page still falls back to viewport capture with a warning; this release establishes the command surface and capture pipeline first, while full-page stitching can be improved later.
- download is designed for resources that are accessible from the browser's current session context: it reuses the network context of the extension's background page, which makes it especially useful for authenticated images, file links, and page-visible media resources.
- Chrome users should reload the extension after upgrading. macOS users who previously ran browser-cli setup --browser chrome --extension-id <id> should run it again once so the native host manifest is rewritten into the corrected directory.
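For scripts that drive the new commands, a small helper that assembles the argument list may be convenient. The sketch below only uses the command and flag names mentioned above (screenshot, --output, --quality, --json). The value formats are assumptions: a file path for --output and an integer for --quality are guesses, the session ID "s1" is a placeholder, and the helper name is hypothetical. It builds the command without running it.

```python
import shlex


def screenshot_argv(session_id, output=None, quality=None, json_output=False):
    """Build the argv for `browser-cli screenshot` (value formats are assumed)."""
    argv = ["browser-cli", "screenshot", session_id]
    if output is not None:
        argv += ["--output", output]
    if quality is not None:
        # Assumption: quality is passed as a plain number.
        argv += ["--quality", str(quality)]
    if json_output:
        argv.append("--json")
    return argv


argv = screenshot_argv("s1", output="page.png", quality=80, json_output=True)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` once the real option formats have been confirmed against the CLI help.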
Full Changelog: 0.5.4...0.6.0
0.5.4 (4bfcb96)
browser-cli 0.5.4
0.5.4 是一个以 JSON 输出分层、view 聚焦行为优化、浏览器扩展交互提示补强和文档补全为主的补丁版本,汇总了自 0.5.3 以来这一轮 CLI 输出、结构化视图、扩展侧交互反馈和使用说明改进。
亮点
- --json 现在区分默认紧凑模式和 --verbose 全量模式
- page、search、block、view 新增 --verbose / -v,需要完整结构时可以显式展开。
- 默认 --json 会返回更适合 Agent 消费的紧凑数据:压平低信号容器,省略 element_refs、full_texts、full_blocks 等原始扩展字段。
- search --json 默认只保留 page、tag、context 和命中的 element_id,减少下游解析噪声。
- view 默认更聚焦当前目标
- 当目标元素位于 list / table 中时,默认 view 会收窄到包含该目标的单个 item / row,避免把整块列表上下文一并展开。
- 如果确实需要完整列表或表格上下文,可以显式使用 view --verbose。
- open、click、type、wait 这些默认返回页面的路径仍继续输出完整 XML 页面,不会因为新的紧凑 JSON 策略而被意外裁剪。
- 浏览器扩展的自动化状态提示更明显
- 扩展侧 content script 调整了光标叠层样式:待机时会显示更小的光标和更明显的 zzz idle 标记,执行任务时再切回高亮工作态。
- 这让真实浏览器里“当前只是待机”还是“正在执行自动化动作”更容易一眼区分。
- 文档与开发说明同步更新
- README.md、README.en.md、SKILL.md 已同步补充 --verbose、view 默认聚焦行为,以及最新命令速查与输出说明。
- 仓库新增 CONTRIBUTING.md,把本地开发、扩展重载、测试要求和提交前检查整理成独立说明。
兼容性与行为说明
- 这是一个兼容性补丁版本,没有协议层破坏性变更。
- XML 主输出行为基本保持不变;这次的“紧凑 / 全量”切换主要体现在 --json 结果和 view 的默认聚焦范围上。
- 如果下游脚本原来依赖 page --json、search --json、block --json 或 view --json 中的完整原始字段,需要改为显式传入 --verbose。
- 浏览器扩展侧的这次改动主要是自动化光标的视觉反馈增强,不影响协议或页面结构化字段。
English
0.5.4 is a patch release focused on layered JSON output, a tighter default view scope, clearer browser-extension automation feedback, and refreshed documentation, covering this round of CLI-output, structured-view, extension UX, and usage-guide refinements since 0.5.3.
Highlights
- --json now distinguishes between compact default output and full --verbose output
- page, search, block, and view now accept --verbose / -v so callers can opt into the full structure when needed.
- Default --json output is now more agent-friendly: low-signal containers are flattened and raw expansion fields such as element_refs, full_texts, and full_blocks are omitted.
- search --json now keeps only page, tag, context, and matched element_id by default, which reduces downstream parsing noise.
- view now defaults to a tighter target-focused scope
- When the target element lives inside a list or table, default view output narrows to the single matching item / row instead of expanding the entire surrounding block.
- When full surrounding list or table context is needed, view --verbose keeps the larger structure intact.
- open, click, type, and wait still return full XML pages on their default non-JSON paths, so the new compact JSON behavior does not silently trim those flows.
- Browser-extension automation state is easier to read
- The extension-side content script updates the cursor overlay so idle state is shown with a smaller cursor and a more obvious zzz marker, while active task mode still switches back to the highlighted working state.
- This makes it easier to tell at a glance whether the real browser is idle or currently executing automation.
- Documentation and contributor guidance were refreshed together
- README.md, README.en.md, and SKILL.md now document --verbose, the new default view focus behavior, and the updated command/output conventions.
- A new CONTRIBUTING.md collects local setup, extension reload, test requirements, and pre-submission checks into one place.
Compatibility and behavior notes
- This is a compatible patch release with no protocol-breaking changes.
- XML output behavior is largely unchanged; the new compact-vs-full distinction mainly affects --json results and the default scope of view.
- Downstream tooling that depended on full raw fields from page --json, search --json, block --json, or view --json should now pass --verbose explicitly.
- The browser-extension change in this release is purely about automation cursor feedback and does not alter the protocol or structured page fields.
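The "omit raw expansion fields" half of the compact mode can be pictured as a recursive filter over the JSON document. This is only an illustrative sketch: the field names element_refs, full_texts, and full_blocks come from the notes above, but the surrounding JSON shape in the example is invented, and the container flattening that the real compact mode also performs is not modeled here.

```python
RAW_FIELDS = {"element_refs", "full_texts", "full_blocks"}


def compact(value):
    """Recursively drop raw expansion fields from a decoded JSON value."""
    if isinstance(value, dict):
        return {k: compact(v) for k, v in value.items() if k not in RAW_FIELDS}
    if isinstance(value, list):
        return [compact(v) for v in value]
    return value


# Hypothetical verbose payload; only the three raw field names are taken from the notes.
verbose_page = {
    "page": 1,
    "nodes": [{"tag": "link", "text": "Docs", "element_refs": ["e1"]}],
    "full_texts": {"t1": "a very long paragraph"},
}
print(compact(verbose_page))
```

Tooling that still needs the dropped fields would request the full structure with --verbose rather than reconstruct them.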
Full Changelog: 0.5.3...0.5.4
0.5.3 (16ff0dd)
browser-cli 0.5.3
0.5.3 是一个以结构化输出语义保留、class 信息可读性和列表项交互识别补强为主的补丁版本,汇总了自 0.5.2 以来这一轮页面结构化 / XML 渲染 / 插件目标遍历改进。
亮点
- 结构化结果开始保留更有用的 class 语义
- 交互节点如 link、button,以及列表项 item,现在会在结构化树中保留自身的 class 信息。
- XML 输出会在保留 class 语义的同时做去重和压缩,减少重复兄弟节点、重复路径和 BEM 基类带来的噪声。
- 对重复结构的列表项,会优先保留首个模式样本上的 class,后续同构项自动省略重复 class,让输出更短但仍保留足够语义。
- 列表项交互识别更符合真实页面
- 如果 <li> 本身就携带弱交互信号,例如 cursor:pointer,结构化阶段现在会直接把它识别为可点击元素,而不是先包成普通 item。
- 这让排行榜、tab 列表、卡片列表一类自定义站点组件更容易直接被 click 命中。
- XML 上下文去噪更进一步
- class 路径会自动压缩掉冗余父级前缀,例如 BEM 基类与修饰符、父块名与子元素名的重复片段。
- 兄弟节点之间重复的相同 class 也会按结构去重,避免长列表输出被模板 class 填满。
- 插件目标遍历与 CLI 目标查找已同步适配新的节点字段,保持交互查找行为一致。
兼容性与行为说明
- 这是一个兼容性补丁版本,没有引入新的命令参数或协议层破坏性变更。
- 现有 eN / tN / bN 工作流保持不变;本次主要提升的是输出中 class 语义的可利用性和列表交互元素的命中率。
- XML 输出中的 class 属性会比之前更有信息量,但也会经过主动去重压缩;如果下游逻辑依赖重复模板 class 的完整原样展开,需要按新的更紧凑表示适配。
English
0.5.3 is a patch release focused on semantic preservation in structured output, more useful class rendering, and stronger interactive detection for list items, covering this round of page-structuring, XML-rendering, and plugin target-traversal refinements since 0.5.2.
Highlights
- Structured output now preserves more useful class semantics
- Interactive nodes such as link, button, and list item nodes now retain their own class information in the structured tree.
- XML rendering keeps that class context while deduplicating and compacting it, reducing noise from repeated siblings, repeated path segments, and BEM base classes.
- For repetitive list structures, class information is preferentially kept on the first representative item while later isomorphic items omit duplicate class data, keeping output shorter without losing the pattern.
- List-item interaction detection better matches real pages
- When an <li> itself carries weak interactive signals such as cursor:pointer, the structuring pass now emits it directly as an interactive target instead of wrapping it as a plain item first.
- This makes custom site components such as ranking lists, tab strips, and card lists easier to target directly with click.
- XML context denoising goes further
- Class paths now compact redundant parent prefixes automatically, including repeated BEM base/modifier relationships and repeated block/element prefixes.
- Repeated sibling classes are also deduplicated structurally so long lists are not flooded with template-level class noise.
- Plugin target traversal and CLI target lookup have been updated to understand the expanded node fields and keep interaction resolution behavior consistent.
Compatibility and behavior notes
- This is a compatible patch release with no new command syntax and no protocol-breaking changes.
- Existing eN / tN / bN workflows remain unchanged; the main improvement here is better class-level semantics in output and more reliable targeting of interactive list items.
- class attributes in XML now carry more signal than before, but they are also intentionally deduplicated and compacted; downstream logic that depended on fully repeated template classes should adapt to the denser representation.
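The sibling-level class deduplication described above can be approximated as "keep a class string the first time it appears among siblings, omit it on later repeats". The sketch below assumes a hypothetical node shape (plain dicts with an optional class key) and does not attempt the BEM prefix compaction; it is not the project's actual algorithm.

```python
def dedupe_sibling_classes(items):
    """Keep each distinct class on its first sibling only; later repeats drop it."""
    seen = set()
    result = []
    for item in items:
        cls = item.get("class")
        if cls is not None and cls in seen:
            # Same class already shown on an earlier sibling: omit it here.
            item = {k: v for k, v in item.items() if k != "class"}
        elif cls is not None:
            seen.add(cls)
        result.append(item)
    return result


siblings = [
    {"tag": "item", "text": "First", "class": "rank-item"},
    {"tag": "item", "text": "Second", "class": "rank-item"},
    {"tag": "item", "text": "Third", "class": "rank-item active"},
]
for node in dedupe_sibling_classes(siblings):
    print(node)
```

The third sibling keeps its class because the string differs, which preserves the one piece of information (the active state) that distinguishes it from the template.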
Full Changelog: 0.5.2...0.5.3
0.5.2 (23b0d2e)
browser-cli 0.5.2
0.5.2 是一个以多窗口会话追踪、短 ID 参数一致性和弱交互识别补强为主的补丁版本,汇总了自 0.5.1 以来这一轮 CLI / 结构化解析 / 扩展侧会话管理改进。
亮点
- 新窗口 / 新标签点击现在会自动生成新 session
- 当普通 click 触发站点自身的新 tab / window 打开行为时,扩展现在会识别这个上下文切换,并为目标页面创建新的 session,而不是让追踪断开。
- CLI 会直接拿到新的 session_id 和目标页内容,减少一次额外的 list / page 调用。
- 原 session 保持停留在旧页面,不会被重绑到新 tab,整体行为更符合显式会话模型。
- ID 参数语义更统一
- click / type 现在同时接受 e3 和 3。
- text / block 现在也同时接受完整 ID 和数字部分,例如 t1 / 1、b1 / 1。
- README.md、README.en.md、SKILL.md 和 CLI help 已同步更新到统一的参数说明。
- 弱交互识别补强
- 浏览器快照会保留非原生交互节点上的运行时 cursor: pointer 信号。
- Rust 侧结构化阶段会把带可见文本的 cursor:pointer 节点保守识别为按钮,减少自定义列表项 / 卡片项上的漏判。
兼容性与行为说明
- 这是一个兼容性补丁版本,没有引入破坏性的协议或会话模型变更。
- click --new-session 仍然保留,继续用于“显式把链接目标打开成新会话”的场景;本次新增的是对站点自身新 tab / window 打开行为的自动追踪。
- 现有数字 ID 工作流继续有效;eN / tN / bN 只是新增的等价写法,不影响已有脚本。
- 由于扩展新增了 webNavigation 权限来识别新建导航目标,升级后需要 reload 浏览器扩展才能完整生效。
English
0.5.2 is a patch release focused on cross-window session tracking, consistent short-ID argument handling, and stronger weak-interaction detection across the CLI, structured page parsing, and extension-side session management since 0.5.1.
Highlights
- Click-opened tabs/windows now become tracked sessions automatically
- When a normal click triggers site-driven new-tab or new-window behavior, the extension now detects that context change and creates a new session for the destination page instead of losing tracking.
- The CLI now receives the new session_id together with the destination page content, removing the need for an extra list / page round trip.
- The original session stays on the old page and is not rebound to the new tab, which better matches the explicit session model.
- More consistent ID argument semantics
- click / type now accept both e3 and 3.
- text / block now also accept either full IDs or their numeric forms, such as t1 or 1 and b1 or 1.
- README.md, README.en.md, SKILL.md, and CLI help have been updated to reflect the unified argument behavior.
- Stronger weak-interaction detection
- Browser snapshots now preserve runtime cursor: pointer signals on non-native interactive nodes.
- During Rust-side structuring, visible-text nodes with cursor:pointer are conservatively classified as buttons, reducing misses on custom list items and card-like controls.
Compatibility and behavior notes
- This is a compatible patch release with no protocol-breaking or session-model-breaking changes.
- click --new-session remains available for explicitly opening link targets as separate sessions; what is new here is automatic tracking for site-driven new tab/window behavior.
- Existing numeric-ID workflows remain valid; eN / tN / bN are additive equivalent forms and do not break current scripts.
- Because the extension now uses the webNavigation permission to detect newly created navigation targets, reloading the browser extension is required after upgrading for the full behavior to take effect.
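Accepting both e3 and 3 amounts to normalizing the argument to one canonical form before lookup. The following is a minimal sketch of that idea; the function name, the choice of the prefixed form as canonical, and the error behavior are assumptions, since the notes only state which spellings are accepted.

```python
import re


def normalize_id(value: str, prefix: str) -> str:
    """Accept '3' or 'e3' (for prefix 'e') and return the prefixed form 'e3'."""
    # The prefix is optional; the numeric part is mandatory.
    match = re.fullmatch(rf"(?:{re.escape(prefix)})?(\d+)", value)
    if match is None:
        raise ValueError(f"not a valid {prefix}N id: {value!r}")
    return f"{prefix}{match.group(1)}"


print(normalize_id("3", "e"))
print(normalize_id("e3", "e"))
print(normalize_id("t1", "t"))
```

Rejecting a mismatched prefix (for example t1 where a block ID is expected) keeps a typo from silently resolving to a different kind of target.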
Full Changelog: 0.5.1...0.5.2
0.5.1 (a5b6fd2)
browser-cli 0.5.1
0.5.1 是一个以页面语义保留、结构化视图可读性和长块元素定位修正为主的补丁版本,汇总了自 0.5.0 以来的这一轮页面解析 / XML 输出 / 扩展侧交互细节改进。
亮点
- 更完整的交互语义识别
- 结构化阶段现在能识别更多 role / 状态驱动的控件,包括 role="link"、tab、menuitem、option、switch 等常见自定义交互语义。
- 对带 tabindex 且同时具有 aria-selected / aria-current / aria-expanded 等状态的节点,会更保守地判定为可交互元素,减少自定义组件上的漏判。
- 浏览器侧快照补充了 class、tabindex、更多 ARIA 状态和运行时 onclick 信息,为 Rust 侧结构化判断提供更完整的原始事实。
- 聚焦查看与长块目标解析更稳定
- view 现在可以正确定位被截断列表 / 表格块中的元素,并自动返回展开后的上下文。
- click 的文本查询和 --new-session 链接解析会继续在完整块内容中查找目标,不再只局限于当前页已经展开的那一小段。
- XML 输出更贴近真实语义
- 会跳过纯展示型包装容器,减少 div / span / i 一类无意义层级噪声。
- 被压平包装层上的 role / class 语义会向下继承到最终输出节点,便于后续规则、插件或 Agent 利用上下文。
- 单一交互叶子的列表项会更紧凑地内联输出,但真正有记录边界的 item 分组仍会保留。
- 光标驻留行为更克制、更自然
- 空闲游走改为按间隔决策,并在长时间没有 CLI 活动后自动停止,避免页面长期无意义移动。
- 移动轨迹加入轻微 wobble 和中途犹豫停顿,鼠标存在感更自然。
兼容性与行为说明
- 这是一个兼容性补丁版本,没有引入新的会话模型或协议层破坏性变更。
- 现有 eN / tN / bN 工作流继续可用;本次主要是让复杂自定义组件和长块场景下的结构化结果更稳定、更容易定位。
- XML 输出会比 0.5.0 更少展示无语义包装层;如果下游逻辑依赖这些展示型容器的具体层级,需要按新的结构化视图适配。
English
0.5.1 is a patch release focused on semantic preservation, structured-view readability, and more reliable element resolution inside truncated blocks, covering this round of page parsing, XML rendering, and extension-side interaction refinements since 0.5.0.
Highlights
- More complete interactive semantic detection
- The structured page pass now recognizes more role- and state-driven widgets, including common custom semantics such as role="link", tab, menuitem, option, and switch.
- Nodes with tabindex plus widget state such as aria-selected, aria-current, or aria-expanded are now classified more conservatively as interactive, reducing misses on custom components.
- Browser-side snapshots now include class, tabindex, more ARIA state, and runtime onclick signals so the Rust side has a fuller set of raw facts to classify from.
- More reliable focused views and long-block target resolution
- view can now resolve elements that live inside truncated list/table blocks and automatically return the expanded surrounding context.
- Text-query resolution in click and link extraction for --new-session now continue searching through full block contents instead of only the currently expanded slice.
- XML output is closer to real page semantics
- Purely presentational wrapper containers are skipped to reduce noise from meaningless div / span / i layers.
- role / class semantics from flattened wrappers are propagated down to the final rendered nodes, which gives downstream rules, plugins, and agents more usable context.
- List items that only wrap a single interactive leaf are rendered more compactly, while true record boundaries are still preserved.
- Cursor presence becomes more restrained and natural
- Idle wandering now runs on interval-based decisions and stops automatically after prolonged CLI inactivity, avoiding endless meaningless movement.
- Cursor motion now includes slight wobble and occasional hesitation pauses to make on-page presence look less mechanical.
Compatibility and behavior notes
- This is a compatible patch release with no new session-model or protocol-breaking changes.
- Existing eN / tN / bN workflows remain valid; the main goal here is to make structured output more stable and easier to target on complex custom widgets and long blocks.
- XML output in 0.5.1 intentionally shows fewer non-semantic wrapper layers than 0.5.0; downstream logic that depended on those presentational container levels should adapt to the cleaner structure.
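The role- and state-based detection described in the highlights can be summarized as two rules: a recognized widget role is enough on its own, and tabindex counts only when paired with a widget state attribute. The sketch below encodes just those two rules with the names listed above; the attribute-dictionary input and the function name are assumptions, and the real classifier also weighs other signals (such as onclick and class) that are not modeled here.

```python
INTERACTIVE_ROLES = {"link", "tab", "menuitem", "option", "switch"}
STATE_ATTRS = ("aria-selected", "aria-current", "aria-expanded")


def looks_interactive(attrs: dict) -> bool:
    """Classify a node from its attributes using the two rules in the notes."""
    # Rule 1: an explicit widget role is sufficient.
    if attrs.get("role") in INTERACTIVE_ROLES:
        return True
    # Rule 2: tabindex alone is not enough; it must come with widget state.
    has_state = any(name in attrs for name in STATE_ATTRS)
    return "tabindex" in attrs and has_state


print(looks_interactive({"role": "tab"}))
print(looks_interactive({"tabindex": "0", "aria-selected": "true"}))
print(looks_interactive({"tabindex": "0"}))
```

Requiring both tabindex and state is what keeps merely focusable containers from being reported as controls.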
Full Changelog: 0.5.0...0.5.1
0.5.0 (59eda41)
browser-cli 0.5.0
0.5.0 是一次以命令易用性、结构化页面视图和交互体验打磨为主的版本更新,汇总了自 0.4.3 以来的这一轮 CLI / 页面解析 / 扩展侧改进。
亮点
- 更自然的命令参数与默认行为
- open 新增 --wait 和 --quiet,并默认直接返回当前页面,而不是只返回 session 信息。
- click / type 现在既可以使用数字元素 ID,也可以直接用页面上的文本查询来定位交互元素。
- wait 新增 --for <text> 文本等待模式,并支持 --quiet 只输出紧凑结果。
- block 新增 --all,可以一次展开整个长列表或长表格块。
- 新增聚焦查看能力
- 新增 view 命令,可针对元素、长文本、长列表/表格块或文本查询返回聚焦视图。
- 适合在复杂页面里只取某个局部上下文,减少手动翻页和二次检索。
- 页面结构化模型升级
- 页面内部结构从更扁平的表示升级为树形 nodes,输出能更好保留容器、列表、表格和嵌套交互元素的上下文关系。
- XML / JSON 渲染、长块分页和聚焦视图现在共享同一套树结构,复杂 DOM 场景下的结果更稳定、更接近真实页面层次。
- 插件与扩展侧同步增强
- 插件匹配逻辑已适配新的树形页面结构,字符串目标解析会递归查找可交互节点。
- 浏览器扩展在打开页面后的首个快照前支持额外稳定等待,减少刚加载完成时页面仍在抖动的问题。
- 光标移动轨迹进一步拟人化,加入分段贝塞尔运动、绕行和更自然的停顿节奏。
兼容性与行为说明
- 原有的数字元素 ID 工作流保持可用;文本查询是新增能力,不会影响已有脚本继续传数字 ID。
- click / type / wait 现在默认会返回更新后的页面;如果自动化场景只需要成功摘要,可使用 --quiet。
- 本次没有引入破坏性的会话模型变更;现有 CLI / Relay / Extension 的整体使用方式保持不变。
- README.md、AGENTS.md 和 SKILL.md 已同步更新到新的参数语义,便于 AI / Agent 和脚本侧直接按新接口使用。
English
0.5.0 is a release focused on command ergonomics, structured page views, and interaction polish, covering this round of CLI, page parsing, and extension-side improvements since 0.4.3.
Highlights
- More natural command targets and defaults
- open now supports --wait and --quiet, and returns the current page by default instead of only printing session info.
- click / type now accept either numeric element IDs or direct text queries to resolve interactive targets.
- wait now supports --for <text> for text-based waiting and --quiet for compact automation output.
- block now supports --all to expand an entire long list/table block at once.
- New focused inspection flow
- Added a new view command that can return a focused view for an element, long text item, long list/table block, or text query.
- This is especially useful on complex pages where only one local subtree or content segment is needed.
- Upgraded structured page model
- Internal page representation moved from a flatter structure to tree-based nodes, preserving container, list, table, and nested interactive context much better.
- XML / JSON rendering, block pagination, and focused views now share the same tree model, which improves stability on complex DOM layouts.
- Plugin and extension improvements
- Plugin target matching now traverses the new tree structure recursively when resolving interactive targets.
- The extension can now wait for additional DOM stability before sending the first snapshot after page open.
- Cursor motion has been further humanized with segmented bezier movement, detours, and more natural pause timing.
Compatibility and behavior notes
- Existing numeric element-ID workflows remain valid; text-query targeting is additive and does not break current scripts.
- click / type / wait now return the updated page by default. Use --quiet when automation only needs a compact success result.
- This release does not introduce a breaking session-model change. The overall CLI / Relay / Extension workflow remains the same.
- README.md, AGENTS.md, and SKILL.md have been updated to reflect the new parameter semantics so agents and scripts can follow the latest interface directly.
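Resolving a text query against a tree of nodes, as the plugin matching now does, is a depth-first search for the first interactive node whose text matches. The sketch below shows the traversal only. The node shape (dicts with text, interactive, and children keys) and the case-insensitive substring match are assumptions made for illustration, not the project's actual data model.

```python
def find_target(node, query):
    """Depth-first search for the first interactive node whose text contains query."""
    text = node.get("text", "")
    if node.get("interactive") and query.lower() in text.lower():
        return node
    for child in node.get("children", []):
        found = find_target(child, query)
        if found is not None:
            return found
    return None


# Hypothetical page tree: a list containing a plain item and a nested button.
page = {
    "tag": "list",
    "children": [
        {"tag": "item", "text": "Sign in to continue"},
        {"tag": "item", "children": [
            {"tag": "button", "text": "Sign in", "interactive": True, "id": "e7"},
        ]},
    ],
}
print(find_target(page, "sign in"))
```

The plain text item is skipped even though its text matches, because only interactive nodes are valid click or type targets.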
Full Changelog: 0.4.3...0.5.0
0.4.3 (ef54ddf)
browser-cli 0.4.3
0.4.3 是一个以光标可视化和更自然的人类操作模拟为主的小版本更新,汇总了自 0.4.2 以来的交互增强。
- 新增常驻光标渲染。现在通过 open 打开页面后,扩展会立即在页面中显示可见光标,而不是只在点击瞬间出现。
- 新增空闲游走行为。光标会在页面可见交互区域与空白区域之间随机移动,增强页面停留期间的人类操作存在感。
- 新增任务态光标标识。当收到明确操作指令时,光标会切换为黄色,和常规空闲态区分开来。
- 改进点击与输入前的接近路径。执行动作前会先滚动目标、按曲线路径靠近控件,并在移动过程中补充更连续的鼠标移动事件。
- 提升控件命中准确度。点击前会对目标可见区域做采样和命中校验,并在点击时提供缩放反馈,减少边缘命中和遮挡带来的误点。
本次发布没有引入协议层破坏性变更,现有 CLI / Relay / Extension 工作流保持不变。
English
0.4.3 is a small release focused on cursor visibility and more natural human-like browser interaction, covering all changes since 0.4.2.
- Added persistent cursor rendering. After a page is opened with open, the extension now shows a visible cursor immediately instead of only during click execution.
- Added idle wandering behavior. The cursor now moves between visible interactive regions and whitespace while the page is idle, making browser presence look less static.
- Added a task-state cursor marker. When an explicit action is being executed, the cursor switches to yellow so task mode is visually distinct from normal idle mode.
- Improved approach paths before click and type actions. The extension now scrolls the target into view, approaches it along a curved path, and emits a more continuous stream of mouse movement events before acting.
- Improved hit accuracy for actionable controls. Before clicking, the extension samples the visible target area and validates hit points, then applies a click-scale feedback effect to reduce edge hits and occlusion-related misses.
This release does not introduce any breaking protocol changes. Existing CLI / Relay / Extension workflows remain unchanged.
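A curved approach path of the kind mentioned above can be generated with a quadratic Bezier curve, where a control point pulls the path away from the straight line between start and target. This is a generic sketch of that technique, not the extension's implementation (which is not shown in these notes); the function name, the step count, and the coordinates are illustrative.

```python
def bezier_path(start, control, end, steps=8):
    """Sample a quadratic Bezier curve from start to end, bent toward control."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
        x = u * u * start[0] + 2 * u * t * control[0] + t * t * end[0]
        y = u * u * start[1] + 2 * u * t * control[1] + t * t * end[1]
        points.append((x, y))
    return points


# Move from (0, 0) to (100, 0) while arcing toward the control point (50, 100).
for point in bezier_path((0, 0), (50, 100), (100, 0)):
    print(point)
```

Each sampled point would correspond to one synthetic mouse-move event, so a higher step count produces a denser, more continuous event stream.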
Full Changelog: 0.4.2...0.4.3
0.4.2 (29a6eab)
browser-cli 0.4.2
0.4.2 是一个以结构化输出质量和导航稳定性为主的小版本更新,汇总了自 0.4.0 以来的改进。
- 改进块级分页逻辑。超长 list / table 不再按固定条目数硬切,而是按渲染后的 XML 行数预算分页,让首屏内容更合理,后续 block 分页也更贴近实际阅读密度。
- 优化表格 XML 输出。只有一个且内容较短的单元格行,现在会压缩成单行 <row><cell>...</cell></row>,减少冗余,提升终端和 AI 消费时的可读性。
- 修复导航类列表的结构化错位问题。由纯链接组成的列表不再被重复输出成 <list> 和 <link> 两份;混合列表中的纯文本项与可点击项也会被更准确地拆分和排序。
- 改进元素标签回退与搜索命中。现在会把 title 属性纳入标签回退和搜索字段,减少“元素可操作但没有可读标签”的情况。
- 修复跨页跳转后的缓存问题。当 click 触发页面导航时,扩展会在新页面加载完成后主动请求新的快照,避免 Relay / CLI 继续读取旧页面缓存。
本次发布没有引入协议层破坏性变更,现有 CLI / Relay / Extension 工作流保持不变。
English
0.4.2 is a small release focused on structured output quality and navigation stability, rolling up all changes since 0.4.0.
- Improved block pagination. Long list / table blocks are no longer split by a fixed item count. They are now paginated by rendered XML line budget, which makes the first page denser and follow-up block pages more predictable.
- Cleaner table XML. Single-cell rows with short content are now emitted inline as <row><cell>...</cell></row>, reducing output noise for terminal and AI consumers.
- Fixed structured-output misalignment for navigation-style lists. Lists made entirely of links are no longer duplicated as both <list> and <link>, and mixed lists now separate plain text items from actionable links more accurately.
- Better label fallback and search coverage. The title attribute is now included in searchable fields and used as a fallback label source when needed.
- Fixed stale cache after navigation. When a click causes a page transition, the extension now waits for the new page to load and requests a fresh snapshot, preventing Relay / CLI from serving the previous page state.
This release does not introduce any breaking protocol changes. Existing CLI / Relay / Extension workflows remain unchanged.
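Paginating by rendered line budget instead of item count can be sketched as a greedy packer: each item costs as many lines as its rendered XML occupies, and a new page starts when the next item would exceed the budget. The budget value, the function name, and the rule that an oversized item gets a page of its own are assumptions for illustration; the notes do not specify the real thresholds.

```python
def paginate_by_lines(rendered_items, budget):
    """Greedily group rendered XML items so each page stays within a line budget."""
    pages, current, used = [], [], 0
    for item in rendered_items:
        cost = item.count("\n") + 1  # number of rendered lines for this item
        # Start a new page if this item would overflow a non-empty page.
        if current and used + cost > budget:
            pages.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        pages.append(current)
    return pages


items = ["<a/>", "<b>\n  x\n</b>", "<c/>", "<d/>"]
for number, page in enumerate(paginate_by_lines(items, budget=3), start=1):
    print(number, page)
```

With a fixed item count, the three-line `<b>` element and the one-line elements would be treated alike; budgeting by lines lets short rows pack together while long ones take a page to themselves.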
Full Changelog: 0.4.0...0.4.2
0.4.1 (795d724)
Full Changelog: 0.4.0...0.4.1