TechEcho
OmniParser V2 – A simple screen parsing tool towards pure vision based GUI agent

64 points by punnerud, 3 months ago

4 comments

rgovostes, 3 months ago
The OS has additional information including how different graphics layers are composited, and what accessibility metadata is attached to interface elements. It ought to be useful to exploit this to do better than screenshot parsing.
icodar, 3 months ago
This is not the intended use, but it works well for parsing document layout from images.
nighthawk454, 3 months ago
One ponders the connections with the Recall feature
NewUser76312, 3 months ago
Very cool work. Accurate GUI text and element parsing is exactly the kind of input that LLMs need to be effective agents.