OmniParser V2 – A simple screen parsing tool towards pure vision based GUI agent
64 points
Author
punnerud
3 months ago
4 comments
rgovostes
3 months ago
The OS has additional information including how different graphics layers are composited, and what accessibility metadata is attached to interface elements. It ought to be useful to exploit this to do better than screenshot parsing.
icodar
3 months ago
This is not the intended use, but it works well for parsing document layout from images.
nighthawk454
3 months ago
One ponders the connections with the Recall feature
NewUser76312
3 months ago
Very cool work. Accurate GUI text and element parsing is exactly the kind of input that LLMs need to be effective agents.