OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computers

77 points by kristianpaul, about 1 year ago

6 comments

abrichr, about 1 year ago
Thank you for making this available!

Check out https://github.com/OpenAdaptAI/OpenAdapt for a cross-platform (Mac and Windows) open source library that learns to perform tasks in desktop apps by observing human demonstrations.

We believe a major shortcoming of conventional approaches to AI agents is expecting them to figure out tasks of arbitrary complexity from first principles. While understandable from an academic perspective, this is unnecessary for practical utility, since humans perform these tasks constantly.

With OpenAdapt you can demonstrate to a model how to perform a task, then have it take over the task, with additional user-supplied natural language instructions.

I have created an issue to evaluate OpenAdapt on OSWorld: https://github.com/OpenAdaptAI/OpenAdapt/issues/642. Contributions welcome!

Edit: from https://github.com/xlang-ai/OSWorld/tree/main/evaluation_examples:

> The ./trajectories file contains the annotated trajectories for each data item in ./examples for finishing the task.

Unfortunately this file does not appear to be included in the repo. I have submitted an issue here: https://github.com/xlang-ai/OSWorld/issues/30
ec109685, about 1 year ago
Buried in their presentation is how effective current agents are at completing desktop computing tasks.

Humans complete the given tasks at 70%+ effectiveness, while the best model (GPT-4V) is at 12%. Most of the other models were <5% effective.
TheRoque, about 1 year ago
Gotta love people working on replacing themselves. Jokes aside, seeing an AI interacting with a computer is kind of scary. It's not just outputting text anymore, it's doing the full work of a human working on a computer, meaning... a ton of people
stavros, about 1 year ago
I built a small Python script so I could let GPT-4 debug my system issues:

https://github.com/skorokithakis/sysaidmin

It works surprisingly well!
bitwize, about 1 year ago
Coming soon: Human-trained AI that can actuate a robotic hand to fill in paper forms with a Selectric typewriter. The doom of us all!
rosslazer, about 1 year ago
Dumb question - What actually needs to be done to close the gap?