科技回声
Show HN: AI Browser Agent Leaderboard

11 points by huss97 3 months ago

2 comments

sparacha 3 months ago
Leaderboards are getting harder and harder to use as a decision tool. What does it mean to be better by 0.7% or 1.6%? How does that help me? Is higher always better? What are the trade-offs? Evals continue to be the hardest and most important part of LLMs and the tools that use them.
huss97 3 months ago
Hey all! Wanted to share this leaderboard we put together to centralize some of the different models available in the AI browser agent space.

Since working on Steel, we've seen a ton of people have a hard time putting the browser agent space and how it's progressing into perspective, and it felt odd to us that there were no centralized leaderboards like there were for so many other agentic use cases.

So we launched this leaderboard to help! It's open-source and we're open to any contributions we may be missing. We're committed to keeping this up to date as the space progresses (which it seems to be doing quite quickly).

Let us know if you have any feedback/thoughts :)