Show HN: Vision AI Checkup, an Optometrist for VLMs

1 point by zerojames, about 17 hours ago
Evaluating the visual capabilities of language models is hard.

At one end of the evaluation spectrum, we have vibe checks which, while useful for building intuition, are time-consuming to run across a dozen or more models. At the other end, we have benchmarks so large that running them is intractable for most users.

Vision AI Checkup is a new tool for evaluating VLMs. The site is made up of hand-crafted prompts focused on real-world problems: defect detection, understanding how the position of one object relates to another, colour understanding, and more.

Our prompts are especially focused on industrial tasks -- serial number reading, assembly line understanding, and more -- although we're excited to add more general prompts.

The tool lets you see how models perform across categories of prompts, and how different models perform on a single prompt.

We have open sourced the codebase, with instructions on how to add a prompt to the assessment: https://github.com/roboflow/vision-ai-checkup. You can also add new models.

We'd love feedback and ideas for areas where VLMs struggle that you'd like to see assessed!
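
To make the two views mentioned in the post concrete (per-category performance, and a side-by-side comparison on one prompt), here is a minimal Python sketch. It is not taken from the Vision AI Checkup codebase: the `Result` record, the function names, the model names, and the example data are all hypothetical, and it assumes each prompt is simply judged pass or fail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One model's outcome on one prompt (hypothetical record layout)."""
    model: str      # hypothetical model identifier
    category: str   # e.g. "defect detection"
    prompt_id: str  # hypothetical prompt identifier
    passed: bool    # assumed pass/fail judgement of the model's answer


def pass_rate_by_category(results):
    """Return {model: {category: fraction of prompts passed}}."""
    # counts[model][category] holds [number passed, number attempted]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        cell = counts[r.model][r.category]
        cell[0] += int(r.passed)
        cell[1] += 1
    return {
        model: {cat: passed / total for cat, (passed, total) in cats.items()}
        for model, cats in counts.items()
    }


def compare_on_prompt(results, prompt_id):
    """Return {model: passed} for a single prompt."""
    return {r.model: r.passed for r in results if r.prompt_id == prompt_id}


# Made-up example data for illustration only, not real measurements.
results = [
    Result("model-a", "defect detection", "p1", True),
    Result("model-a", "defect detection", "p2", False),
    Result("model-a", "serial number reading", "p3", True),
    Result("model-b", "defect detection", "p1", False),
    Result("model-b", "defect detection", "p2", False),
    Result("model-b", "serial number reading", "p3", True),
]

by_category = pass_rate_by_category(results)
on_p1 = compare_on_prompt(results, "p1")
```

Here `by_category` gives the category view and `on_p1` gives the single-prompt view; a real assessment would likely need a richer notion of correctness than a boolean.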

No comments yet.