DiceBench: A Simple Task Humans Fundamentally Cannot Do (But AI Might)

2 points by mrconter11, 4 months ago

2 comments

mrconter11, 4 months ago
Author here. I think our approach to AI benchmarks might be too human-centric. We keep creating harder and harder problems that humans can solve (like expert-level math in FrontierMath), using human intelligence as the gold standard.

But maybe we need simpler examples that demonstrate fundamentally different ways of processing information. The dice prediction isn't important - what matters is finding clean examples where all information is visible, but humans are cognitively limited in processing it, regardless of time or expertise.

It's about moving beyond human performance as our primary reference point for measuring AI capabilities.

super_normal, 4 months ago
how about a totallynotcardcountingbench lottery, as well?