
Re-Bench: Evaluating ML agents against human ML experts

2 points by marojejian 5 months ago

1 comment

marojejian 5 months ago
I'm less focused on the particular results here, and more that this is where we're at: measuring ML at ML. Imagine a future where we can't construct a benchmark that demonstrates humans outperform machines *at* machine learning. Sure, that doesn't mean they are actually better in the key respects, especially at creativity and induction. But that's still a hell of a stage to be at.

> consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts.

> the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment.

> However, humans currently display better returns to increasing time budgets