Measuring AI Ability to Complete Long Tasks – METR

7 points by gk1 about 2 months ago

1 comment

nmca about 2 months ago
very big if true!

I downloaded their data from the METR public evals GitHub repo and independently re-implemented the very basic maximum likelihood analysis, which gave very similar results to what they shared.