Measuring AI Ability to Complete Long Tasks – METR
7 points by gk1 about 2 months ago | 1 comment
nmca, about 2 months ago
Very big if true!

I downloaded their data from METR's public evals GitHub repo and independently re-implemented the very basic maximum likelihood analysis. It gave results very similar to what they shared.
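For anyone wondering what a "very basic maximum likelihood analysis" looks like here: as I understand the METR write-up, each model's success or failure on a task is regressed on log2 of the task's human completion time with a logistic curve, and the "50% time horizon" is the task length at which the fitted success probability crosses 0.5. Below is a minimal sketch of that kind of fit using Newton-Raphson, assuming per-task lengths in minutes and binary outcomes for a single model. The data is made up purely for illustration, and the function names (`fit_logistic`, `horizon_50`) are hypothetical, not taken from METR's repo.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(lengths_min, successes, iters=50):
    """Maximum likelihood fit of P(success) = sigmoid(a + b * log2(length))
    via Newton-Raphson. Assumes the outcomes are not perfectly separable."""
    xs = [math.log2(t) for t in lengths_min]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # X^T W X (negated Hessian)
        for x, y in zip(xs, successes):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: theta += (X^T W X)^{-1} X^T (y - p), 2x2 inverse by hand.
        da = (h11 * g0 - h01 * g1) / det
        db = (-h01 * g0 + h00 * g1) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def horizon_50(a, b):
    # Predicted success is 0.5 where a + b * log2(t) = 0.
    return 2.0 ** (-a / b)

# Hypothetical illustrative data: task lengths (minutes) and pass/fail for one model.
LENGTHS = [1, 2, 4, 8, 15, 30, 60, 120, 240, 480]
SUCCESSES = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]

a, b = fit_logistic(LENGTHS, SUCCESSES)
h = horizon_50(a, b)
print(f"a={a:.3f}, b={b:.3f}, 50% horizon ~ {h:.1f} min")
```

The slope `b` comes out negative (longer tasks are less likely to be completed). Repeating the fit per model and then regressing log horizon on release date is how one would get the doubling-time trend.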