Measuring AI Ability to Complete Long Tasks – METR
7 points by gk1, about 2 months ago
1 comment
nmca
about 2 months ago
very big if true!

I downloaded their data from the METR public evals GitHub repo and independently re-implemented the very basic maximum likelihood analysis, which gave very similar results to what they shared.
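
For anyone curious what a re-implementation along these lines might look like, here is a minimal sketch. It assumes the analysis is a logistic fit of task success against log2 of task length, with the "50% time horizon" read off where the fitted curve crosses 0.5. The data is made up for illustration and is not METR's, and the names `fit_logistic`, `minutes`, and `success` are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=50):
    """Maximum-likelihood fit of P(y=1) = sigmoid(a + b*x) via Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # g* is the gradient of the log-likelihood; h* is the negated Hessian.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # Degenerate curvature (e.g. perfectly separable data).
        # Newton step: solve the 2x2 system H * delta = g by hand.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Illustrative data only (NOT METR's): task length in minutes, 1 = success.
minutes = [1, 1, 2, 2, 4, 4, 8, 8, 16, 16, 32, 32, 64, 64, 128, 128]
success = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

xs = [math.log2(m) for m in minutes]
a, b = fit_logistic(xs, success)

# The fitted curve crosses 0.5 where a + b*x = 0, i.e. x = -a/b in log2 minutes.
horizon = 2.0 ** (-a / b)
print(f"slope={b:.3f}, 50% time horizon ~ {horizon:.1f} minutes")
```

The hand-rolled Newton solver is only there to keep the sketch dependency-free; with real data an off-the-shelf logistic regression would be the more sensible choice.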