Measuring AI Ability to Complete Long Tasks – METR

7 points by gk1 about 2 months ago

1 comment

nmca about 2 months ago

Very big if true!

I downloaded their data from the METR public evals GitHub repo and independently re-implemented the very basic maximum likelihood analysis, which gave very similar results to what they shared.
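The comment doesn't say what the "very basic maximum likelihood analysis" involves. Here is a rough sketch under one assumption: for each model, fit task success/failure to a logistic curve in log2 of the human completion time, then read off the task length where predicted success crosses 50% (a "50% time horizon"). The code uses a pure-Python Newton-Raphson fit. The function names (`fit_logistic_mle`, `time_horizon`) and the eight-task dataset are hypothetical and made up for illustration. They are not METR's code or data.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_mle(xs, ys, max_iter=100, tol=1e-10):
    """Hypothetical helper: fit P(success) = sigmoid(a + b*x) by maximum likelihood.

    xs: predictor values (assumed here: log2 of human task duration in minutes)
    ys: outcomes, 1 for success and 0 for failure
    Uses Newton-Raphson; diverges if the data are perfectly separable.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Gradient of the log-likelihood and the Fisher information matrix
        g_a = g_b = 0.0
        i_aa = i_ab = i_bb = 0.0
        for x, y in zip(xs, ys):
            p = _sigmoid(a + b * x)
            r = y - p
            g_a += r
            g_b += r * x
            w = p * (1.0 - p)
            i_aa += w
            i_ab += w * x
            i_bb += w * x * x
        # Newton step: solve the 2x2 system I * step = g
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a += step_a
        b += step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return a, b


def time_horizon(a, b, level=0.5):
    """Hypothetical helper: task duration (minutes) where predicted success equals `level`."""
    # Solve sigmoid(a + b*x) = level for x, then undo the log2 transform
    x = (math.log(level / (1.0 - level)) - a) / b
    return 2.0 ** x


# Toy data (illustrative only): four 4-minute tasks and four 16-minute tasks
durations = [4, 4, 4, 4, 16, 16, 16, 16]
successes = [1, 1, 1, 0, 1, 0, 0, 0]
xs = [math.log2(t) for t in durations]

a, b = fit_logistic_mle(xs, successes)
h50 = time_horizon(a, b)
print(f"50% horizon: {h50:.2f} minutes")  # prints "50% horizon: 8.00 minutes"
```

With only two task lengths the fit is exact. Predicted success is 0.75 at 4 minutes and 0.25 at 16 minutes, so the 50% crossing lands at 8 minutes. On real data where outcomes perfectly separate by task length, the unregularized MLE diverges, so a fuller re-implementation would likely need to handle that case.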