
Re-Bench: Evaluating ML agents against human ML experts

2 points by marojejian 5 months ago

1 comment

marojejian 5 months ago
I'm less focused on the particular results here, and more that this is where we're at: measuring ML at ML. Imagine a future where we can't construct a benchmark that demonstrates humans outperform machines *at* machine learning. Sure, that doesn't mean they are actually better in the key respects, especially at creativity and induction. But that's still a hell of a stage to be at.

> consists of 7 challenging, open-ended ML research engineering environments and data from 71 8-hour attempts by 61 distinct human experts.

> the best AI agents achieve a score 4x higher than human experts when both are given a total time budget of 2 hours per environment.

> However, humans currently display better returns to increasing time budgets