Hey HN, Greg from ARC Prize Foundation here.

Alongside Mike Knoop and François Chollet, we’re launching ARC-AGI-2, a frontier AI benchmark that measures a model’s ability to generalize on tasks it hasn’t seen before, and the ARC Prize 2025 competition to beat it.

In Dec ‘24, ARC-AGI-1 (2019) pinpointed the moment AI moved beyond pure memorization, as demonstrated by OpenAI's o3.

ARC-AGI-2 targets test-time reasoning.

My view is that good AI benchmarks don't just measure progress, they inspire it. Our mission is to guide research toward general systems.

Base LLMs (no reasoning) currently score 0% on ARC-AGI-2. Specialized AI reasoning systems (like R1 or o3-mini) score under 4%.

All ARC-AGI-2 tasks (100%), however, have been solved by at least two humans, quickly and easily. We know this because we tested 400 people live.

Our belief is that once we can no longer come up with quantifiable problems that are "feasible for humans and hard for AI," we effectively have AGI. ARC-AGI-2 proves that we do not have AGI.

Change log from ARC-AGI-1 to ARC-AGI-2:
* The two main evaluation sets (semi-private and private) have increased to 120 tasks
* Solving tasks requires more reasoning than pure intuition
* Each task has been confirmed solved by at least 2 people (often many more) out of an average of 7 test takers, in 2 attempts or fewer (see the scoring sketch after this list)
* Non-training task sets are now difficulty-calibrated
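For anyone who wants to poke at the data: tasks in the public ARC-AGI repos ship as JSON files with "train" demonstration pairs and "test" pairs, where each grid is a list of rows of integers 0-9. Here's a minimal, illustrative Python sketch of loading a task and scoring a solver under the exact-match, 2-attempt criterion above. The file path and the solve() stub are placeholders, not part of the official harness:

    import json

    # An ARC task is JSON: {"train": [{"input": grid, "output": grid}, ...],
    #                       "test":  [{"input": grid, "output": grid}, ...]}
    # where each grid is a list of rows of ints 0-9. Path is illustrative.
    with open("some_task.json") as f:
        task = json.load(f)

    def solve(train_pairs, test_input):
        # Placeholder: a real entry infers the transformation from
        # train_pairs and returns up to 2 candidate output grids.
        return [test_input, test_input]

    def score_task(task, max_attempts=2):
        # Exact-match scoring with a 2-attempt budget, mirroring the
        # "2 attempts or fewer" criterion in the change log above.
        for pair in task["test"]:
            attempts = solve(task["train"], pair["input"])[:max_attempts]
            if not any(a == pair["output"] for a in attempts):
                return 0  # one unsolved test grid fails the whole task
        return 1

    print(score_task(task))

Partial credit isn't a thing here: a task counts only if every test grid is reproduced exactly.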
The 2025 Prize ($1M, open-source required) is designed to drive progress on this specific gap. Last year's competition (also launched on HN) drew 1.5K participating teams and produced 40+ research papers.

The Kaggle competition goes live later this week and you can sign up here: https://arcprize.org/competition

We're in an idea-constrained environment. The next AGI breakthrough might come from you, not a giant lab.

Happy to answer questions.