> The most frequent failure mode among human participants is the inability to find a correct solution. Typically, human participants have a clear sense of whether they solved a problem correctly. In contrast, all evaluated LLMs consistently claimed to have solved the problems.

This is exactly the problem that needs to be solved. The yes-man nature of LLMs is the biggest inhibitor to progress, as a model that cannot self-evaluate well cannot learn.

If we solve this, though, combined with reasoning, I feel somewhat confident we will be able to achieve “AGI,” at least over text-accessible domains.