Those are really sweeping conclusions, considering the experiment is just a single run of a single prompt! FWIW, Claude Opus got this for me on the first try:<p>"In the Gilligan's Island episode "Seer Gilligan" (season 3, episode 8), Gilligan gains the ability to read minds after being hit on the head with a coconut. At first, the castaways are excited about Gilligan's new power and try to use it to their advantage. However, his mind-reading abilities soon cause chaos and misunderstandings among the group. In the end, Gilligan gets hit on the head again and loses his mind-reading powers, much to everyone's relief."<p>(The season and episode numbers are wrong, but the title is right, which suggests the failure is just insufficient memorization rather than some deep statement about reasoning. The episode only has ~4,000 Google hits, so it's not super widely known.)<p>More rigorously, Claude Opus gets about 60% on GPQA, where highly skilled non-expert humans only get 34%, even with more than half an hour per question and full Internet access. It seems implausible that you could do that without some sort of reasoning:<p><a href="https://arxiv.org/pdf/2311.12022.pdf" rel="nofollow">https://arxiv.org/pdf/2311.12022.pdf</a>