A summary of how it was created, along with a leaderboard screenshot (from one of the authors, I presume):<p><a href="https://twitter.com/WenhuChen/status/1790597967319007564" rel="nofollow">https://twitter.com/WenhuChen/status/1790597967319007564</a><p>Part of the tweet: "We found that GPT-4o (71%) actually improves GPT-4-turbo (62%) by 9%! On the original MMLU, the improvement is only around 2%."