I wonder how long it will be before we start to acknowledge that AI labs are heavily gaming benchmarks, and that those benchmarks are mostly useless as a way of judging model performance.<p>The latest lab to be caught was Meta, but they've all been doing it for a while now.