Also, their metric table is very interesting. It shows Falcon 7B and OpenLlama 7B much less favorably than other evaluations do (including the HuggingFace leaderboard, which I am kinda suspicious of), and you don't see instruct benchmarks like that very often.