It seems a stretch to include some of the evaluation criteria under the heading of "transparency", in particular the risks and mitigations ones, as they amount more to an assessment of compliance with a particular political stance that is far from universal (the view that certain capabilities in foundation models present a "risk" that must be "mitigated"). Indeed, those two categories end up carrying GPT-4, which is notoriously closed but run by a company that is arguably the champion of this stance, to an inappropriately high position in the ranking.

If this index is adopted as a de facto standard or target, I would be concerned about the incentives it creates.
Excuse me if this is inaccurate, but my impression is that not a single listed model is reproducible from its data and code; reproducibility doesn't even seem to be considered by the authors.

Until that's the case, the idea of transparency itself is laughable.

Every single one of these models may be designed to behave in ways beneficial to its creators. Who knows what conceptual biases Facebook is trying to inject into the world with Llama2? It would be a brilliant way to advertise, and no one would ever be able to tell.