The most widely used benchmarks for evaluating LLMs
1 point | by kavaivaleri | about 1 year ago
Commonsense Reasoning
- HellaSwag
- Winogrande
- PIQA
- SIQA
- OpenBookQA
- ARC
- CommonsenseQA

Logical Reasoning
- MMLU
- BBH (BIG-Bench Hard)

Mathematical Reasoning
- GSM-8K
- MATH
- MGSM
- DROP

Code Generation
- HumanEval
- MBPP

World Knowledge & QA
- NaturalQuestions
- TriviaQA
- MMMU
- TruthfulQA

I collected their descriptions and links to their original papers here: https://www.turingpost.com/p/llm-benchmarks
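For anyone curious how the multiple-choice benchmarks in the list (HellaSwag, Winogrande, PIQA, etc.) are usually scored, here is a minimal sketch: rank each candidate ending by its log-likelihood under the model and pick the highest-scoring one. The dataset id and the ctx/endings/label field names are assumptions based on the public Hugging Face copy of HellaSwag, and gpt2 is only used as a stand-in model; this is an illustration, not the exact harness behind any published number.

    # Minimal multiple-choice scoring sketch (HellaSwag-style):
    # score each ending by its summed token log-probability given the context.
    import torch
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # any causal LM; a small one keeps the sketch runnable
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def ending_logprob(context: str, ending: str) -> float:
        """Sum of token log-probabilities of `ending` given `context`.
        The context/ending token boundary is approximate for BPE tokenizers."""
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        full_ids = tokenizer(context + " " + ending, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(full_ids).logits
        ending_len = full_ids.shape[1] - ctx_ids.shape[1]
        # Shift by one position: logits at position t predict token t+1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        token_lp = log_probs[torch.arange(targets.shape[0]), targets]
        return token_lp[-ending_len:].sum().item()

    # Dataset id assumed; the HF copy of HellaSwag exposes ctx, endings, label.
    dataset = load_dataset("hellaswag", split="validation[:50]")  # small slice
    correct = 0
    for ex in dataset:
        scores = [ending_logprob(ex["ctx"], e) for e in ex["endings"]]
        pred = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(pred == int(ex["label"]))
    print(f"accuracy on slice: {correct / len(dataset):.3f}")

Note that some evaluations length-normalize the score (divide by the number of ending tokens), which changes the reported accuracy, so papers usually state which scoring rule they use.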
1 comment
andy99 | about 1 year ago
I've never been able to click on a Turingpost link, they all give an SSL error...