
Frames: Factuality, Retrieval, and Reasoning MEasurement Set

3 points | by adg29 | 7 months ago

1 comment

adg29 | 7 months ago
Evaluation dataset designed to test the capabilities of Retrieval-Augmented Generation (RAG) systems. Paper with details and experiments is available on arXiv: https://arxiv.org/abs/2409.12941

Dataset Overview
- 824 challenging multi-hop questions requiring information from 2-15 Wikipedia articles
- Questions span diverse topics including history, sports, science, animals, health, etc.
- Each question is labeled with reasoning types: numerical, tabular, multiple constraints, temporal, and post-processing
- Gold answers and relevant Wikipedia articles provided for each question

Key Features
- Tests end-to-end RAG capabilities in a unified framework
- Requires integration of information from multiple sources
- Incorporates complex reasoning and temporal disambiguation
- Designed to be challenging for state-of-the-art language models

Usage
This dataset can be used to (see the sketch after this list):
- Evaluate RAG system performance
- Benchmark language model factuality and reasoning
- Develop and test multi-hop retrieval strategies
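
A minimal sketch of an evaluation loop over this dataset, assuming it is published on the Hugging Face Hub as `google/frames-benchmark` with a `test` split and columns named `Prompt`, `Answer`, and `wiki_links`; the repository id, split, and column names are assumptions not stated in the post, so verify them against the dataset card before use. The scoring here is a crude exact-match stand-in, not the paper's evaluation protocol.

```python
# Sketch of scoring a RAG system on the FRAMES benchmark.
# ASSUMPTIONS (not from the post): Hub repo "google/frames-benchmark",
# split "test", columns "Prompt", "Answer", "wiki_links".
from datasets import load_dataset


def naive_exact_match(prediction: str, gold: str) -> bool:
    """Crude scoring: does the gold answer string appear in the prediction?"""
    return gold.strip().lower() in prediction.strip().lower()


def evaluate_rag_system(answer_fn, split: str = "test", limit: int | None = None) -> float:
    """Score a RAG system (answer_fn: question -> answer string) over the dataset."""
    ds = load_dataset("google/frames-benchmark", split=split)  # assumed repo id / split
    if limit is not None:
        ds = ds.select(range(limit))

    correct = 0
    for row in ds:
        question = row["Prompt"]      # assumed column name
        gold_answer = row["Answer"]   # assumed column name
        # row["wiki_links"] lists the 2-15 Wikipedia articles a retriever
        # would need to fetch to answer the multi-hop question.
        prediction = answer_fn(question)
        correct += naive_exact_match(prediction, gold_answer)

    return correct / len(ds)


if __name__ == "__main__":
    # Placeholder "RAG system" that answers nothing useful, just to exercise the loop.
    accuracy = evaluate_rag_system(lambda q: "I don't know", limit=10)
    print(f"Exact-match accuracy on 10 questions: {accuracy:.2%}")
```

In practice you would replace `answer_fn` with a retrieve-then-generate pipeline and swap the exact-match check for a more forgiving judge, since multi-hop answers often differ in surface form from the gold string.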