PaperQA2 tops the RAG-QA Arena science benchmark

1 point by mskar 3 months ago

1 comment

mskar 3 months ago
We measured PaperQA2 (https://github.com/Future-House/paper-qa) against the science portion of the RAG-QA Arena benchmark (https://arxiv.org/abs/2407.13998). This is the first time we've compared PaperQA2 against other systems built on Cohere or Contextual.ai. PaperQA2 achieves a 12.4% higher score than Contextual.ai on the same dataset (1,404 questions and 1.7M documents).

We're thrilled about this because PaperQA2 is open source and getting better every day. Check out the code to reproduce this result in our cookbook: https://futurehouse.gitbook.io/futurehouse-cookbook/paperqa/docs/tutorials/querying_with_clinical_trials