We measured PaperQA2 (<a href="https://github.com/Future-House/paper-qa">https://github.com/Future-House/paper-qa</a>) against the science portion of the RAG-Arena benchmark (<a href="https://arxiv.org/abs/2407.13998" rel="nofollow">https://arxiv.org/abs/2407.13998</a>). This is the first time we've compared PaperQA2 against systems built on Cohere or Contextual.ai. On the same dataset (1,404 questions over 1.7M documents), PaperQA2 scores 12.4% higher than Contextual.ai.<p>We're thrilled about this because PaperQA2 is open source and getting better every day. Check out the code to reproduce this result in our cookbook: <a href="https://futurehouse.gitbook.io/futurehouse-cookbook/paperqa/docs/tutorials/querying_with_clinical_trials" rel="nofollow">https://futurehouse.gitbook.io/futurehouse-cookbook/paperqa/...</a>.
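<p>If you want to kick the tires before working through the cookbook, here's a minimal sketch of querying PaperQA2 via the paper-qa Python package's high-level ask() entry point. The question text and paper directory below are illustrative placeholders, not the benchmark setup; the cookbook has the actual reproduction steps.<p><pre><code>
# pip install paper-qa
from paperqa import Settings, ask

# Ask a question over a local directory of papers.
# "my_papers" and the question are placeholders for illustration only.
answer = ask(
    "How do bispecific antibodies differ from monoclonal antibodies?",
    settings=Settings(paper_directory="my_papers"),
)
print(answer)  # the response includes the answer text with citations to source papers
</code></pre>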