
Ask HN: Including irrelevant documents improves RAG accuracy by 30%?

3 points by sourabh03agr over 1 year ago
I was very intrigued after reading the paper "The Power of Noise: Redefining Retrieval for RAG Systems" (https://arxiv.org/pdf/2401.14887.pdf), where the authors explored the impact of relevance, position, and the number of retrieved documents on the accuracy of a RAG pipeline.

The authors defined 3 sets of documents:

1. Golden document - the one which contains the answer to the given question.

2. Relevant documents - a set of documents that talk about the same topic but don't contain the answer to the question.

3. Irrelevant documents - a set of documents that talk about different, unrelated topics and, naturally, don't contain the answer to the question.

Key takeaways:

1. More relevant documents = lower performance, as the LLM gets confused. This challenges the general notion of adding the top_k relevant documents to the context.

2. The placement of the golden document matters: start > end > middle.

3. Surprisingly, adding irrelevant documents actually improved the model's accuracy (as compared to the case where the context is just the golden document). It would be interesting to validate this further on more powerful LLMs and other datasets, as this observation is highly counter-intuitive.

What do you think: does the third observation make sense?
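The experimental setup above is easy to reproduce in outline. Below is a minimal, hypothetical sketch (not the paper's actual code) of a helper that assembles the context fed to an LLM while controlling the three variables the paper varies: the number of relevant distractors, the number of irrelevant documents, and where the golden document is placed. All function and parameter names here are illustrative assumptions.

```python
import random

def build_context(golden, relevant, irrelevant,
                  golden_position="start",
                  n_relevant=0, n_irrelevant=0, seed=0):
    """Assemble the ordered document list for one RAG trial.

    golden          - the document containing the answer
    relevant        - pool of same-topic documents without the answer
    irrelevant      - pool of unrelated documents
    golden_position - "start", "middle", or "end"
    """
    rng = random.Random(seed)  # fixed seed so trials are repeatable
    # Sample the distractors, then shuffle so relevant/irrelevant interleave.
    docs = (rng.sample(relevant, n_relevant)
            + rng.sample(irrelevant, n_irrelevant))
    rng.shuffle(docs)
    # Place the golden document at the position under test.
    if golden_position == "start":
        docs.insert(0, golden)
    elif golden_position == "end":
        docs.append(golden)
    else:  # "middle"
        docs.insert(len(docs) // 2, golden)
    return docs

def to_prompt(question, docs):
    # A simple numbered-context prompt format (illustrative only).
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

To compare conditions, one would hold the question fixed and sweep `golden_position` and `n_irrelevant`, scoring the model's answers for each configuration; takeaway 3 corresponds to `n_irrelevant > 0` outperforming a context built from the golden document alone.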

No comments yet