I was very intrigued after reading the paper "The Power of Noise: Redefining Retrieval for RAG Systems" (https://arxiv.org/pdf/2401.14887.pdf), where the authors explored the impact of the relevance, position, and number of retrieved documents on the accuracy of a RAG pipeline.

The authors defined three sets of documents:
1. Golden document - One which contains the answer to the given question.
2. Relevant documents - A set of documents that talk about the same topic but don't contain the answer to the question.
3. Irrelevant documents - A set of documents that talk about different, unrelated topics and, naturally, don't contain the answer to the question.

Key takeaways:

1. More relevant documents = lower performance, as the LLM gets confused. This challenges the common practice of adding the top_k most relevant documents to the context.

2. The placement of the golden document matters: start > end > middle.

3. Surprisingly, adding irrelevant documents actually improved the model's accuracy (compared to the case where the context is just the golden document). It would be interesting to validate this further on more powerful LLMs and other datasets, as the observation is highly counterintuitive.

What do you think? Does the third observation make sense?
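
For anyone who wants to poke at this themselves, here's a minimal sketch of the context-assembly step using the three document sets above. This is just how I'd reproduce the setup; the function and parameter names are my own, not from the paper:

    import random

    def build_context(golden_doc, relevant_docs, irrelevant_docs,
                      golden_position="start"):
        """Assemble the retrieved-context block for a RAG prompt.

        golden_position: "start", "middle", or "end" - where the
        answer-bearing document sits among the distractors.
        (Hypothetical interface; not the paper's actual code.)
        """
        # Mix the relevant and irrelevant distractors together.
        distractors = relevant_docs + irrelevant_docs
        random.shuffle(distractors)

        if golden_position == "start":
            docs = [golden_doc] + distractors
        elif golden_position == "end":
            docs = distractors + [golden_doc]
        else:  # "middle"
            mid = len(distractors) // 2
            docs = distractors[:mid] + [golden_doc] + distractors[mid:]

        # One numbered passage per document, joined into a single string
        # that gets prepended to the question in the prompt.
        return "\n\n".join(f"Document [{i+1}]: {d}"
                           for i, d in enumerate(docs))

Varying golden_position and the sizes of the two distractor lists would let you reproduce takeaways 1-3 on whatever model and QA dataset you have handy.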