The company I work for has tons of documentation and regulations for several areas. In some areas the documents number well over a thousand, and to make them easier to use we build RAG-based chat bots. This is why I have been playing with RAG systems across the whole spectrum from "build completely from scratch" to "connect the services in Azure". The retrieval part of a RAG system is vital for good/reliable answers, and if you build it naively, the results are underwhelming.

You can improve on the retrieved documents in many ways, for example:
- better chunking,

- better embedding,

- embedding several rephrased versions of the query,

- embedding a hypothetical answer to the prompt,

- hybrid retrieval (vector similarity + keyword/TF-IDF/BM25 search; sketched at the end of this comment),

- making heavy use of metadata,

- introducing additional (or hierarchical) summaries of the documents,

- returning not only the matching chunks but also adjacent text,

- re-ranking the candidate documents,

- fine-tuning the LLM, and much, much more.

However, at the end of the day a RAG system usually still has a hard time answering questions that require an overview of your data. Example questions are:

- "What are the key differences between the new and the old version of document X?"

- "Which documents can I ask you questions about?"

- "How do the regulations differ between case A and case B?"

In these cases it really helps to let LLMs decide how to process the prompt. This can be something simple like query routing (also sketched at the end), or rephrasing/enhancing the original prompt until something useful comes up. But it can also be agents that come up with sub-queries and a plan for how to combine the partial answers. You can also build a network of agents with different roles (like coordinator/planner, reviewer, retriever, ...) to come up with an answer.
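To make the hybrid retrieval point concrete, here is a minimal sketch (not our production setup) that fuses dense vector similarity with BM25 keyword scores. It assumes sentence-transformers and rank_bm25 are installed; the model name, example documents and the alpha weighting are just illustrative.

    # Hybrid retrieval sketch: dense embeddings + BM25, fused with a weighted sum.
    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    docs = [
        "Regulation X, version 2: updated retention periods for records.",
        "Regulation X, version 1: original retention periods for records.",
        "Travel policy: reimbursement rules for international trips.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = model.encode(docs, normalize_embeddings=True)
    bm25 = BM25Okapi([d.lower().split() for d in docs])

    def hybrid_search(query, alpha=0.5, k=2):
        # Dense scores: cosine similarity (vectors are normalized, so a dot product).
        q_vec = model.encode([query], normalize_embeddings=True)[0]
        dense = doc_vecs @ q_vec
        # Sparse scores: BM25 over whitespace tokens, rescaled to [0, 1].
        sparse = np.array(bm25.get_scores(query.lower().split()))
        if sparse.max() > 0:
            sparse = sparse / sparse.max()
        # Weighted fusion; alpha balances semantic vs. keyword evidence.
        scores = alpha * dense + (1 - alpha) * sparse
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    print(hybrid_search("retention periods in regulation X"))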
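And a minimal query-routing sketch: an LLM classifies the prompt before retrieval, so overview-style questions ("which documents do you cover?", "compare version 1 and 2") can be sent to a summary-based path instead of plain chunk retrieval. The model name, the two route labels and the prompt are assumptions for illustration.

    # Query routing sketch: let an LLM pick the retrieval path.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    ROUTER_PROMPT = (
        "Classify the user question as exactly one label:\n"
        "CHUNKS   - answerable from a few retrieved passages\n"
        "OVERVIEW - needs document-level summaries, comparisons, or a catalog of documents\n"
        "Reply with the label only.\n\n"
        "Question: {question}"
    )

    def route(question: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": ROUTER_PROMPT.format(question=question)}],
            temperature=0,
        )
        label = resp.choices[0].message.content.strip().upper()
        # Fall back to the cheap chunk path if the model answers something unexpected.
        return label if label in {"CHUNKS", "OVERVIEW"} else "CHUNKS"

    print(route("What does regulation X say about retention periods?"))  # likely CHUNKS
    print(route("Which documents can I ask you questions about?"))       # likely OVERVIEW

The same idea extends to the agent setups mentioned above: instead of returning a single label, the router can emit sub-queries and a plan for combining the partial answers.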