From a consumer perspective, this is a super interesting paper because it touches on one of the fundamental issues with most RAG beyond the toy case: you need to do different things depending on what the user is asking for. You also (usually) can't just ask, because most users don't know that LLMs are bad at math, or that semantic search won't be sufficient for questions involving enumeration or totality. And while you can always add more steps to your RAG pipeline, some of those steps may be computationally expensive or not particularly relevant to the question at hand.

That being said, it is a bit frustrating that so much RAG research focuses on multi-hop approaches with LLMs. IME, multiple round trips to an LLM are essentially a non-starter for any serious consumer product because they're far too slow, and smaller models often struggle to follow instructions well enough to be an adequate replacement even for simpler tasks. Curious to hear whether other folks working in this space have had any success thinking critically about these types of problems!
This seems similar to building a RAG router (1) to perform dynamic retrieval/querying over data.

After getting hundreds of questions on my Interactive Resume AI chatbot (2), I've found that user queries can be categorized as: greeting, professional skills question, professional experience question, personal/hobby question, and common interview question.

I am now building a RAG router to improve the quality of the Q&A responses (a minimal sketch of the idea follows the links below). At the moment I use GPT-3.5 Turbo without any special RAG techniques, and the quality is lacking when doing Q&A over my resume and Q&A CSV file. GPT-4 works well but is too expensive.

1. https://docs.llamaindex.ai/en/stable/examples/low_level/router/
2. https://jon-olson.com/resume_ai
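For the curious, here's roughly what I mean by a router: a cheap classification call picks one of the categories above, and only then do you pay for retrieval and the expensive model. This is just a sketch assuming the OpenAI Python SDK; `retrieve_for_category` is a hypothetical placeholder for whatever retrieval each category needs (vector search over the resume, a lookup in the Q&A CSV, or nothing at all for greetings).

```python
# Minimal RAG-router sketch, assuming the OpenAI Python SDK (pip install openai).
# retrieve_for_category is a hypothetical stand-in, not a real library call.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CATEGORIES = [
    "greeting",
    "professional skills question",
    "professional experience question",
    "personal/hobby question",
    "common interview question",
]

def classify(query: str) -> str:
    """Route with the cheap model; only the final answer needs the good one."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Classify the user's message into exactly one of: "
                + "; ".join(CATEGORIES)
                + ". Reply with the category name only.",
            },
            {"role": "user", "content": query},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    # Fall back to the most common category if the model goes off-script.
    return label if label in CATEGORIES else "professional experience question"

def retrieve_for_category(category: str, query: str) -> str:
    """Hypothetical: plug in vector search over the resume, a Q&A CSV lookup, etc."""
    return ""

def answer(query: str) -> str:
    category = classify(query)
    if category == "greeting":
        # Small talk needs neither retrieval nor the expensive model.
        return "Hi! Ask me anything about my experience, skills, or hobbies."
    context = retrieve_for_category(category, query)
    resp = client.chat.completions.create(
        model="gpt-4",  # reserve the expensive model for the actual answer
        messages=[
            {
                "role": "system",
                "content": f"Answer using only this context:\n{context}",
            },
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content
```

The appeal is that the GPT-3.5 call only has to pick one label, which it's much better at than open-ended Q&A, so GPT-4 (and any expensive retrieval step) only runs when a question actually warrants it.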