I've built an advanced RAG (Retrieval-Augmented Generation) pipeline from scratch to demystify the complex mechanics of modern LLM-powered question-answering systems. This repository features:

- An implementation of a sub-question query engine from scratch to answer complex user questions.

- Illustrative explanations that unveil the inner workings of the system.

- An analysis of the challenges I faced while working with the system, like prompt engineering and cost estimation.

- A qualitative comparison with similar frameworks like LlamaIndex, offering a broader perspective.

Key takeaway: while modern QA pipelines with advanced RAG abstractions may seem complex, they are fundamentally powered by a series of LLM calls with meticulous prompt design. I hope this repository provides intuitive insights for building more robust and efficient RAG systems. All feedback is warmly welcomed!
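To make that takeaway concrete, here is a minimal sketch of the sub-question flow: just a handful of LLM calls stitched together. The `retrieve` callable, the prompts, and the model name are illustrative placeholders (shown with an OpenAI-style client), not the exact code in the repo.

    from openai import OpenAI

    client = OpenAI()

    def llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def answer(question: str, retrieve) -> str:
        # 1) Decompose the complex question into simpler sub-questions.
        subs = llm(
            "Break this question into independent sub-questions, one per line:\n"
            + question
        ).splitlines()

        # 2) Answer each sub-question over its own retrieved context.
        partials = []
        for sub in (s.strip() for s in subs):
            if not sub:
                continue
            context = "\n".join(retrieve(sub))  # your vector-store lookup goes here
            sub_answer = llm(f"Context:\n{context}\n\nQuestion: {sub}\nAnswer concisely.")
            partials.append(f"Q: {sub}\nA: {sub_answer}")

        # 3) Synthesize a final answer from the partial answers.
        joined = "\n\n".join(partials)
        return llm(
            f"Using these intermediate answers:\n{joined}\n\n"
            f"Answer the original question: {question}"
        )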
This is a great README! It clearly breaks down some approaches to RAG. I also appreciate how you strive to demystify what's going on under the hood, which is in many ways VERY simple.

This seems very similar to LangSmith's trace monitoring, which I have been leaning on heavily for observability. You also mention LlamaIndex: how do you see your project fitting into the ecosystem?

I don't think I would be able to use this yet because it is serial. Is it possible to issue independent sub-question queries non-serially? (Something like the asyncio sketch at the end of this comment is roughly what I'd want.)

In my experimental agent system, waggledance.ai [1], I have been working on a pre-agent step of picking and synthesizing the right context and tools [2] for a given subtask of a larger goal, and it seems to be boosting results. It looks like now I have to try sub-question answering in the mix as well.

[1] demo - https://waggledance.ai

[2] relevant code sample - https://github.com/agi-merge/waggle-dance/blob/1b14163c24fd2c8f9689921ed2ef1e9451b00876/packages/agent/src/strategy/callExecutionAgent.ts#L175
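The kind of non-serial fan-out I mean, purely as a hypothetical sketch with the async OpenAI client (not your project's API):

    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI()

    async def answer_sub_question(sub_q: str) -> str:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content": sub_q}],
        )
        return resp.choices[0].message.content

    async def answer_all(sub_questions: list[str]) -> list[str]:
        # gather() fires the independent calls concurrently, so total latency
        # is roughly the slowest single call rather than the sum of all calls.
        return await asyncio.gather(*(answer_sub_question(q) for q in sub_questions))

    # answers = asyncio.run(answer_all(["sub-q 1", "sub-q 2", "sub-q 3"]))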
As a researcher I've been interested in developing a RAG pipeline populated with research articles on my topic of study. Does it fit easily into the RAG approach to also return excerpts from the actual documents, so I can verify, at a glance, the source and veracity of the LLM's outputs?
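Roughly what I'm hoping for, with hypothetical names (not anyone's actual API): keep the retrieved passages and their source metadata, and hand them back alongside the generated answer.

    from dataclasses import dataclass

    @dataclass
    class Excerpt:
        text: str     # the retrieved passage, verbatim
        source: str   # e.g. a citation string or DOI
        score: float  # retrieval similarity score

    def answer_with_sources(question: str, retriever, llm) -> dict:
        excerpts: list[Excerpt] = retriever(question)  # top-k passages
        context = "\n\n".join(e.text for e in excerpts)
        answer = llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
        # Return the verbatim excerpts so each claim can be checked against the paper.
        return {
            "answer": answer,
            "excerpts": [(e.source, e.score, e.text) for e in excerpts],
        }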
I love this write-up. Thank you! I'm looking for more resources like this: clear examples of composing LLMs into useful systems. Some of the cookbook examples in LangChain, Chainlit, etc. have been useful too.
I'm really interested in content explaining how to navigate graphs of embedded items for Q/A. Any resources on how to do this, or arguments for why it's a bad approach?

For example, if my top-K docs aren't answering the question but each are linked to neighbors, I'd want to know some folk wisdom or tricks for structuring the neighbor graph to cheaply expand the set of useful results.
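The shape I'm imagining, as a rough sketch with all names hypothetical: take the top-K hits, pull in their linked neighbors from an adjacency map, and re-rank the expanded pool before handing it to the LLM.

    import numpy as np

    def expand_via_neighbors(query_vec, doc_vecs, neighbors, k=5, hops=1):
        # Initial top-k by cosine similarity.
        sims = doc_vecs @ query_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
        )
        pool = set(np.argsort(-sims)[:k])

        # Expand the pool with graph neighbors for a fixed number of hops.
        for _ in range(hops):
            pool |= {n for doc_id in list(pool) for n in neighbors.get(doc_id, [])}

        # Re-rank the expanded pool against the query; cheap, since the pool is small.
        return sorted(pool, key=lambda i: -sims[i])[: k * 2]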
Pretty cool tutorial. As a side note, it is pretty hard to evaluate these pipelines for quality once you build them, since there aren't many standard practices yet given how new this all is. If it's helpful to anyone else, we built a free open-source tool within my company that is basically a collection of premade metrics for determining the quality of these pipelines: https://github.com/TonicAI/tvalmetrics
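To give a flavor of the kind of metric I mean (this is a generic illustration, not the actual tvalmetrics API): an LLM-as-judge score for how well the answer is supported by the retrieved context.

    from openai import OpenAI

    client = OpenAI()

    def answer_consistency(question: str, answer: str, context: str) -> float:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{
                "role": "user",
                "content": (
                    f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}\n\n"
                    "On a scale of 0 to 10, how well is the answer supported by the "
                    "context? Reply with just the number."
                ),
            }],
        )
        # Normalize the judge's 0-10 rating to [0, 1].
        return float(resp.choices[0].message.content.strip()) / 10.0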