When I first heard about RAG (Retrieval-Augmented Generation) I thought it was some sophisticated architectural change to LLMs that lets you inject more information into the model's layers and attention heads. Shortly after, I learned that it's actually laughably simple. I'm surprised it has its own separate name!

It boils down to 5 steps:
1. Create a representation of all the possible information (text) you'd like to be considered for your question. [info-representation]

2. Create a representation of the question being asked. [question-representation]

3. Find the top N info-representations most similar to your question-representation.

4. Feed all of the information (text) from the top N representations into your LLM of choice (e.g. OpenAI GPT-4o) along with the question.

5. And voilà! Your model will give you an answer given the context you've added.

It could almost be called "Expand your LLM prompt with more context".
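To make the "laughably simple" part concrete, here's a rough sketch of those five steps using OpenAI's Python client and numpy for the similarity step. The documents, model names, and top-N cutoff are placeholders, and a real system would use a vector database for step 3 instead of brute-force cosine similarity:

```python
# Minimal RAG sketch: embed documents, embed the question, retrieve, then prompt.
# Assumes OPENAI_API_KEY is set; document list and models are illustrative only.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday through Friday, 9am to 5pm EST.",
    "Premium plans include priority support and a dedicated account manager.",
]

def embed(texts):
    # Steps 1 and 2: turn text into vector representations.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vecs = embed(documents)

question = "How long do I have to return a product?"
q_vec = embed([question])[0]

# Step 3: cosine similarity against every document, keep the top N.
scores = doc_vecs @ q_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
top_n = np.argsort(scores)[::-1][:2]
context = "\n".join(documents[i] for i in top_n)

# Steps 4 and 5: stuff the retrieved text into the prompt and ask the model.
answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```

That really is the whole trick: retrieval picks the context, and the "generation" is just a normal prompt with that context pasted in.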
You're glossing over a lot of details here, which is where most of the pain is.

Properly chunking the data, handling non-standard text formatting in source documents, source documents that haven't even been OCR'd, maintaining disparate indexes per client, minimizing hallucinations even when the right context is in the prompt, and more.
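Even the chunking step alone hides real decisions. A naive sketch (fixed character windows with overlap; the sizes here are arbitrary):

```python
# Naive fixed-size chunker with overlap. Splits blindly on character count,
# so sentences, tables, and code blocks get cut in half.
def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Something like this "works", but the mid-sentence and mid-table splits are exactly where retrieval quality starts to fall apart, which is why chunking strategy ends up being a project of its own.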