The retrieval part of RAG is usually vector search, but it doesn't have to be. Or at least not exclusively.<p>I've worked with various search backends for about 20 years. People treat vector search like magic pixie dust, but the reality is that it's not that great unless you heavily tune your models to your use cases. A well-tuned, manually crafted query goes a long way.<p>In pretty much any system I've built over the last few years, the best way to think about search has been as building a search context that includes anything relevant to answering the user's question. The user's direct input is only a small part of that. On mobile, the user-entered query is typically a very minor part of it: people type two or three letters and then expect magic to happen. Vector search is completely useless in situations like that. Why does search on mobile work anyway? Because of everything else we know to create a query (user location, time zone, locale, past searches, preferences, etc.).<p>RAG isn't any different. It's just search where the search results are post-processed by an LLM along with whatever the user typed. The better the query and retrieval, the better the result. The LLM can't rescue a poorly tuned search, but it can dig through a massive set of search results and extract the key points.
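To make that concrete, here is a minimal sketch of what "building a search context" can look like (the field names and the fold-into-a-query step are illustrative assumptions, not any particular engine's API):

    from dataclasses import dataclass, field

    @dataclass
    class SearchContext:
        # The typed query is only one signal among many.
        typed_query: str
        location: str | None = None       # e.g. reverse-geocoded city
        time_zone: str | None = None
        locale: str | None = None
        recent_queries: list[str] = field(default_factory=list)
        preferences: dict[str, str] = field(default_factory=dict)

        def to_query(self) -> dict:
            # Fold every signal into one retrieval request; a real
            # engine would weight these, here we just collect them.
            return {
                "text": self.typed_query,
                "boosts": {"near": self.location, "lang": self.locale},
                "expansions": self.recent_queries[-3:],  # recency bias
            }

    ctx = SearchContext("piz", location="Naples", locale="it-IT",
                        recent_queries=["margherita", "wood-fired ovens"])
    print(ctx.to_query())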
This has been my thinking as well: the natural language interface is <i>amazing</i> and something we've been wanting for some time.<p>The generation is a showy gimmick.<p>So why aren't we separating the useful bit out? My sneaking suspicion is that we can't. It's a package deal in that there aren't two parts; it's just one big soup of free-associating some text with other text, the stochastic parrot.<p>LLMs do not understand. They generate associated text that looks like an answer to the question. In order to separate the two parts, we'd need LLMs that understand.<p>That, apparently, is a lot harder.
> <i>There is a lot of excitement around retrieval augmented generation or “RAG.” Roughly the idea is: some of the deficiencies in current generative AI or large language models (LLMs) can be papered over by augmenting their hallucinations with links, references, and extracts from definitive source documents. I.e.: knocking the LLM back into the lane.</i><p>This seems like a misunderstanding of what RAG is. RAG is not used to anchor a general LLM to reality by somehow making it come up with sources and links. RAG is a technology to augment search engines with vector search and, yes, a natural language interface. This typically concerns "small" search engines indexing a specific corpus. It lets them retrieve documents or document fragments that do not contain the terms in the search query, but that are conceptually similar (according to the encoder used).<p>RAG isn't a cure for ChatGPT's hallucinations, at all. It's a tool to improve on and go beyond inverted indexes.
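A toy illustration of that "conceptually similar" retrieval, using the sentence-transformers library (the model choice and corpus here are my assumptions; any encoder works the same way):

    # pip install sentence-transformers
    import numpy as np
    from sentence_transformers import SentenceTransformer

    docs = [
        "Sartu di riso: a Neapolitan baked rice timbale.",
        "Arancini: fried rice balls stuffed with ragu.",
        "Caponata: a Sicilian sweet-and-sour eggplant dish.",
    ]
    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = model.encode(docs, normalize_embeddings=True)

    # The query shares almost no terms with the best match;
    # an inverted index would miss it, the encoder does not.
    q_vec = model.encode(["rice baked in an egg mixture"],
                         normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec  # cosine similarity (vectors are unit length)
    print(docs[int(np.argmax(scores))])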
I don't think the author completely understands RAG, and this article is a bit disconnected and unclear to me. Google already provides a "flexible natural language query interface".<p>Ironically, I think it would be fairly trivial to build what he wants _using_ RAG, as sketched below:<p>1. Accept a natural language query, like ChatGPT et al. already do.<p>2. Ask an LLM to rephrase it in N different ways, optimized for Google searches.<p>3. Scrape the top M pages for each of the outputs of step 2, in parallel. You now have dozens of search results.<p>4. Clean and vectorize all of these.<p>5. Use either vector similarity or an LLM to return the best-matching snippets for the original query from step 1, constrained to the material gathered in step 4.<p>It would take a little longer than a ChatGPT or Google response, but I can see the appeal.
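Here is the pipeline as a sketch. search_google, scrape, embed, and llm are hypothetical stand-ins I stubbed out so the shape of steps 2-5 is runnable; swap in a real search API, scraper, encoder, and LLM client:

    import numpy as np

    FAKE_WEB = {"https://example.com/sartu":
                "Sartu di riso is a baked rice timbale.\n\nIt uses eggs and ragu."}

    def search_google(q: str) -> list[str]: return list(FAKE_WEB)
    def scrape(url: str) -> str: return FAKE_WEB[url]
    def llm(prompt: str) -> str: return "sicilian baked rice dish\nrice egg bake"

    def embed(texts: list[str]) -> np.ndarray:
        # Toy letter-frequency "embedding"; a real encoder goes here.
        v = np.array([[t.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]
                      for t in texts], dtype=float)
        return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-9)

    def answer_snippets(question: str, n: int = 3, m: int = 5) -> list[str]:
        # Step 2: LLM rephrases the question into search-friendly queries.
        queries = llm(f"Rewrite as {n} Google queries:\n{question}").splitlines()[:n]
        # Step 3: scrape the top m pages per query.
        pages = [scrape(u) for q in queries for u in search_google(q)[:m]]
        # Step 4: clean and vectorize (naive paragraph splitting).
        snippets = [p.strip() for pg in pages for p in pg.split("\n\n") if p.strip()]
        # Step 5: rank snippets against the *original* question.
        scores = embed(snippets) @ embed([question])[0]
        return [snippets[i] for i in np.argsort(scores)[::-1][:5]]

    print(answer_snippets("rice baked in an egg mixture, Sicilian"))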
The article would be more convincing if they showed the recipe from the book so we could compare it with the one ChatGPT output.<p>From a Google search, it looks like he's right about the poor accuracy. It gets the basic idea of the ingredients but is not really accurate, and it is initially wrong about the region.<p>But actually, this is what RAG is for. You would typically do a vector search for something similar to the question about "rice baked in an egg mixture", and, assuming it found a match on the real recipe or on a few similar possibilities, feed those into the prompt for the LLM to incorporate.<p>So if you have a well-indexed recipe database and a large context window to include multiple possible matches, then RAG would probably work perfectly for this case.
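The augmentation step is then just prompt assembly; roughly this (the prompt wording and helper name are mine, a sketch assuming the vector search already exists):

    def rag_prompt(question: str, retrieved_recipes: list[str]) -> str:
        # With a large context window, include several candidate matches
        # and let the model reconcile them instead of inventing one.
        sources = "\n\n---\n\n".join(retrieved_recipes)
        return ("Answer using ONLY the recipes below. If none match, say so.\n\n"
                f"{sources}\n\nQuestion: {question}")

    print(rag_prompt(
        "What Sicilian dish is rice baked in an egg mixture?",
        ["Riso al forno: rice, eggs, caciocavallo, baked until set ..."]))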
We use the term "pre-googling" for this sort of "information retrieval". You might have some concept in your head and want to know the exact term for it; once you get the term you're looking for from the LLM, you move to Google and search the "facts".<p>This might be a weird example for native English speakers, but recently I just couldn't remember the term for a graph where you're only allowed to move in one direction and cannot make loops. The LLM gave me the answer (directed acyclic graph, or DAG) right away. Once I had the term I was looking for, I moved on to Google search.<p>The same "pre-googling" works if you don't know whether some concept exists.
Isn't the point of RAG to make (in this example) actual recipe databases accessible to the LLM? Wouldn't it get <i>closer</i> to the article's stated goal of getting the actual recipe?
I think complaints like this show just how amazing AI is getting. This person really expected ChatGPT to single-shot give them this obscure recipe that took them a ton of effort to find themselves. Current AI can do so much that people lament that it can't do everything. It is incredible to me when people bring up bad AI-generated legal filings... as if people actually expect it to already do all of the work of an attorney, without error.
> I wanted the retrieval of a good recipe, not an amalgam of “things that plausibly look like recipes.”<p>And that's the core issue with AI. It is not meant to give you answers, but to construct output that looks like an answer. How that is useful, I fail to understand.
I continue to hold the strong position that calling LLMs without injecting sources of truth is pointless.<p>LLMs are exceptionally powerful as reasoning engines. They are useless as a source of truth or facts.<p>We have chatbots, chatbots with automatic RAG, etc. After the initial excitement wears off, you're going to want a way to inspect and adjust the source queries yourself. In this case, being able to select what to search for in Google might be a good approach for the cooking-recipe use case.
I do not understand the point of this article.<p>You did not even tell us what the correct recipe was called. But let's ignore that for now.<p>I did some googling around, and the Wikipedia article for sartu di riso [0] mentions the fact that "it (the dish) found success in Sicily". Also, in [1] a commenter going by the name of passalamoda mentions that they also make this dish in Sicily. The comment they wrote is in Italian, but Frank Fariello has translated it for us; or, if you don't believe him for whatever reason, Google Translate does a fine job. All of this is to say that associating this dish with Sicily, based on the short description you've given, is not far-fetched <i>at all</i>.<p>> I fail to see how an LLM summarizing the material would be an improvement.<p>I am fairly confident that typing the question you gave ChatGPT, waiting a few seconds for an answer, and then reading it can easily take under a minute. Let's be lenient and say it takes 5 minutes to also ask a second question and receive the answer. That would still take way way <i>way</i> less time than finding the book, getting the book, and going through the book to find the correct recipe.<p>Also, you yourself have given a reason in your article as to why ChatGPT would be an improvement. I will quote it now:<p>> Most of her time was dealing with the brittleness of the query interface (depending on word matching and source popularity), spam, and locked down sources.<p>I have already spent way too much time on debunking a random internet article, but I also decided to try to get an answer from ChatGPT. I did that by continuing to ask it questions that a person looking for an answer, not contradictions, would ask. If we assume that deanputney is correct and the dish you were looking for is Arancini al Burru, we are able to get an answer from ChatGPT by asking the very simple and natural question shown in [2].<p>[0] <a href="https://en.wikipedia.org/wiki/Sart%C3%B9_di_riso" rel="nofollow">https://en.wikipedia.org/wiki/Sart%C3%B9_di_riso</a><p>[1] <a href="https://memoriediangelina.com/2013/01/21/sartu-di-riso-neapolitan-rice-timbale/" rel="nofollow">https://memoriediangelina.com/2013/01/21/sartu-di-riso-neapo...</a><p>[2] <a href="https://imgur.com/a/PKqGXyK" rel="nofollow">https://imgur.com/a/PKqGXyK</a>
… you really want to be able to cook documents down to facts, as in old-school AI, and then be able to run logical queries. The trouble is that it is easy to ontologize some things (ingredients) but not so easy to ontologize the aspects of things that make them memorable.
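The easy half is almost trivial to sketch (a hypothetical fact schema; the hard half, whatever makes a dish memorable, has no obvious predicate):

    # Ingredients ontologize cleanly into facts you can query logically...
    recipe_facts = [
        ("sartu_di_riso", "has_ingredient", "rice"),
        ("sartu_di_riso", "has_ingredient", "egg"),
        ("sartu_di_riso", "cooking_method", "baked"),
        ("sartu_di_riso", "region", "campania"),
    ]

    def query(facts, predicate, obj):
        return {s for s, p, o in facts if p == predicate and o == obj}

    # "baked rice dishes" is a clean logical query:
    print(query(recipe_facts, "has_ingredient", "rice")
          & query(recipe_facts, "cooking_method", "baked"))

    # ...but "tastes like my mother's childhood" has no predicate.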
Suggested book about traditional Sicilian dishes: <a href="https://www.amazon.it/Profumi-Sicilia-libro-cucina-siciliana/dp/8886803737" rel="nofollow">https://www.amazon.it/Profumi-Sicilia-libro-cucina-siciliana...</a>
Is that ChatGPT example representative of RAG? I thought ChatGPT was primarily generative.<p>I think of something like Brave Search's AI feature when I think of RAG.
To keep control in the hands of the analyst, we've been working on UXes over agentic neurosymbolic RAG in louie.ai.<p>Ex: "search for login alerts from the morning, and if none, expand to the full day"<p>That requires generating a one-shot query combining semantic search with symbolic filters, plus an LLM-reasoned agentic loop that recovers when results come up short, e.g., from a poorly formed query around 'login alerts', and that honors the user's trigger around 'if none'.<p>Likewise, unlike Disneyified consumer tools like ChatGPT and Perplexity that are designed to hide what is happening, we work with analysts who need visibility and control. That means designing search so subqueries and decisions flow back to the user in an understandable way: they need to inspect what is happening, be confident they missed nothing, and edit via natural language or their own queries when they want to proceed.<p>Crazy days!
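For the curious, the shape of such a one-shot query with the agentic fallback looks something like this (illustrative Python, not louie.ai's actual API; the search callable and its filter fields are assumptions):

    from datetime import date, datetime, time

    def login_alerts(search, day: date):
        # Semantic half: embedding match on "login alert".
        # Symbolic half: a hard time-range filter, morning first.
        morning = search(text="login alert",
                         after=datetime.combine(day, time(0)),
                         before=datetime.combine(day, time(12)))
        if morning:              # the user's "if none" trigger, honored
            return morning       # by the LLM-reasoned agentic loop
        return search(text="login alert",
                      after=datetime.combine(day, time(0)),
                      before=datetime.combine(day, time(23, 59)))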
Clearly this author doesn't know what RAG is. RAG would be if he first did a 'retrieval' of all his mother's cookbooks containing Italian recipes, then reviewed the index for rice, and scanned and OCR'd those pages. That data would be submitted to ChatGPT along with the query to 'augment' it, within the constraints of the context window, so that ChatGPT could generate a response from the highly relevant cookbook info.
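In code, the "augment within the constraints of the context window" part is roughly this (a sketch; the token budget and the 4-characters-per-token heuristic are assumptions):

    def augment(question: str, ocr_pages: list[str],
                token_budget: int = 6000) -> str:
        # Keep adding retrieved pages until the context budget is spent.
        context, used = [], 0
        for page in ocr_pages:
            cost = len(page) // 4  # crude: ~4 characters per token
            if used + cost > token_budget:
                break
            context.append(page)
            used += cost
        return ("Using only these cookbook pages:\n\n"
                + "\n\n".join(context)
                + f"\n\nAnswer this: {question}")

    print(augment("What was the rice dish?",
                  ["p.212: Riso al forno -- rice, eggs, caciocavallo ..."]))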
Your input was incredibly low effort and prompted a very low-effort output.<p>I took part of your blog post (which you clearly were willing to put a few more tokens into):<p>"My mother remembers growing up with a sicilian dish that was primarily rice baked in an egg mixture. Roughly a "rice frittata". What are some distinctly Sicilian dishes that this could be referring to?"<p>Notice there is not much extra context on offer here, either for the LLM or for us. You didn't even tell us what the recipe was...<p>How was the dish served? What did it look like?<p>What are you expecting of the LLM here? It's not a psychic AGI.
Querying knowledge is not a nail, but the hammer is generation. It says so right on the tin: "generative" AI. People want "insight", "summary", "workflow automation", and "code completion" from a guided proverbial monkey hammering on the keyboard, hoping that our problem will become a nail. It is getting closer, though.
Thanks for sharing your thoughts! Yesterday I published a blog entry describing our solution to this exact problem. We call it "Generation Augmented Retrieval (GAR)"; please find the details here: <a href="https://blog.luk.sh/rag-vs-gar" rel="nofollow">https://blog.luk.sh/rag-vs-gar</a>
I think the author is overgeneralizing. Every technology is good for some things and not for others.<p>Vector search and LLMs are revolutionary technologies with their own limitations, but we should not form a bias on the basis of a few edge cases.
An interface for vector search is 100x more helpful to me than an LLM spitting out the same content as slop.<p>The key to vector search is how you chunk your data, and I have some libraries to help with that.
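For instance, the simplest scheme is fixed-size chunks with overlap (the sizes here are assumptions you'd tune per corpus):

    def chunk(text: str, size: int = 500, overlap: int = 100) -> list[str]:
        # Overlap keeps a sentence that straddles a boundary retrievable
        # from both sides; oversized chunks dilute the embedding.
        step = size - overlap
        return [text[i:i + size]
                for i in range(0, max(len(text) - overlap, 1), step)]

    sample = "Sartu di riso is a Neapolitan baked rice timbale. " * 40
    print(len(chunk(sample)))  # -> number of overlapping chunks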
> To me, LLM query management and retrieval is much more valuable than response generation<p>Sure, for certain tasks. For other tasks retrieval is less useful.
lol, the thing they're asking for is literally RAG: "I want a smart person to parse relevant information and return a concise and relevant answer."
Take the query, find the book, dump the book into the context window, and the LLM's answer will be exactly what you want.