One point the article argues is that with 1M+ token context windows, we shouldn't use RAG to include chunks from a larger collection -- just include everything! This is a very important open question to me, as I have been building a fairly nuanced RAG pipeline for AI coding.<p>With GPT-4 Turbo, it is definitely NOT a good idea to just throw lots of extraneous code into the 128k context window. That distracts and confuses GPT, making it code worse and follow complex system prompts less reliably. You get much better results if you curate a smaller portion of the code that is relevant to the task at hand.<p>I am really interested to find out whether Gemini has these same tendencies. If so, quality RAG is going to remain valuable for curating the context. If not, then Gemini would have a huge advantage over GPT-4. It would be really valuable to be able to naively harness such a large context window.
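To make "curating the context" concrete, here is a toy sketch of the idea: rank code chunks by relevance to the task and keep only what fits a token budget, rather than dumping the whole repo into the prompt. The keyword-overlap scorer is a stand-in for real embedding similarity, and the 4-chars-per-token estimate is just a heuristic; all names here are illustrative, not my actual pipeline.

```python
# Toy sketch of RAG-style context curation for an AI coding task.
# score() is a naive keyword-overlap stand-in for embedding similarity.

def score(query: str, chunk: str) -> float:
    """Fraction of query words that appear in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def curate_context(query: str, chunks: list[str], token_budget: int) -> list[str]:
    """Keep only the most relevant chunks that fit the budget,
    instead of sending every file to the model."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    picked, used = [], 0
    for ch in ranked:
        est_tokens = len(ch) // 4  # rough 4-chars-per-token estimate
        if used + est_tokens > token_budget:
            continue  # skip chunks that would blow the budget
        picked.append(ch)
        used += est_tokens
    return picked
```

The open question is whether a model like Gemini makes the budget (and the ranking) unnecessary, or whether relevance filtering still improves output quality even when everything technically fits.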
I think this kind of haystack recall is going to be extremely valuable to large enterprises with massive volumes of standards/practices/technical documents.<p>The status quo is that you have employees ingest and internalize this material over years of employment, then have those experienced employees manage large projects by telling the project team whether it's in compliance with the organization's standards.<p>A reliable recall system like this isn't just a drop-in replacement for <i>new</i> employees, it's a drop-in replacement for a lot of what makes the <i>expensive</i> employees important: the backwards-and-forwards knowledge of the standards/practices.
Can someone explain to me, maybe with an example, what this part means:<p>> 6) A very important and at the same time difficult class of solutions is model result validation<p>Thanks.