One point the article argues is that with 1M+ token context windows, we shouldn't use RAG to include chunks from a larger collection -- just include everything! This is a very important open question for me, as I have been building a fairly nuanced RAG pipeline for AI coding.

With GPT-4 Turbo, it is definitely NOT a good idea to just throw lots of extraneous code into the 128k context window. That distracts and confuses GPT: it codes worse and is less able to follow complex system prompts. You get much better results if you curate a smaller portion of the code that is relevant to the task at hand, something like the sketch below.

I am really interested to find out whether Gemini has these same tendencies. If so, quality RAG will remain valuable for curating the context. If not, Gemini would have a huge advantage over GPT-4: being able to naively harness such a large context window would be really valuable.
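To make "curate" concrete, here is a minimal sketch of the kind of selection step I mean. Everything in it is illustrative: the lexical-overlap scoring stands in for whatever relevance signal a real pipeline would use (embeddings, a repo map, etc.), and the ~4 chars/token estimate stands in for a real tokenizer.

    from collections import Counter

    def score(chunk: str, task: str) -> float:
        """Crude lexical-overlap relevance score; a real pipeline
        would use embeddings or repo structure instead."""
        chunk_words = Counter(chunk.lower().split())
        task_words = Counter(task.lower().split())
        overlap = sum((chunk_words & task_words).values())
        return overlap / (1 + len(chunk.split()))

    def curate_context(chunks: list[str], task: str,
                       budget_tokens: int = 8000) -> list[str]:
        """Greedily pack the highest-scoring chunks until the
        (rough, ~4 chars/token) budget is exhausted."""
        picked, used = [], 0
        for chunk in sorted(chunks, key=lambda c: score(c, task),
                            reverse=True):
            cost = len(chunk) // 4  # rough token estimate
            if used + cost > budget_tokens:
                continue
            picked.append(chunk)
            used += cost
        return picked

The empirical question is whether budget_tokens still needs to stay small for Gemini, or whether you could set it to a million and skip the scoring step entirely.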