Say I have a huge corpus of unstructured data. What's the most effective way to get a model that can produce great answers on that data?<p>Is RAG still the way to go? Should one fine-tune the model on that data as well?<p>It seems that getting RAG to work well requires a lot of optimization. Are there any drag-and-drop solutions that work well? I know the OpenAI Assistants API has built-in knowledge retrieval; does anyone have experience with how good that is compared to other methods?<p>Or is it better to pre-train a custom model and instruction-tune it?<p>Would love to know what you guys are all doing!
Fine-tuning is not a good approach to integrating new knowledge into an LLM. It's a good way to steer the LLM's style and the structure of its responses (e.g., length, format).<p>I'd say RAG is still very much the way to go. What you then need to do is optimize how you chunk and embed data into the RAG database. Pinecone has a good post on this[1], and I believe others[2] are working on more automated solutions.<p>For a more general framing, what state-of-the-art (SOTA) systems seem to be doing is giving LLMs a "second brain" from which to obtain information. This can take the form of RAG, as above, or of more complex and rigorous models. For example, AlphaGeometry[3] combines an LLM with a geometry theorem prover to find solutions to problems.<p>[1] <a href="https://www.pinecone.io/learn/chunking-strategies/" rel="nofollow">https://www.pinecone.io/learn/chunking-strategies/</a><p>[2] <a href="https://unstructured.io/" rel="nofollow">https://unstructured.io/</a><p>[3] <a href="https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/" rel="nofollow">https://deepmind.google/discover/blog/alphageometry-an-olymp...</a>
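To make the chunk-embed-retrieve loop concrete, here's a toy sketch of the pipeline. All names are illustrative, and the bag-of-words "embedding" is a stand-in for a real embedding model (which you'd call via an API or library in practice):

```python
# Toy RAG pipeline: chunk a corpus, "embed" each chunk, retrieve by cosine
# similarity. A real system would swap embed() for an actual embedding model;
# bag-of-words counts are used here only to keep the sketch self-contained.
import math
from collections import Counter

def chunk(text, size=50, overlap=10):
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = ("RAG retrieves relevant chunks from a corpus at query time. "
          "Fine-tuning mainly shapes the style and format of responses. "
          "Chunking and embedding strategy matter a lot for retrieval quality.")
chunks = chunk(corpus, size=12, overlap=3)
top = retrieve("how does chunking affect retrieval quality", chunks)
```

The chunk size and overlap are exactly the knobs the Pinecone post above is about: too small and chunks lose context, too large and retrieval gets noisy.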
I think there is a lot of ground to cover in "RAG". Most of the demos and tutorials online simply use a vector database to retrieve similar documents by cosine distance.<p>I'm now working on a "hybrid" search that combines lexical and semantic search, using an LLM to translate a user message into a search query before retrieving data.<p>As far as I know there's no "standard" yet; the field keeps moving and there are no simple answers.