I did a hobby RAG project a little while back, so I'll share my experience here.<p>1. First, ask the LLM to answer your questions without RAG. It is easy to do and you may be surprised by how well it does (I was, but my data was semi-public). This also gives you a baseline to beat.<p>2. Chunking of your data needs to be smart. Just chunking every N characters wasn't especially fruitful. My data was a book, so it was hierarchical (by heading level). I chunked by book section and handed those sections to the LLM.<p>3. Use the context window effectively. There is a weighted knapsack problem here: you have chunks of various sizes (chars/tokens) with various weightings (quality of match), and a limited context window to fit them into. If your data supports it, the problem is also hierarchical. For example, if I have 4 excellent matches in one chapter, should I include each match separately, or should I include the whole chapter?<p>4. Quality of input data counts. I spent 30 minutes copy-pasting the entire book into markdown format.<p>This was only a small project. I'd be interested to hear any other thoughts/tips.
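<p>To make point 3 a bit more concrete, here is a rough sketch of the knapsack idea in Python. It is not the code from my project, just a minimal 0/1 knapsack over a token budget. The `Chunk` class, the `pack_context` function, the section names, and the scores are all made up for illustration, and it assumes you already have a token count and a match score per chunk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str     # hypothetical label, e.g. a section heading
    tokens: int   # chunk size in tokens (assumed to be > 0)
    score: float  # retrieval match quality, higher is better


def pack_context(chunks, budget):
    """Pick the subset of chunks with the highest total score
    whose total token count fits within `budget` (0/1 knapsack)."""
    # best[b] = (total score, indices chosen) using at most b tokens
    best = [(0.0, ())] * (budget + 1)
    for i, chunk in enumerate(chunks):
        # Walk budgets downward so each chunk is used at most once.
        for b in range(budget, chunk.tokens - 1, -1):
            prev_score, prev_picks = best[b - chunk.tokens]
            candidate = prev_score + chunk.score
            if candidate > best[b][0]:
                best[b] = (candidate, prev_picks + (i,))
    return [chunks[i] for i in best[budget][1]]


# Made-up example data: four matching sections and a 1000-token budget.
chunks = [
    Chunk("ch3/sec1", 400, 0.90),
    Chunk("ch3/sec2", 300, 0.80),
    Chunk("ch3/sec4", 500, 0.70),
    Chunk("ch7/sec2", 700, 0.95),
]
picked = pack_context(chunks, budget=1000)
print([c.name for c in picked])  # ['ch3/sec2', 'ch7/sec2']
```

Note that a greedy "take the best match first" approach would grab `ch7/sec2` and `ch3/sec2` here too, but in general greedy and optimal diverge. This sketch also ignores the hierarchical variant (whole chapter vs. individual sections). Handling that would need some way to stop a chapter and its own sections from being picked together.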