Accuracy and hallucinations are among the main challenges for adopting LLMs in production. Evaluations, both as part of CI/CD and in real time, are effective counter-measures.<p>While great techniques and libraries exist for this, there's little literature on how to use them in production. I tried writing a detailed blog post that decodes OpenAI's evals framework and walks step by step through how to use it to your advantage.<p>You can learn how to use the framework to evaluate models and prompts, and optimise LLM systems for the best outputs.
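<p>For a rough idea of what the framework involves: an eval is registered with a YAML entry that points one of the built-in eval classes (here the basic exact-match one) at a JSONL file of samples. This is only an illustrative sketch — the eval name "my-eval" and the file paths are assumptions, not from the post:

```yaml
# Registry entry for a hypothetical eval called "my-eval".
my-eval:
  id: my-eval.dev.v0
  metrics: [accuracy]

my-eval.dev.v0:
  # Built-in exact-match eval class from the evals library.
  class: evals.elsuite.basic.match:Match
  args:
    # Path (relative to the registry data dir) of the samples file.
    samples_jsonl: my-eval/samples.jsonl
```

Each line of the samples JSONL holds an "input" chat and an "ideal" answer, e.g. {"input": [{"role": "user", "content": "What is the capital of France?"}], "ideal": "Paris"}; the eval is then run from the CLI with `oaieval gpt-3.5-turbo my-eval`, which reports the accuracy metric over the samples.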