We're thrilled to announce the launch of DeepEval, an LLM evaluation and testing suite designed to work nicely within a CI/CD pipeline.

As more companies integrate LLM/RAG applications into their operations, ensuring that these applications are effective, reliable, and safe is hard.

About DeepEval
We started by consulting on a few RAG projects and quickly realised how many issues came up whenever we iterated on prompts, changed chunking strategies, added function calls, or introduced guardrails. Each change had downstream effects that caused unexpected problems and results.

DeepEval, inspired by Pytest, aims to make iterating on these RAG and agent applications as easy as possible by building evaluation into the CI/CD workflow. The goal is to make deploying an LLM application as straightforward as getting all tests to pass.
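To give a concrete sense of the workflow, here's a rough sketch of what a test looks like. The exact names (assert_test, LLMTestCase, AnswerRelevancyMetric, threshold) follow our docs but are still evolving while we're in beta, so treat this as illustrative:

    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def test_answer_relevancy():
        # Wrap a single LLM interaction as a test case
        # (the input/output shown here are illustrative).
        test_case = LLMTestCase(
            input="What does DeepEval do?",
            actual_output="DeepEval lets you unit test LLM outputs in CI/CD.",
        )
        # Fails the test (and therefore the CI run) if the
        # relevancy score falls below the threshold.
        assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])

Because this is an ordinary Pytest test function, it plugs into CI with a plain pytest invocation, no special runner required.

Some features of DeepEval include: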
- Opinionated tests for answer relevancy, factual consistency, toxicity, and bias.
- A web UI for viewing tests, implementations, and comparisons.
- Opinionated flow for synthetic dataset creation (see the sketch at the end of this post).

We are currently in beta, but the beta is fully operational, and we would love your feedback and suggestions to continue improving DeepEval.

We are happy to answer any questions you have!
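As a taste of the synthetic dataset flow, here's a rough sketch of generating evaluation cases from retrieval contexts you already have. The Synthesizer class and generate_goldens_from_contexts call shown here are illustrative and may not match the current beta exactly:

    from deepeval.synthesizer import Synthesizer

    # Generate question/expected-output pairs ("goldens") from
    # your own retrieval contexts. Names here are illustrative
    # and may change while the API is in beta.
    synthesizer = Synthesizer()
    goldens = synthesizer.generate_goldens_from_contexts(
        contexts=[
            ["DeepEval is a Pytest-inspired evaluation suite for LLM apps."],
        ],
    )
    for golden in goldens:
        print(golden.input, golden.expected_output)

The generated goldens can then be turned into test cases like the one sketched above, so your evaluation dataset and your CI tests stay in one place.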