
Show HN: DeepEval – Evaluation and Unit Testing for LLMs

18 points by jacky2wong over 1 year ago

4 comments

jacky2wong over 1 year ago
We're thrilled to announce the launch of DeepEval, an LLM evaluation and testing suite designed to work nicely within a CI/CD pipeline.

As more companies integrate LLM/RAG applications into their operations, ensuring the effectiveness, reliability, and safety of these models is hard.

About DeepEval: We started with consulting on a few RAG projects and quickly realised how many issues came up as we iterated on our prompts and chunking methodologies, added function calls, added guardrails, etc. Very quickly we realised these changes had downstream effects that caused unexpected problems and results.

DeepEval, inspired by Pytest, aims to make iterating on these RAG and agent applications as easy as possible by building evaluation into the CI/CD workflow. The goal is to make deploying LLMs as straightforward as getting all tests to pass.

Some features of DeepEval include:

- Opinionated tests for answer relevancy, factual consistency, toxicity, and bias.
- A web UI to view tests, implementations, and comparisons.
- An opinionated flow for synthetic dataset creation.

We are currently in a fully operational beta release and would love your feedback and suggestions to continue improving DeepEval.

We are happy to answer any questions you have!
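[Editor's note: for concreteness, here is a minimal sketch of the Pytest-style workflow described above, based on DeepEval's publicly documented usage. The names shown (LLMTestCase, AnswerRelevancyMetric, assert_test) come from later documentation and may differ from the beta API available at the time of this post.]

```python
# Minimal sketch of a Pytest-style DeepEval test (names per DeepEval's
# public docs; the beta API at the time of this post may have differed).
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # A test case pairs the user input with the app's actual output
    # (and, for RAG apps, the retrieved context it was grounded on).
    test_case = LLMTestCase(
        input="What is your return policy?",
        actual_output="Items can be returned within 30 days of purchase.",
        retrieval_context=["All purchases may be returned within 30 days."],
    )
    # The assertion passes only if the relevancy score clears the
    # threshold, so a regression in answer quality fails CI the same
    # way a failing unit test would.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```

Per the docs, a file like this can be run with `deepeval test run` (or plain pytest), which is what lets evaluation sit inside a CI/CD pipeline.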
vivanpuri over 1 year ago
Interesting repository - wondering if there's a Guardrail integration?
Superman928 over 1 year ago
This is super helpful
alw3ys over 1 year ago
How is this different from LangSmith?