
DeepEval – Unit Testing for LLMs

79 points | by jacky2wong | almost 2 years ago

5 comments

swyx, almost 2 years ago
lots of attempts at the llm eval game:

- https://github.com/BerriAI/bettertest (https://twitter.com/ishaan_jaff/status/1665105582804832258)

- https://github.com/AgentOps-AI/agentops

- https://www.ycombinator.com/launches/JFc-baserun-ai-ship-llm-features-with-confidence

- https://news.ycombinator.com/item?id=36958175

- https://github.com/promptfoo/promptfoo

can you articulate what insight on the market you have that will make you stand out over time?
jacky2wong, almost 2 years ago
Thanks for the enormous amount of interest and questions in this post. I wanted to make a follow-up comment to answer the questions above about differentiation, explain where we see issues in the current solution space, and clarify the problem we are trying to tackle.

Starting with the problem: we want to make iterating on LLM and agent applications as easy as possible for data teams - not just to let people quickly edit and compare prompts (although that is one way of doing it). There are a number of solutions out there to help you test prompts, but many fail to fit within the data team's workflow (which consists of tooling like Pytest and CLI-first approaches) and do not make it easy to iterate on and launch things like agents.

From 30+ interviews with ML engineers and data scientists building in this space: all of them want to first build a LangChain agent/RAG pipeline and then build their own internal version of it (because LangChain is quick to set up but lacks tooling). A lot of them are running into issues developing the right evaluation infrastructure, which DeepEval aims to solve through synthetic data creation and easy-to-use testing tools.

Our product roadmap is to not only build the initial unit testing for LLMs but also make it easy for other developers and MLEs to quickly iterate off this. Long-term, our plan is to ensure that our users and customers can build the best agent/LLM solutions possible.

And (to be frank) a lot of the existing solutions aren't the best looking, have limited visualisations, or are just dead.
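[Editor's note: to make the Pytest-first, CLI-driven workflow described above concrete, here is a minimal sketch of what a unit test for an LLM output could look like. This is an illustration only: generate_answer, the crude lexical similarity metric, and the 0.7 threshold are hypothetical stand-ins, not DeepEval's actual API.]

    # Hypothetical sketch of a Pytest-style LLM unit test; generate_answer and
    # the similarity scoring below are illustrative stand-ins, not DeepEval's API.
    from difflib import SequenceMatcher

    import pytest


    def generate_answer(prompt: str) -> str:
        """Placeholder for a call into an LLM or RAG pipeline."""
        return "Paris is the capital of France."


    def similarity(a: str, b: str) -> float:
        """Crude lexical similarity, standing in for a real evaluation metric."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()


    @pytest.mark.parametrize(
        "prompt,expected",
        [
            ("What is the capital of France?", "Paris is the capital of France."),
        ],
    )
    def test_llm_answer_is_relevant(prompt: str, expected: str) -> None:
        output = generate_answer(prompt)
        # Fail the test (and the CI run) if the output drifts below the threshold.
        assert similarity(output, expected) >= 0.7

[Running pytest from the command line then fails the build whenever an output regresses, which is the kind of data-team workflow the comment describes.]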
adamgordonbell, almost 2 years ago
Microsoft Guidance integration?

https://github.com/guidance-ai/guidance
Eddygandr, almost 2 years ago
The docs for this aren't great - leaked API keys, code doesn't run, etc.
azzarcher, almost 2 years ago
How is this standing out from https://benchllm.com/?