
DeepEval – Unit Testing for LLMs

79 points | by jacky2wong | almost 2 years ago

5 comments

swyx, almost 2 years ago
lots of attempts at the llm eval game:

- https://github.com/BerriAI/bettertest (https://twitter.com/ishaan_jaff/status/1665105582804832258)

- https://github.com/AgentOps-AI/agentops

- https://www.ycombinator.com/launches/JFc-baserun-ai-ship-llm-features-with-confidence

- https://news.ycombinator.com/item?id=36958175

- https://github.com/promptfoo/promptfoo

can you articulate what insight on the market you have that will make you stand out over time?
jacky2wong, almost 2 years ago
Thanks for the enormous amount of interest and questions in this post. I wanted to make a follow-up comment to answer the questions above about differentiation, explain where we see issues in the current solution space, and clarify the problem we are trying to tackle.

Starting with the problem: we want to make iterating on LLM and agent applications as easy as possible for data teams - not just to let people quickly edit and compare prompts (although that is one way of doing it). There are a number of solutions out there to help you test prompts, but many fail to fit within the data team's workflow (which consists of tooling like Pytest and CLI-first approaches) and do not make it easy to iterate on and launch things like agents.

From 30+ interviews with ML engineers and data scientists building in this space: all of them want to first build a LangChain agent/RAG pipeline and then build their own internal version of it (because LangChain is quick to set up but lacks tooling). A lot of them are running into issues developing the right evaluation infrastructure, which DeepEval aims to solve through synthetic data creation and easy-to-use testing tools.

Our product roadmap is to not only build the initial unit testing for LLMs but also make it easy for other developers and MLEs to quickly iterate off this. Long-term, our plan is to ensure that our users and customers can build the best agent/LLM solutions possible.

And (to be frank) a lot of the existing solutions aren't the best looking, have limited visualisations, or are just dead.
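[Editor's note: to make the Pytest-first, CLI-driven workflow described above concrete, here is a minimal sketch of what a unit test for an LLM output could look like. This is an illustration only: generate_answer, the crude lexical similarity metric, and the 0.7 threshold are hypothetical stand-ins, not DeepEval's actual API.]

    # Hypothetical sketch of a Pytest-style LLM unit test; generate_answer and
    # the similarity scoring below are illustrative stand-ins, not DeepEval's API.
    from difflib import SequenceMatcher

    import pytest


    def generate_answer(prompt: str) -> str:
        """Placeholder for a call into an LLM or RAG pipeline."""
        return "Paris is the capital of France."


    def similarity(a: str, b: str) -> float:
        """Crude lexical similarity, standing in for a real evaluation metric."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()


    @pytest.mark.parametrize(
        "prompt,expected",
        [
            ("What is the capital of France?", "Paris is the capital of France."),
        ],
    )
    def test_llm_answer_is_relevant(prompt: str, expected: str) -> None:
        output = generate_answer(prompt)
        # Fail the test (and the CI run) if the output drifts below the threshold.
        assert similarity(output, expected) >= 0.7

[Running pytest from the command line then fails the build whenever an output regresses, which is the kind of data-team workflow the comment describes.]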
adamgordonbell, almost 2 years ago
Microsoft Guidance integration?

https://github.com/guidance-ai/guidance
Eddygandr, almost 2 years ago
The docs for this aren't great - leaked API keys, code doesn't run, etc.
azzarcher, almost 2 years ago
How is this standing out from https://benchllm.com/?