
Show HN: Ragas – Open-source library for evaluating RAG pipelines

121 points by shahules about 1 year ago
Ragas is an open-source library for evaluating and testing RAG and other LLM applications. GitHub: https://docs.ragas.io/en/stable/, docs: https://docs.ragas.io/.

Ragas provides different sets of metrics and methods, like synthetic test data generation, to help you evaluate your RAG applications. Ragas started off by scratching our own itch for evaluating our RAG chatbots last year.

Problems Ragas can solve:

- How do you choose the best components for your RAG, such as the retriever, reranker, and LLM?

- How do you formulate a test dataset without spending tons of money and time?

We believe there needs to be an open-source standard for evaluating and testing LLM applications, and our vision is to build it for the community. We are tackling this challenge by evolving ideas from the traditional ML lifecycle for LLM applications.

ML testing evolved for LLM applications:

We built Ragas on the principles of metrics-driven development, and we aim to develop and innovate techniques inspired by state-of-the-art research to solve the problems of evaluating and testing LLM applications.

We don't believe that the problem of evaluating and testing applications can be solved by building a fancy tracing tool; rather, we want to solve the problem from a layer under the stack. For this, we are introducing methods like automated synthetic test data curation, metrics, and feedback utilisation, inspired by lessons learned from deploying stochastic models in our careers as ML engineers.

While currently focused on RAG pipelines, our goal is to extend Ragas to testing a wide array of compound systems, including those based on RAG, agentic workflows, and various transformations.

Try out Ragas in Google Colab: https://colab.research.google.com/github/shahules786/openai-cookbook/blob/ragas/examples/evaluation/ragas/openai-ragas-eval-cookbook.ipynb. Read our docs to learn more: https://docs.ragas.io/.

We would love to hear feedback from the HN community :)
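To make the metrics-driven idea above concrete, here is a minimal pure-Python sketch of an evaluation loop over a RAG dataset. Every name here is illustrative, not Ragas's actual API, and the two metrics are crude word-overlap proxies for the kind of faithfulness and relevance scores the library computes with an LLM; see the linked docs for the real interface.

```python
# Toy metrics-driven evaluation loop for RAG outputs.
# Each sample has a question, retrieved contexts, and a generated answer.
# A metric maps one sample to a score in [0, 1]; the harness averages
# each metric over the whole dataset.

def faithfulness_proxy(sample):
    """Crude stand-in: fraction of answer words found in any retrieved context."""
    answer_words = sample["answer"].lower().split()
    if not answer_words:
        return 0.0
    context_text = " ".join(sample["contexts"]).lower()
    supported = sum(1 for w in answer_words if w in context_text)
    return supported / len(answer_words)

def context_relevance_proxy(sample):
    """Crude stand-in: fraction of contexts sharing a word with the question."""
    question_words = set(sample["question"].lower().split())
    if not sample["contexts"]:
        return 0.0
    hits = sum(1 for c in sample["contexts"]
               if question_words & set(c.lower().split()))
    return hits / len(sample["contexts"])

def evaluate(dataset, metrics):
    """Average each metric over the dataset, returning {metric_name: score}."""
    return {
        m.__name__: sum(m(s) for s in dataset) / len(dataset)
        for m in metrics
    }

dataset = [
    {
        "question": "what is ragas",
        "contexts": ["ragas is a library for evaluating rag pipelines"],
        "answer": "ragas is a library for evaluating rag pipelines",
    },
]
scores = evaluate(dataset, [faithfulness_proxy, context_relevance_proxy])
```

In a real pipeline the proxies would be replaced by LLM-judged metrics, but the harness shape — dataset in, per-metric aggregate scores out — is the same.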

9 comments

swyx · about 1 year ago
congrats on launching! i think my continuing struggle with looking at Ragas as a company/library rather than a very successful mental model is that the core of it is like 8 metrics (https://github.com/explodinggradients/ragas/tree/main/src/ragas/metrics) that are each 1-200 LOC. i can inline that easily in my app and retain full control, or model that in langchain or haystack or whatever.

why is Ragas a library and a company, rather than an overall "standard" or philosophy (eg like Heroku's 12 Factor Apps) that could maybe be more universally adopted without using the library?

(just giving an opp to pitch some underappreciated benefits of using this library)
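As a sense of scale for the parent comment's "1-200 LOC" point, a self-contained version of a context-precision-style metric really does fit in a couple dozen lines. This is an illustrative reimplementation using word-level Jaccard similarity, not Ragas's actual code (Ragas uses LLM judgments rather than lexical overlap):

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def context_precision(contexts, ground_truth, threshold=0.5):
    """Fraction of retrieved contexts close enough to the ground-truth passage."""
    if not contexts:
        return 0.0
    relevant = [c for c in contexts if jaccard(c, ground_truth) >= threshold]
    return len(relevant) / len(contexts)

score = context_precision(
    contexts=["paris is the capital of france", "berlin is in germany"],
    ground_truth="the capital of france is paris",
)
```

Inlining something like this does hand you full control, which is the trade-off the comment is pointing at.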
dataexporter · about 1 year ago
Based on our initial analysis of RAGAS a few months ago, it didn't provide the results our team was expecting and required a lot of customisation on top of it. Nevertheless, it's a pretty solid library.
pawanapg · about 1 year ago
Also check out DeepEval... our team has been using it for a while, and it's been working well for us because we can evaluate any LLM, something this library doesn't seem to support (https://github.com/confident-ai/deepeval).
AndrewCook71 · about 1 year ago
This is nice; open-source LLM evaluation libraries are arriving more and more often.

We're currently using DeepEval (https://github.com/confident-ai/deepeval). How is this different from that?
redskyluan · about 1 year ago
Great product and great progress.

The first step in building RAG is always to evaluate.

Beyond all the current evaluations, cost and performance should also be part of the evaluation.
rhogar · about 1 year ago
Congratulations on the launch! Personally, I would love to see rough estimates of the expected number of requests and tokens required to run tasks like synthetic data generation for different amounts of data. Though this is likely highly variable, I'd like a loose idea of the possible incurred costs and execution time.
nkko · about 1 year ago
Congratulations on the launch of Ragas! This looks like an incredibly valuable tool for the LLM community. As the library continues to evolve, it will be interesting to see how it adapts to handle the growing diversity of LLM architectures and use cases.
jfisher4024 · about 1 year ago
Congratulations on the launch! I was unable to use this library: I was trying to evaluate different non-OpenAI models, and it consistently failed due to malformed JSON coming from the model.

Any thoughts about using different models? Is this just a langchain limitation?
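A common workaround for the malformed-JSON failures described here (whichever eval library sits in the loop) is to extract the first balanced JSON object from the raw completion before parsing, since weaker models often wrap their JSON in prose or markdown fences. A rough sketch, not tied to any particular library:

```python
import json

def extract_json(raw):
    """Pull the first balanced {...} object out of an LLM completion.

    Scans for a balanced brace span and tries to parse it. Limitation:
    a brace inside a JSON string value would throw off the depth count,
    so this is a salvage heuristic, not a full parser.
    """
    start = raw.find("{")
    if start == -1:
        return None
    depth = 0
    for i, ch in enumerate(raw[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(raw[start:i + 1])
                except json.JSONDecodeError:
                    return None
    return None

completion = 'Sure! Here is the verdict:\n```json\n{"faithful": true, "score": 0.9}\n```'
parsed = extract_json(completion)
```

A retry that re-prompts the model on a `None` result catches most of the remaining failures.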
retrovrv · about 1 year ago
Phenomenal to see how Ragas has progressed. Congratulations on the launch