Launch HN: Openlayer (YC S21) – Testing and Evaluation for AI

94 points by rishramanathan over 1 year ago
Hey HN, Rish, Vikas and Gabe here. We're building Openlayer (https://www.openlayer.com/), an observability platform for AI. We've developed comprehensive testing tools to check both the quality of your input data and the performance of your model outputs.

The complexity and black-box nature of AI/ML have made rigorous testing a lot harder than it is in most software development. Consequently, AI development involves a lot of head-scratching and often feels like walking in the dark. Developers need reliable insights into how and why their models fail. We're here to simplify this for both common and long-tail failure scenarios.

Consider a scenario in which your model is working smoothly. What happens when there's a sudden shift in user behavior? This unexpected change can disrupt the model's performance, leading to unreliable outputs. Our platform offers a solution: by continuously monitoring for sudden data variations, we can detect these shifts promptly. That's not all though – we've created a broad set of rigorous tests that your model, or agent, must pass. These tests are designed to challenge and verify the model's resilience against such unforeseen changes, ensuring its reliability under diverse conditions.

We support seamlessly switching between (1) development mode, which lets you test, version, and compare your models before you deploy them to production, and (2) monitoring mode, which lets you run tests live in production and receive alerts when things go sideways.

Say you're using an LLM for RAG and want to make sure the output is always relevant to the question. You can set up hallucination tests, and we'll buzz you when the average score dips below your comfort zone.

Or imagine you're managing a fraud prediction model and are losing sleep over false negatives. Openlayer offers a two-step solution. First, it helps pinpoint why the model misses certain fraudulent data points using debugging tools such as explainability. Second, it enables converting these identified cases into targeted tests. This allows you to deep dive into tackling specific incidents, like fraud within a segment of US merchants. By following this process, you can understand your model's behavior and refine it to capture future fraudulent cases more effectively.

The MLOps landscape is currently fragmented. We've seen countless data and ML teams glue together a ton of bespoke and third-party tools to meet basic needs: one for experiment tracking, another for monitoring, and another for CI automation and version control. With LLMOps now thrown into the mix, it can feel like you need yet *another* set of entirely new tools.

We don't think you should, so we're building Openlayer to condense and simplify AI evaluation. It's a collaborative platform that solves long-standing ML problems like the ones above, while tackling the new crop of challenges presented by Generative AI and foundation models (e.g. prompt versioning, quality control). We address these problems in a single, consistent way that doesn't require you to learn a new approach. We've spent a lot of time ensuring our evaluation methodology remains robust even as the boundaries of AI continue to be redrawn.

We're stoked to bring Openlayer to the HN community and are keen to hear your thoughts, experiences, and insights on building trust into AI systems.
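
For readers who want something concrete, the "alert when the average score dips below a threshold" idea from the RAG example above could look roughly like the sketch below. This is only an illustration and not Openlayer's API: the class name `AverageScoreTest`, its parameters, and the sample scores are all hypothetical, and a rolling window is just one assumed way to define "average".

```python
from collections import deque
from statistics import mean


class AverageScoreTest:
    """Hypothetical monitor (not Openlayer's API): the test fails when the
    rolling mean of recent scores drops below a chosen threshold."""

    def __init__(self, threshold: float, window: int = 5) -> None:
        self.threshold = threshold
        # deque with maxlen keeps only the most recent `window` scores
        self.scores: deque = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one score; return True while the test still passes."""
        self.scores.append(score)
        return mean(self.scores) >= self.threshold


# Made-up relevance scores for five consecutive responses
monitor = AverageScoreTest(threshold=0.8, window=3)
results = [monitor.record(s) for s in [0.9, 0.85, 0.9, 0.6, 0.5]]
print(results)
```

In a real setup a `False` result would trigger a notification, and the scores would come from whatever relevance or hallucination metric you trust.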

13 comments

shahargl over 1 year ago
how is it different from Traceloop and openllmetry (https://github.com/traceloop/openllmetry)?
hoerzu over 1 year ago
How does it compare to other platforms like https://rungalileo.io or https://lilacml.com?
nextworddev over 1 year ago
Hmm YC 21- so they pivoted into this after 2 years doing something different?
glial over 1 year ago
Awesome idea. I'm curious how comprehensive your set of evaluations is. For example, how does it compare to OpenAI Evals? Could I import evaluations from there? Add my own?
jwoodbridge over 1 year ago
nice to see this launch - i was waiting until they had a JS native library, but we’ve been using it since and it covers everything we need
jofer over 1 year ago
Just FYI, "openlayers" is the name of a widely used open source web mapping frontend library. There's a possibility for some confusion there.

https://openlayers.org/
rgbrgb over 1 year ago
congrats on the product, looks great. what model formats are supported?
amtambe over 1 year ago
Curious how well this works / how it would work if users are not directly interacting with the LLM!
la64710 over 1 year ago
Using Open in a name has become a hype.
skadamat over 1 year ago
Congrats! FYI your link rendering seems funky and doesn't seem to be clickable?
verdverm over 1 year ago
No github*, no pricing, both likely to be issues on HN

*ok, there is a gallery project, but something like this I would expect to be the open source variety of startups. I very much expect something like this to be open core.
Bnjoroge over 1 year ago
big fan of openlayer since rdv!
solardev over 1 year ago
This is really going to confuse people searching for OpenLayers, a major web mapping package :(

https://openlayers.org/

It has an API with class names like "Observable", and there are frequent discussions on inputs and performance. It's gonna make searching for one or the other really hard...