
Show HN: Helicone (YC W23) – OSS LLM Observability and Development Platform

29 points by justintorre75 4 months ago
Hey HN, we're Justin and Cole, the founders of Helicone (https://helicone.ai). Helicone is an open-source platform that helps teams build better LLM applications through a complete development lifecycle of logging, evaluation, experimentation, and release.

You can try our free demo by signing up (https://helicone.ai/signup) or self-deploy with our new fully open-source Helm chart (https://helicone.ai/selfhost).

When we first launched 22 months ago, we focused on providing visibility into LLM applications. With just a single line of code, teams could trace requests and responses, track token usage, and debug production issues. That simple integration has since processed over 2.1B requests and 2.6T tokens, working with teams ranging from startups to Fortune 500 companies.
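To make that concrete, here's roughly what the one-line integration looks like with the OpenAI Python SDK: you point the client at our gateway and authenticate with your Helicone key (see our docs for the exact, current gateway URL and header names).

    # Sketch of the proxy-style integration (check the docs for the current
    # gateway URL and headers): route OpenAI traffic through Helicone.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["OPENAI_API_KEY"],
        base_url="https://oai.helicone.ai/v1",  # Helicone gateway instead of api.openai.com
        default_headers={
            "Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}",
        },
    )

    # Requests now pass through the proxy, which logs them before forwarding,
    # so nothing else in the application has to change.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello!"}],
    )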
However, as we scaled and our customers matured, it became clear that logging alone wasn't enough to manage production-grade applications.

Teams like Cursor and V0 have shown what peak AI application performance looks like, and it's our goal to help teams achieve that quality. From speaking with users, we realized our platform was missing the necessary tools to create an iterative improvement loop: prompt management, evaluations, and experimentation.

Helicone V1: Log → Review → Release (hope it works)

From talking with our users, we noticed a pattern: while many successfully launch their MVP quickly, the teams that achieve peak performance take a systematic approach to improvement. They identify inconsistent behaviors through evaluation, experiment methodically with prompts, and measure the impact of each change. This observation shaped our new workflow:

Helicone V2: Log → Evaluate → Experiment → Review → Release

It begins with comprehensive logging, capturing the entire context of an LLM application: not just prompts and responses, but variables, chain steps, embeddings, tool calls, and vector DB interactions (https://docs.helicone.ai/features/sessions).

Yet even with detailed traces, probabilistic systems are notoriously hard to debug at scale. So we released evaluators, either via LLM-as-judge or custom Python evaluators leveraging the CodeSandbox SDK (https://codesandbox.io/docs/sdk/sandboxes).

From there, our users were able to more easily monitor performance and investigate what went wrong. Did the embedding search return poor results? Did a tool call fail? Did the prompt mishandle an edge case?

But teams would still edit prompts in a playground, run a few test cases, and deploy based on intuition. This lacked the systematic testing we're used to in traditional software development. That's why we built experiments, similar to Anthropic's workbench but model-agnostic (https://docs.helicone.ai/features/experiments).

For instance, when a prompt generates occasional rude support responses, you can test prompt variations against historical conversations. Each variant runs through your production evaluators, measuring real improvement before deployment.

Once deployed, the cycle begins again.

We recognize that Helicone can't solve all of the problems you might face when building an LLM application, but we hope we can help you bring a better product to your customers through our new workflow.

If you're curious how our infrastructure handled our growth: our initial architecture struggled. Synchronous log processing overwhelmed our database, and query times went from milliseconds to minutes. We've completely rebuilt our infrastructure with two key changes: 1) using Kafka to decouple log ingestion from processing, and 2) splitting storage by access pattern across S3, Kafka, and ClickHouse. This was a long journey but resulted in zero data loss and fast query times even at billions of records. You can read about that here: https://upstash.com/blog/implementing-upstash-kafka-with-cloudflare-workers

We'd love your feedback and questions. Join us in this HN thread or on Discord (https://discord.gg/2TkeWdXNPQ). If you're interested in contributing to what we build next, check out our GitHub.
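P.S. For the architecture-curious, here's a simplified sketch of the decoupled ingestion pattern described above (names are illustrative, not our actual code): the request path only enqueues a log event to Kafka, while a separate worker drains the topic and batch-inserts into ClickHouse.

    # Simplified sketch of decoupled log ingestion. The topic, consumer group,
    # and insert_batch_into_clickhouse helper are hypothetical stand-ins.
    import json
    from confluent_kafka import Consumer, Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    def log_request(event: dict) -> None:
        # Hot path: fire-and-forget enqueue, no synchronous DB write.
        producer.produce("request-logs", json.dumps(event).encode())
        producer.poll(0)  # serve delivery callbacks without blocking

    def run_ingest_worker() -> None:
        # Out-of-band: drain the topic and bulk-insert into ClickHouse.
        consumer = Consumer({
            "bootstrap.servers": "localhost:9092",
            "group.id": "log-ingest",
            "auto.offset.reset": "earliest",
            "enable.auto.commit": False,
        })
        consumer.subscribe(["request-logs"])
        batch = []
        while True:
            msg = consumer.poll(1.0)
            if msg is None or msg.error():
                continue
            batch.append(json.loads(msg.value()))
            if len(batch) >= 1000:
                insert_batch_into_clickhouse(batch)  # hypothetical bulk writer
                consumer.commit()  # commit offsets only after the durable write
                batch.clear()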

7 comments

alexdanilowicz 4 months ago
Great write-up.

I'm starting to imagine a world in which *every* application has some sort of LLM-powered feature. Assuming that to be true, where do you see the future of observability and product analytics heading? Do you imagine software teams using two vendors, i.e., one for LLM observability and one for more general product observability? Or do you think they converge over time?

Your onboarding is impressive; it's one of the few products where "get set up in one line of code" is true. Why do you think more competitors don't use a proxy and instead go with a pure async logging option?
Diamantino 4 months ago
I think companies like OpenAI will always struggle to create the best tools for LLM devs, simply because their focus is split across so many areas. I love using this platform. Bravo!
sage789 4 months ago
Helicone is awesome, much better and more feature-rich than other LLM observability tools out there.
sage789 4 months ago
Helicone is awesome, one of the best LLM observability tools out there.
mrprkr 4 months ago
Great to see the evolution of this product, nice work!
radurevutchi 4 months ago
Justin - Helicone's great. I log in ~10 times a day. Great for debugging - I'm mostly using it for analytics / observability.
aseem_gupta 4 months ago
This looks awesome!