
Show HN: Langfuse – Open-source observability and analytics for LLM apps

143 points | by marcklingen | over 1 year ago
Hi HN! Langfuse is OSS observability and analytics for LLM applications (repo: https://github.com/langfuse/langfuse, 2 min demo: https://langfuse.com/video, try it yourself: https://langfuse.com/demo)

Langfuse makes capturing and viewing LLM calls (execution traces) a breeze. On top of this data, you can analyze the quality, cost and latency of LLM apps.

When GPT-4 dropped, we started building LLM apps – a lot of them! [1, 2] But they all suffered from the same issue: it's hard to assure quality in 100% of cases and even to have a clear view of user behavior. Initially, we logged all prompts/completions to our production database to understand what works and what doesn't. We soon realized we needed more context, more data and better analytics to sustainably improve our apps. So we started building a homegrown tool.

Our first task was to track and view what is going on in production: what user input is provided, how prompt templates or vector db requests work, and which steps of an LLM chain fail. We built async SDKs and a slick frontend to render chains in a nested way. It's a good way to look at LLM logic 'natively'. Then we added some basic analytics to understand token usage and quality over time for the entire project or single users (pre-built dashboards).

Under the hood, we use the T3 stack (TypeScript, Next.js, Prisma, tRPC, Tailwind, NextAuth), which lets us move fast and makes it easy to contribute to our repo. The SDKs are heavily influenced by the design of the PostHog SDKs [3] for stable implementations of async network requests. Converting OpenAPI specs to boilerplate Python code turned out to be surprisingly inconvenient, so we ended up using Fern [4] for that. We're fans of Tailwind + shadcn/ui + tremor.so for speed and flexibility in building tables and dashboards fast.

Our SDKs run fully asynchronously and make network requests in the background. We did our best to reduce any impact on application performance to a minimum. We never block the main execution path.

We've made two engineering decisions we've felt uncertain about: using a Postgres database and Looker Studio for the analytics MVP. Supabase performs well at our scale and integrates seamlessly into our tech stack. We will need to move to an OLAP database soon and are debating whether we need to start batching ingestion and whether we can keep using Vercel. Any experience you could share would be helpful!

Integrating Looker Studio got us to our first analytics charts in half a day. As it is not open-source and does not fit our UI/UX, we are looking to swap it out for an OSS solution to flexibly generate charts and dashboards. We've had a look at Lightdash and would be happy to hear your thoughts.

We're borrowing our OSS business model from PostHog/Supabase, who make it easy to self-host, reserve some features for enterprise (no plans yet), and offer a paid managed cloud service. Right now all of our code is available under a permissive license (MIT).

Next, we're going deep on analytics. For quality specifically, we will build out model-based evaluations and labeling to be able to cluster traces by scores and use cases.

Looking forward to hearing your thoughts and discussion – we'll be in the comments. Thanks!

[1] https://learn-from-ai.com/
[2] https://www.loom.com/share/5c044ca77be44ff7821967834dd70cba
[3] https://posthog.com/docs/libraries
[4] https://buildwithfern.com/
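The async, non-blocking SDK design the post describes can be sketched roughly as follows. This is a minimal illustration of the pattern (queue events on the caller's thread, ship them in batches from a background worker), not Langfuse's actual API – the class and method names here are hypothetical:

```python
import atexit
import queue
import threading
import time
import uuid


class TraceClient:
    """Sketch of a non-blocking tracing SDK: span() only enqueues an event,
    so the application's main execution path is never blocked; a daemon
    worker thread flushes batches to the backend in the background."""

    def __init__(self, flush_interval=1.0, batch_size=20):
        self._queue = queue.Queue()
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self.sent = []  # stand-in for the ingestion API, for illustration
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()
        atexit.register(self.flush)  # don't lose buffered events on exit

    def span(self, name, parent_id=None, **metadata):
        # Record one (possibly nested) step of an LLM chain; returning the
        # span id lets callers attach children, giving the nested trace view.
        span_id = str(uuid.uuid4())
        self._queue.put({"id": span_id, "parent_id": parent_id,
                         "name": name, "ts": time.time(), **metadata})
        return span_id

    def _run(self):
        while True:
            time.sleep(self._flush_interval)
            self._drain()

    def _drain(self):
        batch = []
        while not self._queue.empty() and len(batch) < self._batch_size:
            batch.append(self._queue.get_nowait())
        if batch:
            self._send(batch)

    def _send(self, batch):
        # A real SDK would POST the batch to an ingestion endpoint with
        # retries; here we just collect it in memory.
        self.sent.extend(batch)

    def flush(self):
        self._drain()
```

A caller would create one client per process, record steps with `client.span(...)` (passing `parent_id` to nest vector-db lookups or LLM calls under a request), and let the worker batch the network traffic; the real SDKs additionally handle retries, serialization, and shutdown flushing.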

12 comments

phillipcarter, over 1 year ago
Congrats on the release! I'm keenly interested in this space, as I believe that observability is one of the top ways to steer LLMs to be more reliable in production.

I noticed your SDKs use tracing concepts! Are there plans to implement OpenTelemetry support?
idosh, over 1 year ago
Congrats on the launch! Sounds like an exciting project. Do you also plan to store the raw data (input + output)? It can be relevant for fine-tuning, optimizing costs, etc. Since you already store metadata, I think it makes sense to have a one-stop shop.
jayunit, over 1 year ago
Congrats on the release! Having built several LLM apps in the past months and embarking on a couple of new ones, I'm excited to take a look at Langfuse.

Are there any alternatives you'd also suggest evaluating, and any particular strengths/weaknesses we should consider?

I'm also curious about doing quality metrics, benchmarking, regression testing, and skew measurement. I'll dig further into the Langfuse documentation (just watched the video so far) but I'd love any additional recommendations based on that.
anirudhrx, over 1 year ago
Congrats on the launch! This is really cool. Would love to see OTel integration in the future. I'm curious whether this might eventually work with request-context-based routing, i.e. using the propagated metadata between layers to dynamically test different versions of the stack, replay requests, or route to specific underlying implementation versions at different levels of the stack.
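The routing idea in this comment can be sketched with the standard library. This is a rough illustration using `contextvars` as a stand-in for OpenTelemetry baggage; all function and variable names here are hypothetical:

```python
import contextvars

# Propagated per-request metadata ("baggage" in OTel terms) that travels
# with the request across layers of the stack.
request_variant = contextvars.ContextVar("request_variant", default="stable")


def route_llm_call(prompt):
    # A lower layer reads the propagated variant and dispatches to the
    # matching implementation version, enabling per-request A/B routing
    # or replaying a request against a candidate version.
    implementations = {
        "stable": lambda p: f"[v1] {p}",
        "candidate": lambda p: f"[v2] {p}",
    }
    return implementations[request_variant.get()](prompt)


def handle_request(prompt, variant="stable"):
    # The top layer attaches the variant to the request context; it is
    # visible to every function called underneath without being threaded
    # through as an explicit argument.
    token = request_variant.set(variant)
    try:
        return route_llm_call(prompt)
    finally:
        request_variant.reset(token)
```

With real OTel baggage the variant would also cross process boundaries via trace-context propagation, which is what would make routing work at "different levels of the stack".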
v3np, over 1 year ago
Cool stuff and congrats on the Show HN! Out of curiosity, at what point do you see teams usually adopting something like Langfuse? In regular development you sometimes even have test-driven development – I imagine this doesn't really apply to LLMs. Do you see this changing over time as the process of building LLM apps becomes more mature?
marcklingen, over 1 year ago
Many great points/ideas here and on Discord, thanks HN!

For those reading this thread later, feel free to reach out with any feedback or questions: marc at langfuse dot com
fiehtle, over 1 year ago
If you're looking to replace Looker with something open source that you can style to your needs, maybe a mix of cube.dev plus tremor.so would do the trick?
elamje, over 1 year ago
Awesome. There is definitely a need for LLM product analytics that is currently completely underserved by traditional tools like GA, Mixpanel, etc.
kaspermarstal, over 1 year ago
I'm curious whether you investigated the TimescaleDB extension that is built into Supabase for your use case? And if so, what were the pros and cons?
addisonj, over 1 year ago
Congrats on the launch!

I have quite a few years of observability experience behind me and hadn't really considered some of the unique aspects that LLMs bring into the picture. Here are a few thoughts, responses to your questions, and feedback items:

* Generally, I think you do a good job of telling a clear, concise story and value proposition fairly early in a market where the number of people hitting these problems is rapidly growing, which is a pretty nice place to be! But that can also be a challenge in that you have to help people recognize the problem, which often means lots of content and lots of outreach.

* I think going open-source and following a PLG model of cloud/managed services is a pretty reasonable way to go and can certainly be a leg up over the existing players, but I noticed in your pricing a note about enterprise support for self-hosting in a customer VPC and dedicated instances. There is lots of money there... but it can also be an *extremely* big time sink for early-stage teams, so I would be careful, or at least make sure you price it such that it supports hiring.

* Also on pricing, I wonder whether basing it on storage matches how people think about it? Generally, I think about observability data in terms of events/sec first and then retention period. If you can make it work with a single usage-based metric of storage, that is great! But I would be concerned that 1) you aren't telling the user which plan can support their throughput and 2) you could end up with large variance in cost across different usage patterns.

* The biggest question I have is how much did you explore OpenTelemetry? Obviously it is not as simple as just going and building your own API and SDK... but when I look at the capabilities, I could see OpenTelemetry being the underlying protocol with some thinner convenience wrappers on top. From your other comments, I understand that you see some ways in which this data differs from typical trace/observability data, but I do wonder if that choice will 1) scare off some companies that are already "all in" on OTel and 2) cost you the opportunity to use all of the tooling around OTel, for example Kafka integration if you someday need that.

* As for your question about OLAP, I wouldn't rush it... In general, once you are big enough that the cost/scalability limitations of PG are looming, you will be a different company and know a lot more about the real requirements. I will also say that in all likelihood ClickHouse is probably the right choice, but even knowing that, there are lots of different ways to tackle the problem (like hosted vs self-managed) and the right way will depend on usage patterns, cost structure, where you end up with enterprise dedicated / self-hosted, etc. I will mention, though, that TimescaleDB is not a bad way to buy yourself a bit of headroom, but it is important to note that the TimescaleDB offered by Supabase shouldn't be compared to TimescaleDB community/cloud. The Supabase version isn't bad, it just isn't quite the same thing (i.e. no horizontal scalability).

Anyways, congrats again! It looks like you are off to a good start.

If you have any other questions for me, my email is in my profile.
pranay01, over 1 year ago
Congrats on the launch! Curious to learn what specific use cases you have seen around observability of LLM apps that are not covered by standard observability tools like DataDog, SigNoz, etc.

Also, how do you compare in terms of features with DataDog's LLM monitoring product, which was launched recently?

Disclaimer: I am a maintainer at SigNoz.
steventey, over 1 year ago
> We will need to move to an OLAP database soon and are debating if we need to start batching ingestion

Highly recommend https://tinybird.com for this – they're a fantastic OLAP DB for ingesting & visualizing time-series data!