Stripe’s Veneur: A distributed, fault-tolerant pipeline for observability data

109 points by federicoponzi almost 7 years ago

9 comments

chimeracoder almost 7 years ago
This was a pleasant surprise to see on Hacker News this morning! I work on the Observability team at Stripe and have been the PM for Veneur (and the rest of our metrics & tracing pipeline work) pretty much since we released it ~2 years ago.

If you're interested in learning more about how Veneur works and why we built it, I gave a talk at Monitorama last year that explains the philosophy behind Veneur[0]. In short, a massive company like Google is able to build their own integrated observability stacks in-house, but almost any smaller company is going to be relying on an array of open-source tools or third-party vendors for different parts of their observability tooling[1]. When using different tools, there are always going to be gaps between them, which leads to incomplete instrumentation and awkward (inter-)operability. By taking control of the pipeline that processes the data, we're able to provide fully integrated views into different aspects of our observability data.

The Monitorama talk is a year old at this point, so it doesn't cover some of the newer things Veneur has helped us to accomplish, but the core philosophy hasn't changed. I've given updated versions of the talk more recently at CraftConf (in May) and DevOpsDaysMSP (last week), but neither of those videos is online yet.

[0] https://vimeo.com/221049715

[1] e.g. ELK/Papertrail/Splunk for logs, Graphite/Datadog/SignalFx for metrics, and maybe a third tool for tracing if you're lucky.
tchaffee almost 7 years ago
Am I the only one who is always slightly disappointed that neither the README file on GitHub nor the landing page at the website tells me why I would want to use the software in question? What problem it solves? Why might "a distributed, fault-tolerant observability pipeline" be interesting to programmers or anyone else? It seems like you've already got to be familiar with the problem space to understand what this is and what need it fulfills.

I'm not picking on this package. I see it all the time.

Can someone here explain to me what the use case is for this software?
roskilli almost 7 years ago
It’s definitely interesting to see the different systems being built for monitoring across the different tech co’s.

M3 aggregator, Uber’s metrics aggregation tier, is similar, except it has built-in replication and leader election on top of etcd to avoid any SPOF during deployments, failed instances, etc. It also uses Cormode-Muthukrishnan for estimating percentiles by default, and it supports T-Digest too. These days, though, submitting histogram bucket aggregates all the way from the client to the aggregator and then to storage is more popular, as you can estimate percentiles across more dimensions and time windows at query time quite cheaply. You need to choose your buckets carefully though.

It too is open source, but needs some help to make it plug into other stacks more easily: https://github.com/m3db/m3aggregator
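
The query-time percentile estimate described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code from M3 or Veneur: the bucket bounds, the function names (`bucketize`, `merge`, `estimate_percentile`), and the sample latencies are all hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical bucket upper bounds in milliseconds.
# Bucket i holds values in (BOUNDS[i-1], BOUNDS[i]]; one extra overflow
# bucket at the end holds everything above the last bound.
BOUNDS = [10.0, 50.0, 100.0, 500.0, 1000.0]


def bucketize(samples: Sequence[float], bounds: Sequence[float] = BOUNDS) -> List[int]:
    """Client side: reduce raw samples to per-bucket counts."""
    counts = [0] * (len(bounds) + 1)
    for s in samples:
        counts[bisect_left(bounds, s)] += 1
    return counts


def merge(*histograms: Sequence[int]) -> List[int]:
    """Aggregator or query side: bucket counts from different sources add up."""
    return [sum(column) for column in zip(*histograms)]


def estimate_percentile(counts: Sequence[int], q: float,
                        bounds: Sequence[float] = BOUNDS) -> float:
    """Estimate the q-quantile (0 < q <= 1) by linear interpolation in a bucket."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    rank = q * total
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; report its lower bound.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return bounds[-1]


# Two hypothetical hosts report bucket counts instead of raw samples.
host_a = bucketize([5, 20, 30, 70])
host_b = bucketize([8, 40, 200, 800])
merged = merge(host_a, host_b)
p50 = estimate_percentile(merged, 0.50)
p99 = estimate_percentile(merged, 0.99)
```

Because per-bucket counts simply add, histograms from any combination of hosts, tags, or time windows can be merged before the estimate is computed. The estimate can only be as precise as the bucket it lands in, which is why the bucket boundaries need to be chosen carefully.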
dswalter almost 7 years ago
It always makes me happy to see approximate algorithms/data structures like HyperLogLog being used.
ebikelaw almost 7 years ago
When I'm evaluating a system like this, what I want to read about is how it is hardened against client stupidity. For example, someone deploys an application in my datacenter and it emits metrics that have gibberish in their names (consider a common Java bug where a class lacks a toString, so the metric gets barfed out as foo.bar.0xCAFEBABE.baz). How does the system cope with this enormous, hyper-dimensional input?
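
One common defensive pattern for this kind of input is a per-prefix cap on distinct metric names. The sketch below is only an illustration of that general idea and makes no claim about how Veneur or any other system behaves: the `CardinalityGuard` class, its parameters, and the `_overflow` name are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


class CardinalityGuard:
    """Hypothetical policy: cap distinct metric names per top-level prefix."""

    def __init__(self, max_names_per_prefix: int = 1000,
                 overflow_name: str = "_overflow") -> None:
        self.max_names = max_names_per_prefix
        self.overflow_name = overflow_name
        self.seen: Dict[str, Set[str]] = defaultdict(set)

    def admit(self, name: str) -> str:
        """Return the name to record: the original, or a shared overflow name."""
        prefix = name.split(".", 1)[0]
        names = self.seen[prefix]
        if name in names:
            return name
        if len(names) < self.max_names:
            names.add(name)
            return name
        # Over budget: fold new names into one series instead of creating more.
        return f"{prefix}.{self.overflow_name}"


guard = CardinalityGuard(max_names_per_prefix=2)
first = guard.admit("foo.bar.0xCAFEBABE.baz")
second = guard.admit("foo.bar.0xDEADBEEF.baz")
third = guard.admit("foo.bar.0x12345678.baz")
```

With a cap of two names, the third distinct `foo.*` name is folded into `foo._overflow`, while names already admitted keep flowing through unchanged and other prefixes have their own budget.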
noncoml almost 7 years ago
Why is Go so popular in the industry at the moment? What's the decision process for choosing Go?
pinko almost 7 years ago
Know of anyone using this in production outside Stripe?
madspindel almost 7 years ago
It's from 2016: https://stripe.com/blog/introducing-veneur-high-performance-and-global-aggregation-for-datadog
amelius almost 7 years ago
What do they mean by "observability data"?

Is this a fancy way of saying "privacy-sensitive user data"?