TechEcho

DDSketch: A fast, fully-mergeable quantile sketch with relative-error guarantees

107 points by jbarciauskas over 5 years ago

4 comments

homin over 5 years ago
Author here. We wanted to be able to graph p99 and p99.9 metrics over arbitrary ranges, and found that the existing solutions were not accurate enough for our needs. Happy to answer any questions.

Code here:

https://github.com/DataDog/sketches-go

https://github.com/DataDog/sketches-py

https://github.com/DataDog/sketches-java
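To make the "relative-error" and "fully-mergeable" parts of the title concrete, here is a minimal Python sketch of log-spaced bucketing as we understand it from the DDSketch paper: with target relative accuracy alpha, gamma = (1 + alpha) / (1 - alpha), bucket i covers (gamma^(i-1), gamma^i], and each bucket reports 2 * gamma^i / (gamma + 1). The class name LogBucketSketch and its methods are hypothetical; this is not the sketches-py API, it only handles positive values, and it omits the bucket-count limit (collapsing of the lowest buckets) that the paper uses to bound memory.

```python
import math
from collections import Counter


class LogBucketSketch:
    """Hypothetical, minimal DDSketch-style sketch for positive values."""

    def __init__(self, alpha: float):
        # alpha is the target relative accuracy, e.g. 0.01 for 1%.
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.counts = Counter()  # bucket index -> number of values
        self.total = 0

    def key(self, x: float) -> int:
        # Bucket i covers (gamma^(i-1), gamma^i]; boundaries depend only on alpha.
        return math.ceil(math.log(x) / self.log_gamma)

    def add(self, x: float) -> None:
        if x <= 0:
            raise ValueError("this sketch only handles positive values")
        self.counts[self.key(x)] += 1
        self.total += 1

    def merge(self, other: "LogBucketSketch") -> None:
        # Same alpha means identical buckets, so merging is per-bucket addition.
        if other.gamma != self.gamma:
            raise ValueError("can only merge sketches built with the same alpha")
        self.counts.update(other.counts)
        self.total += other.total

    def quantile(self, q: float) -> float:
        # Walk buckets in increasing order until the rank q * (n - 1) is passed.
        if self.total == 0:
            raise ValueError("empty sketch")
        rank = q * (self.total - 1)
        seen = 0
        for i in sorted(self.counts):
            seen += self.counts[i]
            if seen > rank:
                # Estimate that is within alpha (relative) of anything in bucket i.
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise AssertionError("unreachable")


# Usage: sketch two halves of 1..1000 separately, then merge them.
left, right = LogBucketSketch(0.01), LogBucketSketch(0.01)
for v in range(1, 501):
    left.add(float(v))
for v in range(501, 1001):
    right.add(float(v))
left.merge(right)
merged = left

# The same data sketched in one pass, for comparison.
whole = LogBucketSketch(0.01)
for v in range(1, 1001):
    whole.add(float(v))

print(merged.quantile(0.99))  # should be within 1% of 990, the exact value at this rank
```

Because two sketches with the same alpha share identical bucket boundaries, merging two halves produces exactly the same bucket counts as sketching all the data at once, which is the property "fully mergeable" refers to.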
germanjoey over 5 years ago
Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

I'm interested in this paper because I worked on a somewhat related problem some time ago, but got stuck on how to handle data that morphs into a mixed-modal distribution. Modes that are close together are no big deal, but modes that are spaced exponentially far apart are tricky to deal with. For an example of something that would be in DataDog's purview, imagine sketching the histogram of response times from an endpoint that sometimes takes a "fast" path (e.g. a query whose result was cached), sometimes a "normal" path, and sometimes a "slow" path (e.g. a query with a flag that requests additional details to be computed). If the response times from the slow path are much bigger than the others, e.g. by an order of magnitude, their statistics might essentially drown out the data from the other two paths, since you're using them to calculate bin size.

I noticed you had some results from measuring DDSketch's performance on a mixed-modal distribution that looked pretty good (the "power" distribution on the last page). Have you done any more investigation in this area? E.g. how messy or mixed can the data be before the sketch starts to break down?
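On the multimodal question: in the bucketing scheme described in the DDSketch paper, as we read it, bucket boundaries come only from alpha and are never derived from the data, so a slow mode does not widen the buckets used by a fast one. The Python sketch below checks this on a hypothetical, deterministic trimodal latency sample (a ~1-2 ms fast path, a ~20-30 ms normal path, and a ~500-1500 ms slow path; all numbers are made up for illustration). It assumes gamma = (1 + alpha) / (1 - alpha), bucket i covering (gamma^(i-1), gamma^i], and a per-bucket estimate of 2 * gamma^i / (gamma + 1). It leaves out the paper's bucket-collapsing step, which is where accuracy on the lowest quantiles can degrade once a bucket limit is hit.

```python
import math
from collections import Counter

ALPHA = 0.02
GAMMA = (1 + ALPHA) / (1 - ALPHA)


def key(x: float) -> int:
    # Bucket index depends only on x and ALPHA, never on other data seen.
    return math.ceil(math.log(x) / math.log(GAMMA))


def estimate(i: int) -> float:
    # Value reported for bucket i.
    return 2 * GAMMA ** i / (GAMMA + 1)


# Hypothetical trimodal latencies in ms: fast (cached), normal, and slow paths.
fast = [1.0 + k / 600 for k in range(600)]     # ~1-2 ms
normal = [20.0 + k / 30 for k in range(300)]   # ~20-30 ms
slow = [500.0 + 10 * k for k in range(100)]    # ~500-1490 ms
values = fast + normal + slow

counts = Counter(key(v) for v in values)
n = len(values)
exact_sorted = sorted(values)


def sketch_quantile(q: float) -> float:
    # Return the estimate of the bucket holding the element at rank q * (n - 1).
    rank = q * (n - 1)
    seen = 0
    for i in sorted(counts):
        seen += counts[i]
        if seen > rank:
            return estimate(i)
    raise AssertionError("unreachable")


# p10/p50 fall in the fast mode, p75 in the normal mode, p95/p99 in the slow mode.
for q in (0.1, 0.5, 0.75, 0.95, 0.99):
    exact = exact_sorted[math.floor(q * (n - 1))]
    approx = sketch_quantile(q)
    print(f"q={q}: exact={exact:.3f} sketch={approx:.3f} "
          f"rel_err={abs(approx - exact) / exact:.4f}")
```

Under these assumptions each printed relative error should stay at or below ALPHA regardless of which mode the quantile falls in, because the slow values only ever land in their own high-index buckets.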
MrBuddyCasino over 5 years ago
How does this compare to t-digest?

https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
skyde over 5 years ago
Great! So is the main difference higher accuracy on average, or the fact that the maximum possible error is bounded?
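For reference, here is the short argument for why the DDSketch bound is a worst case on each reported value rather than an average, assuming the bucket scheme described in the paper (gamma from the target relative accuracy alpha, bucket i covering (gamma^(i-1), gamma^i], a fixed per-bucket estimate) and ignoring its bucket-collapsing step:

```latex
\gamma = \frac{1+\alpha}{1-\alpha}, \qquad
\text{bucket } i = (\gamma^{i-1}, \gamma^{i}], \qquad
\hat{x}_i = \frac{2\gamma^{i}}{\gamma + 1}.

\text{For } x \in (\gamma^{i-1}, \gamma^{i}]:\quad
\frac{2}{\gamma+1} \le \frac{\hat{x}_i}{x} < \frac{2\gamma}{\gamma+1}.

\frac{2}{\gamma+1} = 1-\alpha, \qquad \frac{2\gamma}{\gamma+1} = 1+\alpha
\quad\Longrightarrow\quad |\hat{x}_i - x| \le \alpha\, x .
```

Since a quantile query returns the estimate of the bucket that holds the element at the requested rank, this inequality applies to every quantile individually, not just on average.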