Author here. We wanted to be able to graph p99 and p99.9 metrics with arbitrary ranges, and found the existing solutions were not accurate enough for our needs. Happy to answer any questions.

Code here:

https://github.com/DataDog/sketches-go

https://github.com/DataDog/sketches-py

https://github.com/DataDog/sketches-java
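For anyone who wants the core idea before reading the paper: the sketch maps each value into exponentially sized buckets with growth factor gamma = (1 + alpha) / (1 - alpha), which is what gives the relative-error guarantee on quantiles. Here's a toy Python illustration of that bucketing for positive values only. It is not the API of the libraries linked above, and the class/variable names are made up.

```python
import math
from collections import defaultdict

class ToyRelativeErrorSketch:
    """Toy illustration of relative-error bucketing (positive values only)."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)   # bucket growth factor
        self.log_gamma = math.log(self.gamma)
        self.counts = defaultdict(int)           # bucket index -> count
        self.total = 0

    def add(self, value):
        # Bucket i covers (gamma^(i-1), gamma^i], so any value in it is
        # within a relative error of alpha of the bucket's representative.
        index = math.ceil(math.log(value) / self.log_gamma)
        self.counts[index] += 1
        self.total += 1

    def quantile(self, q):
        rank = int(q * (self.total - 1))
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen > rank:
                # Representative value minimizing worst-case relative error
                # within the bucket: 2 * gamma^i / (gamma + 1).
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("empty sketch")

sketch = ToyRelativeErrorSketch(alpha=0.01)
for v in (1.2, 3.5, 0.8, 42.0, 7.7, 1500.0):
    sketch.add(v)
print(sketch.quantile(0.99))   # within 1% of the true p99 of the inserted values
```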
Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

I'm interested in this paper because I worked on a somewhat related problem some time ago, but got stuck on how to handle data that morphs into a multimodal distribution. Modes that are close together are no big deal, but modes spaced exponentially far apart are tricky to deal with. For an example in DataDog's purview: imagine sketching the histogram of response times from an endpoint that sometimes took a "fast" path (e.g. a request for a query whose result was cached), sometimes took a "normal" path, and sometimes took a "slow" path (e.g. a query with a flag requesting additional details to be computed). If the response times from the slow path are much larger than the others, e.g. by an order of magnitude, their statistics might essentially drown out the data from the other two paths, since you're using them to calculate bin size.

I noticed you had some results from measuring DDSketch's performance on a multimodal distribution that looked pretty good (that "power" distribution on the last page). I was wondering if you had done any more investigation in this area? E.g., how messy or mixed can the data be before the sketch starts to break down?
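To make the bin-size concern concrete, here's the rough toy comparison I have in mind: fixed-width bins sized from the full data range versus relative-error (log-sized) bins, with three modes about an order of magnitude apart. The numbers are made up, and the last line just restates the relative-accuracy target the paper claims rather than anything I've measured.

```python
import random

# Three modes roughly an order of magnitude apart:
# fast ~10 ms, normal ~100 ms, slow ~1000 ms.
random.seed(0)
data = ([random.gauss(10, 1) for _ in range(1000)] +
        [random.gauss(100, 10) for _ in range(1000)] +
        [random.gauss(1000, 100) for _ in range(1000)])

# Fixed-width bins: the width is dominated by the slow mode's range,
# so resolution on the fast mode collapses.
num_bins = 128
fixed_width = (max(data) - min(data)) / num_bins
print("fixed-width bin size:", round(fixed_width, 1), "ms")
print("worst-case relative error at a 10 ms value:",
      round(fixed_width / 2 / 10, 2))    # half a bin width, relative to 10 ms

# Log-sized bins: relative error stays bounded by alpha regardless of
# how far apart the modes are (the guarantee the paper targets).
alpha = 0.01
print("relative error target with log-sized bins:", alpha)
```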
How does this compare to t-digest?

https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf