DDSketch: A fast, fully-mergeable quantile sketch with relative-error guarantees

107 points, posted by jbarciauskas, more than 5 years ago

4 comments

homin, more than 5 years ago
Author here. We wanted to be able to graph p99, p99.9 metrics with arbitrary ranges, and found that the existing solutions were not accurate enough for our needs. Happy to answer any questions.

Code here:

https://github.com/DataDog/sketches-go

https://github.com/DataDog/sketches-py

https://github.com/DataDog/sketches-java
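
The "fully-mergeable" property in the title is presumably what makes percentiles over arbitrary ranges practical: sketches kept per interval can be combined afterwards. Below is a minimal sketch of that idea, assuming logarithmic buckets whose boundaries depend only on a chosen relative accuracy. `ALPHA`, `bucket_counts`, and the two sample batches are hypothetical names and data made up for illustration; this is not the API of the DataDog libraries, and it omits the bucket-count limit a real sketch would enforce.

```python
import math
import random
from collections import Counter

ALPHA = 0.01                        # assumed relative accuracy (hypothetical choice)
GAMMA = (1 + ALPHA) / (1 - ALPHA)   # bucket i covers (GAMMA**(i-1), GAMMA**i]


def bucket_counts(values):
    """Count positive values per logarithmic bucket index."""
    return Counter(math.ceil(math.log(v, GAMMA)) for v in values)


random.seed(1)
# Two made-up batches of latencies, e.g. from two adjacent time intervals.
batch_1 = [random.lognormvariate(3, 1) for _ in range(5_000)]
batch_2 = [random.lognormvariate(5, 1) for _ in range(5_000)]

# Merging is just adding bucket counts, because both sketches
# share the same data-independent bucket boundaries.
merged = bucket_counts(batch_1) + bucket_counts(batch_2)
direct = bucket_counts(batch_1 + batch_2)

print("merged sketch equals sketch of combined data:", merged == direct)
```

Because the boundaries never depend on the data, the merged counts match what a single sketch over all the data would hold, so nothing is lost by merging in this toy version.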
germanjoey, more than 5 years ago
Hey, congratulations, this is a really cool algorithm! Thanks for sharing it.

I'm interested in this paper because I worked on a somewhat related problem some time ago, but got stuck on how to handle data that morphs into a mixed-modal distribution. Modes that are close together are no big deal, but modes that are spaced exponentially further apart are tricky to deal with. For an example of something that would be in DataDog's purview, it would be like trying to sketch the histogram of response times from an endpoint that sometimes took a "fast" path (e.g. a request for a query whose result was cached), sometimes took a "normal" path, and sometimes took a "slow" path (e.g. a query with a flag that requested additional details to be computed). If the response times from the slow path are much bigger than the others, e.g. by an order of magnitude, their statistics might essentially drown out the data from the other two paths, since you're using them to calculate bin size.

I noticed you had some results from measuring DDSketch's performance on a mixed-modal distribution that looked pretty good (that "power" distribution on the last page). I was wondering if you had done any more investigation in this area? E.g. how messy/mixed can the data be before the sketch starts to break down?
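
To make the three-path scenario concrete, here is a toy experiment. It assumes the logarithmic bucketing described in the paper, where bucket boundaries are fixed by the target relative accuracy rather than computed from the data. `LogBucketSketch` is a hypothetical illustration class, not the DataDog library API, and the three latency modes (about 1 ms, 50 ms, and 2000 ms) are made-up numbers. It also leaves out the bucket-collapsing step a real implementation uses to cap memory.

```python
import math
import random
from collections import Counter


class LogBucketSketch:
    """Toy log-bucketed quantile sketch for positive values (illustration only)."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, x):
        # Bucket i covers (gamma**(i-1), gamma**i]; boundaries ignore the data.
        i = math.ceil(math.log(x) / self.log_gamma)
        self.buckets[i] += 1
        self.count += 1

    def quantile(self, q):
        # Walk buckets in order until the cumulative count passes the rank.
        rank = q * (self.count - 1)
        seen = 0
        for i in sorted(self.buckets):
            seen += self.buckets[i]
            if seen > rank:
                return 2 * self.gamma ** i / (self.gamma + 1)
        raise ValueError("empty sketch")


random.seed(0)
# Hypothetical fast / normal / slow paths, in milliseconds.
data = [random.lognormvariate(math.log(median), 0.2)
        for median in (1.0, 50.0, 2000.0) for _ in range(10_000)]

sketch = LogBucketSketch(alpha=0.01)
for value in data:
    sketch.add(value)

ordered = sorted(data)
for q in (0.1, 0.5, 0.9, 0.99, 0.999):
    exact = ordered[int(q * (len(ordered) - 1))]
    rel_err = abs(sketch.quantile(q) - exact) / exact
    print(f"q={q}: relative error {rel_err:.5f}")
print("buckets used:", len(sketch.buckets))
```

In this toy version the slow mode cannot drown out the fast one: each value lands in a bucket determined only by its own magnitude, so widely separated modes cost extra buckets rather than accuracy.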
MrBuddyCasino, more than 5 years ago
How does this compare to t-digest?

https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf
skyde, more than 5 years ago
Great! So is the main difference better accuracy on average, or rather the fact that the maximum possible error is bounded?
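
For reference, the worst-case bound can be worked out for the logarithmic bucketing described in the paper. This is a sketch under stated assumptions: $\alpha$ is the chosen relative accuracy, $x > 0$ is the true quantile value, $\tilde{x}$ is the value reported for its bucket, and no buckets have been collapsed to save memory.

```latex
\gamma = \frac{1+\alpha}{1-\alpha}, \qquad
\gamma^{\,i-1} < x \le \gamma^{\,i}, \qquad
\tilde{x} = \frac{2\gamma^{\,i}}{\gamma+1}

\frac{2}{\gamma+1} \;\le\; \frac{\tilde{x}}{x} \;<\; \frac{2\gamma}{\gamma+1}
\quad\Longrightarrow\quad
1-\alpha \;\le\; \frac{\tilde{x}}{x} \;<\; 1+\alpha
\quad\Longrightarrow\quad
|\tilde{x} - x| \;\le\; \alpha\, x
```

So under these assumptions the bound is a per-quantile worst case rather than an average: every reported quantile is within a factor $1 \pm \alpha$ of the true value.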