Averages Can Be Misleading: Try a Percentile (2014)

199 points by donbox, about 6 years ago

11 comments

baq, about 6 years ago
IMHO plotting the distribution should be the first step before trying to compute its statistics. If you know the shape, you can understand the values - otherwise it's guesswork.
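A minimal sketch of that first step, assuming nothing more than a terminal: bucket the samples and print a text histogram before computing any statistic. The latency values, the `text_histogram` helper, and the 50 ms bucket width are all made up for illustration.

```python
from collections import Counter

def text_histogram(samples, bucket_width):
    """Return one line per bucket: lower bound, a bar of '#', and the count."""
    # Map each sample to the lower bound of its bucket and count occurrences.
    counts = Counter((int(s) // bucket_width) * bucket_width for s in samples)
    lines = []
    # Walk every bucket from lowest to highest so empty buckets show up as gaps.
    for lower in range(min(counts), max(counts) + bucket_width, bucket_width):
        n = counts.get(lower, 0)
        lines.append(f"{lower:>5} | {'#' * n} ({n})")
    return lines

# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [12, 15, 11, 14, 13, 18, 16, 12, 95, 14, 17, 13, 240, 15, 12]
histogram_lines = text_histogram(latencies_ms, bucket_width=50)
print("\n".join(histogram_lines))
```

Even this crude picture shows a tight cluster plus two stragglers, which is what tells you whether a mean, a median, or a tail percentile is the number worth reporting.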
Rafuino, about 6 years ago
This topic always leads me to think about this great talk from Gil Tene on how NOT to measure latencies (basically, don't use averages!).

https://www.youtube.com/watch?v=lJ8ydIuPFeU

I'm also a huge fan of how Dormando showed latency distributions in one of his recent Memcached Extstore posts. The default is the 95th percentile, but you can change the percentile to what matters to you (i.e. the 99th percentile, if you ask me!). Scroll down to see what he did and play with it.

https://memcached.org/blog/nvm-multidisk/
cromulent, about 6 years ago
There's a great story on 99% Invisible about averages, particularly when used to design cockpits for the average pilot.

https://99percentinvisible.org/episode/on-average/
sohkamyung, about 6 years ago
Check out this comic on "Why Not to Trust Statistics" [1]. His book, "Math With Bad Drawings" [2], has a chapter on statistics and why not to trust any single statistical measure.

[1] https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/

[2] https://mathwithbaddrawings.com/2018/05/23/math-with-bad-drawings-the-book/
camel_gopher, about 6 years ago
Percentiles can be misleading, try a histogram: https://www.circonus.com/2018/11/the-problem-with-percentiles-aggregation-brings-aggravation/
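A rough illustration of the aggregation problem, under made-up assumptions: two hypothetical hosts with very different traffic, and a nearest-rank percentile helper written for this example (not taken from the linked article). Averaging the per-host p99 values gives a number far from the p99 of the combined traffic, whereas per-value counts (a histogram) can simply be added and queried afterwards.

```python
from collections import Counter

def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Ceiling of p * n / 100 in integer arithmetic, to avoid float rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

def percentile_from_counts(counts, p):
    """Same nearest-rank percentile, computed from a value -> count histogram."""
    total = sum(counts.values())
    rank = max(1, -(-p * total // 100))
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen >= rank:
            return value

# Hypothetical latencies (ms): host A is busy and fast, host B is quiet and slow.
host_a = [10] * 990 + [20] * 10
host_b = [500] * 10

avg_of_p99 = (percentile(host_a, 99) + percentile(host_b, 99)) / 2
true_p99 = percentile(host_a + host_b, 99)

# Histograms merge by adding counts, and the merged one gives the true answer.
merged = Counter(host_a) + Counter(host_b)
merged_p99 = percentile_from_counts(merged, 99)

print(avg_of_p99, true_p99, merged_p99)
```

Here the averaged p99 comes out at 255.0 ms while the actual p99 over all requests is 20 ms; the merged histogram recovers the 20 ms figure without keeping the raw samples.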
novaleaf, about 6 years ago
My own solution, which might be useful to those using JavaScript (Node.js or browser):

I use mathjs.quantileSeq() and log 0%, 25%, 50%, 75%, and 100%. This seems to be good for "casual metric logs".

I've found that this gives a good shape of the data, as well as the absolute min/max values. If you use 1% or 99% you'll miss the absolute worst performers, and I want to be at least aware of what the worst performance numbers are.

https://mathjs.org/

https://mathjs.org/docs/reference/functions/quantileSeq.html
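The same 0/25/50/75/100% summary can be sketched in Python with the standard library's `statistics.quantiles`. This is an analogue of the idea, not the mathjs call itself; the sample values and the `five_point_summary` name are hypothetical.

```python
import statistics

def five_point_summary(samples):
    """Return min, quartiles, and max, keyed by percentage."""
    # method="inclusive" treats the samples as the whole population, so the
    # cut points interpolate between the observed min and max.
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return {
        "0%": min(samples),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "100%": max(samples),
    }

# Hypothetical response times in milliseconds with one very slow outlier.
response_ms = [12, 15, 11, 14, 13, 18, 16, 240]
summary = five_point_summary(response_ms)
print(summary)
```

The quartiles stay between 12 and 17 ms, while the 100% entry still exposes the single 240 ms request that a 99% cutoff on a larger sample could hide.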
LiamPa, about 6 years ago
Site Reliability Engineering goes over this in a lot more detail.

https://landing.google.com/sre/books/
phosfox, about 6 years ago
Reminds me of “Don’t cross a river if it is four feet deep on average.” — Nassim Nicholas Taleb
mikorym, about 6 years ago
I've used Elasticsearch + Kibana for agricultural data and similarly "expanded" the view out from averages to time series.

People in agriculture love averages, and that makes a lot of sense for financial data, since averages preserve totals, e.g.:

50 tons/ha average over 100 ha = 5 000 tons

At the same time, summing each individual hectare gives you 5 000 tons in total.

But once you realise that you can expand on this, things get really interesting. I don't know of other people working on the same problems that I am working on, but they are relevant both economically (in the sense of making money) and environmentally (in the sense of improving efficiency and managing climate).
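The "averages preserve totals" arithmetic can be checked with a small sketch. The per-plot yields below are invented so that the mean works out to 50 tons/ha over 100 one-hectare plots; the median is included only as a contrast.

```python
import statistics

# Hypothetical yields in tons per hectare for 100 one-hectare plots:
# 60 poor plots and 40 very good ones.
yields_t_per_ha = [20] * 60 + [95] * 40

total_tons = sum(yields_t_per_ha)
mean_yield = statistics.fmean(yields_t_per_ha)
median_yield = statistics.median(yields_t_per_ha)

# The mean times the area recovers the total; the median does not.
total_from_mean = mean_yield * len(yields_t_per_ha)
total_from_median = median_yield * len(yields_t_per_ha)

print(total_tons, total_from_mean, total_from_median)
```

The mean reproduces the 5 000 ton total exactly, which is why it suits financial roll-ups, but it says nothing about the 60 plots sitting at 20 tons/ha.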
SketchySeaBeast, about 6 years ago
More knowledge is always better, but percentiles are a little misleading as well: a 99th percentile of 867 ms latency gives you a moment of panic, but when you see that the 95th percentile is 60 ms, you realize how few of your visitors are experiencing the slow response. Might it be a problem? Possibly, and it has brought awareness to that potential, but it can also blow the problem out of proportion if you don't look at the rest of the data.

Edit: I'm not saying averages are better, but that percentiles can be misleading as well.
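One way to get that wider view, sketched with invented data chosen so that the 95th and 99th percentiles match the 60 ms and 867 ms figures above: report several percentiles side by side, plus the share of requests over a threshold. The `percentile` helper (nearest-rank) and the 100 ms threshold are assumptions for this example.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Ceiling of p * n / 100 in integer arithmetic, to avoid float rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]

# Hypothetical latencies (ms) for 1000 requests: 4% of them hit a slow path.
latencies_ms = [40] * 900 + [60] * 60 + [867] * 40

# Several percentiles together describe the shape better than any single one.
report = {p: percentile(latencies_ms, p) for p in (50, 90, 95, 99)}

# Fraction of requests slower than an assumed 100 ms threshold.
slow_share = sum(1 for ms in latencies_ms if ms > 100) / len(latencies_ms)

print(report, slow_share)
```

Read together, the numbers say that the jump to 867 ms happens somewhere between the 95th and 99th percentile and affects 4% of requests, which is more useful than either percentile alone.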
Lightbody, about 6 years ago
One of my favorite (short) talks on this topic. Well worth a few minutes of your time:

https://www.youtube.com/watch?v=coNDCIMH8bk