TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

Averages Can Be Misleading: Try a Percentile (2014)

199 points by donbox about 6 years ago

11 comments

baq about 6 years ago
IMHO plotting the distribution should be the first step before trying to compute its statistics. If you know the shape, you can understand the values - otherwise it's guesswork.
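To illustrate the point about shape, here is a minimal Python sketch (not from the comment) that prints a text histogram for two samples sharing the same mean. The sample values, the `text_histogram` helper, and the 20 ms bin width are all made up for the example.

```python
from collections import Counter
from statistics import mean


def text_histogram(values, bin_width):
    """Return one line per non-empty bin, e.g. '  80-100  | ##'."""
    # Key each value by the lower edge of its bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for lower in sorted(counts):
        upper = lower + bin_width
        lines.append(f"{lower:>4}-{upper:<4} | {'#' * counts[lower]}")
    return lines


# Hypothetical latency samples in milliseconds; both have a mean of 100.
steady = [90, 95, 100, 100, 100, 100, 100, 100, 105, 110]
bimodal = [20, 20, 20, 20, 20, 180, 180, 180, 180, 180]

for name, sample in (("steady", steady), ("bimodal", bimodal)):
    print(f"{name}: mean = {mean(sample)}")
    print("\n".join(text_histogram(sample, bin_width=20)))
```

The two means are identical, but the bars show one cluster around 100 ms in the first sample and two separate clusters in the second, which no single summary number would reveal.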
Rafuino about 6 years ago
This topic always leads me to think about this great talk from Gil Tene on how NOT to measure latencies (basically, don't use averages!).

https://www.youtube.com/watch?v=lJ8ydIuPFeU

I'm also a huge fan of how Dormando showed latency distributions in one of his recent Memcached Extstore posts. The default is 95th percentile but you can change the percentile to what matters to you (i.e. 99th percentile if you ask me!). Scroll down to see what he did and play with it.

https://memcached.org/blog/nvm-multidisk/
cromulent about 6 years ago
There's a great story on 99% Invisible about averages, particularly when used to design cockpits for the average pilot.

https://99percentinvisible.org/episode/on-average/
sohkamyung about 6 years ago
Check out this comic on "Why Not to Trust Statistics" [1]. His book, "Math With Bad Drawings" [2] has a chapter on statistics and why not to trust a single statistical measure only.

[1] https://mathwithbaddrawings.com/2016/07/13/why-not-to-trust-statistics/

[2] https://mathwithbaddrawings.com/2018/05/23/math-with-bad-drawings-the-book/
camel_gopher about 6 years ago
Percentiles can be misleading, try a histogram - https://www.circonus.com/2018/11/the-problem-with-percentiles-aggregation-brings-aggravation/
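Going by the linked article's title, the concern is aggregation: per-host percentiles cannot be averaged into a fleet-wide percentile, whereas histogram bucket counts can simply be added. A rough Python sketch under that reading follows; the host data, the 10 ms bucket width, and the helper names are assumptions made for the example, not taken from the article.

```python
from collections import Counter
from statistics import mean, quantiles


def p95(values):
    """95th percentile of raw samples (inclusive interpolation)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th.
    return quantiles(values, n=100, method="inclusive")[94]


def to_histogram(values, bucket_ms=10):
    """Count samples per fixed-width bucket, keyed by the bucket's lower edge."""
    return Counter((v // bucket_ms) * bucket_ms for v in values)


def percentile_from_histogram(hist, pct):
    """Lower edge of the bucket that contains the pct-th percentile."""
    threshold = sum(hist.values()) * pct / 100
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= threshold:
            return bucket


# Hypothetical latencies in ms: a busy host that is rarely slow,
# and a quiet host that is slow half of the time.
busy_host = [10] * 990 + [500] * 10
quiet_host = [10] * 5 + [500] * 5

averaged_p95 = mean([p95(busy_host), p95(quiet_host)])  # misleading
true_p95 = p95(busy_host + quiet_host)                  # needs raw samples
merged = to_histogram(busy_host) + to_histogram(quiet_host)
merged_p95 = percentile_from_histogram(merged, 95)      # needs only counts

print(averaged_p95, true_p95, merged_p95)  # 255.0 10.0 10
```

Averaging the two per-host values gives 255 ms, while the 95th percentile of all requests combined is 10 ms. The merged histogram recovers that answer (to bucket resolution) without keeping the raw samples.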
novaleaf about 6 years ago
My own solution, which might be useful to those using javascript (nodejs or browser):

I use mathjs.quantileSeq() and log 0%, 25%, 50%, 75%, and 100%. This seems to be good for "casual metric logs".

I've found that this gives a good shape of the data, as well as the absolute min/max values. If you use 1% or 99% you'll miss the absolute worst performers, and I want to be at least aware of what the worst performance numbers are.

https://mathjs.org/

https://mathjs.org/docs/reference/functions/quantileSeq.html
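For readers outside JavaScript, the same five-point summary can be sketched in Python with the standard library's `statistics.quantiles` instead of mathjs. This is an analogous sketch, not the commenter's code: the sample data and the `five_number_summary` helper are hypothetical, and the interpolation rule may not match `quantileSeq` exactly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the 0%, 25%, 50%, 75% and 100% points of values."""
    # n=4 yields the three quartile cut points; min and max cover 0% and 100%.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "0%": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "100%": max(values),
    }


# Hypothetical request durations in milliseconds, with one bad outlier.
durations_ms = [12, 15, 11, 14, 13, 950, 16, 12, 13]
summary = five_number_summary(durations_ms)
print(summary)
# {'0%': 11, '25%': 12.0, '50%': 13.0, '75%': 15.0, '100%': 950}
```

The quartiles sit between 12 and 15 ms, and the 100% entry is what exposes the 950 ms worst case.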
LiamPa about 6 years ago
Site Reliability Engineering goes over this in a lot more detail.

https://landing.google.com/sre/books/
phosfox about 6 years ago
Reminds me of “Don’t cross a river if it is four feet deep on average.” — Nassim Nicholas Taleb
mikorym about 6 years ago
I've used Elasticsearch + Kibana for agricultural data and similarly "expanded" the view out from averages to time series.

People in agriculture love averages and it makes a lot of sense in financial data since averages preserve totals e.g.:

50 ton / ha average over 100 ha = 5 000 tons

At the same time summing each individual ha gives you 5 000 tons total.

But once you realise that you can expand on this, things get really interesting. I don't know of other people working on the same problems that I am working on, but they are relevant both economically (in the sense of making money) and environmentally (in the sense of improving efficiency and managing climate).
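The "averages preserve totals" arithmetic can be checked with a small Python sketch. The per-plot yields are invented, and the plots are assumed to be one hectare each so that the mean needs no area weighting.

```python
from statistics import mean, median

# Hypothetical yields in tons per hectare for five one-hectare plots.
yields_t_per_ha = [10, 20, 30, 40, 150]
area_ha = len(yields_t_per_ha)

actual_total = sum(yields_t_per_ha)                   # 250 tons
total_from_mean = mean(yields_t_per_ha) * area_ha     # 50 t/ha * 5 ha
total_from_median = median(yields_t_per_ha) * area_ha # 30 t/ha * 5 ha

print(actual_total, total_from_mean, total_from_median)  # 250 250 150
```

Multiplying the mean by the area recovers the total exactly, which is why it suits financial reporting. The median does not, but it is closer to what a typical plot produced.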
SketchySeaBeast about 6 years ago
More knowledge is always better, but percentiles are a little misleading as well - the 99% at 867 ms latency makes you have a moment of panic, but when you see that 95% is 60 ms, then you really realize how few of your visitors are experiencing the slow response. Might it be a problem? Possibly, and it has brought awareness to that potential, but it also has the possibility to blow it out of proportion if you don't look at the rest of the data.

Edit: I'm not saying Averages are better, but that Percentiles can be misleading as well.
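One way to put a tail percentile in context is to report it next to the share of requests that exceed a latency budget. The Python sketch below does that on a sample constructed to reproduce the 60 ms and 867 ms figures; the sample, the 500 ms budget, and the helper names are assumptions made for the example.

```python
from statistics import quantiles


def percentile(values, pct):
    """pct-th percentile (1..99) of values, inclusive interpolation."""
    return quantiles(values, n=100, method="inclusive")[pct - 1]


def fraction_slower_than(latencies_ms, threshold_ms):
    """Share of requests that took longer than threshold_ms."""
    slow = sum(1 for x in latencies_ms if x > threshold_ms)
    return slow / len(latencies_ms)


# Hypothetical sample of 1000 requests, shaped so that p95 is 60 ms
# and p99 is 867 ms as in the comment above.
latencies_ms = [40] * 900 + [60] * 60 + [300] * 25 + [867] * 15

print(percentile(latencies_ms, 95))             # 60.0
print(percentile(latencies_ms, 99))             # 867.0
print(fraction_slower_than(latencies_ms, 500))  # 0.015
```

Here the alarming 99th percentile corresponds to 1.5% of requests going over the 500 ms budget, which is the kind of extra context the comment asks for.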
Lightbody about 6 years ago
One of my favorite (short) talks on this topic. Well worth a few minutes of your time:

https://www.youtube.com/watch?v=coNDCIMH8bk