Liked it better when it was called "Programmers Need To Learn Statistics Or I Will Kill Them All" <a href="http://zedshaw.com/essays/programmer_stats.html" rel="nofollow">http://zedshaw.com/essays/programmer_stats.html</a>, but still kind of wrong.<p>If you need to reduce a distribution to a single number, the most informative number is going to be the mean.<p>I understand their point about the 99th percentile, but consider that it's possible to improve the 99th percentile measure, while increasing the mean and degrading the performance of all but 1% of the users.<p>The real issue is reducing a distribution to one number.
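To make that concrete, here's a made-up sketch (Python, purely hypothetical numbers) of a change that improves the 99th percentile while making the mean, and the experience of 98% of users, worse:

    import numpy as np

    # Hypothetical latency samples (ms): the "fix" trims the slow tail but
    # makes the common case twice as slow.
    before = np.array([100] * 980 + [3000] * 20)   # 98% at 100 ms, 2% at 3000 ms
    after  = np.array([200] * 995 + [1000] * 5)    # 99.5% at 200 ms, 0.5% at 1000 ms

    for name, lat in (("before", before), ("after", after)):
        print(name, "p99 =", np.percentile(lat, 99), "mean =", lat.mean())

    # before p99 = 3000.0 mean = 158.0
    # after  p99 = 200.0  mean = 204.0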
Using FPS as a measure of your UI's performance is equally problematic. FPS is a great measurement for games, since performance dips usually occur over a span of many frames, but for UIs a lot of work tends to get concentrated into a single frame. A single frame that takes 110ms (or, heaven forbid, 500ms) to render won't move the needle on your FPS meter, but it will be instantly noticeable to the user.<p>I've complained about this before; use maximum frame delay [0] instead of FPS when measuring UI responsiveness.<p>[0] The maximum time elapsed between any two sequential frames during your test.
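For what it's worth, a minimal sketch of that metric (assumed input: a list of per-frame timestamps in milliseconds collected during a test run; the numbers here are made up):

    def max_frame_delay(frame_times):
        """Largest gap (ms) between any two consecutive frames."""
        return max(b - a for a, b in zip(frame_times, frame_times[1:]))

    def average_fps(frame_times):
        """Frames per second averaged over the whole run."""
        return 1000.0 * (len(frame_times) - 1) / (frame_times[-1] - frame_times[0])

    # Ten seconds at ~60 fps with a single 500 ms hitch in the middle.
    frames = [i * 16.7 for i in range(300)]
    frames += [frames[-1] + 500 + i * 16.7 for i in range(300)]

    print(round(average_fps(frames), 1))  # ~57 fps: the FPS meter barely moves
    print(max_frame_delay(frames))        # 500.0 ms: the hitch is obvious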
Why not break the numbers down more granularly, to something like 25%, 50%, 70%, 90%, 95%, 99%? (See the sketch below.)<p>Understanding where your users are on the curve is probably more interesting than a single number. Worrying about that last 1% really only makes a meaningful difference if your user base is huge enough that fixing something for 1% of your users can meaningfully move revenue.<p>Mentally, I try to think of the 80 or 90% of users with a similar experience, needs, etc., and make it better for them. In this case, speed is good for everybody, but I care very little about the needs of that last 1% if your customers are all paying the same. No sense in putting the needs of a small number of users ahead of the needs of a much larger set of users.
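A quick sketch of that breakdown (load_times_ms and its distribution are made up for illustration):

    import numpy as np

    load_times_ms = np.random.lognormal(mean=5.5, sigma=0.6, size=10_000)  # fake data

    for p in (25, 50, 70, 90, 95, 99):
        print(f"p{p}: {np.percentile(load_times_ms, p):.0f} ms")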
Big fan of Optimizely here. I've used their product with a handful of my clients.
The thing that struck me most about the article was how well it was written. Very engaging all the way through. That quality of writing is, I would say, quite rare.<p>Anyway, I'm really glad they've improved the load times for their snippet, because this has always been a genuine concern that needed resolving.
Sadly, it's hard to say anything without any values on the axes. Is the difference between the mean and the 99th percentile 5s or 20ms?
As someone said here, a threshold for "slow loading" should be defined before picking a metric for measuring it.
Take the first graph and draw a line where users start to whine about slow loading, then check how many fall below and above it. If the number of users below the threshold is sufficiently "greater" than the number above it, you shouldn't be so worried. How much greater counts as sufficient can be derived from the standard error of the threshold as measured from users.
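Roughly, something like this sketch (the data, the threshold, and its standard error are all hypothetical placeholders):

    import numpy as np

    load_times_ms = np.random.lognormal(mean=5.5, sigma=0.6, size=10_000)  # fake data
    threshold_ms, threshold_se = 1000, 150  # "users start whining" point and its standard error

    below = np.count_nonzero(load_times_ms <= threshold_ms)
    above = load_times_ms.size - below
    print(below, "users below the threshold,", above, "above")

    # Re-count against the low edge of the threshold estimate to stay conservative.
    below_conservative = np.count_nonzero(load_times_ms <= threshold_ms - threshold_se)
    print(below_conservative, "users below even the conservative threshold")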
Regarding mean vs 99% etc.: in this case all you care about is: did loading the script delay page rendering to an extent that was perceptible to the user? It's basically a step function. 99% is appropriate in this case.<p>Want to do it yourself? This talk by Etsy a few weeks ago has some detail on how they did a similar thing:<p><a href="http://www.slideshare.net/marcusbarczak/integrating-multiple-cdn-providers-at-etsy" rel="nofollow">http://www.slideshare.net/marcusbarczak/integrating-multiple...</a><p>Some links at the end of the talk. Infrastructure-wise, I think you have to be prepared to pay for some expensive DNS before this kind of thing is viable.
The problem with using the average for many performance stats is that it hides issues. There is a great paper on the topic - <a href="http://method-r.com/downloads/doc_details/44-thinking-clearly-about-performance" rel="nofollow">http://method-r.com/downloads/doc_details/44-thinking-clearl...</a><p>It is only about 13 pages, making it a quick but very informative read. I highly recommend it for anyone trying to measure performance, throughput, response time, efficiency, skew and load.
I think in Optimizely's case, the most important factor is making sure that there's no statistically significant correlation between higher response times and the A/B test itself. In other words, if the higher response times result in an imbalanced impact on the test, the test is invalid.
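One way to sanity-check that (a sketch, not Optimizely's actual method; the data here is simulated) is to compare the load-time distributions seen by each variant and test whether they differ:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    variant_a_ms = rng.lognormal(mean=5.5, sigma=0.6, size=5000)  # simulated load times
    variant_b_ms = rng.lognormal(mean=5.5, sigma=0.6, size=5000)

    # Mann-Whitney U test: does one variant tend to see slower loads than the other?
    stat, p_value = stats.mannwhitneyu(variant_a_ms, variant_b_ms)
    print(f"p = {p_value:.3f}")  # a small p-value suggests the snippet is skewing the test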
I've used ApDex [1] before to get a better measure of response times from the user-experience point of view.<p>1: <a href="http://apdex.org/" rel="nofollow">http://apdex.org/</a>
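For reference, the score itself is simple to compute; a minimal sketch with made-up numbers (T is the target response time: at or under T counts as "satisfied", between T and 4T counts as "tolerating" at half weight, beyond 4T counts as "frustrated"):

    def apdex(response_times_ms, target_ms):
        satisfied = sum(1 for t in response_times_ms if t <= target_ms)
        tolerating = sum(1 for t in response_times_ms if target_ms < t <= 4 * target_ms)
        return (satisfied + tolerating / 2) / len(response_times_ms)

    print(apdex([80, 120, 300, 450, 2500], target_ms=100))  # -> 0.4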