> Stop saying: “We’ve reached 95% statistical significance.”

> And start saying: “There’s a 5% chance that these results are total bullshit.”

Argh, no, no, no and no!

95% significance is NOT 95% probability! When you select a confidence level of 95%, the probability that your results are nonsense is ZERO or ONE. There is no probability statement associated with it. Just because something is unknown does not mean you can make a probability statement about it, and the mathematics around statistical testing all depend on the assumption that the parameter being tested is not random, merely unknown...

Rather, 95% statistical significance means: we got this number from a procedure that 95% of the time produces the right thing, but we have no idea whether this particular number we got is correct or not.

UNLESS!

Unless you're doing Bayesian stats. But in that case your procedure looks completely different and produces very different probability intervals instead of confidence intervals, and you don't talk about statistical significance at all, but about raw probabilities.
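A minimal simulation of that coverage idea (not from the parent comment; the known-variance normal model and all numbers are invented just to keep it short): across many repeated experiments, about 95% of the intervals contain the true value, but any single interval either does or doesn't.

    # Sketch: "95%" is a property of the procedure, not of any single interval.
    # Assumes a known-variance normal mean, purely for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    true_mean, sigma, n, trials = 0.5, 1.0, 100, 10_000
    z = 1.96  # two-sided 95% critical value

    covered = 0
    for _ in range(trials):
        sample = rng.normal(true_mean, sigma, n)
        half_width = z * sigma / np.sqrt(n)
        lo, hi = sample.mean() - half_width, sample.mean() + half_width
        covered += (lo <= true_mean <= hi)  # for any one interval this is 0 or 1

    print(covered / trials)  # ~0.95 across many repetitions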
No.

In frequentist thinking, p = 0.05 means that if there were in reality no difference between your A and B and you repeated the experiment many times, 5% of the observed differences would be equal to or greater than the difference you just measured.

No probabilistic statement about the results being correct or incorrect can be made from a null-hypothesis significance test.
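A sketch of that repeated-sampling statement, assuming a two-proportion z-test with identical true conversion rates (all parameters below are made up): when there is no real difference, roughly 5% of repeated experiments produce a difference extreme enough to give p <= 0.05.

    # Sketch: with no true difference, ~5% of repeated experiments yield p <= 0.05.
    # Two-proportion z-test; sample sizes and rates are hypothetical.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    p_true, n, trials = 0.10, 5_000, 10_000

    false_positives = 0
    for _ in range(trials):
        a = rng.binomial(n, p_true)
        b = rng.binomial(n, p_true)
        p_pool = (a + b) / (2 * n)
        se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
        z = (b / n - a / n) / se
        p_value = 2 * norm.sf(abs(z))
        false_positives += (p_value <= 0.05)

    print(false_positives / trials)  # ~0.05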
I've long argued that the biggest problem with orthodox NHST for A/B testing is that you don't actually care about 'significance of effect' as much as you do 'magnitude of effect'. Furthermore, p-values tell you nothing about the range of possible improvements (or lack thereof) you're facing. Maybe you are willing to risk potential losses for potentially huge gains, or maybe you can't afford to lose a single customer and would rather exchange time for certainty.

My favored approach, which I've outlined here[0], treats the problem as one of Bayesian parameter estimation. Benefits include:

1. Output is a range of possible improvements, so you can reason about risk/reward when calling a test early.

2. Allows the use of prior information to prevent very early stopping, and provides better estimates early on.

3. Every piece of the testing setup is, imho, easy to understand (ignore this benefit if you can comfortably derive Student's t-distribution from first principles).

[0] https://www.countbayesie.com/blog/2015/4/25/bayesian-ab-testing
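For the curious, a minimal sketch of the Beta-Binomial flavour of Bayesian parameter estimation this comment describes (the prior and the conversion counts are hypothetical, and this is not necessarily the exact model in the linked post): the output is a distribution over relative improvement rather than a single p-value.

    # Sketch: Bayesian A/B as parameter estimation with a Beta-Binomial model.
    # Counts and the weak prior below are hypothetical, for illustration only.
    import numpy as np

    rng = np.random.default_rng(2)
    prior_a, prior_b = 3, 27          # weak prior centred near a 10% conversion rate
    conv_a, n_a = 120, 1_000          # variant A: conversions / visitors
    conv_b, n_b = 140, 1_000          # variant B

    post_a = rng.beta(prior_a + conv_a, prior_b + n_a - conv_a, 100_000)
    post_b = rng.beta(prior_a + conv_b, prior_b + n_b - conv_b, 100_000)
    lift = (post_b - post_a) / post_a  # relative improvement of B over A

    print("P(B beats A):", (lift > 0).mean())
    print("95% credible interval for lift:", np.percentile(lift, [2.5, 97.5]))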
Lots of nit-picking here. In plain English, confidence intervals are about the chance that your results are bogus. You flipped 100 coins, all of them came up heads, you conclude 100% of coin tosses come up heads. By chance, you got a very unlikely sample that differed substantially from the population. You could also conclude your A/B test is a success when it was just randomly atypical.
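A quick back-of-the-envelope version of that "randomly atypical sample" point, with an arbitrary sample size and cutoff: a fair coin will occasionally look convincingly biased by chance alone.

    # Sketch: how often does a fair coin, by chance alone, look clearly biased?
    # Sample size and the "looks biased" cutoff are arbitrary, for illustration.
    from scipy.stats import binom

    n = 100
    # Probability of 60+ heads or 60+ tails from a fair coin:
    p_extreme = binom.sf(59, n, 0.5) + binom.cdf(40, n, 0.5)
    print(p_extreme)  # ~0.057 -- rare, but it will happen across many tests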
You have to wonder: what else from his junior year in college did Mr. Avshalomov get completely wrong?

How many of the recent YC graduates fail at basic numeracy? Does node.js mean you don't have to understand data structures and algorithms to successfully "preneur" too?

I mean, in finance this won't do. Or in consulting. So there's adverse selection to worry about, too.
I'm not a statistician, but lately I've been wondering:

When we're A/B testing code, the code is already written. If there's a 5%, or even 15%, chance of it being bullshit, who cares? The effort is usually exactly the same whether I switch or not.

It's my understanding that 95%, 99%, etc., were established for things that require extra change. We don't want to spend extra time developing and marketing a new drug if it isn't effective. We don't want to tell people to do A instead of B if we aren't sure A is really better than B.

But in software I've already spent all the time I need to implement the variation on the feature. So given that, why do I need 95%?

I would appreciate it if someone with more knowledge could answer this question.

Edit to add: I see a lot of answers about the cost to keep the code around. What about A/B tests that don't require extra code, just different code? Most of our A/B tests fall into this category.
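One way to read the question is as an expected-value calculation. A hedged sketch with entirely hypothetical numbers: when the switch cost is already sunk and the downside of shipping a neutral change is tiny, the evidence bar can arguably be much lower than for, say, approving a drug.

    # Sketch of the commenter's intuition as an expected-value calculation.
    # All numbers are hypothetical; the point is the asymmetry of costs.
    def expected_gain(p_real, lift_if_real, loss_if_not, switch_cost):
        """Expected value of shipping variant B over keeping A."""
        return p_real * lift_if_real - (1 - p_real) * loss_if_not - switch_cost

    # Drug-trial-like setting: acting on a false positive is very expensive.
    print(expected_gain(p_real=0.80, lift_if_real=1.0, loss_if_not=10.0, switch_cost=5.0))  # negative

    # Typical A/B setting: code already written, downside of a neutral change ~ 0.
    print(expected_gain(p_real=0.80, lift_if_real=1.0, loss_if_not=0.1, switch_cost=0.0))   # positive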
Is it just me, or does this sentence make no mathematical sense at all?

"If you’re running squeaky clean A/B tests at 95% statistical significance and you run 20 tests this year, odds are one of the results you report (and act on) is going to be straight up wrong."
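For what it's worth, the arithmetic behind the quoted sentence only works out if you read "wrong" as "a false positive" and assume the worst case where none of the 20 tested changes has any real effect (which the quote doesn't spell out):

    # Quick check of the quoted claim, assuming the worst case where all 20
    # tested changes truly have no effect (an assumption the quote omits).
    alpha, k = 0.05, 20
    print("Expected false positives:", alpha * k)      # 1.0
    print("P(at least one):", 1 - (1 - alpha) ** k)    # ~0.64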
"We’re taking techniques that were designed for static sample sizes and applying them to continuous datasets" - Wait, seriously? Do A/B testers not use the <i>very</i> well developed techniques that exist for time-series data?