It's a good article, and a good intro to the pitfalls of statistical interpretation, but I think it reaches the wrong conclusion. Yes, when one has a very limited data set, needs to draw a conclusion in a hurry, and is fully confident there are no confounding variables in the experiment, then paying very close attention to small differences in p-values can make sense. But how often is that the case when testing a new logo or signup page?

I'm less mathematically sophisticated than the author, and would choose a simpler approach: ignore weak results. If one determines that there is a 95% chance that 51% of people prefer Logo A, either stick with what you have, go with the one you like, or keep searching for a better logo. If you can't see the effect in the raw data without rigorous mathematical analysis, it's probably not a change worth spending much time on.

Instead of adjusting your significance test for each 'peek', simply ignore anything less than 99.9% 'significant'. And while you're at it, ignore anything that's less than a 10% improvement, on the assumption that structural errors in your testing will likely overwhelm any effects smaller than that. Drug trials and the front page of Google aside, if the effect is so small that it flips into and out of 'significance' each time you peek, it's probably not the answer you want.
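For concreteness, here is a minimal sketch of that filter (my own illustration, not anything from the article): an ordinary two-proportion z-test with the two thresholds layered on top. The function name, the one-sided confidence framing, and the example numbers are all assumptions I've made for the sake of the demo.

```python
import math

def ab_test_verdict(conv_a, n_a, conv_b, n_b,
                    min_confidence=0.999, min_lift=0.10):
    """Crude filter: only declare a winner if the result is both very
    'significant' and a big enough improvement to matter.
    Thresholds (99.9% confidence, 10% relative lift) follow the heuristic
    above; conv_* are conversion counts, n_* are visitors per variant."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled two-proportion z-test.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return "no data"
    z = (p_b - p_a) / se
    # One-sided confidence that B beats A, from the normal CDF.
    confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    relative_lift = (p_b - p_a) / p_a if p_a > 0 else float("inf")
    if confidence >= min_confidence and relative_lift >= min_lift:
        return "ship B"
    return "ignore the result (keep A, or keep looking)"

# Example: 1,000 visitors per arm, 50 vs. 56 conversions -- a 12% lift,
# but only ~72% confidence, so the rule says ignore it.
print(ab_test_verdict(50, 1000, 56, 1000))
```

The point of the sketch is that the thresholds do the work: no peeking corrections, no sequential analysis, just a blunt rule that refuses to act on effects too small or too noisy to show up unmistakably.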