Hey! I'm the author of Confidence.js. Emily Malcolm and I have been working hard on this new approach for the past few weeks and we're super excited to share it!<p>We're both here to answer any questions :)
The statistical methods we use now were created in the context of long-running experiments that had to be set up in advance and run in parallel. In that situation, you have to decide up front how many subjects to test on, and the methods reflect this.<p>I'd like to see someone tackle creating a method aimed at situations like ours, where results steadily trickle in. There ought to be a way to come up with adaptive thresholds such that at any given time we can ask, "Do we have statistically significant results yet, or do we keep the test running?"
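One classical starting point for exactly that question is Wald's sequential probability ratio test, which re-checks the evidence after every new observation instead of committing to a sample size up front. A rough Python sketch of the idea (the rates, counts, and thresholds below are purely illustrative, and this isn't anything Confidence.js does):<p><pre><code>import math

def sprt_decision(conversions, trials, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for a Bernoulli conversion rate.

    Tests H0: rate == p0 against H1: rate == p1 after every observation,
    so the sample size does not have to be fixed in advance.
    Returns "accept_h1", "accept_h0", or "continue".
    """
    # Log-likelihood ratio of the data under H1 versus H0.
    llr = (conversions * math.log(p1 / p0)
           + (trials - conversions) * math.log((1 - p1) / (1 - p0)))
    # Wald's approximate decision boundaries for the chosen error rates.
    upper = math.log((1 - beta) / alpha)   # crossing it is evidence for H1
    lower = math.log(beta / (1 - alpha))   # crossing it is evidence for H0
    if llr >= upper:
        return "accept_h1"
    if llr <= lower:
        return "accept_h0"
    return "continue"

# Example: 40 conversions after 500 visitors. Does the rate look more
# like 5% (H0) or 8% (H1), or should the test keep running?
print(sprt_decision(conversions=40, trials=500, p0=0.05, p1=0.08))
</code></pre>
The catch is that you have to name a specific alternative rate p1 up front, which is a different kind of commitment than fixing the sample size, but it does give you a boundary you can check after every visitor.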
Here's another paper that's relevant to this topic: <a href="http://www.qubitproducts.com/sites/default/files/pdf/most_winning_ab_test_results_are_illusory.pdf" rel="nofollow">http://www.qubitproducts.com/sites/default/files/pdf/most_wi...</a>. And discussion here: <a href="https://news.ycombinator.com/item?id=7287665" rel="nofollow">https://news.ycombinator.com/item?id=7287665</a>
There are a number of testing tools that rely on this methodology. Evan's Awesome A/B Tools (<a href="http://www.evanmiller.org/ab-testing/" rel="nofollow">http://www.evanmiller.org/ab-testing/</a>) is a useful one: it includes a chi-squared test, a sample size calculator, a two-sample t-test, and a Poisson means test.
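If anyone wants to see what the chi-squared version of that comparison looks like in code, it's only a few lines with scipy (the counts are made up, and this is not Evan's implementation, just the same test):<p><pre><code>import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows are variants A and B,
# columns are converted vs. not converted (counts are made up).
table = np.array([
    [120, 4880],   # A: 120 conversions out of 5,000 visitors
    [150, 4850],   # B: 150 conversions out of 5,000 visitors
])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
</code></pre>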
At my company we've implemented a Bayesian A/B test in order to minimize the amount of time a test has to run.<p><a href="http://visualrevenue.com/blog/2013/02/tech-bayesian-instant-headline-testing.html" rel="nofollow">http://visualrevenue.com/blog/2013/02/tech-bayesian-instant-...</a>
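For context, the core of a Beta-Binomial comparison like that fits in a dozen lines. A rough sketch (not the code from the linked post; the uniform priors and the counts are placeholders):<p><pre><code>import numpy as np

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=200_000, seed=0):
    """Estimate P(rate_B is higher than rate_A) under Beta(1, 1) priors.

    With a Beta prior and binomial data, each variant's posterior is
    Beta(1 + conversions, 1 + non-conversions), so we can draw Monte
    Carlo samples from both posteriors and compare them directly.
    """
    rng = np.random.default_rng(seed)
    a = rng.beta(1 + conv_a, 1 + n_a - conv_a, samples)
    b = rng.beta(1 + conv_b, 1 + n_b - conv_b, samples)
    return float(np.mean(b > a))

# Stop the test once this probability crosses a threshold chosen up
# front (e.g. 0.95); with these made-up counts it lands around 0.97.
print(prob_b_beats_a(conv_a=120, n_a=5000, conv_b=150, n_b=5000))
</code></pre>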