"data points required for confident results scales linearly with number of distinct states being tested"<p>My memory of the maths behind this is poor, but I'm not sure this is actually true: I would understand that (all things being equal) the number of required data points grows faster than linearly. The reason is that not only are you spreading your users across more states, but you also need much stronger results to make a conclusion: see the use of ANOVA vs (eg) two sample t tests <a href="http://en.wikipedia.org/wiki/Analysis_of_variance" rel="nofollow">http://en.wikipedia.org/wiki/Analysis_of_variance</a>.<p>Of course, if you run a series of base vs variant tests, where the winner stays on, you run into the exact same class of problem too. Meaningful AB testing is tricky.<p>EDIT: when I say not sure, I mean it. I wish I knew this better (rather than just being aware of the existence of gotchas). If anyone has any great insights on the impact of this stuff, or how to deal with it, I'd love to hear them.