Very nice, much faster than simulating it. I guess if you're using an informative prior, instead of adding 1 to the number of successes and failures, you add the corresponding parameters of your beta prior?<p>A pretty good shortcut (if you don't have a log-beta function, for example) is to approximate both A and B with the normal distribution. Then their difference is also normal, so you can just check the probability that a normal random variable is greater than zero.<p>Specifically, the mean μ of a beta distribution is α/(α+β) and the variance σ^2 is αβ/((α+β)^2(α+β+1)). Use these as the parameters for your normal approximation, and the difference D = A - B is approximately N(μ_A-μ_B, σ_A^2+σ_B^2). The probability that B beats A is just the CDF of D evaluated at 0.<p>In Python:<p><pre><code> from scipy.stats import norm
 def beta_mean(a, b):
     return a / (a + b)

 def beta_var(a, b):
     return a*b / ((a+b)**2 * (a+b+1))

 def probability_B_beats_A(α_A, β_A, α_B, β_B):
     mu = beta_mean(α_A, β_A) - beta_mean(α_B, β_B)
     sigma = (beta_var(α_A, β_A) + beta_var(α_B, β_B))**.5
     return norm.cdf(0, mu, sigma)</code></pre>
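Since the whole point of the shortcut is avoiding special functions, here's the same idea written with only the standard library (no scipy), cross-checked against a quick Monte Carlo draw. This is just my own sanity-check sketch; the 120/1000 vs. 150/1000 conversion counts are made-up example numbers, and the uniform Beta(1, 1) prior gives the +1 pseudo-counts.

```python
import math
import random

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

def norm_cdf(x, mu, sigma):
    # Normal CDF via the error function (standard library only)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def probability_B_beats_A(a_A, b_A, a_B, b_B):
    mu = beta_mean(a_A, b_A) - beta_mean(a_B, b_B)
    sigma = math.sqrt(beta_var(a_A, b_A) + beta_var(a_B, b_B))
    return norm_cdf(0.0, mu, sigma)

# Made-up example: 120/1000 conversions for A, 150/1000 for B, Beta(1, 1) prior
approx = probability_B_beats_A(121, 881, 151, 851)

# Monte Carlo cross-check of the same posterior comparison
random.seed(0)
hits = sum(
    random.betavariate(151, 851) > random.betavariate(121, 881)
    for _ in range(20000)
)
mc = hits / 20000
print(round(approx, 3), round(mc, 3))  # both ≈ 0.975
```

With counts in the hundreds the beta posteriors are nearly symmetric, so the normal approximation and the sampled answer agree to a couple of decimal places.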
I'm pretty sure this formula is correct, but I haven't seen it published anywhere. John Cook has some veiled references to a closed-form solution when one of the parameters is an integer:<p><a href="http://www.mdanderson.org/education-and-research/departments-programs-and-labs/departments-and-divisions/division-of-quantitative-sciences/research/biostats-utmdabtr-005-05.pdf" rel="nofollow">http://www.mdanderson.org/education-and-research/departments...</a> [pdf]<p>But he doesn't really say what that closed form is, so I think his version must have been pretty hairy. (My version requires all four parameters to be integers, so I doubt we were talking about the same thing.)<p>Sadly I couldn't get the math to work out for producing a confidence interval on |p_B - p_A| so for now you're stuck with Monte Carlo for confidence bounds.<p>Thanks to prodding from Steven Noble over at Stripe, I'll have another formula up soon for asking the same question using count data instead of binary data. Stay tuned!
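For what it's worth, the Monte Carlo fallback for the interval is only a few lines. A sketch (my own; `diff_credible_interval` is a made-up helper name, and the counts are the same arbitrary example as above):

```python
import random

def diff_credible_interval(a_A, b_A, a_B, b_B, level=0.95, n=20000, seed=0):
    # Monte Carlo credible interval for p_B - p_A from two beta posteriors:
    # draw from each posterior, sort the differences, take the tail quantiles.
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(a_B, b_B) - rng.betavariate(a_A, b_A)
        for _ in range(n)
    )
    lo = diffs[int(n * (1 - level) / 2)]
    hi = diffs[int(n * (1 + level) / 2) - 1]
    return lo, hi

# Example posterior: Beta(121, 881) for A vs. Beta(151, 851) for B
print(diff_credible_interval(121, 881, 151, 851))
```

Slow compared to a closed form, of course, but 20,000 draws is plenty for two digits of the interval endpoints.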
<i>If you are unable to find a log-gamma function lying around, rewrite the above equation in terms of factorials using Γ(z)=(z−1)!, and notice that there are an equal number of multiplicative terms in the numerator and denominator. If you alternately multiply and divide one term at a time, you should be able to arrive at an answer without encountering numerical overflow.</i><p>Something about the right side of the equation immediately preceding this quote seems to indicate that <i>many</i> of the terms in the numerator would cancel with equivalents in the denominator. I'm not really familiar with CAS systems, but is this the sort of thing they could do? Doing this simplification <i>once</i> when one writes the code seems to be a win over calculating the original expression every time the code runs.
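The alternate-multiply-and-divide trick can be illustrated on the beta function itself: for integer arguments, B(a, b) = (a-1)!(b-1)!/(a+b-1)!, which overflows if you compute the factorials first but stays in range if you interleave the numerator and denominator terms. A sketch of my own (not the article's exact expansion, which has more terms):

```python
def beta_func(a, b):
    # B(a, b) = (a-1)! (b-1)! / (a+b-1)! for integer a, b >= 1.
    # Interleave multiplication and division so the running product stays
    # near the (small) final result instead of overflowing a float.
    num = list(range(1, a)) + list(range(1, b))  # numerator factorial terms
    den = list(range(1, a + b))                  # denominator factorial terms
    result = 1.0
    for n_t, d_t in zip(num, den):
        result *= n_t / d_t
    for d_t in den[len(num):]:  # any leftover denominator terms
        result /= d_t
    return result

print(beta_func(3, 4))  # 2! * 3! / 6! = 1/60 ≈ 0.016667
```

Computing `beta_func(500, 500)` this way works fine, even though 499! alone is far beyond float range. And yes, this kind of cancellation is exactly what a CAS can do symbolically, once, before you write the numeric code.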
Evan, does this formula account for multiple comparisons (if you have multiple goals and multiple variations)? I guess it would suffer from the same problem: if you have 100 variations and 10 goals, some of them are bound to produce a significant result just by random chance, won't they? In classical testing you can fix this by making your calculated alpha smaller than the nominal alpha, so you need much more data when there are multiple comparisons. What happens in the Bayesian case?<p>Edit: I did some Googling and found this <a href="http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf" rel="nofollow">http://www.stat.columbia.edu/~gelman/research/unpublished/mu...</a>
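To illustrate what I mean, here's a quick made-up simulation of the classical side: run many comparisons where there is no real effect, and count how often the naive threshold fires versus a Bonferroni-corrected one (alpha divided by the number of comparisons):

```python
import random

random.seed(1)
alpha, m = 0.05, 1000  # m independent A/A comparisons, no real effect anywhere

# Under the null hypothesis, p-values are uniformly distributed on [0, 1]
p_values = [random.random() for _ in range(m)]

naive_hits = sum(p < alpha for p in p_values)          # expect ~alpha * m = ~50
bonferroni_hits = sum(p < alpha / m for p in p_values)  # expect ~alpha = ~0
print(naive_hits, bonferroni_hits)
```

So with 1000 null comparisons at alpha = 0.05 you get on the order of 50 "significant" results by chance alone, which is the worry; my question is what the Bayesian analogue of the correction looks like.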
Great work! However, an additional <i>testing</i> section would be nice (right below the <i>implementation</i> section).<p>That section should provide two or three sets of example input values along with the expected output (up to some accuracy).<p>As the author notes, although this formula looks pretty simple, you can make a lot of numerical mistakes when implementing it. A test suite would help implementors catch those mistakes early.
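For instance, here are a couple of candidate test vectors you can derive by hand (my own sketch, assuming the closed-form sum from the post, computed via log-gamma): with two uniform Beta(1, 1) posteriors the answer is exactly 1/2 by symmetry, and Beta(1, 1) vs. Beta(2, 1) integrates to exactly 2/3. Complementarity (P(B beats A) + P(A beats B) = 1) is also a cheap invariant to check.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_B_beats_A(a_A, b_A, a_B, b_B):
    # Closed-form P(p_B > p_A) for beta posteriors, integer a_B
    total = 0.0
    for i in range(a_B):
        total += exp(log_beta(a_A + i, b_A + b_B)
                     - log(b_B + i)
                     - log_beta(1 + i, b_B)
                     - log_beta(a_A, b_A))
    return total

# Hand-derivable test vectors
assert abs(prob_B_beats_A(1, 1, 1, 1) - 0.5) < 1e-12   # symmetry
assert abs(prob_B_beats_A(1, 1, 2, 1) - 2/3) < 1e-12   # integral of (1 - a^2) da
# Complementarity on an arbitrary example
p = prob_B_beats_A(121, 881, 151, 851)
assert abs(p + prob_B_beats_A(151, 851, 121, 881) - 1.0) < 1e-9
print("all test vectors pass")
```

Small integer cases like these are nice because the expected values are exact fractions, not outputs of some other (possibly also buggy) implementation.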
Indeed, much faster than Monte Carlo integration:<p><pre><code> 5.778 evaluation.py:20(evaluate) # Sampling version
0.043 evaluation.py:64(evaluate) # Closed formula
</code></pre>
(See my A/B testing library <a href="https://github.com/bogdan-kulynych/trials" rel="nofollow">https://github.com/bogdan-kulynych/trials</a>)