Very nice, much faster than simulating it. I guess if you're using an informative prior, instead of adding 1 to the number of successes and failures, you add the corresponding parameters of your beta prior?<p>A pretty good shortcut (if you don't have a log-beta function, for example) is to approximate both A and B with the normal distribution. Then their difference is also normal, so you can just check the probability that a normal random variable is greater than zero.<p>Specifically, the mean μ of a beta distribution is α/(α+β) and the variance σ^2 is αβ/((α+β)^2(α+β+1)). Use these as the parameters for your normal approximation, and the difference D = A - B is approximately N(μ_A-μ_B, σ_A^2+σ_B^2). The probability that B beats A is just the CDF of D evaluated at 0.<p>In Python:<p><pre><code> from scipy.stats import norm
 def beta_mean(a, b):
     return a / (a + b)

 def beta_var(a, b):
     return a*b / ((a+b)**2 * (a+b+1))

 def probability_B_beats_A(α_A, β_A, α_B, β_B):
     mu = beta_mean(α_A, β_A) - beta_mean(α_B, β_B)
     sigma = (beta_var(α_A, β_A) + beta_var(α_B, β_B))**.5
     return norm.cdf(0, mu, sigma)</code></pre>
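Since the whole point of the shortcut is avoiding special functions, here's the same idea written with only the standard library (no scipy), cross-checked against a quick Monte Carlo draw. This is just my own sanity-check sketch; the 120/1000 vs. 150/1000 conversion counts are made-up example numbers, and the uniform Beta(1, 1) prior gives the +1 pseudo-counts.

```python
import math
import random

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

def norm_cdf(x, mu, sigma):
    # Normal CDF via the error function (standard library only)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def probability_B_beats_A(a_A, b_A, a_B, b_B):
    mu = beta_mean(a_A, b_A) - beta_mean(a_B, b_B)
    sigma = math.sqrt(beta_var(a_A, b_A) + beta_var(a_B, b_B))
    return norm_cdf(0.0, mu, sigma)

# Made-up example: 120/1000 conversions for A, 150/1000 for B, Beta(1, 1) prior
approx = probability_B_beats_A(121, 881, 151, 851)

# Monte Carlo cross-check of the same posterior comparison
random.seed(0)
hits = sum(
    random.betavariate(151, 851) > random.betavariate(121, 881)
    for _ in range(20000)
)
mc = hits / 20000
print(round(approx, 3), round(mc, 3))  # both ≈ 0.975
```

With counts in the hundreds the beta posteriors are nearly symmetric, so the normal approximation and the sampled answer agree to a couple of decimal places.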
I'm pretty sure this formula is correct, but I haven't seen it published anywhere. John Cook has some veiled references to a closed-form solution when one of the parameters is an integer:<p><a href="http://www.mdanderson.org/education-and-research/departments-programs-and-labs/departments-and-divisions/division-of-quantitative-sciences/research/biostats-utmdabtr-005-05.pdf" rel="nofollow">http://www.mdanderson.org/education-and-research/departments...</a> [pdf]<p>But he doesn't really say what that closed form is, so I think his version must have been pretty hairy. (My version requires all four parameters to be integers, so I doubt we were talking about the same thing.)<p>Sadly I couldn't get the math to work out for producing a confidence interval on |p_B - p_A| so for now you're stuck with Monte Carlo for confidence bounds.<p>Thanks to prodding from Steven Noble over at Stripe, I'll have another formula up soon for asking the same question using count data instead of binary data. Stay tuned!
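For what it's worth, the Monte Carlo fallback for the interval is only a few lines. A sketch (my own; `diff_credible_interval` is a made-up helper name, and the counts are the same arbitrary example as above):

```python
import random

def diff_credible_interval(a_A, b_A, a_B, b_B, level=0.95, n=20000, seed=0):
    # Monte Carlo credible interval for p_B - p_A from two beta posteriors:
    # draw from each posterior, sort the differences, take the tail quantiles.
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(a_B, b_B) - rng.betavariate(a_A, b_A)
        for _ in range(n)
    )
    lo = diffs[int(n * (1 - level) / 2)]
    hi = diffs[int(n * (1 + level) / 2) - 1]
    return lo, hi

# Example posterior: Beta(121, 881) for A vs. Beta(151, 851) for B
print(diff_credible_interval(121, 881, 151, 851))
```

Slow compared to a closed form, of course, but 20,000 draws is plenty for two digits of the interval endpoints.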
<i>If you are unable to find a log-gamma function lying around, rewrite the above equation in terms of factorials using Γ(z)=(z−1)!, and notice that there are an equal number of multiplicative terms in the numerator and denominator. If you alternately multiply and divide one term at a time, you should be able to arrive at an answer without encountering numerical overflow.</i><p>Something about the right side of the equation immediately preceding this quote seems to indicate that <i>many</i> of the terms in the numerator would cancel with equivalents in the denominator. I'm not really familiar with CAS systems, but is this the sort of thing they could do? Doing this simplification <i>once</i> when one writes the code seems to be a win over calculating the original expression every time the code runs.
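The alternate-multiply-and-divide trick can be illustrated on the beta function itself: for integer arguments, B(a, b) = (a-1)!(b-1)!/(a+b-1)!, which overflows if you compute the factorials first but stays in range if you interleave the numerator and denominator terms. A sketch of my own (not the article's exact expansion, which has more terms):

```python
def beta_func(a, b):
    # B(a, b) = (a-1)! (b-1)! / (a+b-1)! for integer a, b >= 1.
    # Interleave multiplication and division so the running product stays
    # near the (small) final result instead of overflowing a float.
    num = list(range(1, a)) + list(range(1, b))  # numerator factorial terms
    den = list(range(1, a + b))                  # denominator factorial terms
    result = 1.0
    for n_t, d_t in zip(num, den):
        result *= n_t / d_t
    for d_t in den[len(num):]:  # any leftover denominator terms
        result /= d_t
    return result

print(beta_func(3, 4))  # 2! * 3! / 6! = 1/60 ≈ 0.016667
```

Computing `beta_func(500, 500)` this way works fine, even though 499! alone is far beyond float range. And yes, this kind of cancellation is exactly what a CAS can do symbolically, once, before you write the numeric code.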
Evan, does this formula account for multiple comparisons (if you have multiple goals and multiple variations)? I guess it would suffer from the same problem: if you have 100 variations and 10 goals, some of them are bound to produce a significant result just by random chance, won't they? In classical testing you can fix this by making your calculated alpha smaller than the nominal alpha, so you need much more data when there are multiple comparisons. What happens in the Bayesian case?<p>Edit: I did some Googling and found this <a href="http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf" rel="nofollow">http://www.stat.columbia.edu/~gelman/research/unpublished/mu...</a>
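To illustrate what I mean, here's a quick made-up simulation of the classical side: run many comparisons where there is no real effect, and count how often the naive threshold fires versus a Bonferroni-corrected one (alpha divided by the number of comparisons):

```python
import random

random.seed(1)
alpha, m = 0.05, 1000  # m independent A/A comparisons, no real effect anywhere

# Under the null hypothesis, p-values are uniformly distributed on [0, 1]
p_values = [random.random() for _ in range(m)]

naive_hits = sum(p < alpha for p in p_values)          # expect ~alpha * m = ~50
bonferroni_hits = sum(p < alpha / m for p in p_values)  # expect ~alpha = ~0
print(naive_hits, bonferroni_hits)
```

So with 1000 null comparisons at alpha = 0.05 you get on the order of 50 "significant" results by chance alone, which is the worry; my question is what the Bayesian analogue of the correction looks like.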
Great work! However, an additional <i>testing</i> section would be nice (right below the <i>implementation</i> section).<p>That section should provide two or three sets of example input values along with the expected output (up to some accuracy).<p>As the author notes, although this formula looks pretty simple, you can make a lot of numerical mistakes when implementing it. A test suite would help implementors catch those mistakes early.
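For instance, here are a couple of candidate test vectors you can derive by hand (my own sketch, assuming the closed-form sum from the post, computed via log-gamma): with two uniform Beta(1, 1) posteriors the answer is exactly 1/2 by symmetry, and Beta(1, 1) vs. Beta(2, 1) integrates to exactly 2/3. Complementarity (P(B beats A) + P(A beats B) = 1) is also a cheap invariant to check.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_B_beats_A(a_A, b_A, a_B, b_B):
    # Closed-form P(p_B > p_A) for beta posteriors, integer a_B
    total = 0.0
    for i in range(a_B):
        total += exp(log_beta(a_A + i, b_A + b_B)
                     - log(b_B + i)
                     - log_beta(1 + i, b_B)
                     - log_beta(a_A, b_A))
    return total

# Hand-derivable test vectors
assert abs(prob_B_beats_A(1, 1, 1, 1) - 0.5) < 1e-12   # symmetry
assert abs(prob_B_beats_A(1, 1, 2, 1) - 2/3) < 1e-12   # integral of (1 - a^2) da
# Complementarity on an arbitrary example
p = prob_B_beats_A(121, 881, 151, 851)
assert abs(p + prob_B_beats_A(151, 851, 121, 881) - 1.0) < 1e-9
print("all test vectors pass")
```

Small integer cases like these are nice because the expected values are exact fractions, not outputs of some other (possibly also buggy) implementation.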
Indeed, much faster than Monte Carlo integration:<p><pre><code> 5.778 evaluation.py:20(evaluate) # Sampling version
0.043 evaluation.py:64(evaluate) # Closed formula
</code></pre>
(See my A/B testing library <a href="https://github.com/bogdan-kulynych/trials" rel="nofollow">https://github.com/bogdan-kulynych/trials</a>)