TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.
Self-Optimizing A/B Tests

62 points by chanind, over 3 years ago

3 comments

sweezyjeezy, over 3 years ago
One of the assumptions of vanilla multi-armed bandits is that the underlying reward rates are fixed. It's not valid to assume that in a lot of cases, including e-commerce.

To see how things could go wrong, imagine that you are running this on a website with a control/treatment variant. After a bit you end up sampling the treatment a little more (say 60:40). You now start running a sale - and the conversion rate for BOTH variants goes up equally (say). But since you are sampling from the treatment variant more, its overall conversion rate goes up faster than the control's - meaning you start weighting even more towards that variant. This could happen purely because of the sale and random noise at the start - you could even end up optimising towards the wrong variant. There are more sophisticated MAB approaches that try to remove the identical reward-rate assumption - they have to model a lot more uncertainty, and so optimise more conservatively.
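[Editor's note: the failure mode described above can be reproduced in a toy simulation. This is an illustrative sketch only - it assumes a Beta-Bernoulli Thompson sampling bandit (one common "self-optimizing" scheme, not necessarily the article's), and the rates, round counts, and the `simulate` helper are all made up for the example.]

```python
import random

def thompson_step(successes, failures):
    # Draw one sample from each arm's Beta(1 + s, 1 + f) posterior
    # and play the arm with the highest draw.
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

def simulate(rates_before, rates_after, switch_at, n_rounds, seed=0):
    """Run a Thompson sampling bandit whose true conversion rates
    jump for BOTH arms at round `switch_at` (the "sale")."""
    random.seed(seed)
    k = len(rates_before)
    succ, fail = [0] * k, [0] * k
    for t in range(n_rounds):
        rates = rates_before if t < switch_at else rates_after
        arm = thompson_step(succ, fail)
        if random.random() < rates[arm]:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return [s + f for s, f in zip(succ, fail)]  # pulls per arm

# Both arms get the same +0.05 lift halfway through; the bandit's
# cumulative statistics still mix pre- and post-sale data unevenly
# across arms, so the allocation can skew further.
pulls = simulate([0.10, 0.11], [0.15, 0.16],
                 switch_at=5000, n_rounds=10000)
```

Because the posterior pools all history, the more-sampled arm absorbs the post-sale lift into its cumulative rate faster, exactly as the comment describes.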
Normal_gaussian, over 3 years ago
This is the classic multi-armed bandit problem: https://en.m.wikipedia.org/wiki/Multi-armed_bandit

I like the graphs and the explanation leads the reader deeper, but it takes the naive approach to exploration without discussing trade-offs.

Tangentially, nearly every self-optimising a/b test I have code reviewed has been significantly flawed.
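[Editor's note: the "naive approach to exploration" referred to above is typically epsilon-greedy. This sketch is the editor's illustration, not code from the article; the function name and parameters are invented for the example. The trade-off is visible in the fixed `epsilon`: it never decays, so the bandit keeps paying a constant exploration tax forever.]

```python
import random

def epsilon_greedy(true_rates, epsilon, n_rounds, seed=0):
    """Naive exploration: with probability epsilon pick a uniformly
    random arm, otherwise exploit the best empirical mean so far."""
    random.seed(seed)
    k = len(true_rates)
    pulls, wins = [0] * k, [0] * k
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(k)  # explore
        else:
            # Exploit: arms never pulled default to mean 0.0, so they
            # are only reached via the random exploration branch.
            means = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
            arm = max(range(k), key=lambda i: means[i])
        pulls[arm] += 1
        wins[arm] += random.random() < true_rates[arm]
    return pulls, wins
```

Fancier schemes (UCB, Thompson sampling, decaying epsilon) trade exploration off against accumulated evidence instead of spending a fixed fraction of traffic on it indefinitely.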
jawns, over 3 years ago
I used to work for an A/B testing company, and we used both contextual and non-contextual Bayesian multi-armed bandit approaches.

Here's a cool talk my former colleague Austin Rochford gave at the 2018 PyData NYC conference about how we implemented it and made it work at scale:

https://www.youtube.com/watch?v=vupP9MYXeFM
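[Editor's note: as a minimal flavour of the non-contextual Bayesian approach mentioned above - not the company's actual implementation - here is a Monte Carlo estimate of P(rate_A > rate_B) under independent Beta(1,1) priors. The function name and the example counts are invented for illustration.]

```python
import random

def prob_beats(succ_a, fail_a, succ_b, fail_b, n_draws=20000, seed=0):
    """Estimate P(conversion rate of A > conversion rate of B) by
    sampling from each arm's Beta posterior and counting wins."""
    random.seed(seed)
    wins = 0
    for _ in range(n_draws):
        a = random.betavariate(succ_a + 1, fail_a + 1)
        b = random.betavariate(succ_b + 1, fail_b + 1)
        wins += a > b
    return wins / n_draws

# Arm A observed 120/1000 conversions, arm B observed 100/1000.
p = prob_beats(120, 880, 100, 900)
```

A Thompson sampling bandit is the same idea run online: draw one posterior sample per arm each round and route the visitor to the winner, so traffic allocation tracks this probability automatically.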