Multi-armed bandits are also well known in game AI. They gained popularity with the introduction of Monte Carlo Tree Search, where MABs are used to select which subtrees to search for the largest expected payoff, e.g.:<p><a href="https://www.aaai.org/ocs/index.php/AIIDE/AIIDE13/paper/view/7377/7589" rel="nofollow">https://www.aaai.org/ocs/index.php/AIIDE/AIIDE13/paper/view/...</a>
<a href="https://courses.cs.washington.edu/courses/cse599i/18wi/resources/lecture19/lecture19.pdf" rel="nofollow">https://courses.cs.washington.edu/courses/cse599i/18wi/resou...</a>
etc.<p>For what it's worth, the MAB algorithm in the original post looks like epsilon-greedy. It's probably better to look into Upper Confidence Bound variants like UCB1, which dynamically adjust how much to explore vs. exploit, e.g.:<p><a href="https://towardsdatascience.com/comparing-multi-armed-bandit-algorithms-on-marketing-use-cases-8de62a851831" rel="nofollow">https://towardsdatascience.com/comparing-multi-armed-bandit-...</a>
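<p>For anyone who hasn't run into it, epsilon-greedy fits in a few lines. This is just an illustrative sketch (hypothetical Python, not the post's actual code): with probability epsilon show a random variant, otherwise show the variant with the best average reward so far.<p><pre><code>import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Pick an arm index: explore with probability epsilon, otherwise exploit."""
    n_arms = len(counts)
    if random.random() < epsilon:
        return random.randrange(n_arms)  # explore: show a random variant
    averages = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
    return max(range(n_arms), key=lambda i: averages[i])  # exploit: best average so far

# counts[i] = times variant i was shown, rewards[i] = total clicks for variant i
counts, rewards = [10, 10, 10], [1.0, 3.0, 2.0]
arm = epsilon_greedy(counts, rewards)
</code></pre>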
The purpose of an A/B test isn't to always show the best-performing result; it's to perform a _controlled scientific experiment_ with a control group, from which you can learn things.<p>Also, I work in this field and I will just say that people _do_ behave differently based on traffic source: users coming from Facebook behave alike but differently from traffic coming from Reddit, who in turn act similarly to each other. If you were running a self-optimizing thing like this, it _would_ make sense to split it up by traffic source and handle each one separately.
Multi-armed bandit algorithms have been used <i>forever</i> to optimize ads, websites, newsletters, etc.<p>Here is an article from 2013 that describes Google's multi-armed bandit algorithm integration into AdSense:<p><a href="https://www.cmswire.com/cms/customer-experience/google-integrates-adsense-experiment-objectives-into-content-experience-platform-022422.php" rel="nofollow">https://www.cmswire.com/cms/customer-experience/google-integ...</a><p>The first time I stumbled across the term "multi-armed bandit" was when I read Koza's "Genetic Programming: On the Programming of Computers by Means of Natural Selection" in 1992. When I later got involved in e-commerce projects, it was immediately clear to me that this was the way to tackle the optimization tasks involved.
We (Facebook) just released our Adaptive Experimentation platform as open source: <a href="https://github.com/facebook/Ax" rel="nofollow">https://github.com/facebook/Ax</a>. It supports bandit optimization for discrete sets of candidates and Bayesian optimization for continuous/very large search spaces.<p><a href="https://ax.dev/docs/why-ax.html" rel="nofollow">https://ax.dev/docs/why-ax.html</a>
Just a small nitpick: this doesn't take into account implementation cost. If you want something dynamic like this, it means your app has <i>read</i> access to all the analytics recorded (or at least the ones needed for optimization). Most of the time, apps only <i>send</i> data to the analytics services; developers read/analyze the data and act on it. I personally haven't worked on any apps that used analytics <i>read</i> access within the app. So, in most cases it's a lot more than 20 lines to implement an approach like this (hopefully your analytics platform exposes a read API that you then integrate into your app) compared to A/B testing, where you just show several versions, then analyze the data and iterate.
Previous discussions worth checking out here <a href="https://news.ycombinator.com/item?id=11437114" rel="nofollow">https://news.ycombinator.com/item?id=11437114</a><p>and here <a href="https://news.ycombinator.com/item?id=4040022" rel="nofollow">https://news.ycombinator.com/item?id=4040022</a>
Bandit algorithms can also be approached from a Bayesian point of view! This lets you quantify the uncertainty in your estimates (e.g. how uncertain you are that ad A has a higher CTR than ad B), which a lot of other bandit methods don't offer.<p>To toot my own horn a bit, I wrote a blog post about Bayesian bandits: <a href="https://eigenfoo.xyz/bayesian-bandits/" rel="nofollow">https://eigenfoo.xyz/bayesian-bandits/</a>
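<p>To make that concrete, the classic Bayesian bandit is Beta-Bernoulli Thompson sampling. A minimal sketch of the idea (illustrative only, not taken from the linked post): keep a Beta posterior per ad, sample a CTR from each posterior, and show the ad with the highest sample. The width of each posterior is exactly the uncertainty mentioned above.<p><pre><code>import random

def thompson_sample(successes, failures):
    """Sample a CTR for each ad from its Beta posterior; play the best sample."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# successes[i] = clicks on ad i, failures[i] = impressions of ad i without a click
successes, failures = [12, 30], [488, 970]
arm = thompson_sample(successes, failures)
</code></pre>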
It seems close to simulated annealing in the travelling salesman problem. The basic idea is that you start with a random tour and adjust path segments incrementally. Most of the time you choose a new path segment that decreases the global route cost, but occasionally, at random, you choose one that increases it. This random factor is decreased over time, so the route anneals and settles on a close-to-optimal result.<p>It shares the same principle of choosing reasonable options most of the time, while allowing variation to keep from getting stuck in a local optimum.
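<p>In code the analogy is easy to see. A rough sketch of annealing 2-opt moves on a TSP tour (the temperature, cooling rate, and step count are just illustrative values):<p><pre><code>import math, random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def anneal(dist, temperature=10.0, cooling=0.999, steps=50000):
    """Simulated annealing for TSP: mostly accept improving 2-opt moves,
    occasionally accept worse ones, less often as the temperature cools."""
    n = len(dist)
    tour = list(range(n))
    random.shuffle(tour)
    cost = tour_length(tour, dist)
    for _ in range(steps):
        i, j = sorted(random.sample(range(n), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse a segment (2-opt)
        new_cost = tour_length(candidate, dist)
        # Always accept improvements; accept worse tours with probability exp(-delta / T)
        if new_cost < cost or random.random() < math.exp((cost - new_cost) / temperature):
            tour, cost = candidate, new_cost
        temperature *= cooling  # cooling schedule: uphill moves become rarer over time
    return tour, cost

dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
best_tour, best_cost = anneal(dist)
</code></pre>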
Also, one reason a lot of teams can't do more than two options (A, B, C, D, E, F, G, etc. testing) is that you need a TON of traffic for the results to be statistically significant.
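<p>To put a rough number on "a TON", here is a back-of-the-envelope two-proportion power calculation (the 2% baseline and 20% relative lift are made-up illustration values, and testing many variants at once pushes the requirement even higher because of multiple comparisons):<p><pre><code>import math
from scipy.stats import norm

def sample_size_per_variant(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-proportion z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_variant) ** 2)

# Detecting a 2.0% -> 2.4% conversion lift needs roughly 21,000 visitors per variant.
print(sample_size_per_variant(0.02, 0.024))
</code></pre>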
I published this work at CHI on the use of multi-armed bandits in educational games. My biggest takeaway was the importance of choosing the right metric -- because our optimization worked too well.<p><a href="https://www.researchgate.net/publication/301935710_Interface_Design_Optimization_as_a_Multi-Armed_Bandit_Problem" rel="nofollow">https://www.researchgate.net/publication/301935710_Interface...</a>
Microsoft had a good talk about contextual bandits and machine learning. The business case discussed was webpage conversion optimization, where conversions increased by 80%.<p><a href="https://www.microsoft.com/en-us/research/blog/new-perspectives-on-contextual-bandit/" rel="nofollow">https://www.microsoft.com/en-us/research/blog/new-perspectiv...</a><p><a href="https://podcasts.apple.com/nl/podcast/microsoft-research-podcast/id1318021537?l=en&i=1000437509373" rel="nofollow">https://podcasts.apple.com/nl/podcast/microsoft-research-pod...</a>
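<p>The "contextual" part just means the choice of variant is conditioned on features of the visitor. The real systems are far more sophisticated, but a toy per-context epsilon-greedy (hypothetical names, not Microsoft's implementation) shows the shape of it:<p><pre><code>import random
from collections import defaultdict

class ContextualEpsilonGreedy:
    """Toy contextual bandit: keep separate arm statistics per context
    (e.g. traffic source), so each segment converges to its own winner."""
    def __init__(self, n_arms, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.counts = defaultdict(lambda: [0] * n_arms)
        self.rewards = defaultdict(lambda: [0.0] * n_arms)

    def choose(self, context):
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)
        counts, rewards = self.counts[context], self.rewards[context]
        averages = [r / c if c else 0.0 for r, c in zip(rewards, counts)]
        return max(range(self.n_arms), key=lambda i: averages[i])

    def update(self, context, arm, reward):
        self.counts[context][arm] += 1
        self.rewards[context][arm] += reward

bandit = ContextualEpsilonGreedy(n_arms=3)
arm = bandit.choose("facebook")           # pick a variant for this traffic source
bandit.update("facebook", arm, reward=1)  # record whether it converted
</code></pre>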
UCB1 is really not that much more complicated than epsilon-greedy. Some slightly sloppy code I wrote a few years ago, maybe 20 lines of code: <a href="https://github.com/j2kun/ucb1/blob/master/ucb1.py#L7-L35" rel="nofollow">https://github.com/j2kun/ucb1/blob/master/ucb1.py#L7-L35</a><p>Sure, you have to read a bit more to know why it works, but if you write your code well you can plug this in without any extra trouble. It's not like you need a special optimization solver as a dependency.
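<p>For readers who don't want to click through, the textbook UCB1 rule really is tiny; here is a self-contained sketch (not the linked file, just the standard formula):<p><pre><code>import math

def ucb1(counts, rewards):
    """UCB1: play the arm with the best optimistic estimate, i.e. average
    reward plus a confidence bonus that shrinks as the arm gets played more."""
    for i, c in enumerate(counts):
        if c == 0:
            return i  # play every arm once before applying the formula
    total_plays = sum(counts)
    scores = [r / c + math.sqrt(2 * math.log(total_plays) / c)
              for r, c in zip(rewards, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])
</code></pre>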
Great to see multi-armed bandits getting coverage here. I wrote an article last year to help with gaining a deeper understanding of the topic:<p><a href="https://towardsdatascience.com/bandits-for-recommender-system-optimization-1d702662346e" rel="nofollow">https://towardsdatascience.com/bandits-for-recommender-syste...</a>
>In recent years, hundreds of the brightest minds of modern civilization have been hard at work not curing cancer.<p>As a cancer researcher I read this as a just criticism of my work.
Ahhh, the epsilon-greedy strategy... I forgot I had read this until I was a few paragraphs in. If you have rare positive results, it gives you better statistics on your highest-signal variants while still evaluating the others (i.e. more than one alternative).
At least for us, most A/B/... tests require a stable assignment of variant to user, as key metrics like retention would obviously be skewed by randomly reassigning on each visit/page view.
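<p>The usual fix is to make the assignment a deterministic function of the user rather than a fresh coin flip per page view. A minimal sketch (hypothetical function and experiment names):<p><pre><code>import hashlib

def assign_variant(user_id, experiment, variants):
    """Hash user + experiment into a bucket so the same user always
    sees the same variant for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

assign_variant("user-42", "signup-button", ["A", "B"])  # stable across visits
</code></pre>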
"hundreds of the brightest minds of modern civilization have been hard at work not curing cancer. Instead, they have been refining techniques for getting you and me to click on banner ads."<p>Just out of curiosity... Have you <i>ever</i> purposefully clicked on an ad on the internet? I honestly dont think I ever have.<p>ps. I mean an outright overt straight up ad, not, for example, some article linked on HN that is a thinly veiled promo piece for something.
Does anyone have a reference for solving multi-armed bandit problems with a finite time horizon? I would like something that derives rules or heuristics for how your explore/exploit tradeoff changes as the horizon approaches.<p>This seems like an obvious extension, and something that someone should have worked on given how long this problem has been around, but I've been unable to find anything on it. Any pointers?
For me the website is down (it's 404'ing when accessed from Brazil), but it can still be accessed via the Web Archive: <a href="https://web.archive.org/web/20190527144648/https://stevehanov.ca/blog/?id=132" rel="nofollow">https://web.archive.org/web/20190527144648/https://stevehano...</a>
MAB seems to find a local maximum subject to input biases, whereas an A/B test aims to figure out a scientific truth and isolates out all potential biases in the system. I'd be curious to hear of cases where a MAB approach and an A/B test did not yield the same results, and why that happened.
Before I even clicked I knew this was going to be about multi-armed bandits. I agree with the author -- why don't more people know about this? It's so effective and easy, and you get results much quicker.
I encounter a/b testing primarily in article headlines, where the choice is always "accurate, informative title" or "inflammatory sensational bullshit clickbait" and the clickbait wins every time.