
The multi-armed bandit problem (2012)

303 points by navigaid · almost 6 years ago

25 comments

60654 · almost 6 years ago
Multi-armed bandits are also well known in game AI. They became popular with the introduction of Monte Carlo Tree Search, where MABs are used to select which subtrees to search for the largest expected payoff, e.g.:

https://www.aaai.org/ocs/index.php/AIIDE/AIIDE13/paper/view/7377/7589
https://courses.cs.washington.edu/courses/cse599i/18wi/resources/lecture19/lecture19.pdf

For what it's worth, the MAB algorithm in the original post looks like epsilon-greedy. It's probably better to look into Upper Confidence Bound variants like UCB1, which dynamically adjust how much to explore vs. exploit, e.g.:

https://towardsdatascience.com/comparing-multi-armed-bandit-algorithms-on-marketing-use-cases-8de62a851831
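For readers who haven't seen it spelled out: epsilon-greedy explores a random arm with probability epsilon and otherwise exploits the arm with the best observed reward rate. A minimal sketch of that idea (my own illustration, not the original post's code):

```python
import random

def epsilon_greedy(counts, rewards, epsilon=0.1):
    """Pick an arm: explore with probability epsilon, else exploit the best rate."""
    if random.random() < epsilon:
        return random.randrange(len(counts))  # explore: uniformly random arm
    # exploit: arm with highest observed reward rate (unpulled arms first)
    rates = [r / c if c else float("inf") for c, r in zip(counts, rewards)]
    return max(range(len(counts)), key=rates.__getitem__)

def update(counts, rewards, arm, reward):
    """Record one pull of `arm` and its observed reward (e.g. 1 for a click)."""
    counts[arm] += 1
    rewards[arm] += reward
```

Running this against simulated arms with different click-through rates quickly concentrates pulls on the best arm while still sampling the others occasionally.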
err4nt · almost 6 years ago
The purpose of an A/B test isn't to always show the best-performing result; it's to perform a *controlled scientific experiment* with a control group, from which you can learn things.

Also, I work in this field, and I will just say that people *do* behave differently based on traffic source: users coming from Facebook behave alike but differently from traffic from Reddit, who in turn act similarly to each other. If you were running a self-optimizing thing like this, it *would* make sense to split it up by traffic source and handle each separately.
founderling · almost 6 years ago
Multi-armed bandit algorithms have been used *forever* to optimize ads, websites, newsletters, etc.

Here is an article from 2013 that describes Google's integration of a multi-armed bandit algorithm into AdSense:

https://www.cmswire.com/cms/customer-experience/google-integrates-adsense-experiment-objectives-into-content-experience-platform-022422.php

The first time I stumbled across the term "multi-armed bandit" was when I read Koza's "On the Programming of Computers by Means of Natural Selection" in 1992. When I later got involved in e-commerce projects, it was immediately clear to me that this was the way to tackle the optimization tasks involved.
ampersandy · almost 6 years ago
We (Facebook) just released our Adaptive Experimentation platform as open source: https://github.com/facebook/Ax. It supports bandit optimization for discrete sets of candidates and Bayesian optimization for continuous/very large search spaces.

https://ax.dev/docs/why-ax.html
XCSme · almost 6 years ago
Just a small nitpick: this doesn't take implementation cost into account. If you want something dynamic like this, your app needs *read* access to all the recorded analytics (or at least the ones needed for optimization). Most of the time, apps only *send* data to analytics services; developers read and analyze the data and act on it. I personally haven't worked on any apps that used analytics *read* access within the app. So in most cases it's a lot more than 20 lines to implement an approach like this (hopefully your analytics platform exposes a read API that you then have to integrate into your app), compared to A/B testing, where you just show several versions, then analyze the data and iterate.
mhoad · almost 6 years ago
Previous discussions worth checking out here: https://news.ycombinator.com/item?id=11437114

and here: https://news.ycombinator.com/item?id=4040022
_eigenfoo · almost 6 years ago
Bandit algorithms can also be approached from a Bayesian point of view! This lets you quantify the uncertainty in your estimates (e.g. how uncertain you are that ad A has a higher CTR than ad B), which a lot of other bandit methods don't offer.

To toot my own horn a bit, I wrote a blog post about Bayesian bandits: https://eigenfoo.xyz/bayesian-bandits/
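The standard Bayesian bandit for click-like (Bernoulli) rewards is Thompson sampling with a Beta posterior per arm: draw one plausible CTR from each posterior and play the arm whose draw is largest. A rough sketch of that idea (my own illustration, assuming a uniform Beta(1, 1) prior, not code from the linked post):

```python
import random

def thompson_sample(successes, failures):
    """Draw one CTR sample per arm from its Beta posterior; play the argmax.

    successes[i] / failures[i] count observed rewards for arm i, so the
    posterior for arm i is Beta(1 + successes[i], 1 + failures[i]).
    """
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```

Uncertain arms produce widely spread samples, so they keep getting explored; confidently bad arms almost never win the argmax, which is where the "free" uncertainty quantification pays off.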
daenz · almost 6 years ago
It seems close to simulated annealing in the travelling salesman problem. The basic idea is that you start with a random tour and adjust path segments incrementally. Most of the time you choose a new path segment that decreases the global route cost, but occasionally, at random, you choose one that increases it. This random factor is decreased over time, so the route anneals and settles on a close-to-optimal result.

It shares the same principle of choosing reasonable options most of the time while allowing variation to keep from getting stuck in a local optimum.
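To make the analogy concrete, here is a bare-bones annealing loop for a toy TSP (an illustrative sketch with arbitrary parameters, not from any of the linked material): worse tours are accepted with a probability that shrinks as the temperature cools.

```python
import math
import random

def tour_length(cities, order):
    """Total length of the closed tour visiting `cities` in `order`."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def anneal(cities, steps=20000, temp=1.0, cooling=0.9995):
    order = list(range(len(cities)))
    random.shuffle(order)
    cost = tour_length(cities, order)
    for _ in range(steps):
        i, j = sorted(random.sample(range(len(order)), 2))
        new = order[:i] + order[i:j + 1][::-1] + order[j + 1:]  # reverse a segment
        new_cost = tour_length(cities, new)
        # always accept improvements; accept worse tours with shrinking probability
        if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
            order, cost = new, new_cost
        temp *= cooling  # cool down: less random exploration as time goes on
    return order, cost
```

The `math.exp((cost - new_cost) / temp)` term is the same explore/exploit dial the comment describes: hot early on (lots of uphill moves allowed), nearly greedy at the end.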
a13n · almost 6 years ago
Also, one reason a lot of teams can't test more than two options (A/B/C/D/E/F/G testing) is that you need a TON of traffic for the results to be statistically significant.
dr_dshiv · almost 6 years ago
I published this work in CHI on the use of multi-armed bandits in educational games. My biggest takeaway was the importance of choosing the right metric, because our optimization worked too well.

https://www.researchgate.net/publication/301935710_Interface_Design_Optimization_as_a_Multi-Armed_Bandit_Problem
lowdose · almost 6 years ago
Microsoft gave a good talk about contextual bandits and machine learning. The business case discussed was webpage conversion optimization, which increased by 80%.

https://www.microsoft.com/en-us/research/blog/new-perspectives-on-contextual-bandit/

https://podcasts.apple.com/nl/podcast/microsoft-research-podcast/id1318021537?l=en&i=1000437509373
j2kun · almost 6 years ago
UCB1 is really not that much more complicated than epsilon-greedy. Some slightly sloppy code I wrote a few years ago, maybe 20 lines: https://github.com/j2kun/ucb1/blob/master/ucb1.py#L7-L35

Sure, you have to read a bit more to know why it works, but if you write your code well you can plug this in without any extra trouble. It's not like you need a special optimization solver as a dependency.
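For a sense of scale, the standard UCB1 selection rule really does fit in a handful of lines. This is my own sketch of the textbook formula, not the linked file:

```python
import math

def ucb1_select(counts, rewards):
    """UCB1: after playing each arm once, pick the arm maximizing
    mean reward + sqrt(2 * ln(total pulls) / pulls of that arm)."""
    total = sum(counts)
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # play every arm once before applying the bound
    scores = [rewards[arm] / counts[arm]
              + math.sqrt(2 * math.log(total) / counts[arm])
              for arm in range(len(counts))]
    return max(range(len(scores)), key=scores.__getitem__)
```

The square-root bonus shrinks as an arm accumulates pulls, so under-sampled arms get revisited automatically, with no epsilon to tune.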
sbov · almost 6 years ago
One thing that is vastly more expensive than picking a winner in your A/B test is maintaining N variations across M features.
seoulsister · almost 6 years ago
Great to see multi-armed bandits getting coverage here. I wrote an article last year to help readers gain a deeper understanding of the topic:

https://towardsdatascience.com/bandits-for-recommender-system-optimization-1d702662346e
astazangasta · almost 6 years ago
> In recent years, hundreds of the brightest minds of modern civilization have been hard at work not curing cancer.

As a cancer researcher, I read this as a just criticism of my work.
kurthr · almost 6 years ago
Ahhh, the epsilon-greedy strategy... I forgot I had read this until I was a few paragraphs in. If you have rare positive results, it gives you better statistics on your highest-signal tests while still evaluating the others (i.e. more than one alternative).
NewsAware · almost 6 years ago
At least for us, most A/B/... tests require a stable assignment of test variant to user, as key metrics like retention would obviously be skewed by randomly reassigning on each visit or page view.
everyone · almost 6 years ago
"hundreds of the brightest minds of modern civilization have been hard at work not curing cancer. Instead, they have been refining techniques for getting you and me to click on banner ads."

Just out of curiosity... have you *ever* purposefully clicked on an ad on the internet? I honestly don't think I ever have.

PS: I mean an outright, overt, straight-up ad, not, for example, some article linked on HN that is a thinly veiled promo piece for something.
moultano · almost 6 years ago
Does anyone have a reference for solving multi-armed bandit problems with a finite time horizon? I would like something that derives rules or heuristics for how the explore/exploit tradeoff changes as the horizon approaches.

This seems like an obvious extension, and something someone should have worked on given how long this problem has been around, but I've been unable to find anything on it. Any pointers?
gchamonlive · almost 6 years ago
For me the website is down (it's 404'ing, accessing from Brazil), but it can still be accessed on the Wayback Machine: https://web.archive.org/web/20190527144648/https://stevehanov.ca/blog/?id=132
cfarm · almost 6 years ago
MAB seems to find a local maximum subject to input biases, whereas an A/B test aims to figure out a scientific truth and isolates all potential biases in the system. I'd be curious to hear of cases where a MAB approach and an A/B test did not yield the same results, and why that happened.
jedberg · almost 6 years ago
Before I even clicked, I knew this was going to be about multi-armed bandits. I agree with the author: why don't more people know about this? It's so effective and easy, and you get results much more quickly.
Aeolun · almost 6 years ago
How is a bandit related to this?

Based on just the article, it could be a multi-armed chef.
Causality1 · almost 6 years ago
I encounter A/B testing primarily in article headlines, where the choice is always between "accurate, informative title" and "inflammatory, sensational bullshit clickbait", and the clickbait wins every time.
messe · almost 6 years ago
Despite being just text, images, and tables, the post doesn't appear at all with JS disabled.