UCB bandit algorithms are quite simple to understand, and the generalisation of them by the UCT algorithm to tree search spaces seems like a pretty clean idea.

I had a go at using UCT to explore a search space earlier this year (not games, but selecting a good sequence of decisions inside a simulation).

The problem I ran into with this approach was that the algorithm has no way to generalise or lift experience from one explored state to other similar / unexplored states, and my search space was large. So having to evaluate each state at least once was prohibitive.

This isn't really a criticism of how well UCT can tackle the problem as framed, but more a criticism of how much better one might be able to do with a bit more structure -- e.g. perhaps with the addition of some kind of similarity metric between states, along with some bounds or assumptions on how the values of "similar" states should relate to each other.

It seems like combining some machine learning / statistical / approximation techniques to approximate the true state->value function with a UCT-like approach to guide exploration could be very effective for the right problem, but getting it to work well might be more of an art than a science.

If anyone has any pointers to reading about this kind of thing, I'd be most grateful! I have previously read about "approximate dynamic programming" techniques, e.g. [1]

[1] http://castlelab.princeton.edu/Papers/Powell-NRLWhat%20you%20should%20know%20about%20approximate%20dynamic%20programming.pdf
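To make the selection step concrete, here's a minimal sketch of UCB1-style selection at a tree node (the names `Node` and `ucb1_score` are mine, not from any particular library). The infinite score given to unvisited children is exactly where the "every state must be evaluated at least once" cost comes from, and it's also where a learned value estimate could plug in as a prior:

    import math

    class Node:
        """One state in the search tree, with visit count and accumulated reward."""
        def __init__(self, state, parent=None):
            self.state = state
            self.parent = parent
            self.children = []
            self.visits = 0
            self.total_reward = 0.0

    def ucb1_score(child, parent_visits, c=1.4):
        """UCB1: empirical mean payout plus an exploration bonus."""
        if child.visits == 0:
            # Unvisited states score +inf, so each one must be simulated at
            # least once -- the cost I hit. A learned value_estimate(child.state)
            # could be substituted here to generalise from similar, already-seen
            # states (hypothetical hook, not part of plain UCT).
            return float("inf")
        mean = child.total_reward / child.visits
        bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
        return mean + bonus

    def select_child(node):
        """UCT selection: descend to the child with the highest UCB1 score."""
        return max(node.children, key=lambda ch: ucb1_score(ch, node.visits))

This is only the selection rule; expansion, simulation and backup go around it in the usual way.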
This algorithm (in its regular form, as often used in games and examples) has one interesting "downside" I was exploring some time ago: selection is performed using the UCB formula, so it basically tries to maximize the player's payout. But in most games this is an impractical assumption, because we end up tending to expand branches that our opponent is unlikely to choose. As in the example (I assume gray moves are "our" moves), we are much more likely to expand the 5/6 branch instead of the 2/4 branch, even though the 2/4 branch is the one our opponent is in fact more likely to choose.
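A small sketch of the numbers in that example (my own illustration, not code from the article): if the UCB score is always computed from our payout, the 5/6 branch outscores the 2/4 branch at the opponent's node, even though the opponent, minimizing our payout, would steer the game toward 2/4.

    import math

    def ucb1(wins, visits, parent_visits, c=1.4):
        """UCB1 score computed from 'our' payout, regardless of whose turn it is."""
        return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

    # Two branches at an *opponent* node; stats are (our wins, visits).
    branches = {"5/6": (5, 6), "2/4": (2, 4)}
    parent_visits = 10

    scores = {name: ucb1(w, n, parent_visits) for name, (w, n) in branches.items()}
    expanded = max(scores, key=scores.get)
    opponent_pick = min(branches, key=lambda k: branches[k][0] / branches[k][1])

    print(expanded)       # "5/6": we keep spending simulations here...
    print(opponent_pick)  # "2/4": ...but this is the move the opponent prefers

Implementations commonly avoid this by scoring children from the perspective of the player to move (minimax / negamax-style backups); the plain single-perspective formula above doesn't do that.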
Sounds a lot like Gaussian kd-trees:

http://graphics.stanford.edu/papers/gkdtrees/gkdtrees.pdf

Instead of doing a brute force nearest neighbor search, samples are propagated through a tree.

This technique is exotic, but not new.
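For the flavour of the "tree instead of brute force" part, here's a tiny sketch using a plain kd-tree via SciPy. The Gaussian kd-tree in the paper does quite a bit more (it accelerates high-dimensional Gaussian filtering), so this only shows the basic substitution of a tree query for a linear scan:

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    samples = rng.standard_normal((10_000, 3))   # stored sample positions
    queries = rng.standard_normal((5, 3))        # points we want neighbours for

    # Brute force: O(N) distance computations per query.
    brute_idx = np.argmin(
        np.linalg.norm(samples[None, :, :] - queries[:, None, :], axis=-1), axis=1
    )

    # Tree-based: build once, then each query is roughly O(log N).
    tree = cKDTree(samples)
    _, tree_idx = tree.query(queries, k=1)

    assert np.array_equal(brute_idx, tree_idx)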