
Introduction to Monte Carlo Tree Search

173 points by wlkr, over 9 years ago

4 comments

shoo, over 9 years ago
UCB bandit algorithms are quite simple to understand, and the generalisation of them by the UCT algorithm to tree search spaces seems like a pretty clean idea.

I had a go at using UCT to explore a search space earlier this year (not games, but selecting a good sequence of decisions inside a simulation).

The problem I ran into with this approach was that the algorithm has no way to generalise or lift experience from one explored state to other similar / unexplored states, and my search space was large. So having to evaluate each state at least once was prohibitive.

This isn't really a criticism of how well UCT can tackle the problem as framed, but more a criticism of how much better one might be able to do with a bit more structure -- e.g. perhaps with the addition of some kind of similarity metric between states, along with some bounds or assumptions on how the values of "similar" states should relate to each other.

It seems like combining some machine learning / statistical / approximation techniques to approximate the true state->value function with a UCT-like approach to guide exploration could be very effective for the right problem, but getting it to work well might be more of an art than a science.

If anyone has any pointers to reading about this kind of thing, I'd be most grateful! I have previously read about "approximate dynamic programming" techniques, e.g. [1]

[1] http://castlelab.princeton.edu/Papers/Powell-NRLWhat%20you%20should%20know%20about%20approximate%20dynamic%20programming.pdf
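For readers unfamiliar with the UCB bandit algorithms the comment starts from: a minimal sketch of UCB1 (the rule UCT applies at each tree node) is below. All names here are illustrative, not from the article; the exploration constant `c = sqrt(2)` is the standard textbook choice.

```python
import math
import random

def ucb1_select(counts, values, c=math.sqrt(2)):
    """Pick the arm maximizing mean + c * sqrt(ln(N) / n_i).

    counts[i] -- number of times arm i has been pulled
    values[i] -- running mean reward of arm i
    Every arm is tried once before any UCB comparison.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # explore each arm at least once
    total = sum(counts)
    scores = [v + c * math.sqrt(math.log(total) / n)
              for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy bandit loop against Bernoulli arms with hidden success rates.
probs = [0.2, 0.5, 0.8]
counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]
random.seed(0)
for _ in range(2000):
    i = ucb1_select(counts, values)
    reward = 1.0 if random.random() < probs[i] else 0.0
    counts[i] += 1
    values[i] += (reward - values[i]) / counts[i]  # incremental mean

best = max(range(3), key=counts.__getitem__)
```

Over many pulls the bound concentrates play on the best arm while still occasionally revisiting the others; UCT applies exactly this rule recursively, treating each node's children as arms.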
jasisz, over 9 years ago
This algorithm (in its regular form, as often used in games and examples) has one interesting "downside" I was exploring some time ago: selection is performed using the UCB formula, so it basically tries to maximize the player's payout. But in most games this is an impractical assumption, because we end up tending to expand branches that will most likely not be chosen by our opponent. As in the example (I assume gray moves are "our" moves), we are much more likely to expand the 5/6 branch instead of the 2/4 branch, which would in fact be the one more likely chosen by our opponent.
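One common way MCTS implementations handle the perspective issue raised above is the "negamax" convention: each node stores its win count from the viewpoint of the player to move at its *parent*, so maximizing the UCB score at every level automatically models the opponent minimizing our payout. A minimal sketch (names and structure are illustrative, not from the article):

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []   # list of Node
        self.visits = 0
        # Wins recorded from the perspective of the player to move at
        # the parent -- the negamax convention. During backpropagation
        # a result good for one side is credited as a loss one level up.
        self.wins = 0.0

def uct_score(child, parent_visits, c=math.sqrt(2)):
    # Because of the storage convention above, the side to move at the
    # parent can always *maximize* this score, at every depth.
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def select(node):
    # Descend while all children have been visited; stop at a leaf or
    # at a node that still has an unvisited child to expand.
    while node.children and all(ch.visits > 0 for ch in node.children):
        node = max(node.children,
                   key=lambda ch: uct_score(ch, node.visits))
    return node
```

With this convention there is no separate min step: the opponent's preferences are baked into the stored statistics, so the 2/4-style branch the comment mentions is exactly the one selection favors when it is the opponent's turn.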
JD557, over 9 years ago
A quick note to the author (in case he reads this): I think the "4/8" node should be "5/8" in all images.
CyberDildonics, over 9 years ago
sounds a lot like gaussian kd-trees

http://graphics.stanford.edu/papers/gkdtrees/gkdtrees.pdf

Instead of doing a brute force nearest neighbor search, samples are propagated through a tree.

This technique is exotic, but not new.