There is a public distributed effort happening for Go right now: http://zero.sjeng.org/. They've been doing a fantastic job, and just recently fixed a big training bug that has resulted in a large strength increase.

I ported GCP's Go implementation over to chess: https://github.com/glinscott/leela-chess. The distributed part isn't ready to go yet; we are still working out the bugs using supervised training, but we will be launching soon!
In 1989, Victor Allis "solved" the game of Connect 4, proving (apparently) that the first player can always force a win with perfect play.

In 1996, Giuliano Bertoletti implemented Victor Allis's strategy in a program named Velena:

http://www.ce.unipr.it/~gbe/velena.html

It's written in C. If someone can get it to compile on a modern system, it would be interesting to see how well the AlphaZero approach fares against a supposedly perfect AI.
Can someone share some intuition about the tradeoffs between Monte Carlo tree search and vanilla policy-gradient reinforcement learning?

MCTS has gotten really popular since AlphaZero, but it's not clear to me how it compares to simpler reinforcement learning techniques that just have a softmax output over the possible moves the agent can make. My intuition is that MCTS is better for planning, but takes longer to train/evaluate. Is that true? Are there games where one will work better than the other?
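For concreteness, here is a rough numpy sketch (my own, not from the article) of the two decision rules side by side: a plain softmax policy just samples a move from its output, while AlphaZero-style MCTS uses those same softmax outputs as priors in the PUCT selection rule and, after many simulations, plays the most-visited move. The logits, simulation count, and the random placeholder leaf values are all made up for illustration.

    import numpy as np

    # Hypothetical single state with 3 legal moves; the policy net outputs logits.
    logits = np.array([1.0, 0.5, -0.2])
    priors = np.exp(logits) / np.exp(logits).sum()   # softmax over moves

    # Vanilla policy-gradient play: sample a move straight from the softmax.
    pg_move = np.random.choice(len(priors), p=priors)

    # AlphaZero-style play: run simulations, picking children with the PUCT rule,
    # which blends the prior with value estimates accumulated during search.
    N = np.zeros(3)          # visit count per move
    W = np.zeros(3)          # total value per move
    c_puct = 1.5
    for _ in range(100):
        Q = np.where(N > 0, W / np.maximum(N, 1), 0.0)
        # "+ 1" inside the sqrt just keeps the first pick from being all zeros.
        U = c_puct * priors * np.sqrt(N.sum() + 1) / (1 + N)
        a = int(np.argmax(Q + U))
        # In a real implementation this value comes from recursing down the tree
        # and evaluating the leaf with the value head; here it's a stand-in.
        value = np.random.uniform(-1, 1)
        N[a] += 1
        W[a] += value

    mcts_move = int(np.argmax(N))  # final move: most-visited child
    print(pg_move, mcts_move)

The tradeoff is roughly what the sketch suggests: the search spends extra compute on every move, but the resulting visit counts tend to be a much stronger training target than the raw sampled move used in plain policy gradient.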
Shameless self-plug: I spent a Saturday morning recently doing a similar thing (no Monte Carlo, no AI library) with tic-tac-toe. I based it mostly on intuition; would love any feedback.

https://github.com/frenchie4111/genetic-algorithm-playground/blob/master/tictactoe.ipynb
Use this to get rid of the obnoxiously large sticky header: https://alisdair.mcdiarmid.org/kill-sticky-headers/
> Not quite as complex as Go, but there are still 4,531,985,219,092 game positions in total, so not trivial for a laptop to learn how to play well with zero human input.<p>That's a small enough state space that it is indeed trivial to brute force it on a laptop.<p>Putting aside that though, it would be interesting to compare vs a standard alpha-beta pruning minimax algorithm running at various depth levels.
Thanks for the great demo! Uploaded to Azure Notebooks in case anyone wants to run/play/edit:

https://notebooks.azure.com/smortaz/libraries/Demo-DeepReinforcementLearning

Click Clone to get your own copy, then run the run.ipynb file.
As an aside, does anybody know the monospace font we see in the screenshots? Here, for instance: https://cdn-images-1.medium.com/max/1200/1*8zfDGlLuXfiLGnWlzvZwmQ.png
RISE Lab's Ray platform (which now includes RLlib) is another option: https://www.oreilly.com/ideas/introducing-rllib-a-composable-and-scalable-reinforcement-learning-library
Is there a magic incantation I have to say to get this to run? Jupyter says I'm missing things when I try to run it (despite installing those things with pip).
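(For anyone hitting the same thing: one common culprit, assuming this is what's going on here, is that the Jupyter kernel runs a different Python environment than the pip you installed into. A quick check, with example package names only:

    # Check which Python the notebook kernel is actually running, then install
    # the missing packages into that same environment.
    import subprocess
    import sys

    print(sys.executable)
    subprocess.check_call([sys.executable, "-m", "pip", "install", "numpy", "matplotlib", "keras"])

If the printed path isn't the Python you ran pip with, that's the mismatch.)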
If you actually want to contribute towards an open-source AlphaZero implementation, you may want to check out https://github.com/gcp/leela-zero