The main reason AlphaGo Zero learns so much faster than its predecessors is that it uses temporal-difference learning.[1] This effectively prunes away a <i>huge</i> part of the state space the learning algorithm would otherwise have to search, since it bakes in the assumption that a move's value ought to equal that of the best available move in the following board position, which is exactly what you'd expect for a game like Go.<p>A secondary reason for AlphaGo Zero's performance is that it combines the value and policy networks into a single network, since it's redundant to have two networks for move selection.<p>These are the two biggest distinguishing characteristics of AlphaGo Zero compared to previous AlphaGos, and the OP doesn't discuss either of them.<p>[1] <a href="https://en.wikipedia.org/wiki/Temporal_difference_learning" rel="nofollow">https://en.wikipedia.org/wiki/Temporal_difference_learning</a>
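To make the bootstrapping idea concrete, here is a minimal tabular TD(0)-style update in Python (a toy sketch of the general technique from the linked article, not AlphaGo Zero's actual training code; the state names and parameters are made up):

    # Nudge the value of a position toward the value of its successor,
    # instead of waiting for the final game outcome.
    def td_update(V, s, s_next, reward=0.0, alpha=0.1, gamma=1.0):
        v_s = V.get(s, 0.0)
        target = reward + gamma * V.get(s_next, 0.0)   # the TD(0) target
        V[s] = v_s + alpha * (target - v_s)

    V = {"position_b": 0.8}                  # hypothetical successor value
    td_update(V, "position_a", "position_b")
    print(V["position_a"])                   # 0.08: moved toward successor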
If the author's here: some of the math formulas don't render correctly. In particular, 10^170 is parsed as 10^{1}70, and $5478$ shows up without TeX applied to it.
There are two things a human brain does when playing chess or Go: evaluating a position, and mentally playing out some positions (by building a search tree).<p>The AlphaGo neural network is able to do the first part (evaluating positions), but the search tree is still a hand-crafted algorithm. Do they have plans to work on a version with a pure neural network? (i.e. a version that would be able to learn how to do the tree search itself.)
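For reference, the hand-crafted part in question roughly boils down to a fixed selection formula; a stripped-down sketch (my simplification, not DeepMind's code), where only the prior P and the leaf evaluations come from the network:

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        P: float = 1.0      # prior probability from the policy head
        N: int = 0          # visit count
        W: float = 0.0      # total value backed up through this node
        children: list = field(default_factory=list)

        @property
        def Q(self):        # mean value of the subtree
            return self.W / self.N if self.N else 0.0

    def select_child(node, c_puct=1.0):
        # The hand-crafted rule: always descend into the child
        # maximizing Q + U, a formula no network ever learns.
        sqrt_total = math.sqrt(sum(c.N for c in node.children) or 1)
        return max(node.children,
                   key=lambda c: c.Q + c_puct * c.P * sqrt_total / (1 + c.N))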
Would be really cool to see a generic framework for this, where you can plug in the rules of your discrete-deterministic-game-with-perfect-information and get a superhuman bot. Does something like this already exist?
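As a sketch of what the plug-in surface of such a framework might look like (a hypothetical interface, not an existing library):

    from abc import ABC, abstractmethod

    class Game(ABC):
        """Contract a deterministic perfect-information game would
        implement to plug into a generic AlphaZero-style trainer."""

        @abstractmethod
        def initial_state(self): ...

        @abstractmethod
        def legal_moves(self, state): ...

        @abstractmethod
        def next_state(self, state, move): ...

        @abstractmethod
        def winner(self, state): ...  # None while running, else the winner

The self-play loop, the tree search, and the network would only ever touch the game through these four methods.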
Given that AlphaGo Zero was trained on several million games of self-play, each game involving hundreds of moves and each move using 1600 MCTS simulations, the total number of board positions it has considered is on the order of trillions. While impressive, this pales in comparison to the number of possible board positions, around 10^170 (<a href="https://en.m.wikipedia.org/wiki/Go_and_mathematics" rel="nofollow">https://en.m.wikipedia.org/wiki/Go_and_mathematics</a>). So its amazing performance tells us that:<p>1. Possibly the elegant rules of the game cut down the search space so much that there is a learnable function that gives us optimal MCTS supervision;<p>2. Or the CNN approximates human visual intuition so well that, while Zero has not evaluated that many board positions, it has evaluated all the positions humans have ever considered - so it remains possible that a different network could produce different strategies and be better than Zero.
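The "trillions" estimate is easy to sanity-check (the first two numbers below are my assumptions for "several million" and "hundreds"):

    games = 5e6            # "several million" self-play games (assumed)
    moves_per_game = 200   # "hundreds of steps" per game (assumed)
    sims_per_move = 1600   # MCTS simulations per move
    print(f"{games * moves_per_game * sims_per_move:.1e}")  # 1.6e+12

...which is about 10^12, some 158 orders of magnitude short of 10^170.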
> It is interesting to see how quickly the field of AI is progressing. Those who claim we will be able to see the robot overlords coming in time should take heed - these AI's will only be human-level for a brief instant before blasting past us into superhuman territories, never to look back.<p>This final paragraph is just editorializing. A computer will never care about anything (including games like Go and domination of other beings) that it is not programmed to imitate caring about, and will thus remain perennially unmotivated.<p>Also, my intuition says that gradient descent is an ugly hack and that there HAS to be some better way (like a direct way) to get at the inverse of a matrix (not just in specific cases but in the general case!). But I digress; not being a mathematician, perhaps I've missed a proof that a general method to directly and efficiently invert all possible matrices is impossible.
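For what it's worth, direct methods do exist for the linear case: Gaussian elimination (LU factorization) inverts any nonsingular matrix in O(n^3) operations, and singular matrices have no inverse at all. A toy contrast between the iterative and direct routes (illustrative only; the step size and iteration count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # well-conditioned
    b = rng.standard_normal(5)

    # Iterative route: gradient descent on ||Ax - b||^2
    x = np.zeros(5)
    for _ in range(10_000):
        x -= 1e-3 * (2 * A.T @ (A @ x - b))

    # Direct route: Gaussian elimination under the hood
    x_direct = np.linalg.solve(A, b)

    print(np.allclose(x, x_direct, atol=1e-4))  # True: same answer

Gradient descent earns its keep in deep learning not because direct solves are impossible, but because the loss there is non-linear in millions of parameters.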
I wonder how the STYLE of AlphaGo Zero is regarded by human experts. Is it far different from AlphaGo's? Why bother learning from AlphaGo if they can learn from AlphaGo Zero?<p>Did they unleash a second "Master" program?<p>I am wondering if the "better" strategy moves are now super wacky and weird and break all existing theory.
Someone posted an attempt at an open-source implementation of AlphaGo Zero: <a href="https://github.com/yhyu13/AlphaGOZero-python-tensorflow" rel="nofollow">https://github.com/yhyu13/AlphaGOZero-python-tensorflow</a><p>Has anyone tried it yet?
Saw the AlphaGo movie at a festival recently.<p>Been following the AlphaGo Zero developments, which leap-frog what was going on in the movie (although it is still very much worth seeing).<p>One thing I was curious about is whether Go would now be considered solved, either strongly or weakly, since at this point AlphaGo Zero doesn't seem beatable by any living human. Wikipedia does not list it as solved in either sense, and I was wondering if this was an oversight.
I don't get what is new in the set of attributes that this article describes.<p>Monte Carlo methods were already used in 2005 by AIs playing on KGS. Gradient descent is a basic algorithm that I saw in an AI class around 2008 as well. I bet both are even a lot older and well known to all experts.<p>This is not what makes AlphaGo special or Zero successful. The curious thing about Zero is that with gradient descent you usually run a huge risk of landing in a local optimum and then stop improving, because every small step makes you no better than where you already are.<p>So one question is how they used these same old algorithms so much more efficiently, and the second question is how they overcame the local-optimum problem. Additionally, there may be other problems involved that experts know better than me.<p>But an explanation of basic algorithms can't be the answer.
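A toy illustration of the local-optimum concern, in one dimension (nothing to do with Zero's actual loss surface; the function and step size are made up):

    # f(x) = x^4 - 3x^2 + x has two basins. Plain gradient descent
    # ends up in whichever local minimum the starting point falls into.
    def grad(x):                 # f'(x) = 4x^3 - 6x + 1
        return 4 * x**3 - 6 * x + 1

    for x0 in (-2.0, 2.0):
        x = x0
        for _ in range(1000):
            x -= 0.01 * grad(x)
        print(f"start {x0:+.1f} -> stuck near {x:+.3f}")
    # start -2.0 -> stuck near -1.301 (the deeper basin)
    # start +2.0 -> stuck near +1.131 (the shallower one)

No step out of the shallower basin ever looks like an improvement, which is exactly the trap being described.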
Are there any plans to do this for Chess?<p>I imagine this is an iteration of the AlphaGo engine, and that the people working on it are very current with AlphaGo.<p>If Chess is similar enough, wouldn't DeepMind be able to bootstrap the game knowledge the same way? Perhaps this isn't a big goal, but Chess is Chess, after all.
Go has been studied for hundreds of years, in many cases by people who have studied the game since childhood and work on it as a full-time occupation.<p>The consequence of AlphaGo Zero is that it can, in a matter of days, disregard and surpass all human knowledge about the game.<p>Maximizing the score margin has long been equated with maximizing your probability of winning. AlphaGo doesn't play like that... it's a substantial paradigm shift. If you watch the early commentaries, you will see that human players initially called AlphaGo's moves mistakes because they were slow and wasted opportunities to take more territory, only to realize later that AlphaGo was actually winning.
"On a long enough timeline, everything is a discrete game." (With apologies to _Fight Club_)<p>Personally, I look forward to the day when the software I own works for me to the extent of optimizing the decisions I make during the day, even many mundane ones. Properly executed, such a system could make a big difference in my quality of life. I believe that a big piece that is missing is a solid life-model, a life-representation, that can be optimized. Once that is defined, an RNN or MCS can optimize it and I can reap the benefits.
I wonder how AlphaGo Zero would fare against the others if they were all using the same search algorithm, and I wonder how the search depth vs breadth changes in Zero compared to earlier variants.
As someone who has played many a game of Tic-Tac-Toe, I found the numerical examples really hard to follow. s(0,5) is obviously the winning move for the X player, but for some reason all examples seem to favor s(0,1).