The main reason AlphaGo Zero learns so much faster than its predecessors is that it uses temporal-difference learning.[1] This effectively prunes away a <i>huge</i> part of the state space the learning algorithm would otherwise have to search, since it bakes in the assumption that a move's value ought to equal that of the best available move in the following board position, which is exactly what you'd expect for a game like Go.<p>A secondary reason for AlphaGo Zero's performance is that it combines the value and policy networks into a single network, since it's redundant to have two networks for move selection.<p>These are the two biggest distinguishing characteristics of AlphaGo Zero compared to previous AlphaGos, and the OP doesn't discuss either of them.<p>[1] <a href="https://en.wikipedia.org/wiki/Temporal_difference_learning" rel="nofollow">https://en.wikipedia.org/wiki/Temporal_difference_learning</a>
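To make the bootstrapping idea concrete, here is a minimal tabular TD(0)-style update in Python (a toy sketch of the general technique from the linked article, not AlphaGo Zero's actual training code; the state names and parameters are made up):

    # Nudge the value of a position toward the value of its successor,
    # instead of waiting for the final game outcome.
    def td_update(V, s, s_next, reward=0.0, alpha=0.1, gamma=1.0):
        v_s = V.get(s, 0.0)
        target = reward + gamma * V.get(s_next, 0.0)   # the TD(0) target
        V[s] = v_s + alpha * (target - v_s)

    V = {"position_b": 0.8}                  # hypothetical successor value
    td_update(V, "position_a", "position_b")
    print(V["position_a"])                   # 0.08: moved toward successor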
If the author's here: some of the math formulas don't render correctly. In particular, 10^170 is parsed as 10^{1}70, and $5478$ shows up without TeX applied to it.
There are two things a human brain does when playing chess or Go: evaluating a position, and mentally playing out some positions (by building a search tree).<p>The AlphaGo neural network is able to do the first part (evaluating positions), but the search tree is still a hand-crafted algorithm. Do they have plans to work on a version with a pure neural network? (i.e. a version that would be able to learn how to do the tree search itself.)
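For reference, the hand-crafted part in question roughly boils down to a fixed selection formula; a stripped-down sketch (my simplification, not DeepMind's code), where only the prior P and the leaf evaluations come from the network:

    import math
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        P: float = 1.0      # prior probability from the policy head
        N: int = 0          # visit count
        W: float = 0.0      # total value backed up through this node
        children: list = field(default_factory=list)

        @property
        def Q(self):        # mean value of the subtree
            return self.W / self.N if self.N else 0.0

    def select_child(node, c_puct=1.0):
        # The hand-crafted rule: always descend into the child
        # maximizing Q + U, a formula no network ever learns.
        sqrt_total = math.sqrt(sum(c.N for c in node.children) or 1)
        return max(node.children,
                   key=lambda c: c.Q + c_puct * c.P * sqrt_total / (1 + c.N))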
Would be really cool to see a generic framework for this, where you can plug in the rules of your discrete-deterministic-game-with-perfect-information and get a superhuman bot. Does something like this already exist?
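As a sketch of what the plug-in surface of such a framework might look like (a hypothetical interface, not an existing library):

    from abc import ABC, abstractmethod

    class Game(ABC):
        """Contract a deterministic perfect-information game would
        implement to plug into a generic AlphaZero-style trainer."""

        @abstractmethod
        def initial_state(self): ...

        @abstractmethod
        def legal_moves(self, state): ...

        @abstractmethod
        def next_state(self, state, move): ...

        @abstractmethod
        def winner(self, state): ...  # None while running, else the winner

The self-play loop, the tree search, and the network would only ever touch the game through these four methods.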
Given that AlphaGo Zero was trained on several million games of self-play, each game involving hundreds of moves and each move using 1600 MCTS simulations, the total number of board positions it has considered is on the order of trillions. While impressive, this pales in comparison to the number of possible board positions, around 10^170 (<a href="https://en.m.wikipedia.org/wiki/Go_and_mathematics" rel="nofollow">https://en.m.wikipedia.org/wiki/Go_and_mathematics</a>). So its amazing performance tells us that:<p>1. Possibly the elegant rules of the game cut down the search space so much that there is a learnable function that gives us optimal MCTS supervision;<p>2. Or the CNN approximates human visual intuition so well that, while Zero has not evaluated that many board positions, it has evaluated all the positions humans have ever considered - so it remains possible that a different network could produce different strategies and be better than Zero.
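The "trillions" estimate is easy to sanity-check (the first two numbers below are my assumptions for "several million" and "hundreds"):

    games = 5e6            # "several million" self-play games (assumed)
    moves_per_game = 200   # "hundreds of steps" per game (assumed)
    sims_per_move = 1600   # MCTS simulations per move
    print(f"{games * moves_per_game * sims_per_move:.1e}")  # 1.6e+12

...which is about 10^12, some 158 orders of magnitude short of 10^170.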
> It is interesting to see how quickly the field of AI is progressing. Those who claim we will be able to see the robot overlords coming in time should take heed - these AI's will only be human-level for a brief instant before blasting past us into superhuman territories, never to look back.<p>This final paragraph is just editorializing. A computer will never care about anything (including games like Go and domination of other beings) that it is not programmed to imitate caring about, and will thus remain perennially unmotivated.<p>Also, my intuition says that gradient descent is an ugly hack and that there HAS to be some better way (like a direct way) to get at the inverse of a matrix (not just in specific cases but in the general case!). But I digress; not being a mathematician, perhaps I've missed a proof that a general method to directly and efficiently invert all possible matrices is impossible.
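For what it's worth, direct methods do exist for the linear case: Gaussian elimination (LU factorization) inverts any nonsingular matrix in O(n^3) operations, and singular matrices have no inverse at all. A toy contrast between the iterative and direct routes (illustrative only; the step size and iteration count are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 5 * np.eye(5)  # well-conditioned
    b = rng.standard_normal(5)

    # Iterative route: gradient descent on ||Ax - b||^2
    x = np.zeros(5)
    for _ in range(10_000):
        x -= 1e-3 * (2 * A.T @ (A @ x - b))

    # Direct route: Gaussian elimination under the hood
    x_direct = np.linalg.solve(A, b)

    print(np.allclose(x, x_direct, atol=1e-4))  # True: same answer

Gradient descent earns its keep in deep learning not because direct solves are impossible, but because the loss there is non-linear in millions of parameters.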
I wonder how the STYLE of AlphaGo Zero is regarded by human experts. Is it far different from AlphaGo's? Why bother learning from AlphaGo if they can learn from AlphaGo Zero?<p>Did they unleash a second "Master" program?<p>I am wondering if the "better" strategy moves are now super wacky and weird and break all existing theory.
Someone posted an attempt at an open-source implementation of AlphaGo Zero: <a href="https://github.com/yhyu13/AlphaGOZero-python-tensorflow" rel="nofollow">https://github.com/yhyu13/AlphaGOZero-python-tensorflow</a><p>Has anyone tried it yet?
Saw the AlphaGo movie at a festival recently.<p>Been following the AlphaGo Zero developments, which leap-frog what was going on in the movie (although it is still very much worth seeing).<p>One thing I was curious about is whether Go would now be considered solved, either strongly or weakly, since at this point AlphaGo Zero doesn't seem beatable by any living human. Wikipedia does not list it as solved in either sense, and I was wondering if this was an oversight.
I don't get what is new in the set of attributes that this article describes.<p>Monte Carlo methods were already used in 2005 by AIs playing on KGS. Gradient descent is a basic algorithm that I saw in an AI class around 2008 as well. I bet both are even a lot older and well known to all experts.<p>This is not what makes AlphaGo special or Zero successful. The curious thing about Zero is that with gradient descent you usually run a huge risk of landing in a local optimum and then stop improving, because every small step makes you no better than where you already are.<p>So one question is how they used these same old algorithms so much more efficiently, and the second question is how they overcame the local-optimum problem. Additionally, there may be other problems involved that experts know better than me.<p>But an explanation of basic algorithms can't be the answer.
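A toy illustration of the local-optimum concern, in one dimension (nothing to do with Zero's actual loss surface; the function and step size are made up):

    # f(x) = x^4 - 3x^2 + x has two basins. Plain gradient descent
    # ends up in whichever local minimum the starting point falls into.
    def grad(x):                 # f'(x) = 4x^3 - 6x + 1
        return 4 * x**3 - 6 * x + 1

    for x0 in (-2.0, 2.0):
        x = x0
        for _ in range(1000):
            x -= 0.01 * grad(x)
        print(f"start {x0:+.1f} -> stuck near {x:+.3f}")
    # start -2.0 -> stuck near -1.301 (the deeper basin)
    # start +2.0 -> stuck near +1.131 (the shallower one)

No step out of the shallower basin ever looks like an improvement, which is exactly the trap being described.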
Are there any plans to do this for Chess?<p>I imagine this is an iteration of the AlphaGo engine, and that the people working on it are very current with AlphaGo.<p>If Chess is similar enough, wouldn't DeepMind be able to bootstrap the game knowledge the same way? Perhaps this isn't a big goal, but Chess is Chess, after all.
Go has been studied for hundreds of years, in many cases by people who have studied the game since childhood and work on it as a full-time occupation.<p>The consequence of AlphaGo Zero is that it can, in a matter of days, disregard and surpass all human knowledge about the game.<p>Maximizing the score margin has long been equated with maximizing your probability of winning. AlphaGo doesn't play like that... it's a substantial paradigm shift. If you watch the early commentaries, you will see that human players initially called AlphaGo's moves mistakes because they were slow and wasted opportunities to take more territory, only to realize later that AlphaGo was actually winning.
"On a long enough timeline, everything is a discrete game." (With apologies to _Fight Club_)<p>Personally, I look forward to the day when the software I own works for me to the extent of optimizing the decisions I make during the day, even many mundane ones. Properly executed, such a system could make a big difference in my quality of life. I believe that a big piece that is missing is a solid life-model, a life-representation, that can be optimized. Once that is defined, an RNN or MCS can optimize it and I can reap the benefits.
I wonder how AlphaGo Zero would fare against the others if they were all using the same search algorithm, and I wonder how the search depth vs breadth changes in Zero compared to earlier variants.
As someone who has played many a game of Tic-Tac-Toe, I found the numerical examples really hard to follow. s(0,5) is obviously the winning move for the X player, but for some reason all examples seem to favor s(0,1).