Honestly, whenever I read anything written about AlphaGo, I want to start pulling my hair out at the inanity of it. Look at this utterly uninformed comment, for example:

> As far as algorithmic ingenuity goes, this is pretty much all there is to it. With all the hype surrounding AlphaGo's victory this year, its success is just as much if not more attributable to data, compute power and infrastructure advancements than algorithmic wizardry.
Like, is anything about this sentence even approximately true??? There was nothing new about the datasets used (and to this day I can't understand why they use the exact datasets they do: KGS for the predictive net and Tygem for the rollout softmax, both amateur-player databases, while not even touching the GoGoD database of pro games; other teams seem to get comparable or better prediction results with it). Nothing new about using a sizable cluster to run a go-playing algorithm (the practical limit on cluster size was and is the point where your algorithm hits steep diminishing returns, but MoGo was already using rather big ones). Nothing new about training on such a dataset for go. The only damn difference was PRECISELY the algorithms used, in exact contradiction to the claim here! The nugget of truth is that the component algorithms aren't terribly innovative in themselves, just novel in their application to this problem - but the training setup and training targets are literally groundbreaking.

Rewind a little, to the end of 2014, and you'll find results from an Oxford team, and from Google's team as well, demonstrating a large improvement on the move prediction task - given a board position, predict the next move played in an actual game - by using a convnet. That was the first sign that deep learning had potential in go, though at that point it didn't make for a particularly strong player. Systems were just getting to the point of effectively biasing the tree search with such a convnet, for decent gains, when the AlphaGo result was announced and dwarfed these already exciting advances!

So clearly it's not about "just" using more computers or bigger datasets, nor just doing the straightforward thing and applying deep learning to the problem; all of the above was done before AlphaGo. Yes, it helped, but there was no contest between the 7d KGS amateur rank the best of the rest were reaching and the at-or-beyond-top-human level AlphaGo reached.

AlphaGo came up with the second component, and in doing so solved a problem long thought infeasible in the computer go world: building an evaluation function for go. The entire Monte Carlo tree search revolution of the mid-to-late 2000s in computer go was about sidestepping the problem of judging whether a given board position is good or bad, by instead running full stochastic playthroughs of the game and scoring those. AlphaGo, on the other hand, first created a decent-ish player network (honestly nothing special - 5d KGS; that's the one finetuned by reinforcement learning, and notably it isn't even part of the final configuration but effectively just generates a large dataset, because humans haven't played enough games in all of history for this training setup), then had that network play out a large set of games, and then trained (supervised!) a net to predict the game outcome given a board position (taking just one position from each game, somewhat conservatively avoiding overfitting that way).

THIS is the genius of the AlphaGo algorithm: it is Monte Carlo tree search, biased by a convnet, rolled out by a softmax, and crucially with an evaluation function whose output is mixed 50%-50% with rollout scores.

THAT is a completely novel algorithm; nothing in the literature is particularly like it, and in particular the evaluation function and its mix with rollouts were not considered possible! The overall shape is roughly the sketch below.
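To be concrete about what that structure is, here's a minimal sketch of it: PUCT-style selection biased by a policy prior, and leaf values that are a 50%-50% mix of a value function and a rollout. I'm using a toy Nim game and dummy stand-ins for the three learned networks so it actually runs; every function name below is my own placeholder, not the paper's.

```python
import math
import random

# Toy stand-in for go so the sketch actually runs: a Nim pile where each
# move takes 1-3 stones and taking the last stone wins.

def legal_moves(n):
    return [m for m in (1, 2, 3) if m <= n]

# --- dummy placeholders for AlphaGo's three learned components ---

def policy_prior(n):
    """Stands in for the SL policy convnet: a prior probability per move."""
    moves = legal_moves(n)
    return {m: 1.0 / len(moves) for m in moves}  # uniform placeholder

def value_net(n):
    """Stands in for the value net: a score in [-1, 1] for the side to move."""
    return 0.0  # placeholder: every position looks 'even'

def rollout(n):
    """Stands in for the fast rollout softmax: random play to the end.
    Returns +1 if the side to move at this position wins, else -1."""
    to_move = +1
    while True:
        n -= random.choice(legal_moves(n))
        if n == 0:
            return to_move  # this side took the last stone and won
        to_move = -to_move

LAMBDA = 0.5  # the 50%-50% mixing weight

def evaluate_leaf(n):
    """The crucial bit: average the value net's judgment with one rollout."""
    return (1 - LAMBDA) * value_net(n) + LAMBDA * rollout(n)

class Node:
    def __init__(self, n, prior=1.0):
        self.n = n            # game state (stones left)
        self.prior = prior    # convnet prior for the move leading here
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0  # accumulated value, from this node's side to move

def select_child(node, c_puct=1.0):
    """PUCT-style selection: mean value plus an exploration bonus that is
    proportional to the prior -- the 'search biased by a convnet' part."""
    total = sum(c.visits for c in node.children.values())
    def score(child):
        # child values are from the child's side to move, hence the minus sign
        q = -child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)
        return q + u
    return max(node.children.values(), key=score)

def simulate(root):
    """One MCTS simulation: select down, expand, evaluate, back up."""
    node, path = root, [root]
    while node.children:
        node = select_child(node)
        path.append(node)
    if node.n == 0:
        v = -1.0  # the side to move faces an empty pile: it has already lost
    else:
        for move, p in policy_prior(node.n).items():  # expand the leaf
            node.children[move] = Node(node.n - move, prior=p)
        v = evaluate_leaf(node.n)  # the mixed evaluation
    for nd in reversed(path):  # back up, flipping perspective each ply
        nd.visits += 1
        nd.value_sum += v
        v = -v

if __name__ == "__main__":
    root = Node(10)
    for _ in range(2000):
        simulate(root)
    best = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
    print("most-visited move: take", best, "stones")  # optimal here is 2
```

All of the novelty the quoted comment hand-waves away lives in `evaluate_leaf` and in `prior` showing up inside `select_child`; the rest is the standard MCTS loop everyone already had.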
And it works orders of magnitude better than any other Monte Carlo tree search tried since 2006, as well as orders of magnitude better than the other deep-learning-biased approaches tried since 2014.

And to think that the least of these things - out of all the work done on AlphaGo since 2014, the 3 days (!!!) spent finetuning (!!!) the prediction net by reinforcement learning to make it stronger (though still just a 5d amateur), purely so it could generate the training set needed for another component - is the only thing the comment mentions about their approach!?
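For what it's worth, that dataset-generation step the comment fixates on is tiny to sketch. A rough, hypothetical illustration of its shape, on the same toy game, with a dummy policy standing in for the finetuned net (again, all names are mine, not the paper's):

```python
import random

# Sketch of the value-net dataset trick: let the finetuned policy play
# itself, keep exactly ONE position per game, label it with the final
# outcome. Toy Nim game again (take 1-3 stones, last take wins).

def legal_moves(n):
    return [m for m in (1, 2, 3) if m <= n]

def rl_policy(n):
    """Placeholder for the RL-finetuned policy net: here, uniformly random."""
    return random.choice(legal_moves(n))

def self_play_game(start=21):
    """Play one game; return every (position, side_to_move) plus the winner."""
    n, side, history = start, +1, []
    while True:
        history.append((n, side))
        n -= rl_policy(n)
        if n == 0:
            return history, side  # the side that took the last stone won
        side = -side

def value_net_dataset(games=10000):
    """One (position, outcome) example per game -- the supervised target
    for the value net, sampled this way to keep examples decorrelated."""
    data = []
    for _ in range(games):
        history, winner = self_play_game()
        pos, side = random.choice(history)
        data.append((pos, +1 if side == winner else -1))
    return data

if __name__ == "__main__":
    data = value_net_dataset(1000)
    print(len(data), "examples; e.g. (position, outcome for side to move):", data[0])
```

The interesting part is what it is not: no search, no clever loss. The entire supervised signal for the evaluation function is just "who eventually won from here", and the one-position-per-game sampling keeps the near-duplicate positions within a single game from overfitting the net.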