How AlphaZero Mastered Its Games

247 points by jsomers over 6 years ago

11 comments

glinscott over 6 years ago

James put together a really nice summary of the ideas and the projects!

It was almost a year ago that lc0 was launched; since then the community (led by Alexander Lyashuk, author of the current engine) has taken it to a totally different level. Follow along at http://lczero.org!

Gcp has also done an amazing job with Leela Zero, with a very active community on the Go side: http://zero.sjeng.org

Of course, DeepMind really did something amazing with AlphaZero. It’s hard to overstate how dominant minimax search has been in chess. For another approach (MCTS/NN) to even be competitive with 50+ years of research is amazing. And all that without any human knowledge!

Still, Stockfish keeps on improving - Stockfish 10 is significantly stronger than the version AlphaZero played in the paper (no fault of DeepMind; SF just improves quickly). We need a public exhibition match to settle the score, ideally with some GM commentary :). To complete the links, you can watch Stockfish improve here: http://tests.stockfishchess.org.

alan_wade over 6 years ago

God, what a well-written article! I don't have much to say on the subject, but this was pure joy to read; it's crazy good. Clear, engaging, to the point, making a difficult subject accessible without dumbing it down, no fluff or unnecessary side stories, just awesomeness.

stabbles over 6 years ago

What's very interesting is that the Komodo developers have implemented a Monte Carlo Tree Search version of their engine without neural nets for evaluation / move selection. This brand new engine can actually compete at the top level (still much worse than Stockfish and slightly worse than Lc0) [1] [2]

The exact implementation details are probably kept secret, but the idea is to do a few steps of minimax / alpha-beta rather than completely random play in the playout phase of MCTS.

This makes me think that the contribution of AlphaZero is not necessarily neural nets, but rather MCTS as a successful method to search the game tree efficiently.

[1] http://tcec.chessdom.com/
[2] http://www.chessdom.com/komodo-mcts-monte-carlo-tree-search-is-the-new-star-of-tcec/
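
To make the "alpha-beta instead of random playouts" idea concrete, here is a minimal sketch. It is not Komodo's implementation (those details are not public). It assumes a toy game (one pile of stones, take 1-3 per turn, taking the last stone wins), and every name in it (`Node`, `alphabeta`, `simulate`, `best_move`) is hypothetical. The "static evaluation" at the search horizon is just a placeholder that returns 0.

```python
import math

MOVES = (1, 2, 3)  # toy game (assumption): take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def alphabeta(stones, depth, alpha=-1.0, beta=1.0):
    """Depth-limited negamax with alpha-beta; value is for the side to move."""
    if stones == 0:
        return -1.0  # the previous player took the last stone, so the side to move lost
    if depth == 0:
        return 0.0   # horizon reached: placeholder static evaluation (assumption)
    best = -1.0
    for m in legal_moves(stones):
        best = max(best, -alphabeta(stones - m, depth - 1, -beta, -alpha))
        alpha = max(alpha, best)
        if alpha >= beta:
            break    # cutoff: the opponent would never allow this line
    return best


class Node:
    def __init__(self, stones):
        self.stones = stones
        self.children = {}    # move -> Node
        self.visits = 0
        self.value_sum = 0.0  # summed values, from the view of the side to move here


def select_move(node, c):
    """UCT: pick the move that looks best for the player to move at `node`."""
    def score(item):
        _, child = item
        if child.visits == 0:
            return math.inf
        mean = -child.value_sum / child.visits  # child's value belongs to the opponent
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.items(), key=score)[0]


def simulate(node, depth, c):
    """One MCTS iteration; returns the value for the side to move at `node`."""
    if node.stones == 0:
        value = -1.0
    elif not node.children:
        # Expansion + evaluation: a shallow alpha-beta search replaces the random playout.
        for m in legal_moves(node.stones):
            node.children[m] = Node(node.stones - m)
        value = alphabeta(node.stones, depth)
    else:
        move = select_move(node, c)
        value = -simulate(node.children[move], depth, c)
    node.visits += 1
    node.value_sum += value
    return value


def best_move(stones, iterations=2000, depth=2, c=1.4):
    root = Node(stones)
    for _ in range(iterations):
        simulate(root, depth, c)
    # Play the most-visited move, as is common in MCTS.
    return max(root.children, key=lambda m: root.children[m].visits)


if __name__ == "__main__":
    print(best_move(5))
```

In this toy game a pile that is a multiple of 4 is lost for the side to move, so from 5 stones the correct move is to take 1. A real engine would replace the placeholder evaluation with a handcrafted one and tune `depth` and `c`; the values here are arbitrary.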

YeGoblynQueenne over 6 years ago

>> In fact, less than two months later, DeepMind published a preprint of a third paper, showing that the algorithm behind AlphaGo Zero could be generalized to any two-person, zero-sum game of perfect information (that is, a game in which there are no hidden elements, such as face-down cards in poker).

I can't find this claim in the linked paper. What I can find is a statement that AlphaZero has demonstrated that 'a general-purpose reinforcement learning algorithm can achieve, tabula rasa, superhuman performance across many challenging domains'.

Personally, and I'm sorry to be so very negative about this, I don't even see the "many" domains. AlphaZero plays three games that are very similar to each other. Indeed, shogi is a variant of chess. There are certainly two-person, zero-sum, perfect-information games with radically different boards and pieces from either Go, or chess and shogi - say, the Royal Game of Ur [1], or Mancala [2], etc., not to mention stochastic games of perfect information, like backgammon, or asymmetric games like the hnefatafl games [3], and so on.

Most likely, AlphaZero can be trained to play many such games very powerfully, or at a superhuman level. The point, however, is that, currently, it hasn't. So no "demonstration" of general game-playing has taken place, and of course there is no such thing as some sort of theoretical analysis that would serve as proof, or indication, of such ability in any of the DeepMind papers.

I was hoping for less rah-rah cheerleading from the New Yorker, to be honest.

________________

[1] https://en.wikipedia.org/wiki/Royal_Game_of_Ur
[2] https://en.wikipedia.org/wiki/Mancala
[3] https://en.wikipedia.org/wiki/Tafl_games

cdelsolar over 6 years ago

Awesome article. Does anyone know how to begin applying the AlphaZero techniques to games where information is NOT perfect? I'm trying to apply it to Scrabble. There hasn't been much AI research on this game, and right now the best AI just uses brute-force Monte Carlo with a flawed evaluation function (which doesn't take into account the state of the board at all, just points and tiles remaining on the opponent's rack). It's still good enough to beat top human experts about half the time, but I want to make something better.

Is it impossible to apply to these types of games? Every time I read about AlphaZero, the articles mention that the techniques are meant for games of perfect information.

FPGAhacker over 6 years ago

They mention a documentary on Netflix about AlphaGo. Any recommendations for or against?

eismcc over 6 years ago

If you are interested in how to build a bot, Manning is having a Go bot competition: https://deals.manning.com/go-comp/

It’s been really fun to work through the books.

lgeorget over 6 years ago

The article is very well written, but this sentence felt a bit weird:

> Before there could be acceptance, there was depression. “I want to apologize for being so powerless,” he said in a press conference.

Lee Sedol was clearly upset, especially after the first two matches, but I think that apology was more out of politeness than depression, really.

deegles over 6 years ago

Is there a way to play against AlphaGo or an equivalent, but with adaptive difficulty? I know next to nothing about Go and think it would be interesting to learn it just by playing against a neural network. Maybe over time the strategies it uses would be "transferred" over to me!

pie_hacker over 6 years ago

The match between Stockfish and AlphaZero was played with certain unjustified parameters (time control, ponder off, different hardware, no opening book or endgame tablebase for Stockfish, etc.). By "unjustified," I mean that the authors of the paper did not justify their choice of parameters in the paper as being designed to implement a fair match.

At a glance, the parameters of the match seem unfair to me -- and tilted heavily towards AlphaZero. If the code were open source, this would not matter; anyone could run a rematch. As it is, I haven't seen any convincing evidence that AlphaZero is stronger than Stockfish when Stockfish is allowed to use its full breadth of knowledge and run on equal hardware.

tosser0001 over 6 years ago

> An expert human player is an expert precisely because her mind automatically identifies ...

The "Patronizing 'Her'"

Almost invariably, when the author decides to use the patronizing 'her' instead of the gender-neutral 'they', the piece is written by a man.
