Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

539 points by dennybritz over 7 years ago

40 comments

gwern over 7 years ago
This is an incredible demonstration that the AG Zero expert iteration method is a general method. If you go back to the discussions of AG Zero a month ago, there was a lot of skepticism that NNs would ever challenge Stockfish et al - they are just too good, too close to perfection, and chess is not well suited for MCTS and NNs. Well, it turns out that AG Zero doesn't just work as well in chess: it works *better*, as it only takes 4 hours of training to beat Stockfish. This is going to be an impetus for researchers to explore solving many more MDPs than just chess or Go using expert iteration... ("There is no fire alarm.")
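For anyone new to the term, "expert iteration" is the loop of search-guided self-play followed by retraining the network on the search's output. A minimal sketch in Python, assuming a generic game and network interface; Game's methods, Net.evaluate/fit, and run_mcts here are illustrative placeholders, not DeepMind's code:

    # Minimal sketch of AlphaZero-style expert iteration (illustrative only).
    import random

    def run_mcts(game, net, simulations=800):
        # Expert step: improve on the raw network policy by searching.
        # Stubbed here by reusing the network's prior over legal moves;
        # a real implementation would run PUCT-guided tree search.
        priors, _value = net.evaluate(game)
        moves = game.legal_moves()
        weights = {m: priors.get(m, 1e-8) for m in moves}
        total = sum(weights.values())
        return {m: w / total for m, w in weights.items()}

    def self_play_game(game, net):
        # Play one game against itself, recording (state, search policy) pairs.
        history = []
        while not game.is_terminal():
            pi = run_mcts(game, net)
            history.append((game.encode(), pi))
            move = random.choices(list(pi), weights=list(pi.values()))[0]
            game = game.play(move)
        z = game.outcome()  # +1 / 0 / -1 from the first player's perspective
        # a real implementation would flip the sign of z for the opponent's states
        return [(state, pi, z) for state, pi in history]

    def train(make_game, net, iterations, games_per_iteration):
        # Imitation step: regress the network toward the search results.
        for _ in range(iterations):
            batch = []
            for _ in range(games_per_iteration):
                batch.extend(self_play_game(make_game(), net))
            net.fit(batch)  # policy target = search probs, value target = outcome
        return net

In the paper, self-play game generation and network training run in parallel rather than in the strictly alternating loop sketched above.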
soveran over 7 years ago
The ten sample games:

Sample game 1: https://lichess.org/VMe0gfa2
Sample game 2: https://lichess.org/Zqwn4Gzk
Sample game 3: https://lichess.org/G2fPHci8
Sample game 4: https://lichess.org/LLt8wyYp
Sample game 5: https://lichess.org/3r6CXx3H
Sample game 6: https://lichess.org/sbdyUYS4
Sample game 7: https://lichess.org/88vsAftE
Sample game 8: https://lichess.org/1uvCwaeB
Sample game 9: https://lichess.org/743quCXj
Sample game 10: https://lichess.org/SkCjxXkb
xianshou over 7 years ago
One impressive statistic from the paper: AlphaZero analyzes 80,000 chess positions per second, while Stockfish looks at 70,000,000. Seventy million, three orders of magnitude higher. Yet AG0 beats Stockfish half the time as White and never loses with either color.

A stunning demonstration of generality indeed.
magoghm over 7 years ago
"We also analysed the relative performance of AlphaZero's MCTS search compared to the state-of-the-art alpha-beta search engines used by Stockfish and Elmo. AlphaZero searches just 80 thousand positions per second in chess and 40 thousand in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations – arguably a more "human-like" approach to search, as originally proposed by Shannon." <- Amazing!
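The "more selective" search described in that passage is typically implemented with a PUCT-style selection rule, where the network's prior probability steers simulations toward moves it already considers promising. A rough sketch, assuming a tree node with prior, visits, value_sum and children attributes; the constant and the exact formula are illustrative rather than the paper's precise formulation:

    import math

    def puct_select(node, c_puct=1.5):
        # Pick the child maximizing Q + U. U is proportional to the network's
        # prior and shrinks as a move accumulates visits, so the small
        # simulation budget is spent mostly on moves the network believes in.
        total_visits = sum(child.visits for child in node.children.values())
        best_move, best_score = None, float("-inf")
        for move, child in node.children.items():
            q = child.value_sum / child.visits if child.visits else 0.0
            u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
            if q + u > best_score:
                best_move, best_score = move, q + u
        return best_move

During self-play training the paper also mixes Dirichlet noise into the root prior, so the search still explores moves the network initially dislikes.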
partycoder over 7 years ago
If you have seen the Stockfish project you will see many hardcoded weights in the configuration, found through experimentation. All these adjustments probably took years to achieve... and now AlphaZero just self-learns everything and surpasses it.

It would be good to see DeepMind's solution play Arimaa and Stratego, and see what kind of strategy it comes up with. Or weird variations of Go.

Eventually this tech will make it into military strategy simulators, and that's where things will get really messed up. 4-star generals will be replaced by bots.
zwischenzug over 7 years ago
I smell a rat.

The paper says:

'AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi'

In the first game, Stockfish's 9. Qe1 is one of the strangest moves I've ever seen, which would never be considered by a human, let alone a superhuman.

11. Kh1 also makes little sense, but is not as bad. My Stockfish sees it as losing 0.2 pawns, which makes it highly suspect in such a position.

35. Nc4 is also a deeply puzzling move that my Stockfish sees as losing half a pawn immediately, and a whole pawn soon after.

50. g4 is also suspect.

52. e5 is insane.

This is bullshit.

Edit: bullshit is too much - see comments below.

Edit: Oh dear. We're doomed.

https://lichess.org/study/qiwMCyNQ
cdelsolar over 7 years ago
I wanted to contact the authors directly with a question, but can't seem to find contact info at the moment. I hope some of you might know enough to answer it.

I'm interested in applying this method, or a similar neural-network / tabula rasa based method, to the game of Scrabble. I read the original AlphaGo Zero paper and they mentioned that this method works best for games of perfect information. The standard Scrabble AI right now is quite good and can definitely beat top experts close to 50% of the time, but it uses simple Monte Carlo simulations to evaluate positions and just picks the ones that perform better. It doesn't quite account for defensive considerations or other subtleties of the game. I was wondering if anyone with more insight into MCTS and NNs would be able to talk me through how to apply this to Scrabble, or whether it even makes sense. One of the issues I can see currently would be very slow convergence; since the game has a luck factor, the algorithm could make occasional terrible moves and still win games, and thus be "wrongly trained".
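One common workaround for hidden information in a game like Scrabble (not something the AlphaZero paper addresses) is determinization: sample plausible opponent racks consistent with the unseen tiles, run an ordinary perfect-information search on each sample, and average the results, which also dampens the luck factor mentioned above. A hypothetical sketch, where with_racks and run_mcts are assumed helpers rather than real library calls:

    import random
    from collections import defaultdict

    def determinized_move(public_state, my_rack, unseen_tiles, net, samples=32):
        # Average search results over sampled completions of the hidden state.
        scores = defaultdict(float)
        for _ in range(samples):
            opp_rack = random.sample(unseen_tiles, k=7)       # guess a hidden rack
            full_state = public_state.with_racks(my_rack, opp_rack)
            pi = run_mcts(full_state, net)                    # perfect-information search
            for move, p in pi.items():
                scores[move] += p / samples
        return max(scores, key=scores.get)

Determinization has known blind spots (it never values gathering or hiding information), so this is a starting point rather than a full answer.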
ericand over 7 years ago
Two things to note:

1) AlphaZero beats AlphaGo Zero and AlphaGo Lee, and starts tabula rasa.

2) "Shogi is a significantly harder game, in terms of computational complexity, than chess (2, 14): it is played on a larger board, and any captured opponent piece changes sides and may subsequently be dropped anywhere on the board. The strongest shogi programs, such as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated human champions (5)"
Scarblac over 7 years ago
As a chess player I find the win rate astonishing.

Given the drawish tendency at top level, among human players, in correspondence chess and also in the TCEC final, I thought that even absolutely perfect play wouldn't score so well against a decent Stockfish setup (which 64 cores and 1 minute per move should be).
thom over 7 years ago
I can’t see any reference to whether Stockfish was configured with an endgame tablebase. It’d be interesting to see results then, as you’d expect AlphaZero’s superior evaluation to give it an advantage out of the opening, but later in the game Stockfish would have access to perfect evaluations. Obviously there’s nothing stopping you from plugging a tablebase into AlphaZero but that feels wrong.
Invictus0 over 7 years ago
I'm not sure it's really fair to compare Stockfish to AlphaZero; AlphaZero used 24 hours of compute on 5000 TPUs, and still needed 4 TPUs in real play, while Stockfish ran on just 64 threads and 1GB RAM. Nonetheless, still an impressive achievement.
Aissen over 7 years ago
Serious question: how does one evaluate the reproducibility of this paper's results?

Maybe I'm missing some things, but:

- Are 1st gen TPUs even accessible? You have to fill out a form just to learn more about the second-generation TPUs: https://cloud.google.com/tpu/

- I can't find the source code.

This does not look like a scientific paper, but a (*very* impressive) tech demo.
thomasahle over 7 years ago
Discussion at the Computer Chess Club (CCC) forum:

http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=741214&t=65910

and

http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=741211&t=65909
tboerstad over 7 years ago
Stockfish plays like an ambitious amateur in the first game, giving away a piece for two pawns on move 13.

Perhaps this move was justified though, as later in the same game Stockfish gets a position which is at worst drawn, and likely winning. Moves later, however, around move 40, Stockfish gets its own knight trapped and the game is over.

This is not the kind of chess we normally see from Stockfish.
naveen99 over 7 years ago
Very happy to see this result. It's like a moral victory for humans, as AlphaGo is more human-like (discounting the Monte Carlo search) than Stockfish. Maybe deep learning will give us the next Euler, Newton, or Einstein.
nl over 7 years ago
For those complaining about the TPU resources used during self-training: it is worth noting that Stockfish has used over 10,000 CPU hours for tuning its parameters. See https://github.com/mcostalba/Stockfish/blob/master/Top%20CPU%20Contributors.txt
110011 over 7 years ago
What an amazing result! Evaluating far fewer positions (by a factor of about 1,000), AlphaZero still beats Stockfish.

In the figure on its preferred openings, I find it very interesting that it doesn't like the Ruy Lopez very much over training time (there is a small bump, but that is transient). I am hardly a chess expert, but I know that it was very favored at the world championships, so maybe the chess world will be turned upside down by this result now?

Positing that the chess world is bigger than the Go world (in terms of interest and finances), there is probably going to be a race to replicate these results "at home" and train yourself before your competitors :)
elcapitan over 7 years ago
What would be a good starting point for a "normal" programmer to learn about the AI behind this? There seem to be so many resources now that it's hard to choose. A combination of hands-on plus theory would be good.
asdfologist over 7 years ago
While this sounds impressive, I'll believe it when AlphaZero wins TCEC.
gallerdude over 7 years ago
I wonder if being an expert at one game makes it easier to become an expert at another. If so, then maybe games themselves are the datasets, and a converged model would be able to pick up new tasks after just a few examples.
luckyt over 7 years ago
It doesn't seem to like the Sicilian Defense (1.e4 c5), which is the most popular opening among human players. I wonder if this will change opening theory?
narrator over 7 years ago
So when are they going to apply this to Atari games, or, well, anything? The next step is to have one AI figure out the rules by making a GAN that imitates player behavior, and have the other AI be AlphaGo, which tweaks the GAN inputs to generate different moves to win. Voila... almost general-purpose AI that can learn to play any game.
Sukotto over 7 years ago
Is this a library or something I can download and try training myself (on a small scale)?

I'm not in a position to read the paper right now, so my apologies if that's covered in there. I want to ask just in case it's not, while this is still on the front page.
lern_too_spel over 7 years ago
What is its win percentage against itself on each side of the board in each game? Is chess a draw for its style of play? Is there a first move advantage for the other games with its play style?
hmate9 over 7 years ago
So AlphaGo Zero used 4 TPUs while AlphaZero used 1500. It’s not immediately obvious to me why there is this massive difference. Can anyone elaborate?
skc over 7 years ago
I'm only a fairly pedestrian chess player, but I looked at one of these games between AGZ and SF and, aside from the endgame, AGZ played in a manner that almost seemed alien. It seemed to completely ignore various little rules of thumb, which is to be expected in hindsight but fairly mind-blowing when you actually watch a game.
bfirsh over 7 years ago
Here's an HTML version of the paper:

https://www.arxiv-vanity.com/papers/1712.01815/

Table 2 is broken, but the rest is much more readable if you're on a phone.
wskish over 7 years ago
The more interesting metric going forward is performance at a given power budget (not unlike with motorsports). The TPUs are consuming sooo much power here! Most interesting real-world problems are power-limited, including in nature (e.g. metabolic limits).
k2xl over 7 years ago
Chess.com forum thread: https://www.chess.com/forum/view/general/stockfish-dethroned
hyperpape over 7 years ago
This paper compares AlphaZero to the 20 block version of AlphaGo Zero that was trained for 3 days. Am I right in thinking that this version was significantly less strong than the 40 block version? If so, does it matter?
TwoBit over 7 years ago
Wasn't Stockfish gimped for this competition? No openings, no endgame tables, low RAM, etc? If that's so then this AI did not in fact beat the computer chess champ.
naveen99 over 7 years ago
Is there an SDK or compiler for using the Google TPUs beyond just using TensorFlow? Is the TPU backend of TensorFlow based on CUDA, OpenCL, plain C, or something else?
imrehg over 7 years ago
As a Shogi enthusiast (but complete beginner), I'd like to have seen more Shogi details in the article. Nevertheless there's plenty of other things to geek out on...
auggierose over 7 years ago
Great result, but without access to source code this is not a scientific paper.
SubiculumCode over 7 years ago
There is only one way for a human to win at chess against these computers; and it involves violence against the chess board.
foobaw over 7 years ago
Did Magnus play against this? Is there a way we can see the game?
plg over 7 years ago
source code?
stretchwithme over 7 years ago
See, Mom? Self play is a good thing.
firebones over 7 years ago
A lot of the graphs in the paper seem to level out as they hit the level of the opponent. It makes me wonder to what extent AlphaGo Zero is merely optimizing to beat flaws in existing opponents' current implementations (even if "existing opponents" == all available opponents' data and algorithms today) rather than gaining generalizable insights into the underlying game. Because wouldn't you expect that, unless we are at the theoretical limit of perfect chess, a tabula rasa approach might exceed existing best practice significantly, especially with the massive computation advantage it has?

Not that there's anything wrong with that; AlphaGo Zero supposedly optimized for the "just enough" win rather than the crushing win. It doesn't even mean Stockfish is doomed--I suspect Stockfish could beat it in a future heads-up match provided that Zero didn't have time to retrain, but that a retrained Zero (having the benefit of optimizing against a new Stockfish) would be able to supersede it once again.
ericand over 7 years ago
Certainly a significant achievement. Also, kind of interesting that the AlphaGo team spent a lot of energy to convince us Go is much harder than chess, only to turn around and tell us that it is amazing that it can also win at chess.