Beating AlphaZero at Go, Chess, and Shogi, and mastering a suite of Atari video games that other AIs have failed to master efficiently. No explicit head-to-head contests with a trained AlphaZero, but it apparently hits an Elo threshold with fewer training cycles. Yowsa.
It doesn't seem possible to give "no explicit rules", unless you count "making an illegal move equivalent to a loss" as giving no explicit rules. Which doesn't seem like anything but word laundering.<p>If you do any less than this, the net will be incentivized to make an illegal move for a win. In which case, yah, I'd guess that net would win a lot of chess games against other rule-bound nets.
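The "illegal move counts as a loss" construction described above can be sketched as an environment wrapper. This is a toy illustration, not anything from the paper; the callback signatures are assumptions:

```python
class IllegalMoveAsLoss:
    """Wrap an environment so that any illegal action immediately ends
    the episode with a losing reward, rather than the rules being given
    to the agent explicitly."""

    LOSS_REWARD = -1.0

    def __init__(self, legal_actions_fn, step_fn):
        # legal_actions_fn(state) -> set of legal actions (hypothetical)
        # step_fn(state, action) -> (next_state, reward, done) (hypothetical)
        self.legal_actions_fn = legal_actions_fn
        self.step_fn = step_fn

    def step(self, state, action):
        if action not in self.legal_actions_fn(state):
            # Illegal move: treated exactly like a loss, episode over.
            return state, self.LOSS_REWARD, True
        return self.step_fn(state, action)
```

Without the losing reward on the illegal branch, the point above holds: a reward-maximizing net has no reason not to try illegal moves.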
Every time a DeepMind paper is published, I feel simultaneously very excited, and also very depressed that they keep speeding ahead while I'm not anywhere close to understanding their first paper about AlphaGo.
Just released: a walkthrough of the MuZero pseudocode:
<a href="https://link.medium.com/KB3f4RAu51" rel="nofollow">https://link.medium.com/KB3f4RAu51</a>
Very impressive - key quote: "In addition, MuZero is designed to operate in the general reinforcement learning setting: single-agent domains with discounted intermediate rewards of arbitrary magnitude. In contrast, AlphaGo Zero and AlphaZero were designed to operate in two-player games with undiscounted terminal rewards of ±1."<p>It's worth noting that, as impressive as MuZero's performance is in the many Atari games, it achieves a score of 0.0 in Montezuma's Revenge.
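The distinction in that quote, made concrete: the general RL setting sums discounted intermediate rewards of arbitrary magnitude, while AlphaZero's board-game setting has only a single undiscounted ±1 terminal outcome. A toy illustration (not DeepMind's code; the default discount is just a placeholder):

```python
def discounted_return(rewards, gamma=0.99):
    """General RL setting: intermediate rewards of arbitrary magnitude,
    discounted by gamma at each step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def terminal_return(outcome):
    """Two-player board-game setting (AlphaGo Zero / AlphaZero):
    no intermediate rewards, one undiscounted outcome in {-1, +1}."""
    assert outcome in (-1, 1)
    return float(outcome)
```

Supporting both settings in one algorithm is part of what makes MuZero more general than its predecessors.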
So if you ever find yourself at the mercy of a superintelligence, simply challenge it to a round of Solaris for the Atari 2600 ;)<p>I still don't understand how the "prediction function" is generating frames.<p>The last line of the paper seems to suggest MuZero is generalizable to other domains.<p>But the appendix states "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".<p>Consider the problem of predicting the next N frames of video from a one-minute YouTube sample chosen at random, where there is a high probability of some sort of scene transition in the interval. Short of training on a large subset of the YouTube corpus, I don't see how that would work.
Exciting research. Not my area of expertise. An opinion:<p>> without any knowledge of the game rules<p>I'd prefer 1000 times an AI that can explain to me why an opposite-colored bishop ending is drawish or why knights are more valuable near the center, and can come up with those and more new concepts/relationships/models on its own (regardless of whether we have given it the game rules or not), than a black box that is excellent at beating you at chess but that you can't understand or trust. Adversarial examples (false positives) support this preference.
> In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.<p><rant>
DeepMind's "superhuman" hype machine strikes again.<p>I mean, it's cool that computers are getting even better at chess and all (and other perfectly constrained game environments), but come on. "Superhuman" chess performance hasn't been particularly interesting since Deep Blue vs Kasparov in 1997.<p>The fact that the new algorithms have "no knowledge of underlying dynamics" makes it sound like an entirely new approach, and on one level it is: ML vs non-statistical methods. But on a deeper level, it's the same shit.<p>Unless I'm grossly mistaken (someone please correct me if this is inaccurate), the superhuman performance is only made possible by massive compute. In other words, brute force.<p>But it uses fewer training cycles, you say! AlphaZero et al. mastered the game in only 3 days! etc etc. This conveniently ignores the fact that this was 3 days of training on an array of GPUs that is way more powerful than the supercomputers of old.<p>Don't get me wrong. These ML algorithms have value and can solve real problems. I just really wish DeepMind's marketing department would stop beating us over the head with all of this "superhuman" marketing.<p>For those just tuning in, this is the same company that got the term "digital prodigy" on the cover of Science [0]. Which is again a form of cheating, because the whole prodigy aspect conveniently ignores the compute power required to achieve AlphaZero. For the record, if you took A0 and ran it on hardware from a few years ago, you would have a computer that achieves superhuman performance after a <i>very</i> long time, which wouldn't be making headlines.<p></rant><p>0. <a href="https://science.sciencemag.org/content/362/6419" rel="nofollow">https://science.sciencemag.org/content/362/6419</a>