Beating AlphaZero at Go, Chess, and Shogi, and mastering a suite of Atari video games that other AIs have failed to master efficiently. No explicit head-to-head contests with a trained AlphaZero, but it apparently hits an Elo threshold with fewer training cycles. Yowsa.
It doesn't seem possible to give "no explicit rules", unless you count "making an illegal move equivalent to a loss" as giving no explicit rules. Which doesn't seem like anything but word laundering.<p>If you do any less than this, the net will be incentivized to make an illegal move for a win. In which case, yah, I'd guess that net would win a lot of chess games against other rule-bound nets.
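The "illegal move counts as a loss" construction described above can be sketched as an environment wrapper. This is a toy illustration, not anything from the paper; the callback signatures are assumptions:

```python
class IllegalMoveAsLoss:
    """Wrap an environment so that any illegal action immediately ends
    the episode with a losing reward, rather than the rules being given
    to the agent explicitly."""

    LOSS_REWARD = -1.0

    def __init__(self, legal_actions_fn, step_fn):
        # legal_actions_fn(state) -> set of legal actions (hypothetical)
        # step_fn(state, action) -> (next_state, reward, done) (hypothetical)
        self.legal_actions_fn = legal_actions_fn
        self.step_fn = step_fn

    def step(self, state, action):
        if action not in self.legal_actions_fn(state):
            # Illegal move: treated exactly like a loss, episode over.
            return state, self.LOSS_REWARD, True
        return self.step_fn(state, action)
```

Without the losing reward on the illegal branch, the point above holds: a reward-maximizing net has no reason not to try illegal moves.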
Every time a DeepMind paper is published, I feel simultaneously very excited, and also very depressed that they keep speeding ahead while I'm not anywhere close to understanding their first paper about AlphaGo.
Just released: a walkthrough of the MuZero pseudocode:
<a href="https://link.medium.com/KB3f4RAu51" rel="nofollow">https://link.medium.com/KB3f4RAu51</a>
Very impressive - key quote: "In addition, MuZero is designed to operate in the general reinforcement learning setting: single-agent domains with discounted intermediate rewards of arbitrary magnitude. In contrast, AlphaGo Zero and AlphaZero were designed to operate in two-player games with undiscounted terminal rewards of ±1."<p>It's worth noting that, as impressive as MuZero's performance is in the many Atari games, it achieves a score of 0.0 in Montezuma's Revenge.
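The distinction in that quote, made concrete: the general RL setting sums discounted intermediate rewards of arbitrary magnitude, while AlphaZero's board-game setting has only a single undiscounted ±1 terminal outcome. A toy illustration (not DeepMind's code; the default discount is just a placeholder):

```python
def discounted_return(rewards, gamma=0.99):
    """General RL setting: intermediate rewards of arbitrary magnitude,
    discounted by gamma at each step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def terminal_return(outcome):
    """Two-player board-game setting (AlphaGo Zero / AlphaZero):
    no intermediate rewards, one undiscounted outcome in {-1, +1}."""
    assert outcome in (-1, 1)
    return float(outcome)
```

Supporting both settings in one algorithm is part of what makes MuZero more general than its predecessors.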
So if you ever find yourself at the mercy of a superintelligence, simply challenge it to a round of Solaris for the Atari 2600 ;)<p>I still don't understand how the "prediction function" is generating frames.<p>The last line of the paper seems to suggest MuZero is generalizable to other domains.<p>But the appendix states "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".<p>Consider the problem of predicting the next N frames of video from a one-minute YouTube sample chosen at random, where there is a high probability of some sort of scene transition in the interval. Short of training on a large subset of the YouTube corpus, I don't see how that would work.
Exciting research. Not my area of expertise. An opinion:<p>> without any knowledge of the game rules<p>I'd prefer 1000 times an AI that can explain to me why an opposite-colored bishop ending is drawish or why knights are more valuable near the center, and can come up with those and more new concepts/relationships/models on its own (regardless of whether we have given it the game rules or not), than a black box that is excellent at beating you at chess but that you can't understand or trust. Adversarial examples (false positives) support this preference.
> In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.<p><rant>
DeepMind's "superhuman" hype machine strikes again.<p>I mean, it's cool that computers are getting even better at chess and all (and other perfectly constrained game environments), but come on. "Superhuman" chess performance hasn't been particularly interesting since Deep Blue vs Kasparov in 1997.<p>The fact that the new algorithms have "no knowledge of underlying dynamics" makes it sound like an entirely new approach, and on one level it is: ML vs non-statistical methods. But on a deeper level, it's the same shit.<p>Unless I'm grossly mistaken (someone please correct me if this is inaccurate), the superhuman performance is only made possible by massive compute. In other words, brute force.<p>But it uses fewer training cycles, you say! AlphaZero et al. mastered the game in only 3 days! etc etc. This conveniently ignores the fact that this was 3 days of training on an array of GPUs that is way more powerful than the supercomputers of old.<p>Don't get me wrong. These ML algorithms have value and can solve real problems. I just really wish DeepMind's marketing department would stop beating us over the head with all of this "superhuman" marketing.<p>For those just tuning in, this is the same company that got the term "digital prodigy" on the cover of Science [0]. Which is again a form of cheating, because the whole prodigy aspect conveniently ignores the compute power required to achieve AlphaZero. For the record, if you took A0 and ran it on hardware from a few years ago, you would have a computer that achieves superhuman performance after a <i>very</i> long time, which wouldn't be making headlines.<p></rant><p>0. <a href="https://science.sciencemag.org/content/362/6419" rel="nofollow">https://science.sciencemag.org/content/362/6419</a>