
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

161 points by metasj over 5 years ago

10 comments

metasj over 5 years ago
Beating AlphaZero at Go, Chess, and Shogi, and mastering a suite of Atari video games that other AIs have failed to master efficiently. No explicit head-to-head contests with a trained AlphaZero, but it apparently hits an Elo threshold with fewer training cycles. Yowsa.
Tomminn over 5 years ago
It doesn't seem possible to give "no explicit rules", unless you count "making an illegal move equivalent to a loss" as giving no explicit rules. Which doesn't seem like anything but word laundering.

If you do any less than this, the net will be incentivized to make an illegal move for a win. In which case, yah, I'd guess that net would win a lot of chess games against other rule-bound nets.
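A minimal sketch of the "illegal move counts as a loss" scheme being described, assuming a python-chess style legality oracle; this is an illustration, not what the paper does (as I understand it, MuZero instead masks illegal actions at the search root, so the net never has to discover legality by losing):

```python
# Hypothetical illustration of "illegal move == loss": the rules reach the
# agent only through the reward signal, never as an explicit move generator.
import chess  # pip install python-chess

def step(board: chess.Board, move: chess.Move) -> float:
    """Apply an attempted move and return the mover's reward."""
    if move not in board.legal_moves:
        return -1.0   # illegal move: immediate loss, episode ends
    board.push(move)
    return 0.0        # legal move: game continues, no intermediate reward
```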
2bitencryption over 5 years ago
Every time a DeepMind paper is published, I feel simultaneously very excited and also very depressed that they keep speeding ahead while I'm not anywhere close to understanding their first paper about AlphaGo.
davidfoster over 5 years ago
Just released - a walkthrough of the MuZero pseudocode: https://link.medium.com/KB3f4RAu51
SmooL over 5 years ago
Very impressive - key quote: "In addition, MuZero is designed to operate in the general reinforcement learning setting: single-agent domains with discounted intermediate rewards of arbitrary magnitude. In contrast, AlphaGo Zero and AlphaZero were designed to operate in two-player games with undiscounted terminal rewards of ±1."

It's worth noting that, as impressive as MuZero's performance is in the many Atari games, it achieves a score of 0.0 in Montezuma's Revenge.
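To make that contrast concrete, a quick sketch of the two reward setups the quote describes (my own illustration, not code from the paper; 0.997 is, if I recall correctly, the discount the paper reports using for Atari):

```python
# Board games (AlphaZero setting): one undiscounted terminal reward of +/-1.
def terminal_value(outcome: float) -> float:
    return outcome  # +1 win, -1 loss, 0 draw; nothing along the way

# General RL setting (MuZero): discounted intermediate rewards of any size.
def discounted_return(rewards: list[float], gamma: float = 0.997) -> float:
    g = 0.0
    for r in reversed(rewards):   # G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

print(discounted_return([0.0, 10.0, 0.0, 100.0]))  # Atari-style score stream
```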
Animats over 5 years ago
What sort of resources did this take? Racks of GPUs, or a laptop, or what?
slilo over 5 years ago
Are there any game records (kifu) of the MuZero - AlphaZero games?
ArtWomb over 5 years ago
So if you ever find yourself at the mercy of a Superintelligence, simply challenge it to a round of Solaris for the Atari 2600 ;)

I still don't understand how the "prediction function" is generating frames.

The last line of the paper seems to suggest MuZero is generalizable to other domains. But the appendix states "the network rapidly learns not to predict actions that never occur in the trajectories it is trained on".

Consider the problem of predicting the next N frames of video from a one-minute YouTube sample chosen at random, where there is a high probability of some sort of scene transition in the interval. Short of training on a large subset of the YouTube corpus, it's hard to see how that would work.
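For what it's worth, the prediction function doesn't generate frames at all: MuZero plans entirely in a learned hidden state, and pixels are never reconstructed. A toy sketch of the paper's three learned functions (my own linear-layer simplification; the real networks are convolutional ResNets):

```python
import torch
import torch.nn as nn

class MuZeroNets(nn.Module):
    """Toy version of MuZero's representation (h), dynamics (g), and
    prediction (f) functions. Search unrolls g and f on `state` alone;
    no observation/frame is ever predicted."""

    def __init__(self, obs_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.num_actions = num_actions
        self.h = nn.Linear(obs_dim, hidden_dim)                       # s_0 = h(o)
        self.g = nn.Linear(hidden_dim + num_actions, hidden_dim + 1)  # s', r = g(s, a)
        self.f = nn.Linear(hidden_dim, num_actions + 1)               # p, v = f(s)

    def initial_inference(self, obs: torch.Tensor):
        state = torch.tanh(self.h(obs))
        out = self.f(state)
        return state, out[..., :-1], out[..., -1]  # state, policy logits, value

    def recurrent_inference(self, state: torch.Tensor, action: torch.Tensor):
        a = nn.functional.one_hot(action, self.num_actions).float()
        out = self.g(torch.cat([state, a], dim=-1))
        next_state, reward = torch.tanh(out[..., :-1]), out[..., -1]
        pv = self.f(next_state)
        return next_state, reward, pv[..., :-1], pv[..., -1]
```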
notelonmusk over 5 years ago
Exciting research. Not my area of expertise. An opinion:

> without any knowledge of the game rules

I'd prefer 1000 times over an AI that can explain to me why an opposite-colored bishop ending is drawish or why knights are more valuable near the center, and that can come up with those and more new concepts/relationships/models on its own (regardless of whether we have given it the game rules or not), than a black box that is excellent at beating you at chess but that you can't understand or trust. Adversarial examples (false positives) support this preference.
mindgam3 over 5 years ago
> In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

<rant> DeepMind "superhuman" hype machine strikes again.

I mean, it's cool that computers are getting even better at chess and all (and other perfectly constrained game environments), but come on. "Superhuman" chess performance hasn't been particularly interesting since Deep Blue vs Kasparov in 1997.

The fact that the new algorithms have "no knowledge of underlying dynamics" makes it sound like an entirely new approach, and on one level it is: ML vs non-statistical methods. But on a deeper level, it's the same shit.

Unless I'm grossly mistaken (someone please correct me if this is inaccurate), the superhuman performance is only made possible by massive compute. In other words, brute force.

But it uses fewer training cycles, you say! AlphaZero et al. mastered the game in only 3 days! etc. etc. This conveniently ignores the fact that this was 3 days of training on an array of GPUs that is way more powerful than the supercomputers of old.

Don't get me wrong. These ML algorithms have value and can solve real problems. I just really wish DeepMind's marketing department would stop beating us over the head with all of this "superhuman" marketing.

For those just tuning in, this is the same company that got the term "digital prodigy" on the cover of Science [0]. Which is again a form of cheating, because the whole prodigy aspect conveniently ignores the compute power required to achieve AlphaZero. For the record, if you took A0 and ran it on hardware from a few years ago, you would have a computer that achieves superhuman performance after a *very* long time, which wouldn't be making headlines.

</rant>

[0] https://science.sciencemag.org/content/362/6419