The code is online if you want to play with it.
<a href="https://sites.google.com/a/deepmind.com/dqn/" rel="nofollow">https://sites.google.com/a/deepmind.com/dqn/</a><p>If you're interested, one of the main authors (David Silver) teaches a very good and intuitive introductory class on reinforcement learning at UCL:
<a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html" rel="nofollow">http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html</a>
> <i>...the authors used the same algorithm, network architecture, and hyperparameters on each game...</i><p>This is huge. It shows that the approach generalizes across multiple problems within the same domain of "playing Atari 2600 games", rather than relying on a "lucky" per-game choice of algorithm, network architecture, or hyperparameters that a random search might have found. This is also not a violation of the No Free Lunch (NFL) Theorem [1], because the domain is limited to playing Atari 2600 games, which share many characteristics.<p>[1]: <a href="https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization" rel="nofollow">https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_op...</a>
It is interesting how they are using various biological models to develop their own model: they gave their model a reward system and a memory. It will be interesting to see how far deep Q-networks can be extended and at what point they hit the wall of diminishing returns.<p>|Nevertheless, games demanding more temporally extended planning strategies still constitute a major challenge for all existing agents including DQN.<p>|Notably, the successful integration of reinforcement learning with deep network architectures was critically dependent on our incorporation of a replay algorithm involving the storage and representation of recently experienced transitions.<p>I am not sure exactly what data the replay algorithm has access to, but I wonder what happens if you increase the amount of data it holds. That might be where this algorithm hits the wall of diminishing returns.<p>It would be interesting to hear what the authors think could help improve how their model deals with temporally extended planning strategies.<p>As someone who grew up on Atari, Nintendo and Sony, this is pretty cool work.
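For context, experience replay generally means storing recent (state, action, reward, next state) transitions in a bounded buffer and training on random minibatches drawn from it. A minimal sketch of the idea (names and capacity are illustrative, not the paper's exact implementation):<p><pre><code>import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: keep recent transitions, sample uniformly."""

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # A uniform random minibatch breaks the correlation between
        # consecutive frames, which helps stabilise Q-network training.
        return random.sample(self.buffer, batch_size)
</code></pre>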
An interesting critique of this publication by Schmidhuber:<p><a href="https://plus.google.com/100849856540000067209/posts/eLQf4KC97Bs" rel="nofollow">https://plus.google.com/100849856540000067209/posts/eLQf4KC9...</a>
Is this a different paper to the original DeepMind video game paper? <a href="http://arxiv.org/abs/1312.5602" rel="nofollow">http://arxiv.org/abs/1312.5602</a>
I think Q-learning is really interesting; yesterday I posted a simple implementation/demo of Q-learning in JavaScript. This paper goes way beyond plain Q-learning by learning the state representation from the raw game rendering with a deep neural network, which is really cool. Regardless, as a first intro to Q-learning I had fun putting this together: <a href="https://news.ycombinator.com/item?id=9105818" rel="nofollow">https://news.ycombinator.com/item?id=9105818</a>
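For anyone unfamiliar, the heart of tabular Q-learning is a one-line update toward the Bellman target. A rough sketch (the state/action encoding and constants here are hypothetical):<p><pre><code>from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated return
alpha, gamma = 0.1, 0.99    # learning rate and discount factor

def q_update(state, action, reward, next_state, actions, done):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
</code></pre><p>DQN replaces the lookup table with a convolutional network that reads raw pixels, but the target it regresses toward has the same form.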
Here is the marketing side of this publication, in which Google scientists (acqui-hired from DeepMind) have developed a way to outperform humans in Atari games: <a href="http://m.phys.org/news/2015-02-hal-bests-humans-space-invaders.html" rel="nofollow">http://m.phys.org/news/2015-02-hal-bests-humans-space-invade...</a>
Is the paper available anywhere to read without having to pay Nature? From the comments it seems as if everyone is able to read this but me! Even in their "readcube" access method, only the first page is (barely) visible, the rest seems blurred.
The most interesting thing about this is that it shows significant progress towards goal-oriented AI. The fact that this system effectively learns what "win" means in the context of a game is something of a breakthrough.
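Concretely, the agent is never told what "winning" is; it only sees the scalar score change (clipped to -1/0/+1 in the paper), and everything else is bootstrapped from that signal. A hedged sketch of the target the Q-network is trained toward (variable names are illustrative):<p><pre><code>def td_target(reward, done, gamma, next_q_values):
    """Bellman target: the only notion of 'win' the agent gets is the reward."""
    # next_q_values: the target network's Q estimates for the next state
    return reward if done else reward + gamma * max(next_q_values)
</code></pre>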
It is an amazingly powerful technique. We've been working on a service which lets you do this kind of learning with any JSON stream. You can see a demo here:<p><a href="https://aiseedo.com/demos/cookiemonster/" rel="nofollow">https://aiseedo.com/demos/cookiemonster/</a>
Can someone convert "academia nerd language" down one notch into "regular nerd language"? On the surface this sounds interesting, but despite being a huge nerd I'm not really sure what the hell they're talking about.
For comparison: <a href="http://www.cs.cmu.edu/~tom7/mario/" rel="nofollow">http://www.cs.cmu.edu/~tom7/mario/</a>. That is way more of a hack, but I am not sure this is that big a step forward. Space Invaders and Breakout aren't the hardest games, and I haven't heard a convincing argument that it is just a matter of scale to create a machine that, say, plays chess.