The code is online if you want to play with it.
<a href="https://sites.google.com/a/deepmind.com/dqn/" rel="nofollow">https://sites.google.com/a/deepmind.com/dqn/</a><p>If you're interested, one of the main authors (David Silver) teaches a very good and intuitive introductory class on reinforcement learning at UCL:
<a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html" rel="nofollow">http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html</a>
> <i>...the authors used the same algorithm, network architecture, and hyperparameters on each game...</i><p>This is huge. It shows that the approach generalizes across multiple problems within the same domain of "playing Atari 2600 games", rather than relying on a "lucky" per-game choice of algorithm, network architecture, or hyperparameters that a random search might have found. This is also not a violation of the No Free Lunch (NFL) Theorem [1], because the domain is limited to playing Atari 2600 games, which share many characteristics.<p>[1]: <a href="https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_optimization" rel="nofollow">https://en.wikipedia.org/wiki/No_free_lunch_in_search_and_op...</a>
It is interesting how they are using various biological models to develop their own model: they gave their model a reward system and a memory. It will be interesting to see how far deep Q-networks can be extended and at what point they hit the wall of diminishing returns.<p>|Nevertheless, games demanding more temporally extended planning strategies still constitute a major challenge for all existing agents including DQN.<p>|Notably, the successful integration of reinforcement learning with deep network architectures was critically dependent on our incorporation of a replay algorithm involving the storage and representation of recently experienced transitions.<p>I am not sure exactly what data the replay algorithm has access to, but I wonder what happens if you increase the amount of data it holds. That might be where this algorithm hits the wall of diminishing returns.<p>It would be interesting to hear what the authors think could help improve how their model deals with temporally extended planning strategies.<p>As someone who grew up on Atari, Nintendo and Sony, this is pretty cool work.
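For context, experience replay generally means storing recent (state, action, reward, next state) transitions in a bounded buffer and training on random minibatches drawn from it. A minimal sketch of the idea (names and capacity are illustrative, not the paper's exact implementation):<p><pre><code>import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: keep recent transitions, sample uniformly."""

    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # A uniform random minibatch breaks the correlation between
        # consecutive frames, which helps stabilise Q-network training.
        return random.sample(self.buffer, batch_size)
</code></pre>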
An interesting critique of this publication by Schmidhuber:<p><a href="https://plus.google.com/100849856540000067209/posts/eLQf4KC97Bs" rel="nofollow">https://plus.google.com/100849856540000067209/posts/eLQf4KC9...</a>
Is this a different paper to the original DeepMind video game paper? <a href="http://arxiv.org/abs/1312.5602" rel="nofollow">http://arxiv.org/abs/1312.5602</a>
I think Q-learning is really interesting; yesterday I posted a simple implementation/demo of Q-learning in JavaScript. This paper goes way beyond plain Q-learning by learning the state representation from the raw game rendering with a deep neural network, which is really cool. Regardless, as a first intro to Q-learning I had fun putting this together: <a href="https://news.ycombinator.com/item?id=9105818" rel="nofollow">https://news.ycombinator.com/item?id=9105818</a>
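For anyone unfamiliar, the heart of tabular Q-learning is a one-line update toward the Bellman target. A rough sketch (the state/action encoding and constants here are hypothetical):<p><pre><code>from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated return
alpha, gamma = 0.1, 0.99    # learning rate and discount factor

def q_update(state, action, reward, next_state, actions, done):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
</code></pre><p>DQN replaces the lookup table with a convolutional network that reads raw pixels, but the target it regresses toward has the same form.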
Here is the marketing side of this publication, in which Google scientists (acqui-hired from DeepMind) have developed a way to outperform humans in Atari games: <a href="http://m.phys.org/news/2015-02-hal-bests-humans-space-invaders.html" rel="nofollow">http://m.phys.org/news/2015-02-hal-bests-humans-space-invade...</a>
Is the paper available anywhere to read without having to pay Nature? From the comments it seems as if everyone is able to read this but me! Even in their "readcube" access method, only the first page is (barely) visible, the rest seems blurred.
The most interesting thing about this is that it shows significant progress towards goal-oriented AI. The fact that this system effectively learns what "win" means in the context of a game is something of a breakthrough.
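Concretely, the agent is never told what "winning" is; it only sees the scalar score change (clipped to -1/0/+1 in the paper), and everything else is bootstrapped from that signal. A hedged sketch of the target the Q-network is trained toward (variable names are illustrative):<p><pre><code>def td_target(reward, done, gamma, next_q_values):
    """Bellman target: the only notion of 'win' the agent gets is the reward."""
    # next_q_values: the target network's Q estimates for the next state
    return reward if done else reward + gamma * max(next_q_values)
</code></pre>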
It is an amazingly powerful technique. We've been working on a service which lets you do this kind of learning with any JSON stream. You can see a demo here:<p><a href="https://aiseedo.com/demos/cookiemonster/" rel="nofollow">https://aiseedo.com/demos/cookiemonster/</a>
Can someone convert "academia nerd language" down one notch into "regular nerd language"? On the surface this sounds interesting, but despite being a huge nerd I'm not really sure what the hell they're talking about.
For comparison: <a href="http://www.cs.cmu.edu/~tom7/mario/" rel="nofollow">http://www.cs.cmu.edu/~tom7/mario/</a>. That is way more of a hack, but I am not sure this is that big a step forward. Space Invaders and Breakout aren't the hardest games, and I haven't heard a convincing argument that it is just a matter of scale to create a machine that, say, plays chess.