Sometimes it's hard to separate signal from noise when you're not part of a field and are just hearing about projects and papers, so I wanted to quickly pitch in and say that this is a legitimately ground-breaking approach and line of work that you can expect to hear much more about in the future. It's probably the most exciting robotics/manipulation project I'm currently aware of.<p>What's exciting here is that the entire system is trained end-to-end (including the vision component). In other words, it's heading towards agents/robots that consist entirely of a single neural net and that's it: there is no software stack at all, just a GPU running a neural net "code base", from perception to actuators. In this respect the work is similar to the Atari game-playing agent that has to learn to see while also learning to play the game, except this setting is quite a lot more difficult in some respects. In particular, the actions in the DeepMind Atari paper are few and discrete, while here the robot is an actual physical system with a high-dimensional, continuous action space (joint torques). Also, if you're new to the field you might think "why is the robot so slow?", while someone in the field is thinking "holy crap, how can it be so fast?"
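To make the discrete-vs-continuous contrast concrete, here is a toy sketch (the layer sizes, 18 Atari actions, and 7 joints are illustrative assumptions, not the paper's actual architecture): an Atari-style policy's final layer picks one action index from a small fixed set, while a torque controller must emit a real-valued output for every joint at every timestep.

```python
# Toy illustration only -- hypothetical shapes, not the actual system.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "features" are the output of a learned vision network.
features = rng.standard_normal(64)

def atari_policy(feats):
    # Discrete head: scores over ~18 joystick actions, pick one index.
    logits = feats @ rng.standard_normal((feats.size, 18))
    return int(np.argmax(logits))  # a single integer action

def torque_policy(feats, n_joints=7):
    # Continuous head: one real-valued torque per joint, every timestep.
    return feats @ rng.standard_normal((feats.size, n_joints))

action = atari_policy(features)        # one of 18 discrete choices
torques = torque_policy(features)      # 7 real numbers, sent to motors
```

The search/learning problem changes character accordingly: a discrete head only has to rank a handful of options, while the continuous head must land on usable values in R^7 at every control step.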