Looks really cool.

I recently hit a roadblock when trying to implement the original DeepMind Atari algorithm [0] with TensorFlow. They don't spell this out in the paper, but the network isn't trained to convergence at each training step; each step is just a single minibatch gradient update (maybe this is obvious to people better versed in deep learning, but it wasn't to me, coming from a classical RL background).

As it turns out, TensorFlow's optimizers didn't give me a way to manually terminate training before convergence. That meant I was getting through several orders of magnitude fewer training steps than the DeepMind team did, even after accounting for my inferior hardware. This might not matter in some settings, where training longer on certain examples lets you extract more information from them, but in games with sparse rewards it hurts.

Of course, TensorFlow does let you do the gradient calculations and updates by hand (rough sketch below), but I wasn't prepared to go that far at the time. Maybe in the next few weeks I'll dive back into it.

[0] https://arxiv.org/pdf/1312.5602.pdf
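
For what it's worth, here is a minimal sketch of the kind of per-step control I mean, using TF 1.x's compute_gradients/apply_gradients so that each session run performs exactly one minibatch update. The tiny network, two-action setup, learning rate, and random stand-in batch are illustrative assumptions on my part, not the paper's configuration.

    import numpy as np
    import tensorflow as tf

    # Placeholders for a toy state/action/target batch (shapes are made up).
    states = tf.placeholder(tf.float32, [None, 4], name="states")
    actions = tf.placeholder(tf.int32, [None], name="actions")
    targets = tf.placeholder(tf.float32, [None], name="targets")  # Bellman targets

    # A tiny Q-network: one hidden layer, two actions (illustrative only).
    hidden = tf.layers.dense(states, 32, activation=tf.nn.relu)
    q_values = tf.layers.dense(hidden, 2)

    # Q(s, a) for the actions actually taken, and the squared TD error.
    action_q = tf.reduce_sum(q_values * tf.one_hot(actions, 2), axis=1)
    loss = tf.reduce_mean(tf.square(targets - action_q))

    # compute_gradients/apply_gradients exposes the update explicitly; every
    # run of train_op is one gradient step, never "training to convergence".
    optimizer = tf.train.RMSPropOptimizer(learning_rate=0.00025)
    grads_and_vars = optimizer.compute_gradients(loss)
    train_op = optimizer.apply_gradients(grads_and_vars)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # One update on a random batch standing in for a replay-buffer sample.
        batch = {
            states: np.random.rand(32, 4).astype(np.float32),
            actions: np.random.randint(0, 2, size=32).astype(np.int32),
            targets: np.random.rand(32).astype(np.float32),
        }
        step_loss, _ = sess.run([loss, train_op], feed_dict=batch)
        print("loss after one step:", step_loss)

You decide how many of these single steps to take per environment frame, which is what the DQN setup needs.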