Hi, I'm one of the authors of this paper (https://arxiv.org/abs/1803.10122, https://worldmodels.github.io). Happy to answer any questions you may have.
This is a neat paper - it's an interesting empirical result combining known techniques - but machine learning academics should really know better than to contribute to the over-hyping of results. For example, talking about "dreams" and "hallucinations" is not helpful - it doesn't make the work more accessible and only adds unnecessary hype.
"Our agent consists of three components that work closely together: Vision (V), Memory (M), and Controller (C)"

The next web frameworks are going to be smart!
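For anyone curious how those three pieces actually fit together, here is a rough sketch of one control step (my own reading of the paper, not the authors' code), using the CarRacing shapes: 64x64x3 frames, a 32-d latent z, a 256-d RNN hidden state, and a 3-d action. V and M are stubbed out as placeholders; only C is written the way the paper describes it, as a single linear layer.

```python
# Rough sketch of the V -> M -> C control loop (not the authors' code).
# Shapes follow the CarRacing experiment: 64x64x3 frame, 32-d z, 256-d h, 3-d action.
import numpy as np

Z_DIM, H_DIM, A_DIM = 32, 256, 3
rng = np.random.default_rng(0)

def vision_encode(frame):
    """V: stand-in for the VAE encoder (frame -> latent vector z)."""
    return rng.standard_normal(Z_DIM)                  # placeholder latent

def memory_step(z, a, h):
    """M: stand-in for the MDN-RNN hidden-state update."""
    return np.tanh(0.5 * h + 0.1 * rng.standard_normal(H_DIM))  # placeholder dynamics

# C: per the paper, the controller really is a single linear layer,
# a_t = W_c [z_t h_t] + b_c, kept tiny so it can be trained with CMA-ES.
W_c = 0.1 * rng.standard_normal((A_DIM, Z_DIM + H_DIM))
b_c = np.zeros(A_DIM)

def controller(z, h):
    return W_c @ np.concatenate([z, h]) + b_c          # clipping to action bounds omitted

h = np.zeros(H_DIM)
frame = np.zeros((64, 64, 3))                          # VAE input resolution in the paper
for t in range(10):
    z = vision_encode(frame)   # V: compress the observation
    a = controller(z, h)       # C: pick an action from z and the RNN state
    h = memory_step(z, a, h)   # M: update the model of what happens next
    # frame = env.step(a)[0] in the real loop; omitted here
```

The point of the split is that almost all the capacity sits in V and M, which are trained without rewards, while C stays small enough to be optimized with evolution strategies.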
The original interactive blog post is also really awesome: https://worldmodels.github.io/
The post talks about running "video" on a remote server for the RL training, but not how to take that image and visualize it locally (which would be helpful for debugging failing models).

Let's say I wanted to run a Twitch stream of RL training on a remote server (and stream directly from the server to Twitch). What is the intended way to render the video in real time remotely?
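Not the authors' intended workflow, just one way I'd sketch it: render frames server-side with Gym's rgb_array mode and pipe the raw RGB bytes into ffmpeg, which pushes an FLV/RTMP stream to Twitch. The environment name, frame rate, and ingest URL/stream key below are placeholders; on a fully headless box some environments (CarRacing included) still need a virtual display even for rgb_array, so running under xvfb-run may be required.

```python
# Rough sketch: stream Gym frames from a headless server to Twitch via ffmpeg.
# Assumes gym, numpy, and ffmpeg are installed; the RTMP URL/key are placeholders.
import subprocess
import gym

env = gym.make("CarRacing-v0")          # any env that supports rgb_array rendering
obs = env.reset()
frame = env.render(mode="rgb_array")    # H x W x 3 uint8 array
height, width, _ = frame.shape

ffmpeg = subprocess.Popen(
    [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", "30",       # input: raw frames at 30 fps
        "-i", "-",                                   # read frames from stdin
        "-c:v", "libx264", "-preset", "veryfast",
        "-pix_fmt", "yuv420p", "-f", "flv",
        "rtmp://live.twitch.tv/app/YOUR_STREAM_KEY", # placeholder ingest URL + key
    ],
    stdin=subprocess.PIPE,
)

done = False
while not done:
    action = env.action_space.sample()               # replace with your policy
    obs, reward, done, info = env.step(action)
    frame = env.render(mode="rgb_array")
    ffmpeg.stdin.write(frame.astype("uint8").tobytes())

ffmpeg.stdin.close()
ffmpeg.wait()
env.close()
```

The same frame loop also works for local debugging: write the frames to a file with ffmpeg instead of an RTMP URL, or save them with imageio, and pull the video down afterwards.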
Is this similar to Dyna-Q learning, but with modeling/simulation being handled by the RNN?

It looks like the VAE is just used to create a feature vector, so the main difference seems to be in the MDN-RNN - which is taking the place of the usual state/action simulation in Dyna-Q.
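For concreteness, M is an LSTM with a mixture-density output head: given z_t and a_t (and its hidden state h_t), it outputs a mixture of Gaussians over z_{t+1}, and sampling from that mixture is what the "dream" rollouts are made of. Below is a simplified sketch of just the output head (my own stand-in, ignoring the LSTM itself and the temperature parameter the paper adds at sampling time), with 5 mixture components per latent dimension as in the paper.

```python
# Rough sketch of a mixture-density (MDN) head over the next latent,
# i.e. p(z_{t+1} | a_t, z_t, h_t). Simplified: the paper's M is an LSTM
# with 5 mixture components per latent dimension and a sampling temperature.
import numpy as np

Z_DIM, H_DIM, K = 32, 256, 5            # latent size, RNN hidden size, mixtures
rng = np.random.default_rng(0)

# Output layer: for each latent dimension, K mixture logits, means, log-stddevs.
W_out = 0.01 * rng.standard_normal((Z_DIM * 3 * K, H_DIM))
b_out = np.zeros(Z_DIM * 3 * K)

def mdn_params(h):
    out = (W_out @ h + b_out).reshape(Z_DIM, 3, K)
    logit_pi, mu, log_sigma = out[:, 0], out[:, 1], out[:, 2]
    pi = np.exp(logit_pi - logit_pi.max(axis=-1, keepdims=True))
    pi /= pi.sum(axis=-1, keepdims=True)                 # softmax over the K components
    return pi, mu, np.exp(log_sigma)

def sample_next_z(h):
    """Sample z_{t+1} from the predicted mixture -- one step of a 'dream' rollout."""
    pi, mu, sigma = mdn_params(h)
    ks = np.array([rng.choice(K, p=pi[d]) for d in range(Z_DIM)])
    idx = np.arange(Z_DIM)
    return mu[idx, ks] + sigma[idx, ks] * rng.standard_normal(Z_DIM)

h = rng.standard_normal(H_DIM)           # stand-in for the LSTM state after seeing (z_t, a_t)
z_next = sample_next_z(h)                # 32-d sample of the next latent
```

Training the controller against samples from this head instead of the real environment is what the paper's "learning inside the dream" experiment does.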
Who decides what is the correct information to learn? What will prevent a bad actor from providing subject material that teaches people to bring harm to themselves or others? Post-Traumatic Stress Disorder sounds, at least to a layman, like this very design pattern, but one that obviously reinforces undesirable subjects.