It seems to me there's been an interesting turn in AI recently, toward treating adaptability as a goal in itself. Deep learning has shown there is incredible power in stochastic gradient descent over a space of functions, but so far that power has mostly been applied to rigid, fixed tasks. Work like this turns it toward adaptability itself, and to me that is a step toward "real" intelligence.

The logical extreme of this line of thinking would be agents whose only objective is to maximize the entropy of their future actions, as in [1].

[1] http://paulispace.com/intelligence/2017/07/06/maxent.html
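For the curious, here is a minimal sketch of what that kind of objective could look like: score each candidate action by a Monte Carlo estimate of the entropy of the states reachable after taking it, and pick the action that keeps the most options open. This is just my toy illustration, not the method in [1]; the env interface (copy(), step(), legal_actions(), sample_random_action()) is entirely hypothetical.

```python
import numpy as np
from collections import Counter

def future_entropy(env, action, horizon=10, rollouts=50):
    """Monte Carlo estimate of the entropy over states reachable after
    taking `action`, assuming env.copy() returns an independent
    simulator and states are hashable."""
    outcomes = Counter()
    for _ in range(rollouts):
        sim = env.copy()
        state, done = sim.step(action)
        for _ in range(horizon):
            if done:
                break
            state, done = sim.step(sim.sample_random_action())
        outcomes[state] += 1  # count where this rollout ended up
    probs = np.array(list(outcomes.values()), dtype=float) / rollouts
    return -np.sum(probs * np.log(probs))

def maxent_action(env):
    # The only "objective" is keeping future options open.
    return max(env.legal_actions(), key=lambda a: future_entropy(env, a))
```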
Does this re-optimise the hierarchy as the environment changes? For example, when cooking I unpackage food as needed, but when the packaging starts to clutter the workspace I decide to fit in a 'clean-up cycle' while waiting on some other food to cook.
I was mulling over this idea yesterday in the context of RTS games...
There's no reason to reconsider your overall strategy every frame. Nice to see it works!

It will be interesting to see how it performs with more tiers in the hierarchy and with more structured tasks: controlling a virtual arm to play a board game, for example.
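A rough sketch of that control loop as I read the paper: a master policy picks one of K sub-policies, the choice is held fixed for N low-level steps, and the selected sub-policy emits an action every frame. The master, subpolicies, and gym-style env below are placeholders, not the authors' actual code.

```python
def run_episode(env, master, subpolicies, N=200, max_steps=10_000):
    """Two-level hierarchy: `master` re-decides the "strategy" only
    every N frames; the chosen sub-policy acts at every frame."""
    obs = env.reset()
    total_reward, done, t = 0.0, False, 0
    while not done and t < max_steps:
        k = master.act(obs)  # high-level choice, held for N steps
        for _ in range(N):
            obs, reward, done = env.step(subpolicies[k].act(obs))
            total_reward += reward
            t += 1
            if done or t >= max_steps:
                break
    return total_reward
```

More tiers would presumably just nest this loop, with progressively longer holding periods as you go up.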
Found the paper from the Wired article below:

https://s3-us-west-2.amazonaws.com/openai-assets/MLSH/mlsh_paper.pdf
Next step: transfer learning and sharing among sub-policies in the hierarchy. If an Ant agent learns to "move up" to avoid an obstacle or reach a goal, why can't it infer the same skill for any cardinal or diagonal direction after observing the world around it? It's just a rotation or translation, after all.

Also, for small numbers of sub-policies, would Monte Carlo playouts be faster, where we search over the next step the Ant may encounter? That's presumably a finite set of possible "wall-floor" configurations ;)

In any case, great work! Always love watching OpenAI vids...
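To make the rotation point concrete, a hedged sketch: assuming an egocentric 2D observation, you could rotate the world into the frame the "move up" sub-policy was trained in, then rotate its action back out, reusing one skill for any direction. The move_up_policy interface is hypothetical.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def steer(move_up_policy, obs_xy, angle_from_up):
    """Reuse a sub-policy trained to 'move up' for any direction:
    rotate the (egocentric, 2D) observation into the trained frame,
    then rotate the resulting action back into the world frame.
    E.g. angle_from_up = -np.pi/2 makes the agent move right."""
    R = rotation(angle_from_up)
    action = move_up_policy.act(R.T @ obs_xy)  # observe as if the goal were "up"
    return R @ action                          # act in the world frame
```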
I don't understand where the 'hierarchy' comes into play. This reads to me like a standard computer program where you execute code, and some of those lines execute other segments of code which may be much more complex than what I see. If I execute the line printline('Hello World'), I only executed one line, but many other things happened that I did not directly execute. I'm sure I'm missing something and this is somehow different and novel, but I'm just not getting it from the blog post.
Is it just me, or is there something revolting about the character model?

Good work nonetheless, but for god's sake give it six legs and make it black.