"EPG takes a step toward agents that are not blank slates but instead know what it means to make progress on a new task, by having experienced making progress on similar tasks in the past."
Can someone explain to me how they "take a step"? It seems like they just use random search to define a loss function for the sub-policy to optimize against. Is it because the loss function is "learned" over the sequence of actions, making it adaptive?
TL;DR:

Parametrize your loss function and wrap a normal policy optimization with a random search to find a better loss function. Don't call it "random search," call it "evolution strategies" to make it sound sophisticated.

Neat idea.
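For anyone wanting the shape of it: the inner loop does ordinary gradient steps on the learned loss, and the outer loop is the OpenAI-style ES estimator over the loss's parameters phi, scored by how well the resulting policy actually performs. A toy sketch of that nesting (names are made up and a two-armed bandit stands in for the real tasks; the paper's loss is a temporal-convolutional net over trajectory buffers, not two scalars):

    import numpy as np

    rng = np.random.default_rng(0)

    def inner_loop_return(phi, seed, n_steps=200, lr=0.05):
        # Train a fresh policy by gradient steps on the *learned* loss
        # L_phi, then score it by its true return on a toy 2-armed bandit.
        rng_in = np.random.default_rng(seed)
        theta = 0.0                       # policy logit for picking arm 1
        payout = np.array([0.2, 0.8])     # true expected reward per arm
        for _ in range(n_steps):
            p1 = 1.0 / (1.0 + np.exp(-theta))       # P(arm 1)
            a = int(rng_in.random() < p1)           # sample an action
            r = float(rng_in.random() < payout[a])  # Bernoulli reward
            grad_logp = a - p1                      # d log pi(a) / d theta
            shaped = phi[0] * r + phi[1]            # "learned loss" signal
            theta += lr * shaped * grad_logp        # inner policy update
        return payout[int(theta > 0)]               # greedy deployed return

    def es_step(phi, gen, sigma=0.1, pop=32, alpha=0.5):
        # One evolution-strategies update on the loss parameters phi:
        # perturb, train a policy under each perturbed loss, and move phi
        # along the return-weighted noise directions.
        eps = rng.standard_normal((pop, phi.size))
        rets = np.array([inner_loop_return(phi + sigma * e, seed=gen * pop + i)
                         for i, e in enumerate(eps)])
        adv = (rets - rets.mean()) / (rets.std() + 1e-8)
        return phi + alpha / (pop * sigma) * eps.T @ adv

    phi = np.zeros(2)   # parameters of the loss function, not of any policy
    for gen in range(20):
        phi = es_step(phi, gen)
    print("evolved loss params:", phi)

Which also answers the adaptivity question above: nothing differentiates through the inner loop. The outer loop just perturbs phi and keeps whatever produces policies that learn well.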
Would someone here know how to go about recreating a physics sandbox with a virtual robot arm and cubes in a game engine like Unity/UE4, where we'd be able to apply ML?

Any suggestion is welcome.
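Not a full answer, but the usual route on the Unity side is the ML-Agents toolkit: build the arm-and-cubes scene with PhysX joints, attach an Agent component that exposes joint targets as continuous actions, export a build, and drive it from Python. A rough sketch of the Python side (names taken from a recent mlagents_envs release; the API has changed between versions, so treat these as assumptions and check the docs for whatever you install):

    import numpy as np
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.base_env import ActionTuple

    # "RobotArmSandbox" is a placeholder for your exported Unity build.
    env = UnityEnvironment(file_name="RobotArmSandbox")
    env.reset()
    behavior = list(env.behavior_specs)[0]   # the arm's registered behavior
    spec = env.behavior_specs[behavior]

    for _ in range(1000):
        decision_steps, terminal_steps = env.get_steps(behavior)
        if len(decision_steps) > 0:
            # Random joint commands in place of a learned policy.
            act = np.random.uniform(
                -1, 1, (len(decision_steps), spec.action_spec.continuous_size)
            ).astype(np.float32)
            env.set_actions(behavior, ActionTuple(continuous=act))
        env.step()
    env.close()

The package also ships a Gym wrapper if you'd rather plug the environment into standard RL libraries instead of talking to it directly.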