"EPG takes a step toward agents that are not blank slates but instead know what it means to make progress on a new task, by having experienced making progress on similar tasks in the past."
Can someone explain to me how they "take a step"? It seems like they just use random search to define a loss function for the sub-policy to optimize against. Is it because the loss function is "learned" over the sequence of actions, making it adaptive?
TL;DR:

Parametrize your loss function and wrap a normal policy optimization with a random search to find a better loss function. Don't call it "random search," call it "evolution strategies" to make it sound sophisticated.

Neat idea.
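For anyone wanting the shape of it: the inner loop does ordinary gradient steps on the learned loss, and the outer loop is the OpenAI-style ES estimator over the loss's parameters phi, scored by how well the resulting policy actually performs. A toy sketch of that nesting (names are made up and a two-armed bandit stands in for the real tasks; the paper's loss is a temporal-convolutional net over trajectory buffers, not two scalars):

    import numpy as np

    rng = np.random.default_rng(0)

    def inner_loop_return(phi, seed, n_steps=200, lr=0.05):
        # Train a fresh policy by gradient steps on the *learned* loss
        # L_phi, then score it by its true return on a toy 2-armed bandit.
        rng_in = np.random.default_rng(seed)
        theta = 0.0                       # policy logit for picking arm 1
        payout = np.array([0.2, 0.8])     # true expected reward per arm
        for _ in range(n_steps):
            p1 = 1.0 / (1.0 + np.exp(-theta))       # P(arm 1)
            a = int(rng_in.random() < p1)           # sample an action
            r = float(rng_in.random() < payout[a])  # Bernoulli reward
            grad_logp = a - p1                      # d log pi(a) / d theta
            shaped = phi[0] * r + phi[1]            # "learned loss" signal
            theta += lr * shaped * grad_logp        # inner policy update
        return payout[int(theta > 0)]               # greedy deployed return

    def es_step(phi, gen, sigma=0.1, pop=32, alpha=0.5):
        # One evolution-strategies update on the loss parameters phi:
        # perturb, train a policy under each perturbed loss, and move phi
        # along the return-weighted noise directions.
        eps = rng.standard_normal((pop, phi.size))
        rets = np.array([inner_loop_return(phi + sigma * e, seed=gen * pop + i)
                         for i, e in enumerate(eps)])
        adv = (rets - rets.mean()) / (rets.std() + 1e-8)
        return phi + alpha / (pop * sigma) * eps.T @ adv

    phi = np.zeros(2)   # parameters of the loss function, not of any policy
    for gen in range(20):
        phi = es_step(phi, gen)
    print("evolved loss params:", phi)

Which also answers the adaptivity question above: nothing differentiates through the inner loop. The outer loop just perturbs phi and keeps whatever produces policies that learn well.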
Would someone here know how to go about recreating a physics sandbox with a virtual robot arm and cubes in a game engine like Unity/UE4, where we'd be able to apply ML?

Any suggestion is welcome.
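Not a full answer, but the usual route on the Unity side is the ML-Agents toolkit: build the arm-and-cubes scene with PhysX joints, attach an Agent component that exposes joint targets as continuous actions, export a build, and drive it from Python. A rough sketch of the Python side (names taken from a recent mlagents_envs release; the API has changed between versions, so treat these as assumptions and check the docs for whatever you install):

    import numpy as np
    from mlagents_envs.environment import UnityEnvironment
    from mlagents_envs.base_env import ActionTuple

    # "RobotArmSandbox" is a placeholder for your exported Unity build.
    env = UnityEnvironment(file_name="RobotArmSandbox")
    env.reset()
    behavior = list(env.behavior_specs)[0]   # the arm's registered behavior
    spec = env.behavior_specs[behavior]

    for _ in range(1000):
        decision_steps, terminal_steps = env.get_steps(behavior)
        if len(decision_steps) > 0:
            # Random joint commands in place of a learned policy.
            act = np.random.uniform(
                -1, 1, (len(decision_steps), spec.action_spec.continuous_size)
            ).astype(np.float32)
            env.set_actions(behavior, ActionTuple(continuous=act))
        env.step()
    env.close()

The package also ships a Gym wrapper if you'd rather plug the environment into standard RL libraries instead of talking to it directly.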