This looks very singularity-ish to me; LLMs can construct a reward function, a policy can be trained against it in simulation, and the trained policy works surprisingly well in the physical world (a rough sketch of what such a reward function might look like is at the end of this comment).

Some of the videos raised questions in my mind about whether the leash was doing stabilization work on the robot; it might be. But if you watch the video where they deflate a yoga ball, you can see the difference between when they're "saving" the robot and when they're just keeping the leash taut. The 'dexterous cube' manipulation video is also pretty compelling.

This sort of 'automate a grad student' work -- in this case, coming up with a reasonable reward function -- all stacks up over time. And I'd bet a lot of people's priors would be "this probably won't work," so it's good to see that it can work in some circumstances; that will save time and effort down the road.
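To make the first point concrete, here's a toy sketch (not from the paper; the function name, inputs, and weights are all invented) of the kind of reward function an LLM might write for a yoga-ball balancing task, which would then be used to train a policy in simulation:

    import numpy as np

    # Toy, invented example of an LLM-written reward function for a
    # sim-trained balancing task; terms and weights are illustrative only.
    def compute_reward(base_height, base_ang_vel, joint_torques, target_height=0.9):
        # Reward staying near the target base height above the ball
        height_reward = float(np.exp(-10.0 * abs(base_height - target_height)))
        # Penalize body angular velocity (tipping over)
        stability_penalty = 0.1 * float(np.linalg.norm(base_ang_vel))
        # Penalize actuation effort so the policy doesn't thrash
        effort_penalty = 0.001 * float(np.sum(np.square(joint_torques)))
        return height_reward - stability_penalty - effort_penalty

    # Example call with made-up state values
    r = compute_reward(0.85, np.array([0.1, 0.0, 0.2]), np.zeros(12))

The grad-student-like part is picking and tuning exactly these kinds of terms and weights, and that's the loop the LLM appears to be automating here.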