
DrEureka: Language Model Guided SIM-to-Real Transfer

65 points, by jasondavies, about 1 year ago

6 comments

throwup238, about 1 year ago
This research is all beyond me, so maybe someone can explain: how does this compare to the state of the art in using simulators to train physical robots? Does using transformers help in any way, or can this just as easily be done with other architectures?

To the uninitiated this looks cool as all heck, and yet another step towards the Star Trek future where we do everything in a simulator first and it always kinda just works in the real world (plot requirements notwithstanding).

Although I can also hear the distant sounds of a hundred military R&D labs booting up Metalhead [1] simulators.

Edit: Looks like the previous SOTA was still a manual process where the user had to come up with a reward function that actually rewards the actions they wanted the algorithm to learn. This research uses language models to do that tedious step instead.

[1] https://en.wikipedia.org/wiki/Metalhead_(Black_Mirror)
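The "tedious step" being automated can be pictured like this: a hypothetical shaped reward function of the kind an LLM might propose for a quadruped balancing task. Every name and weight below is illustrative, not taken from the paper.

```python
import math

def balance_reward(base_height, target_height, roll, pitch, joint_torques):
    """Hypothetical shaped reward for keeping a quadruped's base level
    and at a target height. The term structure (exponential tracking
    bonuses minus an effort penalty) is typical of hand-written reward
    functions; the weights here are arbitrary illustrations."""
    height_term = math.exp(-10.0 * abs(base_height - target_height))
    upright_term = math.exp(-5.0 * (roll ** 2 + pitch ** 2))
    effort_penalty = 0.01 * sum(t ** 2 for t in joint_torques)
    return height_term + upright_term - effort_penalty

# A perfectly level robot at the target height with zero torque
# earns the maximum reward of 2.0; any tilt or effort reduces it.
r = balance_reward(0.30, 0.30, 0.0, 0.0, [0.0, 0.0, 0.0, 0.0])
```

Hand-tuning those weights per task is exactly the loop the paper hands off to a language model.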
vessenes, about 1 year ago
This looks very singularity-ish to me; LLMs can construct a reward function that can be trained in simulation, and the trained reward function works surprisingly well in the physical world.

Some of the videos raised questions in my mind as to whether or not the leash was doing stabilization work on the robot; it might be. But if you watch the video where they deflate a yoga ball, you can see the difference between when they're "saving" the robot and when they're just keeping the leash taut. The 'dextrous cube' manipulation video is also pretty compelling.

This sort of 'automate a grad student' work (in this case, coming up with a reasonable reward function) all stacks up over time. And I'd bet a lot of people's priors would be "this probably won't work," so it's good to see it can work in some circumstances; it will save time and effort down the road.
userbinator, about 1 year ago
From the current casing I want to clarify that this is "sim" as in simulation, not as in SIM card. (Made me click.)
refulgentis, about 1 year ago
Every single second of every example has a handler holding a leash, and not just holding it, holding it without any slack.

Blindingly obvious interference from the Ouija board effect.

I don't mean to denigrate the work; I believe the researchers are honest, and I hope there are demos beyond the published ones. It's just, at best, an obvious unforced error that leaves open a big question.

EDIT: A replier below shared a gif with failures; tl;dr, this looks like two different experiment protocols, one for success, one for failure. https://imgur.com/a/DmepBVU
canadiantim, about 1 year ago
So the robot dog that's going to kill me in the near future will at least be adorably balancing on a big rubber ball.
FrustratedMonky, about 1 year ago
Kind of like how a human visualizes before a sport? Like how visualizing free throws in basketball makes you measurably better, without actually shooting free throws for real?