TechEcho

DrEureka: Language Model Guided SIM-to-Real Transfer

65 points by jasondavies about 1 year ago

6 comments

throwup238 about 1 year ago
This research is all beyond me so maybe someone can explain: How does this compare to the state of the art in using simulators to train physical robots? Does using transformers help in any way or can this just as easily be done with other architectures?

To the uninitiated this looks cool as all heck and yet another step towards the Star Trek future where we do everything in a simulator first and it always kinda just works in the real world (plot requirements notwithstanding).

Although I can also hear the distant sounds of a hundred military R&D labs booting up Metalhead [1] simulators.

Edit: Looks like the previous SOTA was still a manual process where the user had to come up with a reward function that actually rewards the actions they wanted the algorithm to learn. This research uses language models to do that tedious step instead.

[1] https://en.wikipedia.org/wiki/Metalhead_(Black_Mirror)
vessenes about 1 year ago
This looks very singularity-ish to me; LLMs can construct a reward function that can be trained in simulation, and the trained reward function works surprisingly well in the physical world.

Some of the videos raised questions in my mind as to whether or not the leash was doing stabilization work on the robot; it might be. But, if you watch the video where they deflate a yoga ball, you can see the difference when they're "saving" the robot and when they're just keeping it taut. The 'dextrous cube' manipulation video is also pretty compelling.

This sort of 'automate a grad student' type work, in this case, coming up with a reasonable reward function, all stacks up over time. And, I'd bet a lot of people's priors would be "this probably won't work," so it's good to see it can work in some circumstances -- will save time and effort down the road.
userbinator about 1 year ago
From the current casing I want to clarify that this is "sim" as in simulation, not as in SIM card. (Made me click.)
refulgentis about 1 year ago
Every single second of every example has a handler holding a leash - and not just holding it, holding it without any slack.

Blindingly obvious interference from Ouija board effect.

I don't mean to denigrate the work, I believe the researchers are honest and I hope there are demos outside the published one. Just, at best, an obvious unforced error that leaves open a big question.

EDIT: Replier below shared a gif with failures, tl;dr this looks like two different experiment protocols, one for success, one for failure. https://imgur.com/a/DmepBVU
canadiantim about 1 year ago
So the robot dog that's going to kill me in the near future will at least be adorably balancing on a big rubber ball.
FrustratedMonky about 1 year ago
Kind of like how a human visualizes before a sport?

Like visualizing free throws in basketball makes you measurably better, without actually doing free throws for real?