
DrEureka: Language Model Guided SIM-to-Real Transfer

65 points, by jasondavies, about 1 year ago

6 comments

throwup238, about 1 year ago
This research is all beyond me, so maybe someone can explain: how does this compare to the state of the art in using simulators to train physical robots? Does using transformers help in any way, or can this just as easily be done with other architectures?

To the uninitiated this looks cool as all heck, and yet another step towards the Star Trek future where we do everything in a simulator first and it always kinda just works in the real world (plot requirements notwithstanding).

Although I can also hear the distant sounds of a hundred military R&D labs booting up Metalhead [1] simulators.

Edit: Looks like the previous SOTA was still a manual process where the user had to come up with a reward function that actually rewards the actions they wanted the algorithm to learn. This research uses language models to do that tedious step instead.

[1] https://en.wikipedia.org/wiki/Metalhead_(Black_Mirror)
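The "tedious step" being automated can be pictured like this: a hypothetical shaped reward function of the kind an LLM might propose for a quadruped balancing task. Every name and weight below is illustrative, not taken from the paper.

```python
import math

def balance_reward(base_height, target_height, roll, pitch, joint_torques):
    """Hypothetical shaped reward for keeping a quadruped's base level
    and at a target height. The term structure (exponential tracking
    bonuses minus an effort penalty) is typical of hand-written reward
    functions; the weights here are arbitrary illustrations."""
    height_term = math.exp(-10.0 * abs(base_height - target_height))
    upright_term = math.exp(-5.0 * (roll ** 2 + pitch ** 2))
    effort_penalty = 0.01 * sum(t ** 2 for t in joint_torques)
    return height_term + upright_term - effort_penalty

# A perfectly level robot at the target height with zero torque
# earns the maximum reward of 2.0; any tilt or effort reduces it.
r = balance_reward(0.30, 0.30, 0.0, 0.0, [0.0, 0.0, 0.0, 0.0])
```

Hand-tuning those weights per task is exactly the loop the paper hands off to a language model.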
vessenes, about 1 year ago
This looks very singularity-ish to me; LLMs can construct a reward function that can be trained in simulation, and the trained reward function works surprisingly well in the physical world.

Some of the videos raised questions in my mind as to whether or not the leash was doing stabilization work on the robot; it might be. But if you watch the video where they deflate a yoga ball, you can see the difference between when they're "saving" the robot and when they're just keeping the leash taut. The 'dextrous cube' manipulation video is also pretty compelling.

This sort of 'automate a grad student' work (in this case, coming up with a reasonable reward function) all stacks up over time. And I'd bet a lot of people's priors would be "this probably won't work," so it's good to see it can work in some circumstances; it will save time and effort down the road.
userbinator, about 1 year ago
From the current casing I want to clarify that this is "sim" as in simulation, not as in SIM card. (Made me click.)
refulgentis, about 1 year ago
Every single second of every example has a handler holding a leash, and not just holding it, holding it without any slack.

Blindingly obvious interference from the Ouija board effect.

I don't mean to denigrate the work; I believe the researchers are honest, and I hope there are demos beyond the published ones. It's just, at best, an obvious unforced error that leaves open a big question.

EDIT: A replier below shared a gif with failures; tl;dr, this looks like two different experiment protocols, one for success, one for failure. https://imgur.com/a/DmepBVU
canadiantim, about 1 year ago
So the robot dog that's going to kill me in the near future will at least be adorably balancing on a big rubber ball.
FrustratedMonky, about 1 year ago
Kind of like how a human visualizes before a sport? Like how visualizing free throws in basketball makes you measurably better, without actually shooting free throws for real?