
A Generalist Agent [pdf]

72 points by randomperson_24 about 3 years ago

2 comments

gwern · about 3 years ago
Gato, a Decision Transformer on steroids, is pretty much what you would expect, with the expected RL scaling curves†, if you've been following ML scaling research for the past 2 years. It is, however, still mindblowing to see it in reality.

And note that it's only as small (and thus, weak) as it is because they want to run it directly on robots ("We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters").

† https://storage.googleapis.com/deepmind-media/A%20Generalist%20Agent/Generalist%20Agent.pdf#page=11 looks just like any scaling curve from a text or vision paper...

Also submitted at https://news.ycombinator.com/item?id=31355657
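For readers who haven't followed the scaling literature, here is a minimal sketch of the power-law shape gwern is pointing at. In scaling-law work (e.g. Kaplan et al. 2020), loss falls off as a power law in parameter count, L(N) = (N_c / N)^α. The constants below are Kaplan et al.'s language-model fit, used purely to show the curve's shape; they are not Gato's numbers, and the model sizes are illustrative (ending at the ~1.2B scale the quote mentions).

```python
# Sketch of the power-law scaling shape referenced above. Constants are
# Kaplan et al.'s (2020) language-model fit, NOT Gato's; sizes are illustrative.
import numpy as np

N_C = 8.8e13   # Kaplan et al.'s fitted constant for LM loss
ALPHA = 0.076  # Kaplan et al.'s fitted exponent for LM loss

def scaling_loss(n_params: float) -> float:
    """Predicted loss under the power law L(N) = (N_c / N)^alpha."""
    return (N_C / n_params) ** ALPHA

# On log-log axes this traces a straight line, which is why scaling curves
# from text, vision, and (per gwern) RL papers all look alike.
for n in [1e7, 1e8, 1.2e9]:
    print(f"{n:.1e} params -> predicted loss {scaling_loss(n):.3f}")
```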
vladf · about 3 years ago
This work has two really interesting contributions, in my opinion.

1. Creating a few data points (3) for scaling laws (Figure 8). These behave similarly to language models, as gwern puts it [1], but across three data points it's a bit tough to draw a power-law conclusion (eyeballing the figure, they increase params 4.5x and 3.2x and see about 20% relative performance improvement from each jump; a quick numeric check follows this comment).

2. What I find more interesting than the scaling is the out-of-distribution (OOD) generalization results (Figure 9). They test the performance of the agent on a completely unseen task (though possibly from within the same domain, i.e., they might train on a fixed physics engine from the DeepMind Control Suite [2], but never let the agent look at the cartpole task). They compare this to various ablations: from-scratch training with the same architecture, pretraining only with same-domain data, and pretraining only on non-control data (presumably unsupervised contrastive-learning-based data).

The results from (1) are impressive and from (2) are mixed (but no less interesting as a contribution!) in terms of the additional training data actually helping with generalization. OOD generalization performance is the most interesting because it really tests whether control-based pretraining helps the agent in a truly new situation. And certainly, there are a couple of tasks at which the zero-shot performance improves over the ablations (but there are others where it hurts).

What I'd find exciting to see in coming research is further investigation into variants of Figure 9.

- How does scaling affect the impact of control-data pretraining vs. non-control-data pretraining?

- The authors used a custom fine-tuning schedule for the few-shot evaluation on unseen tasks. It's possible the schedule needs to be changed for the ablated versions of the agents to give them their best performance, too. What would Figure 9 look like with the "best" training setup for each ablation individually? I.e., can we tease apart how much, if at all, it's a matter of low-level modality-specific features helping zero-shot adaptation vs. some kind of truly generalized "control pretraining"?

[1] https://news.ycombinator.com/item?id=31356155
[2] https://arxiv.org/abs/1801.00690
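As a back-of-the-envelope check on point (1), here is a sketch built only from the ratios vladf eyeballs above (params up 4.5x then 3.2x, score up roughly 20% relative at each jump), in arbitrary units rather than the paper's actual data:

```python
# Fit score = a * N**b to three points constructed from the ratios quoted
# in the comment (params x4.5 then x3.2, ~20% relative score gain per jump).
# Arbitrary units; NOT the paper's actual measurements.
import numpy as np

params = np.array([1.0, 4.5, 4.5 * 3.2])  # relative parameter counts
scores = np.array([1.0, 1.2, 1.2 * 1.2])  # ~20% relative gain per jump

# Least-squares line in log-log space: log(score) = b*log(params) + log(a).
b, log_a = np.polyfit(np.log(params), np.log(scores), 1)
print(f"pooled exponent b ~= {b:.3f}")  # ~0.136

# Exponent implied by each jump on its own:
print(f"jump 1: b ~= {np.log(1.2) / np.log(4.5):.3f}")  # ~0.121
print(f"jump 2: b ~= {np.log(1.2) / np.log(3.2):.3f}")  # ~0.157
```

The two jumps imply noticeably different exponents (~0.12 vs. ~0.16), and the pooled fit hides that spread, which is the sense in which three points are thin evidence for a power law: the reading rests largely on prior expectations from the text and vision scaling literature.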