Tackling multiple tasks with a single visual language model

114 points by Ftuuky, about 3 years ago

6 comments

Jack000, about 3 years ago
2022: DeepMind releases paper on bootstrapped meta-learning and scaling RL agents.

2023: RL agent trained for multi-task learning solves the majority of perfect-information games. It's a scaled-up decision transformer. Scaling laws for RL agents are discovered, similar to those for language models.

2024: Large-scale RL agents are combined with frozen vision and language models via cross-attention, and can be prompted one-shot with language/vision tokens to solve novel tasks.

2025: RL agents enter the real world: first pre-trained in diverse synthetic environments, then via imitation learning from YouTube videos, and finally in an online fashion via real-time human interaction.

Timeline might be optimistic, but one can hope!
maxwells-daemon, about 3 years ago
Wow! The ability to ingest the "cross product" of data on the internet and in the real world is huge; I bet a lot of what LMs don't know yet lives in that space. This seems a lot more general-purpose than CLIP, so I'm hopeful for even more impressive downstream applications, e.g. robotics.
goldenkey, about 3 years ago
"I am not affected by this difference" - What The Fuck?!
bobbylarrybobby, about 3 years ago
The conversations are scary. They almost don't seem believable -- did I miss the part where they say they're just an example of what a conversation might look like?
jcims, about 3 years ago
I would love to hear about some of the spine-tingling moments these researchers experience when developing and interacting with large models.
razodactyl, about 3 years ago
AI. Just casually evolving alongside and using us as their conduit. Lol