
科技回声

A tech news platform built with Next.js, providing global tech news and discussion.


© 2025 科技回声. All rights reserved.

V-JEPA: Video Joint Embedding Predictive Architecture

68 points | by agnosticmantis | over 1 year ago

2 comments

jimmySixDOF, over 1 year ago
This would have been bigger news had Gemini 1.5, Sora, and the Magic investment not all landed at the same time. Gemini can do needle-in-a-haystack retrieval reliably across the 3 hours of video they tested.
bitshiftfaced, over 1 year ago
Look at how AlphaGo started with human data, and then they found a way to train it without any. I've been wondering whether a similar thing might be possible with LLMs by grounding them on real-world video, having them predict what happens next in the video. I suppose you'd still need some minimal language ability to bootstrap from, but imagine it learning the laws of physics and mathematics from the ground up.
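The latent-prediction objective the comment gestures at can be sketched in a toy form. This is a hypothetical NumPy illustration, not V-JEPA's actual training code: a frozen random "encoder" stands in for a vision backbone, and a linear predictor is trained to map the embedding of context frames to the embedding of a held-out future frame, so the loss lives in representation space rather than pixel space.

```python
import numpy as np

rng = np.random.default_rng(0)
D_PIX, D_EMB, N = 32, 8, 512  # pixel dim, embedding dim, sample count (all made up)

# Frozen random projection standing in for a learned vision encoder.
W_enc = rng.normal(0, 1.0 / np.sqrt(D_PIX), (D_PIX, D_EMB))

# Synthetic "video" pairs: the future frame is a fixed linear motion of
# the context frame plus a little noise, so the dynamics are learnable.
A = rng.normal(0, 1.0 / np.sqrt(D_PIX), (D_PIX, D_PIX))
ctx = rng.normal(size=(N, D_PIX))
fut = ctx @ A + 0.01 * rng.normal(size=(N, D_PIX))

# Encode both frames; the predictor only ever sees embeddings.
z_ctx, z_fut = ctx @ W_enc, fut @ W_enc

# Train a linear predictor by gradient descent on the latent L2 loss.
W_pred = np.zeros((D_EMB, D_EMB))
for _ in range(500):
    err = z_ctx @ W_pred - z_fut
    W_pred -= 0.01 * (z_ctx.T @ err) / N

loss = float(np.mean((z_ctx @ W_pred - z_fut) ** 2))
print(f"latent prediction loss: {loss:.4f}")
```

The design choice this mirrors is that prediction error is measured between embeddings, not pixels, so the model is never asked to reconstruct noise or texture it cannot predict.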