Look at how AlphaGo started with human data, and then they found a way to train it without that (AlphaGo Zero learned purely from self-play). I've been wondering if something similar might be possible with LLMs: ground them in real-world video by having them predict what happens next in the video. I suppose you'd still need some minimal language ability to bootstrap from, but imagine it learning the laws of physics and mathematics from the ground up.
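
To make the objective concrete, here's a toy sketch of what I mean, the video analogue of next-token prediction. Everything here is a placeholder (the `NextFramePredictor` name, the tiny conv net, the random tensors standing in for real video), not a proposal for how a serious system would actually look:

```python
# Hypothetical sketch: self-supervised next-frame prediction.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Toy model: predicts frame t+1 from a stack of the previous k frames."""
    def __init__(self, context_frames=4, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(context_frames * channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, frames):
        # frames: (batch, context_frames, channels, H, W)
        b, k, c, h, w = frames.shape
        return self.net(frames.reshape(b, k * c, h, w))

model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in for real video: clips of 5 frames (4 context + 1 target).
clip = torch.rand(8, 5, 3, 64, 64)
context, target = clip[:, :4], clip[:, 4]

pred = model(context)
loss = nn.functional.mse_loss(pred, target)  # "what happens next?" signal
loss.backward()
opt.step()
```

The appeal is that the supervision is free: the next frame is its own label, the same way the next token is in LLM pretraining, so in principle you could scale it on raw video the way AlphaGo Zero scaled on self-play.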