科技回声

18 comments

knightoffaith more than 1 year ago

Amazing that you can just shove a ton of multimodal data into a big transformer and get a really good multimodal model. I wonder where things will top out. For many years a lot of people (including me) were saying "you can't just take existing architectures, scale them up, feed them a lot of data, and expect something impressive", but here we are.

brucethemoose2 more than 1 year ago

We've been testing it in the local LLM Discords. Turns out it's just a Llama 7B finetune that can run on any old GPU (which is cool).

https://huggingface.co/brucethemoose/LargeWorldModel_LWM-Text-Chat-128K-55bpw

https://huggingface.co/dranger003/LWM-Text-Chat-128K-iMat.GGUF

And its long-context recall is quite good! We've already kind of discovered this with Yi, but there are some things one can do with a mega context that you just can't get with RAG.

bbor more than 1 year ago

Because it might not be clear:

> … d) Fully open-sourced a family of 7B parameter models capable of processing long text documents (LWM-Text, LWM-Text-Chat) and videos (LWM, LWM-Chat) of over 1M tokens.

https://huggingface.co/LargeWorldModel

In terms of content, I am blown away yet again by the SoTA speeding on by as I try to catch up. Can someone with a more cynical eye point me to competitors or problems with this approach? Because as it stands… that jump to a context length of a million tokens is pretty impressive to an outsider.

C4stor more than 1 year ago

I wonder why the example videos are in this specific clip-compilation format.

It feels to me that to navigate that, you essentially have to index 500 10-second videos, and that looks a lot easier than retrieving information from an actual 1-hour-long video, because the latter will have a lot more easy-to-mix-up moments. So maybe it hides an inability to answer questions about actual long videos (in the paper, the other example videos cap at 3 minutes in length, from what I can see).

On the other hand, maybe it's just for results presentation purposes, because it is much more readily "verifiable" for everyone than saying "trust us, in this very long video, there's the correct answer unarguably".

So if someone happens to know more about that, I'd be very interested.

kromem more than 1 year ago

It's pretty wild watching technology develop where, in February, I genuinely don't have a confident idea of just how far it will have progressed by December of the same year.

Open models have just been on fire lately, and with the next generation of SotA models to pull synthetic data from in training the next generation of open models, each taking nuanced and clever approaches to infrastructure improvements, I'm pretty much considering all bets to be off.

At this point, the bottleneck is increasingly the human ability to adapt to improving tools rather than limitations in the tools themselves.

kavaivaleri more than 1 year ago

Some pretty fascinating collaborators:

- Matei Zaharia, CTO of Databricks
- Pieter Abbeel, Director of the Berkeley Robot Learning Lab and Co-Director of the Berkeley Artificial Intelligence Research (BAIR) lab
- Two talented PhD students: Hao Liu, Wilson Yan

jerpint more than 1 year ago

This looks really promising!

Other than this sentence:

> We curated a large dataset of videos and languages from public book and video datasets, consisting of videos of diverse activities and long-form books.

I didn't see any other mention of datasets used. Is this intentional?

kleiba more than 1 year ago

What does "Million-Length" mean?

jaredsohn more than 1 year ago

Berkeley

Delumine more than 1 year ago

It blows my mind how quickly we are moving with these advances in LLMs, and these are just the ones we see in PUBLIC. I'm sure there are more advanced proprietary solutions that we aren't privy to.

labrador about 1 year ago

This implementation is similar to something Ilya Sutskever said a few months ago but I think I am misunderstanding both: I think they are saying robots could learn how to move and what facial expressions to use by watching millions of hours of videos involving humans, a sort of LLM of human behavior. I am not a scientist so I may have this wrong.

pfooti more than 1 year ago

nit: UC Berkeley. Not Berkley.

robertkoss more than 1 year ago

It feels like Matei is everywhere, impressive!

7734128 more than 1 year ago

Figures 2 and 3 are incredible, and I hope they're true in real-life scenarios.

pk-protect-ai more than 1 year ago

This is a BOMB! Love it!

xvector more than 1 year ago

This is incredible.

yusml more than 1 year ago

Thanks for sharing! I've added to smmry.tech

cranberryturkey more than 1 year ago

wow. talk about that show with the Lost guy as an eccentric billionaire? this is what he built as a surveillance system.

World model on million-length video and language with RingAttention
