I turned it off after a few minutes of yelling at the screen. In the interest of giving Yann a fair shake... I started watching it again, and making this post.<p>Yann has a very, VERY narrow view of what constitutes a Large Language Model, one that doesn't include any accessories at all, and it sounds like he's going to be one of those folks saying "Yeah... but that's not purely an LLM" even as it does everything up to and including AGI. I'm starting to think they'll have AGI by the end of the year, if they don't already, thanks to MemGPT.<p>I strongly suspect Yann is a packer, not a mapper like me.<p>My objections, with timestamps, from the first 21 minutes. I could likely keep going through the whole thing like this, if sufficiently encouraged.<p><pre><code> 2:56 - Things LLMs don't have #1 "Capacity to understand the world" - We understand things we can't see or touch, like atoms and quarks and black holes. I don't think an LLM has to "understand" the world the way we do in order to make predictions and plans involving the world.
3:05 - Things LLMs don't have #2 "The ability to remember and retrieve things / Persistent Memory"
- MemGPT fixes this with some tool usage on the part of the LLM. My understanding of how it was done (rough sketch at the end of this comment):
Generate synthetic training data that shows an LLM how to use a memory prosthetic
Train it against said data until it knows how to use the memory prosthetic.
Sure, it eats up some of the context, but "pay attention or die" (a.k.a. driving a car) tends to consume human context too.
https://news.ycombinator.com/item?id=37901902
3:08 - Things LLMs don't have #3 "The ability to reason"
Thanks to "Chain of Thought" prompting, this can be overcome (example prompt at the end of this comment).
https://news.ycombinator.com/item?id=36131450
3:11 - Things LLMs don't have #4 "The Ability to Plan"
LLMplan seems to get around this by translating the task into PDDL and using a prosthetic (a classical planner) to do the actual planning (sketch at the end of this comment).
https://news.ycombinator.com/item?id=35735375
5:11 - "The optic nerve carries about 20 megabytes per second"
Actually, our capacity to comprehend data is far, FAR lower than that, closer to 60 bits/second
https://www.technologyreview.com/2009/08/25/210267/new-measure-of-human-brain-processing-speed/
6:20 - Lex correctly states there is a lot of structure and wisdom in human language
Strong agree; it's a miracle that Word2Vec works as well as it does, and it's mostly the structure inherent in human language that makes it possible to build it at all (quick demo at the end of this comment).
6:40 - Yann completely discounts the trillions of bytes of human descriptions of the real world, and our encounters with it, as a guide for the LLM to understand the world.
I suspect this is because of his category error in estimating how much data we take in through vision.
8:56 - Yann cites Moravec's Paradox, and the problems robotics has with physical tasks, as a way to dismiss any progress with LLMs.
It's my strong opinion that these limitations are temporary, and can be overcome in time.
13:50 - Yann states that because an LLM only predicts the next word, it can't think more than one word ahead. I believe this, again, is a category error that leads him astray.
The context window of an LLM is large enough to allow for multiple possible ways to describe something, and the randomness of the first word then drives the coherent subsequent choices. People do this all the time when trying to compose rhymes, etc. LLMs can do similar tasks, but are a bit handicapped by the way their input is tokenized (small demo at the end of this comment).
19:30 - "LLMs can't predict the next frame of a video"
Comma.AI showed it off almost a year ago
https://www.youtube.com/watch?v=hpRzNxQvZDI</code></pre>
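<p>For the 3:05 memory point, here's a minimal sketch of the kind of loop I mean. The function names, the TOOL: convention, and the dict-as-archive are my own stand-ins, not MemGPT's actual API; the point is just that an external store plus a model trained to call it gets you persistent memory.<p><pre><code>import json

memory = {}  # hypothetical external store standing in for archival memory

def run_tool(call):
    """Execute a tool call emitted by the model."""
    if call["name"] == "memory_write":
        memory[call["args"]["key"]] = call["args"]["value"]
        return "ok"
    if call["name"] == "memory_read":
        return str(memory.get(call["args"]["key"], "(nothing stored)"))
    return "unknown tool"

def step(llm, context):
    """One turn: ask the model, run any tool call, feed the result back into the context."""
    reply = llm(context)           # llm() stands in for any chat-completion call
    if reply.startswith("TOOL:"):  # convention the model learned from synthetic training data
        call = json.loads(reply[len("TOOL:"):])
        context += "\n[tool result] " + run_tool(call)
    else:
        context += "\n" + reply
    return context
</code></pre>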
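<p>For the 3:08 reasoning point, chain-of-thought is mostly just prompting. The canonical few-shot example (paraphrased from the Wei et al. paper) looks like the below; the worked-out steps in the example answer are what get the model to reason out loud before answering:<p><pre><code># Any completion API can consume this string as-is; the example answer
# showing its work is the whole trick.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""
</code></pre>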
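<p>For the 3:11 planning point, the translate-then-plan idea is roughly the below. This is hypothetical glue code, not the actual implementation from the linked work; I'm using Fast Downward as the example classical planner:<p><pre><code>import subprocess

def plan(task_description, llm, domain_pddl="domain.pddl"):
    # The LLM only translates natural language into a PDDL problem file.
    problem = llm("Translate this task into a PDDL problem for the given domain:\n"
                  + task_description)
    with open("problem.pddl", "w") as f:
        f.write(problem)
    # A classical planner does the actual search; the LLM never plans by itself.
    result = subprocess.run(
        ["fast-downward.py", domain_pddl, "problem.pddl",
         "--search", "astar(lmcut())"],
        capture_output=True, text=True)
    return result.stdout
</code></pre>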
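<p>For the 6:20 point about structure in language, this is the standard demo of what falls out of Word2Vec trained on nothing but text (gensim with the pretrained Google News vectors; it's a large download):<p><pre><code>import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # pretrained vectors, trained purely on text
# king - man + woman: 'queen' usually tops the list, and nobody hand-coded that structure.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
</code></pre>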
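<p>For the 13:50 point, a quick way to see that next-word prediction doesn't mean one-word-deep thinking: sample several continuations of the same prompt. Each one diverges at its first sampled word and then stays internally consistent, because everything after is conditioned on it. (Hugging Face transformers, with GPT-2 as a convenient stand-in model.)<p><pre><code>from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
outputs = generator("Roses are red, violets are",
                    do_sample=True, num_return_sequences=3, max_new_tokens=12)
for out in outputs:
    print(out["generated_text"])
</code></pre>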