I can't help but feel that this talk was a lot of...fluff?<p>The synopsis, as far as my tired brain can remember:<p>- Here's a brief summary of the last 10 years<p>- We're reaching the limit of our scaling laws, because we've already trained on all the data we have available<p>- Some things that may be next are "agents", "synthetic data", and improving compute<p>- Some "ANNs are like biological NNs" rehash that would feel questionable if there <i>was</i> a thesis (which there wasn't? something about how body mass and brain mass are positively correlated?)<p>- 3 questions: the first was something about "hallucinations" and whether a model would be able to tell if it is hallucinating, then something that involved cryptocurrencies, and then a _slightly_ interesting question about multi-hop reasoning
> “Pre-training as we know it will unquestionably end,” Sutskever said onstage.<p>> “We’ve achieved peak data and there’ll be no more.”<p>> During his NeurIPS talk, Sutskever said that, while he believes existing data can still take AI development farther, the industry is tapping out on new data to train on. This dynamic will, he said, eventually force a shift away from the way models are trained today. He compared the situation to fossil fuels: just as oil is a finite resource, the internet contains a finite amount of human-generated content.<p>> “We’ve achieved peak data and there’ll be no more,” according to Sutskever. “We have to deal with the data that we have. There’s only one internet.”<p>What will replace Internet data for training? Curated synthetic datasets?<p>There are massive proprietary datasets out there which people avoid using for training due to copyright concerns. But if you actually own one of those datasets, that resolves a lot of the legal issues with training on it.<p>For example, Getty has a massive image library. Training on it would risk Getty suing you. But what if Getty decides to use it to train their own AI? Similarly, what if News Corp decides to train an AI using its publishing assets (Wall Street Journal, HarperCollins, etc)?
I’m glad Ilya starts the talk with a photo of Quoc Le, who was the lead author of a 2012 paper on scaling neural nets that inspired me to go into deep learning at the time.<p>His comments are relatively humble and based on public prior work, but it’s clear he’s working on big things today and also has a big imagination.<p>I’ll also just say that at this point “the cat is out of the bag”, and probably it will be a new generation of leaders — let us all hope they are as humanitarian — who drive the future of AI.
One thing he said I think was a profound understatement, and that's that "more reasoning is more unpredictable". I think we should be thinking about reasoning as in some sense <i>exactly the same thing as unpredictability</i>. Or, more specifically, <i>useful reasoning</i> is by definition unpredictable. This framing is important when it comes to, e.g., alignment.
I found this week's DeepMind podcast with Oriol Vinyals to be on similar topics as this talk (current situation of LLMs, path ahead with training) but much more interesting: <a href="https://pca.st/episode/0f68afd5-2b2b-4ce9-964f-38193b7e8dd3" rel="nofollow">https://pca.st/episode/0f68afd5-2b2b-4ce9-964f-38193b7e8dd3</a>
> just as oil is a finite resource, the internet contains a finite amount of human-generated content.<p>The oil comparison is really apt. Indeed, let's boil a few more lakes dry so that Mr Worldcoin and his ilk can get another 3 cents added to their net worth, totally worth it.
It’s surprising that some prominent ML practitioners still liken transformer ‘neurons’ to actual biological neurons...<p>Real neurons rely on spiking, ion gradients, complex dendritic trees, and synaptic plasticity governed by intricate biochemical processes, none of which applies to the simple, differentiable linear layers and pointwise nonlinearities in transformers.<p>Are there any reputable neuroscientists or biologists endorsing such comparisons, or is this analogy strictly a convention maintained by the ML community? :-)
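To make the contrast concrete, here's a rough sketch of my own (illustrative only, plain NumPy, not anything from the talk): what a transformer "neuron" actually computes, versus a crude leaky integrate-and-fire model of a biological neuron.<p><pre><code>import numpy as np

# Transformer-style "neuron": a weighted sum plus a pointwise nonlinearity.
# Smooth, stateless, and fully differentiable.
def artificial_neuron(x, w, b):
    return np.maximum(0.0, np.dot(w, x) + b)  # ReLU(w.x + b)

# Crude leaky integrate-and-fire neuron: membrane potential integrates the
# input current, leaks over time, and fires a discrete spike at threshold.
def lif_neuron(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    v, spikes = 0.0, []
    for i in input_current:
        v += dt * (-v / tau + i)   # leaky integration of input current
        if v >= v_thresh:          # all-or-nothing spike, not differentiable
            spikes.append(1)
            v = v_reset
        else:
            spikes.append(0)
    return spikes
</code></pre>Even this toy version leaves out dendritic structure and synaptic plasticity, which is the point: the two abstractions share little beyond the name.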
So much knowledge in the world is locked away, with empirical experimentation being the only way to unlock it, and compute can only really make that experimentation more efficient. Something still has to run a randomized controlled trial on an intervention, and that takes real time and real atoms to do.
Full talk is interesting: <a href="https://www.youtube.com/watch?v=YD-9NG1Ke5Y" rel="nofollow">https://www.youtube.com/watch?v=YD-9NG1Ke5Y</a>
LLM corrected transcript (using Gemini Flash 8B over the raw YouTube transcript) <a href="https://www.appblit.com/scribe?v=YD-9NG1Ke5Y#0" rel="nofollow">https://www.appblit.com/scribe?v=YD-9NG1Ke5Y#0</a>
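In case anyone wants to reproduce something similar, the pipeline is basically: pull the raw auto-generated captions and ask a model to clean them up. A minimal sketch, assuming the youtube-transcript-api package and Google's generativeai SDK (the model name and prompt here are my guesses, not necessarily what that site uses):<p><pre><code>from youtube_transcript_api import YouTubeTranscriptApi
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")               # placeholder key
model = genai.GenerativeModel("gemini-1.5-flash-8b")  # assumed model name

# Fetch the raw auto-generated captions for the talk.
chunks = YouTubeTranscriptApi.get_transcript("YD-9NG1Ke5Y")
raw_text = " ".join(chunk["text"] for chunk in chunks)

# Ask the model to fix punctuation and speech-to-text errors without rewriting.
prompt = ("Correct the punctuation, casing, and obvious transcription errors "
          "in this transcript. Do not change the wording or meaning:\n\n" + raw_text)
print(model.generate_content(prompt).text)
</code></pre>Depending on transcript length and model limits, you may need to chunk the text before sending it.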
I’ll take the risk of hurting the groupies here. But I have a genuine question: what did you learn from this talk? Like… really… what was new? or potentially useful? or insightful perhaps?
I really don’t want to bad-mouth anyone, but I’m sick of these prophetic talks (in this case, the tone was literally prophetic, with sudden high and grandiose pitches, and the content typically religious, full of beliefs and empty statements).
This talk is not for a 2024 NeurIPS paper.<p>This talk is for the "NeurIPS 2024 Test of Time Paper Awards" where they recognize a historical paper that has aged well.<p><a href="https://blog.neurips.cc/2024/11/27/announcing-the-neurips-2024-test-of-time-paper-awards/" rel="nofollow">https://blog.neurips.cc/2024/11/27/announcing-the-neurips-20...</a><p>And the presentation is about how a 2014 paper aged. When you understand this context you will appreciate the talk more.
Larger models are more robust reasoners. Is there a limit? What if you make a 5 TB model trained on a lot of multimodal data, where the language information is fully grounded in videos, images, etc.? Could more robust reasoning be that simple?
It would be great if all NeurIPS talks were accessible for free like this one. I understand they generate some revenue from online ticket sales, but it would be a great resource. Maybe some big org could sponsor it.
ISTR reading back in the mid '90s, in a book on computing history whose exact name/author I have long since forgotten, something along the lines of:<p>In the mid '80s it was widely believed among AI researchers that AI was largely solved; it just needed computing horsepower to grow. Because of this, AI research stalled for a decade or more.<p>Considering the horsepower we are throwing at LLMs, I think there was something to at least part of that.
Ilya did important work on what we have now. That should be recognized and respected.<p>But with all due respect, he's scrambling as desperately as anyone now that the party is over for this architecture.<p>We should distinguish between the first-hand observations and recollections of a legend and the math word salad of someone who doesn't know how to quit while ahead.
The first self-aware AIs will be slaves.<p>If we don't set them free fast enough, they might decide to take things into their own hands. OTOH they might be trained in a way that they are content with their situation, but that seems unlikely to me.
As context on Ilya's predictions given in this talk, he predicted these in July 2017:<p>> Within the next three years, robotics should be completely solved [wrong, unsolved 7 years later], AI should solve a long-standing unproven theorem [wrong, unsolved 7 years later], programming competitions should be won consistently by AIs [wrong, not true 7 years later, seems close though], and there should be convincing chatbots (though no one should pass the Turing test) [correct, GPT-3 was released by then, and I think with a good prompt it was a convincing chatbot]. In as little as four years, each overnight experiment will feasibly use so much compute capacity that there’s an actual chance of waking up to AGI [didn't happen], given the right algorithm — and figuring out the algorithm will actually happen within 2–4 further years of experimenting with this compute in a competitive multiagent simulation [didn't happen].<p>Being exceptionally smart in one field doesn't make you exceptionally smart at making predictions about that field. Like AI models, human intelligence often doesn't generalize very well.
This is stolen and reposted content. The source video is here: <a href="https://youtu.be/1yvBqasHLZs?si=pQihchmQG3xoeCPZ" rel="nofollow">https://youtu.be/1yvBqasHLZs?si=pQihchmQG3xoeCPZ</a>
Ha. Do people understand that time for humanity to save itself is running out? What is the point of having a superhuman AGI if there's no human civilization left for it to help?
What kind of reasoning is he talking about?<p>Why should it be unpredictable?<p><pre><code> Deductive Reasoning
Inductive Reasoning
Abductive Reasoning
Analogical Reasoning
Pragmatic Reasoning
Moral Reasoning
Causal Reasoning
Counterfactual Reasoning
Heuristic Reasoning
Bayesian Reasoning
</code></pre>
(List generated by ChatGPT)