The best analogy for LLMs (up to and including AGI) is the internet + Google search. Imagine explaining the internet/Google to someone in 1950. That person might say "Oh my god, everything will change! Instantaneous, cheap communication! The world's information available at light speed! Science will accelerate, productivity will explode!" And yet, 70 years later, things have certainly changed, but we're living in the same world with the same general patterns and limitations. With LLMs I expect something similar. Not a singularity, just a new, better tool that, yes, changes things and increases productivity, but leaves human societies more or less the same.

I'd like to be wrong, but I can't help but feel that people predicting a revolution are making the same understandable mistake as my hypothetical 1950s person.
I think there's a huge assumption here that more LLM scaling will lead to AGI.

Nothing I've seen or learned about LLMs leads me to believe that LLMs are in fact a pathway to AGI.

LLMs trained on more data with more efficient algorithms will make for more interesting tools built with LLMs, but I don't see this technology as a foundation for AGI.

LLMs don't "reason" in any sense of the word that I understand, and I think the ability to reason is table stakes for AGI.
> Furthermore, the fact that LLMs seem to need such a stupendous amount of data to get such mediocre reasoning indicates that they simply are not generalizing. If these models can't get anywhere close to human level performance with the data a human would see in 20,000 years, we should entertain the possibility that 2,000,000,000 years worth of data will also be insufficient. There's no amount of jet fuel you can add to an airplane to make it reach the moon.

Never thought about it in this sense. Is he wrong?
The original title ("will scaling work?") seems like a much more accurate description of the article than the editorialized "why scaling will not work" that this got submitted with. The conclusion of the article is not that scaling won't work! It's the opposite: the author thinks that AGI before 2040 is more likely than not.
I was thinking last night about LLMs with respect to Wittgenstein after watching this interesting discussion of his philosophy by John Searle [1].

I think Wittgenstein's ideas are pertinent to the discussion of the relation of language to intelligence (or reasoning in general). I don't mean this in a technical sense (I recall Chomsky mentioning that almost no ideas from Wittgenstein actually have a place in modern linguistics) but in a metaphysical sense (Chomsky also noted that Wittgenstein was one of his formative influences).

The video I linked is a worthy introduction and not too long, so I recommend it to anyone interested in how language might be the key to intelligence.

My personal take, when I see skeptics of LLMs approaching AGI, is that they implicitly reject a Wittgensteinian view of metaphysics without actually engaging with it. There is an implicit Cartesian aspect to their world view, where there is either some mental aspect not yet captured by machines (a primitive soul) or some physical process missing (some kind of non-language *system*).

Whenever I read skeptical arguments against LLMs, they are not credibly evidence-based, nor are they credibly theoretical. They almost always come down to the assumption that language alone isn't sufficient. Wittgenstein was arguing, long before LLMs were even a possibility, that language wasn't just sufficient; it was inextricably linked to reason.

What excites me about scaling LLMs is that we may actually build evidence that supports (or refutes) his metaphysical ideas.

1. https://www.youtube.com/watch?v=v_hQpvQYhOI&ab_channel=PhilosophyOverdose
Almost everything interesting about AI so far has been unexpected emergent behavior and huge gains from minor insights. While I don't doubt that the current architecture likely has a ceiling below peak human intelligence in certain dimensions, it has already surpassed it in some, and there are still gains to be made in others through things like synthetic data.

I also don't understand the claims that it doesn't generalize. I currently use it to solve problems that I can absolutely guarantee were not in its training set, and it generalizes well enough. I also think that one of the easiest ways to get it to generalize better would simply be to give it synthetic data that demonstrates the process of generalizing.

It also seems foolish to extrapolate from what we have under the assumption that there won't be key insights or changes in architecture as we reach the limits of the synthetic-data and multi-modal wins.
I think the more interesting question is: how long will people cling to the illusion that LLMs will lead us to AGI?

Maintaining the illusion is important to keep the money flowing in.
I think there's a need to separate knowledge from the learning algorithm. There needs to be a latent representation of knowledge that models attend to, but the way it's done right now (with my limited understanding) doesn't seem to be it. Transformers seem to only attend to previous text in the context, not to the whole knowledge they possess, which is an obvious limitation IMO. The human brain probably also doesn't attend to its whole knowledge but loads something into context, so maybe it's fixable without changing the architecture.

LLMs can already work for data extraction, so one could build a Prolog DB and update it as the model consumes data, then translate any logic problems into Prolog queries. I want to see this in practice.

Similarly with the use of logic engines and computation/programs.

I also think that RL can come up with a better training objective for LLMs.
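To make the "LLM as extractor, Prolog as reasoner" idea concrete, here is a minimal sketch. llm_extract is a hypothetical stand-in for a real model call and returns hard-coded triples, so the snippet is self-contained and runnable:

    def llm_extract(text: str) -> list[tuple[str, str, str]]:
        # A real system would prompt a model to emit (subject, relation, object)
        # triples parsed out of `text`; hard-coded here for illustration.
        return [("socrates", "is_a", "human")]

    def to_prolog_facts(triples: list[tuple[str, str, str]]) -> list[str]:
        # Turn each triple into a Prolog clause, e.g. is_a(socrates, human).
        return [f"{rel}({subj}, {obj})." for subj, rel, obj in triples]

    kb = to_prolog_facts(llm_extract("Socrates is a human."))
    kb.append("mortal(X) :- is_a(X, human).")  # a rule the system also maintains

    # These clauses would be loaded into a Prolog engine, and logic questions
    # ("is Socrates mortal?") would be translated into queries like:
    query = "?- mortal(socrates)."
    print("\n".join(kb))
    print(query)

The point is only the division of labor: the LLM keeps the fact base up to date as it reads, and the logic engine answers the queries.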
In the programming domain, for example, one could ask an LLM to think about all possible tests for a given piece of code and evaluate them automatically.

I was also thinking about using the DiffusER pattern, where programming rules are kind of hardcoded (similar to add/replace/delete edits, but instead an algebra on functions/variables). That's probably not an AGI path, but it could be good for producing programs.
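A rough sketch of that test-generation loop; llm_propose_tests is a hypothetical placeholder for the model call and returns fixed cases so the example runs as-is:

    def add(a, b):
        # The code under test.
        return a + b

    def llm_propose_tests(source: str) -> list[tuple[tuple, object]]:
        # A real system would prompt a model with `source` and parse
        # (inputs, expected_output) pairs out of its reply.
        return [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

    def evaluate(fn, cases):
        # Run every proposed case and count how many the code satisfies.
        results = [(args, expected, fn(*args) == expected) for args, expected in cases]
        passed = sum(ok for _, _, ok in results)
        return passed, len(results)

    passed, total = evaluate(add, llm_propose_tests("def add(a, b): return a + b"))
    print(f"{passed}/{total} proposed tests passed")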
Author is leveraging mental inflexibility to generate an emotional response of denial. Sure, his points are correct, but they are constrained. Let's remove 2 constraints and reevaluate:

1 - Babies learn much more with much less

2 - Video training data can, in theory, be made at incredible rates

The question becomes: why is the author focusing on approaches to AI investigated around 2012? Does the author think SOTA is text only? Are OpenAI or other market leaders focusing only on text? Probably not.
> Here's one of the many astounding finds in Microsoft Research's Sparks of AGI paper. They found that GPT-4 could write the LaTeX code to draw a unicorn.

A lot of people have tried to replicate this, and I have tried too. It's very hard to get GPT-4 to draw a unicorn, and asking it to draw an upside-down unicorn is even harder.
> '5 OOMs off'

I think Google, Microsoft, and Facebook could easily have 5 OOMs more data than the entire public web combined if we just count text. The majority of people don't have any content on the public web except for personal photos. A minority has a few public social media posts, and it is rare for people to write blogs or research papers, etc. But almost everyone has some content written in mail, docs, or messaging.
I love Dwarkesh; his podcast is phenomenal.

But every article/post of this kind immediately raises the question: what is AGI? I have yet to hear even a decent standard.

It always seems like I'm reading Greek philosophers struggling with things they have almost no understanding of and just throwing out the wildest theories. Honestly, it has raised my opinion of them, seeing how hard it is to reason about things we have no grasp of.
If the size of the internet is really a bottleneck, it seems Google is in quite a strong position.

Assuming they effectively have a log of the internet, rather than counting the current state of the internet as usable data, we should be thinking about the list of diffs that make up the internet.

Maybe this ends up like Millennium Management, where a key differentiator is having access to deleted datasets.
A few more interesting papers not mentioned in the article:

"Faith and Fate: Limits of Transformers on Compositionality"
https://arxiv.org/abs/2305.18654

"Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks"
https://arxiv.org/abs/2311.09247

"Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve"
https://arxiv.org/abs/2309.13638

"Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models"
https://arxiv.org/abs/2311.00871
LLMs are going to bring tons of cool applications, but AGI is not an application!

You can feed your dog 100,000 times a day, but that won't make it a 1,000 kg dog. The whole idea that AGI can be achieved by predicting the next word is just pure marketing nonsense at best.
I am in the believer camp for two simple reasons: 1) we haven't even scratched the surface of government-led investment in AI, and 2) AI itself could probably discover better architectures than transformers (I'm willing to bet heavily on this).
I think the "self-play" path is where the scary-powerful AI solutions will emerge. This implies persistence of state and logic that live external to the LLM. The language model is just one tool. AGI/ASI/whatever will be a *system* of tools, of which the LLM might be the *least* complicated one to worry about.

In my view, domain modeling, managing state, knowing when to transition *between* states, techniques for final decision making, consideration of the time domain, and prompt engineering are the real challenges.
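For a concrete picture of "the LLM as just one tool inside a stateful system", here is a toy sketch; the states, transitions, and the call_llm stub are all hypothetical placeholders rather than any real agent framework:

    from enum import Enum, auto

    class State(Enum):
        GATHER = auto()
        PLAN = auto()
        ACT = auto()
        DONE = auto()

    def call_llm(prompt: str) -> str:
        # Placeholder for the language-model tool; returns a canned reply here.
        return f"(model output for: {prompt})"

    def step(state: State, memory: list[str]) -> State:
        # The transition logic lives outside the model: the surrounding system,
        # not the LLM, decides when to move between states.
        if state is State.GATHER:
            memory.append(call_llm("summarize what we know so far"))
            return State.PLAN
        if state is State.PLAN:
            memory.append(call_llm("propose the next action"))
            return State.ACT
        if state is State.ACT:
            memory.append("(execute action, record result)")
            return State.DONE
        return State.DONE

    memory: list[str] = []
    state = State.GATHER
    while state is not State.DONE:
        state = step(state, memory)
    print("\n".join(memory))

The LLM only fills in the blanks; the persistence, domain model, and decision points are the parts the system has to get right.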
No, humans are not intelligent enough to produce a superintelligent bot in a short time.

My estimate is that a working "human-brain AI" is about 200 years in the future.

All ideas should be treated equally, not ranked by revenue metrics. If everyone could make a YouTube clone, the revenue should be divided equally among all creators; that's the way the world should move forward, instead of toward monopoly.

Everything will suck, forever.
Here is an idea: maybe the most optimized neural network is the brain, in terms of computation to energy consumption ratio.

So essentially, doing this in silicon is just pointless. There must be a reason we can do so much while consuming so little, and yet still struggle with other tasks.

What is the success if we build a machine that consumes heaps of energy and is still as bad at maths as we are?
Really stellar, well-sourced article that comes across as unbiased as possible. I especially enjoyed the almost-throwaway link to "the bitter lesson" near the end, the gist of which is: "Methods that leverage massive compute to capture intrinsic complexity always outperform humans' attempts to encode that complexity by hand."
I'm not sure how one can compare scaling and algorithmic advances percentage-wise, per Dwarkesh's prediction that "70% scaling + 30% algorithmic advance" will get us to AGI?!

I think a clearer answer is that scaling alone will certainly NOT get us to AGI. There are some things that are just architecturally missing from current LLMs, and no amount of scaling or data cleaning or emergence will make them magically appear.

Some obvious architectural features from the top of my list would include:

1) Some sort of planning ahead (cf. tree-of-thought rollouts), which could be implemented in a variety of ways. A simple single-pass feed-forward architecture, even a sophisticated one like a transformer, isn't enough. In humans this might be accomplished by some combination of short-term memory and the thalamo-cortical feedback loop: iterating on one's perception of and reaction to something before "drawing conclusions" (i.e. making predictions) based on it.

2) Online/continual learning, so that the model/AGI can learn from its prediction mistakes via feedback from their consequences, even if that is initially limited to conversational feedback in a ChatGPT setting. To get closer to human-level AGI the model would really need some type of embodiment (either robotic or in a physical-simulation virtual world) so that its actions and feedback go beyond a world of words and let it learn via experimentation how the real world works and responds. You really don't understand the world unless you can touch/poke/feel it, see it, hear it, smell it, etc. Reading about it in a book/training set isn't the same.

I think any AGI would also benefit from a real short-term memory that can be updated and referred to continuously, although "recalculating" it on each token in a long context window does kind of work. In an LLM-based AGI this could just be an internal context, separate from the input context, but otherwise updated and addressed in the same way via attention.

It depends too on what one means by AGI: is this implicitly human-like (not just human-level) AGI? If so, then it seems there are a host of other missing features too. Can we really call something AGI if it's missing animal capabilities such as emotion and empathy (roughly = predicting others' emotions, based on having learnt how we would feel in similar circumstances)? You can have some type of intelligence without emotion, but that intelligence won't extend to fully understanding humans and animals, and therefore being able to interact with them in a way we'd consider intelligent and natural.

Really we're still a long way from this type of human-like intelligence. What we've got via pre-trained LLMs is more like IBM Watson on steroids: an expert system that would do well on Jeopardy and increasingly well on IQ or SAT tests, and can fool people into thinking it's smarter and more human-like than it really is, just as much simpler systems like Eliza could. The Turing test of "can it fool a human" (in a limited Q&A setting) really doesn't indicate any deeper capability than exactly that ability. It's no indication of intelligence.
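A minimal sketch of the kind of "planning ahead" in point 1: a tree-of-thought style search where candidate next steps are proposed, scored, and only the best branches are expanded. propose and score are hypothetical stand-ins for model/verifier calls, replaced with toy functions so the sketch runs on its own:

    import heapq

    def propose(thought: str, k: int = 3) -> list[str]:
        # A real system would ask the model for k candidate next steps.
        return [f"{thought} -> step{i}" for i in range(k)]

    def score(thought: str) -> float:
        # A real system would ask the model (or a verifier) to rate the partial plan.
        return -len(thought)  # toy heuristic: prefer shorter plans

    def plan(root: str, depth: int = 3, beam: int = 2) -> str:
        frontier = [root]
        for _ in range(depth):
            candidates = [c for t in frontier for c in propose(t)]
            # Keep only the `beam` highest-scoring partial plans.
            frontier = heapq.nlargest(beam, candidates, key=score)
        return max(frontier, key=score)

    print(plan("goal: book a trip"))

The look-ahead lives in the outer search loop rather than in any single forward pass, which is exactly what a plain autoregressive decoder doesn't give you.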