Oh my point was NOT that LLMs can't simulate a Turing machine reliably. I was just replying to the tweet above. The main point of this (speculative) text is that GPTs <i>can</i> "evolve a general learner" inside them, but are unable to grant it enough computing power, since it has to dispute compute and space with billions of task-specific circuits. Also, even if it could, the "computer" on which learned procedures run (inside transformers) is too limited in many senses, but not being Turing complete isn't the main issue, just one of them.<p>Basically I'm saying that (in <i>my</i> view) GPTs are kinda growing an AGI procedure inside them, but that AGI is restricted by rigid computational constraints imposed by the GPT architecture. I think something slightly more flexible might be capable of filtering out the noise more efficiently and replicating what GPTs do at an absurdly small fraction of the cost.<p>So, to be very clear, the point has nothing to do with Turing completeness and more to do with the ultra-limited computation and expressivity granted to internally learned functions. It is a small distinction, but a non-Turing-complete model can still be extremely expressive.
Somewhat unrelated to what the article is about, but:<p>LLMs can't do anything reliably, AFAIK.<p>They are too probabilistic to be fully "reliable" - at least if we take the definition of "reliable" from well-written programs, or even proof-based programs.<p>If we instead take the definition of "reliable" from, idk, how reliable an underpaid and poorly treated employee is, then... it might be a different thing.
How about this as a nice reductive example of the things that current LLMs struggle/fail at:<p><a href="https://x.com/VictorTaelin/status/1776677635491344744" rel="nofollow">https://x.com/VictorTaelin/status/1776677635491344744</a><p>This guy is offering a $10K prize if anyone can prompt an LLM to reliably (by itself - no tool use) repeatedly apply some simple letter-pair substitution rules to strings of up to 12 characters.<p>The rules to be applied (to strings composed of the letters A, B, X & Y) are:<p>1) AX -> delete<p>2) AY -> YA<p>3) BX -> BY<p>4) BY -> delete<p>In his Tweet he uses A# #A B# #B rather than A X B Y, but has clarified it makes no difference.<p>Try it - it's remarkably hard. You should be able to get the LLM to do a few short ones, but try getting it to be reliable or do longer ones.<p>So, Turing? Not going to happen!
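For reference, the procedure the challenge asks the LLM to carry out in its head is just a few lines of deterministic code. A minimal sketch in the A/X/B/Y notation above (the rule-priority application order is my assumption; the tweet's A# #A B# #B tokens map onto these single letters):

```python
def reduce(s: str) -> str:
    """Repeatedly apply the four substitution rules until none matches."""
    rules = [("AX", ""),    # 1) AX -> delete
             ("AY", "YA"),  # 2) AY -> YA
             ("BX", "BY"),  # 3) BX -> BY
             ("BY", "")]    # 4) BY -> delete
    while True:
        for pat, rep in rules:
            if pat in s:
                s = s.replace(pat, rep, 1)  # rewrite leftmost occurrence
                break
        else:
            return s  # no rule matched: normal form reached

print(reduce("ABXY"))  # BX->BY gives ABYY, BY deletes to AY, AY->YA: prints YA
```

Trivial to execute mechanically, yet the chain of intermediate strings is exactly what LLMs lose track of on longer inputs.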
Neither will the vast majority of humans for any reasonable length of time, unless you apply a significant amount of force and incentives to keep them on task.<p>If you want something that will "mindlessly" execute repetitive steps, anything trained to act like humans do would seem an exceedingly bad choice.
DeepMind did a study of which language classes different neural network architectures can learn with generalization: <a href="https://arxiv.org/abs/2207.02098" rel="nofollow">https://arxiv.org/abs/2207.02098</a> ("Neural Networks and the Chomsky Hierarchy")<p>LSTMs could learn slightly deeper into the Chomsky hierarchy than transformers; neural Turing machines could learn them all, up to and including recursively enumerable languages (ones requiring a context-free grammar or even a full Turing machine to produce).<p>Transformers did outperform LSTMs on some tasks in an alternate hierarchy, if I remember correctly.<p>It's not conclusive, since they didn't train indefinitely - there are formal proofs that many architectures can express things they can't effectively be trained to do - but they threw a lot of compute at it.
>To get a true AGI, we would need some kind of architecture that is able to not just learn functions by example, but also grant these functions all the computing power they need to run and thrive.<p>Computers can’t simulate Turing machines reliably either, and for the same reason.<p>This is probably true for humans also.
LLMs sample each next token from a probability distribution, so there's an inherent random process at work. As long as your LLM's "next Turing machine step" output token has anything less than 100.000% odds of being the correct one, you'll get mistakes and unreliability.<p>You could change your sampling to top-token-only. But (and maybe this is hand-wavy) I still suspect the incorrect token will sometimes be predicted on top.
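To put rough numbers on how per-token randomness compounds over a long execution trace, here is a toy sketch (the 0.999 per-token accuracy and the state names are invented figures, not measurements of any model):

```python
# Hypothetical per-step accuracy: even a 0.1% error rate per emitted
# "machine step" compounds into frequent failures on long traces.
p_correct = 0.999
for steps in (10, 100, 1000):
    print(steps, round(p_correct ** steps, 3))  # P(entire trace is correct)

# Greedy "top-token-only" decoding over a toy next-step distribution:
# it removes the randomness, but still fails whenever the wrong token
# happens to carry the highest probability.
dist = {"q5": 0.97, "q6": 0.02, "halt": 0.01}  # invented state tokens
greedy = max(dist, key=dist.get)
print(greedy)  # prints q5
```

With these made-up numbers, a 1000-step simulation succeeds end-to-end only about 37% of the time, which is the sense in which "reliable" breaks down.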
After making several attempts at figuring it out, I'm giving up. Takes too much thought. Guess I'm not an "average" person, and even my MS in CS is useless :shrugs: