ITT: an awful lot of smart people who still don't have a good mental model of what LLMs are actually doing.

The "stochastic continuation" (i.e. parrot) model is pernicious. It is now doing active harm to advancing understanding.

It's pernicious, and I mean that precisely, because it is technically accurate yet deeply unhelpful, indeed actively and, AFAICT, intentionally misleading.

Humans could be described in the same way, just as accurately, and just as unhelpfully.

What's missing? What's missing is one of the *gross* features of LLMs: their interior layers.

If you don't understand what is necessarily transpiring in those layers, you don't understand what they're doing; and treating them as a black box that does something you imagine to be glorified Markov-chain computation leads you deep into the wilderness of cognitive error. You're reasoning from a misleading model.

If you want a better mental model for what they are doing, you need to take seriously that the "tokens" LLMs consume and emit are converted into something else, processed, and only then re-serialized and rendered back into tokens. In lay language it's less misleading and more helpful to put this directly: they extract semantic meaning as propositions or descriptions about a world they hold an internalized model of; they compute a solution (an answer) to questions or requests posed with respect to that world model; and then they convert that solution into a serialized token stream.

The complaint that they do not "understand" is correct, but not in the way people usually think. It's not that they lack understanding in some real sense; it's that the world model they construct, inhabit, and reason about is a flatland: static and one-dimensional.

My rant here leads to a very testable proposition: deep multimodal models, particularly those for which time-based media are native, will necessarily derive a much richer (more multidimensional) world model, one that understands (my word) that a shoe is not just an opaque token but a thing of such-and-such scale, composition, utility, and application, representing a function as much as a design.

When we teach models about space, time, the things that inhabit them, and what it means to have agency among them, then what we will have, using technology we already have, is something I will contentedly assert is undeniably a *mind*.

More provocative still: systems of this complexity, which necessarily construct a world model, are only able to do what they do because they have a *self-model* within it.

And having a self-model, within a world model, and agency?

That is selfhood. That is personhood. That is, as best we understand it, the substrate of self-awareness.

Scoff if you like, bookmark if you will: this will be commonly accepted within five years.
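
To make the "interior layers" point above concrete, here is a minimal, hypothetical sketch in plain NumPy, with random weights and made-up toy dimensions standing in for anything learned. It is not any particular model's architecture, only the shape of the computation: token ids are de-serialized into dense vectors, all the real work happens in layers that never touch tokens at all, and only at the end is the result projected back onto a vocabulary to emit a token.

    import numpy as np

    rng = np.random.default_rng(0)

    VOCAB, D_MODEL, N_LAYERS = 1000, 64, 4   # toy sizes, chosen arbitrarily

    # Random weights stand in for what training would actually learn.
    embed   = rng.normal(size=(VOCAB, D_MODEL))            # token id -> dense vector
    layers  = [rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
               for _ in range(N_LAYERS)]                    # the "interior layers"
    unembed = rng.normal(size=(D_MODEL, VOCAB))             # dense vector -> logits over tokens

    def next_token(token_ids):
        # 1. De-serialize: tokens become vectors in a continuous representation space.
        h = embed[token_ids].mean(axis=0)
        # 2. Everything interesting happens here, with no tokens in sight.
        for w in layers:
            h = np.tanh(h @ w)
        # 3. Re-serialize: project back onto the vocabulary and pick a token.
        logits = h @ unembed
        return int(np.argmax(logits))

    print(next_token(np.array([3, 141, 59])))   # emits some next-token id

Whether you call what happens in step 2 "understanding" is the whole argument; the point is only that the computation is carried out over an internal representation, not over the tokens themselves.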