Here's my theory:

Consider the stream of token vectors used to train and interact with a typical LLM.

Now imagine that other aspects of being human (sensory input, emotional input, physical body sensation, gut feelings, etc.) could be added as metadata to that token stream, along with some kind of attention function that amplified or diminished the importance of those signals at any given moment -- all still represented as a stream of tokens.

If an LLM could be trained on input enriched with that kind of data, the output would quite likely feel much more human than the responses we get from current LLMs.

Humans are moody. We get headaches, we feel drawn to or repulsed by others, we brood and ruminate, we find ourselves wanting to impress certain people, and some topics make us feel alive while others bore us.

Human intelligence is always colored by the human experience of obtaining it. We obviously don't obtain it by being trained on terabytes of data all at once, disconnected from bodily experience.

Seemingly we could simulate a "body" and provide it as real-time token metadata for an LLM to incorporate, and we might get more moodiness, nostalgia, ambition, etc. (a rough sketch of what that could look like is at the end of this comment).

Asking for a theory of mind is in fact committing the Cartesian error of making a mind/body distinction. What's missing with LLMs is a theory of mindbody... the similarity to spacetime is not accidental; humans often fail to unify concepts at first.

LLMs are simply time series predictors that handle massive numbers of parameters, generating sequences of tokens that (when mapped back into words) we judge as humanlike or intelligence-like. But those are patterns of logic that come from word order, which in human languages is closely tied to semantics.

It's silly to think that we humans are not abstractly representable as a probabilistic time series prediction of information. What isn't?
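To make the "metadata on the token stream" idea a bit more concrete, here's a minimal sketch of one way it could look: a simulated body state projected into the same space as the token embeddings, with a learned gate playing the role of the attention function that amplifies or diminishes it at each step. The channel names, dimensions, and gating scheme are purely illustrative assumptions, not a proposal for the one right design.

```python
# Minimal sketch (PyTorch): fuse a simulated "body state" into a token stream.
# Channel names, dimensions, and the gating scheme are hypothetical.
import torch
import torch.nn as nn

class EmbodiedEmbedding(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_body_channels=4):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        # Project body channels (e.g. valence, arousal, fatigue, gut feeling)
        # into the same space as the token embeddings.
        self.body_proj = nn.Linear(n_body_channels, d_model)
        # Learned gate: decides, per token, how much the body state matters now.
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, token_ids, body_state):
        # token_ids:  (batch, seq_len)                   -- the usual token stream
        # body_state: (batch, seq_len, n_body_channels)  -- per-step "metadata"
        tok = self.tok_embed(token_ids)    # (batch, seq, d_model)
        body = self.body_proj(body_state)  # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(torch.cat([tok, body], dim=-1)))  # (batch, seq, 1)
        # Amplify or diminish the body signal per time step, then mix it in.
        return tok + g * body

# Usage: the fused embeddings would feed a standard transformer decoder.
emb = EmbodiedEmbedding()
ids = torch.randint(0, 32000, (1, 16))
body = torch.rand(1, 16, 4)   # simulated interoceptive readings per token
x = emb(ids, body)            # (1, 16, 512), ready for attention layers
```

In practice you'd feed the fused embeddings into an otherwise ordinary transformer and train end to end, so the model learns for itself when the "gut feeling" channel should matter and when it should be ignored.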