The problem that arises when you misinterpret an ambiguous statement is that you end up on the wrong semantic path, and start asking and answering the wrong questions.<p>That's what's happening all around LLMs; and it's even happening right here in this post.<p>The very word "AI" is ambiguous. Does it denote a category of pursuit (AI research), or the end goal realized (<i>an</i> AI)? This is an incredibly important distinction, and it is entirely glossed over every time someone uses "AI" as a label for an LLM. Is the LLM simply in the category of AI research, or is it actually <i>an Artificial Intelligence</i>? It's the category. That's obvious, isn't it? It really should be; because otherwise we are treading down the wrong semantic path, and talking about an <i>LLM personified</i> instead of an <i>LLM in reality</i>.<p>This problem goes even deeper. The very name "Large Language Model" is misleading in the same ambiguous way. Does "Language" describe the content being modeled, or the model itself? In this case, neither: which is where this conversation gets really interesting.<p>The entire <i>purpose</i> of an LLM is to process language without completely falling apart at ambiguity. An LLM accomplishes this by doing something else entirely: it processes <i>text</i> instead.<p>To understand this, we need to understand the key limitation of traditional language parsing: it must always be literal. Everything written must be <i>unambiguously defined</i>, or the parser will fail to read it. This is why parsers are limited to the category of "context-free grammar".<p>Natural language is in the category of "context-sensitive grammar". It can contain ambiguity, which may be resolved with context.<p>An LLM doesn't do that. In fact, an LLM doesn't <i>define</i> anything at all! LLMs didn't overcome the limitation of parsing: they flipped it around. An LLM can <i>never</i> be literal. Instead, it must always be <i>literary</i>. Let me explain what I mean by that:<p>To construct an LLM, we start with a training corpus: lots of text. That text goes through a single traditional parsing step: tokenization. This isn't strictly necessary, but it's more efficient, and the rest of the process is anything but. Unlike a traditional parser's tokens, an LLM's tokens are intentionally misaligned with grammar: words are split into sub-word pieces, like "run" + "ning".<p>Now that we have tokens to work with, machine learning begins. A Neural Net is trained on them: the tokens are fed to the NN in order, and it learns to predict the next one. The result is a model.<p>We call that model a "Large Language Model", because we <i>hope</i> it contains the patterns language is made of. This is a mistake: the model is <i>so much more interesting</i> than that!<p>An LLM contains patterns that we can recognize as language grammar <i>and</i> patterns we don't understand at all! It didn't model language: it went one step higher on the ladder of abstraction, and modeled <i>text</i>.<p>There are many ways to write an idea into language, but we can only use one at a time. That decision is part of the data we feed into the LLM's training corpus. We can't write all of our ideas at once: we must write them one at a time, <i>in order</i>. All of that is data in the training corpus.<p>Parsers deal with grammar. A language's grammar is <i>everything that could be written</i> in that language. An LLM doesn't have a clue what <i>could have</i> been written: it only sees what <i>was</i> written.
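<p>To make the tokenization and training steps concrete, here is a toy Python sketch. It is emphatically <i>not</i> how a transformer works (a real LLM is a neural net, not a lookup table; I'm substituting a bigram counter, and the corpus and token scheme are invented for illustration), but it shows what "it only sees what was written" means in practice:<p><pre><code>from collections import Counter, defaultdict

# Toy "corpus": in a real pipeline, this is an enormous pile of text.
corpus = "the cat sat on the mat . the cat ran ."

# Stand-in for tokenization: plain whitespace splitting. Real LLMs use
# sub-word schemes (e.g. BPE), so "running" may become "run" + "ning".
tokens = corpus.split()

# "Training": walk the tokens in order, recording what actually followed.
follows = defaultdict(Counter)
for cur, nxt in zip(tokens, tokens[1:]):
    follows[cur][nxt] += 1

print(follows["cat"])  # Counter({'sat': 1, 'ran': 1})
print(follows["dog"])  # Counter() -- never written, so no pattern exists
</code></pre><p>A real LLM replaces the counter with a neural net that generalizes across patterns, but the raw material is the same: token sequences, in order, exactly as written.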
That's why the model must be "Large": to the model, a pattern without examples simply doesn't exist.<p>This is where the nature of ambiguity intersects with the nature of LLMs: which example do we want? Can we choose?<p>When we give an LLM a prompt, it gives us a continuation. There were many valid possible continuations: how did it choose just one? It didn't. It isn't working in the realm of the <i>possible</i>: it's working in the realm of the <i>known</i>. The choice was made all the way back before the LLM was even trained: back when a person wrote an idea into language, into text, that would eventually end up in the training corpus. The content of the training corpus is what resolves ambiguity: not the language model itself.<p>LLMs don't model ambiguity, or even language: they model the text they were trained on, ambiguity included. This is a fundamental feature.
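<p>For the curious, here is an equally toy sketch of how one continuation gets picked. The scores below are made up (a real model produces a score for every token in its vocabulary), but the shape of the final step is the same: softmax the scores into probabilities, then sample:<p><pre><code>import math, random

# Hypothetical next-token scores ("logits") after some prompt; these
# numbers are invented for illustration, not real model output.
logits = {"sat": 2.0, "ran": 1.5, "slept": 0.1}

# Softmax: turn scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Sampling picks one continuation in proportion to those probabilities.
choice = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, "->", choice)
</code></pre><p>Note what sampling is choosing between: continuations the training text made available. It doesn't conjure possibilities the corpus never wrote down.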