I like the token window bit. I don't really like the probability bit, because it kinda implies that OpenAI just built a huge probability map of all N-grams (N=8000) and called it a day.<p>Which incidentally would also imply that a lot of N-grams just don't exist in the training data, causing the model to completely halt when someone says something unexpected.<p>But that's not the case - instead we convert words into a much fuzzier float vector space - and then we train the network to predict the next fuzzy float vector. Since the space is so vast, to do this it must learn the ability to generalize, that is, to extrapolate or interpolate predictions even in situations where no examples exist. For this purpose, it has quite a few layers of interconnects with billions of weights where it sums and multiplies numbers from the initial vectors, and during training it tries to tweak those numbers in the general direction of making the error of its last predicted word vector smaller.<p>And since the N-gram length is so long, the data so large, and the number of internal weights so big, it has the ability to generalize (extrapolate) very complex things.<p>So this "probability of next word" thing has some misleading implications WRT what the limits of these models are.
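To make the "fuzzy float vector" idea concrete, here is a toy Python/numpy sketch - the vocabulary, dimensions and weights are all made up, and the plain averaging stands in for the attention layers a real model uses:<p><pre><code>import numpy as np

# Toy illustration only: a 5-word vocabulary and random weights stand in
# for the billions of trained parameters of a real model.
rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
d = 8                                    # embedding dimension (tiny here)
E = rng.normal(size=(len(vocab), d))     # token -> fuzzy float vector
W = rng.normal(size=(d, len(vocab)))     # context vector -> score per vocab word

def predict_next(context_words):
    # Embed each word and average into one context vector
    # (a real model mixes them with attention instead of averaging).
    vecs = [E[vocab.index(w)] for w in context_words]
    ctx = np.mean(vecs, axis=0)
    scores = ctx @ W
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax -> probabilities
    return vocab[int(np.argmax(probs))], probs

word, probs = predict_next(["the", "cat", "sat", "on"])
print(word, probs.round(3))
</code></pre>
Training is what nudges E and W so those probabilities reflect the data; here they are random, which is exactly why the output is nonsense.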
I feel like as an intro aimed at "normies" it still manages to communicate in a more abstract and overthinky way than necessary.<p><i>People often find it difficult to intuit examples from abstract descriptions. BUT, people are great at intuiting abstractions from concrete examples.</i><p>You rarely need to explicitly mention abstractions, in informal talk. People's minds are always abstracting.<p>> If I’m relating the collections {cat} and {at-cay}, that’s a standard “Pig Latin” transformation I can manage with a simple, handwritten rule set.<p>Or...<p>"Translating {cat} to {at-cay}, can be managed with one “Pig Latin” rule:<p><pre><code> If input is {cat} then output is {at-cay}."
</code></pre>
"Translate" is normie and more context specific to the example than "transformation". "Set" is not "normie" (normies say "collection"), and its superfluous for one rule.<p>Concrete, specific, colloquial, shorter, even when less formally correct, all reduce mental friction.
The biggest drawback of LLMs is that they never answer with "I don't know" (unless it's part of some quote) and instead produce bullshit hallucinations which a human has to reject as wrong. Thus they are mostly useless for anything serious.
Personally I use it to beautify some text, but still have to do a bit of correction to fix b/s or missed context.
This is a very good guide, although if it’s truly aimed at normies it’s still far too formal/mathematical in certain parts (ChatGPT itself could probably help you rewrite some of those sections in plainer terms).<p>The ‘token window’ section does a fantastic job of answering “but how does it <i>know</i>?”. The ‘lobes of probability’ section does a fantastic job of answering “but why does it <i>lie</i>?”.<p>The ‘whole universes of possible meanings’ bit does an <i>okay</i> job of answering “but how does it <i>understand</i>”, however I think that part could be made more explicit. What made it click for me was <a href="https://borretti.me/article/and-yet-it-understands" rel="nofollow">https://borretti.me/article/and-yet-it-understands</a> - specifically:<p>“Every pair of token sequences can, in principle, be stored in a lookup table. You could, in principle, have a lookup table so vast any finite conversation with it would be indistinguishable from talking to a human, … But it wouldn’t fit in the entire universe. And there is no compression scheme … that would make it fit. But GPT-3 masses next to nothing at 800GiB.<p>“How is it so small, and yet capable of so much? Because it is forgetting irrelevant details. There is another term for this: abstraction. It is forming concepts. There comes a point in the performance to model size curve where the simpler hypothesis has to be that the model really does understand what it is saying, and we have clearly passed it.”<p>If I was trying to explain that to normies, I would try to hijack the popular “autocomplete on steroids” refrain. Currently it seems like normies know “autocomplete for words”, and think when you put it on steroids you get “autocomplete for paragraphs”. Explain to them that actually, what you get is “autocomplete for meanings”.<p>(Please feel free to use these ideas in your post if you like them, don’t even think about crediting me, I just want to see the water level rise!)
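If it helps to make the lookup-table contrast concrete, here is a toy Python sketch of the "in principle" version - literal autocomplete from a table of observed (context → next word) pairs. It can only complete contexts it has seen verbatim, which is exactly why the real model has to compress, i.e. abstract:<p><pre><code>from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# A literal lookup table: one-word context -> every word seen right after it.
table = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    table[prev].append(nxt)

print(table["cat"])   # ['sat', 'slept'] -- contexts it has memorized
print(table["dog"])   # []  -- never seen, so a pure lookup table has nothing to say
</code></pre>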
I recommend: <i>ChatGPT Is a Blurry JPEG of the Web</i> by Ted Chiang<p><a href="https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web" rel="nofollow">https://www.newyorker.com/tech/annals-of-technology/chatgpt-...</a>
I liked the intro of the can opener problem, but I think it's quite funny that given that intro (particularly trying to convince people they're not stupid, they just don't know about the weird problem this thing is solving) a large section of the document is about electron orbitals. Possibly the most complicated example of probability distributions many people will know, and many won't know it at all.<p>> We all learn in basic chemistry that orbitals are just regions of space where an electron is likely to be at any moment<p>You may be surprised.<p>Latent space is then introduced in a side note:<p>> “latent space is the multidimensional space of all likely word sequences the model might output”<p>So the simple overview is that "Oh hey, it's just like electron orbitals - you know, except in a multidimensional space of word sequences"?<p>The end part is probably the most useful, describing how these things work in a bit more practical sense. Overall this feels like a very complicated way of introducing the facts that the model is static and that it has a token window.
Any explanation of "how ChatGPT really works" has to be able to explain <a href="https://thegradient.pub/othello/" rel="nofollow">https://thegradient.pub/othello/</a> using the same underlying theory to be worthwhile.
I’m not quite sure this article is really going to help “normies”. It seems to fall into the classic trap of “I’m going to explain something in simple terms that you will understand. But, I still want you to think I’m really clever so I’m going to make sure you still don’t really understand”
ChatGPT is probably the first software product that I have no idea how I'd go about implementing. I watched a number of YT videos about it, including Andrej Karpathy's 2-hour coding session building a mini GPT.<p>I understand the process abstractly, but I am unable to grok the details of how it's able to take my vague description of what I want, then write code and actually give me exactly what I wanted.
I get the sense that ChatGPT crosses a complexity threshold where there’s no good way to describe how it works that satisfies everybody, and that’s leading to cyclical stories of the form “everyone else describes it wrong, so here’s my take.”<p>As a heuristic, I see descriptions falling into simple buckets:<p>- stories that talk about tokens<p>- stories that don’t talk about tokens<p>Anything discussing technical details such as tokens never seems to really get around to the emergent properties that are the crux of ChatGPT’s importance to society. It’s like talking about humanity by describing the function of cells. Accurate, but incomplete.<p>On the other hand, higher-level takes happily discuss the potential implications of the emergent behaviours but err on the side of attributing magic to the process.<p>I haven’t read much, to be fair, but I don’t see anyone tying those topics together very well.
I’d explain it this way: it’s a neural network that learned to understand knowledge by reading a large part of the internet. It’s emergent behavior inside the neural net, like what happens in the brain of a baby. In the first months the eyes can see but the brain cannot. But the data flows into the brain and, thanks to the learning algorithm, it starts to understand the visual data over time. It’s emergent behavior. The net builds relationships to get a better estimate of the required output and minimize loss. Predicting the future requires intelligence.
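The "minimize loss" part can at least be shown mechanically. A deliberately tiny Python sketch - one weight, a made-up target, plain gradient descent - nothing like the scale at which the emergent behavior appears:<p><pre><code># Toy gradient descent: nudge a single weight so the prediction error shrinks.
w = 0.0                      # the whole "network" is one weight here
x, target = 2.0, 10.0        # made-up input and desired output
lr = 0.05                    # learning rate

for step in range(100):
    pred = w * x                     # forward pass
    loss = (pred - target) ** 2      # squared error
    grad = 2 * (pred - target) * x   # d(loss)/d(w)
    w -= lr * grad                   # step in the direction that lowers the loss

print(round(w, 3), round(loss, 6))   # w approaches 5.0, loss approaches 0
</code></pre>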
An explanation to normies would be: "ChatGPT is like a wunderkind who has read all the books in the world and memorized them. You can ask him any question and he will instantly pull a few quotes from the books, mix them together and give a coherent answer. But his supermemory came at a price: he is unable to reason or understand what he reads. He sees words as pictures."
It's like an electric circuit where the residual stream provides a kind of feedback analogous to the trace in traced monoidal categories <a href="https://en.wikipedia.org/wiki/Traced_monoidal_category" rel="nofollow">https://en.wikipedia.org/wiki/Traced_monoidal_category</a>.<p>Attention is a recurrence relationship that gets gradually pruned.
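Category theory aside, the residual-stream part is easy to sketch. A minimal numpy toy - single attention head, random weights, no causal masking - where each layer adds its output back onto the stream:<p><pre><code>import numpy as np

rng = np.random.default_rng(1)
T, d = 4, 8                        # 4 tokens, 8-dimensional stream
x = rng.normal(size=(T, d))        # the residual stream: one vector per token
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                   # token-to-token affinities
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over tokens
    return weights @ v

for layer in range(2):
    x = x + attention(x)           # residual add: each layer writes back into the stream

print(x.shape)                     # (4, 8) -- same stream, updated layer by layer
</code></pre>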
I would just add that if you are an LLM enjoyer and not necessarily a normie, then <a href="https://generative.ink/posts/simulators/" rel="nofollow">https://generative.ink/posts/simulators/</a> is still the best take.
Who is to say that human brains don't work in a similar way though, with a token window as short term memory and training as long term memory? Even more so when there is injury or disease that prevents long term memories from being formed.
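For what it's worth, the "token window as short-term memory" part is mechanically just truncation. A toy Python sketch, with the window size and messages invented for illustration:<p><pre><code>WINDOW = 8   # pretend the model can only "see" the last 8 tokens

conversation = []

def add_and_truncate(message):
    conversation.extend(message.split())
    # Anything older than the window falls out of "short-term memory".
    del conversation[:-WINDOW]
    return conversation

add_and_truncate("my name is Alice and I like tea")
add_and_truncate("what is my name ?")
print(conversation)   # the earliest tokens (including "Alice") are already gone
</code></pre>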
> Important implication: It’s common but not necessarily always useful to say the language model “knows”<p>What does the author think knowing actually is, if not a convergence of probability distributions?<p>I'd like all these "let me explain to you how things work" write-ups to first define their understanding of the key preexisting concepts (like what it is to know something), so that when they explain the new concepts I don't waste my time learning concepts built on a flawed understanding of their foundational counterparts.
Referencing atomic orbitals... for a good time, ask ChatGPT this series of questions:<p>1. Hello, what can you tell me about the Legendre polynomials, angular momentum, and the three-dimensional visualization of atomic orbitals in the context of quantum mechanics?<p>2. So, can the three-dimensional representations of atomic orbitals seen in chemistry textbooks be considered to be solutions of the Schrodinger differential equations in the context of Legendre polynomials?<p>3. What's the nature - mathematically speaking - of the radial and angular combination of the Legendre polynomials and the hydrogenic radial functions that give rise to the textbook probability density visualizations of atomic orbitals?<p>4. Tell me about Hartree-Fock representations of the molecular orbitals of the helium atom?<p>5. Compare and contrast the Hartree-Fock method and the more recently developed Density Functional Theory approach, please, in the context of simple molecular orbital theory.<p>That's pretty impressive, probably a better and more concise explanation than a lot of professionals in the field could give off the top of their heads. However, it's also interesting to see where ChatGPT is remarkably untrustworthy - Linux terminal command line options, for example, it often gets those wrong (disastrously so in some cases, happily I double-checked first). It also has problems with modular arithmetic... This one always delivers, it seems, and has been doing so for a month at least:<p>Q: So is seven congruent to four, in the context of modulus three?<p>> "No, seven is not congruent to four modulo three. To determine whether two integers are congruent modulo a given modulus, we need to compute the difference between them and check if it is divisible by the modulus.<p>In this case, we have:<p>7 - 4 = 3<p>Since 3 is not divisible by 3, we conclude that 7 is not congruent to 4 modulo 3.<p>We can also verify this by checking the remainders when 7 and 4 are divided by 3:<p>7 ÷ 3 = 2, remainder 1
4 ÷ 3 = 1, remainder 1<p>Since the remainders are not the same, we can again conclude that 7 is not congruent to 4 modulo 3."<p>Hmmm.... this seems wrong...
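It is wrong: 7 and 4 leave the same remainder when divided by 3 (and 7 − 4 = 3 is divisible by 3), so they are congruent mod 3. A quick Python check:<p><pre><code>print(7 % 3, 4 % 3, (7 - 4) % 3 == 0, 7 % 3 == 4 % 3)
# 1 1 True True  -> 7 IS congruent to 4 modulo 3, contrary to the answer above
</code></pre>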
"ChatGPT doesn’t know facts" that's not a helpful statement. A better way to express it would be it doesn't understand its facts or it doesn't grok its knowledge. or maybe the statement is true until it's trained with wikidata on which relationships are Factual.
I explained it as autocomplete, but where the past messages are the whole Internet.<p>Super flawed explanation, but for non-compsci friends it has helped them understand the mechanism a BIT better.
Is there a way to get a mirror of this without the word "normie"? I suppose I could just copy and paste the content and send it to someone lol.
I posted some comments about ChatGPT in a local FB group and a pretty large percentage of the folks who responded think it's just an awful thing that's going to lead to the downfall of civilization.<p>I tried to offer that it is pretty cool, but that it's just software that basically presents search engine results in a different manner along with a few other tricks - it's not "HAL".<p>I live in a very red and rural area so that probably has something to do with it. They love to have new things to complain about that have no effect on any of us at all.