I have reservations about several aspects of this article, but what sits least well are the conclusions regarding AI compression loss. In short, I disagree.

The "Hill" example in the text is easily understandable by the author's own presented distinction between "text-based, word-placement, associative semantics" and "semantics by definition." We obviously use semantics in the latter sense – hence "by definition" – while the AI doesn't.

Semantic word relationships identified by GPT-3 are based on the frequency of words prior to and following a position in a sentence or text, as presented to the AI in the training dataset. Another, easier example is when information is known to be untrue. If I include 1,000 written examples of the words "Jack and Jill ran down the volcano," then my AI will incorrectly answer prompts to finish the nursery rhyme. How many corrections from users, or how many analyzed writings with the correct "ran down the HILL" text, would it take before my 1,000 false volcano statements cease to be the most probable – and therefore accepted – answer?

So, like the article example, if asked "Where was John Smith born?" the AI sees that it has to answer definitively because the prompt has the shape of a question, so it's going to make a statement it concludes to be the most probable acceptable answer to the prompt. It doesn't understand the prompt as a question; the words are not defined as ideas in themselves, nor does the sum of the words present an idea. Word definitions are not really part of the answer process. The AI checks its dataset and knows all the related word examples it has previously identified through its handy token system – and that self-controlled tokenization of memory for storage and retrieval further removes this from any human brain-like process we can empathize with.

Anyway, the AI knows that most statements following words arranged like the question in the prompt include words/textual identifiers not used similarly in other places in text – names! Imagine how it figured out names without conceptually understanding definitions, using frequency of word composition and grammar structure alone. Even though it "knows" the definition of the word "name," that definition is just more words – it has no meaning without context. Prompts provide context, not definitions.

There are three names in the example prompt: the first name, the last name, and the name/word required as an answer – the location. The AI doesn't see any of these words as a person's name or the name of a place; it sees our question as "words that require my reply to be a definitive statement, primary 'name word,' secondary 'name word,' plus 'born'" – and it knows a different type of name word (a place) is the most plausible next word in the sequence because it has seen limitless examples of birth statements. Searching its dataset for John Smith, it identifies "Hill" as the name word most often associated with the words "where," "was," "John," "Smith," "born" used consecutively. The incorrect city, Hill, makes sense given that his academic career obviously generated more digital information than his birth announcement or obituary in the hometown paper.

Regarding the wrong date – the AI was never actually answering a question and never made any statement with intent to be truthful. The incorrect birthdate is simply the most probable date given the incorrect "John Smith born in Hill" statement.
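To make that frequency point concrete, here's a deliberately crude sketch in Python. It's a toy "most frequent continuation wins" counter – not a claim about GPT-3's actual internals, which learn weights rather than storing raw counts – but it shows how 1,000 false "volcano" sentences outvote a handful of true "hill" ones:

```python
from collections import Counter, defaultdict

CONTEXT = 3  # toy model: condition on the 3 preceding words

def train(sentences):
    """Count which word follows each 3-word context in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(CONTEXT, len(words)):
            counts[tuple(words[i - CONTEXT:i])][words[i]] += 1
    return counts

def predict_next(counts, prompt):
    """Return the statistically most frequent continuation – true or not."""
    context = tuple(prompt.lower().split()[-CONTEXT:])
    followers = counts.get(context)
    return followers.most_common(1)[0][0] if followers else None

# 1,000 poisoned examples vs. 10 correct ones: frequency wins, truth loses.
corpus = ["Jack and Jill ran down the volcano"] * 1000 \
       + ["Jack and Jill ran down the hill"] * 10

model = train(corpus)
print(predict_next(model, "Jack and Jill ran down the"))  # -> "volcano"
```

Note that this toy model never "lost" the hill sentences – they're right there in its counts – it just ranks them below the more frequent wrong answer, which is the same point I'm making about the birthdate.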
It couldn't present the correct date following the word "Hill" because no such examples existed with a higher probability of being acceptable than the incorrect date it gave. In fact, given the incorrect semantic links made early in the answer, an incorrect date was the most probable continuation.

None of that is compression loss. It's just an AI being an AI, doing exactly what it does. I think it's obvious, based solely on what the authors themselves presented, that it is in fact recalling everything – it only arrives at a failed answer because of differing expectations of what the answer should be. The AI delivered the most probable reply to the prompt, given the contextual data available to it – the same way it delivers the answers we expect and that are factually correct. It didn't draw incorrect conclusions because it chunked up everything it learned and consequently "lost" some of its "memories" in the process.

Programming an AI with facts, or only factual information, doesn't solve the problem at all. Operating on only factual data would help it regurgitate a "born in this town on this day" type answer more correctly, but only because the token words identified as correlated in a factual text do in fact have an actual correlation. That only increases the probability of the AI arriving at a "correct" reply while still using the same flawed "logic" that allowed these errors to occur. An AI that speaks only true statements will still have no actual concept of truth.

"Prediction leads to compression, compression leads to generalization, generalization leads to computer intelligence." – quote from the article.

I know that when people do this – memory chunking – we do lose stuff. Why do we assume the same is true for an AI? What exactly is it compressing? Our memories are filled with lots and lots of data beyond the reason a memory is a memory. The background clutter, the noises of a crowd, cars driving by, or what lunch was that day are not necessary to recall the memory of your first kiss, for example – unless you were in a crowded cafeteria at a racetrack, in which case that might be all you recall. A kiss, a loud crowd, race cars – from an entire day of activity, those highlights will be all that remain in time. We need to do that, and even with that feature we still forget important things.

What background noise is an AI having to "chunk" away? Are its parameters set too broadly? Narrow them. If it sees too much, we tell it what not to see. If its capacity to effectively store and use information as it is encountered is the problem, then we have failed to create an effective AI.

If an AI "reads" a 500-page paper and tokenizes the data, what makes you so sure it cannot recall all 500 pages exactly from those tokens alone? Tokenization by itself is a reversible mapping, not a lossy one (see the quick sketch in the P.S. below).

AI compression loss, in a token-based system, would have to come from further compression of the tokens themselves or from a failure of the token system.

Just my quick 1,000+ words.

Yeah... sorry for the book.

Tl;dr -

I find the idea of AI being wrong due to "compression loss" to be a silly concept.

We should avoid humanizing AI and AI learning – all similarity lies on the surface.

Thanks for reading my rant – have a great day! - Jakksen
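P.S. – since I referenced it above, here's a minimal sketch of what I mean by tokenization being reversible. This is a toy word-to-ID tokenizer, far simpler than the byte-pair encodings real models use, and it assumes plain single-space-separated text, but it illustrates the round trip:

```python
# Toy tokenizer: maps each unique word to an integer ID and back again.
# Real LLM tokenizers (byte-pair encoding, etc.) are fancier but share the
# property shown here: encoding is reversible, so tokenizing alone loses nothing.

def build_vocab(text):
    words = text.split()
    vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
    return vocab, {i: w for w, i in vocab.items()}

def encode(text, vocab):
    return [vocab[w] for w in text.split()]

def decode(token_ids, inverse_vocab):
    return " ".join(inverse_vocab[t] for t in token_ids)

paper = "prediction leads to compression compression leads to generalization"
vocab, inverse = build_vocab(paper)
token_ids = encode(paper, vocab)

print(token_ids)                            # [0, 1, 2, 3, 3, 1, 2, 4]
assert decode(token_ids, inverse) == paper  # perfect round trip – no loss here
```

Any loss would have to come from further compression of the tokens themselves, downstream of this step – which is exactly my point above.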