I find it interesting how far linguistic and experience-based approaches have fallen out of fashion. Humans don't read character by character; even if we <i>can</i>, it's not our standard operating mode. We recognize word stems and understand how endings modify them. Tokenization doesn't replicate this experience (seriously, look at the tokens that appear in LLM vocabularies), nor does character or byte encoding. Humans have multiple ways to parse language: you can grok a full sentence, read a phrase, read word by word, or sound out a new word character by character. Very few papers explicitly claim that a method is good because it replicates the way a human would perform a task or perceive the world.<p>I suspect that as reliance on LLMs increases, we'll want to align the models more closely with our experience. I further suspect this will make the errors models make more comprehensible.
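<p>To make the tokenization point concrete, here's a minimal sketch, assuming the Hugging Face transformers package and the public GPT-2 tokenizer (the example words are just illustrative): the BPE pieces it prints generally don't line up with the stem-plus-ending boundaries a human reader perceives.<p>
    # Minimal sketch: compare BPE token boundaries to the morpheme
    # boundaries a human reader would use. Assumes the Hugging Face
    # "transformers" package and the public GPT-2 tokenizer; any BPE
    # vocabulary shows the same mismatch.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    for word in ["unhappiness", "rethinking", "tokenization"]:
        pieces = tokenizer.tokenize(word)  # raw BPE pieces from the vocabulary
        print(f"{word:>14} -> {pieces}")   # splits rarely match stem + ending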