The article is PHP-centric, which the title does not mention. Most of it is relevant to everybody, but it is jarring to run into stuff about PHP. Isn't that dead yet?<p>Of greater moment is that the article keeps talking about "characters", a term Unicode never pins down to one meaning. Unicode offers you code points, code units, graphemes, grapheme clusters, and ... other things, none of which maps reliably to the grouping of dots you see on your screen (and probably cannot imagine how to type in).<p>"Character" has outlived its sell-by date. Let it be retired and buried with dignity, but with a good thick slab of concrete on top.<p>It also fails to mention "expanded form" and "canonical form" (Unicode calls these normalization forms; NFD and NFC are the usual pair), and other ways that two completely different sequences of bits mean, at some level, the same text. Different forms are convenient for different things; there is a compact, composed representation nice for sending and storing, and a maximally decomposed representation that might be best for editing if you like adding and removing diereses ("umlauts") and accents piecemeal.<p>And it fails to mention WTF-8, a superset of UTF-8 that can also carry unpaired surrogates, so that names which are not valid Unicode still survive a round trip, and whatever valid pieces they contain can still be displayed in case they offer the poor human a clue as to what was intended. Ill-formed names often arise in file systems and databases that don't enforce any particular encoding, but just store whatever bytes the benighted programs users run provide as, e.g., names for files. You <i>wish</i> you could display them in sorted order. There had better be a way to point at such a name, because there is no way to type it. But you have to store it, because that is the only way to tell the OS which file you wanted to rename or delete. Deletion is tempting, but we can't always, can we?
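<p>To make the "which unit do you mean?" point concrete, here is a minimal Python sketch (stdlib only) that counts the same text three ways. The helper name unit_counts is mine, not anything from the article, and it deliberately does not attempt grapheme clusters, since the standard library has no segmenter for those.

```python
def unit_counts(text: str) -> dict[str, int]:
    """Count one string in three different Unicode units (hypothetical helper)."""
    return {
        # Python's len() on str counts code points.
        "code_points": len(text),
        # UTF-8 code units are bytes.
        "utf8_code_units": len(text.encode("utf-8")),
        # UTF-16 code units are 16 bits each; encode without a BOM and halve.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
    }


# "e" followed by COMBINING ACUTE ACCENT: one thing on screen, two code points.
accented = "e\u0301"

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER: usually drawn as one emoji.
technologist = "\U0001F469\u200D\U0001F4BB"

if __name__ == "__main__":
    print(unit_counts(accented))
    print(unit_counts(technologist))
```

<p>Neither string has a length of 1 by any of these measures, which is roughly the problem with asking how many "characters" it has.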
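<p>The composed versus decomposed point can be sketched with Python's unicodedata module. This assumes NFC and NFD are the forms meant above; same_text and strip_marks are illustrative names of my own, and stripping every combining mark is a blunt instrument rather than a recommendation.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def strip_marks(text: str) -> str:
    """Decompose to NFD, drop combining marks, then recompose what is left."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return unicodedata.normalize("NFC", kept)


composed = "\u00fc"     # LATIN SMALL LETTER U WITH DIAERESIS, one code point
decomposed = "u\u0308"  # "u" plus COMBINING DIAERESIS, two code points

if __name__ == "__main__":
    print(composed == decomposed)           # raw comparison sees different sequences
    print(same_text(composed, decomposed))  # normalized comparison sees the same text
    print(strip_marks("Z\u00fcrich"))
```

<p>The decomposed form is what makes the piecemeal editing easy: the diaeresis is its own code point, so removing it is just filtering.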
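<p>For the file name problem, a hedged sketch of the store-one-thing, show-another idea. Note this is not WTF-8: it uses Python's "surrogateescape" error handler (PEP 383), a different trick with the same goal of round-tripping names that are not valid UTF-8. The sample name and the three helper functions are made up for illustration.

```python
def to_display(name: bytes) -> str:
    """Lossy form for showing a human: bad bytes become U+FFFD."""
    return name.decode("utf-8", errors="replace")


def to_roundtrip(name: bytes) -> str:
    """Lossless form for storing: each bad byte becomes a lone surrogate."""
    return name.decode("utf-8", errors="surrogateescape")


def to_bytes(name: str) -> bytes:
    """Recover the exact original bytes to hand back to the OS."""
    return name.encode("utf-8", errors="surrogateescape")


# Hypothetical file name written by a program that used Latin-1:
# 0xE9 is "e acute" there, but an incomplete sequence in UTF-8.
raw = b"caf\xe9.txt"

if __name__ == "__main__":
    stored = to_roundtrip(raw)
    print(to_display(raw))          # valid parts readable, bad byte marked
    print(to_bytes(stored) == raw)  # the stored form gives the bytes back
    # Sorting on the raw bytes is at least deterministic, if not pretty.
    print(sorted([raw, b"cafe.txt", b"bar.txt"]))
```

<p>The display string is only a clue for the human; rename and delete have to go through the round-trippable form, since the replacement character throws away which byte was actually there.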