TL;DR: Characters and Strings considered harmful.<p>And he's right, they totally are! (Also, 'string' can mean an ordered sequence of similar objects of any kind, not just characters.)<p>But (as these discussions also mention) replacing them with much more clearly defined concepts like byte arrays, code points, glyphs, grapheme clusters and text fields is only the first step...<p>The big question (these days) is what to do with text, specifically the 'code' kind of text (either programming or markup; poor separation between 'plain' text and code keeps causing security issues).<p>To start with, even code needs formatting, specifically some way to signal a new line, or it will end up unreadable.<p>Then, code can't be just arbitrary Unicode text: some limits have to apply, because Unicode can get very 'fancy'!
(Arbitrary Unicode is fine in text fields and comments embedded in code.)<p>So, I'm curious: is there any Unicode normalization specifically designed for code? (If not, why not, and which is the closest one?)<p>I'm thinking of Python 3, which has what seems to be a somewhat arbitrary list of what can and can't be used in a variable name. (And the language's own syntax seemingly only uses ASCII, though this shouldn't be a restriction for programming/markup languages!)<p>Also, I hear that Julia goes much further than that (even offering (La)TeX-like shortcuts for characters that might not be available on some keyboards). What kind of 'normalization' have they adopted?
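<p>For what it's worth, here is a minimal sketch of how the Python 3 side of the question can be poked at from the standard library. As far as I understand the language reference, identifiers follow Unicode's identifier rules and are converted to NFKC while parsing, so two spellings that normalize to the same string name the same variable. The helper name `normalize_identifier` and the sample names are made up for illustration; only `unicodedata.normalize`, `str.isidentifier` and `exec` are real.

```python
import unicodedata


def normalize_identifier(name: str) -> str:
    """Return the NFKC form of a name (hypothetical helper for illustration)."""
    return unicodedata.normalize("NFKC", name)


# U+FB01 LATIN SMALL LIGATURE FI is a compatibility character:
# NFKC expands it to the two ASCII letters "fi".
ligature_name = "\ufb01le"
print(normalize_identifier(ligature_name) == "file")

# The parser applies the same normalization, so assigning to the
# ligature spelling creates a variable that is stored under "file".
namespace = {}
exec("\ufb01le = 1", namespace)
print("file" in namespace)

# str.isidentifier() reports whether a string is a valid identifier
# according to the language definition.
for candidate in ["naïve", "π", "2fast", "a-b"]:
    print(candidate, candidate.isidentifier())
```

So at least for Python the list is less arbitrary than it looks: it is derived from Unicode character properties rather than hand-picked, and NFKC seems to be the closest thing to a "normalization for code" there. I can't say from this alone what Julia does.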