From the article:<p>> Each place name is represented by a UTF-8 (en.wikipedia.org/wiki/UTF-8) text line record (variable length) with more than 15 tab-separated data columns. <i>Note: The UTF-8 encoding assures that a tab (0x9) or line feed (0xA) value won’t occur as part of a multi-byte sequence; this is essential for several implementations.</i><p>What? My guess is that they mean using a longer (2-byte?) encoding for those code points, but the very same Wikipedia page that they link says:<p>> a sequence that decodes to a value that should use a shorter sequence (an "overlong form") [is invalid]<p>...<p>> Implementations of the decoding algorithm MUST protect against decoding invalid sequences<p>Are they advising the use of an invalid and potentially broken UTF-8 encoding?
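<p>For reference, both the quoted note and the overlong-form rule can be checked directly. Below is a minimal sketch, assuming Python 3 and its built-in strict UTF-8 codec; the names bytes_in_multibyte_sequences, decodes_strictly and OVERLONG_TAB are made up for illustration and do not come from the article. It collects every byte value that appears in the encoding of any non-ASCII code point, then tries to decode the 2-byte overlong form of a tab.

```python
def bytes_in_multibyte_sequences():
    """Collect every byte value that occurs inside a multi-byte UTF-8 sequence."""
    seen = set()
    for cp in range(0x80, 0x110000):      # every code point that needs 2+ bytes
        if 0xD800 <= cp <= 0xDFFF:
            continue                      # surrogates cannot be encoded in UTF-8
        seen.update(chr(cp).encode("utf-8"))
    return seen


def decodes_strictly(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical name: the 2-byte overlong form of U+0009 (tab), i.e. 110_00000 10_001001.
OVERLONG_TAB = b"\xc0\x89"

multibyte = bytes_in_multibyte_sequences()
print("smallest byte in any multi-byte sequence:", hex(min(multibyte)))
print("0x09 in a multi-byte sequence:", 0x09 in multibyte)
print("0x0A in a multi-byte sequence:", 0x0A in multibyte)
print("overlong tab accepted:", decodes_strictly(OVERLONG_TAB))
```

<p>Under these assumptions, every lead and continuation byte of a multi-byte sequence has its high bit set, so a raw 0x09 or 0x0A byte in valid UTF-8 can only be an actual tab or line feed, which is still encoded as a single byte. That reading of the note would not require an overlong form at all.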