With the benefit of hindsight, if the standard were done over again, either a lot of the unnecessary difficulties mentioned in this article could have been avoided or the standard could have been split in two. It is arguably too ambitious, too inefficient, and too unrealistic to expect a number of these things to be handled as recommended in all contexts. There are many examples - operating system kernels and system programming in general, to name two.

To start with, alternative representations of the same visibly identical character should have been excluded. In the base standard, all supported characters should be precomposed - no combining modifiers. The article points out the difficulty of something as simple (and as security-critical) as determining whether two strings refer to the same visibly identical characters.

A character set that does not make that trivial is not suitable for general use in system programming or for security-critical identifiers (unfortunately). It massively complicates programming in many languages as well, and even UTF-32 is not sufficient to remedy the problem, as the article notes.

The inability to handle and process an arbitrary string of bytes in a universal character set is a serious problem as well. The world is complicated, and the inability to pass through incorrectly encoded data, alternatively encoded data, and general binary data is a major limitation with serious consequences. System code such as device drivers or filesystems typically can't tolerate inefficiencies or limitations like that.

In addition, there should probably be two kinds of uppercase and lowercase conversion: a simple, predictable one for system programming, and a more complex one that deals with languages that do not follow the usual rules.

String collation should be done at two levels as well: a simple code-point-level ordering suitable for things like prefix matches on database indexes, and a more complex variant for applications where linguistically sensitive collation is critical.

In general, trying to solve all the high-end use cases in a large body of software that does not need to deal with them, should not need to deal with them, or cannot deal with them is impractical, and it has exacerbated the common string-processing issues we see today. A lightweight standard - perhaps a subset or profile of Unicode that supported arbitrary binary data as opaque code points - could be helpful in a lot of contexts.
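
To make the equivalence problem concrete, here is a minimal sketch (Python chosen only for brevity - the same issue exists in every language) of two byte-for-byte different strings that render identically:

    import unicodedata

    # U+00E9 (precomposed) vs. "e" + U+0301 (combining acute accent)
    a = "caf\u00e9"
    b = "cafe\u0301"

    print(a == b)    # False - the code point sequences differ
    print(unicodedata.normalize("NFC", a) ==
          unicodedata.normalize("NFC", b))    # True - equal after normalization

Any identifier comparison that skips the normalization step is a potential spoofing hole, and normalization itself requires sizable tables - exactly the kind of dependency a kernel or a filesystem does not want.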
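
On passing through arbitrary bytes: Python's surrogateescape error handler (PEP 383) is one existing workaround, smuggling invalid bytes through the text type so they round-trip losslessly - a sketch:

    raw = b"valid \xff\xfe not valid UTF-8"

    # Invalid bytes are mapped to lone surrogates instead of raising an error...
    s = raw.decode("utf-8", errors="surrogateescape")

    # ...and map back to the original bytes on encode, so nothing is lost.
    assert s.encode("utf-8", errors="surrogateescape") == raw

That this escape hatch has to exist at all, and that the resulting strings are not valid Unicode, rather proves the point.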
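
On the two kinds of case conversion: full Unicode case mapping is neither one-to-one nor language-independent, which is easy to demonstrate (the ascii_upper helper below is just a hypothetical illustration of the "simple, predictable" variant):

    # Full Unicode case mapping: not length-preserving, wrong for some locales
    print("ß".upper())    # "SS" - one code point becomes two
    print("i".upper())    # "I"  - Turkish expects U+0130 "İ" instead

    # A simple, predictable conversion for system code might touch ASCII only:
    def ascii_upper(s: str) -> str:
        return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)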
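
And the two-level collation contrast is just as easy to show: code point order is stable and index-friendly but linguistically wrong, while locale-aware collation gets the language right at the cost of depending on locale data (the de_DE.UTF-8 locale below is assumed to be installed):

    import locale

    words = ["zebra", "Äpfel", "apple"]

    # Level 1: plain code point order - fast, stable, prefix-match friendly
    print(sorted(words))    # "Äpfel" lands after "zebra" by code point

    # Level 2: linguistically sensitive ordering via the locale's rules
    locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    print(sorted(words, key=locale.strxfrm))    # "Äpfel" sorts with the A's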