> the 4 bytes sequences (Supplementary Planes) are very rarely used (most used languages and useful symbols are already in the Basic Multilingual Plane) and also more complicated so I did not vectorize that case for the scope of this blog post. This is left as an exercise for the reader ;-)

And you still call them UTF-8 and UTF-16?
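
For anyone wondering what the skipped case looks like, here is a minimal Python sketch of how a Supplementary Plane code point is encoded. The helper names `utf8_length` and `utf16_units` are made up for illustration and have nothing to do with the blog post's vectorized code. U+1F600 is just an arbitrary example of a code point above U+FFFF.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses for a Unicode scalar value."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4  # Supplementary Planes: U+10000..U+10FFFF


def utf16_units(cp: int) -> list[int]:
    """16-bit code units UTF-16 uses for a Unicode scalar value."""
    if cp < 0x10000:
        return [cp]  # Basic Multilingual Plane: one code unit
    v = cp - 0x10000  # 20 bits remain, split 10/10 across the pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


cp = 0x1F600  # example code point outside the BMP
print(utf8_length(cp))                    # 4
print([hex(u) for u in utf16_units(cp)])  # ['0xd83d', '0xde00']
```

A converter that only handles the 1 to 3 byte forms never produces or consumes a surrogate pair, so it covers the BMP only, not the full range of either encoding.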