Let me cheat a bit and say Unicode comes in three flavors: UTF-8, UCS-2 aka UTF-16, and UTF-32. UTF-8 is byte-oriented, UTF-16 is double-byte oriented, and UTF-32 nobody uses because you waste most of each word almost all of the time.

You can't reverse the *bytes* in UTF-8 or UTF-16, because you'll scramble the encoding. But you could parse the string codepoint-at-a-time, handling the specifics of UTF-8 (or UTF-16 with its surrogate pairs), and reverse those codepoints. This sounds equivalent to reversing UTF-32, and I believe it's what the original poster was imagining.

Except you can't do that, because Unicode has combining characters. Now, I'm American and too stupid to type anything other than ASCII, but I know about n + ~ = ñ. If you have the precomposed version of ñ, you can reverse it (it's one codepoint). If you don't have it, and you have n + combining ~, you can't reverse codepoint-by-codepoint, or in the word "año" you'd put the ~ on the "o" (see the sketch at the end). (Even crazier things happen when you get to the ligatures in Arabic; IIRC one of those is about 20 codepoints.)

So we can't just reverse codepoints, not even in ancient versions of Unicode. Other posters have talked about the even more exotic stuff like emoji + skin tone. It's necessary to be very careful.

Now, the old fart in me says that ASCII never had this problem. But the old fart in me knows about CRLF in text protocols, and that's never LFCR; and that if you want to make a ñ in ASCII you must send n ^H ~. I guess you can reverse that, but if you want to do more exotic things it becomes less obvious.

(IIRC UCS-2 is the deadname; we call it UTF-16 now to remind ourselves to always handle surrogate pairs correctly, which we don't.)

TLDR: Strings are hard.
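To make the "año" failure concrete, here's a minimal sketch in Python. The decomposed string and the NFC normalization step follow the argument above; the grapheme-cluster reversal at the end assumes the third-party regex module (pip install regex), whose \X pattern matches extended grapheme clusters.

```python
import unicodedata
import regex  # assumption: third-party module, pip install regex

# "año" with a decomposed ñ: 'n' (U+006E) followed by COMBINING TILDE (U+0303)
s = "an\u0303o"

# Naive codepoint reversal moves the combining tilde onto the wrong base:
print(s[::-1])               # prints 'õna': the tilde now sits on the o

# Normalizing to NFC first fixes this case, because a precomposed
# ñ (U+00F1) exists:
nfc = unicodedata.normalize("NFC", s)
print(nfc[::-1])             # prints 'oña', which is correct

# NFC only helps when a precomposed codepoint exists. The general fix is
# to reverse extended grapheme clusters (UAX #29); the stdlib doesn't
# segment graphemes, but the regex module's \X pattern does:
def reverse_graphemes(text: str) -> str:
    return "".join(reversed(regex.findall(r"\X", text)))

print(reverse_graphemes(s))  # prints 'oña' without normalizing first
```

The grapheme-cluster approach also covers the emoji + skin tone cases other posters mentioned, where no precomposed codepoint exists at all.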