ASCII is by far the most successful character encoding that computers have used. It was invented in 1963, back in the era of punch cards and core memory. Semiconductor RAM didn't become commonplace until the mid-1970s -- roughly a decade later.<p>Unicode is the replacement, not the competitor, the way 128-bit IPv6 addresses are the replacement for 32-bit IPv4 addresses. It was developed in the early 1990s, when RAM got cheap enough that you could afford two bytes per character.<p>Personally, I deal with data all the time and rarely encounter Unicode. Of course, I'm in the US dealing with big files out of financial and marketing databases. In fact, I've seen more EBCDIC than Unicode.
I really hate to nitpick, but the article implies that ASCII was the first character encoding. In fact, there was a rich history of different encodings before it, with different word sizes and/or incompatible 8-bit encodings. It's quite interesting to look back and see what trade-offs were made and why.
The fact that UTF-8 and UTF-16 are often exposed to programmers when dealing with text is a major failure of separation-of-concerns. If you had a stream of data that was gzipped, would it ever make sense to look at the bytes in the data stream before decompressing it? Variable-length text encodings are the same. Application code should only see Unicode code points.<p>In general it was a mistake to put variable-length encodings into the Unicode standard. A much better design would have been to use UTF-32 for the application-level interface to characters, and use a separate compression standard that is optimized for fixed alphabets when transporting or storing text. This has the advantage that the compression scheme can be dynamically updated to match the letter frequencies in the real-world text, and it logically separates the ideas of encoding and compression so that the compression container is easier to swap out. And, of course, an entire class of bugs would be eliminated from application code.<p><i>Edited first paragraph to clarify: </i>Variable-length <i>text encodings are the same.</i>
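A minimal sketch of that boundary (Python, with made-up input bytes): decode exactly once at the edge, so everything inside the application deals only in code points.<p>
    # Hypothetical sketch: bytes exist only at the I/O boundary.
    raw = b"Na\xc3\xafve caf\xc3\xa9"     # UTF-8 bytes as they arrive off the wire
    text = raw.decode("utf-8")            # decode once, at the edge

    # Application code never sees the variable-length encoding,
    # only a sequence of code points:
    for ch in text:
        print(f"U+{ord(ch):04X} {ch}")

    outgoing = text.encode("utf-8")       # re-encode only when text leaves the program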
I'm impressed. Easily readable and understandable, short, and as far as I can tell free of the factual inaccuracies and wrong information that plague many other Unicode introductions and tutorials.
>ASCII really should have been named ASCIIWOA: the American Standard Code for Information Exchange With Other Americans.<p>So he thinks Americans are the only people to use the <i>English</i> language, does he?
Another good article on this topic is the one by Joel Spolsky:<p><a href="http://www.joelonsoftware.com/articles/Unicode.html" rel="nofollow">http://www.joelonsoftware.com/articles/Unicode.html</a>
<i>"Designed as a single, global replacement for localised character sets, the Unicode standard is beautiful in its simplicity. In essence: collect all the characters in all the scripts known to humanity and number them in one single, canonical list. If new characters are invented or discovered, no problem, just add them to the list. The list isn’t an 8-bit list, or a 16-bit list, it’s just a list, with no limit on its length."</i><p>Is this really true? My impression was that UTF-32 is a fixed-length encoding which uses 32 bits to encode all of Unicode. It seems that this means that Unicode can never have more code points than could fit in 32 bits. Right?
> These mappings of numbers to characters are just a convention that someone decided on when ASCII was developed in the 1960s. There’s nothing fundamental that dictates that a capital A has to be character number 65, that’s just the number they chose back in the day.<p>I don't think it's mere coincidence that the capital letters start at 65 and the lower case at 97 and the decimal digits at 48.
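Indeed -- the bit patterns make the intent obvious (a quick Python illustration, though the values are the same in any language):<p>
    # Upper and lower case differ only in one bit (the 0x20 'case bit'),
    # and digits carry their numeric value in the low four bits.
    for c in "Aa0":
        print(c, format(ord(c), "07b"))   # 1000001, 1100001, 0110000
    print(ord('a') - ord('A'))            # 32: flip one bit to change case
    print(ord('7') & 0x0F)                # 7: mask the high bits to get the digit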
It's not a matter of winning or losing. The pre-Unicode mix of character sets was a mess when it came to internationalization. Try truncating a Japanese Shift-JIS string in C. That'll learn you.
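A sketch of that hazard (in Python rather than C, with a made-up string): truncate a Shift-JIS byte buffer at an arbitrary byte count and you cut a character in half.<p>
    # Naive byte-count truncation splits a multi-byte character.
    s = "日本語"                        # 3 characters
    raw = s.encode("shift_jis")         # 6 bytes: 2 bytes per character here
    chopped = raw[:5]                   # truncate to 5 bytes
    print(chopped.decode("shift_jis", errors="replace"))  # '日本' plus a mangled tail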
OT and out of curiosity: how do non-native English speakers experience typing/keyboard education? I can barely remember how to make any of the basic accents over the `e` when trying to sound French. Are typing classes in non-English schooling systems much more sophisticated than in English (i.e. ASCII-centric) schools? I wonder whether non-native English typists come away with a better handle on keyboard shortcuts (whether for creating accents or not).
Well, when 99% think Unicode = encoding = UCS-2 = UTF-16, don't believe there's anything outside the BMP, and "wtf" is the only word that comes to mind when they hear about graphemes… has Unicode won?