
How ASCII lost and unicode won

82 points by stevejalim almost 12 years ago

13 comments

joshuaellinger almost 12 years ago
ASCII is by far the most successful character encoding that computers have used. It was invented in 1963, back in the era of punch cards and core memory. Modern RAM did not exist until 1975 -- a decade later.

Unicode is the replacement, not the competitor, like 128-bit IPv6 addresses are the replacement for 32-bit IPv4 addresses. It was developed in the early 1990s, when RAM got cheap enough that you could afford two bytes per character.

Personally, I deal with data all the time and rarely encounter Unicode. Of course, I'm in the US, dealing with big files out of financial and marketing databases. In fact, I've seen more EBCDIC than Unicode.
timthorn almost 12 years ago
I really hate to nitpick, but the article implies that ASCII was the first character encoding. In fact, there was a rich history of different encodings before that, with different word sizes and/or incompatible 8-bit encodings. It's quite interesting to look back and see what trade-offs were made and why.
salmonellaeater almost 12 years ago
The fact that UTF-8 and UTF-16 are often exposed to programmers when dealing with text is a major failure of separation of concerns. If you had a stream of data that was gzipped, would it ever make sense to look at the bytes in the data stream before decompressing it? Variable-length text encodings are the same. Application code should only see Unicode code points.

In general it was a mistake to put variable-length encodings into the Unicode standard. A much better design would have been to use UTF-32 for the application-level interface to characters, and use a separate compression standard, optimized for fixed alphabets, when transporting or storing text. This has the advantage that the compression scheme can be dynamically updated to match the letter frequencies in real-world text, and it logically separates the ideas of encoding and compression so that the compression container is easier to swap out. And, of course, an entire class of bugs would be eliminated from application code.

(Edited first paragraph to clarify: variable-length text encodings are the same.)
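To make the class of bugs concrete, here is a minimal sketch (Python 3; the sample string and slice offset are illustrative): slicing the encoded bytes can split a multi-byte character, while slicing the decoded code points cannot.

    # Python 3: str exposes code points; bytes exposes the UTF-8 encoding.
    text = "naïve"                  # 5 code points
    utf8 = text.encode("utf-8")     # 6 bytes: 'ï' (U+00EF) takes two bytes

    print(text[:3])                 # 'naï' -- truncating code points is safe
    try:
        utf8[:3].decode("utf-8")    # b'na\xc3' -- cuts 'ï' in half
    except UnicodeDecodeError as e:
        print("corrupted:", e)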
ygra almost 12 years ago
I'm impressed. Easily readable and understandable, short, and as far as I can tell no factual inaccuracies or wrong information (unlike many other Unicode introductions and tutorials).
Digit-Al almost 12 years ago
> ASCII really should have been named ASCIIWOA: the American Standard Code for Information Exchange With Other Americans.

So he thinks Americans are the only people to use the English language, does he?
peterkelly almost 12 years ago
Another good article on this topic is the one by Joel Spolsky:

http://www.joelonsoftware.com/articles/Unicode.html
gnosis almost 12 years ago
"Designed as a single, global replacement for localised character sets, the Unicode standard is beautiful in its simplicity. In essence: collect all the characters in all the scripts known to humanity and number them in one single, canonical list. If new characters are invented or discovered, no problem, just add them to the list. The list isn't an 8-bit list, or a 16-bit list, it's just a list, with no limit on its length."

Is this really true? My impression was that UTF-32 is a fixed-length encoding which uses 32 bits to encode all of Unicode. It seems that this means that Unicode can never have more code points than could fit in 32 bits. Right?
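In fact, the limit is tighter than 32 bits: the standard caps code points at U+10FFFF (1,114,112 values, fitting in 21 bits), chosen so that every character remains representable in UTF-16 via surrogate pairs. A quick check in Python 3, which enforces that ceiling:

    print(chr(0x10FFFF))        # highest valid code point: accepted
    try:
        chr(0x110000)           # one past the ceiling
    except ValueError as e:
        print("rejected:", e)   # chr() arg not in range(0x110000)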
okwa almost 12 years ago
> These mappings of numbers to characters are just a convention that someone decided on when ASCII was developed in the 1960s. There's nothing fundamental that dictates that a capital A has to be character number 65, that's just the number they chose back in the day.

I don't think it's mere coincidence that the capital letters start at 65 and the lower case at 97 and the decimal digits at 48.
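It isn't: the codes were picked as bit patterns, so that case conversion and digit parsing are single bit operations. A short Python 3 demonstration (the sample characters are illustrative):

    print(f"{ord('A'):07b}")         # 1000001 -- 'A' = 0x41
    print(f"{ord('a'):07b}")         # 1100001 -- same pattern with bit 0x20 set
    print(f"{ord('0'):07b}")         # 0110000 -- digits are 0x30 + value

    print(chr(ord('A') | 0x20))      # 'a': lowercase by setting one bit
    print(ord('7') - ord('0'))       # 7: digit value by subtracting '0'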
stuartcw almost 12 years ago
It's not a matter of winning or losing. The pre-Unicode mix of character sets was a mess when it came to internationalization. Try truncating a Japanese Shift-JIS string in C. That will learn you.
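The pitfall translates directly; here is a minimal sketch in Python 3 rather than C (the sample string is illustrative):

    # Shift-JIS mixes 1- and 2-byte characters; byte-level truncation can
    # split a character and corrupt the string.
    sjis = "日本語".encode("shift_jis")   # 6 bytes, 2 per character
    try:
        sjis[:3].decode("shift_jis")      # cuts the second character in half
    except UnicodeDecodeError as e:
        print("mojibake:", e)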
danso almost 12 years ago
OT and out of curiosity... how do non-native English speakers experience typing/keyboard education? I can barely remember how to make any of the basic accents over the `e` when trying to sound French... are typing classes in non-English schooling systems much more sophisticated than in English (i.e. ASCII-centric) schools? I wonder if non-native English typists come away with a better handle on the power of keyboard shortcuts (whether to create accents or not).
lmm almost 12 years ago
Given the controversy over Han unification, I suspect that incompatible character sets will be with us for a while yet, more's the pity.
lelf almost 12 years ago
Well, when 99% think Unicode = encoding = UCS-2 = UTF-16, don't believe there's anything outside the BMP, and "wtf" is the only word that comes to mind when they hear about graphemes… Unicode won?
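A short Python 3 illustration of those distinctions (the sample characters are illustrative): a code point outside the BMP is one character to Unicode but two UTF-16 code units, and a user-perceived grapheme may span several code points.

    s = "𝄞"                                 # U+1D11E, outside the BMP
    print(len(s))                            # 1 code point
    print(len(s.encode("utf-16-le")) // 2)   # 2 UTF-16 code units (surrogate pair)

    flag = "🇯🇵"                             # one grapheme: two regional indicators
    print(len(flag))                         # 2 code points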
rayiner almost 12 years ago
Unicode, meh. Nobody will ever need more than 128 characters.