科技回声 (Tech Echo)

2 comments

ANSI C specified wchar_t in 1989, two years before Unicode 1.0. They couldn't even be sure Unicode was going to win.

Besides, the *whole point* of wchar_t is to not be variable width. UTF-16 in wchar_t is an abomination that dates back to the industry building APIs that take UCS-2 (which the author really ought to cover) before realizing UCS-2 was too narrow to do its job. So now we have a lot of code that appears to support Unicode but may not handle it correctly, depending on whether QA knew to test surrogate pairs. Almost nobody realizes UTF-16 needs to be searched and spliced as carefully as UTF-8. Each is just a compression scheme for the million or so actual code points (U+0000 through U+10FFFF), and there aren't many reasons to favor one over the other (in memory, at least).

What's the actual problem here? That the team made assumptions ANSI warned against making? That Apple failed to accept UCS-4 for their API?
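To make the surrogate-pair point concrete, here is a minimal sketch of what "searching and splicing carefully" means for UTF-16. The function names (`count_code_points`, `truncate_utf16`, `surrogate_demo`) are hypothetical and not taken from the article or any library; the sketch assumes the text is held in a `std::u16string`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if `u` is a high (leading) surrogate, U+D800..U+DBFF.
inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if `u` is a low (trailing) surrogate, U+DC00..U+DFFF.
inline bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Count code points rather than code units. A well-formed surrogate pair
// counts once; an unpaired surrogate is counted as one (ill-formed) unit.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i, ++n) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size() &&
            is_low_surrogate(s[i + 1])) {
            ++i;  // skip the trailing half of the pair
        }
    }
    return n;
}

// Truncate to at most `max_units` code units without cutting a pair in half.
std::u16string truncate_utf16(const std::u16string& s, std::size_t max_units) {
    if (max_units >= s.size()) return s;
    std::size_t end = max_units;
    // If the cut falls between a high and a low surrogate, back up one unit.
    if (end > 0 && is_high_surrogate(s[end - 1]) && is_low_surrogate(s[end])) {
        --end;
    }
    return s.substr(0, end);
}

// Usage sketch: U+1F600 lies outside the BMP, so it occupies two code units.
void surrogate_demo() {
    const std::u16string s = u"a\U0001F600b";
    assert(s.size() == 4);              // code units
    assert(count_code_points(s) == 3);  // code points
    assert(truncate_utf16(s, 2) == u"a");  // a naive substr(0, 2) would split the pair
}
```

Code that only ever sees BMP text passes every test with plain indexing, which is why this class of bug tends to survive QA.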

jheriko, about 15 years ago

This is silly... not only is the article inaccurate, but the problem described is trivial to solve. As long as you know what encoding wchar_t uses and what encoding your data is stored in, this is not a big problem: use one format internally and convert your data on the way in as appropriate. Trust me, I solved it in less than a day with no prior knowledge and no formal education, as a distraction during my day job... I did not have to rewrite the entire library from scratch.
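A rough sketch of the "one format internally, convert on the way in" approach might look like the following. It is not the commenter's actual code. It assumes the stored data is UTF-8, picks UTF-32 as the internal format, and assumes a 16-bit wchar_t means UTF-16 while a wider one means UTF-32. The names `utf8_to_utf32`, `utf32_to_wide`, and `convert_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-32, one fixed-width unit per code point.
// Malformed, overlong, or truncated sequences become U+FFFD.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes expected
        char32_t min_cp = 0;    // smallest value legal for this length
        if (b0 < 0x80)                { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; min_cp = 0x10000; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        ++i;
        bool ok = true;
        std::size_t k = 0;
        for (; k < extra; ++k) {
            if (i + k >= in.size()) { ok = false; break; }  // truncated input
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        i += k;  // consume only the continuation bytes actually accepted
        if (!ok || cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        out.push_back(cp);
    }
    return out;
}

// Convert the internal form to wchar_t at the API boundary.
// Assumption: 16-bit wchar_t holds UTF-16, wider wchar_t holds UTF-32.
std::wstring utf32_to_wide(const std::u32string& in) {
    std::wstring out;
    for (char32_t cp : in) {
        if (sizeof(wchar_t) >= 4 || cp < 0x10000) {
            out.push_back(static_cast<wchar_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Usage sketch: "a", U+00E9, U+20AC, U+1F600 encoded as UTF-8 bytes.
void convert_demo() {
    const std::string utf8 = "a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80";
    const std::u32string internal = utf8_to_utf32(utf8);
    assert(internal == U"a\u00E9\u20AC\U0001F600");
    assert(utf32_to_wide(internal).size() == (sizeof(wchar_t) >= 4 ? 4u : 5u));
}
```

With a fixed-width internal form, searching and splicing work on whole code points, and the wchar_t width only matters in the one function that talks to the platform API.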

A rant about cross-platform programming with wchar_t
