The Absolute Minimum Every Dev Must Know About Unicode and Character Sets (2003)

3 points by krausejj almost 7 years ago

1 comment

nabla9 almost 7 years ago
This article should be retired because it's harmful.

This and other "absolute minimums" like it seem to stop before teaching the absolute minimum, probably because the authors don't know the absolute minimum. They just teach the encodings and stop there. That's harmful.

Consider the two incorrect statements Joel makes:

> In Unicode, a letter maps to something called a code point

and

> Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium

These statements are incorrect, and Joel does not (or did not) know enough about Unicode to know that he is wrong.

Above the code points are "grapheme clusters", "extended grapheme clusters", or "user-perceived characters" ("a basic unit of a writing system for a language"), and those are what match the "platonic letter" Joel talks about. wchar_t can't represent them.

Extended grapheme clusters can have an arbitrary number of code points in them. You need to use unicode-segmentation to cut a Unicode string into smaller strings that represent "platonic characters" if you want to do it right.
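
To make the code point vs. user-perceived character gap concrete, here is a rough Python sketch. It is not the real segmentation algorithm: `rough_clusters` is a hypothetical helper that only glues combining marks onto the preceding code point, so treat it as a simplified approximation. Proper extended grapheme clusters (Unicode UAX #29) also cover ZWJ emoji sequences, Hangul jamo, regional indicators, and more, which is why a real segmentation library is the right tool.

```python
import unicodedata


def rough_clusters(text):
    """Group each code point with the combining marks that follow it.

    Simplified approximation only (an assumption for illustration):
    real extended grapheme clusters follow the UAX #29 rules, which
    this helper does not implement.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT
word = "cafe\u0301"

print(len(word))                  # counts code points
print(len(rough_clusters(word)))  # counts approximate user-perceived characters
```

The two counts differ for `word`: a reader sees four letters, but the string holds five code points, so "one letter, one code point" already fails for a plain accented Latin word.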