TechEcho

1 comment

nabla9 over 8 years ago

> Every platonic letter in every alphabet is assigned a magic number by the Unicode consortium which is written like this: U+0639. This magic number is called a code point.

This is wrong.

A code point does not correspond to each platonic letter (an abstract character, in Unicode terms), nor to a grapheme. It does in many Western languages and alphabets, but not in general. A code point is just a unit of information used in __encoding__.

The mapping from code points to abstract characters is not total, injective, or surjective. Some abstract characters need more than one code point to express them. A grapheme can also be a sequence of one or more code points, and so can an abstract character. You can't split text at code-point boundaries to get abstract characters or graphemes.

What every software developer must know beyond code points and code units:

User-perceived character: what the user thinks is a character.

Grapheme cluster: a sequence of coded characters that "should be kept together". Grapheme clusters try to represent user-perceived characters in a language-independent way. Selecting a single character and cursor movement happen at this level.
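To make the distinction concrete, here is a minimal Python sketch using only the standard library. The helper names (`code_points`, `utf16_code_units`, `rough_clusters`) are made up for this illustration. `rough_clusters` is only an approximation: it attaches combining marks to the preceding base character and ignores the rest of the grapheme cluster rules in UAX #29 (Hangul syllables, emoji ZWJ sequences, regional indicators and so on), so treat it as a demonstration, not as a segmentation algorithm.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

def utf16_code_units(text):
    """Count the 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def rough_clusters(text):
    """Group each base character with the combining marks that follow it.

    Assumption: this is a crude stand-in for grapheme clusters, not the
    full UAX #29 algorithm.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # combining mark: keep with previous base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

decomposed = "e\u0301"                                # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)   # single precomposed code point

print(code_points(decomposed))   # ['U+0065', 'U+0301']
print(code_points(composed))     # ['U+00E9']

# 'q' + COMBINING TILDE has no precomposed form, so it stays two code
# points even after NFC normalization.
q_tilde = unicodedata.normalize("NFC", "q\u0303")
print(len(q_tilde))              # 2

word = "cafe\u0301"
print(len(word))                 # 5 code points
print(len(rough_clusters(word))) # 4 user-perceived characters

# Code units are a third level: one code point outside the BMP takes
# two UTF-16 code units.
print(utf16_code_units("\U0001F600"))  # 2
```

For real text segmentation, a library that implements UAX #29 is the safer choice than a hand-rolled loop like the one above.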

What Every Software Developer Must Know About Unicode and Character Sets

1 comment
