I wrote a potted history of the precursors to Unicode.[1] Mainly because I was dealing with Oracle and Unicode far, far too much.<p>Might be useful to someone...<p>1. <a href="http://www.randomtechnicalstuff.blogspot.com.au/2009/05/unicode-and-oracle.html" rel="nofollow">http://www.randomtechnicalstuff.blogspot.com.au/2009/05/unic...</a>
This should be a mandatory read before anyone asks questions about Unicode:<p>The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets<p><a href="http://www.joelonsoftware.com/articles/Unicode.html" rel="nofollow">http://www.joelonsoftware.com/articles/Unicode.html</a>
I made a little practical demo about using Unicode with Ruby for a presentation a while back, perhaps of interest to some people:<p><a href="https://github.com/norman/enc/blob/master/equivalence.rb" rel="nofollow">https://github.com/norman/enc/blob/master/equivalence.rb</a>
That answer is missing an explanation of how Python 2 goofed up strings and how Python 3 fixed them. Without that explanation, showing how Python 2 strings must be "decoded" is just terribly confusing.<p>It doesn't really make sense to "decode" a proper string type. Ideally the language should never reveal how it represents strings internally in memory, so you can think of a string as a sequence of abstract Unicode code points with no particular encoding at all. Python 3 strings work like that.
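<p>As a concrete illustration of that model (a minimal sketch of my own, not code from the answer being discussed): in Python 3, bytes are an encoded representation that you explicitly decode, while str is just a sequence of code points.<p><pre><code>
# Sketch of the Python 3 bytes/str split described above.

raw = b"caf\xc3\xa9"           # bytes: a concrete UTF-8 encoded representation
text = raw.decode("utf-8")     # decoding produces an abstract str

print(len(raw))    # 5 -- number of bytes
print(len(text))   # 4 -- number of code points

# Going back to bytes requires choosing an encoding explicitly;
# the same code points can have many different byte representations.
print(text.encode("utf-8"))
print(text.encode("utf-16-le"))
</code></pre>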