I'm not a Python geek, but I found CPython's C implementation of Unicode strings really interesting reading:<p><a href="http://hg.python.org/cpython/file/tip/Objects/unicodeobject.c" rel="nofollow">http://hg.python.org/cpython/file/tip/Objects/unicodeobject....</a><p>CPython supports several internal representations, using one, two, or four bytes per character depending on the widest code point in the string, to optimize for space and performance. There's also a nifty sort of Bloom filter for quickly checking whether a string might contain characters of interest.
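<p>To make both ideas concrete, here's a rough Python sketch, not the actual C code. The function names are illustrative, and the 64-bit mask width is an assumption (the real width depends on the platform). The first part mimics the one-word Bloom-style mask: each character sets a single bit chosen by the low bits of its code point, so a clear bit means "definitely not in the set" and a set bit means "maybe, go do the exact check". The last helper estimates the per-character storage width by measuring how much a string grows when one character is appended; it relies on CPython-specific behavior of <code>sys.getsizeof</code>.

```python
import sys

# Assumed mask width in bits; the real value is platform-dependent.
BLOOM_WIDTH = 64


def make_bloom_mask(chars: str) -> int:
    """Set one bit per character, indexed by the low bits of its code point."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


def bloom_might_contain(mask: int, ch: str) -> bool:
    """False means 'definitely absent'; True only means 'possibly present'."""
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def strip_left(text: str, chars: str) -> str:
    """Strip leading characters found in `chars`.

    The cheap mask test runs first, so the exact membership lookup
    is only reached for characters whose bit is set in the mask.
    """
    mask = make_bloom_mask(chars)
    i = 0
    while i < len(text) and bloom_might_contain(mask, text[i]) and text[i] in chars:
        i += 1
    return text[i:]


def bytes_per_char(ch: str) -> int:
    """Estimate storage width by growing a string by one character.

    CPython-specific: other interpreters may not support sys.getsizeof
    or may store strings differently.
    """
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)


if __name__ == "__main__":
    print(strip_left("  \thello ", " \t"))
    for sample in ("a", "\u03a9", "\U0001F600"):
        print(hex(ord(sample)), bytes_per_char(sample))
```

<p>False positives are expected and harmless here: with a 64-bit mask, a space (U+0020) and a backtick (U+0060) land on the same bit, so the mask alone can't tell them apart and the exact lookup settles it. On a recent CPython the loop at the end should report a wider slot for each successive sample character.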