Making Sense of Python Unicode

26 points by leecho0, about 15 years ago

4 comments

jmillikin, about 15 years ago

"But UTF-8 has a dark side, a single character can take up anywhere between one to six bytes to represent in binary."

What? No! UTF-8 takes, *at most*, 4 bytes per code point.

"But UTF-8 isn't very efficient at storing Asian symbols, taking a whole three bytes. The eastern masses revolted at the prospect of having to buy bigger hard drives and made their own encodings."

Many Asian users object to UTF-8/Unicode because of the Han Unification, and because many characters supported in other character sets are not present in Unicode. Size of the binary encoding has nothing to do with it -- in fact, most East Asian characters take 4 bytes in UTF-16.

"American programmers: In your day to day grind, it's superfluous to put a 'u' in front of every single string."

American programmers *who aren't morons*: Use 'u', or the first time somebody tries to run an accent through your code, it'll come out looking like line noise.
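The 4-byte ceiling and the "line noise" failure are both easy to check. The following is a minimal sketch in Python 3 (where every `str` is already Unicode, so no `u` prefix is needed); the sample characters and the helper name `utf8_length` are illustrative choices, not taken from the article.

```python
# Sample characters chosen for illustration; escapes avoid editor encoding issues.
samples = [
    ("A", "ASCII letter"),
    ("\u00e9", "Latin letter with acute accent"),
    ("\u4e2d", "common CJK ideograph"),
    ("\U0001F600", "emoji outside the Basic Multilingual Plane"),
]


def utf8_length(ch):
    """Return how many bytes one code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")

# The highest valid code point, U+10FFFF, still fits in 4 bytes.
highest = utf8_length(chr(0x10FFFF))
print(f"U+10FFFF: {highest} byte(s)")

# "Line noise": UTF-8 bytes decoded with the wrong codec (Latin-1 here).
garbled = "caf\u00e9".encode("utf-8").decode("latin-1")
print(garbled)
```

The loop reports 1, 2, 3, and 4 bytes for the four samples, and the last line prints `cafÃ©`, which is the kind of garbling the comment warns about when byte strings and text are mixed carelessly.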
qw, about 15 years ago

    Lobstertech wrote:
    > American programmers: In your day to day grind,
    > it's superfluous to put a 'u' in front of every single
    > string."*

    Good idea, who cares about internationalization? You can
    always just pay someone in India to go over all of your code
    the day you notice the rest of the world

    Regards,
    European developer (... who doesn't want more competition)
s-phi-nl, about 15 years ago

A good tutorial on Python Unicode is http://diveintopython3.org/strings.html. It's also my favorite explanation of Unicode in general.
leecho0, about 15 years ago

bonus tip:

don't forget to add:

    # -*- coding: utf-8 -*-

also, if you're using vim, make sure your encoding as well as your fileencoding are correct (they're different):

    set encoding=utf-8
    set fileencoding=utf-8
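To see what the coding line actually does, one option is to ask the standard library which encoding it would pick for a source file. This is a rough sketch in Python 3 using `tokenize.detect_encoding`; the helper name `declared_encoding` and the sample source bytes are made up for illustration. Note that Python 2 assumed ASCII when no declaration was present, while Python 3 assumes UTF-8, so the line matters most for Python 2 code like the article's.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to decode this source text."""
    # detect_encoding reads at most two lines, looking for a BOM or a
    # PEP 263 coding declaration, and falls back to UTF-8 otherwise.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical file contents, as raw bytes on disk.
with_cookie = b"# -*- coding: utf-8 -*-\nname = 'caf\xc3\xa9'\n"
without_cookie = b"name = 'plain ascii'\n"

print(declared_encoding(with_cookie))     # utf-8
print(declared_encoding(without_cookie))  # utf-8 (the Python 3 default)
```

The declaration only tells Python how to decode the file; the editor still has to write the bytes in that encoding, which is what vim's `fileencoding` setting controls.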