UTF-n: Brainstorming alternate text encodings

9 points by tr4nslator, over 15 years ago

4 comments

sern, over 15 years ago
Sure, UTF-8 sucks for Chinese and UTF-16 is bad at English, but in practice, high-codepoint languages are rarely mixed with low-codepoint ones. Notice that when sending an email, many mail programs will select the most concise encoding that happens to encompass every character in your message, and that is usually not UTF-8 or UTF-16.
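To make the size trade-off and the "most concise encoding" idea concrete, here is a minimal Python sketch. The candidate list and its order are assumptions chosen for illustration, not the behaviour of any particular mail program, and the names `pick_charset` and `encoded_sizes` are hypothetical.

```python
# Illustrative only: the candidate list and its order are assumptions,
# not the behaviour of any specific mail program.
CANDIDATES = ("us-ascii", "iso-8859-1", "utf-8")


def pick_charset(text, candidates=CANDIDATES):
    """Return (charset, encoded bytes) for the first charset that can encode text."""
    for name in candidates:
        try:
            return name, text.encode(name)
        except UnicodeEncodeError:
            continue  # this charset cannot represent some character
    raise ValueError("no candidate charset can encode the text")


def encoded_sizes(text):
    """Byte counts for UTF-8 and UTF-16 (little-endian, no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for sample in ("hello", "café", "你好"):
        charset, _ = pick_charset(sample)
        print(sample, charset, encoded_sizes(sample))
```

With these candidates, plain English text stays in `us-ascii`, and only text that needs characters outside the narrower charsets falls through to `utf-8`. The size function shows the asymmetry mentioned above: ASCII text doubles in UTF-16, while common Chinese characters take three bytes in UTF-8 against two in UTF-16.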
randallsquared, over 15 years ago
Adding some Chinese (from wikipedia) does actually show UTF-n as worst-case compared to UTF-16 and UTF-8, at least. I got UTF-16: 0%, UTF-8: 5%, and UTF-n: 7%.
blasdel, over 15 years ago
It would be good to have demo text that plays to UTF-n's advantage, so I don't have to copy-paste from someplace like jp.wikipedia.org myself :)

It looks like it preserves some of UTF-8's stream synchronization properties, but does it have UTF-8's wonderful property of being recognizable by simple heuristics to great confidence even for tiny sequences?
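For reference, the two UTF-8 properties mentioned here can be sketched in a few lines of Python. This only illustrates UTF-8 itself; whether UTF-n behaves the same way is not established in this thread. The function names `has_utf8_evidence` and `resync` are hypothetical.

```python
def has_utf8_evidence(data: bytes) -> bool:
    """Heuristic: True if data is valid UTF-8 and has at least one multi-byte sequence.

    Pure ASCII returns False: it is valid UTF-8, but it gives no positive
    evidence, since most legacy encodings share the ASCII range.
    """
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)


def resync(data: bytes, pos: int) -> int:
    """Advance pos past UTF-8 continuation bytes (0b10xxxxxx) to the next
    index that can start a character, or to len(data)."""
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


if __name__ == "__main__":
    print(has_utf8_evidence("é".encode("utf-8")))    # valid two-byte sequence
    print(has_utf8_evidence("é".encode("latin-1")))  # lone 0xE9 is malformed UTF-8
    print(resync("héllo".encode("utf-8"), 2))        # index 2 is a continuation byte
```

The heuristic works on short inputs because the lead-byte and continuation-byte patterns are strict: text in a legacy 8-bit encoding that contains non-ASCII bytes will usually fail strict UTF-8 decoding. The same bit patterns let a reader dropped into the middle of a stream find the next character boundary by skipping at most a few bytes.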
gritzko, over 15 years ago
For Russian, the difference between UTF-8 and UTF-n is statistically insignificant.