JavaScript’s internal character encoding: UCS-2 or UTF-16?

37 points by mathias, over 13 years ago

7 comments

__alexs, over 13 years ago
> Both UCS-2 and UTF-16 are character encodings for Unicode.

UCS-2 is not an encoding that is generally compatible with Unicode. It's kind of like saying that 7-bit ASCII and UTF-8 are character encodings for Unicode.
hmottestad, over 13 years ago
"It produces a variable-length result of either one or two 16-bit code units per code point"

    from the article.

"It produces a variable-length result of either one or two 16-bit code units per code point"

    from wikipedia.org

I feel this should have been quoted or referenced in some way in the article. Or it might just be a very rare case of coincidence.
lambda, over 13 years ago
The encoding is UTF-16, but what it calls "characters" are code units (http://unicode.org/glossary/#code_unit), not code points (http://unicode.org/glossary/#code_point).
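
A minimal sketch of the distinction drawn above, using a single character outside the Basic Multilingual Plane. The variable names are illustrative only, and `codePointAt` and the string iterator used by `Array.from` were added in ES2015, after this thread was written, so this assumes a reasonably modern engine.

```javascript
// U+1D306 lies outside the Basic Multilingual Plane, so UTF-16
// represents it as a surrogate pair: two 16-bit code units.
const s = '\uD834\uDF06';

// .length and charCodeAt work on code units.
const codeUnits = s.length;

// The string iterator (used by Array.from) walks code points.
const codePoints = Array.from(s).length;

console.log(codeUnits);                     // 2
console.log(codePoints);                    // 1
console.log(s.charCodeAt(0).toString(16));  // d834 (high surrogate only)
console.log(s.codePointAt(0).toString(16)); // 1d306 (the full code point)
```

So one user-visible character can count as two JavaScript "characters" whenever an API is defined in terms of code units.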
apaprocki, over 13 years ago
For all the gory details about TC-39 work to possibly get rid of this restriction in ECMAScript and support full Unicode, venture to the TC-39 wiki:

http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings
herge, over 13 years ago
Why doesn't everybody use UTF-8? How much overhead is incurred in encoding a non-ASCII language (say Chinese) in UTF-8 compared to UTF-16?
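
One rough way to put numbers on that question is to measure encoded sizes directly. The sketch below assumes a runtime with a global `TextEncoder` (modern browsers and recent Node.js); the helper names `utf8Bytes` and `utf16Bytes` are hypothetical, and the UTF-16 figure ignores any byte order mark.

```javascript
// Hypothetical helper: number of bytes the string occupies in UTF-8.
function utf8Bytes(s) {
  return new TextEncoder().encode(s).length;
}

// Hypothetical helper: number of bytes in UTF-16, without a BOM.
// Each JavaScript string element is one 16-bit code unit.
function utf16Bytes(s) {
  return s.length * 2;
}

const chinese = '中文';   // two BMP characters, 3 bytes each in UTF-8
const ascii = 'hello';

console.log(utf8Bytes(chinese), utf16Bytes(chinese)); // 6 4
console.log(utf8Bytes(ascii), utf16Bytes(ascii));     // 5 10
```

For pure BMP Chinese text that is 3 bytes per character in UTF-8 against 2 in UTF-16, while ASCII goes the other way, so the real ratio for a given document depends on how much ASCII (markup, digits, spaces) is mixed in.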
patorjk, over 13 years ago
Very nice write-up. I was actually looking for something like this about a week ago, and was referred to the ECMAScript spec (section 8.4), which talks about "UTF-16 code units", which I believe is just UCS-2. If this is the case, I kind of wonder if the spec should be updated to make things a little more clear, since the issue isn't straightforward for those who don't know a lot about Unicode.
yonran, over 13 years ago
This means that for applications that want to store binary data as efficiently as possible in localStorage (e.g. Offline Wikipedia, https://news.ycombinator.com/item?id=3409512), you can pack two bytes into each string character. ECMAScript strings are just arrays of 16-bit unsigned integers (e.g., '\ud800' is a valid JS string but is not valid UTF-16).
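
The packing idea could look roughly like the following. This is only a sketch: `packBytes` and `unpackBytes` are hypothetical names, odd-length input is padded with a zero byte (so the caller has to remember the original length), and whether a particular storage backend round-trips lone surrogates unchanged is an assumption worth testing rather than something this code guarantees.

```javascript
// Hypothetical helper: pack pairs of bytes into 16-bit code units.
function packBytes(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 2) {
    const hi = bytes[i];
    const lo = i + 1 < bytes.length ? bytes[i + 1] : 0; // pad odd input
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

// Hypothetical helper: recover the first byteLength bytes from the string.
function unpackBytes(str, byteLength) {
  const bytes = new Uint8Array(byteLength);
  for (let i = 0; i < byteLength; i++) {
    const unit = str.charCodeAt(i >> 1);
    bytes[i] = i % 2 === 0 ? unit >> 8 : unit & 0xff;
  }
  return bytes;
}

const packed = packBytes(new Uint8Array([0xd8, 0x00, 0xff]));
console.log(packed.length);                        // 2
console.log(packed.charCodeAt(0).toString(16));    // d800 (a lone surrogate)
console.log(Array.from(unpackBytes(packed, 3)));   // [ 216, 0, 255 ]
```

The first code unit here is exactly the '\ud800' case from the comment: a perfectly good JavaScript string element that is not well-formed UTF-16 on its own.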