Unicode is tricky in Java and might be impossible in C++ (2006)

17 points by mblakele, about 16 years ago

9 comments

gchpaco, about 16 years ago
Aha, I have uncovered it. There are two problems. One is that apparently std::locale is horribly, horribly broken in the C++ standard library for Mac OS X. You can work around this by using C, like Real Programmers. The other problem is that without some form of setlocale, the standard library will run in "C" or "POSIX" mode, which can do nothing at all interesting with Unicode. By running setlocale, or by running std::locale::global(std::locale("")) on a platform with a C++ library that works, it will print appropriately.

Interestingly, on my Ubuntu box, before I put the std::locale in, it was printing "I have EUR100 to my name.", which is a pretty cool fallback.
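A minimal sketch of the setlocale point above, assuming a POSIX-like system whose environment selects a UTF-8 locale. The helper names `use_environment_locale` and `narrow` are hypothetical and exist only for this illustration. Until the first helper is called, the program stays in the "C" locale, where converting non-ASCII wide characters may fail.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: adopt the locale named by the environment
// (LANG, LC_ALL, ...). Returns false if that locale cannot be installed.
bool use_environment_locale() {
    return std::setlocale(LC_ALL, "") != nullptr;
}

// Hypothetical helper: convert a wide string to the multibyte encoding of
// the *current* C locale. Returns false if some character has no
// representation in that locale.
bool narrow(const std::wstring& ws, std::string& out) {
    // MB_CUR_MAX is the largest number of bytes one character can need
    // in the current locale; one extra byte leaves room for the terminator.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    out.assign(buf.data(), n);
    return true;
}
```

Calling `use_environment_locale()` once at start-up and then `narrow(L"I have \u20AC100 to my name.", out)` should produce UTF-8 bytes on a system with a UTF-8 locale. What happens in the "C" locale differs between C libraries.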
vasi, about 16 years ago
Err...this works fine for me if I use UTF-8 rather than wide strings, e.g.:

    std::string s("foo\u203D");
    std::cout << s << std::endl;

Also, he's completely wrong about the program terminating on output of a wide string; it's just that wcout is broken. If he had tried causing output afterwards with, say, printf(), he'd notice the program is still alive. Running it in GDB would show this just as well. Generally speaking, it's a bad idea to test whether your program is alive using the same mechanism that you suspect of killing it!
cr0bar, about 16 years ago
AFAIK, if you don't want literal non-ASCII characters in Java source code, you're supposed to use native2ascii. Running it on my Mac gives this:

    $ native2ascii Foo.java
    public class Foo {
        public static void main(String[] args) {
            System.out.println("I have \u201a\u00c7\u00a8100 to my name.");
        }
    }

The resulting program works perfectly fine in my Mac terminal, which makes the "I know Java pretty well" statement pretty suspicious...
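The three escapes in that output (\u201a, \u00c7, \u00a8) are what you get if the UTF-8 bytes of the euro sign, E2 82 AC, are read one byte at a time as MacRoman. That reading is an inference from the escapes, not something the comment states. A sketch with a deliberately partial MacRoman table (only those three bytes; the function names are hypothetical):

```cpp
#include <string>

// Partial MacRoman table: only the three bytes needed for this illustration.
// Every other byte >= 0x80 is mapped to U+FFFD here, which is NOT what a
// real MacRoman decoder does.
char32_t macroman_to_unicode(unsigned char b) {
    if (b < 0x80) return b;  // the ASCII range is identical
    switch (b) {
        case 0x82: return 0x00C7;  // Ç
        case 0xAC: return 0x00A8;  // ¨
        case 0xE2: return 0x201A;  // ‚
        default:   return 0xFFFD;  // placeholder for unmapped bytes
    }
}

// Decode a byte string as if every byte were one MacRoman character.
std::u32string decode_as_macroman(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) {
        out.push_back(macroman_to_unicode(b));
    }
    return out;
}
```

If the program later writes those three characters back out as MacRoman, the original bytes E2 82 AC reappear, which would explain why a UTF-8 terminal still shows a euro sign.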
zkz, about 16 years ago
Yet it works in cat. So you can at least program the solution in C, compile it with cpp, and then use it from inside your C++ program.
mblakele, about 16 years ago
Old, but the MacRoman issue in Java bit me today so I thought someone else might benefit.

MacRoman? In 2009? Really?
thwarted, about 16 years ago
Interesting article, but I would find it easier to believe the author knows the topic (text encoding) if the article didn't have goofy HTML entity encoding scattered all over the place (making the C++ code extremely hard to read) and didn't use smart quotes in the code samples.
zokier, about 16 years ago
A better headline would be 'Unicode is broken in OS X'. Let's just try out for fun how a Linux box handles this: http://codepad.org/vIa5Byya

Yeah right, Unicode might be impossible in C++ ...

... on OS X
allenbrunson, about 16 years ago
I print Unicode strings to a Mac terminal all the time without problems. Just send a string formatted as UTF-8 to puts(), printf(), or similar.

It never would have occurred to me to use the built-in locale stuff. That's heading for a world of hurt.
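A sketch of the "just send UTF-8 bytes" approach, assuming the terminal itself is set to UTF-8. Neither locale facilities nor wide streams are involved; `utf8_encode` and `put_utf8_line` are hypothetical helper names.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate range
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: write the bytes to stdout unchanged, then a newline.
void put_utf8_line(const std::string& s) {
    std::fwrite(s.data(), 1, s.size(), stdout);
    std::fputc('\n', stdout);
}
```

For example, `put_utf8_line("I have " + utf8_encode(0x20AC) + "100 to my name.")` writes the bytes E2 82 AC for the euro sign; whether they render as a euro sign depends on the terminal's encoding.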
TwoBit, about 16 years ago
\u20ac is not UTF8; it's UCS2. If you set the locale to UTF8 then you need to printf with UTF8 and not UCS2 (or UTF16).
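For comparison, here is a sketch of how a code point maps to UTF-16 code units. In C++ source, `\u20ac` is a universal character name for the code point U+20AC; the bytes that end up in the program depend on the literal type and the compiler's execution character set. `utf16_encode` is a hypothetical helper name.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Returns an empty string for surrogates and values above U+10FFFF.
std::u16string utf16_encode(char32_t cp) {
    std::u16string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // not a valid scalar value
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one code unit equal to the code point.
        out += static_cast<char16_t>(cp);
    } else if (cp <= 0x10FFFF) {
        // Supplementary planes: a high and a low surrogate.
        char32_t v = cp - 0x10000;
        out += static_cast<char16_t>(0xD800 + (v >> 10));
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    }
    return out;
}
```

U+20AC becomes the single 16-bit unit 0x20AC in UTF-16 but the three bytes E2 82 AC in UTF-8, so output written under a UTF-8 locale needs the three-byte form.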