Homoglyph Substitution

3 points by jmhobbs, over 10 years ago

1 comment

ggchappell, over 10 years ago
Worthy thoughts. But you should think of the Icelandic eth ("ð") as a variation on a "d", not an "o". :-) Not that I know any Icelandic, but a capital eth looks like this: "Ð".

More info: https://en.wikipedia.org/wiki/Eth

EDIT: As for "クッキー": dealing with this is mostly straightforward, if you're willing to allow *multiple ASCII characters* for a single unicode character. Most of the katakana (which these are), along with the hiragana, have a standard romanization. For the first three here, the romanizations are (if I'm not mistaken) "ku", "tu", and "ki". The dash-looking thing lengthens the last vowel, so this is "kutukii". That probably requires more intelligence than you're wanting to bake into this thing, but I think "kutuki-" wouldn't be bad.

Korean characters similarly have standard romanizations (e.g., "원" = "won").

Figuring out all of the above would take some work, but it could easily be crowd-sourced, and it would only have to be done once.

The zillions of Chinese characters are the problem. These could have different romanizations depending on which of the Chinese languages they're being used for, and often multiple possible romanizations when used for Japanese (in which case they are "kanji"). So there might be no good solution for these.
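
The "multiple ASCII characters for a single unicode character" idea can be sketched roughly as below. This is only an illustration under stated assumptions: the table names (`KATAKANA`, `HOMOGLYPHS`) and the function `to_ascii` are hypothetical, the tables hold just the handful of characters mentioned in this thread, and the romanization is Hepburn-style. One caveat on the example above: the second character of "クッキー" is the small "ッ", which is normally romanized by doubling the following consonant rather than as "tu", so this sketch produces "kukkii" instead of "kutukii".

```python
# Hypothetical lookup tables: only the characters discussed in this thread.
KATAKANA = {"ク": "ku", "キ": "ki", "ツ": "tsu"}   # one character -> several ASCII letters
HOMOGLYPHS = {"ð": "d", "Ð": "D"}                  # eth treated as a variant of "d"

SOKUON = "ッ"    # small tsu: doubles the consonant that follows it
CHOONPU = "ー"   # long-vowel mark: lengthens the preceding vowel


def to_ascii(text: str) -> str:
    """Map each character to zero or more ASCII characters (a rough sketch)."""
    out = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            # Remember to double the first letter of the next romanized piece.
            double_next = True
            continue
        if ch == CHOONPU:
            # Repeat the previous vowel if there is one, otherwise fall back to "-".
            if out and out[-1][-1:] in tuple("aeiou"):
                out.append(out[-1][-1])
            else:
                out.append("-")
            continue
        # Characters missing from both tables pass through unchanged.
        piece = KATAKANA.get(ch, HOMOGLYPHS.get(ch, ch))
        if double_next and piece:
            piece = piece[0] + piece
            double_next = False
        out.append(piece)
    return "".join(out)


print(to_ascii("クッキー"))
print(to_ascii("ð"))
```

Filling in the rest of the kana is a matter of extending the table, which is the part that could be crowd-sourced. Chinese characters would not fit this scheme, since a single character has no one fixed reading.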