科技回声

1 comment

ggchappell, over 10 years ago

Worthy thoughts. But you should think of the Icelandic eth ("ð") as a variation on a "d", not an "o". :-) Not that I know any Icelandic, but a capital eth looks like this: "Ð".

More info: https://en.wikipedia.org/wiki/Eth

EDIT: As for "クッキー": dealing with this is mostly straightforward, if you're willing to allow multiple ASCII characters for a single Unicode character. Most of the katakana (which these are), along with the hiragana, have a standard romanization. For the first three here, the romanizations are (if I'm not mistaken) "ku", a small "tsu" that doubles the following consonant rather than being read on its own, and "ki". The dash-looking thing lengthens the last vowel, so this is "kukkii". That probably requires more intelligence than you're wanting to bake into this thing, but I think "kukki-" wouldn't be bad.

Korean characters similarly have standard romanizations (e.g., "원" = "won").

Figuring out all of the above would take some work, but it could easily be crowd-sourced, and it would only have to be done once.

The zillions of Chinese characters are the problem. These could have different romanizations depending on which of the Chinese languages they're being used for, and often multiple possible romanizations when used for Japanese (in which case they are "kanji"). So there might be no good solution for these.
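
To make the table-driven idea above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the name `romanize`, the tiny `TABLE` (a real one would be the crowd-sourced mapping covering all kana and more), the "?" fallback, and the choice to treat the small "ッ" as doubling the next consonant and to write the long-vowel mark "ー" as a plain "-".

```python
# Hypothetical, deliberately tiny mapping. A real table would be much larger.
TABLE = {
    "ク": "ku", "キ": "ki", "ツ": "tsu", "コ": "ko",
    "ð": "d", "Ð": "D",  # eth treated as a variation on "d"
}
SOKUON = "ッ"       # small tsu: assumed to double the following consonant
LONG_VOWEL = "ー"   # long-vowel mark: assumed to be written as "-"


def romanize(text: str, fallback: str = "?") -> str:
    """Map each character to one or more ASCII characters."""
    out = []
    double_next = False
    for ch in text:
        if ch == SOKUON:
            # Remember to double the first letter of the next syllable.
            double_next = True
            continue
        if ch == LONG_VOWEL:
            out.append("-")
            continue
        roman = TABLE.get(ch)
        if roman is None:
            # ASCII passes through; anything unmapped becomes the fallback.
            roman = ch if ch.isascii() else fallback
        elif double_next:
            roman = roman[0] + roman
        double_next = False
        out.append(roman)
    return "".join(out)


print(romanize("クッキー"))  # kukki-
```

Characters with no entry, such as most Chinese characters, just fall through to the fallback here, which is the unsolved part described above.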

Homoglyph Substitution

1 comment
