HN discussion of the same topic from two days ago (126 comments so far): <a href="https://news.ycombinator.com/item?id=14119713" rel="nofollow">https://news.ycombinator.com/item?id=14119713</a>
Could a browser track how many languages/character sets a browser
profile typically uses, and warn the user when they are about to use
a new, previously unused set, rather than waving the duty off as the
"responsibility of domain owners"?<p>With over 1000 top-level domains now, and however many homographic
matches exist across character sets, expecting people to register dozens of
matching domains seems unrealistic.
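Here is a rough sketch of what that per-profile check might look like in Go. Everything in it is hypothetical: `Profile`, `Record` and `NewScripts` are invented names, not any browser's API. It classifies each character with the standard library's `unicode.Scripts` tables and ignores the `Common` and `Inherited` scripts, since dots, digits and hyphens show up in every hostname.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptOf returns the Unicode script name of r (e.g. "Latin", "Cyrillic"),
// or "Unknown" if r is not in any of Go's script tables.
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Unknown"
}

// neutral reports whether a script is shared by practically all hostnames
// (dots, digits, hyphens, combining marks) and should not trigger a warning.
func neutral(script string) bool {
	return script == "Common" || script == "Inherited"
}

// Profile is a hypothetical per-browser-profile record of the scripts
// that have appeared in hostnames the user has already visited.
type Profile struct {
	seen map[string]bool
}

func NewProfile() *Profile {
	return &Profile{seen: make(map[string]bool)}
}

// Record marks every script in host as seen, e.g. after a normal visit
// or after the user has accepted a warning.
func (p *Profile) Record(host string) {
	for _, r := range host {
		p.seen[scriptOf(r)] = true
	}
}

// NewScripts returns, sorted, the non-neutral scripts in host that this
// profile has never seen; a non-empty result is where a browser could warn.
func (p *Profile) NewScripts(host string) []string {
	found := make(map[string]bool)
	for _, r := range host {
		if s := scriptOf(r); !neutral(s) && !p.seen[s] {
			found[s] = true
		}
	}
	out := make([]string, 0, len(found))
	for s := range found {
		out = append(out, s)
	}
	sort.Strings(out)
	return out
}

func main() {
	p := NewProfile()
	p.Record("www.apple.com")
	fmt.Println(p.NewScripts("www.apple.com"))                         // prints []
	fmt.Println(p.NewScripts("www.\u0430\u0440\u0440\u04cf\u0435.com")) // prints [Cyrillic]
}
```

Keying the warning on "new script for this profile" rather than on a list of protected brands means a user who regularly reads Cyrillic sites would not be nagged, while a Latin-only profile would see a prompt the first time a Cyrillic look-alike appears.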
I wonder how the domain displays in email clients like Gmail and Outlook. This is the scariest part: most people will just look at the domain, assume the email is legitimate, and follow its instructions. It could be catastrophic for companies; the Ubiquiti $40 million fiasco comes to mind.
What an odd coincidence: I just published a Go package yesterday to detect such attacks in source code. Is there a homograph bug going around?<p><a href="https://github.com/NebulousLabs/glyphcheck" rel="nofollow">https://github.com/NebulousLabs/glyphcheck</a><p>(btw, Wikipedia notes that "The term homograph is sometimes used synonymously with homoglyph, but in the usual linguistic sense, homographs are words that are spelled the same but have different meanings, a property of words, not characters.")
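This is not glyphcheck's implementation; it is only a minimal sketch of the general idea using the standard library's `go/scanner`: tokenize Go source and flag identifiers containing characters from a look-alike table. The `confusables` map is a tiny hand-picked sample (a real tool would draw on the full Unicode confusables data), and `checkSource`, `skeleton` and `Finding` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// confusables is a tiny, hand-picked (hypothetical) sample of non-Latin
// letters that render like Latin ones, mapped to the letter they imitate.
var confusables = map[rune]rune{
	'\u0430': 'a', // CYRILLIC SMALL LETTER A
	'\u0435': 'e', // CYRILLIC SMALL LETTER IE
	'\u043e': 'o', // CYRILLIC SMALL LETTER O
	'\u0440': 'p', // CYRILLIC SMALL LETTER ER
	'\u0441': 'c', // CYRILLIC SMALL LETTER ES
	'\u03bf': 'o', // GREEK SMALL LETTER OMICRON
}

// skeleton replaces each confusable rune with the Latin letter it imitates.
func skeleton(s string) string {
	return strings.Map(func(r rune) rune {
		if latin, ok := confusables[r]; ok {
			return latin
		}
		return r
	}, s)
}

// Finding is one suspicious identifier, where it is, and what it imitates.
type Finding struct {
	Pos       token.Position
	Ident     string
	LooksLike string
}

// checkSource tokenizes Go source and reports every identifier whose
// skeleton differs from the identifier itself.
func checkSource(src []byte) []Finding {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // nil handler: scan errors are only counted
	var findings []Finding
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok != token.IDENT {
			continue
		}
		if sk := skeleton(lit); sk != lit {
			findings = append(findings, Finding{Pos: fset.Position(pos), Ident: lit, LooksLike: sk})
		}
	}
	return findings
}

func main() {
	// The second "password" uses a Cyrillic а (U+0430).
	src := []byte("package main\n\nvar password = \"x\"\nvar p\u0430ssword = \"y\"\n")
	for _, f := range checkSource(src) {
		fmt.Printf("%s: %+q looks like %q\n", f.Pos, f.Ident, f.LooksLike)
	}
}
```

Printing the identifier with `%+q` escapes the non-ASCII rune (`p\u0430ssword`), which makes the look-alike obvious in the report even when the terminal font renders both versions identically.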
This is the scariest one: <a href="http://www.арр.com/" rel="nofollow">http://www.xn--80a6aa.com/</a> & <a href="http://www.app.com/" rel="nofollow">http://www.app.com/</a>
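For anyone curious how `xn--80a6aa` turns into the Cyrillic "арр": the `xn--` prefix marks a Punycode label (RFC 3492). Below is a minimal decoder sketch in Go written from the RFC's pseudocode, with its overflow checks left out; `decodeLabel` and `decodeHost` are made-up names, and real code should use a maintained IDNA library such as `golang.org/x/net/idna` (`idna.ToUnicode`) instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Bootstring parameters for Punycode (RFC 3492, section 5).
const (
	base, tMin, tMax      = 36, 1, 26
	skew, damp            = 38, 700
	initialBias, initialN = 72, 128
	alphabet              = "abcdefghijklmnopqrstuvwxyz0123456789" // digit values 0..35
)

// adapt is the bias adaptation function (RFC 3492, section 6.1).
func adapt(delta, numPoints int, firstTime bool) int {
	if firstTime {
		delta /= damp
	} else {
		delta /= 2
	}
	delta += delta / numPoints
	k := 0
	for delta > ((base-tMin)*tMax)/2 {
		delta /= base - tMin
		k += base
	}
	return k + (base-tMin+1)*delta/(delta+skew)
}

// decodeLabel decodes one lowercase Punycode label without its "xn--"
// prefix. The RFC's overflow checks are omitted to keep the sketch short.
func decodeLabel(s string) (string, error) {
	var out []rune
	pos := 0
	if d := strings.LastIndexByte(s, '-'); d >= 0 {
		out = []rune(s[:d]) // basic (ASCII) code points are copied as-is
		pos = d + 1
	}
	n, i, bias := initialN, 0, initialBias
	for pos < len(s) {
		oldI, w := i, 1
		// Read one generalized variable-length integer and add it to i.
		for k := base; ; k += base {
			if pos >= len(s) {
				return "", errors.New("punycode: truncated input")
			}
			digit := strings.IndexByte(alphabet, s[pos])
			if digit < 0 {
				return "", errors.New("punycode: invalid digit")
			}
			pos++
			i += digit * w
			t := k - bias // threshold, clamped to [tMin, tMax]
			if t < tMin {
				t = tMin
			} else if t > tMax {
				t = tMax
			}
			if digit < t {
				break
			}
			w *= base - t
		}
		bias = adapt(i-oldI, len(out)+1, oldI == 0)
		n += i / (len(out) + 1) // i encodes both the code point and its position
		i %= len(out) + 1
		out = append(out, 0) // insert rune n at index i
		copy(out[i+1:], out[i:])
		out[i] = rune(n)
		i++
	}
	return string(out), nil
}

// decodeHost lowercases host and converts every "xn--" label to Unicode.
func decodeHost(host string) (string, error) {
	labels := strings.Split(strings.ToLower(host), ".")
	for idx, l := range labels {
		if strings.HasPrefix(l, "xn--") {
			u, err := decodeLabel(l[len("xn--"):])
			if err != nil {
				return "", err
			}
			labels[idx] = u
		}
	}
	return strings.Join(labels, "."), nil
}

func main() {
	for _, h := range []string{"www.xn--80a6aa.com", "www.xn--80ak6aa92e.com"} {
		if u, err := decodeHost(h); err == nil {
			fmt.Printf("%s -> %s %+q\n", h, u, u)
		}
	}
}
```

The `%+q` column prints the decoded name with escapes (for the first host, `"www.\u0430\u0440\u0440.com"`), showing that every letter of the "app" look-alike is a Cyrillic code point.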
Interesting. The apple.com one (<a href="https://www.xn--80ak6aa92e.com/" rel="nofollow">https://www.xn--80ak6aa92e.com/</a>) shows that Punycode text literally in Pale Moon (27.2), but shows "аррӏе.com" (Cyrillic text) in Chrome 57 and Firefox 51.<p>Someone else's example that looks like "app.com" (<a href="http://www.xn--80a6aa.com/" rel="nofollow">http://www.xn--80a6aa.com/</a>) is displayed as the Cyrillic text, even in Pale Moon. I wonder whether Apple's domain is on a hard-coded blacklist in the browser, or whether every update ships with a top-1000 list, or something else.<p>I remember reading about issues with Unicode domains <i>years</i> ago, though. It surprises me that this still hasn't been figured out. One mitigation I remember being discussed was coloring characters from different scripts in different colors, to make look-alike characters more obvious.
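A sketch of the coloring idea: split a hostname into runs of consecutive characters from the same Unicode script, which a URL bar could then draw in per-script colors. `scriptRuns` and `mixesScripts` are hypothetical helpers built on Go's `unicode.Scripts` tables, not any browser's actual logic. The second helper shows why coloring is attractive: a naive "one label, one script" rule flags a mixed Latin/Cyrillic label such as `p\u0430ypal`, but an all-Cyrillic label like `аррӏе` passes it.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the Unicode script name of r, or "Unknown".
func scriptOf(r rune) string {
	for name, table := range unicode.Scripts {
		if unicode.Is(table, r) {
			return name
		}
	}
	return "Unknown"
}

// Run is a maximal stretch of consecutive characters from one script;
// a URL bar could draw each run in a color assigned to its script.
type Run struct {
	Script string
	Text   string
}

// scriptRuns splits s into runs of consecutive same-script characters.
func scriptRuns(s string) []Run {
	var runs []Run
	for _, r := range s {
		sc := scriptOf(r)
		if n := len(runs); n > 0 && runs[n-1].Script == sc {
			runs[n-1].Text += string(r)
		} else {
			runs = append(runs, Run{Script: sc, Text: string(r)})
		}
	}
	return runs
}

// mixesScripts is a naive "one label, one script" test: it reports whether
// label has letters from more than one script, ignoring Common/Inherited.
func mixesScripts(label string) bool {
	first := ""
	for _, run := range scriptRuns(label) {
		if run.Script == "Common" || run.Script == "Inherited" {
			continue
		}
		if first == "" {
			first = run.Script
		} else if run.Script != first {
			return true
		}
	}
	return false
}

func main() {
	for _, run := range scriptRuns("www.\u0430\u0440\u0440\u04cf\u0435.com") {
		fmt.Printf("%-8s %+q\n", run.Script, run.Text)
	}
	fmt.Println(mixesScripts("p\u0430ypal"))                    // prints true (Latin + Cyrillic)
	fmt.Println(mixesScripts("\u0430\u0440\u0440\u04cf\u0435")) // prints false (all Cyrillic)
}
```

For `www.аррӏе.com` the runs come out as Latin `www`, Common `.`, Cyrillic `аррӏе`, Common `.`, Latin `com`, so even a whole-script look-alike would stand out once the Cyrillic run gets its own color.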