Can we believe our eyes? Misleading people with Unicode.

487 points by bensummers almost 14 years ago

19 comments

This was how some folks pretended that Google had erased all mentions of "Oracle": http://giorgiosironi.blogspot.com/2010/08/google-never-removed-oracle-from-its.html

I used this to prank some people on the in-house SEO team at my last job. I'd ask one of them whether he had done anything that might be considered black-hat, then send him a link to a "site:" query on Google indicating that our site had been removed from the index, e.g. http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=site%3An%D0%B5ws.ycombinator.com

Rygu almost 14 years ago

I'm pretty shocked that I have never heard of the RLO unicode character before this article. Let's see if it works: ppa.emorhCelgooG => ‮ppa.emorhCelgooG
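
For anyone who wants to poke at this outside a browser, here is a minimal Python sketch. RLO is U+202E (RIGHT-TO-LEFT OVERRIDE). The helper name `has_bidi_control` and the set of characters it checks are my own choices for illustration, and whether the spoofed string actually displays reversed depends on the terminal or editor rendering it.

```python
import unicodedata

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

# Explicit bidirectional formatting characters: embeddings, overrides,
# pop-directional-format, and the isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def has_bidi_control(text: str) -> bool:
    """Hypothetical helper: True if text contains an explicit bidi control."""
    return any(ch in BIDI_CONTROLS for ch in text)


plain = "ppa.emorhCelgooG"
spoofed = RLO + plain  # one extra, invisible code point at the front

print(unicodedata.name(RLO))
print(repr(spoofed))  # repr() shows the escape instead of rendering it
print(has_bidi_control(plain), has_bidi_control(spoofed))
```

Printing with `repr()` is the useful part: it exposes the invisible character as `\u202e` no matter how the string would be drawn on screen.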

driverdan almost 14 years ago

I personally find the RLO / LRO issue much more concerning. I just tested Chrome and Firefox and found it works in URLs. You could rewrite pyapla.com to paypal.com and phish people easily.

mcantor almost 14 years ago

It's funny to think that a hapless vimmer who happens to be running Windows would have never noticed this, because they would simply have typed ":edit $SYSTEMROOT\system32\drivers\etc\hosts" and gotten the real file.

(This isn't a "look how cool command line junkies are" comment; I was just musing.)

gwern almost 14 years ago

I remember years back on Wikipedia, clever vandals would play Unicode tricks. It was interesting, to say the least: you'd register a name that looks identical to a real user's, vandalize, and hope the administrator would type the name in...

This was ultimately stopped by AntiSpoof (https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:AntiSpoof), but the bug reports are still interesting:

- https://bugzilla.wikimedia.org/show_bug.cgi?id=2593
- https://bugzilla.wikimedia.org/show_bug.cgi?id=2290

noahc almost 14 years ago

Somewhat related to this is the ability to swap "l" and "I" when they both look the same, basically a straight line.

This was very common in Yahoo Chat Rooms, where folks would pretend to be someone else by registering that person's name with the opposite letter (assuming it had an "i" or "l" in it).

They would then take a screenshot of the other person's font and copy it exactly so they could appear to be that person. I'll let you imagine the chaos that could occur because of this!

andresmh almost 14 years ago

This reminds me of IDN domains. A few weeks ago I purchased fácebook.com and góogle.com, and I get a few hundred visits every day. I posted about it here: https://plus.google.com/110362380602139255131/posts/NPS3VNyDxuJ

matthavener almost 14 years ago

This is why "filters" that prevent XSS, etc. by removing malicious characters are so easily breakable. This type of attack is called a canonicalization attack (more here: https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode).
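
To make that concrete, here is a minimal Python sketch of one way such a filter breaks. It assumes a hypothetical blacklist filter (`naive_filter`) that strips ASCII angle brackets, and a later stage that applies NFKC normalization; both functions are invented for illustration and are not taken from any real library or from the OWASP page.

```python
import unicodedata


def naive_filter(text: str) -> str:
    """Hypothetical blacklist filter: removes ASCII '<' and '>' only."""
    return text.replace("<", "").replace(">", "")


def canonicalize(text: str) -> str:
    """Fold compatibility characters (e.g. fullwidth forms) to their base form."""
    return unicodedata.normalize("NFKC", text)


# U+FF1C and U+FF1E are the fullwidth less-than and greater-than signs.
payload = "\uff1cscript\uff1e"

# Filtering first misses the fullwidth brackets; normalizing afterwards
# turns them into the ASCII brackets the filter was meant to remove.
filtered_then_normalized = canonicalize(naive_filter(payload))

# Canonicalizing first gives the filter the form it actually knows about.
normalized_then_filtered = naive_filter(canonicalize(payload))

print(repr(filtered_then_normalized))
print(repr(normalized_then_filtered))
```

The order of operations is the whole point: any filter that runs before the data reaches its canonical form is checking something other than what ends up being used.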

fferen almost 14 years ago

A while ago I compiled a list of Unicode characters that looked like letters, to get past curse filters. Not comprehensive, because I just manually skimmed through a Unicode table, but here it is if anyone cares:

Wide letters: ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ

Better looking letters: ϲ р с Ѕ І А В Е М Н О Р С а е о ѕ і ԛ

Anyone know of a better resource?
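
Going the other way (spotting these rather than producing them), a rough Python sketch is below. It guesses a character's script from the first word of its Unicode name, which is a crude heuristic I am assuming for illustration and not a real script lookup; the function names are made up as well.

```python
import unicodedata


def script_guess(ch: str) -> str:
    """Crude guess: the first word of the character's Unicode name."""
    # e.g. "LATIN SMALL LETTER A", "CYRILLIC SMALL LETTER O",
    # "FULLWIDTH LATIN CAPITAL LETTER A"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_in(word: str) -> set:
    """Set of guessed scripts for the alphabetic characters in word."""
    return {script_guess(ch) for ch in word if ch.isalpha()}


def looks_mixed(word: str) -> bool:
    """True if the word mixes letters from more than one guessed script."""
    return len(scripts_in(word)) > 1


honest = "facebook"
spoofed = "faceb\u043e\u043ek"  # both o's are CYRILLIC SMALL LETTER O

print(scripts_in(honest), looks_mixed(honest))
print(scripts_in(spoofed), looks_mixed(spoofed))
```

This only catches mixed-script words. A name written entirely in lookalike Cyrillic letters would pass, so it is a starting point rather than a defence.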

nodata almost 14 years ago

Seems easy enough to guard against. Highlight the characters which are unexpected for my locale.

sarenji almost 14 years ago

This is an important issue in some chat programs. I had to deal with this all the time: malicious users using i vs. l to pose as others, and using Unicode to mess up or reverse the entire chat. One of the more interesting Unicode tricks had characters going left, right, up, and down. This confused moderators about whom to kick/ban and obscured other users' text.

The solution was to implement a regex of whitelisted characters; since it's an English-only program, this works well and is future-safe. For multiple languages, a blacklist is probably okay, but the difficulty lies in keeping the blacklist both complete and up to date.
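
A minimal sketch of the whitelist idea in Python, assuming the allowed set is simply printable ASCII (space through tilde). The exact character set and both function names are illustrative assumptions, not the regex the chat program actually used.

```python
import re

# Assumed whitelist: printable ASCII, from space (0x20) to tilde (0x7E).
ALLOWED = re.compile(r"[ -~]+")
DISALLOWED = re.compile(r"[^ -~]")


def is_allowed(message: str) -> bool:
    """True only if every character of a non-empty message is whitelisted."""
    return ALLOWED.fullmatch(message) is not None


def strip_disallowed(message: str) -> str:
    """Alternative policy: silently drop anything outside the whitelist."""
    return DISALLOWED.sub("", message)


print(is_allowed("hello world"))
print(is_allowed("\u202ehello"))           # starts with an RLO character
print(repr(strip_disallowed("he\u202ello")))
```

Rejecting a message and stripping characters from it are different policies, and a whitelist like this does nothing about ASCII lookalikes such as "I" and "l".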

mikelward almost 14 years ago

On Linux, how could you make standard tools highlight or differentiate potentially misleading characters?

I guess the solution would have to be in the terminal emulator? Would a blacklist of Unicode ranges be sufficient?
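
One low-tech possibility, sketched in Python and sitting outside the terminal emulator: replace every non-ASCII character with a visible `<U+XXXX>` marker before the text is shown. The function name `reveal` and the marker format are assumptions for illustration, and treating all non-ASCII as suspicious is only reasonable if you expect ASCII-only output.

```python
def reveal(text: str) -> str:
    """Replace each non-ASCII character with a visible <U+XXXX> marker."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            out.append(f"<U+{ord(ch):04X}>")
    return "".join(out)


# A file name where the "o" is actually CYRILLIC SMALL LETTER O (U+043E).
suspicious = "h\u043ests"

print(reveal("hosts"))
print(reveal(suspicious))
```

Applying `reveal` to each line read from standard input would turn this into a filter that the output of other tools can be piped through.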

evilswan almost 14 years ago

Yes - reminds me of how several users would exploit the Bolt.com chat system (back in the day) using upper-case 'I's as lower case 'L's to pose as different users and cause mayhem.

ams6110 almost 14 years ago

Isn't the real issue that in order to be vulnerable to this, you have to be running as a user who has permission to diddle with the hosts file? Or that your hosts file has too-liberal write permissions?

Hosts file attacks are well-known enough that on Windows I always set them to read-only, so that even administrators can't change them without first clearing the read-only flag.

delinka almost 14 years ago

With the exe as jpg example, it'd be even more misleading if the exe used a photo for an icon, launched the photo viewer app for a matching jpg photo, and launched an insidious process in the background. Even harder to detect from whence the malware came.

HankMcCoy almost 14 years ago

0x43E (о) is the stuff of nightmares.

facebооk.com is still available ;)

tantalor almost 14 years ago

But, how does this work? Does Windows source all of the files in your %SystemRoot%\system32\drivers\etc? Why does it matter what the file is named? To hide from idiots?

adr_ almost 14 years ago

I've had a Fасевоок Offіcіal account lying around for a year or so.

arvinjoar almost 14 years ago

Seems like there's a hidden benefit to always keeping a tab with the hosts file open in notepad++. I use the hosts file from time to time, and I just leave the tab open, never thought it could help me out security-wise though.