This was how some folks pretended that Google had erased all mentions of "Oracle":<p><a href="http://giorgiosironi.blogspot.com/2010/08/google-never-removed-oracle-from-its.html" rel="nofollow">http://giorgiosironi.blogspot.com/2010/08/google-never-remov...</a><p>I used this to prank some people on the in-house SEO team at my last job. I'd ask them if they had done anything that might be considered black-hat. Then I'd send them a link to a "site:" query on Google that appeared to show our site had been removed from the index.<p>e.g. <a href="http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=site%3An%D0%B5ws.ycombinator.com" rel="nofollow">http://www.google.com/search?sourceid=chrome&ie=UTF-8...</a>
I'm pretty shocked that I had never heard of the RLO Unicode character before this article. Let's see if it works: ppa.emorhCelgooG => ppa.emorhCelgooG
I personally find the RLO / LRO issue much more concerning. I just tested Chrome and Firefox and found it works in URLs. You could rewrite pyapla.com to paypal.com and phish people easily.
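One way to catch an override like this before a string is shown to anyone is to scan it for bidirectional control characters. Here is a minimal Python sketch, assuming an explicit set of code points; the helper name `find_bidi_controls` is made up for illustration, and this is not a complete defence against spoofed URLs.

```python
import unicodedata

# Explicit bidi formatting characters: embeddings, overrides, isolates, marks.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
    "\u200e", "\u200f",
}

def find_bidi_controls(text):
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]
```

A caller could refuse or visibly flag any hostname or filename for which this returns a non-empty list.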
It's funny to think that a hapless vimmer who happens to be running Windows would have never noticed this, because they would simply have typed ":edit $SYSTEMROOT\system32\drivers\etc\hosts" and gotten the real file.<p>(This isn't a "look how cool command line junkies are" comment; I was just musing.)
I remember years back on Wikipedia, clever vandals would play Unicode tricks. It was interesting, to say the least - you'd register a name that looks identical to a real user, vandalize, and hope the administrator would type the name in...<p>This was ultimately stopped by Antispoof (<a href="https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:AntiSpoof" rel="nofollow">https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extens...</a>) but the bug reports are still interesting:<p>- <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=2593" rel="nofollow">https://bugzilla.wikimedia.org/show_bug.cgi?id=2593</a><p>- <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=2290" rel="nofollow">https://bugzilla.wikimedia.org/show_bug.cgi?id=2290</a>
Somewhat related to this is swapping "l" and "I", which look the same in many fonts: basically a straight line.<p>This was very common in Yahoo Chat Rooms, where folks would pretend to be someone else by registering the other person's name with the opposite letter (assuming it had an "i" or "l" in it).<p>They would then take a screenshot of the other person's font and copy it exactly so they could appear to be that person. I'll let you imagine the chaos that could occur because of this!
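A registration check for this kind of lookalike name could fold the confusable characters together before comparing. A rough Python sketch follows; the fold table and the names `fold_lookalikes` and `is_spoof` are assumptions for illustration, not how Yahoo or anyone else actually handled it.

```python
# Hypothetical fold table: characters that render alike in many sans-serif fonts.
LOOKALIKE_FOLD = str.maketrans({"I": "l", "1": "l", "|": "l", "O": "0"})

def fold_lookalikes(name):
    """Map each confusable character to one representative character."""
    return name.translate(LOOKALIKE_FOLD)

def is_spoof(new_name, existing_names):
    """True if new_name differs from an existing name but folds to the same string."""
    target = fold_lookalikes(new_name)
    return any(
        new_name != old and fold_lookalikes(old) == target
        for old in existing_names
    )
```

With an existing user "Bill", a new registration "BiIl" (capital I in place of the first l) would be flagged, while an unrelated name would not.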
This reminds me of IDN domains. A few weeks ago I purchased fácebook.com and góogle.com, and I get a few hundred visits every day. I posted about it here: <a href="https://plus.google.com/110362380602139255131/posts/NPS3VNyDxuJ" rel="nofollow">https://plus.google.com/110362380602139255131/posts/NPS3VNyD...</a>
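Domains like these are easier to spot in their ASCII (Punycode) form, where every internationalized label starts with "xn--". A small Python sketch using the built-in `idna` codec (which implements the older IDNA 2003 rules); the helper names are hypothetical.

```python
def to_ascii_hostname(hostname):
    """Return the ASCII (Punycode) form of a hostname via Python's idna codec."""
    return hostname.encode("idna").decode("ascii")

def has_idn_label(hostname):
    """True if any dot-separated label needed Punycode encoding."""
    return any(
        label.startswith("xn--")
        for label in to_ascii_hostname(hostname).split(".")
    )
```

Showing the `to_ascii_hostname` result next to the pretty form is one way to make an accented lookalike stand out.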
This is why "filters" that prevent XSS, etc. by removing malicious characters are so easily breakable. This type of attack is called a canonicalization attack (more here <a href="https://www.owasp.org/index.php/Canonicalization,_locale_and_Unicode" rel="nofollow">https://www.owasp.org/index.php/Canonicalization,_locale_and...</a>)
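To illustrate the ordering problem: a filter that checks the raw input can miss a character that only becomes dangerous after something downstream normalizes it. A minimal Python sketch, assuming a deliberately naive check for "<" and a fullwidth payload chosen for the example; it is not a real XSS defence.

```python
import unicodedata

def naive_filter_ok(text):
    """A naive blacklist check that looks only for the ASCII '<' character."""
    return "<" not in text

def canonical_filter_ok(text):
    """Normalize to NFKC first, so compatibility forms fold to ASCII before checking."""
    return "<" not in unicodedata.normalize("NFKC", text)

# Fullwidth less-than and greater-than signs around the word "script".
payload = "\uff1cscript\uff1e"
```

Here `naive_filter_ok(payload)` accepts the input, while `canonical_filter_ok(payload)` rejects it because NFKC turns the fullwidth bracket into a plain "<". The point is only that canonicalization has to happen before validation, not after.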
A while ago I compiled a list of Unicode characters that look like letters, to get past curse filters. It's not comprehensive, because I just manually skimmed through a Unicode table, but here it is if anyone cares:<p>Wide letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z<p>Better looking letters: ϲ р с Ѕ І А В Е М Н О Р С а е о ѕ і ԛ<p>Anyone know of a better resource?
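On the resource question, the Unicode Consortium publishes a confusables data file as part of UTS #39, which is far more complete than a hand-made list. For the characters listed above, a filter could fold them back to Latin before matching. A Python sketch, where the mapping covers only this list and the name `latin_skeleton` is hypothetical:

```python
import unicodedata

# Lookalikes from the list above, mapped to the Latin letters they resemble.
LOOKALIKES = {
    "\u03f2": "c",  # Greek lunate sigma symbol
    "\u0440": "p", "\u0441": "c", "\u0405": "S", "\u0406": "I",
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041c": "M",
    "\u041d": "H", "\u041e": "O", "\u0420": "P", "\u0421": "C",
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0455": "s",
    "\u0456": "i", "\u051b": "q",
}
TABLE = str.maketrans(LOOKALIKES)

def latin_skeleton(text):
    """Fold fullwidth forms via NFKC, then replace the known lookalikes."""
    return unicodedata.normalize("NFKC", text).translate(TABLE)
```

Running the word filter on `latin_skeleton(message)` instead of the raw message would catch both the wide letters and the Cyrillic and Greek substitutes in this list, though nothing beyond it.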
This is an important issue in some chat programs. I had to deal with this all the time: malicious users using i vs. l to pose as others, and using Unicode to mess up or reverse the entire chat. One of the more interesting Unicode tricks had characters going left, right, <i>and up and down</i>. This confused moderators about who to kick/ban and obscured other users' text.<p>The solution was to implement a regex of whitelisted characters; since it's an English-only program, this works well and is future-safe. For multiple languages, a blacklist is probably okay, but the difficulty lies in keeping the blacklist both complete and up to date.
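The whitelist approach described above can be sketched in a few lines of Python. The exact character set is an assumption (the original program's regex isn't given), and `is_allowed` is a made-up name.

```python
import re

# Assumed whitelist: ASCII letters, digits, space, and basic punctuation.
ALLOWED = re.compile(r"[A-Za-z0-9 .,!?'()-]+")

def is_allowed(message):
    """True only if every character of the message is in the whitelist."""
    return ALLOWED.fullmatch(message) is not None
```

Because anything outside the set is rejected, new Unicode tricks fail by default, which is what makes the whitelist future-safe for an English-only program. It does nothing about I vs. l, since both are plain ASCII.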
On Linux, how could you make standard tools highlight or differentiate potentially misleading characters?<p>I guess the solution would have to be in the terminal emulator? Would a blacklist of Unicode ranges be sufficient?
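Short of patching the terminal emulator, one low-tech option is to pipe suspicious text through a filter that makes every non-ASCII character visible. A minimal Python sketch (the function name is hypothetical, and this flags all non-ASCII text, not just misleading characters):

```python
def reveal_non_ascii(text):
    """Replace every non-ASCII character with a visible backslash escape."""
    return text.encode("ascii", "backslashreplace").decode("ascii")
```

A hostname with a Cyrillic letter swapped in then shows up with an escape sequence where the lookalike was, instead of rendering identically to the real name.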
Yes - reminds me of how several users would exploit the Bolt.com chat system (back in the day) using upper-case 'I's as lower case 'L's to pose as different users and cause mayhem.
Isn't the real issue that in order to be vulnerable to this, you have to be running as a user who has permission to diddle with the hosts file? Or that your hosts file has too-liberal write permissions?<p>Hosts file attacks are well-known enough that on Windows I always set the file to read-only, so that even administrators can't change it without first clearing the read-only flag.
With the exe-as-jpg example, it'd be even more misleading if the exe used a photo for an icon, launched the photo viewer app for a matching jpg photo, and launched an insidious process in the background. It would be even harder to detect where the malware came from.
But, how does this work? Does Windows source all of the files in your %SystemRoot%\system32\drivers\etc? Why does it matter what the file is named? To hide from idiots?
Seems like there's a hidden benefit to always keeping a tab with the hosts file open in Notepad++. I use the hosts file from time to time and just leave the tab open; I never thought it could help me out security-wise, though.