Well that definitely takes the 𝕡𝕣𝕚𝕫𝕖 for most noticeable Hacker News submission.<p>Suggestion (if you are the author): There are a lot of chars that look like another char and are often used on the web, so I think that there are more advanced versions to be made. I think I read that a lot of Thai and Cyrillic characters look like Latin chars.
Funny how it triggered a bug in Firefox. When the tab is unfocused, its title in the handle is "𝑼𝒏…", but when it gets the focus it becomes "𝑼<D835>…" (in a square box). The next codepoint is U+1D48F whose UTF-16 BE encoding is d8 35 dc 8f.<p>I'd say that the truncation algorithm operates on bytes and that it can't make sense of d8 35, but I'm not too sure how to fix that since graphemes can have arbitrary length (right?). Do you have to compute the width in advance?
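A minimal Python sketch of the failure described above, assuming the truncation counts UTF-16 code units rather than raw bytes (a guess; nothing in the thread confirms how Firefox does it). The helper names are made up for illustration. It only guards against splitting a surrogate pair, not against splitting longer grapheme clusters.

```python
def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def truncate_units(units: list, limit: int) -> list:
    """Keep at most `limit` code units, never ending on a lone high surrogate."""
    cut = units[:limit]
    if cut and 0xD800 <= cut[-1] <= 0xDBFF:
        cut.pop()  # drop the orphaned first half of a surrogate pair
    return cut


def decode_units(units: list) -> str:
    """Turn a list of UTF-16 code units back into a str."""
    data = b"".join(unit.to_bytes(2, "big") for unit in units)
    return data.decode("utf-16-be")


# U+1D47C and U+1D48F, the two styled letters visible in the tab title
title = "\U0001D47C\U0001D48F"
units = utf16_units(title)

# A naive cut after three units would end on 0xD835, the lone surrogate
# shown in the box; the guarded cut backs off to the previous character.
safe = decode_units(truncate_units(units, 3))
```

Cutting on code points (or on grapheme clusters, with a segmentation library) instead of code units would avoid the problem at the source.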
This is similar to pseudolocalization (þšéûðöļöçåļîžåţîöñ), which adds random accents to English words to test the localization capabilities of a program without requiring knowledge of another language.<p>An online version: <a href="http://www.pseudolocalize.com/" rel="nofollow">http://www.pseudolocalize.com/</a><p>A library: <a href="http://code.google.com/p/pseudolocalization-tool/" rel="nofollow">http://code.google.com/p/pseudolocalization-tool/</a>
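For anyone who wants to try the idea without a tool, here is a rough Python sketch. The mapping table is hypothetical and deliberately tiny, and it is deterministic rather than random; the linked tools are not implemented this way as far as I know.

```python
# Hypothetical lookalike table; real tools ship much larger ones.
ACCENTED = str.maketrans({
    "a": "å", "c": "ç", "d": "ð", "e": "é", "i": "î", "l": "ļ", "n": "ñ",
    "o": "ö", "p": "þ", "s": "š", "t": "ţ", "u": "û", "z": "ž",
})


def pseudolocalize(text: str) -> str:
    """Swap plain lowercase letters for accented lookalikes; leave the rest alone."""
    return text.translate(ACCENTED)


sample = pseudolocalize("pseudolocalization")
```

Any string that still shows up unaccented in the UI after this pass was probably hard-coded instead of going through the translation layer.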
Hey! I was just thinking about this site, and visited it for the first time in years, after mentioning the old <i>San Francisco</i> ransom-font in another thread.<p>By randomly mixing these Unicode letter and letterlike characters, you can simulate a cut-and-paste ransom-note. For example, an acquired company could announce changes to its privacy policy:<p><pre><code> wE ℎåve yøuR ρrIvᴀçy ⅈn a ᴡiNdøwleSs ℞oøm,
& ℙℓaℕ τø ⅆo µnSρεaKᴀble †hiℕℊs t○ ⅈt</code></pre>
Oh, no!<p>The cat should have stayed in a box. If this gains too much popularity, HN will read like MySpace back in the day.<p>And top HN news will be: "A browser plugin that translates Unicode back to ASCII".
This surprises me, what exactly is the point of encoding what are essentially different fonts in unicode? Isn't that the job of the presentation layer?<p>(the Fraktur variant is awesome btw, and is apparently in the valid unicode range for Java...)
Since it wasn't mentioned here earlier, it's worth taking a look at shapecatcher to see what glyphs might resemble Latin letters.<p>Scribbling something resembling the Latin capital letter A returns, for example, any of these codepoints: A𝘈ΑАÅ𝖠∆ДΔ𝐴𝟺дᎪߡ𝛢Å4𝛥ᴬᐃⵠ𐌀𝘼𝛬Λ△𝟦Ą𝜟𝓐⌓⧍ᗋ🜂Ⲇ🗻🍙ⲇѦᗩᗅ<p><a href="http://shapecatcher.com/" rel="nofollow">http://shapecatcher.com/</a> (<a href="https://news.ycombinator.com/item?id=5150107" rel="nofollow">https://news.ycombinator.com/item?id=5150107</a>)<p>Also the Unicode Consortium has some reports on security:<p><a href="http://www.unicode.org/reports/tr36/" rel="nofollow">http://www.unicode.org/reports/tr36/</a><p><a href="http://www.unicode.org/reports/tr39/" rel="nofollow">http://www.unicode.org/reports/tr39/</a><p>listing all kinds of spoofing methods you haven't even thought of.
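As a very rough illustration of the mixed-script check those reports discuss, the Python sketch below labels each letter by the first word of its Unicode character name. That is a crude stand-in of my own, not the algorithm from TR39, and the function names are made up.

```python
import unicodedata


def letter_scripts(text: str) -> set:
    """Collect a rough script label per letter: the first word of its Unicode name."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_mixed(text: str) -> bool:
    """True when the letters in text carry more than one script label."""
    return len(letter_scripts(text)) > 1


plain = letter_scripts("paypal")        # all Latin letters
spoofed = letter_scripts("p\u0430ypal")  # second letter is CYRILLIC SMALL LETTER A
```

It will flag legitimate mixed-script text too, so treat a hit as a reason to look closer rather than as proof of spoofing.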
One of my friends, moving to China for a semester to teach, was thinking of using a proper Chinese name to make it easier for students to address him. He had a good idea, even, which he shared on Facebook.<p>I proposed that we should name him after the lack of unicode support in our browsers, and we ended up calling him "Box Boxbox" for a couple of months.
Does anyone know why there are separate Unicode code points for letters in bold, bold italic and Fraktur? Normally this sort of thing should be handled by different fonts / font variants. Is it for compatibility with some legacy encoding?
I couldn't help but notice that this converter was copyrighted by Eli the Bearded. Google "Eli the Bearded", but not from work. You'll get some very interesting results.<p><a href="https://encrypted.google.com/#q=Eli%20the%20Bearded" rel="nofollow">https://encrypted.google.com/#q=Eli%20the%20Bearded</a>
I was once bilked into buying some scraped content as original work by this method. It passed Copyscape, and my test of Googling a random sentence in quotes didn't bring anything up. I let it go because I had already accepted the work, and the lesson was worth more than the article anyway.<p>Don't be fooled as I was! Had I manually transcribed a sentence into Google instead of copying + pasting the Unicode chars, I would have found hundreds of copies of the same article.
In Javascript, many unicode characters are allowed [0], so háćḱéŕŃéẃś is a valid variable name [1].<p>Note: The number of іllэБіъlэVаѓіаъlэИамэѕ [2] used in your production code is inversely proportional to the number of friends you'll make in the maintenance team.<p>[0] <a href="https://mathiasbynens.be/notes/javascript-identifiers" rel="nofollow">https://mathiasbynens.be/notes/javascript-identifiers</a><p>[1] <a href="https://mothereff.in/js-variables#h%C3%A1%C4%87%E1%B8%B1%C3%A9%C5%95%C5%83%C3%A9%E1%BA%83%C5%9B" rel="nofollow">https://mothereff.in/js-variables#h%C3%A1%C4%87%E1%B8%B1%C3%...</a><p>[2] <a href="http://www.panix.com/~eli/unicode/convert.cgi?text=illegibleVariableNames" rel="nofollow">http://www.panix.com/~eli/unicode/convert.cgi?text=illegible...</a>
What I need is something that takes all the extended characters (think Spanish or Swedish) and turns them into alternative safe versions.<p>For instance, á into a, ñ into n, å into a, etc.<p>Had my hopes up when I saw the title.<p>Does anyone have any ideas or links to working scripts that I can turn into something useful? I need to "sanitize" a database of foreign documentaries before uploading to YouTube (their metadata input system chokes on extended chars). Thanks!
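One common approach, sketched in Python with only the standard library: decompose each character (NFD) and drop the combining marks. The function name is made up for illustration. Letters that have no decomposition, such as ø or ß, pass through unchanged, so a real cleanup would still need a small manual table for those.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining marks that carry the accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


cleaned = strip_accents("áñå")
```

Run over each metadata field before upload, this covers the Spanish and Swedish cases mentioned above.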
I made an iPhone app that does kind of the same thing, but converts letters to their upside-down unicode equivalent. It's fun for sending upside-down texts.<p>Free and ad-free, just a fun project:<p><a href="https://itunes.apple.com/us/app/texting-upside-down-free/id435354073?mt=8" rel="nofollow">https://itunes.apple.com/us/app/texting-upside-down-free/id4...</a>
Just a PSA for discoverability: since the replacement characters use different code points than their more standard equivalents, the default HN search (<a href="https://hn.algolia.com" rel="nofollow">https://hn.algolia.com</a>) at least doesn't find this submission when searching for "unicode."
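For illustration, a search index could fold these styled letters back to plain ones before matching. The Python sketch below uses NFKC normalization plus case folding; `search_key` is a hypothetical helper, and I have no idea what the HN search does internally.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold compatibility variants (such as mathematical letters) and case for matching."""
    return unicodedata.normalize("NFKC", text).casefold()


# "Unicode" written in mathematical bold italic letters
styled = "\U0001D47C\U0001D48F\U0001D48A\U0001D484\U0001D490\U0001D485\U0001D486"
key = search_key(styled)
```

Indexing `search_key(title)` alongside the original title would let a query for "unicode" match, while the page still displays the styled text.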
Great, now we'll have to rely on IDEs with clickable drop-down lists of variables and function names because simple text input just got a lot harder for languages where Unicode is allowed for symbols!<p><a href="http://play.golang.org/p/2zYfCx_J-O" rel="nofollow">http://play.golang.org/p/2zYfCx_J-O</a>
My iOS/Safari shows squares in the page itself, but a row of boxed aliens in the `Bookmarks and History` list:<p><a href="http://imgur.com/l98p9oN" rel="nofollow">http://imgur.com/l98p9oN</a><p>(image is safe for work, though other stuff on imgur.com is likely not)
Interesting; the title displayed OK minutes ago, on the main page, in Firefox/OSX. But now it's showing as unsupported-glyph boxes inside the page... but still looks OK in the titlebar of the item (comments) page.<p>Did some automated or administrative process mutate the characters? Or is this just Firefox drifting, in choice of font?
Strangely, for me on Firefox 33.1 on OS X, the title shows up fine on the main page. But when I click through to the comment, I get boxes only, and from then on, the main page also doesn't work anymore until I restart Firefox. I suspect an extension, but I'm not sure.
Also, strike-through, which is the one I find genuinely useful: I like the suggestive way of saying s̶o̶m̶e̶t̶h̶i̶n̶g̶ and then visibly correcting it to something else.<p><a href="http://adamvarga.com/strike/" rel="nofollow">http://adamvarga.com/strike/</a>
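The effect is easy to reproduce: put a combining overlay after every character. A short Python sketch, assuming U+0336 (COMBINING LONG STROKE OVERLAY) is the mark in use; I have not checked what the linked page actually emits.

```python
STROKE = "\u0336"  # COMBINING LONG STROKE OVERLAY (assumed; other overlays exist)


def strike(text: str) -> str:
    """Append the overlay to every character so the whole run renders struck through."""
    return "".join(ch + STROKE for ch in text)


def unstrike(text: str) -> str:
    """Remove the overlay again."""
    return text.replace(STROKE, "")


struck = strike("something")
```

How well it renders depends on the font, since the overlay is drawn per character.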
Note that XP cannot show<p><pre><code> Negative Circled
Squared
Negative Squared
Double-struck
Bold
Bold italic
Bold script
Fraktur
</code></pre>
At least not with the fonts I have.
Very cool. Although the upside-down text doesn't work with ümlauts and numbers. A reverse function would also be nice.<p>I wrote a similar tool that does this (<a href="http://lunicode.com" rel="nofollow">http://lunicode.com</a>). It's on Github if you want to use the code: <a href="https://github.com/combatwombat/Lunicode.js" rel="nofollow">https://github.com/combatwombat/Lunicode.js</a>
Different problem, but someone who knows about Unicode will probably know this -<p>When I paste from Microsoft documents into PuTTY, characters will often be transformed to weird versions. Example - an em dash is a different character from '-'. It comes through as a weird tilde character instead of a dash. Mmm. Frustrating.<p>Is there a robust program you can run in PuTTY to catch such characters and flatten them to ASCII?
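Not a PuTTY feature, but a small filter can do the flattening before the text reaches the terminal. A Python sketch with a hand-picked table of the usual word-processor punctuation; the table and the name `flatten` are my own, and the list is certainly not exhaustive.

```python
# Typographic punctuation commonly produced by word processors, mapped to ASCII.
PUNCT = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201C": '"',    # left double quote
    "\u201D": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00A0": " ",    # no-break space
})


def flatten(text: str) -> str:
    """Replace common typographic punctuation with plain ASCII equivalents."""
    return text.translate(PUNCT)


pasted = "wait\u2014\u201Creally\u201D\u2026"
flat = flatten(pasted)
```

Wrapped in a script that reads standard input and prints the result, it can sit in a pipe in front of whatever consumes the pasted text.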
I’ve never been a fan of this sort of thing. The Unicode characters in these font blocks are not letters for making words; at least the double‐struck, fraktur, bold, italic, and bold italics are semantically for use in mathematical equations.<p>This can have some strange effects if you try to use them like letters. Example: What’s the lowercase transform of 𝑼? 𝑼! Not 𝒖.
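The case-mapping oddity is easy to check in Python, which follows the Unicode case tables: the mathematical capital has no lowercase mapping, so `lower()` returns it unchanged.

```python
import unicodedata

BOLD_ITALIC_CAPITAL_U = "\U0001D47C"
BOLD_ITALIC_SMALL_U = "\U0001D496"

name = unicodedata.name(BOLD_ITALIC_CAPITAL_U)
lowered = BOLD_ITALIC_CAPITAL_U.lower()

# True: the "lowercase" of the styled capital is the styled capital itself.
unchanged = lowered == BOLD_ITALIC_CAPITAL_U
```

So any code that lowercases text before comparing it will treat the styled capital and the styled small letter as different characters.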
If you like this sort of thing, you might like this piece I wrote some time back about writing a Ruby script using whitespace for all identifiers: <a href="http://www.rubyinside.com/the-split-is-not-enough-whitespace-shenigans-for-rubyists-5980.html" rel="nofollow">http://www.rubyinside.com/the-split-is-not-enough-whitespace...</a>
I don't really speak/read Russian, but I have a passable understanding of Cyrillic, and those always look dumb. It doesn't look like "the" to me, it looks like "guh-buh-yeh" or something.<p>Same thing with the Borat DVD cover.
Finally a way to express myself on Facebook properly ;) I wonder if bold text would lead to better conversion from ads using this trick. And I wonder when Facebook is going to ban this, because obviously it works :)
See <a href="https://news.ycombinator.com/item?id=7383672" rel="nofollow">https://news.ycombinator.com/item?id=7383672</a> though they changed my title to normal text.
Chrome on iOS is giving me the character unavailable boxes. Normally I'd just change the font but I can't do that here.<p>This doesn't feel like the future.
The question I have is, what's the easiest way to strip this 🅹🆄🅽🅺 out of unicode strings submitted by web users? With a nod to Cunningham's Law, surely the right answer is a regular expression?
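Since a regular expression was requested, here is a Python sketch that uses one, though on character names rather than on the text itself. It applies NFKC normalization first, then falls back to parsing the Unicode name for styled Latin letters that normalization leaves alone. This is a heuristic of my own with made-up names, not a standard algorithm, and it leaves accented letters and other scripts untouched.

```python
import re
import unicodedata

# Matches names such as "NEGATIVE SQUARED LATIN CAPITAL LETTER J".
_LATIN_NAME = re.compile(r"LATIN (CAPITAL|SMALL) LETTER ([A-Z])$")


def fold_styled(text: str) -> str:
    """Map styled Latin letters back to plain ASCII letters where possible."""
    out = []
    for ch in unicodedata.normalize("NFKC", text):
        if ch.isascii():
            out.append(ch)
            continue
        match = _LATIN_NAME.search(unicodedata.name(ch, ""))
        if match:
            letter = match.group(2)
            out.append(letter if match.group(1) == "CAPITAL" else letter.lower())
        else:
            out.append(ch)  # accents, other scripts, emoji: left as they are
    return "".join(out)


# The four negative squared letters from the question above
folded = fold_styled("\U0001F179\U0001F184\U0001F17D\U0001F17A")
```

Lookalikes borrowed from other scripts (Cyrillic, Greek and so on) are not handled; those need a confusables table rather than a name match.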