Everyone here is asking if this is an "intentional easter-egg" or an "accidental bug"<p>But what about accidentally working-as-intended?<p>Sure it's a little trickier to read, but it's certainly not a "bug" that will cause any damage / danger / instability / etc.
You still have to be mindful of \u202e in anything new that you're writing, but browsers do a much better job of not letting it bleed across elements the way they did back in the 2000s.<p>Back in the era of forums that didn't support Unicode correctly (2005ish?), it was trollish fun to post messages containing \u202E and watch the UI and all subsequent messages and elements get messed up. (One stray \u202E would flip the entire page contents following it.) I never took it to a level of abuse since it was easy to remove and then ban offenders, but it was fun in a one-off thread, and it always got great reactions.<p>I patched my own software to handle it, but I don't recall anyone really abusing it in a widespread manner. (Contrast this with the era of prolific and widely abused AOL/AIM exploits that would kill your IM client with malformed messages.)<p>IIRC, a bunch of messaging clients also didn't (or still don't) handle \u202e termination, and it sometimes bled into new messages and even the text input box. That was pretty horrible and unfixable without restarting.<p>Obligatory XKCD: <a href="https://xkcd.com/1137/" rel="nofollow">https://xkcd.com/1137/</a><p>Some shenanigans in the wild:<p><a href="https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_right_to_left_override_character_in_my/" rel="nofollow">https://www.reddit.com/r/Unicode/comments/hc1rxi/i_put_a_rig...</a><p><a href="https://twitter.com/mkolsek/status/1237123571341803522" rel="nofollow">https://twitter.com/mkolsek/status/1237123571341803522</a><p>(These are way tamer than the effects used to be.)<p>(Also, HN filters it out. I tried to have some fun. :P)
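For anyone wondering what "handling \u202e termination" could look like, here is a minimal sketch (not what any particular forum or IM client actually did). It assumes the goal is just to append the missing terminators so an unclosed override or isolate can't leak past the end of a message. `closeBidiControls` is a made-up name; the code points themselves are the standard bidi formatting characters.

```javascript
// Hypothetical helper: append the terminators a message forgot, so an
// unclosed embedding/override/isolate cannot affect text that follows it.
const PDF = "\u202C"; // POP DIRECTIONAL FORMATTING
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE
const NEEDS_PDF = new Set(["\u202A", "\u202B", "\u202D", "\u202E"]); // LRE, RLE, LRO, RLO
const NEEDS_PDI = new Set(["\u2066", "\u2067", "\u2068"]);           // LRI, RLI, FSI

function closeBidiControls(text) {
  const pending = []; // terminators still owed, innermost last
  for (const ch of text) {
    if (NEEDS_PDF.has(ch)) {
      pending.push(PDF);
    } else if (NEEDS_PDI.has(ch)) {
      pending.push(PDI);
    } else if (ch === PDF) {
      // PDF only closes an embedding/override that is innermost.
      if (pending[pending.length - 1] === PDF) pending.pop();
    } else if (ch === PDI) {
      // PDI closes the nearest isolate plus anything opened inside it.
      const open = pending.lastIndexOf(PDI);
      if (open !== -1) pending.length = open;
    }
  }
  return text + pending.reverse().join("");
}
```

Calling `closeBidiControls("abc\u202Edef")` appends a single U+202C. It does not strip anything, so a message can still look reversed internally; it just stops the effect at the message boundary.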
Similarly, if I try <a href="https://www.google.com/search?q=u202e" rel="nofollow">https://www.google.com/search?q=u202e</a>, the second result I currently get (YMMV) is from <a href="https://unicode-table.com/" rel="nofollow">https://unicode-table.com/</a>, and almost the entire snippet shows up backwards in the search results.
Stacking combining diacritics[1] is also fun, to make extremely tall text.<p>Also fun is enumerating all the characters in the Private Use Area[2] to see which UI symbols can be inserted into unintended places.<p>[1] <a href="https://www.unicode.org/charts/PDF/U0300.pdf" rel="nofollow">https://www.unicode.org/charts/PDF/U0300.pdf</a><p>[2] <a href="http://www.unicode.org/faq/private_use.html" rel="nofollow">http://www.unicode.org/faq/private_use.html</a> <a href="https://www.unicode.org/charts/PDF/UE000.pdf" rel="nofollow">https://www.unicode.org/charts/PDF/UE000.pdf</a>
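If you run a site and want to defuse the tall-text trick, one possible approach (a sketch, not a recommendation for every script) is to cap how many combining marks may follow one another. `capCombiningMarks` and the default of 2 are my own assumptions; `\p{M}` is the standard Unicode property escape for marks and needs the `u` flag.

```javascript
// Hypothetical helper: keep at most `maxMarks` consecutive combining marks
// and drop the rest of each run.
function capCombiningMarks(text, maxMarks = 2) {
  // Capture the first `maxMarks` marks of a run, then match the overflow.
  const run = new RegExp(`(\\p{M}{${maxMarks}})\\p{M}+`, "gu");
  return text.replace(run, "$1");
}
```

Some writing systems legitimately stack several marks on one base character, so a low cap can damage real text. Pick the limit with your actual users in mind.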
If there was ever a clear signal that working with Unicode is incredibly hard, it would be the fact that no one on HN can decide if this is accidental or intentional.
Our programming languages might need a Unicode-aware string concatenation operator, similar to locale-aware capitalization. Joining LTR text to RTL text seems like it should result in combined LTR + RTL text, rather than letting the RTL override marker take over and change the meaning.
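Something close to that concatenation idea can be approximated today with the bidi isolate characters: wrap each piece in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069) before joining. A rough sketch, where `isolate` and `joinIsolated` are hypothetical names and not part of any language's standard library:

```javascript
// Sketch of a "bidi-safe" join: each piece gets its own isolate, so an
// override opened inside one piece ends at that piece's closing PDI.
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

function isolate(piece) {
  return FSI + piece + PDI;
}

function joinIsolated(...pieces) {
  return pieces.map(isolate).join("");
}
```

This relies on the renderer implementing isolates, and a piece that contains its own stray U+2069 can still close the wrapper early, so untrusted input would need to be checked first.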
Are there any lists of Unicode characters (like the OWASP one) that should be blacklisted from most apps (not just for XSS, but even for desktop apps)?<p>Are there any good security guides/best practices for Unicode sanitization?
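Not a full answer to the blacklist question, but as a starting point here is a sketch that flags the characters carrying the Unicode `Bidi_Control` property (U+061C, U+200E to U+200F, U+202A to U+202E, U+2066 to U+2069). `findBidiControls` is a made-up helper, and treating every hit as hostile is an assumption: legitimate RTL text uses some of these.

```javascript
// Characters with the Bidi_Control property. Reporting them is safer than
// silently deleting them, since real Arabic/Hebrew text may rely on a few.
const BIDI_CONTROL = /[\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g;

function findBidiControls(text) {
  return [...text.matchAll(BIDI_CONTROL)].map((m) => ({
    index: m.index, // UTF-16 offset of the control character
    codePoint:
      "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}
```

For example, `findBidiControls("ab\u202Ecd")` reports one hit at index 2 with code point `U+202E`. What to do with a hit (reject, strip, escape for display) depends on the app.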
The funny thing is that search queries preceded by a backslash on DuckDuckGo are supposed to take you to the first search result, but that functionality seems to be buggy anyway:<p><a href="https://www.reddit.com/r/duckduckgo/comments/sp9e5r/backslash_does_not_actually_go_to_first_result/" rel="nofollow">https://www.reddit.com/r/duckduckgo/comments/sp9e5r/backslas...</a>
Reminds me of searching for the terms "do a barrel roll", "recursion" or "askew" on Google. I'm sure there are plenty of others.
Instantly reminded me of a relevant xkcd: <a href="https://xkcd.com/1137/" rel="nofollow">https://xkcd.com/1137/</a>
> This is often abused by hackers to disguise file extensions: when using it in the file name my-text.'U+202E'cod.exe, the file name is actually displayed as my-text.exe.doc<p>So every programmer has to know about and support U+202E, but not filesystem programmers?
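To make the quoted file-name trick concrete, here is a deliberately naive sketch. It assumes a single U+202E followed only by plain ASCII to the end of the name, and it is nowhere near a real implementation of the bidi algorithm. Both function names are hypothetical.

```javascript
const RLO = "\u202E"; // RIGHT-TO-LEFT OVERRIDE

// Rough guess at what a bidi-aware UI shows: everything after the
// override is laid out right to left, i.e. reversed.
function approximateDisplay(name) {
  const at = name.indexOf(RLO);
  if (at === -1) return name;
  const head = name.slice(0, at);
  const tail = name.slice(at + 1);
  return head + [...tail].reverse().join("");
}

// The extension the OS actually sees is taken from the stored
// (logical) order of the characters, not from the display order.
function realExtension(name) {
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1);
}
```

With the name from the quote, `approximateDisplay("my-text.\u202Ecod.exe")` gives `my-text.exe.doc`, while `realExtension` on the same string gives `exe`. A mismatch between the two is the thing worth warning the user about.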
Extremely bad design. This kind of complexity should have been moved to some kind of post-processing spec rather than core Unicode. It's already causing issues and will cause more. The more universal something is, the more effort should be applied to keeping it simple.
It's intentional: if you inspect the `innerText`, you'll see it's reversed there too:<p><pre><code> zero_click_wrapper.innerText.codePointAt(0)
</code></pre>
This evaluates to 32 (0x20, a space). And if you think that 0x20 means the next one would be 0x2E, then no: codePointAt(1) is 0x55 (the letter U).
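If you'd rather not poke at one index at a time, a small helper along these lines dumps every code point of a string at once. `dumpCodePoints` is a hypothetical name and the sample input below is made up, not copied from the page. In a browser console you could pass it any element's `innerText`.

```javascript
// List each code point of a string as U+XXXX so invisible characters
// (spaces, bidi controls) become visible. Iterating with the spread
// operator walks by code point, not by UTF-16 unit.
function dumpCodePoints(text) {
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}
```

For instance, `dumpCodePoints(" U\u202E")` lists `U+0020`, `U+0055`, `U+202E`, which makes it obvious whether a real override character is present or just the literal text "U+202E".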
Why can't I just disable RTL on my system?<p>I do not speak a word of Arabic. There is no circumstance in which my life will be materially improved by correct RTL text rendering. I might want proper display of individual characters so I can copy-paste them, but I have no use for RTL text.<p>On the other hand, RTL causes a lot of unpleasant problems like this. Why can't I simply coerce all foreign languages into LTR?