My favorite is U+202E Right-to-Left Override, which doesn't appear to be listed there. A surprising number of UIs (apps, sites) can be broken with it because they were never tested with right-to-left writing direction in mind. Even a Unicode reference website that I just used to recall the code is broken by it. [0] Entering RLO into arbitrary input forms for fun can bend spacetime, I swear.<p>[0] <a href="https://unicode-table.com/en/202E/" rel="nofollow">https://unicode-table.com/en/202E/</a>
This is another good reason to have a text editor you really trust, which can show you these things. Whether it's different line-endings or weird invisible space stuff, I know I can just open it in Vim and figure out what's really going on pretty quickly. Wasted a lot of time earlier in my life on that nonsense (:
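If you don't have such an editor at hand, a rough approximation is to print the text with the invisible code points spelled out. This is only a sketch: `reveal` and the `SUSPICIOUS` list are made-up names for illustration, and the character list is far from exhaustive.

```javascript
// Hypothetical helper: replace invisible or direction-changing code points
// with a visible <U+XXXX> tag so they show up in a log or terminal.
// The ranges below are illustrative, not a complete list.
const SUSPICIOUS =
  /[\u00A0\u00AD\u034F\u061C\u115F\u1160\u180E\u2000-\u200F\u2028-\u202F\u205F-\u206F\u3000\u3164\uFEFF\uFFA0]/g;

function reveal(text) {
  return text.replace(SUSPICIOUS, (ch) =>
    "<U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0") + ">"
  );
}

console.log(reveal("user\u202Ename"));  // user<U+202E>name
console.log(reveal("a\u200Bb\u3164c")); // a<U+200B>b<U+3164>c
```

Ordinary text passes through unchanged, so this can be dropped into a debug print without much risk.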
Great for doing tacit programming[1] in JavaScript:<p><pre><code> avg=ㅤ=>ㅤ.reduce((ㅤㅤ,ㅤㅤㅤ)=>ㅤㅤ+ㅤㅤㅤ)/ㅤ.length
avg([3,1,4,1,5])
2.8
</code></pre>
[1] <a href="https://en.wikipedia.org/wiki/Tacit_programming" rel="nofollow">https://en.wikipedia.org/wiki/Tacit_programming</a>
A while back I used these kinds of characters to encode programs into invisible text: <a href="https://www.thelisowe.com/sleeper-cell-a-method-of-embedding-invisible-programs-into-source-code/" rel="nofollow">https://www.thelisowe.com/sleeper-cell-a-method-of-embedding...</a><p>It doesn't do much on its own. I feel like it could, but the most effective use case I've come up with is that you can invisibly plant a piece of code in some piece of text, then later run another script that looks for that code and runs it. I'm guessing that splitting the code up like this would make it harder to detect (not to mention that this code could even reside in other programs' comments undetected).
Zero-width characters can be used to covertly watermark text and to figure out who copied text from a page and pasted it somewhere else. Server software can encode a hidden number between every few words, which corresponds to a server log entry with your username (if logged in), IP address, browser fingerprint, etc. I wrote more about this here:<p><a href="https://nervuri.net/stega" rel="nofollow">https://nervuri.net/stega</a><p>I think the best solution to this type of problem would be a clipboard utility that warns you when you copy text which contains hidden characters, homoglyphs, rarely used whitespace characters, etc.
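To make the watermarking idea concrete, here is a minimal sketch of hiding a number between words and getting it back. Everything in it is an assumption for illustration, not the scheme from the linked write-up: the choice of U+200B for 0 and U+200C for 1, the 16-bit width, and the names `encodeId`, `watermark` and `extractId`.

```javascript
// Assumed encoding: U+200B = bit 0, U+200C = bit 1, 16 bits per mark.
const ZERO = "\u200B";
const ONE = "\u200C";
const BITS = 16;

// Turn a small non-negative integer into an invisible 16-character mark.
function encodeId(id) {
  if (!Number.isInteger(id) || id < 0 || id >= 2 ** BITS) {
    throw new RangeError("id must fit in " + BITS + " bits");
  }
  return id
    .toString(2)
    .padStart(BITS, "0")
    .split("")
    .map((bit) => (bit === "1" ? ONE : ZERO))
    .join("");
}

// Append the mark to every `every`-th word, so a partial copy still carries it.
function watermark(text, id, every = 3) {
  const mark = encodeId(id);
  return text
    .split(" ")
    .map((word, i) => ((i + 1) % every === 0 ? word + mark : word))
    .join(" ");
}

// Recover the first complete mark, or null if the text carries none.
function extractId(text) {
  const match = text.match(new RegExp(`[${ZERO}${ONE}]{${BITS}}`));
  if (!match) return null;
  const bits = [...match[0]].map((ch) => (ch === ONE ? "1" : "0")).join("");
  return parseInt(bits, 2);
}

const marked = watermark("the quick brown fox jumps over the lazy dog", 4242);
console.log(extractId(marked)); // 4242
```

The marked string renders the same as the original in most fonts, which is exactly why a clipboard-side check is attractive: deleting U+200B and U+200C before pasting is enough to defeat this particular sketch.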
I've built a tool specifically to test if these kinds of characters will reach API backends: <a href="https://github.com/Endava/cats" rel="nofollow">https://github.com/Endava/cats</a>. My idea was that APIs should explicitly reject or sanitise input containing such characters.
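On the receiving side, an explicit rejection check could look roughly like the sketch below. It is not taken from the tool above; `findForbidden` and `validateField` are hypothetical names, and the policy (reject every Unicode format character plus the Hangul fillers) is an assumption that is too strict for some inputs, since ZWJ and ZWNJ are legitimate in emoji sequences and in scripts such as Persian.

```javascript
// Assumed policy: reject Unicode format characters (General_Category Cf:
// zero-width marks, bidi controls, soft hyphen, BOM). The Hangul fillers
// are classified as letters (Lo), so they have to be listed explicitly.
const FORBIDDEN = /[\p{Cf}\u115F\u1160\u3164\uFFA0]/gu;

// Return the position and code point of every forbidden character.
function findForbidden(input) {
  return [...input.matchAll(FORBIDDEN)].map((m) => ({
    index: m.index,
    codePoint:
      "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));
}

// Hypothetical field validator: ok, or a list of offending characters.
function validateField(input) {
  const hits = findForbidden(input);
  return hits.length === 0 ? { ok: true } : { ok: false, hits };
}

console.log(validateField("Alice").ok);          // true
console.log(validateField("Ali\u202Ece").hits);  // [ { index: 3, codePoint: 'U+202E' } ]
```

Reporting the offending code points instead of silently stripping them keeps the rejection explicit, which makes it easier for the client to tell the user what went wrong.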
So I guess the only future-proof solution to check for this is to render user input off screen and count the number of solid pixels, at least until "falsehoods programmers believe about names" gets updated to include "Names must consist of at least one readable glyph".