Fun story about one of the devices mentioned there, which I worked on. We used to store the saved wifi creds in a file named exactly after the SSID.<p>Some user managed to break things, and with their permission we gathered detailed wifi logs and found they were connected to an SSID that was an ASCII depiction of the equation: [redacted] plus [redacted] equals [redacted]. The issue was the forward slashes (presumably there to add [redacted]), which cannot appear in a file name. Must have been an awkward customer service follow-up when we told them to change their SSID while they waited for an update.
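A minimal sketch of one way to sidestep this class of bug, assuming the credentials still live in one file per network: derive the file name from the hex encoding of the raw SSID bytes instead of from the SSID itself. The helper names and the `.conf` suffix are hypothetical, not what the device actually did.

```python
MAX_SSID_OCTETS = 32  # 802.11 limits an SSID to 32 octets


def ssid_to_filename(ssid: bytes) -> str:
    """Map raw SSID bytes to a name made only of [0-9a-f] plus a suffix."""
    if len(ssid) > MAX_SSID_OCTETS:
        raise ValueError("SSID longer than 32 octets")
    # ".conf" is an arbitrary, hypothetical suffix for this sketch.
    return ssid.hex() + ".conf"


def filename_to_ssid(filename: str) -> bytes:
    """Invert ssid_to_filename and recover the exact SSID bytes."""
    stem, _, _ = filename.rpartition(".")
    return bytes.fromhex(stem)


# A made-up SSID containing a slash, a NUL byte and a non-UTF-8 byte.
tricky = b"a/b\x00\xff"
name = ssid_to_filename(tricky)
print(name)
print(filename_to_ssid(name) == tricky)
```

Hex doubles the length, but 32 octets become at most 64 characters, which is well within common file name limits, and the mapping is reversible for any byte sequence.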
I used to have something like this as my SSID:
ʕ•̫͡•ʕ<i>̫͡</i>ʕ•͓͡•ʔ-̫͡-ʕ•̫͡•ʔ<i>̫͡</i>ʔ-̫͡-ʔ (Not this particular one as it was too long though!) Many nice examples at: <a href="https://1lineart.kulaone.com/#/" rel="nofollow">https://1lineart.kulaone.com/#/</a><p>It was fun but some OSes didn't show it correctly, in particular Windows. It would just show it in HEX. And more annoyingly, some devices refused to connect to it at all, especially IoT crap like those WiFi power sockets.<p>So eventually I gave up.<p>PS: Something with more vertical stuff would also be really fun, some of these can write across multiple lines of unrelated content! Unfortunately most OSes block this from happening now. Example:<p>Ỏ̷͖͈̞̩͎̻̫̫̜͉̠̫͕̭̭̫̫̹̗̹͈̼̠̖͍͚̥͈̮̼͕̠̤̯̻̥̬̗̼̳̤̳̬̪̹͚̞̼̠͕̼̠̦͚̫͔̯̹͉͉̘͎͕̼̣̝͙̱̟̹̩̟̳̦̭͉̮̖̭̣̣̞̙̗̜̺̭̻̥͚͙̝̦̲̱͉͖͉̰̦͎̫̣̼͎͍̠̮͓̹̹͉̤̰̗̙͕͇͔̱͕̭͈̳̗̭͔̘̖̺̮̜̠͖̘͓̳͕̟̠̱̫̤͓͔̘̰̲͙͍͇̙͎̣̼̗̖͙̯͉̠̟͈͍͕̪͓̝̩̦̖̹̼̠̘̮͚̟͉̺̜͍͓̯̳̱̻͕̣̳͉̻̭̭̱͍̪̩̭̺͕̺̼̥̪͖̦̟͎̻̰_Ỏ̷͖͈̞̩͎̻̫̫̜͉̠̫͕̭̭̫̫̹̗̹͈̼̠̖͍͚̥͈̮̼͕̠̤̯̻̥̬̗̼̳̤̳̬̪̹͚̞̼̠͕̼̠̦͚̫͔̯̹͉͉̘͎͕̼̣̝͙̱̟̹̩̟̳̦̭͉̮̖̭̣̣̞̙̗̜̺̭̻̥͚͙̝̦̲̱͉͖͉̰̦͎̫̣̼͎͍̠̮͓̹̹͉̤̰̗̙͕͇͔̱͕̭͈̳̗̭͔̘̖̺̮̜̠͖̘͓̳͕̟̠̱̫̤͓͔̘̰̲͙͍͇̙͎̣̼̗̖͙̯͉̠̟͈͍͕̪͓̝̩̦̖̹̼̠̘̮͚̟͉̺̜͍͓̯̳̱̻͕̣̳͉̻̭̭̱͍̪̩̭̺͕̺̼̥̪͖̦̟͎̻̰<p>So the Unicode above this would write through the next lines on some platforms, even system screens like the wifi chooser :)<p>But these quickly get too long for an SSID too.
The 802.11 standards have always allowed up to 32 bytes, which can be filled with any data; they do not have to be in a particular encoding. In 802.11-2012 there is a separate tag, SSIDEncoding, which can be used to specify whether these bytes are UTF-8 or "unspecified". If the UTF-8 option is set, the SSID should be interpreted as UTF-8.<p>It is not clear in this case whether the router sets this flag. Either way, there is no stipulation in the spec about how the UTF-8 characters should be displayed, so many of these renderings are potentially valid.
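Because the limit is counted in bytes rather than characters, cutting a UTF-8 name at 32 octets can split a multi-byte character. A minimal sketch of the difference between a naive cut and a cut on a character boundary; the function names are hypothetical, and this is not a claim about what any particular router firmware does.

```python
MAX_SSID_OCTETS = 32


def truncate_naive(name: str) -> bytes:
    """Encode as UTF-8 and keep the first 32 octets, wherever that lands."""
    return name.encode("utf-8")[:MAX_SSID_OCTETS]


def truncate_on_boundary(name: str) -> bytes:
    """Like truncate_naive, but drop a trailing incomplete UTF-8 sequence."""
    raw = truncate_naive(name)
    # The input was valid UTF-8 before the cut, so only the tail can be
    # broken; decoding with 'ignore' discards exactly that partial sequence.
    return raw.decode("utf-8", "ignore").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Made-up name: one ASCII letter followed by twenty 2-byte characters.
name = "a" + "\u00e9" * 20
print(len(truncate_naive(name)), is_valid_utf8(truncate_naive(name)))
print(len(truncate_on_boundary(name)), is_valid_utf8(truncate_on_boundary(name)))
```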
> Both the s8 and the Firestick are rendering the result in what I deem as the correct way with it showing the name just with some of the vertical characters cutoff.<p>At least one is doing a poor job, though, because the diacritics look nothing alike…<p>> After asking around on the Apple discord server someone said it might be using the Mac OS Roman character set. It turns out it which is strange because iOS used UTF-8 internally and not Mac OS Roman as that was phased out with the release of Mac OS X.<p>I would guess that some part of IOKit is passing a C or C++ string to CoreFoundation using an inappropriate function or using the “system encoding”. I can’t remember off the top of my head, but Mac OS Roman might also be encoding 0. In any case there’s certainly a convention going on there with a poor default or some sort of strange compatibility story.<p>(I’m actually curious if there is “supposed” to be an encoding for this. Perhaps Mac OS Roman is just as correct and more convenient?)
And out of curiosity, taking some strings from this list:<p><a href="https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.txt" rel="nofollow">https://github.com/minimaxir/big-list-of-naughty-strings/blo...</a><p>especially the Asian ones, seems to produce effects varying from mildly amusing to interesting when you try to set them as an SSID.
OK, I am a bit angry: first I thought there was fly shit on my screen, then that my GPU had a problem, then I read the title ;)<p>It's really crazy, it looks completely different on my BSD box compared to my Linux laptop. LOVE IT!!
My Canon printer won’t join my SSID containing an emoji, and helpfully throws a generic E36 error (or something like that). All Apple devices show and connect to the SSID just fine.
I'd be curious to see how a car may display that.<p>I've paired my phone with a family member's Volkswagen SUV and it could not properly display the emoji SSID.<p>Most laptops are capable of displaying emoji SSIDs (Bluetooth and wifi).
Very cool. It's pretty interesting to see the various failure modes. Some seem straightforward (e.g., the font is missing the glyphs) while others seem to be parsing limitations.<p>As an aside, this finally convinced me to explore using additional SSIDs in creative ways with emojis.
Out of curiosity, I ran this test on Nintendo Switch: <a href="https://i.imgur.com/8o2LLUm.png" rel="nofollow">https://i.imgur.com/8o2LLUm.png</a><p>It seems like its OS doesn't support combining characters.
For most of the Western world, if you take the set of all commonly used characters in the language(s) that are widely recognized in each country and form their intersection, you'll have at least the Arabic numerals and plain A-Z.<p>If SSIDs were restricted to just those characters, it would be fine in the Western world. But of course there is more to the world than the West.<p>Question: do most or all non-Western languages also have small subsets of characters that would be fine to restrict SSIDs to? For instance, Wikipedia tells me that Persian is written with a 32 character alphabet, and Arabic uses 28 characters for its alphabet.<p>I'd expect that for every alphabet-based language, there is a similar base set of characters you could reasonably limit SSIDs to, and so avoid all the problems you get with allowing full Unicode.<p>How about the languages that use logographic writing systems, such as Chinese, Japanese, Korean, and Vietnamese? Do they all have reasonable (albeit probably very large) subsets SSIDs could be limited to that would avoid all the weird stuff that can happen in Unicode but still allow most reasonable names to be used?
I tested this out of curiosity, and all iPhones I could find in my household rendered correctly in UTF-8 with only 12 octets [0]. This is replicated on iPhone 7, SE and XR, all running 13.5.1. So it may well be the issue was fixed in 6s or 7.<p>[0] <a href="https://i.imgur.com/KDau4PP.jpg" rel="nofollow">https://i.imgur.com/KDau4PP.jpg</a>
This is a really good post that shines some light on how the insanity of encodings still isn't fixed today, since so many operating systems still don't completely use Unicode everywhere.<p>Some of the reasoning about why the characters are displayed like that is slightly incorrect, though, so here are some corrections:<p>For each example I'm going to supply some python3 code to reproduce it with, using the following definition:<p>`data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"`<p>First, let's start at the beginning:<p>> My router just cut the name down to 32 octets though to stay complient
> This was what was being sent according to iw
> `a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd`<p>If you look at this closely, the last byte in this sequence is `\xcd`, which is the lead byte of an incomplete UTF-8 character. It's missing the final `\x84` that the router cut off (along with the three additional `a` characters).<p>> with the raw hex being
> `97ccb6cc81cc93ccbfcc88cc9bcc9bcd90cd98cd86cc90cd9dcc87cc92cc91cd`<p>Small mistake: the hex of `a` is `61`, not `97` (that's its decimal value), but otherwise correct.<p>> Galaxy S8 running Android 9 with Kernel 4.4.153
> Amazon Firestick<p>Everything correct, except for a small detail:<p>These two devices render the result of UTF-8 decoding while ignoring bytes that are not valid UTF-8, here the dangling `\xcd` at the end (in python3: `data.decode('utf-8', 'ignore')`)<p>> iPhone 6 running iOS 13.5.1
> Apple TV Second Generation<p>Completely correct. This is definitely Mac OS Roman (in python3: `data.decode('mac_roman')`)<p>> Windows 10 Pro 10.0.19041<p>This one is incorrect again:<p>Windows is interpreting the bytes in the "Windows Codepage 1252" (also known as "Western") encoding and ignoring invalid bytes (in python3: `data.decode('cp1252', 'ignore')`)<p>Decoding every byte separately as UTF-8 would fail (since every byte that can be a continuation of a UTF-8 character is not a valid start byte).<p>Interpreting every byte as a Unicode code-point number would give something very similar, but not exactly the same: what Windows decodes as quote, caret-y thing, angle bracket-y thing, tilde, dagger, double dagger, and single quote falls into the control character range at the start of the Unicode "Latin-1 Supplement" block (`\x80` to `\x9f`).<p>> Chromebook running ChromeOS 83.0.4103.97<p>Correct.<p>The Chromebook seems to have rendered the ASCII `a`, but replaced all other 31 bytes with question marks.<p>> Kindle Paperwhite running Firmware 5.10.2
> Vizio M55-C2 TV<p>Also correct.<p>Those two devices seem to opt to display hex instead of falling back to question marks as the Chromebook does.<p>I hope this comment gave some useful insight into why these devices decoded it this way :)
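For anyone who wants to try all of the interpretations above in one go, here is a minimal python3 sketch built on the same `data` definition. The dictionary keys and the function name are made up for this sketch, and the question-mark and hex variants only imitate the observed output; they say nothing about how ChromeOS, the Kindle or the Vizio TV actually implement it.

```python
data = b"a\xcc\xb6\xcc\x81\xcc\x93\xcc\xbf\xcc\x88\xcc\x9b\xcc\x9b\xcd\x90\xcd\x98\xcd\x86\xcc\x90\xcd\x9d\xcc\x87\xcc\x92\xcc\x91\xcd"


def decodings(raw: bytes) -> dict:
    """Return the SSID bytes under each interpretation discussed above."""
    return {
        # Galaxy S8 / Firestick: UTF-8, dropping bytes that do not decode.
        "utf8_ignore": raw.decode("utf-8", "ignore"),
        # iPhone 6 / Apple TV: Mac OS Roman.
        "mac_roman": raw.decode("mac_roman"),
        # Windows 10: code page 1252, dropping bytes it does not define.
        "cp1252_ignore": raw.decode("cp1252", "ignore"),
        # For comparison: every byte value taken as a code point number.
        "latin1": raw.decode("latin-1"),
        # Chromebook-like output: ASCII kept, every other byte becomes '?'.
        "question_marks": "".join(chr(b) if b < 0x80 else "?" for b in raw),
        # Kindle / Vizio-like output: plain hex.
        "hex": raw.hex(),
    }


for label, text in decodings(data).items():
    # ascii() escapes non-ASCII characters so this prints on any terminal;
    # print `text` itself to see how your own system renders each result.
    print(f"{label:15} {len(text):3} {ascii(text)}")
```

The printed lengths make the differences visible even when the rendering does not: the UTF-8 variant loses only the dangling final byte, while the cp1252 variant loses every byte that code page leaves undefined.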
> <i>Comparing how different devices display the SSID “á̶̛̛̓̿̈͐͆̐̇̒̑̈́͘͝aaa”</i><p>I always thought that such Unicode characters were not allowed in HN titles.