In 2009, David A. Wheeler wrote a comprehensive article covering problems with Unix/Linux/POSIX filenames¹. Given that the OS naïvely treats filenames as a simple stream of bytes, he advocated that developers use UTF-8 for encoding filenames. He mentioned the issue of multiple normalisation systems being used to encode characters that have more than one Unicode representation, but glossed over it because such problems are “overshadowed by the terrible awful even worse problems caused by filenames all being in random unguessable charsets”.

I’m guessing that, by now, most developers on Unix-like systems would be using UTF-8 for filenames – though a decade after that article was published, there still doesn’t seem to be any good/universal solution to the problem of characters with multiple Unicode representations.

¹ https://dwheeler.com/essays/fixing-unix-linux-filenames.html
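To make the “multiple representations” problem concrete, here’s a small Python sketch (the filename is just illustrative): the precomposed and decomposed spellings of the same visible name are different codepoint sequences, so a filesystem that treats names as opaque bytes will happily store them as two separate files.

```python
import unicodedata

name = "café.txt"
nfc = unicodedata.normalize("NFC", name)   # 'caf' + U+00E9 + '.txt'
nfd = unicodedata.normalize("NFD", name)   # 'caf' + 'e' + U+0301 + '.txt'

print(nfc == nfd)              # False: different codepoint sequences
print(nfc.encode("utf-8"))     # b'caf\xc3\xa9.txt'
print(nfd.encode("utf-8"))     # b'cafe\xcc\x81.txt'

# On a byte-oriented filesystem (typical Linux setups), creating both
# would leave two visually identical directory entries:
#   open(nfc, "w"); open(nfd, "w")  ->  two distinct files
```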
You should normalize names on write; fixing this on read is very hard. A string can be perfectly valid yet denormalized, with its codepoints drawn from a mixture of normalization forms.

So if there are four possible normalization forms (NFD, NFC, NFKD, NFKC) and your string has N ambiguous codepoints, the number of possible strings you need to try on read grows like 4^N, since each ambiguous codepoint can independently appear in any of the four forms.
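A minimal sketch of the normalize-on-write idea in Python, assuming a Linux-style filesystem where names are stored as UTF-8 bytes; save_file is a hypothetical helper and NFC is just one reasonable choice of target form:

```python
import unicodedata

def save_file(path: str, data: bytes) -> None:
    # Hypothetical helper: pick one canonical form (NFC here) before the
    # name ever reaches the filesystem, so only one spelling is stored.
    canonical = unicodedata.normalize("NFC", path)
    with open(canonical, "wb") as f:
        f.write(data)

# Both spellings of "café.txt" end up as the same on-disk name:
save_file("cafe\u0301.txt", b"data")   # decomposed input
save_file("caf\u00e9.txt", b"data")    # precomposed input
```

The choice of form is itself a convention: NFC is common on Linux, while macOS’s HFS+ historically stored names in a decomposed form, which is part of why no single approach ever became universal.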
Side note - after all these years I still don't feel comfortable using special characters (like ą, ż, ź) and spaces in filenames in Windows. DOS times sit deeply in my soul and it just doesn't feel right.