Filenames with Accents (2011)

21 points by FrankSansC, about 3 years ago

3 comments

Anthony-G, about 3 years ago
In 2009, David A. Wheeler wrote a comprehensive article covering problems with Unix/Linux/POSIX filenames¹. Given that the OS naïvely treats filenames as a simple stream of bytes, he advocated that developers use UTF-8 for encoding filenames. He mentioned the issue of multiple normalisation systems being used to encode characters that have more than one Unicode representation but glossed over it because such problems are “overshadowed by the terrible awful even worse problems caused by filenames all being in random unguessable charsets”.

I’m guessing that, by now, most developers on Unix-like systems would be using UTF-8 for filenames – though a decade after these articles were published, there still doesn’t seem to be any good/universal solution to the problem of characters with multiple Unicode representations.

¹ https://dwheeler.com/essays/fixing-unix-linux-filenames.html
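To make the "multiple Unicode representations" problem concrete, here is a minimal Python sketch. It is only an illustration, not something taken from Wheeler's article; the sample character and the variable names are my own. It shows that the same visible letter can be stored as two different byte sequences, which a filesystem that compares raw bytes would treat as two different filenames.

```python
import unicodedata

# Two ways to write the same visible character "é" (illustrative values).
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The strings look identical when rendered but are not equal.
print(composed == decomposed)

# Their UTF-8 encodings differ, so a byte-oriented filesystem sees two names.
print(composed.encode("utf-8"))
print(decomposed.encode("utf-8"))

# Normalizing both to the same form (NFC here) makes them compare equal.
same_after_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_after_nfc)
```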
juancn, about 3 years ago
You should normalize names on write; on read, it is very hard to fix. You can have a perfectly valid, denormalized string whose code points use different normalizations.

So if you have four possible normalizations (NFD, NFC, NFKD, NFKC) and your string has N ambiguous code points, the number of possible strings you need to try is 4^N.
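A minimal sketch of "normalize on write" in Python, assuming NFC is the form you settle on. The helper names `normalized_path` and `write_normalized` are hypothetical, and only the final path component is normalized. The compatibility forms (NFKC, NFKD) can replace characters with different ones, so they are a riskier default for filenames.

```python
import os
import unicodedata


def normalized_path(path: str, form: str = "NFC") -> str:
    """Return `path` with its final component normalized to `form`."""
    directory, name = os.path.split(path)
    return os.path.join(directory, unicodedata.normalize(form, name))


def write_normalized(path: str, data: bytes, form: str = "NFC") -> str:
    """Write `data` under the normalized name and return the path used."""
    target = normalized_path(path, form)
    with open(target, "wb") as handle:
        handle.write(data)
    return target
```

Because every name passes through one normalization step before it reaches the disk, readers can normalize the name they are looking for in the same way and do a single lookup instead of guessing among variants. Note that some filesystems apply their own normalization, so the name stored on disk may still differ from what was requested.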
baal80spam, about 3 years ago
Side note: after all these years I still don't feel comfortable using special characters (like ą, ż, ź) and spaces in filenames on Windows. DOS times sit deep in my soul and it just doesn't feel right.