Protecting paths in macro expansions by extending UTF-8

30 points by nalgeon, about 1 year ago

6 comments

CJefferson, about 1 year ago
This article seems to assume paths will be valid UTF-8, which certainly isn't guaranteed on Linux, and as far as I know isn't on Windows either.

Of course we could say "paths must be valid UTF-8 for this program to work" (quite a few Rust programs do require this, as they store paths in standard Rust strings, which themselves must be valid UTF-8), but if your concern is dodgy paths breaking things, you probably need to check for that somewhere?
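
As a minimal sketch of the "check for that somewhere" step, the Python below rejects any path whose raw bytes are not strict UTF-8. The helper names `is_valid_utf8` and `check_path` are hypothetical and are not taken from the article.

```python
import os


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_path(path) -> bytes:
    """Return the raw bytes of path, raising ValueError if they are not UTF-8."""
    raw = os.fsencode(path)  # bytes pass through unchanged
    if not is_valid_utf8(raw):
        raise ValueError(f"path is not valid UTF-8: {raw!r}")
    return raw
```

Passing `bytes` (for example, names returned by `os.listdir(b".")`) keeps the check independent of how Python decoded the name.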
lifthrasiir, about 1 year ago
This sounds like a perfect recipe for disaster. You have essentially made a separate character encoding that looks like UTF-8 but in fact is not, so the two have to be kept very strictly separated. In most cases, of course, they will inevitably get mixed together.
gpvos, about 1 year ago
This looks like a hack that will inevitably come back to bite you at some point, for example if one of the involved programs starts to validate UTF-8, or your system locale changes, or something similar.
amake, about 1 year ago
Seems like you might as well use Private Use Area characters[0] and keep things valid UTF-8.

(Yes, you will have problems with paths that contain PUA characters. But people have pointed out that paths aren't necessarily valid UTF-8, so you can't inline-encode your way out of this anyway. PUA characters are likely vanishingly less common than spaces, so you still mostly solve the problem.)

[0] https://en.wikipedia.org/wiki/Private_Use_Areas
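
A rough sketch of the Private Use Area idea, assuming U+E000 as the stand-in for a space. That code point and the function names `protect` and `unprotect` are arbitrary choices for illustration, not something the article or the comment specifies.

```python
# Hypothetical placeholder: the first code point of the BMP Private Use Area.
PUA_SPACE = "\ue000"


def protect(path: str) -> str:
    """Replace spaces with the placeholder so the path survives word splitting."""
    if PUA_SPACE in path:
        # The caveat from the comment: such paths cannot be encoded safely.
        raise ValueError("path already contains the placeholder character")
    return path.replace(" ", PUA_SPACE)


def unprotect(text: str) -> str:
    """Turn the placeholder back into a space after expansion."""
    return text.replace(PUA_SPACE, " ")
```

The protected string is still valid UTF-8 (U+E000 encodes as the bytes EE 80 80), so a program that validates its input would not reject it.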
WorldMaker, about 1 year ago
If you are going to manipulate spaces into other things in Unicode, there are already so many fun tools, like non-breaking spaces, half-width spaces, and the medium mathematical space. You could even go for a weird, rare ASCII-compatible character like "form feed".

https://en.wikipedia.org/wiki/Whitespace_character

Seems more fun to use something that exists, is rare, and is already weirdly space-like. (Though yes, you have to find a way to escape it if someone is crazy enough to do something like name a file with a "form feed" in the middle.)
rini17, about 1 year ago
If you insist on going that way, there's a perfectly cromulent "File Separator" ASCII control character. While it's still possible for file names to contain it on Linux, it's easier to detect and sanitize or, better, reject any such input.
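
One way to read the "detect and reject" suggestion, sketched in Python: join a list of paths with the ASCII File Separator (0x1C) and refuse any path that already contains it. The function names `join_paths` and `split_paths` are made up for this example.

```python
FS = "\x1c"  # ASCII File Separator control character (0x1C)


def join_paths(paths):
    """Join paths with FS, rejecting any path that already contains it."""
    for p in paths:
        if FS in p:
            raise ValueError(f"path contains File Separator: {p!r}")
    return FS.join(paths)


def split_paths(joined: str):
    """Split a string produced by join_paths back into individual paths."""
    return joined.split(FS) if joined else []
```

Because spaces are left untouched, paths such as `my file.txt` pass through unchanged, and only the rare names that contain 0x1C are refused.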