How efficient can cat(1) be?

74 points by benhoyt, almost 3 years ago

13 comments

DannyBee, almost 3 years ago
While it's certainly just example code, the initial sendfile version is badly buggy.

    /* Fall back to traditional copy if the spliced version fails. */
    if (!spliced_copy(srcfd))
        copy(srcfd);

The thing that they are trying to avoid is sendfile failing due to inability to mmap the fd. But they don't check for that (it would return EINVAL), and in fact, by converting the error code to boolean, destroy the ability to differentiate [1]. Instead, they check that sendfile failed for *any* reason, and then redo it with copy.

Which means sendfile could output half the data, fail for some reason, and depending on why it failed, the fallback read/write copy will do bad things: for example, output the same data again or, more likely, skip data. Since they are just reading from the fd as it now exists after sendfile failed, it is most likely to skip data but pretend it completed successfully.

Normally, cat would just fail in that situation, as this should. It should not retry the copy unless sendfile returns EINVAL or ENOSYS.

This is what you get for transforming error codes into booleans :)

(I guess errno will still be set, and they could still check it here, but ugh)

[1] This is why the man page says: Applications may wish to fall back to read(2)/write(2) in the case where sendfile() fails with EINVAL or ENOSYS.
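A minimal sketch of the fallback rule described in the comment above, assuming Linux and `sendfile(2)`. The names `copy_rw` and `copy_fd` are hypothetical and are not taken from the article. It also assumes it is only safe to fall back while nothing has been transferred yet, which matches the commenter's concern about duplicated or skipped data.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: plain read(2)/write(2) loop.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_rw(int srcfd, int dstfd)
{
    char buf[65536];

    for (;;) {
        ssize_t n = read(srcfd, buf, sizeof buf);
        if (n == 0)
            return 0;                       /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Handle short writes by looping until the chunk is out. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dstfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Hypothetical helper: try sendfile(2) first. Fall back to copy_rw()
 * only when sendfile reports EINVAL or ENOSYS AND no bytes have been
 * transferred yet. Any other failure is reported to the caller with
 * errno intact instead of being collapsed into a boolean. */
int copy_fd(int srcfd, int dstfd)
{
    bool sent_any = false;

    assert(srcfd >= 0 && dstfd >= 0);
    for (;;) {
        ssize_t n = sendfile(dstfd, srcfd, NULL, (size_t)1 << 20);
        if (n == 0)
            return 0;                       /* EOF */
        if (n > 0) {
            sent_any = true;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (!sent_any && (errno == EINVAL || errno == ENOSYS))
            return copy_rw(srcfd, dstfd);
        return -1;                          /* real error: let cat fail */
    }
}
```

A caller would then do something like `if (copy_fd(srcfd, STDOUT_FILENO) < 0) perror("cat");` and exit non-zero, rather than retrying blindly.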
digitalsushi, almost 3 years ago
From my experience, it will never be so efficient that someone smarter than me doesn't publicly shame me for winning the "useless use of cat" award on a forum where I ask for help.

28 years later and I'm still sore I asked for help as a 15 year old that one time. Very effective way to teach a new user.
jstimpfle, almost 3 years ago
A bit of a tangent: there are few instances of "do-while" that I've ever written and not removed soon after. In practice, I've found that the looping situations that don't easily match the "for (int i = 0; i < n; i++)" pattern are normally "random" enough that it's best to just write "for (;;)" and put explicit checks and breaks inside the body, wherever they naturally fit. Forcing "for (...)" or "while (...)" or "do-while(...)" syntactic constructs is likely to lead to an unnatural sequence of statements. Doing break anywhere is just fine!

    do {
        splice(from stdin...);
        if (A) {
            handle_a();
            goto out;
        }
        splice(to stdout...);
        if (B) {
            handle_b();
            goto out;
        }
    } while (nwritten > 0);
    // do we need some kind of handle_c()??
    out:
    ...

Why make a special case for the "nwritten > 0" condition here? And what's wrong with "break" vs "goto"?

    for (;;) {
        thing_a();
        if (A) {
            handle_a();
            break;
        }
        thing_b();
        if (B) {
            handle_b();
            break;
        }
        if (C) {
            handle_c();
            break;
        }
    }

is cleaner in my eyes.
_krii, almost 3 years ago
Tangent: It frustrates me that it's apparently impossible to implement cat(1) in a truly portable way.

The problem is supporting unbuffered I/O (`cat -u`). Standard C simply can't do it. setvbuf(3) allows you to change the buffering on an I/O stream, but then fread(3) only allows you to read exact-sized blocks of data. You can only get a short read on EOF or error. So there is no way to say "give me as much data as is available, up to X bytes" and therefore no way to implement unbuffered cat(1) efficiently using only ISO C. You need POSIX for that.
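To illustrate the POSIX side of the comment above, here is a rough sketch of an unbuffered copy step built on `read(2)`, which returns as soon as some data is available rather than waiting for the full count. The names `write_all`, `forward_available` and `cat_unbuffered` are hypothetical, and this is not how any particular cat(1) is implemented.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 with errno set on failure. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* One step of an unbuffered cat: read whatever is available, up to
 * cap bytes, and forward it immediately. Returns the number of bytes
 * forwarded, 0 on EOF, or -1 on error. */
ssize_t forward_available(int in, int out, char *buf, size_t cap)
{
    ssize_t n;

    assert(buf != NULL && cap > 0);
    do {
        n = read(in, buf, cap);     /* a short read is normal here */
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;
    return write_all(out, buf, (size_t)n) < 0 ? -1 : n;
}

/* Repeat the step until EOF. Returns 0 on success, -1 on error. */
int cat_unbuffered(int in, int out)
{
    char buf[4096];
    ssize_t n;

    while ((n = forward_available(in, out, buf, sizeof buf)) > 0)
        ;
    return n < 0 ? -1 : 0;
}
```

With a pipe or terminal as input, each chunk is passed along as soon as the writer produces it, which is the behaviour `cat -u` asks for.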
rstarast, almost 3 years ago
This was a good read! The missing hyperlinks:

- https://man7.org/linux/man-pages/man2/sendfile.2.html
- https://man7.org/linux/man-pages/man2/splice.2.html

(Funny how used I've gotten to "hypertext", I was quite irritated I couldn't click those function names.)
bear8642, almost 3 years ago
remember enjoying reading the simple plan9 version of cat(1) - <http://9p.io/sources/plan9/sys/src/cmd/cat.c>
pif, almost 3 years ago
Slightly off topic:

> There have been a few initiatives in recent years to implement a new userspace base system for Linux distributions as an alternative to the GNU coreutils and BusyBox.

Have there been? And why?
ape4, almost 3 years ago
It would be nice if user programs didn't have to jump through hoops like this. It would be ideal if the kernel made the naive implementation work efficiently.
formerly_proven, almost 3 years ago
copy_file_range is much preferable to any of these because filesystems *can* "hook into" it and just share the underlying data, not copying at all.
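For reference, a rough sketch of what a `copy_file_range(2)` loop can look like, assuming Linux with glibc 2.27 or later and regular files on both ends (it does not work for pipes or terminals, so a real cat would still need another path). The function name `copy_path` is hypothetical. Whether the data is actually shared rather than copied depends on the filesystem, for example reflink support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the file at src to dst inside the kernel.
 * Returns 0 on success, -1 with errno set on failure. No fallback is
 * attempted here; callers that need one can inspect errno. */
int copy_path(const char *src, const char *dst)
{
    int in, out, rc = 0, saved = 0;

    assert(src != NULL && dst != NULL);
    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        saved = errno;
        close(in);
        errno = saved;
        return -1;
    }

    for (;;) {
        /* NULL offsets: use and advance each fd's own file position. */
        ssize_t n = copy_file_range(in, NULL, out, NULL,
                                    (size_t)1 << 30, 0);
        if (n == 0)
            break;                          /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            saved = errno;
            rc = -1;
            break;
        }
    }

    close(in);
    if (close(out) < 0 && rc == 0) {
        saved = errno;
        rc = -1;
    }
    if (rc < 0)
        errno = saved;
    return rc;
}
```

The loop asks for up to 1 GiB per call and lets the kernel decide how much to transfer each time, so large files are handled without a user-space buffer.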
robertlagrant, almost 3 years ago
I couldn't find the bit where the original performance claim was refuted (or not). Was it one of the listed options?
dochtman, almost 3 years ago
I'm wondering if it would make sense to use io_uring for this kind of thing. If not, why not?
blibble, almost 3 years ago
how about ptracing into your target process and dup2()'ing your FD across?

infinitely fast cat
eurasiantiger, almost 3 years ago

    cat_spew()

Eww.