> "Nice, but why on earth would I want that?" I have no idea.<p>I know this is referring mostly to the `cat` portion and not the `splice` portion of the article, but I'll throw in a quick shoutout to `splice` for giving me one of the single biggest build performance wins in my time at Zynga (and possibly across most teams at the company at the time).<p>We had a Ruby script which ran the majority of the build, and as the game grew we found that by far the slowest part was a loop which MD5-hashed each individual asset and used the hash as its filename on our CDN for per-asset versioning.<p>At its worst it was taking nearly an hour and a half; the code was basically as inefficient as you could make it: multiple shell calls for each file rather than any sort of inlining of the hashing process.<p>I wrote a basic C program using splice and an MD5 library which took the whole process to under 10s. A bit overkill, perhaps, but the naive speedup I tried first still took 1-2 minutes, and I figured 99.99% was worth the extra few hours to put it together knowing how many builds we ran each day.<p>Definitely gave me a healthy appreciation for the cost of transferring to user space that has stuck with me.
> In this case, if you notice that cat is the bottleneck try fcat (but first try to avoid cat altogether).<p>"Useless Use of Cat Award" [0] is the canonical text for avoiding unnecessary use of cat, for those who haven't come across it yet.<p>[0] <a href="http://porkmail.org/era/unix/award.html" rel="nofollow">http://porkmail.org/era/unix/award.html</a> (2000)
I re-implemented it in C and for some reason O_APPEND is set on stdout by default.<p>But aside from that it works just like the Rust version.<p><pre><code>#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 16384

/* Clear a single status flag on fd. splice() fails with EINVAL
 * when the target is open in append mode, hence O_APPEND below. */
static void unset_flag(int fd, int flag) {
    int flags = fcntl(fd, F_GETFL, 0);
    flags &= ~flag;
    fcntl(fd, F_SETFL, flags);
}

int main(int argc, char** argv) {
    int pipefd[2];
    pipe(pipefd);
    unset_flag(STDOUT_FILENO, O_APPEND);
    for (int i = 1; i < argc; ++i) {
        int fd = strcmp(argv[i], "-") ? open(argv[i], O_RDONLY) : STDIN_FILENO;
        if (fd < 0) {
            fprintf(stderr, "%s: No such file or directory\n", argv[i]);
            exit(1);
        }
        ssize_t n;
        /* input -> pipe; stop on EOF (0) or error (-1) */
        while ((n = splice(fd, NULL, pipefd[1], NULL, BUF_SIZE, 0)) > 0) {
            /* pipe -> stdout; drain everything that was just queued */
            while (n > 0) {
                ssize_t m = splice(pipefd[0], NULL, STDOUT_FILENO, NULL, n, 0);
                if (m <= 0)
                    exit(1);
                n -= m;
            }
        }
        close(fd);
    }
    return 0;
}
</code></pre>
WTFPL if anyone cares.
Newer kernels (4.5 and later) also have the copy_file_range syscall (with compatibility shim in glibc), which is supposed to use the most efficient copying approach available between two file descriptors. Note that both descriptors have to refer to regular files, so it complements splice and sendfile rather than replacing them.
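For anyone who wants to try copy_file_range, here is a minimal sketch of a whole-file copy. The helper name `copy_file` and the 1 MiB chunk size are my own choices for illustration, not anything from the article; it assumes Linux with a glibc that exposes the `copy_file_range` wrapper, and that both paths are regular files.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src_path to dst_path entirely in the kernel.
 * Returns the number of bytes copied, or -1 on error (errno is set). */
static long long copy_file(const char *src_path, const char *dst_path) {
    int in = open(src_path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    long long total = 0;
    for (;;) {
        /* NULL offsets: use and advance each descriptor's own file position. */
        ssize_t n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
        if (n < 0) {
            total = -1;   /* e.g. EXDEV or EINVAL; caller inspects errno */
            break;
        }
        if (n == 0)       /* end of input reached */
            break;
        total += n;
    }

    close(in);
    close(out);
    return total;
}
```

A real tool would probably fall back to a plain read/write loop when the call fails with something like EXDEV or ENOSYS; that is left out here to keep the sketch short.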
There is a Ruby gem for Linux called io_splice that does zero-copy IO. It hasn’t been updated in a while, but it has no dependencies other than a modern Linux, and being old doesn’t mean it won’t work. “Old” code that works still works; novelty, job-securitization and API churn be damned when they don’t add value.<p><a href="https://rubygems.org/gems/io_splice/versions/4.4.0" rel="nofollow">https://rubygems.org/gems/io_splice/versions/4.4.0</a><p><a href="http://www.bigfastblog.com/zero-copy-transfer-data-faster-in-ruby" rel="nofollow">http://www.bigfastblog.com/zero-copy-transfer-data-faster-in...</a><p>EDIT: source to current stable coreutils’ cat <a href="http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_plain;f=src/cat.c;hb=e5dae2c6b0bcd0e4ac6e5b212688d223e2e62f79" rel="nofollow">http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob_p...</a>
The most interesting thing about all this to me, other than the existence of splice (I really should finish The Linux Programming Interface), is that you need a pipe and two splice operations to get the data between other file types. There must be some dirty implementation detail forcing this, right? Right?!
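You can see the pipe requirement directly: per the splice(2) man page, the call fails with EINVAL when neither descriptor refers to a pipe. As far as I understand it, the pipe serves as the in-kernel buffer that holds references to the pages being moved, which is why it has to sit in the middle. A small sketch showing both cases; the helper names `try_direct_splice` and `splice_via_pipe` are made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Attempt a single splice straight from in_fd to out_fd.
 * Returns 0 on success, otherwise the errno value from splice(). */
static int try_direct_splice(int in_fd, int out_fd, size_t len) {
    if (splice(in_fd, NULL, out_fd, NULL, len, 0) < 0)
        return errno;
    return 0;
}

/* Move up to len bytes from in_fd to out_fd through a temporary pipe.
 * Returns the number of bytes moved, or -1 on error. */
static ssize_t splice_via_pipe(int in_fd, int out_fd, size_t len) {
    int p[2];
    if (pipe(p) < 0)
        return -1;

    ssize_t moved = -1;
    /* Step 1: in_fd -> write end of the pipe. */
    ssize_t n = splice(in_fd, NULL, p[1], NULL, len, 0);
    if (n >= 0) {
        moved = 0;
        /* Step 2: read end of the pipe -> out_fd, until drained. */
        while (moved < n) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)(n - moved), 0);
            if (m <= 0) {
                moved = -1;
                break;
            }
            moved += m;
        }
    }

    close(p[0]);
    close(p[1]);
    return moved;
}
```

With two regular files, `try_direct_splice` should return EINVAL, while `splice_via_pipe` moves the data. A single call moves at most one pipe's worth of data (64 KiB by default on Linux, I believe), so a real copy would loop.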
>Windows doesn't provide zero-copy file-to-file transfer (only file-to-socket transfer using the TransmitFile API).<p>Does anybody know if the Windows TransmitFile API can also be used to make file-to-file copies?
Tl;dr: splice() is a Linux-only, zero-userspace-copy, file-descriptor-to-file-descriptor copy that requires one of the two FDs to be a pipe.<p>Interesting, but less than earthshaking.