So after the last blog post by The Author, which mainly showed The Author's lack of understanding, we have another article from The Author demonstrating that he does indeed not understand the things he writes blog posts about: an incorrect rationale for 128 KiB being the optimal block size, no awareness that virtual device files have no readahead, and no mention of any of the FD-splicing alternatives in a post titled "Efficient ...", nor of the approaches involving memory mappings with explicit prefetching on said mappings.

I don't want to be overly dismissive or arrogant here, but the post pretty much boils down to "128 KiB is optimal because that number appears somewhere else too, and that other spot even has something to do with I/O".
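By "FD splicing" I mean splice(2), which moves data file -> pipe -> file entirely inside the kernel, never copying it into user space. A rough sketch, with error handling trimmed and an arbitrary 64 KiB chunk size picked purely for illustration:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    /* Copy in_fd to out_fd via an intermediate pipe; splice(2)
     * requires one end of each transfer to be a pipe. */
    int splice_copy(int in_fd, int out_fd)
    {
        int p[2];
        if (pipe(p) < 0)
            return -1;
        for (;;) {
            /* Pull up to 64 KiB from the source into the pipe. */
            ssize_t n = splice(in_fd, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
            if (n <= 0)
                break;              /* 0 = EOF, <0 = error */
            /* Drain the pipe into the destination. */
            while (n > 0) {
                ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                                   SPLICE_F_MOVE);
                if (m <= 0) { close(p[0]); close(p[1]); return -1; }
                n -= m;
            }
        }
        close(p[0]);
        close(p[1]);
        return 0;
    }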
The explanation concludes that readahead is why a 128 KiB buffer is fastest in the benchmark, yet the benchmark reads from /dev/zero and writes to /dev/null, neither of which has readahead.

You need to redo this article using actual file reads and writes. Try it on both a quiet machine and a semi-busy one.
I'd be interested to see how this compares to 1) mmapping both files and using memcpy, 2) mmapping the source and making a single write() call with the whole buffer, and 3) copy_file_range. A sketch of option 2 follows.
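A hedged sketch of option 2 (the function name is mine; it assumes the source fits in the address space and ignores short writes):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int mmap_copy(int in_fd, int out_fd)
    {
        struct stat st;
        if (fstat(in_fd, &st) < 0)
            return -1;

        void *src = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, in_fd, 0);
        if (src == MAP_FAILED)
            return -1;

        /* Hint the kernel to fault pages in ahead of the copy. */
        madvise(src, st.st_size, MADV_SEQUENTIAL);

        /* One write() covering the whole mapping; a real implementation
         * would loop on short writes. */
        ssize_t n = write(out_fd, src, st.st_size);

        munmap(src, st.st_size);
        return n == st.st_size ? 0 : -1;
    }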
For even faster copying on the same device, use a copy-on-write (COW) filesystem.

(I wonder, though, what API the "cp" command would use to accomplish that.)
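For what it's worth: on Linux, cp --reflink asks for exactly this via the FICLONE ioctl, which COW filesystems such as btrfs (and XFS with reflink support) implement. A minimal sketch:

    #include <sys/ioctl.h>
    #include <linux/fs.h>        /* FICLONE */

    /* Share the source file's extents with the destination; no data
     * blocks are copied until one side is later modified. */
    int reflink_copy(int src_fd, int dst_fd)
    {
        return ioctl(dst_fd, FICLONE, src_fd);
    }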
The statvfs() call reports the filesystem's preferred block size (f_bsize). On ZFS, for example, it is a very large value.

https://docs.python.org/2/library/statvfs.html#statvfs.F_BSIZE
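In C, the equivalent query is a one-liner; a minimal sketch:

    #include <stdio.h>
    #include <sys/statvfs.h>

    int main(void)
    {
        struct statvfs st;
        /* f_bsize is the filesystem's preferred I/O block size. */
        if (statvfs("/", &st) == 0)
            printf("preferred block size: %lu\n", st.f_bsize);
        return 0;
    }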
I doubt these benchmarks are relevant anymore, as Linux has a syscall dedicated to copying files: copy_file_range.

So you never even have to leave the page cache, let alone copy into user space.

Of course, it landed after 4.0 (in 4.5), so I doubt glibc supports it, and therefore the whole world pretends it doesn't exist.
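A minimal sketch of using copy_file_range(2) (for the record, glibc did eventually gain a wrapper, in 2.27):

    #define _GNU_SOURCE
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* The kernel copies between the two files directly, so the data
     * never crosses into user space. */
    int cfr_copy(int in_fd, int out_fd)
    {
        struct stat st;
        if (fstat(in_fd, &st) < 0)
            return -1;

        off_t remaining = st.st_size;
        while (remaining > 0) {
            ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL,
                                        (size_t)remaining, 0);
            if (n <= 0)
                return -1;
            remaining -= n;
        }
        return 0;
    }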
The most efficient way to copy a large number of small files is often a tarpipe. What block size does "tar" use? And, for that matter, "nc"? A tarpipe through nc is a super fast way to move data between machines.