Efficient File Copying on Linux

95 points by eklitzke about 8 years ago

9 comments

dom0 about 8 years ago
So after the last blog post by The Author, which mainly showed The Author's lack of understanding, we have another article from The Author highlighting that he does indeed not understand the things he writes blog posts about (incorrect rationale and assumptions about a 128 KiB block size being optimal, no readahead on virtual device files, and of course no mention of any of the FD splicing alternatives in a post titled "Efficient ...", or of any of the approaches involving memory mappings and explicit prefetching on said mappings).

I don't want to be overly dismissive or arrogant here, but this post pretty much boils down to "128 KiB is optimal because that number appears somewhere else, too, and that other spot even has something to do with I/O".
ars about 8 years ago
The explanation concludes that readahead is the reason a 128 KB buffer is fastest in the benchmark, yet the benchmark uses /dev/zero and /dev/null, which don't have readahead.

You need to redo this article using actual reads and writes. Try it on both a quiet machine and a semi-busy one.
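
A minimal sketch of what such a re-run could look like, using a real temporary file instead of /dev/zero and /dev/null. The file size, the candidate buffer sizes, and the helper names `copy_with_buffer` and `benchmark` are assumptions for illustration, not anything from the article. Note that the source file has just been written, so it is probably still in the page cache; measuring readahead effects would need a cold cache, which this sketch does not arrange.

```python
import os
import tempfile
import time


def copy_with_buffer(src_path, dst_path, buf_size):
    """Copy a file with plain read()/write() calls of at most buf_size bytes."""
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        while True:
            chunk = os.read(src_fd, buf_size)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:  # write() may accept fewer bytes than offered
                view = view[os.write(dst_fd, view):]
            copied += len(chunk)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
    return copied


def benchmark(total_bytes=8 * 1024 * 1024,
              buf_sizes=(4096, 65536, 131072, 1048576)):
    """Return {buf_size: seconds} for copying one real file of total_bytes."""
    timings = {}
    with tempfile.TemporaryDirectory() as workdir:
        src_path = os.path.join(workdir, "src.bin")
        dst_path = os.path.join(workdir, "dst.bin")
        with open(src_path, "wb") as f:
            f.write(os.urandom(total_bytes))
        for buf_size in buf_sizes:
            start = time.perf_counter()
            copied = copy_with_buffer(src_path, dst_path, buf_size)
            timings[buf_size] = time.perf_counter() - start
            assert copied == total_bytes
    return timings


if __name__ == "__main__":
    for size, seconds in benchmark().items():
        print(f"{size:>8} bytes: {seconds:.4f} s")
```

Running it once on an idle machine and once under load would cover the two cases suggested above.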
JoshTriplett about 8 years ago
I'd be interested to see how this compares to 1) mmapping both files and using memcpy, 2) mmapping the source and making a single call to write passing the whole buffer, and 3) copy_file_range.
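
For reference, option 2 could be sketched roughly as follows. The function name `copy_via_mmap` is hypothetical. The loop is there because a single write() is allowed to transfer fewer bytes than requested (on Linux it caps out at just under 2 GiB per call), so "a single call" only holds for files the kernel accepts in one go.

```python
import mmap
import os


def copy_via_mmap(src_path, dst_path):
    """Map the source read-only and pass the mapping to write().

    Returns the number of bytes copied.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapping, \
                memoryview(mapping) as view:
            offset = 0
            while offset < size:
                # Slice without copying; release the slice before the
                # mapping is closed.
                with view[offset:] as chunk:
                    offset += os.write(dst.fileno(), chunk)
    return size
```

This avoids the read() side copy into a user buffer, though the data is still copied once on the write() side.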
amelius about 8 years ago
For even faster copying on the same device, use a copy-on-write (COW) filesystem.

(I wonder, though, what API the "cp" command would use to accomplish that.)
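
On the API question: as far as I know, GNU cp's `--reflink` option asks the filesystem for a clone through the `FICLONE` ioctl (historically `BTRFS_IOC_CLONE`). A rough sketch of the same call from Python follows; the function name `reflink_or_copy` and the fallback to an ordinary copy are assumptions for illustration, and the constant is written out by hand because Python's `fcntl` module does not export it on all versions.

```python
import fcntl
import os
import shutil

# _IOW(0x94, 9, int) from <linux/fs.h>; hard-coded here as an assumption
# that the code runs on Linux.
FICLONE = 0x40049409


def reflink_or_copy(src_path, dst_path):
    """Try a COW clone of src into dst; fall back to a regular copy.

    Returns True if the filesystem performed the clone, False otherwise.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            # The destination fd receives the ioctl; the source fd is the argument.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            pass  # not a COW filesystem, different filesystems, etc.
    shutil.copyfile(src_path, dst_path)
    return False
```

On filesystems without reflink support (ext4, tmpfs) the ioctl fails and the sketch simply copies the data.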
jquast about 8 years ago
The statvfs system call indicates the preferred block size of the filesystem. It is a very large value on zfs, for example.

https://docs.python.org/2/library/statvfs.html#statvfs.F_BSIZE
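
A small sketch of reading that value. The linked page documents the Python 2 `statvfs` module; in Python 3 the same field is exposed as `os.statvfs(path).f_bsize`. The helper name `preferred_block_size` is hypothetical.

```python
import os


def preferred_block_size(path="."):
    """Return the filesystem's preferred I/O block size for path, in bytes."""
    return os.statvfs(path).f_bsize


if __name__ == "__main__":
    print(preferred_block_size("."))
```

The per-file counterpart is `os.stat(path).st_blksize`, which some copy tools consult when choosing a buffer size.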
valarauca1 about 8 years ago
I doubt these benchmarks are relevant anymore, as Linux has a system call dedicated just to copying files.

So you never even have to leave the page cache, let alone copy into user space.

Of course, it was implemented post-4.0, so I doubt glibc supports it, and therefore the whole world pretends it doesn't exist.
LeoPanthera about 8 years ago
The most efficient way to copy a large number of small files is often to use a tarpipe. What block size does "tar" use? And for that matter, "nc", as a tarpipe through nc is a super fast way to move data between machines.
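
On the tar half of the question: GNU tar defaults to a blocking factor of 20, i.e. records of 20 × 512 = 10240 bytes, and Python's `tarfile` uses the same default (`tarfile.RECORDSIZE`). Below is a rough in-process sketch of a tarpipe using `tarfile` in stream mode over an OS pipe; the function name `tarpipe` and the thread-based layout are assumptions for illustration, standing in for the usual `tar | tar` shell pipeline.

```python
import os
import tarfile
import threading


def tarpipe(src_dir, dst_dir):
    """Stream src_dir through a pipe as a tar archive and unpack it in dst_dir."""
    read_fd, write_fd = os.pipe()

    def pack():
        # Writer side: "w|" produces a non-seekable tar stream.
        with os.fdopen(write_fd, "wb") as sink:
            with tarfile.open(fileobj=sink, mode="w|") as archive:
                archive.add(src_dir, arcname=".")

    producer = threading.Thread(target=pack)
    producer.start()
    with os.fdopen(read_fd, "rb") as source:
        # Reader side: "r|" consumes the stream member by member.
        with tarfile.open(fileobj=source, mode="r|") as archive:
            archive.extractall(dst_dir)
        # Drain trailing padding so the writer never hits a closed pipe.
        while source.read(65536):
            pass
    producer.join()


if __name__ == "__main__":
    print(tarfile.RECORDSIZE)  # default record size in bytes
```

The win for many small files comes from replacing per-file open/close round trips on the receiving end of a network link with one sequential stream.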
jaimex2 about 8 years ago
Good explanation, thank you. So when using dd or other copy tools, should setting a block size of 128 KB also be the best choice?
heinrich5991 about 8 years ago
What about `copy_file_range(2)`?
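
For readers who have not used it, a hedged sketch of `copy_file_range(2)` through Python's `os.copy_file_range` wrapper (Python 3.8+ on Linux). The function name `copy_in_kernel`, the 1 GiB per-call cap, and the fallback to a read/write loop when the call is unavailable or refused are all assumptions for illustration.

```python
import os


def copy_in_kernel(src_path, dst_path):
    """Copy src to dst with copy_file_range(2) where possible.

    Falls back to a read()/write() loop if the call is missing or fails.
    Returns the number of bytes copied.
    """
    total = 0
    with open(src_path, "rb", buffering=0) as src, \
            open(dst_path, "wb", buffering=0) as dst:
        src_fd, dst_fd = src.fileno(), dst.fileno()
        remaining = os.fstat(src_fd).st_size
        if hasattr(os, "copy_file_range"):
            try:
                while remaining > 0:
                    # With no explicit offsets, the kernel advances both
                    # file positions itself.
                    n = os.copy_file_range(src_fd, dst_fd, min(remaining, 1 << 30))
                    if n == 0:
                        break
                    remaining -= n
                    total += n
            except OSError:
                pass  # e.g. unsupported here; finish with the loop below
        while remaining > 0:
            chunk = os.read(src_fd, 131072)
            if not chunk:
                break
            view = memoryview(chunk)
            while view:  # handle partial writes
                view = view[os.write(dst_fd, view):]
            remaining -= len(chunk)
            total += len(chunk)
    return total
```

When the kernel path succeeds, the data never passes through a user-space buffer, and on some filesystems the kernel can turn the request into a reflink or a server-side copy.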