Hi, I'm the guy who came up with the splice idea. It's based on what I learned doing this:<p>www.connectathon.org/talks96/bds.pdf<p>which was for the EIS (Earth Imaging System) project, a government effort to image the earth about 15 years ago. That project eventually had 200MHz MIPS SMP boxes moving data through NFS at close to 1Gbyte/sec 24x7.
So far as I know, nobody else has ever come close to that even with 10x faster CPUs.<p>Most of the people in this thread pretty clearly don't understand the issues involved, Rob included (sorry, Rob, go talk to Greg). Moving lots and lots of data very quickly precludes having the CPU look at each byte. The only thing that should look at each byte is a DMA engine.<p>Sendfile(2) is a hack, that's true. It is a subset of what I imagined splice(2) could be (actually splice(3), the syscalls are pull(2) and push(2)). But it's a necessary hack.<p>Jens' splice() implementation was a start but wasn't really what I imagined for splice(); to really go there you need to rethink how the OS thinks about moving data. Unless the buy-in is pervasive, splice() is sort of a wart.
The unnecessary data copying problem, as Robert Pike suggests, can also be solved by a more generic Zero-Copy approach, instead of adding a specific single-purpose system call.<p><a href="http://www.linuxjournal.com/article/6345" rel="nofollow">http://www.linuxjournal.com/article/6345</a>
<a href="http://kerneltrap.org/node/294" rel="nofollow">http://kerneltrap.org/node/294</a>
<a href="http://www.cs.duke.edu/ari/trapeze/freenix/node6.html" rel="nofollow">http://www.cs.duke.edu/ari/trapeze/freenix/node6.html</a><p>It has to be noted, however, that the term Zero-Copy is often used to describe a technique which avoids memory copying by employing virtual memory remapping.<p>VM tricks are also expensive because, depending on the architecture, they might require flushing the TLBs, which impacts subsequent memory accesses. The advantage of this kind of zero-copy approach thus depends on several factors, such as the amount of data being transferred to the kernel's buffers.<p>I don't have any recent data on real-world performance; any references are welcome. However, it's far from self-evident that VM tricks can rival the performance of a dedicated sendfile-like system call.
For the uninitiated, sendfile() is a system call that sends data between two file descriptors. The intent is to make the kernel do the read/write cycle instead of the application (user-level) code, thereby cutting down the number of times the data needs to be copied between kernel and user-space memory.<p>The manual page: <a href="http://linux.die.net/man/2/sendfile" rel="nofollow">http://linux.die.net/man/2/sendfile</a>. A related Linux Journal article: <a href="http://www.linuxjournal.com/article/6345" rel="nofollow">http://www.linuxjournal.com/article/6345</a>.
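To make the above concrete, here is a minimal sketch of calling sendfile(2) from Python via `os.sendfile`, assuming Linux. The helper name `send_file` is made up for illustration; the loop is there because the call may transfer fewer bytes than requested.

```python
import os
import socket


def send_file(sock, path):
    """Send the whole file at `path` over `sock` using sendfile(2).

    Hypothetical helper: the kernel moves the bytes from the file to the
    socket, so they never pass through a user-space buffer here.
    Returns the number of bytes sent.
    """
    sent = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent < size:
            # os.sendfile(out_fd, in_fd, offset, count) returns bytes written.
            n = os.sendfile(sock.fileno(), f.fileno(), sent, size - sent)
            if n == 0:  # unexpected EOF, e.g. the file was truncated
                break
            sent += n
    return sent
```

Note the application never sees the data at all; it only tells the kernel which descriptors to connect and how many bytes to move.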
Rob Pike should read about D-Bus in the kernel[1], next. Maybe he'll have some comments. No, I misspoke, I think he'll definitely have some comments.<p>[1] <a href="http://git.collabora.co.uk/?p=user/alban/linux-2.6.35.y/.git;a=summary" rel="nofollow">http://git.collabora.co.uk/?p=user/alban/linux-2.6.35.y/.git...</a>
>It can be written in a few lines of efficient user code.<p>I'm not sure he understands what this is. There is no copying needed here at all. The kernel could make the hard drive write to a place in memory, have the NIC read from that place, and just manage interrupts between the two. The kernel wouldn't have to touch the data at all. This may not be how Linux does it (given the requirement for an mmap'able file descriptor), but that would be possible at least. I don't think you could do anything near this in user code.
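For contrast, the "few lines of user code" being quoted would presumably look something like the read/write loop below. This is only a sketch of that idea (the name `copy_fd` is made up), but it shows the point: every chunk is copied into a user-space buffer by read(2) and copied back into the kernel by write(2).

```python
import os


def copy_fd(in_fd, out_fd, bufsize=64 * 1024):
    """Plain user-space copy loop between two file descriptors.

    Hypothetical helper for illustration. Each chunk crosses the
    kernel/user boundary twice: once in os.read, once in os.write.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        buf = os.read(in_fd, bufsize)  # kernel -> user copy
        if not buf:                    # empty read means EOF
            break
        view = memoryview(buf)
        while view:                    # handle short writes
            written = os.write(out_fd, view)  # user -> kernel copy
            view = view[written:]
        total += len(buf)
    return total
```

It is short, but the CPU touches every byte twice, which is exactly what the DMA-only path described above avoids.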
I was just doing research on using sendfile's modern zero-copy replacement(s), splice & tee: <a href="http://kerneltrap.org/node/6505" rel="nofollow">http://kerneltrap.org/node/6505</a><p>They were linked elsewhere here, but worth repeating in a top-level comment. :) Possibly a handy tool for the rare times you need to squeeze blood out of a stone.