In real-life web serving situations, as opposed to benchmarks, the majority of the fds are not active. It's the slow guys that kill you.<p>A client on a fast connection will come in and pull the data as fast as the server can spit it out, keeping the process and the buffers occupied for the minimum amount of wall-clock time, and the 'poll' cycle only runs a handful of times.<p>But the slowpokes, the ones on dial-up and on congested lines, will get you every time. They keep the processes busy far longer than you'd want, and you have to hit the 'poll' cycle far more frequently: first to see if they've finally finished sending you a request, then to see if they've finally received the last little bit of data you sent them.<p>The impact of this is very easy to underestimate, and if you're benchmarking web servers for real-world conditions you could do a lot worse than to run a test across a line that is congested on purpose.
Zed isn't the only one who has found epoll to be slower than poll. The author of libev basically says the same thing. See <a href="http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod" rel="nofollow">http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod</a> and search for EVBACKEND_EPOLL.<p>I wonder how kqueue behaves compared to poll and epoll. Kqueue has a less stupid interface because it allows you to perform batch updates with a single syscall.
It is worth pointing out that the original epoll benchmarks were focused on how performance scaled with the number of dead connections, not performance in general:<p><a href="http://www.xmailserver.org/linux-patches/nio-improve.html" rel="nofollow">http://www.xmailserver.org/linux-patches/nio-improve.html</a><p>And as jacquesm points out, in a web-facing server, that's the case you should care about. A 15-20% performance hit in a situation a web-facing server is never going to see doesn't matter when you consider that the 'faster' method is 80% slower (or worse) in lots of real world scenarios.<p>I'll be interested to see how the superpoll approach ends up working, but my first impression is 'more complexity, not much more benefit'.
Pardon my ignorance, I haven't built high performance servers at this low a level, but I'm intrigued:<p>What exactly is the definition of an "active" file descriptor in this context?<p>My best guess after reading the man pages is that poll() takes an array of file descriptors to monitor and sets flags in the relevant array entries, which your code then needs to scan linearly for changes, whereas epoll_wait() gives you an array of events, thus avoiding checking file descriptors which haven't received any events. Active file descriptors would therefore be those that did indeed receive an event during the call.<p>EDIT: thanks for pointing out Zed's "superpoll" idea. I somehow completely missed that paragraph in the article, which makes the following paragraph redundant.<p>If this is correct, it sounds to me (naive as I am) as if some kind of hybrid approach would be the most efficient: stuff the idling/lagging connections into an epoll pool and add the <i>pool</i>'s file descriptor to the array of "live" connections you use with poll(). That of course assumes you can identify a set of fds which are indeed most active.
The blog post does not say if the epoll code uses level triggering or edge triggering. It would be interesting to see the results for both modes. The smaller number of system calls required for edge triggering might make a difference in performance.
Is it just me, or did Zed not describe his testing methodology in any detail?<p>I can't even find a reference to his OS configuration and version details that he's developing on, which seems to me like a critical detail.
Let's assume we have 20k open FDs.<p>In the case of poll(), you have to transfer this array of FDs from user space to kernel space each time you call poll(). Now compare this with epoll (let's assume we are using the EPOLLET trigger), where you only have to transfer each file descriptor once.<p>You might say the copying won't matter, but it will matter when you have a lot of events coming in on the 20k FDs, which eventually leads to calling poll() at a higher rate, hence more copying of data between user space and the kernel (8 bytes per struct pollfd * 20k, ~160 KB each call).
Zed, what's with all the premature optimization? Surely Mongrel2 should first be able to make coffee, build you an island, and f@!in transform into a jet and fly you there before you start making it faster!<p>Just kidding. It's always nice to see science in action. Great work! I suspect there's an impact on ZeroMQ's own poll/epoll strategy.
Question: as the ATR goes higher, so does the proportional time spent in poll or epoll, no?<p>So if you have a thousand fds and they're all active, you have to deal with a thousand fds, which would make the difference between poll and epoll insignificant (only <i>twice</i> as fast, not even an order of magnitude!)?<p>This would make the micro-benchmark quite micro! Annoyingly enough, I think that means that the real way to find out would be an httperf run against each backend. A lot more work...
Very nice write-up. Little details such as this should make Mongrel2 very solid. It's nice to see how he analyzed the issues around poll and epoll and then figured out how to make use of both for optimum performance no matter what happens in production. Many other programs could benefit from this sort of analysis, although at different levels... e.g. sorted vectors may be better for smaller containers but hash tables better for larger containers, etc.
Interesting article! Is 'super-poll' done yet? I would have liked to see a super-poll line on some of those graphs to see how it compares to just vanilla poll and epoll at different ATRs. Though I guess you would also have to test situations where the ATR varies over time (so that you could measure the impact of moving fds back and forth).
It's little wonder that these kinds of people think everyone else is just too stupid to realize such things. What they want is fame and followers. (btw, don't forget to donate!)<p>hint: nginx/src/event/modules/ngx_epoll_module.c<p>Maybe one should learn how to use epoll and, perhaps, how to program? ^_^