Sigh, another completely synthetic benchmark with big iron:
A dual Intel Xeon X5670 with 24GB of RAM from SoftLayer. The X5670 has 6 cores @ 2.93 GHz, 2 threads per core, so /proc/cpuinfo shows 24 CPUs.<p>Serving a plain HTML page over <i>localhost</i> via nginx. A medium-length blog post with pretty much no real-world information.
My toy web server can achieve similar performance on much more modest hardware (a Core i7 2640 laptop), using far less RAM (a few dozen kilobytes).<p>Granted, this is also being tested on localhost with static content (no disk I/O) -- but it shows that event-driven servers are not that novel or difficult to write: my code weighs in at around 1700 LOC of (might I say) readable C.<p>Static file serving is also fast (using sendfile(), etc.), but needs an overhaul to achieve usable concurrency. Currently there's a ~4x performance drop while serving files, but I'm working on this.<p>(The sources are at <a href="http://github.com/lpereira/lwan" rel="nofollow">http://github.com/lpereira/lwan</a> by the way.)
nginx saturates at ~18k req/sec per core with a latency of ~24ms. This saturation is not coming from nginx in particular but from OS limits (mode switching, stack copying, etc., for read/write system calls).<p>There is nothing new in "modern HTTP servers". They are event-driven programs, and that model has existed for a long time.
Since this is almost pure sendfile() work (aside from the headers), it really doesn't seem like a very useful example... there isn't much static content left on the web.
I'm interested in more info on your Linux TCP tuning, especially how you decided to set tcp_tw_recycle and tcp_fin_timeout (and how you decided setting the former to 1 is safe).
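For context, this is the sort of tuning being asked about -- illustrative values only, not recommendations, and the safety concern is real: tcp_tw_recycle is known to break clients behind NAT (it rejects connections whose timestamps appear to go backwards) and was removed entirely in Linux 4.12; tcp_tw_reuse is the usually-safer alternative.

```shell
# Example TIME_WAIT tuning (values illustrative, not endorsed):
sysctl -w net.ipv4.tcp_fin_timeout=15   # shorten FIN-WAIT-2 hold time (default 60s)
sysctl -w net.ipv4.tcp_tw_reuse=1       # reuse TIME_WAIT sockets for outgoing connections
# sysctl -w net.ipv4.tcp_tw_recycle=1   # the risky knob in question; breaks NAT'd
                                        # clients, removed in Linux 4.12
```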