Their system works well on a RISC-V core: a deeply pipelined but in-order design running at only 50 MHz, with very fast memory relative to its clock speed. I'd like to see results compared against a more representative CPU. For example, they could have used a Xilinx Zynq, whose hard Cortex-A9 cores are out-of-order and superscalar, and run at a much higher frequency relative to memory speed.<p>I think this paper vastly underestimates the memory constraints of higher-performance systems.
At first, I was impressed they reduced latency by 10x:<p><pre><code> Our initial evaluation with a realistic workload shows
a 10x improvement in latency for 40% of requests without
adding significant overhead to the remaining requests.
</code></pre>
And then I re-read this claim, did a little math, and realized that mean latency only drops by 36%: the new mean is 0.4 × 0.1 + 0.6 × 1 = 0.64 of the old one.<p>"10x for 40% of requests" is a skeezy way of saying 36%.
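A quick back-of-the-envelope check, assuming every request has the same baseline latency (the paper doesn't give a per-request distribution, so this is the simplest model):

```python
# 40% of requests get 10x lower latency, the other 60% are unchanged.
fast = 0.40 * (1 / 10)   # accelerated fraction, at 1/10 the latency
slow = 0.60 * 1.0        # unaccelerated fraction, unchanged
mean = fast + slow       # new mean latency, relative to the baseline
print(f"{1 - mean:.0%}")  # prints "36%"
```

So the headline "10x" shrinks to a 36% improvement in the mean, and the tail latency doesn't move at all.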
Nothing revolutionary, but someone had to do it. (Of course you would take something expensive in software and implement the logic in hardware. Of course you can get by implementing only the most common requests: GETs to a small subset of keys.)<p>I don't intend to criticize this paper in particular, but in general I don't see small performance improvements in such software as very useful to society. Academia just becomes a research arm of corporations, which might even be a net negative for society: eroding privacy rights (Facebook et al.) or introducing volatility into stock markets (HFT could exploit this paper's insight just as fruitfully).
The main source of latency will be the network. The real problem is synchronous GET requests, because then performance == latency. Better to go async than to shave latency with hardware acceleration.
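A minimal sketch of the point, using `asyncio` with a fake GET whose 5 ms sleep stands in for network round-trip time (the delay and key names are made up for illustration, not taken from the paper):

```python
import asyncio
import time

async def fake_get(key: str) -> str:
    # Stand-in for a memcached GET over the network; the sleep
    # models one round-trip time (hypothetical 5 ms RTT).
    await asyncio.sleep(0.005)
    return f"value-for-{key}"

async def sync_style(keys):
    # One request at a time: total time ~= N * RTT,
    # so per-request latency dominates throughput.
    return [await fake_get(k) for k in keys]

async def async_style(keys):
    # All requests in flight at once: total time ~= 1 * RTT,
    # so per-request latency barely matters.
    return await asyncio.gather(*(fake_get(k) for k in keys))

keys = [f"k{i}" for i in range(20)]

t0 = time.perf_counter()
sync_results = asyncio.run(sync_style(keys))
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
async_results = asyncio.run(async_style(keys))
concurrent = time.perf_counter() - t0
```

With 20 keys the synchronous version pays roughly twenty round-trips while the pipelined one pays roughly one, which is the "go async" argument in miniature.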