> Applications that have high syscall rates include proxies, databases, and others that do lots of tiny I/O. Also microbenchmarks, which often stress-test the system, will suffer the largest losses.

My team's RDS instances got hit hard with a 40% increase in CPU usage: https://imgur.com/a/khGxU
The syscall in his benchmark made me laugh (https://github.com/brendangregg/Misc/blob/master/s1bench/s1bench.c#L124):

    close(999); // the syscall (it errors, but so what)
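For anyone curious, the trick is that close() on an unused fd is about the cheapest way to cross the user/kernel boundary, which is exactly what KPTI makes expensive. A minimal timing loop in that spirit might look like this (my own sketch, not the actual s1bench code; the iteration count and timing method are arbitrary choices):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void) {
        const long iters = 10 * 1000 * 1000;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < iters; i++)
            close(999);  /* fd 999 is presumably not open; fails fast with EBADF */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per syscall\n", ns / iters);
        return 0;
    }

Because the call fails before doing any real work, almost all of the measured time is entry/exit overhead, so the before/after KPTI delta shows up clearly.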
An open question is still who should enable the mitigation; the risk-cost trade-off doesn't seem to fit many scenarios.

Meltdown requires running untrusted native code, and that doesn't apply to many servers. While it may be possible to chain this onto another exploit, once an attacker has gained remote code execution you have much bigger problems.

So while Meltdown is interesting, I wouldn't enable KPTI on my database servers buried behind other network infrastructure.
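If you do want to make that call per machine, recent kernels report the mitigation state in sysfs, and KPTI can be switched off with the pti=off (or nopti) boot parameter. A minimal sketch for checking the reported status (the sysfs path is the one kernels from ~4.15 document; older kernels won't have it):

    #include <stdio.h>

    int main(void) {
        char buf[128];
        /* Prints e.g. "Mitigation: PTI" or "Vulnerable" */
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
        if (!f) {
            perror("open");  /* kernels before ~4.15 don't expose this file */
            return 1;
        }
        if (fgets(buf, sizeof buf, f))
            printf("meltdown: %s", buf);
        fclose(f);
        return 0;
    }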
What syscall rates do different databases sustain at maximum load? Transparent huge pages negating most of the overhead is very good news, but that probably helps less with mmap'd I/O, which so many databases use.
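The mmap point matters because an mmap'd engine pays the syscall cost once at mapping time and then does plain loads, where KPTI adds nothing per access (the remaining cost is TLB pressure, which huge pages ease), while a pread-style engine pays the kernel-entry tax on every call. A rough sketch of the two access paths (the data file and sizes are made up for illustration):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <unistd.h>

    #define PAGE 4096

    int main(void) {
        int fd = open("data.bin", O_RDONLY);  /* hypothetical data file */
        if (fd < 0) { perror("open"); return 1; }

        /* Path 1: every read is a syscall, so each one crosses the
         * user/kernel boundary and pays the KPTI entry/exit cost. */
        char buf[PAGE];
        for (off_t off = 0; off < 100 * PAGE; off += PAGE)
            pread(fd, buf, PAGE, off);

        /* Path 2: one mmap syscall up front; subsequent accesses are
         * ordinary loads. KPTI adds nothing per access, though TLB
         * misses (eased by huge pages) still cost something. */
        size_t len = 100 * PAGE;
        uint8_t *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        volatile uint8_t sum = 0;
        for (size_t i = 0; i < len; i += PAGE)
            sum += p[i];  /* touch each page */

        munmap(p, len);
        close(fd);
        return 0;
    }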
It would be interesting to know how a patched host interacts with a patched guest. As a simple example, if the host aggressively flushes the TLB, the performance impact of the guest doing the same could be lower. On the other hand, depending on how the host was patched, the guest's performance loss could differ when certain features are in use.
It would be great to see performance deltas for AMD CPUs too, especially since Meltdown only affects Intel and AMD's patches for Spectre Variant 2 are considered optional. It would also be nice to see a discussion of AMD's ASID and any differences it has from Intel's PCID when PCID is addressed.
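For what it's worth, whether the kernel can even use PCID is visible from CPUID: PCID support is reported in leaf 1, ECX bit 17, and INVPCID in leaf 7, EBX bit 10. A quick sketch to check what the local CPU advertises (AMD's ASIDs are a separate, SVM-side mechanism for tagging guest TLB entries, so they don't show up here):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;

        /* CPUID leaf 1: ECX bit 17 = PCID support */
        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return 1;
        printf("PCID:    %s\n", (ecx & (1u << 17)) ? "yes" : "no");

        /* CPUID leaf 7, subleaf 0: EBX bit 10 = INVPCID support */
        if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) return 1;
        printf("INVPCID: %s\n", (ebx & (1u << 10)) ? "yes" : "no");
        return 0;
    }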