<i>> The graph shows that a Cassandra server instance could spend 2.5% of runtime on garbage collections instead of serving client requests. The GC overhead obviously had a big impact on our P99 latency</i><p>No, this is not obvious. If you have a fully concurrent GC, then spending 25 out of every 1000 CPU cycles on memory management does not "obviously" have an impact on your 99th-percentile latency. It would primarily impact your throughput (by 2.5%), just like anything else consuming CPU cycles.<p><i>> We defined a metric called GC stall percentage to measure the percentage of time a Cassandra server was doing stop-the-world GC (Young Gen GC) and could not serve client requests.</i><p>Again, this metric doesn't tell you anything if you don't know how long each of the pauses is. If, in the limit, each pause is infinitesimally small, then you are again only measuring the impact on throughput, not latency.<p>Certainly, GCs with long STW pauses do impact latency, but then you need to measure histograms of absolute pause times, not averages of ratios relative to application time. That's just a silly metric.<p>Nor does the article mention which JVM or GC they're using. Absent further information, they might have gotten their 10x improvement relative to some especially poor choice of JVM and GC.
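To make the point concrete, here's a toy calculation (all numbers invented for illustration): two workloads can report the identical 2.5% "GC stall percentage" while having wildly different worst-case pauses, which is what actually drives P99 latency.

```java
import java.util.Arrays;

public class StallMetrics {
    // Stall percentage: total pause time divided by wall-clock time.
    static double stallPercent(double[] pausesMs, double wallMs) {
        return Arrays.stream(pausesMs).sum() / wallMs * 100.0;
    }

    // Worst-case pause: the number a latency histogram would surface.
    static double maxPauseMs(double[] pausesMs) {
        return Arrays.stream(pausesMs).max().orElse(0.0);
    }

    public static void main(String[] args) {
        double wallMs = 1000.0;

        // Workload A: 25 pauses of 1 ms each over one second.
        double[] a = new double[25];
        Arrays.fill(a, 1.0);

        // Workload B: a single 25 ms pause over the same second.
        double[] b = {25.0};

        System.out.printf("A: stall=%.1f%%, maxPause=%.0f ms%n",
                stallPercent(a, wallMs), maxPauseMs(a));
        System.out.printf("B: stall=%.1f%%, maxPause=%.0f ms%n",
                stallPercent(b, wallMs), maxPauseMs(b));
        // Both print stall=2.5%, but only B adds 25 ms to the tail
        // latency of every request caught inside the pause.
    }
}
```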
We do want to contribute our work back to Cassandra upstream, instead of keeping it as a fork, so that more users from the C* community can benefit from the improvements. The pluggable storage engine is an ambitious project (<a href="https://issues.apache.org/jira/browse/CASSANDRA-13474" rel="nofollow">https://issues.apache.org/jira/browse/CASSANDRA-13474</a>). Any help will be appreciated!
RocksDB is used all over Facebook and powers the entire social graph. It's a great storage engine that pairs well with multiple DBMSs: MySQL, MongoDB, Cassandra... We'll be at Percona Live 2018 in April, giving several talks, and we're looking forward to hanging out and talking with users in our lounge area. We're working hard to support our open source community as well! <a href="https://github.com/facebook/rocksdb" rel="nofollow">https://github.com/facebook/rocksdb</a>
I'm not an expert on these things, but it seems to me that if you're implementing a database in Java, you wouldn't want to keep your data on the JVM heap, as this seems to indicate. My understanding is that in most applications (like servers) the average object lives for a very short period of time, and most GC implementations are built around that idea. But in a database, especially an in-memory database, the majority of the objects are going to live for a very long time. That makes the mark phase of GC a lot more expensive, puts more pressure on the generations, etc.<p>Is my guess here correct, or are there things I'm missing or mistaken about?
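That intuition is why Java databases often move bulk data off the heap entirely, via direct buffers. A minimal sketch of the technique (the 64 MB size is arbitrary; this is an illustration of the general pattern, not Cassandra's actual memtable code):

```java
import java.nio.ByteBuffer;

public class OffHeapDemo {
    // Store and fetch a long at a given byte offset in a buffer.
    static long roundTrip(ByteBuffer buf, int offset, long value) {
        buf.putLong(offset, value);
        return buf.getLong(offset);
    }

    public static void main(String[] args) {
        // allocateDirect places the bytes outside the Java heap: the GC
        // tracks only the small ByteBuffer wrapper object, not the 64 MB
        // payload, so the mark phase never walks the long-lived data.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024);

        System.out.println(roundTrip(offHeap, 0, 42L)); // prints 42
    }
}
```

The trade-off is that you give up the type system and do manual offset arithmetic, which is part of why a dedicated C++ engine can be appealing instead.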
"To reduce the GC impact from the storage engine, we considered different approaches and ultimately decided to develop a C++ storage engine to replace existing ones."<p>I wonder how the numbers would have looked with the new low latency GC for Hotspot (ZGC).
<a href="https://wiki.openjdk.java.net/display/zgc/Main" rel="nofollow">https://wiki.openjdk.java.net/display/zgc/Main</a><p>Early results from SPECjbb2015 are impressive.
<a href="https://youtu.be/tShc0dyFtgw?t=5m1s" rel="nofollow">https://youtu.be/tShc0dyFtgw?t=5m1s</a>
For Stream's feed tech we also moved from Cassandra to an in-house solution on top of RocksDB. It's been a massive performance and maintenance improvement. This StackShare explains how Stream's stack works. It's based on Go, RocksDB and Raft: <a href="https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users" rel="nofollow">https://stackshare.io/stream/stream-and-go-news-feeds-for-ov...</a>
Unrelated: as a CS undergrad, I read this article and was immediately inspired. This is definitely the type of work I want to be doing when I graduate (infrastructure engineering). But my next thought was: where do I start?!<p>Any advice?
Join our meetup to chat with some of the developers: <a href="https://www.meetup.com/Apache-Cassandra-Bay-Area/events/248376266/" rel="nofollow">https://www.meetup.com/Apache-Cassandra-Bay-Area/events/2483...</a>
Has anyone tried running Cassandra on Azul Zing[1]? The slowdown here is, unsurprisingly, related to GC pauses, which Azul has eliminated in Zing.<p>[1] <a href="https://www.azul.com/products/zing/" rel="nofollow">https://www.azul.com/products/zing/</a>
As a Java scoffer trying to be fair-minded, I resisted the urge to joke that "it was the GC, stupid" and assumed that a big project like Cassandra had somehow worked around the GC latency problems.<p>But, what? It turns out the article is really about replacing Java with C++.
I remember using quite early versions of Cassandra at an ad-tech startup back in 2009 or 2010, spending unfortunate amounts of time fighting the JVM GC and trying to tune things so it behaved responsibly. It was a real problem then, and I know a lot of work went into fixing GC behaviour. Then I stopped using Cassandra for work, but it's unfortunate that this is still an issue.<p>What I took away from that is that I really feel something like Cassandra is better suited to implementation in a language like C++ or Rust. And I believe others have since come along and done this.<p>I really liked the gossip-based federation in Cassandra, though.
Or just try benchmarking the Azul VM with its pauseless GC?!<p>(I have used Azul in low-latency production environments. It has pros and cons, but it certainly beats rewriting the storage layer...)
Did you all find that there were changes to the Java heap/GC configuration that would make tuning this setup different? I imagine if most everything that "sticks" is moved off heap, the GC could be tuned more heavily for young gen throughput vs trying to balance it with long-lived objects.
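That's the intuition, yes: with the long-lived data out of the heap, the generational hypothesis holds again and the young generation can be sized aggressively. A hypothetical tuning direction in cassandra-env.sh style (flag values invented for illustration; the article doesn't publish its actual settings):

```shell
# Illustrative only: once row data lives off heap, most remaining
# allocations are short-lived request-scoped objects, so favor a large
# young generation and early promotion of the little that survives.
JVM_OPTS="$JVM_OPTS -Xmn4g"                       # big young gen: most objects die here
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"          # standard survivor sizing
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"   # little survives, so tenure quickly
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
```

Whether those exact values help would depend on measuring allocation and survival rates on the actual workload.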
> We also observed that the GC stalls on that cluster dropped from 2.5% to 0.3%, which was a 10X reduction!<p>Umm... shouldn't the stalls go to 0, now that you have moved to C++? Or is this the time it takes for manual memory management to occur?
Great, now can you fix the Python Cassandra Driver to work in a multi-threaded application environment without the connection pooling bugs and default synchronous app-blocking (vs lazy-init) connection setup?<p><a href="https://github.com/datastax/python-driver" rel="nofollow">https://github.com/datastax/python-driver</a>