Open-sourcing a 10x reduction in Apache Cassandra tail latency

408 points by mikeyk about 7 years ago

23 comments

the8472 about 7 years ago

> The graph shows that a Cassandra server instance could spend 2.5% of runtime on garbage collections instead of serving client requests. The GC overhead obviously had a big impact on our P99 latency

No, this is not obvious. If you have a fully concurrent GC then spending 25 out of 1000 CPU cycles on memory management does not "obviously" have an impact on your 99th percentile latency. It would primarily impact your throughput (by 2.5%), just like any other thing consuming CPU cycles.

> We defined a metric called GC stall percentage to measure the percentage of time a Cassandra server was doing stop-the-world GC (Young Gen GC) and could not serve client requests.

Again, this metric doesn't tell you anything if you don't know how long each of the pauses is. If they are, in the limit, infinitesimally small then you are again only measuring the impact on throughput, not latency.

Certainly, GCs with long STW pauses do impact latency, but then you need to measure histograms of absolute pause times, not averages of ratios relative to application time. That's just a silly metric.

Nor does the article mention which JVM or GC they're using. Absent further information, they might have gotten their 10x improvement relative to some especially poor choice of JVM and GC.
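To make the point about the stall-percentage metric concrete, here is a minimal sketch under a deliberately simplified model. It is an illustration, not anything from the article: it assumes stop-the-world pauses of a fixed length recur at a fixed period, requests arrive uniformly at random, and a request that lands inside a pause simply waits until the pause ends (no queueing, no service time). The function names and the two pause schedules are hypothetical.

```python
def stall_percentage(pause_ms: float, period_ms: float) -> float:
    """Share of wall-clock time spent in stop-the-world pauses, in percent."""
    return 100.0 * pause_ms / period_ms


def added_delay_percentile(pause_ms: float, period_ms: float, pct: float) -> float:
    """Percentile of the extra delay a request sees because of pauses.

    Assumed model: one pause of pause_ms starts every period_ms. A request
    arriving t ms into a pause waits (pause_ms - t); any other request waits 0.
    With uniform arrivals, P(delay <= d) = 1 - (pause_ms - d) / period_ms for
    0 <= d <= pause_ms, so the pct-th percentile is
    pause_ms - period_ms * (100 - pct) / 100, floored at zero.
    """
    tail_window_ms = period_ms * (100.0 - pct) / 100.0
    return max(0.0, pause_ms - tail_window_ms)


# Two hypothetical schedules with the same stall percentage.
schedules = {
    "long, rare pauses": (250.0, 10_000.0),   # 250 ms pause every 10 s
    "short, frequent pauses": (2.5, 100.0),   # 2.5 ms pause every 100 ms
}

for name, (pause_ms, period_ms) in schedules.items():
    stall = stall_percentage(pause_ms, period_ms)
    p99 = added_delay_percentile(pause_ms, period_ms, 99.0)
    print(f"{name}: stall = {stall}%, P99 added delay = {p99} ms")
```

Under these assumptions both schedules report the same 2.5% stall percentage, while the modeled P99 added delay differs by a factor of 100 (150 ms versus 1.5 ms). That is why a histogram of absolute pause times says more about tail latency than the ratio alone.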

dikanggu about 7 years ago

We do want to contribute our work back to the Cassandra upstream rather than keeping it as a fork, so that more users in the C* community can benefit from the improvements. The pluggable storage engine is an ambitious project (https://issues.apache.org/jira/browse/CASSANDRA-13474). Any help will be appreciated!

gfosco about 7 years ago

RocksDB is used all over Facebook and powers the entire social graph. It's a great storage engine that pairs well with multiple DBMSs: MySQL, Mongo, Cassandra... We'll be at Percona Live 2018 in April, giving several talks, and are looking forward to hanging out and talking with users in our lounge area. We're working hard to support our open source community as well! https://github.com/facebook/rocksdb

openasocket about 7 years ago

I'm not an expert on these things, but it seems to me that if you're implementing a database in Java you wouldn't want to keep your data on the JVM heap, as this seems to indicate. My understanding is that in most applications (like servers) the average object lives for a very short period of time, and most GC implementations are built around that idea. But in a database, especially an in-memory database, the majority of the objects are going to live for a very long time. That makes the mark phase of GC a lot more expensive, puts more pressure on the generations, etc.

Is my guess here correct, or are there things I'm missing or mistaken on?

haglin about 7 years ago

"To reduce the GC impact from the storage engine, we considered different approaches and ultimately decided to develop a C++ storage engine to replace existing ones."

I wonder how the numbers would have looked with the new low-latency GC for HotSpot (ZGC): https://wiki.openjdk.java.net/display/zgc/Main

Early results from SPECjbb2015 are impressive: https://youtu.be/tShc0dyFtgw?t=5m1s

Thaxll about 7 years ago

Weird, did they try to use https://www.scylladb.com/ ?

tschellenbach about 7 years ago

For Stream's feed tech we also moved from Cassandra to an in-house solution on top of RocksDB. It's been a massive performance and maintenance improvement. This StackShare post explains how Stream's stack works. It's based on Go, RocksDB and Raft: https://stackshare.io/stream/stream-and-go-news-feeds-for-over-300-million-end-users

3uclid about 7 years ago

Unrelated: as a CS undergrad, I read this article and was immediately inspired. This is definitely the type of work I want to be doing when I graduate (infrastructure engineering). But my next thought was: where do I start?!

Any advice?

StreamBright about 7 years ago

In a similar situation we just adjusted the GC and started to use G1GC, which resulted in similar numbers.

fdeliege about 7 years ago

Join our meetup to chat with some of the developers: https://www.meetup.com/Apache-Cassandra-Bay-Area/events/248376266/

en4bz about 7 years ago

Has anyone tried running Cassandra on Azul Zing [1]? The slowdown here is, not surprisingly, related to GC pauses, which Azul has eliminated in Zing.

[1] https://www.azul.com/products/zing/

adrianratnapala about 7 years ago

As a Java scoffer trying to be fair-minded, I resisted the urge to joke that "it was the GC, stupid" and assumed that a big project like Cassandra had somehow worked around the GC latency problems.

But, what? It turns out the article is really about replacing Java with C++.

cmrdporcupine about 7 years ago

I remember using quite early versions of Cassandra at an ad-tech startup I was at back in 2009 or 2010, spending unfortunate amounts of time fighting the JVM GC and trying to tune things so it behaved responsibly. It was a real problem then and I know a lot of work went into fixing GC behaviour. Then I stopped using Cassandra for work, but it's unfortunate this is still an issue?

What I took out of that is that I really feel like something like Cassandra is better suited to implementation in a language like C++ or Rust. And I believe others have since come along and done this.

I really liked the gossip-based federation in Cassandra though.

bfrog about 7 years ago

Meanwhile, ScyllaDB looks like a better option for numerous reasons.

yazr about 7 years ago

Or just try and benchmark the Azul VM with pauseless GC?!

(I have used Azul in low-latency production environments. It has pros and cons, but it certainly beats rewriting the storage layer...)

jjirsa about 7 years ago

Nicely done! Looking forward to the pluggable storage engine.

rbranson about 7 years ago

Did you all find that there were changes to the Java heap/GC configuration that would make tuning this setup different? I imagine if most everything that "sticks" is moved off heap, the GC could be tuned more heavily for young gen throughput vs trying to balance it with long-lived objects.

agnivade about 7 years ago

> We also observed that the GC stalls on that cluster dropped from 2.5% to 0.3%, which was a 10X reduction!

Umm... shouldn't the stalls go to 0, because now you have moved to C++? Or is this the time it takes for the manual garbage collection to occur?

steeve about 7 years ago

Why not use ScyllaDB? (Serious)

xuanyue about 7 years ago

Are there any trade-offs in replacing the LSM tree-based storage engine with the RocksDB storage engine?

welder about 7 years ago

Great, now can you fix the Python Cassandra Driver to work in a multi-threaded application environment without the connection pooling bugs and default synchronous app-blocking (vs lazy-init) connection setup?

https://github.com/datastax/python-driver

ismail about 7 years ago

So, question: any thoughts on replacing HDFS + YARN + Hive + HBase with GlusterFS + Kubernetes + Cassandra?

alsadi about 7 years ago

Can we add lz4 to the blend to reduce disk IO?