I was originally thinking about SQL databases such as PostgreSQL/MySQL, but I'd be interested in anything. Wondering about the number of queries/transactions per second, and how you handled that scale.
At Timehop we currently work with a single-instance AWS Aurora (MySQL-ish) database with over 40TB of data (plus a read-only replica on a smaller instance). Some stats: 1.5MB/sec receive throughput, 10MB/sec transmit throughput, commit latency around 3-4ms (with very occasional spikes to 10-20ms), select & commit counts of about 300/s, and select latency hovering around 35ms (we do about a dozen unions per query, though).

All in all it's the easiest relational database I've ever worked with in terms of stability, speed, and scalability. I know this sounds like an ad for Aurora, but I just really like it.
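For illustration only (not Timehop's actual schema or code): a minimal sketch of the kind of union-heavy read described above, issued against a MySQL-compatible endpoint with PyMySQL. The per-year tables, columns, and connection details are all made up.

    # Hypothetical sketch: a union-heavy read against a MySQL-compatible
    # (Aurora) endpoint, roughly the shape of "a dozen unions per query".
    # Table/column names and connection details are invented for illustration.
    import pymysql

    TABLES = [f"memories_{year}" for year in range(2010, 2016)]  # hypothetical per-year tables

    def fetch_memories(conn, user_id, day_of_year):
        # Build one statement that UNIONs the per-year tables, so the
        # database does a single round trip instead of one query per table.
        # Table names are constants here, so the f-string is not an injection risk.
        parts = [
            f"(SELECT id, created_at, payload FROM {t}"
            f" WHERE user_id = %s AND day_of_year = %s)"
            for t in TABLES
        ]
        sql = " UNION ALL ".join(parts) + " ORDER BY created_at"
        with conn.cursor() as cur:
            cur.execute(sql, [user_id, day_of_year] * len(TABLES))
            return cur.fetchall()

    conn = pymysql.connect(host="aurora-endpoint", user="app", password="...", database="timehop")
    rows = fetch_memories(conn, user_id=42, day_of_year=180)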
In many years of working with Cassandra, I have worked on:

* 1000+ node clusters

* Petabyte-scale data

* 10s of millions of reads and writes per second

Given my preface, the "how" is scale out on top of Cassandra, of course. Not SQL, and hard to do if you have a highly relational model, but there are many stories of success at that kind of scale.
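To illustrate the "hard if your model is highly relational" point: Cassandra scales out by modeling a table around each query and denormalizing, rather than joining at read time. A minimal sketch with the DataStax Python driver; the keyspace, table, and replication settings are made up.

    # Minimal sketch of query-first (denormalized) modeling in Cassandra,
    # using the DataStax Python driver. Names and settings are hypothetical.
    import uuid
    import datetime
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1", "10.0.0.2"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS metrics
        WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
    """)

    # One table per query pattern, partitioned so each read hits one
    # partition, instead of joining normalized tables at read time.
    session.execute("""
        CREATE TABLE IF NOT EXISTS metrics.events_by_device (
            device_id uuid,
            bucket    date,
            ts        timestamp,
            payload   blob,
            PRIMARY KEY ((device_id, bucket), ts)
        ) WITH CLUSTERING ORDER BY (ts DESC)
    """)

    insert = session.prepare(
        "INSERT INTO metrics.events_by_device (device_id, bucket, ts, payload) VALUES (?, ?, ?, ?)"
    )
    # Writes spread across the ring because the composite partition key
    # (device_id, bucket) distributes rows over many nodes.
    session.execute(insert, (uuid.uuid4(), datetime.date.today(),
                             datetime.datetime.utcnow(), b"..."))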
MLB's Statcast collects 7TB/game, or 17 petabytes of raw data annually. <a href="http://fortune.com/2015/09/04/mlb-statcast-data/" rel="nofollow">http://fortune.com/2015/09/04/mlb-statcast-data/</a>
Would be nice if people answering include (for a couple of these, a rough Postgres sketch follows the list):

- hot vs cold data ratio of the total size
- read vs write data ratio
- if read/writes are split
- how partitioning, if used, is done
- total machine(s) resources (disk/RAM)
- average (read) query response time
- how machine/node failure is handled
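For a couple of the items above (read vs. write ratio, average read-query time), here is a rough sketch of pulling approximate numbers out of Postgres's own statistics views. The connection details are hypothetical, the buffer cache hit ratio only loosely stands in for "hot vs cold", and pg_stat_statements has to be enabled.

    # Rough sketch: approximate some of the requested metrics from
    # Postgres statistics views with psycopg2. Connection details are
    # made up; pg_stat_statements must be installed and enabled.
    import psycopg2

    conn = psycopg2.connect("dbname=app host=db.internal user=stats")
    cur = conn.cursor()

    # Read vs. write ratio (rows fetched vs. rows inserted/updated/deleted),
    # plus buffer cache hit ratio as a crude hot-vs-cold proxy.
    cur.execute("""
        SELECT sum(tup_fetched)::float
             / nullif(sum(tup_inserted + tup_updated + tup_deleted), 0),
               sum(blks_hit)::float / nullif(sum(blks_hit + blks_read), 0)
        FROM pg_stat_database
    """)
    read_write_ratio, cache_hit_ratio = cur.fetchone()

    # Average response time for the heaviest SELECTs
    # (column is total_exec_time on PG 13+, total_time on older versions).
    cur.execute("""
        SELECT query, calls, total_exec_time / calls AS avg_ms
        FROM pg_stat_statements
        WHERE query ILIKE 'select%'
        ORDER BY total_exec_time DESC
        LIMIT 10
    """)
    for query, calls, avg_ms in cur.fetchall():
        print(f"{avg_ms:8.2f} ms avg over {calls} calls: {query[:60]}")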
My biggest installation was an accounting system for a multinational corporation: 30 Oracle instances running on a 256-core Sun cluster with 192 GB RAM and a 40 TB EMC SAN. Typical enterprise-system overkill. One of the Big 6 consulting firms designed it and deployed PeopleSoft on the completed system; I was just the lowly engineer who configured the hardware, the SAN, and the Oracle instances. As with Rolls-Royce cars, the performance was "adequate".
I've handled ~1 TB DBs in Postgres with about 2000 read queries/sec. Technically these were stored function invocations, so wrapped a considerably larger number of queries inside.<p>This didn't seem to be a problem. It was the simultaneous write operations that created real limits, banging on the disks/disk controller like that.
I worked with ~2 TB of data in CouchDB (a few thousand endpoint calls/sec) for my capstone project at university and thought I had experience, but reading these comments, I realize how much less experience I really have.
12TB MongoDB spread across 9 shards (2 replicas per shard)
4TB MySQL with some tables in the 400GB range.

MongoDB handled about 13K ops/sec at peak times, with around 5-8K of these being writes.

MySQL was probably around 2-3K ops/sec.
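For anyone unfamiliar with how a collection ends up spread across shards like that, a hypothetical sketch against a mongos router with pymongo; the database/collection names and shard key are made up, since the post doesn't say how the data was partitioned.

    # Hypothetical sketch of sharding a large collection, run against a
    # mongos router with pymongo. Names and the shard key are invented.
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos.internal:27017")

    # Enable sharding for the database, then shard the big collection on
    # a hashed key so writes spread evenly across the shards.
    client.admin.command("enableSharding", "app")
    client.admin.command(
        "shardCollection", "app.events",
        key={"user_id": "hashed"},
    )

    # Reads/writes go through mongos as usual; routing is transparent.
    client.app.events.insert_one({"user_id": 42, "type": "click"})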