Lots of architectural and cultural problems IMO. Mixing different kinds of queries on the same cluster, no auto scaling on neither cluster nor web server, they seem to be okay with customer-impacting maintenance events (seriously?!), and their "fix" for an event caused by cache misses is to add more caching, which will make their system even harder to understand and predict, increasing the likelihood of more severe and byzantine failures.<p>It's often an unpopular opinion around here, but this is why I prefer simple hosted databases with limited query flexibility for high volume and high availability services (Firestore, DynamoDB, etc.). It's harder to be surprised by expensive queries, and you won't have to fiddle with failovers, auto scaling, caching, etc. Design your system around their constraints and it will have predictable performance and can more easily scale under unexpected load.