I would recommend against running stateful apps in Kubernetes. It's not really ready for it. The big problems include routing (it works fine for HTTP requests, but not for databases, message brokers, etc.) and the pain of setting up StatefulSets.<p>If you don't believe me, take it from someone who should know what they're talking about: <a href="https://twitter.com/kelseyhightower/status/963413508300812295" rel="nofollow">https://twitter.com/kelseyhightower/status/96341350830081229...</a>
> Because Kubernetes itself runs on the machines that are running your databases, it will consume some resources and will slightly impact performance. In our testing, we found an approximately 5% dip in throughput on a simple key-value workload.<p>5% seems like a surprisingly large overhead. What is k8s doing in this situation that would have that kind of impact?
I'd like to know how to solve the storage dilution problem with stateful apps in k8s, where you have to buy 3-18x more raw capacity than desired to meet availability and durability guarantees.<p>For example, if you ran CockroachDB on a bare-metal cluster of 3 nodes with 30TB of raw capacity, 15TB is lost to RAID10 and 10TB is lost to running a replicated database such as CockroachDB, leaving you with 5TB of effective capacity, which is a 1/6 dilution of your initial capacity.<p>If you ran CockroachDB on a replicated network volume with a replication factor of three, it gets worse. If you bought 30TB of disks, you'd lose 20TB to volume replication and ~6.67TB to CockroachDB replication, leaving you with ~3.3TB of effective capacity, or a 1/9 dilution. If those disks were also configured with RAID10, your effective capacity would drop to a 1/18 dilution.<p>You could achieve a 1/3 dilution, which is the effective minimum for a database replicated three ways, if you didn't configure RAID, but then you increase the impact of a disk failure, in that it would take much longer to recover the cluster.
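To make the arithmetic above easy to check, here is a minimal Python sketch. It assumes each redundancy layer simply divides usable capacity (RAID10 by 2, volume replication and database replication by their replica counts) and ignores filesystem overhead, spare disks, and compression. The function name `effective_capacity` and its parameters are made up for illustration and are not part of any CockroachDB or Kubernetes tooling.

```python
def effective_capacity(raw_tb, raid10=False, volume_replicas=1, db_replicas=3):
    """Return usable TB after each redundancy layer takes its share.

    Assumption: every layer is a plain divisor on capacity.
    """
    usable = float(raw_tb)
    if raid10:
        usable /= 2             # RAID10 mirrors every disk
    usable /= volume_replicas   # replicated network volume, if any
    usable /= db_replicas       # database-level replication
    return usable


RAW_TB = 30
scenarios = {
    "bare metal, RAID10": dict(raid10=True),
    "replicated volume (3x)": dict(volume_replicas=3),
    "replicated volume (3x) + RAID10": dict(raid10=True, volume_replicas=3),
    "no RAID, local disks": dict(),
}

for name, kwargs in scenarios.items():
    tb = effective_capacity(RAW_TB, **kwargs)
    print(f"{name}: {tb:.2f} TB usable, 1/{RAW_TB / tb:.0f} dilution")
```

Changing `db_replicas` or `volume_replicas` shows how quickly the layers multiply: the dilution factor is just the product of the divisors.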
>Given its pedigree of literally working at Google-scale<p>My understanding was that a team at Google developed k8s, but that Google doesn't actually run it for its own "Google-scale" workloads. Am I misinformed?
Has anyone looked at Service Fabric (Microsoft tech) for things like this? That has offered stateful services for years now. I'm pretty sure it runs on Linux, and I've seen that it's Docker compatible. I know it's kinda in the same space as K8s but I don't really know the details. Would SF be able to do something like this in a similar (or better?) way?
Are there any cloud providers that offer remote disks without replication?
It looks like this would be a common need when deploying databases that maintain replication themselves.