Hey, thanks for the article. I know etcd upgrades can look complex, but upgrading distributed databases live is always going to be non-trivial.

That said, for many people taking some downtime on their Kube API server isn't the end of the world. The system, by design, can work OK for some time in a degraded state: workloads keep running.

A few things that I do want to clarify:

1) The strict documented upgrade path for etcd exists because testing matrices just get too complicated. There aren't really technical limitations so much as a desire to make recommendations only for configurations that have actually been tested. The documentation is all here: https://github.com/coreos/etcd/tree/master/Documentation/upgrades

2) A live etcd v2 API -> etcd v3 API migration for Kubernetes was never a priority for the etcd team at CoreOS because we never shipped a supported product that used Kube + etcd v2. Several community members volunteered to make it better, but it never really came together. We feel bad about the mess, but it is a consequence of not having that itch to scratch, as they say.

3) Several contributors, notably Joe Betz of Google, have been working to keep older minor versions of etcd patched. For example, 3.1.17 was released 30 days ago, and the first release of 3.1 was 1.5 years ago. These longer-lived branches are intended to be bug-fix only.
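If you're scripting the documented path yourself, here's a minimal pre-flight sketch of the idea behind it, using the etcd clientv3 Go API: check that every member reports the same server version before stepping the cluster to the next minor release. The endpoint list is a made-up example, and the import path is the pre-rename github.com/coreos/etcd one referenced above.

```go
// Pre-upgrade sanity check: confirm every etcd member reports the same
// version before moving the cluster up one minor release.
// A sketch only; endpoints and timeouts are illustrative assumptions.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	endpoints := []string{"https://10.0.0.1:2379", "https://10.0.0.2:2379", "https://10.0.0.3:2379"}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer cli.Close()

	versions := map[string]bool{}
	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		resp, err := cli.Status(ctx, ep) // Status reports the server version per endpoint
		cancel()
		if err != nil {
			log.Fatalf("member %s unreachable, do not upgrade: %v", ep, err)
		}
		versions[resp.Version] = true
		fmt.Printf("%s is running etcd %s\n", ep, resp.Version)
	}

	if len(versions) != 1 {
		log.Fatal("mixed versions detected; finish the previous upgrade before continuing")
	}
	fmt.Println("cluster is uniform; safe to upgrade one minor version at a time")
}
```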
I don't know about you, but my application is tested on a single platform/stack with a specific set of operations. When the thing I'm running on changes how it operates, my application has changed too. It just can't be expected to run the same way. An upgrade means your app is going to work differently.

Not only is the app now different, but the upgrade itself is going to be dangerous. The idea that you can just "upgrade a running cluster" is a bit like saying you can "perform maintenance on a rolling car". It is physically possible. It is also a terrible idea.

You can do some maintenance while the car is rolling, mainly things inside the car that don't affect its operation or safety. But if you want to make significant changes, you should probably stop the thing first. If you're in the 1994 film *Speed* and you literally can't stop the vehicle, you do the next best thing: get another bus running alongside the first one and move people over. Just, uh, be careful of flat tires. (https://www.youtube.com/watch?v=rxQI2vBCDHo)
I'm old school.
I look at containers as jails, and I see all the work that goes into isolating applications in containers as adding little value over a flat process space with MAC and per-application resource controls in well-designed applications.

That is, I default to good design and testing rather than boilerplate orchestration and external control planes.

All containers have done (popularly), in my opinion, is add complexity and insecurity to the OS environment and encourage bad behavior in software development and systems administration.
This may be an unpopular opinion, but I'm not a big fan of containers and K8s.

If your app needs a container to run properly, it's already a mess.

While what K8s has done for containers is freaking impressive, to me it does not make a lot of sense unless you run your own bare-metal servers. Even then, the complexity it adds may not be worth it. Did I mention that the tech is not mature enough to just run on autopilot, and now instead of worrying about the "devops" for your app/service you are playing catch-up with upgrading your K8s cluster?

If you're in the cloud, VMs + autoscaling or fully managed services (e.g. S3, Lambda, etc.) make more sense and allow you to focus on your app. Yes, there is lock-in. Yes, if not properly architected it can be a mess.

I wish we lived in a world where people picked simple over complex and thought long term instead of chasing the latest hotness.
The clustering story for etcd is pretty lacking in general. The discovery mechanisms are not built for cattle-style infrastructure or public clouds; i.e., it is difficult to bootstrap a cluster on a public cloud without knowing in advance the network addresses your nodes will have, and the alternatives are to already have an etcd cluster to bootstrap from or to use SRV records. In my experience etcd makes it hard to use auto-scaling groups for healing and rolling updates.

In my experience Consul has a better clustering story, but I'd be curious why etcd won out over other technologies as the k8s datastore of choice.
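On the SRV-record option specifically: etcd's DNS discovery resolves _etcd-server._tcp.<domain> to find its peers, which means you need a DNS zone populated before the first member boots. Here's a rough sketch of what that resolution amounts to, with etcd.example.com as a hypothetical discovery domain:

```go
// Resolve the peer list the way etcd's --discovery-srv bootstrap does:
// look up _etcd-server._tcp.<domain> and print the advertised peers.
// The domain is a made-up example; a real cluster needs these records
// published before any member starts.
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	const domain = "etcd.example.com" // hypothetical discovery domain

	// LookupSRV builds the _etcd-server._tcp.<domain> query for us.
	_, records, err := net.LookupSRV("etcd-server", "tcp", domain)
	if err != nil {
		log.Fatalf("SRV lookup failed: %v", err)
	}

	for _, srv := range records {
		fmt.Printf("peer %s:%d\n", srv.Target, srv.Port)
	}
}
```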
To sidestep upgrade issues, we're pursuing stateless, immutable K8s clusters as much as possible. If we need a new K8s, etcd, etc., we'll spin up a new cluster and move the apps over. Data at rest (prod DBs, prod POSIX filesystems, prod object stores, etc.) lives outside the clusters.
This article really hits home.

A K8s cluster can survive just about anything. Worker nodes destroyed? Meh, the scheduler will take care of bringing things back up. Master nodes destroyed? Meh, it doesn't care.

etcd issues, though? Prepare for a whole lot of pain. Fortunately they're very uncommon; upgrading is the most frequent operation you'll perform on it.
In my experience, running etcd in cluster mode simply creates too many problems. It scales vertically very well, and if you run etcd (and the other Kubernetes control plane components) on top of Kubernetes, you can get away with running only a single instance.
etcd misbehaving during upgrades or when a VM was replaced was a *massive* source of bugs for Cloud Foundry.

There is no longer an etcd anywhere in Cloud Foundry.
Lol. CoreOS and HashiCorp products often throw “cloud” and “discoverability” around but lack crucial ops-supportability features found in solutions that came before. ZooKeeper, Cassandra, and Couchbase didn't evolve in a development vacuum chamber. New != better.