Zero-Downtime Kubernetes Deployments on AWS with EKS

203 points by pmig, 2 months ago

14 comments

paranoidrobot, 2 months ago
We had to figure this out the hard way, and ended up with this approach (approximately).

K8S provides two (well, three, now) health checks. How this interacts with ALB is quite important.

Liveness should always return 200 OK unless you have hit some fatal condition where your container considers itself dead and wants to be restarted.

Readiness should only return 200 OK if you are ready to serve traffic.

We configure the ALB to only point to the readiness check.

So our application lifecycle looks like this:

* Container starts
* Application loads
* Liveness begins serving 200
* Some internal health checks run and set readiness state to True
* Readiness checks now return 200
* ALB checks begin passing and so pod is added to the target group
* Pod starts getting traffic

Time passes. Eventually, for some reason, the pod needs to shut down:

* Kube calls the preStop hook
* preStop sends SIGUSR1 to the app and waits for N seconds
* The app handler for SIGUSR1 tells the readiness hook to start failing
* ALB health checks begin failing, and no new requests should be sent
* ALB takes the pod out of the target group
* preStop hook finishes waiting and returns
* Kube sends SIGTERM
* App wraps up any remaining in-flight requests and shuts down

This allows the app to do graceful shutdown, and ensures the ALB doesn't send traffic to a pod that knows it is being shut down.

Oh, and on the readiness check - your app can use this to (temporarily) signal that it is too busy to serve more traffic. Handy as another signal you can monitor for scaling.
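A minimal sketch of what the application side of this lifecycle could look like in Go (not from the comment; the /healthz and /readyz paths, the :8080 port, and the 30-second drain timeout are assumptions): liveness always answers 200, readiness is a flag, SIGUSR1 from the preStop hook flips readiness to failing, and SIGTERM drains in-flight requests.

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	var ready atomic.Bool

	mux := http.NewServeMux()
	// Liveness: only fails on fatal conditions, so always 200 here.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: the ALB target-group health check points at this.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if ready.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}
	go srv.ListenAndServe()

	// "Some internal health checks run and set readiness state to True."
	ready.Store(true)

	sigs := make(chan os.Signal, 2)
	signal.Notify(sigs, syscall.SIGUSR1, syscall.SIGTERM)
	for sig := range sigs {
		if sig == syscall.SIGUSR1 {
			// preStop hook fired: stop advertising readiness so the ALB
			// fails its checks and removes the pod from the target group.
			ready.Store(false)
			continue
		}
		// SIGTERM: wrap up in-flight requests, then exit.
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		srv.Shutdown(ctx)
		cancel()
		return
	}
}
```

The preStop hook itself would then just send the signal and sleep for N seconds, as described in the list above.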
bradleyy, 2 months ago
I know this won't be helpful to folks committed to EKS, but AWS ECS (i.e. running Docker containers with AWS controlling them) does a really great job on this. We've been running ECS for years (at multiple companies), and basically no hiccups.

One of my former co-workers went to a K8S shop, and longs for the simplicity of ECS.

No software is a panacea, but ECS seems to be one of those "it just works" technologies.
_bare_metal, 2 months ago
I run <a href="https:&#x2F;&#x2F;BareMetalSavings.com" rel="nofollow">https:&#x2F;&#x2F;BareMetalSavings.com</a>.<p>The amount of companies who use K8s when they have no business nor technological justification for it is staggering. It is the number one blocker in moving to bare metal&#x2F;on prem when costs become too much.<p>Yes, on prem has its gotchas just like the EKS deployment described in the post, but everything is so much simpler and straightforward it&#x27;s much easier to grasp the on prem side of things.
paol, 2 months ago
I&#x27;m not sure why they state &quot;although the AWS Load Balancer Controller is a fantastic piece of software, it is surprisingly tricky to roll out releases without downtime.&quot;<p>The AWS Load Balancer Controller uses readiness gates by default, exactly as described in the article. Am I missing something?<p>Edit: Ah, it&#x27;s not by default, it requires a label in the namespace. I&#x27;d forgotten about this. To be fair though, the AWS docs tell you to add this label.
cassianoleal, 2 months ago
A few years ago, while helping build a platform on Google Cloud & GKE for a client, we found the same issues.

At that point we already had a CRD used by most of our tenant apps, which deployed an opinionated (but generally flexible enough) full app stack (Deployment, Service, PodMonitor, many sane defaults for affinity/anti-affinity, etc., lots of which configurable, and other things).

Because we didn't have an opinion on what tenant apps would use in their containers, we needed a way to make the pre-stop sleep small but OS-agnostic.

We ended up with a 1 LOC (plus headers) C app that compiled to a tiny static binary. This was put in a ConfigMap, which the controller mounted on the Pod, from where it could be executed natively.

Perhaps not the most elegant solution, but a simple enough one that got the job done and was left alone with zero required maintenance for years - it might still be there to this day. It was quite fun to watch the reaction of new platform engineers the first time they'd come across it in the codebase. :D
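For illustration only, a hypothetical sketch of such a pre-stop sleep binary, written here in Go rather than the commenter's one-line C program (the C version is what keeps the static binary small enough to ship in a ConfigMap; a Go binary is considerably larger, so treat this purely as a sketch of the behavior): it just sleeps for a drain window so the load balancer can stop routing to the pod before SIGTERM arrives, without assuming the container image ships /bin/sleep.

```go
package main

import (
	"os"
	"strconv"
	"time"
)

func main() {
	seconds := 20 // assumed default drain window; overridden by an optional argument
	if len(os.Args) > 1 {
		if n, err := strconv.Atoi(os.Args[1]); err == nil {
			seconds = n
		}
	}
	// Block for the drain window, then exit so the preStop hook completes
	// and Kubernetes proceeds to send SIGTERM to the main container process.
	time.Sleep(time.Duration(seconds) * time.Second)
}
```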
NightMKoder, 2 months ago
This is actually a fascinatingly complex problem. Some notes about the article:

* The 20s delay before shutdown is called "lame duck mode." As implemented it's close to good, but not perfect.
* When in lame duck mode you should fail the pod's health check. That way you don't rely on the ALB controller to remove your pod. Your pod is still serving other requests, but gracefully asking everyone to forget about it.
* Make an effort to close HTTP keep-alive connections. This is more important if you're running another proxy that won't listen to the health checks above (e.g. AWS -> Node -> kube-proxy -> pod). Note that you can only do that when a request comes in - but it's as simple as a Connection: close header on the response.
* On a fun note, the new-ish Kubernetes graceful node shutdown feature won't remove your pod readiness when shutting down.
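A hedged sketch of the first three notes (the /readyz path, the port, and triggering the drain from SIGTERM rather than a preStop window are assumptions made for brevity): while in lame duck mode the readiness endpoint fails, the pod keeps serving what it has, and every response carries a Connection: close header so keep-alive clients and intermediate proxies stop reusing the connection.

```go
package main

import (
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
)

var draining atomic.Bool

// withLameDuck closes keep-alive connections once the pod is draining,
// which matters when an intermediate hop (e.g. ALB -> node -> kube-proxy
// -> pod) doesn't watch the readiness probe.
func withLameDuck(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			// Go's HTTP server closes the connection after a response
			// that sets this header.
			w.Header().Set("Connection", "close")
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	// Readiness fails while draining: the pod still serves traffic,
	// but gracefully asks everyone to forget about it.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	srv := &http.Server{Addr: ":8080", Handler: withLameDuck(mux)}

	// Enter lame duck mode on SIGTERM (in practice this could also be the
	// preStop signal); disabling keep-alives stops connection reuse too.
	go func() {
		sigs := make(chan os.Signal, 1)
		signal.Notify(sigs, syscall.SIGTERM)
		<-sigs
		draining.Store(true)
		srv.SetKeepAlivesEnabled(false)
	}()

	srv.ListenAndServe()
}
```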
happyweasel, 2 months ago
> The truth is that although the AWS Load Balancer Controller is a fantastic piece of software, it is surprisingly tricky to roll out releases without downtime.

20 years ago we used simple bash scripts with curl to make REST calls that took one host out of our load balancers, then scp'd to the host and shut down the app gracefully, updated the app using scp again, then put it back into the load balancer after testing the host on its own. We had 4 or 5 scripts max, straightforward stuff.

They charge $$$ and you get downtime in this simple scenario?
glenjamin, 2 months ago
The fact that the state-of-the-art container orchestration system requires you to run a sleep command in order to not drop traffic on the floor is a travesty of system design.

We had perfectly good rolling deploys before k8s came on the scene, but k8s's insistence on a single-phase deployment process means we end up with this silly workaround.

I yelled into the void about this once and I was told that this was inevitable because it's an eventually consistent distributed system. I'm pretty sure it could still have had a two-phase pod shutdown by encoding a timeout on the first stage. Sure, it would have made some internals require more complex state - but isn't that the point of k8s? Instead everyone has to rediscover the sleep hack over and over again.
js2, 2 months ago
Nit: the "How we archived" subheading should be "How we achieved".
strangelove026, 2 months ago
We're using Argo Rollouts without issue. It's a superset of a Deployment with configuration-based blue-green deploys or canary. Works great for us and allows us to get around the problem laid out in this article.
jayd16, 2 months ago
Does this, or any of the strategies listed in the comments, properly handle long-lived client connections? It's sufficient to wait for the LB to stop sending traffic when connections last 100s of ms or less, but when connections are minutes or even hours long it doesn't work out well.

Is there a slick strategy for this? Is it possible to have minutes-long pre-stop hooks? Is the only option to give client connections an abandon-ship message and kick them out, hopefully fast enough?
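One possible answer, sketched under assumptions (an SSE-style stream and a process-wide drain channel closed when shutdown starts, neither of which comes from the thread): the handler sends the client an explicit "reconnect" event and ends the stream, so connections that might otherwise live for hours don't require an hours-long preStop hook.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// drain is closed when the pod starts shutting down; every long-lived
// handler watches it so it can send an "abandon ship" message and return.
var drain = make(chan struct{})

func events(w http.ResponseWriter, r *http.Request) {
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")

	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Fprint(w, "data: ping\n\n") // normal long-lived traffic
			flusher.Flush()
		case <-drain:
			// Tell the client to reconnect (it should land on a healthy pod),
			// then end the stream instead of waiting out a long preStop hook.
			fmt.Fprint(w, "event: reconnect\ndata: draining\n\n")
			flusher.Flush()
			return
		case <-r.Context().Done():
			return
		}
	}
}

func main() {
	go func() {
		sigs := make(chan os.Signal, 1)
		signal.Notify(sigs, syscall.SIGTERM)
		<-sigs
		close(drain) // kick long-lived clients out before termination
	}()
	http.HandleFunc("/events", events)
	http.ListenAndServe(":8080", nil)
}
```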
gurrone, 2 months ago
Might be noteworthy that in recent enough k8s, lifecycle.preStop.sleep.seconds is implemented (https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3960-pod-lifecycle-sleep-action/README.md#summary), so there is no longer any need to run an external sleep command.
yosefmihretie, 2 months ago
Highly recommend Porter if you are a startup that doesn't wanna think about things like this.
evacchi, 2 months ago
Somewhat related: https://architect.run/

> Seamless Migrations with Zero Downtime

(I don't work for them, but they are friends ;))