It feels like it should be possible to fix the reconnect experience, especially in the planned-termination-of-the-underlying-container case: if you ask the client to reconnect rather than abruptly disconnecting it, it could even wait until its new session is fully established before dropping the old one.

That doesn't take away from my appreciation of the pattern, though: I'm very much in favour of rolling releases forwards rather than being limited to two colours.
Question for the OP:

I haven't ever worked on chat services, so this may not be reasonable. Would it be possible to use some other termination endpoint that sits in front of the service, one that lets you maintain persistent connections to the clients but makes swaps of backend services more transparent?

For example, could you leverage nginx or haproxy as the "termination" point for the chat connection, with those proxying back to the Kubernetes Service endpoint, which then proxies back to the real backend service? Then, when you go to swap out the backend service, nginx/haproxy start forwarding to the new service transparently while still maintaining the long-lived connection with the client.

If this were doable, you'd only have to drain when you needed to swap out the proxy layer, which is likely a less frequent task, and that gives you more agility with your backend services.
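To make that idea concrete, here's a toy Go stand-in for the nginx/haproxy layer (purely illustrative, not something from the article): it owns the client connection and dials whatever the backend name resolves to for each new connection, so new clients land on the new backend. One caveat: connections already proxied to the old backend stay pinned to it, so that layer still needs to drain when the old pods go away. backendAddr is a hypothetical Service DNS name.

    package main

    import (
        "io"
        "log"
        "net"
    )

    const (
        listenAddr = ":9000"
        // Hypothetical Service DNS name. It is re-resolved on every new client
        // connection, so repointing the Service sends new clients to the new
        // backend while existing connections stay where they are.
        backendAddr = "chat-backend.default.svc.cluster.local:8080"
    )

    func main() {
        ln, err := net.Listen("tcp", listenAddr)
        if err != nil {
            log.Fatal(err)
        }
        for {
            client, err := ln.Accept()
            if err != nil {
                log.Print(err)
                continue
            }
            go func() {
                defer client.Close()
                upstream, err := net.Dial("tcp", backendAddr)
                if err != nil {
                    log.Print(err)
                    return
                }
                defer upstream.Close()
                go io.Copy(upstream, client) // client -> backend
                io.Copy(client, upstream)    // backend -> client
            }()
        }
    }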
Seems like the kind of thing that a Deployment should be able to manage on its own... some kind of DrainPolicy object, maybe?

Also, if the previous ReplicaSet a Deployment is rolling past has several pods, maybe only some of them need to stay alive (some may drain sooner than others).

Perhaps the whole endeavor should just be to make Pod drainage a bit more explicit than terminationGracePeriodSeconds alone... let a pod signal with a positive confirmation that it's shutting down (letting connections drain), and the rest of the k8s controllers can just leave it alone until it terminates itself.

Although really, I think a combination of setting terminationGracePeriodSeconds to unlimited and having a health check that ensures the pod doesn't get wedged and miss the termination signal (by checking that a pod status of "shutting down" corresponds to some property of the container, like a health endpoint saying the shutdown is in progress) is all that's needed. Basically, color me skeptical when they say:

"We used service-loadbalancer to stick sessions to backends and we turned up the terminationGracePeriodSeconds to several hours. This appeared to work at first, but it turned out that we lost a lot of connections before the client closed the connection. We decided that we were probably relying on behavior that wasn’t guaranteed anyways, so we scrapped this plan."

(This also depends on the container obeying the standard SIGTERM contract to properly drain connections but not accept new ones, which is pretty standard in most web servers nowadays; a sketch of that contract follows below.)
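A minimal Go sketch of that SIGTERM contract, assuming a plain HTTP server (my own illustration, not from the article): stop accepting new connections on SIGTERM, drain what's in flight, and exit on your own schedule. Note that http.Server.Shutdown does not wait for hijacked connections such as websockets, so a websocket server would need to track and drain those itself.

    package main

    import (
        "context"
        "log"
        "net/http"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK)
        })
        srv := &http.Server{Addr: ":8080"}

        go func() {
            if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
                log.Fatalf("listen: %v", err)
            }
        }()

        // Kubernetes sends SIGTERM when the pod is deleted; wait for it.
        stop := make(chan os.Signal, 1)
        signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
        <-stop

        // Close the listener (no new connections) and wait for in-flight
        // work to finish, bounded by a deadline that should sit inside
        // terminationGracePeriodSeconds.
        ctx, cancel := context.WithTimeout(context.Background(), 4*time.Hour)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("drain incomplete: %v", err)
        }
    }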
I'm not sure what problem the author is solving. I might be misunderstanding something.

The author points out that the issue with blue/green/any-color deploys is that they need 16 pods per color at all times (which in their case would end up being 128 pods) and 24-48 hours for each connection pool to drain.

But how is using a SHA instead of a color any different? Unless I am missing something, if running 128 pods and 24-48 hours of draining is the issue, then using SHAs instead of colors doesn't solve either problem.

You'll still need 16 pods and 24-48 hours per SHA deploy, and you're actually making it worse by not using fixed colors, since you have quite a lot more SHAs at your disposal.
This was really interesting. I'm thinking about moving to Kubernetes and have wondered how to gracefully deal with websocket connections.

I'm curious, though: if the rollout happened over a couple of hours, for example, why would reconnections be that big of a problem? We host 10,000+ websockets on a $20 VPS, and the Go server hosting them crashes from time to time. A surge of 10,000 reconnections immediately afterwards has never lasted more than a minute or so, so why is it so bad? Moments of peak load aren't that big of a deal, are they?
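One common way to keep that kind of reconnect surge from turning into a thundering herd is jittered exponential backoff on the client side. A rough Go sketch (the dial function here is just a placeholder):

    package main

    import (
        "errors"
        "log"
        "math/rand"
        "time"
    )

    // reconnect retries dial with exponential backoff plus jitter, so thousands
    // of clients dropped at the same moment don't all come back in the same
    // instant.
    func reconnect(dial func() error) {
        backoff := time.Second
        const maxBackoff = 30 * time.Second
        for {
            if err := dial(); err == nil {
                return
            }
            // Sleep somewhere between 0.5x and 1.5x of the current backoff.
            sleep := backoff/2 + time.Duration(rand.Int63n(int64(backoff)))
            log.Printf("reconnect failed, retrying in %v", sleep)
            time.Sleep(sleep)
            if backoff *= 2; backoff > maxBackoff {
                backoff = maxBackoff
            }
        }
    }

    func main() {
        attempt := 0
        reconnect(func() error {
            if attempt++; attempt < 3 {
                return errors.New("server still restarting")
            }
            log.Println("reconnected")
            return nil
        })
    }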
This is a great use case for kube-metacontroller, which was introduced in the Day 2 keynote at KubeCon. With minimal work, you can replicate a Deployment or StatefulSet, but with custom update strategies.

Live demo: https://youtu.be/1kjgwXP_N7A?t=10m46s
Code: https://github.com/kstmp/metacontroller
> So far we haven’t found a good way of detecting a lightly used deployment, so we’ve been cleaning them up manually every once in a while.

Am I missing something, or wouldn’t it be as “simple” as connecting to the running container, running netstat, and conditionally killing the pod based on the number of connections? I bet you thought of that, so I’m curious why it didn’t work for you.
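For what it's worth, here's a rough Go sketch of that check (mine, not the article's tooling): instead of exec'ing netstat it reads /proc/net/tcp inside the pod's network namespace, counts ESTABLISHED sockets, and exits non-zero while the pod still looks busy, so an external cleanup job could delete the pod only once this succeeds. The threshold is a made-up example.

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    // Hypothetical cutoff below which a deployment counts as "lightly used".
    const drainThreshold = 10

    // established counts ESTABLISHED sockets listed in a /proc/net/tcp-style file.
    func established(path string) int {
        f, err := os.Open(path)
        if err != nil {
            return 0
        }
        defer f.Close()

        count := 0
        scanner := bufio.NewScanner(f)
        scanner.Scan() // skip the header line
        for scanner.Scan() {
            fields := strings.Fields(scanner.Text())
            // Field 3 is the socket state; "01" means ESTABLISHED.
            if len(fields) > 3 && fields[3] == "01" {
                count++
            }
        }
        return count
    }

    func main() {
        total := established("/proc/net/tcp") + established("/proc/net/tcp6")
        fmt.Println(total, "established connections")
        if total > drainThreshold {
            os.Exit(1) // still serving; not safe to clean up yet
        }
    }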
One thing I didn't put in here that's also turned out to be useful: We can prerelease things relatively easily this way too. Each deployment has a git sha, and we can have a canary/beta/dogfood version that points at an entirely different sha.
> We still have one unsolved issue with this deployment strategy: how to clean up the old deployments when they’re no longer serving (much) traffic.

Could probably solve this with a readiness probe / health check of sorts that is smart enough to know what low usage means.
Curious about the 24h-48h burndown...could it potentially be longer for you guys or is there some mechanism in place to force disconnection (and thus risk a spike) after some TTL?
TL;DR: You can drain stuff by changing a Service's selector but leaving the Deployment alone. Instead of changing a Deployment and doing a rolling update, create a new Deployment and repoint the Service. Existing connections will remain until you delete the underlying Deployment.
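To make that concrete, here's a minimal client-go sketch of the repointing step (the namespace, Service name, and labels are illustrative, not the article's): patch the Service's selector to the new per-sha Deployment, leaving the old Deployment and its connections alone.

    package main

    import (
        "context"
        "fmt"
        "log"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/types"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Illustrative names; the real label scheme depends on how the
        // per-sha Deployments are labelled.
        const (
            namespace = "default"
            service   = "chat"
            newSHA    = "3f9c2ab"
        )

        config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
        if err != nil {
            log.Fatal(err)
        }
        client, err := kubernetes.NewForConfig(config)
        if err != nil {
            log.Fatal(err)
        }

        // Repoint the Service at the pods of the new per-sha Deployment. Pods
        // from the old Deployment keep their existing connections; they just
        // stop receiving new ones.
        patch := []byte(fmt.Sprintf(`{"spec":{"selector":{"app":"chat","sha":"%s"}}}`, newSHA))
        if _, err := client.CoreV1().Services(namespace).Patch(
            context.TODO(), service, types.StrategicMergePatchType, patch, metav1.PatchOptions{},
        ); err != nil {
            log.Fatal(err)
        }
    }

The same patch could also be applied by hand with kubectl; either way, the old pods simply stop getting new connections and can be deleted once they've drained.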