I wouldn't trust the management of this team with anything. They appear totally incompetent in both management and basic analytical skills. Who in the heck creates a cluster per service per cloud provider, duplicates all the supporting services around it, burns money and sanity in a pit, and then blames the tool?

Literally every single decision they listed was to use each of the given tools in the absolute worst, most incompetent way possible. I wouldn't trust them with a Lego set with this record.

The people who quit didn't quit merely out of burnout. They quit the stupidity of the managers running this s##tshow.
Why exactly did they have 47 clusters? One thing I've noticed (maybe because I'm not at that scale) is that companies are running 1+ clusters per application. Isn't the point of Kubernetes that you can run your entire infra in a single cluster, at most adding a second cluster for redundancy, while spreading nodes across regions and AZs and even clouds?

I think the bottleneck is networking and how much crosstalk your control nodes can take, but that's your architecture team's job, no?
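For the "one cluster, spread across zones" point, a minimal sketch of what that looks like in practice. The Deployment name and image are made up, but topologySpreadConstraints and the zone label are stock Kubernetes:

```yaml
# Hypothetical service spread evenly across availability zones
# within a single cluster, instead of one cluster per service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server            # made-up service name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule   # insist on an even AZ spread
          labelSelector:
            matchLabels:
              app: api-server
      containers:
        - name: api-server
          image: example/api:1.0             # placeholder image
```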
If you have 200 YAML files for a *single service* and 47 clusters, I think you're using k8s wrong. And 5 + 3 different monitoring and logging tools could be a symptom of chaos in the organization.

k8s, the Go runtime, and the network stack have been heavily optimized by armies of engineers at Google and big tech, so I am very suspicious of these claims without evidence. Show me the resource usage from k8s component overhead, and the 15-minute vs. 3-minute deploys, and then I'll believe you. And the 200-file YAML or Helm charts, so I can understand why in God's name you're doing it that way.

This post just needs a lot more details. What are the typical services/workloads running on k8s? What's the end-user application?

I taught myself k8s in the first month of my first job, and it felt like having superpowers. The core concepts are very beautiful, like processes on Linux or JSON APIs over HTTP. And it's not too hard to build a CustomResourceDefinition or dive into the various high-performance disk and network IO components if you need to.

I do hate Helm to some degree, but there are alternatives like Kustomize (minimal sketch below), Jsonnet/Tanka, or Cue: https://github.com/cue-labs/cue-by-example/tree/main/003_kubernetes_tutorial#controlling-kubernetes-with-cue. You can even manage k8s resources via Terraform or Pulumi.
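Since the 200-file complaint keeps coming up: a minimal Kustomize sketch of how a shared base plus per-environment overlays keeps that number down. Directory names and the patch target are hypothetical:

```yaml
# overlays/prod/kustomization.yaml, layered on a shared base.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base          # one canonical Deployment/Service per app lives here
patches:
  # prod only overrides what differs from the base, e.g. replica count
  - target:
      kind: Deployment
      name: api-server  # made-up name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 6
```

One base plus a handful of small overlay patches per environment, instead of 200 near-duplicate manifests.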
How to save $1M on your cloud infra? Start from a $2M bill.

That's how I see most of these projects. You create a massively expensive infra because webscale, then 3 years down the road you (or someone else) get to rebuild it 10x cheaper. You get to write two blog posts, one for adopting $tech and one for migrating off $tech. A line in the CV and a promotion.

But kudos to them for managing to stop the snowball and actually reverting course. Most places wouldn't dare because of sunk costs.
How do you end up with 200-YAML-file “basic deployments” without anyone looking up from their keyboard and muttering “guys, what are we doing”?

Honestly, they could have picked any stack as the next one, because the key win here was starting from scratch.
So they made bad architecture decisions, blamed it on Kubernetes for some reason, and then decided to rebuild everything from scratch. Solid. The takeaway being what? Don't make bad decisions?
Like most tech stories, this had pretty much nothing to do with the tool itself and everything to do with the people/organization. The entire article can be summarized with this one quote:

> In short, organizational decisions and an overly cautious approach to resource isolation led to an unsustainable number of clusters.

And while I empathize with how they could end up in this situation, it feels like a lot of words were spent blaming the tool choice instead of being a cautionary tale about, for example, planning and communication.
So many astonishing things were done...

> As the number of microservices grew, each service often got its own dedicated cluster.

Wow. Just wow.
Are the people who decided to spin up a separate kubernetes cluster for each microservice still employed at your organization? If so, I don't have high hopes for your new solution either.
I feel like OP would've been much better off if they had just reworked their cluster setup into something sensible instead of abandoning K8s completely.

I've worked on both ECS and K8s, and K8s is much better. All of the problems they listed were poor design decisions, not K8s limitations.

- 47 clusters: This is insane. They acknowledge it in the post, but they could've reworked this.

- Multi-cloud: It's no longer possible now that they're on ECS, and they could've gotten the same reduction in complexity with single-cloud k8s.
We have 3 clusters, prod, dev, test, with a few pods each.

Each cluster is wasting tons of CPU and I/O bandwidth just to sit idle. I was told that it is etcd doing thousands of I/O operations per second and that this is normal.

For a few monoliths.
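If anyone wants to put a number on that on their own clusters, a hedged sketch of a Prometheus rule watching etcd's WAL fsync rate. This assumes the kube-prometheus-stack CRDs and that your etcd exposes metrics; the threshold and names are made up:

```yaml
# Hypothetical PrometheusRule: flag an "idle" cluster whose etcd
# is still churning disk via constant WAL fsyncs.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-idle-churn
  namespace: monitoring
spec:
  groups:
    - name: etcd.io
      rules:
        - alert: EtcdHighWalFsyncRate
          # etcd_disk_wal_fsync_duration_seconds is a standard etcd histogram
          expr: rate(etcd_disk_wal_fsync_duration_seconds_count[5m]) > 100
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "etcd fsyncing >100 times/s on a mostly idle cluster"
```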
47 clusters? Is that per developer? You could manage a small, disposable VPS for every developer/environment, etc., and have a Kubernetes cluster only for the production environment...
Too bad the author and company are anonymous. I'd like to confirm my assumption that the author has zero business using k8s at all.

Infrastructure is a lost art. Nobody knows what they're doing. We've entered an evolutionary spandrel where "more tools = better", meaning the candidate for an IT role who swears by 10 k8s tools always beats the one who could actually fix your infra, and who would also remove k8s because it's not helping you at all.
The leaps in this writing pain me. There are other aspects, but they've been mentioned enough.

Vendor lock-in does not come about by relying on only one cloud, but by adopting non-standard technology and interfaces. I do agree that running on multiple providers is the best way of checking whether there is lock-in.

Lowering the level of sharing further by running per-service and per-stage clusters, as mentioned in the piece, was likewise at best an uninformed decision.

Naturally, moving to AWS and letting dedicated teams handle workload orchestration at much higher scale will yield better efficiencies. Ideally without giving up vendor-agnostic deployments, by continuing the use of IaC.
Sensible. Kubernetes is an anti-pattern, along with containerized production applications in general.

- replicates OS services poorly

- the OS is likely already running on a hypervisor divvying up hardware resources into VPSes

- wastes RAM and CPU cycles

- forces kubectl onto everything

- destroys the integrity of basic kernel networking principles

- takes advantage of developer ignorance of the OS and reinforces it

I get it, it's a handy hack for non-production services or one-off installs, but then it's basically just a glorified VM.
> $25,000/month just for control planes

To get to this point, someone must have fucked up way earlier by not knowing what they were doing. Don't do k8s, kids!
I looked into K8s some years back and found so many new concepts that I thought: is our team big enough for this much "new"?

Then I read someone saying that K8s should never be used by teams of <20 FTE, and that it requires 3 people learning it for redundancy (when it's used to self-host a SaaS product). This seemed like really good advice.

Our team is smaller than 20 FTE, so we use AWS/Fargate now. Works like a charm.
What else is out there? I'm running Docker Swarm and it's extremely hard to make it work with IPv6. I'm running my software on a 1GB RAM cloud instance for 4EUR/month, and k8s alone wants at least 1GB of RAM.

As of now, it seems like my only alternative is to run k8s on a 2GB RAM system, so I'm considering moving to Hetzner just to run k3s or k0s.
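For what it's worth, k3s can be slimmed down further on a low-RAM box by disabling its bundled components. A sketch of a server config, assuming current k3s option names (check them against your version):

```yaml
# /etc/rancher/k3s/config.yaml -- keys mirror k3s server CLI flags.
# Drop bundled extras a tiny single-node setup may not need:
disable:
  - traefik          # bundled ingress controller
  - metrics-server
  - servicelb        # bundled load balancer
# Keep the kubelet's own bookkeeping small on a 1-2GB node:
kubelet-arg:
  - "max-pods=20"
```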
I've read this article multiple times now and I'm still not sure if it's good satire, or real and they can burn money like crazy, or some subtle ad for AWS managed cloud services :)
It's the same story over and over again. Nobody gets fired for choosing AWS or Azure. Clueless managers and resume-driven developers: a terrible combination.
The good thing is that this leaves a lot of room for small companies, which can outcompete larger ones just by not making those dumb choices.
Is there a non-paywalled version of this? The title is a little clickbaity, but from the comments here it seems this is a company that jumped on the k8s bandwagon, made a lot of terrible decisions along the way, and is now blaming k8s for everything.
When they pulled apart all those Kubernetes clusters they probably found a single fat computer would run their entire workload.

“Hey, look under all that DevOps cloud infrastructure! There's some business logic! It's been squashed flat by the weight of all the containers and orchestration and serverless functions and observability and IAM.”
> 2 team members quit citing burnout

And I would have gotten away with it too, if only someone would rid me of that turbulent meddling cluster orchestration tooling!
Kubernetes is not a one-size-fits-all solution, but even the bullet points in the article raise a number of questions. I have been working with Kubernetes since 2016 and try to stay pragmatic about tech. We currently support 20+ clusters with a team of 5 people across 2 clouds plus on-prem. If Kubernetes is fine for a given company/project/business use case/architecture, we'll use it. Otherwise we'll consider whatever fits the specific target requirements best.

Smelly points from the article:

- "147 false positive alerts" - alert and monitoring hygiene helps; anything will have a low signal-to-noise ratio if not properly taken care of (see the Alertmanager sketch at the end of this comment). Been there, done that.

- "$25,000/month just for control planes / 47 clusters across 3 cloud providers" - multiple questions here. Why so many clusters? Were they provider-managed (EKS, GKE, AKS, etc.) or self-managed? $500 per control plane per month is too much. A cost breakdown would be great.

- "23 emergency deployments / 4 major outages" - what was the nature of the emergencies and outages? Post-mortem RCA summary? Lessons learnt?

- "40% of our nodes running Kubernetes components" - a potential indicator of a huge number of small worker nodes. Was the cluster autoscaler used? The descheduler? What were those components?

- "3x redundancy for high availability" - depends on your SLO, risk appetite, and budget. It is fine to have 2x across 3 redundancy zones and stay lean on resource and budget usage, and it is not mandatory for *everything* to be highly available 24/7/365.

- "60% of DevOps time spent on maintenance" - https://sre.google/workbook/eliminating-toil/

- "30% increase in on-call incidents" - post-mortems, RCA, lessons learnt? On-call incidents do not increase just because a specific tool or technology is being used.

- "200+ YAML files for basic deployments" - there are multiple ways to organise and optimise configuration management. How was it done in the first place?

- "5 different monitoring tools / 3 separate logging solutions" - should be at most one of each. 3 different cloud providers? So come up with a cloud-agnostic solution.

- "Constant version compatibility issues" - happens when due diligence isn't done properly. Also, the Kubernetes API is fairly stable (existing APIs preserve backwards compatibility) and predictable in terms of deprecating existing APIs.

That being said, glad to know the team has benefited from ditching Kubernetes. Just keep in mind that this "you don't need ${TECHNOLOGY_NAME} and here is why" genre is oftentimes an emotional generalisation of someone's particular experience and cannot be applied as a universal rule.
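On the alert-hygiene point above, a minimal Alertmanager sketch of what "taken care of" means in practice: group related alerts, wait out flapping, suppress downstream noise, and page only on real severity. The receiver names and the ClusterDown alert are made up for illustration:

```yaml
# alertmanager.yml sketch -- routing and inhibition to cut false-positive pages.
route:
  receiver: slack-noise          # default sink: a channel, not a pager
  group_by: [alertname, cluster] # batch alerts that fire together
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: [severity="page"]
      receiver: oncall-pager     # only explicitly page-worthy alerts wake anyone
inhibit_rules:
  # if a whole cluster is down, suppress the per-service warnings beneath it
  - source_matchers: [alertname="ClusterDown"]
    target_matchers: [severity="warning"]
    equal: [cluster]
receivers:
  - name: slack-noise
  - name: oncall-pager
```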
> DevOps Team Is Happier Than Ever

Of course they are. The original value proposition of cloud providers managing your infra (and more so with k8s) was that you could fire your ops team (now called "DevOps" because that whole idea didn't pan out) and the developers could manage their services directly.

In any case, your DevOps team has job security now.
The price comparison doesn't make sense if they used to have a multi-cloud system and now it's just AWS. Makes me fear this is just content paid for by AWS. Actually getting multi-cloud to work is a huge achievement, and I would be super interested to hear of another tech standard that would make it easier.

Also: post a paywall mirror?
<a href="https://archive.is/x9tB6" rel="nofollow">https://archive.is/x9tB6</a>
How did your managers ever _ever_ sign off on something that cost an extra $0.5M?

Either you're pre-profit or some other bogus kind of entity, or your company streamlined by moving to k8s and then streamlined further by cutting away the things you didn't need.

I'm frankly just alarmed at the thought of wasting that much revenue; I could bring up a fleet of in-house racks for that money!
I feel like the Medium paywall saved me... as soon as I saw "47 clusters across 3 different cloud providers", I began to think that the tool used here might not actually be the real issue.
> We were managing 47 Kubernetes clusters across three cloud providers.

What a doozy of a start to this article. How do you even reach this point?
Oh boy. Please, please stop using Medium for anything. I have lost count of how many potentially interesting or informative articles are published behind the Medium sign-in wall. At least for me, if you aren't publishing blog articles in public, then what's the point of me trying to read them?
Don't bother reading. This is just another garbage in garbage out kind of article written by something that ends in gpt. Information density approaches zero in this one.