
Kubernetes on Hetzner: cutting my infra bill by 75%

375 points by BillFranklin · 6 months ago

38 comments

adamcharnock · 6 months ago
We've [1] been using Hetzner's dedicated servers to provide Kubernetes clusters to our clients for a few years now. The performance is certainly excellent; we typically see request times halve. And because the hardware is cheaper, we can provide dedicated DevOps engineering time to each client. There are some caveats though:

1) A staging cluster for testing updates is really a must. YOLO-ing prod updates on a Sunday is no one's idea of fun.

2) Application-level replication is king, followed by block-level replication (we use OpenEBS/Mayastor). After going through all the Postgres operators we found StackGres to (currently) be the best.

3) The Ansible playbooks are your assets. Once you have them down and well-commented for a given service, re-deploying that service in other cases (or again in the future) becomes straightforward.

4) If you can, I'd recommend a dedicated 10G network to connect your servers. 1G just isn't quite enough for the combined load of prod traffic, image pulls, and inter-service traffic. A 10G network also gives a 10x latency improvement over AWS intra-AZ.

5) If you want network redundancy, you can create a 1G vSwitch (VLAN) on the 1G ports for internal use. Give each server a loopback IP, then use BGP to distribute routes (bird).

6) MinIO clusters (via the operator) are not that tricky to operate as long as you follow the well-trodden path. This gives you local high-bandwidth, low-latency object storage.

7) The initial investment to do this does take time. I'd put it at 2-4 months of undistracted, skilled engineering time.

8) You can still push ancillary/annoying tasks off onto cloud providers (personally I'm a fan of Cloudflare for HTTP load balancing).

[1]: https://lithus.eu
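A quick way to sanity-check the latency claim in point 4 is to measure TCP connect times between nodes from each network path. A minimal Python sketch; the addresses and the port are placeholders, not values from the comment, and it assumes the peers accept connections on that port:

    import socket
    import statistics
    import time

    def tcp_connect_rtt_ms(host: str, port: int = 22, samples: int = 20) -> float:
        """Median TCP connect time in milliseconds; a rough proxy for inter-node round-trip latency."""
        timings = []
        for _ in range(samples):
            start = time.perf_counter()
            with socket.create_connection((host, port), timeout=2):
                pass
            timings.append((time.perf_counter() - start) * 1000)
        return statistics.median(timings)

    # Placeholder addresses: one peer on the private 10G network, one reached over another path.
    for host in ("10.0.1.2", "10.0.2.2"):
        print(host, round(tcp_connect_rtt_ms(host), 3), "ms")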
tutfbhuf · 6 months ago
I have experience running Kubernetes clusters on Hetzner dedicated servers, as well as working with a range of fully or highly managed services like Aurora, S3, and ECS Fargate.

From my experience, the cloud bill on Hetzner can sometimes be as low as 20% of an equivalent AWS bill. However, this cost advantage comes with significant trade-offs.

On Kubernetes with Hetzner, we managed a Ceph cluster using NVMe storage, MariaDB operators, Cilium for networking, and ArgoCD for deploying Helm charts. We had to handle Kubernetes cluster updates ourselves, which included facing a complete cluster failure at one point. We also encountered various bugs in both Kubernetes and Ceph, many of which were documented in GitHub issues and Ceph trackers. The list of tasks to manage and monitor was endless. Depending on the number of workloads and the overall complexity of the environment, maintaining such a setup can quickly become a full-time job for a DevOps team.

In contrast, using AWS or other major cloud providers allows for a more hands-off setup. With managed services, maintenance often requires significantly less effort, reducing the operational burden on your team.

In essence, with AWS, your DevOps workload is reduced by a significant factor, while on Hetzner, your cloud bill is significantly lower.

Determining which option is more cost-effective requires a thorough TCO (Total Cost of Ownership) analysis. While Hetzner may seem cheaper upfront, the additional hours required for DevOps work can offset those savings.
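The TCO comparison described above can be sketched in a few lines of Python. The figures below are illustrative assumptions, not numbers from the thread; with these particular values, the extra operations time more than offsets the cheaper bill:

    def monthly_tco(cloud_bill: float, devops_hours: float, hourly_rate: float) -> float:
        """Total monthly cost: the infrastructure bill plus the engineering time spent operating it."""
        return cloud_bill + devops_hours * hourly_rate

    # Assumed inputs: Hetzner bill at 20% of the AWS bill, but far more hands-on operations work.
    aws = monthly_tco(cloud_bill=10_000, devops_hours=20, hourly_rate=120)
    hetzner = monthly_tco(cloud_bill=2_000, devops_hours=120, hourly_rate=120)
    print(f"AWS: ${aws:,.0f}/mo   Hetzner: ${hetzner:,.0f}/mo")  # AWS: $12,400/mo   Hetzner: $16,400/mo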
dvfjsdhgfv · 6 months ago
> Hetzner volumes are, in my experience, too slow for a production database. While you may in the past have had a good experience running customer-facing databases on AWS EBS, with Hetzner's volumes we were seeing >50ms of IOWAIT with very low IOPS.

There is a surprisingly easy way to address this issue: use (ridiculously cheap) Hetzner metal machines as nodes. The ones with NVMe storage offer excellent performance for databases and often have generous amounts of RAM. I'd go as far as to say you'd be better off investing in two or more beefy bare-metal machines for a master-replica(s) setup rather than running the database on k8s.

If you don't want to be bothered with the setup, you can use one of many modern packages such as Pigsty: https://pigsty.cc/ (not affiliated, but a huge fan).
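One crude way to see the gap described above is to compare commit (fsync) latency on a network volume against local NVMe, since that is what a write-heavy database feels most. A minimal Python sketch, assuming mount points that will differ on your machines; it is not a substitute for a full fio benchmark:

    import os
    import statistics
    import time

    def fsync_latency_ms(path: str, iterations: int = 200) -> float:
        """Median latency of a 4 KiB write followed by fsync, in milliseconds."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        samples = []
        try:
            for _ in range(iterations):
                os.write(fd, b"x" * 4096)
                start = time.perf_counter()
                os.fsync(fd)
                samples.append((time.perf_counter() - start) * 1000)
        finally:
            os.close(fd)
            os.unlink(path)
        return statistics.median(samples)

    # Assumed paths: a cloud volume mount and a local NVMe filesystem.
    print("volume:", round(fsync_latency_ms("/mnt/volume/fsync_probe"), 2), "ms")
    print("nvme:  ", round(fsync_latency_ms("/var/lib/fsync_probe"), 2), "ms")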
mythz · 6 months ago
Been a happy Hetzner customer for over a decade, previously using their dedicated servers in their German DCs before migrating to their US Cloud VMs for better latency with the US. Slightly disappointed with their recent cut of the generous 20TB of free traffic down to 3TB (€1.19 per additional TB), but they still look to be much better value than all the other US cloud providers we've evaluated.

While I wouldn't run Kubernetes by choice, we've had success moving our custom SSH/Docker Compose deployments over to GitHub Actions with kamal-deploy.org; it's easy to set up, with nice UX tools for monitoring remotely deployed apps [1].

[1]: https://servicestack.net/posts/kamal-deployments
0xbadcafebee · 6 months ago
I used to do my own car maintenance, because I wanted to save money, and it was fun. It turned out it was more complex than I thought; things slowly fell apart or broke. I spent a good deal of time "re-fixing" things. I spent probably thousands on tools over the years, partly replacing the cheap stuff that broke or rusted quickly. My cars were often up on blocks. But I learned a lot of great lessons. The biggest one? Some things are not worth DIYing; pay a mechanic or lease your car, especially if you depend on it for your livelihood.

Even something as simple as an oil change really isn't worth doing yourself. First you buy the tools (oil drip pan, filter wrench, funnel, creeper). Then you set aside the time to use them and find your dingy work clothes. You go to the store and buy new oil and a filter. You go home and change the oil. Then that day or another day you go to a store that will take your used oil. Versus 20 minutes at an auto mechanic, for about $15 more than the cost of the oil and filter.

Kubernetes is an entire car (and a complex one). It's *really* not worth doing the maintenance yourself, I promise you. Unless you're just doing it for fun.
jonas21 · 6 months ago
This is an interesting writeup, but I feel like it's missing a description of the cluster and the workload that's running on it.

How many nodes are there, how much traffic does it receive, and what are the uptime and latency requirements?

And what are the absolute cost savings? Saving 75% of $100K/mo is very different from saving 75% of $100/mo.
slillibri · 6 months ago
When I worked in web hosting (more than 10 years ago), we would constantly be blackholing Hetzner IPs due to bad behavior. Same with every other budget/cheap VM provider. For us, it had nothing to do with geo databases, just behavior.

You get what you pay for, and all that.
wvh · 6 months ago
I work for a consultancy that helps companies build and secure infrastructure. We have a lot of customers running Kubernetes at low-cost providers (like Hetzner), at more local middle-tier providers, and at the top three (AWS, GCP, Azure). We also have some governmental, financial and medical companies that cannot or will not run in public clouds, so they usually host on-prem.

If Hetzner has an issue or glitch once a month, the middle-tier providers have one every 2-3 months, and a place like AWS maybe every 5-6 months. However, prices also follow that observation, so you have to carefully consider on a case-by-case basis whether adding some extra machines and backup and failure scenarios is a better deal.

The major benefit of using basic hosting services is that their pricing is a lot more predictable; you pay for machines and scale as you go. Once you get hooked on all the extra services a provider like AWS offers, you might get some unexpectedly high bills, and moving away might be a lot harder. For smaller companies: don't make short-sighted decisions that threaten your ability to survive long-term by choosing the easy solution or a "free credits" scheme early on.

There is no right answer here, just trade-offs.
Volundr · 6 months ago
I haven't used it personally, but https://github.com/kube-hetzner/terraform-hcloud-kube-hetzner looks amazing as a way to set up and manage Kubernetes on Hetzner. At the moment I'm on the Oracle free tier, but I keep thinking about switching to it to get off... well, Oracle.
no_carrier · 6 months ago
> While DigitalOcean, like other providers, offers a free managed control plane, there is typically a 100% markup on the nodes that belong to these managed clusters.

I don't think this is true. With DigitalOcean, the worker nodes cost the same as regular Droplets; there are no additional costs involved. This makes DigitalOcean's offering very attractive: a free control plane you don't have to worry about, free upgrades, and some extra integrations with things like the load balancer, storage, etc. I can't think of a reason not to go with that over self-managed.
chipdart · 6 months ago
I loved the article. Insightful, and packed with real-world applications. What a gem.

I have a side question about cost-cutting with Kubernetes. I've been musing over the idea of setting up Kubernetes clusters similar to these, but mixing on-premises nodes with nodes from the cloud provider. The setup would be something like:

- cloud vCPUs for bursty workloads,
- bare-metal nodes for the performance-oriented workloads required as the base load,
- on-premises nodes for spiky performance-oriented workloads and dirt-cheap on-demand scaling.

What I believe will be the primary unknown is egress costs.

Has anyone ever toyed around with the idea?
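For the egress question, a back-of-the-envelope estimate is at least easy to sketch. The traffic volume and the hyperscaler rate below are assumptions; the €1.19/TB overage figure is the one quoted earlier in the thread:

    def egress_cost(gb_per_month: float, price_per_gb: float) -> float:
        """Monthly cost of moving a given amount of traffic out of a provider."""
        return gb_per_month * price_per_gb

    traffic_gb = 50_000  # assumed monthly traffic between cloud and on-prem nodes
    print("hyperscaler egress:", egress_cost(traffic_gb, 0.09))        # assumed ~$0.09/GB list price
    print("hetzner overage:   ", egress_cost(traffic_gb, 1.19 / 1000)) # from the EUR 1.19 per additional TB figure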
jillesvangurp · 6 months ago
The key takeaway here is not how amazingly cheap Hetzner is, which it is, but how much of an extortion game Google, Amazon, MS, etc. are playing with their cloud services. These are trillion-dollar companies because they are raking in cash with extreme margins.

Yes, there is some added value in the level of convenience provided. But maybe with a bit more competition, pricing could be more competitive. A lot more competitive.
surrTurr · 6 months ago
> Hetzner volumes are, in my experience, too slow for a production database. While you may in the past have had a good experience running customer-facing databases on AWS EBS, with Hetzner's volumes we were seeing >50ms of IOWAIT with very low IOPS. See https://github.com/rook/rook/issues/14999 for benchmarks.

I set up Rook Ceph on a Talos k8s cluster (with VM volumes) and experienced similarly low performance; however, I always assumed that was because of the 1G vSwitch (i.e. a networking problem)?! The SSD volumes were quite fast.
hipadev23 · 6 months ago
Be careful with Hetzner: they null routed my game server on launch day due to false positives from their abuse system, and then took 3 days for their support team to re-enable traffic.

By that point I had already moved to a different provider, of course.
esher · 6 months ago
As far as I can see, no one is mentioning sustainability, a.k.a. environmental impact or 'green hosting', here. Don't you care about that?

I believe that Hetzner's data centers in Europe (Germany, Finland) are powered by green energy, but not its locations in the US.
Hetzner_OL · 6 months ago
Hi Bill, Wow! Thanks for the amazing write-up and for sharing it on your blog and here! I am so happy that we've helped you save so much money and that you're happy with our support team! It's a great way to start off the week! --Katie
ArtTimeInvestor · 6 months ago
Can anybody speak to the pros and cons of Hetzner vs OVH?

There ain't many large European cloud companies, and I would like to understand how they differentiate.

Ionos is another European one. Currently, it looks like their cloud business is stagnating, though.
usrme · 6 months ago
This is probably out of left field, but what is the benefit of having a naming scheme for nodes without any delimiters? Reading at a glance, and not knowing the region naming convention of a given provider (i.e. Hetzner), I'm at a loss to quickly map "euc1pmgr1" back to "<region><zone><environment><role><number>". I feel like I'm missing something, because having delimiters would make all sorts of automated parsing much easier.
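Parsing such names is still doable without delimiters as long as the fields alternate between letters and digits, though it is more fragile than splitting on a separator. A small Python sketch; the convention and the sample name come from the comment above, but the exact field shapes are an assumption:

    import re

    # Assumed convention: <region><zone><environment><role><number>,
    # e.g. "euc1pmgr1" -> region "euc", zone "1", environment "p", role "mgr", number "1".
    NODE_NAME = re.compile(
        r"^(?P<region>[a-z]+?)(?P<zone>\d)(?P<env>[a-z])(?P<role>[a-z]+?)(?P<number>\d+)$"
    )

    def parse_node_name(name: str) -> dict[str, str]:
        match = NODE_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognised node name: {name}")
        return match.groupdict()

    print(parse_node_name("euc1pmgr1"))
    # {'region': 'euc', 'zone': '1', 'env': 'p', 'role': 'mgr', 'number': '1'}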
s3rius · 6 months ago
That's a really good article. We actually migrated recently as well, and we were using dedicated nodes in our setup.

In order to integrate a Hetzner-provided load balancer with our k8s cluster on dedicated servers, we had to implement a super thin operator that does it: https://github.com/Intreecom/robotlb

If anyone is inspired by this article and wants to do the same, feel free to use this project.
aliasxneo · 6 months ago
I'm planning on doing something similar but want to use Talos with bare-metal machines. I expect to see similar price reductions from our current EKS bill.
sureglymop · 6 months ago
I went with Hetzner bare metal, set up a Proxmox cluster on it, and then run Kubernetes on top. I find it gives me a lot of flexibility.
bittermandel · 6 months ago
We're very happy to use Hetzner for our bare-metal staging environments to validate functionality, but I still feel reluctant to put our production there. Disks don't quite work as intended at all times, and our vSwitch setup has gotten reset more than once.

All of this makes sense considering the extremely low price.
Scotrix · 6 months ago
Very nicely written article. I'm also running a k8s cluster, but on bare metal with QEMU/KVM VMs for the base load. I wonder why you would choose VMs instead of bare metal if you're looking for cost optimisation (additional overhead, maybe?). Could you share more about this, or did I miss it?
mnming · 6 months ago
I feel like a lot of the work described in the article could be automated by kops, probably in a much better way, especially when it comes to day-2 operations.

I wonder what the motivation is behind manually spinning up a cluster instead of going with more established tooling?
lucasrattz · 6 months ago
We at Syself.com also have great experiences with Kubernetes on Hetzner. We built a platform on top of Cluster API and brought a managed Kubernetes experience to Hetzner. Now we have self-healing, automated updates and 100% reproducibility, with full bare-metal support.

> Hetzner volumes are, in my experience, too slow for a production database.

That is true, though. To solve it, we developed a way to persist the local storage of bare-metal servers across reprovisionings. This way it's both faster and cheaper. Now we are adding an automated database deployment layer on top of it.
MuffinFlavored · 6 months ago
https://github.com/puppetlabs/puppetlabs-kubernetes

What do the fine people of HN think about the size/scope/amount of technology in this repo?

It is referenced in the article here: https://github.com/puppetlabs/puppetlabs-kubernetes/compare/main...bilbof:puppetlabs-kubernetes:main#diff-50ae7fb3724b662b58dbc1c71663cb16a484ab36aecd5a11317fb14465f847fa
czhu12 · 6 months ago
Funnily enough, we made the exact same transition from Heroku to DigitalOcean's managed Kubernetes service, and saved about 75%. Presumably this means that had we moved from Heroku to Hetzner, it would have been a 93% savings!

The costs of cloud hosting are totally out of control; I would love to see more efforts that let developers move down the stack.

I've been humbly working on https://canine.sh, which basically provides a Heroku-like interface to any k8s cluster.
Neil44 · 6 months ago
When I first started hosting servers/services for customers I was using EC2 and Rackspace; then I discovered Linode and was happy it was so much cheaper, with apparently no downside. After the first couple of interactions with support I started to relax. Then I discovered OVH, same story. I haven't needed their support yet, though.
acac10 · 6 months ago
// Taking another slant at the discussion: why Kubernetes?

Thank you for sharing your experience. I also have my 3 personal servers with Hetzner, plus a couple of VM instances at Scaleway (a French outfit).

Disclaimer: I'm a Googler, was an SRE for ~10 years for Gmail, identity, social, apps (GSuite nowadays) and more, managed hundreds of jobs in Borg, and am one of the 3 founders of the current dev+devops internal platform (I focused on the releases, prod and capacity side of it), and have dabbled in k8s in my personal time. My opinions, not Google's.

So, my question is: given the significant complexity that k8s brings (I don't think anyone disputes this), why are people using it outside medium-to-large environments? There are simpler yet flexible and effective job schedulers that are way easier to manage. Nomad is an example.

Unless you have a LOT of machines to manage, with many jobs (I'd say 250+), k8s' complexity, brittleness and overhead are not justifiable, IMO.

The emergence of tools like Terraform and the *many* other management layers on top of k8s that try to make it easier, but just introduce more complexity and their own abstractions, is in itself a sign of that inherent complexity.

I would say that only a few companies in the world need that level of complexity. And then they *will* need it, for sure. But for most, it's like buying a Formula 1 car to commute in a city.

One other aspect I've noticed is that technical teams tend to carry over the mess they had in their previous "legacy" environment and just replicate it in k8s, instead of trying to do an architectural design of what the whole system needs. And the k8s model enables that kind of mess: a "bucket of things".

Those two things combined mean that nowadays every company has soaring cloud costs and is running things they know nothing about but are afraid to touch in case they break something. And an outage is more career-harming than a high bill that Finance will deal with later, so why risk it, right? A whole new IT area has now been coined to deal with this: FinOps :facepalm:

I'm just puzzled by the whole situation, tbh.
kakoni · 6 months ago
Is anybody running k3s/k8s on Hetzner using CAX servers? How's that working out?
james_sulivan · 6 months ago
For those considering Hetzner, there is also Contabo, another German hosting company that is also good, at least in my experience.
devops000 · 6 months ago
Did you try Cloud66 for deployment?
cjr · 6 months ago
What about cluster autoscaling?
awinter-py · 6 months ago
Cut my kube bill 100% on GKE by switching from a regional to a zonal cluster, because the first zonal cluster is free.
aravindputrevu · 6 months ago
Did you know that they are cutting their free-tier bandwidth? I haven't read too much into it, but I heard a few friends were worried about it.

At the end of the day, they are a business!
segmondy · 6 months ago
Great write-up, Bill!
Iwan-Zotow · 6 months ago
This is good.

Well, running on bare metal would be even better.
postepowanieadm · 6 months ago
Lovely website.