TechEcho
Learning to operate Kubernetes reliably

357 points by mglukhovsky over 7 years ago

11 comments

KaiserPro over 7 years ago
Much as it burns me to admit this, for this use case, Jenkins is king. <60 nodes and it's perfect.

At a previous job, we had migrated from a nasty cron orchestration system to Jenkins. It did a number of things, including building software, batch-generating thumbnails and moving data about, on around 30 nodes, of which about 25 were fungible.

Jenkins Job Builder meant that everything was defined in YAML, stored in git and was repeatable. A sane user environment meant that we could execute as a user and inherit their environment. It has sensible retry logic, and lots of hooks for all your hooking needs. Pipelines are useful for chaining jobs together.

We _could_ have written them as normal jobs to be run somewhere in the 36k node farm, but that was more hassle than it's worth. Sure, it's fun, but we would have had to contend with sharing a box that's doing a fluid sim or similar, so we'd have had to carve off a section anyway.

However, Kubernetes to _just_ run cron is a massive waste. It smacks of shiny new tool syndrome. Seriously, Jenkins is a single-day deployment. Transplanting the cron jobs is again less than a day (assuming your slaves have got a decent environment).

So, with the greatest of respect, talking about building a business case is pretty moot when you are effectively wasting what appears to be > two man months on what should be a week-long migration. Think gaffer tape, not carbon fibre bonded to aluminium.

If, however, the rest of the platform lives on Kubernetes, then I could see the logic: having all your stuff running on one platform is very appealing, especially if you have invested time in translating comprehensive monitoring into business-relevant alerts.
alexebird over 7 years ago
I always search for mentions of Hashicorp Nomad in the comments section of front-page Kubernetes articles like this. There are often few or no mentions, so I’d like to add a plug for the Hashistack.

For some reason Nomad seems to get noticeably less publicity than some of the other Hashicorp offerings like Consul, Vault, and Terraform. In my opinion Nomad is right up there with them. The documentation is excellent. I haven’t had to fix any upstream issues in about a year of development on two separate Nomad clusters. Upgrading versions live is straightforward, and I rarely find myself in a situation where I can’t accomplish something I envisioned because Nomad is missing a feature. It schedules batch jobs, cron jobs, long-running services, and system services that run on every node. It has a variety of job drivers outside of Docker.

Nomad, Consul, Vault, and the Consul-aware Fabio load balancer run together to form most of what one might need for a cluster-scheduler-based deployment, somewhat reminiscent of the “do one thing well” Unix philosophy of composability.

Certainly it isn’t perfect, but I’d recommend it to anyone who is considering using a cluster scheduler but is apprehensive about the operational complexity of the more widely discussed options such as Kubernetes.
mephitix over 7 years ago
Setting aside the k8s content itself, I love the way this article is written. It's not a typical tutorial or tips/tricks post but takes you time-traveling through the experience of a big company adopting nascent tech. Lots of great things to take away even outside of the Kubernetes tips.
robszumski over 7 years ago
> “Sometimes when we do an etcd failover, the API server starts timing out requests until we restart it.”

This is likely related to a set of Kubernetes bugs [1][2] (and gRPC [3]) that CoreOS is working diligently to get fixed. The first of these, the endpoint reconciler [4], has landed in 1.9.

More work is pending on the etcd client in Kubernetes. The good news is that the client is used everywhere, so one fix and all components will benefit.

[1]: https://github.com/kubernetes/community/pull/939
[2]: https://github.com/kubernetes/kubernetes/issues/22609
[3]: https://github.com/kubernetes/kubernetes/issues/47131
[4]: https://github.com/kubernetes/kubernetes/pull/51698
scarface74 over 7 years ago
I'm curious about what people think about HashiCorp's Nomad vs Kubernetes.

I chose Nomad because I'm already using Consul and I wanted to run raw .NET executables. Would it have been worth it to use Docker with .NET Core?

Not trying to change my infrastructure now, but just curious about whether it is worth the time to play with it on the side.
YesThatTom2 over 7 years ago
Such good writing style AND useful technical content. Why can't all blog posts be this good?
djsumdog over 7 years ago
I haven't been at a k8s shop yet, but at my last job we used Marathon (on DC/OS). I know you can run Kubernetes on DC/OS, but the default scheduler it comes with is Marathon.

Is there an advantage to one over the other? It looks like in both cases, you need a platform team (at least 2, maybe 3 people; we had a large complex setup and had like 10) to set up things like K8s, DC/OS or Nomad, because they are complex systems with a lot of different components: Flannel vs Weave Net vs some other container network, handling storage volumes, labels and automatic configuration of HAProxy from them (marathon-lb on DC/OS).

All schedulers (k8s, Swarm, Marathon) seem to use a JSON format for job information that's pretty specific, not only to the scheduler, but to the way other tooling is set up at your specific shop.
perfmode over 7 years ago
Why do you need a 99.99% job completion rate? Why not just design for failure and inevitable retries? It almost seems like you grant platform users a false sense of security by making it very reliable but not perfect.
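
For readers unfamiliar with the "design for failure and inevitable retries" idea raised above, here is a minimal Python sketch. It is not taken from the article or from any scheduler; `run_with_retries`, `max_attempts`, `base_delay` and the injectable `sleep` parameter are hypothetical names chosen for illustration, and it assumes the job is idempotent so that re-running it after a partial failure is safe.

```python
import random
import time


def run_with_retries(job, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `job` until it succeeds or `max_attempts` is exhausted.

    Assumption: `job` is idempotent, so running it again after a
    failure does no harm. `sleep` is injectable to make testing easy.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff (1x, 2x, 4x, ...) plus random jitter,
            # so many failing jobs do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

The trade-off the comment points at is visible here: retries hide transient failures from callers, but only if every job tolerates being run more than once.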
ad_hominem over 7 years ago
How do you deal with sidecar containers in CronJobs (and regular batch Jobs) not terminating correctly?

https://github.com/kubernetes/kubernetes/issues/25908
asimpletune over 7 years ago
What is the benefit of using Kubernetes over Mesos (or in conjunction with Mesos)?
minimaxir over 7 years ago
Kubernetes very recently added native CronJob support: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/

How does Stripe's approach differ?
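
For context on what the native CronJob resource looks like, here is a rough Python sketch that builds a manifest as a plain dict and prints it as JSON. It says nothing about how Stripe configures its jobs. The helper `cronjob_manifest` and the example name, image and command are hypothetical; the field names follow the Kubernetes CronJob API, and `batch/v1` is the current API version (clusters from the era of this thread used a beta version instead).

```python
import json


def cronjob_manifest(name, schedule, image, command):
    """Build a Kubernetes CronJob manifest as a plain dict (illustrative)."""
    return {
        "apiVersion": "batch/v1",  # older clusters exposed CronJob under a beta version
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard five-field cron syntax
            "concurrencyPolicy": "Forbid",  # skip a run if the previous one is still going
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 3,      # retry a failed Job a few times
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical job: names and image are placeholders, not from the article.
    manifest = cronjob_manifest(
        "nightly-report", "0 3 * * *", "example.com/report:latest",
        ["python", "report.py"],
    )
    print(json.dumps(manifest, indent=2))
```

The JSON output could be saved to a file and passed to `kubectl apply -f`, which accepts JSON as well as YAML.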