
Horrors of Using Azure Kubernetes Service in Production

371 points · by pdeva1 · almost 7 years ago

20 comments

summarity · almost 7 years ago
My $DAYJOB is leading a team which develops applications and gateways (for the 1k+ employee B2B market) that integrate deeply with Azure, Azure AD and anything that comes with it. We do have Microsoft employees (who work on Azure) on our payroll, too.

I can tell you, as I'm sure anyone on my team can, that Azure is one big alpha-stage amalgamation of half-baked services. I would never recommend Azure to any organization, no matter the size. Seeing our customers struggle with it, us struggle with it, and even MS folks struggle with even the most basic tasks gets tiring really fast. We have so many workarounds in our software for inconsistency, unavailability, questionable security and general quirks in Azure that it's not even funny anymore.

There are some days where random parts of Azure completely fail, like customers not being able to view resources, role assignments or even their directory config.

An automatic integration test of one of our apps, which makes heavy use of Azure Resource Management APIs, fails dozens of times a week not because we have a bug, but because state within Azure (RBAC changes, resource properties) didn't propagate within a timeout of more than 15 minutes!

Two weeks back, the same test managed to reproducibly produce a state within Azure that completely disabled the Azure Portal resource view. All "blades" in Azure just displayed "unable to access data". Only an ultra-specific sequence of UI interactions and API calls could restore Azure (while uncovering a lot of other issues).

That is the *norm*, not the exception. In 1.5 years of development, there has never been a single week without an Azure issue robbing us of hours of work just debugging their systems and writing workarounds.

/rant

On topic though, we've had good experiences with these k8s runtimes:

- GKE
- Rancher + DO
- IBM Cloud k8s (yeah, I know!)
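The flaky-integration-test pattern described above (state that does eventually propagate, just not within a fixed window) is usually handled with a polling helper rather than a single fixed sleep. A minimal sketch, assuming a Python test harness; all names here are hypothetical and not from the comment:

```python
import time

def wait_for(predicate, timeout=900.0, interval=5.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns a truthy value or `timeout`
    seconds elapse. Returns True on success, False on timeout.

    `clock` and `sleep` are injectable so tests can use a fake clock
    instead of actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would then wrap a check such as "does this RBAC assignment resolve yet?" in the predicate and fail only when the (generous) deadline passes, which separates genuine bugs from slow propagation.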
QiKe · almost 7 years ago
(Eng lead for AKS here) While lots of people have had great success with AKS, we're always concerned when someone has a bad time. In this particular case the AKS engineering team spent over a day helping identify that the user had over-scheduled their nodes by running applications without a memory limit, resulting in the kernel OOM (out-of-memory) killer terminating the Docker daemon and kubelet. As part of this investigation we increased the system reservation for both Docker and the kubelet to ensure that in the future, if a user over-schedules their nodes, the kernel will only terminate their applications and not the critical system daemons.
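The user-side half of the fix QiKe describes is to give every container an explicit memory request and limit, so the scheduler cannot over-commit the node in the first place. A minimal, illustrative Pod spec; the name, image, and values are placeholders, not from the comment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                    # placeholder name
spec:
  containers:
    - name: app
      image: example.registry/app:1.0  # placeholder image
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves on the node
          cpu: "250m"
        limits:
          memory: "512Mi"   # past this, the kernel OOM-kills this container only
```

With a limit set, memory pressure terminates the offending container rather than starving node-level daemons like the kubelet.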
paxys · almost 7 years ago
Worth noting that both Microsoft's and Amazon's Kubernetes offerings are very new (literally *weeks* since GA). While "officially" ready, it is pretty naive to rely on them for production-critical workloads just yet, at least compared to Google Kubernetes Engine, which has been running for years.

If you absolutely need managed Kubernetes, stick to GCP for now.
ageitgey · almost 7 years ago
Here's a fun fact about Azure Kubernetes:

1. Deploy your Linux service on k8s with redundant nodes.

2. Create a k8s VolumeClaim and mount it on your nodes to give your application some long-lived or shared disk storage (i.e. for processing user-uploaded files).

3. Wait until the subtle bugs start to appear in your app.

Because persistent k8s volumes on Azure are provided by the Azure disk storage service behind the scenes, lots of weird Windows-isms apply. And this goes beyond stuff like case insensitivity for file names.

For example, if a user tries to upload a file called "COM1" or "PRN1", it will blow up with a disk write error.

Yes, that's right, Azure is the only cloud vendor that is 100% compatible with enforcing DOS 1.0 reserved filenames - on your Linux server in 2018!
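Apps can guard against the Windows reserved-name behavior described above at upload time. A sketch assuming the standard Windows reserved set (CON, PRN, AUX, NUL, COM1-9, LPT1-9); the helper names and the underscore-prefix scheme are illustrative, and behavior on a given storage backend may differ:

```python
# Windows reserves certain device names (a DOS-era legacy); a filename whose
# stem matches one of them, case-insensitively, can fail on Windows-backed
# storage even when the app itself runs on Linux.
RESERVED_NAMES = {
    "CON", "PRN", "AUX", "NUL",
    *{f"COM{i}" for i in range(1, 10)},
    *{f"LPT{i}" for i in range(1, 10)},
}

def is_reserved(filename: str) -> bool:
    # Windows checks the part before the first dot, ignoring case,
    # so "prn.txt" is just as problematic as "PRN".
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED_NAMES

def sanitize(filename: str) -> str:
    # Prefix reserved names with an underscore so they store safely.
    return "_" + filename if is_reserved(filename) else filename
```

Running user-supplied names through such a check before they reach the volume avoids the surprise disk-write errors the comment describes.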
nojvek · almost 7 years ago
I believe this is a cultural problem with Microsoft. It's probably similar at other companies, but it was very evident at Microsoft. The people responsible for allocating resources (the management chain) rarely dogfood the product.

While the engineers and PMs would complain a lot about quality issues, management wants to prioritize more features. It was a running joke at Microsoft: no one gets promoted for improving existing things; if you want a quick promo, build a new thing.

So when you see a bazillion half-baked things in Azure, that's because someone got promoted for building each of those half-baked things and moving on to the next big thing.

Going from 0-90% is the same amount of work as 90-99%, and the same amount of work as 99.0%-99.99%. Making things insanely great is hard and requires a lot of dedicated focus and a commitment to set a higher bar for yourself.
hb3b · almost 7 years ago
I joined a healthcare startup in 2014 that had a small infrastructure on Azure. Back then AWS weren't signing BAAs and Azure was the only player in town. Being an early startup, the company didn't purchase a support plan from Azure. One day Azure suffered a major outage (it may have been storage related) and over an hour later, I reached out to Microsoft for written confirmation that we could forward to customers. Since we didn't have a support plan they flat-out refused to provide any documentation whatsoever about the issue. They wanted $10,000.

Azure - never again. The company moved to AWS within a quarter.
mgalgs · almost 7 years ago
FWIW, Amazon's hosted Kubernetes offering (EKS) isn't stable either (DNS failures, HPA is known to be broken, etc.).
curiousDog · almost 7 years ago
This is sad. From what I hear, one of the founders of k8s works on AKS.

It's only a matter of time before GCP becomes the #1/2 cloud provider.
taherchhabra · almost 7 years ago
Had a similar experience with the Azure Cosmos graph API. The API is half-baked: it doesn't support all Gremlin operations, and even supported operations give non-standard output. Switched to AWS Neptune immediately when it launched.
AaronFriel · almost 7 years ago
There are definitely growing pains with using Kubernetes on Azure. I've wondered a few times if other platforms have similar issues, and have seen more than a few complaints about EKS.

Microsoft has some great people working on Azure, but I do feel like AKS was released to GA too soon. Without a published roadmap and scant acknowledgment of issues, I'm not sure I could recommend it to my clients or employer. It's disappointing, because I've had few issues with other Azure services.

Full disclosure: I receive a monthly credit through a Microsoft program for Azure.
rcconf · almost 7 years ago
Hmm, this isn't great. We're currently using Azure Kubernetes Service and haven't had many issues so far, but we only just made the shift.

I hope I don't have to move over to Google Cloud.
parasubvert · almost 7 years ago
The DNS failures were almost certainly related to the k8s system services on the cluster not having CPU or memory reservations, and KubeDNS flaking as a result.

In general AKS is a vanilla k8s cluster and expects you to know what you're doing. MS arguably should enforce some opinions about things like system services having reservations, etc., but none of this is vanilla. The trouble is that k8s defaults are pretty poor from a security perspective (no seccomp profiles or AppArmor/SELinux profiles) and a performance perspective (no reservations on key system DaemonSets).

We've had this interesting industry pendulum swing between the extreme poles of "we hate opinionated platforms! Give me all the knobs!" and "this is too hard, we need opinions and guard rails!". I think the success of k8s is exposing people to the complexity of supplying all of the config details yourself, and we will see a new breed of opinionated platforms on top of it very shortly. It reminds me of the early Linux Slackware and SLS and Debian days, where people traded X11 configs and window manager configs like they were treasured artifacts, before Red Hat, GNOME and KDE, SuSE, and eventually Ubuntu started to force opinions.
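The reservations this comment argues for can be expressed directly in kubelet configuration. A sketch of the relevant KubeletConfiguration fields; the values are illustrative, and the right sizing depends on the node:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hold resources back from the pod-allocatable pool so OS daemons
# and the kubelet itself survive pod memory pressure.
systemReserved:
  cpu: "500m"
  memory: "512Mi"
kubeReserved:
  cpu: "500m"
  memory: "512Mi"
# Evict pods before the node itself runs out of memory.
evictionHard:
  memory.available: "200Mi"
```

This is the node-level counterpart of per-pod limits: even if workloads are over-scheduled, the kernel's OOM killer should reach applications before it reaches system daemons.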
spicyusername · almost 7 years ago
This is probably why they're releasing OpenShift on Azure - to let Red Hat engineers manage the Kubernetes part.

https://azure.microsoft.com/en-us/blog/openshift-on-azure-the-easiest-fully-managed-openshift-in-the-cloud/
FlorianRappl · almost 7 years ago
We have had a larger migration project going on for months. So far not a single failure has occurred, and our TEST environment has already been fully migrated (quite responsive and rock solid) for two weeks.

However, I do share the view that Azure has released a lot of half-baked features and services lately (the last 1.5 to 2 years). I hope this trend does not continue.
stefanatfrg · almost 7 years ago
A couple of questions for the OP:

1. What version of Docker / container runtime is being used?

2. What base image is being used for your containers? E.g. Alpine has known DNS issues [1].

[1] https://www.youtube.com/watch?v=ZnW3k6m5AY8
bsaul · almost 7 years ago
Side question: what are the best practices for development? Are you supposed to run a local Kubernetes deployment (it looks like it's pretty hard to set up), or do you run everything outside of containers when developing and then deal with k8s packaging and deployment as a completely separate issue (which looks like it could lead to discovering a lot of issues in the preproduction environment)?
gercheq · almost 7 years ago
Azure is not bad, but there are definitely some rough edges. We're having trouble with their BizSpark sponsorship billing: https://news.ycombinator.com/item?id=17698948
rdl · almost 7 years ago
Key Vault (their HSM product) is even worse.
ubuntunero · almost 7 years ago
Interesting, thanks.
partiallypro · almost 7 years ago
It's a very new offering - the Linux App Services are still in beta - and I have no idea why you would roll it into production expecting no hiccups. AWS is also new on this. Give it six months and let the kinks get worked out before migrating workloads. Seems like common sense.