Disasters I've seen in a microservices world

256 points by gregdoesit almost 4 years ago

22 comments

throwawaaarrgh almost 4 years ago
In the modern world of computing, the one thing I have seen literally everywhere I've been is people implementing systems that they absolutely do not understand.

Most microservices I've seen implemented are not actually microservices. Most teams (unless they are developing "simple" web or mobile applications) have no idea who is consuming their services, or how, or if what they've made is working correctly. They frequently don't have an understanding of the different models of releasing software, much less of a complete system architecture, failure modes, consistency models, reliability estimates, performance limits, etc. Mostly what I see time and again is a team of people who just write some code that seems to work for them, and then go home, without ever considering how or if it's working in the real world.

I don't know anything about modern computer science education. But it appears that new developers have absolutely no idea how anything but algorithms works. It's like we taught them how hammers and saws and wrenches work, and then told them to go build a skyscraper. There are only two ways I know of for anyone today to correctly build a modern large-scale computer system: 1) read every single book and blog post, watch every conference talk, and listen to every podcast that exists about building modern large-scale computing systems, or 2) spend 5+ years making every mistake in the book as you try to build them yourself.

It feels like the industry mostly just re-learns the same mistakes over and over, like we're in Groundhog Day (we're the extras, not Bill Murray). But it's equally possible that I just lack perspective and am expecting too much. Maybe the auto industry at the turn of the 20th century also spent decades re-learning the same lessons over and over, as the novel art of mass-producing complex systems continued to elude us. Hell, the new auto companies *still* don't get it right.
deckard1 almost 4 years ago
I was going to comment on this very thing in the other thread about software glue [1]. In that article there is a YouTube video on Multics vs Unix [2] which really outlines why microservices were always doomed.

Someone should coin a new law for how programmers have to rediscover Brooks's law every 5-10 years. The issue with microservices, as it has always been, is that you need an *enormous* company to brute-force the communication pathways and maintenance overhead for it all to work. And by work, I don't mean function efficiently (as the Multics vs Unix video shows). I mean *just function*. Just work at all. The Multics team had all the devs, and the Unix team was two guys doing laps around the Multics team. Because they had the mathematics on their side.

Remember the bad old days of memory thrashing? That's what happens to teams that do not have enough bandwidth to properly maintain the dozens of services they are responsible for. Your organization gets frozen.

This is what we all get for taking advice from the Googles and Facebooks of the world. Google has something like a billion lines of code in a monorepo. They do not do things remotely like 99% of the businesses out there. They are sitting on huge piles of money that let them be incredibly inefficient *for decades*.

[1] https://news.ycombinator.com/item?id=27482832

[2] https://www.youtube.com/watch?v=3Ea3pkTCYx4
jillesvangurp almost 4 years ago
People do microservices for the wrong reasons. There are only a few somewhat valid ones:

1) Something has different run-time needs than something else. Think CPU, memory, network. Breaking stuff up allows you to make different choices here. Although I have dodged this by simply deploying the same monolith and configuring it to do different things.

2) Something needs to be developed by a different team, and for whatever reason you don't want those teams to be too dependent. It's a bad reason, but it's valid in a lot of companies where certain teams just need to be engineered around, or where there is a fundamental lack of trust between different parts of the org chart. Conway's law is a thing. It's the most common reason to do microservices.

3) You have two things depending on each other (a cyclical dependency) and you want to reuse one of them. Extracting it into a third thing is a common way out. This is true for almost any component technology: if you have two components, you'll find a reason to create a third. And a fourth. And so on. However, consider using something less dramatic. Code libraries are a valid choice, as is an extra module in your source tree (see the sketch after this comment).

Everything else is just needlessly/prematurely increasing overhead, deployment friction, etc. You get more things to monitor, deploy, manage the roadmap of, worry about, specialize in, etc. Big, bloated organizations do microservices because they are big and bloated. Many smart startups keep this nonsense to a minimum. Of course, some startups start out overfunded and bloat too early. VC money is great and sometimes requires over-engineering like this (i.e., to impress the suits). I've heard more than a few CTOs boast about their multi-cloud strategy and microservice architecture. In my mind that translates to "we funnel a lot of VC money to Amazon and pay people full time to do just that." Ridiculous monthly bills and no users or traction is a common pattern in that world.
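A minimal sketch of point 3, with hypothetical package names: if billing and invoicing each need the other's types, extracting the shared types into a third, dependency-free package breaks the cycle with a plain library rather than a new service.

    // shared/invoice.go: the extracted, dependency-free package
    // (hypothetical layout). billing and invoicing both import this
    // instead of each other, so the cycle disappears without adding
    // a network boundary or a new deployable.
    package shared

    // Invoice is the type both sides previously reached across the
    // cycle for.
    type Invoice struct {
        ID    string
        Total int64 // total in cents
    }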
jaredcwhite almost 4 years ago
The problem with the hype around microservices was that it's the web app deployment equivalent of writing device drivers in C. Sure, some people can do it. Some people *have* to do it. Yet most people shouldn't even make the attempt.

I've been on teams where we can't even reliably deploy code over time to *one* service. The idea of our team maintaining multiple services is madness. That's not a knock on any individual's technical merits. Deploying code to the cloud is just hard, period, and that's even when talking about a traditional monolith!

I'm glad the pendulum is swinging back. The microservices pattern is useful for the areas in which it's useful… it just so happens that problem space is *way* smaller than the hype cycle cared to admit a few years ago.
kevmo314 almost 4 years ago
> Some teams were suffering from servicitis. Even worse than that, it generated a lot of friction while developing. One could not just look into a project in their IDE, but it required to have multiple projects open simultaneously to make sense of all that mess.

This is real: I've worked in projects where purely transformational code was offloaded into a "service". Refactoring it into a library reduced lines of code, computational cost, and code complexity dramatically.

But wouldn't it be cool if there were a framework where the developer didn't have to demarcate where services started and ended? In principle, any pure asynchronous function could be abstracted out to a service. It would be neat if the compiler did that for me and deployment of the application was more like "deploy the cluster" instead of deploying each individual service.
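As a minimal sketch of that refactor (hypothetical names): a purely transformational operation that had been exposed as an HTTP service becomes an in-process library function, removing the serialization, the network hop, and a class of failure modes a pure function cannot have.

    package normalize

    import "strings"

    // Identifier is the same pure transformation that used to sit
    // behind an HTTP "normalizer" service (hypothetical example).
    // As a library call it has no I/O, no partial failure, and is
    // trivially unit-testable.
    func Identifier(id string) string {
        return strings.ToLower(strings.TrimSpace(id))
    }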
spaetzleesser almost 4 years ago
I recently got talked into developing a new project with Kubernetes and microservices. It's an interesting journey, but the complexity this adds is just enormous. Debugging is hard, refactoring is hard once it touches service boundaries, coordinating releases between services is hard, and so on. I highly doubt that we will ever scale to a size where the complexity pays off.

I feel this kind of architecture especially appeals to people who like to write only new code instead of understanding existing code. They don't like to read old code to see where new functionality may fit in, so they spin up new services.

We complain about maintaining old COBOL code, but God be with the poor people who in 20 years will have to maintain the monstrosities we are creating today.
cratermoon almost 4 years ago
> Timeouts, retries, and resilience

At a previous employer I was responsible for a critical service that was starting to show strain as traffic ramped up. It used Hystrix as the circuit breaker for calls to backend services, including the DB, and at peak times the thread pool would fill and start rejecting additional requests. I was tasked with fixing that.

There's a very simple formula for tuning the number of threads:

> requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room

The catch is that getting good RPS and latency numbers in a distributed system deployed across three geographically separated datacenters is the opposite of simple. In particular, the legacy of the system meant that we had one write instance of the DB, in one datacenter, so latency differed depending on which DC the call originated from, and there was no one setting that worked for all instances.
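The formula is Little's law in disguise (concurrent requests = arrival rate × latency). A worked example with illustrative numbers, not the actual figures from that system:

    package main

    import "fmt"

    func main() {
        // threads ≈ peak RPS × p99 latency (seconds) + breathing room
        peakRPS := 300.0 // healthy peak traffic (illustrative)
        p99 := 0.25      // 99th percentile latency in seconds (illustrative)
        headroom := 1.2  // ~20% breathing room

        inFlight := peakRPS * p99        // ≈ 75 requests in flight at once
        pool := int(inFlight * headroom) // ≈ 90 threads
        fmt.Printf("in-flight ≈ %.0f, pool size ≈ %d\n", inFlight, pool)
    }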
sackerhews almost 4 years ago
I was once working with an engineer so hell-bent on splitting everything up into microservices that at one point logging became incompatible with his solution.

He then argued that our banking solution didn't need logging, because it was so well tested that failure rates would be extremely low.

I'm not making this up.
e67f70028a46fba almost 4 years ago
Putting a network connection between your application abstractions was always a dicey proposition (see EJB 1.0).

Making it the entire basis for application abstraction is lunacy, the sort of extremely clever idiocy that can only occur in the tech world.
slver almost 4 years ago
Disaster #1 is too-small services, and Disaster #4 is huge databases shared between many services.

Which reaffirms my overall opinion that most people writing services have no idea what a service is and what it encapsulates (it encapsulates its own state, for example; it's basically distributed OOP).

BTW, remember when "your service should be 100 lines of code, tops" was considered a best practice?

Why is it so hard for most people to resist the hype nonsense wave when it's oncoming, and so hard to resist the anti-hype wave that inevitably follows it? Because hype and anti-hype are simply the oscillation of a sea of empty minds in search of a solution to a problem they don't understand.

I've been writing service-oriented apps for decades. In my "world" nothing has changed.
aprdm almost 4 years ago
Really good write-up! I think the sweet spot in a < 50-engineer organization is either a monolith or a microservice per domain (instead of per functionality).

Once you have thousands of engineers, you either need extreme discipline and a huge team maintaining the "devops" pipeline that everything goes through, or it's basically everyone for themselves, with a "devops" team trying to help others by setting standards, best practices, and whatnot.
FpUser almost 4 years ago
In places where micro/services were *really* needed, people/orgs were implementing those even decades back. I personally was doing it in the 90s. For me the criterion for making something a service was simple: it would really cost the organization not to have it as a service.

Now, as with many things, that nothingburger suddenly got overhyped and ended up being shoved into every hole regardless of any technical rationale.
trixie_ almost 4 years ago
Another one for http://microservices.fail/

There are so many developers who think microservices are the answer to every ill in the world that it is infuriating. It's like one of those "nosql is webscale, we should use it" conversations.
helge9210 almost 4 years ago
> I've seen many engineers ignoring these because it's "an edge case", to realize later they have a massive data integrity problem.

Massive optimism detected in the "to realize later" part.

Also, if an "edge case" has a high (close to 1, but below 1) probability of a successful outcome, then with more and more tries the probability of every outcome being successful goes to zero (a product of values below 1), and the probability of at least one failed outcome goes to 1 (one minus that product). You still have to handle low-probability edge cases.
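To put illustrative numbers on that: if each call avoids the "edge case" with probability 0.999, then across 10,000 calls the chance that it never fires is effectively zero.

    package main

    import (
        "fmt"
        "math"
    )

    func main() {
        p := 0.999   // a single call succeeds (illustrative)
        n := 10000.0 // number of calls, e.g. one day of traffic

        allOK := math.Pow(p, n) // product of per-call success probabilities
        fmt.Printf("P(no failures in %.0f calls) = %.1e\n", n, allOK) // ≈ 4.5e-05
        fmt.Printf("P(at least one failure) = %.5f\n", 1-allOK)       // ≈ 0.99995
    }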
tjpnz almost 4 years ago
The E2E testing one is driving me nuts. I've told people time and time again that we'll never be able to have hundreds (or even a dozen) of them working reliably. Yet people still try, and to date we've lost even more hours trying to make them reliable. The idea is seductive but, in my experience, an exercise in futility. At the same time, I've proposed alternatives such as semantic monitoring, but people typically recoil in horror when they learn what it means.
mavelikara almost 4 years ago
A response someone wrote to this article: https://medium.com/productboard-engineering/countering-microservice-disasters-5a8f957803cb
ferdowsi almost 4 years ago
It's interesting to hear about stability concerns. Overall, I think my organization's move to microservices improved our resiliency story. It allowed us to freeze sensitive legacy services and gradually build surrounding services that incrementally replaced those legacy services with better-performing Go services. Rolling out new services is not onerous, thanks to our Kubernetes platform (which was nowhere near as difficult to build on as some might suggest).

Strong service boundaries helped us; they didn't hold us back.
mbrodersen almost 4 years ago
If you are not a good enough developer to build monoliths, then you are not a good enough developer to build microservices.
gravypod almost 4 years ago
> One could not just look into a project in their IDE, but it required to have multiple projects open simultaneously to make sense of all that mess

Why not both? You can set things up so your go-to-def can understand your API calls and head to the right place. This is very easy to do with a monorepo + protobuf setup.

> How much does it cost to spin 200 services in a cloud provider? Can you do it? Can you also spin up the infrastructure needed to run them?

Assuming 256 MB of RAM per service, you're still well within one-machine territory. Once you get above one-machine territory, you can set things up so that you can:

1. Build really good integration-testing tooling so that devs don't really need to interface with all services. In a test, spin up everything you need plus deps, run an API call, tear everything down. This can be cached if your build system does that. You can run into issues if you have situations where one API call hits *every* service, but if you've done that you've already messed up. In those cases the best you can do is mock a step in the chain, run a few tests that hit the entire chain before release, but then have devs run against the mock in their integration tests.

2. Hybrid environments. You run a dev cluster that has all of your basic infrastructure that doesn't change much, and provide a way for developers to launch new tasks that don't get routed to unless the driver has a feature flag flipped. Essentially you have a "dev" cluster that is continuously delivered from your master repo, each developer has the ability to launch new tasks in this cluster, and they can say "all traffic from alice for FoobarService should go to `{namespace=bob,service=Foobar}`".

> As you can imagine, end-to-end tests have similar problems to development environments. Before, it was relatively easy to create a new development environment using virtual machines or containers. It was also fairly simple to create a test suite using Selenium to go through business flows and assert they were working before deploying a new version.

Why is it not simple anymore? I've implemented this at more than one company.

> Aside from being an obvious single-point-of-failure, defeating some of the service-oriented architecture's principles, there's more. Do you create a user per service? Do you have fine-grained permissions so service A can only read or write from specific tables? What if someone removes an index unintentionally? How do we know how many services are using different tables? What about scaling?

Come up with a convention for your company and stick to it. If you can automate it, that's better. If you build some way for the task you are running to know "who" it is, it can inject that information into other libraries. For example, you can inject the following environment variables into a container:

    FOOBAR_DB_ABCD=pg
    FOOBAR_DB_ABCD_PASSWORD=...
    FOOBAR_DB_ABCD_HOST=...
    FOOBAR_DB_ABCD_PORT=...

You can then have some library you write expose an `OpenDatabase("abcd")` that connects in and injects everything. A security operator can then provision accounts and everything transparently. If you generate those env vars from some automated config-management tool, you don't even have to see the passwords.

> Instead of having a monolith getting all of the traffic, now you have a home-made Spring Boot service getting all of it! What could go wrong? Engineers quickly realize this is a mistake, but as there are many customizations, sometimes they cannot substitute this piece for stateless, scale-friendly ones.

I don't think this is a single point of failure. This is a single point of failure for a specific *subset* of your infrastructure, and at that it should be a very simple (mostly pass-through) component. If your mobile gateway dies, your backend one shouldn't. If all of your API gateways die, then your integrations with third parties that are required for legal compliance should stay up, etc.

> I've seen teams using circuit breakers and then increase the timeouts of an HTTP call to a service downstream.

You should always decrease timeouts for operations if you're attempting to retry calls. You can also use a load balancer that already knows the liveness state of all of your instances.

Aside from this bit, I mostly agree with the section about timeouts, retries, etc. If you are tackling a single problem that's simple, don't break it into a distributed system. If you are saying "I want to have X but we should implement a Y," where Y is a completely different thing that doesn't need to talk to X directly, then why not implement it in a separate binary? There's no reason they can't share code to make the burden of operation low.
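A minimal sketch of that convention in Go, reusing the hypothetical FOOBAR_DB_* variable names from above: a small library resolves a logical database name from the injected environment, so application code never handles raw credentials.

    package dbconn

    import (
        "fmt"
        "os"
        "strings"
    )

    // Config is populated from the FOOBAR_DB_<NAME>_* environment
    // variables injected by deployment tooling (hypothetical convention).
    type Config struct {
        Driver   string
        Host     string
        Port     string
        Password string
    }

    // OpenDatabase resolves a logical name such as "abcd" to its
    // injected config. An operator provisions the variables per
    // service identity; the service never sees where secrets come from.
    func OpenDatabase(name string) (Config, error) {
        prefix := "FOOBAR_DB_" + strings.ToUpper(name)
        cfg := Config{
            Driver:   os.Getenv(prefix),
            Host:     os.Getenv(prefix + "_HOST"),
            Port:     os.Getenv(prefix + "_PORT"),
            Password: os.Getenv(prefix + "_PASSWORD"),
        }
        if cfg.Driver == "" {
            return Config{}, fmt.Errorf("no database %q configured", name)
        }
        return cfg, nil
    }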
jameshart almost 4 years ago
I guess you just won't get traction on HN writing 'Disasters I've seen in a monolithic world'.

If you have never gone through a significant dependency upgrade on a large monolithic codebase, you might not appreciate the value of microservice architecture. I've been at companies where the number-one technical achievement for the entire tech organization in a calendar year was 'successfully moved the app from .NET 2 to .NET 4'. Have you ever gone through the process of integrating an acquired company's monolith into an acquiring company's monolith? It's futile. With a microservice architecture, though, I've seen acquisitions integrate significant systems (things like billing and auth) in a matter of weeks where monolith projects dragged on for years.

Sure, there are no silver bullets. But there are a *lot* of problems with monoliths which microservices eliminate. Not without trade-offs, naturally.
ryanthedev almost 4 years ago
I work in a 300+ project monolith. If you think microservices are an issue, I can't help you.

I have worked in both. Until you realize that both worlds have their pros and cons, just stop.
frays almost 4 years ago
Great read, thanks.