I had the experience of inheriting a codebase that was halfway through the process of being “strangled”, and it was a nightmare. The biggest reason is that it's not a "fail-safe" way to plan a project. In this particular case, a full replacement was probably a 12-month affair, but due to poor execution and business needs, priorities shifted 6 months in. It was full of compromises. In some places, instead of replacing an API completely, it would call into the old system and then decorate the response with some extra info. Auth had to be duplicated in both layers. Debugging was awful.<p>While some of the issues could be chalked up to "not doing it right," at the core of it, the process of strangulation described in the article leaves the overall architecture in a much more confusing state for the lifetime of the project, and if you have to shift, you've created vastly more tech debt than you had with the original service, because you now have a distributed-systems problem. Unless you can execute on it quickly, I think it's a very dangerous way to fix tech debt: it avoids fixing the core issues and instead plans for a happy path where you can just replace everything.<p>If you absolutely think you need to quarantine the existing code, I'd recommend putting a dedicated proxy in place that routes either to the old service or the new service, and not mixing the proxy and the new code. That separation of concerns makes it much easier to debug, and vastly reduces the likelihood of creating a system of distributed spaghetti. What I'd really recommend, though, is understanding the core codebase that powers the business and making iterative improvements there, rather than throwing it all out.
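The dedicated-proxy idea above can be sketched very simply: a plain route table that decides, per path, whether the old or the new service handles a request, with no new business logic living in the proxy itself. The path prefixes and backend URLs below are hypothetical, purely for illustration.

```python
# Minimal sketch of a dedicated routing proxy: the proxy only decides
# WHERE a request goes; it never mixes in new application code.
# Prefixes and backend addresses are made-up examples.

MIGRATED_PREFIXES = ["/api/users", "/api/billing"]  # routes already rewritten

OLD_BACKEND = "http://legacy.internal:8080"
NEW_BACKEND = "http://new.internal:9090"

def route(path: str) -> str:
    """Return the backend base URL that should serve this path."""
    if any(path.startswith(p) for p in MIGRATED_PREFIXES):
        return NEW_BACKEND
    return OLD_BACKEND
```

Because the proxy holds only this table, debugging stays simple: any given request is served entirely by one system or the other, never half by each.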
My recent experience with an ERP, specifically some major bolt-on modules, was that the vendor simply made the switch to a new platform that had maybe 60% of the capabilities. A roadmap (which has actually been fairly accurate) showed about 3 years to get to 90%.<p>New customers were pushed to the new product. Existing ones were encouraged to switch and temporarily live without prior features (usually with temp workers doing things manually) for a deep discount. Those who had to stay with the legacy system were told to expect nothing but bug fixes and compliance-related updates (for federal programs and reporting requirements), and that if they needed something more than that, they'd either need to build their own bolt-on (there was a robust, if clunky, SDK) or pay contractors to do so.<p>It sucked, yeah, but it seemed like a reasonable way to go about a transition that was always going to make people unhappy.
I'm in the middle of a rewrite. It's very challenging, but the alternative is worse (a sinking ship). My lessons learned:<p><pre><code> 1. Do it sooner
2. Get full commitment from stakeholders
3. Agree on feature freeze
4. Get it done quickly
5. Don't over promise, esp about the timeline
6. Focus on delivering big/important items first (MVP)
7. To avoid second-system syndrome, appoint a benevolent dictator, not a committee
8. Have test scenarios ready (black box)
</code></pre>
Unfortunately they all depend on one another, e.g. the longer you wait for the rewrite, the harder it will be to finish (feature creep).<p>I will write a blog post when it's done successfully; otherwise I will hide under a rock.
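Point 8 (black-box test scenarios) can be sketched as a parity check: feed identical inputs to the old and new implementations and require identical outputs. The `legacy_tax` / `new_tax` functions and the scenario values below are stand-ins for real system entry points, not anything from an actual codebase.

```python
# Sketch of black-box test scenarios: same inputs through both systems,
# outputs must match exactly (bug-for-bug, until parity is reached).
# Both functions here are hypothetical placeholders.

def legacy_tax(amount_cents: int) -> int:
    return amount_cents * 20 // 100  # old behaviour to be preserved

def new_tax(amount_cents: int) -> int:
    return amount_cents * 20 // 100  # rewrite must reproduce it

# Include edge cases actually seen in production traffic.
SCENARIOS = [0, 1, 99, 100, 12345]

def check_parity() -> bool:
    return all(legacy_tax(x) == new_tax(x) for x in SCENARIOS)
```

Having these scenarios ready before the rewrite starts means "done" has an objective definition instead of a gut feeling.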
Of course, that approach is difficult to apply if the interface is a significant part of, or deeply entangled with, the pain points that the rewrite is intended to solve.
We did it very differently in our group
1. The developers of the old tool continued to work on it.
2. A new team took requirements from the old team and filtered them to make them more meaningful
3. Designed a system architecture that would work with the targeted workflow
4. Designed a minimal version and ran it with a new branding next to the old one.
5. Reached feature parity with the old one and dumped it<p>The important thing to note is that the new tool does not do everything the old tool does. The workflow is also different from the old one. However the customers loved the new one as it was simpler, faster and more robust to use.
One thing that complicates matters somewhat (as if they were not already complicated) is at the decision point marked <i>isRoundtrip?</i> in the fourth (penultimate) diagram, where the affirmative case is handled within the new system.<p>Given, however, what is being posited -- a legacy system that is not modular and which contains unrefactorable pathological dependencies -- the old system must also handle this case in parallel, in order to be in the correct state to handle future requests of a type that still need to be delegated to the old system.<p>This parallel implementation may have to persist well into the replacement process, and the requirement for it to do so may mean that you still have to do double implementation of features and fixes for most of the transition.
Fantasy:<p>> Here’s the plan:<p>> Have the new code act as a proxy for the old code. Users use the new system, but it just redirects to the old one.<p>> Re-implement each behavior in the new codebase, with no change from the end-user perspective. Progressively fade away the old code by making users consume the new behavior. Delete the old, unused code.<p>Here is the reality:<p>1. People do the above incompletely; their deletion of the old system slows down and then they move on to another project or organization, leaving a situation in which 7% of the old system still remains.<p>2. People iterate on the entire above process, ending up with multiple generations of systems, which still have bits of all their predecessors in them.
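The "progressively fade away the old code" plan quoted above amounts to a dispatch table with one entry per behavior, flipped to the new implementation as each piece is rebuilt; the failure mode described is that some entries never get flipped. The behavior names and handlers below are invented for the example.

```python
# Sketch of per-behaviour migration: each entry is flipped from the old
# handler to the new one as it is re-implemented. An old handler can be
# deleted only once no entry points at it. All names here are made up.

def old_create_invoice(data): return ("old", data)   # deletable once unused
def new_create_invoice(data): return ("new", data)
def old_send_email(data):     return ("old", data)

HANDLERS = {
    "create_invoice": new_create_invoice,  # already migrated
    "send_email": old_send_email,          # still on the legacy path
}

def handle(behaviour: str, data):
    return HANDLERS[behaviour](data)
```

The table makes the "7% remains" failure visible: any entry still naming an `old_*` handler is unfinished strangling, which is at least easier to audit than code scattered across two systems.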
I think an overlooked aspect of a legacy system that makes "strangling" difficult is that nobody fully understands the behaviours of the system anymore.<p>It is really hard to replace the functionality of a piece of code when you don't know 100% what that functionality is.
I see it working for backend code; legacy UI systems have way more coupling, so a complete rewrite would be better there. If you have legacy framework A and you start replacing it with framework B, component by component, the new components will have to follow the practices of framework A, and basically you end up writing legacy-style code in the new framework B, which is much worse than keeping legacy framework A, because framework B is now written in a completely alien way and not how it was intended to be used.
I have written a set of libraries and dev tools (like a better repl) for Perl (the FunctionalPerl project) with the idea of helping write better code in that language, and of giving me and whoever joins in such efforts a way to hopefully save a legacy code base. Maybe when a company reaches the point where they feel their code base has become unmaintainable, it can still be saved by using the tools and programming approaches that I can provide. That (other than, and more than just, "because I can") is the major motivation why I invested in that project. But I wonder how much it will help; I haven't had the chance to try it out so far. I got to know companies that have begun to split up Perl applications into microservices and then move the individual services to other languages, and they don't necessarily have an interest in my approach. But I'm also very diffident about reaching out to more companies, due to worrying about how much pain it would be to deal with (and how likely it would fail)--investing my time into newer tech (Haskell, Rust etc.) instead looks tempting in comparison. Should I continue to reach out to companies to find the right case (presumably working as a contractor, with some big bonus if successful)? Any insights?
I'm dealing with a rewrite at the moment (that is, I was hired to start rewriting an existing web application). I want to apply this pattern but the existing codebase was already dated by the time it was written. It's a huge load of mixed responsibilities, globals (it's a PHP backend), RPC-like http API (every request is a post containing an entity name, action, parameter, and additional parameters handled in a big switch), etc. Files of 13K lines of code.<p>So far I'm stuck in the overthinking phase of the new application. And as the article states, I'm asked to keep adding new features to the existing application - nothing big (because individual things aren't big), but at the same time, I've been adding a REST API on top of the existing codebase for the past few weeks. It's satisfying in a way but it hurts every time I have to interact with the existing codebase and figure out what it's doing.<p>Plus we're not going to get rid of the existing application at this rate. I should probably set myself limits - that is, I'll postpone and refuse work on the existing application if it's not super critical. And quit if they're not committed to the rewrite before the summer.
Strangling is a good way to slowly replace a system by simply starting to work around it until whatever value it adds is so diminished you can safely pull the plug.<p>Big software rewrites are extremely risky because they inevitably take more time than people estimate, and the outcome is not always guaranteed.<p>An evolutionary approach is better because it allows you to focus on more realistic short-term goals and to adapt based on priorities. Strangling is essentially evolutionary and much less risky. It boils down to deciding to work around rather than patch up software, and to minimize further investment in the old software.<p>Also, there are some good software patterns out there for doing it responsibly (e.g. introducing proxies and then gradually replacing the proxy with an alternate solution).
I did a rewrite.<p>The old code worked, but was slow. Adding features would make it slower. Lock-free queues and threads everywhere, packet buffers bouncing from input queues to delivery queues to free queues to free lists, threads manfully shuttling them around, with a bit of actual work done at one stage.<p>Replaced it all with one big-ass ring buffer and one writer process per NIC interface. Readers in separate processes map and watch the ring buffer, and can be killed and started anytime. Packets are all processed in place, not copied, not freed, just overwritten in due time.<p>It took a few months. Now a single 2U server and a disk array captures all New York and Chicago market activity (commodity futures excepted).<p>I kept the part that did the little work, scrapped the rest.<p>C++, mmap, hugepages FTW.
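The design described above (one writer per interface, readers tracking their own cursors, packets overwritten in place rather than freed) can be illustrated with a toy single-process model. This is only a conceptual sketch in Python; the real system as described uses C++, mmap, and hugepages, and the slot layout here is invented.

```python
# Toy model of the ring-buffer design: one writer overwrites slots in
# place; each reader keeps its own cursor and may lose the oldest
# entries if it falls behind, but never blocks the writer.

class Ring:
    def __init__(self, slots: int):
        self.buf = [None] * slots
        self.head = 0  # total packets ever written

    def write(self, packet):
        self.buf[self.head % len(self.buf)] = packet  # overwrite in place
        self.head += 1

class Reader:
    def __init__(self, ring: Ring):
        self.ring = ring
        self.cursor = 0

    def read(self):
        # A reader more than one buffer behind skips to the oldest
        # surviving slot; the writer never waits for it.
        oldest = max(0, self.ring.head - len(self.ring.buf))
        self.cursor = max(self.cursor, oldest)
        if self.cursor == self.ring.head:
            return None  # caught up
        pkt = self.buf_at(self.cursor)
        self.cursor += 1
        return pkt

    def buf_at(self, i: int):
        return self.ring.buf[i % len(self.ring.buf)]
```

Readers living in separate processes mapping the same buffer is what makes them killable and restartable at any time: their only state is the cursor.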
Having successfully replaced a legacy system one time, we got it to work by turning the legacy system's business logic into a library that the new system could use. The key is to replace just the underlying architecture without reimplementing years of work.
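The shape of that approach: the old business rules become plain library functions, and the new system's service layer calls them unchanged. The pricing rule and class names below are hypothetical stand-ins.

```python
# Sketch: legacy business logic lifted into a library, called from a
# new service layer. Only the architecture around it is new.
# legacy_discount stands in for a real function extracted verbatim.

def legacy_discount(total_cents: int) -> int:
    # imagine this lifted as-is from the old system, quirks and all
    return total_cents // 10 if total_cents >= 10_000 else 0

class NewCheckoutService:
    """New architecture (service layer); old rules (library calls)."""

    def total_due(self, total_cents: int) -> int:
        return total_cents - legacy_discount(total_cents)
```

Since the rules themselves never change hands, years of accumulated edge-case handling survive the migration for free.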
What the article describes is a rewrite!
In the end there will be no more legacy code left...<p>What the article is saying is: don’t rewrite your code in one go, but rather cut the system into pieces that are independent and rewrite each in successive phases.<p>It’s kind of obvious, though. And the difficult part of the rewrite is actually slicing the original code into independent chunks. More often than not, legacy systems are riddled with leaky abstractions and dependencies (the infamous spaghetti code); that's hell to disentangle.
Often, the clients of legacy code are old too, and are hard-coded to access it.<p>I've done this, but on a private branch, with a single merge to trunk at the end. Starting with complex integration tests, new interfaces were gradually defined and made the code testable, giving me the needed confidence.
So, how can this be applied to mobile app development? I can think of adding dependencies and new code alongside the old code in the app, but that will cause considerable bloat in the app's size, which can be noticeable to management, unlike with web services/sites/apps.
but how come the linux and BSD kernels, emacs (since RMS), the java language, even python (python3 was not a rewrite), git, hg, django, etc ... have never been rewritten from scratch?<p>what is the lesson here?
> After 7 months, you start testing the new version.<p>Translation: after 7 months you stop mucking about and start trying to produce something useful.
<a href="https://martinfowler.com/bliki/StranglerFigApplication.html" rel="nofollow">https://martinfowler.com/bliki/StranglerFigApplication.html</a><p>If you wanted to read the referenced article. This was the first thing I thought of. I appreciate Fowler's writing style and his sourcing. He always links some interesting stuff.