Ask HN: Has anyone implemented a zero-maintenance system?

29 points by _pdp_ almost 10 years ago

I am more and more interested in designing and developing a zero-maintenance system, i.e. a system that once built requires no maintenance whatsoever.

Is this possible to achieve at all?

30 comments

ChuckMcM almost 10 years ago

It is an interesting question, but it would help to be a bit more crisp. You need to define 'maintenance'. Allow me to explain.

A sundial requires little maintenance after it has been set up. Cleaning off the bird feces and the leaves is all. It is not "zero" maintenance, though. A trail duck is just a pile of rocks, and unless it gets knocked over it requires no maintenance, but it may require repair from time to time. A spacecraft on a mission to Pluto might require a software change or a course correction, but its hardware is not changed once it launches. A hummingbird feeder sits on my porch and provides sugar water to birds in the neighborhood; when it runs out I pour in more.

So there are three things people often lump under the general heading of maintenance:

Repair - making something work after it breaks.
Resupply - replacing consumable supplies, cleaning, and adjusting.
Renew - giving something new capabilities or changing its operation by replacing parts or software.

If you're being pedantic about the term, there is no such thing as 'zero maintenance' if you want a system to remain stable. Entropy is a thing.

So perhaps the crisper question is: are people working on systems with really low maintenance requirements? Here "low" could be defined as hours of maintenance vs. hours of operation, or skill level of maintainers vs. skill level of the builders, etc. Sure, there are some really famous ones like the Clock of the Long Now. Every museum in the world with interactive exhibits tries to design systems that will run with very little maintenance for a long time. Pretty much every operations lead spends time working on ways to keep large numbers of computer systems running with little human intervention.
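
A minimal Python sketch of the "hours of maintenance vs. hours of operation" measure above, splitting maintenance into the three categories just listed. The `MaintenanceLog` class and the sundial figures are hypothetical illustrations, not data from the comment.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceLog:
    """Hypothetical maintenance record, split into Repair / Resupply / Renew hours."""
    operating_hours: float
    repair_hours: float = 0.0
    resupply_hours: float = 0.0
    renew_hours: float = 0.0

    def total_maintenance_hours(self) -> float:
        return self.repair_hours + self.resupply_hours + self.renew_hours

    def maintenance_ratio(self) -> float:
        """Hours of maintenance per hour of operation; 0.0 would mean truly zero maintenance."""
        if self.operating_hours <= 0:
            raise ValueError("operating_hours must be positive")
        return self.total_maintenance_hours() / self.operating_hours


# Illustrative numbers only: ten years of operation, 20 hours of cleaning (Resupply).
sundial = MaintenanceLog(operating_hours=10 * 365 * 24, resupply_hours=20)
print(f"{sundial.maintenance_ratio():.6f}")
```

The ratio never reaches exactly zero for anything that needs cleaning, which is the "entropy is a thing" point in numeric form.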

ams6110 almost 10 years ago

This was the default way software worked before widespread use of the internet, and in particular game software up until the last few years. You buy a game on a CD or cartridge, and that's it. There is no "maintenance" after that.

Before internet use was common, there was just no really convenient way to deploy bug fixes, at least to consumers. Enterprise software updates were probably handled by a visiting rep who applied the updates, or by a set of tapes with instructions for the local administrators.

But for consumer/office stuff, when you shipped something, that is what it was. You worked pretty hard to be sure it did what it was supposed to do.

andyidsinga almost 10 years ago

This question should include a time frame (just as, when someone desires a "real-time system", the deadline is key to answering the question "is it real time or not?").

One way to set this time frame could be "life of the system", which might be, say, 2 years, or 5 years, or 50 years.

The more constrained your requirements and design are in terms of inputs, outputs and error states, the more you can validate and the closer you get to maintenance-free (finite state machines are a nice pattern for this).

Many embedded systems (appliances, space program gear) achieve this, but I've noticed that as they are built more and more with off-the-shelf, general-purpose hardware and software, they become more error-prone and require maintenance (due to competition and time-to-market tradeoffs?).

edit: one final thing re error states: watchdog timers. These can be tremendously useful in reducing maintenance and keeping functionality alive. When the system enters certain error states and stops resetting the watchdog, the timer expires and the system can restart to a known good state and, hopefully, restore its core functionality until that particular error state is entered again.
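
A minimal Python sketch combining the two ideas above: a finite state machine whose transition table lists every legal input, and a watchdog that resets to a known good state if the main logic stops making healthy progress. The states, event names and timeout are hypothetical, and time is passed in as `now` so the sketch runs without threads or real timers.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()
    FAULT = auto()


# Every legal (state, event) pair is listed explicitly; anything else is an error state.
# A healthy main loop is assumed to send "tick" periodically.
TRANSITIONS = {
    (State.IDLE, "start"): State.RUNNING,
    (State.IDLE, "tick"): State.IDLE,
    (State.RUNNING, "tick"): State.RUNNING,
    (State.RUNNING, "stop"): State.IDLE,
}
KNOWN_GOOD = State.IDLE


class Controller:
    def __init__(self, watchdog_timeout: float) -> None:
        self.state = KNOWN_GOOD
        self.watchdog_timeout = watchdog_timeout
        self.last_kick = 0.0
        self.resets = 0

    def handle(self, event: str, now: float) -> None:
        # Unknown inputs are not guessed at: they drop into FAULT, which has no exits.
        self.state = TRANSITIONS.get((self.state, event), State.FAULT)
        if self.state is not State.FAULT:
            self.last_kick = now  # only healthy progress "kicks" the watchdog

    def watchdog_check(self, now: float) -> None:
        # Meant to be driven by something independent of handle(), e.g. a hardware timer.
        if now - self.last_kick > self.watchdog_timeout:
            self.state = KNOWN_GOOD  # restart to a known good state
            self.last_kick = now
            self.resets += 1


c = Controller(watchdog_timeout=5.0)
c.handle("start", now=1.0)  # IDLE -> RUNNING, watchdog kicked at t=1
c.handle("bogus", now=2.0)  # unknown input -> FAULT, watchdog not kicked
c.watchdog_check(now=4.0)   # only 3s since last kick: no reset yet
c.watchdog_check(now=7.0)   # 6s since last kick: reset to IDLE
print(c.state, c.resets)
```

The FAULT state deliberately has no outgoing transitions, so the only way out is the watchdog; that keeps recovery in one small, testable place.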

forgottenpass almost 10 years ago

Of course it's possible, but I don't know where you're trying to go with this.

Set a lifespan over which the product should be able to operate without maintenance, based on your goals and engineering constraints. Design with the lifespan in mind, and perform lifecycle testing. Then release your product, stop working on it and don't offer support.

The feasibility of this depends on what you're trying to build. Customers have different expectations of lifespan, maintenance needs, and manufacturer support depending on the product. A garden hoe is not an alarm clock radio, which is not a car, which is not a general-purpose operating system.

The status quo of maintenance for any product domain is an inherent limitation on putting out something better. You can't just jump ahead of the competition because you will it so. You need some combination of: higher-quality components, insight into the problem domain that none of the current commercial solutions have, or simply more productive (i.e. more expensive) R&D.

If you're just looking at products that currently require maintenance and are licking your lips at the idea of selling a turn-key solution, join the party. Otherwise, roll up your sleeves and make low maintenance a key design goal of your product.

acd almost 10 years ago

Sure, I've seen it, but it did not run anything modern OS-wise. The MS-DOS-based control computer at my dad's previous workplace had an uptime that was close to forever. We are talking stability in the range of ten to fifteen years. It ran a climate control program and the control software never crashed. Because it was MS-DOS it was single-tasking, and the control program was very stable.

At the time there was little computer hacking activity, so you did not have to update the software either. Backups were done to floppy disks, and there was a reserve hot-swap backup computer. For remote control you would dial into the computer via modem, first at 300/1200 baud (up to 1200 bits per second), then later at a lightning 9600 baud.

I'm also sure you'd find very good old engineering in the computers of Voyager.

Zenst almost 10 years ago

It is always something to aim for, but with anything, time always catches up. I would say that if it has any internet or external connectivity in any form, then you have to plan for maintenance from a security and protocol-change aspect over time (think transaction data formats).

So suppose security is not an issue, or is removed from the project via some wrapper, be that a system locked in a room with access control and other layers. Then you still have to plan for how long it is meant to last, and be realistic.

With most business assets you're talking a 5-year write-off at best, tech-wise, with many at 3 years or maybe less, depending upon use.

So is it possible in the right situation? Yes. But do not forget security.

Though remember, time is always a factor, and so is wear and tear: a pen eventually runs out of ink and requires refilling. So depending on the timeframe it is doable, but for many, at what cost?

Another approach would be to look at what you have that has fulfilled these criteria without requiring maintenance, and then ask: were you lucky, or did you plan it that way? For many, it usually ends up being some legacy system that hardly anybody uses, or one with low usage and no internet connectivity at all. That is when you see your Amiga or C64 or early PC working away without issue.

But things fail, and however well you plan, things happen, so even for something designed to just work, you should also plan to check that everything is working and to monitor it. Expect the unexpected.

deanclatworthy almost 10 years ago

Your question provoked an interesting thought process when I began to answer it.

I had first thought that one of my projects was zero-maintenance, as I truly gave it zero maintenance in the three years between when I built it and when I sold it. It was a website which indexed content from others. I had coded the "spider" in such a way that it was quite flexible if the content on the page changed a bit, and it served me well for those three years. I had a database backup system in place that backed up to another server.

But then I began to realise that although I had given the site zero maintenance, it wasn't truly zero-maintenance. If I'd had a hardware failure, I'd have had a backup of the database but no way to automatically spin up a new server and put it in place. And even if I had some monitoring in place for that, what if the monitoring service went down? I guess what I'm getting at is that there can never be a zero-maintenance, self-sufficient service - just one that is incredibly well automated by great engineers.

kwhitefoot almost 10 years ago

My fridge is 30 years old and, unless you count periodic cleaning and defrosting as maintenance, it has had no maintenance at all in that period. One of my two freezers is 25 years old, ditto. The microwave oven is over twenty as well, but as the internal light has failed I suppose it doesn't quite count. My wife's hair dryer wasn't new when we met, and that was forty years ago; it's still going strong.

I very much doubt that anything I buy from now on will last so well, mostly because of the quite unnecessary use of electronic control systems that add unnecessary features and extra failure modes. Of course, my new fridge uses only half as much power as the old one, which means that the new one saves me about 125 kWh per year, which in turn means that it will take about twenty years to pay off the investment. Good job that wasn't the reason I bought it.

As someone already pointed out, you have to specify the expected lifetime to be able to evaluate the 'zero-maintenance-ness' of a system.

rbc almost 10 years ago

The idea of zero maintenance is appealing but out of reach in a lot of ways. I suppose you mean some kind of software system. From the perspective of the systems world, the runtime dependencies are a moving target. Most platforms evolve and their interfaces change over time.

The teams that develop these platforms (Java/Node.js/Ruby) have to balance support for old code branches with the development of new ones. Once the platform team leaves a code branch behind, the problems start to multiply. Zero-day exploits are developed for vulnerabilities that will no longer be patched. The underlying operating systems also evolve and leave the old versions of the platform behind as well.

At some point you have to open your application's code base back up and update it for the updated platforms. If you don't, the application dies with its platform. Sort of like the ship that sinks from rust, or the building that collapses due to the accumulation of deferred maintenance.

pjc50 almost 10 years ago

Many non-networked systems neither need nor support updates.

If your system is networked, there is always the risk of security updates being required. This can be minimised by extreme effort, but what if you have to support SSL/TLS? At the very least you need a way of deprecating the vulnerable cipher suites.

Would you want a web browser with zero updates? It would gradually become a handicap.

For consumer equipment, "zero maintenance" means "disposable": in the event of trouble, throw it away and buy a new one. Good for manufacturer turnover, not so good for the environment.

The other option is simply 100% outsourced maintenance, which has long been an option. Some mainframes would even do their own fault reporting; all you'd have to do is say hi to the technician and let him into the building.
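
One way to keep "deprecating vulnerable cipher suites" a configuration change rather than a code change, sketched with Python's standard `ssl` module (Python 3.7+). The function name and the particular cipher string are illustrative assumptions; `set_ciphers` takes OpenSSL cipher-list syntax and only affects TLS 1.2 and below, since TLS 1.3 suites are configured separately.

```python
import ssl


def make_client_context(min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Build a client TLS context whose protocol floor is a parameter, not hard-coded.

    Raising the floor later (e.g. to TLSv1_3) is then a config change, not a code change.
    """
    ctx = ssl.create_default_context()  # certificate and hostname verification on
    ctx.minimum_version = min_version
    # Restrict TLS 1.2 suites to forward-secret AEAD ciphers (OpenSSL cipher-string syntax).
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx


ctx = make_client_context()
print(ctx.minimum_version)
```

Even so, this only moves the maintenance into configuration: someone still has to notice when a suite or protocol version becomes vulnerable and raise the floor.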

_pdp_ almost 10 years ago

I think I should have been a bit clearer, but I also like that the question was a bit vague, because there are a lot of interesting comments highlighting things I never thought about.

I believe that software engineers have failed in some ways, because we always factor maintenance in as part of the development cost. In other words, we assume that we have to maintain a system once it is built. This is even more relevant with web applications, which give the impression that they are in constant flux. I don't buy the idea that web apps need to adapt, because standards don't move that fast and almost all browsers are backwards compatible with sites that were developed in the 90s. I don't know what is behind HN, but its simplistic, non-fluid interface hasn't changed (as far as I know) since it was made, which for me shows that it is possible to make something useful without the need to constantly change it. There are many examples like that.

I also like the analogies that some of you made to physical systems. Indeed, there are physical systems that haven't been changed for years and still work. There are also a lot of software systems, mainly SCADA stuff, that are designed never to be changed. Not long ago I heard a story about a pentest on a SCADA system that was controlling a dam. Although the pentest found a bunch of vulnerabilities, it was not possible to change the system because its author had died long ago. The dam operated just fine even with the vulnerabilities in its software.

Another point I wanted to bring up is defining time and version constraints. Back in the day, software was not continuous. In other words, you got Doom 1; Doom 2 is a different story. Therefore, the Doom 1 source code can go free. I would say that Doom was very much a low- to zero-maintenance system, albeit a software one. Bugs were part of the character of the software. Hacking the software through its bugs was as close as it gets to magic in the digital world. As we get more connected, we are required to maintain the software.

These are just a few things that came to my mind. I will add more thoughts as I go over your comments.

vinceguidry almost 10 years ago

If it's doing anything useful, then no. The definition of 'useful' will change over time, so what the system does will drift away from it unless the system is changed to meet the new requirements. Cue maintenance.

The trick isn't eliminating maintenance, but making maintenance as hassle-free as possible. The best way I've found to do this, the way I do it personally, is to build enough slack into the human system surrounding the automated system (in other words, the business you're working for) to adequately design, implement and iterate on maintenance procedures. "Ship" the procedures by handing them off to an unskilled person so you can get feedback for iteration.

mattkrea almost 10 years ago

I don't think there is such a thing.

The best-case scenario is a lot of automation and monitoring.

Where I work, our dev/engineering team is investigating machine learning to automate repair a bit better. For example, over a 7-day period we record data on the load on our backend, and the ML model predicts load and generates autoscaling rules from that historical data. Some of our services are about as low-maintenance as I think they can get (autoscaling, auto-removing poorly performing instances, etc.), but I'd like to improve on that and only have an engineer get involved when the system can't be trusted to make the right call.
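
The comment doesn't describe the model itself, so here is a deliberately non-ML stand-in in Python: derive an instance count for each hour of the week from a week of recorded load, using the 90th percentile plus headroom. The function name, parameters and numbers are hypothetical; `statistics.quantiles` needs at least two points on older Pythons, hence the guard.

```python
import math
from collections import defaultdict
from statistics import quantiles


def scaling_rules(samples, capacity_per_instance, headroom=1.2, min_instances=2):
    """Map hour_of_week (0..167) -> desired instance count.

    samples: iterable of (hour_of_week, requests_per_second) from the recorded period.
    """
    by_hour = defaultdict(list)
    for hour, load in samples:
        by_hour[hour].append(load)

    rules = {}
    for hour, loads in sorted(by_hour.items()):
        # 90th percentile of observed load ("inclusive" keeps it within the observed range).
        if len(loads) >= 2:
            p90 = quantiles(loads, n=10, method="inclusive")[-1]
        else:
            p90 = loads[0]
        needed = math.ceil(p90 * headroom / capacity_per_instance)
        rules[hour] = max(min_instances, needed)
    return rules


week = [(0, 10.0), (0, 10.0), (9, 400.0)]  # tiny illustrative sample
print(scaling_rules(week, capacity_per_instance=50))
```

A rule table like this is easy for an engineer to inspect, which matters if the goal is to involve a human only when the system's call can't be trusted.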

Spooky23 almost 10 years ago

If the requirements are static and you don't require external connectivity, it's very possible.

My wife is a financial person. Her billing system (it's a public utility) ran until 2009 on an AS/400 with a 33 MHz processor. It was probably rolled out in 1991 or 1992 and upgraded to support IP connectivity in the late 90s.

Its only connectivity was a modem that called IBM when something broke.

I made fun of her about it, but it actually worked great. The only reason it was replaced was that IBM ran out of spare parts for the printer.

Pamar almost 10 years ago

Google Search Appliance was supposed to be "deploy and forget" (at least in its earliest version), but it still had a connection (over a phone line, IIRC) so that Google support could reach it in case of need.

Such a design makes sense only if the work the system has to do is predictably static, though. Even in the case of the Google Search Appliance, the moment your company adds a document in a new format, the device will require some kind of external intervention to keep working.

lugus35 almost 10 years ago

A zero-maintenance system? This is just what NASA is trying to build.

It's just a matter of cost versus maintenance work.

If you want to build a zero-maintenance system by yourself, just imagine that your service and servers will be launched to Pluto or a comet and will never come back.

But remember that you also get zero maintenance if you pay someone to do the maintenance in your place. That's what people generally do when paying for cloud services.

lazerwalker almost 10 years ago

I have a server that doesn't do anything at all. It has neither inputs nor outputs connected to it, and isn't even connected to a power source.

Except that I suppose I'll still have to blow the dust out of it every so often, and make sure it's in an environment where it won't rust.

(To be less glib: what do you mean by "system"? What do you mean by "maintenance"?)

milankragujevic almost 10 years ago

My hair dryer has been working for 20 years without any maintenance. However, I don't think that would be a good idea for anything connected to the internet. But if it's a closed system, and it does the same thing constantly in a constant environment, I think it's possible.

fizx almost 10 years ago

Any system is zero-maintenance as long as you choose not to maintain it.

Also, zero is a very small number.

brianwawok almost 10 years ago

Most hardware is like this, right? Your microwave does not (yet) get firmware updates; what ships has to work right.

I don't think this is a good idea for a web server. Browsers change, and you have to do some things to keep current.

otikik almost 10 years ago

AFAIK the only zero-maintenance system is no system.

So in a way, every time I realize I don't need to build a "system" for a given task, I am building "a zero-maintenance system" of sorts.

cgio almost 10 years ago

I have a couple "quick fix" excel based solutions that still run, with no maintenance 7 years later. There is nothing more permanent than the temporary.

beginrescueend almost 10 years ago

It depends on what you mean, what your goals are, what you are trying to build, etc.

"No maintenance whatsoever?" I doubt it. Things break, get old, go obsolete, are insecure, and wear out.

Perhaps you should worry about "nines of uptime" or "fixed requirements" or something along those lines.

Simplicity is first. The more complex, the more maintenance. "Everything Should Be Made as Simple as Possible, But Not Simpler" - http://quoteinvestigator.com/2011/05/13/einstein-simple/

"No moving parts."

Make it highly available and redundant. Power, cooling, networking, hardware, and software redundancies are needed.

Make it immutable. Change and mutable state will create maintenance. Use functional programming, if you write software.

Monitor it and make it self-restart. Somebody already mentioned watchdogs, for hardware.

Make it ultra secure. No outside networking?

Program finite state machines...

If you mean a "hardware and software" solution, these points sound like you need redundant hardware and Erlang/OTP. Take a peek at OTP and the Erlang-based languages (Erlang, Elixir, Joxa, and LFE).

At least with redundant power, cooling, hardware, and Erlang/OTP (Elixir/OTP, etc.), you gain the ability to do all of these things.

With Erlang/OTP, you can achieve very high uptimes, and if you design it correctly, you can hot-patch running code if you do (rarely) have to perform maintenance.

While you're at it, you also get distributed programming, concurrency, and parallelism for free with Erlang/OTP. This, in and of itself, can "reduce maintenance."

See https://pragprog.com/articles/erlang and http://stackoverflow.com/questions/8426897/erlangs-99-9999999-nine-nines-reliability
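
Erlang/OTP supervisors are the real thing here; purely to illustrate the "self-restart" idea, here is a Python sketch of a supervisor with OTP-style restart intensity (give up after too many restarts within a time window). It runs a plain function rather than a process and is not how OTP is implemented; `supervise`, `flaky` and the limits are hypothetical.

```python
import time
from collections import deque


class RestartIntensityExceeded(RuntimeError):
    """Raised when the worker keeps crashing faster than the supervisor allows."""


def supervise(worker, max_restarts=3, period=5.0, clock=time.monotonic):
    """Call worker() and restart it whenever it raises.

    Like OTP's restart intensity: more than max_restarts restarts within
    `period` seconds means the fault isn't transient, so give up and escalate.
    """
    restart_times = deque()
    while True:
        try:
            return worker()
        except Exception as exc:
            now = clock()
            restart_times.append(now)
            # Drop restarts that have fallen out of the sliding window.
            while now - restart_times[0] > period:
                restart_times.popleft()
            if len(restart_times) > max_restarts:
                raise RestartIntensityExceeded(
                    f"{len(restart_times)} restarts within {period}s"
                ) from exc


attempts = {"n": 0}


def flaky():
    """Hypothetical worker that fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = supervise(flaky)
print(result, attempts["n"])
```

Escalating instead of retrying forever is the important design choice: in OTP the supervisor itself would crash, letting its own supervisor restart a larger part of the system.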

lcfg almost 10 years ago

You should probably define "maintenance", otherwise it's very difficult to agree on such a system.

bdastous almost 10 years ago

This seems analogous to a perpetual-motion machine.

BurningFrog almost 10 years ago

Define maintenance.

Are you talking about software?

blincoln almost 10 years ago

Back when I was a systems engineer, I built a couple of systems that come as close as I'm ever likely to see to zero-maintenance.

One of them is a piece of automation that looks for Active Directory accounts that are "inactive" based on a variety of criteria (including correlation with data not stored in AD). If they're considered "inactive", then they're disabled and their description is updated to indicate when they were disabled and why.

I originally wrote it as a 6-12 month stopgap until it was replaced by a fancy commercial account lifecycle product. I believe it's now 7 years later and it's still in use by the team I used to be on when I wrote it.

This may sound like a relatively simple task, but it actually wasn't, which is why (IMO) almost no one ever succeeds in building low/zero-maintenance systems. Some of the challenges I ran into:

- Some users only use their accounts in ways that don't update the last logon timestamp in Active Directory. I can't remember all of the specifics offhand, but one was that at the time there were still BlackBerry users, and if they only ever used their AD account for email, and only read email on their BlackBerry, the timestamp wouldn't be updated, so the automation had to query the BES database to look at their last usage there too and use the most recent of that vs. AD. I think there were 3-4 things like this, and the automation used the most recent of them all.
- Employees go on maternity and military leave, and disabling/deleting their accounts would make them really unhappy when they got back. So the automation also has to check the HR system to see if their record is flagged as being on some sort of extended leave.
- The government requirement that spurred the development of the automation only applies to accounts for people. There are plenty of service accounts that log on infrequently enough that they would be disabled, so the automation also has to differentiate between those accounts and accounts that represent people.

In addition to the basic functionality, I also felt that it had to have significant safety features in place, because if something goes wrong and all accounts get disabled, then no one can log on to fix the problem. Among the other safety features:

- With each iteration, the automation also calculates how many accounts will be disabled during the next iteration. If that number exceeds one threshold, warning emails are sent to the account administration team. If it exceeds a larger threshold, the automation refuses to operate altogether.
- If no information was obtained from one of the data sources that it uses (e.g. a database was moved to a different server and the connection string wasn't updated in the account automation config), it refuses to operate and generates warning emails.
- I don't remember the details, but there are special conditions for things like "too much time elapsed between iterations" and "the last iteration took place after the current iteration" to catch edge cases where there are problems with time synchronization.

I was able to build the automation in a way that made maintenance as close to zero as possible. In the ~7 years it's been running, AFAIK the only things that have needed to change were the thresholds for the number of accounts disabled in a single iteration, because the company expanded its use of AD and suddenly it became normal to operate against a thousand or more accounts at once instead of e.g. 200. A database connection string might have been updated when a DB got moved to another server as well.

Anyway, trying to predict all of the things that can go wrong even for such a simple system turned out to be a lot of work. My experience is that it gets dramatically worse as the system itself becomes more complicated (e.g. complicated enough to be a commercial product as opposed to an engineering maintenance task). I don't think it's really practical to do it with significantly more complex systems - modern computing involves too many changing variables.

For an analogy, consider trying to build a zero-maintenance system that waters and fertilizes a garden in a way that keeps the plants healthy. It wouldn't be that hard to build one that would handle the current garden. Now imagine trying to build one that would handle literally any plant that someone could put in the garden. It's not impossible, but you'd need all kinds of wacky sensors and logic to figure out what each plant was and how frequently to water/fertilize it. It's much easier and less error-prone to just require that the gardener update a table that lists what types of plant are in which part of the grid, and maybe generate an alert if something changes that indicates that the table may no longer be up to date.
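
A simplified Python sketch of the decision logic described above: take the most recent activity across all sources, skip service accounts and people on extended leave, and refuse to act when a data source is missing, the clock looks wrong, or too many accounts would be disabled. Everything here is hypothetical: the names, the 90-day cutoff, the 2-day gap, and the 200/1000 thresholds. It checks the current plan rather than projecting the next iteration, and `warn` stands in for the warning emails.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


class SafetyStop(RuntimeError):
    """The run looks wrong enough that doing nothing is safer than disabling accounts."""


@dataclass
class Account:
    name: str
    is_person: bool          # service accounts are out of scope
    on_extended_leave: bool  # flag from the HR system
    last_seen: List[datetime] = field(default_factory=list)  # one timestamp per activity source


def plan_disables(
    accounts: List[Account],
    sources_ok: Dict[str, bool],
    now: datetime,
    last_run: Optional[datetime],
    inactive_after: timedelta = timedelta(days=90),
    max_gap: timedelta = timedelta(days=2),
    warn_at: int = 200,
    refuse_at: int = 1000,
    warn: Callable[[str], None] = print,
) -> List[str]:
    missing = [name for name, ok in sources_ok.items() if not ok]
    if missing:
        raise SafetyStop(f"no data from {missing}; refusing to run")
    if last_run is not None:
        if last_run > now:
            raise SafetyStop("last run is in the future; check time sync")
        if now - last_run > max_gap:
            raise SafetyStop("too much time since last run; check before acting")

    to_disable = []
    for acct in accounts:
        if not acct.is_person or acct.on_extended_leave or not acct.last_seen:
            continue  # out of scope, protected, or no evidence either way
        if now - max(acct.last_seen) > inactive_after:  # most recent across all sources
            to_disable.append(acct.name)

    if len(to_disable) > refuse_at:
        raise SafetyStop(f"{len(to_disable)} disables exceeds hard limit {refuse_at}")
    if len(to_disable) > warn_at:
        warn(f"about to disable {len(to_disable)} accounts")
    return to_disable
```

`plan_disables` only returns names; actually disabling the accounts and writing the reason into their descriptions would be a separate step that runs only if no `SafetyStop` was raised. Note that the thresholds are parameters, which matches the one thing that reportedly needed changing over the years.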

umanwizard almost 10 years ago

The iPhone is pretty close, unless you physically damage it.

Not sure what is meant by "system" here.

marknadal almost 10 years ago

Yes. Or at least that is the goal.

What type of "system" is it? A database.

I got fed up having to do maintenance on my database, specifically worrying about having to expand its storage capacity. I'd constantly be woken up in the middle of the night because my entire web application had crashed after running out of disk space, because the database greedily pre-allocated excess space to improve performance. I'm not a DevOps guy, so figuring out LVM and mdadm on the fly was a nightmare. And as my app got more popular, things got worse and I just could not keep up.

But then I sat back and thought about it for a minute. I was deploying my app in the cloud, not on my own physical machines. Why was I worrying about expanding storage capacity when the cloud can sell me more of it than I could possibly keep up with? Why was I maintaining finite space when there was an infinite amount available? That doesn't make sense.

The other problem was that even though everything else was working (the web server, the frontend JavaScript, etc.), if the database server was down then nothing would function properly, because data is the lifeblood of my app. This sucks: if one piece breaks, the whole thing is defunct.

So after a while, I decided to change all this. I decided to build a database that would be zero-maintenance; it hit the top of Hacker News several times, and some big-name investors (Tim Draper) got behind it. Check out this demo of it automatically recovering from the complete loss of primaries - https://medium.com/@marknadal/gun-0-2-0-pre-release-auto-recovery-of-primary-fault-5f4ffbe63301 .

So how is it zero maintenance? Well, some other people in this thread mentioned some important points:

- There are fewer moving parts: there is no database server. Instead it gets embedded into your app server and your frontend (see next point), so the only thing you have to maintain is your app. If your app is flawless or (more likely) can auto-restart just fine, then you won't need to do any maintenance.
- It is highly available and redundant because it uses a peer-to-peer architecture, like BitTorrent. So even if your server crashes, your app can continue to work in offline mode, or with WebRTC (not implemented yet) continue to interact with other users. When the server auto-restarts at some point, the offline data will sync back up and resolve any conflicts. If you have more than one server running, then things will continue to work even if one or many of them crash.
- Immutable data allows the system to recover and resolve conflicts without your manual intervention. I'll let others talk about how awesome immutable data is; I'm sure you've heard enough about how many problems it helps with. Especially when it comes to maintaining systems, stuff can't get corrupted, and even when it does, the immutable data allows it to reconstruct itself.
- State machines let the system know in advance what is valid and invalid behavior, so it can for the most part avoid going down "incorrect" paths that lead to manual maintenance, because it has already been told which states to avoid, or how to exit those states if it gets into them.

I'm super glad you asked your question, because I feel like a lot of software developers out there are super negative and bitter about it, because most systems they have worked with have basically ruined their lives (like what happened to me, having to fix stuff in the middle of the night). But just because a lot of things have been like this doesn't mean we can't borrow ways to make zero-maintenance systems from engineers or mathematicians. So I really hope you find what you are looking for or build one yourself; if you do, please let me know at mark@gunDB.io because I'm interested in that sort of stuff. Maybe we could start some group/forum around zero-maintenance systems!
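
The point about immutable data letting replicas resolve conflicts without manual intervention can be illustrated with a generic append-only log and a last-writer-wins merge, sketched in Python below. This is not GUN's actual conflict-resolution algorithm, just a simple, well-known scheme; `Op`, `merge` and `materialize` are hypothetical names, and the timestamp source is an assumption.

```python
from typing import Dict, List, NamedTuple, Tuple


class Op(NamedTuple):
    """One immutable write. Ops are only ever appended, never edited in place."""
    key: str
    value: str
    timestamp: int  # assumption: a logical or wall-clock timestamp
    node_id: str    # tie-breaker so every replica picks the same winner


def merge(a: List[Op], b: List[Op]) -> List[Op]:
    """Union of two replicas' logs; duplicates collapse because ops are immutable values."""
    return sorted(set(a) | set(b))


def materialize(log: List[Op]) -> Dict[str, str]:
    """Rebuild current state: per key, the op with the highest (timestamp, node_id) wins."""
    winners: Dict[str, Tuple[int, str, str]] = {}
    for op in log:
        candidate = (op.timestamp, op.node_id, op.value)
        if op.key not in winners or candidate > winners[op.key]:
            winners[op.key] = candidate
    return {key: value for key, (_, _, value) in winners.items()}


base = [Op("title", "draft", 1, "server")]
phone = base + [Op("title", "final", 3, "phone")]     # written while offline
laptop = base + [Op("title", "edited", 2, "laptop")]  # concurrent edit elsewhere
print(materialize(merge(phone, laptop)))
```

Because `merge` is a set union and `materialize` doesn't depend on order, replicas converge no matter when or in which order they sync. The trade-off is that last-writer-wins silently discards the losing concurrent write ("edited" above), which is fine for some data and unacceptable for others.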

michaelochurch almost 10 years ago

Yes, I have. It failed. No one used it, so it didn't require maintenance.

I assume that that's not what you're looking for, though.

There's a fine line between "maintenance" and "improvement", and without the latter, you have stagnation. There certainly are systems that require very low levels of maintenance. I have a friend who built a program in Erlang that is still running, 10 years later. (I don't mean that the code is still in production. I mean that the program itself is still running.) Of course, Erlang allows the definition of "a program" to span multiple machines, and we're debating terminology here...

Pay-as-you-go maintenance is best. Don't allow technical debt if you can help it, push back against The Business on deadlines, and certainly don't let that micromanagement under the name of "Scrum" get in, or else you're just fucked when it comes to quality, because you'll get a fuck-quality-I-need-to-complete-story-points culture. Create a culture of doing things right the first time.

Not that you'll necessarily use them, but learn a few things about strong statically typed languages like Haskell or OCaml (Java doesn't count; that's shitty static typing). One of the great things about Haskell is that it allows safe refactoring. You're not holding your breath every time you change the code, because the compiler will usually tell you where your change broke things, and you can just go in and fix them. It is possible to write highly reliable software in dynamically typed languages (such as Erlang, mentioned above), and I don't mean to denigrate those tools at all, but it's a bit harder to do so, especially when you're fairly new to programming.

Finally, once your system reaches a certain size, you will need tests no matter how good your type system is. They start to become an obvious win at around a thousand lines of code. Consider generative testing (e.g. QuickCheck) rather than hand-written tests if you can.
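
QuickCheck itself is a Haskell library (Python's closest equivalent is Hypothesis); to show the idea without dependencies, here is a stripped-down property-based test runner in plain Python. Instead of hand-picking examples, you state properties that must hold for every input and let random generation hunt for a counterexample. Unlike real QuickCheck, this sketch does not shrink failing inputs, and all names are hypothetical.

```python
import random


def check_property(prop, gen, trials=500, seed=0):
    """Tiny QuickCheck-style runner: try random inputs, return the first failing one (or None)."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def random_int_list(rng):
    """Generator: lists of 0-20 small integers, so duplicates are common."""
    return [rng.randint(-10, 10) for _ in range(rng.randint(0, 20))]


def dedupe_keep_order(xs):
    """Function under test: drop repeated elements, keeping each first occurrence."""
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def prop_dedupe(xs):
    out = dedupe_keep_order(xs)
    return (
        len(out) == len(set(out))                 # no duplicates left
        and set(out) == set(xs)                   # nothing lost or invented
        and out == sorted(set(xs), key=xs.index)  # first-occurrence order kept
    )


print(check_property(prop_dedupe, random_int_list))  # None: no counterexample found
```

A fixed seed keeps failures reproducible, which matters if these tests run unattended for years.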