No server NEEDS to go down for maintenance. You can avoid it for anything, at any scale: DB changes, server updates, etc.<p>The problem is that a zero-downtime system, at a certain scale, is very costly to create and maintain. You need redundancy everywhere, load balancing everywhere, data replication, synchronization. Those are hard problems.<p>Basically you need to reach the level where you could release the Netflix Chaos Monkey in prod and be sure everything keeps working even if part of your system is busy with the update, or just out of sync. This is certainly doable. It's also very expensive and requires a lot of time and many experts working on the problem.<p>Putting a site into maintenance mode can be the middle ground you choose, because you don't want to invest that much just to avoid taking your site down for a little while once in a while.<p>Economics.<p>Of course, if you do choose the zero-downtime road, your site will gain more than just availability; it will gain reliability as well, since those best practices serve both purposes.
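To make the Chaos Monkey point concrete, here is a minimal sketch of that style of testing - not Netflix's actual tool - which randomly terminates one opted-in instance and expects the rest of the fleet to absorb the loss. The boto3 calls are real; the chaos=eligible tag is a hypothetical convention you would define yourself.

    # Chaos-monkey-style sketch: kill one opted-in instance at random and
    # rely on redundancy (load balancing, replication) to keep the site up.
    # Assumes boto3 credentials are configured; the "chaos=eligible" tag is
    # a hypothetical opt-in marker.
    import random
    import boto3

    ec2 = boto3.client("ec2")

    def pick_victim():
        resp = ec2.describe_instances(
            Filters=[
                {"Name": "tag:chaos", "Values": ["eligible"]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )
        instances = [
            inst["InstanceId"]
            for reservation in resp["Reservations"]
            for inst in reservation["Instances"]
        ]
        return random.choice(instances) if instances else None

    victim = pick_victim()
    if victim:
        print(f"terminating {victim}; the site should stay up regardless")
        ec2.terminate_instances(InstanceIds=[victim])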
Reasons I have been "down for maintenance" in the past:<p>- Moving from AWS to our own datacenter.
- Payment processor issues. We weren't making money with the payment processor down... “down for maintenance” meant lower customer service costs.
- Because the CEO told me to. I shit you not. Be wary of working for someone whose name sounds like it belongs on a Bond villain.
- Because sometimes you NEED all the resources to get something done quickly.
- In the days before AWS and "cloud computing" you only had hardware on hand. It is hard to get your boss to budget for a traffic spike of one hour that is greater than the sum of the previous 6 months of traffic.
- Because non-technical people have access to technology: "It was just some JavaScript" -or- "I didn't think I needed to tell you before I emailed 5 million people with an offer for free stuff" -or- "why is everything on sale for 25% off" ....
- Because load and time and complex systems sometimes do funny things together; "maintenance" means we're finally getting enough data to reproduce it.
- The very beginning of a DDoS attack (only for some industries & sites).
Always avoidable if that's a priority - schema changes can be done online in MySQL, and patches can be rolled out to subsets of servers. Erlang even supports hot code reloading, so even with a single point of failure you can upgrade without losing file descriptors or in-memory state. Taking downtime is a lot simpler if you have the choice, though, since you don't have to keep multiple versions online at the same time. "Divisions of Ericsson that do [hot code reloading] spend as much time testing them as they do testing their applications themselves." [1]<p>[1]: <a href="http://learnyousomeerlang.com/relups" rel="nofollow">http://learnyousomeerlang.com/relups</a>
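For anyone who hasn't seen the online schema changes mentioned above: since MySQL 5.6, InnoDB online DDL can add a column while the table stays readable and writable. A rough sketch using mysql-connector-python; the connection details and the table/column names are placeholders.

    # Online schema change sketch (MySQL 5.6+ InnoDB online DDL).
    # ALGORITHM=INPLACE, LOCK=NONE asks the server to keep the table usable
    # during the change; if it can't, it errors out instead of silently
    # falling back to a blocking table copy.
    import mysql.connector

    conn = mysql.connector.connect(
        host="db.example.com", user="deploy", password="...", database="app"
    )
    cur = conn.cursor()
    cur.execute(
        "ALTER TABLE orders "
        "ADD COLUMN promo_code VARCHAR(32) NULL, "
        "ALGORITHM=INPLACE, LOCK=NONE"
    )
    cur.close()
    conn.close()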
I don't recall sites like Google or Facebook ever being down for maintenance. Are there any articles that discuss how they manage application layer and database layer migrations?
Because of things that were not thought of.<p>You don't see 'Maintenance' on the systems of companies which have been doing this for a long time. You might see it at 'normal' companies: smaller ones that used the 'wrong' database and had to migrate away from it.<p>If you start with a single database and 'forget', or just don't think, to set up a master/slave/slave combination, you have to fix that at some point.<p>When you've made a mistake, you have to fix it once.<p>Also, today you are able to run quite a big site with a very small number of people. The chance that one of them didn't think about all the necessary elements of an always-online system is not far-fetched.
I've always wondered whether Apple takes its website "down for maintenance" before a product launch out of necessity or simply to build excitement.
Common causes are things like software upgrades and database changes. There's probably always a way to avoid it, but going down for maintenance might be less effort and cheaper overall, depending on the site - for example, if you can do it during a known time of low traffic, or when you know users will just come back later. I've noticed several UK bank websites go down for maintenance during the night.
The short answer is cost versus benefit.<p>For some types of websites, zero-downtime upgrades and maintenance are costly.<p>Online banking is a good example. I have accounts with several banks, and all of them periodically "go down for maintenance". I assume that's because the talent and infrastructure needed to do those tasks with zero downtime are more expensive than whatever customer service hit they take for planned outages.
Because it is much easier than performing complicated modifications while the site is running.<p>For example, at Google "down for maintenance" is not on the table. That can in some cases lead to lots of extra work or time, e.g. dual writes for a period of time followed by mapreduces to fix the remaining part.<p>My internet bank is often down for maintenance on Sunday nights. I assume it is because they have a very old system.
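The dual-write approach mentioned above looks roughly like this - a sketch, not Google's actual code. While the migration is in flight, every write goes to both stores, and a one-off backfill job (the "mapreduce to fix the remaining part") copies over whatever predates the switch. old_store and new_store are hypothetical clients exposing get/put/scan.

    # Dual-write migration sketch: writes hit both datastores during the
    # cutover window; a backfill job copies pre-existing rows into the new
    # store. The old store remains the source of truth until backfill is done.

    def write_record(key, value, old_store, new_store):
        old_store.put(key, value)          # source of truth
        try:
            new_store.put(key, value)      # best-effort secondary write
        except Exception:
            # Must not fail the user-facing request; queue the key for repair.
            print(f"dual-write failed for {key}; queued for repair")

    def backfill(old_store, new_store):
        # One-off job: copy every record that is missing or stale in the new store.
        for key, value in old_store.scan():
            if new_store.get(key) != value:
                new_store.put(key, value)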
PHP board software:<p>- occasionally benefits from clean-up tasks which can be long-running and make for an irritating experience while they run. While slow read-only operation may be possible in theory, it is better to tell users to come back later than to erode their confidence.<p>- sometimes a board's database can get corrupted. The repair operations (a sort of disk fsck for the board) need the database exclusively.<p>- software upgrades
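The repair case is roughly this sequence, sketched in Python rather than PHP: flip a maintenance flag so the front end serves a "back soon" page, run REPAIR TABLE with the database to yourself, then flip the flag back. REPAIR TABLE is a real MySQL statement (for MyISAM-style tables); the flag file is a hypothetical convention the board software is assumed to check.

    # "Down for maintenance" repair sketch for a MySQL-backed board.
    import pathlib
    import mysql.connector

    FLAG = pathlib.Path("/var/www/board/MAINTENANCE")

    FLAG.touch()  # front end starts serving the "back soon" page
    try:
        conn = mysql.connector.connect(
            host="localhost", user="board", password="...", database="board"
        )
        cur = conn.cursor()
        cur.execute("REPAIR TABLE posts, topics")
        for row in cur.fetchall():   # (table, op, msg_type, msg_text)
            print(row)
        cur.close()
        conn.close()
    finally:
        FLAG.unlink()  # back online even if the repair failed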
Not every aircraft has all the expertise, tools, and spares on board at all times to be able to service or replace its engines in flight.<p>If the system has not been designed from the ground up for that type of service, then the on-board experts would also have to be gifted at developing workarounds on the spot that reliably work the first time.
I really don't think there is any excuse for it in this day and age, especially when building sites from scratch. There are so many different techniques and technologies for doing zero-downtime deploys, not to mention the numerous PaaS offerings that will do it out of the box if you don't know how.
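The most basic of those techniques is a rolling deploy: drain one server from the load balancer, update it, health-check it, put it back, repeat, so the rest of the pool keeps serving traffic the whole time. A sketch assuming three app servers and a /healthz endpoint; lb_drain, lb_restore and deploy_to are placeholder hooks for whatever your load balancer and deploy tooling actually provide.

    # Rolling deploy sketch: only one server is out of rotation at a time.
    import time
    import requests

    SERVERS = ["app1.internal", "app2.internal", "app3.internal"]

    def lb_drain(host):
        print(f"draining {host}")       # placeholder: tell the LB to stop routing here

    def lb_restore(host):
        print(f"restoring {host}")      # placeholder: put the host back in rotation

    def deploy_to(host, version):
        print(f"deploying {version} to {host}")  # placeholder: ship release, restart service

    def healthy(host):
        try:
            return requests.get(f"http://{host}/healthz", timeout=2).ok
        except requests.RequestException:
            return False

    def rolling_deploy(version, retries=30):
        for host in SERVERS:
            lb_drain(host)
            deploy_to(host, version)
            for _ in range(retries):    # wait for the new version to answer
                if healthy(host):
                    break
                time.sleep(1)
            else:
                raise RuntimeError(f"{host} never came back healthy; stopping the rollout")
            lb_restore(host)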
Mistakes were made during the deploy of the new website to production. A failed website deploy is a bit more noticeable to the public than the failed deployment of an internal-only system.