<i>All of these problems could have been found by formal analysis.</i><p>"If only we'd had the human, time, money and organizational support resources to plan ahead more accurately, we wouldn't have made this particular mistake!" That's called the benefit of hindsight, and it's the project manager's classic "told you so". To management it sounds like "give me more budget and a slacker timeline", and to engineering it sounds like "someone wants to use a different one-true-solves-all-problems-solution".<p>Experienced system designers know that the real art is knowing that out in the real world, things will fail no matter how careful you are, so anticipating and detecting both known and unknown failure modes and recovering from them is really the critical need.<p>For an accessible, real world study of how this can be achieved with arbitrarily complex software systems, I can highly recommend reading about Erlang, or alternatively deploying a nontrivial pacemaker/corosync cluster. Most engineers never build a system this resilient in their lifetime, but once you have, you can never look back.