The Ariane 5 rocket was carrying four ESA spacecraft known as "Cluster" (because they were to work together, in a tetrahedral formation). The bug and subsequent failure give another meaning to the word "clusterf%#k".<p><a href="https://en.wikipedia.org/wiki/Cluster_%28spacecraft%29" rel="nofollow">https://en.wikipedia.org/wiki/Cluster_%28spacecraft%29</a><p>Edit: The above Wikipedia article has the Ada source code that caused the problem.
I've heard from people in the space sector that it was the <i>exception</i>, not the overflow per se, that caused the problem.
Had the overflow not been trapped, the flight could have made it to orbit (if there weren't other problems). Wikipedia says it was a hardware exception, but <a href="http://www.ima.umn.edu/~arnold/disasters/ariane5rep.html" rel="nofollow">http://www.ima.umn.edu/~arnold/disasters/ariane5rep.html</a> says it was a <i>software</i> one, and it occurred only in code that was needed pre-flight, so the overflow itself seems unlikely to have caused problems without that crippling exception.<p>These systems have become so big and expensive. This has been the case since the ICBMs, and it only got worse with Apollo.<p>Yet they are so vulnerable, since there is no way to abort intact once you have flown for something like 0.1 seconds. (At least the Saturn V had some redundancy.) You do not get a second try.<p>Both issues create a perfect recipe for stagnation: everything has to be checked and rechecked for years before and after a software or hardware change. If someone tries something new and there is a launch or spacecraft failure, it becomes a political issue and heads will roll. People's technical and political careers are destroyed.<p>In short, this way is not likely to lead to real spacefaring.<p>A more organic approach, with lots of smaller actors working in parallel and trying and failing a lot more, but with better processes built in to handle said failures (technical, political and cultural), could be much more conducive to real progress: greater operational flexibility, shorter schedules, better reliability and lower prices.<p>Reasonably sized reusable rockets with good intact-abort capability in a testing and development program could raise the launch rate hugely, and all kinds of different solutions could be tested quickly. I find it likely that this will eventually happen, but it is frustrating how long it is taking.<p>In this "horizontal velocity overflow" case, you could do an intact abort if you had a fallback to some alternate control law or even manual control.
Those are not incorporated into current expendable space launchers, but they exist in aircraft. (The Saturn V and the Lunar Module <i>did</i> have manual backup. You could fly the Saturn to orbit. The LM was hard to get to the right orbit where the CM was waiting...)
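To make the "fallback to some alternate control law" idea concrete, here is a rough Python sketch. Everything in it is hypothetical: the function names, the 16-bit range check, and the "hold the last good command" fallback are illustrations only, not how any real launcher or aircraft works.

```python
INT16_MAX = 32767  # assumed range limit for the hypothetical primary law


def primary_guidance(horizontal_velocity: float) -> float:
    """Hypothetical primary control law that fails on out-of-range input."""
    if abs(horizontal_velocity) > INT16_MAX:
        raise OverflowError("horizontal velocity outside the expected range")
    return horizontal_velocity / 1000.0  # made-up steering command


def fallback_guidance(last_good_command: float) -> float:
    """Hypothetical alternate law: simply hold the last known-good command."""
    return last_good_command


def steering_command(horizontal_velocity: float, last_good_command: float):
    """Try the primary law; degrade to the fallback instead of shutting down."""
    try:
        return primary_guidance(horizontal_velocity), "primary"
    except OverflowError:
        return fallback_guidance(last_good_command), "fallback"
```

With these made-up numbers, `steering_command(2000.0, 0.5)` uses the primary law, while `steering_command(40000.0, 0.5)` falls back and keeps flying on the last good command rather than giving up.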
The Therac-25 is also a fascinating case of software failures causing tangible loss: <a href="http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1" rel="nofollow">http://courses.cs.vt.edu/cs3604/lib/Therac_25/Therac_1</a>
You know, although I'm slightly tired of that story (I mean it stings, my family works in the field), sometimes I feel like it's a good thing. Here in France, people are afraid of risk, and the elite political clique in power even more so. Our constitution was even amended towards risk-averseness. I think blowing up the equivalent of the GNP of an African country had various positive side effects: 1) risk is there, whether you have correctly signed the process paperwork or not 2) innovation feeds on blowups 3) be humble, stop being cocky on TV before a test launch; sending back the champagne and buffet was very painful to watch and there is no need for that.
I'd really love more details about this. What did the surrounding code look like? Why didn't this code produce a compiler warning?<p>There's certainly a much larger, and probably quite informative, story here.
Reminds me of the (alleged) reason why the first Soviet Mars missions missed the planet: there was an erroneous period instead of a comma somewhere in their nav program, written in Fortran.
Another (probable) software failure due to an unexpected scenario was the Mars Polar Lander <a href="http://en.wikipedia.org/wiki/Mars_Polar_Lander#Loss_of_communications" rel="nofollow">http://en.wikipedia.org/wiki/Mars_Polar_Lander#Loss_of_commu...</a><p>The failure review concluded that the probable cause of loss was that the landing system software interpreted the deployment of the lander's legs as touchdown and shut down the descent engines. The vibrations caused by the deployment of the legs were not taken into account when designing the software.
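As a toy illustration of the failure mode described above, here is a minimal Python sketch of a touchdown check that ignores sensor hits while the legs are deploying and requires the signal to persist. The function name, the sample format, and the threshold of 3 consecutive readings are all assumptions made up for this example; this is not the actual lander logic.

```python
def touchdown_detected(samples, required_consecutive=3):
    """Return True only for a sustained touchdown signal.

    samples: iterable of (sensor_triggered, legs_deploying) booleans,
    one pair per control cycle (a hypothetical format).
    """
    streak = 0
    for sensor_triggered, legs_deploying in samples:
        if sensor_triggered and not legs_deploying:
            streak += 1
            if streak >= required_consecutive:
                return True
        else:
            # A gap, or a hit during leg deployment, resets the count.
            streak = 0
    return False
```

A burst of sensor hits during leg deployment never counts here, and neither does a single-cycle glitch afterwards; only several consecutive hits with the legs already down would cut the engines.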
There have been several reports that SpaceX "flies" its Falcon 9 computer systems to test for bugs like this. The idea is that, as far as the computer is concerned, it is a real flight and it should act accordingly. Most of the problems to date have been mechanical. During the first docking, there was a minor issue with the sensor "field of vision" but this was fixed.<p>The point is that SpaceX's procedures seem able to prevent software bugs like the Ariane 5 one from causing a catastrophic abort or failure.
Happens to the best of us... If it's any consolation, my first game app crashed after 10k points for a similar reason. Check it out, it should still be on the Android store: Alliegator
There is a good article examining the various possible causes of the Ariane-5 disaster by Bashar Nuseibeh: "Ariane-5: Who-Dunnit?". See PDF here: <a href="http://www.inf.ed.ac.uk/teaching/courses/seoc/2007_2008/resources/ariane5.pdf" rel="nofollow">http://www.inf.ed.ac.uk/teaching/courses/seoc/2007_2008/reso...</a>
"R1...More generally, no software function should run during flight unless it is needed."<p>This means that even when using the most reliable language and testing as much as possible, there is always a risk of an overlooked bug.
From the linked James Gleick article:<p>"the programmers had decided that this particular velocity figure would never be large enough to cause trouble. After all, it never had been before. Unluckily, Ariane 5 was a faster rocket than Ariane 4. One extra absurdity: the calculation containing the bug, which shut down the guidance system, which confused the on-board computer, which forced the rocket off course, actually served no purpose once the rocket was in the air. Its only function was to align the system before launch. So it should have been turned off. But engineers chose long ago, in an earlier version of the Ariane, to leave this function running for the first 40 seconds of flight -- a "special feature" meant to make it easy to restart the system in the event of a brief hold in the countdown."
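For anyone wondering what "never large enough to cause trouble" means in practice: the inquiry report linked elsewhere in the thread describes a 64-bit floating-point value being converted to a 16-bit signed integer. Here is a small Python sketch of three ways such a conversion can behave. The function names, the `OperandError` class, and the input values are hypothetical stand-ins; the original code was Ada and is not reproduced here.

```python
INT16_MIN, INT16_MAX = -32768, 32767


class OperandError(Exception):
    """Stand-in for the conversion exception discussed in the thread."""


def to_int16_checked(value: float) -> int:
    """Raise when the value does not fit (the behaviour that stopped the system)."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OperandError(f"{value} does not fit in 16 bits")
    return n


def to_int16_saturating(value: float) -> int:
    """Clamp to the nearest representable value instead of raising."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def to_int16_wrapping(value: float) -> int:
    """Silently wrap around, as an unchecked two's-complement conversion would."""
    return (int(value) + 32768) % 65536 - 32768
```

A made-up "slow rocket" value like `1000.0` passes through all three unchanged. A made-up "fast rocket" value like `40000.0` raises in the checked version, clamps to `32767` in the saturating one, and wraps to `-25536` in the wrapping one. Which of those is least bad depends entirely on what the result is used for.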
Compare this to the failures SpaceX experienced recently: Elon's launch, while not perfect, recovered. I think this shows the power of Elon's vision, and how he's going to change the launch market.