This is a slide deck that every software developer reading this blog should pay special attention to. It gives a visually appealing account of how software bugs have been the root cause of many technological and scientific disasters (the context is US space exploration and other telecom- and nuclear-related mishaps).
Check out the cases below. It's all about overflow errors, wrong conversions or syntax, division by zero, and rounding errors. Does this sound familiar?
Explosion of Ariane 5, 1996 due to “…conversion of a 64 bit integer into a 16 bit signed integer lead to an overflow…”
Loss of Mars Climate Orbiter, 1999 due to “…mix-up between pounds and kilogram….”
Mars Polar Lander, 1999 due to “…software error that mistakenly identified the vibration caused by the deployment of the lander’s legs as being caused by the vehicle touching down on the Martian surface….”
Loss of Mariner 1, 1962 due to “..period instead of comma in FORTRAN DO-Loop…”
Breakdown of AT&T’s long-distance telephone network, 1990 due to “…a single line of buggy code in a complex software upgrade implemented to speed up calling caused a ripple effect that shut down the network….”
USS Yorktown dead in the water, 1997 due to “…input and division by ‘0’. X / 0 = undefined…”
MIM-104 Patriot Missile Failure, 1991 due to “…rounding error”
Shutdown of 5 nuclear reactors, 1985 due to “..use of arithmetic sum of variables instead of the square root of the sum of the squares of the variables….”
Denver International Airport, 1994 due to “…baggage handling system broke down because of numerous bugs….”
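To make a few of these failure classes concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reconstruction of any flight or reactor software. All function names are hypothetical. The Ariane 5 inquiry report describes the source value as a 64-bit floating-point number, so the narrowing example takes a float. The Patriot example assumes 23 fractional bits for the stored value of 0.1, following the commonly cited analysis of that failure.

```python
import math
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def wrap_to_int16(value: float) -> int:
    """Unchecked narrowing: keep only the low 16 bits (two's-complement wraparound)."""
    low_bits = int(value) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]


def checked_to_int16(value: float) -> int:
    """Checked narrowing: refuse values that do not fit in 16 signed bits."""
    as_int = int(value)
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return as_int


def patriot_clock_drift(hours: float, fraction_bits: int = 23) -> float:
    """Seconds of drift when 0.1 s ticks are scaled by a truncated binary 0.1.

    fraction_bits=23 is an assumption taken from the commonly cited analysis.
    """
    scale = 1 << fraction_bits
    stored_tenth = int(0.1 * scale) / scale  # chop, do not round
    ticks = int(hours * 3600 * 10)           # one tick every tenth of a second
    return ticks * (0.1 - stored_tenth)


def arithmetic_sum(loads):
    """Plain sum of the inputs."""
    return sum(loads)


def root_sum_of_squares(loads):
    """Square root of the sum of the squares of the inputs."""
    return math.sqrt(sum(x * x for x in loads))


if __name__ == "__main__":
    print(wrap_to_int16(40000.0))                  # silently wrong value
    print(round(patriot_clock_drift(100), 4))      # drift after 100 hours
    print(arithmetic_sum([3.0, 4.0]), root_sum_of_squares([3.0, 4.0]))
```

With these assumptions, 100 hours of uptime gives roughly a third of a second of clock drift, and the two load formulas give 7.0 versus 5.0 for the same inputs. In each case the fix is a one-liner, which is exactly why these bugs are so easy to miss.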
Undoubtedly the most publicized investigation of a scientific or technological failure has to be Richard Feynman's exposé of how the failure of rubber O-ring seals led to the Challenger space shuttle disaster. Feynman, a Nobel laureate in physics, was called to be part of the commission that inquired into the cause of the Challenger breakup. Check out this (dramatic) inquiry commission video that demonstrates how the shuttle exploded due to O-rings losing their elasticity when exposed to cold temperatures. More details here. This was not a software failure, but the investigation had spillover effects in the software field as well (see this report).
This ZDNet article has a detailed account of the top 10 worst IT disasters of all time (some of the above cases figure in this list).
1. Faulty Soviet early warning system nearly causes WWIII (1983)
2. The AT&T network collapse (1990)
3. The explosion of the Ariane 5 (1996)
4. Airbus A380 suffers from incompatible software issues (2006)
5. Mars Climate Orbiter metric problem (1998)
6. EDS and the Child Support Agency (2004)
7. The two-digit year-2000 problem (1999/2000)
8. When the laptops exploded (2006)
9. Siemens and the passport system (1999)
10. LA Airport flights grounded (2007)
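The two-digit year problem (item 7) is easy to show in a few lines. The sketch below is a hypothetical illustration: the function names are made up, and the pivot of 50 used for the "windowing" fix is an assumption, since real systems chose their own pivot years.

```python
def years_elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Naive arithmetic on two-digit years, as in the original bug."""
    return end_yy - start_yy


def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Windowing: years below the (assumed) pivot are 20yy, the rest are 19yy."""
    return (2000 if yy < pivot else 1900) + yy


def years_elapsed_windowed(start_yy: int, end_yy: int, pivot: int = 50) -> int:
    """Elapsed years after expanding both values to four digits."""
    return expand_two_digit_year(end_yy, pivot) - expand_two_digit_year(start_yy, pivot)


if __name__ == "__main__":
    # Someone born in '65, evaluated in '00
    print(years_elapsed_two_digit(65, 0))   # negative age
    print(years_elapsed_windowed(65, 0))    # sensible age
```

Windowing only postpones the problem to the pivot year, which is why storing four-digit years was the more durable fix.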
What about the Indian case? Ironic as it may sound, much of the success of the Indian IT industry has its roots in the biggest software bug of all time: the Y2K problem. I hate to say this, but the Millennium bug was probably the tipping point in the global advance of our software/IT companies.
I did a quick Google search for cases in India where major disasters have been caused by software bugs. My search did not yield anything apart from the occasional cases of financial (mostly banking-related) fraud carried out by organised phishing gangs. My hunch is that in a society like ours, which has always tried to sweep failure under the carpet, such cases surely exist but are unlikely to be as open and as well documented.
Question to readers: are you aware of Indian examples of disasters caused by software bugs? Please leave them in the comments if you are, and I’ll compile a list.