DEV Community
Nick Harley for Raygun

Originally published at raygun.com

11 of the most costly software errors in history

In this post, we take a look at some of the biggest disasters over the years to see what happens when software errors cause chaos!

NASA's Mars Climate Orbiter

On its mission to Mars in 1998, the Climate Orbiter spacecraft was ultimately lost in space. Although the failure bemused engineers for some time, it was revealed that a subcontractor on the engineering team had failed to make a simple conversion from English units to metric. This embarrassing lapse sent the $125 million craft fatally close to Mars' surface as it tried to stabilize its orbit at too low an altitude. Flight controllers believe the spacecraft ploughed into Mars' atmosphere, where the associated stresses crippled its communications, leaving it hurtling on through space in an orbit around the sun.
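To make the failure mode concrete, here is a minimal Python sketch of a skipped unit conversion. It assumes, as widely reported but not stated above, that the quantity involved was thruster impulse produced in pound-force seconds and read as if it were newton-seconds. The function names and the sample value are hypothetical and are not taken from any flight software.

```python
# Illustrative only: names and values are hypothetical.
LBF_S_TO_N_S = 4.4482216152605  # 1 pound-force second expressed in newton-seconds


def impulse_from_ground_software() -> float:
    """Pretend ground tool: reports thruster impulse in pound-force seconds."""
    return 10.0


def to_newton_seconds(impulse_lbf_s: float) -> float:
    """The explicit conversion step that must not be skipped."""
    return impulse_lbf_s * LBF_S_TO_N_S


raw = impulse_from_ground_software()
assumed_metric = raw                # bug: value read as if it were already in N*s
converted = to_newton_seconds(raw)  # fix: convert explicitly before use

print(f"read as metric: {assumed_metric:.2f} N*s")
print(f"converted:      {converted:.2f} N*s")
print(f"underestimated by a factor of {converted / assumed_metric:.2f}")
```

Both numbers look plausible on their own, which is why this kind of error can go unnoticed. Carrying the unit in the function or variable name, as the sketch does, is one cheap way to make the mismatch visible.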

Ariane 5 Flight 501

Europe’s newest unmanned satellite-launching rocket reused working software from its predecessor, the Ariane 4. Unfortunately, the Ariane 5’s faster engines triggered a bug that had never surfaced in previous models. Thirty-six seconds into its maiden launch, multiple computer failures threw the rocket off course, and it broke up and self-destructed automatically. In essence, the software had tried to cram a 64-bit number into a 16-bit space. The resulting overflow crashed both the primary and backup computers (which were running exactly the same software).

The Ariane 5 had cost nearly $8 billion to develop, and was carrying a $500 million satellite payload when it exploded. Read more here.
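The 64-bit into 16-bit problem can be sketched in a few lines of Python. This is not the flight software; the function names and sample values are hypothetical. It only shows why a value that is too large cannot be stored in a signed 16-bit integer: an unprotected conversion wraps to a meaningless number, and a protected one raises an error that something has to handle.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unchecked(value: float) -> int:
    """Keep only the low 16 bits, as an unprotected conversion might."""
    low_bits = int(value) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]


def to_int16_checked(value: float) -> int:
    """Refuse values that do not fit instead of silently wrapping."""
    as_int = int(value)
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return as_int


small = 20000.0  # hypothetical reading that fits in 16 bits
large = 40000.0  # hypothetical reading that does not

print(to_int16_unchecked(small))  # 20000
print(to_int16_unchecked(large))  # -25536 (wrapped around)

try:
    to_int16_checked(large)
except OverflowError as exc:
    # Without a handler like this, the error would stop the program.
    print(f"conversion rejected: {exc}")
```

Running identical software on the backup computer offers no protection here, because the same input produces the same failure on both.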

EDS Child Support System

In 2004, EDS introduced a highly complex IT system to the U.K.’s Child Support Agency (CSA). At the exact same time, the Department for Work and Pensions (DWP) decided to restructure the entire agency. The two pieces of software were completely incompatible, and irreversible errors were introduced as a result. The system somehow managed to overpay 1.9 million people and underpay another 700,000, while leaving US$7 billion in child support payments uncollected, a backlog of 239,000 cases, and 36,000 new cases “stuck” in the system. It has cost UK taxpayers over US$1 billion to date.

Soviet Gas Pipeline Explosion

The Soviet pipeline had a level of complexity that would require advanced automated control software. The CIA was tipped off to the Soviet intentions to steal the control system's plans. Working with the Canadian firm that designed the pipeline control software, the CIA had the designers deliberately create flaws in the programming so that the Soviets would receive a compromised program. It is claimed that in June 1982, flaws in the stolen software led to a massive explosion along part of the pipeline, causing the largest non-nuclear explosion in the planet’s history.

Bitcoin Hack, Mt. Gox

Launched in 2010, the Japanese bitcoin exchange Mt. Gox was once the largest in the world. It was first hacked in June 2011, and in February 2014 it stated that it had lost around 850,000 bitcoins (worth around half a billion US dollars at the time).

Although around 200,000 of the bitcoins were recovered, CEO Mark Karpeles admitted: "We had weaknesses in our system, and our bitcoins vanished."

Heathrow Terminal 5 Opening

Before the opening of Heathrow's Terminal 5 in the UK, engineers thoroughly tested the brand new baggage handling system, built to carry the vast amounts of luggage checked in each day, using over 12,000 test pieces of luggage. It worked flawlessly on every test run, but on the Terminal's opening day the system simply could not cope. It is thought that "real life" scenarios, such as manually removing a bag from the system when a passenger had left an important item in their luggage, caused the entire system to become confused and shut down.

Over the following 10 days some 42,000 bags failed to travel with their owners, and over 500 flights were cancelled.

The Mariner 1 Spacecraft

On a mission to fly by Venus in 1962, this spacecraft had barely made it out of Cape Canaveral when a software coding error caused the rocket to veer dangerously off course, threatening to crash back to Earth. Alarmed, NASA engineers on the ground issued a self-destruct command. A review board later determined that the omission of a hyphen in coded computer instructions allowed incorrect guidance signals to be transmitted to the spacecraft. The rocket reportedly cost more than $18 million at the time.

The Morris Worm

A program developed by a Cornell University student for what he said was supposed to be a harmless experiment wound up spreading wildly and crashing thousands of computers in 1988 because of a coding error. It was the first widespread worm attack on the fledgling Internet. The graduate student, Robert Tappan Morris, was convicted of a criminal hacking offense and fined $10,000. Morris's lawyer claimed at the trial that his client's program helped improve computer security.

The cost of cleaning up the mess may have gone as high as $100 million. Morris, who interestingly co-founded the startup incubator Y Combinator, is now a professor at the Massachusetts Institute of Technology. A disk with the worm's source code is now housed at the University of Boston.

Patriot Missile Error

Sometimes, the cost of a software glitch can't be measured in dollars. In February 1991, a U.S. Patriot missile defence system in Saudi Arabia failed to detect an attack on an Army barracks. A government report found that a software problem led to an inaccurate tracking calculation that became worse the longer the system operated. On the day of the incident, the system had been operating for more than 100 hours, and the inaccuracy was serious enough to cause the system to look in the wrong place for the incoming missile. The attack killed 28 American soldiers. Prior to the incident, Army officials had fixed the software to improve the Patriot system's accuracy. That modified software reached the base the day after the attack.
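A rough Python sketch shows how an error of this kind can grow with uptime. The specific figures used here come from commonly cited analyses of the incident, not from the text above, so treat them as assumptions: a clock that counts tenths of a second, the value 0.1 chopped to 23 binary digits in a fixed-point register, and a Scud speed of roughly 1,676 metres per second.

```python
from fractions import Fraction

FRACTION_BITS = 23      # assumed fixed-point precision after the binary point
TICK = Fraction(1, 10)  # assumed: the system clock counts tenths of a second

# 0.1 has no exact binary representation, so chop it (truncate, not round).
stored_tick = Fraction(int(TICK * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = TICK - stored_tick


def clock_drift_seconds(hours_running: int) -> float:
    """Accumulated timing error after the given number of hours of uptime."""
    ticks = hours_running * 3600 * 10
    return float(ticks * error_per_tick)


SCUD_SPEED_M_PER_S = 1676  # assumed speed of the incoming missile

for hours in (1, 8, 100):
    drift = clock_drift_seconds(hours)
    print(f"{hours:>3} h: clock off by {drift:.4f} s, "
          f"about {drift * SCUD_SPEED_M_PER_S:.0f} m of missile travel")
```

Under these assumptions the error per tick is tiny, but it is added on every tick and never corrected, so after 100 hours it reaches about a third of a second, which is a long distance for a fast-moving target.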

Pentium FDIV bug

When a math professor discovered and publicized a flaw in Intel's popular Pentium processor in 1994, the company's response was to replace chips on request, but only for users who could prove they were affected. Intel calculated that the error caused by the flaw would happen so rarely that the vast majority of users wouldn't notice. Angry customers demanded a replacement for anyone who asked, and Intel agreed. The episode cost Intel $475 million.

Knight's $440 Million Error

One of the biggest American market makers for stocks struggled to stay afloat after a software bug triggered a $440 million loss in just 30 minutes. The firm's shares lost 75 percent in two days after the faulty software flooded the market with unintended trades. Knight’s trading algorithms reportedly started pushing erratic trades through on nearly 150 different stocks, sending them into spasms.

Honourable mention: NOAA-19 Satellite

Although not a software error, on September 6, 2003, this satellite was badly damaged while being worked on at the Lockheed Martin Space Systems factory. The satellite fell to the floor as a team was turning it to a horizontal position. An inquiry into the mishap determined that it was caused by a lack of procedural discipline throughout the facility. Turns out that while the turn-over cart used during the procedure was in storage, a technician removed twenty-four bolts securing an adapter plate to it without documenting the action. The team subsequently using the cart to turn the satellite failed to check the bolts, as specified in the procedure, before attempting to move the satellite.

Repairs to the satellite cost $135 million.

Don't want to get caught out by your software bugs? Get automatically notified of your software errors with instant notifications. Book a demo with an experienced team member or sign up for a free trial.

Top comments (5)

Phil Ashby

Thanks Nick, good to be reminded that we can all make bigger mistakes! Possibly my favourite expensive 'bug' to date, creating null references :)

infoq.com/presentations/Null-Refer...

Your link for the Ariane 5 flight 501 failure seems broken, this MIT write up is pretty official:

sunnyday.mit.edu/accidents/Ariane5...

which notes that engineers didn't need to 'hit the self destruct', as flight 501 broke up and self-destructed autonomously, after a series of failures induced by gaps in testing and communication between CNES and industry suppliers.. big project teams eh?

Nick Harley

Thanks Phil. I updated the broken link and also added your MIT link, thanks for providing that :)

Harry Dennen

"There are two SRIs operating in parallel, with identical hardware and software." Identical hardware and software...

Lennart

Another one would be Therac-25 radiotherapy machine, the paper is also a fascinating read: sunnyday.mit.edu/papers/therac.pdf

kurisutofu

Good read!

Now, we can understand a little better why sometimes the higher-ups are reluctant to update a system.