Review of Glitch: The Hidden Impact of Faulty Software

February 12, 2011 - book-review programming

(This review originally appeared on Amazon.com)

This is designed as a C-level executive book and should give those executives an idea about the pitfalls of software, why software is apparently difficult to get right, and how it can go disastrously wrong. The reader who is unfamiliar with software and the business of software is likely to believe much of this book despite its obvious lack of connection to reality, its anecdotal lessons, and its vague and useless tips. This book isn’t really about software anyway. It’s about technology management, a wholly different topic. It’s about the whole of information technology, with all the buzzwords and hot trends that will make this book a remainder within six months. Indeed, with a different title and a slightly different focus, this book wouldn’t be so egregiously bad.

The author relies on a few outstanding examples to make his point throughout the entire book. Or, I should say, the author relies on a team of researchers who only came up with a couple of outstanding examples. Find any discipline and you can find stories where the process goes horribly wrong, the tools fail, and people make the wrong decisions. Finding a couple of software stories doesn’t enlighten anyone.

The author’s favorite example, called back into service in many chapters, is some software used to control medical radiation-delivery devices. He blames the software for leaving the collimators wide open, delivering lethal doses of radiation. Why does this happen? It’s not just a failure of software. If you are dealing with deadly radiation, part of the process should be a test, using the exact setup you’ll use with a patient. Before you put someone in front of the beam, measure the output directly. Don’t rely on anything else to tell you how much radiation is coming out of the machine. Software might be the fault, but it could be something mechanical. Why weren’t these devices calibrated before the technician used them to kill people? Who was supervising their use? Why does this situation go so wrong? People trust computers instead of using their common sense, even if they know that what they see on the computer screen is wrong. However, the author doesn’t really hit that point. Why do we hear stories about GPS users driving their cars into lakes? Because they turned off their brains and let the computers think for them despite everything else their senses told them. Why does my friend pay attention to the directions that her GPS gives her even though she’s been driving the same route for 20 years? That’s not a glitch in software. That’s a defect in humans.

He is also quick to jump on the latest round of Toyota recalls, suggesting, before the facts are in, that the unintended accelerations were the fault of software. We now know that to be false. Those issues were largely mechanical or operator related. However, when software is around, it’s an easy target that can’t defend itself. We scapegoat it. Other examples he uses include Bernie Madoff maliciously and purposely entering bad data (how is that software’s fault?) and security breaches through loss of laptops and smartphones.

He cites an example of police repeatedly raiding the wrong house because the computer gave them the wrong address. How is improper police procedure software’s fault? Shouldn’t the police surveil the address first to verify their target is there? Why blame computers when the stormtroopers go in with guns blazing without any direct evidence that there is a threat or a suspect inside? How is software to blame there, but not in Waco or Ruby Ridge? How does the paramilitary trend in law enforcement have anything to do with software? Again, by shifting blame away from people, he absolves them of their responsibility in the outcomes and ultimately undermines the points that he is trying to make. Software is not the culprit in these cases; people are. By using sensationalistic current events, even before the facts are in, he shows that he’s not up to a serious and deep discussion of what’s really happening in software.

This book does not deal with how software comes into being, and seems wholly ignorant of it. The author makes the common mistake of equating computer science with software. That’s about the same as equating a physicist with a civil engineer. Although the physicist may tell you he can design a bridge because he knows the laws of the universe, do you really want a physicist in charge of something you’ll drive over? Computer Science, which he wants to rename to “The Art of Computer Science” (with no nod to Donald Knuth’s “The Art of Computer Programming”), is not a software creation discipline, yet people keep hiring computer science graduates thinking they know how to write software. That’s the problem.

Computer Science is not vocational training for software creators any more than a Chemistry program is vocational training for pharmaceutical research. However, because many people don’t understand this, because there is a shortage of qualified software workers and no good training for software managers, and because everyone can shift blame to the software and the tools, the black arts of software creation and management cause a lot of problems. It’s not a problem with the software; that’s just the symptom of an underskilled, poorly managed, and misunderstood craft, and it’s nothing new. To be sure, bad software comes out of this mess, but do you blame the culinary world when your kids burn the eggs they cook you or you get a bad meal in a restaurant? Who does Gordon Ramsay blame in Kitchen Nightmares? If you want to know more about why your software workers don’t deliver, try The Psychology of Computer Programming, The Mythical Man-Month (or anything by Frederick Brooks), or many other books that have already dealt with the topic much better.

At one point, the author says that senior-level executives should trust their IT managers. Again with the trust. Instead of trusting, senior-level executives should be holding their managers’ feet to the fire to find out how much they really understand what they are doing, how they are evaluating alternatives, and what long-term effects their decisions will have. They should also be holding their vendors’ feet to the same fire so they aren’t sold a bill of goods that not only doesn’t meet their needs but doesn’t even do what it says on the tin. However, since very few people actually understand any of this (compare with the “Jabberwocky” episode of Better Off Ted), they let it slide. Again, they can shift the blame and shirk their responsibility as managers. Who gets fired when something goes wrong? It’s often not the top. Compare that to the Navy, for instance. Who gets fired if something goes wrong on a ship? It’s usually at least the captain of the ship, even if he did everything he should have done; he is ultimately responsible. Add career consequences to IT failure, and you’ll change the game. Make people liable for what they do and they’ll be more careful.

Having said that, I expected to see examples of companies that employ good practices with high standards and still run into problems that can’t be blamed on anything but the software. Are there times when you can do everything right and still lose? Where are the stories from the aerospace industry, like the Air France Flight 296 computer that decided to land when the pilot was doing a low-speed pass during an airshow, or the Mars missions that miss Mars because one group used imperial units and another the metric system? There’s not much breadth in this book. It’s the USA Today of the subject.

I would also have liked to read a lot more personal stories from leaders of technology companies, looking back at their big mistakes. You can’t effectively do that with anything current without inciting investor revolt, but how about the glitches from 20 years ago? Raymond Chen’s ideas on the decisions that went into Windows, the operating system that causes quite a few IT headaches and losses, are quite enlightening. When do the business decisions trump the software decisions?

How do you balance the desire for having it now and keeping it working with the rest of your IT infrastructure against the competing force of doing it well no matter how long it takes? How about Steve Jobs talking about any of his many failures, which he readily admits to (Apple III, Newton, etc.)? Why do you have to buy expensive support contracts for Oracle, SAP, or PeopleSoft? What is the real reason those products take so much care and feeding and cause so much frustration?

What about Open Source software that still hasn’t delivered a desktop operating system that normal people want to use, can’t outshine Adobe’s Photoshop, and still has egregious bugs despite its “many eyes”? Why do we lie to ourselves? Is it just confirmation bias, or is it something else?