When Computers Go Wrong

Explosion. Waziristan
Photo by Northampton Museum

PC Pro has an interesting (if somewhat frightening) article on the “10 most calamitous computer cock-ups.” It got me thinking about all of the times I’ve screwed things up, either with code I’ve written or with some administration task I’ve performed. I dare not count them for fear that the number may be bigger than I think (and I can think of a pretty big number!).

What makes these 10 stand out is not so much the mistakes themselves as the scale of their effect. They are particularly calamitous not because the scope of the error was so large but because of the reach of the technology in question. In our own environments, our mistakes are no less calamitous within their own scopes. Accidentally deleting a bunch of user accounts or erasing backup files won’t affect quite the same number of people as blowing up a gas pipeline, but try telling that to your users when they can’t do their work.

Perhaps progress is measured not so much by success as by the size of the potential problems. If you are at the point where a simple mistake can cost thousands of hours of productivity, then you’ve progressed through a series of smaller tragedies to get there. I don’t think I’d want anyone in a position to cause a lot of damage unless they’ve learned how to deal with causing a lot of damage, and learned to be sufficiently scared of pressing the “Submit” button. Not so scared that they don’t dare to push it, but scared enough to make it clear that they understand what pushing it may mean. To painfully twist a famous phrase: If you aren’t terrified of making a change to production, then you don’t fully understand what production is.