Knowledge Flows From Mistakes

Edit: I still think that there is much to be learned when we make mistakes.

Originally posted March 17, 2009 on AIXchange

The above image is the work-safe version of a popular problem-determination flowchart. (The more widely circulated NSFW version can be found by searching on “flowchart no problem” or “problem solving flowchart.”)

Why bring this up in the first place? The point of the flowchart seems to center on blaming mistakes on others, or, if people don’t know about the mistakes, being sure we don’t tell anyone about them. It’s fun in theory, but hopefully in real life we aren’t looking to hide things or blame others.

Hopefully the environments in which we work have test labs and other places where people can figure things out. Even with test labs, though, people will make mistakes. We all do, we all have and we all will. It’s a fact of life. With experience we should make fewer mistakes, but I bet that many of the good habits you have formed over the years are the direct result of your own or someone else’s mistake, coupled with the desire not to repeat it.

Of course, we must own up to our mistakes. Recently someone accidentally pulled the wrong power cords loose in a computer room. Those cords fed a critical SAN switch that was being used by a ton of machines. However, the guy who pulled the cords didn’t report what happened, leaving it to the SAN administrators to figure out what had gone wrong.

I remember watching a guy pull up one of the tiles on a raised floor at the same instant that the power went out and the UPS kicked on. The look on everyone’s face was priceless. He hadn’t touched anything or done anything, but for whatever reason, that exact moment was when the power went out. Would he have told anyone about it if he had been the cause of the outage, or would he have covered up the tile and gone to hide in his office?

In our work environments, mistakes need to be, if not tolerated, then accepted, at least to the extent that people are allowed to understand and learn from them. The point of root-cause analysis isn’t to assign blame, but to figure out what went wrong and how it can be done better in the future. Sometimes that brings new procedures to bear. Sometimes it leads to better documentation. In all cases, if done right, it should reduce the likelihood of the same situation recurring.

In test labs I’d go so far as to say that mistakes should be encouraged. Some of the best learning comes from trying and failing, multiple times, to get something to work. All of the effort that went into gaining that hard-earned knowledge is much more valuable than simply going step by step through someone else’s documentation. Yes, you can learn that way. But when you actually figure things out for yourself, you’re in a much better position to really fix things when unexpected problems arise.

The flowchart is worth a laugh. It may even be worth printing out and displaying in your work area. But don’t live by it. In fact, when it comes to your job, do the opposite.