Edit: With automated builds and golden images it is even easier to have standards.
Originally posted December 8, 2009 on AIXchange
This fall I attended the Red Hat conference. The conference hosted a break-fix challenge, and the outcome was interesting.
Consider two particular teams of administrators that participated in the challenge. One team came from a large company that adheres to very strict server build and maintenance standards. Each machine its administrators touch is identical; there are no deviations. Every node the admins log in to is the same as every other node. You can see how this makes administration easier. When all of your tools, logs and cron jobs are in the same place, running at the same time, troubleshooting is simpler. When you work in the same environment every day, you become very good at fixing that set of machines.
Another team came from a large company that manages machines for customers. When something breaks, these admins literally have no idea what they might find. Where are the tools? Where did the customer load the scripts? Which jobs are running when? When these folks get a call, they must spend some time getting familiar with the environment.
You can imagine which team fared better in the break-fix challenge. Although I’m sure there are shades of gray when describing these administrators, the guys who constantly go into unknown environments to solve problems prevailed.
But what about administering your environment? Would you rather have the admins from the first team or the second team? You might think that there’s no need for standard server builds. You want to keep your guys sharp. You want them on their toes. You want them to be able to figure out the environment before they go fix things.
That thinking makes sense when you’re managing machines for customers, but for my own machines I certainly prefer a standardized approach. When I’ve worked on large teams that shared pager duty, it made life easier knowing I could log in to a machine and find things where I expected to find them. Solving problems was quicker. I could get back to sleep sooner.
I’ve said it before, and I continue to believe that companies benefit from having good server build documentation and procedures.
For instance, I was recently at a customer site. The company had had some staff turnover, and the machines were set up in different ways. This caused nothing but headaches for the new guys who were trying to figure out the environment and prioritize the projects.
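One way to spot that kind of divergence is to compare each machine against a known-good reference build. What follows is only a minimal sketch of the idea, not a description of any particular tool: the function names (snapshot, find_drift) and the tracked paths in the comments are made up for illustration, and it assumes you can read the files you care about from both the reference system and the node being checked.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(root: Path, tracked: list[str]) -> dict[str, str]:
    """Map each tracked relative path that exists under root to its digest.

    The tracked list is an assumption: for example
    ["etc/crontab", "usr/local/bin/healthcheck.sh"] (hypothetical paths).
    """
    found = {}
    for rel in tracked:
        target = root / rel
        if target.is_file():
            found[rel] = file_digest(target)
    return found


def find_drift(baseline: dict[str, str], node: dict[str, str]) -> dict[str, list[str]]:
    """Report how a node snapshot differs from the baseline snapshot."""
    return {
        # Present in the reference build but absent on the node.
        "missing": sorted(set(baseline) - set(node)),
        # Present on the node but not part of the reference build.
        "unexpected": sorted(set(node) - set(baseline)),
        # Present on both, but with different contents.
        "changed": sorted(
            rel for rel in set(baseline) & set(node) if baseline[rel] != node[rel]
        ),
    }
```

You would take a snapshot of the reference build once, take a snapshot of each node with the same tracked list, and pass both to find_drift. An empty result in all three lists means the node matches the standard for the files you chose to track; anything else tells the new guys exactly where a machine was set up differently.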
Standardization does make life easier, both for the teams that we work on now and for those who will follow in our footsteps. So let’s have good documentation and good server build procedures. It’s fun to sharpen our skills to win a contest, but let’s make sure we’re sharpest where it counts. Let’s build the perfect automated system to turn out identically built servers.