Edit: This is still a good list of rules.
Originally posted March 8, 2011 on AIXchange
A few months ago I took a class with IBMer Tommy Todd, who highlighted 10 rules for administrators that he had accumulated over the years. I’ll run down his list, and comment about each rule. Then I’d appreciate your thoughts.
Documentation: Make sure your documentation is up to date. Ask yourself how you’re documenting your systems. I really like to generate a sysplan from the HMC. It shows me a diagram of the physical hardware, where the adapters are assigned, how the LPARs are configured, etc.
Make backups: How are you backing up your machine? Do you back up both the operating system (rootvg) and data (datavg)? Are you periodically running mksysb commands, and have you tested them? Can you restore your machines? Have you tested your disaster recovery plans? Did you back up your HMC and VIO servers?
Try it three times: Did you fat-finger something? Do you have poor typing skills? Did you use the wrong flag? Do you need to go look at the man pages?
Don’t overlook the obvious: Many times the answer will be simple. Recently someone was trying to remove a directory and couldn’t do it. Neither fuser nor lsof showed that the directory was in use. The admin was stumped. It turned out he still had a filesystem mounted on that mountpoint. Once he unmounted the filesystem, he was good to go. How many obvious things have you overlooked?
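For the stuck-directory case, a quick check like the one below might have saved some head scratching. This is only a sketch: is_mountpoint is a name I made up, not a standard command, and it assumes you pass an absolute path with no trailing slash and that the mountpoint name contains no spaces.

```shell
# is_mountpoint is a hypothetical helper, not a standard command.
# Assumes an absolute path with no trailing slash and no spaces in it.
is_mountpoint() {
    dir=$1
    # df -P prints one POSIX-format line per filesystem; the last field
    # of the data line is the mountpoint of the filesystem holding $dir.
    mounted_on=$(df -P "$dir" | awk 'NR==2 {print $NF}')
    # If that mountpoint is the directory itself, something is mounted there.
    [ "$mounted_on" = "$dir" ]
}
```

Something like is_mountpoint /some/dir && echo "still mounted" would then tell you to unmount before trying to remove the directory.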
Try it, it might work: I like to log into test machines and try different things; you never know what you’ll learn. For me the best learning is hands-on learning.
Never say never, always avoid always: There will be exceptions, and there is usually more than one way to reach the same endpoint. In other words, don’t say “it always works that way,” or “it will never work like that.” The technology does change. Things that didn’t work before do now, and vice versa.
Make a copy before you edit anything: You might have a copy out on a TSM server or backed up somewhere, but what if that backup copy has an issue? It’s nice to have that safety net, but it’s smart to cp /etc/hosts /etc/hosts.orig before making file changes. If you find yourself making changes to /etc/inittab without using chitab, be sure you back it up first.
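If you edit the same file more than once, a fixed .orig name gets overwritten by the second copy. One way around that, sketched here with a made-up helper called backup_file (the timestamp suffix is my own convention, not anything standard), is to stamp each copy:

```shell
# backup_file is a hypothetical helper, not a standard command.
backup_file() {
    src=$1
    # Timestamp suffix so repeated edits do not overwrite earlier copies.
    dest="$src.$(date +%Y%m%d%H%M%S).orig"
    # -p preserves the mode and modification time of the original.
    # On success, print the name of the copy so you know where it went.
    cp -p "$src" "$dest" && echo "$dest"
}
```

Running backup_file /etc/hosts before opening the file in an editor leaves a dated copy next to the original and prints its name.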
There’s usually another way to do it: Especially in UNIX, there’s more than one way to do something. The religious wars come up when people believe that theirs is the only way. I like to hear about how other people do things and learn from them. Many times they do things their way because they had issues in the past. We can all learn from others’ mistakes and benefit from their hard-earned knowledge.
Log in as yourself, switch to root when it’s needed: With tools like sudo and role-based access control (RBAC), do we really need to be logging in and moving around as root? One wrong keystroke can spell disaster when you have super-user authority.
Don’t say, “I’ll go back and fix that later”: There’s no time like the present to fix your issues. If you must “fix that later,” be sure to document it somewhere so you have a reminder to actually come back and fix it.
Never keep your resume on the system you’re supporting: What if the machine crashes and you don’t have it on a backup server? What will you do then?
Do you abide by all these rules? What are your own rules? Please register your thoughts in Comments.