IT Has its Mysteries

Edit: The link at “authorities might step in” no longer works, so I found a working link to the same article.

Originally posted March 24, 2009 on AIXchange

Some people love a good mystery. Others enjoy a challenging puzzle. Working in IT, many times the task at hand involves solving mysteries and working out puzzles.

In the computing world, some people want to know “whodunnit” so that they can affix blame. Others want to know who did it so they can educate and enlighten that person or persons, and help them avoid repeating their mistake in the future. I suppose if it’s a serious problem, authorities might step in so they can prosecute.

The puzzle-solving starts when the problems are first reported. Either the help desk will get a call or an admin will notice something changed via reporting software or system alerts. What is causing the system to behave this way? What has changed in the environment? Who made the change that caused the problem? What new software was installed?

Once a mystery is solved, you’re often left with a mess to clean up. If you don’t have good policies in place, or if developers or junior-level administrators have root access to machines, one simple mistake can cause problems.

Recently an administrator was trying to install the OpenSSH server from the IBM expansion pack CD. When he tried the installation, he would get an error:

RSA key generation failed

instal: Failed while executing the ./openssh.base.server.post_i script.

As a result, when an admin tried to run ssh-keygen, they would get “PRNG not seeded” error messages.

In this case, the /dev/urandom file was somehow missing from the machine, and the randomctl -l command was used to re-create it.

After running this command, he was able to install the openssh.server filesets without any problems. It was pretty obvious who had deleted the file, and some education was in order.
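A quick check for this particular failure can be scripted. The following is only a minimal sketch: the function name missing_devices is made up for illustration, and it assumes nothing more than a POSIX shell. It reports any path that is not a character device, and it deliberately does not run randomctl itself, since that should be a decision made by whoever has root on the machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of AIX or OpenSSH): print every path that
# is not a character device, and return 1 if at least one is missing.
missing_devices() {
    status=0
    for dev in "$@"; do
        if [ ! -c "$dev" ]; then
            echo "$dev"
            status=1
        fi
    done
    return $status
}

# Check the random devices before installing OpenSSH or running ssh-keygen.
if ! missing=$(missing_devices /dev/random /dev/urandom); then
    echo "Missing random devices: $missing" >&2
    echo "On AIX, 'randomctl -l' was the fix used in this case; run it as root." >&2
fi
```

Running something like this before the install would have pointed straight at the missing file instead of leaving the “PRNG not seeded” message to be decoded.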

Do you have the tools in place to know who’s logging in to your machines, what commands they’re running and what changes are occurring on the machine? Do you have file-level backups so that you can recover individual files, and full system backups in case things get really fouled up and you have to rebuild the system?

If the answer to any of the preceding is no, why not? Do you like mysteries and puzzles that much?