More on the HMC and root

Edit: Still a good discussion.

Originally posted September 20, 2016 on AIXchange

Did you know about the AIX forums that are hosted at unix.com? It had been a while since I checked them, but when I did recently, I found an open letter addressed to me, written a few weeks after I wrote about whether IBM should allow root access to the HMC.

It was a happy discovery and an interesting read, so I’ll share a summarized version of it here. I do agree with many of the points that were brought up in the letter and the discussion that followed.

From the first post:

“So, do I want root on the HMC, as McNelly finally asks? No, for the most time a decent user account with a normal, not-restricted shell would suffice. But to manage this account — in the same responsible way I manage the rest of my 350 LPARs — I’d like to become root now and then to do whatever administrators do. Of course I know how to jailbreak the HMC (like perhaps every halfways capable admin does), but why do I need to “break into” a system I have set up, a system I run and for which I (well, actually my company) have paid good money?

…we are not talking about some mobile phone for $69.99. We are talking about the two HMCs I use to manage one and a half dozen p780s and p880s, about 2 million dollars apiece. Do you think it is necessary to squeeze out some minimal additional benefit by pestering me with a restricted shell for my daily work? And if you really think I couldn’t handle the responsibility for such a vital system: don’t you think I should be removed from the position where I manage the LPARs running the corporate SAP systems too?”

Another commenter replied to the thread and argued that a user with unfettered access could blow up the HMC. Then this came up, concerning education:

“Second: this is digging into a much larger area so I’ll try to keep it short. The reason that so few capable admins for AIX are there is because IBM did (and, IMHO, still does) a very bad job at educating them. If I am a Linux admin and want to hone my skills I get myself a PC for $300 and start hacking. I will perhaps make it go FUBAR a few times but all this will teach me valuable lessons and I will be all the more capable once I work on really productive systems professionally. If I am an AIX admin I do… what? Buy myself a system for ~ $20k only to find out I can’t even create an LPAR because I need to shell out another $50k in various licenses for one thing or the other? This might be OK for a bank, but is beyond my financial reach.”

I’ve written plenty about education over the years (for starters, here, here and here), and I do believe this continues to be a problem.

This is from another commenter:

“I can agree with most of what has been said above, I can understand IBM wanting to lock the HMC appliance down as much as possible and I understand the sysadmin desire to have full control of any machine on the network as Bakunin says – if there’s not a competency issue. In truth, my main reason for coming down on the restricted side of this argument is exactly that – competency! I have a number of systems that have been up and running for longer than many of my support contacts have been systems admins, I don’t actually have privileged access to many of the systems – I have elevated access or “root” access on none of the systems. Should I need root access, it has to be requested, approved and I am issued with a one-time password.

I find it to be a total pain, but that is the implemented system. On investigation the reason for the system being implemented was, you guessed it, competency! Cited examples, well I could give you any number. But an example that I think sums it up quite well is one that was easy to recover from, but could have been catastrophic had it been a customer facing system with say five or six thousand users. Instead of a development system, with just a couple of hundred developers. Where the “root” user executed a recursive delete command with a space in it, from the root directory and effectively deleted the full contents of the server – mostly source code and development tools.

I have worked in the *NIX world since 1981, over that time I have watched the skill level of the sysadmin degrade, a lot of it revolves around training – my first “Sysadmin I” course was five weeks long and I never actually saw a machine. It was all spent sitting at a Wyse 30 terminal, with a number of other trainees. Now I see sysadmins working for major vendors, with no training whatsoever.”

The final post on the thread covers at length an issue caused by being locked out of the HMC. Here’s the conclusion:

“Yes, it was my fault not to have the idea with the /var FS earlier. I was tricked by both HMCs losing connection at about the same time and investigated in the completely wrong direction. On the other hand, this is not a UNIX system, it is an appliance. Why am I supposed to act as an admin checking for filesystems when I was first denied all the tools admins have?

Second, my life was made so much easier by being forced to rely on tricks like pulling MAC addresses out of the routers logs instead of simply issuing ifconfig. Find out how long a system is up: uptime. Find out how long a HMC is up: impossible. Check how many packets are being sent/received on a UNIX system: entstat or netstat. Find out the same on a HMC: impossible. This list goes on and on.

And finally: even if I had diagnosed the problem correctly it wouldn’t have helped me any. We actually tried the “official” methods of cleaning up before, but they didn’t work at all (as they usually do — I have seen them fail more often than not). Only breaking in and using normal UNIX commands did what was expected. And why did IBM not see that full FS in the 2.6GB dump they required me to upload? Do I really want to take the risk of my multi-million-dollar environment becoming completely unusable because I have a system at the center which I can neither diagnose nor administrate…”

I can certainly sympathize with the sentiment, although, had it been me, I would have engaged a duty manager and escalated the support ticket. I would also have asked for the one-time HMC password to help with the diagnosis, and maybe even requested a shared-screen conversation so I knew I was getting a technician’s full attention. If you’re really stuck, you owe it to yourself to use the minds and resources at IBM Support. Keep making noise there until you get what you need.

Anyway, this is a great discussion, and I wouldn’t mind seeing it continued here. So what do you think? Should IBM just give us root to the HMC? Should they continue to offer the one-time password option via support? Is there another solution?

And if you haven’t signed up for the AIX forums, you should. You may not use them regularly, but they’re a great place for launching discussions and getting answers.