HMC installios Cleanup

Edit: Some links no longer work. Some updates at the bottom.

Originally posted October 27, 2015 on AIXchange

A while back, I was called in to assist an IBM i heritage customer that encountered difficulty installing a VIO server from their HMC.

Fortunately, this support document had some helpful information:

This document describes how to clean up HMC installios after a failure or interruption of the command.

HMC installios process failed or was interrupted before completing, and subsequent installios command fails with a permission error, such as “/tmp/installios.lock : print Operation not permitted.”

1. If a problem occurred during the installation and installios did not automatically unconfigure itself, run the following command to manually unconfigure the installios installation resources.

    installios -u

Sometimes the command may fail with a “Permission Denied” error or an error similar to the one below. If it does, proceed with the remaining procedure.

    hscroot@hostname:~> installios -u
    nimol_config MESSAGE: Unconfiguring the NIMOL server…
    nimol_config ERROR: The file /etc/nimol.conf does not exist.
    nimol_config MESSAGE: Running undo…
    ERROR unconfiguring nimol_config.

2. Check if any of the following exist. If so, they need to be removed:

    /tmp/installing.lock
    /tmp/installios_cfg.lock
    /tmp/installios_pid

To remove the file(s), you must obtain a “temporary” PESH access code to gain root access by contacting an HMC Software Support Representative at 1-800-IBM-SERV. You will need the HMC serial number. …

Once you have root access to the HMC, change the file(s) permissions by running:

    chmod 775 /tmp/<filename>

At this point, you can try ‘installios -u’ again or manually remove the file(s). Then try the installation again.

HMC 7.3.4 has a known issue with the lpar_netboot command creating log files in /tmp such that a later execution causes a log file collision, resulting in a failure due to a permission error. The fix is in HMC 7.3.5 with (mandatory fix) PTF MH01197. For more details, please contact an HMC Software Support Representative.
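
The lock-file check in step 2 of the support document can be sketched as a small shell function. This is only an illustration, not something from the support document: the function name `find_installios_leftovers` is made up, and it assumes you already have a root shell on the HMC (or are testing against a scratch directory). The file names are taken from the list above.

```shell
# List leftover installios lock/PID files in a directory (default: /tmp).
# The three file names come from the support document; the function name
# and the optional directory argument are assumptions for illustration.
find_installios_leftovers() {
    local dir="${1:-/tmp}"
    local name
    for name in installing.lock installios_cfg.lock installios_pid; do
        # Print the full path of each file that actually exists.
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
}
```

Running `find_installios_leftovers` with no argument looks in /tmp. If it prints nothing, none of the three files are present and the permission change in the next step is not needed.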

In our case, cleaning up from the installation was as simple as running installios -u and then retrying the operation. Sure enough, on the retry, it again hung partway through the install. I guessed that this was the point where the previous attempt had been aborted.

On the HMC I was able to look at the log file:

    /var/log/nimol.log

I found that the install got this far:

    2015-08-13T06:14:35.088694-05:00 ioserver nimol: ,info=initialization
    2015-08-13T06:14:36.037522-05:00 ioserver nimol: ,info=verifying_data_files
    2015-08-13T06:14:41.084288-05:00 ioserver nimol: ,info=prompting_for_data_at_console

The LPAR was hung at LED 0c48.
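
To see quickly how far an install got, you can pull the last `info=` value out of the log. The sketch below assumes the log lines look like the excerpt above, with the stage name as the last thing on the line after `info=`. The function name `last_nimol_stage` is hypothetical, and whether `sed` and `tail` are usable from the restricted hscroot shell on your HMC level is something I have not checked, so you may need root or a copy of the log.

```shell
# Print the most recent install stage recorded in a NIMOL log.
# Assumes each status line ends with "info=<stage>", as in the excerpt
# above. The function name and default path argument are illustrative.
last_nimol_stage() {
    local log="${1:-/var/log/nimol.log}"
    # Keep only the text after "info=" on lines that contain it,
    # then take the last such line.
    sed -n 's/.*info=//p' "$log" | tail -n 1
}
```

Called with no argument, it reads /var/log/nimol.log. Against the three lines shown above, it reports `prompting_for_data_at_console`, which matches an installer waiting for input at the console.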

I was able to open a console to the LPAR and then select the LUN that the VIO server would be installed to. In this case the LUN was being reused, and the installer recognized that a rootvg was already there. Rather than simply auto-overwrite the LUN, we received a warning prompt. It was making sure we actually wanted to overwrite it. I found this behavior pretty slick.

In general, I prefer NIM for installing VIOS, but in this case the alternative was the best choice, given the overall expertise of the people doing the installation. For an IBM i team with no knowledge of AIX or the NIM server, NIM would have been too much trouble.

——-

EDIT: This was where the original post ended. I got an email from an old co-worker from my days at IBM, Vic Walter. He gave me permission to share our conversation.

Hey Rob,

I hope all is well with you. I am having issues with VIO installs via HMC image failing and ran across your article.

When I run the cmd…

    installios -F -e -R default1

I get an error message…

    ERROR removing default1 label in nimol_config.

And am not finding anything about where nimol_config is.

Maybe you can help? thx

——-

I replied with:

Were you able to get the PESH password from IBM? Seems like they would be able to help. I guess I would run a find command to see if I could find the file.

——-

He replied with:

I do have a case open with IBM and did get the pesh passwords. Even running the installios cmd as root fails.

Find as root did find it.  /usr/sbin/nimol_config is a script, but has no default1 reference in it.

    [hmc1 /] # grep default /usr/sbin/nimol_config
    \rdefaults:
    \r\t-L    default
    msg "No NIMOL server hostname specified, using %s as the default.\n" "$NIMOL_SERVER"
    # Specify the defaults if variables aren't set.
    [[ -z ${LABEL} ]] && LABEL="default"

——-
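
For anyone repeating that search, here is a rough sketch of the find-then-grep approach from the exchange above. Both function names are made up for this example, and it assumes a root shell where `find` and `grep` are available.

```shell
# Find regular files with a given name under a root directory (default: /).
# Errors from unreadable directories are discarded.
locate_by_name() {
    local name="$1"
    local root="${2:-/}"
    find "$root" -type f -name "$name" 2>/dev/null
}

# Count the lines in a file that contain a given string.
# -F treats the string as fixed text rather than a regular expression.
count_mentions() {
    local file="$1"
    local text="$2"
    grep -c -F -- "$text" "$file"
}
```

For example, `locate_by_name nimol_config` searches from / for the script, and `count_mentions /usr/sbin/nimol_config default1` counts the lines that mention the label. A count of 0 means the label is not hard-coded in the script.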

After some back and forth, he sent me an update from IBM:

——-

Hi Rob,

Sorry for the delay in responding. IBM’s solution was to shell into the HMC as hscpe (with the pw they provided), su - root, and run these cmds.

Once you log in as root, first clean up the previous installios attempts with the commands below:

    installios -F -e -R default1
    installios -u

Check for the lock files below and remove them if they exist:

    ls -la /tmp/installing.lock
    ls -la /tmp/installios_cfg.lock
    ls -la /tmp/installios_pid

In my case the 3 files were present and I removed them. After that the HMC was rebooted by one of the AIX admins before I could get back to the VIO install.

When I did get back to the VIO installs all went well.

One other issue is the NIC used in the failing VIO install was not able to network boot off of the HMC for some reason. I borrowed the NIC from the other VIO to complete the install and this is when the failures appeared. This failure of the network boot could have been the original cause of the VIO install fail and incomplete cleanup. I am not real sure here. This is a new frame, but it also was not seeing one of the internal NVMe disks. One slot had “unknown” instead of the usual 800 GB NVMe description. I had the IBM CE reseat things and run diag on the box. He did find the drive not seated properly and otherwise found no issues.

——-
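
If you want to review the cleanup before touching anything, the sequence Vic got from IBM can be written as a dry run that only prints what it would do. This is a sketch under assumptions: the function name is invented, the `rm -f` lines are my guess at how the files would be removed (Vic only said he removed them), and the label `default1` is simply the one from his case. Nothing here calls installios or deletes a file.

```shell
# Print (do not execute) the cleanup sequence from the email above.
# The installios commands and file names are from the email; the rm -f
# lines, function name, and arguments are assumptions for illustration.
print_installios_cleanup_plan() {
    local label="${1:-default1}"
    local dir="${2:-/tmp}"
    local name
    printf 'installios -F -e -R %s\n' "$label"
    printf 'installios -u\n'
    for name in installing.lock installios_cfg.lock installios_pid; do
        # Only propose removing files that are actually present.
        if [ -e "$dir/$name" ]; then
            printf 'rm -f %s\n' "$dir/$name"
        fi
    done
}
```

Run `print_installios_cleanup_plan` as root on the HMC, read the output, and then run the commands by hand if they look right for your situation.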

The main reason I wanted to document this was so that in the future, if this post comes up in your search, there will be another option for you to try.