IBM Support Comes Through

Edit: I always advocate for calling problems into IBM support. This is old information, but I leave it here because you never know what people are still running and what problems they might run into. Why reinvent the wheel?

Originally posted March 11, 2008 on AIXchange

Recently I was working on a customer machine that was giving these lppchk errors: 
lppchk -v
lppchk:  The following filesets need to be installed or corrected to
         bring the system to a consistent state:
 
  bos.rte.xxxxxxx 5.3.7.0         (usr: not installed, root: APPLIED)
 
Running oslevel -s and instfix -i gave me this output:
 
5300-04-00-0000
 
instfix -i | grep ML
   All filesets for 5.3.0.0_AIX_ML were found.
   All filesets for 5300-01_AIX_ML were found.
   All filesets for 5300-02_AIX_ML were found.
   All filesets for 5300-03_AIX_ML were found.
   All filesets for 5300-04_AIX_ML were found.
   Not all filesets for 5300-05_AIX_ML were found.
   Not all filesets for 5300-06_AIX_ML were found.
   Not all filesets for 5300-07_AIX_ML were found.
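
If you have a lot of machines to check, the instfix output above is easy to filter. This is only a sketch: the function name incomplete_levels is my own, and it assumes the output keeps the "Not all filesets for ..." wording shown above.

```shell
# Read `instfix -i | grep ML` style output on stdin and print the
# maintenance/technology levels that are only partially installed.
# (incomplete_levels is a hypothetical helper name, not an AIX command.)
incomplete_levels() {
    # The level name is the fifth whitespace-separated field on lines
    # such as "Not all filesets for 5300-05_AIX_ML were found."
    awk '/Not all filesets for/ { print $5 }'
}
```

On the machine itself, this would be run as: instfix -i | grep ML | incomplete_levels. Any level listed above the one oslevel -s reports is a hint that an update went in only partway.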
 
TL7 had been applied at some point, but there must have been problems during that install that went unnoticed at the time. The customer had no backups of the machine from before the TL7 upgrade. I opened a PMR with IBM and the correct update media was quickly shipped out, but the state the machine was in kept me from installing it.

On my attempts to reinstall, I received this error:
 
fileset is applied on the "root" part but not on the "usr" part.
      Before attempting to re-apply this fileset you must remove its
      "root" part.  (Use the reject facility if the fileset is an
      update.  Remove the fileset via the deinstall facility if it is
      a base level fileset.)
 
If I tried to reject it, I got this error:
 
SELECTED FILESETS:  The following is a list of filesets that you
  asked to reject.  They cannot be rejected until all of their
  dependent filesets are also rejected.  See subsequent lists for
  details of dependents.
 
We tried to force-overwrite the fileset, but that produced errors as well. I was in a catch-22, so I called IBM support, referenced the PMR number, and was connected with a knowledgeable AIX support person.
 
We had no mksysb of the machine, and reloading the operating system from scratch was a last resort. I think the IBM representative understood my position. She took the time to help us explore all options before finally having me reload the machine.
 
Thanks to IBM support’s hard work, I was able to resolve the problem by performing “surgery” on the machine’s ODM. Now, I would NOT recommend trying this on a production machine unless support instructs you to do so. (Though I guess if you’re on a test machine that you don’t mind destroying if you make a mistake, have at it.)

Here’s a rough idea of what we did to make the machine ignore the broken updated fileset.
 
# export ODMDIR=/etc/objrepos
# mkdir -p /tmp/odmfix
# cd /tmp/odmfix
# odmget -q name='fileset name' lpp > lpp.out
# vi lpp.out ==> Note the lpp_id = ### value
# odmget -q lpp_name='fileset name' product > product.out
# odmget -q lpp_id=### history > history.out
# vi history.out ==> Remove the ver=5 rel=3 mod=7 stanzas, save file
# vi product.out ==> Remove the ver=5 rel=3 mod=7 stanzas, save file
# odmdelete -q lpp_name='fileset name' -o product
# odmdelete -q lpp_id=### -o history
# odmadd product.out
# odmadd history.out
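
The two vi steps are where a slip is most likely, so here is a rough sketch of doing that edit with awk instead. We did it by hand in vi, so treat this as an illustration, not what support gave me. The function name drop_level_stanzas is made up, and the sketch assumes odmget writes stanzas separated by empty lines with unquoted "ver = 5", "rel = 3" and "mod = 7" lines.

```shell
# Copy odmget-style stanzas from stdin to stdout, skipping any stanza
# whose ver/rel/mod values match the three arguments.
# (drop_level_stanzas is a hypothetical helper name.)
drop_level_stanzas() {
    awk -v ver="$1" -v rel="$2" -v mod="$3" '
        # Paragraph mode: each blank-line-separated stanza is one record.
        BEGIN { RS = ""; ORS = "\n\n" }
        {
            v = ""; r = ""; m = ""
            n = split($0, lines, "\n")
            for (i = 1; i <= n; i++) {
                line = lines[i]
                gsub(/[ \t]/, "", line)   # work on a whitespace-free copy
                if (line ~ /^ver=/) v = substr(line, 5)
                if (line ~ /^rel=/) r = substr(line, 5)
                if (line ~ /^mod=/) m = substr(line, 5)
            }
            # Print the stanza unchanged unless all three values match.
            if (!(v == ver && r == rel && m == mod)) print
        }'
}
```

For example: drop_level_stanzas 5 3 7 < history.out > history.new. Writing to a new file keeps the original odmget output around, so you can compare the two before anything is deleted from the ODM.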
 
This procedure put the machine in a state where it ignored the broken TL7 fileset. At that point, I could reload it. After swapping a few CDs and finishing the TL7 update, the lppchk errors went away.
 
Support also reminded me of something found here:
 
“The rule has been changed that previously allowed applying individual updates/PTFs from a TL. The rule now says that installing a Technology Level is an ‘all or nothing’ operation. Requisites are now added so the whole Technology Level is installed. Before applying a TL, you should always create a backup and plan on restoring that backup if you need to rollback to your previous level.”
 
The IBM rep gave me this explanation: When doing TL updates, plan to commit the fixes rather than apply them, because IBM doesn’t support rejecting TL updates. The backout procedure is to restore from a mksysb (or to boot from your alternate disk, if you go that route).
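
To make that advice concrete, here is a sketch of the order of operations. It only prints the commands and runs nothing. The function name tl_update_plan and the device names in the example are placeholders of mine, and the flags (mksysb -i, and install_all_updates with -p for preview, -c for commit and -Y to accept licenses) are the ones I'd expect to use, so check them against the documentation for your level before relying on them.

```shell
# Print (do not run) a backup-then-commit sequence for a TL update.
# $1 = backup target, $2 = location of the update media.
# (tl_update_plan is a hypothetical helper name.)
tl_update_plan() {
    backup_target=$1
    update_source=$2
    # 1. Take a full system backup first; it is the way back.
    echo "mksysb -i $backup_target"
    # 2. Preview the update before changing anything.
    echo "install_all_updates -d $update_source -p"
    # 3. Install the whole TL and commit it in one pass.
    echo "install_all_updates -d $update_source -c -Y"
}
```

Running tl_update_plan /dev/rmt0 /dev/cd0 prints the three steps for a tape backup and CD media, ready to review before you type anything for real.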

Long story short: Make sure you have valid environments for testing fixes before installing them in production, and always be sure you have good backups.

And the moral of this story? Never take IBM support for granted. Their help is invaluable.