When it Comes to Backups, Move Forward

Edit: I miss my reel-to-reel tape drive.

Originally posted January 27, 2009 on AIXchange

Twenty years ago, the only option available for backing up my machine was a reel-to-reel tape drive. I’d bring the machine down to single-user mode to perform the backup, and each tape took 12 minutes to write.

I remember this because we would set a portable kitchen timer when we started each tape. When the timer went off, we’d head to the computer room to swap out the tape and press G on the console to continue the backup. The real fun came toward the end of the process: if the media being written to was bad, the backup job would abort, and you’d have to restart the backup from the beginning.

There were four boxes of tapes that would go offsite each week. I can’t recall how many tapes fit in each box (10 maybe?) or the capacity of each tape. I do remember that a box of reel-to-reel tapes was relatively heavy. I can only wonder if my current collection of lightweight USB flash drives that floats around my computer bag contains more capacity than the system that I was backing up at the time.

Our users didn’t work nights or weekends, so we had a built-in maintenance window for doing the necessary work on the machine. Of course, there wasn’t any online access to accounts in those days, and we didn’t have offices outside the U.S., so 24-7 availability wasn’t necessary. Some might argue that things were easier with such large maintenance windows, but I can also remember backups taking a long time, disk capacities being much smaller, and job run times being much longer.

The point of this trip down memory lane is to illustrate how good things are today. Processors have made huge leaps in the last 20 years. Disk capacity and performance get better year over year. Backup windows have shrunk to nothing; in many cases, applications have built-in APIs that allow products such as Tivoli Storage Manager to back up data while the application is up and running. In other cases, you can quickly quiesce an application, take a flash backup on your disk subsystem and then restart the application. In many instances the interruption is so brief your users don’t even notice.

When we include products like HACMP and capabilities like Live Partition Mobility, our need for maintenance interruptions falls off dramatically. If you’re still doing the equivalent of taking your machine into single-user mode to perform backups, or if you’re still manually performing many of these tasks, it may be time to re-evaluate your processes. Today, there’s a better way of doing things.

Revisiting mksysb Migrations

Edit: The infocenter site no longer works. It is fun to look back at migrating from POWER5 to POWER6.

Originally posted January 20, 2009 on AIXchange

I wrote an AIXchange blog entry about mksysb migrations using NIM more than a year ago. At the time I said I’d write more after testing it out. However, after seeing in the comments from that post that others had used it (and in one case, written the IBM Redbook about it), it moved down on my to-do list. I figured that between the comments I saw and Steve’s documentation, the method worked as advertised.

The topic came up again recently when a client wanted to know the available options for migrating from AIX 5.2 running on a 550 to a 9117-MMA. Since AIX 5.2 doesn’t run on the 9117-MMA, we needed to figure out how to best handle the migration, which would also include an operating system upgrade.

There are multiple options for doing this; I’ll cite only two here. The first, which is explained on the InfoCenter site, is to use an available internal disk on the 550 and perform an alt disk install. Once the install completes, you reboot, take a mksysb of the updated OS and use it to load the newly upgraded OS onto the new hardware. After the mksysb completes, boot the source system back into the 5.2 environment using the original rootvg disk. A downside to this method is that it requires an outage when booting into the new environment to take the mksysb. The amount of change and the downtime make this method unacceptable for some environments.

An alternative is to do a regular OS upgrade on the original machine, taking the OS to a supported version, then take a mksysb of the new OS and move it to the new hardware. A downside here is that if something goes wrong with the upgrade, your backout plan is to resort to a backup tape or a NIM reinstall using a mksysb file. I prefer to leave the source machine intact to simplify my backout plan: just run the workload on that source server until the cutover. If there’s a problem after the cutover, I move back to the original machine.

We chose a mksysb migration for the move from the 550 to the MMA. Since I hadn’t tried it before, I thought I’d test it out in the lab first. I took an AIX 5.3 mksysb, performed a mksysb migration on it and took it to AIX 6.1. I followed the directions in the “NIM from A to Z in AIX 5L” Redbooks publication, pages 205-216. In the Redbooks example, the authors create a file called /other_res/bid.np.hd0.mkmig. The file I created was called /home/test/aix53.custom. Here’s how it looked:

control_flow:
INSTALL_METHOD = migrate
PROMPT = no
EXISTING_SYSTEM_OVERWRITE = yes
RECOVER_DEVICES = no
MKSYSB_MIGRATION_DEVICE = network
CONSOLE = Default

target_disk_data:
PVID =
PHYSICAL_LOCATION =
CONNECTION =
LOCATION =
SIZE_MB =
HDISKNAME = hdisk0

locale:
BOSINST_LANG =
CULTURAL_CONVENTION =
MESSAGES =
KEYBOARD =

I ran:

/usr/lpp/bosinst/bicheck /home/test/aix53.custom

When it came back clean, I went through these smitty menu options to create the new bosinst.data file as a resource:

Smitty nim / Perform NIM Administration Tasks / Manage resources /
Define a resource

I selected:

bosinst_data    = config file used during base system installation

Then I came up with a name for the resource, the server it was located on (the master in my case) and the location of the file (/home/test/aix53.custom).

After that resource was defined, I reselected those menu options to define my 5.3 mksysb file as a resource:

Smitty nim / Perform NIM Administration Tasks / Manage resources /
Define a resource

This time I picked:

mksysb          = a mksysb image

I gave it a resource name, the server (master) and the location of the file (/home/test/aix53.mksysb).

Then I went back to smitty:

Smitty nim / NIM Administration Tasks / Manage machines / Manage network install
resource allocation / Allocate network install resources

I chose the NIM client I was going to target. Then I selected the mksysb resource I’d previously created.

Lastly, I prepared my client for the install. Back to the smitty menu:

Smitty nim / Perform NIM Software Installation and Maintenance Tasks /
Install and Update Software / Install the Base Operating System on Standalone Clients

I picked my client, and selected:

rte – Install from installation images

I selected my lppsource and my spot as if I were doing a normal rte base installation. Then I selected the bosinst.data file that I created earlier as a resource in this field:

BOSINST_DATA to use during installation

I also made sure to accept licenses where appropriate, and set the “Initiate Reboot and Installation Now” prompt to no.

At this point, I was able to kick off my NIM install from my client as I always would. In this case, I happened upon a bug in AIX 6.1. It would get partway through the install, then puke and die, leaving me in single-user mode on my client LPAR. At the time I wondered whether this method was ready for prime time. Turns out it is indeed. I contacted IBM Support and learned that I was running a slightly older AIX 6.1 version that didn’t include a needed fix. IBM gave me a manual work-around that involved editing my image.data file to add information about the /admin directory. Once I did that, it worked exactly as advertised. (Note: There is now an APAR that’s included in the latest fixes that resolves this issue.)

When I was done, my system was running AIX 6.1 exactly as it would have been had I migrated it on the source machine and then cloned it to my target machine. As I stated in my original post, this method is particularly handy for migrating old, unsupported operating systems from old, unsupported hardware onto our new POWER6 gear.

Mark Your Calendar for Education, Reorganize Your Day for Exercise

Edit: I did not realize I had started talking about exercise so many years ago. It took me quite a while to make it a real part of my day. The Tech U link no longer works.

Originally posted January 13, 2009 on AIXchange

I see that this year’s IBM Power Systems Technical University is set for Sept. 21-25 in Orlando. As I’ve noted in previous AIXchange entries, the Technical University is a valuable educational event. Be sure to start making plans to attend.

While I’m digging up previous posts, I want to revisit this, where I suggest that we all should “work smart, but don’t forget to take time for other things. Go outside. Take a walk.” Perhaps I should have given greater emphasis to taking a walk.

How many of us IT professionals are putting on a few pounds? We do generally have relatively sedentary lifestyles. We drive to our jobs, and sit in front of a computer all day. And if we’re not doing that, we’re sitting in a meeting. Then we go home and play video games and/or watch TV and movies. We eat more fast food than fruits and vegetables. Over time, this lifestyle takes its toll.

Of course this is the time of year for making New Year’s resolutions and thinking about things that we want to change in our lives. From what I understand, health-club memberships spike around this time of year. Gyms tend to be very busy in January, but the crowds dwindle back to normal as the year wears on. The lines for the exercise machines will disappear, just watch and see.

Starting healthy new habits like eating better and exercising more can be tough. It can be harder still to maintain these habits. I would argue that some in the IT industry, myself included, should think about getting into the habit in the first place.

It has been done. One former co-worker who’d put on some pounds over the years changed his diet, started biking to work and shed some pounds. Eventually he started cycling recreationally on the weekends. Now he’s in the best shape of his life (and no, for this discussion, round doesn’t count as a shape).

Another co-worker got into running. He now participates in marathons, and his times are enviable. I’ve seen others take up martial arts and achieve similar results.

Employers want healthy people working for them. It reduces their healthcare costs, and it increases productivity when people take fewer sick days. Many larger companies provide a gym on site; take advantage of it. Others reimburse some portion of employees’ health-club membership fees. Make use of that if you can. And if you already have a membership, start going again, or go more often.

Give yourself e-mail or text message reminders to take a break, a healthy break. Instead of going outside for a smoke or running to get more coffee, go for a walk. Get a pedometer and measure the steps you take each day, then try for more and more. Park your car farther away from the entrance to buildings that you visit.

Setting aside a regular time for exercise will help you remember to do it. Whether it’s first thing in the morning, over lunch or the last thing before bed, establish a routine. Soon it will just become part of your normal day.

Find what works for you. Carve out the time and make it happen. Our jobs require us to constantly exercise our minds. We need to take the time to do the same for our bodies.

AIX Grab Bag

Edit: Short but sweet. The NIM starter guide is still there. The links to fixes still work; the support best practices link no longer works. Chris’s article is no longer at that site.

Originally posted January 6, 2009 on AIXchange

In this inaugural post of 2009, I bring you a grab bag of links. While none of these links warranted their own full-length entry, I expect that you’ll find them to be useful.

First, there’s this NIM starter guide.

Most of you probably know about AIX fixpacks and best practices, but readers of this blog come from many different backgrounds and span all skill levels, so I’m including these: 5.3 fixpacks, 6.1 fixpacks, and service and support best practices.

Here’s a support document covering AIX, VIO and HMC.

Here’s an IDC study on adding business value with cross-platform solutions.

Also, in case you missed it, Chris Gibson, commenting on this post, offered a link to his own article about live partition mobility on JS22 blades.

Please post your own helpful AIX links in comments.

IBM, DeVry Partnership a Start

Edit: I imagine even more senior level personnel have retired since I wrote this. The itjungle link is no longer active. The link to subscribe to IBM Systems Magazine still works.

Originally posted December 23, 2008 on AIXchange

If you’ve been into a computer room lately (and most likely you have if you read this blog) let me ask you something: How old are the senior-level personnel? Are they approaching retirement age?

How you answer these questions probably depends on where in the world you’re located. In the United States, between the retirements and the lack of entry-level, on-the-job training, skill shortages in IT are an issue. I previously wrote a bit on this, discussing how you don’t see junior-level admins joining the help desk or operations and working their way up as much as you once did. Today it seems businesses will only hire people who already have the necessary skills.

I’ve also noted that some organizations are reluctant to train their current employees, for fear that they will take their new-found knowledge to a new employer.

With this in mind, I was happy to see that IBM and DeVry University are launching an enterprise computing track for students to gain experience with AIX, IBM i and z/OS. (Read the press release.)

This article offers details:

“‘Within the next five to seven years, baby boomers will begin retiring and DeVry University can help fill the pipeline with a pool of qualified applicants for IBM, its customers, and business partners. Our students will be educated on IBM’s technology that currently runs the world’s top 50 banks and 22 of the 25 top U.S. retailers. Through this practical education in enterprise computing, DeVry University’s graduates will be set apart from other computer science graduates. …

“DeVry will be plugged into a large Power Systems hub that is specifically managed for the university. The hub will be the direct connection used for teaching IBM Power Systems and mainframe environments. It allows DeVry access to high-end enterprise hardware without the overhead of buying and maintaining systems.”

I like the fact that students will be able to work on live systems remotely, but I have to hope that they will also be able to get hands-on experience. Maybe local businesses or business partners–entities that need to grow new talent–would benefit by providing internships for these students. Maybe bring them in to see what goes on with firmware updates and backups and system migrations.

Then perhaps, through this exposure to the strengths of Power Systems and the robustness of IBM operating systems, these students will be less interested in working with toy operating systems and more inclined to use real operating systems.

Also a note to readers: The magazine is launching a digital version in January. The digital version is free worldwide. To subscribe, click here.

The Case for SNMP

Edit: The link no longer works.

Originally posted December 16, 2008 on AIXchange

Many of us are familiar with Simple Network Management Protocol (SNMP). I wanted to see what information I could display from my SNMP agent on a fresh AIX 6.1 install, so I logged into an AIX 6.1 machine and ran the snmpinfo command:

snmpinfo -m dump -c public -v

All I got back was:

ibm.2.1.1.1.0 = 32769

That seemed odd. My AIX 5.3 machines give me much more information out of the box. After searching Google and the IBM developerWorks forums, I found some information. So I stopped the snmpd daemons and sub-agents:

# stopsrc -s aixmibd
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s snmpd

Then I entered this command:

vi /etc/snmpdv3.conf

And located these lines:

#VACM_VIEW defaultView       internet                   - included -
VACM_VIEW defaultView        1.3.6.1.4.1.2.6.191        - excluded -

I then needed to uncomment the first line and change the second line from excluded to included. Next, I restarted snmpd and its sub-agents:
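For reference, after the edit those two lines should look like this (reconstructed from the description above; the spacing in your file may differ, and your snmpdv3.conf may contain additional VACM_VIEW entries):

```
VACM_VIEW defaultView        internet                   - included -
VACM_VIEW defaultView        1.3.6.1.4.1.2.6.191        - included -
```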

# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s hostmibd
# startsrc -s snmpmibd

After this, running my snmpinfo command yielded much more information. In my case, I was looking for the object identifier (OID) for available and used memory, which are found in hrStorage.

I ran these commands:

snmpinfo -m dump -c public -v hrStorage > /tmp/snmp-v.out

snmpinfo -m dump -c public hrStorage > /tmp/snmp.out

When I looked at the verbose listing (captured from the -v command output), I found the line I was interested in:

hrStorageType.9 = hrStorageRam (1.3.6.1.2.1.25.2.1.2)

Once I knew what I was looking for, I found:

hrStorageSize.9 = 262144
hrStorageUsed.9 = 175084
hrStorageAllocationUnits.9 = 4096

To find the OIDs associated with these values, I ran grep 262144 /tmp/snmp* and got:

snmp-v.out:hrStorageSize.9 = 262144
snmp.out:1.3.6.1.2.1.25.2.3.1.5.9 = 262144

After I ran grep 175084 /tmp/snmp* I found:

snmp-v.out:hrStorageUsed.9 = 175084
snmp.out:1.3.6.1.2.1.25.2.3.1.6.9 = 175084

Now I knew the OIDs that I was interested in for this machine:

1.3.6.1.2.1.25.2.3.1.6.9
1.3.6.1.2.1.25.2.3.1.5.9

The last number (.9 in this case) changes based on the number of logical volumes on the AIX machine.
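If you end up scripting against these OIDs, the instance index is simply the last dot-separated component of the OID. A small sketch (the awk approach is my own, not from the original post) to pull it out:

```shell
# Extract the instance index (the trailing .9) from a numeric OID
oid="1.3.6.1.2.1.25.2.3.1.5.9"
index=$(echo "$oid" | awk -F. '{print $NF}')
echo "$index"
```

On a different machine you would first dump hrStorage as shown above to confirm which index corresponds to hrStorageRam before hard-coding anything.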

To determine how much memory I had on the machine, I did some math:

(262144*4096)/1024 = 1048576 KB (or about a gig of RAM on this machine)

To see how much was being used, I calculated:

(175084*4096)/1024 = 700336
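Scripted, the same arithmetic looks like this. This is just a sketch using the sample values captured above; hrStorageAllocationUnits is in bytes, so dividing by 1024 gives KB:

```shell
size=262144   # hrStorageSize.9, in allocation units
used=175084   # hrStorageUsed.9, in allocation units
unit=4096     # hrStorageAllocationUnits.9, bytes per unit

total_kb=$(( size * unit / 1024 ))   # total RAM in KB
used_kb=$(( used * unit / 1024 ))    # used RAM in KB
echo "total=${total_kb}KB used=${used_kb}KB"
```

In a real monitoring script you would feed the values from snmpinfo (or your poller) into these variables instead of hard-coding them.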

With appropriate monitoring software, you can use SNMP polling to get all kinds of information about your machines. You can also set up SNMP traps to be sent to your management machine based on predefined thresholds or events.

Of course, there’s much more with SNMP. I didn’t touch on security, community strings or the fact that many organizations turn off SNMP completely in their environments. When writing this I’m assuming that using SNMP is acceptable in your environment, or you’re looking at this in a non-production/test environment or you’ve taken additional steps to set up SNMP security before deploying it.

SNMP is supported by all kinds of hardware. It’s certainly worth investigating, especially as the number of devices in your data center increases and/or becomes more difficult to manage.

A Smart Admin’s NIM

Edit: This is still a useful technique to know about.

Originally posted December 9, 2008 on AIXchange

In last week’s AIXchange entry, I wrote about Janel Barfield’s presentation on file-backed virtual disks. The end of the slides from her presentation included a statement about using mkcd and creating a bootable ISO image from mksysb images. This allows you to use a virtual optical device as a “smart man’s” NIM. I’ve been using base OS install media with virtual optical devices, so using mksysb files with virtual optical devices looked like another great idea. This would allow me to bypass setting up a NIM server and worrying about network speeds and settings. I’d just install my operating systems directly over the hypervisor.

I took a spare mksysb image and ran the mkcd command. By doing so, I immediately discovered that the operating system of the machine on which you run the mkcd command had better not be down-level compared to the OS of the machine from which the mksysb was taken. Once I reran the command on a machine at the same OS level, it was fine.

From the man page: “The mkcd command creates a system backup image (mksysb) to CD-Recordable (CD-R) or DVD-Recordable (DVD-R, DVD-RAM) from the system rootvg or from a previously created mksysb.”

Here’s the command I used. (If you have a better suggestion on how to run it, please post it in comments.)

mkcd -L -S -I /testfs -m /home/guest/mksysbfile

Learn more about the flags from the man page for mkcd:

  • -L creates final CD images that are DVD sized (up to 4.38 GB).
  • -S stops mkcd before writing to the CD-R, DVD-R or DVD-RAM without removing the final CD images.
  • -I cd_image_dir specifies the directory or file system where the final CD images are stored before writing to the CD-R, DVD-R or DVD-RAM device.
  • -m mksysb_image specifies a previously created mksysb image.


In this case I wanted to use an existing mksysb image, but you could also use mkcd from cron to create your mksysb image from scratch.
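For example, a crontab entry along these lines would build fresh images on a schedule. The timing, the /testfs target and the command path are placeholders of mine, not from the original post:

```
# Hypothetical: build a DVD-sized mksysb ISO from rootvg into /testfs
# every Sunday at 2 a.m., stopping before writing to physical media (-S)
0 2 * * 0 /usr/sbin/mkcd -L -S -I /testfs
```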

When the command finished processing, my ISO image resided in /testfs and I moved the resultant image to my VIO server’s /var/vio/VMLibrary directory.

I was then able to use this image to boot my LPAR. The installation was fast, and I quickly had a clone of my source machine without having to set up any resources on my NIM server or set any network settings on my client machine.

Although restoring mksysb images from NIM is a great way to ensure you have bootable backups in a DR situation, using ISO images and virtual optical disks is yet another method you can use to bare-metal restore your AIX machines.

Some New Virtual Disk Techniques

Edit: Changed to the current Power Systems Virtual User Group link as of this writing. The links to the presentation and replay may or may not work depending on the status of the transition from the developerworks site.

Originally posted December 2, 2008 on AIXchange

The AIX Virtual User Group (Central Region, U.S.) recently hosted an informative webinar presented by Janel Barfield. Download the presentation or listen to a replay. The topics covered in the presentation and replay go nicely with some AIXchange blog entries (here and here) I previously wrote.

By listening to the webinar, I learned a few other techniques that we can all benefit from. For instance, in my VIO server, I tried creating a file-backed virtual disk (obviously on this test box I was using rootvg; if this were a production machine I’d create another volume group and use that instead):

mksp -fb fbpool -sp rootvg -size 1G

This added a new file system to my rootvg:

lsvg -lv rootvg

fbpool             jfs2       4       4       1    open/syncd
/var/vio/storagepools/fbpool

In my environment I ran lsmap -all | more and found a client to try this with. I ran:

mkbdsp -sp fbpool 500m -bd test_disk -vadapter vhost2

Here’s the output I saw:

Creating file “test_disk” in storage pool “fbpool”.
Assigning file “test_disk” as a backing device.
vtscsi5 Available
test_disk

I wanted to see what would appear in the file system that I just created, so I ran:

ls -la /var/vio/storagepools/fbpool

total 1024008
drwxr-xr-x   3 root     system          256 Nov 03 15:24 .
drwxr-xr-x   3 root     staff           256 Nov 03 15:18 ..
-rw-r--r--   1 root     staff           206 Nov 03 15:24 .test_disk
drwxr-xr-x   2 root     system          256 Nov 03 15:18 lost+found
-rw-r--r--   1 root     staff     524288000 Nov 03 15:24 test_disk

Interestingly, that .test_disk file contains some XML data describing the disk that I just created:

more /var/vio/storagepools/fbpool/.test_disk

When I look at the mapping that exists after creating the disk, I can see:

lsmap -vadapter vhost2
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost2          U7998.61X.100BB8A-V1-C15                     0x00000007

VTD                   vtscsi2
Status                Available
LUN                   0x8100000000000000
Backing device        hdisk3
Physloc
U78A5.001.WIH0A68-P1-C6-T1-W5005076801303022-LD000000000000

VTD                   vtscsi5
Status                Available
LUN                   0x8200000000000000
Backing device        /var/vio/storagepools/fbpool/test_disk
Physloc
U78A5.001.WIH0A68-P1-C6-T1-W5005076801303022-LD000000000000

This shows me that I now have the new virtual disk, with the backing device being handled by the newly created file. When I run cfgmgr in the partition that uses vhost2, I see a new disk. AIX running in my partition doesn’t differentiate between the file-backed storage and my normal hdisk-backed storage (hdisk1 is my newly created disk in this instance).

# lspv
hdisk0          00004daa45ffe3fd                    rootvg          active
hdisk1          none                                None

# lsdev -Cc disk
hdisk0 Available  Virtual SCSI Disk Drive
hdisk1 Available  Virtual SCSI Disk Drive

I can now use this disk as I would any other on my machine.

On the topic of virtual optical disks, one point brought up in the webinar was that instead of running multiple unloadopt/loadopt commands when using virtual optical disks, you can just use loadopt -f to force the disk image to load, even if a disk image is already loaded. This makes it simpler when using more than one CD to load the OS, for instance, as you don’t have to run unloadopt before running the loadopt -f command when switching between disk images.

I urge you to take the time to look over the presentation materials as well as listen to the replay to get more information. Also be sure to check the Central Region user group archives for some other great webinars.

Snapshots Make File Recovery a Snap

Edit: Changed the first link to a YouTube video. The second link no longer works, but if you search for JFS2 snapshots you can find information on the IBM website.

Originally posted November 25, 2008 on AIXchange

Watching this technical demo by Nigel Griffiths got me thinking. I like taking snapshots and flash copies on my external SAN devices. What if I want to take a snapshot of a file system on my AIX machine that isn’t connected to a SAN?

Read more about these capabilities here:

http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.baseadmn/doc/baseadmndita/jfs2_snapshots.htm

From the document:

“You can make a point-in-time image of a JFS2 file system that you can then use for backup purposes. The point-in-time image for a JFS2 file system is called a snapshot. The snapshot remains static and retains the same security permissions that the original file system (called the snappedFS) had when the snapshot was made. Also, you can create a JFS2 snapshot without unmounting the file system, or quiescing the file system. You can use a JFS2 snapshot to:

* Access the files or directories as they existed when the snapshot was taken.
* Backup to removable media.

“There are two types of JFS2 snapshots: internal and external. A JFS2 external snapshot is created in a separate logical volume from the file system. The external snapshot can be mounted separately from the file system at its own unique mount point.”

So how do I do it? I’ll duplicate what Nigel did in his video demo using my test machine that runs AIX 6.1.

First, I use the following command to create a file system in rootvg that’s 500MB in size:

crfs -v jfs2 -g rootvg -a size=500M -m /snaptest -a isnapshot=yes

Then I mount the file system:

mount /snaptest

Then run two commands:

touch hello this is a test

echo “this is a test” > test.out

Finally, I run:

ls -la
drwxr-xr-x    3 root     system          256 Oct 23 14:42 .
drwxr-xr-x   22 root     system         4096 Oct 23 14:37 ..
-rw-r--r--    1 root     system            0 Oct 23 14:38 a
-rw-r--r--    1 root     system            0 Oct 23 14:38 hello
-rw-r--r--    1 root     system            0 Oct 23 14:38 is
drwxr-xr-x    2 root     system          256 Oct 23 14:37 lost+found
-rw-r--r--    1 root     system            0 Oct 23 14:38 test
-rw-r--r--    1 root     system           15 Oct 23 14:42 test.out
-rw-r--r--    1 root     system            0 Oct 23 14:38 this

To see the available menu options, go here:

smitty > system storage management > file systems > add change show delete filesystems > enhanced journaled filesystems

The menu options are:

List Snapshots for an Enhanced Journaled File System
Create Snapshot for an Enhanced Journaled File System
Mount Snapshot for an Enhanced Journaled File System
Remove Snapshot for an Enhanced Journaled File System
Unmount Snapshot for an Enhanced Journaled File System
Change Snapshot for an Enhanced Journaled File System
Rollback an Enhanced Journaled File System to a Snapshot

In our case we want to create a snapshot. This can also be done from the command line:

snapshot -o snapfrom=/snaptest -n testsnap

Running ls -la reveals no additions in the file system, but if you cd to .snapshot, you’ll find the files you just created.

I cd into /snaptest/.snapshot/testsnap and run ls to find all of the files that were in my file system when I took the snapshot.

Creating a snapshot allows you to easily recover files should someone delete them, without having to resort to restoring from a TSM machine or some other backup mechanism.

No Substitute for Support

Edit: I still love getting emails, and I still often redirect questions to IBM Support.

Originally posted November 18, 2008 on AIXchange

I love hearing from readers. People from around the world e-mail me, telling me about challenges they face in their environments. I also love it when readers leave comments about my posts. While I hope you learn from things I write, I can assure you that I learn from you. That’s one reason this blog is called AIXchange, and not aix-i-know-it-all-so-listen-to-me-or-else. Hopefully this continues to be a place where ideas and information are shared.

That said, some things can’t be done in this forum. For instance, readers occasionally ask me for specific answers to specific problems they’re experiencing in their data centers. I like hearing about problems and how they were solved, but unless you want to set up something formally with my employer, I cannot serve as the support organization for your business. What if, by the time I answer, you have another issue? And then what if, given time differences or my schedule, I’m unable to respond immediately?

It may be trivial for me to provide a specific answer to a problem outlined in a reader e-mail. Then again, I don’t know everything either. I could respond with what I think is the correct answer, given my experiences and the information I receive. But maybe, because I don’t have all the information or because I don’t know the specific hardware and other requirements, there’s an outage. I certainly don’t want to get an e-mail informing me that my advice had caused harm to your business.

I try to be very polite with all of these interactions, but for the record, I’ll probably ask you to contact IBM Support or your business partner for help with specific problem situations. Again, I love to talk about the different things you’ve done and experienced, but I am not currently in the position to be your back-end support mechanism.

I always encourage customers to be current with their hardware and software maintenance contracts. If you’re not current, set something up with your business partner. And if your system is down, rather than take time tracking down answers (especially if you’re not strong in a particular area), call IBM Support from the start. Let them gather the snap, trace and other information that will help them quickly and accurately resolve your problem. You’re paying for access to what I consider to be the world’s best IT support organization. Take advantage of it.

Some people tell me that they won’t call IBM unless they have to. While I agree that we should know what we’re doing, again, none of us knows everything. There’s no shame in calling support. At the end of the day we’re trying to provide the best uptime that we can for our customers, while using the best practices available.

Please keep the comments and the e-mails coming. A lot of my “how-to” or informational posts are based on real-world scenarios, so your input is invaluable. But remember, specific questions will probably be answered with a gentle reminder to call IBM Support so you can get the timely help you deserve.

Writing a Certification Test

Edit: I have been on several teams that have written tests over the years, and I hope to be on more in the future. The link no longer works.

Originally posted November 11, 2008 on AIXchange

If you’ve worked on AIX machines for a while, you’ve probably taken an IBM certification test. But have you ever taken a certification exam on a subject you weren’t especially well-versed in?

I have. I attended a conference where a free certification exam was included in the registration fee, so I tried my hand, even though the test I chose covered material I didn’t know well. Still, I thought I could pass. I was wrong. I lacked the relevant experience and knowledge, and I couldn’t guess my way through to a passing score.

When you take a test, assumptions are made about your skill level. In order to pass, you must know specific information, and you need to understand real-world scenarios and how to apply that knowledge. You need to know different commands and, in some cases, the specific flags that go with them.

While this is a good time to re-emphasize the importance of studying for your certification, the primary purpose of this post is to note the work that goes into developing certification exams.

With hardware and software ever-evolving, certification tests are regularly updated. And with the availability of POWER6 and AIX 6, now’s an appropriate time for another test refresh. I recently joined a team that was writing one of the revised certification tests. The professionals in our group came from all over the world, and have many different backgrounds. I thought you might be interested to know a bit about the test-writing experience.

The first thing I noticed was the strict confidentiality required for all team members. We were not to discuss questions or answers with anyone outside of the team for any reason. The last thing we want to do is allow a test taker to get access to the questions and answers. If people are able to cheat their way through an exam, it lessens the value of the certification for those who pass the exam legitimately.

Team members write test questions and corresponding answers on their own–you’re also responsible for including the documentation (IBM Redbooks, Web sites, etc.) that supports your answers. Once all the questions and documentation are created, team members go over them as a group during frequent conference calls. We discuss the questions and make sure they’re clear and grammatically correct. We also verify that the answers are correct, and, in the case of multiple-choice questions, we make sure that the wrong answers aren’t overly obvious. We want to be sure that you pass the exam because of your knowledge, not because of your ability to make educated guesses.

A nice side benefit to being a certification test writer is that, in exchange for helping write the test and putting in the time to verify the answers, I automatically earn that certification. I don’t have to go to a Prometric site to take it.

The test we wrote, Virtualization Technical Support for AIX and Linux, is now live, so once you take it, let me know how we did with the questions we chose.

Why Consider IBM Storage? Performance

Edit: The title is still true, especially with the newer technologies available, however with the gear we had in 2008 I doubt these specific models are very compelling anymore. Neither link still works.

Originally posted November 4, 2008 on AIXchange

Once the network and fibre cables are plugged in, many server administrators consider their work finished. Getting the LUNs needed to load their data or doing the magic that allows their server to talk to the world? That’s what network and storage teams are for.

Of course, in some organizations, the server admin, network admin and storage admin are one and the same. But whether we’re working in small shops or large enterprises, we must understand the impact storage has on overall system performance.

In the case of IBM storage, here’s a handy link for your toolkit. The information here is self-explanatory, but I would direct your attention to that Web site’s eLearning section. This section features the IBM Learner Portal, where you’ll find material on topics like cabling machines, installing hardware and software, and configuring the DS3000, DS4000 and DS5000 storage families. There are also links to webcasts, classroom-based education and a mailing list.

So what does IBM have going for it in a competitive storage market? High-performing devices. Take a few moments to learn about the advantages of IBM storage here. Even if you have no intention of becoming a storage expert, knowing what to look for when your disks aren’t performing as expected will only make your job easier.

Console Counseling

Edit: I am pretty sure we all have HMCs by now. The first link still works, the second does not. At the end of the post I am including information that may be useful, although I imagine this issue is pretty rare these days.

Originally posted October 28, 2008 on AIXchange

Having a Hardware Management Console (HMC) in your environment has its benefits. You can remotely power a machine on and off. You can remotely get to a console from the HMC. You can use the HMC to create logical partitions. And, should problems arise with the actual POWER server or the HMC, IBM can help you. If you’re using another vendor’s solution for your remote console and remote “power off and on” solutions, who knows what kind of support you might get from them if things are not working properly.

After running your machine in stand-alone mode with an async terminal for a console, you may have an issue defining your console after you attach it to the HMC. Sometimes when logging into the HMC and trying to bring up a console window, I’ve seen strange characters on the screen, or no characters at all. I call this gobbledygook, though I’m not sure that phrase translates well for our international readers.

In one scenario, IBM Support had me remove devices from the ODM and then reboot, so that it would give me the opportunity to select the console. On another occasion, the machine was still in manufacturing default configuration (MDC) and IBM had a document that advised us to take it out of MDC. In each case, the console worked fine once I followed IBM’s suggestions.

I thought I’d pass on these items should you ever find yourself with a non-working console from your HMC. If you click on the links below, you can find information on the problems and the solutions.

  • Problem: “Terminal window to the default partition is unusable on new managed systems due to system being in MDC.” Symptom: “Users receive garbage characters on terminal window to newly added p5 or p6 managed systems or do not get a login prompt once default partition is activated.”
  • Problem: “System p is newly attached to an HMC and the server property for MDC is false. The vterm console works in hardware mode (i.e., SMS and for displaying hardware parameters when LPAR is activating). However, the vterm does not display login prompt after AIX has been booted.” Cause: “AIX was installed on system p server when it was attached to an async console prior to attaching system to HMC.”

Read both of these links and add them to your toolkit. As always, if you get stuck, call IBM Support. They will know how to help you.

Edit 2: The second link no longer works, but I found a post about it in the developerworks community, before it went away, so I am posting it here for completeness.

https://www.ibm.com/developerworks/community/forums/html/topic?id=77777777-0000-0000-0000-000014813333

Remove stand-alone Power 520 /550/9117 from Serial and add to HMC

Apr 16, 2012 
I am having issues setting up my power systems to use vterm on HMC. I get an error that the system is reserved for serial connection. This is understandable cause they were setup via serial. They are 1 Lpar machines, and have now been setup to connect to HMC.

This is the IBM technote that I followed but still no luck. I removed serial and only had HMC connected, but still couldn’t connect via vtmenu.

Any advice?
Technote (troubleshooting)

Problem(Abstract)
System p is newly attached to a hardware management console (HMC) and the server property for manufacturing default configuration (MDC) is false. The vterm console works in hardware mode (i.e. SMS and for displaying hardware parameters when lpar is activating). However, the vterm does not display login prompt after AIX has been booted.

Symptom

Cause
AIX was installed on system p server when it was attached to an async console prior to attaching system to HMC.
Resolving the problem
To reconfigure the AIX’s console device to use the new hardware connection you will need to do the following:
Either telnet (or SSH) into the server if networking had been configured or boot into a maintenance shell and run following procedure.

The procedure will completely remove vty0, vsa0 (and any other vty or vsa device) and syscons objects from the ODM and allow them to be recreated by AIX when system reboots.

Once telneted in or in a maintenance shell, as root execute the following:

  1. odmdelete -q name=vty0 -o CuDv
  2. odmdelete -q name=vsa0 -o CuDv
  3. odmdelete -q name=vsa1 -o CuDv
  4. odmdelete -q attribute=syscons -o CuAt
  5. bosboot -ad /dev/ipldevice <—- customer hung trying to run this command
  6. sync

# reboot

When the system comes up it should reconfigure vsa0 and vty0, and since we removed “syscons” from the ODM it should prompt you to select a terminal. This time, the only device connected for a console should be the HMC’s vterm.

You should see messages similar to the following when the LPAR is booting back up:


  • Please define the System Console. *********

Type a 1 and press Enter to use this terminal as the
system console.

1

The Command Line Remains a Prime Remote-Support Option

Edit: Portmir still works and it is still glorious. Assuming you are both logged into the same machine at the same time.

Originally posted October 21, 2008 on AIXchange

When someone reports a problem to me, many times the situation can be addressed by using some kind of remote desktop-sharing software. This allows me to quickly and easily see what the user is seeing, instead of having it described to me over the phone or in an instant message session.

Tools like Lotus Sametime Unyte, WebEx and VNC allow administrators to remotely access end user desktops to resolve their issues. Through this process, you both see what’s happening on the screen while discussing it over the phone.

In AIX however, I’ve found that the command line can be a preferable option for monitoring other interactive login sessions. In this environment you don’t have to deal with the graphics overhead that exists with some of the other tools.

If I’m already running an ssh session into the same machine as the person who’s reporting the problem, I find it useful to run the portmir command.

If I am logged in as root and run a “who” command for instance, I might receive this output:

who
root        pts/0       Sep 15 23:23     (10.9.0.1)
user1      pts/1       Sep 16 13:47     (10.9.0.5)

In this case, since I’m logged in as root, I can see what problems user1 is having by entering:

portmir -t /dev/pts/1

From that, I see:

portmir: Remote user connected, mirroring active.

This message is displayed on both sessions to prevent you from mirroring someone’s session without their knowledge.

Now you can both see the same thing, and either the administrator or the end user can take control of the session.

When you’re done with the other session, simply enter:

portmir -o

The session ends with:

portmir: Mirroring is stopped.

Instead of figuring out which desktop sharing tool to use, who will run it and what password is needed, sometimes it’s quicker and easier to use portmir. It can also be used between non-root users. Read the man page for more on getting up and running with portmir.

Digging Into the Latest IBM Announcement

Edit: I wonder what was going on with the stock market or the election in 2008. The announcement links no longer work. I am leaving the post in the archives for completeness, but I am not sure how much interest there is in this hardware today.

Originally posted October 14, 2008 on AIXchange

In case you were busy watching the stock market or the election coverage and missed it, IBM issued this announcement on Oct. 7. Go here for the overview (.pdf). And for those who prefer pictures to words, here’s a video.

Each of these links is worth your time, but here are some highlights:

  • There are new options on the Power 570, including 4.4 GHz and 5.0 GHz processor speeds. You can also choose to have 32 cores running at 4.2 GHz.
  • IBM also announced a hot-node repair where a failed CEC can be “isolated and then removed without an IPL of the system. When the enclosure is repaired, it can be reintegrated to the system using the hot-node add feature previously announced. This new function can be used to upgrade memory on a system without powering the system down.” From what I can tell this will work on nodes 2, 3 and 4, but not node 1. I imagine you need the necessary machine resources to handle the workload for this feature, or else run live partition mobility to move some of the partitions to another server while this is happening.
  • In addition, a new  Power 560 express has 3.6-GHz processors in a possible 2-node configuration, 8 cores per node or 16 cores total with a maximum of 384 GB of memory.
  • I also see support added for the DS3400 and IBM Power Systems running IBM i, so this increases the available options for connecting the i to external storage.
  • Finally, I see you can “order JS12 and JS22 blades with AIX or IBM i pre-installed for faster deployment,” and that IBM i will run on a 4-core 520, a 2-, 6- and 8-core 550 and a 3.5 GHz 550. Also, “AIX, i, and Linux can run concurrently across all entry power systems configurations.” I have machines using all three operating systems on the same hardware, and they run great.

As you’re digesting this information, if you find something that I missed, feel free to add it in comments.

The Myth of Indispensability

Edit: I fixed the links. I still advocate being sure you find ways to recharge your batteries and find fulfillment outside of work.

Originally posted October 7, 2008 on AIXchange 

How do you manage to accomplish all of the items on your daily to-do list? With the documentation that needs to be written, meetings to attend and new things to learn–not to mention the proverbial fires that flare up and must be put out–it’s easy to feel overwhelmed by it all.

We all have unique methods of prioritizing. Maybe you make a formal list, or maybe you just respond to the loudest user. Certainly what IT professionals do from day to day is worthwhile. But while we work hard, we also need to make sure we’re working smart.

Thomas J. Watson Sr., the president of IBM over four decades in the early 20th century, simply told people to think.

In his renowned self-help book, Stephen R. Covey tells people to sharpen the saw.

Whatever you call it, be sure you do take the time to step back, see the big picture, and look for ways to do things better. Are your scripts the best they can be? Do you notice the same devices failing, or the same people calling for help? Have you tried to identify trends in your problem reports, or are end users finding and solving problems before you even know about them?

Step away from your job now and again and recharge. Burning yourself out doesn’t help anyone. I regularly meet people who insist that if they take a vacation, their workplace will fall apart. But if you truly feel that way, then you probably haven’t documented your job duties and/or trained others to fill in for you. Of course, in smaller shops it can be tricky finding people to backfill you, but there’s always someone who can learn the critical aspects of what you do.

So stop. Think. Work smart, but don’t forget to take time for other things. Go outside. Take a walk. It’s surprising what may come to you.

How Much is Too Much Downtime?

Edit: The link has changed, the whitepaper was revised in 2011, but you can still read about it, although I do not think this applies anymore unless you are running some ancient code and older hardware.

Originally posted September 30, 2008 on AIXchange

How often do you hear someone say they’re happy running their applications using Linux on their x86 hardware? They don’t want to hear about Power systems–in their minds they perceive them to be “too expensive.”

I always wonder how much is too much when you’re running your core business applications on these commodity servers. Really, it comes down to how much system downtime can you afford in your environment.
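That question is easier to reason about in concrete numbers. Here is a quick back-of-the-envelope sketch (the availability levels are just common examples, not figures from any vendor):

```shell
# Translate an availability percentage into minutes of downtime
# per year, using an average year of 365.25 days.
for avail in 99 99.9 99.99 99.999; do
    awk -v a="$avail" 'BEGIN {
        printf "%s%% availability = %.1f minutes of downtime per year\n",
               a, (100 - a) / 100 * 365.25 * 24 * 60
    }'
done
```

At 99% you are down more than three and a half days a year; at five nines it is only around five minutes. Where your business sits on that line is what justifies (or doesn’t) the premium hardware.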

How quickly do you want to be able to call support, diagnose a problem, dispatch a CE and have a repair made? Better yet, what if your machine detects problems and “heals itself,” calling home to IBM so the service reps can let you know that your machine is reporting that it needs service.

If downtime doesn’t translate into lost dollars for your business, then maybe you can afford to take a commodity hardware approach. Some people are just fine deploying server farms consisting of commodity hardware. If they lose one machine, it’s no big deal, because the others that are still running continue to provide service.

The server farm approach has its downsides–the overall power consumption, rack space, infrastructure cabling issues, etc. One thing to consider when making these decisions involves reliability, availability and serviceability (RAS), a topic covered in this great whitepaper.

From IBM:

“In IBM’s view, servers must be designed to avoid both planned and unplanned outages, and to maintain a focus on application uptime. From a reliability, availability and serviceability (RAS) standpoint, servers in the IBM Power Systems family include features designed to increase availability and to support new levels of virtualization, building upon the leading-edge RAS features delivered in the IBM
[System p and System i] servers. This paper gives an in-depth view of how IBM creates highly available servers for business-critical applications.”

Many issues are covered here, including dynamic processor sparing, processor recovery, hot node add (add a drawer to a running system) and protecting memory.

More from the whitepaper:

“The overriding design goal for all IBM Power Systems is simply stated: Employ an architecture-based design strategy to devise and build IBM servers that can avoid unplanned application outages. In the unlikely event that a hardware fault should occur, the system must analyze, isolate and identify the failing component so that repairs can be effected (either dynamically, through “self-healing” or via standard service practices) as quickly as possible – with little or no system interruption. This should be accomplished regardless of the system size or partitioning.”

How much downtime you can afford is something that each company must determine for itself. The question revolves around the total cost of ownership. What do you need your machine to do to support your business? What kind of performance are you looking for? What kind of reliability are you looking for? Ultimately, this will tell you the amount of downtime you can tolerate.

IBM Technical University Worth Planning For

Edit: I remember seeing John McCain that night, and I still find value in attending IBM Technical University.

Originally posted September 23, 2008 on AIXchange

If you happened to be in the lobby of the Chicago Hilton on the evening of Sept. 8, you might have seen Secret Service agents posted at all the doors. Then, if you looked closely, you would have seen John McCain entering the ballroom for a fundraising dinner. A few nights later, in that very same ballroom, you could have seen Jake and Elwood with a Blues Brothers tribute band providing entertainment.

Readers might have been in the Chicago Hilton those nights since this was the venue where the IBM Power Systems Technical University featuring AIX and Linux conference was held. According to the attendees that I spoke to, and from my perspective, this was another worthwhile event.

After the consolidation of AIX and IBM i onto the same hardware platform earlier this year, the AIX conference was held simultaneously with the IBM Power Systems Technical University featuring IBM i. This was a great opportunity for IBM i focused people to learn more about AIX, and vice versa. I was able to attend some IBM i sessions, and many of the messages are the same on both sides of the house–we all use the same hardware, the same HMC and many of the same procedures to get things done.

Each time slot offered around 20 different classes (taking both conferences into account), covering basic to advanced topics. Many classes repeated, so if you had a conflict, you could usually find a convenient time to attend the class of your choice.

One topic I enjoyed learning more about was virtualizing IBM i partitions. IBM i can act like a VIO server and host AIX and Linux guest partitions, but, as of IBM i 6.1, it can also host another IBM i partition. IBM i can manage the disk for the guest partitions, or you can use VIO to host an IBM i partition. The IBM i administrators at the conference seemed to express some concern about the new interface and command line options that need to be learned when setting up VIO partitions, but, once they get a chance to try it, I’m sure it will all start to make sense.

Put next year’s Technical University on your calendar now. This way, when it comes time to request education, you’ll already have one event that you know you won’t want to miss.

IBM Unveils AIX Enterprise Edition

Edit: The first link no longer works, the second link still does. AIX Enterprise Edition is still a thing, although what is included has changed over the years.

Originally posted September 16, 2008 on AIXchange

During last week’s IBM Power Systems Technical University in Chicago, IBM announced  AIX Enterprise Edition. Take a few moments to look into this solution. I think you’ll be glad you did.

From IBM:

“The AIX Enterprise Edition is a new IBM offering that includes AIX 6 and several key manageability products. AIX Enterprise Edition consists of the AIX 6 operating system, the PowerVM Workload Partitions Manager for AIX (WPAR Manager), and three Tivoli products: Tivoli Application Dependency Discovery Manager (TADDM), IBM Tivoli Monitoring, and the IBM Usage and Accounting Manager Virtualization Edition for Power Systems. This offering delivers significant manageability capabilities beyond the capabilities of the standard AIX 6.1 product (AIX Standard Edition).”

In the past, if I wanted to control and relocate workload partitions with WPAR Manager, I had to buy a separate product. Now it’s bundled in Enterprise Edition. This is the product we need to take advantage of application mobility (the capability to move WPARs from one system running AIX 6.1 to another system running AIX 6.1).

Tivoli Application Dependency Discovery Manager (TADDM) is designed to “discover system and application data center resources.” I haven’t had a chance to try it yet, but it sounds like it makes it easier to visualize what’s going on in the computer room by telling me which applications are running on which virtual and physical machines. TADDM also locates changes in the data center, which can help make troubleshooting easier.

AIX Enterprise Edition also includes IBM Tivoli monitoring (ITM)–this allow us to monitor physical and virtual resources and, if need be, look at historical data. In addition, a usage and accounting manager (UAM) reports computer resource usage by a department or an organization. This can be handy if multiple departments want to be charged for their actual computing utilization.

I expect that purchasing the Enterprise Edition, as opposed to buying each package as a stand-alone product, would be a money-saver for your organization.

So read through the announcement, and contact your sales organization for more information if you think this would make sense in your environment.

The Value of Being ‘Well-Red’

Edit: This redbook looks like it was last updated in 2013, so some of the entries have probably been changed by now. The advice about reading Redbooks and keeping informed still applies. It also speaks to the value of downloading material so that you have different versions available to you.

Originally posted September 5, 2008 on AIXchange

Do you read IBM Redbooks? Some people tell me they consult Redbooks when they’re seeking specific information, but that they’re too busy to read them from start to finish. (And since Redbooks are typically hundreds of pages in length, they aren’t generally quick reads.)

My response is that, if you look hard enough, you can find extra time for learning. For instance, do you commute on a train? That could be time spent reading. Do you watch a bunch of mindless sitcoms? Kill your television and study. Maybe there’s some other time-waster you can remove from your life? Replace that with something worthwhile. Don’t just punch the clock, or wait for a team lead or senior person to spoon-feed you what you should know.

To me, the Redbook, “PowerVM Virtualization on IBM System p: Managing and Monitoring,” published in June, contains tons of worthwhile information.

For starters:

  • Setting up a firewall on your VIO server (pages 56-58)
  • Setting up NTP on your VIO server (page 73)
  • Setting up additional users besides padmin on your VIO server (page 77)
  • Creating vio backups (page 82)
  • Backing up IVM profile data using the VIO command line with
    bkprofdata -o backup -f /home/padmin/profile.bak (page 84)
  • Different ways to use the backupios command: backupios -file /mnt/backup -mksysb (page 87)
  • Backing up other disk structures with savevgstruct (page 93)
  • Sending error logs to another server (page 127)
  • Using Linux on Power tools to dynamically move memory on a running machine (page 156)
  • This statement (page 170): “There are no architected maximum distances between systems for PowerVM Live Partition Mobility. The maximum distance is dictated by the network and storage configuration used by the systems, and by the fact that both systems must be managed by the same HMC. Provided both systems are on the same network, are connected to the same shared storage, and are managed by the same HMC, PowerVM Live Partition Mobility will work. Standard long-range network and storage performance considerations apply.” That’s interesting, even if it might not yet be practical.
  • The script that creates a simple report (vios_report.ksh) of all the disks on a given VIO server (page 187)
  • The tip about connecting to https://hostname:5336/ibm/console on your AIX 6 machines (page 208). You’ll get an IBM Systems Director Console for AIX. Take some time to try this out.

And this is just one Redbook. Take the time to read Redbooks, and any other documentation that’s applicable to your job. I can promise you’ll find information that you can use in your shop.

Lesson Learned about Citrix on Linux

Edit: This is mainly here for historical purposes, I would be surprised if anyone would still find this useful.

Originally posted August 26, 2008 on AIXchange 

At a recent training session we had to connect to a Citrix server to access the machines used for the class. I didn’t have any issues, but the student next to me couldn’t get a Citrix client working on his laptop. He tried uninstalling/reinstalling and rebooting, among other things, but couldn’t connect using his Windows laptop. In the end he had to borrow another machine to work on the labs.

That made me think: If I were in his boat, what would I do?

I like vnc. Maybe I could use a VPN to access a Linux machine in my lab, fire up vncserver, run Firefox inside of the vnc session and run a Citrix client that way?

Then I realized that the Linux partition I have up and ready to go runs Linux on Power, and on top of the Linux on Power installation is a version of Lx86, which I discussed in another AIXchange blog entry.

So I give it a try. I point my Firefox browser in my vnc session to the Web site to log on. The login page displays a link to download a client. It recognizes that a Citrix client isn’t installed, and offers me a choice of clients to download. I choose the Linux client and end up downloading a package called linuxx86.tar.

When I untar the file, I run:

./setupwfc

This command gives me this output:

./setupwfc
Citrix ICA Client 9.0 setup.

Select a setup option:

1. Install Citrix ICA Client 9.0
2. Remove Citrix ICA Client 9.0
3. Quit Citrix ICA Client 9.0 setup

Enter option number 1-3 [1]:

Option 1 is the default. I choose this option, and receive this output:

Please enter the directory in which Citrix ICA Client is to be installed
[default /usr/lib/ICAClient]
or type “quit” to abandon the installation:

I choose the default directory, and press “Enter”.

Then I’m prompted to accept a license agreement, which I do by selecting option 1.

Select an option:

1. I accept
2. I do not accept

Enter option number 1-2 [2]

Option 2 is the default here.

After the installation completes, I quit the client setup. Then I change directories and create a link to the newly installed file so Firefox can find the plug-in:

cd /usr/lib/firefox-1.5.0.10/plugins

ln -s /usr/lib/ICAClient/npica.so npica.so

After I restart Firefox and return to the login page, I receive this error in Firefox:

“You have chosen not to trust Equifax secure certificate Authority the issuer of the server’s security certificate.”

I poke around on Google and find this answer, which points me here and here.

I download the files from each link and rename them with a .crt rather than a .cer extension. Then I copy those files into the /usr/lib/ICAClient/keystore/cacerts directory and run chmod o+r to give Firefox permission to read them.
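In shell terms, that rename-and-copy step looked roughly like this (the certificate filenames are placeholders for whatever you downloaded; the keystore path is the one from the client install above):

```shell
# Rename downloaded .cer certificates to .crt and drop them into the
# ICA client keystore so the browser plug-in can read them.
KEYSTORE="${KEYSTORE:-/usr/lib/ICAClient/keystore/cacerts}"
for f in *.cer; do
    [ -e "$f" ] || continue          # skip if no .cer files are present
    crt="${f%.cer}.crt"
    mv "$f" "$crt"
    cp "$crt" "$KEYSTORE/"
    chmod o+r "$KEYSTORE/$crt"       # let Firefox read the certificate
done
```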

After restarting Firefox and logging into the Web site, I’m able to login and do the labs. I end up with Firefox running a Citrix client on top of my Lx86 (x86) installation, which is running on my Linux on Power partition (ppc64).

I was proud of my accomplishments, but others in the room were less impressed. However, this solution paid off when we had a network outage later in the day. While everyone else had to log back into their sessions, I just reconnected to my vnc session and picked right up where I left off. And the performance of the Citrix client running in Lx86 world was pretty good.

I was able to convince myself that if my browser wouldn’t work on my Windows machine, I could still connect to the IBM machines using my Citrix running on Linux solution–and in this case, one running on an LPAR running POWER6 hardware as well.

Doing More With Less

Edit: Although we are working with POWER9 instead of POWER6, these questions and discussions are still ongoing today.

Originally posted August 19, 2008 on AIXchange

A vendor wants to charge you a “per CPU license” fee. What number are vendors looking for when they want to count CPUs in your machine to calculate what you owe? Are they looking for the number of physical sockets on your machine, or maybe the number of processor cores? Are they looking for the total number of processors installed, or do they only count the number of CPUs that are actually activated on your machine?

With the advent of Micro-partitioning, there’s another thing to consider. Are they looking at your HMC to learn your minimums and maximums for each LPAR? Are they looking at the number of virtual processors that your operating system sees? Are they looking for the “number” of processors that you consume when borrowing CPU cycles from other, less busy LPARs during busy times?

More vendors are realizing that we’re less likely to dedicate a whole CPU to an LPAR. Depending on the workload, dedicating a CPU to a “less busy” LPAR when another LPAR could make use of those “idle” cycles can be a waste of resources.

What is a processor? When your HMC reports that 4 CPUs are available on your machine, what does that mean? Do you have four actual chips in your machine?

On my JS22 for instance, it tells me I have four processors. There are two chips on the blade; each is dual core. Is my vendor interested in the number of physical sockets on the blade? Is it interested in my processor class? My POWER6 system will do more work than my POWER5 system did, so I can configure fewer CPUs for my LPAR to do the same amount of work. Will my vendor then charge me less?

As we become more virtualized, this topic will continue to be revisited, but it’s another thing to think about as we justify upgrading our hardware. We may well be able to do more with less, and lower our software bills in the process.

IBM Installation Toolkit for Linux Does More Than Just Install Linux

Edit: Changed the first link, the second link, the third link, and the fourth link; the text no longer reads as it did in 2008. I am not sure how applicable this tool is going to be anyway, but you never know what someone may find useful in the future, so I am keeping it here.

Originally posted August 12, 2008 on AIXchange

Don’t be taken in by the title–the IBM Installation Toolkit for Linux isn’t just for installing Linux. From IBM: “The IBM Installation Toolkit for Linux provides a set of tools that simplifies the installation of Linux on IBM Power Systems. The toolkit also provides IBM value-added software that you can install, so that you can take advantage of Power Systems capabilities, such as Dynamic Logical Partitioning (DLPAR). The toolkit also supports Web-based updates, providing immediate access to the latest offerings.

“The Toolkit can also be used as a rescue bootable DVD to run diagnostic tools and repair previously installed operating systems. It also provides a wealth of IBM documentation for configuring and managing Power systems.

“The Toolkit is available as a single ISO image that you can download from this website. This image can be used to create a bootable DVD or to create a network installation server, which makes multiple and parallel Linux installations over the network possible.”

While I plan on using it to install Linux, I first wanted to check out the Toolkit’s other features. First, I downloaded the .iso image. As always, I prefer virtual optical instead of physical media. So I went to my virtual I/O (VIO) server and ran:

mkrep -sp datavg -size 16G

Then I ran:

oem_setup_env
cd /var/vio/VMLibrary/
scp source_machine:/path/to/iso.image ./

This copied the .iso image to my VIO server so that I could assign it as a virtual optical device.

I did a DLPAR operation on my HMC to add a virtual SCSI adapter to my VIO server. Then I ran cfgdev on my VIO server so that I could see the adapter.

And then I ran:

mkvdev -fbo -vadapter vhost3
loadopt -vtd vtopt0 -disk IBM_Installation_Toolkit.iso

lsmap showed me:

vhost3

VTD                   vtopt0
Status                Available
LUN                   0x8100000000000000
Backing device        /var/vio/VMLibrary/IBM_Installation_Toolkit.iso
Physloc

After assigning the virtual SCSI adapter to my client LPAR, I was then able to boot from this CD image.

Once I booted the machine, I got a root prompt. Then I entered:

WelcomeCenter

After accepting the license, I was presented with these options:

Install Linux
Utilities
Help

I went into the Utilities and saw:

Configure network
Eject Media
Reboot System
System Diagnostics
Firmware Update

The System Diagnostics has:

System Properties
System Inventory
Error Log
Service Configuration
Boot Configuration

These look like useful tools, and if I boot from this CD image I don’t need to have an OS installed to run the utilities. Once I’ve had more time to explore them, I’ll report back with more findings.

More from IBM:

“The server consolidation tool provided by the Toolkit tackles the most time-consuming and error-prone aspect of server consolidation: the migration of OS stack and user and application data. With the Toolkit, the administrator can quickly put a new server into production. The tool targets the migration and customization of LAMP stack (Linux – Apache – mySQL – Perl, Python, and PHP) and data from X86 servers running RHEL 4, RHEL5, SLES9 and SLES10 to Power Systems.

“The administrator has complete control over the installation and migration process. He chooses the level of the OS and whether additional RPMs should be installed as well as whether to migrate user accounts and data. So whether you intend to migrate one or more servers, the new server consolidation tool is sure to save you a lot of time.

“The IBM Installation Toolkit is intended for customers who want to:

  • Install and configure Linux on a non-virtualized Power System.
  • Install and configure Linux on machines with previously configured Logical Partitions (virtualized machines).
  • Install IBM RAS Tools along with Linux or on a previously installed system.
  • Upgrade firmware level on Power Systems.
  • Perform diagnostics or maintenance operations on previously installed systems.
  • Improve application performance using the latest Power Systems optimizations available in the Advance Toolchain.
  • Migrate LAMP stack from X86 RHEL and SLES servers to Power Systems.
  • Browse and search Linux documentation included on the Toolkit ISO. “

What is Crush in AIX, Anyway?

Edit: This link sheds a little more light on the crush command. I still think this intro holds up.

Originally posted August 5, 2008 on AIXchange

Are administrators violent by nature? Given our terminology, I sometimes wonder.

Consider: When a machine fails, it crashes. If a user has a process that doesn’t behave, we kill it. When we want to manipulate data, we use cut–or in perl, we use chop. We can even use finger if it’s appropriate. I guess that’s not as bad as “the” finger.

Of course, we do have our soft side. Nice, cat, and sleep are other commands we use. So maybe it just depends on our mood.

AIX also has a command called crush. Have you used it? If you have bos.perf.tools installed, you can find it in:

/usr/lib/perf/crush

When I googled /usr/lib/perf/crush, it returned one page, in Chinese. A Google translation indicated that it was a listing of files that are contained in the bos.perf.tools fileset.

I asked around, but I couldn’t find any other information. So I got on a test box and I ran crush. The machine returned this output:

Please supply an integer number of pages.

So I did. I ran /usr/lib/perf/crush 1. Nothing exciting seemed to happen.

Then someone told me that it was an undocumented command. Another guy told me it’s undocumented for a reason. A third guy told me that all crush does is allocate a bunch of memory. Then it goes through and touches each page in memory and cleans up the cruft that accumulates in the machine while it’s been running.

What you will find, when you give it a large enough integer number of pages, is that your free list will grow, and it will page some of your memory out to paging space.

In my case, I ran:

svmon -G
                 size      inuse       free        pin    virtual
memory         958464     957272       1192     216590     520248
pg space       786432     245608

                 work       pers       clnt
pin            216590          0          0
in use         331024          0     626248

I took the memory size (958464), subtracted my pinned memory (216590) and tried the result (roughly 740000):

/usr/lib/perf/crush 740000
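That arithmetic (memory size minus pinned pages) can be pulled straight out of svmon with awk. As a sketch, here it runs against the sample figures from this post; on a live AIX box you would pipe `svmon -G` in instead of the here-variable.

```shell
# Compute a page count for crush: total memory size minus pinned pages,
# parsed from svmon -G output. The sample text uses this post's figures.
svmon_sample='                 size      inuse       free        pin    virtual
memory         958464     957272       1192     216590     520248
pg space       786432     245608'

# On the "memory" line, field 2 is size and field 5 is pinned pages.
pages=$(printf '%s\n' "$svmon_sample" | awk '/^memory/ { print $2 - $5 }')
echo "$pages"    # 741874 -- the post rounds this down to 740000
# On AIX you might then run: /usr/lib/perf/crush "$pages"
```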

When I reran svmon, I saw:

svmon -G
                 size      inuse       free        pin    virtual
memory         958464     248584     709880     216532     520256
pg space       786432     271874

                 work       pers       clnt
pin            216532          0          0
in use         246062          0       2522

My free list had gone from 1192 to 709880. After a few hours, I saw:

svmon -G
                 size      inuse       free        pin    virtual
memory         958464     548408     410056     216596     517442
pg space       786432     265683

                 work       pers       clnt
pin            216596          0          0
in use         289660          0     258748

With this post, at least the next person who searches for /usr/lib/perf/crush will have a little more to go on. Hopefully whoever wrote crush will see this and leave a comment so we can better understand when it’s appropriate to run the tool on a production machine, and what it’s meant for in the first place.

Staying Current on AIX Takes Effort

Edit: The infocenter link is a blast from the past. I edited the user group link. I also edited the link to the irc and usenet article, although there is much more there as well.

Originally posted July 29, 2008 on AIXchange

Writing this blog is interesting. I hear from people with many different backgrounds, and from many different places. Some readers are brand new to AIX (having recently come from other UNIX flavors), while others have been around AIX from the start. Some are part of very large enterprises with multiple locations, hundreds of machines and teams of people. Others work in small businesses that might have one or two critical servers.

While there are commonalities to every administrator’s job–you need to know how to patch, upgrade, and manage the machines from day to day–a lot of what you do depends on the type of organization you work for.

In large organizations powered by enterprise-class machines, IT personnel may be specialized and devoted to specific areas. They want to read about topics that cover things like networking, storage, best practices or disaster recovery. In a smaller shop, fewer people handle multiple roles. In fact, the Windows admin and the AIX admin may well be the same person. For these AIX professionals, the interest in areas like networking and SANs may be even greater, since they’re the ones supporting it all.

Security should be a focus in all organizations. Of course, it’s harder to be confident that your machines are secure and set up properly when they’re the first and only AIX machines that you’ve ever seen. Things that seasoned administrators take for granted may not be done according to best practices in a smaller shop with less skilled personnel.

I like sending new administrators to the Information Center, but there’s a difference between reading about things and doing them over and over in a production environment.

Another way you can get help is to get involved with or start a user group in your area.

For newer admins and the guys in the smaller shops, user groups can provide great opportunities to get information and advice from more seasoned professionals. Most people I know are willing to help out someone who’s looking for help, especially when the person asking the question has already put some effort into finding the answer.

Two other good resources are Usenet and IRC.

Here’s a final piece of advice: Someone once told me, turn off the TV and use that time to study. Even if you only do it a couple times a week, you’ll be amazed at what you can get done. Nobody knows everything. Sometimes doing the same things the same way over a period of years makes you reluctant to learn new things. Staying current requires effort. Regardless of your environment, you can know as much as you want, depending on what you’re willing to put into it. Put in the effort, and you’ll quickly gain the necessary skills to do the job.

The Case for Trace

Edit: I do not know that anyone would argue about the overhead, but you never know. I would probably call Earl Jew and Nigel Griffiths and let them hash it out. I did not edit the links; I will leave it as an exercise for the reader to google for more information if desired.

Originally posted July 22, 2008 on AIXchange

During a class I recently attended, an AIX systems internals guy argued that nmon and topas add (minimal) overhead to any machine that you might need to analyze. He also said that the tools’ granularity is such that they could miss some things. The intervals these tools use are measured in seconds, while events on the machine occur at the millisecond level.

His recommendation: If you really want to analyze a machine, use trace, and then use curt or trcrpt to analyze the information trace generates. Events on your machine are being collected all the time. Using trace just logs the information that’s already being collected. While the argument could then be made that logging this information creates some overhead, I think we’re just being pedantic at that point.

I use trace all the time. To get started, run this command:

trace -a -o /tmp/trc.out

Make sure you let the trace run for a reasonable period of time (hopefully long enough for the behavior that you’re trying to detect to present itself).

To end trace, run trcstop. Then you’re ready to run curt:

curt -i /tmp/trc.out | more

With curt, it obviously takes some knowledge to make sense of the information that’s generated. If you need help, IBM is an option here–I’ve seen IBM support use trace when helping customers with their performance problems.

Information on curt is available from the Information Center.

This Information Center link introduces trcrpt, which you may also be familiar with. This tool analyzes captured trace data.

To look at the trace information collected with trcrpt, run:

trcrpt /tmp/trc.out

curt provides a clean summary report of all the information in the trace output, but if I’m looking at something specific, I may use trcrpt.

I won’t tell you which tool you should use, but getting down to the trace level can be a good next step when you’re working on an issue.
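The trace/trcstop/curt sequence above can be wrapped in a small script. This is a sketch under my own assumptions: the output path, the 60-second window, and the variable names are all example choices, and the AIX-only commands are guarded so the script degrades gracefully on other systems.

```shell
# Timed trace capture: start trace, wait, stop, then summarize with curt.
# OUT and DURATION are example values, not anything mandated by AIX.
OUT="${OUT:-/tmp/trc.out}"
DURATION="${DURATION:-60}"

if command -v trace >/dev/null 2>&1; then
    trace -a -o "$OUT"            # -a: trace runs asynchronously
    sleep "$DURATION"             # let the behavior you are chasing occur
    trcstop                       # end the trace session
    curt -i "$OUT" > "$OUT.curt"  # summarize the raw trace data
    result="curt report written to $OUT.curt"
else
    result="trace not found; this sketch only runs on AIX"
fi
echo "$result"
```

If you are after something specific rather than a summary, swap the curt line for `trcrpt "$OUT"` as described above.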

The Value of Business Partners

Edit: There is still value to be found with your IBM Business Partner. I updated the link at the end to take you to a Business Partner search tool.

Originally posted July 15, 2008 on AIXchange

Something needs to change on your raised floor. Maybe you need to implement a SAN, MetroMirror or FlashCopy. Maybe you need to virtualize and consolidate, or maybe you need to look at blades. Maybe you’re not sure what you need.

Yours was once a small shop, but you’ve been continually growing. The business is making more acquisitions and you’re struggling to keep up. Nobody seems to know where these servers came from over the last few years, but the raised floor is now full of critical production stand-alone machines. They all have internal disks and their own tape drives. Space is an issue. Heat is an issue. Maintenance and backups have become painful. How do you get out of this mess? What’s the rest of the industry doing? What are best practices? Where do you go for advice?

Business partners can be a lifesaver when you’re making these decisions. They know what works and what doesn’t, because they know what other shops are implementing. Best of all, creating lasting customer relationships is their priority. After all, what good is selling new equipment to a customer that has no idea how to install and maintain it? That would only lead to frustration, and complaints about the machines not working as advertised. Then, when it came time for the next upgrade, that customer would likely look elsewhere.

Good business partners do more than sell. They keep you abreast of IT’s constant changes. They take you to briefings, and bring in people to educate your staff. The customer/business partner relationship really is meant to be a partnership. Both sides should work together. Your business partner should bring you new ideas and solutions to help you address real business needs. These should not be cookie-cutter, one-size-fits-all solutions, but solutions that are geared toward your organization’s unique needs and individual skill sets. You should be confident that the IT equipment you purchase from your business partner will work as designed, allowing you to concentrate on your own customers and business challenges rather than dwell on IT issues.

Business partners don’t want to be vendors. Although they might make some money on a single transaction, they focus on the long-term — at least that’s what they should be doing. If you’re finding this isn’t the case with your business partner, maybe it’s time to sever the relationship.

If your company is looking for a business partner, IBM provides this search capability.

Training Fears Unfounded

Edit: I still recommend having lab equipment and not learning on the job with production systems. I also still think that training employees is the way to go, and I still advocate for attending the IBM Technical University. The last link no longer works.

Originally posted July 8, 2008 on AIXchange

Even though we all work hard at our jobs, we also want to continue learning and growing to keep pace with trends and new technology. But staying current can be challenging. Particularly if you spend your time periodically setting up machines that run for the most part with no fuss and no muss, your skills can erode.

Sure it helps to read articles and documentation, but there’s nothing like a lab or test machines for actually learning how new technology works. You definitely don’t want to use your production machine to test new things, not unless you like outages and restores.

When employers are seeking people with new skills, they often turn to contractors. Their skills are continually kept up to date through training and hands-on experience, and by working at multiple customer sites, they get exposed to different kinds of systems.

I concede that some contractors are better than others. You must be careful when you’re trying out someone untested. A contractor not only needs the necessary skill set, but the ability to fit into an organization. A genius contractor who cannot communicate or work well with others is of little use to an organization.

However, good contractors can be invaluable. I’ve seen contractors who are onsite so often that they’re mistaken for a regular employee. In other situations, where they’re only needed for specific projects that require specialized expertise, good contractors will work side by side helping in-house staffers get up to speed, and when the project is complete, they’ll leave behind documentation and knowledge. While organizations can benefit from bringing in contractors, they shouldn’t dismiss the more traditional way of bringing new IT skills: training.

I certainly recommend that regular employees receive as much hands-on training as possible.

Unfortunately, training and education budgets are unpredictable, and many organizations offer less training than they once did. Besides budgetary issues, there’s a trust issue. Some organizations fear that educating employees equates to padding their resumes, and the expenditure inevitably leads to people taking their new skills to a different company. My response is that if your organization trains people and provides avenues for growth, it’s a good place to be, and there’s no reason employees would want to look elsewhere. If people are leaving, it’s not because of training, but for another reason that must be identified and addressed.

Speaking of training, IBM Power Systems Technical University featuring AIX and Linux (formerly IBM System p, AIX and Linux Technical University, which I covered in a previous AIXchange blog entry) is set for Sept. 8-12 in Chicago.

Make plans to attend this year’s conference, and you’ll be well-positioned for the future.

The Starting Point for AIX Tools

Edit: Had to update the link to the toolbox, otherwise it is still relevant. I also added a link to an article from June 2004 that talked about vnc and screen.

Originally posted June 24, 2008 on AIXchange

AIX newcomers will often ask where they can find useful tools for their machines. Though they can certainly download source code and compile the tools themselves, many times they’re just looking for something that’s precompiled and ready to run. In those cases, I point them to the AIX Toolbox for Linux Applications.

While I like building tools from source, especially when newer versions are available compared to what’s on the Toolbox, oftentimes the Toolbox is good enough.

Once the files (which are also available on CD) are copied to the target machine, they can be installed using the rpm command. I usually run rpm -ivh against the downloaded package file. In some cases, prerequisite filesets must also be loaded from the Toolbox, but rpm is pretty good about telling you which files must also be installed. To see which files are already loaded on your machine, enter this command:

rpm -qa | more
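The "is it already installed?" check can be scripted before you run rpm -ivh. A minimal sketch, assuming nothing beyond the standard rpm query options; the package name here is only an example, and the whole thing is guarded for systems without rpm.

```shell
# Check whether an RPM is already installed before installing it.
# "wget" is just an example package name from the post's list of tools.
pkg="wget"
if command -v rpm >/dev/null 2>&1; then
    if rpm -q "$pkg" >/dev/null 2>&1; then
        status="$pkg is already installed"
    else
        status="$pkg not installed; run: rpm -ivh $pkg-*.rpm"
    fi
else
    status="rpm not found on this system"
fi
echo "$status"
```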

Which tools do I like? At the top of my list are vnc and screen. I wrote about both in an article in IBM Systems Magazine.

Using these tools I can disconnect (either on purpose or due to a network or computer error) and then reconnect later from a different location.

I also like wget, rsync, lsof, and expect.

The Toolbox has many tools to choose from, so load them onto a test box and try them out. I also encourage you to recommend your favorite AIX tools in comments.

Getting to Know SVC

Edit: SVC is still here, and the links still work.

Originally posted June 17, 2008 on AIXchange

For many system administrators, SAN management is like an unsolvable mystery. The fibre cable is plugged into the server’s host bus adapter, and then somehow, like magic, a LUN appears. Others who more frequently interact with their disk vendors are involved with disk management on an ongoing basis. Although I’m usually more OS-centric, I understand that disk subsystems hugely impact system performance and data availability. I know that the world’s fastest processor does me no good if my disk subsystem isn’t sized appropriately.

When disk subsystems get refreshed, questions crop up. What’s the method for migrating from an EMC disk subsystem to a new IBM disk solution? Should you back up your data from one storage unit and restore it to another using tape? Maybe you should export a LUN from the old and the new disk units and then mirror them using LVM in AIX. What if you want your production data to be stored in a new disk subsystem, but you also want a flash backup copy of it so that data is available on older, slower disk?

If you’re unfamiliar with IBM’s SAN Volume Controller (SVC), get to know this product. IBM’s recent announcement of SVC Version 4.3 is a good starting point.

According to the announcement, SVC allows you to:

  • Combine storage capacity from multiple vendors for centralized management.
  • Increase storage utilization by providing more flexible access to storage assets.
  • Improve administrator productivity by enabling management of pooled storage from a single interface.
  • Insulate host applications from changes to the physical storage infrastructure.
  • Enable a tiered storage environment to match the cost of storage to the value of data.
  • Apply common network-based copy services across storage systems from multiple vendors.
  • Support data migration among storage systems without interruption to applications.

More from IBM:

“System Storage SAN Volume Controller Software in version 4.3.0 further extends its dynamic and high availability storage management capabilities with the introduction of space-efficient VDisks and VDisk mirroring functions. Space-efficient VDisks add the capability to define virtual disk capacity that is separate from the physical disk capacity, and use only the physical disk capacity required to store the data.

“VDisk mirroring offers a significant improvement for high availability SVC configurations by providing the capability to have a VDisk supported by two sets of physical managed disks (MDisks) in different managed disk groups on different storage controllers.

“SVC copy services are further enhanced by allowing FlashCopy to be used with the new space-efficient VDisks to yield a space-efficient FlashCopy capability, which combines with the support for up to 256 FlashCopy targets to enable more frequent FlashCopy while improving physical disk usage.”

For a basic introduction to the SVC, there’s always Wikipedia:

“SVC uses an in-band architecture which means that data flowing between a host and a storage controller flows through an SVC node. On the front end, SVC presents an interface to a host which looks like a storage controller (like a target). On the SVC’s back end, it provides interface to a storage controller that looks like a host (like an initiator).

“SVC holds the current Storage Performance Council (SPC) world record for SPC-1 performance benchmarks, returning over 272K iops (release 4.2.0). There is no faster storage subsystem benchmarked by SPC. The SPC-2 benchmark also returns a world leading measurement over 7GB/s throughput.”

Many of you might be thinking, “I’m a system admin, not a storage guy.” But outside of large specialized environments, the system admin and the storage guy are often the same person–or at least they work closely together to keep the organization’s data accessible and available.

So, chances are, making disk administration easier and data more highly available in your environment is part of your job. Check into the SVC and see if it can help you.

Once More: How Much is Your Data Worth?

Edit: Another oldie but a goodie, backups are still relevant, although the available tools make it even easier to set it and forget it.

Originally posted June 10, 2008 on AIXchange

Recently I covered the topic of server backups. Though this post doesn’t pertain directly to your back-end server environment, I still think the topic needs more exploration.

How much is your data worth?

Pause for a minute and really think about it. How much is the data on your laptop worth after you suffer a disk crash and you can no longer get to it? Do you have VPN connection information, documentation, configurations, procedures, etc.? On personal machines, do you have pictures, letters, scripts and/or financial information that you might want to keep? Do you have any information stored on your machines that would need to be recreated, or would it be lost forever? Certainly you have applications loaded and configured, and your desktop is set up the way you like it.

After a theft, hardware failure or human error has occurred is the wrong time to ask yourself about your personal data backup strategy.

Many organizations offer space in a SAN environment and tell you to keep files in that shared space, as it’s properly backed up on the back end. This is great when you have network connectivity, but when you’re disconnected from the network, it might not be as useful. Although cellular data and Wi-Fi coverage are good, they’re not available everywhere yet, and the bandwidth needed to move some of these files might be an issue.

There are several methods to look into, from taking images of your hard drive with software that can restore them onto replacement hardware, to writing scripts with Cygwin and rsync. For critical data, use some of your USB thumb drives and copy information there. Look into automated tools to handle these chores if you don’t trust yourself to remember to handle them manually. Have cron send you a reminder message at appropriate intervals to be sure your data is protected.
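As a starting point for the script-based approach, here is a minimal sync sketch. The directory names are placeholders for your own data and backup media; it prefers rsync when available and falls back to a plain recursive copy.

```shell
# backup_dir: copy a directory of critical files to backup media.
# Arguments are placeholders -- point them at your data and your USB
# drive or network share.
backup_dir() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src"/ "$dest"/       # archive mode: preserves attributes
    else
        cp -pR "$src"/. "$dest"/        # fallback: plain recursive copy
    fi
}
```

For example, `backup_dir "$HOME/important" /media/usb/backup`, perhaps driven by a cron entry so you don’t have to remember to run it.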

So how much is your data worth? If your answer is nothing, then you have nothing to worry about. But if the thought of your machine no longer booting makes your heart skip a beat, now’s the time to take action.

When Maintaining Your IT Environment, Little Things are Worth the Effort

Edit: Another post that still rings true today.

Originally posted June 3, 2008 on AIXchange

For a lot of us in North America, a chunk of our springtime is devoted to yardwork. Winters can be harsh, and yards and gardens need care. So we remove clutter, trim plants and pull weeds. And now, as we near summer, we–make that our kids–are mowing the lawn regularly.

While yardwork is a seasonal chore, when it comes to maintaining your IT environment, there’s always work that needs doing. So are all of your patches and microcode up to date? Do you have old users that can be removed? Can you reclaim disk space by removing old file systems that are no longer in use?

Are the tools and scripts in place to automatically document your machines? Is useful performance information being automatically collected? Are these reports being sent to a back-up machine so that they can still be analyzed if your source machine is no longer responding? Are your machines and cables labeled correctly? Is your documentation up to date?

These items may not seem as urgent as a run-away job or problem tickets that must be dealt with, but you need to plan for them. Because if you don’t, your IT environment will grow out of control like a weed-infested lawn, and you’ll eventually find yourself with a mess. Constantly caring for your machines may seem like more work, but the attention to detail ultimately makes your job easier.

VIOS Update Now Available From IBM

Edit: You had better not still be running this code anywhere.

Originally posted May 27, 2008 on AIXchange

A VIOS update is now available. “Fix Pack 11.1 provides a migration path for existing Virtual I/O Server (VIOS) installations. Applying this package will upgrade the VIOS to the latest level, V1.5.2.1.”

Read about all of the enhancements on the IBM Web site.

Be sure to note:

“In order to take full advantage of all the function available in the VIOS (including SEA bandwidth apportioning.), it is necessary to be at system firmware level SF235 or later. SF230_120 is the minimum level of SF230 firmware supported by the Virtual I/O Server. If a system firmware update is necessary, it is recommended that the firmware be updated before upgrading the VIOS to V1.5.2.1.”

Here’s a sampling of what you will find on the Web site. Many more items are listed there that you can read about by selecting the above link.

• Added support for Bandwidth apportioning for Shared Ethernet Adapter.
• Added support for a Shared Ethernet Adapter accounting tool (new CLI command “seastat”).
• Added virtual switch support to Partition Mobility commands.
• Added CLI command for new SEA command “seastat”
• Improved topas reporting for SEA, EtherChannel and VLAN.
• Added new VLAN attribute called vlan_priority to VLAN pseudo device.
• Added LoopBack Device Support.

Another important thing to keep in mind according to the IBM Web site:

“If you are updating from an ioslevel prior to 1.3.0.1, the updateios command may indicate several failures (i.e., missing requisites) during fix pack installation. These messages are expected. Proceed with the update if you are prompted to ‘Continue with the installation [y/n]’.”

Here’s the list of failures that I saw in my /home/padmin/install.log file when I ran the update:

SELECTED FILESETS: The following is a list of filesets that you asked to install. They cannot be installed until all of their requisite filesets are also installed. See subsequent lists for details of requisites.

    bos.ecc_client.rte 5.3.8.0                    # Electronic Customer Care Run…
    bos.esagent 6.5.8.0                            # Electronic Service Agent
    bos.sysmgt.nim.spot 5.3.8.0               # Network Install Manager – SPOT
    bos.sysmgt.trcgui_samp 5.3.0.30        # Trace Report GUI
    ifor_ls.msg.en_US.java.gui 5.3.7.0     # LUM Java GUI Messages – U.S….
    rsct.msg.EN_US.basic.rte 2.4.0.0        # RSCT Basic Msgs – U.S. Engli…
    rsct.msg.en_US.basic.rte 2.4.0.0        # RSCT Basic Msgs – U.S. English

MISSING REQUISITES: The following filesets are required by one or more of the selected filesets listed above. They are not currently installed and could not be found on the installation media.

    bos.sysmgt.nim.spot 5.3.0.0     # Base Level Fileset
    bos.sysmgt.trcgui_samp 5.3.0.0  # Base Level Fileset
    ifor_ls.java.gui 5.3.0.0        # Base Level Fileset
    lwi.runtime 5.3.8.0             # Base Level Fileset
    rsct.basic.rte 2.4.0.0          # Base Level Fileset

From the IBM documentation:

VIOS V1.5.2 provides several key enhancements in the area of POWER Virtualization.

  • VIOS network bandwidth apportioning. The bandwidth apportioning feature for the Shared Ethernet Adapter (SEA), allows the VIOS to give a higher priority to some types of packets. In accordance with the IEEE 802.1q specification, VIOS administrators can instruct the SEA to inspect bridged VLAN-tagged traffic for the VLAN priority field in the VLAN header. The 3-bit VLAN priority field allows each individual packet to be prioritized with a value from 0 to 7 to distinguish more important traffic from less important traffic. More important traffic is sent faster and uses more VIOS bandwidth than less important traffic.
  • Virtual I/O Server Command Line Interface (CLI) was enhanced to support Image Management commands. The CLI is in a unique position to have access to virtual disks and their contents. Users will be able to make a copy of virtual disks and install virtual disks using the Image Management command, cpbd. This command will allow other programs to create and copy virtual disk images.
  • The Virtual I/O Server runs on Internet Protocol version 6 (IPv6) networks, therefore users can configure IPv6 type IP addresses.
  • The updates to the Systems Planning and Deployment tool include updates to ensure none of the existing VIOS mappings are changed during the deployment step.

For me it was a smooth update. I downloaded the code to my machine (it was a pretty big download; you might also think about ordering the updates on CD, depending on your network connection), ran updateios -commit and then updateios -accept -install -dev /mnt. (Since I was doing this over an NFS connection, I had mounted the remote filesystem that I needed to /mnt.)
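Put together, the session looked something like this (the NFS server name and export path here are illustrative, not from my actual environment):

```
$ mount nfsserver:/export/vios_update /mnt   # mount the staged fix pack
$ updateios -commit                          # commit any uncommitted fixes first
$ updateios -accept -install -dev /mnt       # install the update from /mnt
```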

When it was done, I rebooted the VIOS and saw this output:

$ ioslevel
1.5.2.1-FP-11.1

I’ll keep an eye on it and notify you of any issues.

Can You Restore? Now’s the Time to Find Out

Edit: Still true today, have you tested your backup lately?

Originally posted May 20, 2008 on AIXchange

A buddy recently told me about a situation he encountered where a non-disruptive disk update on a storage area network proved extremely disruptive. The client lost its LUNs, which impacted all of the LPARs that were booting from SAN, along with many machines that had datavg stored on the SAN. Rootvg on these partitions was gone.

Still, the client figured it could restore its environment from its most recent backups. But this was a test environment, and there were no backups. The machines had been built and used with no thought ever given to recovering them. Since this wasn’t production, it wasn’t important enough to add jobs to the backup server to accommodate these machines, right?

This was another painful lesson learned. The machines had to be rebuilt. The scripts, users, tools and code that were being developed were lost.

I, for one, have covered this topic repeatedly, because it’s important. Yet some continue to ignore the message. I still find environments that don’t have consistent, recent backups. Yes, IBM hardware is robust, and it’s known for reliability and availability. But even the best hardware is still man-made, and machines break. And even the most experienced administrators log on and make mistakes. Stuff happens.

To this, some respond: “I have nothing to worry about in my environment. I take my weekly mksysb. I send the tapes offsite. I take my nightly backup. It goes over the network to my backup server, which we then offload to tape, which we then take offsite.”

But do you test those backups? Are you sure the tapes can be read? I’m not even talking about a true disaster-recovery scenario, where your building burned down and you’re trying to rebuild. I’m talking about simply trying to restore data to machines in your current environment.

When you really need to restore isn’t the time to find out that you can’t restore. It’s critical that you take the time to confirm this now, when things are still running smoothly.
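To make the point concrete, here’s a tiny portable sketch of the discipline. tar stands in for your real backup tool so the sketch runs anywhere; on AIX the equivalent check against a mksysb image would be something like restore -Tvqf (or lsmksysb):

```shell
#!/bin/sh
# Portable sketch of "prove you can read the backup": create an archive,
# then actually list it back before you ever need it.
# (tar stands in for the real backup tool here.)
tmpdir=$(mktemp -d)
echo "payload" > "$tmpdir/file.txt"
tar -cf "$tmpdir/backup.tar" -C "$tmpdir" file.txt

# The step people skip: a backup you cannot list is not a backup.
if RESULT=$(tar -tf "$tmpdir/backup.tar"); then
  echo "backup readable: $RESULT"
else
  echo "BACKUP UNREADABLE -- investigate now" >&2
fi
rm -rf "$tmpdir"
```

The same idea scales up: schedule the listing (or better, a periodic trial restore to a scratch volume group) right alongside the backup job itself.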

Ask yourself what will be lost if you must go back to your last backup. Did you take that backup last night? If your outage happened right before closing time, is your business prepared to roll back to last night’s backup and redo the work that was done during the day?

More and more, the answer is no: outages of any kind and any length cannot be tolerated.

Constantly look at your environments. If you add file systems to machines, are the new file systems being backed up? Is the frequency of the backups acceptable to the business? Does management agree with your conclusion?

The objective of a backup is to be able to restore. Be sure that you can.

Another Great AIX Script

Edit: I love revisiting these scripts, and I wonder if anyone still runs them.

Originally posted May 13, 2008 on AIXchange

I recently saw another great script from the mailing list, written by Dean Rowswell. To get it working on my machine, I loaded these rpms from the AIX Toolbox CD:

tcl-8.4.7-3.aix5.1.ppc.rpm
tk-8.4.7-3.aix5.1.ppc.rpm
expect-5.42.1-3.aix5.1.ppc.rpm

Then I followed the instructions in the script’s introductory comments (seen in the script listed below) and set up the ssh keys to allow automatic login from my AIX machine to my VIO server. After editing the list of VIO servers in the script to match what I had in my environment, I was able to use this script to send out the same command to multiple VIO servers at the same time.

Here are some examples from the e-mail:

Syntax:

root@coenim:/:# dshvio -?
Valid parameters are:
-r for a root command
-n for a list of VIO servers
-n vios1
-n vios1,vios2

A regular VIOS command:

root@coenim:/:# dshvio ioslevel
=========================
VIO server --> coevios1
=========================
1.5.1.1-FP-10.1
=========================
VIO server --> coevios2
=========================
1.5.1.1-FP-10.1

An example of running a command through oem_setup_env automatically:

dshvio -r fget_config -Av
=========================
VIO server --> coevios1
=========================

---dar0---

User array name = 'COE-DS4700'
dac0 ACTIVE dac1 ACTIVE

Disk DAC LUN Logical Drive
hdisk2 dac1 0 coenim_nim
hdisk3 dac1 1 coesap01_disk1

=========================
VIO server --> coevios2
=========================

---dar0---

User array name = 'COE-DS4700'
dac0 ACTIVE dac1 ACTIVE

Disk DAC LUN Logical Drive
hdisk2 dac1 0 coenim_nim
hdisk3 dac1 1 coesap01_disk1

And here’s a command that I ran in my environment:

./dshvio.ksh -r "lscfg | grep disk"
=========================
VIO server --> vios1
=========================

vios1:/home/padmin # oem_setup_env
+ hdisk0           U787F.001.DPM1XRD-P1-T10-L3-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
+ hdisk1           U787F.001.DPM1XRD-P1-T10-L4-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
+ hdisk2           U787F.001.DPM1XRD-P1-T10-L5-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
=========================
VIO server --> vios3
=========================

vios3:/home/padmin # oem_setup_env
* hdisk0           U787F.001.DPM1XRD-P1-C1-T1-W5005076801103022-L0
        MPIO Other FC SCSI Disk Drive
* hdisk1
U787F.001.DPM1XRD-P1-C6-T1-W5005076801202FFF-L1000000000000  MPIO
Other FC SCSI Disk Drive
* hdisk2           U787F.001.DPM1XRD-P1-C6-T1-W5005076801202FFF-
MPIO Other FC SCSI Disk Drive

This means that I can run commands across all of my VIO servers from my AIX machine, in either the padmin or oem_setup_env environment when I specify the -r option. I welcome any changes or improvements that readers might suggest, and I hope that Dean continues to share these useful tools with us.

#!/bin/ksh
# Created by Dean Rowswell, IBM, April 26, 2006
# Modified by Dean Rowswell, IBM, October 11, 2007
#       Added a check for the -r flag for a root user command
#       NOTE: this flag will require the expect RPM package to be installed
# Modified by Dean Rowswell, IBM, October 12, 2007
# Added a check for the -n flag to specify a single or multiple VIO servers
#
#------------------------------------------------------------------------------
# Assumption: this server is a trusted host for running ssh commands to
# the VIO server(s)
#   To set this up:
#     ssh-keygen -t dsa (press ENTER for all prompts)
#     scp $HOME/.ssh/id_dsa.pub padmin@VIOserver:.ssh/authorized_keys2
#
# NOTE: if the VIO server responds with "rksh: ioscli:  not found" then
# login to the VIO server and change to the root shell via oem_setup_env.
# Edit /etc/ssh/sshd_config
#       Change: PermitUserEnvironment no
#       To: PermitUserEnvironment yes
#       Run: stopsrc -s sshd ; startsrc -s sshd
#------------------------------------------------------------------------------
#===========================================================#
# Define the list of VIO servers in this variable
#===========================================================#
VIOS="vios1 vios3"
#===========================================================#

DisplayUsage() {
echo "Syntax: dshvio COMMAND\n  Run dshvio -? for the valid parameters"
exit
}

if [ ${#*} -eq 0 ]
then
      DisplayUsage
else
      while getopts :rn: PARMS
        do
         case $PARMS in
          r) lslpp -L|grep -w expect >/dev/null
               if [ $? -ne 0 ]
                 then
                   echo "ERROR: cannot use -r flag because expect RPM package is not installed"
                   exit 1
                 else
                   ROOT=1
               fi ;;
          n) VIOS=${OPTARG}
                VIOS=`echo ${VIOS}|sed 's/,/ /g'`;;
          ?) echo "Valid parameters are:\n  -r for a root command\n  -n for a list of VIO servers\n  -n vios1\n  -n vios1,vios2" ; exit ;;
         esac
        done

        shift $(($OPTIND -1))
        VIOSCMD=${*}
        if [ ${#VIOSCMD} -eq 0 ]
        then
                DisplayUsage
        fi
fi

for VIO in ${VIOS}
do
  ping -c1 ${VIO} >/dev/null 2>/dev/null
if [ $? -eq 0 ]
    then
    echo "=========================\nVIO server --> ${VIO}\n========================="
        if [ ${ROOT:=0} -ne 1 ]
        then
         ssh padmin@${VIO} "ioscli ${VIOSCMD}"
        else
         expect -c "spawn ssh padmin@${VIO} ;expect \"\$\*\";send \"oem_setup_env\\r\";expect \"\#\*\";send \"${VIOSCMD}\\r\";send \"exit\\r\";expect \"\$\*\";send \"exit\\r\""|egrep -v "^spawn|^Last|oem_setup_env|^exit|^#"
        fi
     else
        echo "=========================\nVIO server --> ${VIO}\n========================="
        echo "VIO server: ${VIO} is not responding"
     fi
done
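If you want to experiment with the script’s flag handling outside of a VIOS environment, here’s a minimal stand-alone sketch of the same getopts pattern. The parse_args wrapper is my own, not Dean’s; the real script goes on to loop over $VIOS running ssh:

```shell
#!/bin/sh
# Minimal sketch of dshvio's flag handling: -r marks a root command,
# -n replaces the default server list (commas become spaces).
parse_args() {
  VIOS="vios1 vios3"   # default list, as in the script
  ROOT=0
  OPTIND=1             # reset so the function can be called repeatedly
  while getopts :rn: PARMS "$@"; do
    case $PARMS in
      r) ROOT=1 ;;                                   # would run via oem_setup_env
      n) VIOS=$(echo "$OPTARG" | sed 's/,/ /g') ;;   # vios1,vios2 -> "vios1 vios2"
    esac
  done
  shift $((OPTIND - 1))
  VIOSCMD=$*           # everything left over is the command to run
}

parse_args -r -n vios1,vios2 ioslevel
echo "ROOT=$ROOT VIOS=$VIOS CMD=$VIOSCMD"
# prints: ROOT=1 VIOS=vios1 vios2 CMD=ioslevel
```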

Making the Case for AIX and Power Systems

Edit: IBM’s virtualization is still as powerful today, if not more so.

Originally posted May 6, 2008 on AIXchange

I recently received an e-mail from a mailing list that linked these documents from The Sageza Group (link not active) and Forrester Research. Both reports offer information that may help non-technical personnel understand the value proposition of AIX and Power Systems.

In “The Value of PowerVM Workload Partitions: New Virtualization Options in IBM AIX v6.1,” Sageza focuses on workload partitions (WPARs). I’ve previously covered WPARs here, here and here.

While the Sageza report is apparently accessible online, Forrester’s report, “Virtualization Trends On IBM’s System p: Unraveling The Benefits In IBM’s PowerVM,” seems to be available only through registration and subscription. From Forrester:

“IBM’s PowerVM (formerly Advanced POWER Virtualization) technology has catalyzed the consolidation of server systems resources and a variety of applications workload types–both AIX- and Linux-led–as virtualized on more powerful multi-core System p servers. Evolution of IBM’s virtualization stack has improved dramatically–from its early 2001 introduction of logical partitions on the first multicore POWER4-based systems–to its current PowerVM virtualization stack. In 2007, IBM’s refresh to POWER6 came fast and furious: debuted with high-end System p 570 (in May), followed by the P6-based JS22 blade (November), and sweeping through the System p 520 (entry) model and System p 550 (midrange) server (January 2008). Traction for PowerVM virtualization now accounts for 70 percent of its IT customer base — showing to what extent IBM’s virtualization stack has become a shortlist contender as a systems consolidation enabler. The November 2007 release of AIX 6 added two breakthrough features–Live Partition Mobility and Live Application Mobility–further cementing IBM’s advanced virtualization advantages against its Unix competitors.

“1. What’s the history behind the IBM virtualization stack?
2. What is the overall business value of the System p virtualization stack?
3. What role does the POWER Hypervisor play in the PowerVM?
4. How does the POWER Hypervisor integrate with PowerVM technologies?
5. What are the benefits of micro-partitioning and the shared-processor pool?
6. What are workload partition (WPAR) and Live Application Mobility?
7. What business problems are solved with PowerVM’s new Live Partition
Mobility?”

And, from Sageza:

“Many organizations have embraced virtualization to improve IT utilization and reduce the expenses associated with equipment acquisition, installation and operation. While traditional virtualization or partitioning schemes have improved IT resource utilization, reducing the number of physical servers has not reduced the number of server operating system (OS) images requiring administration and maintenance. If anything, virtualization has encouraged growth in the number of servers that support the application workloads in organizations. There is an opportunity for IT to reduce this administrative overhead to become more streamlined and cost-efficient while continuing to provide the levels of service on which organizations have become dependent.

“IBM AIX 6.1, through its support for Workload Partitions, enables organizations to rethink the way they deploy multiple workloads on a single server. While traditional approaches such as virtualization using logical partitioning provide OS isolation and independence, for many workloads, this degree of isolation exceeds the user’s need and results in unnecessary administrative and operational overhead.

“WPARs offer IT managers a more cost-effective yet secured approach that meets the needs of many organizations. WPARs differ from other partitioning or virtualization schemes in that they partition server resources by the workload and share access to a single OS image. This is in contrast with the more common approach of creating a discrete operating system image to support each virtual server. By reducing the number of OS images required, the level of server software maintenance and other related IT administrative and management activity can be decreased while maintaining streamlined operational management and reduced need for physical resources.

“WPARs increase resource utilization from the typical 5-20 percent average, reduce partition creation and teardown times, and reduce the number of OS instances and associated system management workload. WPARs provide standard application environments, support mobility and templates as well as cloning, and have automated policy-based resource and workload management through the WPAR manager. Consolidating with WPARs saves floor space and reduces the power consumption and expense associated with servers and air conditioning in the data center while maintaining the one-app/one-server deployment paradigm.

“In this paper, we examine the flexibility that WPARs offer IT professionals in their virtualized UNIX server environments. In particular, we review how WPARs are different from other partitioning technologies and how WPARs complement existing environments. We discuss the capabilities and practical uses of WPARs in sample scenarios and articulate the ways in which WPARs provide an alternative to other partitioning schemes. Through AIX 6.1 and its support for WPARs and PowerVM Live Application Mobility, IT managers have greater flexibility in server configuration and can select the best approach to meet the user organization’s needs while also simplifying the operational and cost efficiency of the IT environment.”

PowerVM Redbook Recommendation

Edit: Still useful concepts to study and be familiar with.

Originally posted April 29, 2008 on AIXchange

If you’re working with PowerVM but haven’t kept up with the changes, or if you’re new to virtualization, then the updated Redbook, “PowerVM Virtualization on IBM System p: Introduction and Configuration” (4th edition), should be required reading. It serves as everything from an introduction to virtualization to a cookbook for setting up dual VIO servers for redundancy. Much of this post quotes directly from the Redbook, as I don’t think I can say it any better than the authors have.

From the abstract:

“This IBM Redbook provides an introduction to PowerVM virtualization technologies on IBM System p servers. The Advanced POWER Virtualization features and partitioning and virtualization capabilities of IBM Systems based on the Power Architecture have been renamed to PowerVM.

“PowerVM is a combination of hardware, firmware and software that provides CPU, network and disk virtualization. The main virtualization technologies are:

* POWER5 and POWER6 hardware
* POWER Hypervisor
* Virtual I/O Server

“Though the PowerVM brand includes partitioning, software Linux emulation, management software and other offerings, this publication focuses on the virtualization technologies that are part of the PowerVM Standard and Enterprise editions.

“This publication is also designed to be an introduction guide for system administrators, providing instructions for:

* Configuration and creation of partitions and resources on the HMC
* Installation and configuration of the Virtual I/O Server
* Creation and installation of virtualized partitions

“While discussion in this publication is focused on IBM System p hardware and AIX, the basic concepts can be extended to the i5/OS and Linux operating systems as well as the IBM System i hardware.

“This edition has been updated with the new features available with the IBM POWER6 hardware and firmware.”

And, from the introduction:

“The first edition of this publication was published over three years ago. Since then the number of customers using Advanced POWER Virtualization (currently named PowerVM) editions on IBM System p servers has grown rapidly. Customers use PowerVM in a variety of environments including business-critical production systems, development, and business continuity. This fourth edition includes best practices learned over the past years to build on the foundation work of the previous versions of the Redbook.

“This publication targets customers new to virtualization as well as more experienced virtualization professionals. The publication is split into four chapters, each with a different target audience in mind.

“Chapter one is a high-level introduction for those wanting a quick overview of the technology.

“Chapter two is a slightly more in-depth discussion of the technology aimed more
at the estate- or project-architect for deployments.

“Chapters three and four are aimed at professionals who are deploying the technology. Chapter three works through a simple scenario and Chapter four introduces the more advanced topics such as VLANs, Multiple Shared Processor Pools and Linux. Additionally it will introduce the techniques that can be used to provide the periods of continuous availability required in production systems.”

Be sure to look for the shaded sections throughout the book, which include different sections labeled Important, Note, and Tip. Reading and understanding these will save you headaches when deploying your machines. For example, take a look at this from the SEA section:

“Note: A Shared Ethernet Adapter does not need to have IP configured to be able to perform the Ethernet bridging functionality. It is very convenient to configure IP on the Virtual I/O Server. This is because the Virtual I/O Server can then be reached by TCP/IP, for example, to perform dynamic LPAR operations or to enable remote login. This can be done either by configuring an IP address directly on the SEA device, but it can also be defined on an additional virtual Ethernet adapter in the Virtual I/O Server carrying the IP address. This leaves the SEA without the IP address, allowing for maintenance on the SEA without losing IP connectivity if SEA failover has been configured. Neither has a remarkable impact on Ethernet performance.”

With this Redbook and a test environment, it wouldn’t take long to better understand the topics presented.

Script Changes

Edit: I wonder if this script is still running in the wild.

Originally posted April 22, 2008 on AIXchange

I received an interesting e-mail from a mailing list. Included was this information submitted by Dean Rowswell:

1. Turn on PuTTY logging
2. Copy and paste these commands first:

lshmc -v
lshmc -V
lshmc -r
lshmc -n
lshmc -b
lssysconn -r all
lssyscfg -r sys
lssyscfg -r frame
lshmc -n -F clients
cat /opt/hsc/data/.hmc/.removed
lspartition -dlpar
lspartition -sfp

3. Copy and paste these commands last:

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
echo " ============MANAGED SYSTEM --> LIC level"
lslic -m ${MANAGEDSYS} -t sys
echo " ============MANAGED SYSTEM --> processor config"
lshwres -m ${MANAGEDSYS} -r proc --level sys
echo " ============MANAGED SYSTEM --> lpar processor usage"
lshwres -m ${MANAGEDSYS} -r proc --level lpar -F lpar_name:curr_proc_mode:curr_sharing_mode:run_proc_units:run_procs
echo " ============MANAGED SYSTEM --> memory config"
lshwres -m ${MANAGEDSYS} -r mem --level sys
echo " ============MANAGED SYSTEM --> lpar memory usage"
lshwres -m ${MANAGEDSYS} -r mem --level lpar -F lpar_name:run_mem
echo " ============MANAGED SYSTEM --> lpar status"
lssyscfg -m ${MANAGEDSYS} -r lpar -F name:state
echo " ============MANAGED SYSTEM --> cuod processor config"
lscod -m ${MANAGEDSYS} -t cap -r proc -c cuod
echo " ============MANAGED SYSTEM --> cuod memory config"
lscod -m ${MANAGEDSYS} -t cap -r mem -c cuod
echo " ============MANAGED SYSTEM --> drawer config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype unit
echo " ============MANAGED SYSTEM --> bus config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype bus
echo " ============MANAGED SYSTEM --> slot config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot
echo " ============MANAGED SYSTEM --> slot config summary"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F drc_name:description:lpar_name:lpar_id
echo " ============MANAGED SYSTEM --> virtual ethernet"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level sys
echo " ============MANAGED SYSTEM --> virtual ethernet all lpar"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level lpar
echo " ============MANAGED SYSTEM --> virtual scsi all lpar"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype scsi --level lpar
for LPAR in `lssyscfg -r lpar -m ${MANAGEDSYS} -F name`
do
echo " ============LPAR --> ${LPAR} --> CPU resources"
lshwres -r proc -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Memory resources"
lshwres -r mem -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Physical adapters"
lshwres -r io --rsubtype slot -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Virtual Ethernet config"
lshwres -r virtualio --rsubtype eth --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Virtual SCSI config"
lshwres -r virtualio --rsubtype scsi --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> LPAR config"
lssyscfg -r lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> LPAR profiles"
lssyscfg -r prof -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
done
done

4. Copy and paste these lines to add the information to my LPAR resource allocation spreadsheet.
NOTE: when pasting into Excel, click the paste options and select “Text import wizard.”

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description
done

For System p machines that already have the physical resources assigned:

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description:lpar_id:lpar_name
done

5. To capture any errors and events

lssvcevents -t hardware
lssvcevents -t console

After messing around with the script, I wanted to get it working from cron. I’m unable to run scripts on my HMC, but after looking here (link not active), I set up my ssh keys so I could auto login from my AIX machine to my HMC.

Then I modified the above script so I could run it from my AIX machine and have it connect to my HMC using ssh. Now I can run the job out of cron on my AIX machine instead of messing with PuTTY.
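For the cron side, an entry along these lines does the job; the script name, schedule and log path are examples of mine, not anything from Dean:

```
# Run the HMC collector at 2 a.m. daily and keep a dated log of the output
0 2 * * * /usr/local/bin/hmcinfo.ksh -m hmc01 -l hscroot > /var/log/hmcinfo.`date +\%Y\%m\%d`.log 2>&1
```

(Remember that % is special in crontab entries, which is why it’s escaped in the date format.)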

#!/usr/bin/ksh
#
# scriptname -m hmchostname -l hmcuser
#
#
hmc=hmc.ip.address

user=hscroot

while getopts m:l: option
do
case $option in
m) hmc="$OPTARG";;
l) user="$OPTARG";;
esac
done

echo "HMC Information:"
echo ""
ssh $hmc -l $user 'date'
ssh $hmc -l $user 'lshmc -v'
ssh $hmc -l $user 'lshmc -V'
ssh $hmc -l $user 'lshmc -r'
ssh $hmc -l $user 'lshmc -n'
ssh $hmc -l $user 'lshmc -b'
ssh $hmc -l $user 'lssysconn -r all'
ssh $hmc -l $user 'lssyscfg -r sys'
ssh $hmc -l $user 'lssyscfg -r frame'
ssh $hmc -l $user 'lshmc -n -F clients'
ssh $hmc -l $user 'cat /opt/hsc/data/.hmc/.removed'
ssh $hmc -l $user 'lspartition -dlpar'
ssh $hmc -l $user 'lspartition -sfp'

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"

echo " ============MANAGED SYSTEM --> LIC level"
ssh $hmc -l $user "lslic -m ${MANAGEDSYS} -t sys"
echo " ============MANAGED SYSTEM --> processor config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r proc --level sys"
echo " ============MANAGED SYSTEM --> lpar processor usage"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r proc --level lpar -F lpar_name:curr_proc_mode:curr_sharing_mode:run_proc_units:run_procs"
echo " ============MANAGED SYSTEM --> memory config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r mem --level sys"
echo " ============MANAGED SYSTEM --> lpar memory usage"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r mem --level lpar -F lpar_name:run_mem"
echo " ============MANAGED SYSTEM --> lpar status"
ssh $hmc -l $user "lssyscfg -m ${MANAGEDSYS} -r lpar -F name:state"
echo " ============MANAGED SYSTEM --> cuod processor config"
ssh $hmc -l $user "lscod -m ${MANAGEDSYS} -t cap -r proc -c cuod"
echo " ============MANAGED SYSTEM --> cuod memory config"
ssh $hmc -l $user "lscod -m ${MANAGEDSYS} -t cap -r mem -c cuod"
echo " ============MANAGED SYSTEM --> drawer config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype unit"
echo " ============MANAGED SYSTEM --> bus config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype bus"
echo " ============MANAGED SYSTEM --> slot config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot"
echo " ============MANAGED SYSTEM --> slot config summary"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F drc_name:description:lpar_name:lpar_id"
echo " ============MANAGED SYSTEM --> virtual ethernet"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level sys"
echo " ============MANAGED SYSTEM --> virtual ethernet all lpar"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level lpar"
echo " ============MANAGED SYSTEM --> virtual scsi all lpar"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype scsi --level lpar"

for LPAR in `ssh $hmc -l $user "lssyscfg -r lpar -m ${MANAGEDSYS} -F name"`
do
echo " ============LPAR --> ${LPAR} --> CPU resources"
ssh $hmc -l $user "lshwres -r proc -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Memory resources"
ssh $hmc -l $user "lshwres -r mem -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Physical adapters"
ssh $hmc -l $user "lshwres -r io --rsubtype slot -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"

echo " ============LPAR --> ${LPAR} --> Virtual Ethernet config"
ssh $hmc -l $user "lshwres -r virtualio --rsubtype eth --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Virtual SCSI config"
ssh $hmc -l $user "lshwres -r virtualio --rsubtype scsi --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> LPAR config"
ssh $hmc -l $user "lssyscfg -r lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> LPAR profiles"
ssh $hmc -l $user "lssyscfg -r prof -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
done
done

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description"
done

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description:lpar_id:lpar_name"
done

ssh $hmc -l $user 'lssvcevents -t hardware'
ssh $hmc -l $user 'lssvcevents -t console'

Feel free to improve upon what Dean and I have done so far. I’ll add your contribution to a future posting.

IBM Power Announcement: Not Just Another Renaming

Edit: There are still ongoing wars over what to call your AS/400 or System i, and whether it runs OS/400, i5/OS, IBM i, etc. The change happened more than ten years ago; maybe it is time to call it IBM i on POWER?

Originally posted April 15, 2008 on AIXchange

I spent the first part of my career working on OS/400, but since then I have been much more focused on the System p and AIX. Like a lot of you, I’ve been around long enough to recall when these machines were known as the AS/400 and RS/6000. I was amused when a coworker forwarded me this article on April Fool’s Day. It talks about IBM and the name changes it has made over the years to the AS/400.

Ironically, the day after that article was published, April 2, 2008, IBM announced yet another name change. However, the latest announcements represent much more than a simple name change and a new faceplate on the hardware.

If you haven’t been following this (you have plenty to do keeping machines running from day-to-day, after all), then let me bring you up to speed. IBM is unifying its System i and System p systems onto common hardware platforms. The way the company puts it, there’s a new power equation for the new enterprise data center: Power = i + p.

From IBM’s announcement letter:

“The IBM System i and IBM System p organizations are unifying the value of their server offerings into a single, powerful lineup of Power Systems servers based on industry-leading IBM POWER6 processor technology with support for the IBM i (formerly known as i5/OS), IBM AIX, and Linux operating systems. This new, single portfolio of Power Systems servers offers industry-leading technology, continued IBM innovation, and the flexibility to deploy the operating system that your business requires.

“Specifically, being announced today are: IBM Power 520 Express, IBM Power 550 Express, IBM BladeCenter JS12 Express blade server. All three of these systems can be ordered in the AIX Edition, i Edition or Linux Edition.”

Then last week came another IBM announcement:

“IBM announced two high-end Power Systems models–the world’s fastest UNIX server and a unique water-cooled supercomputer. The new systems offer sophisticated IBM virtualization technology and energy-saving capabilities to help dramatically reduce bottom-line operating costs, such as those for energy, floor space and systems management, while improving system performance, helping clients transition to a new enterprise data center. Beginning today, clients will be able to leverage the world’s most powerful microprocessor, POWER6–with new world-record speeds of up to 5 GHz–in these new systems, leading to significant performance improvements across a wide array of applications.

“The new UNIX enterprise server, the Power 595, designed to extend IBM’s leadership in the UNIX market, will be attractive to existing IBM clients as well as Sun Solaris and HP UNIX users. IBM’s new POWER6 “hydro-cluster” supercomputer, the Power 575, is built to help users tackle some of the world’s most challenging problems in fields such as energy, aerospace and weather modeling. The new super-dense system uses a unique, in-rack, water-cooling system and with 448 processor cores offers users nearly five times the performance and more than three times the energy efficiency of its predecessor, IBM’s POWER5+ processor-based p5-575 supercomputer.

“The new IBM Power 570 is a unified version of the popular midrange POWER6 processor-based System p 570 and the System i 570. Existing customers can update to the new system at no-charge. The Power 570 runs any permutation and combination of i, AIX or Linux partitions offering the ultimate in flexibility and increased asset utilization and reuse. And with PowerVM, Power servers also run many Linux x86 applications.”

So we’ll be able to run i, AIX, Linux on Power, and Lx86 on everything from JS12 and JS22 blades, to the 520, 550, 570, 575 and 595 Power models.

What makes the most sense in your environment? A BladeCenter with some JS12 or JS22 blades running AIX, i or Linux? You could mix those blades in the same chassis with Intel or AMD blades to run whatever Windows or native x86 Linux applications you might require. Or maybe you need an IBM Power 595 running 5 GHz POWER6 chips with 4 TB of RAM? You could carve that machine into LPARs running i, AIX and Linux as needed.

The same virtualization options are still available:

Using PowerVM for virtualization, we can “aggregate and manage resources via a consolidated, logical view.”

What do these announcements mean for your organization? Again, the keyword is unification. You may have had separate System i and System p IT teams that managed their own hardware and operating system, with each always figuring that its platform was the best and the most important.

At a minimum, as new Power hardware arrives on the floor, I’d expect more communication between the teams. It makes sense to get some cross-training as we seek ways to make coexisting on the same hardware a reality.

If IT personnel don’t communicate and make an effort to understand the other operating system, if organizations continue to maintain separate computing empires, the capability to run i and AIX on the same hardware will be useless. Although initially it may make sense in some environments to let each group run with its own hardware, that mentality will be harder to justify as management keeps hearing about virtualization and consolidation and the machines keep handling larger workloads.

Of course, since every shop has budget and hardware lifecycle concerns, not everyone will get the new Power hardware right away. But plenty of shops are struggling with older technology that is in need of a refresh. For those organizations that will soon go through the refresh process, be sure to look at the new machines and make the virtualization and consolidation decisions that are right for you.

Getting Started When You’re on Your Own

Edit: It is still an issue that the people who build machines are not necessarily the ones who manage and maintain them as time goes on.

Originally posted April 8, 2008 on AIXchange

You need to log on to a machine that you’ve never seen before. There’s no documentation. The administrator who built and ran the machine is no longer with the business and left nothing behind, so nobody has any idea how it was put together. The machine just sits in a corner and runs.

So what are some of the first commands you use to get more information?

To answer this question, you should ask several more. What are you trying to understand? Do you want to know how the disks are laid out? Do you want to know how the application starts? Do you want to know what is running in cron or what applications are running on it now?

This is by no means comprehensive, but a few of the things I like to check on a machine can be found in the output of these commands:

oslevel -s
instfix -i | grep SP
hostname
lsdev -Cc adapter
netstat -rn
netstat -in
cat /etc/hosts
cat /etc/resolv.conf
lslpp -l
rpm -qa
lscfg
prtconf
lsmcode
lssrc -a
lspv
lsvg
df
lsfs
lsps -a
uname -L
errpt -a
crontab -l

I’ve left a few things off this list. What other commands would you add?

Do you have scripts and tools that you like to load and run on a “foreign” machine? Do you keep copies of how the machine looks so that you can easily compare today’s output to last week’s? Do you have run books or other documentation–not to mention any necessary backups–to use if a machine rebuild is needed?
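One way to keep those comparison copies is a cron job that snapshots command output into a dated directory. Here’s a minimal sketch–the short command list and the /tmp location are stand-ins of my own, so substitute the full list above and a permanent, backed-up directory on a real system.

```shell
#!/bin/sh
# Snapshot sketch: save each audit command's output into a dated directory
# so this week's state can be diffed against last week's. The command list
# and /tmp location here are illustrative stand-ins only.
SNAPDIR=/tmp/sysinfo.$(date +%Y%m%d)
mkdir -p "$SNAPDIR"
for cmd in "hostname" "df" "uname -a"; do
    base=$(echo "$cmd" | tr ' ' '_')
    # Skip any command that does not exist on this system
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
        $cmd > "$SNAPDIR/$base.out" 2>&1
    fi
done
ls "$SNAPDIR"
```

Run it weekly from cron, then compare two dated directories with diff -r to spot what changed.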

Although in many cases the people who build the machines are also the ones who run them day to day, it’s not always the case. Sometimes, due to the size and complexity of the environment or the simple fact that people do switch jobs in IT, the guy who loaded and understands the machine may not be available to do work on it now.

That leaves it up to us. We need to be able to log on to any machine, learn about the environment and then do what needs to be done. It starts with knowing which commands to run.

Getting Hands-On with Live Partition Mobility

Edit: This is still a useful tool for us to use.

Originally posted April 1, 2008 on AIXchange

My first experience with live partition mobility came as an observer. A few months back I went to Austin and saw an LPAR move from one POWER6 570 to another.
 
Since then, I’ve become a live partition mobility user. I move running LPARs between my two JS22 POWER6 blades, which are connected to the same SAN and able to see the same LUNs. Once I got it set up, I haven’t had any problems. It functions with the blades just as I saw it work on the larger systems.
 
Checklist
To get started, I made sure I had the enterprise edition of PowerVM. After loading the virtual I/O (VIO) server onto my blade and logging into the Integrated Virtualization Manager (IVM) GUI, I was able to enter my APV key.
 
I did some other checks during setup. I made sure my network was set to use a shared Ethernet adapter instead of a host Ethernet adapter, and I made sure the reserve_lock on my SAN disk was set to no_reserve. (It was originally set to single_path.) If your reserve_lock is set incorrectly, you’ll need to change it. When searching on “change reserve lock,” I found some documentation which included the following:
 
To change the setting with an LPAR running, from your VIO server run:

# chdev -dev hdisk7 -attr reserve_policy=no_reserve
 
Some error messages may contain invalid information for the virtual I/O
server environment.
 
Method error (/usr/lib/methods/chgdisk):
    0514-062 Cannot perform the requested function because the specified
    device is busy.

The error makes sense because we’ve mapped the physical device to a virtual
device.

    $ lsmap -vadapter vhost3
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost3          U7998.61X.100BB8A-V1-C17                     0x00000007

    VTD                   vtscsi3
    Status                Available
    LUN                   0x8100000000000000
    Backing device        hdisk7
    Physloc               U78A5.001.WIH0A68-P1-C6-T1-W5005076801303022-L6000000000000
 

Shutdown the LPAR that is using the device.

Remove the virtual adapter and its mapping.

$ rmdev -dev vhost3 -recursive
hdisk7 deleted
vhost3 deleted

Change the reserve_lock setting.

# chdev -l hdisk7 -a reserve_lock=no
hdisk7 changed

Let’s make sure the setting changed.

$ oem_setup_env
(to leave the restricted shell)

# lsattr -El hdisk7
 
Check for
 
reserve_policy  no_reserve

With the changes verified, the next step is to re-create the virtual device using mkvdev, along these lines (using the device names from the example above):

$ mkvdev -vdev hdisk7 -vadapter vhost3 -dev vtscsi3

Using my IVM GUI, I selected the LPAR that I was going to migrate. (In the drop-down menu, go to “mobility” and select “migrate.”) I entered the IP address of the machine I was moving to, followed by the padmin password. Then I selected “validate.” This verified that my LPAR was ready to move. Once it passed the tests, I clicked on the “migrate” option, and my LPAR moved to the other blade.   
 
Keep Running
The value of this technology became clear to me on a recent customer call. My customer was conducting maintenance. I had to bring the machine down, and the users suffered an outage. Had the customer been on POWER6, live partition mobility would have been a perfect solution here. I could have moved the running LPAR and then brought down the source machine without affecting the workload that was running.

I’m sure that as more customers deploy POWER6 technology, we’ll see live partition mobility become more widely adopted. It’s extremely useful technology.

Lx86 Works as Advertised

Edit: Now that is a name that I have not heard for a very long time. Some of the links just resolve to generic IBM pages.

Originally posted March 25, 2008 on AIXchange

After hearing so much about Lx86 (formerly known as the System p Application Virtual Environment, or System p AVE), I finally decided to try it out.

Lx86 allows you to run unmodified Intel/x86 Linux binaries on IBM Power hardware. This is significant because the alternative–running Linux applications natively on Power–requires a recompile. This can be painful or impossible, and in fact it’s this reason that many IBM customers choose Intel x86 to run their Linux applications.

For more about how Lx86 works, see the references at the end of this post. Now, I’ll tell you about installing and getting started with Lx86.

First, I set up a trial subscription at Red Hat to get the .iso files I needed. There are 10 files in all–both 32-bit x86 images and 64-bit Power images. Once I downloaded the files, I moved the ppc .iso files to the /var/vio/VMLibrary directory on my virtual I/O (VIO) server, a process I cover here.

This allowed me to use my virtual optical drive to mount the images rather than burn and mount a bunch of physical media.
 
I booted from the first ppc .iso file, and after setting my install selections, I was able to install the machine onto an LPAR. I had to do some loadopt and unloadopt commands to get through all the CDs, but that seemed easier than personally taking a fistful of media to the site.
 
After the installation, I downloaded the code here. (IBM registration required.)
 
As of this writing I was using IBM PowerVM Lx86 V1.1 p-ave-1.1.0.0-1.tar (8652800).
 
I copied the tarball and my x86 .iso files to my newly built Linux on Power installation. I untarred the p-ave .tar file, and ran ./installer.pl. After accepting the license and registering with both IBM and Red Hat, I was ready to load the software. I chose to do a full install, and had it share my /home directories.
 
The installer prompts for the path to your x86 Linux .iso files, as it expects to find them loaded on the machine. I gave the installer the correct path, and it examined the .iso files to verify it could find the rpm files it needed. I selected continue and it installed my x86 world. It basically copies all the necessary files to the /i386 directory, so that when you start the environment it can chroot into /i386 and treat it as the / directory.
 
After getting confirmation that System p AVE and x86 World were installed successfully, I was prompted to run /usr/local/bin/runx86 to start a shell. It returned to the menu and I selected Option 6 to quit. You can look at the install log by searching for p-ave_install*log.
 
I am now able to cd /i386 and runx86. Once I do this, the shell runs as it would on an x86 machine. If you run “arch” from your shell before you runx86, you’ll see ppc. If you run arch after you’re in your x86 shell, you’ll see i686.
 
I can now run and install rpms and do anything else as if I were on a regular Linux machine. I ran vncserver and my desktop came up as I would have expected. I moved some rpm files into the environment and installed them with rpm -ivh as I normally would. There are caveats–this isn’t a panacea where any and all applications will run at the same speeds as on native x86 hardware. But in many cases, the performance will be quite good.
 
I found that I wanted to ssh into the x86 World instead of the ppc world, so I opened my console, copied my /etc/ssh/ information from ppc world to x86 world, killed sshd that was running in ppc world, did a runx86 and started sshd from there. Once I did that I could ssh -Y into my x86 world and start exporting X applications to my desktop. I’ll need to read more documentation and play with it more, but on first glance it works as advertised. I was able to simply install rpms and run them.

References

From:   
 
Transitive Corporation, a leading provider of cross-platform virtualization software that enables the execution of applications across diverse computing platforms, today announced that IBM will commence shipping PowerVM Lx86 with all copies of PowerVM Editions, available across its entire line of System p servers. PowerVM Editions, a set of advanced virtualization offerings developed by IBM for Power Systems platforms, now includes the x86 feature (developed for IBM by Transitive) which simplifies migration of x86 Linux applications onto this popular platform for server consolidation and business application deployment. PowerVM Lx86 allows the creation of an x86 application virtual environment so users may easily install and run a wide range of x86 Linux applications on a Power Systems platform with a Linux for POWER operating system. PowerVM Lx86 allows thousands of x86 Linux binaries to run unmodified and without recompilation on System p servers, helping to bring additional benefits with IBM PowerVM virtualization to enterprise customers by enabling more applications to be consolidated.
 
From:
 
Up to now more than 2500 applications have been ported to Linux on POWER, but still there are thousands only ported to x86-based platforms. With the IBM PowerVM Lx86 environment, a customer can take the original installation media of a Linux on x86 application and install it as is within a Linux on POWER partition running on IBM System p. There are many workloads that will run well within this environment. There are a few workloads that are not recommended to be run in this environment.
From a customer perspective, this environment allows a very transparent and easy way to start taking the benefits of such an advanced infrastructure platform. From an ISV perspective this environment provides an excellent opportunity for a jump-start onto a new marketplace, postponing the decision of the code porting from Linux on x86 to Linux on POWER to a more appropriate moment in time if he judges necessary. It also allows the ISV the opportunity for keeping his development and support costs on a lower level, since there is only a single source (x86-based) code.
 
From:
 
In addition, the performance of some x86 Linux applications running on PowerVM Lx86 may significantly vary from the performance obtained when these applications are run as a native port. There are various architectural differences between x86 and POWER processors which can impact performance of translated applications. For example, translating dynamically generated code like Java byte codes is an ongoing translation process, which can be expected to impact the performance of x86 Java applications using an x86 Java virtual machine. Floating point applications running under x86 have a different default precision level from Power Architecture processors, so translating between these levels can have additional performance penalties. And finally, translating and protecting multi-threaded applications can incur an additional performance overhead as the translator works to manage shared memory accesses. IBM suggests that clients carefully consider these performance characteristics when selecting the best method for enabling applications for their environment.

Updating to a New TL or Service Pack? Call This Doc

Edit: I love that this document is still available.

Originally posted March 18, 2008 on AIXchange

I think you’ll find this IBM support document quite useful. It explains how to upgrade to a new technology level or service pack in AIX.
 
The document describes the recommended processes for updating your system to a new technology level or adding a service pack to an existing technology level. I’ll review some key words and terminology, run through recommended pre-checks, discuss the update_all process using both SMIT and the command line and, finally, cover post-checks and the FAQ.

Let’s start with the pre-checks. As noted in the document:

“It is recommended to have at least one back-out method available to you in case a restore is required. Recommended back out methods include: mksysb restore, sysback restore, altinst_rootvg clone, and multibos.”
 
This paragraph makes it pretty clear: Do not reject applied filesets with a TL update–use another back-out method if things don’t work out. Be sure to update test machines before production, and actually run tests to ensure that things work as expected on the test machines.
 
The document also tells you how to perform operations like boot image verification and conduct firmware, fileset consistency and free space checks. Then, under the Post Checks heading, is this statement:
 
“Presuming your update_all was successful you will want to check the following commands. If you receive unexpected output please contact the support center for assistance.

# oslevel -r
This should return with the expected TL output

# oslevel -s
This should return with the expected TL and SP output.

# lppchk -v
This should come back ideally with no output.”
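These post-checks boil down to a common pattern: the command should either report the expected level or print nothing at all. Here’s a hedged sketch of a wrapper for the “silence is success” case–check_silent is a helper name of my own invention, not an AIX command:

```shell
#!/bin/sh
# check_silent: run a command and warn if it prints anything, since for
# checks like `lppchk -v` any output means something needs attention.
# This helper is illustrative, not part of AIX.
check_silent() {
    out=$("$@" 2>&1)
    if [ -n "$out" ]; then
        echo "WARNING: '$*' produced output -- contact the support center:"
        echo "$out"
    else
        echo "OK: '$*' is clean"
    fi
}

check_silent true     # stand-in here for: check_silent lppchk -v
```

On a real system you would call check_silent lppchk -v after the update_all completes.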
 
And be sure to check out the FAQ section. There are some good questions (and answers) here. For instance:
 
Q: Is it okay to install my Technology Level in the “APPLIED” state? Doesn’t that let me reject them if there is a problem?

A: With the introduction of Technology Levels and the “all or nothing” packaging format, these updates bring in upwards of 400 fileset updates for each TL. Attempting to perform a “reject” process on so much code simply doesn’t work well. Recommended back-out methods are discussed earlier in this document.

Q: Does the same hold true for Service Packs?

A: The Service Pack updates are certainly much smaller groups of updates, typically numbering around 40-50 per update. While you certainly will have a better chance of successfully rejecting 40 filesets instead of 400, it would still be best to have one of the back-out methods mentioned earlier.
 
Q: I need to run my update today but I won’t be able to reboot until next week. Is that a problem?

A: Plans should be made to reboot as soon as the update is complete and checks have been made to ensure there were no failures. System files will have been replaced, but the corresponding kernel and library updates will not be loaded until boot time. You will likely encounter problems if you delay rebooting.
 
Q: Is it recommended to reboot before issuing a TL upgrade?

A: If this is possible, absolutely. There are systems out there that haven’t been rebooted in over a year or more. Who is to say that something hasn’t happened in that time that simply wouldn’t show up until a reboot. Rebooting the system first assures a good boot image, a stable system, and would isolate any problem that normally wouldn’t be caught until the post-update reboot as either a preexisting issue, or an issue directly related to the TL update itself.

Q: Some say to use base AIX install media when updating the TL, others say the TL fix downloads or CDs should be used. Which is right?

A: The recommendation is to use the TL fix downloads from FixCentral, or the TL CDs that can be ordered either online or from AIX SupportLine. You can also use the base AIX installation media, however without getting into a long answer, the recommendation is using the TL fix packages.
 
Q: Is it okay to run the update_all while people are online?

A: Updating could affect running processes. As such, applications should be down and users offline as a general rule.

Even though I’ve quoted a lot here, I suggest you read the whole thing. And, as is noted throughout the document, feel free to contact IBM support with any questions.
 
I’m always on the lookout for good documentation, recommendations, cookbooks and the like. Whenever I find something, I’ll be sure to mention it here.

IBM Support Comes Through

Edit: I always advocate for calling problems into IBM support. This is old information, but I leave it here because you never know what people are still running and what problems they might run into. Why reinvent the wheel?

Originally posted March 11, 2008 on AIXchange

Recently I was working on a customer machine that was giving these lppchk errors: 
lppchk -v
lppchk:  The following filesets need to be installed or corrected to
         bring the system to a consistent state:
 
  bos.rte.xxxxxxx 5.3.7.0         (usr: not installed, root: APPLIED)
 
Using oslevel -s and instfix -i, I received this output:
 
5300-04-00-0000
 
instfix -i | grep ML
   All filesets for 5.3.0.0_AIX_ML were found.
   All filesets for 5300-01_AIX_ML were found.
   All filesets for 5300-02_AIX_ML were found.
   All filesets for 5300-03_AIX_ML were found.
   All filesets for 5300-04_AIX_ML were found.
   Not all filesets for 5300-05_AIX_ML were found.
   Not all filesets for 5300-06_AIX_ML were found.
   Not all filesets for 5300-07_AIX_ML were found.
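Output like this is easy to script against: any “Not all filesets” line means that level is incomplete. A small sketch (the sample text is hardcoded here for illustration; on a live AIX system you would pipe instfix -i | grep ML in directly):

```shell
#!/bin/sh
# Pick out the incomplete technology levels from instfix-style output.
# The sample lines below are hardcoded for illustration; replace the
# echo with `instfix -i | grep ML` on a real AIX system.
sample='   All filesets for 5300-04_AIX_ML were found.
   Not all filesets for 5300-05_AIX_ML were found.
   Not all filesets for 5300-07_AIX_ML were found.'

echo "$sample" | awk '/Not all filesets/ { print $5 }'
```

This prints 5300-05_AIX_ML and 5300-07_AIX_ML, the levels that need attention.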
 
TL7 had been applied at some point, but there must have been issues during that install that weren’t caught then. The customer had no backups of the machine prior to the TL7 upgrade. I opened a PMR with IBM and the correct update media was quickly shipped out, but when I tried to install it, I couldn’t due to the state the machine was in.

On my attempts to reinstall, I received this error:
 
fileset is applied on the "root" part but not on the "usr" part.
      Before attempting to re-apply this fileset you must remove its
      "root" part.  (Use the reject facility if the fileset is an
      update.  Remove the fileset via the deinstall facility if it is
      a base level fileset.)
 
If I tried to reject it, I got this error:
 
SELECTED FILESETS:  The following is a list of filesets that you
  asked to reject.  They cannot be rejected until all of their
  dependent filesets are also rejected.  See subsequent lists for
  details of dependents.
 
We tried to force overwrite the fileset, but it gave us errors as well. So I was in a catch-22. But then I called IBM support and referenced the PMR number, and was connected with a knowledgeable AIX support person.
 
We had no mksysb of the machine, and reloading the operating system from scratch was a last resort. I think the IBM representative understood my position. She took the time to help us explore all options before finally having me reload the machine.
 
Thanks to IBM support’s hard work, I was able to resolve the problem by performing “surgery” on the machine’s ODM. Now, I would NOT recommend trying this on a production machine unless support instructs you to do so. (I guess, though, if you’re on a test machine that you don’t care about destroying if you make a mistake, have at it.)

Here’s a rough idea of what we did to make the machine ignore the broken updated fileset.
 
# export ODMDIR=/etc/objrepos
# mkdir -p /tmp/odmfix
# cd /tmp/odmfix
# odmget -q name='fileset name' lpp > lpp.out
==> vi lpp.out to get the lpp_id = ###
# odmget -q lpp_name='fileset name' product > product.out
# odmget -q lpp_id=### history > history.out
# vi history.out ==> Remove the ver=5 rel=3 7 stanzas, save file
# vi product.out ==> Remove the ver=5 rel=3 7 stanzas, save file
# odmdelete -q lpp_name='fileset name' -o product
# odmdelete -q lpp_id=### -o history
# odmadd product.out
# odmadd history.out
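One precaution of my own before attempting surgery like this: copy the whole ODM repository aside first, so it can be restored if an odmdelete goes wrong. A minimal sketch (a demo directory stands in for /etc/objrepos so the sketch runs anywhere; this is not part of support’s procedure):

```shell
#!/bin/sh
# Back up the ODM repository directory before editing it. On AIX, REPO
# would normally be /etc/objrepos; a demo directory stands in here so
# this sketch can run anywhere.
REPO=${ODMDIR:-/tmp/demo_objrepos}
mkdir -p "$REPO"                 # demo only; the real directory already exists
BACKUP=$REPO.backup.$(date +%Y%m%d%H%M%S)
cp -Rp "$REPO" "$BACKUP"
echo "backed up $REPO to $BACKUP"
# To roll back: cp -Rp "$BACKUP"/. "$REPO"/
```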
 
After running this procedure, we put the machine in a state where it ignored the broken TL7 fileset. At that point, I could reload it. After swapping a few CDs and finishing the TL7 update, the lppchk errors went away.
 
Support also reminded me of something found here:
 
“The rule has been changed that previously allowed applying individual updates/PTFs from a TL. The rule now says that installing a Technology Level is an ‘all or nothing’ operation. Requisites are now added so the whole Technology Level is installed. Before applying a TL, you should always create a backup and plan on restoring that backup if you need to rollback to your previous level.”
 
The IBM rep gave me this explanation: When doing TL updates, plan to commit the fixes rather than apply them, because they don’t support rejection of TL updates. The backout procedure is to restore from mksysb (or boot from your alternate disk if you go this route).

Long story short: Make sure you have valid environments for testing fixes before installing them in production, and always be sure you have good backups.

And the moral of this story? Never take IBM support for granted. Their help is invaluable.

Supporting Users Starts with Data

Edit: Another relevant post with things to consider, it still holds up today.

Originally posted March 4, 2008 on AIXchange

A while back I injured my knee and needed treatment. My doctor referred me to a specialist, who had me fill out some forms since I was a new patient. When I met with the nurse, she looked over my forms and entered the information into the computer. While doing so, she called up my entire patient history, which included details about older prescriptions that I’d forgotten to include on the form. She could do this because the specialist and my primary physician had access to the same database of patient information.

I’ve experienced something similar when calling my ISP to report a network outage. A technician would bring up my call history to access information about issues I’d previously reported. I’ve seen the other side of this, too. I’ve contacted help desks that either didn’t maintain or didn’t bother to check my call history. Instead of immediately responding to my problem, they wasted my time getting basic information. It wasn’t that the technicians were rude or incompetent, but their companies just seemed less professional, especially when I compare those encounters to my experiences with the ISP.

When supporting users, the more information we track and act upon, the better. I try to be proactive by monitoring machines and networks. When users contact me, I find it extremely helpful to have their call history available. Is the user calling about the same printer problem? Then maybe some hardware needs to be fixed, or the user needs some training.

A friend was telling me about his efforts to figure out why an important network device was intermittently going down at his company. Whenever he thought he’d resolved the issue, it would resurface. It turned out that a user who’d taken an extended leave of absence was periodically coming into the office with a laptop and a dedicated IP address, and that was causing the network conflict. But my friend only figured it out because he checked the problem reports and correlated that with the data being logged on the network.

Systems and network monitoring–along with good ticket-tracking software and procedures–can provide first-level support personnel with the information they need to resolve user problems. Chances are, your company is engaged in these practices. But if you’re not, and you find yourself putting out the same fires and handling the same kinds of problems, maybe it’s time to rethink the way you’re doing things.

Customer Satisfaction Starts with Us

Edit: This is still good information to consider and think about.

Originally posted February 26, 2008 on AIXchange

When assisting customers with their hardware designs, communication is key. Every step of the way we need to educate customers about the configurations we’ve chosen and the thought processes that went into those choices. Then we must listen and address any issues they might have with our proposals.

Rather than push the latest and greatest hardware and virtualization tools, we must recognize what our customers are trying to accomplish and help them understand the tools that can best help them realize their objectives. Ultimately, whether it’s new network gear or System p hardware, we must be sure that the solutions we propose to our customers fit their needs.
 
We may run from customer to customer and implementation to implementation living and breathing PowerVM, CuoD, VIO and LPAR. But if they’ve not had our training and hands-on experience, we may find that the same words have different meanings. For instance, when I say “LPAR,” my customer might hear “all my eggs in one basket.” When I say “virtual I/O server,” my customer might hear “performance bottleneck.” We must be certain we’re speaking the same language. We must be sure that customers understand the pros and cons of any proposed solution.
 
We must also be sure that our customers are up to speed on best practices. We need to explain that we wouldn’t propose a solution to them unless we were convinced it was right for their situation.

And again, we must listen. If we’ve educated a customer on LPAR’s benefits but that customer elects to run on a standalone machine, we should help them with their chosen server solution rather than hammer at an option they’re not interested in at this time. This doesn’t mean that we stop educating customers about virtualization’s many benefits. But it’s a matter of priorities. Ensuring that they’re comfortable with the machines they’ll be running in their environment is foremost.
 
Again, we must be clear on the assumptions we’ve made and the tradeoffs we accepted when coming up with our designs. Customers need to know what they’re signing off on. The last thing they want is to discover months down the road that they don’t have the equipment they thought they did. I’ve seen customers who believed that the machines at their production and disaster recovery (DR) sites were identical, only to learn that the DR site was slightly less powerful. Now, in a DR scenario, running a piece of the application and/or using a less powerful machine at the secondary data center may make financial or business sense. But the decision must be communicated to all involved parties. Certainly, system operators need to be aware of these choices long before they start loading new operating systems or testing applications.
 
We’re in this together. Customers want appropriate solutions to solve real business problems, and we want happy and satisfied customers. We have the right hardware and the right tools–it’s up to us to help customers architect the right solutions.

Workload Partition Manager Offers a Better Way

Edit: Guess what I do not run anymore? The links seem to redirect, but I was able to find the information after a little digging. Your mileage may vary.

Originally posted February 18, 2008 on AIXchange

Managing and organizing an environment with several workload partitions (WPARs) running on many different machines can be difficult. To start and stop WPARs, you must log into each individual global instance. But there’s a better way: Workload Partition Manager.

With Workload Partition Manager, you can much more easily see which WPARs are running on your machines. You can also use this product to relocate your WPARs automatically, by establishing policies, instead of moving them manually. While WPARs are built into AIX 6, Workload Partition Manager is a separately purchased program that you should consider if your shop plans on using WPARs.

WPARs have been covered a lot lately — including here, here and here.
 
But in this entry I’ll dig deeper, and focus on how to relocate workload partitions from one machine to another. First, I installed the “IBM Workload Partition Manager for AIX” CD on my AIX 6 machine and read the README.wparmgr.txt file. This info is also available here and here.

Following the readme file, I ran these two commands:

installp -acqgYXd wparmgt.mgr
installp -acqgYXd wparmgt.db
 
Then I ran:
 
/opt/IBM/WPAR/manager/db/bin/DBInstall.sh -dbinstallerdir /db2 -dbpassword
 
and received this output:
 
DBInstall.sh:Database install started. 
DBInstall.sh:Database install successful. 
DBInstall.sh:Database instance creation started. 
DBInstall.sh:Database instance creation successful.
DBInstall.sh:Database creation started.
DBInstall.sh:Database population started.
DBInstall.sh:Database creation successful.
 
The instructions then called for me to “execute the following command with the X11 DISPLAY variable set.” So I pointed my DISPLAY to my X session and ran:
 
# /opt/IBM/WPAR/manager/bin/WPMConfig.sh
 
(Note: You can also use the console version by adding -console to the end of that command.)
 
The WPMConfig.sh command presents a GUI that’s used to configure Workload Partition Manager. I used all of the default values that it presented to me, except when entering the password.
 
Then I installed the agent code on my agent machine with:
 
installp -acqgYXd wparmgt.agent
 
After that, I ran:
 
/opt/IBM/WPAR/agent/bin/configure-agent -hostname <hostname>
 
(Note: I found one gotcha that revolved around the hostname I used. When I tried host1_aix6, I received this error:

java.net.URISyntaxException: Illegal character in hostname.

The GUI kept displaying “failed registration” in the state column. But when I changed it to host6, the state changed to “online” and it worked fine. Hopefully this tip will help someone out there. You must also be sure that your host names are resolvable in your network.)
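The failure above follows from standard hostname rules: underscores aren’t legal in host names (RFC 952/1123 allow only letters, digits and hyphens in each label). A quick pre-check along these lines, runnable in any POSIX shell, can catch the problem before agent registration fails. (The function name is my own; this is a sketch, not part of Workload Partition Manager.)

```shell
# Validate a single hostname label per RFC 952/1123:
# letters, digits and hyphens only; must start and end with a letter or digit.
is_valid_hostname() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$'
}

is_valid_hostname host6      && echo "host6: ok"
is_valid_hostname host1_aix6 || echo "host1_aix6: rejected"
```

Running this against the two names from my test shows host6 passing and host1_aix6 being rejected because of the underscore.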
 
I was prompted for the password I’d configured earlier. I received this output:
 
Agent Registration Password:
Re-enter Agent Registration Password:
0513-059 The wparagent Subsystem has been started. Subsystem PID is 426010
 
Then I installed the agent on a second machine using the same process.
 
With that, I could point my browser to my management machine and log in by going to:
 
http://<localhost>:14080/ibm/console
 
At this point I could log in with root:password and create and relocate WPARs, change settings, look at error logs, etc.
 
Then I looked at the documentation and on my NFS server and ran:
 
crfs -v jfs2 -m /wparsfs -A yes -a size=1G -g datavg
mount /wparsfs
mknfsexp -d /wparsfs/ -r -B
exportfs

(Note: exportfs should show you the directory that you just exported using nfs.)
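Each global instance that will host a relocated WPAR needs that shared filesystem mounted at the same path. Here is a dry-run sketch that just prints the mount commands for each agent host; the host names ("host6", "host7") and NFS server name ("nfssrv") are placeholders for illustration, not from my actual setup.

```shell
# Print (don't run) the NFS mount commands for each WPAR agent host.
# "nfssrv", "host6" and "host7" are placeholder names for this sketch.
print_mount_cmds() {
  for host in host6 host7; do
    echo "ssh root@$host 'mkdir -p /wparsfs && mount nfssrv:/wparsfs /wparsfs'"
  done
}
print_mount_cmds
```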
 
I went ahead and used the wizard to create a WPAR, following the prompts that were presented to me. They seemed pretty self-explanatory. I made sure to select “enable relocation” when creating my WPAR so that I could test out the relocation of my WPARs from one machine to another. (WPARs can obviously be created from the command line, but I chose the wizard instead to see how it worked. You can also get the output that the GUI generates and then save that for later use in scripts, etc.)
 
By selecting the task activity and then the workload partitions tab, I could toggle back and forth between them in case I needed to do some troubleshooting. The task activity tab provided important warnings and information as I set up my WPARs to be “relocatable.”
 
Once I got it all working, I was able to create, deploy and relocate WPARs, which was the point of the exercise. I could also change their properties, all from the GUI. I could easily see which WPARs were defined, active, broken and undeployed, and on which machines. I could use the GUI to create and remove them.

Again, all of these things (save for the actual relocation–this is why you need to purchase the software) can be done from the command line, but Workload Partition Manager makes it much easier to keep track of your WPARs. Familiarize yourself with this valuable tool.

Quick Tips

Edit: The first link no longer works, although when you google for the publication you can find it on other sites. Some of the Linux tips may not work the exact same way, but the principles are the same and the link still works. The YouTube video is gone as well.

Originally posted February 11, 2008 on AIXchange

I’m passing on these tips that I picked up on a mailing list.
 
#1: The first is related to advanced system management interface (ASMI) access. There are now different default IP addresses for POWER6 machines. More information is available in the System p Operations Guide for ASMI and for Nonpartitioned Systems (SA76-0094-02).
 
For the primary service processor use:

HMC1 = 169.254.2.147
HMC2 = 169.254.3.147

For the secondary service processor use:

HMC1 = 169.254.2.146
HMC2 = 169.254.3.146
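If you script against the service processors, the address table above is easy to encode as a small lookup helper. This is my own sketch; the addresses are the POWER6 defaults listed above.

```shell
# Return the default POWER6 ASMI address for a given service processor and port.
asmi_ip() {  # usage: asmi_ip primary|secondary hmc1|hmc2
  case "$1/$2" in
    primary/hmc1)   echo 169.254.2.147 ;;
    primary/hmc2)   echo 169.254.3.147 ;;
    secondary/hmc1) echo 169.254.2.146 ;;
    secondary/hmc2) echo 169.254.3.146 ;;
    *) echo "usage: asmi_ip primary|secondary hmc1|hmc2" >&2; return 1 ;;
  esac
}

asmi_ip primary hmc1
```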

Incidentally, I covered the topic of using the ASMI to manage your machines here.
 
#2: The second tip involves new service and productivity tools for Red Hat Linux on POWER (non-blade, non-HMC-managed) systems. Check IBM’s Web site for different tabs pertaining to RHEL5, RHEL4 and RHEL3.
 
IBM lists rpms for service aids, hardware inventory, service log, error log analysis, service agent, etc. From IBM:

“The following tools are available for servers running Red Hat Linux that are not managed by an HMC and are not BladeCenter servers. Click the Tool name link for a brief description of the tool. Click the Download link to download the tool package.

“To install and use the following tools under RHEL4, ensure that compat-libstdc++-33-3.2.3-47.3.ppc.rpm is installed from the RHEL4 media. When installing the powerpc-utils and powerpc-utils-papr packages, include the “–force” option on the rpm command line. Tool packages must be installed in the order listed in the table.”
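As a sketch of what IBM’s instructions imply, this loop prints the install commands in order, adding –force only for the two powerpc-utils packages. The package list here is illustrative — take the exact names and order from IBM’s table.

```shell
# Dry-run: print rpm install commands in the required order.
# The package list is illustrative; use the real list/order from IBM's table.
print_install_cmds() {
  for pkg in librtas powerpc-utils powerpc-utils-papr lsvpd servicelog; do
    opts=""
    case "$pkg" in
      powerpc-utils|powerpc-utils-papr) opts=" --force" ;;
    esac
    printf 'rpm -Uvh%s %s-*.rpm\n' "$opts" "$pkg"
  done
}
print_install_cmds
```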

These tools should help make managing Red Hat Linux on Power a bit easier.

#3: Finally, check out this new YouTube video that demonstrates PowerVM Lx86 and Live Partition Mobility. 

If you have tools and tips that others should know about, let me know, or add a comment.

Configuring Your Machine Before it Arrives? Now That’s a Good Plan

Edit: Modified the link to go to a current SPT site.

Originally posted February 4, 2008 on AIXchange

I hope you’re keeping current with the latest version of the IBM System Planning Tool (SPT). From IBM:
 
“The SPT is a browser-based application that helps you design system configurations; it is particularly useful for designing logically partitioned systems. The SPT is integrated with the IBM Systems Workload Estimator (WLE), which enables you to plan a system based on existing performance data or based on new workloads. System plans generated by the SPT can be deployed on the system by the Hardware Management Console (HMC) and Integrated Virtualization Manager. The SPT is available to assist the user in system planning, design, validation and to provide a system validation report that reflects the user’s system requirements while not exceeding system recommendations.”
 
The latest version (as of this writing) is dated Jan. 29 and offers the following improvements. (Again, the list is from IBM.)

  • Added support to convert an HMC or Integrated Virtualization Manager (IVM) system plan to an SPT system plan.
  • Added support to allow comments and order status on expansion units.
  • Added the ability to change order status on multiple items at once.
  • Added support to record the alternate restart device for i5/OS.
  • Added support for partition profile names.
  • Added support to allow utilization of unused dedicated processors.
  • Added capability to copy systems between system plan files. Choose the “Add…” option from the Work with Planned Systems panel and select “Import from another system plan.”
  • Added support for creating Virtual Ethernet Adapters in Linux, AIX and VIOS partitions which communicate on multiple VLANs.

A final thing from IBM:

“SPT 2.0 will be the last release that will support .lvt and .xml files. Users should load their old .lvt and .xml plans and save them as .sysplan files. It is recommended that you take action prior to March 31, 2008.”
 
When you fire it up, you see this message:
 
“The SPT helps you plan the configuration of a partitioned system. You can place your hardware order based on this system plan. You can also use this system plan to automate the creation of logical partitions on the system.”
 
I had a plan that I generated from my HMC, and the first thing I saw was:
 
“System plans generated by the IVM or HMC must be converted into a format that is compatible with the SPT. This wizard steps you through the process of converting a system plan to the correct format. When you finish the wizard, the changes you make are applied and the plan is saved with a new name. You can view the original system plan as you go through the wizard.”
 
So I clicked on the Convert button and was then able to go in and edit the file that came from my HMC.

As the tool evolves and improves, I keep hearing more positive things. Being able to configure your machine before it arrives is a great idea. When you import your system plan to the HMC, you avoid configuring each partition by hand. You can get straight to loading the OS when the hardware arrives on your raised floor instead of spending time configuring the partitions. When you plan on having multiple VIO servers and multiple LPARs, this tool makes things go much more smoothly. It will warn you when you make mistakes or when your partitions aren’t set up properly.
 
Be sure to look at the system plan view from the HMC–it will show you a complete description of how your machine has been set up, along with a graphical view of the actual hardware, the slot numbers that have been assigned to each partition, etc. Download the code and start using it, and keep watching IBM’s Web site for updates and patches.

BladeCenter: More Than Intel Inside

Edit: I can’t remember the last time I messed with blades. A couple of the youtube links no longer work, the hardware links take you to generic IBM pages, entropy lives.

Originally posted February 1, 2008 on AIXchange

I was tuned in for playoff football, but when I was too slow with my remote control and my DVR, I found myself watching an IBM BladeCenter commercial. Surely you’ve caught some of the spots:

http://www.youtube.com/watch?v=bPm4IHY6vvg&NR=1

http://www.youtube.com/watch?v=zuWvBy3Ttc4&feature=related

http://www.youtube.com/watch?v=eGX1QpLIbSA&feature=related

http://www.youtube.com/watch?v=cmmiJJOyJm0&feature=related

A quick search of “IBM blades” on YouTube yields some interesting material, including commercials, demonstrations and comparisons. In many cases you’re pointed to IBM’s website, which also offers useful information, like how the BladeCenters are designed to help reduce power, cooling and cabling costs.

What caught my attention with many of these commercials is the message that the blades have Intel inside. Sure, that’s great for my Windows and Linux administrators, but why should AIX or i5/OS administrators care about BladeCenter? Two reasons: the IBM BladeCenter JS22 Express and the IBM BladeCenter JS21 Express.

The JS22 has POWER6 processors on a blade. When would you consider deploying these POWER blades? Maybe you need to refresh some older standalone machines  with newer hardware. Blades might be great for consolidating smaller machines, or they might make a terrific test lab or QA environment. With a POWER6 processor, this solution may be suitable for larger workloads as well.

Depending on the size of your shop, running some Intel servers along with your AIX/Linux servers in the same chassis could make sense. In my case, I had spare slots in an existing BladeCenter H chassis and was able to quickly and easily load a JS21 and a JS22.  Neither of them had an OS loaded, and although I could have loaded AIX 5.3 or AIX 6.1 directly onto the hardware, I chose VIO 1.5 instead. This loads the Integrated Virtualization Manager (IVM) onto the blade, allowing me to carve up my blades into LPARs using an interface similar to that of the new HMC v7. One of those blades currently runs three LPARs, the other runs four. I run VIO, Linux, AIX 5.3 and AIX 6.1 on the same blade.   

The JS21 can have two internal drives and the JS22 can have one, so to really benefit from running multiple LPARs, I strongly recommend connecting the BladeCenter to a SAN-based storage solution.

How do you begin loading an OS onto the blade? First, run the secure shell command (ssh) and log into the BladeCenter environment. Then run:

console -o -T blade[x]

where x is the blade number that you’re connecting to. From here, load the VIO server. Once the VIO server has an IP address, connect to it with a Web browser, log into the IVM and start carving up LPARs.

For more information, click here.

Creating and Using a WPAR

Edit: How many of you used or still use WPARs?

Originally posted January 21, 2008 on AIXchange

Last week I discussed workload partitions (WPARs) in AIX 6. Now let’s continue with this topic and look at how you actually create and use a WPAR.
 
With WPARs in AIX 6.1, there’s only one copy of the AIX operating system to worry about–it’s called the global instance. From this global instance, you manage your WPARs. Creating a basic WPAR is as simple as entering:
 
mkwpar -n mywpar

and waiting a few minutes.  After the wait is done, enter:
 
startwpar mywpar
 
and you have a running WPAR.   
 
As I previously noted, the IBM Redbook on Workload Partition Mobility gives much more information.   
Here you’ll learn about specification files that you can create so that you can clone your WPARs, the differences between application WPARs and system WPARs, etc.  If you set up networking (or if your hostname already existed in /etc/hosts on your machine when you created your WPAR) then you can ssh or telnet into your WPAR, as if it were any other machine on the network. You can also get a console login by entering:
 
clogin mywpar
 
from the global instance of AIX.
 
Again, from the Redbook:
 
“The separation of user sets (or security domains) between different system workload partitions also enables the system administrators to isolate groups of users logging on in AIX environments according to their application access control requirements. Users defined in one system WPAR are unaware of the applications executing in the global environment or in other WPARs. They cannot see the list of users or processes outside their WPAR.”
 
This means that there’s a different /etc/passwd file and a different root user for the WPAR. You can change the WPAR root password and give it to a junior administrator or database admin, or any users who think that they need root. They can do what they need to do as root, but they don’t affect the AIX global instance. If they break something, they only hurt themselves, not anyone else on the system.

Perhaps, for example, an application runs better when managed using root. Instead of setting up sudo, or a role-based access control (RBAC), just give the user the root password to the WPAR. Think of a chroot jail, or any other virtual environment you’re used to.
 
You cannot see any disks in a WPAR. It lives in a bunch of filesystems in the global instance:
 
/dev/fslv03    262144   208144   21%    1710    7% /wpars/mywpar
/dev/fslv04    131072   128312    3%       5    1% /wpars/mywpar/home
/opt           262144    54144   80%    2103   26% /wpars/mywpar/opt
/proc               -        -    -        -    -  /wpars/mywpar/proc
/dev/fslv05    262144   256856    3%      10    1% /wpars/mywpar/tmp
/usr          3276800   113072   97%   33643   68% /wpars/mywpar/usr
/dev/fslv06    262144   236008   10%     365    2% /wpars/mywpar/var
 
There are flags to encapsulate the whole WPAR into one filesystem on your machine. If you want to set up 10 WPARs on your machine, your /etc/filesystems and df output in your global instance can get pretty ugly pretty quickly.
 
It is spooky the first time you run lspv and lsvg in a WPAR and get nothing in return.
 
# lspv
# lsvg
0516-318 lsvg: No volume groups found.
 
Be sure to read about the -@ flag that many commands now support. If I’m in my global instance and I want to see the processes running in my WPAR, I can enter:
 
ps -ef -@ mywpar
 
   WPAR      UID    PID    PPID   C    STIME    TTY  TIME CMD
mywpar     root 278754 385194   0    Dec 07      –  0:00 /usr/sbin/syslogd
mywpar     root 315502 385194   0    Dec 07      –  0:00 /usr/sbin/qdaemon
mywpar     root 319598 385194   0    Dec 07      –  0:00 /usr/sbin/sshd
mywpar     root 344148 385194   0    Dec 07      –  0:00 /usr/sbin/writesrv
mywpar     root 348376 385194   0    Dec 07      –  0:00 /usr/sbin/rsct/bin/IBM
mywpar     root 364548 385194   0    Dec 07      –  0:01 /usr/sbin/rsct/bin/rmc
mywpar     root 385194 413910   0    Dec 07      –  0:00 /usr/sbin/srcmstr
mywpar     root 409814 413910   0    Dec 07      –  0:00 /usr/local/bin/aixagen
mywpar     root 413910 200850   0    Dec 07      –  0:00 /etc/init
mywpar     root 426046 413910   0    Dec 07      –  0:00 /usr/lib/errdemon
mywpar     root 430208 413910   0    Dec 07      –  0:00 /usr/sbin/cron
mywpar     root 438510 385194   0    Dec 07      –  0:00 /usr/sbin/rpc.lockd -d
mywpar     root 442490 385194   0    Dec 07      –  0:00 /usr/sbin/portmap
mywpar     root 446646 385194   0    Dec 07      –  0:00 /usr/sbin/inetd
mywpar     root 458986 385194   0    Dec 07      –  0:00 /usr/sbin/biod 6
mywpar     root 463090 385194   0    Dec 07      –  0:04 sendmail: accepting co
mywpar     root 557080 385194   0    Dec 07      –  0:06 /usr/sbin/rsct/bin/IBM
mywpar     root 561182 385194   0    Dec 07      –  0:00 /usr/sbin/rsct/bin/IBM
 
and only see the processes that belong to that WPAR.   
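Since the WPAR name appears in the first column, the -@ output is easy to summarize with awk. Here’s a portable sketch that counts processes per WPAR; to keep it runnable anywhere, it’s fed a few sample lines in place of a live ps (the function name and sample data are my own).

```shell
# Count processes per WPAR from `ps -ef -@` style output (header line skipped).
count_by_wpar() {
  awk 'NR > 1 { n[$1]++ } END { for (w in n) print w, n[w] }'
}

# Sample input standing in for `ps -ef -@`:
count_by_wpar <<'EOF'
   WPAR   UID    PID   PPID  C STIME TTY TIME CMD
mywpar   root 278754 385194  0 Dec07  -  0:00 /usr/sbin/syslogd
mywpar   root 315502 385194  0 Dec07  -  0:00 /usr/sbin/qdaemon
Global   root 413910 200850  0 Dec07  -  0:00 /etc/init
EOF
```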
 
This command

topas -@ mywpar
 
also shows interesting output, as there are no disk stats to report.

So read the Redbook, load AIX 6 on a test box and see what else you can do with WPARs. Breathe new life into that old hardware. Yes, POWER6 and APV certainly have their place, but AIX 6.1 gives us new options in the way we manage our environments.

WPAR Mobility has its Benefits

Edit: I have not done much with this lately but it is always fun to look back at what we were able to do with the technology as it evolved.

Originally posted January 14, 2008 on AIXchange

In this post, I discussed a trip to Austin where I had my first chance to look at Live Partition Mobility. You can move an actual running workload from one physical machine to another, and nobody can tell that you’ve made this change–it happens on the fly.

While I was in Austin, there was also some discussion of Live Application Mobility using Workload Partition (WPAR) Manager in AIX 6.1. At the time, I was far more impressed with Live Partition Mobility, since users would experience an interruption with Live Application Mobility. Sticking with Live Partition Mobility and POWER6 seemed like a no-brainer.

To use Workload Partition Mobility, you had to checkpoint and stop your WPAR; it would then restart on the machine that you moved it to. Although you’d keep all of your transactions and all of your data that was “in flight” at the time of the move, there would still be a period of time when the application was unresponsive. At first glance, this seemed unacceptable. However, now that I’ve had some time to rethink my position, I can see the benefits of each approach.

Here’s an excerpt from an IBM Redbook on Workload Partition Mobility

“In 2007, IBM System p6 and AIX V6 have two features that seem similar, but are different: WPAR mobility and live partition mobility:

“WPAR mobility, which is discussed in this book, is a feature of AIX V6 and WPAR Manager. It is available on POWER4, POWER5 and POWER6 systems.

“Live partition mobility relies on the POWER6 hardware and hypervisor technology (Advance Power Virtualization). It is available on POWER6 systems only. This feature is also available to AIX 5.3 LPARs.”

If you have older POWER4 hardware that you want to use micropartitions with, you’re out of luck–Advanced Power Virtualization (APV) isn’t supported. But if you didn’t pay for APV on your POWER5 or POWER6 hardware, or if you’re on that older POWER4 hardware, you can try to simulate micropartitions with WPARs. There are tradeoffs–for instance, you won’t get the full benefit of APV using a WPAR–but you can still do some workload consolidation, assuming it makes sense for your environment.

By loading AIX 6.1 on POWER4 or POWER5 machines, you’ll find a whole new way to manage these systems using WPARs. When you set up WPARs, they can dynamically change their CPU and memory usage on the fly. You can create limits so that they can only consume some percentage or share of the system. You can also set up automatic movement of WPARs between machines, so if Machine A is getting bogged down but more resources are available on Machine B, you can either manually or automatically move those workloads.

As with Live Partition Mobility, if you need to do hardware maintenance, you can move workloads in your WPARs to other machines, and then power down the departure system to work on it. Once that maintenance is completed, you can return the workload to the original machine.

Again, there are limitations. As of this writing you can only use NFS to move your workloads between machines. You can’t move a WPAR from a POWER6 machine down to a POWER4, but you can certainly move WPARs between machines from the same CPU family.

I’ll probably spend a little more time on this topic next week, so be sure to check back then.

More on Virtual Optical Devices

Edit: This is still something I use all the time. An oldie but a goodie.

Originally posted January 8, 2008 on AIXchange

The more I use virtual optical devices with the IBM Virtual I/O Server (VIO server) and AIX, the more I like them. I wrote about virtual optical devices with the Integrated Virtualization Manager (IVM) in this post.

After getting an optical media library working with IVM, I wanted to try it on my VIO server using the HMC. I couldn’t find anything in the HMC GUI. I was going to poke around the command line on my own, but as luck would have it, someone forwarded a presentation with the information I was looking for. Now I’ll share this information with you.

You can see the virtual optical commands that are available in your VIO server by running:

help | more

Virtual Media Commands

chrep
chvopt
loadopt
lsrep
lsvopt
mkrep
mkvopt
rmrep
rmvopt
unloadopt

You can then run help <command>, where <command> is the name of the command you’re seeking information about.

First I log into my VIO server as padmin. I run:

mkrep -sp datavg -size 16G

Virtual Media Repository Created
Repository created within “VMLibrary_LV” logical volume

This basically creates my optical library logical volume, as you can see:

$ oem_setup_env
# df
Filesystem          512-blocks      Free %Used  Iused %Iused Mounted on
/dev/hd4                524288    458656   13%   2666     5% /
/dev/hd2               7340032    612624   92%  61350    46% /usr
/dev/hd9var            1310720   1223608    7%    447     1% /var
/dev/hd3               4718592   4302464    9%     48     1% /tmp
/dev/hd1              20971520  17088808   19%     60     1% /home
/proc                        -         -    -      -      -  /proc
/dev/hd10opt           1572864    647912   59%  10655    13% /opt
/dev/VMLibrary_LV     33554432  33417600    1%      4     1% /var/vio/VMLibrary

Then I copy my .iso files (which can be created from CD if you don’t have them) to /var/vio/VMLibrary. If you don’t have .iso images, you can insert a CD (after assigning the CD-ROM to your VIO partition) and run:

mkvopt -name <filename>.iso -dev cd0 -ro

where <filename> is what you want to call the file.

After the .iso file is in your /var/vio/VMLibrary directory, run:

mkvdev -fbo -vadapter vhost4
vtopt0 Available

(Obviously you’ll replace vhost4 with whatever vhost adapter you plan to use in your VIO server.)

This mkvdev command creates your virtual optical device. Now run loadopt and it loads your CD image as if it were located in the CD device. This was a great solution for the situation I faced, where I wanted to load AIX 6 even though my Network Installation Management (NIM) server hadn’t been updated to AIX 6.

I ran:

loadopt -vtd vtopt0 -disk cd.AIX_6_OpenBeta.0737.V1.ISO

After the loadopt command, I could run lsmap on the vadapter I’d assigned to my vtopt device earlier in the mkvdev command.

lsmap -vadapter vhost4

SVSA            Physloc                                      Client Partition ID
————— ——————————————– ——————
vhost4          U9131.52A.0649DDG-V5-C7                      0x00000000

VTD                   vtopt0
Status                Available
LUN                   0x8100000000000000
Backing device        /var/vio/VMLibrary/cd.AIX_6_OpenBeta.0737.V1.ISO
Physloc

From here I just booted my partition from this virtual CD and loaded the OS onto it.

When the first CD finished, I was prompted to remove CD1, insert CD2 and press Enter. So I went back to my VIO server and ran:

unloadopt -vtd vtopt0

then I ran:

loadopt -vtd vtopt0 -disk cd.AIX_6_OpenBeta.0737.V2.ISO

and pressed Enter in my terminal window. The install continued as if I’d moved physical media around.

I obviously did the same when prompted to load CD3.
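The swap sequence is regular enough to script. Here’s a dry-run sketch that prints the unloadopt/loadopt pairs for the remaining volumes; the image names follow the open beta naming above, so adjust them for your own media.

```shell
# Print (don't run) the virtual CD swap commands for volumes 2 and 3.
print_cd_swaps() {
  for vol in 2 3; do
    echo "unloadopt -vtd vtopt0"
    echo "loadopt -vtd vtopt0 -disk cd.AIX_6_OpenBeta.0737.V$vol.ISO"
  done
}
print_cd_swaps
```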

As IVM is running VIO server, there’s no reason you can’t use these same commands on either your IVM or HMC managed machines using the command line.

User Groups: Still Going, and Still Worth Your Time

Edit: I still advocate for finding and attending user group meetings, both virtually and in person. The links will redirect to new sites but they no longer appear to take you where they used to.

Originally posted December 17, 2007 on AIXchange

Have you seen the poweraix.org user group listing lately? There are around 30 groups by my count. Some of these groups look more active than others, but it still wouldn’t hurt to check into the status of a group in your area. Perhaps your inquiry might spur a group back into action. Or, if you don’t find a group in your area, maybe you should take the initiative to start one.

Over the years I’ve frequently attended Linux and AIX user group meetings, and I would argue that you’d benefit from doing the same. Although we’re surrounded by the talented people that we work with, it’s always good to meet and network with others in our field. Whether you’re new to AIX and seeking a mentor or you’re an experienced administrator looking to meet others, these meetings can be a great place for you.

If you can’t attend meetings, either due to a lack of time or the absence of a group in your area, you can still join virtual user groups and sign up for their teleconferences and webinars. They bring in various guest speakers just like traditional user groups–and perhaps an hour-long conference call fits more easily into your schedule.

User group mailing lists can be another great resource. Groups that may not regularly schedule formal meetings may still have active lists, and the informal question and answers that can come from the mailing list can be very helpful.

Still, when possible, take the time to clear your calendar and travel to a user group meeting. I believe the benefits outweigh the inconveniences. As noted, there’s the benefit of networking. By getting to know the other administrators in your area, you can find other local companies that run the System p and AIX platform. You never know when you may be able to find good talent and convince them to come work for your organization, or when you might hear of a good opportunity that makes sense for you.

You could even win something. Many groups have giveaways and raffles. One group I was part of got publishers to give away books. In exchange, they’d ask each person who won a book raffle to author a report on it. Anyone who submitted a book report would then get first choice on the books that were available at the next meeting.

Although much of this is North American-centric, this IBM Web site references the “Guide Share Europe pSeries Working Group [which] is a formally organized group whose membership is bound to an annual fee.”

The more people who learn about the benefits of Power Architecture and AIX, the better. User groups offer an excellent option for you to get involved and help spread the word in your area.

See the Difference in AIX 6.1

Edit: A blast from the past. I wonder how many customers still run AIX 6.1?

Originally posted December 11, 2007 on AIXchange

The IBM AIX Version 6.1 Differences Guide has been released, and I suggest you take the time to read it. I’ll run down some highlights, chapter by chapter.

Chapter 2–Information here includes things like turning off jfs2 logging to increase performance (page 34), taking a jfs2 internal filesystem snapshot (page 34-35) and turning on encrypted filesystems (page 38).   
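As a sketch of the kinds of commands those Chapter 2 items involve, the following just prints examples rather than running them. The filesystem names and sizes are placeholders, and the flags are from memory — verify them against the crfs, mount and snapshot man pages on your system before using them.

```shell
# Dry-run: print example commands for the Chapter 2 features above.
# Filesystem names/sizes are placeholders; verify flags on your own system.
print_ch2_cmds() {
  echo "mount -o log=NULL /scratchfs            # remount with jfs2 logging off"
  echo "crfs -v jfs2 -g datavg -m /snapfs -a size=1G -a isnapshot=yes"
  echo "snapshot -o snapfrom=/snapfs -n mysnap  # take an internal snapshot"
  echo "crfs -v jfs2 -g datavg -m /securefs -a size=1G -a efs=yes"
}
print_ch2_cmds
```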

Chapter 3–The focus is workload partitions. Also discussed are updates and changes that have resulted in different performance tools to account for workload partitions. On page 158 it’s noted that the default size of the argument area on the command line has been changed. In older versions of AIX, if you tried to do a rm * in a directory with too many files, you’d get an error. You could either manually find smaller lists of files to give to the rm command, or run a find with xargs and do your rm that way.

Page 161 illustrates how you can limit the number of threads per process and the number of processes per user. Included is an example of a developer writing code that would bog down the whole machine. But now you can keep developers from bringing your machine to its knees.

You will see new entries when you run ulimit -a:

time(seconds)        unlimited
file(blocks)         2097151
data(kbytes)         131072
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited

Page 169 cites the IBM Systems Director Console for AIX, a default feature of AIX 6.1. If you go to https://localhost:5336/ibm/console, you should get a login screen. Log in with the root password and see what it has to offer.

Page 178 contains info on the Distributed Command Execution Manager (DCEM), which allows you to run the same command on multiple machines. When I was at IBM, I used a similar tool that saved me the hassle of logging into 100 different machines. I just logged into the master server and issued the command, and it would run on all of the machines and return the result to my master machine. As I said, DCEM seems very similar to this concept.

Page 202 talks about restricted tunables. IBM is suggesting that system administrators shouldn’t modify these tunables unless instructed to do so by support. Because they’re not supposed to be modified, they’re not displayed unless you use the -F flag. You’ll also get a warning message if you change one of the restricted tunables. This action will also cause a notification to go to the error log.

Page 215 goes into detail about the performance settings that come with AIX 6.1 out of the box. This is a change from the old behavior–you no longer must go in right away and tune minperm and maxperm and enable aio servers for database machines. These settings are all now set up correctly by default. On Page 217 it states that AIX 6 will enable I/O pacing on new installs.

I also recommend reading about all of the new security enhancements, which are found in Chapter 8, starting on page 253. Look for topics like weak root passwords and how to install your machine secure by default. The first thing we’ve always done after a fresh install is disable the unneeded services in /etc/inetd.conf, /etc/inittab, etc. Now the OS is installed with minimal services, allowing you to activate only the additional features that you actually need.

There’s plenty more, but hopefully this has convinced you to download the guide yourself. Much is improved with AIX 6.

Saving Loads with VIOS

Edit: It is much easier now, but this was the beginning of my journey into loading virtual media with VIOS.

Originally posted December 3, 2007 on AIXchange

I had a problem, and it was driving me crazy. I had a test box located hundreds of miles away. I had no time to drive on-site to physically load media to get an operating system installed on this test machine, but I did have a desire to get the openbeta of AIX 6 loaded on it. 

I had the AIX 6 openbeta .iso image downloaded to the machine (I had downloaded the beta before AIX 6 went GA on Nov. 9), and I wanted to load it on this test box. First I thought this would be easy: I’d load the mksysb file from the openbeta .iso image into my NIM server and install the OS onto the new machine using NIM. However, since my NIM server was running AIX 5.3, it couldn’t serve out this newer version; NIM can only serve out AIX at the same level as the NIM server or older. How would I get this machine loaded without using physical media and driving to the datacenter?

The update to VIOS, version 1.5.1, contains my answer. (Download or order the CDs here.)

I used IVM for my example below. (I haven’t seen the same functionality on machines using a hardware-management console (HMC) at this time, and I haven’t had a chance to try this using the command line.) After I got VIOS 1.5.1 running, I clicked on View/Modify Virtual Storage as you can see about halfway down this screen shot.

From here I went to optical devices: 

And clicked on create library.

I chose 5 GB for this media library’s size. Under the covers, this just adds a logical volume to your VIO server’s rootvg; in my case I ended up with:

/dev/VMLibrary_LV   10485760  10442544    1%        4     1% /var/vio/VMLibrary

Looking at the View / Modify Virtual Storage page once again, it shows that you have a media library now and you can perform different operations on that library.

Once you’ve created your media library, you can use the Add Media button, which gives you many options from which to choose.

In my case, I just copied my .iso image from my /home/padmin directory to my /var/vio/VMLibrary directory from the command line after logging in as padmin and running oem_setup_env. 

After copying the file, I added my media to a virtual optical device, which I then assigned to a partition. The screen shot below shows how the partition’s properties looked once I had selected and created the virtual optical device I was going to use.

Now all that was left was to boot my partition and in SMS use this virtual CD device as my boot device.

It booted from this .iso image as if the DVD were mounted locally in the machine. 

From the Virtual I/O Server Version 1.5 Release Notes:

“The Virtual optical support is extended to support the loading and unloading of optical media files to a virtual optical device in the partition. Read-only support is provided to enable multiple partitions to access the same ISO image simultaneously. Read-write support is provided to allow partitions to treat the device as a DVD-RAM drive. In addition to the optical support, files can now be used for virtual disk in addition to physical and logical volumes.”

Using this technique I loaded a machine as if it were using the local cdrom device. After testing that this would work with a DVD .iso image, I added the three CD .iso images to my media library.

Then I assigned CD 1 to my partition and booted from it. When the installer asked me to change the CD and hit “Enter,” I changed it by going into Partition Properties > Optical Devices, clicking the Modify button, and selecting the next CD. Back on the console, hitting “Enter” read the next CD and continued with the install.
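I mentioned I haven’t tried this from the command line yet; going by the VIOS documentation, the padmin equivalent of the IVM clicks above should look roughly like this (the adapter, device and image names are placeholders for whatever your environment uses):

```
mkrep -sp rootvg -size 5G           # create the media library in rootvg
mkvopt -name aix6cd1 -file /home/padmin/cd1.iso -ro
                                    # add an image to the library, read-only
lsrep                               # list the library contents
mkvdev -fbo -vadapter vhost0        # create a file-backed virtual optical device
loadopt -vtd vtopt0 -disk aix6cd1   # "insert" the image into the drive
unloadopt -vtd vtopt0               # "eject" it before loading the next CD
```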

As I test this feature, I’m sure I’ll find more interesting things I can do with it, but for now this was exactly the solution I needed.

Taking a Look at lparmon

Edit: The alphaworks site is no longer live, although it does take you to an IBM site: https://developer.ibm.com/community/ I fondly remember lparmon; maybe it is time to bring it back.

Originally posted November 26, 2007 on AIXchange

In general, I find that customers and management more easily comprehend their system utilization when the data is displayed graphically rather than as text output. This is especially true when they want to see the overall performance and utilization of a machine that’s been carved into multiple LPARs.

I find it especially beneficial to show customers historical performance information plotted in a graphical format. The trends, spikes and utilization can be easier to identify when viewed graphically.

The Austin Executive Briefing Center has created a real-time graphical tool for use with System p.

IBM’s alphaWorks Web site has a description of the lparmon tool:

“LPAR Monitor for System p5 servers is a graphical logical partition (LPAR) monitoring tool that can be used to monitor the state of one or more LPARs on a System p5 server. LPAR state information that can be monitored with this tool includes LPAR entitlement/tracking, simultaneous multithreading (SMT) state, and processor and memory use. The LPARs to be monitored can be running any mixture of AIX 5.3, AIX 5.2, or the Linux operating systems.

“Included in Version 2.0 are several visual and functional enhancements, including the ability to group LPARs into user defined categories, set alert thresholds with associated alert actions, and monitor and record into a file processor and memory use data over time.

“The graphical LPAR tool consists of two components. First, there are small agents that run in AIX 5.3, AIX 5.2, or Linux LPARs on the p5 server. These agents gather various LPAR information through several operating-system commands and API calls. The agents then pass this information via a connected socket to the second component, which is the monitor’s graphical user interface. This graphical user interface is a Java application and it is used as a collection point for the server and LPAR status information, which it then displays in a graphical format for the user.”

Naturally, it’s easier to understand lparmon when you see it for yourself. Compare this output which was collected with lparmon:

To this output that I captured in a text window using topas:

I could have opened up xterm windows to all three of the LPARs that I was monitoring with lparmon and run topas in each of them. But when you’re trying to show this information to people who might not be familiar with vmstat, topas or nmon output, it’s helpful to simply point them to lparmon’s easy-to-understand graphical dials rather than educate them on where to look in the output. The lparmon output above is pretty clear in showing that host fifty2a is using most of the resources compared to the other LPARs.

Compare that to this third image when I put a load on my vio server:

Again, it’s easy to see what’s going on with the machine: vios1 is using available CPU capacity from the shared pool, above what it’s entitled to, as there are resources available for it to borrow from. When lparmon output is being shown live while making changes to a running machine, it can very clearly demonstrate the advantages of virtualizing your machine. And when setting up LPARs, you can see how different choices will impact your environment.

Furthermore, lparmon can be useful when monitoring production machines. You can see if LPARs have been set up with the correct number of virtual processors and if they’re using the shared CPU pool the way that you expect them to. I’ve seen instances where people thought they were using the shared pool, but hadn’t added enough virtual processors to allow the LPAR to do so. Thanks to lparmon, you can quickly see if your machine is overloaded, if you need to rebalance resources, etc.

So take the time to set it up and see how your machines are running. It’s worth adding lparmon to your collection of useful tools.

The Benefits of mksysb Migration

Edit: I still use mksysb migrations, but not with 4.3.

Originally posted November 19, 2007 on AIXchange

During last month’s IBM System p, AIX and Linux Technical University in San Antonio, I listened to a presentation on Advanced Network Installation Manager (NIM).

One topic introduced by the presenter, IBM’s Steve Knudson, has really stuck with me. It’s called a mksysb migration, and you use NIM to implement it.   

I still have customers that are running AIX 4.3. Some of your customers are, too. When the day finally comes to upgrade them, how will you do it? Will you attempt to upgrade the OS on old hardware that might not support the running of the upgraded OS? Will you just pray that your customers simply retire whatever old applications they’re running on their old hardware and old OSs to spare you from performing any migrations?

I have a better solution: Do what Steve suggests. According to his notes from the conference, the mksysb migration allows you to restore an old non-supported mksysb on POWER5 or POWER6 hardware. Once the mksysb is copied to the new hardware, it’s immediately upgraded.

This process involves booting the client machine from the NIM master server with the AIX 5.3 SPOT. The NIM server restores the backlevel mksysb, then immediately migrates it to AIX 5.3.

See the IBM Redbooks publication, “NIM From A to Z in AIX 5L” pages 205-216, for details.

Here’s an excerpt:

“Given that AIX 4.3 is not supported on the POWER5 platform, in the days before ‘mksysb migrations,’ the only course of action would have been to upgrade the AIX 4.3.3 system to AIX 5L V5.3 on the existing hardware (for example, the 6H1) and then clone the system via a mksysb to the new POWER5 LPAR. This process is now simplified with the mksysb migration.

“A mksysb migration allows you to move a lower level AIX system (for example, AIX 4.3.3 or AIX 5L V5.1) to POWER5 without upgrading AIX on the existing server first. Essentially you boot AIX 5L V5.3 media (in our case, we use NIM) and recover the AIX 4.3.3 or AIX 5L V5.1 mksysb image, followed by an
immediate migration to AIX 5L V5.3. This was not possible with previous versions of AIX and pSeries hardware. A mksysb migration is now the recommended way of moving unsupported hardware configurations of AIX 4.3 and AIX 5L V5.1 to new supported hardware and AIX 5L V5.3.

“The end result is that the new system will be a clone of the existing system, but will be running AIX 5L V5.3 on a POWER5 platform. The existing system remains the same (for example, running AIX 4.3). You may choose to use this method to perform a test migration of a system and certify the applications,
databases, or code against AIX 5L V5.3 on the clone before executing the real mksysb migration at some later stage.”
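Going by the Redbook procedure, the NIM side of a mksysb migration takes roughly the following shape. The resource names here are placeholders, and the bosinst_data resource needs its INSTALL_METHOD set to migrate; see the Redbook pages cited above for the full settings before trying this for real:

```
# Define the old system's mksysb image as a NIM resource
nim -o define -t mksysb -a server=master \
    -a location=/export/mksysb/aix433.mksysb aix433_mksysb

# Kick off the restore-plus-migration using an AIX 5.3 SPOT
nim -o bos_inst -a source=mksysb -a mksysb=aix433_mksysb \
    -a spot=spot_53 -a lpp_source=lpp_53 \
    -a bosinst_data=migrate_bid client_lpar
```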

I’m going to try this out myself. Once I run a few tests I’ll let you know what I find.

Loading a Console Window Directly on an AIX Desktop

Edit: I still like vnc, I imagine the instructions and links may be a little different now. I do not use KDE or Firefox on AIX anymore.

Originally posted November 12, 2007 on AIXchange

I’ve been using HMC version 7 for a while now. Recently I loaded a console window directly on my AIX desktop. I like having a desktop session running on AIX or Linux. I can close it down, go to another location, fire it back up and pick up from where I left off. I explained this in a previous article.

I was running VNC. (You can download the rpm here or load it from your AIX Toolbox for Linux Applications CD.)

Once I loaded VNC and ran vncserver, I decided to run the KDE desktop rather than the default tab window manager (twm) desktop. From my Toolbox CD, I went to the ezinstall/ppc directory and loaded the kde3.all rpms.

/aixcd/toolbox/ezinstall/ppc # ls
app-dev crypto.base gnome.apps kde3.all kde3.opt
base desktop.base gnome.base kde3.base

After installing and running startkde, I had a desktop that I liked.

Then I decided to load Mozilla Firefox 1.5.0.6 for AIX from the CD. It installed fine, and I could bring up my HMC login screen with no problem. However, when I tried to open a console window on one of my partitions, I was warned that I needed to install the appropriate plug-in to handle Java. I clicked to download the appropriate plug-in, but Firefox had no idea what to do, and the documentation on Sun’s Web site wasn’t helpful. Fortunately, the documentation that was included with Firefox for AIX was.

The pertinent information for getting your plugin working with Firefox on AIX is available at /usr/mozilla/firefox. Read the README or README.HTML files.

Using the Java Plug-In

The AIX Java Plug-in for Firefox for AIX is included in Java 5 or later. This version of Java runs on AIX 5L and chrp system architecture only. Run bootinfo -p to find out a system’s architecture.

Downloading the Java Runtime Environment — Download the Java installp images.

1. Open  http://www.ibm.com/developerworks/java/jdk/
2. Select AIX–Downloads from Java 2 Platform, Standard Edition (J2SE)
3. Select Java 5 64-bit and sign in.
4. You need these files: Java 5 and Java5_64.sdk.tar.gz.

Installing the Java plug-in–These installp filesets must be installed: Java 5 and Java5_64.sdk. Use SMIT, WebSM, or installp to install the filesets.

Configure the Java plug-in–For Java 5, the Java plug-in file is /usr/java5_64/jre/bin/libjavaplugin_oji.so. If it doesn’t already exist, create this link:

ln -s /usr/java5_64/jre/bin/libjavaplugin_oji.so \
/usr/mozilla/firefox/plugins/libjavaplugin_oji.so

As an alternative, the plug-in can be linked into the user’s .mozilla directory:

ln -s /usr/java5_64/jre/bin/libjavaplugin_oji.so \
$HOME/.mozilla/plugins/libjavaplugin_oji.so

Note: Only one version of the Java plug-in can be used at a given time.

Verifying the Java plug-in–In Firefox for AIX, entering “about:plugins” in the address bar should show the Java plug-in information.

With Firefox running inside my VNC session on AIX, I was able to connect to my HMC, start my console window and start running some NIM installs. In the middle of the installs, I could disconnect from VNC, then reconnect later to check on the progress.

In the past I had a similar setup running on Linux, but this configuration running on AIX suits my current needs.

Working the Network

Edit: I do not think I have run Firefox on AIX in a while, but I still do most things remotely. Mounting .iso images is more straightforward these days too.

Originally posted November 5, 2007 on AIXchange

I found myself hundreds of miles away from a new environment that needed to be set up. The machines were physically cabled to the network and everything was powered on. I was able to reach the HMC and from there I could power on LPARs, open console windows, and configure partitions. My options were to either find someone on site to be my hands and eyes and physically load media for me, or use the network myself. I chose to download the necessary AIX media onto my Network Installation Management (NIM) server so that I could configure my new lppsource.

Much of what follows assumes that you already know how to set up a NIM server and have ample free disk space. It’s also assumed that you have Entitled Software Support, a decent network connection to the Internet, etc.

First I went to the IBM Entitled Software Support page and downloaded my AIX CD images.

I ran the Firefox Web browser on AIX in a VNC session on my server, and from that connection I downloaded my images directly onto the machine I was working on. I had local copies of the CDs, but the network pipe from my location wouldn’t support copying them directly from my machine to the target machine. Downloading them directly to my NIM server was a better use of the network in this case.

Once I downloaded the images, I needed to mount the CDs so that I could run the smitty bffcreate command.

On Linux, I can simply run:

mount -o loop -t iso9660 filename.iso /mnt/iso

This mounts my CD image on my filesystem. On AIX, mounting an .iso image is a little more involved. First I created my logical volume, in this case:

/usr/sbin/mklv -y testlv datavg 6

Then I ran the dd command in order to write the contents of the .iso file to my logical volume:

dd if=/aixcd1.iso of=/dev/testlv bs=10M

Then I mounted my .iso image with:

mount -v cdrfs -o ro /dev/testlv /mnt/iso

At this point the CD was mounted, and I could run smitty bffcreate.
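As the edit note above says, this got more straightforward on later AIX levels: if I recall correctly, newer releases ship a loopmount command that collapses the mklv/dd/mount dance into a single step (check your level for availability before counting on it):

```
loopmount -i /aixcd1.iso -o "-V cdrfs -o ro" -m /mnt/iso
```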

Once I’d used bffcreate to get my images into the correct directory, I could create my lppsource and move forward with my NIM procedures.

Depending on disk space issues, you may find that you need to remove .iso images as you download them. In any event, this procedure saved me from taking physical media onsite and allowed me to keep the project moving forward from my remote location.

Handling HMC Login Failures

Edit: This is still something that you might run into. Surprised that the link still works.

Originally published October 29, 2007 on AIXchange

I went to a customer site to look at a machine that wasn’t showing up on the hardware-management console (HMC). The machine’s HMC port was connected to the same network switch as the HMC, so I powered it up. When I logged onto the HMC, the status column in my HMC view said that authentication for this new machine had failed: there were too many failed attempts to log in to the service processor.

How do you get the machine connected to the HMC if the HMC is unable to authenticate with the machine? Some people suggested that we just remove the NVRAM battery from the machine and that the passwords would go back to defaults. I was hoping this wasn’t the case, as that seemed like a pretty trivial way to bypass the security the password provided. After trying it without success, we called support, and they provided the celogin password for the day so that I could log into the ASMI.

IBM Support can take the machine’s serial number, match it with the current date, and provide a password that’s good for that day. This is very similar to calling support to get the root password for the HMC.

The next issue was the fact that pulling the NVRAM battery had set the date to 1/1/2003. The password we were trying to use was for the current date. Once I logged into ASMI as admin, I was able to set the machine to the correct date. With the celogin password I was able to log in and reset the unknown HMC password. From: http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/iphby/browser.htm&tocNode=int_130339

There are several authority levels for accessing the service-processor menus using the ASMI. The following levels of access are supported:

  • General user–The menu options presented to the general user are a subset of the options available to the administrator and authorized service provider. Users with general authority can view settings in the ASMI menus. The login ID is “general” and the default password is “general.”
  • Administrator–The menu options presented to the administrator are a subset of those available to the authorized service provider. Users with administrator authority can write to persistent storage, and view and change settings that affect the server’s behavior. The first time a user logs into the ASMI after the server is installed, a new password must be selected. The login ID is “admin” and the default password is “admin.”
  • Authorized service provider–This login gives the authorized service provider access to all of the functions that could be used to gather additional debug information from a failing system, such as viewing persistent storage and clearing all deconfiguration errors. The login ID is “celogin.” The password is dynamically generated and must be obtained by calling IBM technical support.

Be sure that when you change the ASMI and HMC passwords, you document the change just like you would any other password. Also be sure to keep your machines under warranty in case you find yourself in a situation where you need to call IBM support, although I imagine they would be willing to provide the service for a fee.

IBM Revamping Fix Central

Edit: I just left the links alone on this one. They actually still resolve and the .pdf is still there.

Originally posted October 22, 2007 on AIXchange

Did you see that IBM announced it’s enhancing the Web-based fix download facility to support its new AIX service strategy? According to the company, simplified Web pages with enhanced search capabilities and more detailed package information will be provided in the next month or so.

I want to highlight the following from IBM’s Web page:

“IBM provides many documents that recommend service and support strategies for IBM Systems and software. These ‘best practices’ documents describe system planning and support procedures to improve system administration operations. The best practices documents referenced on this page provide strategies for IBM System p servers and firmware, the AIX operating system and related products, such as the Hardware Management Console and cluster software such as HACMP.”

While you’re there, be sure to read this PDF regarding IBM’s new service strategy:

“The new service strategy encourages clients to apply complete Service Packs. Individual updates can still be applied; however, having a maintenance policy of applying the complete package improves manageability by reducing complexity and providing more consistency.

“Clients also benefit from knowing that IBM has regression-tested each Service Pack as a unit. Installing the entire Service Pack reduces the possibility of product regression and increases serviceability by enabling IBM support to identify problems quicker.

“One of the first changes you’ll notice is that IBM is now promoting the full distribution of fix packs. Downloading packages for specific problems is being discontinued. A fix to a specific problem will be available in one or more fix packs. A fix pack can be either a Service Pack or a Technology Level package.”

The PDF includes some screen shots of the upcoming changes to the Fix Central Web site. Also from the PDF:
 
“As you select package lists, this aid will remain visible to help you make decisions on whether to stay at your current Technology Level or upgrade to a newer one. There are many reasons to upgrade to the latest TL. End of service life is one of them.
 
“Search is being enhanced with full indexed searching and improved sorting options. Searching by symptoms or error messages is easier and more complete. All specific fixes are associated with fix packs that are either Service Packs or Technology Level packages.”
 
I recommend you take a few minutes to get all the information about the coming changes to Fix Central. These enhancements should give us greater stability from the machines we manage, because we’ll be keeping them at the latest technology level and service pack. Of course we’ll need to check the current technology and service-pack levels, and then decide which implementation schedule makes the most sense in our environments.

Parting Thoughts on This Year’s Technical University

Edit: I still love the IBM Technical University. I am pretty sure the links at the end no longer work. I edited the first link.

Originally posted October 15, 2007 on AIXchange

A few weeks ago I mentioned the IBM System p, AIX and Linux Technical University held Oct. 1-5 in San Antonio, and recommended that you make every effort to attend. I did, and what follows are some of the reasons that I’m glad I went:

Sure, none of this had anything to do with AIX or Linux, but seeing these sites was well worth the time.

It’s easy to conclude that there’s no benefit to sending employees to these conferences for a week, that they’ll spend all their time as tourists rather than get the training they need. However, nothing could be further from the truth.

Each day at the Technical University featured about 55 presentations, which were held in 11 different rooms throughout the San Antonio Convention Center. Some were repeats, so if two sessions you wanted to see were running at the same time, chances are at least one of them was repeated on a different day. So with a little planning, you could attend every session that would benefit you.

The 75-minute sessions ranged from lectures to discussions with AIX developers to hands-on labs and onsite certification testing. Presenters’ slides were available for download. This is valuable of course, but there’s still much to be gained from sitting with your peers and listening to the instructors elaborate on what they meant when they wrote their presentations.

Training tracks covered AIX, HACMP, storage, virtualization and other topics. Following the color codes made it easy to determine which presentations fit your interests. On Monday there were keynote sessions covering System p trends and directions, followed by an AIX trends and directions session. A typical day started with breakfast, followed by sessions at 8 a.m., 9:45 a.m., 11:15 a.m., 1:45 p.m. and 3:30 p.m. Daily printouts listed any last-minute itinerary changes.

Overall I found the content to be very worthwhile. Besides the content, it’s always nice to hear colleagues report issues that they’re seeing in their home environments and solutions they’d found. In general, it’s just good to spend time with others who work on the same systems that I do.

European readers may be interested in the System p, AIX and Linux Technical University scheduled for Nov. 5-8 in The Netherlands. Otherwise, there’s always next year’s Technical University. I’ll see you there.

Modernization Decision Is Multifaceted

Edit: F50 servers? B80? 6M1? Now THAT is a long time ago. I love that I am advocating for running AIX 6.1 on POWER6 hardware too.

Originally posted October 8, 2007 on AIXchange

I recently did some services work at a few customer sites. One customer was running an F50 with an unsupported version of AIX. Another was running a B80 and a 6M1 with AIX 5.1. In the first scenario, we needed to make some minor adjustments to the F50, and in the second case, we needed to upgrade the operating system. In both cases, upgrading to newer hardware came up in conversation. The virtues of virtualization and the benefits of the faster processing power and the reliability, availability and serviceability (RAS) discussions all took place. Ultimately though, for their particular environments and workloads, their current solutions were working just fine.

As System p professionals, we continually keep up to date with new hardware, new features and new offerings from IBM. It’s very easy to think that everyone should run POWER6 processors and AIX 6 with VIO, and the technology and the speeds and feeds are good things. But the bottom line is that customers have business problems that they’re trying to solve, and computing resources are tools to help them do that. If their older hardware does the job, they may see no compelling reason to change their environments, even if their current machine isn’t as fast or responsive, and even if their jobs might run in half the time on newer hardware.

For whatever reason–budgetary, strategic or something else–they’ve chosen not to make the move at this time. Some customers I talk to make the analogy of driving a 1992 model car that still gets them to their destination adequately. As much as they’d like to modernize, they feel it doesn’t make sense to do so at this time.

In the meantime, we’ll continue to educate them, so that when they are ready to make that move, they’ll know all of the features and benefits that they can look forward to.

HMC v7 First Impressions

Edit: This post still seems to hold up, the HMC versions have changed, but what we can do with the HMC is the same. The link still works which is always a bonus.

Originally posted October 1, 2007 on AIXchange

After last week’s article, where I detailed some hardware management console (HMC) training I attended, I went ahead and upgraded to HMC v7.

Some first impressions:

It was a very simple upgrade. I installed the version 7 release 3.1.0 DVD (after backing up console data and saving upgrade data) and booted from it. After answering a few questions and loading both DVDs, I had a running system.

I’ve been doing my testing using Firefox as my browser (Internet Explorer isn’t at the right level on my machine, and I haven’t looked into upgrading yet), and it seems to work fine. Just go to https://hmc.ip.address or https://hmc.hostname and you’ll get your prelogin screen. Here you have links to log on and launch the HMC Web app and to view online help. Below those two selections you can view the system status, any attention LEDs and any serviceable events (see below).

When you log in, you see some HMC “tip of the day” screens (see below). You can disable these screens, though I like reading the different tips.

The display is similar in concept to WebSM, and after clicking on the different headings, it is pretty obvious where to go find things. The HMC redbook that I mentioned last week gives a very good overview of the different things that you can do.

One thing I’ve noticed is that if I lose network connectivity (or if I choose the option when I log out), I can reconnect to my previous session. When you log off and select the option to disconnect instead, the next time you can either reconnect to the previous session or start a new one. This can be handy if you start a long-running task on the raised floor and need to walk back to your desk: log off, disconnect, then reconnect at another location. As a heavy screen and VNC user, I really appreciate this functionality.

I also used the HMC v7 to upgrade firmware on a machine, and it went very smoothly. It took me a minute to realize that I wanted to use Upgrade Licensed Internal Code to a new release, but once I used the correct option, the update process was very simple.

I was able to easily export a system plan from the HMC, and transfer that system plan to my machine. When I ran the IBM System Planning tool on my machine, it let me open and view the system plan I moved over, but I was unable to edit that system plan.

I think it would be very useful to be able to export sysplan data from the HMC, edit it on the PC, then transfer it back to the HMC. For now it looks like I can transfer it from the HMC and view it, but I would have to create a new file in order to edit it.

I did find that port 9960/tcp needs to be open for the console window, and that the HMC will communicate with the managed machine on 657/tcp and 657/udp for things like dynamic logical partitioning (DLPAR) operations.
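A quick way to check the TCP side of those ports from a client machine is a loop with nc. This is just a sketch; hmc.example.com is a placeholder for your HMC’s hostname:

```shell
#!/bin/sh
# Probe the HMC ports used for the console (9960/tcp) and DLPAR (657/tcp).
# (657/udp can't be verified this way; UDP probes need a different tool.)
HMC=hmc.example.com
for p in 9960 657; do
    if nc -z -w 3 "$HMC" "$p" 2>/dev/null; then
        echo "tcp/$p open"
    else
        echo "tcp/$p closed or filtered"
    fi
done
```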

As I continue working with and using the HMC, I’m sure I’ll find more to share with you. Hopefully you’ll continue sharing the things you’re learning with me.

The Advantages of HMC Version 7

Edit: I have not thought about websm in quite a while. Another blast from the past. The links still work, which always surprises me after so many years.

Originally posted September 24, 2007 on AIXchange

After I attended Hardware Management Console (HMC) version 7 training recently, I was inspired to upgrade my HMCs to version 7. I was also inspired to share what I learned. The following are some of the highlights; you can get more details from the HMC Redbook.

This new release allows you to use a browser to connect to the HMC instead of using Web Based System Manager (websm). This means one less step is required when remotely accessing our HMC. Instead of going to hmc.ip.address/remote_client.html to download and install websm code, we can just go to hmc.ip.address with our browser and manage our machine that way without the need for additional software to be loaded on our laptops.

In days gone by, you had an HMC for POWER4 systems and a different HMC for POWER5 systems; you couldn’t mix and match them. With HMC version 7, you can manage POWER5 and POWER6 machines with the same HMC, so you don’t necessarily have to get new hardware for your HMC to manage your new machines (assuming they are supported models).

How many times have you seen an error code on your system LED or in websm and wondered what it meant? Finding out meant writing the error code down and digging through the documentation, or calling IBM Support and asking. Now you can look up error codes with the built-in hyperlinked documentation: click on the code and the meaning is displayed. This should make finding out what’s wrong with your hardware a much simpler process.

Partition Availability Priority is similar in concept to partition weighting, in that you give different weights to your partitions. These weights aren’t used for the day-to-day operations of your partitions (you still set that up in the profile); they come into play if you were to lose a processor and the system needed to decide which partition most needed cycles to keep running. It would be a good idea to make sure your production partitions have a higher priority than your test partitions in case a processor goes away.
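
The selection logic boils down to shedding capacity from the least important partition first. This is a hypothetical illustration of that idea, not the hypervisor’s actual algorithm; the partition names and priority values are made up.

```python
# When a processor is lost, reclaim capacity from the partition with the
# lowest availability priority, so higher-priority work keeps its cycles.
def partition_to_shed(priorities):
    """priorities: dict of partition name -> availability priority
    (higher number = more important to keep running)."""
    return min(priorities, key=priorities.get)

lpars = {"prod_db": 191, "prod_app": 127, "test": 63, "dev": 31}
print(partition_to_shed(lpars))  # the dev partition gives up cycles first
```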

Another interesting topic that was covered was Utility Capacity On Demand (COD). I can remember situations where there would be heavy workloads and customers would use reserve COD and end up being charged a fee for using a processor for a full day, even though they only needed those CPUs for a few hours each night. With the new billing model, they’ll be charged on a minute-by-minute basis instead of a daily basis. This can be a huge advantage when you have peak workloads that require more computing resources than usual, but you don’t necessarily want to buy that large machine to sit around idle most of the time.

Check the IBM Systems Hardware Infocenter for information about upgrading from version 6 to version 7, and let me know if you find any other interesting features with this new release.

Systems Management Without an HMC

Edit: I was pleased to find the links below still seem to work. Every now and again I still pull out my trusty serial connections but it is more rare than it once was.

Originally posted September 17, 2007 on AIXchange

Some System p customers buy a smaller POWER5 machine, but don’t want to buy a hardware-management console (HMC) to go with it. It could be that the cost of the HMC outweighs its perceived benefits. Perhaps they don’t plan to partition the machine, and will run it as a single image. However, they still want to be able to manage the machine remotely. They have a few options. To set the machine up initially, they can plug a laptop into either HMC connection on the back of the machine and access the advanced system management interface (ASMI) that way. This is all explained here.

They can configure the IP addresses and plug the HMC network port directly into the network if they so choose, and use that connection to access and manage the machine.   

But what if they want to access a console remotely? Then they need to plug into the serial connection in the back of the machine. (If you Google “Serial to Ethernet converters” you should be able to find many different products from which to choose.)

Accessing ASMI via ASCII terminal is described here.

Newer laptops no longer have serial connections built in. I used to take a serial cable and connect it from my laptop to the back of the machine, and access ASMI and the console that way. My laptop has plenty of USB connections, so I needed to buy a USB-to-serial converter and plug that into the back of the machine. My converter had a male end, as does the serial port on the back of the machine, so a gender changer was also in order.

Then, if you’re running Windows on your laptop, you can run HyperTerminal (pay attention to which COM port your USB-to-serial converter is using, along with the speed, duplex, etc.) and make a connection to the serial console. If you put your serial-to-Ethernet converter on your network, you can then access the console remotely.
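
The same idea works without HyperTerminal. Below is a minimal sketch: the line settings are commonly used defaults that you should verify against your machine’s documentation, and the host and port for the serial-to-Ethernet converter are placeholders.

```python
import socket

# Assumed serial line settings -- verify against your documentation;
# these are commonly used defaults, not values taken from this post.
CONSOLE_SETTINGS = {"baud": 19200, "data_bits": 8, "parity": "none", "stop_bits": 1}

def open_remote_console(host, port=23, timeout=5.0):
    """Open a raw TCP session to a serial-to-Ethernet converter.
    host and port are placeholders; many converters expose the serial
    line on the telnet port (23) or a vendor-specific port."""
    return socket.create_connection((host, port), timeout=timeout)
```

The returned socket behaves like any other stream: read the console output, write keystrokes back, and close it when you are done.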

As you look around at the different products, you’ll also see serial-terminal servers so that you can manage more machines in your environment via one terminal server.

Although having an HMC is certainly the preferred method for managing these machines, methods are available to manage your machines without one. What other solutions have you come across?

Skeptic to Believer: Live Partition Mobility Has Many Potential Uses

Edit: I feel like I was starting to hit my stride this week and the week before. The topics and the content got a little meatier as time went on. It is hard to believe how exciting it was the first time I saw Live Partition Mobility in action. It is interesting to see that the scenarios I described are very common these days, with much of it being automated and much faster than it was in the POWER6 days. I was able to dig up a link (included below) to the press release: IBM Demonstrates a UNIX Virtualization Exclusive, Moves Workloads From One Machine to Another — While They’re Running

Originally posted September 10, 2007 on AIXchange

When I was in Austin, Texas, recently for a technical briefing, IBM demonstrated how you can move workloads from one machine to another. They call it Live Partition Mobility.

I saw it in action and went from skeptic to believer in a matter of minutes. I kept saying things like: “This whole operation will take forever.” “The end users are going to see a disruption.” “There has to be some pain involved with this solution.” Then they ran the demo.

They had two POWER6 System p 570 machines connected to the hardware-management console (HMC). They started a tool that simulated a workload on one of the machines, then kicked off the partition-mobility process. It was fast, and it was seamless. The workload moved from the source frame to the target frame. Then they showed how they could move it from the target frame back to the original source frame. They said they could move that partition back and forth all day long. (Ask your business partner or IBM sales rep for a copy of the demo; a Flash-based version was recorded to show customers. I’m still waiting for it to show up on YouTube.)

The only pain that I can see with this solution is that the entire partition that you want to move must be virtualized. You must use a virtual I/O (VIO) server and boot your partition from shared disk that’s presented by that VIO server, typically a storage-area network (SAN) logical unit number (LUN). You must use a shared Ethernet adapter. All of your storage must be virtualized and shared between the VIO servers. Both machines must be on the same subnet and share the same HMC. You also must be running on the new POWER6 hardware with a supported operating system. 

Once you get everything set up, and hit the button to move the partition, it all goes pretty quickly. Since it’s going to move a ton of data over the network (it has to copy a running partition from one frame to another), they suggest that you be running on Gigabit Ethernet and not 100 Megabit Ethernet.
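
Some back-of-the-envelope arithmetic shows why Gigabit Ethernet is suggested. This sketch assumes an ideal line rate with no protocol overhead or dirty-page retransfers (real moves will take longer), and the 16 GB partition size is my own example, not a figure from the demo.

```python
# Time to copy a partition's memory across the wire at ideal line rate.
def copy_time_seconds(memory_gb, link_mbps):
    megabits = memory_gb * 1024 * 8  # GB -> megabits (1 GB = 8192 Mb)
    return megabits / link_mbps

gb = 16  # an example partition memory size
print(f"100 Mb Ethernet: about {copy_time_seconds(gb, 100) / 60:.0f} minutes")
print(f"  1 Gb Ethernet: about {copy_time_seconds(gb, 1000):.0f} seconds")
```

Even under these rosy assumptions, the slower link turns a roughly two-minute copy into one north of twenty minutes, which is a long time to keep memory pages in sync.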

I can think of a few scenarios where this capability would be useful:

The next time errpt shows me a sysplanar error, I call support and they confirm that we have to replace a part (which usually requires a system power down). I just schedule the CE to come do the work during the day. Assuming I have my virtualization in place and a suitable machine to move my workload to, I move my partition over to the other hardware while the repair is carried out. No calling around the business asking for maintenance windows. No doing repairs at 1 a.m. on a Sunday. We can do the work whenever we want, as the business will see no disruption at all.

Maybe I can run my workload just fine for most of the time on a smaller machine, but at certain times (i.e., month end), I would rather run the application on a faster processor or a beefier machine that’s sitting in the computer room. Move the partition over to finish running a large month-end job, then move it back when the processing completes.

Maybe it’s time to upgrade your hardware. Bring in your new machine, set up your VIO server, move the partition to your new hardware and decommission your old hardware. Your business won’t even know what happened, but will wonder why the response time is so much better.

What happens if you’re trying to move a partition and your target machine blows up? If the workload hasn’t completely moved, the operation aborts and you continue running on your source machine.   

This technology isn’t a substitute for High Availability Cluster Multi-Processing (HACMP) or any kind of disaster-recovery situation. This entire operation assumes both machines are up and running, and resources are available on your target machine to handle your partition’s needs. Planning will be required.

I know I haven’t thought of everything. Let me know what scenarios you come up with for this useful tool.

System p, AIX and Linux Technical University Fast Approaching

Edit: This post has a dead link to a conference from many years ago. I still love going to the IBM technical universities. In this post I mention how in the olden days we were given a CD with slides, etc. I think I prefer being able to download .zip files these days; it seems like more of the last-minute slides make it into the .zip files than made it onto the CDs.

Originally posted September 4, 2007 on AIXchange

It’s already time to start thinking about IBM System p, AIX and Linux Technical University. Some organizations need more lead time than this to plan to attend such an event, but if you still have some flexibility and budget available, you should really consider going. If you’re unable to go this year, start planning for next year. No, I don’t work for the marketing department. I just found this to be a beneficial event when I attended, and I have always heard the same from friends who have gone to the conference in the past.

This year’s IBM System p, AIX and Linux Technical University will be held Oct. 1-5 in San Antonio, Texas. The list of available sessions stands at 94 at the time of this writing, and IBM is still finalizing the content. Looking at the agenda, you can see that great material will be presented each day. In years past, attendees received a CD containing the slides from all of the lectures. This means that in your copious amounts of free time (machines never need to be built or fixed, do they?), you can at least look over the slides to get an idea of what went on in any sessions you might have missed. Many of the slides also contain contact information for the presenters, so you can e-mail them to get more information about a given topic.

When I attended this conference in the past, I found it to be worthwhile. If you go, let me know what you think by posting a comment here.

Advice on Transitioning to AIX From Other UNIX Flavors

Edit: Still seems like a legit answer, and also applies to those that know AIX that want to learn more about Linux for example.

Originally posted August 27, 2007 on AIXchange

I’ve been asked, “How long will it take for me to get up to speed on AIX if I have experience in Solaris, HP-UX or Linux?”
 
That depends.
 
How willing are you to read documentation? How much time will you have available for hands-on learning? Will you have a lab available, and the time to spend on the test machines? Will you be trying to keep your skills current, or will you want to set up the machines and just let them run with little intervention?
 
Like anything in life, the more time that you invest, the better you get. I’ve heard managers say, “Oh, just send them to the AIX jumpstart class, that’s all they will need.”
 
This approach will certainly give them the information necessary to start doing their jobs, but they won’t become experts overnight. That comes with real-world experience solving problems.
 
With a solid UNIX background, the transition may be easier; just remember to leave the “baggage” that comes with that knowledge at the door. The AIX world may have different ways of doing things than the world you’re experienced in. Learn why things are done the way they are. Learn how they’re done. Then share what you have learned.

When to Ask for Help

Edit: This is still a relevant discussion. There is value in beating our heads against the wall and figuring things out; I know I learn and retain a ton of information using that method. However, when a business is being affected because a server is down and every second counts, I do not hesitate to open PMRs and utilize those support contracts. We pay for access to additional resources, so remember to take advantage of that option.

Originally posted August 19, 2007 on AIXchange

I like the quote “there is no shame in calling support.” I was talking to a co-worker about a problem that had cropped up in our environment. We weren’t sure how to proceed, and Google and our usual methods of searching for answers didn’t help in this case. Instead of wasting more time searching for the answer on my own, I called IBM and quickly received the answer I needed to solve the problem.

How long do you usually wait before calling in a problem? With a hardware issue, it’s a no-brainer: you need the CE to bring out a new part, so you call it in. With software support, it can be more complicated. We’re supposed to be the experts who know it all, but in reality, everyone needs help at some point. Whether it’s a new technology you’re still learning about or an obscure setting you’ve forgotten, what situations warrant calling IBM for help?

Consolidate Using System p LPARs

Edit: Virtualization is standard these days, but in 2007 there was still hesitation in some circles. These were discussions I remember having with customers at this time.

Originally posted August 12, 2007 on AIXchange

I have too many machines in my server room. I am running out of power connections and cooling capacity.

Why not consolidate some of those servers into a smaller physical space, using virtualization on a System p machine? Carve up your LPARs so that each partition uses idle resources that the other LPARs aren’t currently using. If peak workloads vary throughout the day on the different machines, this could be a good solution. Instead of 10 machines that are 10-percent utilized and 90-percent idle, why not run 10 LPARs on a single frame and run the whole thing at a higher overall utilization?
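
The arithmetic behind that argument is simple. In this sketch the numbers are illustrative, and expressing the frame’s capacity in old-server equivalents is an assumption I’m making for the example, not a real sizing method.

```python
# Ten standalone servers at 10 percent busy do one server's worth of real
# work; a frame sized at two old-server equivalents absorbs all of it and
# still runs at only 50 percent, leaving headroom for overlapping peaks.
def frame_utilization(n_servers, avg_util, frame_capacity_in_servers):
    """Utilization of one frame absorbing n_servers' workloads.
    Capacity is expressed in old-server equivalents (an assumption
    made for this sketch)."""
    return (n_servers * avg_util) / frame_capacity_in_servers

print(frame_utilization(10, 0.10, 2))  # 0.5
```

The catch, of course, is the "if peak workloads vary" clause: if all ten workloads peak at the same moment, the combined demand exceeds the frame and sizing has to account for that.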

This isn’t a magic bullet, and not appropriate for all workloads, but with proper planning this additional tool can help free up space on the raised floor and reduce the overall load on the computing environment.

Getting an AIX Education

Edit: I am starting to get the hang of it by my third post. The links in the post still work all of these years later, so that is a nice bonus. The course names may have changed but the principle is the same.

Originally posted August 5, 2007 on AIXchange

Once management discovers all that the System p platform has to offer your organization, you’ll be asked to learn AIX to support the new machines. Where do you go for this type of education? I would recommend getting some training, although there’s something to be said for getting a lab machine and poking around on it as well.

Look into the “AIX 5L Jumpstart for UNIX Professionals” class that IBM offers in classrooms worldwide. According to the Web site, “This course is targeted for Sun Solaris, HP-UX, or other UNIX system administrators.” This is a traditional classroom course with hands-on labs.

E-learning is another option for those unable to get away for a five-day class. Check out http://www.ibm.com/training for some options.  At the very least, visit http://www.ibm.com/redbooks and search for applicable AIX downloadable PDF files to read.

Once you see all that AIX has to offer, you’ll want to learn more, and there’s always more to learn.

Maintaining Uptime

Edit: My second post for AIXchange. How long did it take for my topics / style to improve? Now we have POWER servers, we do not still have System p machines. I would argue that keeping hardware from failing is still something to worry about.

Originally posted July 29, 2007 on AIXchange

I am trying to stop a hardware outage from taking down the partitions that I have running on my System p machine. The whole idea of virtualization and consolidation will backfire on me if a hardware issue now takes out four machines instead of one. 

As we do this planning, will we take advantage of as much redundancy as we can? Are we making sure that we have different feeds coming from different power sources? Are we setting up multiple fibre paths to our storage-area network and trying to have our multiple fibre cards exist in different I/O drawers?  Are we setting up redundant virtual I/O servers so that we can lose one and still keep our client LPARs running with the remaining server?  Do we have redundant hardware management consoles set up and functioning?   
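
One way to keep those questions honest is to treat them as an audit: anything you have fewer than two of is a single point of failure. This is a hedged sketch of that idea; the component names and counts are examples I made up, not output from any real tool.

```python
# Flag any component without a redundant partner.
inventory = {
    "power_feeds": 2,
    "fibre_paths": 2,
    "io_drawers_with_fibre": 2,
    "vio_servers": 2,
    "hmcs": 1,
}
single_points_of_failure = sorted(c for c, n in inventory.items() if n < 2)
print(single_points_of_failure)  # ['hmcs']
```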

What other tactics do you utilize to maintain your uptime?

Consolidation and Virtualization: What Are the Best Solutions?

Edit: I know this is a bit of a rough start, but it was the very first post I wrote for my brand new blog on IBM Systems Magazine. It ended up being called AIXchange, but at one point I was tossing around names like *xExchange, AixExchange, AdminExchange. Who knows what might have been if any of those names had been used.

Originally posted July 16, 2007 on AIXchange

As you think about server consolidation and server virtualization, how do you decide what is the best solution for your enterprise? Do you look for the biggest machine that you can purchase with your budget allocation?

Do you figure that it is a good idea to have as many CPUs in the machine as possible? How do you architect your solutions? Do you have the experience in house to help with those decisions, or do you look externally for help? In many instances, an IBM Business Partner can be a smart choice, as they will have experience sizing machines for customers if you do not.

I like to know what machines are current, and what machines are on the horizon. I have found good information at: http://www-03.ibm.com/systems/p/. What resources do you utilize as you continue to plan for the future?