Verifying Microcode Levels

Edit: As of this writing the link still works.

Originally posted January 17, 2012 on AIXchange

As great as POWER7 servers are, plenty of older machines still run AIX. And as great as that is, even better is that, through the use of certain tools, you can easily verify that these older machines are running the latest versions of system firmware and microcode.

Recently a “Top Gun” CE reminded me about the Microcode Discovery Service. It’s a handy tool that allows you to see if your microcode is up to date.

From IBM Support:

“Microcode Discovery Service [MDS] is used to determine if microcode installed on your IBM System p or RS/6000 systems is at the latest level.

“MDS relies on an AIX utility called Inventory Scout. Inventory Scout is installed by default on all AIX 5 systems, and also on some later levels of AIX 4.3.”

As noted by IBM, there are three ways to run MDS:

  • Run a signed Java applet that connects to Inventory Scout daemon processes on hosts to be surveyed.
  • Run Inventory Scout either manually or by script, and upload the resulting survey files to the MDS website for analysis. Refer to the User’s Guide for instructions on running Inventory Scout.
  • The MDS Microcode CD-ROM is recommended for systems that are not internet connected. An image of this CD-ROM is available online, or you can order a physical disk. For more information or to download a CD-ROM image of this tool, visit: MDS Microcode CD-ROM.

Under the heading, “Preparing to use the Microcode Discovery Service,” there’s a description of the Inventory Scout:

“Inventory Scout is a utility that runs on System p hosts. For AIX version 5 and later it is part of the standard install. In case it is not installed, refer to the User’s Guide for instructions. …

“The MDS applet is capable of performing surveys of more than one host at a time, and creating a combined microcode report. If using the MDS applet, then the following additional conditions must be met.

  • “Each host to be surveyed must be running Inventory Scout in a daemon process. These daemon processes are not started by default. Refer to the User’s Guide for instructions on how to configure Inventory Scout to run as a daemon.
  • “Java support must be enabled in your browser. To enable this support, see the Preferences or Tools options on your browser.
  • “Your company must allow applets to establish TCP connections to the hosts to be surveyed.”

I clicked on the applet option and allowed it to run:

This screen allows you to add a host and give it an IP address, password and (optionally) the port if you’ve changed it.

If the wrong IP address is entered, you’ll get this screen.

I ran passwd against the invscout userid so that I knew the password for the invscout account. Then, on the command line of the machine I was going to scan, I ran invscoutd -d100000 to activate the daemon and get it listening for the applet to connect. By default, invscoutd listens with a 50,000-byte buffer; the applet needs 100,000 bytes, which is why the -d flag is required.
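Put together, the per-host prep looks roughly like this dry-run sketch. Since invscoutd exists only on the AIX target, the commands are composed and printed rather than executed; the -d value is the listen-buffer size in bytes described above.

```shell
#!/bin/sh
# Dry-run sketch of the host prep for the MDS applet survey.
# The AIX-only commands are printed, not run, so this is safe anywhere.
BUF=100000                          # buffer size the applet needs
PASSWD_CMD="passwd invscout"        # set a known password for the invscout account
DAEMON_CMD="invscoutd -d$BUF"       # start the daemon with the larger buffer
printf '%s\n' "$PASSWD_CMD" "$DAEMON_CMD"
```

On the real host you would run those two commands as root, then point the applet at the machine.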

Once I successfully added the host, I clicked start:

That brought up this report.

It found three devices and displayed the system firmware that needed to be updated. I clicked the link and copied the files to the machine that was running the applet. Then I moved those files over to the Power system that needed to be updated.

In my case I created /tmp/mcode and copied the files to that directory. Then I ran rpm -ivh --ignoreos *.rpm to get the information and microcode loaded into the /etc/microcode directory.
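The staging steps can be sketched as follows. The commands are built as strings and printed rather than run, since rpm with --ignoreos and the /etc/microcode convention apply to the AIX box, not the workstation; /tmp/mcode is just the staging directory chosen here.

```shell
#!/bin/sh
# Sketch of the microcode staging steps, composed as strings so the
# sketch itself runs anywhere; the printed commands target the AIX system.
MCDIR=/tmp/mcode
INSTALL_CMD="rpm -ivh --ignoreos *.rpm"   # unpacks microcode into /etc/microcode
echo "mkdir -p $MCDIR"
echo "cd $MCDIR && $INSTALL_CMD"
```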

Next, I ran diag and went to Task Selection:

On this screen, I selected the “Microcode Tasks” option:

Then I selected “Download Latest Available Microcode” and hit enter:

Since I ran the rpm command earlier, I knew the files should be in /etc/microcode. I selected that option:

Then I selected “All Resources” and then hit the F7 key to let the microcode install:

I actually updated the system firmware via the HMC, though I could have used the command line if my system wasn’t managed by an HMC. Once my updates were complete, I reran the MDS applet. No further updates were needed:


Of course there are other methods for verifying that your microcode is up to date. How do you prefer to do this in your environment?

Have You Seen the VIOS Advisor?

Edit: This is built in now. Some links no longer work.

Originally posted January 9, 2012 on AIXchange

Download the tool here. I’ll let IBM developerWorks provide the introduction.

“The VIOS advisor is an application that runs within the customer’s VIOS for a user specified amount of time (hours), which polls and collects key performance metrics before analyzing results and providing a health check report and proposes changes to the environment or areas to investigate further.

“The goal of the VIOS advisor is not to provide another monitoring tool, but instead have an expert system view performance metrics already available to the customer and make assessments and recommendations based on the expertise and experience available within the IBM systems performance group.

     1. Download vios_advisor.zip from the link provided in the download section.
     2. Unzip vios_advisor.zip on a workstation that has a web browser.
     3. ftp vios_advisor onto the VIOS you wish to monitor. (Place it in any directory.)
     4. chmod +x vios_advisor to give the application execution privileges.

 “The application “vios_advisor” takes only one parameter, which is the duration of the monitoring period, in minutes.

 “For example, to monitor the VIOS for 30 minutes, run:

      vios_advisor 30

   Usage Statement:

      Usage: vios_advisor
      duration_in_minutes:
      Recommended monitoring time = >= 30 min
      Minimum monitoring time = 5 min (only recommended for settings verification)
      Maximum monitoring time = 1440 min (24 hours)

      -v : Version

“The vios_advisor application is silent (does not produce any output to screen) and upon termination, will generate an xml file in the current running directory labeled:

     vios_advisor.xml

“Copy over the vios_advisor.xml file to the workstation where the zip file: vios_advisor.zip was extracted, and place the file in the vios_advisor folder. Open the vios_advisor.xml file with the web-browser of your choice to see the report.

“The measured overhead for the VIOS Advisor is minimal. An increase in CPU consumption of 0.1 cores was measured on a POWER7 server. Memory consumption will vary based on the number of physical I/O devices in the VIOS, but expect the advisor to consume 2-20 MB of memory.”

I downloaded the advisor and extracted the files from the .zip file. Then I selected the vios_advisor_example file that was located in the newly created directory. This was the output in my browser:

I copied the vios_advisor file to my VIOS and ran chmod on it so that I could run the tool. Then I ran a quick test to make sure it worked:

$ chmod u+x vios_advisor
$ vios_advisor -v
vios_advisor  Version: 121211B

Then I ran:

 $ vios_advisor

 Usage: vios_advisor

      duration_in_minutes:
            Recommended monitoring time = >= 30 min
            Minimum monitoring time =         5 min  (only recommended for settings verification)
            Maximum monitoring time =      1440 min  (24 hours)

       -v :    Version

Since this was a test, I chose the minimum of five minutes to verify the settings.
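Since vios_advisor just prints its usage when the duration is out of range, a thin wrapper can validate the argument first. This is a sketch under the limits quoted in the usage statement above (5 to 1440 minutes); the ./vios_advisor path is an assumption about where you placed the tool.

```shell
#!/bin/sh
# Hedged wrapper: check the duration argument before launching vios_advisor.
valid_duration() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;        # empty or not a whole number
  esac
  [ "$1" -ge 5 ] && [ "$1" -le 1440 ]   # limits from the usage text
}
if valid_duration "${1:-}"; then
  echo "would run: ./vios_advisor $1"
else
  echo "usage: $0 minutes (5-1440)" >&2
fi
```

Called as `./run_advisor.sh 5`, it would kick off the same five-minute settings-verification run used here.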

$  vios_advisor 5

At the end of the test, I received a file called vios_advisor.xml. I copied that back to my PC, putting it in the directory that my vios_advisor.zip file was extracted to. Then I examined the report.

I’m sure IBM will continue to enhance the tool, helped along by user feedback.

So have you tried the VIOS advisor? How would you improve it?

We Could All Use Extra Capacity

Edit: I still like this analogy.

Originally posted January 3, 2012 on AIXchange

I was recently delayed at the San Jose airport. Such is the life of a consultant.

The problem this particular day was dense fog. Airplanes could take off, but, per FAA rules, they weren’t allowed to land. My flight was coming from Reno, stopping in San Jose and continuing to Las Vegas. But with the fog, the plane was rerouted straight to Las Vegas. That left me and a bunch of other passengers to fend for ourselves.

Luckily I was able to book tickets for a flight later that day. (Given my many miles logged with this particular airline, they were pretty accommodating.) I made some different connections through different cities and arrived home some six hours later than planned. Unsurprisingly, my checked luggage didn’t accompany me. It showed up the next day (which is why I generally only take along what I can carry on). But at least I was able to complete my travels in one day.

As a frequent flyer, I’ve been through worse. Several times I couldn’t get directly home. I had to fly into a nearby airport and take the train the rest of the way. Once I ended up in a rental car with three strangers as we drove from Pittsburgh to Chicago (again, fog was the issue). I’ve also shared a rental car with fellow weary travelers between Fresno and San Francisco.

Being stranded at the airport does give a person time to think. For instance, while I sat in San Jose, I thought how great it would be if, whenever a flight is delayed or diverted, a spare plane with a standby crew would be instantly available to take passengers to their destination.

Imagine a snowstorm or other weather event that causes disruptions throughout an entire region over several days. In my fantasy world there would be lightly loaded planes with available seats so everyone could rebook with little worry or hassle. In an even more perfect world, this extra capacity would be available at every airport, at any time. If one plane has a mechanical problem, another would be ready to go. And in peak travel times — say, Thanksgiving week — extra planes could be deployed as needed to satisfy the additional demand.

So basically, I was thinking how awesome it’d be if flying was like Power Systems hardware.

Just think about that. Think about capabilities like Capacity on Demand (COD), micropartitioning and shared processor pools. Think about how the hypervisor, on a millisecond by millisecond basis, can redistribute workloads and efficiently utilize the processors on your machine.

Instead of sizing standalone machines for peak workloads, you can put many LPARs on a single frame to more fully utilize that machine. If you need extra capacity, it’s available. If you’re using COD, you can fire up dormant processors or extra memory that’s already on the frame. Active memory expansion and active memory sharing allow you to do more with less physical memory. You have many options to get extra capacity built right into your systems.

Unfortunately, excess capacity was nowhere to be found on my most recent layover at the San Jose airport. But at least whenever I return from the road, I know I’ll have excess capacity to handle additional workload in my computing environment.

AIX and TCO

Edit: Some links no longer work.

Originally posted December 20, 2011 on AIXchange

Would you put bicycle tires on a new car? I keep hearing that analogy, and I like it.

When I was much younger and much less well off, I sort of did that, only instead of a bicycle, I made life more difficult when I bought used tires for my car. This wasn’t quite as foolish as choosing gas over oil, but bear with me.

A tire shop in town took in old tires when they sold customers new tires. Then they’d resell their used tires to poor kids like me. The shop charged $5 per old tire. That sure seemed like a bargain at the time, but I soon learned that you get what you pay for. My “new” old tires had no tread and wouldn’t hold air. Then I’d spring for more $5 tires. While occasionally I’d find some tires with a little more life in them, it wasn’t long before I concluded that I should just spend the extra money up front. Once I started purchasing new tires, my tire problems went away.

Now, instead of a car, think about your computers, and instead of tires, think about the operating system or applications that you run in your enterprise. Then ask yourself, why do people go with cheap computers and free operating systems?

Sure, they’re less costly up front, and we might be able to download an OS and get them to run on commodity hardware. But by taking this route, are we putting bald used tires on a fancy new sports car? Are we so focused on the cheap initial investment that we overlook the total cost of ownership (TCO)?

I like enterprise class servers and enterprise class operating systems. In the case of Power Systems, I like knowing that IBM designed the hardware, designed the hypervisor, designed the operating system, and is there to support them all. If I have issues, help is a quick phone call away. If I need to escalate an issue, I can easily gather an army of IBMers to help solve problems.

If I’m running an application that supports my business, and I need application availability, why would I want to buy used, worn tires?

Recently on Twitter, fellow Power Champion Andrew Wojnarek (@andywojo) said that Linux is free only if your time is worthless. I don’t want to go that far, because Linux has its uses. But assuredly, if given the choice, I’ll always choose AIX over Linux.

Of course, there are those who believe that Linux on VMware is good enough for their needs, as you can see from the comments on this post. What do you think?

Running nmon and topas

Edit: Some links no longer work.

Originally posted December 13, 2011 on AIXchange

Nigel Griffiths had a great session at this fall’s IBM Technical University on “nmon, topas and Friends.” I assume that you know that he actually wrote nmon (aka, “Nigel’s Monitor”). As such, it’s very enlightening to hear him speak about the history of the tool and his motivation for writing it. Besides obviously being very knowledgeable about the subject, he’s also a very entertaining and engaging public speaker.

Nigel mentioned how nmon and topas had come together, and detailed the history and timeline around nmon being officially supported by IBM. He then offered some tips and tricks for running both tools.

He mentioned that current versions of AIX (which you should be running) include a copy of topas_nmon. I assume it’s on your machine if you’re reading this post. We should all be using the current version. nmon “classic” should only be used if you’re running old versions of hardware and AIX, although Nigel recommends using ONLY the latest version (12e+). Keep in mind that nmon classic is functionally frozen.

Nigel said this endeavor started as a personal project, but he was soon deluged with requests for copies. He explained that the tool consumes less than 1 percent CPU and uses APIs, rather than AIX commands, under the covers. This is how he accomplished his goal of making nmon “small, simple and safe.”

While I can’t capture everything Nigel laid out in his presentation, I do encourage you to experiment and learn more about the tool. Run nmon -h and look at all of the different available options and statistics. As IBM now supports both nmon and topas, you have a choice when it comes to viewing performance data and talking to IBM about what you’re seeing on your systems.

With topas, see what you get by entering “P,” “E,” “D,” “L,” “V” or “F.” Be sure to capitalize. “E,” for instance, is for shared Ethernet adapters. Log into your VIO server, run topas and hit “E,” and you’ll see the network traffic going across your shared Ethernet adapter.

If you’re running virtual SCSI devices, try “D” and then “d” inside your VIOS to view virtual-to-physical disk mapping information.

Running topas -C gives you a view of all the LPARs across your physical machine (assuming you can access each LPAR over the network).

One nice thing is if you’re on a system and topas keeps refreshing, you can freeze the screen to conduct closer analysis. Just hit the space bar.

Now try nmon. Hit “l” (that’s a lower-case L) and watch as it gives you a long-term view of physical CPU. If you observe the display over time as your CPU works and idles, you should see the scale automatically change based on your machine’s activities.

Nigel also mentioned how we can use the Stephen Atkins tool, nmon analyzer, to graph and view our nmon output.

What are some of your favorite ways to customize your topas or nmon views?

As I’ve often noted, the IBM Technical University is a great educational experience. I think every one of Nigel’s sessions was standing-room only, and I know that Jay Kruemcke had to add another session on AIX trends and directions. I’m sure other presenters drew large crowds as well. The 2012 conference will be held in Las Vegas. Plan now so you can attend.

Getting Started with SDMC

Edit: I do not know very many people that got started or kept going with the SDMC.

Originally posted December 6, 2011 on AIXchange

Ready or not, the SDMC is on its way. I thought I’d get my feet wet by testing an SDMC virtual machine before I used the SDMC appliance. I ordered the SDMC DVDs and received the VMware and KVM versions. Given the option, I chose a VMware farm for the installation.

The DVD had quite a few large files on it, and rather than going on-site to physically load them, I just copied the files to my destination machine. Here’s the list of files I copied from the two DVDs.

Once I copied the files from the DVDs to my virtual machine, I ran the CreateSDMCova.bat script to convert the files into a single large .ova image file.

This is from the README.txt file:

How to build the SDMC .ova image
--------------------------------

1. Copy all files from this disk (DVD1) to your build directory.
2. Copy all files from the 2nd disk (DVD2) to the same build directory (from step #1).
3. From this build directory, run CreateSDMCova to build the .ova image which can be deployed using ovftool.

For Linux,
# ./CreateSDMCova.sh 

For Windows,
> CreateSDMCova.bat
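One easy stumble is running the build script before both DVDs have been copied into the same directory, so a quick pre-check that counts the image parts can save a broken build. This sketch assumes the SDMC_*.NN part-naming shown in the listing below; the demo directory and file names are throwaway examples.

```shell
#!/bin/sh
# Sketch: count the numbered image parts in a build directory before
# running CreateSDMCova. The expected count (19 parts, .00-.18) comes
# from this post's listing and may differ for other SDMC builds.
count_parts() {
  ls "$1" | grep -c '\.[0-9][0-9]$'   # files ending in a two-digit part number
}
d=$(mktemp -d)
for i in 00 01 02; do : > "$d/SDMC_demo.$i"; done   # demo files only
count_parts "$d"
```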

I was on a Windows machine, so I ran the CreateSDMCova.bat command, generating this output:

Creating .ova image….
SDMC_1119B_730_0512.00
SDMC_1119B_730_0512.01
SDMC_1119B_730_0512.02
SDMC_1119B_730_0512.03
SDMC_1119B_730_0512.04
SDMC_1119B_730_0512.05
SDMC_1119B_730_0512.06
SDMC_1119B_730_0512.07
SDMC_1119B_730_0512.08
SDMC_1119B_730_0512.09
SDMC_1119B_730_0512.10
SDMC_1119B_730_0512.11
SDMC_1119B_730_0512.12
SDMC_1119B_730_0512.13
SDMC_1119B_730_0512.14
SDMC_1119B_730_0512.15
SDMC_1119B_730_0512.16
SDMC_1119B_730_0512.17
SDMC_1119B_730_0512.18
        1 file(s) copied.

Then when I ran the dir command in my DOS window, I saw a new .ova file had been created from all of my image files:

170 CreateSDMCova.bat
144 CreateSDMCova.sh
57 md5sum
415 README.txt
419,430,400 SDMC_1119B_730_0512.00
419,430,400 SDMC_1119B_730_0512.01
419,430,400 SDMC_1119B_730_0512.02
419,430,400 SDMC_1119B_730_0512.03
419,430,400 SDMC_1119B_730_0512.04
419,430,400 SDMC_1119B_730_0512.05
419,430,400 SDMC_1119B_730_0512.06
419,430,400 SDMC_1119B_730_0512.07
419,430,400 SDMC_1119B_730_0512.08
419,430,400 SDMC_1119B_730_0512.09
419,430,400 SDMC_1119B_730_0512.10
419,430,400 SDMC_1119B_730_0512.11
419,430,400 SDMC_1119B_730_0512.12
419,430,400 SDMC_1119B_730_0512.13
419,430,400 SDMC_1119B_730_0512.14
419,430,400 SDMC_1119B_730_0512.15
419,430,400 SDMC_1119B_730_0512.16
419,430,400 SDMC_1119B_730_0512.17
312,432,640 SDMC_1119B_730_0512.18
7,862,179,840 SDMC_1119B_730_0512.ova
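The listing above includes an md5sum manifest, so before running the build script it’s worth verifying the copied image files against it. This sketch assumes the manifest uses the standard “checksum  filename” lines that md5sum -c reads, and it demos against a throwaway directory so the sketch is self-contained.

```shell
#!/bin/sh
# Sketch: verify copied SDMC image files against the md5sum manifest
# shipped on DVD1. md5sum -c exits non-zero if any file fails to match.
check_images() {
  ( cd "$1" && md5sum -c md5sum )   # $1 = the build directory
}
# self-contained demo with a throwaway directory and manifest:
d=$(mktemp -d)
echo "hello" > "$d/img.00"
( cd "$d" && md5sum img.00 > md5sum )
check_images "$d" && echo "images OK"
```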

I went into VMware’s vCenter and followed the wizard to deploy the OVF/OVA file:

After it deployed, I powered on the SDMC and brought up the console window in VMware so I could watch it boot up. It was similar to booting an HMC for the first time.

Once it came up, I was prompted to select the locale:

After accepting the license agreement, I reached the main wizard and completed the new install.

I filled in my date, time, passwords, IP configuration information, etc. Once it configured itself, it rebooted and brought up a login web page.

Incidentally, the sysadmin ID is now used for login (rather than hscroot). Once I logged in I reached this page.

I’ll add some systems soon, and in future posts I’ll discuss any issues I encounter.

Who else is test driving an SDMC? Have you gone live? Let me know in the comments.

The Hard Lessons of IT

Edit: It is still best to choose oil over gas when given the choice.

Originally posted November 29, 2011 on AIXchange

When I was 16, I got my driver’s license. One summer I had the opportunity to live in another state. However, my primary vehicle, a 1972 VW Bug, had to stay put. Still, I figured I needed wheels for that summer, so, after a year of busing tables at a Sizzler Steak House (and riding my bike to work), I saved enough to shell out $200 on a used station wagon.

It’s easy to understand why teenaged me made that choice. Speed and power. I seem to remember being told that the wagon had a “Buick 454.” I knew just enough about cars to understand that it had eight cylinders and that no one would be able to catch me in it. My friends’ sports cars looked better, but they couldn’t touch my beat up old station wagon. I never lost a race. (Of course it helped that none of my friends had a REAL sports car, but that old wagon was pretty fast.)

Still, as you might imagine, that purchase ended up being a very costly decision. First, I had to take it through state vehicle inspection, where issues with the muffler and exhaust system were found. I paid for the needed repairs (which, if memory serves me, cost at least as much as I’d spent on the wagon itself) and passed inspection. I thought I was ready to bask in my personal Summer of George, but my vehicle issues were only beginning.

The wagon burned oil–as much as a quart or two for every tank of gas I’d consume. And, if you weren’t already thinking it, the gas mileage on this thing was miniscule. So every few days that summer, I was buying oil and gas. I figured I had enough cash on hand to coast through those two months, but before long, I was almost tapped.

Eventually, I only had enough cash on hand for gas or oil. One, not both. As a result, my bad decision to get the wagon in the first place would be compounded by an even worse decision. I chose gas over oil. In no time at all that wagon ended up stalled on the side of the road, having thrown a rod. Then it was off to the junkyard.

Oh, and I made one other bad decision. Before my wagon was junk, I could have sold it. Someone was pestering me to buy it, and my plan was to unload it on him at summer’s end. Needless to say, that plan blew up with the motor.

I did at least learn from my mistakes. (For starters, it’s oil over gas. Always.) I guess that’s the important thing. If anything, it’s even more important now. Mistakes surely happen in IT, some even more costly than blowing an engine. Sometimes a backup doesn’t get made, and a system can’t be restored. Sometimes a backup is made but not tested, and thus doesn’t work when it’s needed. Sometimes shutdown -Fr gets run on the production box rather than the test box you thought you were logged into. Sometimes a VIO server is misconfigured and the production network goes down. Sometimes an rm command is run in the wrong directory.

These experiences lead to outages, and probably a few lost jobs. But the culprits undoubtedly learned from their mistakes.

What mistakes in IT have you made and, hopefully, learned from? I love hearing others’ IT horror stories (in part, admittedly, because they didn’t happen to me). So if you’ll indulge me, please share your learning experiences in Comments. I’m sure other readers will appreciate your stories as much as I would. And maybe by sharing you’ll keep someone else from making the same error.

Caching In

Edit: I have not done much geocaching lately.

Originally posted November 22, 2011 on AIXchange

Since I wrote about Watson and its appearance on “Jeopardy!,” I’ve become interested in the show’s famous human contestant, Ken Jennings.

Mind you I’d barely heard of Jennings when he was establishing his winning streak on “Jeopardy!” But post-Watson, I started following him on Twitter (@kenjennings) and have read two of his books, “Maphead” and “Brainiac.” I’ve found I really enjoy his writing style, both in his books and his short tweets. I guess I like his sense of humor. He’s even introduced me to a new hobby: geocaching.

Jennings writes about geocaching toward the end of “Maphead.” While I was aware of the pastime, and had even watched others find a cache or two, the geocaching bug never bit me, in part because I thought I’d have to get a costly standalone handheld GPS device. However, Jennings’ words motivated me, and I found I could simply get a free geocache app for my GPS-enabled phone. (I was pleasantly surprised to learn I didn’t have to pay for an app, though many “geocachers” do buy them.) Then I registered on geocaching.com and I was set.

In the book Jennings mentions some of the things his family discovered in their neighborhood as these cache hunts took them off their normal beaten path. Geocaching became a family thing for the McNellys as well. It didn’t take long to make our first discovery–or, for that matter, our first 10 discoveries. Yes, you’ll get some strange looks scanning the bushes along a walking trail, but it’s a great way to fill some downtime. Just go for a drive and look for some caches. Some of these hides are ingenious, and the containers themselves can surprise you–finds come in everything from old film canisters to ammo boxes to peanut butter jars. And yes, you may even get a gag container with coiled stuffed snakes that leap out at you.

Pretty much wherever I go, my geocache app informs me of nearby caches. While I didn’t have time to search while I was in Miami for October’s Technical University conference, I’ll definitely do some geocaching the next time I find myself in a new city. You can start by using your GPS to get turn-by-turn directions in your car. Once you’re within walking distance of your target, you simply switch to compass mode. Of course GPS accuracy can vary with the terrain, so technology doesn’t do everything for you. You still need to look around, or even look up: Some of my finds have been cleverly concealed in tree branches or hollowed-out tree trunks, among other hard-to-notice places.

It’s funny how my initial interest in Watson led me to Jennings, who encouraged me to spend time outside using my smartphone to play hide and seek. I guess life is full of odd little journeys though. Think about how you came to work with computers, or AIX specifically, or even how you found this blog. Wherever you’ve ended up in life, you likely have some good stories about how you got there.

As John Hughes wrote for his character Ferris Bueller, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”

The Command Line Isn’t for Everyone

Edit: This topic still comes up with new users.

Originally posted November 15, 2011 on AIXchange

As much as I rely on the VIO server, I understand that the command-line interface takes some getting used to for those who are new to it. This is especially true for anyone coming from a non-UNIX background (e.g., IBM i). Although IBM uses similar syntax (verb noun) between some AIX and IBM i commands, usage can be quite different.

Recently when I was showing some IBM i users how to map disks in VIOS, one asked why I just didn’t use the GUI. It turned out that he’d been shown a powerful tool that I wasn’t aware of.

Have you ever checked the “configuration/virtual resources” option on your HMC menu?

When you select the Virtual Storage Management option under Virtual Resources, you get a screen where you can choose the VIOS you want to work with.

Once you do that, select “Query VIOS” to view the storage details from your VIOS.

The Optical Devices tab provides views of your physical DVD drive (if attached to that VIOS) and the *.iso images in your media repository (if any). Other tabs have additional information, but in most cases I’m primarily interested in physical volumes, as I typically map whole LUNs to client LPARs. 

The Physical Volumes tab lists the disks that your VIOS can manage.

By clicking the appropriate radio button to choose a disk and then selecting Modify Assignment, you can choose the partition you want to assign your disk to. Using this method, the mkvdev command runs under the covers, eliminating the need to use the command line.
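For the command-line crowd, what the GUI issues under the covers is along these lines; hdisk5 and vhost0 are hypothetical placeholders for whatever backing disk and virtual SCSI server adapter you have.

```shell
#!/bin/sh
# Sketch: compose the VIOS mapping command the GUI runs for you.
# $1 = backing physical volume, $2 = virtual SCSI server adapter.
map_disk_cmd() {
  echo "mkvdev -vdev $1 -vadapter $2"
}
map_disk_cmd hdisk5 vhost0
```

Running the printed command on the VIOS (as padmin) creates the virtual target device that hands the whole LUN to the client LPAR.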

This interface can also tell you which disks are assigned to which partition, and how large the partitions are.

Although many of us AIX pros are comfortable with the command line and would never consider using a GUI, there are instances where it’s helpful–particularly if your IT department has admins who cut their teeth on something other than UNIX. You never know when you may need to teach someone how to map their disks. They just might catch on better with a GUI as opposed to a command line.

Proud to be a Champion

Edit: The first time I was named as an IBM Champion. Some links no longer work. There is even a video.

Originally posted November 8, 2011 on AIXchange

I may be late to the party, but I’ll still take a moment to toot my own horn. As Doug Rock and Steve Will note, I was recently recognized, along with 13 others, as an IBM Champion, and as an IBM Power Champion in particular.

“The IBM Champion program recognizes innovative thought leaders in the technical community—and rewards these contributors by amplifying their voice and increasing their sphere of influence. An IBM Champion is an IT professional, business leader, developer, or educator who influences and mentors others to help them make best use of IBM solutions and services. IBM Champions are not employees of IBM.”

I believe that this blog was a big part of the reason I was nominated, so I want to thank all of the readers for helping to make this possible. Between this blog and Twitter (@robmcnelly), hopefully I’ve been providing information that you’ve been able to use over the years.

I’ve been an IBM customer since 1988, when I started working as an AS/400 computer operator. Things were much simpler back then, but the systems that I managed were built to last and they seldom had problems. In that regard, nothing much has changed.

I was always impressed with our local customer engineer (CE). He’d come onsite, check how things were going, and proactively run diagnostics and check error logs on the machines. I can remember asking the CE about how he got started with IBM. Even then I admired the company.

Any time we called IBM Support, our problems were handled quickly. Even with the recent switch to call-back mode, I still believe I’m getting the timely support I’ve come to expect over the years.

I worked for a few different companies during my time on the AS/400. Later when I went back to school while continuing to work full-time, IBM recruiters came to campus. When they looked at my resume and saw that I had years of experience working with IBM products, they led me to the appropriate hiring manager who helped bring me on board with IBM.

I was an IBMer for six years in Boulder, Colo. I’ve been with my current employer for four years. They’re an IBM Premier Business Partner. So for my entire career, I’ve been either an IBM customer, an IBM employee or an IBM business partner.

“Apple fanboy” is a moniker that’s sometimes given to those who love Apple products. Along those lines, I guess I’m a “Power fanboy.” I love the platform and the operating systems that run on it. I love the virtualization capabilities, the performance and the reliability. And, as readers of this blog surely know by now, I love telling others about Power Systems servers. I’ve been reading the articles and following the tweets of other Power Champions for some time, which makes me all the more proud to be included in this group and recognized for my efforts.

If you know of someone worthy of recognition as an IBM Champion, please respond in Comments. I’d be happy to be involved with nominating others for this distinction.

Note: On a personal note, the end of Daylight Saving Time here in the U.S. this past weekend stirred up some old feelings. Check out my previous AIXchange blog entry on the topic.

What to Say About What You Do

Edit: Some links no longer work.

Originally posted November 1, 2011 on AIXchange

This recent Anthony English post got me thinking. When someone asked him what he did, he wasn’t sure how to respond. How do you answer that question? Can you explain what you do in a nice 30-second elevator pitch?

Luckily for me, Watson recently made a repeat appearance on “Jeopardy!” Watson has become the basis for my “pitch”: I simply tell people I work on the same servers that they saw on TV with Alex Trebek.

Of course, I’ll never be able to count on “my” servers appearing on television with any regularity, so I still need other ways to explain to others how I make my living. Or do I? As I’ve noted previously, telling people you work on computers can have some unwanted consequences. In their minds my admission is their opening to ask me to come over and resurrect their old, possibly virus-ridden machines. (“Can’t you just add a hard drive or memory or something?”) I generally counter those requests by explaining that I work on large enterprise servers running enterprise operating systems–in other words, small machines aren’t my specialty. Of course to a lot of folks, a computer is a computer.

(As an aside, why is it considered bad form to ask your doctor and lawyer acquaintances for free legal or medical advice, but few think twice about asking computer nerds they know for free help? Maybe they figure we have nothing better to do. Maybe we need to start quoting our hourly rates.)

So what do you say when you’re asked what you do? Do you talk about a typical day at work (one where machines haven’t blown up)? I’ve heard a system administrator liken his job to a plumber’s: nobody notices or needs one until something stinks. (That’s figuratively, one can only hope, in the admin’s case.)

If you’re reading this blog, I figure you’re involved with provisioning machines. You may not have built them, but you do care for them day to day. Interacting with your machines as much as you do, over time you may even get close to them. However, eventually you must put them out of their misery and upgrade to newer gear. (Another aside: It’s actually funny how quickly this cycle can run. You just installed the fastest, shiniest new hardware, but in a few short years you’re yearning for that new, state-of-the-art box.)

Ultimately, the way I explain what I do depends entirely on my audience and their frame of reference. To those in the industry, it’s simple: I start by saying I sell and install IBM’s Power hardware line and specialize in AIX. But to non-industry folks, I can’t have their eyes glazing over from my tales of patching, upgrading, installing, cabling, provisioning and deploying machines. Not to mention backups, restores, clones, LUNs, mirrors, migrations, copies, archives, scripts, cron jobs and the like. Maybe it’s enough for them to know that my specialty is enterprise IBM hardware. Or maybe I just say I work with computers.

What do you do?

Customized Comfort

Edit: All links still work at the time of this writing.

Originally posted October 25, 2011 on AIXchange

Do you have a nice customized shell and environment? Do you have a wonderful prompt that displays your current working directory and username? Does it change your terminal window name when you login?

Do you have aliases set up so that things like oem_setup_env or ls -la are as easy as typing “oe” and “ll”? Is your PATH variable set so that you don’t have to explicitly enter the path to the command you’re trying to run? Are your favorite settings and commands–like “set -o vi,” “stty erase ^?” or rm asking you if you really want to delete that file–all ready to run when you login?
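For what it’s worth, here’s a sketch of a minimal .profile along those lines. The alias names and prompt format are just illustrative choices, not anything standard:

```shell
# Sketch of a ~/.profile with the sorts of customizations described above.
set -o vi                          # vi-style command-line editing
alias oe='oem_setup_env'           # shortcut for the padmin escape shell
alias ll='ls -la'                  # long listing
alias rm='rm -i'                   # make rm confirm before deleting
PATH=$PATH:/usr/sbin:/usr/local/bin
HOST=$(uname -n)
PS1='$LOGNAME@$HOST:$PWD $ '       # ksh re-expands the variables at each prompt
if [ -t 0 ]; then                  # only touch tty settings in interactive shells
    stty erase '^?'
fi
export PATH PS1
```

Keep a copy of a file like this handy and one scp gets a new LPAR feeling like home.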

Like anyone, I love having the same prompts, scripts and tools available across all of the LPARs I manage. This gives me the same familiar look and feel each time I login to my machines. I’m sure you can relate. When you work for a company you can set things up the way you want–at least as far as your own user ID is concerned. Of course, customizing your team’s root login usually involves some level of compromise when multiple admins are involved. How do you decide which customizations will run with the root ID?

As a consultant, much of my work these days involves others’ machines. The various sites I travel to are all customized to others’ specifications. I cannot just login and change things to fit my preferences. In fact, when I do new OS installs, I generally don’t have anything other than defaults to work with anyway. I always figure I’ll need to login and quickly run “set -o vi.” That’s usually the minimum of what I need to get by (although “stty erase ^?” is a close second on many systems). I just need to be sure to periodically run “uname -a” and pwd so that I know which system I’m on and which filesystem I’m in.

This is my job of course, but these experiences are nowhere near as nice as working on my own machines and having the prompt set up to give me the information I need in the manner that I like, no matter where I am. However, facing the unfamiliar isn’t all bad. I’ve picked up good ideas from the customers I visit, and sometimes they borrow some of my customization preferences.

I assume most of you work in one environment. How do you customize it? Engage me in a thought experiment: What if your favorite tools and scripts weren’t always available to you? What are the first things you like to add to a new system? Are you so set in your ways that you’d freak out working in a bare-bones new install environment?

Using backupios

Edit: Some links no longer work.

Originally posted October 18, 2011 on AIXchange

In a recent AIXchange blog entry I discussed using the viosbr command to back up VIO server settings. Now I’ll tell you about backupios. Both commands should be used in your VIOS environment.

While viosbr allows you to restore mappings, backupios is used to restore the whole VIOS operating system. So think of backupios as your VIOS’s mksysb:

“[backupios] creates an installable image of the root volume group, either onto a bootable tape, file system or DVD.

“The backupios command creates a backup of the Virtual I/O server and places it onto a file system, bootable tape or DVD. You can use this backup to reinstall a system to its original state after it has been corrupted. If you create the backup on tape, the tape is bootable and includes the installation programs needed to install from the backup.

“If the -cd flag is specified, the backupios command creates a system backup image to DVD-RAM media. If you need to create multi-volume discs because the image does not fit on one disc, the backupios command gives instructions for disk replacement and removal until all the volumes have been created.

“(Note: Vendor disc drives may support burning to additional disc types, such as CD-RW and DVD-R. Refer to the documentation for your drive to determine which disc types are supported.)

“If the -file flag is specified, the backupios command creates a system backup image to the path specified. The file system must be mounted and writable by the Virtual I/O Server root user prior to running the backupios command (see mount command for details). Backing up the Virtual I/O Server to a remote file system will create the nim_resources.tar image in the directory you specify. The Virtual I/O Server must have root write access to the server on which the backup will be created. This backup can be reinstalled from the HMC using the installios command.

“The backupios command empties the target_disks_stanza section of bosinst.data (which is part of the nim_resources.tar image) and sets RECOVER_DEVICES=Default. This allows the mksysb file generated by the command to be cloned to another logical partition. If you plan to use the nim_resources.tar image to install to a specific disk, then you need to repopulate the target_disks_stanza section of bosinst.data and replace this file in the nim_resources.tar image. All other parts of the nim_resources.tar image must remain unchanged.”

When I take backups, I typically think in terms of having access to a NIM server in my environment, so I’m just interested in the VIOS mksysb. I like to run:

backupios -file vio.mksysb -mksysb -nomedialib

Using the -nomedialib flag means I exclude the media library, so I’m not backing up all of those .iso images that hang around in my VIOS’s /var/vio/VMLibrary filesystem. It’s pointless to waste that space on a bunch of CD images (.iso files), since they’re generally simple to recreate if need be. (Of course there are exceptions, so by all means back up any images that are NOT easily recreated.)
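Since I’m thinking NIM, the natural next step is to define that mksysb as a NIM resource so it’s ready for a reinstall. Here’s a sketch; the resource name, mount point and path are hypothetical. The backupios runs as padmin on the VIOS, and the nim define runs on the NIM master after the file has been copied over:

```
$ backupios -file /mnt/vio.mksysb -mksysb -nomedialib
$ nim -o define -t mksysb -a server=master \
      -a location=/export/mksysb/vio.mksysb vios1_mksysb
```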

Again, be sure to back up your VIOS environment with both viosbr and backupios. Together, they give you the tools you need should something go wrong.

IBM Updates AIX, POWER7 Lineup

Edit: Have you migrated off POWER7 yet?

Originally posted October 11, 2011 on AIXchange

I install POWER7 systems at customer sites all around the country. Once customers get their hands on these new systems, I find that people are wowed by the hardware speed. Especially impressed are those customers who upgrade from machines a generation or two back, like POWER5 machines.

This week IBM is announcing some changes to AIX and the POWER7 lineup. Although the entry servers will still be known as the 710, 720, 730 and 740 and the enterprise servers will still be called the 770 and 780, they will all have new model and machine type numbers. This is intended to help customers differentiate the new servers from the old, though it’s important to understand that these machines are not POWER7+. General availability is set for Oct. 21.

Here are the new numbers:

Model      Machine Type

  • 710 8231-E1C
  • 720 8202-E4C
  • 730 8231-E2C
  • 740 8205-E6C
  • 770 9117-MMC
  • 780 9179-MHC

All of the POWER7-enhanced systems end with the letter C, while of course the current models end in B, so it’s easy to determine which system type you have.

Another change made in the interest of clarity is that the 710 and 730 no longer share the same machine type and model. Also note that the 740 is no longer available as a tower — it’s exclusively rack-mounted now.

Enhanced I/O Capabilities and Higher Memory Densities 

The biggest changes in the hardware revolve around the enhanced I/O capabilities and the increased memory densities across the servers. The servers all benefit from PCIe Gen2, which, according to the announcement details that I saw, provides “twice the I/O bandwidth which will enable higher performance, greater efficiencies and more flexibility.” Keep in mind that if you’re not driving your Gen1 PCIe adapters to the point where they become your bottleneck, simply switching to Gen2 won’t magically give you better performance. However, you will get better utilization of the hardware going forward with Gen2.

PCIe Gen2 provides for more I/O ports available per adapter. You’ll now see dual port 10G Ethernet cards and 4-port 8G fibre adapters. You’ll be able to push SAS data out at 6G per second vs. the current generation’s 3G per second. The new 5913 Large Cache SAS adapter has 1.8 GB cache and can drive up to 72 HDD or 26 SSD, or you can mix and match the drive types with this adapter. A huge improvement with this card is that it no longer has batteries, so you won’t have to worry about replacing them. If it loses power the card will use a capacitor and write to flash memory. Note that this card won’t be available before Oct. 31.

Gen2 allows you to more fully virtualize your systems by pushing more I/O with fewer adapters. With the new Gen2 adapters, you’ll benefit whether it’s fibre, SAS, networking or infiniband. Moving forward, we can stop thinking about PCI-X and concentrate solely on PCIe.

These new systems have more PCIe I/O slots in the CEC, with greater functionality per slot. The familiar IVE/HEA adapter is replaced with a standard 2-port 1 Gb Ethernet card (on the entry systems) and an integrated multifunction card (on the enterprise machines). The latter consists of a 4-port card with two 10 Gb Ethernet ports and two 1 Gb Ethernet ports, plus USB ports and a serial port.

There were four card slots in the entry-level CEC; now the entry systems have five slots that can be populated, while the enterprise machines have six slots per CEC. Counting the optional half-height cards that can be added to the 720 and 740, plus the standard Ethernet card that comes with the system, you can have up to 10 cards in total (though you can’t use another card in place of the Ethernet card in that slot).

This announcement also includes new DIMM sizes: 64 GB in the enterprise server space and 16 GB in the entry systems. This allows the new “C” models to have greater maximum memory: 128 GB on the 710, 256 GB on the 720 and 730, and 512 GB on the 740. The new 770 and 780 models can have up to 4 TB of memory in the 4 node system, 1 TB per CEC.

If you need even more cores, a 96-core large capacity 780 server is available. Imagine pairing up 96 cores and 4 TB of memory on your 780. In addition, a clock speed tweak brings the 770 to 3.3 and 3.7 GHz, depending on whether you choose six or eight cores per socket. The 780 can max out at 3.92 GHz.

Finally, watch for larger capacity 15K SFF SAS drives and a 1 TB RDX removable disk drive. The latter is positioned as an intriguing alternative to tape.

As you’d expect, customers can continue to upgrade to the latest technology from existing systems, including POWER6 570s and 520s.

PowerVM and AIX Updates

Besides hardware improvements, changes are coming with PowerVM and AIX. Active memory mirroring is a feature in which the hypervisor maintains two copies of its memory, with both copies updated simultaneously. In the (rare) event of a hypervisor memory failure on the primary copy, the second copy takes over and notification is sent to IBM. This capability was previously available on the 795; on the new machines it comes standard on the 780 and as an option on the 770.

With AIX 7 TL1 expect to see a new feature called active system optimizer, which is designed to autonomically improve workload performance (AIX 7 on POWER7 only). A new network option you can set is called tcp_fastlo, which enables TCP fast loopback. This reduces TCP/IP overhead and lowers CPU utilization if two TCP endpoints are on the same frame (e.g., communication between two processes in the same LPAR).
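As I understand it, tcp_fastlo is a regular no tunable, so enabling it would look something like this sketch (check that the tunable actually exists at your AIX level before setting it):

```
# no -h tcp_fastlo        # confirm the tunable is available and read its help
# no -p -o tcp_fastlo=1   # turn it on now and make it persist across reboots
```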

In addition, AIX features JFS2 filesystem enhancements that allow admins to tune performance by altering filesystem caching. This can be accomplished without having to unmount filesystems. Compared to earlier AIX releases, there’s a 50 percent reduction in JFS2 memory usage for metadata.

Other software enhancements include:

  • A new logical volume manager option to retry failed I/O operations indefinitely. This capability can aid in recovery from transient failure of SANs, for instance.
  • AIX 5.3 WPARs, which follow on the current AIX 5.2 WPAR offering. This allows you to run 5.3 workloads inside of AIX 7 into the future (i.e., even after IBM eventually ends its support of AIX 5.3). AIX 5.3 TL12 SP4 is required to make use of the 5.3 WPARs.

With the new C models, these versions of AIX and VIOS are supported:

AIX 5.3 TL12 SP5
AIX 6.1 TL5 SP7
AIX 6.1 TL6 SP6
AIX 7.1 TL0 SP4
AIX 7.1 TL1
VIOS 2.2.1

  • A new offering called PowerSC provides automated tools for security and compliance standards on PowerVM virtual machines. Using trusted logging, you can capture and compile AIX audit information from LPARs in real-time. (Did someone make a dynamic change to an AIX LPAR?) Trusted boot cryptographically signs and validates boot images before they’re started, while trusted network connect verifies that a boot image that’s trying to connect to the network is at the correct security patch and update level. Finally, prebuilt compliance profiles match industry standards like PCI, DOD and SOX.
  • Another new capability is active memory deduplication. It’s available on the new machines running the new firmware, and is used in conjunction with active memory sharing. Active memory deduplication allows systems containing duplicate memory pages to remove those duplicates while fitting similar workloads within any physical memory constraints.
  • PowerVM offers its own improvements. Live partition mobility operations can potentially run at twice the previous speed while performing up to eight LPM operations at once. Network balancing allows for load balancing across backup and primary shared Ethernet adapters. Shared storage pools are also enhanced. These PowerVM capabilities are available on the new VIO server. I’ll definitely write much more on this soon.
  • A new entry level analytics system, the 7710, is meant for customers that don’t need the full capacity of the existing 7700 offering. While coming in at about half the price of the 7700, the 7710 is a fully optimized and integrated solution that can be used in test and dev environments. It’s targeted for those with data warehouses under 10 TB.

There are also updates to PowerHA SystemMirror, including an SAP LiveCache Hot Standby solution, and PowerHA Federated Security, which provides for centralized administration via System Director, along with additional supported storage options to use with HA (including XIV, the V7000, the SVC and DS8800 and options from EMC, Hitachi and HP).

Finally, keep an eye out for coming changes to documentation, installation, configuration, management and packaging. Although some of these improvements aren’t quite ready, IBM’s intention is to make PowerVM quicker and easier to install and configure. Look forward to things like no-touch VIOS installation, GUI-based VIOS installs, VIOS setup and validation tools, and the capability to manage VIO servers as a pair rather than individually.

Backing Up VIOS

Edit: I still find customers that are not taking good backups. Some links no longer work.

Originally posted October 4, 2011 on AIXchange

Once you’ve set up your VIO server (VIOS), mapped the disks and configured everything, one question remains. How are you going to back up those settings? The answer is the viosbr command. I wrote about this back in January 2010, but I’m not sure how many people are using it. You’ll find much more about viosbr here.

From the website:

“[Viosbr] performs the operations for backing up the virtual and logical configuration, listing the configuration and restoring the configuration of the Virtual I/O Server. The viosbr command can be run only by the padmin user.

“This viosbr command backs up all the relevant data to recover a Virtual I/O Server after a new installation. The -backup parameter backs up all the device properties and the virtual devices configuration on the Virtual I/O Server. This includes information regarding logical devices, such as storage pools, file-backed storage pools, the virtual media repository and PowerVM Active Memory Sharing (AMS) paging devices. It also includes the virtual devices, such as Etherchannel, shared Ethernet adapters (SEAs), virtual server adapters and server virtual fibre channel (SVFC) adapters.

“Additionally, it includes the device attributes, such as the attributes for disks, optical devices, tape devices, Fibre Channel SCSI controllers, Ethernet adapters, Ethernet interfaces and logical Host Ethernet Adapters (HEAs). All the configuration information is saved in a compressed XML file. If a full path is not specified with the -file option, the file is placed in the default location /home/padmin/cfgbackups. This command can be run once, or on a schedule by using the -frequency parameter with the daily, weekly, or monthly option. Daily backups occur at 00:00, weekly backups on Sunday at 00:00, and monthly backups on the first day of the month at 00:01. The -numfile parameter specifies the number of successive backup files that will be saved, with a maximum value of 10. After reaching the given number of files, the oldest backup file is deleted during the next backup cycle. The format of the file name is .xx.tar.gz, where xx starts from 01.

“The viosbr command does not back up the parent devices of adapters or drivers, device drivers, virtual serial adapters, virtual terminal devices, kernel extensions, the Internet Network Extension (inet0), virtual I/O bus, processor, memory, or cache.

“The -view parameter displays the information of all the backed up entities in a formatted output. This parameter requires an input file in a compressed or noncompressed format that is generated with the -backup parameter. The -view parameter uses the option flags type and detail to display information in detail or to display minimal information for all the devices or for a subset of devices. The -mapping option flag provides lsmap-like output for Virtual Small Computer System Interface (VSCSI) server adapters, SEA, SVFC adapters and PowerVM Active Memory Sharing paging devices. The entities can be controllers, disks, optical devices, tape devices, network adapters, network interfaces, storage pools, repositories, Etherchannels, Shared Ethernet Adapters, VSCSI server adapters, SVFC adapters and paging devices. The -list option displays backup files from the default location /home/padmin/cfgbackups or from a user-specified location.”
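Putting the quoted flags together, a typical viosbr cycle might look like this sketch (the file names are illustrative; run it as padmin):

```
$ viosbr -backup -file myvios                   # lands in /home/padmin/cfgbackups
$ viosbr -view -file myvios.01.tar.gz           # inspect what was captured
$ viosbr -backup -file myvios -frequency daily  # take a fresh backup every day
$ viosbr -restore -file myvios.01.tar.gz        # replay the config after a reinstall
```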

Although viosbr is great for capturing mappings and the like, you must still run the backupios command if you plan on creating a mksysb of the VIOS root volume group.

Although you may be backing up your client LPARs, you should also be backing up your VIOS.

Logging Fibre Cards into a Switch

Edit: Some links no longer work.

Originally posted September 27, 2011 on AIXchange

I recently worked with a customer that was trying to figure out how to log their fibre cards into a switch before loading an OS onto the LPAR.

I immediately thought of this recent documentation. Although this information is intended for NPIV clients, it worked just fine for our standalone LPARs and physical fibre cards.

“If a vfc-client device is defined for an LPAR which is already running an operating system, then if/when the operating system opens the vfc-client device, the device will log in to the SAN. But in some cases it is desirable to force a vfc-client device to log in to the SAN before an operating system is installed.

“SSH to an HMC which is managing the LPAR. Use the vtmenu command on the HMC to open a virtual terminal session on the LPAR’s system console. On the HMC GUI, select the server on which the LPAR resides, then select the LPAR, and shut the LPAR down if it is running. Then, use Operations>Activate>Profile>Advanced … to open the Activate Logical Partition-Advanced window. In the window, select Boot mode: Open Firmware OK Prompt. In the LPAR’s system console window, you will see the LPAR start up and present the open firmware prompt (“0 >”).”

We followed these instructions, booted our LPAR and ended up at the 0 > prompt.

We then ran ioinfo from the Open Firmware OK Prompt:

    0 > ioinfo

I then saw:

    Select a tool from the following
    1. SCSIINFO
    2. IDEINFO
    3. SATAINFO
    4. SASINFO
    5. USBINFO
    6. FCINFO
    7. VSCSIINFO
    q – quit/exit
    ==> 6

I selected option 6 to run FCINFO:

    FCINFO Main Menu
    Select a FC Node from the following list:
      #  Location Code                Pathname
    —————————————————————
      1. U8233.E8B.0623B7P-V5-C21-T1    /vdevice/vfc-client@30000015
      2. U8233.E8B.0623B7P-V5-C22-T1    /vdevice/vfc-client@30000016
      q – Quit/Exit
    ==> 1

I then selected the correct fibre channel port. (I was using two physical 2-port fibre adapters. The example above shows some virtual fibre adapters.) Then I selected 1 to list the attached FC devices. It took a minute and then it logged into the switch.

Once that was done the SAN guys did their zoning magic and we were able to boot from NIM and install the OS on SAN LUNs.

This saved the SAN guys the aggravation of manually entering WWNs, and in one case it allowed us admins to discover that one of the fibre cables hadn’t been connected to the card. Once we attached that cable and reran the FCINFO command, it logged right in.

I’ve booted LPARs from NIM servers in the past to do the same type of thing, but how about you? How do you like to set up your machines and get them logged into the SAN?

Note: A quick reminder about the upcoming IBM Power Systems Technical University conference in Miami. It starts on Oct. 10, so register soon if you plan on attending. And be sure to follow #ibmtechu on Twitter for more information.

Moving a Filesystem

Edit: Link to technote no longer works.

Originally posted September 20, 2011 on AIXchange

More than once I’ve found myself on a system where all of the filesystems were placed in rootvg rather than split out into different volume groups. By default, the mksysb backs up all of rootvg. You can set up exclude lists, but then you must remember to maintain those lists. If someone adds another filesystem to rootvg without excluding it from the mksysb, those backups can become huge. In a perfect world we’d keep our mksysb files small by putting non-rootvg filesystems in application volume groups.
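For reference, exclude lists live in /etc/exclude.rootvg, and mksysb -e treats each line as a grep pattern matched against the ./-prefixed file names in the backup. The paths here are made up for illustration; note that every line in the file is a pattern, so there are no comment lines:

```
^./export/bigdata/
^./tmp/
```

With that file in place, mksysb -e /dev/rmt0 (or a NIM mksysb taken with an exclude_files resource) skips anything that matches.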

I had a filesystem that was mistakenly placed in rootvg on an AIX 6 machine. I wanted to move that filesystem into datavg. It was a very simple procedure. When searching for help, I found this IBM technote.

First I ran:

umount /export/myfilesystem

Then I ran:

cplv -v datavg fslv01

This returned with:

cplv: Logical volume fslv01 successfully copied to fslv03

The document told me to run logform, but when I did I got:

logform /dev/fslv03
logform: 0507-503 file system /dev/fslv03 does not exist. Change
logical volume type to be jfs2log for an outlinelog

I double checked in /etc/filesystems, and sure enough I had an inline log set up. So I decided to gamble and just run the chfs command as outlined in the document:

chfs -a dev=/dev/fslv03 -a log=INLINE /export/myfilesystem

Then I ran my fsck as instructed:

fsck -p /dev/fslv03

Then I mounted my filesystem:

mount /export/myfilesystem

Like magic, my filesystem moved and it was all pretty painless.

I realize that this method requires unmounting the filesystem that’s being moved, so this maintenance may need to occur during off-hours. Still, it’s nice to know that the option exists to move a filesystem to another volume group should the need arise.

Now I have a clean rootvg and my application data is in datavg.

Higher Availability for VIO Clients: An Alternative

Edit: Some links no longer work.

Originally posted September 13, 2011 on AIXchange

As I’ve noted, VIO server configuration can be tricky. But while I was sitting in on Steve Knudson’s NIM presentation, he shared a unique solution for providing higher availability for VIO clients.

In VIO server environments, automatic failover is set up with shared Ethernet adapters on VIO servers. Though this is an effective solution, problems can result if the control channel isn’t properly configured. Another drawback to this method is that, with ever-increasing adapter speeds, it feels wasteful to have one or more 10 Gb network adapters just sitting idle until a VIOS fails.

Steve’s recommendation for better utilizing network adapters is actually spelled out in this document, “Using Virtual Switches in PowerVM to Drive Maximum Value of 10Gb Ethernet.”

The authors, Glenn E. Miller and Kris Speetjens, recommend an alternative to automatic failover. They suggest enabling both VIO servers to be active at the same time, and using network interface backup (NIB) at the VIO client level. This way the administrator can manually choose which LPAR uses which VIO server, and load balance that way. In the process, we end up using all the network adapters that we paid for, which is a good thing.

From the document:

“Something that we haven’t pointed out thus far in the discussion is the fact that redundancy does have its drawbacks. The backup adapter is fundamentally unused unless a failure occurs. In the example depicted in Figure 2, there are three physical adapters and their corresponding Ethernet switch ports that are never used except when a failure condition occurs. These ports have associated costs. Within the more common 1-GB environment, it’s not too drastic. However, in the 10-GB environment it’s vastly different. One customer estimated that it cost them $16,000 for each 10 Gb/s connection provided in their data center, taking into account the cost of the Ethernet adapter, cabling and the proportionate cost of the chassis, blade and port of the Ethernet switch. Obviously, 10 GB connectivity is going to be a necessity in the near future as customers continue to consolidate more and more workloads onto smaller, much more powerful systems. However, it may be difficult to justify 40 GB worth of bandwidth when only 10 GB will be utilized.

“A significant benefit to this design is that both VIO servers can be active at the same time. Of course, each individual client LPAR is only using one, but half of the clients could be configured to use VIO Server 1 and the other half to use VIO Server 2 as their primary paths. Each client would failover to its respective secondary path in the case that its primary path was lost. So the customer’s investment in hardware is more effectively utilized.

“Protection against this scenario is accomplished by configuring two VIO servers on each Power Systems frame and assigning resources to the VIO clients from both VIO servers. The ‘classic’ design that allows use of VLAN tagging (Figure 2) uses a control channel to allow the VIO servers to detect a failure and handle Ethernet traffic accordingly. The vSwitch design handles this at the client level by pinging external resources and failing over the Client NIB Etherchannel when a threshold of failed pings is reached.

“The classic design’s advantages are that it requires no configuration at the VIO client level and all clients can be migrated from one VIO server to another with the execution of one command on the VIO server during system maintenance. The disadvantage of the classic design is that only one VIO server is carrying Ethernet traffic at any time, which means a system is only utilizing 50 percent of its available bandwidth at any time. It also means that there is no way to test if the failover link is correct without failing over every VIO client on a frame. The vSwitch design’s advantages are that it allows both VIO servers to carry Ethernet traffic at the same time. This means that administrators are given more granular control over moving Ethernet traffic from one VIO server to another as well as utilizing a higher percentage of bandwidth during normal operations. The disadvantage of the vSwitch design is that it requires every VIO client (which uses Network Interface Backup to verify path integrity) to ping an address outside of the frame to test for failures.”

The document details the pros and cons of this option, as well as explaining how to set it up. It’s well worth reading in its entirety.

So do you see any reasons not to implement this in your environment?

Steve Knudson on NIM

Edit: Some links no longer work.

Originally posted September 6, 2011 on AIXchange

At the recent tech briefing I attended, IBMer Steve Knudson had a great session called “NIM Master Tuning and NIM Master Group Migrations.” (He also covers some of this material in this techdoc.) One thing Steve explained in the session was how to get better NIM server performance when you have several clients enabled for installation. I know I’ll put this information to use when I’m building out new servers; I often end up deploying dozens of LPARs at once across multiple frames.

He pointed out that with all of this activity happening at the same time, you might experience slow processing when enabling the next NIM client install or resetting a NIM client due to the extensive rereading of /etc/exports. In addition, the NIM master will end up un-exporting and re-exporting NIM resources for different sets of NIM clients.

From the techdoc:

“Consider setting global_export=yes. If you perform frequent simultaneous installs, when one install completes, the default behavior of the master is to unexport NFS exports, remove the completed client from the export lists and re-export the filesystems. During this interval, other ‘in-flight’ client installs may see the message, ‘NFS server not responding, still trying’ on the client console.”

As an alternative to the traditional way of exporting NIM resources to each client, you can export NIM resources as read-only for all enabled NIM clients. The NIM master will keep them set to read-only until the last client install completes. With no clients enabled, and no reservations held for any resource, you can run:

nim -o change -a global_export=yes master

If you run a showmount -e before and after making the change, you can see the difference. Before the change the resources were exported to particular clients, while after they’re exported read-only for all users. This is from Steve’s document:

    showmount -e
    export list for bmark29:
    /export/mksysb/image_53ML3 sq07.dfw.ibm.com,sq08.dfw.ibm.com
    /export/53/lppsource_53ML3 sq07.dfw.ibm.com,sq08.dfw.ibm.com
    /export/53/spot_53ML2/usr sq07.dfw.ibm.com,sq08.dfw.ibm.com

    With global_export, exports are read-only for everyone:

    # exportfs
    /export/mksysb/image_53ML3 -ro,anon=0
    /export/53/lppsource_53ML3 -ro,anon=0
    /export/53/spot_53ML3/usr -ro,anon=0

Steve does note the potential security issue here, but unless you’re worried about users getting access to data in your mksysb file or some such thing, I don’t see it as a big deal if others view your AIX install content.
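If you script your builds, it can be handy to confirm the master’s setting before enabling a batch of clients. Here’s a small sketch; the helper function and the sample here-doc are mine (standing in for live `lsnim -l master` output), not from Steve’s document:

```shell
#!/bin/sh
# Hypothetical helper: report the NIM master's global_export setting by
# parsing `lsnim -l master` style output. The sample text below stands
# in for a live master so the parsing can be followed anywhere.
get_global_export() {
    awk -F'=' '/global_export/ { gsub(/ /, "", $2); print $2 }'
}

# On a real NIM master you would pipe in: lsnim -l master
sample='master:
   class          = machines
   type           = master
   global_export  = yes'

printf '%s\n' "$sample" | get_global_export
```

On a real master you’d replace the here-doc with the live command and act on a result of yes or no.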

Steve also points out that you can change the max_nimesis_threads attribute from the default of 20 to support a high number of simultaneous installs (16 or more). For example:

nim -o change -a max_nimesis_threads=60 master

Finally, Steve says that while the networking defaults should be fine on a default AIX install, this can be verified by running:

ifconfig -a

When doing so, look for tcp_sendspace, tcp_recvspace, and rfc1323.

Also check that use_isno is set with:

no -a | grep isno
use_isno = 1

As this is a restricted setting in AIX 6.1, the -F flag must be used:

no -F -a | grep use_isno
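Checking a handful of tunables by hand gets tedious across many LPARs; here’s a sketch of pulling one value out of captured “name = value” output. The sample values below are placeholders, not recommendations — on a live system you’d pipe in `no -a` (or `no -F -a`) instead:

```shell
#!/bin/sh
# Sketch: extract one tunable's value from "name = value" lines, the
# format `no -a` produces. Sample values here are illustrative only.
get_tunable() {
    awk -v t="$1" '$1 == t { print $3 }'
}

sample='tcp_sendspace = 262144
tcp_recvspace = 262144
rfc1323 = 1
use_isno = 1'

echo "$sample" | get_tunable use_isno
```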

Steve is a go-to authority on NIM. People still rely upon his NIM basics and advanced slides (from presentations in 2007) to set up their master servers and get in-depth details. It was great listening to Steve’s NIM expertise in person.

Important HMC Fix

Edit: Hopefully none of you are still running this version. Some links no longer work.

Originally posted August 30, 2011 on AIXchange

This information has been circulating for a while, and Anthony English covers the topic here and here. But I want to make sure HMC users are aware of this important update, and that you have the fix loaded if you’re at V7R7.3.0.

A problem is known to exist when using dual HMCs in one of two environments: either one HMC is at a different level than the other, or both HMCs are at the base HMC V7R7.3.0 level without fixes.

The problem is possible exposure to corruption that could cause you to lose partition profiles.

A fix is available and should be installed immediately on any HMC that might possibly be impacted by this problem.

If you’re using an HMC and an SDMC, be sure to get the fix for the SDMC as well.

From the IBM technical bulletin:

“This PTF was released July 18, 2011, to correct an issue that may result in partition configuration and partition activation profiles becoming unusable. This is more likely to occur on HMCs that are managing multiple systems. A symptom of this problem is the system may display Recovery and some or all profiles for partitions will disappear. If you are already running HMC V7R7.3.x, IBM strongly recommends installing PTF MH01263 to avoid this issue. If you are planning to upgrade your HMC to the V7R7.3.x code level, IBM strongly recommends that you install PTF MH01263 during the same maintenance window to avoid this issue.”

The efix can be found here. This package includes these fixes:

  • Fixed a problem where managed systems lose profiles and profiles get corrupted, resulting in a Recovery state that prevents DLPAR/LPM operations.
  • Fixed a security vulnerability with the HMC help content.

As noted, this is the statement IBM released in July, before the fix became available. The fix, PTF MH01263, is now out, so be sure to install it.

Again, from IBM:

“Abstract: HMC / SDMC Save Corruption Exposure
Systems Affected: All 7042s
Communicable to Clients: Yes

“Description:
IBM has learned that HMCs running V7R7.3.0 or SDMC running V6R7.3.0 could potentially be exposed to save area corruption (where partition profile data is stored).

“Symptoms include loss of profiles and/or recovery state due to a checksum failure against the profiles in the save area. In addition, shared processor pool names can be affected (processor pool number and configuration are not lost), system profiles lost, virtual ethernet MAC address base may change causing next partition activation to fail or to have different virtual Ethernet MAC addresses, loss of a default profile for all or some of the partitions.

“Partitions will continue to run, but reactivation via profile will fail if the profile is missing or corrupted. All mobility operations and some DLPAR operations will fail if a partition has missing or corrupted profiles.

“Environments using HMCs or SDMCs to control multiple managed systems have the greatest exposure. Triggers for exposure include any of the following operations performed in parallel to any managed system: Live Partition Mobility (LPM), Dynamic LPAR (DLPAR), profile changes, partition activation, rebuild of the managed system, rebooting with multiple servers attached, disconnecting or reconnecting a server, hibernate or resume, or establishing a new RMC connection.

“Recommended Service Actions:
Prevention/Workaround:
There is no real work-around other than limiting the configurations to a single HMC managing a single managed system.

“Customers who have not yet upgraded or installed HMC 7.7.3 should delay the upgrade/install if at all possible until a fix is available.

“Customers who have not yet installed and deployed SDMC 6.7.3.0 should avoid discovering production servers until a fix is available.

“Customers that have 7.7.3 or SDMC 6.7.3.0 deployed should:

  • Immediately do a profile backup operation for all managed servers:

    bkprofdata -m -f

  • Minimize the risk of encountering the problem by using only a single HMC or SDMC to manage a single server via the following options:
  1. Power off dual HMC/SDMC or remove the connection from any dual HMC/SDMC.
  2. Use one HMC per server (remove/add connections as needed if necessary).
  3. A single HMC/SDMC managing multiple servers might be done relatively safely if the operations listed under triggers above are NOT done to two different servers concurrently.

“Recovery:
 NOTE: Recovery will be easiest with a valid backup of the profile data. So it is extremely important to back up profile data prior to an HMC upgrade or after any configuration changes to the save area. If a profile data backup exists this problem can be rectified by restoring using:

    rstprofdata -m -l 3 -f

“In addition to user backups, profile backups can be extracted from the previous save upgrade data (DVD or disk); a backup console data (if available); or pedbg.

“If a good backup does not exist, call your HMC/SDMC support to determine if recovery is possible.

 “Fix:
A fix to prevent this from occurring is due out by the end of July (Editor’s note: We realize this is now available but wanted to include the verbiage for completeness), but the PTF will not fix an already corrupted save area. A follow-up notification will be sent as soon as it is available.”

Please heed the warnings and load this fix as soon as possible if you’re running V7R7.3.0. And don’t run any HMCs at V7R7.3.0 while running others at a lower level.
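If you manage several consoles, a scripted check can keep you honest. This is just a sketch: the pattern matching below is an assumption about what your captured version text looks like (real `lshmc -V` output formatting varies), so adjust it to whatever your console actually reports:

```shell
#!/bin/sh
# Sketch: given captured HMC version text, warn if the console reports
# V7R7.3.0 without PTF MH01263 listed among its installed fixes.
# The input format is an assumption, not verified lshmc -V output.
check_hmc() {
    case "$1" in
        *MH01263*)  echo "ok: PTF MH01263 installed" ;;
        *V7R7.3.0*) echo "WARNING: install PTF MH01263" ;;
        *)          echo "ok: not at V7R7.3.0" ;;
    esac
}

check_hmc "base_version=V7R7.3.0"
```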

Thoughts on IBM’s New Support Model

Edit: Any thoughts on changes to your IBM support experience?

Originally posted August 23, 2011 on AIXchange

I assume many of you saw this e-mail recently, but if not, I’ll share it here:

Dear Valued Customer,
   
We wanted to let you know about an upcoming change to our service delivery model. We know that you have come to rely on us for a high-quality remote technical support experience with access to a skilled technical representative, and we can assure you that this model change will not detract from that experience.

Effective Sept. 1, 2011, requests for remote technical support for AIX and Storage software products, entitled to base support, will receive a callback from a technical support representative in lieu of a live call transfer. This change will ensure a specialist with the required skills is assigned to your problem and create a more consistent remote software support experience across all IBM software products.

Clients with enhanced support (formerly called premium support) will continue to have live call transfer by using an assigned DAC (Direct Access Code). If your business requires a live call transfer solution, please consider one of our enhanced support offerings which includes a responsiveness component in addition to many pro-active elements to help promote IT stability.

All IBM clients with technical support contracts can open support requests electronically via IBM’s web based Service Request (SR) tool. This option allows you to provide very detailed information about your issue and environment. Electronic service requests are handled with the same priority as one submitted by phone. Regardless of your call entry choice, the service request will be routed to the appropriate technical support team and they will respond with either a callback or an electronic response.

If you haven’t visited the Support area of our website recently, we invite you to take a fresh look.

http://www.ibm.com/support/entry/portal/Overview

The IBM Support Portal offers increased access to information and solutions that will help to manage your IT environment. You can now customize your IBM Support Portal to meet your specific product information needs and ensure the resources you require are always at your fingertips.

All of us at the IBM AIX and Storage support organization look forward to assisting you with your software technical support needs, and we thank you for doing business with IBM. Please contact your service representative if you have any questions.

Here’s what I was told about how this change is designed to benefit customers:

  • “IBM will be able to better align experts with client needs and more effectively solve customer problems without having to transfer clients to another resource for support. This will improve IBM’s ability to be more responsive on high priority issues.”

Hopefully this will mean that the first person you talk to can solve the issue, rather than transferring you from person to person. It also means that when we open calls, we need to give IBM good information so that the right person calls us back.

  • “Maintains high client satisfaction as demonstrated in the SWMA pilot program.”
  • “No change in Service Level Objectives, and we will continue to meet and exceed our 2-hour response time objective on problem submission.”

Keep in mind, this does NOT mean that it will take two hours to get a response, just that we should expect to have heard something back from IBM within two hours in a worst-case scenario.

  • “No change in the world-class service we deliver to our clients.”
  • “A large percentage of clients open support calls electronically (vs. via phone) and are, as a result, already accustomed to callback mode.”

I know it can be convenient to open a call online and have someone call you back, but one scenario that I run into will take some getting used to. For many customers, getting a call directly into their data center is a challenge. In many cases data center phones aren’t configured to take incoming calls originating from outside of the company. And even when they are, many times IT staff don’t know the external number of this phone line. Hopefully your data center offers acceptable cell phone reception and you have a workable callback number to give IBM.

One final point: According to the letter, if we have a severity one (SEV1) problem and a system is down, we should be able to reach a duty manager and see about getting a live transfer instead of waiting for a call back.

So how do you feel about this change? Please register your thoughts in Comments, and let me know what you see as things move forward.

SSD: What’s Holding You Back?

Edit: I cannot remember the last time I had to tolerate spinning rust on a laptop. Cost has come down a ton since I first wrote this.

Originally posted August 16, 2011 on AIXchange

I’ve written about the benefits of solid-state drives (SSD). Perhaps that’s why someone sent me this 3-minute video. The speaker, whose name is Arthur Bergman, gives a rather impassioned — and let’s just say, earthy — endorsement of SSD over spinning disk (or as he calls it, “spinning rust”).

Seriously, beware the swear. This may not be suitable for listening in your workplace.

Some points I transcribed from the video:

  • “Everyone who doesn’t have a SSD in their machine is wasting their life.”
  • He looked at the audience and started comparing the boot times of their machines. On his laptop he could boot in 12 seconds, versus a minute or more with a traditional hard disk. When he multiplied that time savings across everyone who was listening to his talk, he said, “Every time we boot our computer we are wasting a day of time.”
  • He asked if anyone in the room had their production machines running on SSD, and did not find anyone that was, except for one guy whose environment he had built.
  • “Go buy a SSD and put it in your laptop. I keep telling everyone to get a SSD, and I keep getting back that they are too expensive. Actually they are cheaper than drives. Relevant metric is GB per IOPS.”
  • He explained that on a SSD fileserver, they were running an fsck across 8 million files in 9 minutes, and it was taking 12 minutes for an rsync backup. He was seeing 4 GB/second random reads with average latency of 0.1ms and 2.2 GB/sec random read with average latency of 0.1ms.
  • “If you don’t access your data, don’t get SSD.”
  • He also argues that you will save power with SSDs; in his case they use 1 watt vs. 15 watts for traditional disks.
  • “1 SSD is like 44,000 IOPS, one disk drive is like 180 IOPS.”
  • Start small: “You can’t drive a Formula One car, and you are currently on a bicycle, so just get a Ferrari. $1,000 for 600GB.”
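To put those IOPS figures in perspective, a quick back-of-the-envelope calculation shows how many spinning disks it takes to match one SSD. The figures are Bergman’s; the round-up arithmetic is just an illustration:

```shell
#!/bin/sh
# How many 180-IOPS spinning disks match one 44,000-IOPS SSD?
# (Figures from Bergman's talk; this just does the division.)
ssd_iops=44000
disk_iops=180
echo $(( (ssd_iops + disk_iops - 1) / disk_iops ))   # rounds up
```

That comes out to a couple of hundred spindles per SSD, which is the point behind his “GB per IOPS” metric.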

As Bergman acknowledges, price keeps a lot of folks away from SSD. But as he also points out, you can start small. On that front, IBM has come out with a SSD solution that’s designed for affordability. Check out this video on IBM Easy Tier.

IBM argues that most operations are performed on a small subset of data. With this in mind, Easy Tier is designed to automatically and dynamically migrate I/O hot spots to SSD from traditional disks. Because these systems move highly active data behind the scenes with no intervention, customers benefit from SSD without having to manually migrate data. IBM claims that Easy Tier should provide 3X throughput with 10-15 percent of the data moved to SSD.

Easy Tier is available on the IBM Storwize V7000 storage subsystem as well as the SVC and the DS8000 product lines.

So what’s your view of SSD? If you don’t use it, why? Is it due to cost, data density per drive or some other factor? Please register your thoughts in Comments. I’m curious to learn what is holding you back.

Built for Speed

Edit: Now they are talking about 5G. Most of this is still applicable.

Originally posted August 9, 2011 on AIXchange

Just what are these strange devices we’re all carrying around these days? Are they phones or PDAs? Are they small computers? (They do have greater processing power available than some of the larger machines I cared for 20-some years ago.) Are they replacements for a cable modem? Does the name “smartphone” do them justice?

On my device, I can play games like Scrabble or hangman. I can shoot birds at pigs. I can take pictures and videos. I can use it as a GPS or a map, complete with turn-by-turn directions. I can use it as a flashlight. I can use it as an alarm clock.

I can, of course, use it to browse the Internet. I can check my e-mail. I can use it to locate nearby hotels and restaurants. I can use it to track flights and get weather information. I can use it as an ssh client. I can ssh to it. I can use it as an external drive and move data around with it.

I almost forgot–I can also use it to make phone calls and send texts. And I’m almost certainly forgetting several other uses.

I’m not here to argue for particular brands or wireless carriers. For one thing, it wouldn’t be practical. Readers from this blog live around the world, and not everyone has access to every model. Besides, it’s indisputable that all of these devices–iOS on the iPhone, the Android operating system, BlackBerry OS, among others–can do so many things.

Here in the U.S., marketing folks are calling faster network speeds 4G, so that’s what I’ll call them as well. I am shocked at the amazing real-world speeds that I am seeing throughout the country when I get on these 4G networks.

If you’re wondering what sort of speed you have in your hand, check out an app called Speed Test.

As I travel throughout the country installing Power servers, I find more and more locations are enabling 4G, so I get to enjoy these fast speeds while I’m on the road. Depending on my location, I’ve seen 4G speeds in the 16-29 Mbps range. That’s faster than hotel wifi; sometimes I can’t even get that speed at home. As a guy who used to plug in a 14.4Kbps modem to get online while on the road, I’m just amazed.

The best part is I can enable my device as a hot spot, which means that my laptop runs this fast as well, even while I’m on the move in a car. The benefits aren’t just work-specific, either. This summer I’ve learned that letting the kids stream Netflix is a good way to keep them occupied during long road trips. Now the cry from the back seat isn’t “are we there yet?” it’s “how much further until we get 4G?” My children are quickly learning the difference between 3G and 4G. (Side note: Keeping the youngsters entertained gets really tough when you’re, say, traveling California’s Pacific Coast Highway. For hundreds of miles, you’re lucky to have 1X speed, if you have cellular coverage at all.)

So why do you care? If you’re in an area that’s served by these faster speeds, and you can get an unlimited data plan, you should see if it’s worth it to switch over. Keep in mind you don’t have to use your phone to get the faster network speeds on the cellular data network. The data dongles will work fine on your laptop. I appreciate this because I can bypass a company’s network. If you’re a consultant, you understand. So many companies don’t want us connecting to their network, either from network security or physical connectivity points of view. In many cases, companies simply aren’t set up to accommodate us, so having this alternative can be a big help.

Sure, I’ve been able to get e-mail and download files at customer sites for awhile–I used cellular cards for years–but it’s so much easier and faster now. (Only occasionally do I find myself in computer rooms or buildings where I can’t get cellular coverage.) Basically, I’m carrying my own network wherever I go.

So how much do you love your devices? More importantly, how much do you need them? And how much more do you love and need them in light of the faster speeds? Are there other things that I should be doing when I’m mobile? Let me know in Comments.

Media Makes the Difference

Edit: Some links no longer work.

Originally posted August 2, 2011 on AIXchange

A customer recently called because they couldn’t log in to their machine. A new server was being built, and someone had rebooted the virtual machine. Once the system came back up, no one could ssh or telnet to it, though they were able to ping it across the network.

I was in a location that allowed me to set up webex. This way, we could both see what was going on instead of me simply hearing about it over the phone.

We started by running putty and making an ssh connection to the HMC. From there, we ran vtmenu, chose a frame and selected the LPAR on that frame. We were able to open the console window, and we had a login prompt. However, we couldn’t log in as root. We tried a few different combinations of user IDs and passwords, but no luck. The machine appeared responsive, though. Had someone changed the passwords?

The decision was made to reboot the machine into maintenance mode. This way we could change the root password, log in, and verify the network communications.

Because this environment wasn’t virtualized, it wasn’t as easy as simply booting from a virtual optical disk. We also discovered that the NIM server lived on this non-booting LPAR, so booting from NIM to get into maintenance mode wasn’t going to work.

Luckily the disk controller that the CD was attached to was available, so we made the controller and the CD available to this LPAR and had someone load the physical AIX DVD into the drive. We booted the LPAR into SMS mode and then selected the correct CD device to boot the machine. Instead of choosing to install AIX, we started maintenance mode for system recovery. Then we chose to access a root volume group and start a shell.

Now we were logged in as root, and we were able to poke around. The filesystems looked OK after running a df, but when we tried to run the passwd command, we got an error. Everything pointed to a corrupt /etc/passwd file, but when we attempted to look at that file, we found that it didn’t exist. Someone had accidentally wiped it out. However, because /etc/security/passwd still existed, their passwords were still there, and we just needed to get a copy of /etc/passwd back into the system. Once we did so and rebooted the machine, it came right up and we could log in.
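Before rebooting after a repair like this, it’s worth a quick sanity check on the rebuilt file. This sketch (the helper function and sample entry are mine, for illustration) verifies the seven-field format; on AIX, an “!” in the password field tells the system to look in /etc/security/passwd, which is why restoring /etc/passwd alone brought the passwords back:

```shell
#!/bin/sh
# Sketch: verify every /etc/passwd entry has seven colon-separated
# fields before trusting a rebuilt file. The sample line below is a
# minimal illustrative root entry, not a complete system file.
check_passwd_format() {
    awk -F: 'NF != 7 { bad++ } END { print (bad ? "bad" : "ok") }'
}

echo 'root:!:0:0::/:/usr/bin/ksh' | check_passwd_format
```

On a live system you’d run the real file through it: `check_passwd_format < /etc/passwd`.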

We did see a few rm -rf commands in .sh_history, but we didn’t find the actual smoking gun to prove that the file was deleted. We did learn though that someone was copying /etc/passwd files around the environment, so it was certainly possible that this person erred when manipulating the files.

So how is your environment set up? Are you taking mksysbs? Are you backing up individual files so that you can recover them if needed? Do you have a NIM server available to boot and restore from? Do you have install media handy that you can boot from? Install media was the key in this case. Although my customer’s problem was fairly trivial and relatively easy to fix, having the install media on hand allowed us to resolve the issue quickly.

Virtualization Webinars Add to AIX Education Offerings

Edit: Many links no longer work.

Originally posted July 26, 2011 on AIXchange

For some time, I’ve informally collected a few go-to resources for AIX pros. For starters, there’s Anthony English’s AIX Down Under blog, Chris Gibson’s AIX blog and the AIX Virtual User Group-USA. And for sure, get on Twitter.

Others who provide good AIX info include Andy Wojo, Nigel Griffiths, Waldemar Mark Duszyk and whoever is behind AIX Mind. (Feel free to let us know in the Comments section.)

Beyond those, have a look at these great AIX movies. And here’s some links to the Quicksheet and QuickStart documents that cover AIX and PowerVM.

Finally, courtesy of Anthony English, here’s word of a new webinar series on Power Systems Virtualization from IBM:

“As an IT professional, you may have heard of IBM PowerVM or Power Systems based around the IBM POWER processor. You may even have seen a presentation on it, but have you wondered:

  • What is it like to actually use?
  • What are the key features for POWER and AIX, Linux for Power and IBM i?
  • How will it save me systems administration time and reduce weekend working?
  • What do I need to run it and how do I get started?

“What are we planning to do?

  • Well …. it is best to let the product talk for itself via a series of live lectures and hands-on demonstration of these features.
  • The sessions aim to be about 50 minutes long and roughly once every two to three weeks.

“Who should attend?

  • These webinars are aimed at a technical audience (operators, systems administrators and technical specialists) — people using (or planning to use) IBM’s Power based systems.
  • Primarily customers, but also available to IBMers and IBM Business Partners.”

These webinars are being held during U.K. business hours (hence the euro spelling of “virtualisation”). Currently four sessions are listed; replays are available for the first two, which have already taken place.

Session 1: Exploiting Virtualisation on IBM Power Systems with PowerVM
Session 2: VIOS — how to get going
Session 3: Controlling processor resources in virtualised partitions
Session 4: Deeper dive into Active Memory Sharing

If you haven’t figured it out by now, I’m always looking for more tips and tricks and information. So let me know: Who do you follow? And how do you keep your skills and knowledge current?

The Value of IBM Tech Briefings

Edit: Briefings and virtual briefings cannot be beat.

Originally posted July 19, 2011 on AIXchange

Last month I was fortunate enough to attend an IBM technical briefing covering Power Systems and Storage Systems.

This one-day conference covered an array of information. For starters, IBMer Ian Jarman offered some stories and anecdotes about Watson and the IBM Jeopardy! Challenge.

This talk was followed by two simultaneous breakout sessions. Rolf Kocheisen and John Purcell covered IBM Systems Director, showing attendees how to install and use the solution in a live demo. Meanwhile, Bill Wiegand’s presentation, “Simplify Storage Management with Virtualization,” examined storage virtualization solutions, including products like XIV, V7000 and SVC.

After lunch, the conference broke into five different tracks. The storage track featured sessions covering VMware on XIV (Pete Kisich), TSM for Virtual Environments (Greg Van Hise) and Data Deduplication with ProtecTIER (Neville Yates). The AIX and IBM i track covered AIX Performance (Steve Nasypany), Oracle RAC and Oracle 11g on IBM Power Systems (Rebecca Ballough), VIO introduction for IBM i (Allyn Walsh), and IBM Storage Systems on Power (Brian Sherman).

Later sessions covered: Systems Director Management Console (SDMC) (Gary Anderson), Shared Storage Pools and VIO Server Enhancements (Ron Barker), PowerHA for IBM i (Eric Hess), Cloud Computing 101 (Jaqui Lynch), “What’s new in PowerHA for AIX?” (Shawn Bodily), NIM Master Tuning and NIM Master Group Migrations (Steve Knudson), Upgrade Planning for POWER7 Hardware and IBM i 7.1 (Allyn Walsh), and WebSphere Performance and Tuning on Power (Surya Duggirala).

So why did I list all these sessions and presenters after the fact? To illustrate the breadth of information that was presented and the technical “firepower” that delivered it. If you’ve worked on IBM Power Systems for any amount of time, you surely recognize at least some of the names I shared.

The point is, even though this IBM technical briefing is past, there will be others. And if you get a chance to attend an event like this — which is free of charge, by the way — jump at it. It’s a day well-spent. I’ve heard people compare IBM tech briefings to drinking from a firehose — you get so much information that it can be overwhelming — but I’ll take my chances. (Others like to joke that IBM stands for Information Between Meals. IBM teaches you, feeds you, and then moves you along to the next session where you can learn more.)

So reach out to your local IBM reps. They’ll e-mail you with information on upcoming events, and they can also connect with IBM presenters to get you slides from previous sessions. They may even be able to help you bring an event like this to your area. For that matter, if you’re close enough to the IBM Briefing Centers in Rochester, Minn., or Austin, Texas, simply schedule a briefing for your company.

Remember, IBM and other AIX pros have produced so many freely available resources: conferences, blogs and documentation like IBM Redbooks. It’s out there, and all we need to do is ask for it or look for it.

Twitter Yields More AIX Tips

Edit: Links no longer work.

Originally posted July 12, 2011 on AIXchange

Once again, Twitter had some interesting things to tell me when I searched in #aix.

I got a laugh from this Anthony English tweet:

“Found reference to #AIX 5.4 in doco http://t.co/FUQCAuU AIX 5.4 never released – 6.1 & #Power6 took its place.”

Sure enough, check this out:

“Enhanced JFS is the default file system for 64-bit kernel environments. Due to address space limitations of the 32–bit kernel, Enhanced JFS is not recommended for use in 32-bit kernel environments. Support for datasets has been integrated into JFS2 as part of AIX Version 5.4. A dataset is a unit of data administration.”

I wonder when that reference will be changed.

Nigel Griffiths tweeted about his article on keeping VIO servers up to date.

Chris Gibson had a tweet about extending error log size in AIX. As noted here, by default, AIX sets its error log size at 1 MB. However, since it’s a circular log, useful diagnostic information is often overwritten. The size of the log can be increased dynamically by use of the “errdemon” command in AIX.

You’ll see here the current log size with the 1 MB restriction:

# /usr/lib/errdemon -l
Error Log Attributes
——————————————–
Log File                /var/adm/ras/errlog
Log Size                1048576 bytes <<<
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000

Use this command to reset the maximum log size to 40 MB:

# /usr/lib/errdemon -s 41943040

And here’s how to confirm the maximum size:

# /usr/lib/errdemon -l
Error Log Attributes
——————————————–
Log File                /var/adm/ras/errlog
Log Size                41943040 bytes <<<
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000
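The byte values above are just megabytes multiplied by 1048576. A quick helper (illustrative, my own) computes the `-s` argument from a size in MB so you don’t have to do the math by hand:

```shell
#!/bin/sh
# Convert a log size in MB to the byte value passed to errdemon -s.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 40    # prints 41943040, the value used above
```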

Finally, this tweet linked to a nice way to “retrieve all HBA WWNs on AIX”:

for i in $(lsdev -C|awk '/^fcs/ {print $1}');do echo "$i\t$(lscfg -vl
$i|awk -F. '/Network Address/ {print $NF}')" ;done

fcs0    C05092032BFC00C0
fcs1    C05092032BFC00C2
fcs2    C05092032BFC00C4
fcs3    C05092032BFC00C6

As always, it pays to follow AIX pros on Twitter. You’ll find all kinds of interesting facts, tips and tricks.

Migrating Subsystem Storage Data

Edit: This is still relevant.

Originally posted July 5, 2011 on AIXchange

Both Anthony English and I (go here) have recently written articles about migrating data from one storage subsystem to another. Take the time to read them so you can add more tools to your bag of tricks.

I’ve done quite a few migrations lately, and my preferred procedure is pretty simple. Assuming I’m adding new disk (hdisk1) to my existing disk (hdisk0), I like to add the new LUN or hdisk to the volume group, using:

extendvg rootvg hdisk1

Then I run:

mirrorvg -S rootvg hdisk1

After the mirror completes and I verify that the logical volumes have changed from stale to synced and the mirror is taking place in rootvg, I run:

bosboot -ad hdisk1
bootlist -m normal hdisk1

Then I verify my bootlist by running:

bootlist -m normal -o

Then I unmirror the volume group by running:

unmirrorvg rootvg hdisk0
chpv -c hdisk0
reducevg rootvg hdisk0
rmdev -dl hdisk0

Finally, I can remove the mappings, adapters, backing devices or whatever I used in the VIOS to present the LUN to the client.

Were I mirroring some datavg, obviously I’d skip the bosboot, bootlist and chpv commands, but the rest would be the same. Read the two articles and you’ll find other methods you can use, like migratepv (to migrate either the entire physical volume or just one logical volume at a time) or mklvcopy.
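Pulled together, the whole rootvg sequence looks something like the sketch below. It’s a dry run (each command is echoed rather than executed) with hdisk0 assumed as the old disk and hdisk1 as the new one; drop the echo before using anything like it on a real system:

```shell
# Dry-run of the rootvg migration steps described above.
OLD=hdisk0
NEW=hdisk1
echo "extendvg rootvg $NEW"       # add the new LUN to rootvg
echo "mirrorvg -S rootvg $NEW"    # mirror with background sync
# ...wait here for the logical volumes to go from stale to synced...
echo "bosboot -ad $NEW"           # make the new disk bootable
echo "bootlist -m normal $NEW"    # boot from the new disk
echo "bootlist -m normal -o"      # verify the bootlist
echo "unmirrorvg rootvg $OLD"     # drop the old copy
echo "chpv -c $OLD"               # clear the old boot record
echo "reducevg rootvg $OLD"       # remove the old disk from rootvg
echo "rmdev -dl $OLD"             # delete the hdisk device
```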

As far as data migrations go on AIX, do you have a preference? Do you like to run the sync right away? Or do you, as Anthony suggests in his piece, wait until a less busy time? Share your thoughts in Comments.

One quick thing regarding last week’s post: Anthony English pointed out in the comments that you need to run oem_setup_env and become root first. Then you can run the bosboot and bootlist commands on your VIO servers if you’re going to be messing with those commands as root. I’d neglected to mention that I wasn’t running these commands as padmin. I made an assumption, and we all know what happens when you assume.

Protecting Your Data with mirrorios

Edit: The link no longer works.

Originally posted June 28, 2011 on AIXchange

In the “good old days” of AIX administration, companies had standalone servers, and rootvg lived on internal disks. We always had at least one pair of internal disks, mirroring them to one another. In the event of a disk failure, you’d unmirror the disks and then replace the failing/failed disk. This was usually accomplished on the fly with hot swap disks. Typically the end users never even knew there was a problem.

Those who still run servers on physical internal disks still need to make sure that they’re mirrored in the event a disk needs to be replaced. But even with companies that have moved on to virtualization technology, much of this thinking is preserved today with dual VIO servers and virtual SCSI devices. If you lose one physical path to the storage, the VIO client uses the path provided by the redundant VIOS to keep running.

Most customers I work with these days boot their LPARs (I know, I’m supposed to call them virtual servers, but old habits are hard to break) from SAN. The disk protection and physical disk replacement happens on the back end with a SAN team. Although I do see customers where the SAN guy and the AIX admin are the same person, in all cases the data protection occurs behind the scenes as far as AIX is concerned. The hdisk isn’t affected as far as the OS knows.

When installing your VIOS and booting it from internal disks, it’s still a good idea to mirror that disk to the other internal disk that’s assigned to the same bus on the VIOS. With split backplanes and dual VIO servers, this thinking just needs to be taken a step further: be sure to mirror all of the disks on all of your VIO servers, assuming they’re booting locally.

To help in this regard, VIOS has a built-in command called mirrorios. When you run it, you’ll be prompted to reboot your machine when the mirror operation completes. However, that can be deferred by simply running the command mirrorios -defer.

When this command completes, check your bootlist. You’ll find that it hasn’t been updated with the disk that you just mirrored to. To remedy this, you must manually run a bosboot on the new disk you mirrored to, and then update your bootlist to reflect the change. If you’re wondering why mirrorios can’t perform both steps, you’re not alone. This is supposed to be an appliance, after all.
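The manual follow-up might look like the dry-run sketch below. The disk names are assumptions (hdisk0 as the original, hdisk1 as the new mirror), and note that on a VIOS you run mirrorios as padmin but need oem_setup_env/root for bosboot and bootlist:

```shell
# Echoed rather than executed; remove the echo on a real VIOS.
echo "mirrorios -defer hdisk1"           # mirror rootvg, defer the reboot (padmin)
echo "bosboot -ad hdisk1"                # make the mirror disk bootable (root)
echo "bootlist -m normal hdisk0 hdisk1"  # add it to the boot list (root)
echo "bootlist -m normal -o"             # verify
```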

So what other methods do you use to protect your data? Surely you take mksysb images and back up your system using TSM or some other method. This, of course, brings up the familiar but important question: Have you tested your restore procedures lately? Are you sure they work?

Finally, have you made sure your VIO servers are set up correctly? As with a high availability cluster, the wrong time to find out that things aren’t set up correctly is when you really need them to work.

Connecting to a Remote HMC

Edit: Some links no longer work. The SDMC never did take over from the HMC.

Originally posted June 21, 2011 on AIXchange

What do the best practices documents tell us about HMC private networks when communicating between the HMC and flexible service processors (FSPs)? Is a private network switch or VLAN really needed between the HMC and the FSP? Can an HMC in a remote data center be used to manage machines over regular network links?

This 2007 document, authored by IBMers Ron Barker, Minh Nguyen, and Shamsundar Ashok, states:

“The network connection between the HMC and the FSP can be either private or open on low-end to mid-range servers. Private is preferred, and therefore a best practice. A private network is required for systems that have a BPA, such as the models 590, 595 and 575.

“In an open configuration, the FSP’s IP addresses must be set manually on each managed server. They cannot be DHCP clients of any server other than a managing HMC.

“Addresses can be set using the Advanced System Management Interface (ASMI) on the FSP. This involves directly connecting a laptop to one of the ports on the FSP and using HTTPS to log into one of the two pre-defined IP addresses. The HMC1 port defaults to 192.168.2.147; HMC2 defaults to 192.168.3.147. The systems administrator can login as user ‘admin’ using the default password ‘admin,’ which should be changed during the initial installation for security reasons. If no laptop is available, an ASCII terminal can be used on the native serial port to access the FSP menus in character mode.

“Remember, with POWER7, the addresses have changed to:

Service processor A: HMC1 169.254.2.147, HMC2 169.254.3.147
Service processor B (if installed): HMC1 169.254.2.146, HMC2 169.254.3.146

“Open networks are used for communications between a logical partition and the HMC. This connection is largely to facilitate traffic over the Resource Monitoring and Control (RMC) subsystem, which is the backbone of Service Focal Point (SFP) and required for dynamic resource allocation. The open network also is the means by which remote workstations may access the HMC, and it could be the path by which an HMC communicates with IBM Service through an Internet connection.”
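If you’re cabling a laptop up to a factory-default FSP, these are the POWER7 addresses from the quote above collected in one place (ASMI is reached over HTTPS, typically with a self-signed certificate):

```shell
# Print the default POWER7 FSP ASMI URLs, per the addresses quoted above.
for ip in 169.254.2.147 169.254.3.147 169.254.2.146 169.254.3.146; do
  echo "https://$ip"
done
```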

Though HMCs will be going away in favor of SDMC, the transition will be gradual. For the time being we need to keep our HMC skills sharp, and this is one question that frequently arises when customers plan to add new systems to an environment.

Recently I helped a client set up HMC communications over an open network in just this manner. We found that our FSP ports were still running at 100 Mb: the switch ports had been set to 1000/Full Duplex as requested, and the FSPs wouldn’t link up to the network at that speed.

Once we had the HMC on the open network, we pointed the HMC to the new IP addresses we’d configured on the FSPs on the local machines. That worked. We then added managed systems that were located in a remote data center without a problem. Finally, we did the same thing with an HMC from the remote data center to manage the machines in the local data center.

While I wouldn’t recommend using a remote HMC for day-to-day tasks if a local one is available to you, this is a viable option when setting up machines.

How is the HMC set up in your environment? Please share your experiences in Comments.

Time’s Practical (and not so Practical) Complexities

Edit: One of the many reasons I moved to Arizona.

Originally posted June 15, 2011 on AIXchange

I devote a considerable amount of time to thinking about time. With family, friends, clients and fellow IT pros sprawled worldwide, I must think before picking up the phone. It’s never fun to be the recipient of a 3 a.m. call because someone incorrectly calculated a time-zone difference.

Even with e-mail, I must remind myself that, in some cases, I shouldn’t expect a reply any time soon, since it’s nighttime where the recipient lives. Or I realize that no, I’m not getting messages in the middle of the night, just from other parts of the world.

My calculations are made easier thanks to a tool called kworldclock (here).

It helps me visualize where the sun is shining around the world. I’d like to see it ported to other platforms so more people could use and enjoy it. However, the Android Market has a free app called “daylight world map” that I recently downloaded. It’s almost as good.

Another useful resource is the website, EveryTimeZone.com. I’m sure there are other similar sites out there, and I’d be curious to hear about your favorites.

I’ve also used Windows desktop gadgets that display times (and local weather conditions) in different parts of the world. And with Firefox’s foxclocks extension, times in different locations worldwide can be displayed in your browser.

While the world obviously needs different time zones, I don’t understand why we compound the confusion with Daylight Saving Time. For 20 years I lived in Arizona, the one U.S. state that doesn’t observe DST. I still can’t get over the fact that the rest of the country and other parts of the world burden themselves with it. Nonetheless, having since lived in other areas of the United States, I now spring forward and fall back and take weeks to adjust to the changing hours like everyone else. Who came up with this idea?

Though I am in agreement with many others who’d like to abolish DST, this group would take it a step further and cut the number of U.S. time zones from four to two.

“Congress appears to have felt we were not having enough of a difficult time so in 2007 they passed a law starting Daylight Savings Time three weeks earlier and ending it one week later. This cost U.S. companies billions to reset automated equipment, put us further out of sync with Asia and Africa time-wise, inconvenienced most of the country, all in the name of unproven studies that claim we save energy.”

I can attest to this. Back in 2007 I was patching machines so computer clocks could accommodate the change. It was like a Y2K flashback. I can only hope I don’t have to go through that again.

More from StandardTime.com:

“The activists here at StandardTime.com have a modest proposal to end Daylight Saving Time that will reap large benefits in addition to ending the semi-annual changing of the clock. It has not escaped our notice that in the United States, Eastern Standard Time is the same as Central Daylight Time and Mountain Standard Time is the same as Pacific Daylight Time. Thus, we propose that The Pacific and Central time zones remain on permanent Daylight Saving Time, and that the Mountain and Eastern time zones remain on permanent standard time.”

I don’t mind planning calls or going to other lengths to facilitate communications with others from around the world. But change my clocks twice a year? Let’s just say I have no time for that.

An AIX Migration Tip Leads the Grab Bag

Edit: How long has it been since you modified tunables? Some links no longer work. I still follow most of those users on Twitter.

Originally posted June 7, 2011 on AIXchange

It’s been awhile since I’ve given you a grab bag of links and tips. I’ll start with a personal experience.

A recent client with a Power server running an Oracle database was migrating from AIX 5.3 to AIX 6.1. Certain settings they specified in their /etc/tunables/nextboot took effect when they booted AIX 6.1, and they couldn’t figure out why Oracle was running so horribly. Jobs that normally ran for a few minutes were taking nearly an hour to process. Luckily, someone noticed some messages regarding changes to restricted tunables. Upon checking /etc/tunables/lastboot.log, they saw:

Setting maxperm% to 30
Warning: a restricted tunable has been modified
Setting maxclient% to 30
Warning: a restricted tunable has been modified
Setting strict_maxperm to 1
Warning: a restricted tunable has been modified

Once they changed /etc/tunables/nextboot and rebooted, Oracle ran like a champ and the machine was fine. So add this to your migration checklist: Try the AIX 6.1 default settings first, then make modifications if needed post-upgrade. And be sure to check the tunable settings that are carried over with a migration.
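A check like the one below could go on that migration checklist: simply count the restricted-tunable warnings in lastboot.log. Here it runs against a canned sample of the output shown above so the snippet is self-contained; on a real system you’d grep /etc/tunables/lastboot.log directly:

```shell
# Count 'restricted tunable' warnings; a nonzero count means carried-over
# tunables were applied at boot and are worth reviewing after a migration.
printf '%s\n' \
  "Setting maxperm% to 30" \
  "Warning: a restricted tunable has been modified" \
  "Setting maxclient% to 30" \
  "Warning: a restricted tunable has been modified" \
  | grep -c "restricted tunable"
```

For the sample input this prints 2.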

You know that I frequently link to IBMer Nigel Griffiths (follow him on Twitter as @mr_nmon). In addition to sharing some hints and tips about IBM Systems Director, he’s posted a number of entries covering options for monitoring entire physical systems (as opposed to monitoring on a virtual server or VM-by-VM basis).

For instance, here are 22 things you should do before setting up Systems Director. And here are eight things you should do once you’re running it. The latter installment tells you, among other things, how your boss can run Director from an iPad.

Nigel also offers systems monitoring tips involving Director, lpar2rrd, topas CEC analyzer, IBM Tivoli Monitoring and Ganglia.

Recent AIX Virtual User Group meetings have also included Systems Director info. Listen to the replays, download the presentation materials and sign up for future meetings here.

As I’ve said before, you’ll find a lot of AIX knowledge on Twitter. @nicolettemcf, @cgibbo, @ibmaix, @aixmag, @aixdownunder are just a few of the users I follow. Search on #AIX and I’m sure you’ll find others you like.

Remote Access: From the Laptop to the Phone

Edit: This is still an issue, attackers still get in and we still need better security and intrusion protection.

Originally posted June 1, 2011 on AIXchange

As I wrote recently, I remotely access machines regularly, whether I’m logging in directly or using a tool like webex to observe or help others with their server configurations.

Given my reliance on remote access, I have an opinion about virtually every option out there. For instance, RSA tokens: It can be a pain if the physical token is in another location when you need to login to a server, but it’s still a step forward. And the more recent advent of RSA software is another step forward. This way you don’t have to worry about transporting (or forgetting to pack) a physical RSA token. Either way, with RSA, no one else can gain access without knowing the password and having access to the token or the laptop running the software. (Assuming, of course, that the RSA breach earlier this year didn’t compromise the entire system–see here, here and here.)

In contrast, while I have used Gmail, and I do like it, I worry about someone gaining access to my account and deleting and copying my mail. If someone gets my Google password, it’s game over. That hacker could log in from anywhere and do anything. It does happen. I read about a Gmail user who logged into his account and discovered all of his e-mail had been deleted. Even after he verified his identity, Google could only restore a small fraction of his mail. His data was gone. I recently enabled two-factor authentication for my Google account (see here).

While I’ve not had any issues with it, this reviewer found it difficult to manage with the myriad Google apps he was using. So your experience could be different from mine. In my case it was straightforward. Once I enabled it and had the Google authenticator application loaded on my phone, it was a simple matter of logging in to my account as usual, and then, when prompted, entering the security code. For things like mobile Gmail on my phone or instant messaging using pidgin, I needed a new password from the Google account website, but that was all easily done.

Since I always carry my phone, I’d love to see more ways to run authentication software on it. With the continuing migration to smartphones it could become more common, but where would it all end? Once software on our smartphones becomes the norm, would we advance to swiping a fingerprint on a keyboard, looking into a webcam for an iris scan or using voice recognition? Who knows what other authentication mechanisms we will eventually conjure up as we try to keep our systems safe.

As a Term, LPAR isn’t that Logical

Edit: I still say LPAR all the time. Nigel’s link no longer works. People still fight over AS/400 and IBM i.

Originally posted May 24, 2011 on AIXchange

In an AIXchange blog entry last month, when I discussed the new SDMC IBM Redbook, I noted that:

“Section 1.5 shows us how the terminology will evolve. Managed systems are now called servers, frames are power units, LPARs are virtual servers, the hscroot ID becomes the sysadmin ID, partition mobility becomes relocation, etc.”

Nigel Griffiths took this a step further, recently arguing that the time has come to call our partitions virtual servers or virtual machines rather than LPARs:

“So the observant might have noticed a sharp decline in the term LPAR in the last three or four months. Apparently … this change is now recommended within IBM and IBM marketing, so you will see a lot more use of the new terms. This is a change in name that I whole heartedly approve, unlike, for example, RS/6000 to pSeries to System p to Power Systems (which now confuses the world’s fastest general purpose computers with mains electricity power supplies!). Of course, ‘LPAR’ will turn up out of habit on the Internet, in documents and articles for many years to come and be popular with IT luddites now that it is old fashioned. …

“When I think back to it, the Logical Partition (LPAR) name never did make much sense!

“Logical: means shared or pretend or not physical.
“Partition: means a part of the whole and started life as a disk term as a group of sectors.”

Like Nigel, I guess I never gave much thought to the terms “LPAR” and “logical partitions.” I was used to them. In my head I always compared LPAR with a hard or dedicated partition or a standalone server. The dedicated partition would have some sort of dedicated hardware–dedicated processors or dedicated adapters. A logical partition was fully virtualized using virtual or logical devices, disks, network, shared processors, etc. To me it made sense to call it a logical partition since it was using logical devices. Because dedicated and logical partitions could be mixed and matched on a physical frame, I appreciated that this terminology easily differentiated the kind of LPAR we were talking about.

Virtual systems and virtual machines are appropriate terms for this updated technology. But with VM’s history, there is potential for confusion. There’s not only VM, the old mainframe operating system, but there’s VMware, the non-IBM virtualization software. PowerVM is obviously far more powerful than VMware, but again, they have that VM term in common.

I’ll certainly try to call them VMs going forward, but don’t be surprised if I occasionally make reference to LPARs. And I’m sure I won’t be the only one–after all, plenty of customers still tell me about their AS/400 and RS/6000 systems, even though they’ve been on POWER7 servers for some time. Despite the new direction in terminology, I wouldn’t be surprised if we continue to hear about LPAR well into the future.

All that matters really is that everyone understands what we’re talking about. Perhaps going forward we’ll find a way to differentiate a VM with dedicated adapters versus a fully virtualized VM. Or does that distinction even matter anymore? How quickly do you expect your vocabulary to change?

The Hidden Cost of Poor Service

Edit: Modified the Gitomer link, still good information here.

Originally posted May 17, 2011 on AIXchange

How do you respond to poor customer service? Do you flip out, demand to see a manager and cause a scene? Do you demand upgrades? Or do you quietly walk away, telling yourself that you’ll never be back, no matter what.

What do poor attitudes and poor customer service cost you and your company? Or, on the flip side, how much does your company benefit from great attitudes and great customer service? When customers like a company or a product, they’ll spread the word to their friends. People will also let others know if they don’t like a company or product.

As Jeffrey Gitomer puts it: “The one word definition of referral is risk. … When someone gives you a referral, it means they are willing to risk their relationship with the referred person or company. They have enough trust and faith in you to perform in an exemplary manner, and not jeopardize their existing friendship or business relationship.”

Last year I discussed a bad experience I had with a computer manufacturer:

“I placed my order, and waited for my delivery. And waited. And waited some more. Eventually I got an e-mail saying that the ship date had slipped by several weeks. No kidding. Unfortunately, in this case I was counting on the system to arrive by a certain date because I’d already promised my older system to someone else. They didn’t want to wait either.”

While I didn’t publicly name the manufacturer, I’ve since had people ask me for recommendations. Needless to say, I’ve always recommended someone else. And when I’ve had other machines that needed to be replaced, I’ve gone with another vendor.

I don’t think I’m exaggerating when I say that, as a result of this one bad experience, at least six systems were purchased from other vendors, either by me or by people I know. Some of these systems included dual monitors, SSD drives and other hardware upgrades. How much money did this poor customer service cost this company? Odds are I’m not the only one who’s had an issue with this manufacturer. So take my experience and multiply it three, five, even 10 times. Now we’re talking about real money.

Maybe the saddest part is this manufacturer will probably never know what happened. I didn’t mention their name. I didn’t rant about my experience on Twitter. Other than the phone call that I had to make when I was forced to cancel my order, I’ve had no interactions with them. Sure, they still e-mail special offers, but I delete them as soon as I see them. I don’t do business with them anymore.

As a consumer, I have a long memory. I know which restaurants I won’t return to and which airlines I’ll never fly again. And I’m hardly alone in this regard. I know people who refuse to patronize vendors based on bad experiences that happened decades ago.

Of course, mistakes and accidents happen, and some things are beyond our control. Will your company shine when those moments come, or will it lose customers — along with several of those customers’ friends and acquaintances? What are you doing to bolster and/or maintain your company’s reputation for good customer service?

Sometimes the Latest isn’t the Greatest

Edit: I still have a landline, and I still like it.

Originally posted May 10, 2011 on AIXchange

I know I shouldn’t say this, since I work in technology, but I still have a landline phone at home, and I like it.

Sure, I’ve used voice over IP (VOIP) for webinars, and I’ve had different flavors of Cisco and Avaya IP phones on my desk through the years (and probably some others that I don’t recall at the moment). And it’s fine. I can seldom tell the difference between VOIP and traditional landlines. On my PC I use Skype and Google voice and different kinds of VOIP software. With these solutions, my computer makes for a perfectly acceptable phone.

Still, when you get right down to it, I prefer the voice quality of my regular old landline phone. Tell me I’m a luddite. Remind me — since most people are on cell phones or VOIP these days — that my landline calls go over IP at some point in their journey anyway. I still argue that you can run into issues with latency and jitter with VOIP that you don’t face with the regular old phone system. I also prefer to have a working landline phone in case the Internet goes down or the power goes out (although I’m not sure who I’d call since the rest of you have apparently switched to VOIP).

Maybe it’s because I still do numerous conference calls, but I prefer a landline with a nice, old-school Plantronics headset. Sure, I sacrifice mobility, but I don’t have to worry about dying batteries or the connection getting choppy while I download large files. I deal with these issues plenty when I’m on the road, so I know of what I speak. I’ll be using a laptop and wireless phone with a wireless headset, and eventually, inevitably, the batteries for each will slowly drain. My Bluetooth wireless headset is usually the first to go. While I do have wired headsets that I’ll then plug into the cell phone, I know it’s a matter of time before that phone battery goes next. Then I’ll generally plug the phone into an outlet (rather than spend a few minutes dropping the call and changing the battery).

Are these big hassles? No. But they’re still hassles. Then there’s the sound quality issue. I’m convinced that cell phones still lag behind landlines in that regard.

Getting back to VOIP, it has its own drawbacks. With a VOIP software client, you cannot leave your computer. You cannot reboot your computer. You cannot move large files around without affecting the call quality. If someone else on the network starts using the bandwidth, your call quality can be affected.

Admittedly, I see fewer issues with VOIP than I once did. I also know there are products that will route calls to your cell or VOIP phone, or your home phone. And I further know that it’s 2011. But I’m the guy who still loves — and uses — an IBM Model M keyboard. Even though I work with incredible, cutting-edge technology every day, there’s still a bit of old-school in me.

However, if you’d like to drop me a line and tell me I’m crazy to not be dropping my landline — or if you want to point out some new solutions I should pay more attention to — leave a message in Comments.

Setting Up NPIV

Edit: This is still good stuff.

Originally posted May 3, 2011 on AIXchange

Following up on this recent post, I want to go into greater detail on setting up NPIV (N_port ID virtualization).

With most customers, the first question I get is, “Do I have the hardware to run NPIV?” If you’re running at least POWER6, you have IBM 8 Gb Fibre Channel adapters and your SAN switches are NPIV-capable, you should have what you need.

This document can help you determine if you’re set to use NPIV. If you log into your VIO server, run lsnports and find the value for “fabric” is 1, you’ll know you can safely map virtual adapters to your physical adapters. (Also remember to read the configuration document I referenced in the previous post.)

Setup is straightforward. Create a virtual fibre adapter in your VIO server, then create a virtual fibre adapter in your VIO client. Map the virtual adapter in the VIO server to a physical fibre adapter using the vfcmap command and give the virtual worldwide name (WWN) to your SAN team.
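In dry-run form (commands echoed, not executed), the VIOS side of that setup might look like the following; vfchost0 and fcs0 are assumed adapter names for illustration:

```shell
# Echo the NPIV checks and mapping commands described above.
echo "lsnports"                             # fabric=1 means the port is NPIV-capable
echo "vfcmap -vadapter vfchost0 -fcp fcs0"  # map the virtual FC adapter to a physical one
echo "lsmap -all -npiv"                     # verify the mapping and read off the WWNs
```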

Lately I’ve done a number of logical disk migrations for people who initially set up virtual SCSI and want to move to NPIV. Using dynamic LPAR, virtual fibre adapters can be added to the VIO server and client. The virtual adapter is mapped to the physical adapter and WWNs are obtained from the HMC. If you use Live Partition Mobility in your NPIV environment, remember that you’ll need to map both virtual WWNs, as both are used during the actual migration.

NPIV allows you some flexibility as far as using virtual adapters. I’ve seen some environments that have one adapter per VIO server, and others that map a virtual fibre adapter to every physical adapter in their VIO server. Some argue that one virtual adapter per VIO reduces complexity while providing sufficient redundancy. In many of these environments, the first virtual adapter is mapped to fcs0, the second to fcs1, etc. Whichever method you choose, I believe it’s important to test the set-up by rebooting the VIO servers. You need to verify that what you think will happen when you bring down a VIO server is what will actually happen.

I have customers that reuse the same LUN they were using with vSCSI. In those cases, we unmounted the filesystems, varied off and exported the volume groups, used the rmdev command to remove the disk and the disk’s mappings from both VIO servers, changed the SAN zoning to map to the virtual WWN instead of the VIO servers’ physical WWN, ran cfgmgr in the client LPAR to see the disk directly in the client, imported the volume group (importvg -y vgname hdiskX) and mounted the filesystems. It’s almost as if we never made any changes, though you need to be aware of any disk drivers or MPIO software that’s now needed in the client instead of the VIO server.

I also have customers that, rather than go through the downtime associated with remapping their disks, are fortunate enough (because they have enough storage) to just create new LUNs. They leave their original vSCSI mappings in place, map their new LUNs via NPIV directly to the client and just use migratepv to move the data from the old disks to the new disks. Then they remove the old vSCSI disks and mappings at their leisure.

One other thing to keep in mind once you complete the move to NPIV: even though you no longer use vSCSI for your disks, you should still keep a vSCSI adapter on your VIO server and client for virtual optical devices. I know I still want the capability to use virtual .iso images as I always have.

So are you looking forward to an NPIV migration project? And if you’re already up and running, what’s your experience been like? Please share your thoughts in Comments.

The SDMC Evolution

Edit: Did anyone ever run this?

Originally posted April 26, 2011 on AIXchange

The IBM Redbook covering the IBM Systems Director Management Console (SDMC) is now available. Whether you’re making the move from the HMC to the SDMC now or later, this publication will help you with your transition. It’s well worth the download.

The first time I read it, I learned interesting things like:

* “The SDMC is available as a software and a hardware appliance. The software appliance will replace the Integrated Virtualization Manager, and can manage machines from the blades up to the 750 class servers. The hardware appliance is required for management of midrange systems and high-end systems. The SDMC releases can be used alongside the Hardware Management Console during trials and deployment, which eases transition.”

* “The SDMC virtual machine contains Linux as the base operating system. For the software appliance, the client supplied virtualization options for different hypervisors include Red Hat Enterprise Virtualization KVM or VMware ESX/ESXi.”

* Section 1.5 shows us how the terminology will evolve. Managed systems are now called servers, frames are power units, LPARs are virtual servers, the hscroot ID becomes the sysadmin ID, partition mobility becomes relocation, etc.

* “The SDMC incorporates most functions of the Hardware Management Console. This has been done through direct mapping of commands or by replacing functions that are present already in IBM Systems Director. Some functions are not available in the first release of the SDMC, notably the ability to handle system plans.”

Comment: As system plans are wonderful tools that I highly recommend, hopefully this will be fixed very quickly. From what I understand this will be addressed in one of the early service packs.

* “The command-line interface has been mostly kept the same. On the SDMC, most of the commands are just preceded by smcli. This new prefix might require changes to existing scripts that use the Hardware Management Console.”

* “SDMC provides the capability to back up the whole virtual machine onto removable media or a remote FTP server. You can restore using the backup file from the removable media or from a remote FTP server. The restore will be full image deployment and all existing files will be replaced from the backup. Unlike the HMC, SDMC backs up the entire disk instead of individual files. The backup function requires that the SDMC be temporarily shut down to quiesce the disks, but it will be immediately restarted while the disk files are copied to removable media or a remote FTP server. The restore function takes under an hour to complete.”

The SDMC has been a topic of discussion at workshops and the IBM Technical University conference, so hopefully most customers are up on this change. It shouldn’t come as a surprise.

Basically, the SDMC is still an appliance, just like the HMC is today. It will run Systems Director code under the covers. The hardware will be the same CR6 that we’re used to, but the SDMC will require more memory and disk space. There will be two 500 GB disks running in a RAID0 setup, so be sure to back up the SDMC; these disks will not be mirrored. Although I’ve heard that existing CR6 machines will ultimately be upgradeable, at GA the machines will be net new. So initially, it will probably make sense to run the HMC and SDMC simultaneously until you get used to the SDMC’s new capabilities.

Those new capabilities are impressive. The SDMC will be able to manage the whole POWER6 and POWER7 lineup, including blade systems. This is a much nicer alternative than the current solution of using and managing each individual blade via IVM. It’s a pain to use the GUI and deal with the frequent timeouts that occur when using the IVM interface. Assuming you have sufficient resources, another thing you’ll be able to do with the SDMC that can’t be done with IVM is create dual VIO servers on your blades.

Finally, the SDMC will support the capability to run live partition mobility operations between blades and standalone servers and back again. This will give customers greater flexibility as far as purchasing hardware and running workloads. With this forgiving infrastructure you’ll be able to move workloads around on the fly, and with dynamic logical partition operations you’ll be able to adjust hardware allocations on the fly.

The SDMC transition will not be a big bang change from the HMC, but it will take some time. The rollout, in fact, is expected to take years. As is standard practice for IBM when introducing updated solutions, the HMC will continue to be supported through this transition, but over time advanced virtualization capabilities will increasingly be brought to the SDMC (and not necessarily the HMC). Customers are encouraged to try out the SDMC, make a transition plan, run it alongside the HMC and get used to it.

As noted, there is a strong thread between the two solutions. The SDMC, like the HMC, is an appliance with user management capabilities and a built-in firewall. The network topology is identical on both solutions. As with the HMC, the SDMC won’t allow admins root access or the capability to install software. Just as larger environments have multiple HMCs now, you’ll be able to run multiple SDMCs. You’ll still need an additional Systems Director server to manage your SDMC stand-alone devices and take advantage of advanced plugins like Active Energy Manager or VMControl.

SDMC availability is planned for May 13.

So what do you think of this change? When do you expect to see an SDMC in your environment?

Getting Started With NPIV

Edit: The link still works. This is still a good comparison.

Originally posted April 19, 2011 on AIXchange

NPIV isn’t new functionality, but plenty of customers are only just now getting started with it. I know this because lately, I’m hearing a lot about NPIV. In response to the numerous queries coming my way, I searched and found this excellent IBM Support document on configuring NPIV:

“N_Port ID Virtualization or NPIV is a Fibre Channel facility allowing multiple N_Port IDs to share a single physical N_Port. This allows multiple Fibre Channel initiators to occupy a single physical port, easing hardware requirements in Storage Area Network design. An NPIV-capable fibre channel HBA can have multiple N_Port IDs, each with a unique identity and world wide port name.”

Compared to using virtual SCSI devices (vSCSI), storage management is greatly simplified with NPIV. NPIV allows AIX admins to zone a LUN to a particular client LPAR directly, rather than use VIOS as a middleman. So with NPIV and SEA, the VIO servers handle the shared Ethernet and NPIV duties. Best of all, there’s no need to map and track LUNs — that duty can be left with the SAN team where it belongs.

In contrast, when using vSCSI with VIO servers, your lsmap -all output can be a mess to manage if a large number of LUNs are being mapped through your VIOS to client LPARs. I’ve seen servers with hundreds of LUNs being presented to the VIOS. In those cases, the AIX admins must manage the subsets of LUNs that are then mapped to individual VIO clients. All that disk-mapping must be tracked, and I’ve seen many different spreadsheets and documents that attempt to do this.

In a typical scenario, two VIO servers will be set up (so that one can be serviced or restarted without these activities impacting the client LPARs). A fibre card or two is usually attached to each VIO server. Then the SAN team can zone the VIO servers to the SAN using the World Wide Name (WWN) information from the physical adapters. This results in a pile of LUNs that AIX admins must map to the appropriate VIO clients. To make all of the LUNs accessible from both VIO servers, each LUN’s no reserve attribute must be set. So the admins end up doing the mappings twice, once on each VIO server.
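The double-mapping described above looks something like this from the padmin shell on each VIO server. This is only a sketch; hdisk5, vhost0 and the client1_rootvg device name are placeholders for your own environment.

```shell
# Clear the SCSI reserve so both VIO servers can present the same LUN
chdev -dev hdisk5 -attr reserve_policy=no_reserve

# Map the LUN to the client LPAR's vhost adapter
mkvdev -vdev hdisk5 -vadapter vhost0 -dev client1_rootvg

# Repeat the same two commands on the second VIOS, then confirm:
lsmap -vadapter vhost0
```

Before running the second mapping, compare the PVID or LUN ID on both VIO servers so you know you're mapping the same physical disk.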

On top of that, admins must pay attention to PVIDs or LUN IDs to ensure that the disk that’s mapped on VIOS1 is the same one mapped on VIOS2. Having the no reserve attribute set on the disk can open up a potential disaster if the same LUN is accidentally mapped to different clients. If two different clients are booting from the same LUN, it’s time to look for a mksysb and do a restore.

One plus with vSCSI is that MPIO software only needs to be loaded on the VIO server. The VIO clients usually just use the built-in AIX MPIO software as they have no visibility to the disks other than recognizing that they’re virtual SCSI disks.

From this lengthy explanation on vSCSI, you might have already figured that NPIV, once you have it set up, is much easier to use. And you’re correct. With NPIV, virtual WWN information is created for each client LPAR. The SAN team gives LUNs to the client LPARs directly. Virtual fibre adapters must still be mapped to a particular physical fibre card in the VIO server, but admins don’t need to map and track LUNs or worry about reserve locks on the LUNs. (We do, however, need to remember to load the MPIO software into the client LPARs, because the clients do recognize the disks and the storage subsystems from which they come.)
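The VIOS side of an NPIV setup is correspondingly short. A minimal sketch, again with placeholder names (vfchost0 and fcs0 will differ on your system):

```shell
# Bind the client's virtual fibre channel server adapter to a
# physical NPIV-capable port
vfcmap -vadapter vfchost0 -fcp fcs0

# Verify the mapping and the virtual WWPNs the SAN team will zone
lsmap -all -npiv
```

Once the SAN team zones those virtual WWPNs, the LUNs appear directly in the client LPAR with no mkvdev mappings to track.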

I’ll have more NPIV info next week, so stay tuned.

IBM Product Preview

Edit: POWER7 blades. SDMC. Those are names I have not heard in a long time.

Originally posted April 12, 2011 on AIXchange

IBM is conducting what it calls a “product preview” today. The subject of this preview is new hardware that is expected to be rolled out later this year.

I received this information during a recent conference call with IBM.

First, there will be a new POWER7 high-performance computing machine, the Model 775. As you’d expect, this machine is based on POWER7 processors, which come with eight cores per socket running at 3.8 GHz. The 775 will be packaged with a Quad Core Module (QCM). A QCM consists of four POWER7 chips; thus, each QCM will have 32 cores.

IBM will then take the QCMs and integrate them into what they call a drawer, or a node. A 2U drawer will have eight QCMs, giving us 256 (8×32) total cores in an efficient, densely packed 2U form factor. You’ll be able to have 2TB of memory per node, and IBM estimates peak performance of 7.8 teraflops in a 2U package.

The machine will have a high-speed interconnect fabric, which allows these 2U nodes to be connected using an optical interconnect. This will provide the capability to connect to a total of four drawers (IBM calls this a supernode) consisting of 1,024 cores. Twelve nodes will fit in a rack, with a maximum of 24 TB per rack and a peak performance of 95 teraflops in the rack. Optical interconnects allow for connecting supernodes together — as of now customers could have as many as 512 supernodes (or more than 524,000 cores) running at an estimated 15 petaflops at peak performance.

The 775 will be a quiet machine, because it will have fewer fans. It will be water-cooled with 100 percent heat capture. The 775 is expected to be used for climate and weather modeling and prediction, life sciences, nuclear resource management, and financial services.

Here are some other things that grabbed my attention:

* The 795 is expected to have new capabilities that allow for hot node add, hot memory upgrades and repair. You’ll also see concurrent GX adapter add and hot GX adapter repair, along with concurrent system controller repair. The maximum number of partitions that can be created on a frame will increase to 1,000 on the 795, 640 on the 770 and 780, and 320 on the 750. Relatedly, when Active Energy Manager is used with these machines, administrators will be able to set up energy policy definitions by partition rather than by system, so different policies can enable energy savings while maintaining performance.

* New blades are planned. There will be a new single-wide Model 703 blade with a maximum of 16 cores running at 2.4 GHz and 128 GB of memory. There will also be double-wide Model 704 blades, which consist of 32 cores running at 2.4 GHz and a maximum of 256 GB of memory. The 703 is expected to have one hard disk bay, and you can choose either an HDD or an SSD. The 704 will have two disk drive bays, so you can have either two HDDs or four SSDs. In other words, the blades will be able to run both traditional rotating hard drives and SSDs. These new blades are expected to run in BladeCenter H, HT or S chassis.

* The Model 750 will be refreshed with new processor options, including 4-core 3.7 GHz, 6-core 3.7 GHz and 8-core 3.2 and 3.6 GHz options. The 750 will still have four sockets and 512 GB of memory per machine.

* Support for dual VIO servers across all of the POWER7 blades will be enabled through the new Systems Director Management Console (SDMC). SDMC will be used to enable active memory expansion on the blades. There will also be support for running live partition mobility operations between blades and rack servers, which will open up a whole new way to manage workloads. The SDMC will run on familiar CR6 hardware, although with beefier disk and memory requirements.

The SDMC enhancements mean we’ll no longer need to run IVM to manage our blades, and since we’ll be able to run dual VIO servers, this will make blade offerings a much more attractive option to many customers. In addition, because IBM is making the SDMC the next-generation management console for Power systems, the HMC will be phased out over the next several years. (Although the HMC and IVM are expected to be kept current with new Power systems models into 2013, they will not incorporate future advanced management capabilities.) During the transition period, customers will be able to run the SDMC side by side with existing HMCs until they’re ready to switch permanently.

On the call it was stressed that the SDMC is meant to be evolutionary rather than revolutionary. In other words, IBM says it will give customers ample time to make this transition. And really, this shouldn’t come as a surprise. For a while now, I’ve been hearing that “the HMC is going away” at conferences and workshops I’ve attended.

The SDMC will manage POWER6 and POWER7 servers, and there will be a virtual appliance version for small-tier systems. The SDMC will utilize the Systems Director user interface. It will support a superset of HMC capabilities, integrate platform and OS management, and maintain compatibility for CLI and scripting support.

I’ll write more about this in the near future, but rest assured this solution will make it much easier to manage an entire computing environment — including servers and blades — from a central location. In addition, read IBM Systems Magazine for more details about the SDMC (a cover story is planned for the May 2011 issue). And an IBM Redbook on the SDMC is expected to be ready later this month.

* Another interesting option that I saw was an SAS disk-only I/O drawer (the EXP24S) that could house up to 24 SFF drives in a 2U form factor. This I/O drawer would allow you to partition the drawer into four different sets of disks, thus making it easier to present a smaller group of disks in the drawer to different partitions. This could be a nice option if you’re not using a SAN but still need access to more external storage.

* Finally, IBM highlighted a change to the Power systems landing page on its website. Look for this URL: www.ibm.com/power.

So what do you think of these planned solutions? I expect there to be plenty of discussion around the HMC and SDMC as we learn more in the coming months.  Please leave your thoughts in Comments.

Remote Tech Support

Edit: I still use webex all the time.

Originally posted April 5, 2011 on AIXchange

I’ve been using screen and VNC on a daily basis for years — and I’ve been writing about them for quite awhile, too. Another tool I like, though I don’t use it all that often, is portmir.

Occasionally I’ll use VNC, screen or portmir to share a session so I can troubleshoot a problem with someone. It’s not the best arrangement — I may not have access to the other user’s network, and/or that other user may be unfamiliar with these tools, or may not have them on their system.
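For readers who haven’t tried portmir, a rough sketch of how a shared session works on AIX. The tty name is a placeholder; check your own system for the tty your colleague is attached to.

```shell
# Mirror my session to another attached tty so someone on the
# console can watch (and type) along with me
portmir -t tty0

# ...troubleshoot together...

# End the mirror session when done
portmir -o
```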

More frequently, I find myself working remotely with customers via VPN. Many customers happily provide me with VPN access so I can help them solve their problems. All I need is a decent network connection, which usually isn’t an issue. Even when I’m traveling in remote areas, I can usually find good wireless or cellular data connections these days.

Remote technical support has its advantages. It’s much quicker than finding time on my calendar to book a flight and schedule a trip to the customer site. And many good VPN clients are available — I’ve used Cisco, Citrix, GreenBow IPSec, OpenVPN and Shrew Soft, among others.

But as far as VPN has come, I still work with plenty of customers who either don’t use it or don’t allow vendors to use it. So what do I do when I can’t get VPN access to networks and machines I need to look at? Or what do I do if a new customer, due to its internal processes, can’t get me access right away? What about customers that use physical RSA securID hardware tokens? In that scenario, I have to wait while the token is shipped to me. How do I remotely get access in the meantime?

Fortunately, there are myriad free and fee-based web conferencing solutions. I really like tools like webex, GoToMeeting and Sametime Unyte. Setting up a conference, having both participants connect, and then having the customer share their screen with me is pretty painless. Many of these solutions also allow you to remotely take control of the session. Some also provide audio capabilities, although I find it just as easy to set up a conference call or make a quick phone call. Everyone can talk to each other, and because we can all view the same desktop at the same time, everyone can watch the commands as I run them or the configuration changes as I make them. These solutions work great when I can’t get VPN access, or when I have to wait for it.

This technology can also be used for training or other types of collaboration. Again, I see exactly what they see. Since we’re usually on the phone, I can easily walk a group of customers through whatever they’re working on or whatever issues they’re having.

I think that more support organizations could benefit from these tools. I’d sure love to be able to call IBM Support, share my desktop and have them see exactly what I’m seeing. Obviously this wouldn’t work with hardware problems that keep you from being able to boot up or access a machine, but that sort of thing is less of an issue these days. Usually when I have problems it’s either a configuration issue or I need to modify settings. In those scenarios, sharing a screen with someone generally makes troubleshooting much quicker and easier.

What other tools and techniques do you use for remote access? If you have a tip or use a tool that I’ve not mentioned here, please let me know in Comments.

Watson’s Impact

Edit: It does not seem like it was that long ago, and yet… Some of the links no longer work.

Originally posted March 29, 2011 on AIXchange

The IBM Jeopardy! challenge has ended, the experience succinctly summarized with Ken Jennings’ words after Final Jeopardy: “I, for one, welcome our new computer overlords.”

I heard that the project was originally code-named bluej. IBM code names are meant to be placeholders, but still, I’m glad they went with the name Watson. Hearing that bluej beat Jennings doesn’t have the same ring, and somehow, I’m not sure the Ken Jennings reddit AMA (ask me anything) would have been quite as interesting. (Warning: comments may not be safe for work.)

Here’s another reddit AMA with Watson team members.

Though the buzz has naturally subsided, interest is still strong. From what I understand there remains a huge demand for Watson team members to speak at different events. Really, just about any conference you may attend in the near future — Pulse, Impact, Innovate: SWG Events, STG Technical Conferences, COMMON Minneapolis, Power User Groups, Smarter Computing Summit, LinuxCon, University Events and many more — will feature presentations and demos.

There of course has been tons of discussion about what Watson’s victory means for humanity, including, already, a book.

The Jeopardy Archive breaks down Watson’s win (here, here and here).

Recently I saw the webcast, “Beyond Jeopardy!: The Business Implications of IBM Watson.” The participants explore potential real-world uses of this information processing technology (healthcare, for example).

Thanks to Watson, lately I find myself talking about Power systems with non-technical people. Generally, these folks would have a hard time imagining what I do for a living, but because they saw Alex Trebek in the computer room with those servers and computer racks, I found I could more easily explain what I do: “I rack and stack and configure and sell those 750s you saw on Jeopardy!, along with the rest of the IBM Power server product line.”

By the way, Watson may have won the challenge, but it isn’t undefeated. Rush Holt, the New Jersey congressman and a Jeopardy! champ from the 1970s, recently beat a slightly slower Watson version. From CNN:

“After beating Watson $8,600 to $6,200, Holt expressed admiration for the machine, saying the technology has the potential to be extremely useful in situations that require tough decision-making, such as medical diagnosis, air traffic control, and situations that require piecing together bits of knowledge.

“Such technology can also be extremely helpful in emergencies, like an outbreak of a food-borne illness or a natural disaster, said Chris Padilla, vice president of IBM Governmental Programs.

“‘In the modern world, we’re all flooded with information,’ Padilla said. ‘What Watson can do, is go through all of that data, and in response to a natural language question, rank the order of likely responses in terms of what you asked it in the first place.'”

IBM has a webpage filled with interesting facts. For instance, during preliminary sparring matches Watson only used 75 percent of its processing resources. And did you know that a computer with a single processing core takes more than two hours to perform the deep analytics needed to answer a single Jeopardy! clue? Watson, in contrast, holds all the information that it needs to compete on Jeopardy! in about 500 GB of space. You’ll also find flash animations of the machine on stage, a graphic depicting the physical server layout and links to Watson’s architecture, workload, energy storage and network usage and more.

Finally, check out IBM’s Watson website and this whitepaper.

I expect Watson will be talked about for quite some time, both for what it did and for what it can potentially do. Are you still following Watson? Did you enjoy seeing POWER7 machines on prime time television? Are your non-technical friends actually interested in what you do now? Feel free to post in Comments.


How Much Memory?

Edit: The link points to POWER8 servers at the time of this writing, but the principle is still the same.

Originally posted March 22, 2011 on AIXchange

When ordering a Power server, the number of sockets you pick and the dual inline memory module (DIMM) size you use matter. Consider the 8233-E8B server, commonly called the Model 750. This would be the same model machine that was selected to build the Watson cluster.

With a new machine you have a number of choices to make. A particularly important one is the amount of memory you want. You can choose from memory kits of different sizes, which will allow for different memory densities on the machine.

According to the facts and features guide, the maximum available memory for a 750 is 512 GB. The guide also notes that the machine supports from one to up to four sockets.

The number of sockets you choose will tell you how much memory you can order. If you have one socket, you’ll have eight memory slots. The ratio stays the same moving up: with two sockets, you’ll have 16 memory slots, with three sockets, 24 memory slots and with four sockets, 32 slots. If you’re looking to max out the memory on the machine, you’ll want to max out the number of sockets. While other choices can be made here — e.g., CPU clock speed and the number of cores per socket (either six or eight for the 750) — to reach 512GB on the system, you must choose the largest memory size, 32 GB, which is packaged as 2x16GB memory DIMMs.

Of course, other memory sizes are available for your machines: 8 GB (2x4GB) or 16 GB (2x8GB) DIMMs. But once you make your memory size selection, you need to stick with it, because, if you upgrade the machine in the future, 16 GB and 32 GB features won’t mix. They must be the same feature code. Since, in this scenario, you’re trying to max out the machine, the 32 GB (2x16GB) memory option is the choice. And since you have 32 slots for memory in your 4-socket machine, you can see how 16 GB x 32 slots gives you 512 GB.

So how do I know that you cannot mix DIMM sizes? Here’s a little story, as told to me by someone who was there.

Once upon a time, a Power server was getting a memory upgrade. This system (not a 750) was being boosted from 16 GB to 32 GB total memory. There were 4x2GB DIMMs attached to each processor (eight total), and eight new memory DIMMs were ordered. Since the memory was going to be doubled, it seemed logical to just plug in the eight new DIMMs and power the machine back on. So, the new memory was installed and the machine was rebooted. And then? Error codes started flashing across the LED.

Perhaps the memory was not seated properly? After pulling and reseating all of the memory, the same error came up. At this moment, someone finally thought to check the boxes that the new memory came in. Sure enough, the new memory DIMMs were of a higher density — 8x4GB rather than 8x2GB. Because you can’t mix memory sizes, the machine issued errors. Once the 8x2GB memory was pulled, the machine came right back up with 32 GB.

Here’s something to think about when ordering machines: Do you expect to add more memory in the near future? Sure, this can be difficult to predict, but if you think you’ll eventually upgrade the memory, try to leave yourself some open slots. If you max out your memory with a smaller DIMM size, your only option down the road may be to pull out the smaller DIMMS and replace them with larger DIMMs.

If your machine supports it, think about using Capacity on Demand. That way you’ll have a machine with max memory physically installed, but you’ll only use (and pay for) the activated memory you need now. Should you eventually elect to upgrade, additional physical memory can be activated later. It can help take the guesswork out of future upgrades.

Whatever choices you make, be sure you know what’s installed on your machine — and what you’re adding to your machine — before opening it up.
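A quick way to check what’s actually installed before you open the machine up. These are standard AIX commands; the exact section headings in the lscfg output vary by model.

```shell
# List the memory DIMMs the system sees, with sizes and locations
lscfg -vp | grep -p "Memory DIMM"

# Total usable memory as the OS sees it (in KB)
lsattr -El sys0 -a realmem
```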

It’s Lame to Blame

Edit: This is still an issue today.

Originally posted March 15, 2011 on AIXchange

I enjoyed reading this article on some of the turf wars that go on in IT:

“IT pros do battle every day — with cyber attackers, stubborn hardware, buggy software, clueless users, and the endless demands of other departments within their organization. But few can compare to the conflicts raging within IT itself. Programmers wage war with infrastructure geeks. IT staff butts heads with IT management. System admins battle for dominance. And everybody wishes security would just leave them alone.”

I can certainly relate to this. In fact, I cannot count the number of times I’ve seen server guys blame network guys who blame SAN guys who blame operations guys who blame management. At one point or another, any IT group may be viewed as “the enemy.”

From my server-centric point of view, even when problems are addressed, it can be frustrating. Often the server guys will tell the network team about some connectivity or response-time issues. The network team fixes the problem, but they seldom share the solution. It can really seem like they don’t want you to know what they did. You just get those infamous words, “try it now,” and the network problem magically vanishes. Sure, we’re all glad things are fixed, but there’s value in transparency. If we know the specifics about a problem with network connectivity or, say, accessing a LUN, we can remind the network folks or the SAN folks of what they previously did, should it happen again. We could save them time, if they’d just keep us informed.

I don’t mean to paint the server folks as angels. I’m sure I’ve told various users to “try it now” while neglecting to explain the source of the problem. I’m sure many admins, when asked why the time is wrong on the server, or why applications cannot resolve a hostname, or why users cannot login, or why users have the wrong home directory, or with any number of issues, respond with “try it now.”

I suppose part of the reason for turf wars stems from the fact that, in larger organizations, these groups often have different team members with different skill sets, and many times individual team members use only their own hardware, with no cross-training. The network guys work on the network switches, the SAN guys work on the SAN switches, the server guys work on the servers.

So everyone’s isolated — except of course, when there’s a problem. Then everyone must work together. And for the most part, everyone is a professional. Still, there are times when people are more interested in deflecting blame from themselves and their team. No one wants to cop to a mistake. Honestly, these turf wars could make for a great reality show — if only more IT people looked good on camera.

So how do we resolve turf wars? Start by remembering that you’re all in the same organization, and that, despite the many different areas of IT expertise, everyone in IT really has a stake in the computing environment.

Especially as organizations grow, it’s vital that everyone be kept informed of changes to the environment. To accomplish this, these changes must be documented, even if documenting changes typically unleashes unwanted bureaucracy. For instance, a new server is brought onto the raised floor. A ticket is written, and the notifications fly. I’ve seen these new-server tickets reach everyone from the network and SAN team to the backup and monitoring teams. While the bureaucracy frustrates me as much as anyone, these processes and procedures are generally in place for sound reasons. No one is looking to slow down your server build, but other teams do need to be informed of changes that will eventually impact their workloads as well.

That’s why we should play nicer with one another. Give those in the “other group” the benefit of the doubt. Assume we’re all doing our best.

More from the article:

“Down the road someone will ask, ‘Do you know so and so?’ and you’ll say, ‘Yes, he walked out on us and took our passwords with him.’ It’s a small industry. The only things that have meaning in this life are your name and reputation. Lose them and you’ll never get hired again.”

If you’re always combative and causing drama, people may stop working with you, or avoid you until they are forced to work with you.  It really is a small world and people will remember your interactions with them. It’s really in your best interests to make the effort to get along with others. And that will make everyone’s days that much easier.

One more snippet:

“The most important decisions a CIO faces aren’t about technology per se, but about business outcomes. And that may never enter the mind of an in-the-trenches IT grunt. ‘I’ve had a lot of discussions with a lot of very tech-savvy CIOs,’ he says. ‘But at the end of the day, the business decisions they need to make aren’t based on sexy technology — they’re based on business outcomes. There’s pressure on the CIO from the CEO to deliver business value. The IT guys are focused on the technology in their particular tower.’”

In other words: Managers may have a completely valid reason for denying a new technology that you recommend.

It’s natural to get caught up in our areas of expertise. But remember that we’re not only supporting servers, but applications and the users of those applications. We’re all providing value to an organization.

10 Rules for Admins

Edit: This is still a good list of rules.

Originally posted March 8, 2011 on AIXchange

A few months ago I took a class with IBMer Tommy Todd, who highlighted 10 rules for administrators that he had accumulated over the years. I’ll run down his list, and comment about each rule. Then I’d appreciate your thoughts.

Documentation: Make sure your documentation is up to date. Ask yourself how you’re documenting your systems. I really like to generate a sysplan from the HMC. It shows me a diagram of the physical hardware, where the adapters are assigned, how the LPARs are configured, etc.
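The sysplan I mentioned can also be generated from the HMC command line, which makes it easy to script as a periodic job. A sketch, with “myserver” standing in for your managed system name:

```shell
# On the HMC, generate a system plan for a managed system
mksysplan -f myserver.sysplan -m myserver

# List the system plans stored on the HMC
lssysplan
```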

Make backups: How are you backing up your machine? Do you backup both the operating system (rootvg) and data (datavg)? Are you periodically running mksysb commands, and have you tested them? Can you restore your machines? Have you tested your disaster/recovery plans? Did you back up your HMC and VIO servers?
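A minimal sketch of the rootvg and VIOS pieces of that checklist; the target paths are placeholders for your own backup locations.

```shell
# AIX: back up rootvg to a file, regenerating image.data first
mksysb -i /backup/$(hostname).mksysb

# VIOS: from the padmin shell, back up the VIO server
backupios -file /home/padmin/viosbackup
```

The HMC has its own console data backup (available from the GUI or its command line), and none of this replaces actually test-restoring a mksysb now and then.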

Try it three times: Did you fat finger something? Do you have poor typing skills? Did you use the wrong flag? Do you need to go look at the man pages?

Don’t overlook the obvious: Many times the answer will be simple. Recently someone was trying to remove a directory and couldn’t do it. Fuser, lsof — nothing was showing that the directory was in use. The admin was stumped. It turned out he still had a mounted filesystem on that mountpoint. Once he unmounted the filesystem, he was good to go. How many obvious things have you overlooked?
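The mounted-filesystem case above is easy to check for; /data is a placeholder for the stubborn directory.

```shell
# Is something still mounted on the directory?
mount | grep /data
fuser -cux /data      # processes using the filesystem, if any

# If a filesystem is mounted there, unmount it before removing
umount /data
rmdir /data
```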

Try it, it might work: I like to log into test machines and try different things; you never know what you’ll learn. For me the best learning is hands-on learning.

Never say never, always avoid always: There will be exceptions, there is usually more than one way to reach the same endpoint. In other words, don’t say “it always works that way,” or “it will never work like that.” The technology does change. Things that didn’t work before do now, and vice versa.

Make a copy before you edit anything: You might have a copy out on a TSM server or backed up somewhere, but what if that backup copy has an issue? It’s nice to have that safety net, but it’s smart to cp /etc/hosts /etc/hosts.orig before making file changes. If you find yourself making changes to /etc/inittab without using chitab, be sure you back it up first.
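For /etc/inittab specifically, the safe pattern looks something like this; the chitab entry is only an example, not a real service.

```shell
# Keep a dated safety copy before touching the file
cp /etc/inittab /etc/inittab.$(date +%Y%m%d)

# Then make the change with the supported tool rather than an editor
chitab "myapp:2:once:/usr/local/bin/start_myapp > /dev/console 2>&1"
```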

There’s usually another way to do it: Especially in UNIX, there’s more than one way to do something. The religious wars come up when people believe that theirs is the only way. I like to hear about how other people do things and learn from them. Many times they do things their way because they had issues in the past. We can all learn from others’ mistakes and benefit from their hard-earned knowledge.

Login as yourself, switch to root when it’s needed: With tools like sudo and role based access control (RBAC), do we really need to be logging in and moving around as root? One wrong keystroke can spell disaster when you have super-user authority.

Don’t say, “I’ll go back and fix that later”: There’s no time like the present to fix your issues. If you must “fix that later,” be sure to document it somewhere so you have a reminder to actually come back and fix it later.

Never keep your resume on the system you’re supporting: What if the machine crashes and you don’t have it on a backup server? What will you do then?

Do you abide by all these rules? What are your own rules? Please register your thoughts in Comments.

Debating Support Scenarios

Edit: These are still interesting topics to consider.

Originally posted March 1, 2011 on AIXchange

In a recent post, I said this:

“Troubleshooting and administration are done via the network, from anywhere on the globe. This is great, especially for companies that utilize sun-support scenarios, where different teams in different countries and different time zones support machines during their normal business hours. Provided that good turnover information is being passed on from shift to shift, and calls and trouble tickets are accurately logged in a searchable database, this is a terrific support setup. At least it’s preferable, I think, to having IT staff members carry pagers and get called in the middle of the night to work on problems.”

However, this counter-argument is sometimes made to the follow-the-sun support scenario: If the administrators who built the machines are the same people who will get paged in the middle of the night in the event of a problem, then these admins will be extra careful when configuring their machines in the first place. Ultimately, if extra care is taken up front, there are fewer emergency calls.

Beyond that, some believe that the admin who built the server is the best person to fix it. We do get to know our machines over time. We know how they normally behave, we know where the logs are and when the cron jobs run, and we remember that quick little change we implemented a few days or weeks ago. An administrator who’s servicing an unfamiliar machine on a 3 a.m. call may need some time to get familiar with the applications it runs and its other unique characteristics.

All of this sounds logical, but I feel that the familiarity factor is a bit overrated. These days, many organizations take the time to standardize the look and feel of all their machines so that any team member can log into any machine and get right to work. But let me expound on what I said in the previous post: What I like about the follow-the-sun scenario is that people are actually working on the machines during their normal daylight hours. They’re not sleep-deprived; they’re fresh and alert and able to work on issues during the normal course of their day. And anything that isn’t resolved can be left for those coming in on the next shift.

Of course, in those cases, there’s a need to bring new shift members up to speed on what’s already been tried. But this isn’t all bad, either. Many times I’ve worked with IBM Support on issues that took multiple shifts to resolve. The departing shift members fill in the people coming in, and we continue to troubleshoot problems. Sometimes it helps to have a new set of eyes looking at a problem. A group of people will spend lots of time on an issue, then a new person will come in and immediately spot something that the rest of us overlooked. I’ve seen it happen.

Admittedly, globally dispersed support teams are a luxury available to only a few large companies. The rest of us generally work within individual IT departments.

So how do you deal with support issues? Do you prefer to have the on-call pager for a week at a time?  Do you prefer to have dedicated staff working second and third shifts? Is your after-hours call volume so high that you can only handle a few days of it before exhaustion creeps in? I knew a guy who hated his turn on the pager rotation so much that he would bribe his teammates — to the tune of hundreds of dollars — to take his week for him.

Hopefully you’re on good terms with your IT team and can adjust your schedule when need be. And hopefully your bosses recognize the perils of pager duty and allow you time off after an extended period of night calls. But how does your organization handle this? If you have some solutions — or some horror stories — e-mail me or make a post in Comments.

More on VIOS Installs

Edit: Some links no longer work. It seems this is easier to do now via the HMC.

Originally posted February 22, 2011 on AIXchange

Anthony English offered an intriguing comment about an issue I had during a VIOS installation.

His response: “I prefer to install the VIOS without using physical media at all using the HMC command line and the installios command. Requires an FTP server or NFS mount. I download the VIOS install media from Entitlement Software Support and then install the VIOS without NIM or physical access to a managed system. True bare metal install. This requires an HMC. Don’t think it can be done for an IVM managed system.”

I decided to try the installios command from the HMC as Anthony suggested. I installed a new server. I defined both VIO servers, and I installed my first VIOS from install media as I always do. I defined my second VIOS with appropriate physical and virtual resources and put the VIOS install media into the HMC DVD drive.

I logged into the HMC GUI and selected HMC management. Then I went to open restricted shell terminal. On the command line I typed installios. (Obviously if need be I could have connected remotely via ssh to the HMC to run installios.)

The machine came back with:

“The following objects of type ‘managed system’ were found, please select one.”

I chose the managed system that had the VIO definition created on it.

“The following objects of type ‘virtual I/O server partition’ were found. Please select one.”

I chose the VIO server definition I wanted to load.

“The following objects of type ‘profile’ were found. Please select one.”

I chose the profile I was interested in.

“Enter the source of the installation images [/dev/cdrom]:”

I hit enter to take the default. Then, as prompted, I entered the client’s intended IP address, intended subnet mask, gateway and client speed.

It came back with:

“Please select an adapter you would like to use for this installation. (Warning, the client IP address must be reachable through this adapter!)”

I chose the appropriate adapter, the one that had been configured as my “open” network adapter in this case. I then watched as the adapter information was retrieved and the HMC automatically powered up my LPAR and began the installation.

It came back with what looked like an SMS screen from inside of the VIOS. Here I was able to choose the correct network adapter to use for the installation.

It then prompted me for a language and locale (en_US in my case) and gave me a screen containing a summary of the information I’d entered up to this point. I was given the option to proceed (Enter) or cancel (type Ctrl-C). It then showed me a license agreement screen. Once I accepted, it fired up nimol resources to start loading the other VIOS. It copied booti.chrp.mp.ent.Z, ispot.tar.Z, mksysb, etc., from the CD media to the /extra filesystem on the HMC.

And at this point, it failed. When I looked at the log, I realized why. When prompted for the network card’s speed, I just took the defaults: 100/Full. As I was using a virtual network adapter, these settings were not correct. So I went back through and made sure to select auto/auto for the virtual network adapter. Once I did that, it worked as expected.

I saw:

Connecting to vios2
Connected
Checking for power off
Power off complete
Power on vios2 to open firmware
Power on complete
Client ip address
Server ip address
Gateway ip address
Subnetmask ip address
Getting adapter location codes
Network booting install adapter
Bootp sent over network

It then brought up the /var/log/nimol.log and displayed on the screen what was happening during the install.

One thing I don’t like about this is not having the option to select the disk I want VIOS installed to. This isn’t a big deal on a fresh build, but it could be if you’re installing to a system where you could potentially overwrite existing data. However, if you don’t plan on creating a NIM server in your environment, this HMC method certainly works fine.

As Anthony noted in his comment, there are multiple options for loading VIO servers. Using the HMC is certainly another worthy option, especially considering how easy it is.

Hey, Cut Me Some Slack!

Edit: This is still relevant information to consider.

Originally posted February 15, 2011 on AIXchange

I recently shared some of my gripes concerning modern data centers, as well as the importance of keeping actual people in mind when designing and constructing these buildings.

On a somewhat related note, another trend I’m seeing is those nice new, pristine racks become a nightmare once it’s time to service the equipment.

I’ve seen plenty of newly constructed raised floors. These facilities look marvelous. The cables are all color-coordinated and very neatly laid out. They’re cut to precise lengths (no slack) and tied down from the switch, through the cable management trays, into the cable management arms and into the server. The people who tour these places must go away thinking that any company that keeps its IT equipment this organized must be on top of its entire operation.

The problem is, raised floors aren’t meant to be pretty. When the cabling is that precise, it can actually be a problem. For instance, without the slack, you can no longer slide the drawers forward in the rack to service the components.

IBM has quite a few hot swap parts in their computers. They have great rails that they use to mount their servers in the computer racks, and these rails allow you to easily slide a drawer in and out of the rack. (Aside: I find that the latest design for the HMC and the POWER7 rails is the easiest to install, while the rails for some storage switches which shall remain nameless are the worst. The last thing I want to do when installing switches is assemble rails.)

Anyway, here’s the big deal with cables cut to size: When your machine needs service, the only way you can slide it forward and fix it is to unplug everything. The machines have rails for a reason — so you can move them a bit to tinker with them when necessary. If you tie down your cables without leaving slack, you defeat the purpose of having these redundant hot swap parts for the machines.

If I need to unplug everything to service a machine, I have to be careful to avoid bumping into the other servers, and I need to hope that the cables are labeled properly so that they get plugged in correctly when the service is over. When you’re talking about multiple adapters, multiple connections and multiple serial and HMC cables, that’s not a trivial number of connections.

Another interesting thing I see is some people not using any rails. They just put shelves into their racks and stack two or three servers on a shelf. I don’t think this is any better. I still cannot slide the machines out, and worse, if I need to reach one of the bottom machines, I may need to power off multiple physical servers just to get at it.

Cleanliness has its place in the computer room, of course. You should make sure your racks and cabling are clean. But think about what you actually need and, literally, cut your service personnel some slack — make sure there’s enough slack in the cables that each drawer can be easily pulled out when service is needed.

Computer Rooms are Still People Rooms

Edit: We still need to plan for humans in data centers.

Originally posted February 8, 2011 on AIXchange

I travel to customer sites across the country — including customer-owned facilities, outsourcing facilities, disaster/recovery facilities and co-location facilities — and I see plenty of raised floors. But I’m always fascinated by how much these sites cater to machines rather than people.

These days many large data centers are designed as lights-out environments, where people don’t need to go onsite at all. Troubleshooting and administration are done via the network, from anywhere on the globe. This is great, especially for companies that utilize sun-support scenarios, where different teams in different countries and different time zones support machines during their normal business hours. Provided that good turnover information is being passed on from shift to shift, and calls and trouble tickets are accurately logged in a searchable database, this is a terrific support setup. At least it’s preferable, I think, to having IT staff members carry pagers and get called in the middle of the night to work on problems.

Then there are the colo data centers that many companies now use. Customers have one or more racks that sit next to other customers’ racks, and each is housed behind its own chain link fence. Although the prison aesthetic of these “caged” machines can take some getting used to, again, the concept has its place. Customers can get personalized attention from the staff that typically mans the facility 24-7, and the costs to rent space can be quite reasonable.

However, the reality is that computers still need to be installed and decommissioned on a regular basis. Even if these new facilities are designed to have only a few people working onsite, IT folk are constantly coming in and out. Recently we joked about one large site reminding us of the place in the original Raiders of the Lost Ark where they ended up storing the Ark of the Covenant, just a huge cavernous warehouse full of pallets as far as the eye could see.

The simple, frustratingly overlooked truth is that people need to get stuff done, and they need room to do it. I can’t count the number of times I’ve had to stack cardboard boxes to create a makeshift desk on a raised floor or temporarily cover the perforated floor tiles that are needed to cool the raised floor. The computers may need the AC to do their job, but I can’t do mine if I’m a human popsicle.

It always amazes me to see these new facilities and their state-of-the-art security apparatus: cameras, biometric man traps, retina scanners, voice recognition systems and sensors that weigh people coming and going (to ensure that you’re not walking out with some valued piece of equipment). And yet, many of these same places are constructed without enough conference rooms, lounge areas, work spaces and even bathrooms.

It goes without saying that if these facilities don’t have enough room for people on a daily basis, they’re not equipped to handle a disaster scenario, either. I’ve heard facilities managers say that in the event of a disaster that would bring an influx of IT folks to their site, they’d just get some portable toilets. (I guess that would work as long as it isn’t one of those disasters where it’s really hot or really cold outside; otherwise those Porta-Potty trips could get a little uncomfortable.)

The point is, if you’re in charge of planning and managing a data center, remember us humans. Unlike the computers, we need places to eat. We may need places to sleep. We absolutely need adequate restrooms. I realize that these facilities are built for computers, but as long as computers need people to work on them, these sites must be designed with people in mind.

IBM’s New Software Compatibility Tool

Edit: The link no longer works.

Originally posted February 1, 2011 on AIXchange

IBM has come out with a new software compatibility website.

I learned of this site from a mailing list, which offers this description:

“Clarity is the new tool based on Clearing House data designed to allow users to easily generate custom reports about compatible IBM software combinations…. Using this tool customers may create reports about a product’s compatibility with operating systems, prerequisite software or virtualization environments. They can also generate EOS reports for [IBM] products.”

When you go to the site you’ll find lists of available reports, including:

* Operating systems for a specific product.
* Prerequisites of a specific product.
* Virtualization environments supporting a product.

I was interested in “Products that use a specific operating system,” so I selected “Products supported by AIX 6.1 POWER System.” (Options ranged from the current AIX 7.1 and back as far as AIX 4.3.)

The tool produced a report displaying a list of products that were supported, under headings with names like:

* Information Management (DB2, InfoSphere, Informix)
* Lotus (Domino, Mashups, Quickr)
* Rational (Asset Manager, ClearCase, COBOL)
* Tivoli (Access Manager, Configuration Manager)
* WebSphere (Application Server, Business Monitor)

I didn’t count the number of products listed in the report, but it was several pages worth of information. Besides the software name, it also displayed the versions that were supported. (To my surprise, some of these products — including Tivoli Access Manager, DB2 and Informix — support AIX 4.3.) This is a great way to quickly determine which levels of software are supported by a particular operating system version.

Using checkmarks or greyed-out checkmarks as indicators, this report also broke down each product in this manner:

* “This operating system is supported by all parts of the product.”
* “This operating system is supported by some of the parts of the product.”

Also available on the software compatibility site is a software end-of-service tool. For fun, look at the end-of-service dates on VIOS: You’ll see the general availability date along with the quarter (e.g., third quarter 2012) in which products are expected to reach end of service. And by adjusting the tool’s start dates, you can see in a graphical format when the product came on the market and when it will go end of service.

I’m always interested in new ways to look at data, so if you run your own sample reports, let me know what you find.

An Unusual VIOS Install

Edit: The link to the NIM document still works. Getting the mksysb file is easier now.

Originally posted January 25, 2011 on AIXchange

I had an interesting experience with a VIOS installation recently. I’m curious if anyone else has seen something similar. Maybe I just had a bad day.

When I do a NIM install of a VIOS on a new system, I typically refer to this documentation, which outlines a nice way to get the mksysb file off of the installation DVD to use with the NIM server.

The document states:

“Copy the VIOS mksysb image from the CD to your NIM master:

“Mount the VIOS base CD and copy the VIOS mksysb image from the CD (in /usr/sys/inst.images) to your NIM master:

# mount -o ro -v cdrfs /dev/cd0 /mnt
# cd /mnt/usr/sys/inst.images
# cp mksysb_image /export/mksysb/mksysb_image

“If using VIOS 1.5 or higher media, the mksysb file may be split into two parts. To combine these two parts and copy them to hdisk, run the following:

# cat /mnt/usr/sys/inst.images/mksysb_image \
  /mnt/usr/sys/inst.images/mksysb_image2 > /dir/filename

” *** You can substitute any path you would like to save the combined mksysb image, for ‘/export/mksysb_image’.”
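
If you want a sanity check after combining the split images, something along these lines works. The size comparison is my own addition, not part of the IBM procedure:

```shell
# Concatenate the two mksysb parts, then confirm the combined size
# equals the sum of the parts before trusting the result.
combine_mksysb() {
    part1=$1; part2=$2; out=$3
    cat "$part1" "$part2" > "$out" || return 1
    s1=$(wc -c < "$part1")
    s2=$(wc -c < "$part2")
    so=$(wc -c < "$out")
    if [ "$so" -eq $((s1 + s2)) ]; then
        echo "combined OK: $so bytes"
    else
        echo "size mismatch: $so != $((s1 + s2))" >&2
        return 1
    fi
}

# Example:
# combine_mksysb /mnt/usr/sys/inst.images/mksysb_image \
#     /mnt/usr/sys/inst.images/mksysb_image2 /export/mksysb/mksysb_image
```

It's cheap insurance against a truncated copy off the CD.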

Why would I use NIM to install VIOS if I already have the installation media in hand?

With some IBM Power Systems models, you can get an optional split backplane. This allows you to set up multiple partitions, each with its own disk controller and disks. With some of these models, however, the DVD drive can be accessed by only one of the disk controllers. So when the other partition boots up, it cannot see the DVD. This makes it difficult to load that second VIOS on that second set of internal disks, since the partition cannot see the VIO install media.

I’ve seen people load VIOS to one internal disk and then, when that installation finished, pull the disk out of the first controller and put it into the second controller. Then they reload VIOS onto a disk that they put into the original disk controller. This works, but I don’t think it’s a very clean method; you end up with defined devices on your VIOS that are no longer seen by the operating system.

My preference is to load VIOS and then immediately create a client partition and make it my NIM master. Once I have this NIM master, I define the VIOS mksysb image and use it to load any other VIO servers in the environment — just what’s laid out in the aforementioned document. With the NIM server there, loading the rest of the client partitions is trivial — at least it usually is.

In this case, however, the issue was that the mksysb image I’d copied from the install media left me with a server at version 2.2.0.0 instead of the 2.2.0.10 I was expecting. Maybe it was bad media, or maybe I just fat-fingered something. At least the solution was simple enough. I just took a mksysb from my clean, newly installed VIOS with the command:

backupios -nomedialib -mksysb -file /mksysb/vios.mksysb

This command excluded the .iso images I’d copied into the /var/vio/VMLibrary (which is my virtual media repository).

I copied that mksysb image over to my NIM server and created my spot. The other VIOS installed as expected.

So has anyone else seen this issue when copying the mksysb file from the install media?

Technical University: Looking Back and Ahead

Edit: Some links no longer work. The links to pictures do, it was fun looking for people from all those years ago.

Originally posted January 18, 2011 on AIXchange

The 2011 schedule for the IBM Systems Technical Conference Series is now out. These worldwide educational events include the IBM Power Systems Technical University, which is set for Oct. 10-14 in Miami, Fla.

I’ve written numerous times about the valuable education available through the IBM Power Systems Technical University events. I figure it’s worth mentioning now because companies typically finalize their annual budgets early in the calendar year — so really, now is the time to make the case to your management about how this educational conference will help you in your job.

If you can’t make time to attend the conference in October, you might also consider the three-day Power Systems Technical Symposium that’s set for April 27-29 in Orlando, Fla.

From the website:

“The IBM Power Systems Technical Symposium is a shorter version of the IBM Power Systems Technical University offered in October. It will focus exclusively on training related to the recent POWER announcements and the potential impact on your data center.”

These conferences are a great place to hear about the technical details behind Power systems, without any marketing fluff. They are technical conferences for technical people. They’re taught by people with real-world experience. The attendees you talk to and network with face the same kinds of challenges that you do. It’s a great place to meet with others in your field.

Besides the technical information, you can usually count on learning about other interesting topics while you’re there.

When I attended the 2010 conference in Las Vegas last October, Steve Squyres was the keynote speaker. Steve is the scientific principal investigator for the Mars Exploration Rover mission, and based on the information he presented that night, I found that space exploration is even more fascinating than I had realized.

I picked up a copy of his book, “Roving Mars,” so that I could learn more. I enjoyed reading what he wrote. I learned a lot from the book, but I also realized that he shared insights with us in his talk that he didn’t cover in his book. I only wish I’d made an audio recording so I could listen to it again. (If anyone has an MP3 of the lecture, please send it my way.)

If you attended the conference, look through these pictures and see if you can find a photo of yourself. I was able to find a shot of the back of my head in one of them.

Many of the speakers flew straight from the event in Las Vegas to the Technical University that was held the following week in Lyon, France, in order to present there as well.

Here’s a set of pictures from France.

I got some behind-the-scenes information from IBM’s Marlin Maddy, who’s in charge of the IBM Technical Conference Series. He told me there were approximately 1,550 attendees in 2010, nearly double the attendance at the 2009 event. There were well over 400 technical sessions, many of which were standing room only.

These venues are selected close to 18 months ahead of time, and the rooms for the various topics are assigned to speakers well before the final attendance numbers are in, which is why some of the rooms felt so crowded. The organizers did acquire a couple of additional larger rooms to ease the congestion, but it’s difficult to plan for a crowd when you aren’t sure how many people will attend until the event starts. Since attendees sign up for the conference itself rather than for specific presentations, it can be hard to predict the room size that will be needed.

Event organizers collect attendee feedback from each of the conference sessions so that they can get our thoughts and criticisms while they’re still fresh in our minds, and according to Marlin, overall customer satisfaction with the event was extremely high in 2010.

A large burst of people signed up at the last minute for the 2010 conference, which is unusual; most years there’s a steady stream of people signing up. They also offered two pre-conference certification classes that had 50 attendees. This was something new, and event organizers felt that it enhanced the value of the conference.

Prizes given away at the 2010 conference included five iPads, five Kindles and five 1TB external drives. Expect more such giveaways this year.

As Marlin put it, “Putting together an event like this takes a great deal of planning and a solid team working together. There are always minor surprises and as a team we just need to adjust to make it all transparent to the customer.”

I thought they did a great job with the event. Let me know what you think in the comments, or send me an e-mail.

Remote HMC Upgrades

Edit: Some links no longer work. I still love remote upgrades.

Originally posted January 11, 2011 on AIXchange

Anthony English’s recent blog entry about remotely upgrading the HMC struck a chord with me. How many times have you found yourself on a cold raised floor to upgrade a machine? Wouldn’t you rather do that work from a warm office (or, in Anthony’s case, the deck of a ship in Sydney Harbor)?

I had an HMC running V7.3.3 that needed to be upgraded to V7.7.2 in order to support a new POWER7 720 machine. What were my options? There’s upgrading the way we’ve done it for years: I could download the .iso images and burn them to a CD, and then put the CDs into the DVD drive.

Or, I could order the CDs from IBM and not have to download or burn anything.

Or, I could avoid burning physical media or visiting a cold computer room by attempting the upgrade over the network.

Since this was my first time using this method, I started here and followed the instructions:

Download options for HMC network install images. You have three options for acquiring HMC network install image files:

  • Download all files simultaneously via link to Download Director.
  • Download the files individually.
  • Download all files via an anonymous FTP process at a command line.

Note that in all cases you must download (or copy) the image files to a server that accepts FTP requests. You cannot download these files directly to the HMC.

I went ahead and downloaded the files to an FTP server. Then I followed this information:

“You can upgrade the HMC remotely by using network install images rather than using a Recovery DVD. The HMC commands involved include saveupgdata, getupgfiles, chhmc and hmcshutdown. The network install images are linked off of IBM FixCentral.”

I downloaded the files:

Enter these commands:

# ftp
ftp> open ftp.software.ibm.com
  Name: anonymous
  Password: ftp
ftp> cd software/server/hmc/network/v7720
ftp> prompt   (turns off interactive mode)
ftp> bin   (ensures binary transfer mode)
ftp> mget *   (downloads all six files simultaneously)
ftp> bye   (exits FTP after all files have been downloaded)”

These files had been successfully downloaded to my FTP server:

# ls -la
total 5068440
drwxr-xr-x   2 root     system          256 Dec 07 13:25 .
drwxr-xr-x   3 root     system          256 Dec 07 10:07 ..
-rw-r--r--   1 root     system      1531846 Dec 07 10:08 bzImage
-rw-r--r--   1 root     system    672645120 Dec 07 11:03 disk1.img
-rw-r--r--   1 root     system   1133975552 Dec 07 12:25 disk2.img
-rw-r--r--   1 root     system    753831936 Dec 07 13:25 disk3.img
-rw-r--r--   1 root     system           78 Dec 07 13:25 hmcnetworkfiles.sum
-rw-r--r--   1 root     system     33049856 Dec 07 13:28 initrd.gz

I logged into my HMC using ssh as hscroot, and the first command ran fine:

>saveupgdata -r disk

However, I ran into a snag with my second command:

> getupgfiles -h ftpserver -u root --passwd passw0rd -d /fixes/HMCV7R7.2.0
Cannot contact server to obtain files.

A Web search netted me this forum and this question, but, alas, no answer.

So I tried rebooting the HMC. That didn’t help.

I verified I could connect to the FTP server from other machines, but I couldn’t figure out why my HMC wouldn’t connect to the FTP server. Instead of spending more time troubleshooting, I decided to download the files directly from IBM onto the HMC. I ran this command:

>getupgfiles -h ftp.software.ibm.com -u anonymous --passwd ftp -d /software/server/hmc/network/v7720

That worked great, although it took longer to download the files than it would have if I were moving files on the local network. To monitor my progress, I ran the script that’s suggested in this post:

while true ; do
date
ls -la /hmcdump
sleep 60
done

Once the download completes, the /hmcdump filesystem gets unmounted. That tells you that the download is finished.

“The getupgfiles operation will mount a filesystem called /hmcdump and copy the install files into the directory then unmount the filesystem. The following commands will set the HMC to boot from the network install images and allow the upgrade to proceed.”
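
Since the unmount is the completion signal, the watch loop can test for it directly instead of making you eyeball the ls output. Here's a sketch — the function and its defaults are my own, not anything the HMC provides:

```shell
# Poll until the given mount point disappears from the mount table,
# then report that the transfer has finished.
wait_for_unmount() {
    dir=$1
    tries=${2:-120}       # give up after tries * interval seconds
    interval=${3:-60}
    while [ "$tries" -gt 0 ]; do
        # Once getupgfiles unmounts the filesystem, it no longer
        # appears in the mount output.
        if ! mount | grep -q " $dir "; then
            echo "done: $dir is no longer mounted"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    echo "timed out waiting for $dir to unmount" >&2
    return 1
}

# Example: wait_for_unmount /hmcdump
```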

Once the download completed, I ran:

>chhmc -c altdiskboot -s enable --mode upgrade

>hmcshutdown -r -t now

Then I waited a while until the HMC came back from the upgrade. This was actually the one mildly disconcerting bit of the whole operation; there’s no feedback as to whether anything is happening. How do I know if it’s actually working if I don’t see something telling me it’s working? How do I know that the HMC didn’t have a problem on reboot and it’s not just sitting there, frozen, waiting for someone to physically touch the machine?

In my case it took about 20 minutes for the upgrade to complete, and beyond that I had to wait until I could actually log in to the HMC GUI (although I could ping the HMC and ssh into the command line). I kept getting an error: “Console not Ready. You cannot log on at this time. The console is still initializing and is not yet ready for users to login. Allow the console to finish initializing and then try to login again.”

After the upgrade, I made sure to load the mandatory eFix MH01235, along with MH01243 and MH01244. This brought the HMC to the current level at the time of writing.

I have to agree with Anthony: Remotely upgrading an HMC is a very painless way to go, though I still wish there was a way to get status information during the actual upgrade process. Just don’t ask me how IBM could provide that information from a machine that’s hundreds of miles away and not on the network while the upgrade is taking place.

More From the Tweet Life

Edit: Some links no longer work.

Originally posted January 4, 2011 on AIXchange

If you’re not following along on Twitter, you should be. Most recently, Twitter brought me this update about VIOS Next Generation:

“VIOS Next Generation or ‘NextGen VIOS’ was released on Dec. 9 as VIOS 2.2 SP01. I recently installed it on my test cluster and put it through its paces to see what was included in the nearly 900MB download.

“First of all, pay close attention to the README prior to installing this code. There are more than a few important caveats. Some notable ones are:

  • The reject option of updateios is not supported in this release. Once you install this service pack, you are committed.
  • The new shared storage pool functionality requires 4 GB of RAM in the VIO server.
  • There is a maximum of one (1) VIOS node per shared storage cluster in this release.
  • VIO servers that host shared storage pools may not participate in Live Partition Mobility operations or Partition Suspend/Resume Operations.
  • VIO clients that make use of storage from shared storage pools are not supported for Live Partition Mobility.”

Here’s more I’ve recently gleaned from tracking various AIX enthusiasts on Twitter:

In his blog, Anthony English notes that the deprecation of the bootinfo -s command means we should use the getconf command instead to track disk size. Anthony’s post points to the following techdoc:

“The command /usr/sbin/bootinfo has traditionally been used to find out information regarding system boot devices, kernel versions, and disk sizes. This command has been deprecated in favor of the command /usr/bin/getconf. The bootinfo man page has been removed, and the command is only used in AIX by the booting and software installation utilities. It should not be used in customer-created shell scripts or run by hand.

“The getconf command will report much of the same information that bootinfo will:

“What was the device the system was last booted from?
$ getconf BOOT_DEVICE
hdisk0

“What size is a particular disk in the system?
$ getconf DISK_SIZE /dev/hdisk0
10240

“What partition size is being used on a disk in the system?
$ getconf DISK_PARTITION /dev/hdisk0
16

“Is the machine capable of running a 64-bit kernel?
$ getconf HARDWARE_BITMODE
64

“Is the system currently running a 64-bit or 32-bit kernel?
$ getconf KERNEL_BITMODE
64

“How much real memory does the system have?
$ getconf REAL_MEMORY
524288.”
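Those getconf queries are easy to batch into one quick report. Here’s a sketch; the AIX-specific variable names above exist only on AIX, so anything getconf doesn’t recognize is reported as unavailable:

```shell
#!/bin/sh
# report: print NAME=value for each getconf variable given, or
# NAME=unavailable when getconf does not recognize the name (as the
# AIX-only variables will be on other platforms).
report() {
    for var in "$@"; do
        if val=$(getconf "$var" 2>/dev/null); then
            echo "$var=$val"
        else
            echo "$var=unavailable"
        fi
    done
}

# On AIX one might run:
# report BOOT_DEVICE HARDWARE_BITMODE KERNEL_BITMODE REAL_MEMORY
```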

Here’s an interesting story (link not active) about Apple dumping its Xserve rack-mounted servers, and the conjecture that running Snow Leopard Server on an IBM Power 710, 720 or 750, or on a PS700, PS701 or PS702 blade, might be a good option in the future. Of course getting the OS running on the hardware is a hurdle, but you have to admit that the idea of running yet another operating system on Power Systems servers is intriguing.

Sure, a lot of Twitter is devoted to people sharing what they had for breakfast or where they’re going for the weekend. But if you look around, Twitter can be a valuable resource for the AIX pro (e.g., go to Twitter.com and try searching for #aix, #ibmtechu or #ibmwatson).

So what hashtags or users are you interested in?

Watson Follows in Deep Blue’s Steps

Edit: I cannot believe that it has been this long ago already. Some links no longer work.

Originally posted December 21, 2010 on AIXchange

It wasn’t that long ago that chess master Garry Kasparov took on, and was defeated by, IBM’s Deep Blue supercomputer.

Nearly 14 years after that match-up, another man-vs.-machine competition is being staged, and this one will be hosted on the long-running American television game show “Jeopardy!” In a series of shows that will air Feb. 14-16, two of Jeopardy!’s most successful players will test their knowledge against a cluster of IBM Power 750 machines running IBM DeepQA software, dubbed “Watson.”

A group of us recently met with IBM’s marketing team to get more information about Watson and to discuss the technology behind it. They were quick to praise the efforts of the scientists at IBM Research, under the direction of Dave Ferrucci, as being the brains behind Watson. 

They wouldn’t confirm the number of machines that make up the cluster (saying only it was between one and 100 servers), but they told us that Watson runs IBM DeepQA software on Novell SUSE Linux Enterprise Server 11 that has been compiled for Power. The Power 750 servers, which have been configured with 32 cores and either 256 GB or 128 GB memory each, are connected together over a 10 Gb Ethernet network. Watson connects with 2 TB of clustered storage for a total of 4 TB.

I’m interested in solid-state disks, so I had to ask if Watson used SSD to speed up access to the data. I was told that it uses SAS drives and that disk performance isn’t an issue since, once booted, the entire application and data resides in main memory. Watson receives questions in text form at the same time that human contestants have the questions read to them. Watson physically presses the buzzer and uses a voice synthesizer to “speak” the answers. The machine isn’t connected to the Internet; it relies only on its memory for answers.

As you might imagine, IBM has built a significant Web presence to promote Watson and the DeepQA Project (see this introduction, this slideshow, these press releases and this Twitter feed). There’s also this background about Watson’s road to “Jeopardy!”:

“An IBM executive had proposed that Watson compete on ‘Jeopardy!’, but the suggestion was initially dismissed. While search engines such as Microsoft’s Bing and Google are able to provide search results based on search terms provided, no computer program had been able to answer anything other than the most straightforward of questions, such as ‘What is the capital of Russia?’ In competitions run by the United States government, Watson’s predecessors were able to answer no more than 70 percent of questions correctly and often took several minutes to come up with an answer. To compete successfully on ‘Jeopardy!’, Watson would need to come up with answers in no more than a few seconds, and the problems posed by the challenge of competing on the game show were initially deemed to be impossible to develop.

“In initial tests run in 2006 by David Ferrucci, the senior manager of IBM’s Semantic Analysis and Integration department, Watson was given 500 clues from past ‘Jeopardy!’ programs. While the top real-life competitors buzzed in half the time and answered as much as 95 percent of questions correctly, Watson’s first pass could only get about 15 percent right. In 2007, the IBM team was given three to five years and a staff of 15 people to develop a solution to the problems posed. …

 “By 2008, the developers had advanced to the point where Watson could compete with low-level ‘Jeopardy!’ champions. That year, IBM contacted ‘Jeopardy!’ executive producer Harry Friedman about the possibility of having Watson compete as a contestant on the show. The show’s producers readily agreed. …”

 In addition, another American TV show, the acclaimed science series “NOVA,” will feature Watson in a Feb. 9 broadcast. The segment is entitled “The Smartest Machine on Earth.”

Finally, there’s this video, from which I’ll quote:

“We were mainly interested in using ‘Jeopardy!’ as a playing field upon which we could do some science. We wanted the ability to use questions that had not been designed for a computer to answer. ‘Jeopardy!’ really represents natural language. You have to understand the English language and all the nuances and all the regionalisms, slang, and the shorthand to play the game, to get the clues. It’s not just a piece of information.

“In 2009 the producers of ‘Jeopardy!’ watched Watson compete for the first time. Their concern was how do we keep it from becoming a stunt or a gimmick. This was different, this was the notion of knowledge acquired by a computer against knowledge acquired and displayed by the best Jeopardy! players. This could be something important, and we want to be a part of it. Many people are going to watch the ‘Jeopardy!’ show and look at Watson and how it competes in ‘Jeopardy!’ and the curiosity of the computer. They will focus on man versus machine, but the more interesting general challenge is, we are trying to produce a deep question and answering machine which will change the way people interact with computers and machines. We are going to revolutionize many many fields.”

 What do you think? Is this a gimmick? A ploy? Does a cluster of 750s beating humans at “Jeopardy!” make you more likely to purchase a Power Systems server? Does this mean we’ll soon be able to interact with computers the way they did on Star Trek? Hopefully there will still be a way to connect my Model M keyboard to these computers of the future.

#ibmwatson

Virtualization for the Right Reasons

Edit: Some more good discussion. The link does not seem to work.

Originally posted December 14, 2010 on AIXchange

In a recent AIXchange blog entry, I outlined the reasons why some customers have yet to get on board with virtualization. Along those lines comes AIX blogger Waldemar Mark Duszyk, who cautions against virtualizing just for the sake of virtualization.

Here’s Duszyk:

“I do believe that there is room for a VIOS, but not in each and every data center, and especially not because the admin from across the street just put one on line so we have to have it too! If you were the owner of a big, heavy truck capable of loads in excess of 100 tons, would you use it to carry a pillow across your state? You could have used a mail service instead, right? Or if you had 10,000 pillows to transport, you would make sure they were all compressed to fit as many as possible. The point I am making here is this: you would think about how to save.”

I remember watching a television show where it was argued that a diesel engine powered school bus that gets six miles to the gallon can sometimes be preferable to an economy car getting 40 miles to the gallon. You may be getting great mileage taking the children to and from school, but it takes a lot of small cars to transport as many children as the big bus can move in one trip. As Mark says, if you have to transport goods, look for the most economical way to do it. The same mindset applies to computing. Don’t virtualize just because everyone else is, do it to save on floor space and power and cooling costs, and to consolidate workloads.

 Again from Mark:

“Do I think that virtualization is a bad idea? Nope again, except that it is still a very expensive proposition. First, before even thinking about virtualization, the surrounding IT environment must be comfortable with SAN boot, because without it, it will be very difficult if not impossible to fully utilize the processing capacity of the hardware one wants to virtualize. Why? How much will it cost you to buy just one CPU (including its activation costs) + RAM + physical I/O adapters for your planned VIO environment? Now, multiply this number by two if you want two VIO servers in the new managed system. The point to remember is this: for VIO to save you money, you have to prove that over time you will at least recover the costs associated with the VIO implementation. It is already obvious that if you decide to follow the VIO crowd, in order to recover the costs of virtualization, you have to pack into your managed system as many partitions as possible. Welcome to the world of SAN boot! If your partitions cannot boot from SAN, you have to provide them with local disks!”

I cannot agree more. We don’t want to use physical disks and physical adapters when we virtualize. We want to boot from SAN and run many LPARs on our frames, and then we can move workloads around by running Live Partition Mobility between our frames. 

Mark also touches on workload partitions (WPARs) as well as Nigel Griffiths’ idea about running workloads and applications inside of WPARs rather than the global AIX instance:

“Use GLOBAL [instances] solely for systems management. Don’t run workloads there, and don’t create any more users than are required. Create WPARs for each workload, and create the necessary users there. Since WPARs are inherently resource efficient, you don’t give up very much by dedicating GLOBAL [instances] to management only. The overhead is certainly much less than creating a separate LPAR for each workload.”

As I’ve said: Not everyone is virtualizing, and not everyone necessarily wants to virtualize. So what are your reasons for holding back?

Blinky: The Mouse that Roared

Edit: I still love my Model M. I still get freebies at conferences, but I no longer have small children at home that love to see what I brought them.

Originally posted December 7, 2010 on AIXchange

I’m a stickler when it comes to my computer keyboard. If I’m going to be stationed in any one place for an extended period of time, my keyboard is coming with me.

In the past I’ve waxed poetic about my Model M keyboard. With a PS/2 to USB converter, I’ve been able to continue using the same keyboard for so many years that I’ve lost track.

However, I’m far less passionate about my computer mice. I seem to cycle through different iterations without much fanfare or fuss. I certainly don’t miss the old style mouse with the ball inside; I was perfectly happy to join the ranks of the optical mouse users.

Recently, I got a free optical mouse. Well, it turned out it was only “almost” free, but I’m getting ahead of myself. I picked it up at a conference. Anyone who travels to these technical events knows all about the nice freebies that vendors hand out. Over the years I’ve taken home flying disks, foam footballs, Rubik’s Cubes, flashlights, pens and flash drives, along with plenty of other knickknacks I’ve long since lost or given away.

Anyway, this optical mouse was actually nice. Being just the right size for my tastes, I determined it would make a fine addition to my computer bag. I’m always swapping out the mouse that I take with me when I travel. Lately I’ve divided my time between a corded optical mouse and a wireless optical mouse, but since this freebie mouse came with a nice retractable USB cable, I thought I’d try it on my next trip.

So I plug it in, and I’m pleased. Really, it exceeded expectations. The sensitivity was great, it seemed very responsive and, like I said, the size was just right.

But that blinking.

The mouse blinked, and it wouldn’t stop. It even changed colors as it blinked.

Someone must have thought that a computer mouse that could cycle from blue to red and alternate between solid and blinking was a neat idea, and it was, for about three seconds. Then it became annoying, especially in any room with low light. If there was a simple way to stop the blinking, I couldn’t figure it out. But that blinking had to be stopped.

I figured I could just open up the mouse and … do something. I wasn’t sure what, though. So I asked around. I was told that applying black nail polish to the LED would keep the annoying light from escaping. Someone else told me that a piece of black electrical tape would do the trick.

Finally, someone told me to just get some wire cutters and remove the LED entirely. That seemed more my style.

Opening the mouse was fairly simple, especially since I wasn’t overly concerned with breaking my little freebie. So I went to work with the wire cutters and removed the LED.

It was at this point when I learned something, something I probably should have known beforehand. That little red light that you see on the bottom of your optical mouse? It comes from an LED.

“Able to work on almost any surface, the mouse has a small, red light-emitting diode (LED) that bounces light off that surface onto a complementary metal-oxide semiconductor (CMOS) sensor. The CMOS sensor sends each image to a digital signal processor (DSP) for analysis. The DSP, operating at 18 MIPS (million instructions per second), is able to detect patterns in the images and see how those patterns have moved since the previous image. Based on the change in patterns over a sequence of images, the DSP determines how far the mouse has moved and sends the corresponding coordinates to the computer. The computer moves the cursor on the screen based on the coordinates received from the mouse. This happens hundreds of times each second, making the cursor appear to move very smoothly.”

Turns out my little freebie had two LEDs: One made all those annoying lights blink; the other performed the critical task of making the optical mouse itself work. In my haste to solve the problem, I’d removed both LEDs. I’d killed my mouse.

Needless to say, I realized my mistake the moment I reassembled it. So I was off to the Radio Shack to drop $1.50 on a new LED that I could solder onto the circuit board. It works fine now, and that’s how I ended up with my free optical mouse that I only paid a little bit for.

I spend my work week expertly configuring, installing and supporting computers that can be worth millions, and yet I can’t be trusted with a device that some vendor paid maybe a couple of bucks to put their logo on. Go figure.

Those Who Do Without Virtualization

Edit: Most everyone virtualizes these days, although I still know of vendors that prefer you run one big LPAR per frame.

Originally posted November 30, 2010 on AIXchange

Working on virtualized systems as much as I do, and talking to people about virtualization as often as I do, I tend to forget a couple things:

  1. Not all IBM Power Systems users have virtualized systems.
  2. Not all of them use VIOS even while they benefit from other aspects of virtualizing their machines.

It isn’t necessarily that these shops are limited by the constraints of older hardware and operating systems. I know of customers with POWER6 and POWER7 hardware that haven’t yet virtualized their systems. Maybe they lack the time or the resources to virtualize more fully, or maybe they simply lack the skills that come only with hands-on experience.

Customers who aren’t hands-on generally don’t realize that virtualization covers a wide range of functionality. Using workload partitions (WPAR) counts as virtualization. Micropartitioning CPU, where we assign fractions of a CPU to an LPAR and then set up processing entitlements and cap or uncap partitions based on our LPAR’s requirements? That’s virtualization. We use VIOS to virtualize disk, the network or both. NPIV allows us to virtualize our fibre adapters and have our clients recognize the LUNs we provision–and it saves us the effort of having to map them to the VIOS and remap them to the VIOS client LPARs. We use the built-in LHEA to virtualize the network. We could create an LPAR with some dedicated physical adapters and some virtual adapters. We could use active memory sharing and active memory expansion to better utilize our systems’ memory. Power Systems offers many choices and scenarios where it can be said that we’re using virtualized machines.

I know some administrators who’ve been unable to convince their management or application vendors of virtualization’s benefits. I know of some IBM i users who are reluctant to get on board with VIOS (though plenty of AIX shops still don’t virtualize, either). Sometimes it’s the vendor that lacks the time, resources or skills for virtualization. For instance, I’ve seen multiple customer sites where tons of I/O drawers are used; the vendor won’t officially support VIOS because the vendor hasn’t tested it, and these customers don’t want to run an unsupported configuration.

I talked to an admin who has experience with configuring logical partitions and setting up dedicated CPUs and dedicated I/O slots in his environment, but he continues to use a dynamic logical partition (DLPAR) operation to move a physical DVD between his different LPARs. It’s the way he’s always done it. He figures his shop’s lack of virtualization is no big deal, since he has no experience with VIOS and virtual optical media anyway. “You can’t miss what you’ve never had,” is how he put it.

Others tell me that they see the writing on the wall. They insist they’ll virtualize, some day.

Are there roadblocks keeping you from virtualizing? Are there complications that prevent you from moving to a fully virtualized environment? I’d like to hear about the challenges you face. Please e-mail me or post in Comments.

IBM’s Virtualization Alternative

Edit: Still some pretty good arguments in favor of PowerVM. Awareness is still an issue.

Originally posted November 23, 2010 on AIXchange

Did you know that when IBM publishes server benchmarks, these workloads always run on virtualized IBM Power Systems machines? The virtualization is built into the hardware and firmware; there is no longer any concept of a non-virtualized, standalone Power machine. Contrast that with virtualization offerings on other platforms, which can degrade performance by 30 percent just through the use of their virtualization software.

The previous statements come from a recent IBM presentation. As you likely know, IBM has been at this virtualization game for a generation or so. The company developed the hypervisor that would become VM on the mainframe in 1967. In 1973, IBM was doing physical partitioning.

Here’s some more material I gleaned from this training session:

  • IBM Power Systems servers provide up to twice the performance of other virtualization solutions on other platforms. These numbers can be even greater depending on the level of virtualization you employ.
  • IBM Power Systems servers are scalable, both in terms of being capable of accommodating workload spikes and in allowing an enterprise to grow its business.
  • PowerVM technology gives you enterprise quality of service virtualization capabilities with higher performance, more scalability and enterprise security. You can have higher utilization of your machines–around 90 percent–which enables you to consolidate your workloads onto fewer physical servers. You can dynamically move from as little as 1/10th of a core to as many as 256 cores in your LPAR, using all of the resources of your server. You can make dynamic changes to resources like CPU, memory and I/O, and you can add and remove dedicated I/O adapters and storage devices, all without a reboot.
  • Live Partition Mobility allows you to easily move running workloads to other frames in your server environment. You can also use LPM to move workloads between POWER6 and POWER7 machines in your environment.
  • Using IBM Systems Director, VMs can be moved automatically to any physical machine in your environment, based on the criteria that you set up. If you have a busy workload on one machine, and more capacity available on another machine, Director can move that workload, without interruption and without human intervention, to the less busy machine.
  • IBM Power Systems servers are secure by design. No common vulnerabilities and exposures (CVEs) have been reported against PowerVM virtualization by US-CERT or by MITRE Corp. In contrast, more than 200 VMware-related vulnerabilities are listed in the U.S. government National Vulnerability Database (NVD), while no PowerVM vulnerabilities are currently listed there. The difference makes sense: VMware is a third-party software add-on, while PowerVM is integrated into the server firmware.
  • POWER7 servers offer LPM, live application mobility, partition availability priority, first failure data capture, processor instruction retry, alternate processor recovery, dynamic processor deallocation, dynamic processor sparing, extended error handling and I/O adapter isolation.

The presentation featured a detailed comparison of PowerVM and VMware, making IBM’s case that PowerVM virtualization runs workloads more efficiently than VMware, with far superior resource utilization, price/performance, resilience and availability. PowerVM technology outperforms VMware by up to 65 percent on Power 750, running the same Linux workloads and virtualized resources. See this comparison of PowerVM and VMware virtualization performance for more information. In addition, PowerVM on a Power 750 will scale better than VMware with linear scaling that maximizes resource utilization with 4X more virtual CPUs. And compared to a large-tier POWER7 model such as the Power 795, you can have 32X more virtual CPUs than VMware.

Assuming I have my facts right (these numbers come from the presentation; please correct me in Comments if you disagree):

  • VMware ESX 3.5 allows four virtual CPUs per VM, 64 GB per VM, 192 VMs on a server, 32 CPU threads on a server and 256 GB on a server.
  • ESX 4.0 allows eight virtual CPUs per VM, 255 GB per VM, 320 VMs on a server, 64 threads on a server and 1024 GB on a server.
  • PowerVM allows 256 virtual CPUs per VM, 8192 GB of memory per VM, 1000 VMs on a server, 1024 threads on a server and 8192 GB on a server.

With PowerVM technology, you can utilize all CPU cores and all physical memory. Which would you prefer for your enterprise workloads?

Let’s look at flexibility once your VM is running. PowerVM virtualization allows you to make dynamic changes to virtual CPUs, memory and I/O devices, and it offers integrated LPAR and WPAR support.

None of this is possible with ESX 3.5. With ESX 4.0, you can add but not remove virtual CPUs, and add but not remove memory. You can make only some dynamic I/O device changes, and direct access to I/O devices is limited.

The same arguments apply to OracleVM Server for SPARC and HP Integrity VM 4.0. Oracle/Sun supports Sun Logical Domains on UltraSPARC T1/T2 servers only: 32 partitions on a T1 or 128 on a T2. You can add or remove CPUs, but only add virtual I/O. You can perform warm migrations, with constraints. There’s no support for dedicated I/O. With HP you can have at most 8 CPUs and 64 GB of RAM per VM, dynamic logical partitioning changes require an LPAR reboot, there’s no support for dedicated I/O, and there’s no dynamic CPU sharing.

I still find that some shops simply aren’t aware of all that IBM Power Systems servers and PowerVM technology have to offer, and all that they can do. These customers either aren’t yet virtualizing their systems, or they don’t see the limitations they’re under using other vendors’ solutions. Hopefully comparisons like these will cause them to take a close look at IBM’s alternative.

The Case for High Availability

Edit: Shawn still gives awesome presentations.

Originally posted November 15, 2010 on AIXchange

Recently I attended a session on the IBM PowerHA high-availability solutions. The point was made that, given the reliability and uptime of IBM Power servers, many customers wonder why they even need an HA solution.

IBM’s Shawn Bodily, our PowerHA presenter, described one of his typical customer interactions: First, another IBM representative will tell the customer about the hardware and the systems’ reliability, availability and serviceability (RAS) features. Then a second rep will discuss live partition mobility and how it seamlessly shifts logical partitions from one frame to another.

So after 20 to 30 minutes of hearing about how the hardware never fails, THEN Shawn must step in and explain why the customer should be concerned with high availability and disaster recovery. That’s one tough act to follow.

So why should you care about high availability and disaster recovery? I’m reminded of something I heard at another presentation, this one at an IBM Technical University conference: “What’s the most important thing in the data center?”

I can’t recall the name of the presenter who asked that question, but I definitely know the answer. The most important thing in the data center is the applications that run on the systems. These applications are the reason we buy the systems. Really, we don’t worry about systems going down; we worry about systems going down and losing access to the applications. Or maybe it takes a system failure before we realize just how critical a given application is to the organization. When users can no longer log in, when processing no longer occurs, when the cost of said failure soars by the minute: that’s what we worry about.

A Standish Group study from a few years ago estimated that only about 20 percent of outages are a result of hardware failure. And with today’s Power hardware, one can readily assume that that percentage has diminished even further.

So what else can go wrong? What about something like planned maintenance? Live partition mobility might help if your hardware alerts you to the need for a fix. Then you just move the workload off of the machine, perform the service and move the workload back on. But, as Shawn pointed out, what good is it to move your workload if you need to update the application or OS?

In those scenarios, we might look at multibos updates. Or we might look at using a product like PowerHA to fail our workload to a standby node. Yes, you’ll see an outage while the application is stopped and then restarted, but only a brief one.

The point is, things happen. Certainly we’ve seen our share of natural disasters in recent years. Or what about a simple power outage that knocks out the electricity and air conditioning? What about operator/user/human error? A mistake is made, files get deleted. Things do happen. These are the reasons you should care about high availability and disaster recovery. You may need it. At some point you may need to bring your systems up in another location.

Ask yourself the questions that Shawn asked us: How long can you afford to be without your systems? When your systems are recovered, how much data can you afford to lose? I don’t know any companies that really want to be without their systems for any length of time. I can’t imagine any that would view their data as expendable.

When it comes to high availability and disaster recovery, the time to think about it is now–not after you’re hit with something unexpected.

The Difference Between Busy and Productive

Edit: This is still good stuff.

Originally posted November 9, 2010 on AIXchange

Some time ago I read two articles that got me thinking about the same thing: the difference between being busy and getting things done.

How are you living your day-to-day life? Are you busy running around from one task to another without thinking about what you’re doing? Are you actively looking for ways to automate or eliminate tasks? Are you stressed out?

The author of this piece explains why she stopped working with “busy” people. “It took me a while to realize that there’s a big difference between someone who feels busy and someone who has a lot going on in their business. Busy, my friends, is a cop-out. It’s a euphemism for everything from ‘I’m frantic with deadlines’ to ‘I just don’t wanna’ to ‘I feel bamboozled as to what to do next so I’m checking Twitter obsessively to tell people I’m busy.'”

How are you at prioritizing, making lists and systematically attacking the items that need to be completed? Are you working on goals and are the things you do each day helping you to reach those goals?

I understand we all have things that we need to get done, and many of them need to get done NOW, but that doesn’t necessarily mean that we must become so stressed, so busy, that we lose perspective on what we’re actually trying to accomplish. We can’t get so busy handling service requests, for instance, that we lose sight of the reality that some tasks can be delegated to others.

I can hear you thinking: “But Rob, there have been cutbacks, and now I’m doing the job that two or three people did before.” This is all the more reason to seek assistance. Early in my career, senior staff members would rely on junior staff members to handle the routine requests, which helped to free up senior staff to create simple but highly useful tools like scripts and self-help portals. How much better is it to spend a bit of time setting up a tool that people can use to help themselves?

Always stop and ask yourself: Is this task really important for me to tackle? Can someone else do it? Can we teach someone to do it themselves? Can we give them better tools to help them do their job?

As is noted in this article on “the cult of the busy,”  “By appearing busy, people bother them less, and simultaneously believe they’re doing well at their job. It’s quite a trick. The person who gets a job done in one hour will seem less busy than the guy who can only do it in five. How busy a person seems is not necessarily indicative of the quality of their results. Someone who is better at something might very well seem less busy, because they are more effective. Results matter more than the time spent to achieve them. People who are always busy are time poor. They have a time shortage. They have time debt. They are either trying to do too much, or they aren’t doing what they’re doing very well. [They’re either ineffective with their time or they] don’t know what they’re trying to effect, so they scramble away at trying to optimize for everything, which leads to optimizing nothing.”

So are you busy? Are you effective? What do you plan to do about it?

The Delicate Art of VIOS Configuration

Edit: Setting up SEAs is easier now with built-in control channels and HMC GUIs, but it is still something to be aware of. Some links no longer work, and I removed one that appears to be malicious.

Originally posted November 2, 2010 on AIXchange

What’s the quickest way to get to know your network team? Just bring down the entire network.

I actually know of people who have caused network outages by misconfiguring dual VIOS. However, this isn’t another of my scary stories–I just want to tell you how to avoid stirring up your own broadcast network storm.

Start with this example:

mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 2 -attr ha_mode=auto ctl_chan=ent1

When you run this command, make sure that each VIOS is set up to use the same control channel VLAN (ent1 in this case). If not, the two servers will be unable to communicate with one another. And if that happens, each will respond as if the other VIOS is down, and each will attempt to function as the primary server.

From IBM Support:

“A Shared Ethernet Adapter (SEA) can be used to connect a physical network to a virtual Ethernet network. It provides the ability for several client partitions to share one physical adapter. SEA can only be configured on the Virtual I/O Server (VIOS) and requires the POWER Hypervisor and Advanced POWER Virtualization feature. The SEA, hosted on the VIOS, acts as a Layer-2 bridge between the internal and external network.

“One SEA on one VIOS acts as the primary (active) adapter and the second SEA on the second VIOS acts as a backup (standby) adapter. Each SEA must have at least one virtual Ethernet adapter with the ‘Access external network’ flag (previously known as trunk flag) checked. This enables the SEA to provide bridging functionality between the two VIO servers.

“This adapter on both the SEAs has the same PVID, but will have a different priority value. A SEA in ha_mode (Failover mode) might have more than one trunk adapters, in which case all should have the same priority value. The priority value defines which of the two SEAs will be the primary and which will be the backup. The lower the priority value, the higher the priority — e.g. an adapter with priority 1 will have the highest priority. An additional virtual Ethernet adapter, which belongs to a unique VLAN on the system, is used to create the control channel between the SEAs, and must be specified in each SEA when configured in ha_mode. The purpose of this control channel is to communicate between the two SEA adapters to determine when a failover should take place.”

In other words: When setting up VIOS, you must set up a control channel so that the two servers can communicate with one another. You also need to establish one VIOS as the primary server and the other as the backup.
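If you want to confirm which role each SEA has actually taken, entstat on the VIOS reports the failover state and priority. The sketch below greps a hypothetical entstat excerpt — the adapter name ent4 and the sample output are illustrative; on a real VIOS you would pipe `entstat -all ent4` itself into the grep:

```shell
#!/bin/sh
# Hypothetical excerpt of `entstat -all ent4` for an SEA in ha_mode.
# On a real VIOS, replace this function with the entstat command itself.
sample_entstat() {
cat <<'EOF'
Statistics for adapters in the Shared Ethernet Adapter ent4
--------------------------------------------------------------
    High Availability Mode: Auto
    Priority: 1
    State: PRIMARY
EOF
}

# Run the equivalent on each VIOS: exactly one SEA should report PRIMARY
# and the other BACKUP. Two PRIMARYs means the control channel is broken.
sample_entstat | grep -E 'State|Priority|High Availability'
```

Seeing PRIMARY on both VIO servers at once is exactly the condition that precedes a broadcast storm, so this is worth checking before you cable anything into production.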

This document states the consequences of misconfiguring your SEAs:

“In this section, you will create the control channel virtual Ethernet adapters on VIOS1 and VIOS2, which will communicate on VLAN ID 12. It is very important to create this adapter on both VIOS partitions before creating SEA adapters to support failover for the same VLAN. Failing to have proper control channel configuration can result in causing a broadcast storm when both SEA adapters are activated on the same VLAN (VLAN ID 2 in this case).

“First you will create the control channel adapters on each VIOS partition. These control channel adapters are used to determine the health of the SEAs and are required to avoid a broadcast storm (which can result when two trunking virtual adapters are available on the same VLAN).”

In another part of this document, we read:

“Failing to have proper control channel configuration can result in causing a broadcast storm when both SEA adapters are activated on the same VLAN (VLAN ID 2 in this case).”

And again:

“When you run the mkvdev -sea command, it is very important that you specify the ha_mode and ctl_chan attributes. If you fail to do this, creation of the primary adapter on VIOS2 could result in a network broadcast storm.”

And again:

“STOP!!! Before you continue to the next step, ask a lab instructor to determine that you have the correct adapter configuration. Failure to properly configure an SEA failover scenario can result in a broadcast storm that can affect the entire lab network.”

A network guy I know recommends enabling BPDU Guard on our Cisco switches to try to address this issue. This website seems to agree with that assessment:

“As a precaution, you can enable Bridge Protocol Data Unit (BPDU) Guard on the switch ports connected to the physical adapters of the SEA. BPDU Guard detects looped Spanning Tree Protocol BPDU packets and shuts down the port. This helps prevent broadcast storms on the network.”

Maybe some networking gurus out there can let us know whether enabling BPDU Guard is advisable on our VIOS-connected ports.

Even those of us who routinely work with VIOS shouldn’t get cocky, because one wrong move can take out a network. So be careful. The stakes are high.

Scary Tales of IT

Edit: Surely more stories like these have been told in the time since this was published.

Originally posted October 27, 2010 on AIXchange

Halloween’s coming up, and I’m looking for horror stories. No blood and gore, please–just tales from your life as an IT professional.

We all have these stories, things we’ve been through and things we’ve heard about. But even if your story comes from a friend of a friend, I’d still like to hear it. I feel all these experiences are instructive. They remind us to be on our toes around our machines.

For instance, a guy once told me about one of his coworkers replacing a disk in a rack-mounted server. An aluminum rod in the raised floor snapped, and the rack started to fall on him. Thankfully, the others working on the raised floor at the time were able to catch the rack before it crushed him.

I heard about another guy who stumbled entering another raised floor — and in the process he accidentally pressed the big red button that completely cut off power to the computer room. From what I was told, the IT folk did not have a particularly happy day recovering those machines. And I can tell you first-hand that when you enter this room now, you’ll find a large cover over that big red button.

I have many stories about dropped machines. I even know someone who took pictures. In that case, my friend said his customer unboxed a new 595 while it was still on the truck — ignoring his advice, by the way. They then wheeled the machine onto the semi’s liftgate, which was sloped slightly. I think you know where this is going. The 595 rolled down the slope, tumbled off the truck and landed upside-down on the ground. That story at least has a relatively happy ending; once they got the machine to the raised floor, it powered right up. But the cosmetic damage serves as a vivid reminder of what can happen if you uncrate the machine before taking it off the truck.

I’ve personally been in computer rooms where columns and posts blocked the ramp–providing just enough obstruction to make it impossible to wheel large computer equipment onto the raised floor. My back still aches thinking about the time I had to lug a 4-CEC 570 with multiple drawers of disks up a flight of stairs.

Finally, a friend e-mailed me this story: “In the late 80s our company had three System/38 machines, and we ended up buying a company out of Minnesota that had its own System/38. Of course that box had to be shipped to Phoenix after we bought the company.

“The System/38 was about the size and weight of your typical Fiat 500. While it did have wheels and could be rolled off the truck and into our building with relative ease, we could only fit it onto the elevator to our third-floor data center by standing it up on its end.

“There we were, around 11 p.m. one night, taking advantage of the extra people in the data center at shift change. We gently slid it out of the elevator and into the lobby. With eight people standing around the 1,100-pound behemoth, surely it would be no problem to gently set it back down on its wheels and push it the last 50 feet to its new home.

“But as we nudged it back towards its normal horizontal position, it became apparent that not everyone understood exactly how heavy this thing was, and, it kind of got away from us. We were close, but with about 24 inches left to go, some of us lost our grip (or maybe our nerve, thinking about crushed toes and fingers). The machine slammed into the floor, shooting the remains of one wheel across the lobby.

“It turns out a System/38 can be rolled on three wheels, if you really want it to. So we managed to get it into the data center and give it a trailer-park touch, leveling it with a piece of 2×4 we scrounged up in the parking garage.

“We figured we’d better find out as soon as possible how badly the system was damaged, but upon plugging it in and powering up, it came up normally. We never had any problems, other than the access doors that never quite closed completely after that night.”

I just wish I’d been there to see that. No, on second thought, I’ve done enough heavy lifting as is.

These are a few of my horror stories. Now let’s hear yours. Surely you’ve seen something memorable in your career (and hopefully the statute of limitations has expired by now). Share your tale by sending me an e-mail or making a post in the Comments section.

The Tweet Life

Edit: Twitter is still a thing. Some links no longer work.

Originally posted October 18, 2010 on AIXchange

I’ve said it before, but Twitter offers a lot of value to IT professionals. I’m finding more and more useful information and links from the people I follow.

In fact, just recently, I came across all of this information in a single morning:

First, gmon. It can now play back files.

“gmon allows you to graphically monitor several AIX 5.3TL5+, AIX6, Linux LPARs and/or [VIOS] running on POWER5, POWER6 or POWER7 servers — from a PC or laptop running Windows. gmon has a very high refresh rate [1-4 seconds] and is best used as a demonstration tool or as an educational tool to help you learn and ‘see’ how POWER virtualization works in action ‘real time.’ gmon also now has the ability to playback nmon files — up to 8 nmon files can be played back at once. This new version supports both an interactive monitoring mode (using a small agent installed on AIX, [VIOS] or Linux) and a nmon file(s) playback mode.”

Next, the ent line in vmstat and what it means, from the IBM developerWorks forums.

“In the documentation it says ent is only used if running shared processors, but it doesn’t say what it actually is telling you. I know what pc and ec mean in the stats, but what is it telling me here with ent and a link to any documentation explaining it further would be appreciated. …

“You are correct ent=NNN.N in the top line is the entitled CPU capacity of the logical partition (LPAR) and it is only shown if this is a shared CPU LPAR. This number is the guaranteed CPU time available to the LPAR. If the LPAR is uncapped, you can use more than this number (if available). If capped then its the maximum. Nothing can stop the LPAR getting this much CPU time.”
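The numbers tie together arithmetically: in shared-processor vmstat output, ec (percent of entitled capacity consumed) is just pc (physical processors consumed) divided by ent. A tiny sanity check, using made-up sample values rather than real measurements:

```shell
#!/bin/sh
# Sample values, not real measurements: an LPAR entitled to 1.50 processors
# (ent=1.50 in the vmstat header) currently consuming 0.75 physical
# processors (the pc column).
ent=1.50
pc=0.75

# ec as vmstat would report it: pc / ent * 100
awk -v pc="$pc" -v ent="$ent" 'BEGIN { printf "ec = %.1f%%\n", pc / ent * 100 }'
# prints: ec = 50.0%
```

An uncapped LPAR with spare cycles available can push ec above 100 percent, which is one quick way to spot that it is running beyond its entitlement.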

Another topic: IBM Systems Director plugins. Here’s an overview with a download link. I counted 11 different plugins.

Speaking of IBM Systems Director, there’s this tutorial on how to discover systems that use a mirrored (or cloned) image.

“Systems that are cloned (or use a mirrored image) and managed by IBM Systems Director must be correctly configured to ensure their successful discovery. To discover cloned systems, they must be configured in the following ways: All cloned systems must have a unique identifier (UID). Each cloned Common-Agent managed system must have a Tivoli globally unique identifier (GUID). Any cloned system that uses Secure Shell (SSH) must have a unique Secure Shell (SSH) host key.”

Finally, I found another good article by Anthony English. Here, Anthony discusses useful commands that help you locate free disks and logical volumes on the VIOS that are available to be mapped to client LPARs.

“There are three commands:

lspv -free lets you see which disks are not mapped to a vscsi device

lslv -free shows the logical volumes which aren’t mapped

lspv -size shows all disks with their sizes in megabytes.”
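These commands lend themselves to scripting. Below is a sketch that picks the first unmapped disk from hypothetical lspv -free output so it could be handed to mkvdev; the sample output and disk names are made up, and on a real VIOS you would run lspv -free directly instead of the stand-in function:

```shell
#!/bin/sh
# Hypothetical `lspv -free` output from a VIOS; the disk names, PVIDs and
# sizes are illustrative. On a real system, run the command itself.
sample_lspv_free() {
cat <<'EOF'
NAME            PVID                                SIZE(megabytes)
hdisk3          none                                51200
hdisk4          none                                102400
EOF
}

# Skip the header row and take the first unmapped disk; it could then be
# mapped to a client, e.g.: mkvdev -vdev hdisk3 -vadapter vhost0
first_free=$(sample_lspv_free | awk 'NR>1 {print $1; exit}')
echo "first free disk: $first_free"
```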

Again, this was just one morning of Twitter-watching for me.

Finding what you’re looking for on Twitter can be as simple as going to twitter.com and searching on a term like AIX, but plenty of applications are also available to help you navigate this terrain. As noted, I like TweetDeck; with it I’ve set up columns that constantly search for tweets containing AIX or #AIX. Of course by doing this, I’ll occasionally be exposed to other things with the letters a-i-x, like this city in France.

Sounds like my kind of town.

VIOS Updates

Edit: There have been a few updates since I first posted this.

Originally posted October 12, 2010 on AIXchange

I first heard about an updated version of the virtual I/O server (VIOS) during a recent IBM conference call. Now it’s official.

We already use VIOS for sharing disks and networks, active memory sharing and live partition mobility. With these just-announced enhancements, we’ll be able to suspend and resume workloads, do more with virtual networks and take advantage of thin storage provisioning and storage pool sharing capabilities.

Here are some announcement highlights, starting with a new feature called suspend/resume.

As I learned in the conference call, suspend/resume is the process of “freezing” an LPAR and saving the complete system state to disk. Then you can restart the workload exactly where it left off, without data loss. The entire LPAR system state is stored in a set of files and can be resumed on either the same server or a different system after migration. After suspension, the server resources are freed up for use by other workloads.

As you can imagine, this feature can make hardware maintenance much easier, because it allows system administrators to perform system updates or CEC upgrades without the need to shut down and restart applications, and without the need to engage application teams to verify that everything is running properly after the restart.

Where live partition mobility allows us to shift resources between physical machines while applications are still running, with suspend/resume, we’ll be able to move workloads to another machine (though obviously with an interruption of services). We’ll also be able to temporarily suspend low-priority or long-running workloads to allow more urgent processes to access server resources.

For debugging or forensics purposes, IBM states that a workload can be temporarily suspended and a copy made for offline analysis for security or performance purposes. I can’t wait to test out this intriguing feature.

More about suspend/resume from the IBM announcement letter:

“Using Suspend/Resume, clients can provide long-term suspension (greater than 5-10 seconds) of partitions, saving partition state (memory, NVRAM and VSP state) on persistent storage, freeing server resources that were in use by that partition, restoring partition state to server resources, and resuming operation of that partition and its applications either on the same server or on a different server.

“Requirements for Suspend/Resume: All resources must be virtualized prior to suspending a partition. If the partition is to be resumed on a different server, then the shared external I/O (disk and LAN) should remain identical. Suspend/Resume works with AIX and Linux workloads when managed by HMC.”

Here’s what the announcement letter says regarding shared storage pools, VIOS grouping and thin provisioning:

“VIOS 2.2 allows the creation of storage pools that can be accessed by VIOS partitions deployed across multiple Power Systems servers so that an assigned allocation of storage capacity can be efficiently managed and shared. … Multiple VIOS 2.2 partitions can utilize a common shared storage pool to more efficiently utilize limited storage resources and simplify the management and integration of storage subsystems.”

During the conference call the presenters mentioned that this would eliminate the need for vscsi devices or NPIV, but I’ll need to do some hands-on testing to understand the functionality better.

“VIOS 2.2 supports highly efficient storage provisioning, whereby virtualized workloads in VMs can have storage resources from a shared storage pool dynamically added or released as required.”

It sounds like the thin provisioning that we’re used to managing on our storage subsystems can now be managed from our VIO servers. I look forward to testing it out.

“When a new VM is created, the amount of physical storage used is less than the amount defined for the virtual workload, resulting in optimal storage utilization across the shared storage pool. Additional storage is delivered dynamically when workloads expand and released when workloads contract. This automates optimized storage utilization, has a more cost-efficient use of storage resources and integrates multiple storage subsystems.”

The last thing they touched on in the training was the enhancements to the virtual networking.

“The virtualized network switch functionality within the VIOS will include support for SNMP, networking QoS, dynamic VLAN and MAC access control lists (ACLs). There will be more sophisticated controls for monitoring and tuning network traffic between virtualized workloads. There will be control over networking QoS (quality of service) rules for specific LPARs and you can fine-tune the performance of network-sensitive workloads. There will be support for MAC based access ACLs to allow administrators to impose higher levels of protection for specific workloads.”

According to the announcement letter, VIOS 2.2 is set for availability on Oct. 15.

New SSD Modules Offer Greater Efficiency

Edit: I cannot remember the last time I did not run SSD in my laptops. Some links no longer work.

Originally posted October 5, 2010 on AIXchange

I’ve been meaning to touch on one other aspect of the recent Power Systems announcements — that being the new solid-state drive (SSD) disk modules.

The new SSD modules are about the same size as a thick credit card. And, according to slides I’ve seen, compared to the 69GB SSD, the new modules give you a better per GB cost, more dense physical packaging and 50 percent less energy and heat per drive, with comparable performance.

As I’ve noted, filemon (filemon -O hot -A -x "sleep 20" -r fmon -o fmon.out) can help us identify which filesystems and physical and logical volumes should be moved to SSD drives and determine the proper physical locations for these relocated files.

If you’re just starting to gather information about SSD technology, IBM offers some good introductory material. In particular check out the SSD vs. hard-disk drives comparison.

From IBM:

“Also known as Flash technology, solid-state drive technology eliminates the rotational delay of a spinning platter and of waiting for an arm to move to the correct position. Thus, data is available nearly immediately. Dramatically reducing crippling I/O bottlenecks, an SSD provides 33X to 125X more I/O Operations Per Second (IOPS) than a HDD and works at speeds much closer to those of memory, bridging the HDD performance gap. SSDs are also more efficient than HDD. While SSD operates close to 100 percent capacity, HDD is often limited to 20-50 percent storage capacity in an effort to improve responsiveness.”

There’s also a comparison of internal storage solutions and storage area networks (SANs) as well as a rundown of older SSD disk options, including SAS-connected 69 GB drives and the new double-wide PCIe card which can house up to four 177GB SSD eMLC disk modules.

Again, from IBM:

“eMLC technology stands for “Enterprise Multi-Level Cell” Flash memory technology. IBM is the first server vendor to provide this new SSD technology option which blends enterprise class performance and reliability characteristics with the more cost effective characteristics of MLC Flash storage.”

Here’s one more quote from IBM that I agree with wholeheartedly: “Remember, it’s not a question of if solid-state drives will be part of your computer center, but rather, when.”

Finally, revisit this post, which references Nigel Griffiths’ SSD demonstration video.

Technical University a Training Highlight

Edit: Some links no longer work.

Originally posted September 28, 2010 on AIXchange

They say what happens in Vegas stays in Vegas, but that isn’t always the case. For instance, the IBM Power Systems Technical University 2010 is set for Oct. 18-22 in Las Vegas. And if you attend this conference, odds are you’ll bring back a wealth of new knowledge about AIX, Linux and IBM i systems.

As I’ve noted numerous times (here, here and here), the information available at Technical University conferences is invaluable. I consider this the IBM technical training highlight of the year. These events are well worth planning and budgeting for.

Here’s an overview from IBM:

“This university is an intense, consolidated way for attendees to learn how to reduce operating costs, simplify the IT environment, access current and upcoming solution providers and leverage the newest technology innovation — virtualization with the IBM POWER7 technology.

“IBM Power Systems Technical University will offer hundreds of sessions on extensive topics, multiple training levels (beginner to advanced), best practices, solution center/expo and certification testing. Attendees will hear details behind the latest POWER7 announcements offering improved capabilities: New workload optimizing technologies like IBM TurboCore, IBM Active Memory Expansion and Active Memory Sharing to reduce memory costs, IBM PowerVM and VMControl virtualization software to support up to 1,000 virtual machines, Intelligent energy optimization features, such as IBM POWER7 EnergyScale.

“The more than 300 sessions will feature such topics as:

•    What’s new in AIX 7.1 and IBM i 7.1.
•    Deep-dive sessions covering all 2010 POWER7 announcements.
•    NDA future Trends and Directions sessions for Power, AIX and IBM i.
•    Active Memory Sharing (AMS) and Active Memory Expansion (AME). Taking virtualization to the next level.
•    Best practices for POWER Systems including VIOS, I/O and firmware/microcode currency.
•    Tuning SAP and Oracle in AIX Environments.
•    Fibre Channel over Ethernet (FCoE) and converged network adapters (CNAs) for Power Systems servers.
•    Understanding the Processor Virtualization for POWER6 and POWER7.
•    Migrating from IBM i V5R4 to i6.1/7.1.
•    VIO Server Best Practices and Enhancements.
•    Leveraging Live Partition Mobility to move to POWER7.
•    Technical details of the new high-end POWER7 Systems.
•    How to migrate to POWER7 Hardware.
•    Designing your NIM Environments with Reliability.
•    What’s new in PowerHA SystemMirror.
•    Performance, Capacity Planning Enhancements.”

If you can make it happen, I encourage you to attend next month’s Technical University conference. And, for those readers outside the United States, here’s a worldwide events calendar that includes an IBM Technical University conference in France.

Getting Started With AIX 7.1

Edit: Some links no longer work.

Originally posted September 20, 2010 on AIXchange

Well, I was wrong. After arguing in two posts (here and here) that getting physical media from IBM is preferable to downloading AIX images, I am now among the converted. Sort of.

What happened? AIX 7.1 happened. When it was released on Sept. 10, I just had to get my hands on it. That meant I had to download it. Don’t worry; I also ordered my physical media. But I did download a copy so I could work with it right away.

Glad as I was to get started with the newest AIX version, the experience reminded me why I’m generally reluctant to download these files. I couldn’t unzip one file, AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.

Instead, I received this error message:

End-of-central-directory signature not found. Either this file is not a zipfile, or it constitutes one disk of a multi-part archive. In the latter case the central directory and zipfile comment will be found on the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP
or
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.zip,
and cannot find
AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_1_of_2_092010.iso.ZIP.ZIP, period.

Since I was able to unzip the AIX_7.1_Base_Operating_System_TL_7100-00-00_DVD_2_of_2_092010.iso.ZIP (disk 2) file with no trouble, I thought maybe the download didn’t complete the first time. So I tried downloading the whole thing again, with the same results. Interestingly though, once I moved the file to an LPAR running RedHat Linux on the same frame, it unzipped just fine.
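One habit that would have flagged the problem sooner: test the archive's central directory right after downloading, before copying files around or retrying the whole transfer. A minimal sketch, using python3 -m zipfile as a widely available stand-in for `unzip -t`, and a locally built archive in place of the real multi-gigabyte download:

```shell
#!/bin/sh
set -e

# Build a stand-in archive (in real life this is the downloaded .ZIP).
echo "hello" > payload.txt
python3 -m zipfile -c download.iso.zip payload.txt

# Test the central directory; a truncated download fails here, which is
# cheaper than discovering the problem halfway through an install.
python3 -m zipfile -t download.iso.zip && echo "archive OK"
```

Comparing a published checksum against the downloaded file, when the vendor provides one, catches the same class of problem even earlier.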

After completing the download and decompressing the files, I moved the .iso images over to my virtual media repository, booted an LPAR from it and loaded AIX 7.1. I was able to select the edition of AIX I wanted to install, and I was able to navigate through menus and pick different software install options, including my preferred browser and the server packages I wanted to install. It looked like a normal AIX installation.

     1  System Settings:
         Method of Installation…………. New and Complete Overwrite     
         Disk Where You Want to Install…..hdisk0

    2  Primary Language Environment Settings (AFTER Install):
         Cultural Convention…………….English (United States)
         Language ……………………..English (United States)
         Keyboard ……………………..English (United States)
         Keyboard Type………………….Default
    3  Security Model…………………..Default                  
    4  More Options  (Software install options)
    5  Select Edition…………………..express

Install Options

 1.  Graphics Software………………………………………… Yes
 2.  System Management Client Software………………………….. Yes
 3.  Enable System Backups to install any system…………………. Yes
     (Installs all devices)
 4.  Import User Volume Groups…………………………………. Yes

Install More Software

 1. Firefox (Firefox CD)………………………………………. No
 2. Kerberos_5 (Expansion Pack)………………………………… No
 3. Server (Volume 2)…………………………………………. No

Nevertheless, I have a couple of nits to pick: First, why is the Manage Editions menu option so high up on the SMIT main menu?

  Software Installation and Maintenance
  Software License Management
  Manage Editions
  Devices
  System Storage Management (Physical & Logical Storage)
  Security & Users
  Communications Applications and Services
  Workload Partition Administration
  Print Spooling
  Advanced Accounting
  Problem Determination
  Performance & Resource Scheduling
  System Environments
  Processes & Subsystems
  Applications
  Installation Assistant
  Electronic Service Agent
  Using SMIT (information only)

Even computer geeks build muscle memory, and we’ve become used to the devices or system storage manager options being only a couple of down arrow keystrokes away. Now we must unlearn years and years of keyboarding. It’s the new options — especially those that probably won’t change often — that should be buried further down the list.

So AIX 7 was loaded and rebooted. Now we come to my second issue: having to watch this message on my console as I waited to be allowed to log in to the system for the first time:

    This is the first time starting Director Agent. Please wait several minutes for the initial setup…

    Stopping The LWI Nonstop Profile…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…
    Waiting for The LWI Nonstop Profile to exit…

It’d be nice if I could start the System Director Agent when I wanted to, or at least have the option of running it in the background while I install systems.

And of course, ssh wasn’t installed by default, but as always it was easy enough to install it after the fact.

Another thing to keep in mind: if you plan on working with versioned WPARs (where AIX 5.2 can run in a WPAR), those vwpar filesets aren’t loaded by default. Be sure you’ve ordered them separately.

Along with the unveiling of AIX 7.1, new VIOS code was released. Read about Anthony English’s experiences with VIOS. (And once you’ve done so, check out Anthony’s comments about migrating the NIM server to AIX 7.1.)

Finally, please share your experiences with AIX 7 in Comments. While I realize that many of you aren’t yet considering migration, I am curious to know if AIX 7 is in your near-term plans.

AIX and Linux

Edit: I still love AIX. The link to the article no longer works.

Originally posted September 14, 2010 on AIXchange

I’ve been exchanging numerous e-mails regarding this article that’s been making the rounds on Twitter. The premise? Linux is now on a par with AIX.

My response? First, note the source: CIO Weekly. Now, it’s fine that C-level executives are seeing the value of Linux. But to suggest that Linux has achieved parity with AIX? I have a hard time believing that actual AIX and Linux administrators would go that far.

The article quotes Jean Staten Healy, IBM’s director of worldwide Linux strategy.

“From Healy’s perspective, Linux is meeting the needs of many CIOs today. She noted that total cost of ownership is a focus for CIOs, but there are other pressures which Linux can help relieve. She noted that virtualization and server consolidation as well as management simplification are key CIO goals in 2010.”

And the money quote?

“‘Linux is on parity with AIX,’ Healy told InternetNews.com in response to a question about how IBM is positioning AIX against Linux. ‘Linux enables choice. I think that’s one of the basic tenets of the faith.'”

Chris Gibson wrote a great article that articulates many of the points I made in the e-mails. Among other things, Chris discusses smit, LVM, mksysb, NIM, multibos, nimadm, concurrent updates, alt_disk_install, savevg, installp, WPARs and IBM support.

To me, Linux has one huge advantage over AIX — its ease of entry. Obtaining a copy of Linux and getting up and running in a test lab is simple. About all you need is an old x86 machine or the capability to create a virtual machine to host one. Then you can play around with the systems at work or at home and get comfortable.

On the other hand, if management doesn’t approve a sandbox system for the administrators to learn the ins and outs of AIX, it makes things that much more difficult. It’s great to attend classes or read IBM Redbooks, but these are no substitutes for hands-on work with an operating system and hardware.

So many, many more people have used Linux than IBM Power Systems. And that matters when a company’s UNIX team is asked which operating system they should deploy. They’re unlikely to say AIX if they’ve never previously worked with it (or even worse, if their only AIX experience came many years ago on 3.x or 4.x).

But if you’ve used AIX like I have, you can rattle off its many attributes. I like having my logical volume manager running by default. I like making changes to the system, running cfgmgr and having my new hardware and new LUNs show up automagically. I like making dynamic changes to my running system with no need for a reboot. I like that IBM owns the hardware and the operating system, and that its support team will fix the system when I report a problem. I like having enterprise-class hardware and an enterprise-class operating system to run my enterprise. I even like AIX’s security through obscurity advantage. How many script kiddies are attacking my AIX machines as opposed to my Linux machines?
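That cfgmgr-and-new-LUNs workflow is easy to script. Here's a minimal sketch: lspv and cfgmgr are the AIX commands; the temp paths are arbitrary choices of mine, and the comparison helper is plain POSIX so it works anywhere:

```shell
#!/bin/sh
# Hedged sketch: spot the disks that cfgmgr just discovered.
# lspv and cfgmgr are AIX commands; the comparison logic is plain POSIX.

list_new_disks() {
    # Print names present in the "after" list but not the "before" list.
    # Both files must be sorted, as comm(1) requires.
    comm -13 "$1" "$2"
}

if command -v cfgmgr >/dev/null 2>&1; then
    lspv | awk '{print $1}' | sort > /tmp/disks.before
    cfgmgr                                   # scan for newly zoned LUNs
    lspv | awk '{print $1}' | sort > /tmp/disks.after
    list_new_disks /tmp/disks.before /tmp/disks.after
fi
```

On AIX this prints just the hdisk names that appeared during the scan; on anything without cfgmgr the whole branch is skipped.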

Maybe my problem with the article simply stems from the word choice. “Parity” is defined as “the quality or state of being equal or equivalent.” That doesn’t seem accurate to me — even if you believe Linux is as valuable as AIX, they’re very different operating systems. They aren’t the same. But I imagine the perspective does change if you look at this through the eyes of a CIO.

Ask this administrator though, and I’ll always maintain that, unless or until Linux gets the same capabilities, AIX is the superior operating system.

People, Not Resources

Edit: This is still good stuff.

Originally posted September 7, 2010 on AIXchange

Do these statements sound familiar?

“We need to see if we can find a resource for this project.”

“Our storage resource is busy, but our network resource is available.”

“We need to find another resource.”

I hear things like this all the time. And while I recognize that most even moderately sized companies have human resources departments, I really don’t like it when the word “resource” is applied to people.

I am not a resource. I’m a person who happens to have some unique technical skills that might be utilized to help other people get something done. But I’m not a resource. I’m not a machine. I have a name.

I like this Wikipedia entry, especially the last sentence:

“The term human resources can be defined as the skills, energies, talents, abilities and knowledge that are used for the production of goods or the rendering of services. In a project management context, human resources are those employees responsible for executing the activities defined in the project plan. Human resources are considered to be the most important resource in any project.”

That’s my point: The people providing “human resources” are not disposable. They’re not expendable. They’re critical to the organization, and they should be treated that way.

This article outlines 10 ways employers can keep employees happy, including offering flexible work options, communicating openly, recognizing success, explaining the big picture, building trust and, above all, giving employees respect. It made me happy just to read it.

In this world of downsizing and budget cutting, it’s worth remembering that it’s better and less expensive to retain a current employee than it is to recruit a new employee. Over time your people become more skilled, they know how to get things done internally and they know the customers. In IT specifically, they know the systems and have experienced the server issues.

When I was writing this post, I searched on “I am not a resource,” and found, among other things, this.

So I’m not the only person who feels this way. Far from it.

You may think I’m being overly sensitive about employers describing employees as resources. And if you think I am, let me know in Comments. My response, again, comes from Wikipedia:  “A resource is any physical or virtual entity of limited availability that needs to be consumed to obtain a benefit from it.”

Hopefully we don’t need to be consumed to be beneficial.

Plane Talk About Serving Customers

Edit: I have been known to stand up and stretch sooner than I used to. These days I take pictures of receipts instead of using a scanner. Some of the links no longer work.

Originally posted August 31, 2010 on AIXchange

I’m old enough to recall when airline travel was a stand-up comedy staple. Why do comedians talk so much about flying? I imagine the biggest reason is that they’re on the road so much, it’s familiar territory for them.

And, as a consultant, I can relate. In my travels to and from customer sites, I spend considerable time moving through airports, sitting on tarmacs and waiting at car rental counters. And even though I’m no Seinfeld, I have some observations of my own.

For instance, why do airline passengers insist on standing up immediately after a flight? They just jam the aisles and slow down the deplaning. Why can’t they stay seated, and then leave the plane row by row? I understand the need to stretch — we’ve all been sitting for a long time. But give it five more minutes. Once you’re off the plane you can stretch all you want.

Anyone who’s ever flown can probably rattle off a half dozen annoying things about air travel. But at least the part about tracking business expenses and getting reimbursed has become easier for me. That’s because I found a moderately priced portable scanner. Instead of waiting till I get home to scan all my receipts, now I do it the moment I buy something on the road. Really, keeping track of receipts has become a breeze.

Flying on a regular basis, one tends to develop strong preferences regarding particular airlines, hotel chains and car rental agencies. A single experience can turn you into a loyal customer — or a former customer. For me, when it comes to hotels and car rentals, flexibility is the key. I may need to cancel a reservation at the last minute, so I need to know that I can make that happen.

The whole flying experience reminds me once again how critical a role employee attitudes play in business. And that’s something we should all keep in mind. A friend displays in his cubicle a list of the 11 commandments of good customer service. You can easily replace the word “customer” with “user” or whoever it is you work for:

1. Customers are the most important part in any business.
2. Customers are not dependent upon us, we are dependent upon them.
3. Customers are not an interruption of our work, they are the purpose of it.
4. Customers do us a favor when they call, we are not doing them a favor by serving them.
5. Customers are not cold statistics, they are flesh and blood human beings with feelings and emotions like our own.
6. Customers are part of our business, not outsiders.
7. Customers are not there to argue or match wits with.
8. Customers are people who bring us their wants; it is our job to fill those wants.
9. Customers are deserving of the most courteous and attentive treatment we can give them.
10. Customers are the people that make it possible to pay your salary, whatever your role might be in the company.
11. Customers are the life-blood of this and every other business.

On an unrelated note, here are a couple of useful links. First, a presentation on POWER7 blades. It’s a 30MB file, so be patient when downloading.

The whole site, which I’d written about previously, is worthy of your investigation. There’s a lot of good training material here.

Hot Spares and Other Tips and Tricks

Edit: Some links no longer work.

Originally posted August 24, 2010 on AIXchange

I love getting tips and tricks, and hopefully you love it when I share them. For instance, recently while perusing a mailing list, I learned of a simpler way to look up IBM employee contact information from a smartphone. At least for me, this just seems to render better on my phone than the more familiar URL (whois.ibm.com) that many of us have bookmarked on our browsers.

Thanks to the same mailing list, I was reminded of something else: You can still create a physical hot spare disk in a volume group. This capability has been available through the AIX logical volume manager since AIX 5.1, but of course, with the advent of SANs and shared storage, we’re far less reliant on internal and direct-attached disks these days. But even though we don’t need hot spares the way we once did, it’s good to know that this option remains available.

In the days of SSA drawers I used hot spares all the time. Knowing the hot spare would immediately take over when a disk failed was, to put it mildly, reassuring. Then I’d just place a quick service call, and the CE would come replace my old disk or IBM would ship me a disk and I’d replace it myself.

Here is a detailed definition of hot spares.

I’ll also highlight these steps for enabling hot spare support. Although this document uses websm rather than smit, the concepts are still the same. To select your volume group, go to smit lvm > Volume Groups > Set Characteristics of a Volume Group > Change a Volume Group. Then in the smit panel, change Set Hotspare Characteristics to y.

“Beginning with AIX 5.1, you can designate hot spare disks for a volume group to ensure the availability of your system if a disk or disks start to fail. Hot spare disk concepts and policies are described in AIX 5L Version 5.2 System Management Concepts: Operating System and Devices. The following procedures to enable hot spare disk support depend on whether you are designating hot spare disks to use with an existing volume group or enabling support while creating a new volume group.

Enable Hot Spare Disk Support for an Existing Volume Group
The following steps use Web-based System Manager to enable hot spare disk support for an existing volume group.
1.    Start Web-based System Manager (if not already running) by typing wsm on the command line.
2.    Select the Volumes container.
3.    Select the Volume Groups container.
4.    Select the name of your target volume group, and choose Properties from the Selected menu.
5.    Select the Hot Spare Disk Support tab and check the box beside Enable hot spare disk support.
6.    Select the Physical Volumes tab to add available physical volumes to the Volume Group as hot spare disks.

“At this point, your mirrored volume group has one or more disks designated as spares. If your system detects a failing disk, depending on the options you selected, the data on the failing disk can be migrated to a spare disk without interruption to use or availability.

Enable Hot Spare Disk Support while Creating a New Volume Group
The following steps use Web-based System Manager to enable hot spare disk support while you are creating a new volume group.
1.    Start Web-based System Manager (if not already running) by typing wsm on the command line.
2.    Select the Volumes container.
3.    Select the Volume Groups container.
4.    From the Volumes menu, select New > Volume Group (Advanced Method). The subsequent panels let you choose physical volumes and their sizes, enable hot spare disk support, select unused physical volumes to assign as hot spares, then set the migration characteristics for your hot spare disk or your hot spare disk pool.

“At this point, your system recognizes a new mirrored volume group with one or more disks designated as spares. If your system detects a failing disk, depending on the options you selected, the data on the failing disk can be migrated to a spare disk without interruption to use or availability.”
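The websm procedure quoted above also has a command-line equivalent. Here's a hedged sketch — the volume group (datavg) and disk (hdisk4) are placeholder names of mine, and the chvg/chpv flags should be verified against your AIX level's man pages:

```shell
#!/bin/sh
# Hedged sketch: enable hot spare support on an existing mirrored volume group.
# extendvg, chpv and chvg are the AIX LVM commands; datavg and hdisk4 are
# placeholders.

run() {
    # Echo each command before attempting it, so this doubles as a dry run
    # on systems without the AIX LVM tools.
    echo "+ $*"
    if command -v "$1" >/dev/null 2>&1; then "$@"; fi
}

run extendvg datavg hdisk4    # the spare must first belong to the volume group
run chpv -h y hdisk4          # designate the disk as a hot spare
run chvg -h y -s y datavg     # -h y: migrate to a spare when a disk fails;
                              # -s y: keep stale partitions synchronized
```

Because each command is echoed first, you can paste the block on any box to see exactly what would run before trying it on a real volume group.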

Here’s an update to last week’s blog post about the new POWER7 servers. Here are the supported operating systems: AIX 7.1, AIX 6.1 TL6, AIX V5.3 TL12 SP1 or later, IBM i 7.1, IBM i 6.1 with 6.1.1 MC or later, VIOS 2.2 or later and HMC V7R720 or later.

While this will be true at GA, on 9/30 we should see support for AIX V5.3 TL10 SP5 or later and AIX V5.3 TL11 SP5 or later.

The New POWER7 Servers

Edit: How many of you still have machines you need to upgrade to AIX 7?

Originally posted August 16, 2010 on AIXchange

Following on previous releases of POWER7 servers (the 750, 770 and 780 models) and blades, IBM today announced five new POWER7 servers: the 710, 720, 730, 740 and 795 models. The 710 and 730 are 2U servers; the 720 and 740 are 4U servers. The 795 is the high-end replacement for the Model 595.

Here’s a quick overview of the new servers, all of which come with a standard three-year warranty:

Power 710: This 2U single-socket server comes with 4, 6 or 8 cores. It can have a maximum of 64 GB of memory with four low-profile PCIe slots. It runs on 100-240 VAC power.

Power 720: This 4U single-socket server comes with 4, 6 or 8 cores. It can have a maximum of 128 GB of memory with four PCIe cards plus four low-profile PCIe cards. It also runs on 100-240 VAC power.

Power 730: This 2U 2-socket server comes with 8, 12 or 16 cores. It can have a maximum of 128 GB of memory with four low-profile PCIe cards. It runs on 200-240 VAC power.

Power 740: This 4U 2-socket server comes with 4, 8, 12 or 16 cores. It can have a maximum of 256 GB of memory with four PCIe cards plus four low-profile PCIe cards. It also runs on 200-240 VAC power.

For a comparison, the 750 is a 4U 4-socket server with 6, 8, 12, 16, 18, 24 or 32 cores, with up to 512 GB of memory and three PCIe cards and two PCI-X cards.

Power 795: This machine can have 24 to 256 cores running at 3.7, 4.0 or 4.25 GHz. Like the 780, the 795 supports TurboCore mode, where half of the cores in a socket are turned off to allow the remaining “enabled” cores to use the shared cache. While TurboCore mode can be deactivated via the ASMI, remember that the entire system is either in TurboCore or MaxCore mode — you can’t mix and match.

These machines can have 8 TB of DDR3 memory when using 32 GB DIMMs, with an aggregate memory bandwidth of 4 TB per second.

Here are the supported operating systems: AIX 7.1, AIX 6.1 TL6, AIX V5.3 TL12 SP1 or later, IBM i 7.1, IBM i 6.1 with 6.1.1 MC or later, VIOS 2.2 or later and HMC V7R720 or later.

While this will be true at GA, on 9/30 we should see support for AIX V5.3 TL10 SP5 or later and AIX V5.3 TL11 SP5 or later.

According to IBM, customers who upgrade from a 64-core 5 GHz POWER6 595 to a 64-core 4.25 GHz POWER7 795 can obtain 40 percent greater performance while using 35 percent less energy.

I also found this interesting statement in the IBM materials I received:

“rPerf (Relative Performance) is an estimate of commercial processing performance relative to other IBM UNIX systems. rPerf reflects a single image AIX/Linux workload and is derived from an IBM analytical model which uses characteristics from IBM internal workloads such as TPC, SPEC and other benchmarks. Most Power 795 systems will be used to consolidate multiple workloads leveraging multiple PowerVM partitions of various sizes. Starting with the introduction of the Power 795, a new rPerf estimate will be added that represents multiple partitions of smaller sizes. Single image rPerf estimates will continue to be provided up to a maximum of 64 cores.”

I think this reflects the reality that most of us carve our servers into multiple LPARs rather than run a giant 256-core 8TB single image of AIX on a 795. (Although, I must admit, it would be fun to be the admin on that one.)

Another thing IBM notes is that a 64-core 795 would use 61 percent less power than a 64-core 595.

Finally, I saw how mirrored hypervisor memory will be available to add built-in redundancy:

“(Mirrored hypervisor memory) eliminates system outages due to uncorrectable errors in memory by maintaining two identical copies of the system hypervisor in memory at all times. Both copies are simultaneously updated with any changes, and in the event of a memory failure on the primary copy, the secondary copy will be automatically invoked and a notification sent to IBM via the Electronic Service Agent (ESA).”

In addition to the new hardware, IBM also officially unveiled AIX 7. Here are some key points from that announcement, some of which have been covered previously. (See my earlier AIX 7 post, with accompanying links to Nigel Griffiths and Ken Milberg, here.)

AIX 7 will allow vertical scalability for massive workloads with up to 256 cores/1,024 threads in a single AIX partition. AIX 7 will run AIX 5.2 in a WPAR to simplify consolidation of legacy environments on POWER7. I already know of customers who are excited about taking their old applications that are bound to AIX 5.2 and upgrading them onto POWER7/AIX7 WPARs.

AIX 7 will have built-in clustering to simplify configuration and management of scale-out workloads and high availability solutions. Its profile-based configuration management will ease the management of pools of AIX systems.

AIX 7 is binary compatible with AIX 6 and AIX 5. Current applications will continue to run; there is no need to recompile applications to work with AIX 7. AIX 7 fully exploits POWER7 processor-based systems, but can also run on systems based on POWER4, POWER5 or POWER6 processors.

Customers can upgrade directly to AIX 7 from AIX 6 and AIX V5; it’s a free upgrade for customers with Software Maintenance Agreements (SWMA).

AIX will have solid-state disk (SSD)-only volume groups, and there are enhancements to the filemon tool to help identify good SSD candidates. This will help you determine which filesystems to put on your more expensive SSD drives.
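For readers who haven't used filemon, a basic capture looks something like the sketch below. The `-O lf` logical-file trace is the relevant mode here; the 60-second window and report path are arbitrary choices, and the ranking helper is my own illustration for a "reads writes filename" listing, not part of filemon itself:

```shell
#!/bin/sh
# Hedged sketch: trace file activity with filemon, then rank the busiest files.
# filemon and trcstop are AIX tools; the awk ranking helper is portable.

if command -v filemon >/dev/null 2>&1; then
    filemon -o /tmp/fmon.out -O lf    # -O lf: trace at the logical file level
    sleep 60                          # let a representative workload run
    trcstop                           # stop tracing; filemon writes its report
fi

top_files() {
    # Rank a "reads writes filename" listing by total operations, busiest first.
    awk '{print $1 + $2, $0}' "$1" | sort -rn | head -5 | cut -d' ' -f2-
}
```

Files near the top of that ranking, if they live on their own filesystems, are the natural candidates to move onto SSD.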

AIX is available in three different editions:

AIX Standard Edition: Suitable for most UNIX workloads, with vertical scalability up to 256 cores using AIX 7 (or 64 cores using AIX 6).

AIX Enterprise Edition: Simply, this consists of AIX plus enterprise management features. This edition includes AIX Standard Edition plus Systems Director Enterprise Edition and the Workload Partitions Manager for AIX. Vertical scalability up to 256 cores using AIX 7 (64 cores using AIX 6).

AIX Express Edition: This lower priced edition is targeted toward customers with low-end servers or who are looking to consolidate smaller workloads on larger servers. This edition includes most of the functionality of AIX Standard Edition, but vertical scalability is limited to 4 cores and 8GB of memory per core in a single partition. Customers can use multiple AIX Express Edition partitions in a single larger server.

Keep in mind that customers can run any combination of AIX Standard, Express and Enterprise edition on the same server — for example, you could use AIX Standard for a big database instance and AIX Express for 4-core application server instances.

Take the time to look at the updated facts and features documents. This will allow you to determine which POWER7 servers make the most sense in your environment. Also start thinking about when you should upgrade to AIX 7.

Readers Respond

Edit: Some links no longer work.

Originally posted August 10, 2010 on AIXchange

Recently I questioned why so many people choose to download .iso images rather than order a set from IBM. Some of you were kind enough to offer your thoughts.

Being able to download these images from IBM is nothing new (although the capability to download one DVD image as opposed to multiple CD images is a welcome new twist). Back when I first wrote about this, we weren’t yet able to take advantage of virtual optical media. It makes me laugh to go back and read about the gyrations I once went through to use these disk images for anything other than a source file that I would then need to burn to physical media. The method I described in that post didn’t even allow me to boot from the images to load the OS; the images could only be used to load the AIX code into a NIM server using the bffcreate command.

In the article I mentioned that I had to download the CDs, then I noted that:

“On Linux, I can simply run:
    mount -o loop -t iso9660 filename.iso /mnt/iso

“This mounts my CD image on my filesystem. On AIX, mounting an .iso image is a little more involved. First I created my logical volume, in this case:
    /usr/sbin/mklv -y’testlv’ datavg 6

“Then I ran the dd command in order to write the contents of the .iso file to my logical volume:
    dd if=/aixcd1.iso of=/dev/testlv bs=10M

“Then I mounted my .iso image with:
    mount -v cdrfs -o ro /dev/testlv /mnt/iso

“At this point the CD was mounted, and I could run smitty bffcreate.”

Of course these days, with virtual optical media, .iso images can be copied into a virtual media library and loaded and unloaded without the need to create logical volumes and run dd commands.
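On a VIOS at 2.x, that amounts to a handful of padmin commands. Here's a hedged sketch — the repository size, image name and vtopt0 device are assumptions (the device would come from an earlier `mkvdev -fbo -vadapter vhostN`), so check the names against your own VIO server:

```shell
#!/bin/sh
# Hedged sketch: load an .iso through the VIOS virtual media library instead
# of the mklv/dd gyrations above. mkrep, mkvopt, loadopt and unloadopt are
# VIOS padmin commands; the paths and names here are placeholders.

is_iso() {
    # Small guard: only import files that look like ISO images.
    case "$1" in *.iso) return 0 ;; *) return 1 ;; esac
}

IMG=/home/padmin/aix_install.iso

if is_iso "$IMG" && command -v mkvopt >/dev/null 2>&1; then
    mkrep -sp rootvg -size 10G                          # one-time: create the repository
    mkvopt -name "$(basename "$IMG")" -file "$IMG" -ro  # import the image, read-only
    loadopt -disk "$(basename "$IMG")" -vtd vtopt0      # "insert" the virtual DVD
    # ...the client LPAR can now boot from or mount the image...
    unloadopt -vtd vtopt0                               # "eject" it when finished
fi
```

The load/unload pair is the whole point: swapping media becomes two commands, with no logical volumes or dd runs in sight.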

Another thing that simplifies this process now is the addition of the loopmount command in AIX. Anthony English explains:

     “You can now mount ISO images directly onto an AIX LPAR using the loopmount command. This was introduced in AIX 6.1 TL 4 (use oslevel -s to check your current level). The man page for loopmount provides this example:
    loopmount -i cdrom.iso -o “-V cdrfs -o ro” -m /mnt

So, with all this said, I can certainly understand why people choose to download the .iso images; for some, in fact, it may be their only option. As one reader told me: “I prefer the downloads. It seems easier to me to mount an .iso via virtual means. Though we keep hard copies at the DC just for the reasons you mention.”

Not everyone disagreed with me though. Here’s another comment: “Call me old school also, but I too like for IBM to send me the base media, comes in real handy for booting to maintenance mode for an outage recovery. I can always download/burn .iso (images), but if a server is down, every minute counts….”

Here’s a final comment I liked: “I wish I could download .iso images, because I have more often access to a high bandwidth Internet connection than to a physical media drive. (I work on some systems that are a several miles away from me.)” Without physical access to a machine, an .iso image in a virtualized machine is certainly the way to go. But even in that situation, I’d still want a copy of the media from IBM as well.

Yes, I understand that we can boot from .iso images as if an actual DVD were loaded in a virtual drive. We can also burn our own copies from the images if we want to. With physical media, we can still load our system even if the environment lacks a NIM server or we don’t have a VIO server running on the server in question. Each method has its pros and cons. As always, we need to know what tools we have at our disposal, and then use the most appropriate one for the task at hand.

On an unrelated note, the Central Region Virtual users group hosted another great session — this one covers NIM. Check out the replays (here and here) and download the materials (here and here).

IBM Gets Rolling with Loaner Hardware

Edit: The links no longer work. I guess these days we would just try out workloads in the cloud.

Originally posted August 3, 2010 on AIXchange

Are you a current IBM customer who’s planning on upgrading to POWER6 or POWER7, but would like to try out the machines before buying them? Or maybe you use other operating systems, but want to evaluate IBM hardware running AIX? Or maybe you’ve been reading about the latest virtualization techniques, but don’t have current hardware to test them on?

If you face any of these scenarios, help may be available. Of course, your business partner may have access to machines that you could run some test workloads on. You may be able to work with an IBM Innovation Center to test the hardware.

Or, you could look into the POWER on Wheels program. From IBM:

“Power on Wheels is a revolutionary addition to the Power Loaner Program designed to help quickly determine if Power is right for your server consolidation efforts by providing your client with direct, hands-on access to the newest Power technology in a simple-to-use package that requires little to no previous AIX or Power skills. Power on Wheels is delivered to a client location in a self-contained shipping box. When the box arrives, the client wheels it onto their floor, opens the doors, plugs the box into an electrical outlet and within minutes, the client starts stepping through the graphical user interface to power up and starts running the demo software application.”

Power on Wheels is a POWER7 technology-based server and software demo combination that can be used to demonstrate virtualization, CPU sharing, multiple operating system support, server consolidation, power savings and more. Participants receive a loaner plug-and-play shipping box, which IBM ships to the customer location for three weeks. The solution also features several pre-packaged software solution demos, but customers can add their own applications to test the hardware as well.

Once the shipping container arrives, you would need to provide power and (if desired) network connectivity. Once it’s plugged in on the raised floor, you’d fire up the physical machines and start running the LPARs, monitoring, applications, etc.

To get nominated for the program, contact your IBM Field Technical Sales Specialist (FTSS) or IBM Business Partner. Currently Power on Wheels is available only in North America, but availability is expected soon for European Union members, and a worldwide rollout is being planned.

Right now, IBM is building the fleet of machines that will serve this program. Currently, there are six shipping boxes, two POWER7 and four POWER6 systems. Another six systems are expected to be deployed, and IBM anticipates mid-August availability for the Power on Wheels V2 stack. And around that time, the POWER6 systems should be upgraded to POWER7.

For more on the Power on Wheels program, check out this text and these videos.