How Much is Too Much Downtime?

Edit: The link has changed and the whitepaper was revised in 2011, but you can still read about it, although I do not think it applies anymore unless you are running ancient code on older hardware.

Originally posted September 30, 2008 on AIXchange

How often do you hear someone say they’re happy running their applications on Linux on x86 hardware? They don’t want to hear about Power systems–in their minds, Power is “too expensive.”

I always wonder how much is too much when you’re running your core business applications on these commodity servers. Really, it comes down to how much system downtime you can afford in your environment.

How quickly do you want to be able to call support, diagnose a problem, dispatch a CE and have a repair made? Better yet, what if your machine detects problems and “heals itself,” calling home to IBM so the service reps can let you know that it needs service?

If downtime doesn’t translate into lost dollars for your business, then maybe you can afford to take a commodity hardware approach. Some people are just fine deploying server farms consisting of commodity hardware. If they lose one machine, it’s no big deal, because the others that are still running continue to provide service.

The server farm approach has its downsides–the overall power consumption, rack space, infrastructure cabling issues, etc. One thing to consider when making these decisions involves reliability, availability and serviceability (RAS), a topic covered in this great whitepaper.

From IBM:

“In IBM’s view, servers must be designed to avoid both planned and unplanned outages, and to maintain a focus on application uptime. From a reliability, availability and serviceability (RAS) standpoint, servers in the IBM Power Systems family include features designed to increase availability and to support new levels of virtualization, building upon the leading-edge RAS features delivered in the IBM
[System p and System i] servers. This paper gives an in-depth view of how IBM creates highly available servers for business-critical applications.”

Many issues are covered here, including dynamic processor sparing, processor recovery, hot node add (add a drawer to a running system) and protecting memory.

More from the whitepaper:

“The overriding design goal for all IBM Power Systems is simply stated: Employ an architecture-based design strategy to devise and build IBM servers that can avoid unplanned application outages. In the unlikely event that a hardware fault should occur, the system must analyze, isolate and identify the failing component so that repairs can be effected (either dynamically, through “self-healing” or via standard service practices) as quickly as possible – with little or no system interruption. This should be accomplished regardless of the system size or partitioning.”

How much downtime you can afford is something that each company must determine for itself. The question revolves around the total cost of ownership. What do you need your machine to do to support your business? What kind of performance are you looking for? What kind of reliability are you looking for? Ultimately, this will tell you the amount of downtime you can tolerate.

IBM Technical University Worth Planning For

Edit: I remember seeing John McCain that night, and I still find value in attending IBM Technical University.

Originally posted September 23, 2008 on AIXchange

If you happened to be in the lobby of the Chicago Hilton on the evening of Sept. 8, you might have seen Secret Service agents posted at all the doors. Then, if you looked closely, you would have seen John McCain entering the ballroom for a fundraising dinner. A few nights later, in that very same ballroom, you could have seen Jake and Elwood with a Blues Brothers tribute band providing entertainment.

Readers might have been in the Chicago Hilton those nights since this was the venue where the IBM Power Systems Technical University featuring AIX and Linux conference was held. According to the attendees that I spoke to, and from my perspective, this was another worthwhile event.

After the consolidation of AIX and IBM i onto the same hardware platform earlier this year, the AIX conference was held simultaneously with the IBM Power Systems Technical University featuring IBM i. This was a great opportunity for IBM i focused people to learn more about AIX, and vice versa. I was able to attend some IBM i sessions, and many of the messages are the same on both sides of the house–we all use the same hardware, the same HMC and many of the same procedures to get things done.

Each time slot offered around 20 different classes (taking both conferences into account), covering basic to advanced topics. Many of them repeated, so if you had a conflict, you could usually find a convenient time to attend the class of your choice.

One topic I enjoyed learning more about was virtualizing IBM i partitions. IBM i can act like a VIO server and host AIX and Linux guest partitions, but, as of IBM i 6.1, it can also host another IBM i partition. IBM i can manage the disk for the guest partitions, or you can use VIO to host an IBM i partition. The IBM i administrators at the conference seemed to express some concern about the new interface and command line options that need to be learned when setting up VIO partitions, but, once they get a chance to try it, I’m sure it will all start to make sense.

Put next year’s Technical University on your calendar now. This way, when it comes time to request education, you’ll already have one event that you know you won’t want to miss.

IBM Unveils AIX Enterprise Edition

Edit: The first link no longer works, the second link still does. AIX Enterprise Edition is still a thing, although what is included has changed over the years.

Originally posted September 16, 2008 on AIXchange

During last week’s IBM Power Systems Technical University in Chicago, IBM announced AIX Enterprise Edition. Take a few moments to look into this solution. I think you’ll be glad you did.

From IBM:

“The AIX Enterprise Edition is a new IBM offering that includes AIX 6 and several key manageability products. AIX Enterprise Edition consists of the AIX 6 operating system, the PowerVM Workload Partitions Manager for AIX (WPAR Manager), and three Tivoli products: Tivoli Application Dependency Discovery Manager (TADDM), IBM Tivoli Monitoring, and the IBM Usage and Accounting Manager Virtualization Edition for Power Systems. This offering delivers significant manageability capabilities beyond the capabilities of the standard AIX 6.1 product (AIX Standard Edition).”

In the past, if I wanted to control and relocate workload partitions with WPAR Manager, I had to buy a separate product. Now it’s bundled in Enterprise Edition. This is the product we need to take advantage of application mobility (the capability to move WPARs from one system running AIX 6.1 to another system running AIX 6.1).
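If you have not touched WPARs yet, the basic lifecycle is driven from the global AIX 6.1 environment with a handful of commands. This is only a minimal sketch with a made-up WPAR name (network settings may need additional flags), and the actual relocation between systems is orchestrated through WPAR Manager rather than these base commands:

# create a system WPAR named testwpar (name is arbitrary)
mkwpar -n testwpar

# start it and check its state from the global environment
startwpar testwpar
lswpar

# log in to the running WPAR
clogin testwpar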

Tivoli Application Dependency Discovery Manager (TADDM) is designed to “discover system and application data center resources.” I haven’t had a chance to try it yet, but it sounds like it makes it easier to visualize what’s going on in the computer room by telling me which applications are running on which virtual and physical machines. TADDM also locates changes in the data center, which can help make troubleshooting easier.

AIX Enterprise Edition also includes IBM Tivoli Monitoring (ITM)–this allows us to monitor physical and virtual resources and, if need be, look at historical data. In addition, a usage and accounting manager (UAM) reports computer resource usage by a department or an organization. This can be handy if multiple departments want to be charged for their actual computing utilization.

I expect that purchasing the Enterprise Edition, as opposed to buying each package as a stand-alone product, would be a money-saver for your organization.

So read through the announcement, and contact your sales organization for more information if you think this would make sense in your environment.

The Value of Being ‘Well-Red’

Edit: This redbook looks like it was last updated in 2013, so some of the entries have probably been changed by now. The advice about reading Redbooks and keeping informed still applies. It also speaks to the value of downloading material so that you have different versions available to you.

Originally posted September 5, 2008 on AIXchange

Do you read IBM Redbooks? Some people tell me they consult Redbooks when they’re seeking specific information, but that they’re too busy to read them from start to finish. (And since Redbooks are typically hundreds of pages in length, they aren’t generally quick reads.)

My response is that, if you look hard enough, you can find extra time for learning. For instance, do you commute on a train? That could be time spent reading. Do you watch a bunch of mindless sitcoms? Kill your television and study. Maybe there’s some other time-waster you can remove from your life? Replace that with something worthwhile. Don’t just punch the clock, or wait for a team lead or senior person to spoon-feed you what you should know.

To me, the Redbook, “PowerVM Virtualization on IBM System p: Managing and Monitoring,” published in June, contains tons of worthwhile information.

For starters:

  • Setting up a firewall on your VIO server (pages 56-58)
  • Setting up NTP on your VIO server (page 73)
  • Setting up additional users besides padmin on your VIO server (page 77)
  • Creating vio backups (page 82)
  • Backing up IVM profile data using the VIO command line with
    bkprofdata -o backup -f /home/padmin/profile.bak (page 84)
  • Different ways to use the backupios command: backupios -file /mnt/backup -mksysb (page 87); see the sketch after this list
  • Backing up other disk structures with savevgstruct (page 93)
  • Sending error logs to another server (page 127)
  • Using Linux on Power tools to dynamically move memory on a running machine (page 156)
  • This statement (page 170): “There are no architected maximum distances between systems for PowerVM Live Partition Mobility. The maximum distance is dictated by the network and storage configuration used by the systems, and by the fact that both systems must be managed by the same HMC. Provided both systems are on the same network, are connected to the same shared storage, and are managed by the same HMC, PowerVM Live Partition Mobility will work. Standard long-range network and storage performance considerations apply.” That’s interesting, even if it might not yet be practical.
  • The script that creates a simple report (vios_report.ksh) of all the disks on a given VIO server (page 187)
  • The tip about connecting to https://hostname:5336/ibm/console on your AIX 6 machines (page 208). You’ll get the IBM Systems Director Console for AIX. Take some time to try this out.
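To make the backup-related items a little more concrete, here is a rough sketch of those VIO server commands strung together from the padmin shell. The target paths and the datavg volume group name are placeholders, and /mnt is assumed to be an NFS mount with enough free space:

# back up the IVM partition profile data (page 84)
bkprofdata -o backup -f /home/padmin/profile.bak

# back up the VIO server itself to an NFS mount, creating a mksysb image (page 87)
backupios -file /mnt/backup -mksysb

# save the structure of a user volume group such as datavg (page 93)
savevgstruct datavg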

And this is just one Redbook. Take the time to read Redbooks, and any other documentation that’s applicable to your job. I can promise you’ll find information that you can use in your shop.

Lesson Learned about Citrix on Linux

Edit: This is mainly here for historical purposes; I would be surprised if anyone would still find this useful.

Originally posted August 26, 2008 on AIXchange 

At a recent training session we had to connect to a Citrix server to access the machines used for the class. I didn’t have any issues, but the student next to me couldn’t get a Citrix client working on his laptop. He tried uninstalling/reinstalling and rebooting, among other things, but couldn’t connect using his Windows laptop. In the end he had to borrow another machine to work on the labs.

That made me think: If I were in his boat, what would I do?

I like vnc. Maybe I could use a VPN to access a Linux machine in my lab, fire up vncserver, run Firefox inside of the vnc session and run a Citrix client that way?
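In case it helps anyone picturing that setup, it would look roughly like this; the display number, geometry and the linuxbox hostname are all arbitrary choices:

# on the Linux machine in the lab, start a VNC server on display :1
vncserver :1 -geometry 1280x1024 -depth 24

# from the laptop (over the VPN), connect to that display
vncviewer linuxbox:1

# inside the VNC session, launch the browser
firefox &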

Then I realized that the Linux partition I have up and ready to go runs Linux on Power, and on top of the Linux on Power installation is a version of Lx86, which I discussed in another AIXchange blog entry.

So I give it a try. I point my Firefox browser in my vnc session to the Web site to log on. The login page displays a link to download a client. It recognizes that a Citrix client isn’t installed, and offers me a choice of clients to download. I choose the Linux client and end up downloading a package called linuxx86.tar.

When I untar the file, I run:

./setupwfc

This command gives me this output:

./setupwfc
Citrix ICA Client 9.0 setup.

Select a setup option:

1. Install Citrix ICA Client 9.0
2. Remove Citrix ICA Client 9.0
3. Quit Citrix ICA Client 9.0 setup

Enter option number 1-3 [1]:

Option 1 is the default. I choose this option, and receive this output:

Please enter the directory in which Citrix ICA Client is to be installed
[default /usr/lib/ICAClient]
or type “quit” to abandon the installation:

I choose the default directory, and press “Enter”.

Then I’m prompted to accept a license agreement, which I do by selecting option 1.

Select an option:

1. I accept
2. I do not accept

Enter option number 1-2 [2]

Option 2 is the default here.

After the installation completes, I quit the client setup. Then I change directories and create a link to the newly installed file so Firefox can find the plug-in:

cd /usr/lib/firefox-1.5.0.10/plugins

ln -s /usr/lib/ICAClient/npica.so npica.so

After I restart Firefox and return to the login page, I receive this error in Firefox:

“You have chosen not to trust Equifax Secure Certificate Authority, the issuer of the server’s security certificate.”

I poke around on Google and find this answer, which points me here and here.

I download the files from each link and rename them with a .crt rather than a .cer extension. Then I copy those files into the /usr/lib/ICAClient/keystore/cacerts directory and run chmod o+r on them to give Firefox permission to read them.
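For the record, the certificate fix amounts to something like this; the downloaded filename is just a placeholder, since the actual certificates will vary:

# rename a downloaded certificate from .cer to .crt
mv Equifax_Secure_CA.cer Equifax_Secure_CA.crt

# copy it into the ICA client keystore and make it world-readable
cp Equifax_Secure_CA.crt /usr/lib/ICAClient/keystore/cacerts/
chmod o+r /usr/lib/ICAClient/keystore/cacerts/Equifax_Secure_CA.crt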

After restarting Firefox and logging into the Web site, I’m able to login and do the labs. I end up with Firefox running a Citrix client on top of my Lx86 (x86) installation, which is running on my Linux on Power partition (ppc64).

I was proud of my accomplishments, but others in the room were less impressed. However, this solution paid off when we had a network outage later in the day. While everyone else had to log back into their sessions, I just reconnected to my vnc session and picked right up where I left off. And the performance of the Citrix client running in Lx86 world was pretty good.

I was able to convince myself that if my browser wouldn’t work on my Windows machine, I could still connect to the IBM machines using my Citrix-on-Linux solution–and in this case, one running in an LPAR on POWER6 hardware as well.

Doing More With Less

Edit: Although we are working with POWER9 instead of POWER6, these questions and discussions are still ongoing today.

Originally posted August 19, 2008 on AIXchange

A vendor wants to charge you a “per CPU license” fee. What number are vendors looking for when they want to count CPUs in your machine to calculate what you owe? Are they looking for the number of physical sockets on your machine, or maybe the number of processor cores? Are they looking for the total number of processors installed, or do they only count the number of CPUs that are actually activated on your machine?

With the advent of Micro-partitioning, there’s another thing to consider. Are they looking at your HMC to learn your minimums and maximums for each LPAR? Are they looking at the number of virtual processors that your operating system sees? Are they looking for the “number” of processors that you consume when borrowing CPU cycles from other, less busy LPARs during busy times?

More vendors are realizing that we’re less likely to dedicate a whole CPU to an LPAR. Depending on the workload, dedicating a CPU to a “less busy” LPAR when another LPAR could make use of those “idle” cycles can be a waste of resources.

What is a processor? When your HMC reports that 4 CPUs are available on your machine, what does that mean? Do you have four actual chips in your machine?

On my JS22, for instance, the HMC tells me I have four processors. There are two chips on the blade; each is dual core. Is my vendor interested in the number of physical sockets on the blade? Is it interested in my processor class? My POWER6 system will do more work than my POWER5 system did, so I can configure fewer CPUs for my LPAR to do the same amount of work. Will my vendor then charge me less?
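When I want to see what the operating system itself reports, these are the sorts of commands I reach for; which of these numbers your vendor actually cares about is the question:

# processors the LPAR can see
lsdev -Cc processor

# entitled capacity, virtual processors, SMT mode and more
lparstat -i

# processor type and clock speed from the system configuration
prtconf | grep -i proc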

As we become more virtualized, this topic will continue to be revisited, but it’s another thing to think about as we justify upgrading our hardware. We may well be able to do more with less, and lower our software bills in the process.

IBM Installation Toolkit for Linux Does More Than Just Install Linux

Edit: Changed the first link, the second link, the third link and the fourth link; the text no longer reads as it did in 2008. I am not sure how applicable this tool is going to be anyway, but you never know what someone may find useful in the future, so I am keeping it here.

Originally posted August 12, 2008 on AIXchange

Don’t be taken in by the title–the IBM Installation Toolkit for Linux isn’t just for installing Linux. From IBM: “The IBM Installation Toolkit for Linux provides a set of tools that simplifies the installation of Linux on IBM Power Systems. The toolkit also provides IBM value-added software that you can install, so that you can take advantage of Power Systems capabilities, such as Dynamic Logical Partitioning (DLPAR). The toolkit also supports Web-based updates, providing immediate access to the latest offerings.

“The Toolkit can also be used as a rescue bootable DVD to run diagnostic tools and repair previously installed operating systems. It also provides a wealth of IBM documentation for configuring and managing Power systems.

“The Toolkit is available as a single ISO image that you can download from this website. This image can be used to create a bootable DVD or to create a network installation server, which makes multiple and parallel Linux installations over the network possible.”

While I plan on using it to install Linux, I first wanted to check out the Toolkit’s other features. First, I downloaded the .iso image. As always, I prefer virtual optical instead of physical media. So I went to my virtual I/O (VIO) server and ran:

mkrep -sp datavg -size 16G

Then I ran:

oem_setup_env
cd /var/vio/VMLibrary/
scp source_machine:/path/to/iso.image ./

This copied the .iso image to my VIO server so that I could assign it as a virtual optical device.

I did a DLPAR operation on my HMC to add a virtual SCSI adapter to my VIO server. Then I ran cfgdev on my VIO server so that I could see the adapter.

And then I ran:

mkvdev -fbo -vadapter vhost3
loadopt -vtd vtopt0 -disk IBM_Installation_Toolkit.iso

lsmap showed me:

vhost3

VTD                   vtopt0
Status                Available
LUN                   0x8100000000000000
Backing device        /var/vio/VMLibrary/IBM_Installation_Toolkit.iso
Physloc

After assigning the virtual SCSI adapter to my client LPAR, I was then able to boot from this CD image.
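A few related commands are handy once the image is loaded; this is just a sketch, using the same device names as above:

# list the images in the virtual media repository
lsrep

# show which virtual optical devices have media loaded
lsvopt

# when finished, unload the image from the virtual optical device
unloadopt -vtd vtopt0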

Once I booted the machine, I got a root prompt. Then I entered:

WelcomeCenter

After accepting the license, I was presented with these options:

Install Linux
Utilities
Help

I went into the Utilities and saw:

Configure network
Eject Media
Reboot System
System Diagnostics
Firmware Update

The System Diagnostics option includes:

System Properties
System Inventory
Error Log
Service Configuration
Boot Configuration

These look like useful tools, and if I boot from this CD image I don’t need to have an OS installed to run the utilities. Once I’ve had more time to explore them, I’ll report back with more findings.

More from IBM:

“The server consolidation tool provided by the Toolkit tackles the most time-consuming and error-prone aspect of server consolidation: the migration of OS stack and user and application data. With the Toolkit, the administrator can quickly put a new server into production. The tool targets the migration and customization of LAMP stack (Linux – Apache – mySQL – Perl, Python, and PHP) and data from
X86 servers running RHEL 4, RHEL5, SLES9 and SLES10 to Power Systems.

“The administrator has complete control over the installation and migration process. He chooses the level of the OS and whether additional RPMs should be installed as well as whether to migrate user accounts and data. So whether you intend to migrate one or more servers, the new server consolidation tool is sure to save you a lot of time.

“The IBM Installation Toolkit is intended for customers who want to:

  • Install and configure Linux on a non-virtualized Power System.
  • Install and configure Linux on machines with previously configured Logical Partitions (virtualized machines).
  • Install IBM RAS Tools along with Linux or on a previously installed system.
  • Upgrade firmware level on Power Systems.
  • Perform diagnostics or maintenance operations on previously installed systems.
  • Improve application performance using the latest Power Systems optimizations available in the Advance Toolchain.
  • Migrate LAMP stack from X86 RHEL and SLES servers to Power Systems.
  • Browse and search Linux documentation included on the Toolkit ISO. “

What is Crush in AIX, Anyway?

Edit: This link sheds a little more light on the crush command. I still think this intro holds up.

Originally posted August 5, 2008 on AIXchange

Are administrators violent by nature? Given our terminology, I sometimes wonder.

Consider: When a machine fails, it crashes. If a user has a process that doesn’t behave, we kill it. When we want to manipulate data, we use cut–or in perl, we use chop. We can even use finger if it’s appropriate. I guess that’s not as bad as “the” finger.

Of course, we do have our soft side. Nice, cat, and sleep are other commands we use. So maybe it just depends on our mood.

AIX also has a command called crush. Have you used it? If you have bos.perf.tools installed, you can find it in:

/usr/lib/perf/crush

When I googled /usr/lib/perf/crush, it returned one page, in Chinese. A Google translation indicated that it was a listing of files that are contained in the bos.perf.tools fileset.

I asked around, but I couldn’t find any other information. So I got on a test box and I ran crush. The machine returned this output:

Please supply an integer number of pages.

So I did. I ran /usr/lib/perf/crush 1. Nothing exciting seemed to happen.

Then someone told me that it was an undocumented command. Another guy told me it’s undocumented for a reason. A third guy told me that all crush does is allocate a bunch of memory. Then it goes through and touches each page in memory and cleans up the cruft that accumulates in the machine while it’s been running.

What you will find, when you give it a large enough integer number of pages, is that your free list will grow, and it will page some of your memory out to paging space.

In my case, I ran:

svmon -G
               size      inuse       free        pin    virtual
memory       958464     957272       1192     216590     520248
pg space     786432     245608

               work       pers       clnt
pin          216590          0          0
in use       331024          0     626248

I took the memory size (958464), subtracted my pinned memory (216590) and tried the result (roughly 740000):

/usr/lib/perf/crush 740000
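If you would rather let the shell do that arithmetic, something like this works against the svmon output shown above; it simply subtracts the pinned pages (column 5 of the memory line) from the total size (column 2) and hands the result to crush:

# compute (memory size - pinned pages) and pass it to crush
PAGES=$(svmon -G | awk '$1 == "memory" {print $2 - $5}')
/usr/lib/perf/crush $PAGES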

When I reran svmon, I saw:

svmon -G
               size      inuse       free        pin    virtual
memory       958464     248584     709880     216532     520256
pg space     786432     271874

               work       pers       clnt
pin          216532          0          0
in use       246062          0       2522

My free list had gone from 1192 to 709880. After a few hours, I saw:

svmon -G
               size      inuse       free        pin    virtual
memory       958464     548408     410056     216596     517442
pg space     786432     265683

               work       pers       clnt
pin          216596          0          0
in use       289660          0     258748

With this post, at least the next person who searches for /usr/lib/perf/crush will have a little more to go on. Hopefully whoever wrote crush will see this and leave a comment so we can better understand when it’s appropriate to run the tool on a production machine, and what it’s meant for in the first place.

Staying Current on AIX Takes Effort

Edit: The infocenter link is a blast from the past. I edited the user group link. I also edited the link to the irc and usenet article, although there is much more there as well.

Originally posted July 29, 2008 on AIXchange

Writing this blog is interesting. I hear from people with many different backgrounds, and from many different places. Some readers are brand new to AIX (having recently come from other UNIX flavors), while others have been around AIX from the start. Some are part of very large enterprises with multiple locations, hundreds of machines and teams of people. Others work in small businesses that might have one or two critical servers.

While there are commonalities to every administrator’s job–you need to know how to patch, upgrade, and manage the machines from day to day–a lot of what you do depends on the type of organization you work for.

In large organizations powered by enterprise-class machines, IT personnel may be specialized and devoted to specific areas. They want to read about topics that cover things like networking, storage, best practices or disaster recovery. In a smaller shop, fewer people handle multiple roles. In fact, the Windows admin and the AIX admin may well be the same person. For these AIX professionals, the interest in areas like networking and SANs may be even greater, since they’re the ones supporting it all.

Security should be a focus in all organizations. Of course, it’s harder to be confident that your machines are secure and set up properly when they’re the first and only AIX machines that you’ve ever seen. Things that seasoned administrators take for granted may not be done according to best practices in a smaller shop with less skilled personnel.

I like sending new administrators to the Information Center, but there’s a difference between reading about things and doing them over and over in a production environment.

Another way you can get help is to get involved with or start a user group in your area.

For newer admins and the guys in the smaller shops, user groups can provide great opportunities to get information and advice from more seasoned professionals. Most people I know are willing to help out someone who’s looking for help, especially when the person asking the question has already put some effort into finding the answer.

Two other good resources are Usenet and IRC.

Here’s a final piece of advice: Someone once told me, turn off the TV and use that time to study. Even if you only do it a couple times a week, you’ll be amazed at what you can get done. Nobody knows everything. Sometimes doing the same things the same way over a period of years makes you reluctant to learn new things. Staying current requires effort. Regardless of your environment, you can know as much as you want, depending on what you’re willing to put into it. Put in the effort, and you’ll quickly gain the necessary skills to do the job.

The Case for Trace

Edit: I do not know that anyone would argue about the overhead, but you never know. I would probably call Earl Jew and Nigel Griffiths and let them hash it out. I did not edit the links; I will leave it as an exercise for the reader to google for more information if desired.

Originally posted July 22, 2008 on AIXchange

During a class I recently attended, an AIX systems internals guy argued that nmon and topas add (minimal) overhead to any machine that you might need to analyze. He also said that the tools’ granularity is such that they could miss some things. The intervals these tools use are measured in seconds, while events on the machine occur at the millisecond level.

His recommendation: If you really want to analyze a machine, use trace, and then use curt or trcrpt to analyze the information trace generates. Events on your machine are being collected all the time. Using trace just logs the information that’s already being collected. While the argument could then be made that logging this information creates some overhead, I think we’re just being pedantic at that point.

I use trace all the time. To get started, run this command:

trace -a -o /tmp/trc.out

Make sure you let the trace run for a reasonable period of time (hopefully long enough for the behavior that you’re trying to detect to present itself).

To end trace, run trcstop. Then you’re ready to run curt:

curt -i /tmp/trc.out | more

With curt, it obviously takes some knowledge to make sense of the information that’s generated. If you need help, IBM is an option here–I’ve seen IBM support use trace when helping customers with their performance problems.

Information on curt is available from the Information Center.

This Information Center link introduces trcrpt, which you may also be familiar with. This tool analyzes captured trace data.

To look at the trace information collected with trcrpt, run:

trcrpt /tmp/trc.out

curt provides a clean summary report of all the information in the trace output, but if I’m looking at something specific, I may use trcrpt.
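Putting those pieces together, a minimal capture-and-analyze pass looks something like this; the 60-second window and the output file names are arbitrary choices:

# start tracing all hooks to a file
trace -a -o /tmp/trc.out

# let the workload run long enough to show the behavior
sleep 60

# stop the trace
trcstop

# summarize CPU usage from the trace
curt -i /tmp/trc.out > /tmp/curt.out

# or generate the raw event report
trcrpt /tmp/trc.out > /tmp/trcrpt.out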

I won’t tell you which tool you should use, but getting down to the trace level can be a good next step when you’re working on an issue.

The Value of Business Partners

Edit: There is still value to be found with your IBM Business Partner. I updated the link at the end to take you to a Business Partner search tool.

Originally posted July 15, 2008 on AIXchange

Something needs to change on your raised floor. Maybe you need to implement a SAN, MetroMirror or FlashCopy. Maybe you need to virtualize and consolidate, or maybe you need to look at blades. Maybe you’re not sure what you need.

Yours was once a small shop, but you’ve been continually growing. The business is making more acquisitions and you’re struggling to keep up. Nobody seems to know where these servers came from over the last few years, but the raised floor is now full of critical production stand-alone machines. They all have internal disks and their own tape drives. Space is an issue. Heat is an issue. Maintenance and backups have become painful. How do you get out of this mess? What’s the rest of the industry doing? What are best practices? Where do you go for advice?

Business partners can be a lifesaver when you’re making these decisions. They know what works and what doesn’t, because they know what other shops are implementing. Best of all, creating lasting customer relationships is their priority. After all, what good is selling new equipment to a customer that has no idea how to install and maintain it? That would only lead to frustration, and complaints about the machines not working as advertised. Then, when it came time for the next upgrade, that customer would likely look elsewhere.

Good business partners do more than sell. They keep you abreast of IT’s constant changes. They take you to briefings, and bring in people to educate your staff. The customer/business partner relationship really is meant to be a partnership. Both sides should work together. Your business partner should bring you new ideas and solutions to help you address real business needs. These should not be cookie-cutter, one-size-fits-all solutions, but solutions that are geared toward your organization’s unique needs and individual skill sets. You should be confident that the IT equipment you purchase from your business partner will work as designed, allowing you to concentrate on your own customers and business challenges rather than dwell on IT issues.

Business partners don’t want to be vendors. Although they might make some money on a single transaction, they focus on the long-term — at least that’s what they should be doing. If you’re finding this isn’t the case with your business partner, maybe it’s time to sever the relationship.

If your company is looking for a business partner, IBM provides this search capability.

Training Fears Unfounded

Edit: I still recommend having lab equipment and not learning on the job with production systems. I also still think that training employees is the way to go, and I still advocate for attending IBM Technical University. The last link no longer works.

Originally posted July 8, 2008 on AIXchange

Even though we all work hard at our jobs, we also want to continue learning and growing to keep pace with trends and new technology. But staying current can be challenging. Particularly if you spend your time periodically setting up machines that run for the most part with no fuss and no muss, your skills can erode.

Sure it helps to read articles and documentation, but there’s nothing like a lab or test machines for actually learning how new technology works. You definitely don’t want to use your production machine to test new things, not unless you like outages and restores.

When employers are seeking people with new skills, they often turn to contractors. Their skills are continually kept up to date through training and hands-on experience, and by working at multiple customer sites, they get exposed to different kinds of systems.

I concede that some contractors are better than others. You must be careful when you’re trying out someone untested. A contractor not only needs the necessary skill set, but the ability to fit into an organization. A genius contractor who cannot communicate or work well with others is of little use to an organization.

However, good contractors can be invaluable. I’ve seen contractors who are onsite so often that they’re mistaken for regular employees. In other situations, where they’re only needed for specific projects that require specialized expertise, good contractors will work side by side helping in-house staffers get up to speed, and when the project is complete, they’ll leave behind documentation and knowledge. While organizations can benefit from bringing in contractors, they shouldn’t dismiss the more traditional way of bringing in new IT skills: training.

I certainly recommend that regular employees receive as much hands-on training as possible.

Unfortunately, training and education budgets are unpredictable, and many organizations offer less training than they once did. Besides budgetary issues, there’s a trust issue. Some organizations fear that educating employees equates to padding their resumes, and the expenditure inevitably leads to people taking their new skills to a different company. My response is that if your organization trains people and provides avenues for growth, it’s a good place to be, and there’s no reason employees would want to look elsewhere. If people are leaving, it’s not because of training, but for another reason that must be identified and addressed.

Speaking of training, IBM Power Systems Technical University featuring AIX and Linux (formerly IBM System p, AIX and Linux Technical University, which I covered in a previous AIXchange blog entry) is set for Sept. 8-12 in Chicago.

Make plans to attend this year’s conference, and you’ll be well-positioned for the future.

The Starting Point for AIX Tools

Edit: Had to update the link to the toolbox, otherwise it is still relevant. I also added a link to an article from June 2004 that talked about vnc and screen.

Originally posted June 24, 2008 on AIXchange

AIX newcomers will often ask where they can find useful tools for their machines. Though they can certainly download source code and compile the tools themselves, many times they’re just looking for something that’s precompiled and ready to run. In those cases, I point them to the AIX Toolbox for Linux Applications.

While I like building tools from source, especially when newer versions are available compared to what’s on the Toolbox, oftentimes the Toolbox is good enough.

Once the files (which are also available on CD) are copied to the target machine, they can be installed using the rpm command. I usually run rpm -ivh against the downloaded package file. In some cases, prerequisite filesets must also be loaded from the Toolbox, but rpm is pretty good about telling you which files must also be installed. To see which files are already loaded on your machine, enter this command:

rpm -qa | more
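As a quick example, installing and verifying one of the Toolbox packages goes something like this; the package filename is only illustrative:

# install a downloaded Toolbox package (filename will vary)
rpm -ivh wget-*.aix5.1.ppc.rpm

# confirm it is registered in the rpm database
rpm -qa | grep wget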

Which tools do I like? At the top of my list are vnc and screen. I wrote about both in an article in IBM Systems Magazine.

Using these tools I can disconnect (either on purpose or due to a network or computer error) and then reconnect later from a different location.

I also like wget, rsync, lsof, and expect.

The Toolbox has many tools to choose from, so load them onto a test box and try them out. I also encourage you to recommend your favorite AIX tools in comments.

Getting to Know SVC

Edit: SVC is still here, and the links still work.

Originally posted June 17, 2008 on AIXchange

For many system administrators, SAN management is like an unsolvable mystery. The fibre cable is plugged into the server’s host bus adapter, and then somehow, like magic, a LUN appears. Others who more frequently interact with their disk vendors are involved with disk management on an ongoing basis. Although I’m usually more OS-centric, I understand that disk subsystems hugely impact system performance and data availability. I know that the world’s fastest processor does me no good if my disk subsystem isn’t sized appropriately.

When disk subsystems get refreshed, questions crop up. What’s the method for migrating from an EMC disk subsystem to a new IBM disk solution? Should you back up your data from one storage unit and restore it to another using tape? Maybe you should export a LUN from the old and the new disk units and then mirror them using LVM in AIX. What if you want your production data to be stored in a new disk subsystem, but you also want a flash backup copy of it so that data is available on older, slower disk?

If you’re unfamiliar with IBM’s SAN Volume Controller (SVC), get to know this product. IBM’s recent announcement of SVC Version 4.3 is a good starting point.

According to the announcement, SVC allows you to:

  • Combine storage capacity from multiple vendors for centralized management.
  • Increase storage utilization by providing more flexible access to storage assets.
  • Improve administrator productivity by enabling management of pooled storage from a single interface.
  • Insulate host applications from changes to the physical storage infrastructure.
  • Enable a tiered storage environment to match the cost of storage to the value of data.
  • Apply common network-based copy services across storage systems from multiple vendors.
  • Support data migration among storage systems without interruption to applications.

More from IBM:

“System Storage SAN Volume Controller Software in version 4.3.0 further extends its dynamic and high availability storage management capabilities with the introduction of space-efficient VDisks and VDisk mirroring functions. Space-efficient VDisks add the capability to define virtual disk capacity that is separate from the physical disk capacity, and use only the physical disk capacity required to store the data.

“VDisk mirroring offers a significant improvement for high availability SVC configurations by providing the capability to have a VDisk supported by two sets of physical managed disks (MDisks) in different managed disk groups on different storage controllers.

“SVC copy services are further enhanced by allowing FlashCopy to be used with the new space-efficient VDisks to yield a space-efficient FlashCopy capability, which combines with the support for up to 256 FlashCopy targets to enable more frequent FlashCopy while improving physical disk usage.”

For a basic introduction to the SVC, there’s always Wikipedia:

“SVC uses an in-band architecture which means that data flowing between a host and a storage controller flows through an SVC node. On the front end, SVC presents an interface to a host which looks like a storage controller (like a target). On the SVC’s back end, it provides interface to a storage controller that looks like a host (like an initiator).

“SVC holds the current Storage Performance Council (SPC) world record for SPC-1 performance benchmarks, returning over 272K iops (release 4.2.0). There is no faster storage subsystem benchmarked by SPC. The SPC-2 benchmark also returns a world leading measurement over 7GB/s throughput.”

Many of you might be thinking, “I’m a system admin, not a storage guy.” But outside of large specialized environments, the system admin and the storage guy are often the same person–or at least they work closely together to keep the organization’s data accessible and available.

So, chances are, making disk administration easier and data more highly available in your environment is part of your job. Check into the SVC and see if it can help you.

Once More: How Much is Your Data Worth?

Edit: Another oldie but a goodie, backups are still relevant, although the available tools make it even easier to set it and forget it.

Originally posted June 10, 2008 on AIXchange

Recently I covered the topic of server backups. Though this post doesn’t pertain directly to your back-end server environment, I still think the topic needs more exploration.

How much is your data worth?

Pause for a minute and really think about it. How much is the data on your laptop worth after you suffer a disk crash and you can no longer get to it? Do you have VPN connection information, documentation, configurations, procedures, etc.? On personal machines, do you have pictures, letters, scripts and/or financial information that you might want to keep?  Do you have any information stored on your machines that would need to be recreated, or would it be lost forever? Certainly you have applications loaded and configured, and your desktop is set up the way you like it.

After a theft, hardware failure or human error has occurred is the wrong time to ask yourself about your personal data backup strategy.

Many organizations offer space in a SAN environment and tell you to keep files in that shared space as it is being properly backed up on the back end. This is great when you have network connectivity, but when you’re disconnected from the network, it might not be as useful. Although cellular data and wifi coverage is good, it’s not everywhere yet, and the bandwidth to move some of these files might be an issue.

There are several methods to look into, from taking images of your hard drive with software that you can use to restore onto replacement hardware to writing scripts with cygwin and rsync. For critical data, use some of your USB thumb drives and copy information there. Look into automated tools to handle these chores if you don’t trust yourself to remember to handle it manually. Have cron send you a reminder message at appropriate intervals to be sure your data is protected.
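As one example of the scripted approach, an rsync copy plus a cron entry covers a lot of ground; the paths and schedule here are placeholders:

# copy a documents directory to an external or network drive
rsync -av --delete $HOME/documents/ /mnt/backupdrive/documents/

# crontab entry: run the same copy every Sunday at 2 a.m. (cron will mail the output)
0 2 * * 0 rsync -av --delete $HOME/documents/ /mnt/backupdrive/documents/ 2>&1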

So how much is your data worth? If your answer is nothing, then you have nothing to worry about. But if the thought of your machine no longer booting makes your heart skip a beat, now’s the time to take action.

When Maintaining Your IT Environment, Little Things are Worth the Effort

Edit: Another post that still rings true today.

Originally posted June 3, 2008 on AIXchange

For a lot of us in North America, a chunk of our springtime is devoted to yardwork. Winters can be harsh, and yards and gardens need care. So we remove clutter, trim plants and pull weeds. And now, as we near summer, we–make that our kids–are mowing the lawn regularly.

While yardwork is a seasonal chore, when it comes to maintaining your IT environment, there’s always work that needs doing. So are all of your patches and microcode up to date? Do you have old users that can be removed? Can you reclaim disk space by removing old file systems that are no longer in use?

Are the tools and scripts in place to automatically document your machines? Is useful performance information being automatically collected? Are these reports being sent to a back-up machine so that they can still be analyzed if your source machine is no longer responding? Are your machines and cables labeled correctly? Is your documentation up to date?

These items may not seem as urgent as a run-away job or problem tickets that must be dealt with, but you need to plan for them. Because if you don’t, your IT environment will grow out of control like a weed-infested lawn, and you’ll eventually find yourself with a mess. Constantly caring for your machines may seem like more work, but the attention to detail ultimately makes your job easier.

VIOS Update Now Available From IBM

Edit: You had better not still be running this code anywhere.

Originally posted May 27, 2008 on AIXchange

A VIOS update is now available. “Fix Pack 11.1 provides a migration path for existing Virtual I/O Server (VIOS) installations. Applying this package will upgrade the VIOS to the latest level, V1.5.2.1.”

Read about all of the enhancements on the IBM Web site.

Be sure to note:

“In order to take full advantage of all the function available in the VIOS (including SEA bandwidth apportioning.), it is necessary to be at system firmware level SF235 or later. SF230_120 is the minimum level of SF230 firmware supported by the Virtual I/O Server. If a system firmware update is necessary, it is recommended that the firmware be updated before upgrading the VIOS to V1.5.2.1.”

Here’s a sampling of what you will find on the Web site. Many more items are listed there that you can read about by selecting the above link.

• Added support for Bandwidth apportioning for Shared Ethernet Adapter.
• Added support for a Shared Ethernet Adapter accounting tool (new CLI command “seastat”).
• Added virtual switch support to Partition Mobility commands.
• Added CLI command for new SEA command “seastat”
• Improved topas reporting for SEA, EtherChannel and VLAN.
• Added new VLAN attribute called vlan_priority to VLAN pseudo device.
• Added LoopBack Device Support.

Another important thing to keep in mind according to the IBM Web site:

“If you are updating from an ioslevel prior to 1.3.0.1, the updateios command may indicate several failures (i.e. missing requisites) during fix pack installation. These messages are expected. Proceed with the update if you are prompted to “Continue with the installation [y/n]”.

Here’s the list of failures that I saw in my /home/padmin/install.log file when I ran the update:

SELECTED FILESETS: The following is a list of filesets that you asked to install. They cannot be installed until all of their requisite filesets are also installed. See subsequent lists for details of requisites.

    bos.ecc_client.rte 5.3.8.0                    # Electronic Customer Care Run…
    bos.esagent 6.5.8.0                            # Electronic Service Agent
    bos.sysmgt.nim.spot 5.3.8.0               # Network Install Manager – SPOT
    bos.sysmgt.trcgui_samp 5.3.0.30        # Trace Report GUI
    ifor_ls.msg.en_US.java.gui 5.3.7.0     # LUM Java GUI Messages – U.S….
    rsct.msg.EN_US.basic.rte 2.4.0.0        # RSCT Basic Msgs – U.S. Engli…
    rsct.msg.en_US.basic.rte 2.4.0.0        # RSCT Basic Msgs – U.S. English

MISSING REQUISITES: The following filesets are required by one or more of the selected filesets listed above. They are not currently installed and could not be found on the installation media.

    bos.sysmgt.nim.spot 5.3.0.0               # Base Level Fileset
    bos.sysmgt.trcgui_samp 5.3.0.0          # Base Level Fileset
    ifor_ls.java.gui 5.3.0.0                       # Base Level Fileset
    lwi.runtime 5.3.8.0                             # Base Level Fileset
    rsct.basic.rte 2.4.0.0                          # Base Level Fileset

From the IBM documentation:

VIOS V1.5.2 provides several key enhancements in the area of POWER Virtualization.

  • VIOS network bandwidth apportioning. The bandwidth apportioning feature for the Shared Ethernet Adapter (SEA), allows the VIOS to give a higher priority to some types of packets. In accordance with the IEEE 802.1q specification, VIOS administrators can instruct the SEA to inspect bridged VLAN-tagged traffic for the VLAN priority field in the VLAN header. The 3-bit VLAN priority field allows each individual packet to be prioritized with a value from 0 to 7 to distinguish more important traffic from less important traffic. More important traffic is sent faster and uses more VIOS bandwidth than less important traffic.
  • Virtual I/O Server Command Line Interface (CLI) was enhanced to support Image Management commands. The CLI is in a unique position to have access to virtual disks and their contents. Users will be able to make a copy of virtual disks and install virtual disks using the Image Management command, cpbd. This command will allow other programs to create and copy virtual disk images.
  • The Virtual I/O Server runs on Internet Protocol version 6 (IPv6) networks, therefore users can configure IPv6 type IP addresses.
  • The updates to the Systems Planning and Deployment tool include updates to ensure none of the existing VIOS mappings are changed during the deployment step.

For me it was a smooth update. I downloaded the code to my machine (it was a pretty big download; you might also think about ordering the updates on CD depending on your network connection), ran updateios -commit and then updateios -accept -install -dev /mnt. (Since I was doing this over an nfs connection, I had mounted the remote filesystem that I needed to /mnt.)
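For anyone following along, the sequence from the padmin shell was roughly this; the NFS server name and export path are placeholders:

# mount the NFS export that holds the downloaded fix pack
mount nfsserver:/export/viosupdate /mnt

# commit any previously applied updates, then apply the fix pack
updateios -commit
updateios -accept -install -dev /mnt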

When it was done, I rebooted the VIOS and saw this output:

$ ioslevel
1.5.2.1-FP-11.1

I’ll keep an eye on it and notify you of any issues.

Can You Restore? Now’s the Time to Find Out

Edit: Still true today, have you tested your backup lately?

Originally posted May 20, 2008 on AIXchange

A buddy recently told me about a situation he encountered where a non-disruptive disk update on a storage area network proved extremely disruptive. The client lost its LUNs, which impacted all of the LPARs that were booting from SAN, along with many machines that had datavg stored out in the SAN.  Rootvg on these partitions was gone.

Still, the client figured it could restore its environment from its most recent backups. However, this was a test environment. There were no backups. The machines were built and used, with no thought ever given to recovering them. Since this wasn’t production, it wasn’t important enough to add jobs to the backup server to accommodate these machines, right?

This was another painful lesson learned. The machines had to be rebuilt. The scripts, users, tools and code that were being developed were lost.

I, for one, have covered this topic repeatedly, because it’s important. Yet some continue to ignore the message. I still find environments that don’t have consistent, recent backups. Yes, IBM hardware is robust, and it’s known for reliability and availability. But even the best hardware is still man-made, and machines break. And even the most experienced administrators log on and make mistakes. Stuff happens.

To this, some respond: “I have nothing to worry about in my environment. I take my weekly mksysb. I send the tapes offsite. I take my nightly backup. It goes over the network to my backup server, which we then offload to tape, which we then take offsite.”

But do you test those backups? Are you sure the tapes can be read? I’m not even talking about a true disaster-recovery scenario, where your building burned down and you’re trying to rebuild. I’m talking about simply trying to restore data to machines in your current environment.

When you really need to restore isn’t the time to find out that you can’t restore. It’s critical that you take the time to confirm this now, when things are still running smoothly.

Ask yourself what will be lost if you must go back to your last backup. Did you take that backup last night? If your outage happened right before closing time, is your business prepared to rollback to last night’s backup and redo the work that was done during the day?

More and more the answer is no: Outages of any kind and any length cannot be tolerated.

Constantly look at your environments. If you add file systems to machines, are the new file systems being backed up? Is the frequency of the backups acceptable to the business? Does management agree with your conclusion?

The objective of a backup is to be able to restore. Be sure that you can.

Another Great AIX Script

Edit: I love revisiting these scripts, and I wonder if anyone still runs them.

Originally posted May 13, 2008 on AIXchange

I recently saw another great script from the mailing list, written by Dean Rowswell. To get it working on my machine, I loaded these rpms from the AIX Toolbox CD:

tcl-8.4.7-3.aix5.1.ppc.rpm
tk-8.4.7-3.aix5.1.ppc.rpm
expect-5.42.1-3.aix5.1.ppc.rpm

Then I followed the instructions in the script’s introductory comments (seen in the script listed below) and set up the ssh keys to allow automatic login from my AIX machine to my VIO server. After editing the list of VIO servers in the script to match what I had in my environment, I was able to use this script to send out the same command to multiple VIO servers at the same time.

Here are some examples from the e-mail I saw.

Syntax:

root@coenim:/:# dshvio -?
Valid parameters are:
-r for a root command
-n for a list of VIO servers
-n vios1
-n vios1,vios2

A regular VIOS command:

root@coenim:/:# dshvio ioslevel
=========================
VIO server --> coevios1
=========================
1.5.1.1-FP-10.1
=========================
VIO server --> coevios2
=========================
1.5.1.1-FP-10.1

An example of running a command through oem_setup_env automatically:

dshvio -r fget_config -Av
=========================
VIO server --> coevios1
=========================

---dar0---

User array name = 'COE-DS4700'
dac0 ACTIVE dac1 ACTIVE

Disk DAC LUN Logical Drive
hdisk2 dac1 0 coenim_nim
hdisk3 dac1 1 coesap01_disk1

=========================
VIO server --> coevios2
=========================

---dar0---

User array name = 'COE-DS4700'
dac0 ACTIVE dac1 ACTIVE

Disk DAC LUN Logical Drive
hdisk2 dac1 0 coenim_nim
hdisk3 dac1 1 coesap01_disk1

And here’s a command that I ran in my environment:

./dshvio.ksh -r "lscfg | grep disk"
=========================
VIO server --> vios1
=========================

vios1:/home/padmin # oem_setup_env
+ hdisk0           U787F.001.DPM1XRD-P1-T10-L3-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
+ hdisk1           U787F.001.DPM1XRD-P1-T10-L4-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
+ hdisk2           U787F.001.DPM1XRD-P1-T10-L5-L0   16 Bit LVD SCSI
Disk Drive (73400 MB)
=========================
VIO server --> vios3
=========================

vios3:/home/padmin # oem_setup_env
* hdisk0           U787F.001.DPM1XRD-P1-C1-T1-W5005076801103022-L0
        MPIO Other FC SCSI Disk Drive
* hdisk1
U787F.001.DPM1XRD-P1-C6-T1-W5005076801202FFF-L1000000000000  MPIO
Other FC SCSI Disk Drive
* hdisk2           U787F.001.DPM1XRD-P1-C6-T1-W5005076801202FFF-
MPIO Other FC SCSI Disk Drive

This means that I can run commands across all of my VIO servers from my AIX machine, in either the padmin or oem_setup_env environment when I specify the -r option. I welcome any changes or improvements that readers might suggest, and I hope that Dean continues to share these useful tools with us.

#!/bin/ksh
# Created by Dean Rowswell, IBM, April 26, 2006
# Modified by Dean Rowswell, IBM, October 11, 2007
#       Added a check for the -r flag for a root user command
#       NOTE: this flag will require the expect RPM package to be installed
# Modified by Dean Rowswell, IBM, October 12, 2007
# Added a check for the -n flag to specify a single or multiple VIO servers
#
#------------------------------------------------------------------------------
# Assumption: this server is a trusted host for running ssh commands to
# the VIO server(s)
#   To set this up:
#     ssh-keygen -t dsa (press ENTER for all prompts)
#     scp $HOME/.ssh/id_dsa.pub padmin@VIOserver:.ssh/authorized_keys2
#
# NOTE: if the VIO server responds with "rksh: ioscli:  not found" then
# login to the VIO server and change to the root shell via oem_setup_env.
# Edit /etc/ssh/sshd_config
#       Change: PermitUserEnvironment no
#       To: PermitUserEnvironment yes
#       Run: stopsrc -s sshd ; startsrc -s sshd
#------------------------------------------------------------------------------
#===========================================================#
# Define the list of VIO servers in this variable
#===========================================================#
VIOS="vios1 vios3"
#===========================================================#

DisplayUsage() {
echo "Syntax: dshvio COMMAND\n  Run dshvio -? for the valid parameters"
exit
}

if [ ${#*} -eq 0 ]
then
      DisplayUsage
else
      while getopts :rn: PARMS
        do
         case $PARMS in
          r) lslpp -L|grep -w expect >/dev/null
               if [ $? -ne 0 ]
                 then
                   echo "ERROR: cannot use -r flag because expect\
 RPM package is not installed"
                   exit 1
                 else
                   ROOT=1
               fi ;;
          n) VIOS=${OPTARG}
                VIOS=`echo ${VIOS}|sed 's/,/ /g'`;;
          ?) echo "Valid parameters are:\n  -r for a root command\n\
  -n for a list of VIO servers\n  -n vios1\n  -n vios1,vios2" ; exit ;;
         esac
        done

        shift $(($OPTIND -1))
        VIOSCMD=${*}
        if [ ${#VIOSCMD} -eq 0 ]
        then
                DisplayUsage
        fi
fi

for VIO in ${VIOS}
do
  ping -c1 ${VIO} >/dev/null 2>/dev/null
  if [ $? -eq 0 ]
     then
        echo "=========================\nVIO server --> ${VIO}\n\
========================="
        if [ ${ROOT:=0} -ne 1 ]
        then
         ssh padmin@${VIO} "ioscli ${VIOSCMD}"
        else
         expect -c "spawn ssh padmin@${VIO} ;expect \"\$\*\";\
send \"oem_setup_env\\r\";expect \"\#\*\";send \"${VIOSCMD}\\r\";\
send \"exit\\r\";expect \"\$\*\";send \"exit\\r\""|egrep -v "^spawn\
|^Last|oem_setup_env|^exit|^#"
        fi
     else
        echo "=========================\nVIO server --> ${VIO}\n\
========================="
        echo "VIO server: ${VIO} is not responding"
     fi
done

Making the Case for AIX and Power Systems

Edit: IBM’s Virtualization is still as powerful today, if not more so.

Originally posted May 6, 2008 on AIXchange

I recently received an e-mail from a mailing list that linked these documents from The Sageza Group (link not active) and Forrester Research. Both reports offer information that may help non-technical personnel understand the value proposition of AIX and Power Systems.

In “The Value of PowerVM Workload Partitions New Virtualization Options in IBM AIX v6.1,” Sageza focuses on workload partitions (WPARs). I’ve previously covered WPARs here, here and here.

While the Sageza report is apparently accessible online, Forrester’s report, “Virtualization Trends On IBM’s System p: Unraveling The Benefits In IBM’s PowerVM,” seems to be available only through registration and subscription. From Forrester:

“IBM’s PowerVM (formerly Advanced POWER Virtualization) technology has catalyzed the consolidation of server systems resources and a variety of applications workload types–both AIX- and Linux-led–as virtualized on more powerful multi-core System p servers. Evolution of IBM’s virtualization stack has improved dramatically–from its early 2001 introduction of logical partitions on the first multicore POWER4-based systems–to its current PowerVM virtualization stack. In 2007, IBM’s refresh to POWER6 came fast and furious: debuted with high-end System p 570 (in May), followed by the P6-based JS22 blade (November), and sweeping through the System p 520 (entry) model and System p 550 (midrange) server (January 2008). Traction for PowerVM virtualization now accounts for 70 percent of its IT customer base — showing to what extent IBM’s virtualization stack has become a shortlist contender as a systems consolidation enabler. The November 2007 release of AIX 6 added two breakthrough features–Live Partition Mobility and Live Application Mobility–further cementing IBM’s advanced virtualization advantages against its Unix competitors.

“1. What’s the history behind the IBM virtualization stack?
2. What is the overall business value of the System p virtualization stack?
3. What role does the POWER Hypervisor play in the PowerVM?
4. How does the POWER Hypervisor integrate with PowerVM technologies?
5. What are the benefits of micro-partitioning and the shared-processor pool?
6. What are workload partition (WPAR) and Live Application Mobility?
7. What business problems are solved with PowerVM’s new Live Partition
Mobility?”

And, from Sageza:

“Many organizations have embraced virtualization to improve IT utilization and reduce the expenses associated with equipment acquisition, installation and operation. While traditional virtualization or partitioning schemes have improved IT resource utilization, reducing the number of physical servers has not reduced the number of server operating system (OS) images requiring administration and maintenance. If anything, virtualization has encouraged growth in the number of servers that support the application workloads in organizations. There is an opportunity for IT to reduce this administrative overhead to become more streamlined and cost-efficient while continuing to provide the levels of service on which organizations have become dependent.

“IBM AIX 6.1, through its support for Workload Partitions, enables organizations to rethink the way they deploy multiple workloads on a single server. While traditional approaches such as virtualization using logical partitioning provide OS isolation and independence, for many workloads, this degree of isolation exceeds the user’s need and results in unnecessary administrative and operational overhead.

“WPARs offer IT managers a more cost-effective yet secured approach that meets the needs of many organizations. WPARs differ from other partitioning or virtualization schemes in that they partition server resources by the workload and share access to a single OS image. This is in contrast with the more common approach of creating a discrete operating system image to support each virtual server. By reducing the number of OS images required, the level of server software maintenance and other related IT administrative and management activity can be decreased while maintaining streamlined operational management and reduced need for physical resources.

“WPARs increase resource utilization from the typical 5-20 percent average, reduce partition creation and teardown times, and reduce the number of OS instances and associated system management workload. WPARs provide standard application environments, support mobility and templates as well as cloning, and have automated policy-based resource and workload management through the WPAR manager. Consolidating with WPARs saves floor space and reduces the power consumption and expense associated with servers and air conditioning in the data center while maintaining the one-app/one-server deployment paradigm.

“In this paper, we examine the flexibility that WPARs offer IT professionals in their virtualized UNIX server environments. In particular, we review how WPARs are different from other partitioning technologies and how WPARs complement existing environments. We discuss the capabilities and practical uses of WPARs in sample scenarios and articulate the ways in which WPARs provide an alternative to other partitioning schemes. Through AIX 6.1 and its support for WPARs and PowerVM Live Application Mobility, IT managers have greater flexibility in server configuration and can select the best approach to meet the user organization’s needs while also simplifying the operational and cost efficiency of the IT environment.”

PowerVM Redbook Recommendation

Edit: Still useful concepts to study and be familiar with

Originally posted April 29, 2008 on AIXchange

If you’re working with PowerVM but haven’t kept up with the changes, or if you’re new to virtualization, then the updated Redbook, “PowerVM Virtualization on IBM System p: Introduction and Configuration” (4th edition), should be required reading. It serves as everything from an introduction to virtualization to a cookbook for setting up dual VIO servers for redundancy. Much of this post quotes directly from the Redbook, as I don’t think I can say it any better than the authors have.

From the abstract:

“This IBM Redbook provides an introduction to PowerVM virtualization technologies on IBM System p servers. The Advanced POWER Virtualization features and partitioning and virtualization capabilities of IBM Systems based on the Power Architecture have been renamed to PowerVM.

“PowerVM is a combination of hardware, firmware and software that provides CPU, network and disk virtualization. The main virtualization technologies are:

* POWER5 and POWER6 hardware
* POWER Hypervisor
* Virtual I/O Server

“Though the PowerVM brand includes partitioning, software Linux emulation, management software and other offerings, this publication focuses on the virtualization technologies that are part of the PowerVM Standard and Enterprise editions.

“This publication is also designed to be an introduction guide for system administrators, providing instructions for:

* Configuration and creation of partitions and resources on the HMC
* Installation and configuration of the Virtual I/O Server
* Creation and installation of virtualized partitions

“While discussion in this publication is focused on IBM System p hardware and AIX, the basic concepts can be extended to the i5/OS and Linux operating systems as well as the IBM System i hardware.

“This edition has been updated with the new features available with the IBM POWER6 hardware and firmware.”

And, from the introduction:

“The first edition of this publication was published over three years ago. Since then the number of customers using Advanced POWER Virtualization (currently named PowerVM) editions on IBM System p servers has grown rapidly. Customers use PowerVM in a variety of environments including business-critical production systems, development, and business continuity. This fourth edition includes best practices learned over the past years to build on the foundation work of the previous versions of the Redbook.

“This publication targets customers new to virtualization as well as more experienced virtualization professionals. The publication is split into four chapters, each with a different target audience in mind.

“Chapter one is a high-level introduction for those wanting a quick overview of the technology.

“Chapter two is a slightly more in-depth discussion of the technology aimed more
at the estate- or project-architect for deployments.

“Chapters three and four are aimed at professionals who are deploying the technology. Chapter three works through a simple scenario and Chapter four introduces the more advanced topics such as VLANs, Multiple Shared Processor Pools and Linux. Additionally it will introduce the techniques that can be used to provide the periods of continuous availability required in production systems.”

Be sure to look for the shaded sections throughout the book, which include different sections labeled Important, Note, and Tip. Reading and understanding these will save you headaches when deploying your machines. For example, take a look at this from the SEA section:

“Note: A Shared Ethernet Adapter does not need to have IP configured to be able to perform the Ethernet bridging functionality. It is very convenient to configure IP on the Virtual I/O Server. This is because the Virtual I/O Server can then be reached by TCP/IP, for example, to perform dynamic LPAR operations or to enable remote login. This can be done either by configuring an IP address directly on the SEA device, but it can also be defined on an additional virtual Ethernet adapter in the Virtual I/O Server carrying the IP address. This leaves the SEA without the IP address, allowing for maintenance on the SEA without losing IP connectivity if SEA failover has been configured. Neither has a remarkable impact on Ethernet performance.”

With this Redbook and a test environment, it wouldn’t take long to better understand the topics presented.

Script Changes

Edit: I wonder if this script is still running in the wild.

Originally posted April 22, 2008 on AIXchange

I received an interesting e-mail from a mailing list. Included was this information submitted by Dean Rowswell:

1. Turn on PuTTY logging
2. Copy and paste these commands first:

lshmc -v
lshmc -V
lshmc -r
lshmc -n
lshmc -b
lssysconn -r all
lssyscfg -r sys
lssyscfg -r frame
lshmc -n -F clients
cat /opt/hsc/data/.hmc/.removed
lspartition -dlpar
lspartition -sfp

3. Copy and paste these commands last:

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
echo " ============MANAGED SYSTEM --> LIC level"
lslic -m ${MANAGEDSYS} -t sys
echo " ============MANAGED SYSTEM --> processor config"
lshwres -m ${MANAGEDSYS} -r proc --level sys
echo " ============MANAGED SYSTEM --> lpar processor usage"
lshwres -m ${MANAGEDSYS} -r proc --level lpar -F lpar_name:curr_proc_mode:curr_sharing_mode:run_proc_units:run_procs
echo " ============MANAGED SYSTEM --> memory config"
lshwres -m ${MANAGEDSYS} -r mem --level sys
echo " ============MANAGED SYSTEM --> lpar memory usage"
lshwres -m ${MANAGEDSYS} -r mem --level lpar -F lpar_name:run_mem
echo " ============MANAGED SYSTEM --> lpar status"
lssyscfg -m ${MANAGEDSYS} -r lpar -F name:state
echo " ============MANAGED SYSTEM --> cuod processor config"
lscod -m ${MANAGEDSYS} -t cap -r proc -c cuod
echo " ============MANAGED SYSTEM --> cuod memory config"
lscod -m ${MANAGEDSYS} -t cap -r mem -c cuod
echo " ============MANAGED SYSTEM --> drawer config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype unit
echo " ============MANAGED SYSTEM --> bus config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype bus
echo " ============MANAGED SYSTEM --> slot config"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot
echo " ============MANAGED SYSTEM --> slot config summary"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F drc_name:description:lpar_name:lpar_id
echo " ============MANAGED SYSTEM --> virtual ethernet"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level sys
echo " ============MANAGED SYSTEM --> virtual ethernet all lpar"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level lpar
echo " ============MANAGED SYSTEM --> virtual scsi all lpar"
lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype scsi --level lpar
for LPAR in `lssyscfg -r lpar -m ${MANAGEDSYS} -F name`
do
echo " ============LPAR --> ${LPAR} --> CPU resources"
lshwres -r proc -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Memory resources"
lshwres -r mem -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Physical adapters"
lshwres -r io --rsubtype slot -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Virtual Ethernet config"
lshwres -r virtualio --rsubtype eth --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> Virtual SCSI config"
lshwres -r virtualio --rsubtype scsi --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> LPAR config"
lssyscfg -r lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
echo " ============LPAR --> ${LPAR} --> LPAR profiles"
lssyscfg -r prof -m ${MANAGEDSYS} --filter lpar_names=${LPAR}
done
done

4. Copy and paste these lines to add the information to my LPAR resource allocation spreadsheet
NOTE: when pasting into Excel, click on the paste options and select “Text import wizard”

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description
done

For System p machines that already have the physical resources assigned:

for MANAGEDSYS in `lssyscfg -r sys -F type_model*serial_num`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description:lpar_id:lpar_name
done

5. To capture any errors and events

lssvcevents -t hardware
lssvcevents -t console

After messing around with the script, I wanted to get it working from cron. I’m unable to run scripts on my HMC, but after looking here (link not active), I set up my ssh keys so I could auto login from my AIX machine to my HMC.
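
For reference, the key exchange and the cron entry look roughly like this; the HMC address, the use of mkauthkeys to register the key (available on the HMC levels I have used), and the hmcinfo.ksh script name are placeholders, so adjust them for your environment:

# On the AIX machine: generate a key pair (press ENTER at the prompts)
ssh-keygen -t rsa

# Register the public key with the hscroot user on the HMC
KEY=`cat $HOME/.ssh/id_rsa.pub`
ssh hscroot@hmc.ip.address "mkauthkeys --add '$KEY'"

# Verify the login is now passwordless
ssh hscroot@hmc.ip.address lshmc -V

# Sample crontab entry to collect the data nightly at 11 p.m.
# 0 23 * * * /usr/local/bin/hmcinfo.ksh -m hmc.ip.address -l hscroot > /tmp/hmcinfo.out 2>&1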

Then I modified the above script so I could run it from my AIX machine and have it connect to my HMC using ssh. Now I can run the job out of cron on my AIX machine instead of messing with putty.

#!/usr/bin/ksh
#
# scriptname -m hmchostname -l hmcuser
#
#
hmc=hmc.ip.address

user=hscroot

while getopts m:l: option
do
case $option in
m) hmc="$OPTARG";;
l) user="$OPTARG";;
esac
done

echo "HMC Information:"
echo ""
ssh $hmc -l $user 'date'
ssh $hmc -l $user 'lshmc -v'
ssh $hmc -l $user 'lshmc -V'
ssh $hmc -l $user 'lshmc -r'
ssh $hmc -l $user 'lshmc -n'
ssh $hmc -l $user 'lshmc -b'
ssh $hmc -l $user 'lssysconn -r all'
ssh $hmc -l $user 'lssyscfg -r sys'
ssh $hmc -l $user 'lssyscfg -r frame'
ssh $hmc -l $user 'lshmc -n -F clients'
ssh $hmc -l $user 'cat /opt/hsc/data/.hmc/.removed'
ssh $hmc -l $user 'lspartition -dlpar'
ssh $hmc -l $user 'lspartition -sfp'

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"

echo " ============MANAGED SYSTEM --> LIC level"
ssh $hmc -l $user "lslic -m ${MANAGEDSYS} -t sys"
echo " ============MANAGED SYSTEM --> processor config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r proc --level sys"
echo " ============MANAGED SYSTEM --> lpar processor usage"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r proc --level lpar -F lpar_name:curr_proc_mode:curr_sharing_mode:run_proc_units:run_procs"
echo " ============MANAGED SYSTEM --> memory config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r mem --level sys"
echo " ============MANAGED SYSTEM --> lpar memory usage"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r mem --level lpar -F lpar_name:run_mem"
echo " ============MANAGED SYSTEM --> lpar status"
ssh $hmc -l $user "lssyscfg -m ${MANAGEDSYS} -r lpar -F name:state"
echo " ============MANAGED SYSTEM --> cuod processor config"
ssh $hmc -l $user "lscod -m ${MANAGEDSYS} -t cap -r proc -c cuod"
echo " ============MANAGED SYSTEM --> cuod memory config"
ssh $hmc -l $user "lscod -m ${MANAGEDSYS} -t cap -r mem -c cuod"
echo " ============MANAGED SYSTEM --> drawer config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype unit"
echo " ============MANAGED SYSTEM --> bus config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype bus"
echo " ============MANAGED SYSTEM --> slot config"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot"
echo " ============MANAGED SYSTEM --> slot config summary"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F drc_name:description:lpar_name:lpar_id"
echo " ============MANAGED SYSTEM --> virtual ethernet"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level sys"
echo " ============MANAGED SYSTEM --> virtual ethernet all lpar"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype eth --level lpar"
echo " ============MANAGED SYSTEM --> virtual scsi all lpar"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r virtualio --rsubtype scsi --level lpar"

for LPAR in `ssh $hmc -l $user "lssyscfg -r lpar -m ${MANAGEDSYS} -F name"`
do
echo " ============LPAR --> ${LPAR} --> CPU resources"
ssh $hmc -l $user "lshwres -r proc -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Memory resources"
ssh $hmc -l $user "lshwres -r mem -m ${MANAGEDSYS} --level lpar --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Physical adapters"
ssh $hmc -l $user "lshwres -r io --rsubtype slot -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"

echo " ============LPAR --> ${LPAR} --> Virtual Ethernet config"
ssh $hmc -l $user "lshwres -r virtualio --rsubtype eth --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> Virtual SCSI config"
ssh $hmc -l $user "lshwres -r virtualio --rsubtype scsi --level lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> LPAR config"
ssh $hmc -l $user "lssyscfg -r lpar -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
echo " ============LPAR --> ${LPAR} --> LPAR profiles"
ssh $hmc -l $user "lssyscfg -r prof -m ${MANAGEDSYS} --filter lpar_names=${LPAR}"
done
done

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description"
done

for MANAGEDSYS in `ssh $hmc -l $user "lssyscfg -r sys -F type_model*serial_num"`
do
echo "============MANAGED SYSTEM --> ${MANAGEDSYS}"
ssh $hmc -l $user "lshwres -m ${MANAGEDSYS} -r io --rsubtype slot -F unit_phys_loc:bus_id:phys_loc:description:lpar_id:lpar_name"
done

ssh $hmc -l $user 'lssvcevents -t hardware'
ssh $hmc -l $user 'lssvcevents -t console'

Feel free to improve upon what Dean and I have done so far. I’ll add your contribution to a future posting.

IBM Power Announcement: Not Just Another Renaming

Edit: There are still ongoing wars over what to call your AS/400 or System i, and whether it runs OS/400, i5/OS, IBM i, etc. The change happened more than ten years ago; maybe it is time to call it IBM i on POWER?

Originally posted April 15, 2008 on AIXchange

I spent the first part of my career working on OS/400, but since then I have been much more focused on the System p and AIX. Like a lot of you, I’ve been around long enough to recall when these machines were known as the AS/400 and RS/6000. I was amused when a coworker forwarded me this article on April Fool’s Day. It talks about IBM and the name changes they have made over the years to the AS/400.

Ironically, the day after that article was published, April 2, 2008, IBM announced yet another name change. However, the latest announcements represent much more than a simple name change and a new faceplate on the hardware.

If you haven’t been following this (you have plenty to do keeping machines running from day-to-day, after all), then let me bring you up to speed. IBM is unifying its System i and System p systems onto common hardware platforms. The way the company puts it, there’s a new power equation for the new enterprise data center: Power = i + p.

From IBM’s announcement letter:

“The IBM System i and IBM System p organizations are unifying the value of their server offerings into a single, powerful lineup of Power Systems servers based on industry-leading IBM POWER6 processor technology with support for the IBM i (formerly known as i5/OS), IBM AIX, and Linux operating systems. This new, single portfolio of Power Systems servers offers industry-leading technology, continued IBM innovation, and the flexibility to deploy the operating system that your business requires.

“Specifically, being announced today are: IBM Power 520 Express, IBM Power 550 Express, IBM BladeCenter JS12 Express blade server. All three of these systems can be ordered in the AIX Edition, i Edition or Linux Edition.”

Then last week came another IBM announcement:

“IBM announced two high-end Power Systems models–the world’s fastest UNIX server and a unique water-cooled supercomputer. The new systems offer sophisticated IBM virtualization technology and energy-saving capabilities to help dramatically reduce bottom-line operating costs, such as those for energy, floor space and systems management, while improving system performance, helping clients transition to a new enterprise data center. Beginning today, clients will be able to leverage the world’s most powerful microprocessor, POWER6–with new world-record speeds of up to 5 GHz–in these new systems, leading to significant performance improvements across a wide array of applications.

“The new UNIX enterprise server, the Power 595, designed to extend IBM’s leadership in the UNIX market, will be attractive to existing IBM clients as well as Sun Solaris and HP UNIX users. IBM’s new POWER6 “hydro-cluster” supercomputer, the Power 575, is built to help users tackle some of the world’s most challenging problems in fields such as energy, aerospace and weather modeling. The new super-dense system uses a unique, in-rack, water-cooling system and with 448 processor cores offers users nearly five times the performance and more than three times the energy efficiency of its predecessor, IBM’s POWER5+ processor-based p5-575 supercomputer.

“The new IBM Power 570 is a unified version of the popular midrange POWER6 processor-based System p 570 and the System i 570. Existing customers can update to the new system at no-charge. The Power 570 runs any permutation and combination of i, AIX or Linux partitions offering the ultimate in flexibility and increased asset utilization and reuse. And with PowerVM, Power servers also run many Linux x86 applications.”

So we’ll be able to run i, AIX, Linux on Power, and Lx86 on everything from JS12 and JS22 blades, to the 520, 550, 570, 575 and 595 Power models.

What makes the most sense in your environment? A BladeCenter with some JS12 or JS22 blades running AIX, i or Linux? You could mix those blades in the same chassis with Intel or AMD blades to run whatever Windows or native x86 Linux applications you might require. Or maybe you need an IBM Power 595 running 5 GHz POWER6 chips with 4 TB of RAM? You could carve that machine into LPARs running i, AIX and Linux as needed.

The same virtualization options are still available:

Using PowerVM for virtualization, we can “aggregate and manage resources via a consolidated, logical view.”

What do these announcements mean for your organization? Again, the keyword is unification. You may have had separate System i and System p IT teams that managed their own hardware and operating system, with each always figuring that its platform was the best and the most important.

At a minimum, as new Power hardware arrives on the floor, I’d expect more communication between the teams. It makes sense to get some cross-training as we seek ways to make coexisting on the same hardware a reality.

If IT personnel don’t communicate and make an effort to understand the other operating system, if organizations continue to maintain separate computing empires, the capability to run i and AIX on the same hardware will be useless. Although initially it may make sense in some environments to let each group run with its own hardware, that mentality will be harder to justify as management keeps hearing about virtualization and consolidation and the machines keep handling larger workloads.

Of course, since every shop has budget and hardware lifecycle concerns, not everyone will get the new Power hardware right away. But plenty of shops are struggling with older technology that is in need of a refresh. For those organizations that will soon go through the refresh process, be sure to look at the new machines and make the virtualization and consolidation decisions that are right for you.

Getting Started When You’re on Your Own

Edit: It seems to still be an issue where the people that built machines are not necessarily the ones that manage and maintain machines as time goes on.

Originally posted April 8, 2008 on AIXchange

You need to log-on to a machine that you’ve never seen before. There’s no documentation. The administrator who built and ran the machine is no longer with the business and left nothing behind, so nobody has any idea how it was put together. The machine just sits in a corner and runs.

So what are some of the first commands you use to get more information?

To answer this question, you should ask several more. What are you trying to understand? Do you want to know how the disks are laid out? Do you want to know how the application starts? Do you want to know what is running in cron or what applications are running on it now?

This is by no means comprehensive, but a few things that I like to check on a machine would be found in the output from these commands:

oslevel -s
instfix -i | grep SP
hostname
lsdev -Cc adapter
netstat -rn
netstat -in
cat /etc/hosts
cat /etc/resolv.conf
lslpp -l
rpm -qa
lscfg
prtconf
lsmcode
lssrc -a
lspv
lsvg
df
lsfs
lsps -a
uname -L
errpt -a
crontab -l

I’ve left a few things off this list. What other commands would you add?

Do you have scripts and tools that you like to load and run on a “foreign” machine? Do you keep copies of how the machine looks so that you can easily compare today’s output to last week’s? Do you have run books or other documentation–not to mention any necessary backups–to use if a machine rebuild is needed?
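
One simple approach is a small ksh wrapper that dumps the output of those commands into a dated file so you can diff this week's view of the box against last week's; a minimal sketch, with /var/adm/sysinfo as an assumed location and a trimmed command list you would extend to taste:

#!/bin/ksh
# Capture a basic configuration snapshot for later comparison
DIR=/var/adm/sysinfo
OUT=${DIR}/$(hostname).$(date +%Y%m%d)
mkdir -p ${DIR}

{
  for CMD in "oslevel -s" "lsdev -Cc adapter" "netstat -rn" "lslpp -l" \
             "lscfg" "lspv" "lsvg" "df -g" "lsps -a" "errpt" "crontab -l"
  do
    echo "===== ${CMD} ====="
    eval ${CMD}
  done
} > ${OUT} 2>&1

# Compare against an earlier snapshot when something looks off, e.g.:
# diff /var/adm/sysinfo/host1.20080301 /var/adm/sysinfo/host1.20080308

Run it out of cron and you always have a baseline to compare against when a "foreign" machine starts misbehaving.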

Although in many cases the people who build the machines are also the ones who run them day to day, it’s not always the case. Sometimes, due to the size and complexity of the environment or the simple fact that people do switch jobs in IT, the guy who loaded and understands the machine may not be available to do work on it now.

That leaves it up to us. We need to be able to log-on to any machine, learn about the environment and then do what needs to be done. It starts with knowing which commands to run.

Getting Hands-On with Live Partition Mobility

Edit: This is still a useful tool for us to utilize.

Originally posted April 1, 2008 on AIXchange

My first experience with live partition mobility came as an observer. A few months back I went to Austin and saw an LPAR move from one POWER6 570 to another.
 
Since then, I’ve become a live partition mobility user. I move running LPARs between my two JS22 POWER6 blades, which are connected to the same SAN and able to see the same LUNs. Once I got it set up, I haven’t had any problems. It functions with the blades just as I saw it work on the larger systems.
 
Checklist
To get started, I made sure I had the enterprise edition of PowerVM. After loading the virtual I/O (VIO) server onto my blade and logging into the Integrated Virtualization Manager (IVM) GUI, I was able to enter my APV key.
 
I did some other checks during set-up. I made sure my network was set to use a shared ethernet adapter instead of a host ethernet adapter, and I made sure the reserve_lock on my SAN disk was set to no_reserve. (It was originally set to single_path.) If you fail to fix your reserve_lock, you’ll need to change it. When searching on “change reserve lock,” I found some documentation which included the following:
 
To change the setting with an LPAR running, from your vio server run:

# chdev -dev hdisk7 -attr reserve_policy=no_reserve
 
Some error messages may contain invalid information for the virtual I/O
server environment.
 
Method error (/usr/lib/methods/chgdisk):
    0514-062 Cannot perform the requested function because the specified
    device is busy.

The error makes sense because we’ve mapped the physical device to a virtual
device.

    $ lsmap -vadapter vhost3
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost3          U7998.61X.100BB8A-V1-C17                     0x00000007
 
    VTD                   vtscsi3
    Status                Available
    LUN                   0x8100000000000000
    Backing device        hdisk7
    Physloc               U78A5.001.WIH0A68-P1-C6-T1-W5005076801303022-L6000000000000
 

Shutdown the LPAR that is using the device.

Remove the virtual adapter and its mapping.

$ rmdev -dev vhost3 -recursive
hdisk7 deleted
vhost3 deleted

Change the reserve_lock setting.

# chdev -l hdisk7 -a reserve_lock=no
hdisk7 changed

Let’s make sure the setting changed.

$ oem_setup_env
(to leave the restricted shell)

# lsattr -El hdisk7
 
Check for
 
reserve_policy  no_reserve

With the changes verified, the next step is to re-create the virtual device using mkvdev.
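
The mkvdev step itself isn't spelled out in the excerpt above, so here is a minimal sketch of what it might look like from the padmin shell, assuming the same hdisk7/vhost3 pairing and the vtscsi3 device name from the earlier lsmap output:

$ mkvdev -vdev hdisk7 -vadapter vhost3 -dev vtscsi3
vtscsi3 Available

$ lsmap -vadapter vhost3    # confirm the backing device is mapped again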

Using my IVM GUI, I selected the LPAR that I was going to migrate. (In the drop-down menu, go to “mobility” and select “migrate.”) I entered the IP address of the machine I was moving to, followed by the padmin password. Then I selected “validate.” This verified that my LPAR was ready to move. Once it passed the tests, I clicked on the “migrate” option, and my LPAR moved to the other blade.   
 
Keep Running
The value of this technology became clear to me on a recent customer call. My customer was conducting maintenance. I had to bring the machine down, and the users suffered an outage. Had the customer been on POWER6, live partition mobility would have been a perfect solution here. I could have moved the running LPAR and then brought down the source machine without affecting the workload that was running.

I’m sure that as more customers deploy POWER6 technology, we’ll see live partition mobility become more widely adopted. It’s extremely useful technology.

Lx86 Works as Advertised

Edit: Now that is a name I have not heard in a very long time. Some of the links just resolve to generic IBM pages.

Originally posted March 25, 2008 on AIXchange

After hearing so much about Lx86 (formerly known as the System p Application Virtual Environment, or System p AVE), I finally decided to try it out.

Lx86 allows you to run unmodified Intel/x86 Linux binaries on IBM Power hardware. This is significant because the alternative–running Linux applications natively on Power–requires a recompile. This can be painful or impossible, and in fact it’s this reason that many IBM customers choose Intel x86 to run their Linux applications.

For more about how Lx86 works, see the references at the end of this post. Now, I’ll tell you about installing and getting started with Lx86.

First, I set up a trial subscription at Red Hat to get the .iso files I needed. There are 10 files in all–both 32-bit x86 images and 64-bit Power images. Once I downloaded the files, I moved the ppc .iso files to the /var/vio/VMLibrary directory on my virtual I/O (VIO) server, a process I cover here.

This allowed me to use my virtual optical drive to mount the images rather than burn and mount a bunch of physical media.
 
I booted from the first ppc .iso file, and after setting my install selections, I was able to install the machine onto an LPAR. I had to do some loadopt and unloadopt commands to get through all the CDs, but that seemed easier than personally taking a fistful of media to the site.
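
For reference, the disc swapping looks roughly like this from the padmin command line; the vhost2 adapter, the vtopt0 device and the .iso file names are assumptions based on my setup, so substitute your own:

# Create a file-backed virtual optical device on the LPAR's vhost (one time)
$ mkvdev -fbo -vadapter vhost2
vtopt0 Available

# Load the first install image from the media library, then swap as the installer asks
$ loadopt -disk rhel5-ppc-disc1.iso -vtd vtopt0
$ unloadopt -vtd vtopt0
$ loadopt -disk rhel5-ppc-disc2.iso -vtd vtopt0

# See what is in the library and what is currently loaded
$ lsrep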
 
After the installation, I downloaded the code here. (IBM registration required.)
 
As of this writing I was using IBM PowerVM Lx86 V1.1 p-ave-1.1.0.0-1.tar (8652800).
 
I copied the tarball and my x86 .iso files to my newly built Linux on Power installation. I untarred the p-ave .tar file, and ran ./installer.pl. After accepting the license and registering with both IBM and Red Hat, I was ready to load the software. I chose to do a full install, and had it share my /home directories.
 
The installer prompts for the path to your x86 Linux .iso files, as it expects to find them loaded on the machine. I gave the installer the correct path, and it examined the .iso files to verify it could find the rpm files it needed. I selected continue and it installed my x86 world. It basically copies all the necessary files to the /i386 directory, so that when you start the environment it can chroot into /i386 and treat that directory as /.
 
After getting confirmation that System p AVE and x86 World were installed successfully, I was prompted to run /usr/local/bin/runx86 to start a shell. It returned to the menu and I selected Option 6 to quit. You can look at the install log by searching for p-ave_install*log.
 
I am now able to cd /i386 and runx86. Once I do this, the shell runs as it would on an x86 machine. If you run “arch” from your shell before you runx86, you’ll see ppc. If you run arch after you’re in your x86 shell, you’ll see i686.
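
A quick sanity check, using the runx86 path the installer reports:

arch                    # reports ppc on the native Linux on POWER side
/usr/local/bin/runx86   # start the translated x86 environment
arch                    # reports i686 inside the x86 shell
exit                    # drop back to the POWER shell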
 
I can now run and install rpms and do anything else as if I were on a regular Linux machine. I ran vncserver and my desktop came up as I would have expected. I moved some rpm files into the environment and installed them with rpm -ivh as I normally would. There are caveats–this isn’t a panacea where any and all applications will run at the same speeds as on native x86 hardware. But in many cases, the performance will be quite good.
 
I found that I wanted to ssh into the x86 World instead of the ppc world, so I opened my console, copied my /etc/ssh/ information from ppc world to x86 world, killed sshd that was running in ppc world, did a runx86 and started sshd from there. Once I did that I could ssh -Y into my x86 world and start exporting X applications to my desktop. I’ll need to read more documentation and play with it more, but on first glance it works as advertised. I was able to simply install rpms and run them.

References

From:   
 
Transitive Corporation, a leading provider of cross-platform virtualization software that enables the execution of applications across diverse computing platforms, today announced that IBM will commence shipping PowerVM Lx86 with all copies of PowerVM Editions, available across its entire line of System p servers. PowerVM Editions, a set of advanced virtualization offerings developed by IBM for Power Systems platforms, now includes the x86 feature (developed for IBM by Transitive) which simplifies migration of x86 Linux applications onto this popular platform for server consolidation and business application deployment. PowerVM Lx86 allows the creation of an x86 application virtual environment so users may easily install and run a wide range of x86 Linux applications on a Power Systems platform with a Linux for POWER operating system. PowerVM Lx86 allows thousands of x86 Linux binaries to run unmodified and without recompilation on System p servers, helping to bring additional benefits with IBM PowerVM virtualization to enterprise customers by enabling more applications to be consolidated.
 
From:
 
Up to now more than 2500 applications have been ported to Linux on POWER, but still there are thousands only ported to x86 based platforms. With the IBM PowerVM Lx86 environment, a customer can take the original installation media of a Linux on x86 application and install it as is within a Linux on POWER partition running on IBM System p. There are many workloads that will run well within this environment. There are a few workloads that are not recommended to be run in this environment.

From a customer perspective, this environment allows a very transparent and easy way to start taking the benefits of such an advanced infrastructure platform. From an ISV perspective this environment provides an excellent opportunity for a jump-start onto a new marketplace, postponing the decision of the code porting from Linux on x86 to Linux on POWER to a more appropriate moment in time if he judges necessary. It also allows the ISV the opportunity for keeping his development and support costs on a lower level, since there is only a single source (x86-based) code.
 
From:
 
In addition, the performance of some x86 Linux applications running on PowerVM Lx86 may significantly vary from the performance obtained when these applications are run as a native port. There are various architectural differences between x86 and POWER processors which can impact performance of translated applications. For example, translating dynamically generated code like Java byte codes is an ongoing translation process, which can be expected to impact the performance of x86 Java applications using an x86 Java virtual machine. Floating point applications running under x86 have a different default precision level from Power Architecture processors, so translating between these levels can have additional performance penalties. And finally, translating and protecting multi-threaded applications can incur an additional performance overhead as the translator works to manage shared memory accesses. IBM suggests that clients carefully consider these performance characteristics when selecting the best method for enabling applications for their environment.

Updating to a New TL or Service Pack? Call This Doc

Edit: I love that this document is still available.

Originally posted March 18, 2008 on AIXchange

I think you’ll find this IBM support document quite useful. It explains how to upgrade to a new technology level or service pack in AIX.
 
The document describes the recommended processes of updating your system to a new technology level or adding a service pack to an existing technology level. I’ll review some key words and terminology, run through recommended pre-checks, discuss the update_all process using both SMIT and command line and, finally, cover post-checks and FAQ.

Let’s start with the pre-checks. As noted in the document:

“It is recommended to have at least one back-out method available to you in case a restore is required. Recommended back out methods include: mksysb restore, sysback restore, altinst_rootvg clone, and multibos.”
 
This paragraph makes it pretty clear: Do not reject applied filesets to back out a TL update–use another back-out method if things don't work out. Be sure to update test machines before production, and actually run tests to ensure that things work as expected on the test machines.
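
Of those back-out methods, the altinst_rootvg clone is a handy one to have staged before you start; a minimal sketch, assuming hdisk0 holds rootvg and hdisk1 is a free disk of sufficient size:

# Clone the running rootvg to the spare disk before applying the TL
alt_disk_copy -d hdisk1

# alt_disk_copy points the bootlist at the clone; point it back at rootvg
bootlist -m normal hdisk0
bootlist -m normal -o

# If the update goes badly, boot from hdisk1 to fall back to the old level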
 
The document also tells you how to perform operations like boot image verification and conduct firmware, fileset consistency and free space checks. Then, under the Post Checks heading, is this statement:
 
“Presuming your update_all was successful you will want to check the following commands. If you receive unexpected output please contact the support center for assistance.

# oslevel -r
This should return with the expected TL output

# oslevel -s
This should return with the expected TL and SP output.

# lppchk -v
This should come back ideally with no output.”
 
And be sure to check out the FAQ section. There are some good questions (and answers) here. For instance:
 
Q: Is it okay to install my Technology Level in the “APPLIED” state? Doesn’t that let me reject the updates if there is a problem?

A: With the introduction of Technology Levels and the “all or nothing” packaging format, these updates are bringing in on the upwards of 400+ fileset updates for each TL. Attempting to perform a “reject” process on so much code simply doesn’t work well. Recommended back-out methods are discussed earlier in this document.

Q: Does the same hold true for Service Packs?

A: The Service Pack updates are certainly much smaller groups of updates, typically numbering around 40-50 per update. While you certainly will have a better chance of successfully rejecting 40 filesets instead of 400, it would still be best to have one of the back-out methods mentioned earlier.
 
Q: I need to run my update today but I won’t be able to reboot until next week. Is that a problem?

A: Plans should be made to reboot as soon as the update is complete and checks have been made to ensure there were no failures. System files will have been replaced, but the corresponding kernel and library updates will not be loaded until boot time. You will likely encounter problems if you delay rebooting.
 
Q: Is it recommended to reboot before issuing a TL upgrade?

A: If this is possible, absolutely. There are systems out there that haven’t been rebooted in over a year or more. Who is to say that something hasn’t happened in that time that simply wouldn’t show up until a reboot. Rebooting the system first assures a good boot image, a stable system, and would isolate any problem that normally wouldn’t be caught until the post-update reboot as either a preexisting issue, or an issue directly related to the TL update itself.

Q: Some say to use base AIX install media when updating the TL, others say the TL fix downloads or CDs should be used. Which is right?

A: The recommendation is to use the TL fix downloads from FixCentral, or the TL CDs that can be ordered either online or from AIX SupportLine. You can also use the base AIX installation media, however without getting into a long answer, the recommendation is using the TL fix packages.
 
Q: Is it okay to run the update_all while people are online?

A: Updating could affect running processes. As such, applications should be down and users offline as a general rule.

Even though I’ve quoted a lot here, I suggest you read the whole thing. And, as is noted throughout the document, feel free to contact IBM support with any questions.
 
I’m always on the lookout for good documentation, recommendations, cookbooks and the like. Whenever I find something, I’ll be sure to mention it here.

IBM Support Comes Through

Edit: I always advocate for calling problems into IBM support. This is old information, but I leave it here because you never know what people are still running and what problems they might run into. Why reinvent the wheel?

Originally posted March 11, 2008 on AIXchange

Recently I was working on a customer machine that was giving these lppchk errors: 
lppchk -v
lppchk:  The following filesets need to be installed or corrected to
         bring the system to a consistent state:
 
  bos.rte.xxxxxxx 5.3.7.0         (usr: not installed, root: APPLIED)
 
Using oslevel -s  and instfix -i, I received this output:
 
5300-04-00-0000
 
instfix -i | grep ML
   All filesets for 5.3.0.0_AIX_ML were found.
   All filesets for 5300-01_AIX_ML were found.
   All filesets for 5300-02_AIX_ML were found.
   All filesets for 5300-03_AIX_ML were found.
   All filesets for 5300-04_AIX_ML were found.
   Not all filesets for 5300-05_AIX_ML were found.
   Not all filesets for 5300-06_AIX_ML were found.
   Not all filesets for 5300-07_AIX_ML were found.
 
TL7 had  been applied at some point, but there must have been issues during that install that weren’t caught then. The customer had no backups of the machine prior to the TL7 upgrade. I opened a PMR with IBM and the correct update media was quickly shipped out, but when I tried to install it, I couldn’t due to the state the machine was in.

On my attempts to reinstall, I received this error:
 
fileset is applied on the “root” part but not on the “usr” part.
      Before attempting to re-apply this fileset you must remove its
      “root” part.  (Use the reject facility if the fileset is an
      update.  Remove the fileset via the deinstall facility if it is
      a base level fileset.)
 
If I tried to reject it, I got this error:
 
SELECTED FILESETS:  The following is a list of filesets that you
  asked to reject.  They cannot be rejected until all of their
  dependent filesets are also rejected.  See subsequent lists for
  details of dependents.
 
We tried to force overwrite the fileset, but it gave us errors as well. So I was in a catch-22. But then I called IBM support and referenced the PMR number, and was connected with a knowledgeable AIX support person.
 
We had no mksysb of the machine, and reloading the operating system from scratch was a last resort. I think the IBM representative understood my position. She took the time to help us explore all options before finally having me reload the machine.
 
Thanks to IBM support’s hard work, I was able to resolve the problem by performing “surgery” on the machine’s ODM. Now, I would NOT recommend trying this on a production machine unless support instructs you to do so. (Though if you’re on a test machine that you don’t mind destroying if you make a mistake, have at it.)

Here’s a rough idea of what we did to make the machine ignore the broken updated fileset.
 
# export ODMDIR=/etc/objrepos
# mkdir -p /tmp/odmfix
# cd /tmp/odmfix
# odmget -q name='fileset name' lpp > lpp.out
==> vi lpp.out to get the lpp_id = ###
# odmget -q lpp_name='fileset name' product > product.out
# odmget -q lpp_id=### history > history.out
# vi history.out ==> Remove the ver=5 rel=3 7 stanzas, save file
# vi product.out ==> Remove the ver=5 rel=3 7 stanzas, save file
# odmdelete -q lpp_name='fileset name' -o product
# odmdelete -q lpp_id=### -o history
# odmadd product.out
# odmadd history.out
 
After running this procedure, we put the machine in a state where it ignored the broken TL7 file set. At that point, I could reload it. After swapping a few CDs and finishing the TL7 update, the lppchk errors went away.
 
Support also reminded me of something found here:
 
“The rule has been changed that previously allowed applying individual updates/PTFs from a TL. The rule now says that installing a Technology Level is an ‘all or nothing’ operation. Requisites are now added so the whole Technology Level is installed. Before applying a TL, you should always create a backup and plan on restoring that backup if you need to rollback to your previous level.”
 
The IBM rep gave me this explanation: When doing TL updates, plan to commit the fixes rather than apply them, because they don’t support rejection of TL updates. The backout procedure is to restore from mksysb (or boot from your alternate disk if you go this route).
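
In practice that means committing once you are satisfied the update is good; a minimal sketch:

# Commit everything still in the APPLIED state
installp -c all

# Re-run the consistency check afterwards
lppchk -v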

Long story short: Make sure you have valid environments for testing fixes before installing them in production, and always be sure you have good backups.

And the moral of this story? Never take IBM support for granted. Their help is invaluable.

Supporting Users Starts with Data

Edit: Another relevant post with things to consider, it still holds up today.

Originally posted March 4, 2008 on AIXchange

A while back I injured my knee and needed treatment. My doctor referred me to a specialist, who had me fill out some forms since I was a new patient. When I met with the nurse, she looked over my forms and entered the information into the computer. While doing so, she called up my entire patient history, which included details about older prescriptions that I’d forgotten to include on the form. She could do this because the specialist and my primary physician had access to the same database of patient information.

I’ve experienced something similar when calling my ISP to report a network outage. A technician would bring up my call history to access information about issues I’d previously reported. I’ve seen the other side of this, too. I’ve contacted help desks that either didn’t maintain or didn’t bother to check my call history. Instead of immediately responding to my problem, they wasted my time getting basic information. It wasn’t that the technicians were rude or incompetent, but their companies just seem less professional, especially when I compare those encounters to my experiences with the ISP.

When supporting users, the more information we track and act upon, the better. I try to be proactive by monitoring machines and networks. When users contact me, I find it extremely helpful to have their call history available. Is the user calling about the same printer problem? Then maybe some hardware needs to be fixed, or the user needs some training.

A friend was telling me about his efforts to figure out why an important network device was intermittently going down at his company. Whenever he thought he’d resolved the issue, it would resurface. It turned out that a user who’d taken an extended leave of absence was periodically coming into the office with a laptop and a dedicated IP address, and that was causing the network conflict. But my friend only figured it out because he checked the problem reports and correlated that with the data being logged on the network.

Systems and network monitoring–along with good ticket-tracking software and procedures–can provide first-level support personnel with the information they need to resolve user problems. Chances are, your company is engaged in these practices. But if you’re not, and you find yourself putting out the same fires and handling the same kinds of problems, maybe it’s time to rethink the way you’re doing things.

Customer Satisfaction Starts with Us

Edit: This is still good information to consider and think about.

Originally posted February 26, 2008 on AIXchange

When assisting customers with their hardware designs, communication is key. Every step of the way we need to educate customers about the configurations we’ve chosen and the thought processes that went into those choices. Then we must listen and address any issues they might have with our proposals.

Rather than push the latest and greatest hardware and virtualization tools, we must recognize what our customers are trying to accomplish and help them understand the tools that can best help them realize their objectives. Ultimately, whether it’s new network gear or System p hardware, we must be sure that the solutions we propose to our customers fit their needs.
 
We may run from customer to customer and implementation to implementation living and breathing PowerVM, CuoD, VIO and LPAR. But if they’ve not had our training and hands-on experience, we may find that the same words have different meanings. For instance, when I say “LPAR,” my customer might hear “all my eggs in one basket.” When I say “virtual I/O server,” my customer might hear “performance bottleneck.” We must be certain we’re speaking the same language. We must be sure that customers understand the pros and cons of any proposed solution.
 
We must also be sure that our customers are up to speed on best practices. We need to explain that we wouldn’t propose a solution to them unless we were convinced it was right for their situation.

And again, we must listen. If we’ve educated a customer on LPAR’s benefits but that customer elects to run on a standalone machine, we should help them with their chosen server solution rather than hammer at an option they’re not interested in at this time. This doesn’t mean that we stop educating customers about virtualization’s many benefits. But it’s a matter of priorities. Ensuring that they’re comfortable with the machines they’ll be running in their environment is foremost.
 
Again, we must be clear on the assumptions we’ve made and the tradeoffs that we took when coming up with our designs. Customers need to know what they’re signing off on. The last thing they want is to discover months down the road that they don’t have the equipment they thought they did. I’ve seen customers who believed that the machines at their production and disaster recovery (DR) sites were identical, only to learn that the DR site was slightly less powerful. Now, in a DR scenario, running a piece of the application and/or using a less powerful machine at the secondary data center may make financial or business sense. But the decision must be communicated to all involved parties. Certainly, system operators need to be aware of these choices long before they start loading new operating systems or testing applications.
 
We’re in this together. Customers want appropriate solutions to solve real business problems, and we want happy and satisfied customers. We have the right hardware and the right tools–it’s up to us to help customers architect the right solutions.

Workload Partition Manager Offers a Better Way

Edit: Guess what I do not run anymore? The links seem to redirect, but I was able to find the information after a little digging. Your mileage may vary.

Originally posted February 18, 2008 on AIXchange

Managing and organizing an environment with several workload partitions (WPARs) running on many different machines can be difficult. To start and stop WPARs, you must log into each individual global instance. But there’s a better way: Workload Partition Manager.

With Workload Partition Manager, you can much more easily see which WPARs are running on your machines. You can also use this product to relocate your WPARs automatically, by establishing policies, instead of moving them manually. While WPARs are built into AIX 6, Workload Partition Manager is a separately purchased program that you should consider if your shop plans on using WPARs.

WPARs have been covered a lot lately — including here, here and here.
 
But in this entry I’ll dig deeper, and focus on how to relocate workload partitions from one machine to another. First, I installed the “IBM Workload Partition Manager for AIX” CD on my AIX 6 machine and read the README.wparmgr.txt file. This info is also available here and here.

Following the readme file, I ran these two commands:

installp -acqgYXd wparmgt.mgr
installp -acqgYXd wparmgt.db
 
Then I ran:
 
/opt/IBM/WPAR/manager/db/bin/DBInstall.sh -dbinstallerdir /db2 -dbpassword
 
and received this output:
 
DBInstall.sh:Database install started. 
DBInstall.sh:Database install successful. 
DBInstall.sh:Database instance creation started. 
DBInstall.sh:Database instance creation successful.
DBInstall.sh:Database creation started.
DBInstall.sh:Database population started.
DBInstall.sh:Database creation successful.
 
The instructions then called for me to “execute the following command with the X11 DISPLAY variable set.” So I pointed my DISPLAY to my X session and ran:
 
# /opt/IBM/WPAR/manager/bin/WPMConfig.sh
 
(Note: You can also use the console version by adding -console to the end of that command.)
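(If you’re working remotely over ssh, pointing DISPLAY at your X session looks something like this; the workstation name is just a placeholder, and your X server must allow connections from the AIX host.)

export DISPLAY=mydesktop:0.0   # mydesktop is a placeholder for the host running your X server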
 
The WPMConfig.sh command presents a GUI that’s used to configure Workload Partition Manager. I used all of the default values that it presented to me, except when entering the password.
 
Then I installed the agent code on my agent machine with:
 
installp -acqgYXd wparmgt.agent
 
After that, I ran:
 
/opt/IBM/WPAR/agent/bin/configure-agent -hostname
 
(Note: I found one gotcha that revolved around the hostname I used. When I tried host1_aix6, I received this error:

java.net.URISyntaxException: Illegal character in hostname.

The GUI kept displaying “failed registration” in the state column. But when I changed the name to host6, the state changed to “online” and it worked fine. The underscore was the illegal character, since underscores aren’t valid in hostnames. Hopefully this tip will help someone out there. You must also be sure that your host names are resolvable on your network.)
 
I was prompted for the password I’d configured earlier. I received this output:
 
Agent Registration Password:
Re-enter Agent Registration Password:
0513-059 The wparagent Subsystem has been started. Subsystem PID is 426010
 
Then I installed the agent on a second machine using the same process.
 
With that, I could point my browser to my management machine and log in by going to:
 
http://<localhost>:14080/ibm/console
 
At this point I could log in with root:password and create and relocate WPARs, change settings, look at error logs, etc.
 
Then I looked at the documentation and on my NFS server and ran:
 
crfs -v jfs2 -m /wparsfs -A yes -a size=1G -g datavg
mount /wparsfs
mknfsexp -d /wparsfs/ -r -B
exportfs

(Note: exportfs should show you the directory that you just exported using nfs.)
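(As a quick sanity check from the other machines that will host relocated WPARs, you can confirm that they can see the export; the server name here is just a placeholder.)

showmount -e mynfsserver   # should list /wparsfs as an exported directory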
 
I went ahead and used the wizard to create a WPAR, following the prompts that were presented to me. They seemed pretty self-explanatory. I made sure to select “enable relocation” when creating my WPAR so that I could test out the relocation of my WPARs from one machine to another. (WPARs can obviously be created from the command line, but I chose the wizard instead to see how it worked. You can also get the output that the GUI generates and then save that for later use in scripts, etc.)
 
By selecting the task activity and then the workload partitions tab, I could toggle back and forth between them in case I needed to do some troubleshooting. The task activity tab provided important warnings and information as I set up my WPARs to be “relocatable.”
 
Once I got it all working, I was able to create, deploy and relocate WPARs, which was the point of the exercise. I could also change their properties, all from the GUI. I could easily see which WPARs were defined, active, broken and undeployed, and on which machines. I could use the GUI to create and remove them.

Again, all of these things (save for the actual relocation–this is why you need to purchase the software) can be done from the command line, but Workload Partition Manager makes it much easier to keep track of your WPARs. Familiarize yourself with this valuable tool.

Quick Tips

Edit: The first link no longer works, although when you google for the publication you can find it on other sites. Some of the Linux tips may not work the exact same way, but the principles are the same and the link still works. The Youtube video is gone as well.

Originally posted February 11, 2008 on AIXchange

I’m passing on these tips that I picked up on a mailing list.
 
#1: The first is related to advanced system management interface (ASMI) access. There are now different default IP addresses for POWER6 machines. More information is available in the System p Operations Guide for ASMI and for Nonpartitioned Systems (SA76-0094-02).
 
For the primary service processor use:

HMC1 = 169.254.2.147
HMC2 = 169.254.3.147

For the secondary service processor use:

HMC1 = 169.254.2.146
HMC2 = 169.254.3.146

Incidentally, I covered the topic of using the ASMI to manage your machines here.
 
#2: The second tip involves new service and productivity tools for Red Hat Linux on POWER (non-blade, non-HMC-managed) systems. Check IBM’s Web site for different tabs pertaining to RHEL5, RHEL4 and RHEL3.
 
IBM lists rpms for service aids, hardware inventory, service log, error log analysis, service agent, etc. From IBM:

“The following tools are available for servers running Red Hat Linux that are not managed by an HMC and are not BladeCenter servers. Click the Tool name link for a brief description of the tool. Click the Download link to download the tool package.

“To install and use the following tools under RHEL4, ensure that compat-libstdc++-33-3.2.3-47.3.ppc.rpm is installed from the RHEL4 media. When installing the powerpc-utils and powerpc-utils-papr packages, include the “--force” option on the rpm command line. Tool packages must be installed in the order listed in the table.”

These tools should help make managing Red Hat Linux on Power a bit easier.

#3: Finally, check out this new YouTube video that demonstrates PowerVM Lx86 and Live Partition Mobility. 

If you have tools and tips that others should know about, let me know, or add a comment.

Configuring Your Machine Before it Arrives? Now That’s a Good Plan

Edit: Modified the link to go to a current SPT site.

Originally posted February 4, 2008 on AIXchange

I hope you’re keeping current with the latest version of the IBM System Planning Tool (SPT). From IBM:
 
“The SPT is a browser-based application that helps you design system configurations; it is particularly useful for designing logically partitioned systems. The SPT is integrated with the IBM Systems Workload Estimator (WLE), which enables you to plan a system based on existing performance data or based on new workloads. System plans generated by the SPT can be deployed on the system by the Hardware Management Console (HMC) and Integrated Virtualization Manager. The SPT is available to assist the user in system planning, design, validation and to provide a system validation report that reflects the user’s system requirements while not exceeding system recommendations.”
 
The latest version (as of this writing) is dated Jan. 29 and offers the following improvements. (Again, the list is from IBM.)

  • Added support to convert an HMC or Integrated Virtualization Manager (IVM) system plan to an SPT system plan.
  • Added support to allow comments and order status on expansion units.
  • Added the ability to change order status on multiple items at once.
  • Added support to record the alternate restart device for i5/OS.
  • Added support for partition profile names.
  • Added support to allow utilization of unused dedicated processors.
  • Added capability to copy systems between system plan files. Choose the “Add…” option from the Work with Planned Systems panel and select “Import from another system plan.”
  • Added support for creating Virtual Ethernet Adapters in Linux, AIX and VIOS partitions which communicate on multiple VLANs.

A final thing from IBM:

“SPT 2.0 will be the last release that will support .lvt and .xml files. Users should load their old .lvt and .xml plans and save them as .sysplan files. It is recommended that you take action prior to March 31, 2008.”
 
When you fire it up, you see this message:
 
“The SPT helps you plan the configuration of a partitioned system. You can place your hardware order based on this system plan. You can also use this system plan to automate the creation of logical partitions on the system.”
 
I had a plan that I generated from my HMC, and the first thing I saw was:
 
“System plans generated by the IVM or HMC must be converted into a format that is compatible with the SPT. This wizard steps you through the process of converting a system plan to the correct format. When you finish the wizard, the changes you make are applied and the plan is saved with a new name. You can view the original system plan as you go through the wizard.”
 
So I clicked on the Convert button and was then able to go in and edit the file that came from my HMC.

As the tool evolves and improves, I keep hearing more positive things. Being able to configure your machine before it arrives is a great idea. When you import your system plan to the HMC, you avoid configuring each partition by hand. You can get straight to loading the OS when the hardware arrives on your raised floor instead of spending time configuring the partitions. When you plan on having multiple VIO servers and multiple LPARs, this tool makes things go much more smoothly. It will warn you when you make mistakes or when your partitions aren’t set up properly.
 
Be sure to look at the system plan view from the HMC–it will show you a complete description of how your machine has been set up, along with a graphical view of the actual hardware, the slot numbers that have been assigned to each partition, etc. Download the code and start using it, and keep watching IBM’s Web site for updates and patches.

BladeCenter: More Than Intel Inside

Edit: I can’t remember the last time I messed with blades. A couple of the youtube links no longer work, the hardware links take you to generic IBM pages, entropy lives.

Originally posted February 1, 2008 on AIXchange

I was tuned in for playoff football, but when I was too slow with my remote control and my DVR, I found myself watching an IBM BladeCenter commercial. Surely you’ve caught some of the spots:

http://www.youtube.com/watch?v=bPm4IHY6vvg&NR=1

http://www.youtube.com/watch?v=zuWvBy3Ttc4&feature=related

http://www.youtube.com/watch?v=eGX1QpLIbSA&feature=related

http://www.youtube.com/watch?v=cmmiJJOyJm0&feature=related

A quick search of “IBM blades” on YouTube yields some interesting material, including commercials, demonstrations and comparisons. In many cases you’re pointed to IBM’s website, which also offers useful information, like how the BladeCenters are designed to help reduce power, cooling and cabling costs.

What caught my attention with many of these commercials is the message that the blades have Intel inside. Sure, that’s great for my Windows and Linux administrators, but why should AIX or i5/OS administrators care about BladeCenter? Two reasons: the IBM BladeCenter JS22 Express and the IBM BladeCenter JS21 Express.

The JS22 has POWER6 processors on a blade. When would you consider deploying these POWER blades? Maybe you need to refresh some older standalone machines  with newer hardware. Blades might be great for consolidating smaller machines, or they might make a terrific test lab or QA environment. With a POWER6 processor, this solution may be suitable for larger workloads as well.

Depending on the size of your shop, running some Intel servers along with your AIX/Linux servers in the same chassis could make sense. In my case, I had spare slots in an existing BladeCenter H chassis and was able to quickly and easily load a JS21 and a JS22.  Neither of them had an OS loaded, and although I could have loaded AIX 5.3 or AIX 6.1 directly onto the hardware, I chose VIO 1.5 instead. This loads the Integrated Virtualization Manager (IVM) onto the blade, allowing me to carve up my blades into LPARs using an interface similar to that of the new HMC v7. One of those blades currently runs three LPARs, the other runs four. I run VIO, Linux, AIX 5.3 and AIX 6.1 on the same blade.   

The JS21 can have two internal drives and the JS22 can have one, so to really benefit from running multiple LPARs, I strongly recommend connecting the BladeCenter to a SAN-based storage solution.

How do you begin loading an OS onto the blade? First, run the secure shell command (ssh) and log into the BladeCenter environment. Then run:

console -o -T blade[x]

where x is the blade number that you’re connecting to. From here, load the VIO server. Once the VIO server has an IP address, connect to it with a Web browser and the IVM front end comes up. Then log into the IVM and start carving up LPARs.
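Putting those steps together, a minimal session might look like this (the management module hostname and blade number are placeholders; USERID is the factory default management module account, so yours may differ):

ssh USERID@bc-mgmt-module   # log into the advanced management module
console -o -T blade[3]      # open a console to blade 3, then install the VIO server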

For more information, click here.

Creating and Using a WPAR

Edit: How many of you used or still use WPARs?

Originally posted January 21, 2008 on AIXchange

Last week I discussed workload partitions (WPARs) in AIX 6. Now let’s continue with this topic and look at how you actually create and use a WPAR.
 
With WPARs in AIX 6.1, there’s only one copy of the AIX operating system to worry about–it’s called the global instance. From this global instance, you manage your WPARs. Creating a basic WPAR is as simple as entering:
 
mkwpar -n mywpar

and waiting a few minutes.  After the wait is done, enter:
 
startwpar mywpar
 
and you have a running WPAR.   
 
As I previously noted, the IBM Redbook on Workload Partition Mobility gives much more information.   
Here you’ll learn about specification files that you can create to clone your WPARs, the differences between application WPARs and system WPARs, etc. If you set up networking (or if your hostname already existed in /etc/hosts on your machine when you created your WPAR), you can ssh or telnet into your WPAR as if it were any other machine on the network. You can also get a console login by entering:
 
clogin mywpar
 
from the global instance of AIX.
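For example, a system WPAR with an explicit network configuration might be created with something like this (the interface and addresses are made up for illustration; check the mkwpar man page for the full -N syntax):

mkwpar -n mywpar -N interface=en0 address=192.168.10.50 netmask=255.255.255.0
startwpar mywpar
clogin mywpar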
 
Again, from the Redbook:
 
“The separation of user sets (or security domains) between different system workload partitions also enables the system administrators to isolate groups of users logging on in AIX environments according to their application access control requirements. Users defined in one system WPAR are unaware of the applications executing in the global environment or in other WPARs. They cannot see the list of users or processes outside their WPAR.”
 
This means that there’s a different /etc/passwd file and a different root user for the WPAR. You can change the WPAR root password and give it to a junior administrator or database admin, or any users who think that they need root. They can do what they need to do as root, but they don’t affect the AIX global instance. If they break something, they only hurt themselves, not anyone else on the system.

Perhaps, for example, an application runs better when managed using root. Instead of setting up sudo, or a role-based access control (RBAC), just give the user the root password to the WPAR. Think of a chroot jail, or any other virtual environment you’re used to.
 
You cannot see any disks in a WPAR. It lives in a bunch of filesystems in the global instance:
 
/dev/fslv03       262144    208144   21%      1710     7% /wpars/mywpar
/dev/fslv04       131072    128312     3%           5     1% /wpars/mywpar/home
/opt                  262144     54144    80%      2103    26% /wpars/mywpar/opt
/proc                  –         –    –          –     –  /wpars/mywpar/proc
/dev/fslv05       262144    256856    3%           10     1% /wpars/mywpar/tmp
/usr                 3276800   113072   97%     33643    68% /wpars/mywpar/usr
/dev/fslv06       262144   236008   10%         365     2% /wpars/mywpar/var
 
There are flags to encapsulate the whole WPAR into one filesystem. If you set up 10 WPARs on a machine, the /etc/filesystems and df output in your global instance can get pretty ugly pretty quickly.
 
It is spooky the first time you run lspv and lsvg in a WPAR and get nothing in return.
 
# lspv
# lsvg
0516-318 lsvg: No volume groups found.
 
Be sure to read about the -@ flags that many commands use now. If I’m in my global instance and I want to see the processes running in my WPAR, I can enter:
 
ps -ef -@ mywpar
 
   WPAR      UID    PID    PPID   C    STIME    TTY  TIME CMD
mywpar     root 278754 385194   0    Dec 07      –  0:00 /usr/sbin/syslogd
mywpar     root 315502 385194   0    Dec 07      –  0:00 /usr/sbin/qdaemon
mywpar     root 319598 385194   0    Dec 07      –  0:00 /usr/sbin/sshd
mywpar     root 344148 385194   0    Dec 07      –  0:00 /usr/sbin/writesrv
mywpar     root 348376 385194   0    Dec 07      –  0:00 /usr/sbin/rsct/bin/IBM
mywpar     root 364548 385194   0    Dec 07      –  0:01 /usr/sbin/rsct/bin/rmc
mywpar     root 385194 413910   0    Dec 07      –  0:00 /usr/sbin/srcmstr
mywpar     root 409814 413910   0    Dec 07      –  0:00 /usr/local/bin/aixagen
mywpar     root 413910 200850   0    Dec 07      –  0:00 /etc/init
mywpar     root 426046 413910   0    Dec 07      –  0:00 /usr/lib/errdemon
mywpar     root 430208 413910   0    Dec 07      –  0:00 /usr/sbin/cron
mywpar     root 438510 385194   0    Dec 07      –  0:00 /usr/sbin/rpc.lockd -d
mywpar     root 442490 385194   0    Dec 07      –  0:00 /usr/sbin/portmap
mywpar     root 446646 385194   0    Dec 07      –  0:00 /usr/sbin/inetd
mywpar     root 458986 385194   0    Dec 07      –  0:00 /usr/sbin/biod 6
mywpar     root 463090 385194   0    Dec 07      –  0:04 sendmail: accepting co
mywpar     root 557080 385194   0    Dec 07      –  0:06 /usr/sbin/rsct/bin/IBM
mywpar     root 561182 385194   0    Dec 07      –  0:00 /usr/sbin/rsct/bin/IBM
 
and only see the processes that belong to that WPAR.   
 
This command

topas -@ mywpar
 
also shows interesting output, as there are no disk stats to report.

So read the Redbook, load AIX 6 on a test box and see what else you can do with WPARs. Breathe new life into that old hardware. Yes, POWER6 and APV certainly have their place, but AIX 6.1 gives us new options in the way we manage our environments.

WPAR Mobility has its Benefits

Edit: I have not done much with this lately but it is always fun to look back at what we were able to do with the technology as it evolved.

Originally posted January 14, 2008 on AIXchange

In this post, I discussed a trip to Austin where I had my first chance to look at Live Partition Mobility. You can move an actual running workload from one physical machine to another, and nobody can tell that you’ve made this change–it happens on the fly.

While I was in Austin, there was also some discussion of Live Application Mobility using Workload Partition (WPAR) Manager in AIX 6.1. At the time, I was far more impressed with Live Partition Mobility, since users would experience an interruption with Live Application Mobility. Sticking with Live Partition Mobility and POWER6 seemed like a no-brainer.

To use Workload Partition Mobility, you had to checkpoint and stop your WPAR; it would then restart on the machine that you moved it to. Although you’d keep all of the transactions and data that were “in flight” at the time of the move, there would still be a period of time when the application was unresponsive. At first glance, this seemed unacceptable. However, now that I’ve had some time to rethink my position, I can see the benefits of each approach.

Here’s an excerpt from an IBM Redbook on Workload Partition Mobility:

“In 2007, IBM System p6 and AIX V6 have two features that seem similar, but are different: WPAR mobility and live partition mobility:

“WPAR mobility, which is discussed in this book, is a feature of AIX V6 and WPAR Manager. It is available on POWER4, POWER5 and POWER6 systems.

“Live partition mobility relies on the POWER6 hardware and hypervisor technology (Advance Power Virtualization). It is available on POWER6 systems only. This feature is also available to AIX 5.3 LPARs.”

If you have older POWER4 hardware that you want to use micropartitions with, you’re out of luck–Advanced Power Virtualization (APV) isn’t supported. But if you didn’t pay for APV with POWER6 or POWER5 hardware, or if you have the older POWER4 hardware, you can try to simulate micropartitions with WPARs. There are tradeoffs–for instance, you won’t get the full benefit of APV using a WPAR–but you can still do some workload consolidation, assuming it makes sense for your environment.

By loading AIX 6.1 on POWER4 or POWER5 machines, you’ll find a whole new way to manage these systems using WPARs. When you set up WPARs, they can dynamically change their CPU and memory usage on the fly. You can create limits so that they can only consume some percentage or share of the system. You can also set up automatic movement of WPARs between machines, so if Machine A is getting bogged down but more resources are available on Machine B, you can either manually or automatically move those workloads.

As with Live Partition Mobility, if you need to do hardware maintenance, you can move workloads in your WPARs to other machines, and then power down the departure system to work on it. Once that maintenance is completed, you can return the workload to the original machine.

Again, there are limitations. As of this writing you can only use NFS to move your workloads between machines. You can’t move a WPAR from a POWER6 machine down to a POWER4, but you can certainly move WPARs between machines from the same CPU family.

I’ll probably spend a little more time on this topic next week, so be sure to check back then.

More on Virtual Optical Devices

Edit: This is still something I use all the time. An oldie but a goodie.

Originally posted January 8, 2008 on AIXchange

The more I use virtual optical devices with the IBM Virtual I/O Server (VIO server) and AIX, the more I like them. I wrote about virtual optical devices with the Integrated Virtualization Manager (IVM) in this post.

After getting an optical media library working with IVM, I wanted to try it on my VIO server using the HMC. I couldn’t find anything in the HMC GUI. I was going to poke around the command line on my own, but as luck would have it, someone forwarded a presentation with the information I was looking for. Now I’ll share this information with you.

You can see the virtual optical commands that are available in your VIO server by running:

help | more

Virtual Media Commands

chrep
chvopt
loadopt
lsrep
lsvopt
mkrep
mkvopt
rmrep
rmvopt
unloadopt

You can then run help <command>, where <command> is the name of the command you’re seeking information about.

First I log into my vio server as padmin. I run:

mkrep -sp datavg -size 16G

Virtual Media Repository Created
Repository created within “VMLibrary_LV” logical volume

This basically creates my optical library logical volume, as you can see:

$ oem_setup_env
# df
Filesystem          512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4                524288    458656   13%     2666     5% /
/dev/hd2               7340032    612624   92%    61350    46% /usr
/dev/hd9var            1310720   1223608    7%      447     1% /var
/dev/hd3               4718592   4302464    9%       48     1% /tmp
/dev/hd1              20971520  17088808   19%       60     1% /home
/proc                        -         -    -         -     -  /proc
/dev/hd10opt           1572864    647912   59%    10655    13% /opt
/dev/VMLibrary_LV     33554432  33417600    1%        4     1% /var/vio/VMLibrary

Then I copy my .iso files (which can be created from CD if you don’t have them) to /var/vio/VMLibrary. If you don’t have .iso images, you can insert a CD (after assigning the CD-ROM to your VIO partition) and run:

mkvopt -name <filename>.iso -dev cd0 -ro

where <filename> is what you want to call the file.

After the .iso file is in your /var/vio/VMLibrary directory, run:

mkvdev -fbo -vadapter vhost4
vtopt0 Available

(Obviously you’ll replace vhost4 with whatever vhost adapter you plan to use in your VIO server.)

This mkvdev command creates your virtual optical device. Now run loadopt and it loads your CD image as if it were located in the CD device. This was a great solution for the situation I faced, where I wanted to load AIX 6 even though my Network Installation Management (NIM) server hadn’t been updated to AIX 6.

I ran:

loadopt -vtd vtopt0 -disk cd.AIX_6_OpenBeta.0737.V1.ISO

After the loadopt command, I could run lsmap on the vadapter I’d assigned to my vtopt device earlier in the mkvdev command.

lsmap -vadapter vhost4

SVSA            Physloc                                      Client Partition ID
————— ——————————————– ——————
vhost4          U9131.52A.0649DDG-V5-C7                      0x00000000

VTD                   vtopt0
Status                Available
LUN                   0x8100000000000000
Backing device        /var/vio/VMLibrary/cd.AIX_6_OpenBeta.0737.V1.ISO
Physloc

From here I just booted my partition from this virtual CD and loaded the OS onto it.

When the first CD finished, I was prompted to remove CD1, insert CD2 and click Enter. So I went back to my vio server and ran:

unloadopt -vtd vtopt0

then I ran:

loadopt -vtd vtopt0 -disk cd.AIX_6_OpenBeta.0737.V2.ISO

and selected Enter in my terminal window. The install continued as if I’d moved physical media around.

I obviously did the same when prompted to load CD3.
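If you lose track of which image is loaded where, lsvopt (one of the virtual media commands listed earlier) is a quick way to check; run it as padmin:

lsvopt   # lists each virtual target device and the media file currently loaded, if any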

As IVM runs on the VIO server, there’s no reason you can’t use these same commands from the command line on either your IVM- or HMC-managed machines.

User Groups: Still Going, and Still Worth Your Time

Edit: I still advocate for finding and attending user group meetings, both virtually and in person. The links will redirect to new sites but they no longer appear to take you where they used to.

Originally posted December 17, 2007 on AIXchange

Have you seen the poweraix.org user group listing lately? There are around 30 groups by my count. Some of these groups look more active than others, but it still wouldn’t hurt to check into the status of a group in your area. Perhaps your inquiry might spur a group back into action. Or, if you don’t find a group in your area, maybe you should take the initiative to start one.

Over the years I’ve frequently attended Linux and AIX user group meetings, and I would argue that you’d benefit from doing the same. Although we’re surrounded by the talented people that we work with, it’s always good to meet and network with others in our field. Whether you’re new to AIX and seeking a mentor or you’re an experienced administrator looking to meet others, these meetings can be a great place for you.

If you can’t attend meetings, either due to a lack of time or the absence of a group in your area, you can still join virtual user groups and sign up for their teleconferences and webinars. They bring in various guest speakers just like traditional user groups–and perhaps an hour-long conference call fits more easily into your schedule.

User group mailing lists can be another great resource. Groups that may not regularly schedule formal meetings may still have active lists, and the informal question and answers that can come from the mailing list can be very helpful.

Still, when possible, take the time to clear your calendar and travel to a user group meeting. I believe the benefits outweigh the inconveniences. As noted, there’s the benefit of networking. By getting to know the other administrators in your area, you can find other local companies that run the System p and AIX platform. You never know when you may be able to find good talent and convince them to come work for your organization, or when you might hear of a good opportunity that makes sense for you.

You could even win something. Many groups have giveaways and raffles. One group I was part of got publishers to give away books. In exchange, they’d ask each person who won a book raffle to author a report on it. Anyone who submitted a book report would then get first choice on the books that were available at the next meeting.

Although much of this is North American-centric, this IBM Web site references the “Guide Share Europe pSeries Working Group [which] is a formally organized group whose membership is bound to an annual fee.”

The more people who learn about the benefits of Power Architecture and AIX, the better. User groups offer an excellent option for you to get involved and help spread the word in your area.

See the Difference in AIX 6.1

Edit: A blast from the past. I wonder how many customers still run AIX 6.1?

Originally posted December 11, 2007 on AIXchange

The IBM AIX Version 6.1 Differences Guide has been released, and I suggest you take the time to read it. I’ll run down some highlights, chapter by chapter.

Chapter 2–Information here includes things like turning off jfs2 logging to increase performance (page 34), taking a jfs2 internal filesystem snapshot (page 34-35) and turning on encrypted filesystems (page 38).   

Chapter 3–The focus is workload partitions. Also discussed are updates and changes that have resulted in different performance tools to account for workload partitions. On page 158 it’s noted that the default size of the argument area on the command line has been changed. In older versions of AIX, if you tried to do a rm * in a directory with too many files, you’d get an error. You could either manually find smaller lists of files to give to the rm command, or run a find with xargs and do your rm that way.
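For anyone who never hit this, the classic workaround looked something like the following (the directory and pattern are just examples; note that unlike rm *, find descends into subdirectories, so constrain it accordingly):

cd /some/huge/directory
find . -name '*.tmp' -print | xargs rm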

Page 161 illustrates how you can limit the number of threads per process and the number of processes per user. Included is an example of a developer writing code that would bog down the whole machine. But now you can keep developers from bringing your machine to its knees.

You will see new entries when you run ulimit -a:

time(seconds)        unlimited
file(blocks)         2097151
data(kbytes)         131072
stack(kbytes)        32768
memory(kbytes)       32768
coredump(blocks)     2097151
nofiles(descriptors) 2000
threads(per process) unlimited
processes(per user)  unlimited

Page 169 cites the IBM Systems Director Console for AIX, a default feature of AIX 6.1. If you go to https://localhost:5336/ibm/console, you should get a login screen. Log in with the root user and password and see what it has to offer.

Page 178 contains info on the Distributed Command Execution Manager (DCEM), which allows you to run the same command on multiple machines. When I was at IBM, I used a similar tool that saved me the hassle of logging into 100 different machines. I just logged into the master server and issued the command, and it would run on all of the machines and return the result to my master machine. As I said, DCEM seems very similar to this concept.

Page 202 talks about restricted tunables. IBM is suggesting that system administrators shouldn’t modify these tunables unless instructed to do so by support. Because they’re not supposed to be modified, they’re not displayed unless you use the -F flag. You’ll also get a warning message if you change one of the restricted tunables. This action will also cause a notification to go to the error log.
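Using vmo as an example (the same -F behavior applies to the other tuning commands, such as ioo and schedo):

vmo -a      # lists the regular tunables only
vmo -F -a   # -F also displays the restricted tunables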

Page 215 goes into detail about the performance settings that come with AIX 6.1 out of the box. This is a change from the old behavior–you no longer must go in right away and tune minperm and maxperm and enable aio servers for database machines. These settings are all now set up correctly by default. On Page 217 it states that AIX 6 will enable I/O pacing on new installs.

I also recommend that you read about all of the new security enhancements. These are found in Chapter 8 starting on page 253. Look for things like weak root passwords and how to install your machine secure by default. I know the first thing we’ve always done after a fresh install is to go in and disable the unneeded services from /etc/inetd.conf, /etc/inittab, etc. Now the OS is installed with minimal services, allowing you to activate only the additional features that you actually need.

There’s plenty more, but hopefully this has convinced you to download the guide yourself. Much is improved with AIX 6.

Saving Loads with VIOS

Edit: It is much easier now, but this was the beginning of my journey into loading virtual media with VIOS.

Originally posted December 3, 2007 on AIXchange

I had a problem, and it was driving me crazy. I had a test box located hundreds of miles away. I had no time to drive on-site to physically load media to get an operating system installed on this test machine, but I did have a desire to get the openbeta of AIX 6 loaded on it. 

I had the AIX 6 openbeta .iso image downloaded to the machine (I had downloaded the beta before AIX 6 went GA on Nov. 9), and I wanted to load it on this test box. First I thought this would be easy: I’d use my NIM server, load the mksysb file from the openbeta .iso image into it, and install the OS onto the new machine using NIM. However, since my NIM server was running AIX 5.3, I couldn’t use it to serve out this newer version of AIX. NIM can only serve out AIX at the same level as the NIM server or older; it can’t serve out newer versions. How would I get this machine loaded without using physical media and driving to the datacenter?

The update to VIOS, version 1.5.1, contains my answer. (Download or order the CDs here.)

I used IVM for my example below. (I haven’t seen the same functionality on machines using a hardware-management console (HMC) at this time, and I haven’t had a chance to try this using the command line.) After I got VIOS 1.5.1 running, I clicked on View/Modify Virtual Storage as you can see about halfway down this screen shot.

From here I went to optical devices: 

And clicked on create library.

I chose 5 GB for this media library’s size. Under the covers, this just adds a logical volume to rootvg on your VIO server; in my case I ended up with:

/dev/VMLibrary_LV   10485760  10442544    1%        4     1% /var/vio/VMLibrary

Looking at the View/Modify Virtual Storage page once again, you’ll see that you now have a media library and can perform different operations on it.

Once you’ve created your media library, you can use the Add Media button, which gives you many options from which to choose.

In my case, I just copied my .iso image from my /home/padmin directory to my /var/vio/VMLibrary directory from the command line after logging in as padmin and running oem_setup_env. 

After copying the file, I added my media to a virtual optical device, which I then assigned to a partition. The screen shot below shows how the partition’s properties looked once I had selected and created the virtual optical device I was going to use.

Now all that was left was to boot my partition and in SMS use this virtual CD device as my boot device.

It booted from this .iso image as if the DVD were mounted locally in the machine. 

From the Virtual I/O Server Version 1.5 Release Notes:

“The Virtual optical support is extended to support the loading and unloading of optical media files to a virtual optical device in the partition. Read-only support is provided to enable multiple partitions to access the same ISO image simultaneously. Read-write support is provided to allow partitions to treat the device as a DVD-RAM drive. In addition to the optical support, files can now be used for virtual disk in addition to physical and logical volumes.”

Using this technique I loaded a machine as if it were using the local cdrom device. After testing that this would work with a DVD .iso image, I added the three CD .iso images to my media library.

Then I assigned CD 1 to my partition. I booted from it, and when it asked me to change the CD and hit “Enter,” I went in and changed the CD by going into the Partition Properties>Optical Devices and clicking the Modify button. Once you do this, you can select the next CD. When you go back to the console and hit “Enter,” it will read the next CD and continue with the install.

As I test this feature, I’m sure I’ll find more interesting things I can do with it, but for now this was exactly the solution I needed.

Taking a Look at lparmon

Edit: The alphaworks site is no longer live, although it does take you to an IBM site: https://developer.ibm.com/community/ I fondly remember lparmon, maybe it is time to bring it back.

Originally posted November 26, 2007 on AIXchange

In general, I find that customers and management more easily comprehend their system utilization when the data is displayed graphically rather than as text output. This is especially true when they want to see the overall performance and utilization of a machine that’s been carved into multiple LPARs.

I find it especially beneficial to show customers historical performance information plotted in a graphical format. The trends, spikes and utilization can be easier to identify when viewed graphically.

The Austin Executive Briefing Center has created a real-time graphical tool for use with System p.

IBM’s alphaWorks Web site has a description of the lparmon tool:

“LPAR Monitor for System p5 servers is a graphical logical partition (LPAR) monitoring tool that can be used to monitor the state of one or more LPARs on a System p5 server. LPAR state information that can be monitored with this tool includes LPAR entitlement/tracking, simultaneous multithreading (SMT) state, and processor and memory use. The LPARs to be monitored can be running any mixture of AIX 5.3, AIX 5.2, or the Linux operating systems.

“Included in Version 2.0 are several visual and functional enhancements, including the ability to group LPARs into user defined categories, set alert thresholds with associated alert actions, and monitor and record into a file processor and memory use data over time.

“The graphical LPAR tool consists of two components. First, there are small agents that run in AIX 5.3, AIX 5.2, or Linux LPARs on the p5 server. These agents gather various LPAR information through several operating-system commands and API calls. The agents then pass this information via a connected socket to the second component, which is the monitor’s graphical user interface. This graphical user interface is a Java application and it is used as a collection point for the server and LPAR status information, which it then displays in a graphical format for the user.”

Naturally, it’s easier to understand lparmon when you see it for yourself. Compare this output which was collected with lparmon:

To this output that I captured in a text window using topas:

I could have opened up xterm windows to all three of the LPARs that I was monitoring with lparmon. I could have run topas in each of them. But when you’re trying to show this information to people who might not be familiar with vmstat, topas or nmon output, it’s helpful to simply point them to lparmon’s easy-to-understand graphical dials rather than educate them on where they should be looking at the output. The lparmon output above is pretty clear in showing that host fifty2a is using most of the resources compared to the other LPARs.

Compare that to this third image when I put a load on my vio server:

Again, it’s easy to see what’s going on with the machine: vios1 is using available CPU capacity from the shared pool, above what it’s entitled to, as there are resources available for it to borrow from. When lparmon output is being shown live while making changes to a running machine, it can very clearly demonstrate the advantages of virtualizing your machine. And when setting up LPARs, you can see how different choices will impact your environment.

Furthermore, lparmon can be useful when monitoring production machines. You can see if LPARs have been set up with the correct number of virtual processors and if they’re using the shared CPU pool the way that you expect them to. I’ve seen instances where people thought they were using the shared pool, but hadn’t added enough virtual processors to allow the LPAR to do so. Thanks to lparmon, you can quickly see if your machine is overloaded, if you need to rebalance resources, etc.

So take the time to set it up and see how your machines are running. It’s worth adding lparmon to your collection of useful tools.

The Benefits of mksysb Migration

Edit: I still use mksysb migrations, but not with 4.3.

Originally posted November 19, 2007 on AIXchange

During last month’s IBM System p, AIX and Linux Technical University in San Antonio, I listened to a presentation on Advanced Network Installation Manager (NIM).

One topic introduced by the presenter, IBM’s Steve Knudson, has really stuck with me. It’s called a mksysb migration, and you use NIM to implement it.   

I still have customers that are running AIX 4.3. Some of your customers are, too. When the day finally comes to upgrade them, how will you do it? Will you attempt to upgrade the OS on old hardware that might not support the running of the upgraded OS? Will you just pray that your customers simply retire whatever old applications they’re running on their old hardware and old OSs to spare you from performing any migrations?

I have a better solution: Do what Steve suggests. According to his notes from the conference, the mksysb migration allows you to restore an old non-supported mksysb on POWER5 or POWER6 hardware. Once the mksysb is copied to the new hardware, it’s immediately upgraded.

This process involves booting the client machine from the NIM master server with the AIX 5.3 SPOT. The NIM server restores the backlevel mksysb, then immediately migrates it to AIX 5.3.

See the IBM Redbooks publication, “NIM From A to Z in AIX 5L” pages 205-216, for details.

Here’s an excerpt:

“Given that AIX 4.3 is not supported on the POWER5 platform, in the days before ‘mksysb migrations,’ the only course of action would have been to upgrade the AIX 4.3.3 system to AIX 5L V5.3 on the existing hardware (for example, the 6H1) and then clone the system via a mksysb to the new POWER5 LPAR. This process is now simplified with the mksysb migration.

“A mksysb migration allows you to move a lower level AIX system (for example, AIX 4.3.3 or AIX 5L V5.1) to POWER5 without upgrading AIX on the existing server first. Essentially you boot AIX 5L V5.3 media (in our case, we use NIM) and recover the AIX 4.3.3 or AIX 5L V5.1 mksysb image, followed by an immediate migration to AIX 5L V5.3. This was not possible with previous versions of AIX and pSeries hardware. A mksysb migration is now the recommended way of moving unsupported hardware configurations of AIX 4.3 and AIX 5L V5.1 to new supported hardware and AIX 5L V5.3.

“The end result is that the new system will be a clone of the existing system, but will be running AIX 5L V5.3 on a POWER5 platform. The existing system remains the same (for example, running AIX 4.3). You may choose to use this method to perform a test migration of a system and certify the applications, databases, or code against AIX 5L V5.3 on the clone before executing the real mksysb migration at some later stage.”
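To give a feel for what this looks like on the NIM master, here is a hedged sketch of the bos_inst operation; the resource names are hypothetical, and the exact attributes (including the bosinst_data settings that drive the migration) are spelled out in the Redbook pages referenced above:

nim -o bos_inst -a source=mksysb \
    -a mksysb=aix433_mksysb \
    -a spot=spot_53 \
    -a lpp_source=lpp_53 \
    -a bosinst_data=migrate_bosinst \
    -a accept_licenses=yes \
    aix433_client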

I’m going to try this out myself. Once I run a few tests I’ll let you know what I find.

Loading a Console Window Directly on an AIX Desktop

Edit: I still like vnc, I imagine the instructions and links may be a little different now. I do not use KDE or Firefox on AIX anymore.

Originally posted November 12, 2007 on AIXchange

I’ve been using HMC version 7 for a while now. Recently I loaded a console window directly on my AIX desktop. I like having a desktop session running on AIX or Linux. I can close it down, go to another location, fire it back up and pick up from where I left off. I explained this in a previous article.

I was running VNC. (You can download the rpm here or load it from your AIX Toolbox for Linux Applications CD.)

Once I loaded VNC and ran vncserver, I decided to run the KDE desktop rather than the default tab window manager (twm) desktop. From my Toolbox CD, I went to the ezinstall/ppc directory and loaded the kde3.all rpms.

/aixcd/toolbox/ezinstall/ppc # ls
app-dev crypto.base gnome.apps kde3.all kde3.opt
base desktop.base gnome.base kde3.base

After installing and running startkde, I had a desktop that I liked.
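For anyone setting this up, the sequence was roughly the following; the geometry and the startkde path are assumptions (the Toolbox packages typically install under /opt/freeware), so adjust them for your install:

vncserver :1 -geometry 1280x1024 -depth 24   # first run creates ~/.vnc/xstartup
vi ~/.vnc/xstartup                           # replace the twm line with: /opt/freeware/bin/startkde &
vncserver -kill :1                           # restart the session to pick up the change
vncserver :1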

Then I decided to load Mozilla Firefox 1.5.0.6 for AIX from the CD. It installed fine, and I could bring up my HMC login screen with no problem. However, when I tried to open a console window on one of my partitions, I was warned that I needed to install the appropriate plug-in to handle Java. I clicked to download the appropriate plug-in, but Firefox had no idea what to do, and the documentation on Sun’s Web site wasn’t helpful. Fortunately, the documentation that was included with Firefox for AIX was.

The pertinent information for getting your plugin working with Firefox on AIX is available at /usr/mozilla/firefox. Read the README or README.HTML files.

Using the Java Plug-In

The AIX Java Plug-in for Firefox for AIX is included in Java 5 or later. This version of Java runs on AIX 5L and chrp system architecture only. Run bootinfo -p to find out a system’s architecture.

Downloading the Java Runtime Environment — Download the Java installp images.

1. Open  http://www.ibm.com/developerworks/java/jdk/
2. Select AIX–Downloads from Java 2 Platform, Standard Edition (J2SE)
3. Select Java 5 64-bit and sign in.
4. You need these files: Java 5 and Java5_64.sdk.tar.gz.

Installing the Java plug-in–These installp filesets must be installed: Java 5 and Java5_64.sdk. Use SMIT, WebSM, or installp to install the filesets.

Configure the Java plug-in–For Java 5, the Java plug-in file is /usr/java5_64/jre/bin/libjavaplugin_oji.so. If it doesn’t already exist, create this link:

ln -s /usr/java5_64/jre/bin/libjavaplugin_oji.so \
/usr/mozilla/firefox/plugins/libjavaplugin_oji.so

As an alternative, the plug-in can be linked into the user’s .mozilla directory:

ln -s /usr/java5_64/jre/bin/libjavaplugin_oji.so \
$HOME/.mozilla/plugins/libjavaplugin_oji.so

Note: Only one version of the Java plug-in can be used at a given time.

Verifying the Java plug-in–In Firefox for AIX, entering “about:plugins” in the address bar should show the Java plug-in information.

With Firefox running inside my VNC session on AIX, I was able to connect to my HMC, start my console window and start running some NIM installs. In the middle of the installs, I could disconnect from VNC, then reconnect later to check on the progress.

In the past I had a similar setup running on Linux, but this configuration running on AIX suits my current needs.

Working the Network

Edit: I do not think I have run Firefox on AIX in a while, but I still do most things remotely. Mounting .iso images is more straightforward these days too.

Originally posted November 5, 2007 on AIXchange

I found myself hundreds of miles away from a new environment that needed to be set up. The machines were physically cabled to the network and everything was powered on. I was able to reach the HMC and from there I could power on LPARs, open console windows, and configure partitions. My options were to either find someone on site to be my hands and eyes and physically load media for me, or use the network myself. I chose to download the necessary AIX media onto my Network Installation Management (NIM) server so that I could configure my new lppsource.

Much of what follows assumes that you already know how to set up a NIM server and have ample free disk space. It’s also assumed that you have Entitled Software Support, a decent network connection to the Internet, etc.

First I went to the IBM Entitled Software Support page and downloaded my AIX CD images.

I ran the Firefox Web browser on AIX in a VNC session on my server, and from that connection I downloaded my images directly onto the machine I was working on. I had local copies of the CDs, but the network pipe from my location wouldn’t support copying them directly from my machine to the target machine. Downloading them directly to my NIM server was a better use of the network in this case.

Once I downloaded the images, I needed to mount the CDs so that I could run the smitty bffcreate command.

On Linux, I can simply run:

mount -o loop -t iso9660 filename.iso /mnt/iso

This mounts my CD image on my filesystem. On AIX, mounting an .iso image is a little more involved. First I created my logical volume, in this case:

/usr/sbin/mklv -y’testlv’ datavg 6

Then I ran the dd command in order to write the contents of the .iso file to my logical volume:

dd if=/aixcd1.iso of=/dev/testlv bs=10M

Then I mounted my .iso image with:

mount -v cdrfs -o ro /dev/testlv /mnt/iso

At this point the CD was mounted, and I could run smitty bffcreate.

Once I’d used bffcreate to get my images into the correct directory, I could create my lppsource and move forward with my NIM procedures.

Depending on disk space issues, you may find that you need to remove .iso images as you download them. In any event, this procedure saved me from taking physical media onsite and allowed me to keep the project moving forward from my remote location.
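Once bffcreate has copied the filesets into the lppsource directory, the mounted image and its scratch logical volume can be cleaned up to reclaim space (the names below match the example above):

umount /mnt/iso   # unmount the cdrfs filesystem
rmlv -f testlv    # remove the scratch logical volume
rm /aixcd1.iso    # delete the downloaded image if space is tight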

Handling HMC Login Failures

Edit: This is still something that you might run into. Surprised that the link still works.

Originally published October 29, 2007 on AIXchange

I went to a customer site to look at a machine that wasn’t showing up on the hardware-management console (HMC). The machine’s HMC port was connected to the same network switch as the HMC, so I powered it up. I logged onto the HMC, and under the status column in my HMC view, it said that authentication for this new machine had failed because there were too many failed attempts to log in to the service processor.

How do you get the machine connected to the HMC if the HMC is unable to authenticate with the machine? Some people suggested that we just remove the NVRAM battery from the machine and that the passwords would go back to defaults. I was hoping that this was not the case, as that seemed like a pretty trivial way to bypass the security that the password provided. After trying this without success, we called support, and they provided the celogin password for the day so that I could log into the ASMI.

IBM Support can take the serial number of the machine, match it with the current date, and based on that provide a password that’s good for the day. This is very similar to calling support to get the root password for the HMC.

The next issue was the fact that pulling the NVRAM battery had set the date to 1/1/2003. The password we were trying to use was for the current date. Once I logged into ASMI as admin, I was able to set the machine to the correct date. With the celogin password I was able to log in and reset the unknown HMC password. From: http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/iphby/browser.htm&tocNode=int_130339

There are several authority levels for accessing the service-processor menus using the ASMI. The following levels of access are supported:

  • General user–The menu options presented to the general user are a subset of the options available to the administrator and authorized service provider. Users with general authority can view settings in the ASMI menus. The login ID is “general” and the default password is “general.”
  • Administrator–The menu options presented to the administrator are a subset of those available to the authorized service provider. Users with administrator authority can write to persistent storage, and view and change settings that affect the server’s behavior. The first time a user logs into the ASMI after the server is installed, a new password must be selected. The login ID is “admin” and the default password is “admin.”
  • Authorized service provider–This login gives the authorized service provider access to all of the functions that could be used to gather additional debug information from a failing system, such as viewing persistent storage, and clearing all deconfiguration errors. The login ID is “celogin.” The password is dynamically generated and must be obtained by calling IBM technical support.

Be sure that when you change the ASMI and HMC passwords, you document the change just like you would any other password. Also be sure to keep your machines under warranty in case you find yourself in a situation where you need to call IBM support, although I imagine they would be willing to provide the service for a fee.

IBM Revamping Fix Central

Edit: I just left the links alone on this one. They actually still resolve and the .pdf is still there.

Originally posted October 22, 2007 on AIXchange

Did you see that IBM announced it’s enhancing the Web-based fix download facility to support its new AIX service strategy? According to the company, simplified Web pages with enhanced search capabilities and more detailed package information will be provided in the next month or so.

I want to highlight the following from IBM’s Web page:

“IBM provides many documents that recommend service and support strategies for IBM Systems and software. These ‘best practices’ documents describe system planning and support procedures to improve system administration operations. The best practices documents referenced on this page provide strategies for IBM System p servers and firmware, the AIX operating system and related products, such as the Hardware Management Console and cluster software such as HACMP.”

While you’re there, be sure to read this PDF regarding  IBM’s new service strategy:

“The new service strategy encourages clients to apply complete Service Packs. Individual updates can still be applied; however, having a maintenance policy of applying the complete package improves manageability by reducing complexity and providing more consistency.

“Clients also benefit from knowing that IBM has regression-tested each Service Pack as a unit. Installing the entire Service Pack reduces the possibility of product regression and increases serviceability by enabling IBM support to identify problems quicker.

“One of the first changes you’ll notice is that IBM is now promoting the full distribution of fix packs. Downloading packages for specific problems is being discontinued. A fix to a specific problem will be available in one or more fix packs. A fix pack can be either a Service Pack or a Technology Level package.”

The PDF includes some screen shots of the upcoming changes to the Fix Central Web site. Also from the PDF:
 
“As you select package lists, this aide will remain visual to help you make decisions on whether to stay at your current Technology Level or upgrade to a newer one. There are many reasons to upgrade to the latest TL. End of service life is one of them.
 
“Search is being enhanced with full indexed searching and improved sorting options. Searching by symptoms or error messages is easier and more complete. All specific fixes are associated with fix packs that are either Service Packs or Technology Level packages.”
 
I recommend you take a few minutes to get all the information about the coming changes to Fix Central. These enhancements should allow us to obtain greater stability from the machines we manage because we’ll only be managing machines to the latest technology level and service pack. Of course we’ll need to check for the current technology and service-pack levels, and then decide which schedule for implementation makes the most sense in our environments.

Parting Thoughts on This Year’s Technical University

Edit: I still love the IBM Technical University. I am pretty sure the links at the end no longer work. I edited the first link.

Originally posted October 15, 2007 on AIXchange

A few weeks ago I mentioned the IBM System p, AIX and Linux Technical University held Oct. 1-5 in San Antonio, and recommended that you make every effort to attend. I did, and what follows are some of the reasons that I’m glad I went:

Sure, none of this had anything to do with AIX or Linux, but seeing these sites was well worth the time.

It’s easy to conclude that there’s no benefit to sending employees to these conferences for a week, that they’ll spend all their time as tourists rather than get the training they need. However, nothing could be further from the truth.

Each day at the Technical University featured about 55 presentations, which were held in 11 different rooms throughout the San Antonio Convention Center. Some were repeats, so if two sessions you wanted to see were running at the same time, chances are at least one of them was repeated on a different day. So with a little planning, you could attend every session that would benefit you.

The 75-minute sessions ranged from lectures to discussions with AIX developers to hands-on labs and onsite certification testing. Presenters’ slides were available for download. This is valuable of course, but there’s still much to be gained from sitting with your peers and listening to the instructors elaborate on what they meant when they wrote their presentations.

Training tracks covered AIX, HACMP, storage, virtualization and other topics. Following the color codes made it easy to determine which presentations fit your interests. On Monday there were keynote sessions covering System p trends and directions, followed by an AIX trends and directions session. A typical day started with breakfast, followed by sessions at 8 a.m., 9:45 a.m., 11:15 a.m., 1:45 p.m. and 3:30 p.m. Daily printouts listed any last-minute itinerary changes.

Overall I found the content to be very worthwhile. Besides the content, it’s always nice to hear colleagues report issues that they’re seeing in their home environments and solutions they’d found. In general, it’s just good to spend time with others who work on the same systems that I do.

European readers may be interested in the System p, AIX and Linux Technical University scheduled for Nov. 5-8 in The Netherlands. Otherwise, there’s always next year’s Technical University. I’ll see you there.

Modernization Decision Is Multifaceted

Edit: F50 servers? B80? 6M1? Now THAT is a long time ago. I love that I am advocating for running AIX 6.1 on POWER6 hardware too.

Originally posted October 8, 2007 on AIXchange

I recently did some services work at a few customer sites. One customer was running an F50 with an unsupported version of AIX. Another was running a B80 and a 6M1 with AIX 5.1. In the first scenario, we needed to make some minor adjustments to the F50, and in the second case, we needed to upgrade the operating system. In both cases, upgrading to newer hardware came up in conversation. We discussed the virtues of virtualization, the benefits of faster processors, and the reliability, availability and serviceability (RAS) features of newer systems. Ultimately though, for their particular environments and workloads, their current solutions were working just fine.

As System p professionals, we continually keep up to date with new hardware, new features and new offerings from IBM. It's very easy to think that everyone should run POWER6 processors and AIX 6 with VIO, and the technology and the speeds and feeds are genuinely good things. But the bottom line is that customers have business problems that they're trying to solve, and computing resources are tools to help them do that. If their older hardware does the job, they may see no compelling reason to change their environments–even if their current machine isn't as fast or responsive, and even if their jobs might run in half the time on newer hardware.

For whatever reason–budgetary, strategic or something else–they’ve chosen not to make the move at this time. Some customers I talk to make the analogy of driving a 1992 model car that still gets them to their destination adequately. As much as they’d like to modernize, they feel it doesn’t make sense to do so at this time.

In the meantime, we’ll continue to educate them, so that when they are ready to make that move, they’ll know all of the features and benefits that they can look forward to.

HMC v7 First Impressions

Edit: This post still seems to hold up, the HMC versions have changed, but what we can do with the HMC is the same. The link still works which is always a bonus.

Originally posted October 1, 2007 on AIXchange

After last week’s article, where I detailed some hardware management console (HMC) training I attended, I went ahead and upgraded to HMC v7.

Some first impressions:

It was a very simple upgrade. I installed the version 7 release 3.1.0 DVD (after backing up console data and saving upgrade data) and booted from it. After answering a few questions and loading both DVDs, I had a running system.

I’ve been doing my testing using Firefox as my browser (Internet Explorer isn’t at the right level on my machine, and I haven’t looked into upgrading yet). It seems to work fine. Just go to https://hmc.ip.address or https://hmc.hostname and you get to your prelogin screen. Here you have a link to log on and launch the HMC Web app and view online help. Below those two selections you can view the system status, any attention LEDs, and any serviceable events.

When you log in, you see some HMC “tip of the day” screens. You can disable these screens, though I like reading the different tips.

The display is similar in concept to WebSM, and after clicking through the different headings, it’s pretty obvious where to find things. The HMC redbook that I mentioned last week gives a very good overview of the different things you can do.

One thing I’ve noticed is that if I lose network connectivity (or if I choose this option when I log out), I can reconnect to my previous session. When you log off and select the option to disconnect instead of log off, the next time you can either reconnect to the previous session or start a new session. This can be handy if you start a long-running task on the raised floor and need to walk back to your desk. You can log off, disconnect, then reconnect from another location. As a heavy screen and VNC user, I can really appreciate this functionality.

I also used the HMC v7 to upgrade firmware on a machine, and it went very smoothly. It took me a minute to realize that I wanted to use Upgrade Licensed Internal Code to a new release, but once I used the correct option, the update process was very simple.

I was able to easily export a system plan from the HMC, and transfer that system plan to my machine. When I ran the IBM System Planning tool on my machine, it let me open and view the system plan I moved over, but I was unable to edit that system plan.

I think it would be very useful to be able to export sysplan data from the HMC, edit it on the PC, then transfer it back to the HMC. For now it looks like I can transfer it from the HMC and view it, but I would have to create a new file in order to edit it.

I did find that port 9960/tcp needs to be open for the console window, and that the HMC will communicate with the managed machine on 657/tcp and 657/udp for things like dynamic logical partitioning (DLPAR) operations.
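
If you ever need to verify that the RMC connection over port 657 is actually working, a couple of quick checks help. A minimal sketch; the first command runs in the HMC's restricted shell, the second as root on the AIX LPAR:

  # On the HMC: list the partitions that have an active RMC/DLPAR connection
  lspartition -dlpar

  # On the AIX LPAR: check the RMC daemon's view of its management domain
  /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc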

As I continue working with and using the HMC, I’m sure I’ll find more to share with you. Hopefully you’ll continue sharing the things you’re learning with me.

The Advantages of HMC Version 7

Edit: I have not thought about websm in quite a while. Another blast from the past. The links still work, which always surprises me after so many years.

Originally posted September 24, 2007 on AIXchange

After I attended Hardware Management Console (HMC) version 7 training recently, I was inspired to upgrade my HMCs to version 7. I was also inspired to share what I learned. The following are some of the highlights; you can get more details from the HMC Redbook.

This new release allows you to use a browser to connect to the HMC instead of using Web Based System Manager (websm). This means one less step is required when remotely accessing our HMC. Instead of going to hmc.ip.address/remote_client.html to download and install websm code, we can just go to hmc.ip.address with our browser and manage our machine that way without the need for additional software to be loaded on our laptops.

In days gone by, you had an HMC for POWER4 systems and a different HMC for POWER5 systems; you couldn’t mix and match them. With HMC version 7, you can manage POWER5 and POWER6 machines with the same HMC, so you don’t necessarily have to get new hardware for your HMC to manage your new machines (assuming they are supported models).

How many times have you seen an error code on your system LED or in websm and wondered what it meant? Finding out meant writing the error code down and either digging up the documentation that explained it or calling IBM Support to ask. Now you can look up error codes with the built-in hyperlinked documentation. You can click on the code and have the meaning displayed. This should make finding out what’s wrong with your hardware a much simpler process.

Partition Availability Priority is similar in concept to weighting partitions, in that you give different weights to the different partitions. These weights aren’t used for the day-to-day operation of your partitions–you still set that up in the profile. The Partition Availability settings are used if you were to lose a processor and the system needed to decide which partition most needed the remaining cycles to keep running. It would be a good idea to make sure your production partitions have a higher priority than your test partitions in the event that a processor goes away.

Another interesting topic that was covered was Utility Capacity On Demand (COD). I can remember situations where there would be heavy workloads and customers would use reserve COD and end up being charged a fee for using a processor for a full day, even though they only needed those CPUs for a few hours each night. With the new billing model, they’ll be charged on a minute-by-minute basis instead of a daily basis. This can be a huge advantage when you have peak workloads that require more computing resources than usual, but you don’t necessarily want to buy that large machine to sit around idle most of the time.

Check the IBM Systems Hardware Infocenter for information about upgrading from version 6 to version 7, and let me know if you find any other interesting features with this new release.

Systems Management Without an HMC

Edit: I was pleased to find the links below still seem to work. Every now and again I still pull out my trusty serial connections but it is more rare than it once was.

Originally posted September 17, 2007 on AIXchange

Some System p customers buy a smaller POWER5 machine, but don’t want to buy a hardware-management console (HMC) to go with it. It could be that the cost of the HMC outweighs its perceived benefits. Perhaps they don’t plan to partition the machine, and will run it as a single image. However, they still want to be able to manage the machine remotely. They have a few options. To set the machine up initially, they can plug a laptop into either HMC connection on the back of the machine and access the advanced system management interface (ASMI) that way. This is all explained here.

They can configure the IP addresses and plug the HMC network port directly into the network if they so choose, and use that connection to access and manage the machine.   

But what if they want to access a console remotely? Then they need to plug into the serial connection in the back of the machine. (If you Google “Serial to Ethernet converters” you should be able to find many different products from which to choose.)

Accessing ASMI via ASCII terminal is described here.

Newer laptops no longer have serial ports built in. I used to take a serial cable, connect it from my laptop to the back of the machine, and access ASMI and the console that way. My laptop has plenty of USB connections, so I needed to buy a USB-to-serial converter and plug that into the back of the machine. My converter had a male end on it, as does the serial port on the back of the machine, so a gender changer was also in order.

Then, if you’re running Windows on your laptop, you can run HyperTerminal (pay attention to which COM port your USB-to-serial converter is using, along with the speed and other port settings) and make a connection to the serial console. If you put your serial-to-Ethernet converter on your network, you can then access the console remotely.
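
If you're not on Windows, any terminal emulator will do the same job. A minimal sketch, assuming a Linux laptop where the USB-to-serial converter shows up as /dev/ttyUSB0 and the port is at the usual ASMI serial default of 19200 bps, 8 data bits, no parity, 1 stop bit:

  # Connect with screen (Ctrl-a k ends the session)
  screen /dev/ttyUSB0 19200

  # Or connect with cu
  cu -l /dev/ttyUSB0 -s 19200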

As you look around at the different products, you’ll also see serial-terminal servers so that you can manage more machines in your environment via one terminal server.

Although having an HMC is certainly the preferred method for managing these machines, methods are available to manage your machines without one. What other solutions have you come across?

Skeptic to Believer: Live Partition Mobility Has Many Potential Uses

Edit: I feel like I was starting to hit my stride this week and the week before. The topics and the content got a little meatier as time went on. It is hard to believe how exciting it was the first time I saw Live Partition Mobility in action. It is interesting to see that the scenarios I described are very common these days, with much of it being automated and much faster than it was in the POWER6 days. I was able to dig up a link (included below) to the press release: IBM Demonstrates a UNIX Virtualization Exclusive, Moves Workloads From One Machine to Another — While They’re Running

Originally posted September 10, 2007 on AIXchange

When I was in Austin, Texas, recently for a technical briefing, IBM demonstrated how you can move workloads from one machine to another. They call it Live Partition Mobility.

I saw it in action and went from skeptic to believer in a matter of minutes. I kept saying things like: “This whole operation will take forever.” “The end users are going to see a disruption.” “There has to be some pain involved with this solution.” Then they ran the demo.

They had two POWER6 System p 570 machines connected to the hardware-management console (HMC). They started a tool that simulated a workload on one of the machines. They kicked off the partition-mobility process. It was fast, and it was seamless. The workload moved from the source frame to the target frame. Then they showed how they could move it from the target frame back to the original source frame. They said they could move that partition back and forth all day long. (Ask your business partner or IBM sales rep to see a copy of the demo. A Flash-based recording was made to show customers. I’m still waiting for it to show up on YouTube.)

The only pain that I can see with this solution is that the entire partition that you want to move must be virtualized. You must use a virtual I/O (VIO) server and boot your partition from shared disk that’s presented by that VIO server, typically a storage-area network (SAN) logical unit number (LUN). You must use a shared Ethernet adapter. All of your storage must be virtualized and shared between the VIO servers. Both machines must be on the same subnet and share the same HMC. You also must be running on the new POWER6 hardware with a supported operating system. 

Once you get everything set up, and hit the button to move the partition, it all goes pretty quickly. Since it’s going to move a ton of data over the network (it has to copy a running partition from one frame to another), they suggest that you be running on Gigabit Ethernet and not 100 Megabit Ethernet.
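
For reference, the move can also be driven from the HMC command line, and migrlpar supports a validate-only pass before the real thing. A sketch only, with made-up system and partition names:

  # Validate that partition "prodlpar" can move from frame p570_src to p570_tgt
  migrlpar -o v -m p570_src -t p570_tgt -p prodlpar

  # Perform the live migration
  migrlpar -o m -m p570_src -t p570_tgt -p prodlpar

  # Check the migration status while it runs
  lslparmigr -r lpar -m p570_src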

I can think of a few scenarios where this capability would be useful:

Say errpt shows me a sysplanar error. I call support and they confirm that we have to replace a part (which usually requires a system power down). I just schedule the CE to come do the work during the day. Assuming I have my virtualization in place and a suitable machine to move my workload to, I move my partition over to the other hardware while the repair is being carried out. No calling around the business asking for maintenance windows. No doing repairs at 1 a.m. on a Sunday. We can now do the work whenever we want, as the business will see no disruption at all.

Maybe I can run my workload just fine for most of the time on a smaller machine, but at certain times (i.e., month end), I would rather run the application on a faster processor or a beefier machine that’s sitting in the computer room. Move the partition over to finish running a large month-end job, then move it back when the processing completes.

Maybe it’s time to upgrade your hardware. Bring in your new machine, set up your VIO server, move the partition to your new hardware and decommission your old hardware. Your business won’t even know what happened, but will wonder why the response time is so much better.

What happens if you’re trying to move a partition and your target machine blows up? If the workload hasn’t completely moved, the operation aborts and you continue running on your source machine.   

This technology isn’t a substitute for High Availability Cluster Multi-Processing (HACMP) or any kind of disaster-recovery situation. This entire operation assumes both machines are up and running, and resources are available on your target machine to handle your partition’s needs. Planning will be required.

I know I haven’t thought of everything. Let me know what scenarios you come up with for this useful tool.

System p, AIX and Linux Technical University Fast Approaching

Edit: This post has a dead link to a conference from many years ago. I still love going to the IBM technical universities. In this post I mention how in the olden days we were given a CD with slides, etc. I prefer being able to download .zip files these days; it seems like more of the last-minute slides make it into the .zip files vs. the CDs.

Originally posted September 4, 2007 on AIXchange

It’s already time to start thinking about IBM System p, AIX and Linux Technical University. In some organizations, you need more lead time than this to plan to attend such an event, but if you still have some flexibility and budget available, you should really consider going this year. If you’re unable to go, start planning for next year. No, I don’t work for the marketing department. I just found this to be a beneficial event when I attended, and I’ve always heard the same from friends who have gone to the conference in the past.

This year’s IBM System p, AIX and Linux Technical University will be held Oct. 1-5 in San Antonio, Texas. The list of available sessions includes 94 entries at the time of this writing, and IBM is still finalizing the content. By looking at the agenda, you can see that great material will be presented each day. In years past, attendees received a CD that contained the slides presented in all of the lectures. This means that in your copious amounts of free time (machines never need to be built or fixed, do they?), you can at least look over the slides to get an idea of what went on in any sessions you might have missed. Many of the slides also contain contact information for the presenters, so you can e-mail them to try to get more information about a given topic.

When I attended this conference in the past, I found it to be worthwhile. If you go, let me know what you think by posting a comment here.

Advice on Transitioning to AIX From Other UNIX Flavors

Edit: Still seems like a legit answer, and it also applies to those who know AIX and want to learn more about Linux, for example.

Originally posted August 27, 2007 on AIXchange

I’ve been asked, “How long will it take for me to get up to speed on AIX if I have experience in Solaris or HP/UX or Linux?”
 
That depends.
 
How willing are you to read documentation? How much time will you have available for hands-on learning? Will you have a lab available, and the time to spend on the test machines? Will you be trying to keep your skills current, or will you want to set up the machines and just let them run with little intervention?
 
Like anything in life, the more time that you invest, the better you get. I’ve heard managers say, “Oh, just send them to the AIX jumpstart class, that’s all they will need.”
 
This approach will certainly give them the information necessary to start doing their jobs, but they won’t become experts overnight. That comes with real-world experience solving problems.
 
With a solid UNIX background, the transition may be easier; just remember to leave the “baggage” that comes with that knowledge at the door. The AIX world may have different ways of doing things than the world you’re experienced in. Learn why things are done the way they are. Learn how they’re done. Then share what you have learned.

When to Ask for Help

Edit: This is still a relevant discussion. There is value in beating our heads against the wall and figuring things out; I know I learn and retain a ton of information that way. However, when a business is being affected because a server is down and every second counts, I do not hesitate to open PMRs and utilize those support contracts to get help. We pay to have access to additional resources, so remember to take advantage of that option.

Originally posted August 19, 2007 on AIXchange

I like the quote “there is no shame in calling support.” I was talking to a co-worker about a problem that had cropped up in our environment. We weren’t sure how to proceed. Google and our usual methods of searching for answers didn’t help us in this case. Instead of wasting more time searching for the answer on my own, I called IBM and quickly received the answer I needed to solve the problem.

How long do you usually wait before calling in a problem? With a hardware issue, it’s a no-brainer. You need the CE to bring out a new part, and you call it in. With software support, it can be more complicated. We’re supposed to be the experts who know it all. In reality, everyone needs help at some point. Whether it’s a new technology you’re still learning about or an obscure setting you’ve forgotten, what are the situations that warrant calling IBM for help?

Consolidate Using System p LPARs

Edit: Virtualization is standard these days, but in 2007 there was still hesitation in some circles. These were discussions I remember having with customers at this time.

Originally posted August 12, 2007 on AIXchange

I have too many machines in my server room. I am running out of power connections and cooling capacity.

Why not consolidate some of those servers into a smaller physical space, using virtualization on a System p machine? Carve up your LPARs so that each partition uses idle resources that the other LPARs aren’t currently using. If peak workloads vary throughout the day on the different machines, this could be a good solution. Instead of 10 machines that are 10-percent utilized and 90-percent idle, why not run 10 LPARs on a single frame and run the whole thing at a higher overall utilization?
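
Before consolidating, it helps to know how busy the existing workloads really are. On AIX, lparstat gives a quick view of entitlement and physical processor consumption; a minimal sketch:

  # Five-second samples, three intervals: watch %entc and physc to see how much
  # of its entitled capacity a partition actually consumes
  lparstat 5 3

  # Static view of the partition's processor and memory configuration
  lparstat -i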

This isn’t a magic bullet, and not appropriate for all workloads, but with proper planning this additional tool can help free up space on the raised floor and reduce the overall load on the computing environment.

Getting an AIX Education

Edit: I am starting to get the hang of it by my third post. The links in the post still work all of these years later, so that is a nice bonus. The course names may have changed but the principle is the same.

Originally posted August 5, 2007 on AIXchange

Once management discovers all that the System p platform has to offer your organization, you’ll be asked to learn AIX to support the new machines. Where do you go for this type of education? I would recommend getting some training, although there’s something to be said for getting a lab machine and poking around on it as well.

Look into the “AIX 5L Jumpstart for UNIX Professionals” class that IBM offers in classrooms worldwide. According to the Web site, “This course is targeted for Sun Solaris, HP-UX, or other UNIX system administrators.” This is a traditional classroom course with hands-on labs.

E-learning is another option for those unable to get away for a five-day class. Check out http://www.ibm.com/training for some options.  At the very least, visit http://www.ibm.com/redbooks and search for applicable AIX downloadable PDF files to read.

Once you see all that AIX has to offer, you’ll want to learn more, and there’s always more to learn.

Maintaining Uptime

Edit: My second post for AIXchange. How long did it take for my topics / style to improve? Now we have POWER servers; we no longer have System p machines. I would argue that keeping hardware from failing is still something to worry about.

Originally posted July 29, 2007 on AIXchange

I am trying to stop a hardware outage from taking down the partitions that I have running on my System p machine. The whole idea of virtualization and consolidation will backfire on me if a hardware issue now takes out four machines instead of one. 

As we do this planning, will we take advantage of as much redundancy as we can? Are we making sure that we have different feeds coming from different power sources? Are we setting up multiple fibre paths to our storage-area network and trying to have our multiple fibre cards exist in different I/O drawers?  Are we setting up redundant virtual I/O servers so that we can lose one and still keep our client LPARs running with the remaining server?  Do we have redundant hardware management consoles set up and functioning?   
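
A few commands make these redundancy checks concrete. A minimal sketch, assuming a client LPAR whose disks are served by two VIO servers:

  # Confirm each disk has more than one path and that all paths are Enabled
  lspath

  # List the physical and virtual adapters those paths run through
  lsdev -Cc adapter

  # On the HMC: confirm the managed systems are all in an Operating state
  lssyscfg -r sys -F name,state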

What other tactics do you utilize to maintain your uptime?

Consolidation and Virtualization: What Are the Best Solutions?

Edit: I know this is a bit of a rough start, but it was the very first post I wrote for my brand new blog on IBM Systems Magazine. It ended up being called AIXchange, but at one point I was tossing around names like *xExchange, AixExchange, AdminExchange. Who knows what might have been if any of those names had been used.

Originally posted July 16, 2007 on AIXchange

As you think about server consolidation and server virtualization, how do you decide what is the best solution for your enterprise? Do you look for the biggest machine that you can purchase with your budget allocation?

Do you figure that it is a good idea to have as many CPUs in the machine as possible? How do you architect your solutions? Do you have the experience in house to help with those decisions, or do you look externally for help? In many instances, an IBM Business Partner can be a smart choice, as they will have experience sizing machines for customers if you do not.

I like to know what machines are current, and what machines are on the horizon. I have found good information at: http://www-03.ibm.com/systems/p/. What resources do you utilize as you continue to plan for the future?

10 Things to Love About AIX

Edit: I miss watching the Refreshments live.

Originally posted June 2019 by IBM Systems Magazine

If you’ve had access to a television at any point since 1997, you’re probably familiar with the instrumental theme to the long-running animated series, “King of the Hill.” The music is performed by The Refreshments, a band from Tempe, Arizona.

I watched the band attract larger followings and perform in well-known venues. Then came the record deal and radio airplay.

Of course, things change. In the case of The Refreshments, the band broke up. In the case of yours truly, live music and late nights eventually lost their appeal. Recently though, I was able to attend an unofficial reunion show consisting of three of the original members, plus the lead singer from one of the groups that would morph into The Refreshments.

The musicians and the audience were older, wiser and certainly grayer, but we were all transported. It’s not that I’d forgotten about those tunes and those days—I still break out my old CDs from time to time—but that night I realized I’d taken them for granted.

AIX: What’s Not to Love?

In a similar vein, I think we, as AIX* professionals, can take our favorite OS for granted. While the OS can do so much, the job in front of us is our focus. We have systems to maintain, so many of us keep to a narrow set of tasks and operations. We’re aware of this whole wider world of AIX features and function, but we may not take time to really think about it.

So think about it now: What do you love about AIX? For me, it’s the system management interface tool (SMIT). I also love the ease of importing and exporting volume groups and the simplicity of managing disks and filesystems.

I love the thought that went into the naming of commands and the way the whole system works together. Gathering performance data and tuning system performance are straightforward processes. Mirroring, unmirroring and migrating disks is a breeze.

Naturally, I love all of the new stuff, but I also love that you can count on things to stay the same: Years-old Korn shell scripts continue to work on the newest versions of the OS, and upgrades allow you to preserve settings and configurations, saving you from having to rebuild LPARs from scratch.

Many of my favorite things about AIX fly under the radar; functionality that many admins and developers might get to utilize only infrequently. Of course, some notable new developments are also helpful. So I put it all together in this quick list of things I really appreciate—and yes, love—about AIX.

1-AIX will run on POWER nodes in Nutanix clusters

Unveiled last year, Nutanix is a converged system that utilizes software-defined storage and networking, eliminating the need to manage an external SAN. If your organization already uses or is considering using Nutanix clusters for your x86 environment, you can use the same hypervisor and virtualization stack for your AIX and POWER* environment. There’s no need to learn the HMC or the VIO server, simplifying systems management for existing Nutanix clients as well as those who are new to running AIX. Once you learn how to perform an operation on one type of cluster, you’ll be able to do it on the other.

2-Live Kernel updates

AIX Live Update allows you to update your kernel without downtime. As it becomes possible to patch more parts of the OS without rebooting, this will allow for on-the-fly updates and less disruptive change windows. When coupled with nondisruptive firmware updates for POWER hardware, you can maintain system security without affecting services and end users. Of course, this must be deployed carefully. Progress still needs to be made in this arena, but you can expect that more AIX and firmware updates will be performed this way.
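
For the curious, Live Update is driven by geninstall with the -k flag, after you describe the surrogate resources in /var/adm/ras/liveupdate/lvupdate.data. A rough sketch only; the staging directory is made up and the stanza file must be customized for your environment first:

  # Preview a live update using fixes staged in /tmp/updates
  geninstall -k -p -d /tmp/updates all

  # Run the live update for real (no reboot of the workload)
  geninstall -k -d /tmp/updates all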

3-Upgrade on the fly

The alt_disk_install and alt_disk_migration methods greatly simplify the entire upgrade process. If anything goes wrong once you’re up and running on the newer versions of code, you can back out by changing your boot device and rebooting to your original disk. It’s that easy because you’re leaving the original disk alone while confining changes to your cloned root disk. Why waste time and run risks related to bad backups? No one wants to restore an OS after an upgrade gone bad.
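
The basic flow is short. A minimal sketch, assuming hdisk1 is a free disk large enough to hold rootvg:

  # Clone the running rootvg to hdisk1
  alt_disk_copy -d hdisk1

  # The clone becomes the default boot device; verify the boot list
  bootlist -m normal -o

  # To back out after an upgrade gone bad, point the boot list at the original disk and reboot
  bootlist -m normal hdisk0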

4-The AIX toolbox for Linux

This collection of open-source and GNU software benefits from more frequent updates. The environment of choice for many Linux* application developers, these tools are packaged in RPM format and can be downloaded and run without needing to compile. This allows you to run familiar open-source tools and programs on your AIX servers. You can also use YUM to automate the download process and set up prerequisites.
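
Once the Toolbox repositories are configured (IBM provides a yum.conf and a bootstrap script on the Toolbox site), package management looks just like it does on a Linux box. A minimal sketch with an arbitrary package:

  # Search for and install a package from the AIX Toolbox repositories
  yum search screen
  yum install screen

  # Update all installed Toolbox packages
  yum update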

5-AIX, IBM i and Linux can run on the same POWER frame

We all understand this, but take a moment to really think about the flexibility this provides. From one base of reliable, powerful hardware, you can choose the OS that makes the most sense for your application needs. Although I like to run AIX where I can, it’s reassuring to know that I can still stick with POWER to run Linux- or IBM i-specific workloads.

6-The hypervisor and virtualization technologies

Of course, these aren’t actually components of the OS—their heritage comes from the IBM mainframe—but it’s all part of the platform. The hardware/virtualization combo is unquestionably one of the best things about working on AIX. Virtualization is baked into the hardware; it’s not a bolted-on afterthought that consumes CPU and memory like you find on other hardware platforms. And because the company that built your hardware also built your virtualization layer and OS, you can expect IBM to troubleshoot the whole stack should you encounter issues. There’s no need to waste time chasing multiple vendors, trying to get someone to actually own your problem.

7-It’s a very forgiving platform

AIX and IBM Power Systems* hardware provide the flexibility to add physical memory and CPU as needed. This allows you to plan for the future and pay as you go. You can upgrade your machine with no outages. You can mix virtualized and dedicated adapters, as well as make virtual machine configuration changes on the fly. You can set up enterprise pools and share resources across physical frames. Adding and removing memory, CPU and adapters is seamless and simple. The same OS will run on the smallest to the biggest systems. Migration is a breeze: Bring in your new server and use Live Partition Mobility to move your workloads while they are running with no downtime. And it’s easy to adjust your resources if needed.

8-AIX has the capability to become a NIM server

Using a network installation manager (NIM) server allows you to back up, restore and upgrade your systems over the network from a central location. You can use it to boot your machines into maintenance mode, and it’s also used under the covers by other tools when managing AIX servers.
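
Standing up a basic NIM master is largely automated. A minimal sketch, assuming the AIX install media is in /dev/cd0; nim_master_setup installs the master filesets, then builds an lpp_source and SPOT from the media:

  # Configure this LPAR as a NIM master and create initial resources
  nim_master_setup -a device=/dev/cd0

  # List the NIM objects and resources that were created
  lsnim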

9-Boot from USB

I still encounter a fair number of clients who are unaware of this capability. The VIO server and AIX OS can be booted and installed from flash drives. In addition to being much faster than more traditional boot options, booting from USB completely eliminates the need for physical media like DVDs and DVD drives. This is especially useful when setting up an environment where a NIM server isn’t already installed.
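
The only real preparation is writing the downloaded install image to the flash drive. A minimal sketch, assuming the image was copied to an existing AIX system where the flash drive shows up as /dev/usbms0 (verify the device name before overwriting anything):

  # Write the install image to the flash drive
  dd if=/tmp/install_image.iso of=/dev/usbms0 bs=1024k

  # Then boot the target system into SMS and choose the USB device as the install source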

10-The system responds to problems before they become outages

The OS has an error-logging facility that diagnoses issues and helps predict problems that could arise on the system. When coupled with hardware location codes, this makes it simple to determine which disk or piece of hardware to replace. When you set up call home, a problem ticket is generated with IBM, and in many instances, a part will be shipped or a client engineer will get dispatched before you even realize there’s an issue. The system helps keep itself highly available.
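
The error report is the first place to look when the system flags a problem. A minimal sketch of the commands involved (the identifier is a placeholder taken from the summary output):

  # Summary of logged hardware and software errors, newest first
  errpt | more

  # Full detail for one entry, including location codes
  errpt -aj IDENTIFIER | more

  # Permanent hardware errors only
  errpt -d H -T PERM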

Make Your own List

That’s my list. I’m willing to bet you could make your own, and I suggest you do so. I expect you’ll end up with an even greater appreciation for everything AIX has to offer. Comment on this post online with your list of aspects you love about AIX.

Data Backup Options Balance Risk and Cost

Edit: A backup without a test restore is a wish.

Originally posted February 2019 by IBM Systems Magazine

In some environments, disaster recovery (DR) testing and system rebuilding are ongoing. The most dedicated organizations conduct failover tests and run Live Partition Mobility (LPM) operations to evacuate frames so maintenance can be safely performed. Then LPM is used to put LPARs back onto the frames when the maintenance is complete.

Other environments are much more static. LPARs are built, quarterly or semi-annual patches are applied and that’s it. Of course, far too many environments do no maintenance at all. While regular testing is ideal, this level of activity isn’t practical or even necessary for everyone. Before you ever invest in system availability, understand that you’ll always face risk.

We’ve all been in meetings where recovery time and recovery point objectives are set. How far back should your backups go? That depends: How much data can you afford to lose? You must determine the amount of risk that’s acceptable to your enterprise.

Of course, these decisions are often based on business priorities rather than technical considerations. For instance, in some enterprises where real-time or near real-time data replication is seen as cost-prohibitive, backup tapes are shipped to a DR location and then restored to a secondary machine. This provides an extra layer of protection, but it’s also an example of balancing cost versus risk. In this case, risk is the possibility that data may be lost.

Greatest Investment, Lowest Risk

Not too long ago, maintaining two data centers was seen as an option strictly for huge organizations with large IT budgets, but this practice is relatively mainstream now. Obviously, the benefit of protecting data with a secondary data center is that a disaster or any sort of outage is unlikely to take out both facilities simultaneously. However, these solutions still carry risks. For instance, data corruption is still a possibility. If data is maliciously encrypted or destroyed, it may still be copied to your secondary location. For this reason, offline backups should still be a component of your solution.

Also, keep in mind the importance of testing. If you have a high availability (HA) cluster, fail it over regularly and run production for a period of time on your secondary node. (This assumes your failover node is sized to handle the entire workload rather than only its most critical components.) Same goes with a DR site: fail over to it and run production from the secondary location. Verify that everything works as it should. Don’t wait for an unplanned outage or an actual disaster to learn that a critical piece of infrastructure or data didn’t get replicated. Testing may also reveal technical issues with DNS or network connectivity into the secondary data center, or procedural issues that should be ironed out when personnel are fresh and expecting to troubleshoot issues.

Medium Investment, Reduced Risk

If your OS configuration is fairly static, monthly or weekly OS backups may be sufficient. Again though, you must understand the risks. Obviously, in the event of a restore, you’ll need to reintroduce any changes that occurred since the last OS backup. Beyond that, something could happen to your backup image.

If OS and data backups are written to tape, make sure the tapes are safely labeled and stored offsite, and that you have a recovery plan and a method to quickly access them if needed. Remember: A severe outage might lead to a complete loss of access to your machines and data center. Also remember that tapes don’t last forever. Have a plan in place to replace them over time.
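
Backing up to a file and proving you can read it back takes only a few commands. A minimal sketch, assuming /backup is a local or NFS filesystem with enough space; restorevgfiles should also let you pull individual files back out of the image as a spot check:

  # Create a bootable mksysb image of rootvg, regenerating /image.data first
  mksysb -i /backup/$(hostname).mksysb

  # Verify the backup is readable and show its volume group information
  lsmksysb -lf /backup/$(hostname).mksysb

  # Restore one file from the image into the current directory as a test
  restorevgfiles -f /backup/$(hostname).mksysb ./etc/hosts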

On the other end of the spectrum are organizations that rely on their storage subsystem to take snapshots. This may seem like sufficient protection, but storage subsystems do fail. Or what if some sort of catastrophe makes the snapshots unreadable? OS images and snapshots can’t be recovered if they no longer exist in a readable form.

Again, testing is critical. You’ll never know your backups are good unless you try to restore them. And when was the last time you audited your backups? Changes happen. Are you sure the backups you set up are still running properly? Even restoring individual files periodically can help you confirm that the backups can be read and the data still exists.

Backups shouldn’t be limited to enterprise systems, either. VIO servers and the HMC should also be backed up and maintained. Make sure boot media and any other necessary tools are readily available should you need to rebuild machines after a disaster.
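
Both have their own backup commands. A sketch only; the file name and FTP details are made up, and the HMC command runs from its restricted shell:

  # On the VIO server (as padmin): back up the VIOS to a bootable mksysb file
  backupios -file /home/padmin/viosbackup.mksysb -mksysb

  # On the HMC: save critical console data to a remote FTP server (prompts for the password)
  bkconsdata -r ftp -h ftpserver.example.com -u backupuser -d /hmcbackups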

Risking It All

As I noted at the beginning, with some enterprises, the choice is to do nothing. Backups may not occur at all or they’re rarely tested. Legacy systems may be left to run without being maintained in any way.

Again, risks and costs are being weighed, but in these cases, the risk may be misunderstood, or seen as negligible, while any cost is viewed as onerous. I won’t offer a lengthy defense of IT spending because if you’re reading this, it’s highly likely that you fully understand the need to protect data and the systems that store it. Plus, that’s probably in your job description.

Whatever choices are made, whatever is invested and whatever risk is allowed, it’s critical that your backup and recovery process is thoroughly documented and that everyone in the organization understands the ramifications of these decisions. If you have concerns about, say, recovering your LPARs, make them known immediately, before an event occurs.

Certainly, additional backup solutions and options are available—I didn’t discuss virtual tape, for example—but hopefully some of these points will help spark an honest assessment of your current situation.

Getting Linux on POWER up and Running is Simple

Edit: It is still simple.

Originally posted January 2019 by IBM Systems Magazine

Many organizations operating with AIX* and IBM i environments also rely on Linux* to run their businesses. But even if you know Linux, you may not realize how easy it is to run the OS on IBM Power Systems* servers.

In larger enterprises, maybe there’s a dedicated AIX team and another team of Linux administrators, with everyone doing their own thing. The AIX group isn’t focused on Linux, and the Linux folks don’t know what Power Systems servers are capable of.

Some believe that running Linux on enterprise hardware is too costly or complex. It’s not. Others don’t realize that running Linux on POWER* is even an option.

Depending on your workload characteristics, Linux performance can be significantly better when run on POWER. It’s also worth noting that IBM has worked to make this option even more appealing. Since the POWER8* processor was introduced, IBM has been transitioning the processor to fully support the little endian format. This makes it easier for application providers to recompile and run Linux on POWER without making changes to their source code. As a result, more distributions, packages and applications are being migrated to Power Systems servers all the time.

Given the affordability of open source, you owe it to your enterprise to consider Linux on POWER and to get hands-on with various Linux distributions. That way, you’ll be able to provide meaningful input when your company discusses the pros and cons of available choices.

An Array of Choices

Numerous Linux distributions are available. Some of the more widely used distributions include Ubuntu, SUSE and Red Hat (the company IBM plans to acquire). But any number of other distributions, such as CentOS, Debian and Fedora, don’t require licensing or support fees.

It’s critical to choose a distribution that’s been compiled for and works on Power Systems hardware—but beyond that, the choices are wide open. As all of these distributions are made up of open-source code, they stand out in different ways. Maybe you’ll find it easiest to work with a particular desktop manager, default filesystem type or package manager. Each distribution is unique, of course, but once you’re proficient with one, working with others is fairly easy.

A Familiar Process

If you’ve not experimented with Linux on POWER, here are some things to consider.

First, installing Linux is similar to installing AIX, so if you’re familiar with that process, you shouldn’t have any issues. Choose a distribution and download the appropriate .iso image.

For instance, a web search on “Ubuntu download Power” returns options to download various Ubuntu versions: There’s Ubuntu 18.04 LTS for IBM Power*, the first release to support POWER9*, and Ubuntu 16.04 LTS and Ubuntu 14.04 (both are for POWER8*). SUSE is similar, although you must register for a 60-day free trial. On Red Hat’s website, you can request an evaluation.

In all instances, be sure you’re getting the install images for ppc64le, and if you’re going to run on POWER9, be sure that the latest processor is supported.

I’m assuming your testing will occur on a traditional Power Systems server with some spare capacity. I’m further assuming you’re running a VIO server, and that you’ve obtained the necessary permissions from your IT management. Testing is even easier if you have either a Linux-only variant of Power Systems hardware—for instance the L922—or hardware acquired from an OpenPOWER community vendor, such as Raptor Computing Systems’ Talos II.

On a traditional system, the installation process begins by copying the .iso image that you downloaded to the virtual media repository in your VIO server. This allows you to boot from a virtual DVD over vSCSI. Then you can either have your SAN administrator provide a LUN that you can use for testing, or you can map a spare disk in your frame to an LPAR, or you can carve up a logical volume in your VIO server to use as a backing device for your Linux installation. Again, all this is familiar for anyone who’s installed AIX.
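
From the VIO server's padmin shell, those virtual media steps look something like the sketch below. The names and sizes are made up, and it assumes vhost0 is the virtual SCSI adapter mapped to your new LPAR:

  # Create the virtual media repository (done once per VIO server)
  mkrep -sp rootvg -size 40G

  # Copy the downloaded .iso into the repository as a read-only virtual optical image
  mkvopt -name ubuntu1804 -file /home/padmin/ubuntu-18.04-server-ppc64el.iso -ro

  # Create a file-backed optical device on the client's vhost adapter, then load the image
  mkvdev -fbo -vadapter vhost0
  loadopt -disk ubuntu1804 -vtd vtopt0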

To make this LPAR available to your network, obtain the appropriate IP address information. While you won’t need much processing power or memory, sizing your test LPAR appropriately will obviously lead to better results.

Once your LPAR is defined, simply boot from your virtual DVD. In many cases, the defaults listed in the various installer menus will be sufficient, but it’s worth taking the time to familiarize yourself with the Linux environment by going through the various menus and trying different settings and options. Practice setting up repositories and user IDs, make changes to filesystems, load software and configure the system.

After going through this process a few times with one distribution, try another. Incidentally, this is why having access to a “crash and burn” test system is critical; you can do what you want without impacting others.

To install Linux as a client hosted by IBM i, view the “IBM Support” document listed in the “Linux on POWER References,” below.

Simple, Seamless

If you allow users on your system, in most cases, they won’t even realize that Linux is running on POWER as opposed to the x86 hardware they may be accustomed to—but they could notice the improved performance. You have the hardware, and getting Linux up and running on it is a simple process.

IBM Power Systems L922 benchmark information: ibm.co/2Teo8zC

Little endian primer: ibm.co/2DFkudC

IBM Redpaper (REDP5496): “IBM Power System L922: Technical Overview and Introduction”: ibm.co/2FA6Km6

OpenPOWER Foundation: bit.ly/2BdsHDD

IBM Support: Create a client partition (i, AIX or Linux) hosted by an IBM i server partition using HMC Classic: ibm.co/2FqpMvt

Linux Distributions

Ubuntu: bit.ly/2PYU10p

SUSE: bit.ly/1rmA4dg

Red Hat: red.ht/2QBniv3

CentOS: bit.ly/2qKtINp

RaptorCS: bit.ly/2wpNH5z

Debian: bit.ly/2Psp7OI

Fedora: bit.ly/2RWsb2p

Meridian IT YouTube videos

Say “No More” to your IT & Business Challenges with Meridian IT

https://www.youtube.com/channel/UCt_Y_ae22ZFQBMxLWj0I9LQ/videos

AIXpert blog

This AIXpert Blog page links you to how-to articles by Nigel Griffiths, many originally from the developerWorks AIXpert Blog. Nigel Griffiths is also called “Mr nmon” (after his Twitter handle, @mr_nmon), as he is the developer of nmon and njmon, among other performance-related tools for AIX and Linux.

Quick Sheets

From William Favorite’s website:

QuickSheets are a listing of the most frequently used commands and concepts for basic system administration. They are designed to be printed double sided and tri-folded for storage. Another popular method is to print two-up and view as a single page.

The QuickStart pages are somewhat longer and contain more introduction and concepts than a QuickSheet. The intent was not to print these pages, but the formatting has been designed to minimize flow disruption due to page cut off if and when you do.

NIM Server Simplifies Installing and Upgrading AIX

Edit: I still love NIM

Originally posted December 2018 by IBM Systems Magazine

Installing and upgrading your AIX* OS can be done in numerous ways.

For starters, you can install it from base media that’s downloadable from IBM. It’s simply a matter of populating your virtual media library with the appropriate images and using virtual DVDs and virtual SCSI adapters. The downside to using base media is that it requires customization once the OS is installed. This wasn’t a problem back when we were running one OS on each physical machine, but when you’re looking at loading many images onto a single physical frame, scalability becomes an issue.

Logically moving the install media devices and adding custom scripts, user IDs, cron jobs and site-specific information to your LPARs can also be time-consuming if done manually. Over time, many sites have just taken to upgrading and migrating existing systems rather than reinstalling them.

Alternatively, using current versions of the VIO server and AIX, you can install your OS from images that you copy to USB flash drives. Flash drives are fast, and installing the VIO server from USB solves the chicken-and-egg dilemma of how to install an OS on the first machine in a data center. However, many of the same pitfalls apply with flash drives.

While many organizations install AIX using IBM PowerVC* virtualization and cloud manager to capture and deploy OS images, this method isn’t for everyone. Some encounter difficulties when coordinating among siloed teams to get PowerVC virtualization and cloud manager operational. (For one example, consider SAN teams that would have to allow access to their switches. This sort of bureaucracy can be overwhelming if you’re tasked with getting a green-field environment up and running quickly.)

The Advantages of NIM

So yes, when it comes to installing AIX, you have options. But for me, there’s only one real choice: the Network Installation Manager (NIM). NIM runs on an AIX LPAR, making it simple to deploy, even for those who haven’t previously used it. I also appreciate the control NIM gives me. I can choose where in the environment to run NIM, and I can have multiple NIM servers in multiple locations.

In addition, NIM is great for upgrades and migrations. If you need to migrate an older system to new hardware, you can take a backup image (in AIX it’s known as a mksysb) and bypass hardware limitations that might exist with older AIX versions. Yes, you can do the same thing from base media, but again, to make the solution scale, it’s best to use the network. Assuming your NIM server has the appropriate resources, it’s no trouble kicking off multiple installs or updates simultaneously.


I also use NIM for alt_disk migrations and alt_disk upgrades. By cloning your OS’s root volume group (rootvg) to a spare disk, you can perform operations on that volume group copy. Rather than take time during a change window, you can perform the upgrade beforehand, without affecting your running workloads. When you want to take backups, take them directly to your NIM server. If you already have a mksysb image, you can copy it to your NIM server. Then, after creating a SPOT, you can use that backup image to either restore or clone your LPAR.
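
As an example of what this looks like in practice, here is a sketch with made-up resource, client and volume group names: defining an existing mksysb image as a NIM resource, then driving an alternate-disk migration with nimadm against an AIX 7.2 lpp_source and SPOT.

  # Define a mksysb image already copied to the NIM master as a resource
  nim -o define -t mksysb -a server=master \
      -a location=/export/mksysb/lpar1.mksysb lpar1_mksysb

  # Migrate NIM client "lpar1" on its spare disk hdisk1 while it keeps running;
  # the cutover happens whenever you next reboot from the migrated disk
  nimadm -j nimadmvg -c lpar1 -l lpp_aix72 -s spot_aix72 -d hdisk1 -Y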

Finally, as it’s sometimes difficult to get ports opened in enterprises, there’s also an option to use HTTP with your NIM server.

As I previously stated, NIM is easy to use, but it has one important requirement: Your NIM master must be at the highest level of AIX in your environment. You can’t use an older AIX version to install or restore a newer AIX version.

NIM Is Tried and True

As you evaluate different methodologies for performing installations, patching and ongoing maintenance in your environment, it’s easy to overlook tools we’ve been using confidently for years. Although newer options like PowerVC and BigFix* software (which uses NIM under the covers) will continue to gain wider adoption going forward, in my opinion, nothing beats the tried and true NIM server.

A Closer Look at Performance and Security Updates to AIX 7.2

Edit: I am installing AIX next right now.

Originally posted October 2018 by IBM Systems Magazine

I still hear from people who are convinced that the AIX* OS is going away. I’ve done my best to refute these arguments by pointing out that IBM’s support of Linux* is not a threat to AIX. I’ve even asked some well-known experts to explain why our favorite OS isn’t going anywhere.

But if you want to really understand why the OS has a bright future, start by taking a look back. IBM has been putting out roadmaps for years. In charts available prior to the Version 7.2 release, “AIX Next” served as a placeholder name for the upcoming variant. If you go back to 5L’s debut in 2001, you can see a consistent cadence of OS releases (and retirements) every three to five years. And every year or so, new service pack support is announced.

As with previous releases, I’ve seen current charts labeled “AIX Next.” AIX 7.2 became generally available in 2015, so we can expect something new around 2020. Will it be called AIX 7.3? AIX 8.1? AIXi? AIX X? Only time will tell, but rest assured, “next” is coming.

Performance and Security

In the meantime, let’s discuss the AIX OS in the present. I recently attended an IBM briefing about the latest AIX 7.2 technology level (TL). TL3 is expected to be available in the latter part of this year. Here are some highlights:

  • AIX will support running up to 1536 threads (192 cores running at SMT8) and up to 32 TB of RAM in a single LPAR. When I stop to think about that, I’m amazed. For as long as I’ve been at this, a terabyte of anything still seems like a large number.
  • In conjunction with IBM PowerVM* virtualization and IBM POWER9*, you’ll be able to include AIX in a processor-based chain of trust to secure the booting process. Secure Boot for firmware images helps prevent unauthorized access to customer data—either through unauthorized firmware running on a host processor or from security vulnerabilities in authorized service processor firmware or its hardware service interfaces.
  • Trusted Remote Attestation of firmware images enables a remote system to determine the level of trust in the integrity of the platform. The OS will also support trusted install and update, and an option will allow only privileged users to run kernel tracing.
  • Enhanced support for alt_disk_mksysb installs allows customized boot images to be copied during alternate disk maintenance
  • Additional open-source tools and solutions from the AIX Toolbox for Linux applications
  • JFS2 file space reclaim for enhanced efficiency with thin provisioned storage solutions
  • Look for new Multipath I/O (MPIO) enhancements to support disk storage attached through the AIX iSCSI software initiator. MPIO storage resiliency will also be enhanced with changes to IBM-recommended MPIO drivers. Related: The recommended multipath driver to use on AIX and the VIO server when attached to SVC and IBM Storwize* storage devices running microcode levels Version 7.6.1 and later will be changing from SDDPCM to the default AIXPCM (ibm.co/2nzM1Do). SDDPCM won’t be supported on POWER9.
  • AIX 7.2 TL3 will run in SMT8 mode as a system default. This change stems from POWER9 showing impressive results with SMT8 (ibm.co/2nzM1Do). A quick way to check or change the SMT mode is shown in the sketch after this list.
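
Checking or changing the SMT mode is a one-liner with smtctl; a minimal sketch:

  # Show the current SMT mode and per-processor thread status
  smtctl

  # Set SMT8 to take effect at the next boot, then rebuild the boot image
  smtctl -t 8 -w boot
  bosboot -a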

Lastly, IBM recently released a statement of direction (SOD) that should excite administrators who need to connect their AIX systems to Windows* environments. The SOD reads in part: “IBM intends to enable the SMB2 (server message block) version for AIX to enable data exchange between AIX and Windows OSes.”

The Quiet Transformation

With each new release and each new TL, AIX users get more features and greater functionality. The OS, although superficially similar to AIX 4.3.3 or 5.3, has undergone many improvements to get us to AIX 7.2, yet IBM has done so in such a way to be minimally disruptive to the platform’s long-time administrators and users.

An important example of IBM’s care to minimize disruption comes with the release of the new POWER9 processor-based systems. AIX allows for Live Partition Mobility to help with planned migrations. LPARs running AIX levels supporting POWER9 servers can be migrated live from POWER7* or POWER8* systems to POWER9 systems and be run there as is, without workload interruption. Some clients seem to be unaware of this critical option. You’ll definitely want to keep this in mind as you plan and prepare to migrate to POWER9 hardware.

The AIX OS has always been rock-solid. While all of the changes over the years are incremental in nature, taken together, they’ve transformed AIX.

POWER9 Brings Changes to the HMC

Edit: I want my HMC.

Originally posted June 2018 by IBM Systems Magazine

The Hardware Management Console (HMC) is evolving, and you’ll need to adapt. IBM is moving away from the traditional x86-based hardware appliances and will only be selling POWER* processor-based 7063-CR1 HMC appliances going forward. One reason for the change is that some clients have concerns about running Lenovo hardware in their data centers, and as the supply of these appliances dwindles, we’ll no longer be able to order them. Others have always questioned why we were managing IBM Power Systems* hardware with x86 servers in the first place.

According to roadmaps I’ve seen, the 9.1.910 and 9.1.920 releases will be available sometime this year for both platforms. However, with the 9.2.930 release (expected in 2019), the code will be compiled for POWER only—not x86. As end of marketing and hardware support takes place, it will be time to migrate your data center away from x86 HMCs entirely.

Upgrades on the Horizon

Currently, you can choose from four different options to run your HMC. Of course, there’s the traditional x86-based HMC appliance that you’ve been using since the IBM POWER4 days. Alternatively, you can run a virtualized HMC (vHMC) image under the VMware, KVM or Xen hypervisor on your own x86 hardware. Or you can run an HMC appliance that’s based on the POWER processor. Finally, you can run a vHMC image in an LPAR on POWER hardware. As with any other workload, the HMC code will be capable of utilizing the strengths of the POWER hardware—including additional threads, greater memory bandwidth and superior performance.

As you update the firmware on your POWER hardware, you’ll need to upgrade your HMC code. This is another area of change, as HMC code nomenclature will be different going forward. For instance, today’s HMC V8 R870 M1 denotes the version, release, maintenance level and any fixes. The version correlates to the POWER family, the release is the corresponding firmware, the maintenance is the service pack, and the fix is not used at this time. Starting with Version 9 of the HMC code, we’ll still have version, release, maintenance and fix, but now with—for example, V9.1.910—the version will be the POWER family and the release will only increment on major revisions, meaning you’ll see only infrequent updates. The maintenance will be the firmware release and the fix will be any PTFs. So rather than get new HMC releases, we’ll see new HMC maintenance levels.
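If you're not sure what you're currently running, the HMC command line will tell you; a quick check, for example:

  # Report the HMC version, release, maintenance level and installed fixes
  lshmc -V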

To support these changes, IBM plans to regularly seek input from HMC clients. You’ll see your first survey 30 days after initial login; then you’ll receive new surveys every 180 days thereafter. The HMC team will carefully review the feedback and use this information to improve the tool.

Additional IBM PowerVM* virtualization simplification enhancements are also planned in support of the transition from the classic menus to the enhanced HMC GUI. For example, IBM intends to provide templates to simplify system deployments as well as integrate performance and capacity metrics. Efforts will also be made to simplify the process of partition provisioning. In training I recently attended, it was acknowledged that the early beta releases of the enhanced GUI were less than stellar, but don’t let your first impressions tarnish your view. The performance and usability of the GUI have greatly improved.

Moving to POWER9

As you move to POWER9* and upgrade your HMC, consider:

  • The V9 HMC code will no longer allow you to manage any POWER6* hardware that might still be running in your data center. You’ll need the V9 R1.910 code to support the S914, S922 and S924 systems. And as new hardware models become available, you’ll need to update your HMC code to manage them.
  • Further down the line, plans call for the HMC to be able to manage OpenPOWER hardware so you can manage all your IBM and non-IBM systems together. Note that not all functions will be supported initially; non-supported functions will either be inaccessible or trigger error messages.

Adjusting to the New HMC

It’s time to get on board with the new HMC. While it can be frustrating to try to do things with the GUI that you could do in your sleep with the classic menus, myriad outlets are available for assistance.

The HMC is an integral part of how we manage our IBM Power Systems hardware. Keeping on top of these changes is an important part of maintaining the overall health of the systems we support.

An In-Depth Look at POWER9

Edit: Still love new hardware.

Originally posted March 2018 by IBM Systems Magazine

The POWER9 era is upon us. As you undoubtedly know, IBM announced six new POWER9 servers in February to go along with the initial POWER9 server that was unveiled in December.

Following up on the introduction of the S914, L922, S922, S924, H922 and H924 boxes, IBM released rperf and CPW numbers on Feb. 27. Note that as part of the benchmarking process, IBM has published numbers that reflect the addition of all known security and bug mitigations that will be installed on the new systems, which GA on March 20. This, of course, is a response to the Meltdown and Spectre bugs.

If you’re looking for more detailed information about these announcements, there are a couple of presentations that I highly recommend. In this video, IBMer Nigel Griffiths examines the S924 that he received via the early ship program. Hear his impressions of the server, and watch as he pulls out fans, moves the machine in and out of the rack and shows you the server internals, cable management arm and more. It’s a fun 14 minutes.

For a deeper dive, check out this IBM Power Systems Virtual User Group replay and accompanying slides.

This 2-hour presentation by IBM’s Joe Armstrong is well worth your time. Here are some summary notes to give you an idea of what’s covered:

-The model AC922, announced in December, consists of POWER9 chips built from SMT4 “split” cores, while the six new servers run on the POWER9 SMT8 “big” cores. This is illustrated in slides 4-6.

-POWER9 chips have 8 billion transistors, compared with 4.2 billion in POWER8 and 1.2 billion in POWER7. POWER9 is 14nm, versus 22nm for POWER8 and 45nm for POWER7.

-In contrast to the buffered memory and custom chips used in POWER8 systems, POWER9 scale-out systems use a commodity form-factor direct-attached solution for the DDR4 memory subsystem. This allows for better pricing and lower latency. Expect to see buffered memory in scale-up enterprise-class servers down the line.

-Keep in mind that IBM will only support systems that use official IBM memory DIMMs. 16G (feature code EM62), 32G (EM63), 64G (EM64) and 128G (EM65) DIMMs are available for order.

The frequency at which the memory runs depends on the number of DIMMs populated per socket. One machine can have 16 DIMMs per socket, or 32 DIMMs total, for a maximum of 4 TB of memory (32 x 128 GB DIMMs).

-POWER9 servers have PCIe Gen4 slots running at 192 GB/s peak bandwidth, doubling the rate on POWER8 servers, which use PCIe Gen3. Note that your existing Gen3 adapters will work in the new Gen4 slots.

-There are four processor modes (see slide 14): disable all modes, enable static power saver, enable dynamic performance and enable maximum performance; these settings correspond to minimum, nominal, turbo and ultra operational frequencies. After logging into ASMI, you should, according to IBM, be able to change modes as needed without a reboot. Running in nominal mode means the system doesn’t automatically make changes on the fly, as the other modes do. Under maximum or dynamic performance, fan noise may be louder than you’re accustomed to with earlier systems.

-With the model S924 (and variants), you’ll have options of 12 cores running at 3.4-3.9 GHz (feature code EP1G), 10 cores running at 3.5-3.9 GHz (EP1F), and eight cores running at 3.8-4.0 GHz (EP1E). These will be in the P20 IBM i software group. Note that these frequency numbers indicate the turbo and ultra speeds.

-With the model S922 (and variants), you’ll have options of 10 cores at 2.9-3.8 GHz (EP19), eight cores at 3.4-3.9 GHz (EP18), and four cores at 2.8-3.8 GHz (EP16). These will be in the P10 IBM i software group.

-With the model S914, you’ll have options of eight cores running at 2.8-3.8 GHz (EP12), six cores running at 2.3-3.8 GHz (EP11), and four cores running at 2.3-3.8 GHz (EP10). The 8- and 6-core versions will be in the P10 IBM i software group; the 4-core option will be in the P05 IBM i software group. This is also the only system that defaults to the dynamic performance mode.

-With the model L922, you’ll have options of 12 cores running at 2.7-3.8 GHz (ELPX), 10 cores running at 2.9-3.8 GHz (EPPW), and eight cores running at 3.4-3.9 GHz (ELPV).

-There are up to four 400 GB NVMe drives, and you can assign each one to its own LPAR. In other words, you could assign one individual NVMe drive to one individual LPAR for up to a total of four drives in four LPARs. While you won’t be able to hot plug them like you can with a SAS drive, this setup nonetheless is great for internal boot drives for VIO servers. You could also logically carve up these drives, virtualize them in your VIO server and serve them to your vSCSI clients, which lets you assign internal disk to client LPARs as well (provided these aren’t heavy, write-intensive workloads). A minimal VIOS sketch follows these notes.

-AIX and VIOS OS images can be downloaded as a single install image from IBM Entitled Support. This simplifies installation from USB drives.

-The chart on slide 23 presents a good overview of the machines, including details like the number of sockets, the amount of memory, the number of CAPI 2.0 slots, and more. Slides 24-51 get into the specifics of each machine. Slides 52-57 cover different I/O adapter options, and slides 58-70 go into supported operating systems, including roadmaps that extend years into the future.

-HMC options are covered in slides 73-76. The CR7, CR8 and CR9 are no longer being sold, so look for the 7063-CR1 HMC, which is based on POWER processors. Alternatively, you could choose to run one of your HMCs as a virtual HMC.

-Make sure you’re running V9R1.910 HMC code to manage your POWER9 servers. Keep in mind that if you update to this version, you will no longer be able to manage POWER6 servers in your environment.

-Don’t lose sleep over migrating to POWER9. Slides 77-82 get into details about migration and other topics. If you don’t have PowerVM installed, learn how to get temporary PowerVM Enterprise Edition codes for your older hardware so you can migrate workloads to POWER9 using Live Partition Mobility. On that note, all new POWER9 servers include PowerVM Enterprise Edition by default, so every POWER9 server can run Live Partition Mobility out of the box.

-The POWER9 power supplies will run 1400W 200-240 VAC. POWER8 servers run 900W power supplies.
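Returning to the NVMe note above, here's a rough sketch of carving up a drive in the VIO server and presenting a slice to a vSCSI client. The disk, volume group, logical volume and adapter names are hypothetical; substitute your own.

  # On the VIO server: put the NVMe drive into a volume group
  mkvg -vg nvmevg hdisk1

  # Carve out a logical volume to use as a client boot disk
  mklv -lv lv_aixboot nvmevg 100G

  # Map the logical volume to the virtual SCSI adapter that serves the client LPAR
  mkvdev -vdev lv_aixboot -vadapter vhost0 -dev vt_aixboot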

As you can see, POWER9 gives us a lot to be excited about. As I’ve said before, I can’t wait to get my hands on these new machines.

Seriously, AIX is Not Going Away

Edit: Seriously.

Originally posted February 2018 by IBM Systems Magazine

Lately I’ve received a number of inquiries about the future of AIX. In a sense, I understand the fears. Sometimes we hear about companies migrating from AIX. Often you’ll see Linux featured in mainstream tech media, and seldom will you find much about AIX. But AIX is not going away.

Don’t take my word for it. Here are other views on the future of AIX. Hopefully the responses from these prominent folks will prove persuasive.

(Note: These comments contain minor edits.)

Joe Armstrong, Power Systems VUG (this is from a January email to user group members):

IBM is continuing to invest in the AIX operating system, and I have seen AIX roadmaps well into the future. Again, AIX is not going away. However, Linux is growing in the industry, and Power Systems are a superior platform to run Linux workloads. SAP HANA and the growing number of Machine Learning and AI workloads are a prime example of where IBM Power Systems shine….

In reality, the VUG sessions have always covered more than AIX. Power Systems VUG is a much more meaningful name, and I intend to continue providing the same informative webinars that you have enjoyed over the past ten years, and I will continue to cover AIX specific topics.

With the introduction of POWER9, new features, and updates to existing features, there is a lot to cover in the Power space. Rather than replacing AIX related webinars, I plan to offer some extra webinars, so you may see more than one per month….

Nigel Griffiths, IBM:

(An AIX user) asked me, “Do you think that AIX is dead for the future? There are fewer and fewer proposals from recruiters.”

I answered with:

That is a truly bizarre conclusion to come to on that evidence. AIX is a multibillion-dollar business for IBM. Why would IBM stop that revenue? That would be bonkers!

AIX is running in the vast bulk of major companies in the world, and AIX is running their most vital workloads. IBM and AIX are here for the long term. There is no better UNIX on the planet and it is the OS that my bank account is held in! Sure the AIX guys don’t make the same volume of noise as the Linux fanatics, as they are quietly running the core systems.

Don’t get me wrong, Linux is great fun, but when I run into critical-to-the-business problems, I want to be on AIX… the AIX support team are second to none—and I know many of them personally (being in the same company for so many years). So claiming AIX is dead is IMHO rather silly.

Earl Jew, AIX performance expert, IBM:

AIX won’t be fading for a long while, if ever. Do we still have a niche for IBM z mainframes? Do we still have a happy following of IBM i customers? Yes and yes. Likewise AIX will endure too. All that is happening now is… other IT niches are emerging, growing and evolving… For instance, Linux on Power has exploding opportunities with SAP HANA, AI, cloud and machine learning.

The “rise of the rest” doesn’t mean AIX will fade and die. AIX will endure because the nature of AIX workloads is durable. Commercial enterprises will always need fast secure reliable processing of traditional structured data on systems of record. AIX is best for open systems that make money, mean money or are money. Such systems are thus long-lived; they tick along on AIX through decades as they spark, grow and evolve. AIX will also persist because AIX evolves while remaining essentially AIX through the decades. AIX evolves to better exploit POWER technologies and AIX evolves to better serve what runs on AIX—yet AIX is still AIX.

We learn AIX and keep it. We use AIX and love it. We port to AIX to stay. This is why AIX will not fade and die.

Shawn Bodily, an IBM Champion and former IBMer now with Clear Technologies:

When someone asks me why AIX, I often ask some of the following questions: Is security important to you? Do you like having one OS across all server sizes? Is RAS important to you?

That last question I ask with some fear of it being rhetorical and even a bit cliché. AIX is a mature, enterprise-class OS that, when combined with PowerVM and the POWER processor, has among the highest uptime and fewest security vulnerabilities. AIX offers some unique features that are not only unavailable in Linux, but not expected in Linux anytime soon. One example of that is Live Kernel Update. This key differentiating feature has come to fruition in just the last couple of years. It is a testament to the continued development and long-term viability of AIX….

Here’s more from Nigel:

We also get another erroneous conclusion: People think that since IBM is not announcing AIX V8, then AIX is dead. IBM can put out massive new functions in the TL levels of AIX 7.2 without an AIX V8. AIX V8 would be disruptive, as it implies a slow overwrite install upgrade. People confuse version numbers with commitment to the product. I really don’t get it….

I think technical people see themselves as AIX experts as they use AIX commands every day and they know AIX features very well. They are betting their careers on developing AIX skills. AIX is here on the screen in front of them “talking” to them. They don’t think of themselves as POWER8 experts as the server is miles away in a dark room.

Again, IBM is still heavily invested in AIX, which runs critical workloads in large businesses the world over. As Nigel says, we do not need an AIX V8 to realize additional improvements in the operating system; enhancements arrive all the time.


One final thought: If you’re truly concerned about the future of AIX, take action. Spread the word. Attend conferences and user group meetings. Share webinar replays with colleagues.

I’ve written about other things we could do, like interact on Slack or IRC. There’s also the AIX subreddit and the AIX forum.

It’s one thing to love the operating system, as we all do. But we should be proactive about it. If we don’t help spread the word, people can easily convince themselves, despite ample evidence to the contrary, that AIX is going away. What are you doing to help dispel this perception?

POWER9 Hardware and More to Look Forward to in 2018

Edit: Some links no longer work.

Originally posted January 2018 by IBM Systems Magazine

For most of us, the holidays are a time to unwind. Of course I say “most of us” because in IT, someone is always needed to keep tabs on those machines on the raised floor. Even though laptops, phones and VPNs make it possible to arrange for coverage remotely, there’s still nothing like being able to completely unplug from work for a week or two (preferably on a beach somewhere). So if you were tasked with being on call during the holidays, I salute you. And if you were fortunate enough to have time off at the end of the year, I hope you enjoyed it, and I assume you’re now ready to get back to it.

Being away from our jobs is certainly relaxing, but with everything that goes on during the holidays, it’s easy to lose track of news about AIX and IBM. With this in mind, let’s take a quick look back at some things you may have missed over the final weeks of 2017.

No doubt, you’ve heard that POWER9 is here. While I expect 2018 to be an eventful year with POWER9 server announcements―as well as the expected updates to AIX, IBM i and the VIO server itself―the first POWER9-based server, the AC922, was unveiled in December. This announcement naturally drew a lot of coverage from mainstream tech outlets, including ZDNet, CRN and TechCrunch.

Given this IBM statement of direction, look for much more going forward:

IBM intends to offer clients with IBM Power E870, E870C, E880, and E880C systems the following capabilities that are designed to provide a smoother migration path to the POWER9 technology-based systems when they become available.

IBM plans to offer system upgrades from Power E870, E870C, E880, and E880C systems to the next-generation POWER9 systems that will maintain the serial number of the existing IBM POWER8 systems.

IBM intends to deliver the capability for the next-generation high-end system with POWER9 processors to participate in the same Power Enterprise Pool with Power E870, E880, or E870C/E880C systems.

Although this doesn’t speak to timing―and this article is certainly not meant to be an announcement of any kind―it’s entirely reasonable to assume that new POWER9 servers will be coming in the relatively near future. I, for one, cannot wait to start installing them with clients.

Other Changes

Big changes are also in store for the HMC. I wrote about the four HMC options, including a virtual HMC that will run on x86 and a virtual HMC that will run on Power servers. There’s also an HMC appliance that’s based on POWER hardware as x86 hardware gets phased out over time. Because some data centers won’t allow Lenovo hardware (which the current HMC appliances are based on) onto their floors, and since POWER hardware is a better choice anyway, it makes even more sense to switch from x86-based HMC appliances.

In addition, there’s a new version of PowerVC. Read about version 1.4.0 here and here. Some highlights include integrated software-defined storage, support for machines running KVM on POWER, the capability to import and export deployable images, the capability to capture a live virtual machine, and UI updates.

PowerAI

PowerAI is a platform that bundles the most popular machine learning frameworks with all of the dependencies and optimizations needed to get up and running quickly. To get started with PowerAI, download the code or request a trial.

https://www.ibm.com/developerworks/community/blogs/Power_Systems_Solutions/entry/5_Things_to_Know_About_IBM_PowerAI?lang=en

https://www.ibm.com/us-en/marketplace/deep-learning-platform

Another interesting development is “cloud ready” AIX images. I’ll let Chris Gibson fill you in:

In addition to installation images for AIX, “cloud” image formats are also made available that can be readily deployed with PowerVC. These images contain a default AIX base media install configuration that includes Cloud Init and its dependencies. The images can be obtained from the IBM Entitled System Support website or IBM Passport Advantage.

This covers plenty of other ground, including the capability to install AIX from USB:

AIX 7.2 Technology Level 2 and AIX 7.1 Technology Level 5 support installation via a USB flash memory stick on POWER8 and later systems. A USB flash memory stick containing an AIX installation image can be created by first downloading the AIX installation image from the IBM Entitled System Support website. A single-volume installation image of these AIX levels is available on the Entitled System Support website for writing to a USB flash device. Once downloaded, the AIX installation image can be written to a USB flash memory stick. It is recommended that a recently manufactured USB flash memory stick be used.
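For example, here's a minimal sketch of writing a downloaded image to a flash stick from an existing AIX system; the image filename is made up, and /dev/usbms0 assumes the stick shows up as the first USB mass storage device on that box.

  # Identify the USB mass storage device the stick was assigned
  lsdev | grep usbms

  # Write the downloaded installation image to the stick (this overwrites the stick's contents)
  dd if=/tmp/aix_72_tl2_install.iso of=/dev/usbms0 bs=4k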

Here are a couple of recent updates about IBM’s PowerHA product suite: the new PowerHA SystemMirror GUI and the latest version of PowerHA SystemMirror for Linux.

Finally, the Power Systems best practices document was recently updated.

Two Worthwhile Webinars

The monthly AIX Virtual User Group meetings are a great resource. Even if you can’t tune in to the live webinars, the replays are typically available a few days after the session. Subject matter experts from IBM and elsewhere cover a wide range of products and technologies. Recent presentation topics include PowerAI, the AC922 and the new HMC interface. Although I’ve linked to the presentation materials, I recommend you also listen to the session replays, which can be found on the homepage. The AIX Virtual User Group replay archives go back to 2008, with presentations going back to 2007. Much if not most of this information remains relevant today.

Another webinar series originates from the U.K. The format is similar to that of the AIX Virtual User Group, and again, quality, in-depth information is provided on a variety of topics. The archives are here. The U.K. group has recently covered PowerAI, the Cloud Management Console and the enhanced HMC GUI (this latter session includes a demonstration).

The Case for Keeping Up

It may sound trite, but in our line of work, there is always more to learn. Hopefully you’ll find value in the information I’ve cited. Even if some of this material isn’t currently relevant to your job, you never know when you may find yourself in a meeting or in a discussion with a colleague and be asked for your two cents about cloud or machine learning or any number of subjects. Obviously our day-to-day responsibilities are substantial, but I believe that informing ourselves about what’s new and what’s changing is worth the time and effort.

The Value of Performance Data

Edit: Are you tracking your system performance?

Originally posted December 2017 by IBM Systems Magazine

At the most recent IBM Technical University event in New Orleans, I was talking with Randy Watson of Midrange Performance Group (MPG). He mentioned that many customers don’t keep any performance data whatsoever.

Randy’s words surprised me. Everyone should carefully track system performance. To say it’s worth the cost and effort is an understatement. This data provides a variety of important benefits.

Three Scenarios

Consider this scenario: Your phone rings in the middle of the night. You’re told that users are having issues with one of your systems. Once you clear your head, you bring up the performance graphs that display your LPAR’s historical data. You can immediately see where and when things changed. Now you’re on track to determine what’s causing the issues you’re seeing.

Performance graphs provide an easy way to visualize your environment running normally compared to how it looks when there’s a problem. Of course, in real life things are seldom cut and dried. For instance, subtle changes may require you to go back over a longer period of time to find something.

While graphs can point you in the right direction, they do have their limits. Sometimes graphing your data can hide or at least distort the truth, a point AIX performance expert Earl Jew makes in his articles and lectures. Much of what you see when interpreting graphs will depend on the specific items you’re looking at and the length of your intervals.

Still, performance graphs are typically helpful in these situations. If someone tells you that performance is degraded, you need data. How can you begin to understand the impact of a change to your environment if you have no idea what normal looks like?

Historical data is also useful in areas beyond performance. Here’s another scenario: Management tells you that it’s time to migrate to POWER8 servers, and they want to know what models and hardware components you recommend for the refresh. Naturally, you’re not going to guess. You’ll check the aggregated historical performance data that encompasses all of the LPARs in your environment. You’ll project what workloads can be expected to do over the expected lifespan of the new hardware and estimate the performance gains the new hardware will bring. You’ll give management every reason to take your thoroughly researched recommendations seriously.

And now for one more scenario that highlights one more benefit of consulting performance data: Your enterprise is looking to add new workload to the physical hardware and is considering consolidating some other workloads from other data centers. Where is the best place for this new workload to land? Can existing servers handle additional memory or CPU, or should new adapters be brought in? Is ordering all new server hardware the right answer?

Performance Monitoring Tools

Various products―some fee-based and others that come at no cost―can be deployed to monitor performance and alert you to potential problems. Tools can certainly save you the effort of looking up rperf numbers, creating spreadsheets and guessing.

Cluster management

* Nutanix running on Power is an offering for customers using the hyperconverged servers that debuted earlier this year. Nutanix allows you to run capacity planning reports directly from Prism. The reports, which include graphs and charts, inform you of your capacity usage and projected growth requirements, and are designed to help you manage your cluster resources.

* Ganglia is a cluster monitoring tool designed for high-performance computing (HPC) environments that can also be used to monitor AIX systems.

https://www.ibm.com/developerworks/community/wikis/home?lang=en_us#!/wiki/Power+Systems/page/Ganglia

Scripts and software

* nmonchart is a Korn shell script for AIX or Linux. It converts files collected by nmon to HTML and displays more than 50 AIX and Linux performance graphs and configuration details (a brief usage sketch follows this list).

http://nmon.sourceforge.net/pmwiki.php?n=Site.Nmonchart

* lpar2rrd is free software:

http://www.lpar2rrd.com/

The tool offers end-to-end views of your server environment and can save you significant money by simplifying operations monitoring and by predicting utilization bottlenecks in your virtualized environment. You can also generate policy-based alerts, capacity reports and forecasting data. It is agentless (it receives everything from management stations like vCenter or the HMC), and the collected data set can be extended with data provided by OS agents or NMON files. The tool supports IBM Power Systems* and VMware* virtualization platforms.
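Returning to nmonchart for a moment, here is a hypothetical usage example (the filenames are made up):

  # Convert an nmon capture into a self-contained HTML page of graphs
  nmonchart server1_180101_0000.nmon server1_180101_0000.html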

Vendor Tools
Note that I don’t endorse any of the products listed here, but these commercial solutions are certainly worthy of your consideration.

Galileo Performance Explorer

Help/Systems Robot Monitor

Midrange Performance Group Performance Navigator 

In addition, IBM has its performance management product, as well as PowerVP. You could even just activate topas or nmon recording on each of your LPARs.
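If you go the nmon route, here's a minimal sketch of daily recording; the output directory is an example, and the cron entry assumes you want one file per day.

  # One snapshot every 300 seconds, 288 snapshots = 24 hours of data per file
  # Run from cron at midnight so each day lands in its own file
  0 0 * * * /usr/bin/nmon -f -m /var/perf/nmon -s 300 -c 288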

Setting a Course of Action

Once you choose a product or tool, you then need to decide what you want to accomplish. Is your focus going to be performance monitoring or capacity planning? Are you most interested in graphs and dashboards? Do you want to see trends?

During my discussion with Randy Watson, he mentioned that ultimately, most customers will use performance data either to conduct some kind of server sizing or to implement workload consolidation (scenarios 2 and 3 from earlier). You’ll need to collect data for a reasonable amount of time in order to make any useful projections, so the sizing process in particular can take a while if you haven’t previously collected data.

According to Randy, the amount of data MPG’s product generates on local disk varies depending on the number of LUNs in the environment, but with 5-minute intervals, 1-5 MB per day can be expected. He said that MPG tries to manage its customers’ historical consolidated files by keeping only 90 days of disk data. In addition, they trim about 20 percent of the daily file size by removing redundant data (e.g., configuration data that doesn’t change). A year’s worth of data should be kept by default. For most customers, that amounts to less than 1 GB of data for the consolidated file.

I’m sure other vendors take similar approaches to keep a handle on the amount of data being collected. Then again, with the large disk sizes that are available now, spending a reasonable amount of capacity on historical performance data shouldn’t break anyone’s budget.

Maintaining uptime is important, as is planning for the future. Not only do you need to keep your servers running, you must proactively ready them for what lies ahead. Are you doing your part?

IBM Debuts Hyperconverged Servers

Edit: Have you tried this yet?

Originally posted August 2017 by IBM Systems Magazine

In May, IBM announced it was partnering with Nutanix to “bring new workloads to hyperconverged deployments.” In July IBM unveiled two new hyperconverged systems. So what does IBM’s move into the hyperconverged infrastructure market mean? For that matter, what is a hyperconverged infrastructure?

https://www.youtube.com/watch?v=5R8l81K8UB8

Per Wikipedia, a hyperconverged infrastructure describes systems that virtualize everything. It includes a hypervisor, software-defined storage and software-defined networking. It will typically run on commodity hardware.

This would be different from the IBM Power Systems servers that I’ve used over the years. In those environments, the machines connect to a storage area network (SAN) via fibre channel adapters. Although PowerVM gives me a great hypervisor and access to an internal network switch, a hyperconverged cluster of servers has direct-attached disks, and the servers communicate over a 10G Ethernet network, sans a SAN. Seriously, no SAN is involved.

So why is IBM interested in Nutanix? The company claims to make your underlying infrastructure invisible, and it has been growing by leaps and bounds over the past few years.

It’s very possible that you are already running—or at least thinking about running—an x86-based Nutanix cluster. Historically, Nutanix clusters would run on x86 hardware from Nutanix, Dell, HP or Lenovo. You would set up your cluster and choose your hypervisor: ESXi, Hyper-V or Nutanix’s free hypervisor, AHV, which is based on CentOS KVM.

As noted, IBM has two new servers, the CS821 and CS822, which run the Nutanix software. They’re available in a few different hardware configurations.

The CS821 is model 8005-12N. It’s a 1U server with two 10-core 2.09 GHz POWER8 CPUs (up to 160 threads), 256 GB of memory and 7.68 TB of flash storage.

The CS822 is model 8005-22N. It’s a 2U server with two 11-core 2.89 GHz POWER8 CPUs (up to 176 threads), 512 GB of memory and 15.36 TB of flash storage.

Now, under the IBM-Nutanix union, you have a choice when it comes to the processor: POWER or x86. The CS821 and CS822 servers run AHV, and the virtual machines running on top of the hypervisor are running Linux on Power. AIX and IBM i aren’t supported as virtual machines at this time.

Nutanix handles all cluster management through its Prism product. The management interface is accessible via browser, command line, shell, etc. You mix and match your clusters based on the hypervisor you pick, and run them all through the same instance of Prism (although you would have to drill down to manage each cluster individually). With the CS821 and CS822 machines, this means that your new POWER based cluster will appear in Prism as just another cluster that happens to be using a different processor. You won’t be able to mix and match POWER and x86 nodes in the same cluster, but you can still manage a POWER cluster in much the same way as you’d manage an environment of existing x86 clusters.

What exactly do you gain by running Nutanix software? For starters, it’s an established product that’s scalable, reliable and distributed. The storage layer is handled by the Acropolis Distributed Storage Fabric (ADSF), which determines where to store your data on disk. Since a minimum cluster consists of three nodes, out of the box you will have resilience as the data gets copied―locally, and also to at least one other node, depending on the resiliency factor you choose and how many nodes are in the cluster.

ADSF is designed for virtualization. It handles tiering across your spinning hard disks, SSDs, etc., and, as your VMs relocate to different hosts in the cluster, it will take care of getting the hot data to the right node. In addition, ADSF handles snapshots, clones, deduplication and compression.

You can set up replication factors for your storage depending on how many nodes you have in your cluster. For example, choosing RF3 will allow for one node in your cluster to fail. RF5 will allow for two nodes to fail.

When it’s time to grow your cluster because you need more CPU, memory or disk, just add another node. It’s seamlessly discovered and integrated.

For an in-depth look at the technical specifications of the product, I recommend the Nutanix Bible.

Part 1 gives a brief history of infrastructure and discusses the problems that Nutanix is trying to solve. Part 2 primarily covers Prism: the basics of the GUI and navigation, upgrading your cluster and accessing I/O metrics, with screenshots throughout. In addition, there’s a capacity planning feature that projects when it might make sense to add nodes based on current and predicted workloads.

Part 3 is the book of Acropolis, the storage, compute and virtualization platform. Acropolis is “a back-end service that allows for workload and resource management, provisioning, and operations…This gives workloads the ability to seamlessly move between hypervisors, cloud providers, and platforms.” Included is a visual comparison of the Acropolis and Prism layers. Another image shows a typical node, followed by a visual of how a cluster looks with the nodes linked together.

Different Nutanix components are defined, including:

  • Cassandra, the metadata store
  • Zookeeper, the cluster configuration manager
  • Stargate, the I/O manager
  • Curator, MapReduce cluster management and cleanup
  • Prism, the UI and API
  • Genesis, the cluster component and service manager
  • Chronos, the job and task scheduler
  • Cerebro, the replication/DR manager
  • Pithos, the vDisk configuration manager
  • Acropolis Services, which handle task scheduling, execution, etc.
  • Dynamic Scheduler, which makes VM placement decisions

Finally, you can see how Nutanix handles the different levels of potential failure, including disk and node failures.

There’s much more, and the document continues to be updated. If you read through the Nutanix Bible, I think you will have a very good understanding of the platform and how it differs from other cluster solutions you’ve used.

As you continue to plan for updates to your data center, you should really give IBM Hyperconverged Systems powered by Nutanix a closer look.