Helping Visitors is Also Part of the Job

Edit: Still worth considering. Some links no longer work.

Originally posted March 19, 2013 on AIXchange

I visit many customer locations, and each experience is unique. Of course some are more pleasant than others. From my previous jobs at IBM and elsewhere, I know what it’s like to work in the same building every day. But as a consultant, I also know what it’s like to show up at an unfamiliar facility.

While it’s not easy to see your workplace through the eyes of someone who’s never been there before, it’s important to understand the needs of anyone who might need to come into your facility on a one-time or temporary basis. Think of the basic information a newcomer would need. When I go on a customer visit, the first things I need are an address and (hopefully) a contact name and phone number. The fact that phones now have GPS capabilities has truly simplified my life. It wasn’t that long ago that I had to deal with paper maps and/or printed directions from MapQuest or some other website. If I missed an exit or got turned around in a strange city, it could be difficult to get back on track. Now with the automatic rerouting I scarcely put any thought into my trips.

Of course, nothing is guaranteed in this life, including cell phone service. On one customer visit I was without service due to a snowstorm. At least I’m guessing it was the storm. It could have been a coincidence. Whatever the cause, the entire cellular network went out. I lost the maps on my phone, and of course I didn’t have anything printed out. Fortunately, I was near my destination and familiar with the area in general, but in some other town that could have been disastrous.

It’s also important to understand the limitations of GPS. It generally gives you the shortest route, but the shortest route is not always the best route. In fact, if you don’t know where you are you can find considerable trouble just blindly following GPS directions.

So, about your workplace: Is it friendly and accessible to strangers?

Can visitors easily get into the parking lot, and once there, can they make use of close-by visitor parking spots? If your parking situation is unique, you need to let your visitors know. Some companies have guard gates and allow visitors only with advance notice. If this is the case, notify your company’s security team about your visitor’s arrival.

When your visitor arrives, is it clear where they should enter? At one site we were told to enter Door 1. It turns out there were several doors labeled Door 1. Visitors need to know exactly where they need to be.

More security considerations: Do visitors to your company need to sign in with reception? Do they need to show ID and get their picture taken? Do they need to have you come and physically escort them? Even simple things like restroom access can be an issue. I’ve been to many sites where there’s no visitor’s restroom. Needless to say, after a long drive, I generally have use for such facilities. If you don’t have a place where visitors can freshen up, let them know.

Are visitors’ backpacks or electronics inspected at your site? Is there a metal detector? Are phones with cameras allowed on the raised floor? Is there guest wireless access? Is there a place to eat? Is there even a place to sit?

While most companies do right by their visitors, I’ve heard of and experienced my share of horror stories.

What about you? Is your facility visitor-friendly? Or if, like me, your job frequently takes you to new sites, have you had any issues?

For the AIX Newbies

Edit: Some links no longer work. Updated the roadmap to a list of courses from Global Knowledge. Another link to try is this one: https://www.ibm.com/services/learning/us/

Originally posted March 12, 2013 on AIXchange

I assume most of the readers of this blog have years of experience with AIX. But it’s important to recognize that new users regularly come to this operating system. Often these pros previously worked on other versions of UNIX, or even another operating system altogether. If you’re new to AIX, you should be aware of the numerous available options for getting up to speed.

We’ll start with free sources of AIX information and education. The amount of high-quality, freely available information may surprise you. Of course I’ll start by mentioning this blog and recommending that you sign up for IBM Systems Magazine. (Even if you’re not new to the platform, why not get a free copy of the magazine?)

You can also view these quicksheets and quickstarts and these Nigel Griffiths videos. Then there are the AIX Virtual User Groups. Be sure to check out both the upcoming sessions and the highly informative replays.

The other primary source of free AIX education is IBM Redbooks.

Everything I’ve listed to this point is freely available. Still, to get the most from your experience on AIX, you should also invest in training. Which classes should you attend? This depends largely on your background and experience, as well as, of course, your available time and monetary resources. IBM’s training website is a good starting point. There’s an AIX Users and System Administration roadmap to plan your training, which can take the form of instructor-led online classes, traditional classroom training or self-paced virtual courses. Here are some more specific roadmaps:

AIX Security, Network Administration, Problem Determination, Virtualization and Performance

PowerHA SystemMirror for AIX

AIX Systems Management, Clustering, Internals and Cloud Computing

Once you select the training you want to take, you can consider ways to save. IBM Education Packs can be used for IBM classroom, online or onsite training courses. They can also be used to attend IBM technical conferences. Finally, here’s a list of one-day IBM Power Systems training events that run through June.

Non-IBM vendors provide training as well, including Jack Morton and M/UX Data Systems.

Think about someone who’s new to AIX. What other sources of information would you recommend to someone just getting started with the operating system? Please leave your suggestions in comments.

Readers Respond

Edit: The comments have been lost over the years. Some links no longer work.

Originally posted March 5, 2013 on AIXchange

My recent post about command line shortcuts generated some very good responses. For instance:

            “The part about looping on a set of values reminded me of seq. I missed it from my Linux days, and so had written an imitation in perl before realizing that the AIX Toolbox for Linux Applications page has it packaged in the coreutils RPM.”
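
To illustrate the reader’s point, the WWPN loop from the earlier post could be rewritten with seq once the coreutils RPM from the AIX Toolbox is installed. Here’s a minimal sketch (adjust the adapter numbers for your system):

            for i in $(seq 0 3)
            do
            lscfg -vl fcs$i | grep Net
            done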

Another reader pointed out the apply command:  

“I’m a fan of the least known command, apply. So for your first command:


            apply "lscfg -vl fcs%1" 0 1 2 4 | grep Net
 

Be sure to read the other comments from that post for more great tips.

Incidentally, if you want to learn more about the apply command, look at the man pages. Run “man apply” on your AIX machine and you’ll see:


            “The apply command runs a command string specified by the CommandString parameter on each specified value of the Parameter parameter in turn. Normally, Parameter values are chosen individually; the optional -Number flag specifies the number of Parameter values to be passed to the specified command string. If the value of the Number variable is 0, the command string is run without parameters once for each Parameter value.


Notes:
Because pattern-matching characters in CommandString may have undesirable effects, it is recommended that complicated commands be enclosed in single quotation marks (‘ ‘). You cannot pass a literal % (percent sign) followed immediately by any number without using the -a flag.”

It seems wherever I go, I learn something new. I certainly learn from commenters on this blog, but throughout my career I’ve been fortunate enough to interact with others who’ve taught me simple tricks that have made my job easier. It’s truly one of my favorite things about my career choice.

For instance, there was the customer who informed me that you can move the Windows toolbar from the bottom of the screen to the right side. It takes some getting used to, but if you have a ton of applications open, this option really seems to make better use of your desktop space.

Another customer introduced me to a tool called launchy that I’ve come to love.

Long ago I learned that by running the following from your VIO client…


            #lspath -F "name path_id parent connection status"
            hdisk0 0 vscsi0 810000000000 Enabled
 

… you can map that output to your VIO server when you run lsmap -all.
 

Check it out. The LUN information in your lsmap output…


            LUN           0x8100000000000000
 

… directly corresponds to the lspath information above. This is another way to map disks from the VIO client to the VIO server.
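
For reference, here’s roughly what the relevant portion of the lsmap output looks like on the VIO server. This is only a sketch; the adapter name, location code and backing device shown here are made up and will differ on your system:

            $ lsmap -vadapter vhost0
            SVSA            Physloc                                      Client Partition ID
            --------------- -------------------------------------------- ------------------
            vhost0          U8233.E8B.1234567-V1-C11                     0x00000003

            VTD                   vtscsi0
            Status                Available
            LUN                   0x8100000000000000
            Backing device        hdisk10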

As others have taught me, I’ve tried to return the favor by writing about some lesser-known tools that I’ve relied on over the years. Two of my favorites have always been VNC and screen, but of course the list has grown through time. Back in 2009 I pointed out some other useful tools. So let’s revisit that discussion. Is there some undeveloped (or unknown) capability that you’d like to see? And which desktop tools do you use now on a daily basis that you couldn’t live without?

When Words Don’t Get the Job Done

Edit: Still true, although Google Translate and Duolingo can help these days.

Originally posted February 26, 2013 on AIXchange

As a youngster I worked on AS/400 systems. One day I needed to go from our U.S. corporate headquarters to our manufacturing facility in Tijuana, Mexico, to help install some dumb terminals and printers. I’d fly from Phoenix to San Diego and then walk across the border, where I was picked up by the manufacturing guys. Border crossings were more manageable in those days, since you could get over and back with only a driver’s license.

Being raised in the southwestern U.S., I’ve been around Spanish-speaking people my whole life. Despite this — and the Spanish classes I took in high school — I never really picked up the language. So when I go south of the border, I have to hope I run into English-speakers.

On this particular trip I remember trying to communicate with the crew I was working with. Only the office manager spoke English; the others at the plant did not. Somehow we got everything to work, but the language barrier made for a long and occasionally exasperating day.

I still run into some of the same things with international teams. When I worked at IBM, I remember a project where the developers were in Germany writing code, while the servers and the administrators (including me) were in the U.S. Although their English was good (certainly much better than my non-existent German), we still had to overcome time zone differences and other little misunderstandings along the way.

Interacting with others around the world, I can tell you that language barriers can be quite frustrating. Over the years I’ve had numerous discussions that bordered on games of charades (or perhaps, Pictionary) — different groups of people diagramming, pantomiming or simply guessing at what the other side was saying. Honestly, during those moments it’s tempting to view those who don’t speak your language as less intelligent. I suppose others could have looked at me that way, too — particularly since so many people in other countries at least have a grasp of English, whereas I don’t speak any foreign language. Sometimes in these situations the written word is more easily understood than the spoken word. After all, there are no accents in email or instant messaging communications.

Despite some difficulties here and there, I have mostly fond memories of these interactions. For instance, the engagement with the Germans ended well. The German team came over for the go-live, and I got to play tour guide. During some down days we went sightseeing around Colorado. I still remember driving the car out of the Rocky Mountain National Park while a baseball game was in progress on the radio. It was fun to try to explain a game they had never seen based solely on the announcer’s descriptions of the action.

I’ve been on the other side of this, too. A few years ago I was in South Africa visiting family when the South Africans were taking on Australia in cricket. This was a big deal there. TV broadcasts trumpeted the big “five-day test.” What I remember is that after the five days, the thing somehow ended in a tie. The locals tried to explain it to me, but I never did quite get what the fuss was all about. Of course, back home, the NFL postseason was going on, and I could never get the South Africans to understand either the game of American football in general or why I was so interested in those playoff scores. It’s a cliché, but the world does seem to be getting smaller. All in all, that’s a wonderful thing.

Do you interact with friends, family or coworkers from other countries? What methods do you use to help make sense of one another?

The Case for Patching

Edit: Still important to consider.

Originally posted February 19, 2013 on AIXchange

Do you update your systems? Do you patch your machines monthly? Quarterly? Annually? Do you ever patch?

Are change windows built into your environment (e.g., there’s scheduled system maintenance, say, the third Sunday of each month)? Is it too difficult to get the various applications owners to agree to a set downtime because you have so many different LPARs running on your physical frame? Is downtime simply not allowed in your environment?

Over the years I’ve met a number of people who live by the “if it ain’t broke, don’t fix it” adage. What’s funny is oftentimes the older a system gets, the more reluctant customers are to maintain it. Logically these systems have a greater need for attention than something just out of the box. Of course we’ve all used, seen or at least heard about systems that just kept running. Recently I saw F40s that are still in production, still running AIX 4.3 and still chugging along. And sure, they can keep going for a long time to come. We are fortunate enough to work with incredibly powerful and well-built hardware.

But just think about an older system — not only the hardware that’s running old microcode, but the HMC that’s running old code, the operating system that hasn’t been patched and the application that hasn’t been updated. Even if the machine isn’t visible to the Internet, there’s still great potential for things to go wrong. And if something does go wrong, how would you respond?

Customers in this situation know they’re on their own, and they’re OK with it. Typically I’m told that the application vendor is no longer in business, so they can’t get support for that code anyway. If their hardware dies, they hope they can find someone who can help them — someone who’s familiar with the limitations of older OS versions. They hope they can still get parts for their old hardware. (Along those lines, I know of folks who buy up duplicate servers just so they can have parts available to swap out. I just hope that these customers realize that tearing out part of an old machine and successfully putting it into another old machine is a unique skill.)

So I’ve heard it all, but I’ll never truly understand people who would take these chances. Why rely on hope? There are alternatives — alternatives that don’t involve buying all new systems.

For instance, if you’re running AIX 5.2 or 5.3, you can move onto newer POWER7 hardware by utilizing versioned WPARs. This allows you to keep running your older code on newer, supported versions of the operating system, which in turn provides you with some limited support options.
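
As a rough sketch of how that works, you take a mksysb of the old AIX 5.2 or 5.3 system and create a versioned WPAR from it on the new frame. The WPAR name and image path below are placeholders, and the exact mkwpar flags and prerequisite vwpar filesets should be checked against your AIX 7 level:

mkwpar -n legacy_wpar -C -B /export/mksysb/oldbox_53.mksysb

startwpar legacy_wpar

clogin legacy_wpar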

Many of us who’ve called IBM Support learned that our issue was a known problem that was addressed with an operating system fixpack or firmware update. That’s the advantage of paying for regular maintenance. Updates to your machines and operating systems take care of the known issues.

Of course some will then argue that making these types of changes could introduce new bugs or issues that would have been avoided by not fixing what wasn’t broken. My response to this argument is that test and QA systems are really important. Implement your changes on these boxes first; then move them into production.

Some methods to consider for hardware maintenance include Live Partition Mobility (LPM) or PowerHA. With LPM you can evacuate running LPARs onto other hardware with no downtime, conduct maintenance on your source hardware and then move the LPARs back to the original hardware. Using PowerHA you can move your resource group to a standby node, conduct maintenance on your original node and then move your resource group back. In this case a short outage for the application to restart each time the resource group moves is required, but PowerHA is much faster than some alternatives.
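
For LPM specifically, the move can be driven from the HMC command line as well as from the GUI. Here’s a minimal sketch with placeholder frame and partition names; validate first, then migrate:

migrlpar -o v -m source_frame -t target_frame -p my_lpar

migrlpar -o m -m source_frame -t target_frame -p my_lpar

Once maintenance on the source frame is complete, run the same migration in the other direction to bring the LPAR home.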

(Note: Whether or not you’re doing maintenance, periodically moving your resource groups around in a PowerHA cluster is a good idea. By doing this you can make sure that the failover actually works, and that no changes have been made on node A that weren’t replicated to node B.)
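
In PowerHA 7.1 one way to script that periodic exercise is with clmgr. This is just a sketch with placeholder resource group and node names; verify the syntax against your PowerHA level:

clmgr move resource_group my_rg NODE=nodeB

clmgr move resource_group my_rg NODE=nodeA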

For OS upgrades you might use alt disk copy or multibos to update your rootvg volume group by making a copy of it and updating that copy. You can boot from that copy after the update, and if anything goes wrong, you can quickly change your boot list and return to the original boot disk. This would simplify your backout process if you needed to go back for any reason.
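
A rough sketch of the alt disk copy approach, assuming hdisk1 is a free disk and the update images have been downloaded to /export/updates (check the flags against your AIX level):

alt_disk_copy -d hdisk1 -b update_all -l /export/updates

The clone becomes the boot disk; after rebooting onto it, confirm the new level with oslevel -s. If you need to back out, point the boot list at the original disk and reboot:

bootlist -m normal hdisk0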

So where do you stand on patching? Let me know in the comments.

Sometimes Even Consultants Need a Consultation

Edit: Some links no longer work.

Originally posted February 13, 2013 on AIXchange

Recently I was brought into a large migration project that was already underway. An outside team had done the design, and the goal of these folks was to create a system that emphasized simplicity. To make it easy to manage, they decided that the system wouldn’t have virtualization or allow the sharing of resources. Each LPAR would have dedicated adapters and dedicated CPUs.

It’s been some time since I’ve seen large systems designed and set up like this. I will admit that, with non-virtualized systems, determining which card belongs to which LPAR is a snap.

Of course there were still challenges. As the system was being set up, the decision was made to install IBM Systems Director on one of the NIM LPARs. This immediately raised a red flag in my mind, because I recall a Nigel Griffiths presentation where he said — and I paraphrase — oh no, never, ever, ever, ever install a NIM server and Systems Director together on the same LPAR running AIX. Really he probably just said this isn’t a good idea.

So I contacted Nigel, and my questions and his responses became the subject of this blog post:

“I have cut down the questions a bit but it is two parts: His customer is thinking about putting Systems Director on to their NIM server. Rob remembered me commenting on this but wants the details. They are planning to give it one dedicated POWER7 core and 12G memory. What do I think about that?

“Two years ago this combination (NIM & ISD) was not allowed (I think it was just not tested so not supported rather than it being a problem), but now is OK. So you can find older [web] pages with duff info.

“However, I do NOT recommend it. To get NIM to push out the very latest AIX version, the NIM server needs to be at that AIX level. But Systems Director may not be supported at that very new AIX level. Then you can’t get Systems Director support. This is a ‘will probably work but not supported’ risk that you have to decide [whether to take].

“Running a single dedicated CPU… will make Systems Director look and feel slow. With a dozen users the CPU use will go up and be a lot more peaky. With NIM it would not matter but Systems Director GUI would suffer and so would the user. Personally a dedicated CPU for NIM is pretty dumb — that CPU could be used elsewhere most of the time.”

I agree. A dedicated CPU is dumb when you could use dedicated donating or shared processor pools. However, in this case, I wasn’t involved in the design of the server. I was only asked by the customer if I could make it work.

My gut feeling was that mixing these workloads was a bad idea, and mixing them on only one dedicated core made it even worse. I certainly understand the customer not blindly taking me at my word, so that’s why I brought in the big guns — i.e., Nigel. Between his presentations, his videos and his all-around Power Systems knowledge, I knew he was the right person to ask. And I’m grateful for his fast response.

No matter how experienced and accomplished you are, it never hurts to have someone you can go to who can give you an answer or validate your own course of action. Who do you have in your corner when you get stuck?

IBM Expands the POWER7+ Server Family

Edit: The links still work as of the time of this writing.

Originally posted February 5, 2013 on AIXchange

After unveiling the first POWER7+ machines in October, IBM is now adding more servers to the POWER7+ family: a new model, the 760, along with refreshed 710, 720, 730, 740 and 750 machines. The new lineup also features POWER7+ chips going into the PowerLinux 7R1, 7R2 and new PureFlex nodes based on the POWER7+ processor.

The refreshed 710, 720, 730, 740, 7R1 and 7R2 machines are set for Feb. 20 general availability. The refreshed 750 and the new 760 will GA on March 15.

The information that follows is gleaned from my participation in various IBM-hosted pre-announcement training sessions. Prior to IBM announcements, business partners are invited to attend sessions that cover the details of the announcement. IBM Power Champions also receive access to additional pre-announcement sessions. Sessions conducted over the past few weeks have covered the different operating systems that run on Power Systems servers, as well as all of the new hardware that is being announced today. Because this information is embargoed, those of us who take part in training sessions agree (by signing nondisclosure agreements) to not discuss what we learn prior to the announcement date.

One point of emphasis with today’s announcement is that the new technology should deliver performance improvements across the product family. (Of course the amount of improvement varies, based on the model chosen and the workload running on it.) IBM also highlighted the pricing changes on the 710 and 730 models, which are supposed to be comparable to the pricing we might expect to see on 2U x86 servers.

As these announcements continue to roll out, it’s worth noting that IBM consistently sticks to its schedule when introducing new products. Not every technology vendor is so reliable. Also keep in mind the big picture. We had POWER5, then POWER6 arrived a few years later. Now we have POWER7. It doesn’t take a genius to figure out that the next versions of POWER processors are being developed as we speak, and that future generations are already in the planning stages. IBM continues to demonstrate its commitment to the platform.

The same holds true for the operating systems. Both IBM i and AIX versions have been updated every few years, with additional functionality delivered via service packs and technology levels. I don’t see this development effort slowing any in the coming years.

POWER7+: The Specs 

As was announced last fall, POWER7+ has 10 MB of L3 cache per core. Its memory compression engines allow for less overhead with active memory expansion, and the chip contains onboard encryption engines and random number generators.

An exciting feature of the POWER7+ machines is their capability to double the maximum number of LPARs on a frame by allowing you to allocate 0.05 of a core to an LPAR. In the “old days,” with a hypothetical 1-core machine, you were limited to 10 LPARs, as each LPAR could only be assigned a minimum of 0.10 of a CPU. Now you can take your 1-core machine to 20 LPARs, with each assigned a minimum of 0.05 of a CPU. This effectively doubles the number of LPARs you can have on POWER7+ machines versus POWER7 machines. Obviously, the higher limits on memory per frame mean you can do more serious workload consolidation.

The specs for these servers are pretty impressive:

  • The 710 is a 2U server with 4-, 6- or 8-core options and up to 256 GB of memory. It has five PCIe Gen2 slots and can support 160 LPARs.
  • The 720 is a 4U server with 4-, 6- or 8-core options and up to 512 GB of memory. It has five PCIe Gen2 regular height slots and four PCIe half height cards, and can support up to 160 LPARs.
  • The 730 is a 2U server that supports 4, 6 or 8 cores per socket, for 16 cores total. It can have up to 512 GB of memory, five PCIe Gen2 slots, and can support up to 320 LPARs.
  • The 740 is a 4U 2-socket server with 6 or 8 cores per socket, for up to 16 cores total. It can have up to 1 TB of memory, five regular height PCIe Gen2 slots and four half height PCIe Gen2 slots. This machine can support up to 320 LPARs.
  • The 750 is a 5U server with four sockets. With all four sockets populated, it’s a 32-core machine with speeds of 3.5 GHz or 4 GHz. It has up to 1 TB of memory, six PCIe Gen2 slots and two GX++ slots, and an integrated split backplane. This machine can support up to 640 LPARs. It can be managed by either IVM or an HMC. It comes with 3 years of 24×7 maintenance coverage.
  • The 760 is also a 5U server with four sockets. If fully populated with six cores per socket, it can have up to 48 cores running at 3.1 GHz or 3.4 GHz. It has up to 2 TB of memory, six PCIe Gen2 slots and two GX++ slots, and an integrated split backplane. This machine can support up to 960 LPARs. It must be managed by an HMC. It allows for Capacity on Demand for processors. This machine comes with three years of 24×7 maintenance coverage. Unlike the other models being announced, IBM must install this machine. The others, of course, can be set up by customers.

In the education I attended, IBM said the new 750 and 760 servers offer enterprise system features at express system pricing.

As a reminder, the 770 and 780 can have 4 TB and the 795 can have 16 TB of memory. In the training sessions it was often mentioned that even greater amounts of memory are available to system LPARs through the use of active memory expansion.

More Announcement News

Along with the new hardware, there are new versions of VIOS and AIX software. In addition, IBM has released a statement of direction that points to future support of AIX 5.3 with a service pack on the new servers. Service packs AIX 6.1 TL7 SP7 and AIX 6.1 TL8 SP2 will support these new servers.

Another statement of direction notes future availability of a service pack for AIX 7.1 TL0 and TL1, with SP2 available for AIX 7.1 TL2. VIOS 2.2.2 will be required to run on the new hardware.

IBM is also announcing a 2-port 16 Gb Fibre Channel adapter, along with a 4-port adapter that has two ports of 10 Gb FCoE and two ports of 1 GbE. There’s also an enhanced integrated multifunction card whose RJ45 ports are capable of running at 10 Gb. 387 GB SSD 6- and 4-pack options will be available with new server orders.

Finally, there are announcements around IBM AIX Solution Edition for Cognos on Power and IBM AIX Solution Edition for SPSS on Power.

With this announcement, the entire range of the product family (save for the 795, a POWER7 model) is ready to run POWER7+ chips. Which server are you most excited to run in your environment? What features are you looking forward to seeing in action?

IBM hosted an announcement webcast this morning. You can also view Nigel Griffiths’ announcement video and Jay Kruemcke’s announcement blog.

Another Source of AIX Info

Edit: Only the last link still works now that developerworks went away.

Originally posted January 29, 2013 on AIXchange

To keep up on IBM Power Systems, I rely on various resources. I’m a long-time reader of Nigel Griffiths’ AIXpert, Anthony English’s AIXDownUnder and Chris Gibson’s AIX blog, and I follow a number of folks on Twitter. @IBMRedbooks, @mr_nmon, @chromeaix, @POWERHAguy, @cgibbo and @aixdownunder are just a few “twitterers” who provide good insights and links. Twitter is also where I discovered Brian Smith. Brian is another IBM developerWorks blogger who produces tons of valuable information. For instance, check out his post on using uuencode to embed binaries in scripts and copy/paste transfer files between servers:

            “If you have a relatively small binary file that you need to transfer between servers, you can easily transfer it by copying and pasting it using uuencode/uudecode. This can be a time saver in some circumstances. It also might be helpful if you have a server that isn’t connected to the network but for which you can get a console on through something like the HMC.

            “In this example, we will copy and paste the /usr/bin/ls binary between servers.

            On the source server, type:

                uuencode /usr/bin/ls /dev/stdout

            Then copy all of the output in to the clipboard.

            On the destination server, type:

                uudecode -o /tmp/ls

            “Then press enter, and then paste in the uuencode output from the source server. The copy/pasted ls binary will be saved to /tmp/ls. You can verify the source and destination ls files are identical by comparing the checksum of the files with the csum command. “
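
That csum check might look something like this; run the first command on the source server, the second on the destination, and compare the output. This is a sketch following the paths in the example above, using the MD5 option:

            csum -h MD5 /usr/bin/ls
            csum -h MD5 /tmp/ls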

Brian also writes about scripting importvg on AIX:

            “There are some situations where you need to exportvg volume groups, and then reimport them. This often occurs when doing disk migrations between servers. The usual routine is to record which PVIDs go with which volume groups, and when you need to import the volume groups again, run an importvg and specify the correct volume group name with the hdisk that has the matching PVID. You generally can’t rely on the hdisk name/number because it might be numbered differently.

            “To make this easier, I wrote a small script that automates this process. …”
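
The manual routine Brian describes boils down to recording the PVID of each disk before the migration (lspv shows the hdisk name, PVID and volume group) and then importing by whichever hdisk carries the matching PVID on the target. A sketch with placeholder names; his script simply automates this bookkeeping:

            lspv
            importvg -y datavg hdisk5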

These are great tips and tricks. I could easily highlight a dozen of Brian’s posts, but I’ll limit myself to three more:

Version 1.0 RC1 of prdiff released

            “[prdiff] is the tool that will compare your LPAR running configuration with the profile and report differences. For more info and to download, go here.

            “This can come in handy if you are not certain that your running profile has been modified with DLPAR operations without also modifying the script definition on the HMC.”

Reset padmin VIO password from the HMC with zero downtime

            “Here is a method you can use to reset a lost VIO padmin password from the HMC with zero downtime on the VIO server. This is a somewhat involved process, but much easier than having to take a downtime on the VIO server to change the password. This is a very challenging task because the viosvrcmd HMC command doesn’t allow the command run on the VIO server to have a pipe (“|”), or any redirection (“<“, “>”) and doesn’t allow for interactive input. So this rules out using something like “chpasswd” to change the password.”

New version of EZH (Easy HMC Command Line Interface) — Including interactive menu

            “For those of you not familiar with EZH, it is a script for the HMC that provides a very simple and easy to use interface to the HMC command line so that you can very quickly, efficiently, and easily complete day to day administration tasks from the command line. It is very easy to install and doesn’t require any modifications to the HMC (the script runs within the restricted HMC shell).

            “I released a new version today with many improvements, including support to easily DLPAR CPU, Virtual CPU, Memory, and VSCSI/VFC slots.

            “It also includes a new interactive menu that you can access by using the ezh command…. More information is available at http://ezh.sourceforge.net/

I’m always looking to add to my reading rotation, so if you have an AIX resource that I’ve overlooked, please let me know in comments.

Blockbuster Performance: Then and Now

Edit: Some links no longer work.

Originally posted January 22, 2013 on AIXchange

Jay Kruemcke recently posted this image on Twitter, and I love it.

The quotes are terrific: “In a world where so many things can go wrong, one machine made a difference” and “not one second of downtime in this thriller” are my favorites.

Given the references to both the IBM RS/6000 and the 1992 Academy Award winner for best picture, this was likely created some time ago. (On the other hand, did they have Photoshop back then?) In any event, I’d love to learn any specifics about its origins.

In some ways though, not that much has changed. Like its RS/6000 forerunner, POWER7 systems are renowned for their reliability. And how often is unplanned downtime an issue in your AIX environment? 

I don’t believe that the image Jay posted was ever part of an ad campaign, but IBM has marketed its systems and software in a number of distinct ways over the years. For instance, IBM once used its Fiesta Bowl sponsorship to push OS/2. Much more recently, IBM’s latest and greatest innovations are all over YouTube (here, here and here). And of course Watson’s appearance on “Jeopardy!” was a tremendous vehicle for raising awareness of and visibility for our modern platform.

As Watson moves from game shows to healthcare and the financial industry to dealing with natural language in computing, we will continue to see IBM market its solutions. 

But if it was your call, how would you promote IBM Power Systems and AIX and educate the public about their capabilities? It may seem like a superfluous question, but then again, some of the best marketing is word of mouth. Do you tell others about how your machine makes your job easier? Have you ever made your own ad? 

Many of us have questioned IBM marketing over time, but effectively getting the word out is a challenge, and now more than ever. So how would you tell the world that our hardware is better, and that our operating system is unparalleled? How would you explain virtualization, or the fact that 0.05 of a core can now be allocated to an LPAR in a customer’s environment? How can you quickly and easily communicate this information in a way that people, customers and the general public alike, understand? And where would you broadcast your message? TV ads? YouTube videos? Podcasts? Twitter?

Redbooks are Must-Reads

Edit: Still some gems on this list.

Originally posted January 15, 2013 on AIXchange

On this blog I often reference and recommend IBM Redbooks. Technology is constantly shifting and evolving, and with education budgets shrinking in many organizations, Redbooks can help you keep your skills up to date; or, if you’re new to the Power platform, they’re a great starting point.

I find that when I read a Redbook and then go try out the concepts on a test box, I’ll end up re-reading that Redbook. I just feel it’s the best way to learn. You can read every Redbook IBM puts out, but touching a keyboard and trying it and breaking things and seeing what typos might have slipped into the publication is how you transform reading material into practical knowledge.

I know some people don’t like IBM Redbooks. I’ve been told that they’re good sleeping aids. Others say they just don’t have the time. But I’ve read (and reread) Redbooks for years, and I know many others who read and learn from them as well.

Lists of AIX-themed Redbooks have been making the rounds on mailing lists and Twitter. I’ve read a lot of these publications and look forward to reading them all.

So here’s my list. It’s lengthy, but it’d be even longer had I included storage-related Redbooks. (Although storage pros should definitely check out those publications.)

If you’re looking for a quicker read, check out the IBM Redbooks Point of View publications. These are “brief, strategy-oriented documents that represent an author’s perspective on a particular technical topic. Written by senior IBM subject matter experts, the publications examine current industry trends, directions, and emerging technologies.”

Have I missed anything? If you’ve read any Redbooks that aren’t on this list, please add them in Comments.

A New Year, an Annual Highlight

Edit: Some links no longer work.

Originally posted January 8, 2013 on AIXchange

It’s a new year and I can tell you one thing I’m already looking forward to: The next IBM Power Systems Technical University conference.

I’ve written about this annual event often over the years. It just always seems to energize me. I get to go and be surrounded by people who know what Power Systems are and can relate to what I do in my day to day job. They understand VIO servers. They understand AIX and IBM i. It’s a refreshing change from social situations where people ask me what I do, and I tell them — and they have no idea what I’m talking about. Often I end up just saying “I’m in IT” or “I work with computers.” But at the Technical University, everyone gets it. We all understand how we make our livings.

The Technical University typically convenes in late October. At the 2012 event, three new Power Champions — Ron Gordon, Terry Keene and Andy Lin — were announced at the keynote presentation. They were introduced, and then the previous Power Champions (including me) got to stand and be recognized (see here. See also here for photos of the conference in general).

I admit, it is nice to be singled out this way. Being recognized, people came up and talked to me for the rest of the week. I enjoy hearing from other attendees. First of all, it’s just nice to put a face with a name. Oftentimes I’ll meet people whose work I’ve read on blogs or seen and heard in presentations and seminars that I’ve found valuable. Or maybe we’ve simply exchanged emails over time. I believe meeting people in “real life” enhances the relationship, making each person more invested in helping the other.

In addition, those I encounter at Technical University and other conferences often give me ideas for articles and blog posts. The general support and encouragement I’ve received face to face from readers is also greatly appreciated.

What makes Technical University stand out though is the opportunity to meet and interact with experts — including many key IBM technologists — in the Power Systems ecosystem. There simply isn’t a better place to ask a question and get an immediate, informed answer.

I mention Technical University now because if you would like to attend this year’s event — it’s Oct. 21-25 in Orlando — now is the time to plan for it. If you’ve never gone before, no sweat: Each year they ask for a show of hands of those who are there for the first time, and at each conference more than half the attendees in the room have their hands raised. The conference is growing, because it’s a great event — and also because the platform is growing. New customers are migrating to it.

In many companies, 2013 budgets will be set over the next few weeks. I encourage you to make your case to attend IBM Technical University. Attendees come from around the U.S. as well as from overseas. And, as I said, first-timers come in droves every year. You may even be able to qualify for a free voucher from IBM to attend the conference — check with your IBM rep or business partner.

Technical University is one event I look forward to every year. I hope you can experience it for yourself. I believe you’ll find it as valuable as I do.

Two Ways to Measure Network Performance

Edit: Some links no longer work, although the methods should still work, assuming FTP ports are open.

Originally posted December 18, 2012 on AIXchange

Note: The next update for this blog will be Jan. 8.

I was forwarded a newsletter that contained a piece on measuring network bandwidth. I’m sharing it here with the permission of the author, IBMer Doug Herman. Doug says he compiled his information from a Steve Knudson presentation at a recent IBM Champions event.

Although I’ve used both of these methods, I hadn’t previously covered them in the blog.

From the newsletter:

            Recently I had a situation where we were being told that network performance was unacceptable from one site to another across a high speed WAN link. After using the ftp method described below, we were able to show that the network speeds were not working as expected across the WAN. It turned out that there was a routing issue with our VLAN; the network admins had it going over a much slower link than the one everyone thought we were using. Once they made the change, it worked as expected.

Two Methods for Measuring Network Bandwidth

First Method – FTP

This test is from AIX 5L Practical Performance Tools & Tuning Guide.

ftp> bin

200 Type set to I.

ftp> put "|dd if=/dev/zero bs=8k count=1000000" /dev/null

200 PORT command successful.

150 Opening data connection for /dev/null

1000000+0 records in.

1000000+0 records out.

226 Transfer complete.

8192000000 bytes sent in 70.43 seconds (1.136e+05 Kbytes/s)

Local: |dd if=/dev/zero bs=8k count=1000000 remote: /dev/null

—————————————————————————————-

Second Method – iperf (download)

Server LPAR – “systemX”

> rpm -ivh iperf-2.0.5-1.aix5.1.ppc.rpm

> iperf -s

————————————————————

Server listening on TCP port 5001

TCP window size: 256 KByte (default)

————————————————————

Client LPAR – “systemY”

> rpm -ivh iperf-2.0.5-1.aix5.1.ppc.rpm

> iperf -c systemX

————————————————————

Client connecting to systemX, TCP port 5001

TCP window size: 64.2 KByte (default)

————————————————————

[ 3] local 10.1.1.100 port 55707 connected with 10.1.1.222 port 5001

[ ID] Interval Transfer Bandwidth

[ 3] 0.0-10.0 sec 384 KBytes 314 Kbits/sec

Client LPAR – using three parallel threads

> iperf -c systemX -P3

Client connecting to systemX, TCP port 5001

TCP window size: 128 KByte (default)

————————————————————

[ 5] local 10.1.1.222 port 37477 connected with 10.1.1.100 port 5001

[ 3] local 10.1.1.222 port 37475 connected with 10.1.1.100 port 5001

[ 4] local 10.1.1.222 port 37476 connected with 10.1.1.100 port 5001

[ ID] Interval Transfer Bandwidth

[ 4] 0.0-10.0 sec 1.20 MBytes 1.00 Mbits/sec

[ 3] 0.0-10.1 sec 2.25 MBytes 1.86 Mbits/sec

[ 5] 0.0-14.9 sec 256 KBytes 141 Kbits/sec

[SUM] 0.0-14.9 sec 3.70 MBytes 2.08 Mbits/sec

(Note: This was originally published in the Power Systems newsletter. It’s produced quarterly and is available to non-IBMers. This Nigel Griffiths post provides some details and tells you how to subscribe.)

Have you used either method, or do you have another way of measuring network bandwidth? Please share your thoughts in Comments.

Configuring Cluster Notifications

Edit: Link no longer works.

Originally posted December 10, 2012 on AIXchange

A customer running a PowerHA 7.1.1.2 cluster wanted to be notified when nodes were down and when the resource group moved in their cluster.

Check here and you’ll find details about configuring a custom remote notification method:

“These topics describe how to configure custom remote notification methods to respond to an event, how cluster verification confirms the remote notification configuration, and how node failure affects the remote notification method.

“You can configure a remote notification method through SMIT to issue a customized numeric or alphanumeric page in response to a specified cluster event. You can also send SMS text message notifications to any address, including a cell phone SMS address or mail to an email address. The pager message is sent through the attached dialer modem. Cell phone text messages are sent through email using the TCP/IP connection or an attached GSM wireless modem.

“You can send the following custom remote notifications:

  • Numeric and alphanumeric page
  • SMS text message to any address including a cell phone or mail to an email address
  • SMS text message using a GSM modem to transmit the notification through a wireless connection

“The PowerHA SystemMirror remote notification functionality requirements follow:

  • A tty port used for paging cannot also be used for heartbeat traffic
  • Any tty port specified must be defined to AIX and must be available
  • Each node that might send a page or text messages must have an appropriate modem installed and enabled

“Note: PowerHA SystemMirror checks the availability of the tty port when the notification method is configured and before a page is issued. Modem status is not checked.

“To send an SMS text message over the dialer modem, your pager provider must offer this service.

  • Each node that might send email messages from the SMIT panel using the AIX operating system mail must have a TCP/IP connection to the Internet
  • Each node that might send text messages to a cell phone must have an appropriate Hayes-compatible dialer modem installed and enabled

Each node that might transmit an SMS message wirelessly must have a Falcom-compatible GSM modem installed in the RS232 port with the password disabled. Ensure that the modem connects to the cell phone system.”

My customer just wanted to receive email notifications, but I still had to make sure that I had a tty device defined on the node. I used smitty (smitty sysmirror) to access the PowerHA menus. From there, I selected: Custom Cluster Configuration > Events > Cluster Events > Remote Notification Methods.

I had to select the Configure a Node/Port Pair option. Defining a port would make sense if I was connected to a modem, but it was a needless endeavor in this case, since, as noted, my customer was only interested in enabling email notifications. Hopefully node/port pair configuration will be optional in future PowerHA SystemMirror releases.

In any event, in this screen I chose the node and port. Then I selected the Add a Custom Remote Notification Method option. In these fields I entered a name and the nodenames in the cluster. In the Number to Dial field I entered the email address that would receive the notifications. Then I chose the cluster events for which notifications were desired: rg_move, node_up and node_down. PowerHA users can choose numerous other events, however.

Once it was set up, I verified and synchronized the cluster. Then I ran the Send a Test Remote Notification menu option to make sure it worked. It did.

While I could have done pre- and post-event commands along with some scripting, I felt that using the remote notification method was the better way to go.

The final test came once the cluster was running. We moved the resource group from one node to the other. The notification worked as expected and we got the email notification we wanted.

Have you set up something like this in your HA environment?

Command Line Shortcuts

Edit: An oldie but a goodie.

Originally posted December 4, 2012 on AIXchange

What are your favorite scripting command line shortcuts? When I have a relatively small pile of repetitive things to do, I like to create a for loop, such as:

#for i in 0 1 2 3

>do

>lscfg -vl fcs$i | grep Net

>done

In this case I can easily get my WWPNs from my fibre cards.

If you’ve already run set -o vi and you recall your command history with esc-k, you might end up with something like this on your command line, ready for you to rerun:

#for i in 0 1 2 3^Jdo^Jlscfg -vl fcs$i | grep Net^Jdone

Though it’s certainly easy enough to go back in and edit that directly on the command line using normal vi keys, sometimes with the ^J characters and the lack of spacing — especially if it’s a long command that wraps around on the command line — it can be easier to enter v somewhere on that line and pull yourself into a vi editor session. That makes it easier to work on the command in question:

for i in 0 1 2 3

do

lscfg -vl fcs$i | grep Net

done

When you’re done with your edits, just save out of vi as you normally would, and the command that you put together will run as if it had been edited on the command line.

Here’s another loop I sometimes use:

while (true)

do

df

sleep 5

done

This basically runs the df command every 5 seconds. It will do so forever.

One way to easily remove all hdisks from a system is to run:

for x in `lsdev -Cc disk|grep hdisk|awk '{print $1}'`

do

rmdev -dl $x

done

Obviously, it will not rmdev disks that are in use, but I find that on systems with hundreds of hdisks that I want to manipulate for some reason, this can be a handy way to do some cleanup.

Otherwise, if you have a range of values to loop over (say you need to delete hdisk12 through hdisk25), you could first run:

x=12

to set $x equal to 12. Then you could run either:

>while [ $x -le 25 ]

> do

> rmdev -dl hdisk$x

> ((x=$x+1))

> done

or

while (($x<=25)); do

> rmdev -dl hdisk$x

> let x=$x+1

> done

What other simple things do you run on the command line to make your job easier? Please share your tips in Comments.

A Case of Extreme Uptime

Edit: They tell us this should not be a badge of honor because it just means you are running unpatched machines, but I still think it is interesting. I was surprised that the links still work.

Originally posted November 27, 2012 on AIXchange

Ongoing maintenance of our machines is important. You should schedule change windows and make sure servers have the latest firmware and OS patches. Performing regular maintenance is the simplest way to avoid security vulnerabilities. Keeping current on fixes can save you from calling IBM Support; often their first response to a question is to tell you that your issue has been resolved in an already released service pack.

That said, AIX systems provides us with world-class technology. These machine are capable of running for a very long time without any care. And a few do.

A friend recently forwarded this email concerning one of his AIX machines:

            I have a production server that was here when I started in 1999. It was last booted on Jan. 14, 2000, almost 13 years ago…

            It was renamed after applications were migrated off of the server two weeks ago. It is now going to be used as a DR box. As you can see below, it was up 4,675 days before I rebooted it this morning. And yes, it came up just fine.

            # oslevel -r

            4330-11

            # uptime

            09:35AM   up 4675 days,   2:21, 2 users, load average: 1.22, 1.29, 1.28

This is, of course, first and foremost a tribute to the quality of AIX systems. However, a not insignificant amount of good fortune is also involved. This box ran continuously for almost 13 years. Power outages were never an issue. Any hardware issues were resolved through hot swapping. No one accidentally logged into this production server and ran a shutdown -Fr. The firewall that this box must have operated behind kept it safe from constant attacks.

I was impressed to hear of a production AIX server running for this amount of time without even a reboot. I imagine there are systems that have been up even longer, though I couldn’t find anything specific. If you’d care to do your own research, there are threads devoted to this sort of thing. See here, here, here, here and here.

Frankly, I wouldn’t recommend treating a machine this way. I always want to be sure I’m running a supported operating system with the latest fixes. Still, these types of stories surface every now and again; maybe you have your own. What’s the longest-running production system that you know of? What were the circumstances? Please share your anecdotes in Comments.

At an IT Conference, a Glimpse of Life Outside of IT

Edit: Some links no longer work. I try to mention a little bit more these days.

Originally posted November 20, 2012 on AIXchange

Last month I attended the IBM Power Systems Technical University. I was part of one session that featured IBM executives, IBM employees, and IBM Power Champions discussing different issues around the Power Systems ecosystem.

First we went around the room and introduced ourselves. Now, most of us have a quick “elevator pitch” to explain who we are, what we do and what makes us such interesting and wonderful people. Being a consultant, these introductions can be important. For me, displaying the right combination of credibility and likability in these instances can help open doors and spur the people that I talk with to invite me into their organizations to help them make decisions involving their computing environments.

In the session I mentioned something about starting out on the AS/400 and working with that system for 10 years. I talked about my past employment with IBM and working on AIX, and noted my current position with Meridian IT. I added that I’m a Certified Advanced Technical Expert (CATE), a Red Hat Certified Engineer (RHCE) and an all-around swell person.

I didn’t bring up any hobbies or anything I do and enjoy outside of work. I didn’t say where I was from, why I love living there or where I plan to go for my next holiday. I didn’t mention my dreams or aspirations.

Would anyone have been interested if I had? Perhaps so. At this same conference, a lot of us were talking about the keynote session, because of the speech given by Jeff Jonas.

Read his bio, and you’ll see that Jeff is chief scientist of the IBM Entity Analytics group and an IBM Distinguished Engineer. He has many impressive professional accomplishments, and he spoke of his work experiences. When he was introduced, it was mentioned that he has participated in numerous Ironman triathlons over the years.

His material was excellent. Through his storytelling and use of humor, he simplified the technical concepts. His style of presentation was a pleasant change from what many of us have come to expect from technologists at conferences like these. Walking out of that room, you felt like you really understood the projects he was working on. But others I talked to who were there found his personal story just as memorable.

When you think about it, you probably know several people who are really passionate about a sport, a hobby or a subject. I know someone who competes at a very high level in bowling. I’ve met people who enjoy flying planes, who are martial arts experts, who collect and shoot firearms. I know people who run marathons, and people who love sailing. And every single one of these people works in IT.

That’s the thing. Our jobs are important to us, and a lot of us are very passionate about what we do. Still, we’re not defined by our jobs. The things we do outside of work are an even bigger part of our identities.

As they say, when you’re on your deathbed, you won’t be wishing you’d spent more time at the office, but you might be regretting that you didn’t spend more time sharing with your loved ones and pursuing your passions.

Working in IT is just one of many things that makes us who we are. Although we may enjoy our time in the technology field, many of us do impressive things outside of work. When you introduce yourself, do you stick to the resume, or do you also bring up your other interests? Perhaps I should revisit my elevator pitch.

Do You Need the Speed?

Edit: 4G rules the roost but 5G is on the way.

Originally posted November 13, 2012 on AIXchange

With today’s phones, 4G is the fastest. But, all things considered, is the fastest speed automatically the best option? I’ve wondered about that for a while. More recently, I noticed a writer in Europe — which is, of course, another world as far as cell phone service and providers go — expressing a similar sentiment:

“4G can do more with the radio spectrum than 3G, but this cleverness comes at a cost: it requires much more processing power to cope with the surge in data and the electronics will draw more current. This is straightforward physics and – even if mobile networks had no legacy baggage — a 4G network would deplete your battery faster than 3G. The technology in the handset will improve and become more efficient, but that’s no use to us here and now.

“The question you then have to ask is — do you really really need that extra speed? When HSDPA+ has proved more than adequate? Personally, I’m struggling to think of applications where I’m prepared to trade off weight and power drain against that speed. If you’re only ever sat beneath a 4G network mast, and with a briefcase full of power chargers, the question may be moot.”

In a follow-on post, he added:

“The first is that 3G has far more life in it than we thought. Aware that they’ll be getting a marketing pounding from EE, which has an exclusive on LTE in the UK for some months, rival 3G operators have quietly been upgrading to the latest and much faster version of 3G. The latest flavour of 3G, dual-channel HSPA+, delivers quite amazing speeds on Three’s network.

“My personal choice is for near-4G data speed when I need it and a phone that lasts all day, as opposed to 4G speed and a phone that craps out towards the end of a long lunch while rinsing me of all my cash.”

Though I don’t disagree with the author’s point, I’ve found I need the speed. With all the travel I do, I need a fast, reliable network connection. Here in the U.S., I never saw decent 3G speeds on any handset that I tried, so 4G LTE is the only choice for me. 4G is like having a cable modem in my pocket. Hotel Wi-Fi is frequently heavily saturated and barely usable, especially at night when everyone’s back in their rooms, trying to access the hotel’s network. Client sites can be hit or miss as far as external network access goes. There’s also the issue of restrictions on Internet use. More than once I couldn’t access Google Search due to a client’s internal filters. If you’ve read this blog for a while, you know that I often turn to Google when I encounter a technical issue I haven’t seen previously.

Thankfully, I can use my phone as a mobile hotspot. This provides me with fast, reliable network access when I’m at the hotel, the airport or a customer site. Of course, the author is correct about 4G and battery life. Even with a 3200 mAh extended battery, your phone will drain in a hurry if you run it as a hotspot for any length of time. I don’t know of any mobile worker who walks around with a charger plugged in all day, and swapping batteries and using external battery packs aren’t elegant solutions either.

The point is, for average daily usage — a few calls, texts, emails and file transfers — 4G is great, but perhaps speed isn’t the primary consideration for most users. Certainly I rely on 4G, but cost and access still matter to me, even if I’ve had to make them less of a priority. I have an unlimited data plan — or so my provider tells me. I find, though, that after some arbitrary amount of data has been used each month, my speeds get throttled.

Like every user, I’d love unlimited everything: minutes, texts and data, along with the fastest speed available and acceptable battery life at a reasonable price. 4G’s cleverness may come at a cost, but at this point, I feel I have no choice but to pay that fare.

So where does 4G rate with you? Are there other ways to get onto the network I should consider?

Running cldump on a Cluster

Edit: Hopefully nobody runs into this error these days.

Originally posted November 6, 2012 on AIXchange

I was recently asked why the cldump command wasn’t running on a PowerHA 7.1 cluster.

After running /usr/es/sbin/cluster/utilities/cldump, my client received this output:

            cldump: Waiting for the Cluster SMUX peer (clstrmgrES)
            to stabilize………….
            Failed retrieving cluster information.

            There are a number of possible causes:
            clinfoES or snmpd subsystems are not active.
            snmp is unresponsive.
            snmp is not configured correctly.
            Cluster services are not active on any nodes.

            Refer to the HACMP Administration Guide for more information.

I checked and learned that IBM has been scaling back the default SNMP configuration over the years for security reasons. However, this issue is relatively easy to address:

            1) edit /etc/snmpv3.conf (all nodes) and remove the comment hash from this line:

            #COMMUNITY public    public     noAuthNoPriv 0.0.0.0    0.0.0.0         -

            2) add this line (this is the top-level cluster view of the SNMP MIB):

            VACM_VIEW        defaultView     1.3.6.1.4.1.2.3.1.2.1.5 - included -

            3) restart the relevant daemons (this can be done without stopping cluster services):

            stopsrc -s clinfoES
            stopsrc -s snmpd
            stopsrc -s aixmibd
            stopsrc -s hostmibd
            stopsrc -s snmpmibd
            sleep 10
            startsrc -s snmpd
            startsrc -s aixmibd
            startsrc -s hostmibd
            startsrc -s snmpmibd
            sleep 60
            startsrc -s clinfoES

After these changes, cldump was working. 
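
If you want a quick sanity check before declaring victory, something along these lines works (a rough sketch; the snmpinfo invocation is from my notes, so verify the syntax on your release):

            lssrc -s snmpd
            lssrc -s clinfoES
            # confirm snmpd is actually serving the cluster MIB
            snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
            # then rerun cldump
            /usr/es/sbin/cluster/utilities/cldump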

We also found warning messages when we started cluster services or tried to synchronize the cluster:

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea1 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea2 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea1 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea2 does not support fast disk takeover

I called support and was told that this was addressed by APAR IV26874. We were also provided with an iFix, which, once loaded, took care of the problem. So if you see the warning, contact IBM and get the iFix (if it isn’t yet available via a service pack).
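
If you want to check whether a given system already has the fix, a couple of standard commands will tell you (a sketch; IV26874 is the APAR from my case, so substitute whatever number support gives you):

            instfix -ik IV26874      # is the APAR installed via a TL or service pack?
            emgr -l                  # list any interim fixes (iFixes) currently loaded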

Incidentally, neither of these issues was a show-stopper in my client’s environment. I continue to be very impressed by PowerHA 7.1.

Training on PowerHA

Edit: Some links no longer work.

Originally posted October 30, 2012 on AIXchange

In its Oct. 3 announcements, IBM noted that the new PowerHA SystemMirror 7.1 Enterprise edition will go GA on Nov. 9. Since I recently took some IBM training on this product, I’d like to tell you more about it.

First, understand that PowerHA is designed to provide mission-critical application availability through planned and unplanned outage events. True to its name, the enterprise edition is aimed toward multisite configurations, while PowerHA SystemMirror standard edition offers access to the normal capabilities of a local PowerHA cluster.

There are two options for multisite clusters: stretched and linked. A stretched cluster utilizes a single repository disk and a single communications network, and is suited to shorter distances. For a real-world example, think of a storage subsystem using GLVM that covers a college campus. The stretched cluster communicates with the nodes in the cluster using multicast.

A linked cluster is two sites in two different networks that are linked together. Distance is not an issue — the sites can be cross campus or cross country. A linked cluster utilizes two separate repository disks instead of the shared repository disk that’s used in a stretched cluster. While cluster-wide AIX commands can be used with both stretched and linked clusters, linked clusters use unicast communications.

HyperSwap provides for a multisite PowerHA cluster with continuous storage availability. With HyperSwap, applications keep running in the event of a storage outage, and the storage is kept in sync via Metro Mirror. Storage maintenance and storage migration can be performed without downtime. However, due to the specialized code involved, non-IBM storage products are not supported. IBM DS8000 storage systems must be on both sides of the HyperSwap solution. 

In the training session I attended, IBM emphasized the tighter integration that now exists between PowerHA and AIX. This is in large part due to Cluster Aware AIX. PowerHA 6.1 used traditional communication-based heartbeats, along with “user space” event processing and RSCT topology management. The PowerHA 7.1 architecture features multicast communications, SAN communications, a repository disk heartbeat and kernel-based event processing. The changes to topology management and heartbeating make it much harder for a cluster to become “split brained” or partitioned.
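
Because CAA is doing the heavy lifting, you can inspect the cluster directly from AIX itself. Here’s a minimal sketch of the commands I tend to reach for (output will obviously vary by cluster, and the exact flags may differ slightly by AIX level):

lscluster -c     # cluster configuration, including the multicast address
lscluster -m     # node status as CAA sees it
lscluster -i     # interface state for the cluster network
lscluster -d     # disk information, including the repository disk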

Other notes about the training session:

* IBM has made changes to the Systems Director plugin to simplify cluster creation. This allows you to use a GUI to create your cluster and access a multisite install wizard.

* Be sure to get the latest service packs for both AIX and PowerHA. Having the latest fixes always helps; it’s vital with PowerHA.

IBM is continuously investing in and improving PowerHA as well as planning future capabilities. As was noted in the training session, the product still allows you to use dynamic logical partitioning to grow your LPARs when needed. You can set up an LPAR with 4 CPUs on your primary node and 1 CPU on your failover node. When it’s time to swap roles, that failover LPAR is able to take on 4 CPUs dynamically. This can save you on software license fees for your backup nodes.
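
As a rough sketch of what that swap could look like from the HMC command line (the managed system and partition names here are made up, and this is dedicated-processor syntax; shared-processor LPARs would use --procunits instead):

# dynamically add 3 processors to the failover LPAR after it takes over the workload
chhwres -r proc -m Server-9117-MMD-SN1234567 -o a -p standby_node --procs 3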

Having worked with it for a while now, I’m impressed with PowerHA SystemMirror 7.1 Standard edition. I’m still amazed at how easy it is to set up. Have you worked with this version of the product yet? What have your experiences been so far? Are you looking forward to the new capabilities with Enterprise edition?

Computer Jargon: A Look Back

Edit: I still find this interesting, though the file probably needs to be updated.

Originally posted October 23, 2012 on AIXchange

Years ago when I worked for IBM I read and enjoyed a file called the “IBM Jargon and General Computing Dictionary.” It seems to be making the rounds again, at least if recent emails and tweets I’ve seen are any indication.

The dictionary’s tenth edition, published back in 1990, is still preserved online. While terms like “back to back remote,” “brass tag” and “Charlie letter” are old school, many of these words and expressions hold up and are still in use today.

Here’s a bit of the editor’s introduction:

“… This edition follows the markup and format of the last (Ninth) edition, and has more than one hundred and seventy new entries (bringing the total to over fourteen hundred entries).

 “This is not only the tenth edition of the dictionary, but is also its tenth year; the first edition was compiled and distributed in 1980. At that time the use of jargon was on the increase, but I now observe that the quantity and use of jargon appears to be decreasing – perhaps as computing becomes less of a specialist discipline. Not only does this make my task as editor of the dictionary a little easier, but it might also imply that the computing industry is at last getting better at communicating with its customers!”

This resonates with me. Most people use computing devices in their daily lives now. People and their cell phones or smartphones are basically inseparable. Twenty-some years ago, being the “computer guy” had very different connotations. He — and it was pretty much strictly “he” — was usually much more technical than the rest of humanity. He often had a different way of talking, and if you didn’t know the lingo and the acronyms, you had a tough time even understanding him.

More from the dictionary:

“The items in this dictionary have been selected from the huge vocabulary of computer-related terms used in IBM. To be included here, a word or phrase must either have originated in IBM, or (more commonly) its meaning or usage in IBM must be different from the usual. Acronyms and abbreviations are not included except where they are necessary for cross-references, or are used as true words in their own right (for example, “APAR”).

“This dictionary is intended both to inform and to entertain. Each entry has a definition, which is usually supplemented by an explanation and an example of usage. Formal etymologies are not included, since in most cases the etymology is either unknown or disputed. In many cases, a meaning or usage is so subtle or bizarre that a light treatment is more appropriate (and conveys the sense better) than an attempt to define the term formally. As a result, this compilation is not just a source of information but is also a window on the IBM culture, as reflected in its language.”

Unfortunately the dictionary is no longer updated, and this seems to be something of a trend. From what I can tell, the (non-IBM specific) Jargon File was last updated in 2003. Here’s version 4.4.7.

Has the language evolved so much that we no longer need reference materials to help us make sense of the computing world? If you work with Power Systems regularly, do terms like IVM, HMC, KVM, FSM, VIOS, APV and SEA need entries these days, or do you just know what all of these acronyms mean today? Are there others that drive you crazy when you hear them?

Twenty-some years from now, will people be trying to make sense of what we were talking about with our abbreviations and lingo? Will we still rack and stack servers, or will everything be in the cloud?

If the IBM jargon dictionary were still being maintained, which words and terms and abbreviations would you want to add to it? I recently came across a term that would be a natural fit. Ask me about it the next time you see me.

Do any of the IBM jargon dictionary terms bring back good memories for you?

Dual HMCs and Interface Locking

Edit: This is still relevant.

Originally posted October 16, 2012 on AIXchange

A customer has two HMCs and wants to get them on the network, with both machines controlling the same set of servers. Maximum availability is a priority. The customer doesn’t want to risk any HMC downtime in their environment.

Chapter 8 in this Redbook explains how to set up dual HMCs:

“A dual HMC is a redundant Hardware Management Console (HMC) management system that provides flexibility and high availability. When two HMCs manage one system, they are peers, and each can be used to control the managed system. One HMC can manage multiple managed systems, and each managed system can have two HMCs.

“A redundant remote HMC configuration is very common. When customers have multiple sites or a disaster recovery site, they can use their second HMC in the configuration remotely over a switched network…  The second HMC can be local, or it can reside at a remote location. Each HMC must use a different IP subnet.

“You need to consider the following points:

* Because authorized users can be defined independently for each HMC, determine whether the users of one HMC should be authorized on the other. If so, the user authorization must be set up separately on each HMC.

* Because both HMCs provide Service Focal Point and Service Agent functions, connect a modem and phone line to only one of the HMCs and enable its Service Agent. To prevent redundant service calls, do not enable the Service Agent on both HMCs.

* Perform software maintenance separately on each HMC, at separate times, so that there is no interruption in accessing HMC function. This allows one HMC to run at the new fix level, while the other HMC can continue to run at the previous fix level. However, the best practice is to upgrade both HMCs to the same fix level as soon as possible.

“The basic design of HMC eliminates the possible operation conflicts issued from two HMCs in the redundant HMC configuration. A locking mechanism provided by the service processor allows interoperation in a parallel environment. This allows an HMC to temporarily take exclusive control of the interface, effectively locking out the other HMC. Usually, this locking is held only for the short duration of time it takes to complete an operation, after which the interface is available for further commands.

“Both HMCs are automatically notified of any changes that occur in the managed systems, so the results of commands issued by one HMC are visible in the other. For example, if you choose to activate a partition from one HMC, you will observe the partition going to the Starting and Running states on both HMCs. The locking between HMCs does not prevent users from running commands that might seem to be in conflict with each other. For example, if the user on one HMC activates a partition, and a short time later a user on the other HMC selects to power the system off, the system will turn off. Effectively, any sequence of commands that you can do from a single HMC is also permitted when it comes from redundant HMCs.

“For this reason, it is important to consider carefully how to use this redundant capability to avoid such conflicts. You might choose to use them in a primary and backup role, even though the HMCs are not restricted in that way. The interface locking between two HMCs is automatic, usually of short duration, and most console operations wait for the lock to release without requiring user intervention.”

Although I typically see dual HMCs in larger enterprises, size isn’t a factor. Any type of environment can benefit from this configuration option.
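
One simple check after bringing up a second HMC is to run the same query on both consoles and make sure they agree on the managed systems they can see (a sketch; adjust the field list to taste):

lssyscfg -r sys -F name,type_model,serial_num,state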

A Cluster of Cluster Resources

Edit: Some links no longer work.

Originally posted October 9, 2012 on AIXchange

I don’t know who at IBM developerWorks wrote this document, but I really like it. By following along with the sections as outlined, you’ll learn how to define and configure PowerHA SystemMirror for AIX.

The first section includes references to a good introductory article, while section two focuses on infrastructure planning and configuration, and section three has an IBM Information Center document on smart assists. The final three sections cover networks, resources and resource groups; creating a cluster; and testing a configured cluster. Finally, there’s a cheat sheet from Christian Pruett and information about PowerHA training.

Information from the opening chapters of this Redbook is noted throughout the document, plus there are links to presentations, including a couple from IBMer Alex Abderrazag. I appreciate the inclusion of training information and really like how the document is organized overall.

Here’s another resource for PowerHA users: the new IBM draft Redbook, “IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update.” This was just released in September.

Just a quick point about SystemMirror: It relies on a feature under the covers called Cluster Aware AIX, which is integral to managing shared storage pools in VIOS. I’ll write more about this soon.

As for the SystemMirror Redbook itself, one thing that stands out to me is the set of step-by-step instructions for setting up a cluster in Chapter 3. There’s also this, which, quite honestly, made me chuckle:

“During the developing of this book, the repeating question was: what is the recommended virtual Ethernet configuration? The authors all had their own opinion, and there were many long debates on this topic. Finally we agreed that there is no specific or recommended virtual Ethernet configuration because all redundant configurations should work well in a PowerHA environment.”

To me this paragraph nicely sums up our profession. We all have strong beliefs about how to configure systems, and we’re often pretty vocal in pointing out the distinct advantages of our own particular way of doing things.

And it turns out that, despite their disclaimer, the authors managed to get together and settle on these recommendations for configuring virtual Ethernet:

* “Two Virtual IO servers per physical server.
* Use the servers’ already configured Virtual Ethernet settings because no special modification is required. In case of a VLAN tagged network, the preferred solution is to use SEA failover, otherwise use the network interface backup.
* One client side virtual Ethernet interface simplifies the configuration; however, PowerHA misses network events. (This can be remedied by applying APAR IV14422 and configuring your /usr/es/sbin/cluster/netmon.cf file as described in section 3.8.2)

* Two virtual Ethernet interfaces on the cluster LPAR because this enables PowerHA to receive the network events. This results in a more stable cluster.”
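
For reference, the netmon.cf entries mentioned in the third recommendation generally look something like this (a sketch based on my reading of the APAR documentation; en0 and the addresses are placeholders for your own interface and for gateways or other always-reachable hosts):

!REQD en0 192.168.100.1
!REQD en0 192.168.100.2

Each line tells PowerHA that en0 should only be considered up if at least one of those targets can be reached through it.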

I do encourage you to check out all of these resources. With freely available information like this, learning about building a PowerHA cluster is easier than ever.

Reminder: IBM had a big announcement last week featuring POWER7+ hardware and software. I covered the many new solutions and features in this special post.

POWER7+ Systems Unveiled

Edit: Some links no longer work.

Originally posted October 3, 2012 on AIXchange

If you’re planning to upgrade your enterprise Power hardware in the near future, at this point you should focus on IBM’s POWER7+ systems.

On Wednesday IBM announced new versions of its enterprise Power Systems models, along with new software: AIX 7.1 TL2, AIX 6.1 TL8, IBM i 7.1 TR5, Linux RHEL 6.3, SLES 11 SP2 and PowerVM 2.2.2.

General availability for the software is slated for Oct. 12. The 770 and 780 hardware GA is Oct. 19. IBM i 6.1.1 support of POWER7+ on the 770 and 780 is expected Nov. 9. GA for model upgrades to the POWER7+ 770 and 780 — along with new firmware for the Power 795 — is Nov. 16. Other AIX 7.1 and 6.1 TL levels and VIOS 2.2.1.5 support for POWER7+ 770 and 780 are expected on Dec. 19.

Featuring a more densely packaged chip that gives off less heat and uses less power, POWER7+ systems offer 20-40 percent more performance per core. Another way to consider the progression is through these additional numbers from IBM: POWER7+ performance per watt can be up to five times greater than what was offered with POWER6+, and more than 10 times that of POWER5+.

The L3 cache size has more than doubled to 10 MB (vs. 4 MB in POWER7). POWER7+ processors run at a higher frequency, and include an on-board memory compression accelerator that allows active memory expansion (AME) to run with significantly reduced CPU overhead.

Model numbers have not changed with this announcement. We’re still talking about the Power 770 and 780, but now we’re looking at the “D” machine types in each family.

The new Power 770 with POWER7+ processors is the 9117-MMD. This server allows you to have up to 64 cores running at 3.8 GHz, or up to 48 cores running at 4.2 GHz. Comparatively, the POWER7 770 can run 64 cores at 3.3 GHz, or 48 cores at 3.7 GHz. As noted, the 9117-MMD allows up to 20 LPARs per core (up to 1,000 on the frame) with up to 16 concurrent live partition mobility operations. POWER7 systems support 10 LPARs per core and eight concurrent live partition mobility operations.

Those now using POWER6 570 9117-MMA, POWER7 770 9117-MMB and POWER7 770 9117-MMC systems will be able to upgrade to the new POWER7+ Model 770 (9117-MMD) system. From the announcement letter:

“You can upgrade the 9117-MMA, 9117-MMB, or 9117-MMC with 9117-MMD processors. For upgrades from 9117-MMA, 9117-MMB or 9117-MMC systems, IBM will install new CEC enclosures to replace your current CEC enclosure.”

The Power 780 also has the same 32 nm POWER7+ core with 10MB L3 cache per core. Where you could previously max out your 780 at 96 cores running at 3.44 GHz, or 64 cores running at 3.92 GHz, the 9179-MHD POWER7+ 780 can have a maximum of 128 cores running at 3.72 GHz, or 64 cores running at 4.42 GHz. As with the 770, the 780 can have 20 LPARs per core and run up to 16 concurrent live partition mobility operations. According to the information I saw, existing POWER6 and POWER6+ 570 9117-MMA and POWER7 780 9179-MHB and 9179-MHC systems can be upgraded to the new Power 780 (9179-MHD).

The 795 servers aren’t being refreshed with POWER7+ processors, but they are part of this announcement. The 795 will have a new 256 GB memory feature with four 64 GB DDR3 DIMMs, so it can support up to 16 TB of memory on the frame. The 795 will also allow 20 LPARs per core with a firmware update. In addition, the 795 has two new PCIe Gen2 GX++ adapters (10G fibre channel card and 10G FCoE/CN) that plug directly into the GX++ slot on the processor card. Each card combines a GX adapter, GX cables, a PCIe I/O drawer and a PCIe adapter into one new 2-port GX hybrid adapter, eliminating the need for a separate drawer and the cables. Up to three adapters can be plugged into a processor book, and they can be housed in any of the four GX slots. Gen1 and Gen2 GX adapters can function in the same processor book.

Here are some other announcement details.

* Elastic Capacity on Demand will enhance On/Off COD. Only two keys — one for processors and one for memory — will be needed to enable 90 days of available but inactive resources.

* Power System Pools will be available for 780 and 795 systems. Elastic COD resources may be purchased and billed for a pool. Rather than have some COD available on one system but not on another, we can now create pools of high-end Power Systems servers that allow sharing of Elastic CoD processor and memory credits. This capability can also be used in support of planned maintenance events. 

While a pool can have up to 10 Power 780 and 795 systems, Power System Pools have two limitations:

1) Fifty percent of the processors in the pool must be active.

2) Although the servers in the pool can be located in multiple data centers, AIX and IBM i cannot be mixed in the same pool.

Note that PowerVM and Electronic Service Agent are needed to enable this functionality. 

* The dynamic platform optimizer (DPO) is a new systems-tuning tool that optimizes processor and memory affinity in virtualized environments. The system can assess the level of affinity on a partition by partition basis. The system and workloads continue to run while the frame adjusts workload placement in the background to optimize performance without requiring additional admin interaction. (Note: This is not the same as the active system optimizer. ASO runs inside of AIX on your LPAR, while DPO runs at the hypervisor level and is designed primarily to optimize your LPAR’s physical cores and memory.)
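
From the HMC command line, the DPO interface looks roughly like this (a sketch based on the announcement material, so double-check the syntax on your HMC level; the managed system name is a placeholder):

lsmemopt -m Server-9179-MHD-SN1234567 -o currscore        # current affinity score for the frame
lsmemopt -m Server-9179-MHD-SN1234567 -o calcscore        # what the score could be after optimization
optmem -m Server-9179-MHD-SN1234567 -o start -t affinity  # start the optimizer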

* AIX 7.1 TL2 and 6.1 TL8 will ship with the new POWER7+ systems this month. In the same timeframe we’ll also see the appropriate service packs (SPs) for AIX 7.1 TL1, AIX 7.1 TL0, AIX 6.1 TL6 and AIX 6.1 TL7 to enable POWER7+ support. In addition, there will be an LPAR-to-WPAR migration tool, which, as you can imagine, helps migrate workloads from an LPAR to a WPAR. If you have an AIX 5.3 service extension, IBM’s statement of direction indicates a TL12 SP to enable POWER7+ processor support.

* The new levels of AIX allow for exploitation of POWER7+ crypto offload accelerators, which enable encrypted filesystems and IPsec. According to IBM’s announcement, “this provides cryptographic engines that may relieve the POWER7+ processor from the performance-intensive cryptographic algorithms of AES and SHA. This can offload work from processor cores from doing these tasks and improve performance of those functions.”

* POWER7+ also includes a hardware random number generator and enhanced single precision floating point performance. High quality random numbers help improve security and offload cryptographic CPU cycles from the processor.

* A new virtual processor management option, scaled throughput mode, is designed to improve core utilization by driving more work through the SMT threads of each virtual processor before additional virtual processors are unfolded.

* The AIX Enterprise Edition bundle will now include PowerSC and SmartCloud Entry. Also included are AIX 6.1 or 7.1, WPAR Manager, IBM Tivoli Monitoring, IBM Systems Director Standard, VMControl Enterprise, Network Control, PowerSC, SmartCloud Entry and Storage Control. IBM has made some changes to Enterprise Edition to remove some of the infrequently used items and make room for the newer offerings. If you currently have Enterprise Edition, you’ll receive any products you don’t currently have at no additional charge.

* PowerVM 2.2.2, also announced on Wednesday, is set for a Nov. 9 GA. This will allow for the support of 20 LPARs per core on the 770, 780 and 795 systems. VIOS performance advisor updates and live partition mobility improvements are, according to IBM, expected to double the concurrency and improve LPAR movement performance by as much as three times.

* PowerHA SystemMirror 7.1 Enterprise Edition, also announced on Wednesday, is set for a Nov. 16 GA. We’ll have to get used to some new concepts, including stretched clusters for campus or metro deployments, and linked clusters, which enable two sites with independent networks across campus or across the country. I’ll cover this in detail in the near future.

* Finally, a new HMC, the 7042-CR7, will run V7R7.6.0 code and support blade servers. By running the new HMC code along with new 7.6 firmware and newer AIX levels, you’ll be able to set a new, lower minimum CPU for your LPARs (0.05 instead of 0.10). The new code level also allows HMC to support more current web browsers. HMC V7R7.6.0 is the last code level that will run on older models (7310-C04, 7315-CR2 and 7310-CR2). (Note: IBM recommends that if your HMC manages more than 254 partitions, or if you use IBM Systems Director to manage your HMC, at least 3GB of RAM is needed. The HMC should also be a rack-mount CR3 or later.)

For other announcement coverage, check out this IBM Systems Magazine Web Exclusive. And here’s Jay Kruemcke’s take.

Overall, I’m impressed with this announcement. Obviously, though, there’s a ton of material here. Please post any questions in Comments, and I’ll do my best to track down detailed answers.

The Case for Documentation

Edit: I also like HMC Scanner output.

Originally posted October 2, 2012 on AIXchange

Once I was called in to help a customer that had lost its AIX support staff. I won’t go into the details; just understand that in this case, quite a bit of knowledge vanished overnight and had to be re-created.

We had to figure out passwords and LPAR configurations. Multiple profiles were associated with each LPAR, and there was no one who could answer our questions. The only way to determine how the profiles were created — or whether they were even still active — was to go into each one and look for recent updates. From there, we were left to make educated guesses.

We had to figure out how to connect to the HMC, both locally and remotely. We had to verify network addresses for the HMC as well as the various LPARs. We had to find the user IDs on the various systems that had escalated authority.

Physical connectivity was another puzzle. We found that there were two HMCs in a rack, but only one monitor and keyboard. It turns out the customer was using a KVM in the environment and employed a non-standard way of switching between the different sessions.

We had to figure out how to connect to the storage array, and then determine how the storage was allocated to the servers.

Luckily for us, no lasting damage was done, and we were able to recover the passwords and get into the systems. Of course, it did take some time and effort. We didn’t have the luxury of being able to check a runbook, wiki or some other document.

When you build and maintain your own systems, you “just know” all of this information. When you’re a consultant like me and you come into an environment cold, there’s generally someone who can give you this information. Not that it was available in this case, but even documentation can be tricky. Don’t get me wrong, documentation is very valuable — provided it’s current. But outdated documentation is practically worthless, if not actually harmful. It can lead to bad assumptions, which can lead to bad actions, which generally result in system outages.

One tool I rely on in situations like this is the HMC sysplan, which provides a snapshot of a machine’s configuration. I also run scripts on all machines so I can have current output. That’s the best way to identify what should be on a machine (or at least, what was on a machine).
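
If you have nothing in place today, even a crude collection script beats relying on memory. Here’s a minimal sketch (the output path and command list are just starting points to adapt):

#!/usr/bin/ksh
# capture basic configuration on each AIX LPAR to a dated file
out=/var/adm/config.$(hostname).$(date +%Y%m%d)
{
  prtconf          # system summary: model, memory, firmware
  lsdev -C         # configured devices
  lspv             # physical volumes
  lsvg -o          # active volume groups
  ifconfig -a      # network interfaces
  lsfs             # filesystems
} > $out 2>&1

On the HMC side, mksysplan -f <name>.sysplan -m <managed system> captures the sysplan I mentioned, and HMC Scanner output is another good snapshot to keep around.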

Ultimately, though, this is yet another example of why maintaining current documentation is so vital. What kind of critical business information exists only in the memories of your IT staffers? If you were hit by a train, what would be lost?

So what do you do to document the needs and inner-workings of your environment? How frequently do you update this information?

Cache on Hand

Edit: Link no longer works.

Originally posted September 25, 2012 on AIXchange

Chris Gibson tweeted a link to a great read that will help you get your head around the inner-workings of your Power hardware.

Here’s a snippet from the article, “Under the Hood: Of POWER7 Processor Caches.”

“Most of us have a mental image of modern computer systems as consisting of many processors all accessing the system’s memory. The truth is, though, that processors are way too fast to wait for each memory access. In the time it takes to access memory just once, these processors can execute hundreds, if not thousands, of instructions. If you need to speed up an application, it is often far easier to remove just one slow memory access than it is to find and remove many hundreds of instructions.

“To keep that processor busy — to execute your application rapidly — something faster than the system’s memory needs to temporarily hold those gigabytes of data and programs accessed by the processors AND provide the needed rapid access. That’s the job of the Cache, or really caches. Your server’s processor cores only access cache; [the cores] do not access memory directly. Cache is small compared to main storage, but also very fast. The outrageously fast speed at which instructions are executed on these processors occurs only when the data or instruction stream is held in the processor’s cache. When the needed data is not in the cache, the processor makes a request for that data from elsewhere, while it continues on, often executing the instruction stream of other tasks. It follows that the cache design within the processor complex is critical, and as a result, its design can also get quite complex.”

The author goes on to describe the cache array, the store-back cache, the L3 cast-out cache and finally, cache coherence:

* “Processor cache holds sets of the most recently accessed 128-byte blocks. You can sort of think of each cache as just a bucket of these storage blocks, but actually it is organized as an array, typically a two dimension array.”

* “So far we’ve outlined the notion of a block of storage being ‘cache filled’ into a cache line of a cache. Clearly, when doing store instructions, there is a need to write the contents of some cache lines back to memory as well.”

* “For POWER7 processors, a storage access fills a cache line of an L2 cache (and often an L1 cache line). And from there the needed data can be very quickly accessed. But the L1/L2 cache(s) are actually relatively small. [Technical Note: The L2 of each POWER7 core only has about 2000 cache lines.] And we’d rather like to keep such blocks residing close to the core as long as possible. So as blocks are filled into the L2 cache, replacing blocks already there, the contents of the replaced L2 are ‘cast-out’ from there into the L3. It takes a bit longer to subsequently re-access the blocks from the L3, but it is still much faster than having to re-access the block from main storage.”

* “This is a Symmetric Multi-processor (SMP). Within such multi-core and multi-chip systems, all memory is accessible from all of the cores, no matter the location of the core or memory. In addition, all cache is what is called ‘coherent’; a cache fill from any core in the whole of the system is able to find the most recent changed block of storage, even if the block exists in another core’s cache. The cache exists, but the hardware maintains the illusion for the software that all accesses are from and to main storage.”

Much more is covered in this article, including tips you may want to consider as a Power programmer. I encourage you to read the whole thing.

Running AIX 5.3 on POWER7 Hardware

Edit: Anyone still running 5.3? Some links no longer work.

Originally posted September 18, 2012 on AIXchange

I was recently asked about potential issues running AIX 5.3 with the latest fixes on POWER7 hardware with dedicated adapters. Somehow this person had gotten the idea that AIX 5.3 could only handle the underlying hardware and adapters by running in a virtualized environment using VIO servers.

Perhaps this person thought that since AIX 5.3 needs to run in POWER6 mode, the newer physical adapters wouldn’t be supported with an old version of the operating system. These days I typically run everything in a virtualized environment using VIO servers, and just off the top of my head I can’t recall the last time I needed to dedicate an adapter to an AIX 5.3 LPAR on POWER7 hardware.

However, rather than shoot my mouth off, I quickly checked some Redbooks and asked some trusted resources. They confirmed what I figured: AIX 5.3 can most definitely handle all of the latest adapters you can throw at it. Absolutely.

This highlights one of the many strengths of using IBM solutions. From the hardware to the firmware to the hypervisor to the operating systems, one vendor owns the stack. Thus, IBM can ensure that any new hardware and adapters will continue to run with its legacy operating systems.

Speaking of AIX 5.3, remember that even though it’s no longer in standard support, it’s still eligible for extended support contracts. Another way to get support for your legacy OS is through AIX 5.3 versioned WPARs (see here and here). With this option you can run in POWER7 mode with four threads. IBM also provides bug fixes and how-to support if you choose to run 5.3 WPARs. Obviously, running in POWER7 mode should give you a nice performance boost over using two threads in POWER6 mode.
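
Creating a versioned WPAR from an existing AIX 5.3 mksysb is surprisingly simple. A rough sketch, assuming the versioned WPAR support filesets are installed and the mksysb path is your own:

mkwpar -n legacy53 -C -B /export/mksysb/aix53_app.mksysb     # create the versioned WPAR from the image
startwpar legacy53                                           # then start it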

That IBM makes these options available is all the more impressive when you realize that AIX 5.3 debuted in 2004. You can see how someone would assume that current hardware couldn’t possibly support an eight-year-old operating system.

An item from the above link:

“As new hardware becomes available over the next two years, we will provide new hardware toleration when possible. This will not include new hardware that requires architectural changes.”

Also:

“Some people have asked me about when they should use the AIX 5.3 service extension versus using the AIX 5.3 Workload Partitions product. The answer is pretty easy – if you intend to migrate to a later release of AIX, then use the service extension to bridge you to that point. If however, you believe that you will need to run AIX 5.3 indefinitely for a particular application, then the AIX 5.3 WPARs is the better choice.”

So where are you on this? Have you already migrated all of your systems? Do you have extended support? Do you use the AIX 5.2 or 5.3 versioned WPAR offerings? Do you even know about them?

Looking Beyond Performance

Edit: Still thought provoking.

Originally posted September 11, 2012 on AIXchange

I recently attended another IBM technical briefing. As always, it was time well spent. This briefing included a keynote from IBMer Brad McCredie, whose ideas really resonated with me. Basically, Brad said the golden age of computing may already be behind us. No longer can increasing clock speeds and raw performance improvements dominate our computing hardware purchasing decisions.

Brad used the analogy of buying a new car. In his example, the model’s horsepower and track times hadn’t changed in 30 years. There were no performance improvements whatsoever. Still, there was progress. The new models had better brakes, better seatbelts, a more comfortable interior and significantly higher gas mileage. There was built-in navigation, a better sound system and even lighted cup holders. In short, while performance hadn’t advanced, the overall driving experience had been transformed. Now it’s the efficiency and creature comforts — not the performance — that catch our eye.

Something similar could be said about the airline industry. Brad took us from the Wright brothers to the Concorde. He noted that planes don’t fly any faster now than they did a generation ago. And yet, Boeing 787s are sold out through the end of the decade. The new planes promise greater fuel efficiency and usability.

Brad then moved on to televisions. With flat-screen prices coming way down, manufacturers no longer emphasize bigger. Instead, they develop features like 3D TV, Internet-ready capabilities and amazing built-in sound systems.

This is now happening in computers as well. We often look at factors beyond raw performance. We want to improve our TCO. We need to manage unplanned outages. We need to run our workloads on fewer cores. We need efficiency.

Obviously, some industries still emphasize raw computing performance. He pointed to the financial industry, where every little bit of speed can help make money. Most customers, however, want to save on power and cooling costs. Or maybe their needs center on RAS and virtualization management features, or the capability to rapidly provision servers or consolidate storage and network infrastructure rather than manage two different networks. We don’t strictly focus on speeds and feeds and raw performance.

So what goes into your computing purchasing decisions these days?

Using mksysb for NIM Backups

Edit: This should still be relevant. Link no longer works.

Originally posted September 4, 2012 on AIXchange

Recently I received a reader question that prompted this email exchange with IBM network installation management (NIM) expert Steve Knudson.

Reader: I currently have one NIM server that I use to recover all of our AIX systems. We will be moving from P5 to P7 hardware, but I will no longer have a tape drive to back up my NIM server to. Nor will I have a DVD writable drive. What I need to know is how can I back up my NIM server to a mksysb file and recover my NIM server from that. I believe that I can’t use the NIM master backup to recover itself to a different LPAR somewhere, correct? Am I going to need an alternate NIM server to recover my NIM master? Do you have a documented procedure somewhere on how to accomplish this? Any help that you could provide on this matter would be greatly appreciated.

Steve’s reply: The way I would approach this….

1) Take a backup of the NIM database into a file in rootvg on the NIM master.
2) Collect mksysb of master, to itself.
3) NIM restore the mksysb to the new POWER7 LPAR, and in the process, set this question to:

Remain NIM client after install? [no]

This eliminates the check for, and removal of, the bos.sysmgt.nim.master and nim.spot filesets.
4) After the restore, copy to the POWER7 the various lpp_source, scripts, bosinst_data, resolv_conf resources you want to preserve.
5) On the POWER7 LPAR, run the nim_master_recover command to restore the NIM database on the new LPAR. It will likely look for the copied resources in the exact path and filenames they had on the P5 LPAR.

This restores a backup of the NIM database to a different machine and updates the database to reflect this change.
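
(As a side note, steps 1 and 2 boil down to just a couple of commands. This is a sketch only, and the backup path is an example:

# 1) back up the NIM database to a file in rootvg
#    (smitty nim -> Perform NIM Administration Tasks ->
#     Backup/Restore the NIM Database -> Backup the NIM Database)
# 2) collect a mksysb of the master to a local file
mksysb -i -e /export/mksysb/nim_master.mksysb
)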

If you were planning on changing IP and hostname on the P5, and then setting IP and hostname of P7 LPAR to what the P5 had, you might be able to just restore the NIM database backup on the P7, without nim_master_recover.

Customer reply: Thanks for your response. I have a couple of questions though. So I would need to specify a different hostname and IP address to recover to on the POWER7? Then once I’m ready to shut it down (LPAR on P5), after copying all the lpp_source stuff and scripts, can I rename the POWER7 LPAR with the same hostname and ip address as the one that was on the P5?

I also assume I wouldn’t do the nim_master_recover until the P7 LPAR is updated with the original hostname and ip address?

Steve’s reply: If you want to move the hostname and IP to the new P7 NIM master LPAR, I would restore the mksysb, change hostname and IP on original P5 and then set hostname and IP on the new P7 LPAR. After that, I would restore the NIM database on the P7 and not do nim_master_recover.

Customer reply: How would I restore the master NIM database?

Steve’s reply: smitty nim > Perform NIM administration tasks > Backup/Restore the NIM Database.

Backup/Restore the NIM Database.

Move cursor to desired item and press Enter.

Backup the NIM Database.
Restore the NIM Database from a Backup.

This is where you collect the backup of the NIM database on P5 to start, and also where you’ll restore the NIM database on the P7.

Thus ends the exchange. So have you tried this? Has it been successful? Please share your experiences in Comments.

Training Without the Travel

Edit: Some links no longer work.

Originally posted August 28, 2012 on AIXchange

In February I discussed some great PowerHA V7.1 resources, including virtual user group training and an IBM Redbook. In a follow-on post, I pointed to this training session replay (“Configuring PowerHA SystemMirror V7.1 for AIX Cluster”). This video covers some of the new features in the latest version and also includes a live demo of the instructor setting up a cluster.

You may be unaware that IBM has actually posted several other training session replays on YouTube. They cover topics like IBM Systems Director 6.2 for Power Systems, IBM XIV, BRMS on System i, IBM Storwize V7000, AIX CPU performance management and advanced PowerVM and performance. The full list is available here.

These freely available videos are set up much as an actual IBM training course would be, and I imagine that IBM hopes you like what you see and will then be interested in signing up for a course of a longer duration. Here’s how IBM puts it:

“Test drive classes do not replace a fee-based class, which is typically 3-5 days in length, provides extensive technical depth and includes hands-on labs. Instead, test drives give you a snapshot of key technical fundamentals using the very popular ILO format.

“Instructor-led online courses (ILO) are taught by a live instructor on a specific day and time. Most courses are exactly the same as their classroom equivalent, including the course duration, content and student materials. Here is what you need to know:

•    You receive the course materials in advance.
•    To participate on the day of the class, all you need is a broadband Internet connection. This allows you to connect to a virtual classroom where you can interact directly with the instructor and your peers.”

Obviously, many employers are reluctant to budget for IT training and the travel and expenses that go with it. Internet-delivered education like IBM’s ILO is a less-costly alternative, though even then some of us can find it tough to break away from our jobs and remain focused on a class while in the day-to-day office environment. On the other hand, I do know of people who’ve successfully dealt with these potential distractions by taking the classes in office conference rooms or even from home.

Have you taken IBM’s ILO courses or other Internet-delivered education? Please share your experiences in Comments.

PowerVM Best Practices, Part Two

Edit: I still love Redbooks. Part 2.

Originally posted August 21, 2012 on AIXchange

As I said last week, the “IBM PowerVM Best Practices” Redbook has a lot of valuable information. This week I’ll cover the final three chapters of this publication.

Chapter 5 notes

Storage, including virtual SCSI and virtual Fibre Channel, is covered. The authors also address the issue of whether to boot from internal or external disk:

“The best practice for booting a [VIO server] is using internal disks rather than external SAN storage. Below is a list of reasons for booting from internal disks:

* The [VIOS] does not require specific multipathing software to support the internal booting disks. This helps when performing maintenance, migration, and update tasks.
* The [VIOS] does not have to share Fibre Channel adapters with virtual I/O clients, which helps in the event a Fibre Channel adapter replacement is required.

* If virtual I/O clients have issues with virtual SCSI disks presented by the [VIOS] backed by SAN storage, the troubleshooting can be performed from the [VIOS].”

Virtual SCSI and NPIV can be mixed within the same virtual I/O client. Booting devices or rootvg can be mapped via virtual SCSI adapters; data volumes can be mapped via NPIV (section 5.1.3). The pros and cons of mixing NPIV and virtual SCSI are illustrated in table 5-1.

A chdev should be run on all Fibre Channel devices (section 5.1.4):

$ chdev -dev fscsi0 -attr fc_err_recov=fast_fail dyntrk=yes
fscsi0 changed

“Changing the fc_err_recov attribute to fast_fail will fail any I/Os immediately if the adapter detects a link event, such as a lost link between a storage device and a switch. The fast_fail setting is only recommended for dual [VIOS] configurations. Setting the dyntrk attribute to yes allows the [VIOS] to tolerate cable changes in the SAN.”
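
A quick way to confirm the change took effect, from the padmin shell (fscsi0 is just the example device from above):

$ lsdev -dev fscsi0 -attr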

The authors recommend exporting disk devices backed by SAN storage as physical volumes. In environments with a limited number of disks, storage pools should be created to manage storage from the VIOS (section 5.2.1).

Virtual adapter considerations and naming conventions are covered in section 5.2.2. The pros and cons of using logical volumes for disk mappings versus mapping entire disks are considered in section 5.2.3. This section also tells us:

“Virtual tape devices are assigned and operated similarly to virtual optical devices. Only one virtual I/O client can have access at a time. It is a best practice to have such devices attached to a [VIOS], instead of moving the physical parent adapter to a single client partition.

“When internal tapes and optical devices are physically located on the same controller as the [VIO server’s] boot disks, it is a best practice to map them to a virtual host adapter. Then, use dynamic logical partitioning to assign this virtual host adapter to a client partition.”

Section 5.2.4 covers configuring the VIOC with Virtual SCSI and lists some recommended tuning options for AIX. Sections 5.3 and 5.4 cover shared storage pools and NPIV, respectively:

“NPIV is now the preferred method of providing virtual storage to virtual I/O clients whenever a SAN infrastructure is available. The main advantage for selecting NPIV, compared to virtual SCSI, is that the [VIOS] is only used as a pass-through to the virtual I/O client virtual Fibre Channel adapters. Therefore, the storage is mapped directly to the virtual I/O client, with storage allocation managed in the SAN. This simplifies storage mapping at the [VIOS].”

Chapters 6 and 7 notes

Chapter 6 covers performance monitoring, highlighting tools and commands that enable both short- and long-term performance monitoring.

Chapter 7 covers security and advanced PowerVM features, including default open ports on the VIOS like FTP, SSH, telnet, rpcbind and RMC. The authors recommend disabling FTP and telnet if they’re not needed (section 7.1.2). Active memory sharing and active memory duplication are covered in sections 7.4 and 7.4.3.

PowerSC and Live Partition Mobility are covered in sections 7.2 and 7.3. LPM storage considerations are listed in section 7.3.3:

“* When configuring virtual SCSI, the storage must be zoned to both source and target [VIO servers]. Also, only SAN disks are supported in LPM.

* When using NPIV, confirm that both WWPNs on the virtual Fibre Channel adapters are zoned.
* Dedicated I/O adapters must be deallocated before migration. Optical devices in the [VIOS] must not be assigned to the virtual I/O clients that will be moved.
* When using virtual SCSI adapters, verify that the reserve attributes on the physical volumes are the same for the source, and destination [VIO servers].
* When using virtual SCSI, before you move a virtual I/O client, you can specify a new name for the virtual target device (VTD) if you want to preserve the same naming convention on the target frame. After you move the virtual I/O client, the VTD assumes the new name on the target [VIOS]. …”

Section 7.3.4 lists LPM network considerations:

“* Shared Ethernet Adapters (SEA) must be used in a Live Partition Mobility environment.
* Source and target frames must be on the same subnet to bridge the same Ethernet network that the mobile partitions use.
* The network throughput is important. The higher the throughput, the less time it will take to perform the LPM operation. For example, if we are performing an LPM operation on a virtual I/O client with 8 GB of memory:

– A 100 MB network, sustaining a 30 MB/s throughput, takes 36 minutes to complete the LPM operation.
– A 1 GB network, sustaining a 300 MB/s throughput, takes 3.6 minutes to complete the LPM operation.”
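
Incidentally, those times only work out if you read the throughput figures as megabits per second rather than megabytes:

8 GB of memory is roughly 65,536 megabits
65,536 Mb / 30 Mb/s is roughly 2,185 seconds, or about 36 minutes
65,536 Mb / 300 Mb/s is roughly 218 seconds, or about 3.6 minutes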

PowerVM Best Practices Redbook

Edit: I still love Redbooks.

Originally posted August 14, 2012 on AIXchange

Occasionally I like to highlight IBM Redbooks that provide particularly valuable information to AIX pros. The new publication, “IBM PowerVM Best Practices,” is the latest example.

The version I viewed was a draft document (“Redpiece”) dated July 2, 2012. If it hasn’t yet been finalized, it should be soon. While a fairly short read at 118 pages, this publication is packed with relevant information that should be easily understood.

Chapter 1 notes

The authors remind us of the features and benefits of running PowerVM (section 1.1). Current VIO server minimums include 30 GB of disk, a storage adapter, 768 MB of memory and 0.1 processor (section 1.2). It’s worth repeating that the memory figure is a minimum. In fact, 6 GB of memory and a core of entitled capacity for your VIOS is suggested (depending, of course, on your workload) in section 1.3.4.

The authors add: “Core speeds vary from system to system, and IBM is constantly improving the speed and efficiency of POWER processor cores and memory. Therefore, the above guidelines are, once again, a starting point. Once you have created your [VIOS] environment, including all the virtual clients, you should test and monitor it to make sure the assigned resources are appropriate to handle the load on the [VIOS].”

Section 1.3.7 addresses the question of whether to run a single VIOS or multiple VIO servers. The authors recommend the latter given the benefits of redundancy.

Section 1.3.9 covers slot numbering and naming conventions.

Chapter 2 notes

Installation, migration and configuration are covered. Included is a nicely documented section illustrating VIOS configuration on an HMC.

Two important reminders: First, set all physical adapters to desired, as setting them to required prevents dynamic LPAR (DLPAR) operations from working (section 2.1.4). Similarly, all virtual adapters should be set to desired if you’re planning on implementing live partition mobility (section 2.1.5).

The authors recommend using NIM to install the VIOS, citing this documentation.

Section 2.3.1 covers the need to perform regular maintenance. Firmware updates and patching should be done once a year. Two other points from the authors:

“When doing system firmware updates from one major release to another, always update the HMC to the latest available version first along with any mandatory HMC patches, then do the firmware. If the operating system is being updated as well, update the operating system first, then HMC code, and lastly the system firmware.

“In a dual HMC configuration always update both HMCs in a single maintenance window, or disconnect one HMC until it is updated to the same level as the other HMC.”

Section 2.3.3 has a checklist you can use to apply fix packs, service packs, and ifixes. Section 2.4 covers VIOS migration.

Chapters 3 and 4 notes

Administration and maintenance are covered, including the process of backing up and restoring the VIOS (section 3.1). Backing up the VIOS is a separate task from backing up your client LPARs, so be sure you are backing up both (section 3.1.1). The VIOS should be backed up to a remote file (section 3.1.2).
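
As a sketch of what that looks like from the padmin command line (the NFS mount point is just an example):

$ backupios -file /mnt/nim/vios1.mksysb -mksysb

Leaving off -mksysb and pointing -file at a directory produces a nim_resources.tar package instead, which can be used for an HMC-based restore with installios.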

Restoring the VIOS — from either the HMC or by using a NIM server — is discussed in section 3.1.4. In a D/R scenario, NIM is recommended (section 3.1.5).

Changes made with DLPAR operations should be saved by either manually making the changes to the profile, or by using save configuration from the HMC GUI (section 3.2).

Section 3.2.1 has a warning. This has actually tripped me up with NPIV in the past, so pay attention:

“Using the method of adding a virtual FC adapter to a virtual I/O client via a DLPAR operation, and then modifying the permanent partition profile, will result in a different pair of WWPNs between the active and saved partition profiles.

“When a virtual FC adapter is created for a virtual I/O client, a pair of unique WWPNs are assigned to this adapter by the Power Hypervisor. An attempt to add the same adapter at a later stage will result in the creation of another pair of unique WWPNs.

“When adding virtual FC adapters into a virtual I/O client via a DLPAR operation, use the ‘Overwrite existing profile’ option [Figure 3-4, page 44] to save over the permanent partition profile. This will result in the same pair of WWPNs in both the active and saved partition profiles.”

Section 3.3 covers the virtual media repository. Section 3.4 covers power server shutdown and startup.

Chapter 4 covers networking best practices, examining many different scenarios. You should read through them all.

There’s plenty more, so next week I’ll cover the rest of the material in this publication.

The 411 on a Client Hanging at LED 0611

Edit: Some links no longer work.

Originally posted August 7, 2012 on AIXchange

I was using a NIM server to load AIX, and it kept stopping at LED 0611. My first thought was to check this Redbook, but when I didn’t find anything there, I just did a web search. That led me to Steve Knudson’s AIX Network Installation Management Basics slides, which indicated that I was dealing with an NFS issue.

“Client hangs at LED 0611 – Indicates that some nfs resource that should be exported from the nim master is not available to the nim client. Mostly likely cause is that a parent directory was already exported to the client. the nim_bosinst process doesn’t always give errors when starting off this way. Check exports on the server, they should look something like this:

# exportfs
/export/aix433/images/mksysb.minimal -ro,root=nim6:,access=nim6:
/export/aix433/lppsource -ro,root=nim6:,access=nim6:
/export/aix433/spot/spot_aix433/usr -ro,root=nim6:,access=nim6:
/export/aix433/res/bundle.gnu -ro,root=nim6:,access=nim6:
/export/aix433/res/itab.mkmaster -ro,root=nim6:,access=nim6:
/export/aix433/res/bosinst.initial.B50 -ro,root=nim6:,access=nim6:
/export/nim/scripts/nim6.script -ro,root=nim6:,access=nim6:

“If the exports are substantially different from these, then power off client node. On the nim master run:

nim -o reset -aforce=yes clientnode
nim -Fo deallocate -asubclass=all clientnode
exportfs -ua
edit /etc/exports, remove inappropriate entries
nim_bosinst the client node again

“You can also get this led 611 when nim master resolves client hostnames in /etc/hosts.

“Entries there should be ipaddr fullyqualifiedname shortname.”

I went through these hints, reset the client, deallocated resources and then set up the client again. It still didn’t work. I went through /etc/hosts and tried a few permutations, and again, no luck.

At this point, I made a rookie mistake and changed a few different things at once. One of those changes fixed my issue, I’m just not sure which one. I do have it narrowed down to two possibilities. First, when I found this, I once again made sure that /etc/hosts had the correct information, and then I edited /etc/netsvc.conf and added hosts = local4,bind4.
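
For illustration (the hostname and address here are made up), the two pieces look roughly like this:

# /etc/hosts on the NIM master: address, fully qualified name, then short name
10.1.1.50   nimclient1.example.com   nimclient1

# /etc/netsvc.conf: resolve from local files first, then IPv4 DNS
hosts = local4,bind4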

That might have been the solution. Or, it could have come from this. The last entry in the thread says:

“Having spent the weekend making wild stabs in the dark I stumbled across the ‘solution.’ Please [provide] feedback as I’d like to understand what I’m doing.

“I used SMIT NIM/Perform NIM Administration Tasks/Configure NIM Environment Options. (This is sounding obvious, I know.) I then selected two options, the one that made the difference, I’m not sure. Maybe someone could point it out? Export NIM Resources Globally and Control Client Resource Allocation/Set Allocation Permissions for all clients.”

In any event, after making both of these changes, my NIM server was working again, and my mksysb booted and loaded as expected.

Interestingly enough, after I had this experience, one of my customers went through something similar. The customer addressed their issue by running these commands:

stopsrc -g nfs
cd /etc
rm -rf rmtab xtab
cd /var/statmon
rm -rf sm sm.bak state
startsrc -g nfs

Like many other things in AIX, there’s more than one way to tackle this issue. I’m not sure if there’s a preferred way. Perhaps some of you NIM experts could chime in on that in Comments. I’d love to hear from you.

Five Years In

Edit: Time sure flies.

Originally posted July 31, 2012 on AIXchange

Hard to believe, but it’s been five years since this blog debuted. Check the AIXchange archives and see for yourself.

Since the launch of AIXchange, I’ve written approximately 250 weekly posts — I don’t think we’ve missed more than a handful of weeks in five years. Sometimes I’m asked where I come up with the ideas and topics that I write about. The simple answer is I spend time at customer sites around the U.S., listening to the concerns of people working in their data centers. I talk to customers who are looking at different technologies and solutions. Sometimes I attend training or seminars and conferences.

I read technical manuals. I read other AIX blogs and follow AIX pros on Twitter. I respond to reader questions. In short, every day is a learning experience for me, and I try to share what I learn here.

I greatly enjoy doing this, and I plan to continue as long as you keep reading. Along those lines, what would you like to read about going forward? What information would help you deal with the challenges you face in your department and across your company? Are there topics I’ve overlooked? Are there things I’ve covered too much? Please share your views and ideas. That’s how it works, after all.

The Enduring Value of IRC

Edit: I still love IRC. Isn’t Slack just IRC with a GUI slapped on top of it?

Originally posted July 24, 2012 on AIXchange

If you’re old enough to remember Windows 3.11, you may recall the earliest days of IRC:

“Internet Relay Chat (IRC) is a protocol for real-time Internet text messaging (chat) or synchronous conferencing. It is mainly designed for group communication in discussion forums, called channels, but also allows one-to-one communication via private message as well as chat and data transfer, including file sharing.

“IRC was created in 1988. Client software is available for every major operating system that supports Internet access. As of April 2011, the top 100 IRC networks served more than half a million users at a time, with hundreds of thousands of channels operating on a total of roughly 1,500 servers out of roughly 3,200 servers worldwide.”

Back then, I spent considerable time on efnet and undernet IRC servers. I had a shell account running screen where I would run an IRC client, and connect to any IRC servers that I was interested in. Later on, when I worked at IBM, I took advantage of two IRC channels (#linux and #aix) that ran on the internal IBM network. It was a wonderful resource. When I couldn’t figure out something on my own, when I really needed help, I had quick and easy access to technical experts.

Would you like quick and easy access to technical experts? A few years ago I suggested that using IRC was one way users could keep current on AIX technology. Since writing that, social media has taken off — I know I’ve come to rely on Twitter. And of course today’s instant-messaging (IM) clients, with their video conferencing capabilities, are light years beyond IRC in terms of function. (And the IM evolution is certainly ongoing.)

 Nevertheless, I stand by what I said in 2008: IRC is a valuable tool. It still is.

So get an IRC client. Try connecting to irc.freenode.net. Then try connecting to ##aix. Now, I recommend lurking awhile before you jump into the channel and blurt out all your (tech-centric) problems. The effort though is worth it, because ##aix can put you in touch with a world of talented AIX pros who are willing to help.

I like IRC’s immediacy. You’re not continually checking forums to see if someone replied to the question you posted. This is realtime communications. Keep in mind, however, that the inhabitants of ##aix are essentially volunteers. Don’t get impatient if you don’t receive an immediate response. Wait a few minutes. Wait a half hour if need be. This isn’t anyone’s full-time job. Just remember that you’re accessing a worldwide audience of techies. No matter the hour, someone is out there. Odds are you’ll get help if you just wait. And even if there’s no one who can address your specific issue, sometimes it’s just nice to exchange ideas with another person who can provide a perspective different from your own.

I’ve always liked the IRC environment. I find comfort in the old-school feel. I enjoy the off-topic conversations. I like the relationships that develop over time — and if you do stay connected to a channel and get to know folks rather than simply join and bolt once your question’s been answered, you will make some new friends.

If you’ve never used IRC, I encourage you to give this “old” technology a try. And if you have or do use it, please share your thoughts in Comments.

Using the HMC Scanner

Edit: I just recommended this tool the other day, I still love it. This video is a nice demo as well. Updated the first link, the second link is the old download link that will be going away.

Originally posted July 17, 2012 on AIXchange

I recently downloaded the latest version of the HMC Scanner tool:

“HMC Scanner is a Java program that uses SSH to connect to an HMC, downloads the system configuration and produces a single Excel spreadsheet that contains the configuration of servers and LPARs. The result is a simple way to document configuration and to easily look at most important configuration parameters.

“Information is organized in tabs:

* System summary: name, serial number, cores, memory, service processor IP for each server.
* LPAR summary: list of all LPARs by server with status, environment, version, processor mode.
* LPAR CPU: processor configuration of each LPAR.
* LPAR MEM: memory configuration of each LPAR.
* Physical slots: list of all slots of each system with LPAR assignment, description, physical location and drc_index.
* Virtual Ethernet: network configuration of each virtual switch and each LPAR.
* Virtual SCSI: configuration of all virtual SCSI adapters, both client and server.
* VSCSI map: devices mapped by each VIOS to partitions.
* Virtual fibre: virtual fibre channel configuration of client and server with identification of physical adapter assigned.
* SW cores: LPAR and virtual processor pool configuration matrix to compute the number of software licenses. Simulation of alternative scenarios is possible.
* CPU pool usage: easy to read history of CPU usage of each system. Based on last 12 months of lslparutil data.
* Sys RAM usage: easy to read history of physical memory assignment to each LPAR. Based on last 12 months of lslparutil data.
* LPAR CPU usage: easy to read history of CPU usage of each LPAR. Based on last 12 months of lslparutil data.”

After downloading and opening the .zip file, I opened a DOS prompt and went to the directory I chose when I unzipped the file. Then I followed these directions:

“Unzip the downloaded file and edit the hmcScanner.bat or hmcScanner.ksh in order to make the BASE variable point to the directory where the ZIP file has been decompressed.”

Once I pointed the BASE variable to the directory containing my extracted files, I ran hmcScanner.bat and saw:

            HMC Scanner Version 0.3
            Missing or wrong arguments. Syntax is:

            hmcScanner.Loader <HMC name> <user> [-p <password>] [-dir <path>] [-perf <start> <end> <sampling>] [-readlocal] [-key file] [-stats]

        <path> is the directory where data will be stored. Default is current directory.
        <start> and <end> define the data collection retrieval interval. Syntax is: YYYYMMDD
        <sampling> d=daily data samples; h=hourly data samples
        -readlocal will force reading of existing local data without contacting HMC
        -key will use OpenSSH private key (default $HOME/.ssh/id_rsa)
        -stats will produce system statistics

I tried running hmcScanner.bat hmchostname hscroot -p password (obviously using a real hostname and password). It ran for a few minutes and started to gather data. After it finished, it generated an .xls file that had summary information about my HMC-managed systems. (Sample here.)

The system summary displays the serial number, the number of cores installed and active, memory installed and active, service processor IP information, etc. 

The next tab is labeled LPAR Summary. It provides a view of the LPARs on the machine and whether they’re running or not. It also displays the OS version and the mode the processors are running in.

The LPAR CPU tab displays processor information including entitlement, weight, minimums and maximums, and whether the processors are shared or capped.

The LPAR MEM tab displays similar information for memory statistics.

The Physical Slots tab shows which LPARs are assigned which physical cards on the machine.

The Virtual Ethernet tab displays the virtual slot numbers, whether the network adapters are trunked together, the virtual Ethernet MAC address, the virtual switch it’s attached to and the VLAN ID.

The Virtual SCSI tab displays the slots that are set up, and which slots they’re attached to.

The VSCSI Map tab shows how disks — including LUN IDs, backing devices, etc. — are mapped. There are also Virtual Fibre and SW cores tabs.

The HMC Scanner is useful on its own, but when coupled with the HMC system plans you can generate from your HMC (select System Plans/Create System Plan if you don’t already run these in your environment), it provides some fantastic resources to help you document your system.

The Compatibility of VIOS and IBM i

Edit: I still recommend checking that the sun still occupies the sky once in a while.

Originally posted July 10, 2012 on AIXchange

“What do you mean you can’t see the disk?”

I was surprised. I’d just run mkvdev -vdev hdisk2 -vadapter vhost0 on my VIO server and mapped hdisk2 to an IBM i client, and the guy doing the install couldn’t see the disk I just mapped to his LPAR.

I wondered why I was having VIOS manage these disks anyway. Why not give him the physical adapters and let him go nuts? There were two good reasons to go VIOS. First, multiple IBM i clients would be using the internal disk. Second, the company wanted to be positioned for the future. They planned to attach the IBM i partitions to a SAN.

I was working with a large RAID5 array made out of 6x600G internal SAS disks. It was nearly 3TB in size.

How do you create a big RAID array out of internal disks? In the VIOS I was logged in as padmin, and I’d assigned the storage adapters to the VIOS LPAR in the HMC GUI. In this instance we only had one VIOS, so it was fairly straightforward.

I ran diagmenu from the $ prompt and got into my normal diag screens that I’m familiar with in AIX.  From there I hit Enter, then I ran Task Selection>RAID array manager>IBM SAS Disk Array Manager.

The next step was to “Create an Array Candidate pdisk and Format to 528 Byte Sectors.” I selected all of the relevant hdisks I was planning to use in the array and let diag format and then delete them as hdisks. They would then be recreated as pdisks. (This took a while, but I needed a break. I stepped outside to confirm that the sun still occupies the sky.)

Once the formatting/deleting/recreating was complete, I was able to “Create a SAS Disk Array” from within the IBM SAS Disk Array Manager menu. I chose RAID5 and my stripe size, and the new hdisk was created. Now we’re back at the point where I mapped the hdisk that my IBM i admin couldn’t see. So I unmapped the single large disk and carved it up into smaller logical volumes. I created the datavg volume group on hdisk2 using this command:

mkvg -vg datavg hdisk2

Then I made some smaller logical volumes to present to IBM i using this command:

mklv -lv disk1 datavg 500G

Once I’d carved up the logical volumes, I mapped them with this command:

mkvdev -vdev disk1 -vadapter vhost0

Lowering my logical disk sizes was the key. Once I did that and the remapping, all was well, and the IBM i admin was able to use the disk as presented.
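
If you want to double-check your work at this point, lsmap on the VIOS shows exactly what each virtual SCSI server adapter is presenting (the adapter name is just an example):

# Show what is mapped to a specific vhost adapter
lsmap -vadapter vhost0

# Or list every mapping on the VIOS
lsmap -all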

While this is an older IBM Redbook, it has good information about IBM i and VIOS and disk limitations:

http://www.redbooks.ibm.com/redbooks/pdfs/sg247668.pdf

From Section 4.5.4: “The maximum available size of a logical drive for System i is 2 TB – 512 bytes. But for performance reasons we recommend much smaller sizes, as is described further in this section.”

I seem to do more and more with IBM i. I guess these shops are recognizing how well IBM i and VIOS work together. That’s a good thing for them, and it’s good for us AIX users, too. Anytime we can work together, we can learn from one another.

Helping Users Help Themselves

Edit: Still good stuff.

Originally posted July 3, 2012 on AIXchange

Someone recently shared with me a thread where IT pros lament end users’ lack of computer expertise. Even though end users have their own job responsibilities, you’d think that companies would hire people who at least have a basic understanding of e-mail or widely used business applications like Microsoft Office. However, that frequently isn’t the case.

Anyway, this lengthy thread generated an epic comment, which I’ll repost below. The commenter’s basic point is that while an automobile is an incredibly complex machine, when something goes wrong most car owners can at least explain their problem. With end users, again, that frequently isn’t the case:

“I’ve actually been using the cars analogy for a couple months now and I think it’s very fitting. Imagine if you were a mechanic who owned an auto shop and your average customer call went something like this?

“Customer: My car isn’t working and I need you to fix it immediately, this is an emergency.
Mechanic: Alright sir what seems to be the problem?
Customer: I don’t know, I tried to use my car on Friday and it didn’t work, now it’s Monday and I need to get to work and I can’t and this needs to be fixed right now.
Mechanic: Can you start the car? Can you even get into your car? Does it make any sounds when you try to start it? Are all four tires there?
Customer: I don’t know, I don’t know what any of that stuff means, I tried to get to work and it wouldn’t let me and you need to fix it now because you changed my oil six months ago.
Mechanic: Alright well what kind of car are you driving?
Customer: I don’t know, a green one, why does that matter?
Mechanic: Please take a look at the back of your car and see if there are any letters or numbers that would indicate a vehicle model or manufacturer
Customer: Ok, my car is a SV2 87K.
Mechanic: No sir that’s your license plate. My records indicate that you drive a Nissan Altima, can you confirm that the key you’re using to try and get into this car says Nissan on it?
Customer: My key says Lexus but I don’t see how that makes a difference, I’ve been using this key on this car for years and it’s always worked, what did you do to my car?”

Could you imagine your mechanic having to ask if you’re using the right key, or if you have tires? Could you imagine saying you didn’t know? Yet that sort of thing happens every day in the world of tech support. We ask customers — even end users — if the machine is plugged in. We really do ask if they’ve tried turning it off and on again.

It’s our job to help people, but the people seeking our help need to help us by providing basic information. I cannot count the number of times I’ve asked “what changed?” only to, quite honestly, be lied to. “I didn’t do anything,” I’ll hear. Then later the user admits to deleting files after reading somewhere that doing so would make his computer run faster. Which it could — unless of course you randomly delete some important system files.

Dealing with end users can be frustrating. Laughing at comments like the aforementioned car analogy is one way to deal with the frustration. But more important is to always keep in mind the user’s perspective. Sometimes I wonder if it isn’t that users are dumb, but they’re afraid of feeling dumb. Perhaps they don’t confess because they’ve been belittled for their computer ignorance in the past.

 As I wrote back in 2009, no matter how difficult they might make our jobs, end users still deserve our respect:

“A co-worker of mine once snapped at a nurse when she had problems logging into her workstation. She responded by asking him if he’d like to come up the hall with her and fix an IV or administer some drugs. Touche. The nurse was just as knowledgeable and passionate about healthcare as my coworker was about technology. Working with computers was important, but it was only a small part of her job. She just needed to enter data and to print some reports. She didn’t care about drivers, passwords or proper startup/shutdown sequences. Once we showed her how to do what she needed to do, she was fine, and we didn’t hear from her again.”

Perhaps end users as a whole should understand computers more than they do. Remember though — few people are as passionate about technology as we are. And ultimately, we support businesses. These businesses don’t run the latest and greatest hardware because they’re geeked about feeds and speeds. They rely on computers to process data and solve problems. It’s our job to be friendly and helpful and, above all, help people help themselves.

Moving VIOS to Internal Disks

Edit: Link no longer works.

Originally posted June 26, 2012 on AIXchange

Recently, a client of mine had a VIO server that was booting from a SAN. Many shops boot everything from SANs, in part to avoid the hassles of working with internal disks. The flip side is that booting from internal disks leaves available the capability to troubleshoot a system even if the SAN is down. If you boot from SAN and the SAN has an issue, the entire system is unavailable until the SAN is fixed. If you boot from internal disks, you can still login and point to the errors in your error log, even if your remaining LPARs are unusable because they boot from SAN (or at least get their data from the SAN).

In this case, the VIOS had been built on smaller-sized SAN LUNs, and my client decided that the operating system should be moved onto larger-sized internal SAS disks. To accomplish this, I opted to add the disks to rootvg and then run the mirrorios command to mirror rootvg onto the internal drives.

However, when I tried to run extendvg rootvg hdisk52, I got this error:

Unable to add at least one of the specified physical volumes to the
volume group. The maximum number of physical partitions (PPs) supported by the volume group must be increased.  Use the lsvg command to display the current maximum number of physical partitions (MAX PPs per PV:) and chvg -factor to change the value.

*******************************************************************************
The command’s response was not recognized.  This may or may not
indicate a problem.
*******************************************************************************

*******************************************************************************
The command’s response was not recognized.  This may or may not
indicate a problem.
*******************************************************************************

extendvg: Unable to extend volume group.


Here’s the fix for this issue:

            -factor
            Changes the limit of the number of physical partitions per physical volume, specified by factor. factor should be between 1-16 for 32 disk volume groups and 1-64 for 128 disk volume groups.
            If factor is not supplied, it is set to the lowest value such that the number of physical partitions in the volume group is less than factor x1016. If factor is specified, the maximum number of physical partitions per physical volume for the volume group changes to factor x1016.

I aimed low, first trying chvg -factor 2. When that didn’t work, I moved to chvg -factor 3. Bingo. With the limit raised, extendvg was able to extend rootvg.
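
To see why a bigger factor was needed, it helps to do the arithmetic. The numbers below are hypothetical, but they show the shape of the problem:

# A volume group defaults to 1016 physical partitions (PPs) per physical volume.
# Suppose an internal disk needs roughly 2,200 PPs at the volume group's PP size:
#   factor 2 allows 2 x 1016 = 2032 PPs per PV  (still too small)
#   factor 3 allows 3 x 1016 = 3048 PPs per PV  (big enough, so extendvg succeeds)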

Once I was able to extendvg successfully, I ran mirrorios. That produces this prompt:


            This command causes a reboot. Continue [y|n]?

I chose yes. While I could have run mirrorios with the -defer option, I prefer to do the reboot right away so I know everything works as expected once the mirror operation completes.

Then I relearned something else: when I ran mirrorios hdisk52 hdisk53, it spread the new copy across both disks rather than putting a complete copy on each. After running unmirrorios hdisk52 and unmirrorios hdisk53 to clean that up, I ran mirrorios hdisk52. Once this completed, I was able to unmirror hdisk14 (my SAN LUN), then mirrorios hdisk53. This left me with what I wanted — copies of rootvg on my internal SAS disks, with the copy on my SAN LUN clear so I could remove it from rootvg.
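
Pulling it all together, the flow looked roughly like the sketch below. The disk names match my environment, the factor value depends on your disk sizes, and the final reducevg step is my assumption for how you would drop the SAN LUN once it no longer holds a copy:

chvg -factor 3 rootvg             # raise the PP-per-PV limit so the big internal disks fit
extendvg rootvg hdisk52 hdisk53   # add both internal SAS disks to rootvg
mirrorios hdisk52                 # mirror onto the first internal disk (prompts for a reboot)
unmirrorios hdisk14               # drop the copy on the SAN LUN
mirrorios hdisk53                 # mirror onto the second internal disk
reducevg rootvg hdisk14           # remove the now-empty SAN LUN from rootvg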

What about your environment? Do you run VIOS on internal disks like my client, or do you boot from SAN?

Not-So-Crazy Ideas

Edit: What do I consider crazy now that I should give a second look?

Originally posted June 19, 2012 on AIXchange

I was in a meeting when a consultant suggested we adopt a storage area network solution for our servers. I was part of a team that laughed the consultant out of the room.

“What an idiotic solution,” we thought. “You want us to put all of our storage together in one piece of hardware? What if that machine goes down? Our whole environment will go down with it, and who knows how long it would take to untangle that mess?

“You want us to pay how much? Please leave. You’re clearly insane.”

This meeting took place in the mid 1990s. At the time we had AS/400 and Novell Netware servers, along with Windows, Sun, SCO and AIX machines. Everything in the computer room had internal disks.

With SANs still in their infancy, we may have been wise to hesitate. But fast forward some 15 years. Our initial fears have long since been addressed. Most SANs have built-in redundancy, so if a disk or controller goes down, the SAN keeps running, and the data is perfectly preserved. SANs are so prevalent now we pretty much take them for granted. Probably every person reading this either currently uses a SAN or has used one at some point.

Shortly after that long-ago storage meeting, we were presented with another crazy idea. This guy, a new hire into our organization, wanted to move our core switches to Fast Ethernet, running at 100 full. At the time we were perfectly happy running on an FDDI ring.

“What about network collisions?” we wondered. “Why would we go with Ethernet?

Back then we were invested in FDDI. We put our budget and expertise into it. There was even an emotional investment. (Really, one coworker was so upset when we ultimately went with Ethernet, he left the company.)

Replacing most of our network interfaces and switches to accommodate Ethernet was expensive, but we realized that the times were changing, and we needed to change with them.

Of course, technology is full of stories like these. How many of you were DOS experts? Did you know your way around Windows 3.11, OS/2 or Netware? When was the last time you put your skills with line printers or reel to reel tapes to good use? What about configuring dot matrix printers or serial devices directly attached to AIX machines? I’m sure a few folks still do these things, but by and large much of this technology is gone. And by and large, we’ve smoothly adapted to what’s come after.

It’s human nature to fall into routines and resist change. But change is the nature of technology. It’s not all bad, either. Honestly, I believe one of the fun things about working in IT is being a part of this constant change, and seeing old assumptions crumble — even if those assumptions were sometimes my own.

A Look at IBM’s XIV

Edit: Are there many still out there? Some links no longer work.

Originally posted June 12, 2012 on AIXchange

Are you running IBM’s XIV disk storage system?

If you’re not familiar with this solution, Anthony Vandewerdt’s blog is a good starting point. Anthony, an IBM storage solutions specialist who’s based in Australia, has covered, among other things, XIV history, XIV Release 3.1 materials (including a really cool video) and the XIV Release 3.1 SSD read cache.

Anthony also explains how to load the latest XIV GUI (Version 3.01).

Relatedly, if you check the release notes, you’ll find this summary:

“IBM XIV Management Tools v3.0.1 contains all the utilities required to manage and monitor your XIV system: IBM XIV GUI for graphic management and control, IBM XIV Top for real time monitoring, and IBM XIV XCLI for CLI access and scripting.

“Version 3.0.1 offers support for IBM XIV Storage System Gen3, as well as enhanced GUI functionality and usability and internal updates for XIVTop and XCLI utilities. The Management Tools package can be used to manage all generations of the XIV system from a single console.”

The tools run on different versions of Windows, Mac OS, Linux, AIX, Solaris, and HP-UX. Check the readme for more details, and download the GUI. (Note: The relevant management tools and their fix packs are listed in the middle of the page under the Management Tools heading.)

In my case I selected the Windows version, and after my 300 MB download, I had a GUI to play with.

If you don’t have an XIV system, you can still get a glimpse of its functionality. There’s an option to select demo mode on your management console that allows you to simulate an XIV environment and its storage-provisioning features. (Note: Registration with IBM.com is required to download the management tools or fixes.)

So have you considered XIV for your shop?

A New User’s Take on the Command Line

Edit: I am not sure this has gotten any easier.

Originally posted June 5, 2012 on AIXchange

I recently worked with a customer whose environment is kind of interesting. Even though Linux is prevalent and he has a background of running Windows servers on VMware, there’s little — let’s call it traditional — UNIX hardware.

This customer, however, just installed  AIX on Power for the first time, and it was enlightening to hear his perspective on this leading-edge technology.

Given his long-term use of VMware, he found the virtualization concepts easy enough to grasp. On the other hand, installation and navigation of the HMC and AIX were new ground.

One of the first stumbling blocks came once we’d installed the VIO server. We were navigating around in the Korn shell after accepting all of the license agreements on the command line. Of course I do this all the time, so it’s second nature to me. That’s why I didn’t have a good answer when he wondered why better command history navigation and command completion options haven’t been implemented.

There were little annoyances. His term type wasn’t set properly. Sometimes the function keys worked, sometimes not. He had to manually type stty erase ^? to get his backspace key to work. Another source of frustration was that the F4 key would provide a pick list in smitty — sometimes. And sometimes not.  These little annoyances are easy to work around, provided you’ve been doing it for a while.

When I showed him how to use set -o vi and then navigate the command line like a vi session, I received a look of sheer incredulity. He asked why on earth he should need to know this obscure stuff. Couldn’t AIX just allow users to get their command history via the up arrow, as DOS 5 and DOSKEY users have been doing for years? He noted that every other command line he uses, including Cisco’s IOS, lets him use the up arrow to go through his shell history and offers tab completion when he enters commands.

I told him he could load bash and other shells onto AIX for a more user-friendly command line experience. Of course, that doesn’t help him when he’s doing work as root or padmin using the Korn shell.

Some of his critiques were actually pretty funny. For instance, when I informed him that the esc-k combination allows him to go back in his shell history and edit previous commands, he jokingly told me that “esc k” in Spanish means “What did I just type?”  (As que — pronounced “k” in Spanish — means “what” in English, I can see how he would make that leap.) 

Other words he used to describe the AIX interface included “antiquated,” “primitive” and “painful.” He told me it wasn’t 1982 anymore. Then he suggested that perhaps this interface would pass for modern in North Korea.

Despite his light-hearted comments, the customer clearly wasn’t impressed with the out-of-the-box Korn shell command line experience. Although we were able to clean up some of the issues with entries in our .profile, it left me wondering what improvements could be made to the command line.
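
For what it’s worth, a few lines in .profile go a long way toward taming the defaults. This is just a sketch of the kind of entries that help (adjust the TERM value to match your terminal emulator):

export TERM=vt100        # or xterm; whatever your emulator actually presents
stty erase ^?            # make the backspace key behave
set -o vi                # vi-style editing and history on the command line
export HISTSIZE=500      # keep more command history around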

We moved on to loading operating systems, virtual Ethernet, virtual storage and virtual optical devices. The complaining continued. Once I got to the NIM server — or, as he called it, “the secret of nimh server” — the customer had had enough for the day.

I accepted all his comments with a grain of salt. Even for IT pros, learning something new can be daunting. Since that first day he’s gotten more comfortable with the AIX environment. The rate of complaints has slowed.

But it’s interesting to think about. Imagine you didn’t have your years of experience with the command line and try to see things with fresh eyes. What kinds of crazy things do we do on a daily basis that we accept as perfectly normal? Does it extend beyond the command line? Are there features and functions  in AIX that would turn off users who don’t know the platform? Could any or all of these things truly be a barrier that keeps some customers from adopting AIX and Power solutions? Please share your thoughts in Comments.

Implementing a Shared Storage Pool

Edit: This is still a slick way to handle disk.

Originally posted May 29, 2012 on AIXchange

I wrote about shared storage pools (here and here) back in January. Recently, I had an opportunity to implement one with a customer.

We had two 720 servers, each of which had two VIO servers. We upgraded to the latest VIOS code, making sure our HMC and firmware were at current levels. Then we presented the LUNs from our SAN, following the steps outlined in my January posts.

First we made sure all the LUNs were set to no reserve.

            chdev -dev hdiskN -attr reserve_policy=no_reserve
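
To confirm the attribute took before going any further, you can query it back (the disk name is just an example):

            lsdev -dev hdiskN -attr reserve_policy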

Then we created the cluster. While I’m giving the names referenced in Nigel Griffiths’ presentation (see my first post linked above), for the record, we used our own names.

            cluster -create -clustername galaxy -repopvs hdisk2
            -spname atlantic -sppvs hdisk3 hdisk5 -hostname bluevios1.ibm.com

With that accomplished, I could see that we had a cluster.

            cluster -list
            cluster -status -clustername galaxy

In our cluster, we used one 20G repository LUN and two 500G LUNs for our data.

The cluster –create command took a few minutes to run. On our first try we didn’t have fully qualified hostnames on our VIO servers, so we got an error when we tried to create the cluster. Changing the names was easy enough, and after that, the cluster was successfully created.

We ran the cluster -list and cluster -status commands again and got the output we expected. Then from the same node we ran the cluster -addnode command to add a second VIOS to our cluster.

            cluster -addnode -clustername galaxy -hostname redvios1.ibm.com

It took about a minute to add that node, and it was successful. We ran cluster -status again to confirm that the second VIOS was added.

One thing I liked about the process is that the output provides the node name, machine type and model information. This way it’s easy to determine which physical server is running the command.

We did the same procedure for the next two VIO servers. This took a bit longer, likely because they were on another physical server. Still, at the end of the procedure the cluster -status command displayed all four VIO servers in the cluster. When we logged into each of the other VIO servers and ran cluster -status, we saw the same output.

(Note: Running lspv won’t tell you that the disks in your storage pool are in use, but the lspv -free command will give you this confirmation. This could be an issue if you were mapping the entire hdisk to a client LPAR — i.e., the “old” way. But because you’re not actually mapping hdisks directly, this isn’t necessarily a problem.)

To create a new vdisk to map to our client LPAR, we ran:

            mkbdsp -clustername galaxy -sp atlantic 16G -bd vdisk_red6a -vadapter vhost2

Once we had our disk created and mapped, we ran:

            lssp -clustername galaxy -sp atlantic -bd

That showed us that vdisk_red6a was in the cluster.

Then we ran this command to map it in vios2:

mkbdsp -clustername galaxy -sp atlantic -bd vdisk_red6a -vadapter vhost2

If you compare the command that creates the vdisk to the one that maps the vdisk to the client LPAR, the only difference is the size you provide. Someone can tell me if there’s an easier way to do it. For my own amusement I tried using the old mkvdev command. It didn’t work.

When we ran lsmap -all, we could see the same vdisk presented to the client, going from both VIO servers.

We then wanted to try live partition mobility using shared storage pools. This posed some problems, but searching on the error message we encountered (HSCLA24E) turned up this entry:

“This week we were trying to migrate some VIO hosted LPARs using XIV disk from one POWER7 system to another. The disk is hosted on a VIO server via the fabric, then using VSCSI devices to map up to the servers. Unfortunately the migration failed and the message we got was HSCLA24E: The migrating partition’s virtual SCSI adapter 2 cannot be hosted by the existing virtual I/O server (VIOS) partition on the destination managed system. To migrate the partition, set up the necessary VIOS hosts on the destination managed system, then try the operation again.

“So we did some searching and found the following:

“HSCLA24E error:         

1) On the source VIOS partition, do not set the adapter as required and do not select any client partition can connect when you create a virtual SCSI adapter (can cause the error code).
2) Max transfer size of the used hdisk may not be different on source and destination VIOS.
3) The VIO servers may not have set the reserve_policy to single_path, no_reserve is required.
4) Destination VIO servers are not able to see the disks the client needs.
5) The same VTD (virtual target devices) names may not exist on the destination system.”

In our case we addressed no. 1 by unselecting the “any client can connect” option and mapping to the specific client we were using. With these changes, we could successfully migrate the LPAR.

In the course of changing the adapters, we rebooted the VIO servers. Be patient when rebooting. It seems to take some time for the servers to restart and join the cluster. You’ll know it’s ready when the cluster -status command changes from “state down” to “state OK.” (We joked that you only have to give it until “despair + 1.”)

Also, be sure to run df and check your /var/vio/SSP/'clustername' filesystem that gets created on all the members of your cluster. That was a quick and dirty way for us to determine that our status was about to change to OK. As the cluster comes online, and as you run cluster -status, you’ll see the filesystems mount and the status change from down to OK.

This initial build-out of shared storage pools offers some advantages. For starters, there are fewer, larger LUNs to present and zone to the VIO servers. With larger LUNs being carved up in the pool, there are fewer reserve_policy settings and mkvdev commands to manage. Of course some would argue that this advantage is offset by the need to run mkbdsp commands on both VIO servers.

It’s also nice being able to login to one cluster node, create a vdisk and see that new vdisk show up on all four nodes, rather than having to login to each VIO server separately. This just feels like a cleaner disk management solution.

As I continue to work with shared storage pools, I’m sure I’ll have more lessons to pass along. If you’ve been using this technology, please share your thoughts in Comments.

Restoring Old Data

Edit: I enjoy crazy projects like this one.

Originally posted May 22, 2012 on AIXchange

A customer was looking to restore some data from an old LTO1 tape. The tape was created in 2005 with versions of AIX and Tivoli Sysback that were common back then. Since the customer no longer had hardware that could read the tape, I was asked if I could help retrieve the data.

An LTO3 tape drive can read LTO1 tapes, so I figured I’d look for that hardware rather than try to scrounge up an LTO1 drive. I contacted IBMers Jeff Werner and Pete Dragovich. From their Chicago lab, they use all kinds of leading-edge and vintage hardware and software to conduct customer-initiated projects and training activities. For instance, they recently performed a simulated HMC update (from 5.2.1 to 6.1.3) for a customer that lacked its own test box. They’ve also recently done a couple of proof of concept (POC) projects, one to simulate failover with IBM’s PowerHA high availability solution and the other involving SLES 11 Linux on POWER6. I figured if anyone could help me, it’d be Jeff and Pete.

I picked up the tape from the customer and brought it to the IBM lab. We built a POWER4 server with an old version of AIX and loaded the Tivoli Sysback code along with the tape utilities we needed for our LTO drive.

After getting everything ready, we loaded the tape, ran /usr/sbin/readsbheader -Nd -f/dev/rmt0 -l, and waited for a list of information. Instead, we immediately got errors. The volume label could not be read. After searching online, we discovered we could run commands to move the tape around and try to read some tape labels, but then we got I/O errors.

Fortunately, the customer had made two copies of the backup tapes, so I picked up the second tape. At that point, I was told that they’d used an unusual block size. That turned out to be the key. We tried 0, 512, and 1024, but hadn’t thought about 262144. Once we changed the block size, we were able to restore the data and all went as expected.
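
On AIX the tape drive’s block size is just a device attribute, so matching the tape is a one-line change (this assumes the drive shows up as rmt0):

# Set the drive to the block size the tape was written with
chdev -l rmt0 -a block_size=262144

# Confirm the setting
lsattr -El rmt0 -a block_size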

With the data restored, we copied it to a Windows machine to burn a DVD copy of the data. The customer then loaded the DVD onto their own Windows machine, and sent it to their server via FTP.

Some takeaways from this project: If you’re archiving data, either keep the hardware and software that’s capable of reading it, or periodically transfer important old data to newer media. And if you use a different block size when writing data, be sure that’s documented—preferably on the tape itself.

I enjoy solving problems, and this was no exception. Now if I’m asked to restore an old Sysback tape, I know I can do it—provided Jeff and Pete are again willing to lend me a hand.

Tracking NPIV Connectivity

Edit: I still love handy scripts.

Originally posted May 15, 2012 on AIXchange

IBMer Glenn Lehman posted this script on a mailing list, and with his permission I’m posting it here. Glenn offered this introduction and description: “I search for various 4-digit IBM storage types. My example is coded to recognize 2810, 2107, 1750 (which translates to XIV, DS8300, DS6800).

“I share this as an intro… because prior to starting, our team was concerned with how we would keep track of all this configuration information, so I wrote and we now use my handy script. It gathers and parses and organizes all the info we wanted to track into neat columns. It’s all automatic, and all real-time.

“Our environment was all p570 frames — and strictly NPIV (no vSCSI). We have dual SAN fabrics and dual VIO servers per frame for redundancy and disk multi-pathing.

“Let me know if you find this useful, or if you make any changes or modifications so we can share it with others.”

He adds that the following should be customizable for those familiar with ksh scripting.

#!/bin/ksh
#set -x
#@============================================================================
# Script    :  Collect and display FC/Disk/SAN info for given VIOS pair
# Author    :  Glenn Lehman / IBM
#
# History  :  Created: Thu Jan 26 EST 2012
# Modify    :  Jan 30 2012 – add dph VIOS option
#          :  Jan 31 2012 – add ping test to confirm VIOS is active
#          :  Feb 06 2012 – add uniq date suffix to TMP filename
#          :  Feb 09 2012 – add disk type and count to output
#
# Usage    :  Invoke from remote LPAR with NPIV frame parameter
#=============================================================================
#
##########################  Assumptions  ################################
# There is a remote server that can access all VIO-servers & VIO -clients
#  and that remote server is where this script resides and invokes from
# There are 2 VIO-Servers in a given frame
# There is a distributive shell method configured in the environment
# The LPAR profile name is VIO-…
# Example: Profile=>  VIO-hostAix61prd
#          Hostname=> hostAix61prd
# Storage types are one of the following:
# XIV – 2810
# DS8300 – 2107
# DS6800 – 1750
# There are dual SAN Fabrics with unique domains, so that the FCIDs
# can pinpoint the fabric
# Example:
# 0x15 Fabric vs. 0x16 Fabric


##########################  Customization Step  #########################
# Assign the distributive shell command and usage options
#
DSH=’dsh -w’

function Get_lsmap
{
## capture the full lsmap output for VIO1 ##
$DSH $VIOH1 “/usr/ios/cli/ioscli lsmap -all -npiv” | cut -f2- -d’:’ > $TMP.map1
## convert to line separated stanzas ##
cat $TMP.map1 | grep -v “^ $” | \
    awk ‘
    $1 == “VFC” {print $0 “\n \n”}
    $1 != “VFC” {print $0}’ > $TMP.sep1

## capture the full lsmap output for VIO2 ##
$DSH $VIOH2 “/usr/ios/cli/ioscli lsmap -all -npiv” | cut -f2- -d’:’ > $TMP.map2
## convert to line separated stanzas ##
cat $TMP.map2 | grep -v “^ $” | \
    awk ‘
    $1 == “VFC” {print $0 “\n \n”}
    $1 != “VFC” {print $0}’ > $TMP.sep2

cat $TMP.sep1 $TMP.sep2 > $TMP.map_all
return
}

function Show_Usage
{
echo “Requires a single valid parameter that denotes a VIOS pair
using npiv mapping…”
echo
echo “Usage: $(basename ${0}) < ae2 | b12 | e40 | 88f >”
echo;echo
return
}

function Show_Header
{
echo “\t=========================================================================================”
echo “\t  $FRAME virtual (fscsi) mapping – $(date)”
echo “\t=========================================================================================”
echo ”                                          VIOS
VIOC        “
echo “VFCHOST    ID#  Partition            Server,Slot,Phys
Slot,Virt  Virtual WWPN used  Disks    FC/SAN Status”
echo “———-  —  —————–    ——————-
———  ——————  ——  —————————-

return
}

function Ping_VIOS
{
## Verify VIOS pings OK from this server
VIOS=$1
/etc/ping -c 2 $VIOS  > /dev/null
PING_RC=$?
if [[ $PING_RC != 0 ]]; then
  echo “ERROR: Remote server ping from nim0 to $VIOS failed…. aborting.”
  exit 99
fi
return
}


# —— MAiN PROGRAM BEGiNS HERE —— #
if [[ $# -ne 1 ]]; then
  Show_Usage
  exit 99
fi

## Define pretty-print variables
typeset -L11 VFCHOST
typeset -R2  PLPID
typeset -L20 PLPNAME
typeset -L21 SFCINFO
typeset -L10 CFCINFO
typeset -L8 DISKINFO
typeset -L19 PWWPN

VIOPAIR=$1

##########################  Customization Step  #########################
# Assign Server Frame keyword shortcuts for custom parameter input;
# add these to the Show_Usage “Usage” line
# Assign the various VIO-server hostnames and the Server Frame name
# for each “shortcut”
# Example: Use ae2 as shortcut for Server-9117-MMA-SN10D1AE2
#
case $VIOPAIR in
  ae2 | AE2 ) VIOH1=lbvio1-ae2; VIOH2=lbvio2-ae2;
FRAME=’Server-9117-MMA-SN10D1AE2′;;
  b12 | B12 ) VIOH1=lbvio1-b12; VIOH2=lbvio2-b12;
FRAME=’Server-9117-MMA-SN10D1B12′;;
  e40 | E40 ) VIOH1=lbvio1-e40; VIOH2=lbvio2-e40;
FRAME=’Server-9117-MMA-SN1014E40′;;
  88f | 88F ) VIOH1=lbvio1-88f; VIOH2=lbvio2-88f;
FRAME=’Server-9117-MMA-SN106988F-NDev’;;
  * ) echo “VIOS parm – $VIOPAIR – is invalid; aborting…”
      Show_Usage
      exit 99
esac

TMP=”/tmp/get_vfchosts-$(date +%s)”

## confirm VIOS pair is active and ping-able ##
Ping_VIOS $VIOH1
Ping_VIOS $VIOH2

## collect and reorganize the lsmap output ##
Get_lsmap

Show_Header

## build sorted list of vfchosts ##
grep vfchost $TMP.map1  | awk -v vioh=$VIOH1 ‘{print $0″  “vioh}’ >
$TMP.vfchost1
grep vfchost $TMP.map2  | awk -v vioh=$VIOH2 ‘{print $0″  “vioh}’ >
$TMP.vfchost2
cat $TMP.vfchost1 $TMP.vfchost2 | sort -tt -n -k 2,2 > $TMP.vfchost_all

cat $TMP.vfchost_all | while read VFCHOST PHYLOC PARTID NAME1 NAME2 VIOS; do
    PLPID=$PARTID
    LPNAME=$(echo ${NAME1}${NAME2})
    if [[ $LPNAME != “VIO-“* ]]; then
      PLPNAME=’unassigned’
      HOSTNAME=”
      VIOS=$NAME1                      # shifts position if less fields in line
    else
      LPNAME=${LPNAME%AIX}
      HOSTNAME=${LPNAME#VIO-}
      PLPNAME=$LPNAME
    fi
    SSLOT=$(echo $PHYLOC | awk -F- ‘{print $NF}’)
    PFC=$(grep -wp $PHYLOC $TMP.map_all | grep -w FC | awk ‘{print
$2}’ | cut -f2 -d’:’)
    if [[ $PFC != “fcs”* ]]; then
      PFC=’none’
    fi
    SFCINFO=”$VIOS,$SSLOT,$PFC”
    CSLOT=$(grep -wp $PHYLOC $TMP.map_all | grep VFC | awk ‘{print
$NF}’ | awk -F- ‘{print $NF}’)
    if [[ $CSLOT != “C”* ]]; then
      CSLOT='undefined'
    fi
    VFC=$(grep -wp $PHYLOC $TMP.map_all | grep VFC | awk ‘{print $3}’
| cut -f2 -d’:’)
    if [[ -n $HOSTNAME ]]; then
      $DSH $HOSTNAME “lsdev -c disk” | cut -f2- -d’:’ > $TMP.disk
      DSKCNT=$(grep -c MPIO $TMP.disk | awk ‘{print $1}’)
      ##########################  Customization Step  #########################
      # Assign the possible remote disk types based on their 4-digit
      # machine type
      #
      if ( grep -w 2810 $TMP.disk > /dev/null ); then
        DISKTYP=’XIV’
      elif ( grep -w 2107 $TMP.disk > /dev/null ); then
        DISKTYP=’DS83′
      elif ( grep -w 1750 $TMP.disk > /dev/null ); then
        DISKTYP=’DS68′
      else
        DISKTYP=’unkn’
      fi
      DISKINFO=”$DISKTYP($DSKCNT)”
      $DSH $HOSTNAME “fcstat $VFC | grep ‘Port'”  | cut -f2- -d’:’ > $TMP.fcst
      WWPN=$(grep ‘World Wide Port’ $TMP.fcst | awk ‘{print $NF}’)
      FBRC=$(grep ‘Port FC ID’ $TMP.fcst | awk ‘{print substr($NF,1,4)}’)
      TYPE=$(grep ‘Port Type’ $TMP.fcst | awk ‘{print $NF}’)
    else
      WWPN=’no active flogi  ‘
      FBRC=’….’
      TYPE=”
      DISKINFO=’unknown’
    fi
    PWWPN=$WWPN
    CFCINFO=”$CSLOT,$VFC”
    PCNT=$(grep -wp $PHYLOC $TMP.map_all | grep Ports | cut -f2 -d’:’)
    print “$VFCHOST $PLPID  $PLPNAME $SFCINFO  $CFCINFO  $PWWPN
$DISKINFO $PCNT ports logged in $FBRC $TYPE”
done
echo

## Be polite and clean up after yourself ##
rm -f $TMP.*
exit

In the process of posting this script to the blog, some of the formatting may be altered. Make the logical adjustments as need be. And if you have your own handy scripts or tools, please send them to me and I’ll share them here.

Another Grab Bag

Edit: Some links no longer work.

Originally posted May 8, 2012 on AIXchange

As I’ve noted before, I love passing along tips and tricks. And I love hearing IT horror stories. A little of both this week:

* First, here’s an email I got from someone who changed the xfer_size option on his machine. In his words:

“I found this blog post a day or two too late. I had tried some AIX fibre card tuning on my Domino servers, which consist of two physical and two VIO NPIV virtual adapters.

“I meant to only change the real cards to 200000 for the xfer_size option, but I had changed the virtual adapters as well, and rebooted. The LPAR hung at LED code 554, and I had to mount OS disk in maintenance mode to mess around with it. This allowed me to undo my change and rmdev the OS disk and paths, to get the LPAR back.

“FYI in case anyone tries this in the future, hopefully they will learn from my mistakes. I am not sure if this is an XIV or an NPIV issue, but I would advise people to not mess with the NPIV xfer_size settings, especially for their root disk.”
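
If nothing else, it’s worth knowing what your adapters are set to before you touch anything. Assuming a typical setup where the physical ports show up as fcs devices, you can check without changing a thing:

# Current transfer size on a physical FC adapter
lsattr -El fcs0 -a max_xfer_size

# If you do decide to change it, -P defers the change until the next reboot
# chdev -l fcs0 -a max_xfer_size=0x200000 -P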

* I found this item on a mailing list (from Phil L.) and feel it’s worth sharing:

“[VIOS 2.2.1.3 introduced] clustering… which is controlled by System Director. System Director automatically starts snmpd, so even if you have it disabled with the viosecure commands it will still start via System Director. The workaround:

            dirsnmpd (Systems Director) is started from:

                /opt/ibm/icc/cimom/bin/startdirsnmpd

            To inhibit dirsnmpd at bootup:

            Edit: startdirsnmpd script

            Comment-out:

                # /usr/bin/startsrc -s snmpd > /dev/null 2>&1

“IBM Support is considering modification of the viosecure rules.”

* I saw this in a recent e-mail:

“We have two VIO servers that need to be updated from version 1.4.1.2-FB-9.2 to version 2.2.0.13-FP24 SP03. We did update the two VIO servers in our second data center (so it was a non-production environment). The problem was that we did it incorrectly.

“We put in the migration CD and ran ‘updateios’ instead of booting off the CD and running ‘migrateios.’

“We had to rebuild the whole environment from scratch. We definitely want to avoid that in the production environment!”

Me now. All of you have a test lab to try things out first, right?

“I did have a backup of the VIO servers (created via the command ‘ioscli backupios -file … -mksysb,’ run as padmin) but we were unable to recover from that backup.

“We worked with IBM Support and they still were not able to recover from our backup so we had to do a fresh install of the VIO server software and rebuild the environment from scratch, and then I had to recover each LPAR (12) from their mksysb’s. It was a 40-hour weekend!”

Me again. All of you know your backups are good and you’ve tested your recovery process, right?

“In fairness, a lot of that time was actually waiting for CDs to spin. I think if the migration had been done correctly there would only have been a handful of commands that actually needed to be run.”

IBM Sticks with the HMC

Edit: The HMC has still survived. Some links have not.

Originally posted May 1, 2012 on AIXchange

So the SDMC evolution was upon us. I took my test drive. But then, just like that, it’s over. The HMC has apparently survived.

Nigel Griffiths (mr_nmon) posted these tweets:

  • “SDMC withdrawn as IBM listens to customers. SDMC only functions like dual VIOS+AME for Blades go in the next HMC version. Long live the HMC.”
  • “I had better add: SDMC owners needing Blade functions it is supported for 2 more yrs & you can convert to HMC (same HW) at your convenience.”

 What exactly is going on?

Start with these announcement letters (here and here). IBM is no longer selling SDMC units. The SDMC hardware indicator is going away. The SDMC and SDMC virtual appliances are also going away, although they’ll be supported through April 2015.

IBM has announced this statement of direction:

“IBM intends to enhance its systems management capabilities for Power Systems hardware as follows:

  • Continued integration between the base platform management capabilities in HMC and advanced capabilities in IBM Systems Director
  • Enhancement of the HMC to add support for Power Systems blades and mixed rack and blade server environments
  • New HMC virtual appliance offering
  • New process for transitioning from SDMC to HMC
  • Improved usability”

I, for one, look forward to running the HMC appliance in smaller environments.

This FAQ includes the following, but read the entire document as I didn’t copy everything:

“Why is this change being made?

“Clients have asked IBM to continue enhancing the HMC, which is a trusted, secure and dedicated management appliance for Power Systems. At the same time, due to the rapid increase in adoption of advanced virtualization and cloud solutions by Power Systems clients, IBM plans to continue enhancing the virtualization management capabilities of the Systems Director management server software.

“What are the options for clients using an SDMC?

“Clients currently using an SDMC for management can either convert it to an HMC (at no charge) immediately, or continue using the SDMC and convert it at a more convenient time in the future. The SDMC will continue to be supported by IBM until April 2015.

“What are the options for clients using Power blades?

“Clients currently using an SDMC for managing Power blades can continue using it or switch to using IVM instead. When the new release of the HMC (featuring blade support) is available, they can then convert the SDMC to an HMC at no charge.

“Are hardware changes required to convert an SDMC to an HMC?

“The SDMC is based on the same hardware as the HMC, and no hardware changes are required to make the conversion. After converting an SDMC to an HMC, clients can benefit from the additional memory and disk capacity.

“Will endpoint management licenses be required for the HMC?

“No.

“What happens to support agreements after clients convert an SDMC into an HMC?

“SDMC clients with current SWMA contracts may contact IBM Support for assistance before, during and after the HMC conversion process. When their SWMA is due for renewal, clients who have converted an SDMC to HMC should renew using the HMC Machine Control Program Remote Support Agreement (MCRSA).

“How can a client using SDMC to manage Power blades transition to using IVM?

“As an alternative to the SDMC, clients can manage Power blades with the IVM. However, the IVM has limitations (compared to the SDMC and HMC) and cannot support dual VIOS configurations.”

This IBM Redpaper covers SDMC-to-HMC migrations.

Basically you prepare for the migration by gathering IP addresses, verifying passwords, backing up profiles and then removing managed systems and frames from the SDMC. You download the HMC code and service packs (or order it on media) and then install the HMC code. I understand IBM will soon come out with video demonstrations of this process.

Keep in mind that none of this affects the FlexSystems Manager that will be used with the newly announced IBM PureFlex systems. I’ll get into that topic soon.

So what are your thoughts about this change? Had you done much with the SDMC, or were you taking a wait and see attitude?

The Connection to Storage

Edit: Still good information.

Originally posted April 17, 2012 on AIXchange

In case you’re wondering why this server blog just published a post about storage, it’s simple: Without storage, servers don’t have anywhere to read and write their data. Many of us server admins do have some knowledge of storage, but many more do not. Understanding the differences between storage technologies is important. It can help us when we need to discuss our options with our storage friends.

Anyway, back to Norman Bogard’s storage webinar. By the way, although I wish everyone had access to this training, it was provided exclusively for IBMers and IBM Business Partners. The content is not available online, which is why I’m posting this information (with Norman’s permission).

I’ll continue with a rough summary of how he compares and contrasts network-attached storage and storage area networks. We’ll start with SANs:

“Block-level storage devices and SANs like IBM V7000, IBM DS8000 and IBM XIV provide access to equal sized blocks of storage, and the blocks are found by block numbers on a device. All read and write operations are performed on data blocks – mainly using the SCSI protocol. Block services are segmented into LUNs or vdisks, and you might usually have a few dozen of them.

“On the other hand, NAS devices like N series, SONAS or V7000 Unified provide access to files. These files are found by a name within a tree of names, and the operations on them include read, write, create, delete and many more. CIFS, NFS, FTP and other protocols are used to access these files. Device services are exposed as exports, directories and files, and in this case we might be accessing a few hundred, or possibly even millions or billions of files.

“Your NAS may be connected to hundreds or thousands of client machines. Authorization is handled by user IDs for reads, writes and meta-data operations.

“Who owns the filesystem? With direct-attached storage, it’s a simple case. The storage lives on the server. Think of regular SCSI disks and expansion drawers filled with disks, or old SSA drawers of disks.

“With a SAN, the server still owns the files and it controls how the data is written to the disk, even though those disk arrays might be on a disk subsystem instead of internal disk. NAS, on the other hand, handles the filesystems and the files and just gives you access to them after you have authenticated.

“With converged, or unified, storage, there are two fundamental approaches to intermixing block and file storage within a single system. IBM’s N series uses block on file. A device file with a logical unit number (LUN) assigned to it is stored within the file server’s WAFL (write anywhere file layout) file system and then mapped to a host. File and block data are stored within the same file system.

“IBM’s Storwize V7000 Unified uses file on block instead of WAFL. A raw device from the V7000 is mapped to hosts. File data is contained within discrete devices. Host block data is contained within discrete devices. File and block data are stored independently.

“Based on your application type, rules of thumb can help you decide whether NAS or SAN makes more sense in your environment.

“Applications and data types that typically reside in block stores or SAN include RDBMS (Oracle, SQL Server, DB2), analytics (stream processing), OLTP, metadata layers (component of content management), e-mail (MS Exchange, Lotus Notes) and virtualization stacks (VMware: VDI, VMDK implementations, HyperV; Citrix Xen)

“Applications/data types that typically reside in files or NAS include rich media (pictures, videos, seismic data, medical imaging, etc.), VOD, AOD, IPTV analytics (SAS grid), enterprise content management (ECM, e.g., web stores), research data sets, user files (documents, etc.), product lifecycle/data management (PLM/PDM) and virtualized environments (VMware client-driven deployment).

“Another consideration is how you backup these different environments. With NAS you typically have consistent snapshots since the files and filesystems are consistent on the NAS device. Replication is supported, and NAS usually integrates with backup software.

“With a SAN, integration with the host file system is needed to ensure consistency. Many times backups are moved through a master media server to disk or tape. Replication is supported once the file system is consistent.”

The webinar also covers file systems, file shares, network services, authentication and authorization, quota, data availability, data protection using snapshots, backups and replication, antivirus support and file cloning.

I love education in this format. This webinar takes a big concept like storage and breaks it down into easy-to-comprehend descriptions. We may work on servers, but servers connect to storage. That’s why learning about storage is worth our effort.

Valuable Insight into Storage

Edit: It amazes me how much further we have come since I wrote this.

Originally posted April 10, 2012 on AIXchange

I recently took some online training that I found interesting and valuable. The webinar presenter, Norman Bogard, compares and contrasts network-attached storage (NAS) and storage area networks (SANs). (In his presentation Norman acknowledges Brett Cooper and Nils Haustein for their input, so I want to be sure to mention them here.)

When I think of early iterations of direct-attached storage, like SSA disks or regular internal SCSI disks, it amazes me how far we’ve come. Advances have been brought not just to storage hardware, but to the network infrastructure. Networks and switches are so much faster and more robust now.

The webinar opens with a review of the history of NAS and SAN and protocols like NFS, NCP, SMB and CIFS. Then some terminology is introduced. I’m paraphrasing and borrowing some of the language from the slide deck since Norman did such a good job of compiling the material:

“SAN, or block storage, will leverage Small Computer System Interface (SCSI) commands to read-write specific blocks. Common SCSI access methods include Fibre Channel (FC), Internet Small Computer System Interface (iSCSI), or InfiniBand (IB). InfiniBand is a high speed network interconnect.

“NAS, or file storage, reads and writes files instead of blocks. The NAS has control of the files, contrasted with a SAN where the server would have control over the files.  A file server is a storage server dedicated (primarily) to serving file-based workloads.

“A NAS gateway is a server that provides network-based storage virtualization. It provides protocol translation from host-based CIFS/NFS to Storage Area Network (SAN) based block storage. Examples of NAS gateways are IBM N series & SONAS; NetApp V Series; EMC VNX/Celerra; OnStor (LSI); HP P4000 Unified Gateway.

“Unified Storage is a single logical, centrally managed storage platform that serves both block (FC, iSCSI, IB) and file-based (CIFS, NFS, HTTP, etc.) workloads. Examples of Unified Storage include IBM N series; NetApp V series; IBM Storwize V7000 Unified.

“When you compare NAS and SAN, you will find that they have similar concepts. For example, your redundancy for your SAN will come from your MPIO or SDD drivers, while redundancy for the NAS will come from teaming or trunking your network ports for resiliency or improved bandwidth, depending on how you have set things up.

“Your security for a SAN will come from LUN Masking and Zoning, while you would control access on the network the same way you always would, with things like VLANS, exports, and shares.

“Your physical connectivity to the SAN would come via the HBA, while your network traffic for the NAS would go out the same NIC that it always has, at least until converged network adapters become more widely deployed. Once we have converged adapters, all of the traffic will be network traffic, although you will then be dealing with more encapsulation of the different frames and protocols.

“Your underlying protocol on a SAN is SCSI, while you use the same IP/UDP protocols that you do with networking when you use NAS.

“You will call your SAN devices arrays, and you will present LUNs, while your NAS will have filers and data movers. You will have structured/relational data on a SAN, and unstructured data on a NAS.”

From there, Norman contrasts the concepts of block storage (SAN) and file storage (NAS). I’ll share more about this webinar in next week’s post.

The Importance of DR Testing

Edit: Taking a backup without actually testing that you can recover is really just as good as making a wish. Link no longer works.

Originally posted April 3, 2012 on AIXchange

Recently my customer wanted to see if its old, unsupported application could be recovered in an emergency. They were running AIX 5.2, and I was cloning that to another piece of older hardware for a disaster/recovery test. While the customer had been taking mksysb and application backups for some time, this was the first actual attempt to recover the system.

After restoring the mksysb to the target machine, we went to run the vendor’s built-in scripts to recover the database. It turns out that all of the data and binaries we needed to run the recovery operation were on datavg instead of rootvg.

It also turned out there was no datavg backup, only the database backups. This became the first item to address, ensuring we had a savevg of datavg. In this case the customer had 135 logical volumes that had to be recreated, some jfs2 and some raw.

The customer really wanted this cloned machine to be identical to the source machine, down to the PP size, but no way was I going to recreate 135 logical volumes manually. So I went ahead and did a savevg from the source machine. Had this been an actual DR situation, we would have already been in trouble had the LV information not been stored somewhere. (Hint: Besides backups, it may be handy to have output from important files like /etc/filesystems available to you in case of an emergency.)
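For example, a minimal sketch of capturing that information ahead of time might look like this — the target directory is a placeholder, and ideally the copies live somewhere off the box:

lsvg -o | xargs -n1 lsvg -l > /safe/place/lvm_layout.txt   # LV names, types and sizes for every online VG
cp /etc/filesystems /safe/place/filesystems.copy           # filesystem stanzas
mkvgdata datavg                                            # writes the VG structure files under /tmp/vgdata/datavg
savevg -f /dev/rmt0 datavg                                 # savevg of datavg to tape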

When I tried to restore the information from the savevg and remake the volume group

(smitty/system storage management/logical volume manager/volume groups/remake a volume group),

it kept coming up with a 512MB PP size instead of the 64MB PP size I was inputting. Even when I tried it from the command line (restvg -f /dev/rmt0 -r -n -P 64 hdisk4), it’d still create the 512MB PP size.

However, since I still had the source system, it was a simple matter of taking the logical volume information from the source volume group and copying it to /tmp:

lsvg -l datavg > /tmp/datavgout.file

Because I only cared about the first and third columns of the lsvg output, I ran this command to obtain the LV name and LV type:

cat /tmp/datavgout.file | awk '{print $1, $3}' > /tmp/datavgout2.file

I created the datavg on the target system manually with the 64 MB PP size. Then I edited the datavgout2.file and made sure it had the correct LV name, LV type and the number of PPs that I wanted on the target machine. To read the file and create the LVs, I ran this simple loop:

cat /tmp/datavgout2.file | while read i j k
do
mklv -t $j -y $i datavg1 $k
done

($i is the name, $j is the type and $k is the number of PPs.)

I did end up using smitty (smitty/system storage management/logical volume manager/volume groups/restore) to restore the files in the jfs2 filesystems.

Once the volume group had been recreated and the necessary files were restored, I could use the database backup tape to restore the database.

The customer now takes a daily savevg of datavg, and all of the necessary LV information from the entire system is saved as part of the rootvg backup. In the end we were able to get a running system. Even more important, we learned something. Without going through this exercise, my customer may have been missing key information and data it needed to restore the system in an actual disaster.

This is why simply having a DR plan isn’t enough. Your plan must be tested. Even if you think you have good backups, it might not be the case.

Incidentally, Anthony English recently wrote about recovering datavg filesystems as well. He discusses using the mkvgdata (to capture the volume group structure) and restvg commands. His information is worth considering if you find yourself recreating a volume group. It’s certainly simpler than going through the gyrations that I went through, though I’ll still need to test it to see if Anthony’s approach will eliminate my customer’s PP issue.

VIOS and IBM i

Edit: Some links no longer work.

Originally posted March 27, 2012 on AIXchange

Two questions for IBM i shops: Are you reluctant to use a VIO server and attach it to your SAN, even though your SAN isn’t supported directly by IBM i? Do you end up telling yourself that internal disks give you better performance?

If so, this document might help alleviate your fears.

It covers different topics related to IBM i virtualization and open storage, including how to use an IBM i partition to host another IBM i partition:

“An IBM i 6.1/7.1 LPAR can host one or more additional IBM i LPARs, known as virtual client LPARs. Virtual client partitions typically have no physical I/O hardware assigned and instead leverage virtual I/O resources from the host IBM i partition. The types of hardware resources that can be virtualized by the host LPAR are disk, tape, optical and networking.”

There’s also this about IBM i using open storage as a client of the VIOS:

“IBM i virtual client partitions can also be hosted by VIOS. VIOS is virtualization software that runs in a separate partition with the purpose to provide virtual storage, optical, tape and networking resources to one or more client partitions. The most immediate benefit that VIOS brings to an IBM i client partition is the ability to expand its storage portfolio to use 512-byte/sector open storage. Open storage volumes (or logical units, LUNs) are physically attached to VIOS through a FC or a Serial-attached SCSI (SAS) connection and then made available to IBM i. While IBM i does not directly attach to the storage area network (SAN) in this case, as soon as open storage LUNs become available through VIOS, they are managed the same way as integrated disks or LUNs from a directly attached storage system.”

Finally, something about blades:

“The third major virtualization enhancement with IBM i 6.1 is the ability to run an IBM i LPAR and its applications on a Power blade server, such as IBM BladeCenter JS12 or JS22. Running IBM i on a Power blade is beyond the scope of this paper. Refer to the IBM i on a Power Blade Readme First for a complete technical overview and implementation instructions.”

The document covers supported configurations and concepts to help you visualize what I’m proposing. I’ll highlight this section (5.2) on performance:

“When creating an open storage LUN configuration for IBM i as a client of VIOS, it is crucial to plan for both capacity and performance. As LUNs are virtualized for IBM i by VIOS instead of being directly connected it may seem that the virtualization layer will necessarily add a significant performance overhead. However, internal IBM performance tests clearly show that the VIOS layer adds a negligible amount of overhead to each I/O operation. Instead, the tests demonstrate that when IBM i uses open storage LUNs virtualized by VIOS, performance is almost entirely determined by the physical and logical configuration of the storage subsystem.

“The IBM Rochester, MN, performance team has run a significant number of tests with IBM i as a client of VIOS using open storage. The resulting recommendations on configuring both the open storage and VIOS are available in the latest Performance Capabilities Reference manual (PCRM).”

I’m finding more customers that are willing to give VIOS a try. I’ve yet to find one that decided to switch back because performance was unacceptable.

I realize that this blog’s readership is very AIX-centric, but plenty of shops run Linux and IBM i as well. It’s nice to know that the frame that you’re virtualizing with VIOS to run AIX can run other operating systems as well. Not that this is a new idea.

Automatically Changing IP Addresses in a D/R Environment

Edit: This is still an interesting idea.

Originally posted March 20, 2012 on AIXchange

I recently spoke to a customer that has its primary and backup servers in different locations. The customer boots from a SAN, with the SAN replicating from site 1 to site 2. In the event of a disaster, the customer wants to fire up its site 2 LPAR from the replicated copy of rootvg. However, the networks are also in different locations.

Rather than do some admin kung fu to allow each network to have the same IP address when it boots up, the customer sought the capability to easily change the IP address depending on the frame being used to boot the LPAR. The customer says this functionality is available in VMware’s recovery management product, and wanted to know if the same type of thing can be done from the HMC.

I checked with a couple of my IBM contacts to see if they had any ideas. Chris Gibson had a good one.

“You could write a script that lives on the source system. It checks the system ID (lsattr -El sys0 -a systemid) when the LPAR boots. And if it’s a particular system serial number, it could bring up the interface with a different IP address.”

I forwarded this suggestion to the customer, and literally within a day their script was working. With the customer’s permission I’m sharing it here, along with their caveat:

“It works … the script is pretty rough. I’m no shell script expert by any means, but it does what I need it to do. I have it in /etc/inittab right before the rctcpip stuff, and that seems to work fine.”

Before trying to use this script, make sure your domain, nameserver, gateway and primary and backup server information are accurate for your environment. Of course you might be able to simplify or improve what’s here, but this should help you get started. Also note that in the process of posting this script to the blog, some of the formatting may be altered. You savvy scripters should move things around if need be.

#!/bin/ksh
# This script checks to see whether the system is booting off hardware
# at the primary or backup site and sets the IP, gateway, and name
# server based on what hardware it is booting from

# check to see which hardware is booting
OPTION=`lsattr -El sys0 -a systemid -F value`
IPADDRESS=`lsattr -El en0 -a netaddr -F value`
HOST=$(hostname)
DOMAIN="mydomain.com"
PRIMARY="IBM,123"
BACKUP="IBM,456"
NAMESERVER="10.9.0.1"
GATEWAY="10.9.16.1"
# set the primary and backup IP to the correct subnet (xx.1 for primary, xx.2 for backup)
PRIMARYIP=`echo $IPADDRESS | awk -v pos=4 -v repl=1 '{print substr($0,1,pos-1) repl substr($0,pos+1)}'`
BACKUPIP=`echo $IPADDRESS | awk -v pos=4 -v repl=2 '{print substr($0,1,pos-1) repl substr($0,pos+1)}'`
BACKUPGW="10.9.16.1"
BACKUPNS="10.9.30.32"

echo "Host Hardware: $OPTION"
echo "Current IP: $IPADDRESS"
echo "Primary IP: $PRIMARYIP"
echo "Backup IP: $BACKUPIP"

if [ "$OPTION" = "$PRIMARY" ]
then
        echo "Running from primary site"
        if [ "$IPADDRESS" = "$BACKUPIP" ] ; then
                echo "Setting IP for primary location"
                /usr/sbin/mktcpip -h $HOST -a $PRIMARYIP -m 255.255.255.0 -i en0 -n $NAMESERVER -d $DOMAIN -g $GATEWAY -A no -t N/A
        fi
fi
if [ "$OPTION" = "$BACKUP" ]
then
        echo "Running from backup site"
        if [ "$IPADDRESS" = "$PRIMARYIP" ] ; then
                /usr/sbin/mktcpip -h $HOST -a $BACKUPIP -m 255.255.255.0 -i en0 -n $BACKUPNS -d $DOMAIN -g $BACKUPGW -A no -t N/A
        fi
fi
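For reference, the inittab entry the customer describes would look something like this — the identifier and script path here are placeholders I made up, and the line goes above the rctcpip entry in /etc/inittab:

drip:2:wait:/usr/local/bin/dr_ip.ksh > /dev/console 2>&1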

As always, I love getting reader questions and submissions. That includes scripts. Please send me your scripts or any other useful tips. We all benefit when you share your expertise.

Where the Virtual Still Falls Short

Edit: At least Google Voice running on your laptop lets you text using a Model M. The MIT link no longer works.

Originally posted March 13, 2012 on AIXchange

I’ve written before about my fondness for the durability and quality of a certain type of old keyboard.

In fact, for a very long time, I told myself that in a perfect world I’d find a way to hook my Model M keyboard up to a Bluetooth adapter, and then connect that to my mobile phone. Even if it wouldn’t be practical to carry that setup on the road, it’d sure be satisfying to take the typing speed that only a real keyboard can provide and bring it into the mobile world.

To my chagrin though, others eagerly anticipate a keyboard-free future:

“Why do we still use a keyboard and mouse to interact with digital information? This mode of human-computer interaction, invented more than 40 years ago, severely constrains our ability to access and interact naturally with digital content.

“Our group designs new interfaces that integrate digital content in people’s lives in more fluid and seamless ways. Our aim is to make it easier and more intuitive to benefit from the wealth of useful digital information and services. Our work is focused in the following areas:

“Augmented Experiences: We augment a person’s experience of their surroundings with relevant digital information. We try to make the experience as seamless as possible, blending the digital information into the physical environment and making interaction with that information natural and fluid.

“Responsive Objects: We alter everyday objects by embedding sensors, actuators and displays so that the objects can respond to people using them in meaningful, noticeable ways.

“Collaborative Interactions: We experiment with novel interfaces that are designed from the start for use by multiple people. The projects support collaborations ranging from small numbers to very large numbers of people and further differ in whether they support collocated versus remote collaboration as well as synchronous versus asynchronous collaborations.

“Programmable Materials: We invent interfaces and machines for control and manipulation of materials such as paper, fabric, wood and food. Our goal is to endow materials and manufacturing with some of the advantages associated with the digital world such as modifiability, programmability, responsiveness and personalization.”

Sure, I guess I look forward to the day when my glasses can transform into some type of heads-up display, and I can access other types of information just by focusing my eyes or looking in different directions. It will be nice when my personal digital assistant truly becomes that, or when Siri and Evi and the like actually function seamlessly.

It will be a great convenience to speak to my machine and find it not only understands me, but knows what I mean and not just what I say. (Or maybe we’ll have to get brain implants before we can have a better human/machine interface. Not really convenient, but still potentially useful.)

For me though, the physical still beats the virtual, hands down. I know a big part of that is all the time I’ve invested in physical keyboards. I’ve had my awesome Model M for going on three decades now. Given more time, I suppose I’ll eventually become proficient on virtual keyboards. But I type so much faster and scroll around the screen so much more easily with my keyboard and mouse. I can type reasonably quickly with a Blackberry since it has an actual keyboard, but with any virtual keyboard touch screen, I just plod along. And while my Android phone is adequate at voice recognition, that doesn’t help me when I’m at a meeting or in any environment where I don’t have the luxury of speaking aloud. Sure, virtual keyboards are fine for short messages, but when I need to type anything more substantial than “:)” and LOL, I don’t like them. Autocorrect can rescue me some of the time, but often it just introduces new problems.

Call me a dinosaur if you insist, but until I get my perfectly augmented reality, I’ll live happily with my old mouse and older keyboard.

The Disruptive Force of Data Lost

Edit: The link below is to one of my older articles. It still feels like yesterday.

Originally posted March 6, 2012 on AIXchange

This anecdote from author Neil Gaiman got me thinking:

“I left my Macbook Air on a plane on Sunday night, and have spent most of the rest of the week doing things like being on the phone to the backup service, learning that the tracking software I’d thought was on there was on there, but hadn’t been activated, buying a new computer, etc. I didn’t get the thing I was meant to be writing written. I was grumpy.

“And this morning I got an e-mail telling me that the thing that I would have been working on all week, that I’d already lost 15 pages of … was now going to change so radically I would have wasted a week’s work if I’d been working on it. So I am happy.”

I view this story on a few levels. Do you have a backup of your phone and laptop if you should lose both right now? If you’ve installed tracking software, have you tested it? Are you comfortable knowing that all someone has to do is wipe your machine and your tracking software will be useless? And were that to happen, would you still have your contacts, your latest projects, the data that is critical to you? Or would they be lost forever?

On a larger scale, if your data center burned down, can you restore it? Do you have disaster/recovery procedures in place? Have you tested them?

But beyond the loss of Gaiman’s machine and his data, I was also struck that he had a deadline, he had something that needed to be worked on and completed, and he had lost some of that work. Fiddling with his computer rebuild had caused him to lose time that he could have spent working on the project. As things turned out though, his requirements changed and any work he would have done would have been wasted.

I’ve heard that some people are purposefully nonresponsive. When they get a call, e-mail or instant message, they’ll wait instead of answering immediately. Then, by the time they do respond, the person making the inquiry may have solved the problem without any assistance. While I don’t think IT folks should ignore their users, it is true that many times people will reach out rather than take a moment to examine their issue a little further. And, once they dig into the problem, they can often help themselves.

Of course, in our world, projects seldom change, and time spent fighting our machines is simply time lost. At least in this case, it’s nice to think that the universe was looking out for this author and things worked out in the end for him. Hopefully the universe looks out for all of us on occasion.

ASO: The First Phase in Autonomic Tuning

Edit: This is something I have not thought about in a long time.

Originally posted February 28, 2012 on AIXchange

I’ve touched on Active System Optimizer (ASO) before, but now that Nigel Griffiths has released an ASO video, it seems an appropriate time to expound on this topic.

To run ASO, you must be at AIX 7.1 TL01 or greater on POWER7 hardware running in POWER7 mode. ASO is installed by default on AIX 7. It’s not supported on older AIX releases or older hardware. (It will appear to be running, but will actually be hibernating.)

Nigel recommends running:

* oslevel -s to verify your AIX version

* lslpp -L | grep -i optimi to help verify that the ASO fileset is installed

* lsconf | grep ^Processor to verify that your LPAR is running in POWER7 mode.
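If you want to run those checks in one pass, here’s a quick sketch — nothing beyond the commands above, plus an SRC status check for the aso subsystem:

#!/bin/ksh
# quick ASO prerequisite check
oslevel -s                   # AIX level (needs 7.1 TL01 or later)
lslpp -L | grep -i optimi    # confirm the ASO filesets are installed
lsconf | grep ^Processor     # confirm the LPAR is running in POWER7 mode
lssrc -s aso                 # status of the aso subsystem under SRC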

ASO works under the covers; you don’t need to do anything to start it. It optimizes workloads based on AIX kernel statistics and the POWER7 hardware performance counters. It’s designed to improve cache and memory affinity by dynamically evaluating tuning options and making changes — including moving processes and memory — on the fly. It conducts pre- and post-monitoring to ensure that the changes improve performance. If improvement isn’t detected, ASO backs out the changes, hibernates and tries again later. Listen to Nigel’s presentation for details on this.

ASO provides cache affinity, aggressive cache affinity and memory affinity. It monitors performance and detects situations where threads can be moved from one chip to another to utilize the closer L3 cache.

ASO operates best on multi-threaded workloads. The jobs it monitors should be stable and long-running so it can most effectively make changes to the workload. It also needs to be a busy LPAR, otherwise you don’t gain much by trying to move things around. By “busy,” I mean that the processes need at least a 10-second lifetime. If applications come and go more quickly, ASO cannot make recommendations. If you’ve done manual tuning, either on your own or based on recommendations from IBM Support, ASO will simply hibernate rather than monitor and override your changes. In addition, specific processes can be marked as “don’t bother these,” and ASO won’t impact them.

ASO runs as an SRC kernel service. To locate it, search your machine for the aso process.
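For example (the bracket in the pattern is just the usual trick to keep grep from matching its own process):

ps -ef | grep '[a]so'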

Use these commands to get ASO running:

1) start the kernel service with startsrc -s aso

2) run asoo -o aso_active=1

According to Nigel, the aso process uses very minimal CPU time, so this shouldn’t add much additional overhead on the system.

Logs are located in the /var/log/aso/* directory. You’ll see two files:

* aso.log, which has on/off/hibernating information

* aso_process.log, which provides details of actions and modified processes

Nigel says this log file isn’t formally documented, but you should be able to figure out what it’s doing.

The man page says ASO can run outside of SRC, but this should probably only be done for debugging. You can also set shell variables before starting processes and apps. This provides some control over how they function with ASO.

As always, Nigel has much more than I can cover here. For instance, he shows real life examples of ASO running on his machine, output from the logfile, and more.

To wrap up, he tells us that ASO is largely set and forget. ASO uses near zero CPU when running, and it gently applies changes, tests behavior and undoes the changes if necessary. Because ASO is good for complex, multithreaded, long-running applications, it can move things around inside of your LPAR if they’re spread across CPUs.

Best of all, this is, as he says, just the first phase of clever autonomic affinity tuning. So keep your eyes open for what’s ahead.

A Good Look at PowerHA

Edit: Some links no longer work.

Originally posted February 21, 2012 on AIXchange

Another great Virtual User Group webinar recently took place, this one featuring Shawn Bodily’s presentation, “Introduction to PowerHA SystemMirror for AIX Standard Edition.” Be sure to get the presentation materials and listen to the replay. And look forward to the next VUG webinar, when this topic will be continued.

This material is very similar to the presentation on the same topic at last fall’s IBM Technical University. That’s an indication of the interest surrounding this solution. As Shawn notes in his presentation, IBM has unveiled 23 major releases of the product, an average of one release per year. More than 12,000 customers use it worldwide. While Standard Edition is the focus of Shawn’s presentation, he points out that PowerHA SystemMirror for AIX Enterprise Edition allows for multi-site cluster management, while also including all the functionality of Standard Edition.

Shawn notes that “PowerHA gives you cluster management for the data center. The software monitors, detects and reacts to events. It establishes a heartbeat between the systems, and enables automatic switch-over.” He then defines what HA solutions generally try to do, which is eliminate single points of failure. The goal is to reduce downtime, but HA can also help you with planned downtime as well. It might not be fault tolerant, but it will be fault resistant.

One chart shows how failure points can be eliminated. To eliminate node failure, use multiple nodes. To eliminate VIOS failure, use dual VIO servers. To eliminate site failure, deploy additional sites. Other items are covered, but again, the idea is to build in redundancy so you can continue to provide access to the applications and data that the business runs on.

Shawn discussed his customer interactions (an area I’ve touched on previously):

“First, another IBM representative will tell the customer about the hardware and the systems’ reliability, availability and serviceability (RAS) features. Then a second rep will discuss live partition mobility and how it seamlessly shifts logical partitions from one frame to another. So after 20-30 minutes of hearing about how the hardware never fails, THEN Shawn must step in and explain why the customer should be concerned with high availability and disaster recovery. That’s one tough act to follow.”

Some additional slides introduce Live Partition Mobility and explain how it allows you to move your live running operating system and application from one physical frame to another. Of course, that’s a hardware maintenance solution. What about software maintenance? With PowerHA, you can fail over to the other node, upgrade your application or your operating system, then fail back and do the same thing on the other side. Shawn notes that basically, PowerHA performs the functions that LPM doesn’t. He also gets into how PowerHA is used to recover from node failures, network failures, loss of shared storage access, and — with version 7.1 based on Cluster Aware AIX technology — rootvg errors. Finally, Shawn covers the differences between PowerHA 6.1 and 7.1.

Shawn makes these additional points:

* Remember, we can fail over from one system to one system, from one system to any other system, from any system to one, and from any to any. We can also fail over between different versions of hardware from the same or different families, assuming you can live with the performance degradation, and after you verify that the version of the operating system you want to run will work with your particular configuration. Failing POWER7 to POWER6 or POWER5 could conceivably work as long as you verify that your particular setup is supported.

* Often you’ll want to make your service IP address highly available, along with your application server and your shared storage.

* You can create user defined resources, custom resource groups and more granular resource group options. You can set up resource group dependencies. In his example, you might want to be sure your database is running before you try to start application servers, so you could configure that as a dependency.

* You can configure different priorities and choose which nodes your resource group starts on. This provides a great deal of flexibility and control when setting things up.

* You can also configure things to automatically run DLPAR and COD operations. So you could have a very “skinny” standby node, but when needed it could perform operations on the HMC and bring additional memory and CPU resources online.

* You can have application monitors so you can take actions if PowerHA detects that the application has gone down, and you can set up file collections to have the software help you keep config files or other important files in sync across the cluster. This is meant to support all regular files, as opposed to things like trying to keep password files in sync.

* Configuration and smart assistants are available to configure clusters. System Director plugins are also available, both for managing and monitoring your cluster’s state.

* The CAA command — /usr/sbin/clcmd — can be used to distribute commands across all cluster nodes (see the short example after this list).

* A cluster test tool can be used to validate clusters. This is also a good way to run tests across many different clusters in the environment, to ensure that we’re running the same tests across all of the machines.
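Here’s a quick illustration of clcmd, run on every node in the cluster — date is just an arbitrary command; substitute whatever you want run cluster-wide:

/usr/sbin/clcmd date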

Shawn’s final slide lists these great resources:

* IBM Redbook: PowerHA SystemMirror 7.1 for AIX

* PowerHA & AIX Support & Compatibility Matrix

* PowerHA Hardware Support Matrix

Incidentally, you can follow Shawn on Twitter.

I really enjoy these webinars, and of course the availability of the replays and presentation materials is a huge convenience. Hopefully my writing about VUG webinars encourages you to take the time and listen for yourself.

Note: Completely unrelatedly, I’m a YouTube sensation. Well, maybe not, but I’m there. Five new videos just went live where you can see me and some other IBM Power Champions talking with IBM’s Ian Jarman at last fall’s Technical University.

IVM, HMC and SDMC Continued

Edit: The links still work.

Originally posted February 14, 2012 on AIXchange

Continuing from last week, here’s more on the recently released IBM Redpaper, “IBM PowerVM Getting Started Guide.”

Chapter 2: IVM

From the authors:

“IBM developed the Integrated Virtualization Manager (IVM) as a server management solution that performs a subset of the HMC and SDMC features for a single server, avoiding the need for a dedicated HMC or SDMC server. IVM manages a single stand-alone server — a second server managed by IVM has its own instance of IVM installed. With the subset of HMC and SDMC server functionality, IVM provides a solution that enables the administrator to quickly set up a server. IVM is integrated within the Virtual I/O Server product, which services I/O, memory, and processor virtualization in IBM Power Systems.

“There are many environments that need small partitioned systems, either for test reasons or for specific requirements, for which the HMC and SDMC solutions are not ideal. A sample situation is where there are small partitioned systems that cannot share a common HMC or SDMC because they are in multiple locations.

“IVM is a simplified hardware management solution that inherits most of the HMC features. It manages a single server, avoiding the need for an independent personal computer. It is designed to provide a solution that enables the administrator to reduce system setup time and to make hardware management easier, at a lower cost.

“When not using either the HMC or the SDMC, VIOS takes control of all the hardware resources. There is no need to create a specific partition for the VIOS. When VIOS is installed using the default settings, it installs on the server’s first internal disk controller and onto the first disk on that controller. IVM is part of VIOS and activated when VIOS is installed without an HMC or SDMC.”

Chapter 2 continues with details on IVM installation.

I wish this chapter included screen shots. (There are screen shots in chapters 3-4.) The Redpaper describes the steps, but for those unfamiliar with the interface it might be confusing; a few screen shots would help.

Chapter 3: HMC

More from the authors:

“Note: There is flexibility for you to plan your own adapter numbering scheme. The Maximum virtual adapters setting needs to be set in the Virtual Adapters window to allow for your numbering scheme. The maximum setting is 65535 but the higher the setting, the more memory the managed system reserves to manage the adapters.”

They cover the three VIOS installation methods: DVD, via the HMC (using the installios command) and via Network Installation Manager (NIM). One of the notes says:

“Interface en5 is the SEA adapter created in 3 on page 29. Alternatively, an additional virtual adapter may be created for the VIOS remote connection, or another physical adapter may be used (it will need to be cabled) for the TCP/IP remote connection. TCP and UDP port 657 must be open between the HMC and the VIOS. This is a requirement for DLPAR (using RMC protocol).”

I know when I set up shared Ethernet adapters on VIO servers, I like to add an additional virtual Ethernet adapter to put my VIO IP address on. This allows me to perform maintenance on my VIOS and SEA without an outage, as the network traffic goes out my backup SEA on my other VIOS.

Section 3.2 covers setting up dual VIO servers:

“The benefit of a dual VIOS setup is that it promotes Redundancy, Accessibility and Serviceability (RAS). It also offers load balancing capabilities for MPIO and for multi SEA configuration setups. The differences between a single and dual VIOS setup are:

The additional VIOS partition
The additional virtual Ethernet adapter used as the SEA Control Channel adapter per VIOS
Setting the trunk priority on the virtual Ethernet adapters used for bridging to physical adapters in an SEA configuration.”

The authors explain how to move from a single VIO SEA to a dual VIO scenario by adding the control channel adapter using this command:

   chdev -dev ent5 -attr ctl_chan=ent6 ha_mode=auto
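For context, creating the SEA with the failover attributes in a single step looks roughly like this on each VIOS — the adapter names (ent0 for the physical port, ent4 for the bridging virtual adapter, ent6 for the control channel) and the default PVID are placeholders for your environment, not values from the Redpaper:

mkvdev -sea ent0 -vadapter ent4 -default ent4 -defaultid 1 -attr ha_mode=auto ctl_chan=ent6

On the second VIOS you’d do the same thing with a different trunk priority on the bridging adapter, as the Redpaper describes.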

They also mention that we can run commands on the VIO command line or use cfgassist, which is similar to smitty in AIX.

Section 3.3 covers setting up virtual fibre. The authors argue that virtual SCSI disks should be used for rootvg and NPIV for data LUNs:

“Virtual Fibre Channel allows disks to be assigned directly to the client partitions from the SAN storage system. With virtual SCSI, the disks are assigned to the VIOS partition before they are mapped to a virtual SCSI adapter.

“The preference is to still use virtual SCSI for client partition operating system disk, and use virtual Fibre Channel for the data. The reasons for using virtual SCSI for client partition operating system disks are:

* When the disks are assigned VIOS first, they can be checked before having them mapped to a client. Whereas using virtual Fibre Channel this cannot be determined until the client partition is loaded from an installation source.

* Operating systems such as AIX and Linux have their kernels running in memory. If serious SAN issues are being experienced, the VIOS will first detect the problem and sever the link to the client partition. The client partition will halt abruptly, reducing any risk of data corruption. With operating systems using virtual Fibre Channel or physical Fibre Channel, the partition will remain running for a period. During that period the client partition is susceptible to data corruption.

* Operating system disks using virtual SCSI are not reliant on external device drivers whereas operating system disks using virtual Fibre Channel are. When it comes to upgrading the external device drivers, the client partitions would need to follow special procedures to upgrade.”

Chapter 4: SDMC

From the authors:

“The IBM Systems Director Management Console (SDMC) provides system administrators the ability to manage IBM Power System servers as well as IBM Power Blade servers. The SDMC organizes tasks in a single panel that simplifies views of systems and day-to-day tasks. The SDMC is also designed to be integrated into the administrative framework of IBM Systems Director.

“The SDMC can automatically handle the slot allocation of virtual adapters for the user. With the SDMC the user can choose to either let the SDMC manage the slot allocations, or use the traditional manual mechanism to allocate the virtual adapter IDs.”

According to section 4.1.4, setting up an SEA failover configuration is a simple GUI operation when using SDMC:

Select the primary VIOS, the physical adapter you want to use and the backup VIO and its physical adapter. Then hit OK:

“The SDMC automatically creates the SEA adapters on both VIOS1 and VIOS2. The SDMC will also configure the control channel as a part of this step. The virtual Ethernet adapter with the highest VLAN ID is used for the SEA control channel.”

This should remove the possibility of errors arising from setting up the control channels manually.

Although you can do the same with the HMC GUI, I still prefer to manage things on the command line.

The publication has much more. It’s well worth your time.

One Guide to the IVM, HMC and SDMC

Edit: The link still works.

Originally posted February 7, 2012 on AIXchange

The just-published IBM Redpaper, “IBM PowerVM Getting Started Guide,” shows you how to use the Integrated Virtualization Manager (IVM), Hardware Management Console (HMC) and the Systems Director Management Console (SDMC) to configure your systems. It’s an extremely valuable guide that’s brief enough, at 104 pages, to be read quickly.

The chapters are independent, so they can be read in any order. I’ll run down some highlights in posts over the next two weeks:

Chapter 1

* There’s a great chart on page 2 that compares and contrasts the advantages and disadvantages for the IVM, HMC and SDMC. 

* Section 1.2 covers planning:

“Be sure to check system firmware levels on your power server and HMC or SDMC before you start. Decide if you will use Logical Volume Mirroring (LVM) — in AIX LPARs — or Multipath IO (MPIO) at the VIOS level. Obviously if you are running NPIV you would want to run MPIO at the AIX level. The examples in this paper use MPIO. Make sure your Fibre Channel switches and adapters are N_Port ID Virtualization (NPIV) capable if you will be using NPIV.

Make sure your network is properly configured.

Check the firewall rules on the HMC or SDMC.

Plan how much processor and memory you will assign to the VIOS for best performance.”

* The authors recommend using a dual VIOS architecture — two VIO servers — to provide serviceability and scalability. So do I.

* Part of planning includes establishing a VIO slot number scheme. While the SDMC automates slot allocation, the authors illustrate their preferred scheme in Figure 1-2 on page 5.

The authors suggest a VIO slot numbering scheme where the server slot is 101, 102, 103, etc. in both VIO servers, and the client is 11, 12 connecting to VIO1, and 21 and 22 connecting to VIO2. When mapped, VIO1 would map 11 to 101, 12 to 102, and VIO2 would map 21 to 101, 22 to 102. I prefer a numbering scheme where my even-numbered adapters come from one VIOS (VIO1) and my odd-numbered adapters come from the other (VIO2), with both client and server using the same numbers. In my case I like 100, 110, 120, 130 coming from VIO1, and 101, 111, 121, 131 coming from VIO2. Of course, you may have your own numbering scheme — which I’d love to hear about in Comments.

* Section 1.3 covers the terminology differences between Power- and x86-based systems, which can be handy for someone with little or no background managing power systems. This can help them make the transition in terminology between the two.

* Section 1.4 lists some prerequisites for setting up the machines:

“Check that:

  • Your HMC or SDMC (the hardware or the virtual appliance) is configured, up, and running.
  • Your HMC or SDMC is connected to the new server’s HMC port. We suggest either a private network or a direct cable connection.
  • The TCP port 657 is open between the HMC/SDMC and the Virtual Server in order to enable Dynamic Logical Partition functionality.
  • You have IP addresses properly assigned for the HMC, and SDMC.
  • The Power Server is ready to power on.

All your equipment is connected to 802.3ad capable network switches with link aggregation enabled. Refer to the Chapter 5: Advanced Configuration on page 75 for more details.

Fibre Channel fabrics are redundant. Refer to Chapter 5: Advanced Configuration on page 75 for more details.

Ethernet network switches are redundant.

SAN storage for virtual servers (logical partitions) is ready to be provisioned.”

Chapter 5

The next three chapters are devoted to the specific approaches you might choose to take. Chapter 2 covers the IVM, Chapter 3 the HMC and Chapter 4 the SDMC. I’ll dissect those options next week. For now I’ll briefly discuss Chapter 5 (Advanced Configuration):

“This chapter describes additional configurations to a dual Virtual I/O Server (VIOS) setup and highlights other advanced configuration practices. The advanced setup addresses performance concerns over the single and dual VIOS setup.

This chapter includes the following sections:

  • Adapter ID numbering scheme
  • Partition numbering
  • VIOS partition and system redundancy
  • Advanced VIOS network setup
  • Advanced storage connectivity
  • Shared processor pools
  • Live Partition Mobility
  • Active Memory Sharing
  • Active Memory Deduplication
  • Shared storage pools”

* Table 5-1 illustrates an example of virtual SCSI adapter ID allocations.

* Section 5.4 covers advanced VIOS network setup, including link aggregation and VLAN tagging:

“The VIOS partition is not restricted to only one SEA adapter. It can host multiple SEA adapters where:

A company security policy may advise a separation of VLANs so that one SEA adapter will host secure networks and another SEA adapter will host unsecure networks.

A company may advise a separation of production, testing, and development networks connecting to specific SEA adapter configurations.

“There are considerations regarding the use of IEEE 802.3ad Link Aggregation, 802.1Q VLAN tagging, and SEA:

There is a maximum of 8 active ports and 8 standby ports in an 802.3ad Link Aggregation device.

Each of the links in an 802.3ad Link Aggregation device should have their speeds set to a common speed setting. For example, set all links to 1g/Full duplex.

A virtual Ethernet adapter is capable of supporting up to 20 VLANS (including the Port Virtual LAN ID – PVID).

A maximum of 16 virtual Ethernet adapters with 20 VLANS assigned to each adapter can be associated to an SEA adapter.

A maximum of 256 virtual Ethernet adapters can be assigned to a single virtual server, including the VIOS partitions.

The IEEE 802.1Q standard supports a maximum of 4096 VLANS.

SEA failover is not supported in IVM as it only supports a single VIOS partition.”

Whether you set up IBM Power Systems all the time or you’re just getting started with the platform, this Redpaper is an excellent resource for learning or reviewing the relevant technology and terminology.

More on Shared Storage Pools

Edit: Some links no longer work

Originally posted January 31, 2012 on AIXchange

Back in 2010 I wrote about the changes that were coming to VIOS. One of those big changes, shared storage pools, is now a reality. This gives admins another option to consider when setting up disks on Power servers.

In larger companies, disk changes are typically implemented by SAN teams with many other responsibilities and, often, different priorities. However, by allocating storage to the servers up front and setting it up in a storage pool, admins can manage shared storage pools. In doing so, we can be more responsive to requirement changes. And with thin provisioning, we can determine the amount of disk we actually use on each server. For the first time since the days of internal disks and expansion drawers, disk is back under our control.

Here’s how Nigel Griffiths explains shared storage pools:

“The basic idea behind this technology… is that [VIO servers] across machines can be clustered together and allocate disk blocks from large LUNs assigned to all of them rather than having to do this at the SAN storage level. This uses the vSCSI interface rather than the pass through NPIV method. It also reduces SAN admin required for Live Partition Mobility — you get the LUN available on all the VIOS and they organise access from there on. It also makes cloning LPARs, disk snapshots and rapid provisioning possible. Plus thin provisioning — i.e., disk blocks — are added as and when required, thus saving lots of disk space.”

Continuing from last week, here’s more from Nigel’s presentation.

Since shared storage pools are built on top of cluster-aware AIX, the lscluster command also provides more information, including: lscluster -c (configuration), lscluster -d (list all hdisks), lscluster -i (network interfaces), lscluster -s (network stats).

In the demo, he also discusses adding disk space and assigning it to client VMs. Keep in mind that you cannot remove a LUN from the pool; you can only replace it.
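The replace operation, using the same example names Nigel uses in the cheat sheet at the end of this post, looks like this:

chsp -replace -clustername galaxy -sp atlantic -oldpv hdisk4 -newpv hdisk24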

He also covers thin and thick provisioning using shared storage pools and shows you how to conduct monitoring. Run topas on your VIOS and then enter D (make sure it’s upper-case) so you can watch the disk I/O get spread across your disks in 64 MB chunks. From there, Nigel covers how to set up alerts on your disk pool. If you’re using thin provisioning, you must ensure you don’t run out of space.
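Setting and then checking a pool alert is a one-liner each — again borrowing the example names from the cheat sheet below:

alert -set -clustername galaxy -spname atlantic -value 80    # set the alert threshold for the pool
alert -list -clustername galaxy -spname atlantic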

Nigel also shares his script, called lspool. It’s designed to do the work of multiple scripts by presenting all of the critical information at one time instead of running multiple commands:

# lspool list each cluster and for each list its pools and pool details
. ~/.profile
clusters=`cluster -list | sed '1d' | awk -F " " '{ printf $1 " " }'`
echo "Cluster list: " $clusters
for clust in $clusters
do
pools=`lssp -clustername $clust | sed '1d' | awk -F " " '{ printf $1 " " }'`
echo Pools in $clust are: $pools
for pool in $pools
do
lssp -clustername $clust | sed '1d' | grep $pool | read p size free totalLU numLUs junk
let freepc=100*$free/$size
let used=$size-$free
let usedpc=100*$used/$size
echo $pool Pool-Size: $size MB
echo $pool Pool-Free: $free MB Percent Free $freepc
echo $pool Pool-Used: $used MB Percent Used $usedpc
echo $pool Allocated: $totalLU MB for $numLUs Logical Units
alert -list -clustername $clust -spname $pool | sed '1d' | grep $pool | read p poolid percent
echo $pool Alert-Percent: $percent
if [[ $totalLU > $size ]]
then
let over=$totalLU-$size
echo $pool OverCommitted: yes by $over MB
else
echo $pool OverCommitted: no
fi
done
done

Nigel examines snapshots and cloning with shared storage pools, noting that the different commands — snapshot -create, snapshot -delete, snapshot -rollback and snapshot -list — use different syntax. Sometimes it asks for a -spname flag, other times it asks for a -sp flag. Pay attention so you know the flags that are needed with the commands you’re running. He also demonstrates how some of this management can be handled using the HMC GUI.

The viosbr command is also covered. I discussed it here.

Nigel recommends that you get started by asking the SAN team to hand over a few TB that you can use for testing. Also make sure your POWER6 and POWER7 servers are at the latest VIOS 2.2 level. It’s worth the effort. This technology will save time, boost efficiency and increase your overall responsiveness to users.

Finally, here’s Nigel’s shared storage pools cheat sheet:

1. chdev -dev -attr reserve_policy=no_reserve
2. cluster -create -clustername galaxy -repopvs hdisk2 -spname atlantic -sppvs hdisk3 hdisk5 -hostname bluevios1.ibm.com
3. cluster -list
4. cluster -status -clustername galaxy
5. cluster -addnode -clustername galaxy -hostname redvios1.ibm.com
6. cluster -rmnode [-f] -clustername galaxy -hostname redvios1.ibm.com
7. cluster -delete -clustername galaxy
8. lscluster -s or -d or -c or -i = CAA command
9. chsp -add -clustername galaxy -sp atlantic hdisk8 hdisk9
10. chsp -replace -clustername galaxy -sp atlantic -oldpv hdisk4 -newpv hdisk24
11. mkbdsp -clustername galaxy -sp atlantic 16G -bd vdisk_red6a -vadapter vhost2 [-thick]
12. rmbdsp -clustername galaxy -sp atlantic -bd vdisk_red6a
13. lssp -clustername galaxy -sp atlantic -bd
14. lssp -clustername galaxy
15. alert -set -clustername galaxy -spname atlantic -value 80
16. alert -list -clustername galaxy -spname atlantic
17. errlog -ls
18. snapshot -create name -clustername galaxy -spname atlantic -lu LUs
19. snapshot -delete name -clustername galaxy -spname atlantic -lu LUs
20. snapshot -rollback name -clustername galaxy -spname atlantic -lu LUs
21. snapshot -list -clustername galaxy -spname atlantic
22. viosbr -backup -clustername galaxy -file Daily -frequency daily -numfiles 10
23. viosbr -view -file File -clustername Name …
24. viosbr -restore -clustername Name …
25. lsmap -clustername galaxy -all

Take the time to listen to the replay, and you’ll learn even more. I highly recommend it.

Getting Started with Shared Storage Pools

Edit: Some links no longer work.

Originally posted January 24, 2012 on AIXchange

The December AIX Virtual User Group webinar featured Nigel Griffiths’ discussion of phase 2 of shared storage pools. If you didn’t tune in, download the presentation materials and listen to the replay.

The new shared storage pool functionality is enabled with the latest PowerVM 2.2 service pack, and is a feature of PowerVM Standard and Enterprise. If you already have PowerVM, simply download the VIO server fixpack to obtain these new features. (Note: Because this VIOS level is based on AIX 6.1 TL7, your NIM server must be at AIX 6.1 TL7 or AIX 7.1 TL1 to work with it.)

 One thing to note, as Nigel points out in the presentation, is that the most common VIOS storage options have been around for some time:

1) Logical volumes, created from a volume group and presented to client LPARs
2) Whole local disks
3) SAN LUNs
4) File-backed storage, either from a file system on local disk or a file system on SAN disks
5) NPIV LUNs from SAN

Nigel then discusses the newest option: using SAN LUN disks that are placed into a shared storage pool. This new option, he emphasizes, doesn’t eliminate any of the other options. It does not portend the death of NPIV. It’s just an additional VIOS storage choice we now have.

Listen to the replay or look over the slides to gather Nigel’s thoughts on the benefits of shared storage pools. He explains that fibre channel LUNs and NPIV can be complex. They require knowledge of the SAN switch and the SAN disk subsystem. If you need to make changes, it might take your SAN guys a while to implement them. This can slow overall responsiveness. That’s to say nothing of smaller organizations that don’t have dedicated SAN guys. Live partition mobility can be tough work if your disks aren’t pre-zoned to the different frames.

With a shared storage pool you preallocate the disk to the VIO servers. Then it’s under your control. You can more easily allocate the space to your virtual machines.

POWER6 and POWER7 servers (including blades) are needed to use shared storage pools. At minimum you should allocate a 1 GB LUN for your repository and another 1 GB LUN for data, but if you plan to do much with this technology, in most cases you’ll need much larger LUNs (think terabytes of disk).

Your VIOS must have the hostname set correctly to resolve the other hostnames. In Nigel’s experience he couldn’t use short hostnames — they had to be changed to their fully qualified names.

He also recommends giving your VIOS a CPU and at least 4 GB of memory. “Skinny” VIOS servers aren’t advisable with shared storage pools. Currently, the maximum number of nodes is four, the maximum number of physical disks in a pool is 256, the maximum number of virtual disks in a cluster is 1,024, and the maximum number of clients is 40. Individual disks in a pool can range from 5 GB to 4 TB, and total pool capacity can range from 20 GB to 128 TB. Virtual disk (LU) capacity can be from 1 GB to 4 TB, and there can be only one repository disk.

If you played around with phase one, you’ll find that many of your limits have been removed. Now you can use shared storage pools for live partition mobility, perform non-disruptive cluster upgrades and use third party multi-pathing software.

You cannot have active memory sharing paging disks on shared storage pool disks.

Nigel covers the relevant terminology (clusters, pools, logical units, etc.).  He also demonstrates how to actually prepare and set up your disks. In a nutshell you must get your LUNs online and zoned to your VIO servers, and you need to set your reserve policy to no_reserve on your LUNs.
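
As a minimal sketch of that last step (the hdisk number here is hypothetical), the reserve policy change for one of the candidate LUNs looks like this from the padmin command line:

chdev -dev hdisk3 -attr reserve_policy=no_reserve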

After covering the commands for managing clusters — cluster -create, cluster -list, cluster -status, cluster -addnode, cluster -rmnode and cluster -delete — he recommends creating the cluster on one of the VIO servers and then adding additional VIO servers to the shared storage pool cluster. From there, you can allocate space to the VM clients.
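
Using Nigel’s example names (the galaxy cluster and atlantic pool), that sequence is roughly the following (treat the hostnames, hdisk numbers and vhost adapter as placeholders for your own):

cluster -create -clustername galaxy -repopvs hdisk2 -spname atlantic -sppvs hdisk3 hdisk5 -hostname bluevios1.ibm.com
cluster -addnode -clustername galaxy -hostname redvios1.ibm.com
mkbdsp -clustername galaxy -sp atlantic 16G -bd vdisk_red6a -vadapter vhost2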

Next week I’ll have more information from Nigel’s presentation, including scripts and cheat sheets. In the meantime, why not upgrade your test machine’s VIO servers to the latest level so you can try this functionality?

Verifying Microcode Levels

Edit: As of this writing the link still works.

Originally posted January 17, 2012 on AIXchange

As great as POWER7 servers are, plenty of older machines still run AIX. And as great as that is, even better is that, through the use of certain tools, you can easily verify that these older machines are running the latest versions of system firmware and microcode.

Recently a “Top Gun” CE reminded me about the Microcode Discovery Service. It’s a handy tool that allows you to see if your microcode is up to date.

From IBM Support:

“Microcode Discovery Service [MDS] is used to determine if microcode installed on your IBM System p or RS/6000 systems is at the latest level.

“MDS relies on an AIX utility called Inventory Scout. Inventory Scout is installed by default on all AIX 5 systems, and also on some later levels of AIX 4.3.”

As noted by IBM, there are three ways to run MDS:

  • Run a signed Java applet that connects to Inventory Scout daemon processes on hosts to be surveyed.
  • Run Inventory Scout either manually or by script, and upload the resulting survey files to the MDS website for analysis. Refer to the User’s Guide for instructions on running Inventory Scout.
  • The MDS Microcode CD-ROM is recommended for systems that are not internet connected. An image of this CD-ROM is available online, or you can order a physical disk. For more information or to download a CD-ROM image of this tool, visit: MDS Microcode CD-ROM.

Under the heading, “Preparing to use the Microcode Discovery Service,” there’s a description of the Inventory Scout:

“Inventory Scout is a utility that runs on System p hosts. For AIX version 5 and later it is part of the standard install. In case it is not installed, refer to the User’s Guide for instructions. …

“The MDS applet is capable of performing surveys of more than one host at a time, and creating a combined microcode report. If using the MDS applet, then the following additional conditions must be met.

  • “Each host to be surveyed must be running Inventory Scout in a daemon process. These daemon processes are not started by default. Refer to the User’s Guide for instructions on how to configure Inventory Scout to run as a daemon.
  • “Java support must be enabled in your browser. To enable this support, see the Preferences or Tools options on your browser.
  • “Your company must allow applets to establish TCP connections to the hosts to be surveyed.”

I clicked on the applet option and allowed it to run:

This screen allows you to add a host and give it an IP address, password and (optionally) the port if you’ve changed it.

If the wrong IP address is entered, you’ll get this screen.

I ran passwd on the invscout userid so that I knew the password for the invscout account. Then I ran invscoutd -d100000 on the command line of the machine I was going to scan so I could activate the daemon and get it to listen for the applet to connect. Originally invscoutd was listening for 50,000 bytes. It needs to listen for 100,000 bytes, which is why the -d flag is required.
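
In other words, the preparation on the host to be surveyed came down to two commands run with root authority (the buffer size is the value discussed above):

passwd invscout
invscoutd -d100000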

Once I successfully added the host, I clicked start:

That brought up this report.

It found three devices and displayed the system firmware that needed to be updated. I clicked the link and copied the files to the machine that was running the applet. Then I moved those files over to the Power system that needed to be updated.

In my case I created /tmp/mcode, and copied the files to that directory. Then I ran rpm -ivh --ignoreos *.rpm to get the information and microcode loaded into the /etc/microcode directory.

Next, I ran diag, task selection:

On this screen, I selected the “Microcode Tasks” option:

Then I selected “Download Latest Available Microcode” and hit enter:

Since I ran the rpm command earlier, I knew the files should be in /etc/microcode. I selected that option:

Then I selected “All Resources” and then hit the F7 key to let the microcode install:

I actually updated the system firmware via the HMC, though I could have used the command line if my system wasn’t managed by an HMC. Once my updates were complete, I reran the MDS applet. No further updates were needed.

Of course there are other methods for verifying that your microcode is up to date. How do you prefer to do this in your environment?

Have You Seen the VIOS Advisor?

Edit: This is built in now. Some links no longer work.

Originally posted January 9, 2012 on AIXchange

Download the tool here. I’ll let IBM developerWorks provide the introduction.

“The VIOS advisor is an application that runs within the customer’s VIOS for a user specified amount of time (hours), which polls and collects key performance metrics before analyzing results and providing a health check report and proposes changes to the environment or areas to investigate further.

“The goal of the VIOS advisor is not to provide another monitoring tool, but instead have an expert system view performance metrics already available to the customer and make assessments and recommendations based on the expertise and experience available within the IBM systems performance group.

     Download vios_advisor.zip from the link provided in download section.
     Unzip vios_advisor.zip on a workstation that has a web browser.
     ftp vios_advisor onto the VIOS you wish to monitor. (Place in any directory.)
     chmod +x vios_advisor to give the application execution privileges.

 “The application “vios_advisor” takes only one parameter, which is the duration of the monitoring period, in minutes.

 “For example, to monitor the VIOS for 30 minutes, run:

      vios_advisor 30

   Usage Statement:

      Usage: vios_advisor
      duration_in_minutes:
      Recommended monitoring time = >= 30 min
      Minimum monitoring time = 5 min (only recommended for settings verification)
      Maximum monitoring time = 1440 min (24 hours)

      -v : Version

“The vios_advisor application is silent (does not produce any output to screen) and upon termination, will generate an xml file in the current running directory labeled:

     vios_advisor.xml

“Copy over the vios_advisor.xml file to the workstation where the zip file: vios_advisor.zip was extracted, and place the file in the vios_advisor folder. Open the vios_advisor.xml file with the web-browser of your choice to see the report.

“The measured overhead for the VIOS Advisor is minimal. An increase in CPU consumption of 0.1 cores was measured on a POWER7 server. Memory consumption will vary based on the number of physical I/O devices in the VIOS, but expect the advisor to consume 2-20 MB of memory.”

I downloaded the advisor and extracted the files from the .zip file. Then I selected the vios_advisor_example file that was located in the newly created directory. This was the output in my browser:

I copied the vios_advisor file to my VIOS and ran chmod on it so that I could run the tool. Then I ran a quick test to make sure it worked:

$ chmod u+x vios_advisor
$ vios_advisor -v
vios_advisor  Version: 121211B

Then I ran:

 $ vios_advisor

 Usage: vios_advisor

      duration_in_minutes:
            Recommended monitoring time = >= 30 min
            Minimum monitoring time =         5 min  (only recommended for settings verification)
            Maximum monitoring time =      1440 min  (24 hours)

       -v :    Version

Since this was a test, I chose the minimum of five minutes to verify the settings.

$  vios_advisor 5

At the end of the test, I received a file called vios_advisor.xml. I copied that back to my PC, putting it in the directory that my vios_advisor.zip file was extracted to. Then I examined the report.

I’m sure IBM will continue to enhance the tool, helped along by user feedback.

So have you tried the VIOS advisor? How would you improve it?

We Could All Use Extra Capacity

Edit: I still like this analogy.

Originally posted January 3, 2012 on AIXchange

I was recently delayed at the San Jose airport. Such is the life of a consultant.

The problem this particular day was dense fog. Airplanes could take off, but, per FAA rules, they weren’t allowed to land. My flight was coming from Reno, stopping in San Jose and continuing to Las Vegas. But with the fog, the plane was rerouted straight to Las Vegas. That left me and a bunch of other passengers to fend for ourselves.

Luckily I was able to book tickets for a flight later that day. (Given my many miles logged with this particular airline, they were pretty accommodating.) I made some different connections through different cities and arrived home some six hours later than planned. Unsurprisingly, my checked luggage didn’t accompany me. It showed up the next day (which is why I generally only take along what I can carry on). But at least I was able to complete my travels in one day.

As a frequent flyer, I’ve been through worse. Several times I couldn’t get directly home. I had to fly into a nearby airport and take the train the rest of the way. Once I ended up in a rental car with three strangers as we drove from Pittsburgh to Chicago (again, fog was the issue). I’ve also shared a rental car with fellow weary travelers between Fresno and San Francisco.

Being stranded at the airport does give a person time to think. For instance, while I sat in San Jose, I thought how great it would be if, whenever a flight is delayed or diverted, a spare plane with a standby crew would be instantly available to take passengers to their destination.

Imagine a snowstorm or other weather event that causes disruptions throughout an entire region over several days. In my fantasy world there would be lightly loaded planes with available seats so everyone could rebook with little worry or hassle. In an even more perfect world, this extra capacity would be available at every airport, at any time. If one plane has a mechanical problem, another would be ready to go. And in peak travel times — say, Thanksgiving week — extra planes could be deployed as needed to satisfy the additional demand.

So basically, I was thinking how awesome it’d be if flying was like Power Systems hardware.

Just think about that. Think about capabilities like Capacity on Demand (COD), micropartitioning and shared processor pools. Think about how the hypervisor, on a millisecond by millisecond basis, can redistribute workloads and efficiently utilize the processors on your machine.

Instead of sizing standalone machines for peak workloads, you can put many LPARs on a single frame to more fully utilize that machine. If you need extra capacity, it’s available. If you’re using COD, you can fire up dormant processors or extra memory that’s already on the frame. Active memory expansion and active memory sharing allow you to do more with less physical memory. You have many options to get extra capacity built right into your systems.

Unfortunately, excess capacity was nowhere to be found on my most recent layover at the San Jose airport. But at least whenever I return from the road, I know I’ll have excess capacity to handle additional workload in my computing environment.

AIX and TCO

Edit: Some links no longer work.

Originally posted December 20, 2011 on AIXchange

Would you put bicycle tires on a new car? I keep hearing that analogy, and I like it.

When I was much younger and much less well off, I sort of did that, only instead of a bicycle, I made life more difficult when I bought used tires for my car. This wasn’t quite as foolish as choosing gas over oil, but bear with me.

A tire shop in town took in old tires when they sold customers new tires. Then they’d resell their used tires to poor kids like me. The shop charged $5 per old tire. That sure seemed like a bargain at the time, but I soon learned that you get what you pay for. My “new” old tires had no tread and wouldn’t hold air. Then I’d spring for more $5 tires. While occasionally I’d find some tires with a little more life in them, it wasn’t long before I concluded that I should just spend the extra money up front. Once I started purchasing new tires, my tire problems went away.

Now, instead of a car, think about your computers, and instead of tires, think about the operating system or applications that you run in your enterprise. Then ask yourself, why do people go with cheap computers and free operating systems?

Sure, they’re less costly up front, and we might be able to download an OS and get them to run on commodity hardware. But by taking this route, are we putting bald used tires on a fancy new sports car? Are we so focused on the cheap initial investment that we overlook the total cost of ownership (TCO)?

I like enterprise class servers and enterprise class operating systems. In the case of Power Systems, I like knowing that IBM designed the hardware, designed the hypervisor, designed the operating system, and is there to support them all. If I have issues, help is a quick phone call away. If I need to escalate an issue, I can easily gather an army of IBMers to help solve problems.

If I’m running an application that supports my business, and I need application availability, why would I want to buy used, worn tires?

Recently on Twitter, fellow Power Champion Andrew Wojnarek (@andywojo) said that Linux is free only if your time is worthless. I don’t want to go that far, because Linux has its uses. But assuredly, if given the choice, I’ll always choose AIX over Linux.

Of course, there are those who believe that Linux on VMware is good enough for their needs, as you can see from the comments on this post. What do you think?

Running nmon and topas

Edit: Some links no longer work.

Originally posted December 13, 2011 on AIXchange

Nigel Griffiths had a great session at this fall’s IBM Technical University on “nmon, topas and Friends.” I assume that you know that he actually wrote nmon (aka, “Nigel’s Monitor”). As such, it’s very enlightening to hear him speak about the history of the tool and his motivation for writing it. Besides, obviously, being very knowledgeable about the subject, he’s also a very entertaining and engaging public speaker.

Nigel mentioned how nmon and topas had come together, and detailed the history and timeline around nmon being officially supported by IBM. He then offered some tips and tricks for running both tools.

He mentioned that current versions of AIX (which you should be running) include a copy of topas_nmon. I assume it’s on your machine if you’re reading this post. We should all be using the current version. nmon “classic” should only be used if you’re running old versions of hardware and AIX, although Nigel recommends using ONLY the latest version (12e+). Keep in mind that nmon classic is functionally frozen.

Nigel said this endeavor started as a personal project, but he was soon deluged with requests for copies. He explained that the tool consumes less than 1 percent CPU and uses APIs, rather than AIX commands, under the covers. This is how he accomplished his goal of making nmon “small, simple and safe.”

While I can’t capture everything Nigel laid out in his presentation, I do encourage you to experiment and learn more about the tool. Run nmon -h and look at all of the different available options and statistics. As IBM now supports both nmon and topas, you have a choice when it comes to viewing performance data and talking to IBM about what you’re seeing on your systems.

With topas, see what you get by entering “P,” “E,” “D,” “L,” “V” or “F.” Be sure to capitalize. “E,” for instance, is for shared Ethernet adapters. Log into your VIO server, run topas and hit “E,” and you’ll see the network traffic going across your shared Ethernet adapter.

If you’re running virtual SCSI devices, try “D” (and then “d”) inside your VIOS to view virtual to physical disk mapping information.

Running topas -C gives you a view of all the LPARs across your physical machine (assuming you can access each LPAR over the network).

One nice thing is if you’re on a system and topas keeps refreshing, you can freeze the screen to conduct closer analysis. Just hit the space bar.

Now try nmon. Hit “l” (that’s a lower-case L) and watch as it gives you a long-term view of physical CPU. If you observe the display over time as your CPU works and idles, you should see the scale automatically change based on your machine’s activities.

Nigel also mentioned how we can use the Stephen Atkins tool, nmon analyzer, to graph and view our nmon output.
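
If you want data to feed into the analyzer, a capture along these lines works (the interval and count are just example values; this one takes a snapshot every 60 seconds for 24 hours and writes a .nmon file in the current directory):

nmon -f -t -s 60 -c 1440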

What are some of your favorite ways to customize your topas or nmon views?

As I’ve often noted, the IBM Technical University is a great educational experience. I think every one of Nigel’s sessions was standing-room only, and I know that Jay Kruemcke had to add another session on AIX trends and directions. I’m sure other presenters drew large crowds as well. The 2012 conference will be held in Las Vegas. Plan now so you can attend.

Getting Started with SDMC

Edit: I do not know very many people that got started or kept going with the SDMC.

Originally posted December 6, 2011 on AIXchange

Ready or not, the SDMC is on its way. I thought I’d get my toes wet by trying to test an SDMC virtual machine before I used the SDMC appliance. I ordered the SDMC DVDs and received the VMware and KVM versions. Given the option, I chose a VMware farm for the installation.

The DVD had quite a few large files on it, and rather than going on-site to physically load them, I just copied the files to my destination machine. Here’s the list of files I copied from the two DVDs.

Once I copied the files from the CD to my virtual machine, I ran the CreateSDMCova.bat script to convert the files into a large .ova image file.

This is from the README.txt file:

How to build the SDMC .ova image
——————————–

1. Copy all files from this disk (DVD1) to your build directory.
2. Copy all files from the 2nd disk (DVD2) to the same build directory (from step #1).
3. From this build directory, run CreateSDMCova to build the .ova image which can be deployed using ovftool.

For Linux,
# ./CreateSDMCova.sh 

For Windows,
> CreateSDMCova.bat

I was on a Windows machine, so I ran the CreateSDMCova.bat command, generating this output:

Creating .ova image….
SDMC_1119B_730_0512.00
SDMC_1119B_730_0512.01
SDMC_1119B_730_0512.02
SDMC_1119B_730_0512.03
SDMC_1119B_730_0512.04
SDMC_1119B_730_0512.05
SDMC_1119B_730_0512.06
SDMC_1119B_730_0512.07
SDMC_1119B_730_0512.08
SDMC_1119B_730_0512.09
SDMC_1119B_730_0512.10
SDMC_1119B_730_0512.11
SDMC_1119B_730_0512.12
SDMC_1119B_730_0512.13
SDMC_1119B_730_0512.14
SDMC_1119B_730_0512.15
SDMC_1119B_730_0512.16
SDMC_1119B_730_0512.17
SDMC_1119B_730_0512.18
        1 file(s) copied.

Then when I ran the dir command in my DOS window, I saw a new .ova file had been created from all of my image files:

170 CreateSDMCova.bat
144 CreateSDMCova.sh
57 md5sum
415 README.txt
419,430,400 SDMC_1119B_730_0512.00
419,430,400 SDMC_1119B_730_0512.01
419,430,400 SDMC_1119B_730_0512.02
419,430,400 SDMC_1119B_730_0512.03
419,430,400 SDMC_1119B_730_0512.04
419,430,400 SDMC_1119B_730_0512.05
419,430,400 SDMC_1119B_730_0512.06
419,430,400 SDMC_1119B_730_0512.07
419,430,400 SDMC_1119B_730_0512.08
419,430,400 SDMC_1119B_730_0512.09
419,430,400 SDMC_1119B_730_0512.10
419,430,400 SDMC_1119B_730_0512.11
419,430,400 SDMC_1119B_730_0512.12
419,430,400 SDMC_1119B_730_0512.13
419,430,400 SDMC_1119B_730_0512.14
419,430,400 SDMC_1119B_730_0512.15
419,430,400 SDMC_1119B_730_0512.16
419,430,400 SDMC_1119B_730_0512.17
312,432,640 SDMC_1119B_730_0512.18
7,862,179,840 SDMC_1119B_730_0512.ova

I went into VMware’s vCenter and followed the wizard to deploy the OVF/OVA file.
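
If you’d rather script the deployment than click through the wizard, the README’s ovftool route would look roughly like this (the vi:// locator is entirely hypothetical; substitute your own vCenter path and credentials):

ovftool SDMC_1119B_730_0512.ova vi://administrator@vcenter.example.com/MyDatacenter/host/MyCluster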

After it deployed, I powered on the SDMC and brought up the console window in VMware so I could watch it boot up. It was similar to booting an HMC for the first time.

Once it came up, I was prompted to select the locale:

After accepting the license agreement, I reached the main wizard and completed the new install.

I filled in my date, time, passwords, IP configuration information, etc. Once it configured itself, it rebooted and brought up a login web page.

Incidentally, the sysadmin ID is now used for login (rather than hscroot). Once I logged in I reached this page.

I’ll add some systems soon, and in future posts I’ll discuss any issues I encounter.

Who else is test driving a SDMC? Have you gone live? Let me know in the comments.

The Hard Lessons of IT

Edit: It is still best to choose oil over gas when given the choice.

Originally posted November 29, 2011 on AIXchange

When I was 16, I got my driver’s license. One summer I had the opportunity to live in another state. However, my primary vehicle, a 1972 VW Bug, had to stay put. Still, I figured I needed wheels for that summer, so, after a year of busing tables at a Sizzler Steak House (and riding my bike to work), I saved enough to shell out $200 on a used station wagon.

It’s easy to understand why teenaged me made that choice. Speed and power. I seem to remember being told that the wagon had a “Buick 454.” I knew just enough about cars to understand that it had eight cylinders and that no one would be able to catch me in it. My friends’ sports cars looked better, but they couldn’t touch my beat up old station wagon. I never lost a race. (Of course it helped that none of my friends had a REAL sports car, but that old wagon was pretty fast.)

Still, as you might imagine, that purchase ended up being a very costly decision. First, I had to take it through state vehicle inspection, where issues with the muffler and exhaust system were found. I paid for the needed repairs (which, if memory serves me, cost at least as much as I’d spent on the wagon itself) and passed inspection. I thought I was ready to bask in my personal Summer of George, but my vehicle issues were only beginning.

The wagon burned oil–as much as a quart or two for every tank of gas I’d consume. And, if you weren’t already thinking it, the gas mileage on this thing was minuscule. So every few days that summer, I was buying oil and gas. I figured I had enough cash on hand to coast through those two months, but before long, I was almost tapped.

Eventually, I only had enough cash on hand for gas or oil. One, not both. As a result, my bad decision to get the wagon in the first place would be compounded by an even worse decision. I chose gas over oil. In no time at all that wagon ended up stalled on the side of the road, having thrown a rod. Then it was off to the junkyard.

Oh, and I made one other bad decision. Before my wagon was junk, I could have sold it. Someone was pestering me to buy it, and my plan was to unload it on him at summer’s end. Needless to say, that plan blew up with the motor.

I did at least learn from my mistakes. (For starters, it’s oil over gas. Always.) I guess that’s the important thing. If anything, it’s even more important now. Mistakes surely happen in IT, some even more costly than blowing an engine. Sometimes a backup doesn’t get made, and a system can’t be restored. Sometimes a backup is made but not tested, and thus doesn’t work when it’s needed. Sometimes shutdown -Fr gets run on the production box rather than the test box you thought you were logged into. Sometimes a VIO server is misconfigured and the production network goes down. Sometimes an rm command is run in the wrong directory.

These experiences lead to outages, and probably a few lost jobs. But the culprits undoubtedly learned from their mistakes.

What mistakes in IT have you made and, hopefully, learned from? I love hearing others’ IT horror stories (in part, admittedly, because they didn’t happen to me). So if you’ll indulge me, please share your learning experiences in Comments. I’m sure other readers will appreciate your stories as much as I would. And maybe by sharing you’ll keep someone else from making the same error.

Caching In

Edit: I have not done much geocaching lately.

Originally posted November 22, 2011 on AIXchange

Since I wrote about Watson and its appearance on “Jeopardy!,” I’ve become interested in the show’s famous human contestant, Ken Jennings.

Mind you I’d barely heard of Jennings when he was establishing his winning streak on “Jeopardy!” But post-Watson, I started following him on Twitter (@kenjennings) and have read two of his books, “Maphead” and “Brainiac.” I’ve found I really enjoy his writing style, both in his books and his short tweets. I guess I like his sense of humor. He’s even introduced me to a new hobby: geocaching.

Jennings writes about geocaching toward the end of “Maphead.” While I was aware of the pastime, and had even watched others find a cache or two, the geocaching bug never bit me, in part because I thought I’d have to get a costly standalone handheld GPS device. However, Jennings’ words motivated me, and I found I could simply get a free geocache app for my GPS-enabled phone. (I was pleasantly surprised to learn I didn’t have to pay for an app, though many “geocachers” do buy them.) Then I registered on geocaching.com and I was set.

In the book Jennings mentions some of the things his family discovered in their neighborhood as these cache hunts took them off their normal beaten path. Geocaching became a family thing for the McNellys as well. It didn’t take long to make our first discovery–or, for that matter, our first 10 discoveries. Yes, you’ll get some strange looks scanning the bushes along a walking trail, but it’s a great way to fill some downtime. Just go for a drive and look for some caches. Some of these hides are ingenious, and the containers themselves can surprise you–finds come in everything from old film canisters to ammo boxes to peanut butter jars. And yes, you may even get a gag container with coiled stuffed snakes that leap out at you.

Pretty much wherever I go, my geocache app informs me of nearby caches. While I didn’t have time to search while I was in Miami for October’s Technical University conference, I’ll definitely do some geocaching the next time I find myself in a new city. You can start by using your GPS to get turn-by-turn directions in your car. Once you’re within walking distance of your target, you simply switch to compass mode. Of course GPS accuracy can vary with the terrain, so technology doesn’t do everything for you. You still need to look around, or even look up: Some of my finds have been cleverly concealed in tree branches or hollowed-out tree trunks, among other hard-to-notice places.

It’s funny how my initial interest in Watson led me to Jennings, who encouraged me to spend time outside using my smartphone to play hide and seek. I guess life is full of odd little journeys though. Think about how you came to work with computers, or AIX specifically, or even how you found this blog. Wherever you’ve ended up in life, you likely have some good stories about how you got there.

As John Hughes wrote for his character Ferris Bueller, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”

The Command Line Isn’t for Everyone

Edit: This topic still comes up with new users.

Originally posted November 15, 2011 on AIXchange

As much as I rely on the VIO server, I understand that the command-line interface takes some getting used to for those who are new to it. This is especially true for anyone coming from a non-UNIX background (e.g., IBM i). Although IBM uses similar syntax (verb noun) between some AIX and IBM i commands, usage can be quite different.

Recently when I was showing some IBM i users how to map disks in VIOS, one asked why I just didn’t use the GUI. It turned out that he’d been shown a powerful tool that I wasn’t aware of.

Have you ever checked the “configuration/virtual resources” option on your HMC menu?

When you select the Virtual Storage Management option under Virtual Resources, you get a screen where you can choose the VIOS you want to work with.

Once you do that, select “Query VIOS” to view the storage details from your VIOS.

The Optical Devices tab provides views of your physical DVD drive (if attached to that VIOS) and the *.iso images in your media repository (if any). Other tabs have additional information, but in most cases I’m primarily interested in physical volumes, as I typically map whole LUNs to client LPARs. 

The Physical Volumes tab lists the disks that your VIOS can manage.

By clicking the appropriate radio button to choose a disk and then selecting modify assignment, you can choose the partition you want to assign your disk to. Using this method, the mkvdev command runs under the covers, eliminating the need to use the command line.
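
For reference, the command the GUI is effectively running for you is mkvdev; a hypothetical mapping of a whole LUN to a client’s vhost adapter would look something like this from the padmin command line (hdisk5, vhost0 and client1_rootvg are placeholders for whatever lsdev and lsmap show on your own VIOS):

mkvdev -vdev hdisk5 -vadapter vhost0 -dev client1_rootvg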

This interface can also tell you which disks are assigned to which partition, and how large the partitions are.

Although many of us AIX pros are comfortable with the command line and would never consider using a GUI, there are instances where it’s helpful–particularly if your IT department has admins who cut their teeth on something other than UNIX. You never know when you may need to teach someone how to map their disks. They just might catch on better with a GUI as opposed to a command line.

Proud to be a Champion

Edit: The first time I was named as an IBM Champion. Some links no longer work. There is even a video.

Originally posted November 8, 2011 on AIXchange

I may be late to the party, but I’ll still take a moment to toot my own horn. As Doug Rock and Steve Will note, I was recently recognized, along with 13 others, as an IBM Champion, and as an IBM Power Champion in particular.

“The IBM Champion program recognizes innovative thought leaders in the technical community—and rewards these contributors by amplifying their voice and increasing their sphere of influence. An IBM Champion is an IT professional, business leader, developer, or educator who influences and mentors others to help them make best use of IBM solutions and services. IBM Champions are not employees of IBM.”

I believe that this blog was a big part of the reason I was nominated, so I want to thank all of the readers for helping to make this possible. Between this blog and Twitter (@robmcnelly), hopefully I’ve been providing information that you’ve been able to use over the years.

I’ve been an IBM customer since 1988, when I started working as an AS/400 computer operator. Things were much simpler back then, but the systems that I managed were built to last and they seldom had problems. In that regard, nothing much has changed.

I was always impressed with our local customer engineer (CE). He’d come onsite, check how things were going, and proactively run diagnostics and check error logs on the machines. I can remember asking the CE about how he got started with IBM. Even then I admired the company.

Any time we called IBM Support, our problems were handled quickly. Even with the recent switch to call-back mode, I still believe I’m getting the timely support I’ve come to expect over the years.

I worked for a few different companies during my time on the AS/400. Later when I went back to school while continuing to work full-time, IBM recruiters came to campus. When they looked at my resume and saw that I had years of experience working with IBM products, they led me to the appropriate hiring manager who helped bring me on board with IBM.

I was an IBMer for six years in Boulder, Colo. I’ve been with my current employer for four years. They’re an IBM Premier Business Partner. So for my entire career, I’ve been either an IBM customer, an IBM employee or an IBM business partner.

“Apple fanboy” is a moniker that’s sometimes given to those who love Apple products. Along those lines, I guess I’m a “Power fanboy.” I love the platform and the operating systems that run on it. I love the virtualization capabilities, the performance and the reliability. And, as readers of this blog surely know by now, I love telling others about Power Systems servers. I’ve been reading the articles and following the tweets of other Power Champions for some time, which makes me all the more proud to be included in this group and recognized for my efforts.

If you know of someone worthy of recognition as an IBM Champion, please respond in Comments. I’d be happy to be involved with nominating others for this distinction.

Note: On another personal note, the end of Daylight Savings Time here in the U.S. this past weekend stirred up some old feelings. Check out my previous AIXchange blog entry on the topic.

What to Say About What You Do

Edit: Some links no longer work.

Originally posted November 1, 2011 on AIXchange

This recent Anthony English post got me thinking. When someone asked him what he did, he wasn’t sure how to respond. How do you answer that question? Can you explain what you do in a nice 30-second elevator pitch?

Luckily for me, Watson recently made a repeat appearance on “Jeopardy!” Watson has become the basis for my “pitch”: I simply tell people I work on the same servers that they saw on TV with Alex Trebek.

Of course, I’ll never be able to count on “my” servers appearing on television with any regularity, so I still need other ways to explain to others how I make my living. Or do I? As I’ve noted previously, telling people you work on computers can have some unwanted consequences. In their minds my admission is their opening to ask me to come over and resurrect their old, possibly virus-ridden machines. (“Can’t you just add a hard drive or memory or something?”) I generally counter those requests by explaining that I work on large enterprise servers running enterprise operating systems–in other words, small machines aren’t my specialty. Of course to a lot of folks, a computer is a computer.

(As an aside, why is it considered bad form to ask your doctor and lawyer acquaintances for free legal or medical advice, but few think twice about asking computer nerds they know for free help? Maybe they figure we have nothing better to do. Maybe we need to start quoting our hourly rates.)

So what do you say when you’re asked what you do? Do you talk about a typical day at work (one where machines haven’t blown up)? I’ve heard a system administrator liken his job to a plumber’s: nobody notices or needs one until something stinks. (That’s figuratively, one can only hope, in the admin’s case.)

If you’re reading this blog, I figure you’re involved with provisioning machines. You may not have built them, but you do care for them day to day. Interacting with your machines as much as you do, over time you may even get close to them. However, eventually you must put them out of their misery and upgrade to newer gear. (Another aside: It’s actually funny how quickly this cycle can run. You just installed the fastest, shiniest new hardware, but in a few short years you’re yearning for that new, state-of-the-art box.)

Ultimately, the way I explain what I do depends entirely on my audience and their frame of reference. To those in the industry, it’s simple: I start by saying I sell and install IBM’s Power hardware line and specialize in AIX. But to non-industry folks, I can’t have their eyes glazing over from my tales of patching, upgrading, installing, cabling, provisioning and deploying machines. Not to mention backups, restores, clones, LUNs, mirrors, migrations, copies, archives, scripts, cron jobs and the like. Maybe it’s enough for them to know that my specialty is enterprise IBM hardware. Or maybe I just say I work with computers.

What do you do?

Customized Comfort

Edit: All links still work at the time of this writing.

Originally posted October 25, 2011 on AIXchange

Do you have a nice customized shell and environment? Do you have a wonderful prompt that displays your current working directory and username? Does it change your terminal window name when you login?

Do you have aliases set up so that things like oem_setup_env or ls -la are as easy as typing “oe” and “ll”? Is your PATH variable set so that you don’t have to explicitly enter the path to the command you’re trying to run? Are your favorite settings and commands–like “set -o vi,” “stty erase ^?” or rm asking you if you really want to delete that file–all ready to run when you login?
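
As a rough sketch (the aliases and prompt here are only examples of the idea, not a recommendation for your environment), the kind of .profile additions I’m describing might look like this:

# hypothetical ~/.profile additions
alias oe='oem_setup_env'
alias ll='ls -la'
alias rm='rm -i'
set -o vi
stty erase '^?'
export PS1="$(whoami)@$(hostname):"'$PWD $ '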

Like anyone, I love having the same prompts, scripts and tools available across all of the LPARs I manage. This gives me the same familiar look and feel each time I login to my machines. I’m sure you can relate. When you work for a company you can set things up the way you want–at least as far as your own user ID is concerned. Of course, customizing your team’s root login usually involves some level of compromise when multiple admins are involved. How do you decide which customizations will run with the root ID?

As a consultant, much of my work these days involves others’ machines. The various sites I travel to are all customized to others’ specifications. I cannot just login and change things to fit my preferences. In fact, when I do new OS installs, I generally don’t have anything other than defaults to work with anyway. I always figure I’ll need to login and quickly run “set -o vi.” That’s usually the minimum of what I need to get by (although “stty erase ^?” is a close second on many systems). I just need to be sure to periodically run “uname -a” and pwd so that I know which system I’m on and which filesystem I’m in.

This is my job of course, but these experiences are nowhere as nice as working my own machines and having the prompt setup to give me the information I need in the manner that I like no matter where I am. However, facing the unfamiliar isn’t all bad. I’ve picked up good ideas from the customers I visit, and sometimes they borrow some of my customization preferences.

I assume most of you work in one environment. How do you customize it? Engage me in a thought experiment: What if your favorite tools and scripts weren’t always available to you? What are the first things you like to add to a new system? Are you so set in your ways that you’d freak out working in a bare-bones new install environment?

Using backupios

Edit: Some links no longer work.

Originally posted October 18, 2011 on AIXchange

In a recent AIXchange blog entry I discussed using the viosbr command to back up VIO server settings. Now I’ll tell you about backupios. Both commands should be used in your VIOS environment.

While viosbr allows you to restore mappings, backupios is used to restore the whole VIOS operating system. So think of backupios as your VIOS’s mksysb:

“[backupios] creates an installable image of the root volume group, either onto a bootable tape, file system or DVD.

“The backupios command creates a backup of the Virtual I/O server and places it onto a file system, bootable tape or DVD. You can use this backup to reinstall a system to its original state after it has been corrupted. If you create the backup on tape, the tape is bootable and includes the installation programs needed to install from the backup.

“If the -cd flag is specified, the backupios command creates a system backup image to DVD-RAM media. If you need to create multi-volume discs because the image does not fit on one disc, the backupios command gives instructions for disk replacement and removal until all the volumes have been created.

“(Note: Vendor disc drives may support burning to additional disc types, such as CD-RW and DVD-R. Refer to the documentation for your drive to determine which disc types are supported.)

“If the -file flag is specified, the backupios command creates a system backup image to the path specified. The file system must be mounted and writable by the Virtual I/O Server root user prior to running the backupios command (see mount command for details). Backing up the Virtual I/O Server to a remote file system will create the nim_resources.tar image in the directory you specify. The Virtual I/O Server must have root write access to the server on which the backup will be created. This backup can be reinstalled from the HMC using the installios command.

“The backupios command empties the target_disks_stanza section of bosinst.data (which is part of the nim_resources.tar image) and sets RECOVER_DEVICES=Default. This allows the mksysb file generated by the command to be cloned to another logical partition. If you plan to use the nim_resources.tar image to install to a specific disk, then you need to repopulate the target_disks_stanza section of bosinst.data and replace this file in the nim_resources.tar image. All other parts of the nim_resources.tar image must remain unchanged.”

When I take backups, I typically think in terms of having access to a NIM server in my environment, so I’m just interested in the VIOS mksysb. I like to run:

backupios -file vio.mksysb -mksysb -nomedialib

Using the -nomedialib flag means I exclude the media library, so I’m not backing up all of those .iso images that hang around in my VIOS’s /var/vio/VMLibrary filesystem. Of course, it’s pointless to waste that space on a bunch of CD images (.iso files), since they’re generally simple to recreate if need be. (Of course there are exceptions, so by all means back up any images that are NOT easily recreated.)

Again, be sure to back up your VIOS environment with both viosbr and backupios. Together, they give you the tools you need should something go wrong.

IBM Updates AIX, POWER7 Lineup

Edit: Have you migrated off POWER7 yet?

Originally posted October 11, 2011 on AIXchange

I install POWER7 systems at customer sites all around the country. Once customers get their hands on these new systems, I find that people are wowed by the hardware speed. Especially impressed are those customers who upgrade from machines a generation or two back, like POWER5 machines.

This week IBM is announcing some changes to AIX and the POWER7 lineup. Although the entry servers will still be known as the 710, 720, 730 and 740 and the enterprise servers will still be called the 770 and 780, they will all have new model and machine type numbers. This is intended to help customers differentiate the new servers from the old, though it’s important to understand that these machines are not POWER7+. General availability is set for Oct. 21.

Here are the new numbers:

Model      Machine Type

  • 710 8231-E1C
  • 720 8202-E4C
  • 730 8231-E2C
  • 740 8205-E6C
  • 770 9117-MMC
  • 780 9179-MHC

All of the POWER7-enhanced systems end with the letter C, while of course the current models end in B, so it’s easy to determine which system type you have.

Another change made in the interest of clarity is that the 710 and 730 no longer share the same machine type and model. Also note that the 740 is no longer available as a tower — it’s exclusively rack-mounted now.

Enhanced I/O Capabilities and Higher Memory Densities 

The biggest changes in the hardware revolve around the enhanced I/O capabilities and the increased memory densities across the servers. The servers all benefit from PCIe Gen2, which, according to the announcement details that I saw, provides “twice the I/O bandwidth which will enable higher performance, greater efficiencies and more flexibility.” Keep in mind that if you’re not driving your Gen1 PCIe adapters to the point where they become your bottleneck, simply switching to Gen2 won’t magically give you better performance. However, you will get better utilization of the hardware going forward with Gen2.

PCIe Gen2 provides for more I/O ports available per adapter. You’ll now see dual port 10G Ethernet cards and 4-port 8G fibre adapters. You’ll be able to push SAS data out at 6G per second vs. the current generation’s 3G per second. The new 5913 Large Cache SAS adapter has 1.8 GB cache and can drive up to 72 HDD or 26 SSD, or you can mix and match the drive types with this adapter. A huge improvement with this card is that it no longer has batteries, so you won’t have to worry about replacing them. If it loses power the card will use a capacitor and write to flash memory. Note that this card won’t be available before Oct. 31.

Gen2 allows you to more fully virtualize your systems by pushing more I/O with fewer adapters. With the new Gen2 adapters, you’ll benefit whether it’s fibre, SAS, networking or infiniband. Moving forward, we can stop thinking about PCI-X and concentrate solely on PCIe.

These new systems have more PCIe I/O slots in the CEC, with greater functionality per slot. The familiar IVE/HEA adapter is replaced with a standard 2-port 1 Gb Ethernet card (on the entry systems) and an integrated multifunction card (on the enterprise machines). The latter consists of a 4-port card with two 10 Gb Ethernet ports and two 1 Gb Ethernet ports, plus USB ports and a serial port.

There were four card slots in the entry level CEC; now the entry systems have five slots that can be populated, while the enterprise machines have six slots per CEC. Considering the optional half height cards that can be added to the 720 and 740, you can have up to 10 total cards by counting the standard Ethernet card that comes with the system (though you can’t use another card in place of the Ethernet card in that slot).

This announcement also includes new DIMM sizes: 64 GB in the enterprise server space and 16 GB in the entry systems. This allows the new “C” models to have greater maximum memory: 128 GB on the 710, 256 GB on the 720 and 730, and 512 GB on the 740. The new 770 and 780 models can have up to 4 TB of memory in the 4 node system, 1 TB per CEC.

If you need even more cores, a 96-core large capacity 780 server is available. Imagine pairing up 96 cores and 4 TB of memory on your 780. In addition, a clock speed tweak brings the 770 to 3.3 and 3.7 GHz, depending on whether you choose six or eight cores per socket. The 780 can max out at 3.92 GHz.

Finally, watch for larger capacity 15K SFF SAS drives and a 1 TB RDX removable disk drive. The latter is positioned as an intriguing alternative to tape.

As you’d expect, customers can continue to upgrade to the latest technology from existing systems, including POWER6 570s and 520s.

PowerVM and AIX Updates

Besides hardware improvements, changes are coming with PowerVM and AIX. Active memory mirroring is a feature where the hypervisor keeps two copies of its memory contents at the same time, with both copies being updated simultaneously. In the (rare) event of a hypervisor memory failure affecting the primary copy, the second copy will be invoked with notification sent to IBM. This capability was previously available on the 795, but now with the new machines it comes standard on the 780 and as an option on the 770.

With AIX 7 TL1 expect to see a new feature called active system optimizer, which is designed to autonomically improve workload performance (AIX 7 on POWER7 only). A new network option you can set is called tcp_fastlo, which enables TCP fast loopback. This reduces TCP/IP overhead and lowers CPU utilization if two TCP endpoints are on the same frame (e.g., communication between two processes in the same LPAR).
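
As a sketch of how you would try that network option once you’re on the appropriate level, it’s set with the no command (verify the tunable and its default on your own system first; if your level treats it as a restricted tunable, no will warn you and ask for confirmation):

no -p -o tcp_fastlo=1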

In addition, AIX features JFS2 filesystem enhancements that allow admins to tune performance by altering filesystem caching. This can be accomplished without having to unmount filesystems. Compared to earlier AIX releases, there’s a 50 percent reduction in JFS2 memory usage for metadata.

Other software enhancements include:

  • A new logical volume manager option to retry failed I/O operations indefinitely. This capability can aid in recovery from transient failure of SANs, for instance.
  • AIX 5.3 WPARs, which follow on the current AIX 5.2 WPAR offering. This allows you to run 5.3 workloads inside of AIX 7 into the future (i.e., even after IBM eventually ends its support of AIX 5.3). AIX 5.3 TL12 SP4 is required to make use of the 5.3 WPARs.

With the new C models, these versions of AIX and VIOS are supported:

AIX 5.3 TL12 SP5
AIX 6.1 TL5 SP7
AIX 6.1 TL6 SP6
AIX 7.1 TL0 SP4
AIX 7.1 TL1
VIOS 2.2.1

  • A new offering called PowerSC provides automated tools for security and compliance standards on PowerVM virtual machines. Using trusted logging, you can capture and compile AIX audit information from LPARs in real-time. (Did someone make a dynamic change to an AIX LPAR?) Trusted boot cryptographically signs and validates boot images before they’re started, while trusted network connect verifies that a boot image that’s trying to connect to the network is at the correct security patch and update level. Finally, prebuilt compliance profiles match industry standards like PCI, DOD and SOX.
  • Another new capability is active memory deduplication. It’s available on the new machines running the new firmware, and is used in conjunction with active memory sharing. Active memory deduplication allows systems containing duplicate memory pages to remove those duplicates while fitting similar workloads within any physical memory constraints.
  • PowerVM offers its own improvements. Live partition mobility operations can potentially run at twice the previous speed while performing up to eight LPM operations at once. Network balancing allows for load balancing across backup and primary shared Ethernet adapters. Shared storage pools are also enhanced. These PowerVM capabilities are available on the new VIO server. I’ll definitely write much more on this soon.
  • A new entry level analytics system, the 7710, is meant for customers that don’t need the full capacity of the existing 7700 offering. While coming in at about half the price of the 7700, the 7710 is a fully optimized and integrated solution that can be used in test and dev environments. It’s targeted for those with data warehouses under 10 TB.

There are also updates to PowerHA SystemMirror, including an SAP LiveCache Hot Standby solution, and PowerHA Federated Security, which provides for centralized administration via System Director, along with additional supported storage options to use with HA (including XIV, the V7000, the SVC and DS8800 and options from EMC, Hitachi and HP).

Finally, keep an eye out for coming changes to documentation, installation, configuration, management and packaging. Although some of these improvements aren’t quite ready, IBM’s intention is to make PowerVM quicker and easier to install and configure. Look forward to things like no-touch VIOS installation, GUI-based VIOS installs, VIOS setup and validation tools, and the capability to manage VIO servers as a pair rather than individually.

Backing Up VIOS

Edit: I still find customers that are not taking good backups. Some links no longer work.

Originally posted October 4, 2011 on AIXchange

Once you’ve set up your VIO server (VIOS), mapped the disks and configured everything, one question remains. How are you going to back up those settings? The answer is the viosbr command. I wrote about this back in January 2010, but I’m not sure how many people are using it. You’ll find much more about viosbr here.

From the website:

“[Viosbr] performs the operations for backing up the virtual and logical configuration, listing the configuration and restoring the configuration of the Virtual I/O Server. The viosbr command can be run only by the padmin user.

“This viosbr command backs up all the relevant data to recover a Virtual I/O Server after a new installation. The -backup parameter backs up all the device properties and the virtual devices configuration on the Virtual I/O Server. This includes information regarding logical devices, such as storage pools, file-backed storage pools, the virtual media repository and PowerVM Active Memory Sharing (AMS) paging devices. It also includes the virtual devices, such as Etherchannel, shared Ethernet adapters (SEAs), virtual server adapters and server virtual fibre channel (SVFC) adapters.

“Additionally, it includes the device attributes, such as the attributes for disks, optical devices, tape devices, fibre channel SCSI controllers, Ethernet adapters, Ethernet interfaces and logical Host Ethernet Adapters (HEAs). All the configuration information is saved in a compressed XML file. If a location is not specified with the -file option, the file is placed in the default location /home/padmin/cfgbackups if the user does not specify a full path for saving the file. This command can be run once or can be run in a stipulated period of time by using the -frequency parameter with the daily, weekly, or monthly option. Daily backups occur at 00:00, weekly backups on Sunday at 00:00, and monthly backups on the first day of the month at 00:01. The -numfile parameter specifies the number of successive backup files that will be saved, with a maximum value of 10. After reaching the given number of files, the oldest backup file is deleted during the next backup cycle. The format of the file name is .xx.tar.gz, where xx starts from 01.

“The viosbr command does not back up the parent devices of adapters or drivers, device drivers, virtual serial adapters, virtual terminal devices, kernel extensions, the Internet Network Extension (inet0), virtual I/O bus, processor, memory, or cache.

“The -view parameter displays the information of all the backed up entities in a formatted output. This parameter requires an input file in a compressed or noncompressed format that is generated with the -backup parameter. The -view parameter uses the option flags type and detail to display information in detail or to display minimal information for all the devices or for a subset of devices. The -mapping option flag provides lsmap-like output for Virtual Small Computer System Interface (VSCSI) server adapters, SEA, SVFC adapters and PowerVM Active Memory Sharing paging devices. The entities can be controllers, disks, optical devices, tape devices, network adapters, network interfaces, storage pools, repositories, Etherchannels, Shared Ethernet Adapters, VSCSI server adapters, SVFC adapters and paging devices. The -list option displays backup files from the default location /home/padmin/cfgbackups or from a user-specified location.”
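
To make that a little more concrete, here’s a rough sketch of how you might use viosbr from the padmin shell. The file names and the daily frequency are just examples; adjust them for your environment:

    # take a one-off backup of the virtual and logical configuration
    viosbr -backup -file vios1_cfg

    # or schedule a recurring backup, keeping the last seven files
    viosbr -backup -file vios1_cfg -frequency daily -numfiles 7

    # inspect a backup file, with lsmap-style output for the mappings
    viosbr -view -file /home/padmin/cfgbackups/vios1_cfg.tar.gz -mapping

    # list the backup files sitting in /home/padmin/cfgbackups
    viosbr -view -list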

Although viosbr is great for capturing mappings and the like, you must still run the backupios command if you plan on creating a mksysb of the VIOS root volume group.

And while you may already be backing up your client LPARs, remember that your VIOS needs backups, too.
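
As a minimal backupios sketch, assuming you’ve mounted an NFS export from your NIM server (or some other target) at /mnt to hold the image:

    # write a mksysb image of the VIOS rootvg to the NFS-mounted directory
    backupios -file /mnt/vios1.mksysb -mksysb

    # or create the full nim_resources.tar backup structure instead
    backupios -file /mnt/vios1_backup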

Logging Fibre Cards into a Switch

Edit: Some links no longer work.

Originally posted September 27, 2011 on AIXchange

I recently worked with a customer that was trying to figure out how to log their fibre cards into a switch before loading an OS onto the LPAR.

I immediately thought of this recent documentation. Although this information is intended for NPIV clients, it worked just fine for our standalone LPARs and physical fibre cards.

“If a vfc-client device is defined for an LPAR which is already running an operating system, then if/when the operating system opens the vfc-client device, the device will log in to the SAN. But in some cases it is desirable to force a vfc-client device to log in to the SAN before an operating system is installed.

“SSH to an HMC which is managing the LPAR. Use the vtmenu command on the HMC to open a virtual terminal session on the LPAR’s system console. On the HMC GUI, select the server on which the LPAR resides, then select the LPAR, and shut the LPAR down if it is running. Then, use Operations>Activate>Profile>Advanced … to open the Activate Logical Partition-Advanced window. In the window, select Boot mode: Open Firmware OK Prompt. In the LPAR’s system console window, you will see the LPAR start up and present the open firmware prompt (“0 >”).”

We followed these instructions, booted our LPAR and ended up at the 0 > prompt.
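
(If you’d rather skip clicking through the GUI, the same boot mode can be set from the HMC command line. This is just a sketch; the managed system, partition and profile names below are placeholders:)

    # activate the LPAR with the boot mode set to the open firmware OK prompt
    chsysstate -m MANAGED_SYSTEM -r lpar -o on -n LPAR_NAME -f PROFILE_NAME -b of

    # then open a console session to watch it come up
    vtmenu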

We then ran ioinfo from the Open Firmware OK Prompt:

    0 > ioinfo

I then saw:

    Select a tool from the following
    1. SCSIINFO
    2. IDEINFO
    3. SATAINFO
    4. SASINFO
    5. USBINFO
    6. FCINFO
    7. VSCSIINFO
    q – quit/exit
    ==> 6

I selected option 6 to run FCINFO:

    FCINFO Main Menu
    Select a FC Node from the following list:
      #  Location Code                Pathname
    —————————————————————
      1. U8233.E8B.0623B7P-V5-C21-T1    /vdevice/vfc-client@30000015
      2. U8233.E8B.0623B7P-V5-C22-T1    /vdevice/vfc-client@30000016
      q – Quit/Exit
    ==> 1

I then selected the correct Fibre Channel port. (I was using two physical two-port fibre adapters; the example above shows virtual fibre adapters.) Then I selected 1 to list the attached FC devices. It took a minute, and then it logged into the switch.

Once that was done, the SAN guys did their zoning magic and we were able to boot from NIM and install the OS on SAN LUNs.

This saved the SAN guys the aggravation of manually entering WWNs, and in one case it allowed us admins to discover that one of the fibre cables hadn’t been connected to the card. Once we attached that cable and reran the FCINFO command, it logged right in.
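
As a quick sanity check after the install, you can pull a physical adapter’s WWPN right from AIX and compare it against what the SAN team zoned. For example (assuming fcs0 is one of the ports in question):

    # show the adapter's VPD; the Network Address field is the WWPN
    lscfg -vpl fcs0 | grep "Network Address"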

I’ve booted LPARs from NIM servers in the past to do the same type of thing, but how about you? How do you like to set up your machines and get them logged into the SAN?

Note: A quick reminder about the upcoming IBM Power Systems Technical University conference in Miami. It starts on Oct. 10, so register soon if you plan on attending. And be sure to follow #ibmtechu on Twitter for more information.