Locating a Problematic Filesystem

Edit: Some links no longer work.

Originally posted September 9, 2014 on AIXchange

It was an ordinary day. I needed to take a mksysb. Only this time, I was getting an error.

            /usr/bin/mkszfile[1266]: FS_MIN_LOG = FS_MIN_LOG *

            20480 : 0403-009 The specified number is not valid for this command.

            0512-008 mksysb: The mkszfile command failed. Backup canceled.

I checked ps -ef | grep mkszfile and saw that it was still trying to run, but it wasn’t doing anything. I went ahead and killed the process.

The error message didn’t tell me much, but fortunately a quick web search yielded a few different ideas and suggestions, including this. Then I found an entry from this blog (that advertises “Unix tips, food reviews and astronomy”):

“A google search revealed it was probably a bad FS causing the problem. To identify which one(s), I ran the following: sh -x /usr/bin/mkszfile

“This gave the full output and I could see which file system it was processing when it crashed. I then unmounted the file system, [ran fsck] and remounted it before re-running the mkszfile.

“In this case there were four file systems it complained about and after [running fsck] them the mkszfile ran through ok. A re-run of the mksysb then worked ok.”

That seemed simple enough, so I gave it a try. It went exactly as described in the blog post. After running the command, the filesystem that was causing me issues was the last one that was processed before the error occurred. Luckily for me that filesystem wasn’t being used at the time, so I just unmounted it, ran fsck -y /filesystem and then remounted it. Then the mksysb worked as expected.

Now when the next person does a web search on this error code, there will be two sources confirming that running sh -x /usr/bin/mkszfile is the way to locate the filesystem that’s causing you problems.
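If the trace is long, picking out the last filesystem by eye can be tedious. Here is a minimal sketch of one way to do it, assuming the trace was saved to a file first and that mount points show up in it as whitespace-delimited words. The helper name last_fs_in_trace and the trace path are my own inventions, not part of mkszfile.

```shell
#!/bin/sh
# Sketch: report the last mount point mentioned in a saved "sh -x" trace.
# Assumes the trace was captured beforehand, for example:
#   sh -x /usr/bin/mkszfile > /tmp/mkszfile.trace 2>&1
# and that mount points appear in the trace as whole words.

# last_fs_in_trace TRACE_FILE MOUNTPOINT...
# Prints the mount point whose final appearance in the trace is latest.
last_fs_in_trace() {
    trace=$1
    shift
    best_line=0
    best_fs=
    for mp in "$@"; do
        # Line number of the last trace line containing this mount point.
        n=$(awk -v mp="$mp" '
            { for (i = 1; i <= NF; i++) if ($i == mp) last = NR }
            END { print last + 0 }' "$trace")
        if [ "$n" -gt "$best_line" ]; then
            best_line=$n
            best_fs=$mp
        fi
    done
    if [ -n "$best_fs" ]; then
        echo "$best_fs"
    fi
}
```

You would call it with the trace file followed by the mount points you care about, for instance the ones listed by df. Treat the answer as a hint about where to run fsck, not as proof.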

vSCSI vs. NPIV

Edit: Most people seem to do everything with NPIV these days.

Originally posted September 2, 2014 on AIXchange

The IBM Redbook, “PowerVM Best Practices,” has a detailed look at mixing vSCSI and NPIV on VIO client LPARs.

From Section 5.1.3:

“It is possible to mix a virtual Small Computer System Interface (SCSI) and N-Port ID Virtualization (NPIV) within the same virtual I/O client. You can have rootvg or booting devices that are mapped via virtual SCSI adapters, and data volumes that are mapped via NPIV.

“Mixing NPIV and a virtual SCSI has advantages and disadvantages, as shown in Table 5-1.

Advantages

* It makes multipathing software updates easier for data disks.

* You can use the Virtual I/O Server to perform problem determination when virtual I/O clients have booting issues.

Disadvantages

* Requires extra management at the Virtual I/O Server level.

* Live Partition Mobility (LPM) is easier with NPIV.”

What’s your preference? Do you want your SAN guys to provide all your LUNs via NPIV and manage the same multipath drivers on the client for both rootvg and datavg? Or would you rather manage your rootvg multipath drivers on your VIO server, map up the rootvg disks to the clients via vSCSI and use NPIV for your data LUNs?

I prefer to use vSCSI for rootvg. I want to boot my VIO server from my internal disks, map some LUNs to my VIO server to use for my client LPARs rootvg, and then map my data disks via NPIV to my client LPARs. This allows me to troubleshoot by booting my VIO servers locally, and boot my LPARs “locally” via vSCSI.

When I need to update multipath software on the client LPARs, I’m not dealing with a chicken-and-egg dilemma where I’m booting my machine using the same multipath software I now need to update.

When I need to update my client rootvg multipath software, I’m updating my VIO server, which also booted locally. At no time am I “changing the tire while the car is speeding down the road,” as might be necessary if I updated drivers when booting my client using NPIV.

Yes, doing it this way requires more effort compared to simply having your SAN team map everything to your clients. In the end though, I believe the benefits outweigh the burdens.

If you disagree, feel free to make your case for NPIV in comments. I’ll also accept input from anyone who wants to back me up on vSCSI.

Useful Storage Links

Edit: Some links no longer work.

Originally posted August 26, 2014 on AIXchange

Here’s an assortment of really good storage-related articles — the majority of which are found on IBM developerWorks — that are worth your time. While some of them are a few years old, they still provide relevant information.

“Guide to selecting a multipathing path control module for AIX or VIOS.”

“Using the AIX Logical Volume Manager to perform SAN storage migrations.”

“IBM AIX SAN Volume Controller update and migration.”

“IBM AIX MPIO: Best practices and considerations.”

“Tracing IBM AIX hdisks back to IBM System Storage SAN Volume Controller (SVC) volumes.”

“Shuffling disk data around.”

“AIX and VIOS Disk And Fibre Channel Adapter Queue Tuning.”

“Move data quickly between AIX LPARs using Logical Volume Manager.”

“Tip: Online migration of a file system to a smaller physical volume.”

If you know of other useful storage-related articles, please cite them in comments.

More Resources for AIX Newbies

Edit: I wonder how many newbies there are year over year.

Originally posted August 19, 2014 on AIXchange

As I’ve noted previously, there are more newcomers to the AIX platform than you might imagine. A company may acquire an AIX system through a merger or replace an old Solaris or HP-UX box with a current IBM Power Systems model. As a result, one of their IT pros suddenly becomes the AIX guy. So, now what? How does an AIX newbie get up to speed with virtualization and AIX?

 I’ve mentioned the QuickSheets and QuickStarts from William Favorite. I’ve also highlighted conferences, classes and free monthly user group meetings that you can look into. Recently though, I was pointed to this old IBM web page featuring various AIX learning resources. I call it old because some of the links no longer work, but what’s still available is surprisingly useful.

Some of the material covers concepts from AIX 5.3, but even much of this information remains valid today. It’s also nice that some of the links take you to current Redbook offerings and IBM training courses.

The working links cover:

* AIX security and migration (this is AIX 5.3 material)

* Virtualization introduction

* Systems Director

* Power Systems Redbooks (updated here)

* IT technical training

* IBM business partner training

* IBM professional certification

On a related note, I’ve always believed that the simplest thing employers can do to help their IT staff members get started with AIX or any operating system that’s new to them is to invest in a small lab/sandbox machine and HMC.

I’m continually amazed to see companies spend big bucks on the latest hardware and software, but then neglect to foot the bill for additional test systems. It’s great that some companies devote an LPAR or two to testing, but you can only do so much in that environment. (In addition, there can be pressure to repurpose virtual test labs into running other production workloads. Then before you know it, the production needs grow so critical that these LPARs are made off-limits to reboots and testing.)

With Windows and x86 Linux servers especially, it’s relatively easy and cheap to get access to test machines. I also know of people who’ve purchased old Power hardware on eBay just to have something that they can run AIX on.

With actual test boxes, you can safely reboot servers, install firmware and upgrade operating systems without touching production. If you make a mistake on a test system, not only haven’t you hurt anything, you’ve learned a valuable lesson.

How do you learn, and keep learning? How do you stay current with your skills? If your machine is happily running along and you have little need to touch it, how can you ever expect to be able to support the machine when an issue hits?

Connecting Your HMC to IBM Support

Edit: I would imagine some ports have changed in the last 6 years.

Originally posted August 12, 2014 on AIXchange

You’ve been asked to connect your HMC to IBM Support. The network team wants to know about the different connectivity options. They need to know which IP addresses must be opened across the firewall.

What do you do? First, read this:

 “This document describes data that is exchanged between the Hardware Management Console (HMC) and the IBM Service Delivery Center (SDC) and the methods and protocols for this exchange. This includes the configuration of Call Home (Electronic Service Agent) on the HMC for automatic hardware error reporting. All the functionality that is described herein refers to Power Systems HMC version V6.1.0 and later as well as the HMC used for the IBM Storage System DS8000.

“Outbound configurations are used to configure the HMC to connect back to IBM. The HMC uses the IBM Electronic Service Agent tool to connect to IBM for various situations including reporting problems, reporting inventory, transmitting error data, and retrieving system fixes. The types of data the HMC sends to IBM are covered in more detail in Section 4.”

Included are diagrams that show different scenarios for sending data to IBM, including with/without a proxy server, using a VPN, or even using a modem (though IBM does recommend Internet connectivity). Specific options include pass-through server connectivity, multi-hop VPN, and remote modem. IBM states that there are no inbound communications; all communications are outbound only.

Further, IBM explains why your machine may need to “call home”:

            * To report to IBM a problem with the HMC or one of the systems it’s managing.

            * To download fixes for systems managed by the HMC.

            * To report to IBM inventory and system configuration information.

            * To send extended error data for analysis by IBM.

            * To close an open problem.

            * To report heartbeat and status of monitored systems.

            * To send performance and utilization data for system I/O, network, memory, and processors.

There’s also a list of the files that are sent to IBM, and the authors point out that no client data is sent to IBM.

On that note, here’s IBM’s statement on data retention:

“When Electronic Service Agent on the HMC opens up a problem report for itself, or one the systems that it manages, that report will be called home to IBM. All the information in that report will be stored for up to 60 days after the problem has been closed. Problem data that is associated with that problem report will also be called home and stored. That information and any other associated packages will be stored for up to three days and then deleted automatically. Support Engineers that are actively working on a problem may offload the data for debugging purposes and then delete it when finished. Hardware inventory reports and other various performance and utilization data may be stored for many years.

“When the HMC sends data to IBM for a problem, the HMC will receive back a problem management hardware number. This number will be associated with the serviceable event that was opened. The HMC may also receive a filter table that is used to prevent duplicate problems from being reported over and over again.”

Finally, there’s this list of the IP addresses that need to be allowed across any firewalls. All connections use port 443 TCP:

            Americas

            • 129.42.160.48

            • 129.42.160.49

            • 207.25.252.200

            • 207.25.252.204

            Non-Americas

            • 129.42.160.48

            • 129.42.160.50

            • 207.25.252.200

            • 207.25.252.205
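To hand the network team something they can paste into a ticket or a probe script, the addresses above can be turned into host:port pairs. This is only a sketch: the addresses are copied from the 2014 document and may well be out of date, and ibm_endpoints is a hypothetical helper name of mine.

```shell
#!/bin/sh
# Sketch: print the service endpoints quoted above as host:port pairs.
# The addresses come from the 2014 IBM document and may no longer be current.

# ibm_endpoints americas|non-americas
ibm_endpoints() {
    case $1 in
        americas)
            ips="129.42.160.48 129.42.160.49 207.25.252.200 207.25.252.204" ;;
        non-americas)
            ips="129.42.160.48 129.42.160.50 207.25.252.200 207.25.252.205" ;;
        *)
            echo "usage: ibm_endpoints americas|non-americas" >&2
            return 1 ;;
    esac
    for ip in $ips; do
        # All outbound connections in the document use TCP port 443.
        echo "$ip:443"
    done
}
```

Each output line can then be fed to whatever port-checking tool your shop already trusts.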

IBM adds that when an inbound remote service connection to the HMC is active, only these ports are allowed through the firewall for TCP and UDP:

            * 22, 23, 2125, 2300 — These ports are used for access to the HMC.

            * 9090, 9735, 9940, 30000-30009 — These ports are used for Web-based System Manager (POWER5).

            * 443, 8443 — These ports are used for Web-based user interface (POWER6).

            * 80 — This port is used for code downloads.

Take a few moments to read this document. Or, even better, send it to your network team so they can read it for themselves.

On Going Dark

Edit: Some links no longer work. I do not unplug nearly as often as I should.

Originally posted August 5, 2014 on AIXchange

I have another quick story involving my work with Boy Scouts.

Each summer we try to get the older boys involved in some high adventure activities. Last year this included target shooting (shotguns and .22 caliber rifles), archery, spelunking, rappelling and hatchet throwing. I didn’t bring my laptop, but with my cellphone I could check in with the office and answer emails. Really, it was the best of both worlds. I was able to camp out, but at the same time I could help out the people I work with.

This summer’s adventures consisted of backpacking, canoeing and canyoneering. Everything went smoothly in our case, though we do know members of the troop who had to be rescued around the time we were out.

For me, the main difference between this year and last was that I didn’t have cellphone coverage during our recent trek into the Arizona mountains. Honestly, I’m not sure this was a bad thing.

Where we were, there was absolutely no cellular coverage of any kind (though just 20 miles down the mountain, the service was fine). Of course when you’re responsible for the well being of a bunch of kids, you’d prefer to have a means of instant communication should an emergency arise. The troop leaders were talking about satellite phones. Perhaps next year we’ll look at something like this or this.

However, just looking at it from a work perspective, what would you do? Would you be OK knowing that cell phone service was a 15-20 minute drive away, or do you need to be constantly in touch? I will admit that I like to know what’s going on, not only in my world but in general. I had no way of checking headlines or sports scores or emails. I was completely cut off.

And yet, I think I enjoyed it. 

It takes a while to truly unplug, and I might have gone through some withdrawal symptoms initially upon losing my access. Eventually though, I felt relieved. Since I knew that checking in wasn’t an option, I could focus on enjoying the trip. Since I couldn’t check messages, I didn’t feel guilty about not responding to them. The “out of office” auto message option exists for a reason, after all. I was finally, truly, away.

For another perspective on what it’s like to go a few days without having a working Internet in your hands, Jon Paris and Susan Gantner share this story about “going dark” during a cruise.

I guess there’s something to be said for being unplugged, especially if you’re out in nature. Even though I returned to tons of messages, when I got back I was recharged and ready to get back to work.

How about you? When you go on vacation, do you escape from technology?

Can We Talk? Yes, and it’s So Much Easier Now

Edit: It has only gotten easier to talk to people around the world.

Originally posted July 29, 2014 on AIXchange

A friend living overseas recently emailed me. He was having issues with an older HACMP cluster and wanted another set of eyeballs to check it. At the time I happened to be talking with a PowerHA guru, so I invited him to take a look as well.

Our small troubleshooting group reminded me of the people who work on their cars in their driveway. At least in my formative years, the sight of someone tinkering with a car would inevitably draw curious neighbors eager to see the mechanic do his thing. In this case, the attraction was an old HACMP cluster that — via a WebEx session — my guru friend and I could examine from several time zones away.

I’m still amazed at the relative ease with which it is now possible to communicate with anyone, anywhere. I have family members in South Africa. Years ago they actually sent a telegram to my door because they couldn’t reach me on the phone. (Not that transnational phone service was inherently unreliable in those days, but occasionally calls didn’t get through.) Surprised as I was to discover that telegrams still existed, it was the best alternative for delivering time-sensitive information at that time.

Awhile back, I sent them a magicJack VOIP system so they could have a local U.S. number. This means that any time I want I can pick up the phone and make what’s essentially a free phone call to the other side of the world.

Admittedly, VOIP technologies aren’t yet completely reliable. My friend with the HACMP cluster experienced issues with his VOIP solution. We tried IM, but weren’t satisfied waiting for each side to type out messages. Ultimately, he opted to call me on his cell phone. Of course that wasn’t free, but calling internationally is much cheaper than it was even a few years ago.

As for the HACMP issue, it was fairly straightforward. A change had been made in the environment. Someone added NFS to the cluster nodes, but not to the HACMP resource groups. The admin then decided to remove NFS, but didn’t remove it completely. As a result, the cluster was out of sync, and HACMP wouldn’t start at the next failover:

            ERROR: The nodes in resource group HA_RG are configured with more than one NFS domain. All nodes in a resource group must use the same NFS domain.

            Use the command ‘chnfsdom <domain name>’ to set the domain name.

With this error message pointing us in the right direction, the issue was quickly resolved.
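A mismatch like this is easy to catch ahead of time if you collect each node’s NFS domain and compare them. The sketch below assumes you have already gathered "node domain" pairs somehow (as I understand it, running chnfsdom with no argument on a node prints its current domain, but verify that on your level of AIX). The helper name check_nfs_domains is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that every cluster node reports the same NFS domain.
# Input on stdin: one "node domain" pair per line; a line holding only a
# node name is treated as "domain not set". How the pairs are collected is
# left to you.

# check_nfs_domains: returns 0 if all nodes agree, 1 (with a report) if not.
check_nfs_domains() {
    awk '
        NF >= 2 { seen[$2] = seen[$2] " " $1 }
        NF == 1 { seen["(unset)"] = seen["(unset)"] " " $1 }
        END {
            n = 0
            for (d in seen) n++
            if (n > 1) {
                # More than one distinct domain: list which nodes use which.
                for (d in seen) printf "domain %s:%s\n", d, seen[d]
                exit 1
            }
        }
    '
}
```

Run something like this before a planned failover test and a half-removed NFS configuration shows up as a node with a different or missing domain.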

We’re fortunate enough to work with some impressive technology, and that includes the older systems that continue to function effectively. But do you ever stop and really think about the amazing communication capabilities we have these days? Do you just take it for granted that these devices that fit in our pockets and purses allow us to interact in real time with people from around the world for a relatively low cost and with very little effort?

Lessons from a Physical Server Move

Edit: Monitoring goes a long way. Some links no longer work.

Originally posted July 22, 2014 on AIXchange

A customer planned to use Live Partition Mobility (LPM) to move running workloads from frame 2 to frame 1. The steps were: shut down frame 1, physically move frame 1, recable frame 1 and power it back on, then use LPM to bring the workload from frame 2 to frame 1, and, finally, repeat the process to physically move frame 2.

The task at hand was simple enough, but there was a problem. The physical server that was being moved had been up for 850 days. Do not make the mistake of moving a machine that’s been running continuously for more than two years without first logging in and checking on the server’s health. Furthermore, make sure you’ve set up alerting and monitoring of your servers.

I got a call after step one of the customer’s plan was complete and the damage had been done. Nonetheless, much can be learned from this episode.

Was errpt showing tons of unread errors? Yes. Had the error log been looked at? No. Had someone cleared the error log before support got involved with the issue? Yes. Was support still able to help? Yes. When you send a snapshot to IBM support, they can access the error log even if it’s been cleared from the command line, assuming those errors have not been overwritten in the actual error log file in the meantime.

Were there filesystems full? Yes. In this case one of the culprits was the /var/opt/tivoli/ep/runtime/nonstop/bin/cas_src.sh script, which wrote a file — /dev/null 2>&1 — that filled up the / filesystem.

To make matters worse, the machines are part of a shared storage pool, and after the physical move frame 1 would not rejoin the shared storage pool (SSP) cluster. This left only two of four VIO servers as part of the SSP.

It turned out that after the physical move, the network ports weren’t working. As a result, Multicast wasn’t working. At least getting Multicast back up was easy enough. However, the two VIO servers were still unable to join the cluster, and the third VIO server on frame 2 (vio3) had protected itself by placing rootvg in read-only mode as it logged physical disk errors. So from a 4-VIO server cluster, only one was actually functional, and that one had its own issues. If things weren’t fixed quickly, production would be impacted.

The problem with the one operable VIO server was that, because it had switched to read-only mode, SSP errors occurred whenever someone tried to start or stop any of the cluster nodes. In other words, it was keeping the cluster in a locked state:

            clstartstop -start -n clustername -m vio3
            cluster_utils.c get_cluster_lock 6096 Could not get lock: 2
            clmain.c cl_startstop 3030 Could not get clusterwide lock.

Fortunately, rebooting the third VIO server cleared up this issue. And with that, the other VIO servers came back into the SSP cluster. Ultimately, the customer was able to use LPM to move clients to frame 1, which had already been physically moved. This allowed the customer to then shut down frame 2 and physically move it as well.

So what have we learned? Check your error logs. Check your filesystems. Schedule the occasional reboots of your machines. Make sure you’re applying patches to your VIO servers and LPARs. Make sure you have good backups.
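The "check your filesystems" part is simple to automate. Here is a minimal sketch that reads POSIX-format df output and prints anything at or above a usage threshold. It assumes df -P is available and that mount points contain no spaces; the function name full_filesystems and the 90 percent figure are just examples.

```shell
#!/bin/sh
# Sketch: flag filesystems at or above a usage threshold.
# Reads "df -P" style output on stdin (header line, then six columns with
# Capacity in column 5 and the mount point in column 6).

# full_filesystems THRESHOLD_PERCENT
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= limit + 0) print $6, pct "%"
        }
    '
}

# Typical use from cron or before a maintenance window:
#   df -P | full_filesystems 90
```

Had something like this been running, the full / filesystem would have been flagged long before moving day.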

Finally, note that in this instance, having the capability to perform LPM operations really made a huge difference. Despite the severity of these problems, the users of these systems had no idea that anything had been going on at all.

System Monitoring Shouldn’t Be Neglected

Edit: You still find this phenomenon, and it still surprises me.

Originally posted July 15, 2014 on AIXchange

What are you doing to monitor your systems from both the hardware and OS levels? Are you using a commercial product? Are you using an open source product? Are you using hand-built scripts that run from cron? Are you using anything?

Have you logged into your HMC lately? Does anything other than green appear in the system status, attention LEDs or Serviceable Events sections of the display? Countless times I’ve seen machines where the HMC messages were being ignored. Is your HMC set up to contact IBM when your servers run into any issues?

When your machines have issues, are you deluged with alerts? One customer I know of had a script that monitored their machine and sent emails when errors were detected. During one event, the PowerHA system actually failed over because the node became unresponsive due to the volume of errors being generated and the way the script was written. This forced the customer to go into the mail queue and clean up a huge number of unsent messages. Then they had to go into the email client and clean up all of the messages they’d received. Finally, they had to schedule downtime to fail the application back to the node it was supposed to be running on.
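One way to avoid that kind of flood is to throttle alerts so that a given condition can only send mail once per interval. The following is a sketch under a few assumptions: date +%s works on your system, the stamp file lives somewhere writable, and should_alert is a hypothetical helper name. It is not how that customer’s script worked.

```shell
#!/bin/sh
# Sketch: rate-limit alerts with a timestamp file.
# Assumes "date +%s" (seconds since the epoch) is supported.

# should_alert STAMP_FILE MIN_INTERVAL_SECONDS
# Returns 0 and refreshes the stamp if no alert was recorded within the
# interval; returns 1 if an alert went out too recently.
should_alert() {
    stamp=$1
    interval=$2
    now=$(date +%s)
    last=0
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp")
    fi
    if [ $((now - last)) -ge "$interval" ]; then
        echo "$now" > "$stamp"
        return 0
    fi
    return 1
}

# Typical use inside a monitoring script (mail command left to the reader):
#   if should_alert /tmp/errpt_alert.stamp 3600; then
#       ... send one summary email ...
#   fi
```

With a check like this in front of the mail command, a burst of thousands of errors produces one message an hour instead of one message per error.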

I know of multiple customers that simply route error messages to a mail folder — and then never bother checking them. What’s the point of monitoring a system if you never analyze the information you collect?

How diligent are you about deactivating monitoring during periods of scheduled maintenance? In many organizations where a help desk monitors systems, cycles are wasted because techs are so often called to follow up on alerts and error messages triggered by scheduled events.

Of course there are other impacts that can result from neglecting systems. If internal disks are going bad, and you’re not monitoring and fixing them, eventually you will lose your VIOS rootvg (assuming that’s how you have it set up). And just as some customers will ignore the system monitoring messages they collect, other customers don’t take action on hardware events that are being logged. Having robust hardware that notifies you when it needs maintenance is only useful if you actually heed the notifications.

Deploying your OS and installing your application is relatively simple, but along with that we must make decisions and take actions to manage and maintain these systems during the operational production phase of service. Sure, everyone is busy, and some tools cost money — but try explaining that to someone who cares when production goes down.

On a totally unrelated topic, I want to acknowledge that AIXchange is having a birthday. Seven years ago this week — July 16, 2007 — the first article was posted on this blog. Many thanks to everyone who takes the time to read this blog, and special thanks to those who have suggested topics. I welcome your input, and it does make a difference.

Here’s to the next seven years.

Webinars Cover the World of AIX

Edit: Some links no longer work.

Originally posted July 8, 2014 on AIXchange

Hopefully you regularly listen to the AIX Virtual User Group webinars, either live or on replay. Recent sessions have been devoted to the POWER8 server announcements, Linux on Power and SR-IOV.

If you’re outside of the U.S., you should know that similar webinars are taking place worldwide. For instance, there’s the IBM Power Systems technical webinar series that originates from the U.K. This group’s next event, which is set for July 16, covers PowerKVM. Dr. Michael Perzl is the presenter, and as someone who’s already working with PowerKVM, I look forward to what he has to say.

Previously, this group presented “More tricks of the Power Masters,” which, as you might imagine, was an hour-long session consisting of tips and tricks for using IBM Power Systems hardware. Thirty-eight total replays of these sessions can be found here. Specifically, I recommend this video of several presentations by Gareth Coates. Gareth is an excellent speaker who’s always on the lookout for tips he can use in future sessions, and he mentioned that he is on the lookout for IBM i content as well. (He’ll be sure to give you credit for your help.)

As I’ve mentioned on numerous occasions, there’s little I love more than learning, finding and sharing AIX tips and tricks. With that in mind, please indulge me while I cite some specific information that’s available in the “Power Masters” videos:

* For starters, to force a refresh of the operating system level information on the HMC, run:

            lssyscfg -r lpar -m <managed system> --osrefresh

(In addition, Power Masters offers good info on performing HMC updates from the network, which I’ve also written about here and here.)

* To find out how many virtual processors are active on my system, use the kdb command (and use it carefully):

            echo vpm | kdb

* To protect AIX processes when AIX is out of memory, use:

            vmo -o nokilluid=X

* To test your RSCT connection, use:

            /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

Some other Power Masters topics:

* Using Live Partition Mobility checklists. (I wanted to point this out so I have a reason to add that FLRT now has LPM checks available.)

* viosbr (which I’ve also covered here).

Some of the other information presented was first used in a session that took place in 2013, called Power “Ask the Experts.” I covered that here.

Of course there’s much, much more on not just AIX but also IBM i topics, so check out the Power Masters videos on YouTube. And if you don’t already, be sure to tune into the AIX Virtual User Group and IBM Power Systems technical series webinars.

We’re Not the Only Techies

Edit: I still cannot land a plane.

Originally posted July 1, 2014 on AIXchange

As I’ve noted previously, I work with Boy Scouts. Recently I took a group of boys to an airport to work on their aviation merit badge.

We found a pilot who was willing and able to spend time on a Saturday with the troop. He invited the scouts to visit a maintenance and training facility and spend time on an airplane simulator.

Although he had interesting information to share, I quickly figured that, as a pilot, he hadn’t spent a lot of time creating PowerPoint presentations. Prior to taking the scouts to the hangar so they could learn how to conduct a pre-flight inspection of an aircraft, he showed them a presentation covering the merit badge requirements. At one point, he clicked on what he hoped was a link to a video, but it turned out he had inadvertently made a screen capture of the video rather than an actual link to it. (Not that this issue wasn’t easily addressed; he ended up going directly to YouTube and showing us things like this.)

But indeed, our pilot guide did admit that he hadn’t used PowerPoint in years. On top of that, during the presentation, the overhead projector had an issue. For those of us who spend our time in meetings and conference rooms, fixing projector issues is second nature. Once again though, he wasn’t immediately sure what to do.

All of us — even the scouts themselves — were pretty smug about our computer and projector knowledge at this point. Then we went into the next room and got into the simulator. Long story short:  I’m not cut out to land an airplane, or even to keep one riding smoothly through the air. So we all have our different skills. Frankly, as long as my pilots are experts at flying, I’ll excuse their shortcomings when it comes to using software programs and projectors.

Of course the scouts, most of whom have considerable experience with computer games, made me feel even more inept on the simulator. A lot of those kids had a pretty light touch on the airplane controls and managed a reasonably good landing on the first try.

As an AIX pro, I’m generally surrounded by others with similar professional backgrounds. Quite possibly, it’s the same for you. But we should all keep in mind that while most people need computers to do their jobs, they don’t live and breathe technology the way that many of us do.

Ultimately, my day at the airport reminded me that, even if most people don’t know computers like we do, we’re far from the only smart folks out there doing challenging, technical work. And thank goodness for all these people and their unique specialties, because you really wouldn’t want to see me at the controls of your plane.

More POWER8 Docs

Edit: Some links no longer work.

Originally posted June 24, 2014 on AIXchange

I love reading about new computing technologies, particularly the latest IBM Power Systems releases. It doesn’t hurt that, as a consultant, I have opportunities to work with the newest hardware, but even if that wasn’t the case, I’d still want to know everything about what’s coming out of IBM. I guess I’m like those folks who read automotive magazines, even though I don’t plan on buying a new Tesla anytime soon.

With this in mind, I’d like to point you to three new IBM documents — draft Redpapers — that cover the recently unveiled POWER8 models.  All three publications are scheduled to be finalized by the end of this month.

As you might expect, given that the models have many of the same features, there’s some overlap in the information presented. For instance, this is the table of contents for all three publications:

            Chapter 1. General description
            Chapter 2. Architecture and technical overview
            Chapter 3. Virtualization
            Chapter 4. Continuous availability and manageability

So if you read these Redpapers back to back, you might have a case of déjà vu. Nonetheless, I believe the information is well worth your time.

Let’s start with redp5097, which covers the 4U models, the S814 and the S824. As a reminder, the S in the model number stands for scale out, the 8 stands for POWER8, the 1 or 2 stands for the number of sockets, and the 4 stands for 4U.

Redp5098 covers the S812L and the S822L. Again, as a reminder, S for scale out, 8 for POWER8, 1 or 2 for the number of sockets, and 2 for 2U. L designates that these are Linux-only servers. I wrote about my experiences with the S822L here.

Finally, there’s redp5102, which covers the S822. For completeness, the S is scale out, the 8 is POWER8, the 2 is 2 socket and the 2 is 2U.
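
If the naming scheme is easier to remember as a rule than as a list, here is a minimal sketch that decodes a model name the way the paragraphs above describe it. The function name decode_model is made up for illustration, and the pattern only covers the scale-out models mentioned in this post.

```sh
# Hypothetical helper: decode a scale-out model name such as S814 or S822L.
decode_model() {
    model=$1
    # Expect S, three digits and an optional trailing L.
    case $model in
        S[0-9][0-9][0-9]|S[0-9][0-9][0-9]L) ;;
        *) echo "unrecognized model: $model" >&2; return 1 ;;
    esac
    digits=${model#S}      # drop the leading S, e.g. 822L
    digits=${digits%L}     # drop the optional L, e.g. 822
    gen=${digits%??}       # first digit: processor generation
    rest=${digits#?}       # remaining two digits
    sockets=${rest%?}      # second digit: number of sockets
    units=${rest#?}        # third digit: rack units
    desc="scale out, POWER${gen}, ${sockets} socket, ${units}U"
    case $model in
        *L) desc="$desc, Linux only" ;;
    esac
    echo "$desc"
}
```

Running decode_model S822L prints "scale out, POWER8, 2 socket, 2U, Linux only".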

At the bottom of the splash page for each publication there’s a link to a blog post that lists five things to know about the IBM POWER8 architecture. I suggest checking this out as well.

So what are your plans to run POWER8 in your shop?

Can Vendors Make Our Lives Easier?

Edit: This application is still handled this way and is still a pleasure to work with.

Originally posted June 19, 2014 on AIXchange

Recently I was listening to a few admins compare and contrast two different shops that run AIX. All of them had worked in the first environment for several years. In this environment — we’ll refer to it as Shop No. 1 — they were constantly fighting problems and fixing others’ mistakes. Bad management and bad change control were cited as the primary issues. Pages would come at all hours of the day. Periodically missing a child’s soccer game — or a full night’s sleep — was the norm. Their jobs were stressful, to say the least.

Eventually, these admins found new positions working in customer environments where a vendor dictates much of their production computing environment. The vendor has very strict requirements that must be met before applications are allowed to run in production. Customers must have adequate hardware to handle the anticipated workload and capacity. Customer hardware must hit very specific IOPS numbers. The vendor requires access to customer systems that host the vendor software, and customers must agree to run vendor-specific monitoring tools on the systems. There are very strict requirements around change control, which means changes aren’t made on the systems without approvals. It’d take a catastrophe — a very rare and unusual event — for an admin to ever get called out of bed to go to work. Understandably, this group of admins was happier, professionally and personally, working with this vendor’s software. We could call this environment Shop No. 2.

Now for the discussion itself. The argument was made that if Shop No. 1 went with this vendor product or a similar solution that led to the same strict requirements being enforced, it would cease to be such a difficult place to work. But is it really possible that vendor requirements can so profoundly impact their customers’ working environments?

From the vendor perspective, being hands-on makes sense. If I can get my customers to agree to run my software on systems that have the capacity and functionality to handle it, if I can get them to properly manage and monitor their systems, my software — if it’s any kind of quality product — should work well. And I shouldn’t have to deal with irate customers blaming me when they try to run my software on an under-capacity system or when their own in-house programming introduces bugs. It’s in my interest to support customers — and only those customers — who agree to my requirements, because I can be confident there won’t be issues. Why wouldn’t I want to do that?

On the flip side, if I’m an admin, why wouldn’t I want to make sure my systems are capable of running the software I’m using? Why wouldn’t I want proper change control processes to be instituted? Doesn’t this seem like a win-win?

When considering vendors and the vendor solutions we deploy on our hardware, besides asking if a given software package will do the job, maybe we should ask how it will be supported. Because I can easily imagine a world where very specific instructions and very specific software support lead to very stable work environments.

How about you? Does your vendor have significant say-so over your environment, or is vendor input largely limited to implementation and support? How involved do you want your vendor to be?

Getting Started With PowerKVM

Edit: This is no longer a thing, although the commands and tools are still useful.

Originally posted June 10, 2014 on AIXchange

I recently installed and started playing with IBM PowerKVM software on the S822L. Luckily I had a good PowerKVM quick start guide to follow. There’s also a draft version of a Redbook that will help you with your installation. It covers how to netboot and includes examples of the menus you’ll encounter.

I started by connecting my laptop Ethernet port to the HMC1 port on the S822L. This port is using the default address of 169.254.2.147, while the HMC2 port is using the default of 169.254.3.147. In my case I set my laptop to 169.254.2.140 and logged into ASMI as I’m used to doing with PowerVM.

Once I was in I was prompted to change my ASMI password. Then I went to the system configuration/hypervisor configuration menu item in ASMI. There I was presented with a choice of using PowerVM or PowerKVM. I selected PowerKVM and entered an IPMI session password.

To get a console or to power the system on and off, you need ipmitool. With a web search you can find methods using Cygwin to get it running on Windows, or you could look at ipmiutil. Since I was using Linux, I just made sure I had ipmitool installed, and I was set to go.

In ASMI I went to System information > Real-time progress indicator so I could see the LEDs from the front of the display without actually being in front of the machine. I verified I was at the 01 N V=N prompt, and then ran:

            ipmitool -I lanplus -H x.x.x.x -P password power on

Once I did this, I saw my LED codes change as the system powered on.

To get a console I ran:

            ipmitool -I lanplus -H x.x.x.x -P password sol activate

I soon discovered that I could kill my console session by running:

            ipmitool -I lanplus -H x.x.x.x -P password sol deactivate

Finally, to power off I’d run:

            ipmitool -I lanplus -H x.x.x.x -P password power off
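
Since the four commands differ only in their last two words, a small wrapper can cut down on retyping (and on typos). This is only a sketch: the function name bmc and the variables BMC_HOST, BMC_PASS and DRY_RUN are my own placeholders, not anything PowerKVM or ipmitool provides.

```sh
# Hypothetical wrapper around the ipmitool calls shown above.
# BMC_HOST and BMC_PASS are placeholders; set them for your own system.
BMC_HOST=${BMC_HOST:-x.x.x.x}
BMC_PASS=${BMC_PASS:-password}

bmc() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print the command so it can be reviewed first.
        echo ipmitool -I lanplus -H "$BMC_HOST" -P "$BMC_PASS" "$@"
    else
        ipmitool -I lanplus -H "$BMC_HOST" -P "$BMC_PASS" "$@"
    fi
}
```

With this in place, bmc power on, bmc sol activate, bmc sol deactivate and bmc power off map to the four commands above. Setting DRY_RUN=1 first shows what would be run without touching the machine.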

I made sure the PowerKVM DVD (that I burned from an .iso image) was in the drive, the machine was powered up and my console was open. Eventually, petitboot came up. Due to a bad DVD I didn’t initially get what I expected, but once I had a .iso image that was in good shape, I was able to select the PowerKVM LiveCD option. The screen displayed:

            POWERKVM_LIVECD

            System information

            System configuration

            Exit to shell

The install wizard (which reminded me quite a bit of an old-school Red Hat install) prompted me for a root password, time zone, which disks to use, etc. Once that was completed it installed PowerKVM and the system rebooted. Then petitboot came back up. I selected my freshly installed system and I was able to boot to a root prompt.

During the install I specified a network address for my network card and made sure it was on the network. To get Kimchi to work I needed to get into my SOL console, edit /etc/sysconfig/selinux to set SELinux to permissive mode and then reboot the server.

I verified that kimchid was allowed in the firewall by running:

            firewall-cmd --list-services

I ran systemctl to see the state of Kimchi:

            kimchid.service             loaded active running   Kimchi server

Once in a while I’d see it wasn’t running. In those instances I’d run:

            systemctl stop kimchid.service

            systemctl start kimchid.service

At this point I was able to connect to Kimchi by going to https://ip.address.of.powerkvm.host:8001.

This gave me a graphical interface to simplify creating guests. I’ve gotten different versions of Red Hat, SUSE, Ubuntu and Debian to all run successfully.

I copied my .iso files to /var/lib/libvirt/images and then set up templates in Kimchi. It was pretty self-explanatory. I’d click on the green + button, pick the local .iso image option and select create templates from selected .iso. At this point I could edit the number of CPUs, the amount of memory and disk, choose networking options, etc. Then I started my guest machine. I clicked on the live tile to get a console, and configured the machine as I would any new install.

By logging into my PowerKVM instance and running top, I can see all of the copies of my operating systems running as the qemu user. By running virsh commands I can get information about my machines as well as stop/start them, suspend them, etc. For example, this virsh command gave me information about the disks I was using for a machine called redhat7-1:

            virsh qemu-monitor-command --hmp redhat7-1 info block

            drive-virtio-disk0: /var/lib/libvirt/images/9f82cbd6-d345-4591-aa68-748f2c7b2b4e-0.img (raw)

            drive-scsi0-0-0-2: /var/lib/libvirt/images/rhel-server-7.0-ppc64-dvd.iso (raw, read-only)

                Removable device: locked, tray closed

Another nice way to run the system is to simply enter virsh and run interactively. To see all the machines you can control, enter list. You can also get a console to a virtual machine — in my case I entered:

console redhat7-1

This is definitely an interesting new way to access a Power Systems machine. As I continue to work with PowerKVM, I’ll post more tips and tricks.

Yes, We Do Have AIX Forums

Edit: Still good options, especially if more people joined in.

Originally posted June 3, 2014 on AIXchange

Recently I attended a conference and ran into someone who was engaged in a futile search for another attendee. This acquaintance had been scanning the crowd for days with no luck.

After complaining that the names on attendee badges, both first and last, should have been printed in a bigger font so they could be read from a distance, we bandied about some potential solutions.

Twitter? Perhaps this elusive attendee would notice a tweet. What about adding this person as a friend on Facebook, or sending a message on LinkedIn? Someone else in our group suggested contacting the conference organizers to see if they could get a mobile number for this person, but that option seemed pretty shady. I know I wouldn’t want conference organizers to hand out my personal contact info willy nilly.

Ultimately, the best suggestion was pretty old school. Just go to the front desk and ask if this individual is staying at the hotel, and if so, leave an old-fashioned paper message. Obviously that wasn’t a high-tech solution, especially coming from a bunch of AIX pros, but it did, in fact, work.

I believe there’s a lesson in this anecdote. How often do you immediately turn to some sexy, leading-edge solution to accomplish some task or address some problem, when a simple, tried and true method will do the job more quickly and/or effectively?

For instance, I constantly hear people say we need more online AIX forums. But we already have online forums. So why not use the resources we currently have, like IRC and the AIX mailing list?

These resources remain valuable. Try them, if you haven’t already. You’ll find good advice from admins who are online and willing to help.

I’m all for progress; I certainly don’t miss my alphanumeric pager. But sometimes, those tried and true, old-school methods are still the way to go.

Random NIM Notes

Edit: Still good stuff.

Originally posted May 27, 2014 on AIXchange

Recently, a customer wanted to remove a client from its original NIM master server and add it to a new, just-installed NIM master server.

The comments made to this post reminded me of the niminit command:

A quick recap on how to register and initialize clients to the master.

On the master, make sure the clients are allowed to register themselves. This is the default from master installation.

Make sure the clients are in /etc/hosts.

There are also other things to make sure that comms can happen, like the /etc/hosts.equiv file…

If you can ssh to and from the master and client, then generally this is fine.

Make sure the client does not have the master software/filesets installed:

lslpp -vl | grep -i nim

check for “bos.sysmgt.nim.master 6.1.6.0 COMMITTED Network Install Manager” and if it exists, remove it with:

installp -ug bos.sysmgt.nim.master

On the client do this:

(if the client was registered previously and the niminfo file exists)

rm /etc/niminfo

niminit -a name= -a master= -a connect=nimsh (this will build the niminfo file and register the client)

nimclient -p (permit the master to do work on the client)

nimclient -C (disable cryptographic thingy)

nimclient -l (to establish a task was successfully done by getting nim master info as output)

Now you can work on the client.

Once we removed the /etc/niminfo file and ran niminit, we got up and running on the new NIM master server.

On another NIM topic, Waldemar Mark Duszyk wrote about removing NIM clients from NIM master servers.

I have not done any patching for a while and today, when I had to remove a nim client definition, I could not remember the second command to use. Now I do, so here is the process for the record:

First, reset the client

# nim -F -o reset NIM_CLIENT_NAME

Now, remove all resources associated with the client.

# nim -o deallocate -a subclass=all  NIM_CLIENT_NAME

At this stage the client can be removed.

# nim -o remove -F  NIM_CLIENT_NAME

the NIM_CLIENT_NAME is the hostname of the client to be removed.
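
Because the order of those three commands matters and the last one is destructive, I find it helps to print them for review before running anything. The sketch below only echoes the commands quoted above for a given client; the function name nim_remove_client_cmds is hypothetical and it does not call nim itself.

```sh
# Hypothetical dry-run helper: print the reset, deallocate and remove
# commands for one NIM client, in the order given in the quoted post.
nim_remove_client_cmds() {
    client=$1
    if [ -z "$client" ]; then
        echo "usage: nim_remove_client_cmds NIM_CLIENT_NAME" >&2
        return 1
    fi
    echo "nim -F -o reset $client"
    echo "nim -o deallocate -a subclass=all $client"
    echo "nim -o remove -F $client"
}
```

Once the output looks right, the printed lines can be pasted into a root shell on the NIM master one at a time.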

Feel free to share your own useful NIM tips and tricks in the comments.

Managing a Dump Device

Edit: Don’t overlook this.

Originally posted May 20, 2014 on AIXchange

Have you ever seen errors like this in your error log?

E87EF1BE   0515150014 P O dumpcheck      The largest dump device is too small.
E87EF1BE   0514150014 P O dumpcheck      The largest dump device is too small.

Have you verified that your system is capable of storing a system dump? If not, this technote on managing a dump device could help:

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure. Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

This document applies to AIX versions 5, 6, and 7.

There are different sections to this technote, including:

Managing system dump devices
Determining proper size for dump device
Setting a tape drive as a dump device
Do not dump to a mirrored logical volume
Dumping outside the rootvg
Remote dumps over to a network
How to create a dedicated dump device
Related documentation

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices (a primary and secondary). To view information about the current dump devices, enter:

sysdumpdev -l

Example:

# sysdumpdev -l

primary             /dev/lg_dumplv

secondary            /dev/sysdumpnull

copy directory      /var/adm/ras

forced copy flag    FALSE

always allow dump    TRUE

dump compression     ON

type of dump         traditional

The document also provides information about the primary and secondary dump devices, along with different flags that you can set to manage your dump devices:

If the primary dump device is the primary paging device, the only way it can copy the dump to the filesystem save area is if there is enough free space in that filesystem. The free space in the filesystem can be determined with the df command. If the free space in that filesystem is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of that filesystem to have enough free space, remove files in that filesystem until enough free space is available, or move the save area to another filesystem with the required space. The latter can be accomplished with the sysdumpdev command. This filesystem must be in the rootvg volume group.

It is not recommended that a standalone dump logical volume be mirrored. It is much better practice to have a primary and a secondary dump device, each wholly contained on separate hdisks, rather than mirroring these devices. If for some reason the primary dump device is inaccessible the dump program will then attempt to dump to the secondary device.

So how do you fix the error I listed at the start of this post? Read the whole technote for more information, but the short answer is: estimate how much space you need for your dump by running sysdumpdev -e, then divide that estimated size by your physical partition (PP) size to determine how many PPs your dump device should have:

Note: This value will be what the CURRENT running machine would require. This value can change based on the activity of the machine. It is best to run this command when the machine is under its heaviest work load.

This will return a value in bytes. The primary dump device should be a size that is greater than the value returned. If the dump device is a standard dump logical volume, such as lg_dumplv, then use the command extendlv to increase its size. If it is the primary paging space hd6, use the command chps.
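
To make the arithmetic concrete, here is a small sketch of the division. It assumes you already have the estimate in bytes from sysdumpdev -e and the PP size in megabytes (lsvg rootvg reports it); the function name dump_pps_needed is made up for this example.

```sh
# Hypothetical helper: number of PPs needed so the dump device is
# strictly larger than the estimated dump size.
# $1 = estimated dump size in bytes, $2 = PP size in megabytes
dump_pps_needed() {
    dump_bytes=$1
    pp_mb=$2
    pp_bytes=$((pp_mb * 1024 * 1024))
    # Integer division rounds down, so add one PP to end up above the estimate.
    echo $((dump_bytes / pp_bytes + 1))
}
```

For example, an estimate of 1500000000 bytes with 128 MB PPs gives dump_pps_needed 1500000000 128, which prints 12. Since the estimate grows with workload, treating that number as a floor rather than a target seems prudent.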

Believe me, you don’t want to wait for a catastrophic operating system software failure to discover that your dump devices are too small.

Limiting Concurrent Logins

Edit: Some links no longer work.

Originally posted May 13, 2014 on AIXchange

Recently, an AIX mailing list member was asking about limiting concurrent logins to a machine:

            Is there a setting in AIX that will allow the number of times a user can login? The scenario is thus:

            John logs in, uses one app license.
            John logs in from a second terminal and uses a second license.
            John logs in again from a third terminal and uses a third license.
            And so on.

            We have multiple users doing this and we are maxing out our application licenses, looking for a way to stop it. The logins are ssh based via a proprietary application. Their .profile funnels them into a limited menu where they can only do certain things.

Discussion followed and some good suggestions emerged. Since this issue is something that may come up for some of you, I thought I’d lay out the solution here. First, a mailing list commenter recommended this blog post:

How to restrict users to a single login to a system at any one given time.

Question: The ability to set up user accounts so that users can be logged onto a system only once at any given time (no concurrent access) does not exist on AIX.

Cause: This is not a defect, but just the way AIX is designed. Sun Solaris has the same limitation.

Answer: The answer is to create a method that will check to see if a user is already logged in. Here is an example of one possible solution:

The blog author then described creating a script and making a few modifications to the system to enable the desired behavior.

Another commenter recommended the setting, chuser maxulogs=2 user_name, noting that this limits users to two login sessions. He adds that details can be found by doing a man page lookup on chuser.

I prefer this latter option for its simplicity, but either solution is worth considering if you’re faced with this dilemma.
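
For anyone curious what the script-based approach might look like, here is a rough sketch of the core check. It is not the blog author's script. It simply counts how many lines of who output belong to a user, and the function name count_sessions is my own.

```sh
# Hypothetical helper: count login sessions for one user.
# Reads who-style output on stdin, where the login name is the first column.
count_sessions() {
    user=$1
    awk -v u="$user" '$1 == u { n++ } END { print n + 0 }'
}
```

You would feed it live data with who | count_sessions john. If a check like this runs from the user's .profile, keep in mind that the session doing the check is already in the who output, so a limit of one session means refusing the login when the count is greater than 1.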

How many of you saw the original exchange on the mailing list? Do any of you have another way to address this problem?

Using the HMC with IBM Flex System Nodes

Edit: Cannot remember the last time I thought about Flex System.

Originally posted May 6, 2014 on AIXchange

The short-lived SDMC wasn’t widely adopted, in part because customers who were familiar with the HMC didn’t want to relearn the SDMC interface. While the IBM Flex System Manager (FSM) isn’t going anywhere, it does pose a similar issue, at least for one of my customers that wanted to manage its standalone servers and flex nodes from the same HMC.

If you didn’t know, you can do this. However, there are caveats. IBMer James Nash has a great set of slides with information about the solution.

He begins by pointing to this announcement on new IBM Flex System functionality:

The IBM Flex System now supports the IBM Power Hardware Management Console (HMC) and the IBM Integrated Virtualization Manager (IVM). The IBM Power Rack-mounted HMC brings the same, full function PowerVM management available on POWER7 and POWER7+ rack servers to the POWER7 and POWER7+ processor-based Flex System compute nodes. IVM is an easy-to-use, browser-based tool providing access to the entry set of PowerVM functions.

The HMC function is available with HMC Version 7 Release 7.7.0 Service Pack 2. The IVM function is available with VIOS Version 2.2.2.3. Both solutions require Power Compute node firmware Version 7.7.3.

Note: The use of the IBM Power Hardware Management Console (HMC) and the IBM Integrated Virtualization Manager (IVM) are only supported as part of a Flex System configuration with Power compute nodes and cannot be used in the same configuration with IBM Flex System Management node.

James also references this technote, which explains that “flex nodes can be managed by HMC, IVM or FSM. Only one manager type can be active; dual management by FSM and HMC is not supported.”

The technote lists the supported combinations:

HMC level                 01AF773_033        01AF773_051
Version 7 Release 7.7.0   Service Pack 2     Service Pack 3
    Note: HMC does not support TLS 1.2 enablement on this firmware level.
Version 7 Release 7.8.0   Not supported.*    Service Pack 0 (approved Jan/2014)
Version 7 Release 7.9.0   Not supported.     Not supported.

* Note: All partitions (virtual servers) on the node must be powered off prior to converting from FSM to HMC 7.7.8 or later or upgrading an existing HMC from 7.7.7 to 7.7.8 or later.

Keep in mind that if you use the HMC for this function, you miss out on some FSM features, namely:

* Auto discovery of resources

* Physical and virtual management

* Network and storage management

* Alerts, health status, call home

* Firmware management

* Remote console

James’s slides include a nice chart that lays out the respective features and functions of the HMC and FSM. For instance, both allow you to power on/off nodes and create and activate LPARs, but only the FSM allows you to manage flex chassis components, control IBM storage and the IBM network, and run VMControl.

In addition, there are some good screen shots of how to “un-manage” your devices and discover your nodes from your HMC:

“Moving from HMC managed Power Flex nodes to FSM managed Power Flex nodes is only supported if you shut down the Power Flex nodes and set to factory defaults. This will result in loss of all LPAR/virtual Server configurations. If you plan to go from FSM to HMC then back to FSM, make a backup of the FSM before you move to the HMC. This will save configuration information from the point in time that you created the backup so any changes you make on the HMC will not be reflected when you ‘go back’ to using the FSM to manage your Power Flex nodes.”

The final slide presents some frequently asked questions:

            * Can I use FSM and HMC simultaneously (same chassis)? No.

            * When using HMC, can FSM manage the x86 servers in the same chassis? No.

            * Can the FSM manage non-Flex Power servers? No.

            * Does this mean the FSM is going away? No.

            * Can I have dual HMC management? Yes.

            * Can I use IBM Systems Director? Yes.

            * Can I use PowerVC? Yes.

            * Can some nodes use IVM and others HMC? Yes.

You may be perfectly happy running the FSM, but if not, understand that this alternative is available.

More POWER8 Announcement Details

Edit: Some links no longer work.

Originally posted April 29, 2014 on AIXchange

By now you’re probably aware of IBM’s announcement of POWER8 and new hardware models. Last week I examined the speeds and feeds. Here’s a look at some of the other coverage:

            * ExtremeTech.com: IBM unveils Power8 and OpenPower pincer attack on Intel’s x86 server monopoly

            * New York Times: IBM Opens Chip Architecture, in Strategy of Sharing and Self-Interest

            * Forbes: IBM Spends $2.4 Billion On New Power Servers And Partners With Google And Nvidia To Go After Intel

            * DataCenterKnowledge.com: IBM Unveils New POWER8 Systems, Built for Big Data

            * InformationWeek.com: IBM Unveils Power8 Chip As Open Hardware

            * CrucialCIO.com: IBM Unveils Power 8 For Cloud, Big Data Crunching

            * ITJungle

Additional resources:

            * OpenPOWER Foundation

            * OpenPOWER roadmap

            * Video

            * Redbooks

Of course there’s much more online (just search on POWER8 and OpenPOWER), and I’d encourage you to use Twitter search as well.

Naturally, I’m looking forward to getting my hands on the machines, but I’m especially excited to work on the machines with PowerKVM. You can be sure I’ll write more about this in the near future.

I was excited about POWER4 (which came out back in 2001), and was just as enthusiastic about POWER5, POWER6 and POWER7. I’m sure I’ll feel the same about the next big thing after POWER8. Trying out new hardware and new capabilities never gets old.

POWER8 Speeds and Feeds

Edit: Links still work.

Originally posted April 23, 2014 on AIXchange

POWER8 technology created some buzz when it was first discussed at the Hot Chips conference, and slides describing the chips could be found online before today. But now we have more information about the actual systems that will be shipping when they become generally available in June.

I recently attended an education session for IBMers and business partners that covered information around POWER8 and the new IBM hardware announcements that were made today. I hit some of the technical highlights in my article “IBM Delivers With POWER8.” I had planned for this content to be posted in my blog, but because of some technical issues, it became an article instead. I will also write future blog posts on the topic.

Device Mapping on IBM i

Edit: Link still works.

Originally posted April 15, 2014 on AIXchange

More and more IBM i administrators who once relied exclusively on internal disks are now getting their disks from VIOS using vSCSI or NPIV. These administrators need a handy way to determine which SAN LUNs correspond to which individual virtual disks in IBM i. If your SAN administrator wants to know which LUN is DD001 on the IBM i client, how do you respond?

In disparate computing environments, it’s a question that’s bound to come up. Maybe the SAN admin is making some changes and wants to take back a LUN, and the IBM i admin needs to be able to determine which LUN is allocated to which virtual disk in the IBM i environment. Maybe the administrator temporarily used an internal hdisk and is now migrating to a SAN LUN. Whatever the reason, there are certainly times when you want to know which backing device is being used in an IBM i environment.

Luckily, this document is available to help with the cross-referencing.

Problem (Abstract)

This document describes how to cross-reference (device mapping) IBM i disk units with VIOS disk units.

Resolving the problem

At some point, you may need to know the physical location of the virtual disk units. This information might be needed for performance measurements or service, or you might need to be able to translate to and from IBM i to VIOS to the physical devices (including an external disk, if relevant). This document describes how to map the i drives to the VIOS drives and then to the physical disks.

These scenarios are presented in this document:

1) Check device mapping from the HMC — With HMC version 7.3.4 SP2 or later you can see the device mapping from the HMC if the server is HMC Managed.

2) Find the device mapping in a single adapter client partition — Another way to do this mapping is by calculating the LUN from the Controller number. Back on the Display Disk Unit Details screen, we know that DD001 is on Controller 3. You should convert the controller number from decimal to hex and then add 0x80. In this case, Decimal 3 = 0x03 + 0x80 = 0x83. Looking at the lsmap, we can see the LUN 0x083 is hdisk9.

Mapping the i drive to the correct hdisk becomes important if you ever want to remove disk units from the partition. After you have removed the disk unit from the ASP, you must be able to determine which hdisk to remove from the partition profile.

3) Multiple adapter client partition — If you need to identify disks, you will need to know which virtual adapter (vhost) has the correct mapping. In this case, we need to look at the SYS Card identifier. This number identifies the client adapter ID in the partition you are working with. In a multiple adapter environment, the controller number is not unique. Therefore, you also need to determine which adapter you are looking at. To do this, you need to use the Sys Card Number from the Display Disk Unit details above. You may use either HMC or IVM to view the partition properties. Partition ID 4 has two client adapters. They connect to server adapters 401 and 403, respectively. View the server partition properties/virtual adapters tab to see the information.

4) VIOS command line interface on VIOS 2.1.2.X and IBM i R6.1.1 or later — If this is an IVM environment with multiple SCSI adapters you can use the following command to map the disks.

Use this command to get the system name:
lssyscfg -r sys -F name

Use the system name as input into the following command:
lshwres -m <system name> -r virtualio --rsubtype scsi --level lpar -F topology

5) The system card number explained — In the multiple adapter example above, the disk unit details showed that the Sys Card field was 145 for 16 of the drives, and 147 for the remaining 15. Why is the sys card field a different number than the HMC or IVM adapter ID? The Sys Card field wraps at 256. So if you add 145 and 256, you get 401. Likewise, if you add 147 and 256, you get 403. We also need to keep in mind that it is possible to have more than one Sys Card of the same number because customers can configure the adapter ID. For example, looking at an IBM i partition disk unit details there are 62 disk units in various ASPs. There are two Sys Card 33s listed in the details.

Looking at the partition properties, you see the following:

Type          Adapter ID   Connecting Partition   Connecting Adapter
Server SCSI   801          LPAR1                  801
Server SCSI   33           LPAR4                  33

U9117.MMA.1007E34-V1-C801 = vhost1
U9117.MMA.1007E34-V1-C33 = vhost2

256 + 256 + 256 + 33 = 801
33 = 33
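
The two calculations in the technote (controller number to LUN, and Sys Card number to adapter ID) are easy to get wrong by hand, so here is a minimal sketch of both. The function names are hypothetical. The only rules used are the ones quoted above: add 0x80 to the controller number, and treat the Sys Card field as the adapter ID wrapped at 256.

```sh
# Hypothetical helpers for the mapping arithmetic described in the technote.

# Controller number (decimal) to the LUN value to look for in lsmap output.
controller_to_lun() {
    # 128 is 0x80 in decimal.
    printf '0x%02x\n' $(($1 + 128))
}

# Adapter ID as configured on the HMC or IVM to the Sys Card value IBM i shows.
sys_card_from_adapter_id() {
    echo $(($1 % 256))
}

# Sys Card value to every adapter ID it could stand for, up to a maximum
# (second argument, default 1024).
adapter_id_candidates() {
    id=$1
    max=${2:-1024}
    while [ "$id" -le "$max" ]; do
        echo "$id"
        id=$((id + 256))
    done
}
```

With the numbers from the technote, controller_to_lun 3 prints 0x83, sys_card_from_adapter_id 401 prints 145, and adapter_id_candidates 33 lists 33, 289, 545 and 801, which is why two different adapters can both show up as Sys Card 33.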

Have you had to map LUNs back to IBM i disks? What methods did you use?

Good Things to Bookmark

Edit: Some links no longer work.

Originally posted April 8, 2014 on AIXchange

If you’ve never heard of the POWER Systems Reference, you’re in for a treat. This site catalogs a host of basic informational and support resources.

For instance, the Quick Ref tab provides (among other things):

Default FSP Addresses

-p5

-FSP A HMC1  192.168.2.147

-FSP A HMC2  192.168.3.147

-FSP B HMC1  192.168.2.146

-FSP B HMC2  192.168.3.146

-p6/7

-FSP A HMC1  169.254.2.147

-FSP A HMC2  169.254.3.147

-FSP B HMC1  169.254.2.146

-FSP B HMC2  169.254.3.146

 Phone Numbers

888-426-4357  IBM HELP

800-426-2255  IBM Direct

800-426-4968  IBM 4 YOU

800-225-5249  CALL AIX

800-300-8751  Rochester Quality Hot Line

800-426-5463  Poughkeepsie Quality Hot Line

(800-IBM-LINE)

877-603-2145  Mechanicsburg Parts

800-678-4727 Opt. 1 Parts Administration

888-426-4357 Opt. 4,1,2 RETAIN

Public Links

Assist On Site – AOS

CoD Activation Codes

Code – IBM Support Portal

Code Compatibility Matrix

Google

Info Center

Sales Manual

The HMC Menu tab features sample HMC menus for either systems or frames, along with examples of scenarios covering HMC connections. There’s also an ASMI menu table, which, as you can imagine, displays ASMI menus (that can vary based on your firmware level).

The POWER tabs (POWER8, POWER7, POWER6, POWER5, POWER4) cover the different models of hardware including links to adapter placement, sales manuals and IBM Redbooks.

The information here is so valuable, I can’t help but wonder why I never thought of doing something similar myself. I guess I should just be grateful that I don’t have to.

While I’m at it, the QuickSheets and QuickStart guides are also worth bookmarking.

Feel free to cite your favorite online resources in comments.

A tmux Primer

Edit: It has been longer than 10 years now. Some links no longer work.

Originally posted April 1, 2014 on AIXchange

I wrote this piece about screen and vnc for IBM Systems Magazine 10 years ago, and I still refer to it, because I still use screen and vnc. However, I must admit that tmux is giving screen a run for its money these days.

A ton of good tmux tutorials are available, but I’ll start with this one:

“tmux is useful to people in different ways. To me, it’s most useful as a way to maintain persistent working states on remote servers—allowing you to detach and re-attach at will… you can use tmux to have multiple panes within multiple windows within multiple tabs within multiple sessions.”

Indeed, tmux does provide another way to look at detachable sessions. But why should a loyal screen user (like me) go learn something new?

“tmux is a lot like screen, only better. The short answer for how it’s better is that tmux is better designed to perform the same functions. Screen gets you there (kind of) but does so precariously.

“Here are a few of the key advantages of tmux over screen:

• Screen is a largely dead project, and its code has significant issues

• Tmux is an active project with an active codebase

• Tmux is built to be truly client/server; screen emulates this behavior

• Tmux supports both emacs and vim shortcuts

• Tmux supports auto-renaming windows

• Tmux is highly scriptable

• Window splitting is more advanced in tmux

Enough about that. Use tmux.”

As I said, there are several other good tutorials on tmux. I also recommend this two-parter (here and here).

Perzl.org is the place to get tmux for AIX. Or start here if you’re unsure of how to get all the dependencies from perzl.org.

Once you master tmux, you won’t look back — and again, this is coming from a long-time screen user. Lately I’ve even been thinking that it would be nice if we could use tmux to reconnect to persistent sessions on the HMC.

Are you already using tmux? How about other persistent tools? I’m always looking for new things. Are there tools you’d recommend to me?

Resolving an Issue with Dual HMCs

Edit: Change control is still key.

Originally posted March 25, 2014 on AIXchange

Recently, a customer was unable to run a DLPAR command against some of the LPARs on their frame. That in itself isn’t unusual. Generally in these situations the network isn’t communicating between the HMC and the LPAR, or perhaps RMC daemons need to be restarted somewhere.

This environment had dual HMCs connected to the managed system. HMC1 controlled some of the LPARs and HMC2 controlled others, but not by design. Although there was no rhyme or reason to it, for simplification let’s say that HMC1 was controlling LPAR1 and LPAR3 and HMC2 was controlling LPAR2 and LPAR4. The correct setup would have been HMC1 and HMC2 controlling LPAR1, LPAR2, LPAR3 and LPAR4. In reality approximately 40 LPARs were on the frame, with each HMC controlling approximately half of the LPARs.

If you were on HMC1, you could DLPAR LPAR1 and LPAR3 with no issues. If you were on HMC2, you could DLPAR LPAR2 and LPAR4 with no issues. The problem was that the only way to know which HMC was controlling which LPAR was to either log in to the HMC command line and run lspartition -dlpar, or use the HMC GUI and select HMC Management > View Network Topology. Without checking first, there was no way to know which HMC you needed to log in to in order to manage a given LPAR. This headache needed to be resolved.

Initially we did some troubleshooting with IBM Support. That resulted in us running things like:

            /usr/sbin/rsct/install/bin/recfgct
            /usr/sbin/rsct/bin/rmcctrl -p

We tried getting root access via pedbg. We also tried collecting a snap:

            /usr/sbin/rsct/bin/ctsnap

Eventually, once we escalated high enough up the support food chain, someone noticed a very basic HMC setup problem:

            The LPAR IBM.MgmtDomainRM default file shows this msg where it’s attempting to create a IBM.MCP entry for hmc1. It fails with Error number 14, duplicate key for localhost.

            2610-652 The specified time limit has been exceeded.
            Mon Feb 17 13:04:32 CST 2014(439849)      ../../../../../src/rsct/rm/MgmtDomainRM/MCP_cfg.c/01438/1.22  2613-034 Error number 14 was returned when attempting to define an IBM.MCP resource.
            2610-014 The key token localhost is a duplicate.

During the initial build of the HMCs, they had been given their unique hostname and IP address, but somehow someone made a change that resulted in both hostnames being reset to localhost. Since these HMCs had the same hostname and ran on the same network, only one of the two was capable of managing LPARs at any given time. The other one would always fail.

Needless to say, if you run multiple HMCs, make sure they have unique hostnames. And in any environment, it’s essential to establish good change control so people aren’t making changes to systems without proper approvals and documentation.

Installing Language Filesets

Edit: Future me is still glad these archives are here.

Originally posted March 18, 2014 on AIXchange

If you’ve ever installed locale files on your AIX server, you might appreciate my recent predicament. A customer’s application team recently asked me to install bos.loc.utf.EN_US as part of their overall installation. The file exists on the installation media, but not in .bff format. So how do you install it?

Web searching was of little help (perhaps I didn’t enter the most accurate search terms). In any event, my searches turned up links like this, this and this, none of which were what I was really looking for. I just wanted to learn how to install the filesets.

Through blind luck and poking around, ultimately I went into smitty > system environments > manage language environment > add additional language environments.

For both the cultural convention to install and the language translation to install I selected:

            UTF-8    English (United States) [EN-US]

I told it where to find the files, and it worked.

I cut and pasted my smitty screen below. Hopefully this will help you visualize what I’m talking about:

                        Add Additional Language Environments

            Type or select values in entry fields.

            Press Enter AFTER making all desired changes.

                                                                          [Entry Fields]

            CULTURAL convention to install                 UTF-8             English (U> +

            LANGUAGE translation to install                 UTF-8             English (U> +

            * INPUT device/directory for software         [/dev/cd0]                   +

            EXTEND file systems if space needed?        yes                               +

            WPAR Management

              Perform Operation in Global Environment     yes                             +

              Perform Operation on Detached WPARs       no                              +

                Detached WPAR Names                            [_all_wpars]                +

              Remount Installation Device in WPARs      yes                               +

              Alternate WPAR Installation Device            []

The process to install this fileset was easier than I’d expected (as using smitty usually is), though less intuitive than I’d hoped (you have to know to go into the language environments in the first place).

Incidentally, one reason I’m writing about this experience is simply to have it documented somewhere. I actually will do web searches when I’m working on an issue, and find the answer in a link to something I’d written years earlier. In fact it happens fairly often. So as much as I enjoy sharing with readers, I admit I have another, slightly selfish motivation for writing these posts. Sometimes they help Future Me solve problems. Hopefully Future Me will appreciate the time I took to write about this particular issue.

Upgrading the HMC

Edit: Links still work at the time of this writing.

Originally posted March 11, 2014 on AIXchange

A while back Chris Gibson (@cgibbo) retweeted this document on upgrading to the newest HMC model.

So why would you upgrade when your current HMC is working fine? For the same reasons you upgrade anything else — for starters, faster CPU and better performance. The latest version of the HMC typically runs much faster than whatever you had before. Of course the other, more critical factor is that, over time, older HMC models are no longer supported. In any event, upgrading will happen sooner or later.

As the document notes, upgrading on the HMC isn’t simply a matter of backing up one HMC and restoring it to your new one. You’re basically clearing out your HMC and connecting a new console:

“This document describes the procedure for replacing an existing HMC. This procedure should be followed when ‘upgrading’ an existing HMC to a newer model/type. It applies to the situation where the new HMC manages the same server and may have the same IP address and host name as the HMC it replaces.

            “If this procedure is not followed, some of the errors which may be encountered include the following:

            o Open Serviceable Events which cannot be closed
            o Platform Event Logs which are not reported or called home through Serviceable Events
            o RMC communication problems between the HMC and partitions.”

I’ll preserve the meat of the information here in case the IBM link goes away in the future:

            Before removing the old HMC:

            1) Close all serviceable events.

              a. Verify that all serviceable events reported against a managed server have been reported to IBM and repaired.
              b. Close the serviceable events.

            2) Permanently remove all server connections.

              a. Record all current connections:

                  1. Access the restricted shell
                  Local HMC: Click HMC Management, Open Restricted Shell Terminal.
                  Remote: ssh to the HMC.

                  2. Run lssysconn -r all
                  3. Save the output.

              b. Expand Systems Management, Servers.

              c. For each server, remove the connection:

                    1. Select the server.
                    2. Expand Connections, select Reset or Remove Connections, select Remove Connection, click OK.

              d. For each frame, remove the connection.

                    1. Access restricted shell
                    2. Run lssysconn -r all | grep type=frame
                    3. Record the IP address of the frame connections listed in the output. Example below: 

172.16.250.255 and 172.16.255.251

    lssysconn -r all | grep type=frame
    resource_type=frame,type_model_serial_num=9458-100*9920250,side=b,ipaddr=172.16.250.255,alt_ipaddr=unavailable,state=Connected
    resource_type=frame,type_model_serial_num=9458-100*9920250,side=a,ipaddr=172.16.255.246,alt_ipaddr=unavailable,state=Connected

                    4. For each listed IP address, issue the command: rmsysconn --ip IP-address -o remove

                    Example: rmsysconn --ip 172.16.250.255 -o remove

            3) Ensure all connections have been removed.
              a. Access restricted shell
              b. Run lssysconn -r all
              c. Verify that there are no connections listed as shown in the example below:

              lssysconn -r all
              No results were found.

            4) Remove the HMC.
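
If there are many frames, the recording step (2d) can be scripted against the saved lssysconn output. The following is a rough Python sketch, assuming the output is one record per line made of comma-separated key=value pairs, as in the example above. The function names are hypothetical, and the script only prints the rmsysconn commands for review; it does not run anything on the HMC.

```python
def parse_lssysconn(output: str) -> list[dict[str, str]]:
    """Turn saved 'lssysconn -r all' output into one dict per line.

    Assumes comma-separated key=value pairs with no commas inside values.
    """
    records = []
    for line in output.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and messages such as "No results were found."
        fields = (field.split("=", 1) for field in line.split(",") if "=" in field)
        records.append(dict(fields))
    return records


def frame_removal_commands(output: str) -> list[str]:
    """Build one rmsysconn command per frame connection, for manual review."""
    return [
        f"rmsysconn --ip {record['ipaddr']} -o remove"
        for record in parse_lssysconn(output)
        if record.get("resource_type") == "frame" and "ipaddr" in record
    ]


if __name__ == "__main__":
    # Sample taken from the example output shown in the procedure.
    saved = """\
resource_type=frame,type_model_serial_num=9458-100*9920250,side=b,ipaddr=172.16.250.255,alt_ipaddr=unavailable,state=Connected
resource_type=frame,type_model_serial_num=9458-100*9920250,side=a,ipaddr=172.16.255.246,alt_ipaddr=unavailable,state=Connected
"""
    for command in frame_removal_commands(saved):
        print(command)
```

Keeping the parsed records around also gives you a written record of every connection, which is what step 2a asks for.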

            Configure the new HMC:


Note: Restoring upgrade data or backup HMC data to a different model is not supported. A possible alternative is to use HMC data replication to replicate information onto the new hardware. Information that can be replicated includes: customer contact information data; user data (user profiles, task roles and resource roles); outbound connectivity configuration data.

Remember to periodically ask, is your firmware current? Is your HMC code current? Is your HMC model current? Is your VIO server code current? Are your OS images patched?

So are you ready to upgrade now? If not, what’s holding you back?

The Key Question of Why, and Why the Answer Matters

Edit: Still good questions to ask ourselves. Last link no longer works.

Originally posted March 4, 2014 on AIXchange

A while back Nigel Griffiths (@mr_nmon) tweeted about a TED talk on leadership. It was 18 minutes well-spent.

Simon Sinek, the speaker, gave this presentation in 2009, and it was posted to TED.com in 2010. He starts by drawing three circles that look like a target. The innermost circle is labeled “Why?” The middle layer is labeled “How?” and the outer layer is labeled “What?”

Simon believes that understanding why we do the things we do is significantly tougher than explaining how or what it is we do. He also finds the question of why to be more compelling. He points to prominent companies advertising their wares. For instance, he says that Apple tells consumers, foremost, why they should buy Apple products. At Apple, they “think different” and create machines that are easy to use. He contrasts that with Tivo, which eschews the why for the what, and simply informs consumers about the features of its products.

Simon further considers the vagueness of the question of why by pointing out how we often use our emotions to arrive at decisions. We cannot really even articulate why — it’s just a gut decision. He believes that those who can successfully communicate the reason why are people who can get others to believe in their vision.

Of course this extends beyond ad campaigns. Simon believes that businesses thrive when everyone believes in what’s being accomplished. His contention is that if you seek talented people to do a job, you’ll get people who will perform acceptable work for the money. However, if you hire talented people who also believe in what you’re doing, their passion will lift the entire operation. He adds that people follow causes far more readily than they follow individual leaders.

In his response to Simon’s presentation, Nigel considers the question of why he uses and advocates for IBM Power Systems and POWER8 processors.

            “Why POWER8?

            * We believe server sprawl means a future in which we drown in small computers taking 80 percent of the world’s electricity to run their idle loops 80 percent of the time.

            * We believe we have got to build a vastly better computer to avoid that.”

Again, I encourage you to take a few minutes and view Simon’s TED talk. Then you may want to ask yourself, as an AIX pro, why? Why are you passionate about AIX and Power Systems? Why were you excited about upgrading from POWER6 to POWER7+ processors, and why are you excited for POWER8 to come out?

For that matter, why do you even do the job you do? Is it strictly for the paycheck, or is there a broader reason for your career choice? Does having access to superior hardware and tools give you a more positive outlook on your job? Does the opportunity to learn new things on a daily basis drive you (as it does me)? Why do you do what you do?

Popular Presentation Located, Updated

Edit: Link no longer works.

Originally posted February 25, 2014 on AIXchange

I often get questions from readers. The most common question I’ve received recently is, “Where did the slides go?”

In April 2013 I noted that Fredrik Lundholm had been compiling the Power Implementation Quality Standard for commercial workloads, and posted a link to what was then the latest version of his slide presentation (1.9). I knew this was valuable information, but I had no idea how widely it was being read until this particular set of slides disappeared from the Internet. I’ve received at least an email or two every week from people wondering what happened.

What happened was that Fredrik regularly updates this information, and the links tend to change when he does. The good news is his latest version, 1.12, is available here. On Page 3 he lists the changes he’s made in each version of the presentation. I’ll just highlight the updates that have come since version 1.9:

            Changes for 1.12

            – PowerHA and PowerHA levels, AIX levels, VIO levels

            – Virtual Ethernet buffer update

            Changes for 1.11

            – Power saving animation

            – Network configuration update admin VLAN/simplification

            – Removal of obsolete network design

            Changes for 1.10

            – Favor performance without Active Energy Manager

            – AIX/GPFS code level updates

            – AIX Memory Pin

For my part I posted a comment to my post from last year, noting this new version. Please let me — and Fredrik — know if you find this information useful.

PowerHA and Multicast Setup

Edit: Link works at the time of this writing.

Originally posted February 18, 2014 on AIXchange

Recently during a PowerHA 7.1.2 installation, the network team was unable to get multicast communication working properly. Fortunately we were able to use this document to get everything going.

From the document, entitled “PowerHA System Mirror 7.1 and Multicasting Setup”:

            “PowerHA SystemMirror 7.1 Standard Edition High Availability solution implements clustering using multicast (IP based multicast) based communication between the nodes/hosts in the cluster. Multicast based communication provides for optimized communication method to  exchange not only heartbeats, but also allows clustering software to communicate critical  events, cluster coordination messages etc in 1 to N method instead of communication 1 to 1 between the hosts.

            “Multicast communication is a well established mode of communication in the world of TCP/IP network communication. However in some cases, the network switches used in the communication path need to be reviewed and enabled for multicast traffic to flow between the cluster hosts through them. This document explains some of the network setup aspects that may need to be reviewed before the PowerHA SystemMirror 7.1 cluster is deployed.

            “Note that multicast communication is used during the initial discovery phase when the cluster is being created, but also during the normal operations of the cluster. Hence it is extremely important that the multicast traffic to flow between the cluster hosts in the datacenter before the cluster formation can be attempted. Please plan to test and verify the multicast traffic flow between the would-be cluster nodes before even attempting to create the cluster.”

Before I get too far along, I should note that with PowerHA 7.1.3, unicast is an added communication option. In fact, it’s the default option. These issues with getting multicast working are likely behind this change. But in the case of this customer, the commitment was made to version 7.1.2.

Here’s a bit more about multicast:

            “Multicasting is a form of addressing, where a group of hosts form a group and exchange messages. A multicast message sent by one in the group is received by all in the group. This allows for efficient cluster communication where many times messages need to be sent to all the nodes in the cluster. For example a cluster member may need to notify the rest of the nodes about a critical event and can accomplish the same by sending a single multicast packet with the relevant information.

            “One of the simplest method to test end to end multicast communication is to use the mping command available on AIX. In Fig 1, start the mping command in receive mode on one Host (Say Host A) and then use mping command to send packets from the other Host (Host B). If   multiple hosts will be part of the cluster, test end to end mping communication from each host to the other.”
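
With more than two nodes, it is easy to lose track of which directions have been tested. As a small planning aid, this Python sketch lists every ordered (receiver, sender) pair so that each host is tested against every other host in both directions. The host names and the function name are made up for illustration, and the script does not run mping itself.

```python
from itertools import permutations


def mping_test_pairs(hosts: list[str]) -> list[tuple[str, str]]:
    """Every ordered (receiver, sender) pair of distinct hosts.

    For n hosts this yields n * (n - 1) tests, covering both directions.
    """
    return list(permutations(hosts, 2))


if __name__ == "__main__":
    # Hypothetical node names for a three-node cluster.
    for receiver, sender in mping_test_pairs(["hostA", "hostB", "hostC"]):
        print(f"start mping in receive mode on {receiver}, then send from {sender}")
```

Checking off each pair as it passes makes it obvious which path is at fault when only some of the tests fail.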

Finally, here are the document’s troubleshooting guidelines:

            “If mping command fails to receive packets from Host to Host in the network environment, there could be some issue in the network path in regards to multicast packet flow. Follow some of the general guidelines below to troubleshoot the issue:

  1. Review the switch vendor’s documentation for guidelines in regards to switch setup. Some of the known switch guideline links are included in the reference.
  2. Disable IGMP snooping on the switches. Most switches will allow for disabling IGMP snooping. If your network environment allows, disable the IGMP snooping and allow all multicast traffic to flow without any problems across switches.
  3. If your network requirements does not allow snooping to be disabled: Debug the problem by disabling the IGMP snooping and then adding network components one at a time for snooping
  4. Debug if necessary by eliminating the cascaded switch configurations (having only one switch between the Hosts).”

In our case, we disabled the IGMP snooping on the switch and multicast started to work.

What about your experience? Did you have any issues getting multicasting up and running? Please share your thoughts in comments.

Document Examines IBM i External Storage Options

Edit: In this piece I address a future reader, who would now be a past reader.

Originally posted February 11, 2014 on AIXchange

As I’ve previously noted, while this blog is focused on AIX, I think it’s worthwhile to occasionally discuss the IBM i operating system. Increasingly, AIX pros are asked to support their IBM i counterparts as they connect to external storage for the first time.

For many years IBM i systems only used internal disks. And I expect that some IBM i environments will continue to rely exclusively on internal disk for years to come. After all, managing your machine is easy when you’re in total control of the environment from disks to server. However, things have changed. These days IBM i is commonly used in shared environments (like SANs), and of course this is where adding disks becomes tricky.

While the VIO server is common to both AIX and IBM i, those of us on the AIX side have been using it for years. In contrast, many IBM i pros have little to no experience with VIOS, and thus find it difficult to pick up. If you’re an IBM i administrator in this situation, you may find this document helpful.

“Hints and Tips: V7000 in an IBM i Environment” examines external storage options. The authors are Alison Pate, IBM Advanced Technical Sales Support, and Jana Jamsek, IBM Advanced Technical Skills, Europe. The document was most recently revised in August 2013. (I like to note the date because readers will often find years-old posts on this blog that reference documentation that’s likely been updated over time. So if you see this in, say, 2017, first, glad you’re here, Future Reader, and second, be sure you track down the latest version of Alison and Jana’s work.)

For instance, this section lays out the challenges of attaching IBM i to SAN disks without VIOS:

            Translation from 520 byte blocks to 512 byte blocks

            “IBM i disks have a block size of 520 bytes. Most fixed block (FB) storage devices are formatted with a block size of 512 bytes so a translation or mapping is required to attach these to IBM i. (The DS8000 supports IBM i with a native disk format of 520 bytes).

            “IBM i performs the following change of the data layout to support 512 byte blocks (sectors) in   external storage: for every page (8 * 520 byte sectors) it uses additional 9th sector; it stores the 8-byte headers of the 520 byte sectors in the 9th sector, and therefore changes the previous 8* 520-byte blocks to 9* 512-byte blocks. The data that was previously stored in 8 * sectors is now spread across 9 * sectors, so the required disk capacity on V7000 is 9/8 of the IBM i usable capacity. Vice versa, the usable capacity in IBM i is 8/9 of the allocated capacity in V7000.

            “Therefore, when attaching a Storwize V7000 to IBM i, whether through vSCSI, NPIV or native attachment this mapping of 520:512 byte blocks means that you will have a capacity ‘overhead’ of being able to use only 8/9ths of the effective capacity.

            “The impact of this translation to IBM i disk performance is negligible.”
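
To put numbers on the 8/9 ratio, here is a short Python sketch of the capacity arithmetic described in the quoted section. It assumes only the ratio stated there (eight sectors of data stored in nine 512-byte sectors); the function names are hypothetical, and real sizing should still come from the storage tools and a Disk Magic model.

```python
from fractions import Fraction

# Eight sectors of IBM i data occupy nine 512-byte sectors on the V7000,
# so usable capacity is 8/9 of what is allocated.
USABLE_RATIO = Fraction(8, 9)


def usable_capacity_gb(allocated_gb: int) -> float:
    """Capacity IBM i can use from a given V7000 allocation."""
    return float(allocated_gb * USABLE_RATIO)


def required_allocation_gb(usable_gb: int) -> float:
    """V7000 capacity to allocate for a desired IBM i usable capacity."""
    return float(usable_gb / USABLE_RATIO)


if __name__ == "__main__":
    print(usable_capacity_gb(90))       # 90 GB allocated gives 80 GB usable
    print(required_allocation_gb(80))   # 80 GB usable needs 90 GB allocated
    print(round(usable_capacity_gb(80), 1))  # an 80 GB LUN gives roughly 71 GB usable
```

The last line is worth remembering when someone asks why an 80 GB LUN shows up smaller on the IBM i side.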

The document also identifies the requirements for and potential issues with using vSCSI or NPIV. One section looks at sizing for performance and the need to consider I/O as well as capacity. The authors recommend getting a Disk Magic model to determine what’s best for your environment. They suggest starting with 80G LUN sizes, noting, “the recommendation is to create a dedicated storage pool for IBM i with enough managed disks backed by a sufficient number of spindles to handle the expected IBM i workload. Modeling with Disk Magic using actual customer performance data should be performed to size the storage system properly.”

IBM i multipathing is another topic of discussion:

            “With using the recommended switch zoning we achieve that four paths are established from a LUN to the IBM i: two of the paths go through adapter 1 (in NPIV also through VIOS 1) and two of the paths go through adapter 2 (in NPIV also through VIOS 2); from the two paths that go through each adapter one goes through the preferred node, and one goes through the non-preferred node. Therefore two of the four paths are active, each of them going through different  adapter, and different VIOS if NPIV is used; two of the path are passive, each of them going through different adapter, and different VIOS if NPIV is used. IBM i Multipathing uses Round Robin algorithm to balance the IO among the paths that are active.”

In addition, the document includes good graphics that further help explain the concepts being discussed.

There’s much more than I can cover here, so be sure to check it out. Though the document is IBM i specific, I believe this information is relevant for IBM i and AIX admins alike.

AIX Discussion List is a Place to Get Answers

Edit: Does this even exist anymore?

Originally posted February 4, 2014 on AIXchange

Where do you go when you have questions and need help? Do you talk to a coworker? Call IBM Support? Head straight to Google?

One option you may not be familiar with is the IBM AIX Discussion list. Mailing lists may seem like a relic from the Internet’s early days, but rest assured, they’re still out there. However, for you younger folks, I’ll include this brief backgrounder from Wikipedia:

“An electronic mailing list or email list is a special usage of email that allows for widespread distribution of information to many Internet users. It is similar to a traditional mailing list (a list of names and addresses) as might be kept by an organization for sending publications to its members or customers, but typically refers to four things:

  • a list of email addresses,
  • the people (“subscribers”) receiving mail at those addresses,
  • the publications (email messages) sent to those addresses, and
  • a reflector, which is a single email address that, when designated as the recipient of a message, will send a copy of that message to all of the subscribers.”

According to its website, the IBM AIX Discussion list “is intended for the discussion of AIX. AIX is the IBM Unix solution for small and large computer systems. Initially, this list will be used for dissemination of information and technical details of AIX on all levels. It may be necessary to break this list down into machine types that AIX will run on.”

You can join the mailing list, browse the archives, or search the archives. If you don’t want every message (and associated replies) coming into your inbox, just subscribe to the mailing list digest. That way you’ll only receive one daily email containing all the discussion from the previous 24 hours.

I forget when this list was established, but I’ve been using it for many years. Lately though it seems there’s been a bit more traffic. Still, the more the better, so I thought I’d mention it here. If you have questions about AIX and PowerVM, the IBM AIX Discussion list is a great place to get answers. I recognize quite a few of the names of participants, and these are knowledgeable, trustworthy people. And as mentioned, even if you don’t get into the discussion, the archives offer an invaluable repository of information.

Are you familiar with the list? If so, are you part of the conversation, or are you a lurker? If you have questions — and we all have questions from time to time — I encourage you to subscribe and make use of this tool.

A Look at AIX Mirror Pools

Edit: This is still an interesting method to consider.

Originally posted January 28, 2014 on AIXchange

A storage guy I know went to last fall’s IBM Technical University conference to learn more about IBM Power Systems and AIX, but he came away very excited about the AIX Logical Volume Manager (LVM). We may take it for granted, but for him this information about what we could do with our built-in volume manager was revolutionary.

In addition to the base LVM and its capability to easily mirror logical volumes (including mirroring physical disks to LUNs, as well as mirroring LUNs that might be coming from different physical storage arrays), there’s also the relatively new concept of AIX mirror pools.

IBMer Michael Perzl authored and recently updated a document that does a great job of explaining this. From the “Introduction to AIX Mirror Pools” abstract:

            “This document tries to shed more light onto AIX mirror pools which were introduced with AIX V6.1 Technology Level 2. AIX mirror pools unfortunately seem not to be well known despite being a very powerful new AIX feature which simplifies the task of mirroring data significantly. One reason may be that for using AIX mirror pools no extra commands exist but the existing AIX LVM commands have been extended to incorporate the mirror pool functionality.

            “This document is not meant to be an all-encompassing guide to AIX mirror pools but give a first impression what tasks can be accomplished much easier than before. The intended audience for this document are AIX users and system administrators. A general knowledge and understanding of AIX LVM is required.

            “An example of how mirror pools can be beneficial is when used with remote disks. If a volume group is created with physical volumes that are located in two different locations, the disks in one location can be assigned to one mirror pool and the disks in the other location to a different mirror pool. When a logical volume is created in that volume group, each mirror copy of that logical volume can be assigned to a mirror pool. Thus, when partitions are allocated for that copy they will only come from disks that are in the assigned mirror pool.

            “Without mirror pools, the only way to restrict which physical volume is used for allocation when creating or extending a logical volume is to use a map file. This typically is a very tedious and error-prone process. Thus, the main advantage of mirror pools is that they simplify the task of mirroring data significantly compared to the steps that were required before. This is specially beneficial when used with remote disks. If a volume group is created with physical volumes that are located in two different locations, the disks in one location can be assigned to one mirror pool and the disks in the other location to a different mirror pool. When a logical volume is created in that volume group, each mirror copy of that logical volume can be assigned to a mirror pool. Thus, when partitions are allocated for that copy they will only come from disks that are in the assigned mirror pool.

            “The following system requirements must be fulfilled for mirror pools:

            • Mirror pools are only available in AIX V6.1 TL 2 and higher

            • Mirror pools are only available for SVG type (scalable) volume groups.

            • After assigning PVs (physical volumes) to a mirror pool, the volume group can no longer be imported to a previous version of AIX that does not support mirror pools.

            • While it is possible to assign multiple logical volume copies to a mirror pool, it is recommended that only one copy of a logical volume be assigned to a mirror pool.

            • Volume groups can enable strict mirror pools. If this is enabled all of the logical volumes in the volume group must use mirror pools.

            • Any changes to mirror pool characteristics will not affect partitions allocated before the changes were made. The reorgvg command should be used after mirror pool changes are made to move the allocated partitions to conform to the mirror pool restrictions.”

The entire document is well worth your time. Go through it and then get onto a lab system so you can play around with mirror pools.

IBM Rolling Out Entitlement Validations

Edit: Some links no longer work.

Originally posted January 21, 2014 on AIXchange

Last fall I wrote about IBM changing its approach to delivering software products and updates:

“Another portal feature is an entitlement check that allows customers to download fixes. Just enter your machine type and serial number. The various entitlement types are tied to the level of maintenance you have on your machines. Going forward, IBM will move toward making the capability to download fixes a privilege available primarily to paying customers.”

Get ready. It appears we’ll soon need to verify that we’re entitled to download fixes. The details can be found here:

            “Starting in January 2014, IBM will implement entitlement validation on Fix Central for select software products and updates and for Machine Code (also known as firmware or microcode) updates for select machines. Entitlement for Machine Code updates will be checked through user-provided serial numbers. Entitlement for software products will be validated through IBM ID association to relevant IBM customer numbers. Additional information may be requested or required to confirm entitlement.

These entitlement validations are not being implemented in all countries at this time.

IBM reserves the right to change, modify or withdraw its offerings, policies and practices at any time.

FAQs

            1. How does entitlement work?

     A: Entitlement for Machine Code updates is validated through user-provided machine serial numbers. Entitlement for software products will be validated through IBM ID association to relevant IBM customer numbers.

            2. What if I failed entitlement but have warranty or an applicable support contract in place?

     A: Please submit a request for help during the download process.

            3. Who can I contact for additional help?

     A: You can submit a request for help during the download process or contact IBM Support.

            Note: Fix Central Machine Code updates are available only for IBM machines that are under warranty or an IBM hardware maintenance service agreement. Code for operating systems or other software products is available only where entitled under the applicable software warranty or IBM software maintenance agreement. All code (including Machine Code updates, samples, fixes or other software downloads) provided on the Fix Central website is subject to the terms of the license agreements which govern the use of the associated code.

Some exemptions may apply.

Visit the IBM Fix Central site directly at http://ibm.com/support/fixcentral or navigate to the “Downloads” section on any IBM Support Portal product page.

For even more information about all of IBM’s Electronic Support sites and tools, please visit our information site at: https://www.ibm.com/support/home/

The next time you need to download fixes, you might want to give yourself some extra time to make sure you’re properly entitled. Try to be proactive and test things out before you find yourself in a situation where you need to try to download something in a hurry.

Adapter Numbering Schemes

Edit: These days I think most shops don’t worry about adapter numbers.

Originally posted January 14, 2014 on AIXchange

How do you number your virtual adapters when you’re planning to build a new Power system?

Some people put no thought into their numbering plans; they simply have the system pick the next available adapter number in the HMC GUI. Others just use even and odd numbers — 10, 20, 30 and 40 from vio1 and 11, 21, 31 and 41 from vio2 — and then map the next LPAR to the next available number. When troubleshooting is needed, they look to their documentation or employ some other method to figure out which adapter goes where.

I was recently shown a numbering scheme for virtual fibre and virtual SCSI adapters — 4-digit numbers for fibre and 3-digit numbers for SCSI. The first digit was 1 for vio1 or 2 for vio2. The second digit on the virtual fibre adapter indicated which physical fibre port it was connected to via the vfcmap command.

When using NPIV (see here and here) and running vfcmap, you indicate the physical FCS device you’ll be using for your connection. In this numbering scheme I know the VIO server the virtual adapter is coming from and the physical FCS device it’s mapping to. The last two digits indicate the partition ID it’s connecting to. (Obviously, if your partition IDs extend into three or four digits, or if you add a vio3 and vio4, you would modify the scheme as necessary.)

For example, on vio1 I might have virtual adapter 1112, and on vio2 I might have 2112. This would indicate the VIO server it came from, the physical device it was using and the LPAR ID, which in this case would be 12. By using the same numbers on both the client and the server, tracking down adapters becomes very simple. Virtual SCSI is the same, only 112 and 212 would be used. This indicates the VIO server it came from and the LPAR ID it was connected to. There would be no need to indicate the physical device it was mapping to.

A scheme like this comes in handy when you’re planning server builds that comprise many physical machines and many, many virtual machines. For example, a customer wanted four paths over four virtual adapters into their dual VIO servers, two from each VIO server.

For LPAR ID 10, 1110 and 1210 could be used for vio1, and 2110 and 2210 for vio2. Then for LPAR ID 11, 1311 and 1411 could be used for vio1, and 2311 and 2411 for vio2. This pattern would continue until you ran out of physical adapters; then you would circle back around. For example, if you had the physical fcs0, fcs1, fcs2 and fcs3 adapters on your vio1 server, you might see a pattern like this for the four LPARs with IDs 10, 11, 12 and 13:

            1010 – vio1, physical adapter 0, id10

            1110 – vio1, physical adapter 1, id10

            1211 – vio1, physical adapter 2, id11

            1311 – vio1, physical adapter 3, id11

            1012 – vio1, physical adapter 0, id12

            1112 – vio1, physical adapter 1, id12

            1213 – vio1, physical adapter 2, id13

            1313 – vio1, physical adapter 3, id13

With fcs0, fcs1, fcs2, and fcs3 adapters on vio2, you might see:

            2010 – vio2, physical adapter 0, id10

            2110 – vio2, physical adapter 1, id10

            2211 – vio2, physical adapter 2, id11

            2311 – vio2, physical adapter 3, id11

            2012 – vio2, physical adapter 0, id12

            2112 – vio2, physical adapter 1, id12

            2213 – vio2, physical adapter 2, id13

            2313 – vio2, physical adapter 3, id13

You’d then have these virtual adapters on each client.

            LPAR 10 – 1010,1110,2010,2110

            LPAR 11 – 1211,1311,2211,2311

            LPAR 12 – 1012,1112,2012,2112

            LPAR 13 – 1213,1313,2213,2313
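
The digit layout above is easy to generate with a small script. The following is only a sketch of the scheme as described in this post. The names adapter_id and vscsi_adapter_id are made up for this example (they are not AIX or HMC commands), and LPAR IDs are assumed to be one or two digits, passed without a leading zero.

```sh
# Hypothetical helper (not an AIX or HMC command): build a virtual fibre
# adapter ID from the VIO server number, the physical fcs adapter number
# and the LPAR ID, e.g. vio1 + fcs1 + LPAR 12 -> 1112.
adapter_id() {
    printf '%d%d%02d\n' "$1" "$2" "$3"
}

# vSCSI variant: VIO server number and LPAR ID only, no physical adapter digit.
vscsi_adapter_id() {
    printf '%d%02d\n' "$1" "$2"
}

# Fibre adapter IDs for LPAR 10 on physical adapters 0 and 1 of both VIO servers.
for vio in 1 2; do
    for fcs in 0 1; do
        adapter_id "$vio" "$fcs" 10
    done
done
```

The loop prints the same four numbers listed for LPAR 10 above. The function only builds the numbers; assigning them in the partition profiles is still done the usual way.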

It may seem confusing, but trust me, the more you use it, the more sense it makes.

Feel free to share your own adapter-numbering scheme in Comments.

Configuring X11 Forwarding

Edit: Some links no longer work.

Originally posted January 7, 2014 on AIXchange

A customer recently ran across an issue where their X11 forwarding was working fine on an AIX 6.1 machine, but not on an AIX 7.1 machine. They were looking for a second set of eyes to make sure their configuration looked OK.

Here’s the question (and ultimately, the answer) that I received:

            I’m stumped on a problem and hoping you might be able to shed some light on it. We’ve added several 7.1 systems recently and I’m trying to get X11 forwarding working on one of them. I’ve got the systems configured the same way as our 6.1 systems, and PuTTY is configured the same way as well, but when I log in to the 7.1 box, no .Xauthority file is created and my $DISPLAY doesn’t get set.

            I found a post on how to manually recreate the .Xauthority file and followed those steps, but the .Xauthority file is not created. If I run an xauth list command it says it’s creating the file, but it doesn’t actually create the file. The sshd_config file has X11Forwarding yes and a line for the Xauthlocation.

            I figured this would be something simple in /etc/ssh/sshd_config, but was told this when I asked about it:

                        Here’s the sshd_config info, and openssh was restarted using stopsrc -s sshd; startsrc -s sshd:

                        X11Forwarding yes

                        X11DisplayOffset 10

                        X11UseLocalhost yes

                        XauthLocation /usr/bin/X11/xauth

            (I’ve also tried it with the X11DisplayOffset and X11UseLocalhost commented out and restarting after making that change.)

            When we looked at the putty event log we saw:

            2013-11-05 09:43:56        Requesting X11 forwarding

            2013-11-05 09:43:56        X11 forwarding refused

            We also saw this article, and verified all of it was set correctly.

            I use ssh with X11 forwarding, but the DISPLAY variable isn’t set:

* Check X11Forwarding directive in sshd_config

* Check that ssh client has X11 forwarding option set

* The AIX machine is missing the xauth program. Install the X11.apps.config fileset.

* There are some older OpenSSH or OpenSSL versions that are buggy. I have had issues with OpenSSH versions 4.6.X, OpenSSH_4.3p2, OpenSSL 0.9.7l 28

            And at this point we set up a webex so we could share the screen and figure out what the problem was. We changed settings in sshd_config. We tried just manually exporting the DISPLAY to the windows workstation running cygwin and that worked fine. We checked /etc/hosts and /etc/netsvc.conf and everything seemed to be in order.

            Finally, we found this post. What worked for me was to add ‘AddressFamily inet’ to /etc/ssh/sshd_config.

         This article had the same information.

            Once we added the AddressFamily inet to the sshd_config, it worked as expected.

            If someone else runs into this issue on AIX 7.1, hopefully this information will help. This also shows how important it is to document these finds when we come across them. I bet that a year from now someone will end up reading this post and it will fix that person’s problem, just like reading the articles I found fixed my problem.
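
For anyone checking their own system, the two settings that mattered here can be verified with a short script. This is a minimal sketch, not part of the original exchange: sshd_setting and check_x11_forwarding are names invented for this example, and the parsing assumes whitespace-separated "Keyword value" lines and ignores Match blocks.

```sh
# Hypothetical helper: print the value of an sshd_config keyword.
# The first uncommented occurrence is used; keywords are case-insensitive.
# $1 = keyword, $2 = config file (defaults to /etc/ssh/sshd_config)
sshd_setting() {
    awk -v key="$1" 'tolower($1) == tolower(key) { print $2; exit }' \
        "${2:-/etc/ssh/sshd_config}"
}

# Report whether the two settings discussed in this post are present.
check_x11_forwarding() {
    conf=${1:-/etc/ssh/sshd_config}
    if [ "$(sshd_setting X11Forwarding "$conf")" != "yes" ]; then
        echo "X11Forwarding is not set to yes in $conf"
        return 1
    fi
    if [ "$(sshd_setting AddressFamily "$conf")" != "inet" ]; then
        echo "AddressFamily inet is missing from $conf"
        return 1
    fi
    echo "X11Forwarding yes and AddressFamily inet are both set in $conf"
}
```

Run check_x11_forwarding with no argument to check /etc/ssh/sshd_config. After any change to the file, sshd still has to be restarted with stopsrc -s sshd; startsrc -s sshd, as described above.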

Using rendev

Edit: This looks like the last post from 2013 based on the note below. Some links no longer work.

Originally posted December 17, 2013 on AIXchange

**Please note: This blog will be updated on January 7, 2014.**

Two weeks ago I reposted a tip from Russell Adams on displaying disk UUID. Here’s another tip that Russell submitted to a mailing list for IBMers and business partners earlier this fall. In this case he’s taken the time to update it. (Note: There are minor edits for the sake of clarity.)

            In AIX 6.1 TL6 and AIX 7.1 a new command was introduced to rename devices in AIX, rendev. This makes keeping your rootvg on hdisk0 (and hdisk1) and preserving device naming consistency across VIO and HACMP nodes simple!

            rendev -l device -n newname

            A few caveats:

            1) Renaming devices should always be done while the device is in a defined state (i.e., after “rmdev -l”); it cannot be used on active PVs in a VG or other online devices. While rendev can perform the rmdev of the device for you, it’s better to take the device offline first.

            2) Renaming ethernet (entX) adapters requires either manually renaming the enX and etX adapters, or removing them. Once the entX device has been renamed, cfgmgr will create matching enX & etX devices.

            3) Renaming fiber cards (fcsX) requires that all child devices be renamed manually. This includes fcsX, fscsiX, fcnetX, and sfwcommX. Use “rmdev -Rl fcsX” to unconfigure all the parent and child devices into the defined state, and then rename them. cfgmgr does not name the child devices to match.

            4) I recommend using rendev for renumbering like devices (e.g., ent2 -> ent11) rather than giving devices new name prefixes (e.g., ent2 -> lan3). Renaming device paths that are used by other device drivers (e.g., Powerpath) may cause issues.

            5) You can rename vhost and vfchost devices on VIOS in oem_setup_env *before* they’re mapped (not after). The devices should be manually put into a defined state first, and then can be renamed via rendev.

            Example:

            – New LPAR ID 5 will be mapped to VIO1 (ID 99) and VIO2 (ID 98)
            – LPAR ID 5 is assigned the following virtual adapters in its profile:
              – Slot 1-9 for network adapters
              – Slot 51 VSCSI to VIO1 (99) slot 51
              – Slot 52 VSCSI to VIO2 (98) slot 52
              – Slot 53 VFC to VIO1 (99) slot 53
              – Slot 54 VFC to VIO2 (98) slot 54
            – VIO1 (ID 99) dynamically adds:
              – Slot 51 VSCSI to LPAR ID 5 slot 51
              – Slot 53 VFC to LPAR ID 5 slot 53
            – VIO2 (ID 98) dynamically adds:
              – Slot 52 VSCSI to LPAR ID 5 slot 52
              – Slot 54 VFC to LPAR ID 5 slot 54

            Then on each VIO server, for each vhost and vfchost device:

            – Use lsdev -slots to determine the hardware location codes of the new vhost and vfchost devices

            – For example, vhost12 is XXXX-V99-C51
              – rmdev -l vhost12
              – rendev -l vhost12 -n vhost51  (from C/slot number to match LPAR ID and slot)
              – cfgmgr (or cfgdev) to activate them

            When done, map normally for VSCSI or VFC.

            The client LPAR and VIO slot matching numbering system has always been a good idea. Naming the devices significantly increases the readability of the virtual environment. All vhost and vfchost devices will have unique names that link them to the client LPAR.

            When coupled with ‘lspv -u’ on the client LPAR and VIO servers to determine client LUN mappings via UUID without manual tracing, this significantly simplifies virtual environments and troubleshooting.
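
The renaming step in the example can be sketched as a dry run. This is not part of Russell's tip: rename_to_slot is a name made up for this illustration, and it only prints the rmdev and rendev commands for review rather than running them. It assumes the slot number appears in the location code as -C followed by digits, as in XXXX-V99-C51.

```sh
# Hypothetical helper: given a device name and its hardware location code,
# print (but do not run) the commands that rename the device after its slot.
rename_to_slot() {
    dev=$1
    physloc=$2
    # Strip the trailing number from the device name: vhost12 -> vhost
    prefix=$(printf '%s\n' "$dev" | sed 's/[0-9]*$//')
    # Pull the slot number out of the location code: XXXX-V99-C51 -> 51
    slot=$(printf '%s\n' "$physloc" | sed -n 's/.*-C\([0-9][0-9]*\).*/\1/p')
    if [ -z "$slot" ]; then
        echo "no slot number found in $physloc" >&2
        return 1
    fi
    echo "rmdev -l $dev"
    echo "rendev -l $dev -n $prefix$slot"
}

rename_to_slot vhost12 XXXX-V99-C51
```

For the vhost12 example above this prints rmdev -l vhost12 followed by rendev -l vhost12 -n vhost51. Printing the commands first makes it easy to check the new names before anything is changed; cfgmgr (or cfgdev) is still needed afterward to activate the devices.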

Thanks again to Russell for allowing me to repost this information.

Running the RoCE Adapter in NIC Mode

Edit: Some links no longer work.

Originally posted December 10, 2013 on AIXchange

A colleague sent me an interesting solution to a problem he was seeing with a PCIe2 10GbE RoCE converged host bus adapter.

It was coming up in RDMA mode by default under VIOS 2.2.2.3 (which is AIX 6100-08-03-1339 under the covers), and the customer wanted to run it in NIC mode:

            “The PCIe2 10 GbE RDMA Over Converged Ethernet (RoCE) Adapter was supported only on previous versions of the AIX operating system to use the Remote Direct Memory Access (RDMA) configuration mode. AIX 7 with 7100-02 or later supports the adapter that is configured in either the RDMA or the network interface card (NIC) configuration. The host bus adapter (HBA), which was not available in earlier versions of the AIX operating systems, manages which mode is enabled.

            “As of AIX 7 with 7100-02, the PCIe2 10 GbE RoCE Adapter can be configured to run in the NIC configuration. If you do not have the network-intensive applications that benefit from RDMA, then you can run the adapter in the NIC configuration.”

The preceding URL includes some instructions on moving from RDMA to NIC (or from NIC to RDMA), but also note the steps taken by my colleague in his situation:

            ISSUE: PCI card EC30 (PCIe2 10GbE RoCE Converged Host Bus Adapter) is presenting only as hba0 and roce0 and not providing ethernet over fibre devices such as entX. The intention is to use this card for use on a VIO server as a shared Ethernet adapter in NIC mode.

            SOLUTION: The stack_type attribute has to be changed from aix_ib to ofed. The default for this card as installed in this instance is aix_ib (Infiniband).

########################

#  VIEW CARD LOCATION  #

########################

# lscfg | grep -e "-C5-"

* hba0             U78C5.001.DQD02KZ-P2-C5-T1        PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714106104)

+ roce0            U78C5.001.DQD02KZ-P2-C5-T1-L0     PCIe2 10GbE RoCE Converged Network Adapter

########################

#  REMOVE ROCE DEVICE  #

########################

# rmdev -dl roce0

roce0 deleted

#############################

#  VIEW ATTRIBUTES OF HBA0  #

#     (note stack_type)     #

#############################

# lsattr -El hba0

bar0          0xfbf00000         Bus memory address 0 False

bar1          0xfc000000         Bus memory address 1 False

bar2          0x80000000         Bus memory address 2 False

busintr       0                  Bus interrupt level  False

busintrl      129536             Bus interrupt        False

devid         0xb315506714106104 Device ID            False

intr_priority 3                  Interrupt priority   False

rom_mem       0x80080000         ROM memory address   False

stack_type    aix_ib             RoCE Stack Type      True

#############################

#  CHANGE DEVICE ATTRIBUTE  #

#############################

# chdev -l hba0 -a stack_type=ofed

hba0 changed

########################

#  RUN CFGMGR          #

########################

# cfgmgr

#############################

#  VIEW ATTRIBUTES OF HBA0  #

#     (note stack_type)     #

#############################

# lsattr -El hba0

bar0          0xfbf00000         Bus memory address 0 False

bar1          0xfc000000         Bus memory address 1 False

bar2          0x80000000         Bus memory address 2 False

busintr       0                  Bus interrupt level  False

busintrl      129536             Bus interrupt        False

devid         0xb315506714106104 Device ID            False

intr_priority 3                  Interrupt priority   False

rom_mem       0x80080000         ROM memory address   False

stack_type    ofed               RoCE Stack Type      True

#########################

#  VIEW CARD LOCATION   #

# (note new ent devices)#

#########################

# lscfg | grep -e "-C5-"

* hba0             U78C5.001.DQD02KZ-P2-C5-T1        PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714106104)

+ ent4             U78C5.001.DQD02KZ-P2-C5-T1-L1     RoCE Converged Network Adapter

+ ent5             U78C5.001.DQD02KZ-P2-C5-T1-L2     RoCE Converged Network Adapter

As noted, information on this issue can be found under PCIe2 10 GbE RoCE Adapter support:

            “The PCIe2 10 GbE RoCE Adapter is preconfigured to operate in the RDMA configuration mode. A network that uses RDMA is more complicated to set up than the NIC configuration mode, but provides better performance than the NIC mode for network-intensive applications. This mode is often helpful for network storage or high-performance computing.”
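
To check which mode a card is in before making changes, the stack_type value can be pulled out of the lsattr output. This is a small sketch rather than anything from IBM's documentation: attr_value is a made-up helper name, and the sample text is copied from the lsattr -El hba0 output shown earlier in this post.

```sh
# Hypothetical helper: read "lsattr -El <device>" output on stdin and print
# the value of one attribute (stack_type by default).
attr_value() {
    awk -v attr="${1:-stack_type}" '$1 == attr { print $2; exit }'
}

# Two sample lines copied from the lsattr -El hba0 output in this post.
sample='intr_priority 3                  Interrupt priority   False
stack_type    aix_ib             RoCE Stack Type      True'

mode=$(printf '%s\n' "$sample" | attr_value stack_type)
if [ "$mode" = "ofed" ]; then
    echo "stack_type=ofed: the entX devices should be available"
else
    echo "stack_type=$mode: the fix used here was chdev -l hba0 -a stack_type=ofed"
fi
```

On a live system the sample text would be replaced by lsattr -El hba0 | attr_value.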

Displaying Disk UUID

Edit: Some links no longer work.

Originally posted December 3, 2013 on AIXchange

If you’re on a mailing list for IBMers and business partners, you saw this information from Russell Adams earlier this fall. For the benefit of those who aren’t on that list, and with Russell’s permission, I’m reposting it here. (Note there are minor edits for the sake of clarity.)

AIX 6.1 TL7 introduced a new flag for the lspv command. It shows the unique device ID (UDID) of disks in additional columns of the lspv output.

This new lspv -u is particularly useful in VIO server environments using vSCSI because the VIO client LPAR hdisk UDID contains the real UDID from the VIO server hdisk.

For example on a client LPAR using VSCSI for the rootvg (merged columns and spaces in the UDID are not a paste error)

… our client UDID contains the UDID from the VIO server with a prefix and suffix (where ^ indicates the prefix and suffix added by the VIO server):

Using UDIDs, the client can be quickly cross-referenced to the server with the most significant bytes of the UDID — in this case the middle 15 digits.
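
The cross-reference amounts to a substring check: the server's UDID should appear inside the client's UDID. Here is a minimal sketch of that check. The function name udid_matches and both UDID values are invented for illustration; real values come from lspv -u on the client and on the VIO server.

```sh
# Hypothetical helper: succeed if the client-side UDID contains the
# server-side UDID as a substring. Quoting keeps embedded spaces intact.
udid_matches() {
    client=$1
    server=$2
    [ -n "$server" ] || return 1      # an empty string would match anything
    case $client in
        *"$server"*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Made-up values for illustration only.
server_udid='ABCDEF0123456789'
client_udid='xx11ABCDEF0123456789yy22'

if udid_matches "$client_udid" "$server_udid"; then
    echo "client disk maps to this server disk"
else
    echo "no match"
fi
```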

Historically, to find the real LUN that a client is using in a VIO server environment would require these steps for each VIO server:

* Obtain the client hdisk’s parent vscsi device hardware location code and LUN number.
* Look up on the HMC which VIO server and slot the client vSCSI device is linked to.
* Look up the vhost adapter on the VIO server by slot number.
* Look up the vSCSI mappings for the vhost adapter to locate the hdisk on the VIO server.

A client with dual VIO servers would have to repeat the procedure twice. PVIDs can also shortcut the process, but they may not show up on the VIO server’s lspv output until after they’re written to the client and the VIO server is rebooted. If the client rewrites the PVID, the VIO server can also be out of date. Thus UDIDs are the preferred method because they’re static values.

The output can stretch the columns until they merge and spaces in the UDID break the columns. Hopefully this will be fixed in a future release.

More Twitter discussion from @robmcnelly:

@neverfishagain Tesla and Power 8 #power http://lnkd.in/dnHrmtS

@brian_smi  The Shell Scripts that make up #AIX https://www.ibm.com/developerworks/community/blogs/brian/entry/the_shell_scripts_that_make_up_aix …

@ROOTvgNET http://www-01.ibm.com/support/docview.wss?uid=aixtools_home … – IBM AIX Support Center Tools – United States | Some great #aixtools you should know.

@chmod666 The guy who gives me my passion for #AIX has just opened a blog. Please check it http://www.aixpowerlevel.com  . Thank you @prosty !!!!

@robmcnelly RT @cgibbo RT @Prosty: How to download a specific package from IBM fix central http://aixpowerlevel.com/2013/11/how-to-download-a-specific-package-from-ibm-fix-central/ … #AIX

Logging Console Output

Edit: Some links no longer work.

Originally posted November 27, 2013 on AIXchange

If you want to log your console output so you have a record of the commands you’re running, I can think of a few ways to accomplish this. One is to login and use the script command:

            “The script command makes a typescript of everything displayed on your terminal. The typescript is written to the file specified by the File parameter. The typescript can later be sent to the line printer. If no file name is given, the typescript is saved in the current directory with the file name typescript.

            “The script ends when the forked shell exits.

            “This command is useful for producing hard-copy records when hard-copy terminals are in short supply. For example, use the script command when you are working on a CRT display and need a hard-copy record of the dialog.”

Another option is to set up logging in PuTTY, connect to the HMC over ssh and run vtmenu or run mkvterm:

            “Being someone who does implementation services, I want to always be sure about what I do, what I did and what effect it had. So keeping logs of every time I connect to any device is very important to me. I want to be able to go back in time to any point at any client I have ever worked with (and here are some other putty tricks to try)…”

And here’s a review of 10 native PuTTY tips.

Perhaps though you’re just using the vanilla Java console session on the HMC. In that case, this method should work:

            “Command line access to the HMC through SSH is required. You can enable this from the HMC console or through remote WebSM.

            “Once you are logged in:

            Click HMC Maintenance -> System Configuration -> Enable/Disable Remote Command Execution, check the SSH box.

            Next, click Enable/Disable Remote Virtual Terminals, check the box for Enable.

            1. Open a vterm window to the LPAR. Look at the left side of the title bar to get the vterm string, which will be in the form:

                        Partition number*Machine type-Model*Serial number

            2. From an SSH session on the HMC, create an empty file in /tmp with the vterm string as the filename. In this example:

               $ touch /tmp/004*7040-681*020153A

               (Note: The serial number in the filename is case sensitive. It must match the string from the vterm title bar exactly.)

            3. Close the vterm, then open it back up and proceed with troubleshooting (activate the LPAR, etc.).

            4. As soon as the LPAR outputs information to the vterm window, check the size of the file in /tmp on the HMC and verify that it has been written to.

            5. If it has not been written to, double check the name of the file against the title bar. If they match, try performing a Close Terminal Connection operation to force it closed, then open it again. This should cause it to start logging to the file.

            6. When logging is no longer needed, close the vterm. Send the output of the file in /tmp on the HMC to your remote Support Center. Next, remove the file from /tmp. If you fail to do so, it will log everything written to the vterm whenever a vterm window is opened, and you will inadvertently fill up /var on the HMC at some point in the future.

            (Note: In an SP environment, use s1term <frame#> <slot#> | tee <logfile> instead. )

            “This is documented in the “HMC Installation and Operations Guide: Appendix D:Using scripts to connect remotely.”
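
Because the log filename contains asterisks, it is worth quoting it whenever it is used in a shell so the characters are not treated as wildcards. The sketch below shows the create, check and remove cycle from the steps above. The function name vterm_log_path is made up for this example, and a scratch directory stands in for /tmp so the sketch can be tried in an ordinary shell; it has not been tested in the HMC's restricted shell.

```sh
# Hypothetical helper: build the log file path from the three parts of the
# vterm title-bar string (partition number, machine type-model, serial number).
# An optional fourth argument overrides the /tmp directory.
vterm_log_path() {
    printf '%s/%s*%s*%s\n' "${4:-/tmp}" "$1" "$2" "$3"
}

dir=$(mktemp -d)    # scratch directory standing in for /tmp on the HMC
log=$(vterm_log_path 004 7040-681 020153A "$dir")

touch "$log"        # quoted, so the asterisks stay literal

# A non-empty file means the vterm output is being captured.
if [ -s "$log" ]; then
    echo "logging to $log"
else
    echo "nothing written to $log yet"
fi

# Remove the file when finished so logging stops.
rm -f "$log"
rmdir "$dir"
```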

Do you use one of these methods to log console output when you are working on machines, or do you prefer another way?

More Twitter discussion from @robmcnelly:

@mr_nmon After RHEL6.4 install on Power, only took 5 mins as root to get #PowerVC running. Any good #AIX admin user can do it. Writing my hints now.

@sql_handle #IBMPower #AIX sql.sasquatch track – Things Fall Apart: Again my Turn to Win Some or Learn Some http://bit.ly/IhdimK

@LindaGrigoleit #powersystems Academic Initiative is going strong and growing http://bit.ly/1ffbE3r #ibmi #aix #powerlinux #ibmacademicinitiative

RT @cgibbo PowerHA & mountguard via @wmduszyk http://www.wmduszyk.com/?p=10643&langswitch_lang=en … #AIX #PowerHA

@cgibbo LPM and viosecure.

 https://www.ibm.com/developerworks/community/blogs/cgaix/entry/lpm_and_viosecure?lang=en … #AIX #PowerVM

@brian_smi Little Known Feature in #SMIT Install Software Screen on #AIX https://www.ibm.com/developerworks/community/blogs/brian/entry/little_known_feature_in_smit_install_software_screen …

@mr_nmon #PowerVC 1.2 hands-on demo video by Greg Hintermeister showing the new wave of #PowerSystems LPAR management http://www.youtube.com/watch?v=RFTbC6JW7YE&feature=em-uploademail 

Following Up on the Technical University Event

Edit: Some links no longer work. I do not see much paper at events these days.

Originally posted November 19, 2013 on AIXchange

This article got me thinking about my own experiences at the recent IBM Power Systems Technical University conference. I always say that if you only get to go to one IBM education event during the year, this is the one to attend. There’s always something new to learn, and if you find yourself in a session that isn’t at the right skill level for you (too easy/too hard/too whatever), you can easily get up and go to another room down the hall.

In my case I had the chance to spend some time during the conference with IBMer Chris Gibson. I’ve known Chris for years, but since he lives in Australia, this was our first chance to say hello in person. It was also a great opportunity to actually hear IBM “rock stars” like Nigel Griffiths (@mr_nmon on Twitter) and Jay Kruemcke (@chromeaix) during their sessions. Thursday night’s “meet the experts” (or “stump the chumps,” depending on who you ask) event is always a highlight, as attendees answer all sorts of questions from the audience in a freewheeling discussion where customers can get more insight into IBM’s future plans.

One interesting change this year was that IBM distributed an electronic version of the session planner. In years past, upon registering you were handed a booklet containing the whole week’s sessions at a glance. This year we got a URL and login info so we could do everything electronically. Android users could even install an application.

The electronic approach had pros and cons. One nice thing was that information about session changes (additions, drops, changes in venue, etc.) was sent immediately to attendees’ handheld devices. Providing session feedback from my phone was also a snap. Using the directory of attendees, feedback could also be shared with conference goers and conference organizers. In addition, the electronic version made it very easy to search for topics, presenters, etc.

Some downsides to this approach included limited bandwidth and spotty network coverage at the venue. I found cellular coverage to be lacking in some areas. I imagine though that most attendees had at least one phone and one tablet/laptop with them, so getting the wifi working fairly quickly wasn’t a huge issue. However, at times it took a while to load session data, and frankly, I just missed having my usual paper copy.

Apparently I wasn’t the only one, because later in the week, hard copies for each day’s sessions were made available at the registration desk. I like the paper schedule because I can look at the grid, circle the sessions I’m interested in, and know at a glance where I’m headed in the next hour. The paper schedule also offers the added benefit of making it easy to track down session slides that interest me. This year there was an index file to help us find specific sessions.

Obviously the trend toward eliminating paper is well-established, but it’s still good for some things. It will be interesting to see how IBM approaches this at future events.

Striking a Balance Between the Command Line and GUIs

Edit: Some links no longer work. Still a good discussion.

Originally posted November 12, 2013 on AIXchange

There’s been some recent discussion and debate about the benefits of using the command line (or green screen, in IBM i parlance) as opposed to GUIs.

Despite the wishes of some, the green screen is still with us. Despite the perceptions of others, the GUI is more than just a pretty interface.

In my case, I’ve been doing more work that involves connecting external storage with IBM i systems. What’s interesting to me is that many (though certainly not all) of the IBM i guys I’ve been dealing with strongly prefer the HMC GUI to the VIO server command line when they’re configuring virtual networking and virtual storage. As more and more can be done straight from the HMC GUI, these folks find that they no longer need to login as padmin. Then they question why they should even bother to learn the commands. Finally, if they never need to touch VIOS after installation, they wonder why they should bother even learning it when they can just call some AIX guy like me if they ever need help.

I can see the merit in this argument, and my counterargument that it’s good to know what’s going on under the covers is generally met with a look of, “I’m just going to call you anyway, so let’s talk about something else.”

This really got me thinking. Just what makes certain people inclined to favor the command line in IBM i over GUIs (and vice versa)? I’m sure familiarity has something to do with it. I saw my first AS/400 command line and first logged in as QSECOFR in the late 1980s. Today I can log on and do practically everything I could do then in the exact same way. If I’d spent my whole career working on servers from AS/400 up to Power Systems with IBM i, why would I want to learn anything else?

By forcing folks like this to learn VIOS, we’re taking them way out of their comfort zone into the crazy upside-down world of UNIX. This isn’t just about IBM i people, though. Other UNIX pros can be driven batty when they’re forced to use the Korn shell with set -o vi and stty erase ^? and oem_setup_env and all of the other quirky nonsense many of us take for granted because we use it every day.

While I’m neither shocked nor surprised when someone tells me that they prefer the HMC GUI, it does make me pause and consider. Do IBM i professionals REALLY prefer the command line? The biggest command line backers I know hate mice and will go to any length to avoid using them. Were this truly a command line vs. GUI argument, then IBM i guys would make the effort to learn how to log in as padmin just to get out of pointing and clicking on an HMC GUI.

Maybe the truth lies somewhere in the middle. Maybe the IBM i pros do understand that GUIs have their advantages in certain situations. In general though, they still prefer to do things the way they’ve always done them.

I’m the same way. I’d rather log in as padmin or root and get my work done on the command line. I find it faster, and I believe it gives me more control of and greater insight into what’s going on with the configuration. But another part of it is that I’ve always preferred logging into command lines when administering any machine. It’s what I’ve always done. The HMC GUI came along, not to mention IBM Systems Director and myriad other tools to manage machines, but I kept doing things the way I did them.

Is this because I’m resistant to change, or is it truly because I’m using the right tool for the job?

A Closer Look at PowerVC

Edit: Some links no longer work.

Originally posted November 5, 2013 on AIXchange

During the recent Enterprise2013 event, IBMers Glen Corneau and Bill Miller gave a nice presentation on PowerVC, the new virtualization management product.

We were also able to see a live demo. The understanding was that this wasn’t the GA code that will ship when the product is actually released, but what we saw represented the functionality that’s expected to be delivered.

I noticed the following in one of the slide decks: “with PowerVC you can register physical hosts, a storage subsystem, and network resources and use them to create a virtual environment. You can create, resize and attach volumes to virtual machines. You can monitor the utilization of the resources in your environment, and you can migrate virtual machines while they are running. You can capture a virtual machine that is configured the way you want it to be.”

PowerVC is designed to be simple to install and configure, while providing an intuitive user interface. What we saw in Orlando reminded me of the V7000 or XIV interfaces (perhaps you’re familiar with them).

PowerVC is built on OpenStack, which includes open APIs that are designed to provide flexibility and agility. There are two editions, Express and Standard. Express is meant for IVM-managed servers, while Standard is intended to be used on HMC-managed servers. POWER6, POWER7 and POWER7+ are all supported. On the OS side, AIX and PowerLinux are currently supported, while IBM says IBM i will be supported in a future PowerVC release.

PowerVC runs on top of a RHEL 6.4 OS image running on Power or x86 with a minimum of 8 GB of memory, two virtual uncapped CPUs with a minimum of one entitled CPU (two entitled CPUs are recommended) along with 40 GB of disk (or more if you plan on importing many .iso images). For now, the IBM SVC storage family (SVC, V7000, V3700, V3500) with V6.4 or later code must be used. PowerVC is currently limited to one managed storage subsystem.

The PowerVC server must be able to talk across the network to the storage, fabric and IVM/VIOS LPARs. This product does not install VIOS for you; it assumes that IVM/VIOS is already configured and installed.

With the Express edition, IVM 2.2.1.5 or later is required. You can run Virtual SCSI only, and your storage must be pre-zoned. This edition has a limit of five managed hosts and a maximum of 100 LPARs.

Standard edition requires HMC V7.7.8 or later running on CR5/C08 or later HMC models. You must be running VIOS V2.2.3 or later. It supports NPIV only, and Brocade switches only. (Note: Hopefully more vendors’ storage and network products will be supported in future releases, but we need to be sure to let those vendors know that they need to provide APIs to OpenStack.) Ten managed hosts and 40 LPARs per host are allowed, for a maximum of 400 LPARs.
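If you want a quick sanity check of a planned environment against those limits, something like the following sketch might do. The function name powervc_fits and its arguments are made up for illustration; the numbers are simply the ones quoted above (Express: five hosts and 100 LPARs; Standard: 10 hosts and 40 LPARs per host), and they may well change in later releases.

```shell
#!/bin/sh
# Hypothetical helper: does a planned environment fit the PowerVC limits
# quoted in this post? Not an IBM tool, just arithmetic.
# Usage: powervc_fits <express|standard> <hosts> <total_lpars>
powervc_fits() {
    edition=$1 hosts=$2 lpars=$3
    case $edition in
        express)  max_hosts=5;  max_lpars=100 ;;
        standard) max_hosts=10; max_lpars=$((hosts * 40)) ;;  # 40 LPARs per host
        *) echo "unknown edition: $edition"; return 2 ;;
    esac
    if [ "$hosts" -le "$max_hosts" ] && [ "$lpars" -le "$max_lpars" ]; then
        echo "ok"
    else
        echo "over limit"
        return 1
    fi
}

# Ten hosts with 400 LPARs is the Standard edition maximum described above.
powervc_fits standard 10 400
```

Note that the Standard check only compares the total against hosts times 40; it does not know how the LPARs are actually spread across the hosts.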

The charts from the presentation in Orlando also detail the differences between VMControl V2.4.3 and PowerVC V1.2.

            VMControl:

            _ Supports AIX, Linux and IBM i

            _ Supports IVM and HMC

            _ Suspend/Resume workloads

            _ Remote Restart workloads

            _ LPM to host or pool

            _ Virtual SCSI and NPIV (with appropriate storage+SAN)

            _ VIOS Shared Storage Pool support

            _ Requires IBM Systems Director

            _ Supports NIM-based, SCS-based and SSP capture/deploy

            _ Supports IBM DS8000-family, XIV, SVC-family, DS storage

            _ Limited third party disk support

            Use VMControl if you are looking for the following capabilities across multiple platforms:

            • Cross-platform management, navigation and look and feel

            • Management of multi-workload system pools, Virtual Image Versioning management

            • System Pool creation and Manage workload availability end to end

            • Supports NIM, SCS and Shared Storage Pool deployment environments

            • Supports NPIV and VSCSI environments for XIV, SVC, V7000, and DS8000

            • Requires IBM Systems Director as a base

            PowerVC

            _ Supports AIX, Linux

            _ IBM i is a statement of direction

            _ Supports IVM and HMC

            _ LPM to host

            _ Virtual SCSI and NPIV (with appropriate storage+SAN)

            _ Built on OpenStack, no IBM Systems Director dependency

            _ Supports SCS-type image capture/deploy

            _ Supports ISO images

            _ Supports SVC-family storage

            _ Modify resources during deploy

            Use PowerVC if you want to manage Virtual Machines on Power running Linux or AIX:

            • PowerVC is advanced virtualization management for Power Systems

            • Fast time to value and quick integration with SmartCloud bundle

            • PowerVC initial offering supports NPIV, V7000 or SVC and Brocade switches

            • Virtual Machine Image management, deployment, relocation, capture and creation

            • Create and manage virtual machines, automate workload and resource provisioning

            • Offered standalone or as part of SmartCloud bundle and AIX Enterprise Edition

Learn more about PowerVC:

            PowerVC on the Web

            PowerVC on Service Management Connect

            PowerVC Prototype Demo

On another note, I saw this on a mailing list:

            Flex System Manager v1.3.1: for Android and iOS1.3.1
            This release went live Friday, October 12, 2013.

Android: https://play.google.com/store/apps/details?id=com.ibm.msm.android

iOS: https://itunes.apple.com/us/app/ibm-flex-system-manager-for/id576901013?ls=1&mt=8

And finally, more Twitter discussion from @robmcnelly:

@chromeaix 9h #Oracle documented their policies for software licensing See http://www.oracle.com/us/corporate/pricing/software-investment-guide/index  …

RT @cgibbo RT @mymindspace: “Best Practice” recommendation for #AIX Virtual Memory Manager settings for #DB2: http://www-01.ibm.com/support/docview.wss?uid=swg21328602 …

RT @chromeaix IBM Software PVU value for any Power Systems core running Linux is now set to 70 #powerSystems #linux http://ow.ly/qnJD9

@IBMRedbooks 22h IBM Power Systems – it is all made by one company! Read our new blog post here: http://ibm.co/16NxOSd  #PowerSystems

RT @BreakingNews FAA: Airlines can safely expand passenger use of portable electronic devices during all phases of flight – @NBCNews

@IBMPowereSupp 30 Oct This is what we were showing at #Enterprise2013 RT @IBMPowereSupp: Support Portal is out and it’s not scary at all! http://support.ibm.com

@attritionorg 29 Oct Telnet of the Day: 107.21.219.86

@IBMRedbooks 29 Oct William Lowe, the ‘father of the IBM PC,’ dies at 72: http://cnet.co/1irimjS

@UnixToolTip 29 Oct “Pointing and clicking does not scale.”

Getting Started with CoD

Edit: Some links no longer work. Luckily CoD activation is much easier these days.

Originally posted October 29, 2013 on AIXchange

Hopefully you read Charlie Cler’s article in the September issue of IBM Systems Magazine about the various Capacity on Demand (CoD) options.

Here’s a CoD question I often get: How do you activate it?

The first step is to get the necessary activation codes from IBM — and yes, you’ll likely be getting multiple codes. Recently a customer I was working with got three codes: one to activate more memory, one to activate more processors and one to enable PowerVM Enterprise edition, VIO servers, micropartitioning, etc., on the newly activated processors.

IBM ships the codes on paper, so the hardest part of the whole exercise is making sure the 34 characters of the activation codes are correctly entered on your HMC. The document you receive from IBM displays the system type, the serial number, the anchor card CCIN, the anchor card serial number and the anchor card unique identifier. You’ll also see how many previously activated resources you had and how many you’re activating. This can be a nice sanity check to ensure that the order went through as expected.
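Since typing the code correctly is the error-prone part, a tiny check before you paste it into the HMC can save a retry. This is only a sketch: the function name check_cod_code is hypothetical, and the only fact it relies on is the 34-character length mentioned above. Stripping spaces first is my own assumption, for people who type the code in groups to make it easier to read back.

```shell
#!/bin/sh
# Hypothetical helper: confirm a typed activation code is 34 characters long
# once any spaces or tabs used for readability are removed.
check_cod_code() {
    code=$(printf '%s' "$1" | tr -d ' \t')
    len=${#code}
    if [ "$len" -eq 34 ]; then
        echo "length ok"
    else
        echo "expected 34 characters, got $len"
        return 1
    fi
}

# A made-up 34-character string, not a real activation code.
check_cod_code "12345678 90123456 78901234 5678901234"
```

It checks the length only. It cannot tell you whether the code itself is valid for your system; the HMC does that when you enter it.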

I’ve had customers mistakenly believe they could activate resources that don’t yet exist in the machine. However, this isn’t magic; the hardware must first be installed. In many cases the needed resources are installed with new customer systems, but not immediately activated.

Once you have your activation code(s), go to your HMC, select the server you want to work on and then select the Capacity on Demand task. There’s a spot on this menu where you can view the history log — which displays all of the dates and times of the various resources you’ve activated — as well as a place to enter your new CoD code.

There are also places where you can view capacity settings for your resources — including, for example, inactive CoD memory, permanently activated memory, temporarily activated memory and installed memory.

The same menus are available for processors. You’ll also find options for Enterprise Enablement. Further down the screen is an area for PowerVM and Other Advanced Functions.

Many customers expect their environments to grow and can logically assume that they’ll eventually need new resources, but in these cases it can be difficult to pinpoint when the need will require action. CoD can be a great way to prepare for the unknown.

Note: We just finished up the IBM Power Systems Technical University at Enterprise2013 event. I attended tons of great sessions and met and talked with many IBM presenters as well as a number of readers of this blog. Incidentally, the 2014 Technical University will be at the Venetian in Las Vegas on Oct. 6-10, so add it to your calendar now and start making plans to attend. I hope to see you there.

Finally, a few highlights from Twitter. Follow @robmcnelly, and check out #ibmenterprise for tweets related to the conference.

RT @ElReg They’ve taken my storage hostage … now what?: How one user device nearly brought down the business. http://bit.ly/19Ac9jG

RT @cgibbo RT @IBMPowerSystems: #PowerSystems tip of the day: seastat search option http://ibm.co/16vH3pS

RT @chromeaix #powersystems #AIX IBM PowerHA SystemMirror rapid deploy cluster worksheets for IBM AIX http://ow.ly/2AXYeJ

RT @cgibbo RT @IBM_FLRT: Check out new #FLRT Lite https://www14.software.ibm.com/support/customercare/flrt/liteHome… Quick and easy recommendations at your fingertips!

RT @scalzi Old School (via @Reddit): http://i.imgur.com/g0Zgf6q.jpg

The Power of the HMC Command Line

Edit: Some links no longer work.

Originally posted October 22, 2013 on AIXchange

When using the HMC, do you do more with the GUI or on the command line? The more systems you’re managing and the more operations you’re doing, the more you’ll benefit by getting comfortable with the HMC command line.

While I like new commands such as lsnportlogin and chnportlogin, the HMC command line itself isn’t new. For instance, this article from 2008 has some handy tricks. And to give you an idea of the wealth of useful information here, I’ll include the list of contents:

            HMC Management

                HMC Version

                Network configuration of the HMC

                Reboot the HMC

                How to change the HMC password (of user hscroot)

                Show Available Filesystem Space

            LPAR Management: Status Information

                LPAR Status

                Show Status and LED/LCD Display of an LPAR

                Show Status and LED/LCD Display of a Systems Running in FullPartitionMode

                Overview LPAR IDs

                Overview Connection State

                Show a List of all I/O Adapters

                Overview DLPAR status

            LPAR Management: Operations

                Soft Reset of an LPAR

                Soft Reset of a Systems Running in FullPartitionMode

                Hard Reset of an LPAR

                Hard Reset of a Systems Running in FullPartitionMode

                Virtual Console

                Activation of an LPAR

                How to boot an LPAR into SMS Menu

                How to Power on a System Running in FullPartitionMode

                Bring the key switch to position NORMAL

            LPAR Configuration

                Change an LPAR’s Name

                Rename a Managed System

                DLPAR: Increase the Number of Processing Units of an LPAR

            Operations in a virtualized environment

                Make virtual WWPNs visible to the SAN

                Show all virtual WWPNs assigned to an LPAR

                Logout virtual WWPNs from the SAN

Here are just a few things you can do from the HMC command line:

* Would you like to see all of the managed systems that are connected to your HMC? Run:

            lssyscfg -r sys -F name

* Perhaps you need to know which LPARs are on your machine and whether or not they’re running:

            lssyscfg -m Server1 -r lpar -F name:state

* This handy command lists every machine connected to your HMC, and tells you whether or not the LPARs on these devices are running:

            for m in $(lssyscfg -r sys -F name); do echo $m ; lssyscfg -r lpar -m $m -F name:state ; done

* Maybe you want to know the machine name, along with the IP address the service processor is using, and whether or not it’s connected to the HMC:

            lssysconn -r all -F type_model_serial_num:ipaddr:state | sort

* Maybe you want to see which I/O devices are assigned to which LPARs:

            lshwres -r io -m Server1 --rsubtype slot -F lpar_name:drc_name:description

* Or perhaps you want to see the profile information for your LPAR 1:

            lssyscfg -r prof -m Server1 --filter "lpar_ids=1"

* Another way I like to use lssyscfg is to determine all of the WWPNs associated with your LPAR:

            lssyscfg -r prof -m Server1 -F virtual_fc_adapters --filter lpar_names=lpar1

This command would provide this output:

"""2405/client/3/vios2/2405/c0507606b5ef0012,c0507606b5ef0013/0"",""1605/client/2/vios1/1605/c0507606b5ef0010,c0507606b5ef0011/0"",""2605/client/3/vios2/2605/c0507606b5ef0014,c0507606b5ef0015/0"",""1405/client/2/vios1/1405/c0507606b5ef0016,c0507606b5ef0017/0"""

* With this command, you can easily see what the adapter numbers are and which VIO server they’re connected to. Obviously you could change what you’re filtering on; in this case we’re just looking it up via LPAR ID number rather than the LPAR NAME:

            lssyscfg -r prof -m Server1 -F virtual_fc_adapters --filter lpar_ids=8

* Maybe you want to list every WWPN for every LPAR on your machine with its default profile:

            lsnportlogin -m Server1 --filter "profile_names=default"

* Or maybe you really just want the WWPNs without other information included:

            lsnportlogin -m Server1 --filter lpar_names=lpar1 | cut -c 68-88

            wwpn=c0507602c5340034

            wwpn=c0507602c5340035

            wwpn=c0507602c5340042

            wwpn=c0507602c5340043

            wwpn=c0507602c5340044

            wwpn=c0507602c5340045

            wwpn=c0507602c5340030

            wwpn=c0507602c5340031

* Maybe you want to list out the LPAR names with the WWPNs:

            lssyscfg -r prof -m Server1 --filter lpar_names=lpar1 -F lpar_name,virtual_fc_adapters

* Or you could check every frame connected to your HMC with something like this:

            lssyscfg -r sys -F name |
            while read M; do lshwres -r virtualio --rsubtype fc --level lpar -m $M -F lpar_name,wwpns |
            sed 's/^/'$M,'/'
            done

* This loop logs in the virtual fibre channel adapters of all of the LPARs on a frame:

            for i in `lssyscfg -m Server1 -r lpar -F name`; do echo $i;chnportlogin -o login -m Server1 -p $i ; done

There’s much more of course, but this should give you an idea of the power of the HMC command line.
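The virtual_fc_adapters string shown earlier is hard to read at a glance, so here is a minimal sketch that splits it into one line per adapter. It is meant to run on a workstation against output you’ve captured from the HMC (with plain double quotes, as the HMC prints it), not in the HMC’s restricted shell. The function name fc_adapters is my own, and it assumes each adapter has the seven slash-separated fields seen in that output, with the remote partition being a VIO server.

```shell
#!/bin/sh
# Split a virtual_fc_adapters string into one line per adapter.
# Assumed field layout, taken from the sample output in this post:
#   slot/type/remote_lpar_id/remote_lpar_name/remote_slot/wwpn1,wwpn2/required
fc_adapters() {
    # Adapters are separated by "","" so turn that into a newline first,
    # then drop the remaining quotes.
    awk '{ gsub(/"",""/, "\n"); gsub(/"/, ""); print }' |
    awk -F/ 'NF == 7 { split($6, w, ","); print "slot=" $1, "vios=" $4, "wwpn1=" w[1], "wwpn2=" w[2] }'
}

# First two adapters from the sample output above.
sample='"""2405/client/3/vios2/2405/c0507606b5ef0012,c0507606b5ef0013/0"",""1605/client/2/vios1/1605/c0507606b5ef0010,c0507606b5ef0011/0"""'
printf '%s\n' "$sample" | fc_adapters
```

Splitting on the quoted separator rather than on every comma matters here, because the two WWPNs of each adapter are themselves separated by a comma.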

Finally, some interesting links this week courtesy of those I follow on Twitter:

@cgibbo Dynamic Platform Optimizer with Tracy Smith. October 31, 2013. Register now. https://www1.gotomeeting.com/register/214938672 … https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/AIX+Virtual+User+Group+-+USA … #AIX

RT @UnixToolTip RT @jpmens: Unix Recovery Legend http://www.ee.ryerson.ca/~elf/hack/recovery  …

@mr_nmon IDC Whitepaper on security of #PowerVM Virtualisation http://public.dhe.ibm.com/common/ssi/ecm/en/pol03175usen/POL03175USEN.PDF … Double standards: POWER=mission critical & x86 anything goes!

@ibmperformance Oracle’s hardware business may be worse than we thought http://gigaom.com/2013/10/14/oracles-hardware-business-may-be-worse-than-we-thought/ … via @gigaom

@chromeaix #powersystems #AIX Using the NIM service handler with the NIM Alternate Disk Migration tool http://ow.ly/2ADpjg

Three Scripts

Edit: Some links no longer work.

Originally posted October 15, 2013 on AIXchange

I’ve seen a few good scripts lately. Brian Smith mentions a couple of them in his blog. This one will show you if your HBA/hdisk settings are actually in effect.

            “There are several storage related settings in AIX that cannot be changed if the device is active. These include “fast_fail,” dynamic tracking (dyntrk), and the “num_cmd_elems” for HBAs and the Queue Depth for hdisks.

            “Your options to set these are either make the device inactive (usually by taking redundant paths offline) and then make the change, or to use the “-P” flag on chdev and then reboot the server to make the change effective at the next boot.

            “The “-P” option on chdev has one major drawback however. As soon as you make the change with chdev “-P”, it appears that the setting is active right away even before the reboot. If you check with “lsattr”, it will appear as if the setting has taken effect. However it actually won’t take effect until the next reboot. What has essentially taken place is that the running configuration is out of sync with the ODM. The ODM reflects the updated settings, however they can’t be changed in the running configuration of the AIX kernel until the next reboot.”

Brian’s other script shows the location of every physical partition on each hdisk:

            “… The output shows which Logical Volume (LV) is on each of the PPs (or if it is free space). The output is color coded so each LV has its own color so that it is very easy to see where each LV physically is across the entire Volume Group. You can specify the number of columns of output depending on the size of your screen.

            “The intended use of the script is to show a visual representation of the Volume Group to make using commands which move around LPs/PPs such as migratelp easier to use, to make LVM/disk maintenance easier, and also as a learning tool.”

Finally, IBMer Dean Rowswell sent me the following script. He explains, “Lately I’ve been working with customers who are still using virtual SCSI, so I updated my old script with some new information. Maybe others will use this, too. It helped me to quickly and easily set the MPIO path priorities to balance I/O across the VIOS.”

It runs on an AIX LPAR. Here’s some sample output:

root@nim:/:# get_vdisk_path_priority
---------------------
Virtual SCSI adapters
---------------------
U9117.MMA.06XXXXX-V6-C41  Virtual I/O Slot  vscsi1
U9117.MMA.06XXXXX-V6-C42  Virtual I/O Slot  vscsi2

Attributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: vscsi1 -> fast_fail 30
Attributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: vscsi2 -> fast_fail 30

VDISK: hdisk0     ADAPTER: vscsi2  (MPIO priority: 2)   ADAPTER: vscsi1  (MPIO priority: 1)     VG: vgNIM1
VDISK: hdisk2     ADAPTER: vscsi1  (MPIO priority: 1)   ADAPTER: vscsi2  (MPIO priority: 2)     VG: rootvg
VDISK: hdisk3     ADAPTER: vscsi1  (MPIO priority: 1)   ADAPTER: vscsi2  (MPIO priority: 2)     VG: vgNIM2
VDISK: hdisk7     ADAPTER: vscsi2  (MPIO priority: 2)   ADAPTER: vscsi1  (MPIO priority: 1)     VG: None

Now for Dean’s script: 

#!/bin/ksh
# Created by Dean Rowswell, November 8, 2011
# Modified by Dean Rowswell, September 5, 2013
#       Combine all paths into a single line with the hdisk
# Modified by Dean Rowswell, October 2, 2013
#       Add the volume group info for each hdisk
#       Display the vscsi adapter attributes
# This script will display each virtual scsi disk path priority info

VDISKS=`lsdev -Cc disk -Sa -s vscsi -F name`
if [ ${#VDISKS} -eq 0 ]
then
        echo "There are no Virtual SCSI disks on this system"
        exit 0
else
        DATE=`date +'%Y%m%d_%H%M%S'`
        echo "---------------------"
        echo "Virtual SCSI adapters"
        echo "---------------------"
        lsslot -c slot|grep vscsi
        for VSCSI in `lsdev -Ccadapter -Sa -F name|grep vscsi`
        do
                echo "\nAttributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: ${VSCSI} -> \c"
                lsattr -El ${VSCSI} -a vscsi_err_recov,vscsi_path_to -F value | tr '\n' ' '
        done
        echo
        lspv >/tmp/lspv.${DATE}
        for VDISK in ${VDISKS}
        do
                echo "\nVDISK: ${VDISK}\t\c"
                LSPATHS=`lspath -F 'parent:connection' -l ${VDISK}`
                for LSPATH in ${LSPATHS}
                do
                        PARENT=`echo ${LSPATH} | awk -F: '{print $1}'`
                        CONN=`echo ${LSPATH} | awk -F: '{print $2}'`
                        echo "  ADAPTER: ${PARENT}  (MPIO priority: `lspath -AE -l ${VDISK} -p ${PARENT} -w ${CONN}|awk '{print $2}'`) \c"
                done
                VG=`grep -w ${VDISK} /tmp/lspv.${DATE} | awk '{print $3}'`
                echo "\tVG: ${VG}\c"
        done
fi
rm /tmp/lspv.${DATE}
echo
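Dean’s script reports the priorities; it doesn’t change them. Changing a path priority on AIX is done with chpath, so a sketch like the one below could help you prepare the changes. To be clear, this is not part of Dean’s script. The function name balance_paths and the odd/even scheme are my own assumptions about how you might spread the load, and the function only prints the chpath commands so you can review them before running anything.

```shell
#!/bin/sh
# Print (do not run) chpath commands that alternate the preferred vscsi adapter.
# Even-numbered hdisks prefer the first adapter, odd-numbered the second.
# Priority 1 is the preferred path.
# Usage: balance_paths <adapter1> <adapter2> <hdisk>...
balance_paths() {
    a1=$1 a2=$2
    shift 2
    for disk in "$@"; do
        num=${disk#hdisk}                 # hdisk7 -> 7
        if [ $((num % 2)) -eq 0 ]; then
            first=$a1 second=$a2
        else
            first=$a2 second=$a1
        fi
        echo "chpath -l $disk -p $first -a priority=1"
        echo "chpath -l $disk -p $second -a priority=2"
    done
}

# Adapter and disk names borrowed from the sample output above.
balance_paths vscsi1 vscsi2 hdisk0 hdisk3
```

Odd/even by disk number is a crude split. If a few disks carry most of the I/O, you would probably want to assign those by hand instead.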

More conversation from Twitter (@robmcnelly):

@ROOTvgNET ROOTVG – AIX & POWER Portal – Stream AIX AUDIT into SYSLOG | Do more with AIX Audit and Syslog! http://www.rootvg.net/content/view/575/1/ … via @ROOTvgNET

@chromeaix #powersystems #AIX Using the NIM service handler with the NIM Alternate Disk Migration tool http://ow.ly/2ADpjg

@cgibbo Creating a System Copy WPAR. https://www.ibm.com/developerworks/community/blogs/cgaix/entry/lpar_to_wpar_migration?lang=en … #AIX #WPARS

@ibmvlp Download the Quick Reference mobile app for IBM Power Systems http://www-03.ibm.com/systems/power/resources/mobileapp/index.html …

@power_gaz >100 on “Tricks of the Power Masters” webinar http://tinyurl.com/PowerSystemsTechnicalWebinars … including @cgibbo and @chmod666 Thanks #ibmpowersystems #aix #power7

@UnixToolTip One advantage of visually dull environments like the command line or an editor is that there isn’t much to do there but work.

@mr_nmon 9 Oct Ten Things POWER & AIX Techies need to know from the IBM Announcements 8th Oct 2013 See my #AIXpert blog https://www.ibm.com/developerworks/community/blogs/aixpert/entry/ten_things_power_techies_need_to_know_from_the_ibm_announcements_8th_oct_2013 …

@cgibbo: “@brian_smi new -T flag for mksysb command.” This does a JFS2 snapshot & backs that up for a time consistent fileset.

@cgibbo 7 Oct Simplified Shared Ethernet Adapter Failover config. Removes requirement for ctrl channel for SEA FO. #PowerVM #AIX http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=760&letternum=ENUSJP13-0509 …

@cgibbo 7 Oct New lsattr -P option to display attributes that may not yet be in effect on running system. AIX 7.1 TL3 & 6.1 TL9 #AIX http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=760&letternum=ENUSJP13-0508 …

@zootman HMC v 7.8 – Tracking of DLPAR activity within the current profile enables reactivation of a LPAR with all changes intact. YAY!

PowerVP, PowerVC Highlight Today’s IBM Announcements

Edit: Some links no longer work.

Originally posted October 8, 2013 on AIXchange

Today’s announcements have quite a bit of interesting information, but two new products that I’m especially excited about are PowerVP and PowerVC.

PowerVP uses a graphical Java client to monitor virtual workloads, which of course can be much more complex than the workloads we once managed on standalone systems. With virtualized systems, performance issues can come from physical hardware, the VIO server, the client LPAR or something outside of the frame entirely.

Once you install an agent on the supported version of your operating system and update your POWER7 or POWER7+ systems to the proper level of firmware, you’ll have the option of viewing system performance in real time or in DVR mode. The latter allows you to read log files to determine how your system was performing earlier. Data is collected on the Windows, AIX or Linux workstation on which the PowerVP monitor runs.

PowerVP allows you to drill down and view specific memory DIMM or CPU usage. You can also see the hardware adapters you’re using and how heavily these components are being utilized. In short, PowerVP provides an overall view of your hardware so you can see how your machines are consuming resources. GA is expected on Nov. 15.

PowerVC is advanced virtualization management. I think of it as a simplified version of the VMControl plugin for Systems Director. This product is based on OpenStack, an open source solution that provides cloud infrastructure capability. IBM wanted to make PowerVC simple to install, simple to configure and simple to use. On a recent briefing call, IBM said it brought customers to its labs and recorded them using the product. These sessions indicated that PowerVC requires minimal training and is very intuitive and self-explanatory.

This video shows a prototype of the code from June 2013. Additional information can be found here and here. PowerVC GA is expected on Nov. 22.

Other announcement highlights include:

* The Power Integrated Facility for Linux — Power IFL is a new solution that allows customers to activate unused cores and memory to run Linux workloads on IBM Power hardware at a very competitive price point. In practice, customers will run these Linux cores in virtual shared processor pools that are separate from their existing AIX or IBM i shared processor pools.

* PowerSC is updated with Linux compliance automation and improvements in the trusted firewall.

* The new PowerVM version features shared storage pool enhancements and improved live partition mobility performance. In addition, new information in the VIOS Performance Advisor tool is going to be available to cover fibre channel, shared Ethernet adapters and shared storage pools.

* New Power Enterprise pools are designed to allow for flexibility for IBM clients. Customers will be able to purchase virtual processors and memory (CUoD resources) that may be shared within a defined pool of enterprise-class Power servers. Applications may be reallocated within that pool of servers as needed with live partition mobility.

* AIX 7.1 TL3 and 6.1 TL9 feature enhanced live backup support and provide better LDAP support for users and groups.

As you learn more about these announcements, what stands out to you?

Finally, more highlights from Twitter (@robmcnelly):

RT @cgibbo RT @Fed67j: A new #hmcScanner version is available with graph: http://ibm.co/1489RC8

@mr_nmon Demo’s are so 1990s! I captured everything I know on #SystemsDirector in YouTube videos=4 hours/14 parts: https://www.ibm.com/developerworks/community/blogs/aixpert/entry/systems_director_6_3_demonstrations?lang=en … #POWER7

RT @cgibbo RT @mohakevin: #AIX – LVM #cheatsheet http://adri.ws/jdwq4

RT @brian_smi Script to show if your #AIX HBA / hdisk settings are actually in effect https://www.ibm.com/developerworks/community/blogs/brian/entry/compare_hba_settings …

@sandy_carter Wow! Blogs are 63% more likely to influence purchase decisions than magazines. (Source: Optimind) #socbiz @PamMktgNut #ibmsocialbiz

@IBMRedbooks Watch our new IBM Power 710 and 730 Technical Overview and Introduction video here: http://youtu.be/Jq565b_2fks  #PowerSystems

@AIXmag VIDEO: @robmcnelly demonstrates how to create sysplans from the HMC & run the system-planning tool on your PC. #AIX

RT @Greater_IBM 5 Ways To Become An #IBM Champion (Oct 15 Deadline): http://wp.me/p2kcos-2wm #ibmchampion #developerworks #ITLeader

Logging a Client Virtual Fibre Channel Adapter into a SAN

Edit: Some links no longer work.

Originally posted October 1, 2013 on AIXchange

Recently a customer was presenting some LUNs to some NPIV clients on a server. There were many LUNs and many clients, and the SAN guys wanted all of them to appear on the switch so they could begin zoning them.

I remembered reading this Chris Gibson article about the chnportlogin and lsnportlogin commands that you can run on the HMC command line:

            “There are two new HMC (V7.7.3.0) commands that can force a client Virtual Fibre Channel adapter to log into a SAN. This should make the life of the AIX and SAN administrator easier, as they will no longer need to install AIX in order for the new VFC adapters to log into the SAN. Although there was an unsupported method* for doing this already (see links below). Nor will the SAN admins need to “blind” zone the WWPNs.”

For instance, we could run…

            chnportlogin -m Server1 -o login --id 84

            lsnportlogin -m Server1 --filter lpar_ids=84

… and see something like the code in this Word document:

This output was also helpful in that it provided the wwpn information for each client.

One of the LPARs had issues, so we were able to log out, log back in and then verify the information with these commands:

            chnportlogin -m Server1 -o logout --id 72

            chnportlogin -m Server1 -o login --id 72

            lsnportlogin -m Server1 --filter lpar_ids=72
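
With many clients, the login and verification steps above can be wrapped in a small loop. What follows is only a sketch, assuming it is run from a workstation that can reach the HMC over SSH. The function name npiv_login and the RUN variable are made up for illustration and do not come from Chris's article. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: log in the virtual FC client adapters for several partition IDs,
# then list the WWPN login status for each one.
# RUN is a hypothetical command prefix; it defaults to "echo" (dry run).
RUN=${RUN:-echo}

npiv_login() {
    system=$1          # managed system name, e.g. Server1
    shift              # remaining arguments are partition IDs
    for id in "$@"; do
        $RUN chnportlogin -m "$system" -o login --id "$id"
        $RUN lsnportlogin -m "$system" --filter "lpar_ids=$id"
    done
}

npiv_login Server1 72 84
```

Setting RUN to something like "ssh hscroot@hmc1" (a hypothetical user and host) before running the script would send each command to the HMC instead of printing it.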

For usage information you can just run the commands on the HMC command line:

            chnportlogin

            Usage: chnportlogin -o login | logout
                                -m <managed system>
                                -p <partition name> | --id <partition ID>
                                [-n <profile name>]
                                [-w <wait time>]
                                [-d <detail level>]
                                [-v]
                                [--help]

This performs N_Port login and logout operations for virtual fibre channel client adapters that are configured in a partition or a partition profile:

    -o                      - the operation to perform:
                                login  - log in the virtual fibre channel client adapters
                                logout - log out the virtual fibre channel client adapters
    -m <managed system>     - the managed system’s name
    -p <partition name>     - the name of the partition for which the operation is to be performed
    --id <partition ID>     - the ID of the partition for which the operation is to be performed
    -n <profile name>       - the name of the profile for which the operation is to be performed
    -w <wait time>          - the maximum time, in minutes, to wait for VIOS commands issued by the management console to complete
    -d <detail level>       - the level of detail to request from VIOS commands issued by the management console - values range from 0 (none) to 5 (highest)
    -v                      - enables verbose mode
    --help                  - prints this help

            lsnportlogin

            Usage: lsnportlogin -m <managed system>
                                [-w <wait time>]
                                [-d <detail level>]
                                --filter "<filter data>"
                                [-F [<attribute names>]]
                                [--header]
                                [--help]

This lists WWPN login status for virtual fibre channel client adapters.

    -m <managed system>       - the managed system’s name
    -w <wait time>            - the maximum time, in minutes, to wait for VIOS commands issued by the management console to complete
    -d <detail level>         - the level of detail to request from VIOS commands issued by the management console - values range from 0 (none) to 5 (highest)
    --filter "<filter data>"  - filters the WWPNs to be listed. The syntax is:
                                  "filter_name1=value,filter_name2=value,…"
                                or
                                  ""filter_name1=value1,value2,…",…"
                                Valid filter names are:
                                  lpar_ids, lpar_names, profile_names
    -F [<attribute names>]    - delimiter separated list of the names of the attributes to be listed. If no attribute names are specified, then all attributes will be listed.
    --header                  - prints a header of attribute names when -F is also specified
    --help                    - prints this help

Chris’s article cites this additional information:

            * lsnportlogin

            * chnportlogin

            * How to force a vfc-client device to log in to the SAN (The OLD way)!

            * How to Capture SAN Boot Debug for Virtual I/O Server and AIX on P6 Systems

            * Disk path design for AIX including SAN zoning     

Be sure to read the article, and give the instructions a try the next time you’re setting up NPIV.

More highlights from Twitter (@robmcnelly):

RT @cgibbo RT @nixysug: Got sendmail ipv6 errors in syslog?  http://nixys.fr/blog/?p=1260

@ibmperformance 25 Sep Guns and Butter at OpenWorld http://wp.me/p1lgsI-bm

@mr_nmonn 25 Sep Oct 1st #IBM opens 4th #LinuxOnPOWER Centre in Montpellier, France, to cover Europe for briefings, architecture design & HW for users & ISVs.

RT @ElReg Ellison ditches cloud keynote for billionaires’ boat race: Mass exodus after King Database snubs attendees http://bit.ly/15qRHOi

RT @chmod666: Want to reset a lost hscroot password. Add init=/bin/rcpwsh on kernel line in grub at hmc boot. #HMC

@brian_smi 23 Sep Update: Visualize the Physical Layout of an #AIX Volume Group https://www.ibm.com/developerworks/community/blogs/brian/entry/update_visualize_the_physical_layout_of_an_aix_volume_group …

RT @mr_nmon IBM #PowerSystems Announcements Oct 8th Completely new products, large new features & upgrades. Register http://bit.ly/SCwebcastUK

Restricting FTP Access

Edit: Some links no longer work.

Originally posted September 24, 2013 on AIXchange

A customer was trying to restrict user access to a particular directory on an AIX system when FTP was used. We came across two good options. First, I recalled this exchange on Twitter:

            sungokcho: RT @ibmaix: #AIX #tip to restrict ftp user to a given directory use /etc/ftpaccess.ctl. It is useful if the user connects via winscp (via @JuanMDia35)

This 2009 post covers the same thing. And here’s some detailed information:

            ftpaccess.ctl File

            The /etc/ftpaccess.ctl file is searched for lines that start with allow:, deny:, readonly:, writeonly:, readwrite:, useronly:, grouponly:, herald: and/or motd:. Other lines are ignored. If the file doesn’t exist, then ftp access is allowed for all hosts. The allow: and deny: lines are for restricting host access. The readonly:, writeonly: and readwrite: lines are for restricting ftp reads (get) and writes (put). The useronly: and grouponly: lines are for defining anonymous users. The herald: and motd: lines are for multiline messages before and after login.

The syntax for all lines in /etc/ftpaccess.ctl is in the form:

            keyword: value, value, …

            where you can specify one or more values for every keyword. You can have multiple lines with the same keyword. The lines in /etc/ftpaccess.ctl are limited to 1024 characters, anything more than 1024 characters will be ignored.

            The syntax for the allow: and deny: lines are:

            allow: host, host, …

            deny: host, host, …

           If an allow: line is specified, then only the hosts listed in all the allow: lines are allowed ftp access. All other hosts will be refused ftp access. If there is no allow: line, then all hosts will be given ftp access except those hosts specified in the deny: line(s). The host can be specified as either a hostname or IP address.

            The syntax for the readonly:, writeonly: and readwrite: lines is:

            readonly: dirname, dirname, …

               writeonly: dirname, dirname, …

            readwrite: dirname, dirname, …

            The readonly: lines list the read-only directories and the writeonly: lines list the write-only directories. Read access is denied in a write-only directory and write access is denied in a read-only directory. All other directories are granted access except when a readwrite: line is specified. If a readwrite: line is specified, only directories listed in the readwrite: line and/or listed in the readonly: line are granted access for reading, AND only directories listed in the readwrite: line and/or listed in the writeonly: line are granted access for writing. Also, these lines can have a value of “ALL” or “NONE”.

            The syntax for the useronly:, puseronly:, grouponly:, and pgrouponly: lines is:

               useronly: username, username, …

            puseronly: username, username, …

            grouponly: groupname, groupname, …

            pgrouponly: groupname, groupname, …
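
To make the syntax above concrete, here is a minimal sketch that writes a sample file to a scratch location for review. The allow: and readwrite: keywords come from the documentation quoted above. The host name, IP address and directory are placeholders chosen for illustration, and nothing in the script touches /etc.

```shell
#!/bin/sh
# Sketch: write a sample ftpaccess.ctl to a scratch file for review.
# The host name, address and directory below are illustrative assumptions.
sample=${TMPDIR:-/tmp}/ftpaccess.ctl.sample

cat > "$sample" <<'EOF'
allow: ftpclient1.example.com, 192.0.2.15
readwrite: /tmp/transferfiles
EOF

# Show the result so it can be checked before it is installed.
cat "$sample"
```

Per the quoted description, the allow: line limits FTP access to the listed hosts, and the presence of a readwrite: line limits reads and writes to the listed directory. After review, the file would be copied into place as /etc/ftpaccess.ctl by root.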

Although we found that we could control users with this method, we were looking to do more, so we researched vsftpd and were able to install packages from Perzl.org. (I wrote about installing packages from Perzl.org earlier this year.)                           

From this page we found that vsftpd “supports standard FTP and secure FTPS protocols. Built-in mechanisms allow implicit and explicit mode of FTPS. Security is achieved by using of external SSL library, which simplify the source code of the server. An unusual feature is the ability to force anonymous connections through SSL encryption, thus increasing overall security of anonymous file transfers. SSLv1, SSLv2 and TLS protocols are provided. Optionally validation of client certificates can be configured. The access of users can be controlled by deny and enable lists. The server can be configured to generate detailed activity logs – the log format may be verbose or compatible with wu-ftpd format.”

In our case we edited the configuration file as follows:

            anonymous_enable=NO

            local_enable=YES

            ftpd_banner="FTP Access"

            local_root=/tmp/transferfiles

            write_enable=YES

            secure_chroot_dir=/home/jail

            idle_session_timeout=3600

            file_open_mode=0777

            local_umask=022

This provided the functionality we were looking for.

Finally, some recent conversation from @robmcnelly on Twitter:

Chris Gibson ‏@cgibbo New VIOS tunables with v2.2.2.2.
https://www.ibm.com/developerworks/community/blogs/cgaix/entry/new_vios_tunables_with_v2_2_2_2?lang=en … #VIOS #AIX #PowerVM

Rob McNelly ‏@robmcnelly 21 Sep
Is string theory right? Is it just fantasy? Out of touch with reality?
http://www.youtube.com/watch?v=2rjbtsX7twc

Nigel Griffiths ‏@mr_nmon 19 Sep
FAQ4: Hostnames short or long? The answer is long and mandatory and don’t user underscore either See AIXpertBlog
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/faq4_hostnames_short_or_long?lang=en

Chris Gibson ‏@cgibbo 18 Sep
What’s next from #Powersystems? Join us on October 8th to find out! http://www.ibm.com/smarter-computing/us/en/readynow/webcast.html …

Nigel Griffiths ‏@mr_nmon 18 Sep
Enterprise2013 = Power Technical Uni Orlando Oct21-25 New products will be explained SSP4, PowerXX & Power## http://www-03.ibm.com/systems/enterprise/ … CU there

Jay Kruemcke ‏@chromeaix 13 Sep Oracle ASM and IBM #FlashSystem best practices http://ow.ly/oQeDk

Nigel Griffiths ‏@mr_nmon 17 Sep
IBM pledges $1Billion for #Linux & specifically for Linux on POWER see Wall Street Journal blog
http://blogs.wsj.com/digits/2013/09/16/ibm-again-pledges-1-billion-to-a-linux-effort/ … PowerSystems #POWER7

Rob McNelly ‏@robmcnelly 13 Sep
You can get an #IBM badge if you hang around the offices long enough: http://imgur.com/uywR5QB

A Look at IBM Electronic Support

Edit: Some links no longer work.

Originally posted September 17, 2013 on AIXchange

In June, Julie Craft presented to the AIX Virtual User Group on the topic of IBM Electronic Support for AIX and Power Systems. Listen to the replay and learn more about the tools that are available for you to use.

One slide shows what’s covered in the presentation. The discussion areas are titled “Prevent Problems and Stay Current,” “Find Information,” “Download Fixes and Updates,” “Troubleshoot Problems,” “Work with IBM Support” and “Learn More.”

Right off the bat Julie states that IBM’s goal is to make it easy for customers to find what they need to do the work. Basically, IBM wants to make finding information simple enough so that customers don’t have to contact IBM. However, should the need arise, you can easily open the calls electronically.

Julie mentions the IBM Support Portal (support.ibm.com). This site is meant to be the starting point of the IBM Support experience. It’s designed to centralize the various products that IBM supports and attempts to make the user experience more consistent.

Once you register, you can log in and set up subscriptions, notifications, etc. Then you can select the products that interest you. The presentation replay covers this in detail, so be sure to watch it. You’ll learn about the notifications and alerts (like security advisories and new TLs and SPs) that registered users receive. You’ll also learn about delivery options, including daily or weekly emails or an RSS feed.

The presentation continues with a closer look at the Support Portal. The documentation tab includes links to Information Centers, Redbooks, white papers, and more. From the downloads page, customers can search by APAR, fix ID or arbitrary text, and can include prerequisite and co-requisite fixes. IBM Support has updated Fix Central in an effort to make firmware and HMC code easier to find. There aren’t as many menu items to navigate now.

Another portal feature is an entitlement check that allows customers to download fixes. Just enter your machine type and serial number. The various entitlement types are tied to the level of maintenance you have on your machines. Going forward, IBM will move toward making the capability to download fixes a privilege available primarily to paying customers.

The presentation also covers the Fix Level Recommendation Tool (FLRT), which can be used for health checks that display current system fix levels and compare them to IBM recommendations. It covers firmware, software, etc. Other Electronic Support options include:

* the capability to save an inventory/load a saved inventory,

* a VIOS to NIM master mapping tool that determines the AIX version needed to use NIM with VIOS, and

* a system software mapping tool that displays minimum support levels for AIX and VIOS. 

In addition, service requests and PMRs can be logged without contacting IBM Support. Check out this video for details, and register for service request here.

Speaking of videos, here’s one about customer replaceable units and performing operations on your hardware.

To summarize, the presentation includes this checklist for submitting AIX problems:

            1. Always check the error logs before you open a problem.
            2. Be clear and precise on the problem description and severity. Indicate exactly the error received (entire error code, LEDs, error report entries). Provide clear description of the problem and your environment – analysts do not know your environment nuances.
            3. Include all OS, fix and patch levels in the PMR when you open it. Include TL and service pack levels.
            4. The three most common data gathering tools are snap, zsnap or perfpmr. Snap comes preinstalled on AIX systems. You may be asked to download the latest versions of zsnap or perfpmr to gather additional data.
            5. Make absolutely sure you follow the naming standard in uploading data. Update the PMR with filename and location of where uploaded.
            6. Be sure to execute any steps given by the support analyst precisely. Deviating from what the analyst asks you introduces new variables into the problem determination and can delay resolution.
            7. Don’t be afraid to ask questions. If something is unclear or you are concerned about doing something the analyst asks you to, speak up.
            8. If asking a ‘how to’ question, explain what you are trying to accomplish.
            9. Utilize resources such as developerWorks, Redbooks, forums, etc., for how-to information.
            10. Follow up! Don’t be afraid to ask for status if you don’t hear back from support after a reasonable amount of time.

This is another excellent presentation. AIX Virtual User Group monthly webinars are always worth the time. But if you want to save time watching webinar replays, do what I do: Download the files and view them with VLC. Then go into playback and select speed/faster. Although the presenters might sound funny, I don’t find it difficult to keep up with what’s being said, and I do save time.

Finally, some recent posts and re-tweets from my Twitter feed (@robmcnelly):

            Are you creative genius behind the #NextPowerApp? Submit your idea to win an iPad: http://bit.ly/13R939f

            How to install #PowerVM VIO Server from the HMC GUI #powersystems

            #PowerSystems Technical Universities Orlando Oct 21st  … (within Enterprise1023) & Athens Nov 5th  …

            RT @cgibbo RT @AIXDownUnder: Script to create MKSYSB backups for all NIM clients & keep 2 versions on hand #AIX

            Are you a Walnut Guru?? Have you used brown? 

Advice on HMC Connections

Edit: Some links no longer work.

Originally posted September 10, 2013 on AIXchange

If you connect a managed system to an HMC and it isn’t recognized, how can you troubleshoot the problem?

Well, it depends. What about the connection? Is your HMC behaving as a DHCP server, or have you assigned static IP addresses in your environment?

This interesting blog post offers some tips and tricks, although the IP addresses they recommended didn’t work for my customer.

The post concludes with this advice:

            “If you have had both HMC-1 and HMC-2 connections possibly taken off their default IP addresses, one way that comes to mind is a ‘sniffer’ utility like WireShark. You can attach your laptop to one of the HMC connections and in a short amount of time determine the IP address of the system connection. If you do work like this, you should be familiar with tools of this type, and be prepared to use them in the case of an unknown IP address assignment.”

Here’s a nice document about accessing ASMI. If you’re dealing with a Power Systems Model 720 or 740, try this Redbook. Section 2.14.2 examines HMC connectivity; 2.14.3 looks at HMC high availability.

If the IP addresses are unknown or not working (perhaps the system was previously set up with static addresses but is now being connected to a new HMC), this video with Brian Smith could help. He discusses resetting an ASMI password and explains how to use the front panel to get the HMC IP addresses. Starting around 1:30, he shows how you can use the arrows on the front of the machine.

This document, called “Managing the Control Panel Functions,” can also serve as a reference guide. Page 14 shows what is displayed when you perform function 30 after putting the panel in manual mode:

            “Accessing the control panel functions using the physical control panel: The control panel functions correspond to function numbers on the control panel. To activate a control panel function, do the following:

            1. Select a function number by pressing the Increment (↑) or Decrement (↓) button on the control panel.
            2. To activate the function, press Enter on the control panel.

            “Putting the physical control panel in manual operating mode: You must first put the physical control panel in manual operating mode before you can select or activate certain functions. To put the physical control panel in manual operating mode, do the following:

            1. Use the Increment button to scroll to function 02.
               02______________________________
            2. Press Enter to start function 02.
            3. Press Enter again to move to the second character on the function 02 menu. The current system operating mode is displayed with a pointer, as shown in the following example:
               02__B__N<___________________P___
            4. Use the Increment button to scroll through the system operating modes, and select M for manual, as shown in the following example:
               02__B__M<___________________P___
            5. Press Enter to select the system operating mode.
            6. Press Enter again to exit function 02.

            “The control panel is in manual operating mode.”

Once you know the HMC IP address, it’s trivial to connect to it, log in to ASMI and change the values to whatever is needed to establish communications with the HMC.

On another note, I wanted to try an experiment and include a few tweets that I thought were interesting. If you are already following @robmcnelly then some of these might be repeats, but hopefully the rest of you will find this information useful as well.

RT @AIXmag RT @kristijan: New blog post: AIX boot hangs with HMC 2700 LED code

RT @cgibbo RT @chmod666: Activating resources temporarily by using On/Off CoD 

#IBM #PowerHA v7.1.2 for AIX Enprise Edition support for EMC SRDF

AIX, NIM Make System Restoration Easier

Edit: Some links no longer work.

Originally posted September 3, 2013 on AIXchange

Recently I received this query:

            “I wonder if you have a basic tutorial describing the steps needed to restore an AIX server. I am thinking of a scenario similar to the following:

            “You have an AIX server with some applications and/or databases running on it. A disaster occurs, and you need to restore the server. I have access to a backup taken by a TSM server. So, what I think that could be done is the following:

            1. Insert the AIX DVD in the server.
            2. Using the HMC console Boot in SMS mode (press F1 or ESC 1 several times when the IBM banner is show on screen).
            3. Choose the DVD as boot device.
            4. Boot the server from DVD.
            5. Select the disk to do the OS install.
            6. Select the packages to be installed (do a basic install).
            7. Once the OS is installed, the server will reboot.
            8. Configure the root password, the network and several basic things.
            9. Access the server from the network to Install TSM client and configure it.
            10. Restore the backup from TSM.

            “What do you think?”

What I think is that while this might be a reasonable way to restore a system that runs another OS, you needn’t go to all this trouble with AIX. I recommend using the mksysb command, which “creates a backup of the operating system (that is, the root volume group). You can use this backup to reinstall a system to its original state if it is corrupted. If you create the backup on tape or UDFS capable media, the backup is bootable and includes the installation programs needed to install from the backup.”

Even though many of you are familiar with mksysb, I wanted to post this question to make a point: Lots of people are new to AIX these days. For the most part, they’ve been working in UNIX environments, and then through employer acquisition/job change or what have you, they’re suddenly charged with maintaining AIX systems. They might not realize that NIM and mksysb images are even options. Fortunately, they can seek advice and access other resources to learn more about the platform.

Now, back to the question. I would first make sure I had a good, recent mksysb. This allows you to skip step 1. In step 2, while in SMS mode, I’d make sure the networking information allowed for booting from the NIM server. Boot from the NIM server; don’t use AIX media. Once the restore was completed, I could skip steps 6-9 and proceed to step 10.
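
As a rough sketch of that flow, the commands might look something like the following. The resource names, the SPOT name and the file path are all hypothetical, the image file is assumed to be stored where the NIM master can read it, and the client is assumed to be defined to NIM already. The exact nim attributes needed will depend on the environment. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the backup-then-restore flow. RUN is a hypothetical command
# prefix; it defaults to "echo" (dry run). All names and paths are made up.
RUN=${RUN:-echo}

# On the client: back up rootvg to a file; -i regenerates image.data first.
backup_rootvg() {
    $RUN mksysb -i "$1"
}

# On the NIM master: register the image as a mksysb resource, then
# set up the client for a mksysb-based install.
nim_restore() {
    client=$1; image=$2; spot=$3
    $RUN nim -o define -t mksysb -a server=master \
        -a location="$image" "${client}_mksysb"
    $RUN nim -o bos_inst -a source=mksysb -a mksysb="${client}_mksysb" \
        -a spot="$spot" -a accept_licenses=yes "$client"
}

backup_rootvg /export/mksysb/client1.mksysb
nim_restore client1 /export/mksysb/client1.mksysb spot_aix71
```

Keeping the backup step separate from the restore step reflects the point above: the mksysb has to exist, and be recent, long before the disaster happens.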

The other advantage with NIM is that it gives me a clone of the system. Obviously you don’t want to have to reload the system, and try and recall all the packages and settings that you had installed, in the aftermath of a disaster. Make sure you are taking your backups. You never know when you might really need them.

For additional information on this topic, there’s this good article by Jaqui Lynch. I’ve also covered NIM on this blog (here and here). Can you think of any circumstances that would require you to completely rebuild an AIX machine from scratch in a disaster situation?

BYOD’s Slippery Slope

Edit: I did not even mention the cost of some of this hardware. Some links no longer work.

Originally posted August 27, 2013 on AIXchange

We’re starting to hear more about BYOD — that’s bring your own device to work:            

“Some believe that BYOD may help employees be more productive. Others say it increases employee morale and convenience by using their own devices and makes the company look like a flexible and attractive employer.”

Having been around long enough to recall the days when employers routinely provided phones and pagers, I understand the benefits of BYOD. However, I can also see some serious issues. What if your smartphone, tablet or laptop is lost or stolen? I do not expect to see your employer offer to replace your devices, and since you rely on them for work, you are going to be solely responsible for the loss. Who provides tech support? And what about the data on these devices? Who does that belong to?

Of course, things do happen. One of my sons managed to drop an iPod on an airplane. It slid around the floor of the cabin, someone picked it up and pocketed it, and we never saw it again. Apple was certainly no help. We had the serial number, and you’d think Apple could monitor that were the thief to, say, connect to an iTunes account. But that’s not something they do.

Some people do manage on their own to recover stolen devices. This guy used open source software to locate his stolen laptop and phone. According to this article, you might also have some luck if Gmail or Dropbox is still running and updating the IP address information.

However, a savvy thief will simply wipe a pilfered device and remove the tracking software.

Other solutions are works in progress. This article notes that some law enforcement officials support the creation of “kill switches” that would render smartphones inoperable after they are stolen:

            “To drive home their point about the danger of violent smartphone thefts, authorities introduced relatives of 23-year-old Megan Boken, who was shot and killed in St. Louis in 2012 by an assailant who was trying to steal her iPhone.”

Others advocate for the creation of a database of stolen devices. This “blacklist” would allow mobile providers to refuse service on devices that are reported stolen. However, critics point out that thieves could get around this by altering the International Mobile Equipment Identity (IMEI) number.

As technologists, you’d think we would be able to develop an elegant solution to these problems, but so far that seems to have eluded us. As much as I love my technology, I’d prefer not to put a target on my back when I use my smartphone.

Power Systems Experts Weigh In

Edit: Some links no longer work.

Originally posted August 20, 2013 on AIXchange

I wish I’d been in Manchester and London last month. Apparently I missed out on a great Power Systems event.

            “The fifth POWER Ask the Experts is a one day customer technical event in the U.K…. It proved to be a very popular free event. We had well over a 100 people attend which was a mixture of customers, a few business partners and some IBMers.”

Even though I wasn’t there, I can at least download the slides. Pat O’Rourke gave a Power Systems update, Nigel Griffiths presented performance best practices with POWER7, Gareth Coates presented hands-on tricks of the Power masters, and David Spurway gave a cost comparison between IBM Power and Intel servers. The finale was an NDA session covering Power systems trends and directions, so obviously we don’t have slides for that one.

Although I may see some of this information at this fall’s IBM Power Systems Technical University conference in Orlando, I am sure some unique topics were covered at the U.K. event.

I do encourage you to check out the slides, because some of these tips may be new to you.

* For starters, by logging into your HMC and running any of these four commands, you’ll get detailed information about memory and disk usage, etc.

            monhmc -r mem -n 0

            monhmc -r disk -n 0

            monhmc -r proc -n 0

            monhmc -r swap -n 0

* Another HMC tip concerns disconnecting and reconnecting a managed system from the HMC. Run mksysconn -o auto to clear the connection history on your HMC before reconnecting the managed system. Run lssyscfg -r sys -F name in order to see which managed systems are attached to your HMC.

* To show the VIOS and vhost for a client VSCSI adapter, run:

            # print "cvai" | kdb | grep vscsi | grep -v read

* Another VIOS tip: Don’t go into oem_setup_env to run commands on your VIO servers. Be sure to check out slide 20, which covers failures with creating system plans that may stem from messing around as root on your VIOS.

* From the same presentation comes the reminder that running export CLI_DEBUG=33 provides detailed information about the commands VIOS is running under the covers.

* This lshwres command provides all the WWPN on a system:

            lshwres -r io --rsubtype slotchildren -m Server-9117-MMB-SN101509A -F phys_loc,description,mac_address,wwpn,microcode_version | grep Fibre

* The DPO observations are also well worth a read. Here are some useful DPO-related HMC commands:

            lsmemopt -m <managed system> -o currscore
            lsmemopt -m <managed system> -o calcscore
            optmem -m <managed system> -t affinity -o start

* One must be careful with kdb, but if you want to see how many virtual processors are active, enter the following on the command line:

            # echo vpm | kdb

* Finally, there’s a reference to the IBM Redbook, “IBM PowerVM Virtualization Managing and Monitoring.”
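
The four monhmc commands from the first tip can be gathered in one pass with a short loop. This is only a sketch: the function name hmc_usage and the RUN variable are made up for illustration and are not from the slides. By default it just prints the commands it would run.

```shell
#!/bin/sh
# Sketch: request all four monhmc resource reports in one pass.
# RUN is a hypothetical command prefix; it defaults to "echo" (dry run).
RUN=${RUN:-echo}

hmc_usage() {
    for res in mem disk proc swap; do
        $RUN monhmc -r "$res" -n 0
    done
}

hmc_usage
```

Setting RUN to something like "ssh hscroot@hmc1" (a hypothetical user and host) would run the commands on the HMC from a workstation instead of printing them.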

Of course there’s much more information than what I’ve shared here. Download the slides and see for yourself.

The OpenPOWER Consortium can take Power in New Directions

Edit: Some links no longer work.

Originally posted August 13, 2013 on AIXchange

When I think of OpenPOWER, I think of the Linux-capable IBM systems unveiled almost 10 years ago. Now though, the name signifies something new after this week’s IBM announcement:

“IBM, Mellanox, NVIDIA and Tyan… announced plans to form the OpenPOWER Consortium – an open development alliance based on IBM’s POWER microprocessor architecture. The Consortium intends to build advanced server, networking, storage and GPU-acceleration technology aimed at delivering more choice, control and flexibility to developers of next-generation, hyperscale and cloud data centers.

“The move makes POWER hardware and software available to open development for the first time as well as making POWER IP licensable to others, greatly expanding the ecosystem of innovators on the platform. The consortium will offer open-source POWER firmware, the software that controls basic chip functions. By doing this, IBM and the consortium can offer unprecedented customization in creating new styles of server hardware for a variety of computing workloads.”

Basically this announcement is a statement of direction: These companies are saying they plan to form the consortium. We should expect to hear more later on once it’s actually up and running.

The idea, as noted in the Wall Street Journal, will be to look at the complete server hardware stack — from the processor to the firmware to the operating system.

From the WSJ article:

“The alliance the companies plan to announce Tuesday would allow many companies to license IBM microprocessor designs—based on a technology dubbed Power—that are now only found in Big Blue’s own server systems. Licensees could incorporate IBM-designed circuitry in their own chips, with members of the alliance working on related products such as servers, networking and storage devices, participants said.

“The effort will start with Power8, a forthcoming member of the chip family that IBM plans to discuss at a technical conference this month.”

Having been virtualizing systems for more than 40 years, IBM has a long history around enterprise servers and virtualization. I look forward to seeing what this consortium comes up with around these proven technologies.

We’ve known for many years about the virtualization flexibility and raw performance available with POWER systems, PowerVM and the hypervisor, and obviously IBM will continue innovating with AIX and IBM i and coming out with new Power server models. However, the capabilities that we’ve taken for granted with these systems may now be available to more IT pros throughout the industry.

Power chips are already in video game consoles, computers in our vehicles and rovers on Mars, in addition to our computer rooms. Who knows where they’ll end up next?

More on the IBM PowerLinux Announcement

Edit: Some links no longer work.

Originally posted August 6, 2013 on AIXchange

As noted at the end of last week’s post, on July 30 IBM made another PowerLinux announcement. Here’s the full IBM press release.

“The PowerLinux 7R4 is the high-end addition to IBM’s line-up of Power Systems PowerLinux servers running industry standard Linux from Red Hat and SUSE. Joining the PowerLinux 7R1 and 7R2 models, the PowerLinux 7R4 delivers a new level of performance with up to 4 sockets and 32 cores — ideal for clients seeking a Linux solution capable of handling compute-intensive workloads including analytics, cognitive computing, database and web infrastructure. The PowerLinux 7R4 takes advantage of the same virtualization, middleware, and applications that are available on all Power Systems running Linux today.

“In addition to IBM DB2 database software for Linux, which offers an average 98 percent compatibility when migrating Oracle Database applications, IBM announced that EnterpriseDB’s enterprise-level PostgreSQL-based database solution is now available on all Power Systems servers running Linux.

“Switching databases has traditionally been costly and risky due to limited application compatibility and lack of comprehensive migration tools and resources. EnterpriseDB’s Postgres Plus Advanced Server and IBM Power Systems solve this problem by providing extensive Oracle compatibility functionality, migration tools and expertise that can deliver significant cost savings while allowing many Oracle based applications to run virtually unchanged,” said Ed Boyajian, President and CEO, EnterpriseDB.

“IBM has participated in a wide range of open source projects since 1999, and today this includes Open Stack, Open Daylight, KVM, Apache and Eclipse in addition to Linux. Hundreds of IBM programmers and engineers around the world are contributing to open source as part of the collection of global open source communities, including experts working on projects such as KVM and hands-on support for clients, IBM Business Partners and software vendors interested in running Linux on Power Systems. In May 2013 IBM opened the world’s first IBM’s Power Systems Linux Center in Beijing, and in June 2013 IBM announced its intention to open two more IBM Power Systems Linux Centers in New York and Austin.”

Here’s the IBM PowerLinux 7R4 announcement letter:

“The IBM PowerLinux 7R4 (8248-L4T) server is a powerful 2-socket or 4-socket server that ships with 16 or 32 fully activated cores and I/O configuration flexibility to meet today’s growth and tomorrow’s processing needs. The server features:

  • Powerful POWER7+ DCM processors that offer 3.5 GHz and 4.0 GHz performance with 16 or 32 fully activated cores
  • Up to 1024 GB of memory
  • Rich I/O options in the system unit: six PCIe 8X Gen2 slots in the system unit; two GX++ slots for I/O drawers; six hard disk drive (HDD)/solid-state drive (SSD) SAS small form factor (SFF) bays and integrated SAS I/O controllers; integrated multifunction card with four Ethernet, two USB, and one serial port; redundant hot-swap ac power supplies in each enclosure; 19-inch rack-mount 5U configuration …

“Without PowerVM, dynamic LPAR allows one partition per processor. With PowerVM, up to 20 partitions are allowed per processor. Logical partitioning is supported when IBM PowerVM for IBM PowerLinux (#EC22) is ordered.

“The backplane can be configured as one set of six bays, two sets of three bays (3/3), or three sets of two bays (2/2/2). Configuration options will vary, depending upon the controller options and the operating system selected. The controllers for the six-bay or 3/3 configurations are always the two pairs of embedded controllers. If the 2/2/2 configuration is used, the two embedded controllers run the first two sets of bays (2/2) and a feature 5901 PCIe SAS adapter located in a PCIe slot in a CEC enclosure controls the third set (2). By having three controllers, you can have three boot drives supporting three partitions.

“The IBM PowerLinux 7R4 (8248-L4T) server is designed with both IBM and customer serviceability in mind. Advancements such as Guiding Light LED architecture are used to control a system of integrated LEDs that lead the individual servicing the machine to the correct part as quickly as possible. With the PowerLinux 7R4 server, you can replace service parts (customer replaceable unit). To do this, the PowerLinux 7R4 server uses Guiding Light LEDs to indicate the parts that need to be replaced. An HMC attached to the PowerLinux 7R4 server enables support personnel (with your authorization) to remotely log in to review error logs and perform remote maintenance if required.

“Concurrent maintenance guided service procedures will continue to be supported by the Repair and Verify (R&V) component of the Service Focal Point application running on the HMC. Repair procedures that are not covered by the guided R&V component are documented and available for display on any web browser-enabled system as well as on the HMC. These procedures are available through the InfoCenter application.”

If you search for IBM 7R4 you will find more analysis. Here are two additional articles.

InformationWeek says:

“Why buy Power when there are more x86 choices? Performance is the differentiator, according to IBM. Multi-threaded Java applications, for example, can take advantage of four threads per core instead of the two threads per core on Intel machines. What’s more, Power 7+ series upgrades introduced over the last year include a highly optimized IBM Java Virtual Machine for better Java performance. Finally, the machine has a 2.5 times more cache than competitive Intel machines.”

The Register says:

“… because of the relatively high cost of Power Systems iron, which was marketed to Unix and proprietary customers used to paying a premium for every component in their systems, it was difficult to pitch a Power-based machine against an x86 box and win. So, with the PowerLinux machines, IBM cut its prices to take that issue off the table. And now, IBM can focus the conversation on the performance of Java, database, and analytics workloads and show that a Power7+ alternative can take on a Xeon system and make economic as well as technical sense.”

IBM continues to make Power servers an attractive option for running Linux. As AIX and IBM i cannot run on the 7R1, 7R2 or 7R4, IBM has made the pricing on these systems very competitive when compared with traditional x86 commodity hardware. Take the time to investigate whether Linux on Power makes sense in your environment.

In case you missed it, here is some information from today’s Wall Street Journal about the OpenPOWER Consortium:

“The effort will start with Power8, a forthcoming member of the chip family that IBM plans to discuss at a technical conference this month.”

The IBM news release says the consortium is “an open development alliance based on IBM’s POWER microprocessor architecture. The Consortium intends to build advanced server, networking, storage and GPU-acceleration technology aimed at delivering more choice, control and flexibility to developers of next-generation, hyperscale and cloud data centers.”

IBM Systems Magazine also had an article on the announcement today.

An AIX on Power Performance Primer

Edit: Some links no longer work.

Originally posted July 30, 2013 on AIXchange

Check out this document from IBM’s Dirk Michel, “AIX on Power – Performance FAQ.” It’s only 87 pages, but there’s great information. I encourage you to read it and become familiar with its contents.

Chapter 2 asks and answers the question, “what is performance?”

            “For interactive users, the response time is the time from when the users hits the button to seeing the result displayed. The response time often is seen as a critical aspect of performance because of its potential visibility to end users or customers. The throughput of a computer system is a measure of the amount of work performed by a computer system over the period of time. Examples for throughput are megabytes per second read from a disk, database transactions per minute, megabytes transmitted per second through a network adapter. Throughput and response time are related. In many cases a higher throughput comes at the cost of poorer response or slower response as well as better response time comes at the cost of lower throughput.”

Chapter 4 covers workload estimation and sizing:

            “Some questions to consider before beginning the sizing exercise:
            1. What are the primary metrics, e.g., throughput, latency, that will be used to validate that the system is meeting performance requirements?
            2. Does the workload run at a fairly steady state, or is it bursty, thereby causing spikes in load on certain system components? Are there specific criteria, e.g., maximum response time that must be met during the peak loads?
            3. What are the average and maximum loads that need to be supported on the various system components, e.g., CPU, memory, network, storage?”

Chapter 5 covers performance concepts along with CPU performance, multiprocessor systems, multithreading, processor virtualization, memory performance, caches, cache coherency, virtual memory, memory affinity, processor affinity and more.

Chapter 6 is an examination of performance analysis and tuning:

            “This chapter covers performance analysis and tuning process from a high level point of view. Its purpose is to provide a guideline and best practice on how to address performance problems using a top down approach. Application performance should be recorded using log files, batch run times or other objective measurements. General system performance should be recorded, and should include as many components of the environment as possible. Before collecting any data or making tuning or configuration changes, define what exactly is slow. A clear definition about what aspect is slow usually helps to shorten the amount of time it takes to resolve a performance problem since a performance analyst gets a better understanding what data to collect and what to look for in the data.”

Section 6.3.4 presents a performance analysis flow chart.

Chapter 7 gives a performance analysis how-to:

            “This chapter is intended to provide information and guidelines on how to address common performance problems seen in the field, as well as tuning recommendations for certain areas. Please note that this chapter is not intended to explain the usage of commands or to explain how to interpret their output.”

Chapter 8 includes frequently asked questions. Here’s one I like:

            “I heard that… should I change…?
            “No, never apply any tuning changes based on information from unofficial channels. Changing performance tunables should be done based on performance analysis or sizing anticipation.”

Chapter 9 features things you should know about POWER7. Section 9.10 covers virtualization best practices, for example:

            9.10.1 Sizing virtual processors

  •    The number of virtual processors of an individual LPAR should not exceed the number of physical cores in the system
  •    Shared processor pool: the number of virtual processors of an individual LPAR should not exceed the number of physical cores in the shared processor pool

            9.10.2 Entitlement considerations
            Best practice for LPAR entitlement would be to set the LPARs entitlement capacity to its average physical CPU usage and let the peaks addressed by additional uncapped cycles. For example, an LPAR running a workload that has an average physical consumed of 3.5 cores and a peak utilization of 4.5 cores should have 5 virtual processors to handle the peak CPU usage and an entitlement of 3.5.
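
The arithmetic in that example is easy to sketch in shell. This is only an illustration of the rule quoted above, not something taken from the FAQ: the function name size_lpar is hypothetical, and it assumes entitlement is set to the average physical CPU consumed while the virtual processor count is the peak rounded up to a whole number.

```shell
#!/bin/sh
# Hypothetical helper (not from the FAQ): applies the quoted rule of thumb.
#   entitlement        = average physical CPU consumed
#   virtual processors = peak physical CPU consumed, rounded up
size_lpar() {
    avg=$1
    peak=$2
    awk -v avg="$avg" -v peak="$peak" 'BEGIN {
        vp = int(peak)
        if (vp < peak) vp++          # round the peak up to a whole processor
        printf "entitlement=%s virtual_processors=%d\n", avg, vp
    }'
}

# The example from section 9.10.2: average 3.5 cores, peak 4.5 cores
size_lpar 3.5 4.5    # prints: entitlement=3.5 virtual_processors=5
```

Treat the result as a starting point only. The FAQ itself says tuning should follow from performance analysis, so the averages and peaks fed into a helper like this need to come from real measurements.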

Chapter 11 covers the AIX Dynamic System Optimizer, and Chapter 12 explains how to report a performance problem using perfpmr.

Obviously there’s far more than I’ve listed here. Read it for yourself and share your thoughts in comments.

Also take a look at today’s PowerLinux announcement: http://www-03.ibm.com/press/us/en/pressrelease/41582.wss

“The PowerLinux 7R4 is the high-end addition to IBM’s line-up of Power Systems PowerLinux servers running industry standard Linux from Red Hat and SUSE. Joining the PowerLinux 7R1 and 7R2 models, the PowerLinux 7R4 delivers a new level of performance with up to 4 sockets and 32 cores.”

“Powerful POWER7+ DCM processors that offer:

  • 3.5 GHz and 4.0 GHz performance with 16 or 32 fully activated cores
  • Up to 1024 GB of memory
  • Rich I/O options in the system unit:
      • Six PCIe 8X Gen2 slots in the system unit
      • Two GX++ slots for I/O drawers
      • Six hard disk drive (HDD)/solid-state drive (SSD) SAS small form factor (SFF) bays and integrated SAS I/O controllers
      • Integrated Multifunction Card with four Ethernet, two USB, and one serial port”

Virtualization on Power Resources

Edit: Some links no longer work.

Originally posted July 23, 2013 on AIXchange

Lately I’ve been getting questions about virtual processors and shared processor pools. Here are some resources on this topic that might help.

* In January Rosa Davidson of IBM delivered a great two-part presentation, “Capacity Entitlement and Virtual Processors.” The replays and slides are available here.

* Here’s an explanation of virtual processors found in the POWER6 documentation in IBM’s Information Center:

“However, when you install and run an operating system on a logical partition that uses shared processors, the operating system cannot calculate a whole number of operations from the fractional number of processing units that are assigned to the logical partition. The server firmware must therefore represent the processing power available to the operating system as a whole number of processors. This allows the operating system to calculate the number of concurrent operations that it can perform. A virtual processor is a representation of a physical processor to the operating system of a logical partition that uses shared processors.”

* Be sure to look at these IBM Systems Magazine articles on virtual processor folding and shared processor settings:

“Virtual processors are what the operating system thinks it has since it can only relate to whole numbers of processors. And the desired virtual processor value is basically the maximum number of physical processors that an uncapped shared processor partition can use if processor units are available in the shared processor pool.”

* Finally, here’s something that I wrote for IBM Systems Magazine’s AIX EXTRA e-newsletter:

“… keep in mind you can never use more physical CPUs than virtual CPUs as defined in your LPAR. Even if you allocate one virtual processor to an LPAR and set it to be uncapped, you can’t run more than one physical processor because there would be no other virtual processors available.

“This way, you can limit the LPARs in your shared processor pools even if your LPAR is uncapped and there are 16 processors available in a shared processor pool. You still won’t be able to use more than one physical CPU because you only allocated one virtual CPU.

“A virtual processor can represent from 0.1 to 1 of a physical processor. If you have one virtual processor, the range it can physically consume will never be more than one. If you have three virtual processors, you can use from 0.3 to 3, but never more than three.

“It makes sense, as you’re basically giving your VM the illusion that it’s dealing with a physical processor. If it boots up, and sees three virtual processors, even if it’s running on 0.3 physical processors, it won’t see more than three processors. If it’s running uncapped and wanted to use four physical processors, where would they run if there are only three virtual processors?”
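
To make those limits concrete, here is a minimal shell sketch. The function names vp_range and usable_cores are hypothetical, and the 0.1 minimum per virtual processor is simply the figure quoted above; check what your own hardware and firmware level actually allow.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the limits described in the quote above.

# Range of physical processor capacity that N virtual processors can represent,
# assuming 0.1 to 1 physical processor per virtual processor.
vp_range() {
    awk -v vp="$1" 'BEGIN { printf "min=%.1f max=%d\n", vp * 0.1, vp }'
}

# Physical cores an uncapped LPAR can actually consume:
# whatever it wants, but never more than its virtual processor count.
usable_cores() {
    awk -v want="$1" -v vp="$2" 'BEGIN {
        r = (want > vp) ? vp : want
        print r
    }'
}

vp_range 3          # prints: min=0.3 max=3
usable_cores 4 3    # prints: 3
```

The second call is the case from the quote: an uncapped LPAR that wants four physical processors but has only three virtual processors stops at three.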

I’m sure more good documentation is available. Feel free to post a comment with any resources you’ve used to get up to speed with virtualization on Power.

UNIX Has 2 Vowels… and That’s About It

Edit: Some links no longer work.

Originally posted July 16, 2013 on AIXchange

While reading up on some of the activities surrounding the 25th anniversary of IBM i, I came across this tweet:

“Celebrating 25 years of vowel conservation”

Aaron’s point is that IBM i administrators — going all the way back to the AS/400 days — use precious few vowels on the command line:

            “The AS/400 operating system is consistent in its presentation and names. Commands have names of up to 10 letters. The commands typically take the form of three letters. For example, to work with active jobs, the command is WRKACTJOB. That’s a single word with no spaces. WRK is the AS/400 abbreviation for ‘work’ and ACT is the abbreviation for ‘active.’ Because the AS/400 is consistent in its naming style, after you know some of the abbreviations, you will be able to guess the names of commands.”

As I recently noted, I worked on the AS/400 back when, and I believe that AIX and IBM i pros have much to offer one another. Nevertheless, I have to stick up for AIX here. Look at these common UNIX commands: lsdev, lsattr, lscfg, chdev. Clearly, we’re not wasting vowels, either.

In all seriousness, this methodology is pretty standard across all IBM systems. Check out this Tivoli page, for instance:

            “Vowels are often omitted to shorten the name of a command. Commands are named using two conventions, depending on their provenance…

            “Commands that are inherited from the previous versions of Software Distribution are named using the w+verb+object convention, which matches the way you might think of the action. For example, to import a reference model in Change Manager, you use the wimprmod command. To delete a reference model, you use the wdelrmod command.”

In UNIX you’ll find many commands that are vowel-free, or nearly so: cp, rm, ls, awk and ln, just to name a few. And of course the UNIX philosophy permeates AIX:

            1. Small is beautiful.
            2. Make each program do one thing well.
            3. Build a prototype as soon as possible.
            4. Choose portability over efficiency.
            5. Store data in flat text files.
            6. Use software leverage to your advantage.
            7. Use shell scripts to increase leverage and portability.
            8. Avoid captive user interfaces.
            9. Make every program a filter.

            Unix is simple. It just takes a genius to understand its simplicity.
            – Dennis Ritchie 

Although I congratulate IBM i admins on their judicious use of vowels over the past 25 years, they’re not alone in this ongoing effort.

Getting NPIV Info from VIO Servers

Edit: Some links no longer work.

Originally posted July 9, 2013 on AIXchange

Here’s another script from Dean Rowswell. This one is for getting NPIV information from VIO servers. If you’re not sure how to set up ssh password-free login for your VIOS, read this. The same document is also referenced in last month’s post featuring Dean’s script that displays information about the frames that are managed by your HMC.

Here’s the latest version of this script. I will ask Dean to post in comments when he makes changes to the tools.

            #!/bin/ksh
            # Created by Dean Rowswell, IBM, May 31, 2013
            # List Virtual and Physical Fibre Channel info for NPIV environments
            #
            # Assumption:
            #    Password-less ssh must be setup from this system to the Virtual I/O Server(s)

            VIOS_LIST="vios1 vios2"
            VIOS_USER="padmin"

            VER="1.0"

            # Parameter checks
            if [ ${#*} -ne 0 ]
            then
                while getopts :vVh:u: PARMS
                do
                    case $PARMS in
                        v|V)    echo "This is get_lpar_fcinfo version: $VER" ; exit ;;
                        h)      VIOS_LIST=`echo $OPTARG | tr ',' ' '` ;;
                        u)      VIOS_USER=${OPTARG} ;;
                        ?)      echo "\nUSAGE:\t$0 [ -v, -V, -h, -u ]"
                                echo "\t-v or -V will print out the version and exit"
                                echo "\t-h VIOS hostname(s) or IP address(es) COMMA SEPARATED to use"
                                echo "\t-u VIOS userid to use (only required if padmin not used)\n"
                                echo "EXAMPLE: get_lpar_fcinfo -h vios1,vios2\n"
                                exit ;;
                    esac
                done
            fi

            printf "%-12s %-10s %-27s %-5s %-20s %-12s %-5s %-6s %-6s %-27s\n" \
                VIOS VFCHOST# VIOS_SLOT LPAR# LPAR_NAME STATUS PORTS PHYS VIRT LPAR_SLOT

            Get_Info_From_VIOS () {
            for VIOS in ${VIOS_LIST}
            do
                ssh ${VIOS_USER}@${VIOS} ioscli ioslevel >/dev/null 2>/dev/null
                if [ $? -ne 0 ]
                then
                    echo "Password-less SSH access to VIOS ${VIOS} with user ${VIOS_USER} is not setup\n"
                    continue
                fi

                ssh ${VIOS_USER}@${VIOS} "ioscli lsmap -all -npiv -fmt :" | \
                    awk -v VIOS="$VIOS" -F: '{printf "%-12s %-10s %-27s %-5d %-20s %-12s %-5d %-6s %-6s %-27s\n", VIOS,$1,$2,$3,$4,$6,$9,$7,$11,$12}'
            done
            }

            Get_Info_From_VIOS | sort -k4
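
If you want to see what the awk stage does without touching a VIO server, you can feed it a record by hand. The sketch below is mine, not Dean’s. The sample record is invented, the location codes are placeholders, and the field order is only assumed to line up with the column headings the script prints. The wrapper name format_npiv_line is hypothetical.

```shell
#!/bin/sh
# Hypothetical sample of one colon-delimited record. Real lsmap output will differ;
# the field order here is an assumption that mirrors the script's column headings.
SAMPLE='vfchost0:U8205.E6B.0000000-V1-C10:3:lpar1:AIX:LOGGED_IN:fcs0:U78AA.001.0000000-P1-C2-T1:2:a:fcs0:U8205.E6B.0000000-V3-C10'

# Same awk program as the script, wrapped so it can read records from stdin.
# $1 is the VIOS name to print in the first column.
format_npiv_line() {
    awk -v VIOS="$1" -F: '{printf "%-12s %-10s %-27s %-5d %-20s %-12s %-5d %-6s %-6s %-27s\n", VIOS,$1,$2,$3,$4,$6,$9,$7,$11,$12}'
}

echo "$SAMPLE" | format_npiv_line vios1
```

The script’s final sort -k4 then orders the formatted lines starting from the fourth column, the LPAR number, which groups each client LPAR’s adapters together.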

If you’re interested, download the (sanitized) output I saw from a test machine below:

If you test it out, leave a comment to let me know how it worked and/or if you find it useful in your environment.

Note: Now AIX news and information is available to you on the go. IBM Systems Magazine, Power Systems edition, just launched an app for the Apple iPad.

Remote HMC Upgrades, Revisited

Edit: Some links no longer work.

Originally posted July 2, 2013 on AIXchange

I had an old CR2 HMC running version 7.4.0.1 that was managing some POWER7 servers along with an old POWER5 server running version SF240_415 microcode. I wanted to go to the latest (as of this writing) HMC code version, 7.7.7 SP1.

I immediately wondered if the latest HMC code could manage that older version of POWER5 microcode. Happily, it can, with this version of firmware. It’s also a match for the version of microcode that was running on the POWER7 machines.

I went ahead and downloaded the 7.7.7.0 files to my HMC so that I could do the upgrade. This January 2011 post covers the basics of what I wanted to accomplish.

To get the latest files (as opposed to those referenced in my old post), I used this command:

            getupgfiles -h 170.225.15.40 -u anonymous --passwd ftp -d /software/server/hmc/network/v7770

It worked like a charm. It rebooted and I was running 7.7.7.0 — until I received this message:

            lshmc -V
            A connection to the Command Server failed.

I found this technote:

            Problem(Abstract)
            A connection to the Command Server failed.
            Symptom
            hscroot@bldhnethmc01:~> lshmc -v
            connect: Connection refused
            A connection to the Command Server failed.
            Resolving the problem
            Reboot HMC

That made me laugh. I tried the reboot, and had no luck. Then it dawned on me that this was an old HMC. Would 7.7.7 even run on it? And shouldn’t I have looked into that before doing the upgrade?

Although not specific to AIX, this is relevant information:

            Important Notes:
            * Version 7.7.7 is not supported and cannot be installed on HMC models C03, C04 or CR2.
            * If an HMC is used to manage any POWER7 processor based server, the HMC must be a model CR3 or later model rack-mount HMC or C05 or later deskside HMC.
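
Had I encoded those two notes in a quick pre-flight check, I would have saved myself a failed upgrade. Here is a minimal sketch of what I mean. The function names are hypothetical, the model string is something you would supply yourself, and the logic covers only the two notes quoted above, so it is no substitute for reading the release notes for the level you are installing.

```shell
#!/bin/sh
# Hypothetical pre-upgrade checks based only on the two notes quoted above.

# Note 1: 7.7.7 cannot be installed on models C03, C04 or CR2.
hmc_allows_777() {
    case "$1" in
        C03|C04|CR2) echo "unsupported" ;;
        *)           echo "not-excluded" ;;   # the notes do not rule it out
    esac
}

# Note 2: managing POWER7 needs a CR3 or later rack-mount HMC,
# or a C05 or later deskside HMC.
can_manage_power7() {
    model=$1
    case "$model" in
        CR[0-9]*) min=3; num=${model#CR} ;;
        C[0-9]*)  min=5; num=${model#C} ;;
        *)        echo "unknown"; return ;;
    esac
    num=${num#0}                              # C05 -> 5
    if [ "$num" -ge "$min" ]; then echo "yes"; else echo "no"; fi
}

hmc_allows_777 CR2       # prints: unsupported
can_manage_power7 CR3    # prints: yes
```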

Lucky for me I had a CR3 available. I was able to upgrade that HMC with no problems. Once I was at 7.7.7.0, I wanted to get the latest fixpack, so I went to the updates tab on the HMC and selected UPDATE HMC. This information was helpful:

            To install SP1, do the following:
            a) In the HMC Navigation pane, select Updates.
            b) In the Work pane, click the Update HMC button. The “HMC Install Corrective Service Wizard” panel is displayed.
            c) On the Current HMC Driver Information panel, click Next.
            d) On the Select Service Repository panel, click Remote Server, then click Next.
            e) On the Installation and Configuration Options panel (if using a local FTP server, modify the entries as appropriate for your local FTP server):

                        Remote server type: FTP

                        Remote Server: public.dhe.ibm.com
                        User ID: anonymous
                        Password:
                        Remote directory: /software/server/hmc/updates

            Click Next.
            On the Select Service Package panel, scroll down to HMC_Update_V7R770_SP1.iso, click to select, and click Next.

After clicking on Finish, I received this message:

            Management console corrective service installation in progress. Please wait…
            Corrective service file offload from remote server in progress…

It took quite a while to download the .iso image, but once that happened, the upgrade completed as expected in around 30 minutes:

            The corrective service file offload was successful. Continuing with HMC service installation…
            Verifying Certificate Information
            Authenticating Install Packages
            Installing Packages
            — Installing ptf-req ….
            — Installing RSCT ….
            src-3.1.4.2-13008
            rsct.core.utils-3.1.4.2-13008
            rsct.core-3.1.4.2-13008
            rsct.service-3.5.0.0-1
            rsct.basic-3.1.4.2-13008
            — Installing CSM ….
            csm.core-1.7.1.20-1
            csm.deploy-1.7.1.20-1
            csm_hmc.server-1.7.1.20-1
            csm_hmc.hdwr_svr-7.0-3.4.0
            csm_hmc.client-1.7.1.20-1
            csm.server.hsc-1.7.1.20-1
            — Installing LPARCMD ….
            hsc.lparcmd-3.0.0.1-1
            ln: creating symbolic link `/usr/hmcrbin/lsnodeid': File exists
            ln: creating symbolic link `/usr/hmcrbin/lsrsrc-api': File exists
            ln: creating symbolic link `/usr/hmcrbin/mkrsrc-api': File exists
            ln: creating symbolic link `/usr/hmcrbin/rmrsrc-api': File exists
            — Installing InventoryScout ….
            — Installing Pegasus ….
            — Updating baseOS ….
            Corrective service installation was successful.

You’re then prompted to reboot. In my case, that also took a nice long while, but it did eventually come back.

Lessons learned from this experience:

1. Don’t assume anything, even when you’re using a crash and burn test box.

2. Make sure your hardware can support the software you plan to run on it.

And the one lesson relearned: Updating your HMC remotely really is the way to go.

The Stages of Team Building

Edit: Which stage are you in?

Originally posted June 25, 2013 on AIXchange

Are you forming, storming, norming or performing? Or perhaps you’re just wondering what in the world I’m talking about.

What I’m talking about is Tuckman’s stages of group development. I was introduced to it through my sons’ involvement with Scouting. Tuckman’s theory is that every group of people is in one of four stages of team building. When we started Wood Badge training, we were at the forming phase:

            “In the first stage of team building, the forming of the team takes place. The individual’s behavior is driven by a desire to be accepted by the others, and avoid controversy or conflict. Serious issues and feelings are avoided, and people focus on being busy with routines, such as team organization, who does what, when to meet, etc. Individuals are also gathering information and impressions — about each other, and about the scope of the task and how to approach it. This is a comfortable stage to be in, but the avoidance of conflict and threat means that not much actually gets done.”

Forming, we were told, is marked by high enthusiasm and low skills. The next phase is storming:

            “Every group will next enter the storming stage in which different ideas compete for consideration. The team addresses issues such as what problems they are really supposed to solve, how they will function independently and together and what leadership model they will accept. Team members open up to each other and confront each others’ ideas and perspectives. In some cases storming can be resolved quickly. In others, the team never leaves this stage. The maturity of some team members usually determines whether the team will ever move out of this stage. Some team members will focus on minutiae to evade real issues.

            “The storming stage is necessary to the growth of the team. It can be contentious, unpleasant and even painful to members of the team who are averse to conflict. Tolerance of each team member and their differences should be emphasized. Without tolerance and patience the team will fail. This phase can become destructive to the team and will lower motivation if allowed to get out of control. Some teams will never develop past this stage.”

Storming was described to us as low enthusiasm and low skill. It can be an unpleasant place to be, and many teams never make it out of this stage. Teams that do move on to norming:

            “The team manages to have one goal and come to a mutual plan for the team at this stage. Some may have to give up their own ideas and agree with others in order to make the team function. In this stage, all team members take the responsibility and have the ambition to work for the success of the team’s goals.”

When (or if) a team becomes cohesive and cooperative, that’s the performing stage:

            “It is possible for some teams to reach the performing stage. These high-performing teams are able to function as a unit as they find ways to get the job done smoothly and effectively without inappropriate conflict or the need for external supervision. By this time, they are motivated and knowledgeable. The team members are now competent, autonomous and able to handle the decision-making process without supervision. Dissent is expected and allowed as long as it is channeled through means acceptable to the team. Supervisors of the team during this phase are almost always participative. The team will make most of the necessary decisions. Even the most high-performing teams will revert to earlier stages in certain circumstances. Many long-standing teams go through these cycles many times as they react to changing circumstances. For example, a change in leadership may cause the team to revert to storming as the new people challenge the existing norms and dynamics of the team.”

Of course, performing teams have their own challenges. People leave, new people come in. Things change. Eventually you find yourself back at the storming stage, and trying to move through the cycle again. I know I’ve experienced all the stages in my professional life. I’ve been part of high-performing teams where we all trusted each other, relied on our strengths and compensated for our weaknesses. And I’ve been through turf wars where everyone was looking out for themselves. No one helped anyone and nothing of value was accomplished.

So how are your teams doing?

Sharing Hardware, and Perspective

Edit: Some links no longer work.

Originally posted June 18, 2013 on AIXchange

A few months back I wrote about IBM i and VIO server, so I was immediately intrigued when a colleague recently pointed me to this document on IBM i virtualization and open storage. I believe this is an updated version of the original. Take the time to give it a read.

Here’s some more good information on IBM i. This document is called the performance capabilities reference:

            “The purpose of this document is to help provide guidance in terms of IBM i operating system performance, capacity planning information, and tips to obtain optimal performance on IBM i operating system.”

I found this pretty interesting. Some of it feels very conservative to me compared to what I see in the AIX world. If I weren’t familiar with VIOS and its benefits, reading this document would honestly make me a bit reluctant to use it in production with IBM i. Despite this, and the fact that this document is intended for IBM i users, I think AIX pros can also benefit from the information. We have Power Systems hardware in common, after all.

Some quick highlights:

* Chapter 2 covers IBM i communications performance.

* Chapter 4 covers internal storage performance. Page 35 has a good chart comparing SSD, SAS and SCSI disk, controllers and enclosures. Here’s one example of information that should interest AIX users. Though the numbers might not match exactly, we can still gain good information here.

* Chapter 5 has details on SAN performance numbers, while chapter 6 covers VIOS and IVM. Section 6.2.2 should give you perspective on why IBM i users would find it scary to move to external storage. Their storage has always been managed internally, going back to the introduction of the AS/400 systems. Now these folks have to trust SAN admins to RAID-protect their disks, and even if the disk is protected, IBM i will report otherwise.

* Section 6.2.3 reminds you to not put LUNs into volume groups when using VIOS. Simply map the LUN directly to the client LPAR. Also remember that with IBM i, you can only have 16 virtual disks on each virtual SCSI adapter.

* Section 6.4 provides examples of virtual SCSI performance. Section 6.7 has a VIO client performance guide, while section 6.8 gives performance observations and tips.

* Chapter 7 covers logical partitions and best practices when setting them up. There’s good information about applications running on IBM i, and at the end check out chapter 19 for general performance information tips and techniques. This material is pretty specific to IBM i, however.
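The 16-disk limit per virtual SCSI adapter from section 6.2.3 turns into simple arithmetic when you plan an IBM i client. Here is a minimal sketch of that arithmetic, assuming the limit is exactly 16 as stated above. The function name vscsi_adapters_needed is made up for illustration and is not an IBM tool:

```shell
#!/bin/sh
# Hypothetical helper: how many virtual SCSI adapters an IBM i client
# needs for a given number of LUNs, assuming 16 virtual disks per adapter.
vscsi_adapters_needed() {
    luns=$1
    per_adapter=${2:-16}
    # Integer ceiling division: (luns + per_adapter - 1) / per_adapter
    echo $(( (luns + per_adapter - 1) / per_adapter ))
}

vscsi_adapters_needed 40    # 40 LUNs need 3 adapters
```

The second argument is there only so you can try a different limit without editing the function.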

Given the audience for this blog, I don’t often write about IBM i. But I do so occasionally, because I believe it’s worthwhile. Tempting as it might be, AIX pros shouldn’t dismiss this topic. It’s that sort of mentality, after all, that’s kept IBM i admins from embracing VIOS. (“VIOS is so much like AIX. Why should I bother with it?”) 

Perhaps 15-20 years ago, that attitude would be acceptable. But, again, we all use the same hardware now. In some environments today, AIX and IBM i run together on the same physical frame. Knowing where everyone is coming from — especially as it pertains to a vital area like system performance — can be very beneficial.

Sizing Power Systems

Edit: Some links no longer work.

Originally posted June 11, 2013 on AIXchange

I attend presentations all the time, and I always appreciate it when I get a copy of the slide decks (which are usually in PowerPoint) afterward. That way I can review them later and refresh my memory as needed. I can also share them with the world, something I recently did with this set of slides.

For me, the next best thing to being at a presentation is being able to watch it online. This is one reason why I’m such a fan of the AIX Virtual User Group — they make presentations available via replay. I wish more presenters, whether they’re at technical conferences or speaking to user groups, would record their work and post it on YouTube or some video site. We’d all benefit from their expertise.

If I don’t see a presentation, either live or recorded, I feel like I’m missing out. Sure, if I’m familiar with the topic, I can generally get up to speed simply by reading the material. But I think it’s very important to be able to actually hear the presenter discuss what’s on the PowerPoint and explain, in his or her own words, why these particular notes or these particular graphics were included.

With that backdrop, I want to tell you about a presentation from Jorge L. Navarro Cueva from IBM, who discusses ideas for sizing Power Systems.

No, unfortunately, I didn’t get to see this presentation, but I recommend it just the same. Jorge offers elementary advice for anyone — from beginner to expert — who needs to size Power systems. At only 15 slides, it’s a relatively quick read, but I figure many of you may benefit from reviewing some of the concepts he covers. This list of topics is found on page 3 of the slide deck:

            1. Understand the performance metrics.
            2. Know the most used performance benchmarks.
            3. Don’t get obfuscated by benchmarketing.
            4. Variability is your worst enemy.
            5. Size the true peak load.
            6. Avoid the “what is the peak” pitfall.
            7. Be aware of the consequences of undersizing.
            8. Design a balanced system.
            9. Garbage In, Garbage Out.
            10. Master the sizing tool.

Slide 5 offers a sound reminder: App 1 may need four cores and App 2 may need four cores, but the two apps don’t necessarily need the four cores at the same time.
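To make the slide 5 point concrete, here is a small sketch that compares the sum of each application’s individual peak with the peak of the combined load. The interval data is invented for illustration, and the function name peak_summary is my own, not something from Jorge’s slides:

```shell
#!/bin/sh
# Each input line: interval label, App 1 cores used, App 2 cores used.
# The sample numbers below are invented for illustration.
peak_summary() {
    awk '
    {
        if ($2 > p1) p1 = $2                # highest App 1 usage seen
        if ($3 > p2) p2 = $3                # highest App 2 usage seen
        if ($2 + $3 > pc) pc = $2 + $3      # highest combined usage seen
    }
    END { printf "sum_of_peaks=%d combined_peak=%d\n", p1 + p2, pc }'
}

peak_summary <<'EOF'
09:00 4 1
12:00 2 2
15:00 1 4
EOF
```

With this made-up data, each application peaks at four cores, but they never peak in the same interval, so the combined peak is five cores rather than eight.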

Slide 6 makes a good point: Your intervals may not give you enough detail to properly size a system. Or they may actually provide too much detail.
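Here is a rough sketch of how the interval length can hide a peak. It takes one utilization sample per line and compares the highest raw sample with the highest average over every N consecutive samples. The numbers and the function name coarsen are assumptions of mine for illustration only:

```shell
#!/bin/sh
# coarsen N: read one utilization sample per line and report the highest
# raw sample next to the highest average over each block of N samples.
# A trailing partial block is ignored.
coarsen() {
    awk -v n="$1" '
    {
        sum += $1
        if ($1 > rawmax) rawmax = $1
        if (NR % n == 0) {
            avg = sum / n
            if (avg > avgmax) avgmax = avg
            sum = 0
        }
    }
    END { printf "raw_peak=%.1f coarse_peak=%.1f\n", rawmax, avgmax }'
}

# Ten one-minute samples, averaged into five-minute intervals
printf '%s\n' 10 10 90 10 10 20 20 20 20 20 | coarsen 5
```

In this invented data the one-minute spike to 90 percent shrinks to 26 percent once it is averaged over five minutes, which is the sort of detail a longer interval can lose.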

Given the two desks on slide 8, I can see why one might have a longer wait to get service from Desk A than Desk B.

Jorge’s presentation comes in two sets of slides. His email is listed on the first slide in each set, so if you have questions, you should contact him directly. If you do correspond with Jorge, I hope you’ll post your questions and his answers in the comments section. (However, be sure you get his permission before doing so.) This additional information would certainly be useful for others who will come across this post in the future.

Generating HMC and LPAR Info

Edit: Some links no longer work.

Originally posted June 4, 2013 on AIXchange

Dean Rowswell sent over another handy script that you should add to your virtual bag of tricks. Dean’s latest script (version 1.4 as of this writing) provides a quick list of information about the HMC and the LPARs it manages.

First, set up your ssh client so you can connect without a password between your HMC and the LPAR you’ll run the script on.

If you aren’t sure how to set this up, you should be able to find help through a web search. I created my id_rsa.pub file after viewing this document. It lists these steps:

            To enable scripts to run unattended between an SSH client and an HMC, do the following:

            1. Open the Remote Command Execution task from the HMC Management work pane.
            2. From the Remote Command Execution window, select Enable remote command execution using the ssh facility.
            3. Create an HMC user with one of the following roles:
                        Super administrator (hmcsuperadmin)
                        Service representative (hmcservicerep)
            4. On the client’s operating system, run the SSH protocol key generator. To run the SSH protocol key generator, do the following:
                        a. To store the keys, create a directory named $HOME/.ssh (either RSA or DSA keys can be used).
                        b. To generate public and private keys, run the following command: ssh-keygen -t rsa
                        The following files are created in the $HOME/.ssh directory:
                                    private key: id_rsa
                                    public key: id_rsa.pub
                        The write bits for both group and other are turned off. Ensure that the private key has a permission of 600.

Once this was complete, I copied over my file using:

            mykey=`cat $HOME/.ssh/id_rsa.pub`
            ssh hmc.domain.com -l hmcuser mkauthkeys -a \"$mykey\"
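Before running any script against the HMC, it can help to confirm that the key really works without a prompt. This is a minimal sketch that uses the same "run date over ssh" idea as Dean’s script. The function name check_hmc_ssh and the 10-second timeout are my own choices. BatchMode and ConnectTimeout are standard OpenSSH client options:

```shell
#!/bin/sh
# Hypothetical helper: report whether a command can be run on the HMC
# without a password prompt. BatchMode=yes makes ssh fail instead of
# prompting, so this never hangs waiting for input.
check_hmc_ssh() {
    hmc=$1
    user=${2:-hscroot}
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "${user}@${hmc}" date >/dev/null 2>&1
    then
        echo "OK: ${user}@${hmc}"
    else
        echo "FAILED: ${user}@${hmc}"
        return 1
    fi
}

# Example (host and user are placeholders):
#   check_hmc_ssh hmc.domain.com hmcuser
```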

Then I copied over Dean’s script, making sure it was executable. I changed the HMC_LIST variable to match an HMC in my environment:

#!/bin/ksh
# Created by Dean Rowswell, IBM, March 20, 2013
# Modified by Dean Rowswell, IBM, April 24, 2013
#    Calculate the USED Processor and Memory values
# Modified by Dean Rowswell, IBM, May 7, 2013 - Version 1.0
#    Display Memory and Processor config for each LPAR
#    Accept parameters for the HMC(s) and HMC user to use
#    Correctly determine HMC information for Version 7.3.5
#    Ignore mem_mode for POWER5 servers
# Modified by Dean Rowswell, IBM, May 9, 2013 - Version 1.1
#    Calculate the LPAR totals for Memory, Processor Entitlement and Virtual Processors
# Modified by Dean Rowswell, IBM, May 9, 2013 - Version 1.2
#    Skip any HMC which does not have password-less ssh setup
# Modified by Dean Rowswell, IBM, May 10, 2013 - Version 1.3
#    Remove the G in the LPAR memory column and add GB label to header
#    Calculate the Entitlement to Virtual Processor ratio for each LPAR and overall system
# Modified by Dean Rowswell, IBM, May 10, 2013 - Version 1.4
#    Fixed bug with divide by zero error if LPAR is in the Not_Activated state and the Virtual Processor value is 0
# List HMC, POWER server, and LPAR info using the HMC
#
# Assumption:
#    Password-less ssh must be setup from this system to the HMC(s) in the HMC_LIST variable

HMC_LIST="hmc1 hmc2"
HMC_USER="hscroot"
VER="1.4"

# Parameter checks
if [ ${#*} -ne 0 ]
then
        while getopts :vVh:u: PARMS
        do
                case $PARMS in
                        v|V)    echo "This is get_lpar_info version: $VER" ; exit ;;
                        h)      HMC_LIST=`echo $OPTARG | tr ',' ' '` ;;
                        u)      HMC_USER=${OPTARG} ;;
                        ?)      echo "\nUSAGE:\t$0 [ -v, -V, -h, -u ]"
                                echo "\t-v or -V will print out the version and exit"
                                echo "\t-h HMC hostname(s) or IP address(es) COMMA SEPARATED to use"
                                echo "\t-u HMC userid to use (only required if hscroot not used)\n"
                                echo "EXAMPLE: get_lpar_info -h hmc1,hmc2\n"
                                exit ;;
                esac
        done
fi

for HMC in ${HMC_LIST}
do
        ssh ${HMC_USER}@${HMC} date >/dev/null 2>/dev/null
        if [ $? -ne 0 ]
        then
                echo "\nPassword-less SSH access to HMC ${HMC} with user ${HMC_USER} is not setup\n"
                continue
        fi
        echo "\n================================="
        echo "HARDWARE MANAGEMENT CONSOLE"
        echo "Hostname: ${HMC} / \c"
        ssh ${HMC_USER}@${HMC} "lshmc -v | grep -E 'TM|SE|RM'" | sed 's/eserver xSeries 336 -\[//g' | sed 's/]-//g' | tr -s '\n' ' ' | awk '
        {MODEL = $2 ; SERIAL = $4 ; VERSION = $6};
        END { print "Model: " MODEL "\nSerial: " SERIAL " / Ver: " VERSION}'
        echo "`date`"
        echo "================================="
        MANAGEDSYS=`ssh ${HMC_USER}@${HMC} "lssyscfg -r sys -F type_model*serial_num|sort"`
        for SYSTEM in ${MANAGEDSYS}
        do
                echo "\nIBM POWER SYSTEM: ${SYSTEM} / SysFW Ver: \c"
                ssh ${HMC_USER}@${HMC} "lslic -m ${SYSTEM} -F ecnumber:activated_level|sed 's/:/_/g'|cut -c 3-"|tr -s '\n' ' '
                ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r proc --level sys -F installed_sys_proc_units:configurable_sys_proc_units:curr_avail_sys_proc_units"|awk -F: '
                {INSTALL = $1 ; CONFIG = $2 ; AVAIL = $3};
                END { print "\n   PROC INFO:\t" INSTALL " Installed / " CONFIG " Configurable / " CONFIG-AVAIL " Used / " AVAIL " Available "}'
                ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level sys -F installed_sys_mem:configurable_sys_mem:curr_avail_sys_mem:sys_firmware_mem:mem_region_size" |awk -F: '
                {INSTALL = $1 ; CONFIG = $2 ; AVAIL = $3 ; SYSFW = $4 ; LMB = $5};
                END { print "   MEM INFO:\t" INSTALL/1024 " GB Install / " CONFIG/1024 " GB Config / " (CONFIG-AVAIL)/1024 " GB Used / " AVAIL/1024 " GB Avail / " SYSFW/1024 " GB SysFW / " LMB " MB LMB"}'
                echo "   LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)\n   ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP"
                Get_LPAR_Info() {
                        LPARS=`ssh ${HMC_USER}@${HMC} "lssyscfg -r lpar -m ${SYSTEM} -F lpar_id:name:lpar_env:os_version:state|sed 's/ /_/g'|sort -n"`
                        for LPAR in ${LPARS}
                        do
                                printf "      %-24s\n" ${LPAR}
                        done
                        PROC=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r proc --level lpar -F lpar_id:curr_proc_mode:curr_sharing_mode:curr_shared_proc_pool_id:run_proc_units:run_procs:run_uncap_weight|sort -n"`
                        for LPAR in ${PROC}
                        do
                                printf "      %-24s\n" ${LPAR}
                        done
                        ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:mem_mode:run_mem" >/dev/null 2>/dev/null
                        if [ $? -eq 0 ]
                        then
                                MEM=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:mem_mode:run_mem|sort -n"`
                                for LPAR in ${MEM}
                                do
                                        printf "      %-24s\n" ${LPAR}
                                done
                        else
                                MEM=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:run_mem|sort -n"`
                                for LPAR in ${MEM}
                                do
                                        printf "      %-24s\n" ${LPAR}
                                done
                        fi
                }
                Get_LPAR_Info | sort -n | awk -F: '{
                if (NF == 5) { LPAR_ID=$1; LPAR_NAME=$2; OS_TYPE=$3; OS_VER=$4; STATE=$5 }
                if (NF == 3) { MEM_MODE=$2; MEM=$3 }
                if (NF == 2) { MEM_MODE="NA"; MEM=$2 }
                if (NF == 7) { PROC_MODE=$2; SHARE_MODE=$3; SHARED_POOL=$4; PROC_UNITS=$5; VIRT_PROC=$6; WEIGHT=$7 }
                if ((length(LPAR_ID) != 0 && length(MEM_MODE) !=0 && length(PROC_MODE) != 0)) {
                if (VIRT_PROC == 0) { RATIO = "NA"  } else { RATIO = PROC_UNITS/VIRT_PROC}
                printf "   %3d %-20s %-9s %-24s %-13s %5.1f %-8s %-7s %-17s %-3d %-4.2f %3d %3d %5.2f\n", LPAR_ID, LPAR_NAME, OS_TYPE, OS_VER, STATE, MEM/1024, MEM_MODE, PROC_MODE, SHARE_MODE, SHARED_POOL, PROC_UNITS, VIRT_PROC, WEIGHT, RATIO; TOTAL_MEM += MEM; TOTAL_PROC_UNITS += PROC_UNITS; TOTAL_VIRT_PROC += VIRT_PROC ; LPAR_ID=""; MEM_MODE=""; MEM=""; PROC_MODE="" }
                } END {print "    -----------------------------------------------------------------------------------------------------------------------------------------" ; printf "       LPAR TOTALS %63.1f %43.2f %3d %9.2f\n", TOTAL_MEM/1024, TOTAL_PROC_UNITS, TOTAL_VIRT_PROC, TOTAL_PROC_UNITS/TOTAL_VIRT_PROC}'
        done
done


In my environment I got the following output, some of which is masked to protect the identity of the customer that allowed me to run this. Of course the output looks best in a format wide enough to support all of the columns. Here, most of the lines are wrapped. A few spaces are added so you can see the different entries.

=================================

HARDWARE MANAGEMENT CONSOLE

Hostname: HMC1 / Model: 7042-CR4

Serial: 123456B / Ver: V7R7.7.0.2

Sun Jun  2 15:23:37 CDT 2013

=================================

IBM POWER SYSTEM: 8202-E4B*12345CP / SysFW Ver: AL720_108

PROC INFO:   4.0 Installed / 4.0 Configurable / 2.9 Used / 1.1 Available

MEM INFO:    64 GB Install / 64 GB Config / 32.75 GB Used / 31.25 GB Avail / 2.25 GB SysFW / 256 MB LMB

LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)

ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP

1 vios1              vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         3.2 ded      shared  uncap             0   0.50   4 128  0.12

2 lpar1               aixlinux  AIX_5.3_5300-12-05-1140  Running         4.2 ded      shared  uncap             0   0.40   4 128  0.10

3 vios2              vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         4.2 ded      shared  uncap             0   0.40   4 128  0.10

4 vios3              vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         4.2 ded      shared  uncap             0   0.40   4 128  0.10

5 nim                 aixlinux  Unknown                  Running         4.2 ded      shared  uncap             0   0.40   4 128  0.10

6 db2                 aixlinux  Unknown                  Running         4.2 ded      shared  uncap             0   0.40   4 128  0.10

7 lpar2               aixlinux  Unknown                  Running         4.0 ded      shared  uncap             0   0.20   2 128  0.10

8 vios3               aixlinux  Unknown                  Running         2.0 ded      shared  uncap             0   0.20   2 128  0.10

—————————————————————————————————————————————–

LPAR TOTALS                                                            30.5                                        2.90  28      0.10

IBM POWER SYSTEM: 9111-520*12346A / SysFW Ver: SF240_415

   PROC INFO:   2.0 Installed / 2.0 Configurable / 2 Used / 0.0 Available

   MEM INFO:    4 GB Install / 4 GB Config / 4 GB Used / 0 GB Avail / 0.34375 GB SysFW / 16 MB LMB

   LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)

   ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP

 1 lpar1           aixlinux  Unknown                  Not_Activated   0.0 ded      ded     share_idle_procs  0   0.00   0   0  0.00

 2 demo            aixlinux  Unknown                  Not_Activated   0.0 ded      ded     share_idle_procs  0   0.00   0   0  0.00

 3 test            aixlinux  Unknown                  Not_Activated   0.0 ded      ded     share_idle_procs  0   0.00   0   0  0.00

 4 lpar2           aixlinux  Unknown                  Running         3.7 ded      ded     share_idle_procs  0   0.00   2   0  0.00

 5 lpar3           aixlinux  Unknown                  Not_Activated   0.0 ded      shared  uncap             0   0.00   0   0  0.00

 6 lpar4           aixlinux  Unknown                  Not_Activated   0.0 ded      shared  uncap             0   0.00   0   0  0.00

—————————————————————————————————————————————–

LPAR TOTALS                                                             3.7                                        0.00   2      0.00
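The ENT/VP column in the output above is each LPAR’s entitlement divided by its virtual processor count, and the awk code in version 1.4 of Dean’s script sets the ratio to NA when the virtual processor value is 0. Here is a stripped-down sketch of just that calculation. The function name ent_vp_ratio is mine and is not part of Dean’s script:

```shell
#!/bin/sh
# ent_vp_ratio ENTITLEMENT VIRTUAL_PROCS
# Prints the entitlement-to-virtual-processor ratio with two decimals,
# or NA when there are no virtual processors (avoids dividing by zero).
ent_vp_ratio() {
    awk -v ent="$1" -v vp="$2" 'BEGIN {
        if (vp == 0) print "NA"
        else printf "%.2f\n", ent / vp
    }'
}

ent_vp_ratio 0.40 4    # prints 0.10
ent_vp_ratio 0.00 0    # prints NA
```

A low ratio simply means the LPAR has many virtual processors relative to its entitled capacity, as with the 0.40 entitlement spread across four virtual processors in the first system above.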

If you want to see how both scripts look with the proper spacing, check out the attached PDFs below.

Let me know how the script works for you. And many thanks to Dean for giving permission to post it here.

VIOS Installation via GUI

Edit: Some links no longer work.

Originally posted May 28, 2013 on AIXchange

When you build a VIO server on HMC 7.7.7.0 SP1, there’s a new option to make your life easier.

Check out the HMC readme:

“Add a GUI enhancement for the installation of VIOS, allowing the user to install the Virtual I/O Server and managing Virtual I/O Server images using a GUI interface.”

The readme also points to future support (delivered by service pack) for importing VIO server images via FTP.

This new functionality makes it easier for “non-AIX” people to install VIOS. It seems particularly helpful if you’re dealing with one of the smaller server models with a split backplane, since these models make it physically impossible to load the VIO server by attaching the internal system DVD to both sets of internal disks.

In any event, as soon as I learned of this option, I naturally wanted to try it out. So I first verified my HMC version:
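If you would rather check from a command line than from the GUI, the HMC’s lshmc -V command reports the version. Its exact output format varies by release, so this sketch does not try to parse it. Instead it assumes you pass in the dotted version string by hand and compares it to the 7.7.7 level mentioned above. The function name version_ge is my own:

```shell
#!/bin/sh
# version_ge A B: succeed if dotted version A is greater than or equal
# to dotted version B. Missing trailing components count as 0.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Leading component of each version string
        x=${a%%.*}
        y=${b%%.*}
        [ -z "$x" ] && x=0
        [ -z "$y" ] && y=0
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the component that was just compared
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

if version_ge 7.7.7.0 7.7.7; then
    echo "HMC level is new enough for the GUI install option"
fi
```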

I then created a VIO partition as I typically would, but the first time I activated my new VIO server I saw something new:

I selected yes to install the VIO server, and I got some new options:

It will assume you’re booting over the network since your install images will reside on your HMC, and you’ll see:

Be sure you don’t have any open and connected console windows, because the subsequent error message doesn’t mention the console. It only says your network adapter cannot be detected, and you won’t be able to activate your VIO partition. 

This is a totally different way to install. It’s not necessary to select SMS or open a console window as you might have done in the past. Of course you can still use the old installation method if you prefer.

On the next screen you can specify where you’re installing VIOS from. In my case I was installing from DVD, but remember that this DVD is now physically located in the HMC:

I entered the IP address, subnet mask and gateway as requested. When I selected OK, I got this screen:

The installation process started to run the commands under the covers, using my HMC as a network installation server:

After it copied the DVD information from DVD1, I got:

I then loaded the second VIO install DVD into the HMC, and processing began:

After it had copied both DVDs, flashing messages appeared about powering up the profile, doing ping tests, etc. The messages scroll by in the window on your screen. Unfortunately, the newest messages aren’t written to your screen; they’re located at the bottom of a log file, so you’ll have to manually scroll down to view them. Expect to be annoyed whenever new messages pop up. Alternatively, you can ssh into the HMC and check /var/log/nimol.log, but it would be nice if the messages appeared in a friendly way on the screen.

In the log file I saw quite a few entries. This is just a taste of what you will find:
    ioserver nimol: ,info=LED 610: mount  -r 10.44.3.108:/extra/default1/SPOT/usr /SPOT/usr
    ioserver nimol: ,-S,booting,ioserver
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/mksysb /NIM_BOS_IMAGE
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/bosinst.data /NIM_BOSINST_DATA
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/lpp_source /SPOT/usr/sys/inst.images,
    ioserver nimol: ,info=extract_data_files
    ioserver nimol: ,info=query_disks
    ioserver nimol: ,info=extract_diskette_data
    ioserver nimol: ,info=setting_console
    ioserver nimol: ,info=initialization
    ioserver nimol: ,info=verifying_data_files
    ioserver nimol: ,info=prompting_for_data_at_console
    ioserver nimol: ,info=BOS install 1% complete : Making boot logical volume.
    ioserver nimol: ,info=BOS install 2% complete : Making paging logical volumes.
    ioserver nimol: ,info=BOS install 3% complete : Making logical volumes.
    ioserver nimol: ,info=BOS install 4% complete : Forming the jfs log.
    ioserver nimol: ,info=BOS install 5% complete : Making file systems.
    ioserver nimol: ,info=BOS install 6% complete : Mounting file systems.
    ioserver nimol: ,info=BOS install 7% complete
    ioserver nimol: ,info=BOS install 7% complete : Restoring base operating system.
    ioserver nimol: ,info=BOS install 7% complete : 0% of mksysb data restored.
    …Skipping…
    ioserver nimol: ,info=BOS install 89% complete
    ioserver nimol: ,info=BOS install 89% complete : Initializing dump device.
    ioserver nimol: ,info=recover_device_attributes
    ioserver nimol: ,-R,success
    ioserver nimol: ,info=BOS install 89% complete : Network Install Manager customization.
    ioserver nimol: ,info=BOS install 90% complete : Creating boot image.
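Since the percentage lines follow a regular pattern, a small filter can pull out just the latest progress figure instead of scrolling through the whole log. This is a sketch that assumes the "BOS install N% complete" wording shown above. The function name latest_bos_progress is made up for illustration:

```shell
#!/bin/sh
# Print the percentage from the last "BOS install N% complete" line
# read on standard input. Lines without that pattern are ignored.
latest_bos_progress() {
    sed -n 's/.*BOS install \([0-9][0-9]*\)% complete.*/\1/p' | tail -n 1
}

# Sample lines taken from the log excerpt above
latest_bos_progress <<'EOF'
ioserver nimol: ,info=BOS install 7% complete : Restoring base operating system.
ioserver nimol: ,info=BOS install 89% complete : Initializing dump device.
ioserver nimol: ,-R,success
ioserver nimol: ,info=BOS install 90% complete : Creating boot image.
EOF
```

On a real install you would feed it the contents of /var/log/nimol.log from the HMC rather than a pasted sample.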

In one instance my network guys hadn’t set up the switch to ensure that the port was on the right VLAN, so my ping test between the new VIO server and the HMC failed. If this happens to you, you won’t be able to simply return to the screen where you entered the IP information. You’ll have to start over. So make sure your physical network is ready to go when you try this.

For a second test, instead of selecting the DVD, I tried the local repository option:

When I selected local repository and clicked on import, I got this screen:

I gave the server image file a name and clicked OK:

After DVD1 finished, it asked for DVD2:

I clicked OK once more. There was no indication that the second DVD was being read, but something must have been processed, because after a while it returned to my install screen:

Again I filled in my network information and clicked OK. Instead of reading from the DVD in the HMC, it was reading directly from the HMC’s local disk repository, much like the way VIO servers read from the virtual media repository that we can create on our VIO servers. Obviously using the disk image in the repository was much faster than using the DVD media, and I didn’t need to physically remove DVD1 and insert DVD2. This made subsequent installs less painful, as no one needed to visit the raised floor.

You’ll see status updates in the bar in the upper left, like:
    Network booting install adapter
    Network boot proceeding
    Starting installation
    Installation in progress
    Installation completed

At this point, the progress bar keeps going to the right as the server installs.

I didn’t like that it just took control and started installing to a disk without allowing operator intervention. This could be dangerous if disks you don’t want overwritten simply appear on your VIO server. For instance, if I had data on hdisk0 and wanted to install specifically to hdisk1, I didn’t find an intuitive way to specify that I wanted to use hdisk1.

Once the install was running I opened my console window and was able to watch it install as I would normally expect to. At the same time I was logged into my HMC running tail -f /var/log/nimol.log so I could view the install progress.

I’ll write more as I continue to experiment with this.

So have you tried this method? Did you know this new functionality even existed?

The Value of an Open Mind

Edit: I still want to go back to Gilwell, happy land.

Originally posted May 21, 2013 on AIXchange

I’ve learned some interesting lessons about attitude lately. My sons participate in Boy Scouts. I was involved with Scouting at their age, and while I enjoyed the campouts and other outdoor activities, I never worried about rank advancements. And I certainly didn’t care much for uniforms.

In a way I’m more invested in Scouting now, thanks to my boys. Early on I would fill in for other adult leaders during campouts and other activities. But even though I was helping out by showing up and providing the necessary minimum two-deep leadership, I didn’t do much beyond serving as a chaperone.

For some time I was encouraged to attend a training program called Wood Badge. When I first heard about it, I imagined a bunch of gung-ho Scout dads roughing it in the woods, tying knots and climbing trees. That first impression — the one concocted entirely in my mind — made me hesitate. I was already a father. What sort of week-long training did I need to look after a bunch of kids?

Eventually, I relented and took the training, but this didn’t immediately change my attitude. I was surly when I arrived, still questioning in my mind the necessity of this experience. However, it didn’t take long for my preconceived notions to transform.

For one thing, many of the Wood Badge participants were moms. They led Cub Scout packs and also wanted to learn how to be more effective leaders for the boys they work with. Even the fact that some of us were older and out of shape made me look at things differently: none of these people really lived up to my image of a “super scouter.” Although I went into it as someone who was baffled by the skits and songs and general silliness, by the end of the training I was enjoying the interaction, and looking forward to singing about Gilwell and Happy Land.

This is the course description found on Wikipedia:

            “Wood Badge is a Scouting leadership program and the related award for adult leaders in the programs of Scout associations throughout the world. Wood Badge courses aim to make Scouters better leaders by teaching advanced leadership skills, and by creating a bond and commitment to the Scout movement. Courses generally have a combined classroom and practical outdoors-based phase followed by a Wood Badge ticket, also known as the project phase. By ‘working the ticket,’ participants put their newly gained experience into practice to attain ticket goals aiding the Scouting movement.

            “On completion of the course, participants are awarded the Wood Badge beads to recognize significant achievement in leadership and direct service to young people. The pair of small wooden beads, one on each end of a leather thong (string), is worn around the neck as part of the Scout uniform.”

Admittedly, those words don’t make the Wood Badge course seem very interesting, but I now know from experience that when you’re out in the woods, actually camping and practicing outdoor skills, the instruction holds quite a bit of value. Honestly, the training was outstanding. Wood Badge training consists of six 16-hour days, covering skills like listening, communication, team building and dealing with change. There are games and physical activities that help you learn to work as a team to solve problems. These are all skills that certainly apply to Boy Scouts, but they’re also applicable to our jobs and our personal interactions.

Many others who took the course were as pleasantly surprised as I was. Some said it was much more effective than corporate training they’d been through. By the end I had to agree. We were all strangers coming in, but we bonded more than you’d think possible. And we all wanted to be better Scout leaders and get more involved in the program.

My one regret is I didn’t do it sooner. I wasted years by being so close-minded.

So does this experience have anything to do with working on IBM Power Systems? I think so. Obviously, it showed me the value of a good attitude. It also demonstrated, once again, that as much as you can learn by reading, nothing beats hands-on training.

I’m now working on my Wood Badge tickets, and hoping to someday return to Happy Land. So what are you working on? What goals have you set for yourself? How do you plan on accomplishing them?

A Big Step Forward in Storage

Edit: Some links no longer work.

Originally posted May 14, 2013 on AIXchange

As a consultant I get to play with some cool, cutting-edge technologies. However, I have yet to get my hands on a half-petabyte storage array, consisting of only flash drives:

            “On the 12-hour flight from Zurich to San Francisco, the two scientists plotted out the fastest way to install and setup the two racks — each filled with 240 terabytes of Flash provided by Texas Memory Systems (an acquisition IBM completed in October 2012), as well as 10 IBM Power 730 Express servers.

            “‘This demonstration marks a tipping point for transactional workloads. It’s the first time Flash storage has outperformed hard disks in all aspects, including capacity and performance density, and cost per Input/Output Operations Per Second (IOPS) and energy efficiency,’ Ioannis said.

            “By the numbers, the two achieved a remarkable feat: the IBM Flash System 820 achieved more than 6 million IOPS running an IBM DB2 workload on IBM Power servers.

            “‘In terms of energy our system runs on 19 kilowatts compared to 4.5 megawatts with high capacity hard disks, a 236 fold improvement,’ Nikolas said.”

This article points to IBM’s claim that flash “can speed the response times of information gathering in servers and storage systems from milliseconds to microseconds – orders of magnitude faster. Because it contains no moving parts, the technology is also more reliable, durable and more energy efficient than spinning hard drives.” According to the article, by year end IBM will open 12 “flash competency centers” worldwide for the purpose of introducing its customers to the technology.

A solution that uses less energy while providing massively superior performance? Sign me up. Seriously, I’m hoping I can visit one of those flash competency centers soon.

One more thing from this article:

            “A deal has been announced between IBM and Sprint Nexel involving the installation of nine flash storage systems in Sprint’s data centre, amounting to 150TB of flash capacity. Flash is used to accelerate Sprint Nexel’s phone activation application and the company is expanding its use of the technology to other parts of the data centre. Sprint has a strategy to move its most active data to all-flash storage systems.”

Even on home systems, I’ve seen huge performance gains when going with solid-state drives (SSD) compared to hard disk drives (HDD). Although SSD costs are still higher, they seem to be dropping, and (knock on wood) I have yet to experience a failure with my drives.

Perhaps you can get your toes wet with something like this:

            “Storwize V7000 includes IBM System Storage Easy Tier, a function that responds to the presence of [SSDs] in a storage pool that also contains [HDDs]. The system automatically and non-disruptively moves frequently accessed data from HDD MDisks to SSD MDisks, thus placing such data in a faster tier of storage.

            “Easy Tier eliminates manual intervention when assigning highly active data on volumes to faster responding storage. In this dynamically tiered environment, data movement is seamless to the host application regardless of the storage tier in which the data resides. Manual controls exist so that you can change the default behavior, for example, such as turning off Easy Tier on storage pools that have both types of MDisks.”

Some people use external HDDs to store lots of media files, but rely on SSDs for their main system. Manually moving the larger, less frequently accessed files to another storage medium is something I like to call “poor man’s tiering.”
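For anyone doing this by hand, finding the candidates is the tedious part. Here's a minimal sketch using plain find; the function name cold_files and the example path are made up for illustration, and it assumes the filesystem actually records access times (many are mounted with noatime, in which case this tells you nothing useful).

```shell
# cold_files DIR MIN_BLOCKS DAYS
# List regular files under DIR that are larger than MIN_BLOCKS 512-byte
# blocks and have not been read in more than DAYS days. These are the
# candidates for moving to slower, cheaper storage.
# Hypothetical helper; it only lists files and never moves anything.
cold_files() {
    find "$1" -type f -size +"$2" -atime +"$3" -print
}

# Example (hypothetical path): files over 1 GB not read in 180 days.
# 2097152 blocks * 512 bytes = 1 GB.
# cold_files "$HOME/media" 2097152 180
```

Reviewing the list before moving anything is the whole point; Easy Tier makes that decision automatically, while this only gives you a place to start.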

Is SSD indeed the future of storage? Is there something else I should be watching for?

Verifying Firmware

Edit: Link no longer works.

Originally posted May 7, 2013 on AIXchange

Hopefully you’ve seen Nigel’s post about verifying firmware before installing.

Be sure to check the comments for more information from the developers. For instance:

            Prevention

            Before installing Power firmware, verify through the firmware release notes/readme information that the selected level is supported on the targeted server MTM.

            Example of 01AL770_032_032.readme.txt:

            System firmware level 01AL770_032_032

            System Firmware Release for the General Availability of the POWER7 System p Servers 8231-E1D, 8231-E2D, 8246-L1D, 8246-L2D, 8246-L1T, 8246-L2T, 8202-E4D, 8205-E6D

            Recovery

            Set Boot Side to P. From ASMI:

            – Expand the “Power/Restart Control” menu.

            – Select “Power On/Off System.”

            – Under the “Firmware boot side for the next boot” option, select “Permanent.”

             – Click “Save settings”. (NOT ‘save settings and power on’)

            – Reboot the system. From ASMI:

            – Expand the “System Aids” menu.

            – Select “Reset Service Processor.”

             – Click “Continue.”

            – Wait for the system to reconnect and show stable state in HMC GUI.

            Perform a Reject Fix operation. From HMC:

            – Select the applicable server.

            – Select the “Updates” menu.

            – Select “Change Licensed Internal Code for the current release.”

            – Select “Advanced features.”

            – Select “Reject Fix – Copy Permanent to Temporary.”

            – Click “OK.”

            After the Reject Fix is completed successfully, revert the system to the T side to enable concurrent updates. From ASMI:

            – Expand the “Power/Restart Control” menu.

            – Select “Power On/Off System.”

            – Under the “Firmware boot side for the next boot” option, select “Temporary.”

            – Click “Save settings.”

            – Expand the “System Aids” menu.

            – Select “Reset Service Processor.”

            – Click “Continue.”

            Power on the server.
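The Prevention step in that comment boils down to a text match between your server's machine type-model (MTM) and the list in the readme. A rough sketch of that check is below. The function name mtm_supported is hypothetical, the readme is assumed to be the plain-text file that comes with the firmware level, and this is no substitute for actually reading the release notes.

```shell
# mtm_supported MTM README
# Succeeds if the machine type-model (for example 8231-E2D) appears as a
# whole word in the readme file. Hypothetical helper, not an IBM tool.
mtm_supported() {
    grep -qwF -- "$1" "$2"
}

# Possible use on the AIX LPAR (uname -M is assumed to print something
# like "IBM,8231-E2D"; the readme file name is taken from the example above):
#   mtm=$(uname -M | cut -d, -f2)
#   mtm_supported "$mtm" 01AL770_032_032.readme.txt || echo "STOP: $mtm not listed"
```

A match only tells you the model is named somewhere in the file, so treat a hit as a prompt to read that section, and a miss as a reason to stop.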

And also this comment:

            Abstract: REMOVING UNSUPPORTED POWER SYSTEMS FIRMWARE, SRC B1813463

            SYMPTOM: After applying an unsupported system firmware level to the temporary side of the FSP the system stops at SRC B1813463. To resolve this problem follow the steps below to remove the unsupported system firmware. Follow the instructions specific to the method used to update the code.

            IMPORTANT: Always consult firmware readme files and verify supported levels before updating or upgrading system firmware. HMC levels v7r6.3 and v7r7.2 include an update to verify the system firmware level is supported before allowing a firmware update or upgrade to begin.

            PROBLEM ISOLATION AIDS:
            – The system may be any of the following IBM servers:

            IBM Power 710 Express Server, Type 8231, models E1C, E2B
            IBM Power 720 Express Server, Type 8202, models E4B, E4C
            IBM Power 730 Express Server, Type 8231, models E2B, E2C
            IBM Power 740 Express Server, Type 8205, models E6B, E6C
            IBM Power 750 Express Server, Type 8233, model E8B
            IBM Power 755 Express Server, Type 8236, model E8C
            IBM Power 770 Server, Type 9117, any model
            IBM Power 780 Server, Type 9179, any model
            IBM PowerLinux 7R1 server, Type 8246, models L1C, L1S
            IBM PowerLinux 7R2 Server, Type 8246, models L2C, L2S

            – This tip is not option specific.
            – This tip is not software specific.

            – The system has the symptom described above.

            FIX: User must follow the guidelines listed below to remove the unsupported code. Follow the instructions depending on the method used to update the code:

            — HMC Managed Systems

            1) Using the ASMI, set Boot Side to Permanent.
               a) Expand the “Power/Restart Control” menu.
               b) Expand the “Power On/Off System” menu.
               c) Under the “Firmware boot side for the next boot” option, select “Permanent.”
               d) Click the “Save settings” button. DO NOT click the “Save Settings and Power On” button. It will cause the server to power on running the unsupported firmware side and require that you restart the procedure.
               e) Expand the “System Service Aids” menu.
               f) Select “Reset Service Processor.”
               g) Click the “Continue” button.

            Note: If this step is not completed the unsupported firmware will not be removed and SRC B1813463 will be displayed again.

            2) Using the HMC GUI, wait for the system to reconnect and show a state of “Power off.”
            3) Using the HMC GUI, perform “Reject Fix – [Copy Permanent to Temporary].”
               a) Select the applicable server.
               b) Select the “Updates” menu.
               c) Select “Change Licensed Int. Code for current release.”
               d) Select “Advanced features.”
               e) Select “Reject Fix – Copy Permanent to Temporary.”
               f) Click the “OK” button.
            4) Wait until “Reject Fix” has completed successfully.
            5) Using the ASMI, set the Boot Side back to Temporary and reset the service processor.
               a) Expand the “Power/Restart Control” menu.
               b) Select “Power On/Off System”.
               c) Under the “Firmware boot side for the next boot” option, select “Temporary.”
               d) Click the “Save settings” button.
               e) Expand the “System Aids” menu.
               f) Select “Reset Service Processor.”
               g) Click the “Continue” button.

            — Stand alone systems via USB
            — Not available for 9117-MMx and 9179-MHx servers.

            Updating firmware via USB is independent of the operating system installed. The only restriction is that the server cannot be HMC managed.

            1) Remove all system firmware present in the USB drives root directory.
            2) Download the RPM file for the latest supported firmware, then copy it into the USB drives root directory. (Note: Only one level of code should be contained in the USB root directory.)
            3) Insert the USB drive to the top port of the FSP (left side port for tower systems).
            4) Change the FSP Boot Side from Temporary to Permanent using either method [A] ASMI, OR [B] Operator (control) Panel.
               [A] Using the ASMI:
                   1) Expand the “Power/Restart Control” menu.
                   2) Expand the “Power On/Off System” menu.
                   3) Under the “Firmware boot side for the next boot” option, select “Permanent.”
                   4) Click the “Save settings” button.
               [B] Using the Operator (control) Panel.
                   1) Use the Increment or Decrement buttons to select Function 02.
                   2) Press the Enter button.
                   3) Press the Enter button until the field marker moves to the right of the character “T.”
                   4) Use the Increment or Decrement button to change the “T” to a “P.”
                   5) Reset the FSP using either method [A] ASMI, or [B] Performing a pin-hole reset, or [C] Removing AC power.
                          [A] Using ASMI:
                             1) Expand the “System Aids” menu.
                             2) Select “Reset Service Processor.”

Power Systems Best Practices

Edit: This is still a good document, but the link keeps changing.

Originally posted April 23, 2013 on AIXchange

Recently I received this set of slides from Fredrik Lundholm covering best practices for Power Systems with AIX. I’ll cover a few highlights, though honestly, I could discuss every slide. The information here is that valuable. So I highly recommend taking the time to view the entire thing.

If you download his slides, be sure to look at the notes. For example, on page 7, where he discusses a virtualized system design, the notes contain a couple of links relating to Entitled Software Support, including this ESS how-to guide.

Page 8 lists guidelines for capacity planning. Fredrik points out the rational starting places for your CPU and LPAR weights if no information is provided. The fact that you can make reasonable guesses without a ton of workload information just reminds me how forgiving this platform is. If things change, CPU and memory settings can be easily adjusted. Whole physical adapters can even be added or removed if necessary.

Page 9 covers firmware and using Microcode Discovery service and FLRT.

Page 11 tells you where to get fixes for the VIO server. The notes cover items that have been fixed in each release.

Page 12 covers network best practices. The notes contain a link to a step by step network configuration guide.

Page 13 shows a nice diagram of a shared Ethernet adapter load sharing configuration that is available in VIOS 2.2.1+.

Page 14 shows the recommended architecture when more than one VLAN is used.

Page 15 features a reminder about SEA and virtual Ethernet interfaces. Be sure to select large send and large receive; it’s not the default setting.

            For all SEA interfaces, chdev -l entX -a largesend=1   (survives reboot)

            For all SEA interfaces, chdev -l entX -a large_receive=1   (survives reboot)

Page 17 covers storage and the need to ensure that the correct multi-path drivers are installed.

Page 18 has a nice picture illustrating how the configured machines will look.

Page 19 covers setting up fc_err_recov and dyntrk, along with setting up no_reserve and round_robin.

From page 20: To allow graceful round robin load balancing over multiple paths, set timeout_policy to fail_path for all physical hdisks in the VIO server:

            # chdev -l hdisk0 -a timeout_policy=fail_path
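Since the advice is to set this on all physical hdisks, typing the command once per disk gets old quickly. Below is a small sketch that only prints the chdev commands so they can be reviewed first. The function name gen_chdev is hypothetical, and feeding it from lsdev assumes you're in a root shell on the VIO server (oem_setup_env) rather than the restricted padmin shell.

```shell
# gen_chdev ATTR=VALUE [ATTR=VALUE ...]
# Read device names on stdin, one per line, and print one chdev command
# per device. Nothing is executed; review the output, then run it.
# Hypothetical helper for illustration only.
gen_chdev() {
    attrs=""
    for a in "$@"; do
        attrs="$attrs -a $a"
    done
    while read -r dev; do
        if [ -n "$dev" ]; then
            echo "chdev -l $dev$attrs"
        fi
    done
}

# Possible use on the VIO server (not run here):
#   lsdev -Cc disk -F name | gen_chdev timeout_policy=fail_path
```

Disks that are in use may refuse the change, in which case chdev's -P flag (update the ODM now, apply at the next reboot) is the usual way around it. The same helper should work for the page 19 attributes on the fscsi devices and hdisks; take the exact values from the slides rather than from me.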

Page 21 has links to documentation for installing AIX. Page 22 has a nice chart illustrating good choices for running AIX. The red, green and yellow color coding is intended to help you decide which TL to run.

Page 23 lists AIX tuning and values that should be changed.

Page 24 covers AIX 5.3 memory tuning.

Page 26 has a nice tip: Largesend increases virtual Ethernet throughput performance and reduces processor utilization. Starting with AIX 6.1 TL7 sp 1 and AIX 7.1 sp 1, the operating systems that support the mtu_bypass attribute for the shared Ethernet adapter provide a persistent way to enable the largesend feature. To determine if the operating system supports the mtu_bypass attribute, run the following lsattr command [lsattr -El enX | grep mtu_bypass]. If the mtu_bypass attribute is supported, the… command will return:

            mtu_bypass off Enable/Disable largesend for virtual Ethernet True

            Enable largesend on all AIX en interfaces through:

            chdev -l enX -a mtu_bypass=on
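If you have more than a handful of interfaces, the check and the change can be scripted. This is a sketch only: mtu_bypass_state is a made-up name, it assumes the lsattr output looks like the line quoted above (attribute name in the first column, value in the second), and the loop in the comments is untested guidance rather than something from the slides.

```shell
# mtu_bypass_state
# Read `lsattr -El enX` output on stdin and print the current value of
# mtu_bypass ("on" or "off"), or "unsupported" if the attribute is not
# listed at this AIX level. Hypothetical helper.
mtu_bypass_state() {
    awk '$1 == "mtu_bypass" { print $2; found = 1 }
         END { if (!found) print "unsupported" }'
}

# Possible use on AIX (not run here): print a chdev for every en
# interface that supports the attribute but has it turned off.
#   for en in $(lsdev -Cc if -F name | grep '^en'); do
#       if [ "$(lsattr -El "$en" | mtu_bypass_state)" = off ]; then
#           echo "chdev -l $en -a mtu_bypass=on"
#       fi
#   done
```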

Page 27 shows the recommended vSCSI parameters on each client partition. Page 28 covers vSCSI Queue Depth tuning for different disk subsystems.

There is also a section on PowerHA. It’s recommended that new deployments go with PowerHA 7.1. Page 31 covers I/O pacing with PowerHA.

An FAQ starts on page 32. Here’s a tip I like:

            Q: How do I run nmon to collect disk service times, top process cpu consumption, etc?

            A: STG Lab services recommends the following parameters for nmon data collection:

            /usr/bin/nmon -M -^ -f -d -T -A -s 60 -c 1435 -m /tmp/nmonlog

            This will invoke nmon every minute and continue for 24 hours capturing vital disk access time data along with top processes.

            -d includes the Disk Service Time section in the view

            -T includes the top processes in the output and saves the command line arguments into the UARG section

            -^ includes the Fibre Channel (FC) sections

            On the HMC, there is an “Allow performance information collection” checkbox on the processor configuration tab. Select this checkbox on the partition that you want to collect this data. If you are using IVM… use the lssyscfg command, specifying the allow_perf_collection (permission for the partition to retrieve shared processor pool utilization) parameter. Valid values for the parameter are 0, do not allow authority (the default) and 1, allow authority.
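A quick note on the -s and -c values in that nmon command: the run length is simply the interval multiplied by the count, and 60 seconds times 1435 snapshots comes out a little short of a full day. Presumably that leaves a small gap before the next daily collection starts, though the slides don't say so. The helper below (nmon_duration is a made-up name) just does the arithmetic.

```shell
# nmon_duration INTERVAL_SECONDS COUNT
# Print how long an nmon run started with -s INTERVAL -c COUNT lasts.
# Hypothetical helper; it does not call nmon.
nmon_duration() {
    total=$(( $1 * $2 ))
    echo "$(( total / 3600 ))h $(( (total % 3600) / 60 ))m"
}

nmon_duration 60 1435   # prints "23h 55m"
```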

Starting on page 36 there are reference documents to older information, which may still be helpful for certain environments.

This is a fantastic set of slides with current, real world information and suggestions.

IBM i Turns 25

Edit: Some links no longer work.

Originally posted April 23, 2013 on AIXchange

Though the focus of this blog is AIX, there is value in discussing the other OSs that can run on IBM Power Systems: Linux, VIOS and IBM i. With that in mind, have you seen all the information and videos about IBM i turning 25?

While I primarily find myself on AIX these days, when I started in the late 1980s I worked on AS/400 systems, the predecessors to IBM i. Part of my job involved tending to a line printer that required us to change paper and forms. The most exciting part of the job was changing from green bar paper to white, and then back again (with an occasional run of custom forms thrown in).

The AS/400 was a great platform to work on as a computer operator. And compared to other operating systems of that era, OS/400 didn’t require much care and feeding. Those machines just ran.

I recall our IBM CE coming on site. He’d log in, look at logs and ask us how we were doing, but the only thing we ever really needed from him was to repair or replace the green screen displays we had connected to the AS/400 via twinax. He never had to actually do anything with the AS/400 box itself. Basically, the guy was our version of the Maytag repairman.

Of course over the past 25 years the AS/400 has gone through a few rebrandings. And over time IBM has brought IBM i and AIX together architecturally. One important thing AIX and IBM i now share in common is the capability to virtualize adapters using the VIO server. However, as AIX pros we are generally more comfortable with VIOS. Sometimes I hear IBM i folks complain about how complicated it is — and IBM is working to make VIOS more user friendly. But this is where, as an AIX/VIOS person, you can help your IBM i friends by configuring VIOS for them. Although you can certainly dedicate your adapters and direct connect to SAN storage, VIOS allows everyone to connect to the same SAN. That’s a nice advantage.

Speaking of the coming together of AIX and IBM i, you should know that COMMON, the conference that for years has centered on AS/400, iSeries, System i and IBM i technologies, continues to add more AIX content to its user group meetings. The one that took place in Austin, Texas, earlier this month had AIX courses covering application development, high availability, networking, systems management and web applications.

So did you know that IBM i is celebrating 25 years? Do you still make the mistake of calling it an AS/400?

If, like me, you worked on the AS/400 in the beginning, that’s one thing. But it’s neither technically correct — nor positive for the platform — to refer to today’s IBM Power Systems running IBM i as an AS/400. While it demonstrates the loyalty that users have always had to AS/400 systems, IBM Champion Trevor Perry points out that it needs to change.  As he states: “Conflicted people called it AS/400. Confused people called it iSeries. Confident people called it IBM i.”

I think AIX users can see his point. I mean, we love our systems, but I don’t know of anyone who still uses the name RS/6000. So what do you think? Does the name matter? Do you plan to step up and call it by its name, or are you going to remain conflicted and call it an AS/400?

The Search for Answers, the Need for Help

Edit: I still ask for help, hopefully you do too.

Originally posted April 16, 2013 on AIXchange

Sure, you work in the field of technology, but that doesn’t automatically make you a creature of social media. So really, how plugged in are you? From Facebook to Twitter to Google+ to news.google.com to plain old email, do you often see the jokes and memes and viral videos that go around the Internet? Or are you so insulated you not only don’t know that planking or the Harlem Shake fad is over, you never knew it was a thing to begin with?

Of course compared to 30 or even 20 years ago, we as a society have fewer and fewer shared experiences. Not that long ago there were four television channels (the three major networks and your local UHF station). People talked about the big TV events because everyone was watching the same things at the same times. You got to see Christmas specials once a year. The Grinch? Once. Rudolph? Once. There were no videos to rent, buy or download. Most households didn’t even have remotes, much less cable television and VCRs.

These days, someone might recommend a long discontinued show (Arrested Development, Firefly, Freaks and Geeks, IT Crowd, etc.) and — thanks to online services like Netflix or Hulu — you might binge on the entire series over one weekend.

To be sure, the way we consume mass media is changing. Even the most-watched programs now, like the Academy Awards or major sporting events, have significantly fewer viewers than what they enjoyed a generation ago. We’re at least as likely to find new music we like on YouTube or Internet radio or even in TV commercials as we are on what is now known as “terrestrial” radio.

If there’s a single vehicle for shared experiences today, it might be YouTube. Consider this presentation that’s generated more than 2 million views between YouTube and TED.com: It’s called “The Art of Asking,” and the presenter is a woman named Amanda Palmer.

I encourage you to watch the whole thing, but I’ll give you some highlights. Around the 9-minute mark she talks about how she got nearly $1.2 million from her Kickstarter fundraising project, and how “crowd-funding” worked for her. She talks about how her record label considered her a failure when she sold only 25,000 recordings. But it turns out that the same number of fans and supporters, around 25,000, created a successful Kickstarter project, and ultimately helped her raise $1.2 million. Selling 25,000 recordings may make you a “failure,” but getting 25,000 people to support you can make you a big success.

Around the 9:30 mark, Palmer mentions how she didn’t make anyone pay for her music; she only asked them to. By asking her audience, she connected with them. And she says when you connect with people, people want to help you.

Palmer concludes by saying we need to change from “how do we make people pay for music?” to “how do we let people pay for music?”

I think this phenomenon has always been a part of our world as IT pros. Because what we do is complex, and no one person has all the answers, we rely on one another. Many people — readers, clients, friends, what have you — ask me for help. And I can assure you that I get help from countless people. Sure, we give each other a hard time. We joke and fool around and say just RTFM. But over the years I’ve developed a mental list of trusted advisors, people I know who know things. I ask, they help. They ask, I help.

Oftentimes help comes in the form of simply answering a question. In your work, when you search for an answer to a technical matter, you’re exercising faith not only that someone has found the answer, but that they’ve taken the time to put the correct answer out there. Many of my posts are based on real-life experiences. In this blog I attempt to share questions that were answered and things that were discovered. But you don’t need a blog to help others find answers. You can always share what you know in the comments section here or in any other forums you frequent. Your thoughts, ideas and experiences may one day be the answer someone else is searching for.

When people really need assistance, don’t you want to help them?

People Always Make the Difference

Edit: Still one of my favorite places to visit.

Originally posted April 9, 2013 on AIXchange

I recently wrote about visiting customer locations. I didn’t mention it then, but one visit really stands out. I had no problem finding the place, and there was nothing awe-inspiring about the physical environment. What I’ll always remember is how I was treated when I arrived.

Upon entering I was immediately greeted by a security guard — literally, greeted. He’s one of the happiest people I’ve ever met. He welcomed me, asked my name and showed me to the receptionist so I could get signed in and connected with the folks who were waiting for me.

Interacting with this person throughout the day, I noticed something. I wasn’t special. He greeted everyone that came through the door like an old friend. If he didn’t know someone, he asked for a name, and he remembered it.

The other thing that struck me was the reaction to the security guard. While most of the visitors smiled and nodded, only a handful ever actually uttered any response. I asked about this, and he told me that this was typical — his friendliness generally wasn’t reciprocated.

I honestly felt bad to hear that. I wondered how he kept such a positive outlook in the face of constant indifference. After all, it’d be easy to conclude that his efforts simply weren’t worth it.

Then he said something I’ve heard a thousand times, but never fully appreciated. He told me he can’t control anyone’s attitude except his own. Despite the lack of response, he chooses on a daily basis to be happy at work and greet everyone by their name. Long story short, this guy’s choice really brightened my day — really, several days. It was a long-term project and I made several follow-up trips to that facility. The security guard always greeted me by name and with a smile.

Smile. Say hello. Remember names. It seems so simple, it seems so trivial. Yet these small gestures really do matter. You can have the world’s most luxurious facility, but people always make the difference. I visit a lot of customer locations, and I could write about some amazing, pristine work environments. But this experience means more to me. Given the choice, I’d much rather work with happy folks in an old building in the middle of nowhere.

It might not be a big deal, but ask yourself, right now, are you in a good mood? Are you smiling? Or are you having a bad day? Everyone has bad days of course, especially when confronted by external issues and problems beyond your control. Still, if you are having a bad day, could it be better if you just made the choice to be happier?

Open Source AIX Software Remains Plentiful

Edit: Some of these links no longer work.

Originally posted April 2, 2013 on AIXchange

Remember the UCLA freeware repository? This post is part of a discussion surrounding the repository going offline back in 2007. As Nigel wrote at the end of this thread:

“There are still people active in this area. Take a look at www.perzl.org/aix. I got Apache, PHP, rrdtool and the wonderful Ganglia (with POWER5/6) enhancements from here. I would also recommend telling your local IBM representative that you think this needs to be fixed. Customer pressure is a good incentive for IBM to get organized, sort this out and eventually works.”

As this old post shows, perzl.org has been around for a while, though plenty of admins are unaware of it. For instance, just recently when a customer was interested in getting gnupg working on AIX and they were having trouble getting the package dependencies worked out, I referred them to this tip:

            “A solution to the RPM dependency… problem. I guess everybody who has installed a couple of RPM packages using rpm itself and not the help with a tool like yum ran into the following issue:

            1) You have downloaded and want to install RPM aaa.rpm.
            2) aaa.rpm has dependency on bbb.rpm and ccc.rpm.
            3) bbb.rpm has dependency on ddd.rpm and ccc.rpm on eee.rpm and fff.rpm.
            4) etc. 

“So you end up circling through all your RPM files and downloading all prerequisite RPM files just to install aaa.rpm. This can become quite annoying and time-consuming for packages with lots of dependencies. This is actually where a tool like yum is helping you a lot because it does all the steps outlined above for you. Unfortunately, I have so far found no way of compiling and providing YUM for AIX that could be done in a compatible manner (to the IBM provided RPM) as AIX still uses the old V3.0.5 version of RPM while all RPM-based Linux distributions have switched to RPM V4.X a long time ago. Also all recent YUM versions require at least a RPM version >= 4.4.

“My solution approach to this problem:

• Basically what you want is a complete and self-contained list of dependencies for the RPM file aaa.rpm.
• You download all the RPM packages on this list (make sure that you have downloaded them all into a separate directory which was empty before).
• After downloading all the RPM packages on the list you can just install the RPM file aaa.rpm as easy as rpm -Uvh *.rpm
• This approach mimics kind of the AIX NIM behavior of a software bundle (the list here) and a lpp_source (the separate directory containing all required RPM files).”

Read more in the Perzl.org FAQ.
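To make the idea of a "complete and self-contained list" concrete, here's a toy sketch that walks the aaa/bbb/ccc example from the quote and prints everything aaa.rpm ends up needing. The dependency map and the function names are invented for illustration; I don't know how the real .deps files on the site are generated, so don't read this as that tool.

```shell
# Hypothetical dependency map: "package: direct dependencies".
DEPMAP='aaa: bbb ccc
bbb: ddd
ccc: eee fff'

# direct_deps PKG: print the direct dependencies of PKG, one per line.
direct_deps() {
    printf '%s\n' "$DEPMAP" |
        awk -v p="$1:" '$1 == p { for (i = 2; i <= NF; i++) print $i }'
}

# all_deps PKG: print PKG and everything it needs, directly or
# indirectly, each package once, in the order it is discovered.
all_deps() {
    seen=" "
    while [ $# -gt 0 ]; do
        pkg=$1
        shift
        case "$seen" in
            *" $pkg "*) continue ;;   # already listed, skip it
        esac
        seen="$seen$pkg "
        echo "$pkg"
        # Queue this package's own dependencies (word splitting intended).
        set -- "$@" $(direct_deps "$pkg")
    done
}

# all_deps aaa   lists aaa, bbb, ccc, ddd, eee, fff (one per line)
```

The "seen" check is what keeps a package that several others depend on from being listed, and downloaded, more than once.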

In the meantime, hopefully this procedure can help someone else with a similar situation. Just get whichever .deps file you’re interested in for the package you want to install:

1) If wget is not already installed on your system, download wget to /tmp/gnupg (or some other temporary location)

ftp://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/wget/wget-1.9.1-1.aix5.1.ppc.rpm

Install the rpm with: rpm -ivh wget*rpm

2) Use wget to download the gnupg rpm dependency file to /tmp/gnupg.

Use this file for AIX 7.1:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix71/gnupg-1.4.13-1.aix5.1.ppc.deps

Use this file for AIX 6.1:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix61/gnupg-1.4.13-1.aix5.1.ppc.deps

Use this file for AIX 5.3:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix53/gnupg-1.4.13-1.aix5.1.ppc.deps

3) From /tmp/gnupg, run wget -B http://www.oss4aix.org/download/everything/RPMS/ -i gnupg-1.4.13-1.aix5.1.ppc.deps.

This will download the dependencies needed to install gnupg.

4) Run rpm -Uvh *rpm. The dependencies are now installed. (On one test LPAR I got a warning about a conflict with /opt/freeware/man/man3/Thread.3. I got past it by running rpm -Uvh --force *rpm.)

5) Download gnupg:

wget http://www.oss4aix.org/download/RPMS/gnupg/gnupg-1.4.13-1.aix5.1.ppc.rpm

6) Install with rpm -ivh gnupg*rpm.

You should now have gnupg on your system.

            /opt/freeware/bin/gpg --version
            gpg (GnuPG) 1.4.13
            Copyright (C) 2012 Free Software Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later
            This is free software: you are free to change and redistribute it.
            There is NO WARRANTY, to the extent permitted by law.

            Home: ~/.gnupg
            Supported algorithms:
            Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
            Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
                    CAMELLIA128, CAMELLIA192, CAMELLIA256
            Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
            Compression: Uncompressed, ZIP, ZLIB, BZIP2
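The three dependency-file URLs in step 2 differ only in the AIX level, so that choice is easy to script. The sketch below builds the URL from the kind of string oslevel prints (7.1.0.0, for example). The function name deps_url is made up, and only the three levels listed in the procedure are handled.

```shell
# deps_url OSLEVEL PACKAGE
# Map an AIX level such as 7.1.0.0 to the matching dependency-list URL
# on oss4aix.org. Hypothetical helper; it only builds the string and
# downloads nothing.
deps_url() {
    case "$1" in
        7.1*) dir=aix71 ;;
        6.1*) dir=aix61 ;;
        5.3*) dir=aix53 ;;
        *) echo "no dependency list known for AIX $1" >&2
           return 1 ;;
    esac
    echo "http://www.oss4aix.org/download/rpmdb/deplists/$dir/$2.deps"
}

# Possible use on AIX, following steps 2 and 3 above (not run here):
#   cd /tmp/gnupg
#   wget "$(deps_url "$(oslevel)" gnupg-1.4.13-1.aix5.1.ppc)"
#   wget -B http://www.oss4aix.org/download/everything/RPMS/ -i gnupg-1.4.13-1.aix5.1.ppc.deps
```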

Here’s a list of open source tools in addition to perzl.org. Perhaps you’ll recognize some. Even better, perhaps you’ll find something that’s new to you. Also check out IBM AIX Toolbox download info and Bull AIX freeware.

So which of these open source software repositories do you use and recommend?

The Value of Test Systems

Edit: I still love test labs.

Originally posted March 26, 2013 on AIXchange

Two weeks ago I asked readers to recommend some resources for IT pros who are new to AIX. The first comment was simply this:

“Can’t beat playing around on a test system!”

I couldn’t agree more. I write plenty about the value of training and how it’s worth your time to read IBM Redbooks, and these things are great. Still, nothing beats hands-on learning. I know that back when the size of a JFS filesystem couldn’t be reduced, I was very grateful that my indoctrination to growing a file system came on a test box rather than a production system. I was new, and I needed that practice playground. I think it’s unfortunate that so many customers who switch to AIX from another operating system neglect to add at least one test box to this new environment. At least one. And with multiple test machines, it becomes possible to do things like PowerHA, shared storage pools and live partition mobility testing. With the reasonable cost of current 710 and 720 models, I’m amazed that more customers don’t automatically add test machines to their hardware orders.

And speaking of training, if your boss doesn’t want you out of the office for a week attending an educational conference, tell him there’s an alternative: Just get me a test box. I think every IT pro understands that there’s a huge difference between reading about something and actually doing it. A test box is like a classroom that’s always open and available.

What do you need a test box for? What don’t you need it for? When a test box is available, programmer/administrator mistakes are learning opportunities rather than lost uptime. Test boxes are where we learn, where we validate, where we get comfortable with the technology. If you ask around, I think you’ll find that the people who excel at their jobs generally have spent considerable time on test systems. Certainly access to test hardware makes for more confident admins.

Now, if your employer absolutely won’t pay for one, there are other ways to access a test system. IBM has a virtual loaner program available to business partners. With this you can at least log on to the command line of a remote AIX system. Of course this isn’t the same as having a test box onsite, available whenever you want to play around with it.

While it’s frustrating thinking about customers that won’t provide test boxes, what’s even worse is hearing from IT pros who don’t use the precious access they have. I really do hear some complain, “I have the lab, but I have no time to use it.” Geez… make time! Skip an episode or two of “Mad Men” or “Big Bang Theory” or “Scooby Doo” or whatever it is that people watch these days. If it really matters to you, you’ll find the time to further yourself professionally.

So do you work with test machines? Does your employer provide them or did you break down and buy an old POWER5 box off of eBay? Please share your experiences in comments.