Sometimes Folks Only Learn The Hard Way About Keeping Software Current

Edit: My third post for IT Jungle

Originally posted February 1, 2021

School’s been out for me for a very long time, but I still enjoy learning. I gain a sense of satisfaction whenever I learn something new. Specific to technology, exposure to new concepts helps me understand how things work together. I cannot count the number of times I’ve watched over someone’s shoulder, or watched someone on a shared screen, to learn about a new tool or technique, or a different way to set up my desktop or environment.

Watching and listening to people is my preferred way to learn, but other forms of education – reading IBM Redbooks and other documents, or articles, or watching webinar replays – are also worthwhile. Pick a topic in the Power Systems ecosystem – the pros and cons of a virtual HMC, how many physical adapters you can fit into a given machine, how to transfer files or back up machines, CL or RPG programming techniques, updates to the open source environment on IBM i, or general best practices – and I’ll happily soak it all up.

Digging into new topics – even if it’s only an inch deep – is especially important for IT pros in smaller workplaces. At smaller companies, fewer people wear more hats, and they’re typically asked to do more.

Even if I don’t use this information immediately, even if I never use it, I still value the experience of learning. I like to know what’s possible. To me it’s worth the time to get exposed to the concepts, and the less familiar I am with something, the more motivated I am to read about it. You never know when some tidbit of information that you’ve absorbed will come in handy. Having even the slightest introduction to a topic makes it much easier to conduct a search or ask a question later on.

Scott Berkun offered a similar perspective on Twitter recently: “If you’re experienced in your job, a great way to grow is to study something else. Go read a book about or go to a conference relating to something you know little about. You’ll ask big questions. You’ll learn new models and thoughts. You’ll return to your work with fresh eyes.”

In that vein, many of you may be unfamiliar with the inner workings of the PowerVM Virtual I/O Server, a.k.a. VIOS. It’s worth learning more about this topic though, because action needs to be taken in virtualized environments.

VIOS allows you to virtualize Power Systems servers. While I see a great deal of it in AIX and Linux on Power environments, it also supports a growing number of IBM i workloads. If you’re a VIOS user, you should know that, as of October 2020, VIOS 2.x is no longer supported by IBM. And you should understand why it’s important to move to VIOS 3.1.x as soon as possible.
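If you’re not sure where a given VIO server stands, checking takes seconds. From the padmin command line, ioslevel reports the VIOS version; anything that comes back starting with 2 is now out of standard support:

    $ ioslevel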

The withdrawal of support for VIOS 2.x doesn’t mean that you’re completely out of options for getting support. As is the case with most withdrawn offerings, IBM will still provide extended support. But paying the premium isn’t the only issue, or even the primary issue.

Also, this isn’t a boilerplate plea to move to the latest and greatest. You should move because VIOS 3.x is fundamentally different from VIOS 2.x. The updated version is designed to function more efficiently with newer Power Systems hardware (Power8 and later). VIOS 3.x runs AIX 7.2 under the covers, whereas VIOS 2.x runs AIX 6.1. If you’re not well-versed in the AIX operating system numbering scheme, AIX 6.1 arrived in the 2007 timeframe, and 6.1 TL9 reached end of support in April 2017. So under the covers, those versions of VIOS were getting long in the tooth. The new VIOS code, as noted, is better able to exploit the improvements in the Power hardware (think of being able to dispatch to more threads, for instance). In addition, IBM removed legacy code from VIOS (does IBM Systems Director ring a bell?), resulting in overall performance improvements with VIOS 3.x.
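You can see this for yourself from the VIOS command line. A minimal sketch: oem_setup_env drops you from the restricted padmin shell into the underlying root shell (note the prompt change), and oslevel -s then reports the AIX level underneath – a 6100-xx string on VIOS 2.x, a 7200-xx string on VIOS 3.x:

    $ oem_setup_env
    # oslevel -s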

As a refresher, storage can be presented to a VIO client through a VIO server using either NPIV (N_Port ID Virtualization) or vSCSI (virtual SCSI). Each option has advantages and disadvantages, but these days I see NPIV more often than not.
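If you’re not sure which method a given VIO server is using, lsmap will show you both sides of the house (a quick sketch; adapter names vary by environment):

    $ lsmap -all          # vSCSI server adapters (vhostX) and their backing devices
    $ lsmap -all -npiv    # virtual Fibre Channel adapters (vfchostX) and their client logins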

The following description of a separation of duties mostly applies to larger shops, where different teams have different roles, versus a smaller shop where one guy may do it all. With NPIV, the SAN guys zone LUNs directly to the client LPAR, whereas with vSCSI they zone LUNs to the VIO server itself, and it’s an extra step for the Power Systems administrator to then map each allocated LUN to the client IBM i LPAR. SAN guys generally prefer NPIV for the added visibility into which LPARs are using which LUNs; rather than watching a LUN disappear into a black box (the VIO server), they can see exactly where it lands. In a smaller shop, since the same guy is doing the zoning and the mapping, it may be less of an issue.
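To make that extra step concrete, here’s a hedged sketch of the mapping commands on the VIO server, with hypothetical device names. With vSCSI, every LUN zoned to the VIOS has to be individually mapped to the client’s virtual adapter; with NPIV, you map a virtual Fibre Channel adapter to a physical port once, and LUNs zoned to the client flow straight through:

    $ mkvdev -vdev hdisk5 -vadapter vhost0 -dev ibmi_disk0    # vSCSI: map LUN hdisk5 to the client's vhost adapter
    $ vfcmap -vadapter vfchost0 -fcp fcs1                     # NPIV: tie virtual FC adapter vfchost0 to physical port fcs1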

With the code changes that have occurred, it’s important to recognize that the process of upgrading to VIOS 3.x differs from what you’re accustomed to. You’re not simply putting on a fix pack or service pack; you’re doing an under-the-covers OS upgrade from AIX 6.1 to AIX 7.2. Upgrading is still fairly straightforward, but it does require some planning and preparation, so approach it carefully.
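For contrast, a routine fix pack or service pack is a single updateios command once the update files are staged; the jump to 3.x is a different animal. A sketch of the familiar routine, with a hypothetical staging directory:

    $ updateios -accept -install -dev /home/padmin/update    # the old way: apply a fix pack in place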

The good news is that most environments set up VIO servers in a dual VIOS configuration, which provides multiple paths to storage and the network and allows maintenance activities to occur without affecting running client LPARs. The idea is that you can upgrade VIO server 2, test it, fail everything over to that second server, and then conduct maintenance on VIO server 1. Of course, you should subject your VIO failover process to regular testing. As with any high availability solution, or even system backups, if you do not test, you cannot be sure that things will work correctly when you need them to.
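A failover test should start with verifying that the surviving VIO server actually has what it needs. A minimal sketch of the sanity checks I’d run before failing clients over (the padmin shell lets you pipe to grep):

    $ lsmap -all -npiv | grep Status    # every client vfchost should show LOGGED_IN
    $ errlog                            # look for recent adapter or path errors before pulling the trigger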

IBM has tools and methods to help you perform the upgrade, but the basic idea is that you back up your configuration, do a fresh VIOS install, and then restore the configuration information onto your new VIOS copy. A friend has been performing quite a few of these upgrades lately. Although IBM provides a tool to back up your data, he prefers to gather the configuration data himself.
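The IBM tool in question is viosbr, which captures the virtual and logical device configuration into an archive you can restore onto the fresh install; the newer viosupgrade command goes a step further and drives the whole backup-install-restore flow itself. Either way, capture everything before you touch anything. A sketch, with example file names, and hedged on the exact viosupgrade options, which vary by scenario:

    $ viosbr -backup -file /home/padmin/cfgbackup    # writes cfgbackup.tar.gz with the device configuration
    $ lsmap -all > /home/padmin/vscsi_maps.txt       # belt and suspenders: keep human-readable copies too
    $ lsmap -all -npiv > /home/padmin/npiv_maps.txt
    $ viosupgrade -l -i vios_3.1_mksysb -a hdisk2    # new-install upgrade to a spare disk, restoring the config afterward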

In any event, he recounted a unique experience. He’d been told that, in this particular environment, all of the IBM i clients were connected to their storage via NPIV. After gathering the necessary data and doing the fresh install of the VIOS code, he went ahead and restored the network and NPIV configurations. He then failed the clients over to his newly built VIO server and began work on the other one. Most of the IBM i clients were just fine, though a few started spitting out LED codes, indicating that they had lost access to their disks. It turned out that some vSCSI disk connections remained in the environment after all. As noted, he’d taken the word of others regarding how the disk was connected rather than verifying the connections himself.

What would you expect to happen if you pulled all the paths to the disks out from under a running system? I’d certainly expect it to crash. However, once the problem was understood (this took about an hour), both VIO servers had their vSCSI mappings put back in place, and the LPARs just continued from where they left off. They didn’t crash, they didn’t reboot; they just kept running as if nothing had happened. The customer was able to log in and verify that the system looked and behaved as they expected it to.

Maybe that speaks to the resilience of IBM i, or maybe my friend got lucky. The maintenance was performed during a quiet time of the day, so the systems weren’t being taxed in that moment. Regardless, I was impressed.

But don’t take the wrong lesson from this story. Yanking disk paths away from running production systems willy-nilly is never a good thing. However, I would like to try to reproduce this behavior in the lab. When I’ve discussed this result with others, I’ve heard skepticism – surely there must be another reason why that environment didn’t crash. One theory is that there was still a connection to the load source disk via NPIV.

I intend to get to the bottom of it. I look at it as another learning opportunity. Formal school may be out for most of us, but we’re still learning.

Rob McNelly is a senior Power Systems solutions architect doing pre-sales and post-sales support for Meridian IT, headquartered in Deerfield, Illinois. McNelly is a former technical editor for IBM Systems Magazine and a former administrator within IBM’s Integrated Technology Delivery and Server Operations division. Prior to working for IBM, McNelly was an OS/400 and IBM i operator for many years at multiple companies. McNelly was named an IBM Champion for Power Systems in 2011 and an IBM Champion Lifetime Achievement recipient in 2019, and can be reached at rob.mcnelly@gmail.com.