System Software Maps Provide Quick Answers on OS Support

Edit: I still look at these from time to time.

Originally posted February 12, 2019 on AIXchange

Some weeks back Nigel Griffiths tweeted something that seemed familiar. 

He noted that IBM Support has system software maps for AIX, IBM i, VIO server, SUSE, Red Hat and Ubuntu Linux. The maps allow you to quickly locate all the IBM Power Systems server models that support these operating systems. 

Seeing Nigel tweet about this reminded me that I wrote about the software maps in 2015.

These pages are regularly updated and of course now include the POWER9 servers. But as I noted then, you don’t need to be on the latest and greatest hardware to benefit from this information. For instance, the AIX maps extend all the way back to RS/6000 models. Being able to easily determine supported OS versions is especially helpful should you need to deploy workloads on older repurposed hardware. 

As Nigel says, this is a useful webpage, so bookmark it. Don’t wait for us to remind you about this great tool again. You may not need it on a daily basis, but you’ll want this information at your fingertips. 

PowerVC-Based Tool Rebalances Workloads

Edit: Some links no longer work.

Originally posted February 5, 2019 on AIXchange

I was recently asked if there’s a way to automatically rebalance AIX workloads on IBM Power Systems servers. There sure is. It’s called the PowerVC Dynamic Resource Optimizer:

The Dynamic Resource Optimizer (DRO) is a cutting-edge feature of PowerVC that brings an unprecedented level of automation to your Power cloud for PowerVM and PowerKVM hypervisors. When enabled, DRO monitors your compute nodes and automatically detects resource imbalances. Depending on its mode of operation, DRO will either advise or automatically perform actions to restore balance to your cloud. Using this technology allows cloud administrators to spend less time performing labor-intensive infrastructure monitoring tasks, and allows more time to focus their efforts on other critical business initiatives. Additionally, enterprises can achieve higher levels of ROI regarding their hardware as it can run increased workload densities. When workload spikes occur, DRO can quickly recognize the imbalance and rebalance the cloud before chaos unfolds.

Here’s more:

DRO can take two types of actions: virtual machine live migration, and mobile core activations via Power Enterprise Pools. The actions taken by the DRO depend on the options selected by users. Figure 2 is a screenshot of a host group being created where you can see options such as: “CPU utilization, stabilization, run interval, and maximum concurrent migrations.” You can choose to migrate virtual machines, activate mobile cores, or both. If you choose both and the host in need of attention is a member of an Enterprise Pool, the DRO first tries to activate one or more mobile cores; if that doesn’t restore balance, DRO tries to migrate a VM from the busy host to a less busy host.

Go to this page to read the whole thing (and view the screenshots). Learn even more by watching this demo. 

Many customers are unaware of DRO, but it’s easy to implement if you already have PowerVC running in your environment. PowerVC users should investigate this option.

There’s Even More to the POWER9 Story

Edit: Are they still the fastest in the world?

Originally posted January 29, 2019 on AIXchange

We all know that Summit and Sierra are the world’s fastest supercomputers, and that they run on POWER9 processors connected to NVIDIA GPUs. (The second half of this post goes into detail.) 

Here’s more from CNet:

The US now can claim the top two machines on a list of the 500 fastest supercomputers, as Sierra, an IBM machine for nuclear weapons research at Lawrence Livermore National Laboratory, edged out a Chinese system that last year was the very fastest.

The Top500 list ranks supercomputers based on how quickly they perform a mathematical calculation test called Linpack. The top machine, IBM’s Summit at Oak Ridge National Laboratory, had claimed the No. 1 spot in June with a speed of 122.3 quadrillion mathematical operations per second, or 122.3 petaflops.

But an upgrade gave it a score of 143.5 petaflops on the newest list. To match that speed, each person on the planet would have to perform 19 million calculations per second. Sierra got an upgrade, too, boosting its performance from 71.6 petaflops to 94.6 petaflops and lifting it from third place to second.

Summit and Sierra are siblings, each using IBM POWER9 processors boosted by Nvidia Tesla V100 accelerator chips and connected with Mellanox high-speed Infiniband network connections. They’re gargantuan machines made of row after row of refrigerator-size computing cabinets. Summit has 2.4 million processor cores and Sierra has 1.6 million.

Supercomputers are used for tasks like virtual testing of nuclear weapons, aerodynamic modeling of aircraft, understanding the formation of the universe, researching cancer and forecasting climate change effects. They’re expensive but prestigious machines that can keep scientists and engineers at the vanguard of research.

But there’s still more to this story, and, not surprisingly, it speaks to the singular quality of IBM Power Systems hardware. Top500, the supercomputer ratings group referenced in the article, published some detailed data, which I’ve condensed to this simple table below. (Go here to see the original.)

Rank   System Cores   Rmax (TFlop/s)   Rpeak (TFlop/s)   Power (kW)
1       2,397,824     143,500.0        200,794.9          9,783
2       1,572,480      94,640.0        125,712.0          7,438
3      10,649,600      93,014.6        125,435.9         15,371
4       4,981,760      61,444.5        100,678.7         18,482

Note the stark differences in the number of system cores deployed, as well as the power consumption. The Power Systems machines require far fewer cores and consume roughly 50 percent as much energy. 

Less than half the cores, nearly half the power, and better results? These are some spectacular numbers. If you ever need to make an argument for Power Systems hardware running AIX, IBM i or Linux with the latest processors, this is tremendous ammunition.

A Red Hot Reddit Discussion of AIX & Linux

Edit: Are you ready to switch departments?

Originally posted January 22, 2019 on AIXchange

I’ve long maintained that AIX isn’t imperiled by the prevalence of Linux. Even so, it’s always great to encounter passionate defenses of our favorite operating system, and this thread from the AIX Reddit feed (r/aix) is chock full of them. 

Let’s start with the original post:

Not that I have something against AIX, but I don’t see many people using it. And coming here to this sub-reddit confirmed my fears. Linux sub-reddit has 3000 times more subscribers and it’s a very fast growing technology/community. I fear that AIX doesn’t have such a big future compared to Linux.

I’d prefer to move to a Linux department, where the real deal is.

Should I talk to my company or just go to that department until my internship is over and then decide what I should do?

The responses are, in a word, glorious. Here’s the first one:

Linux is a Kia. AIX is a Mercedes.

I’ve been a UNIX Sys Admin for over 20 years and have worked on AIX, Solaris, HP-UX, Tru-64, Dynix, Pyramid, NCR/MP-RAS, and Linux.

The AIX systems are the ones I’ve had the least problems from. They crash the least and are the most stable.

Another reply:

Linux is something you can learn at home. AIX is a skill you get on the job. Learning AIX is learning about Unix systems. Linux is a Unix-like system. Learn AIX, play with Linux at home and you’ll be better prepared.

Since this is an internship, I don’t think complaining is the way to go. AIX is very much alive. The user base isn’t on reddit.

And another:

The reason the linux subreddits are vastly more popular is because they aren’t an enterprise-tier OS with enterprise-tier support to match. When people run into issues in linux they turn to the open source community. When you run into issues with AIX, you call IBM because that’s what you pay them for.

Much of what you’ll learn on AIX (especially in an internship) will translate to linux and other unix systems. If you get into a position where you’re thrown into a linux environment after having 20 months experience on AIX, you’ll do fine. Spin up some linux VMs on your own time and replicate the things you do at work in AIX on your lab in linux. Best of both worlds….

And one more:

Depends on the environment. If it’s a place that has their stuff together reasonably well you could learn a ton. AIX is the Cadillac of UNIX’s right now. It runs on IBM hardware so there’s a deeper level of integration that is just so much easier to work with. Support is good once you get past the level one folks. But really it’s about the mainframe mentality. The guys who wrote AIX took the lessons from decades of other experience and put that into AIX. It really shows.

Linux is the Windows of UNIX. It’s great because it runs on anything and is easy to pick up. But once you start working on it daily you’ll see the warts.

All that being said, Linux is where most of the new jobs are at. AIX jobs are harder to find but likely to be more rewarding if you’re into big systems.

This particular story has a happy ending, as the original poster returns with this small but significant edit:

Thanks everyone. You convinced me.

That’s just a sampling of comments. Read the whole thing.

Replicating Changes Across Multiple HMCs

Edit: Have you set this up?

Originally posted January 15, 2019 on AIXchange

For any environment with multiple HMCs, data replication generally makes sense. IBM’s Customizable Data Replication service can help you accomplish this, no matter how your HMCs are set up:

The Customizable Data Replication service provides the ability to configure a set of Hardware Management Consoles (HMCs) to automatically replicate any changes to certain types of data so that the configured set of HMCs automatically keep this data synchronized without manual intervention.

The following types of data can be configured:

Customer information data

  • Administrator information (customer name, address, telephone number, and so on.)
  • System information (administrator name, address, telephone of your system)
  • Account information (customer number, enterprise number, sales branch office, and so on.)

Group data

  • All user-defined group definitions

Modem configuration data

  • Configure modem for remote support

Outbound connectivity data

  • Configure local modem to RSF
  • Enable an internet connection
  • Configure to an external time source

The Customizable Data Replication service can be enabled for the following types of operations:

Peer-to-peer
Provides automatic replication of the selected customized data types between peer HMCs. Changes made on any of these consoles are replicated to the other consoles.

Master-to-slave
Provides automatic replication of the selected customized data types from one or more designated master HMCs to one or more designated slave HMCs. Changes made on a master console are automatically replicated to the slave console.

This document includes links that go into detail on setting up peer-to-peer and master-to-slave replication. This companion doc tells you how to force replication:

As data is replicated from one HMC to another, an internal level indicator for the data being replicated is recorded each time the data is altered on the data source. Learn about how to force the replication of data from one or more data sources.

Each HMC keeps track of the level indicator for each type of data and will not accept data from a data source when the level indicator is not greater than that on the receiving HMC.

Keep reading that doc to learn how to force the replication of data from one or more data sources when the level indicator on the receiving HMC is greater than that of the data sources.

If you’ve ever had to manage an environment with multiple HMCs and multiple HMC users, the benefits of the Customizable Data Replication service should be apparent. Changing a single HMC is far easier than attempting to propagate changes across a group of them.

More on VIOS 3.1

Edit: Did you upgrade yet?

Originally posted January 8, 2019 on AIXchange

A quick follow up on this post about VIOS 3.1. 

IBM Champion Stephen Diwell makes some interesting points based on his discoveries during hands-on testing. In this post, he mentions some important things to take note of during the install process, starting with the way paging devices are set up. He also says that resizing your filesystems is a really good idea, and suggests changing the password algorithm from the default. 

In a follow-up post, Steve discusses finding ssh host keys included with the VIO server mksysb image, and shows you how to remedy that issue by removing and regenerating the key. 
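
If you run into the same thing, the general shape of the fix looks like this (my sketch of the approach, not Steve’s exact steps; run it as root via oem_setup_env):

oem_setup_env
# remove the host keys that shipped with the image
rm /etc/ssh/ssh_host_*
# regenerate fresh keys (repeat for each key type your OpenSSH level supports)
ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ""
ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key -N ""
# restart sshd so it picks up the new keys
stopsrc -s sshd
startsrc -s sshd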

In addition, IBM Systems Magazine has this VIOS 3.1 overview from Jaqui Lynch. 

VIOS is an important utility for many of us, so it’s important to know about the potential gotchas as we upgrade to the latest version. As I find more information, I’ll be sure to pass it on.

Staying Fit: An Ongoing Story of Peaks and Valleys

Edit: I am still keeping after it

Originally posted December 18, 2018 on AIXchange

How do you know if your friend is vegan and does CrossFit? Don’t worry, they’ll tell you.

Admittedly, I’ve kind of evolved into that guy. Back in 2016 I posted about my attempts at losing weight. Since the end of the year provides a window for many to get away from the office, and give some thought to New Year’s Resolutions, now seems like an appropriate time to share the latest.

As I wrote then, my awakening came at a physical. The doctor wouldn’t sign off on a medical form due to my obesity. My absolute highest weight was in spring 2013, but by that November I was down around 60 pounds. I felt great. There was nothing dramatic about it; I just changed my diet and exercised regularly.

I’d like to say that was the end of the story, but I’ve ping-ponged since. By the summer of 2015 I’d regained about 25 pounds. So I got focused again, and the weight was off by the next year. Had I learned my lesson? Of course not. By the end of 2017, those 25 pounds were back on.

As I said back then, my original motivation to change was my two sons being in Boy Scouts and my wanting to participate with them. One son is now in the U.S. Marine Corps, having gone to boot camp in August 2017. Between boot camp and his Military Occupational Specialty training, he was able to come home over the holidays. So on New Year’s Day 2018, we decided to take a little hike.

The venue was Picacho Peak, which is located midway between Phoenix and Tucson in Arizona:

Sunset Vista Trail: 3.1 miles; moderate first 2 miles, becoming difficult; Travels on the south side from the westernmost parking area and goes to the top of the peak. The first 2 miles are moderate, then the route becomes difficult, steep and twisting, with steel cables (gloves are recommended) anchored into the rock in places where the surface is bare. This trail is not recommended during hot weather seasons.

Fun fact: Picacho means “peak” in Spanish. So we basically hiked up Peak Peak. Speaking of peaks, that day I learned I was not in peak physical condition.

Since we’d made this same hike a few years earlier, I figured I’d be fine. I wasn’t. I ended up watching my fit, fresh from boot camp son easily reach the summit while I took a breather and took stock. I realized that, once again, I needed to do something about my fitness levels.

After about three months of dieting and weight-watching, I participated in a Super Sprint triathlon with some Scouts this past April. This consists of a 75-yard swim, a 6-mile bike ride, and a 1.6-mile run. With the memory of getting smoked by my son on that hike still fresh, I got through it without issue. In fact I felt good enough to sign up for a slightly tougher triathlon event in the fall: a 425-yard swim, a 15-mile bike ride and a 5K run.

After training over the spring and summer and doing that second triathlon, the weight has come back off, though there have been setbacks: an injury here, a junk food binge there (I refer to the latter as my “weekends of debauchery”). But overall, I feel really good. My blood pressure is down, my resting heart rate is low, and I’m always looking forward to the next backpacking trip or hike.

I’m no athlete. I don’t do these events for medals, and I surely am not a threat to win anything. My goal is simply to finish, and to do better the next time. Having these events on my calendar (the next one is in April) keeps me focused on fitness, and my rewards from that are many. I have greater endurance and my clothes fit better (even some things I’d gotten too big to wear). Losing weight–and buying properly fitted shoes–helped me overcome plantar fasciitis.

For me dieting comes down to controlling meal portions while paying attention to the mix of protein, fats and carbs. I eat lots of salads, stay away from mindless snacking and avoid desserts and other sweets. I hit the gym three times a week. I work out with their special, pricey equipment, and I attend trainer-led group classes that include activities I might not choose to do on my own.

Most days at home I’ll get in an intense hour of cardio on the treadmill. Varying the workouts while continually edging up the intensity allows my body to continually adjust to the demands I put on it. And (of course) I apply some technology to the matter. My scale auto-magically connects to the cloud, allowing me to compile nearly seven years of data on my weight and body fat percentages. While there have been periods where I neglected doing the weigh-ins, checking the graphs and trends is nonetheless quite enlightening.

When I exercise, I track my heart rate, and when I run longer distances outside, I chart my pace. A tracker tells me the number of steps I take each day. Am I faster this time? Did I take as many steps today as I did yesterday? How many calories did I burn? I need to know. I need the numbers.

Of course there’s no one way to get in shape. I’ve seen others succeed with the Whole 30 diet, Nutrisystem, Weight Watchers, Medifast and Atkins. Just cutting back on carbs can help.

The trick is to find what works for you, and to find what motivates you. I’m motivated by checking my calendar and seeing that next event, and by seeing my metrics improve. Another motivator is having people who’ve not seen me in a while do a double take and ask if I’ve lost weight. (Why yes I have, thanks for asking.)

Finally, it motivates me just to talk about this. I realize that my story may not prompt anyone to take action, but sharing it helps me. It keeps me accountable. It makes everything real. It means if you see me stuffing my face at the next IBM Technical University, you are free and clear to give me a hard time about it.

Of course I do hope all of you do what you can to preserve and improve your health. If an old guy like me can do it, you can, too.

Note–the next blog post will be January 8, 2019.

Everything to Know about VIOS 3.1

Edit: Some links no longer work.

Originally posted December 11, 2018 on AIXchange

VIOS 3.1 is here, and now is the time to start planning your next move. Should you replace your hardware with new servers and do fresh VIOS installs on those machines, or would a gradual upgrade of your dual VIO servers make more sense? Get informed by digging into the numerous, valuable resources that have recently come out.

The time you invest in this material will be well spent.

Implementing the vHMC Requires Attention to Detail

Edit: Do you use a mix of hardware and virtual HMCs or are you all virtual?

Originally posted December 4, 2018 on AIXchange

If you’re planning to get rid of your physical appliances and run all of the HMCs in your environment as virtual machines, keep this in mind:

Originally the IBM POWER HMC was sold only as an integrated appliance that included the underlying hardware as well as the HMC firmware. IBM extended the POWER HMC offering to allow the purchase of the traditional hardware appliance (e.g. model 7042/7063) or a firmware only virtual HMC machine image. The virtual HMC (vHMC) offering allows clients to use their own hardware and server virtualization to host the IBM supplied HMC virtual appliance.

Support for vHMC
Since the hardware and server virtualization is supplied by the client to run the HMC virtual appliance, this infrastructure that actually hosts the HMC virtual appliance is not monitored by IBM. Serviceable events related to the vHMC firmware are monitored; however, “call-home” for these events is disabled. For further information see the document Callhome on HMC Serviceable Events is Disabled on vHMC.

The HMC virtual appliance continues to monitor the managed Power Systems hardware just like the HMC hardware appliance. Both HMC form factors provide remote notification and automatic call-home of serviceable events for the managed Power Systems servers.

Support for vHMC firmware, including how-to and usage, is handled by IBM software support similar to the hardware appliance. When contacting IBM support for vHMC issues specify “software support” (not hardware) and reference the vHMC product identification number (PID: 5765-HMV).

How-to, install, and configuration support for the underlying virtualization manager is not included in this offering. IBM has separate support offerings for most common hypervisors which can be purchased if desired.

That document also includes a brief Q&A. I’ll highlight the following, which often goes overlooked:

Q: Are there any restrictions related to on-site warranty support for managed servers?
A: Restrictions are similar to the hardware appliance
– You must supply a workstation or virtual console session located within 8 meters (25 feet) of the managed system. The workstation must have browser and command line access to the HMC. This setup allows service personnel access to the HMC.
– You should supply a method to transfer service related files (dumps, firmware, logs, etc) to and from the HMC and IBM service. If removable media is needed to perform a service action, you must configure the virtual media assignment through the virtualization manager or provide the media access and file transfer from another host that has network access to HMC.
– Power vHMC cannot manage (nor service) the server it is hosted on.

The big takeaway is that you shouldn’t assume IBM service reps will plug into your customer network to access your virtual HMC. If you need assistance, IBM expects you to provide a workstation that they can access. And yes, this can be problematic. Worst case: some sort of outage is affecting your VMware cluster while IBM Support needs to work on your POWER hardware. Then you might end up in a pickle.

Incidentally, this is one significant point in favor of the traditional HMC form factor. It takes up 2U in your rack and it exists solely to manage your machines. Nonetheless, people will continue to move away from hardware-based HMCs, so it’s important to understand the requirements. While I prefer keeping a hardware appliance available and using the vHMC as a backup, of course every environment is unique. Only you know what will work best for you.

GDR as a Disaster Recovery Option

Edit: Still something to consider.

Originally posted November 27, 2018 on AIXchange

A sound disaster recovery plan is one that’s regularly being updated. With this in mind, I want to cite this overview of Geographically Dispersed Resiliency (GDR), a DR option that is designed for efficiency.

The GDR solution provides a highly available environment by identifying a set of resources that are required for processing the virtual machines in a server during disaster situations.

The GDR solution uses the following subsystems:

    Controller system (KSYS)
    Site
    Host
    Virtual machines (VMs) or logical partitions (LPARs)
    Storage
    Network
    Hardware Management Console (HMC)
    Virtual I/O Server (VIOS)

IBM Support also has a comparison of PowerHA and GDR:

Disaster recovery of applications and services is a key component to provide continuous business services. The Geographically Dispersed Resiliency for Power Systems (GDR) solution is a disaster recovery solution that is easy to deploy and provides automated operations to recover the production site. The GDR solution is based on the Geographically Dispersed Parallel Sysplex (GDPS) offering concepts that optimizes the usage of resources. This solution does not require you to deploy the backup virtual machines (VMs) for disaster recovery. Thus, the GDR solution reduces the software license and administrative costs.

The following high availability (HA) and disaster recovery (DR) models are commonly used by customers:

    Cluster-based technology
    VM restart-based technology

Clustered HA and DR solutions typically deploy redundant hardware and software components to provide near real-time failover when one or more components fail. The VM restart-based HA and DR solution relies on an out-of-band monitoring and management component that restarts the virtual machines on other hardware when the host infrastructure fails. The GDR solution is based on the VM restart technology.

The following table identifies the differences between the conventional cluster-based disaster recovery model and the GDR solution:



A disaster recovery implementation that uses a set of scripts and manual processes at a site level might take more time to recover and restore the services. The GDR solution automates the operations to recover your production site. This solution provides an easy deployment model that uses a controller system (KSYS) to monitor the entire virtual machine (VM) environment. This solution also provides flexible failover policies and storage replication management.
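
For a sense of what deployment looks like, GDR is driven from the KSYS LPAR with the ksysmgr command. Here’s a rough sketch with made-up cluster, node and site names; the option syntax varies by GDR release, so treat this as illustrative only:

ksysmgr add ksyscluster demo_ksys ksysnodes=ksysnode1   # create the KSYS cluster
ksysmgr add site SiteA sitetype=home                    # define the production site
ksysmgr add site SiteB sitetype=backup                  # define the DR site
ksysmgr discover site SiteA verify=true                 # discover and verify resources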

Finally, Michael Herrera has some great videos that cover concepts, software and advanced features.

As you design and update your DR solutions, be sure to consider GDR.

Exploring the Possibilities

Edit: How do you play with AIX?

Originally posted November 20, 2018 on AIXchange

There’s a lot you can do with AIX. But that doesn’t mean we won’t search for even more ways to play with it.

For instance, Chris Gibson recently got AIX running on a Macbook:

After reading this https://worthdoingbadly.com/aixqemu and this https://lists.gnu.org/archive/html/qemu-ppc/2018-05/msg00387.html, I was inspired and very curious. Could I get AIX 7.2 running on QEMU on my MacBook Pro (running Mac OS X 10.13.6)?

Well, the answer my friends, is yes…sort of.

Many thanks to Rob McNelly who originally tweeted this link, https://worthdoingbadly.com/aixqemu. If he had not, I would never have made the journey to QEMU land. So thanks Rob!

Also, thanks to Liang Guo for his assistance. Your guidance was greatly appreciated.

Note: What I describe here is NOT supported by IBM. It is purely a lab experiment to see what was possible with qemu-system-ppc64.
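
For reference, the invocations in those write-ups look roughly like the following. This is a sketch from memory; the exact flags vary by QEMU version, and the image file names are placeholders:

qemu-img create -f raw hdisk0.img 20G
qemu-system-ppc64 -M pseries -cpu POWER8 -m 4096 -serial mon:stdio \
  -drive file=aix_72_install.iso,format=raw,if=none,id=cd0 \
  -device spapr-vscsi -device scsi-cd,drive=cd0 \
  -drive file=hdisk0.img,format=raw,if=none,id=hd0 \
  -device scsi-hd,drive=hd0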

Then there’s this example of AIX 7.2 running on x86 hardware.  

Although those configurations would run too slowly for my taste, I’ve always loved the idea of having lab hardware to test/learn with. Of course IBM Power Systems servers typically run mission critical applications, so playing with the hardware available at work generally isn’t an option. (At the very least, you’d need test/disaster recovery/lab hardware; some workplaces have more options available than others.) Sure, some people buy old used servers and run them at home, but that’s not practical for everyone.

Nonetheless, it’s fun to follow what’s going on out there.

With OpenPOWER taking off, I’ve been tracking the new workstations that are available for running Linux on Power. One is from Raptor Computing Systems, the Talos II:

Talos II — the world’s first computing system to support the new PCIe 4.0 standard — also boasts substantial DDR4 memory, dual POWER9 CPUs, next-generation security, and a price that won’t break the bank. Let the power of Talos II accelerate your computing!

Offerings range from the secure workstation to the basic Talos II bundle.

The price point and specs caught my eye:

The Talos II mainboard is the first modern (post-2013), owner-controllable, workstation- and enterprise-class mainboard. Built around the brand-new IBM POWER9 processor, and leveraging Linux and OpenPOWER technology, Talos II allows you to secure your data without sacrificing performance. Designed with a fully owner-controlled CPU domain, you can audit and modify any portion of the open source firmware on the Talos II mainboard, all the way down to the CPU microcode. This is an unprecedented level of access for any modern workstation…

Getting AIX running in this type of modern environment would be amazing. Imagine being able to acquire some sort of student AIX license while having access to this kind of hardware in your home lab. You could run Linux on Power and AIX on POWER9 hardware that sits on your desktop. That sounds like… nirvana.

As these technologies evolve and the prices come down, my temptation level goes up. Do you know of other POWER9-based workstations or similar technology that’s on the horizon?

More Fun with AIX on a Nutanix Cluster

Edit: The cluster was fun to play with

Originally posted November 13, 2018 on AIXchange

I recently had another hands-on experience with a Nutanix Cluster.

This system consisted of four CS821 nodes. After previously doing an install with the cloud-ready deployment method, I wanted to try an .iso installation as well as installing from NIM. Those are the big three when it comes to installing AIX on Hyperconverged systems.

The first step is to create a VM. Nutanix has an image library that’s much like the virtual media repository on a VIO server in PowerVM. Populating this library with IBM-provided AIX .iso files turned out to be as simple as this:

  • I logged into Prism, opened “image configuration” and selected “upload image.”
  • I named the image (AIX_7200-03-01-1838_flash.iso was the latest available as of this writing) and changed the image type to ISO.
  • Then I chose a storage container for the image and provided the image source.

That last one is a nice touch, by the way. Rather than download to your machine and then upload to the cluster and use that as your source, you can provide a URL and Nutanix will download the file directly from the source for you. I selected the correct .iso image from the IBM Entitled Software Support (ESS) site, and rather than using the download director, I selected the “click here to use http” option. This provided the link from IBM’s site to the .iso image to feed to Nutanix.

With my image on the server, I was ready to boot from it. At last check, these files were available from ESS:

  • ISO, AIX v7.2 Install DVD 1 (TL 7200-03-00-1837 9/2018)
  • ISO, AIX V7.2 Install DVD 2 (TL 7200-03-00-1837 9/2018)
  • ISO, AIX v7.2 Install flash (TL 7200-03-01-1838 9/2018)
  • GZ, AIX v7.2 Cloudrdy Virtual Mach Image TL 7200-03-01-1838, (9/2018)

Since DVD 1 is a space-saving .ZIP file, I initially downloaded that. It turns out, though, that the system can’t process .ZIPs, so I instead went with the install flash .iso image. Of course I could have downloaded DVD 1 to my workstation, done the unzip there and then uploaded it, but that would be self-defeating. The idea is to download directly from IBM.

To continue testing, I created a test virtual machine and gave it CPU and memory. Then when I got down to the disks, I selected the virtual CD, told it I wanted to clone from the image service, gave it my AIX v7.2 install flash .iso image, and clicked on update. I added an additional virtual disk to be my hdisk0 in AIX, added in a virtual NIC, and saved the profile.

At this point I powered on my VM and got two options for consoles: a VNC and a COM1. The VNC console allows you to interact with OpenFirmware; COM1 is a traditional serial console.

One thing I’ve yet to figure out is how to display LED codes in the VM table display in Prism. But that just gives me more to look forward to as I continue working with these clusters.

Anyway, my VNC console showed that the VM had booted, while my COM1 console was blank. I entered 1 and my console started to display LED codes. I soon got to my familiar screen where I was prompted to press 1 to install in English.

There was my normal base operating system install and maintenance screen where I could press 1 (to start install with default settings) or 2 (to change/show install settings and install). I entered 2, and wouldn’t you know, it couldn’t detect the Nutanix disk I’d assigned to install the OS.

Luckily support was aware of this issue and had a procedure ready. I needed to go back to the previous Welcome to Base Operating System Installation and Maintenance screen and follow these instructions:

3 Start Maintenance Mode for System Recovery
3 Access Advanced Maintenance Functions
>>> 0 Enter the Limited Function Maintenance Shell
$ cfgmgr (errors are expected – many devices are not yet available to be configured)
$ exit
99 (Return to previous menu)
5 Select Storage Adapters
>>> 1 scsi0      qemu_vhost-user-scsi-pci:0000:00:02.0
2 Change/Show Installation Settings and Install
1 Disk(s) where you want to install ……
1 hdisk0    qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0
>>> 0  Continue with choices indicated above

After doing this, the disk I’d assigned to the VM appeared and I was able to install AIX to it as expected. Interestingly, I was getting LED codes to my console during the install, but otherwise everything looked the same as any other AIX install from .iso.

Once I got AIX installed, I went ahead and set it up as a NIM server, as I also wanted to test network boot. This too went as expected. The main difference came in how the client is booted from the NIM server. I followed these directions, and after I’d configured my NIM server and created a VM to attempt to boot from it, I powered it on and opened a VNC console. As found in the instructions, here’s the necessary syntax:

To boot the client from the network install manager (NIM) master at the OpenFirmware prompt, use the following command template:
0> boot <NIC-device>:<nim-server-ip>,<\path\to\client\bootfile>,<clientip>,<gateway-ip>

Further in the document, there’s an example:

The following commands boot the client VM from the network install manager (NIM) master at the OpenFirmware prompt:
0> boot net:9.3.94.78,\tftpboot\client-vm.ibm.com,9.3.94.217,9.3.94.1

This worked as expected, and I was able to boot over the network. Unless you have a flat network, I recommend having your NIM server on the Nutanix cluster you’re booting from. As the document states:

“If you are using a static IP address for the client virtual machine (VM), the client and server must be on the same subnet when booting a VM across the network. You cannot specify a subnet mask in the boot command as shown in the following example.”

I took a mksysb to my NIM server and installed a different VM from the mksysb image. Again, everything worked exactly as expected.
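
For anyone scripting this, the NIM side is the usual routine. Something like the following, where the resource and client names are made up and vm2 is assumed to already be defined as a NIM client:

# define the mksysb image as a NIM resource
nim -o define -t mksysb -a server=master \
    -a location=/export/mksysb/vm1.mksysb vm1_mksysb
# install another client VM from that image
nim -o bos_inst -a source=mksysb -a mksysb=vm1_mksysb \
    -a spot=spot_7200 -a accept_licenses=yes vm2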

One small annoyance was that the COM1 consoles wouldn’t survive power off/power on of the virtual machine, although you could probably get around that by logging into a controller VM and opening a console that way.

As I learn more I’ll be sure to share it. Feel free to tell me about any Nutanix cluster specifics you’d like to read about.

Porting to POWER9

Edit: Have you tried doing the same thing?

Originally posted November 6, 2018 on AIXchange

Linux runs on everything from embedded devices to mainframes. So why should we care that Linux runs on IBM Power Systems servers? Many developers and users are perfectly happy running Linux applications on x86, since that’s the environment they know; others are simply unaware of the alternatives. In the case of Power Systems, some believe it can be difficult to move an application from x86 to POWER. Of course we know that this is unfounded, and in fact, with the relatively recent change from the big endian to little endian format, moving to POWER has never been simpler.

With this in mind, Philippe Hermès recently tweeted this information from French consulting firm ANEO, which is porting some of its applications to POWER9 systems:

In partnership with IBM, ANEO has started porting some applications on IBM latest POWER9 systems. The Power architecture (and the POWER9 processor in particular) is optimized for high memory bandwidth and better performance for applications that require frequent data access.

Memory bandwidth is a technical feature that is not very emphasized in hardware specifications, yet it is often the main performance bottleneck in today’s applications.

One of the codes that have been ported on Power is SeWaS (Seismic Wave Simulator), a modern and optimized simulation software developed by ANEO.

The two goals of this study were to assess performance and difficulty of porting an application on Power.

DIFFICULTY OF PORTING:
The Power architecture uses a specific CPU instruction set, which requires recompiling applications and their dependencies. IBM claims, however, that “95% of Linux x86 applications written in C/C++ are ported on Power without having to change the source code.”

In our case with SeWaS the porting was surprisingly easy. We simply ran the exact same installation script that we usually run on Intel processors and everything worked as expected, making it transparent that the code was being compiled for a different architecture.

In particular, IBM provides a free software suite named Advance Toolchain, containing most of the common HPC libraries optimized for Power (Boost…) as well as the GCC 7 compiler, which proved very convenient.

PERFORMANCE:
The benchmark was done on virtual machines provided by IBM with only 2 physical cores, which is not a very representative sample of performance. Still, the performance measured on these 2 cores is very promising, and it is clear at least that the application was correctly optimized for the Power architecture, even though a very generic installation script was used.

OPTIMIZATIONS / FUTURE DEVELOPMENTS:
Aneo will be doing further benchmarks on Power in the future, especially on systems with POWER9 + NVidia GPUs, from which a much greater performance difference is to be expected (usually 5 or 10 times better performance compared to regular CPU machines).
One of the main advantages of POWER9 is enhanced support of accelerators (FPGA, NVidia GPUs), with technologies such as CAPI and NVlink for higher bandwidth, from which seismical applications benefit a lot.
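
To make the “same installation script” point concrete, moving a C/C++ build to POWER can be as little as pointing it at the Advance Toolchain compiler. A sketch, assuming Advance Toolchain 11.0 (the install path and file names are examples):

export PATH=/opt/at11.0/bin:$PATH       # put the Advance Toolchain GCC first
gcc -O3 -mcpu=power9 -o sewas main.c    # same source, now tuned for POWER9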

With the memory bandwidth and performance improvements that can be expected from a simple recompile, developers should find it worth their time to at least investigate running their applications on POWER. But even if you’re not developing or recompiling anything at all, this is nonetheless a good reminder of how the enhanced Power Systems architecture benefits your own applications.

A Change to the SMT Mode Default in POWER9

Edit: Did you notice any issues with this change?

Originally posted October 30, 2018 on AIXchange

There’s a rather significant change with the default SMT mode in AIX 7.2 TL3 running on POWER9 servers:

“For POWER9 technology-based servers, the default SMT setting for AIX 7.2 TL 3 has been changed to SMT8 to provide the best out-of-the-box performance experience. For POWER8 technology-based servers, the default SMT setting remains SMT4.”

The first thing to understand is that this is a welcome change. IBM has found that running more threads benefits most POWER9 workloads. Naturally any system will perform differently at SMT-8 than SMT-4, so awareness is the key here. Administrators like to know what to expect from their operating system, and they can ill afford to be the last to know how the system is performing. You never want users alerting you to a change in performance.

Of course, if the old setting works best in your environment, you can adjust the SMT level post-upgrade by running the smtctl command:

Each individual Simultaneous Multi-threading (SMT) thread of a physical processor core is treated as an independent logical processor by AIX. The AIX operating system limits the combination of physical processor cores assigned and SMT modes in order to maintain symmetry across all of the physical processor cores assigned to AIX.

When booting a P8 Logical Partition (LPAR), the default number of SMT threads is 4. To increase the default number of SMT threads dynamically, enter:

    smtctl -m on
    smtctl -t 8

The change to SMT-8 is effective immediately and reboot is not required. If you want the setting to persist after rebooting, then you must rebuild the boot image with the bosboot command. The default SMT-4 is intended to provide better performance for existing applications that are not designed or compiled for more than 4 threads.

While this information deals with POWER8 and upping the default, you get the idea.
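
If you want to check or change the mode on your own systems, the commands are simple (smtctl with no flags just reports, so it’s safe to run anywhere):

smtctl          # display the current SMT mode and logical processors
smtctl -t 4     # revert to SMT4; the change takes effect immediately
bosboot -a      # rebuild the boot image so the setting persists across reboots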

If you’re moving to POWER9 hardware and AIX 7.2 TL3 in the near future, be sure to keep this change in mind.

HMC Enhanced GUI: A Cautionary Tale

Edit: Be careful

Originally posted October 23, 2018 on AIXchange

Just in time for Halloween, here’s a scary story involving the HMC enhanced GUI version and an inexperienced user.

As I understand it, an administrator was using the enhanced GUI to mount an .iso image that was stored in the organization’s virtual media repository. The admin selected virtual storage. Then this individual selected a VIO server and clicked on Action/Manage Virtual Storage. This displays a window that says the VIO server is being queried. The window has multiple tabs, including virtual disks, storage pools, physical volumes, optical devices and virtual fibre channel.

At this point, the admin should have selected optical devices, which allows you to manage virtual optical media. Instead, the virtual fibre channel tab was selected; this brings up fcs devices. A device was chosen, and then the admin opted to modify partition connections. Now, if you’re following along in your own HMC, be careful. The default is that all assigned connections are checkmarked, and there’s a button that forces connection removal from running partitions. If you select that and click OK, all of the checked mappings are removed. It’s a dynamic LPAR operation and everything is wiped.

And that’s what happened. The admin for some reason ignored the warning message, and all of the NPIV mappings were removed from the VIO server. The adapter information was still in the saved profile, but the mappings were gone from the running profile. Fortunately this organization had dual VIO servers, so the client LPARs weren’t affected, but it was a chore to recreate all of the mappings on that particular VIO server. (Given the lack of a change window, rebooting the VIO server wasn’t an option.)

If you ever find yourself in this situation, you may be able to retrieve your mappings by shutting down the VIO server and restarting from the saved profile. But make sure you can rebuild the mappings from your virtual to physical adapters if necessary. Know which virtual adapters are mapped to which physical adapters, and keep the additional critical information that’s needed to recreate your environment. Know the corresponding WWNs. Hopefully you’re running hmcscanner regularly, and you should be backing up your VIO configs and VIO servers.
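
On the prevention side, a few padmin commands cover most of it; the file names below are just examples:

$ lsmap -all -npiv > npiv_mappings.txt   # document your virtual FC mappings
$ viosbr -backup -file vios1_cfg         # back up the virtual/logical device config
$ backupios -file /mnt/vios1 -mksysb     # full VIOS backup, here to an NFS mount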

There’s good logging on the new HMC code, which was helpful in this case. We were able to identify the user and the commands that were run.

In short, be careful. The enhanced GUI is still fairly new. Take the time to get used to it.

Restricting Access to the AIX Error Report

Edit: Have you found a use for this in your environment?

Originally posted October 16, 2018 on AIXchange

A while back on Twitter, Chris Gibson noted that, starting with AIX 7.2 TL3, administrators will be able to prevent non-privileged users from viewing the AIX error report.

IBM Support has the details:

The restriction can be enabled or disabled by the system administrator using “/usr/lib/errdemon -R enable” and “/usr/lib/errdemon -R disable.” By default the restriction is disabled.

When the restriction is disabled, any user can view the system error report.
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE84C4DB   0711092118 I O ConfigRM     IBM.ConfigRM daemon has started.
69350832   0711091818 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0711091918 T O errdemon       ERROR LOGGING TURNED ON

To enable the restriction
(0) root @ spruce1:/
# /usr/lib/errdemon -R enable

(0) root @ spruce1:/
# /usr/lib/errdemon -l

Error Log Attributes
——————————————–
Log File                 /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging       off
PureScale Logstream     CentralizedRAS/Errlog
Restrict errpt to privileged users      enable

After the restriction is enabled, an error message is displayed if a non-authorized user tries to view the error report.

(0) testuser @ spruce1:/
# errpt
errpt:
        User does not has sufficient authorizations.

How do you enable a user to view the error report?
Make the user a privileged user by assigning the aix.ras.error.errpt authorization:

(0) root @ spruce1:/
# mkrole authorizations="aix.ras.error.errpt" role_errpt

(0) root @ spruce1:/
# chuser roles=role_errpt testuser

(0) root @ spruce1:/
# setkst
Successfully updated the Kernel Authorization Table.
Successfully updated the Kernel Role Table.
Successfully updated the Kernel Command Table.
Successfully updated the Kernel Device Table.
Successfully updated the Kernel Object Domain Table.
Successfully updated the Kernel Domains Table.
Successfully updated the Kernel RBAC log level.

Now the normal user “testuser” can execute errpt

(0) testuser @ spruce1:/
# swrole role_errpt
testuser’s Password:

(0) testuser @ spruce1:/
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE84C4DB   0711092118 I O ConfigRM       IBM.ConfigRM daemon has started.
69350832   0711091818 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0711091918 T O errdemon       ERROR LOGGING TURNED ON

If this applies to your environment, be sure to add this to your build documentation, checklists and gold images once you’ve updated to AIX 7.2 TL3.

New Stencils Available for POWER9 Systems

Edit: Stencils are a must have for documentation

Originally posted October 9, 2018 on AIXchange

Hopefully you saw the news on Twitter, via Alan Fulton and Nicollette McFadden, among others.

Visio stencils are now available for selected IBM Power Systems POWER9 models, including the AC922 and S914.

For anyone who uses Visio, this is welcome news. Having the proper stencils makes it much easier to create diagrams and documentation. The stencils can be downloaded from here.

When you extract and open them in Visio, you’ll find 115 stencils. At the top of this link you can see the most recent updates:

24-Sep-2018
IBM-Common.zip
      IBM-Racks.vss – Added 7042-CR7, 7042-CR8, 7042-CR9 and 7063-CR1 HMC front and rear views
      IBM-SAN.vss – Added 2145-SV1 SVC front and rear views
IBM-Server-Power.zip
      IBM-Server-Power.vss – Added Power S821LC, S822LC (4 models), AC922 and S914 Tower front and rear views
IBM-Tape.zip
      IBM-Storage-Tape.vss – Added TS4300 Base and Expansion Library front and rear views and LTO drives
Stencil file updates include the renaming of many stencil files to remove “System” from the file name (Disk, Tape and Network).

In addition, archives can be found here:

27-Aug-2018
IBM-Server-Power.zip
   IBM-Server-Power.vss – Added Power E850C, E870C and E880C front and rear views

14-Aug-2018
IBM-Server-Power.zip
   IBM-Server-Power.vss – Added Power S914, H922, L922, S922, H924, S924, E950, E980 and EMX0 PCIe Gen3 Exp. front and rear views
IBM-Classic-Full.zip
   IBM-Server-Power-Classic.vss – *New File* – Moved all Power 5xx Server (Power6) shapes to this new file

Do you document your systems in Visio? Many in my circle do, and for these folks, it’s been a lengthy wait since the previous update. Everyone I’ve talked to is very happy about this development.

NIM Management via HTTP

Edit: Still a good option to consider

Originally posted October 2, 2018 on AIXchange

I love NIM. I rely on NIM when I’m doing new server builds, and it’s also my go-to for installing the VIO server.

Chances are you love NIM as well. That said, one thing you might not be fond of is firewalls between your NIM server and NIM clients, which require you to work with your network team and ask for ports to be opened.

Here’s a breakdown of ports that need to be opened in a firewall for use with NIM:

Protocol     Port(s)
nimsh        3901 – 3902
icmp         5813
rsh*         513 – 1023**
rlogin*      513
shell*       514
bootp        67 – 68
tftp         69 and 32,768 – 65,535
nfs          2049
mountd       32,768 – 65,535 or user’s choice
portmapper   111
NIM          1058 – 1059

Again, in some environments, getting approval for such extensive access can be a challenge. Fortunately, a potential alternative exists. Read this IBM Knowledge Center doc to determine if using NIM over HTTP can work in your environment:

Network Installation Manager (NIM) supports the installation of AIX updates over the Hypertext Transfer Protocol Secure (HTTP) protocol to conform to the emerging data center policies that restrict the use of network file server (NFS).

AIX BOS installation still requires the use of the NFS version 3 protocol or the more secure NFS version 4 protocol. In addition to the installation of filesets, NIM customization processes such as script execution and copying the file_res directory are supported over the HTTP protocol.

The HTTP protocol provides the following advantages for NIM management:

  • All communication occurs over a single HTTP port. Hence, authorization through a firewall is easier to manage.
  • AIX installation steps are driven from the client’s end, that is, the target system of the installation. Therefore, remote access is not required for running the commands.
  • NIM or any other products that currently use the client-server model of NFS can easily use HTTP.
  • (The capability) to extend the end product to support additional protocols.

AIX 7.2.0 ships a new service handler that provides HTTP access to NIM resources. The nimhttp service is defined in /etc/services, and the nimhttp daemon listens for requests on port 4901. When the nimhttp service is active, NIM clients check /etc/services for the nimhttp entry and request customization operations, such as script execution, over HTTP. If HTTP access fails or is denied, a failover attempt to NFS access occurs.
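
Getting started on the NIM master looks like this, as best I can tell from the documentation (verify against your AIX level’s docs before relying on it):

grep nimhttp /etc/services   # expect an entry like: nimhttp 4901/tcp
nimconfig -h                 # start the NIM service handler (nimhttp daemon)
netstat -an | grep 4901      # confirm the daemon is listening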

Were you aware of this option? Have you used it before?

What’s in Store for Memory

Edit: The more things change

Originally posted September 25, 2018 on AIXchange

In August there was an event called Hot Chips 30. A long-running conference for the semiconductor industry, Hot Chips is the place to learn about high-performance microprocessors and related topics like system memory. Here are a couple of interesting articles that came out of the conference that look at memory and where it’s headed in the near future.

This is from HPCwire:

Having launched both the scale-out and scale-up POWER9 [servers], IBM is now working on a third P9 variant with “advanced I/O,” featuring IBM’s 25 GT/s PowerAXON signaling technology with upgraded OpenCAPI and NVLink protocols, and a new open standard for buffered memory….

“The PowerAXON concept gives us a lot of flexibility,” said [IBM Power architect Jeff] Stuecheli. “One chip can be deployed to be a big SMP, it can be deployed to talk to lots of GPUs, it can talk to a mix of FPGAs and GPUs – that’s really our goal here is to build a processor that can then be customized toward these domain specific applications.”

The article concludes with this:

The roadmap shows the anticipated increase in memory bandwidth owing to the new memory system. Where the POWER9 SU chip offers 210 GB/s of memory bandwidth (and Stuecheli says it’s actually closer to 230 GB/s), the next POWER9 derivative chip, with the new memory technology, will be capable of deploying 350 GB/s per socket of bandwidth, according to Stuecheli.

“If you’re in HPC and disappointed in your bytes-per-flop ratio, that’s a pretty big improvement,” he said, adding “we’re taking what was essentially the POWER10 memory subsystem and implementing that in POWER9.” With Power10 bringing in DDR5, IBM expects to surpass 435 GB/s sustained memory bandwidth….

It’s an odd statement of direction, but maybe a visionary one, essentially saying a processor isn’t about computation per se, but rather it’s about feeding data to other computational elements.

This piece from top500.org says IBM is aiming to take memory in a new direction:

The memory buffering adds about 10ns of latency to memory accesses compared to a direct DDR hookup, but the tradeoff for more bandwidth and capacity is worth it for these extra-fat servers. And although the Centaur buffered memory implementation still uses DDR memory chips as the storage media, this no longer really needs to be the case since the DDR smarts have moved off the chip.

IBM plans to generalize this memory interface, which will be known as OpenCAPI memory, in their next version of the POWER9 processor that is scheduled to be launched in 2019. As far as we can tell, these upcoming POWER9 chips will be suitable for two-socket HPC servers, as well as mainstream systems. IBM is projecting that its next POWER9 chip will support over 350 GB/sec of memory bandwidth per socket, which is more than twice the speed of today’s fastest chips for two-socket servers. The company also intends to reduce the latency penalty to around 5ns in its first go-around.

Perhaps the bigger news here is that OpenCAPI memory will be proposed as an open standard for the entire industry. The use of the OpenCAPI brand is intentional, since IBM wants to do for memory devices, what the original OpenCAPI was designed to do for I/O devices, namely level the playing field. In this case, the idea is to enable any processor to talk to any type of memory via conventional SerDes links. As a result, CPUs, GPUs, or FPGAs would no longer need to be locked into DDR, GDDR, or any other type of memory technology. So, for example, a chip could use the interface to connect to traditional DDR-type DIMMs, storage-class memory based on NAND or 3D XPoint, or some other type of specialized memory.

Many times we are focused on what we can buy and deploy right now. But if you want to see where things are headed, read these and other articles from the conference at the Hot Chips site.

An Important Reminder about VIOS

Edit: Hopefully you have upgraded by now

Originally posted September 18, 2018 on AIXchange

Nigel Griffiths recently tweeted this reminder:

In Q4 2018 VIOS 2.2.4 will fall out of regular support
In Q4 only 2.2.5.*, 2.2.6.* and 3.1 will be supported
If on 2.2.4 or older upgrade to 2.2.6 NOW
I encourage every one to plan: Testing VIOS 3.1 in Q4 + upgrade in Q1
VIOS 3.1=has many good features + RAS

We have some work to do, ready or not.

Sometimes it’s easy to ignore your systems, because they just run. That is, they run until they stop running. At that point, you call support and learn that, while IBM fixed this issue ages ago, you can’t realize this benefit until you update your firmware, VIO server, AIX levels, etc.

Yes, our systems just run, but they still require maintenance. Hopefully you have a regular patch cycle and your systems are up to date.

But specifically with VIOS, it’s time to take action. Check this lifecycle information. What level are you on? Are you supported by IBM? Will you be supported after the fourth quarter of 2018? Start testing VIOS 3.1 as it becomes available (details are here and here) and plan for your change windows now. Remember to look at the maintenance strategy, and if you’re wondering which software versions you should be running, use FLRT.
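
Checking your level takes seconds from the padmin shell; the output shown here is just an example:

$ ioslevel
2.2.6.32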

With the recent announcement of POWER9 systems, it’s critical that the software you run can take full advantage of the new hardware.

LPM Copy Time Stats

Edit: I am always looking for other perspectives

Originally posted September 11, 2018 on AIXchange

Newsflash: I’m not the only person out there writing about AIX and related topics. I also understand that this is a good thing, for all of us. I know I like reading about this stuff as much as I like writing about it. With that in mind, I recently discovered Stephen Diwell’s blog. Stephen is an IBM Champion who discusses AIX performance, the VIO server, the HMC, PowerHA and much more. He doesn’t post often, but there’s lots of useful technical content, including this on Live Partition Mobility (LPM) copy time stats:

VIO Servers have the statistics on LPM copy times and the amount of data copied. This is stored in an alog file that only root can read, given that padmin (or like users) do not have access to the alog command.

Login to your VIO Server.

You need to be root:
oem_setup_env

Read the LPM alog file:
alog -of /home/ios/logs/LPM/lpm.log | more

Example Output:
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1575] Final migration statistics:
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1620] Overall time of migration: 3 minutes 43 seconds
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1641] Amount of data received: 161695465991 bytes
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1645] Effective network throughput: 725091775 bytes/s

Short. Sweet. Useful.
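
If the log holds many migrations and you only want the summary lines, a quick filter helps. A minimal sketch, using the same log path as above:

alog -of /home/ios/logs/LPM/lpm.log | egrep "Overall time|Amount of data|throughput"

That prints just the duration, bytes moved and effective throughput for each migration.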

Also read about Power Systems firmware updates and tracking CPU usage. Here are some of his other posts:

  • IBM HMC Upgrades
  • Optimizing Power with Affinity Groups
  • AIX NIM Server Tuning
  • p7 vs p8 Real World Core Reductions
  • p7 Core cost from poor LPAR Affinity
  • AIX Dynamic LPAR Name Changes
  • PowerVM LPM with a Dead VIO Server
  • AIX or VIOS Errors: 29FA8C20 and 7BFEEA1F

Take the time to read them all. Are there other AIX resources you recommend?

A Discussion of Software Security

Edit: Some links no longer work

Originally posted September 4, 2018 on AIXchange

Containers or virtual machines–which provides greater security? IBM Research attempted to answer this question, as explained in this recent article:

Are virtual machines (VM) more secure than containers? You may think you know the answer, but IBM Research has found containers can be as secure, or more secure, than VMs.

James Bottomley, an IBM Research Distinguished Engineer and top Linux kernel developer, writes:

“One of the biggest problems with the current debate about Container vs Hypervisor security is that no one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors ‘feel’ more secure than containers because of the interface breadth) but no one actually has done a quantitative comparison.”

To meet this need, Bottomley created Horizontal Attack Profile (HAP), designed to describe system security in a way that it can be objectively measured. Bottomley has discovered that “a Docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.”

He performed these tests with Docker, Google’s gVisor, a container runtime sandbox; gVisor-kvm, the same container sandbox using the KVM, Linux’s built-in VM hypervisor; Kata Containers, an open-source lightweight VM; and Nabla, IBM’s just released container type, which is designed for strong server isolation.

Bottomley’s work is only the start. He’s shown it’s possible to objectively measure an application’s security. As he said, “I don’t expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well.”

I would need more details, but the article makes it sound like this work was done on x86 with x86 hypervisors. I wonder if the results would be different if the containers ran in Linux on Power with the PowerVM hypervisor:

The PowerVM and Power hardware teams always put security at the center of our designs. Protection of client data is one of the key values of a PowerVM solution. If you ever wondered if your hardware or software are exposed to a security issue, the USA National Institute of Standards and Technology (NIST) maintains a searchable DB of all known vulnerabilities. Searching for PowerVM or PowerVM Hypervisor will display “There are 0 matching records.” This is because the PowerVM Hypervisor has yet to have a security vulnerability. Searching for other virtualization solutions will list all their known vulnerabilities, which you should be sure to address to protect your confidential information. The following blog contains details about how PowerVM provides data isolation between partitions to maintain our perfect security record.

PowerVM takes advantage of the Power hardware to provide high levels of security. The hardware is designed with three different protection domains: hypervisor domain, kernel domain and application domain. The hardware limits the instructions that can be executed based on the current protection domain, and the hardware provides very specific entry points to transition between domains. If a lower priority domain attempts to issue an instruction reserved for a higher priority domain, the instruction will generate an instruction interrupt within the current domain. The most privileged level is the hypervisor domain, which is where the PowerVM security takes place. For example, instructions that change the mapping of partition addresses to physical real addresses, or that modify specific hardware registers, are restricted such that they are only allowed in hypervisor mode.

The way the hardware has been designed, only the hypervisor is able to access memory via a physical real address. Code running in partitions accesses memory through a layer of indirection where the partitions addresses are actually aliases to the physical real memory. This support is not only leveraged for partition isolation but is leveraged by other virtualization functions on the server.

If we’re talking about IBM Power Systems servers, I would still argue that an LPAR is more secure. What do you think?

Other Options for Volume Group Backups

Edit: Still good to remember

Originally posted August 28, 2018 on AIXchange

How do you back up your volume groups? These days we’re often dealing with snapshots on a SAN, but there are still occasions when you want to back up to a local tape drive, a file, or a NIM server. The specifics depend upon the amount of data and the type of infrastructure we’re dealing with. (Obviously moving terabytes across a 100 Mb network isn’t the best way to go.)

When backing up rootvg, you’d typically run mksysb. But how would you back up datavg? The best choice is to run savevg:

The savevg command finds and backs up all files belonging to a specified volume group. The volume group must be varied-on, and the file systems must be mounted. The savevg command uses the data file created by the mkvgdata command.

Note: The savevg command will not generate a bootable tape if the volume group is the root volume group. Although the tape is not bootable, the first three images on the tape are dummy replacements for the images normally found on a bootable tape. The actual system backup is the fourth image.

To restore from this backup, run restvg:

The restvg command restores the user volume group and all its containers and files, as specified in the /tmp/vgdata/vgname/vgname.data file (where vgname is the name of the volume group) contained within the backup image created by the savevg command.

The restvg command restores a user volume group. The bosinstall routine reinstalls the root volume group (rootvg). If the restvg command encounters a rootvg volume group in the backup image, the restvg command exits with an error.

If a yes value has been specified in the EXACT_FIT field of the logical_volume_policy stanza of the /tmp/vgdata/vgname/vgname.data file, the restvg command uses the map files to preserve the placement of the physical partitions for each logical volume. The target disks must be of the same size or larger than the source disks specified in the source_disk_data stanzas of the vgname.data file.
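
To see how the two commands fit together, here’s a minimal sketch; the backup file, volume group and disk names are illustrative:

Back up datavg to a file (the -i flag runs mkvgdata to refresh the data file first):
# savevg -i -f /backup/datavg.img datavg

Restore it later, optionally naming the target disks:
# restvg -f /backup/datavg.img hdisk4 hdisk5

If you omit the disk names, restvg uses the disks recorded in the source_disk_data stanzas of the data file.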

Another option is backing up just your volume group structure, which can be used to recreate volume groups and filesystems. This is hugely beneficial if you’re cloning a system to new disks on a new frame where you want the exact same volume groups and filesystems. Simply use the savevgstruct and restorevgstruct backup commands. On the latter:

The restorevgstruct command restores the structure of a previously saved user volume group. If the -ls flag is specified, a list of previously saved volume groups and the date each volume group was saved is displayed. This command does not work on rootvg.
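
Note that savevgstruct and restorevgstruct are VIO server (padmin) commands. A minimal sketch, with illustrative names:

Save the structure of datavg:
$ savevgstruct datavg

List the previously saved structures:
$ restorevgstruct -ls

Recreate the volume group and filesystems on new disks:
$ restorevgstruct -vg datavg hdisk2 hdisk3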

I understand that these concepts are familiar to many of you, but I’m regularly questioned about these things, so I believe it’s a worthy discussion. Hopefully this provides some clarity.

Vulnerability Checker Provides Security Info

Edit: This is still a useful tool.

Originally posted August 21, 2018 on AIXchange

The FLRT Vulnerability Checker Online (FLRTVC) allows you to check your AIX system for HIPER and Security vulnerabilities:

The Fix Level Recommendation Tool Vulnerability Checker (FLRTVC) online provides security and HIPER (High Impact PERvasive) reports based on the fileset inventory of your system. The report will guide you in discovering vulnerable filesets, the affected versions, interim fixes that are installed, as well as a link to the security bulletin for further action.

FLRTVC exists as a standalone ksh script which may be downloaded from our FLRTVC Script webpage. FLRTVC uses HIPER/Security data from FLRT (aparCSV) to compare against the installed filesets (lslpp -Lcq) and interim fixes (emgr -lv3) to report your risks.

This webpage was developed based on feedback received from customers at Edge2015. We welcome your feedback on this tool and ways to improve it! Please use the Feedback button on the FLRT page or visit the FLRT IBM developerWorks Community. Follow us on Twitter @IBM_FLRT for updates!

Follow the instructions to get started:

FLRTVC Online will accept two input files, lslpp.txt (required) and emgr.txt (optional), that will be cross-examined with the aparCSV that is provided through our website. If any filesets listed in lslpp.txt are found to be within the affected versions listed in aparCSV, they will be displayed in the generated report.

Step 1) Log in to the AIX server that will be checked for vulnerabilities.
Step 2) Run the “lslpp” command: lslpp -Lcq > lslpp.txt
Step 3) (optional) Run the “emgr” command: sudo emgr -lv3 > emgr.txt
Step 4) Move the files to a machine that has an internet browser.
Step 5) Upload the file(s) using the buttons of their respective type.
Step 6) (optional) Filter the filesets using a search term.
Step 7) (optional) Select an APAR type.
Step 8) Click on “Run vulnerability checker” to begin.

If you’d prefer not to run the report interactively, one machine at a time, submitting each one via a web page (and I suspect this applies to most of you), just download the script:

The FLRTVC script works by downloading an apar.csv file from the FLRT website using CURL or WGET, whichever your machine has installed. Then, it uses the commands “emgr -lv3” for interim fixes and “lslpp -Lcq” for installed filesets, and compares to the vulnerabilities reported in the apar.csv file. FLRTVC will report any findings using one of two formats: Compact and Full (verbose). Compact is preferable for scripting purposes, and full reporting is for a more human-readable format that may be piped to an e-mail address.

Please see below for the flags and different usages:

Flags
-d = Change delimiter for compact reporting.
-f = File selection for *.csv file.
-q = Quiet mode, hide compact reporting header.
-s = Skip download, use default apar.csv file.
-v = Verbose, full report (for piping to email).
-g = Grep for filesets with phrase, useful for verbose mode.
-t = Type of APAR [hiper | sec].
-l = Enter a custom LSLPP output file, must match lslpp -Lqc.
-e = Enter a custom EMGR output file, must match emgr -lv3.
-x = Skip EFix processing.
-a = Show all fixed and non-fixed HIPER/Security vulnerabilities.

Examples

Compact Formatting
# ./flrtvc.ksh -c

Verbose Formatting
# ./flrtvc.ksh -v

Set a custom CSV file
# ./flrtvc.ksh -f myfile.csv

Report on a specific fileset in verbose mode
# ./flrtvc.ksh -vg printers

Show only hiper results
# ./flrtvc.ksh -t hiper

Custom lslpp and emgr outputs, for reporting on other systems
# ./flrtvc.ksh -l lslpp.txt -e emgr.txt

Grouping flags together
# ./flrtvc.ksh -vf myfile.csv -g printers
# ./flrtvc.ksh -vsg printers
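
If you manage more than a handful of systems, you might collect the inventories centrally and run the script against each in turn. A minimal ksh sketch, assuming passwordless ssh and a hypothetical hostlist.txt with one hostname per line:

#!/usr/bin/ksh
# Gather fileset and interim fix inventories from each host,
# then run flrtvc against the collected files
for host in $(cat hostlist.txt); do
    ssh $host "lslpp -Lcq" > lslpp.$host.txt
    ssh $host "emgr -lv3" > emgr.$host.txt
    ./flrtvc.ksh -l lslpp.$host.txt -e emgr.$host.txt > report.$host.txt
done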

The vulnerability checker delivers valuable information about your systems. Try it for yourself.

A Guide to HMC Access

Edit: The link still works, and it is still a good idea to set up roles.

Originally posted August 14, 2018 on AIXchange

You probably have users in your environment who need access to the Hardware Management Console (HMC), and if so, it’s very likely you want to limit what they can do with this access. The IBM Knowledge Center lays out HMC user roles and other pertinent information in this document that was most recently updated in June:

Each HMC user has an associated task role and a resource role. The task role defines the operations the user can perform. The resource role defines the systems and partitions for performing the tasks. The users may share task or resource roles. The HMC is installed with five predefined task roles. The single predefined resource role allows access to all resources. The operator can add customized task roles, customized resource roles, and customized user IDs.

The page includes six tables, though the first table is merely a list of headings for the next four. Those tables cover user roles, IDs, commands and control panel functions. The sixth table is a list of tasks that can only be performed from the command line:

Table 1. HMC task groupings
Table 2. HMC Management tasks, commands, and default user roles
Table 3. Service Management tasks, commands, and default user roles
Table 4. Systems Management tasks, commands, and default user roles
Table 5. Control Panel Functions tasks, commands, and user roles
Table 6. Command line tasks, associated commands, and user roles

Each table covers these default roles and IDs:

Operator (hmcoperator)
Super Administrator (hmcsuperadmin)
Viewer (hmcviewer)
Service Representative (hmcservicerep)

These tables provide a good overview of HMC commands and appropriate default user roles.
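
If you’d rather script this than click through the GUI, creating a restricted user from the HMC command line is a one-liner. A minimal sketch with a hypothetical user name (you’ll be prompted to set a password):

mkhmcusr -u jsmith -a hmcviewer -d "Read-only operations user"

The -a flag assigns one of the task roles above; hmcviewer gives look-but-don’t-touch access, a sensible starting point for most new users.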

The POWER9 System Roll-Out Continues

Edit: Some links no longer work

Originally posted August 7, 2018 on AIXchange

Today IBM is announcing two new POWER9-based enterprise systems: the E950 (9040-MR9) and the E980 (9080-M9S). The E980 is the follow-on to both the E870 and E880, and delivers 1.5X the performance. Rather than have two high-end machines, as was the case in POWER8 (and going back to POWER7 with the 770/780), those systems are collapsed into the E980. As for the E950, with its available cores and memory, it packs quite a punch for a 4-socket server, as you’ll soon see.

These systems use different chips than those that run on the POWER9 scale-out servers that were announced in February. While the scale-out servers support direct-attached memory, these “scale-up” servers support buffered memory attachments with the POWER9 enterprise chip. This results in differences in memory bandwidth: up to 170 GB/s of peak bandwidth with the scale-out servers compared to 230 GB/s on the scale-up servers.

Some Quick Highlights:

E950

  • GA on Aug. 17
  • 8, 10, 11 or 12-core processor options; they will run at speeds up to 3.8 GHz
  • 2-4 processors per system (up to 48 total cores)
  • Up to 4 TB of RAM per processor; up to 16 TB of DDR4 RAM on a 4-processor system.
  • 4U Server that fits in a 19-inch rack.
  • 10 PCIe Gen4 slots and 1 PCIe Gen3 slot that specifically supports the default network card that is used at the factory. These are full-height, half-length slots with blind swap cassettes, meaning you can hot swap your I/O cards.
  • 8 SFF 2.5-inch SAS bays for your SAS drives. (Note: Because storage adapters aren’t built into the back plane, any storage adapters that run your SAS drives would take up a PCI slot. You have the choice of a single or dual back plane, but keep in mind that the latter will take up 2 PCI slots.)
  • 4 NVMe flash drives; this is a great option for local boot of your VIO servers.
  • Three years, 24 x 7 warranty.
  • Supports AIX and Linux; no support for IBM i is planned at this time.

E980

  • 1-2 node system available on Sept. 21; 3-4 node system available on Nov. 16. All MES upgrades from E870/E870C/E880/E880C available on Nov. 16.
  • 8, 10, 11 or 12-core processor options; they will run at speeds up to 4.0 GHz.
  • 32, 40, 44 or 48 processor cores per node, meaning the 1-2 node system supports up to 96 cores and the 3-4 node system supports up to 192 cores.
  • Up to 16 TB of RAM per node; up to 64 TB of RAM per 4-node system.
  • Modular scalable system: 1-4 5U CECs + 2U control unit.
  • Up to 32 PCIe Gen4 slots on a 4 node system. Low-profile I/O cards, 8 per CEC.
  • Up to 16 PCIe I/O expansion drawers, 4 per CEC.
  • 4 NVMe flash drives per CEC.
  • 1 year 24 x 7 warranty.
  • Supports AIX, Linux and IBM i.

Note: The E980’s 2U system control unit resides at the bottom of the nodes. With the E880, the control unit is in the middle. Keep this change in mind as you determine the physical system placement in your rack, particularly if you plan to leave room for future systems growth.

As was the case with the systems that were announced earlier this year, both of the enterprise class systems come with PowerVM Enterprise Edition built in. (You can no longer select PowerVM Standard edition.) With the enterprise edition, you can utilize Live Partition Mobility (LPM) across your environment to quickly and easily move workloads from POWER7/POWER8 servers to POWER9 models. A free 60-day activation can be requested from IBM (Feature Code ELPM), so if your systems are currently running PowerVM standard edition, you still have a way to perform live migrations.

When running LPM between POWER9 systems, you can expect faster partition migrations due to on-chip encryption and compression. In IBM’s migration testing, the before and after results were pretty startling: one test workload that ended up transferring 51 GB in 5 minutes to migrate the LPAR was pared down to only 1 GB of data and 15 seconds for the data transfer when encryption and compression were deployed. Obviously your mileage will vary based on individual workloads and LPAR characteristics.

Both systems support Capacity Upgrade on Demand (CUoD), meaning you can buy extra physical cores and memory and activate them as needed. CUoD takes a lot of the uncertainty out of system planning.

Keep an eye out for AIX 7.2 TL3 running on POWER9; it will now ship with SMT8 enabled instead of SMT4, so going forward you’ll need to pay attention to how your workloads are running after migrations and upgrades. Expect to see sizable performance improvements over POWER7 and POWER8; I’ll share some actual numbers once they come out.

If you’re familiar with what was IBM PowerCare, it is now called the IBM Power to Cloud Reward Program. With the purchase of an enterprise system, you’ll earn credits for services that can be redeemed for various on-site IBM Lab Services offerings.

Speaking of cloud, these systems come with cloud management console (CMC) entitlements for three years.

You’ll also be able to install and use PowerVC 1.4.1.

Finally, note the firmware, HMC, VIOS, AIX, IBM i and Linux levels that you’ll need to be running:

  • Firmware level FW920.10 (available in third quarter)/FW920.20 (4Q).
  • HMC code level V9R1.920.
  • VIOS 2.2.6.31 (3Q)/VIOS 2.2.6.32 & 3.1.0 (4Q).
  • AIX 7.2 TL2.
  • AIX 7.2 TL1 (P8 compatibility mode).
  • AIX 7.1 TL4, TL5 (P8 compatibility mode).
  • AIX 6.1 TL9 (P7 compatibility mode).
  • IBM i 7.3 TR5.
  • IBM i 7.2 TR9.
  • Ubuntu 16.04.4 (P8 compatibility mode).
  • RedHat RHEL 7.5 LE (P8 compatibility mode).
  • RedHat RHEL 7.6 LE (4Q).
  • SuSE SLES 11 SP4 (P8 compatibility mode).
  • SuSE SLES 12 SP3.
  • SuSE SLES 15.

As always, IBM Power Systems deliver on performance, not to mention scalability, reliability, serviceability, agility and flexibility. I look forward to getting my hands on these systems.

Techspeak Explained

Edit: The links no longer work which is a real shame.

Originally posted July 31, 2018 on AIXchange

You don’t need me to tell you that there are a lot of acronyms in tech. But it never hurts to be reminded that more and more workers are entering the world of AIX and IBM Power Systems from non-UNIX and/or non-IBM backgrounds. As a consultant, I regularly meet people who are new to IBM systems and unfamiliar with many IBM-specific terms–e.g., PMR, APAR, NIM, WPAR, VPD and SEA–that we take for granted.

Luckily IBM maintains this index of terms and definitions. Most are specific to IBM software and hardware products, but there are also general computing terms.

Let’s try it, shall we? Check V, and you’ll see that VPD is vital product data. DDM has two meanings (here and here), one of which is “a field-replaceable unit (FRU) that consists of a single disk drive and its associated packaging.”

Admit it: You’re wondering what an FRU is now, aren’t you? Go here.

This is a valuable resource for anyone who’s new to IBM technology and needs help translating from IBM to English.

Making a PowerVC Proxy

Edit: The link no longer works.

Originally posted July 24, 2018 on AIXchange

As I’ve noted numerous times, Twitter is a great resource for anyone who wants to learn about what’s new in the world of AIX and IBM Power Systems.

Case in point, Chris Gibson (@cgibbo) pointed to this article on setting up an HTTP proxy on PowerVC:

Have you ever struggled to give your end users access to the PowerVC UI, but don’t want to give them real access to the PowerVC host? For example, I’ve seen a few scenarios recently where we want to make PowerVC UI publicly available, but still need PowerVC itself sitting on an internal private network with connections to the private management infrastructure. There are a number of ways you can go about doing this with port forwarding, iptables rules, etc. But perhaps the easiest way to do this is to set up a very simple light-weight HTTP proxy with NGINX.

Install nginx. On Ubuntu/Debian, simply run: sudo apt install nginx. On Redhat, run: sudo yum install nginx

nginx should start automatically. If not, run: sudo systemctl start nginx

Remove the default config file: sudo rm /etc/nginx/sites-enabled/default

Install ssl-cert. This will allow automatic generation of self-signed ssl certificates:  sudo apt install ssl-cert or sudo yum install ssl-cert.

Add the following configuration file, modifying the 10.0.0.10 IP address to match that of your PowerVC server (paste this entire entry into a bash shell)

Finally, restart nginx (sudo systemctl restart nginx), and point your web browser to http://X.X.X.9 and you should see the PowerVC GUI.
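
The excerpt stops short of the configuration file itself, but the general shape is a small reverse proxy stanza. Here’s a minimal sketch only, pasteable into a bash shell as the article suggests; it assumes PowerVC answers HTTPS on 10.0.0.10 and that the ssl-cert package generated the usual self-signed certificate paths:

cat > /etc/nginx/conf.d/powervc-proxy.conf <<'EOF'
server {
    listen 443 ssl;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    location / {
        # pass everything through to the PowerVC host on the private network
        proxy_pass https://10.0.0.10;
        proxy_set_header Host $host;
    }
}
EOF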

Read the whole thing for the actual code and a detailed explanation. Also be sure to check out this specific information on NGINX, which is linked at the end of the document.

My Blogging Anniversary

Edit: I will keep writing if you will keep reading.

Originally posted July 17, 2018 on AIXchange

What were you doing 11 years ago? I was living in another state and working for a different company. Since then I moved back home to Arizona. I lost weight, got more active, and spent more time outdoors camping, backpacking, hiking and going on scuba trips. 

Eleven years ago I had small children at home. Now one son is in the military, and the other is about to start his sophomore year of high school. Before I know it, he too will be grown and out of the house and I’ll be an empty nester. I’m sure many of you are on similar paths. 

Eleven years ago was also when I was asked to start writing AIXchange. Fifty weeks a year for 11 years (we take time off over the holidays and the week of July 4), I’ve written about AIX and IBM Power Systems servers, and many, many other technology-related topics.

The constant search to find interesting topics to cover in this blog helps me as an IT professional. These duties keep me focused on what’s current in tech. I especially love researching IBM announcements and learning about new technology before it becomes common knowledge. 

The time I’ve put into this blog has paid off in unexpected ways. I’m sure it’s a big reason why I’ve been an IBM Champion. And being part of the IBM Champion community, which allows me to directly interact with those involved with IBM announcements, has benefited this blog. I’m better informed and more capable of explaining and analyzing AIX and IBM Power Systems technology. 

Occasionally I’ll be using Google to solve a problem and see one of my old posts come up in the results. It’s happened more than once. That’s something else I didn’t expect. It’s a good thing I write this stuff down rather than having to reinvent the wheel. 

Many of my readers have kindly made suggestions as to what they want to see covered. They’ve sent tips and tricks and scripts and links to presentations. I’m always happy to highlight what others are doing, and share their knowledge. 

Eleven years ago the POWER6 processors were announced. Version 6.1 of AIX was released late in 2007. Computers and networks and disks have of course gotten much faster since then, but the parameters of our jobs haven’t changed all that much. We’re still needed to care for our systems and keep them running. My efforts here won’t change, either. I’ll keep learning and keep writing. AIXchange will continue to provide a window into IBM products and other technologies, reflecting what’s new, what’s changing and what’s going away. And rest assured, I’ll keep using my Model M keyboard.

Returning to AIX

Edit: Some links no longer work

Originally posted July 10, 2018 on AIXchange

Recently I received this email:

It’s been a number of years since I have administered AIX. I was on AIX before 5L. (Was there a version 5?) It may have been v4.

I am going to update my skill set on AIX, since I see it a lot out there in the wild.

What are the deltas between 4 -> 5?? -> 5L -> 6 -> 7 <== would I even recognize AIX anymore

What would be the best way to update my skills on AIX?

I thought about it and replied, but then I realized that this individual can’t be the only person in our profession who’s ever switched jobs and/or been tasked with supporting different operating systems. There must be a number of admins who’ve worked on AIX, moved to a different opportunity in IT, and then found themselves managing AIX systems again. With that in mind, I thought I’d share the gist of my response here.

If you haven’t worked with AIX lately, know that a lot of what you’re familiar with is still there. For instance, smitty is still a valid way to manage your systems. While the journaled filesystem has evolved from JFS to JFS2, the logical volume manager (LVM) should still look and feel familiar. That’s true for much of AIX and its related capabilities. As far as ways to update your AIX skills, here are some places to start:

  • irc–Get on the ##aix channel and ask questions. Now, you shouldn’t expect immediate answers as most of the users have day jobs, but be patient; you should eventually get a response. You can also post questions on Reddit or in the AIX forum.
  • Nigel Griffiths has a series of YouTube videos.
  • The IBM Power Systems technical webinar series (previously known as the AIX Virtual User Group) conducts monthly presentations. Dig into the replays whenever you have time.
  • Get hands-on–Even if you don’t have access to a lab machine at work, you can still get on a system. Used systems are sometimes available on eBay. Or you could get AIX running on Nutanix.

Of course many other AIX/IBM Power Systems resources are out there. Please make your own suggestions in comments.

AIX Implementation Best Practices Updated for POWER9

Edit: One of my go-to reference guides

Originally posted June 26, 2018 on AIXchange

An updated version of AIX implementation best practices for commercial workloads was released in May. This should not be confused with the POWER9 performance best practices document I referenced three weeks ago. In this case, I’m talking about the latest in Fredrik Lundholm’s popular series of presentations, which I previously wrote about here.

Here’s Fredrik’s introduction to his current presentation:

Dear Power Team,

It is that time of year to renew the best practices for the spring and POWER9 implementations. Please read and enjoy! As always please share any comments or requests for improvement with me.

In the next version I am planning to include a section on VIOS rules and how they complement the best practices.

I’ve included a section on rootvg failure monitoring in PowerHA donated by Chris Gibson…

On slide 3 you can see the changelog since the last time I wrote about it.

Changes for 1.20:
Rootvg failure monitoring in PowerHA 7.2, Default Processor mode,

Changes for 1.19:
2018 Apr Update, POWER9 enablement, Spectrum Scale 4.2 certified with Oracle RAC 12c,

Changes for 1.18:
2017 Sep Update, new AIX default multipathing for SVC

Changes for 1.17:
2017 Update, VIOS 2.2.5, poll_uplink clarification (edit)

The reminder from page 4:
This presentation describes the expected best practices implementation and documentation guidelines for Power Systems with AIX. These should be considered mandatory procedures for virtualized Power servers.

The overall goal is to combine simplicity with flexibility. This is key to achieve the best possible total system availability with adequate performance over time.

While this presentation lists the expected best practices, all customer engagements are unique. It is acceptable to adapt and make implementation deviations after a mandatory review with the responsible architect (not only engaging the customer) and properly documenting these:

General Design Principles for Power System implementations (page 6)
System and PowerVM Setup recommendations (page 15)
AIX Setup recommendations (page 28)
PowerHA (page 38)
Linux/IBM i (page 43)
FAQ (page 44)
Reference Slides: Procedures for older AIX/VIOS releases (page 49)

Fredrik has developed a good following over the years, and it’s easy to see why. If you’ve not checked out his previous presentations, take the time to go through this.

What are your resource needs? You’ll know when you know

Edit: Some links no longer work

Originally posted June 20, 2018 on AIXchange

A few weeks ago I came across this great exchange in the AIX forum:

How do I determine the resources needed based on volume of transactions. By resources I mean, the cores, memory etc. Is there a way to arrive at that value?

The reply took the form of an analogy:

This question is about the same as “how much petrol does it take to go 100 miles”–without any specification of details it cannot be answered. In the above version: a bicycle would need no petrol at all, a car maybe 10 [liters] and a tank perhaps 200L of diesel. In your question: it depends on the transactions, the type of processor, the database used, the amount of memory, etc., etc….

In addition there are no fixed values for this, a lot of these estimations are done on experience. So, without you telling us more about your requirements we can’t answer your question, not even with a rough estimation.

As Nigel Griffiths notes in this IBM developerWorks post, basic common sense is a useful guide in these matters:

Trick 2: Don’t worry about the tea bags!
No one calculates the number of teabags they need per year. In my house, we just have some in reserve and monitor the use of tea bags and then purchase more when needed. Likewise, start with a sensible VIOS resources and monitor the situation.

Can this sort of thinking apply to our LPARs? Until we start running a given workload, we may not know how much memory and CPU we’ll ultimately need. Luckily, POWER-based systems are very forgiving in this regard. If some spare memory and CPU is available on our machines, we can (provided our profiles are set correctly) add or remove CPU and memory with a quick dynamic LPAR operation. As we monitor our workloads and tweak our resource allocations, we can arrive at a satisfactory answer with minimal effort.
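
Those dynamic LPAR operations can be driven from the HMC GUI or scripted with the chhwres command on the HMC command line. A minimal sketch, with hypothetical managed system and partition names:

Add 1 GB of memory to a running partition:
chhwres -r mem -m mysystem -o a -p mylpar -q 1024

Add half a processing unit to a shared-processor partition:
chhwres -r proc -m mysystem -o a -p mylpar --procunits 0.5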

Here’s the same AIX forum member making a similar analogy back in 2013:

A simple comparison of the difference between performance and speed can be described with this analogy: We have a Ferrari, a large truck, and a Land Rover. Which is fastest? Most people would say the Ferrari, because it can travel at over 300 [kilometers per hour]. But suppose you’re driving deep in the country with narrow, windy, bumpy roads? The Ferrari’s speed would be reduced to near zero. So, the Land Rover would be the fastest, as it can handle this terrain with relative ease, at near the 100kph limit. Right? But, suppose, then, that we have a 10-tonne truck which can travel at barely 60kph along these roads? If each of these vehicles are carrying cargo, it seems clear that the truck can carry many times more the cargo of the Ferrari and the Land Rover combined. So again: which is the “fastest”? It depends on the purpose (amount of cargo to transport) and environment (streets to go). This is the difference between “performance” and “speed.” The truck may be the slowest vehicle, but if delivering a lot of cargo is part of the goal it might still be the one finishing the task fastest.

So how do you determine the amount of resources you’ll need? As Nigel says in the previously referenced developerWorks post:

The classic consultant answer is “it depends on what you are doing with Disk & Network I/O” is not very useful to the practical guy that has to size a machine including the VIOS nor the person defining the VIOS partition to install it!

“Watch your workload and adjust as needed” may be wishy-washy advice, but the point is that real-world system workloads are difficult to simulate. While rPerfs and workload estimators can get you pretty far, you’ll inevitably need to make adjustments along the way. And as I said, this is yet another reason to appreciate AIX and IBM Power Systems. This combination is so easy to manage when it comes to adjusting resources and migrating workloads to different hardware as needed.

New Doesn’t Always Mean Improved

Edit: I still miss the keyboard on the Blackberry

Originally posted June 12, 2018 on AIXchange

Awhile back, Dan Kaminsky posed these questions on Twitter:

  • Who asked Slack to shut down their IRC gateway?
  • Who asked Apple to remove the headphone port?
  • Who *are* technical organizations actually listening to? Not asking as an attack. It’s behavior that is happening, with full awareness of unpopularity. What is the source?

I love this sentiment. In fact, I ask these sorts of questions all the time. For instance, who decided that we no longer wanted mechanical keyboards? Why do laptops have trackpads when everyone was cool with the TrackPoint?

It’s a little bit like an automatic transmission versus a stick shift. If you know how to drive a stick, you don’t want an automatic transmission. If you don’t drive a stick shift, you’re not going to buy a car that’s got one.

One of the advantages of a TrackPoint is that your hands don’t have to leave the home row to move the cursor. So, you can type and move the cursor without doing this [mimes a hand shifting between a keyboard and a trackpad].

Plus, your finger doesn’t really have to move, because a TrackPoint is strain-gauged, so it measures pressure. It doesn’t move around like a joystick, it’s measuring pressure. Some people get it and some people don’t; some people acquire the taste. It’s hard to explain, but I still think there’s a use for it.

For the record, mechanical keyboards are still available, though when I started in IT, they were ubiquitous. But again, how do these decisions get made? I assume the desire to cut costs is a foremost consideration in these instances. Maybe there were licensing issues with IBM. Regardless of the reasoning or circumstances though, it sometimes feels like we’re heading backwards and forgetting valuable lessons from the past.

This article from 2007 questions the common perception of user-friendliness:

Graphic User Interface (GUI) is commonly considered to be superior to Text-based User Interface (TUI). This study compares GUI and TUI in an electronic dental record system. Several usability analysis techniques compared the relative effectiveness of a GUI and a TUI. Expert users and novice users were evaluated in time required and steps needed to complete the task. A within-subject design was used to evaluate if the experience with either interface will affect task performance. The results show that the GUI interface was not better than the TUI for expert users. GUI interface was better for novice users. For novice users there was a learning transfer effect from TUI to GUI. This means a user interface is user-friendly or not depending on the mapping between the user interface and tasks. GUI by itself may or may not be better than TUI.

I think you know how this ends up: The only folks using text-based interfaces, CLI and the like, are us, the so-called expert users. For all the non-technical end users in the enterprise, GUI predominates.

I don’t have the answers, but it sure seems like there’s a disconnect between those who design and enhance our technology and the consumer base. Maybe it’s a result of corporate cost-cutting, or maybe it’s so marketing teams can point to new features. 

Or maybe it’s generational. The things we take for granted, younger people have no idea how they work. For instance, Slack went down a couple weeks ago. I had to laugh, knowing that irc keeps on running. Then I found this about a hotel that provides an instructional video on using its rotary phones. (Note: You have to be at least 35 to find that sentence astounding.) 

Anyway, I’m sure you can come up with your own examples of changes that didn’t seem helpful or necessary. For all we gain with new technologies, it’s not a perfect trade-off. New doesn’t always mean improved. 

POWER9 Performance Best Practices

Edit: Best practices are always a great place to start.

Originally posted June 5, 2018 on AIXchange

In April, IBM’s Therese Eaton (@tetweetings) noted the availability of this POWER9 performance best practices document.

Along with POWER9 (and POWER8) best practices, there’s instruction on managing AIX updates and upgrading from Version 5.3 to 7.1.

While it’s only a brief checklist, there is important information here:

  • Ensure your firmware is current.
  • Follow the memory plug-in rules.
  • Ensure OS level is current.
  • Evaluate the use of SMT8 (see the smtctl sketch after this list).
  • Right-size your shared LPARs.
  • Use DPO to optimize partition placement.
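
On the SMT8 point, smtctl is how you check and change the threading mode from inside AIX. A minimal sketch:

Show the current SMT configuration:
smtctl

Switch to SMT4 immediately (effective now, but not preserved across reboots):
smtctl -t 4 -w now

Use -w boot (and rerun bosboot) if you want the change applied at the next restart instead.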

Also covered are AIX and VIO server tunables, CPU utilization, VIO server configuration, and virtual Ethernet adapters.

The second page has links to virtualization best practices, rPerf reports, 100G adapter best practices, VIOS sizing, Java performance, VIO server advisor, and IBM Redbooks.

Particularly for those who are new to the platform, these resources can be a big help.

Applying VIOS Rules Post-Install

Edit: Do you make changes to the defaults?

Originally posted May 29, 2018 on AIXchange

Awhile back my colleague Eric Hopkins was installing VIO server 2.2.6.21 when he noticed something new: a reminder to apply rules post-installation:

Virtual I/O Server (VIOS) rules management consists of two rules files. The default rules file contains the critical recommended device rules for VIOS best practice, and the current rules file captures the current VIOS system settings based on the default rules. To deploy the recommended default device settings on a newly installed VIOS, run the rules -o deploy -d command and then restart the system. The default rules are contained in an XML profile, and you cannot modify the default rules.

You can customize rules on VIOS by using the current rules. The initial current rules are captured from the system by using default rules as a template and then saving them in an XML profile. You can modify the current rules or add new rules. The new rules must be supported on the VIOS level. You can apply the changed current rules to VIOS, for currently discovered and newly discovered device types and instances. You can use the rules command to manage VIOS rules files.
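
Beyond diff and deploy, you can also simply list what the rules files contain. A minimal sketch from the padmin shell:

View the recommended default rules:
rules -o list -d

View the current system rules:
rules -o list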

This is what was displayed after logging in following his 2.2.6.21 install: 

================================================

IBM Virtual I/O Server

                      login: padmin

[compat]: 3004-610 You are required to change your password.

        Please choose a new one.

padmin’s New password:

Enter the new password again:

Indicate by selecting the appropriate response below whether you

accept or decline the software maintenance terms and conditions.

Accept (a) |  Decline (d) |  View Terms (v) > a

$ license -accept

  Current system settings are different from the best practice recommendations for a VIOS.

  To view the differences between system and the recommended settings, run the following:

  $rules -o diff -s -d

  To deploy the VIOS recommended default settings, run the following:

  $rules -o deploy -d

  $shutdown -restart

$ rules -o diff -s -d

devParam.disk.fcp.mpioosdisk:reserve_policy device=disk/fcp/mpioosdisk             single_path | no_reserve

devParam.disk.fcp.mpioapdisk:reserve_policy device=disk/fcp/mpioapdisk             single_path | no_reserve

devParam.disk.fcp.nonmpiodisk:reserve_policy device=disk/fcp/nonmpiodisk           single_path | no_reserve

devParam.disk.fcp.aixmpiods8k:reserve_policy device=disk/fcp/aixmpiods8k           single_path | no_reserve

devParam.disk.sas.mpioapdisk:reserve_policy device=disk/sas/mpioapdisk             single_path | no_reserve

devParam.PCM.friend.fcpother:algorithm device=PCM/friend/fcpother                   fail_over | round_robin

devParam.PCM.friend.iscsiother:algorithm device=PCM/friend/iscsiother               fail_over | round_robin

devParam.PCM.friend.otherapdisk:algorithm device=PCM/friend/otherapdisk             fail_over | round_robin

devParam.PCM.friend.sasother:algorithm device=PCM/friend/sasother                   fail_over | round_robin

devParam.PCM.friend.aixmpiods8k:algorithm device=PCM/friend/aixmpiods8k             fail_over | round_robin

devParam.adapter.pseudo.ibm_ech:hash_mode device=adapter/pseudo/ibm_ech              default | src_dst_port

devParam.adapter.pciex.df1000fe:num_cmd_elems device=adapter/pciex/df1000fe                      200 | 1966

devParam.adapter.pciex.df1000fe:max_xfer_size device=adapter/pciex/df1000fe             0x100000 | 0x400000

devParam.adapter.pci.df1023fd:num_cmd_elems device=adapter/pci/df1023fd                          200 | 1966

devParam.adapter.pci.df1023fd:max_xfer_size device=adapter/pci/df1023fd                     0x100000 | 0x400000

devParam.adapter.pciex.771032257710650:num_cmd_elems device=adapter/pciex/771032257710650            500 | 2048

devParam.adapter.pciex.771032257710650:max_xfer_size device=adapter/pciex/771032257710650   0x100000 | 0x400000

devParam.adapter.pciex.77103224:num_cmd_elems device=adapter/pciex/77103224                          200 | 1024

devParam.adapter.pciex.77103224:max_xfer_size device=adapter/pciex/77103224                 0x100000 | 0x400000

devParam.adapter.pciex.df1000f1df1024f:max_xfer_size device=adapter/pciex/df1000f1df1024f   0x100000 | 0x400000

devParam.adapter.pciex.df1000f1df1024f:num_cmd_elems device=adapter/pciex/df1000f1df1024f            500 | 4014

devParam.adapter.pciex.df1000f114108a0:max_xfer_size device=adapter/pciex/df1000f114108a0   0x100000 | 0x400000

devParam.adapter.pciex.df1000f114108a0:num_cmd_elems device=adapter/pciex/df1000f114108a0            500 | 4014

devParam.adapter.pciex.df1000f11410010:num_cmd_elems device=adapter/pciex/df1000f11410010            500 | 4014

devParam.adapter.pciex.df1000f11410010:max_xfer_size device=adapter/pciex/df1000f11410010   0x100000 | 0x400000

devParam.adapter.pciex.771032257710760:max_xfer_size device=adapter/pciex/771032257710760   0x100000 | 0x400000

devParam.adapter.pciex.771032257710760:num_cmd_elems device=adapter/pciex/771032257710760            500 | 2048

devParam.adapter.pciex.771032257710750:max_xfer_size device=adapter/pciex/771032257710750   0x100000 | 0x400000

devParam.adapter.pciex.771032257710750:num_cmd_elems device=adapter/pciex/771032257710750            500 | 2048

devParam.adapter.pciex.771032257710680:max_xfer_size device=adapter/pciex/771032257710680   0x100000 | 0x400000

devParam.adapter.pciex.771032257710680:num_cmd_elems device=adapter/pciex/771032257710680            500 | 2048

devParam.adapter.pciex.771032257710660:max_xfer_size device=adapter/pciex/771032257710660   0x100000 | 0x400000

devParam.adapter.pciex.771032257710660:num_cmd_elems device=adapter/pciex/771032257710660            500 | 2048

devParam.adapter.pciex.7710018077107f0:max_xfer_size device=adapter/pciex/7710018077107f0   0x100000 | 0x400000

devParam.adapter.pciex.7710018077107f0:num_cmd_elems device=adapter/pciex/7710018077107f0            500 | 2048

devParam.adapter.pciex.771001801410af0:max_xfer_size device=adapter/pciex/771001801410af0   0x100000 | 0x400000

devParam.adapter.pciex.771001801410af0:num_cmd_elems device=adapter/pciex/771001801410af0            500 | 2048

devParam.adapter.pciex.df1000e21410f10:max_xfer_size device=adapter/pciex/df1000e21410f10   0x100000 | 0x400000

devParam.adapter.pciex.df1000e21410f10:num_cmd_elems device=adapter/pciex/df1000e21410f10            500 | 4096

devParam.adapter.pciex.df1060e21410100:max_xfer_size device=adapter/pciex/df1060e21410100   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410100:num_cmd_elems device=adapter/pciex/df1060e21410100            500 | 4096

devParam.adapter.pciex.df1060e21410520:max_xfer_size device=adapter/pciex/df1060e21410520   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410520:num_cmd_elems device=adapter/pciex/df1060e21410520            500 | 4096

devParam.adapter.pciex.df1000e2df1002e:max_xfer_size device=adapter/pciex/df1000e2df1002e   0x100000 | 0x400000

devParam.adapter.pciex.df1000e2df1002e:num_cmd_elems device=adapter/pciex/df1000e2df1002e            500 | 4096

devParam.adapter.pciex.df1000e214105e0:max_xfer_size device=adapter/pciex/df1000e214105e0   0x100000 | 0x400000

devParam.adapter.pciex.df1000e214105e0:num_cmd_elems device=adapter/pciex/df1000e214105e0            500 | 4096

devParam.adapter.pciex.df1060e214105f0:max_xfer_size device=adapter/pciex/df1060e214105f0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214105f0:num_cmd_elems device=adapter/pciex/df1060e214105f0            500 | 4096

devParam.adapter.pciex.df1060e21410370:max_xfer_size device=adapter/pciex/df1060e21410370   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410370:num_cmd_elems device=adapter/pciex/df1060e21410370            500 | 4096

devParam.adapter.pciex.df1060e214103a0:max_xfer_size device=adapter/pciex/df1060e214103a0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214103a0:num_cmd_elems device=adapter/pciex/df1060e214103a0            500 | 4096

devParam.adapter.pciex.df1000e2df1082e:max_xfer_size device=adapter/pciex/df1000e2df1082e   0x100000 | 0x400000

devParam.adapter.pciex.df1000e2df1082e:num_cmd_elems device=adapter/pciex/df1000e2df1082e            500 | 4096

devParam.adapter.pciex.df1060e214103e0:max_xfer_size device=adapter/pciex/df1060e214103e0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214103e0:num_cmd_elems device=adapter/pciex/df1060e214103e0            500 | 4096

devParam.adapter.pciex.df1060e21410410:max_xfer_size device=adapter/pciex/df1060e21410410   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410410:num_cmd_elems device=adapter/pciex/df1060e21410410            500 | 4096

devParam.adapter.pci.df1080f9:max_xfer_size device=adapter/pci/df1080f9                     0x100000 | 0x400000

devParam.adapter.pci.df1080f9:num_cmd_elems device=adapter/pci/df1080f9                              200 | 2048

devParam.adapter.pci.df1000fd:max_xfer_size device=adapter/pci/df1000fd                     0x100000 | 0x400000

devParam.adapter.pci.df1000fd:num_cmd_elems device=adapter/pci/df1000fd                              200 | 1966

devParam.adapter.pci.df1000fa:max_xfer_size device=adapter/pci/df1000fa                     0x100000 | 0x400000

devParam.adapter.pci.df1000fa:num_cmd_elems device=adapter/pci/df1000fa                              200 | 2048

devParam.adapter.pci.df1000f9:max_xfer_size device=adapter/pci/df1000f9                     0x100000 | 0x400000

devParam.adapter.pci.df1000f9:num_cmd_elems device=adapter/pci/df1000f9                              200 | 2048

devParam.adapter.pci.df1000f7:max_xfer_size device=adapter/pci/df1000f7                     0x100000 | 0x400000

devParam.adapter.pci.df1000f7:num_cmd_elems device=adapter/pci/df1000f7                              200 | 1024

devParam.adapter.pci.77102224:max_xfer_size device=adapter/pci/77102224                     0x100000 | 0x400000

devParam.adapter.pci.77102224:num_cmd_elems device=adapter/pci/77102224                              200 | 1024

devParam.driver.iocb.efscsi:dyntrk device=driver/iocb/efscsi                                           no | yes

devParam.driver.iocb.efscsi:fc_err_recov device=driver/iocb/efscsi                     delayed_fail | fast_fail

devParam.driver.qliocb.qlfscsi:dyntrk device=driver/qliocb/qlfscsi                                     no | yes

devParam.driver.qliocb.qlfscsi:fc_err_recov device=driver/qliocb/qlfscsi               delayed_fail | fast_fail

devParam.driver.qiocb.qfscsi:dyntrk device=driver/qiocb/qfscsi                                         no | yes

devParam.driver.qiocb.qfscsi:fc_err_recov device=driver/qiocb/qfscsi                   delayed_fail | fast_fail

devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan                    2048 | 4096

devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan                     512 | 4096

devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan                   2048 | 4096

devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan                    512 | 4096

devParam.adapter.pciex.77103225141004f:max_xfer_size device=adapter/pciex/77103225141004f   0x100000 | 0x400000

devParam.adapter.pciex.77103225141004f:num_cmd_elems device=adapter/pciex/77103225141004f           1024 | 2048

devParam.adapter.pciex.7710322514101e0:max_xfer_size device=adapter/pciex/7710322514101e0   0x100000 | 0x400000

devParam.adapter.pciex.7710322514101e0:num_cmd_elems device=adapter/pciex/7710322514101e0            500 | 2048

devParam.adapter.pciex.df1000e31410140:max_xfer_size device=adapter/pciex/df1000e31410140   0x100000 | 0x400000

devParam.adapter.pciex.df1000e31410140:num_cmd_elems device=adapter/pciex/df1000e31410140           1024 | 4096

devParam.adapter.pciex.df1000e31410150:max_xfer_size device=adapter/pciex/df1000e31410150   0x100000 | 0x400000

devParam.adapter.pciex.df1000e31410150:num_cmd_elems device=adapter/pciex/df1000e31410150           1024 | 6144

Have you run into this during new VIOS deployments? What do you think? Personally, I appreciate the prompt to fix things right away at installation. It’s good to get these tasks out of the way as opposed to having to remember to take care of them later in the process.

A Primer on the New Hyperconverged Systems

Edit: A shame this was not adopted

Originally posted May 22, 2018 on AIXchange

So hyperconverged systems running AIX are here, and it’s very cool. If you’re looking for more technical detail, the IBM Knowledge Center provides some practical information that techies will find very interesting. This doc features concepts and recommendations on planning, deploying and installing AIX, network booting and configuring virtual machines. There’s also a section on troubleshooting. Here are some interesting tidbits:

  • AIX cannot determine the number of physical cores in the system and reports a large default value when running on IBM Hyperconverged Systems powered by Nutanix.
  • The system administrator must use the Nutanix PRISM GUI to obtain information about system capacity for capacity planning and software licensing purposes.
  • Nutanix does not support micro-partitioning of CPUs or shared processor pools with entitlement controls found on PowerVM based systems. When the AIX operating system is running in this environment, AIX represents all virtual processors as fully entitled and having capped shared CPUs.
  • The AIX operating system supports virtual I/O Ethernet and SCSI devices (virtio-net and virtio-scsi types) by using the KVM VirtIO virtualization standard that is used in IBM hyperconverged systems. The AIX operating system also supports the CD device (spapr type) used in this environment.
  • Hyperconverged systems use fully virtualized I/O; therefore, workloads that rely on physical I/O are not supported.
  • AIX on IBM Hyperconverged Systems powered by Nutanix supports installations through AIX cloud images and DVD ISO media. This environment also supports installations through traditional methods for network-based installations by using the NIM that is currently supported on PowerVM systems.
  • You can access the AIX console through the PRISM GUI by using the COM1 console connection after the VM has been started. You must use a VNC console connection to interact with the open firmware.
  • If you’re using a static IP address for the client VM, the client and server must be on the same subnet when booting a VM across the network. You cannot specify a subnet mask in the boot command as shown in this example:
    • 0> boot <NIC-device>:<nim-server-ip>,<\path\to\client\bootfile>,<clientip>,<gateway-ip>
    • 0> boot net:9.3.94.78,\tftpboot\client-vm.ibm.com,9.3.94.217,9.3.94.1
  • You must restart the AIX VM when you change the number of CPUs, amount of memory, and after adding or removing a network device or CD-ROM device.
  • AIX supports serial console connections on IBM Hyperconverged Systems. You must choose the COM1 connection while launching a console from PRISM to interact with the AIX operating system.
  • The VNC console connection must be used to interact with open firmware before the AIX operating system is loaded after starting or rebooting a VM.
  • As the VM loads the AIX operating system and software drivers, AIX IPL progress codes are displayed in the COM1 console.
  • AIX does not provide concurrent diagnostics support, including adapter firmware updates, for IBM Hyperconverged Systems powered by Nutanix. The Nutanix product provides support for device diagnostics and firmware updates.

Rest assured, I will continue to find ways to get hands-on with these clusters, and let you know what I learn along the way. I’ve been asked why this is such a big deal, and there’s a simple answer: It’s AIX running on what’s essentially a new and different hypervisor. In short, there’s another way to run our favorite OS.   

Here’s another way to look at it: Different skills are needed to manage AIX and Power systems. You need to learn the HMC and keep up with the changes to the interface. You also have to learn the VIO server and how dual VIO failover works, etc. You have to learn SMS, ASMI and so many other things. Sure, we all understand this stuff, we like working with this stuff, but it is a barrier of entry for new admins.   

Having been hands-on with the Prism interface, I can tell you that it’s far simpler to use than the HMC and VIO server interfaces. Again, that’s nice for us, but when you think of the newcomers to AIX, it’s huge. Along with that, if you’re already using Nutanix in your datacenters, it’s a snap to add in a POWER-based cluster and receive the performance advantages of both Linux on Power and AIX. 

“Game-changer” is pretty cliched techspeak at this point, but it fits here. The capability to run AIX on Nutanix is a game-changer. I hope this gives you an idea of the different things you’ll be able to do with AIX going forward.

Getting Hands on With AIX on a Nutanix Cluster

Edit: Shame this did not gain more traction.

Originally posted May 15, 2018 on AIXchange

Ever since IBM’s intriguing statement of direction about AIX running on POWER-based Nutanix clusters, I’ve eagerly awaited the real thing. The wait ended last week, when availability of the hyperconverged systems was made official at the Nutanix .NEXT conference in New Orleans.

Now here’s the really cool part: during the IBM Technical University earlier this month, I got some hands-on experience with AIX running on a Nutanix cluster. Then last week, I was able to access a cluster again, this time via Webex video conferencing.

So how does this all work? I’ll start with the Prism interface. Watch this to get some familiarity with it. Prism is the GUI we used to create and manage our virtual machines. While the video I reference actually shows an x86 cluster, Prism’s look and feel is similar on a POWER-based cluster.

Once we were logged into Prism, we loaded a pre-GA raw disk image provided by IBM into our image repository. It’s very similar to how we use the VIO server’s virtual media library, only instead of booting from CD and installing AIX, we basically took a clone of this disk image and booted from that.

Compared to creating a machine on the HMC, there isn’t much to configure in a VM when creating it via Prism. (This video gives you a feel for those tasks.) This solution–and the capability to clone virtual machines in particular–feels similar to using PowerVC images and shared storage pools with our existing POWER servers. However, with a hyperconverged solution, there’s no need to worry about managing a SAN at all, because your disks are locally attached to your compute nodes.

I entered the name of my VM, the number of virtual CPUs, the number of cores per VCPU, and the amount of memory I wanted. Then I added a network interface and some logical disks that I carved out of a larger pool of physical disk. I selected “clone from image service” along with the correct disk image. I clicked on add, and the VM was created. After clicking on the power on option and selecting the console, the machine booted up. I logged in as root with no password and I was up and running.
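
If you’d rather script VM creation than click through the GUI, Nutanix clusters also include the Acropolis CLI (acli) on their controller VMs. Here’s a rough sketch of what an equivalent build might look like; the VM, image and network names are made up, and exact parameters can vary by AOS release:

acli vm.create aixtest num_vcpus=4 memory=8G                  # name, vCPUs and memory, as in the GUI
acli vm.disk_create aixtest clone_from_image=aix72-raw-image  # clone the IBM-provided disk image
acli vm.nic_create aixtest network=prod-vlan                  # add a network interface
acli vm.on aixtest                                            # power it on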

At this point I clicked the clone option; that’s all it took to get another machine up and running. The lspv command displayed the same PVID on both systems. They were identical disk clones. In the prtconf command output, I saw the following:

System Model: IBM pSeries (emulated by qemu)
Machine Serial Number: Not Available
Processor type: PowerPC_POWER8
Processor Version: PV_S_Compat
Number of Processors: 4
Processor Clock Speed: 2095 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: (this was a long UUID string)
Platform Firmware level: Not available
Firmware Version: SLOF, HEAD

The information about the physical hardware is a little different from what we’re used to seeing in PowerVM-based systems. To determine a serial number, I’d typically run either uname -a or prtconf; neither worked in this instance. Instead I went into the Prism GUI to see which physical node my AIX image was running on.

Here’s a snippet of some of the output generated by running lsdev. Again, there are some differences:

vscsi0 Virtual SCSI Client adapter
cd0 Virtual SCSI Optical Served by VIO Server
ent0 qemu_virtio-net-pci:0000:00:01.0 Virtio NIC Client Adapter
scsi0 qemu_vhost-user-scsi-pci:0000:00:02.0 Virtio SCSI Client Adapter
hdisk0 qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0 MPIO Other Virtio SCSI Disk Drive
hdisk1 qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0 MPIO Other Virtio SCSI Disk Drive

Later, I built an “empty” virtual machine. I gave it a name and assigned memory, CPU, disk and a network, but I didn’t give it anything to boot from. On the Nutanix cluster there’s no SMS to boot into. By default it tried to boot from the network. After that timed out, it booted into the Slimline Open Firmware (SLOF) interface.

Since I didn’t have a NIM server built, I couldn’t test that process. Rest assured, that will be one of the first things I do once I get my own solution.

In the systems running AIX, I was able to load a virtual CD .iso containing AIX filesets just as we’d do with PowerVM and VIO optical media libraries. Then I went into smitty and loaded filesets, just as we’d do with any other AIX system.

When I ran oslevel -s, the system returned 7200-02-02-1810.

Using the chfs command to resize filesystems went as expected.
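
For anyone newer to AIX, a minimal example of that kind of resize (the filesystem and size here are arbitrary):

chfs -a size=+1G /home    # grow /home by 1 GB
df -g /home               # confirm the new size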

Running lsattr -El hdisk0 produced some interesting unique_id information. Each disk appeared as a 54391NUTANIX disk.
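
To pull just that attribute, the stock lsattr syntax works here (hdisk0 being whichever disk you’re curious about):

lsattr -El hdisk0 -a unique_id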

I ran the mount command to mount the virtual CD inside AIX, poked around for a bit, and unmounted it. Then I went into the Prism GUI, removed the .iso I’d been using and added a different image into the virtual CD. Finally, I went back into AIX and mounted this new .iso on the fly.
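
The mount itself is the same cdrfs mount we’d use with a VIO server’s virtual optical media; something like this, assuming /mnt as the mount point:

mount -v cdrfs -o ro /dev/cd0 /mnt    # mount the virtual CD read-only
ls /mnt                               # poke around
umount /mnt                           # unmount before swapping the .iso in Prism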

Migrating virtual machines across physical nodes was like running Live Partition Mobility with PowerVM. Of course there were minor differences running AIX on this different hypervisor, but overall everything worked as expected. Getting right to work in this new environment was very simple.

As you’ll need AIX 7.2 to deploy machines into this environment, you should listen to Chris Gibson’s recent AIX Virtual User Group presentation on AIX 7.2 migration.

There’s much more I want to do with this technology. I plan to test out a mksysb migration to move my systems to the supported version of AIX that will run on a Nutanix cluster. Later on, I’ll get into SLOF and boot from a NIM server. I also want to kick off workloads and run performance scripts. Basically, I want to see what can and can’t be done with this compared to traditional AIX environments running on PowerVM.

The fact that there’s another platform and hypervisor choice when it comes to running AIX is a big deal. For one thing, it’s still more proof that AIX is here for the long haul.

Hopefully I’ve explained everything well. Please pose questions and share your impressions in comments.

A Not So Technical Look at Technical Debt

Edit: Don’t let things rot. Entropy is real.

Originally posted May 8, 2018 on AIXchange

This Twitter discussion got me thinking about technical debt, a concept I discussed here:

As often as I see it, it still surprises me when I encounter a company that depends on some application, but chooses to run it on unsupported hardware without maintenance agreements and/or vendor support. If anything goes sideways, who knows how they will stay in business.

I find it a bit funny to see other tech pros take such a narrow view of technical debt. Does it only apply to software, or is it reasonable to also apply it to other areas of technology? Why not both? Why not go even further? Consider, for instance, this analogy:

In a non-technical example, imagine owning an older car that has served well but is due for retirement in three months. In three months you plan to invest in a new car because the old one is no longer cost effective due to continuous maintenance needs, lower efficiency and so forth. But before your three-month plan to buy a new car comes around, the old car suffers a minor failure and now requires a significant investment to keep it running. Putting money into the old car would be a new investment in the technical debt. Rather than spending a large amount of money to make an old car run for a few months, moving up the timetable to buy the new one is obviously drastically more financially sound.

With cars, we see this easily (in most cases). We save money, potentially a lot of it, by quickly buying a new car. If we were to invest heavily in the old one, we either lose that investment in a few months or we risk changing the solid financial plan for the purchase of a new car that was already made. Both cases are bad financially.

IT works the same way. Spending a large sum of money to maintain an old email system six months before a planned migration to a hosted email system would likely be very foolish. The investment is either lost nearly immediately when the old system is decommissioned or it undermines our good planning processes and leads us to not migrate as planned and do a sub-par job for our businesses because we allowed technical debt to drive our decision making rather than proper planning.

Technical debt is accrued when we put off patching our systems or upgrading our hardware, or when we fail to keep our maintenance contracts in place. Sometimes it’s accidental. We’re told that a system will be replaced, so we hold off on patching or upgrading an application or OS. But then the promised replacement is delayed or canceled, and the next thing you know, we’re running older code and the upgrade path is far more complicated than it would have been had we kept on top of it. Or we may let support lapse, believing that servers are going away soon. Instead, soon never happens and a critical piece of our infrastructure is no longer supported. Everything from missing a change window to change freezes to a lack of cycles can contribute to these scenarios.

In IT, putting off change because it’s convenient is an all-too prevalent and incredibly damaging mindset. There always comes a point where replacing old technology is the most cost-effective option. Unfortunately, far too few businesses recognize this. It’s on us to make executives, and even some of our colleagues in IT, understand the true cost of technical debt. If we let things rot, they will, in fact, rot, and the longer we wait, the worse it gets. We must fight to retain the ability to upgrade our systems as needed.

More Help for the HMC Transition

Edit: I assume you have transitioned by now?

Originally posted May 1, 2018 on AIXchange

A while back Kiran Tripathi (@SocialKiran) made note of this IBM Knowledge Center breakdown of HMC interfaces.

The Hardware Management Console (HMC) provides more than one interface that you can use to manage your virtual environment.

These interfaces are called the HMC Classic interface, the HMC Enhanced interface, and the HMC Enhanced+ interface. When you log on to the HMC for managing your virtual environment, you select which interface you want to use. To change the interface that is used, log out of the HMC and log in to the HMC with a different selection.

HMC Classic interface
The HMC Classic interface is the continuation of the interface that was provided in previous versions of the HMC. The HMC Classic interface supports many of the same tasks as the HMC Enhanced interface, such as managing servers, partitions, and adapters.

The HMC Classic interface is not supported in Hardware Management Console (HMC) Version 8.7.0, or later. The functions that were previously available in the HMC Classic interface are now available in the HMC Enhanced+ interface.

HMC Enhanced interface
The HMC Enhanced interface is an updated version of the HMC Classic interface, and is provided with HMC Version 8 Release 8.2.0. In addition to providing simplified paths to completing virtualization management tasks, it also provides new functions that are not available in the HMC Classic interface. The main new function is the use of templates. You can use templates to complete the following tasks:

  • Deploying a system.
  • Creating a partition.
  • Capturing a system or partition configuration as a template.
  • Running Template Library management functions, including edit, copy, import, and export.

HMC Enhanced+ interface
The HMC Enhanced+ software interface provides new navigation paths to tasks that are common to the HMC Classic interface and the HMC Enhanced interface, and some functions that are unique to the HMC Enhanced+ interface. The new tasks and functions include enhanced Activate task for partitions and Virtual I/O Servers with network boot and network installation options, and graphical representation of the virtual network that represents the relationship between various components in the network for a system.

This information differs slightly from what’s found in the HMC doc I referenced in February, but there are similarities. In each case, you get a list of tasks with explanations of how each can be accomplished in the Classic, Enhanced or Enhanced+ interface.

Of course the classic interface is going away, but for those environments that are still working with the old menus, these documents will help you with the transition.

History Bytes

Edit: Where will we be in 20 years?

Originally posted April 24, 2018 on AIXchange

How many of you keep stacks of old computer publications? I did, until I was finally told to get rid of some of my PC Computing magazines from the 90s. Recently though, I was sent back in time when someone on Twitter posted a link to Byte magazine’s April 1998 issue.

The tweeter originally pointed to an article about crash-proof computing that explained why PCs are so crash prone compared to mainframes and other mission-critical computers. Yep, that was still a new concept back then. 

Some of the characteristics of “crash-proof computing” included attentive administrators, reliable software, robust memory protection and redundant hardware. Of course, this could very easily describe today’s IBM Power Systems environments. 

The author contrasted that with the typical PC environments of the day. If you were in the workforce in 1998 you might remember how often machines would crash. Admittedly, today’s personal computers are very reliable. It’s been years since I’ve had to reboot my laptop. 

Another article from this issue that caught my eye was a quick two-page read titled, “IBM’s Powerhouse Chip.” It described the 8-way superscalar core, and how the new POWER3 raises the bar for high-performance CPUs.

With POWER9 systems now available, I couldn’t help but marvel at how far we’ve come in 20 years. Give it a read, and I expect you too will find yourself thinking about how nice it is to be running machines nowadays. I can remember having root on some POWER3 and POWER4 hardware years ago. It truly is a night-and-day difference from then to now.

For me, the biggest nostalgia kick came from the ads. Here’s a taste:

  • Micron servers and laptops (with specs that my phone beats today).
  • Digital Alpha.
  • Gateway.
  • Silicon Graphics.
  • IBM e-business and solutions for a small planet.
  • Intel Pentium II.
  • IBM Deskstar 14G and 16G disks.

A lot of these companies are gone now, as is Byte itself–this issue was one of the last. Of course, many big tech companies endure: Dell, IBM, Microsoft, Information Builders, APC, Kingston, Intel, and CDW, to name a few. 

Honestly, these little trips back in time, and the chuckles I get from them, help keep me grounded. Rest assured though, 20 years from now, today’s cutting edge technologies will seem similarly quaint.

Troubleshooting a vSCSI Mapping

Edit: Some links no longer work

Originally posted April 17, 2018 on AIXchange

I was recently asked to help troubleshoot a vSCSI mapping. My colleague, who is new to the platform, was running an SAP HANA POC workload on POWER. At the time an older version of HMC code was being used, so we still had access to the classic HMC interface. Since mapping the virtual adapters and managing the profiles is a manual process, there’s always the potential for errors. And unfortunately, mistakes were made with the adapter numbering.

As an aside, in the enhanced version of the HMC GUI, all of this is automated. The enhanced version may still be unfamiliar to many, but it does provide this among other benefits. Regardless, we’re going to have to make the transition, because IBM has made it known that the classic interface is going away and support for x86 hardware appliances is being phased out.

In any case, even though we verified that the adapters were set up correctly in the profiles and the LUN was mapped to the correct adapter, a Linux OS that had been previously installed on the LUN couldn’t be booted. It didn’t appear as a bootable device. At this point, the question was posed: “Are you sure the LUN is mapped correctly? Can the LPAR really see it?”

As I often do, I decided to try an internet search, and came across this thread. It’s from 2013 but it’s precisely what I was hoping to find:

In the first reply:

During SMS processing “LIST ALL DEVICES” will ONLY show devices that SMS thinks are BOOTABLE.

Then further in the thread:

You can try and boot the partition in openboot. Then type ioinfo and pick vscsi, this should show you your LUNs.

In our case it was as simple as booting the LPAR normally (i.e., not into SMS mode) and then selecting option 8, the open firmware prompt. From there we could select the proper disk and get a confirmation that it did indeed see the LUN, along with the LUN’s size.
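
Putting the thread’s advice into a rough sketch (the menu wording varies by firmware level, so treat this as an approximation):

0 > ioinfo
(pick the vSCSI option from the menu, then select the client adapter;
the LUNs behind it are listed along with their sizes)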

This was enough to convince us that our mappings were fine. We then used bootable media to make the LUN bootable.

Although many of us use NPIV or shared storage pools these days, mentally file this anecdote away in the event you ever find yourself using vSCSI.

Fixes for a PowerHA Issue

Edit: Hopefully you have already put on these patches by now

Originally posted April 10, 2018 on AIXchange

I received this information from Chris Gibson a few weeks ago. If you use PowerHA, I recommend checking to see if you’re on affected levels of AIX:

High Impact / Highly Pervasive APAR
IJ02843 – PowerHA node halt during ip changes

“USERS AFFECTED: Systems running PowerHA SystemMirror on the AIX 7100-05 Technology Level or the AIX 7200-02 Technology Level with rsct.basic.rte at 3.2.3.0.

PROBLEM DESCRIPTION: An improvement in obtaining adapter state information from AHAFS event responses introduced some errors in handling internal tracking of monitored IP addresses. This can result in a core dump of the hagsd process any time an IP change occurs at the OS layer. This means the failure cannot happen while a cluster is running stable with no changes occurring, but it is a risk during startup, shutdown, or a failover scenario, and cannot be predicted beyond that.

Problem summary
    A flaw in handling of monitored IP changes during some adapter state improvements in RSCT 3.2.3.0 has led to the risk of a hagsd core dump in a couple of code paths.

Problem conclusion
    Transition of IP lists during a monitoring change has been corrected.”


Here’s some additional information:

In a PowerHA cluster, if an IP address is changed on one of the AIX nodes, the node may reboot unexpectedly due to a core dump of the hagsd process. This can happen when a Service IP is configured during normal PowerHA startup/shutdown/failover, or other operations resulting in an IP change.

Affected AIX Levels and Recommended Fixes

  • AIX 7100-05: affected from 7100-05-00 through 7100-05-02 with rsct.basic.rte 3.2.3.0; fixing level 7100-05-03; interim fix IJ02843 iFix.
  • AIX 7200-02: affected from 7200-02-00 through 7200-02-02 with rsct.basic.rte 3.2.3.0; fixing level 7200-02-03; interim fix IJ02843 iFix.

Note: Applying the ifix requires PowerHA to be stopped on the node prior to applying the fix.
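
To quickly check whether a node is exposed, a few stock AIX commands will tell you where you stand:

oslevel -s                 # current technology level and service pack
lslpp -L rsct.basic.rte    # installed RSCT fileset level
emgr -l                    # interim fixes already applied, if any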

Securing Your HMC

Edit: Some links no longer work

Originally posted April 3, 2018 on AIXchange

IBM developerWorks has a nice article about securing your HMC:

If you use Power HMC and are looking for information on how to secure your HMC, you are at the right place. Default configuration of HMC is good enough for most enterprise users. You will find steps to harden HMC further based on your corporate security standards. The steps mentioned below work on HMC V8.8.4.0 and later. It is recommended that every HMC is set to minimum at Level 1. You may choose to go to Level 2 and Level 3 depending on your environment and corporate security requirements. If necessary, please check with your corporate security compliance team before making these changes.

The document includes instructions for changing passwords, setting up accounts for each HMC user, assigning necessary roles to users, setting up LDAP, blocking ports in firewalls, etc. You’ll also find a list of HMC network ports, along with some thoughts around completely taking your HMC off of the network. There’s discussion around setting up NIST SP 800-131A compliance, ciphers and certificates, along with commands you can use to audit the HMC and audit user activity. Finally, there’s a mention about centralizing your HMC logs using rsyslog to send data to a central log server.

The end of the doc lays out the options for tracking fixes:

If you come across a hot new security vulnerability everyone is talking about, you can look at the attachment section of wiki to start with. It has a list of vulnerabilities fixed in last couple of years. You can click on CVEs to read associated security bulletin. This list will be kept up-to-date.

You can search for the latest security bulletins, check Twitter (@IBMPowereSupp) or subscribe to receive email notifications. There’s also a discussion group on LinkedIn (IBM PowerVM).

As an aside, the doc includes a recommendation to use Kali Linux to determine the OpenSSH version that’s running on your HMC. A commenter mentions that if running Kali and Metasploit is frowned upon in your environment, running ssh -vvv is another way to find the OpenSSH version.
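
For example, something along these lines should surface the version string in the debug output (the user and hostname are placeholders):

ssh -vvv hscroot@myhmc 2>&1 | grep -i "remote software version"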

Beyond that, what do you think? This seems like useful information that we can use in our environments.

POWER9 Attracts a BIG Customer

Edit: Still pretty impressive

Originally posted March 27, 2018 on AIXchange

I’d heard rumors for a while, but those rumors were confirmed last week: Google runs IBM Power Systems in its production environment. This is from Forbes.com:

The biggest OpenPOWER Summit user news was that Google confirmed that it has deployed the “Zaius” platform into its data centers for production workloads. Google’s Maire Mahony, on stage at the event today said, we have “Zaius deployed in Google’s Data Center,” and we are “scaling up machine count.” She concluded by saying she considers the platform “Google Strong.” Mahony shared with me afterward that “Google Strong” refers to the reliability and robustness. Not to take away from the other deployments announced at the event, but this announcement is huge.

Mahony explained what Google likes about POWER9:

  • More cores and threads for core Google search
  • More memory bandwidth for RNN machine learning execution
  • Faster and “more open” flash NAND sitting on OpenCAPI acceleration bus

I was told it was a simple recompile to get their code to run on POWER, but I’d still love to hear Google engineers talk about their actual use of POWER and how these systems perform compared to the others in the data centers.

The Forbes article itself is more generally focused on POWER9 and news from the OpenPOWER Summit. The Motley Fool gets more into specifics:

Why, and for what, is Google using POWER9 processors? Google found that the performance of its web search algorithm, the heart and soul of the company, scaled well with both the number of cores and the number of threads available to it. IBM’s POWER9 processor is a many-core, many-thread beast. Variants of the chip range from 12 to 24 cores, with eight threads per core for the 12-core version and four threads per core for the 24-core version. Intel’s chips support only two threads per core via hyperthreading.

The bottom line is that IBM’s POWER9 chips are ideally suited for workloads that fully take advantage of the large number of threads available. Google’s web search is one such workload. They’re not well suited for workloads that don’t benefit from more threads, which is why the market-share ceiling for POWER isn’t all that high.

Mahony also talked about the importance of bandwidth. It doesn’t matter how fast a processor is if it can’t move data fast enough. IBM claims that one of its POWER9-based systems can transfer data up to 9.5 times faster than an Intel-based system, using OpenCAPI and NVIDIA NVLink technology. That’s important for any kind of big data or artificial intelligence (AI) workload.

AI workloads are often accelerated by GPUs or other specialized hardware. Google developed its own accelerator, the Tensor Processing Unit, which it uses in its own data centers for AI tasks. But these accelerators still require a host processor that can move data fast enough.

Obviously readers of this blog–as well as the guy who writes it–already know and love POWER. But it’s always nice to see some big name enterprises get on board with POWER hardware.

System Planning Tool Updated for POWER9

Edit: Have you grabbed the latest version?

Originally posted March 20, 2018 on AIXchange

The six POWER9 servers IBM announced last month GA this week. Are you ready to refresh your System Planning Tool?

The System Planning Tool (SPT) helps you design a managed system that can support a specified set of workloads.

You can design a managed system based on workload data from your current systems, based on new workloads that you want the managed system to support, based on sample systems that are provided with the utility, or based on your own custom specifications. The SPT helps you design a system to fit your needs, whether you want to design a logically partitioned system or want to design an unpartitioned system.

There are a number of options available to help you get started with using the SPT:

  • You can use the sample system plans that the SPT provides as a starting point for planning your system.
  • You can create a system plan based on existing performance data.
  • You can create a system plan based on new or anticipated workloads.
  • You can create a system plan by using the Hardware Management Console (HMC). You can then use the SPT to convert the system plan to SPT format, and modify the system plan for use in system ordering or system deployment.

With the SPT, you can copy logical partitions from a system in one system plan to either another system in the same system plan or to a different system in another system plan. For example, you can build up system plans that contain your own sample logical partitions, and then copy one or more of these sample logical partitions into a new system plan that you are creating. You also can copy a logical partition within the same system plan. For example, you can define the attributes of a partition within a system plan and then make 7 copies of that partition within the same plan.

You can export a system plan as a .cfr file and import it into the marketing configurator (eConfig) tool to use for ordering a system. When you import the .cfr file into the eConfig tool, the tool populates your order with the information from the .cfr file. However, the .cfr file does not contain all the information that the eConfig tool requires. You will need to enter all required information before you can submit your order.

If you make any changes to the hardware assignments or placement in the system, the SPT validates the changes to ensure that the resulting system fulfills the minimum hardware requirements and hardware placement requirements for the logical partitions.

When you are done making changes to the system, you can save your work as a system plan. You can import this file into your HMC. You then can deploy the system plan to a managed system that the HMC manages. When you deploy the system plan, the HMC creates the logical partitions from the system plan on the managed system that is the target of the deployment.

IBM’s SPT page has further information. If you’d like to be notified of SPT updates, select the Releases tab and email IBM (subject line: subscribe to distribution list) at the address listed on that page.

Select the download tab to see the latest version (6.18.047.0 as of this writing).

I’ve heard of users having minor issues with this particular version of SPT. While it may still be bleeding edge software, once the kinks get ironed out you’ll be glad you have this tool. So be sure to download the updates.

Dealing With an HMC Upgrade Problem

Edit: Hopefully none of you will see this in the future

Originally posted March 13, 2018 on AIXchange

During a recent HMC upgrade, a buddy of mine had a problem. While you’re unlikely to find yourself in this situation, if you ever do, you’ll be glad you read this.

He was trying to create a new VIO server in enhanced mode, and it kept failing. After opening a PMR and sending in logs, he got this back from IBM Support.

I’ve gone through your logs and found that you have hit a known issue. The problem actually occurred when you upgraded the HMC. During the upgrade some users and groups were recreated with new UIDs and GIDs, rather than being restored, so files in /data that existed prior to the upgrade are orphaned, and the new versions of those users do not have full access to them. The restore upgrade data never actually completed.

PMC0000E: An unexpected Throwable was caught.
Throwable=”PmcJobException” with message “com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft”
Cause=”PmcJobException” with message “com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft”
com.ibm.pmc.jaxb.api.server.jobs.PmcJobException: com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft
/tmp/ls_all.out:
6291473    4 drwxr-xr-x   2 503      504          4096 Nov  7 16:53 /data/pmc/templates/systemtemplate
/data/pmc/templates/systemtemplate

You have a few options to recover from this condition.

First, if you upgraded from 850 to 870 and you saved the SaveUpgradeData off to USB, you can scratch install the HMC to 850 and then restore upgrade data after the scratch install at 850. You would then need to install PTF MH01730 and THEN upgrade to 870. If you no longer have that data, you won’t be able to restore any upgrade data. If you upgraded from 860, you’ll have to scratch install.

The second option would be to scratch install 870 without any restore of upgrade data. We have a doc that tells you the information you will need to document for the new install.

Scratch Installation of Version 8 HMC from Recovery DVD

Items to Document Prior to Performing Scratch Installation of the HMC

The third option is not supported, but many customers have had good luck with this method. You can run the following commands as root to correct the ownership/permissions issues. This is not guaranteed but many have had good luck with it. If any problems arise during or after these steps, you will have to scratch install the HMC.

find /data -uid 501 -exec chown ccfw {} +
find /data -uid 502 -exec chown soliddb {} +
find /data -uid 503 -exec chown wlp {} +
find /data -gid 501 -exec chgrp ccfw {} +
find /data -gid 503 -exec chgrp soliddb {} +
find /data -gid 504 -exec chgrp wlp {} +
find /data -gid 508 -exec chgrp hmc {} +

You will need to be root to try the workaround. Here are the needed passwords to gain root access.

ssh in as hscpe user
enter hscpe password
run PESH <serial number>
enter password of the day
run su -
enter your root password

Obviously for that third method to work, you’d need to get the hscpe password from IBM Support, but it is something to keep in mind. (In plain terms, those find commands walk /data and hand anything still owned by the old numeric UIDs and GIDs back to the recreated users and groups.)

Again, you’ll probably never run into this, but if you do, this at least gives you some ideas. In any event, open up a PMR and let support guide you.

AIX Migration Prep

Edit: Still good information

Originally posted March 6, 2018 on AIXchange

Here’s an oldie but a goody: a document covering AIX migration preparation:

Information regarding version 5, 6 and 7 installation:

  • In AIX V5, at the first reboot after an install, you will be prompted to view/accept your licenses before you can continue to use your system.
  • Starting in AIX Version 6.1, a separate Software Maintenance Agreement (SWMA) acceptance window displays during installation immediately after the license acceptance window. The response to the SWMA acceptance (accept or decline) is stored on the system, and either response allows the installation to proceed, unlike license acceptance which requires an accept to proceed.
  • NIM masters/servers which are to serve version 5, 6 or 7 resources should be upgraded first. A NIM master/server must be at the same level or later than the software in any resources being served.
  • Any migration will require more space. If you have no free partitions in the root volume group, or all file systems are near full, it would be a good idea to add another disk to the rootvg. Alternatively you can install a mksysb of the system to a larger disk before running the migration. See the table below for the required space information for AIX 5, 6, and 7.
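
On that last point about rootvg space, a minimal sketch of checking and extending it (hdisk1 stands in for whatever unused disk you have available):

lsvg rootvg | grep -i free    # how many free PPs does rootvg have?
extendvg rootvg hdisk1        # add another disk to rootvg if space is tight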

For more complete information on your release, it is highly recommended you review the Release Notes.

5.3 Release Notes

6.1 Release Notes

7.1 Release Notes

NOTE:

The latest version of the media should be used to do the migration. The latest version will always be shipped when you order the media. If you have an older version of the media and would like to obtain the latest version, you can order it at the following web site:

IBM Entitled Software Ordering and Download

If assistance is required registering for the site, call Software Delivery (1-800-879-2755 opt2 then opt2 again for the U.S.). They will require the machine model and serial number of a machine licensed to run the AIX version you are ordering. Outside the U.S., contact your local support center. 

Those who regularly do migrations may find this information to be pretty basic, but for everyone else–particularly people who are new to supporting AIX–it’s a great resource.

The 2018 IBM Champions

Edit: Some links no longer work.

Originally posted February 27, 2018 on AIXchange

This came out about a month ago, but I want to acknowledge this year’s IBM Champions:

After reviewing more than 1400 nominations, IBM is proud and happy to announce the 2018 class of IBM Champions. The IBM Champions program recognizes innovative thought leaders in the technical community and rewards these contributors by amplifying their voice and increasing their sphere of influence.

An IBM Champion is an IT professional, business leader, developer, and educator who influences and mentors others to help them innovate and transform digitally with IBM software, solutions, and services. From the nominations, 650 IBM Champions were selected…. Among those are:

  • 62% renewing; 38% new Champions
  • 38 countries represented
  • 6 business areas, including Analytics (34%), Cloud (25%), Collaboration & Talent Solutions (24%), Power Systems (9%), Storage (1%), IBM Z (7%)


These individuals evangelize IBM solutions, share their knowledge, and help grow the community of professionals who are focused on IBM offerings. IBM Champions spend a considerable amount of their own time, energy, and resources on community efforts—organizing and leading user group events, answering questions in forums, contributing articles and applications, publishing podcasts, sharing instructional videos, and more.

As a reward, IBM Champions receive IBM Champion-branded merchandise, IBM Champion open badges, and invitations and discounts to IBM conferences. They are highlighted online and recognized at live events. In addition, they may be offered various speaking opportunities that enable them to raise their visibility and broaden their sphere of influence. They are recognized for the work they have done over the past year and supported and enabled with education and opportunities to do even more advocacy in the next year.

You can search for names from all 650 Champions here. The 39 IBM Power Systems champions–of which I am one–are listed here. I’ve said it before, but it bears repeating: I’m proud of this honor. It’s always nice to get recognition for the things you do, and believe in.

A Valuable Doc on HMC GUI Options

Edit: Some links no longer work

Originally posted February 20, 2018 on AIXchange

Alan Fulton (@The_Iron_Monger) tweeted the link to this information on GUI options in the new HMC. As the classic view we’re all used to goes away, you should explore this doc that clocks in at a tidy 15 pages:

Introduction
Menus Available – Enhanced GUI Only
Enhanced GUI Path New Features
Classic GUI to Enhanced GUI Mapping
Main Menu Navigation
Managing Servers -> All Actions
Managing Servers
Managed System Advanced Options
Partition Management
Partition Properties
Serviceability Options
Capacity Upgrade On Demand
Groups and Power Enterprise Pools
Management of the HMC and Administration
Service Management/Serviceability
August 2017 Code Update
Enhanced GUI Advantages
Network Topology
Storage Topology
Optional views for System and LPAR Objects
Box View
List View
Shortcuts to menus
Resources and Performance Dashboard
Additional Information and firmware level
Relational view Tasks Log

The appendix also has some links that you’ll find particularly useful.

Be sure to read it over.

IBM Unveils Six POWER9 Servers

Edit: And now we wait for POWER10 servers

Originally posted February 13, 2018 on AIXchange

IBM is announcing six new POWER9 scale-out servers today, with general availability set for March 20. IBM is touting these systems as future forward, cloud-ready infrastructure for mission critical workloads. The systems will max out with 4 TB of memory and will have PCIe Gen4 adapters, which doubles the bandwidth of Gen3 cards.

Each system will have PowerVM Enterprise Edition built in, and IBM is helping customers migrate by providing 60-day temporary licenses for existing machines that don’t already have PowerVM Enterprise Edition. This will allow you to use live partition mobility to migrate running workloads from existing POWER7 or POWER8 machines to your new POWER9 machine.
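
If you end up driving those migrations from the HMC command line rather than the GUI, the operation is handled by migrlpar; a quick sketch with placeholder system and partition names:

migrlpar -o v -m p8-box -t p9-box -p prodlpar    # validate the migration first
migrlpar -o m -m p8-box -t p9-box -p prodlpar    # then perform it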

The new scale-out systems will use direct-attached industry standard DDR4 DIMMs in place of the custom buffered memory DIMMs that we saw on POWER8, making memory subsystem pricing more competitive with non-IBM servers. The memory subsystems will provide up to 170 GB/s of bandwidth.

The POWER9 processors will run up to eight threads, which should in particular provide a performance boost to applications that are written to exploit these additional threads. These systems are configured to have dynamic, adjustable processor frequencies. For example, a maximum performance mode setting will have different thermal and energy characteristics compared to other settings like static power save, dynamic performance, etc.
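
On the AIX side, the smtctl command is how you’d inspect or change the SMT mode; a quick sketch:

smtctl                # show the current SMT mode for each processor
smtctl -t 8 -w now    # switch to SMT8 immediately, without a reboot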

These six new systems consist of a Linux-only variant, three AIX and IBM i “traditional” servers, and two SAP HANA edition machines that will be capable of running limited AIX and IBM i workloads (up to 25 percent core activations total). Five of the systems will max out at 4 TB of memory; the S914 will max out at 1 TB.

The L922 Model

The L922 (model 9008-22L) is a 2U 1- or 2-socket system with 8, 10 or 12 cores per socket. This system is Linux-only. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0), and four are Gen3. It can have up to eight small form factor drives.

The S922 Model

The S922 (model 9009-22A) is a 2U 1- or 2-socket system with 4, 8 or 10 cores per socket. This system will run AIX, IBM i or Linux. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0), and four are Gen3. It can have up to eight small form factor drives.

The S914 Model

The S914 (model 9009-41A) is a 4U 1-socket system that will run AIX, IBM i or Linux. This is the only system that comes with a tower variant. It will have 4, 6, or 8 cores per socket, though keep in mind you still have the option to factory deconfigure cores on your systems if you find that one or two cores are sufficient for your smaller workloads. If you choose to go with four cores, you won’t be able to attach I/O drawers to the machine. It will have eight PCIe slots; two are Gen4 (and CAPI 2.0 capable) and six are Gen3. There are options for 12 or 18 small form factor internal disks, and it will have an option to run on 110 VAC power. Reminder: This system will max out at 1 TB of memory.

The S924 Model

The S924 (model 9009-42A) is a 4U 2-socket system with 8, 10, or 12 cores per socket. It will run AIX, IBM i or Linux. It has a total of 11 PCIe slots; five are PCIe Gen4 (4 CAPI 2.0), and six are PCIe Gen3. There are options for 12 or 18 small form factor internal disks.

The S914, S924, and H924 are all capable of including internal RDX media. The selection of RDX will affect how many internal disks the machines can hold, but note that none of the six systems will have internal DVDs or tape drives. Plan on doing more with USB flash media, external USB-connected DVDs and network-based operating system installations going forward.

The H922 Model

The H922 (model 9223-22H) is a 2U 1- or 2-socket system with 4, 8 or 10 cores per socket. It will primarily run SAP HANA, but can run up to 25 percent AIX and IBM i core activations. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0) and four are Gen3. It can have up to eight small form factor drives.

The H924 Model

The H924 (model 9223-42H) is a 4U 2-socket system with 8, 10, or 12 cores per socket. It will run SAP HANA with up to 25 percent AIX and IBM i core activations. It has a total of 11 PCIe slots; five are PCIe Gen4 (4 CAPI 2.0), and six are PCIe Gen3.

An interesting feature with all these machines (except for the L922) is the capability to run NVMe devices. The POWER9 scale-out systems will support up to 4 x 400 GB M.2 form factor NVMe devices on the S914, S922, S924, H922 and H924. This should be particularly beneficial for environments that include VIO servers, since the NVMe devices can be used as your internal boot media. This is certainly more convenient and cost effective compared with ordering a split backplane and hard drives.

The POWER9 Software Stack

Here’s the software stack you’ll need to run on these machines:

Firmware level FW910

HMC code level V9R1.910

VIOS 2.2.4, 2.2.5, 2.2.6

AIX 7.2 TL2

AIX 7.2 TL0, TL1 (P8 Compatibility Mode)

AIX 7.1 TL4, TL5 (P8 Compatibility Mode)

AIX 6.1 TL9 (P7 Compatibility Mode)

IBM i 7.3 TR4

IBM i 7.2 TR8

Ubuntu 16.04.4 LTS (P8 Compatibility Mode)

RedHat RHEL 7.4 LE (P8 Compatibility Mode)

SuSE SLES 11 SP4 (P8 Compatibility Mode)

SuSE SLES 12 SP3

For an overview of AIX 7.2, read this. Incidentally, I’ve seen roadmaps for AIX 7.1 and 7.2 that extend to 2027. Our future’s so bright, AIX pros should don protective eyewear.

To round things out, there’s a new 19-inch rack option, the 7965-S42.

Statement of Direction: AIX VM on Hyperconverged Systems

One other tidbit I caught in today’s news: In a statement of direction, IBM said it “intends to enable selected AIX VM guests on IBM Hyperconverged Systems powered by Nutanix (CS series).” I wrote a previous AIXchange post about Nutanix running on POWER nodes, and I’ll revisit this topic in the near future.

We’ve been talking about POWER9 for awhile now, but soon it will actually be in our computer rooms. I can’t wait.

Hardware Maintenance EOS Extension on the Way

Edit: All of these are still important considerations

Originally posted February 6, 2018 on AIXchange

Back in November IBM announced a hardware maintenance end of service (EOS) extension for customers with unsupported legacy systems. This offering is expected to be available in the spring:

IBM Hardware Maintenance End of Service Extension is the answer for clients who are not able to migrate off IBM devices prior to the end of service (EOS) date. With this offering, IBM may continue to provide support to clients beyond the effective EOS date based on availability of repair parts, skills, and engineering field support.

IBM recognizes that there are many reasons why clients might be unable to migrate to replacement technology prior to a device’s EOS effective date, and therefore require extended support for a period of time.

For pricing information, contact your IBM representative or IBM authorized Business Partner.

The topic of those who live with legacy systems and how to help them is a drum I’ve been beating regularly of late. And previously I wrote about the barriers to moving forward, including a lack of motivation and the double-edged sword that is hardware reliability.

If I’m being honest, you can scare people into action. I’ve told clients how vulnerable they are and what it can mean to their business if that system that sits in a corner actually goes away. I’ve pointed out that their backups are inadequate and their disaster recovery plans and support options are non-existent. Making these points can occasionally provide motivation (especially if I use a spooky voice).

Seriously though, it’s a shame that oftentimes these customers don’t take action until the worst has happened or is happening. Think about how catastrophic it would be if your system went down and you had no support, no backups and precious few options for replacing that old hardware.

Or think about this: Whatever IBM does, or whatever I or anyone else says, will have only a limited effect. You simply can’t reach some of these enterprises, who may not employ a single IT person who’s up on AIX. And if they don’t have anyone with expertise on their systems, they most likely don’t have anyone who would bother to keep current on IBM Power Systems and AIX news and information, either. So for as much as I’ve discussed this, I know I’m essentially preaching to the choir. The sad truth is that a lot of these customers–even though they’re fine now, even though they’ve been fine for years–won’t be fine forever. Eventually, lightning will strike, and not all of these businesses will survive.

Follow Me (at a Faster Speed)

Edit: I typically run at 2.8x these days.

Originally posted January 30, 2018 on AIXchange

This article helps me articulate the benefits of listening to information at something faster than normal speed:

Rachel Kenny started listening to podcasts in 2015 — and quickly fell behind. “As I started subscribing to more and more podcasts, they started stacking up, and I couldn’t keep up at normal speed,” the 26-year-old data scientist in Indianapolis told BuzzFeed News. “I also had to listen to the backlist of all the podcasts when I subscribed to them.” So Kenny began listening faster: first at 2x, then she worked her way up to 3x.

Kenny’s listening habits may be extreme, but she’s not alone. Meet the podfasters, a subset of podcast obsessives who listen to upward of 50 episodes a week, by, like Kenny, listening extremely fast. They’re an exclusive group: According to Marco Arment, creator of the Overcast podcast app, only around 1% of Overcast listeners use speeds of 2x or higher. (An app called Rightspeed, which costs $2.99, allows you to listen at up to 10x.)

Yes, I actually do this, and no, I don’t blow through recordings at anywhere near 10X speed. But as someone who frequently tunes into replayed webinars, prerecorded vendor training sessions and the like, I’m all for consuming the most information in a reduced amount of time. Being able to take in two one-hour webinars in a single hour without losing comprehension is certainly valuable to me.

I’ve found that speeding up recordings 1.5X to 2X works best. I do have to make myself focus on the content to understand what’s being said, but honestly, I see that as another benefit. Overall though, at this range I can follow along without difficulty. And guess what? You probably can, too.

More from the article:

In fact, according to behavioral neuroscientist Stephen Porges, because recordings played at higher speeds are at a higher pitch, they are actually easier to hear. Low-frequency noises, like street noise, vacuum cleaners, or airplanes, get in the way of our understanding of people talking; by playing podcasts at a higher speed, the listener is creating a greater acoustic differentiation between the words and lower-frequency background noises. According to Porges, the muscles in the middle ear help to dampen low-frequency sound so we can hear speech more clearly — but if we don’t exercise those muscles (by, say, not having much human interaction), then they don’t work as well. Thus, listening to things at a higher frequency, and speed, could be helpful.

I speed up my recordings in a couple of ways. If it’s something I can download in .mp4 format, for example, I’ll open it with VLC, go to the menu and select Playback > Speed. This gives me options to go faster or slower. For YouTube videos that I’ll play on Firefox or Chrome, I’ll log in and click the settings icon, and from there I can set the speed. Various browser plugins also allow you to control video playback speed. Fun fact: These plugins also work with Netflix on your computer, so binge-watching a series can go that much quicker.
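
VLC can also be launched at speed from the command line, which is handy for a batch of recordings; roughly:

vlc --rate 1.5 webinar.mp4    # play at 1.5x from the start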

Oddly enough, I haven’t figured out how to do this with my TV. At least I’ve yet to find the DVR controls that speed up the content. Admittedly, I’m not that motivated to find a solution, since I can always pop old school DVDs into my computer and use vlc.

Whether you’re consuming AIX information or trying to catch up on a favorite TV series, I urge you to explore this. And if you do “speed listen,” tell me about it in comments.

A PowerAI Primer

Edit: Some links no longer work.

Originally posted January 23, 2018 on AIXchange

I found this IBM developerWorks post about PowerAI on the IBM Linux on Power Twitter feed (@ibmpowerlinux).

This information is a pleasant surprise. Articulating why customers should care about PowerAI can be challenging. In many cases this workload is handled by departments or organizations that are different from the ones we typically work with:

PowerAI is an IBM Cognitive Systems offering for the rapidly growing and quickly evolving artificial intelligence (AI) category of deep learning. PowerAI brings a suite of capabilities from the open source community and combines them into a single enterprise distribution of software that incorporates complete lifecycle management from installation and configuration; data ingest and preparation; building, optimizing, and training the model; to inference; testing; and moving the model into production.

Busy as we are tending to AIX servers and workloads, topics like TensorFlow or Caffe seldom come up. We might skim articles about AI or deep learning, but we quickly move on. But this post connects the dots for us:

Deep learning is the fastest growing subcategory of machine learning and uses software neural networks to help develop patterns of analysis within the system to generate predictive capability: deep learning is a platform that is capable of effectively learning how to learn, and it is immensely powerful for helping clients get the most out of their data.

You may think this information applies only to some distant future, but I find it quite timely. Look at it this way: Things are very different in our data centers today compared to 20 years ago. We need to have an idea of what’s coming over the next 20 years: What will PowerAI give your organization?

  1. Helps to make deep learning easier and faster for organizations….
  2. Designed to provide an end-to-end deep learning platform for data scientists.
    • Ready-to-use deep learning frameworks (TensorFlow, IBM Caffe, and BVLC Caffe).
    • Distributed as easy-to-install binaries.
    • Includes all dependencies and libraries.
    • Easy updates: Code updates arrive from a repository….
  3. Designed for enterprise scale. PowerAI enables clients to distribute the training of a model across many servers, with the potential for greatly improving performance. What used to take weeks on a single server can potentially now be completed in just hours. This distributed capability is also transparent to the application logic previously written for a single server implementation. It is the best of both worlds: potential performance improvements without having to change the application code.
  4. Deep learning to unleash new analytic capabilities….
  5. Training neural network models. With PowerAI, data scientists have visual tools for understanding accuracy while the model is running. If accuracy is not high, the model can be stopped without wasting additional time.

IBM intends to deliver IBM PowerAI Vision, an application development tool for computer vision workloads. IBM PowerAI Vision is intended to automatically train deep learning models for different image and video input data sets. 

Also check out these PowerAI videos: a shorter version, a longer version and an installation how-to.

Legacy Environments: What Can Be Done?

Edit: Yes, 5.3 is still out there.

Originally posted January 16, 2018 on AIXchange

Following up on this post about customers that continue to rely upon legacy systems, I’m curious: What would you do if you had to manage an environment with old POWER machines running AIX 5.3?

I still see it every now and then. For instance, recently I was talking to an executive in a manufacturing organization. This organization is filled with old equipment, production-related machines that cost $100,000 or more new. The makers of that equipment are long gone, but he knows how to maintain it and everything is paid for, so this guy is thrilled. As for his IT gear, he wouldn’t care if Windows XP or DOS was still in place. In his mind he’s printing money with these ancient machines. Even though his OS is likely insecure and his hardware could stop running at any point, he doesn’t care. IT is simply not part of the equation. He’ll run these systems into the ground.

As I said, this sort of thing isn’t common, but it’s also far from exceptional. You’ve probably seen it yourself. Typically these operations have standalone systems that sit in a corner and run critical applications from internal disks. Best case, there may be a tape drive and old mksysb backup scripts that are happily running out of cron. Still, do you ever wonder how often, if ever, the tape drives are cleaned, or if new tapes are purchased? Has anyone ever tried restoring from these tapes?

Of course all these systems have been so reliable that no one even checks on them–not that anyone on staff would know about them anyway. The IT guy who took care of this stuff when it was new probably left years ago, and was never replaced.

They say you can’t help people who aren’t willing to help themselves, but when it comes to these customers, I still want to try. So how would you deal with these types of operations? What are some ways forward to at least try to minimize risk?

I would start with the used market and search for the same model hardware with the same tape drive–although even this option is becoming a challenge. If I could find similar hardware, I’d take a mksysb from the source system and try to restore it onto the “new” box. What I like about a secondary system is that this testing can go on without affecting anything. Plus, if you need to back out, it’s as simple as going back to the original machine. At least this provides some peace of mind, because if it all works, you know that there’s a good backup and hardware to restore it to.
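
The backup on the source box is a single command; the tape device and file name here are examples:

mksysb -i /dev/rmt0                 # bootable system backup straight to tape
mksysb -i /backup/oldbox.mksysb     # or to a file you can move to the test box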

If that cloned hardware can run, then it’s possible to try to upgrade the “new” machine, again with no downtime. Of course this is no sure thing, but surprisingly, some applications will run on a newer version of the OS. If the OS upgrade doesn’t work, or it’s already apparent that the application won’t run on a newer OS version, the next step would be to get AIX to the latest TL version possible. That would at least allow these folks to consider versioned WPARs or try to restore the mksysb onto a newer generation of hardware running in older processor modes.

If I couldn’t locate a whole new machine, I’d settle for ordering replacement internal disks and planning for an outage. I’d remove the original disks, install the new ones, restore the OS and try an upgrade. If I had to back out, I could just reinstall the original disks. Of course this method comes with its own risks.

These are all what I’d call “the best you can do” options. None of these solutions are ideal, and none change the grim reality that nearly all of these customers are without hardware support, OS support or application support. And for whatever reason, this is considered an acceptable risk. If it ain’t broke, don’t fix it, right?

As I said though, I’m trying to help. Perhaps you can help, too. Hit the comments and tell me what you’d do in this sort of situation.

Security Vulnerability Impacts POWER Processors

Edit: Hopefully you are running current systems / firmware.

Originally posted January 9, 2018 on AIXchange

You’ve most likely heard the news that emerged last week regarding a security vulnerability impacting all microprocessors. There will be patches and fixes forthcoming for different architectures and microprocessors, including IBM POWER processors, as indicated in this Jan. 3 post from IBM’s PSIRT blog:

If this vulnerability poses a risk to your environment, the first line of defense is the firewalls and security tools that most organizations already have in place. Complete mitigation of this vulnerability for Power Systems clients involves installing patches to both system firmware and operating systems. The firmware patch provides partial remediation to this vulnerability and is a prerequisite for the OS patch to be effective. These will be available as follows:

Firmware patches for POWER7+, POWER8 and POWER9 platforms will be available on January 9. We will provide further communication on supported generations prior to POWER7+, including firmware patches and availability.

Linux operating systems patches will start to become available on January 9. AIX and i operating system patches will start to become available February 12. Information will be available via PSIRT.

Clients should review these patches in the context of their datacenter environment and standard evaluation practices to determine if they should be applied.

PSIRT also issued this post that includes links for POWER and System z.

For a detailed explanation, check out this post from Security Intelligence, an IBM-sponsored site: 

A hardware vulnerability, discovered independently by researchers from academia and Google, underscores a microprocessor flaw that, if exploited, could allow an attacker to read data from privileged kernel memory.

Since this flaw impacts all modern microprocessors, it can affect any device that uses them, including multiple operating systems running on mobile devices, laptops, workstations and servers.

It is important to note that to exploit this vulnerability, a malicious actor would need to execute untrusted code on the physical system or on a virtual machine linked to that system. This may include running content from webpages loaded in web browsers or accessed through mobile apps.

This article also provides these recommendations for mitigating risk:

This new triple-pronged flaw requires a risk assessment process for all organizations. Security teams will have to inventory their assets and determine which ones may be vulnerable. Then, after setting criticality and sensitivity scores, assets should be patched or have mitigating controls applied.

An attacker must be able to place code into an application running on the system itself or on a virtual machine attached to the system to exploit this vulnerability. Therefore, protections to prevent unauthorized access into systems from outside the infrastructure can serve as a first barrier, as well as existing access controls for internal users.

The most immediate action security teams can take to protect assets is to prevent execution of unauthorized software, or access of untrusted websites, on any system that handles sensitive data, including adjacent virtual machines. Assume that any type of execution, including binary execution, carries the potential for attack.

Also, ensure security policies are in place to prevent unauthorized access to systems and the introduction of unapproved software or software updates.

If the organization is operating environments where preventing execution of unauthorized software is not possible, or is inconsistent, protection may only be possible by applying updates to system firmware, operating systems, and application code, as well as leveraging system-level protections to prevent the execution of unauthorized code.

In cases of update impact issues, mitigating controls should be applied in the interim, but patching is ultimately the remediation needed to prevent potential attacks. Please note that most patches released so far require rebooting systems and must be evaluated for the potential impact of such an event on a given asset.

These hardware bugs, incidentally, are being called Meltdown and Spectre. For a quick overview, see this RedHat-produced video, and read this primer:  

Meltdown and Spectre exploit critical vulnerabilities in modern processors. These hardware bugs allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs. This might include your passwords stored in a password manager or browser, your personal photos, emails, instant messages and even business-critical documents. Meltdown and Spectre work on personal computers, mobile devices, and in the cloud. Depending on the cloud provider’s infrastructure, it might be possible to steal data from other customers.

The Internet Storm Center has more. This page also links to a podcast segment where Meltdown and Spectre are explained. 

For those working with Linux distributions, here are some tips for patching the Spectre and Meltdown vulnerabilities. Those on desktop machines should keep in mind the need to update firmware, the OS and browsers.
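
If you’re building that inventory, a few quick checks can help establish where a given box stands. On Linux, newer kernels expose per-flaw mitigation status in sysfs (these files only exist on kernels recent enough to include them):

    # grep . /sys/devices/system/cpu/vulnerabilities/*

On AIX, you can verify the installed firmware level and list any interim fixes (efixes) that have been applied:

    # lsmcode -A
    # emgr -l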

As I see more information from IBM Support, I will do my part in getting it out there, both on this blog and on Twitter, where you can follow me @robmcnelly.

Another AIX vs. Linux Discussion

Edit: Some links no longer work

Originally posted January 3, 2018 on AIXchange

Six years ago I wrote an article about how much I love AIX. It’s a topic I’ve revisited a number of times–most recently, here.

So let’s have another AIX vs. Linux discussion. Certainly, plenty of folks are still talking about this, in their companies and on Twitter.

I don’t expect this to change. As admins, let’s face it: we’re an argumentative lot. We argue over vi vs. emacs, GNOME vs. KDE, Debian vs. Red Hat. People think about their workloads and where they want to run them. Do they want to exploit POWER’s performance advantages and run Linux on Nutanix clusters on POWER nodes? Do they want to plan on running POWER9 with NVIDIA GPUs and work on machine learning or artificial intelligence or build the world’s fastest supercomputer? Those solutions will be running Linux.

Of course Linux is growing, and of course IBM embraces Linux across the mainframe and POWER servers. By this logic, as you carry on your own OS discussions, you should add x86 vs. POWER to the mix.

My point is still that I see advantages with AIX and the VIO server. Do I use Linux? Of course I do, and have for years, just like most of you. 

My issue with Linux comes down to a simple question: Which Linux? Which version of Linux do you love? Red Hat? Debian? SUSE? Which LVM and filesystem do you love? Which backup solution do you love? Which hypervisor do you love? I could go on, but you get the point. When you’re talking about Linux, you can be talking about many different things.

Are you a big fan of systemd and the way that solution is heading? Which desktop manager do you love? Is Linux trying to satisfy both desktop users and enterprise users? 

I don’t mind the debates, though I don’t agree with the dire predictions. My take? AIX will be just fine. I’ve seen timelines for AIX and IBM i that extend years from now. See for yourself: Look for the AIX TL release dates and end of service pack support charts. As of this writing I see AIX 7.2 TL2 going to 2020, and AIX 7.2 TL5 going out to 2022. As more technology levels are released, you can be sure that those dates will keep marching into the future. 
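
To see where your own systems sit on those charts, oslevel is the quickest check. The output below is just an example; yours will differ:

    # oslevel -s
    7200-02-01-1732

The string decodes as release, technology level, service pack and build week, so this example is AIX 7.2 TL2 SP1 from week 32 of 2017.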

At the end of the day, I am pretty sure we can all agree on this: at least we’re not Windows administrators.

Stuff Old People (Still) Say

Edit: I am sure you can think of more

Originally posted December 19, 2017 on AIXchange

Some months ago I saw this on Twitter, and it’s stuck with me:

@krhoyt Said “Do I sound like a broken record?” in a meeting, and wondered if it even made sense any more. Has it weathered the years? #getoffmylawn

I guess that got me thinking about things that I say, and made me wonder why I still say them. Of course, readers of this blog know I’m old school. For instance, I still have a turntable and vinyl records, though I don’t listen to them as often as I once did. I like to buy reliable old cars, maintain them, and drive them into the ground. In addition, I still have a landline with a wired (not wireless) headset, and when people talk about Slack, it makes me reminisce about IRC.

But does anyone under the age of 45 actually understand what it means to sound like a broken record? Youngsters today, who hold a lifetime’s worth of music in their pockets, view cassettes as antiques.

For that matter, why do we still say “hang up the phone” when we’re obviously just tapping the end call button on our smartphones? Or why does the save icon on most programs on your computer still look like a floppy disk? Few kids have even seen a floppy. Then again, when you want an image to symbolize saving a document, what other icon would make sense? Maybe a cloud?

Of course not everything old is necessarily bad. Sometimes things from our past even make a comeback. For instance, I’ve read about kids rediscovering TV antennas. Using this technology to get free over-the-air TV channels is mind-blowing for them.

I know I’m getting Seinfeldian here, but I really wonder about these things. And seriously, what is the deal with airline food?

Living with Legacy

Edit: I still run into AIX 5.3 all the time

Originally posted December 12, 2017 on AIXchange

This Twitter comment got me thinking about companies that continue to run legacy hardware and operating systems, etc.:

The software needs to be reliable. We had an operator put in a wrong toolholder and poof, 37k out the window for a new spindle. Imagine if it were a software error? Machine manufacturers aren’t going to risk it. They develop something, test the snot out of it, and then try not to change anything unless they have to. The machines are likely to last far longer than the average operating system. We have some that are between 15-20 years old and it is not uncommon to find them 30+ years old like the lathe this guy is working on. The more expensive the machine, the more likely it will be rebuilt and kept running.

Although his focus is Windows XP, I’m sure if you ask around, you can find examples of ancient unsupported systems on your own. They may even be in your own organizations.

Here’s a sampling of replies:

This is almost always the vendor’s fault, and no they usually don’t test on new OSes when they come out. I’ve seen this happen with law enforcement, medical, and industrial software.

I’ve had vendors tell me the newest version of Windows/Windows Server they support is one that had been out of extended support for years. Server software that requires 2000/2003. Client software that requires NT or XP. Low competition, vendor lock in? Why bother.

And the other problem? These industries are boring. Not many SV nerds getting on GitHub to write new jail cell management software. Again, little/no competition, niche industry, locked in customers. No choice but to run insecure software.

The vendors have no incentive to change, or they’ve gone out of business, but their customers have found that their existing solutions solve their problems so they’re not looking for another solution. I still come in contact with people that are happily running AIX 5.3 on POWER5 hardware and don’t see any reason to move ahead. Save for the occasional hard drive failure, they don’t even touch their systems.

While most of us understand the numerous, important benefits of being current, supported and up to date, others do not. IBM has tried to give them a way forward with Extended Support and other options that allow customers to run unsupported operating systems on modern hardware, but you can only do that for so long.

Unfortunately, these customers may find themselves painted into a corner with no way out. The technical debt will catch up with them.

So what’s the answer? What is the best path forward for customers that insist on running some old application on DOS, or on AIX 4.3.3, or something similar? Because eventually you will get called to help out this sort of customer. What are you doing now to prepare for the day when critical legacy infrastructure goes down?

Maybe all we can do is hope that this joke, recently told on Twitter, becomes reality:

@eashman AWS has announced new PDP-11 instances. Useful for airlines and Motor Vehicle departments looking to move to the cloud without upgrading existing infrastructure. #reInvent #geekhumor

The POWER9™ Journey Begins

Edit: At the time of this writing we are waiting to begin the POWER10 Journey

Originally posted December 5, 2017 on AIXchange

One of the great benefits of being an IBM Champion is the ability to attend IBM briefings about unannounced products. For instance, recently, IBM gave us some details about the new Linux-only POWER9™ server. The AC922 (Machine Type 8335-GTG) for high-performance computing (HPC) is being officially announced today, and will be generally available Dec. 22. Learn more about the AC922 here. 

The AC922 will be the first GA system to run a POWER9 processor. This machine takes advantage of the new faster speeds we’ll see from PCIe Gen 4–which is twice as fast as PCIe Gen 3–along with the improved speeds of CAPI 2.0 and next-generation NVIDIA NVLink. POWER remains the only processor offering NVLink connectivity from the CPU to GPU accelerators.

It contains two POWER9 sockets with up to 40 cores, and up to four NVIDIA Volta-based Tesla V100 GPUs. It will max out at 1TB of memory if you use the 16 x 64 GB DIMMs (eight per socket). It has four PCIe Gen 4 slots, and can have up to 7.7 TB of storage and 3.2TB NVMe adapters. It’s not designed for virtualization; it’s intended to be configured as a bare metal “single server.”

There are two processor modules: 16-core and 20-core. Initially, the available memory options are 16, 32, or 64GB industry standard DIMMs. There are two hard drive slots per machine. You can choose from 1TB and 2TB HDD options and 960GB, 1.92TB and 3.84TB SSDs. You have your choice of RAID0, RAID1 and RAID10.

I’ll share some information I received during the call. These notes come from presentations created by IBM experts:

  • The AI era is going to be a journey. Clients are faced with challenges of commodity hardware combined with open source software. IBM has built the best systems in the marketplace to crush the data challenges of the AI era. These are enabled with advanced I/O interfaces, new shared memory structures and co-optimized hardware and software.
  • There are three key points with the AC922, which make it the best server for AI. First, it is designed from the ground up for AI workloads; this starts with the acceleration superhighway. In the AC922, IBM has introduced second-generation NVLink between the CPU and GPU, which is 5.6x faster than PCIe Gen 3 architectures. Second, IBM focused not only on NVLink and the GPU, but designed a balanced system, one that is built for the AI era with industry-leading memory bandwidth, PCIe Gen 4 buses for the best network connectivity with InfiniBand, and high-performance storage adapters. Lastly, IBM took the open source deep learning frameworks and optimized them around this advanced design. This results in the best server and solution for enterprise AI. Additionally, this server design will find use in applications such as HPC and accelerated databases…so do not think it is just for AI.
  • At the center of Power Systems’s differentiation is the processor. Everything starts from here and it is designed for the cognitive era. Power has always had a stronger core with up to 4x the threads over x86. The architecture also enables advantaged memory bandwidth for a balanced system design, enabling ease of data movement within the system. One of the core differentiators Power delivers is the advanced I/O interfaces. Last fall IBM introduced POWER8 with NVLink. This was the first processor with NVLink between the CPU and the GPU. With POWER9 IBM introduced more advanced interfaces such as next generation NVLink, PCIe Gen 4 and OpenCAPI.
  • This remains the only processor in the industry to leverage NVLink between CPU-GPU.
  • When IBM talks about AC922, they discuss CORAL. CORAL is the collaboration between Oak Ridge, Argonne and Lawrence Livermore research labs for the Department Of Energy. It all starts with the POWER9 processor and the NVIDIA Tesla V100. IBM is combining these on a motherboard, which is differentiated with the connectivity between them. All nodes are contained in a standard rack mount chassis. It is the repeatable building block used for this super computer.

Resources
On Twitter, the OpenPOWER Foundation shared photos of this system that were shown at November’s SC17 conference. Check out picture 10.

This article describes the nodes that make up the Summit Supercomputer. It gives you an idea about potential real-world uses for these nodes:

Oak Ridge National Laboratory’s new Summit supercomputer, projected to be the fastest in the world, should rocket the U.S. back into the lead over China on the top 500 list of fastest supercomputers. At SuperComputing 2017, IBM demoed its Power Systems AC922 server nodes that serve as the backbone of the Summit supercomputer. …

Summit promises to deliver 5-10x more performance than its predecessor, Titan, but it crams much more power into a smaller footprint. Titan featured 18,688 nodes, but Summit will overpower it with “only” ~4,600 nodes. That capability stems from increased node performance; Summit will offer more than 40 TeraFLOPS per node, whereas each Titan node weighed in at 1.4 TeraFLOPS. Packing all that power into a single node begins with IBM’s water-cooled Power Systems AC922 node. Each node is equipped with two IBM POWER9 processors and six Nvidia Volta GV100 GPUs. The nodes also feature an aggregate of 512GB of coherent DDR4 and HBM2 (High Bandwidth Memory) along with 1,600GB of non-volatile RAM. …

Supercomputers are all about parallel computation and moving data between the CPUs, GPUs, memory, and networking, so Summit provides numerous layers of extreme bandwidth. The system features 96 lanes of PCIe 4.0 that comes in handy for the dual-port Mellanox EDR InfiniBand adapter, which has a theoretical maximum throughput of 400Gb/s. IBM has measured throughput at 392Gb/s, which is twice the bandwidth of a PCIe 3.0 adapter. The Volta GV100s connect via PCIe 3.0 and NVLink 2.0. The NVLink interface provides 100GB/s of throughput for CPU-to-GPU and GPU-to-GPU traffic. The GPUs are arranged in a dual-mesh design. Interestingly, IBM also produces a model with four GPUs that will power CORAL’s Sierra supercomputer. The four-GPU model (the last picture in the album above) touts 150GBps for inter-GPU/CPU communication. Due to the reduced number of GPUs, IBM can provision more links (“bricks” in NVLink parlance) to the CPUs and GPUs, which increases throughput. …

The POWER9 processors have eight memory channels, for a total of 16 channels per server that provide 340GB/s of aggregate bandwidth. Each Summit node will wield a maximum of 2TB of DDR4-2666 memory.

In this video, “Scott Soutter, IBM; Steve Fields, IBM Power Systems; and Dylan Boday, IBM Power Systems discuss Power AI, deep learning frameworks, continued partnership with Nvidia for POWER9, and Open CAPI, from SC17 in Denver, Colorado.”

Although there’s nothing AIX-specific in today’s announcement, more announcements that cover the AIX and IBM i ecosystem will be made in the future.

IBM has issued a statement of direction for the POWER9 Enterprise hardware. I’ve also seen timelines for AIX and IBM i that, I assure you, extend years into the future.

Obviously, there’s much more ahead with POWER9, but this machine is the first step on that journey.

Using lvmo to Migrate LVM Performance Tuning Values

Edit: Some links no longer work

Originally posted November 28, 2017 on AIXchange

If you use the lvmo command to tune Logical Volume Manager (LVM) pbufs, this information may be useful:

The lvmo command sets or displays pbuf tuning parameters. The equal sign can be used to set a particular tunable to a given value. Otherwise, if no equal sign is used, the value of the tunable will be displayed.
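
For reference, basic usage looks something like this, with a hypothetical volume group named datavg and pv_pbuf_count as the tunable. The first command displays all of the pbuf tunables for the volume group; the second sets one (heeding the warning below):

    # lvmo -a -v datavg
    # lvmo -v datavg -o pv_pbuf_count=1024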

Of course the warning is also very helpful:

Misuse of the lvmo command can cause performance degradation or operating-system failure.

Yes, lvmo requires the utmost care, but when used properly, it can provide valuable function. For instance, via Twitter I found this IBM developerWorks post from May. It explains how to use lvmo for migrating LVM performance tuning values.

These tunables are stored in the ODM, outside of the on-disk volume group, so they aren’t preserved when the volume group is moved to a new LPAR. Exporting and importing the volume group will reset the LVM performance tunables to the default values.

A way around this is to backup lvmo tunables before exporting the volume group, and then restore them after importing.

developerWorks provides a downloadable sample script, lvmo_tool, to demonstrate this:

    # lvmo_tool -?
    getopt: Not a recognized flag: ?
    Usage: lvmo_tool -b
           lvmo_tool -r

To back up lvmo tunables:

    # lvmo_tool -b testvg
    lvmo tunables are saved in /tmp/lvmo00f6f42a00004c000000015b827ff33f.

Run “lvmo_tool -r /tmp/lvmo00f6f42a00004c000000015b827ff33f testvg” to restore lvmo tunables after importing the volume group.

To restore lvmo tunables, import the volume group with no varyon option:

    # importvg -y testvg -n hdisk1

Restore the LVM tunables:

    # lvmo_tool -r /tmp/lvmo00f6f42a00004c000000015b827ff33f testvg

Varyon the volume group:

    # varyonvg testvg

developerWorks adds this note:

Restoring LVM tunables on an already varied-on volume group requires varying the volume group off and on again if:

  • New values of the tunables are less than the volume group’s current values.
  • Changing max_vg_pbufs of the volume group.

Do you think you’d find this tool helpful?

Losing a Laptop

Edit: This still has the potential to be devastating

Originally posted November 21, 2017 on AIXchange

A buddy recently checked into a hotel. He asked the desk clerk about in-room safes and was told the hotel didn’t have them. Then he asked the clerk to recommend a good nearby restaurant within walking distance. He was given directions, went, ate, returned to his room, and found his laptop had been stolen. He wondered if the clerk was somehow in on the theft, but he couldn’t prove it.

I mention this story because it points to how essential our laptops are to our lives. Losing our computers–under any circumstance–inhibits our ability to make a living. I liken it to an auto mechanic or a construction worker having his tools stolen. This puts someone’s livelihood in jeopardy. At minimum, there’s a significant inconvenience involved in replacing the stolen gear.

Think about your laptop and its contents. Is the hard drive encrypted? Is there a power-on password? How would you be affected if it were taken from you forever?

Whenever I get a new machine, I spend time recreating my environment. Certainly, that process is made easier if I have access to my old laptop in order to make comparisons. There may be VPN definitions to recreate, virtual machines and .iso files that you like to have available, software packages to download and install, and documents that are in process or saved locally. I’m sure you have your own list of tools and capabilities that you use every day. They would be difficult or impossible to replace.

How long would it take you to rebuild and recover? Would some things simply be lost for good? While you’re pondering that, you might want to ask yourself about maintenance. How recent is your latest backup? Have you tested a restore?

I know some people who, I guess, wouldn’t be lost without their laptop. They have relatively thin clients, and they use cloud storage and/or regularly back up their files. (There’s software that automatically backs up everything hourly, or at an even faster interval.)

So what do you do to protect your laptop? Do you bring it everywhere you go rather than leave it unattended at any point during your travels? That can be a drag when you want to walk around and explore a new destination. Do you really want to bring everything with you all of the time?

There are other options, like a laptop lock. Of course the downside to this is that it’s pretty obvious what’s been locked up. Plus it’s not capable of protecting any other valuables you might have with you.

I’ve heard about something called a backpack/bag protector. Everything goes in your bag, then you wrap this metallic mesh of cables around it and attach everything to a bedframe or a drainpipe in your bathroom. I guess it’s like chaining a bicycle to a bike rack. Apparently international travelers use them when staying at places like youth hostels. I’ve also heard of them being used by hikers and backpackers; they just attach their bags to a tree.

The outfit that makes the backpack protector also has what it calls a portable safe. It’s basically their own bag that contains a built in mesh, and it comes with an integrated lock and a cable that can be attached to any heavy, unmovable object.

Obviously there’s no foolproof solution. Hotel safes certainly aren’t impenetrable (watch here, here and here). Devices like locks and portable safes can be defeated by bolt cutters. Keeping your laptop with you at all times is a problem if you’re mugged. Even common sense has its limits. When you’re traveling you may not go around advertising that you have a laptop with you, but when a bunch of techies gather at a conference, the thieves can figure out that there will be unsupervised computers in the area.

What steps do you take to protect your laptop, on the road or even when you’re at home?

On Becoming a Sponsor User

Edit: Some links no longer work

Originally posted November 14, 2017 on AIXchange

While attending the IBM Technical University last month I went to a session on the Cloud Management Console (CMC). One thing I highlighted when I first wrote about the CMC is how you get access to the product. You can pay $50 per frame per month, or, if you’ve purchased a C model, you receive access to the product for three years.

Another way to gain access to the service is to become a Sponsor User. Although the information here describes teams that are building products, I think it’s a good overview for those interested in the CMC:

Sponsor Users are real-world users that regularly contribute their domain expertise to your team, helping you stay in touch with users’ real-world needs throughout the project.

Despite our best efforts, empathy has its limits. If you’re designing the cockpit of an airliner but you aren’t a pilot, you simply won’t know how it feels to land a plane. Without that first-hand experience, it’s easy to lose touch with our users’ reality and allow bias and personal preference to creep into our work.

Sponsor Users are real users or potential users who bring their experience and expertise to the team. They aren’t passive subjects––they’re active participants who work alongside you to deliver a great outcome. While they won’t completely replace formal design research and usability studies, Sponsor Users will help you break the empathy barrier and stay in touch with real-world needs throughout your project.

Anatomy of a Sponsor User
A good Sponsor User is representative of your intended user, they’re invested in the outcome, and they have the availability to regularly work with you and your team.

1) Are they representative of your target user? A good Sponsor User reflects the actual user you intend to serve. As enthusiastic as your client, customer, or economic buyers may be to help you, they are often not the user who will ultimately derive personal value from your offering.

2) Are they personally invested in the outcome? A good Sponsor User cares as much about your project’s outcome as you do. Look for candidates who have a particularly demanding use case––a Sponsor User who relies heavily on your offering to be successful will have a vested interest in your project’s success.

A word of caution: don’t mistake a demanding use case with an “extreme” use case. If you’re working on a Hill that concerns a family minivan, a race car driver is probably not a great candidate for a Sponsor User, no matter how interested they are in working with you.

3) Are they available to collaborate? A good Sponsor User is open and willing to share their expertise and experience with your team.
While being a Sponsor User isn’t a full-time job, it is a commitment. Set expectations, but be respectful of their time and be flexible around their schedule. What’s important is that their insights and ideas are heard.

If you’re interested in becoming a Sponsor User with the IBM Cognitive Systems team, contact Cary-Anne Olsen-Landis at caolsen (at) ibm dot com. She’ll tell you more about the team and the products they’re working on.

A Tip on Getting Started with the PowerHA 7.2.1 GUI

Edit: Some links no longer work.

Originally posted November 7, 2017 on AIXchange

There are a lot of ways to get familiar with the new PowerHA 7.2.1 GUI:

In PowerHA® SystemMirror Version 7.2.1, or later, you can use a graphical user interface (GUI) to monitor your cluster environment.

The PowerHA SystemMirror GUI provides the following advantages over the PowerHA SystemMirror command line:

  • Monitor the status for all clusters, sites, nodes, and resource groups in your environment.
  • Scan event summaries and read a detailed description for each event. If the event occurred because of an error or issue in your environment, you can read suggested solutions to fix the problem.
  • Search and compare log files. Also, the format of the log file is easy to read and identify important information.
  • View properties for a cluster such as the PowerHA SystemMirror version, name of sites and nodes, and repository disk information.

More information is available in these videos from Shawn Bodily and Michael Herrera. And then there’s this virtual user group presentation.

I’m mentioning this now in part because someone recently asked me how to locate the fileset that’s needed to get it to work. While the requirements tell you what version of AIX you need to be running, they don’t tell you where to get the cluster.es.smui.server fileset.

For that, you need to go here and download the ESD_PowerHA_SystemMirror_v7.2.1_Std_Ed_122016.tar.gz archive. The package unzips into three directories: the installp directory, an smui_server directory, and a usr directory. While you might assume the filesets are in the installp directory, they’re actually found in smui_server. Credit to Shawn Bodily, who pointed this out to me. Be sure to keep this in mind as you do your own testing of the PowerHA GUI.
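
Put together, the install boils down to something like this sketch (run from wherever you saved the archive; installp’s -Y flag accepts the license agreements):

    # gunzip -c ESD_PowerHA_SystemMirror_v7.2.1_Std_Ed_122016.tar.gz | tar -xvf -
    # cd smui_server
    # installp -agXYd . cluster.es.smui.server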

Just Back from Technical University

Edit: Some links no longer work

Originally posted October 31, 2017 on AIXchange

I haven’t written about the IBM Technical University lately, but rest assured, I continue to make time for it as schedules allow.

The most recent event took place in New Orleans two weeks ago. When you look at the list of presenters, there was a lot of technical firepower on hand. The Technical University isn’t an exercise in marketing or fluff. It’s technical information for technical people: there were lectures, hands-on labs and numerous opportunities to meet and talk to the speakers and attendees.

IBM puts on several of these types of events each year including Interconnect in March and the IBM Systems Technical University in May. The good thing about this iteration is that, compared to some of the other events, this one did not feel “huge.” There was plenty of room in most of the sessions, and, as I said, ample opportunities to interact with speakers, vendors, salespeople and other attendees in the Solution Center, networking center, and hallways.

The Technical University is a worldwide happening. The conference in Prague is coming up in early November, and future events are set for Cairo, Dubai and Florianópolis.

There’s nothing like being at a large gathering of your peers, as all of us learn from industry experts and from each other. On a personal note, it’s gratifying to have chance real-life meetings with people who’ve been reading this blog over the years. Plus there’s always the possibility of bumping into a fellow IBM Power Champion. (And on that note, nominations for 2018 Power Champions are now open.)

I recognize that it can be tricky to get away from the office, and that some of our employers balk at the idea of paying for training. But events like Technical University are worth it. And training, in general, adds value. It reminds me of an anecdote that I see all the time:

Two managers are talking about training their employees. The first asks, “What if we train them, and they just leave?” The second responds, “What if we don’t train them, and they stay?”

Anyway, for those of you who attended the conference, what were your impressions? And for those of you who haven’t been to a Technical University event, what has kept you from attending?

Patching: Seeking a Happy Medium

Edit: Still an ongoing issue

Originally posted October 24, 2017 on AIXchange

Let’s talk about patching. IT pros understand that it’s critical to patch in a timely manner. Or at least they should understand, but then, getting behind on patching was one factor in the Equifax breach (and many other breaches for that matter).

Even though patching is essential, having control over when and how you patch is highly desirable. When we’re talking about servers that run your core business, you should have absolute control over when and how you apply your fixes. Of course with this power comes responsibility. You should be coordinating with change control and testing changes in a test/dev/QA environment before anything is put into production, and you should be installing fixes in a timely manner, especially high severity fixes.

However, not everyone gets to decide when their patches get installed. Unless you’re inclined to go into your advanced settings and fiddle around a bit, from what I can gather, recent Windows versions offer very little in the way of controlling how and when updates are made. I’ve seen Windows 10 systems reboot with no warning on “Patch Tuesday.” I realize this behavior is aimed at non-technical users, and their systems should certainly be kept reasonably current. Nonetheless, they should still have some control over the process. And it’s not just a workplace issue. I’ve seen patch downloads occur over metered connections when it would make more sense to allow these users to choose when to actually download the fixes. Not everyone has unlimited data, even at home; and this is certainly the case with most cellular users. If you’re using your phone as a wifi hotspot with a laptop, you don’t want your limited data allowance chewed up by a Windows update that could have waited till you got home.

Related to this, I’ve read news articles about people reporting system issues once patches were installed. How furious would you be if you couldn’t work following an unplanned reboot, or even worse, if your machine no longer booted at all? Imagine the chaos in your life if you no longer had access to your computer, especially if it happened when you were not expecting it.

The point is, if you’re in the middle of something and work gets lost to an auto-reboot, it’s counter-productive. I’d like to see a happy medium with consumer devices. Even my phone lets me postpone updates until it’s more convenient. As an IT pro, getting a heads-up from these devices is valuable. I like to take a good backup before patching so it’s easier to roll back the changes if disaster strikes. That may not be possible with a machine that just reboots out from under you.

These are just things I’ve seen recently. To be honest, I’m not sure how widespread this issue is, or whether the fault lies primarily with Microsoft, corporate IT policies or users themselves. I’m just an AIX administrator with a blog, after all.

Perhaps the solution is to switch to Linux on the desktop–although that hasn’t worked out so well in Munich.

What are you seeing with patching, either in the enterprise or among your non-techie friends on the desktop?

A Hitch with SEA Failover Testing

Edit: Test test test.

Originally posted October 17, 2017 on AIXchange

A few months back, I ran into an issue during shared Ethernet adapter (SEA) failover testing. After upgrading to VIO server 2.2.5.10, we would fail VIOS1 and verify our disks and networks were functioning as expected on the VIO clients. Then we’d bring VIOS1 back online and fail VIOS2. The network would hang on the VIO clients.

When we checked the status of our SEAs on VIOS1, they would show up as “unhealthy.” The only way we could resolve this was to reboot the VIO server. This was unexpected behavior and not the way failover used to work.

Eventually we found that we could change the health_time_req attribute so that it would time out sooner:

Health Time (health_time_req)
Sets the time that is required to elapse before a system is considered “healthy” after a system failover. After a Shared Ethernet Adapter moves to an “unhealthy” state, the Health Time attribute specifies an integer that indicates the number of seconds for which the system must maintain a “healthy” state before it is allowed to return into the Shared Ethernet Adapter protocol. The default value is 600 seconds.
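
From the padmin shell on the VIO server, checking the SEA’s failover state and shortening that timer looks something like this (ent8 is a hypothetical SEA device name; confirm yours with lsmap -all -net first):

    $ entstat -all ent8 | grep -i state
    $ chdev -dev ent8 -attr health_time_req=60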

It appears IBM is aware of this issue and working on a fix. Chris Gibson recently relayed this information:

APAR status
Closed as program error.

Problem summary
Given a pair of VIOS LPARs (2.2.5.x and up) with matching SEAs in HA mode (ha_mode set to auto or sharing) with one node in UNHEALTHY state, if the healthy node is rebooted or loses link, the UNHEALTHY node will not assume the PRIMARY state. In the field, a customer reboots the primary LPAR and waits until it is back up. Then the customer reboots the backup LPAR. Unbeknownst to the customer, the primary LPAR has gone into the UNHEALTHY state because the link came up slightly delayed.

When the backup LPAR is shutdown, the primary LPAR does not take over and become PRIMARY as it did before the upgrade.

Problem conclusion
Code changed to disable link check as part of health check and also reduce the default value of health_check attribute to 60 secs and minimum value to 1s.

This is another reason to do plenty of testing after updates. In our case we just went from 2.2.4.22 to 2.2.5.10, yet we were bitten by this issue. For anyone doing VIO maintenance, it’s certainly something to be aware of.

Have you seen this type of behavior?

Power Systems Best Practices Doc Updated

Edit: I always look for the latest version of this document. Some links no longer work.

Originally posted October 10, 2017 on AIXchange

Not long ago I was asked about the Power Systems best practices document that I wrote about in March.

The reader who contacted me couldn’t download the presentation, nor could I when I tried. So I reached out to Fredrik Lundholm, the author, who assured me that it was still available. I tried again, and it worked.

In the interim, a new version of this doc, 1.18, was released. Download it here.

A tip: I’ve found that I need to click on the download button in the top right and select “direct download” in order to get it to work. If your download isn’t successful, you’ll see a timeout message stating that the file cannot be previewed.

Anyway, some highlights from the updated presentation:

  • Slides 13-15 cover the VIO server. Page 13 has VIOS policy, page 14 has the VIOS release lifecycle (showing VIO out into the 2023 timeframe), and page 15 shows network access mechanism information.
  • Page 19 shows VIOS virtual Ethernet tuning information, page 20 has SR-IOV and vNIC information, and page 21 shows storage information.
  • Page 26 has the AIX latest support matrix as of September 2017, page 27 has AIX policy information.
  • Page 36 has PowerHA recommendations, page 39 has Linux and IBM i notes on mtu_bypass and SEA performance.

If you’ve seen previous versions, it’s pretty easy to spot the changes. All new/updated slides are labeled “Updated 1.18” in red.

If you’re new to this, be sure to read this introduction on page 4:

This presentation describes the expected best practices implementation and documentation guidelines for Power Systems with AIX. These should be considered mandatory procedures for virtualized Power servers.

The overall goal is to combine simplicity with flexibility. This is key to achieve the best possible total system availability with adequate performance over time.

While this presentation lists the expected best practices, all customer engagements are unique. It is acceptable to adapt and make implementation deviations after a mandatory review with the responsible architect (not only engaging the customer) and properly documenting these.

Fredrik does a great job of presenting this information. Every update is well worth your time.

My Reading List

Edit: Some links no longer work

Originally posted October 3, 2017 on AIXchange

From time to time I’ll share some random links to AIX documentation I find online or via Twitter. But I also regularly read certain individuals, some who write about Power/AIX and some who cover tech more generally. I thought I’d share that list here:

Jay Kruemcke currently works for SUSE, but you might have caught him at IBM technical conferences in standing-room only, NDA-required, AIX trends and directions sessions. That’s my way of saying he’s a popular speaker. His blog delves into the personal at times, as he explained back in 2011:

One of the reasons I started this blog is to give me an opportunity to discuss topics outside of just IBM AIX and Power Systems. One of my professional passions is product management – the process of creating and managing a product or offering from inspiration through launch, product maturity and eventually the withdrawal of the product. It is a way to “own” a piece of the business and put your own unique mark on a company.

The author of this blog chooses to remain anonymous. As he explains:

I’m just a simple dumb sysadmin who loves Unix systems and who loves to blog.
I’ve been blogging for more than seven years now, and it has always been for me a way to better understand the things I am working on and a way to share the knowledge. I do not do this for recognition or fame. It’s just my way to thank all the people who are blogging around the world and to give back what they gave to me: knowledge.

I do this for free. I do not accept any donation, or any offers related to money. This blog will stay ad free forever.

For some personal reasons my name will never appear on this website. I prefer to stay anonymous even if most of you will probably find a way to know who I am or already know my real identity.

Recent posts include “Managing a Docker Swarm Cluster with Prometheus” and “Building a Docker Swarm as a Service.”

Bartlomiej Grabowski writes about IBM Power Systems–IBM i as well as AIX:

First of all, I’m pleased to welcome you to my blog. My idea was to create a simple website, where a user can easily find information about IBM i/iSeries/System i/AS400 (so many names for the same system over the last 15 years), AIX, Virtual I/O Server, PowerVM features, and POWER Systems. There are a number of sites about VIOS and IBM i, but I couldn’t find one where PowerVM features are described from the IBM i perspective. I’m also going to publish some simple scripts, and programs which I think might be useful.

Now, let’s move on to some background info about me. My name is Bartlomiej Grabowski, and I’ve been working as a principal system support specialist. Main areas of expertise include IBM i, AIX, PowerVM, VIOS and Power Systems hardware. Specifically, I have had the pleasure to work with solutions based on software and hardware replication, DS8k, SVCs, independent ASPs, and dozens of LPARs and servers. Also, I have collaborated with IBM and other experts, creating several Redbook publications.

Recent posts covered administrative domains on IBM i and LUG 2017 at IBM Rochester.

Brian Krebs doesn’t cover Power Systems, but his stories around the security field are usually very interesting and unique:

He’s recently written about the Sonic, Deloitte and Equifax breaches. I also recommend checking out “Who is Marcus Hutchins” or “Twitter Bots Use Likes, RTs for Intimidation” to get an idea of the kind of information he provides.

Accelerate with IBM Storage lists upcoming calls around different IBM storage topics. Call replays are available if you can’t listen live.

Of course I have to include Nigel Griffiths and Chris Gibson. Both write about new hardware, tools, tips and more.

Gareth Coates authors “Tips of the Power Masters.” These are practical, easy to understand and easy to implement solutions.

The Linux on POWER blog has a self-explanatory name. Recent headlines include: “Red Hat now supports Containers on IBM POWER Systems” and “IBM Advance Toolchain for Linux on Power 11.0-0 released.”

These are my go-tos. Who do you read? Make your recommendations in comments.

Design, Customize and Buy Your OpenPOWER LC Server Online

Edit: Have you bought servers using this method?

Originally posted September 26, 2017 on AIXchange

Did you know how easy it is to design your own OpenPOWER LC server? Here’s a hint: it’s pretty easy.

Just go here and select your server. You can tailor your choice to various workload types, including Hadoop and Spark analytics, memory-intensive clusters, open source DB, deep learning and GPU-accelerated computing.

Depending on the option you choose, you’ll be presented with a different server type. Then you’ll be able to select your chassis. For instance, if you choose GPU accelerated with NVIDIA NVLink, you get:

IBM Power System S822LC for High Performance Computing
Tackle new problems with NVIDIA Tesla P100 on the only architecture with NVIDIA NVLink — eliminating barriers between CPU-GPU.

Experience unprecedented performance and application gains with the new POWER8 with NVIDIA NVLink — delivering 2.8X the CPU-GPU bandwidth compared to x86 based systems.

IBM Power Systems S822LC for High Performance Computing pairs the strengths of the POWER8 CPU with 4 NVIDIA Tesla P100 GPUs. These best-in-class processors are tightly bound with NVIDIA NVLink technology from CPU-GPU — to advance the performance, programmability and accessibility of accelerated computing and resolve the PCIe bottleneck.

For memory intensive clusters, there’s an additional option of either a 1U or 2U system. Here’s the 1U description:

IBM Power System S821LC
A dense, high-data throughput server for your enterprise and cloud.
Compute-intensive workloads can now benefit from two POWER8 processors in a 1U form factor. This server delivers the density your business needs for virtualization, database and HPC deployments.

IBM Power Systems S821LC brings open innovation and high-density computing to the Linux server market with superior virtualization, incorporating POWER8 processors, tightly coupled FPGAs and accelerators and faster I/O using CAPI. Optimize processing power while simultaneously increasing workload throughput and reducing data center floor space requirements.

And here’s the 2U version:

IBM Power System S822LC for Commercial Computing
Open standards-based system designed to simplify and optimize your data center.
Open standards-based system that provides flexible deployment options for hybrid cloud, big data and business-critical applications.

The IBM Power System S822LC is designed to deliver superior performance and throughput for high-value Linux workloads, such as industry applications, big data and LAMP. With greater reliability, serviceability and availability than competitive platforms, the Power System S822LC incorporates OpenPOWER Foundation community innovation for clients that need to run big data, Java, open source and industry applications.

From here, simply click on “build your server” and you’ll be presented with options for your processor, memory, storage and PCIe cards. In my 1U example, I chose the 2×8 core option from this list:

    1x 8 core CPU at 3.32 GHz (8x POWER8 cores)
    1x 10 core CPU at 2.92 GHz (10x POWER8 cores )
    2x 8 core CPUs at 3.32 GHz (16x POWER8 cores)
    2x 10 core CPUs at 2.92 GHz (20x POWER8 cores)

Then I picked 16 DIMMs at a 32G DIMM size, giving me 512G total.

For storage you can choose from NVMe, SSD, SAS or SATA drives, and also tailor the size and quantity to your needs. There are also options for adapter cards.

Once you’ve made your selections, you’ll advance to your server config. Here you can download server specs and view the starting price.

Then just click on “purchase now,” proceed to checkout, and login with your IBM ID to finalize the purchase. Like I said, easy.

Have you been purchasing systems this way? Let me know in comments.

POWER9: What’s Already Out There Says Plenty

Edit: At the time of this writing we are talking about POWER10

Originally posted September 19, 2017 on AIXchange

In March I wrote about the POWER9 roadmap. More recently, I sat in on a confidential briefing about the upcoming release. All I can really say about it is that some exciting things are coming, and I can’t wait to share the details with you.

Of course, per the confidentiality agreement I signed, I will have to wait. But the thing is, if you look at what’s already been publicly divulged about POWER9 (see here, here, here and here), you’ll get a clear, if incomplete, picture.

Here’s what I’ll add to that: If you look at the roadmaps for AIX and POWER, you’ll see that IBM delivers its solutions at a consistent pace. So if you consider the timelines of previous releases, it’s safe to assume that we won’t have to wait much longer for new products.

Plus, this supercomputer is already running a POWER9 solution:

Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 4,600 nodes when it arrives in 2018. Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory. To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect.

As I said, I’ll write more as soon as I can. For now, a quick show of virtual hands: How many of you have made the move to POWER8, and how many plan to make the move to POWER9?

Some Familiar and Not So Familiar Uses of ifconfig

Edit: Some links no longer work

Originally posted September 12, 2017 on AIXchange

In our line of work, you never stop learning. But I also believe it’s important to refresh one’s memory by revisiting some basics from time to time.

For instance, we should all know the OSI model, which is described here.

How well do you know your ifconfig commands? Here are some examples:

When do we use ‘ALIAS’? Consider the following command:

# ifconfig en0 <ip address> <subnet mask> alias

What is the function of ‘alias’ here? Alias is basically used to assign more than 1 ip address to a single interface.

For eg:
# ifconfig en0 192.168.0.2 255.255.255.0 alias

This command will assign 2 ip addresses to a single interface en0.

If no ‘alias’ is used:
# ifconfig en0 192.168.0.2 255.255.255.0

This command will replace the earlier ip address for the interface en0 with a new ip address.
So, by using ‘alias’, we can assign 255 ip addresses to a single interface.

Or maybe you want to remove the TCP/IP configuration on a host:

The rmtcpip command removes TCP/IP configuration on a host machine. The basic functions of this command are:

* Removes the network interface configurations.
* Restores /etc/rc.tcpip to the initial installed state.
* Restores /etc/hosts to the initial installed state.
* Removes the /etc/resolv.conf file.
* Removes the default and static routes.
* Sets the hostname to localhost.
* Sets the hostid to 127.0.0.1.
* Resets configuration database to the initial installed state.

Like I said though, you never stop learning. As it pertains to ifconfig, awhile ago I become aware of an interesting option that I hadn’t tried. Chris Gibson tweeted about it:

Move IP address seamlessly from one interface to another. # ifconfig en0 transfer 10.1.1.10 en1

He included this link from the IBM Knowledge Center:

transfer tointerface
* Transfers an address and its related static routes from interface to tointerface. For IPv6, this command works only for addresses added by using the ifconfig command.
* ifconfig interface addressfamily address transfer tointerface

Note: If you want to transfer an IP address from one interface to another, and if the destination interface is not part of the virtual LAN (VLAN) to which the IP address belongs, you must add the VLAN to the adapter on which the destination interface is configured.
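
Putting the pieces together, here’s a quick worked example with hypothetical addresses and interfaces. The first command adds a second address to en0 as an alias, the second moves it (and its related static routes) to en1 using the syntax quoted above, and the third removes the address outright:

    # ifconfig en0 10.1.1.10 netmask 255.255.255.0 alias
    # ifconfig en0 inet 10.1.1.10 transfer en1
    # ifconfig en1 10.1.1.10 delete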

This is certainly handy. Did you know it was available? Have you used it?

Tech Changes, but Teaching Doesn’t

Edit: We were all beginners once

Originally posted September 5, 2017 on AIXchange

Even though it was published in the earliest days of the internet, this 1996 article about helping people learn to use computers still rings true.

Do you find yourself falling into any of these traps when you’re teaching users about your systems? This was written at a time when people were logging onto the network using modems and operating systems that are primitive by today’s standards, so obviously things are quite different now. In a lot of ways, things are better now. Twenty years ago, few people outside of IT knew much about computer basics or getting online:

Computer people are fine human beings, but they do a lot of harm in the ways they “help” other people with their computer problems. Now that we’re trying to get everyone online, I thought it might be helpful to write down everything I’ve been taught about helping people use computers.

First you have to tell yourself some things:

Nobody is born knowing this stuff.
You’ve forgotten what it’s like to be a beginner.
If it’s not obvious to them, it’s not obvious.

Have you forgotten what it’s like to be a beginner? Most users today have literally grown up with computers, but that doesn’t mean they really understand what goes on under the hood. Pretty much anyone under the age of 80 can get online–and quite a few octogenarians can, too! However, simple access to mobile devices and user-friendly operating systems doesn’t make anyone a techie.

And don’t ignore the learning curve for actual techies, either. Not all of us come from UNIX/Linux backgrounds and have spent decades working from a command line. You may be comfortable with AIX or the VIO server, but these environments can be intimidating to newbies:

Beginners face a language problem: they can’t ask questions because they don’t know what the words mean, they can’t know what the words mean until they can successfully use the system, and they can’t successfully use the system because they can’t ask questions.

You are the voice of authority. Your words can wound.

Computers often present their users with textual messages, but the users often don’t read them.
By the time they ask you for help, they’ve probably tried several things. As a result, their computer might be in a strange state. This is natural.

They might be afraid that you’re going to blame them for the problem.

The best way to learn is through apprenticeship–that is, by doing some real task together with someone who has a different set of skills.

Your primary goal is not to solve their problem. Your primary goal is to help them become one notch more capable of solving their problem on their own. So it’s okay if they take notes.

Personally, I love WebEx and other screen sharing technology. There’s nothing better than getting on a shared screen session and a call and walking through an issue. There are multiple ways that I can coach them through it. They can just watch me, or even better, I can watch them figure it out for themselves:

Don’t take the keyboard. Let them do all the typing, even if it’s slower that way, and even if you have to point them to every key they need to type. That’s the only way they’re going to learn from the interaction.

Try not to ask yes-or-no questions. Nobody wants to look foolish, so their answer is likely to be a guess. “Did you attach to the file server?” will get you less information than “What did you do after you turned the computer on?”

Take a long-term view. Who do users in this community get help from? If you focus on building that person’s skills, the skills will diffuse to everyone else.

Never do something for someone that they are capable of doing for themselves.

Take the time to read the whole thing. And if you like it, you may want to peruse this archive of articles on a variety of interesting topics published from 1994 to 1996.

It’s interesting to see not only how technology has changed over time, but what hasn’t changed. I can’t help but wonder what it will be like 20 years from now. What will be different? What will be the same?

My Workout Day at the Data Center

Edit: It can still make for a good workout.

Originally posted August 30, 2017 on AIXchange

As I’ve noted previously, IT pros aren’t the healthiest lot. But if you spend any time setting up new hardware in data centers, you’re at least getting a workout.

This occurred to me while I was recently unboxing and racking customer setup equipment, including V7000, V5000 and V9000 storage units and several POWER8 servers. In a sense it’s like opening presents on Christmas. I’m always amazed to see the effort and care that goes into the packing and shipping of this gear. It’s done in such a way that the boxes can take some abuse (which they often do) while the contents survive quite nicely.

Keep in mind that many environments don’t allow cardboard in the computer room, so most of the gear must be unpacked and transported at least a short distance to get it to the raised floor. Even with carts and lift tools and sufficient manpower, a lot of this stuff is pretty heavy. On top of that, you may find that you’re unloading hardware in areas where the facility’s A/C isn’t up to snuff, at least compared to the chilly computer room. And once all the boxes of servers, controllers, expansion drawers and disks are opened, you’re also dealing with a fair amount of trash, so it’s good to have a roomy staging area and a plan for waste management.

The point is, it’s easy to take for granted what goes into this process, as well as what it takes out of you. After doing several racks’ worth of equipment, you might find yourself a little sore the next day, so be sure to build some recovery time into your project plan.

My most recent “workout” at the data center has me convinced that we could develop technology-related, CrossFit-type programs based on these activities. For sure there’s plenty of bending, lifting, hauling, kneeling, etc. that goes on. And if you don’t have power tools, the simple acts of installing rails and tightening screws and attaching cables must equate to various familiar exercises. Why put yourself through deadlifts, squats or bench presses when you can just do a rack and stack?

My plan is still in the inception stages, but I have to think people would happily pay for this type of workout. People pay to go to hot yoga; what could I charge them for time spent in hot and cold aisles in a computer room? Or maybe The Techie Fitness Spa (the name’s also a work in progress) will be more like those strongman competitions or the gyms where they toss old tractor tires around. But instead, I’ll have racks and drawers and everything else we deal with. To make my establishment stand out, I could recreate the whole computer room experience by throwing in a few man traps, retina and fingerprint scanners, and bag checks.

No doubt, the money will be rolling in soon. I just hope that nobody tries to steal my idea in the meantime.

Is Anyone Interested in a Real-Time AIX Forum Using Slack?

Edit: I am still using it with the IBM Champions

Originally posted August 21, 2017 on AIXchange

I recently started using Slack. It’s a group messaging tool that seems to be making inroads at IBM. There’s also a channel for IBM Champions, which is the one I joined. Despite my limited experience with Slack, I can see some interesting possibilities with it, which I’ll get to at the end of this post.

So what is Slack? I’ll let IBM’s Chuck Calio explain it. Chuck created a presentation, and with his permission I’ll share some details from the slides with you.

He starts by explaining some of the limitations of our current communication methods. While some of this is IBM-specific, I’m sure you’ll find it relatable:

•    Email: good for formal communications, overload from way too many, easy for discussions to fragment, key people often left out, responses often too slow.
•    Conference calls: good for education/intros, 1:1 sensitive calls, very limited active participation by the entire group
•    Connections/communities: share useful resources with extended team, forums for discussions, blogs, wikis [but] lack modern application integration.

Here’s how Chuck defines Slack:

Slack is a next generation real time “collab app” aimed at businesses rather than individuals. It’s optimized for teams that will interact with each other around specialized topics (channels). Slack’s strength is around creating an open transparent collaborative “web” of many diverse people to accelerate global team collaboration and innovation around specialized topics.

The Benefits of Slack

Why is Slack different from what we have today? Here are six reasons:

1. Enables information transparency across large distributed global and diverse teams; drives, enhances collaboration and accelerates innovation (vs. private, individual chats and 1:1 learning).
2. Encourages people to collaborate around specific topics (channels), across big groups (teams), across business units (IBM, non-IBM).
3. Slack is optimized to work across multiple devices (PC, laptop, tablet, mobile).
4. Slack chats build up into a corpus of searchable deep knowledge, which makes it easier for new team members to quickly come up to speed.
5. Supports an ecosystem of hundreds of modern applications, many deeply integrated.
6. Capability for bots integrated into Slack, plus built-in analytics.

He further helps define a few concepts in Slack, starting with a team:

Groups of people (from two to tens of thousands) that share a common purpose or interest (teams) interact around specialized topics (channels). Typical activities include sharing content, asking questions, getting/giving help, generating or testing new ideas, etc.

Slack [allows users to create] an open, transparent, collaborative “web” of many diverse people to accelerate global team collaboration and innovation around specialized topics.

Individuals can/should be part of and contribute to multiple Slack teams and channels.

Slack works best if a large percentage of the team is actively engaged and contributing in channels.

Channels, according to Chuck, are “focused group discussions, messages, notifications and collaborations.” They’re organized by:

•    Topic (#openpower)
•    Purpose (e.g., #sales-tv-ads)
•    Department
•    Announcements
•    Practices
•    Anything else you want

Here’s his comment on threads:

With threads, you can branch off and have discussions around a particular message without having to skip to a different channel or a DM. Threads move conversations to your sidebar—where you can ask and answer questions, give feedback, or go off on an inspired tangent.

To use, click the “start a thread” button on any message.

Here’s what he said about conducting “stand-up” meetings:

The beauty of doing stand-ups in Slack is that each person can post their status at any time, and it can be read asynchronously by everyone else. Our team’s rule is that you just need to post your status in the stand-up channel on Slack at some point in your day. It needs to be a meaningful update, not just “I’m doing work today.” Once a team member reads the other statuses, they can take action on it at that time. Pointing out blockers is especially helpful, so that other people can see what might affect their progress and think about how their work affects others.

And finally, some hints and tips:

Communicate in public channels whenever possible, keeping most of your conversations open to all team members. Benefits include:

•    Leverage the wisdom of the crowd.
•    Get answers and responses from SMEs faster.
•    Build a database of organizational knowledge with near-zero effort.
•    Draw more of your team into Slack. (No one wants to miss out on critical conversations!)
•    Gain visibility into the latest happenings in your areas of interest.

If you remember IRC (which is still a thing, by the way), I’d say Slack is a modernized version of that. It’s new and shiny and seems easy enough to use.
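
One example of that ease: Slack’s incoming webhooks let any script post into a channel with a single HTTP call, which is how many of the application and bot integrations mentioned above work. Here’s a minimal sketch from an AIX or Linux shell; the webhook URL and message are made-up placeholders, and you’d generate a real URL from your Slack team’s settings:

# Post a message to a Slack channel via an incoming webhook.
# The URL is a placeholder; create your own under "Incoming
# Webhooks" in your Slack team's app configuration.
curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "errpt just logged a new error on LPAR prod01"}' \
  https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX

Wire something like that into an error-monitoring script and a channel starts to look like a real-time operations feed.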

A Public AIX Slack Channel?

Do you think setting up a public AIX Slack channel would be a good idea? I’m serious about this. Take a look at this tool and imagine a forum that provided real-time help, communication, and collaboration for AIX users across desktop and mobile devices. Would you be interested in joining that group? Maybe something like this would quickly become too big to manage, but I find the idea very intriguing. So please, let me know what you think.

Taking Your HMC to the Cloud

Edit: Some links no longer work.

Originally posted August 15, 2017 on AIXchange

Have you heard about the new Cloud Management Console (CMC)? It provides a new way of managing our environments from a single pane of glass. The data from your HMC flows to a central location, and you manage it there. If you have multiple HMCs in a large environment, that’s a great convenience.

Enterprises with one of the newer Power Systems C server models are eligible to receive three years of “free” access to the product. Otherwise, check with IBM or your business partner for specific pricing details.

Rather than installing (and managing and patching) software yourself, you set up your HMC to connect to IBM’s cloud, allowing the device to send data about your environment. Since this is a cloud-based model, you can access the CMC from a mobile device or any browser.

Alternatively, you can give it a spin with IBM’s hosted trial:

Scroll to the bottom and click on the sponsor agreement. Select “I agree” and click on “I confirm.” You’ll then get an email with instructions on gaining access to the platform so you can try it out.

To run it in your own environment, your HMC must be at V8R8.6 SP1 PTF MH01698. Once you’ve ordered the product, you’ll be able to register with an existing IBM ID. From there, you can select the unique subdomain you’ll be using for your enterprise. IBM will provide you with an API key. Copy and paste the key into the CLI on your HMC, and then start Cloud Connector using the chsvc command.
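
For reference, the tail end of that setup looks something like the sketch below on the HMC command line. The -k flag for passing the API key and the lssvc status check are my best recollection of the setup documentation, so treat the exact syntax as an assumption and verify it against IBM’s current docs:

# Start the Cloud Connector with the API key copied from the
# CMC portal (the key here is a made-up placeholder):
chsvc -s cloudconn -o start -k A1B2C3D4E5F6

# Confirm the Cloud Connector service is running:
lssvc -s cloudconn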

Once your HMC data is loaded into the CMC, you can filter and search the information that has been collected. In the performance app you can view utilization trends, gather performance and capacity data, check your servers’ allocation levels, and more.

You can manage users, permissions, and access to the tool, and shut off any apps that aren’t needed. Blacklists can also be enabled, so if there’s a managed system from which you don’t wish to forward information, its data won’t be sent. In addition, it’s possible to connect your HMC to IBM through a proxy if need be.

Support can be obtained by opening a ticket in Zendesk. As this is a subscription and you’re buying a service, traditional avenues of IBM support aren’t available.

Incidentally, if you’re wondering, IBM is well aware of the concerns regarding cloud technology. During a recent training webinar I attended, it was mentioned that some think of cloud as a dirty word and don’t want anything to leave their data center. The counter to this argument was a simple question: Have you set up Call Home to IBM? Do you trust that? That’s another example where information flows from your data center to IBM. Why have a problem with one when you rely upon the other? It was also noted that all apps are read-only, and that nothing comes into your data center from the outside.

The final point made at the webinar is that IBM isn’t gathering information for the sake of doing so; they want to aggregate data and use their expertise to help their customers. They want to find the connections and insights that are buried within your data that can help your business.

Going forward, additional capabilities and applications will be brought to the CMC. Eventually, Project Monocle will be incorporated.

For details, see the data sheet:

The IBM Cloud Management Console for Power Systems provides a consolidated view of the Power Systems cloud landscape including inventory of systems and virtual components, performance information and logging. The Cloud Management Console is hosted in the IBM cloud and can be accessed securely at any time enabling system administrators to easily run reports and gain insight into their Power cloud deployments. This solution has been built for mobile devices, tablets and desktop browsers enabling cloud operators to enjoy convenient access to this application.

And here’s the announcement letter:

IBM Cloud Management Console for Power Systems is a software as a service (SaaS) offering that provides enterprise-wide performance, inventory, and logging insight for IBM Power Systems servers. This SaaS offering gives clients a central enterprise-wide view of their Power Systems servers without having to install or maintain software at their data center.