Migrating the Cluster Repository Disk

Edit: Still good stuff.

Originally posted June 18, 2019 on AIXchange

Dino Quintero (@DinoatRedbooks on Twitter) maintains a Redbooks blog on IBM developerWorks. In this post from April, he and Shawn Bodily explain how to migrate the cluster repository disk on PowerHA SystemMirror:

The following procedure is valid for clusters that are PowerHA SystemMirror v7.2.0 and later. Verify cluster level on any node in the cluster by executing the halevel -s command as follows:

TST[root@aixdc79p:/] # halevel -s
7.2.1 SP2

The repository disk is the only disk in the caavg_private volume group and requires special procedures. You do not use LVM on it. It is recommended to run a verification on the cluster prior to replacing a repository disk. If there are any errors, these must be addressed and corrected before replacing the repository disk.

To get started, go through these steps, which are detailed in the post:

clmgr verify cluster
bootinfo -s hdisk#
chdev -l hdisk# -a pv=yes
chdev -l hdisk# -a reserve_policy=no_reserve

Here’s the command to swap the disk:

clmgr modify cluster REPOSITORY=hdisk#

Then verify both disks are repository disks with:

clmgr query repository

Remove the original disk with:

clmgr -f delete repository hdisk#

Then rerun the query command:

clmgr query repository

At this point, verify your new disk is the repository disk. 

Then finally, sync the cluster:

clmgr sync cluster
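
Putting it all together, here's a minimal sketch of the full swap, assuming hdisk3 is the current repository disk and hdisk5 is its replacement (both disk names are illustrative):

clmgr verify cluster
bootinfo -s hdisk5                        # confirm the candidate disk is large enough
chdev -l hdisk5 -a pv=yes
chdev -l hdisk5 -a reserve_policy=no_reserve
clmgr modify cluster REPOSITORY=hdisk5    # swap the repository to the new disk
clmgr query repository                    # both disks should be listed at this point
clmgr -f delete repository hdisk3         # remove the original repository disk
clmgr query repository                    # only hdisk5 should remain
clmgr sync cluster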

I wanted to highlight that last part, but there’s a lot more in the actual post. I encourage you to read the whole thing.

Power Overload

Edit: Some links no longer work

Originally posted June 11, 2019 on AIXchange

I attended last month’s IBM Systems Technical University (TechU) conference in Atlanta, and as always, it was an enjoyable and enlightening time. In one of his sessions, Nigel Griffiths had a great slide that challenged attendees’ “street cred”:

Which is right?

Power
Power9
Power 9
Power-9
POWER9
POWER 9
POWER-9

Did you know the answer, or did you need to check the Twitter comments? 

In that same session, Nigel remarked on how the word power has become overloaded. That got me thinking: If I ask if you have enough power, what am I talking about? The POWER processor architecture in general? A POWER-based server? Can you really have enough of either in your environment? 

What about electrical power to your data center? Have you run out of power because you have so much power-hungry gear? What about electrical power to your rack, do you have enough? Is it the right kind of power? 

Of course we have powerful power supplies on our POWER servers. Using the energy management modes, we can adjust the power and performance modes. Maybe you run your Power Systems server in power saver mode. 

You can probably come up with more—and more clever—examples, but Nigel’s point is that, in our world, power has many meanings. I agree—and as someone who gets asked about power all the time, please understand if I request some clarification on occasion. 

I encourage you to check out a TechU event in your area. The fall North American conference is this October in Las Vegas, but events are held worldwide.

DR Solutions and the Need to Keep Pace

Edit: Some links no longer work

Originally posted June 4, 2019 on AIXchange

Chris Gibson recently updated his blog post about using the ghostdev and clouddev flags in the disaster recovery process. 

In his original post, Chris replicated rootvg via his SAN. But since this was written in 2012, an update was needed. Here’s what Chris heard from IBM:

Since one or more of the physical devices will change when booting from an NPIV replicated rootvg, it is recommended to set the ghostdev attribute. The ghostdev attribute will trigger when it detects the AIX image is booting from either a different partition or server. The ghostdev attribute should not trigger during LPM (Live Partition Mobility) operations. Once triggered, ghostdev will clear the customized ODM database. This will cause detected devices to be discovered as new devices (with default settings), and avoid the issue with missing/stale device entries in ODM. Since ghostdev does clear the entire customized ODM database, this will require you to import your data (non-rootvg) volume groups again, and perform any (device) attribute customization. To set ghostdev, run "chdev -l sys0 -a ghostdev=1". Ghostdev must be set before the rootvg is replicated.

This is from his update:

If ghostdev is set to 1 and you attempt to use SRR (or offline LPM), the AIX LPAR will reset the ODM during boot. This is (most likely) not desired behavior. If the ODM is cleared the system will need to be reconfigured so that TCP/IP and LVM are operational again. If you require a “ghostdev like” behavior for your AIX disaster recovery (DR) process, I would recommend you set the sys0 attribute, clouddev, to 1, immediately after you have booted from your replicated rootvg. Rebooting your AIX system with this setting enabled will “Recreate ODM devices on next boot” and allow you to reconfigure your LPAR for DR. Once you’ve booted with clouddev=1 and reconfigured your AIX LPAR at DR, immediately disable clouddev (i.e. set it to 0, the default), so that the ODM is not cleared again on the next system reboot. Some more details on clouddev [follow].
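
In command terms, the sequence Chris describes looks roughly like this (a hedged sketch; use it only as part of a tested DR procedure):

# per the IBM guidance above: set ghostdev on the source LPAR before rootvg is replicated
chdev -l sys0 -a ghostdev=1

# Chris's alternative, which avoids surprises with SRR and offline LPM: leave ghostdev at 0,
# then immediately after booting from the replicated rootvg at the DR site run
chdev -l sys0 -a clouddev=1
# reboot so the ODM devices are recreated, reconfigure TCP/IP and LVM, then disable it again
chdev -l sys0 -a clouddev=0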

Chris concludes with:

If you are looking for a more modern and automated solution for your AIX DR, I would highly recommend you take a look at the IBM VM Recovery Manager for IBM Power Systems. Streamline site switches with a more economical, automated, easier to implement high availability and disaster recovery solution for IBM Power Systems.

Many admins struggle with disaster recovery. Some enterprises roll their own solutions, while others rely on IBM offerings. Frequently admins aren’t up to speed on the latest solution designs and implementation techniques. Quick example: simplified remote restart. Are you aware of this capability? Read this and scroll to the bottom of the page for links to related and more detailed videos.

Disaster recovery solutions are important, but creating and adhering to a DR plan is critical. Regular testing is the only way you’ll know that what you have will work should the need arise.

More PowerAI Resources

Edit: Some links no longer work

Originally posted May 28, 2019 on AIXchange

Following up on last week’s AI-themed post, I encourage you to check the extensive PowerAI documentation available from the IBM Knowledge Center. 

You’ll find instructions on planning, installing frameworks and PowerAI system setup, along with frequently asked questions, a developer portal, and more.  

There are also two new Redbooks that get into more concepts and information around Deep Learning and AI and Big Data.

Here are a couple short excerpts from the deep learning Redbook. First, from page 22 section 2.1: “What is IBM PowerAI?”

IBM PowerAI is a package of software distributions for many of the major deep learning (DL) software frameworks for model training, such as TensorFlow, Caffe, Chainer, Torch, and Theano, and their associated libraries, such as CUDA Deep Neural Network (cuDNN) and nvCaffe. These are extensions that take advantage of accelerators; for example, nvCaffe is NVIDIA’s extension to Caffe so that it can work on graphics processing units (GPUs). As with nvCaffe, IBM has its own extension to Caffe, which is called IBM Caffe. Furthermore, the IBM PowerAI solution is optimized for performance by using the NVLink-based IBM POWER8 server, the IBM Power S822LC for High Performance Computing server, and its successor, the IBM Power System AC922 for High Performance Computing server. The stack also comes with supporting libraries, such as Deep Learning GPU Training System (DIGITS), OpenBLAS, Bazel, and NVIDIA Collective Communications Library (NCCL).

Here’s more from section 2.2:

IBM PowerAI provides the following benefits:

Fast time to deploy a DL environment so that clients can get to work immediately:

  • Simplified installation in usually less than 1 hour
  • Precompiled DL libraries, including all required files

Optimized performance so users can capture value sooner:

  • Built for IBM Power Systems servers with NVLink CPUs and NVIDIA GPUs, delivering performance unattainable elsewhere
  • Distributed DL, taking advantage of parallel processing

Designed for enterprise deployments:

  • Multitenancy supporting multiple users and lines of business (LOBs)
  • Centralized management and monitoring by integrations with other software

IBM service and support for the entire solution, including the open source DL frameworks.

One thing I can tell you from experience: the most recent releases of PowerAI are much easier to install than the earlier versions. And upgrading is simple enough. For instance, I was working on Red Hat with an older PowerAI version, so I followed this information to upgrade it.

Taking those steps, in that order, we were able to start working with the latest version of PowerAI.
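
If you're attempting something similar, the yum-based flow looked roughly like this at the time (a hedged sketch; the repository RPM and meta-package names reflect the PowerAI 1.5-era packaging and may differ for your release):

# install the local repository RPM that ships with the newer PowerAI release (name illustrative)
rpm -ihv mldl-repo-local-<version>.ppc64le.rpm
# refresh metadata and pull in the updated frameworks (meta-package name illustrative)
yum clean all
yum install power-mldl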

Automation and AI at Work

Edit: This is still pretty interesting

Originally posted May 21, 2019 on AIXchange

There’s an old saying about there being no free lunches. (Kids, ask your grandparents.) But in the age of AI, apparently that’s no longer the case. Check out this entertaining story about a savvy techie who used AI and Instagram to automatically post information and receive free meals:

I’m going to explain to you how I’m receiving these free meals from some of the best restaurants in New York City. I’ll admit - it’s rather technical and not everyone can reproduce my methodology. You’ll either need a background in Data Science/Software Development or a lot of free time on your hands. Since I have the prior, I sit back and let my code do the work for me. Oh, and you guessed it, you’ll need to know how to use Instagram as well….

Some of this may seem like common sense, but when you’re automating a system to act like a human, details are important. The process can be broken down into three phases: content sharing, growth hacking, and sales & promotion….

So how did he do it?

… I needed to create an algorithm that can weed out the bad from the good. The first part of my “cleaner” has some hard-coded rules and the second is a machine learning model that refines the content even further.

I played around with a number of classification algorithms such as Support Vector Machines and Random Forests but landed on a basic Logistic Regression. I did this for a few reasons, first being Occam’s Razor - sometimes the simplest answer is the right one. …

I wrote a Python script that randomly grabs one of these pictures and auto-generates a caption after the scraping and cleaning process is completed. Using the Instagram API, I was able to write code that does the actual posting for me. I scheduled a cron job to run around 8:00 AM, 2:00 PM, and 7:30 PM every day.

At this point, I have a complete self-sustaining robotic Instagram. My NYC page, on its own, is finding relevant content, weeding out bad potential posts, generating credits and a caption, and posting throughout the day. In addition, from 7:00 AM to 10:00 PM, it is growing its presence by automatically liking, following, and unfollowing with an intrigued audience which has been further redefined by some data science algorithms.

And in conclusion:

Due to the power of AI, automation, and data science - I am able to sit back and relax while my code does the work for me. It acts as a source of entertainment while at the same time being my salesman.

I hope this helps inspire some creativity when it comes to social media. Anyone can use these methods whether they are technical enough to automate or if they need to do it by hand. Instagram is a powerful tool and can be used for a variety of business benefits.

I’ve skipped most of the details, so by all means, read the whole thing. Crazy as it sounds, it’s a fantastic example of what can be accomplished with machine learning and artificial intelligence.

The lshwres Command and Hardware Discovery

Edit: Some links no longer work

Originally posted May 14, 2019 on AIXchange

Recently, a friend was trying to get the lshwres command to work in his environment. 

I’ve previously written about using the HMC command line to get information from managed machines. It’s a terrific use of the HMC, especially if you’re working with new machines and your OS isn’t loaded yet. Even in established environments, the HMC command line makes everything easier. Why bother with logging into multiple VIO servers or LPARs to get information? 

In my friend’s case, he was running a loop. First he used lssyscfg to get the system names. Then he fed those names into the lshwres command. It was simple enough; he wanted to collect WWNs for his SAN team so they could get them zoned appropriately:

lshwres -r virtualio -m <machine name> --rsubtype fc --level lpar -F lpar_name,wwpns
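
The loop itself can be as simple as this (a hedged sketch run from the HMC command line; adjust the -F fields as needed):

for sys in $(lssyscfg -r sys -F name); do
  echo "== $sys =="
  lshwres -r virtualio --rsubtype fc --level lpar -m "$sys" -F lpar_name,wwpns
done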

However, for some reason, only the two new E950 machines were providing the expected output. Meanwhile, nothing was happening with the two new S924 machines. “No results were found” was the only response from them. 

A web search on the lshwres command returned this information from IBM developerWorks:

If the lshwres command displays the message:

No results were found

then hardware discovery has not yet been done. The managed server must be powered off and back on with the hardware discovery option. To power on a managed server with hardware discovery from the HMC command line, use the HMC command chsysstate -m xxx -r sys -o onhwdisc, where xxx is the same as above. Or choose the Hardware Discovery option in the HMC Power On window, assuming the Hardware Discovery option is offered by the HMC GUI.

Please note that when powered on with the hardware discovery option, a server takes longer to get to partition standby mode than without the option. And once in partition standby mode, a temporary LPAR for hardware discovery is created and started, which takes additional time.

This IBM Support Center document has more about hardware discovery.

Sure enough, once we powered off the machines and restarted the frames with the onhwdisc option, the S924s also gave the expected output.

Setting the Record Straight on Power Systems

Edit: Still good information

Originally posted May 7, 2019 on AIXchange

There’s an IBM-produced blog about Power Systems servers that’s worth bookmarking. This introduction explains it well:

Here are some of the most common myths we hear in the marketplace today:

  • Power has no cloud strategy.
  • Migrating to Power is costly, painful and risky.
  • x/86 is the de-facto industry standard and Power will soon be obsolete.
  • Power solutions are more expensive than x/86 solutions.
  • Linux on Power operates differently and is managed differently than Linux on x/86.
  • x/86 is the best platform to run SAP HANA, Nutanix and open source databases like Mongo DB, MariaDB and EnterpriseDB.
  • Reliability, availability and serviceability (RAS) features are no longer a differentiator because every platform is the same.
  • Oracle software runs better on Exa and/or Sparc systems than it does on Power.
  • Power is a closed, proprietary architecture.
  • The OpenPOWER Foundation is weak and not really important to anyone in the industry.

There are regular updates. In particular, I loved this post from March that addresses the perception that x86 is the industry standard:

To begin breaking down this myth, let’s consider how IBM Power Systems stands apart from x86.

Designed for enterprise workloads. x86 is designed to accommodate multiple markets and design points, from smartphones to laptops, PCs and servers. Power Systems, on the other hand, is designed for high-performance, enterprise workloads like data analytics, artificial intelligence and cloud-native apps and microservices—workloads that are driving innovation and digital transformation in organizations today.

Targeting new market segments. Over the years, x86 vendors shipped a lot of systems into commodity markets, but there have always been market segments it couldn’t get because of the limitations of its general-purpose architecture.

Today, a growing number of market segments where just a few years ago x86 was the only solution available, are facing strong competition from Power Systems. Consider the number of clients who bought x86-based solutions for SAP HANA, Nutanix and open source databases like MongoDB, EDB PostgreSQL and Redis, to name a few. They didn’t buy x86 solutions because they were the best choice; they bought them because they were the only choice. SAP HANA is an excellent example. 2,500-plus clients now run this application on Power Systems instead of x86.

These applications, plus the rising demand for data analytics, HPC infrastructure and cognitive solutions like AI, machine learning and deep learning, may be the most cogent examples of market segments x86 is struggling to keep.

On the forefront of high-performance computing. In addition, two of the world’s most powerful supercomputers are running IBM POWER9: the US Department of Energy’s Summit and Sierra at Oak Ridge National Laboratory in Tennessee and Lawrence Livermore National Laboratory in California.

Growing revenue. Far from being pushed out of the market, IBM Power Systems has enjoyed five consecutive quarters of growth driven by client adoption of the latest generation of Power processors, IBM POWER9.

As I said, there’s much more. I cite this information because, in our world, perception is often reality. As users of IBM solutions, we need to be doing our part to help educate those around us about the real-world value of Power Systems.

Text from AIX

Edit: Do you do anything similar?

Originally posted April 30, 2019 on AIXchange

As I’ve noted previously, I regularly visit the AIX Forum. Generally there’s good discussion, and occasionally an interesting question is raised. For instance, about a month ago a forum member asked about sending texts from AIX.

The first reply noted that the curl command can be used for this:

curl http://textbelt.com/text -d number=123456789 -d "message=hello from me"

Alternatively, you could email your provider (assuming they have an SMS gateway). For instance, in the U.S., Verizon allows you to email the 10-digit mobile number followed by @vtext.com. Messages are limited to 160 characters, including the subject line and recipient’s email address. To include an attachment, enter the recipient’s 10-digit number followed by @vzwpix.com.

AT&T offers something similar. Information about other carriers, along with browser add-ins, can be found here.

So why text through AIX in the first place? Administrators often do this as part of a notification process. If there’s an error in the system, the admin receives an automated text message. Or maybe you want to know when a job completes. 

Maybe you want to set up some reminders in your crontab. Although this example runs on Linux, something similar can be set up on AIX. 
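
For instance, here's a minimal sketch of an AIX crontab entry that texts a Verizon number when a nightly job finishes (the script path, phone number and message are illustrative):

# text the result of the 1 a.m. backup via the carrier's SMS gateway
0 1 * * * /usr/local/bin/nightly_backup.sh && echo "backup complete" | mail -s "AIX job" 5551234567@vtext.com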

Of course you must weigh the benefits of getting notified via SMS vs. an email notification, but it’s always nice to have options. 

Updating Old Firmware with the OpenBMC Tool

Edit: Have you run into this?

Originally posted April 23, 2019 on AIXchange

I was recently upgrading the firmware on an AC922 server when I realized that the firmware was old enough that no GUI was available for the task. 

Now, for those of you dealing with more recent releases, firmware can be updated using the OpenBMC GUI, which is explained on page 9 of this PDF. Simply point your web browser at the BMC’s IP address and you’re set.

In my case, I needed the OpenBMC tool. Learn the basic commands and functionality here; download the tool here. Page 3 of the aforementioned PDF outlines the procedure for updating your firmware. 

I was running this command (where bmc or pnor is the type of image being flashed to the system):

openbmctool -U <username> -P <password> -H <BMC IP address or BMC host name> firmware flash <bmc or pnor> -f xxx.tar

My problem was that it kept failing while copying the .tar file from my machine to the BMC. Fortunately, this alternative method allowed me to update the firmware: I’d scp the files over to the BMC’s /tmp/images directory, where they were automatically decompressed and made available for use.
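
In practice that step looked something like this (a hedged sketch; the image file name and BMC address are placeholders):

# copy the firmware image to the BMC's staging directory; the BMC unpacks it automatically
scp <firmware_image>.tar root@<BMC IP address>:/tmp/images/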

From there I was able to use the curl commands referenced in the GitHub link above and consult the REST-cheat sheet. 

One tricky issue I ran into with curl is that it stores cookies on your machine. After running the commands a few times, they stopped working due to stale cookies. So I had to delete the cjar file in my directory and log back in to get the updates to work. 

Once I got the hang of the scp/curl method and my system was updated, the latest version of firmware got my GUI working. So it’s possible I won’t need to be doing these updates manually going forward. Nonetheless, I wanted to share this information so you’d at least have a starting point should you run into these issues when updating the firmware on your machines.

Machine Learning on AIX

Edit: Have you had a chance to try this?

Originally posted April 16, 2019 on AIXchange

If you believe that machine learning is strictly for Linux, check out this IBM tutorial on installing and configuring Python machine learning packages on AIX:

Machine learning is a branch of artificial intelligence that helps enterprises to discover hidden insights from large amounts of data and run predictions. Machine learning algorithms are written by data scientists to understand data trends and provide predictions beyond simple analysis. Python is a popular programming language that is used extensively to write machine learning algorithms due to its simplicity and applicability. Many packages are written in Python that can help data scientists to perform data analysis, data visualization, data preprocessing, feature extraction, model building, training, evaluation, and model deployment of machine learning algorithms.

This tutorial describes the installation and configuration of Python-based ecosystem of machine learning packages on IBM AIX. AIX users can use these packages to efficiently perform data mining, data analysis, scientific computing, data plotting, and other machine learning tasks. Some of these Python machine learning packages are NumPy, Pandas, Scikit-learn, SciPy, and Matplotlib.

Because all these packages are Python based, the latest version of Python needs to be installed on the AIX system. YUM can be used to install Python on AIX, or it can be installed directly from the AIX Toolbox. This tutorial talks about Python 3, but the same should work for Python 2 as well. You need python3-3.7.1-1 or a later version of Python from the AIX Toolbox to run these machine learning packages.

In this tutorial, we use a Python package management tool called pip to install these machine learning packages on AIX. These packages are compiled as part of pip installation because binary versions of these packages for AIX are not available on the Python Package Index (PyPI) repository.
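
The bare-bones flow from the tutorial looks roughly like this (a hedged sketch, assuming YUM and the AIX Toolbox repository are already configured; package names can vary slightly by Toolbox level, and additional build dependencies may be needed since pip compiles these packages on AIX):

yum install python3 python3-devel gcc
python3 -m pip install --upgrade pip
python3 -m pip install numpy pandas scipy scikit-learn matplotlib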

You’ll also find detailed instructions for installing on your system. In addition, there are several related tutorials covering topics like the Scientific Computing Tools for Python, NumPy, Scikit Learn, Project Jupyter, and YUM on AIX.

An Introduction to Problem Analysis

Edit: More good information when you do not know where to start.

Originally posted April 9, 2019 on AIXchange

The IBM Knowledge Center has a number of documents on problem analysis for AIX, Linux and more. While this information may seem basic to anyone who’s spent years dealing with these types of issues, junior admins should spend some time with beginning problem analysis, and all of the other info linked in this document.

The doc on AIX and Linux Problem Analysis starts with these tips:

Remember the following points while troubleshooting problems:

  • Has an external power outage or momentary power loss occurred?
  • Has the hardware configuration changed?
  • Has system software been added?
  • Have any new programs or program updates (including PTFs) been installed recently?

There are also tutorials on IBM i problem analysis and light path diagnostics on Power Systems.

Finally, there’s a problem reporting form.  

Credit to Kiran Tripathi on Twitter, who pointed me toward these docs.

Choosing the Proper Level for Managing VIOS with NIM

Edit: These pages are still worth bookmarking.

Originally posted April 2, 2019 on AIXchange

Here’s an IBM document on VIO server to NIM mapping (courtesy of Chris Gibson on Twitter). 

The chart shows you which levels are needed for your NIM master to manage your VIO servers. Particularly since the update to VIOS 3.1, it’s critical that your NIM master is at the correct level. 
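
As a quick sanity check before kicking off any NIM operations, compare the levels on both sides (a minimal sketch):

# on the NIM master
oslevel -s
# on each VIO server, as padmin
ioslevel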

While I’m talking about IBM documents, there are a number of other IBM support pages worth bookmarking.

Just poking around on IBM support websites can be productive. You never know what information you might uncover that will help you somewhere down the line.

Comments on the Changing UNIX Landscape

Edit: It is still a significant slice of the POWER business.

Originally posted March 26, 2019 on AIXchange

I was quoted in this recent NetworkWorld article on the slow decline of UNIX. 

You’ll have to register to read the whole thing, but I want to hit some highlights:

Most of what remains on Unix today are customized, mission-critical workloads in fields such as financial services and healthcare. Because those apps are expensive and risky to migrate or rewrite, Bowers expects a long-tail decline in Unix that might last 20 years. “As a viable operating system, it’s got at least 10 years because there’s this long tail. Even 20 years from now, people will still want to run it,” he says.

The gist of the article is that IBM is the last company standing in the UNIX space. That sounds pretty dire, but the tone changes when IBM executive Steve Sibley is quoted. He notes that 10 years from now the company will continue to have a substantial number of AIX clients, the majority of which will be Fortune 500 clients. Here’s the part where I come in:

“No one buys a platform for the platform,” McNelly says. “They buy an application. As long as application support remains for some key platforms, it’s hard to beat the value of AIX on [IBM Power Systems]. Many times after companies do some analysis, [and consider] the current stability and the migration effort, [it] makes no sense to move out of something that’s perfectly functional and supported and has a strong roadmap into the future.”

To elaborate, the beauty of IBM Power Systems hardware is that it’s positioned to run whatever application and operating system you want to run: AIX, IBM i or Linux. As stated in the article, these large, powerful systems are designed for uptime and resiliency, but this focus does not come at the expense of enabling smaller nodes intended to run Nutanix or smaller IBM/OpenPOWER servers. IBM runs the world’s fastest supercomputers while still providing large enterprise systems with capabilities, like capacity on demand and virtualization, that competitors cannot match.

A Look at AIX and Cloud

Edit: Some links no longer work.

Originally posted March 19, 2019 on AIXchange

I’m quite late to this, but if you haven’t caught Petra Bührer’s Power Systems Virtual User Group presentation, “Enterprise Cloud Bundle and AIX Enterprise Edition,” check it out. (Download the slides and watch the video.) A couple highlights from the Jan. 31 broadcast: 

  • In slide 26, Petra goes over the AIX roadmap and explains why IBM is sticking with 7.2 as the current AIX version. (Spoiler: Now that new functionality is made available through technology levels and service packs, there’s no technologically driven need to update the AIX version number at this time.) She also notes upcoming AIX end of service/end of life dates: April 2019 for AIX 5.3 and April 2020 for AIX 6.1. In addition, POWER5, POWER6 and POWER7 hardware end of service arrives in 2019.
  • In slide 27, Petra discusses the roles of AIX and Red Hat going forward.

Petra also has her own perspective on AIX’s long-term viability (a favorite discussion topic for yours truly: here, here and here). Her July 2018 IBM Systems Magazine article is also worth your time. As I’ve often said, the Power Systems VUG does a great job of providing detailed information on an array of topics. If you can’t catch these presentations live, you can always go back and dig into the replays.

Logging in NPIV from the HMC

Edit: More than one option is always useful.

Originally posted March 12, 2019 on AIXchange

Back in 2013 I wrote about using the chnportlogin and lsnportlogin commands to display and change N_Port ID virtualization (NPIV) mappings. This same operation can be accomplished using the HMC, which came in handy for me when I was recently asked how to get secondary World Wide Names (WWNs) logged in to use for live partition mobility:

A login operation may need to be initiated to facilitate SAN Administrators zoning of new virtual WWPNs (vWWPN), including all inactive WWPNs (2nd WWPN in the pair), which are used in Partition Mobility environments.

When performing a login operation, all inactive WWPNs will be activated, including the second WWPN in the pair assigned to each virtual Fibre Channel client adapter. When performing a logout operation, all WWPNs not in use will be deactivated.

To successfully log in a virtual Fibre Channel client adapter, the corresponding virtual Fibre Channel server adapter must exist and it must be mapped.

The primary intent of the login operation is to allow the system administrator to allocate, log in and zone WWPNs before the client partition is activated. With best practices, the WWPNs should be logged out after they are zoned on the Storage Area Network (SAN) and before the partition is activated. If a partition is activated with WWPNs still logged in, the WWPNs used for client access are automatically logged out so they can be logged in by the client.

The login operation can also be used to zone the inactive WWPNs in preparation for a partition mobility operation. If the login operation is performed when a partition is already active, only the inactive WWPNs are activated to the “constant login” state similar to physical Fibre Channel adapters. The WWPNs that are already in use by the virtual Fibre Channel client adapters remain in control of the virtual Fibre Channel clients and are not under the control of this command. This means that active client virtual Fibre Channel WWPNs do not achieve a “constant login” state similar to physical Fibre Channel adapters.

The login operation can interfere with partition mobility operations. Best practice is to perform a logout operation for a partition before attempting to migrate the partition to another server. If a mobility operation is attempted with WWPNs still logged in, the firmware will attempt to automatically log out the WWPNs. However, in some cases, the logouts may not complete in time and may therefore cause the mobility operation to fail.
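
From the HMC command line, the equivalent operations look roughly like this (a hedged sketch; the managed system and partition names are placeholders):

# log in all inactive WWPNs for the partition so the SAN team can zone them
chnportlogin -o login -m <managed system> -p <lpar name>
# check the login state of each WWPN
lsnportlogin -m <managed system> --filter lpar_names=<lpar name>
# log the WWPNs back out once zoning is complete, before activating or migrating the partition
chnportlogin -o logout -m <managed system> -p <lpar name>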

This IBM Support doc was last modified in December 2016, so the screen shots depict the classic HMC interface. Nonetheless, it’s a good starting point should you need to manually log in ports.

Asking the Right Questions

Edit: It always makes sense to think before you speak.

Originally posted March 5, 2019 on AIXchange

I love the approach expressed in this tweet

Brad Geesaman @bradgeesaman

Instead of asking “Why didn’t you just use X?” Ask: “Was solution X considered?” You’ll 9/10 times get a really good reason and 10/10 times not make yourself sound arrogant and accusatory.

Have you ever wondered how a particular implementation was ever approved? Why was this choice made instead of something simpler or easier? 

Short answer: Often it isn’t that simple. It’s important to understand that, even when the technical solution seems obvious to you, there may be political or other considerations in play behind the scenes that you know nothing about. 

It may seem simple to you: What do you mean you didn’t mirror that logical volume to begin with? What do you mean you never tested your backups before today? What do you mean you only gave 0.1 CPU to that VIO server? Why, as stated in the tweet, didn’t you just do X?

It’s important though to be open to other possibilities. Some answers may surprise you.  

Sometimes systems that are set up as test machines morph into production machines, and decisions that were perfectly fine for testing weren’t revisited. Obviously there could be skills gaps; those involved did the best they could with the information that they had at the time. Beyond that, requirements change; what once worked great will no longer cut it. Maybe sufficient resources are lacking, either in hardware or personnel, to implement requests. I’ve seen situations where technical employees are overruled and a non-IT decision maker dictates system configuration. 

There could be a hundred reasons why your “no-brainer” solution to this obvious problem wasn’t used. Part of our job is to understand and deal with the constraints that are in place. It’s not our place to simply chime in with some quick fix. Especially when you’re being brought into a new situation, make sure you take the time to really listen before making suggestions, and make sure your questions are the right ones. 

Remember, things change. A few years from now, someone may walk in and wonder about the solution you implemented: “Well, why didn’t he (you) think of this?”

Getting Started with AIX System Files

Edit: Hopefully this is just a review.

Originally posted February 26, 2019 on AIXchange

Awhile back Shivaprasad Nayak tweeted about AIX system files. 

Here’s a glimpse from the IBM Knowledge Center:

The files in this section are system files. These files are created and maintained by the operating system and are necessary for the system to perform its many functions. System files are used by many commands and subroutines to perform operations. These files can only be changed by a user with root authority.

There are three basic types of files: regular files, directories, and special files.

All file types recognized by the system fall into one of these categories. However, the operating system uses many variations of these basic types.

Regular files are the most common. When a word processing program is used to create a document, both the program and the document are contained in regular files.

Regular files contain either text or binary information. Text files are readable by the user. Binary files are readable by the computer. Binary files can be executable files that instruct the system to accomplish a job. Commands, shell scripts, and other programs are stored in executable files.

Directories contain information the system needs to access all types of files, but they do not contain the actual file data. As a result, directories occupy less space than a regular file and give the file-system structure flexibility and depth. Each directory entry represents either a file or subdirectory and contains the name of a file and the file’s i-node (index node reference) number. The i-node number represents the unique i-node that describes the location of the data associated with the file. Directories are created and controlled by a separate set of commands.

Special files define devices for the system or temporary files created by processes. There are three basic types of special files: FIFO (first-in, first-out), block, and character. FIFO files are also called pipes. Pipes are created by one process to temporarily allow communication with another process. These files cease to exist when the first process finishes. Block and character files define devices.
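
A quick way to see these types side by side from the command line (a minimal sketch; the paths are just common examples):

# the first character of the mode column shows the type:
# '-' regular, 'd' directory, 'c' character special, 'b' block special, 'p' FIFO
ls -ld /etc/hosts /tmp /dev/null /dev/hdisk0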

Scroll down and you’ll see a list of many files you should be familiar with.

Also check out the parent web page, “Files Reference”:

This topic collection contains sections on the system files, special files, header files, and directories that are provided with the operating system and optional program products. File formats required for certain files that are generated by the system or by an optional program are also presented in this topic collection.

This is all good information, so I wanted to pass it along.

A Lifetime Champion

Edit: I am still happy to be part of the community of Champions.

Originally posted February 19, 2019 on AIXchange

Last month, the IBM Champions program announced its honorees for 2019:

The IBM Champions program recognizes and rewards external experts and thought leaders for their work with IBM products and communities. The program supports advocates and experts across IBM in areas that include Blockchain, Cloud, Collaboration, Data & Analytics, Security, Storage, Power, Watson IoT, and IBM Z.

An IBM Champion is an IT professional, business leader, developer, or educator who influences and mentors others to help them innovate and transform digitally with IBM software, solutions, and services.

From the nominations, 635 IBM Champions were selected…. Among those are:

  • 65% renewing; 35% new Champions
  • 39 countries represented
  • 9 business areas, including Data & Analytics (31%), Cloud (22%), Collaboration Solutions (15%), Power Systems (9%), Storage (7%), IBM Z (6%), Watson IoT (4%), Blockchain (2%), and Security (3%)

As always, I’m happy to be a part of the IBM Champions community. It turns out though that there’s a bit more to the story. At last week’s IBM Think conference, I was one of eight new recipients of the IBM Champion Lifetime Achievement award (video here). 

It’s an incredible honor, and I only wish I could have been there in person. The IBM Champion Lifetime designation, “recognizes IBM Champions who stand above their peers for service to the community. Over multiple years, these IBM Champions consistently excel and positively impact the community. They lead by example, are passionate about sharing knowledge, and provide constructive feedback to IBM. The Lifetime Achievement award provides automatic re-nomination into the IBM Champion program for the duration of the program, plus other benefits.” 

Please allow me to reiterate a couple of familiar points: 1) it means a great deal to be recognized for my contributions and 2) without this blog and those of you who read it, I’m not sure this achievement would be possible. 

Yes, I also use Twitter (@robmcnelly) to help inform and educate AIX/IBM Power Systems users, but most of my time and energy is spent posting to this blog. I’m especially grateful to all who frequent AIXchange and share their insights. Thank you, again, for taking time out of your busy days to read what I write.

System Software Maps Provide Quick Answers on OS Support

Edit: I still look at these from time to time.

Originally posted February 12, 2019 on AIXchange

Some weeks back Nigel Griffiths tweeted something that seemed familiar. 

He noted that IBM Support has system software maps for AIX, IBM i, VIO server, SUSE, Red Hat and Ubuntu Linux. The maps allow you to quickly locate all the IBM Power Systems server models that support these operating systems. 

Seeing Nigel tweet about this reminded me that I wrote about the software maps in 2015.

These pages are regularly updated and of course now include the POWER9 servers. But as I noted then, you don’t need to be on the latest and greatest hardware to benefit from this information. For instance, the AIX maps extend all the way back to RS/6000 models. Being able to easily determine supported OS versions is especially helpful should you need to deploy workloads on older repurposed hardware. 

As Nigel says, this is a useful webpage, so bookmark it. Don’t wait for us to remind you about this great tool again. You may not need it on a daily basis, but you’ll want this information at your fingertips. 

PowerVC-Based Tool Rebalances Workloads

Edit: Some links no longer work.

Originally posted February 5, 2019 on AIXchange

I was recently asked if there’s a way to automatically rebalance AIX workloads on IBM Power Systems servers. There sure is. It’s called the PowerVC Dynamic Resource Optimizer:

The Dynamic Resource Optimizer (DRO) is a cutting-edge feature of PowerVC that brings an unprecedented level of automation to your Power cloud for PowerVM and PowerKVM hypervisors. When enabled, DRO monitors your compute nodes and automatically detects resource imbalances. Depending on its mode of operation, DRO will either advise or automatically perform actions to restore balance to your cloud. Using this technology allows cloud administrators to spend less time performing labor-intensive infrastructure monitoring tasks, and allows more time to focus their efforts on other critical business initiatives. Additionally, enterprises can achieve higher levels of ROI regarding their hardware as it can run increased workload densities. When workload spikes occur, DRO can quickly recognize the imbalance and rebalance the cloud before chaos unfolds.

Here’s more:

DRO can take two types of actions: virtual machine live migration, and mobile core activations via Power Enterprise Pools. The actions taken by the DRO depend on the options selected by users. Figure 2 is a screenshot of a host group being created where you can see options such as: “CPU utilization, stabilization, run interval, and maximum concurrent migrations.” You can choose to migrate virtual machines, activate mobile cores, or both. If you choose both and the host in need of attention is a member of an Enterprise Pool, the DRO first tries to activate one or more mobile cores. DRO tries to migrate a VM from a busy host to a less busy host.

Go to this page to read the whole thing (and view the screenshots). Learn even more by watching this demo. 

Many customers are unaware of DRO, but it’s easy to implement if you already have PowerVC running in your environment. PowerVC users should investigate this option.

There’s Even More to the POWER9 Story

Edit: Are they still the fastest in the world?

Originally posted January 29, 2019 on AIXchange

We all know that Summit and Sierra are the world’s fastest supercomputers, and that they run on POWER9 processors connected to NVIDIA GPUs. (The second half of this post goes into detail.) 

Here’s more from CNet:

The US now can claim the top two machines on a list of the 500 fastest supercomputers, as Sierra, an IBM machine for nuclear weapons research at Lawrence Livermore National Laboratory, edged out a Chinese system that last year was the very fastest.

The Top500 list ranks supercomputers based on how quickly they perform a mathematical calculation test called Linpack. The top machine, IBM’s Summit at Oak Ridge National Laboratory, had claimed the No. 1 spot in June with a speed of 122.3 quintillion mathematical operations per second, or 122.3 petaflops.

But an upgrade gave it a score of 143.5 petaflops on the newest list. To match that speed, each person on the planet would have to perform 19 million calculations per second. Sierra got an upgrade, too, boosting its performance from 71.6 petaflops to 94.6 petaflops and lifting it from third place to second.

Summit and Sierra are siblings, each using IBM POWER9 processors boosted by Nvidia Tesla V100 accelerator chips and connected with Mellanox high-speed Infiniband network connections. They’re gargantuan machines made of row after row of refrigerator-size computing cabinets. Summit has 2.4 million processor cores and Sierra has 1.6 million.

Supercomputers are used for tasks like virtual testing of nuclear weapons, aerodynamic modeling of aircraft, understanding the formation of the universe, researching cancer and forecasting climate change effects. They’re expensive but prestigious machines that can keep scientists and engineers at the vanguard of research.

But there’s still more to this story, and, not surprisingly, it speaks to the singular quality of IBM Power Systems hardware. Top500, the supercomputer ratings group referenced in the article, published some detailed data, which I’ve condensed to this simple table below. (Go here to see the original.)

Rank    System Cores    Rmax (TFlop/s)    Rpeak (TFlop/s)    Power (kW)
1       2,397,824       143,500.0         200,794.9          9,783
2       1,572,480       94,640.0          125,712.0          7,438
3       10,649,600      93,014.6          125,435.9          15,371
4       4,981,760       61,444.5          100,678.7          18,482

Note the stark differences in the number of system cores deployed, as well as the power consumption. The Power Systems machines require far fewer cores and consume roughly 50 percent as much energy. 

Less than half the cores, nearly half the power, and better results? These are some spectacular numbers. If you ever need to make an argument for Power Systems hardware running AIX, IBM i or Linux with the latest processors, this is tremendous ammunition.

A Red Hot Reddit Discussion of AIX & Linux

Edit: Are you ready to switch departments?

Originally posted January 22, 2019 on AIXchange

I’ve long maintained that AIX isn’t imperiled by the prevalence of Linux. Even so, it’s always great to encounter passionate defenses of our favorite operating system, and this thread from the AIX Reddit feed (r/aix) is chock full of them. 

Let’s start with the original post:

Not that I have something against AIX, but I don’t see many people using it. And coming here to this sub-reddit confirmed my fears. Linux sub-reddit has 3000 times more subscribers and it’s a very fast growing technology/community. I fear that AIX doesn’t have such a big future compared to Linux.

I’d prefer to move to a Linux department, where the real deal is.

Should I talk to my company or just go to that department until my internship is over and then decide what I should do?

The responses are, in a word, glorious. Here’s the first one:

Linux is a Kia. AIX is a Mercedes.

I’ve been a UNIX Sys Admin for over 20 years and have worked on AIX, Solaris, HP-UX, Tru-64, Dynix, Pyramid, NCR/MP-RAS, and Linux.

The AIX systems are the ones I’ve had the least problems from. They crash the least and are the most stable.

Another reply:

Linux is something you can learn at home. AIX is a skill you get on the job. Learning AIX is learning about Unix systems. Linux is a Unix-like system. Learn AIX, play with Linux at home and you’ll be better prepared.

Since this is an internship, I don’t think complaining is the way to go. AIX is very much alive. The user base isn’t on reddit.

And another:

The reason the linux subreddits are vastly more popular is because they aren’t an enterprise-tier OS with enterprise-tier support to match. When people run into issues in linux they turn to the open source community. When you run into issues with AIX, you call IBM because that’s what you pay them for.

Much of what you’ll learn on AIX (especially in an internship) will translate to linux and other unix systems. If you get into a position where you’re thrown into a linux environment after having 20 months experience on AIX, you’ll do fine. Spin up some linux VMs on your own time and replicate the things you do at work in AIX on your lab in linux. Best of both worlds….

And one more:

Depends on the environment. If it’s a place that has their stuff together reasonably well you could learn a ton. AIX is the Cadillac of UNIX’s right now. It runs on IBM hardware so there’s a deeper level of integration that is just so much easier to work with. Support is good once you get past the level one folks. But really it’s about the mainframe mentality. The guys who wrote AIX took the lessons from decades of other experience and put that into AIX. It really shows.

Linux is the Windows of UNIX. It’s great because it runs on anything and is easy to pickup. But once you start working on it daily you’ll see the warts.

All that being said, Linux is where most of the new jobs are at. AIX jobs are harder to find but more likely to be more rewarding if you’re into big systems.

This particular story has a happy ending, as the original poster returns with this small but significant edit:

Thanks everyone. You convinced me.

That’s just a sampling of comments. Read the whole thing.

Replicating Changes Across Multiple HMCs

Edit: Have you set this up?

Originally posted January 15, 2019 on AIXchange

For any environment with multiple HMCs, data replication generally makes sense. IBM’s Customizable Data Replication service can help you accomplish this, no matter how your HMCs are set up:

The Customizable Data Replication service provides the ability to configure a set of Hardware Management Consoles (HMCs) to automatically replicate any changes to certain types of data so that the configured set of HMCs automatically keep this data synchronized without manual intervention.

The following types of data can be configured:

Customer information data

  • Administrator information (customer name, address, telephone number, and so on.)
  • System information (administrator name, address, telephone of your system)
  • Account information (customer number, enterprise number, sales branch office, and so on.)

Group data

  • All user-defined group definitions

Modem configuration data

  • Configure modem for remote support

Outbound connectivity data

  • Configure local modem to RSF
  • Enable an internet connection
  • Configure to an external time source

The Customizable Data Replication service can be enabled for the following types of operations:

Peer-to-peer
Provides automatic replication of the selected customized data types between peer HMCs. Changes made on any of these consoles are replicated to the other consoles.

Master-to-slave
Provides automatic replication of the selected customized data types from one or more designated master HMCs to one or more designated slave HMCs. Changes made on a master console are automatically replicated to the slave console.

This document includes links that go into detail on setting up peer-to-peer and master-to-slave replication. This companion doc tells you how to force replication:

As data is replicated from one HMC to another, an internal level indicator for the data being replicated is recorded each time the data is altered on the data source. Learn about how to force the replication of data from one or more data sources.

Each HMC keeps track of the level indicator for each type of data and will not accept data from a data source when the level indicator is not greater than that on the receiving HMC.

Keep reading that doc to learn how to force the replication of data from one or more data sources when the level indicator on the receiving HMC is greater than that of the data sources.

If you’ve ever had to manage an environment with multiple HMCs and multiple HMC users, the benefits of the Customizable Data Replication service should be apparent. Changing a single HMC is far easier than attempting to propagate changes across a group of them.

More on VIOS 3.1

Edit: Did you upgrade yet?

Originally posted January 8, 2019 on AIXchange

A quick follow up on this post about VIOS 3.1. 

IBM Champion Stephen Diwell makes some interesting points based on his discoveries during hands-on testing. In this post, he mentions some important things to take note of during the install process, starting with the way paging devices are set up. He also says that resizing your filesystems is a really good idea, and suggests changing the password algorithm from the default. 

In a follow-up post, Steve discusses finding ssh host keys included with the VIO server mksysb image, and shows you how to remedy that issue by removing and regenerating the key. 
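
The fix Steve outlines amounts to replacing the shipped keys with freshly generated ones. Here's a hedged sketch of that cleanup (check his post for the exact steps before relying on this):

# as root on the VIOS (via oem_setup_env), remove the host keys that came with the mksysb image
rm -f /etc/ssh/ssh_host_*key*
# regenerate a new key pair (repeat for the other key types sshd is configured to use)
ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key -N ""
# restart sshd so the new keys take effect
stopsrc -s sshd; startsrc -s sshd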

In addition, IBM Systems Magazine has this VIOS 3.1 overview from Jaqui Lynch. 

VIOS is an important utility for many of us, so it’s important to know about the potential gotchas as we upgrade to the latest version. As I find more information, I’ll be sure to pass it on.

Staying Fit: An Ongoing Story of Peaks and Valleys

Edit: I am still keeping after it

Originally posted December 18, 2018 on AIXchange

How do you know if your friend is vegan and does Crossfit? Don’t worry, they’ll tell you.

Admittedly, I’ve kind of evolved into that guy. Back in 2016 I posted about my attempts at losing weight. Since the end of the year provides a window for many to get away from the office, and give some thought to New Year’s Resolutions, now seems like an appropriate time to share the latest.

As I wrote then, my awakening came at a physical. The doctor wouldn’t sign off on a medical form due to my obesity. My absolute highest weight was in spring 2013, but by that November I was down around 60 pounds. I felt great. There was nothing dramatic about it; I just changed my diet and exercised regularly.

I’d like to say that was the end of the story, but I’ve ping-ponged since. By the summer of 2015 I’d regained about 25 pounds. So I got focused again, and the weight was off by the next year. Had I learned my lesson? Of course not. By the end of 2017, those 25 pounds were back on.

As I said, my original motivation to change was my two sons being in Boy Scouts and my wanting to participate in that with them. One son is now in the U.S. Marine Corps, having gone to boot camp in August 2017. Between boot camp and his Military Occupational Specialty training, he was able to come home over the holidays. So on New Year’s Day 2018, we decided to take a little hike.

The venue was Picacho Peak, which is located midway between Phoenix and Tucson in Arizona:

Sunset Vista Trail: 3.1 miles; moderate first 2 miles, becoming difficult; Travels on the south side from the westernmost parking area and goes to the top of the peak. The first 2 miles are moderate, then the route becomes difficult, steep and twisting, with steel cables (gloves are recommended) anchored into the rock in places where the surface is bare. This trail is not recommended during hot weather seasons.

Fun fact: Picacho means “peak” in Spanish. So we basically hiked up Peak Peak. Speaking of peaks, that day I learned I was not in peak physical condition.

Since we’d made this same hike a few years earlier, I figured I’d be fine. I wasn’t. I ended up watching my fit, fresh from boot camp son easily reach the summit while I took a breather and took stock. I realized that, once again, I needed to do something about my fitness levels.

After about three months of dieting and weight-watching, I participated in a Super Sprint triathlon with some Scouts this past April. This consists of a 75-yard swim, a 6-mile bike ride, and a 1.6-mile run. With the memory of getting smoked by my son on that hike still fresh, I got through it without issue. In fact I felt good enough to sign up for a slightly tougher triathlon event in the fall: a 425-yard swim, a 15-mile bike ride and a 5K run.

After training over the spring and summer and doing that second triathlon, the weight has come back off, though there have been setbacks: an injury here, a junk food binge there (I refer to the latter as my “weekends of debauchery”). But overall, I feel really good. My blood pressure is down, my resting heart rate is low, and I’m always looking forward to the next backpacking trip or hike.

I’m no athlete. I don’t do these events for medals, and I surely am not a threat to win anything. My goal is simply to finish, and to do better the next time. Having these events on my calendar (the next one is in April) keeps me focused on fitness, and my rewards from that are many. I have greater endurance and my clothes fit better (even some things I’d gotten too big to wear). Losing weight, and buying properly fitted shoes, helped me overcome plantar fasciitis.

For me dieting comes down to controlling meal portions while paying attention to the mix of protein, fats and carbs. I eat lots of salads, stay away from mindless snacking and avoid desserts and other sweets. I hit the gym three times a week. I work out with their special, pricey equipment, and I attend trainer-led group classes that include activities I might not choose to do on my own.

Most days at home I’ll get in an intense hour of cardio on the treadmill. Varying the workouts while continually edging up the intensity allows my body to keep adjusting to the demands I put on it. And (of course) I apply some technology to the matter. My scale auto-magically connects to the cloud, allowing me to compile nearly seven years of data on my weight and body fat percentages. While there have been periods where I neglected doing the weigh-ins, checking the graphs and trends is nonetheless quite enlightening.

When I exercise, I track my heart rate, and when I run longer distances outside, I chart my pace. A tracker tells me the number of steps I take each day. Am I faster this time? Did I take as many steps today as I did yesterday? How many calories did I burn? I need to know. I need the numbers.

Of course there’s no one way to get in shape. I’ve seen others succeed with the Whole 30 diet, Nutrisystem, Weight Watchers, Medifast and Atkins. Just cutting back on carbs can help.

The trick is to find what works for you, and to find what motivates you. I’m motivated by checking my calendar and seeing that next event, and by seeing my metrics improve. Another motivator is having people who’ve not seen me in a while do a double take and ask if I’ve lost weight. (Why yes I have, thanks for asking.)

Finally, it motivates me just to talk about this. I realize that my story may not prompt anyone to take action, but sharing it helps me. It keeps me accountable. It makes everything real. It means if you see me stuffing my face at the next IBM Technical University, you are free and clear to give me a hard time about it.

Of course I do hope all of you do what you can to preserve and improve your health. If an old guy like me can do it, you can, too.

Note–the next blog post will be January 8, 2019.

Everything to Know about VIOS 3.1

Edit: Some links no longer work.

Originally posted December 11, 2018 on AIXchange

VIOS 3.1 is here, and now is the time to start planning your next move. Should you replace your hardware with new servers and do fresh VIOS installs on those machines, or would a gradual upgrade of your dual VIO servers make more sense? Get informed by digging into the numerous, valuable resources that have recently come out.

The time you invest in this material will be well spent.

Implementing the vHMC Requires Attention to Detail

Edit: Do you use a mix of hardware and virtual HMCs or are you all virtual?

Originally posted December 4, 2018 on AIXchange

If you’re planning to get rid of your physical appliances and run all of the HMCs in your environment as virtual machines, keep this in mind:

Originally the IBM POWER HMC was sold only as an integrated appliance that included the underlying hardware as well as the HMC firmware. IBM extended the POWER HMC offering to allow the purchase of the traditional hardware appliance (e.g. model 7042/7063) or a firmware only virtual HMC machine image. The virtual HMC (vHMC) offering allows clients to use their own hardware and server virtualization to host the IBM supplied HMC virtual appliance.

Support for vHMC
Since the hardware and server virtualization is supplied by the client to run the HMC virtual appliance, this infrastructure that actually hosts the HMC virtual appliance is not monitored by IBM. Serviceable events related to the vHMC firmware are monitored; however, “call-home” for these events is disabled. For further information see the document Callhome on HMC Serviceable Events is Disabled on vHMC.

The HMC virtual appliance continues to monitor the managed Power Systems hardware just like the HMC hardware appliance. Both HMC form factors provide remote notification and automatic call-home of serviceable events for the managed Power Systems servers.

Support for vHMC firmware, including how-to and usage, is handled by IBM software support similar to the hardware appliance. When contacting IBM support for vHMC issues specify “software support” (not hardware) and reference the vHMC product identification number (PID: 5765-HMV).

How-to, install, and configuration support for the underlying virtualization manager is not included in this offering. IBM has separate support offerings for most common hypervisors which can be purchased if desired.

That document also includes a brief Q&A. I’ll highlight the following, which often goes overlooked:

Q: Are there any restrictions related to on-site warranty support for managed servers?
A: Restrictions are similar to the hardware appliance
– You must supply a workstation or virtual console session located within 8 meters (25 feet) of the managed system. The workstation must have browser and command line access to the HMC. This setup allows service personnel access to the HMC.
– You should supply a method to transfer service related files (dumps, firmware, logs, etc) to and from the HMC and IBM service. If removable media is needed to perform a service action, you must configure the virtual media assignment through the virtualization manager or provide the media access and file transfer from another host that has network access to HMC.
– Power vHMC cannot manage (nor service) the server it is hosted on.

The big takeaway is that you shouldn’t assume IBM service reps will plug into your customer network to access your virtual HMC. If you need assistance, IBM expects you to provide a workstation that they can access. And yes, this can be problematic. Worst case: some sort of outage is affecting your VMware cluster while IBM Support needs to work on your POWER hardware. Then you might end up in a pickle.

Incidentally, this is one significant point in favor of the traditional HMC form factor. It takes up 2U in your rack and it exists solely to manage your machines. Nonetheless, people will continue to move away from hardware-based HMCs, so it’s important to understand the requirements. While I prefer keeping a hardware appliance available and using the vHMC as a backup, of course every environment is unique. Only you know what will work best for you.

GDR as a Disaster Recovery Option

Edit: Still something to consider.

Originally posted November 27, 2018 on AIXchange

A sound disaster recovery plan is one that’s regularly being updated. With this in mind, I want to cite this overview of Geographically Dispersed Resiliency (GDR), a DR option that is designed for efficiency.

The GDR solution provides a highly available environment by identifying a set of resources that are required for processing the virtual machines in a server during disaster situations.

The GDR solution uses the following subsystems:

    Controller system (KSYS)
    Site
    Host
    Virtual machines (VMs) or logical partitions (LPARs)
    Storage
    Network
    Hardware Management Console (HMC)
    Virtual I/O Server (VIOS)

IBM Support also has a comparison of PowerHA and GDR:

Disaster recovery of applications and services is a key component to provide continuous business services. The Geographically Dispersed Resiliency for Power Systems (GDR) solution is a disaster recovery solution that is easy to deploy and provides automated operations to recover the production site. The GDR solution is based on the Geographically Dispersed Parallel Sysplex (GDPS) offering concepts that optimizes the usage of resources. This solution does not require you to deploy the backup virtual machines (VMs) for disaster recovery. Thus, the GDR solution reduces the software license and administrative costs.

The following high availability (HA) and disaster recovery (DR) models are commonly used by customers:

    Cluster-based technology
    VM restart-based technology

Clustered HA and DR solutions typically deploy redundant hardware and software components to provide near real-time failover when one or more components fail. The VM restart-based HA and DR solution relies on an out-of-band monitoring and management component that restarts the virtual machines on other hardware when the host infrastructure fails. The GDR solution is based on the VM restart technology.

The following table identifies the differences between the conventional cluster-based disaster recovery model and the GDR solution:



A disaster recovery implementation that uses a set of scripts and manual processes at a site level might take more time to recover and restore the services. The GDR solution automates the operations to recover your production site. This solution provides an easy deployment model that uses a controller system (KSYS) to monitor the entire virtual machine (VM) environment. This solution also provides flexible failover policies and storage replication management.
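
To give a flavor of what working with the KSYS looks like, here’s a rough sketch using the ksysmgr CLI on the controller node (the site names are placeholders, and exact syntax varies by release, so check the GDR documentation rather than taking this verbatim):

Query the sites and the VMs under KSYS management:
ksysmgr query site
ksysmgr query vm

Verify the backup site and, when the time comes, run a planned site switch:
ksysmgr verify site Site_B
ksysmgr move site from=Site_A to=Site_B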

Finally, Michael Herrera has some great videos that cover concepts, software and advanced features.

As you design and update your DR solutions, be sure to consider GDR.

Exploring the Possibilities

Edit: How do you play with AIX?

Originally posted November 20, 2018 on AIXchange

There’s a lot you can do with AIX. But that doesn’t mean we won’t search for even more ways to play with it.

For instance, Chris Gibson recently got AIX running on a Macbook:

After reading this https://worthdoingbadly.com/aixqemu and this https://lists.gnu.org/archive/html/qemu-ppc/2018-05/msg00387.html, I was inspired and very curious. Could I get AIX 7.2 running on QEMU on my MacBook Pro (running Mac OS X 10.13.6)?

Well, the answer my friends, is yes…sort of.

Many thanks to Rob McNelly who originally tweeted this link, https://worthdoingbadly.com/aixqemu. If he had not, I would never have made the journey to QEMU land. So thanks Rob!

Also, thanks to Liang Guo for his assistance. Your guidance was greatly appreciated.

Note: What I describe here is NOT supported by IBM. It is purely a lab experiment to see what was possible with qemu-system-ppc64.
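
For the curious, the overall shape of the experiment is something like the following (a minimal sketch with made-up file names; the linked posts have the full working invocation and the extra console/firmware options it needs, and again, none of this is supported):

qemu-img create -f qcow2 hdisk0.qcow2 20G
qemu-system-ppc64 -M pseries -cpu POWER9 -m 4096 -serial stdio -display none \
    -cdrom aix_72_install.iso -drive file=hdisk0.qcow2,format=qcow2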

Then there’s this example of AIX 7.2 running on x86 hardware.  

Although those configurations would run too slowly for my taste, I’ve always loved the idea of having lab hardware to test/learn with. Of course IBM Power Systems servers typically run mission critical applications, so playing with the hardware available at work generally isn’t an option. (At the very least, you’d need test/disaster recovery/lab hardware; some workplaces have more options available than others.) Sure, some people buy old used servers and run them at home, but that’s not practical for everyone.

Nonetheless, it’s fun to follow what’s going on out there.

With OpenPOWER taking off, I’ve been tracking the new workstations that are available for running Linux on Power. One is from Raptor Computing Systems, the Talos II:

Talos II — the world’s first computing system to support the new PCIe 4.0 standard — also boasts substantial DDR4 memory, dual POWER9 CPUs, next-generation security, and a price that won’t break the bank. Let the power of Talos II accelerate your computing!

Offerings range from the secure workstation to the basic Talos II bundle.

The price point and specs caught my eye:

The Talos II mainboard is the first modern (post-2013), owner-controllable, workstation- and enterprise-class mainboard. Built around the brand-new IBM POWER9 processor, and leveraging Linux and OpenPOWER technology, Talos II allows you to secure your data without sacrificing performance. Designed with a fully owner-controlled CPU domain, you can audit and modify any portion of the open source firmware on the Talos II mainboard, all the way down to the CPU microcode. This is an unprecedented level of access for any modern workstation…

Getting AIX running in this type of modern environment would be amazing. Imagine being able to acquire some sort of student AIX license while having access to this kind of hardware in your home lab. You could run Linux on Power and AIX on POWER9 hardware that sits on your desktop. That sounds like… nirvana.

As these technologies evolve and the prices come down, my temptation level goes up. Do you know of other POWER9-based workstations or similar technology that’s on the horizon?

More Fun with AIX on a Nutanix Cluster

Edit: The cluster was fun to play with

Originally posted November 13, 2018 on AIXchange

I recently had another hands-on experience with a Nutanix Cluster.

This system consisted of four CS821 nodes. After previously doing an install with the cloud-ready deployment method, I wanted to try an .iso installation as well as installing from NIM. Those are the big three when it comes to installing AIX on Hyperconverged systems.

The first step is to create a VM. Nutanix has an image library that’s much like the virtual media repository on a VIO server in PowerVM. Populating this library with IBM-provided AIX .iso files turned out to be as simple as this:

  • I logged into Prism, opened “image configuration” and selected “upload image.”
  • I named the image (AIX_7200-03-01-1838_flash.iso was the latest available as of this writing), changed the image type to ISO.
  • Then I chose a storage container for the image and provided the image source.

That last one is a nice touch, by the way. Rather than download to your machine and then upload to the cluster and use that as your source, you can provide a URL and Nutanix will download the file directly from the source for you. I selected the correct .iso image from the IBM Entitled Software Support (ESS) site, and rather than using the download director, I selected the “click here to use http” option. This provided the link from IBM’s site to the .iso image to feed to Nutanix.

With my image on the server, I was ready to boot from it. At last check, these files were available from ESS:

  • ISO, AIX v7.2 Install DVD 1 (TL 7200-03-00-1837 9/2018)
  • ISO, AIX V7.2 Install DVD 2 (TL 7200-03-00-1837 9/2018)
  • ISO, AIX v7.2 Install flash (TL 7200-03-01-1838 9/2018)
  • GZ, AIX v7.2 Cloudrdy Virtual Mach Image TL 7200-03-01-1838, (9/2018)

Since DVD 1 is a space-saving .ZIP file, I initially downloaded that. It turns out, though, that the system can’t process .ZIPs, so I instead went with the install flash .iso image. Of course I could have downloaded DVD 1 to my workstation, done the unzip there and then uploaded it, but that would be self-defeating. The idea is to download directly from IBM.

To continue testing, I created a test virtual machine and gave it CPU and memory. Then when I got down to the disks, I selected the virtual CD, told it I wanted to clone from the image service, gave it my AIX v7.2 install flash .iso image, and clicked on update. I added an additional virtual disk to be my hdisk0 in AIX, added in a virtual NIC, and saved the profile.

At this point I powered on my VM and got two options for consoles: a VNC and a COM1. The VNC console allows you to interact with OpenFirmware; COM1 is a traditional serial console.

One thing I’ve yet to figure out is how to display LED codes in the VM table display in Prism. But that just gives me more to look forward to as I continue working with these clusters.

Anyway, my VNC console showed that the VM had booted, while my COM1 console was blank. I entered 1 and my console started to display LED codes. I soon got to my familiar screen where I was prompted to press 1 to install in English.

There was my normal base operating system install and maintenance screen where I could press 1 (to start install with default settings) or 2 (to change/show install settings and install). I entered 2, and wouldn’t you know, it couldn’t detect the Nutanix disk I’d assigned to install the OS.

Luckily support was aware of this issue and had a procedure ready. I needed to go back into the previous welcome to base operating system installation and maintenance screen and follow these instructions:

3 Start Maintenance Mode for System Recovery
3 Access Advanced Maintenance Functions
>>> 0 Enter the Limited Function Maintenance Shell
$ cfgmgr (errors are expected – many devices are not yet available to be configured)
$ exit
99 (Return to previous menu)
5 Select Storage Adapters
>>> 1 scsi0      qemu_vhost-user-scsi-pci:0000:00:02.0
2 Change/Show Installation Settings and Install
1 Disk(s) where you want to install ……
1 hdisk0    qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0
>>> 0  Continue with choices indicated above

After doing this, the disk I’d assigned to the VM appeared and I was able to install AIX to it as expected. Interestingly, I was getting LED codes to my console during the install, but otherwise everything looked the same as any other AIX install from .iso.

Once I got AIX installed, I went ahead and set it up as a NIM server, as I also wanted to test network boot. This too went as expected. The main difference came in how the client is booted from the NIM server. I followed these directions, and after I’d configured my NIM server and created a VM to attempt to boot from it, I powered it on and opened a VNC console. As found in the instructions, here’s the necessary syntax:

To boot the client from the network install manager (NIM) master at the OpenFirmware prompt, use the following command template:
0> boot <NIC-device>:<nim-server-ip>,<\path\to\client\bootfile>,<clientip>,<gateway-ip>

Further in the document, there’s an example:

The following commands boot the client VM from the network install manager (NIM) master at the OpenFirmware prompt:
0> boot net:9.3.94.78,\tftpboot\client-vm.ibm.com,9.3.94.217,9.3.94.1

This worked as expected, and I was able to boot over the network. Unless you have a flat network, I recommend having your NIM server on the Nutanix cluster you’re booting from. As the document states:

“If you are using a static IP address for the client virtual machine (VM), the client and server must be on the same subnet when booting a VM across the network. You cannot specify a subnet mask in the boot command as shown in the following example.”

I took a mksysb to my NIM server and installed a different VM from the mksysb image. Again, everything worked exactly as expected.
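
If you haven’t done it in a while, the master-side commands look roughly like this (resource, spot and machine names are made up, and the client is assumed to already be defined to NIM):

Define the mksysb image as a NIM resource:
nim -o define -t mksysb -a server=master -a location=/export/mksysb/testvm.mksysb testvm_mksysb

Kick off the install on the target client:
nim -o bos_inst -a source=mksysb -a mksysb=testvm_mksysb -a spot=spot_7200-03 -a accept_licenses=yes testvm2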

One small annoyance was that the COM1 consoles wouldn’t survive power off/power on of the virtual machine, although you could probably get around that by logging into a controller VM and opening a console that way.

As I learn more I’ll be sure to share it. Feel free to tell me about any Nutanix cluster specifics you’d like to read about.

Porting to POWER9

Edit: Have you tried doing the same thing?

Originally posted November 6, 2018 on AIXchange

Linux runs on everything from embedded devices to mainframes. So why should we care that Linux runs on IBM Power Systems servers? Many developers and users are perfectly happy running Linux applications on x86, partly because that’s the environment they know and partly because they’re simply unaware of the alternatives. In the case of Power Systems, some believe it can be difficult to move an application from x86 to POWER. Of course we know that this is unfounded, and in fact with the relatively recent change from the big endian to little endian format, moving to POWER has never been simpler.

With this in mind, Philippe Hermès recently tweeted this information from French consulting firm ANEO, which is porting some of its applications to POWER9 systems:

In partnership with IBM, ANEO has started porting some applications to IBM’s latest POWER9 systems. The Power architecture (and the POWER9 processor in particular) is optimized for high memory bandwidth and better performance for applications that require frequent data access.

Memory bandwidth is a technical feature that is not very emphasized in hardware specifications, yet it is often the main performance bottleneck in today’s applications.

One of the codes that has been ported to Power is SeWaS (Seismic Wave Simulator), a modern and optimized simulation software developed by ANEO.

The two goals of this study were to assess performance and difficulty of porting an application on Power.

DIFFICULTY OF PORTING:
The Power architecture uses a specific CPU instruction set, which requires recompiling applications and their dependencies. IBM claims, however, that “95% of Linux x86 applications written in C/C++ are ported on Power without having to change the source code.”

In our case with SeWaS the porting was surprisingly easy. We simply ran the exact same installation script that we usually run on Intel processors and everything worked as expected; the fact that it was being compiled for a different architecture was completely transparent.

In particular, IBM provides a free software suite named Advance Toolchain, containing most of the common HPC libraries optimized for Power (Boost…) as well as the GCC 7 compiler, which proved very convenient.

PERFORMANCE:
The benchmark was done on virtual machines provided by IBM with only 2 physical cores, which is not a very representative sample of performance. That said, performance measured on these 2 cores is very promising, and it is clear at least that the application was correctly optimized for the Power architecture, even though a very generic installation script was used.

OPTIMIZATIONS / FUTURE DEVELOPMENTS:
Aneo will be doing further benchmarks on Power in the future, especially on systems with POWER9 + NVidia GPUs, from which a much greater performance difference is to be expected (usually 5 or 10 times better performance compared to regular CPU machines).

One of the main advantages of POWER9 is enhanced support of accelerators (FPGA, NVidia GPUs), with technologies such as CAPI and NVlink for higher bandwidth, from which seismic applications benefit a lot.

With the memory bandwidth and performance improvements that can be expected from a simple recompile, developers should find it worth their time to at least investigate running their applications on POWER. But even if you’re not developing or recompiling anything at all, this is nonetheless a good reminder of how the enhanced Power Systems architecture benefits your own applications.
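
To make the “simple recompile” idea concrete: in most cases the very same build command works on both architectures, and if you want POWER9-specific tuning, the change amounts to a couple of compiler flags (an illustrative example with GCC 7 or later, not ANEO’s actual build):

g++ -O3 -o myapp main.cpp                                (the build you might run on x86)
g++ -O3 -mcpu=power9 -mtune=power9 -o myapp main.cpp     (the same build, tuned for POWER9)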

A Change to the SMT Mode Default in POWER9

Edit: Did you notice any issues with this change?

Originally posted October 30, 2018 on AIXchange

There’s a rather significant change with the default SMT mode in AIX 7.2 TL3 running on POWER9 servers:

“For POWER9 technology-based servers, the default SMT setting for AIX 7.2 TL 3 has been changed to SMT8 to provide the best out-of-the-box performance experience. For POWER8 technology-based servers, the default SMT setting remains SMT4.”

The first thing to understand is that this is a welcome change. IBM has found that running more threads benefits most POWER9 workloads. Naturally any system will perform differently at SMT-8 than SMT-4, so awareness is the key here. Administrators like to know what to expect from their operating system, and they can ill afford to be the last to know how the system is performing. You never want users alerting you to a change in performance.

Of course, if the old setting works best in your environment, you can adjust the SMT level post-upgrade by running the smtctl command:

Each individual Simultaneous Multi-threading (SMT) thread of a physical processor core is treated as an independent logical processor by AIX. The AIX operating system limits the combination of physical processor cores assigned and SMT modes in order to maintain symmetry across all of the physical processor cores assigned to AIX.

When booting a P8 Logical Partition (LPAR), the default number of SMT threads is 4. To increase the default number of SMT threads dynamically, enter:

    smtctl -m on
    smtctl -t 8

The change to SMT-8 is effective immediately and a reboot is not required. If you want the setting to persist after rebooting, then you must rebuild the boot image with the bosboot command. The default SMT-4 is intended to provide better performance for existing applications that are not designed or compiled for more than 4 threads.

While this information deals with POWER8 and upping the default, you get the idea.
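
For example, if you decide you’re better off staying at SMT4 on a POWER9 LPAR, the sequence is just as short (a sketch; double-check smtctl’s options at your AIX level):

smtctl -t 4     (switch to 4 threads per core)
bosboot -a      (rebuild the boot image so the setting survives a reboot)
smtctl          (run with no arguments to display the current SMT configuration)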

If you’re moving to POWER9 hardware and AIX 7.2 TL3 in the near future, be sure to keep this change in mind.

HMC Enhanced GUI: A Cautionary Tale

Edit: Be careful

Originally posted October 23, 2018 on AIXchange

Just in time for Halloween, here’s a scary story involving the HMC enhanced GUI version and an inexperienced user.

As I understand it, an administrator was using the enhanced GUI to mount an .iso image that was stored in the organization’s virtual media repository. The admin selected virtual storage. Then this individual selected a VIO server and clicked on Action/Manage Virtual Storage. This displays a window that says the VIO server is being queried. The window has multiple tabs, including virtual disks, storage pools, physical volumes, optical devices and virtual fibre channel.

At this point, the admin should have selected optical devices, which allows you to manage virtual optical media. Instead, the virtual fibre channel tab was selected; this brings up fcs devices. A device was chosen, and then the admin opted to modify partition connections. Now, if you’re following along in your own HMC, be careful. The default is that all assigned connections are checkmarked, and there’s a button that forces connection removal from running partitions. If you select that and click OK, all of the checked mappings are removed. It’s a dynamic LPAR operation and everything is wiped.

And that’s what happened. The admin for some reason ignored the warning message, and all of the NPIV mappings were removed from the VIO server. The adapter information was still in the saved profile, but the mappings were gone from the running profile. Fortunately this organization had dual VIO servers, so the client LPARs weren’t affected, but it was a chore to recreate all of the mappings on that particular VIO server. (Given the lack of a change window, rebooting the VIO server wasn’t an option.)

If you ever find yourself in this situation, you may be able to retrieve your mappings by shutting down the VIO server and restarting from the saved profile. But make sure you can rebuild the mappings from your virtual to physical adapters if necessary. Know which virtual adapters are mapped to which physical adapters, and keep the additional critical information that’s needed to recreate your environment. Know the corresponding WWN numbers. Hopefully you’re running hmcscanner regularly, and you should be backing up your VIO configs and VIO servers.
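
Capturing that information takes only a couple of minutes. From the padmin shell on each VIO server, something along these lines does the job (the backup file name is just an example):

lsmap -all -npiv                       (record virtual-to-physical Fibre Channel mappings and WWPNs)
lsmap -all                             (record the vSCSI mappings)
viosbr -backup -file vios_cfg_backup   (save a restorable copy of the virtual device configuration)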

There’s good logging on the new HMC code, which was helpful in this case. We were able to identify the user and the commands that were run.

In short, be careful. The enhanced GUI is still fairly new. Take the time to get used to it.

Restricting Access to the AIX Error Report

Edit: Have you found a use for this in your environment?

Originally posted October 16, 2018 on AIXchange

A while back on Twitter, Chris Gibson noted that, starting with AIX 7.2 TL3, administrators will be able to prevent non-privileged users from viewing the AIX error report.

IBM Support has the details:

The restriction can be enabled or disabled by the system administrator using “/usr/lib/errdemon -R enable” and “/usr/lib/errdemon -R disable.” By default the restriction is disabled.

When the restriction is disabled, any user can view the system error report.
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE84C4DB   0711092118 I O ConfigRM     IBM.ConfigRM daemon has started.
69350832   0711091818 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0711091918 T O errdemon       ERROR LOGGING TURNED ON

To enable the restriction
(0) root @ spruce1:/
# /usr/lib/errdemon -R enable

(0) root @ spruce1:/
# /usr/lib/errdemon -l

Error Log Attributes
——————————————–
Log File                 /var/adm/ras/errlog
Log Size                1048576 bytes
Memory Buffer Size      32768 bytes
Duplicate Removal       true
Duplicate Interval      10000 milliseconds
Duplicate Error Maximum 1000
PureScale Logging       off
PureScale Logstream     CentralizedRAS/Errlog
Restrict errpt to privileged users      enable

After the restriction is enabled, an error message is displayed if a non-authorized user tries to view the error report.

(0) testuser @ spruce1:/
# errpt
errpt:
        User does not has sufficient authorizations.

How do you enable a user to view the error report?
Make the user a privileged user by assigning the aix.ras.error.errpt authorization.

(0) root @ spruce1:/
# mkrole authorizations="aix.ras.error.errpt" role_errpt

(0) root @ spruce1:/
# chuser roles=role_errpt testuser

(0) root @ spruce1:/
# setkst
Successfully updated the Kernel Authorization Table.
Successfully updated the Kernel Role Table.
Successfully updated the Kernel Command Table.
Successfully updated the Kernel Device Table.
Successfully updated the Kernel Object Domain Table.
Successfully updated the Kernel Domains Table.
Successfully updated the Kernel RBAC log level.

Now the normal user “testuser” can execute errpt

(0) testuser @ spruce1:/
# swrole role_errpt
testuser’s Password:

(0) testuser @ spruce1:/
# errpt
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
DE84C4DB   0711092118 I O ConfigRM       IBM.ConfigRM daemon has started.
69350832   0711091818 T S SYSPROC        SYSTEM SHUTDOWN BY USER
9DBCFDEE   0711091918 T O errdemon       ERROR LOGGING TURNED ON

If this applies to your environment, be sure to add this to your build documentation, checklists and gold images once you’ve updated to AIX 7.2 TL3.

New Stencils Available for POWER9 Systems

Edit: Stencils are a must have for documentation

Originally posted October 9, 2018 on AIXchange

Hopefully you saw the news on Twitter, via Alan Fulton and Nicollette McFadden, among others.

Visio stencils are now available for selected IBM Power Systems POWER9 models, including the AC922 and S914.

For anyone who uses Visio, this is welcome news. Having the proper stencils makes it much easier to create diagrams and documentation. The stencils can be downloaded from here.

When you extract and open them in Visio, you’ll find 115 stencils. At the top of this link you can see the most recent updates:

24-Sep-2018
IBM-Common.zip
      IBM-Racks.vss – Added 7042-CR7, 7042-CR8, 7042-CR9 and 7063-CR1 HMC front and rear views
      IBM-SAN.vss – Added 2145-SV1 SVC front and rear views
IBM-Server-Power.zip
      IBM-Server-Power.vss – Added Power S821LC, S822LC (4 models), AC922 and S914 Tower front and rear views
IBM-Tape.zip
      IBM-Storage-Tape.vss – Added TS4300 Base and Expansion Library front and rear views and LTO drives
Stencil File updates include the renaming of many stencil files to remove “System” from the file name (Disk, Tape and Network).

In addition, archives can be found here:

27-Aug-2018
IBM-Server-Power.zip
   IBM-Server-Power.vss – Added Power E850C, E870C and E880C front and rear views

14-Aug-2018
IBM-Server-Power.zip
   IBM-Server-Power.vss – Added Power S914, H922, L922, S922, H924, S924, E950, E980 and EMX0 PCIe Gen3 Exp. front and rear views
IBM-Classic-Full.zip
   IBM-Server-Power-Classic.vss – *New File* – Moved all Power 5xx Server (Power6) shapes to this new file

Do you document your systems in Visio? Many in my circle do, and for these folks, it’s been a lengthy wait since the previous update. Everyone I’ve talked to is very happy about this development.

NIM Management via HTTP

Edit: Still a good option to consider

Originally posted October 2, 2018 on AIXchange

I love NIM. I rely on NIM when I’m doing new server builds, and it’s also my go-to for installing the VIO server.

Chances are you love NIM as well. That said, one thing you might not be fond of is firewalls between your NIM server and NIM clients, which requires you to work with your network team and ask for ports to be opened.

Here’s a breakdown of ports that need to be opened in a firewall for use with NIM:

Protocol       Port(s)
nimsh          3901 – 3902
icmp           5813
rsh*           513 – 1023**
rlogin*        513
shell*         514
bootp          67 – 68
tftp           69 and 32,768 – 65,535
nfs            2049
mountd         32,768 – 65,535 or user’s choice
portmapper     111
NIM            1058 – 1059

Again, in some environments, getting approval for such extensive access can be a challenge. Fortunately, a potential alternative exists. Read this IBM Knowledge Center doc to determine if using NIM over HTTP can work in your environment:

Network Installation Manager (NIM) supports the installation of AIX updates over the Hypertext Transfer Protocol (HTTP) to conform to the emerging data center policies that restrict the use of the Network File System (NFS).

AIX BOS installation still requires the use of the NFS version 3 protocol or the more secure NFS version 4 protocol. In addition to the installation of filesets, NIM customization processes such as script execution and copying the file_res directory are supported over the HTTP protocol.

The HTTP protocol provides the following advantages for NIM management:

  • All communication occurs over a single HTTP port. Hence, authorization through a firewall is easier to manage.
  • AIX installation steps are driven from the client’s end, that is, the target system of the installation. Therefore, remote access is not required for running the commands.
  • NIM or any other products that currently use the client-server model of NFS can easily use HTTP.
  • (The capability) to extend the end product to support additional protocols.

AIX 7.2.0 ships a new service handler that provides HTTP access to NIM resources. The nimhttp service is defined in the /etc/services file, and the nimhttp daemon listens for requests on port 4901. When the nimhttp service is active, NIM clients attempt to access the /etc/services file and request customization of the scripts that are defined in the nimhttp service. If HTTP access fails or is denied, the client fails over to NFS access.
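
A quick sanity check on an AIX 7.2 NIM master is to look for the service definition and the listener; nothing fancy, just standard commands:

grep nimhttp /etc/services     (shows the nimhttp service definition on port 4901)
netstat -an | grep 4901        (confirms the daemon is listening when the service is active)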

Were you aware of this option? Have you used it before?

What’s in Store for Memory

Edit: The more things change

Originally posted September 25, 2018 on AIXchange

In August there was an event called Hot Chips 30. A long-running conference for the semiconductor industry, Hot Chips is the place to learn about high-performance microprocessors and related topics like system memory. Here are a couple of interesting articles that came out of the conference that look at memory and where it’s headed in the near future.

This is from HPCwire:

Having launched both the scale-out and scale-up POWER9 [servers], IBM is now working on a third P9 variant with “advanced I/O,” featuring IBM’s 25 GT/s PowerAXON signaling technology with upgraded OpenCAPI and NVLink protocols, and a new open standard for buffered memory….

“The PowerAXON concept gives us a lot of flexibility,” said [IBM Power architect Jeff] Stuecheli. “One chip can be deployed to be a big SMP, it can be deployed to talk to lots of GPUs, it can talk to a mix of FPGAs and GPUs – that’s really our goal here is to build a processor that can then be customized toward these domain specific applications.”

The article concludes with this:

The roadmap shows the anticipated increase in memory bandwidth owing to the new memory system. Where the POWER9 SU chip offers 210 GB/s of memory bandwidth (and Stuecheli says it’s actually closer to 230 GB/s), the next POWER9 derivative chip, with the new memory technology, will be capable of deploying 350 GB/s per socket of bandwidth, according to Stuecheli.

“If you’re in HPC and disappointed in your bytes-per-flop ratio, that’s a pretty big improvement,” he said, adding “we’re taking what was essentially the POWER10 memory subsystem and implementing that in POWER9.” With Power10 bringing in DDR5, IBM expects to surpass 435 GB/s sustained memory bandwidth….

It’s an odd statement of direction, but maybe a visionary one, essentially saying a processor isn’t about computation per se, but rather it’s about feeding data to other computational elements.

This piece from top500.org says IBM is aiming to take memory in a new direction:

The memory buffering adds about 10ns of latency to memory accesses compared to a direct DDR hookup, but the tradeoff for more bandwidth and capacity is worth it for these extra-fat servers. And although the Centaur buffered memory implementation still uses DDR memory chips as the storage media, this no longer really needs to be the case since the DDR smarts have moved off the chip.

IBM plans to generalize this memory interface, which will be known as OpenCAPI memory, in their next version of the POWER9 processor that is scheduled to be launched in 2019. As far as we can tell, these upcoming POWER9 chips will be suitable for two-socket HPC servers, as well as mainstream systems. IBM is projecting that its next POWER9 chip will support over 350 GB/sec of memory bandwidth per socket, which is more than twice the speed of today’s fastest chips for two-socket servers. The company also intends to reduce the latency penalty to around 5ns in its first go-around.

Perhaps the bigger news here is that OpenCAPI memory will be proposed as an open standard for the entire industry. The use of the OpenCAPI brand is intentional, since IBM wants to do for memory devices, what the original OpenCAPI was designed to do for I/O devices, namely level the playing field. In this case, the idea is to enable any processor to talk to any type of memory via conventional SerDes links. As a result, CPUs, GPUs, or FPGAs would no longer need to be locked into DDR, GDDR, or any other type of memory technology. So, for example, a chip could use the interface to connect to traditional DDR-type DIMMs, storage-class memory based on NAND or 3D XPoint, or some other type of specialized memory.

Many times we are focused on what we can buy and deploy right now. But if you want to see where things are headed, read these and other articles from the conference at the Hot Chips site.

An Important Reminder about VIOS

Edit: Hopefully you have upgraded by now

Originally posted September 18, 2018 on AIXchange

Nigel Griffiths recently tweeted this reminder:

In Q4 2018 VIOS 2.2.4 will fall out of regular support
In Q4 only 2.2.5.*, 2.2.6.* and 3.1 will be supported
If on 2.2.4 or older upgrade to 2.2.6 NOW
I encourage every one to plan: Testing VIOS 3.1 in Q4 + upgrade in Q1
VIOS 3.1=has many good features + RAS

We have some work to do, ready or not.

Sometimes it’s easy to ignore your systems, because they just run. That is, they run until they stop running. At that point, you call support and learn that, while IBM fixed this issue ages ago, you can’t realize this benefit until you update your firmware, VIO server, AIX levels, etc.

Yes, our systems just run, but they still require maintenance. Hopefully you have a regular patch cycle and your systems are up to date.

But specifically with VIOS, it’s time to take action. Check this lifecycle information. What level are you on? Are you supported by IBM? Will you be supported after the fourth quarter of 2018? Start testing VIOS 3.1 as it becomes available (details are here and here) and plan for your change windows now. Remember to look at the maintenance strategy, and if you’re wondering which software versions you should be running, use FLRT.
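
Checking where you stand takes seconds; log in to each VIO server as padmin and run ioslevel (the output below is only an example):

$ ioslevel
2.2.6.32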

With the recent announcement of POWER9 systems, it’s critical that the software you run can take full advantage of the new hardware.

LPM Copy Time Stats

Edit: I am always looking for other perspectives

Originally posted September 11, 2018 on AIXchange

Newsflash: I’m not the only person out there writing about AIX and related topics. I also understand that this is a good thing, for all of us. I know I like reading about this stuff as much as I like writing about it. With that in mind, I recently discovered Stephen Diwell’s blog. Stephen is an IBM Champion who discusses AIX performance, the VIO server, the HMC, PowerHA and much more. He doesn’t post often, but there’s lots of useful technical content, including this on Live Partition Mobility (LPM) copy time stats:

VIO Servers have the statistics on LPM copy times and the amount of data copied. This is stored in an alog file that only root can read, given that padmin (and similar users) do not have access to the alog command.

Login to your VIO Server.

You need to be root:
oem_setup_env

Read the LPM alog file:
alog -of /home/ios/logs/LPM/lpm.log | more

Example Output:
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1575] Final migration statistics:
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1620] Overall time of migration: 3 minutes 43 seconds
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1641] Amount of data received: 161695465991 bytes
[0 12779544 04/22/18-19:44:50 migmover.c 1.92 1645] Effective network throughput: 725091775 bytes/s

Short. Sweet. Useful.

Also read about Power Systems firmware updates and tracking CPU usage. Here are some of his other posts:

  • IBM HMC Upgrades
  • Optimizing Power with Affinity Groups
  • AIX NIM Server Tuning
  • p7 vs p8 Real World Core Reductions
  • p7 Core cost from poor LPAR Affinity
  • AIX Dynamic LPAR Name Changes
  • PowerVM LPM with a Dead VIO Server
  • AIX or VIOS Errors: 29FA8C20 and 7BFEEA1F

Take the time to read them all. Are there other AIX resources you recommend?

A Discussion of Software Security

Edit: Some links no longer work

Originally posted September 4, 2018 on AIXchange

Containers or virtual machines–which provides greater security? IBM Research attempted to answer this question, as explained in this recent article:

Are virtual machines (VM) more secure than containers? You may think you know the answer, but IBM Research has found containers can be as secure, or more secure, than VMs.

James Bottomley, an IBM Research Distinguished Engineer and top Linux kernel developer, writes:

“One of the biggest problems with the current debate about Container vs Hypervisor security is that no one has actually developed a way of measuring security, so the debate is all in qualitative terms (hypervisors ‘feel’ more secure than containers because of the interface breadth) but no one actually has done a quantitative comparison.”

To meet this need, Bottomley created Horizontal Attack Profile (HAP), designed to describe system security in a way that it can be objectively measured. Bottomley has discovered that “a Docker container with a well crafted seccomp profile (which blocks unexpected system calls) provides roughly equivalent security to a hypervisor.”

He performed these tests with Docker, Google’s gVisor, a container runtime sandbox; gVisor-kvm, the same container sandbox using the KVM, Linux’s built-in VM hypervisor; Kata Containers, an open-source lightweight VM; and Nabla, IBM’s just released container type, which is designed for strong server isolation.

Bottomley’s work is only the start. He’s shown it’s possible to objectively measure an application’s security. As he said, “I don’t expect this will be the final word in the debate, but by describing how we did it I hope others can develop quantitative measurements as well.”

I would need more details, but the article makes it sound like this work was done on x86 with x86 hypervisors. I wonder if the results would be different if the containers ran in Linux on Power with the PowerVM hypervisor:

The PowerVM and Power hardware teams always put security at the center of our designs. Protection of client data is one of the key values of a PowerVM solution. If you ever wondered if your hardware or software are exposed to a security issue, the USA National Institute of Standards and Technology (NIST) maintains a searchable DB of all known vulnerabilities. Searching for PowerVM or PowerVM Hypervisor will display “There are 0 matching records.” This is because the PowerVM Hypervisor has yet to have a security vulnerability. Searching for other virtualization solutions will list all their known vulnerabilities which you should be sure to address to protect your confidential information. The following blog contains details about how PowerVM provides data isolation between partitions to maintain our perfect security record.

PowerVM takes advantage of the Power hardware to provide high levels of security. The hardware is designed with three different protection domains, Hypervisor domain, Kernel domain and application domain. The hardware limits the instructions that can be executed based on the current protection domain and the hardware provides very specific entry points to transition between domains. If a lower priority domain attempts to issue an instruction reserved for a higher priority domain, the instruction will generate an instruction interrupt within the current domain. The most privileged level is the hypervisor domain which is where the PowerVM security takes place. For example, instructions that change the mapping of partition addresses to physical real addresses, instructions that modify specific hardware registers are restricted such that they are only allowed in hypervisor mode.

The way the hardware has been designed, only the hypervisor is able to access memory via a physical real address. Code running in partitions accesses memory through a layer of indirection where the partitions addresses are actually aliases to the physical real memory. This support is not only leveraged for partition isolation but is leveraged by other virtualization functions on the server.

If we’re talking about IBM Power Systems servers, I would still argue that an LPAR is more secure. What do you think?

Other Options for Volume Group Backups

Edit: Still good to remember

Originally posted August 28, 2018 on AIXchange

How do you back up your volume groups? These days we’re often dealing with snapshots on a SAN, but there are still occasions when you want to back up to a local tape drive, a file, or a NIM server. The specifics depend upon the amount of data and the type of infrastructure we’re dealing with. (Obviously moving terabytes across a 100Mb network isn’t the best way to go.)

When backing up rootvg, you’d typically run mksysb. But how would you back up datavg? The best choice is to run savevg:

The savevg command finds and backs up all files belonging to a specified volume group. The volume group must be varied-on, and the file systems must be mounted. The savevg command uses the data file created by the mkvgdata command.

Note: The savevg command will not generate a bootable tape if the volume group is the root volume group. Although the tape is not bootable, the first three images on the tape are dummy replacements for the images normally found on a bootable tape. The actual system backup is the fourth image.

To restore from this backup, run restvg:

The restvg command restores the user volume group and all its containers and files, as specified in the /tmp/vgdata/vgname/vgname.data file (where vgname is the name of the volume group) contained within the backup image created by the savevg command.

The restvg command restores a user volume group. The bosinstall routine reinstalls the root volume group (rootvg). If the restvg command encounters a rootvg volume group in the backup image, the restvg command exits with an error.

If a yes value has been specified in the EXACT_FIT field of the logical_volume_policy stanza of the /tmp/vgdata/vgname/vgname.data file, the restvg command uses the map files to preserve the placement of the physical partitions for each logical volume. The target disks must be of the same size or larger than the source disks specified in the source_disk_data stanzas of the vgname.data file.
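
Putting the two together, a minimal sketch looks like this (file and disk names are illustrative):

savevg -i -f /backup/datavg.savevg datavg     (the -i flag rebuilds the vgname.data file via mkvgdata first)
restvg -f /backup/datavg.savevg hdisk2        (restore the volume group onto the target disk)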

Another option is backing up just your volume group structure, which can be used to recreate volume groups and filesystems. This is hugely beneficial if you’re cloning a system to new disks on a new frame where you want the exact same volume groups and filesystems. Simply use the savevgstruct and restorevgstruct backup commands. On the latter:

The restorevgstruct command restores the structure of a previously saved user volume group. If the -ls flag is specified, a list of previously saved volume groups and the date each volume group was saved is displayed. This command does not work on rootvg.
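
The structure-only commands are even simpler. From the VIOS padmin shell, for example (the volume group label is illustrative):

savevgstruct datavg           (save the volume group and file system structure)
restorevgstruct -ls           (list previously saved structures)
restorevgstruct -vg datavg    (recreate the structure, optionally onto named target disks)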

I understand that these concepts are familiar to many of you, but I’m regularly questioned about these things, so I believe it’s a worthy discussion. Hopefully this provides some clarity.

Vulnerability Checker Provides Security Info

Edit: This is still a useful tool.

Originally posted August 21, 2018 on AIXchange

The FLRT Vulnerability Checker Online (FLRTVC) allows you to check your AIX system for HIPER and Security vulnerabilities:

The Fix Level Recommendation Tool Vulnerability Checker (FLRTVC) online provides security and HIPER (High Impact PERvasive) reports based on the fileset inventory of your system. The report will guide you in discovering vulnerable filesets, the affected versions, interim fixes that are installed, as well as a link to the security bulletin for further action.

FLRTVC exists as a standalone ksh script which may be downloaded from our FLRTVC Script webpage. FLRTVC uses HIPER/Security data from FLRT (aparCSV) to compare against the installed filesets (lslpp -Lcq) and interim fixes (emgr -lv3) to report your risks.

This webpage was developed based on feedback received from customers at Edge2015. We welcome your feedback on this tool and ways to improve it! Please use the Feedback button on the FLRT page or visit the FLRT IBM developerWorks Community. Follow us on Twitter @IBM_FLRT for updates!

Follow the instructions to get started:

FLRTVC Online will accept two input files, lslpp.txt (required) and emgr.txt (optional), that will be cross-examined with the aparCSV that is provided through our website. If any filesets listed in lslpp.txt are found to be within the affected versions listed in aparCSV, they will be displayed in the generated report.

Step 1) Log in to the AIX server that will be checked for vulnerabilities.
Step 2) Run the “lslpp” command: lslpp -Lcq > lslpp.txt
Step 3) (optional) Run the “emgr” command: sudo emgr -lv3 > emgr.txt
Step 4) Move the files to a machine that has an internet browser.
Step 5) Upload the file(s) using the buttons of their respective type.
Step 6) (optional) Filter the filesets using a search term.
Step 7) (optional) Select an APAR type.
Step 8) Click on “Run vulnerability checker” to begin.

If you’d prefer to not run the report interactively, one machine at a time, submitting each one via a web page (and I suspect this applies to most of you), just download the script:

The FLRTVC script works by downloading an apar.csv file from the FLRT website using CURL or WGET, whichever your machine has installed. Then, it uses the commands “emgr -lv3” for interim fixes and “lslpp -Lcq” for installed filesets, and compares to the vulnerabilities reported in the apar.csv file. FLRTVC will report any findings using one of two formats: Compact and Full (verbose). Compact is preferable for scripting purposes, and full reporting is for a more human-readable format that may be piped to an e-mail address.

Please see below for the flags and different usages:

Flags
-d = Change delimiter for compact reporting.
-f = File selection for *.csv file.
-q = Quiet mode, hide compact reporting header.
-s = Skip download, use default apar.csv file.
-v = Verbose, full report (for piping to email).
-g = Grep for filesets with phrase, useful for verbose mode.
-t = Type of APAR [hiper | sec].
-l = Enter a custom LSLPP output file, must match lslpp -Lqc.
-e = Enter a custom EMGR output file, must match emgr -lv3.
-x = Skip EFix processing.
-a = Show all fixed and non-fixed HIPER/Security vulnerabilities.

Examples

Compact Formatting
# ./flrtvc.ksh -c

Verbose Formatting
# ./flrtvc.ksh -v

Set a custom CSV file
# ./flrtvc.ksh -f myfile.csv

Report on a specific fileset in verbose mode
# ./flrtvc.ksh -vg printers

Show only hiper results
# ./flrtvc.ksh -t hiper

Custom lslpp and emgr outputs, for reporting on other systems
# ./flrtvc.ksh -l lslpp.txt -e emgr.txt

Grouping flags together
# ./flrtvc.ksh -vf myfile.csv -g printers
# ./flrtvc.ksh -vsg printers

The vulnerability checker delivers valuable information about your systems. Try it for yourself.

A Guide to HMC Access

Edit: The link still works, and it is still a good idea to set up roles.

Originally posted August 14, 2018 on AIXchange

You probably have users in your environment who need access to the Hardware Management Console (HMC), and if so, it’s very likely you want to limit what they can do with this access. The IBM Knowledge Center lays out HMC user roles and other pertinent information in this document that was most recently updated in June:

Each HMC user has an associated task role and a resource role. The task role defines the operations the user can perform. The resource role defines the systems and partitions for performing the tasks. The users may share task or resource roles. The HMC is installed with five predefined task roles. The single predefined resource role allows access to all resources. The operator can add customized task roles, customized resource roles, and customized user IDs.

The page includes six tables, though the first table is merely a list of headings for the next four. Those tables cover user roles, IDs, commands and control panel functions. The sixth table is a list of tasks that can only be performed from the command line:

Table 1. HMC task groupings
Table 2. HMC Management tasks, commands, and default user roles
Table 3. Service Management tasks, commands, and default user roles
Table 4. Systems Management tasks, commands, and default user roles
Table 5. Control Panel Functions tasks, commands, and user roles
Table 6. Command line tasks, associated commands, and user roles

Each table covers these default roles and IDs:

Operator (hmcoperator)
Super Administrator (hmcsuperadmin)
Viewer (hmcviewer)
Service Representative (hmcservicerep)

These tables provide a good overview of HMC commands and appropriate default user roles.
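
If you prefer the HMC command line to the GUI, creating a restricted account looks roughly like this (the user name and description are made up; check the mkhmcusr man page at your HMC level for the exact options):

mkhmcusr -u auditor1 -a hmcviewer -d "Read-only HMC access"
lshmcusr          (list the defined users and their task roles)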

The POWER9 System Roll-Out Continues

Edit: Some links no longer work

Originally posted August 7, 2018 on AIXchange

Today IBM is announcing two new POWER9-based enterprise systems: the E950 (9040-MR9) and the E980 (9080-M9S). The E980 is the follow-on to both the E870 and E880, and delivers 1.5X the performance. Rather than have two high-end machines, as was the case in POWER8 (and going back to POWER7 with the 770/780), those systems are collapsed into the E980. As for the E950, with its available cores and memory, it packs quite a punch for a 4-socket server, as you’ll soon see.

These systems use different chips than those that run on the POWER9 scale-out servers that were announced in February. While the scale-out servers support direct-attached memory, these “scale-up” servers support buffered memory attachments with the POWER9 enterprise chip. This results in differences in memory bandwidth: up to 170 GB/s peak bandwidth with the scale-out servers compared to 230 GB/s of peak bandwidth on the scale-up servers.

Some Quick Highlights:

E950

  • GA on Aug. 17
  • 8, 10, 11 or 12-core processor options; they will run at speeds up to 3.8 GHz
  • 2-4 processors per system (up to 48 total cores)
  • Up to 4 TB of RAM per processor; up to 16 TB of DDR4 RAM on a 4-processor system.
  • 4U Server that fits in a 19-inch rack.
  • 10 PCIe Gen4 slots and 1 PCIe Gen3 slot that specifically supports the default network card that is used at the factory. These are full-height, half-length slots with blind swap cassettes, meaning you can hot swap your I/O cards.
  • 8 SFF 2.5-inch SAS bays for your SAS drives. (Note: Because storage adapters aren’t built into the back plane, any storage adapters that run your SAS drives would take up a PCI slot. You have the choice of a single or dual back plane, but keep in mind that the latter will take up 2 PCI slots.)
  • 4 NVMe flash drives; this is a great option for local boot of your VIO servers.
  • Three years, 24 x 7 warranty.
  • Supports AIX and Linux; no support for IBM i is planned at this time.

E980

  • 1-2 node system available on Sept. 21; 3-4 node system available on Nov. 16. All MES upgrades from E870/E870C/E880/E880C available on Nov. 16.
  • 8, 10, 11 or 12-core processor options; they will run at speeds up to 4.0 GHz.
  • 32, 40, 44 or 48 processor cores per node, meaning the 1-2 node system supports up to 96 cores and the 3-4 node system supports up to 192 cores.
  • Up to 16 TB of RAM per node; up to 64 TB of RAM per 4-node system.
  • Modular scalable system: 1-4 5U CECs + 2U control unit.
  • Up to 32 PCIe Gen4 slots on a 4 node system. Low-profile I/O cards, 8 per CEC.
  • Up to 16 PCIe I/O expansion drawers, 4 per CEC.
  • 4 NVMe flash drives per CEC.
  • 1 year 24 x 7 warranty.
  • Supports AIX, Linux and IBM i.

Note: The E980’s 2U system control unit resides at the bottom of the nodes. With the E880, the control unit is in the middle. Keep this change in mind as you determine the physical system placement in your rack, particularly if you plan to leave room for future systems growth.

As was the case with the systems that were announced earlier this year, both of the enterprise class systems come with PowerVM Enterprise Edition built in. (You can no longer select PowerVM Standard edition.) With the enterprise edition, you can utilize Live Partition Mobility (LPM) across your environment to quickly and easily move workloads from POWER7/POWER8 servers to POWER9 models. A free 60-day activation can be requested from IBM (Feature Code ELPM), so if your systems are currently running PowerVM standard edition, you still have a way to perform live migrations.

When running LPM between POWER9 systems, you can expect faster partition migrations due to on-chip encryption and compression. In IBM’s migration testing, the before and after results were pretty startling: one test workload that ended up transferring 51 GB in 5 minutes to migrate the LPAR was pared down to only 1 GB of data and 15 seconds for the data transfer when encryption and compression were deployed. Obviously your mileage will vary based on individual workloads and LPAR characteristics.

Both systems support Capacity Upgrade on Demand (CUoD), meaning you can buy extra physical cores and memory and activate them as needed. CUoD takes a lot of the uncertainty out of system planning.

Keep an eye out for AIX 7.2 TL3 running on POWER9; it will now ship with SMT8 enabled instead of SMT4, so going forward you’ll need to pay attention to how your workloads are running after migrations and upgrades. Expect to see sizable performance improvements over POWER7 and POWER8; I’ll share some actual numbers once they come out.
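
To see where you stand after such a migration, the smtctl command reports and changes the SMT mode on AIX. A quick sketch (the target value of 4 is only an example):

smtctl                 # display the current SMT mode and thread states
smtctl -t 4 -w now     # switch to SMT4 on the running system
smtctl -t 4 -w boot    # or set it for subsequent reboots...
bosboot -a             # ...which requires rebuilding the boot image first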

If you’re familiar with what was IBM PowerCare, it is now called the IBM Power to Cloud Reward Program. With the purchase of an enterprise system, you’ll earn credits for services that can be redeemed for various on-site IBM Lab Services offerings.

Speaking of cloud, these systems come with cloud management console (CMC) entitlements for three years.

You’ll also be able to install and use PowerVC 1.4.1.

Finally, note these levels of firmware, HMC, VIOS, AIX, IBM i and Linux versions that you’ll need to be running:

  • Firmware level FW920.10 (available in third quarter)/FW920.20 (4Q).
  • HMC code level V9R1.920.
  • VIOS 2.2.6.31 (3Q)/VIOS 2.2.6.32 & 3.1.0 (4Q).
  • AIX 7.2 TL2.
  • AIX 7.2 TL1 (P8 compatibility mode).
  • AIX 7.1 TL4, TL5 (P8 compatibility mode).
  • AIX 6.1 TL9 (P7 compatibility mode).
  • IBM i 7.3 TR5.
  • IBM i 7.2 TR9.
  • Ubuntu 16.04.4 (P8 compatibility mode).
  • RedHat RHEL 7.5 LE (P8 compatibility mode.)
  • RedHat RHEL 7.6 LE (4Q).
  • SuSE SLES 11 SP4 (P8 compatibility mode).
  • SuSE SLES 12 SP3.
  • SuSE SLES 15.

As always, IBM Power Systems deliver on performance―not to mention scalability, reliability, serviceability, agility and flexibility. I look forward to getting my hands on these systems.

Techspeak Explained

Edit: The links no longer work which is a real shame.

Originally posted July 31, 2018 on AIXchange

You don’t need me to tell you that there are a lot of acronyms in tech. But it never hurts to be reminded that more and more workers are entering the world of AIX and IBM Power Systems from non-UNIX and/or non-IBM backgrounds. As a consultant, I regularly meet people who are new to IBM systems and unfamiliar with many IBM-specific terms–e.g., PMR, APAR, NIM, WPAR, VPD and SEA–that we take for granted.

Luckily IBM maintains this index of terms and definitions. Most are specific to IBM software and hardware products, but there are also general computing terms.

Let’s try it, shall we? Check V, and you’ll see that VPD is vital product data. DDM has two meanings (here and here), one of which is “a field-replaceable unit (FRU) that consists of a single disk drive and its associated packaging.”

Admit it: You’re wondering what an FRU is now, aren’t you? Go here.

This is a valuable resource for anyone who’s new to IBM technology and needs help translating from IBM to English.

Making a PowerVC Proxy

Edit: The link no longer works.

Originally posted July 24, 2018 on AIXchange

As I’ve noted numerous times, Twitter is a great resource for anyone who wants to learn about what’s new in the world of AIX and IBM Power Systems.

Case in point, Chris Gibson (@cgibbo) pointed to this article on setting up an HTTP proxy on PowerVC:

Have you ever struggled to give your end users access to the PowerVC UI, but don’t want to give them real access to the PowerVC host? For example, I’ve seen a few scenarios recently where we want to make PowerVC UI publicly available, but still need PowerVC itself sitting on an internal private network with connections to the private management infrastructure. There are a number of ways you can go about doing this with port forwarding, iptables rules, etc. But perhaps the easiest way to do this is to set up a very simple light-weight HTTP proxy with NGINX.

Install nginx. On Ubuntu/Debian, simply run: sudo apt install nginx. On Redhat, run: sudo yum install nginx

nginx should start automatically. If not, run: sudo systemctl start nginx

Remove the default config file: sudo rm /etc/nginx/sites-enabled/default

Install ssl-cert. This will allow automatic generation of self-signed ssl certificates:  sudo apt install ssl-cert or sudo yum install ssl-cert.

Add the following configuration file, modifying the 10.0.0.10 IP address to match that of your PowerVC server (paste this entire entry into a bash shell)

Finally, restart nginx (sudo systemctl restart nginx), and point your web browser to http://X.X.X.9 and you should see the PowerVC GUI.
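
To give a flavor of that configuration step, here is a rough sketch of what such a proxy config could look like. This is my own illustration rather than the article’s exact file: the file name is arbitrary, the certificate paths are the Debian/Ubuntu ssl-cert defaults, and 10.0.0.10 stands in for your PowerVC server’s address.

sudo tee /etc/nginx/conf.d/powervc-proxy.conf > /dev/null << 'EOF'
server {
    listen 443 ssl;
    # self-signed certificate generated by the ssl-cert package
    ssl_certificate     /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;

    location / {
        # forward everything to the PowerVC host on the private network
        proxy_pass https://10.0.0.10;
        proxy_set_header Host $host;
    }
}
EOF
sudo systemctl restart nginx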

Read the whole thing for the actual code and a detailed explanation. Also be sure to check out this specific information on NGINX, which is linked at the end of the document.

My Blogging Anniversary

Edit: I will keep writing if you will keep reading.

Originally posted July 17, 2018 on AIXchange

What were you doing 11 years ago? I was living in another state and working for a different company. Since then I moved back home to Arizona. I lost weight, got more active, and spent more time outdoors camping, backpacking, hiking and going on scuba trips. 

Eleven years ago I had small children at home. Now one son is in the military, and the other is about to start his sophomore year of high school. Before I know it, he too will be grown and out of the house and I’ll be an empty nester. I’m sure many of you are on similar paths. 

Eleven years ago was also when I was asked to start writing AIXchange. Fifty weeks a year for 11 years (we take time off over the holidays and the week of July 4), I’ve written about AIX and IBM Power Systems servers, and many, many other technology-related topics.

The constant search to find interesting topics to cover in this blog helps me as an IT professional. These duties keep me focused on what’s current in tech. I especially love researching IBM announcements and learning about new technology before it becomes common knowledge. 

The time I’ve put into this blog has paid off in unexpected ways. I’m sure it’s a big reason why I’ve been an IBM Champion. And being part of the IBM Champion community, which allows me to directly interact with those involved with IBM announcements, has benefited this blog. I’m better informed and more capable of explaining and analyzing AIX and IBM Power Systems technology. 

Occasionally I’ll be using Google to solve a problem and see one of my old posts come up in the results. It’s happened more than once. That’s something else I didn’t expect. It’s a good thing I write this stuff down rather than having to reinvent the wheel. 

Many of my readers have kindly made suggestions as to what they want to see covered. They’ve sent tips and tricks and scripts and links to presentations. I’m always happy to highlight what others are doing, and share their knowledge. 

Eleven years ago the POWER6 processors were announced. Version 6.1 of AIX was released late in 2007. Computers and networks and disks have of course gotten much faster since then, but the parameters of our jobs haven’t changed all that much. We’re still needed to care for our systems and keep them running. My efforts here won’t change, either. I’ll keep learning and keep writing. AIXchange will continue to provide a window into IBM products and other technologies, reflecting what’s new, what’s changing and what’s going away. And rest assured, I’ll keep using my Model M keyboard.

Returning to AIX

Edit: Some links no longer work

Originally posted July 10, 2018 on AIXchange

Recently I received this email:

It’s been a number of years since I have administered AIX. I was on AIX before 5L. (Was there a version 5?) It may have been v4.

I am going to update my skill set on AIX, since I see it a lot out there in the wild.

What are the deltas between 4 -> 5?? -> 5L -> 6 -> 7 <== would I even recognize AIX anymore

What would be the best way to update my skills on AIX?

I thought about it and replied, but then I realized that this individual can’t be the only person in our profession who’s ever switched jobs and/or been tasked with supporting different operating systems. There must be a number of admins who’ve worked on AIX, moved to a different opportunity in IT, and then found themselves managing AIX systems again. With that in mind, I thought I’d share the gist of my response here.

If you haven’t worked with AIX lately, know that a lot of what you’re familiar with is still there. For instance, smitty is still a valid way to manage your systems. And while the journaled filesystem has evolved from JFS to JFS2, it and the logical volume manager (LVM) should look and feel familiar. That’s true for much of AIX and its related capabilities. As far as ways to update your AIX skills, here are some places to start:

  • irc–Get on the ##aix channel and ask questions. Now, you shouldn’t expect immediate answers as most of the users have day jobs, but be patient; you should eventually get a response. You can also post questions on Reddit or in the AIX forum.
  • Nigel Griffiths has a series of YouTube videos.
  • The IBM Power Systems technical webinar series (previously known as the AIX Virtual User Group) conducts monthly presentations. Dig into the replays whenever you have time.
  • Get hands-on–Even if you don’t have access to a lab machine at work, you can still get on a system. Used systems are sometimes available on eBay. Or you could get AIX running on Nutanix.

Of course many other AIX/IBM Power Systems resources are out there. Please make your own suggestions in comments.

AIX Implementation Best Practices Updated for POWER9

Edit: One of my go-to reference guides

Originally posted June 26, 2018 on AIXchange

An updated version of AIX implementation best practices for commercial workloads was released in May. This should not be confused with the POWER9 performance best practices document I referenced three weeks ago. In this case, I’m talking about the latest in Fredrik Lundholm’s popular series of presentations, which I previously wrote about here

Here’s Fredrik’s introduction to his current presentation:

Dear Power Team,

It is that time of year to renew the best practices for the spring and POWER9 implementations. Please read and enjoy! As always please share any comments or requests for improvement with me.

In next version I am planning to include a section on VIOS rules and how they complement the best practices.

I’ve included a section on rootvg failure monitoring in PowerHA donated by Chris Gibson… .

On slide 3 you can see the changelog since the last time I wrote about it.

Changes for 1.20:
Rootvg failure monitoring in PowerHA 7.2, Default Processor mode,

Changes for 1.19:
2018 Apr Update, POWER9 enablement, Spectrum Scale 4.2 certified with Oracle RAC 12c,

Changes for 1.18:
2017 Sep Update, new AIX default multipathing for SVC

Changes for 1.17:
2017 Update, VIOS 2.2.5, poll_uplink clarification (edit)

The reminder from page 4
This presentation describes the expected best practices implementation and documentation guidelines for Power Systems with AIX. These should be considered mandatory procedures for virtualized Power servers.

The overall goal is to combine simplicity with flexibility. This is key to achieve the best possible total system availability with adequate performance over time.

While this presentation lists the expected best practices, all customer engagements are unique. It is acceptable to adapt and make implementation deviations after a mandatory review with the responsible architect (not only engaging the customer) and properly documenting these:

General Design Principles for Power System implementations (page 6)
System and PowerVM Setup recommendations (page 15)
AIX Setup recommendations (page 28)
PowerHA (page 38)
Linux/IBM i (page 43)
FAQ (page 44)
Reference Slides: Procedures for older AIX/VIOS releases (page 49)

Fredrik has developed a good following over the years, and it’s easy to see why. If you’ve not checked out his previous presentations, take the time to go through this.

What are your resource needs? You’ll know when you know

Edit: Some links no longer work

Originally posted June 20, 2018 on AIXchange

A few weeks ago I came across this great exchange in the AIX forum:

How do I determine the resources needed based on volume of transactions. By resources I mean, the cores, memory etc. Is there a way to arrive at that value?

The reply took the form of an analogy:

This question is about the same as “how much petrol does it take to go 100 miles”–without any specification of details it cannot be answered. In the above version: a bicycle would need no petrol at all, a car maybe 10 [liters] and a tank perhaps 200L of diesel. In your question: it depends on the transactions, the type of processor, the database used, the amount of memory, etc., etc….

In addition there are no fixed values for this, a lot of these estimations are done on experience. So, without you telling us more about your requirements we can’t answer your question, not even with a rough estimation.

As Nigel Griffiths notes in this IBM developerWorks post, basic common sense is a useful guide in these matters:

Trick 2: Don’t worry about the tea bags!
No one calculates the number of teabags they need per year. In my house, we just have some in reserve and monitor the use of tea bags and then purchase more when needed. Likewise, start with a sensible VIOS resources and monitor the situation.

Can this sort of thinking apply to our LPARs? Until we start running a given workload, we may not know how much memory and CPU we’ll ultimately need. Luckily, POWER-based systems are very forgiving in this regard. If some spare memory and CPU is available on our machines, we can (provided our profiles are set correctly) add or remove CPU and memory with a quick dynamic LPAR operation. As we monitor our workloads and tweak our resource allocations, we can arrive at a satisfactory answer with minimal effort.
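
As an illustration of how low-effort those adjustments are, a dynamic LPAR change from the HMC command line is a single command per resource. The managed system and partition names below are placeholders:

chhwres -r mem -m MANAGED-SYSTEM -o a -p mylpar -q 1024           # add 1024 MB of memory to a running LPAR
chhwres -r proc -m MANAGED-SYSTEM -o a -p mylpar --procunits 0.5  # add half a processing unit to a shared-processor LPAR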

Here’s the same AIX forum member making a similar analogy back in 2013:

A simple comparison of the difference between performance and speed can be described with this analogy: We have a Ferrari, a large truck, and a Land Rover. Which is fastest? Most people would say the Ferrari, because it can travel at over 300 [kilometers per hour]. But suppose you’re driving deep in the country with narrow, windy, bumpy roads? The Ferrari’s speed would be reduced to near zero. So, the Land Rover would be the fastest, as it can handle this terrain with relative ease, at near the 100kph limit. Right? But, suppose, then, that we have a 10-tonne truck which can travel at barely 60kph along these roads? If each of these vehicles are carrying cargo, it seems clear that the truck can carry many times more the cargo of the Ferrari and the Land Rover combined. So again: which is the “fastest”? It depends on the purpose (amount of cargo to transport) and environment (streets to go). This is the difference between “performance” and “speed.” The truck may be the slowest vehicle, but if delivering a lot of cargo is part of the goal it might still be the one finishing the task fastest.

So how do you determine the amount of resources you’ll need? As Nigel says in the previously referenced developerWorks post:

The classic consultant answer is “it depends on what you are doing with Disk & Network I/O” is not very useful to the practical guy that has to size a machine including the VIOS nor the person defining the VIOS partition to install it!

“Watch your workload and adjust as needed” may be wishy-washy advice, but the point is that real-world system workloads are difficult to simulate. While rPerfs and workload estimators can get you pretty far, you’ll inevitably need to make adjustments along the way. And as I said, this is yet another reason to appreciate AIX and IBM Power Systems. This combination is so easy to manage when it comes to adjusting resources and migrating workloads to different hardware as needed.

New Doesn’t Always Mean Improved

Edit: I still miss the keyboard on the Blackberry

Originally posted June 12, 2018 on AIXchange

Awhile back, Dan Kaminsky posed these questions on Twitter:

  • Who asked Slack to shut down their IRC gateway?
  • Who asked Apple to remove the headphone port?
  • Who *are* technical organizations actually listening to? Not asking as an attack. It’s behavior that is happening, with full awareness of unpopularity. What is the source?

I love this sentiment. In fact, I ask these sorts of questions all the time. For instance, who decided that we no longer wanted mechanical keyboards? Why do laptops have trackpads when everyone was cool with the TrackPoint?

It’s a little bit like an automatic transmission versus a stick shift. If you know how to drive a stick, you don’t want an automatic transmission. If you don’t drive a stick shift, you’re not going to buy a car that’s got one.

One of the advantages of a TrackPoint is that your hands don’t have to leave the home row to move the cursor. So, you can type and move the cursor without doing this [mimes a hand shifting between a keyboard and a trackpad].

Plus, your finger doesn’t really have to move, because a TrackPoint is strain-gauged, so it measures pressure. It doesn’t move around like a joystick, it’s measuring pressure. Some people get it and some people don’t; some people acquire the taste. It’s hard to explain, but I still think there’s a use for it.

For the record, mechanical keyboards are still available, though when I started in IT, they were ubiquitous. But again, how do these decisions get made? I assume the desire to cut costs is a foremost consideration in these instances. Maybe there were licensing issues with IBM. Regardless of the reasoning or circumstances though, it sometimes feels like we’re heading backwards and forgetting valuable lessons from the past.

This article from 2007 questions the common perception of user-friendliness:

Graphic User Interface (GUI) is commonly considered to be superior to Text-based User Interface (TUI). This study compares GUI and TUI in an electronic dental record system. Several usability analysis techniques compared the relative effectiveness of a GUI and a TUI. Expert users and novice users were evaluated in time required and steps needed to complete the task. A within-subject design was used to evaluate if the experience with either interface will affect task performance. The results show that the GUI interface was not better than the TUI for expert users. GUI interface was better for novice users. For novice users there was a learning transfer effect from TUI to GUI. This means a user interface is user-friendly or not depending on the mapping between the user interface and tasks. GUI by itself may or may not be better than TUI.

I think you know how this ends up: The only folks using text-based interfaces, CLI and the like, are us, the so-called expert users. For all the non-technical end users in the enterprise, GUI predominates.

I don’t have the answers, but it sure seems like there’s a disconnect between those who design and enhance our technology and the consumer base. Maybe it’s a result of corporate cost-cutting, or maybe it’s so marketing teams can point to new features. 

Or maybe it’s generational: younger people have no idea how the things we take for granted actually work. For instance, Slack went down a couple of weeks ago. I had to laugh, knowing that irc keeps on running. Then I found this about a hotel that provides an instructional video on using its rotary phones. (Note: You have to be at least 35 to find that sentence astounding.) 

Anyway, I’m sure you can come up with your own examples of changes that didn’t seem helpful or necessary. For all we gain with new technologies, it’s not a perfect trade-off. New doesn’t always mean improved. 

POWER9 Performance Best Practices

Edit: Best practices are always a great place to start.

Originally posted June 5, 2018 on AIXchange

In April, IBM’s Therese Eaton (@tetweetings) noted the availability of this POWER9 performance best practices document.

Along with POWER9 (and POWER8) best practices, there are instructions on managing AIX updates and upgrading from Version 5.3 to 7.1.

While it’s only a brief checklist, there is important information here:

  • Ensure your firmware is current.
  • Follow the memory plug-in rules.
  • Ensure OS level is current.
  • Evaluate the use of SMT8.
  • Right-size your shared LPARs.
  • Use DPO to optimize partition placement (see the sketch after this list).
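
On the DPO item, the HMC’s lsmemopt and optmem commands report and improve partition placement. A sketch with a placeholder system name (check the exact options available at your HMC level):

lsmemopt -m MANAGED-SYSTEM -o currscore         # current server-wide affinity score (100 is ideal)
lsmemopt -m MANAGED-SYSTEM -o calcscore         # score DPO estimates it could achieve
optmem -m MANAGED-SYSTEM -o start -t affinity   # kick off the optimization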

Also covered are AIX and VIO server tunables, CPU utilization, VIO server configuration, and virtual Ethernet adapters.

The second page has links to virtualization best practices, rPerf reports, 100G adapter best practices, VIOS sizing, Java performance, VIO server advisor, and IBM Redbooks.

Particularly for those who are new to the platform, these resources can be a big help.

Applying VIOS Rules Post-Install

Edit: Do you make changes to the defaults?

Originally posted May 29, 2018 on AIXchange

Awhile back my colleague Eric Hopkins was installing VIO server 2.2.6.21 when he noticed something new: a reminder to apply rules post-installation:

Virtual I/O Server (VIOS) rules management consists of two rules files. The default rules file contains the critical recommended device rules for VIOS best practice, and the current rules file captures the current VIOS system settings based on the default rules. To deploy the recommended default device settings on a newly installed VIOS, run the rules –o deploy –d command and then restart the system. The default rules are contained in an XML profile, and you cannot modify the default rules.

You can customize rules on VIOS, by using the current rules. The initial current rules are captured from the system by using default rules as a template and then saving them in an XML profile. You can modify the current rules or add new rules. The new rules must be supported on the VIOS level. You can apply the changed current rules to VIOS, for currently discovered and newly discovered device types and instances. You can use the rules command to manage VIOS rules files.
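
As a rough sketch of that workflow on the VIOS command line (the device type and attribute in the modify example are purely illustrative, and the list/modify syntax should be verified against your VIOS level):

rules -o list -d        # list the recommended default rules
rules -o diff -s -d     # compare current system settings with the defaults
rules -o modify -t adapter/pciex/df1000f114108a0 -a num_cmd_elems=2048   # adjust a single current rule
rules -o deploy -d      # deploy the recommended defaults
shutdown -restart       # restart so the settings take effect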

This is what was displayed after logging in following his 2.2.6.21 install: 

================================================

IBM Virtual I/O Server

                      login: padmin

[compat]: 3004-610 You are required to change your password.

        Please choose a new one.

padmin’s New password:

Enter the new password again:

Indicate by selecting the appropriate response below whether you

accept or decline the software maintenance terms and conditions.

Accept (a) |  Decline (d) |  View Terms (v) > a

$ license -accept

  Current system settings are different from the best practice recommendations for a VIOS.

  To view the differences between system and the recommended settings, run the following:

  $rules -o diff -s -d

  To deploy the VIOS recommended default settings, run the following:

  $rules -o deploy -d

  $shutdown -restart

$ rules -o diff -s -d

devParam.disk.fcp.mpioosdisk:reserve_policy device=disk/fcp/mpioosdisk             single_path | no_reserve

devParam.disk.fcp.mpioapdisk:reserve_policy device=disk/fcp/mpioapdisk             single_path | no_reserve

devParam.disk.fcp.nonmpiodisk:reserve_policy device=disk/fcp/nonmpiodisk           single_path | no_reserve

devParam.disk.fcp.aixmpiods8k:reserve_policy device=disk/fcp/aixmpiods8k           single_path | no_reserve

devParam.disk.sas.mpioapdisk:reserve_policy device=disk/sas/mpioapdisk             single_path | no_reserve

devParam.PCM.friend.fcpother:algorithm device=PCM/friend/fcpother                   fail_over | round_robin

devParam.PCM.friend.iscsiother:algorithm device=PCM/friend/iscsiother               fail_over | round_robin

devParam.PCM.friend.otherapdisk:algorithm device=PCM/friend/otherapdisk             fail_over | round_robin

devParam.PCM.friend.sasother:algorithm device=PCM/friend/sasother                   fail_over | round_robin

devParam.PCM.friend.aixmpiods8k:algorithm device=PCM/friend/aixmpiods8k             fail_over | round_robin

devParam.adapter.pseudo.ibm_ech:hash_mode device=adapter/pseudo/ibm_ech              default | src_dst_port

devParam.adapter.pciex.df1000fe:num_cmd_elems device=adapter/pciex/df1000fe                      200 | 1966

devParam.adapter.pciex.df1000fe:max_xfer_size device=adapter/pciex/df1000fe             0x100000 | 0x400000

devParam.adapter.pci.df1023fd:num_cmd_elems device=adapter/pci/df1023fd                          200 | 1966

devParam.adapter.pci.df1023fd:max_xfer_size device=adapter/pci/df1023fd                     0x100000 | 0x400000

devParam.adapter.pciex.771032257710650:num_cmd_elems device=adapter/pciex/771032257710650            500 | 2048

devParam.adapter.pciex.771032257710650:max_xfer_size device=adapter/pciex/771032257710650   0x100000 | 0x400000

devParam.adapter.pciex.77103224:num_cmd_elems device=adapter/pciex/77103224                          200 | 1024

devParam.adapter.pciex.77103224:max_xfer_size device=adapter/pciex/77103224                 0x100000 | 0x400000

devParam.adapter.pciex.df1000f1df1024f:max_xfer_size device=adapter/pciex/df1000f1df1024f   0x100000 | 0x400000

devParam.adapter.pciex.df1000f1df1024f:num_cmd_elems device=adapter/pciex/df1000f1df1024f            500 | 4014

devParam.adapter.pciex.df1000f114108a0:max_xfer_size device=adapter/pciex/df1000f114108a0   0x100000 | 0x400000

devParam.adapter.pciex.df1000f114108a0:num_cmd_elems device=adapter/pciex/df1000f114108a0            500 | 4014

devParam.adapter.pciex.df1000f11410010:num_cmd_elems device=adapter/pciex/df1000f11410010            500 | 4014

devParam.adapter.pciex.df1000f11410010:max_xfer_size device=adapter/pciex/df1000f11410010   0x100000 | 0x400000

devParam.adapter.pciex.771032257710760:max_xfer_size device=adapter/pciex/771032257710760   0x100000 | 0x400000

devParam.adapter.pciex.771032257710760:num_cmd_elems device=adapter/pciex/771032257710760            500 | 2048

devParam.adapter.pciex.771032257710750:max_xfer_size device=adapter/pciex/771032257710750   0x100000 | 0x400000

devParam.adapter.pciex.771032257710750:num_cmd_elems device=adapter/pciex/771032257710750            500 | 2048

devParam.adapter.pciex.771032257710680:max_xfer_size device=adapter/pciex/771032257710680   0x100000 | 0x400000

devParam.adapter.pciex.771032257710680:num_cmd_elems device=adapter/pciex/771032257710680            500 | 2048

devParam.adapter.pciex.771032257710660:max_xfer_size device=adapter/pciex/771032257710660   0x100000 | 0x400000

devParam.adapter.pciex.771032257710660:num_cmd_elems device=adapter/pciex/771032257710660            500 | 2048

devParam.adapter.pciex.7710018077107f0:max_xfer_size device=adapter/pciex/7710018077107f0   0x100000 | 0x400000

devParam.adapter.pciex.7710018077107f0:num_cmd_elems device=adapter/pciex/7710018077107f0            500 | 2048

devParam.adapter.pciex.771001801410af0:max_xfer_size device=adapter/pciex/771001801410af0   0x100000 | 0x400000

devParam.adapter.pciex.771001801410af0:num_cmd_elems device=adapter/pciex/771001801410af0            500 | 2048

devParam.adapter.pciex.df1000e21410f10:max_xfer_size device=adapter/pciex/df1000e21410f10   0x100000 | 0x400000

devParam.adapter.pciex.df1000e21410f10:num_cmd_elems device=adapter/pciex/df1000e21410f10            500 | 4096

devParam.adapter.pciex.df1060e21410100:max_xfer_size device=adapter/pciex/df1060e21410100   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410100:num_cmd_elems device=adapter/pciex/df1060e21410100            500 | 4096

devParam.adapter.pciex.df1060e21410520:max_xfer_size device=adapter/pciex/df1060e21410520   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410520:num_cmd_elems device=adapter/pciex/df1060e21410520            500 | 4096

devParam.adapter.pciex.df1000e2df1002e:max_xfer_size device=adapter/pciex/df1000e2df1002e   0x100000 | 0x400000

devParam.adapter.pciex.df1000e2df1002e:num_cmd_elems device=adapter/pciex/df1000e2df1002e            500 | 4096

devParam.adapter.pciex.df1000e214105e0:max_xfer_size device=adapter/pciex/df1000e214105e0   0x100000 | 0x400000

devParam.adapter.pciex.df1000e214105e0:num_cmd_elems device=adapter/pciex/df1000e214105e0            500 | 4096

devParam.adapter.pciex.df1060e214105f0:max_xfer_size device=adapter/pciex/df1060e214105f0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214105f0:num_cmd_elems device=adapter/pciex/df1060e214105f0            500 | 4096

devParam.adapter.pciex.df1060e21410370:max_xfer_size device=adapter/pciex/df1060e21410370   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410370:num_cmd_elems device=adapter/pciex/df1060e21410370            500 | 4096

devParam.adapter.pciex.df1060e214103a0:max_xfer_size device=adapter/pciex/df1060e214103a0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214103a0:num_cmd_elems device=adapter/pciex/df1060e214103a0            500 | 4096

devParam.adapter.pciex.df1000e2df1082e:max_xfer_size device=adapter/pciex/df1000e2df1082e   0x100000 | 0x400000

devParam.adapter.pciex.df1000e2df1082e:num_cmd_elems device=adapter/pciex/df1000e2df1082e            500 | 4096

devParam.adapter.pciex.df1060e214103e0:max_xfer_size device=adapter/pciex/df1060e214103e0   0x100000 | 0x400000

devParam.adapter.pciex.df1060e214103e0:num_cmd_elems device=adapter/pciex/df1060e214103e0            500 | 4096

devParam.adapter.pciex.df1060e21410410:max_xfer_size device=adapter/pciex/df1060e21410410   0x100000 | 0x400000

devParam.adapter.pciex.df1060e21410410:num_cmd_elems device=adapter/pciex/df1060e21410410            500 | 4096

devParam.adapter.pci.df1080f9:max_xfer_size device=adapter/pci/df1080f9                     0x100000 | 0x400000

devParam.adapter.pci.df1080f9:num_cmd_elems device=adapter/pci/df1080f9                              200 | 2048

devParam.adapter.pci.df1000fd:max_xfer_size device=adapter/pci/df1000fd                     0x100000 | 0x400000

devParam.adapter.pci.df1000fd:num_cmd_elems device=adapter/pci/df1000fd                              200 | 1966

devParam.adapter.pci.df1000fa:max_xfer_size device=adapter/pci/df1000fa                     0x100000 | 0x400000

devParam.adapter.pci.df1000fa:num_cmd_elems device=adapter/pci/df1000fa                              200 | 2048

devParam.adapter.pci.df1000f9:max_xfer_size device=adapter/pci/df1000f9                     0x100000 | 0x400000

devParam.adapter.pci.df1000f9:num_cmd_elems device=adapter/pci/df1000f9                              200 | 2048

devParam.adapter.pci.df1000f7:max_xfer_size device=adapter/pci/df1000f7                     0x100000 | 0x400000

devParam.adapter.pci.df1000f7:num_cmd_elems device=adapter/pci/df1000f7                              200 | 1024

devParam.adapter.pci.77102224:max_xfer_size device=adapter/pci/77102224                     0x100000 | 0x400000

devParam.adapter.pci.77102224:num_cmd_elems device=adapter/pci/77102224                              200 | 1024

devParam.driver.iocb.efscsi:dyntrk device=driver/iocb/efscsi                                           no | yes

devParam.driver.iocb.efscsi:fc_err_recov device=driver/iocb/efscsi                     delayed_fail | fast_fail

devParam.driver.qliocb.qlfscsi:dyntrk device=driver/qliocb/qlfscsi                                     no | yes

devParam.driver.qliocb.qlfscsi:fc_err_recov device=driver/qliocb/qlfscsi               delayed_fail | fast_fail

devParam.driver.qiocb.qfscsi:dyntrk device=driver/qiocb/qfscsi                                         no | yes

devParam.driver.qiocb.qfscsi:fc_err_recov device=driver/qiocb/qfscsi                   delayed_fail | fast_fail

devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan                    2048 | 4096

devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan                     512 | 4096

devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan                   2048 | 4096

devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan                    512 | 4096

devParam.adapter.pciex.77103225141004f:max_xfer_size device=adapter/pciex/77103225141004f   0x100000 | 0x400000

devParam.adapter.pciex.77103225141004f:num_cmd_elems device=adapter/pciex/77103225141004f           1024 | 2048

devParam.adapter.pciex.7710322514101e0:max_xfer_size device=adapter/pciex/7710322514101e0   0x100000 | 0x400000

devParam.adapter.pciex.7710322514101e0:num_cmd_elems device=adapter/pciex/7710322514101e0            500 | 2048

devParam.adapter.pciex.df1000e31410140:max_xfer_size device=adapter/pciex/df1000e31410140   0x100000 | 0x400000

devParam.adapter.pciex.df1000e31410140:num_cmd_elems device=adapter/pciex/df1000e31410140           1024 | 4096

devParam.adapter.pciex.df1000e31410150:max_xfer_size device=adapter/pciex/df1000e31410150   0x100000 | 0x400000

devParam.adapter.pciex.df1000e31410150:num_cmd_elems device=adapter/pciex/df1000e31410150           1024 | 6144

Have you run into this during new VIOS deployments? What do you think? Personally, I appreciate the prompt to fix things right away at installation. It’s good to get these tasks out of the way rather than having to remember to take care of them later in the process.

A Primer on the New Hyperconverged Systems

Edit: A shame this was not adopted

Originally posted May 22, 2018 on AIXchange

So hyperconverged systems running AIX are here, and it’s very cool. If you’re looking for more technical detail, the IBM Knowledge Center provides some practical information that techies will find very interesting. This doc features concepts and recommendations on planning, deploying and installing AIX, network booting and configuring virtual machines. There’s also a section on troubleshooting. Here are some interesting tidbits:

  • AIX cannot determine the number of physical cores in the system and reports a large default value when running on IBM Hyperconverged Systems powered by Nutanix.
  • The system administrator must use the Nutanix PRISM GUI to obtain information about system capacity for capacity planning and software licensing purposes.
  • Nutanix does not support micro-partitioning of CPUs or shared processor pools with entitlement controls found on PowerVM based systems. When the AIX operating system is running in this environment, AIX represents all virtual processors as fully entitled and having capped shared CPUs.
  • The AIX operating system supports virtual I/O Ethernet and SCSI devices (virtio-net and virtio-scsi types) by using the KVM VirtIO virtualization standard that is used in IBM hyperconverged systems. The AIX operating system also supports the CD device (spapr type) used in this environment.
  • Hyperconverged systems use fully virtualized I/O; therefore, workloads that rely on physical I/O are not supported.
  • AIX on IBM Hyperconverged Systems powered by Nutanix supports installations through AIX cloud images and DVD ISO media. This environment also supports installations through traditional methods for network-based installations by using the NIM that is currently supported on PowerVM systems.
  • You can access the AIX console through the PRISM GUI by using the COM1 console connection after the VM has been started. You must use a VNC console connection to interact with the open firmware.
  • If you’re using a static IP address for the client VM, the client and server must be on the same subnet when booting a VM across the network. You cannot specify a subnet mask in the boot command as shown in this example:
    • 0> boot <NIC-device>:<nim-server-ip>,<\path\to\client\bootfile>,<clientip>,<gateway-ip>
    • 0> boot net:9.3.94.78,\tftpboot\client-vm.ibm.com,9.3.94.217,9.3.94.1
  • You must restart the AIX VM when you change the number of CPUs, amount of memory, and after adding or removing a network device or CD-ROM device.
  • AIX supports serial console connections on IBM Hyperconverged Systems. You must choose the COM1 connection while launching a console from PRISM to interact with the AIX operating system.
  • The VNC console connection must be used to interact with open firmware before the AIX operating system is loaded after starting or rebooting a VM.
  • As the VM loads the AIX operating system and software drivers, AIX IPL progress codes are displayed in the COM1 console.
  • AIX does not provide concurrent diagnostics support, including adapter firmware updates, for IBM Hyperconverged Systems powered by Nutanix. The Nutanix product provides support for device diagnostics and firmware updates.

Rest assured, I will continue to find ways to get hands-on with these clusters, and let you know what I learn along the way. I’ve been asked why this is such a big deal, and there’s a simple answer: It’s AIX running on what’s essentially a new and different hypervisor. In short, there’s another way to run our favorite OS.   

Here’s another way to look at it: Different skills are needed to manage AIX and Power systems. You need to learn the HMC and keep up with the changes to the interface. You also have to learn the VIO server and how dual VIO failover works, etc. You have to learn SMS, ASMI and so many other things. Sure, we all understand this stuff, we like working with this stuff, but it is a barrier of entry for new admins.   

Having been hands-on with the Prism interface, I can tell you that it’s far simpler to use than the HMC and VIO server interfaces. Again, that’s nice for us, but when you think of the newcomers to AIX, it’s huge. Along with that, if you’re already using Nutanix in your datacenters, it’s a snap to add in a POWER-based cluster and receive the performance advantages of both Linux on Power and AIX. 

“Game-changer” is pretty cliched techspeak at this point, but it fits here. The capability to run AIX on Nutanix is a game-changer. I hope this gives you an idea of the different things you’ll be able to do with AIX going forward.

Getting Hands on With AIX on a Nutanix Cluster

Edit: Shame this did not gain more traction.

Originally posted May 15, 2018 on AIXchange

Ever since IBM’s intriguing statement of direction about AIX running on POWER-based Nutanix clusters, I’ve eagerly awaited the real thing. The wait ended last week, when availability of the hyperconverged systems was made official at the Nutanix .NEXT conference in New Orleans.

Now here’s the really cool part: during the IBM Technical University earlier this month, I got some hands-on experience with AIX running on a Nutanix cluster. Then last week, I was able to access a cluster again, this time via Webex video conferencing.

So how does this all work? I’ll start with the Prism interface. Watch this to get some familiarity with it. Prism is the GUI that manipulates the virtual machines that we created and managed. While the video I reference is actually an x86 cluster, Prism’s look and feel is similar to that of a POWER-based cluster.

Once we were logged into Prism, we loaded a pre-GA raw disk image provided by IBM into our image repository. It’s very similar to how we use the VIO server’s virtual media library, only instead of booting from CD and installing AIX, we basically took a clone of this disk image and booted from that.

Compared to creating a machine on the HMC, there isn’t much to configure in a VM when creating it via Prism. (This video gives you a feel for those tasks.) This solution–and the capability to clone virtual machines in particular–feels similar to using PowerVC images and shared storage pools with our existing POWER servers. However, with a hyperconverged solution, there’s no need to worry about managing a SAN at all, because your disks are locally attached to your compute nodes.

I entered the name of my VM, the number of virtual CPUs, the number of cores per VCPU, and the amount of memory I wanted. Then I added a network interface and some logical disks that I carved out of a larger pool of physical disk. I selected “clone from image service” along with the correct disk image. I clicked on add, and the VM was created. After clicking on the power on option and selecting the console, the machine booted up. I logged in as root with no password and I was up and running.

At this point I clicked the clone option; that’s all it took to get another machine up and running. The lspv command displayed the same PVID on both systems. They were identical disk clones. In the prtconf command output, I saw the following:

System Model: IBM pSeries (emulated by qemu)
Machine Serial Number: Not Available
Processor type: PowerPC_POWER8
Processor Version: PV_S_Compat
Number of Processors: 4
Processor Clock Speed: 2095 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: (this was a long UUID string)
Platform Firmware level: Not available
Firmware Version: SLOF, HEAD

The information about the physical hardware is a little different from what we’re used to seeing in PowerVM-based systems. To determine my serial number, I’ll typically run either uname –a or prtconf; neither worked in this instance. Instead I went into the Prism GUI to see the physical node I was running my AIX image on.

Here’s a snippet of some of the output generated by running lsdev. Again, there are some differences:

vscsi0 Virtual SCSI Client adapter
cd0 Virtual SCSI Optical Served by VIO Server
ent0 qemu_virtio-net-pci:0000:00:01.0 Virtio NIC Client Adapter
scsi0 qemu_vhost-user-scsi-pci:0000:00:02.0 Virtio SCSI Client Adapter
hdisk0 qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0 MPIO Other Virtio SCSI Disk Drive
hdisk1 qemu_vhost-user-scsi-pci:0000:00:02.0-LW_0 MPIO Other Virtio SCSI Disk Drive

Later, I built an “empty” virtual machine. I gave it a name and assigned memory, CPU, disk and a network, but I didn’t give it anything to boot from. On the Nutanix cluster there’s no SMS to boot into. By default it tried to boot from the network. After that timed out, it booted into the Slimline Open Firmware (SLOF) interface.

Since I didn’t have a NIM server built, I couldn’t test that process. Rest assured, that will be one of the first things I do once I get my own solution.

In the systems running AIX, I was able to load a virtual CD .iso containing AIX filesets just as we’d do with PowerVM and VIO optical media libraries. Then I went into smitty and loaded filesets, just as we’d do with any other AIX system.

When I ran oslevel –s, the system returned 7200-02-02-1810.

Using the chfs command to resize filesystems went as expected.
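
For example, growing a filesystem by a gigabyte is the same one-liner you’d use on any other AIX box (/tmp here is just an illustrative target):

chfs -a size=+1G /tmp    # grow the filesystem by 1 GB (JFS2 accepts the +<size> notation)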

Running lsattr –El hdisk0 produced some interesting unique_id information. The disks appeared as a 54391NUTANIX disk.

I ran the mount command to mount the virtual CD inside AIX, poked around for a bit, and unmounted it. Then I went into the Prism GUI, removed the .iso I’d been using and added a different image into the virtual CD. Finally, I went back into AIX and mounted this new .iso on the fly.
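
The mount itself is the standard AIX virtual optical procedure; roughly (the mount point is arbitrary):

mount -V cdrfs -o ro /dev/cd0 /mnt    # mount the virtual CD read-only
ls /mnt                               # poke around
umount /mnt                           # unmount before swapping the .iso in Prism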

Migrating virtual machines across physical nodes was like running Live Partition Mobility with PowerVM. Of course there were minor differences running AIX on this different hypervisor, but overall everything worked as expected. Getting right to work in this new environment was very simple.

As you’ll need AIX 7.2 to deploy machines into this environment, you should listen to Chris Gibson’s recent AIX Virtual User Group presentation on AIX 7.2 migration.

There’s much more I want to do with this technology. I plan to test out a mksysb migration to move my systems to the supported version of AIX that will run on a Nutanix cluster. Later on, I’ll get into SLOF and boot from a NIM server. I also want to kick off workloads and run performance scripts. Basically, I want to see what can and can’t be done with this compared to traditional AIX environments running on PowerVM.

The fact that there’s another platform and hypervisor choice when it comes to running AIX is a big deal. For one thing, it’s still more proof that AIX is here for the long haul.

Hopefully I’ve explained everything well. Please pose questions and share your impressions in comments.

A Not So Technical Look at Technical Debt

Edit: Don’t let things rot. Entropy is real.

Originally posted May 8, 2018 on AIXchange

This Twitter discussion got me thinking about technical debt, a concept I discussed here:

As often as I see it, it still surprises me when I encounter a company that depends on some application, but chooses to run it on unsupported hardware without maintenance agreements and/or vendor support. If anything goes sideways, who knows how they will stay in business.

I find it a bit funny to see other tech pros take such a narrow view of technical debt. Does it only apply to software, or is it reasonable to also apply it to other areas of technology? Why not both? Why not go even further? Consider, for instance, this analogy:

In a non-technical example, imagine owning an older car that has served well but is due for retirement in three months. In three months you plan to invest in a new car because the old one is no longer cost effective due to continuous maintenance needs, lower efficiency and so forth. But before your three month plan to buy a new car comes around, the old car suffers a minor failure and now requires a significant investment to keep it running. Putting money into the old car would be a new investment in the technical debt. Rather than spending a large amount of money to make an old car run for a few months, moving up the time table to buy the new one is obviously drastically more financially sound.

With cars, we see this easily (in most cases). We save money, potentially a lot of it, by quickly buying a new car. If we were to invest heavily in the old one, we either lose that investment in a few months or we risk changing our solid financial planning for the purchase of a new car that was already made. Both cases are bad financially.

IT works the same way. Spending a large sum of money to maintain an old email system six months before a planned migration to a hosted email system would likely be very foolish. The investment is either lost nearly immediately when the old system is decommissioned or it undermines our good planning processes and leads us to not migrate as planned and do a sub-par job for our businesses because we allowed technical debt to drive our decision making rather than proper planning.

Technical debt is accrued when we put off patching our systems or upgrading our hardware, or when we fail to keep our maintenance contracts in place. Sometimes it’s accidental. We’re told that a system will be replaced, so we hold off on patching or upgrading an application or OS. But then the promised replacement is delayed or canceled, and the next thing you know, we’re running older code and the upgrade path is far more complicated than it would have been had we kept on top of it. Or we may let support lapse, believing that servers are going away soon. Instead, soon never happens and a critical piece of our infrastructure is no longer supported. Everything from missing a change window to change freezes to lack of cycles can contribute to these scenarios.

In IT, putting off change because it’s convenient is an all-too prevalent and incredibly damaging mindset. There always comes a point where replacing old technology is the most cost-effective option. Unfortunately, far too few businesses recognize this. It’s on us to make executives, and even some of our colleagues in IT, understand the true cost of technical debt. If we let things rot, they will, in fact, rot, and letting things rot is far worse than doing nothing. We must fight to retain the ability to upgrade our systems as needed.

More Help for the HMC Transition

Edit: I assume you have transitioned by now?

Originally posted May 1, 2018 on AIXchange

Awhile back Kiran Tripathi (@SocialKiran) made note of this IBM Knowledge Center breakdown of HMC interfaces.

The Hardware Management Console (HMC) provides more than one interface that you can use to manage your virtual environment.

These interfaces are called the HMC Classic interface, the HMC Enhanced interface, and the HMC Enhanced+ interface. When you log on to the HMC for managing your virtual environment, you select which interface you want to use. To change the interface that is used, log out of the HMC and log in to the HMC with a different selection.

HMC Classic interface
The HMC Classic interface is the continuation of the interface that was provided in previous versions of the HMC. The HMC Classic interface supports many of the same tasks as the HMC Enhanced interface, such as managing servers, partitions, and adapters.

The HMC Classic interface is not supported in Hardware Management Console (HMC) Version 8.7.0, or later. The functions that were previously available in the HMC Classic interface are now available in the HMC Enhanced+ interface.

HMC Enhanced interface
The HMC Enhanced interface is an updated version of the HMC Classic interface, and is provided with HMC Version 8 Release 8.2.0. In addition to providing simplified paths to completing virtualization management tasks, it also provides new functions that are not available in the HMC Classic interface. The main new function is the use of templates. You can use templates to complete the following tasks:

  • Deploying a system.
  • Creating a partition.
  • Capturing a system or partition configuration as a template.
  • Running Template Library management functions, including edit, copy, import, and export.

HMC Enhanced+ interface
The HMC Enhanced+ software interface provides new navigation paths to tasks that are common to the HMC Classic interface and the HMC Enhanced interface, and some functions that are unique to the HMC Enhanced+ interface. The new tasks and functions include enhanced Activate task for partitions and Virtual I/O Servers with network boot and network installation options, and graphical representation of the virtual network that represents the relationship between various components in the network for a system.

This information differs slightly from what’s found in the HMC doc I referenced in February, but there are similarities. In each case, you get a list of tasks with explanations of how each can be accomplished in the classic interface, the enhanced interface or the Enhanced+ interface.

Of course the classic interface is going away, but for those environments that are still working with the old menus, these documents will help you with the transition.

History Bytes

Edit: Where will we be in 20 years?

Originally posted April 24, 2018 on AIXchange

How many of you keep stacks of old computer publications? I did, until I was finally told to get rid of some of my PC Computing magazines from the 90s. Recently though, I was sent back in time when someone on Twitter posted a link to Byte magazine’s April 1998 issue

The tweeter originally pointed to an article about crash-proof computing that explained why PCs are so crash prone compared to mainframes and other mission-critical computers. Yep, that was still a new concept back then. 

Some of the characteristics of “crash-proof computing” included attentive administrators, reliable software, robust memory protection and redundant hardware. Of course, this could very easily describe today’s IBM Power Systems environments. 

The author contrasted that with the typical PC environments of the day. If you were in the workforce in 1998 you might remember how often machines would crash. Admittedly, today’s personal computers are very reliable. It’s been years since I’ve had to reboot my laptop. 

Another article from this issue that caught my eye was a quick two-page read titled, “IBM’s Powerhouse Chip.” It described the 8-way superscalar core, and how the new POWER3 raises the bar for high-performance CPUs.

 With POWER9 systems now available, I couldn’t help but marvel at how far we’ve come in 20 years. Give it a read, and I expect you too will find yourself thinking about how nice it is to be running machines nowadays. I can remember having root on some POWER3 and POWER4 hardware years ago. It truly is a night-and-day difference from then to now. 

For me, the biggest nostalgia kick came from the ads. Here’s a taste:

  • Micron servers and laptops (with specs that my phone beats today).
  • Digital Alpha.
  • Gateway.
  • Silicon Graphics.
  • IBM e-business and solutions for a small planet.
  • Intel Pentium II.
  • IBM Deskstar 14G and 16G disks.

A lot of these companies are gone now, as is Byte itself–this issue was one of the last. Of course, many big tech companies endure: Dell, IBM, Microsoft, Information Builders, APC, Kingston, Intel, and CDW, to name a few. 

Honestly, these little trips back in time, and the chuckles I get from them, help keep me grounded. Rest assured though, 20 years from now, today’s cutting edge technologies will seem similarly quaint.

Troubleshooting a vSCSI Mapping

Edit: Some links no longer work

Originally posted April 17, 2018 on AIXchange

I was recently asked to help troubleshoot a vSCSI mapping. My colleague was running an SAP HANA POC workload on POWER, and is new to the platform. At the time an older version of HMC code was being used, so we still had access to the classic HMC interface. Since mapping the virtual adapters and managing the profiles is a manual process, there’s always the potential for errors. And unfortunately, mistakes were made with the adapter numbering.

As an aside, in the enhanced version of the HMC GUI, all of this is automated. The enhanced version may still be unfamiliar to many, but it does provide this among other benefits. Regardless, we’re going to have to make the transition, because IBM has made it known that the classic interface is going away and support for x86 hardware appliances is being phased out.

In any case, even though we verified that the adapters were set up correctly in the profiles and the LUN was mapped to the correct adapter, a Linux OS that had previously been installed on the LUN couldn’t be booted. It didn’t appear as a bootable device. At this point, the question was posed: “Are you sure the LUN is mapped correctly? Can the LPAR really see it?” 

As I often do, I decided to try an internet search, and came across this thread. It’s from 2013 but it’s precisely what I was hoping to find:

In the first reply:

During SMS processing “LIST ALL DEVICES” will ONLY show devices that SMS thinks are BOOTABLE.

Then further in the thread:

You can try and boot the partition in openboot. Then type ioinfo and pick vscsi, this should show you your LUNs.

In our case it was as simple as booting the LPAR normally (i.e., not into SMS mode) and then selecting option 8, the open firmware prompt. From there we could select the proper disk and get a confirmation that it did indeed see the LUN, along with the LUN’s size.

This was enough to convince us that our mappings were fine. We then used bootable media to make the LUN bootable.
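
If you ever want a second opinion from the other side of the mapping, the VIO server’s lsmap command shows exactly which backing device is presented on each vhost adapter. A quick sketch, with a placeholder adapter name:

lsmap -vadapter vhost0    # show the backing devices and LUNs mapped to one virtual SCSI server adapter
lsmap -all                # or list every vSCSI mapping on this VIOS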

Although many of us use NPIV or shared storage pools these days, mentally file this anecdote away in the event you ever find yourself using vSCSI.

Fixes for a PowerHA Issue

Edit: Hopefully you have already put on these patches by now

Originally posted April 10, 2018 on AIXchange

I received this information from Chris Gibson a few weeks ago. If you use PowerHA, I recommend checking to see if you’re on affected levels of AIX:

High Impact / Highly Pervasive APAR
IJ02843 – PowerHA node halt during ip changes

“USERS AFFECTED: Systems running PowerHA System Mirror on the AIX 7100-05 Technology Level or the AIX 7200-02 Technology Level with rsct.basic.rte at 3.2.3.0.

PROBLEM DESCRIPTION: An improvement in obtaining adapter state information from AHAFS event responses introduced some errors in handling internal tracking of monitored IP addresses.

This can result in a core dump of the hagsd process any time an IP change occurs at the OS layer. This means the failure cannot happen while a cluster is running stable with no changes occurring, but it is a risk during startup, shutdown, or a failover scenario, and cannot be predicted beyond that.

Problem summary
    A flaw in handling of monitored IP changes during some
    adapter state improvements in RSCT 3.2.3.0 has led to the
    risk of a hagsd core dump in a couple code paths.

Problem conclusion
    Transition of IP lists during a monitoring change has been
    corrected.”


Here’s some additional information:

In a PowerHA cluster, if an IP address is changed on one of the AIX nodes, the node may reboot unexpectedly due to a core dump of the hagsd process. This can happen when a Service IP is configured during normal PowerHA startup/shutdown/failover, or other operations resulting in an IP change.

Affected AIX Levels and Recommended Fixes

  • AIX 7100-05-00 through 7100-05-02 with rsct.basic.rte 3.2.3.0: fixing level 7100-05-03, interim fix IJ02843
  • AIX 7200-02-00 through 7200-02-02 with rsct.basic.rte 3.2.3.0: fixing level 7200-02-03, interim fix IJ02843
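
To quickly check whether a given node is exposed, a few standard commands will show the AIX technology level, the installed rsct.basic.rte fileset level, and any interim fixes already on the system:

oslevel -s
lslpp -L rsct.basic.rte
emgr -l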

Note: Applying the ifix requires PowerHA to be stopped on the node prior to applying the fix.

Securing Your HMC

Edit: Some links no longer work

Originally posted April 3, 2018 on AIXchange

IBM developerWorks has a nice article about securing your HMC:

If you use Power HMC and are looking for information on how to secure your HMC, you are at the right place. Default configuration of HMC is good enough for most enterprise users. You will find steps to harden HMC further based on your corporate security standards. The steps mentioned below work on HMC V8.8.4.0 and later. It is recommended that every HMC is set to minimum at Level 1. You may choose to go to Level 2 and Level 3 depending on your environment and corporate security requirements. If necessary, please check with your corporate security compliance team before making these changes.

The document includes instructions for changing passwords, setting up accounts for each HMC user, assigning necessary roles to users, setting up LDAP, blocking ports in firewalls, etc. You’ll also find a list of HMC network ports, along with some thoughts around completely taking your HMC off of the network. There’s discussion around setting up NIST SP 800-131A compliance, ciphers and certificates, along with commands you can use to audit the HMC and audit user activity. Finally, there’s a mention about centralizing your HMC logs using rsyslog to send data to a central log server.

The end of the doc lays out the options for tracking fixes:

If you come across a hot new security vulnerability everyone is talking about, you can look at the attachment section of wiki to start with. It has a list of vulnerabilities fixed in last couple of years. You can click on CVEs to read associated security bulletin. This list will be kept up-to-date.

You can search for the latest security bulletins, check Twitter (@IBMPowereSupp) or subscribe to receive email notifications. There’s also a discussion group on LinkedIn (IBM PowerVM).

As an aside, the doc includes a recommendation to use Kali Linux to determine the OpenSSH version that’s running on your HMC. A commenter mentions that if running Kali and metasploit is frowned upon in your environment, running ssh –vvv is another way to find the OpenSSH version.
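
If you go the ssh route, you don’t even need to log in; the OpenSSH version is exchanged in the connection banner before authentication, so something along these lines (the hostname is made up) will show it, and you can Ctrl-C once you see the line:

ssh -vvv hmc01.example.com 2>&1 | grep "remote software version"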

Beyond that, what do you think? This seems like useful information that we can use in our environments.

POWER9 Attracts a BIG Customer

Edit: Still pretty impressive

Originally posted March 27, 2018 on AIXchange

I’d heard rumors for a while, but those rumors were confirmed last week: Google runs IBM Power Systems* in its production environment. This is from Forbes.com:

The biggest OpenPOWER Summit user news was that Google confirmed that it has deployed the “Zaius” platform into its data centers for production workloads. Google’s Maire Mahony, on stage at the event today said, we have “Zaius deployed in Google’s Data Center,” and we are “scaling up machine count.” She concluded by saying she considers the platform “Google Strong.” Mahony shared with me afterward that “Google Strong” refers to the reliability and robustness. Not to take away from the other deployments announced at the event, but this announcement is huge.

Mahony explained what Google likes about POWER9:

  • More cores and threads for core Google search
  • More memory bandwidth for RNN machine learning execution
  • Faster and “more open” flash NAND sitting on OpenCAPI acceleration bus

I was told it was a simple recompile to get their code to run on POWER, but I’d still love to hear Google engineers talk about their actual use of POWER and how these systems perform compared to the others in the data centers.

The Forbes article itself is more generally focused on POWER9 and news from the OpenPOWER Summit. The Motley Fool gets more into specifics:

Why, and for what, is Google using POWER9 processors? Google found that the performance of its web search algorithm, the heart and soul of the company, scaled well with both the number of cores and the number of threads available to it. IBM’s POWER9 processor is a many-core, many-thread beast. Variants of the chip range from 12 to 24 cores, with eight threads per core for the 12-core version and four threads per core for the 24-core version. Intel’s chips support only two threads per core via hyperthreading.

The bottom line is that IBM’s POWER9 chips are ideally suited for workloads that fully take advantage of the large number of threads available. Google’s web search is one such workload. They’re not well suited for workloads that don’t benefit from more threads, which is why the market-share ceiling for POWER isn’t all that high.

Mahony also talked about the importance of bandwidth. It doesn’t matter how fast a processor is if it can’t move data fast enough. IBM claims that one of its POWER9-based systems can transfer data up to 9.5 times faster than an Intel-based system, using OpenCAPI and NVIDIA NVLink technology. That’s important for any kind of big data or artificial intelligence (AI) workload.

AI workloads are often accelerated by GPUs or other specialized hardware. Google developed its own accelerator, the Tensor Processing Unit, which it uses in its own data centers for AI tasks. But these accelerators still require a host processor that can move data fast enough.

Obviously readers of this blog–as well as the guy who writes it–already know and love POWER. But it’s always nice to see some big name enterprises get on board with POWER hardware.

System Planning Tool Updated for POWER9

Edit: Have you grabbed the latest version?

Originally posted March 20, 2018 on AIXchange

The six POWER9 servers IBM announced last month GA this week. Are you ready to refresh your System Planning Tool?

The System Planning Tool (SPT) helps you design a managed system that can support a specified set of workloads.

You can design a managed system based on workload data from your current systems, based on new workloads that you want the managed system to support, based on sample systems that are provided with the utility, or based on your own custom specifications. The SPT helps you design a system to fit your needs, whether you want to design a logically partitioned system or want to design an unpartitioned system.

There are a number of options available to help you get started with using the SPT:

  • You can use the sample system plans that the SPT provides as a starting point for planning your system.
  • You can create a system plan based on existing performance data.
  • You can create a system plan based on new or anticipated workloads.
  • You can create a system plan by using the Hardware Management Console (HMC). You can then use the SPT to convert the system plan to SPT format, and modify the system plan for use in system ordering or system deployment.

With the SPT, you can copy logical partitions from a system in one system plan to either another system in the same system plan or to a different system in another system plan. For example, you can build up system plans that contain your own sample logical partitions, and then copy one or more of these sample logical partitions into a new system plan that you are creating. You also can copy a logical partition within the same system plan. For example, you can define the attributes of a partition within a system plan and then make 7 copies of that partition within the same plan.

You can export a system plan as a .cfr file and import it into the marketing configurator (eConfig) tool to use for ordering a system. When you import the .cfr file into the eConfig tool, the tool populates your order with the information from the .cfr file. However, the .cfr file does not contain all the information that the eConfig tool requires. You will need to enter all required information before you can submit your order.

If you make any changes to the hardware assignments or placement in the system, the SPT validates the changes to ensure that the resulting system fulfills the minimum hardware requirements and hardware placement requirements for the logical partitions.

When you are done making changes to the system, you can save your work as a system plan. You can import this file into your HMC. You then can deploy the system plan to a managed system that the HMC manages. When you deploy the system plan, the HMC creates the logical partitions from the system plan on the managed system that is the target of the deployment.

IBM’s SPT page has further information. If you’d like to be notified of SPT updates, select the Releases tab and email IBM (subject line: subscribe to distribution list) at the address listed on that page.

Select the download tab to see the latest version (6.18.047.0 as of this writing).

I’ve heard of users having minor issues with this particular version of SPT. It may still be bleeding-edge software, but once the kinks get ironed out you’ll be glad you have this tool, so be sure to download the updates.

Dealing With an HMC Upgrade Problem

Edit: Hopefully none of you will see this in the future

Originally posted March 13, 2018 on AIXchange

During a recent HMC upgrade, a buddy of mine had a problem. While you’re unlikely to find yourself in this situation, if you ever do, you’ll be glad you read this.

He was trying to create a new VIO server in enhanced mode, and it kept failing. After opening a PMR and sending in logs, he got this back from IBM Support.

I’ve gone through your logs and found that you have hit a known issue. The problem actually occurred when you upgraded the HMC. During the upgrade some users and groups were recreated with new UIDs and GIDs, rather than being restored, so files in /data that existed prior to the upgrade are orphaned, and the new versions of those users do not have full access to them. The restore upgrade data never actually completed.

PMC0000E: An unexpected Throwable was caught.
Throwable=”PmcJobException” with message “com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft”
Cause=”PmcJobException” with message “com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft”
com.ibm.pmc.jaxb.api.server.jobs.PmcJobException: com.ibm.pmc.rest.templates.library.api.LibraryAccessException: Couldn’t create directory /data/pmc/templates/systemtemplate/deploydraft
/tmp/ls_all.out:
6291473    4 drwxr-xr-x   2 503      504          4096 Nov  7 16:53 /data/pmc/templates/systemtemplate
/data/pmc/templates/systemtemplate

You have a few options to recover from this condition.

First, if you upgraded from 850 to 870 and you saved the SaveUpgradeData off to USB, you can scratch install the HMC to 850 and then restore upgrade data after the scratch install at 850. You would then need to install PTF MH01730 and THEN upgrade to 870. If you no longer have that data, you won’t be able to restore any upgrade data. If you upgraded from 860, you’ll have to scratch install.

The second option would be to scratch install 870 without any restore of upgrade data. We have a doc that tells you the information you will need to document for the new install.

Scratch Installation of Version 8 HMC from Recovery DVD

Items to Document Prior to Performing Scratch Installation of the HMC

The third option is not supported, but many customers have had good luck with this method. You can run the following commands as root to correct the ownership/permissions issues. This is not guaranteed but many have had good luck with it. If any problems arise during or after these steps, you will have to scratch install the HMC.

find /data -uid 501 -exec chown ccfw {} +
find /data -uid 502 -exec chown soliddb {} +
find /data -uid 503 -exec chown wlp {} +
find /data -gid 501 -exec chgrp ccfw {} +
find /data -gid 503 -exec chgrp soliddb {} +
find /data -gid 504 -exec chgrp wlp {} +
find /data -gid 508 -exec chgrp hmc {} +

You will need to be root to try the workaround. Here are the needed passwords to gain root access.

ssh in as hscpe user
enter hscpe password
run PESH <serial number>
enter password of the day
run su –
enter your root password

Obviously for that third method to work, you’d need to get the hscpe password from IBM Support, but it is something to keep in mind.
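
If you do attempt that unsupported third option, one quick sanity check afterwards is to look at the directory from the error message again. Going by the mapping in the find commands above, the numeric 503/504 owner and group on that directory should now show up as wlp:

ls -ld /data/pmc/templates/systemtemplate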

Again, you’ll probably never run into this, but if you do, this at least gives you some ideas. In any event, open up a PMR and let support guide you.

AIX Migration Prep

Edit: Still good information

Originally posted March 6, 2018 on AIXchange

Here’s an oldie but a goody: a document covering AIX migration preparation:

Information regarding version 5, 6 and 7 installation:

  • In AIX V5, at the first reboot after an install, you will be prompted to view/accept your licenses before you can continue to use your system.
  • Starting in AIX Version 6.1, a separate Software Maintenance Agreement (SWMA) acceptance window displays during installation immediately after the license acceptance window. The response to the SWMA acceptance (accept or decline) is stored on the system, and either response allows the installation to proceed, unlike license acceptance which requires an accept to proceed.
  • NIM masters/servers which are to serve version 5, 6 or 7 resources should be upgraded first. A NIM master/server must be at the same level or later than the software in any resources being served.
  • Any migration will require more space. If you have no free partitions in the root volume group, or all file systems are near full, it would be a good idea to add another disk to the rootvg. Alternatively you can install a mksysb of the system to a larger disk before running the migration. See the table below for the required space information for AIX 5, 6, and 7.
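
On that last point about space, a few standard commands give you a quick read on how much headroom rootvg actually has before you start:

lsvg rootvg           (check the FREE PPs and PP SIZE values)
df -g                 (check free space in the individual file systems)
lspv | grep rootvg    (see which disks are in rootvg)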

For more complete information on your release, it is highly recommended you review the Release Notes.

5.3 Release Notes

6.1 Release Notes

7.1 Release Notes

NOTE:

The latest version of the media should be used to do the migration. The latest version will always be shipped when you order the media. If you have an older version of the media and would like to obtain the latest version, you can order it at the following web site:

IBM Entitled Software Ordering and Download

If assistance is required registering for the site, call Software Delivery (1-800-879-2755 opt2 then opt2 again for the U.S.). They will require the machine model and serial number of a machine licensed to run the AIX version you are ordering. Outside the U.S., contact your local support center. 

Those who regularly do migrations may find this information to be pretty basic, but for everyone else–particularly people who are new to supporting AIX–it’s a great resource.

The 2018 IBM Champions

Edit: Some links no longer work.

Originally posted February 27, 2018 on AIXchange

This came out about a month ago, but I want to acknowledge this year’s IBM Champions:

After reviewing more than 1400 nominations, IBM is proud and happy to announce the 2018 class of IBM Champions. The IBM Champions program recognizes innovative thought leaders in the technical community and rewards these contributors by amplifying their voice and increasing their sphere of influence.

An IBM Champion is an IT professional, business leader, developer, and educator who influences and mentors others to help them innovate and transform digitally with IBM software, solutions, and services. From the nominations, 650 IBM Champions were selected…. Among those are:

  • 62% renewing; 38% new Champions
  • 38 countries represented
  • 6 business areas, including Analytics (34%), Cloud (25%), Collaboration & Talent Solutions (24%), Power Systems (9%), Storage (1%), IBM Z (7%)


These individuals evangelize IBM solutions, share their knowledge, and help grow the community of professionals who are focused on IBM offerings. IBM Champions spend a considerable amount of their own time, energy, and resources on community efforts—organizing and leading user group events, answering questions in forums, contributing articles and applications, publishing podcasts, sharing instructional videos, and more.

As a reward, IBM Champions receive IBM Champion-branded merchandise, IBM Champion open badges, and invitations and discounts to IBM conferences. They are highlighted online and recognized at live events. In addition, they may be offered various speaking opportunities that enable them to raise their visibility and broaden their sphere of influence. They are recognized for the work they have done over the past year and supported and enabled with education and opportunities to do even more advocacy in the next year.

You can search for names from all 650 Champions here. The 39 IBM Power Systems champions–of which I am one–are listed here. I’ve said it before, but it bears repeating: I’m proud of this honor. It’s always nice to get recognition for the things you do and believe in.

A Valuable Doc on HMC GUI Options

Edit: Some links no longer work

Originally posted February 20, 2018 on AIXchange

Alan Fulton (@The_Iron_Monger) tweeted the link to this information on GUI options in the new HMC. As the classic view we’re all used to goes away, you should explore this doc that clocks in at a tidy 15 pages:

Introduction
Menus Available – Enhanced GUI Only
Enhanced GUI Path New Features
Classic GUI to Enhanced GUI Mapping
Main Menu Navigation
Managing Servers -> All Actions
Managing Servers
Managed System Advanced Options
Partition Management
Partition Properties
Serviceability Options
Capacity Upgrade On Demand
Groups and Power Enterprise Pools
Management of the HMC and Administration
Service Management/Serviceability
August 2017 Code Update
Enhanced GUI Advantages
Network Topology
Storage Topology
Optional views for System and LPAR Objects
Box View
List View
Shortcuts to menus
Resources and Performance Dashboard
Additional Information and firmware level
Relational view Tasks Log

The appendix has some links as well that you’ll find particularly useful.

Be sure to read it over.

IBM Unveils Six POWER9 Servers

Edit: And now we wait for POWER10 servers

Originally posted February 13, 2018 on AIXchange

IBM is announcing six new POWER9 scale-out servers today, with general availability set for March 20. IBM is touting these systems as future forward, cloud-ready infrastructure for mission critical workloads. The systems will max out with 4 TB of memory and will have PCIe Gen4 adapters, which doubles the bandwidth of Gen3 cards.

Each system will have PowerVM Enterprise Edition built in, and IBM is helping customers migrate by providing 60-day temporary licenses for existing machines that don’t already have PowerVM Enterprise Edition. This will allow you to use live partition mobility to migrate running workloads from existing POWER7 or POWER8 machines to your new POWER9 machine.
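
If you end up using those temporary licenses, the HMC command line can also validate a move before you actually perform it. A minimal sketch, with made-up system and partition names:

migrlpar -o v -m old-p8-server -t new-p9-server -p prodlpar1     (validate only)
migrlpar -o m -m old-p8-server -t new-p9-server -p prodlpar1     (perform the migration)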

The new scale-out systems will use direct-attached industry standard DDR4 DIMMs in place of the custom buffered memory DIMMs that we saw on POWER8, making memory subsystem pricing more competitive with non-IBM servers. The memory subsystems will provide up to 170 GB/s of bandwidth.

The POWER9 processors will run up to eight threads per core, which should in particular provide a performance boost to applications that are written to exploit these additional threads. These systems are configured with dynamic, adjustable processor frequencies. For example, a maximum performance mode setting will have different thermal and energy characteristics compared to other settings like static power save, dynamic performance, etc.
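
As a reminder, smtctl is still the tool for viewing and changing the SMT mode on an AIX LPAR; a quick sketch:

smtctl                  (show the current SMT setting for each processor)
smtctl -t 8 -w now      (switch to SMT8 immediately, no reboot required)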

These six new systems consist of a Linux-only variant, three AIX and IBM i “traditional” servers, and two SAP HANA edition machines that will be capable of running limited AIX and IBM i workloads (up to 25 percent core activations total). Five of the systems will max out at 4 TB of memory; the S914 will max out at 1 TB.

The L922 Model

The L922 (model 9008-22L) is a 2U 1- or 2-socket system with 8, 10 or 12 cores per socket. This system is Linux-only. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0), and four are Gen3. It can have up to eight small form factor drives.

The S922 Model

The S922 (model 9009-22A) is a 2U 1- or 2-socket system with 4, 8 or 10 cores per socket. This system will run AIX, IBM i or Linux. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0), and four are Gen3. It can have up to eight small form factor drives.

The S914 Model

The S914 (model 9009-41A) is a 4U 1-socket system that will run AIX, IBM i or Linux. This is the only system that comes with a tower variant. It will have 4, 6, or 8 cores per socket, though keep in mind you still have the option to factory deconfigure cores on your systems if you find that one or two cores are sufficient for your smaller workloads. If you choose to go with four cores, you won’t be able to attach I/O drawers to the machine. It will have eight PCIe slots; two are Gen4 (and CAPI 2.0 capable) and six are Gen3. There are options for 12 or 18 small form factor internal disks, and it will have an option to run on 110 VAC power. Reminder: This system will max out at 1 TB of memory.

The S924 Model

The S924 (model 9009-42A) is a 4U 2-socket system with 8, 10, or 12 cores per socket. It will run AIX, IBM i or Linux. It has a total of 11 PCIe slots; five are PCIe Gen4 (4 CAPI 2.0), and six are PCIe Gen3. There are options for 12 or 18 small form factor internal disks.

The S914, S924, and H924 are all capable of including internal RDX media. The selection of RDX will affect how many internal disks the machines can hold, but note that none of the six systems will have internal DVDs or tape drives. Plan on doing more with USB flash media, external USB connected DVDs and network based operating system installations going forward.

The H922 Model

The H922 (model 9223-22H) is a 2U 1- or 2-socket system with 4, 8 or 10 cores per socket. It will primarily run SAP HANA, but can run up to 25 percent AIX and IBM i core activations. It has nine PCIe slots; five are Gen4 (4 CAPI 2.0) and four are Gen3. It can have up to eight small form factor drives.

The H924 Model

The H924 (model 9223-42H) is a 4U 2-socket system with 8, 10, or 12 cores per socket. It will run SAP HANA with up to 25 percent AIX and IBM i core activations. It has a total of 11 PCIe slots; five are PCIe Gen4 (4 CAPI 2.0), and six are PCIe Gen3.

An interesting feature with all these machines (except for the L922) is the capability to run NVMe devices. The POWER9 scale-out systems will support up to 4 x 400 GB M.2 form factor NVMe devices on the S914, S922, S924, H922 and H924. This should be particularly beneficial for environments that include VIO servers, since the NVMe devices can be used as your internal boot media. This is certainly more convenient and cost effective compared with ordering a split backplane and hard drives.

The POWER9 Software Stack

Here’s the software stack you’ll need to run on these machines:

Firmware level FW910

HMC code level V9R1.910

VIOS 2.2.4, 2.2.5, 2.2.6

AIX 7.2 TL2

AIX 7.2 TL0, TL1 (P8 Compatibility Mode)

AIX 7.1 TL4, TL5 (P8 Compatibility Mode)

AIX 6.1 TL9 (P7 Compatibility Mode)

IBM i 7.3 TR4

IBM i 7.2 TR8

Ubuntu 16.04.4 LTS (P8 Compatibility Mode)

RedHat RHEL 7.4 LE (P8 Compatibility Mode)

SuSE SLES 11 SP4 (P8 Compatibility Mode)

SuSE SLES 12 SP3
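
Since several of those levels run in POWER8 or POWER7 compatibility mode, it’s worth confirming which mode a given LPAR actually ends up in once it’s moved. On AIX, prtconf reports it (the exact field name varies a bit by level):

prtconf | grep -i mode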

For an overview of AIX 7.2, read this. Incidentally, I’ve seen roadmaps for AIX 7.1 and 7.2 that extend to 2027. Our future’s so bright, AIX pros should don protective eyewear.

To round things out, there’s a new 19-inch rack option, the 7965-S42.

Statement of Direction: AIX VM on Hyperconverged Systems

One other tidbit I caught in today’s news: In a statement of direction, IBM said it “intends to enable selected AIX VM guests on IBM Hyperconverged Systems powered by Nutanix (CS series).” I wrote a previous AIXchange post about Nutanix running on POWER nodes, and I’ll revisit this topic in the near future.

We’ve been talking about POWER9 for awhile now, but soon it will actually be in our computer rooms. I can’t wait.

Hardware Maintenance EOS Extension on the Way

Edit: All of these are still important considerations

Originally posted February 6, 2018 on AIXchange

Back in November IBM announced a hardware maintenance end of service (EOS) extension for customers with unsupported legacy systems. This offering is expected to be available in the spring:

IBM Hardware Maintenance End of Service Extension is the answer for clients who are not able to migrate off IBM devices prior to the end of service (EOS) date. With this offering, IBM may continue to provide support to clients beyond the effective EOS date based on availability of repair parts, skills, and engineering field support.

IBM recognizes that there are many reasons why clients might be unable to migrate to replacement technology prior to a device’s EOS effective date, and therefore require extended support for a period of time. With Hardware Maintenance End of Service Extension, IBM may continue to provide limited support to clients beyond the effective EOS date based on availability of repair parts, skills, and engineering field support.

For pricing information, contact your IBM representative or IBM authorized Business Partner.

The topic of those who live with legacy systems and how to help them is a drum I’ve been beating regularly of late. And previously I wrote about the barriers to moving forward, including a lack of motivation and the double-edged sword that is hardware reliability.

If I’m being honest, you can scare people into action. I’ve told clients how vulnerable they are and what it can mean to their business if that system that sits in a corner actually goes away. I’ve pointed out that their backups are inadequate and their disaster recovery plans and support options are non-existent. Making these points can occasionally provide motivation (especially if I use a spooky voice).

Seriously though, it’s a shame that oftentimes these customers don’t take action until the worst has happened or is happening. Think about how catastrophic it would be if your system went down and you had no support, no backups and precious few options for replacing that old hardware.

Or think about this: Whatever IBM does, or whatever I or anyone else says, will have only a limited effect. You simply can’t reach some of these enterprises, who may not employ a single IT person who’s up on AIX. And if they don’t have anyone with expertise on their systems, they most likely don’t have anyone who would bother to keep current on IBM Power Systems and AIX news and information, either. So for as much as I’ve discussed this, I know I’m essentially preaching to the choir. The sad truth is that a lot of these customers–even though they’re fine now, even though they’ve been fine for years–won’t be fine forever. Eventually, lightning will strike, and not all of these businesses will survive.

Follow Me (at a Faster Speed)

Edit: I typically run at 2.8x these days.

Originally posted January 30, 2018 on AIXchange

This article helps me articulate the benefits of listening to information at something faster than normal speed:

Rachel Kenny started listening to podcasts in 2015 — and quickly fell behind. “As I started subscribing to more and more podcasts, they started stacking up, and I couldn’t keep up at normal speed,” the 26-year-old data scientist in Indianapolis told BuzzFeed News. “I also had to listen to the backlist of all the podcasts when I subscribed to them.” So Kenny began listening faster: first at 2x, then she worked her way up to 3x.

Kenny’s listening habits may be extreme, but she’s not alone. Meet the podfasters, a subset of podcast obsessives who listen to upward of 50 episodes a week, by, like Kenny, listening extremely fast. They’re an exclusive group: According to Marco Arment, creator of the Overcast podcast app, only around 1% of Overcast listeners use speeds of 2x or higher. (An app called Rightspeed, which costs $2.99, allows you to listen at up to 10x.)

Yes, I actually do this, and no, I don’t blow through recordings at anywhere near 10X speed. But as someone who frequently tunes into replayed webinars, prerecorded vendor training sessions and the like, I’m all for consuming the most information in a reduced amount of time. Being able to take in two one-hour webinars in a single hour without losing comprehension is certainly valuable to me.

I’ve found that speeding up recordings 1.5X to 2X works best. I do have to make myself focus on the content to understand what’s being said, but honestly, I see that as another benefit. Overall though, at this range I can follow along without difficulty. And guess what? You probably can, too.

More from the article:

In fact, according to behavioral neuroscientist Stephen Porges, because recordings played at higher speeds are at a higher pitch, they are actually easier to hear. Low-frequency noises, like street noise, vacuum cleaners, or airplanes, get in the way of our understanding of people talking; by playing podcasts at a higher speed, the listener is creating a greater acoustic differentiation between the words and lower-frequency background noises. According to Porges, the muscles in the middle ear help to dampen low-frequency sound so we can hear speech more clearly — but if we don’t exercise those muscles (by, say, not having much human interaction), then they don’t work as well. Thus, listening to things at a higher frequency, and speed, could be helpful.

I speed up my recordings in a couple of ways. If it’s something I can download in .mp4 format, for example, I’ll open it with VLC and select Playback > Speed from the menu. This gives me options to go faster or slower. For YouTube videos that I play in Firefox or Chrome, I’ll log in and click the settings icon, and from there I can set the speed. Various browser plugins also allow you to control video playback speed. Fun fact: These plugins also work with Netflix on your computer, so binge-watching a series can go that much quicker.

Oddly enough, I haven’t figured out how to do this with my TV. At least I’ve yet to find the DVR controls that speed up the content. Admittedly, I’m not that motivated to find a solution, since I can always pop old school DVDs into my computer and use vlc.

Whether you’re consuming AIX information or trying to catch up on a favorite TV series, I urge you to explore this. And if you do “speed listen,” tell me about it in comments.

A PowerAI Primer

Edit: Some links no longer work.

Originally posted January 23, 2018 on AIXchange

I found this IBM developerWorks post about PowerAI on the IBM Linux on Power Twitter feed (@ibmpowerlinux).

This information is a pleasant surprise. Articulating why customers should care about PowerAI can be challenging. In many cases this workload is handled by departments or organizations that are different from the ones we typically work with:

PowerAI is an IBM Cognitive Systems offering for the rapidly growing and quickly evolving artificial intelligence (AI) category of deep learning. PowerAI brings a suite of capabilities from the open source community and combines them into a single enterprise distribution of software that incorporates complete lifecycle management from installation and configuration; data ingest and preparation; building, optimizing, and training the model; to inference; testing; and moving the model into production.

Busy as we are tending to AIX servers and workloads, topics like TensorFlow or Caffe seldom come up. We might skim articles about AI or deep learning, but we quickly move on. But this post connects the dots for us:

Deep learning is the fastest growing subcategory of machine learning and uses software neural networks to help develop patterns of analysis within the system to generate predictive capability: deep learning is a platform that is capable of effectively learning how to learn, and it is immensely powerful for helping clients get the most out of their data.

You may think this information applies only to some distant future, but I find it quite timely. Look at it this way: Things are very different in our data centers today compared to 20 years ago. We need to have an idea of what’s coming over the next 20 years: What will PowerAI give your organization?

  1. Helps to make deep learning easier and faster for organizations….
  2. Designed to provide an end-to-end deep learning platform for data scientists.
    • Ready-to-use deep learning frameworks (TensorFlow, IBM Caffe, and BVLC Caffe).
    • Distributed as easy-to-install binaries.
    • Includes all dependencies and libraries.
    • Easy updates: Code updates arrive from a repository….
  3. Designed for enterprise scale. PowerAI enables clients to distribute the training of a model across many servers, with the potential for greatly improving performance. What used to take weeks on a single server can potentially now be completed in just hours. This distributed capability is also transparent to the application logic previously written for a single server implementation. It is the best of both worlds: potential performance improvements without having to change the application code.
  4. Deep learning to unleash new analytic capabilities….
  5. Training neural network models. With PowerAI, data scientists have visual tools for understanding accuracy while the model is running; If accuracy is not high, the model can be stopped without wasting additional time.

IBM intends to deliver IBM PowerAI Vision, an application development tool for computer vision workloads. IBM PowerAI Vision is intended to automatically train deep learning models for different image and video input data sets. 

Also check out these PowerAI videos: a shorter version, a longer version and an installation how-to.

Legacy Environments: What Can Be Done?

Edit: Yes, 5.3 is still out there.

Originally posted January 16, 2018 on AIXchange

Following up on this post about customers that continue to rely upon legacy systems, I’m curious: What would you do if you had to manage an environment with old POWER machines running AIX 5.3?

I still see it every now and then. For instance, recently I was talking to an executive in a manufacturing organization. This organization is filled with old equipment, production-related machines that cost $100,000 or more new. The makers of that equipment are long gone, but he knows how to maintain it and everything is paid for, so this guy is thrilled. As for his IT gear, he wouldn’t care if Windows* XP or DOS was still in place. In his mind he’s printing money with these ancient machines. Even though his OS is likely insecure and his hardware could stop running at any point, he doesn’t care. IT is simply not part of the equation. He’ll run these systems into the ground.

As I said, this sort of thing isn’t common, but it’s also far from exceptional. You’ve probably seen it yourself. Typically these operations have standalone systems that sit in a corner and run critical applications from internal disks. Best case, there may be a tape drive and old mksysb backup scripts that are happily running out of cron. Still, do you ever wonder how often, if ever, the tape drives are cleaned, or if new tapes are purchased? Has anyone ever tried restoring from these tapes?

Of course all these systems have been so reliable that no one even checks on them–not that anyone on staff would know about them anyway. The IT guy who took care of this stuff when it was new probably left years ago, and was never replaced.

They say you can’t help people who aren’t willing to help themselves, but when it comes to these customers, I still want to try. So how would you deal with these types of operations? What are some ways forward to at least try to minimize risk?

I would start with the used market and search for the same model hardware with the same tape drive–although even this option is becoming a challenge. If I could find similar hardware, I’d take a mksysb from the source system and try to restore it onto the “new” box. What I like about a secondary system is that this testing can go on without affecting anything. Plus, if you need to back out, it’s as simple as going back to the original machine. At least this provides some peace of mind, because if it all works, you know that there’s a good backup and hardware to restore it to.
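
For the mksysb piece, the basics haven’t changed in years. A minimal sketch, writing the image to a file path I’ve made up (a local tape drive would be /dev/rmt0 instead):

mksysb -i /backup/oldhost.mksysb        (create the bootable system image, regenerating /image.data first)
lsmksysb -lf /backup/oldhost.mksysb     (list the volume group information stored in the image as a sanity check)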

If that cloned hardware can run, then it’s possible to try to upgrade the “new” machine, again with no downtime. Of course this is no sure thing, but surprisingly, some applications will run on a newer version of the OS. If the OS upgrade doesn’t work, or it’s already apparent that the application won’t run on a newer OS version, the next step would be to get AIX to the latest TL version possible. That would at least allow these folks to consider versioned WPARs or try to restore the mksysb onto a newer generation of hardware running in older processor modes.

If I couldn’t locate a whole new machine, I’d settle for ordering replacement internal disks and planning for an outage. I’d remove the original disks, install the new ones, restore the OS and try an upgrade. If I had to back out, I could just reinstall the original disks. Of course this method comes with its own risks.

These are all what I’d call “the best you can do” options. None of these solutions are ideal, and none change the grim reality that nearly all of these customers are without hardware support, OS support or application support. And for whatever reason, this is considered an acceptable risk. If it ain’t broke, don’t fix it, right?

As I said though, I’m trying to help. Perhaps you can help, too. Hit the comments and tell me what you’d do in this sort of situation.

Security Vulnerability Impacts POWER Processors

Edit: Hopefully you are running current systems / firmware.

Originally posted January 9, 2018 on AIXchange

You’ve most likely heard the news that emerged last week regarding a security vulnerability impacting all microprocessors. There will be patches and fixes forthcoming for different architectures and microprocessors, including IBM POWER processors, as indicated in this Jan. 3 post from IBM’s PSIRT blog:

If this vulnerability poses a risk to your environment, the first line of defense is the firewalls and security tools that most organizations already have in place. Complete mitigation of this vulnerability for Power Systems clients involves installing patches to both system firmware and operating systems. The firmware patch provides partial remediation to this vulnerability and is a prerequisite for the OS patch to be effective. These will be available as follows:

Firmware patches for POWER7+, POWER8 and POWER9 platforms will be available on January 9. We will provide further communication on supported generations prior to POWER7+, including firmware patches and availability.

Linux operating systems patches will start to become available on January 9. AIX and i operating system patches will start to become available February 12. Information will be available via PSIRT.

Clients should review these patches in the context of their datacenter environment and standard evaluation practices to determine if they should be applied.
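
Reviewing them starts with knowing where each system sits today. On AIX, a couple of commands will show the OS level and the current system firmware level before and after you apply anything:

oslevel -s
lsmcode -c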

PSIRT also issued this post, which includes links for POWER and System z.

For a detailed explanation, check out this post from Security Intelligence, an IBM-sponsored site: 

A hardware vulnerability, discovered independently by researchers from academia and Google, underscores a microprocessor flaw that, if exploited, could allow an attacker to read data from privileged kernel memory.

Since this flaw impacts all modern microprocessors, it can affect any device that uses them, including multiple operating systems running on mobile devices, laptops, workstations and servers.

It is important to note that to exploit this vulnerability, a malicious actor would need to execute untrusted code on the physical system or on a virtual machine linked to that system. This may include running content from webpages loaded in web browsers or accessed through mobile apps.

This article also provides these recommendations for mitigating risk:

This new triple-pronged flaw requires a risk assessment process for all organizations. Security teams will have to inventory their assets and determine which ones may be vulnerable. Then, after setting criticality and sensitivity scores, assets should be patched or applied mitigating controls.

An attacker must be able to place code into an application running on the system itself or on a virtual machine attached to the system in order to exploit this vulnerability. Therefore, protections to prevent unauthorized access into systems from outside the infrastructure can serve as a first barrier, as well as existing access controls for internal users.

The most immediate action security teams can take to protect assets is to prevent execution of unauthorized software, or access of untrusted websites, on any system that handles sensitive data, including adjacent virtual machines. Assume that any type of execution, including binary execution, carries the potential for attack.

Also, ensure security policies are in place to prevent unauthorized access to systems and the introduction of unapproved software or software updates.

If the organization is operating environments where preventing execution of unauthorized software is not possible, or is inconsistent, protection may only be possible by applying updates to system firmware, operating systems, and application code, as well as leveraging system-level protections to prevent the execution of unauthorized code.

In cases of update impact issues, mitigating controls should be applied in the interim, but patching is ultimately the remediation needed to prevent potential attacks. Please note that most patches released so far require rebooting systems and must be evaluated for the potential impact of such event on a given asset.

These hardware bugs, incidentally, are being called Meltdown and Spectre. For a quick overview, see this RedHat-produced video, and read this primer:  

Meltdown and Spectre exploit critical vulnerabilities in modern processors. These hardware bugs allow programs to steal data which is currently processed on the computer. While programs are typically not permitted to read data from other programs, a malicious program can exploit Meltdown and Spectre to get hold of secrets stored in the memory of other running programs. This might include your passwords stored in a password manager or browser, your personal photos, emails, instant messages and even business-critical documents. Meltdown and Spectre work on personal computers, mobile devices, and in the cloud. Depending on the cloud provider’s infrastructure, it might be possible to steal data from other customers.

The Internet Storm Center has more. This page also links to a podcast segment where Meltdown and Spectre are explained. 

For those working with Linux distributions, here are some tips for patching vulnerabilities on Spectre and Meltdown. Those on desktop machines need to keep in mind the need to update firmware, OS and browsers.

As I see more information from IBM Support, I will do my part in getting it out there, both on this blog and on Twitter, where you can follow me @robmcnelly.

Another AIX vs. Linux Discussion

Edit: Some links no longer work

Originally posted January 3, 2018 on AIXchange

Six years ago I wrote an article about how much I love AIX. It’s a topic I’ve revisited a number of times–most recently, here

So let’s have another AIX vs. Linux discussion. Certainly, plenty of folks are still talking about this, in their companies and on Twitter

Among admins, I don’t expect this to change. For one thing, let’s face it: we’re an argumentative lot. We argue over vi vs. emacs, Gnome vs. KDE, Debian vs. Redhat. People think about their workloads and where they want to run them. Do they want to take advantage of POWER’s performance and run Linux on Nutanix clusters on POWER nodes? Do they want to plan on running POWER9 with NVIDIA GPUs and work on machine learning or artificial intelligence, or build the world’s fastest supercomputer? Those solutions will be running Linux. 

Of course Linux is growing, and of course IBM embraces Linux across the mainframe and POWER servers. As you carry on your own OS discussions, by this logic you should add x86 vs. POWER to the mix. 

My point is still that I see advantages with AIX and the VIO server. Do I use Linux? Of course I do, and have for years, just like most of you. 

My issue with Linux comes down to a simple question: Which Linux? Which version of Linux do you love? Redhat? Debian? Suse? Which LVM and filesystem do you love? Which backup solution do you love? Which hypervisor do you love? I could go on, but you get the point. When you’re talking about Linux, you can be talking about many different things. 

Are you a big fan of systemd and the way that solution is heading? Which desktop manager do you love? Is Linux trying to satisfy both desktop users and enterprise users? 

I don’t mind the debates, though I don’t agree with the dire predictions. My take? AIX will be just fine. I’ve seen timelines for AIX and IBM i that extend years from now. See for yourself: Look for the AIX TL release dates and end of service pack support charts. As of this writing I see AIX 7.2 TL2 going to 2020, and AIX 7.2 TL5 going out to 2022. As more technology levels are released, you can be sure that those dates will keep marching into the future. 

At the end of the day, I am pretty sure we can all agree on this: at least we’re not Windows administrators.

Stuff Old People (Still) Say

Edit: I am sure you can think of more

Originally posted December 19, 2017 on AIXchange

Some months ago I saw this on Twitter, and it’s stuck with me:

@krhoyt Said “Do I sound like a broken record?” in a meeting, and wondered if it even made sense any more. Has it weathered the years? #getoffmylawn

I guess that got me thinking about things that I say, and made me wonder why I still say them. Of course, readers of this blog know I’m old school. For instance, I still have a turntable and vinyl records, though I don’t listen to them as often as I once did. I like to buy reliable old cars, maintain them, and drive them into the ground. In addition, I still have a landline with a wired (not wireless) headset, and when people talk about Slack, it makes me reminisce about IRC.

But does anyone under the age of 45 actually understand what it means to sound like a broken record? Youngsters today, who hold a lifetime’s worth of music in their pockets, view cassettes as antique.

For that matter, why do we still say “hang up the phone” when we’re obviously just tapping the end call button on our smartphones? Or why does the save icon on most programs on your computer still look like a floppy disk? Few kids have even seen a floppy. Then again, when you want an image to symbolize saving a document, what other icon would make sense? Maybe a cloud?

Of course not everything old is necessarily bad. Sometimes things from our past even make a comeback. For instance, I’ve read about kids rediscovering TV antennas. Using this technology to get free over-the-air TV channels is mind-blowing for them.

I know I’m getting Seinfeldian here, but I really wonder about these things. And seriously, what is the deal with airline food?