Saved by uuencode (yes, uuencode)

Edit: When was the last time you used this?

Originally posted August 8, 2017 on AIXchange

I’d honestly forgotten about uuencode until recently, when I actually needed it:

Uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems.

The name “uuencoding” is derived from “Unix-to-Unix encoding”; i.e. the idea of using a safe encoding to transfer Unix files from one Unix system to another Unix system but without guarantee that the intervening links would all be Unix systems. Since an email message might be forwarded through or to computers with different character sets or through transports which are not 8-bit clean, or handled by programs that are not 8-bit clean, forwarding a binary file via email might cause it to be corrupted. By encoding such data into a character subset common to most character sets, the encoded form of such data files was unlikely to be “translated” or corrupted, and would thus arrive intact and unchanged at the destination. The program uudecode reverses the effect of uuencode, recreating the original binary file exactly. uuencode/decode became popular for sending binary (and especially compressed) files by e-mail and posting to Usenet newsgroups, etc.

It has now been largely replaced by MIME and yEnc. With MIME, files that might have been uuencoded are instead transferred with base64 encoding.

I was working with IBM Support, trying to troubleshoot a server that didn’t have a working network connection. It was connected to the HMC, and I could create a console. IBM wanted me to take a snap and send it to them. Taking the snap was easy, but how could I send it without a network connection? uuencode to the rescue.

IBM gave me these steps:

    #snap -r
    #snap -ac
    Use PuTTY to ssh into the HMC as hscroot.
    #vtmenu
    Select the managed server and the lpar.
    Login as root.
    Left-click the icon in the top left corner of the PuTTY window.
    Select Change Settings.
    Go to Logging and check “All session output.”
    Click on browse to locate where the file will reside locally (in this case, on my laptop).
    Go back to the LPAR and run:

    #uuencode /tmp/ibmsupt/snap.pax.Z /tmp/snap

At this point, the file was “uuencoded,” and these characters scrolled over my screen (and were also logged to the file I’d specified):

M=/A%;-$,#3.)G\10XBAQ/[@=R52H#)0!`,.P8?BPEG@Z++04$0>)CY9@IMUP
M?IC]\)#L6P*$-<."R_XPCUA*C+8\-\('9(>BX2QQ8[@S'"*V!TN&=4\1`MTP
M/WA,?,5X6HZ$?\,E81:Q<-A%S"9F6FHT89Z]@_7P&0`L[!YN$7.)W54_XLCP
M7'@#Q2'X68:'C\0>QR3QR'%)7"AF6H83H`-5D=5P<EA5M%DH!2&(GPR?XE&D
M5<%"+"HF"Y.*K8>G8O(*2*?.TRI&"[^*3<7VAMF"K!A4O"JN$0^&,\0_);<#
MKGA-0UOH%!N+C\7(XF2QLGA9S"QN%CN+G\5A3<$P:QA8S!66$TN'-:.18MLP
M/M@IJW_U-^Z)3<.8XHRCIMCCR"F&%I^&F5$8XL</NMA&#"R6"*L%V,)SNSFQ
M7[A.#-6<%'>$+<7:83*QG]@UO!\&#@N*/\3FXHRDM#@YO"[^%4>$:\**X0[Q
MEE@>G)7*%M.'Z<)!X2*Q:#A<+'$<%Z<MW\,!8XPD5%B,*156!U.%!PH(8JQP
M540K/$3(&)N,3\9.X8$Q6;A@/"V."&^(,9J-XD$QE]@'51TF$=>%A<)\XHJC
MF?A/S!WV%RN),<8HXYTFRYA@7"N^$2>&8<8*8WFQ<+CR]!BV$W^)A\1A8OPP
MN%A?O#/F%ZN(?<:;8J!QT#@=Z23>&GN-O\8O8@EQT9A:G!*V%F.,L<4]X2]Q
M;E@?^2TN#5.,(XX68XECN1AL/(X4&U.(6\,QX@LQNS@B_!S.6L:,`<9H8Q&Q
MO3AZ@!V^%%.-8YIG8D#Q9OAG["BF&Y<C[<;IXO$3J_ANW"Y&&/N-D\;7XI(F
MKBQMO"VV#^.,I\9]XJKQ7CA-_"]>$SN.R9$"XG9"QSBQ8"`&&>>*1<:4I05Q
MR?AT3#NN'<LL(<=[8\FQK[AEK"%F(+2',<=G8\>HPWA$S!&.'IB(I<9NX\.Q
MU5-Q3#IF',N,;<<F1]RQP5AW+".>'J>%<<278QWQPIA.["/F'-='A\2R`XGQ
M9_AM#'&,&T<<Y\;0X[1E13-]S#YN'[N#J<35OTL1EABD"3CF'M.,O<3;XOO9
M/MAP'#I&'/.,^\6!8N<QP-A]G'%T$T,/X\3TXS4QP[A+[#3.'^>)V<;\8YVQ
M_]AJC"3&&I.+M<8$9(CCH9A1JRAF%`./'<71TM]8S?@@62G^'.>/^\?+XYYQ
*`#EKW#C.($,<`<8$
`
end

I then sent the file to IBM and let them handle the decoding, but it’s easy enough to do it yourself. I opened the file with vi and removed some garbage at the beginning that was a result of logging everything with PuTTY. My cleaned-up version started with:

    begin 600 /tmp/snap

I renamed the file so it ended with a .uue extension. Then I opened it using 7-Zip on a Windows machine. Alternatively, you could move the file to a system where uudecode is installed and decode it that way.
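If you have access to uudecode, the decode itself is a one-liner. A minimal sketch, following the file names above:

    # uudecode /tmp/snap.uue

This recreates the file named on the begin line (/tmp/snap in this case), which you can then rename and uncompress as usual.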

The point is that not having network access doesn’t mean you can’t move files around. If you can at least get to the console, you can still transfer files.

Ultimately, I was able to get IBM the information they needed. But let’s get back to uuencode. It’s mentioned in these tips and tricks:

63. How do you send an attachment via mail from AIX?

Uuencode is the answer:
uuencode [source-file] [filename].b64 | mail -v -s "subject" [email-address]

For example:
# uuencode /etc/motd motd.b64 | mail -v -s "Message of the day" email@hostname.com
I use the .b64 extension, which gets recognized by WinZip. When you receive the email in Outlook, you will have an attachment that can be opened with WinZip.
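If you want to see the round trip without involving a mail client, uudecode reverses the process. A quick sketch (file names are illustrative):

    # uuencode /etc/motd motd.b64 > /tmp/motd.uue
    # uudecode /tmp/motd.uue

The second command recreates the encoded payload as motd.b64 in the current directory.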

Have you used uuencode/uudecode lately? Does this topic bring back any old memories?

Big Changes are Coming to the HMC

Edit: Some links no longer work

Originally posted August 1, 2017 on AIXchange

Are you bored with that same old x86 version of the HMC that you’ve used for years? Are you tired of the same old interface that you’ve long mastered? Are you ready for a change? Well ready or not, change is coming.

Beginning with V8R8.7.0, there will no longer be an option to run the classic HMC GUI. Everyone will need to get on board with using the enhanced version of the GUI. The good news is that performance with the enhanced GUI has improved since it originally came out, so now is the time to take another look.

Even if you like the “classic” HMC GUI, or if you’re simply accustomed to it, this development shouldn’t come as a surprise. The enhanced GUI option that’s available when you log into your HMC has been around for more than two years now.

Additionally, we’ve had the option of using the virtual HMC (vHMC) for some time as well.

Until now, those were your choices regarding the HMC: You could run the HMC on dedicated x86 hardware and log into the classic or enhanced GUIs, or you could run the vHMC in VMware or KVM and log into the classic or enhanced GUI. That was it.

But since IBM Power Systems hardware will soon be capable of running HMC code, we’re about to have four HMC options: In addition to being able to run a vHMC on x86, or an HMC on dedicated x86 (as you’re probably used to), you’ll also be able to run a vHMC on a POWER LPAR. This is new. Last but not least is the other new option: we’ll be able to run HMC code on POWER8 hardware once the new HMC model becomes available later this year.

If you’d like to learn more, here are a couple of good resources. First, there’s this AIX Virtual User Group replay. Download the slides (here and here) and listen here. For something quicker, check out this Q&A.

While the original iteration of the vHMC is certainly interesting, I cannot wait to test out a vHMC running in an LPAR on a POWER server. I’m also excited about the new hardware offering, which should GA in the second half of 2017. Since it’s a POWER8 server, we can finally manage our Power Systems hardware fleet without any need for x86 in our environment at all.

So where are you headed with the HMC? Will you stick with x86 HMCs, or will you move to a Power Systems version? Do you have plans for virtual HMCs?

Once I get hands on with the different options, I’ll be sure to share my thoughts and findings with you.

The Place to Go for AIX Updates

Edit: Some links no longer work.

Originally posted July 25, 2017 on AIXchange

I’ve previously mentioned my fondness for reading technical documentation. Another great resource along those tech doc lines is the AIX updates IBM provides. For instance, here’s what’s new for AIX 7.2, and here’s the update for AIX 7.1.

Both of these pages provide links to documentation and information that has been changed, broken down by month.

For example, on the AIX 7.1 page, you’ll find:

June 2017
The following information is a summary of the updates that are made to the AIX 7.1 documentation:

  • Added information about the Fibre Channel Adapter Outstanding-Requests Limit tunable parameter in the Disk and disk adapter tunable parameters topic.
  • Added information about statistics monitoring in the Monitoring cache statistics topic.
  • Added information about the lsmpio command in the MPIO-capable device management topic.
  • Added description about the pr_sysset array in the /proc File topic.
  • Updated information about new installation images in the Installing optional software and service updates using SMIT topic.
  • Updated information about PSz, APP and other metrics in the topas command.
  • Updated information about new installation images in the install_all_updates command.
  • Updated information about the -w flag in the chlv command and the mklv command.
  • Updated information about the -cio option in the mount command.
  • Updated information about the icmptimestamp flag in the no command.
  • Updated information about the -L flag in the lslpp command.
  • Updated information about the pinned pages statistics for the -v option in the vmstat command.

April 2017
The following information is a summary of the updates that are made to the AIX 7.1 documentation:

  • Added an example for the sleep, nsleep or usleep subroutine.
  • Updated information about the application servers and the database servers in the nimadm command.
  • Updated information about the flags in the ld command.
  • Updated information about the /tmp directory files in the Migrating AIX topic.
  • Updated information about removing an adapter from EtherChannel in the Making changes to an EtherChannel using Dynamic Adapter Membership topic.

Here’s a glimpse at the AIX 7.2 page:

June 2017
The following information is a summary of the updates that are made to the AIX 7.2 documentation:

  • Added information about the Fibre Channel Adapter Outstanding-Requests Limit tunable parameter in the Disk and disk adapter tunable parameters topic.
  • Added information about statistics monitoring in the Monitoring cache statistics topic.
  • Added information about the lsmpio command in the MPIO-capable device management topic.
  • Added description about the pr_sysset array in the /proc File topic.
  • Updated information about new installation images in the Installing optional software and service updates using SMIT topic.
  • Updated information about PSz, APP and other metrics in the topas command.
  • Updated information about new installation images in the install_all_updates command.
  • Updated information about the -w flag in the chlv command and the mklv command.
  • Updated information about the -cio option in the mount command.
  • Updated information about the icmptimestamp flag in the no command.
  • Updated information about the -L flag in the lslpp command.
  • Updated information about the pinned pages statistics for the -v option in the vmstat command.

April 2017
The following information is a summary of the updates that are made to the AIX 7.2 documentation:

  • Added information about vSCSI disk support in the Live Update restrictions topic.
  • Added information about the thin-provisioned Shared Storage Pool (SSP) storage in the Best practices for the Live Update function topic.
  • Added an example for the sleep, nsleep or usleep subroutine.
  • Updated information about the application servers and the database servers in the nimadm command.
  • Updated information about the flags in the ld command.
  • Updated information about the /tmp directory files in the Migrating AIX topic.
  • Updated information about removing an adapter from EtherChannel in the Making changes to an EtherChannel using Dynamic Adapter Membership topic.

This information goes back to July 2016.

On the left-hand side of each page you’ll find a welcome page, links to the what’s new pages, release notes, and an alphabetical listing of the commands that can be found in the documentation. Be sure to check it out.

Much has Changed, but Not Everything

Edit: Time keeps marching on

Originally posted July 18, 2017 on AIXchange

This blog came to life 10 years ago this week, on July 16, 2007. When I started writing AIXchange, my sons were 8 and 4. Now my oldest is a high school graduate and my youngest is a freshman. Time marches on.

When I started, I was writing about POWER5 and AIX 5.3. HMC Version 7, Live Partition Mobility, WPARs and virtual optical devices were all new solutions/technologies. Within months, I’d turned my attention to AIX 6.1, and then to POWER6.

Now we’re running AIX 7.2 and we await the arrival of POWER9. More of us are running Linux workloads on Power hardware. There is more talk about cloud, cognitive, AI, Blockchain, PowerVC, Live Kernel Patching, flash cache and flash storage.

Some things change. Some don’t.

IBM i is still going strong despite the naysayers talking about the end of legacy hardware and legacy applications. And AIX? Sure, some workloads are migrating away from AIX, but AIX on Power is still an engine that runs core businesses. On-premises solutions remain relevant in today’s world.

As I wrote here, I still love AIX as an enterprise operating system. Even as I do more with Linux, I appreciate the simplicity and goodness of AIX.

I still love attending conferences and meeting readers. I love engaging with people on Twitter and finding links to new information and technology.

I am sure that part of the reason I have been named an IBM Champion is due to this blog.

There is always something to learn, and hopefully I’ll continue to be able to share information for at least another 10 years. So maybe I’ll get to talk about POWER11 or POWER12, or AIX 10. I don’t know what will happen, but I look forward to seeing what the future holds for me and this industry.

Thanks for reading.

Booting an LPAR from a USB Port

Edit: Some links no longer work

Originally posted July 11, 2017 on AIXchange

Have you booted LPARs from your USB ports? It was much easier than I thought it would be.

I had been a little worried after reading the intro to this article:

Note from the editor: There is limited USB support in AIX and only specific devices are supported. Other devices might work, but IBM does not support their usage. JFS2 file systems are not officially supported on USB devices, but you can try this at your own discretion.

I guess I should start from the beginning: Recently I was talking with someone who was having networking issues that prevented him from using his NIM server. He wanted to know if he could use a USB flash drive to install his LPAR instead. While this has been supported for quite a while (see here and here), I hadn’t taken the time to mess with it.

From the explanations in these posts (here and here), it seemed easy enough, though. We had a test machine, so we gave it a try.

First, we used dynamic LPAR (DLPAR) to get the USB controller attached to this test LPAR. On this system the adapter came up as Universal Serial Bus UHC Spec. After we attached it and ran cfgmgr, we verified that the device was there (usbms0 is what we were interested in).

    # lsdev | grep -i usb
    usb0       Available       USB System Software
    usbhc0     Available 00-08 USB Host Controller (33103500)
    usbhc1     Available 00-09 USB Host Controller (33103500)
    usbhc2     Available 00-0a USB Enhanced Host Controller (3310e000)
    usbms0     Available 2.3   USB Mass Storage

Then we checked for a virtual CD or an .iso image that was mapped to this LPAR. No dice on either. So I decided to copy a physical DVD to the virtual media library and present that to the client LPAR:

    # lsdev | grep cd

Nothing came back, so in the VIO server I ran:

    mkvdev -fbo -vadapter vhost4

This set up the virtual optical device that was connected to the LPAR.

Then, with the physical CD loaded in the drive, I ran:

    mkvopt -name aix7disk1.iso -dev cd0 -ro

This created the .iso image in the /var/vio/VMLibrary filesystem.

After it finished copying from the physical CD, I was able to load it in the virtual CD using:

    loadopt -disk aix7disk1.iso -vtd vtopt6

(Note: vtopt6 was created earlier when I ran the mkvdev -fbo command.)

I was able to verify it was there by running:

    lsmap -vadapter vhost4

Once the .iso image was mounted in the virtual optical device, I was able to log into the client LPAR and run cfgmgr. That made the cd0 device appear. It was linked to the .iso image in the virtual optical device by virtue of the loadopt command we ran earlier.

    # cfgmgr
    # lsdev | grep cd
    cd0        Available      Virtual SCSI Optical Served by VIO Server

Now that the LPAR had the source DVD (the AIX 1 DVD loaded into /dev/cd0) and the USB device (/dev/usbms0), I was ready to run the dd command:

    # dd if=/dev/cd0 of=/dev/usbms0 bs=4096k
    1010+1 records in.
    1010+1 records out.

At this point, we were able to reboot the LPAR and go into SMS and get it to boot from USB. Booting took a bit longer than it would from a virtual optical device, but it still happened quickly enough.

This is a handy procedure if you need to load a VIO server onto a bare metal machine, for example. It’s especially valuable to know if you either don’t have an optical device or you’re using a split backplane and your optical device is connected to the other VIO server.

So how many of you have done this? What else are you doing with your USB drives?

Info on VIO Commands

Edit: Some links no longer work.

Originally posted June 27, 2017 on AIXchange

I want to highlight a few VIO commands, and point you to where you can find even more commands.

For example, have you heard of the VIO rules command?

Purpose
Manages and deploys device setting rules on the Virtual I/O Server (VIOS).

Syntax
rules -o operation [ -l deviceInstanceName | -t class/subclass/type ][ -a Attribute=Value ] [-d] [-n] [-s] [ -f RulesFile ] [-F] [-h]

Description
The rules command is used to capture, deploy, change, compare, and view VIOS rules. It leverages AIX Run Time Expert Solution (ARTEX) technology. VIOS provides predefined default rules that contain the critical rules for VIOS device configuration that are recommended for VIOS best practice. You can use the rules command to manage device settings rules on VIOS.

You can capture, deploy, import, list, compare, modify, delete, and add them.
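A few representative invocations, based on the syntax above (treat this as a sketch rather than authoritative usage):

    $ rules -o list       # list the rules currently on the VIOS
    $ rules -o capture    # capture the current system settings as rules
    $ rules -o deploy     # deploy rules; they take effect at the next boot
    $ rules -o diff -s -d # compare system settings against the default rules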

The IBM Knowledge Center has quite a few examples on using the rules command. It also covers the rulescfgset command:

Purpose
Helps to simplify the rules deploy management process.

Syntax
rulescfgset

Description
The rulescfgset command is an interactive tool to guide a user deploying current rules, upon user direction. It identifies if current system settings match the factory default rules. If any mismatch is found, current rules are merged and updated with the recommended default setting rules automatically. When you allow new rules to be applied, the updated current rules are deployed on the system. The new rules do not take effect until the system reboots. If you do not want to deploy immediately, it returns normally. The rulescfgset command updates current rules, as needed and makes the Virtual I/O Server (VIOS) ready at any time to deploy new rules.

Lastly, here is the whole list of VIO and IVM commands listed alphabetically. This page tells you what’s new with the VIO and IVM commands. I’m willing to bet you’ll find something here that you did not know existed.

I encourage you to check the Knowledge Center periodically for changes and updates to this information.

PowerHA Now Includes HTML Reporting Capability

Edit: Some links no longer work.

Originally posted June 20, 2017 on AIXchange

Here’s an informative write-up about the native HTML report with PowerHA:

IBM PowerHA 7.1.3 has a very nice feature: the native HTML report.

We can get this report via the clmgr command, and no external requirements are needed; simply have the base software installed.

This HTML report contains very useful information from the cluster:

  • General Information about the cluster.
  • Nodes configuration.
  • Resource Groups and Application Controllers.
  • Network Configuration (IP Labels, Interfaces…).
  • Shared LVM components.

We can use the HTML report as a great summary of our IBM PowerHA cluster!!

To create the HTML report with clmgr command:

# clmgr view report cluster TYPE=html FILE=/tmp/powerha.report

This page also includes a sample report. For details about that, check out pages 63-64 of this Shawn Bodily presentation:

Native HTML cluster report is now available via clmgr:

  • Alternative to the IBM Systems Director reporting feature.
  • No external requirements; available in the base product.

Benefits include:

  • Contains more cluster configuration information than any other report.
  • Can be scheduled to run automatically via AIX core abilities (e.g. cron); see the sketch after this list.
  • Portable. Can be emailed without loss of information.
  • Fully translated.
  • Allows for inclusion of a company name or logo into the report header.
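As a sketch of that cron idea (the schedule and paths below are assumptions, and the clmgr location can vary by PowerHA level):

    # crontab entry: regenerate the HTML cluster report every Monday at 6 a.m.
    0 6 * * 1 /usr/es/sbin/cluster/utilities/clmgr view report cluster TYPE=html FILE=/tmp/powerha.report.html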

Limitations:

  • Per-node operation. No centralized management.
  • Relatively modern browser required for tab effect.
  • Only officially supported on Internet Explorer and Firefox.


On another note, if you haven’t seen the new user interface, watch this short PowerHA UI video:
https://www.youtube.com/watch?v=d_QVvh2dcCM

As the product continues to evolve I’ll continue to cite interesting new features. Let me know if there’s anything you think I should highlight.

Service and Productivity Tools for LoP Users

Edit: Some links no longer work.

Originally posted June 13, 2017 on AIXchange

If you’re running Linux on Power, are you running these service and productivity tools?

While Linux lacks the diagnostics and reporting capabilities that are built into AIX, these tools help bridge that gap. There are tools to help you with hardware inventory. There’s information about Linux platform diagnostics. Although it’s not quite the same as it is on AIX, you will find the explain_syslog command and the diag_encl utility, for example.

There’s an inventory scout, which surveys the system for vital product data. There’s a servicelog and related utilities to manage events that require service. There are power and environmental management features, service aids, performance management, RAID adapter utilities and the IBM Electronic Service Agent tool.

Finally, there’s information about service aids, which provide things like lparstat, lsslot, bootlist, etc.

If you’re running Linux on Power hardware, it just makes sense to run these packages. Go here for help with installation.

The idea that I can run lscfg, lsmcode, lsvpd and lsvio on a Linux partition warms my heart. Linux administrators need to be able to do these things that AIX admins have taken for granted for years.

A Techie’s Guide to Recreational Reading

Edit: Some links no longer work.

Originally posted June 6, 2017 on AIXchange

What sort of person actually enjoys reading through random IBM support documentation? A lot of us, I imagine. I know I do. I find reading docs in my spare time helps me when I’m actually dealing with a problem. I’ll remember reading something, and even if I don’t recall where I saw it, I can usually locate it with minimal digging.

I frequently browse IBM Techdocs. I’ll search on “aix” or “hmc” for example, and change the “any time” field to something fairly current. You never know what might pop up.

For example, I recently read this document about initiating a resource dump from the HMC. It wasn’t anything I needed to do at that moment, but if I ever do run into this issue, I will know where to look and I know I’ll feel comfortable with the procedure as it won’t be coming from out of the blue. To me it feels like refreshing my memory as opposed to learning a new concept.

The document offers some introductory information, and then goes through the various steps you would take:

The Resource Dump function can be used to generate Power Hypervisor Resource Dumps, Partition Firmware Dumps, SR-IOV Dumps, and Power Hypervisor Macro Resource Dumps….

A non-disruptive Hypervisor Resource Dump, initiated using selector system or the default blank selector, generates a SYSDUMP dump file. This can take an extended amount of time to complete. The dump status will show ‘In Progress’ until it completes. If IBM Support did not specify a selector, then this is the desired dump type.

Toward the end you can see how to do the same thing from the HMC command line:

Alternatively, these dump types may be initiated from the HMC command line, using:
startdump -m {managed server} -t resource -r "{resource selector}"

Example 1: startdump -m 9119-MME*21ABCDE -t resource -r "system"
would generate a non-disruptive Hypervisor Resource Dump, SYSDUMP type, for server 9119-MME*21ABCDE.

Finally, there’s an option to send the dump into IBM for analysis. Read this for details:

This document describes how to retrieve and send an existing server dump to IBM support using HMC Version 7 and HMC Version 8 (classic or enhanced) user interface. Dump types include any server dump such as FSP dumps (FSPDUMP), system or platform dumps (SYSDUMP), power dumps (PWRDUMP), resource dumps (RSCDUMP), and logical partition adjunct dumps (LPADUMP).

Be sure to read the whole thing. There’s additional information that I haven’t presented here.

Perusing documents from time to time is a simple way to expand your knowledge base. And I do enjoy it, because I never know what will turn up, or when I might need it.

IBM Spectrum Protect Live Demo

Edit: I still love live demos that I can play with vs. YouTube videos

Originally posted May 30, 2017 on AIXchange

Recently I was shown a link to the IBM Spectrum Protect Live Demo, and I thought I’d tell you about my experience with it.

The statement on the landing page sums it up well:

Anyone can be a data protection expert by using the Operations Center!

Even with thousands of systems, virtual machines, and applications over multiple sites, you can quickly verify that your data is protected and identify any trouble spots.

You can follow the task scenarios that are provided or explore the sample environment on your own.

Not all product capabilities are included. If you explore beyond the provided scenarios, some tasks might not complete or might be disabled.

Note that the login information (administrator name and password) is provided on the landing page. You should be able to access the site with a click.

On the right side of your screen, if you click on the icon next to the Guided Demo heading, you’ll learn about the demo itself:

This demonstration features a live IBM Spectrum Protect environment (formerly the IBM Tivoli Storage Manager product family). The demo includes sample data for three backup servers running in a virtual environment.

For best results, use one of the following web browsers on a Windows system to view the demo:
    Microsoft Internet Explorer 10 or 11
    Mozilla Firefox ESR 24 or later

Ensure that your screen resolution is set to a minimum of 1680 x 1050 pixels.

Tip: The layout of the Operations Center automatically adjusts to fit the available space. If the demo instructions and images don’t match what you’re seeing, zoom out as needed (Ctrl + - in most browsers).

You can choose from basic tasks that provide details about what you should expect from the demo. You’ll see how to use the dashboard, how to add a client, how to manage your workflow, and how to customize views.

Of course this information is helpful, but as someone who learns by doing, I like to jump in and click around so I can see how intuitive the interface is. I immediately noticed a listing of the number of clients, applications, virtual machines, systems, services, alerts and activity along the left side of the screen. Information about servers, storage and data availability was on the right side.

There are multiple areas you can explore, and different scenarios to try. With the enhanced HMC, along with interfaces to the XIV and Storwize systems, IBM has made substantial efforts to help administrators manage their hardware more easily.

I like what I see, and I’d love to see other demo versions of software so we can get used to the look and feel of other products.

CoD Remains an Under-Utilized Option

Edit: Still a powerful tool for your toolbox. Some links no longer work.

Originally posted May 23, 2017 on AIXchange

Although Capacity on Demand has been around for years, I still encounter customers who are unaware of this option. So here’s a primer/reminder:

Certainly if you run enterprise servers, you should know about CoD. The idea behind it is that you know you will need more memory or cores in the future, but you don’t know precisely when. Or maybe it’s only needed temporarily. (Think of any seasonal business.) Rather than add new servers or hardware, which can require planning and possible downtime, IBM will ship these enterprise class systems that have the hardware physically installed, but it’s activated and paid for only when you are ready to use it. You can choose to be charged only for the resources you consume, or you can simply have the hardware activated permanently.

Here’s more about the CoD offering:

Capacity on Demand allows you to easily activate processors and memory without disruption to your operations, paying for the increased capacity as your needs grow. The following programs are available:

Capacity Upgrade on Demand (static)
Capacity Upgrade on Demand provides the ability to bring new capacity on-line quickly and easily. Processors and memory can be activated dynamically without interrupting system or partition operations. Processors can be activated in increments of 1 processor, while memory can be activated in increments of 1 GB. As your workload demands require more processing power, you can activate inactive processors or memory simply by placing an order for an activation feature. You can retrieve, over the Internet, an electronically encrypted activation code that unlocks the desired amount of capacity. There is no hardware to ship and install, and no additional contract is required.

Elastic Capacity on Demand (temporary)
Elastic CoD (formerly known as On/Off Capacity on Demand) provides short-term processor and memory activation capability for fluctuating peak processing requirements such as seasonal activity, period-end or special promotions. When you order an Elastic CoD feature, you receive an enablement code that allows a system operator to make requests for additional processor and memory capacity in increments of 1 processor day or 1 GB memory day. The system monitors the amount and duration of the activations. Both prepay and post-pay options are available.

Utility Capacity on Demand
Utility CoD provides automated use of on demand processors from the shared processor pool for short-term workloads on IBM POWER6, POWER7 and POWER8 processor-based systems. Utility CoD is for customers with unpredictable, short workload spikes who need an automated and affordable way to help assure adequate server performance is available as needed. Usage is measured in processor minute increments.

Trial Capacity on Demand
Trial Capacity on Demand provides the flexibility to evaluate how additional resources will affect system workloads. A standard request is easily made for a set number of processor core activations and/or a set amount of memory activations. The standard requests can be made after system installation and again after each purchase of permanent processor activation. POWER5 and POWER6 servers except the POWER6 595 can activate up to 2 processor cores and/or up to 4 GB of memory. The POWER6 595, Power 770, 780, 795, E870 and E880 can activate up to 8 processor cores and/or up to 64 GB of memory.

An exception request can be made one time, over the life of the machine and enables all available processor cores or memory.

Both standard and exception requests are available at no additional charge….

Trial Active Memory Expansion
Active Memory Expansion can allow a POWER7 or POWER8 server to expand beyond the physical memory limits of the server for an AIX 6.1 partition. Thus a previously memory constrained server can do more work. The degree of expansion depends on how compressible the partition’s data is and on having additional CPU resource for the compression/decompression. A one-time, no-charge 60-day trial allows the specific expansion and CPU usage to be evaluated.

Power Enterprise Pools
Power Enterprise Pools establishes a new level of flexibility and value for systems that operate together as a pool of resources. Mobile activations are available for use on Power 770, 780, 795, E870 and E880 systems and can be assigned to any system in a predefined pool by the user with simple HMC commands. IBM does not need to be notified when these resources are reassigned within a pool. The simplicity of operations provides new flexibility when managing large workloads in a pool of systems. This new feature is especially appealing to aid in providing continuous application availability during maintenance windows. Not only can workloads easily move to alternate systems but now the activations can move as well.

The process of activating resources is explained here. Basically, you get the appropriate code from IBM and enter it into the HMC. Power Enterprise Pools require a little more work, as there are actual legal documents to sign, etc., but it’s a great option for moving activations among a pool of multiple physical servers.

Finally, here’s a list of planning guides, user guides, presentations and other technical information.

How to Seize Information About Your SEAs

Edit: Still a useful technique, some links no longer work.

Originally posted May 16, 2017 on AIXchange

Shared Ethernet adapters have matured as a technology. A few years ago when SEAs were new and a little more esoteric, they were occasionally misconfigured, leading to network issues. Now that we have more experience with them, I don’t hear about many problems with SEAs these days.

As is mentioned in the Power Implementation Quality Standard, you may find that you’re more interested in SEA load sharing because it allows you to utilize all of your 10G interfaces and switch ports. Of course a lot of folks have eschewed explicitly setting control channels and are just using the default control channel. As I said, SEAs are easier to use now.

Nonetheless, on both new and legacy systems running SEAs in their VIO servers, the question still comes up of which interface is the primary and which is the backup at any given time.

Recently, a client wanted to know the status of their interfaces. This has always been my go-to command to see which SEA is active:

    netstat -cdlistats | grep "State:" | grep -v "Operation State" | grep -v "Stream State"

In this case though, it wasn’t sufficient. My client has multiple SEAs and multiple physical and virtual interfaces, but the output from this command only lists the status of the interfaces; there’s no way to tell which SEA is which:

padmin $ netstat -cdlistats | grep "State:" | grep -v "Operation State" | grep -v "Stream State"
LAN State: Operational
    State: PRIMARY
LAN State: Operational
LAN State: Operational
    State: BACKUP
LAN State: Operational
LAN State: Operational
    State: PRIMARY
LAN State: Operational
LAN State: Operational

The following command will tell you about all your virtual interfaces, including those that are part of an SEA and whether they’re available. You can also find out individual adapter IDs and location codes of backing devices:

    lsmap -all -net

To get the names of the adapters that are SEAs, simply add this flag:

    lsmap -all -net -field sea

You’ll see this output:

    SEA        ent1

    SEA        ent2

    SEA        ent3

By the way, if you’re looking for information about other fields that might be of interest to you, check out this document. It also explains how to change the delimiter.
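For scripting, a delimited form of the same output can be handy; for example (a sketch using the delimiter option described in that document):

    $ lsmap -all -net -field sea -fmt :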

Ultimately though, my client had a specific issue to address. I took the output from the lsmap -all -net command and created a for loop. Using the awk command, I isolated the entX value that corresponded to the SEAs on the system.

This was the loop I came up with, along with the output that I saw:

for i in `lsmap -all -net -field sea | awk '{print $2}'`
do
echo $i ; entstat -all $i | grep State
done

padmin$ for i in `lsmap -all -net -field sea | awk '{print $2}'`
> do
> echo $i ; entstat -all $i | grep State
> done
ent1
    State: PRIMARY
LAN State: Operational
LAN State: Operational

ent2
    State: BACKUP
LAN State: Operational
LAN State: Operational

ent3
    State: PRIMARY
LAN State: Operational
LAN State: Operational

Obviously you can get a lot more information with entstat, but this is what I needed.

How do you determine which VIO server is primary and which is the backup in your environment?

About Processor Modes

Edit: Some links no longer work.

Originally posted May 9, 2017 on AIXchange

During the build out of a new POWER8 server, we were loading our VIO servers. We were using the classic HMC interface, and a user clicked on the profile after VIOS was running. He checked the hardware tab, and there he happened to see the processor compatibility mode.

What would you expect to see there?

Naturally, you’d expect to see POWER8 mode. But instead, POWER7 was the compatibility mode being displayed.

An AIX LPAR did show POWER8 mode. So why wasn’t POWER8 appearing in the VIO server LPAR? For that matter, why are there differences in processor modes anyway?

Here’s a good overview of what I’m talking about. But the short version is this: There are POWER6, POWER7 and POWER8 processor modes. One advantage of these modes is that they can be used to enable Live Partition Mobility operations between different server families. That helps when migrating from POWER6 or POWER7 to POWER8 servers, because it allows you to take an outage to change the processor mode at your convenience. In effect, you can migrate without any downtime.

There’s also a default processor mode. From the same link:

The default processor compatibility mode is a preferred processor compatibility mode that enables the hypervisor to determine the current mode for the logical partition. When the preferred mode is set to default, the hypervisor sets the current mode to the most fully featured mode supported by the operating environment. In most cases, this is the processor type of the server on which the logical partition is activated. For example, assume that the preferred mode is set to default and the logical partition is running on a POWER8 processor-based server. Because the operating environment supports the POWER8 processor capabilities, the hypervisor sets the current processor compatibility mode to POWER8.

But back to the original issue: Why was VIOS coming up in POWER7 mode? VIO server is based on AIX 6.1. The hypervisor has determined that POWER7 compatibility mode is the best mode in which to run. This is also confirmed in this article:

Once the VIO server was running, I went through all the normal checks. ioslevel shows as 2.2.3.3 and oslevel -s shows the operating system at 6100-09-03-1415. This means the VIO will be running in SMT4 mode since SMT8 requires 7.1 tl03 sp3.

So it’s nothing to worry about; it’s just another thing to be aware of as you move to POWER8.
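If you want to confirm the mode from inside a running VIOS or AIX partition, prtconf reports it. A quick check (a sketch, not from the original post):

    # prtconf | grep -i "implementation mode"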

There’s Still Something About the Good Old Days of Tech

Edit: The more things change..

Originally posted May 2, 2017 on AIXchange

I like to check my Twitter analytics to get a feel for the kinds of topics that my followers find interesting. You can learn the number of impressions and engagements for each individual tweet, as well as your overall engagement rate.

I don’t have a huge Twitter following, so my numbers are usually pretty modest. That’s why I was so surprised at the response to a recent tweet.

On March 12, my notifications lit up over this. I simply said, “Retweet if you’ve ever typed AT commands like ATDT to control a modem.”

To my astonishment, that one has received more than 9,000 impressions so far, and as of this writing, it still gets the occasional retweet. For the sake of comparison, most of my tweets only generate a few hundred impressions.

I guess it goes to show that there’s something about the “good ol’ days” of tech that seems to capture so many imaginations. For me, when I saw that initial tweet, I instantly flashed back to the sound of a modem connecting.

That sound meant that soon you’d be back online. Of course getting and staying online wasn’t easy in those days. I had a 386 ZEOS laptop and an external modem, and I can remember needing to be sure to bring either my long distance calling card or some local POP phone numbers so I could check email or send files. To the world at large, computers weren’t ubiquitous then, and wifi and data on our phones didn’t exist. “Long-distance calling” could mean talking to someone a half hour away, and it cost real money. Yet here we were, able to send emails and chat with people from anywhere. It seemed so awe-inspiring back then.

As much as I enjoy being able to connect to wifi from an airplane to check email or watch cat videos while hurtling through the sky at 30,000 feet, there’s something almost romantic about all those hoops we jumped through to get access to our text-based Internet. At least for those of us who’ve been there through the evolution from BBSs to now, there’s a simple perfection to text-based computing. It may be why we older computer nerds have less of an issue with things like the Korn shell and ‘set -o vi’ and vi editing. We were using AT commands before we ever heard the sounds that made it happen. We grew up with these kinds of systems. We didn’t grow up in the graphical, “GUI-fied” world that came after.

This was the soundtrack of life in the 1990s. Before the Eternal September. Before any of your non-techie friends even knew what you were talking about when you mentioned gopher or usenet or irc or email, let alone the World Wide Web.

Running YUM on AIX

Edit: This is still the best way to load rpm packages. Some links no longer work.

Originally posted April 25, 2017 on AIXchange

When it comes to getting open source packages onto your AIX system, there are lots of options. And while package dependencies are always a problem, Michael Perzl’s solution is still sound.

But now there’s another way to handle rpms and dependencies on your AIX machine: YUM. YUM was covered in detail in the December AIX Virtual User Group meeting. I encourage you to listen to the replay and learn more.

Here are some notes from the presentation.

From slide 3:

YUM (Yellowdog Updater, Modified) is an open source command line package management utility for RPM packages.
• YUM is a tool for installing, removing, querying, and managing RPM packages.
• YUM automatically determines dependencies for the packages getting updated/installed thus fetches the dependent packages and installs them with the packages.
• YUM works with an existing software repository that contains RPM packages. The YUM repository can be local or over the network.

From slide 4:

YUM on AIX – Before and After
• Before YUM
o Difficult to navigate package dependencies.
o Often must manually determine dependencies, one by one.
o Manually check from toolbox/repository if a package or a new version of package is available.
• After YUM
o YUM automatically consults a dependency database and downloads the dependent packages you need.
o YUM can list all the packages available on the repository.
o Using ‘yum check-update’ one can find if new versions of installed packages are available and can use ‘yum update <package>’ to update it.
Note: YUM doesn’t recognize installp dependencies. For example, OpenSSL libraries are provided by AIX in installp format, so if an RPM depends on OpenSSL, the user should make sure to keep the OpenSSL image current.

Slide 6 points to this easy-to-follow document on installing YUM on AIX. Follow along as I share my own experience with getting YUM working.

To get started, I went here and grabbed the rpm.file, which I copied to my machine.

Then I went here for the yum_bundle_v1.tar file, and copied that over.

I installed rpm.rte using smitty. There were no issues. Then I installed the yum_bundle_v1.tar by first untarring it:
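Untarring the bundle is a standard tar invocation, something like:

    # tar -xvf yum_bundle_v1.tar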

I ran this to complete the installation:
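With the bundle extracted, a single rpm command can install everything at once. A sketch, assuming the extracted RPMs sit in the current directory:

    # rpm -Uvh *.rpm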

At this point I had these rpm files loaded on my systems:

# rpm -qa
expect-5.42.1-3.ppc
tk-8.4.7-3.ppc
readline-6.1-2.ppc
gettext-0.10.40-8.ppc
yum-metadata-parser-1.1.4-1.ppc
db-4.8.24-3.ppc
pysqlite-1.1.7-1.ppc
curl-7.44.0-1.ppc
python-urlgrabber-3.10.1-1.noarch
python-devel-2.7.10-1.ppc
tcl-8.4.7-3.ppc
AIX-rpm-7.2.1.0-2.ppc
sqlite-3.7.15.2-2.ppc
glib2-2.14.6-2.ppc
gdbm-1.8.3-5.ppc
python-2.7.10-1.ppc
python-iniparse-0.4-1.noarch
python-pycurl-7.19.3-1.ppc
yum-3.4.3-3.noarch
python-tools-2.7.10-1.ppc

I went ahead and edited my yum.conf as stated in the instructions. Then I just ran “yum install wget” and watched yum download and install the package.

I was able to run yum update and it went ahead and updated my rpm packages as well.

Were you aware of this capability? Have you had any issues with it?

A Different Way to Look at Systems

Edit: Some links no longer work

Originally posted April 18, 2017 on AIXchange

Sure, I enjoy reading about new systems. I’ve also done my share of writing about them. But for me, there’s nothing better than being able to actually visualize hardware. Of course speeds and feeds are helpful, but I want to see where the different cables plug in, or how a server or expansion drawer fits in the rack. It’s just easier to understand how things go together when you can give it the eye test.

The IBM Interactive Product Tour Catalog is a helpful tool for visualizing servers, storage and related solutions. Viewing options are divided into four main categories: Systems and Servers, System Storage, Storage Networking, and Solutions.

For example, from Systems and Servers, you can drill down into IBM LinuxONE, IBM Power Systems, IBM System z, or the IBM zEnterprise BC12. Drill down into Power Systems, and you’ll find Enterprise, Scale Out, Scale Out/Linux, Converged Infrastructure, and I/O Drawer. There are many different models to choose from, including POWER7 and POWER8 options.

Once you choose the product that interests you, you can access product animations that offer views from the front, rear, and even with the cover removed. Here’s the front view of an S822. By hovering your mouse over different parts of the system, you can see where the disks are located, where the SSDs would go, and where the operating panel, USB port and DVD slot are. And again, you can view it without the cover, which allows you to see the location of the fans, processor, memory, power supply and SAS cards.

The Overview option provides server specifications. Here’s the description of the S822:

Power S822 is a 2-socket, 2U system that can be ordered with either one or two processor sockets populated, providing growth capacity for customers who need it. It provides the benefits of greater performance per core as well as per socket with POWER8 processors, new I/O capabilities, higher internal storage and PCIe capacities and performance, the capability to support CAPI accelerator devices, and greater RAS including hot-plug PCIe capability.

As I noted, you can also view enterprise models. For instance with the 880, you can view the system control unit, the Power I/O drawer, or the E880 CEC. There’s an overview and lists of highlights and specifications for this system as well. And don’t forget to check out the storage subsystems and related solutions.

This is a nice way to get more familiar with the form factors and how the systems are actually laid out. Basically, it’s the next best thing to actually being in the same room as the hardware.

The Danger of Defaults

Edit: Some links no longer work

Originally posted April 11, 2017 on AIXchange

A friend who was in the midst of a migration project recently asked me what I knew about TCPTR. Short answer: not much. So I went searching and found this definition:

Configures or displays TCP Traffic Regulation (TR) policy information to control the maximum incoming socket connections for ports.

That led me to this more detailed explanation:

TCP network services and subsystems running on AIX automatically and transparently take advantage of this powerful DoS mitigation technology using simple administrative tuning. This new feature provides a simplified approach to increased network security by leveraging centralized management and firewall-based customization. In addition to providing effective service-level and system-level TCP DoS mitigation, IBM AIX TCP Traffic Regulation provides system-wide TCP connection resource diversity across source Internet protocol addresses initiating connections.

That jarred my memory, so I went back to this article:

Over the weekend, a client implemented security hardening on their production LPARs. They used AIX 6.1 Security Expert. Apart from some users who had been locked out due to weak passwords, testing went well … until about 9am Monday, when some users reported they couldn’t log in.

I forwarded all that to my friend, but in the meantime, he’d figured out his issue. The details are pretty interesting:

The TCPTR functionality in AIX regulates the number of connections on certain ports. If you run AIXPert and choose the high settings, it enables this functionality.

This particular application that hits the database on our server generates a lot of connections, more than are allowed by TCPTR by default. So, it was dropping connections. It doesn’t log this; in fact, you can’t even enable logging of it (I asked IBM).

We turned it off and our problem went away.

Here is a basic rundown of what happened:

  • We were working on a migration project from Oracle 9 on Solaris 9 to Oracle 11 on AIX 7.1.
  • We had done preliminary migrations and testing with a small number of users.
  • On the weekend of the cutover, things were looking good. Database exports/imports went fine.
  • On Monday morning things were still looking good. No complaints from the users.
  • About mid-morning, we started getting reports of some users experiencing slowness and/or disconnects.
  • We began troubleshooting. We found errors in the Oracle logs like “TNS:packet writer failure” and “TNS:lost contact.”
  • This led us to believe that we were dealing with an Oracle issue.
  • We spent a good part of the day reviewing and changing Oracle settings including TNS name resolution settings, etc.
  • Later in the day, after doing some web searches, one of the guys stumbled across this article.
  • We checked our systems, and sure enough, tcptr was enabled.
  • We disabled tcptr, and the issue cleared.
  • Upon some further investigation of my notes from six years ago when we first rolled out these new LPARs, it looks like we decided to use the AIXPert tool to enable some hardening of the AIX systems.
  • We must have used the AIXPert “high” setting, which enables TCPTR.
  • We have been running all this time without any issue, because the number of connections to our systems never exceeded the restrictions that tcptr puts in place by default.
  • For this new database we migrated, a large number of client connections are made, which exceeded the default settings for TCPTR.
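For reference, here is a minimal sketch of how you might check for and disable TCP Traffic Regulation on AIX (verify the exact options against your level's documentation):

    # tcptr -show                # display the active TCP TR policies, if any
    # no -p -o tcptr_enable=0    # disable TCP Traffic Regulation persistently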

I see this story as a cautionary tale. We’re taking chances when we accept a tool’s defaults without fully understanding what is being changed under the covers. But what can we really do about this? I always argue for using test and development LPARs and doing testing whenever possible, but this environment had been running with these rules in production for years without an impact until the usage scenario changed and more connections were coming in.

Obviously this isn’t an isolated issue; at least two customers that we know of have run into it. Now I throw the question out to my readers. Have you experienced this or at least heard about it? Moreover, what should we be doing to protect ourselves and our environments?

More rPerf Resources

Edit: Some links no longer work

Originally posted April 4, 2017 on AIXchange

Earlier this year I pointed you to a method for finding relative performance (rPerf) numbers for your LPAR.

Sometimes you may want to compare the rPerf of different IBM Power Systems, some of which you may not even have access to. If you’re replacing a POWER6 system with POWER8, you’ll need some way to compare these systems so, for instance, you’ll have a better understanding of the number of cores you’ll need to activate for new or existing workloads. As long as you know the model number, the total number of cores on the system and the CPU speed, you can obtain some valuable information.

IBM, which publishes the rPerf values, describes it this way:

rPerf estimates are calculated based on systems with the latest levels of AIX and other pertinent software at the time of system announcement. Actual performance will vary based on application and configuration details. The IBM eServer pSeries 640 is the baseline reference system and has a value of 1.0. Although rPerf may be used to compare estimated IBM UNIX commercial processing performance, actual system performance may vary and is dependent upon many factors including system hardware configuration and software design and configuration. Note that the rPerf methodology used for the POWER6 processor-based systems is identical to that used for the POWER5 processor-based systems. Variations in incremental system performance may be observed in commercial workloads due to changes in the underlying system architecture.

You can always find the rPerf numbers by downloading the latest facts and features guides. Then simply pull those numbers to compare machines. You can also get the IBM Power Systems Performance Report, which includes other performance values like SPECint and SPECfp. Even better, this handy spreadsheet has everything in one place.

The spreadsheet includes a regularly updated change log (the latest update is November, and it goes all the way back to 2007) so you can see if the system that you’re interested in has been added yet:

Data Sources:
http://www-03.ibm.com/systems/power/hardware/reports/system_perf.html
http://www-03.ibm.com/systems/i/advantages/perfmgmt/resource.html

Do you use rPerf numbers as you plan for upgrades? Were you aware of these options for getting these values?

Moving AIX Workloads to the Cloud

Edit: As we do more of these migrations we will all get better at it.

Originally posted June 2020 by IBM Systems Magazine

Q: I’m interested in moving my on-premises IBM AIX workloads to the public cloud. What are my options?

Most major public cloud providers have IBM Power® hardware offerings that can run AIX®, IBM i and Linux® workloads. With the growth of hybrid cloud environments, you likely recognize the value of these solutions, but you may be unsure how to proceed.

Let’s start with migration options and tactics. Depending on network bandwidth, your process may be as simple as creating a NIM server in your cloud environment, then deploying by creating and moving mksysb images to the cloud. Assuming your rootvg data is static, you can build the VM and then work on migrating the remaining datavg data.
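The mksysb side of that approach is a single command on the source LPAR. A minimal sketch (the output path is illustrative):

    # mksysb -i /backup/lpar1.mksysb

The -i flag regenerates /image.data before the rootvg image is created, so the target can rebuild matching filesystems.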

It may also make sense to use IBM PowerVC, IBM’s advanced virtualization and cloud management offering, to export and import OVA images between your current data center and new cloud provider. Here’s a closer look at more options for migration.

Build a Better VM

Cloud providers typically have a basic OS template that can be used to build and deploy VMs. But moving existing OS images and data to the cloud can be complex and require extensive preparation.

Identify a provider that understands your unique environment and the tools and options you’ll need. Even though non-IBM providers will run your cloud on IBM Power Systems™ servers, their technical personnel may be more familiar with x86 platforms. Finding a partner that’s knowledgeable about Power Systems infrastructure is especially critical if you don’t have the staff or the available cycles to handle both existing operations and migration. Expect to run some proofs of concept and test migrations to allow stakeholders to get comfortable with the migration process and the operational changes that will occur once you start running in the cloud.

The document “Migration Strategies for IBM Power Systems Virtual Servers” details several migration options. These include using IBM PowerHA® SystemMirror® Enterprise Edition with GLVM to sync data in real time ahead of the actual cutover (ibm.co/2YUMJPy). It also examines application-specific replication tools such as IBM Db2® HADR and Oracle® Goldengate. Most databases have specific migration requirements, but you may be able to ship logs to your new server or export and import data. Iron out details and conduct thorough testing.

Familiar tools such as rsync, savevg and restvg can also help with this process. It may be easy enough to migrate most of the data, then run rsync to sync the last bit before the final cutover. Built-in AIX tools savevg and restvg are used to backup and restore non-rootvg volume groups. These commands can simplify the creation of your new volume groups and filesystems on your new VM.
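
As a rough sketch of that savevg/restvg approach (the volume group, target disk and host names here are invented):

    # On the source system: back up a non-rootvg volume group (-i creates the .data file)
    savevg -i -f /backup/datavg.savevg datavg

    # On the new VM: recreate the volume group and its filesystems on a target disk
    restvg -f /backup/datavg.savevg hdisk1

    # Just before cutover: catch up anything that changed since the bulk copy
    rsync -avH /data/ newvm:/data/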

Chris Gibson, AIX and Power Systems consultant with IBM Lab Services, also suggests checking out the “IBM Cloud Mass Data Migration FAQ” (ibm.co/2ZCeOeH). This document lists common questions and concise answers about IBM Cloud Mass Data Migration, a physical data-transfer service (with up to 120 TB of usable capacity) that accelerates migration into the IBM cloud. The solution is an option if over-the-network data transfer is cost-prohibitive, slow or unavailable.

A smooth journey to the cloud starts with careful planning. Evaluate multiple providers and their solutions, and take the time to understand the various migration options. 

Graphing AIX Performance Data

Edit: This is still a relevant topic

Originally posted October 2019 by IBM Systems Magazine

Understand your unique environment to best collect and graph data.

Q: What’s the best way to collect and graph AIX performance data?

You probably won’t care for the short answer, but honestly, it depends.

The best way to collect and graph AIX* performance data comes down to the unique characteristics of your environment. Naturally, the time and skills your staff have available, along with the resources budgeted to your operations, are also critical considerations.

Now for the longer answer: Because one size obviously doesn’t fit all, arriving at a solution takes time and forethought. This is true whether you’re assembling a brand new environment or you’re realizing that what’s worked in the past no longer serves your purposes.

In my experience as a consultant, I’ll typically start by asking—and getting answers to—numerous questions:

  • If you plan to host the infrastructure in-house, do you have the cycles in your schedule to stand up and learn a new tool? For that matter, do you have an existing VM—or spare capacity to create a new VM in your environment—that can be used for the effort?
  • Are you looking to send the data off-site and let someone else create and present the reports? Is this Software as a Service-type solution impractical or even impossible in your enterprise given your corporate security posture, firewall requirements or other constraints?
  • Will you need software support to help with setup or troubleshooting if something goes wrong with the data collection? Do you require outside expertise in interpreting the graphs and reports?
  • Who are the consumers of the information being created: management, technical staff or both?
  • What kinds of decisions will be made based on the data? Will data be used to help with server consolidation, or is the idea to rebalance the workloads by identifying frames that are “running hot”?
  • Are you looking to retain past data on system performance? (It’s not a bad idea to have this information on hand to address user questions or concerns.) 
  • Do you need trend data to help determine when new servers or additional capacity should be added to the environment?

Your Performance Data Toolbox

Of course, commercial performance tools can do much of this work for you. Options like Performance Navigator from Midrange Performance Group or Galileo Performance Explorer Suite from The ATS Group require little intervention once you get them up and running. These and other products can include vendor support, which may be a priority/requirement for your management.

IBM offers PM for Power; there’s a basic version available at no charge as well as the full-featured product. For many environments, the limited functionality of the free version is sufficient—it just depends on your needs. And newer versions of the HMC include built-in performance graphs. That at least provides a quick, easy way to investigate any potential issues.

Other freely available options include open-source tools such as Ganglia and RRDtool. Or you can always manually feed your nmon files into the Microsoft Excel-based nmon analyzer tool.
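
If you go the nmon analyzer route, the collection side is a one-liner; the interval and snapshot count below are just example values:

    # Record to a file: one snapshot every 60 seconds, 1440 snapshots (24 hours)
    nmon -f -s 60 -c 1440

    # The resulting hostname_YYMMDD_HHMM.nmon file is what you feed to the analyzer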

Nigel Griffiths has posted articles and videos that detail newer techniques using njmon, influxdb and grafana. These are highly customizable solutions that allow you to change your graphical views on the fly. They’re capable of handling huge amounts of data from large numbers of LPARs and can be implemented with minimal technical expertise.

Re-Evaluating Your Needs

The process doesn’t end with the choice you make. Over time, you’ll need to re-evaluate your solution. Do the tools still do the job they’re needed to do? Is greater automation needed? Are new and better options available? Have your requirements changed?

There’s no one way to store and maintain performance data. But if you ask the right questions—and keep asking questions—the effort and resources you invest in a solution will be worth it.

HA and DR Overview

Edit: Some links no longer work.

Originally posted March 28, 2017 on AIXchange

What different high availability (HA) and disaster recovery (DR) solutions are available for Power Systems? What are the pros and cons of these solutions?

This comparison document created by Carl Burnett, Joe Cropper, and Ravi Shankar helps answer these questions:

“There are many elements to consider as you plan high availability and disaster recovery solutions for your Power Systems environment. In this article, we will explore some of these solutions and discuss some considerations and best practices when using technologies like PowerHA, Geographically Dispersed Resiliency and PowerVC within the data center.

Clustering based HA or DR solutions rely on redundant standby nodes in the cluster to be able to take over the workload and start them when the primary node fails. Each node in the cluster will monitor health of various elements such as network interfaces, storage, partner nodes, etc. to act when any of these elements fail. Clustering technologies were the closest to fault tolerant environments in regards to HA or DR support based on completely redundant software and hardware components. Cluster solutions are often operating system or platform specific and they do provide detailed error monitoring, though they require considerable effort to deploy and maintain….

It is expected that Power deployments use both models of HA & DR as needed. Cluster based HA-DR solutions are best for protecting critical workloads. For example, SAP environments are distributed and cluster-based HA would be the best method to monitor and act for various components of the stack. For other workloads, a VM restart-based model might be sufficient protection for HA and DR….”

That’s from section 2.0, which includes a nice graph that presents various solution types in terms of the availability they provide and the complexity of setting them up.

This comes from sections 3.0, 3.1 and 3.2:

“Cluster HA/DR solutions have existed for Power systems for a long time (PowerHA has been the leading HA/DR solution on AIX for more than 20 years). They have been enhanced recently to provide additional capabilities and user experiences.

VM restart based HA/DR solutions are new in 2016 and are described below:
1. PowerVC High Availability Features: PowerVC added new capabilities around VM restart High Availability. These capabilities enable customers to deploy cloud environments easily and enable simplified High availability.
2. Geographically Dispersed Resiliency (GDR) Disaster Recovery: IBM introduced a new offering for disaster recovery using VM restart technology and storage mirroring.

… PowerVC provides enterprise virtualization and cloud management of Power Systems and leverages OpenStack to do so. PowerVC has introduced high availability management functions over its past few releases. Listed below is a summary of those features:

· One-click system evacuation: During planned maintenance windows, this feature allows administrators to evacuate a host by leveraging live partition mobility (LPM). PowerVC orchestrates the mobility of all active VMs to other hosts in the environment (or a host of your choice), thereby allowing maintenance (e.g., firmware or VIOS updates, etc.) to be performed without disrupting workloads. While the host is in maintenance mode, PowerVC will not place any new VMs on this host either. Once maintenance is done, VMs can then be placed on the host again and normal operation can resume.

· Automated remote restart: PowerVC has supported PowerVM’s simplified remote restart feature since its inception in the POWER8 timeframe. This feature allows an administrator to rebuild a VM residing on a failed host to a healthy host (assuming the hosts have shared storage). This is a critical availability feature as it provides a mechanism to recover critical VMs in the event their hosting server fails unexpectedly (read: unplanned outage).

… Power systems now provides VM restart based DR solution for the entire data center. GDR integrates deeply with PowerVM environments (HMC, VIOS) to provide for DR restart of VMs across sites using storage replicated VM images. GDR Disaster Recovery solution is easy to deploy and manage. GDR can manage recovery of hundreds of VMs across the sites.”

The References section at the end of the document also points you to information about Geographically Dispersed Resiliency (GDR), starting with this diagram about the IBM offering. There are also links to two GDR articles (here and here).

A POWER9 Roadmap

Edit: Now we are doing POWER10 roadmaps. Some links no longer work.

Originally posted March 21, 2017 on AIXchange

I want to point you to Jeff Stuecheli’s POWER9 presentation from January’s AIX Virtual User Group meeting. This information doesn’t involve specific announcements or new models, but it provides an informative look at the capabilities of the chip itself. Download the presentation and/or watch the video.

Some highlights:

  • The slide on page 2 shows a roadmap with POWER9 appearing in the second half of 2017 and into 2018, with POWER10 appearing in the 2020 timeframe.
  • Page 3 covers different workloads that POWER9 has been designed for.
  • This is from page 4:

Optimized for Stronger Thread Performance and Efficiency
• Increased execution bandwidth efficiency for a range of workloads including commercial, cognitive and analytics
• Sophisticated instruction scheduling and branch prediction for unoptimized applications and interpretive languages
• Adaptive features for improved efficiency and performance especially in lower memory bandwidth systems

  • This is from page 5:

Re-factored Core Provides Improved Efficiency & Workload Alignment
• Enhanced pipeline efficiency with modular execution and intelligent pipeline control
• Increased pipeline utilization with symmetric data-type engines: Fixed, Float, 128b, SIMD
• Shared compute resource optimizes data-type interchange

  • From page 8: There will be two ways to attach memory. You can either attach it directly or you can use the buffered memory in the scale up systems.
  • Page 10 shows a matrix and what you will be able to expect from the two socket vs. multi-socket systems.
  • Page 11 shows the socket performance you can expect from POWER9 vs. POWER8.
  • Page 13 covers data capacity and throughput.
  • Page 15 covers the bandwidth improvements between CECs on the large systems, and page 17 examines the different accelerators that will be incorporated.
  • This is from page 18:

Extreme Processor/Accelerator Bandwidth and Reduced Latency
• Coherent Memory and Virtual Addressing Capability for all Accelerators
• OpenPOWER Community Enablement – Robust Accelerated Compute Options
• State of the Art I/O and Acceleration Attachment Signaling
– PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
– 25Gb/s Common Link x 48 lanes – 300 GB/s duplex bandwidth
• Robust Accelerated Compute Options with OPEN standards
– On-Chip Acceleration – Gzip x1, 842 Compression x2, AES/SHA x2
– CAPI 2.0 – 4x bandwidth of POWER8 using PCIe Gen 4
– NVLink 2.0 – Next generation of GPU/CPU bandwidth and integration using 25G
– Open CAPI 3.0 – High bandwidth, low latency and open interface using 25G

  • This is from page 19:

Seamless CPU/Accelerator Interaction
• Coherent memory sharing
• Enhanced virtual address translation
• Data interaction with reduced SW & HW overhead

Broader Application of Heterogeneous Compute
• Designed for efficient programming models
• Accelerate complex analytic/cognitive applications

  • Page 23 covers OpenCAPI 3.0 features. This is from page 26:

Enhanced Core and Chip Architecture for Emerging Workloads
• New Core Optimized for Emerging Algorithms to Interpret and Reason
• Bandwidth, Scale, and Capacity, to Ingest and Analyze
Processor Family with Scale-Out and Scale-Up Optimized Silicon
• Enabling a Range of Platform Optimizations – from HSDC Clusters to Enterprise Class Systems
• Extreme Virtualization Capabilities for the Cloud
Premier Acceleration Platform
• Heterogeneous Compute Options to Enable New Application Paradigms
• State of the Art I/O
• Engineered to be Open

These are things that stood out to me, but obviously you’ll get more from listening to the replay.

And if that doesn’t further whet your appetite for POWER9, here are two videos from the Open Compute Project Summit: Aaron Sullivan, Rackspace distinguished engineer, gives a video tour of a system. In another video, Google and Rackspace engineers provide even more details around the systems they are designing.

Selected AIX Versions Can Soon Be Licensed Monthly

Edit: Some links no longer work

Originally posted March 14, 2017 on AIXchange

IBM made an interesting announcement today. The Standard editions of AIX 7.1 and AIX 7.2 will soon be available to be licensed on a monthly basis under Passport Advantage.

This is another example of Power Systems and AIX making their platforms cloud-ready as they transition to a hybrid cloud model. So far we have C models, PowerVC and LC models, and now we have another way to license AIX.  

This whitepaper goes into detail about IBM’s cloud directions. Registration is required to access it.

The new licensing model goes into effect on March 28. You’ll be able to order it using part number 5737-D09.

While enterprise clients could find this interesting if they’re trying to move software and operating system licensing costs over to their opex bucket, I believe managed service providers (MSPs) have the most to gain. An option to get AIX licenses “on demand” lets these shops respond more easily to their ever-evolving circumstances (for instance, customers moving into or out of their cloud). Being able to buy a month’s worth of licensing — this includes entitlement and support — at a time really increases their flexibility and helps account for the possibility of multiple customers sharing the same physical cores.

No monthly reporting to IBM is required, which provides for flexible pricing and billing. It will be offered on the IBM Digital Marketplace, so for the first time you’ll be able to virtually swipe a credit card to get access to AIX.

This offer lists at $26 USD per virtual processor core (VPC) per month. It applies only to E850 machines and below: the small-tier systems. The entitlement is based on customer numbers rather than serial numbers, which is something to keep in mind when dealing with IBM Support. They too will need to get used to this new licensing model.
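
To put the list price in perspective, consider a hypothetical MSP licensing 20 VPCs across its clients: that works out to 20 x $26 = $520 USD per month, and the bill could shrink the following month if a customer moved out of the cloud.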

If you’re interested in this offer, you’ll first need to determine the number of VPCs you’ll need to license for the LPARs that you have defined. IBM developerWorks provides this script for making that calculation:

This script can be used with HMC and/or NovaLink instances to collect and report virtual CPU allocations of Power logical partitions and configured processors on the server. The script is invoked with a valid HMC or Novalink username and a list of space separated IP addresses. Only one username is allowed as input, and that username will be applied to all HMCs/NovaLinks specified.

Also, ssh must be configured between the system running the script and the HMC or Novalink instances it attaches to. See https://www.ibm.com/support/knowledgecenter/POWER6/ipha1/settingupsecurescriptexecution.htm for more info.

Here is an example of invoking the script.
$ ./vcpu_report.sh | tee output.csv
Enter the HMC/Novalink User: hscroot
Enter HMC/Novalink List (space separated): 9.4.28.92 vhmc2.tlab.ibm.com

The script will produce a CSV output file with each row listing the associated HMC or Novalink instance, the system Model-Type and Serial Number, the logical partition name, the LPAR type, and the current procs assigned to the logical partition. An LPAR_TYPE of unknown indicates there’s no RMC session between the HMC/NovaLink and the partition.

For each physical server, licensees must have sufficient entitlements for the lesser of the sum of all VPCs on all virtual servers, or the number of physical cores on the system.
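
If you’d rather eyeball those numbers yourself, the same data is available from the HMC command line (the managed system name below is made up):

    # List the virtual processor count for every LPAR on a managed system
    lshwres -r proc -m Server-8286-42A-SN123456 --level lpar -F lpar_name,curr_procs

    # List the processors physically installed on the system
    lshwres -r proc -m Server-8286-42A-SN123456 --level sys -F installed_sys_proc_units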

Obviously, this new licensing offer isn’t for everyone, but for MSPs and some others, it could be a game-changer.

New Version of Power Systems Best Practices Now Available

Edit: I always like to look for these documents. Some links no longer work.

Originally posted March 7, 2017 on AIXchange

As I noted in this 2013 post, Fredrik Lundholm compiles and updates a presentation called the Power Implementation Quality Standard for commercial workloads.

This presentation has proven to be rather popular, so I want to let you know that Fredrik’s latest set of slides, version 1.17, can be downloaded here.

The presentation lists changes to previous versions:

“There is some information about the E850C, along with VIOS 2.2.5.”

The last time I wrote about these slides we were on version 1.9, so there have been quite a few changes since then. He has a change log on page 3, but here are some of the highlights of what has been updated over time.

Changes for 1.17:
2017 Update, VIOS 2.2.5

Changes for 1.15:
VIOS update, PowerHA update, AIX update, GPFS update, poll uplink, vNIC, SR_IOV, Linux, IBM i

Changes for 1.14:
Clarification aio_maxreqs
VIOS clarification, interim FIXES

Changes for 1.13:
Correction on attribute for large receive
Currency update, POWER8,
I/O Enlarged Capacity

Changes for 1.12:
PowerHA, and PowerHA levels, AIX levels, VIO levels.
Virtual Ethernet buffer update

Changes for 1.11:
Power Saving Animation
Network configuration update admin VLAN / simplification
Removal of obsolete Network design

Changes for 1.10:
Favor Performance without Active Energy Manager
AIX/GPFS code level updates
AIX Memory Pin

Here are some highlights, from my perspective. First, from page 10:

“With firmware levels 740, 760 or 770 and above on POWER7 systems and all POWER8/POWER7+ models, the ASMI interface includes the favor performance setting.

With POWER8 and HMC8 this interface can be directly accessed from the HMC configuration panels, ASMI is not required. A new option fixed max frequency is also available (1.17).

Engage favour performance as a mandatory modification for most environments in the “Power Management Mode” menu.

This safely boosts system clock speed by 4-9%.”

This is from page 11:

“On E870/E880 machines the recommendation is to disable “I/O Adapter Enlarged Capacity” to free up hypervisor memory. With PowerVM 2.2.5 and FW860 this is now better. Only disable for AIX/IBM i only systems.

Power off the machine, log on to ASMI menu on HMC –> I/O Adapter Enlarged Capacity:
    Disable I/O Adapter Enlarged Capacity by unselecting the tick box
    Power on the server
    Observe Hypervisor memory consumption”

In addition, there’s a current AIX matrix on page 25, along with plenty of other great information.

If you haven’t looked at Fredrik’s work previously, it’s well worth your time.

A Power Champion, Again

Edit: Some links no longer work. Still proud to be a champion.

Originally posted February 28, 2017 on AIXchange

In case you missed me mentioning it on Twitter (@robmcnelly), I was recently selected as part of the 2017 class of IBM Power Champions.

Along with 13 others, I was first honored as an IBM Power Champion in 2011. When the Power Champions program relaunched last year, I was recognized again.

For 2017, there are a total of 41 Champions, 27 of whom are returning Champions. Read more here:

“After reviewing and evaluating the contributions of our applicants, IBM is happy to announce the 2017 IBM Champions for Power!

The IBM Champion program recognizes innovative thought leaders in the technical community—and rewards these contributors by amplifying their voice and increasing their sphere of influence. An IBM Champion is an IT professional, business leader, developer, or educator who influences and mentors others to help them make best use of IBM software, solutions, and services.

These individuals evangelize IBM solutions, share their knowledge and help grow the community of professionals who are focused on IBM Power. IBM Champions spend a considerable amount of their own time, energy and resources on community efforts—organizing and leading user group events, answering questions in forums, contributing wiki articles and applications, publishing podcasts, sharing instructional videos, and more.”

My employer, Meridian IT, also made mention of it here.

Here’s the complete list of 2017 IBM Power Champions:

Babatunde Akanni
Liam Allan
Torbjorn Appehl
Aaron Bartell
Shawn Bodily
Jim Buck
Lionel Clavien
Benoît Créau
Shrirang “Ranga” Deshpande
Anthony English
Cynthia Fortlage
Alan Fulton
Susan Gantner
Cosimo Gianfreda
Ron Gordon
Midori Hosomi
Tom Huntington
Jay Kruemcke
Hal Kussler
Andy Lin
Jaqui Lynch
Alan Marblestone
Christian Massé
Pete Massiello
Rob McNelly
Brett Murphy
Richie Palma
Jon Paris
Michael Pavlak
Trevor Perry
Jerry Petru
Steve Pitcher
Kody Robinson
Randall Ross
Anthony Skjellum
Shawn Stephens
John Stone
Paul Tuohy
Jeroen Van Lommel
Dave Waddell
Keith Zblewski

Even though I don’t do it for the accolades, being recognized along with so many other accomplished people never gets old. Hopefully I’ll continue to merit being included in this prestigious group. In any event, as POWER9 gets ready to launch, and with POWER10 in the planning stages, I look forward to many more years of evangelizing IBM Power Systems running AIX, Linux and IBM i.

Supporting Systems and the People Who Use Them

Edit: This is still relevant today

Originally posted February 21, 2017 on AIXchange

What systems are you running? That’s an easy enough question to answer. You might tell me that you have two 880s, two 850s and two S822s, all running AIX 7.2.

But what do your systems actually do?

The answer to this question might also seem straightforward. Say, for instance, that one of your systems runs Oracle, one runs WebSphere, and another runs DB2. So you might have a database layer, an application layer, and a web layer. You may be running PowerHA and GPFS. All of the systems you manage are patched, tuned and running great.

But why did your company purchase your systems, and what do they really do?

I would guess that they run the business. They track money. They track people. They track inventory. Maybe they’re hospital systems that manage patient care. Maybe they’re systems involved in dispatching police or firefighters. Whatever they do, they run the core operation and affect actual people.

Next question: How do your users actually interact with the machines that you manage?

Do you even have an answer to this? If not, you should make the effort to understand how your users work with your systems.

Do your systems run a warehouse? Get out on the dock and learn what can be done to improve their workflow.

Do your computers support manufacturing activity? Go spend time on the manufacturing floor.

Do you have a help desk? Head over there. What kinds of things are users having issues with? How can you help?

Or are you working for a hospital? Then go spend time with the nurses at their workstations and learn about the little things that drive them crazy.

“A co-worker of mine once snapped at a nurse when she had problems logging into her workstation. She responded by asking him if he’d like to come up the hall with her and fix an IV or administer some drugs. Touche. The nurse was just as knowledgeable and passionate about healthcare as my coworker was about technology. Working with computers was important, but it was only a small part of her job. She just needed to enter data and to print some reports. She didn’t care about drivers, passwords or proper startup/shutdown sequences. Once we showed her how to do what she needed to do, she was fine, and we didn’t hear from her again.

End users may not know computers, but they know when they’re running slowly. How often do you take the time to actually sit down with your end users and find out how things are working from their perspective? I’ve had users who were printing out reports from one system and retyping the data into another. How easy would it be to save folks from that effort and aggravation? Just leave the raised floor and take a walk. Find people in other departments that use your systems and ask them for feedback. Ask them if you can look over their shoulder while they use your machine sometime.

End users are our customers. If they weren’t using the data we store and process, there would be no need for us. And if we have a better understanding of users’ problems and frustrations, if we show them better ways to do things, the entire organization benefits.”

I believe most of us understand the need to listen to end users. But we’re busy and they’re certainly busy, so a reminder never hurts. If you actually have access to the people that use your system, accept this gift. Learn how your downtimes actually affect them. Our jobs aren’t just about working with cool technology. Those awesome machines are there to support real people doing real jobs.

New Servers Designed for Smaller Environments

Edit: These offerings are usually pretty popular

Originally posted February 14, 2017 on AIXchange

In my consulting work, I see a number of customers with small machines running critical workloads that don’t incorporate virtualization. Because these workloads aren’t necessarily memory- or CPU-intensive, these customers see no need to set up multiple LPARs. They just want a stand-alone system.

The challenge for many of these customers is that, from a technology standpoint, they’re lagging behind. The hardware is old, and the risks of doing business on older, unsupported systems are substantial.

IBM understands this, and today (Feb. 14), the company is making some announcements to address the modest but pressing needs of these customers.

First, IBM is unveiling the 2U S812 (8284-21A) server. It will come in two flavors: a single-core server for IBM i and a 4-core server for AIX workloads. The form factor is a rack mount system; there aren’t any options for a tower.

These systems will be available in e-config on Feb. 28, and will be generally available on March 17.

Again, this server is designed for a particular subset of customers: those that run AIX or IBM i in a single partition, and don’t use virtualization. This server is not intended for Linux workloads; use the Linux-only or other existing hardware models for them.

The IBM i centric single-core server is a 3.02 GHz POWER8 processor with a maximum of 64G of memory. It has six hot pluggable PCIe Gen3 low profile slots (five if an SAS backplane with write cache is used).

The system supports a maximum of 25 users. It can run IBM i 7.2 or 7.3.

You can add a DVD drive, but there’s no bay for tape or RDX in the system unit. The system has 900W power supplies that can take either 110V or 220V power. You cannot add an I/O drawer, but you could add in fibre adapters to attach to external SAN disk.

The AIX flavor is a 4-core 3.02 GHz POWER8 processor with a maximum of 128G of memory. There’s room for six hot pluggable PCIe Gen3 low profile cards, although, as with the IBM i flavor, only five slots are available if you use the SAS backplane with write cache. There is no option to virtualize the system, but you can add in up to three EXP24S or EXP24SX expansion drawers for up to 72 additional drives. It will run AIX 6.1, 7.1 or 7.2, and also has the 900W 110V or 220V power supplies.

Also announced today is an option for the E880C virtual solution edition for SAP HANA. This is a 48-core 4.02 GHz POWER8 processor system with 2 TB of memory.

In addition, there will be changes with the HMC. As 500G drives become less available, IBM will be switching to 1 TB drives for use in the HMC, with the option to have a second disk with a matching capacity.

Finally, there’s another option for the RDX docking station. This will be the EUA4, which is a follow-up to the EU04.

Search the relevant IBM announcement letters for details. You’ll also want to check for information about some products that are being withdrawn from marketing.

Article Misses the Point on VIOS Use

Edit: Hopefully you are running dual VIOS

Originally posted February 7, 2017 on AIXchange

This was posted on Jan. 17, but it’s worth revisiting. I thought the article was a little over the top, starting with the headline:

“Power Systems running IBM’s VIOS virtualisation need a patch and reboot
Unless you’re willing to tolerate the chance of data corruption”

Here’s what follows:

“IBM on Saturday slipped out news of a nasty bug in its VIOS, its Virtual I/O Server that offers virtualisation services on Power Systems under AIX.

Issue IV91339 strikes when moving virtual machines and means “there is a very small timing window where the VIOS may report to the client LPAR that some I/Os have completed before they actually do.”

IBM advises that “This could cause applications running on the client [logical partition] LPAR to read the wrong data from the virtual device. It’s also possible that data written by the client LPAR to the virtual device may be written incorrectly.

Hence the issue’s title: “possible data corruption after LPM failure.”

Of course data corruption is precisely what Power Systems and AIX are supposed not to do. The platforms are promoted as exceptionally stable and resilient, just the ticket for mission critical applications that can’t afford many maintenance windows, never mind unplanned ones.

So IBM’s guidance that “Installation of the ifix requires a reboot” will not go down well with users.” 

After the article went live, it was updated:

UPDATE: IBM’s now released a fix and updated its advice on this issue.

Big Blue now also says “The risk of hitting this exposure outside of the IBM test lab has had extensive evaluation and is considered extremely small. The controlled test environment where this problem was observed makes use of a high-precision test injection tool that was able to inject a specific error within a tiny window.”

“The chances of hitting this window outside of the IBM test lab are highly unlikely and there is no known occurrence of this issue outside of the IBM test lab.”

The Reg is nonetheless aware that IBM has recommended users implement the patch.”

As I said, I thought this was over the top, and judging by these comments, I wasn’t the only one:

Uh… why not?

patch and boot the secondary, patch and boot the primary. Extra points if you are nice enough to disable the vscsi and vfcs of the corresponding vios first (rmdev -pl $adaptername). Ethernet fails over automatically, though you could add extra grace there as well.

Hardly a big deal. And in order to run into IV91339’s bug, you’d have to have a failing lpm in the first place.

                    ******************************

If this goes back as far as 2.2.3.X – then, clearly – it is not happening often – and management might decide that the higher risk to business is updating and rebooting a dual VIOS configuration.

As far as change records go: whether they are a major pain or a minor pain or no pain – experience has taught many that no records – ultimately is a ‘killing pain’. This again, is a process that can ensure that the business can manage their risk – as they view it. System administration is not the business – even that “we” have the best of intents “they” must okay the process. That is how business is done.

The argument that should be made is that the systems were engineered for concurrent maintenance. Not doing the maintenance now may lead to a disruptive ‘moment’. The business does not need to know the technical details – it needs to know the relative risk and impact on business. The design – aka best practice – of using dual VIOS is that the impact should be zero – even with a reboot!

                    ******************************

Although there are reasons to go with a single VIOS and with more recent features that provide a cluster-like availability on other servers my preference within my organization is to deploy Dual VIOS. It’s a nominal expense to deploy while having the ability to tell the business the platform will continue to service the dozens of VM’s on each box while we do concurrent maintenance for each VIOS.

We are not shy to our stakeholders either on how we’ve built our Power environment (starting with P4 and now mostly P8) so they have confidence in the platform and our ability to keep it all running virtually non-stop. 

Really, the article’s whole premise is faulty. I can’t recall the last time I saw an environment with VIOS that wasn’t using dual VIO servers. Patching one VIOS, rebooting and then patching the other VIOS is business as usual. Updating VIOS with the client LPARs running is common practice, and isn’t much of a risk in my opinion. During your next patch cycle, add the fix as you always would. This platform is exceptionally stable and resilient, and this article and the comments actually illustrate that point.
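
For what it’s worth, the rolling update the commenters describe looks something like this on each VIO server in turn; the staging directory is hypothetical, and you’d verify client multipathing and SEA failover before touching anything:

    # As padmin: confirm the current level and commit any prior updates
    $ ioslevel
    $ updateios -commit

    # Apply the fix pack or ifix staged in a local directory
    $ updateios -accept -install -dev /home/padmin/update

    # Reboot this VIOS; clients keep running through the partner VIOS
    $ shutdown -restart

Then verify that client paths have recovered and repeat the process on the other VIO server.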

Decoding iCalendar Files

Edit: This seems to be less of a problem lately

Originally posted January 31, 2017 on AIXchange

If you use an electronic calendar, chances are you’re dealing with multiple calendaring and email systems between your work and personal accounts. Some folks use Google Calendar, others use Outlook, and still others use Lotus Notes, for example. Personally, I use multiple email clients, and each one has a calendar. I prefer to keep all of my appointments in one place using one piece of software, and everything has to sync with my phone.

Many calendar meeting invitations, regardless of the platform, get sent back and forth as iCalendar (.ics) files:

“iCalendar is a computer file format which allows Internet users to send meeting requests and tasks to other Internet users by sharing or sending files in this format through various methods. The files usually have an extension of .ics. With supporting software, such as an email reader or calendar application, recipients of an iCalendar data file can respond to the sender easily or counter-propose another meeting date/time. The file format is specified in a proposed internet standard (RFC 5545) for calendar data exchange.

iCalendar is used and supported by a large number of products, including Google Calendar, Apple Calendar (formerly iCal), IBM Lotus Notes, Yahoo! Calendar, Evolution (software), eM Client, Lightning extension for Mozilla Thunderbird and SeaMonkey, and partially by Microsoft Outlook and Novell GroupWise.”

One thing I’ve noticed is that when I get sent an .ics file or calendar invite in Gmail, Google makes it difficult to transfer that file to another mail reader — it tries really hard to force you to use Google Calendar. You can’t simply forward that invite from Gmail to another mail program and expect it to just work. Fortunately, there is a way to deal with this. Select Show Original to view the original email, and then scroll to the bottom, where there’s a section with this header:

    Content-Type: text/calendar; charset=”utf-8″; method=REQUEST
    Content-Transfer-Encoding: base64

Google seems to intentionally encode the .ics file (shocking, I know), so you need a way to make it readable. There are tools that work fine in most instances (just search on “base64 decode”). Basically, you’d cut and paste the information and get a valid .ics file. But if you’re dealing with important, work-related documents, keep in mind that this decoding can also be done from your command line.

For instance, here’s how to work with .ics files in Linux:

    $ echo -n 'scottlinux.com rocks' | base64
    c2NvdHRsaW51eC5jb20gcm9ja3MK

    $ echo -n c2NvdHRsaW51eC5jb20gcm9ja3MK | base64 -d
    scottlinux.com rocks

On AIX, you can use openssl:

openssl base64 -e <<< 'Welcome to openssl wiki'
V2VsY29tZSB0byBvcGVuc3NsIHdpa2kK
openssl base64 -d <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kK'
Welcome to openssl wiki

Warning: base64 line length is limited to 76 characters by default in openssl (and generated with 64 characters per line).

openssl base64 -e <<< 'Welcome to openssl wiki with a very long line that splits...'
V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRo
YXQgc3BsaXRzLi4uCg==

openssl base64 -d <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRoYXQgc3BsaXRzLi4uCg=='
=> NOTHING !

To decode a base64 line without line feeds that exceeds 76 characters, use the -A option:
openssl base64 -d -A <<< 'V2VsY29tZSB0byBvcGVuc3NsIHdpa2kgd2l0aCBhIHZlcnkgbG9uZyBsaW5lIHRoYXQgc3BsaXRzLi4uCg=='

Welcome to openssl wiki with a very long line that splits…

In any event, plenty of available options make it simple enough to decode the text. Once the text is deobfuscated, save it as an .ics file. You should then be able to open the .ics file with your mail client of choice and successfully add it to your calendar.
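
Putting it together on AIX, something like this should do the trick (the file names are arbitrary):

    # Paste the base64 block from "Show Original" into a file, then decode it;
    # -A handles the long unwrapped lines you may get
    openssl base64 -d -A -in invite.b64 -out invite.ics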

Taking on the Upgrade Exception

Edit: Still relevant today

Originally posted January 24, 2017 on AIXchange

During a recent conversation over lunch, my companion made a great observation: No one questions the need to upgrade their computers and other devices anymore — with one notable exception.

Who do you know that is still using Windows XP, even on a home PC? We just replace these tools, because we understand that the current models are so much faster and more powerful. The same goes for the miniature computers we carry around in our pockets (sometimes called smartphones). Sure, over the lifetime of your phone, you will periodically update your OS and your apps. But eventually, we move on here, too, knowing that the new phones have the latest hardware — and recognizing that the phone carriers will stop supporting the old hardware and software over time.

You can even see this with the technology in your living room. Most likely your television is not more than a few years old. Larger flat screens have become more affordable, and with HD you can really sense the difference with that clear, sharp picture.

So what’s the one piece of technology many of us are reluctant to upgrade? You guessed it. It’s our Power Systems servers.

I still run across customers who are running POWER6, POWER5 or even older processors, along with unsupported versions of AIX or IBM i. And I’m still surprised when I see businesses ignore their critical infrastructure to this extreme. We all understand that this was — and is — amazing technology. But it is old. POWER6 came out in the summer of 2007 — nearly a decade ago. AIX 5.3 hasn’t been supported since 2012 (unless you paid for extended support).

So why is the need to upgrade and stay current not as obvious to some enterprise computing customers? There are a number of factors, starting with the amount of money customers invest in Power Systems hardware. That said, IBM has made these systems more affordable over time, and leasing options are available.

Yes, these systems keep running, but just as with the other technology in your life, eventually there’s a tipping point where upgrading your current hardware and OS becomes the safe and prudent course of action. As time goes on, replacing old hardware parts becomes harder and harder. And if anything goes wrong with your operating system or unsupported application, you may be on your own. It’s far better to stay current with your OS and application patches and refresh your hardware regularly to ensure that support is available when you need it.

Sure, at one time we were all excited about migrating to Windows XP, or getting 3G on our phones. But so much better technology is available to us now. And yes, upgrades take some work on our part, but don’t you find that experience kind of exciting, too? I do when I think of the end users. Those days after cutover weekend, when they can’t believe how snappy their machines are, and they’re ecstatic over the time they’re saving because their jobs are running faster and the system is more responsive. That’s a great feeling, and it goes well with the relief of knowing that your enterprise is up-to-date with its critical systems.

Thoughts on Performance Tuning

Edit: Still good stuff

Originally posted January 17, 2017 on AIXchange

I recently discovered this post to the UNIX & Linux Forums. While it’s from 2013, “The Most Incomplete Guide to Performance Tuning” has some great — and still relevant — ideas.

For starters, this is from the section called “What Does Success Mean?”

“The problem is that fast is a relative term. Therefore it is absolutely imperative that you agree with your customer exactly what fast means. Fast is not “I don’t believe you could squeeze any more out of it even if I threaten to fire you”. Fast is something measurable – kilobytes, seconds, transactions, packets, queue length – anything which can be measured and thus quantified. Agree with your customer about this goal before you even attempt to optimize the system. Such an agreement is best laid down literally and is called a Service Level Agreement (SLA). If your customer is internal a mail exchange should be sufficient. Basically it means that you won’t stop your efforts before measurement X is reached and in turn the customer agrees not to pester you any more once that goal is indeed reached.

A possible SLA looks like this:

Quote: The ABC-program is an interactive application. Average response times are now at 2.4 seconds and have to be reduced to below 1.5 seconds on average. Single responses taking longer than 2.5 seconds must not occur.

This can be measured, and it will tell you – and your customer – when you have reached the agreed target.

By contrast, here’s a typical example of work that is not covered by an SLA, a graveyard of hundreds of hours of uncounted, wasted man-hours:

Quote: The ABC-program is a bit slow, but we can’t afford a new system right now, therefore make it as fast as possible without replacing the machine or adding new resources.

The correct answer for such an order is: “if the system is not important enough for you to spend any money on upgrading it, why should it be important enough for me to put any serious work into it?”

This is from the section, “What Does Performance Mean?”

“Another all too common misconception is the meaning of “performance”, especially its confusion with speed. Performance is not just about being fast. It’s about being fast enough for a defined purpose under an agreed set of circumstances.

A simple comparison of the difference between performance and speed can be described with this analogy: We have a Ferrari, a large truck, and a Land Rover. Which is fastest? Most people would say the Ferrari, because it can travel at over 300kph. But suppose you’re driving deep in the country with narrow, windy, bumpy roads? The Ferrari’s speed would be reduced to near zero. So, the Land Rover would be the fastest, as it can handle this terrain with relative ease, at near the 100kph limit. Right? But, suppose, then, that we have a 10-tonne truck which can travel at barely 60kph along these roads? If each of these vehicles are carrying cargo, it seems clear that the truck can carry many times more the cargo of the Ferrari and the Land Rover combined. So again: which is the “fastest”? It depends on the purpose (amount of cargo to transport) and environment (streets to go). This is the difference between “performance” and “speed”. The truck may be the slowest vehicle, but if delivering a lot of cargo is part of the goal it might still be the one finishing the task fastest.

There is a succinct difference between fast and fast enough. Most of us work for demanding customers, under economic constraints. We have to not only accommodate their wishes, which are usually easy – throw more hardware at the task – but also their wallet, which is usually empty. Every system is a trade-off between what a customer wants, and what he is willing to pay for. This is another reason why SLA’s are so important. You can attach a price tag to the work the customer is ordering, so they know exactly what they’re getting.”

This is from the section, “Work Like You Walk—One Step at a Time”:

“If you try to tune a system, change one parameter, then monitor again and see what impact that had, or whether it had any impact at all. Even if you have to resort to sets of (carefully crafted) parameter changes do one set, then monitor before moving onto the next set.

Otherwise you run into the problem that you don’t really know what you are measuring, or why. For example, suppose you change the kernel tuning on a system while, at the same time, your colleague has dynamically added several GB of memory to that system. To make matters “worse” the guy from storage is in the process of moving the relevant disks to another, faster subsystem. At the end, your system’s response time improved by 10%.

Great! But how? If you need to gain another 5%, where would you start? If you had known that adding 1GB of memory had improved the response time by 3% and that adding 3 GB more was responsible for most of the rest, while the disk change brought absolutely nothing, and the kernel tuning brought around 0.5%, you could start by adding another 3GB, and then check if that still has a positive impact. Maybe it didn’t, but it’s a promising place to start. As it is, you only know that something you, or your colleagues, did caused the effect, and you have learned little about your problem or your system.”
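
In AIX terms, that discipline might look like the following; the tunable and measurement are only examples:

    # Baseline first
    vmstat -w 5 12 > before.txt

    # Change exactly one thing
    vmo -p -o minperm%=3

    # Re-run the identical measurement and compare the two files
    vmstat -w 5 12 > after.txt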

And this is from the conclusion:

“Always remember that, as a SysAdmin, you do not operate in a vacuum. You are part of a complex environment which includes network admins, DBAs, storage admins and so on. Most of what they do affects what you do. Have a lousy SAN layout? Your I/O-performance will suffer. Have a lousy network setup? Your faster-than-light machine may look like a slug to the users. There is much to be gained if you provide these people with the best information you can glean from your system, because the better the service you offer to them, the better the service you can expect back from them! The network guy will love you if you do not only tell him a hostname but also a port, some connection data, interface statistics and your theory about possible reasons for network problems. The storage admin will adore you if you turn out to be a partner in getting the best storage layout possible, instead of being only a demanding customer.

Unix is all about small specialized entities working together in an orchestrated effort to get something done. The key point in this is that the utility itself might be small and specialized but its interface is usually very powerful and generalized. What works in Unix utilities also works in people working together: increase your “interface” by creating better and more meaningful data and you will see that others will better be able to pool their efforts with yours towards a common goal.”

There’s a lot more, so take the time to read the whole thing.

The PowerVM Story Gets Better

Edit: Some links no longer work.

Originally posted January 10, 2017 on AIXchange

Why do I consider PowerVM to be such a powerful virtualization technology? It has many advantages compared to competing virtualization technologies, including the capabilities it borrows from the mainframe.

This IBM site has a detailed list of advantages, but I’ll highlight some particularly significant ones:

  • PowerVM hypervisor—Supports multiple operating environments on a single system.
  • Micro-partitioning—Enables up to 20 VMs per processor core.
  • Dynamic logical partitioning—Processor, memory, and I/O resources can be dynamically moved between VMs.
  • Shared processor pools—Processor resources for a group of VMs can be capped, reducing software license costs. VMs can use shared (capped or uncapped) processor resources, and processor resources can automatically move between VMs based on workload demands (see the sketch after this list).
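
As a sketch of what those dynamic operations look like from the HMC command line (the system and partition names are invented):

    # Add 0.5 processing units to a running shared-processor partition
    chhwres -r proc -m Server-8286-42A -o a -p lpar1 --procunits 0.5

    # Add 1024 MB of memory to the same partition
    chhwres -r mem -m Server-8286-42A -o a -p lpar1 -q 1024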

Consider how many LPARs you can consolidate and run on a single physical frame without the performance penalties and overhead you encounter when compared to competing hypervisors that run on x86 systems. PowerVM is recognized for how well it scales and performs.

I’ve discussed SAP HANA on POWER before, but the story gets better as SAP recently announced that its HANA workloads can run up to eight production databases on a single server running PowerVM.

Compare that with these notes on what you can do with VMware:

“Just like with the vSphere 5.5 SAP HANA support release in the beginning 2014, vSphere 6 supports currently only one production level VM that may get co-deployed with non-production level SAP HANA VMs. No resource sharing with other VMs is supported for production level SAP HANA VMs.”

Here’s a summary from the IBM Systems blog:

“Often, SAP workloads are among the most important workloads running in enterprises today. They deliver the most benefits from high levels of flexibility, resiliency and performance — of which virtualization is key. As the first platform to support up to eight virtualized production instances of SAP HANA, IBM Power Systems enables clients to run multiple HANA instances on the same system without the restrictions of VMware… .

With features like capacity on demand, virtual machines (LPARs) and hypervisor scheduling, PowerVM virtualization makes it simple to consolidate, integrate and manage multiple SAP systems, so you can reduce your data center footprint and accelerate speed to production through fewer servers. These features also give the ability to manage capacity and shift running applications to take advantage of additional available resources. This helps clients to conduct real-time transactions and make insights available for more rapid decisions… .

SAP HANA on Power Systems offers a smarter, more scalable in-memory database. It depends heavily on large memory configurations and low virtualization overhead to deliver rapid, actionable insights. And because Power Systems with PowerVM supports up to two times more virtualized HANA production databases than competitors’ x86 platforms, clients can run more HANA instances on one server, simplify deployment to production and manage their systems more easily.”

Estimate rperf for your LPAR

Edit: Interesting tool.

Originally posted January 3, 2017 on AIXchange

A recent Nigel Griffiths tweet highlighted this page:

“This is a simple script that outputs the rPerf number for the current machine or LPAR.

The rperf numbers are only available for certain numbers of CPUs.

If you have a different number of CPUs then a rough calculation is made based on the top number of CPUs and dividing appropriately.

If you want to know what rperf used to work out your rating use: rperf -v

There are some problems:
* Older machines don’t have rPerf numbers so the script outputs the roltp number. There is no way to convert a roltp number to a rPerf. You will have to apply your own rules for that.
* Only certain numbers of CPU have official rPerf Numbers like 4 way, 8 way and 16 way. With LPARs, we can have lots of odd numbers of CPU. In this case, the script guesses the rPerf based on rPerf numbers in a fairly crude way. These are a simple calculation and will not be exact – i.e. it straightens out the SMP curve. The script will give a lower than actual rPerf number.
* Shared CPU LPARs that include a fraction of a CPU are not handled well – the tool will find the Virtual Processor number and use that as the maximum number of CPUs the LPAR can get.
* On shared CPU LPARs the script is not Entitlement aware, but entitlement is not a limiting factor on an uncapped LPAR anyway. If capped, should the script use Entitlement and not VP?

How will the script get updated? – Easy, it is a straightforward simple shell script – you can update it yourself and give the script back to your AIX community via the comments below.

By definition: The rPerf number is the Relative Performance number when compared to the above RS/6000 44p Model 270 374 MHz announced on 7th February 2000, which has a rPerf of exactly 1.

Syntax by example:

Assuming you rename the script to just “rperf” and have it in your $PATH
blue:nag:/home/nag/rperf $ rperf -?
Usage: ./rperf [-vehH]

blue:nag:/home/nag/rperf $ rperf
82.77 rPerf estimated based on 8.00 Virtual CPU cores

blue:nag:/home/nag/rperf $ rperf -e
82.77 rPerf estimated based on 8.00 Virtual CPU cores
41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores

blue:nag:/home/nag/rperf $ rperf -h
blue 82.77 rPerf estimated based on 8.00 Virtual CPU cores

blue:nag:/home/nag/rperf $ rperf -h -e
blue 82.77 rPerf estimated based on 8.00 Virtual CPU cores
blue 41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores

blue:nag:/home/nag/rperf $ rperf -v
Information is from public documents from www.ibm.com
— – System p Performance Report
— – System p Facts and Features Document
— – Power Systems Facts and Features Document
— – rperf script Version:31 Date:18thJune2015
Machine=IBM,8233-E8B MHz=3550 Rounded-MHz=3550 CPUs=8 CPUType=PowerPC_POWER7
lookup IBM,8233-E8B_3550_8
matchup 32 331.06
calculate cpus=8 from 32 331.06
82.77 rPerf estimated based on 8.00 Virtual CPU cores
41.38 rPerf estimated based on 4.00 Uncapped Entitlement CPU cores
blue:nag:/home/nag/rperf $ rperf -H
rperf -v -e -h -H
  -v = verbose mode and Entitlement (-e)
  -e = output Entitlement rating and Capped / Uncapped state (in addition)
  -h = output the short hostname at the start of the line
  -H = Help = this output
rperf outputs the performance number of the current machine or LPAR
either Relative Performance (rPerf) or Relative OLTP (roltp)
Depending on the age of the machine.
There is no simple way to convert from roltp to rPerf – sorry.

If it says estimated then it is NOT an official number.
For LPARs the number may be estimated but it is a simple maths calculation
i.e. if we have the official number for 4 CPUs then a 1 CPU LPAR is simply
a fourth – this will be an under estimate.

rperf script wiki page
https://www.ibm.com/developerworks/community/wikis/home#!/wiki/Power%20Systems/page/rperf

e-mail to XXXX@uk.ibm.com

Got a machine that is not on the list or any other problem ?
Make sure you have the latest rperf version
Run: rperf -v
Capture the output
Add that output as a comment at the bottom of this webpage
I get automatically notified and will sort it out
If you are a proper Techie: work out the missing line and put that in the comment too.
Thanks for your use, help and feedback, Nigel Griffiths”

There are links for downloading the files. As of this writing, rperf_v33 is the most recent, from November.

Booting AIX in Debug Mode

Edit: Still good to know.

Originally posted December 20, 2016 on AIXchange

I recently had an AIX LPAR that wasn’t booting. In an effort to gather information, IBM Support had me boot it a few different ways. This document details what we needed to do. I’m copying it here because I want to make sure you’re aware of this option.

“How to enable verbose (debug) output during boot and capture it for later analysis by IBM.

Note this technique is for customers using an HMC to manage their systems.

We will capture the console output by logging in to the HMC via an SSH client such as PuTTY, with logging enabled. This will save the output in a file on the user’s PC.

1. Configure an SSH client (eg PuTTY) to log session output to a local file on the PC.
2. Open a connection to the HMC and login as user ‘hscroot’.
3. Bring up a menu of managed servers by running the command “vtmenu”. If there is only 1 managed server this will bring up a list of LPARs available to connect to.
4. At the vtmenu, select the server to which you desire a console session.
5. Select the LPAR from which you need boot debug.
6. Wait for the “Open Completed” message (if the LPAR were running, you would get a console login prompt)

Booting the LPAR to the Open Firmware (OK) prompt
1. Make sure the LPAR is not activated. If it is hung, go to the HMC GUI, and under Systems Management -> Servers -> server name, check the box next to the LPAR. Then from the arrow on the right side of the LPAR name, pop up the menu and select “Operations -> Shut Down”.
2. Wait until the LPAR is in a “Not Activated” state, and the Reference Code shows all zeros.
3. Mouse click on the arrows to the right of the LPAR name again, to get the popup menu. Click “Operations -> Activate -> Profile”
4. From the Activate Logical Partition popup window, click the “Advanced” button.
5. From the Activate Logical Partition – Advanced popup window, click “Open Firmware OK Prompt” from the Boot Mode drop down list.

Enabling the debug boot image
1. Back in the SSH console session window, wait for the Open Firmware prompt “0>”
At the 0> prompt, enter “boot -s verbose”

2. For cdrom boot debug enter:
0> boot cdrom:\ppc\chrp\bootfile.exe -s verbose

At this point, the LPAR will continue to boot and debug information will be sent to the console. While the LPAR is booted in this debug state, all commands that are run will output debug information, such as exec() system calls.

Capturing the debug information
The console session is being run via the SSH connection to the HMC and the output will be captured in the log file configured in the first step. Once the system boot fails or hangs, stop the LPAR and send the boot debug log file to IBM Support for review.

Finishing up
To disconnect from the virtual console you have selected, type the characters tilde and dot.
~.”

The console will ask if you wish to terminate the connection. Type “y” to be disconnected from the virtual console.

At this point you can type <ENTER> to stay in the vtmenu session and choose another console, or type “q” to quit back to the HMC shell prompt.

If you are quitting, then type “exit” to close the HMC SSH session and quit PuTTY.

Once we had collected the data, IBM was able to help determine the problem.

As a reminder, you can also get debug information from your VIO server as well, using this technique:

    Login to VIOS as padmin
    $ oem_setup_env
    # script -a /home/padmin/<PMR#.Branch#>clidebug33.out
    # su - padmin
    $ ioslevel
    $ uname -LMm
    $ export CLI_DEBUG=33
    Run offending command to reproduce error
    $ export CLI_DEBUG="" (to disable debugging mode)
    $ exit (padmin)
    # exit (script)

Back Up Your HMC or Get Ready to Rebuild

Edit: Back up everything. Then test it.

Originally posted December 13, 2016 on AIXchange

A customer had an HMC issue. There were no backups, so the HMC had to be reinstalled from scratch. There wasn’t any documentation either, meaning that the customer had no idea what the network settings should be.

Stop reading for a moment and put yourself in this uncomfortable picture. Do you have backups of your HMC? Is your network information well-documented? Is it documented at all? If you had to rebuild your HMC right now, could you?

Luckily for my customer, they had a simple environment, and their HMC was onsite so they could visually inspect their equipment. One network cable was directly plugged into the HMC port of their single POWER8 system; another connected directly from their HMC into their switch. Knowing this, it was simple enough to determine which port should be the private network and which should be the open network.

What about you? Do you know which physical cables from your HMC are used for which network in your environment?

Back to our story: Configuring the open network was straightforward, and my customer was soon able to use the GUI to connect to the HMC. Once the firewall settings were fixed and the ssh port opened, they could log in to the HMC via the command line.

Their next issue was getting the HMC to recognize their managed system. They picked a range of IP addresses to use for their DHCP server, but how would they know which IP address was in use by the managed system?

After looking over this documentation, they ran lshmc -n -F clients. That provided the IP address that had been served out by their DHCP server.
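
For reference, here’s roughly what that looks like from the HMC’s restricted shell; the prompt and the address returned are hypothetical:

    hscroot@hmc:~> lshmc -n -F clients
    192.168.128.2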

From there, it was a snap to add the managed system, since they knew the address it was using. But what about the password? No one in the group was around when it was originally set up, so no one knew the password for the managed system.

Again, ask yourself: Do you know the passwords for your managed system, ASMI, etc.?

A few failed guesses (naturally) resulted in an authentication failure message. The HMC went into a firmware password locked state. With nowhere else to turn, they did a web search and found this IBM Support document with this helpful bit of information:

       Note: The default password for user admin is admin.

So they tried “admin.” Unsurprisingly, that default was still in place. The customer was able to connect to their managed machine. Everything looked as they expected.

I know this is basic, but even the basics can mess you up if you haven’t thought about them, particularly if you weren’t the one who set up your HMC.

Should you ever find yourself in a similar predicament, here’s a pretty good HMC setup reference.

Bare Metal Recovery Options for Linux

Edit: Still a good question.

Originally posted December 6, 2016 on AIXchange

I recently wrote about backups, though I didn’t get into the bare metal recovery options for Linux.

I wrote about this topic in 2005, and here I am, 11 years later, still wondering where my integrated bare metal recovery mechanism for Linux is. The answer is still going to include Storix, though there’s another utility that may also work for you. It’s called Relax-and-Recover:

“Set up and forget nature
* designed to be easy to set up
* designed to require no maintenance (e.g. cron integration, nagios monitoring)

Two-step recovery, with optional guided menus
* disaster recovery process targeted at operational teams
* migration process offers flexibility and control

Bare metal recovery on dissimilar hardware
* support for physical-to-virtual (P2V), virtual-to-physical (V2P)
* support for physical-to-physical (P2P) and virtual-to-virtual (V2V)
* various virtualization technologies supported (KVM, Xen, VMware)”

Relax-and-Recover is a no-cost product, available under the GNU General Public License (although the developers are happy to take donations or sponsorships, and they do offer support contracts).

Check out the quick start guide and these usage scenarios:

“Relax-and-Recover will not automatically add itself to the Grub bootloader. It copies itself to your /boot folder.

To enable this, add

    GRUB_RESCUE=1

to your local configuration.

The entry in the bootloader is password protected. The default password is REAR. Change it in your own local.conf:

    GRUB_RESCUE_PASSWORD="SECRET"

The most straightforward way to store your DR images is using a central NFS server. The configuration below will store both a backup and the rescue CD in a directory on the share.

     OUTPUT=ISO
     BACKUP=NETFS
     BACKUP_URL="nfs://192.168.122.1/nfs/rear/"

Backup integration
Relax-and-Recover integrates with various backup solutions. Your backup software takes care of backing up all system files, Relax-and-Recover recreates the filesystems and starts the file restore.

Currently Bacula, Bareos, SEP Sesam, HP DataProtector, CommVault Galaxy, Symantec NetBackup, EMC NetWorker (Legato) and IBM Tivoli Storage Manager are supported.

The following /etc/rear/local.conf uses a USB stick for the rescue system and Bacula for backups. Multiple systems can use the USB stick since the size of the rescue system is probably less than 40M. It relies on your Bacula infrastructure to restore all files.

     BACKUP=BACULA
     OUTPUT=USB
     OUTPUT_URL="usb:///dev/disk/by-label/REAR-000"”
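
For completeness, the documented workflow on top of a local.conf like the ones above appears to be roughly the following. I haven’t run these commands myself, so treat them as assumptions and check the project’s README:

    # rear -v mkbackup      (create the rescue image plus, with BACKUP=NETFS, the backup archive)
    # rear -v mkrescue      (rescue image only, when an external backup product restores the files)
    (on the replacement hardware, boot the rescue media and run: rear recover)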

I haven’t tried this tool yet, but it looks interesting. Apparently there are even ppc64le and ppc64 images.

How do you go about a bare metal recovery of your Linux partitions?

Tech Terms Defined Redefined

Edit: Someone needs to update the IBM Jargon file.

Originally posted November 29, 2016 on AIXchange

I love clever definitions of technology-related terms. In the past I mentioned the IBM Jargon and General Computing Dictionary (which will be 30 years old soon).

Here’s a similar list that — while it’s directed toward an academic audience — is more up-to-date. Don’t worry, you’ll recognize all these terms:

“Analytics, n. pl. The use of numbers to confirm existing prejudices, and the design of complex systems to generate these numbers.

App, n. An elegant way to avoid the World Wide Web.

Asynchronous, adj. The delightful state of being able to engage with someone online without their seeing you, while allowing you to make a sandwich.

Badges, n. pl. The curious conceit that since nobody likes transcripts or degrees, the best thing to do is to shrink them into children’s sizes that nobody recognizes. (see Open Badges)

Best practice, n. An educational approach that someone heard worked well somewhere. See also “transformative,” “game changer,” and “disruptive.”

Chromebook, n. A device that recognizes that the mainframe wasn’t such a bad idea after all.

Cloud, n. 1. A place of terror and dismay, a mysterious digital onslaught, into which we all quietly moved. 2. AKA “just other people’s computers.”

Counsel, n. Well paid, well trained in neither education nor technology, and rules decisively on (and against) both.

Forum, n. 1. Social Darwinism using 1980s technology.

Infographic, n. An easy way to avoid reading and writing.

Powerpoint, n. 1. A popular and low cost narcotic, mysteriously decriminalized.

Shadow IT department, n. A mysterious alliance that does a lot of work on campus. It seems to include little start-up companies like Google, Amazon, Apple, Microsoft, and others.”

You’ll find many more definitions in that link. Or just check out the Original Hacker’s Dictionary or the Business Jargon Dictionary.

I am sure that there are other tech dictionaries written in a similar vein. Please share your favorites in comments.

Adjusting to a Linux World

Edit: I still love AIX.

Originally posted November 22, 2016 on AIXchange

I use Linux, and have for many years. I run Linux on Power hardware, which is something any Linux enterprise user should consider. Still, I prefer to live in the world of AIX. I understand that Linux is a fixture now, but there are features and capabilities that I take for granted with AIX that aren’t there (at least not yet) with Linux.

This article is a few years old, but it gets at the challenges of working simultaneously with open source and proprietary operating systems. (Incidentally, the author cites two books — “The Cathedral and the Bazaar,” by Eric Raymond, and “The Design of Design,” by Frederick P. Brooks — that would aid in your understanding of what he’s talking about):

“Quality happens only when someone is responsible for it. …

Getting hooked on computers is easy—almost anybody can make a program work, just as almost anybody can nail two pieces of wood together in a few tries. The trouble is that the market for two pieces of wood nailed together—inexpertly—is fairly small outside of the “proud grandfather” segment, and getting from there to a decent set of chairs or fitted cupboards takes talent, practice, and education.”

I enjoy the author’s discussion of the bloat and prereqs and dependencies that exist in modern systems:

“… the map helpfully tells you that if you want to have www/firefox, you will first need to get devel/nspr, security/nss, databases/sqlite3, and so on. Once you look up those in the map and find their dependencies, and recursively look up their dependencies, you will have a shopping list of the 122 packages you will need before you can get to www/firefox. Here is one example of an ironic piece of waste: Sam Leffler’s graphics/libtiff is one of the 122 packages on the road to www/firefox, yet the resulting Firefox browser does not render TIFF images. For reasons I have not tried to uncover, 10 of the 122 packages need Perl and seven need Python; one of them, devel/glib20, needs both languages for reasons I cannot even imagine.

libtool’s configure probes no fewer than 26 different names for the Fortran compiler my system does not have, and then spends another 26 tests to find out if each of these nonexistent Fortran compilers supports the -g option.

That is the sorry reality of the bazaar Raymond praised in his book: a pile of old festering hacks, endlessly copied and pasted by a clueless generation of IT “professionals” who wouldn’t recognize sound IT architecture if you hit them over the head with it.

One of Brooks’s many excellent points is that quality happens only if somebody has the responsibility for it, and that “somebody” can be no more than one single person—with an exception for a dynamic duo.”

Who is responsible for Linux? Which distribution do you even consider to be “Linux”? Who’s in charge of making the switch to systemd, or making sure that Linux distributions don’t break as a result of that change? Who do you call when they do break?

Who coordinates between the different distributions and vendors, and how do you know which one is right for you? Are you going with a commercially supported product like Red Hat, SUSE or Ubuntu? How about a community supported flavor like CentOS or Fedora?

With AIX, you know who’s responsible. It’s the project managers at IBM, who take input from customers, prioritize what goes into the next release, conduct proper testing to ensure that the large enterprises who rely on AIX will continue to have stable environments, and allow for a significant amount of backwards compatibility while introducing new features:

“More than once in recent years, others have reached the same conclusion as Brooks. Some have tried to impose a kind of sanity, or even to lay down the law formally in the form of technical standards, hoping to bring order and structure to the bazaar. So far they have all failed spectacularly, because the generation of lost dot-com wunderkinder in the bazaar has never seen a cathedral and therefore cannot even imagine why you would want one in the first place, much less what it should look like.”

This is the kind of thing that I notice. I get that different sets of programmers and designers will have different opinions about what’s important and necessary, but we’re talking about two different worlds here. How many Linux developers/open source users have spent any time working with mainframes or commercial UNIX operating systems? If all you’ve ever seen is Linux, Windows or macOS, how can you even begin to understand what those of us who manage enterprise systems need to effectively do our jobs?

It’s not that I’m not willing to change. For instance I now realize that I shouldn’t think of my critical systems as friendly pets that require special care and feeding. But the problem, for me, is the significant differences in philosophy and design between those who use enterprise systems and those who use Linux/open source solutions.

With AIX, I have a built-in logical volume manager (LVM) that allows me to easily import and export volume groups and resize filesystems. Or consider the capability to migrate rootvg to another disk and run bosboot while the system is up and running. This is not always an option on other operating systems. It can be frustrating to learn that you cannot easily resize partitions or filesystems, or to find that default filesystems have changed, and not always for the better. With AIX, I easily find new hardware on my running system with cfgmgr, I check and change the attributes of my adapters without writing a value to /proc, and I dynamically remove CPUs and memory. And being able to choose whether I run one adapter virtually with the VIO server and another adapter physically by assigning the card to my LPAR is a nice touch.
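
To make that concrete, here’s a hedged sketch of the sort of day-to-day LVM work I mean; the disk, volume group and filesystem names are made up for illustration:

    # chfs -a size=+2G /data          (grow a mounted filesystem online)
    # cfgmgr                          (discover newly zoned disks on the running system)
    # extendvg datavg hdisk4          (add the new disk to a volume group)
    # migratepv hdisk0 hdisk1         (move rootvg contents to another disk while running)
    # bosboot -ad /dev/hdisk1         (rebuild the boot image on the new disk)
    # bootlist -m normal hdisk1       (boot from it next time)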

Now think about multibos and alt_rootvg and alt_disk_install and the power of being able to boot from hdisk1 after a migration, and how, if there are issues, you can boot from your original OS that is still on hdisk0 and try again later. Some Linux distributions don’t even allow migration from one version of the OS to another; it’s suggested you reinstall. With AIX, I’ve upgraded the same systems for years with no problems.
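
As a rough illustration of that safety net (hypothetical disk names again):

    # alt_disk_copy -d hdisk1         (clone the running rootvg to a second disk)
    # shutdown -Fr                    (boot the clone and run the migration there)
    # bootlist -m normal hdisk0       (if it goes badly, point back at the untouched original)
    # shutdown -Fr                    (and you are back on the old OS, free to try again later)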

AIX enjoys deep integration with the hardware, and performance tuning is well understood. There’s built-in bare metal backup and restore of the operating system, built-in error reporting at the hardware level (try finding which component needs to be replaced on your x86 hardware while your system is running; maybe your light path diagnostics will work, but maybe not), and rock-solid, well-considered virtualization solutions at the hardware level.

Last but not least, as of AIX 7.2, there is Live Update.

Yes, the shift away from proprietary UNIX is happening. Shift isn’t even the word. Linux is learning, absorbing and getting smarter. Linux is eating the world. But as we continue down this path, I’ll continue to think of the features I already have or those that I’ll have to give up, at least until the Linux phenomenon catches up.

Another Case for Backups

Edit: Still good stuff.

Originally posted November 15, 2016 on AIXchange

As I’ve mentioned, the AIX mailing list is a great place to go to pose questions and receive good answers from other AIX pros. While traffic is typically pretty light, I recently came across an interesting thread about the need to take care when editing critical files:

“A coworker was editing the /etc/passwd file on an LPAR on our P720 server. When he tried to save the file, emacs hiccupped and he ended up putting an empty /etc/passwd file in its place.

Now, with no open sessions to the LPAR, no one can access the LPAR. This LPAR is one of our primary NFS servers. So far, only a few items have stopped working, SAMBA being one. But in general, the hundreds of AIX/Linux/Unix clients in our R&D group are still able to reach the NFS mounts (at least the ones they had automounted when this all happened).

I went to the HMC and got a terminal/console there, but still need a password to get in.

Any ideas as to what I might do to crack this nut and get into the box?”

Put on your thinking cap for a moment. How would you get out of this pickle?

The first two replies offer great suggestions:

“What is your backup product? You may be able to restore it using the agent already running on the system.

New logins will be impossible. You’ll have to leverage something that already has access.”

The second simply consists of a link to this IBM Knowledge Center doc, plus the following:

“once in

echo 'root:!:0:0::/:/usr/bin/ksh' > /etc/passwd
chmod 644 /etc/passwd
sync;sync;sync;reboot”

The next day, the solution was posted:

“Thank you for all your suggestions. It brought back to heart why I so loved AIX and the support I can get (and occasionally give).

Here is the solution:

We had a backup of the system, but the tape was offsite at our DR site (Some cave under Lake Erie, or the like).

Then it hit me, I do not need “the” /etc/passwd file from this LPAR (last backed up in a Full backup in August!). I just need “a” /etc/passwd file. ANY /etc/passwd file. Or just a one line passwd file I could make myself.

I just needed the back-door of NetBackup to place the file there.

By now I had the NetBackup guys on the line and in Priority 1 mode, so I asked them to pull the /etc/passwd file off of the twin LPAR on the other P720 we have. Then restore it to this LPAR. In less than two minutes, we were back in business!! Then I had a copy of the real passwd file, which is nearly identical to the one from the other LPAR, and I put that in place.”

I’ll also cite the reply to that, because it’s an awesome punch line:

“Having a close call is a good time to review your backups and your bootable media.

If you don’t have NIM then it’s critical to keep media at close level to what you are running available near the machine.”

The rest of the discussion covers things like making sure your NIM server is ready to go, along with some more details around what went wrong with editing /etc/passwd in the first place. It turns out they did have a backup of /etc/passwd, but since they couldn’t log into the machine at all, they were unable to copy that saved file.

Indeed, the best time to ensure your machine is backed up is before a disaster strikes. Run through this checklist (a command sketch follows the list):

  • Do you have a current viosbr?
  • Have you run backupios?
  • Do you have a current accessible mksysb of your VIO server? Do you have current mksysbs of your LPARs? Do you have a local Alt Disk Copy of rootvg?
  • Is your HMC backup current? Do you have a mksysb of your NIM server?
  • If you take backups, that’s great. But have you tested them? Are they accessible if your computer room burns down? I wrote about this more than 10 years ago, yet here we are, still needing to back up our machines and still needing to know how to restore them.
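
Here’s that sketch, one hedged example per checklist item; the file names, devices and hosts are placeholders, so check the man pages for the exact options at your code levels:

    $ viosbr -backup -file vios1_viosbr                  (VIOS virtual and logical device configuration)
    $ backupios -file /mnt/backups/vios1.mksysb -mksysb  (mksysb image of the VIO server)
    # mksysb -i /backups/lpar1.mksysb                    (bootable rootvg backup of an AIX LPAR or NIM server)
    # alt_disk_copy -d hdisk1                            (local alternate disk copy of rootvg)
    hscroot@hmc:~> bkconsdata -r nfs -h nfsserver -l /exports/hmc   (HMC critical console data; flags from memory)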

We’re all busy, but it’s essential to take time now to figure out how you can recover your systems.

Building Virtual Environments

Edit: Some links no longer work.

Originally posted November 8, 2016 on AIXchange

This IBM developerWorks page offers helpful information about building virtual environments. While it hasn’t been updated in a while, the content is certainly relevant.

There are four sections, covering “pre-virtualization,” planning and design, implementation, and management and administration. The information that follows is excerpted in bits and pieces:

“Virtualization is a large subject, so this section will assume you know the basics and you have at least done your homework in reading the two Advanced POWER Virtualization Redbooks. …

Skill Up on Virtualization
* You need to invest time and practice before starting a Virtualization implementation because misunderstandings can cost time and effort to sort out – the old saying “do it right the first time” applies here.
* There is no quick path. …

Assess the virtualization skills on hand
* It is not recommended to start virtualization with only one trained person due to the obvious risk of that person becoming “unavailable”
* Some computer sites run with comparatively low-skilled operations staff and bring in consultants or technical specialists for implementation work – in which case, you may need to check the skill levels of those people. …

Practice makes perfect
* In the pSeries, the same virtualization features are available from top to bottom — this makes having a small machine on which to practice, learn and test a realistic proposition. The current bottom of the line p505 is available at relatively low cost, so if you are preparing to run virtualization on larger machines, you can get experience for a tiny cost.
* Also, in many sites machines in the production computer room have to undergo strict “lock down” and process management – a small test machine does not have to be run this way, and I have seen system administrators and operations run a “crash and burn” machine under their desk to allow more flexibility.”

There’s a nice list of different scenarios that cover small machines, “ranches” of machines, production, etc. The last list mentions a dual VIO server, although I would argue that is the rule and not the exception:

“When to go Dual Virtual IO Server and when not to?

This is impossible to answer, but here are a few thoughts:

* The Virtual IO Server is running its own AIX internal code like the LVM and device drivers (virtual device drivers for the clients and real device drivers for the adapters). Some may argue that there is little to go wrong here. Adding a second Virtual IO Server complicates things, so only add a second one if you really need it.
* Only add a second Virtual IO Server for resilience if you would normally insist on setting up a high-availability environment. Typically, this would be on production machines or partitions. But if you are going to have an HACMP setup (to protect from machine outage, power supply, computer room or even site outage), then why would you need two Virtual IO Servers? If the VIO Server fails, you can do an HACMP failover to the other machine.
* If this is a less-critical server, say one used for developers, system test and training, then you might decide the simplicity of a single VIO Server is OK, particularly if these partitions have scheduled downtime for updates to the Virtual IO Server. Plan on scheduled maintenance. Also note the VIO Server and VIO Clients start quickly, so the downtime is far less than with older standalone systems.”

VIO server sizing info is one area where the content is old. Nigel Griffiths has updated information, noting, among other things, that the VIO server must be monitored as workloads increase.

Back to the original link. This is found in the section headed, “Common Mistakes”:

“Priorities in Emergencies
Network I/O is very high priority (dropped packets due to neglect require retransmission and are thus painfully slow) compared to disk I/O, because the disk adapters will just finish and sit and wait if neglected due to high CPU loads. This means that if a Virtual IO Server is starved of CPU power (something that should be avoided), it will deal with the network as a priority. For this reason some people consider splitting the Virtual IO Server into two: one for networks and one for disks, so that disks do not get neglected. This is only a worst case scenario and we should plan and guarantee this starvation does not happen. …

Virtual IO Server below a whole CPU
For excellent Virtual IO Server responsiveness, giving the VIO Server a whole CPU is a good idea as it results in no latency waiting to get scheduled on to the CPU. But on small machines, say 4 CPUs, this is a lot of computer power compared to the VIO client LPARs (i.e. 25%). If you decide to give the VIO Server, say, half a CPU (Entitled Capacity = 0.5), then be generous: never make the VIO Server capped, and give it a very large weight factor.”
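
On an HMC-managed system, those recommendations translate into partition profile attributes along these lines. This is only a sketch: the managed system and profile names are invented, and the attribute spellings should be checked against your HMC release:

    chsyscfg -r prof -m MyServer -i 'name=vios1_normal,lpar_name=vios1,proc_mode=shared,min_proc_units=0.3,desired_proc_units=0.5,max_proc_units=2.0,sharing_mode=uncap,uncap_weight=255'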

These excerpts are from the “Implementation” section:

“* In most large installations the configuration is an iterative process that will not quite match the initial design so some modification may have to be made. …
* Also, opportunities may appear to add the flexibility of a pool of resources that can be assigned later, once real life performance has been monitored for a few weeks.”

Finally, from the “Management/Administration” section:

“Maintain VIO Server Software
* New and very useful functions appear in the latest VIO Server software, which makes updating it worthwhile.
* Take careful note that this may require firmware updates too; it is worth scheduling these in advance of VIO Server software updates.
* There are also fixes for the VIO Server to overcome particular problems.

It is worth making a “read only” HMC or IVM user account for people to take a look at the configuration and know they can’t “mess it up”.

I often have people claim that their Virtual I/O resource is not available when they create a new LPAR, and 90% of the time it is due to mistakes on the HMC. The IVM features automation of these setup tasks and is much easier. Also, recent versions of the HMC software make it easier to cross-check that the VIO Server and VIO client virtual resources all match up.

It is strongly recommended that the HMC, system firmware and VIO Server software are all kept up to date to make the latest VIO features and user interface advances available and to remove known and fixed problems with early releases.”

At the very end of the document there are a few examples of how to get configuration data from the HMC, create an LPAR using the command line, and create LPARs using a configuration file.

Again, some of this information is dated. But overall, there’s lots of good advice.

My Hosted PowerSC Trial Session

Edit: Some links no longer work.

Originally posted November 1, 2016 on AIXchange

Did you see this AIX EXTRA article about PowerSC?

“PowerSC 1.1.5 will bring us a new user interface that makes the security compliance aspect of the product significantly easier to manage. Many Power Systems clients need to adhere to different security compliance standards for their particular industry. Examples include COBIT, PCI, HIPAA, NERC, and DoD.

PowerSC, and previously aixpert, have always been great tools in managing compliance with these standards. They took the rules and requirements of the different standards and applied them to the AIX operating system, so you didn’t have to. But in order to manage these profiles, users had to log in and execute commands on each machine individually.”

The piece also mentions an IBM hosted trial period, which ended last week. I was among the users who took part in the trial, and while I can’t say a lot about it due to the confidentiality agreement I signed, I will tell you that I liked what I saw with PowerSC. I also like the direction IBM could be taking with this product.

The process to get access to the environment was very easy. We set up a mutually convenient time via email, and at that time we got on a shared screen session together. I was given control of the session, and I was asked about what I saw, what I liked and what I didn’t like. I performed tasks with the product so we could see how intuitive the process was.

After running the product through its paces, I provided my feedback. Again, I think you’ll like it once you get your hands on it. I imagine this new iteration with a GUI might make some of you more motivated to implement PowerSC in your environment.

I’d love to hear from anyone else who participated in the trial. I’m also curious if you knew about the trial before reading this. I previously tweeted about it (@robmcnelly), and so did @AIXmag. But I always wonder how you get your information, whether it’s from this blog, Twitter or the AIX EXTRA email newsletter.

And if you missed out on the trial, I hope you’ll take advantage of chances to test out other IBM products in the future. It’s a free opportunity to learn about tools that could really help you.

AIX Keeps Making History

Edit: I still like to remember the good old days.

Originally posted October 25, 2016 on AIXchange

I’m a fan of history, especially technology-related history. So as I get older, I like to reminisce about “the good old days.” Like when I attended Desert Code Camp 2016 earlier this month.

The event, held at Chandler-Gilbert Community College in Chandler, Ariz., was great. Sessions were focused toward developers, including one that covered IBM Watson and Bluemix.

What got me reminiscing is the fact that I actually attended this school back in 1987, shortly after it opened. It was fun to walk around the campus and see the growth and change that’s taken place over nearly 30 years. While several of the original buildings and computer labs still stand, it was enough of a change to show me that life marches on.

Adding to that weekend’s retro feel, The Retro Wagon filled a room with classic hardware: everything from Altairs, teletypes, Commodore 64s, TRS80s and Apple II computers to slide rules and acoustic couplers. It was like walking into a time warp. If you follow me on Twitter (@robmcnelly) you might have seen photos of some vintage machines. Otherwise, check out The Retro Wagon Twitter feed, which is available from their homepage.

It’s amazing to think that when I was in college and a lot of that technology was being unveiled, my favorite operating system was also part of that era’s innovation. Yes, AIX turns 30 this year. If you’re wondering what AIX was like at its inception, read what some of the key people involved in its creation had to say in this IBM Systems Magazine 20-year retrospective from 2006. There are some great memories, along with some names that you may remember from conferences you’ve attended over the years.

But what about the rest of the story? What can we say about the past 10 years of AIX?

One highlight that immediately comes to mind is the latest release of AIX: AIX 7.2 TL1 allows customers to upgrade their operating system with no downtime. Think of what we can do on the fly now: we can patch one VIO server, reboot it and patch the redundant VIO server in the pair — and the VIO clients shouldn’t even notice. We can non-disruptively update system firmware in many cases. We can take minor outages to patch our PowerHA clusters. And now we can patch our OS without an outage. I see Ubuntu is working on something called Livepatch, but I wonder how long it will be before another operating system can be patched on the fly the way AIX can.
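
The rolling VIO server patching flow I’m describing looks roughly like this on each half of the pair; a sketch only, with a made-up update directory, and you would verify that client paths are healthy before repeating it on the partner VIOS:

    $ updateios -commit                                  (commit any previously applied level first)
    $ updateios -dev /home/padmin/update -install -accept
    $ shutdown -restart                                  (clients ride through on the redundant VIO server)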

The past 10 years of AIX have also given us Live Partition Mobility, where we can move running workloads between POWER6, POWER7 and POWER8 servers.

We have PowerHA, built into the OS with Cluster Aware AIX (CAA).

We have shared storage pools. We have multiple shared processor pools.

We have more granularity when creating virtual machines.

We have CAPI and the capability to have I/O cards talk directly to the CPU.

We have POWER8 processors that give us up to 8 threads per core.

We have active memory expansion and we have WPARs. We have the capability to run AIX 5.2 or AIX 5.3 in a WPAR.

Then there are the products that come with AIX Enterprise Edition, including PowerSC, PowerVC, Cloud Manager and the Dynamic System Optimizer.

There’s plenty more that I didn’t mention. The point is that the past 10 years have produced substantial improvements that have made AIX a more powerful operating system with more advanced virtualization capabilities and more powerful hardware. And these improvements have made our jobs easier.

It’s nearly impossible to imagine where AIX will be in another 10 years, but as much as I like looking back, I’m even more excited about what’s ahead. Just think, 2026 will be the 40th anniversary of AIX. What will we be able to do then? What won’t we be able to do?

Digging into Last Week’s IBM Announcements

Edit: Some links no longer work.

Originally posted October 18, 2016 on AIXchange

Last week IBM announced new hardware models, along with new features and functionality within AIX and IBM i. I believe IBM is once again showing a strong commitment to the Power brand, and, by providing the capability to update your operating system on the fly, giving customers another reason to choose Power Systems.

Here’s the announcement summary. Some highlights include AIX 7.2 enhancements, including the capability to perform live updates. Previously we could use AIX Live Update only for interim fixes, but now we can perform live updates of the AIX operating system:

  • Introduced in AIX 7.2, AIX Live Update is extended in Technology Level 1 to support any future update without a reboot, with either the geninstall command or NIM.
  • The genld command is enhanced to list processes that have an old version of a library loaded so that processes can be restarted when needed in order to load the updated libraries.

For more about this new feature, read this from IBM developerWorks, and watch this. Before you make that jump to some other operating system, Live Update might give you pause. What other operating system lets you update it while it’s running?
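
From what I’ve read, the mechanics look something like the following. I’m writing the flags from memory, so treat them as assumptions and verify against the AIX 7.2 TL1 documentation before trying this:

    # geninstall -k -p -d /tmp/7200-01 all    (preview a Live Update from a directory of updates)
    # geninstall -k -d /tmp/7200-01 all       (perform the Live Update, no reboot required)
    # genld -u                                (afterwards, list processes still holding old library copies)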

There’s also the capability to use large pages with Active Memory Expansion (AME). According to the previously referenced AIX 7.2 announcement letter, “the 64k pages can be configured/compressed in an LPAR, and the amepat command is enabled for 64k page modeling.”

There’s an enhancement to the AIX Toolbox for Linux: IBM is committing to maintain it at current levels, along with enhancements to yum and an updated USB device library. Again, this is from the AIX 7.2 announcement letter:

“To facilitate the installation of supplementary open source software packages for AIX, IBM introduces the yum package management tool for AIX. Along with yum, there is a mandatory update of the RPM Package Manager to version 4.9.1.3. In this update, new function enables yum to perform automatic open source software dependence discovery and update maintenance for RPM-based open source software installed on your system.

A new policy maintains and addresses open source security vulnerabilities in selected key open source software packages. IBM expands its commitment to keep the key open source packages updated to reasonably current levels….

The cloud-init utility and all of its dependencies are now available on the AIX Toolbox for Linux Applications website. With yum, you can easily install cloud-init, and licensed AIX users receive support.

The libusb development library for USB device access is added to the AIX Toolbox for Linux Applications.”
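
Once the yum bundle from the Toolbox is installed, usage should be the same as on any RPM-based Linux box. A quick hypothetical session:

    # yum install cloud-init          (resolves and installs the RPM dependency chain automatically)
    # yum update                      (brings installed Toolbox packages up to current levels)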

There’s a new E850C server to go along with the E870C and the E880C:

“The Power E850C server (8408-44E) is the latest enhancement to the Power System portfolio. It offers an improved 4-socket 4U system that delivers faster POWER8 processors up to 4.22 GHz, with up to 4TB of DDR4 memory, built-in IBM PowerVM virtualization, and Capacity on Demand. It integrates cloud management to help clients deploy scalable, mission-critical business applications in virtualized, private cloud infrastructures.

Like its predecessor Power E850 server, which was launched in 2015, the new Power E850C server utilizes 8-core, 10-core, or 12-core POWER8 processor modules. But the E850C processors are 13% – 20% faster and deliver a system with up to 32 cores at 4.22 GHz, up to 40 cores at 3.95 GHz, or up to 48 cores at 3.65 GHz and utilize DDR4 memory. A minimum of two processor modules must be installed in each system, with a minimum quantity of one processor module’s cores activated.”

There are new and improved HA, DR, and backup/recovery solutions, including a new PowerHA interface and dashboard. This link cites Live Partition Mobility resiliency improvements and simplified remote restart enhancements that provide for automated policy-based VM remote restart and VM remote restart when the system is powered off. It also mentions the HMC:

HMC V8.8.6 has been enhanced to include support for the following:

  • Ability to export performance and capacity data collected by the HMC to a CSV formatted flat file for use by other analysis tools
  • Reporting on energy consumption, which can be used either by the REST APIs or by the new export facility
  • Dynamic setting of the Simplified Remote Restart VM property, which enables this property to be turned on or off dynamically.

The PowerHA System Mirror 7.2.1 announcement letter specifically covers the new GUI “that enables at-a-glance health monitoring for a PowerHA on AIX cluster or group of clusters, easy to digest view of PowerHA cluster environment, immediate notification of health status, click-on event status, and intelligent filtering of relevant event logs.”

We’re not done yet. Enhanced I/O and server options include:

“DDR4 CDIMM memory options provide energy savings and DDR4 configuration options:
Smaller-capacity CDIMMs join the existing large-capacity CDIMM for IBM Power® E880, E880C, E870, and E870C servers.

For IBM Power S812L, S814, S822, S822L, S824, and S824L servers, a full set of DDR4 DIMMs is announced that match existing DDR3 capacities.

Capacity Backup for Power Enterprise Systems is a new simplified and cost-effective HA/DR offering that replaces the existing Capacity Backup for PowerHA offering. It is now available for the IBM Power E870, E880, E870C, and E880C servers.”

PowerSC 1.1.5 will feature a new interface:

“A new compliance user interface where users can manage compliance profiles across their environment, create custom profiles, and groups of endpoints.

Compliance automation profile updates to Payment Card Industry (PCI) version 3.1 and North American Electric Reliability Corporation (NERC) version 5.

Trusted Network Connect now supports patch management of critical security fixes for open source packages on AIX® base for packages that have been downloaded from the AIX toolbox or other web download sites for AIX Open Source Packages.”

If you manage SANs, you should know about the IBM Network Advisor V14. A key enhancement is an “at-a-glance summary of all discovered b-type devices, including inventory and event summary information used to identify problem areas and help prevent network downtime.”

And for those of you who manage IBM i, 7.2 TR5 has also been announced, along with 7.3 TR1.

Believe it or not, there’s more I haven’t covered, so dig into the links.

The Best Documentation is Well-Organized

Edit: Still some good websites to visit.

Originally posted October 11, 2016 on AIXchange

It’s once again time to nominate IBM Champions. You can do so here.

Seeing that notice reminded me of the IBM Champions event I attended in Austin, Texas, some months back. I’d known a lot of these folks for years, but I was meeting a few of them for the first time. Balazs Babinecz was one of those people I’d never been face to face with. Like me, Balazs has a blog:

“This blog is intended for anyone who is working with AIX and encountered problems and looking for fast solutions or just want to study about AIX. This is not a usual blog, it is not updated every day. I tried to organize AIX related subjects into several topics, and when I find new info/solutions/interesting stuff I will add it to its topic. You can read here about many things of the world of AIX. (NIM, Storage, Network, VIO, PowerHA, HMC, Performance Tuning…)

The structure of each subject is very similar. First I try to give a general overview about a topic with the most important terms and definitions. This is followed by some important/useful commands, what are probably needed for everyday work. At the end, there are some common situations with solutions which could come up during the daily work of an administrator.

I tried to keep it as simple as possible, so without any further instructions you should be able to navigate through this site very easily.

All of these materials have been gathered by me through my experience, IBM Redbooks, forums and other internet sources. It means not all of them is written by me! If I find an interesting data and I think it is valuable, I publish it on this blog. (Basically this blog is my personal viewpoint about AIX related stuff, and it is not an official IBM site.) Most of the things have been tested successfully but it can occur that you encounter typos, missing steps and erroneous data (I cannot guarantee everything works perfectly), so please look and think before you act.”

It was fun getting a chance to meet Balazs, since I’ve frequented his blog over the years. It’s very well organized, with links to information about filesystems, the logical volume manager (LVM), HMC, networks, NIM, performance, storage and backup, install, PowerHA, PowerVM and more. Many of the topics include a section called “Basics,” which provide a good, quick overview of a particular topic.

I listed some other useful AIX resources here, and I should add William Favorite’s AIX QuickSheet to that list.

Balazs’s blog in particular reminds me of the advantages of a simple, easy to navigate web design, where you can start with broad topics and then drill down into specific details. It may seem like a minor thing, but it matters. Read through the comments, and you’ll see that many other admins agree.

A Performance Analysis Tool for Linux

Edit: Did you know this tool exists? Some links no longer work.

Originally posted October 4, 2016 on AIXchange

I often hear from people who want to know how to conduct in-depth performance analysis on Linux. These folks are new to the platform and wonder why they can’t find many of the tools (PerfPMR, for instance) that they take for granted with AIX.

If you find yourself in this situation, you should know about a performance-focused script called the Linux Performance Customer Profiler Utility (lpcpu), which gathers data from both x86- and Power Linux-based systems:

“This script captures a lot of potentially interesting performance data (profile information, system information, and some system configuration information) on a single pass approach. Gathering all of this information at once allows the context to be understood for the performance profiling data to be analyzed.

  • The script will check to be sure you have the basic tools installed on your system.

This script takes advantage of all of the “normal” performance tools used on Linux.

  • iostat
  • mpstat
  • vmstat
  • perf
  • meminfo
  • top
  • sar
  • oprofile

In addition, relevant system information is gathered, with the profiler output, into a single tarball saved on your system. By default, the file is saved in /tmp.

The script is designed to run on both x86 and Power servers, the focus being SLES and RHEL distros. It should work on OpenSUSE and Fedora as well.

The script creates a zipped tar-ball placed in /tmp by default. You can un-zip the file and poke around the data files captured to learn more about your system. Typically, the zipped tar-ball is returned to performance analysts at IBM who can help with problem determination and ideas.

In 95% of our interactions with product teams and customers, there is generally something easy to address first. There are naturally many other in-depth tools we might leverage in subsequent data runs, but first we want to be sure everyone is “on the same page.”

Here’s more on testing the script and processing the results:

“This checks for errors and attempts to run all of the default profilers. A typical error will be that the profiler being run is not installed. Obviously, in that case the profiler should be installed, or if not available, you can override the profiler list to skip that tool (but keep in mind that the data gathered may not be as useful).

You do need a number of rpm packages installed on your system.

  • sysstat
  • profile (this rpm is on the SDK image for SLES)

The script does assume the Linux kernel with symbol information is available. This depends on your distro and version since they are generally packaged differently. The script does parse and check all of the common correct places to find vmlinux (the unstripped kernel). On RHEL 6.2, you will need the kernel-debuginfo*.rpm packages installed.

This script is not targeted or focused on Java applications, but it does serve well as the first pass data gathering tool.

Typically, the workload being tested reaches a fairly steady state (has settled down) and performance data can be collected from the system.

In this case, you can run the script for the default 2 minutes, and the script will profile the information and gather everything together.

Along with the script, we have the ability to format the results into a series of html pages and charts.

Previous releases of LPCPU have required an x86 system for producing charts, however the latest release removes this requirement, unless you would prefer to force the old behavior to be used (see the README for details if you would like to force the old behavior).
Take the lpcpu.tar.bz2 file, and unpack it.

    # cd /tmp
    # tar -jxf lpcpu.tar.bz2

Copy the data tarball to the system you would like to host your data on (this could be the test system or a workstation). Unpack the tarball and issue the following commands:

    # pwd
    /var/www/html
    # tar -jxf lpcpu_data.mysystem.default.2012-04-04_1147.tar.bz2
    # cd lpcpu_data.mysystem.default.2012-04-04_1147/
    # ./postprocess.sh   /tmp/lpcpu/
    <lots of messages>
    # ls -l summary.html

View that summary.html file in a browser. Depending on your browser, you may need a web server to make use of the new charting abilities. A Python script is included for running a simple web server should you not have one available. If you cannot run the Python script and do not have a web server available, please fall back on the old charting method (see the README for details).”
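
If the included Python script isn’t an option, any simple web server rooted at the results directory should do. For example, with a stock Python 2 install (on Python 3 it would be python3 -m http.server):

    # cd /var/www/html/lpcpu_data.mysystem.default.2012-04-04_1147
    # python -m SimpleHTTPServer 8000

Then browse to http://yourserver:8000/summary.html.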

If you’ve tried out the tool, I’d like to hear from you. What other things would you like to see it do in your environment?

Removing a Static Route from the ODM

Edit: Still good information.

Originally posted September 27, 2016 on AIXchange

I was recently asked how to remove a static route from the AIX Object Data Manager (ODM), so I pointed my customer to this techdoc.

Although this information is pretty basic, many times when we revisit the basics we’re reminded of something we already knew. And sometimes, we even learn new things.

Some information from the techdoc follows. However, I encourage you to open the above link, which also has images and output:

“Question
How do I remove a static route from the ODM?

Answer
When you are trying to remove a static route, it’s best to use smitty or a chdev command to get rid of it. When you use the route delete command, it just removes the route from the running kernel. Here is how we can remove the static route.

First Option: Smitty
Step 1: Run netstat -rn.
Step 2: Verify the route you want to remove. Also look at the ODM so you can see later that it was removed from there too. To verify from the ODM, run lsattr -El inet0. In this example we will remove the route circled in red. Notice the flags column and you will see it has a flag of H, meaning it is a Host route.

Here is the odm output and circled in red is the same route from the netstat -rn output. It also shows you that it is a Host route we are going to remove. It looks similar to the route above it, but one is a network route and the other is a host specific route.

Step 3: Type smitty route.
Step 4: Select remove a static route.
Step 5: Enter the information for destination and gateway exactly how you see it in the routing table.

For Destination Type we can hit F4 and it will give us two options: net and host. In our case we will select host since we are removing a host specific route.

Under Destination Address we will enter what is in the Destination column of the netstat -rn.

The Gateway value will be what’s in the Gateway column of the netstat -rn.

Hit enter when done.

Step 6: Verify that it was gone with the lsattr command. lsattr -El inet0.

Notice we don’t see the following value for route any longer:

    host,-interface,,,,,,153.6.24.0,153.6.24.56

Second Option: Command line using chdev command.
Step 1: Verify the route we want to remove in the netstat -rn output.
Step 2: Verify which route is the offending route in the lsattr -El inet0 output.
Step 3: Run the following command:

    chdev -l inet0 -a delroute="net,-interface,,,,,,153.6.24.0,153.6.24.56"

Step 4: Verify that the route is gone in the ODM.”
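
Condensed, the command-line path looks like this, reusing the host route from the techdoc’s example (substitute your own destination and gateway values):

    # netstat -rn                     (confirm the route and its flags)
    # lsattr -El inet0                (confirm the route attribute is present in the ODM)
    # chdev -l inet0 -a delroute="host,-interface,,,,,,153.6.24.0,153.6.24.56"
    # lsattr -El inet0                (verify the route attribute is gone)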

For further reading on routes and networking with AIX, check out these articles (here and here) as well.

More on the HMC and root

Edit: Still a good discussion.

Originally posted September 20, 2016 on AIXchange

Did you know about the AIX forums that are hosted at unix.com? It had been a while since I checked them, but when I did recently, I found an open letter that was written to me a few weeks after I wrote about whether IBM should allow root access to the HMC.

It was a happy discovery and an interesting read, so I’ll share a summarized version of it here. I do agree with many of the points that were brought up in the letter and the discussion that followed.

From the first post:

“So, do I want root on the HMC, as McNelly finally asks? No, for the most time a decent user account with a normal, not-restricted shell would suffice. But to manage this account — in the same responsible way I manage the rest of my 350 LPARs — I’d like to become root now and then to do whatever administrators do. Of course I know how to jailbreak the HMC (like perhaps every halfways capable admin does), but why do I need to “break into” a system I have set up, a system I run and for which I (well, actually my company) have paid good money?

…we are not talking about some mobile phone for $69.99. We are talking about the two HMCs I use to manage one and a half dozen p780s and p880s, about 2 million dollars apiece. Do you think it is necessary to squeeze out some minimal additional benefit by pestering me with a restricted shell for my daily work? And if you really think I couldn’t handle the responsibility for such a vital system: don’t you think I should be removed from the position where I manage the LPARs running the corporate SAP systems too?”

Another commenter replied to the thread and argued that a user with unfettered access could blow up the HMC. Then this came up, concerning education:

“Second: this is digging into a much larger area so I’ll try to keep it short. The reason that so few capable admins for AIX are there is because IBM did (and, IMHO, still does) a very bad job at educating them. If I am a Linux admin and want to hone my skills I get myself a PC for $300 and start hacking. I will perhaps make it go FUBAR a few times but all this will teach me valuable lessons and I will be all the more capable once I work on really productive systems professionally. If I am an AIX admin I do… what? Buy myself a system for ~ $20k only to find out I can’t even create an LPAR because I need to shell out another $50k in various licenses for one thing or the other? This might be OK for a bank, but is beyond my financial reach.”

I’ve written plenty about education over the years (for starters, here, here and here), and I do believe this continues to be a problem.

This is from another commenter:

“I can agree with most of what has been said above, I can understand IBM wanting to lock the HMC appliance down as much as possible and I understand the sysadmin desire to have full control of any machine on the network as Bakunin says – if there’s not a competency issue. In truth, my main reason for coming down on the restricted side of this argument is exactly that – competency! I have a number of systems that have been up and running for longer than many of my support contacts have been systems admins, I don’t actually have privileged access to many of the systems – I have elevated access or “root” access on none of the systems. Should I need root access, it has to be requested, approved and I am issued with a one-time password.

I find it to be a total pain, but that is the implemented system. On investigation the reason for the system being implemented was, you guessed it, competency! Cited examples, well I could give you any number. But an example that I think sums it up quite well is one that was easy to recover from, but could have been catastrophic had it been a customer facing system with say five or six thousand users. Instead of a development system, with just a couple of hundred developers. Where the “root” user executed a recursive delete command with a space in it, from the root directory and effectively deleted the full contents of the server – mostly source code and development tools.

I have worked in the *NIX world since 1981, over that time I have watched the skill level of the sysadmin degrade, a lot of it revolves around training – my first “Sysadmin I” course was five weeks long and I never actually saw a machine. It was all spent sitting at a Wyse 30 terminal, with a number of other trainees. Now I see sysadmins working for major vendors, with no training whatsoever.”

The final post on the thread covers at length an issue caused by being locked out of the HMC. Here’s the conclusion:

“Yes, it was my fault not to have the idea with the /var FS earlier. I was tricked by both HMCs losing connection at about the same time and investigated in the completely wrong direction. On the other hand, this is not a UNIX system, it is an appliance. Why am I supposed to act as an admin checking for filesystems when I was first denied all the tools admins have?

Second, my life was made so much easier by being forced to rely on tricks like pulling MAC addresses out of the routers logs instead of simply issuing ifconfig. Find out how long a system is up: uptime. Find out how long a HMC is up: impossible. Check how many packets are being sent/received on a UNIX system: entstat or netstat. Find out the same on a HMC: impossible. This list goes on and on.

And finally: even if I had diagnosed the problem correctly it wouldn’t have helped me any. We actually tried the “official” methods of cleaning up before, but they didn’t work at all (as they usually do — I have seen them fail more often than not). Only breaking in and using normal UNIX commands did what was expected. And why did IBM not see that full FS in the 2.6GB dump they required me to upload? Do I really want to take the risk of my multi-million-dollar environment becoming completely unusable because I have a system at the center which I can neither diagnose nor administrate…”

I can certainly commiserate with the sentiment, although, had it been me, I would have engaged a duty manager and escalated the support ticket. I’d also ask for the one-time HMC password to help with the diagnosis, and maybe even request a shared-screen conversation so I knew I was getting a technician’s full attention. If you’re really stuck, you owe it to yourself to utilize the minds and resources at IBM Support. Keep making noise there until you get what you need.
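
For what it’s worth, the HMC’s restricted shell isn’t completely blind, either. A handful of read-only commands are available to hscroot, and while they’re no substitute for ifconfig or netstat, they can at least catch a full filesystem before it takes the HMC down. Here’s a minimal sketch (output formats vary by HMC release, so treat this as a starting point):

    # check filesystem usage on the HMC (the /var scenario described above)
    monhmc -r disk -n 0

    # memory, swap and processor utilization, one snapshot each
    monhmc -r mem -n 0
    monhmc -r swap -n 0
    monhmc -r proc -n 0

    # HMC code level information
    lshmc -V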

Anyway, this is a great discussion, and I wouldn’t mind seeing it continued here. So what do you think? Should IBM just give us root to the HMC? Should they continue to offer the one-time password option via support? Is there another solution?

And if you haven’t signed up for the AIX forums, you should. You may not use it regularly, but it’s a great place for launching discussions and getting answers.

POWER9 Media Coverage

Edit: Have you migrated yet? Some links no longer work.

Originally posted September 13, 2016 on AIXchange

Last month’s Hot Chips conference generated quite a bit of press about the soon-to-be-available POWER9 processors:

“Intel has the kind of control in the datacenter that only one vendor in the history of data processing has ever enjoyed. That other company is, of course, IBM, and Big Blue wants to take back some of the real estate it lost in the datacenters of the world in the past twenty years.

The POWER9 chip, unveiled at the Hot Chips conference this week, is the best chance the company has had to make some share gains against X86 processors since the POWER4 chip came out a decade and a half ago and set IBM on the path to dominance in the RISC/Unix market.

As it turns out, IBM will be delivering four different variants of the future POWER9 chip, as Brian Thompto, senior technical staff member for the Power processor design team at the company, revealed in his presentation at Hot Chips. There was only one POWER7 and one POWER7+, with variants just having different cores and caches activated. There were three POWER8 chips, one with six cores aimed at scale out workloads and with two chips sharing a single package and one single-die, twelve-core chip aimed at bigger NUMA machines; this year saw the launch of the POWER8 chip (not a POWER8+ even though IBM did call it that for some time) with twelve cores with the NVLink interconnect from Nvidia woven into it.

With the POWER9 chip, there will be the POWER9 SO (short for scale out) variant for machines aimed at servers with one or two sockets, due in the second half of 2017, and the POWER9 SU (short for scale up) that will follow in 2018 for machines with four or more sockets and, we think, largely sold by IBM itself for its customers running its own AIX and IBM i operating systems.

The four versions of the POWER9 chip differ from each other in terms of the number of cores, whether or not the systems have directly attached memory or use the “Centaur” memory buffer chips, and level of simultaneous multithreading available for specific server configurations… .

The twist on the SMT level is the new bit we did not know, and we also did not know the core counts that would be available on the POWER9 SU variants. We knew that the POWER9 SO chip would have 24 cores, and by the way, Thompto tells The Next Platform that the POWER9 SO chip is a single die chip with 24 cores. The POWER9 SU chip will top out at twelve cores, just like the biggest POWER8 chip did. Both POWER9 chips have eight DDR memory ports, each with its own controller on the die, which now can either talk directly to two DDR memory sticks on the POWER9 SO or to a Centaur buffer chip that in turn talks to four DDR memory sticks each.”

This ChannelWorld article notes that greater throughput is expected:

“Each NVLink 2.0 lane in the POWER9 chip will communicate at 25Gbps (bits per second), seven to 10 times the speed of PCI-Express 3.0, according to IBM. POWER9 will have multiple communication lanes for NVLink 2.0, and they could provide massive throughput when combined.

Recent Nvidia GPUs like the Tesla P100 are based on the company’s Pascal architecture and use NVLink 1.0. The Volta GPU architecture will succeed Pascal, also used in GPUs like the GeForce GTX 1080.

With a tremendous bandwidth improvement over its predecessor, the NVLink 2.0 technology will be important for applications driven by GPUs, like cognitive computing.”

eetimes.com is intrigued by POWER9’s acceleration strategy:

“Across a range of benchmarks, POWER9 should deliver from 50% to more than twice the performance of the POWER8 when the new chip arrives late next year, said Brian Thompto, a lead architect for the chip. New core and chip-level designs contribute to the performance boost.

The diversity of choices could help attract OEMs. IBM has been trying to encourage others to build Power systems through its OpenPower group that now sports more than 200 members. So far, it’s gaining most interest from China where one partner is making its own Power chips.

Use of standard DDR4 DIMMs on some parts will lower barriers for OEMs by enabling commodity packaging and thus lower costs.

POWER9’s acceleration strategy is perhaps the most interesting aspect of the new chip.

It will be one of the first microprocessors to implement the 16 GTransfer/second PCI Express Gen 4 interconnect that is still awaiting approval of a final spec. Separately, it implements a new 25 Gbit/s physical interconnect called IBM BlueLink.

Both interconnects support 48 lanes and will accommodate multiple protocols. The PCIe link will also use IBM’s CAPI 2.0 to connect to FPGAs and ASICs. BlueLink will carry the next generation NVLink co-developed for Nvidia GPUs as well as a new CAPI.”

eWeek mentions the OpenPower Foundation:

“We want people to know there is an alternative to x86 chips and that alternative can bring a lot of performance with it,” Dylan Boday, IBM Power engineer, told eWEEK last week standing outside of the Moscone Center, home to IDF. “At the end of the day, most people want choice, but they also want to see advantages to that choice.”

IBM traditionally had developed Power chips to run only in its Power servers. However, the company three years ago—with such partners as Nvidia and Google—launched the OpenPower Foundation, enabling third parties to license the architecture to create their own Power-based systems. It was part of a larger effort to embrace open technologies—such as Linux, OpenStack and the Open Compute Project (OCP)—for its Power architecture.

The work is paying off, according to IBM officials. At the first OpenPower Summit last year, the group had about 130 members. That has since grown to more than 200. At the same time, there are more than 2,300 applications that run on Linux on Power, they said.”

As much as I love working with POWER8, I’m already excited for POWER9. (To be honest, I’m even looking forward to the day when I can help customers upgrade to POWER12.) The point is, the future looks bright for the POWER platform.

10G Ethernet on POWER Tips

Edit: Some links no longer work.

Originally posted September 6, 2016 on AIXchange

This great new techdoc from Steve Knudson recently went live. It includes a set of slides that cover Ethernet on POWER, along with a cheat sheet that you may find valuable as you transition to 10G adapters on POWER8 servers:

“Moving some older FCoE 10Gb adapters from POWER7, PCIe-Gen1 slots, to POWER8, PCIe-Gen3 slots, we saw SEA throughput on a single 10Gb Ethernet port move from approx 4.2Gb/sec, up to 8.95Gb/sec. LPAR to LPAR, within the POWER8 hypervisor, we saw an astonishing 45Gb/sec, AIX to AIX. See the full slide deck attached.

The cheat sheet for AIX and SEA performance:

1) Before SEA is configured, put dcbflush_local=yes on the trunked virtual adapters. If SEA is already configured, skip this.

    $ chdev -dev entX -attr dcbflush_local=yes

2) Configure SEA. largesend is on the SEA by default, put large_receive on also.

    $ chdev -dev entY -attr large_receive=yes

3) Up in AIX, before IP is configured, put dcbflush_local on virtual Ethernet adapters. If IP is already configured, skip this.

    # chdev -l ent0 -a dcbflush_local=yes (slide 55)

4) Up in AIX, put thread and mtu_bypass on the interface en0 (slide 55).

    # chdev -l en0 -a thread=on
    # chdev -l en0 -a mtu_bypass=on

5) Assure you have enough CPU in sending AIX, sending VIO, receive VIO, and receiving AIX. See slides 75-76.”
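
If you want to confirm where you stand before and after making these changes, lsattr and the padmin lsdev command will show the current values. A quick sketch, with ent0/en0 on the client and ent4 on the VIO server standing in for your own device names:

    # On the AIX client LPAR: check the virtual adapter and interface
    lsattr -El ent0 -a dcbflush_local
    lsattr -El en0 -a mtu_bypass -a thread

    # On the VIO server, as padmin: check the SEA attributes
    lsdev -dev ent4 -attr largesend
    lsdev -dev ent4 -attr large_receive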

From the agenda in the slides:

    -Physical Ethernet Adapters
    -Jumbo Frames
    -Link Aggregation Configuration
    -Shared Ethernet Adapter SEA Configuration
    -VIO 2.2.3, Simplified SEA Configuration
    -SEA VLAN Tagging
    -VLAN awareness in SMS
    -10 Gb SEA, active – active
    -ha_mode=sharing, active – active
    -Dynamic VLANs on SEA
    -SEA Throughput
    -Virtual Switch – VEB versus VEPA mode
    -AIX Virtual Ethernet adapter
    -AIX IP interface
    -AIX TCP settings
    -AIX NFS settings
    -largesend, large_receive with binary ftp for network performance
    -iperf tool for network performance

    Most syntax in this presentation is VIO padmin, sometimes root smitty.

From slide 13:

    Jumbo frames is a physical setting. It is set
        -on Ethernet switch ports
        -on physical adapters
        -on the link aggregation, if used
        -on the Shared Ethernet Adapter.

-Jumbo frames is NOT set on the virtual adapter or interface in the AIX client LPAR.
-Do not change MTU on the AIX client LPAR interface. We will use mtu_bypass (largesend) in AIX.
-mtu_bypass – up to 64KB segments sent from AIX to SEA, resegmentation on the SEA for the physical network (1500 or 9000 as appropriate).

From slide 16, link aggregation configuration:

-Mode – standard if network admin explicitly configures switch ports in a channel group for our server.
-Mode – 8023ad if network admin configures LACP switch ports for our server. ad = Autodetect – if our server approaches switch with one adapter, switch sees one adapter. If our server approaches switch with a Link Aggregation, switch auto detects that. For 10Gb, we should be LACP/8023ad.
-Hash Mode – default is by IP address, good fan out for one server to many clients. But will transmit to a given IP peer on only one adapter.
-Hash Mode – src_dst_port, uses source and destination port numbers in hash. Multiple connections between two peers likely hash over different adapters. Best opportunity for multi-adapter bandwidth between two peers. Whichever mode used, we prefer hash_mode=src_dst_port
-Backup adapter – optional, standby, single adapter to same network on a different switch. Would not use this for link aggregations underneath SEA Failover configuration. Also would likely not use on a large switch, where active adapters are connected to different, isolated “halves” of a large “logical” switch.
-Address to ping – Not typically used. Aids detection for failover to backup adapter. Needs to be a reliable address, but perhaps not the default gateway. Do not use this on the Link Aggregation, if SEA will be built on top of it. Instead use netaddr attribute on SEA, and put VIO IP address on SEA interface.
-Using mode and hash_mode, AIX readily transmits on all adapters. You may find the switch delivers receives on only one adapter – switches must enable a hash_mode setting as well. A sketch of the configuration follows below.
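
To put those notes into practice, the aggregation is built on the VIO server with mkvdev. This is a hedged sketch rather than a recipe; ent0 and ent1 stand in for your physical 10Gb ports, and you should verify the syntax against your VIOS level:

    $ mkvdev -lnagg ent0,ent1 -attr mode=8023ad hash_mode=src_dst_port
    $ lsdev -dev ent5 -attr

The second command (assuming the new aggregation came up as ent5) lets you confirm that mode and hash_mode took effect. Remember that the switch side has to be configured for LACP as well.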

From slide 19, Shared Ethernet Adapter (SEA) configuration:

-Some cautions with largesend
-POWER Linux does not handle largesend on SEA. It has negative performance impact on sftp and nfs in Redhat RHEL.
-A few customers have had trouble with what has been referred to as a DUP-ACK storm when packets are small, and largesend is turned off in one client. Master APAR IV12424 lists APARs for several levels of AIX.
-A potential “denial of service” attack can be waged against largesend, using a specially crafted sequence of packets. ifixes for various AIX levels are listed here.
-largesend is NOT a universal problem, and these ifixes are not believed to be widely needed.

From slide 77, iperf 10 Gb, SEA:

-If you are getting less than the values on the two previous slides…
-It appears that LARGESEND is on physical 10Gb adapter interfaces automatically, but you can set it explicitly:

        $ chdev -dev en4 -attr mtu_bypass=on

-Check that largesend, large_receive are on SEA at both ends:

        $ chdev -dev ent4 -attr largesend=1 large_receive=yes

-Check that mtu_bypass (largesend) is on AIX client LPAR interfaces:

        # chdev -l en0 -a mtu_bypass=on

-Watch CPU usage in both VIOs, both Client LPARs during iperf interval and make sure no LPAR is pegged or starving.
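
If you want to reproduce the throughput numbers from slides 75-76, iperf is the tool the deck uses. A minimal sketch, assuming iperf is installed on both LPARs and lpar2 stands in for the receiver’s hostname:

    # on the receiving LPAR: start an iperf server
    iperf -s

    # on the sending LPAR: four parallel streams for 60 seconds
    iperf -c lpar2 -P 4 -t 60

Multiple parallel streams (-P) matter here; a single TCP connection often won’t fill a 10Gb pipe on its own.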

You’ll find plenty of other helpful tips and tricks here, so take the time to read through the slides. I’m sure you’ll learn something you didn’t already know.

Using LPM from the Command Line

Edit: If only ISVs would embrace LPM instead of punishing us for using it.

Originally posted August 30, 2016 on AIXchange

In February I wrote about disabling live partition mobility on selected partitions, and recently I received a related question from someone looking for an alternative to using the HMC GUI. Specifically, how do you turn LPM on and off from the command line on a per partition basis?

This post lays out an audit trail:

“Any change to this attribute is logged as a system event, and can be checked for auditing purposes. A system event will also be logged when the Remote Restart or Simplified Remote Restart capability is set. More specifically, a system event is logged when:

    * any of these three attributes are set during the partition creation
    * any of these three attributes are modified
    * restoring profile data.

Users can check system events using the lssvcevents CLI and /or the View Management Console Events GUI. Using HMC’s rsyslog support, these system events can also be sent to a remote server on the same network as the HMC.”

These system events can be logged:

    2420 User name {0}: Disabled partition migration for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2421 User name {0}: Enabled partition migration for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2422 User name {0}: Disabled Simplified Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2423 User name {0}: Enabled Simplified Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2424 User name {0}: Disabled Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.
    2425 User name {0}: Enabled Remote Restart for partition {1} with ID {2} on managed system {3} with MTMS {4}.

In a real-life example, a reader sent me the following information. (Note: The command is in the log output).

“Just FYI, there is a way to make this change from the terminal on the HMC because I can see the command in the audit logs:

02 03 2016 07:48:43 10.9.0.1 <USER:INFO> Feb  3 07:48:43 hmc01 HMC: HSCE2123 User name hscroot: chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=1 command was executed successfully.

02 03 2016 08:21:01 10.9.0.1 <USER:INFO> Feb  3 08:21:01 hmc01 HMC: HSCE2123 User name hscroot: chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=0 command was executed successfully.

In this case we were able to run:

chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=1
chsyscfg -m system1 -r lpar -i lpar_id=20,migration_disabled=0

to do our testing.

I was also able to deduce from those commands that you could run this command to show the migration_disabled state:

lssyscfg -m system1 -r lpar --filter lpar_ids=20 -F migration_disabled”
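
Building on the reader’s commands, here’s a minimal sketch of how you might flip the flag for every partition on a managed system and then verify it, assuming hscroot access and the same system1 name from the example above:

    # disable LPM for all partitions on the managed system
    for id in $(lssyscfg -r lpar -m system1 -F lpar_id); do
        chsyscfg -m system1 -r lpar -i "lpar_id=${id},migration_disabled=1"
    done

    # confirm the settings, then check the audit trail for 2420/2421 events
    lssyscfg -r lpar -m system1 -F name,lpar_id,migration_disabled
    lssvcevents -t console -d 7 | grep -i migration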

This information should come in handy should you find yourself wanting to make a change from the command line.

Using AIX System Accounts

Edit: Still good to know.

Originally posted August 23, 2016 on AIXchange

I recently was asked about AIX system accounts. You’ll find the answers — why they’re there, how you log in to them, etc. — in this IBM Support doc. It’s an older document that covers the basics, but the information is still relevant:

“Question: What are system Special Accounts?
Answer: Traditionally, UNIX has come with a default set of system user accounts to prevent root and system from owning all system filesystems and files. As such, it is never recommended to remove these accounts; rather, set an asterisk in /etc/security/passwd for all except root. This document describes the default set of user accounts.

root — Commonly called the superuser (UID 0), this is the account that system administrators log into to perform system maintenance and problem determination.

daemon — A user used to execute system server processes. This user only exists to own these processes (and the associated files) and to guarantee that they execute with appropriate file access permissions.

bin — A second system account used primarily to break up owners of important system directories and files from being solely owned by root and system. This account typically owns the executable files for most user commands.

sys — sys user owns the default mounting point for the Distributed File Service (DFS) cache which is necessary before installation and configuration of DFS on a client. /usr/sys directory can also be used to put install images.

adm — The adm user in the /etc/passwd is basically responsible for two system functions:

    * ownership of diagnostic tools, as evidenced by the directory /usr/sbin/perf/diag_tool/
    * accounting, as evidenced by System Accounting Directories:
         /usr/sbin/acct
         /usr/lib/acct
         /var/adm
         /var/adm/acct/fiscal
         /var/adm/acct/nite
         /var/adm/acct/sum

guest — Many computer centers provide accounts for visitors to play games while they wait for an appointment, or to allow them to use a modem or network connection to contact their own computer. Typically, these accounts have names like open, guest, or play.

nobody — An account used by the Network File System (NFS) product and to enable remote printing. nobody exists when a program needs to permit temporary root access to root users. For example, before turning on Secure RPC or Secure NFS, check /etc/publickey on the master NIS server to see if every user has been assigned a public key and a secret key. You can create an entry in the database for a user by becoming the superuser and entering:

    newkey -u username

You can also create an entry in the database for the special user, nobody. Users can now run the chkey program to create their own entries in the database.

uucp — UUCP is a system for transferring files and electronic mail between UNIX computers connected by telephone. When one computer dials to another computer, it must log in. Instead of logging in as root, the remote computer logs in as uucp. Electronic mail that is awaiting transmission to the remote machine is stored in directories that are readable only by the uucp user so that other users on the computer cannot read each other’s personal mail.

nuucp — The operating system provides a default nuucp login ID for transferring files. This is normally used for the uucp communication. These two IDs, uucp and nuucp, are created when the bos.net.uucp fileset is installed. As logging in as the uucp user is not allowed, the nuucp user was created. Basically, the uucp user ID will not have a password entry set in /etc/security/passwd, but the nuucp user ID will have a password set. You can remove the user nuucp if you wish.

lpd, lp — Used for starting the lpd daemon which is necessary in order for the AIX Spooler to do remote printing.

invscout — Used by Inventory Scout which is a tool that checks the software and hardware configurations on the Hardware Management Console (HMC).

imnadm — IMN Search engine (used by Documentation Library Search).

snapp — Allows access to the snappd command which allows for hand-held PDA devices to be attached to a tty port on an AIX box. The PDA can then function in similar capacities to a dumb terminal.”

Here’s a more recent document from the IBM Knowledge Center:

“AIX provides a default set of system special user accounts that prevents the root and system accounts from owning all operating system files and file systems.

Attention: Use caution when removing a system special user account. You can disable a specific account by inserting an asterisk (*) at the beginning of its corresponding line of the /etc/security/passwd file. However, be careful not to disable the root user account. If you remove system special user accounts or disable the root account, the operating system will not function.”
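
In practice you don’t even have to hand-edit /etc/security/passwd to get the effect of that asterisk; the standard user commands can disable a login while leaving the account and its file ownership intact. A small sketch, using guest as the example account:

    # lock the account and block local/remote logins
    chuser account_locked=true login=false rlogin=false guest

    # verify the change
    lsuser -a account_locked login rlogin guest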

Finally, here’s a list of accounts you may be able to remove, and here’s a link to accounts that are created by different security components on the system.

If you’re new to this area, these links should help you. And even if you already know this stuff, it never hurts to revisit the basics.

Linux on Power Resources

Edit: Some links no longer work.

Originally posted August 16, 2016 on AIXchange

I know more of you are evaluating and using Linux on Power, so I want to highlight some good resources. (Note: A ton of links follow, and I’ve noticed that some don’t seem to work properly with Internet Explorer, so try another browser if you encounter issues.)

This list of Linux on Power “general resources” from IBM developerWorks shows you how to, among other things, install Linux, get Linux evaluation copies for RedHat and SUSE, and find support options. Don’t forget you can run community supported distributions like Debian, Fedora, OpenSuse, CentOS and Ubuntu on Power as well.

IBM developerWorks also has several other good resources. This one is called the Open Source POWER Availability tool:

“The Open Source POWER Availability Tool (OSPAT) is a search engine that was designed to help you find open source packages that are available on the IBM POWER architecture. The results provide the package name and version and the Linux distribution that supports the package.”

At the Linux on Power Community wiki, there are many links to more information. You can meet the experts and check out this Linux on Power FAQ (though it’s dated, much of the information remains relevant, like this list of supported Linux distributions).

Finally, you can see which software packages have been ported to Linux on Power and determine if there are Docker containers for them:

“There are hundreds of open source packages for ppc64le available on IBM Power Systems and more are being added all the time. These pages include lists of the available packages. To help you find what you’re looking for, we’ve organized the lists by application type and for each type, we’ve listed the ported apps, the Linux distribution(s) they’re available on and where they’re maintained. And if you prefer, you can download a spreadsheet that contains the full list for each category.

Linux distributions officially supported on the IBM Power LE platform (ppc64le) are Ubuntu, Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES). Further, there are community editions like Debian, Fedora and OpenSuse as well as CentOS which are ported and can be deployed.”

At the time of this writing the last update was July 2016, so it seems pretty current. My only complaint is that I did not see a way to download a single file with a list of all the available packages; this would save users from having to jump from page to page or spreadsheet to spreadsheet. As is though, you at least get an idea of whether or not the packages you’re interested in have been ported to the distribution you hope to run them on. Just from browsing the list you can see the sizable number of Linux on Power packages. Rest assured, more are on the way.

Please let me know if a list like this is helpful to you. For that matter, let me know if you find value in posts like this that aggregate related links from around the web.

Connecting with IBMers on Sametime

Edit: Sametime has gone away, Slack is the new tool IBMers use at the time of this writing. Some links no longer work.

Originally posted August 9, 2016 on AIXchange

I worked for IBM from 2000-2006. During that time I used Sametime extensively to communicate with coworkers worldwide. When I left the company, I wanted to continue to IM my former colleagues.

There were, and are, several ways for people outside of IBM to connect with IBMers on Sametime. One option is Pidgin, which is described as the “universal chat client.” If you choose this option, you may have to mess around with the XML file to get it to work (see here):

“Pidgin: First back up, then open and edit the following file (location is Windows 7 specific) with your favourite text editor:
C:\Users\[username]\AppData\Roaming\.purple\accounts.xml.
Now add or edit the following lines within the Sametime settings section under protocol prpl-meanwhile.
<settings>
   <setting name='fake_client_id' type='bool'>1</setting>
   <setting name='port' type='int'>80</setting>
   <setting name='force_login' type='bool'>0</setting>
   <setting name='server' type='string'>extst.ibm.com</setting>
   <setting name='client_id_val' type='int'>4676</setting>
   <setting name='client_minor' type='int'>8511</setting>
</settings>

Congratulations! You should now be connected to IBM’s internal Sametime server. To add contacts or buddies, first find their email address. If you don’t already know your buddy’s email address, you can search for it using this IBM Employee directory. When you add an internal IBM email address, prefix it with @E. For example, to add Sam you would use the name “@E sam@us.ibm.com”. This tells the external Sametime Gateway to add an external contact via email. To add non-IBM users who are also using the Sametime gateway (like me) you can just add them by email address, without the @E prefix.”

Here are two other articles that offer alternative ways to connect with IBMers on Sametime. First, from wissel.net:

“IBM External Sametime Server: You need to have an IBM id, to get one register online.
Once you have it, create a (new) community in your Sametime client (see below). Thereafter lookup your IBMer to add him/her to your buddy list.
Server/Port: extst.ibm.com / 80
Advantage: You can reach any IBMer using Sametime, surprise them.
Disadvantage: Availability is not production level”

This is from IBM developerWorks:

“An ibm.com id – these are free and available from Sign up for an IBMid if you don’t already have one
A Sametime/IBM Instant Messaging compatible client installed on your computer/device. Previously a web client was available however that link is no longer working, so a “fat client” install would seem to be the way to go. You can download the latest Sametime client from Lotus Greenhouse site which will also require a (free) ID to be created. This is a different ID to the IBMid mentioned above, but just as quick and easy to get. You can use other non-IBM clients such as Adium or Pidgin but those clients will require some ‘hacking’ to allow them to connect to the IBM Instant Messaging Gateway — if you’re keen, please check out this blog post from nomaen that details that configuration. Personally, the IBM client does the job really nicely and is available for Windows, Mac, and Linux (RPM and DEB) so I’d just go that route.”

I mention this because I noticed it circulating on Twitter recently. I’m sure a lot of you know at least one IBMer, so this seems like good information to pass along.

Another good resource is IBM whois. This allows you to look up contact information for IBM employees by name.

And while I’m on the subject of instant messaging, don’t forget about IRC.

PowerVC Resources

Edit: I still regularly speed up my videos. Some links no longer work.

Originally posted August 2, 2016 on AIXchange

By now you’re familiar with PowerVC. No? Well, then this post is for you:

“IBM PowerVC Virtualization Center is an advanced virtualization and cloud management offering, built on OpenStack, that provides simplified virtualization management and cloud deployments for IBM AIX, IBM i and Linux virtual machines (VMs) running on IBM Power Systems. PowerVC is designed to improve administrator productivity and simplify the cloud management of VMs on Power Systems servers. PowerVC provides the foundation for Power Systems scalable cloud management, including integration to higher-level cloud orchestrators based on OpenStack technology.

PowerVC helps Power Systems customers lower their total cost of ownership with a simplified user experience that allows simple cloud deployment and movement of workloads and policies to maximize resource utilization. PowerVC has been built to require little or no training to accelerate cloud deployments on Power Systems. PowerVC has the capability to manage the existing infrastructure by automatically capturing information, such as existing VM definitions, storage, network and server configuration information.

PowerVC allows clients to capture and manage a library of VM images, enabling IT managers to quickly deploy a VM environment by launching a stored image of that environment, instead of having to manually recreate a particular environment. By saving virtual images and centralizing image management, IT managers and administrators can migrate and move virtual images to available systems to expedite deployment.”

This IBM Redpiece — that’s a Redbook that’s still in draft form — has a great deal of good information. It covers the latest version of PowerVC, Version 1.3.1.

Nigel Griffiths has also posted a series of PowerVC videos: Part 1, Part 2, Part 3, and Part 4.

There is also a video by PowerVC developer Ana Santos.

There’s also this recent webinar. View the replay here. It’s part of the “IBM Power Systems Technical Webinar Series (including Power Systems Virtualization – PowerVM).” Yep, that whole thing is the name of the webinar series. Catchy, huh? Go here for the slides.

Here’s a session from back in January (slides and replay).

If you’re using the older version of PowerVC, the AIX Virtual User Group did a PowerVC demo in December 2013 (slides and replay). I expect the VUG will have an update on this in the near future.

Finally, there’s the PowerVC cheat sheet, courtesy of Nigel and the AIXpert blog. The same site has this older piece as well.

The trick to getting PowerVC installed in your environment is having a licensed copy of Redhat Linux for either Power or x86. That way, you can install packages from repositories other than those found on the installation DVD. In the future I’ll get further into installation, but I thought it would be helpful to present these resources first.

Another trick: If you find it daunting to watch these long videos, you can download Google Chrome plugins. Start by searching for “video speed controller” or “youtube playback speed control.” These allow you to speed up YouTube videos to, typically, 1.5X or 1.75X playback speeds. Even at the faster speed, you should still be able to understand what’s being said. Assuming you aren’t overly annoyed by the audio, you can save significant time digesting the information.

Lessons Learned from Camp

Edit: I am still missing camp.

Originally posted July 26, 2016 on AIXchange

As long-time readers know, I work with Boy Scouts. Recently we took 19 boys to a week-long summer camp, and while I always find being around kids to be instructive, this time I realized that some of these lessons apply to techies as well as campers.

1) We take cell phone coverage for granted.
I don’t know about you, but I’m online quite a bit, to the point where things like checking email and the news are second nature. In addition, I do web searches and I send myself reminders and notes. Except, of course, when I’m out in the wilderness. At least, none of these capabilities exist where our camp is located. If I had to, I could get a signal by literally climbing a mountain. Inconvenient as it is though, I don’t mind putting down my phone for a week. Not only is it less of a distraction when I’m camping (and engaged in activities like swimming, rowing, hiking and horse-riding), I have a greater appreciation for its capabilities when I’m back home. Really, I noticed all of the adults in our camp were more engaged once they realized and accepted that checking for messages wasn’t an option.

2) The world will go on without you.
Between work and camping, I will squeeze in an occasional vacation. But if I have access to my phone, I tend to use it, which can make my vacations seem a lot like working remotely. I routinely find myself checking in, answering questions and generally being available. If camping isn’t for you, find some other way to disconnect when you’re out of the office. Set your out of office message and trust your team members to hold the fort while you go and recharge your mental batteries.

3) We all have adversity to overcome and things to learn. A little enthusiasm helps with both.
Most 12- and 13-year-olds who spend their lives in comfortable suburban surroundings struggle with homesickness, fear of swimming, fear of heights and climbing, fear of sleeping in the woods by themselves (as part of the wilderness survival merit badge), and fear of just being outdoors when a thunderstorm rolls in. But with Scouts, I get to see them overcome their fears and meet the challenges before them. It can be tough facing new technologies and techniques in our careers, especially as we get older and set in our ways. But approaching these challenges with enthusiasm does make a difference. 

4) Meetings go better with food.
When someone orders in lunch or brings treats to a work meeting, it lightens the mood and makes it easier to pay attention. It’s the same with kids. Provide the treats and you’ll get their attention — at least for a few moments.

5) We all need plenty of rest.
Kids need their rest, but one thing we learn in Scouts is you have to get your charges good and tired before bedtime so they’re too exhausted to run around and play pranks all night. Get them up early, keep them up late and make sure they’re active throughout the day. At camp we started at 5 a.m. with the polar bear swim (the staff even throws ice in the pool to really intimidate the campers). Then there were merit badge classes. After that, we had some free time, followed by dinner and campfires. By the time their heads hit the pillows they were out until sunup. The grown-up version of this is get as much accomplished as you can during the day, but when it’s time to rest, rest.

6) Being in shape will help you keep up.
Whatever you want to do professionally or personally, you’ll have more energy and get more enjoyment from what you’re doing if you’re exercising and eating right.

As exhausting as a week of herding cats — I mean… kids — is, Scouts camp was a fantastic experience. I’m already looking forward to next year.

Lots of Potential Bookmarks

Edit: Some links no longer work.

Originally posted July 19, 2016 on AIXchange

The January 2016 AIX Virtual User Group meeting featured a presentation that you should check out. It’s from Steve Pittman (download the PDF; watch the video).

One of the things Steve talks about is this web page. It contains links to tons of information covering a wide variety of topics. Seriously. Tons.

There are best practices and scripts. You can learn how to download ISO images from IBM, open PMRs and set up VNC.

In his presentation, Steve says, “most of the how-tos are written for AIX V5.3, but are applicable to AIX V6.1 and V7.1, since there are not many differences between AIX V5.3, V6.1, and V7.1.”

In all, I count close to 40 links. Here are just a few of the topics:

  • How to download ISO images of installation CDs for software which is entitled on a Power System server.
  • How to open and view AIX software trouble tickets (Problem Management Records/PMRs) on the Internet.
  • How to initiate a stand-alone dump if an AIX LPAR is hung.
  • How to retrieve a history of diagnostic codes which have been displayed by an LPAR.
  • How to use AIX V5.3 filemon to determine where I/O requests originate.
  • How to use AIX V5.3 fileplace to determine the location on disk of a given file block.
  • How to install, configure, and use SSH on AIX V5.3.
  • How to install, configure, and use VNC on AIX V5.3.
  • How to configure AIX V5.3 as an NTP client.
  • How to configure AIX V5.3 to send mail to users on other hosts.
  • How to monitor for hardware errors on AIX V5.3.
  • How to monitor for issues with dump space on AIX V5.3.
  • How to monitor paging space utilization on AIX V5.3.
  • How to change the order in which AIX V5.3 mounts filesystems.

If that isn’t enough for you, this page — introduced as “a collaborative site for AIX technical information” — has still more links. Although some of this material may be a bit dated, overall it is great information.

I’ll just list the headings for these. Most come with 3-10 different links:

  • Hot new topics and popular wiki pages
  • Getting Started
  • Maintenance
  • Performance
  • Virtualization
  • Security
  • Enterprise Edition and Management Edition for AIX
  • POWER7 and AIX 7
  • POWER6 and AIX 6 Redbooks for AIX and Virtualization
  • Best Practices
  • SAP and Power Systems
  • Code Development
  • Cloud-based Benchmarks
  • Communities and Social Networking

Were you aware of this information? Take some time with both pages, and you might find a number of things worth bookmarking.

A Pictorial Guide to vSCSI Disk Settings

Edit: Still a good Redbook.

Originally posted July 12, 2016 on AIXchange

One of the challenges of configuring virtual disks with the VIO server is knowing which settings must be changed during setup. I recently had something brought to my attention that should help clarify things.

It’s from the IBM Redbook, “PowerVM Virtualization Introduction and Configuration.” Go to page 498, and you’ll find a diagram listing the different settings that need to be changed at each layer of the virtual environment. The pages that follow offer good explanations of what each setting means and why you’d want to change it. These settings are specifically for vSCSI disks.

Here are the proper settings in the client at the hdisk level:

    algorithm=failover
    reserve_policy=no_reserve
    hcheck_mode=nonactive
    hcheck_interval=60
    queue_depth=xxx

Use these settings at the vSCSI client level:

    vscsi_path_to=30
    vscsi_err_recov=fast_fail

Use these settings at the hdisk level on the VIO servers:

    algorithm=load_balance
    reserve_policy=no_reserve
    hcheck_mode=nonactive
    hcheck_interval=60

Use these settings for your fscsi devices on the VIO server:

    dyntrk=yes
    fc_err_recov=fast_fail
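
Applied with chdev, the whole stack looks something like the sketch below. Device names and the queue_depth value are examples, and the -P flag defers the change to the next reboot for devices that are in use. One hedge on the client disk algorithm: depending on your MPIO level, the value may be spelled fail_over rather than failover, so check lsattr -Rl hdisk0 -a algorithm first.

    # AIX client LPAR: hdisk and vscsi adapter settings
    chdev -l hdisk0 -a hcheck_interval=60 -a hcheck_mode=nonactive \
          -a reserve_policy=no_reserve -a queue_depth=20 -P
    chdev -l vscsi0 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P

    # VIO server (as root via oem_setup_env): fscsi settings
    chdev -l fscsi0 -a dyntrk=yes -a fc_err_recov=fast_fail -P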

This Redbook clears up a number of other topics as well. There’s an I/O virtualization overview, a planning section and an implementation section with examples. Processor and memory virtualization is covered in a similar manner. In addition, the authors hit on recent PowerVM enhancements, capacity on demand and the System Planning Tool.

To be sure, it’s a lengthy document, but I’m willing to bet you will learn something — likely, many things — if you take the time to read through it.

On a personal note, nine years ago this week — July 16, 2007 — AIXchange debuted.

For nine years, I’ve been writing these articles, one week at a time. When I scroll through posts from the fall of 2007, I can see the same themes pop up that still hold my interest today: education and tech conferences, virtualization, the HMC. I even wrote about an early demonstration of Live Partition Mobility.

Over time, I can see my writing “voice” evolve. Numerous times I’ve wondered about you, the reader. The web stats say you’re out there. I also know you’re out there because occasionally, I’ll do a web search and one of my posts that I’d long forgotten about will pop up as the answer to my query. I admit, I feel a sense of accomplishment from this sort of thing.

Still, it would be nice to be able to get a better feel for who you are. How did you find this blog? Which topics most interest you? Why do you keep reading? If you once read this blog but no longer do, why did you stop?

I’m often asked how I find things to write about, but honestly, it isn’t that difficult. There’s AIX and Linux and IBM i, servers and storage and virtualization, Redbooks and other documentation, and commands and scripts (did I mention I love scripts?). Plus I talk to customers, attend workshops and conferences and follow people on Twitter. There are tons of things to write about.

Of course the technology is ever-evolving, but the basics don’t change. We have the best hardware and the best operating systems. We need to virtualize, we need change control, we need to find ways to keep up to date with the technology around us. Hopefully the links and articles I share help you keep up to speed.

I plan to make a bigger deal of the 10-year anniversary in 2017, but for now, let me just say thank you for reading, one week at a time.

Power Systems from a Competitor’s View

Edit: Why wouldn’t you run POWER?

Originally posted July 5, 2016 on AIXchange

I’m always interested in stories about customers that choose to migrate from x86 to POWER8 systems. When I hear about 2X performance compared to x86 when running workloads on POWER, I wonder how anyone could consider anything else. Throw in the AIX, IBM i, and Linux on Power operating systems, and to me there’s utterly no reason to run another operating system on other hardware.

Of course, IBM’s competition will make their own cases. Via Twitter, I found this document on migrating from Power Systems to HPE Open Systems, and I’m legitimately curious to hear your own responses to these arguments:

“Hewlett Packard has several decades of experience in migrating mission-critical applications from IBM Power Systems to HP (and now HP Enterprise) open systems. HPE has demonstrated that the majority of such migrations result in a significantly less expensive operating environment – often by a factor exceeding 50 percent. At the same time, the new HPE open environments match or exceed the performance and availability attributes of the original Power Systems.”

First they talk about fewer ISVs supporting POWER. Then they discuss costs.

“Of special importance is the cost of the Oracle database management system. Many of the applications being migrated use an Oracle database. Oracle charges twice as much per CPU for Power Systems than it does for x86 platforms. Furthermore, Oracle RAC (Real Application Cluster) costs $11,500 on an x86 and $23,000 per core on an IBM Power System.”

Why does Oracle charge twice as much per CPU? Because POWER can do twice as much work, so you need half as many cores to run your workload.

It might be worth reading through these slides and thinking about why infrastructure matters and why you might consider POWER systems. Some of these same concepts were covered in a recent AIX Virtual User Group session (video here).

POWER8 has 4X threads per core, 4X the memory bandwidth, 6X cache per core, and runs at higher clock frequencies. Performance per core has grown with each POWER generation, which means you need fewer cores to do the same amount of work, and you can consolidate more workload onto the same server.

Why is Google interested in POWER servers? Why are these new high performance computing contracts being won by POWER servers?

“The other reason to think that Google is serious about the OpenPower effort is that Google is a big believer in the brawny core – as opposed to the wimpy one – and the POWER8 chip has more threads, more memory bandwidth, and will have several high speed interfaces, including IBM’s CAPI and Nvidia’s NVLink, to attach adjunct memory and processing devices directly into the Power chip memory subsystem and processing complex.

“There are only two brawny cores out there: Xeon and Power,” MacKean explained. “Power is directly suitable when you are looking at warehouse-scale or hyperscale computing from that perspective. I actually think that the industry is going to be moving towards more purpose-built computing, and I think that different users are going to be able to leverage the advanced I/O that IBM is opening up through OpenPower. They are going to be able to go with purpose-built platforms that suit their workloads. I think this is a big part of this. We just heard about the CAPI 2.0 interface having twice the bandwidth and we are actually excited about how that will play out at the system level. It is open, and we are seeing a lot of people innovating in a lot of directions.”

Google gets it. When you’re running critical workloads, you’re not looking for ways to cut corners.

Where do you see yourself going in the future?

Working with Snap Files

Edit: Still a valuable technique.

Originally posted June 28, 2016 on AIXchange

A while ago Russell Adams posted an interesting message to the AIX mailing list. He wrote about working with AIX snap files and included a link to his website, which provides some background:

“I frequently work with customer systems where I need a systems inventory. This could be for troubleshooting or just to save the final state of a system for later reference.

I have worked with many consultants who have an inventory script they give customers but I have found that I prefer to use the tools native to the platform when they are available. On AIX I use IBM’s native snap command. If you’ve ever been on the phone with IBM support before, you know they barely wait to ask your name before they ask for you to upload a snap.”

The command he runs on all LPARs in the environment is snap -cfgGiknLt. As he explains:

“This gives a good overview of the system without including a system dump. Most of the time the snap files range from 5 MB to 25 MB.

Always run ‘snap -r’ to clear the snap cache in /tmp/ibmsupt before taking a new snap. This is generally safe as the only files it will remove are files snap knows that it wrote.

By renaming the snap file as follows, you can run a couple of scripts to manipulate the data:

    mv /tmp/ibmsupt/snap.pax.Z /tmp/ibmsupt/$(hostname)_$(date +%Y%m%d).snap.pax.Z”
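
If you gather snaps from more than a handful of LPARs, the collect-and-rename step scripts easily. Here’s a rough sketch, assuming ssh key access as root and a hosts.txt file with one LPAR per line (snap -r prompts for confirmation, hence the echo):

    #!/bin/ksh
    # take a fresh snap on each LPAR and pull it back under a unique name
    while read host; do
        ssh "$host" 'echo y | snap -r; snap -cfgGiknLt'
        scp "$host:/tmp/ibmsupt/snap.pax.Z" \
            "./${host}_$(date +%Y%m%d).snap.pax.Z"
    done < hosts.txt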

Russell runs his scripts on his Linux machine, but he’s confident that, with a few tweaks, this could run on AIX as well. Hopefully an enterprising reader will take this on and share the results.

There are two scripts: This one uncompresses and normalizes the snaps, while this one extracts the commands.

His site has numerous examples of extracting snap files and running basic commands. Here’s a small subset:

    % ls -l snap.pax.Z
    -rw-rw-r-- 1 adamsrl adamsrl 6748366 May 26 17:20 snap.pax.Z

    % ~/scripts/NormalizeSnap.sh snap.pax.Z
    ========================================
    Untarring, # of files: snap.pax.Z
    pax: ustar vol 1, 199 files, 5775360 bytes read, 0 bytes written.
    Checking general exists.
    ./snapRtvZKUtV/general
    Moving subdirs.
    Opening subsnaps.
    Fixing perms.
    Cleaning empty dirs.
    rmdir ./snapRtvZKUtV/testcase ./snapRtvZKUtV/scraid ./snapRtvZKUtV/other ./snapRtvZKUtV/hacmp
    Collecting data.
    Cleaning dump and security.
    Renaming to final destination
    Retarring files into: ./7044-170_0110BDC8C_GILSAIX_5300-06-00-0000_20071205_203526.snap.tar.bz2
    Number of files compressed: 194
    Successfully extracted snap to: ./7044-170_0110BDC8C_GILSAIX_5300-06-00-0000_20071205_203526.snap

He adds:

“Now I can also use standard UNIX text utilities to run aggregate reports on the data from the snaps.

Imagine checking 15 hosts for no_reserve on hdisks, or iostat set to true on sys0. This method of working with snaps can be very powerful even in an offline manner.”

I think this is where the real power of this methodology comes into play. With up-to-date snap information from your systems, you can find out quite a bit about an environment without needing to be VPNed in or logged in at all.
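
To make that concrete, here’s a rough sketch of the kind of offline check he describes. It assumes each snap has been extracted into its own per-host directory ending in .snap, as his normalize script does, and it uses GNU grep’s -r flag, so run it Linux-side the way Russell does:

    # flag any extracted snap that shows a disk without no_reserve
    for dir in *.snap; do
        grep -rh "reserve_policy" "$dir" 2>/dev/null |
            grep -qv "no_reserve" && echo "$dir: has disks without no_reserve"
    done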

In his mailing list post, Russell explains, “In the spirit of cooperation I wanted to share some of the methods I use for working with AIX snap files. I won’t repeat the full article here but it documents a technique I use for offline data mining for AIX systems including ready to run scripts.”

I would hope — as you make your own changes to these methods — that you would also share your improvements.

Are you already using a method like this in your environment? Can you think of ways to enhance it?

Monitor-ing the Situation

Edit: I did get that USB monitor. And I added a few more to my desktop for good measure.

Originally posted June 21, 2016 on AIXchange

Over the years I’ve discovered that you can never have too many monitors connected to your system. I’m reminded of this whenever I go on the road with a laptop and single screen.

One of these days — even though it will mean adding still more weight to my bag — I’ll break down and get a USB monitor for my laptop:

“If you want the screen space of a traditional monitor mated with the kind of portability you can slip into your laptop’s carrying case, there’s a whole sub-class of monitors designed just for you. These products exist in a sort of limbo between full-size monitors and tablet screens in terms of screen size, resolution, and contrast.”

I find a minimum of two monitors helps me multitask. I can be using one screen that’s logged into a system, while my other screen can be reserved for documentation, or for reading one thing while working on another.

I consider 3-5 monitors a pretty good sweet spot, though someday — someday — I hope to procure a wall of monitors like these.

There are other multi-monitor advocates out there. This article notes that there are productivity benefits to dual monitor usage. This PCWorld piece gets into some of the other benefits.

“Having multiple monitors (and I’m talking three, four, five, or even six) is just…awesome, and something you totally need in your life.

Right now, my main PC has a triple-monitor setup: my main 27-inch central monitor and my two 24-inch side monitors. I use my extra monitors for a number of things, such as comparing spreadsheets side-by-side, writing articles while also doing research, keeping tabs on my social media feeds…

A vertically-oriented monitor can save you a lot of scrolling trouble in long documents. If you’re a gamer, well, I don’t need to sell you on how great three-plus monitors can be for games that support multi-monitor setups. You just need to plan ahead. Here’s our full guide on setting up multiple monitors—and all the factors you’ll need to take into account before you do so.”

Although that article focuses on using a graphics card with all of your monitors connected to the same system, you can also control multiple systems and monitors with software like Synergy:

“Synergy combines your desktop devices together in to one cohesive experience. It’s software for sharing your mouse and keyboard between multiple computers on your desk. It works on Windows, Mac OS X and Linux.”

What is your ideal setup? Are you OK with just one monitor and lots of windows, or do you prefer lots of windows across lots of monitors?

The AIX Expansion Pack

Edit: How often do you use these packages? Some links no longer work.

Originally posted June 14, 2016 on AIXchange

Are you familiar with the AIX Expansion Pack?

“The AIX Expansion Pack is a collection of extra software that extends the base operating system capabilities. The AIX Web Download Pack is a collection of additional applications and tools available for download from the Web. All complement the AIX operating system with the benefit of additional packaged software at no additional charge.”

By selecting the download link from the right side of that page and signing in with your IBM ID, you’ll find a list of different packages available for download, including openssh, openssl, perl, samba, rsyslog and lsof. (Note: These may not be the most current versions of software, so you could run into code issues. Perzl.org may have more up-to-date versions of the software you’re looking for.)
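
Before you pull a package down, it’s worth checking what’s already on the system. A quick sketch (fileset names vary by package and release):

    # what versions of OpenSSH/OpenSSL are installed, if any?
    lslpp -L | grep -iE "openssh|openssl"

    # or query a specific fileset
    lslpp -L openssh.base.server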

If you’re wondering whether IBM supports specific programs from the IBM Expansion Pack, this pretty handy table can help you determine whether you can open a PMR. Some entries are marked with PMR support, some have critical support only as part of particular products, while others are unsupported.

If you use open-source software in your AIX environment and you’d like IBM to continue to host and maintain offerings like the AIX Expansion Pack, it wouldn’t hurt to let them know about it. The sentiment expressed in this old post still applies:

“I would also recommend telling your local IBM representative that you think this needs to be fixed. Customer pressure is a good incentive for IBM to get organized, sort this out and eventually make it work.”

Don’t Forget About Server Consolidation

Edit: I want my enterprise class server.

Originally posted June 7, 2016 on AIXchange

You likely know that we can run multiple operating systems on Power servers. With powerful POWER8 servers, we can consolidate workloads such as AIX, IBM i, and Linux and run them simultaneously on the same server — assuming it’s not one of the newer L or LC Linux-only models.

But about those L and LC boxes: They’ve come up a lot in my recent conversations with customers. While IBM is quick to remind customers that it’s still heavily invested in AIX and IBM i and they’re not going away, they’re also up front with their message about Linux and POWER8 servers: It’s a powerful combination.

When customers are interested in going head to head with x86 servers and competing on cost, the Linux-only L and LC models running PowerKVM virtualization make for an easy case. You’ll get better performance at a lower price. In addition, IBM has also made it convenient for new Power customers to run PowerKVM, in that you don’t need an HMC to manage your systems. Obviously an enterprise that doesn’t use the HMC may not want to invest the time to learn about HMCs and VIO servers.

It’s great to see the interest in these offerings. However, I often end up reminding my customers that an existing IBM solution, the PowerVM hypervisor, might actually be a better option for running their Linux workloads.

Linux workloads can run on smaller scale-out servers, but they can also run on larger systems. This is where PowerVM fits in. It handles Linux workloads, even if you’re not running AIX or IBM i on your frame. 

PowerVM is a mature virtualization offering that’s been running mission-critical workloads for years. Think about it: When is the last time you have had an issue with PowerVM? In addition, when compared with PowerKVM, PowerVM has a better guaranteed quality of service and lower virtualization overhead (because the hypervisor is in the firmware rather than running QEMU). With the ability to have multiple VIO servers, you have higher availability for your systems, and the capability to perform maintenance on those redundant VIO servers. Because you have a smaller attack surface with a firmware-based hypervisor, there’s also better VM segregation and better security. In addition, PowerVM allows you to have shared processor pools to reduce licensing costs and guarantee a certain amount of resources to a group of workloads.

PowerVM offers other advantages. You can choose to set up your LPARs with shared dedicated processors. When defining LPARs, you can guarantee a minimum entitlement for your LPAR and you can hard-cap your virtual machines. Assuming you’re running on higher-end hardware, you’ll be able to use capacity on demand and dynamically change more of the settings on your LPARs compared to what you can do with PowerKVM.

As I said, it’s great that IBM has an option in PowerKVM that competes with x86 systems on cost and performance. But here’s the thing many customers forget: Replacing 20 x86 machines with 20 Power L or LC models isn’t the only option. You may find it more beneficial to consolidate those 20 x86 servers into a small number of beefier Power servers running PowerVM. Your data center cabling, power and cooling requirements will all go down, while your average server utilization will go up.

Sure, you could alternatively replace those 20 x86 machines with a smaller number of Linux-only machines. In doing so, you’ll get better performance per core with Power. But with larger enterprise servers, you can have a far greater number of cores and much more available memory to work with when compared to any of the scale-out models.

Even as IBM continues to update and advance its Linux story, there’s still much to be said for consolidating workloads through PowerVM. These servers remain well worth considering.

Upgrading SDDPCM Drivers

Edit: I still love getting scripts from readers

Originally posted May 31, 2016 on AIXchange

In January I posted some scripts I’d received from Simon Taylor. He’s since provided me with more:

“Hi Rob,
Annual upgrades are happening again. We have the common problems with getting downtime, etc., and I wasn’t over keen on the published methods of upgrading sddpcm device drivers. Fortunately, I came across a post by Josh-Daniel S. Davis on replacing the pre-deinstallation script (which fails if there are any active disks) with one that just exits 0.

Here’s how it works:

I’ve added a post installation (-z) script for nimadm alt_disk_migration. The alt_disk migration takes place in a chroot environment and I expected that there would be no real access to disk device drivers from within the chroot. This seems to be true, and my environment migrated successfully from AIX 6 and devices.sddpcm.61.rte 2.6.0.3 to AIX 7 and devices.sddpcm.71.rte 2.6.7.0.

I bundled the sddpcm and devices.fcp.disk.ibm.mpio filesets into the post installation script using uuencode, because by the time the post installation script runs, the migration lpp_source has been unmounted. (There’s an install_all_updates script built into the migration that tries to upgrade all software in the lpp_source not already updated by the main upgrade logic. The install_all_updates fails on sddpcm.)

The script includes a bit of logic (lslpp -Lqc “devices.sddpcm*”) to find the current version and decides whether or not to upgrade. If an upgrade is necessary, the deinstallation script is found using “ls /usr/lpp/devices.sddpcm*/deinstl/*pre_d” and replaced with the exit 0 script. This has helped us towards our goal of one-click upgrades.”
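
To make the mechanics concrete, here’s a minimal, hedged sketch of the kind of logic Simon describes (the fileset level, bundle name and paths are illustrative, not his actual script):

    # Illustrative sketch only, not Simon's actual script
    CURRENT=$(lslpp -Lqc "devices.sddpcm*" | cut -d: -f3)    # installed level
    if [ "$CURRENT" != "2.6.7.0" ]; then
        # Neutralize the pre-deinstall check that fails on active disks
        PRE_D=$(ls /usr/lpp/devices.sddpcm*/deinstl/*pre_d)
        echo "exit 0" > "$PRE_D"
        # Unpack the filesets that were bundled into this script with uuencode
        uudecode /tmp/sddpcm_bundle.uu        # recreates sddpcm.tar
        ( cd /tmp && tar -xf sddpcm.tar && inutoc . )
        installp -acgXd /tmp devices.sddpcm devices.fcp.disk.ibm.mpio
    fi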

Simon’s .tar file includes this information:

    The mk_alt_post_script tars up the contents of the tar subdirectory
    and uuencodes them into a script called post_alt_mig_script which is
    called by the nimadm command. The attached tar file contains:

    ./alt_disk/
    ./alt_disk/tar/                          # add the lpps here and run inutoc
    ./alt_disk/tar/readme
    ./alt_disk/tar/upgrade_sddpcm.ksh        # removes old sddpcm and installs new
    ./alt_disk/mk_alt_post_script            # builds post_alt_mig_script
    ./alt_disk/readme

What do you think? Is this something you’d find useful in your environments? If you have similar scripts or ideas that can be shared, please contact me.

Finding the Motivation to Change

Edit: I am still more active than I once was, and I have kept the weight off.

Originally posted May 24, 2016 on AIXchange

This blog typically covers AIX and other technical topics. However, every now and again I write about something else that interests me. This week’s topic, honestly, is sensitive.

You’re overweight. Or, if you’re not, you likely know someone who is. The Centers for Disease Control and Prevention estimates that at least one-third of Americans are obese:

“Obesity is common, serious and costly.

More than one-third (34.9% or 78.6 million) of U.S. adults are obese. Obesity-related conditions include heart disease, stroke, type 2 diabetes and certain types of cancer, some of the leading causes of preventable death.

The estimated annual medical cost of obesity in the U.S. was $147 billion in 2008 U.S. dollars; the medical costs for people who are obese were $1,429 higher than those of normal weight.”

This isn’t necessarily a comment on the IT industry, but obviously our work makes it convenient to fall into a sedentary lifestyle:

“How many of us IT professionals are putting on a few pounds? We do generally have relatively sedentary lifestyles. We drive to our jobs, and sit in front of a computer all day. And if we’re not doing that, we’re sitting in a meeting. Then we go home and play video games and/or watch TV and movies. We eat more fast food than fruits and vegetables. Over time, this lifestyle takes its toll.

Starting healthy new habits like eating better and exercising more can be tough. It can be harder still to maintain these habits. I would argue that some in the IT industry — myself included — should think about getting the habit in the first place.”

When I wrote that back in 2009, I was talking to you — and, as noted, literally describing myself. I was eating junk and putting on pounds. But more recently, things have changed dramatically for me.

I’ll be honest: the logical arguments I made back then did nothing to alter my own behaviors. What happened was my sons got involved in Boy Scouts. I wanted to support them. To become an adult leader, you’re required to get a physical and fill out some paperwork. Basically, you need to demonstrate that you’re fit enough to participate in the week-long summer camps and backpacking trips with troops. One of the BSA forms mentions BMI limits:

“Excessive body weight increases risk for numerous health problems. To ensure the best experience, Scouts and Scouters should be of proportional height and weight. One such measure is the Body Mass Index (BMI), which can be calculated using a tool from the Centers for Disease Control here: http://www.cdc.gov/nccdphp/dnpa/bmi/ . Calculators for both adults and youth are available. It is recommended that youth fall within the fifth and 85th percentiles. Those in the 85th to 95th percentiles are at risk and should work to achieve a higher level of fitness.”

My doctor took this information seriously, and told me that he wouldn’t sign my paperwork until my BMI was where it needed to be. That was my wake-up call. I finally took my weight seriously. I finally stopped stuffing my face.

You’ve heard it all before: diet and exercise. That’s all it is. As mathematically inclined people, we should be able to understand that to lose weight, we need to eat fewer calories than we burn. Skip the french fries and the hamburger buns and the soda. Mix in a salad. More protein, fewer carbs. Watch your portion sizes.

I’ve been going to a gym. I tried that previously, but I’d either lose motivation or get bored, mostly because I had no idea what I was doing. This time, I hired a trainer and attended classes. For me, it’s well worth the cost. Having someone to hold me accountable and vary my activities definitely helps.

Interesting thing: As much as working in technology can lead you to unhealthy lifestyles, there’s now a lot of cool tech stuff that can help you lose weight. There are apps that allow you to scan bar codes on food packages so you can more easily track your caloric intake. I have a scale that automatically connects to the web each time I get on it. It graphs my weight and tracks my BMI measurements. I have heart rate monitors that show how much effort I put into my exercise. I have fitness trackers that count the steps I walk. Obviously you don’t need the gadgets, but as a techie I enjoy them.

I’m much more active now: running, biking, swimming, hiking. I’ve lived near mountains for a while. Now I climb them. One of this summer’s scouting activities is a trek to the bottom of the Grand Canyon. For the past three years I’ve participated in an event that purports to be the country’s largest Boy Scout triathlon. The first year I tried it, I was so out of shape I didn’t finish. The second year, I did finish, and last year I lowered my time by 12 minutes compared to the year before. Next time out, I expect to reduce my time again, hopefully by a similar margin.

The point is, since December 2012, I’ve lost more than 60 pounds. I’m still a work in progress, but I believe I’m on the right path.

I know it’s unlikely that my story will cause anyone to change, because I understand that I’m not telling you anything you don’t already know. Most of us engage in unhealthy behaviors. We smoke, we drink, we eat too much, we don’t exercise enough. We know about the health risks but for whatever reason, we don’t make meaningful changes. I personally know how it feels to lose weight, and then put it back on. And I know how easy it is to ignore what you see in the mirror.

But now, I also know how it feels to climb mountains without getting winded. I know how it feels to have my heart rate quickly return to normal after vigorous exercise. I know how it feels to go on lengthy hikes carrying a backpack that weighs more than the pounds I’ve lost. I know what it’s like to have to buy new clothes because nothing in the closet fits anymore. And I find all these things so personally gratifying. That’s why I’m sharing this with you.

If nothing else, if you see me at conferences eating junk, you can remind me of this piece. You can help hold me accountable. Or just maybe, someone will read these words and decide to actually make a change. If even one of you does, I’ll consider my efforts worthwhile.

Finding Lifecycle and Other Product Info

Edit: These charts project far into the future.

Originally posted May 17, 2016 on AIXchange

When is my version of AIX or PowerHA going out of support?

These types of questions come up all the time. The good news is there are multiple ways to find quick answers to them.

IBM has a support lifecycle webpage that provides this type of information about these and other products, like VIO server and PowerKVM. You can also learn product IDs, availability dates and all the different versions of various solutions.

“The IBM Software Support Lifecycle policy specifies the length of time support will be available for IBM software from when the product is available for purchase to the time the product is no longer supported. IBM software customers can use this site to track how long their version and release of a particular IBM software product will be supported. Using the information on this site, customers will be able to effectively plan their software investment, without any gaps in support.

Find detailed information about the available IBM Software Support Lifecycle Policies to help you realize the full value of your IBM software products.

Use the search form, or browse by software family or product name, to find the software lifecycle details you need. To stay up to date, subscribe to the lifecycle news feed, or download lifecycle data in XML format to import into your spreadsheet program or custom data processing application.”

Another option is to visit Fix Central and request to view fix packs for a particular AIX version. For example, if you browse to this page and scroll to the bottom, you’ll see a graphic showing the lifecycle for AIX version 7.2, accompanied by some useful verbiage discussing support plans. Additional graphics are available for other AIX versions to help you visualize where you are and when fixes were released.

If you have trouble displaying the graph, you can get there quickly via FLRT lite. Select one of the AIX versions and scroll to the bottom of the new page.

Finally, there’s this AIX support lifecycle chart.

Enhanced Support Options

Edit: Still the only way to go. Many of these links no longer work.

Originally posted May 10, 2016 on AIXchange

If you have IBM maintenance and support contracts on your IBM hardware and software, it’s a straightforward arrangement. When something breaks, you can open a PMR and get help.

But did you know that different levels of IBM support are available? Two options you might not know of are Enhanced Software Support and Custom Technical Support.

These options are considered upgrades from “standard” IBM support and might be worth looking into for your environment. I have customers that use these services and believe they receive substantial benefits for the extra cost. The value stems from IBM getting to know each customer’s unique environment and providing customized, proactive support. I’ve seen IBM meet with customers’ IT staffs via conference calls and online meetings, and IBM Support will prepare reports for reviewing open and closed PMRs and highlight available fixes that are applicable to their environments.

This datasheet has detailed information:

“But many others prefer to rely on outside services to supplement their in-house staff with the technical expertise they need — while still retaining full control and ownership of their IT infrastructures. And that’s where IBM Software Support Services — custom technical support comes in.

As a CTS client, you are assigned a technical solutions manager who can:

  • Act as an extension of your staff with the added advantage of IBM support
  • Facilitate appropriate service for you and update your priority support team of your needs
  • Offer custom problem-prevention assistance to help you make more effective maintenance decisions
  • Use IBM proprietary state-of-the-art analysis tools that can anticipate problems and work with you to help prevent them
  • Provide helpful information on new products, practices and technologies as appropriate.”

One of IBM’s analytical tools is called ProWeb. I recommend you watch this introductory video to learn about it. There is also a technical support appliance that’s designed to help you:

  • Streamline IT inventory management by intelligently discovering inventory and support-coverage information for IBM and non-IBM equipment.
  • Improve technical support management with analytics-based reports and collaborative services.
  • Mitigate costly IT outages via operating system and firmware recommendations for selected platforms.

Go here for details.

Did you already know about these IBM offerings?

What’s in Your Bag?

Edit: This was terrifying. Glad I avoided a watch list.

Originally posted May 3, 2016 on AIXchange

If you travel for your job as I do, you probably lug lots of gear. Chargers, cords and adapters are just some of the necessities that keep your gadgets in working order while you’re on the go.

If you spend any time on the raised floor, hopefully you have a cord that you can plug into a PDU, like these. From that cord I plug in a portable power strip, like these. This allows me to plug in all the gear I need during long stints in the computer room. I find power strips also come in handy when I’m sitting in airports and outlets are at a premium. You can be instantly popular by allowing others to plug into yours during a layover. That said, if you don’t carry a power strip, keeping your battery-powered items charged will usually suffice.

Speaking of batteries, I always bring extras for my laptop to keep it powered up for long flights or any extended time away from outlets. I also carry extra batteries for my noise-canceling headphones (which are great on planes or raised floors) and extra external battery packs for charging my cellphone.

All of this is a prelude to a story about the importance of keeping batteries separate from the rest of your gear.

I have Tom Bihn’s Snake Charmer, and am quite happy with it. Typically I’d just cram my cables and batteries and everything else into it and not give it a second thought. Then late last year I was at the airport, waiting to head home, when I detected the smell of burning wires or plastic. I wrote it off to the holiday lights and decorations that were plugged in all over. Or maybe it was dust on a bulb or something. Once I got on the plane, the smell disappeared, so I didn’t give it a thought — that is, until I pulled my laptop out of my bag. The same burning smell returned. It was coming from my gear.

It was coming from the Snake Charmer. I had three external batteries inside it that I use to recharge my cell phone. Somehow one of the prongs from a power adapter had jammed into the USB port on the battery pack:

“Battery pack manufacturers incorporate safety devices into the pack designs to protect the battery from out of tolerance operating conditions and where possible from abuse. While they try to make the battery foolproof, it has often been explained how difficult this is because fools can be so ingenious.

Subjecting a battery to abuse or conditions for which it was never designed can result in uncontrolled and dangerous failure of the battery. This may include explosion, fire and the emission of toxic fumes.”

Here’s more to keep in mind about carrying extra batteries:

“Any kinds of conductive material being bridged with the external terminals of a battery will result in short circuit. Based on the battery system, a short circuit may have serious consequences, e.g. rising electrolyte temperature or building up internal gas pressure. If the internal gas pressure value exceeds the limitation of cell cap endurance, the electrolyte will leak, which will damage battery greatly. If safe vent fails to respond, even explosion will occur. Therefore don’t short circuit.”

So here I am with a battery that has a melting USB slot with a piece of metal jammed and fused into it. It’s hot, it stinks of burning plastic and metal, and it’s on a plane. Is this contraption going to explode or catch fire? And what will become of me? Will the plane be diverted? Will I be kicked off the flight?

These thoughts raced through my mind. But then, fortunately for me, I had a MacGyver moment. I realized I could unscrew the back piece of the charger, which exposed two wires and a small circuit board connected to the battery itself. I just detached the wires from the battery, it immediately stopped smelling, and the battery started to cool down. Crisis averted.

The amazing part was no one said a thing. A guy seated nearby was watching me, and two flight attendants came by, but none of them questioned me. Everyone acted like it was perfectly normal for a guy to have a smoking, stinking electronic device with a circuit board and wires coming off of it on an airplane. Thankfully, the rest of the flight was uneventful.

I learned my lesson though. Now I make sure to segregate my portable batteries from the rest of my plugs and chargers. I still have no idea how that power adapter managed to find that battery’s USB slot, but now I realize that such an occurrence is a possibility.

The point is, check your bags. You may not haul as many batteries as I do, but if you attended the same conference I did last year, you may have the same type of charger. Learn from my mistake.

LPM and Firmware Compatibility

Edit: Check your firmware!

Originally posted April 26, 2016 on AIXchange

Here’s something of interest to those who use live partition mobility (LPM): IBM has created a matrix that shows firmware compatibility for conducting LPM operations between systems:

“Ensure that the firmware levels on the source and destination servers are compatible before upgrading.

“In [Table 1], you can see that the values in the first column represent the firmware level you are migrating from, and the values in the top row represent the firmware level you are migrating to. The table lists each combination of firmware levels that support migration.”

Below that first chart is a list that “shows the number of concurrent migrations that are supported per system. The corresponding minimum levels of firmware, Hardware Management Console (HMC), and [VIO servers] that are required are also shown.”

Then there’s a list of restrictions, followed by a table that shows the firmware levels and POWER models that support partition mobility:

“Restrictions:
• Firmware levels 7.2 and 7.3 are restricted to eight concurrent migrations.
• Certain applications such as clustered applications, high availability solutions, and similar applications have heartbeat timers, also referred to as Dead Man Switch (DMS) for node, network, and storage subsystems. If you are migrating these types of applications, you must not use the concurrent migration option as it increases the likelihood of a timeout. This is especially true on 1 GB network connections.
• You must not perform more than four concurrent migrations on a 1 GB network connection. With VIOS Version 2.2.2.0 or later, and a network connection that supports 10 GB or higher, you can run a maximum of eight concurrent migrations.
• From VIOS Version 2.2.2.0, or later, you must have more than one pair of VIOS partitions to support more than eight concurrent mobility operations.
• Systems that are managed by the Integrated Virtualization Manager (IVM) support up to 8 concurrent migrations.
• The Suspend/Resume feature for logical partitions is supported on POWER8 processor-based servers when the firmware is at level 8.4.0, or later. To support the migration of up to 16 active or suspended mobile partitions from the source server to a single or multiple destination servers, the source server must have at least two VIOS partitions that are configured as mover service partitions. Each mover service partition must support up to 8 concurrent partition migration operations. If all 16 partitions are to be migrated to the same destination server, then the destination server must have at least two mover service partitions configured, and each mover service partition must support up to 8 concurrent partition migration operations.
• When the configuration of the mover service partition on the source or destination server does not support 8 concurrent migrations, any migration operation that is started by using either the graphical user interface or the command line will fail when no concurrent mover service partition migration resource is available. You must then use the migrlpar command from the command line with the -p parameter to specify a comma-separated list of logical partition names, or the --id parameter to specify a comma-separated list of logical partition IDs.
• You can migrate a group of logical partitions by using the migrlpar command from the command line. To perform the migration operations, you must use the -p parameter to specify a comma-separated list of logical partition names, or the --id parameter to specify a comma-separated list of logical partition IDs.
• You can run up to four concurrent Suspend/Resume operations.
• You cannot perform Live Partition Mobility that is both bidirectional and concurrent. For example, [when] you are moving a mobile partition from the source server to the destination server, you cannot migrate another mobile partition from the destination server to the source server.”
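
For reference, here’s a hedged example of the kind of multi-partition move those restrictions describe, run from the HMC command line; the system and partition names are hypothetical:

    # Validate first (-o v), then migrate (-o m) two partitions in one operation
    migrlpar -o v -m source-sys -t target-sys -p lpar1,lpar2
    migrlpar -o m -m source-sys -t target-sys -p lpar1,lpar2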

Note that if you do not check your firmware versions, a firmware update can cause future planned LPM operations to fail. That’s all the more reason to add this link to your planning checklist.

Speaking of LPM, Chris Gibson takes note of a new HMC system-wide setting that allows LPM with an inactive source storage VIO server.

“A new HMC & Firmware 840 feature allows LPM in a dual VIOS configuration when one VIOS is failed. Previously, LPM was not allowed if one of the source VIOS (in a dual VIOS configuration) was in a failed state. Both [VIO servers] had to be operational to perform LPM. The new support allows the HMC to cache adapter configuration from both [VIO servers]. Whenever changes are made to the configuration, the cached information will be updated on the HMC. If one VIOS is failed, instead of querying the failed VIOS, the HMC cache is used instead to create the new configuration on the target VIOS. This support was needed to cover the situation where there’s failed hardware which is causing an outage on the VIOS and requires a disruptive repair action. This new feature is enabled using a server wide HMC setting to enable the automatic caching of VIOS configuration details.”

Why Don’t We Have Root on the HMC?

Edit: I still want root.

Originally posted April 19, 2016 on AIXchange

For as long as there’s been an HMC, there have been frustrated administrators wishing they had root access to it.

The argument for root does contain a certain logic. The HMC runs Linux under the covers, so shouldn’t we, as UNIX admins, have fewer restrictions on what we’re able to do? We have root access (via oem_setup_env) on VIO servers and AIX and Linux LPARs, so why don’t we have root on the HMC? Of course, I’ve yet to meet a system admin who doesn’t believe he needs to have root on everything he touches. It’s our nature.

I recall some early versions of HMC code providing greater default access to the hscroot user. I’d certainly load things up and run them directly on the HMC. I’d play around with the window manager and load VNC and various software packages and generally do what I wanted since I had root access.

In retrospect, this probably wasn’t a great idea on my part. Having too many things running on the HMC makes it a support nightmare. If something isn’t working, is it because of the actual HMC code or hardware, or is the problem one of your pet tools or programs? If you’re IBM, locking down this critical piece of the Power Systems infrastructure and treating it like an appliance makes it much easier to support.

There are forum threads going back to at least 2005 where users share knowledge about getting root on the HMC. It’s tougher to find working information these days, but there are still methods for getting root that don’t involve IBM Support. Naturally people aren’t as willing to discuss them, because when these techniques do get out, they tend to be quickly invalidated.

Now, IBM Support does allow you to reset HMC access passwords. (Note: In the early days of this blog I wrote about getting the celogin password from support, but this isn’t the same as getting root.)

It’s also possible to get access to the product engineering shell (pesh) and get root if there’s a real need to do so. Honestly, after years of HMC enhancements and refinements, there aren’t many legitimate reasons for needing root at this point. Still, if you need to debug or perform other types of maintenance as root, you can contact IBM Support and follow these instructions:

“pesh provides full shell access to the HMC for product engineering and support personnel. pesh takes the serial number of the HMC machine or unique ID of the virtual HMC where full shell access is requested, then prompts the user for a one day password obtained from the support organization. If the password is valid, the user is granted full shell access. Only the hscpe user can run this command.

To obtain full shell access to a Hardware Management Console (HMC):
pesh serial-number-of-HMC-machine

To obtain full shell access to a virtual HMC:
pesh unique-ID-of-virtual-HMC”

The other thing to keep in mind is that root isn’t necessary for dealing with some common HMC management issues. Are your filesystems filling up? Try this. Are you dealing with some crazy hscroot syntax? Check out EZH, which makes the HMC command line easier to manage. (Here’s an introductory video.)
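
On the filesystem point, the restricted shell already includes a cleanup command; this is a hedged example from memory, so verify the options against your HMC’s help text:

    # Free space in HMC filesystems by removing temporary files
    # more than one day old (chhmcfs is a restricted-shell command)
    chhmcfs -o f -d 1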

So do you want root on your HMC? Why or why not?

Coverage of IBM’s OpenPOWER Summit Announcements

Edit: Is POWER making inroads?

Originally posted April 12, 2016 on AIXchange

Last week I was in Austin for a Linux on Power workshop, when, as the kids say, my Twitter timeline blew up with news from the OpenPOWER Summit in San Jose.

Appropriately enough, as I started to write this, I saw tweets from Nigel Griffiths and David Spurway that referred to IBM’s “unusual” announcement.

I think part of what’s driving interest in this topic is that IBM typically keeps its cards close to the vest. The company seldom chooses to publicly reveal its future plans prior to announcements and general availability. Of course, many industry observers (myself included) have attended briefings where IBM tells you what’s ahead, but in those cases they’ve always made us sign NDAs. So, such public talk about POWER9 processors, which won’t be available until well into 2017, is indeed pretty surprising. Then consider Google’s involvement — they’ve never been forthcoming about their use of POWER — and you can see why this is such a big deal. Industry watchers, even those who primarily cover Microsoft or Apple, are realizing that Linux on Power solutions and POWER8 performance are worth paying attention to.

Anyway, for those of you who aren’t on Twitter, I’ll cite some of the articles covering the announcements relating to IBM POWER8 and POWER9 processors.

The Register:

“OpenPower Summit IBM’s POWER9 processor, due to arrive in the second half of next year, will have 24 cores, double that of today’s POWER8 chips, it emerged today.

Meanwhile, Google has gone public with its Power work – confirming it has ported many of its big-name web services to the architecture, and that rebuilding its stack for non-Intel gear is a simple switch flip.

The POWER9 will be a 14nm high-performance FinFET product fabbed by Global Foundries. It is directly attached to DDR4 RAM, talks PCIe gen-4 and NVLink 2.0 to peripherals and Nvidia GPUs, and can chuck data at accelerators at 25Gbps.

The POWER9 is due to arrive in 2017, and be the brains in the U.S. Department of Energy’s Summit and Sierra supercomputers.

Google says it has ported many of its big-name web services to run on Power systems; its toolchain has been updated to output code for x86, ARM or Power architectures with the flip of a configuration flag.

Google and Rackspace are working together on Power9 server blueprints for the Open Compute Project. These designs are compatible with the 48V Open Compute racks Google and Facebook are working on.

The blueprints can be given to hardware factories to turn out machines relatively cheaply, which is the point of the Open Compute Project: driving down costs and designing hardware to hyper-scale requirements. Rackspace will use the systems to run POWER9 workloads in its cloud.

The system itself is codenamed Zaius: a dual-socket POWER9 SO server with 32 DDR4 memory slots, two NVlink slots, three PCIe gen-4 x16 slots, and a total core count of 44. And what’s not to like? For one thing: high-speed NVlink interconnects between CPUs and Nvidia GPU accelerators, which Google likes to throw its deep-learning AI code at.”

The Next Platform:

“Google, as one of the five founding members of the OpenPower Foundation in the summer of 2013, is always secretive about its server, storage, and switching platforms, absent the occasional glimpse that only whets the appetite for more disclosures. But at last year’s OpenPower Summit, Gordon McKean, senior director of server and storage systems design and the first chairman of the foundation, gave The Next Platform a glimpse into its thinking about Power-based systems, saying that the company was concerned about the difficulty of squeezing more performance out of systems, and his boss, Urs Hölzle, senior vice president of the technical infrastructure team, confirmed to us in a meeting at the Googleplex that Google would absolutely switch to a Power architecture for its systems, even for a single generation, if it could get a 20 percent price/performance advantage.

Maire Mahoney, engineering manager at Google and now a director of the OpenPower Foundation, confirmed to The Next Platform that Google does indeed have custom Power8 machines running in its datacenters and that developers can deploy key Google applications onto these platforms if they see fit. Mahoney was not at liberty to say how many Power-based machines are running in Google’s datacenters or what particular workloads were running in production (if any). What she did say is that Google “was all in” with its Power server development and echoed the comments of Hölzle that if the Power machines “give us the TCO then we will do it.”

The POWER8 chips got Google’s attention because of the massive memory and I/O bandwidth they have compared to Xeon processors, and it looks like Google and the other hyperscalers have been able to get IBM to forge the POWER9 chip in their image, with more cores and even more I/O and memory bandwidth. “The vision is to build scale out server systems taking advantage of the amazing I/O subsystem that the OpenPower architecture delivers,” Mahoney added.

We happen to think that Rackspace would have done something like Zaius on its own, but the fact that Google is helping with the design and presumably will deploy it in some reasonable volumes means that the ecosystem of manufacturing partners for the Zaius machines should be larger than for Barreleye. And with IBM shipping on the order of several tens of thousands of Power systems a year at this point, if Google and Rackspace dedicate even a small portion of their fleets to Power, it would be a big bump up in shipments.”

I received links to these articles in a group email to IBM Champions:

Bloomberg: 

“Google also said it’s developing a data center server with cloud-computing company Rackspace Hosting Inc. that runs on a new IBM OpenPower chip called POWER9, rather than Intel processors that go into most servers. The final design will be given away through Facebook Inc.’s Open Compute Project, so other companies can build their data center servers this way, too.”

Fortune: 

“The search giant [Google] said on Wednesday that, along with cloud computing company Rackspace, it’s co-developing new server designs that are based on IBM chip technology.”

IDG News Service: 

“Two years ago, Google showed a Power server board it had developed for testing purposes, though it hadn’t said much about those efforts since. It’s now clear that Google is serious about using the IBM chip in its infrastructure.”

San Antonio Business Journal: 

“The two tech giants are using an open source server created by IBM called the POWER9 processor. It is among more than 50 new products being developed across 200 technology companies as part of the OpenPOWER Foundation, an industry controlled nonprofit dealing with the reality and cost of big data demands.”

TechRepublic: 

“The benefit of the Power architecture goes beyond price for performance. Because of the architectural limitations of x86-64, Intel has faced substantive difficulty pushing the number of threads in a processor. Intel’s 22-core Xeon E5-2699 v4 is limited to 44 threads, whereas the 12-core POWER8 has 96 threads.”

ZDNet: 

“The explosion of data requires systems and infrastructures based on POWER8+ accelerators that can both stream and manage the data and quickly synthesize and make sense of data, IBM said about the UM [University of Michigan] partnership.”

The Next Platform: 

“IBM Unfolds Power Chip Roadmap Out Past 2020.”

As a POWER bigot, I love it when mainstream tech outlets acknowledge the benefits of the technology I know and love. And I’m excited to think that this publicity will lead to new customers potentially choosing Linux on Power over x86 solutions.

Migrating to POWER8 Systems

Edit: Hopefully by now you are migrating to POWER9.

Originally posted April 5, 2016 on AIXchange

You just found out you’re getting new hardware. But hold the celebration — how do you get your existing LPARs to run on it?

This document covers migration paths for AIX systems to POWER8 systems. It’s “intended as a quick-reference guide in transitioning an existing AIX system from prior POWER architectures to a POWER8 system. The focus is on AIX itself, not the application stack.”

The document includes a graphical chart that covers migration paths including Live Partition Mobility, NIM alt disk migration, update/migration installs, versioned WPARs, etc.

Here’s more:

“Which options are available to me?

For AIX 5.3 and earlier
You’ll need to migrate to a POWER8-supported level. … there are fundamentally 3 options in this case:

1. NIM alt disk migration
2. Migrate in-place, then either mksysb, alt_disk_copy, or Live Partition Mobility (if going from POWER6 or POWER7 system).
3. Create mksysb of 5.2 or 5.3 system, install supported 7.1 on POWER8 system, and create AIX 5.2 or 5.3 Versioned WPAR from the mksysb.

For AIX 6.1 or 7.1
You have the option of doing an AIX update to a supported level instead of a migration, though if on AIX 6.1 you may still choose to migrate to 7.1 to get full POWER8 capabilities. Again… there are fundamentally 3 options:

1. If at a level that supports POWER8 and if the system is LPM-capable, Live Partition Mobility can be used to move to the POWER8 system.
2. If at a level that supports POWER8, use mksysb or alt_disk_copy to move to the POWER8 system and AIX update on the POWER8 system only if desired.
3. Update in-place and either mksysb, alt_disk_copy, or Live Partition Mobility (if going from POWER6 or POWER7 system). Note that if alt_disk_copy is chosen, the update can be to the alternate disk rather than in-place.

Partition Mobility is an option for moving partitions dynamically from POWER6/POWER7 to POWER8 systems, provided that the partitions are LPM-capable. Partition Mobility can be performed on both HMC managed systems as well as on Integrated Virtualization manager (IVM) managed systems. The FLRT tool can be used to validate the source and target systems for LPM.

Two types of migration are available depending on the state of the logical partition:
– The migration is active if the mobile partition is in running state.
– The migration is inactive if the mobile partition is shutdown.

Considerations
– POWER6 or POWER7 system is required.
– LPARs must be LPM capable.”
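
If you take the NIM alt disk migration route mentioned above, the command at the heart of it looks something like this hedged sketch; the client, lpp_source, SPOT and disk names are hypothetical:

    # From the NIM master: clone aixlpar1's rootvg to hdisk1 and
    # migrate the clone to the AIX 7.1 level in the named resources
    nimadm -c aixlpar1 -l lppsource_71TL4 -s spot_71TL4 -d hdisk1 -Y

The nice part of nimadm is that the LPAR keeps running during the clone and migration; you only take downtime for the reboot onto the migrated disk.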

This document can help you decide which method will work best in your environment. It’s worth your time.

Also, don’t forget that there are now options for running AIX 5.3 on POWER8 systems without using a WPAR. If you find yourself in this situation, you can still move up to POWER8.

Another Lifeline for Those on AIX 5.3 Extended Support

Edit: There are still plenty of people on older hardware and software.

Originally posted March 29, 2016 on AIXchange

Nobody likes to admit it, but many customers are still running AIX 5.3 on older hardware. There are many reasons for this. Maybe you have a few LPARs running an older OS. Maybe you’re reliant on a critical application that’s no longer supported. Maybe you’ve fallen so far behind on patching and upgrading that running an old OS is an acceptable risk. Or perhaps it’s simply the stubbornness of an “if it ain’t broke, don’t fix it” mentality.

Whatever your reason, as long as IBM continued to offer extended support for AIX 5.3, you had some peace of mind. Hopefully though, you understood that this wouldn’t last. And now we know: Earlier in March, IBM announced that AIX 5.3 extended support will be discontinued on April 30. However, IBM is throwing customers another lifeline by offering the capability to run AIX 5.3 natively on POWER8 servers:

“Many clients are still using an IBM AIX 5.3 application environment on their IBM Power Systems servers. AIX 5.3 reached end of life in April 2015. However, an extended service contract was offered for 12 months on all supported hardware. This contract will end on April 30, 2016.

Many clients have a subset of applications that are still dependent on a supported AIX 5.3 environment. IBM is enabling AIX 5.3 to run natively on POWER8 servers. The PTF U86665.bff will enable the AIX 5.3 image to run on POWER8 servers and will be available on March 11, 2016.

The LPAR must be at AIX 5.3 TL12 SP9 (latest 5.3 release).

The POWER8 LPAR will run in POWER6 compatibility mode and is limited to SMT2 mode. SMT2 mode results in some capacity loss compared to SMT4/SMT8 mode. IBM publishes SMT2 rPerf values that can be used to quantify POWER8 SMT2 capacity.

AIX 5.3 POWER8 LPARs only support Virtual I/O configurations: vSCSI, NPIV, and VLAN.

Only 5.3 POWER8 technology-based system installation methods are supported:

mksysb: First, perform an in-place update to a supported (POWER5/ POWER6/ POWER7) 5.3 TL12 SP9 LPAR with PTF U866665. Standard mksysb command can then be used to capture a POWER8 capable mksysb image. The mksysb image can then be used to install POWER8 LPARs.

NIM: A 5.3 TL12 SP9 NIM environment must be updated to support POWER8. A 5.3 TL12 SP9 NIM lppsource must be updated to include PTF U866665. A NIM SPOT must then be created or updated to utilize the updated lppsource.

All POWER8 systems are supported with the following restrictions:
* POWER8 systems must be at the 840 firmware level.
* POWER8 LPARs must be served by a 2.2.4.10 or 2.2.3.60 VIOS.

Service and support contract: The AIX 5.3 environment on POWER is planned to be supported for a total of 15 months through June 30, 2017. Clients will have to first acquire a service and support contract, after which they will be entitled to download the PTF.”
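
In practice, the mksysb path might look something like this minimal sketch; the directories and image name are hypothetical, and the PTF fileset is whatever IBM ships you under the support contract:

    # On an existing AIX 5.3 TL12 SP9 LPAR (POWER5/6/7):
    installp -acgXYd /tmp/ptf all         # apply the POWER8 enablement PTF
    oslevel -s                            # confirm 5300-12-09
    mksysb -i /backup/aix53_p8.mksysb     # capture a POWER8-capable image

From there you’d restore that image onto the POWER8 LPAR, subject to the firmware and VIOS levels listed above.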

Of course others outside of IBM have written on this prior to the official announcement (here). It also came up in an IBM training class I’d attended, where I was told not only about the extension, but its benefits:

“This is a new offering and very good news for customers that have AIX 5.3 applications needing a supported environment for the next 12-15 months. Customers can usually drop down a tier from their existing servers when they move to POWER8, requiring fewer cores due to higher performance, and save on per core service and support costs. The resulting savings are often significant enough to justify investment in new hardware. Moreover, with fewer cores, the customers can save significantly on software license costs as well.”

Chris Gibson has a nice write-up, along with a first look at running a 5.3 LPAR on POWER8 system.

So if you’re one of those holdouts, let’s hear from you: Will this new capability motivate you to migrate your 5.3 LPARs to POWER8? If not, why not?

Finding Minimum AIX Hardware Support Levels

Edit: I still refer to this all of the time.

Originally posted March 22, 2016 on AIXchange

If you just bought an 8408-E8E — otherwise known as the E850 — you may be wondering about its minimum supported AIX versions. Turns out there’s an easy way to find this information: Just go to the System to AIX Maps web page.

Looking down the list, under the POWER8 heading, you’ll find the E850. There are two choices: one for physical adapters, and another for virtualized adapters and the VIO server.

Under the All I/O Configurations link for the E850, there are two options, 7.1 and 6.1:

    Technology Level    Base Level    Recommended Level    Latest Level
    7100-02             7100-02-06    7100-02-07           7100-02-07
    6100-08             6100-08-06    6100-08-07           6100-08-07

Under Virtual I/O Only, there are several options:

    Technology Level    Base Level    Recommended Level    Latest Level
    7200-00             7200-00-01    7200-00-01           7200-00-01
    7100-04             7100-04-00    7100-04-00           7100-04-01
    7100-03             7100-03-01    7100-03-05           7100-03-06
    7100-02             7100-02-01    7100-02-07           7100-02-07
    6100-09             6100-09-01    6100-09-05           6100-09-06
    6100-08             6100-08-01    6100-08-07           6100-08-07

Obviously you can look up many systems besides the E850. Available hardware types go all the way back to POWER4 and beyond, including even older models that only run older versions of AIX. In some cases that old information is no longer online, but for most of the hardware you’d actually run, you can find the minimum supported levels for each way of setting up your I/O.
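
Before comparing an LPAR against the map, it helps to know exactly where it sits. From the LPAR itself:

    # Show the full level, e.g. 7100-02-07-1345
    # (release, TL, service pack, build date)
    oslevel -s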

System to AIX Maps is well worth a bookmark. Be sure to check here to verify that what you’re planning to do will actually work.

POWER Systems? There’s an app for that.

Edit: Do you run the app?

Originally posted March 15, 2016 on AIXchange

It seems like there’s an app for everything related to your Power hardware these days. There’s myHMC Mobile (which I covered here), along with the IBM Redbooks Mobile App. And now there’s the IBM Technical Support Mobile App (available for Android and iPhone):

“The IBM Technical Support mobile app lets clients worldwide quickly and easily access key technical support content and functions for all IBM software and hardware products.

You can use the app to:

  • Expedite troubleshooting by searching for, viewing, and bookmarking technical support content like technotes, APARs, documentation, and Redbooks.
  • View and update your software and hardware Service Request tickets whenever and wherever you need to.
  • Discover the best fixes for your system and email the fix orders using the Fix Level Recommendation Tool.
  • Look up warranty information for hardware systems by scanning the bar code or entering the Machine Type/Serial Number.
  • View Customer Support Plans for your products.
  • Contact IBM, with geo-location assistance and click-to-call.
  • Provide feedback about the app through its Feedback form.”

Installing the app on my Android phone was simple enough. I searched Google Play for IBM Technical Support and it came right up. It did request quite a few permissions. I never know why apps need access to my camera or my photos and files, but I accepted everything so I could test it out.

There are quite a few options on the main page, things like support content where you can search for whatever you need. For a test I just entered S822 and quite a few useful items came back. Being able to do these quick, simple lookups could certainly be handy whenever I’m on a raised floor.

There’s a menu option for service requests. I signed in with my IBM ID and was able to view my open software requests, hardware requests, etc. By selecting Full Site, I was able to open a new PMR from my phone.

Features include support videos, questions and answers (which brings you to IBM developerWorks forums), customer support plans and warranty lookup (where you can scan your server’s bar code). When I entered my machine type and serial number, I received info about my warranty status and system expiration date, along with parts that were shipped with the system.
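
If you’re typing in the Machine Type/Serial Number rather than scanning the bar code, you can pull both from the LPAR itself; a quick example:

    # Machine type and model, e.g. IBM,8286-42A
    uname -M
    # Hardware system identifier (serial)
    lsattr -El sys0 -a systemid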

There’s a menu item that takes you to FLRT LITE. Another option lets you change your settings and language. There are also options to provide feedback and contact IBM.

I found the back button would take me out of the app more often than I liked, but hopefully over time this will be addressed. Overall, I expect I’ll legitimately use this a lot — and I’m not a person who downloads a ton of apps.

If you’ve downloaded and tried the IBM Technical Support Mobile App, let me know what you think.

The Fix (Level) is In: Using FLRT and FLRT LITE

Edit: I still use them both.

Originally posted March 8, 2016 on AIXchange

I’ve mentioned FLRT previously (here, here and here). Hopefully you’ve taken advantage of the tool. On countless occasions it’s helped me determine the latest versions of OSs, firmware and applications, along with end-of-life dates, etc.

From IBM:

“The Fix Level Recommendation Tool (FLRT) provides cross-product compatibility information and fix recommendations for IBM products. Use FLRT to plan upgrades of key components or to verify the current health of a system. Enter your current levels of firmware and software to receive a recommendation. When planning upgrades, enter the levels of firmware or software you want to use, so you can verify levels and compatibility across products before you upgrade.”

If you’re new to FLRT, here’s how to get started. First, go to the IBM link above and select your server machine type and model. Then skip down to Partition OS and select AIX, and then select the version you’re running. At this point you can click submit and confirm your AIX level. You’ll also be provided with recommendations for updates and/or upgrades.

That’s just the beginning. FLRT allows you to really drill down and find out the recommendations for your entire stack, including machine firmware, HMC code levels, operating system levels, cluster/virtualization and POWER software, and even disk subsystems.

As useful as FLRT is, for the uninitiated, the tool comes with a learning curve. Getting a response to even a simple query — something like, what is the latest version of AIX? — can be a painful exercise. Fortunately, if you’re just looking for a quick answer to a single question, there’s FLRT LITE.

It’s a simple interface that just asks you to choose one of these products. Click on your product, and the information you’re interested in is at your fingertips:

    Power, PureFlex and Power Blade System Firmware
    HMC and HMC Virtual Appliance
    AIX
    PowerVM Virtual I/O Server
    PowerHA SystemMirror
    Cluster Systems Management
    General Parallel File System
    General Parallel File System Standard Edition
    General Parallel File System Express Edition
    General Parallel File System Advanced Edition
    LoadLeveler
    Parallel Engineering and Scientific Subroutine Library
    Parallel Environment
    Parallel Environment Developer Edition for AIX
    Parallel Environment Runtime Edition for AIX
    PowerVP Standard Edition
    PowerKVM
    Red Hat Enterprise Linux
    SUSE Linux Enterprise Server
    PowerVC Standard Edition
    Spectrum Scale

To learn more about FLRT LITE, read these articles (here and here).

As I said, I use FLRT a lot. How about you? How often do you need to look up OS versions and related information? Do you know of an easier way to get it?