Getting Detailed VIO Server Info

Edit: Still good stuff.

Originally posted January 6, 2015 on AIXchange

I wanted to know the VIO server version I was running. Simple, right? I ran $ioslevel, and learned I was on Version 2.2.3.4.

That’s nice, but how do you find out if you’re running fixpacks or service packs? How can you get more information about the version you’re on?

 The help command was very helpful:

            $help ioslevel

            Usage: ioslevel

                   [Reports the latest installed maintenance level of the system.]

In case you missed it, I was being sarcastic.

Then I remembered that there should be a file that contains this sort of information. Unfortunately, I couldn’t recall where it was located. Was it in /etc, along the lines of /etc/redhat-release on a Red Hat server? Nope.

So I took a look at the command that was being run under the covers when I ran ioslevel. First I enabled CLI debugging with:

          export CLI_DEBUG=33

Then I tried running ioslevel again:

          $ioslevel

          AIX: "cat /usr/ios/cli/ios.level "

          2.2.3.4

I already had this output, but the debugging did help me locate the file I was seeking. I went into /usr/ios/cli and found these files:

            .license
            .profile
            .profile.ce
            FPLEVEL.txt
            README.txt
            README.vios
            SPLEVEL.txt
            cron_mail_check.sh
            environment
            ios.level
            ioscli
            itm
            langlist
            lsvirt.snap
            man.ksh

SPLEVEL.txt, FPLEVEL.txt and ios.level were the files I needed. Checking them, I discovered I was running:

            $ cat /usr/ios/cli/ios.level
            2.2.3.4
            $ cat /usr/ios/cli/SPLEVEL.txt
            SP-01
            $ cat /usr/ios/cli/FPLEVEL.txt
            FIXPACK:FP-25
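If you check these values often, a small shell function can print all three at once. This is only a sketch. The function name vios_level_summary is hypothetical, and it assumes the files live in /usr/ios/cli as shown above. It also assumes you run it from the root shell that oem_setup_env provides, since the restricted padmin shell may not let you define functions.

```shell
# Hypothetical helper: print the VIOS level, fix pack and service pack
# recorded under /usr/ios/cli. Pass another directory to override the path.
vios_level_summary() {
    dir=${1:-/usr/ios/cli}
    for f in ios.level FPLEVEL.txt SPLEVEL.txt; do
        if [ -r "$dir/$f" ]; then
            # Command substitution drops the file's trailing newline
            printf '%-12s %s\n' "$f:" "$(cat "$dir/$f")"
        else
            printf '%-12s %s\n' "$f:" "(not found)"
        fi
    done
}
```

On the VIO server above, vios_level_summary should report 2.2.3.4 for ios.level, FIXPACK:FP-25 for FPLEVEL.txt and SP-01 for SPLEVEL.txt.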

That solved my problem, but I still have questions. Am I mistaken, or at some point did ioslevel provide all this information automatically? Has something changed recently?

Chef Client and Other Nuggets from Twitter

Edit: Some links no longer work.

Originally posted December 23, 2014 on AIXchange

Recently on his Twitter feed, IBMer Jay Kruemcke noted that Chef Client is now available on AIX. I’ll write about this in detail in the near future, but for now, here’s the word from Chef’s website:

“Today I’m very pleased to announce the availability of Chef Client 12.0 for IBM AIX 6.1 and 7.1. It is freely available from our downloads page and can be used with any version of the Chef Server up to and including Chef Server 12.

This is the first major new platform for Chef in some time, and it’s certainly been a long time in coming. We’ve heard from some of our large enterprise customers that they have significant investment in IBM’s AIX platform and expect to continue that into the future, so they would like to manage those systems using the same flexible automation tool, Chef, with which they are already familiar.”

I’ve long recommended Twitter as a resource for finding news and information about AIX and related topics. I should point out that you don’t need to have your own Twitter feed to benefit from it. For example, you can find my own feed simply by searching on “Rob McNelly twitter.” (For the record, here’s the direct link.) Twitter’s advanced search function is another, more direct way to dive in. Check out the results for a search on “IBM Power Systems.”

As a regular user of Twitter, I follow a number of AIX experts, including Kruemcke. Other recent tweets of Jay’s have led me to a document that summarizes update benefits for AIX releases and related offerings, as well as videos on using nmon interactively and capturing data to nmon files so they can be analyzed later.

Jay also tweeted about this white paper covering HMC simplification:

“Managing the IBM PowerVM infrastructure involves configuring its different components, such as the POWER Hypervisor™ and the Virtual I/O Server(s). Historically, this has required the use of multiple management tools and interfaces, such as the Hardware Management Console (HMC) and the Virtual I/O Server command line interface.

The PowerVM simplification enhancements were designed to significantly simplify the management of the PowerVM infrastructure, improve the Power Systems™ management user experience, and reduce the learning ramp for users unfamiliar with the PowerVM technologies.

This paper provides an overview of the PowerVM simplification enhancements and illustrates how to use the new features available in the HMC to set up and manage the PowerVM infrastructure.”

While I’m at it, here are some other nuggets I’ve found on Twitter lately. Via IBM’s Nigel Griffiths, learn how to use nmon analyser in this Steve Atkins video. And from IBM’s Chris Gibson, here’s information about an STG lab services offering and the LPM automation tool. And thanks to Christophe Rousseau and cmod666.org, I even came across this humorous visual about live migrations.

Do you use Twitter to locate AIX resources? Who do you follow?

 This blog will be updated on January 6, 2015. Happy New Year!

Is This the End for Desktops and Laptops?

Edit: I am still running my laptop and desktop.

Originally posted December 16, 2014 on AIXchange

Lately I’ve read several commentators predict that desktops and laptops will soon be completely replaced by tablets and phones. I sure hope they’re wrong.

Although there’s much to be said for the portability of a phone or a tablet, I still find that there’s much value to be derived from a laptop. Consider this scenario:

“In the classroom, I took my brand new iPhone 6, plugged it into the lecture theatre’s HDMI port, and ran the whole presentation — in high definition, complete with nicely animated transitions — off my phone.”

Really? Surely the author didn’t use his iPhone 6 to create that presentation? What device would you use to look up the material, edit it together and prepare it to be copied over to the iPhone 6?

One thing I’d immediately want in that scenario is multiple monitors, the bigger the better: one screen for searching for clips, and another running my editing software to build the presentation.

“My friend runs the IT infrastructure for one of Australia’s most successful online retailers. It’s his job to make sure the customer-facing systems ringing up sales are available 24×7. Always on call, getting texts advising him of the status of his servers, services, and staff, he keeps a laptop close at hand, in case something ever needs his personal attention. Something always does.

“‘Got a little Bluetooth keyboard to go along with it,’ he continued. ‘When I’m in the office I’ll AirPlay it over to an Apple TV connected to a monitor. What’s the difference between that and a desktop?'”

To me, one difference is the tools I use to do the job. My laptop has a pretty decent keyboard and a pointing stick, which I still feel is a superior “mouse” to a trackpad. With the rig they’re proposing, I’d have to keep my Bluetooth keyboard charged and find a free monitor somewhere. I’d rather just carry the whole thing with me.

“The desktop has been dead for some years, resurrected to an afterlife of video editing and CAD. Laptops keep getting smaller and more powerful, but we’ve now reached a moment when they’re less useful than our smartphones.

“The laptop market will not collapse overnight. There’s a lot of inertia in IT — people like what they like and tend to use what they know — but the current cycle of PC replacements is likely to be the last one.

“The computer as we have known it, with integrated keyboard and display, has lost its purpose in a world of tiny, powerful devices that can cast to any nearby screen (Chromecast & AirPlay), browse any website, and run all the important apps. Why carry a boat anchor when you can be light as a feather?”

Maybe this is part of my problem. I like my boat anchors. I like knowing that I have an optical device I can use to burn DVDs in a pinch. I like knowing I can pull out that DVD burner and replace it with a second hard drive. I like knowing I can max out the memory and run some flavor of virtualization software (like VMware) and run multiple operating systems at once.

No doubt, many things can be done with a phone, a Bluetooth keyboard and a borrowed monitor. I’m sure that for some, if not most, users it’s all they need:

“We still rely on devices with processors and memory, they are just different devices. The mobility trend has been clear for years with notebooks today demanding larger market share than desktops. And one thing significant about notebooks is they required of us our first compromise in terms of screen size. I write today mainly on a 13-inch notebook which replaced a 21-inch desktop, yet I don’t miss the desktop. I don’t miss it because the total value proposition is so much better with the notebook.

“What’s still missing are clearcut options for better I/O — better keyboards and screens or their alternatives — but I think those are very close. I suspect we’ll shortly have new wireless docking options, for example. For $150 today you can buy a big LCD display, keyboard, and mouse if you know where to shop. Add wireless docking equivalent to the hands-free Bluetooth device in your car and you are there.”

If you believe some people, the end of the desktop and laptop is already here:

“This year we’ll see an important structural change take place in the PC hardware market. I’m not saying there won’t still be desktop and notebook PCs to buy, but far fewer of us will be buying them….

“The iPhone in your pocket will become your desktop whenever you are within range of your desktop display, keyboard and mouse. These standalone devices [were] Apple’s big sellers in 2014 and [will be] big sellers for HP and Dell in 2015 and beyond. The next iPod/iPhone/iPad will be a family of beautiful AirPlay displays that will serve us just fine for at least five years linked to an ever-changing population of iPhones.”

Have you ditched your laptop because you can do everything you need to do from your phone? Are you close to doing so? I can’t be the only dinosaur out here.

Creating Additional VLANs With the HMC CLI

Edit: Some links no longer work. I still love the CLI.

Originally posted December 9, 2014 on AIXchange

When I create LPARs, I prefer the HMC command line interface (CLI) to the GUI. The CLI is especially advantageous when I’m using a VPN to a remote site that’s slower than I’d prefer. Pointing and clicking and going through wizards is inconvenient — and it becomes exponentially so when you’re talking about setting up hundreds of LPARs at a time in multiple data centers.

I’ve previously discussed the HMC CLI, and of course I’m not the only one to have written about the various options (here, here and here). Still, I couldn’t find any good examples of a scenario where you’re creating virtual Ethernet adapters with multiple additional VLANs.

The typical GUI interface looks like this: 

In this simple example I’m looking to have additional VLANs on this virtual interface. This information on chsyscfg was helpful:

                virtual_eth_adapters
                Comma separated list of virtual ethernet adapters,
                with each adapter having the following format:

                virtual-slot-number/is-IEEE/port-vlan-ID/
                [additional-vlan-IDs]/[trunk-priority]/
                is-required[/[virtual-switch][/[MAC-address]/
                [allowed-OS-MAC-addresses]/[QoS-priority]]]

In particular, this caught my eye:

                If values are specified for additional-vlan-IDs, they
                must be comma separated.

So the command expects a comma-separated list of virtual Ethernet adapters, and inside each adapter definition there is a comma-separated list of additional VLANs.

However, things didn’t go as I’d hoped. I tried many different combinations of ", ' and \" but couldn’t get it to work. I finally opened a PMR, so I’ll skip to the conclusion; hopefully you can avoid my pain. I was simply adding an adapter to an existing profile, and the following command is what ultimately worked. Obviously the serial number and the actual VLAN IDs have been changed, but this command should correspond to the example from the GUI above.

chsyscfg -m SN12345 -r prof -i name=default,lpar_id=2,\"virtual_eth_adapters=\"\"4/1/1/1,2,3,4,5/2/1\"\"\"

As you might expect, defining more than one adapter makes it even more convoluted:

chsyscfg -m SN12345 -r prof -i name=default,lpar_id=2,\"virtual_eth_adapters=\"\"2/1/1/1,2,3,4,5/2/1\"\",\"\"3/1/2/6,7,8,9,10/2/1\"\",4/0/10//0/1\"

Just imagine if you had 15 or 20 VLANs per virtual adapter that you needed to deploy throughout your environment.

Maybe this syntax is obvious to you, but it wasn’t obvious to me, so I’m putting this information out there for the next person who needs to determine how to use a comma separated list inside of another comma separated list with the HMC CLI.
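If you need to define adapters with many VLANs, one option is to let the shell generate the quoting. The sketch below follows the pattern in the two working commands above. An adapter definition that contains commas is wrapped in doubled double quotes, and the whole virtual_eth_adapters attribute is wrapped in double quotes once its value contains a comma. The function name build_eth_attr is hypothetical. It assumes you can define shell functions wherever you run chsyscfg; on a restricted HMC shell, you could instead run it on a workstation and paste the string it prints. Check its output against a test profile before relying on it.

```shell
# Hypothetical helper: build a virtual_eth_adapters attribute for chsyscfg -i
# from adapter definitions in the format
# slot/is-IEEE/port-vlan-ID/[additional-vlan-IDs]/[trunk-priority]/is-required
build_eth_attr() {
    value=""
    for spec in "$@"; do
        # A definition carrying a comma-separated VLAN list gets doubled quotes
        case $spec in
            *,*) spec="\"\"$spec\"\"" ;;
        esac
        # Join the adapter definitions with commas
        value=${value:+$value,}$spec
    done
    # Quote the whole attribute only when its value contains a comma
    case $value in
        *,*) printf '"virtual_eth_adapters=%s"\n' "$value" ;;
        *)   printf 'virtual_eth_adapters=%s\n' "$value" ;;
    esac
}
```

For example, chsyscfg -m SN12345 -r prof -i "name=default,lpar_id=2,$(build_eth_attr 4/1/1/1,2,3,4,5/2/1)" should pass chsyscfg the same string as the first escaped command above. The shell does not reinterpret quotes produced by a command substitution inside double quotes.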

Dummy Devices

Edit: Do you still manage disks manually?

Originally posted December 2, 2014 on AIXchange

A customer was looking to assign SAN LUNs to a pair of VIO servers using vSCSI. When the VIO servers were configured, VIO1 was assigned more internal disks than VIO2:

            VIO1
            lsdev -Cc disk
            hdisk0 Available 04-00-00 SAS RAID 0 SSD Array
            hdisk1 Available 04-00-00 SAS RAID 0 SSD Array
            hdisk2 Available 04-00-00 SAS RAID 0 SSD Array
            hdisk3 Available 0C-00-00 SAS Disk Drive
            hdisk4 Available 0H-00-00 SAS Disk Drive

            VIO2
            lsdev -Cc disk
            hdisk0 Available 05-00-00 SAS Disk Drive
            hdisk1 Available 06-00-00 SAS Disk Drive

Because of this, the shared LUNs would be out of sync when they were allocated to the VIO servers. On VIO1, the next hdisk number would be hdisk5, while on VIO2, the next hdisk number would be hdisk2. The customer wanted to keep the hdisk numbers consistent on the two servers to simplify the process of mapping the LUNs to client LPARs. Consistent numbering would also make it a bit easier to conduct any future troubleshooting.

There was no shortage of ways to resolve this issue. The SAN team could assign three small LUNs to VIO2 only. That way both servers would have the same number of hdisks once mapping of the “real” shared LUNs began.

I found more options online (see here, here and here). Unfortunately, we couldn’t get any of them to work.

Finally, I was shown this blog post on dummy devices, which includes the following command:

mkdev -l hdisk100 -c disk -s sas -t scsd -p sas0 -w sas -d

Sure enough, this did the job:

hdisk100 Defined

lsdev -Cc disk

hdisk100 Defined   00-00-00-00 MPIO Other SAS Disk Drive

Basically, this method allows you, the admin, to add in all the dummy hdisks you need without involving SAN personnel.
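If you need several dummies, you can work out which ones from the lsdev output. This sketch rests on two assumptions. First, it assumes AIX gives a newly configured disk the lowest unused hdisk number, so the gaps must be filled rather than padded with a high number like hdisk100. Second, it assumes the mkdev flags from the blog post also work for names other than hdisk100. Whether several dummies can share the same -w value is untested here. For that reason, the hypothetical plan_dummy_disks function only prints the commands for you to review; it does not run them.

```shell
# Hypothetical helper: read "lsdev -Cc disk" output on stdin and print a
# define-only (-d) mkdev command for every hdisk number below the target
# that is not already in use.
plan_dummy_disks() {
    awk -v target="$1" '
        # Record each hdisk instance number that already exists
        $1 ~ /^hdisk[0-9]+$/ { used[substr($1, 6) + 0] = 1 }
        END {
            for (i = 0; i < target + 0; i++)
                if (!(i in used))
                    printf "mkdev -l hdisk%d -c disk -s sas -t scsd -p sas0 -w sas -d\n", i
        }'
}
```

On VIO2 above, lsdev -Cc disk | plan_dummy_disks 5 would print mkdev commands for hdisk2, hdisk3 and hdisk4. On VIO1 it would print nothing, because hdisk0 through hdisk4 already exist.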

When you’re using vSCSI, do you take the time to keep your hdisks in sync? In what other instances do you find it necessary to create a dummy device? How do you go about it?

The Value of Hardware Maintenance

Edit: Keep your maintenance current.

Originally posted November 25, 2014 on AIXchange

Recently, a customer was looking for help with their machine, a 7038-6M2 running AIX 5.1.

The customer attempted to call IBM for assistance. They didn’t get very far at first. This machine was announced in 2002 and withdrawn from marketing in 2005. In addition, IBM no longer supports AIX 5.1 or any previous operating system.

As I’ve often said, lots of businesses continue to run on older hardware and legacy OSs. While this speaks to the high quality of IBM systems and software, it’s still a risky venture, because even the most well-made systems will break down eventually.

In this case, the test/dev LPAR wouldn’t boot up once a D20 drawer was added to the system. Luckily, the production LPAR wasn’t impacted.

A bad boot drive was initially thought to be the culprit. Some used drives were procured and one of the disks was replaced. The intent was to boot from the remaining good disk and then mirror to the used drive. An AIX 5.1 CD was used as boot media. While the machine came up, the used disk wasn’t recognized. Was this a firmware issue? Was the wrong part number being used as a replacement?

A different used drive was tried, but it wasn’t recognized either. No one was sure what to try next. Finally, the question arose: What kind of a PMR did you try to open with IBM, software or hardware? Ah ha. The customer had tried to open a software PMR but wasn’t entitled. However, the machine was still covered under IBM hardware support.

Once a ticket was opened, the disk carrier was determined to be the problem. When that part was replaced, the machine was able to boot. One of the disks actually was bad though, so one of the replacement disks was used to create a mirror of rootvg. The system is running fine now.

The moral: If you’re still running old hardware, keep IBM maintenance on it. Or better yet, seriously consider upgrading to something newer.

On an unrelated note, I saw the following from IBM that might impact customers who order physical copies of media:

Under Software Updates: Effective November 18, 2014 customers in the USA who select the physical delivery option will be invoiced 350USD + sales tax for the order. Note that download delivery remains free of charge.

Be sure to give yourself any additional time necessary to download and burn any physical media you might need. Otherwise be prepared to pay this new fee if you still want IBM to ship it to you.

Tracking Network Devices

Edit: Still good stuff.

Originally posted November 18, 2014 on AIXchange

Which switch port is your network port plugged into?

Oftentimes this simple bit of information goes undocumented. Perhaps everything is being plugged in at a remote site by some “hands and eyes” guys and you’re just not sure if the cabling has been completed or if it’s correct according to the documentation you received. Or maybe you just want more information about the network device that you’re plugging into.

I was reminded of an interesting method for obtaining this information. Before I get into it, keep in mind that this might not work depending on the switch you’re connecting to or its security settings. That said, I’ve had pretty good luck with it so far.

To get this working in my environment, I first needed to see what physical cards I’d connected to the switch. I accomplished this with the lscfg command, which displayed the cards available in my system:

lscfg | grep en

The ports I was interested in on my test machine were ent0 through ent7.

To determine which ports are reporting that they’re up, I ran:

            for i in 0 1 2 3 4 5 6 7
            do
                echo ent$i ; netstat -v ent$i | grep Status
            done

I received this output:

            ent0
            No network device driver information is available.
            ent1
            Physical Port Link Status: Down
            Logical Port Link Status: Down
            DCBX Status: Enabled
            MAC ACL Status: Disabled
            VLAN ACL Status: Disabled
            ent2
            Physical Port Link Status: Up
            Logical Port Link Status: Up
            DCBX Status: Disabled
            MAC ACL Status: Disabled
            VLAN ACL Status: Disabled
            ent3
            Physical Port Link Status: Up
            Logical Port Link Status: Up
            DCBX Status: Disabled
            MAC ACL Status: Disabled
            VLAN ACL Status: Disabled
            ent4
            No network device driver information is available.
            ent5
            Link Status: Down
            Transmit and Receive Flow Control Status: Disabled
            ent6
            Link Status: Down
            Transmit and Receive Flow Control Status: Disabled
            ent7
            Link Status: Down
            Transmit and Receive Flow Control Status: Disabled

This showed me the status of every port. In my case, I know that the ports that report “No network device driver information is available” are part of my Shared Ethernet Adapters (SEAs), so that tells me which adapters are being used by an SEA on this VIO server.
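With more adapters, picking the Up ports out of that listing by eye gets tedious. The hypothetical up_ports function below is one way to filter it. It assumes the listing has exactly the shape produced by the loop above: an entN line followed by that adapter’s Status lines.

```shell
# Hypothetical helper: read the output of the netstat loop on stdin and
# print each adapter that reports a link status of Up, once per adapter.
up_ports() {
    awk '
        # A line holding only an adapter name starts a new section
        /^[ \t]*ent[0-9]+[ \t]*$/ { dev = $1; next }
        # Both "Link Status: Up" and "Physical Port Link Status: Up" match
        /Link Status: Up/ && dev != "" && !(dev in seen) { seen[dev] = 1; print dev }'
}
```

Piping the loop into it, as in for i in 0 1 2 3 4 5 6 7; do echo ent$i; netstat -v ent$i | grep Status; done | up_ports, should print ent2 and ent3 for the output above.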

The method I describe next works on network ports that don’t have an SEA configured on them. Perhaps you have your own method for a port that’s already up and active in an SEA? If so, let me know in the comments.

In the output above, ent2 and ent3 report that they’re up. I put a dummy IP address on en2, the interface that corresponds to ent2:

ifconfig en2 10.9.0.1 netmask 255.255.255.0

ifconfig en2 up

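Then I ran tcpdump, filtering for Cisco Discovery Protocol (CDP) frames. In these 802.3 frames with an LLC/SNAP header, bytes 20 and 21 hold the SNAP protocol ID, which is 0x2000 for CDP: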

tcpdump -nn -v -i en2 -s 1500 -c 1 'ether[20:2] == 0x2000'

After a short wait, I received this output:

            tcpdump: listening on en2, link-type 1, capture size 1500 bytes
            08:09:13.046930 CDP v2, ttl: 180s, checksum: 692 (unverified)
                Device-ID (0x01), length: 22 bytes: 'ucs6120-A(SSI140206FM)'
                Address (0x02), length: 13 bytes: IPv4 (1) 10.33.0.31
                Port-ID (0x03), length: 12 bytes: 'Ethernet1/20'
                Capability (0x04), length: 4 bytes: (0x00000228): L2 Switch, IGMP snooping
                Version String (0x05), length: 70 bytes:
                  Cisco Nexus Operating System (NX-OS) Software, Version 5.2(3)N2(2.22c)
                Platform (0x06), length: 9 bytes: 'N10-S6100'
                Native VLAN ID (0x0a), length: 2 bytes: 705
                AVVID trust bitmap (0x12), length: 1 byte: 0x00
                AVVID untrusted ports CoS (0x13), length: 1 byte: 0x00
                Duplex (0x0b), length: 1 byte: full
                MTU (0x11), length: 4 bytes: 1500 bytes
                System Name (0x14), length: 9 bytes: 'ucs6120-A'
                System Object ID (not decoded) (0x15), length: 14 bytes:
                  0x0000:  060c 2b06 0104 0109 0c03 0103 864f
                Management Addresses (0x16), length: 13 bytes: IPv4 (1) 10.33.0.31
                Physical Location (0x17), length: 14 bytes: 0x00/Lab Switch 1
            4 packets received by filter
            0 packets dropped by kernel

 I can do the same thing with en3.

From this I know what kind of switch I’m connected to, which port I’m connected to, what OS the switch is running, the native VLAN, the MTU size, the switch’s management address, etc.
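If you only want the key facts, you can trim the verbose CDP decode. The hypothetical cdp_summary function below picks out a few fields. It assumes tcpdump prints each field as in the output above, with the value after the last ": " on the line. A different tcpdump version may format these lines differently.

```shell
# Hypothetical helper: read tcpdump -v CDP output on stdin and print
# selected fields as key=value lines, stripping the surrounding quotes.
cdp_summary() {
    awk -v q="'" '
        /Device-ID \(0x01\)/      { key = "switch" }
        /Port-ID \(0x03\)/        { key = "port" }
        /Platform \(0x06\)/       { key = "platform" }
        /Native VLAN ID \(0x0a\)/ { key = "vlan" }
        /MTU \(0x11\)/            { key = "mtu" }
        key != "" {
            # Keep only the text after the last ": " on the line
            val = $0
            sub(/.*: /, "", val)
            gsub(q, "", val)
            print key "=" val
            key = ""
        }'
}
```

For the capture above, piping the tcpdump output into cdp_summary should print switch=ucs6120-A(SSI140206FM), port=Ethernet1/20, platform=N10-S6100, vlan=705 and mtu=1500 bytes.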

Be sure to read the link above for additional information about EtherChannel and other details.

What other methods do you use to determine which physical ports your machines are using?

Firefox SSL Fix for HMC Users

Edit: Link no longer works.

Originally posted November 7, 2014 on AIXchange

I usually have pretty good luck using Mozilla Firefox with my HMCs. However, after I recently upgraded Firefox, I encountered this error:

An error occurred during a connection to hmc1. Issuer certificate is invalid. (Error code: sec_error_ca_cert_invalid)

    The page you are trying to view cannot be shown because the authenticity of the received data could not be verified.

    Please contact the website owners to inform them of this problem. Alternatively, use the command found in the help menu to report this

I found a solution in this technote. Although it’s referring to Domino servers, the concept is still the same.

After updating Firefox to version 31 (or later), when Firefox browser users attempt to access a MD5-based SSL certificate, generated by a Domino Web server, the connection attempt will fail with the following error: Secure Connection Failed. An error occurred during a connection to <server name>. Issuer certificate is invalid. (Error code: sec_error_ca_cert_invalid)

Firefox 31 introduces a new security library named security.use_mozillapkix_verification for strict enforcement for SSL certificate verification (see this MozillaWiki article for details).

After updating Firefox to version 31 (or later), when Firefox browser users attempt to access a MD5-based SSL certificate, generated by a Domino Web server, the connection attempt will fail with the error shown below. This includes Domino self-signed testing certificates generated from the Server Certificate Admin database or server SSL certificates generated from the Domino Certificate Authority.

You can perform the following steps on local Firefox browsers to restore the older SSL libraries for Firefox, which will allow HTTPS connections to your server.

Step 1. Type about:config in the Firefox address bar to access Advanced settings. Read the warning presented, and then click the “I’ll be careful, I promise” prompt to accept and proceed.

Step 2. Scroll down to security.use_mozillapkix_verification and double-click to toggle its value from true to false (or, right-click on it and select Toggle).

Once I did this, I was able to connect to my HMC as usual. Hopefully this tip will help should you run into this same issue in the future.

A Useful Data-Compression Option

Edit: Have you run this tool?

Originally posted November 3, 2014 on AIXchange

Perhaps you’re interested in compressing data on your IBM storage devices. But do you have any idea how much of your data is actually compressible?

The Comprestimator utility is designed to estimate how much compression you’ll achieve without actually compressing your data. Download version 1.5.1.1 here. From the same link, here’s a detailed description of the tool:

“Comprestimator is a command line host-based utility that can be used to estimate an expected compression rate for block devices.

The Comprestimator utility uses advanced mathematical and statistical algorithms to perform the sampling and analysis process in a very short and efficient way. The utility also displays its accuracy level by showing the maximum error range of the results achieved based on the formulas it uses. The utility runs on a host that has access to the devices that will be analyzed, and performs only read operations so it has no effect whatsoever on the data stored on the device. The following section provides useful information on installing Comprestimator on a host and using it to analyze devices on that host. Depending on the environment configuration, in many cases Comprestimator will be used on more than one host, in order to analyze additional data types.

In order to reduce the impact of block device and file system behavior mentioned above it is highly recommended to use Comprestimator to analyze volumes that contain as much active data as possible rather than volumes that are mostly empty of data. This increases accuracy level and reduces the risk of analyzing old data that is already deleted but may still have traces on the device.

Comprestimator version 1.5 adds support for analyzing expected compression savings in accordance with Storwize V7000, SAN Volume Controller (SVC) and FlashSystem V840 storage systems running software version 7.3. Among other enhancements in the software, version 7.3 adds support for the 2014 hardware models Storwize V7000 Gen2, SVC DH8 and FlashSystem V840 AC1.

Comprestimator is supported and can be used on the following client operating system versions:

  • Windows 2003 Server, Windows 7, Windows 2008 Server, Windows 8, Windows 2012
  • ESXi 4, 5
  • AIX 6.1, 7
  • Red Hat Enterprise Linux Version 5.x, 6.x
  • HP-UX 11.31
  • Sun Solaris 10, 11
  • SUSE SLES 11
  • Ubuntu 12
  • CentOS 5.x

Comprestimator is designed to scan any block device that is readable by the OS itself. This typically includes devices managed by logical volume managers (LVMs) or partitioned by the OS. However, for practical reasons, since compression is applied to physical volumes, it is recommended to estimate compression by running Comprestimator on the same block device/physical volume that will be compressed, and not on a logical volume, which may be spanning on those volumes. It is thereby highly recommended to always analyze the native block-device when using Comprestimator.

Some volume managers “reserve” some of the LUN capacity for internal use. Since Comprestimator reads directly from the block device, some of these IOs may fail. The tool will tolerate up to 1% failed IOs and a scan will be aborted if this threshold is reached.”

Rather than guess what you might save on disk space when you turn on compression, try this tool and learn some real-world numbers based on your actual environment.

Reformatting IBM i pdisks to AIX hdisks

Edit: It has been a while since I have needed to do this.

Originally posted October 27, 2014 on AIXchange

Recently I needed to reformat an IBM i LPAR as an AIX LPAR for some testing. After defining the partition and reusing the IBM i hardware, I tried to boot it from physical install media (as there is no VIOS on this machine).

The OS would boot, but wouldn’t recognize any of the disks. If I went into maintenance mode and started a shell, lscfg displayed pdisks, but not hdisks.

This made sense, as these disks were still set up as an IBM i RAID array. I needed to format them so that AIX could use them. The AIX boot media didn’t have the capability to format the disks; fortunately, IBM developerWorks had what I needed:

The IBM Standalone Diagnostics CD-ROM provides hardware diagnostics and service-related utilities for POWER, PowerPC, eServer i5 system with common pSeries I/O, and RS/6000-based systems. The standalone diagnostics CD-ROM would be used in the following situations when it makes sense to test the hardware independent of the operating system:

* When there is no operating system installed on a system or partition

* When the operating system does not have support for the service related function you wish to perform

* When there may be a problem with the boot device

* When the service documentation specifically recommends running standalone diagnostics

 The actual IBM Standalone Diagnostic CD-ROM can be downloaded here.

Diagnostics, which are available for AIX and Linux systems and logical partitions, can help you perform hardware analysis. If a problem is found, you will receive a service request number (SRN) or a service reference code (SRC) that can help pinpoint the problem and determine a corrective action. Additionally, there are various service aids in the diagnostics that can help you with service tasks on the system or logical partition.

You can run the IBM Standalone Diagnostic the following ways:

* Running the eServer stand-alone diagnostics from CD-ROM

* Running the eServer stand-alone diagnostics from a Network Installation Management server

I downloaded the standalone diagnostic CD and was able to burn the .iso image to physical media and boot from it. From there I went in and changed the pdisks to hdisks, formatting them as JBOD disks. Then I swapped CDs and put the install media back in the drive. AIX was able to recognize the disks, the OS was installed, and the test LPAR was easily built.

A System Outage, and the Failures that Led to It

Edit: Some links no longer work.

Originally posted October 21, 2014 on AIXchange

Old Power servers just run. Most of us know of machines that sat in a corner and did their thing for many years. However, as impressive as Power hardware is, running an old, unsupported production server with an old, unsupported operating system isn’t advisable. This is one such story of a customer and its old, dusty machine that sat in a back room.

This customer had no maintenance; they simply hoped their box would continue to hum along. To me, that’s like taking your car to a shop and telling the mechanic: “The check engine light has been on for years, and I’ve never changed the oil or checked the tires. Why are you charging me so much to fix this?”

I imagine some of you are thinking that older applications can be kept in place by running versioned AIX 5.2 or AIX 5.3 WPARs on AIX 7. That option wasn’t selected in this case, however. This was a server running AIX 5.2 with a pair of old internal SCSI disks that were mirrored together in rootvg. Eventually, one of those disks began to fail.

When did it begin to fail? No one knew, because no one monitored the error logs. When the machine finally had enough, it crashed. Reboots would stop at LED 0518. In isolation, that’s no big deal. Just boot the machine into maintenance mode and run fsck.

In this case though, going into maintenance mode only resulted in more unanswerable questions. Where’s the install media? No one knew. No one knew where the most recent mksysb was, either. Ditto for the whereabouts of the keyboard for the console. No one knew. Time to start sweating.

Because this was a standalone server, there was no NIM server. Because it was a production machine, the outage affected several locations. Booting an older version of AIX and then trying to recover a newer version on rootvg is often problematic, and this instance was no exception. Though the customer could get AIX 5.2 media shipped to them from another location, they’d have to wait a day, and there was no guarantee that this version would be at the same level as the operating system they were using.

It turns out this customer was very, very fortunate, because someone somehow located a 4-year-old mksysb tape. The machine booted from the tape drive and the customer was able to get it into maintenance mode, access the root volume group and run fsck on the rootvg filesystems. Some errors were corrected and the machine was able to boot. From there it was a relatively simple matter of unmirroring the bad disk and replacing it with a new one.
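For anyone who hasn't replaced a failed disk in a mirrored rootvg, here's a rough sketch of a commonly used AIX command sequence. It's a hypothetical dry-run helper (print_rootvg_disk_swap is my own name, not an AIX command) that only prints the commands so you can review them. It assumes the first argument is the healthy disk, the second is the failing one, the replacement shows up under the same hdisk name after cfgmgr, and nothing unmirrored (a dump device, for example) lives only on the failing disk. It isn't necessarily exactly what this customer ran, so check IBM's procedure for your AIX level before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print (don't execute) a typical command sequence for
# replacing a failed disk in a mirrored AIX rootvg.
# Assumptions (hypothetical): $1 = surviving disk, $2 = failing disk, and the
# new disk is configured under the same hdisk name after cfgmgr.
print_rootvg_disk_swap() {
    good=$1
    bad=$2
    if [ -z "$good" ] || [ -z "$bad" ]; then
        echo "usage: print_rootvg_disk_swap <good_disk> <bad_disk>" >&2
        return 1
    fi
    cat <<EOF
unmirrorvg rootvg $bad
chpv -c $bad
reducevg rootvg $bad
rmdev -dl $bad
# ...physically replace the disk here...
cfgmgr
extendvg rootvg $bad
mirrorvg rootvg $bad
bosboot -ad /dev/$bad
bootlist -m normal $good $bad
EOF
}

print_rootvg_disk_swap hdisk0 hdisk1
```

Printing rather than executing keeps the sketch safe to try anywhere; on a real system you'd run each step by hand and check its result before moving on.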

While naturally I’m happy that this customer resolved their issue, I present this story as a cautionary tale. Think of all the things that went neglected prior to the disk failure. Filesystems and errpt weren’t monitored. While nightly data backups were being taken, there were no recent mksysb backups. It’s possible that the last mksysb was taken at the time of system installation. There were no OS disks on hand. Only luck kept this customer from experiencing substantial downtime and losing significant business.
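Even a crude automated check would have flagged the missing mksysb long before the crash. Below is a minimal sketch, assuming your mksysb images are written to files (on a NIM server or NFS share, say) rather than only to tape; mksysb_is_stale, the path /backup/myhost.mksysb and the 30-day threshold are hypothetical placeholders, not IBM recommendations.

```shell
#!/bin/sh
# Sketch: report whether a mksysb image file is missing or older than N days.
# The function name, path and threshold are hypothetical placeholders.
mksysb_is_stale() {
    file=$1
    max_days=$2
    # A missing backup counts as stale.
    [ -f "$file" ] || return 0
    # find prints the file only if it was last modified more than
    # max_days days ago (find counts age in whole days).
    [ -n "$(find "$file" -prune -mtime +"$max_days")" ]
}

image=/backup/myhost.mksysb   # hypothetical location
if mksysb_is_stale "$image" 30; then
    echo "WARNING: $image is missing or older than 30 days"
fi
```

Run from cron, something like this turns "no one knew where the most recent mksysb was" into an email someone actually sees.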

Now consider your environment. Do you occasionally take the time to restore your critical systems on a test basis, just to prove that you could restore them in an actual emergency? If you couldn’t boot a critical system, could you recover it? How long would it take?

More on the New HMC Release

Edit: I still enjoy the new interfaces. Some links no longer work.

Originally posted October 14, 2014 on AIXchange

After writing about the HMC’s new look, I found that one of my options wasn’t working as expected. When I clicked on Manage PowerVM, I got this error.

I opened a ticket with IBM Support, which sent me to this link to run this procedure:

The apply of V8R8.1.0 Service Pack 1 may cause some VIOS related tasks to fail. Impacted HMC tasks include Manage PowerVM and Manage partitions task in the new “enhanced GUI” as well as the Performance and Capacity Monitor (PCM). External applications using the HMC REST API such as IBM PowerVC are also impacted. The error text will typically include the error message “3003c 2610-366 The action array contains an undefined action name at index 0: VioService.”

Contact IBM support for the circumvention until a fix is available.

Symptom: The apply of V8R8.1.0 Service Pack 1 may cause some VIOS RMC related tasks to fail. Impacted HMC tasks include Manage PowerVM and Manage partitions task in the new “enhanced GUI” as well as the Performance and Capacity Monitor (PCM). External applications using the HMC REST API such as IBM PowerVC are also impacted.

For example, the Manage PowerVM task fails with the error:

Exception occurred in Querying for Media Repositories from vios P8TVIO1 with ID 1 in CEC 8286-41A*TU20305 – Network interruption occurs while RMC is waiting for the command execution on the partition to finish. The operation might have caused CPU starvation or network disruption. The operation could have completed successfully. (3003c 2610-366 The action array contains an undefined action name at index 0: VioService. )

This procedure requires you to get a root shell on the HMC, which usually requires the assistance of IBM Support.

However, right around the time of this error, I saw a post on IBM developerWorks that described another way to become root using the ‘shellshock’ bash security bug. (That post has since been removed, and HMC fixes for the bash vulnerability are available from IBM’s Fix Central.)

At the end of the instructions, you’re told to wait for the RMC connections to restart. In my case, nothing happened. I rebooted the HMC, and found this new error:

Support pointed me to this procedure, but then recommended I apply the fix:

“Querying VIOS using the HMC REST API fails with “Exception occurred in Querying for Media Repositories from vios <vios name> with ID <vios lpar id> in CEC <server MTMS> – Unable to connect to Database”. This error usually indicates that the VIOS database is corrupt and needs rebuilt.”

“At VIOS level 2.2.3.1, the resolution is to apply fix VIOS_2.2.3.1-IV52899m1a. No further recovery is needed.”

Since I was already at 2.2.3.3, which includes that fix, this seemed like an odd suggestion. However, in the interest of completeness, I wanted to mention the procedure itself. Perhaps it will help with any issues you might encounter.

I decided to update the firmware on the server and reboot all of the LPARs. The Manage PowerVM option started working. Here are some screen shots.

Right-clicking on the active VIO server brought me to a view that looked pretty similar to what I was used to:

When I went to adapter view, it appeared something was missing:

I’ve been able to use the GUI to set up server templates and shared Ethernet adapters, all without needing to log in as padmin to the VIO servers. Keep in mind that the “classic” mode still works exactly as you’re used to. The same is true for all of the VIO commands you’re used to.

As I continue to learn about this HMC code, I’ll pass along more information.

POWER8 E870 and E880 Offer Impressive Performance

Edit: Link no longer works. This announcement feels like it happened yesterday.

Originally posted October 3, 2014 on AIXchange

New POWER8 server models were announced today: the scale-up E870 (9119-MME) and E880 (9119-MHE), along with an Ubuntu Linux-only model called the S824L. The E models (the “E” denotes “enterprise”) will have I/O drawers available, and an IBM Statement of Direction (SoD) indicates that I/O drawers will be available for the S models in 2015. The E870 and E880 will be generally available Nov. 18. This blog post provides details on the E models.

These new systems are a blend of the 795 and 780/770s. Architecturally these new machines are similar to the Power 795, but the packaging in a 19-inch rack with multiple CECs is similar to the Power 780/770. The preliminary CPW and rPerf numbers I saw during training (which were still being tested and confirmed) were substantial and impressive. I’m sure more detail on these numbers will be published; it wasn’t available to me at the time of writing.

The E870 is available as a one- or two-node system. The E880 will eventually be available as a one-, two-, three- or four-node system, although at GA it will only be available as a one- or two-node system; the three- and four-node configurations are planned to GA in June 2015.

Each node in the E870 and E880 will have eight PCIe Gen3 x16 slots for low-profile PCIe adapters; optionally, these slots can be used as optical interfaces to the I/O drawers. The nodes are 5U in size and come with different core densities and speeds. The E880 will have a 32-core 4.35 GHz option, with an SoD for a 48-core node. The E870 will have a 40-core 4.19 GHz option or a 32-core 4.02 GHz option. All nodes in a server must have identical processors; you cannot mix and match nodes. This means the maximum for the E880 with 32-core nodes will be 64 cores at GA and 128 cores in 2015, once four-node configurations are available. An SoD indicates IBM plans for 192 cores in 2015 using 48-core nodes. The E870 will have a maximum of 80 cores with the 40-core node, and 64 cores with the 32-core node.
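Because nodes can't be mixed, the core maximums above are just node count times cores per node. Here's a small sketch that reproduces them and rejects node counts a model won't support; max_cores is a hypothetical helper, and the E880's four-node limit assumes the planned June 2015 availability.

```shell
#!/bin/sh
# Sketch: maximum cores = nodes x cores per node (all nodes must be identical).
# Node limits per model are taken from the announcement described above.
max_cores() {
    model=$1
    nodes=$2
    cores_per_node=$3
    case $model in
        E870) max_nodes=2 ;;
        E880) max_nodes=4 ;;   # three- and four-node configs planned for 2015
        *) echo "unknown model: $model" >&2; return 1 ;;
    esac
    if [ "$nodes" -lt 1 ] || [ "$nodes" -gt "$max_nodes" ]; then
        echo "$model supports 1-$max_nodes nodes" >&2
        return 1
    fi
    echo $((nodes * cores_per_node))
}

echo "E880, two 32-core nodes at GA:   $(max_cores E880 2 32) cores"
echo "E880, four 48-core nodes (SoD):  $(max_cores E880 4 48) cores"
echo "E870, two 40-core nodes:         $(max_cores E870 2 40) cores"
```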

There are 32 memory slots per node. These systems use custom DIMMs running at 1600 MHz DDR3, with the E870 going up to 2 TB per node (with an SoD taking it to 4 TB per node in 2015) and the E880 going up to 4 TB per node with the largest DIMM sizes currently available.

The node has no integrated SAS bays or SAS controllers, no integrated DVD bay or DVD controller, no integrated Ethernet and no tape bay. It's strictly for power supplies, CPUs, memory and PCIe slots.

A new concept is the system control unit, a 2U drawer that connects to the server at the midplane and must be physically adjacent to the system nodes. It holds the service processors, the HMC ports, the master system clocks, the operator panel, the VPD and an optional DVD. The system control unit also contains redundant power, a hot-plug clock and a battery. The idea is that the important components in the system control unit are redundant, so none of them is a single point of failure for the machine.

A one-node E870 or E880 will take up 7U in a 19-inch rack and two nodes will take up 12U; eventually, three nodes will take up 17U and four nodes 22U. IBM recommends leaving 1U open at the top and/or bottom of the rack for easier cable management. IBM also recommends mounting 1U power distribution units (PDUs) horizontally, rather than using the vertical PDUs in the rack's side pockets that many of us are used to, to make cabling easier.
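Those rack-space figures work out to 5U per node plus the 2U system control unit. A hypothetical helper that reproduces them (it leaves out the optional 1U of free space IBM suggests for cabling):

```shell
#!/bin/sh
# Sketch: rack units for an E870/E880 = 2U system control unit + 5U per node.
rack_units() {
    nodes=$1
    echo $((2 + 5 * nodes))
}

for n in 1 2 3 4; do
    echo "$n node(s): $(rack_units "$n")U"
done
```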

The I/O expansion drawer connects to a node through two of the node's PCIe slots via optical cabling. For each drawer you attach, you gain 12 slots but effectively lose two slots on the system node, for a net gain of 10 slots per I/O drawer. With this first announcement, you can attach up to two I/O drawers per node, for a total of four per system in 2014. Attaching two drawers per node to a two-node system (four drawers in total) gives you 56 total I/O slots. The SoD states that IBM plans to support up to four I/O drawers per node, which would take a two-node system to eight I/O drawers and 96 I/O slots. For this 2014 announcement, you can have either zero or two drawers per node; there is no option to install just three drawers, for example. IBM issued an SoD for I/O drawers to connect to the S models, but that won't be available until next year, and new firmware will be required to take advantage of the drawers.
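The slot math is easy to get wrong, so here's a sketch that reproduces the 56- and 96-slot figures: eight slots per node, minus two per attached drawer, plus 12 per drawer. io_slots is a hypothetical helper; allowing four drawers per node reflects the SoD, not what ships in 2014.

```shell
#!/bin/sh
# Sketch: total PCIe slots for an E870/E880 with I/O expansion drawers.
# Each node has 8 slots; each drawer consumes 2 node slots and adds 12.
io_slots() {
    nodes=$1
    drawers_per_node=$2
    case $drawers_per_node in
        0|2) ;;   # supported at the 2014 announcement
        4) ;;     # per the SoD only
        *) echo "unsupported drawers per node: $drawers_per_node" >&2; return 1 ;;
    esac
    drawers=$((nodes * drawers_per_node))
    echo $((nodes * 8 - drawers * 2 + drawers * 12))
}

echo "Two nodes, two drawers per node:  $(io_slots 2 2) slots"
echo "Two nodes, four drawers per node: $(io_slots 2 4) slots (SoD)"
```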

Using physical I/O you can run: 
AIX
AIX 7.1 TL3 SP4 with APAR IV63332 or later
AIX 7.1 TL2 SP6 or later (Jan 2015)
AIX 6.1 TL9 SP4 with APAR IV63331 or later
AIX 6.1 TL8 SP6 or later (Jan 2015)

With VIOS you can run:
AIX 7.1 TL2 SP1 or later
AIX 7.1 TL3 SP1 or later
AIX 6.1 TL8 SP1 or later
AIX 6.1 TL9 SP1 or later

IBM i
IBM i 7.2 TR1 or later
IBM i 7.1 TR9 or later

Linux
RHEL 6.5 or later
SUSE 11 SP3 or later

VIOS
VIOS 2.2.3.4 with ifix IV63331 or later
VIOS 2.2.2.6 or later (Jan 2015)

The system firmware level will be 8.2.0.
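If you want to check a VIO server against a minimum level such as the 2.2.3.4 listed above, a field-by-field version comparison helps, since plain string comparison gets "2.2.10.0" wrong. This is a minimal POSIX sh sketch; version_ge is a hypothetical helper, and the usage assumes you're in the VIOS root shell (via oem_setup_env), where /usr/ios/cli/ios.level holds the level that ioslevel reports. It doesn't check for the required ifix; you'd confirm that separately (for example, with emgr -l).

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if dotted version $1 >= $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    _a=$1
    _b=$2
    while [ -n "$_a" ] || [ -n "$_b" ]; do
        _x=${_a%%.*}
        _y=${_b%%.*}
        [ -n "$_x" ] || _x=0
        [ -n "$_y" ] || _y=0
        [ "$_x" -gt "$_y" ] && return 0
        [ "$_x" -lt "$_y" ] && return 1
        # Drop the field just compared.
        case $_a in *.*) _a=${_a#*.} ;; *) _a= ;; esac
        case $_b in *.*) _b=${_b#*.} ;; *) _b= ;; esac
    done
    return 0
}

# Hypothetical usage from the VIOS root shell:
if [ -r /usr/ios/cli/ios.level ]; then
    level=$(cat /usr/ios/cli/ios.level)
    if version_ge "$level" 2.2.3.4; then
        echo "VIOS $level meets the 2.2.3.4 minimum (still check the ifix)"
    else
        echo "VIOS $level is below 2.2.3.4"
    fi
fi
```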

Other items of note in the announcement include:

  • If you want to do a model upgrade and retain the same serial number you can migrate a 770 D model to an E870, and you can upgrade a 780 D model to an E880.
  • The 5887 EXP24S I/O drawer is supported on these new machines, and if you want internal boot disks, this drawer is going to be the method you use to achieve that.
  • The PVU for the E870 and E880 will be 120. For AIX, these machines will be in the medium software tier; for IBM i, they will be P30 machines.
  • Because these servers will pack a great deal of compute capability in a small footprint, you can definitely hear the fans, especially when they speed up to handle additional load. You may want to consider acoustic doors in your racks.
  • A new S824L model is planned to GA on Oct. 31. It is designed for high-performance analytics, big data and Java application workloads. It will incorporate NVIDIA El Capitan K40 GPU adapters and will run Ubuntu 14.10 exclusively. Virtualization will not be available for this machine.
  • Maximum memory doubles on the S824: via an RPQ, you will be able to get 2 TB into the machine with 128 GB DIMMs, but mixing DIMM sizes on the machine isn’t allowed.
  • The S822L and S822 are NEBS Level-3 and ETSI certified for use by clients that require a hardened infrastructure; they are designed for “extreme shock, vibration, and thermal conditions which exceed normal data center design standards.”
  • An RPQ is available to allow 900W 100-120V power supply options for four-core or six-core S814 rack-mounted servers.

These are just some of the highlights from the announcements. I have been to a few training sessions so far and there is much more information than I could cover here, but I wanted to give you a flavor of what's coming in the near future. You can read the IBM news release here.

Power Systems, Linux on Power Events Scheduled

Edit: I love this type of training.

Originally posted September 30, 2014 on AIXchange

I recently attended a no-charge Linux on Power workshop that’s currently touring the U.S. In conjunction with this 1-day workshop, a 2-day Power Systems virtualization class is traveling the country as well. Both events are geared toward IBM customers, so see your local IBM rep or business partner representative to be nominated to attend.

IBM has also said that if there’s sufficient demand (12-20 participants), the company will attempt to add events in cities that aren’t currently on the workshop schedule. Email me and I’ll get you in touch with the workshop coordinators.

Here’s the current schedule:

St. Louis
Power virtualization class: Sept. 29-30
LoP workshop: Oct. 1

Coral Gables, Fla. (Miami)
Power virtualization: Oct. 21-22
LoP: Oct. 23

Costa Mesa, Calif. (southern California)
Power virtualization: Oct. 28-29
LoP: Oct. 30

Bethesda, Md. (Washington, D.C.)
Power virtualization: Nov. 4-5
LoP: Nov. 6

Jacksonville, Fla.
Power virtualization: Nov. 11-12
LoP: Nov. 13

Malvern, Pa. (Philadelphia)
LoP: Nov. 13

Schaumburg, Ill. (Chicago)
Power virtualization: Nov. 18-19
LoP: Nov. 20

These details are provided by IBM:

 Linux on Power objectives

• Provide a Linux on POWER experience

• Maximum hands-on engagement

• You perform activity AS we present

• The Lecture IS the lab

• Build confidence in the ability to deploy Linux on POWER

• Convey IBM’s renewed commitment to Linux on POWER

• This is not a PowerVM class

• We can speak to any PowerVM questions you have

• PowerVM is not the core of the content

• Your class LPARs and their virtualization are already configured

Linux on Power Agenda

• Lab Introduction

• ISO Media install of Red Hat 6.5 on LPAR

• Linux on POWER Trends and Directions

• Network install of Red Hat 6.5 on LPAR

• Filesystems and LVM

• Graphical Desktop

• Commands and Additional Information

Power Systems Virtualization Workshop Highlights:

Overview of POWER Architecture, Power Systems servers, and virtualization concepts

Management Appliance architecture and functions

Hardware Management Console (HMC)

Flex System Manager (FSM)

Virtual Machine creation

Partitioning Configuration and requirements

PowerVM Enterprise Edition virtualization concepts

Micro-Partitioning

LPARs / Virtual Machines

Memory virtualization (AME / AMS)

Shared Storage Pools

PowerVM Live Partition Mobility (LPM / Migration)

Introduction to PowerVC

Introduction to SmartCloud Entry for Power

From my first-hand experience, I can certainly recommend the Linux on Power workshop I attended. I believe both events are a good opportunity to gain more skills and bring back hands-on experience and knowledge to your organization.

A New Look for the HMC

Edit: The interface keeps changing for the better.

Originally posted September 23, 2014 on AIXchange

I decided to update to the latest (as of this writing) HMC code, V8R8.1.0M1.

I went to IBM Fix Central and found MH01420, and downloaded the necessary .iso image.

Once I had done this, I followed the second half of this document. When I clicked on “update HMC,” I just pointed it to the server I’d downloaded the .iso image to.  It very quickly did the updates, then rebooted.

A normal HMC reboot is generally fairly fast, but after an upgrade it takes much longer, so be patient. After the reboot, it also takes a while for the console itself to start. Once it did, I immediately noticed that the console had a different look and feel.


Clicking the “learn more” link at the bottom of the login screen opens a help window that displays the differences between the “classic” and “enhanced” login tasks. Learn more about this here:

“Learn about the differences between the Classic and Enhanced graphical user interface (GUI) in the Hardware Management Console (HMC).

Select which software interface to use when you log in to the HMC. The Classic interface provides access to all traditional functions of the HMC and the Enhanced interface provides both redesigned and new virtualization tasks and functions.

The Classic GUI is available by default on the HMC Version 8.1.0, or earlier.

The Classic GUI is available on the HMC Version 8.1.0.1, or later by choosing the Classic option while logging into the HMC.

The Enhanced GUI is available on the HMC Version 8.1.0.1, or later by choosing the Enhanced option while logging into the HMC.”

Be sure to check out the table from the link to see the differences between the Classic and Enhanced GUI tasks.

Initially, things didn’t appear all that different in the Enhanced GUI, but when I selected the server I wanted to manage, I got some different menu options.

 Once I expanded them all, this was what I saw.

The rest of the menu did not fit in the screenshot. The following items appear beneath the Capacity on Demand menu item; the menu continues along the right side of the panel.

I immediately tried the performance option, and was pleasantly surprised with what I saw.

I also took a look at Manage PowerVM, and saw this.

The goal here is to provide the capability to do all of your management from the HMC, with no need to log in to the VIO server.

At the bottom of the Manage PowerVM task there was a learn more button.  I clicked and saw this. 

When I clicked on Create Partition from Template I saw:

When you click on a partition, you now get a new set of menus. 

The manage menu gives you a new way to look at your profiles.
 

Click on advanced and you’ll see this.


I’ve just described getting started. I’ll do more testing and exploring with this new interface. If you have something specific you’d like me to try out/write about, please let me know in comments.

Power and AIX News Via Twitter

Edit: I am still active on twitter. Some links no longer work.

Originally posted September 17, 2014 on AIXchange

While I remain active on Twitter, it’s been a while since I’ve highlighted tweets on this blog. For this week though, here’s a sampling of fairly recent tweets that caught my interest:

 * Torbjörn Appehl (@tappehl) — Some nice news regarding direct attached external storage for #ibmi on #powersystems 

* Mike Krafick (@MKrafick) — Quick Tip: Monitoring memory on a #AIX server. So easy, even a #DB2 DBA can do it! 

* Jyoti Dodhia (@JyotiDodhia) — TIP: #VIOS networking tips and techniques by @GlennRobinsonVS

* Brian Smith (@brian_smi) — Using #AIX’s built in Performance Recording 

* Nigel Griffiths (@mr_nmon) — New #SharedStoragePool video: Experiments in Repository disk destruction shows SSP carries on & its easy to remake.

* Jyoti Dodhia (@JyotiDodhia) –HOT >> Replay: #Linux on Power for #AIX / #IBM i guys – Doing it Easy Way with @mr_nmon

* COMMON A Users Group (@COMMONug) — FREE webcast “IBM Power Announcements – #ibmi and Power” by @IBMiSight, @Steve_Will_IBMi, Mark Olson: Oct 6 @ 9am CT 

* Gareth Coates (@power_gaz)– What’s in #IBMPowerSystems #HMC V8R810 SP1? See [here]. Classic and Enhanced GUI, System and LPAR Templates etc. I like it!

* chmod666.org (@chmod666) — New post .. finally : Exploit the full potential of #PowerVC by using Shared Storage Pools & Linked Clones

* Site Ox (@siteox) — Linux on Power8 is available NOW at Site Ox. Free 2 Weeks! Site Ox is the official provider of Linux on Power for IBM. 

As I’ve often said, Twitter has much to offer all of us who work with IBM Power Systems and AIX. So who are you following on Twitter? How active are you? And for those of you who don’t use Twitter, how do you keep current with the goings-on in the world of Power and AIX?

Locating a Problematic Filesystem

Edit: Some links no longer work.

Originally posted September 9, 2014 on AIXchange

It was an ordinary day. I needed to take a mksysb. Only this time, I was getting an error.

            /usr/bin/mkszfile[1266]: FS_MIN_LOG = FS_MIN_LOG * 20480 : 0403-009 The specified number is not valid for this command.

            0512-008 mksysb: The mkszfile command failed. Backup canceled.

I ran ps -ef | grep mkszfile and saw that the process was still there but wasn’t doing anything, so I killed it.

The error message didn’t tell me much, but fortunately a quick web search yielded a few different ideas and suggestions, including this. Then I found an entry from this blog (that advertises “Unix tips, food reviews and astronomy”):

“A google search revealed it was probably a bad FS causing the problem. To identify which one(s), I ran the following: sh -x /usr/bin/mkszfile

“This gave the full output and I could see which file system it was processing when it crashed. I then unmounted the file system, [ran fsck] and remounted it before re-running the mkszfile.

“In this case there were four file systems it complained about and after [running fsck] them the mkszfile ran through ok. A re-run of the mksysb then worked ok.”

That seemed simple enough, so I gave it a try. It went exactly as described in the blog post: in the trace output, the filesystem causing my problem was the last one processed before the error occurred. Luckily, that filesystem wasn’t in use at the time, so I just unmounted it, ran fsck -y /filesystem and then remounted it. After that, the mksysb worked as expected.
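If you'd rather not scroll through the whole trace, a small wrapper can show just its tail, which is where the problem filesystem appeared in this case. This is a sketch with a hypothetical helper name (trace_tail); it relies only on sh -x writing its trace to stderr. The AIX-specific lines are left as comments, and /filesystem is the same placeholder used above.

```shell
#!/bin/sh
# Sketch: run a shell script with tracing enabled (sh -x writes the trace to
# stderr) and show only the last lines of the combined output and trace.
trace_tail() {
    script=$1
    lines=${2:-20}
    sh -x "$script" 2>&1 | tail -n "$lines"
}

# On AIX, as root:
#   trace_tail /usr/bin/mkszfile 40
# Then, for the filesystem named at the end of the trace, and only if
# nothing is using it:
#   umount /filesystem && fsck -y /filesystem && mount /filesystem
```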

Now when the next person does a web search on this error code, there will be two sources confirming that running sh -x /usr/bin/mkszfile is the way to locate the filesystem that’s causing the problem.

vSCSI vs. NPIV

Edit: Most people seem to do everything with NPIV these days.

Originally posted September 2, 2014 on AIXchange

The IBM Redbook, “PowerVM Best Practices,” has a detailed look at mixing vSCSI and NPIV on VIO client LPARs.

From Section 5.1.3:

“It is possible to mix a virtual Small Computer System Interface (SCSI) and N-Port ID Virtualization (NPIV) within the same virtual I/O client. You can have rootvg or booting devices that are mapped via virtual SCSI adapters, and data volumes that are mapped via NPIV.

“Mixing NPIV and a virtual SCSI has advantages and disadvantages, as shown in Table 5-1.

Advantages

* It makes multipathing software updates easier for data disks.

* You can use the Virtual I/O Server to perform problem determination when virtual I/O clients have booting issues.

Disadvantages

* Requires extra management at the Virtual I/O Server level.

* Live Partition Mobility (LPM) is easier with NPIV.”

What’s your preference? Do you want your SAN guys to provide all your LUNs via NPIV and manage the same multipath drivers on the client for both rootvg and datavg? Or would you rather manage your rootvg multipath drivers on your VIO server, map up the rootvg disks to the clients via vSCSI and use NPIV for your data LUNs?

I prefer to use vSCSI for rootvg. I want to boot my VIO server from its internal disks, map some LUNs to the VIO server to use for my client LPARs' rootvg, and then map my data disks via NPIV to the client LPARs. This allows me to troubleshoot by booting my VIO servers locally, and to boot my LPARs “locally” via vSCSI.

When I need to update multipath software on the client LPARs, I’m not dealing with a chicken-and-egg dilemma where I’m booting my machine using the same multipath software I now need to update.

When I need to update my client rootvg multipath software, I’m updating my VIO server, which also booted locally. At no time am I “changing the tire while the car is speeding down the road,” as might be necessary if I updated drivers when booting my client using NPIV.

Yes, doing it this way requires more effort compared to simply having your SAN team map everything to your clients. In the end though, I believe the benefits outweigh the burdens.
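For reference, here's roughly what the VIOS side of that layout looks like as padmin commands: the rootvg LUN mapped to the client through a vhost adapter, and a physical Fibre Channel port mapped to the client's virtual Fibre Channel adapter so the SAN team can zone data LUNs straight to the client's WWPNs. It's a dry-run sketch; print_vios_mappings and every device name in the usage line are hypothetical, so check your own names with lsdev and lsmap first.

```shell
#!/bin/sh
# Dry-run sketch: print padmin commands for a client whose rootvg comes via
# vSCSI and whose data LUNs come via NPIV. All device names are hypothetical.
print_vios_mappings() {
    rootvg_disk=$1   # LUN on the VIO server that becomes the client's rootvg
    vhost=$2         # virtual SCSI server adapter for the client
    vtd_name=$3      # name for the new virtual target device
    vfchost=$4       # virtual Fibre Channel server adapter for the client
    fcs=$5           # physical Fibre Channel port on the VIO server
    cat <<EOF
mkvdev -vdev $rootvg_disk -vadapter $vhost -dev $vtd_name
vfcmap -vadapter $vfchost -fcp $fcs
lsmap -vadapter $vhost
lsmap -all -npiv
EOF
}

print_vios_mappings hdisk4 vhost0 lpar1_rootvg vfchost0 fcs0
```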

If you disagree, feel free to make your case for NPIV in comments. I’ll also accept input from anyone who wants to back me up on vSCSI.

Useful Storage Links

Edit: Some links no longer work.

Originally posted August 26, 2014 on AIXchange

Here’s an assortment of really good storage-related articles — the majority of which are found on IBM developerWorks — that are worth your time. While some of them are a few years old, they still provide relevant information.

“Guide to selecting a multipathing path control module for AIX or VIOS.”

“Using the AIX Logical Volume Manager to perform SAN storage migrations.”

“IBM AIX SAN Volume Controller update and migration.”

“IBM AIX MPIO: Best practices and considerations.”

“Tracing IBM AIX hdisks back to IBM System Storage SAN Volume Controller (SVC) volumes.”

“Shuffling disk data around.”

“AIX and VIOS Disk And Fibre Channel Adapter Queue Tuning.”

“Move data quickly between AIX LPARs using Logical Volume Manager.”

“Tip: Online migration of a file system to a smaller physical volume.”

If you know of other useful storage-related articles, please cite them in comments.

More Resources for AIX Newbies

Edit: I wonder how many newbies there are year over year.

Originally posted August 19, 2014 on AIXchange

As I’ve noted previously, there are more newcomers to the AIX platform than you might imagine. A company may acquire an AIX system through a merger or replace an old Solaris or HP-UX box with a current IBM Power Systems model. As a result, one of their IT pros suddenly becomes the AIX guy. So, now what? How does an AIX newbie get up to speed with virtualization and AIX?

 I’ve mentioned the QuickSheets and QuickStarts from William Favorite. I’ve also highlighted conferences, classes and free monthly user group meetings that you can look into. Recently though, I was pointed to this old IBM web page featuring various AIX learning resources. I call it old because some of the links no longer work, but what’s still available is surprisingly useful.

Some of the material covers concepts from AIX 5.3, but much of this information remains valid today. It’s also nice that some of the links take you to current Redbook offerings and IBM training courses.

The working links cover:

* AIX security and migration (this is AIX 5.3 material)

* Virtualization introduction

* Systems Director

* Power Systems Redbooks (updated here)

* IT technical training

* IBM business partner training

* IBM professional certification

On a related note, I’ve always believed that the simplest thing employers can do to help their IT staff members get started with AIX or any operating system that’s new to them is to invest in a small lab/sandbox machine and HMC.

I’m continually amazed to see companies spend big bucks on the latest hardware and software, but then neglect to foot the bill for additional test systems. It’s great that some companies devote an LPAR or two to testing, but you can only do so much in that environment. (In addition, there can be pressure to repurpose virtual test labs into running other production workloads. Then before you know it, the production needs grow so critical that these LPARs are made off-limits to reboots and testing.)

With Windows and x86 Linux servers especially, it’s relatively easy and cheap to get access to test machines. I also know of people who’ve purchased old Power hardware on eBay just to have something that they can run AIX on.

With actual test boxes, you can safely reboot servers, install firmware and upgrade operating systems without touching production. If you make a mistake on a test system, not only haven’t you hurt anything, you’ve learned a valuable lesson.

How do you learn, and keep learning? How do you stay current with your skills? If your machine is happily running along and you have little need to touch it, how can you ever expect to be able to support the machine when an issue hits?

Connecting Your HMC to IBM Support

Edit: I would imagine some ports have changed in the last 6 years.

Originally posted August 12, 2014 on AIXchange

You’ve been asked to connect your HMC to IBM Support. The network team wants to know about the different connectivity options. They need to know which IP addresses must be opened across the firewall.

What do you do? First, read this:

 “This document describes data that is exchanged between the Hardware Management Console (HMC) and the IBM Service Delivery Center (SDC) and the methods and protocols for this exchange. This includes the configuration of Call Home (Electronic Service Agent) on the HMC for automatic hardware error reporting. All the functionality that is described herein refers to Power Systems HMC version V6.1.0 and later as well as the HMC used for the IBM Storage System DS8000.

“Outbound configurations are used to configure the HMC to connect back to IBM. The HMC uses the IBM Electronic Service Agent tool to connect to IBM for various situations including reporting problems, reporting inventory, transmitting error data, and retrieving system fixes. The types of data the HMC sends to IBM are covered in more detail in Section 4.”

Included are diagrams that show different scenarios for sending data to IBM, including with or without a proxy server, using a VPN, or even using a modem (though IBM recommends Internet connectivity). Specific options include pass-through server connectivity, multi-hop VPN and remote modem. IBM states that there are no inbound communications; all communications are outbound only.

Further, IBM explains why your machine may need to “call home”:

            * To report to IBM a problem with the HMC or one of the systems it’s managing.

            * To download fixes for systems managed by the HMC.

            * To report to IBM inventory and system configuration information.

            * To send extended error data for analysis by IBM.

            * To close an open problem.

            * To report heartbeat and status of monitored systems.

            * To send performance and utilization data for system I/O, network, memory, and processors.

There’s also a list of the files that are sent to IBM, and the authors point out that no client data is sent to IBM.

On that note, here’s IBM’s statement on data retention:

“When Electronic Service Agent on the HMC opens up a problem report for itself, or one the systems that it manages, that report will be called home to IBM. All the information in that report will be stored for up to 60 days after the problem has been closed. Problem data that is associated with that problem report will also be called home and stored. That information and any other associated packages will be stored for up to three days and then deleted automatically. Support Engineers that are actively working on a problem may offload the data for debugging purposes and then delete it when finished. Hardware inventory reports and other various performance and utilization data may be stored for many years.

“When the HMC sends data to IBM for a problem, the HMC will receive back a problem management hardware number. This number will be associated with the serviceable event that was opened. The HMC may also receive a filter table that is used to prevent duplicate problems from being reported over and over again.”

Finally, there’s a list of the IP addresses that must be allowed through any firewalls. All connections use TCP port 443:

            Americas

            • 129.42.160.48

            • 129.42.160.49

            • 207.25.252.200

            • 207.25.252.204

            Non-Americas

            • 129.42.160.48

            • 129.42.160.50

            • 207.25.252.200

            • 207.25.252.205
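Before handing this list to the network team, it can help to confirm which addresses are actually reachable. Here is a minimal sketch, assuming a Linux admin host on the same network segment as the HMC, with bash (for its /dev/tcp redirection) and the coreutils timeout command; the function names are hypothetical. A successful connection only shows that the firewall passes TCP 443 to that address, not that Electronic Service Agent itself is working.

```shell
# Hypothetical helper names; the addresses are the ones IBM lists above.
esa_targets() {
    # Print the addresses for one region: "americas" or "other" (non-Americas).
    case "$1" in
        americas) printf '%s\n' 129.42.160.48 129.42.160.49 207.25.252.200 207.25.252.204 ;;
        other)    printf '%s\n' 129.42.160.48 129.42.160.50 207.25.252.200 207.25.252.205 ;;
        *)        echo "usage: esa_targets americas|other" >&2; return 1 ;;
    esac
}

probe_443() {
    # Succeeds if a TCP connection to $1 on port 443 opens within 5 seconds.
    # Needs bash (for /dev/tcp) and coreutils "timeout".
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/443" 2>/dev/null
}

check_region() {
    # Report each address as OK or BLOCKED; return non-zero if any failed.
    targets=$(esa_targets "$1") || return 1
    rc=0
    for ip in $targets; do
        if probe_443 "$ip"; then
            echo "OK      $ip:443"
        else
            echo "BLOCKED $ip:443"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_region americas
```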

IBM adds that when an inbound remote service connection to the HMC is active, only these ports are allowed through the firewall for TCP and UDP:

            * 22, 23, 2125, 2300 — These ports are used for access to the HMC.

            * 9090, 9735, 9940, 30000-30009 — These ports are used for Web-based System Manager (POWER5).

            * 443, 8443 — These ports are used for Web-based user interface (POWER6).

            * 80 — This port is used for code downloads.

Take a few moments to read this document. Or, even better, send it to your network team so they can read it for themselves.

On Going Dark

Edit: Some links no longer work. I do not unplug nearly as often as I should.

Originally posted August 5, 2014 on AIXchange

I have another quick story involving my work with Boy Scouts.

Each summer we try to get the older boys involved in some high adventure activities. Last year this included target shooting (shotguns and .22 caliber rifles), archery, spelunking, rappelling and hatchet throwing. I didn’t bring my laptop, but with my cellphone I could check in with the office and answer emails. Really, it was the best of both worlds. I was able to camp out, but at the same time I could help out the people I work with.

This summer’s adventures consisted of backpacking, canoeing and canyoneering. Everything went smoothly in our case, though we do know members of the troop that had to be rescued around the time we were out.

For me, the main difference between this year and last was that I didn’t have cellphone coverage during our recent trek into the Arizona mountains. Honestly, I’m not sure this was a bad thing.

Where we were, there was absolutely no cellular coverage of any kind (though just 20 miles down the mountain, the service was fine). Of course when you’re responsible for the well-being of a bunch of kids, you’d prefer to have a means of instant communication should an emergency arise. The troop leaders were talking about satellite phones. Perhaps next year we’ll look at something like this or this.

However, just looking at it from a work perspective, what would you do? Would you be OK knowing that cell phone service was a 15-20 minute drive away, or do you need to be constantly in touch? I will admit that I like to know what’s going on, not only in my world but in general. I had no way of checking headlines or sports scores or emails. I was completely cut off.

And yet, I think I enjoyed it. 

It takes a while to truly unplug, and I might have gone through some withdrawal symptoms initially upon losing my access. Eventually, though, I felt relieved. Since I knew that checking in wasn’t an option, I could focus on enjoying the trip. Since I couldn’t check messages, I didn’t feel guilty about not responding to them. The “out of office” auto message option exists for a reason, after all. I was finally, truly, away.

For another perspective on what it’s like to go a few days without having a working Internet in your hands, Jon Paris and Susan Gantner share this story about “going dark” during a cruise.

I guess there’s something to be said for being unplugged, especially if you’re out in nature. Even though I returned to tons of messages, when I got back I was recharged and ready to get back to work.

How about you? When you go on vacation, do you escape from technology?

Can We Talk? Yes, and it’s So Much Easier Now

Edit: It has only gotten easier to talk to people around the world.

Originally posted July 29, 2014 on AIXchange

A friend living overseas recently emailed me. He was having issues with an older HACMP cluster and wanted another set of eyeballs to check it. At the time I happened to be talking with a PowerHA guru, so I invited him to take a look as well.

Our small troubleshooting group reminded me of the people who work on their cars in their driveway. At least in my formative years, the sight of someone tinkering with a car would inevitably draw curious neighbors eager to see the mechanic do his thing. In this case, the attraction was an old HACMP cluster that — via a WebEx session — my guru friend and I could examine from several time zones away.

I’m still amazed at the relative ease with which it is now possible to communicate with anyone, anywhere. I have family members in South Africa. Years ago they actually sent a telegram to my door because they couldn’t reach me on the phone. (Not that transnational phone service was inherently unreliable in those days, but occasionally calls didn’t get through.) Surprised as I was to discover that telegrams still existed, it was the best alternative for delivering time-sensitive information at that time.

A while back, I sent them a magicJack VOIP system so they could have a local U.S. number. This means that any time I want, I can pick up the phone and make what’s essentially a free phone call to the other side of the world.

Admittedly, VOIP technologies aren’t yet completely reliable. My friend with the HACMP cluster experienced issues with his VOIP solution. We tried IM, but weren’t satisfied waiting for each side to type out messages. Ultimately, he opted to call me on his cell phone. Of course that wasn’t free, but calling internationally is much cheaper than it was even a few years ago.

As for the HACMP issue, it was fairly straightforward. A change had been made in the environment. Someone added NFS to the cluster nodes, but not to the HACMP resource groups. The admin then decided to remove NFS, but didn’t remove it completely. As a result, the cluster was out of sync, and HACMP wouldn’t start at the next failover:

            ERROR: The nodes in resource group HA_RG are configured with more than one NFS domain. All nodes in a resource group must use the same NFS domain.

            Use the command ‘chnfsdom <domain name>’ to set the domain name.

With this error message pointing us in the right direction, the issue was quickly resolved.

We’re fortunate enough to work with some impressive technology, and that includes the older systems that continue to function effectively. But do you ever stop and really think about the amazing communication capabilities we have these days? Do you just take it for granted that these devices that fit in our pockets and purses allow us to interact in real time with people from around the world for a relatively low cost and with very little effort?

Lessons from a Physical Server Move

Edit: Monitoring goes a long way. Some links no longer work.

Originally posted July 22, 2014 on AIXchange

A customer planned to use Live Partition Mobility (LPM) to move running workloads from frame 2 to frame 1. The steps were: shut down frame 1, physically move frame 1, recable frame 1 and power it back on, then use LPM to bring the workload from frame 2 to frame 1, and, finally, repeat the process to physically move frame 2.

The task at hand was simple enough, but there was a problem. The physical server that was being moved had been up for 850 days. Do not make the mistake of moving a machine that’s been running continuously for more than two years without first logging in and checking on the server’s health. Furthermore, make sure you’ve set up alerting and monitoring of your servers.

I got a call after step one of the customer’s plan was complete and the damage had been done. Nonetheless, much can be learned from this episode.

Was errpt showing tons of unread errors? Yes. Had the error log been looked at? No. Had someone cleared the error log before support got involved with the issue? Yes. Was support still able to help? Yes. When you send a snapshot to IBM support, they can access the error log even if it’s been cleared from the command line, assuming those errors have not been overwritten in the actual error log file in the meantime.

Were there filesystems full? Yes. In this case one of the culprits was the /var/opt/tivoli/ep/runtime/nonstop/bin/cas_src.sh script, which wrote a file — /dev/null 2>&1 — that filled up the / filesystem.

To make matters worse, the machines are part of a shared storage pool, and after the physical move frame 1 would not rejoin the shared storage pool (SSP) cluster. This left only two of four VIO servers as part of the SSP.

It turned out that after the physical move, the network ports weren’t working. As a result, Multicast wasn’t working. At least getting Multicast back up was easy enough. However, the two VIO servers were still unable to join the cluster, and the third VIO server on frame 2 (vio3) had protected itself by placing rootvg in read-only mode as it logged physical disk errors. So from a 4-VIO server cluster, only one was actually functional, and that one had its own issues. If things weren’t fixed quickly, production would be impacted.

The problem with the one operable VIO server was that, because it had switched to read-only mode, SSP errors occurred whenever someone tried to start or stop any of the cluster nodes. In other words, it was keeping the cluster in a locked state:

            clstartstop -start -n clustername -m vio3
            cluster_utils.c get_cluster_lock 6096 Could not get lock: 2
            clmain.c cl_startstop 3030 Could not get clusterwide lock.

Fortunately, rebooting the third VIO server cleared up this issue. And with that, the other VIO servers came back into the SSP cluster. Ultimately, the customer was able to use LPM to move clients to frame 1, which had already been physically moved. This allowed the customer to then shut down frame 2 and physically move it as well.

So what have we learned? Check your error logs. Check your filesystems. Schedule occasional reboots of your machines. Make sure you’re applying patches to your VIO servers and LPARs. Make sure you have good backups.
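A few of these checks are easy to script before a planned move. This is a minimal sketch, not anything the customer in this story ran; it assumes the POSIX `df -P` output format, the standard errpt and uptime commands, and a hypothetical 90 percent threshold and function names.

```shell
# Hypothetical pre-move health check helpers.
full_filesystems() {
    # Read "df -P" output on stdin; print "mount capacity" for each
    # filesystem at or above the threshold (default 90 percent).
    # Assumes mount points contain no spaces; a "-" capacity counts as 0.
    awk -v t="${1:-90}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= t + 0) print $6, $5
    }'
}

pre_move_check() {
    echo "== Uptime"
    uptime
    echo "== Error log entries (errpt)"
    errpt | tail -n +2 | wc -l
    echo "== Filesystems at or above 90 percent"
    df -P | full_filesystems 90
}

# Usage: run pre_move_check as root on each LPAR
# (on a VIO server, from the oem_setup_env shell).
```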

Finally, note that in this instance, having the capability to perform LPM operations really made a huge difference. Despite the severity of these problems, the users of these systems had no idea that anything had been going on at all.

System Monitoring Shouldn’t Be Neglected

Edit: You still find this phenomenon, and it still surprises me.

Originally posted July 15, 2014 on AIXchange

What are you doing to monitor your systems from both the hardware and OS levels? Are you using a commercial product? Are you using an open source product? Are you using hand-built scripts that run from cron? Are you using anything?

Have you logged into your HMC lately? Does anything other than green appear in the system status, attention LEDs or Serviceable Events sections of the display? Countless times I’ve seen machines where the HMC messages were being ignored. Is your HMC set up to contact IBM when your servers run into any issues?

When your machines have issues, are you deluged with alerts? One customer I know of had a script that monitored their machine and sent emails when errors were detected. During one event, the PowerHA system actually failed over because the node became unresponsive due to the volume of errors being generated and the way the script was written. This forced the customer to go into the mail queue and clean up a huge number of unsent messages. Then they had to go into the email client and clean up all of the messages they’d received. Finally, they had to schedule downtime to fail the application back to the node it was supposed to be running on.

I know of multiple customers that simply route error messages to a mail folder — and then never bother checking them. What’s the point of monitoring a system if you never analyze the information you collect?

How diligent are you about deactivating monitoring during periods of scheduled maintenance? In many organizations where a help desk monitors systems, cycles are wasted because techs are so often called to follow up on alerts and error messages triggered by scheduled events.
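One way to avoid both the mail-queue flood and the maintenance-window noise is to send a single summary per monitoring run and to stay silent while a maintenance flag file exists. This is a minimal sketch under stated assumptions: the flag path, the ALERT_TO variable and the function names are hypothetical, and it assumes the usual errpt column layout (IDENTIFIER first, RESOURCE_NAME fifth) and a working mail command.

```shell
# Hypothetical names: MAINT_FLAG, ALERT_TO, summarize_errpt, report.
MAINT_FLAG=${MAINT_FLAG:-/etc/maintenance.flag}

summarize_errpt() {
    # Read errpt output on stdin and print "count identifier resource",
    # most frequent first, so hundreds of identical errors become one line.
    awk 'NR > 1 { print $1, $5 }' | sort | uniq -c | sort -rn |
        awk '{ print $1, $2, $3 }'
}

report() {
    # Create $MAINT_FLAG before scheduled work and remove it afterwards.
    if [ -f "$MAINT_FLAG" ]; then
        return 0
    fi
    summary=$(errpt | summarize_errpt)
    [ -n "$summary" ] || return 0
    printf '%s\n' "$summary" |
        mail -s "errpt summary for $(hostname)" "${ALERT_TO:-root}"
}
```

In practice you would also limit errpt to entries logged since the previous run (errpt accepts a -s start date) so that old errors are not reported again on every pass.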

Of course there are other impacts that can result from neglecting systems. If internal disks are going bad, and you’re not monitoring and fixing them, eventually you will lose your VIOS rootvg (assuming that’s how you have it set up). And just as some customers will ignore the system monitoring messages they collect, other customers don’t take action on hardware events that are being logged. Having robust hardware that notifies you when it needs maintenance is only useful if you actually heed the notifications.

Deploying your OS and installing your application is relatively simple, but along with that we must make decisions and take actions to manage and maintain these systems during the operational production phase of service. Sure, everyone is busy, and some tools cost money — but try explaining that to someone who cares when production goes down.

On a totally unrelated topic, I want to acknowledge that AIXchange is having a birthday. Seven years ago this week — July 16, 2007 — the first article was posted on this blog. Many thanks to everyone who takes the time to read this blog, and special thanks to those who have suggested topics. I welcome your input, and it does make a difference.

Here’s to the next seven years.

Webinars Cover the World of AIX

Edit: Some links no longer work.

Originally posted July 8, 2014 on AIXchange

Hopefully you regularly listen to the AIX Virtual User Group webinars, either live or on replay. Recent sessions have been devoted to the POWER8 server announcements, Linux on Power and SR-IOV.

If you’re outside of the U.S., you should know that similar webinars are taking place worldwide. For instance, there’s the IBM Power Systems technical webinar series that originates from the U.K. This group’s next event, which is set for July 16, covers PowerKVM. Dr. Michael Perzl is the presenter, and as someone who’s already working with PowerKVM, I look forward to what he has to say.

Previously, this group presented “More tricks of the Power Masters,” which, as you might imagine, was an hour-long session consisting of tips and tricks for using IBM Power Systems hardware. Thirty-eight total replays of these sessions can be found here. Specifically, I recommend this video of several presentations by Gareth Coates. Gareth is an excellent speaker who’s always on the lookout for tips he can use in future sessions, and he mentioned that he is on the lookout for IBM i content as well. (He’ll be sure to give you credit for your help.)

As I’ve mentioned on numerous occasions, there’s little I love more than learning, finding and sharing AIX tips and tricks. With that in mind, please indulge me while I cite some specific information that’s available in the “Power Masters” videos:

* For starters, to force a refresh of the operating system level information on the HMC, run:

            lssyscfg -r lpar -m <managed_system> --osrefresh

(In addition, Power Masters offers good info on performing HMC updates from the network, which I’ve also written about here and here.)

* To find out how many virtual processors are active on your system, use the kdb command (and use it carefully):

            echo vpm | kdb

* To protect AIX processes when AIX is out of memory, use:

            vmo -o nokilluid=X

* To test your RSCT connection, use:

            /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc

Some other Power Masters topics:

* Using Live Partition Mobility checklists. (I wanted to point this out so I have a reason to add that FLRT now has LPM checks available.)

* viosbr (which I’ve also covered here).

Some of the other information presented was first used in a session that took place in 2013, called Power “Ask the Experts.” I covered that here.

Of course there’s much, much more on not just AIX but also IBM i topics, so check out the Power Masters videos on YouTube. And if you don’t already, be sure to tune into the AIX Virtual User Group and IBM Power Systems technical series webinars.

We’re Not the Only Techies

Edit: I still cannot land a plane.

Originally posted July 1, 2014 on AIXchange

As I’ve noted previously, I work with Boy Scouts. Recently I took a group of boys to an airport to work on their aviation merit badge.

We found a pilot who was willing and able to spend time on a Saturday with the troop. He invited the scouts to visit a maintenance and training facility and spend time on an airplane simulator.

Although he had interesting information to share, I quickly figured that, as a pilot, he hadn’t spent a lot of time creating PowerPoint presentations. Prior to taking the scouts to the hangar so they could learn how to conduct a pre-flight inspection of an aircraft, he showed them a presentation covering the merit badge requirements. At one point, he clicked on what he hoped was a link to a video, but it turned out he had inadvertently made a screen capture of the video rather than an actual link to it. (Not that this issue wasn’t easily addressed; he ended up going directly to YouTube and showing us things like this.)

But indeed, our pilot guide did admit that he hadn’t used PowerPoint in years. On top of that, during the presentation, the overhead projector had an issue. For those of us who spend our time in meetings and conference rooms, fixing projector issues is second nature. Once again though, he wasn’t immediately sure what to do.

All of us — even the scouts themselves — were pretty smug about our computer and projector knowledge at this point. Then we went into the next room and got into the simulator. Long story short:  I’m not cut out to land an airplane, or even to keep one riding smoothly through the air. So we all have our different skills. Frankly, as long as my pilots are experts at flying, I’ll excuse their shortcomings when it comes to using software programs and projectors.

Of course the scouts, most of whom have considerable experience with computer games, made me feel even more inept on the simulator. A lot of those kids had a pretty light touch on the airplane controls and managed a reasonably good landing on the first try.

As an AIX pro, I’m generally surrounded by others with similar professional backgrounds. Quite possibly, it’s the same for you. But we should all keep in mind that while most people need computers to do their jobs, they don’t live and breathe technology the way that many of us do.

Ultimately, my day at the airport reminded me that, even if most people don’t know computers like we do, we’re far from the only smart folks out there doing challenging, technical work. And thank goodness for all these people and their unique specialties, because you really wouldn’t want to see me at the controls of your plane.

More POWER8 Docs

Edit: Some links no longer work.

Originally posted June 24, 2014 on AIXchange

I love reading about new computing technologies, particularly the latest IBM Power Systems releases. It doesn’t hurt that, as a consultant, I have opportunities to work with the newest hardware, but even if that wasn’t the case, I’d still want to know everything about what’s coming out of IBM. I guess I’m like those folks who read automotive magazines, even though I don’t plan on buying a new Tesla anytime soon.

With this in mind, I’d like to point you to three new IBM documents — draft Redpapers — that cover the recently unveiled POWER8 models.  All three publications are scheduled to be finalized by the end of this month.

As you might expect, given that the models have many of the same features, there’s some overlap in the information presented. For instance, this is the table of contents for all three publications:

            Chapter 1. General description
            Chapter 2. Architecture and technical overview
            Chapter 3. Virtualization
            Chapter 4. Continuous availability and manageability

So if you read these Redpapers back to back, you might have a case of déjà vu. Nonetheless, I believe the information is well worth your time.

Let’s start with redp5097, which covers the 4U models, the S814 and the S824. As a reminder, the S in the model number stands for scale out, the 8 stands for POWER8, the 1 or 2 stand for the number of sockets, and the 4 stands for 4U.

Redp5098 covers the S812L and the S822L. Again, as a reminder, S for scale out, 8 for POWER8, 1 or 2 for the number of sockets, and 2 for 2U. L designates that these are Linux-only servers. I wrote about my experiences with the S822L here.

Finally, there’s redp5102, which covers the S822. For completeness, the S is scale out, the 8 is POWER8, the 2 is 2 socket and the 2 is 2U.
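The naming scheme is regular enough to decode mechanically. A minimal sketch with a hypothetical decode_model function; it only applies the rules described above and knows nothing else about these servers.

```shell
decode_model() {
    # Apply the scheme above: S = scale out, 8 = POWER8, third character =
    # sockets, fourth = rack units (U), optional trailing L = Linux only.
    case "$1" in
        S8[12][24] | S8[12][24]L) ;;
        *) echo "decode_model: unrecognized model '$1'" >&2; return 1 ;;
    esac
    sockets=$(printf '%s' "$1" | cut -c3)
    units=$(printf '%s' "$1" | cut -c4)
    case "$1" in
        *L) suffix=", Linux only" ;;
        *)  suffix="" ;;
    esac
    echo "$1: scale out, POWER8, $sockets socket(s), ${units}U$suffix"
}

# decode_model S822L prints: S822L: scale out, POWER8, 2 socket(s), 2U, Linux only
```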

At the bottom of the splash page for each publication there’s a link to a blog post that lists five things to know about the IBM POWER8 architecture. I suggest checking this out as well.

So what are your plans to run POWER8 in your shop?

Can Vendors Make Our Lives Easier?

Edit: This application is still handled this way and is still a pleasure to work with.

Originally posted June 19, 2014 on AIXchange

Recently I was listening to a few admins compare and contrast two different shops that run AIX. All of them had worked in the first environment for several years. In this environment — we’ll refer to it as Shop No. 1 — they were constantly fighting problems and fixing others’ mistakes. Bad management and bad change control were cited as the primary issues. Pages would come at all hours of the day. Periodically missing a child’s soccer game — or a full night’s sleep — was the norm. Their jobs were stressful, to say the least.

Eventually, these admins found new positions working in customer environments where a vendor dictates much of their production computing environment. The vendor has very strict requirements that must be met before applications are allowed to run in production. Customers must have adequate hardware to handle the anticipated workload and capacity. Customer hardware must hit very specific IOPS numbers. The vendor requires access to customer systems that host the vendor software, and customers must agree to run vendor-specific monitoring tools on the systems. There are very strict requirements around change control, which means changes aren’t made on the systems without approvals. It’d take a catastrophe — a very rare and unusual event — for an admin to ever get called out of bed to go to work. Understandably, this group of admins was happier, professionally and personally, working with this vendor’s software. We could call this environment Shop No. 2.

Now for the discussion itself. The argument was made that if Shop No. 1 went with this vendor product or a similar solution that led to the same strict requirements being enforced, it would cease to be such a difficult place to work. But is it really possible that vendor requirements can so profoundly impact their customers’ working environments?

From the vendor perspective, being hands-on makes sense. If I can get my customers to agree to run my software on systems that have the capacity and functionality to handle it, if I can get them to properly manage and monitor their systems, my software — if it’s any kind of quality product — should work well. And I shouldn’t have to deal with irate customers blaming me when they try to run my software on an under-capacity system or when their own in-house programming introduces bugs. It’s in my interest to support customers — and only those customers — who agree to my requirements, because I can be confident there won’t be issues. Why wouldn’t I want to do that?

On the flip side, if I’m an admin, why wouldn’t I want to make sure my systems are capable of running the software I’m using? Why wouldn’t I want proper change control processes to be instituted? Doesn’t this seem like a win-win?

When considering vendors and the vendor solutions we deploy on our hardware, besides asking if a given software package will do the job, maybe we should ask how it will be supported. Because I can easily imagine a world where very specific instructions and very specific software support lead to very stable work environments.

How about you? Does your vendor have significant say-so over your environment, or is vendor input largely limited to implementation and support? How involved do you want your vendor to be?

Getting Started With PowerKVM

Edit: This is no longer a thing, although the commands and tools are still useful.

Originally posted June 10, 2014 on AIXchange

I recently installed and started playing with IBM PowerKVM software on the S822L. Luckily I had a good PowerKVM quick start guide to follow. There’s also a draft version of a Redbook that will help you with your installation. It covers how to netboot and includes examples of the menus you’ll encounter.

I started by connecting my laptop Ethernet port to the HMC1 port on the S822L. This port is using the default address of 169.254.2.147, while the HMC2 port is using the default of 169.254.3.147. In my case I set my laptop to 169.254.2.140 and logged into ASMI as I’m used to doing with PowerVM.

Once I was in I was prompted to change my ASMI password. Then I went to the system configuration/hypervisor configuration menu item in ASMI. There I was presented with a choice of using PowerVM or PowerKVM. I selected PowerKVM and entered an IPMI session password.

To get a console or to power the system on and off, you need ipmitool. With a web search you can find methods for getting it running on Windows under Cygwin, or you could look at ipmiutil. Since I was using Linux, I just made sure I had ipmitool installed, and I was set to go.

In ASMI I went to System information > Real-time progress indicator so I could see the front panel LED codes without actually being in front of the machine. I verified I was at the 01 N V=N prompt, and then ran:

            ipmitool -I lanplus -H x.x.x.x -P password power on

Once I did this, I saw my LED codes change as the system powered on.

To get a console I ran:

            ipmitool -I lanplus -H x.x.x.x -P password sol activate

I soon discovered that I could kill my console session by running:

            ipmitool -I lanplus -H x.x.x.x -P password sol deactivate

Finally, to power off I’d run:

            ipmitool -I lanplus -H x.x.x.x -P password power off
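Typing the password on every ipmitool command line leaves it in shell history and visible to ps. A minimal sketch of a wrapper, assuming your ipmitool build supports -E (read the password from the IPMI_PASSWORD environment variable); the pkvm function and PKVM_HOST variable are hypothetical names.

```shell
pkvm() {
    # Run ipmitool against the service processor in $PKVM_HOST using the
    # lanplus interface; -E takes the password from $IPMI_PASSWORD.
    if [ -z "${PKVM_HOST:-}" ]; then
        echo "pkvm: set PKVM_HOST to the service processor address" >&2
        return 1
    fi
    ipmitool -I lanplus -H "$PKVM_HOST" -E "$@"
}

# Usage:
#   export PKVM_HOST=x.x.x.x
#   export IPMI_PASSWORD='IPMI session password set in ASMI'
#   pkvm power on
#   pkvm sol activate
#   pkvm sol deactivate
#   pkvm power off
```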

I made sure the PowerKVM DVD (that I burned from an .iso image) was in the drive, the machine was powered up and my console was open. Eventually, petitboot came up. Due to a bad DVD I didn’t initially get what I expected, but once I had a .iso image that was in good shape, I was able to select the PowerKVM LiveCD option. The screen displayed:

            POWERKVM_LIVECD

            System information

            System configuration

            Exit to shell

The install wizard (which reminded me quite a bit of an old-school Red Hat install) prompted me for a root password, time zone, which disks to use, etc. Once that was completed, it installed PowerKVM and the system rebooted. Then petitboot came back up. I selected my freshly installed system and I was able to boot to a root prompt.

During the install I specified a network address for my network card and made sure it was on the network. To get Kimchi to work, I needed to get into my SOL console, edit /etc/sysconfig/selinux, change the SELinux mode to permissive, and then reboot the server.
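The SELinux change can also be made with sed instead of an editor. A minimal sketch, assuming GNU sed and a hypothetical function name. On many Red Hat-style systems /etc/sysconfig/selinux is a symlink to /etc/selinux/config, so --follow-symlinks is used to make the edit land in the real file instead of replacing the link with a plain copy.

```shell
set_selinux_permissive() {
    # Rewrite the SELINUX= line (leaving SELINUXTYPE= alone) to permissive.
    conf=${1:-/etc/sysconfig/selinux}
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=permissive/' "$conf"
}

# Usage (as root), then reboot as described above:
#   set_selinux_permissive
# "setenforce 0" switches the running system to permissive right away,
# but only until the next boot.
```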

I verified that kimchid was allowed in the firewall by running:

            firewall-cmd --list-services

I ran systemctl to see the state of Kimchi:

            kimchid.service             loaded active running   Kimchi server

Once in a while I’d see it wasn’t running. In those instances I’d run:

            systemctl stop kimchid.service

            systemctl start kimchid.service
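The stop/start routine above can be folded into a check-and-restart helper. A minimal sketch; ensure_kimchid is a hypothetical name, and systemctl restart simply performs the stop followed by the start shown above.

```shell
ensure_kimchid() {
    # "is-active --quiet" prints nothing and returns 0 only when the unit is active.
    if systemctl is-active --quiet kimchid.service; then
        echo "kimchid is running"
    else
        echo "kimchid is not running; restarting"
        systemctl restart kimchid.service
    fi
}

# Usage (as root on the PowerKVM host): ensure_kimchid
```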

At this point I was able to connect to kimchi by going to https://ip.address.of.powerkvm.host:8001.

This gave me a graphical interface to simplify creating guests. I’ve gotten different versions of Red Hat, SUSE, Ubuntu and Debian to all run successfully.

I copied my .iso files to /var/lib/libvirt/images and then set up templates in kimchi. It was pretty self-explanatory. I’d click on the green + button, pick the local .iso image option and select create templates from selected .iso. At this point I could edit the number of CPUs, the amount of memory and disk, choose networking options, etc. Then I started my guest machine. I clicked on the live tile to get a console, and configured the machine as I would any new install.

By logging into my PowerKVM instance and running top, I can see all of the copies of my operating systems running as the qemu user. By running virsh commands I can get information about my machines as well as stop/start them, suspend them, etc. For example, this virsh command gave me information about the disks I was using for a machine called redhat7-1:

            virsh qemu-monitor-command --hmp redhat7-1 info block

            drive-virtio-disk0: /var/lib/libvirt/images/9f82cbd6-d345-4591-aa68-748f2c7b2b4e-0.img (raw)

            drive-scsi0-0-0-2: /var/lib/libvirt/images/rhel-server-7.0-ppc64-dvd.iso (raw, read-only)

                Removable device: locked, tray closed

Another nice way to run the system is to simply enter virsh and run interactively. To see all the machines you can control, enter list. You can also get a console to a virtual machine — in my case I entered:

console redhat7-1
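To see every guest and its state in one pass, virsh list --all --name and virsh domstate can be combined. A minimal sketch with a hypothetical function name:

```shell
guest_states() {
    # "virsh list --all --name" prints one domain name per line (plus a
    # trailing blank line), including guests that are shut off.
    virsh list --all --name | while IFS= read -r dom; do
        [ -n "$dom" ] || continue
        printf '%-20s %s\n' "$dom" "$(virsh domstate "$dom")"
    done
}

# Usage (as root on the PowerKVM host): guest_states
```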

This is definitely an interesting new way to access a Power Systems machine. As I continue to work with PowerKVM, I’ll post more tips and tricks.

Yes, We Do Have AIX Forums

Edit: Still good options, especially if more people joined in.

Originally posted June 3, 2014 on AIXchange

Recently I attended a conference and ran into someone who was engaged in a futile search for another attendee. This acquaintance had been scanning the crowd for days with no luck.

After complaining that the font on the attendee badges should have been bigger, so that both first and last names could be read from a distance, we bandied about some potential solutions.

Twitter? Perhaps this elusive attendee would notice a tweet. What about adding this person as a friend on Facebook, or sending a message on LinkedIn? Someone else in our group suggested contacting the conference organizers to see if they could get a mobile number for this person, but that option seemed pretty shady. I know I wouldn’t want conference organizers to hand out my personal contact info willy-nilly.

Ultimately, the best suggestion was pretty old school. Just go to the front desk and ask if this individual is staying at the hotel, and if so, leave an old-fashioned paper message. Obviously that wasn’t a high-tech solution, especially coming from a bunch of AIX pros, but it did, in fact, work.

I believe there’s a lesson in this anecdote. How often do you immediately turn to some sexy, leading-edge solution to accomplish some task or address some problem, when a simple, tried and true method will do the job more quickly and/or effectively?

For instance, I constantly hear people say we need more online AIX forums. But we already have online forums. So why not use the resources we currently have, like IRC and the AIX mailing list?

These resources remain valuable. Try them, if you haven’t already. You’ll find good advice from admins who are online and willing to help.

I’m all for progress; I certainly don’t miss my alphanumeric pager. But sometimes, those tried and true, old-school methods are still the way to go.

Random NIM Notes

Edit: Still good stuff.

Originally posted May 27, 2014 on AIXchange

Recently, a customer wanted to remove a client from its original NIM master server and add it to a new, just-installed NIM master server.

The comments made to this post reminded me of the niminit command:

A quick recap on how to register and initialize clients to the master.

On the master, make sure the clients are allowed to register themselves. This is the default from master installation.

Make sure the clients are in /etc/hosts.

There are also other things to make sure that comms can happen, like the /etc/hosts.equiv file…

If you can ssh to and from the master and client, then generally this is fine.

Make sure the client does not have the master software/filesets installed:

lslpp -vl | grep -i nim

Check for “bos.sysmgt.nim.master 6.1.6.0 COMMITTED Network Install Manager” and, if it exists, remove it with:

installp -ug bos.sysmgt.nim.master

On the client do this:

(if the client was registered previously and the niminfo file exists)

rm /etc/niminfo

niminit -a name= -a master= -a connect=nimsh (this will build the niminfo file and register the client)

nimclient -p (permit the master to do work on the client)

nimclient -C (disable nimsh cryptographic authentication)

nimclient -l (to confirm the task succeeded by getting NIM master info as output)

Now you can work on the client.
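If you do this often, the client-side steps in that comment can be wrapped in a small script. This is only a sketch, not something from the commenter: CLIENT_NAME and MASTER_NAME are hypothetical placeholders you'd set yourself, and with DRY_RUN=1 (the default here) it just prints the commands so you can review them before letting it touch an AIX client. It assumes you've already confirmed that bos.sysmgt.nim.master is not installed on the client.

```shell
#!/bin/sh
# Sketch: re-register an AIX client with a (new) NIM master over nimsh.
# CLIENT_NAME and MASTER_NAME are hypothetical placeholders.
CLIENT_NAME=${CLIENT_NAME:-myclient}
MASTER_NAME=${MASTER_NAME:-mynimmaster}
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reregister_nim_client() {
    # Drop any niminfo left over from the old master (-f: no error if absent).
    run rm -f /etc/niminfo
    # Rebuild /etc/niminfo and register this client with the new master.
    run niminit -a name="$CLIENT_NAME" -a master="$MASTER_NAME" -a connect=nimsh
    # Permit the master to push operations to this client.
    run nimclient -p
    # nimclient -C, as listed in the comment above.
    run nimclient -C
    # Ask the master for NIM info to confirm the registration worked.
    run nimclient -l
}
```

After reviewing the dry-run output, set DRY_RUN=0 and call reregister_nim_client as root on the client.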

Once we removed the /etc/niminfo file and ran niminit, we were up and running on the new NIM master server.

On another NIM topic, Waldemar Mark Duszyk wrote about removing NIM clients from NIM master servers.

I have not done any patching for a while, and today, when I had to remove a NIM client definition, I could not remember the second command to use. Now I do, so for the record, here is the process:

First, reset the client

# nim -F -o reset NIM_CLIENT_NAME

Now, deallocate all resources associated with the client.

# nim -o deallocate -a subclass=all NIM_CLIENT_NAME

At this stage the client can be removed.

# nim -o remove -F NIM_CLIENT_NAME

NIM_CLIENT_NAME is the hostname of the client to be removed.
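Those three commands always run in the same order on the master, so they're easy to wrap. Here's a minimal sketch: the function name remove_nim_client and the DRY_RUN switch are my own hypothetical names, and the nim commands are exactly the ones Waldemar lists. By default it only prints them.

```shell
#!/bin/sh
# Sketch: remove a NIM client definition from the NIM master.
# remove_nim_client and DRY_RUN are hypothetical names for illustration.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_nim_client() {
    client=$1
    if [ -z "$client" ]; then
        echo "usage: remove_nim_client NIM_CLIENT_NAME" >&2
        return 1
    fi
    run nim -F -o reset "$client"                    # force the client back to a ready state
    run nim -o deallocate -a subclass=all "$client"  # free every allocated resource
    run nim -o remove -F "$client"                   # delete the client definition
}
```

Refusing an empty name is deliberate: it keeps a typo from turning into a malformed nim command.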

Feel free to share your own useful NIM tips and tricks in the comments.

Managing a Dump Device

Edit: Don’t overlook this.

Originally posted May 20, 2014 on AIXchange

Have you ever seen errors like this in your error log?

E87EF1BE   0515150014 P O dumpcheck      The largest dump device is too small.
E87EF1BE   0514150014 P O dumpcheck      The largest dump device is too small.

Have you verified that your system is capable of storing a system dump? If not, this technote on managing a dump device could help:

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure. Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

This document applies to AIX versions 5, 6, and 7.

There are different sections to this technote, including:

Managing system dump devices
Determining proper size for dump device
Setting a tape drive as a dump device
Do not dump to a mirrored logical volume
Dumping outside the rootvg
Remote dumps over to a network
How to create a dedicated dump device
Related documentation

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices (a primary and a secondary). To view information about the current dump devices, enter:

sysdumpdev -l

Example:

# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     FALSE
always allow dump    TRUE
dump compression     ON
type of dump         traditional

The document also provides information about the primary and secondary dump devices, along with different flags that you can set to manage your dump devices:

If the primary dump device is the primary paging device, the only way it can copy the dump to the filesystem save area is if there is enough free space in that filesystem. The free space in the filesystem can be determined with the df command. If the free space in that filesystem is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of that filesystem to have enough free space, remove files in that filesystem until enough free space is available, or move the save area to another filesystem with the required space. The latter can be accomplished with the sysdumpdev command. This filesystem must be in the rootvg volume group.

It is not recommended that a standalone dump logical volume be mirrored. It is much better practice to have a primary and a secondary dump device, each wholly contained on separate hdisks, rather than mirroring these devices. If for some reason the primary dump device is inaccessible the dump program will then attempt to dump to the secondary device.

So how do you fix the error I listed at the start of this post? Read the whole technote for more information, but the short answer is: estimate how much space you need for your dump by running sysdumpdev -e, then divide that estimated size by your physical partition (PP) size to determine how many PPs your dump device should have:

Note: This value will be what the CURRENT running machine would require. This value can change based on the activity of the machine. It is best to run this command when the machine is under its heaviest work load.

This will return a value in bytes. The primary dump device should be a size that is greater than the value returned. If the dump device is a standard dump logical volume, such as lg_dumplv, then use the command extendlv to increase its size. If it is the primary paging space hd6, use the command chps.
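The arithmetic is simple but easy to get wrong by one PP. A minimal sketch, assuming you've already read the byte estimate from sysdumpdev -e and the PP size in MB for the volume group (for example from the PP SIZE field of lsvg rootvg); the function names are hypothetical. Because the technote asks for a device larger than the estimate, it allocates one PP beyond the whole number of PPs that fit in the estimate. The second function works out how many PPs to add, since extendlv takes the number of additional partitions rather than a new total.

```shell
#!/bin/sh
# pps_needed ESTIMATE_BYTES PP_SIZE_MB
# Smallest PP count whose total size is strictly greater than the estimate.
pps_needed() {
    bytes=$1
    pp_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( bytes / pp_bytes + 1 ))
}

# pps_to_add NEEDED_PPS CURRENT_PPS
# How many PPs extendlv would have to add (0 if the device is already big enough).
pps_to_add() {
    if [ "$1" -gt "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo 0
    fi
}

# Example: a 1,000,000,000-byte estimate with 256 MB PPs needs 4 PPs.
# If lg_dumplv currently has 2 PPs, you would add 2:
#   extendlv lg_dumplv 2
```

Since the estimate reflects current activity, running this against a sysdumpdev -e value taken under peak load, as the technote suggests, gives a safer number.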

Believe me, you don’t want to wait for a catastrophic operating system software failure to discover that your dump devices are too small.

Limiting Concurrent Logins

Edit: Some links no longer work.

Originally posted May 13, 2014 on AIXchange

Recently, an AIX mailing list member was asking about limiting concurrent logins to a machine:

            Is there a setting in AIX that will limit the number of times a user can log in? The scenario is thus:

            John logs in, uses one app license.
            John logs in from a second terminal and uses a second license.
            John logs in again from a third terminal and uses a third license.
            And so on.

            We have multiple users doing this and we are maxing out our application licenses, looking for a way to stop it. The logins are ssh based via a proprietary application. Their .profile funnels them into a limited menu where they can only do certain things.

Discussion followed and some good suggestions emerged. Since this issue is something that may come up for some of you, I thought I’d lay out the solution here. First, a mailing list commenter recommended this blog post:

How to restrict users to a single login to a system at any one given time.

Question: The ability to set up user accounts so that users can be logged onto a system only once at any given time (no concurrent access) does not exist on AIX.

Cause: This is not a defect, but just the way AIX is designed. Sun Solaris has the same limitation.

Answer: The answer is to create a method that will check to see if a user is already logged in. Here is an example of one possible solution:

The blog author then described creating a script and making a few modifications to the system to enable the desired behavior.
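I don't have that author's script, but the core idea (count the user's existing sessions and bail out if there are too many) fits in a few lines. This is a hypothetical sketch, not the blog author's method: it assumes the sessions you care about show up in who output, which may not hold for every ssh session or proprietary front end, and MAX_SESSIONS is a name I made up.

```shell
#!/bin/sh
# count_sessions USER: count lines on stdin (who-style output) whose
# first field is exactly USER.
count_sessions() {
    awk -v u="$1" '$1 == u { n++ } END { print n + 0 }'
}

# check_login_limit: hypothetical guard for a user's .profile.
# The current session is included in the count, so with MAX_SESSIONS=1
# a second concurrent login is refused.
check_login_limit() {
    max=${MAX_SESSIONS:-1}
    me=$(logname 2>/dev/null || id -un)
    n=$(who | count_sessions "$me")
    if [ "$n" -gt "$max" ]; then
        echo "Already logged in ($n sessions, limit $max)." >&2
        return 1
    fi
    return 0
}
```

In the user's .profile, something like check_login_limit || exit would end the extra session before the menu starts. Matching the first field exactly (rather than grepping) keeps a user named john from counting johnny's sessions.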

Another commenter recommended the setting chuser maxulogs=2 user_name, noting that this limits users to two login sessions. He added that details can be found in the chuser man page.
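For completeness, here's that setting as a tiny wrapper. It's a sketch: limit_logins and DRY_RUN are hypothetical names, the chuser and lsuser invocations use the standard attribute=value and -a forms, and by default it prints rather than runs them. Check the chuser man page, as the commenter suggests, for how maxulogs behaves on your AIX level.

```shell
#!/bin/sh
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# limit_logins USER LIMIT: set maxulogs for USER, then show the result.
limit_logins() {
    user=$1
    limit=$2
    if [ -z "$user" ]; then
        echo "usage: limit_logins USER LIMIT" >&2
        return 1
    fi
    case $limit in
        ''|*[!0-9]*)
            echo "limit must be a non-negative integer" >&2
            return 1
            ;;
    esac
    run chuser maxulogs="$limit" "$user"
    run lsuser -a maxulogs "$user"
}
```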

I prefer this latter option for its simplicity, but either solution is worth considering if you’re faced with this dilemma.

How many of you saw the original exchange on the mailing list? Do any of you have another way to address this problem?

Using the HMC with IBM Flex System Nodes

Edit: Cannot remember the last time I thought about Flex System.

Originally posted May 6, 2014 on AIXchange

The short-lived SDMC wasn’t widely adopted, in part because customers who were familiar with the HMC didn’t want to relearn the SDMC interface. While the IBM Flex Systems Manager (FSM) isn’t going anywhere, it does pose a similar issue — at least for one of my customers that wanted to manage its standalone servers and flex nodes from the same HMC.

If you didn’t know, you can do this. However, there are caveats. IBMer James Nash has a great set of slides with information about the solution.

He begins by pointing to this announcement on new IBM Flex System functionality:

The IBM Flex System now supports the IBM Power Hardware Management Console (HMC) and the IBM Integrated Virtualization Manager (IVM). The IBM Power Rack-mounted HMC brings the same, full function PowerVM management available on POWER7 and POWER7+ rack servers to the POWER7 and POWER7+ processor-based Flex System compute nodes. IVM is an easy-to-use, browser-based tool providing access to the entry set of PowerVM functions.

The HMC function is available with HMC Version 7 Release 7.7.0 Service Pack 2. The IVM function is available with VIOS Version 2.2.2.3. Both solutions require Power Compute node firmware Version 7.7.3.

Note: The use of the IBM Power Hardware Management Console (HMC) and the IBM Integrated Virtualization Manager (IVM) are only supported as part of a Flex System configuration with Power compute nodes and cannot be used in the same configuration with IBM Flex System Management node.

James also references this technote, which explains that “flex nodes can be managed by HMC, IVM or FSM. Only one manager type can be active; dual management by FSM and HMC is not supported.”

The technote lists the supported combinations:

HMC level                    01AF773_033         01AF773_051
Version 7 Release 7.7.0      Service Pack 2      Service Pack 3
  Note: HMC does not support TLS 1.2 enablement on this firmware level.
Version 7 Release 7.8.0      Not supported.*     Service Pack 0 (approved Jan/2014)
Version 7 Release 7.9.0      Not supported.      Not supported.

* Note: All partitions (virtual servers) on the node must be powered off prior to converting from FSM to HMC 7.7.8 or later or upgrading an existing HMC from 7.7.7 to 7.7.8 or later.

Keep in mind that if you use the HMC for this function, you miss out on some FSM features, namely:

* Auto discovery of resources

* Physical and virtual management

* Network and storage management

* Alerts, health status, call home

* Firmware management

* Remote console

James’s slides include a nice chart that lays out the respective features and functions of the HMC and FSM. For instance, both allow you to power on/off nodes and create and activate LPARs, but only the FSM allows you to manage flex chassis components, control IBM storage and the IBM network, and run VMControl.

In addition, there are some good screen shots of how to “un-manage” your devices and discover your nodes from your HMC:

“Moving from HMC managed Power Flex nodes to FSM managed Power Flex nodes is only supported if you shut down the Power Flex nodes and set them to factory defaults. This will result in loss of all LPAR/virtual Server configurations. If you plan to go from FSM to HMC and then back to FSM, make a backup of the FSM before you move to the HMC. This will save configuration information from the point in time that you created the backup, so any changes you make on the HMC will not be reflected when you ‘go back’ to using the FSM to manage your Power Flex nodes.”

The final slide presents some frequently asked questions:

            * Can I use FSM and HMC simultaneously (same chassis)? No.

            * When using HMC, can FSM manage the x86 servers in the same chassis? No.

            * Can the FSM manage non-Flex Power servers? No.

            * Does this mean the FSM is going away? No.

            * Can I have dual HMC management? Yes.

            * Can I use IBM Systems Director? Yes.

            * Can I use PowerVC? Yes.

            * Can some nodes use IVM and others HMC? Yes.

You may be perfectly happy running the FSM, but if not, understand that this alternative is available.

More POWER8 Announcement Details

Edit: Some links no longer work.

Originally posted April 29, 2014 on AIXchange

By now you’re probably aware of IBM’s announcement of POWER8 and new hardware models. Last week I examined the speeds and feeds. Here’s a look at some of the other coverage:

            * ExtremeTech.com: IBM unveils Power8 and OpenPower pincer attack on Intel’s x86 server monopoly

            * New York Times: IBM Opens Chip Architecture, in Strategy of Sharing and Self-Interest

            * Forbes: IBM Spends $2.4 Billion On New Power Servers And Partners With Google And Nvidia To Go After Intel

            * DataCenterKnowledge.com: IBM Unveils New POWER8 Systems, Built for Big Data

            * InformationWeek.com: IBM Unveils Power8 Chip As Open Hardware

            * CrucialCIO.com: IBM Unveils Power 8 For Cloud, Big Data Crunching

            * ITJungle

Additional resources:

            * OpenPOWER Foundation

            * OpenPOWER roadmap

            * Video

            * Redbooks

Of course there’s much more online (just search on POWER8 and OpenPOWER), and I’d encourage you to use Twitter search as well.

Naturally, I’m looking forward to getting my hands on the machines, but I’m especially excited to work on the machines with PowerKVM. You can be sure I’ll write more about this in the near future.

I was excited about POWER4 (which came out back in 2001), and was just as enthusiastic about POWER5, POWER6 and POWER7. I’m sure I’ll feel the same about the next big thing after POWER8. Trying out new hardware and new capabilities never gets old.

POWER8 Speeds and Feeds

Edit: Links still work.

Originally posted April 23, 2014 on AIXchange

POWER8 technology created some buzz when it was first discussed at the Hot Chips conference, and slides that describe the chips could be found online before today. But now we have more information about the actual systems that will be shipping when they become generally available in June.

I recently attended an education session for IBMers and business partners that covered information around POWER8 and the new IBM hardware announcements that were made today. I hit some of the technical highlights in my article “IBM Delivers With POWER8.” I had planned for this content to be posted in my blog, but because of some technical issues, it became an article instead. I will also write future blog posts on the topic.

Device Mapping on IBM i

Edit: Link still works.

Originally posted April 15, 2014 on AIXchange

More and more IBM i administrators who once relied exclusively on internal disks are now getting their disks from VIOS using vSCSI or NPIV. These administrators need a handy way to determine which SAN LUNs correspond to which individual virtual disks in IBM i. If your SAN administrator wants to know which LUN is DD001 on the IBM i client, how do you respond?

In disparate computing environments, it’s a question that’s bound to come up. Maybe the SAN admin is making some changes and wants to take back a LUN, and the IBM i admin needs to be able to determine which LUN is allocated to which virtual disk in the IBM i environment. Maybe the administrator temporarily used an internal hdisk and is now migrating to a SAN LUN. Whatever the reason, there are certainly times when you want to know which backing device is being used in an IBM i environment.

Luckily, this document is available to help with the cross-referencing.

Problem (Abstract)

This document describes how to cross-reference (device mapping) IBM i disk units with VIOS disk units.

Resolving the problem

At some point, you may need to know the physical location of the virtual disk units. This information might be needed for performance measurements or service, or you might need to be able to translate to and from IBM i to VIOS to the physical devices (including an external disk, if relevant). This document describes how to map the i drives to the VIOS drives and then to the physical disks.

These scenarios are presented in this document:

1) Check device mapping from the HMC — With HMC version 7.3.4 SP2 or later you can see the device mapping from the HMC if the server is HMC Managed.

2) Find the device mapping in a single adapter client partition — Another way to do this mapping is by calculating the LUN from the Controller number. Back on the Display Disk Unit Details screen, we know that DD001 is on Controller 3. Convert the controller number from decimal to hex and then add 0x80. In this case, decimal 3 is 0x03, and 0x03 + 0x80 = 0x83. Looking at the lsmap output, we can see that LUN 0x083 is hdisk9.

Mapping the i drive to the correct hdisk becomes important if you ever want to remove disk units from the partition. After you have removed the disk unit from the ASP, you must be able to determine which hdisk to remove from the partition profile.

3) Multiple adapter client partition — If you need to identify disks, you will need to know which virtual adapter (vhost) has the correct mapping. In this case, we need to look at the SYS Card identifier. This number identifies the client adapter ID in the partition you are working with. In a multiple adapter environment, the controller number is not unique. Therefore, you also need to determine which adapter you are looking at. To do this, you need to use the Sys Card Number from the Display Disk Unit details above. You may use either HMC or IVM to view the partition properties. Partition ID 4 has two client adapters. They connect to server adapters 401 and 403, respectively. View the server partition properties/virtual adapters tab to see the information.

4) vios command line interface on VIOS 2.1.2.X and IBM i R6.1.1 or later — If this is an IVM environment with multiple SCSI adapters, you can use the following commands to map the disks.

Use this command to get the system name:
lssyscfg -r sys -F name

Use the system name as input into the following command:
lshwres -m -r virtualio --rsubtype scsi --level lpar -F topology

5) The system card number explained — In the multiple adapter example above, the disk unit details showed that the Sys Card field was 145 for 16 of the drives, and 147 for the remaining 15. Why is the Sys Card field a different number than the HMC or IVM adapter ID? The Sys Card field wraps at 256. So if you add 145 and 256, you get 401. Likewise, if you add 147 and 256, you get 403. Keep in mind that it is possible to have more than one Sys Card with the same number, because customers can configure the adapter ID. For example, one IBM i partition's disk unit details show 62 disk units in various ASPs, and two different Sys Card 33s are listed in the details.

Looking at the partition properties, you see the following:

Type           Adapter ID   Connecting Partition   Connecting Adapter
Server SCSI    801          LPAR1                  801
Server SCSI    33           LPAR4                  33

U9117.MMA.1007E34-V1-C801 = vhost1
U9117.MMA.1007E34-V1-C33 = vhost2

256 + 256 + 256 + 33 = 801
33 = 33
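Both calculations from the technote are easy to script if you're cross-referencing a lot of disks. The function names are hypothetical. controller_to_lun simply applies the technote's "add 0x80 and show it in hex" rule, in the same 0x083 notation used in the example. adapter_candidates lists every adapter ID that would display as a given Sys Card value, up to a maximum slot number you supply (MAX_ID is an assumption, for example the highest virtual slot you've configured).

```shell
#!/bin/sh
# controller_to_lun CONTROLLER
# Decimal controller number (Display Disk Unit Details) -> LUN, using the
# technote's rule: add 0x80 and print in hex, in the 0x083 style above.
controller_to_lun() {
    printf '0x%03x\n' $(( $1 + 0x80 ))
}

# syscard_for_adapter ADAPTER_ID
# The Sys Card field wraps at 256, so it shows the adapter ID modulo 256.
syscard_for_adapter() {
    echo $(( $1 % 256 ))
}

# adapter_candidates SYSCARD MAX_ID
# Every adapter ID up to MAX_ID (hypothetical upper bound) that would
# display as SYSCARD.
adapter_candidates() {
    id=$1
    while [ "$id" -le "$2" ]; do
        echo "$id"
        id=$(( id + 256 ))
    done
}

# From the examples above: controller 3 -> 0x083; adapters 401, 403 and
# 801 show as Sys Card 145, 147 and 33.
```

If adapter_candidates returns more than one ID that actually exists in the partition properties, as with Sys Card 33 here, you still need the partition properties to tell them apart.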

Have you had to map LUNs back to IBM i disks? What methods did you use?

Good Things to Bookmark

Edit: Some links no longer work.

Originally posted April 8, 2014 on AIXchange

If you’ve never heard of the POWER Systems Reference, you’re in for a treat. This site catalogs a host of basic informational and support resources.

For instance, the Quick Ref tab provides (among other things):

Default FSP Addresses

-p5

-FSP A HMC1  192.168.2.147

-FSP A HMC2  192.168.3.147

-FSP B HMC1  192.168.2.146

-FSP B HMC2  192.168.3.146

-p6/7

-FSP A HMC1  169.254.2.147

-FSP A HMC2  169.254.3.147

-FSP B HMC1  169.254.2.146

-FSP B HMC2  169.254.3.146

Phone Numbers

888-426-4357  IBM HELP

800-426-2255  IBM Direct

800-426-4968  IBM 4 YOU

800-225-5249  CALL AIX

800-300-8751  Rochester Quality Hot Line

800-426-5463  Poughkeepsie Quality Hot Line (800-IBM-LINE)

877-603-2145  Mechanicsburg Parts

800-678-4727 Opt. 1 Parts Administration

888-426-4357 Opt. 4,1,2 RETAIN

Public Links

Assist On Site – AOS

CoD Activation Codes

Code – IBM Support Portal

Code Compatibility Matrix

Google

Info Center

Sales Manual

The HMC Menu tab features sample HMC menus for either systems or frames, along with examples of scenarios covering HMC connections. There’s also an ASMI menu table, which, as you can imagine, displays ASMI menus (that can vary based on your firmware level).

The POWER tabs (POWER8, POWER7, POWER6, POWER5, POWER4) cover the different models of hardware including links to adapter placement, sales manuals and IBM Redbooks.

The information here is so valuable, I can’t help but wonder why I never thought of doing something similar myself. I guess I should just be grateful that I don’t have to.

While I’m at it, the QuickSheets and QuickStart guides are also worth bookmarking.

Feel free to cite your favorite online resources in comments.

A tmux Primer

Edit: It has been longer than 10 years now. Some links no longer work.

Originally posted April 1, 2014 on AIXchange

I wrote this piece about screen and vnc for IBM Systems Magazine 10 years ago, and I still refer to it, because I still use screen and vnc. However, I must admit that tmux is giving screen a run for its money these days.

A ton of good tmux tutorials are available, but I’ll start with this one:

“tmux is useful to people in different ways. To me, it’s most useful as a way to maintain persistent working states on remote servers—allowing you to detach and re-attach at will… you can use tmux to have multiple panes within multiple windows within multiple tabs within multiple sessions.”

Indeed, tmux does provide another way to look at detachable sessions. But why should a loyal screen user (like me) go learn something new?

“tmux is a lot like screen, only better. The short answer for how it’s better is that tmux is better designed to perform the same functions. Screen gets you there (kind of) but does so precariously.

“Here are a few of the key advantages of tmux over screen:

* Screen is a largely dead project, and its code has significant issues

* Tmux is an active project with an active codebase

* Tmux is built to be truly client/server; screen emulates this behavior

* Tmux supports both emacs and vim shortcuts

* Tmux supports auto-renaming windows

* Tmux is highly scriptable

* Window splitting is more advanced in tmux

Enough about that. Use tmux.”

As I said, there are several other good tutorials on tmux. I also recommend this two-parter (here and here).

Perzl.org is the place to get tmux for AIX. Or start here if you’re unsure of how to get all the dependencies from perzl.org.
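If you're coming from screen, the detach/reattach workflow maps over directly. A minimal sketch, assuming tmux is installed and on your PATH; ensure_session is a hypothetical helper name, while new-session, has-session, attach and ls are standard tmux commands. Note that -t can also match a session-name prefix, so pick distinct names.

```shell
#!/bin/sh
# ensure_session NAME: start a detached tmux session called NAME unless
# one already exists, so repeated calls are harmless.
ensure_session() {
    name=$1
    if ! tmux has-session -t "$name" 2>/dev/null; then
        tmux new-session -d -s "$name"
    fi
}

# Typical use on a remote server:
#   ensure_session work     # create (or reuse) a persistent session
#   tmux attach -t work     # attach; detach with Ctrl-b d and log out
#   tmux ls                 # later: the session is still listed
```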

Once you master tmux, you won’t look back — and again, this is coming from a long-time screen user. Lately I’ve even been thinking that it would be nice if we could use tmux to reconnect to persistent sessions on the HMC.

Are you already using tmux? How about other persistent tools? I’m always looking for new things. Are there tools you’d recommend to me?

Resolving an Issue with Dual HMCs

Edit: Change control is still key.

Originally posted March 25, 2014 on AIXchange

Recently, a customer was unable to run a DLPAR command against some of the LPARs on their frame. That in itself isn’t unusual. Generally in these situations the network isn’t communicating between the HMC and the LPAR, or perhaps RMC daemons need to be restarted somewhere.

This environment had dual HMCs connected to the managed system. HMC1 controlled some of the LPARs and HMC2 controlled others, but not by design. Although there was no rhyme or reason to it, for simplification let’s say that HMC1 was controlling LPAR1 and LPAR3 and HMC2 was controlling LPAR2 and LPAR4. The correct setup would have been HMC1 and HMC2 controlling LPAR1, LPAR2, LPAR3 and LPAR4. In reality approximately 40 LPARs were on the frame, with each HMC controlling approximately half of the LPARs.

If you were on HMC1, you could DLPAR LPAR1 and LPAR3 with no issues. If you were on HMC2, you could DLPAR LPAR2 and LPAR4 with no issues. The problem was that the only way to know which HMC was controlling which LPAR was to either log in to the HMC command line and run lspartition -dlpar, or use the HMC GUI and select HMC Management > View Network Topology. There was no easy way to know which HMC you needed to log in to in order to manage a given LPAR. This headache needed to be resolved.

Initially we did some troubleshooting with IBM Support. That resulted in us running things like:

            /usr/sbin/rsct/install/bin/recfgct
            /usr/sbin/rsct/bin/rmcctrl -p

We tried getting root access via pedbg. We also tried collecting a snap:

            /usr/sbin/rsct/bin/ctsnap

Eventually, once we escalated high enough up the support food chain, someone noticed a very basic HMC setup problem:

            The LPAR IBM.MgmtDomainRM default file shows this msg where it’s attempting to create an IBM.MCP entry for hmc1. It fails with Error number 14, duplicate key for localhost.

            2610-652 The specified time limit has been exceeded.
            Mon Feb 17 13:04:32 CST 2014(439849)      ../../../../../src/rsct/rm/MgmtDomainRM/MCP_cfg.c/01438/1.22  2613-034 Error number 14 was returned when attempting to define an IBM.MCP resource.
            2610-014 The key token localhost is a duplicate.

During the initial build of the HMCs, they had been given their unique hostname and IP address, but somehow someone made a change that resulted in both hostnames being reset to localhost. Since these HMCs had the same hostname and ran on the same network, only one of the two was capable of managing LPARs at any given time. The other one would always fail.

Needless to say, if you run multiple HMCs, make sure they have unique hostnames. And in any environment, it’s essential to establish good change control so people aren’t making changes to systems without proper approvals and documentation.
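One cheap safeguard is to check the HMC hostnames before and after any change. This is a hypothetical sketch, not something IBM Support gave us: you'd collect the hostnames yourself (I believe lshmc -n on the HMC command line shows the network settings, including the hostname) and pass them in. It flags anything named localhost and any name that appears more than once, ignoring case.

```shell
#!/bin/sh
# check_hmc_hostnames HOST...: report hostnames that are "localhost" or
# duplicated (case-insensitive). Returns 1 if any problem is found.
check_hmc_hostnames() {
    status=0
    for h in "$@"; do
        lh=$(printf '%s' "$h" | tr 'A-Z' 'a-z')
        case $lh in
            localhost|localhost.*)
                echo "bad hostname: $h"
                status=1
                ;;
        esac
    done
    dups=$(printf '%s\n' "$@" | tr 'A-Z' 'a-z' | sort | uniq -d)
    for d in $dups; do
        echo "duplicate hostname: $d"
        status=1
    done
    return $status
}
```

In our case, both HMCs reporting localhost would have tripped both checks.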

Installing Language Filesets

Edit: Future me is still glad these archives are here.

Originally posted March 18, 2014 on AIXchange

If you’ve ever installed locale files on your AIX server, you might appreciate my recent predicament. A customer’s application team recently asked me to install bos.loc.utf.EN_US as part of their overall installation. The file exists on the installation media, but not in .bff format. So how do you install it?

Web searching was of little help (perhaps I didn’t enter the most accurate search terms). In any event, my searches turned up links like this, this and this, none of which were what I was really looking for. I just wanted to learn how to install the filesets.

Through blind luck and poking around, ultimately I went into smitty > system environments > manage language environment > add additional language environments.

For both the cultural convention to install and the language translation to install I selected:

            UTF-8    English (United States) [EN-US]

I told it where to find the files, and it worked.

I cut and pasted my smitty screen below. Hopefully this will help you visualize what I’m talking about:

                        Add Additional Language Environments

            Type or select values in entry fields.

            Press Enter AFTER making all desired changes.

                                                                          [Entry Fields]

            CULTURAL convention to install                 UTF-8             English (U> +

            LANGUAGE translation to install                 UTF-8             English (U> +

            * INPUT device/directory for software         [/dev/cd0]                   +

            EXTEND file systems if space needed?        yes                               +

            WPAR Management

              Perform Operation in Global Environment     yes                             +

              Perform Operation on Detached WPARs       no                              +

                Detached WPAR Names                            [_all_wpars]                +

              Remount Installation Device in WPARs      yes                               +

              Alternate WPAR Installation Device            []

The process to install this fileset was easier than I’d expected (as using smitty usually is), though less intuitive than I’d hoped (you have to know to go into the language environments in the first place).

Incidentally, one reason I’m writing about this experience is simply to have it documented somewhere. I actually will do web searches when I’m working on an issue, and find the answer in a link to something I’d written years earlier. In fact it happens fairly often. So as much as I enjoy sharing with readers, I admit I have another, slightly selfish motivation for writing these posts. Sometimes they help Future Me solve problems. Hopefully Future Me will appreciate the time I took to write about this particular issue.

Upgrading the HMC

Edit: Links still work at the time of this writing.

Originally posted March 11, 2014 on AIXchange

A while back Chris Gibson (@cgibbo) retweeted this document on upgrading to the newest HMC model.

So why would you upgrade when your current HMC is working fine? For the same reasons you upgrade anything else — for starters, faster CPU and better performance. The latest version of the HMC typically runs much faster than whatever you had before. Of course the other, more critical factor is that, over time, older HMC models are no longer supported. In any event, upgrading will happen sooner or later.

As the document notes, upgrading the HMC isn’t simply a matter of backing up one HMC and restoring it to your new one. You’re basically clearing out your HMC and connecting a new console:

“This document describes the procedure for replacing an existing HMC. This procedure should be followed when ‘upgrading’ an existing HMC to a newer model/type. It applies to the situation where the new HMC manages the same server and may have the same IP address and host name as the HMC it replaces.

            “If this procedure is not followed, some of the errors which may be encountered include the following:

            o Open Serviceable Events which cannot be closed
            o Platform Event Logs which are not reported or called home through Serviceable Events
            o RMC communication problems between the HMC and partitions.”

I’ll preserve the meat of the information here in case the IBM link goes away in the future:

            Before removing the old HMC:

            1) Close all serviceable events.

              a. Verify that all serviceable events reported against a managed server have been reported to IBM and repaired.
              b. Close the serviceable events.

            2) Permanently remove all server connections.

              a. Record all current connections:

                  1. Access the restricted shell
                  Local HMC: Click HMC Management, Open Restricted Shell Terminal.
                  Remote: ssh to the HMC.

                  2. Run lssysconn -r all
                  3. Save the output.

              b. Expand Systems Management, Servers.

              c. For each server, remove the connection:

                    1. Select the server.
                    2. Expand Connections, select Reset or Remove Connections, select Remove Connection, click OK.

              d. For each frame, remove the connection.

                    1. Access restricted shell
                    2. Run lssysconn -r all | grep type=frame
                    3. Record the IP address of the frame connections listed in the output. Example below:

                    172.16.250.255 and 172.16.255.246

    lssysconn -r all | grep type=frame
    resource_type=frame,type_model_serial_num=9458-100*9920250,side=b,ipaddr=172.16.250.255,alt_ipaddr=unavailable,state=Connected
    resource_type=frame,type_model_serial_num=9458-100*9920250,side=a,ipaddr=172.16.255.246,alt_ipaddr=unavailable,state=Connected

                    4. For each listed IP address, issue the command: rmsysconn --ip <IP address> -o remove

                    Example: rmsysconn --ip 172.16.250.255 -o remove

            3) Ensure all connections have been removed.
              a. Access restricted shell
              b. Run lssysconn -r all
              c. Verify that there are no connections listed as shown in the example below:

              lssysconn -r all
              No results were found.

            4) Remove the HMC.

            Configure the new HMC:


Note: Restoring upgrade data or backup HMC data to a different model is not supported. A possible alternative is to use HMC data replication to replicate information onto the new hardware. Information that can be replicated includes: customer contact information data; user data (user profiles, task roles and resource roles); outbound connectivity configuration data.
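If you saved the lssysconn output in step 2a, a few lines of shell can turn the frame records into the rmsysconn commands from step 2d. This is a minimal sketch, not part of IBM’s procedure: it assumes the comma-separated record format shown above and a POSIX shell on a workstation (the HMC restricted shell may not let you define functions), and the function name frame_remove_cmds is hypothetical. It only prints the commands, so you can review them before running them on the HMC.

```shell
#!/bin/sh
# Sketch: build rmsysconn commands for every frame connection found in saved
# "lssysconn -r all" output (one comma-separated record per line).
# It only prints the commands; review them before running them on the HMC.
frame_remove_cmds() {
  # Keep frame records and pull out the value of the ipaddr= field.
  # Matching ",ipaddr=" (with the comma) skips the alt_ipaddr= field.
  grep 'resource_type=frame' |
    sed -n 's/.*,ipaddr=\([^,]*\).*/\1/p' |
    while read -r ip; do
      echo "rmsysconn --ip $ip -o remove"
    done
}

# Example using the two frame records from the procedure above:
frame_remove_cmds <<'EOF'
resource_type=frame,type_model_serial_num=9458-100*9920250,side=b,ipaddr=172.16.250.255,alt_ipaddr=unavailable,state=Connected
resource_type=frame,type_model_serial_num=9458-100*9920250,side=a,ipaddr=172.16.255.246,alt_ipaddr=unavailable,state=Connected
EOF
```

Afterward, step 3 still applies: run lssysconn -r all again and confirm that it reports no results.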

Remember to periodically ask, is your firmware current? Is your HMC code current? Is your HMC model current? Is your VIO server code current? Are your OS images patched?

So are you ready to upgrade now? If not, what’s holding you back?

The Key Question of Why, and Why the Answer Matters

Edit: Still good questions to ask ourselves. Last link no longer works.

Originally posted March 4, 2014 on AIXchange

A while back Nigel Griffiths (@mr_nmon) tweeted about a TED talk on leadership. It was 18 minutes well-spent.

Simon Sinek, the speaker, gave this presentation in 2009, and it was posted to TED.com in 2010. He starts by drawing three circles that look like a target. The innermost circle is labeled “Why?” The middle layer is labeled “How?” and the outer layer is labeled “What?”

Simon believes that understanding why we do the things we do is significantly tougher than explaining how or what it is we do. He also finds the question of why to be more compelling. He points to prominent companies advertising their wares. For instance, he says that Apple tells consumers, foremost, why they should buy Apple products. At Apple, they “think different” and create machines that are easy to use. He contrasts that with TiVo, which eschews the why for the what, and simply informs consumers about the features of its products.

Simon further considers the vagueness of the question of why by pointing out how we often use our emotions to arrive at decisions. We cannot really even articulate why — it’s just a gut decision. He believes that those who can successfully communicate the reason why are people who can get others to believe in their vision.

Of course this extends beyond ad campaigns. Simon believes that businesses thrive when everyone believes in what’s being accomplished. His contention is that if you seek talented people to do a job, you’ll get people who will perform acceptable work for the money. However, if you hire talented people who also believe in what you’re doing, their passion will lift the entire operation. He adds that people follow causes far more readily than they follow individual leaders.

In his response to Simon’s presentation, Nigel considers the question of why he uses and advocates for IBM Power Systems and POWER8 processors.

            “Why POWER8?

            * We believe server sprawl means a future in which we drown in small computers taking 80 percent of the world’s electricity to run their idle loops 80 percent of the time.

            * We believe we have got to build a vastly better computer to avoid that.”

Again, I encourage you to take a few minutes and view Simon’s TED talk. Then you may want to ask yourself, as an AIX pro, why? Why are you passionate about AIX and Power Systems? Why were you excited about upgrading from POWER6 to POWER7+ processors, and why are you excited for POWER8 to come out?

For that matter, why do you even do the job you do? Is it strictly for the paycheck, or is there a broader reason for your career choice? Does having access to superior hardware and tools give you a more positive outlook on your job? Does the opportunity to learn new things on a daily basis drive you (as it does me)? Why do you do what you do?

Popular Presentation Located, Updated

Edit: Link no longer works.

Originally posted February 25, 2014 on AIXchange

I often get questions from readers. The most common question I’ve received recently is, “Where did the slides go?”

In April 2013 I noted that Fredrik Lundholm had been compiling the Power Implementation Quality Standard for commercial workloads, and posted a link to what was then the latest version of his slide presentation (1.9). I knew this was valuable information, but I had no idea how widely it was being read until this particular set of slides disappeared from the Internet. I’ve received at least an email or two every week from people wondering what happened.

What happened was that Fredrik regularly updates this information, and the links tend to change when he does. The good news is his latest version, 1.12, is available here. On Page 3 he lists the changes he’s made in each version of the presentation. I’ll just highlight the updates that have come since version 1.9:

            Changes for 1.12

            – PowerHA and PowerHA levels, AIX levels, VIO levels

            – Virtual Ethernet buffer update

            Changes for 1.11

            – Power saving animation

            – Network configuration update admin VLAN/simplification

            – Removal of obsolete network design

            Changes for 1.10

            – Favor performance without Active Energy Manager

            – AIX/GPFS code level updates

            – AIX Memory Pin

For my part I posted a comment to my post from last year, noting this new version. Please let me — and Fredrik — know if you find this information useful.

PowerHA and Multicast Setup

Edit: Link works at the time of this writing.

Originally posted February 18, 2014 on AIXchange

Recently during a PowerHA 7.1.2 installation, the network team was unable to get multicast communication working properly. Fortunately we were able to use this document to get everything going.

From the document, entitled “PowerHA System Mirror 7.1 and Multicasting Setup”:

            “PowerHA SystemMirror 7.1 Standard Edition High Availability solution implements clustering using multicast (IP based multicast) based communication between the nodes/hosts in the cluster. Multicast based communication provides for optimized communication method to  exchange not only heartbeats, but also allows clustering software to communicate critical  events, cluster coordination messages etc in 1 to N method instead of communication 1 to 1 between the hosts.

            “Multicast communication is a well established mode of communication in the world of TCP/IP network communication. However in some cases, the network switches used in the communication path need to be reviewed and enabled for multicast traffic to flow between the cluster hosts through them. This document explains some of the network setup aspects that may need to be reviewed before the PowerHA SystemMirror 7.1 cluster is deployed.

            “Note that multicast communication is used during the initial discovery phase when the cluster is being created, but also during the normal operations of the cluster. Hence it is extremely important that the multicast traffic to flow between the cluster hosts in the datacenter before the cluster formation can be attempted. Please plan to test and verify the multicast traffic flow between the would-be cluster nodes before even attempting to create the cluster.”

Before I get too far along, I should note that with PowerHA 7.1.3, unicast is an added communication option. In fact, it’s the default option. These issues with getting multicast working are likely a reason behind this change. But in the case of this customer, the commitment was made to version 7.1.2.

Here’s a bit more about multicast:

            “Multicasting is a form of addressing, where a group of hosts form a group and exchange messages. A multicast message sent by one in the group is received by all in the group. This allows for efficient cluster communication where many times messages need to be sent to all the nodes in the cluster. For example a cluster member may need to notify the rest of the nodes about a critical event and can accomplish the same by sending a single multicast packet with the relevant information.

            “One of the simplest method to test end to end multicast communication is to use the mping command available on AIX. In Fig 1, start the mping command in receive mode on one Host (Say Host A) and then use mping command to send packets from the other Host (Host B). If   multiple hosts will be part of the cluster, test end to end mping communication from each host to the other.”
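With more than two nodes, it helps to write out every sender/receiver pair so none gets skipped. Here’s a small sketch that prints such a plan. The mping_plan function and the multicast address 239.1.1.1 are hypothetical, and the mping options used (-r to receive, -s to send, -v verbose, -a group address, -c packet count) are my assumptions, so confirm them against the mping man page on your AIX level before relying on them.

```shell
#!/bin/sh
# Sketch: print an end-to-end mping test plan covering every ordered pair of
# would-be cluster nodes (start the receiver first, then the sender).
# MCAST_ADDR is a hypothetical multicast group; substitute your own.
# The mping options below are assumptions; check the mping man page.
MCAST_ADDR=${MCAST_ADDR:-239.1.1.1}

mping_plan() {  # usage: mping_plan HOST1 HOST2 [HOST3 ...]
  for rx in "$@"; do
    for tx in "$@"; do
      [ "$rx" = "$tx" ] && continue
      echo "$rx: mping -v -r -c 5 -a $MCAST_ADDR"
      echo "$tx: mping -v -s -c 5 -a $MCAST_ADDR"
    done
  done
}

mping_plan hostA hostB hostC
```

For n hosts that’s n × (n − 1) directed tests, which is exactly the “from each host to the other” coverage the document asks for.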

Finally, here are the document’s troubleshooting guidelines:

            “If mping command fails to receive packets from Host to Host in the network environment, there could be some issue in the network path in regards to multicast packet flow. Follow some of the general guidelines below to troubleshoot the issue:

  1. Review the switch vendor’s documentation for guidelines in regards to switch setup. Some of the known switch guideline links are included in the reference.
  2. Disable IGMP snooping on the switches. Most switches will allow for disabling IGMP snooping. If your network environment allows, disable the IGMP snooping and allow all multicast traffic to flow without any problems across switches.
  3. If your network requirements does not allow snooping to be disabled: Debug the problem by disabling the IGMP snooping and then adding network components one at a time for snooping
  4. Debug if necessary by eliminating the cascaded switch configurations (having only one switch between the Hosts).”

In our case, we disabled the IGMP snooping on the switch and multicast started to work.

What about your experience? Did you have any issues getting multicasting up and running? Please share your thoughts in comments.

Document Examines IBM i External Storage Options

Edit: In this piece I address a future reader, who would now be a past reader.

Originally posted February 11, 2014 on AIXchange

As I’ve previously noted, while this blog is focused on AIX, I think it’s worthwhile to occasionally discuss the IBM i operating system. Increasingly, AIX pros are asked to support their IBM i counterparts as they connect to external storage for the first time.

For many years IBM i systems only used internal disks. And I expect that some IBM i environments will continue to rely exclusively on internal disk for years to come. After all, managing your machine is easy when you’re in total control of the environment from disks to server. However, things have changed. These days IBM i is commonly used in shared environments (like SANs), and of course this is where adding disks becomes tricky.

While the VIO server is common to both AIX and IBM i, those of us on the AIX side have been using it for years. In contrast, many IBM i pros have little to no experience with VIOS, and thus find it difficult to pick up. If you’re an IBM i administrator in this situation, you may find this document helpful.

“Hints and Tips: V7000 in an IBM i Environment” examines external storage options. The authors are Alison Pate, IBM Advanced Technical Sales Support, and Jana Jamsek, IBM Advanced Technical Skills, Europe. The document was most recently revised in August 2013. (I like to note the date because readers will often find years-old posts on this blog that reference documentation that’s likely been updated over time. So if you see this in, say, 2017, first, glad you’re here, Future Reader, and second, be sure you track down the latest version of Alison and Jana’s work.)

For instance, this section lays out the block-size translation required when attaching IBM i to SAN disks, with or without VIOS:

            Translation from 520 byte blocks to 512 byte blocks

            “IBM i disks have a block size of 520 bytes. Most fixed block (FB) storage devices are formatted with a block size of 512 bytes so a translation or mapping is required to attach these to IBM i. (The DS8000 supports IBM i with a native disk format of 520 bytes).

            “IBM i performs the following change of the data layout to support 512 byte blocks (sectors) in   external storage: for every page (8 * 520 byte sectors) it uses additional 9th sector; it stores the 8-byte headers of the 520 byte sectors in the 9th sector, and therefore changes the previous 8* 520-byte blocks to 9* 512-byte blocks. The data that was previously stored in 8 * sectors is now spread across 9 * sectors, so the required disk capacity on V7000 is 9/8 of the IBM i usable capacity. Vice versa, the usable capacity in IBM i is 8/9 of the allocated capacity in V7000.

            “Therefore, when attaching a Storwize V7000 to IBM i, whether through vSCSI, NPIV or native attachment this mapping of 520:512 byte blocks means that you will have a capacity ‘overhead’ of being able to use only 8/9ths of the effective capacity.

            “The impact of this translation to IBM i disk performance is negligible.”
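The 8/9 rule is easy to apply with a little arithmetic when you size LUNs. A minimal sketch using awk for the floating-point math; the function names are hypothetical, and treating the 80 GB starting LUN size the authors suggest below as allocated V7000 capacity is my assumption.

```shell
#!/bin/sh
# Sketch of the 520:512 overhead: each 8-sector IBM i page (8 x 520 bytes)
# is stored as 9 x 512-byte sectors, so every 8 sectors of IBM i data
# occupy 9 sectors on the V7000. Therefore:
#   usable IBM i capacity = allocated capacity x 8/9
#   capacity to allocate  = required usable capacity x 9/8
# Values are in GB.
usable_from_allocated() {  # usage: usable_from_allocated ALLOCATED_GB
  awk -v a="$1" 'BEGIN { printf "%.2f\n", a * 8 / 9 }'
}

allocation_for_usable() {  # usage: allocation_for_usable USABLE_GB
  awk -v u="$1" 'BEGIN { printf "%.2f\n", u * 9 / 8 }'
}

usable_from_allocated 80     # prints 71.11
allocation_for_usable 1000   # prints 1125.00
```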

The document also identifies the requirements for and potential issues with using vSCSI or NPIV. One section looks at sizing for performance and the need to consider I/O as well as capacity. The authors recommend getting a Disk Magic model to determine what’s best for your environment. They suggest starting with 80G LUN sizes, noting, “the recommendation is to create a dedicated storage pool for IBM i with enough managed disks backed by a sufficient number of spindles to handle the expected IBM i workload. Modeling with Disk Magic using actual customer performance data should be performed to size the storage system properly.”

IBM i multipathing is another topic of discussion:

            “With using the recommended switch zoning we achieve that four paths are established from a LUN to the IBM i: two of the paths go through adapter 1 (in NPIV also through VIOS 1) and two of the paths go through adapter 2 (in NPIV also through VIOS 2); from the two paths that go through each adapter one goes through the preferred node, and one goes through the non-preferred node. Therefore two of the four paths are active, each of them going through different  adapter, and different VIOS if NPIV is used; two of the path are passive, each of them going through different adapter, and different VIOS if NPIV is used. IBM i Multipathing uses Round Robin algorithm to balance the IO among the paths that are active.”

In addition, the document includes good graphics that further help explain the concepts being discussed.

There’s much more than I can cover here, so be sure to check it out. Though the document is IBM i specific, I believe this information is relevant for IBM i and AIX admins alike.

AIX Discussion List is a Place to Get Answers

Edit: Does this even exist anymore?

Originally posted February 4, 2014 on AIXchange

Where do you go when you have questions and need help? Do you talk to a coworker? Call IBM Support? Head straight to Google?

One option you may not be familiar with is the IBM AIX Discussion list. Mailing lists may seem like a relic from the Internet’s early days, but rest assured, they’re still out there. However, for you younger folks, I’ll include this brief backgrounder from Wikipedia:

“An electronic mailing list or email list is a special usage of email that allows for widespread distribution of information to many Internet users. It is similar to a traditional mailing list— a list of names and addresses— as might be kept by an organization for sending publications to its       members or customers, but typically refers to four things:

  • a list of email addresses,
  • the people (“subscribers”) receiving mail at those addresses,
  • the publications (email messages) sent to those addresses, and
  • a reflector, which is a single email address that, when designated as the recipient of a message, will send a copy of that message to all of the subscribers.”

According to its website, the IBM AIX Discussion list “is intended for the discussion of AIX. AIX is the IBM Unix solution for small and large computer systems. Initially, this list will be used for dissemination of information and technical details of AIX on all levels. It may be necessary to break this list down into machine types that AIX will run on.”

You can join the mailing list, browse the archives, or search the archives. If you don’t want every message (and associated replies) coming into your inbox, just subscribe to the mailing list digest. That way you’ll only receive one daily email containing all the discussion from the previous 24 hours.

I forget when this list was established, but I’ve been using it for many years. Lately though it seems there’s been a bit more traffic. Still, the more the better, so I thought I’d mention it here. If you have questions about AIX and PowerVM, the IBM AIX Discussion list is a great place to get answers. I recognize quite a few of the names of participants, and these are knowledgeable, trustworthy people. And as mentioned, even if you don’t get into the discussion, the archives offer an invaluable repository of information.

Are you familiar with the list? If so, are you part of the conversation, or are you a lurker? If you have questions — and we all have questions from time to time — I encourage you to subscribe and make use of this tool.

A Look at AIX Mirror Pools

Edit: This is still an interesting method to consider.

Originally posted January 28, 2014 on AIXchange

A storage guy I know went to last fall’s IBM Technical University conference to learn more about IBM Power Systems and AIX, but he came away very excited about the AIX Logical Volume Manager (LVM). We may take it for granted, but for him this information about what we could do with our built-in volume manager was revolutionary.

In addition to the base LVM and its capability to easily mirror logical volumes (including mirroring physical disks to LUNs, as well as mirroring LUNs that might be coming from different physical storage arrays), there’s also the relatively new concept of AIX mirror pools.

IBMer Michael Perzl authored and recently updated a document that does a great job of explaining this. From the “Introduction to AIX Mirror Pools” abstract:

            “This document tries to shed more light onto AIX mirror pools which were introduced with AIX V6.1 Technology Level 2. AIX mirror pools unfortunately seem not to be well known despite being a very powerful new AIX feature which simplifies the task of mirroring data significantly. One reason may be that for using AIX mirror pools no extra commands exist but the existing AIX LVM commands have been extended to incorporate the mirror pool functionality.

            “This document is not meant to be an all-encompassing guide to AIX mirror pools but give a first impression what tasks can be accomplished much easier than before. The intended audience for this document are AIX users and system administrators. A general knowledge and understanding of AIX LVM is required.

            “An example of how mirror pools can be beneficial is when used with remote disks. If a volume group is created with physical volumes that are located in two different locations, the disks in one location can be assigned to one mirror pool and the disks in the other location to a different mirror pool. When a logical volume is created in that volume group, each mirror copy of that logical volume can be assigned to a mirror pool. Thus, when partitions are allocated for that copy they will only come from disks that are in the assigned mirror pool.

            “Without mirror pools, the only way to restrict which physical volume is used for allocation when creating or extending a logical volume is to use a map file. This typically is a very tedious and error-prone process. Thus, the main advantage of mirror pools is that they simplify the task of mirroring data significantly compared to the steps that were required before. This is specially beneficial when used with remote disks.

            “The following system requirements must be fulfilled for mirror pools:

            • Mirror pools are only available in AIX V6.1 TL 2 and higher

            • Mirror pools are only available for SVG type (scalable) volume groups.

            • After assigning PVs (physical volumes) to a mirror pool, the volume group can no longer be imported to a previous version of AIX that does not support mirror pools.

            • While it is possible to assign multiple logical volume copies to a mirror pool, it is recommended that only one copy of a logical volume be assigned to a mirror pool.

            • Volume groups can enable strict mirror pools. If this is enabled all of the logical volumes in the volume group must use mirror pools.

            • Any changes to mirror pool characteristics will not affect partitions allocated before the changes were made. The reorgvg command should be used after mirror pool changes are made to move the allocated partitions to conform to the mirror pool restrictions.”

The entire document is well worth your time. Go through it and then get onto a lab system so you can play around with mirror pools.

IBM Rolling Out Entitlement Validations

Edit: Some links no longer work.

Originally posted January 21, 2014 on AIXchange

Last fall I wrote about IBM changing its approach to delivering software products and updates:

“Another portal feature is an entitlement check that allows customers to download fixes. Just enter your machine type and serial number. The various entitlement types are tied to the level of maintenance you have on your machines. Going forward, IBM will move toward making the capability to download fixes a privilege available primarily to paying customers.”

Get ready. It appears we’ll soon need to verify that we’re entitled to download fixes. The details can be found here:

            “Starting in January 2014, IBM will implement entitlement validation on Fix Central for select software products and updates and for Machine Code (also known as firmware or microcode) updates for select machines. Entitlement for Machine Code updates will be checked through user-provided serial numbers. Entitlement for software products will be validated through IBM ID association to relevant IBM customer numbers. Additional information may be requested or required to confirm entitlement.

These entitlement validations are not being implemented in all countries at this time.

IBM reserves the right to change, modify or withdraw its offerings, policies and practices at any time.

FAQs

            1) How does entitlement work?

     A: Entitlement for Machine Code updates is validated through user-provided machine serial numbers. Entitlement for software products will be validated through IBM ID association to relevant IBM customer numbers.

            2) What if I failed entitlement but have warranty or an applicable support contract in place?

     A: Please submit a request for help during the download process.

            3) Who can I contact for additional help?

     A: You can submit a request for help during the download process or contact IBM Support.

            Note: Fix Central Machine Code updates are available only for IBM machines that are under warranty or an IBM hardware maintenance service agreement. Code for operating systems or other software products is available only where entitled under the applicable software warranty or IBM software maintenance agreement. All code (including Machine Code updates, samples, fixes or other software downloads) provided on the Fix Central website is subject to the terms of the license agreements which govern the use of the associated code.

Some exemptions may apply.

Visit the IBM Fix Central site directly at http://ibm.com/support/fixcentral or navigate to the “Downloads” section on any IBM Support Portal product page.

For even more information about all of IBM’s Electronic Support sites and tools, please visit our   information site at: https://www.ibm.com/support/home/

The next time you need to download fixes, you might want to give yourself some extra time to make sure you’re properly entitled. Try to be proactive and test things out before you find yourself in a situation where you need to try to download something in a hurry.

Adapter Numbering Schemes

Edit: These days I think most shops don’t worry about adapter numbers.

Originally posted January 14, 2014 on AIXchange

How do you number your virtual adapters when you’re planning to build a new Power system?

Some people put no thought into their numbering plans; they simply have the system pick the next available adapter number in the HMC GUI. Others just use even and odd numbers — 10, 20, 30 and 40 from vio1 and 11, 21, 31 and 41 from vio2 — and then map the next LPAR to the next available number. When troubleshooting is needed, they look to their documentation or employ some other method to figure out which adapter goes where.

I was recently shown a numbering scheme for virtual fibre and virtual SCSI adapters — 4-digit numbers for fibre and 3-digit numbers for SCSI. The first digit was 1 for vio1 or 2 for vio2. The second digit on the virtual fibre adapter indicated which physical fibre port it was connected to via the vfcmap command.

When using NPIV (see here and here) and running vfcmap, you indicate the physical FCS device you’ll be using for your connection. In this numbering scheme I know the VIO server the virtual adapter is coming from and the physical FCS device it’s mapping to. The last two digits indicate the partition ID it’s connecting to. (Obviously if your partition IDs extend into three or four digits, or if you add vio3 and vio4, you would modify the scheme as necessary.)

For example, on vio1 I might have virtual adapter 1112, and on vio2 I might have 2112. This would indicate the VIO server it came from, the physical device it was using and the LPAR ID, which in this case would be 12. By using the same numbers on both the client and the server, tracking down adapters becomes very simple. Virtual SCSI is the same, only 112 and 212 would be used. This indicates the VIO server it came from and the LPAR ID it was connected to. There would be no need to indicate the physical device it was mapping to.

A scheme like this comes in handy when you’re planning server builds that comprise many physical machines and many, many virtual machines. For example, a customer wanted four paths over four virtual adapters into their dual VIO servers, two from each VIO server.

For LPAR ID 10, 1110 and 1210 could be used for vio1, and 2110 and 2210 for vio2. Then for LPAR ID 11, 1311 and 1411 could be used for vio1, and 2311 and 2411 for vio2. This pattern would continue until you ran out of physical adapters; then you would circle back around. For example, if you had the physical fcs0, fcs1, fcs2 and fcs3 adapters on your vio1 server, you might see a pattern like this for the four LPARs with IDs 10, 11, 12 and 13:

            1010 – vio1, physical adapter 0, id10

            1110 – vio1, physical adapter 1, id10

            1211 – vio1, physical adapter 2, id11

            1311 – vio1, physical adapter 3, id11

            1012 – vio1, physical adapter 0, id12

            1112 – vio1, physical adapter 1, id12

            1213 – vio1, physical adapter 2, id13

            1313 – vio1, physical adapter 3, id13

With fcs0, fcs1, fcs2, and fcs3 adapters on vio2, you might see:

            2010 – vio2, physical adapter 0, id10

            2110 – vio2, physical adapter 1, id10

            2211 – vio2, physical adapter 2, id11

            2311 – vio2, physical adapter 3, id11

            2012 – vio2, physical adapter 0, id12

            2112 – vio2, physical adapter 1, id12

            2213 – vio2, physical adapter 2, id13

            2313 – vio2, physical adapter 3, id13

You’d then have these virtual adapters on each client.

            LPAR 10 – 1010,1110,2010,2110

            LPAR 11 – 1211,1311,2211,2311

            LPAR 12 – 1012,1112,2012,2112

            LPAR 13 – 1213,1313,2213,2313
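The scheme boils down to simple arithmetic, which makes it easy to script when you’re planning dozens of LPARs. This is a minimal sketch that assumes single-digit VIO server numbers and fcs indexes, LPAR IDs below 100, and four physical ports per VIO server handed out two at a time as in the example above. The function names are hypothetical. Running it prints the same adapter lists as the client table above.

```shell
#!/bin/sh
# Virtual FC slot: <vio><fcs index><2-digit LPAR ID>,
# e.g. vio1, fcs1, LPAR 12 -> 1112
fc_slot() {    # usage: fc_slot VIO FCS_INDEX LPAR_ID
  echo $(( $1 * 1000 + $2 * 100 + $3 ))
}

# Virtual SCSI slot: <vio><2-digit LPAR ID>, e.g. vio1, LPAR 12 -> 112
scsi_slot() {  # usage: scsi_slot VIO LPAR_ID
  echo $(( $1 * 100 + $2 ))
}

# Client adapters when each LPAR takes two of four ports on each VIO server,
# rotating fcs0/fcs1, then fcs2/fcs3, then back to fcs0/fcs1.
client_slots() {  # usage: client_slots LPAR_ID FIRST_LPAR_ID
  base=$(( (($1 - $2) * 2) % 4 ))
  printf 'LPAR %s - %s,%s,%s,%s\n' "$1" \
    "$(fc_slot 1 "$base" "$1")" "$(fc_slot 1 $((base + 1)) "$1")" \
    "$(fc_slot 2 "$base" "$1")" "$(fc_slot 2 $((base + 1)) "$1")"
}

for id in 10 11 12 13; do
  client_slots "$id" 10
done
```

Because the same number is used on the client and the server side, the slot number alone tells you which VIO server, which physical port and which LPAR you’re looking at.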

It may seem confusing, but trust me, the more you use it, the more sense it makes.

Feel free to share your own adapter-numbering scheme in Comments.

Configuring X11 Forwarding

Edit: Some links no longer work.

Originally posted January 7, 2014 on AIXchange

A customer recently ran across an issue where their X11 forwarding was working fine on an AIX 6.1 machine, but not on an AIX 7.1 machine. They were looking for a second set of eyes to make sure their configuration looked OK.

Here’s the question (and ultimately, the answer) that I received:

            I’m stumped on a problem and hoping you might be able to shed some light on it. We’ve added several 7.1 systems recently and I’m trying to get X11 forwarding working on one of them. I’ve  got the systems configured the same way as our 6.1 systems, and PuTTY is configured the same way as well, but when I login to the 7.1 box, no .Xauthority file is created and my $DISPLAY doesn’t get set.

            I found a post on how to manually recreate the .Xauthority file and followed those steps, but the .Xauthority file is not created. If I run an xauth list command it says it’s creating the file, but it doesn’t actually create the file. The sshd_config file has X11Forwarding yes and a line for the Xauthlocation.

            I figured this would be something simple in /etc/ssh/sshd_config, but was told this when I asked about it:

                        Here’s the sshd_config info, and openssh was restarted using stopsrc -s sshd; startsrc -s sshd:

                        X11Forwarding yes

                        X11DisplayOffset 10

                        X11UseLocalhost yes

                        XauthLocation /usr/bin/X11/xauth

            (I’ve also tried it with the X11DisplayOffset and X11UseLocalhost commented out and restarting after making that change.)

            When we looked at the putty event log we saw:

            2013-11-05 09:43:56        Requesting X11 forwarding

            2013-11-05 09:43:56        X11 forwarding refused

            We also saw this article, and verified all of it was set correctly.

            I use ssh with X11 forwarding, but the DISPLAY variable isn’t set:

* Check X11Forwarding directive in sshd_config

* Check that ssh client has X11 forwarding option set

* The AIX machine is missing the xauth program. Install the X11.apps.config fileset.

* There are some older OpenSSH or OpenSSL versions that are buggy. I have had issues with OpenSSH versions 4.6.X, OpenSSH_4.3p2, OpenSSL 0.9.7l 28

            And at this point we set up a WebEx so we could share the screen and figure out what the problem was. We changed settings in sshd_config. We tried manually exporting the DISPLAY to the Windows workstation running Cygwin, and that worked fine. We checked /etc/hosts and /etc/netsvc.conf and everything seemed to be in order.

            Finally, we found this post. What worked for me was to add ‘AddressFamily inet’ to /etc/ssh/sshd_config.

         This article had the same information.

            Once we added the AddressFamily inet to the sshd_config, it worked as expected.
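If you need to make that change on more than one box, it can be scripted. The function below is a hypothetical sketch, not part of AIX or OpenSSH. It rewrites an existing active AddressFamily line to `inet`. If there is no such line, it puts `AddressFamily inet` at the top of the file so the directive stays in the global section ahead of any `Match` block. The result is copied back with `cat` rather than `sed -i`, which AIX sed may not support, so the file keeps its owner and mode.

```shell
# Hypothetical helper: make sshd use IPv4 only ("AddressFamily inet").
# sshd uses the first value it finds for a keyword, so an existing active
# AddressFamily line is rewritten instead of adding a second one.
# Matches the usual mixed-case spelling "AddressFamily" only.
set_address_family_inet() {
    cfg=${1:-/etc/ssh/sshd_config}
    tmp="$cfg.tmp.$$"
    if grep -q '^[[:space:]]*AddressFamily[[:space:]]' "$cfg"; then
        sed 's/^[[:space:]]*AddressFamily[[:space:]].*/AddressFamily inet/' "$cfg" > "$tmp" || return 1
    else
        # Prepend so the directive sits in the global section, before any Match block.
        { echo 'AddressFamily inet'; cat "$cfg"; } > "$tmp" || return 1
    fi
    # Copy back over the original to keep its owner and mode, then clean up.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

Back up sshd_config first, then restart sshd the same way as before: `stopsrc -s sshd; startsrc -s sshd`.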

            If someone else runs into this issue on AIX 7.1, hopefully this information will help. This also shows how important it is to document these finds when we come across them. I bet that a year from now someone will end up reading this post and it will fix that person’s problem, just like reading the articles I found fixed my problem.

Using rendev

Edit: This looks like the last post from 2013 based on the note below. Some links no longer work.

Originally posted December 17, 2013 on AIXchange

**Please note: This blog will be updated on January 7, 2014.**

Two weeks ago I reposted a tip from Russell Adams on displaying disk UUID. Here’s another tip that Russell submitted to a mailing list for IBMers and business partners earlier this fall. In this case he’s taken the time to update it. (Note: There are minor edits for the sake of clarity.)

            In AIX 6.1 TL6 and AIX 7.1 a new command was introduced to rename devices in AIX, rendev. This makes keeping your rootvg on hdisk0 (and hdisk1) and preserving device naming consistency across VIO and HACMP nodes simple!

            rendev -l device -n newname

            A few caveats:

            1) Renaming devices should always be done while the device is in a defined state (i.e., after “rmdev -l”); it cannot be used on active PVs in a VG or other online devices. While rendev can perform the rmdev of the device for you, it’s better to take the device offline first.

            2) Renaming Ethernet (entX) adapters requires either manually renaming the enX and etX adapters or removing them. Once the entX device has been renamed, cfgmgr will create matching enX and etX devices.

            3) Renaming fiber cards (fcsX) requires that all child devices be renamed manually. This includes fcsX, fscsiX, fcnetX, and sfwcommX. Use “rmdev -Rl fcsX” to unconfigure the parent and all child devices into the defined state, and then rename them. cfgmgr does not rename the child devices to match.

            4) I recommend using rendev for renumbering like devices (e.g., ent2 -> ent11) rather than giving devices new name prefixes (e.g., ent2 -> lan3). Renaming device paths that are used by other device drivers (e.g., PowerPath) may cause issues.

            5) You can rename vhost and vfchost devices on VIOS in oem_setup_env *before* they’re mapped (not after). The devices should be manually put into a defined state first, and then can be renamed via rendev.
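Caveats 2 and 3 come down to a fixed sequence of commands. This hypothetical helper (`plan_renumber` is a name made up here) only prints that sequence for review and runs nothing. For fcsX, it assumes the child devices carry the same number as the parent (fscsi0 and fcnet0 under fcs0, sfwcomm0 under fscsi0). That is an assumption; confirm it with `lsdev -C -p fcsX` and `lsdev -C -p fscsiX` before running anything.

```shell
# Hypothetical helper: print (not run) the rendev steps for renumbering an
# adapter, e.g. ent2 -> ent11 or fcs0 -> fcs10. Review, then run as root.
plan_renumber() {
    old=$1 new=$2
    oldn=${old##*[!0-9]}            # trailing digits: ent2 -> 2
    newn=${new##*[!0-9]}
    prefix=${old%"$oldn"}           # leading name: ent2 -> ent
    case $prefix in
    ent)
        # Caveat 2: drop enX/etX; cfgmgr recreates them to match the new entX.
        echo "rmdev -dl en$oldn"
        echo "rmdev -dl et$oldn"
        echo "rmdev -l $old"
        echo "rendev -l $old -n $new"
        ;;
    fcs)
        # Caveat 3: children are renamed by hand. ASSUMES they share the
        # parent's number; confirm with lsdev -C -p before running.
        echo "rmdev -Rl $old"
        echo "rendev -l $old -n $new"
        for child in fscsi fcnet sfwcomm; do
            echo "rendev -l $child$oldn -n $child$newn"
        done
        ;;
    *)
        echo "rmdev -l $old"
        echo "rendev -l $old -n $new"
        ;;
    esac
    echo "cfgmgr"
}
```

For example, `plan_renumber ent2 ent11` prints the two rmdev -dl lines for en2 and et2, then rmdev and rendev for ent2, then cfgmgr.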

            Example:

            – New LPAR ID 5 will be mapped to VIO1 (ID 99) and VIO2 (ID 98)
            – LPAR ID 5 is assigned the following virtual adapters in its profile:
              – Slot 1-9 for network adapters
              – Slot 51 VSCSI to VIO1 (99) slot 51
              – Slot 52 VSCSI to VIO2 (98) slot 52
              – Slot 53 VFC to VIO1 (99) slot 53
              – Slot 54 VFC to VIO2 (98) slot 54
            – VIO1 (ID 99) dynamically adds:
              – Slot 51 VSCSI to LPAR ID 5 slot 51
              – Slot 53 VFC to LPAR ID 5 slot 53
            – VIO2 (ID 98) dynamically adds:
              – Slot 52 VSCSI to LPAR ID 5 slot 52
              – Slot 54 VFC to LPAR ID 5 slot 54

            Then on each VIO server, for each vhost and vfchost device:

            – lsdev -slots to determine the hardware location codes of the new vhost and vfchost devices

            – For example, vhost12 is XXXX-V99-C51
              – rmdev -l vhost12
              – rendev -l vhost12 -n vhost51  (from C/slot number to match LPAR ID and slot)
              – cfgmgr (or cfgdev) to activate them

            When done, map normally for VSCSI or VFC.
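The per-device step above (read the slot from the location code, then rename the device to match) is easy to script. This is a sketch under stated assumptions: `slot_from_loc` and `plan_slot_rename` are hypothetical names, the location code is assumed to end in `-C<slot>` as in the XXXX-V99-C51 example, and the commands are printed for review rather than executed.

```shell
# Hypothetical helpers: print (not run) the commands that rename a vhost or
# vfchost device so its number matches the slot in its location code,
# e.g. vhost12 at XXXX-V99-C51 -> vhost51. Run the output in oem_setup_env.
slot_from_loc() {
    # Assumes the location code ends in -C<slot>.
    echo "$1" | sed -n 's/.*-C\([0-9][0-9]*\)$/\1/p'
}

plan_slot_rename() {
    dev=$1 loc=$2
    prefix=${dev%"${dev##*[!0-9]}"}       # vhost12 -> vhost, vfchost3 -> vfchost
    slot=$(slot_from_loc "$loc")
    if [ -z "$slot" ]; then
        echo "cannot find a -C<slot> suffix in $loc" >&2
        return 1
    fi
    if [ "$prefix$slot" = "$dev" ]; then
        echo "# $dev already matches slot $slot"
        return 0
    fi
    echo "rmdev -l $dev"
    echo "rendev -l $dev -n $prefix$slot"
}
```

`plan_slot_rename vhost12 XXXX-V99-C51` prints `rmdev -l vhost12` and `rendev -l vhost12 -n vhost51`. Run cfgdev (or cfgmgr) once after all the devices are renamed, then map as usual.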

            The client LPAR and VIO slot matching numbering system has always been a good idea. Naming the devices significantly increases the readability of the virtual environment. All vhost and vfchost devices will have unique names that link them to the client LPAR.

            When coupled with ‘lspv -u’ on the client LPAR and VIO servers to determine client LUN mappings via UUID without manual tracing, this significantly simplifies virtual environments and troubleshooting.

Thanks again to Russell for allowing me to repost this information.

Running the RoCE Adapter in NIC Mode

Edit: Some links no longer work.

Originally posted December 10, 2013 on AIXchange

A colleague sent me an interesting solution to a problem he was seeing with a PCIe2 10GbE RoCE Converged Host Bus Adapter.

It was coming up in RDMA mode by default under VIOS 2.2.2.3 (which is AIX 6100-08-03-1339 under the covers), and the customer wanted to run it in NIC mode:

            “The PCIe2 10 GbE RDMA Over Converged Ethernet (RoCE) Adapter was supported only on previous versions of the AIX operating system to use the Remote Direct Memory Access (RDMA) configuration mode. AIX 7 with 7100-02 or later supports the adapter that is configured in either the RDMA or the network interface card (NIC) configuration. The host bus adapter (HBA), which was not available in earlier versions of the AIX operating systems, manages which mode is enabled.

            “As of AIX 7 with 7100-02, the PCIe2 10 GbE RoCE Adapter can be configured to run in the NIC configuration. If you do not have the network-intensive applications that benefit from RDMA, then you can run the adapter in the NIC configuration.”

The preceding URL includes some instructions on moving from RDMA to NIC (or from NIC to RDMA), but also note the steps taken by my colleague in his situation:

            ISSUE: PCI card EC30 (PCIe2 10GbE RoCE Converged Host Bus Adapter) is presenting only as hba0 and roce0 and not providing ethernet over fibre devices such as entX. The intention is to use this card for use on a VIO server as a shared Ethernet adapter in NIC mode.

            SOLUTION: The stack_type attribute has to be changed from aix_ib to ofed. The default for this card, as installed in this instance, is aix_ib (InfiniBand).

########################

#  VIEW CARD LOCATION  #

########################

# lscfg | grep -e "-C5-"

* hba0             U78C5.001.DQD02KZ-P2-C5-T1        PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714106104)

+ roce0            U78C5.001.DQD02KZ-P2-C5-T1-L0     PCIe2 10GbE RoCE Converged Network Adapter

########################

#  REMOVE ROCE DEVICE  #

########################

# rmdev -dl roce0

roce0 deleted

#############################

#  VIEW ATTRIBUTES OF HBA0  #

#     (note stack_type)     #

#############################

# lsattr -El hba0

bar0          0xfbf00000         Bus memory address 0 False

bar1          0xfc000000         Bus memory address 1 False

bar2          0x80000000         Bus memory address 2 False

busintr       0                  Bus interrupt level  False

busintrl      129536             Bus interrupt        False

devid         0xb315506714106104 Device ID            False

intr_priority 3                  Interrupt priority   False

rom_mem       0x80080000         ROM memory address   False

stack_type    aix_ib             RoCE Stack Type      True

#############################

#  CHANGE DEVICE ATTRIBUTE  #

#############################

# chdev -l hba0 -a stack_type=ofed

hba0 changed

########################

#  RUN CFGMGR          #

########################

# cfgmgr

#############################

#  VIEW ATTRIBUTES OF HBA0  #

#     (note stack_type)     #

#############################

# lsattr -El hba0

bar0          0xfbf00000         Bus memory address 0 False

bar1          0xfc000000         Bus memory address 1 False

bar2          0x80000000         Bus memory address 2 False

busintr       0                  Bus interrupt level  False

busintrl      129536             Bus interrupt        False

devid         0xb315506714106104 Device ID            False

intr_priority 3                  Interrupt priority   False

rom_mem       0x80080000         ROM memory address   False

stack_type    ofed               RoCE Stack Type      True

#########################

#  VIEW CARD LOCATION   #

# (note new ent devices)#

#########################

# lscfg | grep -e "-C5-"

* hba0             U78C5.001.DQD02KZ-P2-C5-T1        PCIe2 10GbE RoCE Converged Host Bus Adapter (b315506714106104)

+ ent4             U78C5.001.DQD02KZ-P2-C5-T1-L1     RoCE Converged Network Adapter

+ ent5             U78C5.001.DQD02KZ-P2-C5-T1-L2     RoCE Converged Network Adapter
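The same switch can be wrapped in a check so it is only attempted while the card is still in aix_ib mode. This sketch assumes you pass in the device names from your own system (hba0 and roce0 in the transcript). It reads `stack_type` by parsing `lsattr -El` output like the listings above, and it prints the commands instead of running them. `stack_type_of` and `plan_nic_mode` are hypothetical names.

```shell
# Hypothetical helpers. Read `lsattr -El hbaN` output on stdin and print the
# value of the stack_type attribute (second column).
stack_type_of() {
    awk '$1 == "stack_type" { print $2 }'
}

# Print (not run) the steps from the transcript above that move a RoCE HBA
# from aix_ib (RDMA) to ofed (NIC mode). Arguments: hba, roce device, current type.
plan_nic_mode() {
    hba=$1 roce=$2 current=$3
    if [ "$current" = "ofed" ]; then
        echo "# $hba already uses stack_type=ofed"
        return 0
    fi
    echo "rmdev -dl $roce"
    echo "chdev -l $hba -a stack_type=ofed"
    echo "cfgmgr"
}
```

On the VIO server (as root via oem_setup_env) this might be used as `plan_nic_mode hba0 roce0 "$(lsattr -El hba0 | stack_type_of)"`, after which you would check lscfg for the new entX devices as shown above.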

As noted, information on this issue can be found under PCIe2 10 GbE RoCE Adapter support:

            “The PCIe2 10 GbE RoCE Adapter is preconfigured to operate in the RDMA configuration mode. A network that uses RDMA is more complicated to set up than the NIC configuration mode, but provides better performance than the NIC mode for network-intensive applications. This mode is often helpful for network storage or high-performance computing.”

Displaying Disk UUID

Edit: Some links no longer work.

Originally posted December 3, 2013 on AIXchange

If you’re on a mailing list for IBMers and business partners, you saw this information from Russell Adams earlier this fall. For the benefit of those who aren’t on that list, and with Russell’s permission, I’m reposting it here. (Note there are minor edits for the sake of clarity.)

AIX 6.1 TL7 introduced a new flag for the lspv command. lspv -u shows the unique device identifier (UDID) and universally unique identifier (UUID) of disks in additional columns of the lspv output.

This new lspv -u is particularly useful in VIO server environments using vSCSI because the VIO client LPAR hdisk UDID contains the real UDID from the VIO server hdisk.

For example on a client LPAR using VSCSI for the rootvg (merged columns and spaces in the UDID are not a paste error)

… our client UDID contains the UDID from the VIO server with a prefix and suffix (where ^ indicates the prefix and suffix added by the VIO server):

Using UDIDs, the client can be quickly cross-referenced to the server with the most significant bytes of the UDID — in this case the middle 15 digits.
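That cross-reference is essentially a substring test, because the server's UDID appears inside the client's. Below is a minimal sketch; the function name and the sample strings in the usage note are hypothetical. Given one client UDID and candidate server UDIDs copied from `lspv -u`, it prints the server UDID that the client UDID contains. Because UDIDs can contain spaces, each one must be passed as a single quoted argument.

```shell
# Hypothetical helper: print the VIO server UDID embedded in a client UDID.
# The client's vSCSI UDID is the server UDID plus a prefix and suffix, so a
# plain substring match is enough. Quote each UDID; they may contain spaces.
match_udid() {
    client=$1
    shift
    for server in "$@"; do
        [ -n "$server" ] || continue    # an empty string would match anything
        case $client in
        *"$server"*)
            echo "$server"
            return 0
            ;;
        esac
    done
    return 1    # no candidate found inside the client UDID
}
```

For example, `match_udid 'PFX 600507680180 SFX' '111' '600507680180'` prints `600507680180`; the strings are made up for illustration.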

Historically, to find the real LUN that a client is using in a VIO server environment would require these steps for each VIO server:

* Obtain the client hdisk's parent vscsi device hardware location code and LUN number.
* Look up on the HMC which VIO server and slot the client vSCSI device is linked to.
* Look up the vhost adapter on the VIO server by slot number.
* Look up the vSCSI mappings for the vhost adapter to locate the hdisk on the VIO server.

A client with dual VIO servers would have to repeat the procedure twice. PVIDs can also shortcut the process, but they may not show up on the VIO server’s lspv output until after they’re written to the client and the VIO server is rebooted. If the client rewrites the PVID, the VIO server can also be out of date. Thus UDIDs are the preferred method because they’re static values.

The output can stretch the columns until they merge and spaces in the UDID break the columns. Hopefully this will be fixed in a future release.

More Twitter discussion from @robmcnelly:

@neverfishagain Tesla and Power 8 #power http://lnkd.in/dnHrmtS

@brian_smi  The Shell Scripts that make up #AIX https://www.ibm.com/developerworks/community/blogs/brian/entry/the_shell_scripts_that_make_up_aix …

@ROOTvgNET http://www-01.ibm.com/support/docview.wss?uid=aixtools_home … – IBM AIX Support Center Tools – United States | Some great #aixtools you should know.

@chmod666 The guy who gives me my passion for #AIX has just opened a blog. Please check it http://www.aixpowerlevel.com  . Thank you @prosty !!!!

@robmcnelly RT @cgibbo RT @Prosty: How to download a specific package from IBM fix central http://aixpowerlevel.com/2013/11/how-to-download-a-specific-package-from-ibm-fix-central/ … #AIX

Logging Console Output

Edit: Some links no longer work.

Originally posted November 27, 2013 on AIXchange

If you want to log your console output so you have a record of the commands you’re running, I can think of a few ways to accomplish this. One is to log in and use the script command:

            “The script command makes a typescript of everything displayed on your terminal. The typescript is written to the file specified by the File parameter. The typescript can later be sent to the line printer. If no file name is given, the typescript is saved in the current directory with the file name typescript.

            “The script ends when the forked shell exits.

            “This command is useful for producing hard-copy records when hard-copy terminals are in short supply. For example, use the script command when you are working on a CRT display and need a hard-copy record of the dialog.”
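To keep one typescript per session instead of overwriting the default typescript file, you can generate a file name first. This is a small sketch: the helper names, directory and naming scheme are assumptions, not anything script requires.

```shell
# Hypothetical helper: a per-host, timestamped typescript file name.
# Directory and naming scheme are assumptions; adjust to taste.
session_log_name() {
    dir=${1:-$HOME/typescripts}
    echo "$dir/$(uname -n)-$(date +%Y%m%d-%H%M%S).typescript"
}

# Start a logged shell; typing exit ends the typescript.
start_logged_session() {
    log=$(session_log_name "$1")
    mkdir -p "$(dirname "$log")" && script "$log"
}
```

Calling `start_logged_session` with no argument writes under ~/typescripts; pass a directory to log somewhere else.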

Another option is to set up logging in PuTTY, connect to the HMC over ssh and run vtmenu or run mkvterm:

            “Being someone who does implementation services, I want to always be sure about what I do, what I did and what effect it had. So keeping logs of every time I connect to any device is very important to me. I want to be able to go back in time to any point at any client I have ever worked with (and here are some other putty tricks to try)…”

And here’s a review of 10 native PuTTY tips.

Perhaps though you’re just using the vanilla Java console session on the HMC. In that case, this method should work:

            “Command line access to the HMC through SSH is required. You can enable this from the HMC console or through remote WebSM.

            “Once you are logged in:

            Click HMC Maintenance -> System Configuration -> Enable/Disable Remote Command Execution, check the SSH box.

            Next, click Enable/Disable Remote Virtual Terminals, check the box for Enable.

            1. Open a vterm window to the LPAR. Look at the left side of the title bar to get the vterm string, which will be in the form:

                        Partition number*Machine type-Model*Serial number

            2. From an SSH session on the HMC, create an empty file in /tmp with the vterm string as the filename. In this example:

                        $ touch /tmp/004*7040-681*020153A (Note: The serial number in the filename is case sensitive. It must match the string from the vterm title bar exactly.)

            3. Close the vterm, then open it back up and proceed with troubleshooting (activate the LPAR, etc.).

            4. As soon as the LPAR outputs information to the vterm window, check the size of the file in /tmp on the HMC and verify that it has been written to.

            5. If it has not been written to, double check the name of the file against the title bar. If they match, try performing a Close Terminal Connection operation to force it closed, then open it again. This should cause it to start logging to the file.

            6. When logging is no longer needed, close the vterm. Send the output of the file in /tmp on the HMC to your remote Support Center. Next, remove the file from /tmp. If you fail to do so, it will log everything written to the vterm whenever a vterm window is opened, and you will inadvertently fill up /var on the HMC at some point in the future.

            (Note: In an SP environment, use s1term <frame#> <slot#> | tee <logfile> instead.)

            “This is documented in the “HMC Installation and Operations Guide: Appendix D: Using scripts to connect remotely.”
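Because the file name has to match the title bar exactly, it can help to build it rather than type it. This sketch assumes the partition number is shown zero-padded to three digits, as in the 004 example above; check your own title bar. The helper names are hypothetical. The name is quoted when used, because the shell would otherwise treat the asterisks as glob characters.

```shell
# Hypothetical helpers: build the /tmp log file name the HMC vterm checks for,
# in the form <partition number>*<machine type-model>*<serial number>.
# ASSUMES the partition number is zero-padded to three digits (e.g. 004).
# Pass the partition number without leading zeros (4, not 004): printf reads
# a leading 0 as octal.
vterm_log_path() {
    printf '/tmp/%03d*%s*%s\n' "$1" "$2" "$3"
}

# Create the empty log file; quoting keeps the shell from globbing on '*'.
start_vterm_log() {
    touch "$(vterm_log_path "$1" "$2" "$3")"
}
```

When you are finished, remove the file the same way, for example `rm "$(vterm_log_path 4 7040-681 020153A)"`, so the HMC stops logging every later vterm session.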

Do you use one of these methods to log console output when you are working on machines, or do you prefer another way?

More Twitter discussion from @robmcnelly:

@mr_nmon After RHEL6.4 install on Power, only took 5 mins as root to get #PowerVC running. Any good #AIX admin user can do it. Writing my hints now.

@sql_handle #IBMPower #AIX sql.sasquatch track – Things Fall Apart: Again my Turn to Win Some or Learn Some http://bit.ly/IhdimK

@LindaGrigoleit #powersystems Academic Initiative is going strong and growing http://bit.ly/1ffbE3r #ibmi #aix #powerlinux #ibmacademicinitiative

RT @cgibbo PowerHA & mountguard via @wmduszyk http://www.wmduszyk.com/?p=10643&langswitch_lang=en … #AIX #PowerHA

@cgibbo LPM and viosecure. https://www.ibm.com/developerworks/community/blogs/cgaix/entry/lpm_and_viosecure?lang=en … #AIX #PowerVM

@brian_smi Little Known Feature in #SMIT Install Software Screen on #AIX https://www.ibm.com/developerworks/community/blogs/brian/entry/little_known_feature_in_smit_install_software_screen …

@mr_nmon #PowerVC 1.2 hands-on demo video by Greg Hintermeister showing the new wave of #PowerSystems LPAR management http://www.youtube.com/watch?v=RFTbC6JW7YE&feature=em-uploademail 

Following Up on the Technical University Event

Edit: Some links no longer work. I do not see much paper at events these days.

Originally posted November 19, 2013 on AIXchange

This article got me thinking about my own experiences at the recent IBM Power Systems Technical University conference. I always say that if you only get to go to one IBM education event during the year, this is the one to attend. There’s always something new to learn, and if you find yourself in a session that isn’t at the right skill level for you (too easy/too hard/too whatever), you can easily get up and go to another room down the hall.

In my case I had the chance to spend some time during the conference with IBMer Chris Gibson. I’ve known Chris for years, but since he lives in Australia, this was our first chance to say hello in person. It was also a great opportunity to actually hear IBM “rock stars” like Nigel Griffiths (@mr_nmon on Twitter) and Jay Kruemcke (@chromeaix) during their sessions. Thursday night’s “meet the experts” (or “stump the chumps,” depending on who you ask) event is always a highlight, as attendees answer all sorts of questions from the audience in a freewheeling discussion where customers can get more insight into IBM’s future plans.

One interesting change this year was that IBM distributed an electronic version of the session planner. In years past, upon registering you were handed a booklet containing the whole week’s sessions at a glance. This year we got a URL and login info so we could do everything electronically. Android users could even install an application.

The electronic approach had pros and cons. One nice thing was that information about session changes (additions, drops, changes in venue, etc.) was sent immediately to attendees’ handheld devices. Providing session feedback from my phone was also a snap. Using the directory of attendees, feedback could also be shared with conference goers and conference organizers. In addition, the electronic version made it very easy to search for topics, presenters, etc.

Some downsides to this approach included limited bandwidth and spotty network coverage at the venue. I found cellular coverage to be lacking in some areas. I imagine though that most attendees had at least one phone and one tablet/laptop with them — getting the wifi working fairly quickly wasn’t a huge issue. However, at times it took a while to load session data, and frankly, I just missed having my usual paper copy.

Apparently I wasn’t the only one, because later in the week, hard copies for each day’s sessions were made available at the registration desk. I like the paper schedule because I can look at the grid, circle the sessions I’m interested in, and know at a glance where I’m headed in the next hour. The paper schedule also offers the added benefit of making it easy to track down session slides that interest me. This year there was an index file to help us find specific sessions.

Obviously the trend toward eliminating paper is well-established, but it’s still good for some things. It will be interesting to see how IBM approaches this at future events.

Striking a Balance Between the Command Line and GUIs

Edit: Some links no longer work. Still a good discussion.

Originally posted November 12, 2013 on AIXchange

There’s been some recent discussion and debate about the benefits of using the command line (or green screen, in IBM i parlance) as opposed to GUIs.

Despite the wishes of some, the green screen is still with us. Despite the perceptions of others, the GUI is more than just a pretty interface.

In my case, I’ve been doing more work that involves connecting external storage with IBM i systems. What’s interesting to me is that many (though certainly not all) of the IBM i guys I’ve been dealing with strongly prefer the HMC GUI to the VIO server command line when they’re configuring virtual networking and virtual storage. As more and more can be done straight from the HMC GUI, these folks find that they no longer need to login as padmin. Then they question why they should even bother to learn the commands. Finally, if they never need to touch VIOS after installation, they wonder why they should bother even learning it when they can just call some AIX guy like me if they ever need help.

I can see the merit in this argument, and my counterargument that it’s good to know what’s going on under the covers is generally met with a look of, “I’m just going to call you anyway, so let’s talk about something else.”

This really got me thinking. Just what makes certain people inclined to favor the command line in IBM i over GUIs (and vice versa)? I’m sure familiarity has something to do with it. I saw my first AS/400 command line and first logged in as QSECOFR in the late 1980s. Today I can log on and do practically everything I could do then in the exact same way. If I’d spent my whole career working on servers from AS/400 up to Power Systems with IBM i, why would I want to learn anything else?

By forcing folks like this to learn VIOS, we’re taking them way out of their comfort zone into the crazy upside-down world of UNIX. This isn’t just about IBM i people, though. Other UNIX pros can be driven batty when they’re forced to use the Korn shell with set -o vi and stty erase ^? and oem_setup_env and all of the other quirky nonsense many of us take for granted because we use it every day.

While I’m neither shocked nor surprised when someone tells me that they prefer the HMC GUI, it does make me pause and consider. Do IBM i professionals REALLY prefer the command line? The biggest command line backers I know hate mice and will go to any length to avoid using them. Were this truly a command line vs. GUI argument, then IBM i guys would make the effort to learn how to log in as padmin just to get out of pointing and clicking on an HMC GUI.

Maybe the truth lies somewhere in the middle. Maybe the IBM i pros do understand that GUIs have their advantages in certain situations. In general though, they still prefer to do things the way they’ve always done them.

I’m the same way. I’d rather log in as padmin or root and get my work done on the command line. I find it faster, and I believe it gives me more control of and greater insight into what’s going on with the configuration. But another part of it is that I’ve always preferred logging into command lines when administering any machine. It’s what I’ve always done. The HMC GUI came along — not to mention IBM Systems Director and myriad other tools to manage machines — but I kept doing things the way I did them.

Is this because I’m resistant to change, or is it truly because I’m using the right tool for the job?

A Closer Look at PowerVC

Edit: Some links no longer work.

Originally posted November 5, 2013 on AIXchange

During the recent Enterprise2013 event, IBMers Glen Corneau and Bill Miller gave a nice presentation on PowerVC, the new virtualization management product.

We were also able to see a live demo. The understanding was that this wasn’t the GA code that will ship once the product is released, but what we saw represented the functionality that’s expected to be delivered.

I noticed the following in one of the slide decks: “with PowerVC you can register physical hosts, a storage subsystem, and network resources and use them to create a virtual environment. You can create, resize and attach volumes to virtual machines. You can monitor the utilization of the resources in your environment, and you can migrate virtual machines while they are running. You can capture a virtual machine that is configured the way you want it to be.”

PowerVC is designed to be simple to install and configure, while providing an intuitive user interface. What we saw in Orlando reminded me of the V7000 or XIV interfaces (perhaps you’re familiar with them).

PowerVC is built on OpenStack, which includes open APIs that are designed to provide flexibility and agility. There are two editions, Express and Standard. Express is meant for IVM-managed servers, while Standard is intended for HMC-managed servers. POWER6, POWER7 and POWER7+ are all supported. On the OS side, AIX and PowerLinux are currently supported, while IBM says IBM i will be supported in a future PowerVC release.

PowerVC runs on top of a RHEL 6.4 OS image on Power or x86 with a minimum of 8 GB of memory, two virtual uncapped CPUs with a minimum of one entitled CPU (two entitled CPUs are recommended) and 40 GB of disk (or more if you plan on importing many .iso images). For now, the IBM SVC storage family — SVC, V7000, V3700, V3500 — with V6.4 or later code must be used. PowerVC is currently limited to one managed storage subsystem.

The PowerVC server must be able to talk across the network to the storage, fabric and IVM/VIOS LPARs. This product does not install VIOS for you; it assumes that IVM/VIOS is already configured and installed.

With the Express edition, IVM 2.2.1.5 or later is required. You can run Virtual SCSI only, and your storage must be pre-zoned. This edition has a limit of five managed hosts and a maximum of 100 LPARs.

Standard edition requires HMC V7.7.8 or later running on CR5/C08 or later HMC models. You must be running VIOS V2.2.3 or later. It supports NPIV only, and Brocade switches only. (Note: Hopefully more vendors’ storage and network products will be supported in future releases, but we need to be sure to let those vendors know that they need to provide APIs to OpenStack.) Ten managed hosts and 40 LPARs per host are allowed, for a maximum of 400 LPARs.

The charts from the presentation in Orlando also detail the differences between VMControl V2.4.3 and PowerVC V1.2.

            VMControl:

            • Supports AIX, Linux and IBM i

            • Supports IVM and HMC

            • Suspend/Resume workloads

            • Remote Restart workloads

            • LPM to host or pool

            • Virtual SCSI and NPIV (with appropriate storage+SAN)

            • VIOS Shared Storage Pool support

            • Requires IBM Systems Director

            • Supports NIM-based, SCS-based and SSP capture/deploy

            • Supports IBM DS8000-family, XIV, SVC-family, DS storage

            • Limited third party disk support

            Use VMControl if you are looking for the following capabilities across multiple platforms:

            • Cross-platform management, navigation and look and feel

            • Management of multi-workload system pools, Virtual Image Versioning management

            • System pool creation and end-to-end workload availability management

            • Supports NIM, SCS and Shared Storage Pool deployment environments

            • Supports NPIV and VSCSI environments for XIV, SVC, V7000, and DS8000

            • Requires IBM Systems Director as a base

            PowerVC

            • Supports AIX, Linux

            • IBM i is a statement of direction

            • Supports IVM and HMC

            • LPM to host

            • Virtual SCSI and NPIV (with appropriate storage+SAN)

            • Built on OpenStack, no IBM Systems Director dependency

            • Supports SCS-type image capture/deploy

            • Supports ISO images

            • Supports SVC-family storage

            • Modify resources during deploy

            Use PowerVC if you want to manage Virtual Machines on Power running Linux or AIX:

            • PowerVC is advanced virtualization management for Power Systems

            • Fast time to value and quick integration with SmartCloud bundle

            • PowerVC initial offering supports NPIV, V7000 or SVC and Brocade switches

            • Virtual Machine Image management, deployment, relocation, capture and creation

            • Create and manage virtual machines, automate workload and resource provisioning

            • Offered standalone or as part of SmartCloud bundle and AIX Enterprise Edition

Learn more about PowerVC:

            PowerVC on the Web

            PowerVC on Service Management Connect

            PowerVC Prototype Demo

On another note, I saw this on a mailing list:

            Flex System Manager v1.3.1 for Android and iOS
            This release went live Friday, October 12, 2013.

Android: https://play.google.com/store/apps/details?id=com.ibm.msm.android

iOS: https://itunes.apple.com/us/app/ibm-flex-system-manager-for/id576901013?ls=1&mt=8

And finally, more Twitter discussion from @robmcnelly:

@chromeaix #Oracle documented their policies for software licensing. See http://www.oracle.com/us/corporate/pricing/software-investment-guide/index  …

RT @cgibbo RT @mymindspace: “Best Practice” recommendation for #AIX Virtual Memory Manager settings for #DB2: http://www-01.ibm.com/support/docview.wss?uid=swg21328602 …

RT @chromeaix IBM Software PVU value for any Power Systems core running Linux is now set to 70 #powerSystems #linux http://ow.ly/qnJD9

@IBMRedbooks IBM Power Systems – it is all made by one company! Read our new blog post here: http://ibm.co/16NxOSd  #PowerSystems

RT @BreakingNews FAA: Airlines can safely expand passenger use of portable electronic devices during all phases of flight – @NBCNews

@IBMPowereSupp 30 Oct This is what we were showing at #Enterprise2013 RT @IBMPowereSupp: Support Portal is out and it’s not scary at all! http://support.ibm.com

@attritionorg 29 Oct Telnet of the Day: 107.21.219.86

@IBMRedbooks 29 Oct William Lowe, the ‘father of the IBM PC,’ dies at 72: http://cnet.co/1irimjS

@UnixToolTip 29 Oct “Pointing and clicking does not scale.”

Getting Started with CoD

Edit: Some links no longer work. Luckily CoD activation is much easier these days.

Originally posted October 29, 2013 on AIXchange

Hopefully you read Charlie Cler’s article in the September issue of IBM Systems Magazine about the various Capacity on Demand (CoD) options.

Here’s a CoD question I often get: How do you activate it?

The first step is to get the necessary activation codes from IBM — and yes, you’ll likely be getting multiple codes. Recently a customer I was working with got three codes: one to activate more memory, one to activate more processors and one to enable PowerVM Enterprise edition, VIO servers, micropartitioning, etc., on the newly activated processors.

IBM ships the codes on paper, so the hardest part of the whole exercise is making sure the 34 characters of the activation codes are correctly entered on your HMC. The document you receive from IBM displays the system type, the serial number, the Anchor card CCIN, the Anchor Card serial number and the Anchor card unique identifier. You’ll also see how many previously activated resources you had and how many you’re activating. This can be a nice sanity check to ensure that the order went through as expected.

I’ve had customers mistakenly believe they could activate resources that don’t yet exist in the machine. However, this isn’t magic; the hardware must first be installed. In many cases the needed resources are installed with new customer systems, but not immediately activated.

Once you have your activation code(s), go to your HMC, select the server you want to work on and then select the Capacity on Demand task. There’s a spot on this menu where you can view the history log — which displays all of the dates and times of the various resources you’ve activated — as well as a place to enter your new CoD code.

There are also places where you can view capacity settings for your resources — including, for example, inactive CoD memory, permanently activated memory, temporarily activated memory and installed memory.

The same menus are available for processors. You’ll also find options for Enterprise Enablement. Further down the screen is an area for PowerVM and Other Advanced Functions.

Many customers expect their environments to grow and can logically assume that they'll eventually need new resources, but it can be difficult to pinpoint when that need will require action. CoD can be a great way to prepare for the unknown.

Note: We just finished up the IBM Power Systems Technical University at Enterprise2013 event. I attended tons of great sessions and met and talked with many IBM presenters as well as a number of readers of this blog. Incidentally, the 2014 Technical University will be at the Venetian in Las Vegas on Oct. 6-10, so add it to your calendar now and start making plans to attend. I hope to see you there.

Finally, a few highlights from Twitter. Follow @robmcnelly, and check out #ibmenterprise for tweets related to the conference.

RT @ElReg They’ve taken my storage hostage … now what?: How one user device nearly brought down the business. http://bit.ly/19Ac9jG

RT @cgibbo RT @IBMPowerSystems: #PowerSystems tip of the day: seastat search option http://ibm.co/16vH3pS

RT @chromeaix #powersystems #AIX IBM PowerHA SystemMirror rapid deploy cluster worksheets for IBM AIX http://ow.ly/2AXYeJ

RT @cgibbo RT @IBM_FLRT: Check out new #FLRT Lite https://www14.software.ibm.com/support/customercare/flrt/liteHome… Quick and easy recommendations at your fingertips!

RT @scalzi Old School (via @Reddit): http://i.imgur.com/g0Zgf6q.jpg

The Power of the HMC Command Line

Edit: Some links no longer work.

Originally posted October 22, 2013 on AIXchange

When using the HMC, do you do more with the GUI or on the command line? The more systems you’re managing and the more operations you’re doing, the more you’ll benefit by getting comfortable with the HMC command line.

While I like new commands such as lsnportlogin and chnportlogin, the HMC command line itself isn’t new. For instance, this article from 2008 has some handy tricks. And to give you an idea of the wealth of useful information here, I’ll include the list of contents:

            HMC Management

                HMC Version

                Network configuration of the HMC

                Reboot the HMC

                How to change the HMC password (of user hscroot)

                Show Available Filesystem Space

            LPAR Management: Status Information

                LPAR Status

                Show Status and LED/LCD Display of an LPAR

                Show Status and LED/LCD Display of a System Running in FullPartitionMode

                Overview LPAR IDs

                Overview Connection State

                Show a List of all I/O Adapters

                Overview DLPAR status

            LPAR Management: Operations

                Soft Reset of an LPAR

                Soft Reset of a System Running in FullPartitionMode

                Hard Reset of an LPAR

                Hard Reset of a System Running in FullPartitionMode

                Virtual Console

                Activation of an LPAR

                How to boot an LPAR into SMS Menu

                How to Power on a System Running in FullPartitionMode

                Bring the key switch to position NORMAL

            LPAR Configuration

                Change an LPAR’s Name

                Rename a Managed System

                DLPAR: Increase the Number of Processing Units of an LPAR

            Operations in a virtualized environment

                Make virtual WWPNs visible to the SAN

                Show all virtual WWPNs assigned to an LPAR

                Logout virtual WWPNs from the SAN

Here are just a few things you can do from the HMC command line:

* Would you like to see all of the managed systems that are connected to your HMC? Run:

            lssyscfg -r sys -F name

* Perhaps you need to know which LPARs are on your machine and whether or not they’re running:

            lssyscfg -m Server1 -r lpar -F name:state

* This handy command lists every machine connected to your HMC, and tells you whether or not the LPARs on these devices are running:

            for m in $(lssyscfg -r sys -F name); do echo $m ; lssyscfg -r lpar -m $m -F name:state ; done

* Maybe you want to know the machine name, along with the IP address the service processor is using, and whether or not it’s connected to the HMC:

            lssysconn -r all -F type_model_serial_num:ipaddr:state | sort

* Maybe you want to see which I/O devices are assigned to which LPARs:

            lshwres -r io -m Server1 --rsubtype slot -F lpar_name:drc_name:description

* Or perhaps you want to see the profile information for your LPAR 1:

            lssyscfg -r prof -m Server1 --filter "lpar_ids=1"

* You can also use lssyscfg to determine all of the WWPNs associated with your LPAR:

            lssyscfg -r prof -m Server1 -F virtual_fc_adapters --filter lpar_names=lpar1

This command produces output like this:

            """2405/client/3/vios2/2405/c0507606b5ef0012,c0507606b5ef0013/0"",""1605/client/2/vios1/1605/c0507606b5ef0010,c0507606b5ef0011/0"",""2605/client/3/vios2/2605/c0507606b5ef0014,c0507606b5ef0015/0"",""1405/client/2/vios1/1405/c0507606b5ef0016,c0507606b5ef0017/0"""

* From this output you can easily see the adapter numbers and which VIO server each adapter connects to. You can also change what you filter on; here we look up the LPAR by its ID number rather than by name:

            lssyscfg -r prof -m Server1 -F virtual_fc_adapters --filter lpar_ids=8

* Maybe you want to list every WWPN for every LPAR on your machine with its default profile:

            lsnportlogin -m Server1 --filter "profile_names=default"

* Or maybe you really just want the WWPNs without other information included:

            lsnportlogin -m Server1 --filter lpar_names=lpar1 | cut -c 68-88

            wwpn=c0507602c5340034

            wwpn=c0507602c5340035

            wwpn=c0507602c5340042

            wwpn=c0507602c5340043

            wwpn=c0507602c5340044

            wwpn=c0507602c5340045

            wwpn=c0507602c5340030

            wwpn=c0507602c5340031

* Maybe you want to list out the LPAR names with the WWPNs:

            lssyscfg -r prof -m Server1 --filter lpar_names=lpar1 -F lpar_name,virtual_fc_adapters

* Or you could check every frame connected to your HMC with something like this:

            lssyscfg -r sys -F name |
            while read M; do lshwres -r virtualio --rsubtype fc --level lpar -m $M -F lpar_name,wwpns |
            sed 's/^/'$M,'/'
            done

* This loop logs in the virtual fibre channel adapters of all of the LPARs on a frame:

            for i in `lssyscfg -m Server1 -r lpar -F name`; do echo $i;chnportlogin -o login -m Server1 -p $i ; done
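If you save this output on a workstation (I'm assuming you capture it over ssh), a couple of small filters make the WWPN data easier to hand to the SAN team. Both functions are my own hypothetical helpers, not HMC commands. The first splits a virtual_fc_adapters value into one line per adapter, assuming each entry follows the slot/type/remote LPAR ID/remote LPAR name/remote slot/WWPNs/required layout seen in the output above. The second pulls wwpn= values out of comma-separated attribute=value lines regardless of column position, so it doesn't break the way `cut -c 68-88` would if a name gets longer. (The `-F` option in lsnportlogin's usage text may also let you select just the wwpn attribute directly.)

```shell
#!/bin/sh
# Sketch only: post-process saved HMC output on a workstation.

# parse_vfc_adapters VALUE
# Assumed entry layout:
#   slot/type/remote_lpar_id/remote_lpar_name/remote_slot/wwpns/is_required
parse_vfc_adapters() {
  printf '%s\n' "$1" |
    awk '{
      gsub(/"",""/, "\n")   # separator between adapter entries
      gsub(/"/, "")         # strip the remaining quoting
      print
    }' |
    awk -F/ 'NF >= 6 {
      printf "slot=%s type=%s remote_lpar=%s(%s) remote_slot=%s wwpns=%s\n",
        $1, $2, $4, $3, $5, $6
    }'
}

# extract_wwpns < listing
# Prints each wwpn=... value from comma-separated attribute=value lines,
# wherever it appears on the line (wwpn_status=... is not matched).
extract_wwpns() {
  tr ',' '\n' | sed -n 's/^wwpn=//p'
}
```

For example, `parse_vfc_adapters "$(cat vfc.txt)"` turns the single quoted line into one readable line per adapter, and `extract_wwpns < lsnportlogin.txt` yields a bare WWPN list (the file names here are hypothetical).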

There’s much more of course, but this should give you an idea of the power of the HMC command line.
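As one small example of building on these commands, the name:state listings from the lssyscfg examples above can be tallied by state on a workstation. This is a hypothetical helper of my own; it assumes LPAR names contain no colons, and it skips lines without a colon, such as the managed-system names the earlier for loop echoes.

```shell
#!/bin/sh
# Sketch only: count LPARs per state from "name:state" lines, e.g. from
#   lssyscfg -r lpar -m <system> -F name:state
# Lines without a colon (managed-system names echoed by a loop) are skipped.
summarize_lpar_states() {
  awk -F: 'NF >= 2 { count[$2]++ }
           END { for (s in count) printf "%s %d\n", s, count[s] }' "$@" | sort
}
```

A possible usage, with a hypothetical HMC hostname: `ssh hscroot@myhmc 'lssyscfg -r lpar -m Server1 -F name:state' | summarize_lpar_states`.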

Finally, some interesting links this week courtesy of those I follow on Twitter:

@cgibbo Dynamic Platform Optimizer with Tracy Smith. October 31, 2013. Register now. https://www1.gotomeeting.com/register/214938672 … https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Power+Systems/page/AIX+Virtual+User+Group+-+USA … #AIX

RT @UnixToolTip RT @jpmens: Unix Recovery Legend http://www.ee.ryerson.ca/~elf/hack/recovery  …

@mr_nmon IDC Whitepaper on security of #PowerVM Virtualisation http://public.dhe.ibm.com/common/ssi/ecm/en/pol03175usen/POL03175USEN.PDF … Double standards: POWER=mission critical & x86 anything goes!

@ibmperformance Oracle’s hardware business may be worse than we thought http://gigaom.com/2013/10/14/oracles-hardware-business-may-be-worse-than-we-thought/ … via @gigaom

@chromeaix #powersystems #AIX Using the NIM service handler with the NIM Alternate Disk Migration tool http://ow.ly/2ADpjg

Three Scripts

Edit: Some links no longer work.

Originally posted October 15, 2013 on AIXchange

I’ve seen a few good scripts lately. Brian Smith mentions a couple of them in his blog. This one will show you if your HBA/hdisk settings are actually in effect.

            “There are several storage related settings in AIX that cannot be changed if the device is active. These include “fast_fail,” dynamic tracking (dyntrk), and the “num_cmd_elems” for HBAs and the Queue Depth for hdisks.

            “Your options to set these are either make the device inactive (usually by taking redundant paths offline) and then make the change, or to use the “-P” flag on chdev and then reboot the server to make the change effective at the next boot.

            “The “-P” option on chdev has one major drawback however. As soon as you make the change with chdev “-P”, it appears that the setting is active right away even before the reboot. If you check with “lsattr”, it will appear as if the setting has taken effect. However it actually won’t take effect until the next reboot. What has essentially taken place is that the running configuration is out of sync with the ODM. The ODM reflects the updated settings, however they can’t be changed in the running configuration of the AIX kernel until the next reboot.”
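Brian's script does the real work here. The sketch below only illustrates the idea it relies on: compare what the ODM says against what is actually in effect and flag the differences. It takes two "attribute value" listings and prints mismatches. How you capture the in-effect values depends on your AIX level, so the two input files, the function name and the sample attribute values are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: report attributes whose configured (ODM) value differs from
# the value in effect. Both inputs are "attribute value" lines.
# Note: the NR == FNR idiom assumes the first file is not empty.
diff_attrs() {
  # $1 = configured listing, $2 = in-effect listing
  awk 'NR == FNR { want[$1] = $2; next }
       ($1 in want) && want[$1] != $2 {
         printf "%s: configured=%s effective=%s\n", $1, want[$1], $2
       }' "$1" "$2"
}
```

Any line it prints is a setting that, in Brian's words, won't take effect until the next reboot.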

Brian’s other script shows the location of every physical partition on each hdisk:

            “… The output shows which Logical Volume (LV) is on each of the PPs (or if it is free space). The output is color coded so each LV has its own color so that it is very easy to see where each LV physically is across the entire Volume Group. You can specify the number of columns of output depending on the size of your screen.

            “The intended use of the script is to show a visual representation of the Volume Group to make using commands which move around LPs/PPs such as migratelp easier to use, to make LVM/disk maintenance easier, and also as a learning tool.”

Finally, IBMer Dean Rowswell sent me the following script. He explains, “Lately I’ve been working with customers who are still using virtual SCSI, so I updated my old script with some new information. Maybe others will use this, too. It helped me to quickly and easily set the MPIO path priorities to balance I/O across the VIOS.”

It runs on an AIX LPAR. Here’s some sample output:

root@nim:/:# get_vdisk_path_priority
---------------------
Virtual SCSI adapters
---------------------
U9117.MMA.06XXXXX-V6-C41  Virtual I/O Slot  vscsi1
U9117.MMA.06XXXXX-V6-C42  Virtual I/O Slot  vscsi2

Attributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: vscsi1 -> fast_fail 30
Attributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: vscsi2 -> fast_fail 30

VDISK: hdisk0     ADAPTER: vscsi2  (MPIO priority: 2)   ADAPTER: vscsi1  (MPIO priority: 1)    VG: vgNIM1
VDISK: hdisk2   ADAPTER: vscsi1  (MPIO priority: 1)   ADAPTER: vscsi2  (MPIO priority: 2)     VG: rootvg
VDISK: hdisk3     ADAPTER: vscsi1  (MPIO priority: 1)   ADAPTER: vscsi2  (MPIO priority: 2)     VG: vgNIM2
VDISK: hdisk7     ADAPTER: vscsi2  (MPIO priority: 2)   ADAPTER: vscsi1  (MPIO priority: 1)     VG: None

Now for Dean’s script: 

#!/bin/ksh
# Created by Dean Rowswell, November 8, 2011
# Modified by Dean Rowswell, September 5, 2013
#       Combine all paths into a single line with the hdisk
# Modified by Dean Rowswell, October 2, 2013
#       Add the volume group info for each hdisk
#       Display the vscsi adapter attributes
# This script will display each virtual scsi disk path priority info
# Note: the \n, \t and \c escapes below rely on AIX ksh's echo.

# All available disks whose subclass is vscsi
VDISKS=`lsdev -Cc disk -Sa -s vscsi -F name`

if [ ${#VDISKS} -eq 0 ]
then
        echo "There are no Virtual SCSI disks on this system"
        exit 0
else
        DATE=`date +'%Y%m%d_%H%M%S'`
        echo "---------------------"
        echo "Virtual SCSI adapters"
        echo "---------------------"
        lsslot -c slot|grep vscsi
        # Error-recovery and path-timeout settings for each vscsi adapter
        for VSCSI in `lsdev -Ccadapter -Sa -F name|grep vscsi`
        do
                echo "\nAttributes (vscsi_err_recov and vscsi_path_to) for VSCSI adapter: ${VSCSI} -> \c"
                lsattr -El ${VSCSI} -a vscsi_err_recov,vscsi_path_to -F value | tr '\n' ' '
        done
        echo
        # Save lspv output once so each disk's VG can be looked up
        lspv >/tmp/lspv.${DATE}
        for VDISK in ${VDISKS}
        do
                echo "\nVDISK: ${VDISK}\t\c"
                # One parent:connection pair per path
                LSPATHS=`lspath -F 'parent:connection' -l ${VDISK}`
                for LSPATH in ${LSPATHS}
                do
                        PARENT=`echo ${LSPATH} | awk -F: '{print $1}'`
                        CONN=`echo ${LSPATH} | awk -F: '{print $2}'`
                        # lspath -AE prints the path's priority attribute; field 2 is its value
                        echo "  ADAPTER: ${PARENT}  (MPIO priority: `lspath -AE -l ${VDISK} -p ${PARENT} -w ${CONN}|awk '{print $2}'`) \c"
                done
                VG=`grep -w ${VDISK} /tmp/lspv.${DATE} | awk '{print $3}'`
                echo "\tVG: ${VG}\c"
        done
fi

rm /tmp/lspv.${DATE}
echo
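Since the point of setting path priorities is to balance I/O across the VIO servers, it can help to count, per adapter, how many disks list it as their priority 1 path (in AIX MPIO, 1 is the highest priority). This filter is my own sketch, not part of Dean's script. It assumes each disk's VDISK line arrives on a single line, as the script's `\c` echoes produce, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: read get_vdisk_path_priority output and count, for each
# vscsi adapter, how many disks prefer it (MPIO priority 1).
summarize_primary_paths() {
  awk '
    /^VDISK:/ {
      adapter = ""
      for (i = 1; i < NF; i++) {
        if ($i == "ADAPTER:") adapter = $(i + 1)
        else if ($i == "priority:") {
          prio = $(i + 1)
          sub(/\).*/, "", prio)          # "1)" -> "1"
          if (prio + 0 == 1 && adapter != "") primary[adapter]++
        }
      }
    }
    END { for (a in primary) printf "%s %d\n", a, primary[a] }
  ' "$@" | sort
}
```

Pipe the script into it (`get_vdisk_path_priority | summarize_primary_paths`); an adapter that is missing from the result, or far ahead of the others, suggests the priorities need another look.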

More conversation from Twitter (@robmcnelly):

@ROOTvgNET

ROOTVG – AIX & POWER Portal – Stream AIX AUDIT into SYSLOG | Do more with AIX Audit and Syslog! http://www.rootvg.net/content/view/575/1/ … via @ROOTvgNET

@chromeaix #powersystems #AIX Using the NIM service handler with the NIM Alternate Disk Migration tool http://ow.ly/2ADpjg

@cgibbo Creating a System Copy WPAR. https://www.ibm.com/developerworks/community/blogs/cgaix/entry/lpar_to_wpar_migration?lang=en … #AIX #WPARS

@ibmvlp Download the Quick Reference mobile app for IBM Power Systems http://www-03.ibm.com/systems/power/resources/mobileapp/index.html …

@power_gaz >100 on “Tricks of the Power Masters” webinar http://tinyurl.com/PowerSystemsTechnicalWebinars … including @cgibbo and @chmod666 Thanks #ibmpowersystems #aix #power7

@UnixToolTip One advantage of visually dull environments like the command line or an editor is that there isn’t much to do there but work.

@mr_nmon 9 Oct Ten Things POWER & AIX Techies need to know from the IBM Announcements 8th Oct 2013 See my #AIXpert blog https://www.ibm.com/developerworks/community/blogs/aixpert/entry/ten_things_power_techies_need_to_know_from_the_ibm_announcements_8th_oct_2013 …

@cgibbo: @brian_smi new -T flag for mksysb command.” This does a JFS2 snapshot & backs that up for a time consistent fileset.

@cgibbo 7 Oct Simplified Shared Ethernet Adapter Failover config. Removes requirement for ctrl channel for SEA FO. #PowerVM #AIX http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=760&letternum=ENUSJP13-0509 …

@cgibbo 7 Oct New lsattr P option to display attributes that may not yet be in effect on running system. AIX 7.1 TL3 & 6.1 TL9 #AIX http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=760&letternum=ENUSJP13-0508 …

@zootman HMC v 7.8 – Tracking of DLPAR activity within the current profile enables reactivation of a LPAR with all changes intact. YAY!

PowerVP, PowerVC Highlight Today’s IBM Announcements

Edit: Some links no longer work.

Originally posted October 8, 2013 on AIXchange

Today’s announcements have quite a bit of interesting information, but two new products that I’m especially excited about are PowerVP and PowerVC.

PowerVP uses a graphical Java client to monitor virtual workloads, which of course can be much more complex than the workloads we once managed on standalone systems. With virtualized systems, performance issues can come from physical hardware, the VIO server, the client LPAR or something outside of the frame entirely.

Once you install an agent on the supported version of your operating system and update your POWER7 or POWER7+ systems to the proper level of firmware, you’ll have the option of viewing system performance in real time or in DVR mode. The latter allows you to read log files to determine how your system was performing earlier. Data is collected on the Windows, AIX or Linux workstation on which the PowerVP monitor runs.

PowerVP allows you to drill down and view specific memory DIMM or CPU usage. You can also see the hardware adapters you’re using and how heavily these components are being utilized. In short, PowerVP provides an overall view of your hardware so you can see how your machines are consuming resources. GA is expected on Nov. 15.

PowerVC is advanced virtualization management. I think of it as a simplified version of the VMControl plug-in for IBM Systems Director. This product is based on OpenStack, an open source solution that provides cloud infrastructure capability. IBM wanted to make PowerVC simple to install, simple to configure and simple to use. On a recent briefing call, IBM said it brought customers to its labs and recorded them using the product. These sessions indicated that PowerVC requires minimal training and is very intuitive and self-explanatory.

This video shows a prototype of the code from June 2013. Additional information can be found here and here. PowerVC GA is expected on Nov. 22.

Other announcement highlights include:

* The Power Integrated Facility for Linux — Power IFL is a new solution that allows customers to activate unused cores and memory to run Linux workloads on IBM Power hardware at a very competitive price point. In practice, customers will run these Linux cores in virtual shared processor pools that are separate from their existing AIX or IBM i shared processor pools.

* PowerSC is updated with Linux compliance automation and improvements in the trusted firewall.

* The new PowerVM version features shared storage pool enhancements and improved live partition mobility performance. In addition, new information in the VIOS Performance Advisor tool is going to be available to cover fibre channel, shared Ethernet adapters and shared storage pools.

* New Power Enterprise pools are designed to allow for flexibility for IBM clients. Customers will be able to purchase virtual processors and memory (CUoD resources) that may be shared within a defined pool of enterprise-class Power servers. Applications may be reallocated within that pool of servers as needed with live partition mobility.

* AIX 7.1 TL3 and 6.1 TL9 feature enhanced live backup support and provide better LDAP support for users and groups.

As you learn more about these announcements, what stands out to you?

Finally, more highlights from Twitter (@robmcnelly):

RT @cgibbo RT @Fed67j: A new #hmcScanner version is available with graph: http://ibm.co/1489RC8

@mr_nmon Demo’s are so 1990s! I captured everything I know on #SystemsDirector in YouTube videos=4 hours/14 parts: https://www.ibm.com/developerworks/community/blogs/aixpert/entry/systems_director_6_3_demonstrations?lang=en … #POWER7

RT @cgibbo RT @mohakevin: #AIX – LVM #cheatsheet http://adri.ws/jdwq4

RT @brian_smi Script to show if your #AIX HBA / hdisk settings are actually in effect https://www.ibm.com/developerworks/community/blogs/brian/entry/compare_hba_settings …

@sandy_carter Wow! Blogs are 63% more likely to influence purchase decisions than magazines. (Source: Optimind) #socbiz @PamMktgNut #ibmsocialbiz

@IBMRedbooks Watch our new IBM Power 710 and 730 Technical Overview and Introduction video here: http://youtu.be/Jq565b_2fks  #PowerSystems

@AIXmag VIDEO: @robmcnelly demonstrates how to create sysplans from the HMC & run the system-planning tool on your PC. #AIX

RT @Greater_IBM 5 Ways To Become An #IBM Champion (Oct 15 Deadline): http://wp.me/p2kcos-2wm #ibmchampion #developerworks #ITLeader

Logging a Client Virtual Fibre Channel Adapter into a SAN

Edit: Some links no longer work.

Originally posted October 1, 2013 on AIXchange

Recently a customer was presenting some LUNs to some NPIV clients on a server. There were many LUNs and many clients, and the SAN guys wanted all of them to appear on the switch so they could begin zoning them.

I remembered reading this Chris Gibson article about the chnportlogin and lsnportlogin commands that you can run on the HMC command line:

            “There are two new HMC (V7.7.3.0) commands that can force a client Virtual Fibre Channel adapter to log into a SAN. This should make the life of the AIX and SAN administrator easier, as they will no longer need to install AIX in order for the new VFC adapters to log into the SAN. Although there was an unsupported method* for doing this already (see links below). Nor will the SAN admins need to “blind” zone the WWPNs.”

For instance, we could run…

            chnportlogin -m Server1 -o login --id 84

            lsnportlogin -m Server1 --filter lpar_ids=84

… and see output similar to what's in this Word document.

That output was also helpful because it provided the WWPN information for each client.

One of the LPARs had issues, so we were able to log out, log back in and then verify the information with these commands:

            chnportlogin -m Server1 -o logout --id 72

            chnportlogin -m Server1 -o login --id 72

            lsnportlogin -m Server1 --filter lpar_ids=72
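When several LPARs need the same logout, login and verify cycle, a small generator can print those three commands for each partition ID so you can review them before pasting them into an HMC session. This is a hypothetical helper that runs in any POSIX shell; it only prints commands and never runs them.

```shell
#!/bin/sh
# Sketch only: print (do not run) the chnportlogin/lsnportlogin sequence
# used above for each partition ID given.
# Usage: relogin_cmds <managed system> <lpar id> [<lpar id> ...]
relogin_cmds() {
  sys=$1
  shift
  for id in "$@"; do
    printf 'chnportlogin -m %s -o logout --id %s\n' "$sys" "$id"
    printf 'chnportlogin -m %s -o login --id %s\n' "$sys" "$id"
    printf 'lsnportlogin -m %s --filter lpar_ids=%s\n' "$sys" "$id"
  done
}
```

For example, `relogin_cmds Server1 72 84` prints six commands covering both partitions. Generating rather than executing keeps a human check in the loop before anything touches the SAN logins.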

For usage information, just run either command without arguments on the HMC command line:

            chnportlogin

            Usage: chnportlogin -o login | logout

                                -m <managed system>

                                -p <partition name> | --id <partition ID>

                                [-n <profile name>]

                                [-w <wait time>]

                                [-d <detail level>]

                                [-v]

                                [--help]

            This performs N_Port login and logout operations for virtual fibre channel client adapters that are configured in a partition or a partition profile:

                -o                      - the operation to perform:

                                            login  - log in the virtual fibre channel client adapters

                                            logout - log out the virtual fibre channel client adapters

                -m <managed system>     - the managed system's name

                -p <partition name>     - the name of the partition for which the operation is to be performed

                --id <partition ID>     - the ID of the partition for which the operation is to be performed

                -n <profile name>       - the name of the profile for which the operation is to be performed

                -w <wait time>          - the maximum time, in minutes, to wait for VIOS commands issued by the management console to complete

                -d <detail level>       - the level of detail to request from VIOS commands issued by the management console; values range from 0 (none) to 5 (highest)

                -v                      - enables verbose mode

                --help                  - prints this help

            lsnportlogin

            Usage: lsnportlogin -m <managed system>

                                [-w <wait time>]

                                [-d <detail level>]

                                --filter "<filter data>"

                                [-F [<attribute names>]]

                                [--header]

                                [--help]

            This lists WWPN login status for virtual fibre channel client adapters.

                -m <managed system>      - the managed system's name

                -w <wait time>           - the maximum time, in minutes, to wait for VIOS commands issued by the management console to complete

                -d <detail level>        - the level of detail to request from VIOS commands issued by the management console; values range from 0 (none) to 5 (highest)

                --filter "<filter data>" - filters the WWPNs to be listed. The syntax is:

                                             "filter_name1=value,filter_name2=value,..."

                                             or

                                             ""filter_name1=value1,value2,...",..."

                                           Valid filter names are:

                                             lpar_ids, lpar_names, profile_names

                -F [<attribute names>]   - delimiter separated list of the names of the attributes to be listed. If no attribute names are specified, then all attributes will be listed.

                --header                 - prints a header of attribute names when -F is also specified

                --help                   - prints this help

Chris’s article cites this additional information:

            * lsnportlogin

            * chnportlogin

            * How to force a vfc-client device to log in to the SAN (The OLD way)!

            * How to Capture SAN Boot Debug for Virtual I/O Server and AIX on P6 Systems

            * Disk path design for AIX including SAN zoning     

Be sure to read the article, and give the instructions a try the next time you’re setting up NPIV.

More highlights from Twitter (@robmcnelly):

RT @cgibbo RT @nixysug: Got sendmail ipv6 errors in syslog?  http://nixys.fr/blog/?p=1260

@ibmperformance 25 Sep Guns and Butter at OpenWorld http://wp.me/p1lgsI-bm

@mr_nmon 25 Sep Oct 1st #IBM opens 4th #LinuxOnPOWER Centre in Montpellier, France, to cover Europe for briefings, architecture design & HW for users & ISVs.

RT @ElReg Ellison ditches cloud keynote for billionaires’ boat race: Mass exodus after King Database snubs attendees http://bit.ly/15qRHOi

RT @chmod666: Want to reset a lost hscroot password. Add init=/bin/rcpwsh on kernel line in grub at hmc boot. #HMC

@brian_smi 23 Sep Update: Visualize the Physical Layout of an #AIX Volume Group https://www.ibm.com/developerworks/community/blogs/brian/entry/update_visualize_the_physical_layout_of_an_aix_volume_group …

RT @mr_nmon IBM #PowerSystems Announcements Oct 8th Completely new products, large new features & upgrades. Register http://bit.ly/SCwebcastUK

Restricting FTP Access

Edit: Some links no longer work.

Originally posted September 24, 2013 on AIXchange

A customer was trying to restrict user access to a particular directory on an AIX system when FTP was used. We came across two good options. First, I recalled this exchange on Twitter:

            sungokcho: RT @ibmaix: #AIX #tip to restrict ftp user to a given directory use /etc/ftpaccess.ctl. It is useful if the user connects via winscp (via @JuanMDia35)

This 2009 post covers the same thing. And here’s some detailed information:

            ftpaccess.ctl File

            The /etc/ftpaccess.ctl file is searched for lines that start with allow:, deny:, readonly:, writeonly:, readwrite:, useronly:, grouponly:, herald: and/or motd:. Other lines are ignored. If the file doesn’t exist, then ftp access is allowed for all hosts. The allow: and deny: lines are for restricting host access. The readonly:, writeonly: and readwrite: lines are for restricting ftp reads (get) and writes (put). The useronly: and grouponly: lines are for defining anonymous users. The herald: and motd: lines are for multiline messages before and after login.

            The syntax for all lines in /etc/ftpaccess.ctl is in the form:

            keyword: value, value, …

            where you can specify one or more values for every keyword. You can have multiple lines with the same keyword. The lines in /etc/ftpaccess.ctl are limited to 1024 characters; anything beyond 1024 characters will be ignored.

            The syntax for the allow: and deny: lines is:

            allow: host, host, …

            deny: host, host, …

            If an allow: line is specified, then only the hosts listed in all the allow: lines are allowed ftp access. All other hosts will be refused ftp access. If there is no allow: line, then all hosts will be given ftp access except those hosts specified in the deny: line(s). The host can be specified as either a hostname or IP address.

            The syntax for the readonly:, writeonly: and readwrite: lines is:

            readonly: dirname, dirname, …

            writeonly: dirname, dirname, …

            readwrite: dirname, dirname, …

            The readonly: lines list the read-only directories and the writeonly: lines list the write-only directories. Read access is denied in a write-only directory and write access is denied in a read-only directory. All other directories are granted access except when a readwrite: line is specified. If a readwrite: line is specified, only directories listed in the readwrite: line and/or listed in the readonly: line are granted access for reading, AND only directories listed in the readwrite: line and/or listed in the writeonly: line are granted access for writing. Also, these lines can have a value of “ALL” or “NONE”.

            The syntax for the useronly:, puseronly:, grouponly:, and pgrouponly: lines is:

            useronly: username, username, …

            puseronly: username, username, …

            grouponly: groupname, groupname, …

            pgrouponly: groupname, groupname, …
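To make the allow:/deny: rules quoted above concrete, here's a sketch that decides whether a given host would get FTP access under a given ftpaccess.ctl. It's my own illustration, not the ftpd implementation, and the function name is hypothetical. It implements only the allow:/deny: logic as described, compares hosts as literal strings (the real ftpd may match a hostname against an IP address, which this doesn't), ignores the 1024-character line limit, and, following my reading of the text, ignores deny: entries once any allow: line exists.

```shell
#!/bin/sh
# Sketch only: exit 0 if HOST would be allowed FTP access under CTLFILE,
# per the allow:/deny: rules described above. Literal string matching only.
# Usage: ftp_host_allowed <ctlfile> <host>
ftp_host_allowed() {
  ctl=$1
  host=$2
  [ -f "$ctl" ] || return 0   # no file: all hosts are allowed
  awk -v host="$host" '
    /^allow:/ || /^deny:/ {
      kw = substr($0, 1, index($0, ":") - 1)
      n = split(substr($0, index($0, ":") + 1), v, ",")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", v[i])   # trim spaces around each value
        if (kw == "allow") { have_allow = 1; if (v[i] == host) allowed = 1 }
        else if (v[i] == host) denied = 1
      }
    }
    END {
      if (have_allow) exit(allowed ? 0 : 1)   # allow: lines act as a whitelist
      exit(denied ? 1 : 0)                    # otherwise only deny: applies
    }' "$ctl"
}
```

For example, `ftp_host_allowed /etc/ftpaccess.ctl 10.1.1.5 && echo allowed` gives a quick offline check of a rule change before users start calling.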

Although we found that we could control users with this method, we were looking to do more, so we researched vsftpd and were able to install packages from Perzl.org. (I wrote about installing packages from Perzl.org earlier this year.)                           

From this page we found that vsftpd “supports standard FTP and secure FTPS protocols. Built-in mechanisms allow implicit and explicit mode of FTPS. Security is achieved by using of external SSL library, which simplify the source code of the server. An unusual feature is the ability to force anonymous connections through SSL encryption, thus increasing overall security of anonymous file transfers. SSLv1, SSLv2 and TLS protocols are provided. Optionally validation of client certificates can be configured. The access of users can be controlled by deny and enable lists. The server can be configured to generate detailed activity logs – the log format may be verbose or compatible with wu-ftpd format.”

In our case we edited the configuration file as follows:

            anonymous_enable=NO

            local_enable=YES

            ftpd_banner="FTP Access"

            local_root=/tmp/transferfiles

            write_enable=YES

            secure_chroot_dir=/home/jail

            idle_session_timeout=3600

            file_open_mode=0777

            local_umask=022

This provided the functionality we were looking for.

Finally, some recent conversation from @robmcnelly on Twitter:

Chris Gibson @cgibbo New VIOS tunables with v2.2.2.2.
https://www.ibm.com/developerworks/community/blogs/cgaix/entry/new_vios_tunables_with_v2_2_2_2?lang=en … #VIOS #AIX #PowerVM

Rob McNelly @robmcnelly 21 Sep
Is string theory right? Is it just fantasy? Out of touch with reality?
http://www.youtube.com/watch?v=2rjbtsX7twc

Nigel Griffiths @mr_nmon 19 Sep
FAQ4: Hostnames short or long? The answer is long and mandatory and don’t user underscore either See AIXpertBlog
https://www.ibm.com/developerworks/community/blogs/aixpert/entry/faq4_hostnames_short_or_long?lang=en

Chris Gibson @cgibbo 18 Sep
What’s next from #Powersystems? Join us on October 8th to find out! http://www.ibm.com/smarter-computing/us/en/readynow/webcast.html …

Nigel Griffiths @mr_nmon 18 Sep
Enterprise2013 = Power Technical Uni Orlando Oct21-25 New products will be explained SSP4, PowerXX & Power## http://www-03.ibm.com/systems/enterprise/ … CU there

Jay Kruemcke @chromeaix 13 Sep Oracle ASM and IBM #FlashSystem best practices http://ow.ly/oQeDk

Nigel Griffiths @mr_nmon 17 Sep
IBM pledges $1Billion for #Linux & specifically for Linux on POWER see Wall Street Journal blog
http://blogs.wsj.com/digits/2013/09/16/ibm-again-pledges-1-billion-to-a-linux-effort/ … PowerSystems #POWER7

Rob McNelly @robmcnelly 13 Sep
You can get an #IBM badge if you hang around the offices long enough: http://imgur.com/uywR5QB

A Look at IBM Electronic Support

Edit: Some links no longer work.

Originally posted September 17, 2013 on AIXchange

In June, Julie Craft presented to the AIX Virtual User Group on the topic of IBM Electronic Support for AIX and Power Systems. Listen to the replay and learn more about the tools that are available for you to use.

One slide shows what’s covered in the presentation. The discussion areas are titled “Prevent Problems and Stay Current,” “Find Information,” “Download Fixes and Updates,” “Troubleshoot Problems,” “Work with IBM Support” and “Learn More.”

Right off the bat, Julie states that IBM’s goal is to make it easy for customers to find what they need to do the work. Basically, IBM wants to make finding information simple enough that customers don’t have to contact IBM. However, should the need arise, you can easily open calls electronically.

Julie mentions the IBM Support Portal (support.ibm.com). This site is meant to be the starting point of the IBM Support experience. It’s designed to centralize the various products that IBM supports and to make the user experience more consistent.

Once you register, you can log in and set up subscriptions, notifications, etc. Then you can select the products that interest you. The presentation replay covers this in detail, so be sure to watch it. You’ll learn about the notifications and alerts (like security advisories and new TLs and SPs) that registered users receive. You’ll also learn about delivery options, including daily or weekly emails or an RSS feed.

The presentation continues with a closer look at the Support Portal. The documentation tab includes links to Information Centers, Redbooks, white papers and more. From the downloads page, customers can search by APAR, fix ID or arbitrary text, and can include prerequisite and corequisite fixes. IBM Support has updated Fix Central in an effort to make firmware and HMC code easier to find; there are now fewer menu items to navigate.

Another portal feature is an entitlement check that allows customers to download fixes. Just enter your machine type and serial number. The various entitlement types are tied to the level of maintenance you have on your machines. Going forward, IBM will move toward making the capability to download fixes a privilege available primarily to paying customers.

The presentation also covers the Fix Level Recommendation Tool (FLRT), which can be used for health checks that display current system fix levels and compare them to IBM recommendations. This works for firmware, software, etc. Other Electronic Support options include:

* the capability to save an inventory/load a saved inventory,

* a VIOS to NIM master mapping tool that determines the AIX version needed to use NIM with VIOS, and

* a system software mapping tool that displays minimum support levels for AIX and VIOS. 

In addition, service requests and PMRs can be logged without contacting IBM Support. Check out this video for details, and register for service requests here.

Speaking of videos, here’s one about customer replaceable units and performing operations on your hardware.

To summarize, the presentation includes this checklist for submitting AIX problems:

            1. Always check the error logs before you open a problem.
            2. Be clear and precise about the problem description and severity. Indicate exactly the error received (entire error code, LEDs, error report entries). Provide a clear description of the problem and your environment, since analysts do not know your environment’s nuances.
            3. Include all OS, fix and patch levels in the PMR when you open it, including TL and service pack levels.
            4. The three most common data gathering tools are snap, zsnap and perfpmr. Snap comes preinstalled on AIX systems. You may be asked to download the latest versions of zsnap or perfpmr to gather additional data.
            5. Make absolutely sure you follow the naming standard in uploading data. Update the PMR with filename and location of where uploaded.
            6. Be sure to execute any steps given by the support analyst precisely. Deviating from what the analyst asks you introduces new variables into the problem determination and can delay resolution.
            7. Don’t be afraid to ask questions. If something is unclear or you are concerned about doing something the analyst asks you to, speak up.
            8. If asking a ‘how to’ question, explain what you are trying to accomplish.
            9. Utilize resources such as developerWorks, Redbooks, forums, etc., for how-to information.
            10. Follow up! Don’t be afraid to ask for status if you don’t hear back from support after a reasonable amount of time.

This is another excellent presentation. AIX Virtual User Group monthly webinars are always worth the time. But if you want to save time watching webinar replays, do what I do: Download the files and view them with VLC. Then go into Playback and select Speed/Faster. Although the presenters might sound funny, I don’t find it difficult to keep up with what’s being said, and I do save time.

Finally, some recent posts and re-tweets from my Twitter feed (@robmcnelly):

            Are you the creative genius behind the #NextPowerApp? Submit your idea to win an iPad: http://bit.ly/13R939f

            How to install #PowerVM VIO Server from the HMC GUI #powersystems

            #PowerSystems Technical Universities Orlando Oct 21st … (within Enterprise2013) & Athens Nov 5th …

            RT @cgibbo RT @AIXDownUnder: Script to create MKSYSB backups for all NIM clients & keep 2 versions on hand #AIX

            Are you a Walnut Guru?? Have you used brown? 

Advice on HMC Connections

Edit: Some links no longer work.

Originally posted September 10, 2013 on AIXchange

If you connect a managed system to an HMC and it isn’t recognized, how can you troubleshoot the problem?

Well, it depends. What about the connection? Is your HMC behaving as a DHCP server, or have you assigned static IP addresses in your environment?

This interesting blog post offers some tips and tricks, although the IP addresses they recommended didn’t work for my customer.

The post concludes with this advice:

            “If you have had both HMC-1 and HMC-2 connections possibly taken off their default IP addresses, one way that comes to mind is a ‘sniffer’ utility like WireShark. You can attach your laptop to one of the HMC connections and in a short amount of time determine the IP address of the system connection. If you do work like this, you should be familiar with tools of this type, and be prepared to use them in the case of an unknown IP address assignment.”

Here’s a nice document about accessing ASMI. If you’re dealing with a Power Systems Model 720 or 740, try this Redbook. Section 2.14.2 examines HMC connectivity; 2.14.3 looks at HMC high availability.

If the IP addresses are unknown or not working (perhaps the system was previously set up with static addresses but is now being connected to a new HMC), this video with Brian Smith could help. He discusses resetting an ASMI password and explains how to use the front panel to get the HMC IP addresses. Starting around 1:30, he shows how you can use the arrows on the front of the machine.

This document, called “Managing the Control Panel Functions,” can also serve as a reference guide. Page 14 explains how to put the control panel into manual operating mode, which you must do before you can perform function 30:

            “Accessing the control panel functions using the physical control panel: The control panel functions correspond to function numbers on the control panel. To activate a control panel function, do the following:

            1. Select a function number by pressing the Increment (↑) or Decrement (↓) button on the control panel.
            2. To activate the function, press Enter on the control panel.

            Putting the physical control panel in manual operating mode: You must first put the physical control panel in manual operating mode before you can select or activate certain functions. To put the physical control panel in manual operating mode, do the following:

            1. Use the Increment button to scroll to function 02.
               02______________________________
            2. Press Enter to start function 02.
            3. Press Enter again to move to the second character on the function 02 menu. The current system operating mode is displayed with a pointer, as shown in the following example:
               02__B__N<___________________P___
            4. Use the Increment button to scroll through the system operating modes, and select M for manual, as shown in the following example:
               02__B__M<___________________P___
            5. Press Enter to select the system operating mode.
            6. Press Enter again to exit function 02.

            The control panel is in manual operating mode.”

Once you know the IP address of the system’s HMC port, it’s trivial to connect to it, log in to ASMI and change the values to whatever is needed to establish communications with the HMC.

On another note, I wanted to try an experiment and include a few tweets that I thought were interesting.  If you are already following @robmcnelly then some of these might be repeats, but hopefully the rest of you will find this information useful as well.

RT @AIXmag RT @kristijan: New blog post: AIX boot hangs with HMC 2700 LED code

RT @cgibbo RT @chmod666: Activating resources temporarily by using On/Off CoD 

#IBM #PowerHA v7.1.2 for AIX Enterprise Edition support for EMC SRDF

AIX, NIM Make System Restoration Easier

Edit: Some links no longer work.

Originally posted September 3, 2013 on AIXchange

Recently I received this query:

            “I wonder if you have a basic tutorial describing the steps needed to restore an AIX server. I am thinking of a scenario similar to the following:

            “You have an AIX server with some applications and/or databases running on it. A disaster occurs, and you need to restore the server. I have access to a backup taken by a TSM server. So, here is what I think could be done:

            1. Insert the AIX DVD in the server.
            2. Using the HMC console, boot in SMS mode (press F1 or ESC 1 several times when the IBM banner is shown on screen).
            3. Choose the DVD as boot device.
            4. Boot the server from DVD.
            5. Select the disk to do the OS install.
            6. Select the packages to be installed (do a basic install).
            7. Once the OS is installed, the server will reboot.
            8. Configure the root password, the network and several basic things.
            9. Access the server from the network to Install TSM client and configure it.
            10. Restore the backup from TSM.

            “What do you think?”

What I think is that while this might be a reasonable way to restore a system that runs another OS, you needn’t go to all this trouble with AIX. I recommend using the mksysb command, which “creates a backup of the operating system (that is, the root volume group). You can use this backup to reinstall a system to its original state if it is corrupted. If you create the backup on tape or UDFS capable media, the backup is bootable and includes the installation programs needed to install from the backup.”

Even though many of you are familiar with mksysb, I wanted to post this question to make a point: Lots of people are new to AIX these days. For the most part, they’ve been working in UNIX environments, and then through employer acquisition, job change or what have you, they’re suddenly charged with maintaining AIX systems. They might not realize that NIM and mksysb images are even options. Fortunately, they can seek advice and access other resources to learn more about the platform.

Now, back to the question. I would first make sure I had a good, recent mksysb. This allows you to skip step 1. In step 2, when I was in SMS mode, I’d make sure the networking information allowed for booting from the NIM server. Boot from the NIM; don’t use AIX media. Once the restore was completed, I could skip steps 6-9 and proceed to step 10.

The other advantage with NIM is that it gives me a clone of the system. Obviously you don’t want to have to reload the system and try to recall all the packages and settings you had installed in the aftermath of a disaster. Make sure you are taking your backups. You never know when you might really need them.
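The mksysb-plus-NIM route described above can be sketched as two shell functions. This is a minimal illustration, not a tested procedure: the /export/mksysb directory (assumed to be NFS-mounted from the NIM master so both sides see the same path), the <host>_mksysb resource naming and the SPOT name are all hypothetical, and the SPOT must be at a level compatible with the image. boot_client=no is used because, as in step 2, you boot the client over the network from SMS yourself.

```shell
#!/bin/sh
# Minimal sketch of the mksysb/NIM restore path. Paths and NIM resource
# names below are hypothetical placeholders.

# On the AIX client: back up rootvg to a file. -i regenerates
# /image.data before the backup is written. /export/mksysb is assumed
# to be an NFS mount from the NIM master.
backup_rootvg() {
    host=$1
    mksysb -i "/export/mksysb/${host}.mksysb"
}

# On the NIM master: register the image as a mksysb resource, then set
# up a BOS install from it. boot_client=no leaves the network boot to
# you (SMS on the client), matching step 2 of the reader's list.
restore_from_nim() {
    host=$1
    spot=$2
    nim -o define -t mksysb -a server=master \
        -a location="/export/mksysb/${host}.mksysb" "${host}_mksysb"
    nim -o bos_inst -a source=mksysb -a mksysb="${host}_mksysb" \
        -a spot="$spot" -a boot_client=no -a accept_licenses=yes "$host"
}
```

For example, run backup_rootvg lpar1 on the client ahead of time, then restore_from_nim lpar1 spot_name on the NIM master after the disaster, assuming lpar1 is already defined as a NIM machine object.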

For additional information on this topic, there’s this good article by Jaqui Lynch. I’ve also covered NIM on this blog (here and here). Can you think of any circumstances that would require you to completely rebuild an AIX machine from scratch in a disaster situation?

BYOD’s Slippery Slope

Edit: I did not even mention the cost of some of this hardware. Some links no longer work.

Originally posted August 27, 2013 on AIXchange

We’re starting to hear more about BYOD — that’s bring your own device to work:            

“Some believe that BYOD may help employees be more productive. Others say it increases employee morale and convenience by using their own devices and makes the company look like a flexible and attractive employer.”

Having been around long enough to recall the days when employers routinely provided phones and pagers, I understand the benefits of BYOD. However, I can also see some serious issues. What if your smartphone, tablet or laptop is lost or stolen? I don’t expect your employer to offer to replace your devices, and since you rely on them for work, you alone will bear the loss. Who provides tech support? And what about the data on these devices? Who does that belong to?

Of course, things do happen. One of my sons managed to drop an iPod on an airplane. It slid around the floor of the cabin, someone picked it up and pocketed it, and we never saw it again. Apple was certainly no help. We had the serial number, and you’d think Apple could monitor that were the thief to, say, connect to an iTunes account. But that’s not something they do.

Some people do manage on their own to recover stolen devices. This guy used open source software to locate his stolen laptop and phone. According to this article, you might also have some luck if Gmail or Dropbox is still running and updating the IP address information.

However, a savvy thief will simply wipe a pilfered device and remove the tracking software.

Other solutions are works in progress. This article notes that some law enforcement officials support the creation of “kill switches” that would render smartphones inoperable after they are stolen:

            “To drive home their point about the danger of violent smartphone thefts, authorities introduced relatives of 23-year-old Megan Boken, who was shot and killed in St. Louis in 2012 by an assailant who was trying to steal her iPhone.”

Others advocate for the creation of a database of stolen devices. This “blacklist” would allow mobile providers to refuse service on devices that are reported stolen. However, critics point out that thieves could get around this by altering the International Mobile Equipment Identity (IMEI) number.

As technologists, you’d think we would be able to develop an elegant solution to these problems, but so far that seems to have eluded us. As much as I love my technology, I’d prefer not to put a target on my back when I use my smartphone.

Power Systems Experts Weigh In

Edit: Some links no longer work.

Originally posted August 20, 2013 on AIXchange

I wish I’d been in Manchester and London last month. Apparently I missed out on a great Power Systems event.

            “The fifth POWER Ask the Experts is a one day customer technical event in the U.K…. It proved to be a very popular free event. We had well over 100 people attend, which was a mixture of customers, a few business partners and some IBMers.”

Even though I wasn’t there, I can at least download the slides. Pat O’Rourke gave a Power Systems update, Nigel Griffiths presented performance best practices with POWER7, Gareth Coates presented hands-on tricks of the Power masters, and David Spurway gave a cost comparison between IBM Power and Intel servers. The finale was an NDA session covering Power Systems trends and directions, so obviously we don’t have slides for that one.

Although I may see some of this information at this fall’s IBM Power Systems Technical University conference in Orlando, I am sure some unique topics were covered at the U.K. event.

I do encourage you to check out the slides, because some of these tips may be new to you.

* For starters, by logging into your HMC and running any of these four commands, you’ll get detailed information about memory, disk, processor and swap usage.

            monhmc -r mem -n 0

            monhmc -r disk -n 0

            monhmc -r proc -n 0

            monhmc -r swap -n 0

* Another HMC tip concerns disconnecting a managed system from the HMC and then reconnecting it. Run mksysconn -o auto to clear the connection history on your HMC before reconnecting the managed system. Run lssyscfg -r sys -F name to see which managed systems are attached to your HMC.

* To show the VIOS and vhost for a client VSCSI adapter, run:

            # print "cvai" | kdb | grep vscsi | grep -v read

* Another VIOS tip: Don’t go into oem_setup_env to run commands on your VIO servers. Be sure to check out slide 20, which covers failures with creating system plans that may stem from messing around as root on your VIOS.

* From the same presentation comes the reminder that running export CLI_DEBUG=33 at the VIOS $ prompt provides detailed information about the commands VIOS is running under the covers.

* This lshwres command lists all the WWPNs on a system:

            lshwres -r io --rsubtype slotchildren -m Server-9117-MMB-SN101509A -F phys_loc,description,mac_address,wwpn,microcode_version | grep Fibre

* The Dynamic Platform Optimizer (DPO) observations are also well worth a read. Here are some useful DPO-related HMC commands:

            lsmemopt -m <managed_system> -o currscore
            lsmemopt -m <managed_system> -o calcscore
            optmem -m <managed_system> -t affinity -o start

* One must be careful with kdb, but if you want to see how many virtual processors are active, enter the following on the command line:

            # echo vpm | kdb

* Finally, there’s a reference to the IBM Redbook, “IBM PowerVM Virtualization Managing and Monitoring.”
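If you would rather run the HMC tips above from a workstation than from an HMC session, small wrappers help. This is a minimal sketch under stated assumptions: ssh access to the HMC as hscroot, with the HMC and managed system names passed as arguments; the function names are hypothetical. The remote commands use the same flags as the tips above (check the lsmemopt man page on your HMC level for any other required flags), while the filtering (grep, awk) runs locally. dpo_review only prints the optmem command instead of running it, so starting an optimization stays a deliberate step.

```shell
#!/bin/sh
# Minimal sketch: wrappers for the HMC commands above, run over ssh.
# Assumptions (not from the slides): ssh access as hscroot; HMC host and
# managed system name passed as arguments. Function names are hypothetical.

# Run the four monhmc reports in one pass (same -r values and -n 0 as above).
hmc_snapshot() {
    hmc_host=$1
    for res in mem disk proc swap; do
        echo "=== monhmc -r $res ==="
        ssh "hscroot@$hmc_host" monhmc -r "$res" -n 0
    done
}

# After reconnecting, succeed only if the named managed system is listed.
hmc_has_system() {
    hmc_host=$1
    sys_name=$2
    ssh "hscroot@$hmc_host" lssyscfg -r sys -F name | grep -Fqx -- "$sys_name"
}

# Print location code and WWPN of each Fibre Channel port. The description
# is requested last so a comma inside it cannot shift the first two fields.
list_fc_wwpns() {
    hmc_host=$1
    managed_sys=$2
    ssh "hscroot@$hmc_host" lshwres -r io --rsubtype slotchildren \
        -m "$managed_sys" -F phys_loc,wwpn,description |
        awk -F, '/Fibre/ { print $1, $2 }'
}

# Show current and calculated DPO scores; optmem is printed, not run.
dpo_review() {
    hmc_host=$1
    managed_sys=$2
    echo "Current score:"
    ssh "hscroot@$hmc_host" lsmemopt -m "$managed_sys" -o currscore
    echo "Calculated score:"
    ssh "hscroot@$hmc_host" lsmemopt -m "$managed_sys" -o calcscore
    echo "To start: optmem -m $managed_sys -t affinity -o start"
}
```

For example, hmc_has_system myhmc Server-9117-MMB-SN101509A && echo connected confirms a reconnect, where myhmc stands in for your HMC’s hostname.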

Of course there’s much more information than what I’ve shared here. Download the slides and see for yourself.

The OpenPOWER Consortium can take Power in New Directions

Edit: Some links no longer work.

Originally posted August 13, 2013 on AIXchange

When I think of OpenPOWER, I think of the Linux-capable IBM systems unveiled almost 10 years ago. Now though, the name signifies something new after this week’s IBM announcement:

“IBM, Mellanox, NVIDIA and Tyan… announced plans to form the OpenPOWER Consortium – an open development alliance based on IBM’s POWER microprocessor architecture. The Consortium intends to build advanced server, networking, storage and GPU-acceleration technology aimed at delivering more choice, control and flexibility to developers of next-generation, hyperscale and cloud data centers.

“The move makes POWER hardware and software available to open development for the first time as well as making POWER IP licensable to others, greatly expanding the ecosystem of innovators on the platform. The consortium will offer open-source POWER firmware, the software that controls basic chip functions. By doing this, IBM and the consortium can offer unprecedented customization in creating new styles of server hardware for a variety of computing workloads.”

Basically this announcement is a statement of direction: These companies are saying they plan to form the consortium. We should expect to hear more later on once it’s actually up and running.

The idea, as noted in the Wall Street Journal, will be to look at the complete server hardware stack — from the processor to the firmware to the operating system.

From the WSJ article:

“The alliance the companies plan to announce Tuesday would allow many companies to license IBM microprocessor designs—based on a technology dubbed Power—that are now only found in Big Blue’s own server systems. Licensees could incorporate IBM-designed circuitry in their own chips, with members of the alliance working on related products such as servers, networking and storage devices, participants said.

“The effort will start with Power8, a forthcoming member of the chip family that IBM plans to discuss at a technical conference this month.”

IBM has been virtualizing systems for more than 40 years and has a long history with enterprise servers and virtualization. I look forward to seeing what this consortium comes up with around these proven technologies.

We’ve known for many years about the virtualization flexibility and raw performance available with POWER systems, PowerVM and the hypervisor, and obviously IBM will continue innovating with AIX and IBM i and coming out with new Power server models. However, the capabilities that we’ve taken for granted with these systems may now be available to more IT pros throughout the industry.

Power chips are already in video game consoles, computers in our vehicles and rovers on Mars, in addition to our computer rooms. Who knows where they’ll end up next?

More on the IBM PowerLinux Announcement

Edit: Some links no longer work.

Originally posted August 6, 2013 on AIXchange

As noted at the end of last week’s post, on July 30 IBM made another PowerLinux announcement. Here’s the full IBM press release.

“The PowerLinux 7R4 is the high-end addition to IBM’s line-up of Power Systems PowerLinux servers running industry standard Linux from Red Hat and SUSE. Joining the PowerLinux 7R1 and 7R2 models, the PowerLinux 7R4 delivers a new level of performance with up to 4 sockets and 32 cores — ideal for clients seeking a Linux solution capable of handling compute-intensive workloads including analytics, cognitive computing, database and web infrastructure. The PowerLinux 7R4 takes advantage of the same virtualization, middleware, and applications that are available on all Power Systems running Linux today.

“In addition to IBM DB2 database software for Linux, which offers an average 98 percent compatibility when migrating Oracle Database applications, IBM announced that EnterpriseDB’s enterprise-level PostgreSQL-based database solution is now available on all Power Systems servers running Linux.

“Switching databases has traditionally been costly and risky due to limited application compatibility and lack of comprehensive migration tools and resources. EnterpriseDB’s Postgres Plus Advanced Server and IBM Power Systems solve this problem by providing extensive Oracle compatibility functionality, migration tools and expertise that can deliver significant cost savings while allowing many Oracle based applications to run virtually unchanged,” said Ed Boyajian, President and CEO, EnterpriseDB.

“IBM has participated in a wide range of open source projects since 1999, and today this includes Open Stack, Open Daylight, KVM, Apache and Eclipse in addition to Linux. Hundreds of IBM programmers and engineers around the world are contributing to open source as part of the collection of global open source communities, including experts working on projects such as KVM and hands-on support for clients, IBM Business Partners and software vendors interested in running Linux on Power Systems. In May 2013 IBM opened the world’s first IBM Power Systems Linux Center in Beijing, and in June 2013 IBM announced its intention to open two more IBM Power Systems Linux Centers in New York and Austin.”

Here’s the IBM PowerLinux 7R4 announcement letter:

“The IBM PowerLinux 7R4 (8248-L4T) server is a powerful 2-socket or 4-socket server that ships with 16 or 32 fully activated cores and I/O configuration flexibility to meet today’s growth and tomorrow’s processing needs. The server features:

  • Powerful POWER7+ DCM processors that offer 3.5 GHz and 4.0 GHz performance with 16 or 32 fully activated cores
  • Up to 1024 GB of memory
  • Rich I/O options in the system unit: six PCIe 8X Gen2 slots in the system unit; two GX++ slots for I/O drawers; six hard disk drive (HDD)/solid-state drive (SSD) SAS small form factor (SFF) bays and integrated SAS I/O controllers; integrated multifunction card with four Ethernet, two USB, and one serial port; redundant hot-swap ac power supplies in each enclosure; 19-inch rack-mount 5U configuration …

“Without PowerVM, dynamic LPAR allows one partition per processor. With PowerVM, up to 20 partitions are allowed per processor. Logical partitioning is supported when IBM PowerVM for IBM PowerLinux (#EC22) is ordered.

“The backplane can be configured as one set of six bays, two sets of three bays (3/3), or three sets of two bays (2/2/2). Configuration options will vary, depending upon the controller options and the operating system selected. The controllers for the six-bay or 3/3 configurations are always the two pairs of embedded controllers. If the 2/2/2 configuration is used, the two embedded controllers run the first two sets of bays (2/2) and a feature 5901 PCIe SAS adapter located in a PCIe slot in a CEC enclosure controls the third set (2). By having three controllers, you can have three boot drives supporting three partitions.

“The IBM PowerLinux 7R4 (8248-L4T) server is designed with both IBM and customer serviceability in mind. Advancements such as Guiding Light LED architecture are used to control a system of integrated LEDs that lead the individual servicing the machine to the correct part as quickly as possible. With the PowerLinux 7R4 server, you can replace service parts (customer replaceable unit). To do this, the PowerLinux 7R4 server uses Guiding Light LEDs to indicate the parts that need to be replaced. An HMC attached to the PowerLinux 7R4 server enables support personnel (with your authorization) to remotely log in to review error logs and perform remote maintenance if required.

“Concurrent maintenance guided service procedures will continue to be supported by the Repair and Verify (R&V) component of the Service Focal Point application running on the HMC. Repair procedures that are not covered by the guided R&V component are documented and available for display on any web browser-enabled system as well as on the HMC. These procedures are available through the InfoCenter application.”

If you search for IBM 7R4 you will find more analysis. Here are two additional articles.

InformationWeek says:

“Why buy Power when there are more x86 choices? Performance is the differentiator, according to IBM. Multi-threaded Java applications, for example, can take advantage of four threads per core instead of the two threads per core on Intel machines. What’s more, Power 7+ series upgrades introduced over the last year include a highly optimized IBM Java Virtual Machine for better Java performance. Finally, the machine has 2.5 times more cache than competitive Intel machines.”

The Register says:

 “… because of the relatively high cost of Power Systems iron, which was marketed to Unix and proprietary customers used to paying a premium for every component in their systems, it was difficult to pitch a Power-based machine against an x86 box and win. So, with the PowerLinux machines, IBM cut its prices to take that issue off the table. And now, IBM can focus the conversation on the performance of Java, database, and analytics workloads and show that a Power7+ alternative can take on a Xeon system and make economic as well as technical sense.”

IBM continues to make Power servers an attractive option for running Linux. As AIX and IBM i cannot run on the 7R1, 7R2 or 7R4, IBM has made the pricing on these systems very competitive when compared with traditional x86 commodity hardware. Take the time to investigate whether Linux on Power makes sense in your environment.

In case you missed it, here is some information from today’s Wall Street Journal about the OpenPOWER Consortium:

“The effort will start with Power8, a forthcoming member of the chip family that IBM plans to discuss at a technical conference this month.”

The IBM news release says the consortium is “an open development alliance based on IBM’s POWER microprocessor architecture. The Consortium intends to build advanced server, networking, storage and GPU-acceleration technology aimed at delivering more choice, control and flexibility to developers of next-generation, hyperscale and cloud data centers.”

IBM Systems Magazine also had an article on the announcement today.

An AIX on Power Performance Primer

Edit: Some links no longer work.

Originally posted July 30, 2013 on AIXchange

Check out this document from IBM’s Dirk Michel, “AIX on Power – Performance FAQ.” It’s only 87 pages, but there’s great information. I encourage you to read it and become familiar with its contents.

Chapter 2 asks and answers the question, “what is performance?”

            “For interactive users, the response time is the time from when the user hits the button to seeing the result displayed. The response time often is seen as a critical aspect of performance because of its potential visibility to end users or customers. The throughput of a computer system is a measure of the amount of work performed by a computer system over the period of time. Examples for throughput are megabytes per second read from a disk, database transactions per minute, megabytes transmitted per second through a network adapter. Throughput and response time are related. In many cases a higher throughput comes at the cost of poorer response or slower response as well as better response time comes at the cost of lower throughput.”

Chapter 4 covers workload estimation and sizing:

            “Some questions to consider before beginning the sizing exercise:
            1. What are the primary metrics, e.g., throughput, latency, that will be used to validate that the system is meeting performance requirements?
            2. Does the workload run at a fairly steady state, or is it bursty, thereby causing spikes in load on certain system components? Are there specific criteria, e.g., maximum response time that must be met during the peak loads?
            3. What are the average and maximum loads that need to be supported on the various system components, e.g., CPU, memory, network, storage?”

Chapter 5 covers performance concepts along with CPU performance, multiprocessor systems, multithreading, processor virtualization, memory performance, caches, cache coherency, virtual memory, memory affinity, processor affinity and more.

Chapter 6 is an examination of performance analysis and tuning:

            “This chapter covers performance analysis and tuning process from a high level point of view. Its purpose is to provide a guideline and best practice on how to address performance problems using a top down approach. Application performance should be recorded using log files, batch run times or other objective measurements. General system performance should be recorded, and should include as many components of the environment as possible. Before collecting any data or making tuning or configuration changes, define what exactly is slow. A clear definition about what aspect is slow usually helps to shorten the amount of time it takes to resolve a performance problem since a performance analyst gets a better understanding what data to collect and what to look for in the data.”

Section 6.3.4 presents a performance analysis flow chart.

Chapter 7 gives a performance analysis how-to:

            “This chapter is intended to provide information and guidelines on how to address common performance problems seen in the field, as well as tuning recommendations for certain areas. Please note that this chapter is not intended to explain the usage of commands or to explain how to interpret their output.”

Chapter 8 includes frequently asked questions. Here’s one I like:

            “I heard that… should I change…?
            “No, never apply any tuning changes based on information from unofficial channels. Changing performance tunables should be done based on performance analysis or sizing anticipation.”

Chapter 9 features things you should know about POWER7. Section 9.10 covers virtualization best practices, for example:

            9.10.1 Sizing virtual processors

  •    The number of virtual processors of an individual LPAR should not exceed the number of physical cores in the system
  •    Shared processor pool: the number of virtual processors of an individual LPAR should not exceed the number of physical cores in the shared processor pool

            9.10.2 Entitlement considerations
            Best practice for LPAR entitlement would be to set the LPARs entitlement capacity to its average physical CPU usage and let the peaks addressed by additional uncapped cycles. For example, an LPAR running a workload that has an average physical consumed of 3.5 cores and a peak utilization of 4.5 cores should have 5 virtual processors to handle the peak CPU usage and an entitlement of 3.5.
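To make the two rules of thumb in 9.10 concrete, here is a minimal Python sketch. The function name size_shared_lpar and the 16-core pool in the usage line are hypothetical; the logic simply encodes the quoted guidance (entitlement equal to average physical consumption, enough whole virtual processors to cover the peak, never more virtual processors than cores in the pool) and is not an IBM tool.

```python
from __future__ import annotations

import math


def size_shared_lpar(avg_cores: float, peak_cores: float, pool_cores: int) -> tuple[float, int]:
    """Return (entitlement, virtual_processors) for a shared-processor LPAR.

    Encodes the rules of thumb quoted above; a sketch, not an IBM tool.
    """
    if not 0 < avg_cores <= peak_cores:
        raise ValueError("expected 0 < average <= peak")
    # 9.10.2: entitlement equals average physical consumption.
    # Processing units are assigned in 0.01 increments, hence the rounding.
    entitlement = round(avg_cores, 2)
    # A virtual processor can consume at most one physical core, so the
    # peak is rounded up to a whole number of virtual processors.
    virtual_procs = math.ceil(peak_cores)
    # 9.10.1: never configure more virtual processors than there are
    # physical cores in the shared processor pool.
    if virtual_procs > pool_cores:
        raise ValueError(
            f"{virtual_procs} virtual processors exceed the {pool_cores}-core pool"
        )
    return entitlement, virtual_procs


print(size_shared_lpar(3.5, 4.5, pool_cores=16))
```

With the numbers from the quoted example, the call returns (3.5, 5). Raising an error instead of silently trimming the virtual processor count is a deliberate choice: a peak the pool can't cover is something you want to notice.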

Chapter 11 covers the AIX Dynamic System Optimizer, and Chapter 12 explains how to report a performance problem using perfpmr.

Obviously there’s far more than I’ve listed here. Read it for yourself and share your thoughts in comments.

Also take a look at today’s PowerLinux announcement: http://www-03.ibm.com/press/us/en/pressrelease/41582.wss

The PowerLinux 7R4 is the high-end addition to IBM’s line-up of Power Systems PowerLinux servers running industry standard Linux from Red Hat and SUSE. Joining the PowerLinux 7R1 and 7R2 models, the PowerLinux 7R4 delivers a new level of performance with up to 4 sockets and 32 cores. “Powerful POWER7+ DCM processors that offer:

  •    3.5 GHz and 4.0 GHz performance with 16 or 32 fully activated cores
  •    Up to 1024 GB of memory
  •    Rich I/O options in the system unit:
         •    Six PCIe 8X Gen2 slots in the system unit
         •    Two GX++ slots for I/O drawers
         •    Six hard disk drive (HDD)/solid-state drive (SSD) SAS small form factor (SFF) bays and integrated SAS I/O controllers
         •    Integrated Multifunction Card with four Ethernet, two USB, and one serial port”

Virtualization on Power Resources

Edit: Some links no longer work.

Originally posted July 23, 2013 on AIXchange

Lately I’ve been getting questions about virtual processors and shared processor pools. Here are some resources on this topic that might help.

* In January Rosa Davidson of IBM delivered a great two-part presentation, “Capacity Entitlement and Virtual Processors.” The replays and slides are available here.

Here’s an explanation of virtual processors found in the POWER6 documentation in IBM’s Information Center:

“However, when you install and run an operating system on a logical partition that uses shared processors, the operating system cannot calculate a whole number of operations from the fractional number of processing units that are assigned to the logical partition. The server firmware must therefore represent the processing power available to the operating system as a whole number of processors. This allows the operating system to calculate the number of concurrent operations that it can perform. A virtual processor is a representation of a physical processor to the operating system of a logical partition that uses shared processors.”

* Be sure to look at these IBM Systems Magazine articles on virtual processor folding and shared processor settings:

“Virtual processors are what the operating system thinks it has since it can only relate to whole numbers of processors. And the desired virtual processor value is basically the maximum number of physical processors that an uncapped shared processor partition can use if processor units are available in the shared processor pool.”

* Finally, here’s something that I wrote for IBM Systems Magazine’s AIX EXTRA e-newsletter:

“… keep in mind you can never use more physical CPUs than virtual CPUs as defined in your LPAR. Even if you allocate one virtual processor to an LPAR and set it to be uncapped, you can’t run more than one physical processor because there would be no other virtual processors available.

“This way, you can limit the LPARs in your shared processor pools even if your LPAR is uncapped and there are 16 processors available in a shared processor pool. You still won’t be able to use more than one physical CPU because you only allocated one virtual CPU.

“A virtual processor can represent from 0.1 to 1 of a physical processor. If you have one virtual processor, the range it can physically consume will never be more than one. If you have three virtual processors, you can use from 0.3 to 3, but never more than three.

“It makes sense, as you’re basically giving your VM the illusion that it’s dealing with a physical processor. If it boots up and sees three virtual processors, even if it’s running on 0.3 physical processors, it won’t see more than three processors. If it’s running uncapped and wants to use four physical processors, where would they run if there are only three virtual processors?”
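The arithmetic in that excerpt can be written down as a short Python sketch. The function names are hypothetical, and the 0.1-per-virtual-processor minimum is taken from the text above; newer Power hardware and firmware allow a lower minimum, so it is a parameter rather than a constant. Pool contention and uncapped weights are ignored, so the result is only an upper bound.

```python
from __future__ import annotations


def consumption_range(virtual_procs: int, min_units_per_vp: float = 0.1) -> tuple[float, float]:
    """Smallest entitlement and largest physical consumption for a given
    number of virtual processors (each VP represents min_units_per_vp to 1 core)."""
    if virtual_procs < 1:
        raise ValueError("an LPAR needs at least one virtual processor")
    low = round(virtual_procs * min_units_per_vp, 2)
    high = float(virtual_procs)  # never more than one physical core per VP
    return low, high


def max_physical(virtual_procs: int, entitlement: float, uncapped: bool,
                 pool_cores: int, min_units_per_vp: float = 0.1) -> float:
    """Upper bound on physical cores the LPAR can consume, ignoring contention."""
    low, high = consumption_range(virtual_procs, min_units_per_vp)
    if not low <= entitlement <= high:
        raise ValueError("entitlement must lie between the per-VP minimum and 1.0 per VP")
    if not uncapped:
        return entitlement  # a capped LPAR stops at its entitlement
    # Uncapped: limited by the number of virtual processors and the pool size.
    return min(high, float(pool_cores))


print(consumption_range(3))            # (0.3, 3.0)
print(max_physical(1, 0.5, True, 16))  # 1.0
```

The second call mirrors the example in the excerpt: an uncapped LPAR with one virtual processor still tops out at one physical core, even with 16 cores in the pool.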

I’m sure more good documentation is available. Feel free to post a comment with any resources you’ve used to get up to speed with virtualization on Power.

UNIX Has 2 Vowels… and That’s About It

Edit: Some links no longer work.

Originally posted July 16, 2013 on AIXchange

While reading up on some of the activities surrounding the 25th anniversary of IBM i, I came across this tweet:

“Celebrating 25 years of vowel conservation”

Aaron’s point is that IBM i administrators — going all the way back to the AS/400 days — use precious few vowels on the command line:

            “The AS/400 operating system is consistent in its presentation and names. Commands have names of up to 10 letters. The commands typically take the form of three letters. For example, to work with active jobs, the command is WRKACTJOB. That’s a single word with no spaces. WRK is the AS/400 abbreviation for ‘work’ and ACT is the abbreviation for ‘active.’ Because the AS/400 is consistent in its naming style, after you know some of the abbreviations, you will be able to guess the names of commands.”

As I recently noted, I worked on the AS/400 back when, and I believe that AIX and IBM i pros have much to offer one another. Nevertheless, I have to stick up for AIX here. Look at these common AIX commands: lsdev, lsattr, lscfg, chdev. Clearly, we’re not wasting vowels, either.

In all seriousness, this abbreviated naming style is pretty standard across IBM products. Check out this Tivoli page, for instance:

            “Vowels are often omitted to shorten the name of a command. Commands are named using two conventions, depending on their provenance…

            “Commands that are inherited from the previous versions of Software Distribution are named using the w+verb+object convention, which matches the way you might think of the action. For example, to import a reference model in Change Manager, you use the wimprmod command. To delete a reference model, you use the wdelrmod command.”
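Both conventions boil down to gluing abbreviations together in a fixed order. The Python sketch below uses only the abbreviations quoted above; the dictionaries and function names are hypothetical illustrations, not part of IBM i or Tivoli.

```python
# Hypothetical abbreviation tables built only from the examples quoted above.
IBM_I_ABBREV = {"work": "WRK", "active": "ACT", "job": "JOB"}
TIVOLI_ABBREV = {"import": "imp", "delete": "del", "reference model": "rmod"}


def ibm_i_command(*words: str) -> str:
    """Join IBM i-style abbreviations (verb first) into one command name."""
    name = "".join(IBM_I_ABBREV[word] for word in words)
    if len(name) > 10:  # IBM i command names are at most 10 characters
        raise ValueError(f"{name} is longer than 10 characters")
    return name


def tivoli_command(verb: str, obj: str) -> str:
    """Build a name using the w+verb+object convention."""
    return "w" + TIVOLI_ABBREV[verb] + TIVOLI_ABBREV[obj]


print(ibm_i_command("work", "active", "job"))       # WRKACTJOB
print(tivoli_command("import", "reference model"))  # wimprmod
print(tivoli_command("delete", "reference model"))  # wdelrmod
```

Once you know the handful of abbreviations, guessing a command name is just a lookup and a concatenation, which is exactly the point the quoted AS/400 passage makes.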

In UNIX you’ll find many commands that are vowel-free or nearly so: cp, rm, ls, awk and ln, just to name a few. And of course the UNIX philosophy, as Mike Gancarz summarized it in his book The UNIX Philosophy, permeates AIX:

            1. Small is beautiful.
            2. Make each program do one thing well.
            3. Build a prototype as soon as possible.
            4. Choose portability over efficiency.
            5. Store data in flat text files.
            6. Use software leverage to your advantage.
            7. Use shell scripts to increase leverage and portability.
            8. Avoid captive user interfaces.
            9. Make every program a filter.

            Unix is simple. It just takes a genius to understand its simplicity.
            – Dennis Ritchie 

Although I congratulate IBM i admins on their judicious use of vowels over the past 25 years, they’re not alone in this ongoing effort.