Getting NPIV Info from VIO Servers

Edit: Some links no longer work.

Originally posted July 9, 2013 on AIXchange

Here's another script from Dean Rowswell. This one is for getting NPIV information from VIO servers. If you're not sure how to set up ssh password-free login for your VIOS, read this. The same document is also referenced in last month's post featuring Dean's script that displays information about the frames that are managed by your HMC.
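
If that link doesn't work for you, the gist is simple enough to sketch from memory. This is a hedged example, assuming an RSA key and the standard padmin user; some VIOS levels restrict padmin's shell, in which case copy the key over and append it from oem_setup_env instead:

            # Generate a key pair on the system that will run the script (accept the defaults)
            ssh-keygen -t rsa
            # Append your public key to padmin's authorized_keys on each VIO server
            cat $HOME/.ssh/id_rsa.pub | ssh padmin@vios1 'cat >> .ssh/authorized_keys'
            # Verify: this should return the VIOS level without prompting for a password
            ssh padmin@vios1 ioscli ioslevel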

Here’s the latest version of this script. I will ask Dean to post in comments when he makes changes to the tools.

            #!/bin/ksh
            # Created by Dean Rowswell, IBM, May 31, 2013
            # List Virtual and Physical Fibre Channel info for NPIV environments
            #
            # Assumption:
            #    Password-less ssh must be setup from this system to the Virtual I/O Server(s)

            VIOS_LIST="vios1 vios2"
            VIOS_USER="padmin"

            VER="1.0"

            # Parameter checks
            if [ ${#*} -ne 0 ]
            then
                    while getopts :vVh:u: PARMS
                    do
                            case $PARMS in
                                    v|V)    echo "This is get_lpar_fcinfo version: $VER" ; exit ;;
                                    h)      VIOS_LIST=`echo $OPTARG | tr ',' ' '` ;;
                                    u)      VIOS_USER=${OPTARG} ;;
                                    ?)      echo "\nUSAGE:\t$0 [ -v, -V, -h, -u ]"
                                            echo "\t-v or -V will print out the version and exit"
                                            echo "\t-h VIOS hostname(s) or IP address(es) COMMA SEPARATED to use"
                                            echo "\t-u VIOS userid to use (only required if padmin not used)\n"
                                            echo "EXAMPLE: get_lpar_fcinfo -h vios1,vios2\n"
                                            exit ;;
                            esac
                    done
            fi

            printf "%-12s %-10s %-27s %-5s %-20s %-12s %-5s %-6s %-6s %-27s\n" VIOS VFCHOST# VIOS_SLOT LPAR# LPAR_NAME STATUS PORTS PHYS VIRT LPAR_SLOT

            Get_Info_From_VIOS () {
            for VIOS in ${VIOS_LIST}
            do
                ssh ${VIOS_USER}@${VIOS} ioscli ioslevel >/dev/null 2>/dev/null
                if [ $? -ne 0 ]
                then
                    echo "Password-less SSH access to VIOS ${VIOS} with user ${VIOS_USER} is not setup\n"
                    continue
                fi

                ssh ${VIOS_USER}@${VIOS} "ioscli lsmap -all -npiv -fmt :" | awk -v VIOS="$VIOS" -F: '{printf "%-12s %-10s %-27s %-5d %-20s %-12s %-5d %-6s %-6s %-27s\n", VIOS,$1,$2,$3,$4,$6,$9,$7,$11,$12}'
            done
            }

            Get_Info_From_VIOS | sort -k4
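
Assuming you save the script under the name its own usage message suggests and make it executable, a typical run looks something like this:

            chmod +x get_lpar_fcinfo
            ./get_lpar_fcinfo -h vios1,vios2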

If you’re interested, download the (sanitized) output I saw from a test machine below:

If you test it out, leave a comment to let me know how it worked and/or if you find it useful in your environment.

Note: Now AIX news and information is available to you on the go. IBM Systems Magazine, Power Systems edition, just launched an app for the Apple iPad.

Remote HMC Upgrades, Revisited

Edit: Some links no longer work.

Originally posted July 2, 2013 on AIXchange

I had an old CR2 HMC running version 7.4.0.1 that was managing some POWER7 servers along with an old POWER5 server running version SF240_415 microcode. I wanted to go to the latest (as of this writing) HMC code version, 7.7.7 SP1.

I immediately wondered if the latest HMC code could manage that older version of POWER5 microcode. Happily, it can, with this version of firmware. It's also a match for the version of microcode that was running on the POWER7 machines. I went ahead and downloaded the 7.7.7.0 files to my HMC so that I could do the upgrade. This January 2011 post covers the basics of what I wanted to accomplish.

To get the latest files (as opposed to those referenced in my old post), I used this command:

            getupgfiles -h 170.225.15.40 -u anonymous --passwd ftp -d /software/server/hmc/network/v7770
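
Keep in mind that getupgfiles only stages the upgrade files; the upgrade itself happens on the next boot. As a hedged sketch of the follow-on steps (both commands exist on the HMC, but check the release notes for your level):

            chhmc -c altdiskboot -s enable --mode upgrade
            hmcshutdown -r -t now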

It worked like a charm. It rebooted and I was running 7.7.7.0 — until I received this message:

            lshmc -V
            A connection to the Command Server failed.

I found this technote:

            Problem(Abstract)
            A connection to the Command Server failed.
            Symptom
            hscroot@bldhnethmc01:~> lshmc -v
            connect: Connection refused
            A connection to the Command Server failed.
            Resolving the problem
            Reboot HMC

That made me laugh. I tried the reboot, and had no luck. Then it dawned on me that this was an old HMC. Would 7.7.7 even run on it? And shouldn’t I have looked into that before doing the upgrade?

Although not specific to AIX, this is relevant information:

            Important Notes:
            * Version 7.7.7 is not supported and cannot be installed on HMC models C03, C04 or CR2.
            * If an HMC is used to manage any POWER7 processor based server, the HMC must be a model CR3 or later model rack-mount HMC or C05 or later deskside HMC.

Lucky for me I had a CR3 available. I was able to upgrade that HMC with no problems. Once I was at 7.7.7.0, I wanted to get the latest fixpack, so I went to the updates tab on the HMC and selected UPDATE HMC. This information was helpful:

            To install SP1, do the following:
            a) In the HMC Navigation pane, select Updates.
            b) In the Work pane, click the Update HMC button. The “HMC Install Corrective Service Wizard” panel is displayed.
            c) On the Current HMC Driver Information panel, click Next.
            d) On the Select Service Repository panel, click Remote Server, then click Next.
            e) On the Installation and Configuration Options panel (if using a local FTP server, modify the entries as appropriate for your local FTP server):

                        Remote server type: FTP

                        Remote Server: public.dhe.ibm.com
                        User ID: anonymous
                        Password:
                        Remote directory: /software/server/hmc/updates

            Click Next.
            On the Select Service Package panel, scroll down to HMC_Update_V7R770_SP1.iso, click to select, and click Next.

After clicking on Finish, I received this message:

            Management console corrective service installation in progress. Please wait…
            Corrective service file offload from remote server in progress…

It took quite a while to download the .iso image, but once that happened, the upgrade completed as expected in around 30 minutes:

            The corrective service file offload was successful. Continuing with
            HMC service installation…
            Verifying Certificate Information
            Authenticating Install Packages
            Installing Packages
            — Installing ptf-req ….
            — Installing RSCT ….
            src-3.1.4.2-13008
            rsct.core.utils-3.1.4.2-13008
            rsct.core-3.1.4.2-13008
            rsct.service-3.5.0.0-1
            rsct.basic-3.1.4.2-13008
            — Installing CSM ….
            csm.core-1.7.1.20-1
            csm.deploy-1.7.1.20-1
            csm_hmc.server-1.7.1.20-1
            csm_hmc.hdwr_svr-7.0-3.4.0
            csm_hmc.client-1.7.1.20-1
            csm.server.hsc-1.7.1.20-1
            — Installing LPARCMD ….
            hsc.lparcmd-3.0.0.1-1
            ln: creating symbolic link `/usr/hmcrbin/lsnodeid': File exists
            ln: creating symbolic link `/usr/hmcrbin/lsrsrc-api': File exists
            ln: creating symbolic link `/usr/hmcrbin/mkrsrc-api': File exists
            ln: creating symbolic link `/usr/hmcrbin/rmrsrc-api': File exists
            — Installing InventoryScout ….
            — Installing Pegasus ….
            — Updating baseOS ….
            Corrective service installation was successful.

You’re then prompted to reboot. In my case, that also took a nice long while, but it did eventually come back.
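
Incidentally, the wizard isn't the only route. The same service pack can be applied from the HMC command line with updhmc; this is a hedged sketch from memory, so verify the syntax against the man page on your HMC level before trusting it:

            updhmc -t s -h public.dhe.ibm.com -u anonymous -p ftp -f /software/server/hmc/updates/HMC_Update_V7R770_SP1.iso -r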

Lessons learned from this experience:

1. Don’t assume anything, even when you’re using a crash and burn test box.

2. Make sure your hardware can support the software you plan to run on it.

And the one lesson relearned: Updating your HMC remotely really is the way to go.

The Stages of Team Building

Edit: Which stage are you in?

Originally posted June 25, 2013 on AIXchange

Are you forming, storming, norming or performing? Or perhaps you’re just wondering what in the world I’m talking about.

What I’m talking about is Tuckman’s stages of group development. I was introduced to it through my sons’ involvement with Scouting. Tuckman’s theory is that every group of people is in one of four stages of team building. When we started Wood Badge training, we were at the forming phase:

            “In the first stage of team building, the forming of the team takes place. The individual’s behavior is driven by a desire to be accepted by the others, and avoid controversy or conflict. Serious issues and feelings are avoided, and people focus on being busy with routines, such as team organization, who does what, when to meet, etc. Individuals are also gathering information and impressions — about each other, and about the scope of the task and how to approach it. This is a comfortable stage to be in, but the avoidance of conflict and threat means that not much actually gets done.”

The next phase, we were told, is marked by high enthusiasm and low skills:

            “Every group will next enter the storming stage in which different ideas compete for consideration. The team addresses issues such as what problems they are really supposed to solve, how they will function independently and together and what leadership model they will accept. Team members open up to each other and confront each others’ ideas and perspectives. In some cases storming can be resolved quickly. In others, the team never leaves this stage. The maturity of some team members usually determines whether the team will ever move out of this stage. Some team members will focus on minutiae to evade real issues.

            “The storming stage is necessary to the growth of the team. It can be contentious, unpleasant and even painful to members of the team who are averse to conflict. Tolerance of each team member and their differences should be emphasized. Without tolerance and patience the team will fail. This phase can become destructive to the team and will lower motivation if allowed to get out of control. Some teams will never develop past this stage.”

The norming stage is described as low enthusiasm and low skill. It can be an unpleasant place to be, and many teams never make it out of this stage:

            “The team manages to have one goal and come to a mutual plan for the team at this stage. Some may have to give up their own ideas and agree with others in order to make the team function. In this stage, all team members take the responsibility and have the ambition to work for the success of the team’s goals.”

When (or if) a team becomes cohesive and cooperative, that's the performing stage:

            “It is possible for some teams to reach the performing stage. These high-performing teams are able to function as a unit as they find ways to get the job done smoothly and effectively without inappropriate conflict or the need for external supervision. By this time, they are motivated and knowledgeable. The team members are now competent, autonomous and able to handle the decision-making process without supervision. Dissent is expected and allowed as long as it is channeled through means acceptable to the team. Supervisors of the team during this phase are almost always participative. The team will make most of the necessary decisions. Even the most high-performing teams will revert to earlier stages in certain circumstances. Many long-standing teams go through these cycles many times as they react to changing circumstances. For example, a change in leadership may cause the team to revert to storming as the new people challenge the existing norms and dynamics of the team.”

Of course, performing teams have their own challenges. People leave, new people come in. Things change. Eventually you find yourself back at the storming stage, trying to move through the cycle again. I know I've experienced all the stages in my professional life. I've been part of high-performing teams where we all trusted each other, relied on our strengths and compensated for our weaknesses. And I've been through turf wars where everyone was looking out for themselves. No one helped anyone and nothing of value was accomplished.

So how are your teams doing?

Sharing Hardware, and Perspective

Edit: Some links no longer work.

Originally posted June 18, 2013 on AIXchange

A few months back I wrote about IBM i and VIO server, so I was immediately intrigued when a colleague recently pointed me to this document on IBM i virtualization and open storage. I believe this is an updated version of the original. Take the time to give it a read.

Here’s some more good information on IBM i. This document is called the performance capabilities reference:

            “The purpose of this document is to help provide guidance in terms of IBM i operating system performance, capacity planning information, and tips to obtain optimal performance on IBM i operating system.”

I found this pretty interesting. Some of it feels very conservative compared to what I see in the AIX world. If I weren't familiar with VIOS and its benefits, reading this document would honestly make me a bit reluctant to use it in production with IBM i. Despite this — and the fact that this document is intended for IBM i users — I think AIX pros can also benefit from the information. We have Power Systems hardware in common, after all.

Some quick highlights:

* Chapter 2 covers IBM i communications performance.

* Chapter 4 covers internal storage performance. Page 35 has a good chart comparing SSD, SAS and SCSI disks, controllers and enclosures. This is one example of information that should interest AIX users: though the numbers might not match your environment exactly, there's still good information here.

* Chapter 5 has details on SAN performance numbers, while chapter 6 covers VIOS and IVM. Section 6.2.2 should give you perspective on why IBM i users would find it scary to move to external storage. Their storage has always been managed internally, going back to the introduction of the AS/400 systems. Now these folks have to trust SAN admins to RAID-protect their disks, and even if the disk is protected, IBM i will report otherwise.

* Section 6.2.3 reminds you to not put LUNs into volume groups when using VIOS. Simply map the LUN directly to the client LPAR (there's a sketch of this after the list). Also remember that with IBM i, you can only have 16 virtual disks on each virtual SCSI adapter.

* Section 6.4 provides examples of virtual SCSI performance. Section 6.7 has a VIO client performance guide, while section 6.8 gives performance observations and tips.

* Chapter 7 covers logical partitions and best practices when setting them up. There's good information about applications running on IBM i, and at the end check out chapter 19 for general performance information tips and techniques. This material is pretty specific to IBM i, however.
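
On that section 6.2.3 point about direct mapping, here's roughly what it looks like from the padmin command line on the VIO server. The device names are made up; substitute your own LUN and vhost adapter:

            $ lsmap -vadapter vhost0            # confirm which client LPAR vhost0 serves
            $ mkvdev -vdev hdisk4 -vadapter vhost0 -dev ibmi_lun0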

Given the audience for this blog, I don’t often write about IBM i. But I do so occasionally, because I believe it’s worthwhile. Tempting as it might be, AIX pros shouldn’t dismiss this topic. It’s that sort of mentality, after all, that’s kept IBM i admins from embracing VIOS. (“VIOS is so much like AIX. Why should I bother with it?”) 

Perhaps 15-20 years ago, that attitude would be acceptable. But, again, we all use the same hardware now. In some environments today, AIX and IBM i run together on the same physical frame. Knowing where everyone is coming from — especially as it pertains to a vital area like system performance — can be very beneficial.

Sizing Power Systems

Edit: Some links no longer work.

Originally posted June 11, 2013 on AIXchange

I attend presentations all the time, and I always appreciate it when I get a copy of the slide decks (which are usually in PowerPoint) afterward. That way I can review them later and refresh my memory as needed. I can also share them with the world, something I recently did with this set of slides.

For me, the next best thing to being at a presentation is being able to watch it online. This is one reason why I’m such a fan of the AIX Virtual User Group — they make presentations available via replay. I wish more presenters, whether they’re at technical conferences or speaking to user groups, would record their work and post it on YouTube or some video site. We’d all benefit from their expertise.

If I don't see a presentation, either live or recorded, I feel like I'm missing out. Sure, if I'm familiar with the topic, I can generally get up to speed simply by reading the material. But I think it's very important to be able to actually hear the presenter discuss what's on the PowerPoint and explain, in his or her own words, why these particular notes or these particular graphics were included.

With that backdrop, I want to tell you about a presentation from Jorge L. Navarro Cueva from IBM, who discusses ideas for sizing Power Systems.

No, unfortunately, I didn't get to see this presentation, but I recommend it just the same. Jorge offers elementary advice for anyone — from beginner to expert — who needs to size Power Systems. At only 15 slides, it's a relatively quick read, but I figure many of you may benefit from reviewing some of the concepts he covers. This list of topics is found on page 3 of the slide deck:

            1. Understand the performance metrics.
            2. Know the most used performance benchmarks.
            3. Don’t get obfuscated by benchmarketing.
            4. Variability is your worst enemy.
            5. Size the true peak load.
            6. Avoid the “what is the peak” pitfall.
            7. Be aware of the consequences of undersizing.
            8. Design a balanced system.
            9. Garbage In, Garbage Out.
            10. Master the sizing tool.

Slide 5 offers a sound reminder: App 1 may need four cores and App 2 may need four cores, but the two apps don’t necessarily need the four cores at the same time.

Slide 6 makes a good point: Your intervals may not give you enough detail to properly size a system. Or they may actually provide too much detail.

Given the two desks on slide 8, I can see why one might have a longer wait to get service from Desk A than Desk B.

Jorge's presentation comes in two sets of slides. His email is listed on the first slide in each set, so if you have questions, you should contact him directly. If you do correspond with Jorge, I hope you'll post your questions and his answers in the comments section. (However, be sure you get his permission before doing so.) This additional information would certainly be useful for others who come across this post in the future.

Generating HMC and LPAR Info

Edit: Some links no longer work.

Originally posted June 4, 2013 on AIXchange

Dean Rowswell sent over another handy script that you should add to your virtual bag of tricks. Dean's latest script (version 1.4 as of this writing) provides a quick list of information about the HMC and the LPARs running on it.

First, set up your ssh client so you can connect without a password between your HMC and the LPAR you'll run the script on.

If you aren’t sure how to set this up, you should be able to find help through a web search. I created my id_rsa.pub file after viewing this document. It lists these steps:

            To enable scripts to run unattended between an SSH client and an HMC, do the following:

            Open the Remote Command Execution task from the HMC Management work pane.
            From the Remote Command Execution window, select Enable remote command execution using the ssh facility.
            Create an HMC user with one of the following roles:

                        Super administrator (hmcsuperadmin)
                        Service representative (hmcservicerep)
            On the client’s operating system, run the SSH protocol key generator.
            To run the SSH protocol key generator, do the following:
            To store the keys, create a directory named $HOME/.ssh (either RSA or DSA keys can be used).
            To generate public and private keys, run the following command: ssh-keygen -t rsa
            The following files are created in the $HOME/.ssh directory:
                        private key: id_rsa
                        public key: id_rsa.pub
            The write bits for both group and other are turned off. Ensure that the private key has a permission of 600.
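
Condensed into commands, the client-side portion of those steps looks like this (a sketch; adjust the paths if your keys live elsewhere):

            mkdir -p $HOME/.ssh
            ssh-keygen -t rsa
            chmod 600 $HOME/.ssh/id_rsa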

Once this was complete, I copied over my file using:

            mykey=`cat $HOME/.ssh/id_rsa.pub`
            ssh hmc.domain.com -l hmcuser mkauthkeys -a \"$mykey\"
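
A quick way to confirm the key took, using the same placeholder hostname and user: this should print the HMC version details without a password prompt.

            ssh hmc.domain.com -l hmcuser lshmc -V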

Then I copied over Dean’s script, making sure it was executable. I changed the HMC_LIST variable to match an HMC in my environment:

#!/bin/ksh
# Created by Dean Rowswell, IBM, March 20, 2013
# Modified by Dean Rowswell, IBM, April 24, 2013
#    Calculate the USED Processor and Memory values
# Modified by Dean Rowswell, IBM, May 7, 2013 - Version 1.0
#    Display Memory and Processor config for each LPAR
#    Accept parameters for the HMC(s) and HMC user to use
#    Correctly determine HMC information for Version 7.3.5
#    Ignore mem_mode for POWER5 servers
# Modified by Dean Rowswell, IBM, May 9, 2013 - Version 1.1
#    Calculate the LPAR totals for Memory, Processor Entitlement and Virtual Processors
# Modified by Dean Rowswell, IBM, May 9, 2013 - Version 1.2
#    Skip any HMC which does not have password-less ssh setup
# Modified by Dean Rowswell, IBM, May 10, 2013 - Version 1.3
#    Remove the G in the LPAR memory column and add GB label to header
#    Calculate the Entitlement to Virtual Processor ratio for each LPAR and overall system
# Modified by Dean Rowswell, IBM, May 10, 2013 - Version 1.4
#    Fixed bug with divide by zero error if LPAR is in the Not_Activated state and the Virtual Processor value is 0
# List HMC, POWER server, and LPAR info using the HMC
#
# Assumption:
#    Password-less ssh must be setup from this system to the HMC(s) in the HMC_LIST variable

HMC_LIST="hmc1 hmc2"
HMC_USER="hscroot"

VER="1.4"

# Parameter checks
if [ ${#*} -ne 0 ]
then
        while getopts :vVh:u: PARMS
        do
                case $PARMS in
                        v|V)    echo "This is get_lpar_info version: $VER" ; exit ;;
                        h)      HMC_LIST=`echo $OPTARG | tr ',' ' '` ;;
                        u)      HMC_USER=${OPTARG} ;;
                        ?)      echo "\nUSAGE:\t$0 [ -v, -V, -h, -u ]"
                                echo "\t-v or -V will print out the version and exit"
                                echo "\t-h HMC hostname(s) or IP address(es) COMMA SEPARATED to use"
                                echo "\t-u HMC userid to use (only required if hscroot not used)\n"
                                echo "EXAMPLE: get_lpar_info -h hmc1,hmc2\n"
                                exit ;;
                esac
        done
fi

for HMC in ${HMC_LIST}
do
       ssh ${HMC_USER}@${HMC} date >/dev/null 2>/dev/null
       if [ $? -ne 0 ]
       then
              echo "\nPassword-less SSH access to HMC ${HMC} with user ${HMC_USER} is not setup\n"
              continue
       fi

       echo "\n================================="
       echo "HARDWARE MANAGEMENT CONSOLE"
       echo "Hostname: ${HMC} / \c"
       ssh ${HMC_USER}@${HMC} "lshmc -v | grep -E 'TM|SE|RM'" | sed 's/eserver xSeries 336 -\[//g' | sed 's/]-//g' | tr -s '\n' ' ' | awk '
       {MODEL = $2 ; SERIAL = $4 ; VERSION = $6};
       END { print "Model: " MODEL "\nSerial: " SERIAL " / Ver: " VERSION}'
       echo "`date`"
       echo "================================="
       MANAGEDSYS=`ssh ${HMC_USER}@${HMC} "lssyscfg -r sys -F type_model*serial_num|sort"`
       for SYSTEM in ${MANAGEDSYS}
       do
              echo "\nIBM POWER SYSTEM: ${SYSTEM} / SysFW Ver: \c"
              ssh ${HMC_USER}@${HMC} "lslic -m ${SYSTEM} -F ecnumber:activated_level|sed 's/:/_/g'|cut -c 3-"|tr -s '\n' ' '
              ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r proc --level sys -F installed_sys_proc_units:configurable_sys_proc_units:curr_avail_sys_proc_units"|awk -F: '
              {INSTALL = $1 ; CONFIG = $2 ; AVAIL = $3};
              END { print "\n   PROC INFO:\t" INSTALL " Installed / " CONFIG " Configurable / " CONFIG-AVAIL " Used / " AVAIL " Available "}'
              ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level sys -F installed_sys_mem:configurable_sys_mem:curr_avail_sys_mem:sys_firmware_mem:mem_region_size" |awk -F: '
              {INSTALL = $1 ; CONFIG = $2 ; AVAIL = $3 ; SYSFW = $4 ; LMB = $5};
              END { print "   MEM INFO:\t" INSTALL/1024 " GB Install / " CONFIG/1024 " GB Config / " (CONFIG-AVAIL)/1024 " GB Used / " AVAIL/1024 " GB Avail / " SYSFW/1024 " GB SysFW / " LMB " MB LMB"}'
              echo "   LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)\n   ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP"
              Get_LPAR_Info() {
                     LPARS=`ssh ${HMC_USER}@${HMC} "lssyscfg -r lpar -m ${SYSTEM} -F lpar_id:name:lpar_env:os_version:state|sed 's/ /_/g'|sort -n"`
                     for LPAR in ${LPARS}
                     do
                           printf "      %-24s\n" ${LPAR}
                     done
              PROC=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r proc --level lpar -F lpar_id:curr_proc_mode:curr_sharing_mode:curr_shared_proc_pool_id:run_proc_units:run_procs:run_uncap_weight|sort -n"`
              for LPAR in ${PROC}
              do
                     printf "      %-24s\n" ${LPAR}
              done
              ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:mem_mode:run_mem" >/dev/null 2>/dev/null
              if [ $? -eq 0 ]
              then
                     MEM=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:mem_mode:run_mem|sort -n"`
                     for LPAR in ${MEM}
                     do
                           printf "      %-24s\n" ${LPAR}
                     done
              else
                     MEM=`ssh ${HMC_USER}@${HMC} "lshwres -m ${SYSTEM} -r mem --level lpar -F lpar_id:run_mem|sort -n"`
                     for LPAR in ${MEM}
                     do
                           printf "      %-24s\n" ${LPAR}
                     done
              fi
              }
              Get_LPAR_Info | sort -n | awk -F: '{
              if (NF == 5) { LPAR_ID=$1; LPAR_NAME=$2; OS_TYPE=$3; OS_VER=$4; STATE=$5 }
              if (NF == 3) { MEM_MODE=$2; MEM=$3 }
              if (NF == 2) { MEM_MODE="NA"; MEM=$2 }
              if (NF == 7) { PROC_MODE=$2; SHARE_MODE=$3; SHARED_POOL=$4; PROC_UNITS=$5; VIRT_PROC=$6; WEIGHT=$7 }
              if ((length(LPAR_ID) != 0 && length(MEM_MODE) !=0 && length(PROC_MODE) != 0)) {
              if (VIRT_PROC == 0) { RATIO = "NA"  } else { RATIO = PROC_UNITS/VIRT_PROC}
              printf "   %3d %-20s %-9s %-24s %-13s %5.1f %-8s %-7s %-17s %-3d %-4.2f %3d %3d %5.2f\n", LPAR_ID, LPAR_NAME, OS_TYPE, OS_VER, STATE, MEM/1024, MEM_MODE, PROC_MODE, SHARE_MODE, SHARED_POOL, PROC_UNITS, VIRT_PROC, WEIGHT, RATIO; TOTAL_MEM += MEM; TOTAL_PROC_UNITS += PROC_UNITS; TOTAL_VIRT_PROC += VIRT_PROC ; LPAR_ID=""; MEM_MODE=""; MEM=""; PROC_MODE="" }
              } END {print "    -----------------------------------------------------------------------------------------------------------------------------------------" ; printf "       LPAR TOTALS %63.1f %43.2f %3d %9.2f\n", TOTAL_MEM/1024, TOTAL_PROC_UNITS, TOTAL_VIRT_PROC, TOTAL_PROC_UNITS/TOTAL_VIRT_PROC}'
       done
done
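
With the HMC_LIST variable edited for your environment (or overridden on the command line), running it is just:

    chmod +x get_lpar_info
    ./get_lpar_info -h hmc1,hmc2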


In my environment I got the following output, some of which is masked to protect the identity of the customer that allowed me to run this. Of course, the output looks best in a window wide enough for all of the columns; in a narrower view the longer lines will wrap.

=================================
HARDWARE MANAGEMENT CONSOLE
Hostname: HMC1 / Model: 7042-CR4
Serial: 123456B / Ver: V7R7.7.0.2
Sun Jun  2 15:23:37 CDT 2013
=================================

IBM POWER SYSTEM: 8202-E4B*12345CP / SysFW Ver: AL720_108
   PROC INFO:   4.0 Installed / 4.0 Configurable / 2.9 Used / 1.1 Available
   MEM INFO:    64 GB Install / 64 GB Config / 32.75 GB Used / 31.25 GB Avail / 2.25 GB SysFW / 256 MB LMB
   LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)
   ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP
    1  vios1                vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         3.2 ded     shared  uncap             0    0.50   4 128  0.12
    2  lpar1                aixlinux  AIX_5.3_5300-12-05-1140  Running         4.2 ded     shared  uncap             0    0.40   4 128  0.10
    3  vios2                vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         4.2 ded     shared  uncap             0    0.40   4 128  0.10
    4  vios3                vioserver VIOS_2.2.0.10-FP-24_SP-01 Running         4.2 ded     shared  uncap             0    0.40   4 128  0.10
    5  nim                  aixlinux  Unknown                  Running         4.2 ded     shared  uncap             0    0.40   4 128  0.10
    6  db2                  aixlinux  Unknown                  Running         4.2 ded     shared  uncap             0    0.40   4 128  0.10
    7  lpar2                aixlinux  Unknown                  Running         4.0 ded     shared  uncap             0    0.20   2 128  0.10
    8  vios3                aixlinux  Unknown                  Running         2.0 ded     shared  uncap             0    0.20   2 128  0.10
    -----------------------------------------------------------------------------------------------------------------------------------------
       LPAR TOTALS                                                            30.5                                        2.90  28      0.10

IBM POWER SYSTEM: 9111-520*12346A / SysFW Ver: SF240_415
   PROC INFO:   2.0 Installed / 2.0 Configurable / 2 Used / 0.0 Available
   MEM INFO:    4 GB Install / 4 GB Config / 4 GB Used / 0 GB Avail / 0.34375 GB SysFW / 16 MB LMB
   LPAR INFO:   NOTE: THE MEMORY AND PROCESSOR VALUES ARE FROM THE ACTIVE/RUNNING LPAR VALUES (NOT FROM LPAR PROFILE)
   ID  NAME                 TYPE      OS_VER                   STATE        MEM(GB) MODE    PROC    MODE             POOL  ENT  VP  WT ENT/VP
    1  lpar1                aixlinux  Unknown                  Not_Activated   0.0 ded     ded     share_idle_procs  0    0.00   0   0  0.00
    2  demo                 aixlinux  Unknown                  Not_Activated   0.0 ded     ded     share_idle_procs  0    0.00   0   0  0.00
    3  test                 aixlinux  Unknown                  Not_Activated   0.0 ded     ded     share_idle_procs  0    0.00   0   0  0.00
    4  lpar2                aixlinux  Unknown                  Running         3.7 ded     ded     share_idle_procs  0    0.00   2   0  0.00
    5  lpar3                aixlinux  Unknown                  Not_Activated   0.0 ded     shared  uncap             0    0.00   0   0  0.00
    6  lpar4                aixlinux  Unknown                  Not_Activated   0.0 ded     shared  uncap             0    0.00   0   0  0.00
    -----------------------------------------------------------------------------------------------------------------------------------------
       LPAR TOTALS                                                             3.7                                        0.00   2      0.00

If you want to see how both scripts look with the proper spacing, check out the attached PDFs below.

Let me know how the script works for you. And many thanks to Dean for giving permission to post it here.

VIOS Installation via GUI

Edit: Some links no longer work.

Originally posted May 28, 2013 on AIXchange

When you build a VIO server on HMC 7.7.7.0 SP1, there’s a new option to make your life easier.

Check out the HMC readme:

“Add a GUI enhancement for the installation of VIOS, allowing the user to install the Virtual I/O Server and manage Virtual I/O Server images using a GUI interface.”

The readme also points to future support (delivered by service pack) for importing VIO server images via FTP.

This new functionality makes it easier for “non-AIX” people to install VIOS. It seems particularly helpful if you’re dealing with one of the smaller server models with a split backplane, since these models make it physically impossible to load the VIO server by attaching the internal system DVD to both sets of internal disks.

In any event, as soon as I learned of this option, I naturally wanted to try it out. So I first verified my HMC version:

I then created a VIO partition as I typically would, but the first time I activated my new VIO server I saw something new:

I selected yes to install the VIO server, and I got some new options:

It will assume you’re booting over the network since your install images will reside on your HMC, and you’ll see:

Be sure you don’t have any open and connected console windows, because the subsequent error message doesn’t mention the console. It only says your network adapter cannot be detected, and you won’t be able to activate your VIO partition. 

This is a totally different way to install. It’s not necessary to select SMS or open a console window as you might have done in the past. Of course you can still use the old installation method if you prefer.

On the next screen you can specify where you’re installing VIOS from. In my case I was installing from DVD, but remember that this DVD is now physically located in the HMC:

I entered the IP address, subnet mask and gateway as requested. When I selected OK, I got this screen:

The installation process started to run the commands under the covers, using my HMC as a network installation server:

After it copied the DVD information from DVD1, I got:

I then loaded the second VIO install DVD into the HMC, and processing began:

After it had copied both DVDs, flashing messages appeared about powering up the profile, doing ping tests, etc. The messages scroll by in the window on your screen. Unfortunately, the newest messages aren't written to your screen; they're located at the bottom of a log file, so you'll have to manually scroll down to view them. Expect to be annoyed whenever new messages pop up. Alternatively, you can ssh into the HMC and check /var/log/nimol.log, but it would be nice if the messages appeared in a friendly way on the screen.
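
If you'd rather follow along from a terminal as the install runs, something like this works (hscroot and the hostname are placeholders for your own):

    ssh hscroot@hmc1 "tail -f /var/log/nimol.log"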

In the log file I saw quite a few entries; this is just a taste of what you'll find:
    ioserver nimol: ,info=LED 610: mount  -r 10.44.3.108:/extra/default1/SPOT/usr /SPOT/usr
    ioserver nimol: ,-S,booting,ioserver
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/mksysb /NIM_BOS_IMAGE
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/bosinst.data /NIM_BOSINST_DATA
    ioserver nimol: ,info=LED 610: mount  10.44.3.108:/extra/default1/lpp_source /SPOT/usr/sys/inst.images,
    ioserver nimol: ,info=extract_data_files
    ioserver nimol: ,info=query_disks
    ioserver nimol: ,info=extract_diskette_data
    ioserver nimol: ,info=setting_console
    ioserver nimol: ,info=initialization
    ioserver nimol: ,info=verifying_data_files
    ioserver nimol: ,info=prompting_for_data_at_console
    ioserver nimol: ,info=BOS install 1% complete : Making boot logical volume.
    ioserver nimol: ,info=BOS install 2% complete : Making paging logical volumes.
    ioserver nimol: ,info=BOS install 3% complete : Making logical volumes.
    ioserver nimol: ,info=BOS install 4% complete : Forming the jfs log.
    ioserver nimol: ,info=BOS install 5% complete : Making file systems.
    ioserver nimol: ,info=BOS install 6% complete : Mounting file systems.
    ioserver nimol: ,info=BOS install 7% complete
    ioserver nimol: ,info=BOS install 7% complete : Restoring base operating system.
    ioserver nimol: ,info=BOS install 7% complete : 0% of mksysb data restored.
    …Skipping…
    ioserver nimol: ,info=BOS install 89% complete
    ioserver nimol: ,info=BOS install 89% complete : Initializing dump device.
    ioserver nimol: ,info=recover_device_attributes
    ioserver nimol: ,-R,success
    ioserver nimol: ,info=BOS install 89% complete : Network Install Manager customization.
    ioserver nimol: ,info=BOS install 90% complete : Creating boot image.

In one instance my network guys hadn’t set up the switch to ensure that the port was on the right VLAN, so my ping test between the new VIO server and the HMC failed. If this happens to you, you won’t be able to simply return to the screen where you entered the IP information. You’ll have to start over. So make sure your physical network is ready to go when you try this.

For a second test, instead of selecting the DVD, I tried the local repository option:

When I selected local repository and clicked on import, I got this screen:

I gave the server image file a name and clicked OK:

After DVD1 finished, it asked for DVD2:

I clicked OK once more. There was no indication that the second DVD was being read, but something must have been processed, because after a while it returned to my install screen:

Again I filled in my network information and clicked OK. Instead of reading from the DVD in the HMC, it was reading directly from the HMC’s local disk repository, much like the way VIO servers read from the virtual media repository that we can create on our VIO servers. Obviously using the disk image in the repository was much faster than using the DVD media, and I didn’t need to physically remove DVD1 and insert DVD2. This made subsequent installs less painful, as no one needed to visit the raised floor.

You’ll see status updates in the bar in the upper left, like:
    Network booting install adapter
    Network boot proceeding
    Starting installation
    Installation in progress
    Installation completed

At this point, the progress bar keeps going to the right as the server installs.

I didn’t like that it just took control and started installing to a disk without allowing operator intervention. This could be dangerous if disks you don’t want overwritten simply appear on your VIO server. For instance, if I had data on hdisk0 and wanted to install specifically to hdisk1, I didn’t find an intuitive way to specify that I wanted to use hdisk1.

Once the install was running I opened my console window and was able to watch it install as I would normally expect to. At the same time, I was logged into my HMC running tail -f /var/log/nimol.log so I could view the install progress.

I’ll write more as I continue to experiment with this.

So have you tried this method? Did you know this new functionality even existed?

The Value of an Open Mind

Edit: I still want to go back to Gilwell, happy land.

Originally posted May 21, 2013 on AIXchange

I’ve learned some interesting lessons about attitude lately. My sons participate in Boy Scouts. I was involved with Scouting at their age, and while I enjoyed the campouts and other outdoor activities, I never worried about rank advancements. And I certainly didn’t care much for uniforms.

In a way I’m more invested in Scouting now, thanks to my boys. Early on I would fill in for other adult leaders during campouts and other activities. But even though I was helping out by showing up and providing the necessary minimum two-deep leadership, I didn’t do much beyond serving as a chaperone.

For some time I was encouraged to attend a training program called Wood Badge. When I first heard about it, I imagined a bunch of gung-ho Scout dads roughing it in the woods, tying knots and climbing trees. That first impression — the one concocted entirely in my mind — made me hesitate. I was already a father. What sort of week-long training did I need to look after a bunch of kids?

Eventually, I relented and took the training, but this didn’t immediately change my attitude. I was surly when I arrived, still questioning in my mind the necessity of this experience. However, it didn’t take long for my preconceived notions to transform.

For one thing, many of the Wood Badge participants were moms. They led Cub Scout packs and also wanted to learn how to be more effective leaders for the boys they worked with. Even the fact that some of us were older and out of shape made me look at things differently — none of these people really lived up to my image of a “super scouter.” Although I went into it as someone who was baffled by the skits and songs and general silliness, by the end of the training I was enjoying the interaction and looking forward to singing about Gilwell and Happy Land.

This is the course description found on Wikipedia:

            “Wood Badge is a Scouting leadership program and the related award for adult leaders in the programs of Scout associations throughout the world. Wood Badge courses aim to make Scouters better leaders by teaching advanced leadership skills, and by creating a bond and commitment to the Scout movement. Courses generally have a combined classroom and practical outdoors-based phase followed by a Wood Badge ticket, also known as the project phase. By ‘working the ticket,’ participants put their newly gained experience into practice to attain ticket goals aiding the Scouting movement.

            “On completion of the course, participants are awarded the Wood Badge beads to recognize significant achievement in leadership and direct service to young people. The pair of small wooden beads, one on each end of a leather thong (string), is worn around the neck as part of the Scout uniform.”

Admittedly, those words don't make the Wood Badge course seem very interesting, but now I know better from experience. When you're out in the woods actually camping and practicing outdoor skills, the instruction holds quite a bit of value. Honestly, the training was outstanding. Wood Badge training consists of six 16-hour days, covering skills like listening, communication, team building and dealing with change. There are games and physical activities that help you learn to work as a team to solve problems. These are all skills that certainly apply to Boy Scouts, but they're also applicable to our jobs and our personal interactions.

Many others who took the course were as pleasantly surprised as I was. Some said it was much more effective than corporate training they’d been through. By the end I had to agree. We were all strangers coming in, but we bonded more than you’d think possible. And we all wanted to be better Scout leaders and get more involved in the program.

My one regret is I didn’t do it sooner. I wasted years by being so close-minded.

So does this experience have anything to do with working on IBM Power Systems? I think so. Obviously, it showed me the value of a good attitude. It also demonstrated, once again, that as much as you can learn by reading, nothing beats hands-on training.

I’m now working on my Wood Badge tickets, and hoping to someday return to Happy Land. So what are you working on? What goals have you set for yourself? How do you plan on accomplishing them?

A Big Step Forward in Storage

Edit: Some links no longer work.

Originally posted May 14, 2013 on AIXchange

As a consultant I get to play with some cool, cutting-edge technologies. However, I have yet to get my hands on a half-petabyte storage array, consisting of only flash drives:

            “On the 12-hour flight from Zurich to San Francisco, the two scientists plotted out the fastest way to install and setup the two racks — each filled with 240 terabytes of Flash provided by Texas Memory Systems (an acquisition IBM completed in October 2012), as well as 10 IBM Power 730 Express servers.

            “‘This demonstration marks a tipping point for transactional workloads. It’s the first time Flash storage has outperformed hard disks in all aspects, including capacity and performance density, and cost per Input/Output Operations Per Second (IOPS) and energy efficiency,’ Ioannis said.

            “By the numbers, the two achieved a remarkable feat: the IBM Flash System 820 achieved more than 6 million IOPS running an IBM DB2 workload on IBM Power servers.

            “‘In terms of energy our system runs on 19 kilowatts compared to 4.5 megawatts with high capacity hard disks, a 236 fold improvement,’ Nikolas said.”

This article points to IBM’s claim that flash “can speed the response times of information gathering in servers and storage systems from milliseconds to microseconds – orders of magnitude faster. Because it contains no moving parts, the technology is also more reliable, durable and more energy efficient than spinning hard drives.” According to the article, by year end IBM will open 12 “flash competency centers” worldwide for the purpose of introducing its customers to the technology.

A solution that uses less energy while providing massively superior performance? Sign me up. Seriously, I’m hoping I can visit one of those flash competency centers soon.

One more thing from this article:

            “A deal has been announced between IBM and Sprint Nextel involving the installation of nine flash storage systems in Sprint’s data centre, amounting to 150TB of flash capacity. Flash is used to accelerate Sprint Nextel’s phone activation application and the company is expanding its use of the technology to other parts of the data centre. Sprint has a strategy to move its most active data to all-flash storage systems.”

Even on home systems, I’ve seen huge performance gains when going with solid-state drives (SSD) compared to hard disk drives (HDD). Although SSD costs are still higher, they seem to be dropping, and (knock on wood) I have yet to experience a failure with my drives.

Perhaps you can get your toes wet with something like this:

            “Storwize V7000 includes IBM System Storage Easy Tier, a function that responds to the presence of [SSDs] in a storage pool that also contains [HDDs]. The system automatically and non-disruptively moves frequently accessed data from HDD MDisks to SSD MDisks, thus placing such data in a faster tier of storage.

            “Easy Tier eliminates manual intervention when assigning highly active data on volumes to faster responding storage. In this dynamically tiered environment, data movement is seamless to the host application regardless of the storage tier in which the data resides. Manual controls exist so that you can change the default behavior, for example, such as turning off Easy Tier on storage pools that have both types of MDisks.”

Some people use external HDD to store lots of media files, but rely on SSD for their main system. Manually moving the larger, less frequently accessed files to another storage medium is something I like to call “poor man’s tiering.”
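
Even the manual version can be semi-automated. Here's a hedged one-liner, with made-up paths, that sweeps media files untouched for 90 days off the SSD to an external drive:

    # Move files not accessed in 90 days from the SSD to the HDD mount
    find /ssd/media -type f -atime +90 -exec mv {} /mnt/hdd/media/ \;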

Is SSD indeed the future of storage? Is there something else I should be watching for?

Verifying Firmware

Edit: Link no longer works.

Originally posted May 7, 2013 on AIXchange

Hopefully you've seen Nigel's post about verifying firmware before installing.

Be sure to check the comments for more information from the developers. For instance:

            Prevention

            Before installing Power firmware, verify through the firmware release notes/readme information that the selected level is supported on the targeted server MTM.

            Example of 01AL770_032_032.readme.txt:

            System firmware level 01AL770_032_032

            System Firmware Release for the General Availability of the POWER7 System p Servers 8231-E1D, 8231-E2D, 8246-L1D, 8246-L2D, 8246-L1T, 8246-L2T, 8202-E4D, 8205-E6D

            Recovery

            Set Boot Side to P. From ASMI:

            – Expand the “Power/Restart Control” menu.

            – Select “Power On/Off System.”

            – Under the “Firmware boot side for the next boot” option, select “Permanent.”

             – Click “Save settings”. (NOT ‘save settings and power on’)

            – Reboot the system. From ASMI:

            – Expand the “System Aids” menu.

            – Select “Reset Service Processor.”

             – Click “Continue.”

            – Wait for the system to reconnect and show stable state in HMC GUI.

            Perform a Reject Fix operation. From HMC:

            – Select the applicable server.

            – Select the “Updates” menu.

            – Select “Change Licensed Internal Code for the current release.”

            – Select “Advanced features.”

            – Select “Reject Fix – Copy Permanent to Temporary.”

            – Click “OK.”

            After the Reject Fix is completed successfully, revert the system to the T side to enable concurrent updates. From ASMI:

            – Expand the “Power/Restart Control” menu.

            – Select “Power On/Off System.”

            – Under the “Firmware boot side for the next boot” option, select “Temporary.”

            – Click “Save settings.”

            – Expand the “System Aids” menu.

            – Select “Reset Service Processor.”

            – Click “Continue.”

            Power on the server.

And also this comment:

            Abstract: REMOVING UNSUPPORTED POWER SYSTEMS FIRMWARE, SRC B1813463

            SYMPTOM: After applying an unsupported system firmware level to the temporary side of the FSP the system stops at SRC B1813463. To resolve this problem follow the steps below to remove the unsupported system firmware. Follow the instructions specific to the method used to update the code.

            IMPORTANT: Always consult firmware readme files and verify supported levels before updating or upgrading system firmware. HMC levels v7r6.3 and v7r7.2 include an update to verify the
system firmware level is supported before allowing a firmware update or upgrade to begin.

            PROBLEM ISOLATION AIDS:
            – The system may be any of the following IBM servers:

            IBM Power 710 Express Server, Type 8231, models E1C, E2B
            IBM Power 720 Express Server, Type 8202, models E4B, E4C
            IBM Power 730 Express Server, Type 8231, models E2B, E2C
            IBM Power 740 Express Server, Type 8205, models E6B, E6C
            IBM Power 750 Express Server, Type 8233, model E8B
            IBM Power 755 Express Server, Type 8236, model E8C
            IBM Power 770 Server, Type 9117, any model
            IBM Power 780 Server, Type 9179, any model
            IBM PowerLinux 7R1 server, Type 8246, models L1C, L1S
            IBM PowerLinux 7R2 Server, Type 8246, models L2C, L2S

            – This tip is not option specific.
            – This tip is not software specific.

            – The system has the symptom described above.

            FIX: User must follow the guidelines listed below to remove the unsupported code. Follow the instructions depending on the method used to update the code:

            — HMC Managed Systems

            1) Using the ASMI, set Boot Side to Permanent.
               a) Expand the “Power/Restart Control” menu.
               b) Expand the “Power On/Off System” menu.
               c) Under the “Firmware boot side for the next boot” option, select “Permanent.”
               d) Click the “Save settings” button. DO NOT click the “Save Settings and Power On” button. It will cause the server to power on running the unsupported firmware side and require that you restart the procedure.
               e) Expand the “System Service Aids” menu.
               f) Select “Reset Service Processor.”
               g) Click the “Continue” button.

            Note: If this step is not completed the unsupported firmware will not be removed and SRC B1813463 will be displayed again.

            2) Using the HMC GUI, wait for the system to reconnect and show a state of “Power off.”
            3) Using the HMC GUI, perform “Reject Fix – Copy Permanent to Temporary.”
               a) Select the applicable server.
               b) Select the “Updates” menu.
               c) Select “Change Licensed Int. Code for current release.”
               d) Select “Advanced features.”
               e) Select “Reject Fix – Copy Permanent to Temporary.”
               f) Click the “OK” button.
            4) Wait for the “Reject Fix” to complete successfully.
            5) Using the ASMI, set the Boot Side back to Temporary and reset the service processor.
               a) Expand the “Power/Restart Control” menu.
               b) Select “Power On/Off System”.
               c) Under the “Firmware boot side for the next boot” option, select “Temporary.”
               d) Click the “Save settings” button.
               e) Expand the “System Aids” menu.
               f) Select “Reset Service Processor.”
               g) Click the “Continue” button.

            — Stand alone systems via USB
            — Not available for 9117-MMx and 9179-MHx servers.

            Updating firmware via USB is independent of the operating system installed. The only restriction is that the server cannot be HMC managed.

            1) Remove all system firmware present in the USB drives root directory.
            2) Download the RPM file for the latest supported firmware, then copy it into the USB drives root directory. (Note: Only one level of code should be contained in the USB root directory.)
            3) Insert the USB drive to the top port of the FSP (left side port for tower systems).
            4) Change the FSP Boot Side from Temporary to Permanent using either method [A] ASMI, OR [B] Operator (control) Panel.
               [A] Using the ASMI:
                   1) Expand the “Power/Restart Control” menu.
                   2) Expand the “Power On/Off System” menu.
                   3) Under the “Firmware boot side for the next boot” option, select “Permanent.”
                   4) Click the “Save settings” button.
               [B] Using the Operator (control) Panel.
                   1) Use the Increment or Decrement buttons to select Function 02.
                   2) Press the Enter button.
                   3) Press the Enter button until the field marker moves to the right of the character “T.”
                   4) Use the Increment or Decrement button to change the “T” to a “P.”
                   5) Reset the FSP using either method [A] ASMI, or [B] Performing a pin-hole reset, or [C] Removing AC power.
                          [A] Using ASMI:
                             1) Expand the “System Aids” menu.
                             2) Select “Reset Service Processor.”

Power Systems Best Practices

Edit: This is still a good document, but the link keeps changing.

Originally posted April 23, 2013 on AIXchange

Recently I received this set of slides from Fredrik Lundholm covering best practices for Power Systems with AIX. I'll cover a few highlights, though honestly, I could discuss every slide. The information here is that valuable, so I highly recommend taking the time to view the entire thing.

If you download his slides, be sure to look at the notes. For example, on page 7, where he discusses a virtualized system design, the notes contain a couple of links relating to Entitled Software Support, including this ESS how-to guide.

Page 8 lists guidelines for capacity planning. Fredrik points out the rational starting places for your CPU and LPAR weights if no information is provided. The fact that you can make reasonable guesses without a ton of workload information just reminds me how forgiving this platform is. If things change, CPU and memory settings can be easily adjusted. Whole physical adapters can even be added or removed if necessary.

Page 9 covers firmware and using Microcode Discovery service and FLRT.

Page 11 tells you where to get fixes for the VIO server. The notes cover items that have been fixed in each release.

Page 12 covers network best practices. The notes contain a link to a step by step network configuration guide.

Page 13 shows a nice diagram of a shared Ethernet adapter load sharing configuration that is available in VIOS 2.2.1+.

Page 14 shows the recommended architecture when more than one VLAN is used.

Page 15 features a reminder about SEA and virtual Ethernet interfaces. Be sure to select large send and large receive; it’s not the default setting.

            For all SEA interfaces, chdev -l entX -a largesend=1   (survives reboot)

            For all SEA interfaces, chdev -l entX -a large_receive=1   (survives reboot)
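To confirm the attributes took, a quick lsattr check works. This is just a sketch — substitute your actual SEA device for entX:

            lsattr -El entX -a largesend -a large_receive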

Page 17 covers storage and the need to ensure that the correct multi-path drivers are installed. Page 18 has a nice picture illustrating how the configured machines will look.

Page 19 covers setting up fc_err_recov and dyntrk, along with setting up no_reserve and round_robin.

From page 20: To allow graceful round robin load balancing over multiple paths, set timeout_policy to fail_path for all physical hdisks in the VIO server:

            # chdev -l hdisk0 -a timeout_policy=fail_path
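If the VIO server has a lot of disks, a small loop saves typing. This is only a sketch, assuming every disk should get the attribute; disks that are in use may need chdev -P plus a reboot instead:

            # apply timeout_policy to every disk the VIO server sees
            for DISK in $(lsdev -Cc disk -F name)
            do
                chdev -l $DISK -a timeout_policy=fail_path    # add -P for busy disks
            done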

Page 21 has links to documentation for installing AIX. Page 22 has a nice chart illustrating good choices for running AIX. The red, green and yellow color coding is intended to help you decide which TL to run.

Page 23 lists AIX tuning and values that should be changed.

Page 24 covers AIX 5.3 memory tuning.

Page 26 has a nice tip: Largesend increases virtual Ethernet throughput performance and reduces processor utilization. Starting with AIX 6.1 TL7 sp 1 and AIX 7.1 sp 1, the operating systems that support the mtu_bypass attribute for the shared Ethernet adapter provide a persistent way to enable the largesend feature. To determine if the operating system supports the mtu_bypass attribute, run the following lsattr command [lsattr -El enX |grep by_pass]. If the mtu_bypass attribute is supported, the… command will return:

            mtu_bypass off Enable/Disable largesend for virtual Ethernet True

            Enable largesend on all AIX en interfaces through:

            chdev -l enX -a mtu_bypass=on
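And to confirm the change on an interface afterward (enX again being a placeholder for your device):

            lsattr -El enX -a mtu_bypass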

Page 27 shows the recommended vSCSI parameters on each client partition. Page 28 covers vSCSI Queue Depth tuning for different disk subsystems.

There is also a section on PowerHA. It’s recommended that new deployments go with PowerHA 7.1. Page 31 covers I/O pacing with PowerHA.

An FAQ starts on page 32. Here’s a tip I like:

            Q: How do I run nmon to collect disk service times, top process cpu consumption, etc?

            A: STG Lab services recommends the following parameters for nmon data collection:

            /usr/bin/nmon -M -^ -f -d -T -A -s 60 -c 1435 -m /tmp/nmonlog

            This will invoke nmon to take a snapshot every minute, continuing for 24 hours and capturing vital disk access time data along with top processes.

            -d includes the Disk Service Time section in the view

            -T includes the top processes in the output and saves the command line arguments into the UARG section

            -^ includes the Fibre Channel (FC) sections

            On the HMC, there is an “Allow performance information collection” checkbox on the processor configuration tab. Select this checkbox on the partition for which you want to collect this data. If you are using IVM… use the lssyscfg command, specifying the all_perf_collection (permission for the partition to retrieve shared processor pool utilization) parameter. Valid values for the parameter are 0 (do not allow authority, the default) and 1 (allow authority).
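On IVM, checking and then enabling that flag from the command line might look like the sketch below. I’m assuming here that the all_perf_collection attribute is also settable via chsyscfg, and myLPAR stands in for your partition name:

            # show the current setting for one partition
            lssyscfg -r lpar --filter "lpar_names=myLPAR" -F name,all_perf_collection

            # allow the partition to retrieve pool utilization data
            chsyscfg -r lpar -i "name=myLPAR,all_perf_collection=1"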

Starting on page 36 there are reference documents to older information, which may still be helpful for certain environments.

This is a fantastic set of slides with current, real world information and suggestions.

IBM i Turns 25

Edit: Some links no longer work.

Originally posted April 23, 2013 on AIXchange

Though the focus of this blog is AIX, there is value in discussing the other OSs that can run on IBM Power Systems: Linux, VIOS and IBM i. With that in mind, have you seen all the information and videos about IBM i turning 25?

While I primarily find myself on AIX these days, when I started in the late 1980s I worked on AS/400 systems, the predecessors to IBM i. Part of my job involved tending to a line printer that required us to change paper and forms. The most exciting part of the job was changing from green bar paper to white, and then back again (with an occasional run of custom forms thrown in).

The AS/400 was a great platform to work on as a computer operator. And compared to other operating systems of that era, OS/400 didn’t require much care and feeding. Those machines just ran.

I recall our IBM CE coming on site. He’d log in, look at logs and ask us how we were doing, but the only thing we ever really needed from him was to repair or replace the green screen displays we had connected to the AS/400 via twinax. He never had to actually do anything with the AS/400 box itself. Basically, the guy was our version of the Maytag repairman.

Of course over the past 25 years the AS/400 has gone through a few rebrandings. And over time IBM has brought IBM i and AIX together architecturally. One important thing AIX and IBM i now share in common is the capability to virtualize adapters using the VIO server. However, as AIX pros we are generally more comfortable with VIOS. Sometimes I hear IBM i folks complain about how complicated it is — and IBM is working to make VIOS more user friendly. But this is where, as an AIX/VIOS person, you can help your IBM i friends by configuring VIOS for them. Although you can certainly dedicate your adapters and direct connect to SAN storage, VIOS allows everyone to connect to the same SAN. That’s a nice advantage.

Speaking of the coming together of AIX and IBM i, you should know that COMMON, the conference that for years has centered on AS/400, iSeries, System i and IBM i technologies, continues to add more AIX content to its user group meetings. The one that took place in Austin, Texas, earlier this month had AIX courses covering application development, high availability, networking, systems management and web applications.

So did you know that IBM i is celebrating 25 years? Do you still make the mistake of calling it an AS/400?

If, like me, you worked on the AS/400 in the beginning, that’s one thing. But it’s neither technically correct — nor positive for the platform — to refer to today’s IBM Power Systems running IBM i as an AS/400. While it demonstrates the loyalty that users have always had to AS/400 systems, IBM Champion Trevor Perry points out that it needs to change.  As he states: “Conflicted people called it AS/400. Confused people called it iSeries. Confident people called it IBM i.”

I think AIX users can see his point. I mean, we love our systems, but I don’t know of anyone who still uses the name RS/6000. So what do you think? Does the name matter? Do you plan to step up and call it by its name, or are you going to remain conflicted and call it an AS/400?

The Search for Answers, the Need for Help

Edit: I still ask for help, hopefully you do too.

Originally posted April 16, 2013 on AIXchange

Sure, you work in the field of technology, but that doesn’t automatically make you a creature of social media. So really, how plugged in are you? From Facebook to Twitter to Google+ to news.google.com to plain old email, do you often see the jokes and memes and viral videos that go around the Internet? Or are you so insulated you not only don’t know that planking or the Harlem Shake fad is over, you never knew it was a thing to begin with?

Of course compared to 30 or even 20 years ago, we as a society have fewer and fewer shared experiences. Not that long ago there were four television channels (the three major networks and your local UHF station). People talked about the big TV events because everyone was watching the same things at the same times. You got to see Christmas specials once a year. The Grinch? Once. Rudolph? Once. There were no videos to rent, buy or download. Most households didn’t even have remotes, much less cable television and VCRs.

These days, someone might recommend a long discontinued show (Arrested Development, Firefly, Freaks and Geeks, IT Crowd, etc.) and — thanks to online services like Netflix or Hulu — you might binge on the entire series over one weekend.

To be sure, the way we consume mass media is changing. Even the most-watched programs now, like the Academy Awards or major sporting events, have significantly fewer viewers than what they enjoyed a generation ago. We’re at least as likely to find new music we like on YouTube or Internet radio or even in TV commercials as we are on what is now known as “terrestrial” radio.

If there’s a single vehicle for shared experiences today, it might be YouTube. Consider this presentation that’s generated more than 2 million views between YouTube and TED.com: It’s called “The Art of Asking,” and the presenter is a woman named Amanda Palmer.

I encourage you to watch the whole thing, but I’ll give you some highlights. Around the 9-minute mark she talks about how she got nearly $1.2 million from her Kickstarter fundraising project, and how “crowd-funding” worked for her. She talks about how her record label considered her a failure when she sold only 25,000 recordings. But it turns out that the same number of fans and supporters, around 25,000, created a successful Kickstarter project, and ultimately helped her raise $1.2 million. Selling 25,000 recordings may make you a “failure,” but getting 25,000 people to support you can make you a big success.

Around the 9:30 mark, Palmer mentions how she didn’t make anyone pay for her music; she only asked them to. By asking her audience, she connected with them. And she says when you connect with people, people want to help you.

Palmer concludes by saying we need to change from “how do we make people pay for music?” to “how do we let people pay for music?”

I think this phenomenon has always been a part of our world as IT pros. Because what we do is complex, and no one person has all the answers, we rely on one another. Many people — readers, clients, friends, what have you — ask me for help. And I can assure you that I get help from countless people. Sure, we give each other a hard time. We joke and fool around and say just RTFM. But over the years I’ve developed a mental list of trusted advisors, people I know who know things. I ask, they help. They ask, I help.

Oftentimes help comes in the form of simply answering a question. In your work, when you search for an answer to a technical matter, you’re exercising faith that not only that someone has found the answer, but that they’ve taken the time to put the correct answer out there. Many of my posts are based on real-life experiences. In this blog I attempt to share questions that were answered and things that were discovered. But you don’t need a blog to help others find answers. You can always share what you know in the comments section here or in any other forums you frequent. Your thoughts, ideas and experiences may one day be the answer someone else is searching for.

When people really need assistance, don’t you want to help them?

People Always Make the Difference

Edit: Still one of my favorite places to visit.

Originally posted April 9, 2013 on AIXchange

I recently wrote about visiting customer locations. I didn’t mention it then, but one visit really stands out. I had no problem finding the place, and there was nothing awe-inspiring about the physical environment. What I’ll always remember is how I was treated when I arrived.

Upon entering I was immediately greeted by a security guard — literally, greeted. He’s one of the happiest people I’ve ever met. He welcomed me, asked my name and showed me to the receptionist so I could get signed in and connected with the folks who were waiting for me.

Interacting with this person throughout the day, I noticed something. I wasn’t special. He greeted everyone that came through the door like an old friend. If he didn’t know someone, he asked for a name, and he remembered it.

The other thing that struck me was the reaction to the security guard. While most of the visitors smiled and nodded, only a handful ever actually uttered any response. I asked about this, and he told me that this was typical — his friendliness generally wasn’t reciprocated.

I honestly felt bad hearing that. I wondered how he kept such a positive outlook in the face of constant indifference. After all, it’d be easy to conclude that his efforts simply weren’t worth it.

Then he said something I’ve heard a thousand times, but never fully appreciated. He told me he can’t control anyone’s attitude except his own. Despite the lack of response, he chooses on a daily basis to be happy at work and greet everyone by their name. Long story short, this guy’s choice really brightened my day — really, several days. It was a long-term project and I made several follow-up trips to that facility. The security guard always greeted me by name and with a smile.

Smile. Say hello. Remember names. It seems so simple, it seems so trivial. Yet these small gestures really do matter. You can have the world’s most luxurious facility, but people always make the difference. I visit a lot of customer locations, and I could write about some amazing, pristine work environments. But this experience means more to me. Given the choice, I’d much rather work with happy folks in an old building in the middle of nowhere.

It might not be a big deal, but ask yourself, right now, are you in a good mood? Are you smiling? Or are you having a bad day? Everyone has bad days of course, especially when confronted by external issues and problems beyond your control. Still, if you are having a bad day, could it be better if you just made the choice to be happier?

Open Source AIX Software Remains Plentiful

Edit: Some of these links no longer work.

Originally posted April 2, 2013 on AIXchange

Remember the UCLA freeware repository? This post is part of a discussion surrounding the repository going offline back in 2007. As Nigel wrote at the end of this thread:           

“There are still people active in this area. Take a look at www.perzl.org/aix. I got Apache, PHP, rrdtool and the wonderful Ganglia (with POWER5/6) enhancements from here. I would also recommend telling your local IBM representative that you think this needs to be fixed. Customer pressure is a good incentive for IBM to get organized, sort this out and eventually fix it.”

As this old post shows, perzl.org has been around for a while, though plenty of admins are unaware of it. For instance, just recently when a customer was interested in getting gnupg working on AIX and they were having trouble getting the package dependencies worked out, I referred them to this tip:

            “A solution to the RPM dependency… problem. I guess everybody who has installed a couple of RPM packages using rpm itself, without the help of a tool like yum, has run into the following issue:

            1) You have downloaded and want to install RPM aaa.rpm.
            2) aaa.rpm has dependency on bbb.rpm and ccc.rpm.
            3) bbb.rpm has dependency on ddd.rpm and ccc.rpm on eee.rpm and fff.rpm.
            4) etc. 

“So you end up circling through all your RPM files and downloading all prerequisite RPM files just to install aaa.rpm. This can become quite annoying and time-consuming for packages with lots of dependencies. This is actually where a tool like yum is helping you a lot because it does all the steps outlined above for you. Unfortunately, I have so far found no way of compiling and providing YUM for AIX that could be done in a compatible manner (to the IBM provided RPM) as AIX still uses the old V3.0.5 version of RPM while all RPM-based Linux distributions have switched to RPM V4.X a long time ago. Also all recent YUM versions require at least an RPM version >= 4.4.

“My solution approach to this problem:

• Basically what you want is a complete and self-contained list of dependencies for the RPM file aaa.rpm.
• You download all the RPM packages on this list (make sure that you have downloaded them all into a separate directory which was empty before).
• After downloading all the RPM packages on the list you can just install the RPM file aaa.rpm as easily as rpm -Uvh *.rpm
• This approach kind of mimics the AIX NIM behavior of a software bundle (the list here) and an lpp_source (the separate directory containing all required RPM files).”

Read more in the Perzl.org FAQ.

In the meantime, hopefully this procedure can help someone else with a similar situation. Just get whichever .deps file you’re interested in for the package you want to install:

1) If wget is not already installed on your system, download wget to /tmp/gnupg (or some other temporary location)

ftp://ftp.software.ibm.com/aix/freeSoftware/aixtoolbox/RPMS/ppc/wget/wget-1.9.1-1.aix5.1.ppc.rpm

Install the rpm with: rpm -ivh wget*rpm

2) Use wget to download the gnupg rpm dependency file to /tmp/gnupg.

Use this file for AIX 7.1:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix71/gnupg-1.4.13-1.aix5.1.ppc.deps

Use this file for AIX 6.1:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix61/gnupg-1.4.13-1.aix5.1.ppc.deps

Use this file for AIX 5.3:

wget http://www.oss4aix.org/download/rpmdb/deplists/aix53/gnupg-1.4.13-1.aix5.1.ppc.deps

3) From /tmp/gnupg, run: wget -B http://www.oss4aix.org/download/everything/RPMS/ -i gnupg-1.4.13-1.aix5.1.ppc.deps

This will download the dependencies needed to install gnupg.

4) Run rpm -Uvh *rpm. The dependencies are now installed. (On one test LPAR I got a warning about a conflict with /opt/freeware/man/man3/Thread.3. I got past it by running rpm -Uvh --force *rpm.)

5) Download gnupg:

wget http://www.oss4aix.org/download/RPMS/gnupg/gnupg-1.4.13-1.aix5.1.ppc.rpm

6) Install with rpm -ivh gnupg*rpm.

You should now have gnupg on your system.

            /opt/freeware/bin/gpg --version
            gpg (GnuPG) 1.4.13
            Copyright (C) 2012 Free Software Foundation, Inc.
            License GPLv3+: GNU GPL version 3 or later
            This is free software: you are free to change and redistribute it.
            There is NO WARRANTY, to the extent permitted by law.

            Home: ~/.gnupg
            Supported algorithms:
            Pubkey: RSA, RSA-E, RSA-S, ELG-E, DSA
            Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
                    CAMELLIA128, CAMELLIA192, CAMELLIA256
            Hash: MD5, SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
            Compression: Uncompressed, ZIP, ZLIB, BZIP2

Here’s a list of open source tools in addition to perzl.org. Perhaps you’ll recognize some. Even better, perhaps you’ll find something that’s new to you. Also check out IBM AIX Toolbox download info and Bull AIX freeware.

So which of these open source software repositories do you use and recommend?

The Value of Test Systems

Edit: I still love test labs.

Originally posted March 26, 2013 on AIXchange

Two weeks ago I asked readers to recommend some resources for IT pros who are new to AIX. The first comment was simply this:

“Can’t beat playing around on a test system!”

I couldn’t agree more. I write plenty about the value of training and how it’s worth your time to read IBM Redbooks, and these things are great. Still, nothing beats hands-on learning. I know that back when the size of a JFS filesystem couldn’t be reduced, I was very grateful that my indoctrination to growing a file system came on a test box rather than a production system. I was new, and I needed that practice playground. I think it’s unfortunate that so many customers who switch to AIX from another operating system neglect to add at least one test box to this new environment. At least one. And with multiple test machines, it becomes possible to do things like PowerHA, shared storage pools and live partition mobility testing. With the reasonable cost of current 710 and 720 models, I’m amazed that more customers don’t automatically add test machines to their hardware orders.

And speaking of training, if your boss doesn’t want you out of the office for a week attending an educational conference, tell him there’s an alternative: Just get me a test box. I think every IT pro understands that there’s a huge difference between reading about something and actually doing it. A test box is like a classroom that’s always open and available.

What do you need a test box for? What don’t you need it for? When a test box is available, programmer/administrator mistakes are learning opportunities rather than lost uptime. Test boxes are where we learn, where we validate, where we get comfortable with the technology. If you ask around, I think you’ll find that the people who excel at their jobs generally have spent considerable time on test systems. Certainly access to test hardware makes for more confident admins.

Now, if your employer absolutely won’t pay for one, there are other ways to access a test system. IBM has a virtual loaner program available to business partners. With this you can at least log on to the command line of a remote AIX system. Of course this isn’t the same as having a test box onsite, available whenever you want to play around with it.

While it’s frustrating thinking about customers that won’t provide test boxes, what’s even worse is hearing from IT pros who don’t use the precious access they have. I really do hear some complain, “I have the lab, but I have no time to use it.” Geez… make time! Skip an episode or two of “Mad Men” or “Big Bang Theory” or “Scooby Doo” or whatever it is that people watch these days. If it really matters to you, you’ll find the time to further yourself professionally.

So do you work with test machines? Does your employer provide them or did you break down and buy an old POWER5 box off of eBay? Please share your experiences in comments.

Helping Visitors is Also Part of the Job

Edit: Still worth considering. Some links no longer work.

Originally posted March 19, 2013 on AIXchange

I visit many customer locations, and each experience is unique. Of course some are more pleasant than others. From my previous jobs at IBM and elsewhere, I know what it’s like to work in the same building every day. But as a consultant, I also know what it’s like to show up at an unfamiliar facility.

While it’s not easy to see your workplace through the eyes of someone who’s never been there before, it’s important to understand the needs of anyone who might need to come into your facility on a one-time or temporary basis. Think of the basic information a newcomer would need. When I go on a customer visit, the first things I need are an address and (hopefully) a contact name and phone number. The fact that phones now have GPS capabilities has truly simplified my life. It wasn’t that long ago that I had to deal with paper maps and/or printed directions from MapQuest or some other website. If I missed an exit or got turned around in a strange city, it could be difficult to get back on track. Now with the automatic rerouting I scarcely put any thought into my trips.

Of course, nothing is guaranteed in this life, including cell phone service. On one customer visit I was without service due to a snowstorm. At least I’m guessing it was the storm. It could have been a coincidence. Whatever the cause, the entire cellular network went out. I lost the maps on my phone, and of course I didn’t have anything printed out. Fortunately, I was near my destination and familiar with the area in general, but in some other town that could have been disastrous.

It’s also important to understand the limitations of GPS. It generally gives you the shortest route, but the shortest route is not always the best route. In fact, if you don’t know where you are you can find considerable trouble just blindly following GPS directions.

So, about your workplace: Is it friendly and accessible to strangers?

Can visitors easily get into the parking lot, and once there, can they make use of close-by visitor parking spots? If your parking situation is unique, you need to let your visitors know. Some companies have guard gates and allow visitors only with advance notice. If this is the case, notify your company’s security team about your visitor’s arrival.

When your visitor arrives, is it clear where they should enter? At one site we were told to enter Door 1. It turns out there were several doors labeled Door 1. Visitors need to know exactly where they need to be.

More security considerations: Do visitors to your company need to sign in with reception? Do they need to show ID and get their picture taken? Do they need to have you come and physically escort them? Even simple things like restroom access can be an issue. I’ve been to many sites where there’s no visitor’s restroom. Needless to say, after a long drive, I generally have use for such facilities. If you don’t have a place where visitors can freshen up, let them know.

Are visitors’ backpacks or electronics inspected at your site? Is there a metal detector? Are phones with cameras allowed on the raised floor? Is there guest wireless access? Is there a place to eat? Is there even a place to sit?

While most companies do right by their visitors, I’ve heard of and experienced my share of horror stories.

What about you? Is your facility visitor-friendly? Or if, like me, your job frequently takes you to new sites, have you had any issues?

For the AIX Newbies

Edit: Some links no longer work. Updated the roadmap to a list of courses from Global Knowledge. Another link to try is this one: https://www.ibm.com/services/learning/us/

Originally posted March 12, 2013 on AIXchange

I assume most of the readers of this blog have years of experience with AIX. But it’s important to recognize that new users regularly come to this operating system. Often these pros previously worked on other versions of UNIX, or even another operating system altogether. If you’re new to AIX, you should be aware of the numerous available options for getting up to speed.

We’ll start with free sources of AIX information and education. The amount of high-quality, freely available information may surprise you. Of course I’ll start by mentioning this blog and recommending that you sign up for IBM Systems Magazine. (Even if you’re not new to the platform, why not get a free copy of the magazine?)

You can also view these quicksheets and quickstarts and these Nigel Griffiths videos. Then there’s the AIX Virtual User Groups. Be sure to check out both the upcoming sessions and the highly informative replays.

The other primary source of free AIX education is IBM Redbooks.

Everything I’ve listed to this point is freely available. Still, to get the most from your experience on AIX, you should also invest in training. Which classes should you attend? This depends largely on your background and experience, as well as, of course, your available time and monetary resources. IBM’s training website is a good starting point. There’s an AIX Users and System Administration roadmap to plan your training, which can take the form of instructor-led online classes, traditional classroom training or self-paced virtual courses. Here are some more specific roadmaps:

AIX Security, Network Administration, Problem Determination, Virtualization and Performance

PowerHA SystemMirror for AIX

AIX Systems Management, Clustering, Internals and Cloud Computing

Once you select the training you want to take, you can consider ways to save. IBM Education Packs can be used for IBM classroom, online or onsite training courses. They can also be used to attend IBM technical conferences. Finally, here’s a list of one-day IBM Power Systems training events that run through June.

Non-IBM vendors provide training as well, including Jack Morton and M/UX Data Systems.

Think about someone who’s new to AIX. What other sources of information would you recommend to someone just getting started with the operating system? Please leave your suggestions in comments.

Readers Respond

Edit: The comments have been lost over the years. Some links no longer work.

Originally posted March 5, 2013 on AIXchange

My recent post about command line shortcuts generated some very good responses. For instance:

            “The part about looping on a set of values reminded me of seq. I missed it from my Linux days, and so had written an imitation in perl before realizing that the AIX Toolbox for Linux Applications page has it packaged in the coreutils RPM.”

Another reader pointed out the apply command:  

“I’m a fan of the least known command, apply. So for your first command:


            apply "lscfg -vl fcs%1" 0 1 2 4 | grep Net
 

Be sure to read the other comments from that post for more great tips.

Incidentally, if you want to learn more about the apply command, look at the man pages. Run “man apply” on your AIX machine and you’ll see:


            “The apply command runs a command string specified by the CommandString parameter on each specified value of the Parameter parameter in turn. Normally, Parameter values are chosen individually; the optional -Number flag specifies the number of Parameter values to be passed to the specified command string. If the value of the Number variable is 0, the command string is run without parameters once for each Parameter value.


Notes:
Because pattern-matching characters in CommandString may have undesirable effects, it is recommended that complicated commands be enclosed in single quotation marks (‘ ‘). You cannot pass a literal % (percent sign) followed immediately by any number without using the -a flag.”
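If you’d rather not rely on apply, the same fcs example can be written as a plain ksh loop:

            for N in 0 1 2 4
            do
                lscfg -vl fcs$N | grep Net
            done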

It seems wherever I go, I learn something new. I certainly learn from commenters on this blog, but throughout my career I’ve been fortunate enough to interact with others who’ve taught me simple tricks that have made my job easier. It’s truly one of my favorite things about my career choice.

For instance, there was the customer who informed me that you can move the Windows toolbar from the bottom of the screen to the right side. It takes some getting used to, but if you have a ton of applications open, this option really seems to make better use of your desktop space.

Another customer introduced me to a tool called launchy that I’ve come to love.

Long ago I learned that by running the following from your VIO client…


            #lspath -F "name path_id parent connection status"
            hdisk0 0 vscsi0 810000000000 Enabled
 

… you can map that output to your VIO server when you run lsmap -all.
 

Check it out. The LUN information in your lsmap output…


            LUN           0x8100000000000000
 

… directly corresponds to the lspath information above. This is another way to map disks from the VIO client to the VIO server.
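Put another way, the client-side connection value is the VIOS LUN with the trailing zeros trimmed off. Here’s a rough sketch that prints the expected lsmap LUN for each client path (it assumes the paths are all vscsi paths on a VIO client):

            lspath -F "name parent connection" | while read DISK ADAPTER CONN
            do
                echo "$DISK on $ADAPTER -> lsmap LUN 0x${CONN}0000"
            done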

As others have taught me, I’ve tried to return the favor by writing about some lesser-known tools that I’ve relied on over the years. Two of my favorites have always been VNC and screen, but of course the list has grown through time. Back in 2009 I pointed out some other useful tools. So let’s revisit that discussion. Is there some undeveloped (or unknown) capability that you’d like to see? And which desktop tools do you use now on a daily basis that you couldn’t live without?

When Words Don’t Get the Job Done

Edit: Still true, although google translate and duolingo can help these days.

Originally posted February 26, 2013 on AIXchange

As a youngster I worked on AS/400 systems. One day I needed to go from our U.S. corporate headquarters to our manufacturing facility in Tijuana, Mexico, to help install some dumb terminals and printers. I’d fly from Phoenix to San Diego and then walk across the border, where I was picked up by the manufacturing guys. Border crossings were more manageable in those days, since you could get over and back with only a driver’s license.

Being raised in the southwestern U.S., I’ve been around Spanish-speaking people my whole life. Despite this — and the Spanish classes I took in high school — I never really picked up the language. So when I go south of the border, I have to hope I run into English-speakers.

On this particular trip I remember trying to communicate with the crew I was working with. Only the office manager spoke English; the others at the plant did not. Somehow we got everything to work, but the language barrier made for a long and occasionally exasperating day.

I still run into some of the same thing with international teams. When I worked at IBM I remember a project where the developers were in Germany writing code, while the servers and the administrators (including me) were in the U.S. Although their English was good (certainly much better than my non-existent German), we still had to overcome time zone differences and other little misunderstandings along the way.

Interacting with others around the world, I can tell you that language barriers can be quite frustrating. Over the years I’ve had numerous discussions that bordered on games of charades (or perhaps, Pictionary) — different groups of people diagramming, pantomiming or simply guessing at what the other side was saying. Honestly, during those moments it’s tempting to view those who don’t speak your language as less intelligent. I suppose others could have looked at me that way, too — particularly since so many people in other countries at least have a grasp of English, whereas I don’t speak any foreign language. Sometimes in these situations the written word is more easily understood than the spoken word. After all, there are no accents in email or instant messaging communications.

Despite some difficulties here and there, I have mostly fond memories of these interactions. For instance the engagement with the Germans ended well. The German team came over for the go-live, and I got to play tour guide. During some down days we went sightseeing around Colorado. I still remember driving the car out of the Rocky Mountain National Park while a baseball game was in progress on the radio. It was fun to try to explain a game they had never seen based on solely on the announcer’s descriptions of the action.

I’ve been on the other side of this, too. A few years ago I was in South Africa visiting family when the South Africans were taking on Australia in cricket. This was a big deal there. TV broadcasts trumpeted the big “five-day test.” What I remember is that after the five days, the thing somehow ended in a tie. The locals tried to explain it to me, but I never did quite get what the fuss was all about. Of course, back home, the NFL postseason was going on, and I could never get the South Africans to understand either the game of American football in general or why I was so interested in those playoff scores. It’s a cliche, but the world does seem to be getting smaller. All in all, that’s a wonderful thing.

Do you interact with friends, family or coworkers from other countries? What methods do you use to help make sense of one another?

The Case for Patching

Edit: Still important to consider.

Originally posted February 19, 2013 on AIXchange

Do you update your systems? Do you patch your machines monthly? Quarterly? Annually? Do you ever patch?

Are change windows built into your environment (e.g., there’s scheduled system maintenance, say, the third Sunday of each month)? Is it too difficult to get the various applications owners to agree to a set downtime because you have so many different LPARs running on your physical frame? Is downtime simply not allowed in your environment?

Over the years I’ve met a number of people who live by the “if it ain’t broke, don’t fix it” adage. What’s funny is oftentimes the older a system gets, the more reluctant customers are to maintain it. Logically these systems have a greater need for attention than something just out of the box. Of course we’ve all used, seen or at least heard about systems that just kept running. Recently I saw F40s that are still in production, still running AIX 4.3 and still chugging along. And sure, they can keep going for a long time to come. We are fortunate enough to work with incredibly powerful and well-built hardware.

But just think about an older system — not only the hardware that’s running old microcode, but the HMC that’s running old code, the operating system that hasn’t been patched and the application that hasn’t been updated. Even if the machine isn’t visible to the Internet, there’s still great potential for things to go wrong. And if something does go wrong, how would you respond?

Customers in this situation know they’re on their own, and they’re OK with it. Typically I’m told that the application vendor is no longer in business, so they can’t get support for that code anyway. If their hardware dies, they hope they can find someone who can help them — someone who’s familiar with the limitations of older OS versions. They hope they can still get parts for their old hardware. (Along those lines, I know of folks who buy up duplicate servers just so they can have parts available to swap out. I just hope that these customers realize that tearing out part of an old machine and successfully putting it into another old machine is a unique skill.)

So I’ve heard it all, but I’ll never truly understand people who would take these chances. Why rely on hope? There are alternatives — alternatives that don’t involve buying all new systems.

For instance, if you’re running AIX 5.2 or 5.3, you can move onto newer POWER7 hardware by utilizing versioned WPARs. This allows you to keep running your older code on newer, supported versions of the operating system, which in turn provides you with some limited support options.

Many of us who’ve called IBM Support learned that our issue was a known problem that was addressed with an operating system fixpack or firmware update. That’s the advantage of paying for regular maintenance. Updates to your machines and operating systems take care of the known issues.

Of course some will then argue that making these types of changes could introduce new bugs or issues that would have been avoided by not fixing what wasn’t broken. My response to this argument is that test and QA systems are really important. Implement your changes on these boxes first; then move them into production.

Some methods to consider for hardware maintenance include Live Partition Mobility (LPM) or PowerHA. With LPM you can evacuate running LPARs onto other hardware with no downtime, conduct maintenance on your source hardware and then move the LPARs back to the original hardware. Using PowerHA you can move your resource group to a standby node, conduct maintenance on your original node and then move your resource group back. In this case a short outage for the application to restart each time the resource group moves is required, but PowerHA is much faster than some alternatives.

(Note: Whether or not you’re doing maintenance, periodically moving your resource groups around in a PowerHA cluster is a good idea. By doing this you can make sure that the failover actually works, and catch any changes made on node A that weren’t replicated on node B.)

For OS upgrades you might use alt disk copy or multibos to update your rootvg volume group by making a copy of it and updating that copy. You can boot from that copy after the update, and if anything goes wrong, you can quickly change your boot list and return to the original boot disk. This would simplify your backout process if you needed to go back for any reason.
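With alt_disk_copy, for example, the whole sequence can be as short as the sketch below. I’m assuming hdisk1 is a free disk and that your fixes have been downloaded to /updates:

            # clone rootvg to hdisk1 and apply the update_all bundle in one pass
            alt_disk_copy -d hdisk1 -b update_all -l /updates

            # if the updated copy misbehaves after reboot, point the bootlist back
            bootlist -m normal hdisk0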

So where do you stand on patching? Let me know in the comments.

Sometimes Even Consultants Need a Consultation

Edit: Some links no longer work.

Originally posted February 13, 2013 on AIXchange

Recently I was brought into a large migration project that was already underway. An outside team had done the design, and the goal of these folks was to create a system that emphasized simplicity. To make it easy to manage, they decided that the system wouldn’t have virtualization or allow the sharing of resources. Each LPAR would have dedicated adapters and dedicated CPUs.

It’s been some time since I’ve seen large systems designed and set up like this. I will admit that, with non-virtualized systems, determining which card belongs to which LPAR is a snap.

Of course there were still challenges. As the system was being set up, the decision was made to install IBM Systems Director on one of the NIM LPARs. This immediately raised a red flag in my mind, because I recall a Nigel Griffiths presentation where he said — and I paraphrase — oh no, never, ever, ever, ever install a NIM server and Systems Director together on the same LPAR running AIX. Really he probably just said this isn’t a good idea.

So I contacted Nigel, and my questions and his responses became the subject of this blog post:

“I have cut down the questions a bit but it is two parts: His customer is thinking about putting Systems Director on to their NIM server. Rob remembered me commenting on this but wants the details. They are planning to give it one dedicated POWER7 core and 12G memory. What do I think about that?

“Two years ago this combination (NIM & ISD) was not allowed (I think it was just not tested so not supported rather than it being a problem), but now is OK. So you can find older [web] pages with duff info.

“However, I do NOT recommend it. To get NIM to push out the very latest AIX version, the NIM server needs to be at that AIX level. But Systems Director may not be supported at that very new AIX level. Then you can’t get Systems Director support. This is a ‘will probably work but not supported’ risk that you have to decide [whether to take].

“Running a single dedicated CPU… will make Systems Director look and feel slow. With a dozen users the CPU use will go up and be a lot more peaky. With NIM it would not matter but Systems Director GUI would suffer and so would the user. Personally a dedicated CPU for NIM is pretty dumb — that CPU could be used elsewhere most of the time.”

I agree. A dedicated CPU is dumb when you could use dedicated donating or shared processor pools. However, in this case, I wasn’t involved in the design of the server. I was only asked by the customer if I could make it work.

My gut feeling was that mixing these workloads was a bad idea, and mixing them on only one dedicated core made it even worse. I certainly understand the customer not blindly taking me at my word, so that’s why I brought in the big guns — i.e., Nigel. Between his presentations, his videos and his all-around Power Systems knowledge, I knew he was the right person to ask. And I’m grateful for his fast response.

No matter how experienced and accomplished you are, it never hurts to have someone you can go to who can give you an answer or validate your own course of action. Who do you have in your corner when you get stuck?

IBM Expands the POWER7+ Server Family

Edit: The links still work as of the time of this writing.

Originally posted February 5, 2013 on AIXchange

After unveiling the first POWER7+ machines in October, IBM is now adding more servers to the POWER7+ family: a new model, the 760, along with refreshed 710, 720, 730, 740 and 750 machines. The new lineup also features POWER7+ chips going into the PowerLinux 7R1, 7R2 and new PureFlex nodes based on the POWER7+ processor.

The refreshed 710, 720, 730, 740, 7R1 and 7R2 machines are set for Feb. 20 general availability. The refreshed 750 and the new 760 will GA on March 15.

The information that follows is gleaned from my participation in various IBM-hosted pre-announcement training sessions. Prior to IBM announcements, business partners are invited to attend sessions that cover the details of the announcement. IBM Power Champions also receive access to additional pre-announcement sessions. Sessions conducted over the past few weeks have covered the different operating systems that run on Power Systems servers, as well as all of the new hardware that is being announced today. Because this information is embargoed, those of us who take part in training sessions agree (by signing nondisclosure agreements) to not discuss what we learn prior to the announcement date.

One point of emphasis with today’s announcement is that the new technology should deliver performance improvements across the product family. (Of course the amount of improvement varies, based on the model chosen and the workload running on it.) IBM also highlighted the pricing changes on the 710 and 730 models, which are supposed to be comparable to the pricing we might expect to see on 2U x86 servers.

As these announcements continue to roll out, it’s worth noting that IBM consistently sticks to its schedule when introducing new products. Not every technology vendor is so reliable. Also keep in mind the big picture. We had POWER5, then POWER6 arrived a few years later. Now we have POWER7. It doesn’t take a genius to figure out that the next versions of POWER processors are being developed as we speak, and that future generations are already in the planning stages. IBM continues to demonstrate its commitment to the platform.

The same holds true for the operating systems. Both IBM i and AIX versions have been updated every few years, with additional functionality delivered via service packs and technology levels. I don’t see this development effort slowing any in the coming years.

POWER7+: The Specs 

As was announced last fall, POWER7+ has 10 MB of L3 cache per core. Its memory compression engines allow for less overhead with active memory expansion, and the chip contains onboard encryption engines and random number generators.

An exciting feature of the POWER7+ machines is their capability to double the maximum number of LPARs on a frame by allowing you to allocate 0.05 of a core to an LPAR. In the “old days,” with a hypothetical 1-core machine, you were limited to 10 LPARs, as each LPAR could only be assigned a minimum of 0.10 of a CPU. Now you can take your 1-core machine to 20 LPARs, with each assigned a minimum of 0.05 of a CPU. This effectively doubles the number of LPARs you can have on POWER7+ machines versus POWER7 machines. Obviously, the higher limits on memory per frame mean you can do more serious workload consolidation.

The specs for these servers are pretty impressive:

  • The 710 is a 2U server with 4-, 6- or 8-core options and up to 256 GB of memory. It has five PCIe Gen2 slots and can support 160 LPARs.
  • The 720 is a 4U server with 4-, 6- or 8-core options and up to 512 GB of memory. It has five PCIe Gen2 regular height slots and four PCIe half height cards, and can support up to 160 LPARs.
  • The 730 is a 2U server that supports 4, 6 or 8 cores per socket, for 16 cores total. It can have up to 512 GB of memory, five PCIe Gen2 slots, and can support up to 320 LPARs.
  • The 740 is a 4U 2-socket server with 6 or 8 cores per socket, for up to 16 cores total. It can have up to 1 TB of memory, five regular height PCIe Gen2 slots and four half height PCIe Gen2 slots. This machine can support up to 320 LPARs.
  • The 750 is a 5U server with four sockets. With all four sockets populated, it’s a 32-core machine with speeds of 3.5 GHz or 4 GHz. It has up to 1 TB of memory, six PCIe Gen2 slots and two GX++ slots, and an integrated split backplane. This machine can support up to 640 LPARs. It can be managed by either IVM or an HMC. It comes with 3 years of 24×7 maintenance coverage.
  • The 760 is also a 5U server with four sockets. If fully populated with six cores per socket, it can have up to 48 cores running at 3.1 GHz or 3.4 GHz. It has up to 2 TB of memory, six PCIe Gen2 slots and two GX++ slots, and an integrated split backplane. This machine can support up to 960 LPARs. It must be managed by an HMC. It allows for Capacity on Demand for processors. This machine comes with three years of 24×7 maintenance coverage. Unlike the other models being announced, IBM must install this machine. The others of course can be set up by customers.

In the education I attended, IBM said the new 750 and 760 servers offer enterprise system features at express system pricing.

As a reminder, the 770 and 780 can have 4 TB and the 795 can have 16 TB of memory. In the training sessions it was often mentioned that even greater amounts of memory are available to system LPARs through the use of active memory expansion.

More Announcement News

Along with the new hardware, there are new versions of VIOS and AIX software. In addition, IBM has released a statement of direction that points to future support of AIX 5.3 with a service pack on the new servers. Service packs AIX 6.1 TL7 SP7 and AIX 6.1 TL8 SP2 will support these new servers.

Another statement of direction notes future availability of a service pack for AIX 7.1 TL0 and TL1, with a SP2 available for AIX 7.1 TL2. VIOS will require 2.2.2 to run on the new hardware.

IBM is also announcing a 2-port 16 Gb Fibre Channel adapter and a 4-port adapter with two ports of 10 Gb FCoE and two ports of 1 GbE. There’s also an enhanced integrated multifunction card where the RJ45 ports are capable of running at 10 Gb. 387 GB SSD 6- and 4-pack options will be available with new server orders.

Finally, there are announcements around IBM AIX Solution Edition for Cognos on Power and IBM AIX Solution Edition for SPSS on Power.

With this announcement, the entire range of the product family (save for the 795, a POWER7 model) is ready to run POWER7+ chips. Which server are you most excited to run in your environment? What features are you looking forward to seeing in action?

IBM hosted an announcement webcast this morning. You can also view Nigel Griffiths’ announcement video and Jay Kruemcke’s announcement blog.

Another Source of AIX Info

Edit: Only the last link still works now that developerworks went away.

Originally posted January 29, 2013 on AIXchange

To keep up on IBM Power Systems, I rely on various resources. I’m a long-time reader of Nigel Griffiths’ AIXpert, Anthony English’s AIXDownUnder and Chris Gibson’s AIX blog, and I follow a number of folks on Twitter. @IBMRedbooks, @mr_nmon, @chromeaix, @POWERHAguy, @cgibbo, @aixdownunder are just a few “twitterers” who provide good insights and links. Twitter is also where I discovered Brian Smith. Brian is another IBM developerWorks blogger who produces tons of valuable information. For instance, check out his post on using uuencode to embed binaries in scripts and copy/paste transfer files between servers:

            “If you have a relatively small binary file that you need to transfer between servers, you can easily transfer it by copying and pasting it using uuencode/uudecode. This can be a time saver in some circumstances. It also might be helpful if you have a server that isn’t connected to the network but for which you can get a console on through something like the HMC.

            “In this example, we will copy and paste the /usr/bin/ls binary between servers.

            On the source server, type:

                uuencode /usr/bin/ls /dev/stdout

            Then copy all of the output in to the clipboard.

            On the destination server, type:

                uudecode -o /tmp/ls

            “Then press enter, and then paste in the uuencode output from the source server. The copy/pasted ls binary will be saved to /tmp/ls. You can verify the source and destination ls files are identical by comparing the checksum of the files with the csum command.”
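The checksum comparison he mentions might look like this (a sketch using AIX’s csum with an explicit MD5 algorithm):

            csum -h MD5 /usr/bin/ls     # on the source server
            csum -h MD5 /tmp/ls         # on the destination; the sums should match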

Brian also writes about scripting importvg on AIX:

            “There are some situations where you need to exportvg volume groups, and then reimport them. This often occurs when doing disk migrations between servers. The usual routine is to record which PVIDs go with which volume groups, and when you need to import the volume groups again run an importvg and specify the correct volume group name with the hdisk that has the matching PVID. You generally can’t rely on the hdisk name/number because it might be numbered differently.

            “To make this easier, I wrote a small script that automates this process. …”
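Brian’s script isn’t reproduced here, but the idea can be sketched in a few lines of ksh: capture the PVID-to-VG map before the export, then import each volume group by finding whichever disk now carries its PVID. This is only a sketch of the approach, not his script:

            # before exporting: save PVID and VG name for every disk in a VG
            lspv | awk '$3 != "None" {print $2, $3}' > /tmp/pvid_vg.map

            # after the disks appear on the target: import by PVID
            # (multi-disk VGs only need one importvg; extra attempts will just fail)
            while read PVID VG
            do
                HDISK=$(lspv | awk -v p=$PVID '$2 == p {print $1}')
                [ -n "$HDISK" ] && importvg -y $VG $HDISK
            done < /tmp/pvid_vg.map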

These are great tips and tricks. I could easily highlight a dozen of Brian’s posts, but I’ll limit myself to three more:

Version 1.0 RC1 of prdiff released

            “[prdiff] is the tool that will compare your LPAR running configuration with the profile and report differences. For more info and to download, go here.

            “This can come in handy if you are not certain that your running profile has been modified with DLPAR operations without also modifying the profile definition on the HMC.”

Reset padmin VIO password from the HMC with zero downtime

            “Here is a method you can use to reset a lost VIO padmin password from the HMC with zero downtime on the VIO server. This is a somewhat involved process, but much easier than having to take a downtime on the VIO server to change the password. This is a very challenging task because the viosvrcmd HMC command doesn’t allow the command run on the VIO server to have a pipe ("|"), or any redirection ("<", ">") and doesn’t allow for interactive input. So this rules out using something like "chpasswd" to change the password.”

New version of EZH (Easy HMC Command Line Interface) — Including interactive menu

            “For those of you not familiar with EZH, it is a script for the HMC that provides a very simple and easy to use interface to the HMC command line so that you can very quickly, efficiently, and easily complete day to day administration tasks from the command line. It is very easy to install and doesn’t require any modifications to the HMC (the script runs within the restricted HMC shell).

            “I released a new version today with many improvements, including support to easily DLPAR CPU, Virtual CPU, Memory, and VSCSI/VFC slots.

            “It also includes a new interactive menu that you can access by using the ezh command…. More information is available at http://ezh.sourceforge.net/”

I’m always looking to add to my reading rotation, so if you have an AIX resource that I’ve overlooked, please let me know in comments.

Blockbuster Performance: Then and Now

Edit: Some links no longer work.

Originally posted January 22, 2013 on AIXchange

Jay Kruemcke recently posted this image on Twitter, and I love it.

The quotes are terrific: “In a world where so many things can go wrong, one machine made a difference” and “not one second of downtime in this thriller” are my favorites.

Given the references to both the IBM RS/6000 and the 1992 Academy Award winner for best picture, this was likely created some time ago. (On the other hand, did they have Photoshop back then?) In any event, I’d love to learn any specifics about its origins.

In some ways though, not that much has changed. Like its RS/6000 forerunner, POWER7 systems are renowned for their reliability. And how often is unplanned downtime an issue in your AIX environment? 

I don’t believe that the image Jay posted was ever part of an ad campaign, but IBM has marketed its systems and software in a number of distinct ways over the years. For instance, IBM once used its Fiesta Bowl sponsorship to push OS/2. Much more recently, IBM’s latest and greatest innovations are all over YouTube (here, here and here). And of course Watson’s appearance on “Jeopardy!” was a tremendous vehicle for raising awareness of and visibility for our modern platform.

As Watson moves from game shows to healthcare and the financial industry to dealing with natural language in computing, we will continue to see IBM market its solutions. 

But if it was your call, how would you promote IBM Power Systems and AIX and educate the public about their capabilities? It may seem like a superfluous question, but then again, some of the best marketing is word of mouth. Do you tell others about how your machine makes your job easier? Have you ever made your own ad? 

Many of us have questioned IBM marketing over time, but effectively getting the word out is a challenge, and now more than ever. So how would you tell the world that our hardware is better, and that our operating system is unparalleled? How would you explain virtualization, or the fact that 0.05 of a core can now be allocated to an LPAR in a customer’s environment? How can you quickly and easily communicate this information in a way that people, customers and the general public alike, understand? And where would you broadcast your message? TV ads? YouTube videos? Podcasts? Twitter?

Redbooks are Must-Reads

Edit: Still some gems on this list.

Originally posted January 15, 2013 on AIXchange

On this blog I often reference and recommend IBM Redbooks. Technology is constantly shifting and evolving, and with education budgets shrinking in many organizations, Redbooks can help you keep your skills up to date; or, if you’re new to the Power platform, they’re a great starting point.

I find that when I read a Redbook and then go try out the concepts on a test box, I’ll end up re-reading that Redbook. I just feel it’s the best way to learn. You can read every Redbook IBM puts out, but touching a keyboard and trying it and breaking things and seeing what typos might have slipped into the publication is how you transform reading material into practical knowledge.

I know some people don’t like IBM Redbooks. I’ve been told that they’re good sleeping aids. Others say they just don’t have the time. But I’ve read (and reread) Redbooks for years, and I know many others who read and learn from them as well.

Lists of AIX-themed Redbooks have been making the rounds on mailing lists and Twitter. I’ve read a lot of these publications and look forward to reading them all.

So here’s my list. It’s lengthy, but it’d be even longer had I included storage-related Redbooks. (Although storage pros should definitely check out those publications.)

If you’re looking for a quicker read, check out the IBM Redbooks Point of View publications. These are “brief, strategy-oriented documents that represent an author’s perspective on a particular technical topic. Written by senior IBM subject matter experts, the publications examine current industry trends, directions, and emerging technologies.”

Have I missed anything? If you’ve read any Redbooks that aren’t on this list, please add them in Comments.

A New Year, an Annual Highlight

Edit: Some links no longer work.

Originally posted January 8, 2013 on AIXchange

It’s a new year and I can tell you one thing I’m already looking forward to: The next IBM Power Systems Technical University conference.

I’ve written about this annual event often over the years. It just always seems to energize me. I get to go and be surrounded by people who know what Power Systems are and can relate to what I do in my day to day job. They understand VIO servers. They understand AIX and IBM i. It’s a refreshing change from social situations where people ask me what I do, and I tell them — and they have no idea what I’m talking about. Often I end up just saying “I’m in IT” or “I work with computers.” But at the Technical University, everyone gets it. We all understand how we make our livings.

The Technical University typically convenes in late October. At the 2012 event, three new Power Champions — Ron Gordon, Terry Keene and Andy Lin — were announced at the keynote presentation. They were introduced, and then the previous Power Champions (including me) got to stand and be recognized (see here; there are also photos of the conference in general here).

I admit, it is nice to be singled out this way. After being recognized, I had people come up and talk to me for the rest of the week. I enjoy hearing from other attendees. First of all, it’s just nice to put a face with a name. Oftentimes I’ll meet people whose work I’ve read on blogs or seen and heard in presentations and seminars that I’ve found valuable. Or maybe we’ve simply exchanged emails over time. I believe meeting people in “real life” enhances the relationship, making each person more invested in helping the other.

In addition, those I encounter at Technical University and other conferences often give me ideas for articles and blog posts. The general support and encouragement I’ve received face to face from readers is also greatly appreciated.

What makes Technical University stand out though is the opportunity to meet and interact with experts — including many key IBM technologists — in the Power Systems ecosystem. There simply isn’t a better place to ask a question and get an immediate, informed answer.

I mention Technical University now because if you would like to attend this year’s event — it’s Oct. 21-25 in Orlando — now is the time to plan for it. If you’ve never gone before, no sweat: Each year they ask for a show of hands of those who are there for the first time, and at each conference more than half the attendees in the room have their hands raised. The conference is growing, because it’s a great event — and also because the platform is growing. New customers are migrating to it.

In many companies, 2013 budgets will be set over the next few weeks. I encourage you to make your case to attend IBM Technical University. Attendees come from around the U.S. as well as from overseas. And, as I said, first-timers come in droves every year. You may even be able to qualify for a free voucher from IBM to attend the conference — check with your IBM rep or business partner.

Technical University is one event I look forward to every year. I hope you can experience it for yourself. I believe you’ll find it as valuable as I do.

Two Ways to Measure Network Performance

Edit: Some links no longer work, although the methods should still work, assuming ftp ports are open.

Originally posted December 18, 2012 on AIXchange

Note: The next update for this blog will be Jan. 8.

I was forwarded a newsletter that contained a piece on measuring network bandwidth. I’m sharing it here with the permission of the author, IBMer Doug Herman. Doug says he compiled his information from a Steve Knudson presentation at a recent IBM Champions event.

Although I’ve used both of these methods, I hadn’t previously covered them in the blog.

From the newsletter:

            Recently I had a situation where we were being told that network performance was unacceptable from one site to another across a high speed WAN link. After using the ftp method described below, we were able to show that the network speeds were not working as expected across the WAN. It turned out that there was a routing issue with our VLAN; the network admins had it going over a much slower link than the one everyone thought we were using. Once they made the change, it worked as expected.

Two Methods for Measuring Network Bandwidth

First Method – FTP

This test is from AIX 5L Practical Performance Tools & Tuning Guide.

ftp> bin

200 Type set to I.

ftp> put "|dd if=/dev/zero bs=8k count=1000000" /dev/null

200 PORT command successful.

150 Opening data connection for /dev/null

1000000+0 records in.

1000000+0 records out.

226 Transfer complete.

8192000000 bytes sent in 70.43 seconds (1.136e+05 Kbytes/s)

Local: |dd if=/dev/zero bs=8k count=1000000 remote: /dev/null
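
As a sanity check on that sample output: 8,192,000,000 bytes in 70.43 seconds works out to roughly 113,600 KB/s, or a bit over 0.9 Gbit/s, which is about what you’d expect from a single healthy gigabit link.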

—————————————————————————————-

Second Method – iperf (download)

Server LPAR – “systemX”

> rpm -ivh iperf-2.0.5-1.aix5.1.ppc.rpm

> iperf -s

————————————————————

Server listening on TCP port 5001

TCP window size: 256 KByte (default)

————————————————————

Client LPAR – “systemY”

> rpm -ivh iperf-2.0.5-1.aix5.1.ppc.rpm

> iperf -c systemX

————————————————————

Client connecting to systemX, TCP port 5001

TCP window size: 64.2 KByte (default)

————————————————————

[ 3] local 10.1.1.100 port 55707 connected with 10.1.1.222 port 5001

[ ID] Interval Transfer Bandwidth

[ 3] 0.0-10.0 sec 384 KBytes 314 Kbits/sec

Client LPAR – using three parallel threads

> iperf -c systemX -P3

Client connecting to systemX, TCP port 5001

TCP window size: 128 KByte (default)

————————————————————

[ 5] local 10.1.1.222 port 37477 connected with 10.1.1.100 port 5001

[ 3] local 10.1.1.222 port 37475 connected with 10.1.1.100 port 5001

[ 4] local 10.1.1.222 port 37476 connected with 10.1.1.100 port 5001

[ ID] Interval Transfer Bandwidth

[ 4] 0.0-10.0 sec 1.20 MBytes 1.00 Mbits/sec

[ 3] 0.0-10.1 sec 2.25 MBytes 1.86 Mbits/sec

[ 5] 0.0-14.9 sec 256 KBytes 141 Kbits/sec

[SUM] 0.0-14.9 sec 3.70 MBytes 2.08 Mbits/sec

(Note: This was originally published in the Power Systems newsletter. It’s produced quarterly and is available to non-IBMers. This Nigel Griffiths post provides some details and tells you how to subscribe.)

Have you used either method, or do you have another way of measuring network bandwidth? Please share your thoughts in Comments.

Configuring Cluster Notifications

Edit: Link no longer works.

Originally posted December 10, 2012 on AIXchange

A customer running a PowerHA 7.1.1.2 cluster wanted to be notified when nodes were down and when the resource group moved in their cluster.

Check here and you’ll find details about configuring a custom remote notification method:

“These topics describe how to configure custom remote notification methods to respond to an event, how cluster verification confirms the remote notification configuration, and how node failure affects the remote notification method.

“You can configure a remote notification method through SMIT to issue a customized numeric or alphanumeric page in response to a specified cluster event. You can also send SMS text message notifications to any address, including a cell phone SMS address or mail to an email address. The pager message is sent through the attached dialer modem. Cell phone text messages are sent through email using the TCP/IP connection or an attached GSM wireless modem.

“You can send the following custom remote notifications:

  • Numeric and alphanumeric page
  • SMS text message to any address including a cell phone or mail to an email address
  • SMS text message using a GSM modem to transmit the notification through a wireless connection

“The PowerHA SystemMirror remote notification functionality requirements follow:

  • A tty port used for paging cannot also be used for heartbeat traffic
  • Any tty port specified must be defined to AIX and must be available
  • Each node that might send a page or text messages must have an appropriate modem installed and enabled

“Note: PowerHA SystemMirror checks the availability of the tty port when the notification method is configured and before a page is issued. Modem status is not checked.

“To send an SMS text message over the dialer modem, your pager provider must offer this service.

  • Each node that might send email messages from the SMIT panel using the AIX operating system mail must have a TCP/IP connection to the Internet
  • Each node that might send text messages to a cell phone must have an appropriate Hayes-compatible dialer modem installed and enabled

  • Each node that might transmit an SMS message wirelessly must have a Falcom-compatible GSM modem installed in the RS232 port with the password disabled. Ensure that the modem connects to the cell phone system.”

My customer just wanted to receive email notifications, but I still had to make sure that I had a tty device defined on the node. I used smitty (smitty sysmirror) to access the PowerHA menus. From there, I selected: Custom Cluster Configuration > Events > Cluster Events > Remote Notification Methods.
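
Laid out as a path, the navigation looks roughly like this (the exact menu wording can vary between PowerHA levels):

            # smitty sysmirror
                Custom Cluster Configuration
                    -> Events
                        -> Cluster Events
                            -> Remote Notification Methods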

I had to select the Configure a Node/Port Pair option. Defining a port would make sense if I was connected to a modem, but it was a needless endeavor in this case, since, as noted, my customer was only interested in enabling email notifications. Hopefully node/port pair configuration will be optional in future PowerHA SystemMirror releases.

In any event, in this screen I chose the node and port. Then I selected the Add a Custom Remote Notification Method option. In these fields I entered a name and the nodenames in the cluster. In the Number to Dial field I entered the email address that would receive the notifications. Then I chose the cluster events for which notifications were desired: rg_move, node_up and node_down. PowerHA users can choose numerous other events, however.

Once it was set up, I verified and synchronized the cluster. Then I ran the Send a Test Remote Notification menu option to make sure it worked. It did.

While I could have done pre- and post-event commands along with some scripting, I felt that using the remote notification method was the better way to go.

The final test came once the cluster was running. We moved the resource group from one node to the other. The notification worked as expected and we got the email notification we wanted.

Have you set up something like this in your HA environment?

Command Line Shortcuts

Edit: An oldie but a goodie.

Originally posted December 4, 2012 on AIXchange

What are your favorite scripting command line shortcuts? When I have a relatively small pile of repetitive things to do, I like to create a for loop, such as:

#for i in 0 1 2 3

>do

>lscfg -vl fcs$i | grep Net

>done

In this case I can easily get my WWPNs from my fibre cards.
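
The grep picks up the “Network Address” line from each adapter’s vital product data, so the output looks something like this (these WWPNs are invented for illustration):

Network Address.............C0507603A2920084
Network Address.............C0507603A2920086
Network Address.............C0507603A2920088
Network Address.............C0507603A292008A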

If you’ve already run set -o vi and you recall your command history with esc-k, you might end up with something like this on your command line, ready for you to rerun:

#for i in 0 1 2 3^Jdo^Jlscfg -vl fcs$i | grep Net^Jdone

Though it’s certainly easy enough to go back in and edit that directly on the command line using normal vi keys, sometimes with the ^J characters and the lack of spacing — especially if it’s a long command that wraps around on the command line — it can be easier to enter v somewhere on that line and pull yourself into a vi editor session. That makes it easier to work on the command in question:

for i in 0 1 2 3

do

lscfg -vl fcs$i | grep Net

done

When you’re done with your edits, just save out of vi as you normally would, and the command that you put together will run as if it had been edited on the command line.

Here’s another loop I sometimes use:

while (true)

do

df

sleep 5

done

This basically runs the df command every 5 seconds, and it will do so forever; press Ctrl-C when you want it to stop.

One way to easily remove all hdisks from a system is to run:

for x in `lsdev -Cc disk|grep hdisk|awk '{print $1}'`

do

rmdev -dl $x

done

Obviously, it will not rmdev disks that are in use, but I find that on systems with hundreds of hdisks that I want to manipulate for some reason, this can be a handy way to do some cleanup.
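
Since rmdev -dl is destructive, a habit worth considering is a dry run that just echoes the commands first. Here’s the same loop in that form:

for x in `lsdev -Cc disk|grep hdisk|awk '{print $1}'`

do

echo rmdev -dl $x

done

Once the list of commands looks right, drop the echo and run it for real.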

Otherwise, if you had some lists of values that you needed to loop on — say you need to delete hdisk12-25 — you could first run:

x=12

to set $x equal to 12. Then you could run either:

>while [ $x -le 25 ]

> do

> rmdev -dl hdisk$x

> ((x=$x+1))

> done

or

while (($x<=25)); do

> rmdev -dl hdisk$x

> let x=$x+1

> done

What other simple things do you run on the command line to make your job easier? Please share your tips in Comments.

A Case of Extreme Uptime

Edit: They tell us this should not be a badge of honor because it just means you are running unpatched machines, but I still think it is interesting. I was surprised that the links still work.

Originally posted November 27, 2012 on AIXchange

Ongoing maintenance of our machines is important. You should schedule change windows and make sure servers have the latest firmware and OS patches. Performing regular maintenance is the simplest way to avoid security vulnerabilities. Keeping current on fixes can save you from calling IBM Support; often their first response to a question is to tell you that your issue has been resolved in an already released service pack.

That said, AIX systems provide us with world-class technology. These machines are capable of running for a very long time without any care. And a few do.

A friend recently forwarded this email concerning one of his AIX machines:

            I have a production server that was here when I started in 1999. It was last booted on Jan. 14, 2000, almost 13 years ago…

            It was renamed after applications were migrated off of the server two weeks ago. It is now going to be used as a DR box. As you can see below, it was up 4,675 days before I rebooted it this morning. And yes, it came up just fine.

            # oslevel -r

            4330-11

            # uptime

            09:35AM   up 4675 days,   2:21, 2 users, load average: 1.22, 1.29, 1.28

This is, of course, first and foremost a tribute to the quality of AIX systems. However, a not insignificant amount of good fortune is also involved. This box ran continuously for almost 13 years. Power outages were never an issue. Any hardware issues were resolved through hot swapping. No one accidentally logged into this production server and ran a shutdown -Fr. The firewall that this box must have operated behind kept it safe from constant attacks.

I was impressed to hear of a production AIX server running for this amount of time without even a reboot. I imagine there are systems that have been up even longer, though I couldn’t find anything specific. If you’d care to do your own research, there are threads devoted to this sort of thing. See here, here, here, here and here.

Frankly, I wouldn’t recommend treating a machine this way. I always want to be sure I’m running a supported operating system with the latest fixes. Still, these types of stories surface every now and again, maybe you have your own. What’s the longest-running production system that you know of? What were the circumstances? Please share your anecdotes in Comments.

At an IT Conference, a Glimpse of Life Outside of IT

Edit: Some links no longer work. I try to mention a little bit more these days.

Originally posted November 20, 2012 on AIXchange

Last month I attended the IBM Power Systems Technical University. I was part of one session that featured IBM executives, IBM employees, and IBM Power Champions discussing different issues around the Power systems ecosystem.

First we went around the room and introduced ourselves. Now, most of us have a quick “elevator pitch” to explain who we are, what we do and what makes us such interesting and wonderful people. As a consultant, I find these introductions can be important. Displaying the right combination of credibility and likability in these instances can help open doors and spur the people I talk with to invite me into their organizations to help them make decisions involving their computing environments.

In the session I mentioned something about starting out on the AS/400 and working with that system for 10 years. I talked about my past employment with IBM and working on AIX, and noted my current position with Meridian IT. I added that I’m a Certified Advanced Technical Expert (CATE), a Red Hat Certified Engineer (RHCE) and an all-around swell person.

I didn’t bring up any hobbies or anything I do and enjoy outside of work. I didn’t say where I was from, why I love living there or where I plan to go for my next holiday. I didn’t mention my dreams or aspirations.

Would anyone have been interested if I had? Perhaps so. At this same conference, a lot of us were talking about the keynote session, because of the speech given by Jeff Jonas.

Read his bio, and you’ll see that Jeff is chief scientist of the IBM Entity Analytics group and an IBM Distinguished Engineer. He has many impressive professional accomplishments, and he spoke of his work experiences. When he was introduced, it was mentioned that he has participated in numerous Ironman triathlons over the years.

His material was excellent. Through his storytelling and use of humor, he simplified the technical concepts. His style of presentation was a pleasant change from what many of us have come to expect from technologists at conferences like these. Walking out of that room, you felt like you really understood the projects he was working on. But others I talked to who were there found his personal story just as memorable.

When you think about it, you probably know several people who are really passionate about a sport, a hobby or a subject. I know someone who competes at a very high level in bowling. I’ve met people who enjoy flying planes, who are martial arts experts, who collect and shoot firearms. I know people who run marathons, and people who love sailing. And every single one of these people works in IT.

That’s the thing. Our jobs are important to us, and a lot of us are very passionate about what we do. Still, we’re not defined by our jobs. The things we do outside of work are an even bigger part of our identities.

As they say, when you’re on your deathbed, you won’t be wishing you’d spent more time at the office, but you might be regretting that you didn’t spend more time sharing with your loved ones and pursuing your passions.

Working in IT is just one of many things that makes us who we are. Although we may enjoy our time in the technology field, many of us do impressive things outside of work. When you introduce yourself, do you stick to the resume, or do you also bring up your other interests? Perhaps I should revisit my elevator pitch.

Do You Need the Speed?

Edit: 4G rules the roost but 5G is on the way.

Originally posted November 13, 2012 on AIXchange

With today’s phones, 4G is the fastest. But, all things considered, is the fastest speed automatically the best option? I’ve wondered about that for awhile. More recently, I noticed a writer in Europe — which is, of course, another world as far as cell phone service and providers go — expressing a similar sentiment:

“4G can do more with the radio spectrum than 3G, but this cleverness comes at a cost: it requires much more processing power to cope with the surge in data and the electronics will draw more current. This is straightforward physics and — even if mobile networks had no legacy baggage — a 4G network would deplete your battery faster than 3G. The technology in the handset will improve and become more efficient, but that’s no use to us here and now.

“The question you then have to ask is — do you really really need that extra speed? When HSDPA+ has proved more than adequate? Personally, I’m struggling to think of applications where I’m prepared to trade off weight and power drain against that speed. If you’re only ever sat beneath a 4G network mast, and with a briefcase full of power chargers, the question may be moot.”

In a follow-on post, he added:

“The first is that 3G has far more life in it than we thought. Aware that they’ll be getting a marketing pounding from EE, which has an exclusive on LTE in the UK for some months, rival 3G operators have quietly been upgrading to the latest and much faster version of 3G. The latest flavour of 3G, dual-channel HSPA+, delivers quite amazing speeds on Three’s network.

“My personal choice is for near-4G data speed when I need it and a phone that lasts all day, as opposed to 4G speed and a phone that craps out towards the end of a long lunch while rinsing me of all my cash.”

Though I don’t disagree with the author’s point, I’ve found I need the speed. With all the travel I do, I need a fast, reliable network connection. Here in the U.S., I never saw decent 3G speeds on any handset that I tried, so 4G LTE is the only choice for me. For me 4G is like having a cable modem in my pocket. Hotel Wi-Fi is frequently heavily saturated and barely usable, especially at night when everyone’s back in their rooms, trying to access the hotel’s network. Client sites can be hit or miss as far as external network access. There’s also the issue of restrictions on Internet use. More than once I couldn’t access Google Search due to a client’s internal filters. If you’ve read this blog for awhile, you know that I often turn to Google when I encounter a technical issue I haven’t seen previously.

Thankfully, I can use my phone as a mobile hotspot. This provides me with fast, reliable network access when I’m at the hotel, the airport or a customer site. Of course, the author is correct about 4G and battery life. Even with a 3200 mAh extended battery, your phone will drain in a hurry if you run it as a hotspot for any length of time. I don’t know of any mobile worker who walks around with a charger plugged in all day, and swapping batteries and using external battery packs aren’t elegant solutions either.

The point is, for average daily usage — a few calls, texts, emails and file transfers — 4G is great, but perhaps speed isn’t the primary consideration for most users. Certainly I rely on 4G, but cost and access are still important to me, even if I have to make them less of a priority. I have an unlimited data plan — or so my provider tells me. I find though that after some arbitrary amount of data has been used each month, my speeds get throttled.

Like every user, I’d love unlimited everything: minutes, texts and data, along with the fastest speed available and acceptable battery life at a reasonable price. 4G’s cleverness may come at a cost, but at this point, I feel I have no choice but to pay that fare.

So where does 4G rate with you? Are there other ways to get onto the network I should consider?

Running cldump on a Cluster

Edit: Hopefully nobody runs into this error these days.

Originally posted November 6, 2012 on AIXchange

I was recently asked why the cldump command wasn’t running on a PowerHA 7.1 cluster.

After running /usr/es/sbin/cluster/utilities/cldump, my client received this output:

            cldump: Waiting for the Cluster SMUX peer (clstrmgrES)
            to stabilize………….
            Failed retrieving cluster information.

            There are a number of possible causes:
            clinfoES or snmpd subsystems are not active.
            snmp is unresponsive.
            snmp is not configured correctly.
            Cluster services are not active on any nodes.

            Refer to the HACMP Administration Guide for more information.

I checked and learned that IBM has been scaling back the default SNMP configuration over the years for security reasons. However, this issue is relatively easy to address:

            1) edit /etc/snmpv3.conf (all nodes) and remove the comment hash from this line:

            #COMMUNITY public    public     noAuthNoPriv 0.0.0.0    0.0.0.0         -

            2) add this line (this is the top-level cluster view of the SNMP MIB):

            VACM_VIEW        defaultView     1.3.6.1.4.1.2.3.1.2.1.5 - included -

            3) restart the relevant daemons (this can be done without stopping cluster services):

            stopsrc -s clinfoES
            stopsrc -s snmpd
            stopsrc -s aixmibd
            stopsrc -s hostmibd
            stopsrc -s snmpmibd
            sleep 10
            startsrc -s snmpd
            startsrc -s aixmibd
            startsrc -s hostmibd
            startsrc -s snmpmibd
            sleep 60
            startsrc -s clinfoES
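
With the daemons restarted, rerunning the command is the quickest way to confirm the fix:

            /usr/es/sbin/cluster/utilities/cldump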

After these changes, cldump was working. 

We also found warning messages when we started cluster services or tried to synchronize the cluster:

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea1 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea2 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea1 does not support fast disk takeover

            WARNING: Volume group datavg is an enhanced concurrent mode volume group used as a serial resource, but the LVM level on node nodea2 does not support fast disk takeover

I called support and was told that this was addressed by IV26874. We were also provided with an iFix, which, once loaded, took care of the problem. So if you see the warning, contact IBM and get the iFix (if it isn’t yet available via a service pack).

Incidentally, neither of these issues was a show-stopper in my client’s environment. I continue to be very impressed by PowerHA 7.1.

Training on PowerHA

Edit: Some links no longer work.

Originally posted October 30, 2012 on AIXchange

In its Oct. 3 announcements, IBM noted that the new PowerHA SystemMirror 7.1 Enterprise edition will go GA on Nov. 9. Since I recently took some IBM training on this product, I’d like to tell you more about it.

First, understand that PowerHA is designed to provide mission-critical application availability through planned and unplanned outage events. True to its name, the enterprise edition is aimed toward multisite configurations, while PowerHA SystemMirror standard edition offers access to the normal capabilities of a local PowerHA cluster.

There are two options for multisite clusters: stretched and linked. A stretched cluster utilizes a single repository disk and occupies a single communications network that can extend for shorter distances. For a real-world example, think of a storage subsystem using GLVM that covers a college campus. The stretched cluster communicates with the nodes in the cluster using multicast.

A linked cluster is two sites in two different networks that are linked together. Distance is not an issue — the sites can be cross campus or cross country. A linked cluster utilizes two separate repository disks instead of the shared repository disk that’s used in a stretched cluster. While cluster-wide AIX commands can be used with both stretched and linked clusters, linked clusters use unicast communications.

HyperSwap provides for a multisite PowerHA cluster with continuous storage availability. With HyperSwap, applications keep running in the event of a storage outage, and the storage is kept in sync via Metro Mirror. Storage maintenance and storage migration can be performed without downtime. However, due to the specialized code involved, non-IBM storage products are not supported. IBM DS8000 storage systems must be on both sides of the HyperSwap solution. 

In the training session I attended, IBM emphasized the tighter integration that now exists between PowerHA and AIX. This is in large part due to Cluster Aware AIX. PowerHA 6.1 used traditional communication-based heartbeats, along with “user space” event processing and rsct topology management. The PowerHA 7.1 architecture features multicast communications, SAN communications, a repository disk heartbeat and kernel-based event processing. It becomes harder to have a cluster become “split brained” or partitioned due to the changes to the topology and heartbeating. 

Other notes about the training session:

* IBM has made changes to the Systems Director plugin to simplify cluster creation. This allows you to use a GUI to create your cluster and access a multisite install wizard.

* Be sure to get the latest service packs for both AIX and PowerHA. Having the latest fixes always helps; it’s vital with PowerHA.

IBM is continuously investing in and improving PowerHA as well as planning future capabilities. As was noted in the training session, the product still allows you to use dynamic logical partitioning to grow your LPARs when needed. You can set up an LPAR with 4 CPUs on your primary node and 1 CPU on your failover node. When it’s time to swap roles, that failover LPAR is able to take on 4 CPUs dynamically. This can save you on software license fees for your backup nodes.

Having worked with it for awhile now, I’m impressed with PowerHA SystemMirror 7.1 Standard edition. I’m still amazed at how easy it is to set up. Have you worked with this version of the product yet? What have your experiences been so far? Are you looking forward to the new capabilities with Enterprise edition?

Computer Jargon: A Look Back

Edit: I still find this interesting; the file probably needs to be updated.

Originally posted October 23, 2012 on AIXchange

Years ago when I worked for IBM I read and enjoyed a file called the “IBM Jargon and General Computing Dictionary.” It seems to be making the rounds again, at least if recent emails and tweets I’ve seen are any indication.

The dictionary’s tenth edition, published back in 1990, is still preserved online. While terms like “back to back remote,” “brass tag” and “Charlie letter” are old school, many of these words and expressions hold up and are still in use today.

Here’s a bit of the editor’s introduction:

“… This edition follows the markup and format of the last (Ninth) edition, and has more than one hundred and seventy new entries (bringing the total to over fourteen hundred entries).

 “This is not only the tenth edition of the dictionary, but is also its tenth year; the first edition was compiled and distributed in 1980. At that time the use of jargon was on the increase, but I now observe that the quantity and use of jargon appears to be decreasing – perhaps as computing becomes less of a specialist discipline. Not only does this make my task as editor of the dictionary a little easier, but it might also imply that the computing industry is at last getting better at communicating with its customers!”

This resonates with me. Most people use computing devices in their daily lives now. People and their cell phones or Smartphones are basically inseparable. Twenty-some years ago, being the “computer guy” had very different connotations. He — and it was pretty much strictly “he” — was usually much more technical than the rest of humanity. He often had a different way of talking, and if you didn’t know the lingo and the acronyms, you had a tough time even understanding him.

More from the dictionary:

“The items in this dictionary have been selected from the huge vocabulary of computer-related terms used in IBM. To be included here, a word or phrase must either have originated in IBM, or (more commonly) its meaning or usage in IBM must be different from the usual. Acronyms and abbreviations are not included except where they are necessary for cross-references, or are used as true words in their own right (for example, “APAR”).

“This dictionary is intended both to inform and to entertain. Each entry has a definition, which is usually supplemented by an explanation and an example of usage. Formal etymologies are not included, since in most cases the etymology is either unknown or disputed. In many cases, a meaning or usage is so subtle or bizarre that a light treatment is more appropriate (and conveys the sense better) than an attempt to define the term formally. As a result, this compilation is not just a source of information but is also a window on the IBM culture, as reflected in its language.”

Unfortunately the dictionary is no longer updated, and this seems to be something of a trend. From what I can tell, the (non-IBM specific) Jargon File was last updated in 2003. Here’s version 4.4.7.

Has the language evolved so much that we no longer need reference materials to help us make sense of the computing world? If you work with Power Systems regularly, do terms like IVM, HMC, KVM, FSM, VIOS, APV and SEA need entries these days, or do you just know what all of these acronyms mean today? Are there others that drive you crazy when you hear them?

Twenty-some years from now, will people be trying to make sense of what we were talking about with our abbreviations and lingo? Will we still rack and stack servers, or will everything be in the cloud?

If the IBM jargon dictionary was still being maintained, which words and terms and abbreviations would you want to add to it? I recently came across a term that would be a natural fit. Ask me about it the next time you see me.

Do any of the IBM jargon dictionary terms bring back good memories for you?

Dual HMCs and Interface Locking

Edit: This is still relevant.

Originally posted October 16, 2012 on AIXchange

A customer has two HMCs and wants to get them on the network, with both machines controlling the same set of servers. Maximum availability is a priority. The customer doesn’t want to risk any HMC downtime in their environment.

Chapter 8 in this Redbook explains how to set up dual HMCs:

“A dual HMC is a redundant Hardware Management Console (HMC) management system that provides flexibility and high availability. When two HMCs manage one system, they are peers, and each can be used to control the managed system. One HMC can manage multiple managed systems, and each managed system can have two HMCs.

“A redundant remote HMC configuration is very common. When customers have multiple sites or a disaster recovery site, they can use their second HMC in the configuration remotely over a switched network…  The second HMC can be local, or it can reside at a remote location. Each HMC must use a different IP subnet.

“You need to consider the following points:

* Because authorized users can be defined independently for each HMC, determine whether the users of one HMC should be authorized on the other. If so, the user authorization must be set up separately on each HMC.

* Because both HMCs provide Service Focal Point and Service Agent functions, connect a modem and phone line to only one of the HMCs and enable its Service Agent. To prevent redundant service calls, do not enable the Service Agent on both HMCs.

* Perform software maintenance separately on each HMC, at separate times, so that there is no interruption in accessing HMC function. This allows one HMC to run at the new fix level, while the other HMC can continue to run at the previous fix level. However, the best practice is to upgrade both HMCs to the same fix level as soon as possible.

“The basic design of HMC eliminates the possible operation conflicts issued from two HMCs in the redundant HMC configuration. A locking mechanism provided by the service processor allows interoperation in a parallel environment. This allows an HMC to temporarily take exclusive control of the interface, effectively locking out the other HMC. Usually, this locking is held only for the short duration of time it takes to complete an operation, after which the interface is available for further commands.

“Both HMCs are automatically notified of any changes that occur in the managed systems, so the results of commands issued by one HMC are visible in the other. For example, if you choose to activate a partition from one HMC, you will observe the partition going to the Starting and Running states on both HMCs. The locking between HMCs does not prevent users from running commands that might seem to be in conflict with each other. For example, if the user on one HMC activates a partition, and a short time later a user on the other HMC selects to power the system off, the system will turn off. Effectively, any sequence of commands that you can do from a single HMC is also permitted when it comes from redundant HMCs.

“For this reason, it is important to consider carefully how to use this redundant capability to avoid such conflicts. You might choose to use them in a primary and backup role, even though the HMCs are not restricted in that way. The interface locking between two HMCs is automatic, usually of short duration, and most console operations wait for the lock to release without requiring user intervention.”
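
If you want to watch this synchronization yourself, one simple check is to query partition states from each HMC’s command line while making a change from one of them. A sketch, with the managed system name as a placeholder:

            lssyscfg -r lpar -m YOUR_SYSTEM -F name,state

Run it from either HMC and both should report the same states moments after a change completes.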

Although I typically see dual HMCs in larger enterprises, size isn’t a factor. Any type of environment can benefit from this configuration option.

A Cluster of Cluster Resources

Edit: Some links no longer work.

Originally posted October 9, 2012 on AIXchange

I don’t know who at IBM developerWorks wrote this document, but I really like it. By following along with the sections as outlined, you’ll learn how to define and configure PowerHA SystemMirror for AIX.

The first section includes references to a good introductory article, while section two focuses on infrastructure planning and configuration and section three has an IBM Information Center document on smart assists. The final three sections cover networks, resources and resource groups, creating a cluster and testing a configured cluster. Finally, there’s a cheat sheet from Christian Pruett and information about PowerHA training.

Information from the opening chapters of this Redbook is noted throughout the document, plus there are links to presentations, including a couple from IBMer Alex Abderrazag. I appreciate the inclusion of training information and really like how the document is organized overall.

Here’s another resource for PowerHA users: the new IBM draft Redbook, “IBM PowerHA SystemMirror Standard Edition 7.1.1 for AIX Update.” This was just released in September.

Just a quick point about SystemMirror: It runs a feature under the covers called cluster-aware AIX, which is integral to managing shared storage pools in VIOS. I’ll write more about this soon.

As for the SystemMirror Redbook itself, one thing that stands out to me is the step by step instructions for setting up a cluster in Chapter 3. There’s also this, which, quite honestly, made me chuckle:

“During the developing of this book, the repeating question was: what is the recommended virtual Ethernet configuration? The authors all had their own opinion, and there were many long debates on this topic. Finally we agreed that there is no specific or recommended virtual Ethernet configuration because all redundant configurations should work well in a PowerHA environment.”

To me this paragraph nicely sums up our profession. We all have strong beliefs about how to configure systems, and we’re often pretty vocal in pointing out the distinct advantages of our own particular way of doing things.

And it turns out that, despite their disclaimer, the authors managed to get together and settle on these recommendations for configuring virtual Ethernet:

* “Two Virtual IO servers per physical server.
* Use the servers’ already configured Virtual Ethernet settings because no special modification is required. In case of a VLAN tagged network, the preferred solution is to use SEA failover, otherwise use the network interface backup.
* “One client side virtual Ethernet interface simplifies the configuration; however, PowerHA misses network events. (This can be remedied by applying APAR IV14422 and configuring your /usr/es/sbin/cluster/netmon.cf file as described in section 3.8.2)
* Two virtual Ethernet interfaces on the cluster LPAR because this enables PowerHA to receive the network events. This results in a more stable cluster.”
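
A quick note on the netmon.cf file mentioned in the third bullet: in virtual Ethernet environments it tells PowerHA which outside addresses to ping when deciding whether an interface is really up. A minimal sketch, with a hypothetical interface and target address; see section 3.8.2 of the Redbook for the authoritative format:

            !REQD en0 10.1.1.1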

I do encourage you to check out all of these resources. With freely available information like this, learning about building a PowerHA cluster is easier than ever.

Reminder: IBM had a big announcement last week featuring POWER7+ hardware and software. I covered the many new solutions and features in this special post.

POWER7+ Systems Unveiled

Edit: Some links no longer work.

Originally posted October 3, 2012 on AIXchange

If you’re planning to upgrade your enterprise Power hardware in the near future, at this point you should focus on IBM’s POWER7+ systems.

On Wednesday IBM announced new versions of its enterprise Power Systems models, along with new software: AIX 7.1 TL2, AIX 6.1 TL8, IBM i 7.1 TR5, Linux RHEL 6.3, SLES 11 SP2 and PowerVM 2.2.2.

General availability for the software is slated for Oct. 12. The 770 and 780 hardware GA is Oct. 19. IBM i 6.1.1 support of POWER7+ on the 770 and 780 is expected Nov. 9. GA for model upgrades to the POWER7+ 770 and 780 — along with new firmware for the Power 795 — is Nov. 16. Other AIX 7.1 and 6.1 TL levels and VIOS 2.2.1.5 support for POWER7+ 770 and 780 are expected on Dec. 19.

Featuring a more densely packaged chip that gives off less heat and uses less power, POWER7+ systems offer 20-40 percent more performance per core. Another way to consider the progression is through these additional numbers from IBM: POWER7+ performance per watt can be up to five times greater than what was offered with POWER6+, and more than 10 times that of POWER5+.

The L3 cache size has more than doubled to 10 MB (vs. 4 MB in POWER7). POWER7+ processors run at a higher frequency, and include an on-board memory compression accelerator that allows active memory expansion (AME) to run with significantly reduced CPU overhead.

Model numbers have not changed with this announcement. We’re still talking about the Power 770 and 780, but now we’re looking at the “D” machine types in each family.

The new Power 770 with POWER7+ processors is the 9117-MMD. This server allows you to have up to 64 cores running at 3.8 GHz, or up to 48 cores running at 4.2 GHz. Comparatively, the POWER7 770 can run 64 cores at 3.3 GHz, or 48 cores at 3.7 GHz. The 9117-MMD allows up to 20 LPARs per core (up to 1,000 on the frame) with up to 16 concurrent live partition mobility operations. POWER7 systems support 10 LPARs per core and eight concurrent live partition mobility operations.

Those now using POWER6 570 9117-MMA, POWER7 770 9117-MMB and POWER7 9117-MMC systems will be able to upgrade to the new POWER7+ Model 770 (9117-MMD) system. From the announcement letter:

“You can upgrade the 9117-MMA, 9117-MMB, or 9117-MMC with 9117-MMD processors. For upgrades from 9117-MMA, 9117-MMB or 9117-MMC systems, IBM will install new CEC enclosures to replace your current CEC enclosure.”

The Power 780 also has the same 32 nm POWER7+ core with 10MB L3 cache per core. Where you could previously max out your 780 at 96 cores running at 3.44 GHz, or 64 cores running at 3.92 GHz, the 9179-MHD POWER7+ 780 can have a maximum of 128 cores running at 3.72 GHz, or 64 cores running at 4.42 GHz. As with the 770, the 780 can have 20 LPARs per core and run up to 16 concurrent live partition mobility operations. According to the information I saw, existing POWER6 and POWER6+ 570 9117-MMA and POWER7 780 9179-MHB and 9179-MHC systems can be upgraded to the new Power 780 (9179-MHD).

The 795 servers aren’t refreshing with POWER7+ processors, but they are part of this announcement. The 795 will have a new 256 GB memory feature with four 64 GB DDR3 DIMMs, so it can support up to 16 TB of memory on the frame. The 795 will also allow 20 LPARs per core with a firmware update. In addition, the 795 has two new PCIe Gen2 GX++ adapters (10G fibre channel card and 10G FCoE/CN) that plug directly into the GX++ slot on the processor card. This card combines a GX adapter + GX cables + PCIe I/O drawer + PCIe adapter into one new 2-port GX hybrid adapter. This card is designed to eliminate the need to have a drawer and the cables. Up to three adapters can be plugged into a processor book, and they can be housed in any of the four GX slots. Gen 1 and Gen 2 GX adapters can function in the same processor book.

Here are some other announcement details.

* Elastic Capacity on Demand will enhance On/Off COD. Only two keys — one for processors and one for memory — will be needed to enable 90 days of available but inactive resources.

* Power System Pools will be available for 780 and 795 systems. Elastic COD resources may be purchased and billed for a pool. Rather than have some COD available on one system but not on another, we can now create pools of high-end Power Systems servers that allow sharing of Elastic CoD processor and memory credits. This capability can also be used in support of planned maintenance events. 

While a pool can have up to 10 Power 780 and 795 systems, Power System Pools have two limitations:

1) Fifty percent of the processors in the pool must be active.

2) Although the servers in the pool can be located in multiple data centers, AIX and IBM i cannot be mixed in the same pool.

Note that PowerVM and Electronic Service Agent are needed to enable this functionality. 

* The dynamic platform optimizer (DPO) is a new systems-tuning tool that optimizes processor and memory affinity in virtualized environments. The system can assess the level of affinity on a partition by partition basis. The system and workloads continue to run while the frame adjusts workload placement in the background to optimize performance without requiring additional admin interaction. (Note: This is not the same as the active system optimizer. ASO runs inside of AIX on your LPAR, while DPO runs at the hypervisor level and is designed primarily to optimize your LPAR’s physical cores and memory.)
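
From what I’ve seen, DPO is driven from the HMC command line via the optmem and lsmemopt commands. Treat this as a sketch, with a placeholder system name, and check your HMC documentation for the exact syntax:

            optmem -m YOUR_SYSTEM -o start -t affinity
            lsmemopt -m YOUR_SYSTEM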

* AIX 7.1 TL2 and 6.1 TL8 will ship with the new POWER7+ systems this month. In the same timeframe we’ll also see the appropriate service packs (SP) for AIX 7.1 TL1, AIX 7.1 TL0, AIX 6.1 TL6 and AIX 6.1 TL7 to enable POWER7+ support. In addition, there will be an LPAR-to-WPAR migration tool, which, as you can imagine, helps migrate workloads from an LPAR to a WPAR. If you have the AIX 5.3 service extension, expect a TL12 SP to enable POWER7+ processor support, per IBM’s statement of direction on future support.

* The new levels of AIX allow for exploitation of POWER7+ crypto offload accelerators, which enable encrypted filesystems and IPsec. According to IBM’s announcement, “this provides cryptographic engines that may relieve the POWER7+ processor from the performance-intensive cryptographic algorithms of AES and SHA. This can offload work from processor cores from doing these tasks and improve performance of those functions.”

* POWER7+ also includes a hardware random number generator and enhanced single precision floating point performance. High quality random numbers help improve security and offload cryptographic CPU cycles from the processor.

* A new virtual processor management “scaled throughput” option is designed to improve the ratio of workload throughput to the processor resources consumed.

* The AIX Enterprise Edition will now include the PowerSC and SmartCloud Entry bundle. Also included are AIX 6.1 or 7.1, WPAR manager, IBM Tivoli monitoring, IBM Systems Director Standard, VMcontrol Enterprise, network control, PowerSC, SmartCloud Entry and storage control. IBM has made some changes to enterprise edition to remove some of the infrequently used items and make room for the newer offerings. If you currently have enterprise edition, you’ll receive any products you don’t currently have at no additional charge.

* PowerVM 2.2.2, also announced on Wednesday, is set for a Nov. 9 GA. This will allow for the support of 20 LPARs per core on the 770, 780 and 795 systems. VIOS performance advisor updates and live partition mobility improvements are, according to IBM, expected to double the concurrency and improve LPAR movement performance as much as three times.

* PowerHA SystemMirror 7.1 Enterprise Edition, also announced on Wednesday, is set for a Nov. 16 GA. We’ll have to get used to some new concepts, including stretched clusters for campus or metro deployments, and linked clusters, which enable two sites with independent networks across campus or across the country. I’ll cover this in detail in the near future.

* Finally, a new HMC, the 7042-CR7, will run V7R7.6.0 code and support blade servers. By running the new HMC code along with new 7.6 firmware and newer AIX levels, you’ll be able to set a new, lower minimum CPU for your LPARs (0.05 instead of 0.10). The new code level also allows HMC to support more current web browsers. HMC V7R7.6.0 is the last code level that will run on older models (7310-C04, 7315-CR2 and 7310-CR2). (Note: IBM recommends that if your HMC manages more than 254 partitions, or if you use IBM Systems Director to manage your HMC, at least 3 GB of RAM is needed. The HMC should also be a rack-mount CR3 or later.)

For other announcement coverage, check out this IBM Systems Magazine Web Exclusive. And here’s Jay Kruemcke’s take.

Overall, I’m impressed with this announcement. Obviously, though, there’s a ton of material here. Please post any questions in Comments, and I’ll do my best to track down detailed answers.

The Case for Documentation

Edit: I also like HMCscanner output

Originally posted October 2, 2012 on AIXchange

Once I was called in to help a customer that had lost its AIX support staff. I won’t go into the details; just understand that in this case, quite a bit of knowledge vanished overnight and had to be re-created.

We had to figure out passwords and LPAR configurations. Multiple profiles were associated with each LPAR, and there was no one who could answer our questions. The only way to determine how the profiles were created — or whether they were even still active — was to go into each one and look for recent updates. From there, we were left to make educated guesses.

We had to figure out how to connect to the HMC, both locally and remotely. We had to verify network addresses for the HMC as well as the various LPARs. We had to find the user IDs on the various systems that had escalated authority.

Physical connectivity was another puzzle. We found that there were two HMCs in a rack, but only one monitor and keyboard. It turns out the customer was using a KVM in the environment and employed a non-standard way of switching between the different sessions.

We had to figure out how to connect to the storage array, and then determine how the storage was allocated to the servers.

Luckily for us, no lasting damage was done, and we were able to recover the passwords and get into the systems. Of course, it did take some time and effort. We didn’t have the luxury of being able to check a runbook, wiki or some other document.

When you build and maintain your own systems, you “just know” all of this information. When you’re a consultant like me and you come into an environment cold, there’s generally someone who can give you this information. Not that it was available in this case, but even documentation can be tricky. Don’t get me wrong, documentation is very valuable — provided it’s current. But outdated documentation is practically worthless, if not actually harmful. It can lead to bad assumptions, which can lead to bad actions, which generally result in system outages.

One tool I rely on in situations like this is HMC sysplans, which provide a snapshot of a machine’s configuration. I also run scripts on all machines so I can have current output. That’s the best way to identify what should be on a machine (or at least, what was on a machine).
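
Capturing a sysplan doesn’t require the GUI, either. It can be scripted from the HMC command line, along these lines (the file and system names are placeholders):

            mksysplan -f current_config.sysplan -m YOUR_SYSTEM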

Ultimately, though, this is yet another example of why maintaining current documentation is so vital. What kind of critical business information exists only in the memories of your IT staffers? If you were hit by a train, what would be lost?

So what do you do to document the needs and inner workings of your environment? How frequently do you update this information?

Cache on Hand

Edit: Link no longer works.

Originally posted September 25, 2012 on AIXchange

Chris Gibson tweeted a link to a great read that will help you get your head around the inner workings of your Power hardware.

Here’s a snippet from the article, “Under the Hood: Of POWER7 Processor Caches.”

“Most of us have a mental image of modern computer systems as consisting of many processors all accessing the system’s memory. The truth is, though, that processors are way too fast to wait for each memory access. In the time it takes to access memory just once, these processors can execute hundreds, if not thousands, of instructions. If you need to speed up an application, it is often far easier to remove just one slow memory access than it is to find and remove many hundreds of instructions.

“To keep that processor busy — to execute your application rapidly — something faster than the system’s memory needs to temporarily hold those gigabytes of data and programs accessed by the processors AND provide the needed rapid access. That’s the job of the Cache, or really caches. Your server’s processor cores only access cache; [servers] do not access memory directly. Cache is small compared to main storage, but also very fast. The outrageously fast speed at which instructions are executed on these processors occurs only when the data or instruction stream is held in the processor’s cache. When the needed data is not in the cache, the processor makes a request for that data from elsewhere, while it continues on, often executing the instruction stream of other tasks. It follows that the cache design within the processor complex is critical, and as a result, its design can also get quite complex.”

The author goes on to describe the cache array, the store-back cache, the L3 cast-out cache and finally, cache coherence:

* “Processor cache holds sets of the most recently accessed 128-byte blocks. You can sort of think of each cache as just a bucket of these storage blocks, but actually it is organized as an array, typically a two dimension array.”

* “So far we’ve outlined the notion of a block of storage being ‘cache filled’ into a cache line of a cache. Clearly, when doing store instructions, there is a need to write the contents of some cache lines back to memory as well.”

* “For POWER7 processors, a storage access fills a cache line of an L2 cache (and often an L1 cache line). And from there the needed data can be very quickly accessed. But the L1/L2 cache(s) are actually relatively small. [Technical Note: The L2 of each POWER7 core only has about 2000 cache lines.] And we’d rather like to keep such blocks residing close to the core as long as possible. So as blocks are filled into the L2 cache, replacing blocks already there, the contents of the replaced L2 are ‘cast-out’ from there into the L3. It takes a bit longer to subsequently re-access the blocks from the L3, but it is still much faster than having to re-access the block from main storage.”

* “This is a Symmetric Multi-processor (SMP). Within such multi-core and multi-chip systems, all memory is accessible from all of the cores, no matter the location of the core or memory. In addition, all cache is what is called ‘coherent’; a cache fill from any core in the whole of the system is able to find the most recent changed block of storage, even if the block exists in another core’s cache. The cache exists, but the hardware maintains the illusion for the software that all accesses are from and to main storage.”

Much more is covered in this article, including tips you may want to consider as a Power programmer. I encourage you to read the whole thing.

Running AIX 5.3 on POWER7 Hardware

Edit: Anyone still running 5.3? Some links no longer work.

Originally posted September 18, 2012 on AIXchange

I was recently asked about potential issues running AIX 5.3 with the latest fixes on POWER7 hardware with dedicated adapters. Somehow this person had gotten the idea that AIX 5.3 could only handle the underlying hardware and adapters by running in a virtualized environment using VIO servers.

Perhaps this person thought that since AIX 5.3 needs to run in POWER6 mode, the newer physical adapters wouldn’t be supported with an old version of the operating system. These days I typically run everything in a virtualized environment using VIO servers, and just off of the top of my head I can’t recall the last time I needed to dedicate an adapter to an AIX 5.3 LPAR on POWER7 hardware.

However, rather than shoot my mouth off, I quickly checked some Redbooks and asked some trusted resources. They confirmed what I figured: AIX 5.3 can most definitely handle all of the latest adapters you can throw at it. Absolutely.

This highlights one of the many strengths of using IBM solutions. From the hardware to the firmware to the hypervisor to the operating systems, one vendor owns the stack. Thus, IBM can ensure that any new hardware and adapters will continue to run with its legacy operating systems.

Speaking of AIX 5.3, remember that even though it’s no longer in support, AIX 5.3 is still eligible for extended support contracts. Another way to get support for your legacy OS is through AIX 5.3 versioned WPARs (see here and here). With this option you can run in POWER7 mode with four threads. IBM also provides bug fixes and how-to support if you choose to run 5.3 WPARs. Obviously running in POWER7 mode should give you a nice performance boost over using two threads in POWER6 mode.
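
If you’ve never built a versioned WPAR, the general shape of the commands is below. Treat this as a sketch: the WPAR name and mksysb path are invented, and your fileset and licensing situation may add steps.

# Create an AIX 5.3 versioned WPAR from an existing 5.3 mksysb image
mkwpar -n legacy53 -C -B /export/mksysb/aix53_system.mksysb

# Start it and confirm its state
startwpar legacy53
lswpar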

That IBM makes these options available is all the more impressive when you realize that AIX 5.3 debuted in 2004. You can see how someone would assume that current hardware couldn’t possibly support an eight-year-old operating system.

An item from the above link:

“As new hardware becomes available over the next two years, we will provide new hardware toleration when possible. This will not include new hardware that requires architectural changes.”

Also:

“Some people have asked me about when they should use the AIX 5.3 service extension versus using the AIX 5.3 Workload Partitions product. The answer is pretty easy – if you intend to migrate to a later release of AIX, then use the service extension to bridge you to that point. If however, you believe that you will need to run AIX 5.3 indefinitely for a particular application, then the AIX 5.3 WPARs is the better choice.”

So where are you on this? Have you already migrated all of your systems? Do you have extended support? Do you use the AIX 5.2 or 5.3 versioned WPAR offerings? Do you even know about them?

Looking Beyond Performance

Edit: Still thought provoking.

Originally posted September 11, 2012 on AIXchange

I recently attended another IBM technical briefing. As always, it was time well spent. This briefing included a keynote from IBMer Brad McCredie, whose ideas really resonated with me. Basically, Brad said the golden age of computing may already be behind us. No longer can increasing clock speeds and raw performance improvements dominate our computing hardware purchasing decisions.

Brad used the analogy of buying a new car. In his example, the model’s horsepower and track times hadn’t changed in 30 years. There were no performance improvements whatsoever. Still, there was progress. The new models had better brakes, better seatbelts, a more comfortable interior and significantly higher gas mileage. There was built-in navigation, a better sound system and even lighted cup holders. In short, while performance hadn’t advanced, the overall driving experience had been transformed. Now it’s the efficiency and creature comforts — not the performance — that catch our eye.

Something similar could be said about the airline industry. Brad took us from the Wright brothers to the Concorde. He noted that planes don’t fly any faster now than they did a generation ago. And yet, Boeing 787s are sold out through the end of the decade. The new planes promise greater fuel efficiency and usability.

Brad then moved on to televisions. With flat-screen prices coming way down, manufacturers no longer emphasize bigger. Instead, they develop features like 3D TV, Internet-ready capabilities and amazing built-in sound systems.

This is now happening in computers as well. We often look at factors beyond raw performance. We want to improve our TCO. We need to manage unplanned outages. We need to run our workloads on fewer cores. We need efficiency.

Obviously, some industries still emphasize raw computing performance. He pointed to the financial industry, where every little bit of speed can help make money. Most customers, however, want to save on power and cooling costs. Or maybe their needs center on RAS and virtualization management features, or the capability to rapidly provision servers or consolidate storage and network infrastructure rather than manage two different networks. We don’t strictly focus on speeds and feeds and raw performance.

So what goes into your computing purchasing decisions these days?

Using mksysb for NIM Backups

Edit: This should still be relevant. Link no longer works.

Originally posted September 4, 2012 on AIXchange

Recently I received a reader question that prompted this email exchange with IBM network installation management (NIM) expert Steve Knudson.

Reader: I currently have one NIM server that I use to recover all of our AIX systems. We will be moving from P5 to P7 hardware, but I will no longer have a tape drive to back up my NIM server to. Nor will I have a DVD writable drive. What I need to know is how can I back up my NIM server to a mksysb file and recover my NIM server from that. I believe that I can’t use the NIM master backup to recover itself to a different LPAR somewhere, correct? Am I going to need an alternate NIM server to recover my NIM master? Do you have a documented procedure somewhere on how to accomplish this? Any help that you could provide on this matter would be greatly appreciated.

Steve’s reply: The way I would approach this….

1) Take a backup of the NIM database into a file in rootvg on the NIM master.
2) Collect mksysb of master, to itself.
3) NIM restore mksysb to new POWER7 LPAR, and in the process, set the install question:

Remain NIM client after install? [no]

This eliminates the check and removal of the bos.sysmgt.nim.master and nim.spot filesets.
4) After the restore, copy to the POWER7 the various lpp_source, scripts, bosinst_data, resolv_conf resources you want to preserve.
5) On the POWER7 LPAR, run the nim_master_recover command to restore the NIM database on the new LPAR. It will likely look for the copied resources in the exact path and filenames they had on the P5 LPAR.

This restores a backup of the NIM database to a different machine and updates the database to reflect this change.

If you were planning on changing IP and hostname on the P5, and then setting IP and hostname of P7 LPAR to what the P5 had, you might be able to just restore the NIM database backup on the P7, without nim_master_recover.

Customer reply: Thanks for your response. I have a couple of questions though. So I would need to specify a different hostname and IP address to recover to on the POWER7? Then once I’m ready to shut it down (LPAR on P5), after copying all the lpp_source stuff and scripts, can I rename the POWER7 LPAR with the same hostname and IP address as the one that was on the P5?

I also assume I wouldn’t do the nim_master_recover until the P7 LPAR is updated with the original hostname and IP address?

Steve’s reply: If you want to move the hostname and IP to the new P7 NIM master LPAR, I would restore the mksysb, change hostname and IP on original P5 and then set hostname and IP on the new P7 LPAR. After that, I would restore the NIM database on the P7 and not do nim_master_recover.

Customer reply: How would I restore the master NIM database?

Steve’s reply: smitty nim > Perform NIM administration tasks > Backup/Restore the NIM Database.

Backup/Restore the NIM Database.

Move cursor to desired item and press Enter.

Backup the NIM Database.
Restore the NIM Database from a Backup.

This is where you collect the backup of the NIM database on P5 to start, and also where you’ll restore the NIM database on the P7.
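
For reference, the first two steps on the old master boil down to something like this; the mksysb path is made up, and the database backup is the smitty screen described above:

# 1) Back up the NIM database to a file in rootvg:
#    smitty nim > Perform NIM administration tasks >
#    Backup/Restore the NIM Database > Backup the NIM Database

# 2) Create a mksysb image of the NIM master to a local file
mksysb -i /export/mksysb/nim_master.mksysb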

Thus ends the exchange. So have you tried this? Has it been successful? Please share your experiences in Comments.

Training Without the Travel

Edit: Some links no longer work.

Originally posted August 28, 2012 on AIXchange

In February I discussed some great PowerHA V7.1 resources, including virtual user group training and an IBM Redbook. In a follow-on post, I pointed to this training session replay (“Configuring PowerHA SystemMirror V7.1 for AIX Cluster”). This video covers some of the new features in the latest version and also includes a live demo of the instructor setting up a cluster.

You may be unaware that IBM has actually posted several other training session replays on YouTube. They cover topics like IBM Systems Director 6.2 for Power Systems, IBM XIV, BRMS on System i, IBM Storwize V7000, AIX CPU performance management and advanced PowerVM and performance. The full list is available here.

These freely available videos are set up much as an actual IBM training course would be, and I imagine that IBM hopes you like what you see and will then be interested in signing up for a course of a longer duration. Here’s how IBM puts it:

“Test drive classes do not replace a fee-based class, which is typically 3-5 days in length, provides extensive technical depth and includes hands-on labs. Instead, test drives give you a snapshot of key technical fundamentals using the very popular ILO format.

“Instructor-led online courses (ILO) are taught by a live instructor on a specific day and time. Most courses are exactly the same as their classroom equivalent, including the course duration, content and student materials. Here is what you need to know:

•    You receive the course materials in advance.
•    To participate on the day of the class, all you need is a broadband Internet connection. This allows you to connect to a virtual classroom where you can interact directly with the instructor and your peers.”

Obviously, many employers are reluctant to budget for IT training and the travel and expenses that go with it. Internet-delivered education like IBM’s ILO is a less-costly alternative, though even then some of us can find it tough to break away from our jobs and remain focused on a class while in the day-to-day office environment. On the other hand, I do know of people who’ve successfully dealt with these potential distractions by taking the classes in office conference rooms or even from home.

Have you taken IBM’s ILO courses or other Internet-delivered education? Please share your experiences in Comments.

PowerVM Best Practices, Part Two

Edit: I still love Redbooks. Part 2.

Originally posted August 21, 2012 on AIXchange

As I said last week, the “IBM PowerVM Best Practices” Redbook has a lot of valuable information. This week I’ll cover the final three chapters of this publication.

Chapter 5 notes

Storage, including virtual SCSI and virtual Fibre Channel, is covered. The authors also address the issue of whether to boot from internal or external disk:

“The best practice for booting a [VIO server] is using internal disks rather than external SAN storage. Below is a list of reasons for booting from internal disks:

* The [VIOS] does not require specific multipathing software to support the internal booting disks. This helps when performing maintenance, migration, and update tasks.
* The [VIOS] does not have to share Fibre Channel adapters with virtual I/O clients, which helps in the event a Fibre Channel adapter replacement is required.

* If virtual I/O clients have issues with virtual SCSI disks presented by the [VIOS] backed by SAN storage, the troubleshooting can be performed from the [VIOS].”

Virtual SCSI and NPIV can be mixed within the same virtual I/O client. Boot devices or rootvg can be mapped via virtual SCSI adapters; data volumes can be mapped via NPIV (section 5.1.3). The pros and cons of mixing NPIV and virtual SCSI are illustrated in table 5-1.

A chdev should be run on all fibre devices (section 5.1.4):

$ chdev -dev fscsi0 -attr fc_err_recov=fast_fail dyntrk=yes
fscsi0 changed

“Changing the fc_err_recov attribute to fast_fail will fail any I/Os immediately if the adapter detects a link event, such as a lost link between a storage device and a switch. The fast_fail setting is only recommended for dual [VIOS] configurations. Setting the dyntrk attribute to yes allows the [VIOS] to tolerate cable changes in the SAN.”
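
To confirm the change stuck, you can list the attributes back from the padmin shell; the grep here is just for convenience:

$ lsdev -dev fscsi0 -attr | grep -E "fc_err_recov|dyntrk"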

The authors recommend exporting disk devices backed by SAN storage as physical volumes. In environments with a limited number of disks, storage pools should be created to manage storage from the VIOS (section 5.2.1).

Virtual adapter considerations and naming conventions are covered in section 5.2.2. The pros and cons of using logical volumes for disk mappings versus mapping entire disks are considered in section 5.2.3. This section also tells us:

“Virtual tape devices are assigned and operated similarly to virtual optical devices. Only one virtual I/O client can have access at a time. It is a best practice to have such devices attached to a [VIOS], instead of moving the physical parent adapter to a single client partition.

“When internal tapes and optical devices are physically located on the same controller as the [VIO server’s] boot disks, it is a best practice to map them to a virtual host adapter. Then, use dynamic logical partitioning to assign this virtual host adapter to a client partition.”

Section 5.2.4 covers configuring the VIOC with Virtual SCSI and lists some recommended tuning options for AIX. Sections 5.3 and 5.4 cover shared storage pools and NPIV, respectively:

“NPIV is now the preferred method of providing virtual storage to virtual I/O clients whenever a SAN infrastructure is available. The main advantage for selecting NPIV, compared to virtual SCSI, is that the [VIOS] is only used as a pass-through to the virtual I/O client virtual Fibre Channel adapters. Therefore, the storage is mapped directly to the virtual I/O client, with storage allocation managed in the SAN. This simplifies storage mapping at the [VIOS].”

Chapters 6 and 7 notes

Chapter 6 covers performance monitoring, highlighting tools and commands that enable both short- and long-term performance monitoring.

Chapter 7 covers security and advanced PowerVM features, including default open ports on the VIOS like FTP, SSH, telnet, rpcbind and RMC. The authors recommend disabling FTP and telnet if they’re not needed (section 7.1.2). Active memory sharing and active memory duplication are covered in sections 7.4 and 7.4.3.

PowerSC and Live Partition Mobility are covered in sections 7.2 and 7.3. LPM storage considerations are listed in section 7.3.3:

“* When configuring virtual SCSI, the storage must be zoned to both source and target [VIO servers]. Also, only SAN disks are supported in LPM.

* When using NPIV, confirm that both WWPNs on the virtual Fibre Channel adapters are zoned.
* Dedicated I/O adapters must be deallocated before migration. Optical devices in the [VIOS] must not be assigned to the virtual I/O clients that will be moved.
* When using virtual SCSI adapters, verify that the reserve attributes on the physical volumes are the same for the source, and destination [VIO servers].
* When using virtual SCSI, before you move a virtual I/O client, you can specify a new name for the virtual target device (VTD) if you want to preserve the same naming convention on the target frame. After you move the virtual I/O client, the VTD assumes the new name on the target [VIOS]. …”

Section 7.3.4 lists LPM network considerations:

“* Shared Ethernet Adapters (SEA) must be used in a Live Partition Mobility environment.
* Source and target frames must be on the same subnet to bridge the same Ethernet network that the mobile partitions use.
* The network throughput is important. The higher the throughput, the less time it will take to perform the LPM operation. For example, if we are performing an LPM operation on a virtual I/O client with 8 GB of memory:

– A 100 Mb network, sustaining a 30 Mb/s throughput, takes 36 minutes to complete the LPM operation.
– A 1 Gb network, sustaining a 300 Mb/s throughput, takes 3.6 minutes to complete the LPM operation.”
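
The arithmetic checks out if you read those as bit rates: 8 GB is roughly 65,536 Mb, and 65,536 Mb at a sustained 30 Mb/s works out to about 2,185 seconds, or just over 36 minutes. Ten times the throughput cuts the time to a tenth.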

PowerVM Best Practices Redbook

Edit: I still love Redbooks.

Originally posted August 14, 2012 on AIXchange

Occasionally I like to highlight IBM Redbooks that provide particularly valuable information to AIX pros. The new publication, “IBM PowerVM Best Practices,” is the latest example.

The version I viewed was a draft document (“Redpiece”) dated July 2, 2012. If it hasn’t yet been finalized, it should be soon. While a fairly short read at 118 pages, this publication is packed with relevant information that should be easily understood.

Chapter 1 notes

The authors remind us of the features and benefits of running PowerVM (section 1.1). Current VIO server minimums include 30GB of disk, a storage adapter, 768MB of memory and 0.1 processor (section 1.2). It’s worth repeating that that memory figure is a minimum. In fact, 6GB of memory and a core of entitled capacity for your VIOS is suggested (depending, of course, on your workload) in section 1.3.4.

The authors add: “Core speeds vary from system to system, and IBM is constantly improving the speed and efficiency of POWER processor cores and memory. Therefore, the above guidelines are, once again, a starting point. Once you have created your [VIOS] environment, including all the virtual clients, you should test and monitor it to make sure the assigned resources are appropriate to handle the load on the [VIOS].”

Section 1.3.7 addresses the question of whether to run a single VIOS or multiple VIO servers. The authors recommend the latter given the benefits of redundancy.

Section 1.3.9 covers slot numbering and naming conventions.

Chapter 2 notes

Installation, migration and configuration are covered. Included is a nicely documented section illustrating VIOS configuration on an HMC.

Two important reminders: First, set all physical adapters to desired, as setting them to required prevents dynamic LPAR (DLPAR) operations from working (section 2.1.4). Similarly, all virtual adapters should be set to desired if you’re planning on implementing live partition mobility (section 2.1.5).

The authors recommend using NIM to install the VIOS, citing this documentation.

Section 2.3.1 covers the need to perform regular maintenance. Firmware updates and patching should be done once a year. Two other points from the authors:

“When doing system firmware updates from one major release to another, always update the HMC to the latest available version first along with any mandatory HMC patches, then do the firmware. If the operating system is being updated as well, update the operating system first, then HMC code, and lastly the system firmware.

“In a dual HMC configuration always update both HMCs in a single maintenance window, or disconnect one HMC until it is updated to the same level as the other HMC.”

Section 2.3.3 has a checklist you can use to apply fix packs, service packs, and ifixes. Section 2.4 covers VIOS migration.

Chapters 3 and 4 notes

Administration and maintenance are covered, including the process of backing up and restoring the VIOS (section 3.1). Backing up the VIOS is a separate task from backing up your client LPARs, so be sure you are backing up both (section 3.1.1). The VIOS should be backed up to a remote file (section 3.1.2).

Restoring the VIOS — from either the HMC or by using a NIM server — is discussed in section 3.1.4. In a D/R scenario, NIM is recommended (section 3.1.5).

Changes made with DLPAR operations should be saved either by manually making the changes to the profile or by using save configuration from the HMC GUI (section 3.2).

Section 3.2.1 has a warning. This has actually tripped me up with NPIV in the past, so pay attention:

“Using the method of adding a virtual FC adapter to a virtual I/O client via a DLPAR operation, and then modifying the permanent partition profile, will result in a different pair of WWPNs between the active and saved partition profiles.

“When a virtual FC adapter is created for a virtual I/O client, a pair of unique WWPNs are assigned to this adapter by the Power Hypervisor. An attempt to add the same adapter at a later stage will result in the creation of another pair of unique WWPNs.

“When adding virtual FC adapters into a virtual I/O client via a DLPAR operation, use the ‘Overwrite existing profile’ option [Figure 3-4, page 44] to save over the permanent partition profile. This will result in the same pair of WWPNs in both the active and saved partition profiles.”
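
One way to catch a mismatch before it bites is to compare the WWPNs in the active configuration against the saved profile from the HMC command line. A rough sketch, with the managed system and partition names made up:

# WWPNs recorded in the saved partition profile
lssyscfg -r prof -m MyServer --filter "lpar_names=myaix" -F virtual_fc_adapters

# WWPNs on the currently active virtual FC adapters
lshwres -r virtualio --rsubtype fc --level lpar -m MyServer --filter "lpar_names=myaix"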

Section 3.3 covers the virtual media repository. Section 3.4 covers power server shutdown and startup.

Chapter 4 covers networking best practices, examining many different scenarios. You should read through them all.

There’s plenty more, so next week I’ll cover the rest of the material in this publication.

The 411 on a Client Hanging at LED 0611

Edit: Some links no longer work.

Originally posted August 7, 2012 on AIXchange

I was using a NIM server to load AIX, and it kept stopping at LED 0611. My first thought was to check this Redbook, but when I didn’t find anything there, I just did a web search. That led me to Steve Knudson’s AIX Network Installation Management Basics slides, which indicated that I was dealing with an NFS issue.

“Client hangs at LED 0611 – Indicates that some nfs resource that should be exported from the nim master is not available to the nim client. Most likely cause is that a parent directory was already exported to the client. The nim_bosinst process doesn’t always give errors when starting off this way. Check exports on the server; they should look something like this:

# exportfs
/export/aix433/images/mksysb.minimal -ro,root=nim6:,access=nim6:
/export/aix433/lppsource -ro,root=nim6:,access=nim6:
/export/aix433/spot/spot_aix433/usr -ro,root=nim6:,access=nim6:
/export/aix433/res/bundle.gnu -ro,root=nim6:,access=nim6:
/export/aix433/res/itab.mkmaster -ro,root=nim6:,access=nim6:
/export/aix433/res/bosinst.initial.B50 -ro,root=nim6:,access=nim6:
/export/nim/scripts/nim6.script -ro,root=nim6:,access=nim6:

“If the exports are substantially different from these, then power off client node. On the nim master run:

nim -o reset -aforce=yes clientnode
nim -Fo deallocate -asubclass=all clientnode
exportfs -ua
edit /etc/exports, remove inappropriate entries
nim_bosinst the client node again

“You can also get this led 611 when nim master resolves client hostnames in /etc/hosts.

“Entries there should be ipaddr fullyqualifiedname shortname.”
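
In other words, each client entry looks like this (address and names invented):

10.1.1.50   nimclient1.example.com   nimclient1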

I went through these hints, reset the client, deallocated resources and then set up the client again. It still didn’t work. I went through /etc/hosts and tried a few permutations, and again, no luck.

At this point, I made a rookie mistake and changed a few different things at once. One of those changes fixed my issue; I’m just not sure which one. I do have it narrowed down to two possibilities. First, when I found this, I once again made sure that /etc/hosts had the correct information, and then I edited /etc/netsvc.conf and added hosts = local4,bind4.
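
In file form, that’s the single line below; local4 says consult /etc/hosts (IPv4) first, and bind4 says fall back to IPv4 DNS:

# /etc/netsvc.conf
hosts = local4,bind4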

That might have been the solution. Or, it could have come from this. The last entry in the thread says:

“Having spent the weekend making wild stabs in the dark I stumbled across the ‘solution.’ Please [provide] feedback as I’d like to understand what I’m doing.

“I used SMIT NIM/Perform NIM Administration Tasks/Configure NIM Environment Options. (This is sounding obvious, I know.) I then selected two options, the one that made the difference, I’m not sure. Maybe someone could point it out? Export NIM Resources Globally and Control Client Resource Allocation/Set Allocation Permissions for all clients.”

In any event, after making both of these changes, my NIM server was working again, and my mksysb booted and loaded as expected.

Interestingly enough, after I had this experience, one of my customers went through something similar. The customer addressed their issue by running these commands:

stopsrc -g nfs          # stop the NFS subsystem group
cd /etc
rm -rf rmtab xtab       # remove stale NFS mount and export state files
cd /var/statmon
rm -rf sm sm.bak state  # remove stale status monitor state
startsrc -g nfs         # restart the NFS subsystem group

Like many other things in AIX, there’s more than one way to tackle this issue. I’m not sure if there’s a preferred way. Perhaps some of you NIM experts could chime in on that in Comments. I’d love to hear from you.

Five Years In

Edit: Time sure flies.

Originally posted July 31, 2012 on AIXchange

Hard to believe, but it’s been five years since this blog debuted. Check the AIXchange archives and see for yourself.

Since the launch of AIXchange, I’ve written approximately 250 weekly posts — I don’t think we’ve missed more than a handful of weeks in five years. Sometimes I’m asked where I come up with the ideas and topics that I write about. The simple answer is I spend time at customer sites around the U.S., listening to the concerns of people working in their data centers. I talk to customers who are looking at different technologies and solutions. Sometimes I attend training or seminars and conferences.

I read technical manuals. I read other AIX blogs and follow AIX pros on Twitter. I respond to reader questions. In short, every day is a learning experience for me, and I try to share what I learn here.

I greatly enjoy doing this, and I plan to continue as long as you keep reading. Along those lines, what would you like to read about going forward? What information would help you deal with the challenges you face in your department and across your company? Are there topics I’ve overlooked? Are there things I’ve covered too much? Please share your views and ideas. That’s how it works, after all.

The Enduring Value of IRC

Edit: I still love IRC. Isn’t Slack just IRC with a GUI slapped on top of it?

Originally posted July 24, 2012 on AIXchange

If you’re old enough to remember Windows 3.11, you may recall the earliest days of IRC:

“Internet Relay Chat (IRC) is a protocol for real-time Internet text messaging (chat) or synchronous conferencing. It is mainly designed for group communication in discussion forums, called channels, but also allows one-to-one communication via private message as well as chat and data transfer, including file sharing.

“IRC was created in 1988. Client software is available for every major operating system that supports Internet access. As of April 2011, the top 100 IRC networks served more than half a million users at a time, with hundreds of thousands of channels operating on a total of roughly 1,500 servers out of roughly 3,200 servers worldwide.”

Back then, I spent considerable time on efnet and undernet IRC servers. I had a shell account running screen where I would run an IRC client, and connect to any IRC servers that I was interested in. Later on, when I worked at IBM, I took advantage of two IRC channels (#linux and #aix) that ran on the internal IBM network. It was a wonderful resource. When I couldn’t figure out something on my own, when I really needed help, I had quick and easy access to technical experts.

Would you like quick and easy access to technical experts? A few years ago I suggested that using IRC was one way users could keep current on AIX technology. Since writing that, social media has taken off — I know I’ve come to rely on Twitter. And of course today’s instant-messaging (IM) clients, with their video conferencing capabilities, are light years beyond IRC in terms of function. (And the IM evolution is certainly ongoing.)

Nevertheless, I stand by what I said in 2008: IRC is a valuable tool. It still is.

So get an IRC client. Try connecting to irc.freenode.net. Then try connecting to ##aix. Now, I recommend lurking awhile before you jump into the channel and blurt out all your (tech-centric) problems. The effort though is worth it, because ##aix can put you in touch with a world of talented AIX pros who are willing to help.

I like IRC’s immediacy. You’re not continually checking forums to see if someone replied to the question you posted. This is realtime communications. Keep in mind, however, that the inhabitants of ##aix are essentially volunteers. Don’t get impatient if you don’t receive an immediate response. Wait a few minutes. Wait a half hour if need be. This isn’t anyone’s full-time job. Just remember that you’re accessing a worldwide audience of techies. No matter the hour, someone is out there. Odds are you’ll get help if you just wait. And even if there’s no one who can address your specific issue, sometimes it’s just nice to exchange ideas with another person who can provide a perspective different from your own.

I’ve always liked the IRC environment. I find comfort in the old-school feel. I enjoy the off-topic conversations. I like the relationships that develop over time — and if you do stay connected to a channel and get to know folks rather than simply join and bolt once your question’s been answered, you will make some new friends.

If you’ve never used IRC, I encourage you to give this “old” technology a try. And if you have or do use it, please share your thoughts in Comments.

Using the HMC Scanner

Edit: I just recommended this tool the other day; I still love it. This video is a nice demo as well. Updated the first link; the second link is the old download link that will be going away.

Originally posted July 17, 2012 on AIXchange

I recently downloaded the latest version of the HMC Scanner tool:

“HMC Scanner is a Java program that uses SSH to connect to an HMC, downloads the system configuration and produces a single Excel spreadsheet that contains the configuration of servers and LPARs. The result is a simple way to document configuration and to easily look at most important configuration parameters.

“Information is organized in tabs:

* System summary: name, serial number, cores, memory, service processor IP for each server.
* LPAR summary: list of all LPARs by server with status, environment, version, processor mode.
* LPAR CPU: processor configuration of each LPAR.
* LPAR MEM: memory configuration of each LPAR.
* Physical slots: list of all slots of each system with LPAR assignment, description, physical location and drc_index.
* Virtual Ethernet: network configuration of each virtual switch and each LPAR.
* Virtual SCSI: configuration of all virtual SCSI adapters, both client and server.
* VSCSI map: devices mapped by each VIOS to partitions.
* Virtual fibre: virtual fibre channel configuration of client and server with identification of physical adapter assigned.
* SW cores: LPAR and virtual processor pool configuration matrix to compute the number of software licenses. Simulation of alternative scenarios is possible.
* CPU pool usage: easy to read history of CPU usage of each system. Based on last 12 months of lslparutil data.
* Sys RAM usage: easy to read history of physical memory assignment to each LPAR. Based on last 12 months of lslparutil data.
* LPAR CPU usage: easy to read history of CPU usage of each LPAR. Based on last 12 months of lslparutil data.”

After downloading and opening the .zip file, I opened a DOS prompt and went to the directory I chose when I unzipped the file. Then I followed these directions:

“Unzip the downloaded file and edit the hmcScanner.bat or hmcScanner.ksh in order to make the BASE variable point to the directory where the ZIP file has been decompressed.”
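
On the ksh side, that’s a one-line edit near the top of the script; the path here is made up:

# hmcScanner.ksh
BASE=/home/rob/hmcScanner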

Once I pointed the BASE variable to the directory containing my extracted files, I ran hmcScanner.bat and saw:

            HMC Scanner Version 0.3
            Missing or wrong arguments. Syntax is:

            hmcScanner.Loader <hmc> <user> [-p <password>] [-dir <path>]
                    [-perf <start> <end>] [-readlocal] [-key <file>] [-stats]

            <path> is the directory where data will be stored. Default is current directory.
            <start> and <end> are the data collection retrieval interval. Syntax is: YYYY MMDD
                    d=daily data samples; h=hourly data samples
            -readlocal will force reading of existing local data without contacting HMC
            -key will use OpenSSH private key (default $HOME/.ssh/id_rsa)
            -stats will produce system statistics

I tried running hmcScanner.bat hmchostname hscroot -p password (obviously using a real hostname and password). It ran for a few minutes and started to gather data. After it finished, it generated an .xls file that had summary information about my HMC-managed systems. (Sample here.)

The system summary displays the serial number, the number of cores installed and active, memory installed and active, service processor IP information, etc. 

The next tab is labeled LPAR Summary. It provides a view of the LPARs on the machine and whether they’re running or not. It also displays the OS version and the mode the processors are running in.

The LPAR CPU tab displays processor information including entitlement, weight, minimums and maximums, and whether the processors are shared or capped.

The LPAR MEM tab displays similar information for memory statistics.

The Physical Slots tab shows which LPARs are assigned which physical cards on the machine.

The Virtual Ethernet tab displays the virtual slot numbers, whether the network adapters are trunked together, the virtual Ethernet MAC address, the virtual switch it’s attached to and the VLAN ID.

The Virtual SCSI tab displays the slots that are set up, and which slots they’re attached to.

The VSCSI Map tab shows how disks — including LUN IDs, backing devices, etc. — are mapped. There are also Virtual Fibre and SW cores tabs.

The HMC Scanner is useful on its own, but when coupled with the HMC system plans you can generate from your HMC (select System Plans/Create System Plan if you don’t already run these in your environment), it provides some fantastic resources to help you document your system.

The Compatibility of VIOS and IBM i

Edit: I still recommend checking that the sun still occupies the sky once in a while.

Originally posted July 10, 2012 on AIXchange

“What do you mean you can’t see the disk?”

I was surprised. I’d just run mkvdev -vdev hdisk2 -vadapter vhost0 on my VIO server and mapped hdisk2 to an IBM i client, and the guy doing the install couldn’t see the disk I just mapped to his LPAR.

I wondered why I was having VIOS manage these disks anyway. Why not give him the physical adapters and let him go nuts? There were two good reasons to go VIOS. First, multiple IBM i clients would be using the internal disk. Second, the company wanted to be positioned for the future. They planned to attach the IBM i partitions to a SAN.

I was working with a large RAID5 array made out of 6x600G internal SAS disks. It was nearly 3TB in size.

How do you create a big RAID array out of internal disks? In the VIOS I was logged in as padmin, and I’d assigned the storage adapters to the VIOS LPAR in the HMC GUI. In this instance we only had one VIOS, so it was fairly straightforward.

I ran diagmenu from the $ prompt and got into my normal diag screens that I’m familiar with in AIX.  From there I hit Enter, then I ran Task Selection>RAID array manager>IBM SAS Disk Array Manager.

The next step was to “Create an Array Candidate pdisk and Format to 528 Byte Sectors.” I selected all of the relevant hdisks I was planning to use in the array and let diag format and then delete them as hdisks. They would then be recreated as pdisks. (This took a while, but I needed a break. I stepped outside to confirm that the sun still occupies the sky.)

Once the formatting/deleting/recreating was complete, I was able to “Create a SAS Disk Array” from within the IBM SAS Disk Array Manager menu. I chose RAID5 and my stripe size, and the new hdisk was created. Now we’re back at the point where I mapped the hdisk that my IBM i admin couldn’t see. So I unmapped the single large disk and carved it up into smaller logical volumes. I added hdisk2 into the datavg volume group using this command:

mkvg -vg datavg hdisk2

Then I made some smaller logical volumes to present to IBM i using this command:

mklv -lv disk1 datavg 500G

Once I’d carved up the logical volumes, I mapped them with this command:

mkvdev -vdev disk1 -vadapter vhost0

Lowering my logical disk sizes was the key. Once I did that and the remapping, all was well, and the IBM i admin was able to use the disk as presented.
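
To sanity-check the result, listing the mappings on the virtual host adapter shows exactly what the client should be seeing:

$ lsmap -vadapter vhost0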

While this is an older IBM Redbook, it has good information about IBM i and VIOS and disk limitations:

http://www.redbooks.ibm.com/redbooks/pdfs/sg247668.pdf

From Section 4.5.4: “The maximum available size of a logical drive for System i is 2 TB – 512 bytes. But for performance reasons we recommend much smaller sizes, as is described further in this section.”

I seem to do more and more with IBM i. I guess these shops are recognizing how well IBM i and VIOS work together. That’s a good thing for them, and it’s good for us AIX users, too. Anytime we can work together, we can learn from one another.

Helping Users Help Themselves

Edit: Still good stuff.

Originally posted July 3, 2012 on AIXchange

Someone recently shared with me a thread where IT pros lament end users’ lack of computer expertise. Even though end users have their own job responsibilities, you’d think that companies would hire people who at least have a basic understanding of e-mail or widely used business applications like Microsoft Office. However, that frequently isn’t the case.

Anyway, this lengthy thread generated an epic comment, which I’ll repost below. The commenter’s basic point is that while an automobile is an incredibly complex machine, when something goes wrong most car owners can at least explain their problem. With end users, again, that frequently isn’t the case:

“I’ve actually been using the cars analogy for a couple months now and I think it’s very fitting. Imagine if you were a mechanic who owned an auto shop and your average customer call went something like this?

“Customer: My car isn’t working and I need you to fix it immediately, this is an emergency.
Mechanic: Alright sir what seems to be the problem?
Customer: I don’t know, I tried to use my car on Friday and it didn’t work, now it’s Monday and I need to get to work and I can’t and this needs to be fixed right now.
Mechanic: Can you start the car? Can you even get into your car? Does it make any sounds when you try to start it? Are all four tires there?
Customer: I don’t know, I don’t know what any of that stuff means, I tried to get to work and it wouldn’t let me and you need to fix it now because you changed my oil six months ago.
Mechanic: Alright well what kind of car are you driving?
Customer: I don’t know, a green one, why does that matter?
Mechanic: Please take a look at the back of your car and see if there are any letters or numbers that would indicate a vehicle model or manufacturer
Customer: Ok, my car is a SV2 87K.
Mechanic: No sir that’s your license plate. My records indicate that you drive a Nissan Altima, can you confirm that the key you’re using to try and get into this car says Nissan on it?
Customer: My key says Lexus but I don’t see how that makes a difference, I’ve been using this key on this car for years and it’s always worked, what did you do to my car?”

Could you imagine your mechanic having to ask if you’re using the right key, or if you have tires? Could you imagine saying you didn’t know? Yet that sort of thing happens every day in the world of tech support. We ask customers — even end users — if the machine is plugged in. We really do ask if they’ve tried turning it off and on again.

It’s our job to help people, but the people seeking our help need to help us by providing basic information. I cannot count the number of times I’ve asked “what changed?” only to, quite honestly, be lied to. “I didn’t do anything,” I’ll hear. Then later the user admits to deleting files after reading somewhere that doing so would make his computer run faster. Which it could — unless of course you randomly delete some important system files.

Dealing with end users can be frustrating. Laughing at comments like the aforementioned car analogy is one way to deal with the frustration. But more important is to always keep in mind the user’s perspective. Sometimes I wonder if it isn’t that users are dumb, but they’re afraid of feeling dumb. Perhaps they don’t confess because they’ve been belittled for their computer ignorance in the past.

As I wrote back in 2009, no matter how difficult they might make our jobs, end users still deserve our respect:

“A co-worker of mine once snapped at a nurse when she had problems logging into her workstation. She responded by asking him if he’d like to come up the hall with her and fix an IV or administer some drugs. Touche. The nurse was just as knowledgeable and passionate about healthcare as my coworker was about technology. Working with computers was important, but it was only a small part of her job. She just needed to enter data and to print some reports. She didn’t care about drivers, passwords or proper startup/shutdown sequences. Once we showed her how to do what she needed to do, she was fine, and we didn’t hear from her again.”

Perhaps end users as a whole should understand computers more than they do. Remember though — few people are as passionate about technology as we are. And ultimately, we support businesses. These businesses don’t run the latest and greatest hardware because they’re geeked about feeds and speeds. They rely on computers to process data and solve problems. It’s our job to be friendly and helpful and, above all, help people help themselves.

Moving VIOS to Internal Disks

Edit: Link no longer works.

Originally posted June 26, 2012 on AIXchange

Recently, a client of mine had a VIO server that was booting from a SAN. Many shops boot everything from SANs, in part to avoid the hassles of working with internal disks. The flip side is that booting from internal disks preserves the ability to troubleshoot a system even if the SAN is down. If you boot from SAN and the SAN has an issue, the entire system is unavailable until the SAN is fixed. If you boot from internal disks, you can still log in and look at the errors in your error log, even if your remaining LPARs are unusable because they boot from SAN (or at least get their data from the SAN).

In this case, the VIOS had been built on smaller-sized SAN LUNs, and my client decided that the operating system should be moved onto larger-sized internal SAS disks. To accomplish this, I opted to add the disks to rootvg and then run the mirrorios command to mirror rootvg onto the internal drives.

However, when I tried to run extendvg rootvg hdisk52, I got this error:

Unable to add at least one of the specified physical volumes to the
volume group. The maximum number of physical partitions (PPs) supported by the volume group must be increased.  Use the lsvg command to display the current maximum number of physical partitions (MAX PPs per PV:) and chvg -factor to change the value.

*******************************************************************************
The command’s response was not recognized.  This may or may not
indicate a problem.
*******************************************************************************

*******************************************************************************
The command’s response was not recognized.  This may or may not
indicate a problem.
*******************************************************************************

extendvg: Unable to extend volume group.


Here’s the fix for this issue:

            -factor
            Changes the limit of the number of physical partitions per physical volume, specified by factor. factor should be between 1-16 for 32 disk volume groups and 1-64 for 128 disk volume groups.
            If factor is not supplied, it is set to the lowest value such that the number of physical partitions in the volume group is less than factor x1016. If factor is specified, the maximum number of physical partitions per physical volume for the volume group changes to factor x1016.

I aimed low, first trying chvg -factor 2 rootvg. When that didn’t work, I moved to chvg -factor 3 rootvg. Bingo. With the factor raised, extendvg was able to extend rootvg.

Once I was able to extendvg successfully, I ran mirrorios. That produces this prompt:


            This command causes a reboot. Continue [y|n]?

I chose yes. While I could have run mirrorios with the -defer option, I prefer to do the reboot right away so I know everything works as expected once the mirror operation completes.

Then I reminded myself of something else. I ran mirrorios hdisk52 hdisk53, and of course it did its job of spreading the copies across both disks. After running unmirrorios hdisk52 and unmirrorios hdisk53 to clean that up, I ran mirrorios hdisk52. Once this completed, I was able to unmirror hdisk14 (my SAN LUN), then mirrorios hdisk53. This left me with what I wanted — copies of rootvg on my internal SAS disks, with the copy on my SAN LUN clear so I could remove it from rootvg.
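
Condensed, the working sequence was the following. Each mirrorios prompts for a reboot, the disk names are from my environment, and the final reducevg is the cleanup step implied above:

$ mirrorios hdisk52        # mirror rootvg onto the first internal disk
$ unmirrorios hdisk14      # drop the copy on the SAN LUN
$ mirrorios hdisk53        # mirror onto the second internal disk
$ reducevg rootvg hdisk14  # remove the now-empty SAN LUN from rootvg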

What about your environment? Do you run VIOS on internal disks like my client, or do you boot from SAN?

Not-So-Crazy Ideas

Edit: What do I consider crazy now that I should give a second look?

Originally posted June 19, 2012 on AIXchange

I was in a meeting when a consultant suggested we adopt a storage area network solution for our servers. I was part of a team that laughed the consultant out of the room.

“What an idiotic solution,” we thought. “You want us to put all of our storage together in one piece of hardware? What if that machine goes down? Our whole environment will go down with it, and who knows how long it would take to untangle that mess?

“You want us to pay how much? Please leave. You’re clearly insane.”

This meeting took place in the mid 1990s. At the time we had AS/400 and Novell Netware servers, along with Windows, Sun, SCO and AIX machines. Everything in the computer room had internal disks.

With SANs still in their infancy, we may have been wise to hesitate. But fast forward some 15 years. Our initial fears have long since been addressed. Most SANs have built-in redundancy, so if a disk or controller goes down, the SAN keeps running, and the data is perfectly preserved. SANs are so prevalent now we pretty much take them for granted. Probably every person reading this either currently uses a SAN or has used one at some point.

Shortly after that long-ago storage meeting, we were presented with another crazy idea. This guy, a new hire into our organization, wanted to move our core switches to Fast Ethernet, running at 100 full. At the time we were perfectly happy running on an FDDI ring.

“What about network collisions?” we wondered. “Why would we go with Ethernet?”

Back then we were invested in FDDI. We put our budget and expertise into it. There was even an emotional investment. (Really, one coworker was so upset when we ultimately went with Ethernet, he left the company.)

Replacing most of our network interfaces and switches to accommodate Ethernet was expensive, but we realized that the times were changing, and we needed to change with them.

Of course, technology is full of stories like these. How many of you were DOS experts? Did you know your way around Windows 3.11, OS/2 or Netware? When was the last time you put your skills with line printers or reel-to-reel tapes to good use? What about configuring dot matrix printers or serial devices directly attached to AIX machines? I’m sure a few folks still do these things, but by and large much of this technology is gone. And by and large, we’ve smoothly adapted to what’s come after.

It’s human nature to fall into routines and resist change. But change is the nature of technology. It’s not all bad, either. Honestly, I believe one of the fun things about working in IT is being a part of this constant change, and seeing old assumptions crumble — even if those assumptions were sometimes my own.

A Look at IBM’s XIV

Edit: Are there many still out there? Some links no longer work.

Originally posted June 12, 2012 on AIXchange

Are you running IBM’s XIV disk storage system?

If you’re not familiar with this solution, Anthony Vandewerdt’s blog is a good starting point. Anthony, an IBM storage solutions specialist who’s based in Australia, has covered, among other things, XIV history, XIV Release 3.1 materials (including a really cool video) and the XIV Release 3.1 SSD read cache.

Anthony also explains how to load the latest XIV GUI (Version 3.01).

Relatedly, if you check the release notes, you’ll find this summary:

“IBM XIV Management Tools v3.0.1 contains all the utilities required to manage and monitor your XIV system: IBM XIV GUI for graphic management and control, IBM XIV Top for real time monitoring, and IBM XIV XCLI for CLI access and scripting.

“Version 3.0.1 offers support for IBM XIV Storage System Gen3, as well as enhanced GUI functionality and usability and internal updates for XIVTop and XCLI utilities. The Management Tools package can be used to manage all generations of the XIV system from a single console.”

The tools run on different versions of Windows, Mac OS, Linux, AIX, Solaris, and HP-UX. Check the readme for more details, and download the GUI. (Note: The relevant management tools and their fix packs are listed in the middle of the page under the Management Tools heading.)

In my case I selected the Windows version, and after my 300 MB download, I had a GUI to play with.

If you don’t have an XIV system, you can still get a glimpse of its functionality. There’s an option to select demo mode on your management console that allows you to simulate an XIV environment and its storage-provisioning features. (Note: Registration with IBM.com is required to download the management tools or fixes.)

So have you considered XIV for your shop?

A New User’s Take on the Command Line

Edit: I am not sure this has gotten any easier.

Originally posted June 5, 2012 on AIXchange

I recently worked with a customer whose environment is kind of interesting. Even though Linux is prevalent and he has a background of running Windows servers on VMware, there’s little — let’s call it traditional — UNIX hardware.

This customer, however, just installed  AIX on Power for the first time, and it was enlightening to hear his perspective on this leading-edge technology.

Given his long-term use of VMware, he found the virtualization concepts easy enough to grasp. On the other hand, installation and navigation of the HMC and AIX were new ground.

One of the first stumbling blocks came once we’d installed the VIO server. We were navigating around in the Korn shell after accepting all of the license agreements on the command line. Of course I do this all the time, so it’s second nature to me. That’s why I didn’t have a good answer when he wondered why better command history navigation and command completion options haven’t been implemented.

There were little annoyances. His term type wasn’t set properly. Sometimes the function keys worked, sometimes not. He had to manually type stty erase ^? to get his backspace key to work. Another source of frustration was that the F4 key would provide a pick list in smitty — sometimes. And sometimes not.  These little annoyances are easy to work around, provided you’ve been doing it for a while.

When I showed him how to use set -o vi and then navigate the command line like a vi session, I received a look of sheer incredulity. He asked why on earth he should need to know this obscure stuff. Couldn’t AIX just allow users to get their command history via the up arrow, as DOS 5 and DOSKEY users have been doing for years? He noted that every other command line he uses, including Cisco’s IOS, allows him to use the up arrow to go through his shell history and offers tab completion when he enters commands.

I told him he could load bash and other shells onto AIX for a more user-friendly command line experience. Of course, that doesn’t help him when he’s doing work as root or padmin using the Korn shell.

Some of his critiques were actually pretty funny. For instance, when I informed him that the esc-k combination allows him to go back in his shell history and edit previous commands, he jokingly told me that “esc k” in Spanish means “What did I just type?”  (As que — pronounced “k” in Spanish — means “what” in English, I can see how he would make that leap.) 

Other words he used to describe the AIX interface included “antiquated,” “primitive” and “painful.” He told me it wasn’t 1982 anymore. Then he suggested that perhaps this interface would pass for modern in North Korea.

Despite his light-hearted comments, the customer clearly wasn’t impressed with the out-of-the-box Korn shell command line experience. Although we are able to clean some of the issues up with entries in our .profile, it left me wondering about improvements that could be made to the command line.
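
For what it’s worth, the sort of .profile entries I had in mind look like this. It’s a sketch; the TERM value and history size are assumptions you’d adjust for your environment:

set -o vi             # vi-style editing and history recall at the command line
stty erase ^?         # make the backspace key behave
export TERM=vt100     # a terminal type that smitty's function keys tolerate
export HISTSIZE=5000  # remember more shell history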

We moved on to loading operating systems, virtual Ethernet, virtual storage and virtual optical devices. The complaining continued. Once I got to the NIM server — or, as he called it, “the secret of nimh server” — the customer had had enough for the day.

I accepted all his comments with a grain of salt. Even for IT pros, learning something new can be daunting. Since that first day he’s gotten more comfortable with the AIX environment. The rate of complaints has slowed.

But it’s interesting to think about. Imagine you didn’t have your years of experience with the command line and try to see things with fresh eyes. What kinds of crazy things do we do on a daily basis that we accept as perfectly normal? Does it extend beyond the command line? Are there features and functions  in AIX that would turn off users who don’t know the platform? Could any or all of these things truly be a barrier that keeps some customers from adopting AIX and Power solutions? Please share your thoughts in Comments.

Implementing a Shared Storage Pool

Edit: This is still a slick way to handle disk.

Originally posted May 29, 2012 on AIXchange

I wrote about shared storage pools (here and here) back in January. Recently, I had an opportunity to implement one with a customer.

We had two 720 servers, each of which had two VIO servers. We upgraded to the latest VIOS code, making sure our HMC and firmware were at current levels. Then we presented the LUNs from our SAN, following the steps outlined in my January posts.

First we made sure all the LUNs were set to no reserve.

            chdev -dev <hdisk> -attr reserve_policy=no_reserve
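
With several LUNs to prep, a loop saves retyping. This is a sketch: it assumes the padmin restricted shell lets you loop (it is ksh-based), and it names the repository and pool disks used below.

            for DISK in hdisk2 hdisk3 hdisk5
            do
                chdev -dev $DISK -attr reserve_policy=no_reserve
            done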

Then we created the cluster. While I’m giving the names referenced in Nigel Griffiths’ presentation (see my first post linked above), for the record, we used our own names.

            cluster -create -clustername galaxy -repopvs hdisk2
            -spname atlantic -sppvs hdisk3 hdisk5 -hostname bluevios1.ibm.com

With that accomplished, I could see that we had a cluster.

            cluster -list
            cluster -status -clustername galaxy

In our cluster, we used one 20G repository LUN and two 500G LUNs for our data.

The cluster -create command took a few minutes to run. On our first try we didn’t have fully qualified hostnames on our VIO servers, so we got an error when we tried to create the cluster. Changing the names was easy enough, and after that, the cluster was successfully created.

We ran the cluster -list and cluster -status commands again and got the output we expected. Then from the same node we ran the cluster -addnode command to add a second VIOS to our cluster.

            cluster -addnode -clustername galaxy -hostname redvios1.ibm.com

It took about a minute to add that node, and it was successful. We ran cluster -status again to confirm that the second VIOS was added.

One thing I liked about the process is that the output provides the node name, machine type and model information. This way it’s easy to determine which physical server each node is running on.

We did the same procedure for the next two VIO servers. This took a bit longer, likely because they were on another physical server. Still, at the end of the procedure the cluster -status command displayed all four VIO servers in the cluster. When we logged into each of the other VIO servers and ran cluster -status, we saw the same output.

(Note: Running lspv won’t tell you that the disks in your storage pool are in use, but the lspv -free command will give you this confirmation. This could be an issue if you were mapping the entire hdisk to a client LPAR — i.e., the “old” way. But because you’re not actually mapping hdisks directly, this isn’t necessarily a problem.)

To create a new vdisk to map to our client LPAR, we ran:

            mkbdsp -clustername galaxy -sp atlantic 16G -bd vdisk_red6a -vadapter vhost2

Once we had our disk created and mapped, we ran:

            lssp -clustername galaxy -sp atlantic -bd

That showed us that vdisk_red6a was in the cluster.

Then we ran this command to map it in vios2:

            mkbdsp -clustername galaxy -sp atlantic -bd vdisk_red6a -vadapter vhost2

If you compare the command that creates the vdisk to the one that maps it to the client LPAR, the only difference is that the create includes a size. Someone can tell me if there's an easier way to do it. For my own amusement I tried using the old mkvdev command. It didn't work.
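
If you do this a lot, the two steps could be wrapped up and run from a central server. A rough sketch, assuming password-less ssh as padmin; the vios1/vios2 hostnames are illustrative:

            # create the vdisk on one VIOS, then add the second mapping on its partner
            VDISK=vdisk_red6a
            ssh padmin@vios1 "ioscli mkbdsp -clustername galaxy -sp atlantic 16G -bd $VDISK -vadapter vhost2"
            ssh padmin@vios2 "ioscli mkbdsp -clustername galaxy -sp atlantic -bd $VDISK -vadapter vhost2"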

When we ran lsmap -all, we could see the same vdisk presented to the client from both VIO servers.

We then wanted to try live partition mobility using shared storage pools. This posed some problems, but searching on the error message we encountered (HSCLA24E) turned up this entry:

“This week we were trying to migrate some VIO hosted LPARs using XIV disk from one POWER7 system to another. The disk is hosted on a VIO server via the fabric, then using VSCSI devices to map up to the servers. Unfortunately the migration failed and the message we got was HSCLA24E: The migrating partition’s virtual SCSI adapter 2 cannot be hosted by the existing virtual I/O server (VIOS) partition on the destination managed system. To migrate the partition, set up the necessary VIOS hosts on the destination managed system, then try the operation again.

“So we did some searching and found the following:

“HSCLA24E error:         

1) On the source VIOS partition, do not set the adapter as required and do not select any client partition can connect when you create a virtual SCSI adapter (can cause the error code).
2) Max transfer size of the used hdisk may not be different on source and destination VIOS.
3) The VIO servers may not have set the reserve_policy to single_path, no_reserve is required.
4) Destination VIO servers are not able to see the disks the client needs.
5) The same VTD (virtual target devices) names may not exist on the destination system.”

In our case we addressed no. 1 by unselecting the “any client can connect” option and mapping to the specific client we were using. With these changes, we could successfully migrate the LPAR.

In the course of changing the adapters, we rebooted the VIO servers. Be patient when rebooting. It seems to take some time for the servers to restart and join the cluster. You'll know it's ready when the cluster -status command changes from "state down" to "state OK." (We joked that you only have to give it until "despair + 1.")

Also, be sure to run df and check the /var/vio/SSP/'clustername' filesystem that gets created on all the members of your cluster. That was a quick and dirty way for us to determine that our status was about to change to OK. As the cluster comes online, and as you run cluster -status, you'll see the filesystems mount and the status change from down to OK.
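
If you'd rather not keep rerunning that by hand, a crude loop like this (a sketch, using the galaxy cluster name from above) can watch for the state change from a shell on one of the nodes:

            # poll until the cluster reports an OK state
            until cluster -status -clustername galaxy | grep -w OK > /dev/null
            do
                sleep 30
            done
            echo "cluster state is OK"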

This initial build-out of shared storage pools offers some advantages. For starters, there are fewer, larger LUNs to present and zone to the VIO servers. With larger LUNs being carved up in the pool, there are fewer reserve policy settings and mkvdev commands to manage. Of course some would argue that this advantage is offset by the need to run mkbdsp commands on both VIO servers.

It's also nice being able to log in to one cluster node, create a vdisk and see that new vdisk show up on all four nodes, rather than having to log in to each VIO server separately. This just feels like a cleaner disk management solution.

As I continue to work with shared storage pools, I’m sure I’ll have more lessons to pass along. If you’ve been using this technology, please share your thoughts in Comments.

Restoring Old Data

Edit: I enjoy crazy projects like this one.

Originally posted May 22, 2012 on AIXchange

A customer was looking to restore some data from an old LTO1 tape. The tape was created in 2005 with versions of AIX and Tivoli Sysback that were common back then. Since the customer no longer had hardware that could read the tape, I was asked if I could help retrieve the data.

An LTO3 tape drive can read LTO1 tapes, so I figured I’d look for that hardware rather than try to scrounge up an LTO1 drive. I contacted IBMers Jeff Werner and Pete Dragovich. From their Chicago lab, they use all kinds of leading-edge and vintage hardware and software to conduct customer-initiated projects and training activities. For instance, they recently performed a simulated HMC update (from 5.2.1 to 6.1.3) for a customer that lacked its own test box. They’ve also recently done a couple of proof of concept (POC) projects, one to simulate failover with IBM’s PowerHA high availability solution and the other involving SLES 11 Linux on POWER6. I figured if anyone could help me, it’d be Jeff and Pete.

I picked up the tape from the customer and brought it to the IBM lab. We built a POWER4 server with an old version of AIX and loaded the Tivoli Sysback code along with the tape utilities we needed for our LTO drive.

After getting everything ready, we loaded the tape, ran /usr/sbin/readsbheader -Nd -f/dev/rmt0 -l, and waited for a list of information. Instead, we immediately got errors. The volume label could not be read. After searching online, we discovered we could run commands to move the tape around and try to read some tape labels, but then we got I/O errors.

Fortunately, the customer had made two copies of the backup tapes, so I picked up the second tape. At that point, I was told that they’d used an unusual block size. That turned out to be the key. We tried 0, 512, and 1024, but hadn’t thought about 262144. Once we changed the block size, we were able to restore the data and all went as expected.
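
For reference, the drive's block size is just a device attribute, so once you know (or guess) the value, the fix is a single chdev. A quick sketch, assuming the drive is rmt0:

            # set the tape block size to match what was used at write time
            chdev -l rmt0 -a block_size=262144
            # confirm the change
            lsattr -El rmt0 -a block_size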

With the data restored, we copied it to a Windows machine to burn a DVD copy of the data. The customer then loaded the DVD onto their own Windows machine, and sent it to their server via FTP.

Some takeaways from this project: If you’re archiving data, either keep the hardware and software that’s capable of reading it, or periodically transfer important old data to newer media. And if you use a different block size when writing data, be sure that’s documented—preferably on the tape itself.

I enjoy solving problems, and this was no exception. Now if I’m asked to restore an old Sysback tape, I know I can do it—provided Jeff and Pete are again willing to lend me a hand.

Tracking NPIV Connectivity

Edit: I still love handy scripts.

Originally posted May 15, 2012 on AIXchange

IBMer Glenn Lehman posted this script on a mailing list, and with his permission I'm posting it here. Glenn offered this introduction and description: "I search for various 4-digit IBM storage types. My example is coded to recognize 2810, 2107, 1750 (which translate to XIV, DS8300, DS6800).

“I share this as an intro… because prior to starting, our team was concerned with how we would keep track of all this configuration information, so I wrote and we now use my handy script. It gathers and parses and organizes all the info we wanted to track into neat columns. It’s all automatic, and all real-time.

“Our environment was all p570 frames — and strictly NPIV (no vSCSI). We have dual SAN fabrics and dual VIO servers per frame for redundancy and disk multi-pathing.

“Let me know if you find this useful, or if you make any changes or modifications so we can share it with others.”

He adds that the following should be customizable for those familiar with ksh scripting.

#!/bin/ksh
#set -x
#@============================================================================
# Script    :  Collect and display FC/Disk/SAN info for given VIOS pair
# Author    :  Glenn Lehman / IBM
#
# History   :  Created: Thu Jan 26 EST 2012
# Modify    :  Jan 30 2012 - add dph VIOS option
#           :  Jan 31 2012 - add ping test to confirm VIOS is active
#           :  Feb 06 2012 - add uniq date suffix to TMP filename
#           :  Feb 09 2012 - add disk type and count to output
#
# Usage     :  Invoke from remote LPAR with NPIV frame parameter
#=============================================================================
#
##########################  Assumptions  ################################
# There is a remote server that can access all VIO-servers & VIO-clients
#  and that remote server is where this script resides and invokes from
# There are 2 VIO-Servers in a given frame
# There is a distributive shell method configured in the environment
# The LPAR profile name is VIO-...
# Example: Profile=>  VIO-hostAix61prd
#          Hostname=> hostAix61prd
# Storage types are one of the following:
# XIV - 2810
# DS8300 - 2107
# DS6800 - 1750
# There are dual SAN Fabrics with unique domains, so that the FCIDs
# can pinpoint the fabric
# Example:
# 0x15 Fabric vs. 0x16 Fabric


##########################  Customization Step  #########################
# Assign the distributive shell command and usage options
#
DSH='dsh -w'

function Get_lsmap
{
## capture the full lsmap output for VIO1 ##
$DSH $VIOH1 "/usr/ios/cli/ioscli lsmap -all -npiv" | cut -f2- -d':' > $TMP.map1
## convert to line separated stanzas ##
cat $TMP.map1 | grep -v "^ $" | \
    awk '
    $1 == "VFC" {print $0 "\n \n"}
    $1 != "VFC" {print $0}' > $TMP.sep1

## capture the full lsmap output for VIO2 ##
$DSH $VIOH2 "/usr/ios/cli/ioscli lsmap -all -npiv" | cut -f2- -d':' > $TMP.map2
## convert to line separated stanzas ##
cat $TMP.map2 | grep -v "^ $" | \
    awk '
    $1 == "VFC" {print $0 "\n \n"}
    $1 != "VFC" {print $0}' > $TMP.sep2

cat $TMP.sep1 $TMP.sep2 > $TMP.map_all
return
}

function Show_Usage
{
echo "Requires a single valid parameter that denotes a VIOS pair using npiv mapping..."
echo
echo "Usage: $(basename ${0}) < ae2 | b12 | e40 | 88f >"
echo;echo
return
}

function Show_Header
{
echo "\t========================================================================================="
echo "\t  $FRAME virtual (fscsi) mapping - $(date)"
echo "\t========================================================================================="
echo "                                          VIOS                 VIOC"
echo "VFCHOST    ID#  Partition            Server,Slot,Phys     Slot,Virt  Virtual WWPN used  Disks    FC/SAN Status"
echo "----------  ---  -----------------    -------------------  ---------  ------------------  ------  ----------------------------"
return
}

function Ping_VIOS
{
## Verify VIOS pings OK from this server
VIOS=$1
/etc/ping -c 2 $VIOS  > /dev/null
PING_RC=$?
if [[ $PING_RC != 0 ]]; then
  echo "ERROR: Remote server ping from nim0 to $VIOS failed.... aborting."
  exit 99
fi
return
}


# ------ MAiN PROGRAM BEGiNS HERE ------ #
if [[ $# -ne 1 ]]; then
  Show_Usage
  exit 99
fi

## Define pretty-print variables
typeset -L11 VFCHOST
typeset -R2  PLPID
typeset -L20 PLPNAME
typeset -L21 SFCINFO
typeset -L10 CFCINFO
typeset -L8 DISKINFO
typeset -L19 PWWPN

VIOPAIR=$1

##########################  Customization Step  #########################
# Assign Server Frame keyword shortcuts for custom parameter input;
# add these to the Show_Usage "Usage" line
# Assign the various VIO-server hostnames and the Server Frame name
# for each "shortcut"
# Example: Use ae2 as shortcut for Server-9117-MMA-SN10D1AE2
#
case $VIOPAIR in
  ae2 | AE2 ) VIOH1=lbvio1-ae2; VIOH2=lbvio2-ae2; FRAME='Server-9117-MMA-SN10D1AE2';;
  b12 | B12 ) VIOH1=lbvio1-b12; VIOH2=lbvio2-b12; FRAME='Server-9117-MMA-SN10D1B12';;
  e40 | E40 ) VIOH1=lbvio1-e40; VIOH2=lbvio2-e40; FRAME='Server-9117-MMA-SN1014E40';;
  88f | 88F ) VIOH1=lbvio1-88f; VIOH2=lbvio2-88f; FRAME='Server-9117-MMA-SN106988F-NDev';;
  * ) echo "VIOS parm - $VIOPAIR - is invalid; aborting..."
      Show_Usage
      exit 99
esac

TMP="/tmp/get_vfchosts-$(date +%s)"

## confirm VIOS pair is active and ping-able ##
Ping_VIOS $VIOH1
Ping_VIOS $VIOH2

## collect and reorganize the lsmap output ##
Get_lsmap

Show_Header

## build sorted list of vfchosts ##
grep vfchost $TMP.map1  | awk -v vioh=$VIOH1 '{print $0"  "vioh}' > $TMP.vfchost1
grep vfchost $TMP.map2  | awk -v vioh=$VIOH2 '{print $0"  "vioh}' > $TMP.vfchost2
cat $TMP.vfchost1 $TMP.vfchost2 | sort -tt -n -k 2,2 > $TMP.vfchost_all

cat $TMP.vfchost_all | while read VFCHOST PHYLOC PARTID NAME1 NAME2 VIOS; do
    PLPID=$PARTID
    LPNAME=$(echo ${NAME1}${NAME2})
    if [[ $LPNAME != "VIO-"* ]]; then
      PLPNAME='unassigned'
      HOSTNAME=''
      VIOS=$NAME1                      # shifts position if less fields in line
    else
      LPNAME=${LPNAME%AIX}
      HOSTNAME=${LPNAME#VIO-}
      PLPNAME=$LPNAME
    fi
    SSLOT=$(echo $PHYLOC | awk -F- '{print $NF}')
    PFC=$(grep -wp $PHYLOC $TMP.map_all | grep -w FC | awk '{print $2}' | cut -f2 -d':')
    if [[ $PFC != "fcs"* ]]; then
      PFC='none'
    fi
    SFCINFO="$VIOS,$SSLOT,$PFC"
    CSLOT=$(grep -wp $PHYLOC $TMP.map_all | grep VFC | awk '{print $NF}' | awk -F- '{print $NF}')
    if [[ $CSLOT != "C"* ]]; then
      CSLOT='undefined'
    fi
    VFC=$(grep -wp $PHYLOC $TMP.map_all | grep VFC | awk '{print $3}' | cut -f2 -d':')
    if [[ -n $HOSTNAME ]]; then
      $DSH $HOSTNAME "lsdev -c disk" | cut -f2- -d':' > $TMP.disk
      DSKCNT=$(grep -c MPIO $TMP.disk | awk '{print $1}')
      ##########################  Customization Step  #########################
      # Assign the possible remote disk types based on their 4-digit
      # machine type
      #
      if ( grep -w 2810 $TMP.disk > /dev/null ); then
        DISKTYP='XIV'
      elif ( grep -w 2107 $TMP.disk > /dev/null ); then
        DISKTYP='DS83'
      elif ( grep -w 1750 $TMP.disk > /dev/null ); then
        DISKTYP='DS68'
      else
        DISKTYP='unkn'
      fi
      DISKINFO="$DISKTYP($DSKCNT)"
      $DSH $HOSTNAME "fcstat $VFC | grep 'Port'"  | cut -f2- -d':' > $TMP.fcst
      WWPN=$(grep 'World Wide Port' $TMP.fcst | awk '{print $NF}')
      FBRC=$(grep 'Port FC ID' $TMP.fcst | awk '{print substr($NF,1,4)}')
      TYPE=$(grep 'Port Type' $TMP.fcst | awk '{print $NF}')
    else
      WWPN='no active flogi  '
      FBRC='....'
      TYPE=''
      DISKINFO='unknown'
    fi
    PWWPN=$WWPN
    CFCINFO="$CSLOT,$VFC"
    PCNT=$(grep -wp $PHYLOC $TMP.map_all | grep Ports | cut -f2 -d':')
    print "$VFCHOST $PLPID  $PLPNAME $SFCINFO  $CFCINFO  $PWWPN $DISKINFO $PCNT ports logged in $FBRC $TYPE"
done
echo

## Be polite and clean up after yourself ##
rm -f $TMP.*
exit

In the process of posting this script to the blog, some of the formatting may be altered. Make the logical adjustments as need be. And if you have your own handy scripts or tools, please send them to me and I’ll share them here.

Another Grab Bag

Edit: Some links no longer work.

Originally posted May 8, 2012 on AIXchange

As I’ve noted before, I love passing along tips and tricks. And I love hearing IT horror stories. A little of both this week:

* First, here’s an email I got from someone who changed the xfer_size option on his machine. In his words:

“I found this blog post a day or two too late. I had tried some AIX fibre card tuning on my Domino servers, which consist of two physical and two VIO NPIV virtual adapters.

“I meant to only change the real cards to 200000 for the xfer_size option, but I had changed the virtual adapters as well, and rebooted. The LPAR hung at LED code 554, and I had to mount OS disk in maintenance mode to mess around with it. This allowed me to undo my change and rmdev the OS disk and paths, to get the LPAR back.

“FYI in case anyone tries this in the future, hopefully they will learn from my mistakes. I am not sure if this is an XIV or an NPIV issue, but I would advise people to not mess with the NPIV xfer_size settings, especially for their root disk.”

* I found this item on a mailing list (from Phil L.) and feel it’s worth sharing:

“[VIOS 2.2.1.3 introduced] clustering… which is controlled by System Director. System Director automatically starts snmpd, so even if you have it disabled with the viosecure commands it will still start via System Director. The workaround:

            dirsnmpd (Systems Director) is started from:

                /opt/ibm/icc/cimom/bin/startdirsnmpd

            To inhibit dirsnmpd at bootup:

            Edit the startdirsnmpd script

            Comment-out:

                # /usr/bin/startsrc -s snmpd > /dev/null 2>&1

“IBM Support is considering modification of the viosecure rules.”
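
If you'd rather script that edit than make it by hand, something along these lines should work; the path comes from the quote above, so verify it on your own system first:

            # keep a copy of the original, then comment out the snmpd start line
            cd /opt/ibm/icc/cimom/bin
            cp startdirsnmpd startdirsnmpd.orig
            sed '/startsrc -s snmpd/s/^/# /' startdirsnmpd.orig > startdirsnmpd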

* I saw this in a recent e-mail:

“We have two VIO servers that need to be updated from version 1.4.1.2-FB-9.2 to version 2.2.0.13-FP24 SP03. We did update the two VIO servers in our second data center (so it was a non-production environment). The problem was that we did it incorrectly.

"We put in the migration CD and ran 'updateios' instead of booting off the CD and running 'migrateios.'

“We had to rebuild the whole environment from scratch. We definitely want to avoid that in the production environment!”

Me now. All of you have a test lab to try things out first, right?

"I did have a backup of the VIO servers (created via the command 'ioscli backupios -file … -mksysb,' run as padmin) but we were unable to recover from that backup.

"We worked with IBM Support and they still were not able to recover from our backup so we had to do a fresh install of the VIO server software and rebuild the environment from scratch, and then I had to recover each LPAR (12) from their mksysb's. It was a 40-hour weekend!

Me again. All of you know your backups are good and you’ve tested your recovery process, right?

“In fairness, a lot of that time was actually waiting for CDs to spin. I think if the migration had been done correctly there would only have been a handful of commands that actually needed to be run.”

IBM Sticks with the HMC

Edit: The HMC has still survived. Some links have not.

Originally posted May 1, 2012 on AIXchange

So the SDMC evolution was upon us. I took my test drive. But then, just like that, it’s over. The HMC has apparently survived.

Nigel Griffiths (mr_nmon) posted these tweets:

  • “SDMC withdrawn as IBM listens to customers. SDMC only functions like dual VIOS+AME for Blades go in the next HMC version. Long live the HMC.”
  • “I had better add: SDMC owners needing Blade functions it is supported for 2 more yrs & you can convert to HMC (same HW) at your convenience.”

What exactly is going on?

Start with these announcement letters (here and here). IBM is no longer selling SDMC units. The SDMC hardware indicator is going away. The SDMC and SDMC virtual appliances are also going away, although they’ll be supported through April 2015.

IBM has announced this statement of direction:

“IBM intends to enhance its systems management capabilities for Power Systems hardware as follows:

  • Continued integration between the base platform management capabilities in HMC and advanced capabilities in IBM Systems Director
  • Enhancement of the HMC to add support for Power Systems blades and mixed rack and blade server environments
  • New HMC virtual appliance offering
  • New process for transitioning from SDMC to HMC
  • Improved usability”

I, for one, look forward to running the HMC appliance in smaller environments.

This FAQ includes the following, but read the entire document as I didn’t copy everything:

“Why is this change being made?

“Clients have asked IBM to continue enhancing the HMC, which is a trusted, secure and dedicated management appliance for Power Systems. At the same time, due to the rapid increase in adoption of advanced virtualization and cloud solutions by Power Systems clients, IBM plans to continue enhancing the virtualization management capabilities of the Systems Director management server software.

“What are the options for clients using an SDMC?

“Clients currently using an SDMC for management can either convert it to an HMC (at no charge) immediately, or continue using the SDMC and convert it at a more convenient time in the future. The SDMC will continue to be supported by IBM until April 2015.

“What are the options for clients using Power blades?

“Clients currently using an SDMC for managing Power blades can continue using it or switch to using IVM instead. When the new release of the HMC (featuring blade support) is available, they can then convert the SDMC to an HMC at no charge.

“Are hardware changes required to convert an SDMC to an HMC?

“The SDMC is based on the same hardware as the HMC, and no hardware changes are required to make the conversion. After converting an SDMC to an HMC, clients can benefit from the additional memory and disk capacity.

“Will endpoint management licenses be required for the HMC?

“No.

“What happens to support agreements after clients convert an SDMC into an HMC?

"SDMC clients with current SWMA contracts may contact IBM Support for assistance before, during and after the HMC conversion process. When their SWMA is due for renewal, clients who have converted an SDMC to HMC should renew using the HMC Machine Control Program Remote Support Agreement (MCRSA).

“How can a client using SDMC to manage Power blades transition to using IVM?

"As an alternative to the SDMC, clients can manage Power blades with the IVM. However, the IVM has limitations (compared to the SDMC and HMC) and cannot support dual VIOS configurations."

This IBM Redpaper covers SDMC-to-HMC migrations.

Basically you prepare for the migration by gathering IP addresses, verifying passwords, backing up profiles and then removing managed systems and frames from the SDMC. You download the HMC code and service packs (or order it on media) and then install the HMC code. I understand IBM will soon come out with video demonstrations of this process.

Keep in mind that none of this affects the FlexSystems Manager that will be used with the newly announced IBM PureFlex systems. I’ll get into that topic soon.

So what are your thoughts about this change? Had you done much with the SDMC, or were you taking a wait and see attitude?

The Connection to Storage

Edit: Still good information.

Originally posted April 17, 2012 on AIXchange

In case you’re wondering why this server blog just published a post about storage, it’s simple: Without storage, servers don’t have anywhere to read and write their data. Many of us server admins do have some knowledge of storage, but many more do not. Understanding the differences between storage technologies is important. It can help us when we need to discuss our options with our storage friends.

Anyway, back to Norman Bogard’s storage webinar. By the way, although I wish everyone had access to this training, it was provided exclusively for IBMers and IBM Business Partners. The content is not available online, which is why I’m posting this information (with Norman’s permission).

I’ll continue with a rough summary of how he compares and contrasts network-attached storage and storage area networks. We’ll start with SANs:

“Block-level storage devices and SANs like IBM V7000, IBM DS8000 and IBM XIV provide access to equal sized blocks of storage, and the blocks are found by block numbers on a device. All read and write operations are performed on data blocks – mainly using the SCSI protocol. Block services are segmented into LUNs or vdisks, and you might usually have a few dozen of them.

"On the other hand, NAS devices like N series, SONAS or V7000 Unified provide access to files. These files are found by a name within a tree of names: read, write, create, delete and many more. CIFS, NFS, FTP and other protocols are used to access these files. Device services are exposed as exports, directories and files, and in this case we might be accessing a few hundred, or possibly even millions or billions of files.

“Your NAS may be connected to hundreds or thousands of client machines. Authorization is handled by user IDs for reads, writes and meta-data operations.

“Who owns the filesystem? With direct-attached storage, it’s a simple case. The storage lives on the server. Think of regular SCSI disks and expansion drawers filled with disks, or old SSA drawers of disks.

“With a SAN, the server still owns the files and it controls how the data is written to the disk, even though those disk arrays might be on a disk subsystem instead of internal disk. NAS, on the other hand, handles the filesystems and the files and just gives you access to them after you have authenticated.

“With converged, or unified, storage, there are two fundamental approaches to intermixing block and file storage within a single system. IBM’s N series uses block on file. A device file with a logical unit number (LUN) assigned to it is stored within the file server’s WAFL (write anywhere file layout) file system and then mapped to a host. File and block data are stored within the same file system.

“IBM’s Storwize V7000 Unified uses file on block instead of WAFL. A raw device from the V7000 is mapped to hosts. File data is contained within discrete devices. Host block data is contained within discrete devices. File and block data are stored independently.

“Based on your application type, rules of thumb can help you decide whether NAS or SAN makes more sense in your environment.

"Applications and data types that typically reside in block stores or SAN include RDBMS (Oracle, SQL Server, DB2), analytics (stream processing), OLTP, metadata layers (component of content management), e-mail (MS Exchange, Lotus Notes) and virtualization stacks (VMware: VDI, VMDK implementations, HyperV; Citrix Xen).

"Applications/data types that typically reside in files or NAS include rich media (pictures, videos, seismic data, medical imaging, etc.), VOD, AOD, IPTV analytics (SAS grid), enterprise content management (ECM, e.g., web stores), research data sets, user files (documents, etc.), product lifecycle/data management (PLM/PDM) and virtualized environments (VMware client-driven deployment).

“Another consideration is how you backup these different environments. With NAS you typically have consistent snapshots since the files and filesystems are consistent on the NAS device. Replication is supported, and NAS usually integrates with backup software.

“With a SAN, integration with the host file system is needed to ensure consistency. Many times backups are moved through a master media server to disk or tape. Replication is supported once the file system is consistent.”

The webinar also covers file systems, file shares, network services, authentication and authorization, quota, data availability, data protection using snapshots, backups and replication, antivirus support and file cloning.

I love education in this format. This webinar takes a big concept like storage and breaks it down into easy-to-comprehend descriptions. We may work on servers, but servers connect to storage. That's why learning about storage is worth our effort.

Valuable Insight into Storage

Edit: It amazes me how much further we have come since I wrote this.

Originally posted April 10, 2012 on AIXchange

I recently took some online training that I found interesting and valuable. The webinar presenter, Norman Bogard, compares and contrasts network-attached storage (NAS) and storage area networks (SANs). (In his presentation Norman acknowledges Brett Cooper and Nils Haustein for their input, so I want to be sure to mention them here.)

When I think of early iterations of direct-attached storage, like SSA disks or regular internal SCSI disks, it amazes me how far we’ve come. Advances have been brought not just to storage hardware, but to the network infrastructure. Networks and switches are so much faster and more robust now.

The webinar opens with a review of the history of NAS and SAN and protocols like NFS, NCP, SMB and CIFS. Then some terminology is introduced. I’m paraphrasing and borrowing some of the language from the slide deck since Norman did such a good job of compiling the material:

“SAN, or block storage, will leverage Small Computer System Interface (SCSI) commands to read-write specific blocks. Common SCSI access methods include Fiber Channel (FC), Internet Small Computer System Interface (iSCSI), or InfiniBand (IB). InfiniBand is a high speed network interconnect.

“NAS, or file storage, reads and writes files instead of blocks. The NAS has control of the files, contrasted with a SAN where the server would have control over the files.  A file server is a storage server dedicated (primarily) to serving file-based workloads.

“A NAS gateway is a server that provides network-based storage virtualization. It provides protocol translation from host-based CIFS/NFS to Storage Area Network (SAN) based block storage. Examples of NAS gateways are IBM N series & SONAS; NetApp V Series; EMC VNX/Celerra; OnStor (LSI); HP P4000 Unified Gateway.

"Unified Storage is a single logical, centrally managed storage platform that serves both block (FC, iSCSI, IB) and file-based (CIFS, NFS, HTTP, etc.) workloads. Examples of Unified Storage include IBM N series; NetApp V series; IBM Storwize V7000 Unified.

“When you compare NAS and SAN, you will find that they have similar concepts. For example, your redundancy for your SAN will come from your MPIO or SDD drivers, while redundancy for the NAS will come from teaming or trunking your network ports for resiliency or improved bandwidth, depending on how you have set things up.

“Your security for a SAN will come from LUN Masking and Zoning, while you would control access on the network the same way you always would, with things like VLANS, exports, and shares.

“Your physical connectivity to the SAN would come via the HBA, while your network traffic for the NAS would go out the same NIC that it always has, at least until converged network adapters become more widely deployed. Once we have converged adapters, all of the traffic will be network traffic, although you will then be dealing with more encapsulation of the different frames and protocols.

“Your underlying protocol on a SAN is SCSI, while you use the same IP/UDP protocols that you do with networking when you use NAS.

“You will call your SAN devices arrays, and you will present LUNs, while your NAS will have filers and data movers. You will have structured/relational data on a SAN, and unstructured data on a NAS.”

From there, Norman contrasts the concepts of block storage (SAN) and file storage (NAS). I’ll share more about this webinar in next week’s post.

The Importance of DR Testing

Edit: Taking a backup without actually testing you can recover is really just as good as making a wish. Link no longer works.

Originally posted April 3, 2012 on AIXchange

Recently my customer wanted to see if its old, unsupported application could be recovered in an emergency. They were running AIX 5.2, and I was cloning that to another piece of older hardware for a disaster/recovery test. While the customer had been taking mksysb and application backups for some time, this was the first actual attempt to recover the system.

After restoring the mksysb to the target machine, we went to run the vendor's built-in scripts to recover the database. It turns out that all of the data and binaries we needed to run the recovery operation were on datavg instead of rootvg.

It also turned out there was no datavg backup, only the database backups. This became the first item to address, ensuring we had a savevg of datavg. In this case the customer had 135 logical volumes that had to be recreated, some jfs2 and some raw.

The customer really wanted this cloned machine to be identical to the source machine, down to the PP size, but no way was I going to recreate 135 logical volumes manually. So I went ahead and did a savevg from the source machine. Had this been an actual DR situation, we would have already been in trouble had the LV information not been stored somewhere. (Hint: Besides backups, it may be handy to have output from important files like /etc/filesystems available to you in case of an emergency.)

When I tried to restore the information from the savevg and remake the volume group

(smitty/system storage management/logical volume manager/volume groups/remake a volume group),

it kept coming up with a 512MB PP size instead of the 64MB PP size I was inputting. Even when I tried it from the command line (restvg -f /dev/rmt0 -r -n -P 64 hdisk4), it'd still create the 512MB PP size.

However, since I still had the source system, it was a simple matter of taking the logical volume information from the source volume group and copying it to /tmp:

lsvg -l datavg > /tmp/datavgout.file

Because I only cared about the first and third columns of the lsvg output, I ran this command to obtain the LV name and LV type:

cat /tmp/datavgout.file | awk '{print $1, $3}' > /tmp/datavgout2.file

I created the datavg on the target system manually with the 64 MB PP size. Then I edited the datavgout2.file and made sure it had the correct LV name, LV type and the number of PPs that I wanted on the target machine. To read the file and create the LVs, I ran this simple loop:

cat /tmp/datavgout2.file | while read i j k
do
mklv -t $j -y $i datavg1 $k
done

($i is the name, $j is the type and $k is the number of PPs.)

I did end up using smitty (smitty/system storage management/logical volume manager/volume groups/restore) to restore the files in the jfs2 filesystems.

Once the volume group had been recreated and the necessary files were restored, I could use the database backup tape to restore the database.

The customer now takes a daily savevg of datavg, and all of the necessary LV information from the entire system is saved as part of the rootvg backup. In the end we were able to get a running system. Even more important, we learned something. Without going through this exercise, my customer may have been missing key information and data it needed to restore the system in an actual disaster.
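
A root cron entry is a simple way to make sure that savevg actually runs every day. A sketch with a hypothetical backup path; the -i flag regenerates the volume group data files (via mkvgdata) before the backup is taken:

0 2 * * * /usr/bin/savevg -i -f /backups/datavg.savevg datavg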

This is why simply having a DR plan isn’t enough. Your plan must be tested. Even if you think you have good backups, it might not be the case.

Incidentally, Anthony English recently wrote about recovering datavg filesystems as well. He discusses using the mkvgdata (to capture the volume group structure) and restvg commands. His information is worth considering if you find yourself recreating a volume group. It’s certainly simpler than going through the gyrations that I went through, though I’ll still need to test it to see if Anthony’s approach will eliminate my customer’s PP issue.
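
For completeness, here's roughly what that approach looks like; whether the -P flag behaves any better with an image created this way is exactly what I still need to test:

mkvgdata datavg                     # capture the volume group structure
savevg -f /dev/rmt0 datavg          # back up datavg, structure included
restvg -f /dev/rmt0 -P 64 hdisk4    # restore, requesting a 64MB PP size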

VIOS and IBM i

Edit: Some links no longer work.

Originally posted March 27, 2012 on AIXchange

Two questions for IBM i shops: Are you reluctant to use a VIO server and attach it to your SAN, even though your SAN isn’t supported directly by IBM i? Do you end up telling yourself that internal disks give you better performance?

If so, this document might help alleviate your fears.

It covers different topics related to IBM i virtualization and open storage, including how to use an IBM i partition to host another IBM i partition:

“An IBM i 6.1/7.1 LPAR can host one or more additional IBM i LPARs, known as virtual client LPARs. Virtual client partitions typically have no physical I/O hardware assigned and instead leverage virtual I/O resources from the host IBM i partition. The types of hardware resources that can be virtualized by the host LPAR are disk, tape, optical and networking.”

There’s also this about IBM i using open storage as a client of the VIOS:

“IBM i virtual client partitions can also be hosted by VIOS. VIOS is virtualization software that runs in a separate partition with the purpose to provide virtual storage, optical, tape and networking resources to one or more client partitions. The most immediate benefit that VIOS brings to an IBM i client partition is the ability to expand its storage portfolio to use 512-byte/sector open storage. Open storage volumes (or logical units, LUNs) are physically attached to VIOS through a FC or a Serial-attached SCSI (SAS) connection and then made available to IBM i. While IBM i does not directly attach to the storage area network (SAN) in this case, as soon as open storage LUNs become available through VIOS, they are managed the same way as integrated disks or LUNs from a directly attached storage system and run IBM i on a Power blade.”

Finally, something about blades:

“The third major virtualization enhancement with IBM i 6.1 is the ability to run an IBM i LPAR and its applications on a Power blade server, such as IBM BladeCenter JS12 or JS22. Running IBM i on a Power blade is beyond the scope of this paper. Refer to the IBM i on a Power Blade Readme First for a complete technical overview and implementation instructions.”

The document covers supported configurations and concepts to help you visualize what I’m proposing. I’ll highlight this section on 5.2 performance:

“When creating an open storage LUN configuration for IBM i as a client of VIOS, it is crucial to plan for both capacity and performance. As LUNs are virtualized for IBM i by VIOS instead of being directly connected it may seem that the virtualization layer will necessarily add a significant performance overhead. However, internal IBM performance tests clearly show that the VIOS layer adds a negligible amount of overhead to each I/O operation. Instead, the tests demonstrate that when IBM i uses open storage LUNs virtualized by VIOS, performance is almost entirely determined by the physical and logical configuration of the storage subsystem.

“The IBM Rochester, MN, performance team has run a significant number of tests with IBM i as a client of VIOS using open storage. The resulting recommendations on configuring both the open storage and VIOS are available in the latest Performance Capabilities Reference manual (PCRM).”

I find more customers are willing to give VIOS a try. I've yet to find one that decided to switch back because performance was unacceptable.

I realize that this blog’s readership is very AIX-centric, but plenty of shops run Linux and IBM i as well. It’s nice to know that the frame that you’re virtualizing with VIOS to run AIX can run other operating systems as well. Not that this is a new idea.

Automatically Changing IP Addresses in a D/R Environment

Edit: This is still an interesting idea.

Originally posted March 20, 2012 on AIXchange

I recently spoke to a customer that has its primary and backup servers in different locations. The customer boots from a SAN, with the SAN replicating from site 1 to site 2. In the event of a disaster, the customer wants to fire up its site 2 LPAR from the replicated copy of rootvg. However, the networks are also in different locations.

Rather than do some admin kung fu to allow each network to have the same IP address when it boots up, the customer sought the capability to easily change the IP address depending on the frame being used to boot the LPAR. The customer says this functionality is available in VMware’s recovery management product, and wanted to know if the same type of thing can be done from the HMC.

I checked with a couple of my IBM contacts to see if they had any ideas. Chris Gibson had a good one.

“You could write a script that lives on the source system. It checks the system ID (lsattr -El sys0 -a systemid) when the LPAR boots. And if it’s a particular system serial number, it could bring up the interface with a different IP address.”

I forwarded this suggestion to the customer, and literally within a day their script was working. With the customer’s permission I’m sharing it here, along with their caveat:

“It works … the script is pretty rough. I’m no shell script expert by any means, but it does what I need it to do. I have it in /etc/inittab right before the rctcpip stuff, and that seems to work fine.”

Before trying to use this script, make sure your domain, nameserver, gateway and primary and backup server information are accurate for your environment. Of course you might be able to simplify or improve what’s here, but this should help you get started. Also note that in the process of posting this script to the blog, some of the formatting may be altered. You savvy scripters should move things around if need be.

#!/bin/ksh
# This script checks to see whether the system is booting off hardware
# at the primary or backup site and sets the IP, gateway, and name
# server based on what hardware it is booting from

# check to see which hardware is booting
OPTION=`lsattr -El sys0 -a systemid -F value`
IPADDRESS=`lsattr -El en0 -a netaddr -F value`
HOST=$(hostname)
DOMAIN="mydomain.com"
PRIMARY="IBM,123"
BACKUP="IBM,456"
NAMESERVER="10.9.0.1"
GATEWAY="10.9.16.1"
# set the primary and backup IP to the correct subnet (xx.1 for primary, xx.2 for backup)
PRIMARYIP=`echo $IPADDRESS | awk -v pos=4 -v repl=1 '{print substr($0,1,pos-1) repl substr($0,pos+1)}'`
BACKUPIP=`echo $IPADDRESS | awk -v pos=4 -v repl=2 '{print substr($0,1,pos-1) repl substr($0,pos+1)}'`
BACKUPGW="10.9.16.1"
BACKUPNS="10.9.30.32"

echo "Host Hardware: $OPTION"
echo "Current IP: $IPADDRESS"
echo "Primary IP: $PRIMARYIP"
echo "Backup IP: $BACKUPIP"

if [ "$OPTION" = "$PRIMARY" ]
then
        echo "Running from primary site"
        if [ "$IPADDRESS" = "$BACKUPIP" ] ; then
                echo "Setting IP for primary location"
                /usr/sbin/mktcpip -h $HOST -a $PRIMARYIP -m 255.255.255.0 -i en0 -n $NAMESERVER -d $DOMAIN -g $GATEWAY -A no -t N/A
        fi
fi
if [ "$OPTION" = "$BACKUP" ]
then
        echo "Running from backup site"
        if [ "$IPADDRESS" = "$PRIMARYIP" ] ; then
                /usr/sbin/mktcpip -h $HOST -a $BACKUPIP -m 255.255.255.0 -i en0 -n $BACKUPNS -d $DOMAIN -g $BACKUPGW -A no -t N/A
        fi
fi

As always, I love getting reader questions and submissions. That includes scripts. Please send me your scripts or any other useful tips. We all benefit when you share your expertise.

Where the Virtual Still Falls Short

Edit: At least Google Voice running on your laptop lets you text using a Model M. The MIT link no longer works.

Originally posted March 13, 2012 on AIXchange

I’ve written before about my fondness for the durability and quality of a certain type of old keyboard.

In fact, for a very long time, I told myself that in a perfect world I’d find a way to hook my Model M keyboard up to a Bluetooth adapter, and then connect that to my mobile phone. Even if it wouldn’t be practical to carry that setup on the road, it’d sure be satisfying to take the typing speed that only a real keyboard can provide and bring it into the mobile world.

To my chagrin though, others eagerly anticipate a keyboard-free future:

“Why do we still use a keyboard and mouse to interact with digital information? This mode of human-computer interaction, invented more than 40 years ago, severely constrains our ability to access and interact naturally with digital content.

“Our group designs new interfaces that integrate digital content in people’s lives in more fluid and seamless ways. Our aim is to make it easier and more intuitive to benefit from the wealth of useful digital information and services. Our work is focused in the following areas:

“Augmented Experiences: We augment a person’s experience of their surroundings with relevant digital information. We try to make the experience as seamless as possible, blending the digital information into the physical environment and making interaction with that information natural and fluid.

“Responsive Objects: We alter everyday objects by embedding sensors, actuators and displays so that the objects can respond to people using them in meaningful, noticeable ways.

“Collaborative Interactions: We experiment with novel interfaces that are designed from the start for use by multiple people. The projects support collaborations ranging from small numbers to very large numbers of people and further differ in whether they support collocated versus remote collaboration as well as synchronous versus asynchronous collaborations.

“Programmable Materials: We invent interfaces and machines for control and manipulation of materials such as paper, fabric, wood and food. Our goal is to endow materials and manufacturing with some of the advantages associated with the digital world such as modifiability, programmability, responsiveness and personalization.”

Sure, I guess I look forward to the day when my glasses can transform into some type of heads-up display, and I can access other types of information just by focusing my eyes or looking in different directions. It will be nice when my personal digital assistant truly becomes that, or when Siri and Evi and the like actually function seamlessly.

It will be a great convenience to speak to my machine and find it not only understands me, but knows what I mean and not just what I say. (Or maybe we’ll have to get brain implants before we can have a better human/machine interface. Not really convenient, but still potentially useful.)

For me though, the physical still beats the virtual, hands down. I know a big part of that is all the time I’ve invested in physical keyboards. I’ve had my awesome Model M for going on three decades now. Given more time, I suppose I’ll eventually become proficient on virtual keyboards. But I type so much faster and scroll around the screen so much more easily with my keyboard and mouse. I can type reasonably quickly with a Blackberry since it has an actual keyboard, but with any virtual keyboard touch screen, I just plod along. And while my Android phone is adequate at voice recognition, that doesn’t help me when I’m at a meeting or in any environment where I don’t have the luxury of speaking aloud. Sure, virtual keyboards are fine for short messages, but when I need to type anything more substantial than “:)” and LOL, I don’t like them. Autocorrect can rescue me some of the time, but often it just introduces new problems.

Call me a dinosaur if you insist, but until I get my perfectly augmented reality, I’ll live happily with my old mouse and older keyboard.

The Disruptive Force of Data Lost

Edit: The link below is to one of my older articles. It still feels like yesterday.

Originally posted March 6, 2012 on AIXchange

This anecdote from author Neil Gaiman got me thinking:

“I left my Macbook Air on a plane on Sunday night, and have spent most of the rest of the week doing things like being on the phone to the backup service, learning that the tracking software I’d thought was on there was on there, but hadn’t been activated, buying a new computer, etc. I didn’t get the thing I was meant to be writing written. I was grumpy.

“And this morning I got an e-mail telling me that the thing that I would have been working on all week, that I’d already lost 15 pages of … was now going to change so radically I would have wasted a week’s work if I’d been working on it. So I am happy.”

I view this story on a few levels. Do you have a backup of your phone and laptop if you should lose both right now? If you’ve installed tracking software, have you tested it? Are you comfortable knowing that all someone has to do is wipe your machine and your tracking software will be useless? And were that to happen, would you still have your contacts, your latest projects, the data that is critical to you? Or would they be lost forever?

On a larger scale, if your data center burned down, can you restore it? Do you have disaster/recovery procedures in place? Have you tested them?

But beyond the loss of Gaiman's machine and his data, I was also struck that he had a deadline, he had something that needed to be worked on and completed, and he had lost some of that work. Fiddling with his computer rebuild had cost him time that he could have spent working on the project. As things turned out though, his requirements changed and any work he would have done would have been wasted.

I’ve heard that some people are purposefully nonresponsive. When they get a call, e-mail or instant message, they’ll wait instead of answering immediately. Then, by the time they do respond, the person making the inquiry may have solved the problem without any assistance. While I don’t think IT folks should ignore their users, it is true that many times people will reach out rather than take a moment to examine their issue a little further. And, once they dig into the problem, they can often help themselves.

Of course, in our world, projects seldom change, and time spent fighting our machines is simply time lost. At least in this case, it’s nice to think that the universe was looking out for this author and things worked out in the end for him. Hopefully the universe looks out for all of us on occasion.

ASO the First Phase in Autonomic Tuning

Edit: This is something I have not thought about in a long time.

Originally posted February 28, 2012 on AIXchange

I’ve touched on Active System Optimizer (ASO) before, but now that Nigel Griffiths has released an ASO video, it seems an appropriate time to expound on this topic.

To run ASO, you must be at AIX 7.1 TL01 or greater on POWER7 hardware running in POWER7 mode. ASO is installed by default on AIX 7. It's not supported on older AIX releases or older hardware. (It will appear to be running, but will actually be hibernating.)

Nigel recommends running:

* oslevel -s to verify your AIX version

* lslpp -L | grep -i optimi to help verify that the ASO fileset is installed

* lsconf | grep ^Processor to verify that your LPAR is running in POWER7 mode.

ASO works under the covers; you don’t need to do anything to start it. It optimizes workloads based on AIX kernel statistics and the POWER7 hardware performance counters. It’s designed to improve cache and memory affinity by dynamically evaluating tuning options and making changes — including moving processes and memory — on the fly. It conducts pre- and post-monitoring to ensure that the changes improve performance. If improvement isn’t detected, ASO backs out the changes, hibernates and tries again later. Listen to Nigel’s presentation for details on this.

ASO provides cache affinity, aggressive cache affinity and memory affinity. It monitors performance and detects situations where threads can be moved from one chip to another to utilize the closer L3 cache.

ASO operates best on multi-threaded workloads. The jobs it monitors should be stable and long-running so it can most effectively make changes to the workload. It also needs to be a busy LPAR, otherwise you don’t gain much by trying to move things around. By “busy,” I mean that the processes need at least a 10-second lifetime. If applications come and go more quickly, ASO cannot make recommendations. If you’ve done manual tuning, either on your own or based on recommendations from IBM Support, ASO will simply hibernate rather than monitor and override your changes. In addition, specific processes can be marked as “don’t bother these,” and ASO won’t impact them.

ASO runs as an SRC kernel service. To locate it, search your machine for the aso process.

Use these commands to get ASO running:

1) start the kernel service with startsrc -s aso

2) run asoo -o aso_active=1

According to Nigel, the aso process uses very minimal CPU time, so this shouldn’t add much additional overhead on the system.

Logs are located in the /var/log/aso/* directory. You’ll see two files:

* aso.log, which has on/off/hibernating information

* aso_process.log, which provides details of actions and modified processes

Nigel says this log file isn't formally documented, but you should be able to figure out what it's doing.

The man page says ASO can run outside of SRC, but this should probably only be done for debugging. You can also set shell variables before starting processes and apps. This provides some control over how they function with ASO.
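
For example, the ASO_ENABLED environment variable can exempt a process up front (check the man page for the exact values your level supports); the application path here is hypothetical:

ASO_ENABLED=never /opt/myapp/bin/server    # ASO will leave this process alone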

As always, Nigel has much more than I can cover here. For instance, he shows real life examples of ASO running on his machine, output from the logfile, and more.

To wrap up, he tells us that ASO is largely set and forget. ASO uses near zero CPU when running, and it gently applies changes, tests behavior and undoes the changes if necessary. Because ASO is good for complex, multithreaded, long-running applications, it can move things around inside of your LPAR if they’re spread across CPUs.

Best of all, this is, as he says, just the first phase of clever autonomic affinity tuning. So keep your eyes open for what’s ahead.

A Good Look at PowerHA

Edit: Some links no longer work.

Originally posted February 21, 2012 on AIXchange

Another great Virtual User Group webinar recently took place, this one featuring Shawn Bodily’s presentation, “Introduction to PowerHA SystemMirror for AIX Standard Edition.” Be sure to get the presentation materials and listen to the replay. And look forward to the next VUG webinar, when this topic will be continued.

This material is very similar to the presentation on the same topic at last fall's IBM Technical University. That's an indication of the interest surrounding this solution. As Shawn notes in his presentation, IBM has unveiled 23 major releases of the product, an average of one release per year. More than 12,000 customers use it worldwide. While Standard Edition is the focus of Shawn's presentation, he points out that PowerHA SystemMirror for AIX Enterprise Edition allows for multi-site cluster management, while also including all the functionality of Standard Edition.

Shawn notes that "PowerHA gives you cluster management for the data center. The software monitors, detects and reacts to events. It establishes a heartbeat between the systems, and enables automatic switch-over." He then defines what HA solutions generally try to do, which is eliminate single points of failure. The goal is to reduce downtime, but HA can help you with planned downtime as well. It might not be fault tolerant, but it will be fault resistant.

One chart shows how failure points can be eliminated. To eliminate node failure, use multiple nodes. To eliminate VIO server failure, use dual VIO servers. To eliminate site failure, deploy additional sites. Other items are covered, but again, the idea is to build in redundancy so you can continue to provide access to the applications and data that the business runs on.

Shawn discussed his customer interactions (an area I’ve touched on previously):

“First, another IBM representative will tell the customer about the hardware and the systems’ reliability, availability and serviceability (RAS) features. Then a second rep will discuss live partition mobility and how it seamlessly shifts logical partitions from one frame to another. So after 20-30 minutes of hearing about how the hardware never fails, THEN Shawn must step in and explain why the customer should be concerned with high availability and disaster recovery. That’s one tough act to follow.”

Some additional slides introduce Live Partition Mobility and explain how it allows you to move your live running operating system and application from one physical frame to another. Of course, that’s a hardware maintenance solution. What about software maintenance? With PowerHA, you can fail over to the other node, upgrade your application or your operating system, then fail back and do the same thing on the other side. Shawn notes that basically, PowerHA performs the functions that LPM doesn’t. He also gets into how PowerHA is used to recover from node failures, network failures, loss of shared storage access, and — with version 7.1 based on Cluster Aware AIX technology — rootvg errors. Finally, Shawn covers the differences between PowerHA 6.1 and 7.1.

Shawn makes these additional points:

* Remember, we can fail over from one system to one system, from one system to any other system, from any system to one, and from any to any. We can also fail over between different versions of hardware from the same or different families, assuming you can live with the performance degradation, and after you verify that the version of the operating system you want to run will work with your particular configuration. Failing over from POWER7 to POWER6 or POWER5 could conceivably work, as long as you verify that your particular setup is supported.

* Often you’ll want to make your service IP address highly available, along with your application server and your shared storage.

* You can create user defined resources, custom resource groups and more granular resource group options. You can set up resource group dependencies. In his example, you might want to be sure your database is running before you try to start application servers, so you could configure that as a dependency.

* You can configure different priorities and choose which nodes your resource groups start on. This provides a great deal of flexibility and control when setting things up.

* You can also configure things to automatically run DLPAR and Capacity on Demand (CoD) operations. So you could have a very “skinny” standby node that, when needed, performs operations on the HMC and brings additional memory and CPU resources online (see the first sketch after this list).

* You can have application monitors, so you can take action if PowerHA detects that the application has gone down, and you can set up file collections to have the software keep config files and other important files in sync across the cluster. This is meant for regular files, as opposed to things like trying to keep password files in sync.

* Configuration assistants and Smart Assists are available to help configure clusters. IBM Systems Director plug-ins are also available, both for managing and monitoring your cluster’s state.

* The CAA command — /usr/sbin/clcmd — can be used to distribute commands across all cluster nodes (see the second sketch after this list).

* A cluster test tool can be used to validate clusters. This is also a good way to run tests across many different clusters in the environment, to ensure that we’re running the same tests across all of the machines.
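
Two quick sketches of those last points. First, the kind of HMC operation PowerHA can drive when activating a “skinny” standby node, shown here as the equivalent manual HMC commands; the managed system and partition names are purely illustrative:

   # add 8 GB of memory and half a processing unit to the standby LPAR
   chhwres -m myframe -r mem -o a -p standby_lpar -q 8192
   chhwres -m myframe -r proc -o a -p standby_lpar --procunits 0.5

Second, a minimal clcmd example, which runs the same command on every node in the CAA cluster:

   # run a command across all cluster nodes to verify they agree
   clcmd date
   clcmd lspv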

Shawn’s final slide lists these great resources:

* IBM Redbook: PowerHA SystemMirror 7.1 for AIX

* PowerHA & AIX Support & Compatibility Matrix

* PowerHA Hardware Support Matrix

Incidentally, you can follow Shawn on Twitter.

I really enjoy these webinars, and of course the availability of the replays and presentation materials is a huge convenience. Hopefully my writing about VUG webinars encourages you to take the time and listen for yourself.

Note: Completely unrelatedly, I’m a YouTube sensation. Well, maybe not, but I’m there. Five new videos just went live where you can see me and some other IBM Power Champions talking with IBM’s Ian Jarman at last fall’s Technical University.

IVM, HMC and SDMC Continued

Edit: The links still work.

Originally posted February 14, 2012 on AIXchange

Continuing from last week, here’s more on the recently released IBM Redpaper, “IBM PowerVM Getting Started Guide.”

Chapter 2: IVM

From the authors:

“IBM developed the Integrated Virtualization Manager (IVM) as a server management solution that performs a subset of the HMC and SDMC features for a single server, avoiding the need for a dedicated HMC or SDMC server. IVM manages a single stand-alone server — a second server managed by IVM has its own instance of IVM installed. With the subset of HMC and SDMC server functionality, IVM provides a solution that enables the administrator to quickly set up a server. IVM is integrated within the Virtual I/O Server product, which services I/O, memory, and processor virtualization in IBM Power Systems.

“There are many environments that need small partitioned systems, either for test reasons or for specific requirements, for which the HMC and SDMC solutions are not ideal. A sample situation is where there are small partitioned systems that cannot share a common HMC or SDMC because they are in multiple locations.

“IVM is a simplified hardware management solution that inherits most of the HMC features. It manages a single server, avoiding the need for an independent personal computer. It is designed to provide a solution that enables the administrator to reduce system setup time and to make hardware management easier, at a lower cost.

“When not using either the HMC or the SDMC, VIOS takes control of all the hardware resources. There is no need to create a specific partition for the VIOS. When VIOS is installed using the default settings, it installs on the server’s first internal disk controller and onto the first disk on that controller. IVM is part of VIOS and activated when VIOS is installed without an HMC or SDMC.”

Chapter 2 continues with details on IVM installation.

I wish this chapter included screen shots. (There are screen shots in chapters 3 and 4.) The Redpaper describes the steps, but without visuals they might be confusing for those unfamiliar with the interface.

Chapter 3: HMC

More from the authors:

“Note: There is flexibility for you to plan your own adapter numbering scheme. The Maximum virtual adapters setting needs to be set in the Virtual Adapters window to allow for your numbering scheme. The maximum setting is 65535 but the higher the setting, the more memory the managed system reserves to manage the adapters.”

They cover the three VIOS installation methods: DVD, via the HMC (using the installios command) and via Network Installation Manager (NIM). One of the notes says:

“Interface en5 is the SEA adapter created in 3 on page 29. Alternatively, an additional virtual adapter may be created for the VIOS remote connection, or another physical adapter may be used (it will need to be cabled) for the TCP/IP remote connection. TCP and UDP port 657 must be open between the HMC and the VIOS. This is a requirement for DLPAR (using RMC protocol).”

When I set up shared Ethernet adapters on VIO servers, I like to add an additional virtual Ethernet adapter to put my VIOS IP address on. This allows me to perform maintenance on my VIOS and SEA without an outage, as the network traffic goes out the backup SEA on my other VIOS. A rough sketch of that setup follows.
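
Here is a minimal sketch of that approach from the VIOS command line. The adapter names, addresses and default PVID are illustrative, not a definitive recipe:

   # bridge the physical adapter (ent0) with the virtual trunk adapter (ent2)
   mkvdev -sea ent0 -vadapter ent2 -default ent2 -defaultid 1

   # put the VIOS IP address on a separate virtual adapter (en3), not on the
   # SEA itself, so the SEA can be serviced without losing the VIOS connection
   mktcpip -hostname vios1 -inetaddr 192.168.1.10 -interface en3 -netmask 255.255.255.0 -gateway 192.168.1.1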

Section 3.2 covers setting up dual VIO servers:

“The benefit of a dual VIOS setup is that it promotes Reliability, Availability and Serviceability (RAS). It also offers load balancing capabilities for MPIO and for multi-SEA configuration setups. The differences between a single and dual VIOS setup are:

The additional VIOS partition
The additional virtual Ethernet adapter used as the SEA Control Channel adapter per VIOS
Setting the trunk priority on the virtual Ethernet adapters used for bridging to physical adapters in an SEA configuration.”

The authors explain how to move from a single VIO SEA to a dual VIO scenario by adding the control channel adapter using this command:

   chdev -dev ent5 -attr ctl_chan=ent6 ha_mode=auto
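
Afterward, a quick check that the attributes took effect, assuming ent5 is the SEA:

   lsdev -dev ent5 -attr ctl_chan
   lsdev -dev ent5 -attr ha_mode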

They also mention that we can run commands on the VIO command line or use cfgassist, which is similar to smitty in AIX.

Section 3.3 covers setting up virtual Fibre Channel. The authors argue that virtual SCSI disks should be used for rootvg and NPIV for data LUNs:

“Virtual Fibre Channel allows disks to be assigned directly to the client partitions from the SAN storage system. With virtual SCSI, the disks are assigned to the VIOS partition before they are mapped to a virtual SCSI adapter.

“The preference is to still use virtual SCSI for client partition operating system disk, and use virtual Fibre Channel for the data. The reasons for using virtual SCSI for client partition operating system disks are:

* When the disks are assigned to the VIOS first, they can be checked before being mapped to a client. With virtual Fibre Channel, this cannot be determined until the client partition is loaded from an installation source.

* Operating systems such as AIX and Linux have their kernels running in memory. If serious SAN issues are being experienced, the VIOS will first detect the problem and sever the link to the client partition. The client partition will halt abruptly, reducing any risk of data corruption. With operating systems using virtual Fibre Channel or physical Fibre Channel, the partition will remain running for a period, and during that period the client partition is susceptible to data corruption.

* Operating system disks using virtual SCSI are not reliant on external device drivers whereas operating system disks using virtual Fibre Channel are. When it comes to upgrading the external device drivers, the client partitions would need to follow special procedures to upgrade.”

Chapter 4: SDMC

From the authors:

“The IBM Systems Director Management Console (SDMC) provides system administrators the ability to manage IBM Power System servers as well as IBM Power Blade servers. The SDMC organizes tasks in a single panel that simplifies views of systems and day-to-day tasks. The SDMC is also designed to be integrated into the administrative framework of IBM Systems Director.

“The SDMC can automatically handle the slot allocation of virtual adapters for the user. With the SDMC the user can choose to either let the SDMC manage the slot allocations, or use the traditional manual mechanism to allocate the virtual adapter IDs.”

According to section 4.1.4, setting up an SEA failover configuration is a simple GUI operation when using SDMC:

Select the primary VIOS, the physical adapter you want to use, and the backup VIOS and its physical adapter. Then hit OK:

“The SDMC automatically creates the SEA adapters on both VIOS1 and VIOS2. The SDMC will also configure the control channel as a part of this step. The virtual Ethernet adapter with the highest VLAN ID is used for the SEA control channel.”

This should remove the possibility of errors arising from setting up the control channels manually.

Although you can do the same with the HMC GUI, I still prefer to manage things on the command line.

The publication has much more. It’s well worth your time.

One Guide to the IVM, HMC and SDMC

Edit: The link still works.

Originally posted February 7, 2012 on AIXchange

The just-published IBM Redpaper, “IBM PowerVM Getting Started Guide,” shows you how to use the Integrated Virtualization Manager (IVM), Hardware Management Console (HMC) and the Systems Director Management Console (SDMC) to configure your systems. It’s an extremely valuable guide that’s brief enough, at 104 pages, to be read quickly.

The chapters are independent, so they can be read in any order. I’ll run down some highlights in posts over the next two weeks:

Chapter 1

* There’s a great chart on page 2 that compares and contrasts the advantages and disadvantages of the IVM, HMC and SDMC.

* Section 1.2 covers planning:

“Be sure to check system firmware levels on your Power server and HMC or SDMC before you start. Decide if you will use Logical Volume Mirroring (LVM) — in AIX LPARs — or Multipath IO (MPIO) at the VIOS level. Obviously if you are running NPIV you would want to run MPIO at the AIX level. The examples in this paper use MPIO. Make sure your Fibre Channel switches and adapters are N_Port ID Virtualization (NPIV) capable if you will be using NPIV.

Make sure your network is properly configured.

Check the firewall rules on the HMC or SDMC.

Plan how much processor and memory you will assign to the VIOS for best performance.”

* The authors recommend using a dual VIOS architecture — two VIO servers — to provide serviceability and scalability. So do I.

* Part of planning includes establishing a VIO slot number scheme. While the SDMC automates slot allocation, the authors illustrate their preferred scheme in Figure 1-2 on page 5.

The authors suggest a VIO slot numbering scheme where the server slots are 101, 102, 103, etc. in both VIO servers, and the client slots are 11 and 12 connecting to VIO1, and 21 and 22 connecting to VIO2. When mapped, VIO1 would map 11 to 101 and 12 to 102, and VIO2 would map 21 to 101 and 22 to 102. I prefer a numbering scheme where my even-numbered adapters come from one VIOS (VIO1) and my odd-numbered adapters come from the other (VIO2), with both client and server using the same numbers. In my case I like 100, 110, 120, 130 coming from VIO1, and 101, 111, 121, 131 coming from VIO2. Of course, you may have your own numbering scheme — which I’d love to hear about in Comments.

* Section 1.3 covers the terminology differences between Power- and x86-based systems, which can be handy for someone with little or no background managing Power Systems and can help them make the transition between the two sets of terms.

* Section 1.4 lists some prerequisites for setting up the machines:

“Check that:

  • Your HMC or SDMC (the hardware or the virtual appliance) is configured, up, and running.
  • Your HMC or SDMC is connected to the new server’s HMC port. We suggest either a private network or a direct cable connection.
  • The TCP port 657 is open between the HMC/SDMC and the Virtual Server in order to enable Dynamic Logical Partition functionality.
  • You have IP addresses properly assigned for the HMC and SDMC.
  • The Power Server is ready to power on.

All your equipment is connected to 802.3ad capable network switches with link aggregation enabled. Refer to Chapter 5, Advanced Configuration, on page 75 for more details.

Fibre Channel fabrics are redundant. Refer to Chapter 5: Advanced Configuration on page 75 for more details.

Ethernet network switches are redundant.

SAN storage for virtual servers (logical partitions) is ready to be provisioned.”

Chapter 5

The next three chapters are devoted to the specific approaches you might choose to take. Chapter 2 covers the IVM, Chapter 3 the HMC and Chapter 4 the SDMC. I’ll dissect those options next week. For now I’ll briefly discuss Chapter 5 (Advanced Configuration):

“This chapter describes additional configurations to a dual Virtual I/O Server (VIOS) setup and highlights other advanced configuration practices. The advanced setup addresses performance concerns over the single and dual VIOS setup.

This chapter includes the following sections:

  • Adapter ID numbering scheme
  • Partition numbering
  • VIOS partition and system redundancy
  • Advanced VIOS network setup
  • Advanced storage connectivity
  • Shared processor pools
  • Live Partition Mobility
  • Active Memory Sharing
  • Active Memory Deduplication
  • Shared storage pools”

* Table 5-1 illustrates an example of virtual SCSI adapter ID allocations.

* Section 5.4 covers advanced VIOS network setup, including link aggregation and VLAN tagging:

“The VIOS partition is not restricted to only one SEA adapter. It can host multiple SEA adapters where:

A company security policy may advise a separation of VLANs so that one SEA adapter will host secure networks and another SEA adapter will host unsecure networks.

A company may advise a separation of production, testing, and development networks connecting to specific SEA adapter configurations.

“There are considerations regarding the use of IEEE 802.3ad Link Aggregation, 802.1Q VLAN tagging, and SEA:

There is a maximum of 8 active ports and 8 standby ports in an 802.3ad Link Aggregation device.

Each of the links in an 802.3ad Link Aggregation device should have their speeds set to a common speed setting. For example, set all links to 1 Gb/full duplex.

A virtual Ethernet adapter is capable of supporting up to 20 VLANs (including the Port Virtual LAN ID – PVID).

A maximum of 16 virtual Ethernet adapters with 20 VLANs assigned to each adapter can be associated with an SEA adapter.

A maximum of 256 virtual Ethernet adapters can be assigned to a single virtual server, including the VIOS partitions.

The IEEE 802.1Q standard supports a maximum of 4096 VLANs.

SEA failover is not supported in IVM as it only supports a single VIOS partition.”

Whether you set up IBM Power Systems all the time or you’re just getting started with the platform, this Redpaper is an excellent resource for learning or reviewing the relevant technology and terminology.

More on Shared Storage Pools

Edit: Some links no longer work

Originally posted January 31, 2012 on AIXchange

Back in 2010 I wrote about the changes that were coming to VIOS. One of those big changes, shared storage pools, is now a reality. This gives admins another option to consider when setting up disks on Power servers.

In larger companies, disk changes are typically implemented by SAN teams with many other responsibilities and, often, different priorities. However, by allocating storage to the servers up front and setting it up in a storage pool, admins can manage shared storage pools themselves. In doing so, we can be more responsive to requirement changes. And with thin provisioning, we can determine the amount of disk we actually use on each server. For the first time since the days of internal disks and expansion drawers, disk is back under our control.

Here’s how Nigel Griffiths explains shared storage pools:

“The basic idea behind this technology… is that [VIO servers] across machines can be clustered together and allocate disk blocks from large LUNs assigned to all of them rather than having to do this at the SAN storage level. This uses the vSCSI interface rather than the pass-through NPIV method. It also reduces the SAN admin required for Live Partition Mobility — you get the LUN available on all the VIOS and they organise access from there on. It also makes cloning LPARs, disk snapshots and rapid provisioning possible. Plus thin provisioning — i.e., disk blocks are added as and when required — thus saving lots of disk space.”

Continuing from last week, here’s more from Nigel’s presentation.

Since shared storage pools are built on top of Cluster Aware AIX, the lscluster command also provides more information, including: lscluster -c (configuration), lscluster -d (list all hdisks), lscluster -i (network interfaces) and lscluster -s (network stats).

In the demo, he also discusses adding disk space and assigning it to client VMs. Keep in mind that while you can replace a LUN in the pool, you cannot remove one.

He also covers thin and thick provisioning using shared storage pools and shows you how to conduct monitoring. Run topas on your VIOS and then enter D (make sure it’s upper case) so you can watch the disk I/O get spread across your disks in 64 MB chunks. From there, Nigel covers how to set up alerts on your disk pool; if you’re using thin provisioning, you must ensure you don’t run out of space. A sketch of an alert setup follows.
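
As a hedged example, the threshold can be set and verified with the alert command; the cluster and pool names here are illustrative:

   # warn when the pool passes 80 percent used
   alert -set -clustername mycluster -spname mypool -value 80
   # confirm the threshold
   alert -list -clustername mycluster -spname mypool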

Nigel also shares his script, called lspool. It’s designed to present all of the critical pool information at one time, instead of making you run multiple commands:

# lspool: list each cluster and, for each cluster, its pools and pool details
# source the padmin profile so the VIOS CLI commands are available
. ~/.profile
clusters=`cluster -list | sed '1d' | awk -F " " '{ printf $1 " " }'`
echo "Cluster list: " $clusters
for clust in $clusters
do
  pools=`lssp -clustername $clust | sed '1d' | awk -F " " '{ printf $1 " " }'`
  echo Pools in $clust are: $pools
  for pool in $pools
  do
    # pool name, size, free space, allocated space and LU count, all in MB
    lssp -clustername $clust | sed '1d' | grep $pool | read p size free totalLU numLUs junk
    let freepc=100*$free/$size
    let used=$size-$free
    let usedpc=100*$used/$size
    echo $pool Pool-Size: $size MB
    echo $pool Pool-Free: $free MB Percent Free $freepc
    echo $pool Pool-Used: $used MB Percent Used $usedpc
    echo $pool Allocated: $totalLU MB for $numLUs Logical Units
    alert -list -clustername $clust -spname $pool | sed '1d' | grep $pool | read p poolid percent
    echo $pool Alert-Percent: $percent
    # -gt forces a numeric comparison (the original used >, a string compare)
    if [[ $totalLU -gt $size ]]
    then
      let over=$totalLU-$size
      echo $pool OverCommitted: yes by $over MB
    else
      echo $pool OverCommitted: no
    fi
  done
done

Nigel examines snapshots and cloning with shared storage pools, noting that the different commands — snapshot -create, snapshot -delete, snapshot -rollback and snapshot -list — use different syntax. Sometimes a command asks for a -spname flag, other times it asks for a -sp flag. Pay attention so you know the flags that are needed with the commands you’re running. He also demonstrates how some of this management can be handled using the HMC GUI.

The viosbr command is also covered. I discussed it here.

Nigel recommends that you get started by asking the SAN team to hand over a few TB that you can use for testing. Also make sure your POWER6 and POWER7 servers are at the latest VIOS 2.2 level. It’s worth the effort. This technology will save time, boost efficiency and increase your overall responsiveness to users.

Finally, here’s Nigel’s shared storage pools cheat sheet:

1. chdev -dev <hdisk#> -attr reserve_policy=no_reserve
2. cluster -create -clustername galaxy -repopvs hdisk2 -spname atlantic -sppvs hdisk3 hdisk5 -hostname bluevios1.ibm.com
3. cluster -list
4. cluster -status -clustername galaxy
5. cluster -addnode -clustername galaxy -hostname redvios1.ibm.com
6. cluster -rmnode [-f] -clustername galaxy -hostname redvios1.ibm.com
7. cluster -delete -clustername galaxy
8. lscluster -s or -d or -c or -i = CAA command
9. chsp -add -clustername galaxy -sp atlantic hdisk8 hdisk9
10. chsp -replace -clustername galaxy -sp atlantic -oldpv hdisk4 -newpv hdisk24
11. mkbdsp -clustername galaxy -sp atlantic 16G -bd vdisk_red6a -vadapter vhost2 [-thick]
12. rmbdsp -clustername galaxy -sp atlantic -bd vdisk_red6a
13. lssp -clustername galaxy -sp atlantic -bd
14. lssp -clustername galaxy
15. alert -set -clustername galaxy -spname atlantic -value 80
16. alert -list -clustername galaxy -spname atlantic
17. errlog -ls
18. snapshot -create name -clustername galaxy -spname atlantic -lu LUs
19. snapshot -delete name -clustername galaxy -spname atlantic -lu LUs
20. snapshot -rollback name -clustername galaxy -spname atlantic -lu LUs
21. snapshot -list -clustername galaxy -spname atlantic
22. viosbr -backup -clustername galaxy -file Daily -frequency daily -numfiles 10
23. viosbr -view -file File -clustername Name …
24. viosbr -restore -clustername Name …
25. lsmap -clustername galaxy -all

Take the time to listen to the replay, and you’ll learn even more. I highly recommend it.

Getting Started with Shared Storage Pools

Edit: Some links no longer work.

Originally posted January 24, 2012 on AIXchange

The December AIX Virtual User Group webinar featured Nigel Griffiths’ discussion of phase 2 of shared storage pools. If you didn’t tune in, download the presentation materials and listen to the replay.

The new shared storage pool functionality is enabled with the latest PowerVM 2.2 service pack, and is a feature of PowerVM Standard and Enterprise editions. If you already have PowerVM, simply download the VIO server fixpack to obtain these new features. (Note: Because this VIOS level is based on AIX 6.1 TL7, your NIM server must be at AIX 6.1 TL7 or AIX 7.1 TL1 if you want to use NIM to install or update your VIO servers.)

One thing to note, as Nigel points out in the presentation, is that the most common VIOS storage options have been around for some time:

1) Logical volumes, created from a volume group and presented to client LPARs
2) Whole local disks
3) SAN LUNs
4) File-backed storage, either from a file system on local disk or a file system on SAN disks
5) NPIV LUNs from SAN

Nigel then discusses the newest option: using SAN LUN disks that are placed into a shared storage pool. This new option, he emphasizes, doesn’t eliminate any of the other options. It does not portend the death of NPIV. It’s just an additional VIOS storage choice we now have.

Listen to the replay or look over the slides to gather Nigel’s thoughts on the benefits of shared storage pools. He explains that Fibre Channel LUNs and NPIV can be complex. They require knowledge of the SAN switch and the SAN disk subsystem. If you need to make changes, it might take your SAN guys a while to implement them, which can slow overall responsiveness. That’s to say nothing of smaller organizations that don’t have dedicated SAN guys. Live Partition Mobility can be tough work if your disks aren’t pre-zoned to the different frames.

With a shared storage pool, you pre-allocate the disk to the VIO servers. Then it’s under your control, and you can more easily allocate the space to your virtual machines, as the sketch below shows.
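
For instance, carving a client disk out of the pool is a single VIOS command. This is a sketch; the cluster, pool, backing device and vhost names are all illustrative:

   # create a 20 GB thin-provisioned logical unit and map it to a client's
   # virtual SCSI server adapter
   mkbdsp -clustername mycluster -sp mypool 20G -bd vdisk_lpar1 -vadapter vhost0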

POWER6 and POWER7 servers (including blades) are needed to use shared storage pools. At minimum you should allocate a 1 GB LUN for your repository and another 1 GB LUN for data, but in most cases you’ll need much larger LUNs — think terabytes of disk — if you plan to do much with it.

Your VIOS must have the hostname set correctly to resolve the other hostnames. In Nigel’s experience he couldn’t use short hostnames — they had to be changed to their fully qualified names.

He also recommends giving your VIOS a CPU and at least 4 GB of memory. “Skinny” VIOS servers aren’t advisable with shared storage pools. Currently, the maximum number of nodes is four, the maximum physical disks in a pool is 256, the maximum virtual disks in a cluster is 1,024, and the number of clients is 40. A pool can have 5 GB to 4 TB of individual disks, and storage pool totals can range from 20 GB to 128 TB. Virtual disk capacity (LU) can be from 1 GB to 4 TB, with only one repository disk.

If you played around with phase one, you’ll find that many of your limits have been removed. Now you can use shared storage pools for live partition mobility, perform non-disruptive cluster upgrades and use third party multi-pathing software.

You cannot have active memory sharing paging disks on shared storage pool disks.

Nigel covers the relevant terminology (clusters, pools, logical units, etc.). He also demonstrates how to prepare and set up your disks. In a nutshell, you must get your LUNs online and zoned to your VIO servers, and you need to set the reserve policy on those LUNs to no_reserve, as sketched below.
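
A minimal sketch of that disk prep from the VIOS command line, assuming hdisk3 and hdisk4 are the candidate pool LUNs (the names are illustrative):

   # confirm the LUNs are visible to the VIOS
   lsdev -type disk

   # clear the SCSI reservation so every cluster node can open the disks
   chdev -dev hdisk3 -attr reserve_policy=no_reserve
   chdev -dev hdisk4 -attr reserve_policy=no_reserve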

After covering the commands for managing clusters — cluster -create, cluster -list, cluster -status, cluster -addnode, cluster -rmnode and cluster -delete — he recommends creating the cluster on one of the VIO servers and then adding the other VIO servers to the shared storage pool cluster. From there, you can allocate space to the VM clients, as in the sketch below.
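
As a hedged illustration of that flow, with made-up cluster, pool, disk and host names:

   # create the cluster and pool on the first VIOS
   cluster -create -clustername mycluster -repopvs hdisk2 -spname mypool -sppvs hdisk3 hdisk4 -hostname vios1.example.com

   # then add the second VIOS to the cluster
   cluster -addnode -clustername mycluster -hostname vios2.example.com

   # verify
   cluster -status -clustername mycluster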

Next week I’ll have more information from Nigel’s presentation, including scripts and cheat sheets. In the meantime, why not upgrade your test machine’s VIO servers to the latest level so you can try this functionality?