Power Systems Best Practices

Edit: This is still a good document, but the link keeps changing.

Originally posted April 23, 2013 on AIXchange

Recently I received this set of slides from Fredrik Lundholm covering best practices for Power Systems with AIX. I'll cover a few highlights, though honestly, I could discuss every slide. The information here is that valuable, so I highly recommend taking the time to view the entire thing. If you download his slides, be sure to look at the notes. For example, on page 7, where he discusses a virtualized system design, the notes contain a couple of links relating to Entitled Software Support, including this ESS how-to guide.

Page 8 lists guidelines for capacity planning. Fredrik points out sensible starting places for your CPU and LPAR weights when no other information is available. The fact that you can make reasonable guesses without a ton of workload information just reminds me how forgiving this platform is. If things change, CPU and memory settings can be easily adjusted, and whole physical adapters can even be added or removed if necessary.

Page 9 covers firmware, including the use of the Microcode Discovery Service and the Fix Level Recommendation Tool (FLRT).

Page 11 tells you where to get fixes for the VIO server. The notes cover items that have been fixed in each release.

Page 12 covers network best practices. The notes contain a link to a step-by-step network configuration guide.

Page 13 shows a nice diagram of a shared Ethernet adapter (SEA) load-sharing configuration, which is available in VIOS 2.2.1 and later.
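
If you want to try it, load sharing is enabled when the SEA is created by setting ha_mode=sharing on both VIO servers. Here's a rough sketch; the adapter names, default PVID, and control channel are placeholders rather than values from the slides:

            mkvdev -sea ent0 -vadapter ent4,ent5 -default ent4 -defaultid 1 -attr ha_mode=sharing ctl_chan=ent6

Run the matching command on the second VIO server; the two SEAs then divide the trunked VLANs between them instead of leaving one side idle.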

Page 14 shows the recommended architecture when more than one VLAN is used.

Page 15 features a reminder about SEA and virtual Ethernet interfaces. Be sure to enable large send and large receive; they're not the default settings.

            For all SEA interfaces, chdev -l entX -a largesend=1   (survives reboot)

            For all SEA interfaces, chdev -l entX -a large_receive=1   (survives reboot)
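
To confirm the change took, a quick lsattr check should show both attributes enabled (entX here stands in for each SEA device):

            lsattr -El entX -a largesend -a large_receive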

Page 17 covers storage and the need to ensure that the correct multipath drivers are installed. Page 18 has a nice picture illustrating how the configured machines will look.

Page 19 covers setting up fc_err_recov and dyntrk, along with setting up no_reserve and round_robin.
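
The slides have the specifics, but the shape of the commands on the VIO server is the usual one. A sketch, with placeholder device names; the -P flag defers the change to the next reboot if the device is busy:

            chdev -l fscsi0 -a fc_err_recov=fast_fail -a dyntrk=yes -P

            chdev -l hdisk2 -a reserve_policy=no_reserve -a algorithm=round_robin -P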

From page 20: To allow graceful round robin load balancing over multiple paths, set timeout_policy to fail_path for all physical hdisks in the VIO server:

            # chdev -l hdisk0 -a timeout_policy=fail_path
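
To hit every disk in one pass, a loop along these lines should work from a root shell (oem_setup_env); this is an untested sketch, and any disk whose driver lacks the timeout_policy attribute will simply report an error you can ignore:

            for d in $(lsdev -Cc disk -F name); do chdev -l $d -a timeout_policy=fail_path; done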

Page 21 has links to documentation for installing AIX. Page 22 has a nice chart illustrating good choices for running AIX; the red, green, and yellow color coding is intended to help you decide which TL to run.

Page 23 lists AIX tunables and the values that should be changed.

Page 24 covers AIX 5.3 memory tuning.
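
I won't reproduce the slide, but for reference, the classic 5.3-era memory tuning looks like this. These are the commonly cited values, which later became the defaults in AIX 6.1; check the slide for Fredrik's exact numbers:

            vmo -p -o lru_file_repage=0 -o minperm%=3 -o maxperm%=90 -o maxclient%=90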

Page 26 has a nice tip: Largesend increases virtual Ethernet throughput and reduces processor utilization. Starting with AIX 6.1 TL7 SP1 and AIX 7.1 SP1, the operating system supports the mtu_bypass attribute for the shared Ethernet adapter, which provides a persistent way to enable the largesend feature. To determine whether the operating system supports the mtu_bypass attribute, run lsattr -El enX | grep by_pass. If the attribute is supported, the command will return:

            mtu_bypass off Enable/Disable largesend for virtual Ethernet True

            Enable largesend on all AIX en interfaces through:

            chdev -l enX -a mtu_bypass=on

Page 27 shows the recommended vSCSI parameters on each client partition. Page 28 covers vSCSI Queue Depth tuning for different disk subsystems.
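
The right queue depth depends entirely on the backing disk subsystem, so treat the numbers below as placeholders rather than Fredrik's recommendations; the client-side commands look something like:

            chdev -l vscsi0 -a vscsi_err_recov=fast_fail -a vscsi_path_to=30 -P

            chdev -l hdisk0 -a queue_depth=20 -P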

There is also a section on PowerHA. It’s recommended that new deployments go with PowerHA 7.1. Page 31 covers I/O pacing with PowerHA.
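
I/O pacing is controlled through the sys0 high and low water marks. As a hedged example, here are the AIX 6.1 default values; the slides may recommend different numbers for PowerHA:

            chdev -l sys0 -a maxpout=8193 -a minpout=4096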

An FAQ starts on page 32. Here’s a tip I like:

            Q: How do I run nmon to collect disk service times, top process cpu consumption, etc?

            A: STG Lab services recommends the following parameters for nmon data collection:

            /usr/bin/nmon -M -^ -f -d -T -A -s 60 -c 1435 -m /tmp/nmonlog

            This invokes nmon every 60 seconds for 1,435 snapshots, just under 24 hours, capturing vital disk access time data along with the top processes.

            -d includes the Disk Service Time section in the view

            -T includes the top processes in the output and saves the command line arguments into the UARG section

            -^ includes the Fibre Channel (FC) sections

            On the HMC, there is an “Allow performance information collection” checkbox on the processor configuration tab. Select this checkbox on each partition for which you want to collect this data. If you are using IVM… use the lssyscfg command, specifying the all_perf_collection (permission for the partition to retrieve shared processor pool utilization) parameter. Valid values for the parameter are 0, do not allow authority (the default), and 1, allow authority.
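
To check the setting from the HMC command line, something like this should work; I believe the attribute appears as allow_perf_collection in lssyscfg output, and the managed system name here is a placeholder:

            lssyscfg -r lpar -m MANAGED_SYSTEM -F name,allow_perf_collection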

Starting on page 36, there are references to older documents, which may still be helpful in certain environments.

This is a fantastic set of slides, full of current, real-world information and suggestions.