Edit: Some links no longer work.
Originally posted September 6, 2016 on AIXchange
This great new techdoc from Steve Knudson recently went live. It includes a set of slides that cover Ethernet on POWER, along with a cheat sheet that you may find valuable as you transition to 10G adapters on POWER8 servers:
“Moving some older FCoE 10Gb adapters from POWER7, PCIe-Gen1 slots, to POWER8, PCIe-Gen3 slots, we saw SEA throughput on a single 10Gb Ethernet port move from approx 4.2Gb/sec, up to 8.95Gb/sec. LPAR to LPAR, within the POWER8 hypervisor, we saw an astonishing 45Gb/sec, AIX to AIX. See the full slide deck attached.
The cheat sheet for AIX and SEA performance:
1) Before SEA is configured, put dcbflush_local=yes on the trunked virtual adapters. If SEA is already configured, skip this.
$ chdev -dev entX -attr dcbflush_local=yes
2) Configure SEA. largesend is on by default on the SEA; turn large_receive on as well.
$ chdev -dev entY -attr large_receive=yes
3) Up in AIX, before IP is configured, put dcbflush_local on virtual Ethernet adapters. If IP is already configured, skip this.
# chdev -l ent0 -a dcbflush_local=yes (slide 55)
4) Up in AIX, put thread and mtu_bypass on the interface en0 (slide 55).
# chdev -l en0 -a thread=on
# chdev -l en0 -a mtu_bypass=on
5) Ensure you have enough CPU in sending AIX, sending VIO, receiving VIO, and receiving AIX. See slides 75-76.”
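To keep the ordering of the cheat sheet straight, here is a minimal dry-run sketch that only prints the commands from steps 1 through 4 and executes nothing. The function names and the device names passed in (ent2, ent5, ent0, en0) are placeholders I made up for illustration; substitute your own adapters.

```shell
#!/bin/sh
# Dry run: print the cheat-sheet commands in order. Nothing is executed.
# Function names and device names are hypothetical placeholders.

print_vios_steps() {
    trunk_adapter=$1   # trunked virtual adapter on the VIO server
    sea_device=$2      # the Shared Ethernet Adapter
    # Step 1: only applies before the SEA is configured.
    printf '$ chdev -dev %s -attr dcbflush_local=yes\n' "$trunk_adapter"
    # Step 2: after the SEA exists, add large_receive.
    printf '$ chdev -dev %s -attr large_receive=yes\n' "$sea_device"
}

print_aix_steps() {
    adapter=$1         # virtual Ethernet adapter in the AIX client
    interface=$2       # IP interface on top of that adapter
    # Step 3: only applies before IP is configured.
    printf '# chdev -l %s -a dcbflush_local=yes\n' "$adapter"
    # Step 4: thread and mtu_bypass go on the interface.
    printf '# chdev -l %s -a thread=on\n' "$interface"
    printf '# chdev -l %s -a mtu_bypass=on\n' "$interface"
}

print_vios_steps ent2 ent5
print_aix_steps ent0 en0
```

Review the printed lines against your own configuration before running any of them as padmin or root.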
From the agenda in the slides:
-Physical Ethernet Adapters
-Jumbo Frames
-Link Aggregation Configuration
-Shared Ethernet Adapter SEA Configuration
-VIO 2.2.3, Simplified SEA Configuration
-SEA VLAN Tagging
-VLAN awareness in SMS
-10 Gb SEA, active – active
-ha_mode=sharing, active – active
-Dynamic VLANs on SEA
-SEA Throughput
-Virtual Switch – VEB versus VEPA mode
-AIX Virtual Ethernet adapter
-AIX IP interface
-AIX TCP settings
-AIX NFS settings
-largesend, large_receive with binary ftp for network performance
-iperf tool for network performance
Most syntax in this presentation is VIO padmin, sometimes root smitty.
From slide 13:
Jumbo frames is a physical setting. It is set
-on Ethernet switch ports
-on physical adapters
-on the link aggregation, if used
-on the Shared Ethernet Adapter.
-Jumbo frames is NOT set on the virtual adapter or interface in the AIX client LPAR.
-Do not change MTU on the AIX client LPAR interface. We will use mtu_bypass (largesend) in AIX.
-mtu_bypass – up to 64KB segments sent from AIX to SEA, resegmentation on the SEA for the physical network (1500 or 9000 as appropriate).
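As a rough illustration of that resegmentation, the sketch below estimates how many physical frames the SEA would emit for one full 64KB segment. It assumes IPv4 with 20-byte IP and 20-byte TCP headers and no options, so the real count on your network may differ slightly; the function name is my own.

```shell
#!/bin/sh
# Estimate wire frames per largesend segment.
# Assumption: 40 bytes of IPv4 + TCP headers per frame, no options.

frames_per_segment() {
    segment_bytes=$1
    mtu=$2
    payload=$((mtu - 40))                                # TCP payload per frame
    echo $(( (segment_bytes + payload - 1) / payload ))  # ceiling division
}

frames_per_segment 65536 1500   # prints 45
frames_per_segment 65536 9000   # prints 8
```

The point is that AIX hands one large segment to the SEA instead of building dozens of small packets itself, which is where the CPU savings come from.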
From slide 16, link aggregation configuration:
-Mode – standard if network admin explicitly configures switch ports in a channel group for our server.
-Mode – 8023ad if network admin configures LACP switch ports for our server. ad = Autodetect – if our server approaches switch with one adapter, switch sees one adapter. If our server approaches switch with a Link Aggregation, switch auto detects that. For 10Gb, we should be LACP/8023ad.
-Hash Mode – default is by IP address, good fan out for one server to many clients. But will transmit to a given IP peer on only one adapter.
-Hash Mode – src_dst_port, uses source and destination port numbers in hash. Multiple connections between two peers likely hash over different adapters. Best opportunity for multi-adapter bandwidth between two peers. Whichever mode is used, we prefer hash_mode=src_dst_port.
-Backup adapter – optional, standby, single adapter to same network on a different switch. Would not use this for link aggregations underneath SEA Failover configuration. Also would likely not use on a large switch, where active adapters are connected to different, isolated “halves” of a large “logical” switch.
-Address to ping – Not typically used. Aids detection for failover to backup adapter. Needs to be a reliable address, but perhaps not the default gateway. Do not use this on the Link Aggregation, if SEA will be built on top of it. Instead use netaddr attribute on SEA, and put VIO IP address on SEA interface.
-Using mode and hash_mode, AIX readily transmits on all adapters. You may find the switch delivers receives on only one adapter – switches must enable a hash_mode setting as well.
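The difference between the two hash modes is easier to see with a toy model. The sketch below is not the actual AIX hashing algorithm; it simply takes a numeric key modulo the adapter count, and the peer address, port numbers, and function name are invented for illustration.

```shell
#!/bin/sh
# Toy model only: NOT the real AIX link aggregation hash.
# Picks an adapter index from a numeric key, modulo the adapter count.

pick_adapter() {
    key=$1
    adapters=$2
    echo $(( key % adapters ))
}

adapters=2
dst_last_octet=25   # hypothetical peer 10.1.1.25
dst_port=5001       # hypothetical destination port

# Keyed on the destination IP alone: every connection to this peer
# produces the same key, so all of them use the same adapter.
for src_port in 40000 40001 40002 40003; do
    echo "ip-only      src_port=$src_port -> adapter $(pick_adapter "$dst_last_octet" "$adapters")"
done

# Keyed on source + destination port: the key changes per connection,
# so connections to the same peer can spread across adapters.
for src_port in 40000 40001 40002 40003; do
    echo "src_dst_port src_port=$src_port -> adapter $(pick_adapter $((src_port + dst_port)) "$adapters")"
done
```

In this toy run the IP-only lines all report the same adapter, while the src_dst_port lines alternate between the two.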
From slide 19, Shared Ethernet Adapter (SEA) configuration:
-Some cautions with largesend
-POWER Linux does not handle largesend on SEA. It has a negative performance impact on sftp and NFS in Red Hat RHEL.
-A few customers have had trouble with what has been referred to as a DUP-ACK storm when packets are small, and largesend is turned off in one client. Master APAR IV12424 lists APARs for several levels of AIX.
-A potential “denial of service” attack can be waged against largesend, using a specially crafted sequence of packets. ifixes for various AIX levels are listed here.
-largesend is NOT a universal problem, and these ifixes are not believed to be widely needed.
From slide 77, iperf 10 Gb, SEA:
-If you are getting less than the values on the two previous slides…
-It appears that LARGESEND is on physical 10Gb adapter interfaces automatically, but you can set it explicitly:
$ chdev -dev en4 -attr mtu_bypass=on
-Check that largesend, large_receive are on SEA at both ends:
$ chdev -dev ent4 -attr largesend=1 large_receive=yes
-Check that mtu_bypass (largesend) is on AIX client LPAR interfaces:
# chdev -l en0 -a mtu_bypass=on
-Watch CPU usage in both VIOs, both Client LPARs during iperf interval and make sure no LPAR is pegged or starving.
You’ll find plenty of other helpful tips and tricks here, so take the time to read through the slides. I’m sure you’ll learn at least one thing you didn’t already know.