Edit: It has been a while since I needed to mess with SSA disks.
Originally posted September 2005 by IBM Systems Magazine
Recently, a user opened a problem ticket reporting that copying files back and forth from a server we support was taking an unusually long time. The files weren’t all that large, but the throughput was just terrible. After poking around a bit, we found that the Ethernet card wasn’t set to the correct speed. When we ran lsattr -El ent0, we found the media_speed set to Auto_Negotiation. I knew what the problem was immediately.
We’ve found the Auto_Negotiation setting on Ethernet adapters to be problematic on AIX. The Fast Ethernet port on our switch was always fixed at 100/Full. With Auto_Negotiation on, the card would sometimes correctly set itself to 100/Full, but at other times it would settle on 100/Half. The resulting duplex mismatch slows transfers down because the adapter now sees collisions, which show up in the output of netstat -v:
Packets with Transmit collisions:
 1 collisions: 204076      6 collisions: 37         11 collisions: 1
 2 collisions: 65375       7 collisions: 6          12 collisions: 0
 3 collisions: 16894       8 collisions: 2          13 collisions: 0
 4 collisions: 2404        9 collisions: 0          14 collisions: 0
 5 collisions: 255        10 collisions: 2          15 collisions: 0
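To put a single number on a histogram like the one above, a small filter can add up the counts. This is only a sketch: the function name sum_collisions is made up for illustration, and it assumes the "N collisions: count" layout shown here.

```sh
# Sum every count that follows a lowercase "collisions:" label on stdin.
# Lines such as "Excess Collisions: 0" are ignored because of the capital C.
# Note: if netstat -v reports several adapters, this adds them all together.
sum_collisions() {
    awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "collisions:")
                    total += $(i + 1)
        }
        END { print total + 0 }
    '
}
```

You would run it as netstat -v | sum_collisions. Fed the six lines above, it prints 289052.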
You can also determine if you’re having Receive Errors and see what speed your adapter is running at by using netstat -v. You’ll see something similar to the following:
RJ45 Port Link Status : up
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
Transmit Statistics:          Receive Statistics:
--------------------          -------------------
Packets: 33608151             Packets: 82280769
Bytes: 3364953629             Bytes: 89992126877
Interrupts: 15105             Interrupts: 79762362
Transmit Errors: 0            Receive Errors: 14000
Packets Dropped: 1            Packets Dropped: 14
                              Bad Packets: 0
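If you have a lot of servers to check, the two Media Speed lines are easy to pull out with a filter. The sketch below uses a made-up function name, check_duplex, and assumes the labels appear exactly as in the sample output, with the Selected line ahead of the Running line.

```sh
# Read netstat -v output on stdin and print one line per adapter,
# flagging any adapter that is running at half duplex.
check_duplex() {
    awk -F': *' '
        /Media Speed Selected/ { selected = $2 }
        /Media Speed Running/ {
            status = ($2 ~ /Half/) ? "WARN" : "OK"
            print status ": running " $2 " (selected: " selected ")"
        }
    '
}
```

Run it as netstat -v | check_duplex. A WARN line on a port the switch has fixed at 100/Full is the mismatch described earlier.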
How did we fix the duplex issue? We detached the interface and ran a chdev to make it 100/Full: chdev -l ent0 -a media_speed=100_Full_Duplex. Once we made this change, there were no more collisions and the user was a happy camper.
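For reference, here is roughly what that sequence looks like when spelled out. Treat it as a sketch under assumptions rather than the exact commands we ran: the helper name plan_force_duplex is made up, it assumes the interface on adapter entN is enN, and the final chdev ... state=up step is one common way to bring the interface back, not something stated above. It only prints the commands, so you can review them before running anything as root, from the console rather than over the link you are about to drop.

```sh
# Print (do not run) the commands to force a media speed on an adapter.
#   $1 = adapter number, e.g. 0 for ent0/en0
#   $2 = media_speed value, defaulting to the one used in this article
plan_force_duplex() {
    n=$1
    speed=${2:-100_Full_Duplex}
    echo "ifconfig en$n detach"                   # release the adapter
    echo "chdev -l ent$n -a media_speed=$speed"   # set speed and duplex
    echo "chdev -l en$n -a state=up"              # assumed way to re-enable the interface
}
```

Running plan_force_duplex 0 prints the three commands for ent0. Detaching the interface can also drop routes that go through it, so check netstat -rn afterward.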
Verifying Failed SSA Disks
Another issue that seems to crop up is when SSA disks die. How do you know which physical disk in your drawer needs to be replaced? In some instances, when the disk dies, you’re no longer able to go into Diag / Task Selection / SSA Service Aids / Link Verification to select your disk and identify it because it’s no longer responding.
In this situation, you can use link verification to identify the SSA disks on either side of the failed disk. The bad disk is the one sitting between the two blinking disks. Another way to verify that you’ve selected the correct disk to replace is to run lsattr -El pdiskX, where “X” is your failing pdisk number. The connwhere_shad attribute contains the serial number, which you can match against the serial number printed on the disk. (Note: It may not be an exact match. Compare characters 5-12 of the value, omitting the trailing 00D, with the printed serial number.) Here’s sample output:
lsattr -El pdisk45
adapter_a        ssa3              Adapter connection                                     False
adapter_b        none              Adapter connection                                     False
connwhere_shad   006094FE94A100D   SSA Connection Location                                False
enclosure        00000004AC14CB52  Identifier of enclosure containing the Physical Disk   False
location                           Location Label                                         True
primary_adapter  adapter_a         Primary adapter                                        True
size_in_mb       36400             Size in Megabytes
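The serial-number comparison can be scripted so you don’t have to count characters by eye. This is a minimal sketch with a made-up function name, ssa_serial, and it assumes the connwhere_shad value has the same 15-character layout as the sample above.

```sh
# Read lsattr -El pdiskX output on stdin and print characters 5-12
# of the connwhere_shad value, the part to compare with the disk label.
ssa_serial() {
    awk '$1 == "connwhere_shad" { print substr($2, 5, 8) }'
}
```

For the sample output, lsattr -El pdisk45 | ssa_serial prints 94FE94A1.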
You can also locate the disk by its location code with lsdev -C | grep pdiskX. To replace it, run rmdev -dl pdiskX, swap in the replacement disk and run cfgmgr.
If your SSA disk was part of a RAID array, hopefully your hot spare has taken over by this point, and you can just make the replacement disk the new hot spare. To do so, go to diag / Task Selection / SSA Service Aids / SMIT - SSA RAID Arrays / Change/Show Use of an SSA Physical Disk, and change the newly replaced disk from a system disk to a hot spare disk. To verify all is well, I like to go into smitty / Devices / SSA RAID Arrays / List Status of Hot Spare Protection for an SSA RAID Array. It should report that the RAID array is protected and the status is good. Keep in mind that List Status of Hot Spare Protection works only with the latest SSA adapter (4-P); older cards such as the 4-N don’t have this feature.