A Tool for SAN Troubleshooting

Edit: Still good stuff.

Originally posted July 7, 2015 on AIXchange

Are you looking for more information about your SAN? Do you want to learn about the LUNs that have been presented to your host? Maybe you want to compare what your machine sees on the SAN now with what it saw before.

IBM has a SAN troubleshooting tool that can help you. It’s called devscan:

The purpose of devscan is to make debugging storage problems faster and easier. Devscan does this by rapidly gathering a great deal of information about the Storage Area Network (SAN) and displaying it in an easy to understand manner. Devscan can be run from any AIX host, including VIO clients, or from a VIOS.

The information devscan displays is gathered from the SAN itself or the device driver, not from ODM, with exceptions described in the man page. The data is therefore guaranteed to be current and correct.

In the default case, devscan is unable to change any state on the SAN or on the host, making it safe to run even in production environments. In all cases, devscan is safer to run than cfgmgr, because it cannot change the ODM. Some of the optional commands devscan can use are able to cause a state change on the SAN. Details are provided in the man page.

Devscan can report a list of all available target devices and LUNs.
For each LUN, devscan can report:
· ODM name and status
· PVID, if there is one
· Device type
· Capacity and block size
· SCSI status
· Reservation status, both SCSI-2 and SCSI-3
· ALUA status
· Time to service a SCSI Read

Devscan scans a set of SCSI adapters, and then issues a set of commands to a set of targets and LUNs on those adapters. In the default case, devscan finds every Fibre Channel, SAS, iSCSI, and VSCSI adapter in the system and traverses each one. It issues SCSI Report LUNs and Inquiry commands to every target and LUN it finds. The set of adapters to be scanned, targets and LUNs to be traversed, and commands to be issued may be controlled with several of the optional flags.

Usage examples
1. To run against all SCSI adapters with the default command set (Start, Report LUNs, and Inquiry):
    devscan
2. To run against only the fscsi3 adapter and gather SCSI Status from all attached devices:
    devscan -c7 --dev=fscsi3
3. To determine what the NPIV client using WWPN C0507601A673002A can see through all Fibre Channel adapters on the VIOS (e.g., because the client cannot boot):
    devscan -t f -n C0507601A673002A
4. To run devscan in machine-parseable mode using “::” as the field delimiter:
    devscan --concise --delim="::"
5. To run devscan against only the VSCSI adapters in the system and write the output to /tmp/vscsi_scan_results:
    devscan -tv -o /tmp/vscsi_scan_results
6. To scan only the storage port 5001738000330193:
    echo "f|||5001738000330193" | devscan --whitelist=-
7. To scan only the storage at SCSI ID 0x010400:
    echo "f|010400" | devscan --whitelist=-
8. To scan only for hdisk15:
    echo "hdisk15" | devscan --whitelist=-
9. To scan for all targets except the one with WWNN 5001738000330000:
    echo "f||||5001738000330000" | devscan --blacklist=-
10. To scan for an iSCSI target at 192.168.3.147:
    echo "192.168.3.147" | devscan --iscsitargets=-
11. To check the SCSI status of hdisk71 on all the Fibre adapters in the system and send the output to /tmp/devscan.out:
    echo "hdisk71" | devscan --whitelist=- -o /tmp/devscan.out -tf -c7 -F
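
The whitelist and blacklist entries in examples 6 through 9 follow a visible pattern, so a small helper can make them less error-prone to type. The Python sketch below is an illustration only. The field order (adapter type, SCSI ID, an unnamed third field, WWPN, WWNN) is inferred from the examples above, not taken from the man page. The names fc_entry and run_devscan are hypothetical, and feeding several entries one per line is an assumption.

```python
import subprocess


def fc_entry(scsi_id="", wwpn="", wwnn=""):
    """Build one Fibre Channel filter entry such as 'f|||<WWPN>'.

    Assumption: field positions are inferred from the usage examples
    (type | SCSI ID | unknown | WWPN | WWNN). The third field is left
    empty because the examples never fill it in.
    """
    fields = ["f", scsi_id, "", wwpn, wwnn]
    # Drop trailing empty fields so the result has the same shape as the examples.
    while fields and fields[-1] == "":
        fields.pop()
    return "|".join(fields)


def run_devscan(entries, mode="whitelist", extra_args=()):
    """Pipe filter entries to devscan on stdin, as 'echo ... | devscan --whitelist=-' does.

    Assumption: multiple entries can be supplied one per line.
    """
    if mode not in ("whitelist", "blacklist"):
        raise ValueError("mode must be 'whitelist' or 'blacklist'")
    cmd = ["devscan", f"--{mode}=-", *extra_args]
    return subprocess.run(
        cmd, input="\n".join(entries) + "\n",
        text=True, capture_output=True, check=False,
    )


# Reproduce the entries used in examples 6, 7 and 9.
print(fc_entry(wwpn="5001738000330193"))
print(fc_entry(scsi_id="010400"))
print(fc_entry(wwnn="5001738000330000"))
```

On a host where devscan is installed, something like run_devscan([fc_entry(wwpn="5001738000330193")], extra_args=["-c7"]) would correspond to example 6 with the SCSI status command added. Check the entry format against the man page before relying on it.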

Output examples
1. Processing FC device:
    Adapter driver: fcs4
    Protocol driver: fscsi4
    Connection type: none
    Local SCSI ID: 0x000000
    Device ID: df1000fe
    Microcode level: 271102

The connection type of “none” indicates this adapter has never had a link.
2. Processing FC device:
    Adapter driver: fcs0
    Protocol driver: fscsi0
    Connection type: fabric
    Link State: down
    Current link speed: 4 Gbps
    Local SCSI ID: 0x180600
    Device ID: 77102224
    Microcode level: 0125040024

The link state of “down” indicates this adapter has had a link up since the last time it was configured, but does not have one currently.
3. Nameserver query succeeded, but indicated no targets are available on the SAN. This means the adapter’s link to the switch is good, but no storage is available, typically because the storage has unexpectedly left the SAN or because it was not zoned to this host port.

4. Processing iSCSI device:
    Protocol driver: iscsi0

    No targets found
    Elapsed time this adapter: 0.001358 seconds

For non-Fibre Channel devices, there is no name server, so the no-targets condition looks like this.

5. 00000000001f7d00 0000000000000000
    START failed with errno ECONNREFUSED

Devscan is able to reach this device, so the host is connected to the SAN and the nameserver is reporting it, but we are not able to log in to the device. This is an end device problem.

6. Vendor ID: IBM Device ID: 2107900 Rev: 5.90 NACA: yes
PDQ: Not connected PDT: Unknown or no device
Dynamic Tracking Enabled
TUR SCSI status:

Check Condition (sense key: ABORTED_COMMAND;
ASCQ: LOGICAL UNIT NOT SUPPORTED)
ALUA-capable device
Report LUNs failed with errno ENXIO
Extended Inquiry failed with errno ETIMEDOUT
Test Unit Ready failed with errno EIO
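
The two adapter-level conditions described for the first two output samples above can be checked mechanically. The Python sketch below assumes the adapter header has already been captured as text in the "Key: value" layout shown in those samples. The names parse_adapter_header and link_diagnosis are hypothetical helpers, not part of devscan, and only the two conditions explained in the text are recognized.

```python
def parse_adapter_header(text):
    """Collect 'Key: value' lines from one adapter header block into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Lines such as 'Processing FC device:' have no value and are skipped.
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


def link_diagnosis(info):
    """Map the header fields to the explanations given in the text."""
    if info.get("Connection type") == "none":
        return "adapter has never had a link"
    if info.get("Link State") == "down":
        return "adapter had a link since it was last configured, but not now"
    return "no link problem indicated by the header"


FCS4 = """\
Processing FC device:
    Adapter driver: fcs4
    Protocol driver: fscsi4
    Connection type: none
    Local SCSI ID: 0x000000
"""

FCS0 = """\
Processing FC device:
    Adapter driver: fcs0
    Protocol driver: fscsi0
    Connection type: fabric
    Link State: down
    Current link speed: 4 Gbps
"""

for block in (FCS4, FCS0):
    header = parse_adapter_header(block)
    print(header["Adapter driver"], "->", link_diagnosis(header))
```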

Other usage examples can be found on the website. Download devscan and follow these installation instructions:

1. Download the package to your machine.
2. Uncompress and extract the archive. The binary and man page are placed in /usr/local/bin and /usr/share/man/man1/, respectively, and are ready for use.

Here’s some of the output that I saw on a test machine:

    Running on host: vio1

    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    Processing FC device:
        Adapter driver: fcs0
        Protocol driver: fscsi0
        Connection type: fabric
        Link State: up
        Local SCSI ID: 0x010000
        Local WWPN: 0x10000090fa535192
        Local WWNN: 0x20000090fa535192
        Device ID: 0xdf1000e21410f103
        Microcode level: 00010000020025200009

    SCSI ID LUN ID           WWPN             WWNN
    -----------------------------------------------------------
    0a0600  0000000000000000 500507680230e835 500507680200e835
        Vendor ID: IBM          Device ID: 2145     Rev: 0000 NACA: yes
        PDQ: Connected          PDT: Block (Disc)
        Name:          hdisk14  Path:            0  VG:       None found
        Device already SCIOLSTARTed    Dynamic Tracking Enabled
        Status: Enabled
        ALUA-capable device

    0a0600  0001000000000000 500507680230e835 500507680200e835
        Vendor ID: IBM          Device ID: 2145     Rev: 0000 NACA: yes
        PDQ: Connected          PDT: Block (Disc)
        Name:          hdisk15  Path:            0  VG:       None found
        Device already SCIOLSTARTed    Dynamic Tracking Enabled
        Status: Enabled
        ALUA-capable device

    0a0600  0002000000000000 500507680230e835 500507680200e835
        Vendor ID: IBM          Device ID: 2145     Rev: 0000 NACA: yes
        PDQ: Connected          PDT: Block (Disc)
        Name:          hdisk16  Path:            0  VG:    caavg_private
        Device already SCIOLSTARTed    Dynamic Tracking Enabled
        Status: Enabled
        ALUA-capable device

    2 targets found, reporting 20 LUNs,
    20 of which responded to SCIOLSTART.
    Elapsed time this adapter: 00.391183 seconds
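
If you save output like this with -o, you can compare what the host sees now against an earlier scan. The Python sketch below is a rough illustration that assumes the default human-readable layout shown above (a line of SCSI ID, LUN ID, WWPN and WWNN, followed by a "Name: ... Path: ... VG: ..." line). The names parse_luns and compare_scans are hypothetical. The sketch does not track which adapter a LUN was seen through, so it suits one adapter's section at a time.

```python
import re

# SCSI ID (6 hex digits) followed by LUN ID, WWPN and WWNN (16 hex digits each).
LUN_LINE = re.compile(
    r"^\s*([0-9a-f]{6})\s+([0-9a-f]{16})\s+([0-9a-f]{16})\s+([0-9a-f]{16})\s*$",
    re.IGNORECASE,
)
NAME_LINE = re.compile(r"Name:\s+(\S+)\s+Path:\s+(\S+)\s+VG:\s+(.+?)\s*$")


def parse_luns(text):
    """Return {(scsi_id, lun_id, wwpn): {'wwnn', 'name', 'vg'}} for one scan."""
    luns = {}
    current = None
    for line in text.splitlines():
        m = LUN_LINE.match(line)
        if m:
            current = m.groups()[:3]
            luns[current] = {"wwnn": m.group(4), "name": None, "vg": None}
            continue
        m = NAME_LINE.search(line)
        if m and current is not None:
            luns[current]["name"] = m.group(1)
            luns[current]["vg"] = m.group(3)
    return luns


def compare_scans(before, after):
    """List LUN keys that disappeared or appeared between two parsed scans."""
    return {
        "missing": sorted(set(before) - set(after)),
        "new": sorted(set(after) - set(before)),
    }


SAMPLE = """\
0a0600  0000000000000000 500507680230e835 500507680200e835
    Vendor ID: IBM          Device ID: 2145     Rev: 0000 NACA: yes
    Name:          hdisk14  Path:            0  VG:       None found
0a0600  0001000000000000 500507680230e835 500507680200e835
    Name:          hdisk15  Path:            0  VG:       None found
0a0600  0002000000000000 500507680230e835 500507680200e835
    Name:          hdisk16  Path:            0  VG:    caavg_private
"""

scan = parse_luns(SAMPLE)
for key, details in scan.items():
    print(key, details["name"], details["vg"])
```

Parsing two saved files with parse_luns and passing the results to compare_scans gives a quick list of LUNs that have gone missing or newly appeared. For scripted use, the --concise and --delim options mentioned earlier are probably a sturdier basis than scraping the human-readable form.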

Did you know this tool existed? Have you used it? What did you think?