Managing a Dump Device

Edit: Don’t overlook this.

Originally posted May 20, 2014 on AIXchange

Have you ever seen errors like this in your error log?

E87EF1BE   0515150014 P O dumpcheck      The largest dump device is too small.
E87EF1BE   0514150014 P O dumpcheck      The largest dump device is too small.

Have you verified that your system is capable of storing a system dump? If not, this technote on managing a dump device could help:

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure. Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

This document applies to AIX versions 5, 6, and 7.

There are different sections to this technote, including:

Managing system dump devices
Determining proper size for dump device
Setting a tape drive as a dump device
Do not dump to a mirrored logical volume
Dumping outside the rootvg
Remote dumps over to a network
How to create a dedicated dump device
Related documentation

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices, a primary and a secondary. To view information about the current dump devices, enter:

sysdumpdev -l

Example:

# sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     FALSE
always allow dump    TRUE
dump compression     ON
type of dump         traditional
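If you want to check the primary device from a script, a small filter over this output is enough. The following is only a sketch: primary_dump_device is a hypothetical helper name, and it assumes the output keeps the two-column layout shown in the example. It reads the text from standard input, so the sample below runs without AIX; on a real system you would pipe sysdumpdev -l into it.

```shell
# Hypothetical helper: print the device listed on the "primary" line of
# sysdumpdev -l output read from standard input.
# Assumes the label is the first field and the device path the second.
primary_dump_device() {
    awk '$1 == "primary" { print $2 }'
}

# Sample input copied from the example above.
printf '%s\n' \
    'primary              /dev/lg_dumplv' \
    'secondary            /dev/sysdumpnull' |
    primary_dump_device

# On an AIX system the equivalent would be:
#   sysdumpdev -l | primary_dump_device
```

With the sample input this prints /dev/lg_dumplv.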

The document also provides information about the primary and secondary dump devices, along with different flags that you can set to manage your dump devices:

If the primary dump device is the primary paging device, the only way it can copy the dump to the filesystem save area is if there is enough free space in that filesystem. The free space in the filesystem can be determined with the df command. If the free space in that filesystem is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of that filesystem to have enough free space, remove files in that filesystem until enough free space is available, or move the save area to another filesystem with the required space. The latter can be accomplished with the sysdumpdev command. This filesystem must be in the rootvg volume group.
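The comparison described above is simple arithmetic, and a rough sketch of it follows. The function name dump_fits and the sample numbers are made up for illustration. It assumes you have already read the estimated dump size in bytes from sysdumpdev -e and the free space in KB from df -k for the copy directory's filesystem. The arithmetic is done in awk so that large byte counts do not depend on the shell's integer size.

```shell
# Hypothetical helper: succeed if the free space can hold the dump.
#   $1 = estimated dump size in bytes (from sysdumpdev -e)
#   $2 = free space in KB (from df -k on the copy directory's filesystem)
dump_fits() {
    awk -v need="$1" -v free_kb="$2" \
        'BEGIN { exit !(free_kb * 1024 >= need) }'
}

# Made-up numbers: a 1.7 GB estimate against 2 GB of free space.
if dump_fits 1700000000 2097152; then
    echo "enough free space for the dump"
else
    echo "not enough free space for the dump"
fi
```

If the check fails, the options are the ones listed above: grow the filesystem, clear space in it, or move the copy directory.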

It is not recommended that a standalone dump logical volume be mirrored. It is much better practice to have a primary and a secondary dump device, each wholly contained on separate hdisks, rather than mirroring these devices. If for some reason the primary dump device is inaccessible, the dump program will then attempt to dump to the secondary device.

So how do you fix the error I listed at the start of this post? Read the whole technote for more information, but the short answer is: estimate how much space you need for your dump by running sysdumpdev -e, then divide that estimated size by your physical partition (PP) size to determine how many PPs your dump device should have:

Note: This value is what the machine requires as it is currently running. It can change with the activity of the machine, so it is best to run this command when the machine is under its heaviest workload.

This will return a value in bytes. The primary dump device should be a size that is greater than the value returned. If the dump device is a standard dump logical volume, such as lg_dumplv, then use the command extendlv to increase its size. If it is the primary paging space hd6, use the command chps.
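To make the division concrete, here is a minimal sketch of the PP calculation. The helper name pps_needed and the sample figures are hypothetical. It assumes you take the byte estimate from sysdumpdev -e and the PP size in MB from the PP SIZE field of lsvg rootvg. The result is rounded up, because a partial PP cannot be allocated.

```shell
# Hypothetical helper: number of physical partitions (PPs) needed to hold
# a dump of the given size, rounded up to a whole PP.
#   $1 = estimated dump size in bytes (from sysdumpdev -e)
#   $2 = PP size in MB (PP SIZE field of lsvg rootvg)
pps_needed() {
    awk -v bytes="$1" -v pp_mb="$2" 'BEGIN {
        pp_bytes = pp_mb * 1024 * 1024
        n = int(bytes / pp_bytes)
        if (n * pp_bytes < bytes) n++   # round up any partial PP
        printf "%d\n", n
    }'
}

# Made-up numbers: a 1,700,000,000-byte estimate with 128 MB PPs.
pps_needed 1700000000 128   # prints 13
```

Since the device should be larger than the estimate, and the estimate grows with load, treat the result as a minimum and add a few PPs of headroom. Compare it with the number of PPs the dump device has now to see how far to extend it.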

Believe me, you don’t want to wait for a catastrophic operating system software failure to discover that your dump devices are too small.