Edit: Still true today. Have you tested your backup lately?
Originally posted May 20, 2008 on AIXchange
A buddy recently told me about a situation he encountered where a non-disruptive disk update on a storage area network proved extremely disruptive. The client lost its LUNs, which impacted all of the LPARs that were booting from SAN, along with many machines that had datavg stored out in the SAN. Rootvg on these partitions was gone.
Still, the client figured it could restore its environment from its most recent backups. However, this was a test environment. There were no backups. The machines had been built and used, with no thought ever given to recovering them. Since this wasn't production, it wasn't important enough to add jobs to the backup server to accommodate these machines, right?
This was another painful lesson learned. The machines had to be rebuilt. The scripts, users, tools and code under development were all lost.
I, for one, have covered this topic repeatedly, because it's important. Yet some continue to ignore the message. I still find environments that don't have consistent, recent backups. Yes, IBM hardware is robust, and it's known for reliability and availability. But even the best hardware is still man-made, and machines break. And even the most experienced administrators log on and make mistakes. Stuff happens.
To this, some respond: “I have nothing to worry about in my environment. I take my weekly mksysb. I send the tapes offsite. I take my nightly backup. It goes over the network to my backup server, which we then offload to tape, which we then take offsite.”
But do you test those backups? Are you sure the tapes can be read? I’m not even talking about a true disaster-recovery scenario, where your building burned down and you’re trying to rebuild. I’m talking about simply trying to restore data to machines in your current environment.
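To make that concrete, here's a minimal sketch of the kind of sanity check I'm talking about. It assumes a mksysb image written to a file (the path is a hypothetical stand-in for your own backup target, and the pass/fail wrapper is mine); it simply asks restore to walk the entire table of contents, which at least proves the archive is intact and readable:

```python
#!/usr/bin/env python3
# Minimal sketch: verify that a mksysb image file is readable end to end.
# The image path below is hypothetical -- substitute your own backup target.
# 'restore -Tqf' lists the table of contents of a backup-format archive;
# a nonzero exit means the image is damaged or unreadable.
import subprocess
import sys

MKSYSB_IMAGE = "/backup/host1.mksysb"  # hypothetical path

def verify_mksysb(image):
    """Return True if restore can read the entire table of contents."""
    result = subprocess.run(
        ["restore", "-Tqf", image],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if result.returncode != 0:
        print(f"FAILED: cannot read {image}: {result.stderr.decode().strip()}")
        return False
    entries = result.stdout.decode().splitlines()
    print(f"OK: {image} lists {len(entries)} entries")
    return True

if __name__ == "__main__":
    sys.exit(0 if verify_mksysb(MKSYSB_IMAGE) else 1)
```

Reading the table of contents only proves the media can be read. Restoring a few files to a scratch directory is a stronger test, and nothing beats an occasional full restore to a spare LPAR.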
When you really need to restore isn’t the time to find out that you can’t restore. It’s critical that you take the time to confirm this now, when things are still running smoothly.
Ask yourself what will be lost if you must go back to your last backup. Did you take that backup last night? If your outage happened right before closing time, is your business prepared to roll back to last night's backup and redo the work that was done during the day?
More and more, the answer is no: Outages of any kind and any length cannot be tolerated.
Constantly look at your environments. If you add file systems to machines, are the new file systems being backed up? Is the frequency of the backups acceptable to the business? Does management agree with your conclusion?
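One way to catch that kind of drift is to audit it mechanically. Here's a sketch of the idea, with the caveat that the include-list file and its one-mount-point-per-line format are assumptions of mine (real backup products keep this in their own configuration): diff what's actually mounted against what the backup job knows about.

```python
#!/usr/bin/env python3
# Sketch: flag mounted file systems that are missing from a backup include list.
# Assumes the backup job reads mount points from a plain-text include file,
# one per line -- a hypothetical stand-in for your backup product's config.
import subprocess

INCLUDE_LIST = "/usr/local/etc/backup_includes"  # hypothetical path

def mounted_filesystems():
    """Parse 'lsfs -c' (colon-delimited AIX output) into a set of mount points."""
    out = subprocess.run(["lsfs", "-c"], stdout=subprocess.PIPE, check=True)
    mounts = set()
    for line in out.stdout.decode().splitlines():
        if line.startswith("#"):  # skip the header line
            continue
        mounts.add(line.split(":")[0])
    return mounts

def backed_up_filesystems():
    with open(INCLUDE_LIST) as f:
        return {line.strip() for line in f if line.strip()}

if __name__ == "__main__":
    missing = mounted_filesystems() - backed_up_filesystems()
    for fs in sorted(missing):
        print(f"NOT BACKED UP: {fs}")
```

Run something like this from cron and any file system that gets added without a matching backup job shows up before you need it, not after.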
The objective of a backup is to be able to restore. Be sure that you can.