HA and DR Overview

Edit: Some links no longer work.

Originally posted March 28, 2017 on AIXchange

What high availability (HA) and disaster recovery (DR) solutions are available for Power Systems? What are the pros and cons of these different solutions?

This comparison document created by Carl Burnett, Joe Cropper, and Ravi Shankar helps answer these questions:

“There are many elements to consider as you plan high availability and disaster recovery solutions for your Power Systems environment. In this article, we will explore some of these solutions and discuss some considerations and best practices when using technologies like PowerHA, Geographically Dispersed Resiliency and PowerVC within the data center.

Clustering based HA or DR solutions rely on redundant standby nodes in the cluster to be able to take over the workload and start them when the primary node fails. Each node in the cluster will monitor health of various elements such as network interfaces, storage, partner nodes, etc. to act when any of these elements fail. Clustering technologies were the closest to fault tolerant environments in regards to HA or DR support based on completely redundant software and hardware components. Cluster solutions are often operating system or platform specific and they do provide detailed error monitoring, though they require considerable effort to deploy and maintain….

It is expected that Power deployments use both models of HA & DR as needed. Cluster based HA-DR solutions are best for protecting critical workloads. For example, SAP environments are distributed and cluster-based HA would be the best method to monitor and act for various components of the stack. For other workloads, a VM restart-based model might be sufficient protection for HA and DR….”

That’s from section 2.0, which includes a nice graph that presents various solution types in terms of the availability they provide and the complexity of setting them up.
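To picture the clustering model described in that quote, here is a toy Python sketch of a primary node whose monitored elements are checked and whose workloads move to a standby on failure. It is not PowerHA code or any IBM API; the `Node` class, the `checks` names and the `fail_over` function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Health of each monitored element (hypothetical names); True means healthy.
    checks: dict = field(default_factory=lambda: {"network": True, "storage": True})
    workloads: list = field(default_factory=list)

    def healthy(self) -> bool:
        # A node is healthy only if every monitored element is healthy.
        return all(self.checks.values())


def fail_over(primary: Node, standby: Node) -> bool:
    """Move workloads to the standby if the primary has a failed element."""
    if primary.healthy() or not standby.healthy():
        return False
    standby.workloads.extend(primary.workloads)
    primary.workloads.clear()
    return True


primary = Node("node-a", workloads=["db", "app"])
standby = Node("node-b")
primary.checks["storage"] = False  # simulate a storage failure on the primary
moved = fail_over(primary, standby)
print(moved, standby.workloads)
```

A real cluster manager does much more (quorum, fencing, application-aware start and stop scripts), which is part of why the quoted document calls these solutions effortful to deploy and maintain.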

This comes from sections 3.0, 3.1 and 3.2:

“Cluster HA/DR solutions have existed for Power systems for a long time (PowerHA has been the leading HA/DR solution on AIX for more than 20 years). They have been enhanced recently to provide additional capabilities and user experiences.

VM restart based HA/DR solutions are new in 2016 and are described below:
1. PowerVC High Availability Features: PowerVC added new capabilities around VM restart High Availability. These capabilities enable customers to deploy cloud environments easily and enable simplified High availability.
2. Geographically Dispersed Resiliency (GDR) Disaster Recovery: IBM introduced a new offering for disaster recovery using VM restart technology and storage mirroring.

… PowerVC provides enterprise virtualization and cloud management of Power Systems and leverages OpenStack to do so. PowerVC has introduced high availability management functions over its past few releases. Listed below is a summary of those features:

· One-click system evacuation: During planned maintenance windows, this feature allows administrators to evacuate a host by leveraging live partition mobility (LPM). PowerVC orchestrates the mobility of all active VMs to other hosts in the environment (or a host of your choice), thereby allowing maintenance (e.g., firmware or VIOS updates, etc.) to be performed without disrupting workloads. While the host is in maintenance mode, PowerVC will not place any new VMs on this host either. Once maintenance is done, VMs can then be placed on the host again and normal operation can resume.

· Automated remote restart: PowerVC has supported PowerVM’s simplified remote restart feature since its inception in the POWER8 timeframe. This feature allows an administrator to rebuild a VM residing on a failed host to a healthy host (assuming the hosts have shared storage). This is a critical availability feature as it provides a mechanism to recover critical VMs in the event their hosting server fails unexpectedly (read: unplanned outage).

… Power systems now provides VM restart based DR solution for the entire data center. GDR integrates deeply with PowerVM environments (HMC, VIOS) to provide for DR restart of VMs across sites using storage replicated VM images. GDR Disaster Recovery solution is easy to deploy and manage. GDR can manage recovery of hundreds of VMs across the sites.”
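The one-click evacuation behavior quoted above (move every VM off a host, then keep new VMs away from it while it is in maintenance mode) can be pictured with a small toy model. This is only a sketch under simplifying assumptions: it does not call PowerVC or OpenStack, the `evacuate` and `place` functions are hypothetical, and "least-loaded host" is a stand-in for whatever placement policy the real product applies.

```python
def evacuate(hosts, source, maintenance):
    """Planned case: move every VM off `source`, then mark it in maintenance."""
    targets = [h for h in hosts if h != source and h not in maintenance]
    if not targets:
        raise RuntimeError("no eligible target host")
    for vm in list(hosts[source]):
        # Pick the least-loaded target; a real scheduler weighs far more.
        target = min(targets, key=lambda h: len(hosts[h]))
        hosts[source].remove(vm)
        hosts[target].append(vm)
    maintenance.add(source)


def place(hosts, vm, maintenance):
    """New VMs are never placed on a host that is in maintenance mode."""
    candidates = [h for h in hosts if h not in maintenance]
    target = min(candidates, key=lambda h: len(hosts[h]))
    hosts[target].append(vm)
    return target


# Hypothetical inventory: host name -> list of VM names.
hosts = {"host1": ["vm1", "vm2"], "host2": ["vm3"], "host3": []}
maintenance = set()

evacuate(hosts, "host1", maintenance)        # host1 is emptied for maintenance
new_home = place(hosts, "vm4", maintenance)  # vm4 lands on a host other than host1
print(hosts, new_home)
```

Removing `host1` from the `maintenance` set afterwards would make it eligible for placement again, which mirrors the "normal operation can resume" step in the quote.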

The References section at the end of the document also points you to information about Geographically Dispersed Resiliency (GDR), starting with this diagram about the IBM offering. There are also links to two GDR articles (here and here).