Edit: Some links no longer work.
Originally posted February 21, 2012 on AIXchange
Another great Virtual User Group webinar recently took place, this one featuring Shawn Bodily’s presentation, “Introduction to PowerHA SystemMirror for AIX Standard Edition.” Be sure to get the presentation materials and listen to the replay. And look forward to the next VUG webinar, when this topic will be continued.
This material is very similar to the presentation on the same topic at last fall’s IBM Technical University. That’s an indication of the interest surrounding this solution. As Shawn notes in his presentation, IBM has unveiled 23 major releases of the product, an average of one release per year. More than 12,000 customers use it worldwide. While Standard Edition is the focus of Shawn’s presentation, he points out that PowerHA SystemMirror for AIX Enterprise Edition allows for multi-site cluster management, while also including all the functionality of Standard Edition.
Shawn notes that “PowerHA gives you cluster management for the data center. The software monitors, detects and reacts to events. It establishes a heartbeat between the systems, and enables automatic switch-over.” He then defines what HA solutions generally try to do, which is eliminate single points of failure. The goal is to reduce downtime, and HA can help you with planned downtime as well. It might not be fault tolerant, but it will be fault resistant.
One chart shows how failure points can be eliminated. To eliminate node failure, use multiple nodes. To eliminate VIO server failure, use dual VIO servers. To eliminate site failure, deploy additional sites. Other items are covered, but again, the idea is to build in redundancy so you can continue to provide access to the applications and data that the business runs on.
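To put a rough number on why redundancy helps, here is a minimal Python sketch. It is not from Shawn’s slides. It assumes the redundant components fail independently and that failover is instantaneous, which real clusters only approximate, and the 99 percent figure is purely hypothetical.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming
    independent failures and instant failover (both idealizations)."""
    # The service is down only when every copy is down at the same time.
    return 1 - (1 - single) ** copies


# Hypothetical figure: each node is available 99% of the time.
one_node = combined_availability(0.99, 1)
two_nodes = combined_availability(0.99, 2)
print(f"one node:  {one_node:.4%}")
print(f"two nodes: {two_nodes:.4%}")
```

Under those assumptions a second node takes the chance of a total outage from 1 in 100 to 1 in 10,000. Actual results depend on failover time and on failures the nodes share, such as storage or site loss.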
Shawn discussed his customer interactions (an area I’ve touched on previously):
“First, another IBM representative will tell the customer about the hardware and the systems’ reliability, availability and serviceability (RAS) features. Then a second rep will discuss live partition mobility and how it seamlessly shifts logical partitions from one frame to another. So after 20-30 minutes of hearing about how the hardware never fails, THEN Shawn must step in and explain why the customer should be concerned with high availability and disaster recovery. That’s one tough act to follow.”
Some additional slides introduce Live Partition Mobility and explain how it allows you to move your live running operating system and application from one physical frame to another. Of course, that’s a hardware maintenance solution. What about software maintenance? With PowerHA, you can fail over to the other node, upgrade your application or your operating system, then fail back and do the same thing on the other side. Shawn notes that basically, PowerHA performs the functions that LPM doesn’t. He also gets into how PowerHA is used to recover from node failures, network failures, loss of shared storage access, and — with version 7.1 based on Cluster Aware AIX technology — rootvg errors. Finally, Shawn covers the differences between PowerHA 6.1 and 7.1.
Shawn makes these additional points:
* Remember, we can fail over from one system to one system, from one system to any other system, from any system to one, and from any to any. We can also fail over between different versions of hardware from the same or different families, assuming you can live with the performance degradation, and after you verify that the version of the operating system you want to run will work with your particular configuration. Failing POWER7 to POWER6 or POWER5 could conceivably work as long as you verify that your particular setup is supported.
* Often you’ll want to make your service IP address highly available, along with your application server and your shared storage.
* You can create user-defined resources, custom resource groups and more granular resource group options. You can set up resource group dependencies. In his example, you might want to be sure your database is running before you try to start application servers, so you could configure that as a dependency.
* You can configure different priorities and choose which nodes your resources start on. This provides a great deal of flexibility and control when setting things up.
* You can also configure things to automatically run DLPAR and COD operations. So you could have a very “skinny” standby node, but when needed it could perform operations on the HMC and bring additional memory and CPU resources online.
* You can use application monitors to take action if PowerHA detects that the application has gone down, and you can set up file collections so the software helps keep config files and other important files in sync across the cluster. This is meant to support all regular files, as opposed to things like trying to keep password files in sync.
* Configuration and smart assistants are available to configure clusters. Systems Director plug-ins are also available, both for managing and monitoring your cluster’s state.
* The CAA command — /usr/sbin/clcmd — can be used to distribute commands across all cluster nodes.
* A cluster test tool can be used to validate clusters. This is also a good way to run tests across many different clusters in the environment, to ensure that we’re running the same tests across all of the machines.
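The resource group dependency idea from the list above (database first, then application servers) comes down to ordering. The following Python sketch only illustrates that ordering logic; it is not how PowerHA implements dependencies, and the resource group names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def start_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every resource group comes
    after the groups it depends on."""
    # `dependencies` maps each group to the groups that must be online first.
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical resource groups: the app servers need the database,
# and the web tier needs the app servers.
deps = {
    "db_rg": set(),
    "app_rg": {"db_rg"},
    "web_rg": {"app_rg"},
}

order = start_order(deps)
print(" -> ".join(order))
```

If two groups depended on each other, `static_order()` would raise `graphlib.CycleError`, which is the kind of mistake you would want to catch before relying on a configuration.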
Shawn’s final slide lists these great resources:
* IBM Redbook: PowerHA SystemMirror 7.1 for AIX
* PowerHA & AIX Support & Compatibility Matrix
* PowerHA Hardware Support Matrix
Incidentally, you can follow Shawn on Twitter.
I really enjoy these webinars, and of course the availability of the replays and presentation materials is a huge convenience. Hopefully my writing about VUG webinars encourages you to take the time and listen for yourself.
Note: Completely unrelatedly, I’m a YouTube sensation. Well, maybe not, but I’m there. Five new videos just went live where you can see me and some other IBM Power Champions talking with IBM’s Ian Jarman at last fall’s Technical University.