Edit: Link works at the time of this writing.
Originally posted February 18, 2014 on AIXchange
Recently during a PowerHA 7.1.2 installation, the network team was unable to get multicast communication working properly. Fortunately we were able to use this document to get everything going.
From the document, entitled “PowerHA System Mirror 7.1 and Multicasting Setup”:
“PowerHA SystemMirror 7.1 Standard Edition High Availability solution implements clustering using multicast (IP based multicast) based communication between the nodes/hosts in the cluster. Multicast based communication provides for optimized communication method to exchange not only heartbeats, but also allows clustering software to communicate critical events, cluster coordination messages etc in 1 to N method instead of communication 1 to 1 between the hosts.
“Multicast communication is a well established mode of communication in the world of TCP/IP network communication. However in some cases, the network switches used in the communication path need to be reviewed and enabled for multicast traffic to flow between the cluster hosts through them. This document explains some of the network setup aspects that may need to be reviewed before the PowerHA SystemMirror 7.1 cluster is deployed.
“Note that multicast communication is used during the initial discovery phase when the cluster is being created, but also during the normal operations of the cluster. Hence it is extremely important that the multicast traffic to flow between the cluster hosts in the datacenter before the cluster formation can be attempted. Please plan to test and verify the multicast traffic flow between the would-be cluster nodes before even attempting to create the cluster.”
Before I get too far along, I should note that with PowerHA 7.1.3, unicast is an added communication option. In fact, it’s the default option. These issues with getting multicast working are likely a reason behind this change. But in the case of this customer, the commitment was made to version 7.1.2.
Here’s a bit more about multicast:
“Multicasting is a form of addressing, where a group of hosts form a group and exchange messages. A multicast message sent by one in the group is received by all in the group. This allows for efficient cluster communication where many times messages need to be sent to all the nodes in the cluster. For example a cluster member may need to notify the rest of the nodes about a critical event and can accomplish the same by sending a single multicast packet with the relevant information.
“One of the simplest method to test end to end multicast communication is to use the mping command available on AIX. In Fig 1, start the mping command in receive mode on one Host (Say Host A) and then use mping command to send packets from the other Host (Host B). If multiple hosts will be part of the cluster, test end to end mping communication from each host to the other.”
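If you want to see what this kind of receive-on-one-host, send-from-the-other test boils down to, here is a minimal Python sketch of the same idea. To be clear, this is not mping and not anything from the IBM document. The group address, the port and the function names are all assumptions I picked for illustration, and on AIX itself mping remains the tool to reach for.

```python
import ipaddress
import socket
import struct

# Assumed values for illustration only; use a group and port agreed with your network team.
GROUP = "228.168.101.43"
PORT = 4098


def is_multicast(addr: str) -> bool:
    """Return True if addr falls in the IPv4/IPv6 multicast range."""
    return ipaddress.ip_address(addr).is_multicast


def membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq payload: group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def receive(group: str = GROUP, port: int = PORT, count: int = 5, timeout: float = 10.0):
    """Run on Host A: join the group and collect up to `count` packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group is what triggers the IGMP membership report the switches see.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, membership_request(group))
    sock.settimeout(timeout)
    received = []
    try:
        for _ in range(count):
            data, sender = sock.recvfrom(1024)
            received.append((sender[0], data))
    except socket.timeout:
        pass  # Nothing (more) arrived in time; an empty list suggests multicast is blocked.
    finally:
        sock.close()
    return received


def send(group: str = GROUP, port: int = PORT, count: int = 5, ttl: int = 1) -> None:
    """Run on Host B: send `count` probe packets to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A TTL of 1 keeps packets on the local subnet; raise it if the hosts are routed apart.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    for seq in range(count):
        sock.sendto(f"probe {seq}".encode(), (group, port))
    sock.close()
```

Call receive() on Host A first, then send() on Host B, and swap roles to check the other direction. If receive() comes back empty while ordinary unicast traffic between the two hosts works, the multicast path through the switches is the likely suspect.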
Finally, here are the document’s troubleshooting guidelines:
“If mping command fails to receive packets from Host to Host in the network environment, there could be some issue in the network path in regards to multicast packet flow. Follow some of the general guidelines below to troubleshoot the issue:
- Review the switch vendor’s documentation for guidelines in regards to switch setup. Some of the known switch guideline links are included in the reference.
- Disable IGMP snooping on the switches. Most switches will allow for disabling IGMP snooping. If your network environment allows, disable the IGMP snooping and allow all multicast traffic to flow without any problems across switches.
- If your network requirements does not allow snooping to be disabled: Debug the problem by disabling the IGMP snooping and then adding network components one at a time for snooping
- Debug if necessary by eliminating the cascaded switch configurations (having only one switch between the Hosts).”
In our case, we disabled the IGMP snooping on the switch and multicast started to work.
What about your experience? Did you have any issues getting multicasting up and running? Please share your thoughts in comments.