oct2003.tar

Getting Status Information about an HACMP Cluster

Arni Snorri Eggertsson

In the time I have spent managing High Availability Cluster Multiprocessing for AIX (HACMP) clusters, I have found that status information for these clusters is not very accessible. You can monitor them with standard commands, each of which answers a single question, or with the CLVIEW program, which gets better with each release. But it is difficult to watch for status changes, and especially difficult to feed the status into custom monitoring tools such as Big Brother. Since HACMP exists to keep applications highly available, state monitoring is essential to meeting that goal.

Over time I created small scriptlets that use SNMP to give me information about specific things inside HACMP, such as whether all interfaces are up and whether each application is running on its intended node. As my collection grew, I combined the scriptlets into one script that produces a human-readable status screen (see Listing 1). In my experience, condensed information that fits on one screen is very useful for systems administrators who need to debug a problem quickly.

But how do you get this information and work with it? The easiest way is to query the HACMP server daemons using SNMP (the command is snmpinfo on AIX and snmpget on Linux). Not all of the information is available through SNMP, however, so I had to use at least one command from the HACMP command set (clfindres).

A detailed discussion of SNMP is beyond the scope of this article, but I assume readers at least know what SNMP is. In fact, I think anyone with basic shell scripting skills who reads through the script in Listing 2 will understand it without knowing anything about SNMP.

Retrieving the information with SNMP is actually quite easy, because IBM publishes the MIB (Management Information Base), which lets you query SNMP data and translate it into human-readable form. Here is an example query without the MIB in place:

root@server1:/>snmpinfo -m dump -c public -h localhost \
1.3.6.1.4.1.2.3.1.2.1.5.1
1.3.6.1.4.1.2.3.1.2.1.5.1.1.0 = 1
1.3.6.1.4.1.2.3.1.2.1.5.1.2.0 = 41:43:4D:45
1.3.6.1.4.1.2.3.1.2.1.5.1.3.0 =
1.3.6.1.4.1.2.3.1.2.1.5.1.4.0 = 2
1.3.6.1.4.1.2.3.1.2.1.5.1.5.0 = 1
1.3.6.1.4.1.2.3.1.2.1.5.1.6.0 = 1043707312
1.3.6.1.4.1.2.3.1.2.1.5.1.7.0 = 0
1.3.6.1.4.1.2.3.1.2.1.5.1.8.0 = 32
1.3.6.1.4.1.2.3.1.2.1.5.1.9.0 = 73:65:72:76:65:72:31
1.3.6.1.4.1.2.3.1.2.1.5.1.10.0 = 73:65:72:76:65:72:31
1.3.6.1.4.1.2.3.1.2.1.5.1.11.0 = 2
1.3.6.1.4.1.2.3.1.2.1.5.1.12.0 = 1
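String values in this raw dump are colon-separated hexadecimal bytes: 41:43:4D:45, for example, is the ASCII text ACME, and 73:65:72:76:65:72:31 is server1. snmpinfo can do this translation for you, but if you ever need to decode a raw value by hand, a short shell function is enough. This is a minimal sketch; hex_to_ascii is a hypothetical helper, not part of AIX or HACMP.

```shell
# Hypothetical helper: decode a colon-separated hex string such as
# "41:43:4D:45" (as printed by snmpinfo without -v) into ASCII text.
hex_to_ascii() {
    escapes=""
    # Split on ":" and turn each byte into a \0ooo octal escape for printf %b
    for byte in $(printf '%s' "$1" | tr ':' ' '); do
        escapes="$escapes\\0$(printf '%03o' "0x$byte")"
    done
    printf '%b\n' "$escapes"
}

hex_to_ascii 41:43:4D:45              # prints ACME
hex_to_ascii 73:65:72:76:65:72:31     # prints server1
```

Building octal escapes and printing them once with %b keeps the function within POSIX printf, so it should behave the same under the AIX Korn shell, bash, and dash.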

With the MIB in place, the same query returns the following. Note that I also added the -v flag, which translates the hexadecimal values into ASCII, and the -o flag, which points snmpinfo at the MIB definitions file:

root@server1:/>snmpinfo -m dump -v  -c public -h localhost -o \
  /usr/sbin/cluster/hacmp.defs 1.3.6.1.4.1.2.3.1.2.1.5.1
clusterId.0 = 1
clusterName.0 = "ACME"
clusterConfiguration.0 = ""
clusterState.0 = 2
clusterPrimary.0 = 1
clusterLastChange.0 = 1043707312
clusterGmtOffset.0 = 0
clusterSubState.0 = 32
clusterNodeName.0 = "server1"
clusterPrimaryNodeName.0 = "server1"
clusterNumNodes.0 = 2
clusterNodeId.0 = 1

To make it even easier, you can replace the query string "1.3.6.1.4.1.2.3.1.2.1.5.1" with the word "cluster" and get the same results. All translation between these numbers and names is done with the help of the MIB. The MIB shipped with HACMP/CS and HACMP/ES consists of two plain-text files, hacmp.defs and hacmp.my, which describe what you can retrieve from HACMP with SNMP. You may need to open these files to fully understand a result set. The constants defined at the top of the script are taken directly from the MIB.

Under HACMP/ES the MIB files are located in /usr/es/sbin/cluster; under HACMP/CS they are in /usr/sbin/cluster.
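Because the MIB lives in a different directory depending on the HACMP flavor, a script meant to run on both can look for hacmp.defs in the HACMP/ES location first and fall back to the HACMP/CS one. The sketch below does that; find_hacmp_defs is a hypothetical name, and the commented usage repeats the earlier snmpinfo query with the symbolic name cluster and the same flags used in this article.

```shell
# Hypothetical helper: print the path of the first readable hacmp.defs.
# With no arguments it checks the HACMP/ES directory, then the HACMP/CS one.
find_hacmp_defs() {
    [ $# -gt 0 ] || set -- /usr/es/sbin/cluster /usr/sbin/cluster
    for dir in "$@"; do
        if [ -r "$dir/hacmp.defs" ]; then
            printf '%s\n' "$dir/hacmp.defs"
            return 0
        fi
    done
    echo "hacmp.defs not found in: $*" >&2
    return 1
}

# Usage on a cluster node; "cluster" replaces 1.3.6.1.4.1.2.3.1.2.1.5.1:
# HACMP_DEFS=$(find_hacmp_defs) &&
#     snmpinfo -m dump -v -c public -h localhost -o "$HACMP_DEFS" cluster
```

Accepting directories as arguments is only there so the lookup can be exercised off-cluster; on a real node you would call it without arguments.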

The script can easily be modified to run on a machine that is not one of the cluster nodes. To run it on another AIX machine, just comment out the part that uses the clfindres command and update the variables that determine which host to query. Note that different operating systems use different SNMP commands, so porting the script to another platform may require some adjustments.
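Keeping the target host, community, and MIB path in variables turns that change into a one-line edit or an environment setting. This is a minimal sketch assuming AIX snmpinfo; SNMP_HOST, SNMP_COMMUNITY, HACMP_DEFS, and snmp_query are hypothetical names, not the variables used in Listing 2.

```shell
# Hypothetical settings: override them in the environment to query a
# cluster node from a machine outside the cluster.
SNMP_HOST=${SNMP_HOST:-localhost}
SNMP_COMMUNITY=${SNMP_COMMUNITY:-public}
HACMP_DEFS=${HACMP_DEFS:-/usr/es/sbin/cluster/hacmp.defs}

# Run one snmpinfo dump against the configured host; the arguments are
# the MIB names or OIDs to query, e.g. "cluster".
snmp_query() {
    snmpinfo -m dump -v -c "$SNMP_COMMUNITY" -h "$SNMP_HOST" \
        -o "$HACMP_DEFS" "$@"
}
```

Because -o reads the definitions file locally, a monitoring machine outside the cluster also needs a copy of hacmp.defs, with HACMP_DEFS pointing at it. On Linux, the body of snmp_query would have to call a Net-SNMP tool instead, and its output format differs from snmpinfo's.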

To see how the information is retrieved, refer to Table 1, which describes the SNMP keys and what the retrieved values mean. Some of the keys are self-explanatory.
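Numeric state codes such as clusterState 2 and clusterSubState 32 in the sample output are easier to read once mapped back to names. The values below are an assumption based on the clusterState and clusterSubState enumerations in hacmp.my; verify them against the SYNTAX clauses in your own copy of the MIB before relying on them. Both function names are hypothetical.

```shell
# Hypothetical decoders for two HACMP MIB enumerations. The numeric values
# are ASSUMED from hacmp.my; confirm them in your copy of the MIB.
# Unrecognized codes are printed back as numbers.
cluster_state_name() {
    case $1 in
        2) echo up ;;
        4) echo down ;;
        8) echo unknown ;;
        *) echo "state($1)" ;;
    esac
}

cluster_substate_name() {
    case $1 in
        8)   echo unknown ;;
        16)  echo unstable ;;
        32)  echo stable ;;
        64)  echo error ;;
        128) echo reconfig ;;
        *)   echo "substate($1)" ;;
    esac
}

# Using the values from the sample output (clusterState 2, clusterSubState 32):
echo "ACME is $(cluster_state_name 2), $(cluster_substate_name 32)"
# prints: ACME is up, stable
```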

The entire script is shown in Listing 2. It is built from about five functions, each of which does a small part of the work. When combining my scriptlets into this single script, I tried to keep it as modular as possible, but reused code does not always allow that. I also realize that all of this could be done more simply in Perl, but I prefer shell scripts.

The script works like this: gather_information is called first and reads the variable SNMPKEYS, which contains all the SNMP keys listed in Table 1; these keys supply most of the information. The gather_information function then calls get_info, which uses the snmp_cmd function to translate the SNMP data into shell variables. Once most of the information is loaded into variables, the print_report function prints the status screen. Because shell scripts do not offer three-dimensional arrays, I had to improvise when printing the network information, which is why it is retrieved by a separate function.
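The central step, turning snmpinfo output into shell variables and then printing from them, can be sketched in a few lines. This is not the code from Listing 2: load_snmp_vars and print_summary are hypothetical, and the sketch handles only scalar objects (the ".0" instances in the MIB output shown earlier), not table rows such as per-network data.

```shell
# Hypothetical parser: read "name.0 = value" lines (snmpinfo -v output)
# from stdin and set one shell variable per scalar MIB object, e.g.
#   clusterName.0 = "ACME"   ->   clusterName=ACME
load_snmp_vars() {
    while IFS= read -r line; do
        case $line in
            *'.0 = '*) ;;           # scalar instance: keep it
            *) continue ;;          # table rows, blank lines: skip
        esac
        name=${line%%.0 = *}
        value=${line#*.0 = }
        value=${value#\"}           # strip surrounding double quotes
        value=${value%\"}
        case $name in
            ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;   # not a valid variable name
        esac
        eval "$name=\$value"        # assign without re-evaluating the value
    done
}

# Hypothetical one-line report in the spirit of print_report
print_summary() {
    printf 'Cluster %s (id %s): state %s, substate %s, %s nodes\n' \
        "$clusterName" "$clusterId" "$clusterState" "$clusterSubState" \
        "$clusterNumNodes"
}

# Feed it values from the sample output above. A here-document keeps the
# loop in the current shell; piping into the function would lose the
# variables in bash.
load_snmp_vars <<'EOF'
clusterId.0 = 1
clusterName.0 = "ACME"
clusterState.0 = 2
clusterSubState.0 = 32
clusterNumNodes.0 = 2
EOF
print_summary   # prints: Cluster ACME (id 1): state 2, substate 32, 2 nodes

# On a node you would feed it live data instead, for example:
# load_snmp_vars <<EOF
# $(snmpinfo -m dump -v -c public -h localhost -o /usr/es/sbin/cluster/hacmp.defs cluster)
# EOF
```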

This is just one way to use the gathered information. The script can easily be extended to do much more, including reporting to open source monitoring tools such as Big Brother or Big Sister. I encourage everyone to have a go at modifying the status screen and to let their imagination run wild.
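As one example of such an extension, the cluster state can be reduced to a Big Brother color. This sketch assumes the state codes have already been loaded into shell variables (clusterName, clusterState, clusterSubState are hypothetical names here), treats 2/32 as up/stable as in the sample output, and assumes the usual Big Brother external-script environment ($BB, $BBDISP, $MACHINE); check your Big Brother documentation before deploying it. The test name "hacmp" is an arbitrary choice.

```shell
# Hypothetical color policy. ASSUMES clusterState 2 = up and
# clusterSubState 32 = stable, as in the sample output.
bb_color() {
    state=$1 substate=$2
    if [ "$state" = 2 ] && [ "$substate" = 32 ]; then
        echo green
    elif [ "$state" = 2 ]; then
        echo yellow      # cluster up, but substate is not stable
    else
        echo red
    fi
}

# Send a status line in the usual Big Brother external-script form.
# $BB, $BBDISP and $MACHINE are ASSUMED to come from the BB client
# environment; without them the message is only printed.
bb_report() {
    color=$(bb_color "$clusterState" "$clusterSubState")
    msg="status $MACHINE.hacmp $color $(date) cluster $clusterName state=$clusterState substate=$clusterSubState"
    if [ -n "${BB:-}" ] && [ -n "${BBDISP:-}" ]; then
        "$BB" "$BBDISP" "$msg"
    else
        printf '%s\n' "$msg"
    fi
}
```

Keeping the color policy in its own function makes it easy to tighten or relax (for example, treating an unstable substate as red) without touching the reporting code.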

Arni Snorri Eggertsson is an RHCE and a CATE for AIX 5L and AIX 4. He has been working professionally in the *nix world for about four years, although he was active as an amateur long before that. His focus is large HACMP installations and AIX support. Arni can be contacted at arnie@gormur.com.