Cover V14, i12
dec2005.tar

Trap Customization in an Enterprise OpenView Operations/NNM Environment

Andy Yuen

Many of the enterprise customers I've worked with manage a large number of servers and network devices using the scalability features of OpenView Operations (OVO) and Network Node Manager (NNM). They tend to end up with a configuration similar to that depicted in Figure 1: OVO acts as the manager of managers (MOM), responsible for centralized event browsing and server management, while multiple NNM collection stations handle SNMP events from network devices.

Management of SNMP devices is performed on the collection stations, but OVO still needs to be informed of important events so that operations staff are notified of potential problems to investigate and resolve. With so many messages flowing into the OVO event system unchecked, the number of events in the browser keeps growing, making it almost impossible for staff to make sense of what the system is telling them. Consequently, mechanisms are required to remove alarms from the OVO event browser automatically (i.e., to change them from active to history events) when the alarm condition no longer exists. This implies the need to define a clearing event for every alarm event wherever possible.

Possible Solutions

OVO and NNM both have built-in facilities that partially address this problem. Let's examine them in turn. OVO comes with a set of system management templates that the administrator can push out to OVO agents to monitor the servers on which they run. These templates already define clearing events for many alarm conditions. OVO Smart Plug-ins are additional templates/monitors that can be installed selectively to monitor applications such as Exchange Server, Oracle databases, etc. Again, these templates/monitors already have clearing events defined. If the events you want to manage are not covered by these pre-built templates, you must create the templates yourself using the OVO graphical user interface.

NNM comes with Event Correlation Services (ECS). ECS contains some pre-built circuits that prevent event storms, such as when a main router goes down and a large number of unreachable alarms are generated for the downstream devices. These pre-built circuits include:

  • ConnectorDown
  • MgXServerDown
  • Pairwise
  • RepeatedEvent
  • ScheduledMaintenance

The "Pairwise" circuit does exactly what we want. This pair-wise correlation matches a clearing (parent) event to one or more previously occurring alarm (child) events. Alarm events can be configured to:

  • Display in the Alarm Browser while awaiting arrival of a clearing event.
  • Display in the Alarm Browser only if the specified time window is exceeded without the arrival of a matching clearing event.
  • Be removed from the Alarm Browser, or be set to Acknowledged, upon arrival of a clearing event.

Because OVO includes a copy of NNM, ECS is also available to OVO.

Let's examine how these built-in facilities apply to the environment described previously. For system management, the decision is simple: because there is only one instance of OVO, running as the MOM, either use the pre-built templates or create your own.

In handling SNMP traps generated by network devices or hosts, however, we face a dilemma. Configuring the Pairwise circuit on the NNM collection stations causes correlation to occur only in each collection station's event browser; it does not affect the OVO event browser. To use ECS on OVO, we would have to forward the SNMP traps to OVO, which defeats the purpose of using collection stations for scalability: if all traps are forwarded to OVO, no processing is offloaded to the collection stations.

Another limiting factor is that, for an event to appear in the OVO event browser, the node generating the event must be configured in the OVO Node Bank. In an enterprise environment, this is not always feasible. Customers may want to channel events from a set of related devices via a gateway node that has already been configured in the Node Bank. The devices generating the SNMP traps may not be configured in the OVO Node Bank because the administrator does not want to clutter the Node Bank. These devices may not even appear in the OVO object database by design (usually for scalability reasons). I will show an example in a later section.

Approach Taken

The approach taken to resolve this dilemma combines a custom OVO template with OVO's opcmsg command. The idea is to configure automatic actions in the event system on the NNM collection stations. These automatic actions convert the content of a trap into a corresponding opcmsg command and invoke it to send the information to OVO. The event forwarding process is as follows:

1. NNM collection station receives a trap that needs forwarding to OVO.

2. NNM's event system executes the automatic action: a custom Perl script that packages the trap information into an opcmsg command and executes it to forward the event to OVO. A Perl script is used instead of invoking opcmsg directly because the script can translate severity dynamically, based on a selected trap varbind, rather than relying on a hard-coded OpenView severity. (The script can still be configured to send the opc message at a fixed OpenView severity.) With dynamic severity translation, the same trap can serve as both the alarm and the clearing event, depending on the severity value contained in the varbind. This is important because many MIBs do not define separate alarm and clearing traps but rely on the severity in a particular varbind to indicate the kind of event.

3. The opc message is sent to OVO using opcmsg.

4. The message text prefix string (described next) triggers the custom template.

5. The template causes OVO's message correlation system either to add the event to the OVO active event browser for an alarm event or to remove an existing alarm from the browser for a clearing event.

The opcmsg command is available on all servers with the OVO agent installed. See the opcmsg manpage for a full description of its syntax.

For our purposes, we use each field in the opcmsg command as follows:

  • Severity -- Specifies one of OVO's severity levels: normal, warning, minor, major, or critical. SNMP devices may define severity differently in their respective MIB. In such cases, the SNMP device's severity must be mapped to OVO's by the custom Perl script.
  • Application -- Denotes the application generating the event. This can be assigned on a per-MIB basis.
  • Object -- Used as the correlator between alarm and clearing events. It must be unique so that a clearing event can be matched to its alarm event and cause that alarm to be removed from the OVO event browser. In most cases, you have to concatenate the name of the device generating the event with one or more varbind values. For example, for SNMP linkUp and linkDown traps, the reporting node name and the varbind ifIndex together uniquely match these events. If the device name is device456 and the ifIndex is 3, Object can be set to "device456-3" and used as the correlator.
  • Msg_text -- Contains the text you want to appear in the OVO event browser. This is specific to each individual trap. It is prefixed with a short special text string that triggers the custom template for message correlation (described next). Let's define this special prefix string as "-AaYy-".
  • Msg_grp -- The message group in which you want the message to appear. The message group must have already been defined in OVO.
  • Node -- The node reported in the OVO event as the one generating the event. The node must already be defined in the OVO Node Bank. If the devices sending traps have not been defined in the OVO Node Bank, the name of the gateway host that forwards these events must be specified here instead.
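To make the field mapping concrete, the following Python sketch assembles the argument list for one opcmsg call using the linkDown correlator described above. It assumes opcmsg accepts name=value arguments (application=, object=, msg_text=, severity=, msg_grp=, node=); confirm the exact names against the opcmsg manpage on your platform. The function build_opcmsg_args, the application name "SNMP", and the message group "Network" are hypothetical.

```python
# Prefix that triggers the custom OVO template (value taken from the article).
PREFIX = "-AaYy-"

# The five OVO severity levels listed in the article.
OV_SEVERITIES = ("normal", "warning", "minor", "major", "critical")


def build_opcmsg_args(opcmsg, severity, application, obj, text, msg_grp, node):
    """Return an argument list for one opcmsg call.

    Raises ValueError if severity is not one of the OVO severity names.
    """
    if severity not in OV_SEVERITIES:
        raise ValueError(f"unknown OVO severity: {severity!r}")
    return [
        opcmsg,
        f"application={application}",
        f"object={obj}",                # correlator shared by alarm and clear
        f"msg_text={PREFIX}{text}",     # prefix lets the template match it
        f"severity={severity}",
        f"msg_grp={msg_grp}",           # must already be defined in OVO
        f"node={node}",                 # must already be in the Node Bank
    ]


# linkDown example: correlator = reporting node name + "-" + ifIndex.
device, if_index = "device456", 3
args = build_opcmsg_args(
    "/opt/OV/bin/OpC/opcmsg",
    "major",
    "SNMP",        # hypothetical application name
    f"{device}-{if_index}",
    f"Interface {if_index} on {device} is down",
    "Network",     # hypothetical message group
    device,
)
print(args[2])  # object=device456-3
```

Keeping the arguments in a list (for example, when passing them to subprocess.run) avoids shell-quoting problems when msg_text contains spaces.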

The custom OVO template for handling these opcmsg messages must perform the following:

  • Identify the special short prefix text string in the Msg_text using Message Key and Message Key Relation defined in the Message Correlation Window.
  • Use the object field as the correlator for message correlation.
  • Remove the special short prefix from msg_text before display.
  • Allow a less severe event to cancel a more severe one. For example, major cancels critical, minor cancels both major and critical, and so on.
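These rules are configured in the OVO template rather than coded, but a small model can make the intended behavior concrete. The Python sketch below is not how OVO implements message correlation; it simply models a list of active messages keyed on the Object field, in which a prefixed message cancels every active message for the same object that is more severe. Whether a normal-severity clearing message itself stays visible is a template decision; here it is assumed not to. The class ActiveBrowser is hypothetical.

```python
PREFIX = "-AaYy-"
RANK = {"normal": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}


class ActiveBrowser:
    """Toy model of active messages correlated on the Object field."""

    def __init__(self):
        self.active = []  # (object, severity, displayed text)

    def receive(self, obj, severity, msg_text):
        if not msg_text.startswith(PREFIX):
            return  # not for this template; another template would handle it
        text = msg_text[len(PREFIX):]  # strip the prefix before display
        rank = RANK[severity]
        # Cancel every active message for the same object that is more severe.
        self.active = [m for m in self.active
                       if not (m[0] == obj and RANK[m[1]] > rank)]
        # Assumption: a normal-severity message only clears; it is not kept.
        if severity != "normal":
            self.active.append((obj, severity, text))


browser = ActiveBrowser()
browser.receive("device456-3", "critical", "-AaYy-Link down")
browser.receive("device456-3", "minor", "-AaYy-Link degraded")
after_minor = list(browser.active)
print(after_minor)  # [('device456-3', 'minor', 'Link degraded')]
browser.receive("device456-3", "normal", "-AaYy-Link up")
print(browser.active)  # []
```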

Defining this template takes only a few clicks in the OVO administrator's graphical user interface, plus entering the Message Key and Message Key Relation information for OVO message correlation. Consequently, I will concentrate the rest of the article on generating the appropriate opcmsg command and configuring the NNM event system.

GUI to Generate Custom Script and NNM Event Configuration

It is true that the appropriate opcmsg commands could be configured manually as automatic actions using NNM's xnmtrap GUI. However, invoking opcmsg directly rules out dynamic severity translation based on the value of a trap varbind, which is needed because certain MIBs define their own severity levels that differ considerably from OpenView's. Also, because the opcmsg command syntax is complex, it is unlikely to be coded correctly the first time. This is why I developed a Web-based user interface that creates a custom Perl script and a partial trapd.conf file to replace the corresponding section of the existing trapd.conf file. This makes it easy to configure and customize trap forwarding using the approach described previously.

The Web application is named trapapp. It contains four menu items:

  • Upload MIB File -- If you want to customize traps from a particular MIB, you must first upload the MIB using this menu. This is the default home page if there are no MIB-specific trapapp configuration files on the system (e.g., when the Web application is run for the first time). Figure 2 shows this page. A MIB-specific trapapp configuration file is created once the MIB has been uploaded to the application; its name is the name of the uploaded MIB file with the extension ".conf" appended. The "Edit MIB-specific Configuration" page is displayed after the upload.
  • Select Customization Configuration File -- If there are MIB-specific trapapp configuration files created by the application on the system, this will be the home page. You will be presented with a selection list to pick an existing MIB-specific trapapp configuration for modification. This page is shown in Figure 3. The "Edit MIB-specific Configuration" page is displayed after the selection.
  • Edit MIB-specific Configuration -- This menu item is available only when a particular MIB file has been uploaded or when a MIB-specific trapapp configuration file has been selected. It displays a page that allows you to customize each trap. You pick the trap you want to customize by clicking on the trap selection button as shown in Figure 4. This brings up the page shown in Figures 5a and 5b for you to enter detailed customization information.
  • Generate Script and Partial trapd Configuration -- Again, this menu item is available only when a particular MIB file has been uploaded or when a MIB-specific trapapp configuration file has been selected. It displays a page that allows you to complete any additional information needed to customize a Perl script that assembles and executes an opcmsg command (Figure 6). The customization information is validated before the script and the partial trapd.conf file are generated. If there is any missing or inconsistent information in the MIB-specific trapapp configuration file, error messages are displayed, and no script or configuration files are generated until all identified problems have been rectified. Figure 7 shows configuration problems identified by the validation process that could be difficult to spot if you were configuring opcmsg commands manually using xnmtrap. The name of the generated partial trapd.conf file is the name of the MIB with ".trapd.conf" appended. This step also updates the trusted command file "trapapp" in the $OV_CONF/trustedCmds.conf directory.

Implementation

The trapapp application is implemented in Perl. It consists of the Perl script trapapp.pl and the Perl module MIBFSM.pm. The only non-standard module used is Config::IniFiles, which reads and writes Windows .ini-style configuration files; you have to download and install it before you can use trapapp. trapapp.pl implements all the form displays and collects the information needed to generate the custom Perl script and partial trapd.conf file. You will also notice a number of files with a ".tpl" extension. Trapapp uses a simple templating system to create Web page content and the custom Perl script from these template files. They are text files containing special markers that look like:

%%NAME%%

These markers are replaced by the values of the hash entries whose keys equal the NAME part of the marker. The text replacement is handled by the subroutine named template.
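The marker substitution is easy to reproduce. The Python sketch below is not trapapp's Perl template subroutine; it shows the same idea with a regular expression and a dictionary in place of a Perl hash. Leaving markers with no matching key untouched is an assumption about the desired behavior.

```python
import re

MARKER = re.compile(r"%%([A-Za-z0-9_]+)%%")


def template(text, values):
    """Replace each %%NAME%% marker with str(values[NAME]).

    Markers whose NAME is not a key in values are left unchanged.
    """
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return MARKER.sub(substitute, text)


page = "<h1>%%TITLE%%</h1><p>MIB file: %%MIBFILE%%</p>"
print(template(page, {"TITLE": "Edit MIB-specific Configuration",
                      "MIBFILE": "concord-diagmon.mib"}))
# <h1>Edit MIB-specific Configuration</h1><p>MIB file: concord-diagmon.mib</p>
```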

The MIBFSM.pm module implements a finite state machine designed for a single purpose: extracting V1 TRAP-TYPE or V2 NOTIFICATION-TYPE information, including trap number, trap name, and varbinds, from uploaded MIB files in ASN.1 syntax. It does not recognize any other MIB definition syntax.
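To illustrate the technique, here is a much-simplified finite state machine in Python that pulls the trap name, varbinds, and trap number out of SMIv1 TRAP-TYPE macros. It is not MIBFSM.pm: it handles TRAP-TYPE only (no NOTIFICATION-TYPE), strips "--" comments naively to end of line, and relies on the rule that a TRAP-TYPE invocation begins with an ENTERPRISE clause to skip the "TRAP-TYPE FROM RFC-1215" import. The MIB fragment used in the example is hypothetical.

```python
import re

# Tokens: quoted strings, the ::= operator, braces/commas, or any other word.
TOKEN = re.compile(r'"[^"]*"|::=|[{},]|[^\s{},]+')


def extract_traps(mib_text):
    """Return a list of (trap_number, trap_name, [varbinds]) for TRAP-TYPE macros."""
    text = re.sub(r"--[^\n]*", "", mib_text)  # simplification: drop comments
    tokens = TOKEN.findall(text)
    traps = []
    state, name, varbinds = "IDLE", None, []
    for i, tok in enumerate(tokens):
        if state == "IDLE":
            # "name TRAP-TYPE ENTERPRISE ..." starts a trap definition.
            if (tok == "TRAP-TYPE" and i > 0
                    and i + 1 < len(tokens) and tokens[i + 1] == "ENTERPRISE"):
                state, name, varbinds = "IN_TRAP", tokens[i - 1], []
        elif state == "IN_TRAP":
            if tok == "VARIABLES":
                state = "IN_VARS"
            elif tok == "::=":
                state = "NUMBER"
        elif state == "IN_VARS":
            if tok == "}":
                state = "IN_TRAP"
            elif tok not in ("{", ","):
                varbinds.append(tok)
        elif state == "NUMBER":
            if tok.isdigit():
                traps.append((int(tok), name, varbinds))
            state = "IDLE"
    return traps


MIB_SNIPPET = '''
IMPORTS enterprises FROM RFC1155-SMI
        TRAP-TYPE FROM RFC-1215;
myAlarm TRAP-TYPE
    ENTERPRISE myCompany
    VARIABLES { myAlarmSeverity, myAlarmObject }
    DESCRIPTION "Sent when an alarm is raised (TRAP-TYPE in text is ignored)."
    ::= 21
'''
print(extract_traps(MIB_SNIPPET))
# [(21, 'myAlarm', ['myAlarmSeverity', 'myAlarmObject'])]
```

Treating each quoted DESCRIPTION string as a single token keeps keywords inside descriptions from confusing the state machine.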

As you enter information to customize traps using the trapapp application, the information is saved in different sections of a configuration file in Windows .ini file format using the Config::IniFiles module. A configuration file consists of five sections: trap, format, alarm, clearing, and opcinfo. The format of each section is described below (please note that you do not have to know the format of the configuration file to use the application):

[trap]
trapnum=trapname~varbind1~varbind2~...~varbindN
The purpose of the trap section is to record the trap name and its varbinds and associate that information with a trap number, for example:

15=netHealthInfo~nhdErrorDate~nhdErrorTime~nhdErrorCode~nhdErrorMessage~nhdServerIp~nhdServerName~nhServerPort~nhElementId
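Because each entry is just a tilde-separated list, reading it back is straightforward. The Python sketch below uses the standard configparser module in place of Config::IniFiles (purely to keep the example self-contained) and splits the sample entry into the trap name and its varbinds. The function load_traps is hypothetical.

```python
import configparser

SAMPLE = """
[trap]
15=netHealthInfo~nhdErrorDate~nhdErrorTime~nhdErrorCode~nhdErrorMessage~nhdServerIp~nhdServerName~nhServerPort~nhElementId
"""


def load_traps(ini_text):
    """Map trap number -> (trap name, [varbind names]) from a [trap] section."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(ini_text)
    traps = {}
    for num, value in cfg["trap"].items():
        name, *varbinds = value.split("~")
        traps[int(num)] = (name, varbinds)
    return traps


name, varbinds = load_traps(SAMPLE)[15]
print(name, len(varbinds))  # netHealthInfo 8
```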

[format]
trapnum=eventType~messageFormat
where "eventType" can be alarm, clearing, or ignore.

The purpose of the format section is to identify the traps used to generate alarm and clearing events and to specify their message format. Traps whose eventType is ignore are skipped during script generation. For example:

21=alarm~Alarm raised for $5, $11 for $12: $8 (NH:$2,$3,$6,$16,$17,$9,$10,$5)
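The $N references in message formats (and in correlator formats such as $0-$5-$16) have to be expanded with values from the received trap. The Python sketch below assumes that $1..$N are 1-based varbind positions, following the NNM trapd.conf convention, and that $0 stands for the agent's host name, as the [opcinfo] section below describes. The function expand, the host name "ehealth01", and the varbind values are hypothetical.

```python
import re

# \d+ is greedy, so "$16" is read as varbind 16, not varbind 1 followed by "6".
VAR_REF = re.compile(r"\$(\d+)")


def expand(fmt, agent_host, varbinds):
    """Expand $0 (agent host name) and $1..$N (1-based varbind values) in fmt.

    References beyond the last varbind are left unexpanded.
    """
    def substitute(match):
        n = int(match.group(1))
        if n == 0:
            return agent_host
        if n <= len(varbinds):
            return str(varbinds[n - 1])
        return match.group(0)
    return VAR_REF.sub(substitute, fmt)


values = [f"v{i}" for i in range(1, 18)]  # hypothetical values v1 .. v17
print(expand("Alarm raised for $5, $11 for $12: $8", "ehealth01", values))
# Alarm raised for v5, v11 for v12: v8
print(expand("$0-$5-$16", "ehealth01", values))
# ehealth01-v5-v16
```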

[alarm]
trapnum=correlatorFormat~varbind/fixedSeverity~toNormal~toWarning~toMinor~toMajor~toCritical
where varbind/fixedSeverity contains either a fixed OpenView severity (one of normal, warning, minor, major, or critical) or a varbind. In the latter case, toNormal, toWarning, etc. contain the mapping from the varbind's severity values to OpenView's severities.

The purpose of the alarm section is to define the correlator for alarm and clearing events and the translation mapping from MIB-specific severities to OpenView's. For example:

21=$0-$5-$16~nhdErrorMessage~normal~warning~minor~major~critical
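A Python sketch of the severity translation follows. It assumes each of toNormal..toCritical holds a single MIB-specific value, so the value in position i maps to the i-th OpenView severity; that layout, the function names, and the numeric fooSeverity MIB are hypothetical. A trap whose varbind translates to normal can then act as a clearing event, which is what lets one trap serve as both alarm and clear.

```python
OV_SEVERITIES = ("normal", "warning", "minor", "major", "critical")


def parse_alarm_entry(value):
    """Split an [alarm] value into (correlator format, severity source, mapping)."""
    fields = value.split("~")
    correlator, source, targets = fields[0], fields[1], fields[2:]
    # Assumption: toNormal..toCritical each hold one MIB-specific value, so
    # position i maps that value to OV_SEVERITIES[i].
    mapping = dict(zip(targets, OV_SEVERITIES))
    return correlator, source, mapping


def resolve_severity(source, mapping, varbinds):
    """Return the OpenView severity for a trap, or None if it cannot be mapped.

    varbinds maps varbind names to the values received in the trap.
    """
    if source in OV_SEVERITIES:
        return source  # fixed severity: the varbinds are not consulted
    return mapping.get(varbinds.get(source))


corr, src, mapping = parse_alarm_entry(
    "$0-$5-$16~nhdErrorMessage~normal~warning~minor~major~critical")
print(resolve_severity(src, mapping, {"nhdErrorMessage": "major"}))  # major

# Hypothetical MIB that encodes severity as 1 (cleared) .. 5 (critical).
_, src2, mapping2 = parse_alarm_entry("$0-$1~fooSeverity~1~2~3~4~5")
print(resolve_severity(src2, mapping2, {"fooSeverity": "1"}))  # normal
```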

[clearing]
trapnum=alarmTrapNum~fixedSeverity
where alarmTrapNum specifies the alarm trap this event is to clear and fixedSeverity specifies the fixed OpenView severity of the trap. For example:

23=16~normal
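Because the [alarm] section defines the correlator for both alarm and clearing events, a clearing trap presumably reuses the correlator format of the alarm it clears, so that both produce the same Object value. The hypothetical helper below resolves a [clearing] entry that way and reports a dangling alarm reference, the kind of inconsistency a validation step should catch. It is a sketch of one plausible lookup, not trapapp's generated code; the alarm entry for trap 16 is invented for the example.

```python
def clearing_params(clear_num, clearing, alarm):
    """Return (correlator format, fixed severity) for a clearing trap.

    clearing maps trap number -> "alarmTrapNum~fixedSeverity" and alarm maps
    trap number -> the raw [alarm] entry; both use string keys, as an INI
    reader would return them.
    """
    alarm_num, severity = clearing[clear_num].split("~")
    if alarm_num not in alarm:
        raise ValueError(
            f"clearing trap {clear_num} refers to undefined alarm trap {alarm_num}")
    correlator = alarm[alarm_num].split("~")[0]
    return correlator, severity


clearing = {"23": "16~normal"}
alarm = {"16": "$0-$5-$16~critical"}  # hypothetical alarm entry for trap 16
print(clearing_params("23", clearing, alarm))  # ('$0-$5-$16', 'normal')
```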

[opcinfo]
appname=appName
messagegroup=groupName
nodename=nodeName or $0
If a specific nodeName is given, it is used as the name of the sender of the trap. If $0 is specified, the host name of the agent sending the trap is used as the source of the message. If you are using a gateway host to send information on devices not configured in the OVO Node Bank, use a fixed nodename instead of $0.

The purpose of this section is to supply the information to complete the opcmsg command:

appname=eHealth
messagegroup=eHealth
nodename=$0
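The only dynamic part of [opcinfo] is the node name. The Python sketch below reads the sample section with the standard configparser module (standing in for Config::IniFiles) and resolves nodename either to the trap's agent host ($0) or to a fixed gateway name. The function opc_target and the host names "ehealth01" and "gw01" are hypothetical.

```python
import configparser

SAMPLE = """
[opcinfo]
appname=eHealth
messagegroup=eHealth
nodename=$0
"""


def opc_target(opcinfo, agent_host):
    """Return (application, message group, node) for an opcmsg call."""
    node = opcinfo["nodename"]
    if node == "$0":
        # Use the host that sent the trap; it must be in the OVO Node Bank.
        node = agent_host
    # Otherwise a fixed node name (e.g. a gateway host) is used as-is.
    return opcinfo["appname"], opcinfo["messagegroup"], node


cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string(SAMPLE)
print(opc_target(cfg["opcinfo"], "ehealth01"))
# ('eHealth', 'eHealth', 'ehealth01')

# Gateway variant: every message appears to come from the fixed host gw01.
print(opc_target({"appname": "eHealth", "messagegroup": "eHealth",
                  "nodename": "gw01"}, "ehealth01"))
# ('eHealth', 'eHealth', 'gw01')
```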
The trapapp Web application has a configuration file named trapapp.ini. Its content is shown below:

[app]
scriptdir=./gen/
opcmsg=/opt/OV/bin/OpC/opcmsg
trapdconf=C:/Program Files/HP OpenView/NNM/conf/C/trapd.conf
proddir=/opt/trapapp/
trustedcmddir=C:/Program Files/HP OpenView/NNM/conf/trustedCmds.conf/
It consists of an [app] section and the following entries:

  • scriptdir -- Specifies the directory in which the generated Perl script and partial MIB-specific trapd.conf file are to be placed. The specified directory must already exist.
  • opcmsg -- Defines the full path of the opcmsg command.
  • proddir -- Specifies the directory in which the production version of the generated Perl script is kept. It is safer to copy the generated script to a production directory to avoid unintentional changes made through the Web-based GUI. This path is used to configure the trusted commands only; trapapp does not copy scripts from the scriptdir directory to the proddir directory.
  • trapdconf -- Defines the complete path to NNM's trapd.conf file. Note that the path differs between platforms (e.g., Windows and Unix).
  • trustedcmddir -- Specifies the trusted command directory in which a script's name must be listed before the script will be executed as an automatic action.
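A quick sanity check of these settings can save debugging time later. The Python sketch below reads an [app] section with the standard configparser module, reports missing keys, and flags a scriptdir that does not exist, since the directory must already be present. The function load_app_settings and its problem messages are hypothetical; trapapp itself reads trapapp.ini through Config::IniFiles.

```python
import configparser
import os

REQUIRED_KEYS = ("scriptdir", "opcmsg", "trapdconf", "proddir", "trustedcmddir")


def load_app_settings(ini_text):
    """Read trapapp.ini-style text; return (settings dict, list of problems)."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(ini_text)
    app = cfg["app"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in app]
    scriptdir = app.get("scriptdir")
    # The generated files are written here, so the directory must exist.
    if scriptdir is not None and not os.path.isdir(scriptdir):
        problems.append(f"scriptdir does not exist: {scriptdir}")
    return dict(app), problems


ini = f"""[app]
scriptdir={os.getcwd()}
opcmsg=/opt/OV/bin/OpC/opcmsg
trapdconf=C:/Program Files/HP OpenView/NNM/conf/C/trapd.conf
proddir=/opt/trapapp/
"""
settings, problems = load_app_settings(ini)
print(problems)  # ['missing key: trustedcmddir']
```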

When you click on the Generate button, the application performs the following tasks:

  • Carries out consistency checks on the configuration information. If the checks fail, all of the tasks below are skipped and error messages are displayed.
  • Extracts the validated information from the MIB-specific configuration file and creates the custom Perl script by using the script.tpl template file.
  • Uses the Perl ".." operator to extract the relevant sections in the trapd.conf file, adds the automatic action, and saves this partial trapd.conf file for use by the administrator to update the production trapd.conf file using the xnmevents -replace command.
  • Adds the name of the custom Perl script to the trusted command list file: $OV_CONF/trustedCmds.conf/trapapp if the file does not exist or the Perl script name entry is absent in the file.
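Two of these steps translate readily into other languages. The Python sketch below mimics Perl's two-dot range ("flip-flop") operator to pull a block of lines out of a file, and appends a script path to a trusted-command file only when it is not already listed. The event-block markers used in the example (an EVENT line through an EDESC line) are an assumption about how the relevant trapd.conf sections are delimited, the sample lines are illustrative only, and the function names and script path are hypothetical.

```python
import os
import re
import tempfile


def flip_flop(lines, start, end):
    """Yield each run of lines from a start match through the next end match.

    Like Perl's two-dot range operator, the end pattern is also checked on
    the line that matched the start pattern.
    """
    start_re, end_re = re.compile(start), re.compile(end)
    inside = False
    for line in lines:
        if not inside and start_re.search(line):
            inside = True
        if inside:
            yield line
            if end_re.search(line):
                inside = False


def ensure_trusted(path, script_path):
    """Append script_path to the trusted-command file unless it is listed.

    Returns True if the file was changed, False if the entry already existed.
    """
    try:
        with open(path) as f:
            content = f.read()
    except FileNotFoundError:
        content = ""
    if script_path in (line.strip() for line in content.splitlines()):
        return False  # already trusted; leave the file untouched
    with open(path, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(script_path + "\n")
    return True


# Illustrative lines only; the start/end markers are assumptions.
conf = ["VERSION 3", "EVENT nhLiveAlarm .1.2.3", "FORMAT alarm $*",
        "SDESC", "text", "EDESC", "EVENT other .1.2.4", "EDESC"]
section = list(flip_flop(conf, r"^EVENT nhLiveAlarm\b", r"^EDESC"))
print(section)

with tempfile.TemporaryDirectory() as tmp:
    trusted = os.path.join(tmp, "trapapp")
    script = "/opt/trapapp/concord-diagmon.pl"  # hypothetical script name
    first = ensure_trusted(trusted, script)   # file created, entry added
    second = ensure_trusted(trusted, script)  # already listed, no change
print(first, second)  # True False
```

Appending only when the entry is absent mirrors the cautious behavior described for trapapp: regenerating a script never rewrites or reorders an existing trusted-command file.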

In summary, the trapapp Web application is designed to reduce to a minimum the amount of manual configuration required on NNM collection stations.

Setup and Usage

I assume here that the reader is familiar with configuring a Web server, creating OVO templates, and configuring NNM automatic actions using the respective OVO and NNM tools. Given that, setting up the trapapp Web application is simple: all it does is generate a custom Perl script and a partial trapd.conf file for each MIB you upload. To set up trapapp, complete the following steps:

1. Download and install the Config::IniFiles Perl module, either from CPAN or, if you use ActivePerl, from ActiveState's package repository.

2. Extract the trapapp.zip file into its own directory named trapapp (any name will do), using unzip on Windows or, on Unix, jar xvf if you have the Java 2 SDK installed.

3. Configure a virtual directory on the Web server named ovtrap to point to that directory.

4. Configure index.html as the Web server's default document.

Invoke the trapapp application using the following URL (adjust the path to match the name of the virtual directory configured in step 3):

http://localhost/trapapp/
You can now start experimenting with the trapapp application and examine the generated partial trapd.conf file. Once you are comfortable with it, you can update NNM's trapd.conf file with the information it contains using the NNM command xnmevents:

xnmevents -replace partialConfigFile
where partialConfigFile is the full path name of the generated partial trapd.conf file.

It would be trivial to include a menu item in the Web application to do this. I have not implemented it because such a convenience feature would make it too easy to change the production trapd configuration unintentionally while experimenting with the Web GUI. For the same reason, the trusted command list in $OV_CONF/trustedCmds.conf/trapapp is updated only if the generated Perl script name is absent from the file.

Additionally, there is a one-off setup involving the following tasks on the OVO Server:

1. Define the OVO template that implements the message correlation facility described earlier.

2. Distribute the template to the OVO agent on the OVO server.

When things do not work as expected, the following checklist may help you resolve the problems:

1. Does the information in the trapapp.ini file correctly reflect your environment?

2. Has the message group specified in the opcmsg command been defined in OVO?

3. Has the node name used in the opcmsg command been configured in the OVO Node Bank?

4. Has the MIB you uploaded to trapapp been loaded using NNM's xnmloadmib tool?

5. Have you updated the trapd.conf file by using the xnmevents command as described previously?

6. Is the path information contained in $OV_CONF/trustedCmds.conf/trapapp for the custom Perl scripts correct?

7. Has the generated Perl script been moved to the production directory and made executable (on Unix platforms only)?

An Example

If you use Concord eHealth in your organization, it is likely that you have installed eHealth Integration for OpenView on one or more of your NNM collection stations. The installation includes loading the concord-diagmon.mib and setting up eHealth-specific menu items in the OpenView graphical user interface. It is quite common for some of the servers managed by eHealth not to be configured in the OVO Node Bank, or even in the OpenView object database. The approach described in this article handles this situation.

An example configuration for the concord-diagmon.mib is included with trapapp. In this sample configuration, traps 21 (nhLiveAlarm) and 23 (nhLiveClearAlarm) have been set up as alarm and clearing events, respectively. The correlator used is $A-$5-$16 (i.e., agentHostName-nhElementIp-nhExceptionId). nhLiveAlarm uses the value of the varbind nhSeverity as the severity in the opcmsg command. In this case, the possible values of nhSeverity are the same as those of OpenView.

Limitations

Please note that this Web application is intended for the personal use of OpenView administrators only; it is not intended for casual OpenView users. No security or concurrent access control has been implemented, so if more than one user updates the configuration for the same MIB at the same time, the result is unpredictable.

Conclusion

The trapapp Web application presented in this article provides a convenient way to customize relevant traps arriving at NNM collection stations and convert them into opc messages that are forwarded to OVO for display and message correlation. It also helps ensure the quality of trap customization through its extensive consistency checks on the configuration. You can find trapapp updates/fixes and other free tools under "Free Tools" on the Web site of Kardinia Software at: http://www.kardinia.com. If you have enhanced trapapp or fixed any bugs in it, please send me a description and your changes so that I can post them on the Kardinia Web site.

Andy Yuen is a solutions architect who specializes in application development and system/network management. He has a master's degree in Electrical Engineering from Carleton University, Canada. He can be contacted at: andyyuen@ozemail.com.au.