Trap
Customization in an Enterprise OpenView Operations/NNM Environment
Andy Yuen
Many of the enterprise customers I've worked with manage a large
number of servers and network devices using the scalability features
of OpenView Operations (OVO) and Network Node Manager (NNM). They
tend to end up with a configuration similar to that depicted in
Figure 1. OVO is used as the manager of managers responsible for
centralized event browsing and the management of servers with multiple
NNM collection stations handling SNMP events from network devices.
Management of SNMP devices is performed on the collection stations,
but OVO still needs to be informed of important events so operations
staff will be notified of potential problems to investigate and
resolve. With so many messages coming into the OVO event system,
unabated, the events in the browser will grow and grow making it
almost impossible for staff to make sense of what the system is
telling them. Consequently, mechanisms are required to automatically
remove some of the alarms reported in the OVO event browser (i.e.,
change them from active to history events) when the alarm condition
no longer exists. Making this happen implies the need to define
a clearing event to every alarm event wherever possible.
Possible Solutions
OVO and NNM both have in-built facilities to partially address
this problem. Let's examine them in turn. OVO comes with a set of
templates for system management that the administrator can push
out to OVO agents to monitor the servers on which they are running.
These templates have already defined clearing events for many of
the alarm conditions. OVO Smart Plug-ins are additional templates/monitors
that can be selectively installed to handle application monitoring
such as Exchange server, Oracle database, etc. Again, these templates/monitors
already have clearing events defined. If templates for the events
that you want to manage are not included in these pre-built templates,
you must create them yourself using the OVO graphical user interface.
NNM comes with Event Correlation Services (ECS). ECS contains
some pre-built circuits to avoid the occurrence of event storms
when a main router goes down, generating of a large number of unreachable
alarms for the downstream devices. These pre-built circuits include:
- ConnectorDown
- MgXServerDown
- Pairwise
- RepeatedEvent
- ScheduledMaintenance
The "Pairwise" circuit does exactly what we want. This pair-wise
correlation matches a clearing (parent) event to one or more previously
occurring alarm (child) events. Alarm events can be configured to:
- Display in the Alarm Browser while awaiting arrival of a clearing
event.
- Display in the Alarm Browser only if the specified time window
is exceeded without the arrival of a matching clearing event.
- Remove from the Alarm Browser the alarm event, or set it to
Acknowledged upon receiving a clearing event.
Because OVO includes a copy of NNM, ECS is also available to OVO.
Let's examine how we can apply these built-in facilities in the
environment described previously. For system management, the decision
is simple: either use the pre-built templates, or create your own
since there is only one instance of OVO running as the MOM.
In handling SNMP traps generated by network devices or hosts,
we have a dilemma. Configuring the Pairwise circuit on the NNM collection
stations will cause correlation to occur in the NNM collection station's
event browser only. It will not affect the OVO event browser. To
use ECS on OVO, we must forward the SNMP traps to OVO, defeating
the purpose of using collection stations for scalability reasons.
If all traps are forwarded to OVO, there is no offloading of processing
to the collection stations.
Another limiting factor is that, for an event to appear in the
OVO event browser, the node generating the event must be configured
in the OVO Node Bank. In an enterprise environment, this is not
always feasible. Customers may want to channel events from a set
of related devices via a gateway node that has already been configured
in the Node Bank. The devices generating the SNMP traps may not
be configured in the OVO Node Bank because the administrator does
not want to clutter the Node Bank. These devices may not even appear
in the OVO object database by design (usually for scalability reasons).
I will show an example in a later section.
Approach Taken
The approach taken to resolve this dilemma is to use a combination
of OVO custom template and OVO's opcmsg command. The idea
is to configure automatic actions in the event system on the NNM
collection stations. The automatic actions convert the content of
the trap to a corresponding opcmsg command and invoke it
to send the information to OVO. The event forwarding process is
as follows:
1. NNM collection station receives a trap that needs forwarding
to OVO.
2. NNM's event system executes the automatic action, a custom
Perl script that packages the trap information into an opcmsg
command and executes it to forward the event to OVO. The reason
to execute a custom Perl script as an automatic action and not execute
the opcmsg command directly is that the Perl script can do
dynamic severity translation based on a selected trap varbind instead
of hard-coding a fixed OpenView severity. However, the custom Perl
script can also be configured to send the opc message at a fixed
OpenView severity. By supporting dynamic severity translation, the
same trap can be used as both the alarm and clearing event depending
on the severity value contained in the varbind. This feature is
important because many MIBs do not have separate alarm and clearing
events but rely on the severity contained in a particular varbind
to indicate the kind of event.
3. The opc message is sent to OVO using opcmsg.
4. The message text prefix string (described next) triggers the
custom template.
5. The template causes OVO's message correlation system either
to add the event to the OVO active event browser for an alarm event
or to remove an existing alarm from the browser for a clearing event.
The opcmsg command is available on all servers with OVO
agent installed. You may want to look up the manpage on opcmsg
for a full description on its command syntax.
For our purposes, we use each field in the opcmsg command
as follows:
- Severity -- Specifies one of OVO's severity levels: normal,
warning, minor, major, or critical. SNMP devices may define severity
differently in their respective MIB. In such cases, the SNMP device's
severity must be mapped to OVO's by the custom Perl script.
- Application -- Denotes the application generating the event.
This can be assigned on a per-MIB basis.
- Object -- Used as the correlator between alarm and clearing
events. It must be unique so that a clearing event can be matched
to the alarm event and cause it to be removed from the OVO event
browser. In most cases, you have to concatenate the device name
generating the event and one or more varbind values. For example,
for SNMP linkup and linkdown traps, the reporting node name and
the varbind "ifIndex", combined, will uniquely match these events.
If the names of the device and the ifIndex are device456 and 3,
respectively, object can be set to "device456-3" to be used as
the correlator.
- Msg_text -- Contains the text you want it to appear in the
OVO event browser. This is specific to each individual trap. This
is prefixed with a short special text string to trigger the custom
template for message correlation (described next). Let's define
this special prefix string as "-AaYy-".
- Msg_grp -- The message group in which you want the message
to appear. The message group must have already been defined in
OVO.
- Node -- The node appearing in the OVO event as the one generating
the event. The node must have already been defined in the OVO
Node Bank. If the devices sending traps have not been defined
in the OVO Node Bank, then the node name of the gateway host who
forwards these events must be specified here.
The custom OVO template for handling these opcmsg messages
must perform the following:
- Identify the special short prefix text string in the Msg_text
using Message Key and Message Key Relation defined in the Message
Correlation Window.
- Use the object field as the correlator for message correlation.
- Remove the special short prefix from msg_text before display.
- A less severe event cancels a more severe event. For example,
major cancels critical, minor cancels both major and critical,
and so on.
Defining the above template only requires a few clicks and entering
the Message Key and Message Key Relation information for OVO message
correlation using the OVO administrator's graphical user interface
for an OVO administrator. Consequently, I am going to concentrate
the rest of the article on the generation of the appropriate opcmsg
command and configuring the NNM event system.
GUI to Generate Custom Script and NNM Event Configuration
It is true that the appropriate opcmsg commands can be
manually configured as automatic actions using NNM's xnmtrap GUI.
However, specifying an opcmsg command directly precludes
the possibility of dynamic severity translation based on the value
of a trap varbind because certain MIBs define their own severity
levels that are quite different from OpenView's. Also, owing to
the complex syntax of the opcmsg command, it is unlikely
that it can be coded correctly the first time. This is why I have
developed a Web-based user interface to facilitate the creation
of a custom Perl script and a partial trapd.conf file for use in
replacing the existing section of the trapd.conf file. This makes
it easy to configure and customize the forwarding of traps using
the approach described previously.
The Web application is named trapapp. It contains four menu items:
- Upload MIB File -- If you want to customize traps from a particular
MIB, you must first upload the MIB using this menu. This is the
default home page if there are no MIB-specific trapapp configuration
files on the system (e.g., when the Web application is run for
the first time). Figure 2 is the screenshot for this page. A MIB-specific
trapapp configuration file is created once the MIB has been uploaded
to the application. The name of the configuration file is the
same as the MIB file uploaded with the file extension ".conf"
appended. The "Edit MIB-specific Configuration" page is displayed
after the upload.
- Select Customization Configuration File -- If there are MIB-specific
trapapp configuration files created by the application on the
system, this will be the home page. You will be presented with
a selection list to pick an existing MIB-specific trapapp configuration
for modification. This page is shown in Figure 3. The "Edit MIB-specific
Configuration" page is displayed after the selection.
- Edit MIB-specific Configuration -- This menu item is available
only when a particular MIB file has been uploaded or when a MIB-specific
trapapp configuration file has been selected. It displays a page
that allows you to customize each trap. You pick the trap you
want to customize by clicking on the trap selection button as
shown in Figure 4. This brings up the page shown in Figures 5a
and 5b for you to enter detailed customization information.
- Generate Script and Partial trapd Configuration -- Again, this
menu item is available only when a particular MIB file has been
uploaded or when a MIB-specific trapapp configuration file has
been selected. It displays a page that allows you to complete
any additional information needed for customizing a Perl script
that assembles and executes an opcmsg command (Figure 6).
The customization information entered is validated before the
script and the partial trapd.conf file are generated. If there
is any missing or inconsistent information in the MIB-specific
trapapp configuration file, error messages will be displayed and
no script and configuration files will be generated until all
identified problems have been rectified. Figure 7 shows configuration
problems identified by the validation process that could be difficult
to spot if you were configuring opcmsg commands manually
using xnmtrap. The name of the partial trapd.conf file generated
is the same as the MIB's but with ".trapd.conf" appended. Additionally,
it also updates the trusted command file "trapapp" in the $OV_CONF/trustedCmds.conf
directory.
Implementation
The trapapp application is implemented using Perl. It consists
of the Perl script trapapp.pl and the Perl module MIBFSM.pm. The
only non-standard module used is Config::IniFiles, which reads/writes
Windows .ini-style configuration files. You have to download and
install the Config::IniFiles module before you can use trapapp.
Trapapp.pl implements all the form displays and gathering of information
for generating the custom Perl script and partial trapd.conf file.
You will notice that there are a number of files with a ".tpl" file
extension. Trapapp uses a simple templating system to create Web
page content and the custom Perl script based on these template
files. They are text files with special markers that look like:
%%NAME%%
These special markers are replaced by the values of hash entries with
keys equal to the NAME part of the marker. The text replacement is
handled by the procedure named template.
The MIBFSM.pm module implements a finite state machine specifically
designed for a single purpose: the extraction of V1 TRAP-TYPE or
V2 NOTIFICATION-TYPE information that includes trap number, trap
name, and varbinds from uploaded MIB files in ASN.1 syntax. It does
not recognize any other MIB definitions syntax.
As you enter information to customize traps using the trapapp
application, the information is saved in different sections of a
configuration file in Windows .ini file format using the Config::IniFiles
module. A configuration file consists of five sections: trap, format,
alarm, clearing, and opcinfo. The format of each section is described
below (please note that you do not have to know the format of the
configuration file to use the application):
[trap]
trapnum=trapname~varbind1~varbind2~...~varbindN
The purpose of the trap section is to record the trap name and its
varbinds and associate that information with a trap number, for example:
15=netHealthInfo~nhdErrorDate~nhdErrorTime~nhdErrorCode~nhdError \
Message~nhdServerIp~nhdServerName~nhServerPort~nhElementId
[format]
trapnum=eventType~messageFormat
where "eventType" can be either alarm, clearing, or ignore.
The purpose of the format section is to identify the traps used
for generating alarm and clearing events and their message format.
Traps with eventType ignore are ignored during script generation,
for example:
21=alarm~Alarm raised for $5, $11 for $12: $8 \
(NH:$2,$3,$6,$16,$17,$9,$10,$5)
[alarm]
trapnum=correlatorFormat~varbind/fixedSeverity~toNormal~toWarning \
~toMinor~toMajor~toCritical
where varbind/fixedSeverity contains either a fixed OpenView severity
(one of normal, warning, minor, major, or critical) or a varbind.
In the latter case, toNormal, toWarning, etc. contains the translation
mapping from the varbind's severity to that of OpenView's.
The purpose of the alarm section is to define the correlator for
alarm and clearing events and translation mapping from MIB-specific
severity to that of OpenView's:
21=$0-$5-$16~nhdErrorMessage~normal~warning~minor~major~critical
[clearing]
trapnum=alarmTrapNum~fixedSeverity
where alarmTrapNum specifies the alarm trap this event is to clear
and fixedSeverity specifies the fixed OpenView severity of the trap.
For example:
23=16~normal
[opcinfo]
appname=appName
messagegroup=groupName
nodename=nodeName or $0
If a specific nodeName is given, it will be used as the name of the
sender of the trap. If $0 is specified, the host name of the agent
sending the trap will be specified as the source of the message. If
you are using a gateway host to send information on devices not configured
in the OVO Node Bank, use a fixed nodename instead of $0.
The purpose of this section is to supply the information to complete
the opcmsg command:
appname=eHealth
messagegroup=eHealth
nodename=$0
The trapapp Web application has a configuration file named trapapp.ini.
Its content is shown below:
[app]
scriptdir=./gen/
opcmsg=/opt/OV/bin/OpC/opcmsg
trapdconf=C:/Program Files/HP OpenView/NNM/conf/C/trapd.conf
proddir=/opt/trapapp/
trustedcmddir=C:/Program Files/HP OpenView/NNM/conf/trustedCmds.conf/
It consists of an [app] section and the following entries:
- scriptdir -- Specifies the directory in which the generated
Perl script and partial MIB-specific trapd.conf file are to be
placed. The specified directory must already exist.
- Opcmsg -- Defines the full path of the opcmsg command.
- Proddir -- Specifies the directory in which the production
version of the generated Perl script is kept. It is safer to copy
the generated script to a production directory to avoid unintentional
changes using the Web-based GUI. This path is used to configure
the trusted commands only. Trapapp does not perform the copying
of scripts from the scriptdir directory to the proddir directory.
- trapdconf -- Defines the complete path to NNM's trapd.conf
file. Note that the path is different on different platforms (e.g.,
Windows and Unix).
- Trustedcmddir -- Specifies the trusted command directory where
a script's name must be listed before it will be executed as an
automatic action.
When you click on the Generate button, the application performs
the following tasks:
- Carries out consistency checks on the configuration information.
If it fails the consistency check, all tasks below are skipped
and error messages are displayed.
- Extracts the validated information from the MIB-specific configuration
file and creates the custom Perl script by using the script.tpl
template file.
- Uses the Perl ".." operator to extract the relevant sections
in the trapd.conf file, adds the automatic action, and saves this
partial trapd.conf file for use by the administrator to update
the production trapd.conf file using the xnmevents -replace
command.
- Adds the name of the custom Perl script to the trusted command
list file: $OV_CONF/trustedCmds.conf/trapapp if the file does
not exist or the Perl script name entry is absent in the file.
In summary, the trapapp Web application is designed to reduce
to a minimum the amount of manual configuration required on NNM
collection stations.
Setup and Usage
I assume here that the reader is familiar with the configuration
of a Web server, creating OVO templates, and configuring NNM automatic
actions using the respective OVO and NNM tools. If so, then setting
up the trapapp Web application is simple. All it does is generate
a custom Perl script and a partial trapd.conf file for each MIB
you uploaded. To set up trapapp, complete the following steps:
1. Download and install the Config::IniFiles Perl module either
from CPAN or ActivePerl.
2. Unzip (on Windows) or jar xvf (on Unix assuming you
have Java 2 SDK installed) the trapapp.zip file into its own directory
named trapapp (any name will do).
3. Configure a virtual directory on the Web server named ovtrap
to point to that directory.
4. Configure default document on the Web server as index.html.
Invoke the trapapp application by using the URL:
http://localhost/trapapp/
You can now start experimenting on the trapapp application and examine
the generated partial trapd.conf file. Once you are comfortable with
the generated partial trapd.conf file. You can update NNM's trapd.conf
file with the information contained in the generated partial trapd.conf
file using the NNM command xnmevents:
Xnmevents -replace partialConfigFile
where partialConfigFile is the full path name of the generated partial
trapd.conf file.
It is trivial to include a menu item in the Web application to
do this. I do not implement it because providing such a convenience
feature makes it too easy to change the production trapd configuration
unintentionally while experimenting with the Web GUI. It is exactly
for the same reason that the trusted command list contained in the
$OV_CONF/trustedCmds.conf/trapapp is only updated if the generated
Perl script name is absent in the file.
Additionally, there is a one-off setup involving the following
tasks on the OVO Server:
1. Define OVO template that implements the message correlation
facility described earlier.
2. Distribute the template to the OVO agent on the OVO server.
When things do not work as expected, the following checklist may
help you resolve the problems:
1. Does the information in the trapapp.ini file correctly reflect
your environment?
2. Has the message group specified in the opcmsg command
been defined in OVO?
3. Has the node name used in the opcmsg command been configured
in the OVO Node Bank?
4. Has the MIB you uploaded to trapapp been loaded using NNM's
xnmloadmib tool?
5. Have you updated the trapd.conf file by using the xnmevents
command as described previously?
6. Is the path information contained in OV_CONF/trustedCmds.conf/trapapp
for the custom Perl scripts correct?
7. Has the generated Perl script been moved to the production
directory and made executable (on Unix platforms only)?
An Example
If you use Concord eHealth in your organization, it is likely
that you will have installed eHealth Integration for OpenView on
one or more of your NNM collection stations. The installation includes
loading the concord-diagmon.mib and setting up eHealth-specific
menu items in the OpenView graphical user interface. It is quite
common that not all servers managed by eHealth are configured in
the OVO Node Bank or even in the OpenView object database. The approach
described in this article handles this situation.
An example configuration for the concord-diagmon.mib is set up
when you installed trapapp. In this sample configuration, trap 21
(nhLiveAlarm) and 23 (nhLiveClearAlarm) have been set up as alarm
and clearing events, respectively. The correlator used is $A-$5-$16
(i.e., agentHostName-nhElementIp-nhExceptionId). nhLiveAlarm uses
content of the varbind nhSeverity as the severity to use in the
opsmsg command. In this case, the possible values of nhSeverity
are the same as those for OpenView.
Limitations
Please note that this Web application is intended for OpenView
administrators' personal use only. It is not intended to be used
by the casual OpenView user. No security or concurrent access control
has been implemented. Hence, if more than one user is updating the
configuration for the same MIB at the same time, the result is unpredictable.
Conclusion
The trapapp Web application presented in this article provides
a convenient way to customize relevant incoming traps to NNM collection
stations and convert them into opc messages for forwarding to OVO
for display and message correlation. It also helps to ensure the
quality of the trap customization due to its extensive check on
consistency of the configuration. You can find trapapp updates/fixes
and other free tools under "Free Tools" on the Web site of Kardinia
Software at: http://www.kardinia.com. If you have enhanced
and/or fixed any bugs in trapapp, please send me a description and
your changes so that I can post them on the Kardinia Web site.
Andy Yuen is a solutions architect who specializes in application
development and system/network management. He has a master's degree
in Electrical Engineering from Carleton University, Canada. He can
be contacted at: andyyuen@ozemail.com.au. |