SNMP Trap Handling with Nagios
Francois Meehan
My company has been very successful in providing network-monitoring
solutions based on Nagios/Netsaint. For environments that required
SNMP trap handling, however, the technique suggested in Nagios's
documentation was too cumbersome, because it requires separate coding
for each individual trap message that needs to be monitored. For
those clients, we could only suggest commercial solutions, all of
which come with high price tags and complicated implementations.
Recently, however, we discovered Alex Burger's "SNMP Trap Translator"
project that extends Net-SNMP. Coupled with Risto Vaarandi's event
correlation tool called "SEC", a small Python script, and, of course,
Nagios, we have put together a very scalable, efficient alternative.
The whole process is pictured in Figure 1.
Please note that we used Red Hat Advanced Server for this particular
installation, but this solution should be adaptable to other modern
Linux distributions.
Prerequisites:
1. Net-SNMP with snmptrapd configured.
2. SNMPTT, the SNMP Trap Translator.
3. SEC, the Simple Event Correlator.
4. Nagios.
5. MIB definition files for the equipment or software you need to
monitor.
One of the beauties of this solution is that we can use the event
severity set by the MIB designer. Nagios will always report the
event status based on this information.
Net-SNMP
Net-SNMP, formerly known as UCD-SNMP, is installed by default
on most Linux distributions. Here we are specifically interested
in configuring the trap receiver portion of the installation. The
trap receiver is a daemon that receives its startup configuration
in /etc/rc.d/init.d/snmptrapd. We modified the following line:
OPTIONS="-s -u /var/run/snmptrapd.pid"
To this:
OPTIONS="-On -u /var/run/snmptrapd.pid"
As the SNMP Trap Translator documentation states: "The -On is recommended.
This will make snmptrapd pass OIDs in numeric form and prevent SNMPTT
from having to translate the symbolic name to numerical form."
We then modified the file /usr/share/snmp/snmptrapd.conf by adding
the following line:
traphandle default /usr/sbin/snmptt
When the modification is done, the daemon must be restarted for the
changes to take effect.
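To make the hand-off concrete: snmptrapd passes each trap to the
traphandle program on standard input, with the sender's host name on
the first line, its address on the second, and then one "OID VALUE"
pair per line. The Python sketch below parses that layout. It is only
an illustration of what SNMPTT receives, not part of SNMPTT itself;
the function name parse_trap and the sample trap are hypothetical,
and the exact form of the address line varies between Net-SNMP
versions.

```python
import io


def parse_trap(stream):
    """Parse the text snmptrapd writes to a traphandle program's stdin."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a host name line and an address line")
    hostname, address = lines[0], lines[1]
    varbinds = []
    for line in lines[2:]:
        # The first token is the OID; the rest of the line is its value,
        # which may itself contain spaces.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value))
    return hostname, address, varbinds


# Hypothetical trap as it might arrive when snmptrapd runs with -On,
# so that every OID is already in numeric form.
sample = io.StringIO(
    "ups01.example.com\n"
    "UDP: [192.0.2.10]:161->[192.0.2.1]:162\n"
    ".1.3.6.1.2.1.1.3.0 0:1:02:03.00\n"
    ".1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.318.0.5\n"
)
hostname, address, varbinds = parse_trap(sample)
for oid, value in varbinds:
    print(hostname, oid, value)
```

In a real handler the stream would be sys.stdin rather than an
in-memory sample.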
SNMP Trap Translator
Install the trap translator by following the supplied instructions
and then configure the file /etc/snmp/snmptt.ini by altering
some of the parameters as follows:
mode = standalone
dns_enable = 1
strip_domain = 1
net_snmp_perl_enable = 1
translate_value_oids = 1
translate_enterprise_oid_format = 1
translate_trap_oid_format = 1
translate_varname_oid_format = 1
log_enable = 1
syslog_enable = 1
syslog_level = info
snmptt_conf_files = /etc/snmp/snmptt.conf
Once the trap translator is successfully installed, you need to feed
it some MIB trap definitions. For example, assume that you have
the MIB for the APC UPS, "powernet361.mib":
./snmpttconvertmib --in=/usr/share/snmp/mibs/powernet361.mib --out=/etc/snmp/snmptt.conf
If you run this command again for another MIB, the translated
definitions are appended to the same snmptt.conf file. You can write
to a different output file instead but, if you do, remember to list it
in the snmptt_conf_files setting of snmptt.ini. For example:
snmptt_conf_files = <<END
/etc/snmp/snmptt.conf.generic
/etc/snmp/snmptt.conf.compaq
/etc/snmp/snmptt.conf.cisco
/etc/snmp/snmptt.conf.hp
/etc/snmp/snmptt.conf.3com
END
When a trap occurs, the trap translator will output the information
received from the trap to syslog and, at this point, SEC will take
over.
SEC
SEC was installed before we discovered SNMP Trap Translator. We
had used it as an event filter-correlator in a centralized syslog
configuration to interface some events with Nagios. It may be possible
to have the translator interface directly with Nagios, but we thought
we would lose a lot in terms of flexibility if we did this.
By using SEC, we can alter the flow of messages and do things
such as:
- For an unstable machine or non-production equipment, you can
delay alarms that occur at night until morning.
- If you are suddenly bombarded by traps from a device, SEC can
be used to regulate the flow.
Note that we chose to install SEC in the "/opt" directory.
We start SEC with the following statement in the rc.local file:
/opt/sec/sec.pl -input=/var/log/messages -conf=/opt/sec/sec.conf \
-detach -log=/var/log/sec.log
In our sec.conf file is the rule that handles SNMP traps:
# SNMP trap event translated by snmptt
type=Single
ptype=RegExp
pattern=snmptt.*(Normal|INFORMATIONAL|MINOR|WARNING|SEVERE|MAJOR|CRITICAL) \
\"Status Events\" (\w+) \- (.*)
desc=snmptrap received from $2
action=shellcmd /opt/nagios/libexec/eventhandlers/snmptraphandling.py \
$2 $1 "$3"
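To check what this rule captures before relying on it, the same
regular expression can be tried in Python, whose syntax is compatible
with Perl's for this pattern. The sketch below joins the two pattern
lines into one and returns the captures in the order the rule passes
them to the script (host, severity, message). The sample syslog line
is hypothetical; the exact layout SNMPTT writes depends on the FORMAT
lines in snmptt.conf.

```python
import re

# The SEC pattern from the rule above, joined onto a single line.
TRAP_PATTERN = re.compile(
    r'snmptt.*(Normal|INFORMATIONAL|MINOR|WARNING|SEVERE|MAJOR|CRITICAL) '
    r'\"Status Events\" (\w+) \- (.*)'
)


def match_trap(line):
    """Return (host, severity, message) for a matching line, else None."""
    found = TRAP_PATTERN.search(line)
    if found is None:
        return None
    severity, host, message = found.groups()
    # Same argument order as the SEC action: $2 $1 "$3"
    return host, severity, message


# Hypothetical line as it might appear in /var/log/messages.
sample = ('Nov 20 13:22:11 monitor snmptt[2211]: .1.3.6.1.4.1.318.0.5 '
          'MAJOR "Status Events" ups01 - UPS: On battery power')
print(match_trap(sample))
```

Two properties of the pattern are worth keeping in mind. It matches
only traps whose category is "Status Events", and (\w+) accepts only
letters, digits, and underscores, so a host name containing a hyphen
or a dot would not match without adjusting the expression.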
snmptraphandling.py
The snmptraphandling.py Python script performs the following actions
for us:
- Formats the output of SEC
- Obtains the current time in epoch format
- Translates the severity code
- Posts the data into the Nagios command file
See Listing 1 for complete code.
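The four steps above can be sketched as follows. This is not the
author's Listing 1. It is a minimal illustration that assumes the
Nagios command file lives at /opt/nagios/var/rw/nagios.cmd, that each
severity maps to one of the three per-host services described in the
Nagios section, and that the names SEVERITY_MAP, build_command,
submit, and main are free to choose. The PROCESS_SERVICE_CHECK_RESULT
line is the standard Nagios external command for passive service
results.

```python
import time

# Assumed location of the Nagios external command file (a named pipe).
COMMAND_FILE = "/opt/nagios/var/rw/nagios.cmd"

# Assumed mapping from trap severity to (Nagios return code, service).
SEVERITY_MAP = {
    "NORMAL": (0, "snmp_trap_handling_ok"),
    "INFORMATIONAL": (0, "snmp_trap_handling_ok"),
    "MINOR": (1, "snmp_trap_handling_warning"),
    "WARNING": (1, "snmp_trap_handling_warning"),
    "SEVERE": (2, "snmp_trap_handling_critical"),
    "MAJOR": (2, "snmp_trap_handling_critical"),
    "CRITICAL": (2, "snmp_trap_handling_critical"),
}


def build_command(host, severity, message, now=None):
    """Format one passive service check result for Nagios."""
    try:
        code, service = SEVERITY_MAP[severity.upper()]
    except KeyError:
        raise ValueError("unknown severity: %r" % severity)
    # Current time in epoch format, unless the caller supplies one.
    timestamp = int(time.time() if now is None else now)
    # Collapse whitespace so the command stays on a single line.
    text = " ".join(message.split())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, code, text)


def submit(command, command_file=COMMAND_FILE):
    """Write one command line into the Nagios command file."""
    with open(command_file, "w") as pipe:
        pipe.write(command)


def main(argv):
    # SEC calls the script as: snmptraphandling.py <host> <severity> "<message>"
    host, severity, message = argv[1], argv[2], argv[3]
    submit(build_command(host, severity, message))
```

To use it as the script SEC runs, the file would end with a call such
as main(sys.argv) after importing sys, and the user running SEC would
need write access to the command file.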
Nagios
This is the delicate part. Great care must be taken when monitoring
traps to get it right. Given the high security standards of the
customer for which we did this initial setup, one requirement was
that all major events had to be audited. Thus, for each Nagios host
capable of sending traps, we configured three services: one for
critical, one for warnings, and one for "OK" events. This way, when
a critical or warning event occurs, the systems administrators must
submit a passive check result "OK" and enter a brief description
of what action was taken to correct the situation in order to reset
the event. This information is then kept in the Nagios event log.
Because of the number of SNMP-capable devices, we have made use
of service templates as shown below:
define service {
    use                    passive-check-template
    name                   generic-snmptrap
    service_description    snmp_trap_handling
    is_volatile            1
    check_period           none
    notification_interval  120
    notification_options   w,u,c,r
    notification_period    24x7
    check_command          passive_check_missing
    max_check_attempts     1
    check_freshness        0
    stalking_options       o,w,u,c
    register               0
}
Note the "stalking_options" parameter. Without it, if two or more
traps arrive in quick succession, the Nagios administrator will be
alerted only to the first trap and will not see the ones that follow.
For each host to be monitored, we created the following services,
all of which use the service template above:
define service {
    use                  generic-snmptrap
    host_name            my_example_host
    contact_groups       my_example_group
    service_description  snmp_trap_handling_ok
}
define service {
    use                  generic-snmptrap
    host_name            my_example_host
    contact_groups       my_example_group
    service_description  snmp_trap_handling_warning
}
define service {
    use                  generic-snmptrap
    host_name            my_example_host
    contact_groups       my_example_group
    service_description  snmp_trap_handling_critical
}
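With many SNMP-capable hosts, writing three near-identical service
definitions per host by hand becomes tedious. One possible shortcut,
shown here only as a sketch, is to generate them. The function name
render_services is hypothetical, and the host and contact group names
are the placeholders used above.

```python
# Template for one service definition based on generic-snmptrap.
SERVICE_TEMPLATE = """define service {
    use                  generic-snmptrap
    host_name            %(host)s
    contact_groups       %(group)s
    service_description  snmp_trap_handling_%(level)s
}
"""

# One service per event class, as described in the text.
LEVELS = ("ok", "warning", "critical")


def render_services(host, group):
    """Return the three trap-handling service definitions for one host."""
    return "".join(
        SERVICE_TEMPLATE % {"host": host, "group": group, "level": level}
        for level in LEVELS
    )


print(render_services("my_example_host", "my_example_group"))
```

The output can be redirected into a Nagios object configuration file
and checked with Nagios's own configuration verification before the
daemon is reloaded.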
To test the whole solution, we used an APC UPS with an on-board management
card. After we configured the trap destination and community parameters,
we began our tests. We simply disconnected the UPS from the mains,
and Nagios immediately started to receive events and generate alarms.
We then fed mib definitions for most of our major equipment vendors
and, since then, have witnessed many different alarms for disks, cooling
fans, and a host of other events.
Because we already use Nagios to monitor processes, it makes sense
to have it watch the SEC and snmptrapd processes as well. Trap
handling gives Nagios a brand new range of opportunities and makes
our customer really happy.
This solution is only possible because smart individuals take
the time to write high-quality software, such as Nagios, SNMP Trap
Translator, Net-SNMP, and SEC.
References
Nagios -- http://nagios.org/
Net-SNMP -- http://www.net-snmp.org/
SEC -- http://kodu.neti.ee/~risto/sec/
SNMP Trap Translator -- http://www.snmptt.org/
Francois Meehan has more than 20 years of experience in computer
operations, covering midrange computers to PCs, from AIX to OpenBSD,
Linux, and Windows operating systems. He founded his own consulting
company, CEDVAL Info Inc., specializing in network monitoring and
open source solutions integration. CEDVAL Info customers are mostly
in the pharmaceutical and manufacturing industries in the Montreal
region and the United States. Francois, his wife, Lise, and their
two children live in Notre-Dame-de-l'Ile-Perrot, an island on the
St. Lawrence River west of Montreal. Francois can be contacted at:
francois@cedval.org.