Cover V14, i03
mar2005.tar

SNMP Trap Handling with Nagios

Francois Meehan

My company has been very successful in providing network-monitoring solutions based on Nagios/Netsaint. For environments that required SNMP trap handling, however, the technique suggested in Nagios's documentation was too cumbersome. That technique requires coding for each individual trap message that needs to be monitored and, for those clients, we could only suggest making use of commercial solutions, all of which come with high price tags and complicated implementations.

Recently, however, we discovered Alex Burger's "SNMP Trap Translator" project that extends Net-SNMP. Coupled with Risto Vaarandi's event correlation tool called "SEC", a small Python script, and, of course, Nagios, we have put together a very scalable, efficient alternative. The whole process is pictured in Figure 1.

Please note that we used Red Hat Advanced Server for this particular installation, but this solution should be adaptable to other modern Linux distributions.

Prerequisites:

1. Net-SNMP with snmptrapd configured.
2. SNMPTT, SNMP trap translator.
3. SEC -- Simple event correlator.
4. Nagios.
5. MIB definition files for the equipment or software you need to monitor.

One of the beauties of this solution is that we can use the event severity set by the MIB designer. Nagios will always report the event status based on this information.

Net-SNMP

Net-SNMP, formerly known as UCD-SNMP, is installed by default on most Linux distributions. Here we are specifically interested in configuring the trap receiver portion of the installation. The trap receiver is a daemon that takes its startup options from /etc/rc.d/init.d/snmptrapd. We modified the following line:

OPTIONS="-s -u /var/run/snmptrapd.pid"
To this:

OPTIONS="-On -u /var/run/snmptrapd.pid"
The SNMP Trap Translator documentation explains the change: "The -On is recommended. This will make snmptrapd pass OIDs in numeric form and prevent SNMPTT from having to translate the symbolic name to numerical form."

We then modified the file /usr/share/snmp/snmptrapd.conf by adding the following line:

traphandle default /usr/sbin/snmptt
When the modification is done, the daemon must be restarted for the changes to take effect.
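
The traphandle line hands each incoming trap to /usr/sbin/snmptt on standard input. According to the snmptrapd.conf documentation, a handler receives the sender's host name on the first line, its address on the second, and then one "OID value" pair per line. The following Python sketch is not part of SNMPTT; it only illustrates that layout. The function name and the sample trap are hypothetical.

```python
import io


def parse_traphandle_input(stream):
    """Split traphandle-style input into host name, address, and varbinds."""
    lines = [line.rstrip("\n") for line in stream if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected at least a host name and an address line")
    hostname, address = lines[0], lines[1]
    varbinds = []
    for line in lines[2:]:
        # The OID runs up to the first space; the rest of the line is the value.
        oid, _, value = line.partition(" ")
        varbinds.append((oid, value))
    return hostname, address, varbinds


# Hypothetical trap, with numeric OIDs as produced by the -On option.
sample = io.StringIO(
    "ups01.example.com\n"
    "192.0.2.10\n"
    ".1.3.6.1.2.1.1.3.0 0:1:02:03.00\n"
    ".1.3.6.1.6.3.1.1.4.1.0 .1.3.6.1.4.1.318.0.5\n"
)
host, addr, binds = parse_traphandle_input(sample)
print(host, addr)
for oid, value in binds:
    print(oid, "=", value)
```

The exact form of the address line can differ between Net-SNMP versions, so the sketch keeps it as an opaque string.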

SNMP Trap Translator

Install the trap translator by following the supplied instructions and then configure the file /etc/snmp/snmptt.ini by altering some of the parameters as follows:

mode = standalone
dns_enable = 1
strip_domain = 1
net_snmp_perl_enable = 1
translate_value_oids = 1
translate_enterprise_oid_format = 1
translate_trap_oid_format = 1
translate_varname_oid_format = 1
log_enable = 1
syslog_enable = 1
syslog_level = info
snmptt_conf_files =  /etc/snmp/snmptt.conf
Once the trap translator is successfully installed, you need to feed some MIB trap definitions to it. For example, assume that you have the MIB for the APC UPS, "powernet361.mib":

./snmpttconvertmib --in=/usr/share/snmp/mibs/powernet361.mib --out=/etc/snmp/snmptt.conf
The next time you add MIB definitions using this command, the translated MIB will be appended to the snmptt.conf file. You can use a different output file but, if you do, remember to list it in snmptt.ini. For example:

snmptt_conf_files = <<END
/etc/snmp/snmptt.conf.generic
/etc/snmp/snmptt.conf.compaq
/etc/snmp/snmptt.conf.cisco
/etc/snmp/snmptt.conf.hp
/etc/snmp/snmptt.conf.3com
END
When a trap occurs, the trap translator will output the information received from the trap to syslog and, at this point, SEC will take over.

SEC

SEC was installed before we discovered SNMP Trap Translator. We had used it as an event filter-correlator in a centralized syslog configuration to interface some events with Nagios. It may be possible to have the translator interface directly with Nagios, but we thought we would lose a lot in terms of flexibility if we did this.

By using SEC, we can alter the flow of messages and do things such as:

  • For a non-stable machine or non-production equipment, you can delay alarms that occur at night until morning.
  • If you are suddenly bombarded by traps from a device, SEC can be used to regulate the flow.
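
To make the second point concrete, here is a minimal Python sketch of the idea behind flow regulation: forward the first event for a given host and severity, then suppress repeats for a fixed window. SEC does this with its own rule types (for example, SingleWithSuppress), not with Python; the class name, the 60-second window, and the sample events below are assumptions made purely for illustration.

```python
class TrapThrottle:
    """Forward the first event per (host, severity); suppress repeats for `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.last_forwarded = {}  # (host, severity) -> time of last forwarded event

    def should_forward(self, host, severity, now):
        key = (host, severity)
        last = self.last_forwarded.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the suppression window
        self.last_forwarded[key] = now
        return True


# Hypothetical burst of traps: (seconds since start, host, severity).
events = [
    (0, "ups01", "CRITICAL"),
    (5, "ups01", "CRITICAL"),
    (20, "ups01", "WARNING"),
    (70, "ups01", "CRITICAL"),
]
throttle = TrapThrottle(window=60)
decisions = [throttle.should_forward(h, s, t) for t, h, s in events]
print(decisions)
```

In this run the second CRITICAL trap is dropped because it arrives five seconds after the first, while the WARNING trap and the later CRITICAL trap are forwarded.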

Note that we chose to install SEC in the "/opt" directory.

We start SEC with the following statement in the rc.local file:

/opt/sec/sec.pl -input=/var/log/messages -conf=/opt/sec/sec.conf \
  -detach -log=/var/log/sec.log
Our sec.conf file contains the following rule, which handles SNMP traps:

# Snmptrap event translated by snmptt
type=Single
ptype=RegExp
pattern=snmptt.*(Normal|INFORMATIONAL|MINOR|WARNING|SEVERE|MAJOR|CRITICAL) \
  \"Status Events\" (\w+) \- (.*)
desc=snmptrap received from $2
action=shellcmd /opt/nagios/libexec/eventhandlers/snmptraphandling.py \
  $2 $1 "$3"
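
To see what the rule captures, the same expression can be tried in Python against a sample log line. This is only a sketch: the syslog line below is hypothetical (host name, process ID, OID, and message are invented), and it assumes the two pattern lines are joined with a single space.

```python
import re

SEVERITIES = "Normal|INFORMATIONAL|MINOR|WARNING|SEVERE|MAJOR|CRITICAL"

# Python equivalent of the SEC pattern, written on one line.
TRAP_PATTERN = re.compile(
    r'snmptt.*(' + SEVERITIES + r') "Status Events" (\w+) - (.*)'
)

# Hypothetical syslog entry written by the trap translator.
sample_line = (
    'Mar  1 10:15:02 monitor snmptt[1234]: .1.3.6.1.4.1.318.0.5 '
    'CRITICAL "Status Events" ups01 - UPS on battery'
)

match = TRAP_PATTERN.search(sample_line)
severity, host, message = match.groups()  # SEC's $1, $2, and $3

# Same argument order as the shellcmd action: host, severity, message.
print(host, severity, message)
```

One consequence of (\w+) is that a host name containing a hyphen or a dot would not match, so such names would need a wider character class in the pattern.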
  
snmptraphandling.py

The snmptraphandling.py Python script performs the following actions for us:

  • Formats the output of SEC
  • Obtains the current time in epoch format
  • Translates the severity code
  • Posts the data into the Nagios command file

See Listing 1 for complete code.
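
For readers without the listing at hand, the four steps can be sketched in a few lines of Python. This is not the script from Listing 1: the severity-to-service mapping, the function names, and the sample values are assumptions. The only parts taken from elsewhere are the service names used later in this article and the standard Nagios external command PROCESS_SERVICE_CHECK_RESULT.

```python
import time

# Assumed mapping from trap severity to (Nagios return code, service suffix).
SEVERITY_MAP = {
    "NORMAL": (0, "ok"),
    "INFORMATIONAL": (0, "ok"),
    "MINOR": (1, "warning"),
    "WARNING": (1, "warning"),
    "SEVERE": (2, "critical"),
    "MAJOR": (2, "critical"),
    "CRITICAL": (2, "critical"),
}


def build_command(host, severity, message, now=None):
    """Return one passive check result line for the Nagios command file."""
    return_code, suffix = SEVERITY_MAP[severity.upper()]
    timestamp = int(time.time() if now is None else now)  # epoch seconds
    service = "snmp_trap_handling_" + suffix
    output = " ".join(message.split())  # collapse whitespace and newlines
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def post_command(command, command_file):
    """Write the command to the Nagios command file (normally a named pipe)."""
    with open(command_file, "w") as handle:
        handle.write(command)


# Hypothetical event with a fixed timestamp so the output is reproducible.
print(build_command("ups01", "CRITICAL", "UPS on battery", now=1109689200), end="")
```

The location of the command file depends on the installation, so post_command takes it as a parameter instead of hard-coding a path.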

Nagios

This is the delicate part. Great care must be taken when monitoring traps to get it right. Given the high security standards of the customer for which we did this initial setup, one requirement was that all major events had to be audited. Thus, for each Nagios host capable of sending traps, we configured three services: one for critical, one for warnings, and one for "OK" events. This way, when a critical or warning event occurs, the systems administrators must submit a passive check result "OK" and enter a brief description of what action was taken to correct the situation in order to reset the event. This information is then kept in the Nagios event log.

Because of the number of SNMP-capable devices, we have made use of service templates as shown below:

define service {
    use    passive-check-template
    name   generic-snmptrap
    service_description    snmp_trap_handling
    is_volatile 1
    check_period    none
    notification_interval    120
    notification_options    w,u,c,r
    notification_period    24x7
    check_command    passive_check_missing
    max_check_attempts    1
    check_freshness    0
    stalking_options    o,w,u,c
    register 0
}
Note the "stalking_options" parameter. Without it, if two or more traps are received in quick succession, the Nagios administrator receives an alert for the first trap only; the traps that follow are not reported.

For each host to be monitored, we created the following services, which refer to the service template shown above:

define service {
    use    generic-snmptrap
    host_name     my_example_host
    contact_groups    my_example_group
    service_description    snmp_trap_handling_ok
}

define service {
    use    generic-snmptrap
    host_name    my_example_host
    contact_groups    my_example_group
    service_description    snmp_trap_handling_warning
}

define service {
    use    generic-snmptrap
    host_name     my_example_host
    contact_groups    my_example_group
    service_description    snmp_trap_handling_critical
}
To test the whole solution, we used an APC UPS with an on-board management card. After we configured the trap destination and community parameters, we began our tests. We simply disconnected the UPS from the mains, and Nagios immediately started to receive events and generate alarms. We then fed MIB definitions for most of our major equipment vendors and, since then, have witnessed many different alarms for disks, cooling fans, and a host of other events.

Because we are using Nagios to monitor processes, it only makes sense to use it to monitor the presence of SEC and snmptrapd processes. This gives Nagios a brand new range of opportunities and makes our customer really happy.

This solution is only possible because smart individuals take the time to write high-quality software, such as Nagios, SNMP Trap Translator, Net-SNMP, and SEC.

References

Nagios -- http://nagios.org/

Net-SNMP -- http://www.net-snmp.org/

SEC -- http://kodu.neti.ee/~risto/sec/

SNMP Trap Translator -- http://www.snmptt.org/

Francois Meehan has more than 20 years of experience in computer operations, covering midrange computers to PCs, from AIX to OpenBSD, Linux, and Windows operating systems. He founded his own consulting company, CEDVAL Info Inc., specializing in network monitoring and open source solutions integration. CEDVAL Info customers are mostly in the pharmaceutical and manufacturing industries in the Montreal region and the United States. Francois, his wife, Lise, and their two children live in Notre-Dame-de-l'Ile-Perrot, an island on the St. Lawrence River west of Montreal. Francois can be contacted at: francois@cedval.org.