oct2003.tar

Network Monitoring with Nagios

Syed Ali

System and network monitoring is essential for systems administrators. There are many SNMP-based management tools -- both commercial and free -- that can be used to manage and monitor nodes on a network. There are also non-SNMP based tools, which do a good job of monitoring network nodes. In this article, I will explain how to install and use Nagios, a GPL tool written by Ethan Galstad that you can use for host and service monitoring.

Nagios is primarily based on Linux, but will run under most Unix variants. Nagios has many useful features that can help enhance performance and promote a user-friendly environment. For example, if you have a large number of Web servers running in your environment, you can monitor the HTTP service on all of them using Nagios. The best part is that you do not have to install any program on your Web servers to do the monitoring because Nagios can monitor the HTTP service remotely. Similarly, you can monitor almost any TCP- or UDP-based service that is running on any platform without having to install Nagios on any of the monitored hosts. I use Nagios to monitor services such as HTTP, DHCP, DNS, FTP, SMTP, and LDAP.

Nagios can also monitor host resources such as disk space, CPU load, log file size, running processes, and memory usage. Unlike service monitoring, host resource monitoring requires installation of the Nagios monitoring agent on the host that is being monitored. Nagios supports notification of problems and resolutions via email or pager. With the help of a plugin, Nagios can also be used to monitor SNMP-based events. Information about monitored hosts and services can be displayed in a 3-D VRML map.

Installation

You can download Nagios at: http://www.nagios.org. CVS-enlightened users can download Nagios source code by typing:

cvs'-'cvs -d:pserver:anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios
anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios login

with a blank password. For example, hit the enter key when prompted for a password and then type:

cvs -z3  -d:pserver:anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios
anonymous@cvs.nagios.sourceforge.net:/cvsroot/nagios co nagios

You should also check:

http://www.nagios.org/cvs.php

before attempting to download via CVS in case of any updates.

Nagios consists of a core program and some plugins. The core program calls the plugins to do the monitoring. check_smtp, for example, is a plugin that checks to see whether SMTP is running on the specified host. The core program is written in C and some of the plugins are in C as well as Perl. The core program (as of this writing) is v1.0 and can be downloaded as nagios-1.0.tar.gz.

For the purposes of this article, I will show compile and install information on a Red Hat Linux v7.2 box with kernel v2.4.7-10smp. Nagios does not require a fast machine to run, although I happen to have a dual processor 2.4-GHz Pentium IV with 4 GB of RAM on which I am running Nagios. Once you have downloaded Nagios, download the basic set of plugins from: http://www.nagios.org/download/. You can start with the base set, "nagios-plugins-1.3.0.tar.gz". A list of the plugins is available at: http://sourceforge.net/project/showfiles.php?group_id=29880.

I performed the installation of Nagios as root and Nagios was smart enough to set the appropriate permissions as well as ownership on files. Nagios is not required to run as root, nor would I recommend it. The default user and group that Nagios configuration files contain is called "nagios".

It's best if you create this user and group as follows. The useradd command will automatically create a user and a group called "nagios", and will set the primary group of user "nagios" as group "nagios". The home directory will also automatically be set as /home/nagios, but you can modify that using the -d option:

root:/usr/local/src:#useradd nagios -d
  /usr/local/nagios

Next, unzip the Nagios program:

root:/usr/local/src:#id
uid=0(root) gid=0(root)
groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),2(daemon)
root:/usr/local/src:#pwd
/usr/local/src
root:/usr/local/src:#ls
nagios-1.0.tar.gz  nagios-plugins-1.3.0.tar.gz
  
root:/usr/local/src:#tar xvfz  nagios-1.0.tar.gz

cd to the nagios-1.0 directory and run the configure command:

root:/usr/local/src/nagios-1.0:#./configure

The configure command assumes the following defaults, which you can change as you see fit. I recommend you leave the default installation directory as is:

./configure --prefix=/usr/local/nagios
  --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios
  --with-nagios-user=nagios --with-nagios-grp=nagios

Compile Nagios:

root:/usr/local/src/nagios-1.0:#make all

To install the binaries, run:

root:/usr/local/src/nagios-1.0:#make install

Three sample configuration files are automatically created by Nagios -- nagios.cfg-sample, cgi.cfg-sample, and resource.cfg-sample. If you want these files installed in the appropriate directory (i.e., /usr/local/nagios/etc), assuming you choose the default installation path of "/usr/local/nagios", run the following command:

root:/usr/local/src/nagios-1.0:#make install-config

To automatically start Nagios at the next reboot, run:

root:/usr/local/src/nagios-1.0:#make install-inits

which will place the Nagios startup file "/etc/init.d/nagios". I recommend creating start and stop links for the appropriate run level in /etc/rc3.d as follows:

root:/etc/rc3.d:#ln -s ../init.d/nagios S100nagios
root:/etc/rc3.d:#ln -s ../init.d/nagios K01nagios

Installing and Configuring Plugins

Your next step is to install the plugins Nagios uses to monitor hosts and services. Downloading "nagios-plugins-1.3.0.tar.gz" (http://sourceforge.net/project/showfiles.php?group_id=29880) will get the latest set of plugins. You do not have to configure the plugins in the same source directory as Nagios. For example, I downloaded the plugins module in /usr/local/src and then went on to uncompressing and untarring the plugins file as follows:

root:/usr/local/src:ls -l nagios-plugins-1.3.0.tar.gz

-rw-r--r--    1 root     root   491510 Mar  2 00:09 nagios-plugins-1.3.0.tar.gz
root:/usr/local/src:gzip -d nagios-plugins-1.3.0.tar.gz

root:/usr/local/src:tar xvf nagios-plugins-1.3.0

The above procedure should create a nagios-plugins-1.3.0 directory where it is untarred. Once you cd to the nagios-plugins-1.3.0 directory, you must configure the plugins and install them, which can be done as follows:

root:/usr/local/src/nagios-plugins-1.3.0:./configure
root:/usr/local/src/nagios-plugins-1.3.0:make all
root:/usr/local/src/nagios-plugins-1.3.0:make install

Since we choose the default install path of Nagios (i.e., /usr/local/nagios), we do not have to specify the path for Nagios when configuring the plugins. However, if you choose another path, use the following syntax when running the configure script for the plugins:

root:/usr/local/src/nagios-plugins-1.3.0:./configure
 --prefix=BASEDIRECTORY --with-nagios-user=SOMEUSER
 --with-nagios-group=SOMEGROUP --with-cgiurl=SOMEURL

Configuration

The next step is to configure Nagios, which is the most involved part of the project. Nagios has a number of configuration files that can be seen by running ls in /usr/local/nagios/etc:

root:/usr/local/nagios/etc:ls
contactgroups.cfg-sample
escalations.cfg-sample    hosts.cfg-sample
misccommands.cfg-sample   services.cfg-sample
cgi.cfg-sample            contacts.cfg-sample
hostextinfo.cfg-sample    nagios.cfg-sample
         
checkcommands.cfg-sample
dependencies.cfg-sample   hostgroups.cfg-sample
resource.cfg-sample       timeperiods.cfg-sample

Notice that all of these have -sample appended because they are sample configuration files. Before using any of these files, remove "-sample".

We can effectively break the configuration files of Nagios into the following types:

1. Main configuration file (nagios.cfg)
2. Resource file(s) (resource.cfg)
3. Object configuration files
4. CGI configuration file (cgi.cfg)
5. Extended information configuration files

Nagios is highly customizable, so I will only cover some of configuration directives in this article. Since nagios.cfg is the main configuration file for Nagios, start configuration of nagios.cfg as follows:

root:/usr/local/nagios/etc:su - nagios
nagios:/usr/local/nagios:cd etc
nagios:/usr/local/nagios/etc:cp
nagios.cfg-sample nagios.cfg
nagios:/usr/local/nagios/etc:vi nagios.cfg

There are more than 65 configuration directives in nagios.cfg. Because we chose the default configuration, we do not have to modify any of the configuration directives in this file to get Nagios up and running. A few of the directives are listed below.

Specify the location of the nagios log file:

log_file=/usr/local/nagios/var/nagios.log

Specify which user nagios should run as:

nagios_user=nagios

Specify which group nagios should run as:

nagios_group=nagios

If you want Nagios to use syslog, leave the option below at 1:

use_syslog=1

Specify how often you want Nagios to rotate its log file (d stands for daily):

log_rotation_method=d

Object Configuration

Next, modify the object configuration files. According to Ethan Galstad (the main developer of Nagios), an object is "simply a generic term I use to describe various data definitions you need in order to monitor anything." The files that contains object definitions include the following:

1. checkcommands.cfg
2. contactgroups.cfg
3. contacts.cfg
4. dependencies.cfg
5. escalations.cfg
6. hostgroups.cfg
7. hosts.cfg
8. misccommands.cfg
9. services.cfg
10. timeperiods.cfg

Contacts Definitions

The contacts.cfg file contains information on contacts. A contact is a systems administrator or other person who will be notified in the event of an emergency. An example entry is as follows:

define contact{
        contact_name                   joe
        alias                          System Admins
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   w,u,c,r
        host_notification_options      d,u,r
        service_notification_commands  notify-by-email
        host_notification_commands     host-notify-by-email
        email                          joe@mycompany.com
         }


define contact{
        contact_name                   mary
        alias                          System Admins
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   w,u,c,r
        host_notification_options      d,u,r
        service_notification_commands  notify-by-email
        host_notification_commands     host-notify-by-email
        email                          mary@mycompany.com
         }

Next, edit the contactgroups.cfg file and make an entry as follows:

define contactgroup{
        contactgroup_name      admintrators
        alias                  System Admins
        members                joe,mary
         }

Contact groups are very useful because you can create groups such as HttpAdmins or SmtpAdmins and use these groups for notification of respective services. (See sidebar "contact.cfg Parameters".) In the above example, I have created a group called "administrators" that contains user "joe" and "mary". I use this group for notification of all hosts and services because we do not have separate HTTP or SMTP administrators.

Host Definitions

To define the hosts that Nagios will monitor for us, we need a host definition file. Nagios host definition file is called hosts.cfg and in case of the default installation resides in /usr/local/nagios/etc/hosts.cfg. Defining a host in the host definition file means providing a name for the host, an IP address, and defining other parameters that Nagios will use to monitor the host.

Configuration of object files can be done either with the "old" method or the template-based "new" default method. The "old" method was file-based, but did not use templates. Using a template for host and service definition simplifies adding new hosts and services for monitoring. For example, consider adding a host monitor by editing hosts.cfg file. A host monitor is an instance of a host that defines attributes about the host (such as name, IP address, how many checks Nagios should run against the host) that can be seen in the following example:

define host{
         name                      generic-host

         notifications_enabled          1
         event_handler_enabled          1
         flap_detection_enabled         1
         process_perf_data              1
         retain_status_information      1
         retain_nonstatus_information   1
         check_command                  check-host-alive
         max_check_attempts             10
         notification_interval          0
         notification_period            24x7
         notification_options           d,u,r
         register                       0
         }

define host{
         use                     generic-host
         host_name               www1
         alias                   www1
         address                 10.10.10.1
         parents                 gw-10
         }

        }

define host{
         use                     generic-host
         host_name               www2
         alias                   www2
         address                 10.10.10.2
         parents                 gw-10
         }

This shows a template called "generic-host", and a host called "www1" and "www2", which uses the generic template. The generic-host template defines all the configuration directives that any host using this template will inherit. You can override the inheritance by duplicating the template-specified configuration in the host definition itself. For example, if you want notifications disabled for host www2 while you perform maintenance on it, you would modify the host entry as follows:

define host{
         use                     generic-host
         host_name               www2
         alias                   www2
         address                 10.10.10.2
         notifications_enabled   0
         parents                 gw-10
         }

In the above case, www will inherit all properties of generic-host, and the notifications_enabled property of the generic-host template will be overwritten with the new value of 0 (i.e., notifications disabled for host www2).

The required configuration directives are shown in the sidebar (of the same name). For the optional configuration directives, visit:

http://nagios.sourceforge.net/docs/1_0/xodtemplate.html#host

Hostgroups

The hostgroups.cfg file lets you create a logical group of hosts. You could modify the hostgroups.cfg file to create a group for HTTP servers as follows:

define hostgroup{
  hostgroup_name  http-servers
  alias           HTTP Servers
  contact_groups  admin
  members         www1, www2
  }

Logically grouping servers based on the service to monitor is one possible methodology. A host should belong to at least one group and may belong to multiple groups. For example, if one of your HTTP servers is also your email server, you can have another hostgroup such as:

define hostgroup{
  hostgroup_name  smtp-servers
  alias           SMTP Servers
  contact_groups  admin
  members         www2
  }

Service Definitions

We now need to define a service to monitor on our previously defined host. You can use templates in service definitions just as you can with host definition templates. I first define a generic service (called "generic-service") that has a large number of configuration options:

define service{
         name                     generic-service
         active_checks_enabled           1
         passive_checks_enabled          1
         parallelize_check               1
         obsess_over_service             1
         check_freshness                 0
         notifications_enabled           1
         event_handler_enabled           1
         flap_detection_enabled          1
         process_perf_data               1
         retain_status_information       1
         retain_nonstatus_information    1
         is_volatile                     0
         check_period                    24x7
         max_check_attempts              3
         normal_check_interval           3
         retry_check_interval            1
         contact_groups                  admin
         notification_interval           0
         notification_period             24x7
         notification_options            w,u,c,r
         register                        0
         }

I can then create a service definition for an http-servers group that inherits the generic-service properties and adds a few of its own:

define service{
         use                             generic-service
         hostgroup_name                  http-servers
         service_description             HTTP
         contact_groups                  admin
         check_command                   check_http
         }

(See the sidebar for Nagios required service parameters.) For additional directives, visit: http://nagios.sourceforge.net/docs/1_0/xodtemplate.html#service.

More on Plugins

Plugins do the actual monitoring for Nagios. The core Nagios engine calls the plugins to check on hosts and service. Nagios provides a number of plugins, and you can write your own plugins if want to monitor almost any host or service. The plugins that come with Nagios have help available when you execute the plugins with the -h option.

When you specify the check_command option in services.cfg for a particular service, Nagios looks up the check_command in the file checkcommands.cfg and then runs the command based on the specified options. Look at one of the entries in checkcommands.cfg:

  define command{
           command_name    check_http
           command_line    $USER1$/check_http -H
$HOSTADDRESS$
       }

The $HOSTADDRESS$ is a macro that Nagios expands to the IP address of the host as defined in hosts.cfg. Nagios provides 32 user-defined macros from $USER1$ through $USER32$. All macros are defined in the resource.cfg file. In the resource.cfg file, $USER1$ has already been defined as "$USER1$=/usr/local/nagios/libexec". Therefore, Nagios will execute:

/usr/local/nagios/libexec/check_http -H 10.10.10.1

for the following entry in services.cfg file:

define service{
         use                             generic-service
         host_name                       www1
         service_description             HTTP
         contact_groups                  admin
         check_command                   check_http
         }

Remember from the previous example that www1 has IP address 10.10.10.1. When the service check executes, Nagios will look up the command check_http in checkcommands.cfg and will run the command with the options specified in checkcommands.cfg.

Web-based GUI

Nagios provides an excellent, optional Web-based user interface. If you want to use the Nagios Web interface, you must edit cgi.cfg and also your Web server configuration file. In my case, I am using Apache and have modified the httpd.conf as follows:

ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin/
     <Directory "/usr/local/nagios/sbin/">
         AllowOverride AuthConfig
         Options ExecCGI
         Order allow,deny
         Allow from all
     </Directory>
     Alias /nagios/ /usr/local/nagios/share/
     <Directory "/usr/local/nagios/share/">
         AllowOverride AuthConfig
         Options None
         Order allow,deny
         Allow from all
     </Directory>

The above configuration will let me access the Nagios GUI by typing:

http://nagioshost/nagios/

The ExecCGI options allows execution of CGI scripts and the AllowOverrise AuthConfig allows me to use the directives below in my .htaccess file:

AuthName        "Nagios Access"
AuthType        Basic
AuthUserFile    /usr/local/nagios/etc/htpasswd.users
require         valid-user

Setting up authentication to access the Nagios Web interface is a good idea. To set up authentication, create a .htaccess file (as shown above) and set up a new user as follows:

htpasswd -c /usr/local/nagios/etc/htpasswd.users admin

For the Nagios cgi configuration file cgi.cfg, I have modified the following entries:

refresh_rate=90

xedtemplate_config_file=/usr/local/nagios/etc/hostextinfo.cfg

default_statusmap_layout=3
host_unreachable_sound=hostdown.wav
host_down_sound=hostdown.wav
service_critical_sound=critical.wav
service_warning_sound=warning.wav
service_unknown_sound=warning.wav
normal_sound=noproblem.wav

The refresh_rate is the rate at which Nagios will refresh the Web page for the status, statusmap, and extinfo CGIs. The hostextinfo.cfg file is explained in the "Beautification" section later in this article.

The Web interface lets you view a status "map" of the hosts being monitored. The map can be laid out in the following coordinates:

0 = User-defined coordinates
1 = Depth layers
2 = Collapsed tree
3 = Balanced tree
4 = Circular
5 = Circular (Marked Up)

User-defined coordinates allows a user to pick and choose where hosts are displayed on the status map. The coordinates of a host on the status map should be defined in the extended information file, which is hostextinfo.cfg by default. Coordinates are defined as 2d or 3d, in positive integer format, with x and y coordinates. The coordinates you specify are for the upper left-hand corner of the host icon that is drawn. For example, in the hostextinfo.cfg file:

define hostextinfo{
         host_name          www
         icon_image         web.png
         vrml_image         web.png
         statusmap_image    web.gd2
         2d_coords          100,250
         3d_coords          100.0,50.0,75.0
         }

The depth layers option displays the parent nodes and not the child nodes. Child nodes are visible by clicking on a parent node. Depth layer is useful for a large network where there are many parent/child relationships.

The collapsed tree option displays all hosts in a layered tree-like manner, giving you the option of clicking on a host and zooming in to see its child nodes, if any are defined. Child nodes are not displayed by default.

The balanced tree option displays all the hosts that are being monitored as nodes of a tree, with the root of the tree being the Nagios server process. All nodes are considered equal distance from the root.

The circular option shows all the hosts around the central Nagios server, arranged in a circular manner. This gives a cluttered view if you have many hosts being monitored.

The sound parameters define sounds that are played on the client Web browser. I think the sounds are useful because I do not have to constantly watch the Nagios GUI or check my email to be notified of a system-down status.

Beautification

If you want to use image icons to represent the host and services you are monitoring, Nagios comes with a decent amount of icons in jpg, gif, jd2, and png format. You can download six additional logo packs from:

http://www.nagios.org/download/

Each logo pack contains of a number of logos that you can use to represent your hosts and services.

Conclusion

Nagios can be a useful tool in a network for monitoring hosts and services. The ability of Nagios to remotely monitor services without the installation of software on the monitored hosts is a great plus. Nagios can also be used without SNMP or along with SNMP to effectively monitor a network. The professional GUI that Nagios offers rivals those of many commercial tools. If you are low on budget or favor free software over commercial ones, then give Nagios a try. You will not be disappointed.

Resources

Nagios html and pdf documentation -- http://www.nagios.org/docs/

Nagios FAQ -- http://www.nagios.org/faqs/

Nagios email list -- http://www.nagios.org/mailinglist.php

There are seven mailing lists: nagios-[users,announce,devel,checkins] and nagiosplug-[help,devel,checkins]. The one most helpful to new users is probably nagios-announce (for new announcements). If you want extra help after reading the documentation and the FAQ, then try nagios-users.

Syed Ali has a Master's in Computer Science from Stevens Institute of Technology and is an MCP, MCSE, MCT, CCNA, CCAI, RHCE, and SCSA. He currently works for a research laboratory in Princeton, New Jersey, as a supervisor for systems administration. Syed can be contacted at: alii@paul.rutgers.edu.