Cover V12, I07

Article
Figure 1
Figure 2
Figure 3
Figure 4
Sidebar

jul2003.tar

Monitoring Latency with Smokeping

Bill Kramp

Smokeping is a great tool that complements the traditional ping utility. Ping will quickly show whether a server is up and how fast it is responding, but ping does not provide a way to study the round trip time (RTT) -- the latency of the connection to the server over a time interval. Smokeping, however, writes latency information into a database. The smokeping program sends many requests during each polling period to the network device and records the response times. Smokeping can use pings, http requests, or SMTP requests to the server or network service to collect the latency information. This data can then be viewed with a browser to show the latency for the network service. The graphs can also show the latency distribution for each sample period.

Smokeping consists of two main parts:

  • Smokeping -- The program that tests network devices and records latency information in the database.
  • smokeping.cgi -- A script that displays latency information.

Smokeping usually runs with a polling period of 5 minutes to collect the latency to the network services. It reads a configuration file that details the network server and service to test. The default setting is to send 20 requests during each polling period to the devices defined in the configuration file.

All of the data is collected and stored using the Round Robin Database (RRD). RRD will probably be familiar to people who use MRTG, Cricket, or the many other front ends used for collecting SNMP data from network switches and routers. In this article, I will cover how to configure and use smokeping to measure latency on the network. I will provide a sample configuration script and some example graphs of collected data. I will also explain how I used Smokeping to document and resolve problems on the network.

Software Installation

Smokeping is a free software package with a GPL license. You can download smokeping from the download section at the Web site:

http://people.ee.ethz.ch/~oetiker/webtools/smokeping/
Several other utilities are required to make smokeping work, such as Perl, Apache, fping, echoping, SpeedyCGI, and RRD. You will also need Socket6 if you need to resolve IPv6 addresses. Links to these utilities can be found in the resources section at the end of this article and at the smokeping Web site. All the utilities must be installed before smokeping will work. Download, configure, and install the software using the instructions included with each utility. Use gunzip and tar to extract each of the utilities. Most of the software can then be configured, compiled, and installed with the following commands:

# ./configure
# make
# make install
Some of the utilities may need to be configured with other options to meet your needs. Make sure to read the README and INSTALL instructions with each utility before installing.

Smokeping

The smokeping software should be installed under a user with limited rights like "smokeping". After creating a user named smokeping, install the smokeping software in this smokeping home directory, using the smokeping account. For this article, the home directory for smokeping will be /home/smokeping. Within the smokeping directory "/home/smokeping", make the following directories: var, public_html, and public_html/.simg. In the directories bin and etc, remove ".dist" from the end of the files that end with ".dist". This can be done by using the mv command to move them, or by making a copy with the cp command. Copy the "/home/smokeping/htdocs/smokeping.cgi.dist" script to the new public_html directory. Examples:

cp /home/smokeping/etc/config.dist /home/smokeping/etc/config
cp /home/smokeping/htdocs/smokeping.cgi.dist /home/smokeping/public_html/smokeping.cgi
The smokeping and smokeping.cgi scripts must be configured with the correct paths to the utilities and configuration file. Edit /home/smokeping/bin/smokeping to make sure the path for Perl is correct. Also check that the paths for the smokeping and RRD library directories, as well as the path for the configuration file, are defined correctly. This may require the use of the find command, if you are not sure of their locations. The paths for the lib and config file for smokeping could look like this:

#!/usr/bin/perl -w
use lib qw(/local/rrdtool/lib/perl);
use lib qw(/home/smokeping/lib);
Smokeping::main("/home/smokeping/etc/config");
The next file to edit is the /home/smokeping/public_html/smokeping.cgi script to make sure the path to speedy, the SpeedyCGI tool, is correct. As in the smokeping utility, the configuration file, and the library, directories for smokeping and RRD need to be defined. The paths for the config and lib directories will be the same as the smokeping utility:

#!/usr/bin/speedy -w
use lib qw(/local/rrdtool/lib/perl);
use lib qw(/home/smokeping/lib);
Smokeping::cgi("/home/smokeping/etc/config");
Apache

Apache only needs a few changes in its httpd.conf file to allow access to the smokeping.cgi script in the smokeping directory. The AddHandler option for CGI scripts must be uncommented by removing the leading pound sign (#). Access to the public_html directory must be added to the httpd.conf file as well. The changes for httpd.conf, shown in Listing 1, should allow smokeping to work, but should not be considered a secure configuration. Steps should be taken to secure the configuration of Apache and to limit access to the smokeping.cgi script in a manner that will meet your security policies. (All listings are on the Sys Admin Web site: http://www.sysadminmag.com.)

Utility Testing

It is important to test each utility to make sure the utilities are working properly. Smokeping will only work if the utilities on which it depends are working correctly. Understanding how each utility works will make configuration and troubleshooting much easier. These utilities can also be useful for testing and troubleshooting network services in other applications. The following sections describe three important utilities that support smokeping:

  • fping
  • echoping
  • curl

For more on configuring smokeping's support utilities, refer to the utility documentation.

Fping

Fping can send out pings to a single address or multiple IP addresses at the same time. After sending out the many concurrent pings, it will keep track of the responses from each server. By comparison, the ping utility can only query one server at a time. If an IP address doesn't respond to fping, it will try several more times. You should only use this tool to ping hosts or sets of IP addresses that you are authorized to probe. Using this tool to ping hosts or scan networks that you do not control could be viewed as an illegal activity. A simple query to test fping is:

Command->
# fping -e -c 3 192.168.1.1
Response->
192.168.1.1 : [0], 84 bytes, 0.34 ms (0.34 avg, 0% loss)
192.168.1.1 : [1], 84 bytes, 0.32 ms (0.33 avg, 0% loss)
192.168.1.1 : [2], 84 bytes, 0.32 ms (0.32 avg, 0% loss)
192.168.1.1 : xmt/rcv/%loss = 3/3/0%, min/avg/max = 0.32/0.33/0.34
The response to the fping command showed three pings and their latency. It also showed the minimum, maximum, and average latency value for the three pings. To make fping executable by the smokeping account, fping must be made setuid root. This will allow processes other than root, like smokeping, to run the fping command. Under Linux, the command to make fping setuid root is:

# chmod 4755 fping
Echoping

Echoping can be used to test several types of network services, such as http, https, smtp, chargen, etc. The three main types used by Smokeping are http, https, and smtp. To test an http service, use the -h option to specify that the echoping request will be an html page, followed by the path. The server and port for the Web server are then specified, with a colon used to separate the server address and port number. The -n followed by a number option informs echoping to execute the command three times:

Command->
Echoping -n 3 -h / www.domain-name:80
        
Response->
Elapsed time: 0.002667 seconds
Elapsed time: 0.002565 seconds
Elapsed time: 0.002613 seconds
---
Minimum time: 0.002565 seconds (99805 bytes per sec.)
Maximum time: 0.002667 seconds (95988 bytes per sec.)
Average time: 0.002615 seconds (97897 bytes per sec.)
Median time: 0.002613 seconds (97972 bytes per sec.)
Echoping should respond with the message showing the elapsed time for each of the http requests. It will also display the minimum, maximum, average, and median latency values for the Web server. Testing all the Web sites manually will be easier to troubleshoot than within smokeping.

If encrypted http connections are going to be used, the OpenSSL library will need to be installed first. Issue the Echoping configure command with the "-with-ssl" option. Then issue the commands make and make install.

It is also very easy to test the response times of email servers. Issuing the following command can test the latency of an SMTP connection:

# echoping -n 3 -S 192.168.1.1
Curl

Curl is another useful tool for testing http and https connections. Echoping will inform you of the response time or a failure, but it is sometimes useful to know exactly what the http or https response looks like. Curl will display the html response in text form, and not interpret the html commands. With command-line options that are available, it can use either GET or POST to send information to a Web server. It can also send user authentication information to a server to access protected resources. Here is a sample use of curl to display an https Web page:

# curl --insecure https://web.domain-name
The "--insecure" option tells it not to use any default CA certificates. The "-cacert <CA certificate>" option would be used to define a certificate to be used during a connection.

Smokeping Configuration

The smokeping configuration script consists of five basic sections: general, database, presentation, probes, and targets (see sidebar). The format is hierarchical structure for use with the parseconfig module that comes with smokeping. The sample config script in Listing 2 shows a basic configuration file for smokeping.

Running Smokeping

It is always a good idea to run things manually before turning them loose. Running smokeping with the debug option will cause it to display information about what it is doing. Smokeping will not run as a daemon while in debug mode, but will only run once and display more detailed information about what it is doing:

Command->
/home/smokeping/bin/smokeping --debug 

Response (condensed)->
### fping seems to report in 1 milliseconds
Launched successfully
FPing: probing 4 targets
EchoPingHttp: probing 3 targets
EchoPingHttp: forks 5, rounds 1, timeout 300
EchoPingHttp: executing cmd /local/bin/echoping -h / -n 20 mars:80
EchoPingHttp: mars: got 0.001975 0.002002 0.002008 0.002016 
0.002018 0.002022 0.002022 0.002024 0.002028 0.002030 0.002031 
0.002040 0.002059 0.002060 0.002068 0.002072 0.002106 0.002251 
0.002315 0.004432
The response showed that four targets were defined in latency testing with fping. Three http targets were also defined, of which only the "mars" Web server debug data is displayed. The "mars" http response shows the results of the 20 latency tests.

To start Smokeping as a daemon, execute the smokeping utility without any options. Smokeping will daemonize a process and return to the shell prompt. The process can be restarted to read the configuration file by running smokeping with the "--restart" option.

Web Pages

Smokeping uses a CGI script to query the RRD database and create the graphs and Web pages. To save a couple of keystrokes in specifying the URL to the CGI script, create an index.html file (Listing 3) in the smokeping public_html directory to automatically redirect to the smokeping.cgi script.

Graphs

Smokeping uses two different types of graphs to display the latency information: an overview graph and a detailed graph. The overview graph shows the RTT latency value for the past 10 hours (Figure 1). The graph displays the median RTT value for that time period, as well as the last retrieved value. The average packet loss is also calculated and displayed on the graph. Clicking on the graph will retrieve the detailed graph for that server or service being monitored.

The detailed smokeping graph (Figure 2) use color codes to indicate any packet loss. The color-coded bar indicates that median value of the packets sent during each polling period. Smokeping will use green if there was no packet loss. Light blue is used for losing one packet, with increasingly darker shades for loss increases up to four. Purple and red are used when the packet loss is 10 or 19 per polling period. If all packets during a polling period are lost, nothing will be displayed for that time period on the graph.

Smokeping also uses grey bars to indicate the distribution of the response times. A dark grey indicates a tight packing of many polls with the same RTT. A tall bar of light grey indicates a large variance of the latency in the packets sent.

When viewing the smokeping graphs for latency, be aware that the data collected using pings may show packet loss. Many routers will drop the ICMP ping packets when they are busy handling other traffic. ICMP is given a much lower priority when compared to the other UDP and TCP traffic. It is better to test for latency using http, smtp, or the other supported services under smokeping.

Resolving Network Problems with Smokeping

The following sections describe some examples of how smokeping can help troubleshoot network problems.

Random Latency Problem

Last year, some users noticed a problem with their terminal application experiencing a response delay of 5 to 6 seconds to a remote site. The problem did not happen all the time, and of course, never occurred when I tried it. The remote site could not detect any problem with the network equipment. I configured smokeping to monitor the latency to the remote server and the routers in between. After several days, the data showed that the problem was occurring at the remote sites server (Figure 3). The graph showed that the server was experiencing latencies of up to 7 seconds at random intervals. The router graph (Figure 4) does show some packet loss, but not at the same time as the server. In some cases, the router has zero packet loss with no change in latency, while the server shows latency problems.

Copies of the graphs showing smokeping's findings were emailed to the remote site. The remote site determined that the router was very busy handling our inbound connection, and lots of outbound connections at those times. The remote site then used a packet shaper to give priority to our application traffic, which resolved the problem.

Broadband Problem

In another case, a remote site using a broadband provider for Internet access began to see slower response times in accessing our site. Another site with the same broadband provider was not experiencing this problem. The Cricket/RRD data showed that utilization of the bandwidth was not the cause. Looking at the smokeping data showed the latency had been slowly getting worse over a few days. I contacted the provider and explained the problem along with the data I was seeing with the smokeping graphs. They sent a technician who diagnosed a problem with the external broadband equipment.

Monitoring Server Performance

Smokeping can also be used to monitor the latency of a Web server. We have used it to monitor the response time of a Web server with a backend database. Smokeping makes a request that only the Web server handles, and makes requests that make use of the backend database. Smokeping provides a history of how the two servers are performing. When a user complains of performance problems, I can look at the graphs to see what has changed and when. This helps reduce troubleshooting time if we can see a correlation to a server patch or modification.

Conclusion

Smokeping has proven valuable to me in performing my job responsibilities. Learning how to use it, and the other utilities like curl and echoping, has been very useful with other network projects. Identifying a problem, and backing that up with a graph, went a long way toward resolving network problems quickly. I think Smokeping just reinforces the old saying that a picture is worth a thousand words.

Resources

Smokeping -- http://people.ee.ethz.ch/~oetiker/webtools/smokeping/

RRD -- http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/pub/

Apache -- http://www.apache.org/

Perl -- http://www.perl.com/

Speedy CGI -- http://daemoninc.com/speedycgi/

Fping -- http://people.ee.ethz.ch/~oetiker/webtools/smokeping/pub/fping-2.4b2_to.tar.gz

Echoping -- http://echoping.sourceforge.net/

Curl -- http://curl.haxx.se/download.html

Bill Kramp is the Network Administrator for the Finger Lakes Community College, located in Upstate New York. He's been employed at the college since 1989, where he designs and manages the networking for the main campus and the two Extension Centers. He uses many open source tools for monitoring the network: Cricket/RRD, Netsaint (Nagios), smokeping, Snort, etc. Bill can be reached at: krampwd@flcc.edu.