Monitoring Web Server Reliability
Carlos Ramirez
Monitoring the reliability of a Web server is a task often ignored by Web administrators. Besides configuring and tuning a Web server for optimal performance, ensuring the reliability and availability of the content must also be taken into account. This is especially true in an intranet environment where the primary purpose of internal Web sites is to provide information (company announcements, project data, company services, etc.) as opposed to entertainment. Thus, having broken links or faulty CGI scripts returning the infamous 500 Server Error message is unacceptable. Such errors not only render a Web site useless, but can also be frustrating for users.
Fortunately for the Web administrator, there are many methods that can assist in monitoring a Web server for bad links or mis-configuration errors. Some methods require visitor interaction, while others can be fully automated via a cron job. One commonly used method is the implementation of custom error messages. These messages offer information about the error. More sophisticated messages provide forms that allow the visitors to submit the problems as they occur. Unfortunately, most visitors are not inclined to fill out a form or send an email to the Webmaster after they have been deprived of their information. So Web administrators must take other approaches.
One typical approach is to implement log analyzing tools (or web-bots). These tools can generate every imaginable statistical view of the data available in the server log files. They can help track down broken links and provide some interesting information about your visitors. However, like many sys admins, I often find myself too busy to trudge through the lengthy reports generated by log analyzers. This is not an efficient solution, especially when managing multiple Web servers. Furthermore, these reports lack the right data views for a systems administrator to determine whether a server needs attention.
In this article, I describe a Perl script I wrote that analyzes the reliability of a Web server. It compiles a summary based solely on the server's HTTP response codes in the access log file. See Figure 1 for a sample HTTP summary report. The primary goal is to gather pertinent data that gives a clear view of how well a server is handling requests without requiring much time to examine. Here, reliability is measured by the calculated percentage of successful requests. A reliable server is one that handles, on average, approximately 95% of incoming requests successfully. However, before you can understand the output produced by the script, you must first become familiar with the HTTP status codes.
Understanding HTTP Status Codes
Whenever a user agent (browser, web-bot, etc.) issues an HTTP request for a resource on a Web server, the server responds with a success or error code. This status code is logged in the access log file, along with other information pertaining to the individual request. The server response codes are predefined 3-digit integers (as listed in the HTTP RFC 2068) with associated descriptions for each code. All status codes are categorized into classes as follows:
1xx -- Informational
2xx -- Success
3xx -- Redirection
4xx -- Client Error
5xx -- Server Error
For instance, a request resulting in 304 falls under the Redirection-class of status codes with a textual description of Not Modified. Other frequent HTTP status codes are:
401 -- Unauthorized
403 -- Forbidden
404 -- Not Found
500 -- Internal Server Error
It is these common server and client errors that cause the most grief for visitors (not to mention Web administrators). (See Figure 1.)
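As a rough illustration of how a status code can be pulled from a log entry and mapped to its class, here is a minimal Python sketch. It is not part of httpsummary.pl. It assumes the access log uses Apache's Common Log Format, and the names CLF_PATTERN, parse_status, and status_class are hypothetical.

```python
import re

# Status classes keyed by the first digit of the code.
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}

# Assumed layout (Common Log Format):
#   host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_status(line):
    """Return the status code of one log line as an int, or None if the
    line does not look like a Common Log Format entry."""
    match = CLF_PATTERN.match(line)
    return int(match.group("status")) if match else None


def status_class(code):
    """Map a 3-digit status code to the name of its class."""
    return STATUS_CLASSES.get(str(code)[0], "Unknown")


# A made-up log entry for illustration only.
sample = ('192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 304 -')
code = parse_status(sample)
print(code, status_class(code))
```

Lines that do not match the assumed format return None, so a caller can skip or count them separately instead of failing.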
The Perl Script: httpsummary.pl
The Perl script analyzes the HTTP request entries of the day and generates a summary report based on the HTTP status codes. The report displays the total number of hits, the total data served, and the percentage of successful hits. Additionally, it lists a table of the status codes, the number of occurrences, and the percentage of each code. The calculated success ratio is the number of requests that did not return a 4xx or 5xx status code, divided by the total number of requests. (The set of codes counted as successful is configurable.)
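The success ratio arithmetic can be sketched in a few lines of Python. This is an illustration only, not the author's Perl code. The function name success_ratio is hypothetical, and the input (12 hits with status 200 and 13 with status 500) is sample data chosen to make the calculation easy to follow.

```python
from collections import Counter


def success_ratio(status_codes):
    """Return (per-code counts, percentage of hits that did not
    produce a 4xx or 5xx status code)."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        # No requests logged: report 0% rather than dividing by zero.
        return counts, 0.0
    failures = sum(n for code, n in counts.items() if 400 <= code <= 599)
    return counts, 100.0 * (total - failures) / total


# Sample data: 12 successful hits and 13 server errors.
codes = [200] * 12 + [500] * 13
counts, ratio = success_ratio(codes)

print(f"Hits: {sum(counts.values())} Success Ratio: {ratio:.2f}%")
for code in sorted(counts):
    share = 100.0 * counts[code] / len(codes)
    print(f"{code}: Hits: {counts[code]} ({share:.2f}%)")
```

With this input, 12 of 25 requests succeed, so the ratio is 48.00%, well below the 95% mark suggested earlier for a reliable server.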
The script provides the following three run modes: interactive command-line mode, non-interactive command-line mode, and an HTML mode. The interactive command-line mode is triggered with the -i option. This will display the HTTP summary to your terminal (or STDOUT). For example:
$ httpsummary.pl -i
Web Access Summary: yourdomain.com
URL: http://yourdomain.com/cgi-bin/httpsummary.pl
------------------------------------------------------------------
Hits: 25 Success Ratio: 48.00% Data Served: 0.03 MBytes
------------------------------------------------------------------
200: OK (Hits: 12)
500: Internal Server Error (Hits: 13)
The non-interactive mode is ideal for cron jobs. In this mode, the HTTP summary is emailed to a designated list of Web administrators. (See Figure 2.)
Finally, there is the HTML mode, which is accessed via a browser. In this mode, you will get an HTML version of the report with additional functionality. (See Figure 3.) The HTML version allows you to view the hits that resulted in a specific status code. You can also look at detailed information for a specific document, such as the time the document was accessed, the remote host that requested it, and the size of the document. I recommend running the script from cron because this eliminates the need to physically monitor a server but, more importantly, it follows the principal objective -- full automation!
Requirements
Before you can install and use the script, you will need the following:
1. Perl installed on your system (http://www.perl.com)
2. The Apache Web Server (v1.3.x; http://www.apache.org)
3. The following modules from CPAN (http://search.cpan.org/):
HTTP::Status -- Provides a list of HTTP Status codes and their meanings and follows HTTP RFC 2068 requirements.
HTML::Template -- Provides templating functionality to generate HTML code.
Mail::Sendmail -- Provides platform independent email.
CGI.pm -- Used to handle CGI forms.
Installation and Configuration
Installing the script consists of copying httpsummary.pl into a CGI directory defined in your HTTP server. If you do not plan on accessing the script from the Web, you can place the script anywhere you please. The next step is to edit the following configuration variables defined in httpsummary.pl (Listing 1). They are listed at the beginning of the script in the section labeled Configuration Area:
$ACCESSLOG -- /path/to/your/http/access_log
$MAILSERVER -- your_outgoing_mail_server
Required for the non-interactive command-line mode, which emails the report.
$WEB_URL -- http://www.yourserver.com/cgi-bin/httpsummary.pl
This is used to provide a link to the script in the email message so that the recipient can access the HTML version.
$SERVER -- host_name
This is used for text summary reports. I decided not to explicitly use the UNIX hostname command, since I wanted to make this script portable.
$FROM -- sender_email_address
$MBYTES -- 1 [default]
If set to 1, display the total data transferred in MB; otherwise, display it in bytes.
A value of 1 is recommended for high-traffic sites.
@RECIPIENT -- qw(recipient1 recipient2 ...)
A space-separated list of admins that will be receiving the report.
@SUCCESS -- (codeNum1, codeNum2,..,codeNumX)
List of status codes that should be counted as successful requests. Used to generate the success ratio. You can use wildcards to indicate sets. For example, 3* allows all 3xx status codes to be considered successful hits.
There might be times when you want to include/exclude certain status codes as successful hits. (Don't abuse this feature.)
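To make the wildcard idea concrete, here is a small Python sketch of matching status codes against a list of patterns such as 3*. It is not taken from httpsummary.pl. Treating the wildcards as shell-style globs is an assumption, and the names SUCCESS_PATTERNS and is_success are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical success list: exactly 200, plus every 3xx code.
SUCCESS_PATTERNS = ["200", "3*"]


def is_success(code, patterns=SUCCESS_PATTERNS):
    """Return True if the status code matches any pattern.

    Patterns are treated as shell-style globs (an assumption),
    so "3*" matches 301, 302, 304, and so on."""
    return any(fnmatchcase(str(code), pattern) for pattern in patterns)


for code in (200, 201, 304, 404, 500):
    print(code, "success" if is_success(code) else "failure")
```

With this list, 200 and 304 count as successes, while 201, 404, and 500 do not. Adding "2*" would count every 2xx code as a success.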
Now that you have configured the script to fit your needs, you are ready to schedule it as a cron job to fully automate your Web server monitoring tasks.
Sample cron entry:
#Min Hour day mo. d-o-w command
#### ### ### ### ### #######
0 9,12,15 * * * \
/usr/local/apache/cgi-bin/httpsummary.pl
The sample cron entry will run the program at 9 a.m., 12 p.m., and 3 p.m. Running the script too many times during the day (or your shift) can decrease the significance of examining the summary report. Also, it is important to schedule runs for times by which you expect the server to have handled enough requests. A small sampling of requests can sometimes produce misleading results.
Conclusion
Interpreting the results generated by the httpsummary.pl script requires that you fully understand the configuration of the server you are monitoring and that you are familiar with the HTTP status codes. Mastering these two requirements will enable you to use this script to maximize the reliability of your Web servers. Although this utility will not prevent errors from occurring, it will keep you informed of how well your Web server is handling requests, and it is then the duty of the Web administrator to act on this information.
About the Author
Carlos Ramirez graduated from California State University, Fullerton, with a degree in physics. Since then, he has worked as a UNIX systems administrator and Web developer. His primary interests are Web server management, Web application development, Internet/intranet architecture, and programming in Perl. He can be reached at carlos@quantumfx.com.