mar2004.tar

Monitoring News Service from the Client's View

Ralf Van Dooren

Although the death of Usenet has been predicted for a long time, it is still alive and kicking. However, with increasing amounts of data (text, binaries, and spam) traveling each day through Usenet, keeping your news servers running smoothly is becoming more and more a "mission impossible". Performance tuning is an essential tool to keep news flowing efficiently.

To successfully tune your setup, however, you'll need a monitoring environment that can help you making the correct tuning decisions. This article describes a customer-oriented way of monitoring your news service. I will describe a Perl tool (Listing 1) that "behaves" like a typical customer and can therefore provide insight about how your customers are experiencing your news service.

Since the massive rollout of broadband technologies like Cable or xDSL, news server usage has increased dramatically. To give you an idea, I have gathered some statistics throughout the years for an incoming IHAVE feed (i.e., news coming in from other news servers):

January 2002: 200 GB per day
March 2002: 220 GB per day
October 2002: 400 GB per day
May 2003: 600 GB per day
November 2003: 850 GB per day

This data is only based on news articles coming in from external sources: the feed must be sent out to other news peers, and customers reading messages will often generate more traffic than this feed alone. If you have a clientele that actively uses news, the total traffic to and from your news servers will be much, much higher than this.

These stunning figures have an impact on the servers coping with the daily mass of netizens trying to download the latest "announcements". Most ISPs running a news service must implement a complex setup of news feeders, readers, high-speed storage devices, spam filters, and other essential tools and devices.

All these servers can be monitored individually by looking at the performance of the server itself, for example:

CPU load
Disk usage (iowait)
Bandwith utilization
Check whether the news-server port (TCP 119) reacts

This will present good information about the health of a particular server. If the news server port doesn't respond, your customers will likely complain.

But, what if customers are complaining about a bad news server experience while all the servers appear "green" in your Big Brother monitoring environment? It's time to monitor the news service as a whole from the customers' point of view. The NNTP-client tool described in this article will help you start monitoring from the users' perspective.

To begin, we must define some performance values that we consider "good". The method described here uses three values, but you can extend this if you like:

A posting must be retrievable within 10 minutes on the server on which it was posted. (This monitors your internal news system.)
A posting must be retrievable within 30 minutes on a reference news server outside your network. (This monitors your connection to the outside world.)
A posting must be retrievable within 30 minutes after posting through the reference news server on your local news server. (This monitors your connection from the outside world.)

Next, we must pick a reference news server that will be used to check the outside connectivity of your news service. It must have the following three minimum requirements:

Close to your network (e.g., a direct news peer).
Reading and posting rights with an account or from your IP address.
Technically reliable.

After these requirements have been determined and set up, we can start NNTP-client. It simulates a customer by reading articles, posting articles, etc. NNTP-client is written in Perl, and it stores the results in RRDtool, which has been discussed in previous issues of Sys Admin magazine.

The NNTP-client program requires Net::NNTP and Time::HiRes -- both CPAN modules. You can install them manually, but if you have the CPAN module (highly advised if you use CPAN regularly), you can execute the following commands:

# perl -MCPAN -e shell
cpan> install Time::HiRes
cpan> install Net::NNTP
cpan> q

As I mentioned previously, NNTP-client also requires RRDtool. You can use RRDtool in combination with Perl in three ways:

A system call in Perl (system ("rrdtool <commands>"))
RRD pipes module (RRDp)
Shared RRD module (RRDs)

RRDs is the most efficient way, so that is the module that we'll be using. It can be found in "perl-shared" in the RRDtool installation directory. Do the following to install (you must have RRDtool already configured and installed on your system):

$ cd /tmp/rrdtool/perl-shared
$ perl Makefile.PL
$ make test
# make install

An RRD database must also be created so that the outcome of the program can be stored in it. There are many options and possibilities, so you will have to determine your own requirements (what kind of RRAs do you need, etc). NNTP-client expects three data sources (DS). To demonstrate the program, the RRD database will be created:

$ rrdtool create /usr/local/newstest/client.news1.rrd --step 60 \
DS:postok:GAUGE:300:0:100 \
DS:locok:GAUGE:300:0:100 \
DS:remoteok:GAUGE:300:0:100 \
RRA:AVERAGE:0.5:1:2400

The "news1" refers to the host found in the newslib.pl package (Listing 2). For a detailed explanation of RRDtool, I refer to the manuals, which can be found online on the RRDtool homepage.

The NNTP-client program basically performs the following tasks, which can be found under "Main":

1. Post an article in a newsgroup.

2. Ten minutes after posting, check whether the article is locally retrievable.

3. Thirty minutes after posting, check whether the article is retrievable via the reference server.

4. Cancel the article to prevent unnecessary spreading of the articles.

For a successful run, we must define some variables; these can be configured in the beginning of the program:

$max_articles -- How many articles should be posted each run.

$check_local_time -- Time in seconds after which the program checks the availability of the article on the local server.

$check_remote_time -- Time in seconds after which the program checks the availability of the article on the reference server.

$reference_server -- The news server that will be used as a reference server.

$testgroup -- The newsgroup that will be used for the test

$TEXT -- The file that will be used to compose the message. Define the absolute path to the file.

The servers on which the "local" checking will be done are defined in newslib.pl (Listing 2), a small Perl package which I wrote for this tool. It defines the hosts and some routines.

In the post_articles subroutine, a unique message-id is created, the posting is composed, posted to the news server, and the reaction of the server is checked.

You may be using Cleanfeed to filter spam messages. Cleanfeed checks for similar postings. Our program may hit the Cleanfeed scoring mark, and posting will be denied. To circumvent this, there are two methods:

1. Configure Cleanfeed so the IP address on which your test program is not checked by Cleanfeed.

2. Post "random" texts.

Because you may not have control over the Cleanfeed configuration, in this program I use the second method. The $TEXT file will be filled with blocks of FAQs, READMEs, or whatever suits you. The program selects a couple of lines from this "garbage" file and posts this.

The program then sleeps for $check_local_time seconds, checks if the article(s) can be retrieved on the server on which they were posted. After that, it waits until $check_remote_time has elapsed since posting the articles and checks for availability on the remote server.

Act like a good netizen: the test messages just posted have served their purpose, so why let them circulate on the news servers all around the globe? It's time to cancel your message, and this is where the subroutine cancel_messages kicks in; it creates the cancel message and gives a summary in the body on the reason of the cancel. Use a valid email address so people can respond to your messages if needed.

Finally, all these results (e.g., was posting, local and remote retrieving successful) are stored in the RRDtool database using the RRDs call. To check whether the run was successful, execute:

$ rrdtool dump /usr/local/newstest/client.news1.rrd AVERAGE

If the program works, you will see something like this on the end:

1070204520: nan nan nan
1070204580: nan nan nan
1070204640: 1.0000000000e+02 1.0000000000e+02 1.0000000000e+02
1070204700: 1.0000000000e+02 1.0000000000e+02 1.0000000000e+02
1070204760: 1.0000000000e+02 1.0000000000e+02 1.0000000000e+02

If you only see nan nan nan entries, there was no data stored and thus an error occurred.

It is best to run this program from cron approximately every 30 minutes. Add this to your crontab file:

15,45 * * * * /usr/local/newstest/ nntp-client.pl

Your RRDtool database will be filled with the results, which you then can retrieve as needed. It will help you do performance tuning (e.g., is the server running smoothly at this time?) and trend analysis (is there performance degradation over time?). It can also help you see the results of your tuning; perhaps more and more postings are getting through on time.

You can also tweak the $check_local_time and $check_remote_time variables in NNTP-client to see how "fast" your servers are. It should be fairly trivial to change this program into one that checks how long it takes for a posting to appear locally and/or remotely.

Using this kind of tool allows you to better understand your clients' experience of news service and see how successful your tuning is.

References

RRDtool -- http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/

Perl -- http://www.perl.org

CPAN -- http://www.cpan.org

Cleanfeed -- http://www.bofh.it/~md/cleanfeed/

Ralf Van Dooren works as a consultant for Snow (http://snow.nl), a leading Unix consultancy company based in the Netherlands. He holds various certifications for Unix (SCSA,LPIC2) and networking (CCNP) and is challenged by clients to find the right answer (42) for their problems. He can be reached at: r.vdooren@snow.nl.