Cover V13, i13

Automating the spamlist.org Blacklist

Hal Pomeranz

As I become increasingly frustrated with the amount of spam I receive via the Deer Run mail servers, I start reaching for ever more draconian tools for blocking spam. One of the more useful blacklists that I've found on the Internet is the one at www.spamlist.org. However, the maintainers of the list (probably in an effort not to get sued by the spammers) go out of their way not to provide the list in a form that can be automatically incorporated into your mail server configuration. So, in this article, I will share a few simple tools I've created for automatically using this list on my own personal mail servers.

It should be noted that the spamlist.org blacklist is a very dangerous list. Using it blindly will result in a huge number of false positives, because the spamlist.org list blocks entire countries (China, Korea, Brazil, Greece, and others) that are known spam-havens. Before implementing the automated solution presented here, install the list manually and watch your mail server logs carefully to make sure that you are not rejecting email you legitimately wish to receive. In my case, I also maintain an extensive "white list" of people from whom I do want to receive email, despite the spamlist.org list and other blacklists to which I subscribe. However, this may not be reasonable for a large enterprise, so please approach the spamlist.org list with extreme caution.

Sendmail Configuration

The easiest way to use the spamlist.org list is via Sendmail's access_db feature. While this feature is described more fully in the O'Reilly & Associates Sendmail book, the simplest way to enable this feature is to add the following macro to the file you use to generate your sendmail.cf file for your external mail relay:

FEATURE(`access_db', `hash -o /etc/mail/access')

"hash" means that the access database will be a Berkeley DB-style hash database (alternatively, "dbm" can be used on systems without Berkeley DB support). This is an "optional" ("-o") database, meaning Sendmail will run without complaint if the DB file doesn't exist, and the name of the database file will be /etc/mail/access.db (the ".db" extension is added automatically by Sendmail because of the choice of database type).

Normally, the access database is maintained manually by the local site administrator and contains entries for sites from which you don't want to receive email:

Connect:64.35.104           REJECT
Connect:64.119.222          REJECT
Connect:64.143.184          REJECT
From:optinamerica.com       REJECT
From:superdealmail.com      REJECT
biz                         ERROR:550 Move out of biz domain

As you can see, the access database gives you fairly fine-grained control based on the source of the SMTP connection and/or the sender's domain name. In the last line of the example, we're not only applying a wholesale block to email originating from ".biz" addresses, but also supplying a custom error message with the rejection. In fact, the access database also lets you control the fate of the incoming email based on the recipient address as well, but that's not germane to this discussion.
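A three-octet entry such as Connect:64.35.104 covers a whole block of addresses because, as far as I understand Sendmail's behavior, the client address is looked up repeatedly with the least significant octet removed each time. The following is a minimal sketch of that idea only, not Sendmail's actual lookup code; the function name connect_keys is hypothetical.

```shell
#!/bin/sh
# connect_keys ADDR
# Print the Connect: keys that would be tried for a client address,
# from most specific to least specific. Illustration only: this
# assumes the prefix-stripping lookup described above and ignores
# untagged entries and IPv6.
connect_keys() {
    addr=$1
    while [ -n "$addr" ]; do
        echo "Connect:$addr"
        case $addr in
            *.*) addr=${addr%.*} ;;   # drop the last octet
            *)   addr= ;;             # nothing left to strip
        esac
    done
}
```

Running connect_keys 64.35.104.7 lists four candidate keys, and the second one matches the first line of the sample access file above.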

Typically, the access database for a site is created in a text file called /etc/mail/access. This text file is then converted into a Berkeley DB file via the makemap program:

cd /etc/mail
makemap -v hash access < access

The above command creates a file called access.db (again, the extension is supplied automatically) from the input text file called access. The -v ("verbose") option tells makemap to display the key/value pairs as they're being added to the database. Note that if your particular operating system doesn't supply the makemap program, the source code is included with the Sendmail source distribution available from ftp.sendmail.org.

Downloading the spamlist.org Blacklist

The actual spamlist.org blacklist is available from:

http://www.spamlist.org/html/the_list.html

The lines of text we need to feed into our access database are unfortunately encapsulated in the middle of this rather large HTML document. Luckily, the lines are set off between "<pre>...</pre>" tags, so it's a trivial Perl script to extract the lines we need:

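#!/usr/bin/perl
# spamalama: print the lines between <pre> and </pre>, minus trailing whitespace
use strict;
use warnings;

my $print = 0;                      # true once we are inside the <pre> block
while (<>) {
    if (/<\/pre>/) { exit; }        # stop at the closing tag
    if ($print) {
        s/\s*$//;                   # strip trailing whitespace, including CR-LF
        print "$_\n";
    }
    if (/<pre>/) { $print = 1; }    # start printing with the next line
}
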
Note the extra code to remove the DOS-style CR-LF at the end of each line and replace it with the standard Unix new-line sequence. In a fit of whimsy, I called this script spamalama.
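For sites without Perl, roughly the same extraction can be sketched in plain shell with sed. This assumes, as the Perl version does, that the "<pre>" and "</pre>" tags sit on lines of their own and that only the first block matters; the function name extract_pre is hypothetical.

```shell
#!/bin/sh
# extract_pre [FILE...]
# Print the lines that follow the first line containing <pre>,
# stopping at the first line containing </pre>. Trailing whitespace
# (including the CR of a DOS line ending) is stripped.
extract_pre() {
    sed -n '
        /<\/pre>/q
        /<pre>/,$ {
            /<pre>/d
            s/[[:space:]]*$//
            p
        }
    ' "$@"
}
```

Called as extract_pre the_list.html > access-sl, it could stand in for the Perl script in the same pipeline.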

Now that we can extract the necessary lines from the original HTML document, what to do with them? I maintain a manual access database with RCS on my systems, so I don't want the lines from the spamlist.org list to clobber that. Instead, I redirect the output of my spamalama script to a separate file and feed both my manually maintained access DB text file and the spamalama output file into the makemap program to build my Berkeley DB file. And since I hate doing things manually, I encoded everything into a simple Makefile in the /etc/mail directory:

all:: access.db virtusertable.db

virtusertable.db: virtusertable
	makemap -v hash virtusertable < virtusertable

access.db: access access-sl
	cat access access-sl | makemap -v hash access

access-sl: spamlist.org/the_list.html
	spamlist.org/spamalama spamlist.org/the_list.html > access-sl

As you can see, I also use the same Makefile for updating other database files used by Sendmail, like the virtusertable.db file for the various virtual domains I host.

But let's examine the rules for building the access.db file. The access.db file is built from two dependencies -- the access file that I maintain by hand, and the access-sl file that is created by calling the spamalama script. The spamalama script and the HTML input file it feeds on have been placed in the /etc/mail/spamlist.org directory (the pathname is arbitrary, but it is convenient to have them grouped together in a subdirectory of /etc/mail). So now if I update either the /etc/mail/access file or the /etc/mail/spamlist.org/the_list.html file and then run make in the /etc/mail directory, my access database will automagically get updated. Hooray!
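One thing to watch when concatenating two source files is a key that appears in both. To my knowledge, makemap complains about a repeated key unless it is given the -r or -d option, and it folds keys to lower case by default. The following is a minimal sketch of a pre-flight check under those assumptions; the function name dup_keys is hypothetical.

```shell
#!/bin/sh
# dup_keys FILE...
# Print each key (first field, folded to lower case) that occurs more
# than once across the given access source files. Comment lines and
# blank lines are ignored. Each duplicated key is reported once.
dup_keys() {
    awk '!/^#/ && NF {
        key = tolower($1)
        if (seen[key]++ == 1) print key
    }' "$@"
}
```

Running dup_keys access access-sl in /etc/mail before make shows which hand-maintained entries are now redundant with the downloaded list.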

Automating Downloads from www.spamlist.org

The only remaining item is a script that I can run from cron to automatically update the /etc/mail/spamlist.org/the_list.html file on my machine and run the appropriate make command. With the help of the wget utility (available from any FSF archive site), this is no problem:

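#!/bin/sh
# update.sh: fetch the current spamlist.org list and rebuild the Sendmail maps

MAILDIR=/etc/mail
LISTDIR=$MAILDIR/spamlist.org
URL=http://www.spamlist.org/html/the_list.html

PATH=/bin:/usr/bin:/usr/sbin:/usr/local/bin
export PATH

FILE=`basename $URL`            # the_list.html
cd $LISTDIR || exit 1
mv $FILE $FILE.bak              # keep the previous copy
wget $URL
cd $MAILDIR || exit 1
make                            # rebuilds access.db only if the list changed
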
Note that I like using wget here because wget automatically preserves timestamps on the files that it downloads. This means that if the spamlist.org site hasn't been updated since the last time I ran my script, then the timestamp on the the_list.html file won't change and my make command won't do anything.
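One weakness of the script is that a failed download leaves no the_list.html behind, so the subsequent make has nothing to build from. The following is a minimal sketch of a more defensive fetch step that puts the backup copy back when the download fails or produces an empty file. The function name refresh_list is hypothetical, and the download command is passed in as arguments so that the logic does not depend on any particular tool.

```shell
#!/bin/sh
# refresh_list FILE COMMAND [ARGS...]
# Move FILE aside, run COMMAND (which is expected to recreate FILE in
# the current directory), and restore the old copy if COMMAND fails
# or leaves FILE missing or empty. Returns 0 only on a good download.
refresh_list() {
    file=$1
    shift
    if [ -f "$file" ]; then
        mv "$file" "$file.bak"
    fi
    if "$@" && [ -s "$file" ]; then
        return 0
    fi
    # Download failed: discard any partial file and restore the backup.
    # mv keeps the old timestamp, so make will see nothing to rebuild.
    rm -f "$file"
    if [ -f "$file.bak" ]; then
        mv "$file.bak" "$file"
    fi
    return 1
}
```

In the update script this would replace the mv and wget lines with something like refresh_list $FILE wget -q $URL.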

Assuming I install the above script as /etc/mail/spamlist.org/update.sh, the cron invocation is something like:

45 23 * * 0 /etc/mail/spamlist.org/update.sh

Note that the spamlist.org site is generally updated on a weekly basis at best, so running the script once a week as shown here is probably sufficient.

Conclusion

Although this collection of scripts and Makefiles works well for my personal email server at Deer Run Associates, I cannot stress enough how aggressive the spamlist.org blacklist is. Please use caution when implementing this list at your site or you may experience terrible consequences. Read and heed the disclaimers at http://www.spamlist.org. You have been warned.

Hal Pomeranz (hal@deer-run.com) is a relatively young curmudgeon who hates spam in all forms. This causes him to take perilous and foolhardy steps to protect his inbox from pollution.