Automating the spamlist.org Blacklist
Hal Pomeranz
As I become increasingly frustrated with the amount of spam I
receive via the Deer Run mail servers, I start reaching for ever
more draconian tools for blocking spam. One of the more useful blacklists
that I've found on the Internet is the one at www.spamlist.org.
However, the maintainers of the list (probably in an effort not
to get sued by the spammers) go out of their way not to provide
the list in a form that can be automatically incorporated into your
mail server configuration. So, in this article, I will share a few
simple tools I've created for automatically using this list
on my own personal mail servers.
It should be noted that the spamlist.org blacklist is a very dangerous
list. Using it blindly will result in a huge number of false positives,
because the spamlist.org list blocks entire countries (China, Korea,
Brazil, Greece, and others) that are known spam-havens. Before implementing
the automated solution presented here, install the list manually
and watch your mail server logs carefully to make sure that you
are not rejecting email you legitimately wish to receive. In my
case, I also maintain an extensive "white list" of people
from whom I do want to receive email, despite the spamlist.org list
and other blacklists to which I subscribe. However, this may not
be reasonable for a large enterprise, so please approach the spamlist.org
list with extreme caution.
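To make the log review a little less tedious, a small shell function can tally rejections by relay. This is only a sketch: it assumes that rejection lines in your mail log carry "relay=..., reject=..." fields, and both that layout and the log location vary by system and Sendmail version. The function name count_rejects is made up for this example.

```shell
#!/bin/sh
# count_rejects: read mail log lines on stdin and print the number of
# rejections per relay, most frequent first.
# ASSUMPTION: rejection lines contain "relay=<something>, reject=<code>".
count_rejects() {
    sed -n 's/.*relay=\([^,]*\), reject=.*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example (the log path is an assumption; adjust for your system):
#   count_rejects < /var/log/maillog
```

Any relay near the top of the output that you recognize as a legitimate correspondent is a candidate for your white list.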
Sendmail Configuration
The easiest way to use the spamlist.org list is via Sendmail's
access_db feature. While this feature is described more fully
in the O'Reilly & Associates Sendmail book, the
simplest way to enable this feature is to add the following macro
to the file you use to generate your sendmail.cf file for
your external mail relay:
FEATURE(`access_db', `hash -o /etc/mail/access')
"hash" means that the access database
will be a Berkeley DB-style hash database (alternatively, "dbm"
can be used on systems without Berkeley DB support). This is an
"optional" ("-o") database, meaning Sendmail
will run without complaint if the DB file doesn't exist, and
the name of the database file will be /etc/mail/access.db
(the ".db" extension is added automatically by
Sendmail because of the choice of database type).
Normally, the access database is maintained manually by the local
site administrator and contains entries for sites from which you
don't want to receive email:
Connect:64.35.104 REJECT
Connect:64.119.222 REJECT
Connect:64.143.184 REJECT
From:optinamerica.com REJECT
From:superdealmail.com REJECT
biz ERROR:550 Move out of biz domain
As you can see, the access database gives you fairly fine-grained
control based on the source of the SMTP connection and/or the sender's
domain name. In the last line of the example, we're not only
applying a wholesale block to email originating from ".biz"
addresses, but also supplying a custom error message with the rejection.
In fact, the access database also lets you control the fate of the
incoming email based on the recipient address as well, but that's
not germane to this discussion.
Typically, the access database for a site is created in a text
file called /etc/mail/access. This text file is then converted
into a Berkeley DB file via the makemap program:
cd /etc/mail
makemap -v hash access < access
The above command creates a file called access.db (again, the
extension is supplied automatically) from the input text file called
access. The -v ("verbose") option tells makemap
to display the key/value pairs as they're being added to the
database. Note that if your particular operating system doesn't
supply the makemap program, the source code is included with
the Sendmail source distribution available from ftp.sendmail.org.
Downloading the spamlist.org Blacklist
The actual spamlist.org blacklist is available from:
http://www.spamlist.org/html/the_list.html
The lines of text we need to feed into our access database are unfortunately
encapsulated in the middle of this rather large HTML document. Luckily,
the lines are set off between "<pre>...</pre>"
tags, so it's a trivial Perl script to extract the lines we need:
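#!/usr/bin/perl
# Print the lines between <pre> and </pre>, stripping trailing whitespace.
$print = 0;
while (<>) {
    if (/<\/pre>/) { exit; }          # stop at the closing tag
    if ($print) {
        s/\s*$//;                     # drop CR-LF and other trailing whitespace
        print "$_\n";
    }
    if (/<pre>/) { $print = 1; }      # start printing after the opening tag
}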
Note the extra code to remove the DOS-style CR-LF at the end of each
line and replace it with the standard Unix new-line sequence. In a
fit of whimsy, I called this script spamalama.
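The same extraction logic can also be sketched in awk, which is convenient for trying it out on a small sample before pointing it at the real page. This is an illustrative equivalent rather than a replacement for spamalama; the function name extract_pre is hypothetical.

```shell
#!/bin/sh
# extract_pre: print the lines strictly between the first line containing
# <pre> and the next line containing </pre>, with trailing whitespace
# (including any DOS carriage return) removed.
extract_pre() {
    awk '
        /<\/pre>/ { exit }                          # stop at the closing tag
        printing  { sub(/[ \t\r]+$/, ""); print }   # emit the cleaned line
        /<pre>/   { printing = 1 }                  # start after the opening tag
    '
}

# Example:
#   extract_pre < the_list.html > access-sl
```

As in the Perl version, the order of the tests matters: the line holding the opening tag is never printed because printing is only switched on after that line has been handled.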
Now that we can extract the necessary lines from the original
HTML document, what to do with them? I maintain a manual access
database with RCS on my systems, so I don't want the lines
from the spamlist.org list to clobber that. Instead, I redirect
the output of my spamalama script to a separate file and
feed both my manually maintained access DB text file and the spamalama
output file into the makemap program to build my Berkeley
DB file. And since I hate doing things manually, I encoded everything
into a simple Makefile in the /etc/mail directory:
all:: access.db virtusertable.db

virtusertable.db: virtusertable
	makemap -v hash virtusertable < virtusertable

access.db: access access-sl
	cat access access-sl | makemap -v hash access

access-sl: spamlist.org/the_list.html
	spamlist.org/spamalama spamlist.org/the_list.html > access-sl
As you can see, I also use the same Makefile for updating other
database files used by Sendmail, like the virtusertable.db
file for the various virtual domains I host.
But let's examine the rules for building the access.db
file. The access.db file is built from two dependencies --
the access file that I maintain by hand, and the access-sl
file that is created by calling the spamalama script. The
spamalama script and the HTML input file it feeds on have
been placed in the /etc/mail/spamlist.org directory (the
pathname is arbitrary, but it is convenient to have them grouped
together in a subdirectory of /etc/mail). So now if I update
either the /etc/mail/access file or the /etc/mail/spamlist.org/the_list.html
file and then run make in the /etc/mail directory,
my access database will automagically get updated. Hooray!
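Because the extracted lines go straight into the mail server configuration, it may be worth sanity-checking them before they reach makemap. The sketch below assumes that a blacklist only ever needs the REJECT, DISCARD, or ERROR: actions, so anything else (an unexpected RELAY or OK entry, or stray HTML) is flagged. The function name check_blacklist and the access-sl.new file name are hypothetical.

```shell
#!/bin/sh
# check_blacklist: read candidate access-map lines on stdin and succeed
# only if every non-blank, non-comment line is a key followed by a
# blocking action. Offending lines are reported on stderr.
# ASSUMPTION: a blacklist needs only REJECT, DISCARD, or ERROR:... values.
check_blacklist() {
    awk '
        /^[ \t]*$/ || /^#/ { next }                        # blanks, comments
        NF >= 2 && $2 ~ /^(REJECT|DISCARD)$|^ERROR:/ { next }
        { printf "suspicious line %d: %s\n", NR, $0 | "cat 1>&2"; bad = 1 }
        END { exit bad + 0 }
    '
}

# Example: write to a scratch file and install it only if it passes.
#   spamlist.org/spamalama spamlist.org/the_list.html > access-sl.new &&
#       check_blacklist < access-sl.new && mv access-sl.new access-sl
```

Checking the action as well as the line shape matters because an access map can grant privileges as well as deny them.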
Automating Downloads from www.spamlist.org
The only remaining item is a script that I can run from cron
to automatically update the /etc/mail/spamlist.org/the_list.html
file on my machine and run the appropriate make command.
With the help of the wget utility (available from any FSF
archive site), this is no problem:
#!/bin/sh
MAILDIR=/etc/mail
LISTDIR=$MAILDIR/spamlist.org
URL=http://www.spamlist.org/html/the_list.html
PATH=/bin:/usr/bin:/usr/sbin:/usr/local/bin
export PATH
FILE=`basename $URL`
cd $LISTDIR
mv $FILE $FILE.bak
wget $URL
cd $MAILDIR
make
Note that I like using wget here because wget automatically
preserves timestamps on the files that it downloads. This means that
if the spamlist.org site hasn't been updated since the last time
I ran my script, then the timestamp on the the_list.html file
won't change and my make command won't do anything.
Assuming I install the above script as /etc/mail/spamlist.org/update.sh,
the cron invocation is something like:
45 23 * * 0 /etc/mail/spamlist.org/update.sh
Note that the spamlist.org site is generally updated on a weekly basis
at best, so running the script once a week as shown here is probably
sufficient.
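One weakness of the update script as written is that it moves the old the_list.html aside before the download starts, so a failed download leaves make without its input file. A more defensive variant might fetch into a scratch directory and install the new copy only when wget succeeds. The following is a sketch under those assumptions, not a tested replacement; the function name fetch_list and the scratch directory name are hypothetical.

```shell
#!/bin/sh
# fetch_list URL DIR: download URL into a scratch directory under DIR and
# replace DIR/<basename of URL> only if the download succeeded and the
# result is non-empty. The previous copy is kept as <name>.bak.
fetch_list() {
    url=$1
    dir=$2
    file=$(basename "$url")
    tmp=$dir/.fetch.$$
    mkdir "$tmp" || return 1
    if wget -q -P "$tmp" "$url" && [ -s "$tmp/$file" ]; then
        [ -f "$dir/$file" ] && cp -p "$dir/$file" "$dir/$file.bak"
        mv "$tmp/$file" "$dir/$file"    # a rename keeps the file's timestamp
        status=0
    else
        echo "fetch_list: download of $url failed; keeping old copy" >&2
        status=1
    fi
    rm -rf "$tmp"
    return $status
}

# Example, using the paths from the update script:
#   fetch_list http://www.spamlist.org/html/the_list.html /etc/mail/spamlist.org &&
#       (cd /etc/mail && make)
```

Since the file is moved into place rather than rewritten, the timestamp set by wget survives and make still skips the rebuild when the list has not changed.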
Conclusion
Although this collection of scripts and Makefiles works well for
my personal email server at Deer Run Associates, I cannot stress
enough how aggressive the spamlist.org blacklist is. Please use
caution when implementing this list at your site or you may experience
terrible consequences. Read and heed the disclaimers at http://www.spamlist.org.
You have been warned.
Hal Pomeranz (hal@deer-run.com) is a relatively young
curmudgeon who hates spam in all forms. This causes him to take
perilous and foolhardy steps to protect his inbox from pollution.