Cover V14, i13

Using New Features in SpamAssassin 3.0

Robert Haskins and Dale Nielsen

In this article, we will look at the new features that have been added to SpamAssassin (SA) 3.0 and show you how to use them. We'll start by giving an overview of the changes in SpamAssassin 3.0 and then show an example upgrade of an existing SpamAssassin 2.64 installation.

What's New in SA 3.0

SpamAssassin 3.0 includes a number of functional changes and new features. These modifications include:

  • License change
  • API changes
  • New spam filtering rules
  • Database and LDAP changes
  • Network changes
  • New plug-in framework

Each of these areas is covered briefly below.

License Change

Note that we are not lawyers, so you should speak to an attorney if you have questions regarding licensing issues. Before version 3.0, SpamAssassin was licensed under the GNU General Public License (GPL). However, with SpamAssassin moving to the Apache Software Foundation (ASF), it is now covered by the ASF's Apache License. If you are simply using the SpamAssassin software in the operations of your network, there is no need to worry. However, if you are a developer and want to incorporate SpamAssassin with a GPL-based software package, then you may have a problem because the ASF license may not be compatible with the GPL. The ASF license is, however, approved by the Open Source Initiative (OSI).

API Changes

There have been significant changes in the application programming interface (API) for SA 3.0. Unless you are a developer, the changes to the SpamAssassin API mostly affect the programs used to integrate SpamAssassin into your mail transfer agent (MTA), such as amavisd-new and MIMEdefang. To run SpamAssassin 3.0 successfully, you should be running the following versions of SpamAssassin MTA integration software at a minimum:

  • Amavisd-new: amavisd-new-20030616-p8 (2.2.1 is latest)
  • MIMEdefang: 2.42 and higher (2.49 is latest)
  • Qmail-Scanner: 1.23 and higher (1.25 is latest)

As with most open source software, "later versions are better". So if you have a choice, go with the latest stable version of the MTA integration software. Of course, the internal SpamAssassin components (such as spamc) have been updated, so if you're using procmail to integrate with your MTA, you don't have to do anything.

New Spam Filtering Rules

A number of new rules are distributed with SpamAssassin 3.0 by default. SpamAssassin 3.0 distributes a total of 937 rules, and SpamAssassin 2.6 distributes 601 rules, a net increase of 336 rules in 3.0. Because some 2.6 rules may have been deleted rather than carried forward, the number of newly added rules may differ from this net figure. In addition to the changed rules, a number of the default scores have changed, as has been the case in the past with new SA versions (even minor upgrades).

Database and LDAP Changes

The most significant change in the database and LDAP support is the ability to store user preferences in a MySQL or PostgreSQL database or an LDAP store. This is a major development for any larger site that would like to simplify its SpamAssassin setup by placing all of its user information into a database. Also, the Bayesian information (tokens, scores) can be placed into a database or directory. Benefits of doing this include:

  • Ability to store preferences, Bayesian scores, and auto-whitelist information for users who have no home directories on the server running SpamAssassin
  • Ability for the end user to easily manage SpamAssassin settings by implementing a graphical user interface to database/directory

For smaller sites (10-20 users) that don't have LDAP or a database infrastructure already in place, it might not be worthwhile to deploy a database or directory to get these benefits. However, for larger sites or any site with an existing database or directory infrastructure, it is probably worthwhile to implement these features.
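
As a minimal sketch of what the SQL-backed setup can look like, the following script writes an example configuration fragment to a scratch file. The setting names are the ones we understand SpamAssassin 3.0 to use for SQL-based preferences and Bayesian storage; the database name, host, account name, and password are placeholders invented for this example, and the scratch file stands in for your real local.cf.

```shell
#!/bin/sh
# Write an example SQL configuration fragment to a scratch file.
# The DSN, user name, and password below are placeholders (assumptions).
cf_file=$(mktemp)

cat > "$cf_file" <<'EOF'
# Per-user preferences read from a MySQL database
user_scores_dsn            DBI:mysql:spamassassin:localhost
user_scores_sql_username   sa_user
user_scores_sql_password   changeme

# Bayesian tokens and scores stored in the same database
bayes_store_module         Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn              DBI:mysql:spamassassin:localhost
bayes_sql_username         sa_user
bayes_sql_password         changeme
EOF

# Count the non-comment settings that were written.
setting_count=$(grep -c '^[a-z]' "$cf_file")
echo "wrote ${setting_count} settings to ${cf_file}"
```

Settings like these only take effect once the matching tables exist in the database and spamd has been told to consult SQL for user preferences (its --sql-config option), so treat the fragment as a starting point rather than a complete recipe.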

Network Changes

Previous versions of SpamAssassin enabled the user to identify trusted networks. SpamAssassin 3.0 refines this by introducing the idea of "internal networks": machines that are internal mail relay machines or MX relay hosts for your domains. This list is used to detect spammers who send their garbage directly to MX (or backup MX) hosts. Mail relay machines that accept mail directly from dial-up hosts or high-speed DSL/cable modem-connected clients should not be placed on the internal network lists. Instead, they should be placed only on the "trusted networks" list.
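
The distinction can be sketched with the two corresponding configuration settings, trusted_networks and internal_networks. The script below writes an example fragment to a scratch file; the addresses come from a documentation-only range and are assumptions made for this example, not values from our test system.

```shell
#!/bin/sh
# Write an example trusted/internal network fragment to a scratch file.
# All addresses are illustrative placeholders (assumptions).
cf_file=$(mktemp)

cat > "$cf_file" <<'EOF'
# Relays whose Received headers we are willing to believe
trusted_networks   192.0.2.0/24

# Only our own MX and internal relay hosts. Relays that accept mail
# directly from dial-up or DSL/cable clients stay off this list.
internal_networks  192.0.2.10 192.0.2.11
EOF

trusted_line=$(grep '^trusted_networks' "$cf_file")
internal_line=$(grep '^internal_networks' "$cf_file")
echo "$trusted_line"
echo "$internal_line"
```

In this sketch the internal hosts fall inside the trusted range, which matches the idea that internal networks are a narrower subset of the trusted ones. After adding such lines to a real configuration, running spamassassin --lint is a quick way to catch syntax mistakes.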

New Plug-in Framework

This version of SpamAssassin lets developers extend its functionality by using plug-ins. This will enable interested parties, either open source or commercial, to easily extend SpamAssassin's abilities as they wish. SpamAssassin 3.0 is distributed with the following four plug-ins:

  • Hashcash
  • RelayCountry
  • Sender Policy Framework (SPF)
  • URIDNSBL

In the Hashcash scheme, senders attach a value that takes measurable CPU time to compute, as proof of effort and an indication that they are not spammers. Including an acceptable hashcash value will lower the SpamAssassin score for the message. RelayCountry enables the SpamAssassin user to utilize a new geographic-based token, which identifies the mail servers through which the message passes on its way to the recipient. SPF implements the Sender Policy Framework checks on the sender's domain. SPF can be thought of as reverse mail exchange (MX) records that define which IP addresses are allowed to originate email for a domain. URIDNSBL gives SpamAssassin the ability to check the body of the message for spammer-related URLs and help identify spam messages by adjusting the score appropriately.
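
Plug-ins are enabled with loadplugin lines, which SpamAssassin 3.0 reads from a file such as init.pre in its configuration directory. As a rough illustration, the script below writes loadplugin lines for the four bundled plug-ins to a scratch file; which of them your site actually enables, and whether the extra Perl modules some of them depend on are installed, is something to verify locally.

```shell
#!/bin/sh
# Write example loadplugin lines to a scratch file standing in for init.pre.
pre_file=$(mktemp)

cat > "$pre_file" <<'EOF'
loadplugin Mail::SpamAssassin::Plugin::Hashcash
loadplugin Mail::SpamAssassin::Plugin::RelayCountry
loadplugin Mail::SpamAssassin::Plugin::SPF
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
EOF

plugin_count=$(grep -c '^loadplugin ' "$pre_file")
echo "${plugin_count} plug-ins listed in ${pre_file}"
```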

Upgrading from SA 2.x

The balance of this article concerns upgrading a SpamAssassin 2.x installation to 3.0.2. We used Gentoo Linux version 2004.3 (with all updates applied as of January 21, 2005) as the platform for our examples. We cannot possibly cover all the potential permutations of SpamAssassin configurations. Thus, for the purposes of the upgrade coverage in this article, we made the following assumptions:

  • Initial installation of SpamAssassin was version 2.64
  • Per-user invocation of SpamAssassin by spamc/spamd and procmail version 3.22
  • Postfix version 2.1.5
  • SA files installed in their default locations
  • sudo or root access to the machine

Other versions of software we used included:

  • Perl 5.8.5
  • MySQL 4.0.23
  • OpenLDAP 2.1.30

The steps to upgrade SpamAssassin are as follows:

1. Download and build SpamAssassin 3.0.2.

2. Shut down spamd and Postfix.

3. Synchronize the old SA 2.64 Bayesian journals.

4. Back up the old SpamAssassin 2.64 installation.

5. Install new SpamAssassin v3.0.2.

6. Upgrade the old SA 2.64 Bayesian journals to the new 3.0 format.

7. Start up spamd and Postfix and test.

Download and Build SA 3.0.2

Download the SA tar files from one of the SpamAssassin Apache Software Foundation Web site mirrors like this:

bash$ wget \
  http://apache.roweboat.net/spamassassin/source/Mail-SpamAssassin-3.0.2.tar.gz

Consult the References section for a pointer to the complete list of ASF mirrors. Next, build SA like this (output from build scripts/commands has been deleted):

bash$ tar xzvf Mail-SpamAssassin-3.0.2.tar.gz
bash$ cd Mail-SpamAssassin-3.0.2
bash$ perl Makefile.PL
bash$ make

You have built SpamAssassin 3.0.2 and can now move on to the next step.

Shut Down spamd and Postfix

Shut down spamd and Postfix by executing the following commands:

bash$ sudo /etc/init.d/postfix stop
Stopping postfix...  [ ok ]
bash$ sudo /etc/init.d/spamd stop
Stopping spamd...  [ ok ]

Synchronize Bayesian Journals

This step may not be necessary on every system, but it is easy to perform and does no harm if the users on your system are not using journaling. Some users enable journaling in their Bayesian configuration for performance reasons. With journaling, each Bayesian-related change is first appended to a journal, and the journal is later synced into the database files. If journaling is in use, this step is required to make sure that any entries still in the journals are written out before you upgrade the Bayesian database.

To accomplish this task, we have written a small shell script called syncJournal-2.64.sh. The script assumes that SA users can be identified by having a .spamassassin directory under their home directories:

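#! /bin/sh
PATH=/bin:/usr/bin

# Select every account whose home directory (field 6 of /etc/passwd)
# contains a .spamassassin directory. "test ! -d" exits nonzero when
# the directory exists, so system() is true for SpamAssassin users.
users=$(awk -F: \
  '{ if (system( "test ! -d " $6 "/.spamassassin")) print $1; }' \
  /etc/passwd)

# Run the 2.64 journal sync as each selected user.
for user in ${users} ; do
   echo "syncing journal for ${user}"
   su ${user} -c 'sa-learn --rebuild'
done

To run the script, simply invoke it like this: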

bash$ sudo ../syncJournal-2.64.sh
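
To see which accounts the script would select before running it for real, the same awk test can be tried against a throwaway passwd file. The sketch below builds two hypothetical users, alice and bob, in a scratch directory; only alice has a .spamassassin directory, so only alice is selected.

```shell
#!/bin/sh
# Exercise the user-selection logic against a scratch passwd file.
# The users "alice" and "bob" are hypothetical and exist only here.
scratch=$(mktemp -d)
mkdir -p "${scratch}/home/alice/.spamassassin" "${scratch}/home/bob"

cat > "${scratch}/passwd" <<EOF
alice:x:1000:1000::${scratch}/home/alice:/bin/sh
bob:x:1001:1001::${scratch}/home/bob:/bin/sh
EOF

# Same test as the sync script: print the login name when the
# .spamassassin directory exists under the home directory (field 6).
sa_users=$(awk -F: \
  '{ if (system( "test ! -d " $6 "/.spamassassin")) print $1; }' \
  "${scratch}/passwd")

echo "selected users: ${sa_users}"
rm -rf "${scratch}"
```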

Back Up the Old SpamAssassin 2.64 Installation

This step is not strictly required, but you should perform it in case you want to go back to your old SpamAssassin 2.64 configuration. The backup-2.64.sh script presented here takes every system file or directory (but not user files) in the SA 2.64 default installation locations and renames it in place, appending "-2.64" to the original name. The one exception is perllocal.pod, which is copied rather than moved. Please note that the Perl version is set on the second line of the script. If you are running something other than Perl 5.8.5, please adjust this line accordingly:

#! /bin/sh
perlVersion=5.8.5

mv /etc/mail/spamassassin /etc/mail/spamassassin-2.64
mv /usr/bin/sa-learn /usr/bin/sa-learn-2.64
mv /usr/bin/spamassassin /usr/bin/spamassassin-2.64
mv /usr/bin/spamc /usr/bin/spamc-2.64
mv /usr/bin/spamd /usr/bin/spamd-2.64

cp -p /usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod \
  /usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod-2.64

mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm \
  /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm-2.64
mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin \
  /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin-2.64
mv \
/usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin \
/usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin-2.64
mv /usr/share/man/man1/sa-learn.1 /usr/share/man/man1/sa-learn.1-2.64
mv /usr/share/man/man1/spamassassin.1 \
  /usr/share/man/man1/spamassassin.1-2.64
mv /usr/share/man/man1/spamc.1 /usr/share/man/man1/spamc.1-2.64
mv /usr/share/man/man1/spamd.1 /usr/share/man/man1/spamd.1-2.64

for file in /usr/share/man/man3/Mail::SpamAssassin* ; do
   mv ${file} ${file}-2.64
done

mv /usr/share/spamassassin /usr/share/spamassassin-2.64

Install SpamAssassin v3.0.2

Run make install in the Mail-SpamAssassin-3.0.2 build directory like this:

bash$ sudo make install

This will install SpamAssassin in the correct locations on your system.
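
Should the new installation need to be backed out, the files renamed by the backup step can be moved back into place. The helper below is only a sketch of that idea and is not part of our upgrade scripts: the function name restore_264 and the "-3.0.2" suffix given to the displaced new copy are our own inventions, and the demonstration runs against a scratch directory rather than the live system.

```shell
#!/bin/sh
# restore_264 PATH: put PATH-2.64 back in place of PATH.
# The new copy, if present, is kept as PATH-3.0.2 (an assumed naming choice).
restore_264() {
    target=$1
    if [ ! -e "${target}-2.64" ]; then
        echo "no 2.64 backup found for ${target}" >&2
        return 1
    fi
    # Move the 3.0.2 copy aside first so nothing is overwritten.
    if [ -e "${target}" ]; then
        mv "${target}" "${target}-3.0.2" || return 1
    fi
    mv "${target}-2.64" "${target}"
}

# Demonstration with stand-in files in a scratch directory.
scratch=$(mktemp -d)
echo "old" > "${scratch}/spamc-2.64"
echo "new" > "${scratch}/spamc"

restore_264 "${scratch}/spamc"
restored=$(cat "${scratch}/spamc")
displaced=$(cat "${scratch}/spamc-3.0.2")
echo "spamc now contains: ${restored}"
rm -rf "${scratch}"
```

On a real system the function would be called once for each path handled by backup-2.64.sh, as root, with spamd and Postfix stopped.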

Upgrade the SA 2.64 Bayesian Journals to 3.0

To keep our Bayesian history, we must import the SpamAssassin 2.64 Bayesian journals for each user to the new SA 3.0.2 installation. We have written a script called syncJournal-3.0.2.sh for this purpose. It takes every user on the system with a .spamassassin directory and converts the Bayesian journal to the new 3.0 format:

#! /bin/sh
PATH=/bin:/usr/bin

# Select every account whose home directory (field 6 of /etc/passwd)
# contains a .spamassassin directory.
users=$(awk -F: \
  '{ if (system( "test ! -d " $6 "/.spamassassin")) print $1; }' \
  /etc/passwd)

# Run the 3.0 sync as each selected user, which converts the old data.
for user in ${users} ; do
   echo "syncing journal for ${user}"
   su ${user} -c 'sa-learn --sync'
done

After the upgrade to SpamAssassin 3.0.2, place the syncJournal-3.0.2.sh script in the directory above the SA installation directory and run it like this:

bash$ sudo ../syncJournal-3.0.2.sh

Note that the script will output an error for every user on the system, similar to this:

bayes: bayes db version 2 is not able to be used, aborting! at
/usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/BayesStore/DBM.pm line 160.

These errors can safely be ignored.

Start spamd and Postfix and Test

Finally, we need to start spamd and Postfix and test the installation to make sure everything is working as expected. To do this, just run the startup scripts for spamd and Postfix like this:

bash$ sudo /etc/init.d/spamd start
Starting spamd...  [ ok ]
bash$ sudo /etc/init.d/postfix start
Starting postfix...  [ ok ]

To test, we run two test messages through SpamAssassin that are included as part of the SA 3.0.2 distribution. One is a regular test message, and the other is a message that contains the special SpamAssassin test string called GTUBE (short for Generic Test for Unsolicited Bulk Email).

Non-Spam Test

The sample non-spam message distributed by SpamAssassin is located in the top-level SpamAssassin-3.0.2 directory as "sample-nonspam.txt". This is an excellent test message as it exhibits many characteristics that SpamAssassin looks for in a message, such as multiple URLs and spaces between letters (e.g., "Q u o t e O f T h e M o m e n t"). Simply email the sample non-spam message from a GUI email client such as Mozilla Thunderbird (if you use the SpamAssassin system as your mail relay). Or, make your current directory the SpamAssassin top-level directory and execute a mail command like this:

bash$ sendmail you@yourdomain.com < ./sample-nonspam.txt

Replace you@yourdomain.com with an account name on the machine running SpamAssassin. The message should make it through to your inbox.

Spam (GTUBE) Test

The test spam message is distributed as sample-spam.txt in the top-level SpamAssassin-3.0.2 directory. As with the non-spam message, send this message to yourself from your GUI email client outside your network. Alternatively, you can send the message from the machine running SpamAssassin using the following command line:

bash$ sendmail you@yourdomain.com < ./sample-spam.txt

The message should be identified as a spam message by SpamAssassin and be disposed of accordingly.
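
If you save the delivered test messages to files, one way to confirm the verdict is to look for the X-Spam-Flag header that SpamAssassin adds to messages it judges to be spam. The helper below is a sketch that assumes the default header set is in use; the function name spam_flagged and the two stand-in messages are made up for this example, and real delivered mail will carry more headers than shown.

```shell
#!/bin/sh
# spam_flagged FILE: succeed if FILE's header block has "X-Spam-Flag: YES".
spam_flagged() {
    # Stop at the first empty line so text in the body cannot match.
    awk 'BEGIN { found = 0 }
         /^$/ { exit }
         tolower($0) ~ /^x-spam-flag:[ \t]*yes/ { found = 1; exit }
         END { exit !found }' "$1"
}

# Two stand-in messages; the second only mentions the header in its body.
scratch=$(mktemp -d)
printf '%s\n' 'Subject: test' 'X-Spam-Flag: YES' '' 'body text' \
    > "${scratch}/spam.eml"
printf '%s\n' 'Subject: test' '' 'X-Spam-Flag: YES' \
    > "${scratch}/ham.eml"

if spam_flagged "${scratch}/spam.eml"; then spam_result=flagged; else spam_result=clean; fi
if spam_flagged "${scratch}/ham.eml"; then ham_result=flagged; else ham_result=clean; fi
echo "spam.eml: ${spam_result}"
echo "ham.eml: ${ham_result}"
rm -rf "${scratch}"
```

For a check that bypasses mail delivery entirely, the sample files can also be fed straight to the spamassassin command in test mode (spamassassin -t < sample-spam.txt), which prints the message along with the report of the rules that fired.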

References

Apache Software Foundation License FAQ -- http://www.apache.org/foundation/licence-FAQ.html

OSI list of approved licenses -- http://www.opensource.org/licenses

Amavisd-new -- http://www.ijs.si/software/amavisd/

MIMEdefang -- http://www.mimedefang.org/

Procmail -- http://www.procmail.org

HashCash -- http://www.hashcash.org/

Sender Policy Framework -- http://spf.pobox.com/

SpamAssassin -- http://spamassassin.apache.org/

OpenLDAP -- http://www.openldap.org/

Apache Software Foundation list of download mirrors -- http://www.apache.org/mirrors/

Sudo home -- http://www.courtesan.com/sudo/

Postfix home -- http://www.postfix.org

Dale Nielsen is a partner in Avacoda, LLC, a consulting company specializing in systems administration and software development. He has worked as a systems administrator since receiving his degree in Computer Science from the University of Massachusetts. He has more than twenty years experience administering Unix and Linux based mail servers, firewalls, and workstations.

Robert Haskins is currently employed by Renesys Corporation, a leader in real-time Internet connectivity monitoring and reporting. After an initial stint working at a nuclear power plant, Robert has fought spam in many environments including enterprise, cable modem ISP, network equipment manufacturer, wholesale dialup ISP, competitive local exchange carrier, traditional ISP, and network management services provider. Robert writes regularly for Usenix ;login:, speaks on the topic of fighting spam, and is a member of the IEEE, Usenix, and SAGE.

Robert and Dale wrote the Addison-Wesley book Slamming Spam: A Guide for System Administrators (ISBN 0131467166). They also coauthored an anti-spam patent for ZipLink, Inc. Robert and Dale can be reached via email at: authors@slammingspam.com.