Using New Features in SpamAssassin 3.0
Robert Haskins and Dale Nielsen
In this article, we will look at the new features that have been
added to SpamAssassin (SA) 3.0 and show you how to use them. We'll
start by giving an overview of the changes in SpamAssassin 3.0 and
then show an example upgrade of an existing SpamAssassin 2.64 installation.
What's New in SA 3.0
SpamAssassin 3.0 includes a number of functional changes and new
features. These modifications include:
- License change
- API changes
- New spam filtering rules
- Database and LDAP changes
- Network changes
- New plug-in framework
Each of these areas is covered briefly below.
License Change
Note that we are not lawyers, so you should speak to an attorney
if you have questions regarding licensing issues. Before version
3.0, SpamAssassin was licensed under the GNU General Public License (GPL).
However, with SpamAssassin moving to the Apache Software Foundation, it
is now covered by the Apache Software Foundation (ASF) license. If
you are simply using the SpamAssassin software in the operations
of your network, there is no need to worry. However, if you are
a developer and want to incorporate SpamAssassin with a GPL-based
software package, then you may have a problem because the ASF license
may not be compatible with the GPL. The ASF license is, however,
approved by the Open Source Initiative (OSI).
API Changes
There have been significant changes in the application programming
interface (API) for SA 3.0. Unless you are a developer, the changes
to the SpamAssassin API mostly affect the programs used to integrate
SpamAssassin into your mail transfer agent (MTA), such as amavisd-new
and MIMEdefang. To run SpamAssassin 3.0 successfully, you should
be running the following versions of SpamAssassin MTA integration
software at a minimum:
- Amavisd-new: amavisd-new-20030616-p8 (2.2.1 is latest)
- MIMEdefang: 2.42 and higher (2.49 is latest)
- Qmail-Scanner: 1.23 and higher (1.25 is latest)
As with most open source software, "later versions are better".
So if you have a choice, go with the latest stable version of the
MTA integration software. Of course, the internal SpamAssassin components
(such as spamc) have been updated, so if you're using procmail to
integrate with your MTA, you don't have to do anything.
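As a rough illustration of checking an installed integration package against the minimums listed above, here is a minimal sketch of a dotted-version comparison. The function name version_at_least is ours, and how you obtain the installed version string depends on the package. The sketch handles purely numeric dotted versions such as 2.42, not date-style versions such as amavisd-new-20030616-p8.

```shell
#!/bin/sh
# version_at_least INSTALLED MINIMUM  (hypothetical helper, not part of SA)
# Exits 0 when INSTALLED >= MINIMUM, comparing numeric dotted versions
# field by field; missing fields are treated as 0.
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ".")
        m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}
```

For example, "version_at_least 2.49 2.42" succeeds, while "version_at_least 2.41 2.42" fails.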
New Spam Filtering Rules
A number of new rules are distributed with SpamAssassin 3.0 by
default. SpamAssassin 3.0 distributes a total of 937 rules, and
SpamAssassin 2.6 distributes 601 rules, a difference of 336 rules.
Note that this is a net figure: if some 2.6 rules were deleted in
3.0, the number of genuinely new rules is higher than 336. In
addition to the changed rules, a number of the default scores have
changed, as has been the case in the past with new SA versions (even
minor upgrades).
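If you want to compare rule counts on your own system, the following is a minimal sketch that counts rule definition lines in a directory of .cf files. It is an assumption about how one might count, not the method used for the figures above: it counts every header, body, rawbody, full, uri, and meta definition, including sub-rules, so the totals may differ from 937 and 601. The function name count_rules is ours.

```shell
#!/bin/sh
# count_rules DIR  (hypothetical helper)
# Prints the number of rule definition lines found in DIR/*.cf.
# Lines such as "score", "describe", and comments are not counted.
count_rules() {
    cat "$1"/*.cf 2>/dev/null |
        grep -cE '^[[:space:]]*(header|body|rawbody|full|uri|meta)[[:space:]]+[A-Za-z0-9_]+'
}
```

Running it against a 3.0 rules directory and a saved copy of the 2.64 rules directory gives two numbers to subtract.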
Database and LDAP Changes
The most significant change in the database and LDAP support is
the ability to store user preferences in a MySQL or Postgres database
or an LDAP store. This is a major development for any larger site
that would like to simplify their SpamAssassin setup by placing
all of their user information into a database. Also, the Bayesian
information (tokens, scores) can be placed into a database or directory.
Benefits of doing this include:
- Ability to store preferences, Bayesian scores, and auto-whitelist
information for users who have no home directories on the server
running SpamAssassin
- Ability for the end user to easily manage SpamAssassin settings
through a graphical user interface to the database or directory
For smaller sites (10-20 users) that don't have LDAP or a database
infrastructure already in place, it might not be worthwhile to deploy
a database or directory to get these benefits. However, for larger
sites or any site with an existing database or directory infrastructure,
it is probably worthwhile to implement these features.
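To give a feel for what the SQL-backed setup involves, here is a minimal sketch that prints a configuration fragment for MySQL-backed user preferences and Bayesian data. The function name and its four arguments (database name, host, user, password) are placeholders of ours. The option names are SpamAssassin configuration settings, but check the documentation for your version before relying on them. The database tables must already exist, and spamd must be started with SQL configuration enabled (the -q or -Q option).

```shell
#!/bin/sh
# print_sql_conf DBNAME DBHOST DBUSER DBPASS  (hypothetical helper)
# Prints a local.cf-style fragment; all four values are placeholders.
print_sql_conf() {
    cat <<EOF
# Per-user preferences read from SQL
user_scores_dsn            DBI:mysql:$1:$2
user_scores_sql_username   $3
user_scores_sql_password   $4
# Bayesian tokens stored in the same database
bayes_store_module         Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn              DBI:mysql:$1:$2
bayes_sql_username         $3
bayes_sql_password         $4
EOF
}
```

The output could be reviewed and then appended to the site-wide configuration file by hand.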
Network Changes
Previous versions of SpamAssassin enabled the user to identify
trusted networks. SpamAssassin 3.0 refines this by introducing the
idea of "internal networks": the machines that act as internal mail
relays or MX relay hosts for your domains. This list is used to detect
spammers who send their garbage directly to MX (or backup MX) hosts.
Mail relay machines that accept mail directly from dial-up hosts
or high-speed DSL/cable modem-connected clients should not
be placed on the internal networks list. Instead, they should be
placed only on the "trusted networks" list.
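The distinction can be sketched as a pair of configuration lines. The helper below is a minimal illustration, with a function name of our own choosing and addresses taken from the ranges reserved for documentation: MX and internal relay hosts appear on both lists, while relays that accept mail directly from dial-up or broadband clients appear only on the trusted list.

```shell
#!/bin/sh
# print_network_conf "INTERNAL_HOSTS" "TRUSTED_ONLY_HOSTS"  (hypothetical helper)
# Internal hosts are also trusted, so they are repeated on the
# trusted_networks line; trusted-only hosts never appear as internal.
print_network_conf() {
    internal=$1
    trusted_only=$2
    echo "internal_networks $internal"
    echo "trusted_networks $internal $trusted_only"
}
```

For example, print_network_conf "192.0.2.10 192.0.2.11" "198.51.100.0/24" prints an internal_networks line with the two MX hosts and a trusted_networks line with those hosts plus the client-facing relay network.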
New Plug-in Framework
This version of SpamAssassin lets developers extend its functionality
with plug-ins. This will enable interested parties, either open source
or commercial, to easily extend SpamAssassin's abilities as they wish.
SpamAssassin 3.0 is distributed with the following four plug-ins:
- Hashcash
- RelayCountry
- Sender Policy Framework (SPF)
- URIDNSBL
In the Hashcash scheme, senders spend CPU time computing a value
and include it in the message as an indication that they are not
spammers. Including an acceptable hashcash value will lower the
SpamAssassin score for the message. RelayCountry gives the SpamAssassin
user a new geography-based token that identifies the countries of the
mail servers through which the message passes on its way to the recipient.
SPF implements the Sender Policy Framework checks on the sender's
domain. SPF can be thought of as reverse mail exchange (MX) records
that define which IP addresses are allowed to originate email for
a domain. URIDNSBL gives SpamAssassin the ability to check the body
of the message for spammer-related URLs and help identify spam messages
by adjusting the score appropriately.
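Plug-ins are enabled with loadplugin lines in the configuration. As a minimal sketch, the helper below lists the plug-ins that a configuration file actually loads, skipping commented-out lines. The function name is ours, and the file you point it at (for example, an init.pre file under /etc/mail/spamassassin) is an assumption about your installation.

```shell
#!/bin/sh
# list_plugins FILE...  (hypothetical helper)
# Prints the module name from every active "loadplugin" line.
# Lines starting with "#" do not match and are therefore skipped.
list_plugins() {
    sed -n 's/^[[:space:]]*loadplugin[[:space:]]\{1,\}\([A-Za-z0-9_:]\{1,\}\).*/\1/p' "$@"
}
```

A line such as "loadplugin Mail::SpamAssassin::Plugin::SPF" would be reported as Mail::SpamAssassin::Plugin::SPF.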
Upgrading from SA 2.x
The balance of this article concerns upgrading a SpamAssassin
2.x installation to 3.0.2. We used Gentoo Linux version 2004.3 (with
all updates applied as of January 21, 2005) as the platform for
our examples. We cannot possibly cover all the potential permutations
of SpamAssassin configurations. Thus, for the purposes of the upgrade
coverage in this article, we made the following assumptions:
- Initial installation of SpamAssassin was version 2.64
- Per-user invocation of SpamAssassin by spamc/spamd and procmail
version 3.22
- Postfix version 2.1.5
- SA files installed in their default locations
- sudo or root access to the machine
Other versions of software we used included:
- Perl 5.8.5
- MySQL 4.0.23
- OpenLDAP 2.1.30
The steps to upgrade SpamAssassin are as follows:
1. Download and build SpamAssassin 3.0.2.
2. Shut down spamd and Postfix.
3. Synchronize the old SA 2.64 Bayesian journals.
4. Back up the old SpamAssassin 2.64 installation.
5. Install new SpamAssassin v3.0.2.
6. Upgrade the old SA 2.64 Bayesian journals to the new 3.0 format.
7. Start up spamd and Postfix and test.
Download and Build SA 3.0.2
Download the SA tar files from one of the SpamAssassin Apache
Software Foundation Web site mirrors like this:
bash$ wget http://apache.roweboat.net/spamassassin/source/Mail-SpamAssassin-3.0.2.tar.gz
Consult the References section for a pointer to the complete list
of ASF mirrors. Next, build SA like this (output from build scripts/commands
has been deleted):
bash$ tar xzvf Mail-SpamAssassin-3.0.2.tar.gz
bash$ cd Mail-SpamAssassin-3.0.2
bash$ perl Makefile.PL
bash$ make
You have built SpamAssassin-3.0.2 and now can move on to the next
step.
Shut Down spamd and Postfix
Shut down spamd and Postfix by executing the following commands:
bash$ sudo /etc/init.d/postfix stop
Stopping postfix... [ ok ]
bash$ sudo /etc/init.d/spamd stop
Stopping spamd... [ ok ]
Synchronize Bayesian Journals
This step may or may not be necessary, but it is easy to perform
and doesn't harm anything if the users on your system are not
using journaling. Some users choose to use journaling with the Bayesian
configuration for performance purposes. With journaling, each
Bayesian-related change is simply appended to a journal, and the journal
is later synced into the db files. If journaling is in use, then this
step is required to make sure that everything in the journals gets
written out before you upgrade the Bayesian DB.
To accomplish this task, we have written a small shell script
called syncJournal-2.64.sh. The script assumes that SA users can be
identified by having a .spamassassin directory under their home
directories:
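#! /bin/sh
PATH=/bin:/usr/bin
# Collect every user whose home directory contains .spamassassin.
# "test ! -d" exits non-zero when the directory exists, and awk's
# system() passes that non-zero status back as a true value.
users=`awk -F: \
'{ if (system( "test ! -d " $6 "/.spamassassin")) print $1; }' \
/etc/passwd`
for user in ${users} ; do
  echo "syncing journal for ${user}"
  # Run as the user so that the user's own Bayes files are synced
  su ${user} -c 'sa-learn --rebuild'
done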
To run the script, simply invoke it like this:
bash$ sudo ../syncJournal-2.64.sh
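Before running the script as root, it can be useful to see which accounts it will touch. The following is a minimal sketch of the same selection logic written as a shell function. The name sa_users is ours. It reads the passwd file directly instead of starting a test process per line, and it accepts an alternate passwd-format file so it can be tried on a copy.

```shell
#!/bin/sh
# sa_users [PASSWD_FILE]  (hypothetical helper)
# Prints the login name of every account whose home directory
# contains a .spamassassin directory. Defaults to /etc/passwd.
sa_users() {
    while IFS=: read -r name _pw _uid _gid _gecos home _shell; do
        if [ -n "$home" ] && [ -d "$home/.spamassassin" ]; then
            echo "$name"
        fi
    done < "${1:-/etc/passwd}"
}
```

Calling sa_users with no arguments prints one login name per line, which should match the users named in the script's "syncing journal for" messages.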
Back Up the Old SpamAssassin 2.64 Installation
This step is not strictly required, but we recommend it in case
you want to go back to your old SpamAssassin 2.64 configuration.
The backup-2.64.sh script presented here takes every system file
or directory (but not user files) in the SA 2.64 default installation
locations and renames it in place, appending "-2.64" to its name.
The one exception is perllocal.pod, which is copied rather than
renamed. Please note that the Perl version is set on the second line
of the script. If you are running something other than Perl 5.8.5,
please adjust this line accordingly:
#! /bin/sh
perlVersion=5.8.5
mv /etc/mail/spamassassin /etc/mail/spamassassin-2.64
mv /usr/bin/sa-learn /usr/bin/sa-learn-2.64
mv /usr/bin/spamassassin /usr/bin/spamassassin-2.64
mv /usr/bin/spamc /usr/bin/spamc-2.64
mv /usr/bin/spamd /usr/bin/spamd-2.64
cp -p /usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod \
/usr/lib/perl5/${perlVersion}/i686-linux/perllocal.pod-2.64
mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm \
/usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin.pm-2.64
mv /usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin \
/usr/lib/perl5/site_perl/${perlVersion}/Mail/SpamAssassin-2.64
mv \
/usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin \
/usr/lib/perl5/site_perl/${perlVersion}/i686-linux/auto/Mail/SpamAssassin-2.64
mv /usr/share/man/man1/sa-learn.1 /usr/share/man/man1/sa-learn.1-2.64
mv /usr/share/man/man1/spamassassin.1 \
/usr/share/man/man1/spamassassin.1-2.64
mv /usr/share/man/man1/spamc.1 /usr/share/man/man1/spamc.1-2.64
mv /usr/share/man/man1/spamd.1 /usr/share/man/man1/spamd.1-2.64
for file in /usr/share/man/man3/Mail::SpamAssassin* ; do
mv ${file} ${file}-2.64
done
mv /usr/share/spamassassin /usr/share/spamassassin-2.64
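Going back to 2.64 amounts to reversing these renames. The following is a minimal sketch of a helper that restores a single backed-up path. The function name and the "-3.0.2" suffix it gives to any newer file it finds in the way are our own choices, not part of the scripts above.

```shell
#!/bin/sh
# restore_backup PATH-2.64  (hypothetical helper)
# Renames PATH-2.64 back to PATH. If a newer PATH already exists,
# it is kept aside as PATH-3.0.2 instead of being overwritten.
restore_backup() {
    backup=$1
    original=${backup%-2.64}
    if [ "$original" = "$backup" ]; then
        echo "not a -2.64 backup: $backup" >&2
        return 1
    fi
    if [ ! -e "$backup" ]; then
        echo "no such backup: $backup" >&2
        return 1
    fi
    if [ -e "$original" ]; then
        mv "$original" "$original-3.0.2" || return 1
    fi
    mv "$backup" "$original"
}
```

For example, "restore_backup /usr/bin/spamc-2.64" would put the old spamc back in place. Stop spamd and Postfix before restoring anything.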
Install SpamAssassin v3.0.2
Just run a make install in the Mail-SpamAssassin-3.0.2 build
directory like this:
bash$ sudo make install
This will install SpamAssassin in the correct locations on your system.
Upgrade the SA 2.64 Bayesian Journals to 3.0
To keep our Bayesian history, we must convert each user's SpamAssassin
2.64 Bayesian data to the format used by the new SA 3.0.2 installation.
We have written the following script, called syncJournal-3.0.2.sh,
for this purpose. It takes every user in the system with
a .spamassassin directory and converts the Bayesian journal to the
new 3.0 format:
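#! /bin/sh
PATH=/bin:/usr/bin
# Collect every user whose home directory contains .spamassassin
# (same selection logic as syncJournal-2.64.sh).
users=`awk -F: \
'{ if (system( "test ! -d " $6 "/.spamassassin")) print $1; }' \
/etc/passwd`
for user in ${users} ; do
  echo "syncing journal for ${user}"
  # Run as the user so that the user's own Bayes files are converted
  su ${user} -c 'sa-learn --sync'
done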
After the upgrade to SpamAssassin 3.0.2, place the syncJournal-3.0.2.sh
script in the directory above the SA installation directory and run
it like this:
bash$ sudo ../syncJournal-3.0.2.sh
Note that the script will output an error for every user on the system,
similar to this:
bayes: bayes db version 2 is not able to be used, aborting! at
/usr/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/BayesStore/
DBM.pm line 160.
These errors can safely be ignored.
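To confirm that a user's database was actually converted, one option is to inspect the output of "sa-learn --dump magic", which includes a line describing the Bayes database version. The helper below is a minimal sketch that extracts that number. Its name is ours, and it assumes the version is the third whitespace-separated field on the line containing "bayes db version", so check the output on your own system first.

```shell
#!/bin/sh
# bayes_db_version  (hypothetical helper)
# Reads "sa-learn --dump magic" output on stdin and prints the
# third field of the first line mentioning "bayes db version".
bayes_db_version() {
    awk '/bayes db version/ { print $3; exit }'
}
```

It could be used as "su someuser -c 'sa-learn --dump magic' | bayes_db_version", where someuser is a placeholder. After a successful upgrade, the value should no longer be 2.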
Start spamd and Postfix and Test
Finally, we need to start spamd and Postfix and test the installation
to make sure everything is working as expected. To do this, just
run the startup scripts for spamd and Postfix like this:
bash$ sudo /etc/init.d/spamd start
Starting spamd... [ ok ]
bash$ sudo /etc/init.d/postfix start
Starting postfix... [ ok ]
To test, we run two test messages through SpamAssassin that are included
as part of the SA 3.0.2 distribution. One is a regular test message,
and the other is a message that contains the special SpamAssassin
test string called GTUBE (short for Generic Test for Unsolicited Bulk
Email).
Non-Spam Test
The sample non-spam message distributed by SpamAssassin is located
in the top-level SpamAssassin-3.0.2 directory as "sample-nonspam.txt".
This is an excellent test message as it exhibits many characteristics
that SpamAssassin looks for in a message, such as multiple URLs
and spaces between letters (e.g., "Q u o t e O f T h e M o m e n t").
Simply email the sample non-spam message from your GUI email client,
such as Mozilla Thunderbird (if you use the SpamAssassin system as your
mail relay). Or, make your current directory the SpamAssassin top-level
directory and execute a mail command like this:
bash$ sendmail you@yourdomain.com < ./sample-nonspam.txt
Replace you@yourdomain.com with the address of an account on the machine
running SpamAssassin. The message should make it through to your inbox.
Spam (GTUBE) Test
The test spam message is distributed as sample-spam.txt in the
top-level SpamAssassin-3.0.2 directory. As with the non-spam message,
send this message to yourself from your GUI email client outside
your network. Alternatively, you can send the message from the machine
running SpamAssassin using the following command line:
bash$ sendmail you@yourdomain.com < ./sample-spam.txt
The message should be identified as a spam message by SpamAssassin
and be disposed of accordingly.
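Both sample messages can also be checked without involving the MTA by piping them through spamc and looking at the X-Spam-Status header of the result. The helper below is a minimal sketch of that check. The function name is ours, and it assumes your configuration adds an X-Spam-Status header beginning with "Yes" or "No".

```shell
#!/bin/sh
# spam_verdict  (hypothetical helper)
# Reads a filtered message on stdin and prints "Yes" or "No" from
# its X-Spam-Status header, or "unknown" if the header is absent.
spam_verdict() {
    awk '
        /^$/ { exit }                 # blank line: end of the headers
        /^X-Spam-Status:/ {
            verdict = $2
            sub(/,$/, "", verdict)    # drop the comma after Yes/No
            print verdict
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}
```

From the top-level SpamAssassin directory, "spamc < ./sample-spam.txt | spam_verdict" should print Yes, and the same command with sample-nonspam.txt should print No.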
References
Apache Software Foundation License FAQ -- http://www.apache.org/foundation/licence-FAQ.html
OSI list of approved licenses -- http://www.opensource.org/licenses
Amavisd-new -- http://www.ijs.si/software/amavisd/
MIMEdefang -- http://www.mimedefang.org/
Procmail -- http://www.procmail.org
HashCash -- http://www.hashcash.org/
Sender Policy Framework -- http://spf.pobox.com/
SpamAssassin -- http://spamassassin.apache.org/
OpenLDAP -- http://www.openldap.org/
Apache Software Foundation list of download mirrors -- http://www.apache.org/mirrors/
Sudo home -- http://www.courtesan.com/sudo/
Postfix home -- http://www.postfix.org
Dale Nielsen is a partner in Avacoda, LLC, a consulting company
specializing in systems administration and software development.
He has worked as a systems administrator since receiving his degree
in Computer Science from the University of Massachusetts. He has
more than twenty years of experience administering Unix- and Linux-based
mail servers, firewalls, and workstations.
Robert Haskins is currently employed by Renesys Corporation,
a leader in real-time Internet connectivity monitoring and reporting.
After an initial stint working at a nuclear power plant, Robert
has fought spam in many environments including enterprise, cable
modem ISP, network equipment manufacturer, wholesale dialup ISP,
competitive local exchange carrier, traditional ISP, and network
management services provider. Robert writes regularly for Usenix
;login:, speaks on the topic of fighting spam, and is a member
of the IEEE, Usenix, and SAGE.
Robert and Dale wrote the Addison-Wesley book Slamming
Spam: A Guide for System Administrators (ISBN 0131467166). They
also coauthored an anti-spam patent for ZipLink, Inc. Robert and
Dale can be reached via email at: authors@slammingspam.com.