High Performance Content-Filtering Mail System

Thomas H. Jones II

Email-borne spam, viruses, worms, and other nastiness make daily life a hassle for users and a hazard for unprotected systems. A number of commercial solutions exist to help manage these problems. However, their cost, speed penalties, and the effort required to install and administer them can leave security-conscious admins looking for a better answer.

Fortunately, open source comes to the rescue. With a very basic system running a free operating system -- such as Linux or the BSDs -- it is possible to create a fast, powerful, and very low-cost email content filtering system. Such a system can be set up either as an ultimate email delivery destination or as an inline network device for the main Internet mail host. In this article, I will describe how to create a content-filtering email host and how to test its functionality.

Requirements

This article is based on the software packages listed in Table 1. Each of these packages (with the exception of the C compiler) will need to be downloaded, compiled, installed, and configured to create the desired integrated mail system. At the time of writing, the following software versions were used:

Postfix 2.0.18 (Feb 5, 2004 build)
Postfix TLS patch for Postfix 2.0.18 (Feb 5, 2004 build)
Cyrus SASL 2.1.17
OpenSSL 0.9.7c
Berkeley DB 4.2.52
Clam A/V 0.67-1
SpamAssassin 2.63
AMaViSd-New 6/16/03 build patch-level 7
GCC 3.3.2 C Compiler
GNU Patch 2.5.4
GNU Make 3.80
Perl 5.8.3

A few of the packages require additional Perl modules to be downloaded and installed from CPAN. Each package comes with a manifest that details its Perl module dependencies.

The GNU make requirement is noted primarily for commercial Unix systems. Some systems' developer packages include a make program that will fail on some of the above packages. In those instances, GNU's make may be the only functional alternative.

Note that a source of entropy (randomness) needs to be available on the mail system. Ideally, this would be in the form of a random device driver such as "/dev/random" or "/dev/urandom". Alternatively, entropy may be provided by an entropy service such as EGD (Entropy Gathering Daemon) or PRNGD (Pseudo Random Number Generator Daemon).
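
A quick check along the following lines can confirm that an entropy source is present before any building starts. This is only a sketch: the function name first_entropy_source is made up for illustration, and the EGD/PRNGD socket path "/var/run/egd-pool" is an assumed location that varies from one installation to another.

```sh
# Print the first candidate that exists as a character device
# (e.g. /dev/urandom) or as a socket (e.g. an EGD/PRNGD pool).
# Returns non-zero if none of the candidates is usable.
first_entropy_source() {
    for candidate in "$@"; do
        if [ -c "$candidate" ] || [ -S "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call; the socket path is an assumption, adjust it for your system.
first_entropy_source /dev/urandom /dev/random /var/run/egd-pool ||
    echo "no entropy source found" >&2
```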

I chose each of these software packages for one overriding reason -- cost. When I originally undertook this project, it was as a personal/professional enrichment exercise. Money was not available for things like commercial anti-virus (A/V) or content-filtering software, mail transport agents (MTAs), or compilers. Therefore, open source software was the only real choice. The software packages represent a combination of free, highly capable, and fairly security-conscious software choices.

Create Required userids and groupids

The various components of this mail system run as different, non-privileged (i.e., not root) users and groups. This is part of the design that helps make this mail system more secure. This user and group creation should be done first, as the configure and install scripts for some of the applications will fail if required users are missing.

The Postfix software requires the creation of its own user under which to run. This userid should be created with the user name and group name "postfix". Postfix also requires a secondary group under which to run certain setgid processes. This group should be named "postdrop".

The anti-spam and anti-virus software all run under the control of the same parent process. Thus, although each might normally be built with its own user and group assignments as specified in each application's README or INSTALL files, it is necessary to give them a common user and group assignment. I recommend creating a user "amavis" and assigning it to the group "amavis".
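
The account setup described above might be scripted roughly as follows. Treat it as a sketch rather than a recipe: the home directories and the "/bin/false" shell are assumptions, the groupadd and useradd options shown are the common Linux/Solaris ones, and the script only prints the commands unless DRY_RUN is set to 0.

```sh
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# and run as root to actually create the accounts.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_mail_accounts() {
    # Postfix: its own user and group, plus the "postdrop" setgid group.
    run groupadd postfix
    run groupadd postdrop
    run useradd -g postfix -d /nonexistent -s /bin/false postfix

    # One shared account for the anti-spam and anti-virus components.
    # The home directory is an assumption.
    run groupadd amavis
    run useradd -g amavis -d /var/amavis -s /bin/false amavis
}

create_mail_accounts
```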

Create the Build Environment

This process involves making your system capable of creating the mail-related programs from their source code distributions. You'll need a C compiler and a Perl interpreter. Because it is also helpful to have your Berkeley DB libraries available when the Perl interpreter is built, building the Berkeley DB libraries is covered in this section.

To begin, install a C compiler:

If the system in question is a Linux host built with "developer support" packages installed, the necessary C compiler (a GCC variant), basic support libraries, and headers will already be installed. If not already installed, most Linux distributions have some form of package management tool that can be used to fetch and install the needed files.
If the system is a commercial Unix variant (e.g., Solaris), either a vendor-supplied compiler or a free compiler can be used. Given previous notes about the genesis of this project, a free compiler like GCC is assumed. Typically, there are sites geared towards commercial Unix variants that allow you to download a binary distribution. Find the appropriate site (e.g., for Solaris, SunFreeware.Com), download the binary distribution package, and install it on your system.

The next task is to add Berkeley-style DBM support to the system's build environment. Sleepycat Software makes a Berkeley DBM package. Before building this package, it is a good idea to set up the build environment to hardcode library search paths into the compiled objects (binaries, dynamic libraries, etc.). For the combination of Solaris and GCC 3.x, the LD_RUN_PATH environment variable may be used to hardcode the run-time linker search paths into the binaries. (Other operating systems might honor a different setting, such as DT_RUN_PATH -- check the compiler and linker man pages for your operating environment.) The Berkeley DBM package installs into "/usr/local/BerkeleyDB.4.2" by default. For systems like Solaris, there may also be some file dependencies in "/usr/local/lib". Therefore, it is a good idea to set the LD_RUN_PATH environment variable to "/usr/local/lib:/usr/local/BerkeleyDB.4.2/lib".

Once this environment variable is set, follow the build directions and installation instructions outlined in the READMEs and INSTALL files. Typically, when building Sleepycat's Berkeley DBM package, the configure options --enable-cryptography --enable-hash --enable-queue --enable-replication --enable-verify --enable-cxx --enable-java --enable-rpc --enable-shared --enable-static produce a good Berkeley DBM package with the features required by the programs in this project, as well as those required by some programs not directly related to this project.
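
Put together, the environment setting and the configure options above could be captured in a small script like the one below. It is a sketch that only prints what would be run. The helper name show_bdb_build is invented for this example; the "build_unix" directory and "../dist/configure" path follow the usual layout of the Berkeley DB 4.x source tree.

```sh
# Hardcode the run-time library search path into everything built later.
LD_RUN_PATH="/usr/local/lib:/usr/local/BerkeleyDB.4.2/lib"
export LD_RUN_PATH

# Configure options as listed in the text.
BDB_OPTS="--enable-cryptography --enable-hash --enable-queue \
--enable-replication --enable-verify --enable-cxx --enable-java \
--enable-rpc --enable-shared --enable-static"

# Print the environment and the configure command that would be run
# from the build_unix directory of the unpacked source tree.
show_bdb_build() {
    echo "LD_RUN_PATH=$LD_RUN_PATH"
    echo "../dist/configure $BDB_OPTS"
}

show_bdb_build
# Actual build, from the top of the source tree:
#   cd build_unix && ../dist/configure $BDB_OPTS && make && make install
```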

Building and Installing Perl

Some of the Perl modules required to build the content filters for this project will need a very up-to-date version of the Perl package. While most Unix operating systems now come with a Perl package, the installed version may not be current and may not have some features recommended for this project. Therefore, installation of a new version of Perl is recommended, and will be required in most cases. The Perl source code comes with a configure script to set up the Perl build. Unless other specific deviations are required, use the defaults with the following two exceptions: answer "yes" to building a threading Perl and to building libperl. It might also be a good idea to use the PerlIO option rather than the STDIO default option. This will create the necessary makefiles and, with luck, result in a good Perl package. With the presence of the previously compiled Berkeley DBM package, the appropriate Perl DBM modules should automatically get included in the Perl build. Follow the included READMEs and INSTALL files for final installation instructions.

Install the Encryption/SSL Routines

Several of the remaining software packages rely on routines found in the OpenSSL package to provide session encryption services. Therefore, OpenSSL should be built after installation of the C compiler and Perl interpreter.

As with most open source software packages, OpenSSL includes a configure script to make code portability easier. Good choices for build options include shared zlib-dynamic no-krb5 --prefix=/usr/local --openssldir=/usr/local/openssl <OSARCH> (where OSARCH equates to something like "solaris-sparcv9-gcc" or "linux-pentium"). This will result in:

Building of shared libraries (recommended for use by later software builds in this project).
Linking against the zlib compression/decompression libraries. Note that if these are not already installed or are installed only as static libraries, they should be installed as dynamic libraries before proceeding further. Otherwise, remove the zlib-dynamic option.
Disabling Kerberos v5 support.
Installing all libraries, binaries, and man pages in "/usr/local".
Setting "/usr/local/openssl" as the default certificate root.

By default, OpenSSL will root itself in "/usr/local/ssl". Many software packages that work with OpenSSL expect this location and will require overrides if the default is changed. However, using the "/usr/local" root obviates the need for adding ":/usr/local/ssl/lib" into the LD_RUN_PATH or equivalent variable. Keeping the path that the run-time linker searches short will result in marginally quicker initial application startups. It also means that, if a process must be traced, there are fewer NOENT entries in the trace output.

OpenSSL can take a fair amount of time to compile. There are a number of modules to compile and some are computationally intensive to create. It also comes with a hideous number of manual pages, which makes the package installation take longer than one might expect.

Install Authentication Libraries

The Cyrus SASL packages are responsible for providing authentication services to other applications. These services are provided variously through linked authentication libraries and optional authentication daemons. Various plug-ins can be created to provide authentication mechanisms, such as plain and login, CRAM and Digest MD5, One-Time Password (OTP), NTLM, and others. It can even provide access to other backend user credential stores such as LDAP or a MySQL database (useful for distributed management of large user spaces that don't require Unix shell accounts). These authentication routines can be used by Postfix to allow third-party relaying only from authenticated users.

Most modern email clients support SMTP authentication. Common authentication methods used by email clients include login, plain, CRAM-MD5, and Digest-MD5. Therefore, it is important that the Cyrus SASL package be compiled to support these login mechanisms. To ensure that these four modules get built, specify --enable-cram --enable-digest --enable-plain --enable-login in your list of options passed to the configure script.

Cyrus SASL typically also makes use of Berkeley DBM files. However, unless the configure script is instructed where to find the Berkeley DBM install location, DBM support will be absent. To tell the configure script where to find Berkeley DBM, specify --with-bdb-libdir=/usr/local/BerkeleyDB.4.2/lib --with-bdb-incdir=/usr/local/BerkeleyDB.4.2/include.

Miscellaneous options found to be useful:

--enable-sample -- Build sample client and server; this can be useful for diagnostic tasks.
--enable-static -- Build static libraries.
--enable-shared --enable-staticdlopen -- Include shared library support and define usage methods.
--enable-java --with-javabase=/usr/java/include -- Include Java hooks.
--with-devrandom=/dev/random -- Where SASL should look for entropy sources.
--with-pam -- Recommended, if your system supports Pluggable Authentication Modules.
--with-saslauthd=/usr/local/var/saslauth -- Where the Cyrus SASL Authentication daemon will store state information.
--with-openssl --with-des --with-rc4 -- To enable OpenSSL support and use OpenSSL's DES and RC4 cryptographic routines.

These options are specified more as a matter of habit than of true necessity. Most of them either have default values or are automatically found on their own.

To make the whole configuration easier (and more easily repeatable in case of configuration errors), it may be best to simply dump all of the chosen options to a file, one option per line. This file can then be fed to the configure script by issuing ./configure `cat OptionsFile` (note the backquotes, which substitute the file's contents into the command line) in the top-level source directory. Once the configuration successfully completes, simply follow the installation and configuration instructions in the READMEs and INSTALL files.
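
As an illustration of the options-file approach, the sketch below writes a handful of the options discussed above to a file and shows how command substitution turns them back into separate arguments. The helper name write_sasl_options and the temporary file location are assumptions made for the example, and nothing is actually configured.

```sh
# Write the chosen options to a file, one option per line.
write_sasl_options() {
    cat > "$1" <<'EOF'
--enable-cram
--enable-digest
--enable-plain
--enable-login
--with-bdb-libdir=/usr/local/BerkeleyDB.4.2/lib
--with-bdb-incdir=/usr/local/BerkeleyDB.4.2/include
EOF
}

OPTIONS_FILE="${TMPDIR:-/tmp}/OptionsFile.$$"
write_sasl_options "$OPTIONS_FILE"

# Unquoted command substitution splits the file on whitespace, so each
# line becomes one argument. In the source directory this would be:
#   ./configure `cat OptionsFile`
# Here the arguments are only counted.
set -- `cat "$OPTIONS_FILE"`
echo "configure would receive $# options"
rm -f "$OPTIONS_FILE"
```

Because the options live in a file, a failed configure run can be repeated after editing a single line rather than retyping the whole command.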

Install TLS-Enabled Postfix

The stock Postfix software does not yet natively support TLS. To add this support to Postfix, a patch must be applied to the source code. This patch comes in the form of a unified diff.

Warning: The patch author writes the patch against specific versions and builds of Postfix. The patch is also written against a specific version of OpenSSL. Attempting to use this patch against different versions/builds of Postfix and the wrong version of OpenSSL may result in broken code.

Also, note that if the software is being built on a system like Solaris, the vendor-supplied patch program will not work with this diff file. Please download, build, and install GNU patch from the link specified in the requirements section.

Actual application of the patch is fairly trivial. Unroll the Postfix source code distribution and the patch distribution into the same parent directory (e.g., "/usr/local/src"). From this common directory, issue the command patch -p0 < PatchDir/pfixtls.diff (where PatchDir is the directory created by unrolling the patch tarball). A screenful (or so) of "patching file" messages will scroll by. At this point, Postfix is ready to be configured for build, including the TLS extensions.

Building Postfix with TLS support can be done by following the READMEs and INSTALL documents. Minimal options must be set at configuration time, as the install scripts will ask where to put the various components. There is no "configure" script. Build options get passed directly to make. Specifying extra options should be done while running the make makefiles command. This should be done as follows:

make makefiles CCARGS='-DUSE_SASL_AUTH \
-I/usr/local/include/sasl -DUSE_SSL \
-I/usr/local/ssl/include' \
AUXLIBS="-L/usr/local/lib -lsasl2 -lssl -lcrypto"

The installation documents provide other options, but those are only for providing default answers to the install scripts. Once the creation of the makefiles is done, simply run make, then make install. You will be prompted for where to install the various components. Choose something that makes sense for your system [1].

Install SpamAssassin

SpamAssassin is a collection of Perl modules used for scanning, categorizing, and tagging emails. This package can be obtained as a source code distribution or via Perl's CPAN feature. I prefer the CPAN method and will outline the steps necessary to create a working SpamAssassin configuration via the CPAN tools. The following should get you a nicely functional SpamAssassin package:

# perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.76)
ReadLine support enabled

cpan> install Pod::Usage
cpan> install HTML::Parser
cpan> install Sys::Syslog
cpan> install Net::DNS
cpan> install Mail::Audit
cpan> install Mail::Internet
cpan> install Net::SMTP
cpan> install Digest::SHA1
cpan> install Net::Ident
cpan> install IO::Socket::SSL

cpan> install ExtUtils::MakeMaker
cpan> install File::Spec
cpan> install BerkeleyDB
cpan> install DB_File
cpan> install Mail::SpamAssassin

This will also take care of installing SpamAssassin and putting a default configuration file into place. SpamAssassin writes its default, system-wide configuration file to /etc/mail/spamassassin/local.cf.

Note that, sometimes, CPAN modules may not build correctly within the CPAN construct. In those instances, you must build the source code by hand. If a CPAN module fails to compile, try the following: exit CPAN; cd $HOME/.cpan/build; cd into the module's source directory; issue the commands make distclean, perl Makefile.PL, make, make test, then make install. If necessary, restart CPAN. If this still results in failure (other than due to missing dependencies), retry your efforts using the GNU version of make.

If you prefer to manually compile the above modules, CPAN can still be used to grab the module list. Simply download the modules by issuing perl -MCPAN -e "get Module::Name" (where Module::Name is the name of one of the above-listed modules). Once all of the above have been downloaded, cd to the $HOME/.cpan/build directory and compile each set of modules. Note that this method will not automatically account for dependencies. Only building within the CPAN environment offers that functionality.
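
Whichever route is taken, it can be handy to confirm afterwards that the modules really load. The following sketch does that by asking perl to load each module in turn. The function name missing_perl_modules is invented for this example, and the module list is a subset of the one given above.

```sh
# Print each named Perl module that the perl found in PATH cannot load.
# Prints nothing if every module loads cleanly.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

missing_perl_modules Pod::Usage HTML::Parser Sys::Syslog Net::DNS \
    Mail::Internet Net::SMTP Digest::SHA1 IO::Socket::SSL \
    BerkeleyDB DB_File Mail::SpamAssassin
```

Any module name printed still needs to be installed, either within CPAN or by hand.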

Install Anti-Virus Software

Clam A/V was chosen because it's a free, simple-to-use open source solution. Its installation is fairly trivial and requires little in the way of configuration and maintenance. Pre-compile configuration is as simple as setting:

--enable-shared --enable-static --enable-id-check \
--enable-bigstack --with-user=USER --with-group=GROUP \
--with-dbdir=/path/to/definitions/dir

The first two options are fairly self-explanatory. If you are using user namespace management other than /etc/passwd (e.g., LDAP) and your Clam A/V user exists only in the alternate namespace, you must enable the --enable-id-check option. Enabling "bigstack" support will increase the amount of memory consumed by the application and prevent larger messages from exhausting pre-allocated memory space.

Because Clam A/V will be running in concert with the AMaViSd-new process, you must set Clam A/V to function with the same user and group IDs that AMaViSd-new uses. The --with-user=USER --with-group=GROUP options do this. Typically, this is the "amavis" user and group created earlier (or another similar user and group shared by both daemons). The --with-dbdir= option tells the Clam A/V daemon where to look for its virus definitions and tells the freshclam definition update process where to write new virus definitions. Once the configuration script completes, simply use make to build and make install to finish the installation of the Clam A/V software.

Install AMaViSd-new Software

AMaViSd is a master filter process. It listens as a daemon/service awaiting inbound traffic. It then takes that traffic and passes it off to the other filters for which you have configured it to act as a front-end. By default, if you already have SpamAssassin and Clam A/V installed, it will pass off to them.

Like SpamAssassin, it is written in Perl. It also has a list of modules on which it depends. As with SpamAssassin, you will need to install these via CPAN:

cpan> install Archive::Tar
cpan> install Archive::Zip
cpan> install Compress::Zlib
cpan> install Convert::TNEF
cpan> install Convert::UUlib
cpan> install MIME::Base64
cpan> install Mail::Internet
cpan> install Net::Server
cpan> install Net::SMTP
cpan> install Digest::MD5
cpan> install IO::Stringy
cpan> install Time::HiRes
cpan> install Unix::Syslog

Several of these should already be installed and up to date -- especially since some were previously required by the SpamAssassin install.

On some Unix systems, the Unix::Syslog module can sometimes fail to build correctly from within the CPAN context. If the compile fails, exit the CPAN environment and cd to ${HOME}/.cpan/build/Unix-Syslog-X.XXX. Clean up the environment from the broken CPAN build by issuing make distclean. Recreate the make files by issuing perl Makefile.PL. Once Perl creates the new makefiles, you should be able to issue a make followed by a make install and have the Unix::Syslog module correctly built and installed. If this bombs, try GNU make, instead.

The MIME::Parser module is also required; however, the CPAN version lacks a patch. You can get the correct MIME::Parser module from http://search.cpan.org/dist/MIME-tools. Download the latest 6.2xxx version and build it similarly to Unix::Syslog.

Once these modules are installed, AMaViSd should work (when activated). Activation is a manual process that is detailed in the INSTALL file. Be sure that you run AMaViSd under the same userid and groupid under which you installed Clam A/V. Otherwise, all sorts of permissions/ownership errors will crop up when attempting to run the software.

Configure Clam A/V

The Clam anti-virus software is configured via the file /usr/local/etc/clamav.conf. This file is fairly well commented. Pick configuration values that make sense for your environment. Listing 1 shows an example configuration. There are a number of other options that can be specified and these values are just for example purposes. The options are laid out in the default configuration file as well as the Clam A/V man pages.

Like any virus scanner, Clam A/V is only as good as its virus definitions are current. Clam A/V comes with an application called freshclam, which can either be periodically run from cron or set up as a daemon that polls for virus definition updates from the Clam A/V project's definition server. To set it up to run as a daemon that polls hourly for new definitions, start the freshclam process as "freshclam -d -c 24" (-d runs it as a daemon; -c 24 requests 24 checks per day). The freshclam process will also require proper configuration. The freshclam configuration file is /usr/local/etc/freshclam.conf. If this file does not exist, create it by copying the example configuration file included in the source build directory. Modify it to suit your environment.

Configure AMaViSd

The AMaViSd software is configured via the file /usr/local/etc/amavisd.conf. This file is fairly well commented. Pick configuration values that make sense for your environment. Only a handful of changes are needed to make the AMaViSd software work for your site. Listing 2 shows an example configuration.

This example assumes that you are not going to run the AMaViSd software as a centralized virus scanner for a number of networked hosts. If you want AMaViSd to support multiple hosts, take special care to modify the various "acl" directives. Also, AMaViSd will, by default, scan for a number of packages at startup. If this behavior is not desired, simply comment out the scans from the configuration block. Of course, looking over all of these various scanner options can give you ideas for other AMaViSd plug-ins with which to experiment. Finally, if you will be doing extensive whitelisting or blacklisting, it is recommended that you configure AMaViSd to use external hash files. This is done by way of the read_hash() directive:

read_hash(%whitelist_sender, '/path/to/whitelist.txt');
read_hash(%blacklist_sender, '/path/to/blacklist.txt');

When the files are updated, the AMaViSd process will need to be bounced to cause the changes to be read in.

Configure Postfix

Configuring Postfix is done in two main parts -- the baseline configuration, and the anti-UCE controls. There is also one optional part, the AUTH/TLS components. The baseline configuration ensures a minimally functional Postfix configuration. The anti-UCE controls help make the Postfix MTA a bit more resistant to being used to deliver, send, or relay spam and other garbage. The optional AUTH/TLS components allow Postfix to provide SMTP-authentication functions to SMTP clients and to do automated, point-to-point encryption of message contents between TLS-enabled SMTP servers. All of these are configured within the Postfix configuration file, main.cf.

A baseline configuration will include such things as how the SMTP server identifies itself, whom it trusts, where it finds helper programs, etc. Listing 3 shows an example of such a configuration. Note that the setgid_group value in this file should be set to something different from the Postfix user's main group assignment.

Anti-UCE controls can be as simple as strictly enforcing RFC-specified SMTP behavior and as complex as doing lookups against DNS-based blacklists or using custom filters. The next example (see Listing 4) does a bit of all of this, the last of which is instructing Postfix to pass off (more advanced) content-filtering functions to the AMaViSd process.

Note that any or all of these rules may be absent from the default Postfix configuration file because it's possible to have a working Postfix configuration without these directives. These optional directives simply improve the spam-killing ability of Postfix by enabling certain built-in checking routines.

The last line in the configuration shown in Listing 4 also requires a change to the Postfix master.cf file. Two services will need to be added to the master.cf file, as follows:

smtp-amavis      unix  -       -       n       -       2       smtp
    -o smtp_data_done_timeout=1200
127.0.0.1:10025  inet  n       -       n       -       -       smtpd
    -o content_filter=
    -o local_recipient_maps=
    -o relay_recipient_maps=
    -o smtpd_restriction_classes=
    -o smtpd_client_restrictions=
    -o smtpd_helo_restrictions=
    -o smtpd_sender_restrictions=
    -o smtpd_recipient_restrictions=permit_mynetworks,reject
    -o mynetworks=127.0.0.0/8
    -o strict_rfc821_envelopes=yes
    -o smtpd_error_sleep_time=0
    -o smtpd_soft_error_limit=1001
    -o smtpd_hard_error_limit=1000

The last section of the Postfix configuration is to enable SMTP-authentication routines and/or host-to-host SMTP data encryption. If there is a need to provide SMTP relay access to roaming clients or to a list of clients that would be unwieldy to manage via explicit access controls, authenticated access is the best way to go. It allows you to provide relay access to those customers without turning the SMTP server into an abusable "open mail relay". Creating an "open mail relay" is irresponsible Internet behavior and will almost certainly result in the server getting banned by the Internet at large.

By setting up TLS functions, two things can be accomplished -- SMTP relay authentication credentials can be made highly difficult to compromise, and SMTP-to-SMTP email data can be encrypted. Given the number of hops that an email may go through, the unknown nature of who might be snooping on said emails, and the potentially sensitive nature of the data sent, it makes sense to enable encryption even if authentication functions are unneeded. The configuration shown in Listing 5 accomplishes just that -- encryption without authentication. It should be noted that, given this Postfix configuration file, turning on authentication is as simple as changing smtpd_sasl_auth_enable from "no" to "yes".

The different map files listed in the example configuration segments must at least exist and be placed into the proper format. For regular access map files (such as the blacklist file), simply touch'ing the file and running it through postmap will suffice. The aliases file(s) is slightly different, in that it must be run through the postalias command instead [2]. Attempting to run the aliases file(s) through postmap will produce a flurry of error messages.
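
The distinction between the two commands can be captured in a small helper such as the one sketched here. The function names and the file paths are hypothetical, and by default the script only prints the commands it would run (set DRY_RUN=0 to execute them).

```sh
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build the indexed form of a lookup table. Alias files go through
# postalias; everything else is created if missing and goes through postmap.
rebuild_table() {
    case "$1" in
        */aliases)
            run postalias "$1"
            ;;
        *)
            [ -f "$1" ] || run touch "$1"
            run postmap "$1"
            ;;
    esac
}

# Example paths; substitute the map files named in your main.cf.
rebuild_table /etc/postfix/blacklist
rebuild_table /etc/postfix/aliases
```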

Starting It All Up

Assuming no typos in any of the various daemons' configuration files, the mail system should be about ready to go. The mail system should be started in the following order: anti-virus (Clam A/V), content-filter engine (AMaViSd), then MTA (Postfix). The Postfix and AMaViSd packages both come with init scripts. Ensure that these are installed into the root run control script directory (on System V Unix systems, this is /etc/init.d). The Clam A/V software will require the creation of an init script. On Solaris systems, the following should do:

#!/bin/sh
#
# Manage the clamav service

CLAMDAEMON="/usr/local/sbin/clamd"
FRESHCLAM="/usr/local/bin/freshclam"
# Run freshclam as a daemon (-d), checking 24 times per day (-c 24)
CLAMREFRESH="-d -c 24"

case "$1" in
   start)
          # Start the scanner first, then the definition updater
          $CLAMDAEMON
          $FRESHCLAM $CLAMREFRESH
   ;;
   stop)
          pkill clamd
          pkill freshclam
   ;;
   *)
          echo "Usage: $0 [start|stop]"
          exit 1
   ;;
esac

The init scripts should then be linked into the appropriate run-level directories. On a Solaris system, see Table 2 for recommended init script locations. Adapt this setup as necessary for the appropriate Unix flavor, taking special care to preserve the relative start orders.

Testing the Setup

Once all of the daemons are configured and running, a few tests are needed to ensure correct functioning of the configuration. The tests are: ability to send mail to remote systems, ability to receive mail from remote systems, whether email is correctly flagged as spam, and whether email is correctly flagged as containing viruses.

Testing Ability to Send Mail to Remote Systems

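Depending on your operating system, you may not have tools to correctly send email from the command line via Postfix. If this is the case, the following test, using Postfix's sendmail-compatible command, will confirm outbound functionality. The -t option tells sendmail to take the recipient from the message headers, and the line containing only "." ends the message:

# sendmail -t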
From: testsender@smtp-gate.mail.domain
To: testrecept@mail.domain
Subject: TEST

TEST
.

If outbound email is functioning correctly, the test message will arrive in the remote recipient's inbox.
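
For repeated testing, the same message can be generated by a small function and piped into sendmail instead of being typed by hand. This is a sketch: the function name test_message is made up, and the addresses are the placeholder addresses used above. As written it only prints the message.

```sh
# Print a minimal test message with the given sender and recipient.
test_message() {
    cat <<EOF
From: $1
To: $2
Subject: TEST

TEST
EOF
}

# To send the message, pipe it to Postfix's sendmail command; -t makes
# sendmail read the recipient from the headers:
#   test_message testsender@smtp-gate.mail.domain testrecept@mail.domain | sendmail -t
test_message testsender@smtp-gate.mail.domain testrecept@mail.domain
```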

Testing Ability to Receive Mail from Remote Systems

Testing the ability to receive email is as simple as taking any given mail client and sending a test message to an address on the Postfix server. An address that is usually safe to send to is root@mail.server.FQDN. If the message shows up in the root user's mailbox, inbound SMTP is functioning correctly. If the message fails to show up, check the system logs for answers.

Testing Virus Stopping

If the AMaViSd processes are properly configured, 'netstat' should show something like the following:

127.0.0.1.10024          *.*        0      0 49152      0 LISTEN
127.0.0.1.10025          *.*        0      0 49152      0 LISTEN

This indicates that both listeners are bound: AMaViSd on port 10024 and the Postfix re-injection service (the 127.0.0.1:10025 entry added to master.cf) on port 10025. Furthermore, the above indicates that they are listening only for locally initiated traffic (preferred from a security perspective). Listing 6 will test whether they are functioning correctly.
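
The netstat check can be automated with a filter like the one sketched below. The function name loopback_listener is invented for this example. It accepts both the Solaris-style "127.0.0.1.10024" notation shown above and the "127.0.0.1:10024" notation printed by other systems' netstat.

```sh
# Read netstat output on stdin; succeed if something is listening on
# the given port on the loopback address.
loopback_listener() {
    grep LISTEN | grep -q "127\.0\.0\.1[.:]$1[[:space:]]"
}

for port in 10024 10025; do
    if netstat -an 2>/dev/null | loopback_listener "$port"; then
        echo "port $port: listening on loopback"
    else
        echo "port $port: NOT listening on loopback"
    fi
done
```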

The "X5O..." line (the EICAR test string; its third character is the letter O, not a zero) is a special string of text used for testing the virus scanning function. It should always result in at least one SMTP "250" response statement that contains the token "BOUNCE". An email warning should also have been sent to the user configured to receive virus alerts (usually postmaster). It should look something like:

From: virusalert@smtp-gate.mail.domain
To: virusalert@smtp-gate.mail.domain
Subject: VIRUS (Eicar-Test-Signature) FROM LOCAL <root@mail.domain>

If not, the virus scanner is malfunctioning. Check the system logs for failure indications.

Testing Spam Stopping

This test also can be accomplished using any mail client desired. It is critical that the message body contain the following text:

XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X

Like the virus test, the above should result in an email alert being sent to your spam alert account (usually postmaster). This alert email should be addressed from "spam.police@mail.server.domain". Again, if the alert email does not show up, check the system logs for failure indications.

Lastly, confirm overall functionality of the mail system by sending an email message through it and looking at the message headers. If the message sent contains known spam indicators, output will be similar to the below (truncated) headers:

Date: Wed, 3 Mar 2004 16:00:12 -0500 (EST)
From: Super-User <root@smtp.mail.domain>
Message-Id: <200403032100.i23L0CjP019360@smtp.mail.domain>
To: testuser@wsl.mail.domain
Subject: Header Test
X-Virus-Scanned: by amavisd-new at mail.domain
X-Spam-Status: No, hits=4.6 tagged_above=1.0 required=5.0
        tests=BAYES_40, CLICK_BELOW, EXCUSE_REMOVE, FRONTPAGE,
        HTML_70_80, HTML_LINK_CLICK_HERE, HTML_MESSAGE,
        HTML_TITLE_EMPTY, MIME_HTML_ONLY, OFFERS_ETC
X-Spam-Level: ****

Pay special attention that an "X-Virus-Scanned:" line is present. That proves that the message passed through the content filtering agent.

If the test message was populated with spam-ish data, an "X-Spam-Status" line should be present. This line shows that the message was scanned for spam content, which spam tests it triggered, and the cumulative score of those tests (the "hits" value) compared with the threshold (the "required" value). If enough spam-ish content was populated into the message, the "Subject:" line will also be modified to indicate high spam content.
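
When checking many test messages, the score and threshold can be pulled out of the header with a one-line filter. The sketch below assumes the header layout shown in the example above, with "hits=" and "required=" on the first line of "X-Spam-Status". The function name spam_score is made up.

```sh
# Read message headers on stdin; print "<hits> <required>" taken from
# the first line of the X-Spam-Status header.
spam_score() {
    sed -n 's/^X-Spam-Status:.*hits=\([0-9.-]*\).*required=\([0-9.-]*\).*/\1 \2/p'
}

# Example using the header from the sample message above.
echo 'X-Spam-Status: No, hits=4.6 tagged_above=1.0 required=5.0' | spam_score
# prints: 4.6 5.0
```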

Miscellaneous Content Blocks

Postfix also offers a built-in mechanism for blocking mail based on header, body, and MIME content-type checks. These were mentioned in passing by way of the Postfix configuration file example. The specific configuration parameters are header_checks, body_checks, and mime_header_checks. To quickly and easily block unwanted attachments, place the following in your mime_header_checks map file:

/name=[^>]*\.(ade|adp|app|asd|asf|asx|bas|bat|chm|cmd|com|cpl|crt|dll|exe|fxp|hlp|hta|hto|inf|ini|ins|isp|jse?|lib|lnk|mdb|mde|msc|msi|msp|mst|ocx|pcd|pif|prg|reg|scr|sct|sh|shb|shs|sys|vb|vbe|vbs|vcs|vxd|wmd|wms|wmz|wsc|wsf|wsh|\{[^\}]+\})/
   REJECT E-mails with Unsafe Attachment Types Are Rejected by Rule.

The above is a regular expression that checks the "name=" field of a MIME block. If that block names any offending attachment type (e.g., .pif, .exe, .dll, etc.), the email will be rejected and the sending system given the error message "E-mails with Unsafe Attachment Types Are Rejected by Rule". If you find that desired emails are being discarded or rejected, try breaking the regular expression up into discrete rules, and give each rule's rejection text its own error number. This will help you locate which rule resulted in the discard or bounce.

These three check mechanisms allow you to accept or reject email based on a virtually limitless set of criteria. It's mostly a matter of figuring out the necessary regular expression to catch those items that require action.
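One way to produce such discrete rules is to generate them. The Python sketch below is an illustration only: it uses a shortened extension list, invents its own "Rule NNN" wording for the rejection text, and approximates Postfix's matching with Python's re module (Postfix regexp tables match case-insensitively by default, hence the IGNORECASE flag). The helper names are hypothetical.

```python
import re

# A subset of the extensions from the rule above, kept short for illustration.
EXTENSIONS = ["bat", "com", "exe", "pif", "scr", "vbs"]


def discrete_rules(extensions, first_number=1):
    """Return (number, pattern, map_entry) tuples, one rule per extension."""
    rules = []
    for number, ext in enumerate(extensions, start=first_number):
        pattern = r"name=[^>]*\." + re.escape(ext)
        entry = "/%s/\n   REJECT Rule %03d: .%s attachments are rejected" % (
            pattern, number, ext)
        rules.append((number, pattern, entry))
    return rules


def first_match(header_line, rules):
    """Return the number of the first rule matching the header line, else None."""
    for number, pattern, _entry in rules:
        # Approximates Postfix's default case-insensitive matching.
        if re.search(pattern, header_line, re.IGNORECASE):
            return number
    return None


RULES = discrete_rules(EXTENSIONS)
for _number, _pattern, entry in RULES:
    print(entry)  # text suitable for pasting into the map file
```

With numbered rules, the rule number in a bounce or log entry points directly at the pattern that fired, and first_match() offers a quick way to check a suspect header line before changing the live map.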

Deployment Options

Given all of the above, the next question is how to implement it. There are two main ways to fit the filtration server into a mail solution: as a standalone destination host (i.e., a system that will host user mail), or as an inline filtering device for a downstream mail or mailbox server. There are also a few variations on how to implement an inline content-filtering mail server: standard SMTP via MX precedence, via transport mapping, and using either SMTP or LMTP direct delivery to downstream mail hosts.

Use as an inline content-filtering device has two advantages. First, it gives the downstream mail or mailbox servers protection that might not otherwise be available to them. Second, it offloads the content-filtering overhead to another host. Both of these allow the downstream mail hosts to be left essentially unmodified.

Filter Host as Destination

This is perhaps the simplest way to deploy this configuration. In this scenario, mail that comes into the filter server is either delivered locally to a valid recipient, or bounced (rejected). Postfix determines whether a recipient address is a valid local recipient by either consulting the system's user database (e.g., /etc/passwd), or by using defined local user lookup maps. These alternate namespace maps are configured using the local_recipient_maps directive. There are additional parameters to configure when using alternate name spaces, but detailing those options falls outside of the scope of this article.

Mail delivery can be configured to deliver to users' home directories by way of the home_mailbox directive, or to a common mail spool directory by way of the mail_spool_directory directive. The Postfix software can natively handle either the traditional Unix "mbox" format or the newer "maildir" format for local message delivery. For maildir format, simply add a "/" to the end of the name specified through either the home_mailbox or mail_spool_directory directive.

Local mail delivery will also require a way for users to collect their mail. Although this is also outside of the scope of this article, I have a few recommendations. For smallish implementations, the University of Washington IMAP server is a fairly standard choice. For larger implementations, Cyrus IMAP is probably a better choice. Both of these servers offer email collection via the POP3 and IMAP protocols and also support TLS-protected sessions.

Lastly, local delivery will likely imply use of the server as an SMTP relay for the POP/IMAP clients. If SMTP relay service is to be provided to the POP/IMAP users, the relay service should make use of SMTP authentication. Furthermore, to protect the login credentials, TLS should be used to encrypt the SMTP authentication sessions.

Inline Filter Device -- SMTP via MX

The SMTP specifications provide a means by which mail servers can be set up with delivery preference levels. For example, a given email destination would have the highest preference, whereas hosts that are intended to act as a backup for that server's deliveries have lower preference. SMTP's behavior is to contact the highest preference host that it can reach. Failing that, it will fall back to the next highest preference host until it exhausts the list of MX hosts.

Given some minor "abuses" (uses that might not have been at the core of the design), this can be used to enforce a mail flow. This flow can be set up to require inbound email from the Internet to pass through the content-filtering host before delivery to the destination SMTP host. The "abuse" is to set up the highest preference MX host so that it refuses inbound SMTP connections from any host other than the designated upstream SMTP servers. In this case, those would be the content-filtering hosts. This could be accomplished via ACLs within the destination SMTP application (e.g., TCP Wrappers), on the destination SMTP host (e.g., IP Filter), or by way of firewall devices.

The MX preference list would cause the inbound traffic to flow to the first host it could reach -- the content-filter host -- then on to the destination MX host. This flow would be created via DNS.
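The fallback behavior can be modeled in a few lines of Python. This is a simplified sketch, not how any particular MTA is implemented: the preference values and the "mailstore.mail.domain" host name are made up for illustration, and real MTAs randomize among hosts that share a preference rather than sorting them by name. Note that in DNS the lowest MX preference number marks the most preferred host.

```python
# Hypothetical MX records as (preference, host); lower number = more preferred.
MX_RECORDS = [
    (20, "smtp-gate.mail.domain"),   # content-filter host
    (10, "mailstore.mail.domain"),   # destination host, closed to the Internet
]


def delivery_order(mx_records):
    """Return hosts in the order a sending MTA would try them."""
    return [host for _pref, host in sorted(mx_records)]


def first_reachable(mx_records, accepts_connection):
    """Return the first host in preference order that accepts a connection."""
    for host in delivery_order(mx_records):
        if accepts_connection(host):
            return host
    return None


def from_internet(host):
    """Model the ACL: the destination host refuses connections from the Internet."""
    return host != "mailstore.mail.domain"


print(delivery_order(MX_RECORDS))
print(first_reachable(MX_RECORDS, from_internet))
```

Because the most preferred host refuses outside connections, Internet senders always land on the filter host, while the filter host itself (which the ACL does allow) delivers onward to the destination.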

Inline Filter Device -- SMTP Transport Mapping

Sometimes, it may seem advantageous to "hide" the last-hop MX host in DNS. That is to say, do not advertise in public DNS that your final destination SMTP host even exists. To do this, your upstream mail hosts -- in this case, the content-filter host(s) -- will be designated the last-hop MX host in DNS.

As last-hop MX hosts, these systems would normally bounce email back to the sender with a "mail loops back to myself" or similar error. Normally, this error would be prevented by configuring the last-hop MX for local delivery. To avoid this error without performing local delivery, Postfix must be told what else to do with the email. This explicit routing, which makes Postfix shunt email on to the next-hop system, is configured with the transport_maps directive. Typically, this will be defined something like transport_maps = dbm:/etc/postfix/transport_map. This tells Postfix to consult a DBM-formatted lookup file (more efficient if lots of explicit routes are required). The map entries will be formatted similar to the following:

.mail.domain                      smtp:[last-hop.FQDN]
mail.domain                       smtp:[last-hop.FQDN]
smtp-gate.mail.domain             :

This map table instructs Postfix to deliver anything destined explicitly for "mail.domain", or for any subdomain of "mail.domain", to the host "last-hop.FQDN" (e.g., imap.mail.domain). The square brackets around "last-hop.FQDN" are critical: they instruct Postfix not to attempt an MX lookup for that hostname, but to deliver directly to the named host. A record with a bare ":" right-hand argument instructs Postfix to apply its normal delivery rules, as if the transport table did not exist, for the host named on the left-hand side. In this case, that is the name by which the content-filter host is known, which was previously set in the Postfix configuration file with the myhostname directive.
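To make the lookup order concrete, here is a simplified Python model of how a domain is matched against the table above. It is a sketch under stated assumptions: it handles only domain keys (not full addresses), treats a leading "." as "any subdomain of", and checks the exact name before any parent-domain entry. The lookup() function is a hypothetical helper, not part of Postfix.

```python
# The transport table above, as a plain dictionary.
TRANSPORT = {
    ".mail.domain": "smtp:[last-hop.FQDN]",
    "mail.domain": "smtp:[last-hop.FQDN]",
    "smtp-gate.mail.domain": ":",
}


def lookup(domain, table=TRANSPORT):
    """Return the table entry for a recipient domain, or None if nothing matches."""
    domain = domain.lower()
    # An exact match wins over any parent-domain entry.
    if domain in table:
        return table[domain]
    # Otherwise strip leading labels and try ".parent" keys:
    # a.b.mail.domain -> .b.mail.domain -> .mail.domain -> .domain
    rest = domain
    while "." in rest:
        rest = rest.split(".", 1)[1]
        key = "." + rest
        if key in table:
            return table[key]
    return None


for name in ("mail.domain", "imap.mail.domain", "smtp-gate.mail.domain", "example.org"):
    print(name, "->", lookup(name))
```

The exact-match-first ordering is why the filter host's own name keeps its ":" entry even though it is also a subdomain of "mail.domain".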

Final Thoughts

I'm a long-time Sendmail administrator, and I've found that fighting spam with Sendmail, although possible, imposed a significant overhead on the systems on which it was used. It also proved to be annoyingly slow and sometimes buggy, and it was not holding up well to the increasing volumes of spam.

Sendmail can be configured to use helper programs called Milters. However, the Milter I used, MIMEDefang, did not behave or perform well in a Sun Solaris 9 Sendmail environment. This caused me to search for a better solution. After searching mail-related forums, I tried several different MTAs. I chose Postfix because of its good combination of performance, extensibility, stability, and administrative ease.

Postfix proved to be such a good solution that I relegated my Sendmail server to being purely an SMTP relay for POP and IMAP customers. Even with the virus and spam-filtering functions removed from it, the Sendmail server was considerably slower than either of the eventual Postfix content-filter servers. Note, however, that the Sendmail server in question is a Sun Enterprise 250 with 2GB of memory and 2x400MHz CPUs; one Postfix server is a Sun Ultra I with 512MB of memory and 1x167MHz CPU; and the other Postfix server is a Sun Ultra II with 512MB of memory and 2x296MHz CPUs. A queue flush of only a few hundred messages sent from the slowest Postfix server to the Sendmail server would put a serious hurt on the Sendmail server.

I also tested Postfix on a secondary MX (the Ultra I). When I saw how well the secondary MX was functioning, it made sense to offload the filtering work from the Sendmail host completely to the secondary MX. Furthermore, it made sense not to make the filter host a single point of failure. Therefore, an old Ultra II was allocated to be a parallel filter host. Then MX rules were changed for all of the domains trafficking the mail system to cause them to MX terminate at the filter hosts. The filter hosts, in turn, implemented transport maps to ensure delivery to the Sendmail server.

All of this experimentation proved that a reliable and fast mail architecture could be created with just three hosts. The fact that two hosts could act as a front-end for a back-end mail store indicates that the solution was fairly scalable [3] in addition to being reliable [4].

About two months after initially writing this article, I was able to convince my employer to allow me to install the previously described filtering system. These systems were deployed as inline content filters in front of the corporate Exchange servers. After full deployment of two filter hosts, a daily average of 72% of spam traffic and nearly 100% of virus traffic was stopped at the filter hosts. This greatly reduced the administrative load on the Exchange administrators and significantly reduced the network, CPU, and disk usage of the Exchange systems. Finally, it also greatly cut down on the amount of spam received by the more spam-afflicted Exchange users.

Thanks to Robert Bastille for proofing and testing the original manuscript. Thanks to members of the All Things Unix forum at DSL Reports for reforming a Sendmail bigot.

Thomas H. Jones II currently works for a small consulting company based in Virginia. Previously, he spent seven years in the ISP industry and three years working for Enterprise Hardware Vendors.

1 When the installation script asks for "install_root:", all files will be installed relative to this root. For example, if you set install_root = /usr/local, everything will be installed under /usr/local. This is useful if you want to run components of Postfix chroot()ed.

2 If you already have a Sendmail aliases file, Postfix can be made to understand it via the alias configuration directives, above, and using postalias against the Sendmail alias file.

3 Scalability is provided by means of DNS. If the content-filtering configuration is to include more than one filtering server, each server should be configured with the same MX preference level. SMTP traffic will tend to be distributed in a round-robin fashion across the available filter servers. If/when the number of content-filter servers is insufficient to support the mail flow to the last-hop MX host, simply add more identically configured content-filter hosts to the mail flow architecture. As more servers are added, traffic will be spread across the new systems as the updated DNS information propagates.

4 Availability comes through the fallback nature of MX preferences. As previously noted, SMTP includes an availability fallback feature. Therefore, if one of the content-filter hosts is offline for repair, upgrade, etc., the calling SMTP host will attempt to connect to the next available MX host. That will be whichever of the remaining content-filter hosts it is able to contact first. It follows that, ideally, a minimum of two content-filter hosts should be configured in front of the downstream mail hosts.