Secondary
MX Spam Trap
Liam Widdowson
Spam is an increasingly significant menace in the design, capacity
planning, and day-to-day maintenance of email systems. In January
of 2003, MessageLabs estimated that one in every 4.1 emails was
identifiable as spam [1]. As of April 2004, this has escalated to
one in every 1.8 emails [2], with no clear end in sight.
Many readers may associate spam simply as an annoyance for users,
but spam is a significant issue for mail system operators. The massive
influx of spam causes mail system operators to expand their systems
significantly to cope with the load and/or deploy spam countermeasures.
Spam is no longer a user-specific annoyance; it is now a major issue
for ISPs.
Over the past decade, there have been significant efforts in developing
systems to detect and combat spam. These can be loosely categorized
into two types -- those based on performing content analysis, and
those based on network metrics. Each spam detection method has inherent
advantages and disadvantages, which in turn affect platform requirements,
classification throughput, and the number of false positives and
false negatives.
Initially in the 1990s, the utilization of RBL (Real-time Black
Lists) proved both popular and effective in halting the vast quantities
of spam originating from open relays and the MTAs of known spammers.
However, spammers have become more sophisticated -- they utilize
ISP relays, subverted and virus-infected PCs, checksum-resistant
content, and many other techniques to try to circumvent and actively
subvert anti-spam systems. Both the open source and commercial software
communities continue to actively develop countermeasures to stay
ahead of the spammers in a never-ending arms race.
Over the past few years, RBLs have become increasingly ineffective
due to the significant collateral damage and customer complaints
that occur when a large ISP or carrier is black-listed. This has
led to the creation of sophisticated anti-spam "frameworks", such
as SpamAssassin [3] and Sophos PureMessage [4], that aggregate the
results of multiple anti-spam techniques such as heuristics, content
analysis, and distributed checksums [5] as part of an overall scoring
scheme to determine the probability of a message being spam or ham.
Within both content-analysis and network metric-based solutions,
there is a trade off between false negatives and false positives.
That is, tweaking or modification of a particular metric to significantly
reduce false positives, in turn, increase false negatives (with
varying degrees of magnitude, depending on the particular solution
in question).
Most recently, the application of statistical approaches to spam
detection based on Bayesian filters (e.g., CRM114 [6] and DSPAM
[7]) have begun to provide users with high performance solutions
with extremely low false negative and false positive rates.
In this article, I describe the implementation of a new spam countermeasure
that uses a novel network-centric approach. It exploits, to their
disadvantage, a particular behavior displayed by some spammers.
Furthermore, this approach is able to provide a modest hit rate
while guaranteeing zero false positives.
The countermeasure exploits spammer's preference for secondary
and tertiary MX mail servers to create a secondary MX spam trap.
The key requirement behind the development of the secondary MX spam
trap is to reduce the amount of spam processed by a mail platform
and other complementary spam countermeasures. It is not designed
as a standalone anti-spam solution in its own right.
However, like most spam countermeasures, the secondary MX spam
trap will over time become ineffective as spammers change their
behavior. Until then, it can provide a simple and effective way
to reduce the total amount of spam received by a mail system without
the fear of false positives.
The Secondary MX Spam Trap
The premise behind a secondary MX spam trap is that, typically,
spammers prefer using secondary mailer exchanges for delivery of
spam. Spammers do this based on the assertion that secondary mailer
exchanges often have more relaxed mail rejection controls (i.e.,
user existence checks, RBL lookups, etc.) and that secondary mailer
exchanges are sometimes trusted by the primary mailer exchange.
Secondary mailer exchangers are often operated by a third-party
(such as an upstream ISP or sister organization) and, as such, may
not have similar (or any) spam countermeasures.
Additionally, the software used by spammers to send unsolicited
email seldom conforms to the RFC 2821 specification for email delivery.
These applications are built to achieve the maximum possible message
delivery throughput while ignoring transient or permanent errors.
It is possible to exploit this behavior to reduce the total amount
of spam delivered to the mail system while not impacting legitimate
MTAs. This is achieved by adding to a domain record one or more
secondary MX records that point to a special SMTP service that always
returns transient soft error codes when any client attempts to deliver
mail.
Legitimate MTAs will never attempt to connect to the secondary
MX unless the primary MX is down. If, for any reason a legitimate
MTA does attempt to connect to the secondary MX, it will be instructed
that a transient error has occurred and that it should retry delivery.
This will occur transparently to both the sender and the receiver.
The remote MTA will then retry delivery and should select the primary
MX, which will accept the mail as per normal.
Spammers seldom retry message delivery if the initial attempt
is unsuccessful due to a transient or permanent error. The spam
trap prevents such a spam message from ever being processed by a
local MTA, much less delivered to an end user.
Implementation
To achieve the aforementioned behavior, I have written a small
open source multi-threaded SMTP daemon named smtptrapd [9]. The
software is written in C and utilities POSIX threads in a producer/consumer
model to minimize resource utilization.
Smtptrapd behaves like a standard SMTP service in accordance with
RFC 2821. Upon connection, it provides a banner and returns a 250
OK response to the HELO, EHLO, NOOP, and MAIL FROM SMTP verbs. However,
it returns a 451 response to the RCPT TO verb, which is the key
behavior for the spamtrap. It does not implement the DATA verb because
the message content transmission stage is never reached.
This behavior is employed because any RFC 821- or 2821-compliant
MTA will re-queue a message when the remote MTA has sent a 4xx error
code in response to an RCPT TO verb. Anecdotal evidence indicates
that some obscure SMTP implementations do not behave appropriately
when a 4xx transient error is issued in response to the DATA command.
Therefore, replying at the RCPT TO stage is safest. Further, this
approach also provides the added benefit of reducing network traffic
as the message content is never transmitted.
Smtptrapd also has a number of built-in security controls to discourage
spammers from abusing the service. The daemon will forceably drop
an SMTP connection after a particular number of SMTP commands have
been received or if a session timeout has been reached. The software
can also run in a chrooted environment as an unprivileged user.
Installation
The smtptrapd source tarball can be downloaded from:
http://smtptrapd.inodes.org
The included Makefile assumes the gcc compiler but should work with
almost any Unix variant that has a POSIX 1003.1c-compliant threads
implementation. The software has been tested on Linux and Solaris.
Patches for other operating systems are welcomed.
Almost every smtptrapd option can be supplied as a command-line
argument. However, three options exist that must be configured at
compile time. These options are at the top of the smtptrapd.c file
and control timeout metrics. They are as follows:
#define SMTP_CMD_MAX 50
#define SMTP_CMD_TOUT 60
#define SMTP_TIME_MAX 120
The SMTP_CMD_MAX constant defines the maximum number of commands the
daemon may receive before forceably closing the connection. The SMTP_CMD_TOUT
option controls the maximum time the daemon will wait for input. The
SMTP_TIME_MAX constant specifies the total time a session can take
before forceably closing the connection. A number of other pre-processor
constants are listed at the top of the source. These specify the defaults
for a number of options but can be overridden using particular command-line
arguments.
The software can be compiled and installed by simply running make
and then make install. For example:
$ make linux
# make install
Once installed, the software can be run by executing smtptrapd as
root. By default, the daemon will create 10 threads, listen on port
25, and change userid to nobody. These parameters can be changed by
specifying various command-line arguments. These arguments are self-explanatory
and are listed by passing a -h argument to smtptrapd. For example:
$ smtptrapd -h
smtptrapd: -c [chroot dir] -l [tcp listen address]
-b [smtp banner hostname] -u [username]
-t [number of threads]
-p [listen port]
The system can also optionally run in a chrooted environment. The
creation of a chroot jail is beyond the scope of this article but
details on doing so can be found in previous Sys Admin articles
[8].
Once running, the software can be tested by attempting an SMTP
session using telnet. For example:
$ telnet localhost 25
220 hostname ESMTP Service Ready
ehlo
250 OK
mail from: <>
250 OK
rcpt to: <user@hostname.tld>
451 Try again later
quit
221 Closing connection
A log entry is written to the "mail" facility of syslog for every
inbound connection. The log entry for the above delivery attempt is
as follows:
Jun 5 17:28:32 callisto smtpdtrap[1420]: info: connection
[ip: 127.0.0.1; f: mail from: <>; r: rcpt to: <user@hostname.tld>]
To direct spammers to the trap, MX records for a domain need to be
amended. The MX records for a typical zone file may look as follows:
IN MX 10 mx10.hostname.tld.
IN MX 20 mx20.hostname.tld.
In this example, mx20 points to a system running smtptrapd. It is
also possible to specify multiple secondary MX records, which is encouraged
to increase the probability of spammers hitting the secondary MX spam
trap if their software randomly selects an MX. However, when doing
so, administrators must consider that adding many MX records will
increase the size of DNS responses. A very large number of MX records
may force the name server to respond with TCP rather than UDP packets.
This may cause an issue with firewalls or DNS implementations that
do not interoperate correctly over TCP.
Testing and Results
Operation of smtptrapd in the wild has demonstrated that a drop
of 9-10% in the total volume of email received by the primary MX
occurs after implementation. Further, more than 90% of connections
from particular IP addresses made to the spamtrap are never re-attempted
on the primary MX.
Observation of connection logs has also shown that the vast majority
of connections to the secondary MX are from CPE address ranges and
not legitimate MTAs. While these results may seem very modest, a
reduction in traffic of about 10% can translate to a significant
reduction in hardware and resource requirements for systems with
very large inbound message rates.
Conclusion
The war against spam has no end in sight. Until we find a way
to dissolve the spammers' business model, they will continue to
deliver unsolicited bulk email. Until then, systems administrators
and engineers will have to deploy and maintain spam countermeasures
for the foreseeable future.
References
1. Wood, P., 2003. A Spammer in the Works, MessageLabs -- http://www.messagelabs.com/uk/here/pdf/ASpammerInTheWorks.pdf
2. MessageLabs, 2004 -- http://www.messagelabs.org/
3. SpamAssassin, 2004 -- http://www.spamassassin.org/
4. Sophos, 2004 -- http://www.sophos.com/products/pm/
5. Distributed Checksum Clearinghouse, 2004 -- http://www.rhyolite.com/anti-Spam/dcc/
6. CRM114, 2004 -- http://crm114.sourceforge.net/
7. DSPAM, 2004 -- http://www.nuclearelephant.com/projects/dspam/
8. Widdowson, L., 2002. Jailed Internet Services, Sys Admin
August 2001 -- http://www.samag.com/documents/s=1147/sam0108f/0108f.htm.
9. Smtptrapd, 2004 -- http://smtptrapd.inodes.org
Liam Widdowson has spent many years working exclusively on
very large ISP email systems. He is currently the Systems Architect
of a multi-million user ISP email system, the largest of its type
in Australia. Liam can be contacted at: lbw@telstra.com. |