Secondary MX Spam Trap

Liam Widdowson

Spam is an increasingly significant menace in the design, capacity planning, and day-to-day maintenance of email systems. In January of 2003, MessageLabs estimated that one in every 4.1 emails was identifiable as spam [1]. As of April 2004, this has escalated to one in every 1.8 emails [2], with no clear end in sight.

Many readers may regard spam simply as an annoyance for users, but it is also a significant issue for mail system operators. The massive influx of spam forces operators to expand their systems significantly to cope with the load, deploy spam countermeasures, or both. Spam is no longer a user-specific annoyance; it is now a major issue for ISPs.

Over the past decade, there have been significant efforts in developing systems to detect and combat spam. These can be loosely categorized into two types -- those based on performing content analysis, and those based on network metrics. Each spam detection method has inherent advantages and disadvantages, which in turn affect platform requirements, classification throughput, and the number of false positives and false negatives.

In the 1990s, RBLs (Real-time Black Lists) proved both popular and effective in halting the vast quantities of spam originating from open relays and the MTAs of known spammers. However, spammers have become more sophisticated -- they utilize ISP relays, subverted and virus-infected PCs, checksum-resistant content, and many other techniques to circumvent and actively subvert anti-spam systems. Both the open source and commercial software communities continue to develop countermeasures to stay ahead of the spammers in a never-ending arms race.

Over the past few years, RBLs have become increasingly ineffective due to the significant collateral damage and customer complaints that occur when a large ISP or carrier is black-listed. This has led to the creation of sophisticated anti-spam "frameworks", such as SpamAssassin [3] and Sophos PureMessage [4], that aggregate the results of multiple anti-spam techniques such as heuristics, content analysis, and distributed checksums [5] as part of an overall scoring scheme to determine the probability of a message being spam or ham.

Within both content-analysis and network metric-based solutions, there is a trade-off between false negatives and false positives. That is, tuning a particular metric to significantly reduce false positives will, in turn, increase false negatives (to a degree that depends on the particular solution in question).

Most recently, the application of statistical approaches to spam detection based on Bayesian filters (e.g., CRM114 [6] and DSPAM [7]) has begun to provide users with high-performance solutions with extremely low false negative and false positive rates.

In this article, I describe the implementation of a new spam countermeasure that uses a novel network-centric approach. It exploits, to their disadvantage, a particular behavior displayed by some spammers. Furthermore, this approach is able to provide a modest hit rate while guaranteeing zero false positives.

The countermeasure exploits spammers' preference for secondary and tertiary MX mail servers to create a secondary MX spam trap. The key requirement behind the development of the secondary MX spam trap is to reduce the amount of spam processed by a mail platform and other complementary spam countermeasures. It is not designed as a standalone anti-spam solution in its own right.

However, like most spam countermeasures, the secondary MX spam trap will over time become ineffective as spammers change their behavior. Until then, it can provide a simple and effective way to reduce the total amount of spam received by a mail system without the fear of false positives.

The Secondary MX Spam Trap

The premise behind a secondary MX spam trap is that spammers typically prefer to deliver spam through secondary mail exchangers. They do so on the assumption that secondary mail exchangers often have more relaxed mail rejection controls (e.g., user existence checks and RBL lookups) and are sometimes trusted by the primary mail exchanger. Secondary mail exchangers are often operated by a third party (such as an upstream ISP or sister organization) and, as such, may not have similar (or any) spam countermeasures.

Additionally, the software used by spammers to send unsolicited email seldom conforms to the RFC 2821 specification for email delivery. These applications are built to achieve the maximum possible message delivery throughput while ignoring transient or permanent errors.

It is possible to exploit this behavior to reduce the total amount of spam delivered to the mail system while not impacting legitimate MTAs. This is achieved by adding one or more secondary MX records to a domain's DNS zone, each pointing to a special SMTP service that always returns a transient (soft) error code when any client attempts to deliver mail.

Legitimate MTAs will never attempt to connect to the secondary MX unless the primary MX is down. If, for any reason, a legitimate MTA does attempt to connect to the secondary MX, it will be told that a transient error has occurred and that it should retry delivery. This will occur transparently to both the sender and the receiver. The remote MTA will then retry delivery and should select the primary MX, which will accept the mail as per normal.

Spammers seldom retry message delivery if the initial attempt is unsuccessful due to a transient or permanent error. The spam trap prevents such a spam message from ever being processed by a local MTA, much less delivered to an end user.
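
The difference between the two kinds of sender can be sketched as a small Python simulation. This is an illustration only, not real MTA code: the function names, the per-attempt availability list, and the assumption that spamware always picks the highest-numbered MX and never retries are all invented for the example.

```python
# Illustrative simulation only; none of these names come from smtptrapd.
PRIMARY = "mx10.hostname.tld"   # real mail server (preference 10)
TRAP = "mx20.hostname.tld"      # spam trap (preference 20)
MX_RECORDS = [(20, TRAP), (10, PRIMARY)]


def deliver(host, primary_up=True):
    """Return the reply code seen at RCPT TO, or None if the host is down."""
    if host == TRAP:
        return 451                      # the trap always defers
    return 250 if primary_up else None  # None models a failed connection


def compliant_sender(mx_records, primary_up_by_attempt):
    """Try hosts in preference order; keep the message queued on 4xx."""
    for attempt, primary_up in enumerate(primary_up_by_attempt, start=1):
        for _pref, host in sorted(mx_records):
            if deliver(host, primary_up) == 250:
                return ("delivered", host, attempt)
        # every host failed or deferred: requeue and try again later
    return ("queued", None, len(primary_up_by_attempt))


def spamware_sender(mx_records):
    """Assumed spamware: pick the highest-numbered MX and never retry."""
    _pref, host = max(mx_records)
    return ("delivered", host) if deliver(host) == 250 else ("dropped", host)
```

With the primary briefly unavailable, `compliant_sender(MX_RECORDS, [False, True])` is deferred by the trap on the first pass and delivers to the primary on the second, while `spamware_sender(MX_RECORDS)` gives up at the trap.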

Implementation

To achieve the aforementioned behavior, I have written a small open source multi-threaded SMTP daemon named smtptrapd [9]. The software is written in C and utilizes POSIX threads in a producer/consumer model to minimize resource utilization.
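
For readers unfamiliar with the producer/consumer model, the following Python sketch shows the general shape: one producer places work on a queue and a fixed pool of worker threads consumes it. It is not a translation of the smtptrapd source. The `run_pool` function, the use of plain items in place of accepted connections, and the sentinel-based shutdown are assumptions made for the example.

```python
import queue
import threading


def run_pool(items, num_workers=10):
    """Feed items to a fixed pool of worker threads; return the handled items."""
    work = queue.Queue()
    handled = []
    lock = threading.Lock()

    def consumer():
        while True:
            item = work.get()
            if item is None:          # sentinel: no more work for this worker
                break
            with lock:                # protect the shared result list
                handled.append(item)

    workers = [threading.Thread(target=consumer) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Producer: in a real daemon this would be the accept() loop.
    for item in items:
        work.put(item)
    for _ in workers:
        work.put(None)                # one sentinel per worker
    for w in workers:
        w.join()
    return handled
```

Because the pool size is fixed, the number of threads does not grow with the number of inbound connections, which is the resource-saving property the model is chosen for.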

Smtptrapd behaves like a standard SMTP service in accordance with RFC 2821. Upon connection, it provides a banner and returns a 250 OK response to the HELO, EHLO, NOOP, and MAIL FROM SMTP verbs. However, it returns a 451 response to the RCPT TO verb, which is the key behavior for the spam trap. It does not implement the DATA verb because the message content transmission stage is never reached.

This behavior is employed because any RFC 821- or 2821-compliant MTA will re-queue a message when the remote MTA has sent a 4xx error code in response to an RCPT TO verb. Anecdotal evidence indicates that some obscure SMTP implementations do not behave appropriately when a 4xx transient error is issued in response to the DATA command. Therefore, replying at the RCPT TO stage is safest. Further, this approach also provides the added benefit of reducing network traffic as the message content is never transmitted.
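
The verb handling described above can be summarized as a single lookup function. The Python sketch below is a hypothetical illustration, not the daemon's C code. The reply strings for 250, 451, and 221 follow the sample session shown later in this article; the 502 reply for any other verb is an assumption, since the response to unimplemented verbs is not documented here.

```python
def reply(line):
    """Return the reply a trap of this kind would send for one command line."""
    command = line.strip().upper()
    if command.startswith(("HELO", "EHLO", "NOOP", "MAIL FROM:")):
        return "250 OK"
    if command.startswith("RCPT TO:"):
        # Transient error: a compliant MTA requeues and retries later.
        return "451 Try again later"
    if command.startswith("QUIT"):
        return "221 Closing connection"
    # Assumption: what other verbs (e.g. DATA) return is not documented;
    # 502 "command not implemented" is used purely for illustration.
    return "502 Command not implemented"
```

Deferring at RCPT TO means the sender never reaches the point of transmitting the message body.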

Smtptrapd also has a number of built-in security controls to discourage spammers from abusing the service. The daemon will forcibly drop an SMTP connection after a particular number of SMTP commands have been received or if a session timeout has been reached. The software can also run in a chrooted environment as an unprivileged user.

Installation

The smtptrapd source tarball can be downloaded from:

http://smtptrapd.inodes.org

The included Makefile assumes the gcc compiler but should work with almost any Unix variant that has a POSIX 1003.1c-compliant threads implementation. The software has been tested on Linux and Solaris. Patches for other operating systems are welcomed.

Almost every smtptrapd option can be supplied as a command-line argument. However, three options exist that must be configured at compile time. These options are at the top of the smtptrapd.c file and control timeout metrics. They are as follows:

#define SMTP_CMD_MAX 50
#define SMTP_CMD_TOUT 60
#define SMTP_TIME_MAX 120

The SMTP_CMD_MAX constant defines the maximum number of commands the daemon may receive before it forcibly closes the connection. The SMTP_CMD_TOUT option controls the maximum time the daemon will wait for input. The SMTP_TIME_MAX constant specifies the total time a session can take before the daemon forcibly closes the connection. A number of other pre-processor constants are listed at the top of the source. These specify the defaults for a number of options but can be overridden using particular command-line arguments.
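
How the three limits interact can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the daemon's logic: the `should_close` function is hypothetical, the time values are assumed to be in seconds, and the choice of `>=` comparisons is arbitrary.

```python
SMTP_CMD_MAX = 50     # maximum commands per session
SMTP_CMD_TOUT = 60    # assumed seconds to wait for the next command
SMTP_TIME_MAX = 120   # assumed maximum total session length in seconds


def should_close(commands_seen, idle_seconds, session_seconds):
    """Return the reason a session should be dropped, or None to keep it open."""
    if commands_seen >= SMTP_CMD_MAX:
        return "too many commands"
    if idle_seconds >= SMTP_CMD_TOUT:
        return "command timeout"
    if session_seconds >= SMTP_TIME_MAX:
        return "session too long"
    return None
```

Any one of the three conditions is enough to end the session, so a client can neither hold a thread by staying silent nor by sending an endless stream of commands.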

The software can be compiled and installed by simply running make and then make install. For example:

$ make linux
# make install

Once installed, the software can be run by executing smtptrapd as root. By default, the daemon will create 10 threads, listen on port 25, and change userid to nobody. These parameters can be changed by specifying various command-line arguments. These arguments are self-explanatory and are listed by passing a -h argument to smtptrapd. For example:

$ smtptrapd -h
smtptrapd: -c [chroot dir] -l [tcp listen address]
           -b [smtp banner hostname] -u [username] 
           -t [number of threads]
           -p [listen port]

The system can also optionally run in a chrooted environment. The creation of a chroot jail is beyond the scope of this article but details on doing so can be found in previous Sys Admin articles [8].

Once running, the software can be tested by attempting an SMTP session using telnet. For example:

$ telnet localhost 25
220 hostname ESMTP Service Ready
ehlo
250 OK
mail from: <>
250 OK
rcpt to: <user@hostname.tld>
451 Try again later
quit
221 Closing connection

A log entry is written to the "mail" facility of syslog for every inbound connection. The log entry for the above delivery attempt is as follows:

Jun  5 17:28:32 callisto smtpdtrap[1420]: info: connection
[ip: 127.0.0.1; f: mail from: <>; r: rcpt to: <user@hostname.tld>]
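
Entries in this format are easy to post-process. The Python sketch below extracts the client address, sender, and recipient from one entry. It assumes the bracketed layout shown above is stable and that the line break in the sample is only a wrapping artifact; the pattern and the `parse_entry` name are inventions for this example.

```python
import re

# Matches the bracketed part of the entry: [ip: ...; f: ...; r: ...]
LOG_PATTERN = re.compile(
    r"\[ip: (?P<ip>[^;]+); f: (?P<mail_from>.*?); r: (?P<rcpt_to>.*)\]\s*$"
)

SAMPLE = (
    "Jun  5 17:28:32 callisto smtpdtrap[1420]: info: connection\n"
    "[ip: 127.0.0.1; f: mail from: <>; r: rcpt to: <user@hostname.tld>]"
)


def parse_entry(entry):
    """Return the ip, mail_from, and rcpt_to fields of one entry, or None."""
    # Collapse whitespace so a wrapped entry is treated as a single line.
    match = LOG_PATTERN.search(" ".join(entry.split()))
    return match.groupdict() if match else None
```

Calling `parse_entry(SAMPLE)` yields a dictionary with the keys `ip`, `mail_from`, and `rcpt_to`.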

To direct spammers to the trap, MX records for a domain need to be amended. The MX records for a typical zone file may look as follows:

IN    MX    10    mx10.hostname.tld.
IN    MX    20    mx20.hostname.tld.

In this example, mx20 points to a system running smtptrapd. It is also possible to specify multiple secondary MX records, which is encouraged because it increases the probability of spammers hitting the secondary MX spam trap if their software selects an MX at random. However, when doing so, administrators must consider that adding many MX records will increase the size of DNS responses. A very large number of MX records may push the response past the size that fits in a single UDP packet, forcing resolvers to retry the query over TCP. This may cause problems with firewalls or DNS implementations that do not interoperate correctly over TCP.

Testing and Results

Operation of smtptrapd in the wild has demonstrated that a drop of 9-10% in the total volume of email received by the primary MX occurs after implementation. Further, more than 90% of connections from particular IP addresses made to the spam trap are never re-attempted on the primary MX.

Observation of connection logs has also shown that the vast majority of connections to the secondary MX are from CPE address ranges and not legitimate MTAs. While these results may seem very modest, a reduction in traffic of about 10% can translate to a significant reduction in hardware and resource requirements for systems with very large inbound message rates.
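
One possible way to measure the retry behavior on another site is to compare the client addresses seen by the trap with those seen by the primary MX. The Python sketch below is a hypothetical illustration that counts distinct addresses, which may differ from how the figures above were derived. The function name is an invention for this example, and any addresses fed to it would come from the site's own logs.

```python
def never_retried_fraction(trap_ips, primary_ips):
    """Fraction of distinct trap client IPs that never appear at the primary MX."""
    trap = set(trap_ips)
    if not trap:
        return 0.0
    return len(trap - set(primary_ips)) / len(trap)
```

For instance, if four distinct addresses contacted the trap and only one of them was later seen by the primary MX, the function returns 0.75.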

Conclusion

The war against spam has no end in sight. Until we find a way to dissolve the spammers' business model, they will continue to deliver unsolicited bulk email. In the meantime, systems administrators and engineers will have to deploy and maintain spam countermeasures.

References

1. Wood, P., 2003. A Spammer in the Works, MessageLabs -- http://www.messagelabs.com/uk/here/pdf/ASpammerInTheWorks.pdf

2. MessageLabs, 2004 -- http://www.messagelabs.org/

3. SpamAssassin, 2004 -- http://www.spamassassin.org/

4. Sophos, 2004 -- http://www.sophos.com/products/pm/

5. Distributed Checksum Clearinghouse, 2004 -- http://www.rhyolite.com/anti-Spam/dcc/

6. CRM114, 2004 -- http://crm114.sourceforge.net/

7. DSPAM, 2004 -- http://www.nuclearelephant.com/projects/dspam/

8. Widdowson, L., 2001. Jailed Internet Services, Sys Admin August 2001 -- http://www.samag.com/documents/s=1147/sam0108f/0108f.htm

9. Smtptrapd, 2004 -- http://smtptrapd.inodes.org

Liam Widdowson has spent many years working exclusively on very large ISP email systems. He is currently the Systems Architect of a multi-million user ISP email system, the largest of its type in Australia. Liam can be contacted at: lbw@telstra.com.