Cover V14, i02

Article
Figure 1

feb2005.tar

Taking Back Your Mailbox with Greylisting

Sean Reifschneider

Unsolicited commercial email, a.k.a. spam, is an attack on the Internet. It's as simple as that. Until we start really treating it as such, the problem is only going to get worse. In the meantime, we've got greylisting.

Greylisting combines many desirable attributes. It's incredibly low maintenance. It has not misclassified a single legitimate message in the 3 months I've been using it, and it has reduced the number of spams I need to look at by 5 to 10 times. When I first heard about greylisting, I dismissed it as too easy for spammers to circumvent. However, in the 18 months since Evan Harris proposed the idea (July 2003), greylisting seems to be as effective as ever.

In this article, I'll provide details of how greylisting works, offer information to help you decide whether it's right for you, and suggest pointers for installation.

What Is Greylisting?

Greylisting works because the vast majority of spammers and email-propagated worms cheat. To get their messages out quickly and easily, they take shortcuts. One of the biggest shortcuts is that most message deliveries are attempted only once. If a delivery attempt fails for any reason, the sender ignores that delivery and tries other addresses.

Greylisting relies on spammers using software that does not queue messages with temporary failures to try them again at a later time. Real mail servers will continue trying to deliver a message for at least several hours, and usually several days if it doesn't go through the first time. Implementing this functionality significantly complicates the mailing software and uses resources on the sender's computers, which is probably why so many spammers are neglecting to do it.

In general terms, greylisting works by creating a "triple" of remote IP address, envelope sender, and envelope recipient for every incoming recipient. I say "recipient" instead of "message" because in the case of carbon copies, a single message can have multiple recipients. This triple is then looked up in a database. For the first hour that a new triple is seen, it is rejected with a temporary failure (400-series SMTP response), requesting that the remote Mail Transfer Agent (MTA) try again later. After that, the recipient is allowed for delivery.

Some of the greylisting implementations also limit the window of opportunity. After the first hour, if a message is not received within the next three hours, the entry is removed and the remote sender must start all over again. This further ensures that the remote system is queuing the message instead of sending another campaign.

Benefits

The primary benefits of greylisting are its effectiveness and low requirements for maintenance. It's the single most effective technique I've found for cutting down on the spam that makes it into my mailbox. I've been using SpamAssassin, ClamAV, outsourced spam blocking companies, and other techniques, and the addition of greylisting has reduced the amount of spam in our mailboxes by 5 to 10 times. Additionally, with it we've been able to entirely eliminate our quarantine folders.

The low maintenance requirements are a huge benefit for the already overworked sys admin. After the initial install and a few tweaks, I haven't had to touch the greylisting system at all. Unlike other techniques, it doesn't require rulesets to be updated, quarantine folders to be reviewed, lists to be preened, decisions about which RBLs (Real-time Black-hole Lists) to use, etc. It just works.

Even if spammers start queuing more of their messages, there are advantages to delaying the initial message from a particular sender. That delay gives other anti-spam techniques such as DCC, Spamcop, and RBLs a better chance of detecting the message as spam. When the spam or virus delivery is retried, it is more likely to be caught by these other mechanisms.

Drawbacks

The biggest complaint against using greylisting is that it will almost certainly cause some email to be delayed. The first message from a particular sender at a particular location will, under most circumstances, take at least an hour to be delivered. If a user is on the telephone with the recipient and an email is sent, the email won't immediately come through unless the sender is already in the greylist. This has happened to me only once in the 3 months we've been using greylisting. Some users may be intolerant of this delay. The good news is that some greylist implementations allow you to configure the greylisting on a per-user basis.

Many greylisting implementations have a "learning" mode to help decrease delayed email from legitimate senders when greylisting is first turned on. In learning mode, the greylist goes through all the motions but never rejects messages. The database is then primed with frequent senders. Run in learning mode for the first few days or weeks to reduce the impact to your users.

Outgoing mail may also be delayed because of a technique called "SMTP callbacks". SMTP callbacks are used by some mail servers to try to combat forged sender addresses. When a mail server gets a message claiming to be from a particular user, a connection is made to a mail server for that domain to check that it will accept email for that sender address. If this happens from one of your users, the greylisting software may cause the remote server to deny the message until after the greylisting period expires.

SMTP callbacks should be made obsolete by SPF (Sender Policy Framework), so this will become less and less of a problem for greylisting hosts.

A bigger problem is caused by mail servers that randomly use IP addresses in a pool for sending a particular message or that vary the sender address with every message or every mail attempt. For example, Google's Gmail will usually connect from a different IP address each time it tries to re-deliver a message. Some mailing lists will try to track bounces to individual messages so that every message sent will have a different sender address.

A whitelist is kept of the known addresses for which this is an issue, so that messages from affected providers can bypass greylisting. Another recommendation is that, instead of using the literal IP address of the sender as part of the triple, one could mask off the address to the top 24 bits, for example, without dramatically reducing the effectiveness.

The final drawback to consider is that greylisting MUST be implemented on ALL outward-facing SMTP servers for a domain. This is because the remote IP address is an important part of greylisting.

If you have one or more secondary MXs (Mail eXchangers), they must all implement greylisting, or messages to those servers will circumvent the greylist. This also prevents the use of many outsourced or third-party spam-blocking services, unless they implement greylisting in their system.

Performance

Having every incoming recipient trigger a database lookup -- and probably a database insert -- seems like a fairly expensive operation. However, most databases are optimized for exactly these sorts of operations and can perform them quickly.

I ran some performance tests with the greylisting implementation that I'm using. On a 2.66-GHz P4 system using 2 Hitachi 7200RPM 80GB drives in a RAID-1 array, the system was able to handle 384 greylist requests per second. This was with a real-world dataset of more than 50,000 greylist requests from our production mail server logs.

This greylist implementation uses the filesystem for the database and, therefore, is not as fast as it could be. We selected this option because the performance is more than three orders of magnitude higher than our typical mail load. The lack of database configuration and maintenance is why this implementation was selected.

My experience has been that using greylisting to drop 6 out of 7 incoming message attempts has freed up a majority of cycles on our system. Before greylisting, our modest 1-GHz server was often unable to meet demands. We were regularly running into mail delays of several hours because the system simply couldn't keep up. Even after we upgraded to a much newer 2.66-GHz system, the system was still struggling. Adding greylisting has completely eliminated the instances of spam clogging of the email server.

Countermeasures

A common argument against greylisting is that spammers can employ countermeasures to work around it. That's certainly true and, in fact, some spammers already do get past greylisting. However, my experience has been that the vast majority of spam messages sent are currently getting caught by greylisting. I could imagine that if significant numbers of hosts implemented greylisting, the effectiveness would drop off within 6 to 18 months.

In the meantime, your users' mailboxes and quarantine folders will be dramatically smaller. Six to 18 months will also allow some other technologies to mature to the point where we can start relying on them. SPF, Sender-ID, and trust networks are up-and-coming techniques that will be effective for the long term but need additional maturing to be fully effective.

Case Study

In early August, I implemented greylisting on our company mail server. We are a company of five people and have been on the Internet a very long time. Our addresses are all over, making us a huge target for spam. Our quarantine folder has had as many as 20,000 messages per day in it, averaging 6,000 per day before we started getting tough on spam.

The quarantine folder is simply a collection of the messages that scored between 4 and 10 in SpamAssassin. Anything with a score greater than 10 is thrown away because it's almost certainly spam, but we like to review the marginal ones.

Figure 1 shows the number of messages in our quarantine folder that are less than 24 hours old. The trend is pretty clear from November to May, going from 2,000 per day up to 6,000 per day. In May, we added an outsourced spam prevention company, which cut the spam back down to 2,000 per day. However, within a month it had jumped back up to more than 4,000 per day. During the next month, I was able to get that back down to "only" 2,000 per day by spending 15 to 30 minutes per day tweaking the settings on the third-party system.

In August, I enabled greylisting. Since then, we've been between 50 and 150 messages in the quarantine per day. Additionally, the spam coming into our individual mailboxes has dropped by a factor of ten. Instead of 20 or 30 (of another 30 to 60 legitimate messages per day, not counting mailing lists), we're now down to 1 to 3 spams per day that we have to deal with.

All without the aggravation of legitimate messages getting stuck in quarantine folders, having to devote a fraction of the day to maintenance, and other problem areas we've seen before greylisting.

Here are some statistics from our mail server over the past four days:

114852 SMTP connections
26502 message delivery attempts
22904 new greylist database "triple" entries
10711 unique hosts making new greylist entries
2576 messages delivered to users
1210 deliveries allowed by greylist entries
403 messages that were delayed by greylisting, but then delivered
39 legitimate messages delayed
0 false positives

The number of SMTP connections is high for a number of reasons. Primarily, it's because we have one domain that regularly receives dictionary attacks. Those connections get rejected before they get to the greylisting, making that number artificially high.

About 86% of message attempts hit the greylist as new entries. Of those, only about 2% attempt a second delivery and make it past the greylist. Roughly 10% of those 2% are legitimate messages.

Summary

Greylisting is an incredibly effective tool for helping to block spam and some email transmitted worms. We have seen high performance combined with low maintenance and few (if any) legitimate rejected messages. Although there are some drawbacks that make it unsuitable in some cases, it can provide very dramatic results.

While 86% effectiveness means that greylisting is not suitable as an only line of defense, it is relatively cheap. Rejecting 6 out of 7 messages with greylisting gives you the cycles to run other more expensive anti-spam systems to get rid of the rest. Greylisting is very good at getting rid of the obvious spam quickly.

References

http://greylisting.org/ -- Links to more detailed information on greylisting, whitelists of greylist-incompatible systems, links to greylisting implementations for many mail servers, and detailed information about greylisting.

http://spf.pobox.com/ -- The Sender Policy Framework (SPF) homepage, an important emerging technology for blocking of forged sender addresses.

Implementations

Postfix -- http://isg.ee.ethz.ch/tools/postgrey/

qmail -- http://smtpd.develooper.com/

Sendmail -- http://hcpnet.free.fr/milter-greylist/

Others -- http://greylisting.org/implementations/

Sean Reifschneider is co-founder and a member of Technical Staff at tummy.com, ltd. With tummy.com, he's been helping provide Linux-based solutions to clients and participating in the Linux and open source communities since 1995. For more of his writing on technical topics, see: http://www.tummy.com/journals/users/jafo. He can be contacted at: jafo@tummy.com.