Taking Back Your Mailbox with Greylisting
Sean Reifschneider
Unsolicited commercial email, a.k.a. spam, is an attack on the
Internet. It's as simple as that. Until we start really treating
it as such, the problem is only going to get worse. In the meantime,
we've got greylisting.
Greylisting combines many desirable attributes. It's incredibly
low maintenance. It has not misclassified a single legitimate message
in the 3 months I've been using it, and it has reduced the number
of spams I need to look at by a factor of 5 to 10. When I first heard
about greylisting, I dismissed it as too easy for spammers to circumvent.
However, in the 18 months since Evan Harris proposed the idea (July
2003), greylisting seems to be as effective as ever.
In this article, I'll provide details of how greylisting works,
offer information to help you decide whether it's right for you,
and suggest pointers for installation.
What Is Greylisting?
Greylisting works because the vast majority of spammers and email-propagated
worms cheat. To get their messages out quickly and easily, they
take shortcuts. One of the biggest shortcuts is that most message
deliveries are attempted only once. If a delivery attempt fails
for any reason, the sender abandons that delivery and moves on to other
addresses.
Greylisting relies on spammers using software that does not queue
messages with temporary failures to try them again at a later time.
Real mail servers will continue trying to deliver a message for
at least several hours, and usually several days if it doesn't go
through the first time. Implementing this functionality significantly
complicates the mailing software and uses resources on the sender's
computers, which is probably why so many spammers don't bother
with it.
In general terms, greylisting works by creating a "triple" of
remote IP address, envelope sender, and envelope recipient for every
incoming recipient. I say "recipient" instead of "message" because
in the case of carbon copies, a single message can have multiple
recipients. This triple is then looked up in a database. For the
first hour that a new triple is seen, it is rejected with a temporary
failure (400-series SMTP response), requesting that the remote Mail
Transfer Agent (MTA) try again later. Once that hour has passed, a
retry with the same triple is accepted for delivery.
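To make the mechanism concrete, here is a minimal in-memory sketch in Python. It is not taken from any particular greylisting implementation: the `Greylist` class, its `check` method, the one-hour `DELAY` constant, and the exact 451 reply text are illustrative assumptions. A real deployment would store the triples persistently and call the check from the MTA's RCPT TO handling.

```python
import time

DELAY = 60 * 60  # assumed greylist delay: one hour, in seconds


class Greylist:
    """In-memory greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, delay=DELAY):
        self.delay = delay
        self.first_seen = {}  # triple -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient, now=None):
        """Return an SMTP-style reply for one recipient of one delivery attempt."""
        now = time.time() if now is None else now
        triple = (client_ip, sender, recipient)
        # Record the triple the first time it is seen; later calls keep that time.
        seen = self.first_seen.setdefault(triple, now)
        if now - seen < self.delay:
            # Temporary (4xx) failure: a real MTA queues the message and retries.
            return "451 4.7.1 Greylisted, please try again later"
        return "250 OK"


if __name__ == "__main__":
    gl = Greylist()
    args = ("192.0.2.10", "alice@example.org", "bob@example.com")
    print(gl.check(*args, now=0))     # first attempt: 451 temporary failure
    print(gl.check(*args, now=3600))  # retry an hour later: 250 OK
```

Note that the triple, not the message, is the unit of state: the same sender retrying from a different IP address starts the clock over.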
Some greylisting implementations also limit the window
of opportunity. After the first hour, if the message is not retried
within the next three hours, the entry is removed and the remote
sender must start all over again. This further ensures that the
remote system is queuing the message instead of sending another
campaign.
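The retry window adds one more rule to the check. The following is a sketch using the one-hour and three-hour values from the text; the `greylist_decision` function and the shape of its `entries` table are hypothetical. How long a triple that has passed stays trusted is a separate policy choice, which this sketch simply leaves open-ended.

```python
DELAY = 60 * 60       # reject a new triple for the first hour
WINDOW = 3 * 60 * 60  # then accept a retry only within the next three hours


def greylist_decision(entries, triple, now):
    """Return "tempfail" or "accept" for a triple, updating `entries` in place.

    `entries` maps each triple to {"first_seen": seconds, "passed": bool}.
    """
    record = entries.get(triple)
    if record is not None and record["passed"]:
        return "accept"  # this triple already made it through once
    if record is None or now - record["first_seen"] > DELAY + WINDOW:
        # New triple, or one whose retry window has lapsed: start over.
        entries[triple] = {"first_seen": now, "passed": False}
        return "tempfail"
    if now - record["first_seen"] < DELAY:
        return "tempfail"  # retried too soon
    record["passed"] = True  # retried inside the window
    return "accept"
```

A sender that comes back five hours after its first attempt is treated as brand new and has to wait out another full hour.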
Benefits
The primary benefits of greylisting are its effectiveness and
low requirements for maintenance. It's the single most effective
technique I've found for cutting down on the spam that makes it
into my mailbox. I've been using SpamAssassin, ClamAV, outsourced
spam blocking companies, and other techniques, and the addition
of greylisting has reduced the amount of spam in our mailboxes by
a factor of 5 to 10. Additionally, with it we've been able to
eliminate our quarantine folders entirely.
The low maintenance requirements are a huge benefit for the already
overworked sys admin. After the initial install and a few tweaks,
I haven't had to touch the greylisting system at all. Unlike other
techniques, it doesn't require rulesets to be updated, quarantine
folders to be reviewed, lists to be preened, or decisions about which
RBLs (Real-time Black-hole Lists) to use. It just works.
Even if spammers start queuing more of their messages, there are
advantages to delaying the initial message from a particular sender.
That delay gives other anti-spam techniques such as DCC, Spamcop,
and RBLs a better chance of detecting the message as spam. When
the spam or virus delivery is retried, it is more likely to be caught
by these other mechanisms.
Drawbacks
The biggest complaint against using greylisting is that it will
almost certainly cause some email to be delayed. The first message
from a particular sender at a particular location will, under most
circumstances, take at least an hour to be delivered. If a sender
is on the telephone with the recipient and sends an email, the
message won't come through immediately unless that sender already
has a passing entry in the greylist. This has happened to me only once in the 3 months
we've been using greylisting. Some users may be intolerant of this
delay. The good news is that some greylist implementations allow
you to configure the greylisting on a per-user basis.
Many greylisting implementations have a "learning" mode to help
decrease delayed email from legitimate senders when greylisting
is first turned on. In learning mode, the greylist records every
triple as usual but never rejects messages, so the database becomes
primed with frequent senders. Run in learning mode for the first
few days or weeks to reduce the impact to your users.
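Learning mode amounts to running the normal bookkeeping while skipping the rejection. Below is a minimal sketch of that idea; the `LearningGreylist` class, its `learning` flag, and the `would_have_rejected` counter are hypothetical names, and the one-hour delay is assumed.

```python
import time

DELAY = 60 * 60  # assumed greylist delay in seconds


class LearningGreylist:
    """Greylist with a learning mode: triples are recorded as usual,
    but while `learning` is True nothing is ever rejected."""

    def __init__(self, learning=True, delay=DELAY):
        self.learning = learning
        self.delay = delay
        self.first_seen = {}          # triple -> time first seen
        self.would_have_rejected = 0  # rejections skipped while learning

    def check(self, triple, now=None):
        now = time.time() if now is None else now
        seen = self.first_seen.setdefault(triple, now)
        reject = now - seen < self.delay
        if self.learning:
            if reject:
                self.would_have_rejected += 1
            return "accept"  # go through the motions, but never reject
        return "tempfail" if reject else "accept"
```

Because the timestamps recorded during learning are real, any triple seen more than an hour before `learning` is switched off passes straight through afterward, and the counter gives a rough preview of how much mail greylisting would have delayed.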
Outgoing mail may also be delayed because of a technique called
"SMTP callbacks". SMTP callbacks are used by some mail servers to
try to combat forged sender addresses. When a mail server gets a
message claiming to be from a particular user, a connection is made
to a mail server for that domain to check that it will accept email
for that sender address. If a remote server makes such a callback
for a message from one of your users, your greylisting software is
likely to temporarily reject the callback itself, which may cause the
remote server to deny the message until after the greylisting period expires.
SMTP callbacks should be made obsolete by SPF (Sender Policy Framework),
so this will become less and less of a problem for greylisting hosts.
A bigger problem is caused by mail servers that send a particular
message from randomly chosen addresses in a pool of IPs, or that vary
the sender address with every message or every delivery attempt. For
example, Google's Gmail will usually connect from a different IP
address each time it tries to re-deliver a message. Some mailing
lists track bounces to individual messages by giving every
message a different sender address. When the IP address or sender
changes between attempts, each retry presents a new triple and the
message may never get past the greylist; when it changes with every
message, every message is delayed.
A whitelist of the known addresses for which this is an issue is
kept, so that messages from affected providers can bypass greylisting.
Another recommendation is that, instead of using the literal IP
address of the sender as part of the triple, one could mask off
the address to its top 24 bits (its /24 network, so that 192.0.2.17
and 192.0.2.200 both become 192.0.2.0), for example, without
dramatically reducing the effectiveness.
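Both workarounds are straightforward with Python's standard `ipaddress` module. In this sketch the function names are hypothetical, the /24 prefix is intended for IPv4 only, and the single whitelist entry is a documentation range standing in for a real list of pool-sending providers.

```python
import ipaddress

# Hypothetical whitelist of sending networks known to retry from a pool of
# addresses. 198.51.100.0/24 is a documentation range used as a placeholder.
WHITELIST = [ipaddress.ip_network("198.51.100.0/24")]


def is_whitelisted(client_ip, whitelist=WHITELIST):
    """True if the connecting IPv4 address falls inside a whitelisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in whitelist)


def greylist_triple(client_ip, sender, recipient, prefix=24):
    """Build the triple from the client's /24 network instead of its exact address."""
    net = ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False)
    return (str(net.network_address), sender, recipient)
```

With masking, a retry from 192.0.2.200 matches an entry created by 192.0.2.17, but a pool spread across several /24 networks would still need the whitelist.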
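The final drawback to consider is that greylisting MUST be implemented
on ALL outward-facing SMTP servers for a domain. This is because
the remote IP address is an important part of greylisting: once another
server has accepted a message and relays it to you, your greylisting
sees the relay's address rather than the original sender's.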
If you have one or more secondary MXs (Mail eXchangers), they
must all implement greylisting, or messages to those servers will
circumvent the greylist. This also prevents the use of many outsourced
or third-party spam-blocking services, unless they implement greylisting
in their system.
Performance
Having every incoming recipient trigger a database lookup -- and
probably a database insert -- seems like a fairly expensive operation.
However, most databases are optimized for exactly these sorts of
operations and can perform them quickly.
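To illustrate how little work each request involves, here is a sketch using SQLite from Python's standard library. SQLite is my choice for the example, not what our implementation uses, and the `triples` table layout and `check` function are hypothetical. The composite primary key gives SQLite an index, so each request is one conditional insert plus one indexed lookup.

```python
import sqlite3
import time

DELAY = 60 * 60  # assumed greylist delay in seconds


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    # The composite primary key doubles as the index used for lookups.
    db.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        " client_ip TEXT, sender TEXT, recipient TEXT, first_seen REAL,"
        " PRIMARY KEY (client_ip, sender, recipient))"
    )
    return db


def check(db, client_ip, sender, recipient, now=None):
    now = time.time() if now is None else now
    with db:  # run the insert and lookup as one committed transaction
        # Insert only if absent; an existing row keeps its original first_seen.
        db.execute(
            "INSERT OR IGNORE INTO triples VALUES (?, ?, ?, ?)",
            (client_ip, sender, recipient, now),
        )
        (first_seen,) = db.execute(
            "SELECT first_seen FROM triples"
            " WHERE client_ip = ? AND sender = ? AND recipient = ?",
            (client_ip, sender, recipient),
        ).fetchone()
    return "tempfail" if now - first_seen < DELAY else "accept"
```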
I ran some performance tests with the greylisting implementation
that I'm using. On a 2.66-GHz P4 system using two Hitachi 7,200-RPM
80-GB drives in a RAID-1 array, the system was able to handle 384
greylist requests per second. This was with a real-world dataset
of more than 50,000 greylist requests from our production mail server
logs.
This greylist implementation uses the filesystem as its database
and, therefore, is not as fast as it could be. We selected it
anyway because its performance is more than three orders of magnitude
higher than our typical mail load, and because it requires no database
configuration or maintenance.
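For comparison, a filesystem-backed store can be as simple as one small file per triple, holding the time the triple was first seen. The layout below (a SHA-1 hash of the triple, fanned out into two levels of subdirectories) and the `triple_path` and `check` names are hypothetical, not the on-disk format of the implementation we use, and the sketch ignores concurrent writers.

```python
import hashlib
import os
import tempfile
import time

DELAY = 60 * 60  # assumed greylist delay in seconds


def triple_path(base_dir, client_ip, sender, recipient):
    """Map a triple to a file path; hashing keeps odd address characters out of names."""
    digest = hashlib.sha1(f"{client_ip}\0{sender}\0{recipient}".encode()).hexdigest()
    # Two levels of fan-out keep any single directory from growing huge.
    return os.path.join(base_dir, digest[:2], digest[2:4], digest)


def check(base_dir, client_ip, sender, recipient, now=None):
    now = time.time() if now is None else now
    path = triple_path(base_dir, client_ip, sender, recipient)
    try:
        with open(path) as f:
            first_seen = float(f.read())
    except FileNotFoundError:
        # First sighting: create the file and record the current time in it.
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(repr(now))
        first_seen = now
    return "tempfail" if now - first_seen < DELAY else "accept"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        print(check(spool, "192.0.2.10", "a@example.org", "b@example.com", now=0))     # tempfail
        print(check(spool, "192.0.2.10", "a@example.org", "b@example.com", now=3600))  # accept
```

The attraction is the one the text describes: there is no server to configure, and cleaning out stale entries can be left to ordinary file tools.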
My experience has been that using greylisting to drop 6 out of
7 incoming message attempts has freed up a majority of cycles on
our system. Before greylisting, our modest 1-GHz server was often
unable to meet demands. We were regularly running into mail delays
of several hours because the system simply couldn't keep up. Even
after we upgraded to a much newer 2.66-GHz system, the system was
still struggling. Adding greylisting has completely eliminated these
spam-induced backlogs on our mail server.
Countermeasures
A common argument against greylisting is that spammers can employ
countermeasures to work around it. That's certainly true and, in
fact, some spammers already do get past greylisting. However, my
experience has been that the vast majority of spam messages sent
are currently getting caught by greylisting. I can imagine that
if significant numbers of hosts implemented greylisting, its effectiveness
would drop off within 6 to 18 months.
In the meantime, your users' mailboxes and quarantine folders
will be dramatically smaller. Six to 18 months will also allow some
other technologies to mature to the point where we can start relying
on them. SPF, Sender-ID, and trust networks are up-and-coming techniques
that promise to work well over the long term but need to mature further
before they are fully effective.
Case Study
In early August, I implemented greylisting on our company mail
server. We are a company of five people and have been on the Internet
a very long time. Our addresses have spread all over the Internet, making us a huge target
for spam. Our quarantine folder has had as many as 20,000 messages
per day in it, averaging 6,000 per day before we started getting
tough on spam.
The quarantine folder is simply a collection of the messages that
scored between 4 and 10 in SpamAssassin. Anything with a score greater
than 10 is thrown away because it's almost certainly spam, but we
like to review the marginal ones.
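Expressed as code, that quarantine policy is a three-way split on the SpamAssassin score. This sketch covers only our thresholds: the `route_by_score` name is hypothetical, it assumes the score has already been extracted from SpamAssassin's result, and treating a score of exactly 4 or exactly 10 as "quarantine" is my assumption about the boundaries.

```python
QUARANTINE_MIN = 4.0  # scores from here up go to the quarantine folder
DISCARD_ABOVE = 10.0  # scores above this are thrown away


def route_by_score(score):
    """Decide what happens to a message given its SpamAssassin score."""
    if score > DISCARD_ABOVE:
        return "discard"     # almost certainly spam
    if score >= QUARANTINE_MIN:
        return "quarantine"  # marginal: kept for human review
    return "deliver"
```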
Figure 1 shows the number of messages in our quarantine folder
that are less than 24 hours old. The trend is pretty clear from
November to May, going from 2,000 per day up to 6,000 per day. In
May, we added an outsourced spam prevention company, which cut the
spam back down to 2,000 per day. However, within a month it had
jumped back up to more than 4,000 per day. During the next month,
I was able to get that back down to "only" 2,000 per day by spending
15 to 30 minutes per day tweaking the settings on the third-party
system.
In August, I enabled greylisting. Since then, our quarantine has
received between 50 and 150 messages per day. Additionally, the
spam coming into our individual mailboxes has dropped by a factor
of ten: instead of 20 or 30 spams per day (alongside 30 to 60 legitimate
messages per day, not counting mailing lists), we're now down to 1 to 3 spams
per day that we have to deal with.
And we've done it without the aggravation of legitimate messages getting stuck
in quarantine folders, a fraction of each day devoted to maintenance,
or the other problems we saw before greylisting.
Here are some statistics from our mail server over the past four
days:
114852 SMTP connections
26502 message delivery attempts
22904 new greylist database "triple" entries
10711 unique hosts making new greylist entries
2576 messages delivered to users
1210 deliveries allowed by greylist entries
403 messages that were delayed by greylisting, but then delivered
39 legitimate messages delayed
0 false positives
The number of SMTP connections is high for a number of reasons.
Primarily, it's because we have one domain that regularly receives
dictionary attacks. Those connections get rejected before they get
to the greylisting, making that number artificially high.
About 86% of message delivery attempts (22,904 of 26,502) hit the
greylist as new entries. Of those, only about 2% (403) attempt a second
delivery and make it past the greylist. Roughly 10% of those 403 (39)
are legitimate messages.
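These percentages follow directly from the four-day counts listed above; the short calculation below reproduces them (the variable names are just labels for those figures).

```python
# Counts from the four-day statistics above.
attempts = 26502         # message delivery attempts
new_triples = 22904      # new greylist "triple" entries
delayed_then_ok = 403    # delayed by greylisting, then delivered
legit_delayed = 39       # legitimate messages delayed

new_share = new_triples / attempts             # attempts that created a new triple
retry_share = delayed_then_ok / new_triples    # new triples that retried successfully
legit_share = legit_delayed / delayed_then_ok  # successful retries that were legitimate

print(f"{new_share:.1%} {retry_share:.1%} {legit_share:.1%}")  # 86.4% 1.8% 9.7%
```

The 86% figure is also where "6 out of 7" comes from, since 6/7 is about 85.7%.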
Summary
Greylisting is an incredibly effective tool for helping to block
spam and some email-transmitted worms. We have seen high performance
combined with low maintenance and few (if any) legitimate messages
rejected. Although there are some drawbacks that make it unsuitable
in some cases, it can provide very dramatic results.
While 86% effectiveness means that greylisting is not suitable
as the only line of defense, it is relatively cheap. Rejecting 6
out of 7 messages with greylisting gives you the cycles to run other
more expensive anti-spam systems to get rid of the rest. Greylisting
is very good at getting rid of the obvious spam quickly.
References
http://greylisting.org/ -- Links to more detailed information
on greylisting, whitelists of greylist-incompatible systems, and links
to greylisting implementations for many mail servers.
http://spf.pobox.com/ -- The Sender Policy Framework (SPF)
homepage, an important emerging technology for blocking forged
sender addresses.
Implementations
Postfix -- http://isg.ee.ethz.ch/tools/postgrey/
qmail -- http://smtpd.develooper.com/
Sendmail -- http://hcpnet.free.fr/milter-greylist/
Others -- http://greylisting.org/implementations/
Sean Reifschneider is co-founder and a member of Technical
Staff at tummy.com, ltd. With tummy.com, he's been helping provide
Linux-based solutions to clients and participating in the Linux
and open source communities since 1995. For more of his writing
on technical topics, see: http://www.tummy.com/journals/users/jafo.
He can be contacted at: jafo@tummy.com.