Email
Worm Defense Tools
Philip B. Chase
The email worms of 2003 and 2004 provided an object lesson in
the need for filtering email for malware. Sobig.f, MyDoom, Bagle,
and Netsky ran rampant on the Internet thanks to unprotected clients
and mail servers. Even protected mail servers continue to suffer
from these email worms with unwanted traffic, quarantined messages,
virus alerts, and the processor load required to identify the malware.
Eliminating or reducing the flow of malware would free server
resources for more important tasks, especially during times of an
email worm outbreak. Email servers and mail scanners that block
all active content can accomplish this goal, but such a restrictive
solution does not work for every shop. Fortunately, the goal of
load reduction can be met while still maintaining some openness.
Email worms have a lot of behaviors in common. They harvest addresses,
randomly forge messages, and reuse addresses in fairly similar ways.
These patterns can be exploited to test mail messages to determine
which hosts are compromised so more efficient methods can be used
to stop future malware from reaching the compromised hosts. In this
article, I'll discuss a method for safely and automatically
identifying and blocking compromised hosts.
Why This Works
Email worms collect addresses from local address books, Web browser
caches, and just about any file that might have an email address
embedded in it to improve the odds of finding a viable host to invade.
For some peoples' computers, these collections of addresses
contain a seemingly random list of hostnames. Such randomness might
be seen in the file of a home user who doesn't use the computer
for work.
On the other hand, a computer used in a business environment,
an academic environment, or for telecommuting would almost certainly
have a corporate directory wherein all of the users share the same
domain name or a small set of domain names. Addresses embedded in
other files would also commonly be those of the user's co-workers.
This works because the worm can harvest a list of addresses that
will likely cause it to send malware to the same mail servers over
and over. This also works because the compromised hosts are almost
never MTAs (more on that later). They are generally just poorly
managed clients. They can be blocked because, in normal use, they
don't open a connection to the SMTP port on your mail server.
Their users have mail clients configured to talk to mail servers
run by their employers or their ISPs.
Methodology
The first step in blocking a compromised host is finding a malware
email. Alert messages from a mail scanner are an excellent data
stream for building a list of compromised hosts.
With an alert message in hand, legitimate hosts must be filtered
out. Many mail servers without virus scanners may forward or bounce
malware messages. Though the traffic is unwanted, these hosts are
MTAs so blocking them risks blocking legitimate traffic. This system
isn't an impenetrable wall of security; it is a load mitigator.
With a compromised host identified, a check must then be made
against a list of known mail forging worms. Not all malware comes
from compromised hosts, so not all malware warrants a block.
If the alert is not eliminated by those tests, the address and
a description of the malware can be fed to a block list. A properly
configured mail server can use the block list to block further connections
on the SMTP port from the offending host.
Philosophy of Blocking
What to block and why to block can be a hotly debated topic. To
reduce the risk of erroneously blocking an MTA, I have taken a fairly
conservative approach in my code.
Some MTAs will blindly pass malware so forwarded emails must be
screened out. These MTAs might have no virus scanners or they might
scan mail only upon local delivery. Fortunately, the number of "Received"
headers provides an easy indicator the message was not delivered
directly. As such, the Perl filter I have written, wormhost (Listing
1), uses a maximum received header test:
if (($line =~ /^Received/i)) {
++$received_headers;
if ($received_headers > $normal_received_headers) {
# This message was relayed or bounced
my_exit("$self: Possible relayed message. Too many \
received headers: $received_headers");
}
}
List servers can bounce messages with malware. Perhaps they are configured
to bounce all attachments, or their virus scanners or blocking rules
are quite silent, but I have witnessed malware coming from list processors
on legitimate MTAs. Filtering these requires comparison of the host
names used in the headers to determine when a message is likely to
be from a list server:
# get data to watch for list servers bouncing worms
if ($line =~ /^MAILFROM: .*@([^@]+)$/) {
$mailfrom_host = $1;
chop $mailfrom_host;
debug("$self: mailfrom_host: $mailfrom_host");
}
if ($line =~ /^Received:\s+from\s+(\S+)\s/) {
$received_host = $1;
debug("$self: received_host: $received_host");
}
if ($line =~ /^Message-Id: .*@([^@]+)>$/) {
$message_id_host = $1;
debug("$self: message_id_host: $message_id_host");
}
if ($line =~ /From: .*@([^@]+)>$/) {
$from_host = $1;
debug("$self: from_host: $from_host");
}
if (!($from_host =~ /^$/) &&
$from_host eq $mailfrom_host &&
$from_host eq $received_host &&
$from_host eq $message_id_host ) {
my_exit("$self: probably list server bounce from: $from_host");
}
Some MTAs will bounce based on patterns in attachments, attachment
type, or a positive hit from a virus scanner and send the entire virus,
intact and functional, to your mail server. As annoying as this behavior
is, it happens quite often. The senders are MTAs, so virus alerts
must be checked for phrases indicating bounce messages:
if (($line =~ /Returned mail: see transcript for details/i)) {
if ($debug) {
die "$self: Probable bounce: $line\n";
} else {
exit;
}
}
if (($line =~ /Message status - undeliverable/i)) {
if ($debug) {
die "$self: Probable bounce: $line\n";
} else {
exit;
}
}
There is also the possibility that a local message contains malware.
So messages must be checked for local origination lest one's
own mail server block itself. A test for a minimum number of received
headers after processing the entire message can suffice in many situations:
if ($received_headers < $normal_received_headers) {
my_exit("$self: Possible local message. Too few received \
headers: $received_headers");
}
Despite all of these tests, legitimate MTAs will occasionally be blocked
by the filter. The best approach is to use a white list from the very
beginning so erroneous listings can be easily and permanently fixed.
Prerequisites
Most of the components needed to run the filter are already installed
on a typical mail server in this day and age. A message transfer
agent (MTA) is, of course, required. To check for the worms, a mail
scanner is needed to disassemble the mail messages, and a virus
scanner is needed to check the message components for malware. Each
of these components will likely have its own set of prerequisites
like unzipping tools, mime message processors, logging systems,
and script engines.
For my tests and for the purposes of this article, the above tasks
were performed by Qmail, Qmail-Scanner, and Network Associates'
UVScan, respectively. I fully expect wormhost is somewhat tied to
this mix of tools. The typical number of received headers, the inclusion
of the message header in the virus alert, and the malware names
are not universal. Expect to tweak the code when using other MTAs,
mail scanners, and virus scanners.
Wormhost requires Perl and some commonly included Perl modules,
but nothing else. It reads a virus-alert message on STDIN. If it
believes it has found a compromised host, it outputs a useful record
type of your choosing on STDOUT.
The filter output is controlled with command-line parameters:
usage: wormhost [flags] [files]
options:
-t output tinydns records
-r output rbldns records
-b output BIND records
-z zone name for RBL queries
-d output a tab delimited record
-a output address only
-h print this usage info
-v verbose output
Typically, the virus alert would be delivered via an entry in a .qmail
file or similar:
|/usr/local/worm-defense-tools/bin/process-virus-alerts.sh
This shell script simply calls wormhost with the desired parameters
and puts the output in an appropriate place. In this example, the
output will be in tinydns format with the base domain set to wbl.example.com
and appended to a tinydns data file:
#!/bin/sh
/usr/local/worm-defense-tools/bin/wormhost \
-tzwbl.example.com >> ~/tinydns/data
Wormhost supports several output formats because there are several
ways to implement the block list. The blocking methods generally fall
into two categories: real-time black lists (RBLs); and tcp server
deny lists.
Creating a real-time black list (RBL) is fairly easy, makes the
list available to as many hosts as desired, and is very well supported
by modern MTAs. Customized responses delivered to the blocked host
are easy to configure through the use of TXT records in DNS. White
lists are well supported using the RBL standards, making corrections
quick and easy. That said, using an RBL will incur some propagation
delays if the MTA is not talking directly to the DNS server hosting
the RBL. These delays can result in numerous worm deliveries for
the MTA that produced the blacklist entry.
Implementing a black list in a TCP server is a reliable way to
block a host immediately. Generating such a black list is about
as easy as generating RBL entries. On the downside, the built-in
data distribution methods of a DNS-based RBL are not available,
though one could be grafted on with rsync or similar tools. Some
TCP servers do not support customized response messages. Using such
a server would mean rejected clients might not receive a meaningful
response if they get one at all. A meaningful message like "Host
blocked due to Netsky worm" tells the recipient your server
is up and healthy; if perhaps a bit too closed to normal mail traffic.
I'll show an example of an RBL here.
Real-Time Black Lists
For my real-time block list, I used Dan Bernstein's tinydns.
The installation of tinydns is fairly standard except for a few
modifications to the Makefile for data.cdb. The Makefile must include
a zone file for the worm black list and a zone file for the white
list. If you prefix all of your zone files with "db.",
the Makefile might look something like this:
remote: data.cdb
/usr/bin/rsync -az -e ssh /service/tinydns/root/ \
data.cdb 192.0.0.1:/service/tinydns/root/data.cdb
data.cdb: data
/usr/local/bin/tinydns-data
data: db.*
@/bin/echo "# DO NOT EDIT THIS FILE DIRECTLY" > data
/bin/ls -1 db.* | /bin/grep -v ~ | /usr/bin/xargs \
-i /bin/cat {} >> data
Wormhost can output the needed tinydns records as shown above. Unfortunately,
the records output by wormhost are not necessarily unique so they
must be sorted and "uniqued" before they are concatenated
with other tinydns data and compiled.
Listing 2 shows a daemon to place a file of unique records in
the tinydns data directory and recompile data.cdb. This script was
written to run under the Dan Bernstein's daemontools. It requires
that package or something similar to set its environment variables
and run it as a daemon.
With the data available via DNS, it is a fairly simple matter
to reconfigure an smtpd to use the block list. With qmail, one need
only insert a line to run rblsmtpd into the /var/qmail/supervise/qmail-smtpd/run
script:
...
exec /usr/local/bin/softlimit -m 10000000 \
/usr/local/bin/tcpserver -v -R -l 0 -x /etc/tcp.smtp.cdb \
-c "$MAXSMTPD" -u "$QMAILDUID" -g "$NOFILESGID" 0 smtp \
rblsmtpd -a whitelist.example.com -r wbl.example.com \
/var/qmail/bin/qmail-smtpd \
2>&1
In the example above, rblsmtpd allows hosts listed in whitelist.example.com,
while denying access to hosts listed in wbl.example.com.
When setting up an RBL, keep in mind there are some DNS servers
specifically designed to support RBL lists that can simplify the
task. Dan Bernstein's rbldns and Michael Tokarev's rbldnsd
both accept more simplified input records and address other special
needs of RBLs.
MTA Pilot Error
One of the little white lies of this project is that we can avoid
blocking MTAs because a real MTA would, at worst, relay a worm.
A relay is pretty easy to see in the headers so we can be kind to
the relays and not block them. The fallacy is the MTA could be the
originator of the worm.
There are lots of MTAs that run on the MS Windows family of products
-- the same host as the significant email worms. There is little
to stop a careless systems administrator from running an email client
from a Windows console where an MTA is hosted, execute a worm, and
compromise the host.
This scenario is not wild conjecture. My email worm filter blocked
a particular host multiple times for sending email worms. Virus-alert
logs showed a pattern exactly like that of a non-MTA worm-infected
host. Based on normal emails, worm-laden emails, and tests with
the blocked MTA, I can only conclude the situation I described above
occurred multiple times on this host.
On one level, this story is a tale of sloppy systems administration,
but it is also a pitch for white lists. Some real MTAs will be blocked
by this system, so be prepared to fix the problem by implementing
a white list from the beginning.
Futures
What I have shown here is useful, but not complete. Chiefly, the
output of wormhost does not lend itself to good logging. None of
the record formats timestamp the output. Nor are the records serialized
in a way so they could be tied to the virus alerts or quarantine
messages that precipitated the block.
The sys admin of a blocked MTA might want to know why her MTA
is blocked and how to get it unblocked. There is no provision in
this system to show a URL that documents the block. A database containing
the details of the block events, a Web server, and a bit of PHP
could deliver this, but that is not implemented here.
In the long run, worms are likely to change and make wormhost
less effective. The best way to combat this is an analysis of the
negative hits, but there is no provision in wormhost to collect
the negative hits or provide statistics on them.
Conclusion
We need not sit idly by and let compromised hosts assault our
servers. Adaptive technologies, such as those described here, can
exploit the patterned behavior of malware to automatically protect
our systems' resources so they can do the work to which we
have tasked them.
Acknowledgments
I thank Ray Strubinger, Dr. Robert G. Frank, and my wife, Susan,
for their support and encouragement as these tools went from a few
scraps of undocumented shell script to something suitable for public
consumption.
References
DaemonTools -- http://cr.yp.to/daemontools.html
DJBDNS -- http://cr.yp.to/djbdns.html
Network Associates -- http://www.nai.com
Perl Cookbook, First Edition, by Tom Christiansen &
Nathan Torkington, ISBN 1-56592-243-3.
Qmail -- http://cr.yp.to/qmail.html
Life with qmail -- http://www.lifewithqmail.org/
Qmail-Scanner -- http://qmail-scanner.sourceforge.net/
rbldnsd -- http://www.corpit.ru/mjt/rbldnsd.html
FAQ for rbldnsd and dnscache -- http://surbl.org/dnscache-rbldnsd.html
rblsmtpd -- http://cr.yp.to/ucspi-tcp.html
RBLsmptd RBLdns HowTo -- http://ladro.com/docs/dns/rblsmtpd.html
Philip Chase graduated from Rice University in 1986 with a
degree in Mechanical Engineering. He currently heads the information
technology group at the College of Public Health and Health Professions
at the University of Florida where he maintains Linux and NetWare
Systems. Philip can be reached at: pbc@afn.org. |