Cover V13, i13
aug2004.tar

Email Worm Defense Tools

Philip B. Chase

The email worms of 2003 and 2004 provided an object lesson in the need for filtering email for malware. Sobig.f, MyDoom, Bagle, and Netsky ran rampant on the Internet thanks to unprotected clients and mail servers. Even protected mail servers continue to suffer from these worms in the form of unwanted traffic, quarantined messages, virus alerts, and the processor load required to identify the malware.

Eliminating or reducing the flow of malware would free server resources for more important tasks, especially during times of an email worm outbreak. Email servers and mail scanners that block all active content can accomplish this goal, but such a restrictive solution does not work for every shop. Fortunately, the goal of load reduction can be met while still maintaining some openness.

Email worms have a lot of behaviors in common. They harvest addresses, forge messages at random, and reuse addresses in fairly similar ways. These patterns can be exploited to identify compromised hosts from the messages they send, so that cheaper methods can be used to refuse further malware from those hosts. In this article, I'll discuss a method for safely and automatically identifying and blocking compromised hosts.

Why This Works

Email worms collect addresses from local address books, Web browser caches, and just about any file that might have an email address embedded in it, to improve the odds of finding a viable host to invade. On some people's computers, these collections of addresses contain a seemingly random list of hostnames, as might be the case for a home user who doesn't use the computer for work.

On the other hand, a computer used in a business environment, an academic environment, or for telecommuting would almost certainly have a corporate directory in which all of the users share the same domain name or a small set of domain names. Addresses embedded in other files would also commonly be those of the user's co-workers. Blocking works because a worm on such a computer will likely harvest a list of addresses that causes it to send malware to the same mail servers over and over. It also works because the compromised hosts are almost never MTAs (more on that later). They are generally just poorly managed clients, and they can be blocked because, in normal use, they never open a connection to the SMTP port on your mail server: their users have mail clients configured to talk to mail servers run by their employers or their ISPs.

Methodology

The first step in blocking a compromised host is finding a malware email. Alert messages from a mail scanner are an excellent data stream for building a list of compromised hosts.

With an alert message in hand, legitimate hosts must be filtered out. Many mail servers without virus scanners may forward or bounce malware messages. Though the traffic is unwanted, these hosts are MTAs, so blocking them risks blocking legitimate traffic. This system isn't an impenetrable wall of security; it is a load mitigator.

With a compromised host identified, a check must then be made against a list of known mail-forging worms. Not all malware comes from compromised hosts, so not all malware warrants a block.

If the alert is not eliminated by those tests, the address and a description of the malware can be fed to a block list. A properly configured mail server can use the block list to block further connections on the SMTP port from the offending host.
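The decision sequence above can be sketched in Perl, the filter's own language. This is a minimal illustration, not wormhost's code: it assumes the alert has already been parsed into a hash with hypothetical fields (`client_ip`, `malware`, `relayed`, `is_bounce`), that malware names follow UVScan's `W32/Name.variant@MM` style, and that `%forging_worms` is a hand-maintained, hypothetical list of worm families that mail directly from infected hosts.

```perl
use strict;
use warnings;

# Hypothetical list of worm families that send mail directly from the
# infected host, so the connecting address identifies that host.
my %forging_worms = map { $_ => 1 } qw(sobig mydoom bagle netsky);

# Return the address to block, or nothing if no block is warranted.
# The field names in %alert are assumptions for this sketch.
sub block_candidate {
    my (%alert) = @_;

    # Bounced or relayed malware came from an MTA: never block those.
    return if $alert{is_bounce} || $alert{relayed};

    # Reduce a name such as "W32/Netsky.p@MM" to its family, "netsky".
    my ($family) = lc($alert{malware} // '') =~ m{^(?:w32/)?([a-z]+)};
    return unless defined $family && $forging_worms{$family};

    # What remains is a compromised host; its address goes on the list.
    return $alert{client_ip};
}

my $ip = block_candidate(
    client_ip => '192.0.2.99',
    malware   => 'W32/Netsky.p@MM',
);
print defined $ip ? "block $ip\n" : "no block\n";   # prints "block 192.0.2.99"
```

Keeping the forging-worm list as a simple hash makes it easy to extend when a new mass mailer appears.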

Philosophy of Blocking

What to block and why to block can be a hotly debated topic. To reduce the risk of erroneously blocking an MTA, I have taken a fairly conservative approach in my code.

Some MTAs will blindly pass malware, so forwarded email must be screened out. These MTAs might have no virus scanners, or they might scan mail only upon local delivery. Fortunately, the number of "Received" headers provides an easy indicator that the message was not delivered directly. As such, the Perl filter I have written, wormhost (Listing 1), uses a maximum-received-header test:

# Count Received: headers; too many means the message passed through a relay
if ($line =~ /^Received:/i) {
    ++$received_headers;
    if ($received_headers > $normal_received_headers) {
        # This message was relayed or bounced
        my_exit("$self: Possible relayed message.  Too many "
              . "received headers: $received_headers");
    }
}
List servers can also bounce messages containing malware. Perhaps they are configured to bounce all attachments, or their virus scanners or blocking rules act silently; whatever the cause, I have witnessed malware coming from list processors on legitimate MTAs. Filtering these requires comparing the host names used in the headers to determine when a message is likely to be from a list server:

# get data to watch for list servers bouncing worms
# (host names are lowercased because DNS names are case-insensitive)
if ($line =~ /^MAILFROM: .*\@([^\@>\s]+)>?\s*$/i) {
    $mailfrom_host = lc $1;    # regex drops any trailing ">" (replaces chop)
    debug("$self: mailfrom_host: $mailfrom_host");
}

if ($line =~ /^Received:\s+from\s+(\S+)\s/i) {
    $received_host = lc $1;
    debug("$self: received_host: $received_host");
}

if ($line =~ /^Message-Id: .*\@([^\@>]+)>\s*$/i) {
    $message_id_host = lc $1;
    debug("$self: message_id_host: $message_id_host");
}

# Anchored so that MAILFROM: and Resent-From: lines do not match
if ($line =~ /^From: .*\@([^\@>]+)>\s*$/i) {
    $from_host = lc $1;
    debug("$self: from_host: $from_host");
}

if (defined $from_host && $from_host ne '' &&
    $from_host eq $mailfrom_host &&
    $from_host eq $received_host &&
    $from_host eq $message_id_host) {
    my_exit("$self: probably list server bounce from: $from_host");
}
Some MTAs will bounce based on patterns in attachments, attachment type, or a positive hit from a virus scanner and send the entire virus, intact and functional, to your mail server. As annoying as this behavior is, it happens quite often. The senders are MTAs, so virus alerts must be checked for phrases indicating bounce messages:

# Phrases that mark the message as a bounce generated by an MTA
if ($line =~ /Returned mail: see transcript for details/i ||
    $line =~ /Message status - undeliverable/i) {
    if ($debug) {
        die "$self: Probable bounce: $line\n";
    } else {
        exit;
    }
}
There is also the possibility that a local message contains malware. So messages must be checked for local origination lest one's own mail server block itself. A test for a minimum number of received headers after processing the entire message can suffice in many situations:

# After the whole message is read: too few hops means local origin
if ($received_headers < $normal_received_headers) {
    my_exit("$self: Possible local message. Too few received "
          . "headers: $received_headers");
}
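Taken together, the maximum and minimum tests bracket a single expected count. The standalone sketch below shows that idea outside wormhost's line-by-line loop; the sub names and sample headers are hypothetical, and the expected count of 2 used in the example is an assumption that depends on the local MTA and mail scanner.

```perl
use strict;
use warnings;

# Count Received: headers, stopping at the blank line that ends the header.
sub count_received {
    my ($message) = @_;
    my $count = 0;
    for my $line (split /\n/, $message) {
        last if $line =~ /^\s*$/;
        $count++ if $line =~ /^Received:/i;
    }
    return $count;
}

# Compare the count with what a directly delivered message normally carries.
sub classify_by_received {
    my ($count, $normal) = @_;
    return 'relayed' if $count > $normal;   # extra hops: forwarded or bounced
    return 'local'   if $count < $normal;   # too few hops: originated locally
    return 'direct';                        # the only case considered for blocking
}

my $message = join "\n",
    'Received: (qmail 1234 invoked by uid 0); 1 Aug 2004 12:00:00 -0000',
    'Received: from unknown (HELO host) (192.0.2.99)',
    'Subject: hello',
    '',
    'body text';
print classify_by_received(count_received($message), 2), "\n";   # prints "direct"
```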
Despite all of these tests, legitimate MTAs will occasionally be blocked by the filter. The best approach is to use a white list from the very beginning so erroneous listings can be easily and permanently fixed.
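One way to honor a white list inside the filter itself is to check each candidate address against a list of protected networks before emitting a record. This sketch assumes IPv4 addresses and a hypothetical, hard-coded `@whitelist` in CIDR notation; a real deployment would more likely read the list from a file.

```perl
use strict;
use warnings;

# Hypothetical white list: MTAs that must never be blocked (CIDR notation).
my @whitelist = ('192.0.2.0/28', '198.51.100.25/32');

# Convert a dotted-quad IPv4 address to an integer; return nothing if invalid.
sub ip_to_int {
    my ($ip) = @_;
    my @o = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
        or return;
    return if grep { $_ > 255 } @o;
    return $o[0] * 16777216 + $o[1] * 65536 + $o[2] * 256 + $o[3];
}

# True if the address falls inside any white-listed network.
sub whitelisted {
    my ($ip) = @_;
    my $addr = ip_to_int($ip);
    return 0 unless defined $addr;
    for my $cidr (@whitelist) {
        my ($net, $bits) = split m{/}, $cidr;
        my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
        return 1 if ($addr & $mask) == (ip_to_int($net) & $mask);
    }
    return 0;
}

for my $ip ('192.0.2.5', '192.0.2.16', '198.51.100.25') {
    printf "%s %s\n", $ip, whitelisted($ip) ? 'whitelisted' : 'may be blocked';
}
```

Checking the white list before a record is written keeps a known MTA off the block list entirely, rather than relying on a later override.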

Prerequisites

Most of the components needed to run the filter are already installed on a typical mail server in this day and age. A message transfer agent (MTA) is, of course, required. To check for the worms, a mail scanner is needed to disassemble the mail messages, and a virus scanner is needed to check the message components for malware. Each of these components will likely have its own set of prerequisites, such as unzipping tools, MIME message processors, logging systems, and script engines.

For my tests and for the purposes of this article, the above tasks were performed by Qmail, Qmail-Scanner, and Network Associates' UVScan, respectively. I fully expect that wormhost is somewhat tied to this mix of tools: the typical number of received headers, the inclusion of the message header in the virus alert, and the malware names are not universal. Expect to tweak the code when using other MTAs, mail scanners, and virus scanners.

Wormhost requires Perl and some commonly included Perl modules, but nothing else. It reads a virus-alert message on STDIN. If it believes it has found a compromised host, it outputs a useful record type of your choosing on STDOUT.

The filter output is controlled with command-line parameters:

usage: wormhost [flags] [files]

options:
-t output tinydns records
-r output rbldns records
-b output BIND records
-z zone name for RBL queries
-d output a tab delimited record
-a output address only
-h print this usage info
-v verbose output
Typically, the virus alert would be delivered via an entry in a .qmail file or similar:

|/usr/local/worm-defense-tools/bin/process-virus-alerts.sh
This shell script simply calls wormhost with the desired parameters and puts the output in an appropriate place. In this example, the output will be in tinydns format with the base domain set to wbl.example.com and appended to a tinydns data file:

#!/bin/sh
/usr/local/worm-defense-tools/bin/wormhost \
  -tzwbl.example.com >> ~/tinydns/data
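The `-tzwbl.example.com` argument bundles two options: `-t` selects tinydns output, and `-z` takes the rest of the argument as the zone name. Perl's core Getopt::Std module parses bundled flags this way, as the sketch below shows. The option string is taken from the usage text; whether wormhost itself uses Getopt::Std is an assumption, and `parse_args` is a hypothetical helper.

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse an argument list with the option letters from the usage text.
# "z:" means -z takes a value; all other letters are simple flags.
sub parse_args {
    local @ARGV = @_;
    my %opt;
    getopts('trbz:dahv', \%opt) or die "unknown option\n";
    return (\%opt, [@ARGV]);    # remaining arguments are input files
}

my ($opt, $files) = parse_args('-tzwbl.example.com');
print "tinydns output for zone $opt->{z}\n" if $opt->{t};
# prints "tinydns output for zone wbl.example.com"
```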
Wormhost supports several output formats because there are several ways to implement the block list. The blocking methods generally fall into two categories: real-time black lists (RBLs) and TCP server deny lists.

Creating an RBL is fairly easy, makes the list available to as many hosts as desired, and is very well supported by modern MTAs. Customized responses delivered to the blocked host are easy to configure through the use of TXT records in DNS. White lists are well supported using the RBL standards, making corrections quick and easy. That said, using an RBL will incur some propagation delay if the MTA is not talking directly to the DNS server hosting the RBL. During that delay, the infected host can keep delivering worms to the very MTA whose alerts produced the blacklist entry.

Implementing a black list in a TCP server is a reliable way to block a host immediately. Generating such a black list is about as easy as generating RBL entries. On the downside, the built-in data distribution methods of a DNS-based RBL are not available, though one could be grafted on with rsync or similar tools. Some TCP servers do not support customized response messages. With such a server, rejected clients might not receive a meaningful response, if they get one at all. A meaningful message like "Host blocked due to Netsky worm" tells the recipient your server is up and healthy, if perhaps a bit too closed to normal mail traffic. The rest of this article shows an RBL implementation.
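For comparison, a deny list for tcpserver takes only a few lines of Perl to generate. This sketch emits tcprules input, one `address:deny` rule per blocked host; those lines would then be merged with the site's existing rules (for example, /etc/tcp.smtp) and compiled into the CDB file named by tcpserver's `-x` option. The sub name is hypothetical.

```perl
use strict;
use warnings;

# Turn blocked addresses into tcprules input lines, dropping duplicates
# while keeping the order in which hosts were first blocked.
sub tcprules_deny_lines {
    my %seen;
    return map { $_ . ':deny' } grep { !$seen{$_}++ } @_;
}

print "$_\n" for tcprules_deny_lines('192.0.2.99', '198.51.100.7', '192.0.2.99');
# prints:
# 192.0.2.99:deny
# 198.51.100.7:deny
```

After merging, the rules would be compiled in the usual way, for example `tcprules /etc/tcp.smtp.cdb /etc/tcp.smtp.tmp < /etc/tcp.smtp`.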

Real-Time Black Lists

For my real-time block list, I used Dan Bernstein's tinydns. The installation of tinydns is fairly standard except for a few modifications to the Makefile for data.cdb. The Makefile must include a zone file for the worm black list and a zone file for the white list. If you prefix all of your zone files with "db.", the Makefile might look something like this:

remote: data.cdb
	/usr/bin/rsync -az -e ssh /service/tinydns/root/data.cdb \
	  192.0.0.1:/service/tinydns/root/data.cdb

data.cdb: data
	/usr/local/bin/tinydns-data

# Concatenate every db.* zone file, skipping editor backups ending in "~"
data: db.*
	@/bin/echo "# DO NOT EDIT THIS FILE DIRECTLY" > data
	/bin/ls -1 db.* | /bin/grep -v '~$$' | /usr/bin/xargs \
	  -I {} /bin/cat {} >> data
Wormhost can output the needed tinydns records as shown above. Unfortunately, the records output by wormhost are not necessarily unique, so they must be sorted and de-duplicated ("uniqued") before they are concatenated with other tinydns data and compiled.
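The sort-and-unique step amounts to the following Perl, equivalent in spirit to `sort -u`. The sample records are illustrative tinydns-data lines (a `+` A record and a `'` TXT record) for a hypothetical listing, not verbatim wormhost output, and `unique_records` is a hypothetical helper.

```perl
use strict;
use warnings;

# Drop blank lines and duplicates, then sort, much like "sort -u".
sub unique_records {
    my %seen;
    my @uniq = grep { length($_) && !$seen{$_}++ } @_;
    return sort @uniq;
}

my @records = (
    "+99.2.0.192.wbl.example.com:127.0.0.2:3600",
    "'99.2.0.192.wbl.example.com:Netsky worm:3600",
    "+99.2.0.192.wbl.example.com:127.0.0.2:3600",
);
print "$_\n" for unique_records(@records);   # two lines; the duplicate A record is gone
```

Writing the result to its own db.* file lets the Makefile pick it up along with the other zone files.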

Listing 2 shows a daemon that places a file of unique records in the tinydns data directory and recompiles data.cdb. This script was written to run under Dan Bernstein's daemontools; it requires that package, or something similar, to set its environment variables and run it as a daemon.

With the data available via DNS, it is a fairly simple matter to reconfigure an smtpd to use the block list. With qmail, one need only insert a line to run rblsmtpd into the /var/qmail/supervise/qmail-smtpd/run script:

...
exec /usr/local/bin/softlimit -m 10000000 \
    /usr/local/bin/tcpserver -v -R -l 0 -x /etc/tcp.smtp.cdb  \
        -c "$MAXSMTPD" -u "$QMAILDUID" -g "$NOFILESGID" 0 smtp \
    rblsmtpd -a whitelist.example.com -r wbl.example.com \
    /var/qmail/bin/qmail-smtpd \
    2>&1
In the example above, rblsmtpd allows hosts listed in whitelist.example.com, while denying access to hosts listed in wbl.example.com.
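Both lookups follow the usual DNS-list convention: the octets of the client's IPv4 address are reversed, the zone name is appended, and a record at the resulting name marks the address as listed. The sketch below computes that query name; `rbl_query_name` is a hypothetical helper, and the DNS lookup itself is left out so the example runs offline.

```perl
use strict;
use warnings;

# Build the DNS name queried to test an IPv4 address against a
# DNS-based list: the octets in reverse order, followed by the zone.
sub rbl_query_name {
    my ($ip, $zone) = @_;
    my @octets = split /\./, $ip;
    return join '.', reverse(@octets), $zone;
}

print rbl_query_name('192.0.2.99', 'wbl.example.com'), "\n";
# prints 99.2.0.192.wbl.example.com
print rbl_query_name('192.0.2.99', 'whitelist.example.com'), "\n";
# prints 99.2.0.192.whitelist.example.com
```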

When setting up an RBL, keep in mind that some DNS servers are designed specifically to serve RBLs and can simplify the task. Dan Bernstein's rbldns and Michael Tokarev's rbldnsd both accept simpler input records and address other special needs of RBLs.

MTA Pilot Error

One of the little white lies of this project is that we can avoid blocking MTAs because a real MTA would, at worst, relay a worm. A relay is pretty easy to see in the headers, so we can be kind to relays and not block them. The fallacy is that an MTA can itself be the originator of the worm.

There are lots of MTAs that run on the MS Windows family of products, the same platform targeted by the significant email worms. There is little to stop a careless systems administrator from running an email client at the console of a Windows host that also runs an MTA, executing a worm, and compromising the host.

This scenario is not wild conjecture. My email worm filter blocked a particular host multiple times for sending email worms. Virus-alert logs showed a pattern exactly like that of a non-MTA worm-infected host. Based on normal emails, worm-laden emails, and tests with the blocked MTA, I can only conclude the situation I described above occurred multiple times on this host.

On one level, this story is a tale of sloppy systems administration, but it is also a pitch for white lists. Some real MTAs will be blocked by this system, so be prepared to fix the problem by implementing a white list from the beginning.

Futures

What I have shown here is useful, but not complete. Chiefly, the output of wormhost does not lend itself to good logging. None of the record formats timestamps the output, nor are the records serialized in a way that ties them to the virus alerts or quarantine messages that precipitated the block.
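As one possible direction, and purely as a hypothetical sketch, a tab-delimited record could carry a UTC timestamp and the Message-Id of the alert that triggered the block, so each listing can be traced back to its source. The field order and the `block_log_record` name are assumptions; only POSIX::strftime, from a core module, is used.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical log record: UTC timestamp, blocked address, malware name,
# and the Message-Id of the alert that triggered the block.
sub block_log_record {
    my ($epoch, $ip, $malware, $alert_id) = @_;
    my $stamp = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $epoch);
    return join("\t", $stamp, $ip, $malware, $alert_id);
}

print block_log_record(time, '192.0.2.99', 'W32/Netsky.p@MM',
                       '<alert-1@mx.example.com>'), "\n";
```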

The sys admin of a blocked MTA might want to know why her MTA is blocked and how to get it unblocked. There is no provision in this system to show a URL that documents the block. A database containing the details of the block events, a Web server, and a bit of PHP could deliver this, but that is not implemented here.

In the long run, worms are likely to change and make wormhost less effective. The best way to combat this is an analysis of the negative hits, but there is no provision in wormhost to collect the negative hits or provide statistics on them.

Conclusion

We need not sit idly by and let compromised hosts assault our servers. Adaptive technologies, such as those described here, can exploit the patterned behavior of malware to automatically protect our systems' resources so they can do the work to which we have tasked them.

Acknowledgments

I thank Ray Strubinger, Dr. Robert G. Frank, and my wife, Susan, for their support and encouragement as these tools went from a few scraps of undocumented shell script to something suitable for public consumption.

References

DaemonTools -- http://cr.yp.to/daemontools.html

DJBDNS -- http://cr.yp.to/djbdns.html

Network Associates -- http://www.nai.com

Perl Cookbook, First Edition, by Tom Christiansen & Nathan Torkington, ISBN 1-56592-243-3.

Qmail -- http://cr.yp.to/qmail.html

Life with qmail -- http://www.lifewithqmail.org/

Qmail-Scanner -- http://qmail-scanner.sourceforge.net/

rbldnsd -- http://www.corpit.ru/mjt/rbldnsd.html

FAQ for rbldnsd and dnscache -- http://surbl.org/dnscache-rbldnsd.html

rblsmtpd -- http://cr.yp.to/ucspi-tcp.html

rblsmtpd and rbldns HOWTO -- http://ladro.com/docs/dns/rblsmtpd.html

Philip Chase graduated from Rice University in 1986 with a degree in Mechanical Engineering. He currently heads the information technology group at the College of Public Health and Health Professions at the University of Florida, where he maintains Linux and NetWare systems. Philip can be reached at: pbc@afn.org.