Cover V13, i06

Procmail Hints and Hacks

Kevin Shortt

This article covers the popular mail delivery agent procmail, written by Stephen R. van den Berg. If you are familiar with procmail and its uses, then this article will suit you well. In it, I'll describe some procmail filtering techniques that I have used as an email administrator over the years. These recipes illustrate how I filled some real-world needs with procmail.

I will begin with a simple backup recipe for all incoming mail. No recipe gets simpler than this. It begins with its declaration (:0) and the flag c. Essentially, this means to use a copy (c) of the incoming message and place it in the folder called "backup". Placing this recipe first in your .procmailrc file will ensure that a copy of all incoming mail will be put aside before any other recipe erroneously destroys it.

The recipe:

 :0 c
 backup

This can be useful if you are in the early stages of learning the power of procmail. Additionally, as an administrator, it is useful to place this into users' .procmailrc files as an easy way to restore inboxes for users who remove their email from the server with a PC client and then watch their PC die with lots of unread mail. I implement this for some select cases in such a way that a backup copy of each message is placed into a separate file with the date used as part of the filename. Then, every night I delete any backup inboxes that are older than three or five days. Each case is different. I will expand further on the files named by date later in this article.

Auto-Responding

Next, let's look at a simple auto-responder. This auto-responder recipe uses a copy of the incoming message, checks whether the message is for a particular alias, extracts the sender, and then generates a reply that includes a new subject and a pre-written message body taken from a text file. It then emails a reply to the original sender.

The recipe:

 :0 hc
 * ^TOalias@sysadminmag.com
 * !^FROM_DAEMON
 * !^FROM_MAILER
 * !^X-Loop: reader@sysadminmag.com
 | (/usr/bin/formail -r                    \
    -I"From: Firstname Lastname <reader@sysadminmag.com>" \
    -I"Subject: *** OUT OF OFFICE ***"     \
    -A"X-Loop: reader@sysadminmag.com";    \
    /bin/cat /home/reader/outofoffice.msg) \
   | /usr/sbin/sendmail -oi -t

This recipe begins with its declaration (:0) and the flags h and c. This line essentially means to use a copy (c) of the incoming message and process only its headers (h). The filter conditions follow next. The conditions are one or more lines that begin with a "*". The first condition in this recipe verifies that the incoming email message is actually intended for the address you've chosen. This is not a necessary condition, but it is helpful when you have alias email spooling into a shared mailbox and you only want to send auto-replies for emails that were sent to the alias address.

The two lines "* !^FROM_DAEMON" and "* !^FROM_MAILER" tell procmail to skip any incoming message that appears to come from a daemon or mailer, such as mailer-daemon or postmaster. This will help prevent you from sending an auto-reply in the event the incoming message is from a mailing list. These two tokens are not ordinary variables; they are built-in macros that expand to a much larger regular expression. Please read the procmailrc(5) man page (in the MISCELLANEOUS section) for further details.

There are many dangers associated with allowing your auto-responder to post back to a mailing list. It is clearly not a good idea. The line "!^X-Loop: reader@sysadminmag.com" will prevent a reply from being generated if the email address (reader@sysadminmag.com) is listed in an X-Loop: header of the incoming message. If a message contains this X-Loop: header with our email address, it is likely that this recipe has already replied to a message whose sending email address also has an auto-responder set up. This prevents the endless looping of replies.
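To see what the X-Loop condition is checking, here is a minimal shell sketch that approximates it outside procmail. The function name should_reply is hypothetical, and the sketch only imitates the one condition: it looks at the header block alone (everything before the first empty line) and ignores case, as procmail conditions do by default.

```shell
# should_reply ADDRESS < message
# Hypothetical helper that approximates the "* !^X-Loop: ADDRESS" condition.
# Returns 0 (reply) when no X-Loop header names ADDRESS, 1 (skip) otherwise.
should_reply() {
    # sed prints lines up to and including the first empty line, then quits,
    # so an X-Loop string that only appears in the body is never seen.
    if sed '/^$/q' | grep -i '^X-Loop:' | grep -qiF "$1"; then
        return 1    # our own marker is present: replying again would loop
    fi
    return 0
}
```

For example, "should_reply reader@sysadminmag.com < saved.msg && echo reply" prints "reply" only for a saved message that does not already carry the marker.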

The next line of our recipe is the action line. Procmail only allows one action line. So for readability, the end of each line is escaped with a backslash to indicate the line is to be continued onto the next line. Our action uses formail, cat, and sendmail to generate the recipe's response.

Formail is a component of the procmail package that can be used as a filter to format or reformat an email message. The action line in this recipe consists of two main components that each begin with a pipe "|". The first component pipes the message to a single block of parentheses that executes formail and /bin/cat to generate the reply. The formail call prepares the headers for the reply message. This step is crucial, because at this point in the recipe, we do not know who to reply to; formail handles that for us here.

The "-r" instructs formail that we want only to generate an auto-reply header. The "-I" flag will replace or insert a new field into the header of the outgoing message. The From field is optional; it is inserted here to control the name and address the reply appears to come from. The Subject header that formail would otherwise generate gets fully replaced with our new one. If you would like the reply to carry the Subject of the incoming message instead, simply omit the -I"Subject: ..." argument. The "-A" flag is for appending custom headers to the message.

This recipe adds the message header "X-Loop: reader@sysadminmag.com". You should replace that address with the address of the account sending the reply. This header helps prevent mail loops with the given mail account. If another auto-responder answers our auto-reply and preserves the header, the X-Loop condition discussed above keeps this recipe from replying again. Once all of the arguments to formail are complete, it is important to note that a semicolon is necessary to be able to execute a second command inside the first set of parentheses.

Before the close of the parentheses, the recipe shells out to execute /bin/cat to grab the message body from a predefined file in the user's home directory. Inside the file should appear the text you would like the reader of the auto-reply to see as the body of the message. The parentheses are typically used when multiple shell commands are required to generate your reply. In this case, the recipe uses formail and cat to generate a message that gets piped to the second component of our recipe.

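The second component is how the response is sent. The recipe now shells out to sendmail to send the auto-reply. The argument "-oi" tells sendmail not to treat a line containing only a single dot as the end of the message. The "-t" tells sendmail to read the recipients from the headers of the message itself (To:, Cc:, and Bcc:), which is why the To: header that formail generated is enough to address the reply.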
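The grouping used in the action line is plain shell behavior and can be tried on its own. The following is a minimal sketch, assuming a file of prepared headers stands in for the formail step; the function name build_reply and its arguments are hypothetical.

```shell
# build_reply HEADER_FILE BODY_FILE   (hypothetical helper, not part of procmail)
# Everything inside the parentheses runs in one subshell, so the combined
# output of all commands travels down a single pipe as one message.
build_reply() {
    (
        cat "$1";    # stand-in for the formail step: prints the reply headers
        echo "";     # blank line that separates headers from body
        cat "$2"     # canned body text, like outofoffice.msg in the recipe
    )
}
```

Running "build_reply headers.txt outofoffice.msg" prints the assembled message to the terminal for inspection; appending "| /usr/sbin/sendmail -oi -t" would send it. Unlike this stand-in, formail emits the separating blank line itself, which is why the recipe has no "echo" between formail and cat.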

Adding an Attachment

Now let's expand this recipe to include adding an attachment with the reply. I have had a few requests over the years that have asked me to set up an auto-reply for an incoming email alias that would reply with a PDF or an MSWord document with marketing information regarding a client's products. For example, business Web site A sells widgets. They offer free information for their widgets by stating in their advertisements that the customer can simply send email to "freeinfo@websiteA.com" to receive a free brochure via email. A potential customer emails "freeinfo@websiteA.com" and a reply is immediately sent out with a MIME attachment of a predefined electronic brochure.

The following recipe expands on the previous one by changing the layout of the message into a MIME-formatted message. (MIME stands for Multipurpose Internet Mail Extensions and is defined in RFC 2045.) Essentially, the defined format allows each message and all of its components, including body and attachments, to exist and be printed as plain ASCII text. Consequently, each reply that our recipe builds is plain ASCII text.

The recipe:

 :0 hc
 * ^TOalias@sysadminmag.com
 * !^FROM_DAEMON
 * !^FROM_MAILER
 * !^X-Loop: reader@sysadminmag.com
 | (/usr/bin/formail -r                    \
    -I"From: Firstname Lastname <reader@sysadminmag.com>" \
    -I"Subject: *** Free Info from WebSite A ***"     \
    -A"X-Loop: reader@sysadminmag.com"     \
    -I"MIME-Version: 1.0"\
    -I"Content-Type: multipart/mixed; \
      boundary=\"8323328-1766922534-967595046=:27818\"";\
    echo "--8323328-1766922534-967595046=:27818";\
    echo "Content-type: TEXT/PLAIN; charset=US-ASCII";\
    echo "";\
    /bin/cat /home/reader/FreeInfoGreeting.txt;\
    echo "--8323328-1766922534-967595046=:27818";\
    echo "Content-Type: APPLICATION/MSWord; \
      NAME=\"brochure.doc\"";\
    echo "Content-Transfer-Encoding: base64";\
    echo "Content-Disposition: \
      attachment; filename=\"brochure.doc\"";\
    echo "Content-Description: MSWord Document";\
    echo "";\
    /usr/bin/mimencode -b /full/path/to/brochure/brochure.doc;\
    echo "--8323328-1766922534-967595046=:27818--"\
    ) | /usr/sbin/sendmail -oi -t

Since I've explained most of this recipe previously, I will only elaborate on the additions. I did not read the RFC to learn how to do this. I essentially studied my inbox using vi. Then I wrote this recipe to mimic a MIME-formatted message, which is a message created with sections that are differentiated with unique boundary tags.

The additions to this recipe include adding a "MIME-Version: 1.0" header that communicates to the mail client software (Mail User Agent -- MUA) that the message is in MIME format. A Content-type header is inserted to indicate to the MUA that the message is a multipart/mixed message with a boundary tag of "8323328-1766922534-967595046=:27818". This boundary tag is only a unique marker to indicate the end of a section of the MIME format. It's important to note that this string can be anything reasonably unique. In this recipe, it means that each section in the outgoing message will begin and end with "--8323328-1766922534-967595046=:27818". Note that each actual boundary must begin with a "--".

Inside each section, a Content-type header must also be inserted. This communicates to the MUA the type of data that exists in the current section. This allows the MUA to determine the appropriate software to display or execute the content to the user. In this new recipe, we create a message with only two sections to our MIME-formatted message. More than two sections can be used. This is done by using the boundaries that were declared inside our first Content-type header. One of our sections is "Content-type: TEXT/PLAIN" and the other is "Content-Type: APPLICATION/MSWord". This means that we have a MIME-formatted message with a body in simple plain text with an attachment in MSWord format.

The body of the message (TEXT/PLAIN) was placed into a file called "/home/reader/FreeInfoGreeting.txt". This was placed into the message using the cat command. The MSWord document needed more work in order to be included in the MIME format. A header "Content-Transfer-Encoding: base64" was added to let the MUA know it needs to decode the attachment using the Base64 method.

Before the MUA can decode the message, however, we must first encode it. So, once the "Content-Disposition" and "Content-Description" headers are added along with an empty line, the recipe creates the MIME-encoded attachment by using the command-line executable /usr/bin/mimencode. This creates a flat-text version of the MSWord document that is easily decoded by almost every MUA. Finally, we wrap the attachment with a final boundary. Please note the trailing two dashes "--", which indicate to the MUA that this is the final boundary of our message.

The easiest way to learn how this is done would be to send yourself an email with an attachment. Then, view the flat email file on your Unix server to see how it was put together. Be sure that you experiment with different attachment types and MUAs. Also note the boundary strings and how they vary. If you have a lot of time, you could also read RFC 2045.
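For experimenting with the layout outside procmail, here is a minimal shell sketch that assembles the same two-part structure and prints it. It makes several assumptions: the function name build_mime and the BOUNDARY override are hypothetical, the coreutils base64 command is used in place of mimencode, and the attachment is labeled with the generic application/octet-stream type rather than a Word-specific one.

```shell
# build_mime TEXT_FILE ATTACHMENT_FILE ATTACHMENT_NAME   (hypothetical helper)
# Prints MIME headers, a plain-text part, and one base64-encoded attachment.
build_mime() {
    # Any string unlikely to occur in the content will do as a boundary.
    boundary=${BOUNDARY:-"example-boundary-$$"}

    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: multipart/mixed; boundary="%s"\n\n' "$boundary"

    # Part 1: the plain-text greeting. Each part starts with "--" + boundary,
    # then its own headers, then an empty line, then the content.
    printf '%s\n' "--$boundary"
    printf 'Content-Type: text/plain; charset=US-ASCII\n\n'
    cat "$1"

    # Part 2: the attachment, base64-encoded so it travels as plain text.
    printf '%s\n' "--$boundary"
    printf 'Content-Type: application/octet-stream; name="%s"\n' "$3"
    printf 'Content-Transfer-Encoding: base64\n'
    printf 'Content-Disposition: attachment; filename="%s"\n\n' "$3"
    base64 "$2"

    # The closing boundary carries two extra trailing dashes.
    printf '%s\n' "--$boundary--"
}
```

Redirecting the output to a file and opening it in vi next to a real message from your inbox makes the boundary and header placement easy to compare.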

Halting Viruses

The next recipe I will present is a trick I use to stop viruses from reaching my users' inboxes. Anti-virus software packages can typically protect your computer better than the average procmail recipe. However, sometimes analyzing the message before the vendor's update is available can allow you to create a procmail recipe that will work in the interim. I have used this method numerous times.

There are also many good procmail sites that offer free recipes for virus defense. I definitely recommend that users implement them if it makes sense for their system. The best way to stop an incoming virus is to get hold of a few copies of it. Using vi or your favorite text editor on your Unix machine, examine the message in plain text with the encoded MIME format. Typically, there is a common text string, a signature if you will, that can be gleaned from its attachment. Once you determine that common signature, a simple recipe can be written to filter out the incoming message. If I don't have the time to gather the information myself, I can usually find it by searching the Internet.

The recipe:

  :0 B
  * > 20000
  * < 35000
  * (8HcggH2gd2Bl7w1Admz|di\+Nn/wLVnu82DbWBlONfE1TBxRhYxk7Oot8JGkDgznr\+FtGdb2FwHSjjMmFGMe2a7iApeie)
  * filename=".*\.(pif|exe|scr|zip|bat|cmd)"
  * ^ *Content-Disposition: attachment;
  /tmp/netsky

This recipe illustrates an example of how to stop an inbound virus. This particular recipe will only stop two forms of a variant of the NETSKY virus that was launched in January 2004. This recipe should be placed into your server's global procmailrc file, which is typically /etc/procmailrc. The recipe reads as follows: Match any message that is larger than 20,000 bytes yet smaller than 35,000 bytes AND contains either the string "8HcggH2gd2Bl7w1Admz" OR "di+Nn/wLVnu82DbWBlONfE1TBxRhYxk7Oot8JGkDgznr+FtGdb2FwHSjjMmFGMe2a7iApeie" in the message body (indicated by the "B" after the declaration) AND has a "Content-Disposition: attachment" line in a MIME part header AND an attachment with the file extension "pif" OR "exe" OR "scr" OR "zip" OR "bat" OR "cmd". The plus signs in the second string are escaped with backslashes in the recipe because an unescaped "+" is a repetition operator in procmail regular expressions and would not match a literal plus sign.

The matched strings were taken from the MIME-encoded attachment of the actual virus. If an incoming message matches all of these attributes, it will be delivered to the file /tmp/netsky. This will keep the incoming matched messages out of your users' inboxes. The message can also be thrown away immediately by saving it to /dev/null. I always prefer, however, to save the messages to a file in /tmp to ensure I have the ability to extract a message in the event of a false capture.

This file can be purged as often as you like. This recipe is only a very small illustration of the many ways procmail can be used to filter incoming viruses. I still recommend that you deploy a professional grade anti-virus software to protect from incoming email viruses.
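Before deploying a signature, it helps to check it against the captured samples and against a few known-good messages. The following is a rough shell sketch of the same three tests (size window plus literal string), not a substitute for the recipe; the function name matches_signature is hypothetical. It uses grep -F so that characters such as "+" are matched literally rather than as regular-expression operators.

```shell
# matches_signature FILE MIN_BYTES MAX_BYTES SIGNATURE   (hypothetical helper)
# Returns 0 when FILE is strictly between the two sizes and contains
# SIGNATURE as a literal string, mirroring the "> MIN", "< MAX" and
# signature conditions of the recipe.
matches_signature() {
    # Arithmetic expansion strips any padding some wc versions add.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt "$2" ] && [ "$size" -lt "$3" ] && grep -qF -- "$4" "$1"
}
```

For example, "matches_signature sample.msg 20000 35000 8HcggH2gd2Bl7w1Admz && echo caught" reports whether a saved sample would satisfy those conditions. Note that this checks the whole file, headers included, so it is only an approximation of the body-only match.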

Handling Spam

In the world of filtering email, filtering by date can be very useful. There are many great uses for this concept. One of the ways I have implemented this concept is with SpamAssassin. I use SpamAssassin for filtering spam on an email server. The following recipe looks at each incoming message that has been processed by SpamAssassin and checks for the "X-Spam-Status: Yes" header. It then files the message away upon a match. In my implementation, I file it to a dedicated filesystem and into a mail file with the current date as part of the filename. Then, each evening via crond, I run a script that walks the spam directory structure that I devised and deletes any spam file older than two weeks. This gives each user two weeks to check the captured spam for false positives. A symbolic link named "spam" inside a user's mail directory gives that user access to their spam folder on the separate filesystem. It also reduces the time I spend managing the growth of the filesystem.

For this solution, I have used variables to set the folder name. It is important that the folder name be in a directory that is already created. This way, spooling into filenames of a pre-defined format is easy. A pre-defined format enables you to write a script that uses the find command to delete spam files older than a given period. Place the recipe variables near the top of your .procmailrc file and place the recipe just below your SpamAssassin recipe.

The variables:

 USERNAME=reader
 SPAMFOLDER=/spam/reader
 DATE=`/usr/bin/date +%Y-%m-%d`

The recipe:

 :0:
 * ^X-Spam-Status: Yes
 $SPAMFOLDER/$USERNAME-$DATE

This is mostly self-explanatory. Set the USERNAME variable to your username. Set the SPAMFOLDER variable to the directory in which you will be placing daily spam files and ensure that this path exists before deploying the recipe. Set the DATE variable to the output of the date command by enclosing the command in backquotes; with ordinary single quotes, the variable would hold the command text instead of the date. Change the arguments to /usr/bin/date to suit your needs. I set this one to create a string of YYYY-MM-DD. The trailing colon in ":0:" tells procmail to use a lock file while it appends to the spam file.
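The nightly cleanup can be a very small script built around find. This is a minimal sketch under stated assumptions: the function name purge_old_spam is hypothetical, and the filename pattern assumes the name-YYYY-MM-DD scheme described in this article, so other files in the directory are left alone.

```shell
# purge_old_spam DIRECTORY DAYS   (hypothetical helper)
# Deletes regular files under DIRECTORY whose names end in -YYYY-MM-DD and
# whose last modification is more than DAYS days old. find counts age in
# whole 24-hour periods, so "+14" means at least 15 full days.
purge_old_spam() {
    find "$1" -type f \
        -name '*-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]' \
        -mtime +"$2" -exec rm -f {} +
}
```

Calling "purge_old_spam /spam 14" from a script run nightly by crond gives the two-week retention described above. Replacing "-exec rm -f {} +" with "-print" first is an easy way to preview what would be deleted.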

This concept is also great for role accounts. Email that arrives for postmaster could be filtered and archived in the same manner. It is often mail of this type that gets deleted without being read. Archiving for a set number of days, weeks, or months could be applied and help put an end to wasting resources. It can also be distributed with INCLUDERC commands in the global procmailrc file if you are using SpamAssassin for all users on your server.

In conclusion, procmail is a great tool for systems administrators. If you are creative enough, you can solve many problems using procmail. I have only scratched the surface of some of the things that can be done with it. There are many sites on the Internet that offer free recipes to filter incoming viruses and spam. The procmail Web site has many links at the bottom to get you started. Other resources for information include the procmailex and procmailrc man pages, as well as your favorite search engine. I have been using these resources and procmail for years and will continue to do so for years to come.

References

IETF RFC 2045 (MIME) -- http://www.ietf.org/rfc/rfc2045.txt

Procmail -- http://www.procmail.org

SpamAssassin -- http://www.spamassassin.org

Sendmail -- http://www.sendmail.org

Man pages -- procmail(1), procmailex(5), procmailrc(5)

Kevin Shortt has been working in a Unix Systems Engineering and Administration role with occasional programming for almost seven years. He is a graduate of the University at Buffalo with two Bachelor of Science degrees in Computer Science and Business Administration. He enjoys computers, playing soccer, and spending time with his wife and three children. He can be reached at: shortt@cgicafe.com.