Cover V14, i13

Successful Spam Filtering

Jeffrey Fulmer

Email is an effective and inexpensive collaboration tool. Ever since Ray Tomlinson chose the @ sign to separate a username from its host computer, electronic mail has become an integral part of our lives. Today nearly 60 billion messages are sent daily. Of that total, more than 60 percent can be classified as spam. Productivity gains derived from email are offset by this nuisance.

Electronic junk mail is damaging to both mail systems and employee productivity. Nearly every enterprise is affected by it, yet, according to Gartner Research, only 10 percent have spam-filtering technologies in place. This lapse is not for lack of filtering technologies; there are many products from which to choose. Filtering solutions that fail to consider business requirements, however, will not succeed. This article will examine "best practices" for a successful implementation.

Any systems administrator who has participated in a project that affects large numbers of end users understands that spam filtering should not be taken lightly. It affects nearly every computer user in the company. The risks are high, but so are the rewards. If you reduce daily mailbox maintenance by 5 minutes for each of 1000 employees, you recover roughly 83 staff-hours a day, or the cost of about 10 average salaries at an 8-hour workday. Those savings don't include bandwidth reduction or the damage prevented by virus quarantine. The line is fine, however. You can wipe out those savings by deleting a single time-sensitive business contract or an important sales lead. One such occurrence could be enough to derail your entire filtering project. For this reason, it is necessary to examine the environment and the culture into which you plan to introduce spam filtering.
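The savings estimate can be checked with a few lines of arithmetic. The sketch below takes the 5 minutes and 1000 employees from the text; the workday length is the main assumption, and 8 hours is used only as an illustrative default. A longer assumed workday lowers the salary count.

```python
# Rough estimate of staff time recovered by cutting daily mailbox cleanup.
MINUTES_SAVED_PER_EMPLOYEE = 5   # from the text
EMPLOYEES = 1000                 # from the text
WORKDAY_HOURS = 8                # assumption: illustrative workday length


def full_time_equivalents(minutes_saved, employees, workday_hours):
    """Return (staff-hours recovered per day, equivalent full-time staff)."""
    hours_per_day = minutes_saved * employees / 60
    return hours_per_day, hours_per_day / workday_hours


hours, fte = full_time_equivalents(
    MINUTES_SAVED_PER_EMPLOYEE, EMPLOYEES, WORKDAY_HOURS
)
print(f"{hours:.1f} staff-hours per day, about {fte:.1f} full-time salaries")
# prints: 83.3 staff-hours per day, about 10.4 full-time salaries
```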

Requirements Gathering

As in any business endeavor, communication is the key to success. From the outset, you must engage business units in order to fit your filter into the enterprise. Their feedback is vital. Explain the effort and its importance to the company. Few people like spam, so this is not a hard sell. If they want your project, it is more likely to succeed. Once they agree to combat spam, explain your proposal in detail. Then be prepared to listen.

Concerns will arise as details become apparent. New technologies are often greeted with suspicion, and this one is no exception. When users learn that private conversations will be filtered and manipulated, you will have their complete attention. Some employees will invoke Big Brother and sow seeds of discontent. Most will worry about messages that never get delivered. Envision this meeting before it occurs. When you select a technology for implementation, realize that it has to be flexible. Many of the questions you encounter will be non-negotiable requests. Can a particular address always go through? Can you let all my messages through? The more legitimate concerns you accommodate, the better your chances for acceptance.

Nothing will scrap your solution more quickly than business disruptions. If mail from new contacts is deleted or timely information is quarantined, backlash will be immediate. The very thought of such interruptions gives people pause. If you break business functionality, management may scrap your system or scale it back until it is rendered meaningless. During these sessions, you must gather information to build a comprehensive picture. Which domains should be whitelisted? Mail from some companies should always go through. Does your daily terminology match a pattern of spam? For example, SpamAssassin has one file dedicated to pornography keywords and another devoted to medicinal drugs. Its keyword phrases file contains a large section on low-cost loans and credit cards. Administrators in the pharmaceutical or financial industries need to be sensitive to this reality before implementation. The more tuning you can perform in advance, the less pain you'll suffer later.

Few groups will be affected more by this project than the help desk. They might receive calls even before implementation. Once users learn that email might be filtered for spam, they may call the help desk at the first sign of a problem. If a user discusses a filter with a help desk operator who knows nothing about it, confusion will reign. A sustained increase in help desk tickets will undermine gains from spam reduction. A carefully planned implementation will alleviate this problem. Make sure help desk personnel are aware of the project from the outset, so that they can answer an early call with confidence: "The spam filter hasn't been installed yet; I'll open a ticket with the mail administrator." Once the project is implemented, you can reduce calls by automating as much as possible. Allow people to view quarantined mail before it expires. A Web interface that provides user-driven whitelisting will make everyone's life easier. The more control you can put on the desktop, the fewer calls and tickets you will generate.

Preparation

There are two key elements to a successful filtering implementation -- effective communication and gradual assimilation. Business units are engaged in order to gather the necessary requirements. Once you have that information, software selection is easy. The administrator simply needs to select a package that meets the requirements. If you work in a shop that has not fully embraced open source technology, then a thorough list of requirements could help you make the case for a particular software package. A comparison of features and requirements will help demonstrate its appropriateness.

While this article is focused on a successful implementation, one issue must be addressed with regard to software selection. To adequately evaluate a product, it is important to understand the methods used to produce the false positive and false negative numbers the vendor advertises. If false positives are tracked as a percentage of all mail rather than as a percentage of mail that was flagged as spam, then the vendor's numbers will be considerably lower than actual user experience. The discrepancy will be compounded if you rely on the vendor's unrealistic data when you introduce the project to business units.
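The effect of the denominator is easy to see with numbers. The following is a minimal sketch; the function name and the sample counts are hypothetical and chosen only to illustrate how the same set of false positives yields two very different percentages.

```python
def false_positive_rates(total_mail, flagged_as_spam, false_positives):
    """Return the false positive percentage against two denominators."""
    if total_mail <= 0 or flagged_as_spam <= 0:
        raise ValueError("mail counts must be positive")
    return {
        # How a vendor might report it: share of every message processed.
        "pct_of_all_mail": 100.0 * false_positives / total_mail,
        # Closer to what users notice: share of the mail that was flagged.
        "pct_of_flagged_mail": 100.0 * false_positives / flagged_as_spam,
    }


# Hypothetical day: 10,000 messages, 2,000 flagged, 20 of those legitimate.
rates = false_positive_rates(
    total_mail=10_000, flagged_as_spam=2_000, false_positives=20
)
print(rates)
# prints: {'pct_of_all_mail': 0.2, 'pct_of_flagged_mail': 1.0}
```

In this made-up case the same 20 mistakes read as 0.2 percent or 1 percent depending on the denominator, so it is worth asking a vendor which one was used.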

To alleviate concern for business disruption, a good filtering system should offer several levels of granularity. Users should be provided with a mechanism to opt out of the process entirely. An automatic process allowing users to whitelist important email addresses or domains reduces administrative overhead and engages users. The more options you provide, the more empowered they'll feel. If your solution is another tool in their box, then they'll be more likely to accept this new technology and assist its implementation.

No amount of preparation is going to prevent false positives; legitimate messages will be flagged as spam. When the filter is first installed, test it in "audit" mode. During this phase, messages are filtered and scored, data is collected, and reports are generated, but no messages are deleted or quarantined. The administrator should try to paint a comprehensive portrait of enterprise spam. The idea is to establish score ranges for obvious spam, likely spam, and legitimate messages (ham). As you gather statistics, continue to tune the filter. With careful adjustment, you are ready for the next step -- spam reduction.
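One way to turn audit data into score ranges is sketched below. It assumes that each audited message has a numeric score from the filter and a "spam" or "ham" label assigned by hand during the audit; the function names and the sample data are hypothetical.

```python
from collections import Counter


def bucket_scores(scored_messages, bucket_width=1.0):
    """Count audited messages per score bucket, split by hand-assigned label."""
    buckets = {"spam": Counter(), "ham": Counter()}
    for score, label in scored_messages:
        lower_edge = int(score // bucket_width) * bucket_width
        buckets[label][lower_edge] += 1
    return buckets


def score_ranges(scored_messages):
    """Return the highest ham score and the lowest spam score seen in the audit."""
    ham = [s for s, label in scored_messages if label == "ham"]
    spam = [s for s, label in scored_messages if label == "spam"]
    return {"highest_ham": max(ham), "lowest_spam": min(spam)}


# Hypothetical audit sample: (score, label) pairs.
AUDIT = [
    (0.3, "ham"), (1.8, "ham"), (4.9, "ham"),
    (4.1, "spam"), (7.2, "spam"), (12.6, "spam"), (15.0, "spam"),
]

print(score_ranges(AUDIT))
# prints: {'highest_ham': 4.9, 'lowest_spam': 4.1}
```

Scores above the highest ham score are candidates for "obvious spam", scores below the lowest spam score are ham, and the overlap between the two (4.1 to 4.9 in this sample) is the "likely spam" zone that needs the most tuning.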

With reports and statistics in hand, work with business units to establish thresholds for quarantine and automatic deletion. It's a good idea to initially set thresholds higher than your level of confidence. In time you can ease them down to match your statistical analysis. A gradual implementation will help increase the comfort of others. Some business units may opt out of the system altogether. None of their mail should ever be stopped. Make sure the system meets their requirements. Now, there is just one more step before implementation.
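The policy described here (opt-outs, whitelisted domains, and two thresholds) can be expressed as a small decision function. This is a sketch under stated assumptions, not the behavior of any particular product: the class, the function, the threshold values, and the addresses are all hypothetical, and the thresholds start deliberately high so they can be eased down later.

```python
from dataclasses import dataclass, field


@dataclass
class FilterPolicy:
    quarantine_at: float = 8.0    # hypothetical conservative starting value
    delete_at: float = 15.0       # hypothetical conservative starting value
    opted_out: set = field(default_factory=set)            # recipient addresses
    whitelisted_domains: set = field(default_factory=set)  # sender domains


def decide(policy, sender, recipient, score):
    """Return 'deliver', 'quarantine', or 'delete' for one scored message."""
    if recipient.lower() in policy.opted_out:
        return "deliver"  # mail for opted-out users is never stopped
    domain = sender.rpartition("@")[2].lower()
    if domain in policy.whitelisted_domains:
        return "deliver"  # mail from trusted partners always goes through
    if score >= policy.delete_at:
        return "delete"
    if score >= policy.quarantine_at:
        return "quarantine"
    return "deliver"


policy = FilterPolicy(
    opted_out={"ceo@example.org"},
    whitelisted_domains={"partner.example.com"},
)
print(decide(policy, "offers@bulk.example.net", "ceo@example.org", 20.0))
# prints: deliver
print(decide(policy, "offers@bulk.example.net", "user@example.org", 9.5))
# prints: quarantine
```

Checking the opt-out and whitelist rules before the score keeps the promises made to business units independent of later threshold changes.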

All good systems require good documentation, and this is no exception. Provide a detailed description of the system and its processes. Describe how to view and retrieve quarantined mail. Users should know how to append addresses and domains to a whitelist. Some administrators send mail to the client with spam analysis embedded in the message. Provide instructions that will enable a user to filter messages in their email client based on that information. A desktop administrator might be able to push a configuration to every desktop. A client-side mailbox could serve as a primary or secondary quarantine area. Simply because some users have opted out of the system doesn't mean they should be excluded. Your documentation should provide instructions on how to avoid spam. Many times users don't know what they did to start the deluge.
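When the filter embeds its analysis in a message header, a client-side rule or a short script can act on it. The sketch below assumes a SpamAssassin-style `X-Spam-Status` header of the form `Yes, score=7.2 required=5.0 ...` (some versions write `hits=` instead of `score=`). Check the headers your own filter actually writes before relying on this format; the sample message is made up.

```python
import re
from email import message_from_string

# Matches "score=7.2" or the older "hits=7.2" form (assumed header format).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_verdict(raw_message):
    """Return (flagged, score) read from an X-Spam-Status header, if present."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return False, None  # message was not analyzed by the filter
    flagged = status.strip().lower().startswith("yes")
    match = SCORE_RE.search(status)
    score = float(match.group(1)) if match else None
    return flagged, score


# Hypothetical message used only for illustration.
SAMPLE = (
    "From: offers@example.com\n"
    "To: user@example.org\n"
    "Subject: Low rates\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=EXAMPLE_RULE\n"
    "\n"
    "Body text\n"
)
print(spam_verdict(SAMPLE))
# prints: (True, 7.2)
```

Most mail clients can express the same test as a filter rule on the header text, which is what the user documentation would describe.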

Implementation

What good is spam filtering if you can't demonstrate how well it works? Make sure you have a reporting mechanism in place before implementation. At minimum, it should measure total volume by day, month, and year, as well as the top outbound and inbound senders. Once filtering is in place, the report should capture statistics about the spam itself. How many messages were flagged as spam and how many passed as legitimate? For what reasons were messages rejected? The more data you have, the better you'll be able to react to it. What is the effect of moving a spam threshold two-tenths of a point? An adequate report will provide the answer.
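The threshold question can be answered directly from logged scores. The following sketch assumes the report has access to (score, label) pairs, where the label reflects whether a message was later confirmed as spam or ham; the function name and sample data are hypothetical.

```python
def threshold_report(scored_messages, thresholds):
    """For each threshold, count caught spam, missed spam, and false positives."""
    rows = []
    for t in thresholds:
        caught = sum(1 for s, lab in scored_messages if lab == "spam" and s >= t)
        missed = sum(1 for s, lab in scored_messages if lab == "spam" and s < t)
        false_pos = sum(1 for s, lab in scored_messages if lab == "ham" and s >= t)
        rows.append(
            {"threshold": t, "caught": caught,
             "missed": missed, "false_positives": false_pos}
        )
    return rows


# Hypothetical log extract: (score, confirmed label).
LOG = [
    (0.3, "ham"), (4.9, "ham"), (5.1, "ham"),
    (4.95, "spam"), (5.05, "spam"), (7.2, "spam"), (12.6, "spam"),
]

# Compare the current threshold with one two-tenths of a point lower.
for row in threshold_report(LOG, [5.0, 4.8]):
    print(row)
# prints:
# {'threshold': 5.0, 'caught': 3, 'missed': 1, 'false_positives': 1}
# {'threshold': 4.8, 'caught': 4, 'missed': 0, 'false_positives': 2}
```

In this sample, lowering the threshold by two-tenths catches one more spam message at the cost of one more false positive, which is exactly the trade-off the business units need to see.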

A good administrator is careful and methodical. Okay, he's paranoid. If you can't bring yourself to dump obvious spam into /dev/null, then send it to another mailbox. If you have sufficient disk storage, cycle the spam boxes with a log rotation program. Retain each daily spam box for x number of days, then send it to /dev/null. This will provide a buffer from which to recover false positives. If space is lacking, daily files can be off-loaded to another server for rotation.
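The retention scheme can be handled by any log rotation program; as an illustration, here is a minimal sketch of the expiry step alone. It assumes one spam box file per day named like `spam-2005-06-01.mbox` in a single directory. The naming scheme, the directory, and the function name are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expire_spam_boxes(directory, keep_days, now=None):
    """Delete daily spam box files older than keep_days; return removed names."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(directory).glob("spam-*.mbox")):
        # Age is judged by modification time, not by the date in the name.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed


# Example call (hypothetical path), suitable for a daily cron job:
# expire_spam_boxes("/var/spool/spam", keep_days=14)
```

Until a box expires, a false positive can still be recovered from it, so the retention period should be at least as long as it typically takes users to notice a missing message.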

Remember, gradual assimilation is one of the keys to success. Anybody can reduce spam by 60% in a single month. It's better to achieve that in 6 months to a year without interruption to the business. Spam filtering requires continued administration. Unlike Ron Popeil, you can't just "set it and forget it." Matt Cramer is an Information Security Architect in a large multi-national enterprise; he implemented a spam filtering system 2 years ago. During that time, email volume increased 150% while Cramer was able to reduce unwanted email by 93%. Legitimate email messages are flagged as spam just one-hundredth of one percent of the time. He never stopped tuning the configuration to meet real-world conditions.

Cramer provides some advice for aspiring spam hunters. "Filtering needs to strike a balance -- specific to your enterprise -- between false positives and false negatives (missed spam). Whether you build your own filters or purchase a commercial one, it is important for an enterprise to understand what the business will tolerate for these values, because no spam filter will be perfect."

Jeffrey Fulmer has administered enterprise computer systems professionally since 1995. He is an open source software developer and the primary author of siege. He currently resides in Pennsylvania with his wife and English bulldog.