Successful Spam Filtering
Jeffrey Fulmer
Email is an effective and inexpensive collaboration tool. Since
Ray Tomlinson's @ sign helped specify username at host computer,
electronic mail has become an integral part of our lives. Today
nearly 60 billion messages are sent every day. Of that total,
more than 60 percent can be classified as spam. Productivity gains
derived from email are offset by this nuisance.
Electronic junk mail is damaging to both mail systems and employee
productivity. Nearly every enterprise is affected by it, yet, according
to Gartner Research, only 10 percent have spam-filtering technologies
in place. This lapse is not for lack of filtering technologies;
there are many products from which to choose. Filtering solutions
that fail to consider business requirements, however, will not succeed.
This article will examine "best practices" for a successful implementation.
Any systems administrator who has participated in a project in
which large numbers of end users are affected understands that spam
filtering should not be taken lightly. It affects nearly every computer
user in the company. The risks are high, but so are the rewards.
If you reduce daily mailbox maintenance by 5 minutes for each of
1000 employees, then you will save the company the cost of about
8 average salaries. Those savings don't include bandwidth reduction
and damage prevented by virus quarantine. The line is fine, however.
You can wipe out those savings by deleting a time-sensitive business
contract or an important sales lead. One such occurrence could be enough
to derail your entire filtering project. For this reason, it is
necessary to examine the environment and the culture into which
you plan to introduce spam filtering.
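The savings estimate above is simple arithmetic, and the result depends on how long a working day is assumed to be. The following is a minimal sketch of that calculation; the function name and the two workday lengths are assumptions for illustration and do not come from the article.

```python
def salaries_saved(minutes_per_day, employees, workday_hours):
    """Express the recovered time as a number of full-time workers."""
    recovered_hours = minutes_per_day * employees / 60
    return recovered_hours / workday_hours

# 5 minutes a day for each of 1000 employees is about 83 hours daily.
for hours in (8, 10):  # assumed workday lengths, for illustration only
    print(hours, round(salaries_saved(5, 1000, hours), 1))
```

Under an assumed 10-hour day the recovered time is worth about 8.3 salaries; under an assumed 8-hour day it is about 10.4.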
Requirements Gathering
As in any business endeavor, communication is the key to success.
From the outset, you must engage business units in order to fit your
filter into the enterprise. Their feedback is vital. Explain the
effort and its importance to the company. Few people like spam,
so this is not a hard sell. If they desire your project, then it
is more likely to succeed. Once they agree to combat spam, explain
your proposal in detail. Then be prepared to listen.
Concerns will arise as details become apparent. New technologies
are often greeted with suspicion, and this one is no exception.
When users learn that private conversations will be filtered and
manipulated, you will have their complete attention. Some employees
will invoke Big Brother and sow seeds of discontent. Most will worry
about messages that never get delivered. Envision this meeting before
it occurs. When you select a technology for implementation, realize
that it has to be flexible. Many of the questions you encounter
will be non-negotiable requests. Can a particular address always
go through? Can you let all my messages through? The more legitimate
concerns you accommodate, the better your chances for acceptance.
Nothing will sink your solution more quickly than business disruptions.
If messages from new contacts are deleted or timely information is
quarantined, backlash will be immediate. The very thought of such
interruptions gives people pause. If you break business functionality,
then management may scrap your system or scale it back so that it's
rendered meaningless. During these sessions, you must gather information
to build a comprehensive picture. Which domains should be whitelisted?
Mail from some companies should always go through. Does your daily
terminology match a pattern of spam? For example, while SpamAssassin
has a file dedicated to pornography keywords, it has another devoted
to medicinal drugs. Its keyword phrases file contains a large section
on low-cost loans and credit cards. Administrators in the pharmaceutical
or financial industries need to be sensitive to this reality before
implementation. The more tuning you can perform in advance, the
less pain you'll suffer later.
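One way to check whether your daily terminology resembles spam is to scan a sample of known-good business mail for the kinds of phrases a filter is likely to penalize. The sketch below assumes a hypothetical term list and made-up sample messages; the terms are illustrative only and are not taken from any real filter's rule files.

```python
import re
from collections import Counter

# Hypothetical terms for illustration; not taken from any real rule file.
RISKY_TERMS = ["prescription", "low-cost loan", "credit card", "refinance"]

def risky_term_counts(messages, terms=RISKY_TERMS):
    """Count how many messages contain each term, ignoring case."""
    counts = Counter()
    for body in messages:
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            if re.search(pattern, body, re.IGNORECASE):
                counts[term] += 1
    return counts

# Made-up examples of legitimate business mail.
sample = [
    "Attached is the refinance schedule for the Q3 credit card portfolio.",
    "Lunch on Friday?",
    "New prescription labeling rules take effect next month.",
]
for term, n in risky_term_counts(sample).most_common():
    print(f"{term}: {n} of {len(sample)} messages")
```

Terms that turn up often in legitimate mail are candidates for tuning before the filter goes live.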
Few groups will be affected more by this project than the help
desk. They might receive calls even before implementation: once users
learn that email might be filtered for spam, they may call the help
desk at the first sign of a problem. If a user discusses a filter with
a help desk operator who knows nothing about it, confusion will
reign. A sustained increase in help desk tickets will undermine
gains from spam reduction. A carefully planned implementation will
alleviate this problem. Make sure help desk personnel are aware
of the project from the outset, so that an operator can answer an
early call with something like, "The spam filter hasn't been installed
yet; I'll open a ticket with the mail administrator." Once the project
is implemented, you can reduce calls by automating as much as possible.
Allow people to view quarantined mail before it expires. A Web interface
that provides user-driven whitelisting will make everyone's life
easier. The more control you can put on the desktop, the fewer calls
and tickets you will generate.
Preparation
There are two key elements to a successful filtering implementation
-- effective communication and gradual assimilation. Business units
are engaged in order to gather necessary requirements. Once you are
supplied with that information, software selection is easy: you
simply select a package that meets the requirements. If you
work in a shop that has not fully embraced open source technology,
then a thorough list of requirements could help you make the case
for a particular software package. A comparison of features and
requirements will help demonstrate its appropriateness.
While this article is focused on a successful implementation,
one issue must be addressed with regard to software selection. To
adequately evaluate a product, it is important to understand the
methods used to produce the false positive and false negative numbers
the vendor advertises. If false positives are tracked as a percentage
of all mail rather than as a percentage of mail that was flagged
as spam, then the vendor's numbers will be considerably lower than
actual user experience. The discrepancy will be compounded if you
rely on the vendor's unrealistic data when you introduce the project
to business units.
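The effect of the denominator is easy to see with a small calculation. The sketch below uses made-up figures and a hypothetical function name; it only shows how one false-positive count produces two different percentages.

```python
def false_positive_rates(total, flagged, false_positives):
    """Report one false-positive count against two denominators."""
    return {
        "pct_of_all_mail": 100 * false_positives / total,
        "pct_of_flagged_mail": 100 * false_positives / flagged,
    }

# Made-up figures: 10,000 messages, 6,000 flagged, 30 flagged in error.
rates = false_positive_rates(total=10_000, flagged=6_000, false_positives=30)
print(rates)
```

With these figures the same 30 errors read as 0.3 percent of all mail but 0.5 percent of flagged mail, so ask the vendor which denominator was used.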
To alleviate concern for business disruption, a good filtering
system should offer several levels of granularity. Users should
be provided with a mechanism to opt out of the process entirely.
An automatic process allowing users to whitelist important email
addresses or domains reduces administrative overhead and engages
users. The more options you provide, the more empowered they'll
feel. If your solution is another tool in their box, then they'll
be more likely to accept this new technology and assist its implementation.
No amount of preparation is going to prevent false positives;
legitimate messages will be flagged as spam. When the filter is
first installed, test it in "audit" mode. During this phase, messages
are filtered and scored, data is collected, and reports are generated,
but no messages are deleted or quarantined. The administrator should
try to paint a comprehensive portrait of enterprise spam. The idea
is to establish score ranges for obvious spam, likely spam, and
legitimate messages (ham). As you gather statistics, continue to
tune the filter. Once it is carefully adjusted, you are ready for the
next step -- spam reduction.
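The audit-mode data can be summarized by sorting each scored message into one of the three ranges. The following is a minimal sketch; the cut-off values, names, and sample scores are assumptions for illustration, and real cut-offs should come from your own audit data.

```python
from collections import Counter

# Assumed cut-offs for illustration; derive real ones from audit data.
LIKELY_SPAM = 5.0
OBVIOUS_SPAM = 10.0

def classify(score, likely=LIKELY_SPAM, obvious=OBVIOUS_SPAM):
    """Place one score into one of the three ranges."""
    if score >= obvious:
        return "obvious spam"
    if score >= likely:
        return "likely spam"
    return "ham"

def summarize(scores):
    """Count how many audited messages fall in each range."""
    return Counter(classify(s) for s in scores)

audit_scores = [0.1, 2.4, 4.9, 5.0, 7.8, 12.3, 31.0]  # made-up sample
print(summarize(audit_scores))
```

Rerunning the summary after each tuning pass shows whether the ranges are separating more cleanly.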
With reports and statistics in hand, work with business units
to establish thresholds for quarantine and automatic deletion. It's
a good idea to initially set thresholds higher than your level of
confidence. In time you can ease them down to match your statistical
analysis. A gradual implementation will help increase the comfort
of others. Some business units may opt out of the system altogether.
In that case, none of their mail should ever be stopped; make sure
the system can meet that requirement. Now, there is just one more step before
implementation.
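The rules agreed with the business units (opt-outs, whitelists, and the two thresholds) can be written down as one decision function. This is a sketch under stated assumptions: the policy layout, the addresses, and the threshold values are hypothetical and do not describe any particular product.

```python
def disposition(sender, recipient, score, policy):
    """Decide what happens to one scored message."""
    if recipient.lower() in policy["opted_out"]:
        return "deliver"  # opted-out users are never filtered
    sender = sender.lower()
    domain = sender.rsplit("@", 1)[-1]
    if sender in policy["whitelist"] or domain in policy["whitelist"]:
        return "deliver"
    if score >= policy["delete_at"]:
        return "delete"
    if score >= policy["quarantine_at"]:
        return "quarantine"
    return "deliver"

# Hypothetical policy; thresholds start high and are eased down later.
policy = {
    "opted_out": {"legal@example.com"},
    "whitelist": {"partner.example", "ceo@client.example"},
    "quarantine_at": 8.0,
    "delete_at": 20.0,
}
print(disposition("offers@bulk.example", "staff@example.com", 9.5, policy))
```

Because the opt-out and whitelist checks come first, lowering the thresholds later never affects the mail those groups asked you to leave alone.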
All good systems require good documentation, and this is no exception.
Provide a detailed description of the system and its processes.
Describe how to view and retrieve quarantined mail. Users should
know how to append addresses and domains to a whitelist. Some administrators
send mail to the client with spam analysis embedded in the message.
Provide instructions that will enable a user to filter messages
in their email client based on that information. A desktop administrator
might be able to push a configuration to every desktop. A client-side
mailbox could serve as a primary or secondary quarantine area. The
fact that some users have opted out of the system doesn't mean they
should be excluded. Your documentation should provide instructions
on how to avoid spam. Many times users don't know what they did
to start the deluge.
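When the spam analysis is embedded in the message, a client-side rule only has to read it and choose a folder. The sketch below assumes a SpamAssassin-style X-Spam-Status header carrying a "score=" field; other filters use different headers, and the folder names, threshold, and rule name shown are hypothetical.

```python
import re
from email import message_from_string

def spam_score(raw_message):
    """Return the score from an X-Spam-Status header, or None if absent."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    match = re.search(r"score=(-?\d+(?:\.\d+)?)", status)
    return float(match.group(1)) if match else None

def target_folder(raw_message, threshold=5.0):
    """Pick a folder name (hypothetical) from the embedded score."""
    score = spam_score(raw_message)
    if score is not None and score >= threshold:
        return "Junk"
    return "Inbox"

# Made-up message; EXAMPLE_RULE is a placeholder, not a real rule name.
raw = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0 tests=EXAMPLE_RULE\n"
    "\n"
    "body text\n"
)
print(spam_score(raw), target_folder(raw))
```

Most mail clients can express the same test as a filter rule on the header, which is what the user instructions would describe.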
Implementation
What good is spam filtering if you can't demonstrate how well
it works? Make sure you have a reporting mechanism in place before
implementation. At minimum, it should measure total volume by day,
month, and year, as well as the top outbound and inbound senders.
Once filtering is in place, the report should capture statistics
about the spam itself. How many messages were flagged as spam and
how many passed as legitimate? For what reasons were messages rejected?
The more data you have, the better you'll be able to react to it.
What is the effect of moving a spam threshold two-tenths of a point?
An adequate report will provide the answer.
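If the report keeps the score of every message, the effect of a threshold change can be replayed against history before it is applied. A minimal sketch, with made-up scores and hypothetical function names:

```python
def flagged_count(scores, threshold):
    """Count messages whose score meets or exceeds the threshold."""
    return sum(1 for s in scores if s >= threshold)

def threshold_effect(scores, old, new):
    """Compare how many messages two thresholds would flag."""
    before = flagged_count(scores, old)
    after = flagged_count(scores, new)
    return {"before": before, "after": after, "change": after - before}

scores = [1.2, 4.7, 4.9, 5.0, 5.1, 6.3, 9.8]  # made-up logged scores
print(threshold_effect(scores, old=5.0, new=4.8))
```

In this sample, lowering the threshold by two-tenths of a point flags one additional message; the messages in that gap are the ones to review by hand.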
A good administrator is careful and methodical. Okay, he's paranoid.
If you can't bring yourself to dump obvious spam into /dev/null,
then send it to another mailbox. If you have sufficient disk storage,
cycle the spam boxes with a log rotation program. Retain each daily
spam box for x number of days, then send it to /dev/null. This will
provide a buffer from which to recover false positives. If space
is lacking, daily files can be off-loaded to another server for
rotation.
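A log rotation program can handle the retention itself; the following sketch only illustrates the idea. It assumes one spam box per day stored as a file named spam-*.mbox in a single directory, which is a hypothetical layout, and it deletes files permanently, so test it against a scratch directory first.

```python
import time
from pathlib import Path

def expire_spam_boxes(directory, keep_days, now=None):
    """Delete daily spam boxes older than keep_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400  # seconds in a day
    removed = []
    for box in sorted(Path(directory).glob("spam-*.mbox")):
        # Age is judged by the file's last modification time.
        if box.stat().st_mtime < cutoff:
            box.unlink()
            removed.append(box.name)
    return removed

# Example call (hypothetical path), typically run once a day:
# expire_spam_boxes("/var/spool/spam", keep_days=14)
```

The retention period is the recovery window: a false positive reported after the box expires cannot be restored.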
Remember, gradual assimilation is one of the keys to success.
Anybody can reduce spam by 60% in a single month. It's better to
achieve that in 6 months to a year without interruption to the business.
Spam filtering requires continued administration. Unlike Ron Popeil,
you can't just "set it and forget it." Matt Cramer is an Information
Security Architect in a large multi-national enterprise; he implemented
a spam filtering system 2 years ago. During that time, email volume
increased 150% while Cramer was able to reduce unwanted email by
93%. Legitimate email messages are flagged as spam just one-hundredth
of one percent of the time. He never stopped tuning the configuration
to meet real-world conditions.
Cramer provides some advice for aspiring spam hunters. "Filtering
needs to strike a balance -- specific to your enterprise -- between
false positives and false negatives (missed spam). Whether you build
your own filters or purchase a commercial one, it is important for
an enterprise to understand what the business will tolerate for
these values, because no spam filter will be perfect."
Jeffrey Fulmer has administered enterprise computer systems
professionally since 1995. He is an open source software developer
and the primary author of siege. He currently resides in Pennsylvania
with his wife and English bulldog.