One Way to Automatically Delete Spam on Pair Networks’ Servers
Author
Zig Zichterman
Date
December 14, 2003
Downloading spam to your email client and then deleting it is inefficient. It wastes server space, time and bandwidth. Far better is to never deliver spam to your mailbox.
Servers at Pair Networks have spamassassin, which helps identify spam.
Servers at Pair Networks also have procmail, which helps route and act
on mail based on filters. Using the two together to block delivery
of spam requires a bit of understanding of how mail works at Pair, and
a short procmail script.
The End Result
I got what I wanted:
- spamassassin score 5.0+: obvious spam. Quietly delete this mail; I don't even want to know about it.
- spamassassin score 2.0-5.0: probably spam, but some wanted mail. Store this in a grey area mailbox, so I can sift through it manually.
- spamassassin score below 2.0: wanted mail. Send this on through to my mailbox.
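As a rough illustration of these three tiers, here is a minimal sketch in Python (used only for illustration; the real routing is done by procmail). It assumes spamassassin writes one star into the X-Spam-Level header per whole point of score, and the names stars_for and destination are hypothetical.

```python
def stars_for(score: float) -> str:
    """Assumed mapping: one star per whole point of score, none below 1.0."""
    return "*" * max(0, int(score))


def destination(score: float) -> str:
    """Return where a message with this score would end up."""
    stars = len(stars_for(score))
    if stars >= 5:
        return "/dev/null"  # obvious spam: quietly deleted
    if stars >= 2:
        return "grey"       # probably spam: held for manual sifting
    return "ham"            # wanted mail: delivered as normal
```

Under that assumption, a score of 4.9 still carries only four stars, so it lands in the grey mailbox rather than being deleted.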
I even got a few niceties:
- one script works for all the email accounts on my domain.
- keep spam and ham mailbox files on the server so that I can use them to retrain spamassassin’s Bayesian filters.
Instructions
1. Turn off Junk E-Mail Filtering for the mailbox
Turn off Junk E-Mail Filtering for the mailbox in the Pair Account Control Center.
Yes, I said off. You will turn it on elsewhere, in a recipe.
2. Create a Filter recipe for the mailbox.
- Passed To: /usr/local/bin/procmail MYACCOUNT=email-account
- Junk E-Mail Filtering: ON (checked)
In olden days, you had to pass to preline, which would then pass
to procmail. But now the Account Control Center quietly does this preline
for you behind the scenes. (You can view the Account Control Center’s
output files /var/rules/pair-account-name/.qmail*, updated every 5
minutes or so.)
    /var/rules/ziggr/.qmail-ziggr:com-uinuin
    # FILTER uinuin@ziggr.com /usr/local/bin/procmail MYACCOUNT=uinuin __SPAM
    |/usr/local/bin/sa_client.pl -rh -rs -c1 '/var/qmail/bin/preline /usr/local/bin/procmail MYACCOUNT=uinuin '
    # MAILBOX ziggr.com@uinuin@uinuin
    |/var/qmail/bin/preline -f /var/qmail/bin/bouncesaying "Message rejected by mailbox due to space constraints." /usr/local/libexec/virtual.local -f ${SENDER:-${USER:-UNKNOWN}} ziggr/ziggr.com/uinuin
MYACCOUNT=uinuin sets a variable that I use in my procmail script.
This variable controls the directory where procmail delivers mail,
which lets me use the same script for all my domain’s email accounts.
The MAILBOX part is automatically generated for every mailbox,
and is not part of the filter rule.
3. Initialize mailbox files
Initialize the mailbox files where your messages will eventually land.
In your email client, create IMAP folders grey and ham.
Although I am not positive this step is required, I fear that if you skip this
step, and procmail creates these files instead of your IMAP client, IMAP will
become confused. IMAP on Pair Networks servers requires some special setup
stuff at the top of each IMAP mailbox file.
Once you create files in your IMAP client, you can see the mailbox files at
/usr/boxes/pair-account-name/your-domain.com/$MYACCOUNT^/.imap/
If you use POP3 instead of IMAP, you can see the mailbox files at
/usr/boxes/pair-account-name/your-domain.com/$MYACCOUNT
If you use POP3 for your main account (account@pair.com), your mailbox file is in your home directory, I think.
4. Write a ~/.procmailrc script
    # enable debugging logging
    LOGFILE=$HOME/.maillog
    VERBOSE=YES

    # prevent qmail (the program that is calling procmail
    # as a filter) from delivering the original mail. We'll
    # handle all delivery from here, thankyouverymuch.
    EXITCODE=99

    # aim our output at the appropriate imap folder
    # somewhere in pair's file hierarchy. $MYACCOUNT
    # is set on the command line that invoked us, so
    # don't set it here.
    #MYACCOUNT=uinuin
    MYDIR=/usr/boxes/ziggr/ziggr.com/$MYACCOUNT^/.imap

    # Spam level 5.0 or greater: autodelete
    :0
    * ^X-Spam-Level: \*\*\*\*\*
    /dev/null

    # Spam level 2.0-4.9: hold in grey area
    :0
    * ^X-Spam-Level: \*\*
    $MYDIR/grey

    # Spam level < 2.0: it's probably real email, deliver as normal
    :0
    $MYDIR/ham
The first two settings (LOGFILE and VERBOSE)
are just debugging hooks; you can comment them out (stick a # sign in
front of them) once you are happy with everything. If you do not eventually
comment out these lines, ~/.maillog will grow and grow, consuming
your disk quota. While developing, I usually keep one shell window open with
tail -f ~/.maillog running, so I can watch each email message
come in and get routed.
EXITCODE=99 tells qmail that we are going to take care of
delivering each message, and that qmail does not need to deliver it to the mailbox.
If we did not do this, qmail would see the default return code 0 (OK)
and interpret that as the filter program saying "the email was acceptable, so now I should
deliver it."
The result would be duplicate copies of all mail landing in your mailbox, in addition
to the filtered copies landing in the grey and ham mailboxes.
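The effect of the exit code can be modelled with a small Python sketch (illustration only, not qmail's actual code). It assumes qmail works through the delivery lines of a .qmail file in order, continues after exit code 0, and treats 99 as "success, skip everything that follows". The function name run_dot_qmail is hypothetical, and every other exit code is simplified here to "stop".

```python
def run_dot_qmail(commands):
    """Simplified model of a .qmail file.

    commands: list of (name, exit_code) pairs, in file order.
    Returns the names of the delivery lines that actually ran.
    """
    ran = []
    for name, exit_code in commands:
        ran.append(name)
        if exit_code == 0:
            continue  # OK: carry on with the next delivery line
        # 99 means "delivered, ignore the remaining lines"; any other
        # code is treated as a failure in this sketch. Both stop here.
        break
    return ran


# With EXITCODE=99 only procmail runs; with the default 0 the MAILBOX
# line runs as well, which is where the duplicate copy comes from.
with_99 = run_dot_qmail([("procmail", 99), ("mailbox", 0)])
with_0 = run_dot_qmail([("procmail", 0), ("mailbox", 0)])
```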
MYDIR=/usr/boxes/ziggr/ziggr.com/$MYACCOUNT^/.imap sets the parent
directory that contains mailbox files grey and ham. If you use
POP3 instead of IMAP, you will want to change this path, and you will need to keep
separate POP3 mailboxes for grey and ham.
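To make the role of $MYACCOUNT concrete, here is a minimal Python sketch of how the IMAP folder paths are composed for different mailboxes. It only mirrors the path layout described in this article; the function name imap_folder_path and the mailbox name "other" are hypothetical.

```python
def imap_folder_path(pair_account: str, domain: str, mailbox: str, folder: str) -> str:
    """Build the path of an IMAP folder file, following the layout
    /usr/boxes/<pair-account>/<domain>/<mailbox>^/.imap/<folder>."""
    return f"/usr/boxes/{pair_account}/{domain}/{mailbox}^/.imap/{folder}"


# The same script serves every mailbox; only MYACCOUNT changes.
grey_for_uinuin = imap_folder_path("ziggr", "ziggr.com", "uinuin", "grey")
ham_for_other = imap_folder_path("ziggr", "ziggr.com", "other", "ham")
```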
Next come the two filters that sort spam:
    # Spam level 5.0 or greater: autodelete
    :0
    * ^X-Spam-Level: \*\*\*\*\*
    /dev/null
This first filter looks for any message whose header contains a spamassassin
score of 5.0 or greater (at least 5 stars). These messages are routed to /dev/null
and never heard from again.
    # Spam level 2.0-4.9: hold in grey area
    :0
    * ^X-Spam-Level: \*\*
    $MYDIR/grey
This second filter looks for any message whose header contains a spamassassin
score of 2.0 to 4.9. These messages are routed to the mailbox $MYDIR/grey,
where they will sit until you manually sift through the mess.
Note that the order of recipes in a procmail script is significant. The 2-star pattern also matches any header with 5 or more stars, so if you put the 2-star filter first, it would match before the 5-star filter: all spam would land in the grey box, and no spam would be automatically deleted.
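The ordering effect can be reproduced with a short Python sketch that applies the same two patterns in a first-match-wins loop. This is only a model of how procmail walks delivering recipes, not procmail itself; the names RECIPES and deliver are hypothetical, and the case-insensitive flag reflects procmail's default matching.

```python
import re

FLAGS = re.MULTILINE | re.IGNORECASE

# (pattern, destination) pairs in the same order as the script.
RECIPES = [
    (re.compile(r"^X-Spam-Level: \*\*\*\*\*", FLAGS), "/dev/null"),
    (re.compile(r"^X-Spam-Level: \*\*", FLAGS), "grey"),
]


def deliver(headers: str, recipes=RECIPES, default: str = "ham") -> str:
    """Return the destination of the first recipe whose pattern matches."""
    for pattern, destination in recipes:
        if pattern.search(headers):
            return destination
    return default


seven_stars = "Subject: buy now\nX-Spam-Level: *******\n"
correct_order = deliver(seven_stars)                             # deleted
wrong_order = deliver(seven_stars, recipes=RECIPES[::-1])        # kept in grey
```

Because the patterns are not anchored at the end, the 2-star pattern matches any header with two or more stars, which is why it has to come second.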
    # Spam level < 2.0: it's probably real email, deliver as normal
    :0
    $MYDIR/ham
All remaining mail is sent to a ham mailbox.
That’s It. You’re Done.
You might want to monitor ~/.maillog for a few days to see what
email is being deleted and why, or tweak the number of stars in the ~/.procmailrc
filters.
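If you would rather get a summary than watch the log scroll by, a small Python sketch like the following can count deliveries per destination. It assumes the log contains procmail's usual per-message abstract with a "Folder:" line naming where each message went; the function name count_deliveries is hypothetical, and folder names containing spaces are not handled.

```python
import re
from collections import Counter

# Assumed log abstract line, e.g. "  Folder: /dev/null    1200"
FOLDER_LINE = re.compile(r"^\s*Folder:\s+(\S+)")


def count_deliveries(log_text: str) -> Counter:
    """Count how many messages were delivered to each folder."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FOLDER_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use (path as configured by LOGFILE in the script):
#   import os
#   with open(os.path.expanduser("~/.maillog")) as log:
#       print(count_deliveries(log.read()))
```

A large count for /dev/null next to a nearly empty grey mailbox would suggest the thresholds are doing their job; the reverse might be a reason to adjust the number of stars.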
A Note on Bayesian Filters
Your Pair account stores its Bayesian data in /usr/boxes/your-pair-account/bayes/. This data
is shared across all email accounts in your domain, so if your accounts have wildly
diverging mail, you might not be able to use Bayesian filters as effectively as you would
like.
You do not have to share Bayesian filter data across all your email accounts.
You could set up your own spamassassin configuration, but that is a bit
more work than I was willing to do.
References
pairusers.com
especially http://www.pairusers.com/?HowPairEmailReallyWorks
PairUsers.com is a wiki database capturing a lot of helpful knowledge about being a Pair Networks customer. Not an official Pair Networks project, just a bunch of users helping out other users.
man pages
man procmail
man procmailrc
man procmailex
RTFM, baby! I eventually had to break down and read the man pages on procmail.
Mindlessly copying and pasting from other example scripts was not nearly as effective as
spending a few minutes to learn what I was doing.
procmail
FAQ
http://rhols66.adsl.netsonic.fi/era/procmail/mini-faq.html
Although I read the FAQ, I neither needed nor used anything
I learned from the FAQ. My .procmailrc is just too simple.
Pair Support Web Pages
http://pair.com/support/knowledge_base/e-mail/junk_e-mail_filtering_overview.html
Pair Networks’ own support pages are pretty useless on this topic. Just a few light
sentences telling how to turn on/off a checkbox. Nothing on combining the three tools
qmail, spamassassin and procmail.
Pair Support Newsgroups
nntp://news.pair.com/pair.mail
The pair.mail newsgroup on Pair Networks’ support forums contains a vast
wealth of data, dating back years.