The first real content in this area is everybody's favorite: spam!
Seriously, though, I do not profess to be an expert on the topic, but
I do have several interesting pieces of technology in place (read: software
solutions) that may be of interest to some of you.
In my anti-spam arsenal, I have:
| sendmail | check for a message-id header, etc. |
| procmail | is it to me? pipe to spam filters, or mail folders |
| bmf | a learning mail filter |
| relaydb | which relay is bad? |
| pf | redirect known spammers to spamd |
| spamd | sandbox/tarpit, yet leave a reason why in the reject message |
I also have some special email addresses that have a direct line to being
classified as spam:
So in general, my solution at my own apartment flows like this:
- Incoming email is either accepted by sendmail (the sender is not yet a
known spammer) or redirected via pf to spamd. Sandbox away!
- mail to me is filtered through procmail, which uses bmf to decide
whether the message looks like spam or not
- spam gets shoved into a mail folder named bulk.spam-YYYY-MM
- if the recipient is one of the above addresses, I shove the message into
a special sendmail queue that re-classifies non-spam as spam (training
the filter); the message also gets stored in bulk.spam-YYYY-MM
- if the mail is 'to' or 'cc' me, it gets into my inbox
- otherwise, procmail dumps the mail into one of many mailing list
folders
- as I read my mailing lists, or inbox, and I find that spam has filtered
through the above mechanisms, I have a few macros defined in my .muttrc
that are of use:
macro index S "|/usr/sbin/sendmail -L sm-spamd-queuer -C/u/todd/etc/mail/sendmail.cf todd@spam.fries.net\ns=spam-`date +%Y-%m`.bz2"
macro index X ";|formail -s /usr/sbin/sendmail -L sm-spamd-queuer -C/u/todd/etc/mail/sendmail.cf todd@spam.fries.net\n;s=spam-`date +%Y-%m`.bz2"
macro index A "|/u/todd/bin/pipetogoodprogs\ns"
macro index V "|bmf -t\n"
... without going into too many details, the above .muttrc macros allow me
to press 'S' on a specific message that is spam, and add it to my 're-classify'
queue. If I tag a bunch of messages, X does the same thing, only on all of
the tagged messages. If I go into my 'bulk.spam-YYYY-MM' file and notice
something that should not be flagged as spam, I can tap 'A'. If I am curious
if bmf will classify a particular message as spam or not, I can tap 'V'.
- The scripts above (pipetogoodprogs and the sendmail queue) reclassify
email as good or bad, respectively. This is necessary because if a message
is 'learned' to be good when it is in fact bad, similar spam ends up in my
inbox, or vice versa. The programs I pipe to are bmf and relaydb.
- bmf is a Bayesian mail filter. Suffice it to say, it counts the frequency
of words in good and bad email messages, and uses these frequencies to decide
whether a message is good or bad. When classifying a message, it also learns
by adding the counts to the good or bad tallies. This feedback mechanism makes
for a much more flexible method than a static set of rules. When I initially
tried spamassassin, it only had a static set of rules.
I hear it uses Bayesian filtering as well now. C'est la vie!
- relaydb (from the man page) is a mail header analyzer that builds a
database of IP addresses either known as legitimate senders or spammers.
I invoke relaydb when it is known that a particular mail is spam, or not.
I also invoke relaydb when I am re-classifying email. This is how it is
meant to be used.
- spamd is fed a blacklist from spews and relaydb. The whitelist in relaydb
is used to remove any addresses that are 'known to be good mail relays'.
I also have my own personal whitelist (hosts I always want to receive mail
from) and my own personal blacklist (hosts that have annoyed me and from
which I never wish to receive mail again).
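The way the spamd blacklist is assembled can be sketched in a few lines of Python. This is only an illustration, not the actual scripts in use here: the function name, the argument names, and the example addresses (taken from the documentation ranges) are all hypothetical, and it assumes that a whitelist entry always wins over a blacklist entry.

```python
def build_spamd_blacklist(spews, relaydb_black, relaydb_white,
                          personal_black, personal_white):
    """Return the sorted list of addresses that would be handed to spamd."""
    # Every address that any of the blacklist sources considers bad.
    candidates = set(spews) | set(relaydb_black) | set(personal_black)
    # Known-good relays and personally trusted hosts are removed again
    # (assumption: whitelists always take precedence over blacklists).
    trusted = set(relaydb_white) | set(personal_white)
    return sorted(candidates - trusted)


# Hypothetical sample data; real lists would come from spews and relaydb.
blacklist = build_spamd_blacklist(
    spews=["192.0.2.10", "192.0.2.11"],
    relaydb_black=["198.51.100.7", "192.0.2.11"],
    relaydb_white=["198.51.100.7"],
    personal_black=["203.0.113.5"],
    personal_white=["192.0.2.10"],
)
print(blacklist)
```

Treating the lists as sets means an address reported by several sources appears only once, and one whitelist entry is enough to keep a relay out of the tarpit.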
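To make the word-counting idea behind bmf more concrete, here is a toy Python sketch of a filter that keeps good and bad tallies, learns from what it classifies, and can be corrected afterwards. It is not bmf's actual algorithm or code: the WordFilter class, its method names, the tokenizer, and the add-one smoothing are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


class WordFilter:
    """Toy Bayesian-style filter: per-word tallies for good and bad mail."""

    def __init__(self):
        self.words = {"good": Counter(), "bad": Counter()}
        self.messages = {"good": 0, "bad": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def learn(self, text, label):
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def unlearn(self, text, label):
        self.words[label].subtract(self.tokenize(text))
        self.messages[label] -= 1

    def spam_score(self, text):
        """Sum of per-word log-odds; positive means 'looks like spam'."""
        score = 0.0
        for word in self.tokenize(text):
            # Add-one smoothing keeps unseen words from producing log(0).
            p_bad = (self.words["bad"][word] + 1) / (self.messages["bad"] + 2)
            p_good = (self.words["good"][word] + 1) / (self.messages["good"] + 2)
            score += math.log(p_bad / p_good)
        return score

    def classify(self, text):
        # Classifying also learns: the message is added to the chosen tally.
        label = "bad" if self.spam_score(text) > 0 else "good"
        self.learn(text, label)
        return label

    def reclassify(self, text, correct_label):
        # Feedback step: move a misfiled message to the other tally.
        wrong_label = "good" if correct_label == "bad" else "bad"
        self.unlearn(text, wrong_label)
        self.learn(text, correct_label)


f = WordFilter()
f.learn("cheap pills buy now", "bad")
f.learn("buy cheap watches now", "bad")
f.learn("meeting notes for the pf patch", "good")
f.learn("patch review notes attached", "good")

first = f.classify("cheap watches, buy now")
second = f.classify("notes on the pf patch")
print(first, second)
```

The reclassify method plays the role of the re-classify queue and pipetogoodprogs: a message that was learned under the wrong label is removed from that tally and added to the right one, so later messages with similar words are scored differently.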
So there you have it. From pf to sendmail or spamd, to procmail, to bmf and
classification, to relaydb, and then feedback to pf/spamd. This is my setup.
Any questions?
Todd T. Fries todd@fries.net