Note: this page is horribly outdated. The CRM114 program hasn't received any updates since 2010. If you're looking for a Bayesian e-mail filter, look at Apache SpamAssassin instead.
CRM114 with maildrop
A friend of mine, Joost van Baal, mentioned some time ago that he was autoconfiscating a spam detector called CRM114. Though bogofilter has worked reasonably well for me, I thought it'd be nice to try something new. This document describes how I installed and configured CRM114. It's by no means the only and certainly not the best way, but it Works For Me™.
The default setup of CRM114 assumes it's OK to change email as it filters it, by adding a few headers. I personally prefer such software (useful as it may be) to leave my precious spam^H^H^H^Hemail alone as much as possible. These scripts simply pass judgment; the actual delivery is left to maildrop.
Fish
The scripts below are set up to simply give a HAM/SPAM/FISH message instead of acting as a filter. If the script reports ‘FISH’, then that means that crm114 was uncertain about whether it was spam or not: the email smells fishy but may be ham after all. Every piece of email in the fish category needs to be fed back to crm114 to train it. You should also train crm114 on false positives and false negatives, but those should be rare with this system.
Installing the software
# aptitude install crm114 mimedecode procmail
Because CRM114 wants to invoke formail, you need the procmail package too, even though we'll use maildrop for the actual delivery.
Setting up ~/.crm114/
All files in this directory can be generated with this Makefile.
Simply mkdir ~/.crm114, then type make -C ~/.crm114.
Setting up the configuration file
Although the Makefile installs a default configuration file, it may be easier to download this config file instead. The following settings are especially relevant:
:log_to_allmail.txt: /no/
:mime_decoder: /mimedecode/
:accepted_mail_exit_code: /0/
:rejected_mail_exit_code: /0/
:program_fault_exit_code: /1/
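If you want to double-check that your configuration file really contains the settings above, a small helper along these lines might do. This is only a sketch: check_setting is a name I made up, and it assumes each setting sits on its own line in the ":name: /value/" form shown above.

```shell
#!/bin/sh
# check_setting FILE NAME VALUE (hypothetical helper, not part of CRM114):
# succeed if FILE has a line of the form ":NAME: /VALUE/".
check_setting() {
    grep -Eq "^[[:space:]]*:$2:[[:space:]]*/$3/" "$1"
}
```

For example, assuming your configuration file lives at ~/.crm114/mailfilter.cf, `check_setting ~/.crm114/mailfilter.cf rejected_mail_exit_code 0 || echo "please fix"` would complain if the exit code is not set to 0.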
Creating wrapper scripts
These four wrapper scripts will invoke CRM114 in a convenient way:
crm-stats – invokes crm114 to determine the score of the message;
crm-judge – judges if something is spam, ham or fish by invoking crm-stats and inspecting the score;
crm-spam – corrects CRM114 if it mistook a spam email for normal mail or fish;
crm-ham – corrects CRM114 if it mistook a normal email for spam or fish.
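To give an idea of the kind of decision crm-judge makes, here is a rough sketch of turning a numeric score into a verdict. It is not the actual script: the function name judge_score and both thresholds are invented for illustration, and it assumes that higher scores mean "more like ham".

```shell
#!/bin/sh
# Hypothetical thresholds; the right values depend on your CRM114 setup.
HAM_THRESHOLD=10
SPAM_THRESHOLD=-10

# judge_score SCORE: print HAM, SPAM or FISH for a numeric score.
# Prints nothing and returns non-zero if SCORE is not a number.
judge_score() {
    awk -v s="$1" -v ham="$HAM_THRESHOLD" -v spam="$SPAM_THRESHOLD" 'BEGIN {
        # Refuse to judge anything that is not a plain number.
        if (s !~ /^-?[0-9]+(\.[0-9]+)?$/) exit 1
        if (s + 0 >= ham + 0)       print "HAM"
        else if (s + 0 <= spam + 0) print "SPAM"
        else                        print "FISH"
    }'
}
```

Printing nothing for a garbled score is deliberate: an empty verdict matches neither SPAM nor FISH, so the message ends up wherever your mail normally goes.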
Put these scripts somewhere in your PATH and set the executable permission bits on them (chmod 755 crm-*).
Setting up mutt
I use the following bindings and config options:
macro index H "<pipe-message>~/bin/crm-ham\n<save-message>!\n"
macro pager H "<pipe-message>~/bin/crm-ham\n<save-message>!\n"
macro index S "<pipe-message>~/bin/crm-spam\n<save-message>=spam\n"
macro pager S "<pipe-message>~/bin/crm-spam\n<save-message>=spam\n"
unset pipe_decode
set sleep_time=0
unset wait_key
Adjust the paths to match where you put the wrapper scripts.
Setting up maildrop
Your ~/.mailfilter would look something like this:
CRM114=`~/bin/crm-judge`
if($CRM114 eq 'FISH')
    to mail/fish
if($CRM114 eq 'SPAM')
    to mail/spam
to $DEFAULT
This way, if anything is wrong it will deliver your mail to the default destination instead of classifying it as spam.
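The fall-through behaviour is easy to see when the same logic is written as a small shell function. This is just an illustration of the rules above, not something maildrop runs; route is a made-up name and "DEFAULT" stands in for maildrop's $DEFAULT mailbox.

```shell
#!/bin/sh
# route VERDICT: print where a message with this verdict would end up.
# Mirrors the mailfilter rules: only an exact FISH or SPAM is diverted.
route() {
    case "$1" in
        FISH) echo "mail/fish" ;;
        SPAM) echo "mail/spam" ;;
        *)    echo "DEFAULT" ;;   # HAM, empty or garbled output
    esac
}
```

An empty verdict (say, because the wrapper script crashed) or any unexpected output lands in the last branch, so mail is never thrown into the spam folder by accident.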
Training the system
Just let CRM114 do its job for a bit, and make sure to inspect the fish folder. Whenever it makes a mistake, use the H and S keys as appropriate to move the messages to the right place. It only expects to be trained on mistakes, but if you feed it something that it would've gotten right anyway, it will simply disregard it. That way you can use the H and S keys without worrying about overcorrecting it.
Don't try to feed it a corpus; that will only confuse the poor thing. It learns fast enough anyway.
Happy filtering.