Download: crmtrainer (4 KB)
This script was written with the CRM114 spam filter in mind, but can easily be adapted to any Bayesian filter. The idea is to give any remote user a simple way to train their server-side filter. Recipe:
- Have an IMAP server with storage in mbox format
- Have CRM114 properly installed and filtering (of course, still untrained). It should tag email by appending the standard X-CRM114-* header, and automatically send spam-tagged emails to a Spam folder.
- For every user, identify the paths to their Inbox and Spam folders, and have crmtrainer run regularly on them.
When a spam message slips into the user's Inbox, the user simply moves it to their Spam folder; crmtrainer notices this and tells CRM114 about its error. Similarly, the user can move a misclassified ham out of their Spam folder, and crmtrainer will retrain the filter accordingly. The user trains the filter just by moving emails between their Inbox and Spam folders, which comes naturally.
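The detection step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not crmtrainer's actual code: the header name `X-CRM114-Status`, its `SPAM` / `Good` values, and the function names `split_mbox`, `status_of` and `misfiled` are all assumptions made for the example.

```python
import re

# Assumed header name and values; a real CRM114 setup may tag mail differently.
STATUS_RE = re.compile(r"^X-CRM114-Status:\s*(\S+)", re.IGNORECASE | re.MULTILINE)


def split_mbox(text):
    """Split mbox text into messages on lines starting with 'From '."""
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def status_of(message):
    """Return the filter's verdict from the header block, or None if untagged."""
    headers = message.split("\n\n", 1)[0]
    match = STATUS_RE.search(headers)
    return match.group(1).upper() if match else None


def misfiled(mbox_text, folder):
    """List (message, correct_class) pairs whose tag disagrees with the folder.

    folder is 'inbox' or 'spam'. A message tagged SPAM that sits in the Inbox
    was moved there by the user, so it should be retrained as good, and
    vice versa for a message tagged Good that sits in the Spam folder.
    """
    wrong_tag = "SPAM" if folder == "inbox" else "GOOD"
    correct = "good" if folder == "inbox" else "spam"
    return [(m, correct) for m in split_mbox(mbox_text) if status_of(m) == wrong_tag]
```

Each returned message would then be handed to the filter for retraining. The sketch leaves out one thing a real tool needs: the old header stays on a retrained message, so the tool has to remember which messages it has already handled to avoid retraining them on every run.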
- It was first written with Mail::Box (for mbox, mh and maildir support), but that was awfully slow (about 100-500x slower than the simple mbox parser now in crmtrainer).
- CRM114 works best with train-on-errors (TOE), which is a very natural way for a human to train a filter.
- crmtrainer's effort is mostly proportional to the number of emails to retrain, and a user will only retrain a few misplaced emails a day once the CRM114 corpus is mature.
- crmtrainer should be run from a user cron job (crontab -e). Ideally, it would be triggered by the IMAP server whenever a 'move mail' operation occurs.
- crmtrainer does not properly lock the mbox files (though it only reads them). This means that, in the worst case, it could retrain on garbled emails.
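As an illustration of how the locking gap in the last note could be closed, here is a minimal sketch that reads an mbox file under a lock using Python's standard `mailbox` module. It is not how crmtrainer works: the function name `read_statuses_locked` and the header name `X-CRM114-Status` are assumptions, and the lock only helps if the IMAP server honours the same locking mechanisms (dot locking, `flock()` and `lockf()`).

```python
import mailbox


def read_statuses_locked(path):
    """Return the X-CRM114-Status value of every message, read under a lock."""
    box = mailbox.mbox(path, create=False)
    # Raises mailbox.ExternalClashError if another program holds the lock.
    box.lock()
    try:
        # Assumed header name; untagged messages yield None.
        return [msg.get("X-CRM114-Status") for msg in box]
    finally:
        box.unlock()
        box.close()
```

Given the speed difference reported for Mail::Box, a general-purpose mailbox library may well be slower than a hand-written parser here too, so this trades performance for safety.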