Keeping the cruft out (was Re: no To: header in emails)

Bob George mailings02 at ttlexceeded.com
Wed Mar 3 15:52:05 CET 2004


Eric Wood <eric at interplas.com> wrote:
> [...]
> It's not.  Except to say that I'm trying to /dev/null the
> cruft before it hits my bogofilter section and spoil my
> database.

I'm going to some lengths to avoid cruft in bayes as well:

1. I'm using several tools to flag spam (spamassassin, bogofilter, others).
2. Anything that gets flagged by any of these tools goes to a folder for review
(including bogo "unsure").
3. Only "real" spam gets added to a folder for spam training, and OK stuff
winds up in the ham training folder. False negatives and false positives are
corrected and retrained separately via -Ns and -Sn, respectively.
4. Messages fed into training are passed through a filter that strips out local
headers.
5. Nothing is fed to training automatically.
6. No spam-related list mails are fed into bayes at all.
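
For anyone curious what step 4 might look like, here is a minimal Python
sketch. It is not my actual filter, and the list of header prefixes is an
assumption: X-Bogosity and X-Spam-* are what bogofilter and spamassassin
normally add, but your local setup may insert others.

```python
from email import message_from_string

# Assumed set of locally added headers (lowercase prefixes). Adjust this to
# whatever your own filters and delivery agents insert.
LOCAL_HEADER_PREFIXES = ("x-bogosity", "x-spam-")


def strip_local_headers(raw_message: str) -> str:
    """Return the message text with locally added filter headers removed."""
    msg = message_from_string(raw_message)
    # Copy the key list first, because we delete while iterating.
    for name in list(msg.keys()):
        if name.lower().startswith(LOCAL_HEADER_PREFIXES):
            # Deleting by name removes every occurrence of that header.
            del msg[name]
    return msg.as_string()
```

The idea is that tokens from the filters' own verdict headers never reach
the wordlist, so bogofilter learns from the message itself rather than from
what another tool said about it.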

So far, a few thousand each of ham and spam have been fed into bogofilter.
Based on my reading of the docs, I'm doing an ongoing "full training" backed
with "retrain on error." I've got the luxury of being able to temporarily store
inbound ham, so I don't need to use -u. I plan to back off to "train on
exception" eventually. It seems to be working VERY well for me.
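
To make the "full training" plus "retrain on error" combination concrete,
here is a rough Python sketch of how the registration flag could be chosen
for each reviewed message. The function name and the "spam"/"ham" labels are
made up for illustration; the flags themselves (-s, -n, -Ns, -Sn) are the
ones from the bogofilter docs.

```python
from typing import Optional


def bogofilter_flag(actual: str, trained_as: Optional[str] = None) -> Optional[str]:
    """Pick the bogofilter registration flag for one reviewed message.

    actual      -- "spam" or "ham", as decided by manual review
    trained_as  -- how the message was registered before, or None if never
    """
    if actual not in ("spam", "ham"):
        raise ValueError("actual must be 'spam' or 'ham'")
    if trained_as not in (None, "spam", "ham"):
        raise ValueError("trained_as must be None, 'spam' or 'ham'")

    if trained_as == actual:
        return None  # already registered correctly, nothing to do
    if trained_as is None:
        # Full training: register every reviewed message once.
        return "-s" if actual == "spam" else "-n"
    # Retrain on error: undo the wrong registration, then add the right one.
    return "-Ns" if actual == "spam" else "-Sn"
```

The returned flag would then be passed to bogofilter with the (header-
stripped) message on stdin. Backing off to "train on exception" would mean
skipping the middle branch for messages bogofilter already classified
correctly.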

Is this a good approach?

- Bob





