bogofilter-tuning.HOWTO

David Relson relson at osagesoftware.com
Sun Feb 1 20:19:22 CET 2004


On 01 Feb 2004 13:56:57 -0500
Tom Anderson wrote:

> The bogofilter-tuning.HOWTO file appears to require updating.  It
> should assume the use of a single wordlist.db instead of separate
> files, and also it need not talk about the "other" classification
> methods.  
> 
> Also, this tutorial recommends a robs value >= 0.01; however, the
> default in bogofilter.cf appears to be 0.001, which is specifically
> warned against in the tuning howto.  The default min_dev also appears
> to be artificially low compared to the recommended value here.
> 
> Moreover, I'm fairly certain that most people receive many more spams
> than hams, and I'm concerned about the verbiage recommending nearly
> equal numbers in the list.  Trying to maintain such an equilibrium is
> quite a time-intensive process and probably unfeasible for most
> people. My list contains 13598 spams to 4278 hams, or roughly 3:1, but
> I don't seem to suffer any ill effects.  In fact, I don't receive any
> false positives at all, ever, and only 5-8 false negatives per diem
> (cutoffs at 0.25, 0.65).  So where does this equilibrium
> recommendation stem from?
> 
> Finally, I would suggest that the warning about requiring constant
> updating in the -u mode be amended to consider the flip side.  Without
> training, as spam tactics mutate over time, your database may become
> equally unusable.  So, in either case, you still have to train
> consistently.  Using -u permits you to save time by only training on
> the mistakes instead of on every email that arrives.
> 
> Greg, would you be interested, as the original author, in revising
> this file at all?
> 
> Tom

Tom,

Thanks for the notes about the HOWTO.  I'll look at the items you
mention and fix them for the next release.

FWIW, last time I checked, I was getting roughly comparable amounts of
ham and spam.

> P.S. Bogofilter rocks!  Those 13598 spams have been only since October
> when I started a new database from scratch.  Assuming manually
> checking for 1-2 seconds each of those spams without bogofilter,
> versus the 1-2 seconds per dozen or two in my spam box for false
> positives now using bogofilter, I've saved around 4 hours of time. 
> That's about 1 hour per month.  In the course of a year, that means I
> get 1.5 8-hour work days of bonus vacation because of bogofilter!
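
The estimate in the P.S. above can be sanity-checked with a few lines of
Python.  This is only a rough sketch: the spam count, the 1-2 second
timings and the 8-hour work day come from the post, while the 4-month
span (October through January) and the batch sizes of 12 and 24 for "a
dozen or two" are assumptions.  The helper name hours_saved is made up
for illustration.

```python
# Rough check of the time-savings estimate in the P.S. above.
# Figures marked "from the post" are Tom's; the rest are assumptions.

SPAM_COUNT = 13598   # spams since October (from the post)
MONTHS = 4           # assumption: October through January
WORKDAY_HOURS = 8    # from the post


def hours_saved(spam_count, manual_secs_per_spam, secs_per_batch, batch_size):
    """Hours saved by skimming the spam box in batches instead of
    checking every spam by hand."""
    manual = spam_count * manual_secs_per_spam
    filtered = spam_count / batch_size * secs_per_batch
    return (manual - filtered) / 3600


# Pessimistic: 1 s per spam by hand, 2 s per dozen with bogofilter.
low = hours_saved(SPAM_COUNT, 1, 2, 12)
# Optimistic: 2 s per spam by hand, 1 s per two dozen with bogofilter.
high = hours_saved(SPAM_COUNT, 2, 1, 24)

for label, total in (("low", low), ("high", high)):
    per_month = total / MONTHS
    workdays_per_year = per_month * 12 / WORKDAY_HOURS
    print(f"{label}: {total:.1f} h total, {per_month:.1f} h/month, "
          f"{workdays_per_year:.1f} workdays/year")
```

Under these assumptions, Tom's figures of around 4 hours saved and 1.5
work days per year fall between the low and high estimates.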

Is that all?  The last few months have each set a record high for spam.
Nov was a bit over 7000; Dec was slightly over 9000; and Jan was just
over 10000.  For comparison, in Jan 2003 I got approx 1750 spam.

David



