bogofilter-tuning.HOWTO
David Relson
relson at osagesoftware.com
Sun Feb 1 20:19:22 CET 2004
On 01 Feb 2004 13:56:57 -0500
Tom Anderson wrote:
> The bogofilter-tuning.HOWTO file appears to require updating. It
> should assume the use of a single wordlist.db instead of separate
> files, and also it need not talk about the "other" classification
> methods.
>
> Also, this tutorial recommends a robs value >= 0.01; however, the
> default in bogofilter.cf appears to be 0.001, which is specifically
> warned against in the tuning howto. The default min_dev also appears
> to be artificially low compared to the recommended value here.
>
> Moreover, I'm fairly certain that most people receive many more spams
> than hams, and I'm concerned about the verbiage recommending nearly
> equal numbers in the list. Trying to maintain such an equilibrium is
> quite a time-intensive process and probably unfeasible for most
> people. My list contains 13598 spams to 4278 hams, or roughly 3:1, but
> I don't seem to suffer any ill effects. In fact, I don't receive any
> false positives at all, ever, and only 5-8 false negatives per diem
> (cutoffs at 0.25, 0.65). So where does this equilibrium
> recommendation stem from?
>
> Finally, I would suggest that the warning about requiring constant
> updating in the -u mode be amended to consider the flip side. Without
> training, as spam tactics mutate over time, your database may become
> equally unusable. So, in either case, you still have to train
> consistently. Using -u permits you to save time by only training on
> the mistakes instead of on every email that arrives.
>
> Greg, would you be interested, as the original author, in revising
> this file at all?
>
> Tom
Tom,
Thanks for the notes about the HOWTO. I'll look at the items you
mention and fix them for the next release.
FWIW, last time I checked, I was getting roughly comparable amounts of
ham and spam.
> P.S. Bogofilter rocks! Those 13598 spams have all arrived since October,
> when I started a new database from scratch. Assuming 1-2 seconds to
> check each of those spams manually without bogofilter, versus the 1-2
> seconds per dozen or two that I now spend scanning my spam box for
> false positives, I've saved around 4 hours of time.
> That's about 1 hour per month. In the course of a year, that means I
> get 1.5 8-hour work days of bonus vacation because of bogofilter!
Is that all? The last few months have each set a record high. Nov was
a bit over 7000; Dec was slightly over 9000; and Jan was just over
10000. For comparison, in Jan 2003 I got approx 1750 spam.
David
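
For anyone who wants to check the arithmetic in Tom's message, here is a rough sketch in Python. The message counts and the time ranges come from his note. The function names, the pairing of best and worst cases, and the reading of "a dozen or two" as batches of 12 to 24 messages are assumptions made only for illustration.

```python
# Figures quoted in Tom's message.
SPAM_COUNT = 13598
HAM_COUNT = 4278

def spam_to_ham_ratio(spam: int, ham: int) -> float:
    """Return how many spams the wordlist holds per ham."""
    return spam / ham

def hours_saved(spam: int, manual_secs: float,
                batch_secs: float, batch_size: int) -> float:
    """Estimate hours saved by skimming the spam box in batches
    instead of checking every spam by hand.

    manual_secs: seconds spent per spam without a filter
    batch_secs:  seconds spent skimming one batch of filtered spam
    batch_size:  assumed number of messages skimmed per batch
    """
    manual_total = spam * manual_secs
    filtered_total = spam / batch_size * batch_secs
    return (manual_total - filtered_total) / 3600.0

def workdays_per_year(hours_per_month: float, hours_per_day: float = 8.0) -> float:
    """Convert a monthly saving into 8-hour work days per year."""
    return hours_per_month * 12 / hours_per_day

if __name__ == "__main__":
    print(f"spam:ham ratio = {spam_to_ham_ratio(SPAM_COUNT, HAM_COUNT):.2f}:1")
    # Pessimistic case: 1 s per manual check, 2 s per batch of 12.
    low = hours_saved(SPAM_COUNT, 1, 2, 12)
    # Optimistic case: 2 s per manual check, 1 s per batch of 24.
    high = hours_saved(SPAM_COUNT, 2, 1, 24)
    print(f"hours saved since October: between {low:.1f} and {high:.1f}")
    print(f"work days per year at 1 hour/month: {workdays_per_year(1.0):.1f}")
```

Under these assumptions the "around 4 hours" figure falls toward the low end of the estimated range, and one hour per month does work out to 1.5 eight-hour days per year.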