bogofilter-0.13.0 available

Greg Louis glouis at dynamicro.on.ca
Thu May 22 13:26:41 CEST 2003


On 20030521 (Wed) at 2012:41 -0400, David Relson wrote:

> I'm the laid back member of the team.  Greg and Matthias help to keep me 
> honest.

Since this seems to be valued, here goes Greg again keeping David
honest ;)

> The drift Greg writes about will take a while before it can cause 
> errors.  If you have large wordlists, it will be a long while.  For small 
> wordlists, it'll be a short while.

Not necessarily.  Especially with the high minimum deviations we had
been advocating before these recent changes, errors may arise
very quickly after a mistake in training.  I'm sure we've all seen
messages that get a score that's wrongly close to 1 or 0 before
training and yet get a score that's rightly close to 0 or 1 after one
training pass.  This is the downside to using high minimum deviations:
when there are few tokens contributing to the decision, a little
training can go a long way very quickly.  Using a high minimum
deviation seriously exacerbates the risk involved in the -u option,
because detrimental effects can be immediate or nearly so.
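
To make that concrete, here is a minimal sketch in Python.  It is not
bogofilter's code: the token names and probabilities are made up, the
assumption is that a token counts only when its probability lies at
least min_dev away from 0.5, and crude_score() is a plain average
standing in for the real scoring calculation.  It only shows how one
wrongly trained token moves the result when few tokens are counted.

```python
def contributing(token_probs, min_dev):
    """Keep tokens whose probability is at least min_dev away from 0.5."""
    return {t: p for t, p in token_probs.items() if abs(p - 0.5) >= min_dev}


def crude_score(token_probs, min_dev):
    """Average of the contributing probabilities (a stand-in, not the real formula)."""
    kept = contributing(token_probs, min_dev)
    if not kept:
        return 0.5  # nothing contributes, so the message is undecided
    return sum(kept.values()) / len(kept)


# Hypothetical per-token spam probabilities for one message.
before = {"viagra": 0.99, "meeting": 0.03, "free": 0.60, "list": 0.45,
          "offer": 0.70, "patch": 0.30, "the": 0.51}

# The same message after a training mistake pushes one token the wrong way.
after = dict(before, meeting=0.97)

for min_dev in (0.44, 0.02):
    n = len(contributing(before, min_dev))
    shift = crude_score(after, min_dev) - crude_score(before, min_dev)
    print(f"min_dev={min_dev}: {n} tokens contribute, score shifts by {shift:.2f}")
```

With the high setting only two of the seven tokens are counted, so the
one bad token moves the score much further than it does when six are.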

The good news is that preliminary experimentation suggests that with
Paul Graham's refinements as implemented in 0.13, we may be able to get
back to looking at most of the tokens.  Right now I personally am
getting best results with a minimum deviation of 0.02, down from 0.44
that was optimal with 0.12.3.  Big users especially should consider
retuning.

As for retraining, David rightly says you can do without it if you're
patient.  Being a type A d00d, I wasn't patient ;)  What I did at home
was a full rebuild of the training db.  At work, users have submitted
false positives and false negatives for training, and I don't have those
messages to retrain with.  So as not to lose that training and yet start
reaping the benefit of the new options, I took enough hand-classified
spams and nonspams to increase the database size by about 60% ('coz
that's what I had available) and trained on those with the new options.
Seems to be doing ok so far (all of 15 hours, mind you).

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |




