me rindo: training to exhaustion

David Relson relson at osagesoftware.com
Sat May 7 18:54:59 CEST 2005


On Sat, 7 May 2005 12:41:02 -0400
Todd Slater wrote:

> I call bogofilter in my .procmailrc with -e and -p; mails get sent to
> different folders based on the X-Bogosity header.
> 
> I'm getting more "unsures" than I'd like and was thinking that training
> to exhaustion might speed up bogofilter's learning. I use maildir format
> and it looks like bogominitrain.pl requires mbox. I use mutt for mail
> and while I can write tagged messages to an mbox, it's kind of a pain.
> 
> What's the basic command line stuff I'd need to train to exhaustion? Is
> it proper to repeatedly register an unsure that's spam with "bogofilter
> -s < message" and then check with "bogofilter -v < message" to see when
> bogofilter thinks it's spam?
> 
> Thanks,
> Todd

H'lo Todd,

I don't recommend training to exhaustion.  True, it'll give you short-term
accuracy.  However, it will also lessen your long-term accuracy.

I think you'd do better by lowering the value of spam_cutoff in
bogofilter.cf.  Running "bogofilter -Q" will show your current
settings.  Bogofilter starts with a conservative 0.99 value for this
since it's better to have false negatives than false positives. 

One approach is to arbitrarily select a lower value like 0.98 or 0.95
or 0.90...  

A better approach is to look at all the scores for the Unsure-Ham
(unsures that are really ham) and then set spam_cutoff above the
highest of those scores.  This will ensure that none of those Unsures
will be considered spam.
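
As a rough illustration of that second approach, here's a small Python
sketch (not something shipped with bogofilter). It assumes you've collected
the X-Bogosity header lines from your Unsure-Ham messages and that each
one carries a "spamicity=..." field. The function names, the 0.01 margin
and the 0.99 ceiling are my own choices for the example, so adjust to taste.

```python
import re

# Assumed header shape, e.g.:
#   X-Bogosity: Unsure, tests=bogofilter, spamicity=0.523456, version=...
SPAMICITY = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def highest_spamicity(headers):
    """Return the highest spamicity found in the given X-Bogosity lines."""
    scores = []
    for line in headers:
        match = SPAMICITY.search(line)
        if match:
            scores.append(float(match.group(1)))
    if not scores:
        raise ValueError("no spamicity values found")
    return max(scores)


def suggest_spam_cutoff(headers, margin=0.01, ceiling=0.99):
    """Suggest a cutoff just above the highest Unsure-Ham score.

    margin and ceiling are illustrative assumptions, not bogofilter settings.
    """
    return min(ceiling, round(highest_spamicity(headers) + margin, 6))


if __name__ == "__main__":
    # Hypothetical headers from unsures that turned out to be ham.
    unsure_ham = [
        "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.523456",
        "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.912000",
        "X-Bogosity: Unsure, tests=bogofilter, spamicity=0.700100",
    ]
    print("spam_cutoff=%s" % suggest_spam_cutoff(unsure_ham))
```

The printed value is what you'd put on the spam_cutoff line in
bogofilter.cf.  If your highest Unsure-Ham score is already close to
0.99, the sketch just leaves the cutoff at that ceiling.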

HTH,

David



