New software uploaded [was: Problems with Asian Spam]

Tom Anderson tanderso at oac-design.com
Wed Nov 22 17:11:26 CET 2006


David Relson wrote:
> I suspect part of bogofilter's slowness in learning these are spam
> is that my wordlist has approx 500,000 messages in it and this
> causes learning to be slow.  
> 
> I'm thinking of adding a "--scale" option to bogoutil that would allow
> counts to be scaled.  For example, scaling to 10,000 would scale counts
> from 1..N to 1..10000.
> 
> Whether this helps can be tested by registering a bunch of false
> negatives with the old wordlist and again with the scaled wordlist,
> and seeing if message scores are more appropriate.

Training to exhaustion works wonders for me.  If you get a false
negative, train it as spam again and again until it is correctly
classified as spam.  This wipes out any scaling issues.  Usually one
training run is enough, but some spams take five or more.  Bfproxy
does this automatically for me, and I rarely see the same spam twice.

http://www.orderamidchaos.com/bogofilter/bfproxy
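
For anyone who wants to try the loop by hand, here is a minimal sketch
in Python.  It is not how bfproxy itself is implemented; the function
names and the max_rounds cap are my own assumptions for illustration.
It relies only on bogofilter's documented behaviour: "bogofilter -s"
registers the message on stdin as spam, and plain "bogofilter" exits
0 for spam, 1 for non-spam, 2 for unsure and 3 on error.

```python
import subprocess
from typing import Callable


def bogofilter_is_spam(message: bytes) -> bool:
    """Classify a message on stdin; bogofilter exits 0 when it says spam."""
    result = subprocess.run(
        ["bogofilter"], input=message, stdout=subprocess.DEVNULL
    )
    if result.returncode == 3:
        raise RuntimeError("bogofilter reported an error while classifying")
    return result.returncode == 0


def bogofilter_register_spam(message: bytes) -> None:
    """Register the message as spam once (bogofilter -s)."""
    result = subprocess.run(
        ["bogofilter", "-s"], input=message, stdout=subprocess.DEVNULL
    )
    if result.returncode == 3:
        raise RuntimeError("bogofilter reported an error while registering")


def train_to_exhaustion(
    message: bytes,
    is_spam: Callable[[bytes], bool] = bogofilter_is_spam,
    register_spam: Callable[[bytes], None] = bogofilter_register_spam,
    max_rounds: int = 10,  # hypothetical safety cap, not a bogofilter setting
) -> int:
    """Register a false negative as spam until it classifies as spam.

    Returns the number of training rounds that were needed.
    """
    for rounds in range(max_rounds + 1):
        if is_spam(message):
            return rounds
        if rounds < max_rounds:
            register_spam(message)
    raise RuntimeError(f"still not spam after {max_rounds} training rounds")
```

Something like train_to_exhaustion(open("missed-spam.eml", "rb").read())
would run it against one saved message (the file name is made up).  The
cap is there so that a message bogofilter never learns cannot loop
forever.  If the message was previously registered as ham, it probably
makes sense to unregister that first with "bogofilter -N".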

Tom



