Database persistence

Tom Anderson tanderso at oac-design.com
Tue Sep 19 22:30:30 CEST 2006


Christophe Journel wrote:
> Is it a good idea to erase old tokens that were added to the
> database before a given date,
> instead of using bogoutil to erase old tokens?
> 
> For instance, the word: Hello
> Since 2003, I have added this word as ham at least 150,000 times.
> If tomorrow this word shows up in spam, I will certainly have to wait a long
> time before a mail is tagged as spam.

I get around this problem by using exhaustive training on each email, 
meaning that if a spam comes in as a false negative and I send it to 
bfproxy for training, it will be trained again and again until the 
correct bogosity is achieved.  This usually only requires 1-2 
iterations, but I've had some go 5 or 6.  What this achieves is 
adjusting, for example, the "hello" token, among others, enough times to 
reduce or increase its score just enough to send this particular email 
over the edge to the correct classification.  There's no need to make 
drastic manual adjustments to your list if you do automatic exhaustive 
training on every error.  I've been using this method for going on two 
years now and my filter accuracy is very high.
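
For anyone who wants to see the shape of that loop, here is a rough
sketch in Python. It is not bfproxy's code. ToyFilter is a made-up
stand-in that just counts tokens and does not reproduce bogofilter's
scoring. The names train_to_exhaustion, classify, train and max_rounds
are likewise assumptions for illustration. In real use the two callables
would wrap whatever classifies and registers a message on your system.

```python
from collections import Counter


def train_to_exhaustion(message, correct_label, classify, train, max_rounds=10):
    """Retrain on one message until classify() agrees with correct_label.

    Returns the number of training passes that were needed.
    """
    rounds = 0
    while classify(message) != correct_label:
        if rounds >= max_rounds:
            # Safety valve so a stubborn message cannot loop forever.
            raise RuntimeError("gave up after %d rounds" % rounds)
        train(message, correct_label)
        rounds += 1
    return rounds


class ToyFilter:
    """Hypothetical token counter, only here to make the loop runnable."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, message, label):
        counts = self.spam if label == "spam" else self.ham
        counts.update(message.lower().split())

    def classify(self, message):
        tokens = message.lower().split()
        spam_hits = sum(self.spam[t] for t in tokens)
        ham_hits = sum(self.ham[t] for t in tokens)
        return "spam" if spam_hits > ham_hits else "ham"


toy = ToyFilter()
for _ in range(3):
    toy.train("hello", "ham")  # give "hello" a ham history

# A false negative arrives; keep training it as spam until it classifies as spam.
rounds = train_to_exhaustion("hello cheap pills", "spam", toy.classify, toy.train)
print(rounds)
```

The point of the loop is that only the tokens in the misclassified
message get nudged, and only as far as needed to flip that one message.
The max_rounds cap is my own addition to keep a pathological message
from training forever.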

You can get bfproxy here:
http://www.orderamidchaos.com/bogofilter/bfproxy

Tom



