Database persistence
Tom Anderson
tanderso at oac-design.com
Tue Sep 19 22:30:30 CEST 2006
Christophe Journel wrote:
> Is it a good idea to erase old tokens that were added to the
> database before a given date,
> instead of using bogoutil to erase old tokens?
>
> For instance, take the word "Hello".
> Since 2003, I have added this word as ham at least 150,000 times.
> If this word starts showing up in spam tomorrow, I will certainly have
> to wait a long time before such a mail is tagged as spam.
I get around this problem by using exhaustive training on each email:
if a spam comes in as a false negative and I send it to bfproxy for
training, it is trained again and again until the correct bogosity is
achieved. This usually takes only 1-2 iterations, but I've had some
take 5 or 6. The effect is to adjust the "hello" token, along with the
other tokens in the message, just enough times to lower or raise its
score so that this particular email tips over the edge into the correct
classification. There's no need to make drastic manual adjustments to
your word list if you do automatic exhaustive training on every error.
I've been using this method for going on two years now, and my filter
accuracy is very high.
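
For anyone who wants to see the idea in code, here is a rough sketch of
a train-until-correct loop. It is not bfproxy's actual code. The names
train_until_correct and ToyFilter are made up for illustration, the toy
filter just compares raw token counts instead of doing real bogofilter
scoring, and the max_rounds cap is my own assumption to keep the loop
from running forever.

```python
from collections import Counter
from typing import Callable


def train_until_correct(
    message: str,
    target: str,
    classify: Callable[[str], str],
    train: Callable[[str, str], None],
    max_rounds: int = 10,
) -> int:
    """Retrain on one message until the filter returns `target`.

    Returns the number of training rounds that were needed.
    """
    rounds = 0
    while classify(message) != target:
        if rounds >= max_rounds:
            raise RuntimeError(f"still misclassified after {max_rounds} rounds")
        train(message, target)
        rounds += 1
    return rounds


class ToyFilter:
    """Hypothetical stand-in for a real filter: compares raw token counts."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()

    def train(self, message: str, label: str) -> None:
        # Add every token of the message to the chosen word list.
        counts = self.spam if label == "spam" else self.ham
        counts.update(message.lower().split())

    def classify(self, message: str) -> str:
        tokens = message.lower().split()
        spam_hits = sum(self.spam[t] for t in tokens)
        ham_hits = sum(self.ham[t] for t in tokens)
        return "spam" if spam_hits > ham_hits else "ham"


# "hello" has a ham history, then shows up in a spam message.
toy = ToyFilter()
for _ in range(3):
    toy.train("hello", "ham")

rounds = train_until_correct("hello cheap pills", "spam", toy.classify, toy.train)
print(rounds)  # 2
```

Note that the ham count for "hello" is never erased. The loop only adds
as much spam evidence as this one message needs, which is why no manual
pruning of old tokens is involved.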
You can get bfproxy here:
http://www.orderamidchaos.com/bogofilter/bfproxy
Tom