Maintaining a snappy bogofilter
David Relson
relson at osagesoftware.com
Fri Apr 11 15:45:52 CEST 2003
At 09:34 AM 4/11/03, Chris Ditri wrote:
>Hello again,
>
>Well, it is nice to know that the wordlists can get very large before any
>action has to be taken.
>
>I was wondering if it was necessary to, every few months, dump it all (using
>bogoutil) to ascii, strip out non ascii characters, and load it back into a
>new db. Does this sound inappropriate or excessive?
>
>I already use the bogoutil feature to kill words over 200 characters long and
>over 3 months old. Does this sound excessive as well?
>
>Thanks!
>
>Chris
Chris,
Have you considered adding "replace_nonascii_characters=Yes" to your config
file? That'd take care of your first problem.
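Roughly speaking, that option assumes bytes with the high bit set can be folded into a placeholder before tokenizing, so the wordlist never fills up with them. The sketch below is an illustration under that assumption. The helper name `replace_nonascii` is hypothetical, and the choice of `?` as the replacement character is also an assumption. It is not bogofilter's actual code.

```python
def replace_nonascii(data: bytes) -> bytes:
    """Hypothetical approximation of replace_nonascii_characters=Yes.

    Every byte with the high bit set (>= 0x80) is replaced by b'?',
    so non-ASCII text collapses into a few placeholder tokens instead
    of producing many distinct entries in the wordlist.
    """
    return bytes(b if b < 0x80 else ord("?") for b in data)


# Usage: "é" is two bytes in UTF-8, so it becomes two question marks.
print(replace_nonascii("caf\u00e9 spam".encode("utf-8")))
```

With this in place at scoring and registration time, a periodic dump-and-strip pass for non-ASCII characters should not be needed.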
Bogofilter already pitches tokens longer than 35 characters.
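To show why a separate 200-character pruning pass may be redundant, here is a minimal sketch of a length cutoff at tokenizing time. The 35-character limit comes from the message above. The constant and function names are hypothetical and do not come from bogofilter's source.

```python
MAX_TOKEN_LEN = 35  # limit cited in the message; the name is hypothetical


def filter_tokens(tokens, max_len=MAX_TOKEN_LEN):
    """Keep tokens of at most max_len characters and drop longer ones.

    If over-long tokens are discarded this way before they reach the
    wordlist, a later pass that removes 200+ character words finds
    nothing to delete.
    """
    return [t for t in tokens if len(t) <= max_len]


print(filter_tokens(["viagra", "x" * 36, "y" * 35]))
```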
"3 months" sounds reasonable. I don't know whether it is necessary or will
make a noticeable difference. My wordlists contain everything back to Oct
6 when I put bogofilter into production. I have rebuilt the wordlists
several times. The 0.7/0.8 database format change necessitated one of those
rebuilds. I also rebuilt them some time after switching from Graham to
Robinson (since the two use different MAX_REPEATS values).
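Chris's pruning pass (drop words over 200 characters and words not seen for about 3 months) could be done on a text dump before loading it into a fresh database. The sketch below works on an assumed dump format with one `token count yyyymmdd` line per entry. That format is a guess for illustration only, so check what your bogoutil version actually writes. The function names and the 90-day reading of "3 months" are also assumptions.

```python
from datetime import date, datetime, timedelta


def parse_line(line):
    """Parse an assumed 'token count yyyymmdd' dump line (hypothetical format)."""
    token, count, stamp = line.split()
    last_seen = datetime.strptime(stamp, "%Y%m%d").date()
    return token, int(count), last_seen


def prune(lines, today, max_age_days=90, max_len=200):
    """Return the dump lines worth reloading.

    Drops blank lines, tokens longer than max_len characters, and
    tokens whose last-seen date is older than max_age_days before today.
    """
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    for line in lines:
        if not line.strip():
            continue
        token, _count, last_seen = parse_line(line)
        if len(token) > max_len or last_seen < cutoff:
            continue
        kept.append(line)
    return kept


dump = ["money 12 20030401", "oldword 3 20021201", "a" * 201 + " 1 20030410"]
print(prune(dump, today=date(2003, 4, 11)))
```

Whether this makes a noticeable difference is uncertain, as noted above. Its main effect is a smaller database.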
David