Maintaining a snappy bogofilter

David Relson relson at osagesoftware.com
Fri Apr 11 15:45:52 CEST 2003


At 09:34 AM 4/11/03, Chris Ditri wrote:

>Hello again,
>
>Well, it is nice to know that the wordlists can get very large before any
>action has to be taken.
>
>I was wondering whether it is necessary, every few months, to dump it all
>(using bogoutil) to ASCII text, strip out non-ASCII characters, and load it
>back into a new db.  Does this sound inappropriate or excessive?
>
>I already use the bogoutil feature to kill words over 200 characters long and
>over 3 months old.  Does this sound excessive as well?
>
>Thanks!
>
>Chris
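A minimal Python sketch of the strip step in the dump-and-reload idea above. It assumes each line of a text dump begins with the token, followed by whitespace-separated fields. The exact dump layout varies between bogofilter versions, so check a real dump first. The function names drop_nonascii_lines and filter_dump are hypothetical. The sketch works on raw bytes so that tokens which are not valid UTF-8 do not raise decode errors.

```python
from typing import BinaryIO, Iterable, Iterator


def drop_nonascii_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield only dump lines whose token (first field) is pure 7-bit ASCII.

    Assumes a hypothetical layout: token first, then other fields.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Iterating over bytes yields ints, so compare each byte with 0x80.
        if all(b < 0x80 for b in fields[0]):
            yield line


def filter_dump(src: BinaryIO, dst: BinaryIO) -> None:
    """Copy a text dump from src to dst, dropping lines with non-ASCII tokens."""
    for line in drop_nonascii_lines(src):
        dst.write(line)
```

Called as filter_dump(sys.stdin.buffer, sys.stdout.buffer), it could sit between a dump and a load. Assuming the usual bogoutil options for dumping (-d) and loading (-l), the pipeline would look like bogoutil -d old.db | python3 strip_nonascii.py | bogoutil -l new.db, where the file and script names are placeholders.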

Chris,

Have you considered adding "replace_nonascii_characters=Yes" to your config 
file?  That'd take care of your first problem.
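One way such an option can work is to rewrite non-ASCII bytes as mail is read, before tokenizing. Because of that, nothing non-ASCII ever reaches the wordlist, and no later clean-up pass is needed. Here is a hedged Python sketch of that idea. The "?" placeholder is an assumption for illustration, so check the bogofilter documentation for what your version actually substitutes.

```python
# Translation table: the 128 ASCII byte values map to themselves and every
# byte from 0x80 to 0xFF maps to "?" (the "?" is an assumed placeholder).
NONASCII_TABLE = bytes(range(128)) + b"?" * 128


def replace_nonascii(data: bytes) -> bytes:
    """Return data with every non-ASCII byte replaced by b"?"."""
    return data.translate(NONASCII_TABLE)


print(replace_nonascii(b"caf\xe9 \xbfqu\xe9?"))  # b'caf? ?qu??'
```

Using bytes.translate with a 256-entry table does the whole rewrite in one call rather than a Python-level loop over each byte.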

Bogofilter already discards tokens longer than 35 characters.
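The length rule can be illustrated with a deliberately simplified Python tokenizer. The regular expression is a hypothetical stand-in, since bogofilter's real lexer has many more rules. Only the 35-character cutoff comes from this message, and it may differ in other versions.

```python
import re
from typing import Iterator

MAX_TOKEN_LEN = 35  # cutoff quoted in this message; may be version-specific

# Hypothetical, much-simplified token pattern: runs of letters, digits and a
# few punctuation characters. Not bogofilter's actual lexer.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'._-]+")


def tokens(text: str, max_len: int = MAX_TOKEN_LEN) -> Iterator[str]:
    """Yield tokens from text, silently dropping any longer than max_len."""
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        if len(tok) <= max_len:
            yield tok
```

For example, list(tokens("buy cheap " + "x" * 36 + " now")) gives ["buy", "cheap", "now"], because the 36-character run is dropped before it could be counted.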

"3 months" sounds reasonable.  I don't know whether it is necessary or will 
make a noticeable difference.  My wordlists contain everything back to Oct
6, when I put bogofilter into production.  I have rebuilt the wordlists
several times.  The 0.7/0.8 database format change necessitated one of the
rebuilds.  I also rebuilt them some time after switching from Graham to
Robinson (since they use different MAX_REPEATS values).
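The "3 months" pruning that Chris does through bogoutil can be approximated as a filter over a text dump. This Python sketch is hypothetical. It assumes each dump line ends in a YYYYMMDD date of last use, treats three months as 90 days, and keeps any line whose age cannot be determined.

```python
from datetime import date, timedelta
from typing import Iterable, Iterator


def recent_entries(lines: Iterable[bytes], today: date,
                   max_age_days: int = 90) -> Iterator[bytes]:
    """Yield dump lines last used within max_age_days of today.

    Assumes a hypothetical layout "token count YYYYMMDD"; lines without a
    parseable trailing date are kept because their age is unknown.
    """
    cutoff = today - timedelta(days=max_age_days)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        last = fields[-1]
        if len(fields) >= 2 and len(last) == 8 and last.isdigit():
            try:
                stamp = date(int(last[:4]), int(last[4:6]), int(last[6:]))
            except ValueError:
                stamp = None  # eight digits, but not a real calendar date
            if stamp is not None and stamp < cutoff:
                continue  # older than the cutoff: prune it
        yield line
```

Entries dated exactly on the cutoff are kept. Only strictly older ones are pruned.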

David

More information about the Bogofilter mailing list