Maintaining a snappy bogofilter
David Relson
relson at osagesoftware.com
Thu Apr 10 15:00:54 CEST 2003
At 08:41 AM 4/10/03, Chris Ditri wrote:
>Hello Everyone,
>
>I was wondering what people do to keep their goodlist and spamlist databases
>fast and trim. Do they need to be rebuilt from time to time or somehow
>"defragged"?
>
>Any recommendations?
>
>Thanks!
>
>Chris
Chris,
My spamlist currently has 80,413 words and 11,306 messages and my goodlist
has 235,043 words and 29,736 messages. Performance seems fine and I don't
do anything to keep it fast and trim.
If I _were_ to do something, I'd use the maintenance capabilities in
bogoutil. Two capabilities in particular come to mind. The first is the
ability to delete all hapaxes, i.e. words occurring only once in the
corpus. The second is the ability to delete all words older than a certain
age.
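To make the two pruning rules concrete, here is a minimal Python sketch of what each one does to a word list. It does not call bogoutil or read bogofilter's on-disk databases. The in-memory layout (token mapped to a count and a last-seen date), the function names, and the sample tokens are all assumptions made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical in-memory word list: token -> (count, last_seen).
# This is NOT bogofilter's database format, only an illustration.


def drop_hapaxes(wordlist):
    """Return a copy without tokens that occur only once."""
    return {
        tok: (count, seen)
        for tok, (count, seen) in wordlist.items()
        if count > 1
    }


def drop_older_than(wordlist, max_age_days, today):
    """Return a copy without tokens last seen more than max_age_days ago."""
    cutoff = today - timedelta(days=max_age_days)
    return {
        tok: (count, seen)
        for tok, (count, seen) in wordlist.items()
        if seen >= cutoff
    }


today = date(2003, 4, 10)
spamlist = {
    "mortgage": (412, date(2003, 4, 9)),   # frequent and recent: kept
    "xqzzy": (1, date(2003, 4, 1)),        # hapax: dropped by the first rule
    "refinance": (37, date(2002, 10, 1)),  # stale: dropped by the second rule
}

trimmed = drop_older_than(drop_hapaxes(spamlist), 90, today)
print(sorted(trimmed))
```

Both rules shrink the word list by discarding tokens that are unlikely to affect scoring much: a token seen once carries little statistical weight, and a token not seen for months may no longer appear in current mail.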
The ability is there, but I don't know at what point it becomes worthwhile
to use it.
David