garbage removal
Greg Louis
glouis at dynamicro.on.ca
Thu May 8 21:29:29 CEST 2003
On 20030508 (Thu) at 1449:25 -0400, David Relson wrote:
> At 02:35 PM 5/8/03, Jef Poskanzer wrote:
>
> >I've been wondering about this issue too. From reading the man pages
> >it seems like I'm supposed to run 'bogoutil -m' every so often, say from
> >a weekly cron job. However I don't see anything about this in the FAQ
> >or the mailing list archive or google. Is that what people are doing?
>
> Bogofilter _does_ have maintenance abilities as you've noticed. However,
> we haven't yet figured out if/when they're necessary. My wordlists have
> been accumulating since bogofilter's infancy (Oct 2002) and are up to 14MB
> (goodlist.db) and 7MB (spamlist.db). I've noticed no decline in speed,
> hence have had no need to delete anything.
>
> Other than using the "replace_nonascii_characters" option (because there
> seemed to be lots of totally garbaged tokens), I've not done anything about
> maintenance.
>
> Others have busier mail servers and bigger wordlists. Perhaps they'll
> share their observations with us.
>
I was one of the people who thought it would be necessary to prune
garbage from the training database. When the maintenance function
became available, I pounced on it gleefully and threw out all count-1
tokens, only to find that I was then delivering 25% of spam! I restored the
pre-maintenance versions of goodlist.db and spamlist.db and decided I
would want to understand how bogofilter works in a bit more depth
before doing "maintenance" of that kind.
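
One way to gauge in advance how much a count-1 purge would remove is to
look at the distribution of counts in a text dump of the word list. The
following is a minimal sketch, not a bogofilter tool. It assumes the
dump (as produced by bogoutil -d) has one whitespace-separated
"token count [date]" record per line; the function names are
hypothetical.

```python
from collections import Counter


def count_histogram(dump_lines):
    """Map each count value to the number of tokens that have it.

    Assumes each line looks like "token count [date]"; lines that do
    not fit that shape are skipped rather than guessed at.
    """
    hist = Counter()
    for line in dump_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        hist[int(fields[1])] += 1
    return hist


def singleton_share(dump_lines):
    """Fraction of tokens whose count is exactly 1."""
    hist = count_histogram(dump_lines)
    total = sum(hist.values())
    return hist[1] / total if total else 0.0
```

Feeding it the lines of a dump file, for example
singleton_share(open("spamlist.txt")), gives the fraction of the list
that a count-1 purge would delete. It says nothing about how much those
tokens contribute to classification, which is the part that bit me.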
My current goodlist.db at home is about 10 MB and spamlist.db 17 MB; like
David, I've been accumulating tokens since last October. At work,
where I serve just under 100 users, we're at 58 MB and 36 MB
respectively. The only maintenance I've done is occasionally to dump
the whole db and reload it, using bogoutil -d and -l; this compacts the
B-trees and improves performance a bit.
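
The dump-and-reload cycle can be scripted. Below is a minimal sketch,
assuming bogoutil -d writes the text dump to stdout and bogoutil -l
reads it from stdin, and assuming nothing else is writing to the
database while it runs. The function name, the ".new" and ".bak"
suffixes, and the injectable run parameter are illustrative choices,
not part of bogofilter.

```python
import os
import subprocess
import tempfile


def compact_wordlist(db_path, bogoutil="bogoutil", run=subprocess.run):
    """Dump db_path to text, reload it into a fresh file, swap it in.

    The old database is kept as db_path + ".bak" so it can be restored.
    """
    new_path = db_path + ".new"
    with tempfile.TemporaryFile() as dump:
        # Dump the existing word list as text into a temporary file.
        run([bogoutil, "-d", db_path], stdout=dump, check=True)
        dump.seek(0)
        # Load that text into a brand-new database file.
        run([bogoutil, "-l", new_path], stdin=dump, check=True)
    # Keep the original as a backup, then move the rebuilt file in.
    os.replace(db_path, db_path + ".bak")
    os.replace(new_path, db_path)
```

Calling compact_wordlist("goodlist.db") and then
compact_wordlist("spamlist.db") would rebuild both lists. Keeping the
.bak copy makes it easy to go back to the previous version if the
reloaded list misbehaves.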
I don't use replace_nonascii_characters; we correspond in many
languages at work, and even at home I need the ISO-8859-1 set. Leaving
it off doesn't seem to hurt all that much.
--
| G r e g L o u i s | gpg public key: finger |
| http://www.bgl.nu/~glouis | glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |