garbage removal

Greg Louis glouis at dynamicro.on.ca
Thu May 8 21:29:29 CEST 2003


On 20030508 (Thu) at 1449:25 -0400, David Relson wrote:
> At 02:35 PM 5/8/03, Jef Poskanzer wrote:
> 
> >I've been wondering about this issue too.  From reading the man pages
> >it seems like I'm supposed to run 'bogoutil -m' every so often, say from
> >a weekly cron job.  However I don't see anything about this in the FAQ
> >or the mailing list archive or google.  Is that what people are doing?
> 
> Bogofilter _does_ have maintenance abilities as you've noticed.  However, 
> we haven't yet figured out if/when they're necessary.  My wordlists have 
> been accumulating since bogofilter's infancy (Oct 2002) and are up to 14MB 
> (goodlist.db) and 7MB (spamlist.db).  I've noticed no decline in speed, 
> hence have had no need to delete anything.
> 
> Other than using the "replace_nonascii_characters" option (because there 
> seemed to be lots of totally garbaged tokens), I've not done anything about 
> maintenance.
> 
> Others have busier mail servers and bigger wordlists.  Perhaps they'll 
> share their observations with us.
> 
I was one of the people who thought it would be necessary to prune
garbage from the training database.  When the maintenance function
became available, I pounced on it gleefully and threw out all tokens
with a count of 1, only to find that 25% of my spam was then being
delivered as non-spam!  I restored the pre-maintenance versions of
goodlist.db and spamlist.db and decided I would want to understand how
bogofilter works in a bit more depth before doing "maintenance" of
that kind.

My current goodlist.db at home is about 10 MB and spamlist.db 17 MB;
like David, I've been accumulating tokens since last October.  At work,
where I serve just under 100 users, we're at 58 MB and 36 MB
respectively.  The only maintenance I've done is to dump the whole
database occasionally and reload it, using bogoutil -d and -l; this
compacts the B-trees and improves performance a bit.
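
For anyone who wants to try the dump-and-reload, here is a minimal
sketch of how it might be scripted.  It assumes that bogoutil -d writes
the dump to standard output and that bogoutil -l reads it from standard
input and creates the target file if it is missing.  The function name,
the temporary file names and the paths in the usage note are
hypothetical; adjust them to your own setup.

```shell
#!/bin/sh
# Sketch only: compact a bogofilter wordlist by dumping and reloading it.
# Assumes "bogoutil -d FILE" dumps to stdout and "bogoutil -l FILE"
# loads from stdin. compact_wordlist and the file suffixes are
# hypothetical names chosen for this example.

compact_wordlist() {
    db=$1
    dump=$db.dump.txt
    new=$db.new

    # Dump every token and its counts to a text file.
    bogoutil -d "$db" > "$dump" || return 1

    # Load the dump into a fresh database file.
    rm -f "$new"
    bogoutil -l "$new" < "$dump" || return 1

    # Keep the old database as a backup, then swap in the new one.
    mv "$db" "$db.bak" && mv "$new" "$db" && rm -f "$dump"
}

# Usage (hypothetical paths):
#   compact_wordlist "$HOME/.bogofilter/goodlist.db"
#   compact_wordlist "$HOME/.bogofilter/spamlist.db"
```

It is probably wise to hold mail delivery while this runs, so that
nothing registers new tokens between the dump and the swap, and to keep
the .bak copies until the rebuilt lists have proved themselves.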

I don't use replace_nonascii_characters; we correspond in many
languages at work, and even at home I need the iso-8859-1 set.  Leaving
it off doesn't seem to hurt all that much.

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



