Using the -u option and database size

Bill McClain wmcclain at salamander.com
Wed Mar 21 12:27:22 CET 2007


On Wed, 21 Mar 2007 10:42:41 +0100
Peter Gutbrod <lists at media-fact.com> wrote:

> So far I have used the -u option with bogofilter. Meanwhile my wordlist.db has
> grown to about 200 MB, and I'm wondering whether it puts much load on
> the server to match each mail against such a big database.
> 
> I think the size is mainly due to the automatic registration done by the -u
> option.
> 
> So what do you think? Is it better not to use the -u option, to keep the
> database small? Or do you think a 200 MB database is not a problem, even on a
> production mail server that receives thousands of (spam) emails every day?

You might look into the "thresh_update" parameter:

#       Skip autoupdating if the spamicity is within this value
#       of 0.000000 (surely ham) or 1.000000 (surely spam).

I use the default of 0.01, meaning that spams with spamicity greater than 0.99
and hams with spamicity less than 0.01 are not registered. This cuts autoupdate
registrations substantially; perhaps to a tenth of what they would otherwise be?
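
The skip rule above can be sketched as a small predicate. This is a
hypothetical Python model written only for illustration, not bogofilter's
own code (bogofilter is written in C). The function name should_autoupdate
is invented, and the strict comparisons follow the "greater than 0.99" /
"less than 0.01" wording above.

```python
def should_autoupdate(spamicity: float, thresh_update: float = 0.01) -> bool:
    """Return True if a message with this spamicity would still be registered.

    Hypothetical model of the thresh_update rule: messages whose spamicity
    lies within thresh_update of 0.0 (surely ham) or 1.0 (surely spam)
    are skipped by autoupdate.
    """
    if not 0.0 <= spamicity <= 1.0:
        raise ValueError("spamicity must be in [0, 1]")
    surely_ham = spamicity < thresh_update
    surely_spam = spamicity > 1.0 - thresh_update
    return not (surely_ham or surely_spam)
```

With the default of 0.01, a message scored 0.995 or 0.005 is skipped, while
one scored 0.5 or 0.98 is still registered.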

The idea is that well-recognized messages do not need to be registered. You
do miss some new tokens and counts that might be useful in the future, but in
practice I've found that accuracy is not harmed.

-Bill
-- 
Sattre Press                                The King in Yellow
http://sattre-press.com/                 by Robert W. Chambers
info at sattre-press.com         http://sattre-press.com/kiy.html



More information about the Bogofilter mailing list