Using the -u option and database size

John G Walker johngwalker at tiscali.co.uk
Wed Mar 21 12:38:42 CET 2007



On Wed, 21 Mar 2007 06:27:22 -0500 Bill McClain
<wmcclain at salamander.com> wrote:

> On Wed, 21 Mar 2007 10:42:41 +0100
> Peter Gutbrod <lists at media-fact.com> wrote:
> 
> > So far I have used the -u option with bogofilter. Meanwhile my
> > wordlist.db has grown to about 200 MB, and I'm wondering whether
> > it puts much load on the server to match each mail against such a
> > big database.
> > 
> > I think the size is mainly due to the automatic registering with
> > the -u option.
> > 
> > So what do you think? Is it better not to use the -u option to keep
> > the database small? Or do you think a 200 MB database is not a
> > problem even on a production mail server that receives thousands
> > of (spam) emails every day?
> 
> You might look into the "thresh_update" parameter:
> 
> #       Skip autoupdating if the spamicity is within this value
> #       of 0.000000 (surely ham) or 1.000000 (surely spam).
> 
> I use the default of 0.01, meaning spam with spamicity greater than
> 0.99 and ham with spamicity less than 0.01 are not registered. This
> cuts down autoupdate registrations by a large factor; maybe 1/10?
> 
> The idea is that well-recognized messages do not need to be
> registered. You do miss some new tokens and counts that might be
> useful in the future, but in practice I've found that accuracy is not
> harmed.
> 

I seem to have missed this parameter, which looks very useful. So thanks
for that.

Bogofilter very quickly pushes the spamicity of recognised messages to
the extremes, so it would not be unreasonable to set this parameter to
0.1, cutting down registration by an even greater amount.
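
To make the rule concrete, here is a rough Python sketch of the skip
test as the config comment quoted above describes it. This is not
bogofilter's actual code: the function name should_register, the sample
scores, and the handling of a score that lies exactly on the threshold
are all my own assumptions for illustration.

```python
def should_register(spamicity: float, thresh_update: float = 0.01) -> bool:
    """Sketch of the autoupdate (-u) skip rule described in the thread.

    Hypothetical helper, not bogofilter's API. A message is skipped when
    its spamicity is within thresh_update of 0.0 (surely ham) or
    1.0 (surely spam); otherwise it would be registered.
    """
    # Distance from the score to the nearer extreme (0.0 or 1.0).
    distance_to_extreme = min(spamicity, 1.0 - spamicity)
    # Assumption: a score exactly on the threshold counts as "register".
    return distance_to_extreme >= thresh_update


# Made-up scores, only to compare the two threshold settings.
sample_scores = [0.0001, 0.005, 0.05, 0.5, 0.93, 0.995, 0.99999]

for thresh in (0.01, 0.1):
    registered = [s for s in sample_scores if should_register(s, thresh)]
    print(f"thresh_update={thresh}: would register {registered}")
```

With these made-up scores, raising the threshold from 0.01 to 0.1 leaves
only the genuinely unsure message (0.5) to be registered, which is the
effect I would hope for on a real mail stream.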


-- 
 All the best,
 John


