Using the -u option and database size

Peter Gutbrod lists at media-fact.com
Thu Mar 22 18:54:33 CET 2007


On 21.03.2007 at 12:27, Bill McClain (wmcclain at salamander.com) wrote:

> On Wed, 21 Mar 2007 10:42:41 +0100
> Peter Gutbrod <lists at media-fact.com> wrote:
> 
>> So far I have used the -u option with bogofilter. Meanwhile my wordlist.db
>> has grown to about 200 MB, and I'm wondering whether matching each mail
>> against such a big database puts much load on the server.
>> 
>> I think the size is mainly due to the automatic registering with the -u
>> option.
>> 
>> So what do you think? Is it better not to use the -u option to keep the
>> database small? Or do you think a 200 MB database is not a problem even on a
>> production mailserver that receives thousands of (spam) emails every day?
> 
> You might look into the "thresh_update" parameter:
> 
> #       Skip autoupdating if the spamicity is within this value
> #       of 0.000000 (surely ham) or 1.000000 (surely spam).
> 
> I use the default of 0.01, meaning spams with spamicity greater than 0.99
> and hams with spamicity less than 0.01 are not registered. This cuts down autoupdate
> registrations by a large factor; maybe 1/10?
> 
> The idea is that well-recognized messages do not need to be registered. You
> do miss some new tokens and counts that might be useful in the future, but in
> practice I've found that accuracy is not harmed.

Bill,

Sounds good. But processing the messages with a shell script (as I do now)
gets a bit complicated if I also have to take the spamicity value into
account to decide whether I should unregister the message beforehand,
especially as bash can't do floating point comparisons.
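
One possible workaround for the floating point problem is to hand the comparison to awk. The following is only a sketch, not a bogofilter feature: the function names spamicity_of and was_registered are made up for illustration, the X-Bogosity header is assumed to be on a single line and to contain a "spamicity=..." field, and the exact behaviour at the threshold boundaries is an assumption based on Bill's description above.

```shell
#!/bin/sh
# Sketch only: spamicity_of and was_registered are hypothetical helper names.

# Read a message on stdin and print the spamicity value found in the first
# X-Bogosity header line.
# Assumption: the header looks like
#   X-Bogosity: Spam, tests=bogofilter, spamicity=0.998765, version=...
spamicity_of() {
    sed -n 's/^X-Bogosity:.*spamicity=\([0-9.]*\).*/\1/p' | head -n 1
}

# Succeed (exit status 0) if a message with the given spamicity would have
# been auto-registered by -u, i.e. it is NOT within thresh_update of 0 or 1.
# $1 = spamicity, $2 = thresh_update (defaults to 0.01).
# Assumption: values exactly on a boundary are treated as registered.
was_registered() {
    awk -v s="$1" -v t="${2:-0.01}" 'BEGIN { exit !(s >= t && s <= 1 - t) }'
}
```

A script could then do something like `s=$(spamicity_of < "$msg")` followed by `if was_registered "$s"; then ...; fi` and only unregister the message in that branch before training it the other way.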

It would make things much easier if bogofilter added a "Bogofilter
registered" header to all messages that have already been registered.

Is there an option to enable something like this with the current
bogofilter? I haven't found one so far.

Otherwise I lean towards not using -u and training bogofilter just with -s
or -n to keep things simple, as John suggested.

Peter




