Using the -u option and database size

Tom Anderson tanderso at oac-design.com
Thu Mar 22 19:23:11 CET 2007


Peter Gutbrod wrote:
> On 21.03.2007 at 12:27, Bill McClain (wmcclain at salamander.com) wrote:
> 
> 
>>On Wed, 21 Mar 2007 10:42:41 +0100
>>Peter Gutbrod <lists at media-fact.com> wrote:
>>
>>
>>>So far I have used the -u option with bogofilter. Meanwhile my wordlist.db
>>>has grown to about 200 MB, and I'm wondering whether matching each mail
>>>against such a big database puts much load on the server.
>>>
>>>I think the size is mainly due to the automatic registering with the -u
>>>option.
>>>
>>>So what do you think? Is it better not to use the -u option to keep the
>>>database small? Or do you think a 200 MB database is not a problem even on
>>>a production mailserver that receives thousands of (spam) emails every day?
>>
>>You might look into the "thresh_update" parameter:
>>
>>#       Skip autoupdating if the spamicity is within this value
>>#       of 0.000000 (surely ham) or 1.000000 (surely spam).
>>
>>I use the default of 0.01, meaning spams with spamicity greater than 0.99
>>and hams with spamicity less than 0.01 are not registered. This cuts down
>>autoupdate registrations by a large factor; maybe to 1/10?
>>
>>The idea is that well-recognized messages do not need to be registered. You
>>do miss some new tokens and counts that might be useful in the future, but in
>>practice I've found that accuracy is not harmed.
> 
> 
> Bill,
> 
> sounds good. But processing the messages with a shell script (as I do now)
> gets a bit complicated if I also have to take the spamicity value into
> account to decide whether I should unregister the message beforehand or not.
> Especially as bash can't do floating-point comparisons.
> 
> It would make things much easier if bogofilter added a "Bogofilter
> registered" header to all messages that have already been registered.
> 
> Is there an option to enable something like this with the current
> bogofilter? I didn't find one so far.
> 
> Otherwise I lean towards not using -u and training bogofilter just with -s
> or -n to keep things simple, like John suggested.
> 
> Peter

Of course bogofilter can already add a header.  It's called "X-Bogosity" 
and the value is Unsure, Spam/Yes, Ham/No, or however you have it set in 
your bogofilter.cf.  If your X-Bogosity is Unsure, then it was not 
registered, otherwise it was.  Just do a simple regex.
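
For illustration, here is a minimal Python sketch of that check (it is not
part of bogofilter). It assumes the header looks like
"X-Bogosity: Spam, tests=bogofilter, spamicity=0.999999, version=..." and
that the unsure label is the literal word "Unsure"; both depend on your
bogofilter.cf. The function names are made up for this example, and the
thresh_update handling follows Bill's description quoted above rather than
the bogofilter source, so treat it as an assumption.

```python
import re
from email import message_from_string

# Assumed header value: "Spam, tests=bogofilter, spamicity=0.999999, version=..."
# The label is the first word; the spamicity field is treated as optional.
BOGOSITY_RE = re.compile(
    r"^\s*(?P<label>\w+)(?:.*?\bspamicity=(?P<score>\d+(?:\.\d+)?))?",
    re.S,
)


def parse_bogosity(raw_message):
    """Return (label, spamicity or None), or None if no usable header exists."""
    value = message_from_string(raw_message).get("X-Bogosity")
    if value is None:
        return None
    match = BOGOSITY_RE.match(value)
    if match is None:
        return None
    score = match.group("score")
    return match.group("label"), (float(score) if score else None)


def was_probably_registered(raw_message, thresh_update=0.0,
                            unsure_labels=("Unsure",)):
    """Guess whether 'bogofilter -u' registered this message.

    Hypothetical helper: Unsure messages count as not registered, and so do
    messages whose spamicity lies within thresh_update of 0 or 1.
    """
    parsed = parse_bogosity(raw_message)
    if parsed is None:
        return False
    label, score = parsed
    if label in unsure_labels:
        return False
    if score is not None and thresh_update > 0:
        if score < thresh_update or score > 1.0 - thresh_update:
            return False
    return True
```

Because the comparison happens in Python, the lack of floating-point
arithmetic in bash does not matter; a shell script could call a small
script like this and branch on its result.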

Tom




