Using the -u option and database size
Tom Anderson
tanderso at oac-design.com
Thu Mar 22 19:23:11 CET 2007
Peter Gutbrod wrote:
> On 21.03.2007 at 12:27, Bill McClain (wmcclain at salamander.com) wrote:
>
>
>>On Wed, 21 Mar 2007 10:42:41 +0100
>>Peter Gutbrod <lists at media-fact.com> wrote:
>>
>>
>>>So far I have used the -u option with bogofilter. Meanwhile my wordlist.db has
>>>grown to about 200 MB, and I'm wondering whether it puts much load on
>>>the server to match each mail against such a big database.
>>>
>>>I think the size is mainly due to the automatic registering with the -u
>>>option.
>>>
>>>So what do you think? Is it better not to use the -u option, to keep the
>>>database small? Or do you think a 200 MB database is not a problem, even on a
>>>production mail server that receives thousands of (spam) emails every day?
>>
>>You might look into the "thresh_update" parameter:
>>
>># Skip autoupdating if the spamicity is within this value
>># of 0.000000 (surely ham) or 1.000000 (surely spam).
>>
>>I use the default of 0.01, meaning that spams with spamicity greater than 0.99
>>and hams with spamicity less than 0.01 are not registered. This cuts down
>>autoupdate registrations by a large factor; maybe to 1/10?
>>
>>The idea is that well-recognized messages do not need to be registered. You
>>do miss some new tokens and counts that might be useful in the future, but in
>>practice I've found that accuracy is not harmed.
>
>
> Bill,
>
> Sounds good. But processing the messages with a shell script (as I do now)
> gets a bit complicated if I also have to take the spamicity value into
> account to decide whether I should unregister the message beforehand.
> Especially as bash can't do floating-point comparisons.
>
> It would make things much easier if bogofilter added a "Bogofilter
> registered" header to all messages that have already been registered.
>
> Is there an option to enable something like this with the current
> bogofilter? I didn't find one so far.
>
> Otherwise I lean towards not using -u and training bogofilter just with -s
> or -n to keep things simple, like John suggested.
>
> Peter
Of course bogofilter can already add a header. It's called "X-Bogosity",
and its value is Unsure, Spam/Yes, Ham/No, or whatever you have set in
your bogofilter.cf. If your X-Bogosity is Unsure, the message was not
registered; otherwise it was. Just use a simple regex.
Tom
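A minimal Python sketch of the "simple regex" approach, which also sidesteps bash's lack of floating-point comparisons. It assumes the X-Bogosity header has the form "Spam, tests=bogofilter, spamicity=0.999913, version=1.1.5" (the exact layout and label words depend on your bogofilter version and bogofilter.cf, so check a real message first). The helper names parse_bogosity and was_autoregistered are hypothetical. Because Bill describes thresh_update as also skipping messages very close to 0 or 1, the sketch combines Tom's Unsure rule with that check, using Bill's value of 0.01 and assuming strict boundaries; bogofilter's actual autoupdate logic may differ in detail.

```python
import re
from email import message_from_string
from typing import Optional, Tuple

# ASSUMPTION: the header value looks like
#   Spam, tests=bogofilter, spamicity=0.999913, version=1.1.5
# The label text (Spam/Ham/Unsure or Yes/No/Unsure) depends on bogofilter.cf.
BOGOSITY_RE = re.compile(
    r"^\s*(?P<label>[A-Za-z]+)\b.*?\bspamicity=(?P<score>[0-9]*\.?[0-9]+)"
)


def parse_bogosity(raw_message: str) -> Optional[Tuple[str, float]]:
    """Return (label, spamicity) from the X-Bogosity header, or None."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    # Unfold a header that may have been wrapped across several lines.
    value = " ".join(str(header).split())
    match = BOGOSITY_RE.match(value)
    if match is None:
        return None
    # Convert the score to a float so it can be compared numerically.
    return match.group("label"), float(match.group("score"))


def was_autoregistered(label: str, spamicity: float,
                       thresh_update: float = 0.01) -> bool:
    """Guess whether `bogofilter -u` registered the message (hypothetical helper)."""
    # Tom's rule: Unsure messages are never registered.
    if label.lower() == "unsure":
        return False
    # Bill's thresh_update: skip messages within thresh_update of 0 or 1.
    # ASSUMPTION: strict boundaries, as in "greater than 0.99".
    if spamicity < thresh_update or spamicity > 1.0 - thresh_update:
        return False
    return True
```

A shell script could call a small wrapper around these functions and branch on its exit status, leaving all numeric comparisons to Python rather than bash.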