multiple filters

Michael D Richards michael at emdee.net
Wed Mar 19 15:16:24 CET 2003


tsh at mrc-lmb.cam.ac.uk wrote:

>1. Is it realistic to operate, say, some hundreds of bogofilter
>databases on the same box (the total number of messages processed
>would be the same as for a single global filter, but each user
>would have his own tables), and is this likely to require a
>very beefy box? Are there any performance indicators anywhere?
>
I do not have hundreds, but I do have dozens working this way with qmail 
as the MTA and it seems to work fine. The toughest part is figuring out 
your interfaces and getting it all to work, but it is no big deal.
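
As a rough illustration of the per-user layout, here is a minimal
Python sketch. The base path /var/lib/bogofilter and the helper names
are my own assumptions, not part of bogofilter or qmail; the only
bogofilter feature relied on is the -d option, which selects the
wordlist directory. What the exit status means depends on your
bogofilter version, so check the man page before acting on it.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical layout: one wordlist directory per user under BASE.
BASE = Path("/var/lib/bogofilter")


def user_db_dir(user: str, base: Path = BASE) -> Path:
    """Return the (assumed) wordlist directory for one user."""
    return base / user


def classify_command(user: str, base: Path = BASE) -> List[str]:
    """Build the bogofilter command line for this user's database."""
    # -d tells bogofilter which directory holds the wordlists.
    return ["bogofilter", "-d", str(user_db_dir(user, base))]


def classify(user: str, message: bytes, base: Path = BASE) -> int:
    """Feed one message on stdin and return bogofilter's exit status."""
    result = subprocess.run(classify_command(user, base), input=message)
    return result.returncode
```

Each delivery then costs one bogofilter run against one user's
wordlists, so the number of databases mostly affects disk use rather
than per-message work.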

>2. Can bogofilter be trained on a diet of spam-only? What happens if
>the ham wordlists are empty? Whenever a spam msg is added to the spam
>corpus (have I got the right terminology here?) is it necessary
>to compensate with some ham in the ham corpus to avoid skewing things?
>
I started out using precollected corpora, but recently I changed the
method slightly and I now start a user with nothing. You can only get
away with this if you use the update option. At first, all mail that
comes in will be considered "ham". I have users move their spam to a
Spam folder (using IMAP), and every hour everything in there that
wasn't previously marked as spam is registered as spam and
unregistered from the gooddb.
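
The hourly pass could be sketched roughly as follows. This is not my
actual script, just an illustration under stated assumptions: the Spam
folder is a Maildir, a hypothetical "seen" file records which messages
earlier runs already handled, and exit status 0 is taken to mean the
registration succeeded. The bogofilter options used are -d (wordlist
directory), -N (unregister from the ham list) and -s (register as
spam). The sketch does not look at how a message was classified on
delivery, so mail that bogofilter had already registered as spam would
need to be skipped by some other check.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Set


def message_key(path: Path) -> str:
    """Maildir unique name without the ':2,<flags>' suffix."""
    # Flags change when a message is read, so compare on the stable part.
    return path.name.split(":", 1)[0]


def retrain_command(db_dir: str) -> List[str]:
    # -d: wordlist directory, -N: unregister from the ham list,
    # -s: register as spam.
    return ["bogofilter", "-d", db_dir, "-N", "-s"]


def run_bogofilter(msg: Path, db_dir: str) -> bool:
    # Assumption: exit status 0 means the registration succeeded.
    with msg.open("rb") as fh:
        result = subprocess.run(retrain_command(db_dir), stdin=fh)
    return result.returncode == 0


def retrain_spam_folder(
    spam_dir: Path,
    db_dir: str,
    seen_file: Path,
    register: Callable[[Path, str], bool] = run_bogofilter,
) -> List[str]:
    """Register every not-yet-seen message; return the keys handled now."""
    seen: Set[str] = set()
    if seen_file.exists():
        seen = set(seen_file.read_text().split())
    newly_done: List[str] = []
    for sub in ("cur", "new"):
        folder = spam_dir / sub
        if not folder.is_dir():
            continue
        for msg in sorted(folder.iterdir()):
            key = message_key(msg)
            if msg.is_file() and key not in seen and register(msg, db_dir):
                seen.add(key)
                newly_done.append(key)
    # Persist the keys so the next hourly run skips these messages.
    seen_file.write_text("\n".join(sorted(seen)) + "\n")
    return newly_done
```

Run from cron once an hour per user, for example as
retrain_spam_folder(Path(maildir) / ".Spam", db_dir, seen_path), where
all three arguments are placeholders for your own layout.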

The effectiveness of this method seems slightly lower at first, but it
rapidly improves to the normal expected level. The chance of false
positives may be slightly, but not significantly, higher.
Specifically, I have seen one false positive using this method where I
had seen none before. It might just have been a matter of time.

Michael~




