Help with bogofilter setup

David Relson relson at osagesoftware.com
Tue Dec 31 19:49:22 CET 2002


Doug,

I think your plans are fine.  As you probably know by now, I strongly 
recommend using the Fisher algorithm ('-f' switch or "algorithm=fisher" in 
the config file).  Its tristate discrimination (between spam, ham, and 
unsure) is great.

David

At 01:29 PM 12/31/02, Doug Mandell wrote:

>Hi, pardon me if this question is too simple for this list, I've been 
>lurking for a while hoping to get all my questions answered without having 
>to ask any, but as the new year approaches I thought it might be time to 
>finally get Bogofilter going.
>
>I think it'd probably be easiest to describe my desired setup, then get 
>advice on how to get it working properly.  I've got a small domain that 
>handles mail for some friends and myself, I want to set up bogofilter to 
>prevent an increasing spam problem from becoming an epidemic.
>
>Currently we're using sendmail and my users all download their mail via 
>pop3 (they're scattered about the country).  I thought the best way to use 
>bogofilter was to use the passthrough option to add a header, then the 
>users can decide whether or not they want to filter mail based on what the 
>header tells them.  I would also set up a spam@ and nospam@ address, any 
>mail that was inappropriately classed would be sent to one of those 
>addresses and a cron job would relabel them (using the -S or -N flags, right?).

Yep.  People are successfully doing that.
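
For the client side of that plan, here is a minimal sketch in Python of reading the verdict out of the added header.  It assumes the X-Bogosity value starts with a verdict word followed by comma-separated fields; the exact words and fields depend on the bogofilter version and configuration, so check a real filtered message first.  The function name and the "unfiltered" return value are made up for illustration.

```python
from email import message_from_string


def bogosity_verdict(raw_message: str) -> str:
    """Return the leading verdict word of X-Bogosity, lowercased.

    Returns "unfiltered" (a hypothetical marker) when the header is absent.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return "unfiltered"
    # Assumption: the verdict is the first comma-separated field.
    return header.split(",", 1)[0].strip().lower()
```

A mail client rule does the same thing: match on the start of the X-Bogosity header and file the message accordingly.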

>I've already set up preliminary spam and nospam dbs, now the question is 
>how to set up the passthrough bit.  After mail comes in to Sendmail it gets 
>sent to procmail which delivers the mail to the user mailboxes in 
>/var/spool/mail, how do I set up procmail so that it uses bogofilter 
>(bogofilter -p -e right?) to add the X-Bogosity header prior to delivery 
>of the mail?

You've got the right idea.  I've attached a copy of my /etc/procmailrc file 
which is used on the mailserver for my small domain (which serves me, my 
business, and my family).


>Secondly, do you think that my overall plan is a good idea?  Because they 
>all use pop3 and not imap it's impractical to filter mail into mailboxes 
>on the server, it seems that allowing the users to filter based on the 
>header is the best idea.
>
>Finally, will having two big dbs for all users rather than having a db for 
>each user be a bad idea?  I've populated the dbs with an equal number of 
>spam and nospam messages from each user (that I've received spam from so 
>far, anyway), should that be sufficient?  I'd much rather find out now 
>that my whole setup won't work than to find out after I've set everything up.

One interesting technique is called "train on error".  You'll need to 
create two mailboxes - one with spam and the other with ham - and new 
(empty) wordlists and then run the script contrib/randomtrain.  Randomtrain 
runs bogofilter for each message (in a random order) and compares the 
bogofilter output (ham/spam/unsure) with the known correct answer (since it 
knows whether the message is from the spam or ham mailbox).  If bogofilter 
incorrectly classified the message, randomtrain feeds the message to 
bogofilter (with the proper s/n flag) so that bogofilter will know better 
next time.  It can be useful to run randomtrain a second time (without 
clearing the word lists).
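
The loop described above can be sketched in Python roughly as follows.  This is not the contrib/randomtrain script itself, only an illustration of the idea.  The classify and register callables are hypothetical stand-ins for running bogofilter on a message and for registering it with the proper flag, and counting an "unsure" result as an error is an assumption.

```python
import random
from typing import Callable, Iterable, Optional


def train_on_error(
    spam: Iterable[str],
    ham: Iterable[str],
    classify: Callable[[str], str],        # returns "spam", "ham" or "unsure"
    register: Callable[[str, str], None],  # records a message under a label
    seed: Optional[int] = None,
) -> int:
    """Register only the messages the filter gets wrong; return how many."""
    labelled = [(m, "spam") for m in spam] + [(m, "ham") for m in ham]
    # Visit the messages in a random order.
    random.Random(seed).shuffle(labelled)
    trained = 0
    for message, truth in labelled:
        # Assumption: anything other than the known answer, including
        # "unsure", counts as an error and triggers training.
        if classify(message) != truth:
            register(message, truth)
            trained += 1
    return trained
```

Calling it a second time with the same classify and register mirrors the second randomtrain pass: only messages that are still misclassified get registered.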

I just rebuilt my wordlists from 3 1/2 months of messages (approx 20,000) 
and am now running on wordlists with a total of 1000 messages (roughly) and 
am getting good results.


>I know I may not be asking all the right questions, please feel free to 
>treat me like the idiot I am!
>
>Thanks,
>
>Doug
-------------- next part --------------
A non-text attachment was scrubbed...
Name: procmailrc
Type: application/octet-stream
Size: 1570 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20021231/eeb9e17d/attachment.obj>

