using bogofilter on smtp server side

Tom Anderson tanderso at oac-design.com
Thu Apr 7 16:40:44 CEST 2005


Ehlo Michal,

>> I'm replacing the mail server at work (qmail, but that's not an issue) and
>> piping every message received via smtp through `bogofilter -p -e`. Using
>> my personal wordlist (from bogofilter in procmail) doesn't work well
>> since rcvd:, from: and to: keys no longer match. I bounced a bunch of
>> spam from my workstation to the new mail server, and half of it is now
>> unsure (due to different keys in the header).

Why not set up a wordlist for each user?  You can chain wordlists so that 
your personal wordlist is still the basis, but then each user has their own 
deviations.

More info: http://bogofilter.sourceforge.net/faq.shtml#multiple

>> I'm pondering how to treat messages sent to spam at my.mail.server and
>> nonspam at my.mail.server by my users to train bogofilter. I'll ask users
>> to bounce misclassified messages, but I know some of them might reply
>> or forward to these addresses.
>>
>> What is the best strategy? Strip the header of the message and
>> reclassify just the body? Try to determine if the message was forwarded
>> instead of bounced and only then strip the header? Or just let it all in
>> and let the algorithm handle it?

I would suggest that your users always forward as an attachment, so that the
original headers arrive completely intact and unaltered.

>> How do you do it?

I use http://orderamidchaos.com/bogofilter/bfproxy for parsing out 
attachments and registering them.
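
For anyone who wants to see roughly what "parsing out attachments and
registering them" involves, here is a minimal sketch in Python. It is not
how bfproxy works internally; it only shows the general idea using the
standard `email` module. The function names are hypothetical, and the
`bogofilter -s` / `bogofilter -n` calls (register as spam / as non-spam) are
my assumption about how you would want to train, since the thread does not
spell that out.

```python
import email
import subprocess
from email import policy


def _collect(part, found):
    """Gather message/rfc822 attachments without descending into them."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-element list
        # holding the embedded message, original headers included.
        found.append(part.get_payload(0).as_bytes())
        return
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_attached_messages(raw_bytes):
    """Return every message forwarded as an attachment, as raw bytes."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    _collect(outer, found)
    return found


def train(raw_message, as_spam, bogofilter="bogofilter"):
    """Feed one message to bogofilter on stdin (assumed training step)."""
    flag = "-s" if as_spam else "-n"
    subprocess.run([bogofilter, flag], input=raw_message, check=True)
```

A delivery script for the spam address would read the incoming mail from
stdin, call `extract_attached_messages()` on it, and pass each result to
`train(msg, as_spam=True)`. If the list comes back empty, the user
probably forwarded inline, and you would have to fall back to something
else, such as stripping the headers.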


>>
>> Thanks,
>>   Michal
>>
>> PS. please cc me as I'm not on the mailing list.
>
> H'lo Michal,
>
> Bogofilter learns quickly.  Many people have crossed over from old
> versions to new versions and have encountered the new header file
> changes.
>
> The best strategy is to continue with your existing wordlist
> and train on all the unsures.  Yes, the number will be significant for
> a while.  Since bogofilter learns quickly, the rate of unsures should
> drop quickly.
>
> Sounds like you're worried about work addresses being tainted as
> spammish.  Right?  You can find out current scores for work related
> tokens using bogoutil, e.g.
>
>  bogoutil -p $BOGODIR work.com user1 user2 ...
>
> The scores will reflect the ratio of spam to good mail.
>
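
To make "the ratio of spam to good mail" a bit more concrete, here is a
rough sketch of a Robinson-style per-token estimate, the kind of calculation
this family of filters is built around. The function name, the counts, and
the `robx` / `robs` values are illustrative assumptions, not output from a
real wordlist, and bogofilter's actual arithmetic may differ in detail.

```python
def token_score(spam_count, good_count, spam_msgs, good_msgs,
                robx=0.5, robs=0.01):
    """Estimate how spammy a token is, on a scale from 0.0 to 1.0.

    robx is the score assumed for a token never seen before and robs is
    the weight given to that assumption (both values are illustrative).
    """
    spam_rate = spam_count / spam_msgs if spam_msgs else 0.0
    good_rate = good_count / good_msgs if good_msgs else 0.0
    if spam_rate + good_rate == 0:
        return robx  # no evidence either way
    raw = spam_rate / (spam_rate + good_rate)
    n = spam_count + good_count
    # Blend the raw ratio with the prior; the prior fades as n grows.
    return (robs * robx + n * raw) / (robs + n)
```

Under these assumptions, a work address that shows up about equally often
in spam and in good mail ends up near the neutral 0.5, so it contributes
little to the final classification either way.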
> For dealing with new messages, have your users forward them as
> attachments.  Then have a script process the messages to
> spam at my.mail.server, extract the attachment email, and train with it.
> For the users who simply forward (rather than forward as attachment),
> you could strip the headers.  It should work fine.
>
> After things settle down, you may want to look at the timestamps in
> a wordlist dump ("bogoutil -d") and see if there are lots of old
> tokens.  Likely you can delete the really old ones (say those that are
> more than 2 years old).
>
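
A small sketch of that age check, assuming each line of the dump looks
like "token spamcount goodcount YYYYMMDD" (verify that against your own
`bogoutil -d` output before relying on it). The function name is
hypothetical, and the choice to always keep tokens starting with a dot is
my assumption, on the theory that those are bookkeeping entries rather than
words.

```python
from datetime import date, timedelta


def split_by_age(dump_lines, today, max_age_days=730):
    """Split dump lines into (keep, stale) by their last-seen date."""
    cutoff = today - timedelta(days=max_age_days)
    keep, stale = [], []
    for line in dump_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.rsplit(None, 3)
        stamp = fields[3] if len(fields) == 4 else ""
        if len(stamp) != 8 or not stamp.isdigit():
            keep.append(line)  # unexpected layout: leave it alone
            continue
        if fields[0].startswith("."):
            keep.append(line)  # assumed bookkeeping token, never prune
            continue
        try:
            seen = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
        except ValueError:
            keep.append(line)  # not a real calendar date
            continue
        (stale if seen < cutoff else keep).append(line)
    return keep, stale
```

Looking over the `stale` list before deleting anything is a cheap sanity
check; if it is mostly junk, the `keep` lines are what you would load back
into a fresh wordlist.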
> HTH,
>
> David
>
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter 
