using bogofilter on smtp server side

David Relson relson at osagesoftware.com
Thu Apr 7 13:43:52 CEST 2005


On Wed, 6 Apr 2005 12:28:10 -0500
Michal Sabala wrote:

> Hi,
> 
> I've been using bogofilter for many years out of my procmail and it
> worked out great.
> 
> I'm replacing the mail server at work (qmail, but that's not an issue) and
> piping every message received via smtp through `bogofilter -p -e`. Using
> my personal wordlist (from bogofilter in procmail) doesn't work well
> since rcvd:, from: and to: keys no longer match. I bounced a bunch of spam
> from my workstation to the new mail server, and half of it is now
> unsure (due to different keys in the header).
> 
> I'm pondering how to treat messages sent to spam at my.mail.server and
> nonspam at my.mail.server by my users to train bogofilter. I'll ask users
> to bounce misclassified messages, but I know some of them might reply
> or forward to these addresses.
> 
> What is the best strategy? Strip the header of the message and
> reclassify just the body? Try to determine if message was forwarded
> instead of bounced and only then strip the header? Or just let it all in
> and let the algorithm handle it?
> 
> How do you do it?
> 
> Thanks,
>   Michal
> 
> PS. please cc me as I'm not on the mailing list.

H'lo Michal,

Bogofilter learns quickly.  Many people have moved from old versions
to new ones and have run into the same kind of change in header
tokens.

The best strategy is to continue with your existing wordlist
and train on all the unsures.  Yes, the number will be significant for
a while.  Since bogofilter learns quickly, the rate of unsures should
drop quickly.
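
To find the unsures to train on, one option is to look at the header
that "bogofilter -p" adds to each message. The sketch below is only an
illustration: it assumes the header is named X-Bogosity and that its
value starts with the classification word (for example "Unsure, ...").
Both depend on your bogofilter configuration, so check a real message
first. The function names are made up for this example.

```python
from email import message_from_string


def bogosity(raw_message: str) -> str:
    """Return the classification word from X-Bogosity, lower-cased.

    Assumes a value such as "Unsure, tests=bogofilter, spamicity=0.5".
    Returns "" when the header is missing.
    """
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity", "")
    return header.split(",")[0].strip().lower()


def is_unsure(raw_message: str) -> bool:
    """True when the message was tagged as unsure (assumed header format)."""
    return bogosity(raw_message) == "unsure"
```

Messages for which is_unsure() is true are the ones to sort by hand
and feed back to bogofilter.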

Sounds like you're worried about work addresses being tainted as
spammish.  Right?  You can find out current scores for work-related
tokens using bogoutil, e.g.

  bogoutil -p $BOGODIR work.com user1 user2 ...

The scores will reflect the ratio of spam to good mail.  
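
As a rough illustration of what "ratio of spam to good mail" means for
a single token, the sketch below computes the share of a token's
occurrences that came from spam, normalised by how many spam and good
messages were trained. It is a simplified model, not bogofilter's
exact calculation (which also applies smoothing); the function name
and the numbers are invented for the example.

```python
def token_spamicity(spam_count, good_count, spam_msgs, good_msgs):
    """Simplified per-token estimate in the range 0.0 (good) to 1.0 (spam).

    spam_count / good_count: times the token was seen in spam / good mail.
    spam_msgs / good_msgs:   number of spam / good messages trained.
    Returns None when there is not enough data to form a ratio.
    """
    if spam_msgs == 0 or good_msgs == 0:
        return None
    spam_rate = spam_count / spam_msgs
    good_rate = good_count / good_msgs
    if spam_rate + good_rate == 0:
        return None
    return spam_rate / (spam_rate + good_rate)


# A work address seen in 30 of 100 spams and 10 of 100 good messages
# leans spammish; one seen only in good mail scores 0.0.
leaning_spam = token_spamicity(30, 10, 100, 100)
only_good = token_spamicity(0, 25, 100, 100)
```

Training on more good mail that contains the work addresses pulls
their scores back down.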

For dealing with new messages, have your users forward them as
attachments.  Then have a script process the messages sent to
spam at my.mail.server, extract the attached email, and train with it.
For the users who simply forward (rather than forward as attachment),
you could strip the headers.  It should work fine.
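
The extraction step could look something like the sketch below. It
uses Python's standard email package to pull any message/rfc822
attachments out of a forwarded message, and shows how the result might
be piped to bogofilter with -s (register as spam) or -n (register as
non-spam). The function names are hypothetical, and how the script is
hooked up to the spam and nonspam addresses is left out.

```python
import subprocess
from email import message_from_bytes, policy
from typing import List


def extract_forwarded(raw: bytes) -> List[bytes]:
    """Return the raw bytes of every message attached as message/rfc822."""
    outer = message_from_bytes(raw, policy=policy.default)
    attached = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-item list
            # holding the original message, headers included.
            inner = part.get_payload(0)
            attached.append(inner.as_bytes())
    return attached


def train(message: bytes, as_spam: bool) -> None:
    """Feed one message to bogofilter on stdin (requires bogofilter in PATH)."""
    flag = "-s" if as_spam else "-n"
    subprocess.run(["bogofilter", flag], input=message, check=True)
```

A delivery script for the spam address would call extract_forwarded()
on the incoming message and then train(m, as_spam=True) for each
result. An empty list means the user forwarded inline, so the fallback
of stripping the headers applies.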

After things settle down, you may want to look at the timestamps in
a wordlist dump ("bogoutil -d") and see if there are lots of old
tokens.  Likely you can delete the really old ones (say those that are
more than 2 years old).
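
A quick way to see how many old tokens there are is to scan the dump
with a small script. The sketch below assumes each dump line has the
form "token spamcount goodcount YYYYMMDD"; verify that against your
own "bogoutil -d" output before relying on it. The function name and
the 730-day default are illustrative only.

```python
from datetime import date, timedelta


def old_tokens(dump_lines, today, max_age_days=730):
    """Yield tokens whose timestamp is older than max_age_days.

    Assumes lines of the form: token spamcount goodcount YYYYMMDD
    """
    cutoff = int((today - timedelta(days=max_age_days)).strftime("%Y%m%d"))
    for line in dump_lines:
        # Split from the right so the three numeric fields are always last.
        fields = line.rsplit(None, 3)
        if len(fields) != 4:
            continue  # not in the assumed format
        token, _spam, _good, stamp = fields
        if token.startswith("."):
            continue  # skip bookkeeping entries such as .MSG_COUNT
        if stamp.isdigit() and int(stamp) < cutoff:
            yield token
```

Feeding it the lines of a saved dump and date.today() gives the list
of candidates for deletion.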

HTH,

David

_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


